Open AI Real Time
Low-latency speech-to-speech API for real-time conversational AI.

About Open AI Real Time
OpenAI Realtime API: Fast, Multimodal Speech-to-Speech for Developers
OpenAI’s Realtime API is a public beta API that enables developers to build fast, low-latency speech-to-speech experiences-similar to ChatGPT’s Advanced Voice Mode-directly into their applications. Powered by the new GPT-4o model, it supports natural, expressive conversations with six preset voices and can handle both audio and text inputs and outputs.
Unlike previous approaches that required chaining separate ASR, LLM, and TTS models (often with lag and loss of expressiveness), the Realtime API streams audio in and out, allowing for natural, real-time conversations. The API can also handle interruptions smoothly, making interactions feel more human-like.
Audio input and output are also being added to the Chat Completions API (as gpt-4o-audio-preview
), supporting use cases that don’t require the ultra-low latency of the Realtime API.
Key Features
Low-latency Speech-to-Speech:
Real-time streaming audio input and output for natural conversations.Expressive, Multimodal Voices:
Six preset voices with improved range and emotion.Bidirectional WebSocket API:
Persistent connection for two-way, fast audio exchange.Function Calling:
Trigger actions or pull in external context during conversations.Audio & Text Inputs/Outputs:
Flexible multimodal support for diverse use cases.Interrupt Handling:
Users can interrupt the AI, just like in human conversation.Scalable Sessions:
No hard limit on simultaneous sessions (see docs for rate limits).Safety & Privacy:
Multiple layers of automated and human safety review; no training on your data without explicit permission.
Use Cases
Voice assistants and customer support agents
Language learning and educational role-play
Real-time AI coaching, accessibility tools, and translation
Interactive entertainment and outbound marketing calls
Model Selection
gpt-4o-realtime-preview:
For low-latency, real-time speech-to-speech.gpt-4o-audio-preview:
For audio input/output in the Chat Completions API.
Pricing
Text Input: $5 per 1M tokens
Text Output: $20 per 1M tokens
Audio Input: $100 per 1M tokens (~$0.06/min)
Audio Output: $200 per 1M tokens (~$0.24/min)
Cached Pricing: $2.50 per 1M cached text tokens, $20 per 1M cached audio tokens
Getting Started
Official Overview: Introducing the Realtime API
API Documentation: Realtime API Docs
Playground: Try the API in the OpenAI Playground
Reference Client: Reference Client (see announcement page for link)
Voices: Preset Voices
Partner Integrations:
Function Calling: Function Calling Guide
Usage Policies: OpenAI Usage Policies
Enterprise Privacy: Enterprise Privacy
Pricing Details: OpenAI Pricing
OpenAI’s Realtime API empowers developers to create next-generation, natural voice experiences-removing latency barriers and simplifying the stack for conversational AI across education, customer service, accessibility, and more.