Open AI Real Time

    Open AI Real Time

    Tech
    Speech To Speech
    Real time

    Low-latency speech-to-speech API for real-time conversational AI.

    Open AI Real Time banner

    About Open AI Real Time

    OpenAI Realtime API: Fast, Multimodal Speech-to-Speech for Developers

    OpenAI’s Realtime API is a public beta API that enables developers to build fast, low-latency speech-to-speech experiences-similar to ChatGPT’s Advanced Voice Mode-directly into their applications. Powered by the new GPT-4o model, it supports natural, expressive conversations with six preset voices and can handle both audio and text inputs and outputs.

    Unlike previous approaches that required chaining separate ASR, LLM, and TTS models (often with lag and loss of expressiveness), the Realtime API streams audio in and out, allowing for natural, real-time conversations. The API can also handle interruptions smoothly, making interactions feel more human-like.

    Audio input and output are also being added to the Chat Completions API (as gpt-4o-audio-preview), supporting use cases that don’t require the ultra-low latency of the Realtime API.

    Key Features

    • Low-latency Speech-to-Speech:
      Real-time streaming audio input and output for natural conversations.

    • Expressive, Multimodal Voices:
      Six preset voices with improved range and emotion.

    • Bidirectional WebSocket API:
      Persistent connection for two-way, fast audio exchange.

    • Function Calling:
      Trigger actions or pull in external context during conversations.

    • Audio & Text Inputs/Outputs:
      Flexible multimodal support for diverse use cases.

    • Interrupt Handling:
      Users can interrupt the AI, just like in human conversation.

    • Scalable Sessions:
      No hard limit on simultaneous sessions (see docs for rate limits).

    • Safety & Privacy:
      Multiple layers of automated and human safety review; no training on your data without explicit permission.

    Use Cases

    • Voice assistants and customer support agents

    • Language learning and educational role-play

    • Real-time AI coaching, accessibility tools, and translation

    • Interactive entertainment and outbound marketing calls

    Model Selection

    • gpt-4o-realtime-preview:
      For low-latency, real-time speech-to-speech.

    • gpt-4o-audio-preview:
      For audio input/output in the Chat Completions API.

    Pricing

    • Text Input: $5 per 1M tokens

    • Text Output: $20 per 1M tokens

    • Audio Input: $100 per 1M tokens (~$0.06/min)

    • Audio Output: $200 per 1M tokens (~$0.24/min)

    • Cached Pricing: $2.50 per 1M cached text tokens, $20 per 1M cached audio tokens

    Getting Started

    OpenAI’s Realtime API empowers developers to create next-generation, natural voice experiences-removing latency barriers and simplifying the stack for conversational AI across education, customer service, accessibility, and more.