Kyutai
Open-science AI lab pioneering advanced, real-time voice-enabled conversational AI.

About Kyutai
Kyutai: Open-Science AI Lab Advancing Voice-Enabled Conversational AI
Kyutai is a Paris-based non-profit AI research lab committed to open science and democratizing artificial general intelligence. A key focus of Kyutai is developing cutting-edge voice AI technologies that enable smooth, natural, and expressive real-time conversations with artificial intelligence.
Kyutai’s flagship voice AI model, Moshi, represents a breakthrough in generative voice AI. Developed from scratch in just six months by a small expert team, Moshi supports full-duplex communication, allowing it to listen, process, and speak simultaneously with ultra-low latency (around 160-200 milliseconds). This enables fluid, human-like interactions with proper timing, overlapping speech, and rich emotional expression, qualities that traditional voice assistants lack.
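The full-duplex idea described here, listening and speaking at the same time rather than taking turns, can be illustrated with a small conceptual sketch. This is not Moshi’s actual API: the queue, the chunk strings, and the `ack` responses are invented purely to show input and output streams overlapping.

```python
import asyncio

async def full_duplex_demo():
    # Conceptual illustration of full-duplex interaction: the "listen"
    # and "speak" loops run concurrently instead of taking turns.
    # All names and data here are hypothetical, not Moshi's API.
    incoming = asyncio.Queue()
    events = []

    async def listen():
        # Receive audio chunks as they arrive, even while speaking.
        for chunk in ["hel", "lo ", "the", "re"]:
            await asyncio.sleep(0.01)   # simulated microphone cadence
            await incoming.put(chunk)
            events.append(("heard", chunk))
        await incoming.put(None)        # end-of-stream marker

    async def speak():
        # Emit output in parallel; a real model would condition each
        # output frame on all audio heard so far.
        heard = ""
        while (chunk := await incoming.get()) is not None:
            heard += chunk
            events.append(("spoke", f"ack:{len(heard)}"))

    await asyncio.gather(listen(), speak())
    return events

events = asyncio.run(full_duplex_demo())
print(events)  # "heard" and "spoke" events interleave
```

A turn-based assistant would instead wait for the whole utterance before responding; running both loops concurrently is what keeps the perceived latency in the range of a single audio frame.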
Moshi’s text-to-speech capabilities are exceptional, supporting multiple expressive voices and dynamic roleplay scenarios, making it ideal as a coach, companion, or creative conversational partner. The model can run locally on unconnected devices, ensuring privacy and accessibility.
Kyutai is committed to open research and ecosystem development: the code, model weights, and audio codecs for Moshi are freely shared with the global community. This openness allows researchers and developers to study, modify, extend, and specialize the voice AI for diverse applications, accelerating innovation in voice-based products and services.
In addition to Moshi, Kyutai has released Hibiki, a groundbreaking model for simultaneous speech-to-speech translation, further advancing its voice AI capabilities.
Key Features
Real-Time Full-Duplex Voice AI:
Moshi processes audio input and generates speech output simultaneously for natural, fluid conversations.
Expressive Text-to-Speech:
Multiple voices with rich emotional nuance and interactive dialogue capabilities.
Local Deployment:
Runs safely on offline devices, preserving user privacy.
Open Source Commitment:
Model weights, code, and audio codecs are openly shared to foster community development.
Multimodal Capabilities:
Supports conversations about images and other modalities, expanding voice AI use cases.
Ultra-Low Latency:
Response times of around 160-200 milliseconds, enabling near-instantaneous interactions.
Use Cases
Natural, human-like conversational agents and companions
Voice-enabled coaching and roleplay applications
Multilingual and simultaneous speech-to-speech translation
Research and development of voice AI models and applications
Privacy-sensitive voice AI running locally on devices
Getting Started
Website: kyutai.org
Moshi Demo & Code: Available for free testing and download on Kyutai’s website.
Research Papers: Detailed technical reports and open-source resources provided.
Kyutai is pioneering the future of voice AI by delivering the world’s first openly accessible, real-time, expressive conversational AI, empowering global research and innovation through transparency and collaboration.