Whisper

Whisper: Robust, Multilingual Speech Recognition and Translation Model

Whisper is an open-source automatic speech recognition (ASR) system developed by OpenAI. It is designed to approach human-level robustness and accuracy on English speech recognition, and it also supports more than 95 other languages. It was trained on 680,000 hours of multilingual and multitask supervised data collected from the web, and that scale and diversity improve its robustness to accents, background noise, and technical language. Its simple, end-to-end Transformer architecture makes it straightforward to integrate into a wide range of applications.

Key Features

Multilingual Support: Transcribes speech in multiple languages and can translate non-English audio into English.
Robust Performance: In zero-shot evaluation across diverse datasets, Whisper is more robust than many specialized models; OpenAI reports that it makes about 50% fewer errors than those models.
Large, Diverse Training Data: Trained on a vast dataset that includes a variety of accents, noise conditions, and technical language.
Simple Architecture: Uses an encoder-decoder Transformer, processing input audio in 30-second chunks as log-Mel spectrograms.
Versatile Task Handling: Capable of language identification, phrase-level timestamps, transcription, and translation within a single model.
Open Source: Models and inference code are publicly available for use, modification, and research.
Ease of Use: Designed for straightforward integration into applications, enabling developers to add voice interfaces with minimal effort.
No Fine-Tuning Required: Delivers strong performance out-of-the-box on a wide range of real-world audio.
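
The 30-second chunking mentioned under Simple Architecture can be illustrated with a little arithmetic. The sketch below is only an approximation: the 16 kHz sample rate and the 160-sample hop between spectrogram frames are assumptions taken from the reference openai-whisper implementation, not from the text above, and the helper names are hypothetical. Fixed, non-overlapping windows are also a simplification, since the reference transcription loop moves its window according to the timestamps it decodes.

```python
import math

# Assumed values from the reference openai-whisper implementation
# (not stated in the feature list above).
SAMPLE_RATE = 16_000   # samples per second after resampling
CHUNK_SECONDS = 30     # window length named in the feature list
HOP_LENGTH = 160       # samples between log-Mel frames (10 ms)

SAMPLES_PER_CHUNK = SAMPLE_RATE * CHUNK_SECONDS     # samples in one window
FRAMES_PER_CHUNK = SAMPLES_PER_CHUNK // HOP_LENGTH  # log-Mel frames per window


def chunk_count(duration_seconds):
    """Number of 30-second windows needed to cover a recording."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)


def split_into_chunks(samples):
    """Split raw samples into 30-second windows, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), SAMPLES_PER_CHUNK):
        chunk = list(samples[start:start + SAMPLES_PER_CHUNK])
        # Pad a short final window with silence so every chunk has equal length.
        chunk.extend([0.0] * (SAMPLES_PER_CHUNK - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Under these assumptions each window holds 480,000 samples and yields 3,000 spectrogram frames, so a 61-second recording would need three windows, the last of which is mostly padding.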

Use Cases

Adding voice interfaces to software and mobile apps
Transcribing multilingual meetings, lectures, and interviews
Translating spoken content into English for accessibility and global reach
Enhancing accessibility tools for hearing-impaired users
Supporting research in speech processing and AI

Model Selection

Standard Whisper Models: Available in several sizes (tiny, base, small, medium, large, with large-v2 and large-v3 as later revisions of the large model), trading accuracy against speed and memory use.
Multilingual and English-Only Versions: Choose a multilingual model for dozens of languages or an English-only variant, which is offered for the smaller sizes.
On-Premises or Cloud Deployment: Flexible deployment options to fit your infrastructure and privacy requirements.
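
As a rough illustration of how the size and language choices combine, the sketch below builds a checkpoint name from the two decisions. The naming scheme (a ".en" suffix for English-only variants, offered only for tiny, base, small, and medium) is an assumption based on the openai-whisper package, and `model_name` is a hypothetical helper, not part of any library.

```python
# Checkpoint names assumed from the openai-whisper package.
MULTILINGUAL_SIZES = ("tiny", "base", "small", "medium",
                      "large", "large-v2", "large-v3")
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def model_name(size, english_only=False):
    """Return the checkpoint name for a size and language choice."""
    if size not in MULTILINGUAL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only variant for size {size!r}")
        return f"{size}.en"
    return size
```

For example, `model_name("small", english_only=True)` yields `"small.en"`, while asking for an English-only large model raises a `ValueError` because the large checkpoints are multilingual only.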

Getting Started

Website: openai.com/index/whisper
Research Paper: Read the Paper
Model Card: View Model Card
GitHub Repository: View Code
Try Whisper: Online Demo
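
For readers who want to try the model locally, a minimal sketch follows. It assumes the third-party openai-whisper package and ffmpeg are installed; `transcribe_file`, `format_segments`, and `format_timestamp` are hypothetical helper names, and the file path passed in is up to the caller. The `task="translate"` option corresponds to the translation-into-English feature described above.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments):
    """Turn segment dicts (start, end, text) into timestamped lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in segments
    ]


def transcribe_file(path, model_size="base", translate=False):
    """Transcribe (or translate into English) one audio file."""
    # Imported lazily so the formatting helpers work without the package.
    import whisper  # assumption: pip install openai-whisper

    model = whisper.load_model(model_size)
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    # The result dict carries the detected language and per-phrase segments.
    return result["language"], format_segments(result["segments"])
```

Calling `transcribe_file("meeting.mp3", translate=True)` (with a file of your own) would return the detected language code and a list of timestamped English lines. The first call downloads the chosen checkpoint, so smaller sizes are a sensible starting point.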

Whisper is a powerful foundation for building robust, multilingual voice interfaces and advancing research in speech technology. Its open-source nature and strong out-of-the-box performance make it accessible to developers, researchers, and organizations worldwide.
