We supercharged the Voice AI Newsroom 🔊

    Voice AI News: May 18–24, 2026

    Key Voice AI updates: Sierra's $1B raise at $16B val, Hark's $700M, Google Gemini agents, OpenAI voice acquisition, ElevenLabs in Spotify audiobooks, new TTS mo

    Sierra Raises Nearly $1 Billion At $16B Valuation

    Enterprise conversational AI platform Sierra closed a ~$1 billion funding round led by Tiger and Google's GV, valuing the company at close to $16 billion as it expands its agent platform domestically and internationally.

    Hark Secures $700M Series A For Universal AI Interface

    Brett Adcock's secretive agentic AI startup Hark raised $700 million and plans to release its first multi-modal models this summer to power a personal AI platform.

    Vapi Raises Series B To Challenge Voice AI Rivals

    Vapi completed a Series B funding round, positioning it to compete directly with Retell and ElevenLabs in the enterprise voice agent market.

    OpenAI Acquires Voice Cloning Team Behind Weights.gg

    OpenAI reportedly acquired the voice cloning team behind Weights.gg, signaling a continued push to deepen its proprietary voice capabilities.

    Sarvam AI Eyes $300M Round At $1.5B Valuation

    Indian enterprise AI startup Sarvam AI, which specializes in voice-based solutions, is reportedly on track for a $300 million funding round at a $1.5 billion valuation.

    OpenEvidence Launches First Native Speech-To-Speech Medical AI

    OpenEvidence's Voice Mode is now live as the first native speech-to-speech clinical decision support interface, available free to users alongside an enterprise expansion with Cedars-Sinai.

    Spotify Integrates ElevenLabs TTS For Audiobook Creation

    Spotify launched an ElevenLabs-powered audiobook creation tool within its Spotify for Authors platform, expanding support to 10 additional languages with more expressive, human-like voice models.

    Google Beam Debuts Conversational AI Video Agent

    Google demonstrated Sophie, a lifelike conversational AI agent for its Beam video calling platform designed to make AI interactions feel more personable and responsive.

    Google Gemini Omni Adds Native Audio And Avatar Voice

    Gemini Omni is a natively multimodal model trained on text, code, audio, images, and video that also lets users create videos with personal digital avatars with their own voice and likeness.

    Google Launches Gemini Spark Personal Voice Agent

    Gemini Spark is a $100/month personal AI agent capable of managing Gmail, workflows, and complex tasks continuously without requiring a device to remain open.

    ElevenLabs Brings Voice AI Into University Classrooms

    ElevenLabs launched free voice AI access for professors and an interactive Einstein voice agent for immersive learning through a partnership with CMG Worldwide and the Hebrew University of Jerusalem.

    Amazon Alexa Plus Debuts AI-Generated Podcast Feature

    Alexa Plus now offers Alexa Podcasts, in which a pair of AI-generated hosts with synthesized voices break down any topic of a user's choice.

    Speechify SIMBA 3.0 Enters Global TTS Top 10

    Speechify's SIMBA 3.0 model broke into the global top 10 on the Artificial Analysis TTS leaderboard, outranking models from Google, OpenAI, and ElevenLabs.

    StepAudio 2.5 Unifies ASR, TTS, And Real-Time Voice

    StepAudio 2.5 is a new unified audio-language foundation model that matches or exceeds specialized systems in ASR, TTS, and real-time spoken interaction through task-tailored RLHF.

    Apple Upgrades VoiceOver And Real-Time Captions With AI

    Apple announced iOS 27 accessibility updates powered by Apple Intelligence, including improved VoiceOver image recognition, on-device speech recognition for auto-generated captions, and enhanced voice control.

    Mega-ASR Advances Adverse-Condition Speech Recognition

    Researchers introduced Mega-ASR, which achieves a 45.69% word error rate on the VOiCES R4-B-F benchmark versus 54.01% for prior state-of-the-art systems by scaling real-world acoustic simulation.

    Māori Data Sovereignty Drives Indigenous TTS Model

    Researchers developed indigenous-owned Māori text-to-speech AI voice models using fewer than eight hours of recordings, achieving a 6.78% word error rate while centering data sovereignty principles.

    Comprehensive 37-Model TTS Benchmark Published

    A developer released the first known comprehensive TTS benchmark covering 37 models with cross-platform Windows and Mac results, providing a new community reference for evaluating voice synthesis quality.

    Streaming ASR Systems Face Technical Benchmark Study

    A new comparative study from Smallest.ai benchmarks streaming ASR systems across speed, accuracy, and resilience under real-world conditions to help developers navigate trade-offs.

    AI Used To Resurrect Voices Of Dead Pilots

    Voice cloning technology is being applied to reconstruct the voices of deceased pilots, raising both practical utility and ethical questions around consent and memorial use.

    Audio App Huxe Shuts Down Despite AI Tailwinds

    Huxe, an audio generation app founded by former NotebookLM developers, has shut down, illustrating the difficulty single-modality consumer audio apps face as broader AI platforms rapidly absorb similar features.
