Voice AI News: May 18–24, 2026
Key Voice AI updates: Sierra's $1B raise at $16B val, Hark's $700M, Google Gemini agents, OpenAI voice acquisition, ElevenLabs in Spotify audiobooks, new TTS mo
Sierra Raises Nearly $1 Billion At $16B Valuation
Enterprise conversational AI platform Sierra closed a roughly $1 billion funding round led by Tiger and Google's GV, valuing the company at close to $16 billion as it expands its agent platform domestically and internationally.
Hark Secures $700M Series A For Universal AI Interface
Brett Adcock's secretive agentic AI startup Hark raised $700 million and plans to release its first multi-modal models this summer to power a personal AI platform.
Vapi Raises Series B To Challenge Voice AI Rivals
Vapi completed a Series B funding round, positioning it to compete directly with Retell and ElevenLabs in the enterprise voice agent market.
Weekly update on Voice and Video AI!
As usual, there are many model and platform upgrades that you can check out in detail in the newsletter, but here are my top highlights:
@Vapi_AI raised its Series B to keep up the fight with Retell and ElevenLabs on the Voice AI…
Gustavo Garcia (@anarchyco), May 18, 2026
OpenAI Acquires Voice Cloning Team Behind Weights.gg
OpenAI reportedly acquired the voice cloning team behind Weights.gg, signaling a continued push to deepen its proprietary voice capabilities.
Sarvam AI Eyes $300M Round At $1.5B Valuation
Indian enterprise AI startup Sarvam AI, which specializes in voice-based solutions, is reportedly on track for a $300 million funding round at a $1.5 billion valuation.
OpenEvidence Launches First Native Speech-To-Speech Medical AI
OpenEvidence's Voice Mode is now live as the first native speech-to-speech clinical decision support interface, available free to users alongside an enterprise expansion with Cedars-Sinai.
Spotify Integrates ElevenLabs TTS For Audiobook Creation
Spotify launched an ElevenLabs-powered audiobook creation tool within its Spotify for Authors platform, expanding support to 10 additional languages with more expressive, human-like voice models.
Google Beam Debuts Conversational AI Video Agent
Google demonstrated Sophie, a lifelike conversational AI agent for its Beam video calling platform designed to make AI interactions feel more personable and responsive.
Google Gemini Omni Adds Native Audio And Avatar Voice
Gemini Omni is a natively multimodal model trained on text, code, audio, images, and video that also lets users create videos with personal digital avatars with their own voice and likeness.
Google Launches Gemini Spark Personal Voice Agent
Gemini Spark is a $100/month personal AI agent capable of managing Gmail, workflows, and complex tasks continuously without requiring a device to remain open.
ElevenLabs Brings Voice AI Into University Classrooms
ElevenLabs launched free voice AI access for professors and an interactive Einstein voice agent for immersive learning through a partnership with CMG Worldwide and the Hebrew University of Jerusalem.
Amazon Alexa Plus Debuts AI-Generated Podcast Feature
Alexa Plus now offers Alexa Podcasts, in which a pair of AI-generated hosts with synthesized voices break down any topic of a user's choice.
Speechify SIMBA 3.0 Enters Global TTS Top 10
Speechify's SIMBA 3.0 model broke into the global top 10 on the Artificial Analysis TTS leaderboard, outranking models from Google, OpenAI, and ElevenLabs.
StepAudio 2.5 Unifies ASR, TTS, And Real-Time Voice
StepAudio 2.5 is a new unified audio-language foundation model that matches or exceeds specialized systems in ASR, TTS, and real-time spoken interaction through task-tailored RLHF.
Apple Upgrades VoiceOver And Real-Time Captions With AI
Apple announced iOS 27 accessibility updates powered by Apple Intelligence, including improved VoiceOver image recognition, on-device speech recognition for auto-generated captions, and enhanced voice control.
Mega-ASR Advances Adverse-Condition Speech Recognition
Researchers introduced Mega-ASR, which achieves a 45.69% word error rate on the VOiCES R4-B-F benchmark versus 54.01% for prior state-of-the-art systems by scaling real-world acoustic simulation.
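To put the two figures in perspective, here is a minimal sketch of the arithmetic behind the gap, using only the two word error rates quoted above. The helper name `wer_reduction` is hypothetical and is not taken from the Mega-ASR work.

```python
def wer_reduction(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in percent)."""
    absolute = baseline_wer - new_wer
    relative = absolute / baseline_wer * 100.0
    return absolute, relative


# WER figures quoted in the summary above (VOiCES R4-B-F): prior best vs. Mega-ASR.
absolute, relative = wer_reduction(54.01, 45.69)
print(f"{absolute:.2f} points absolute, {relative:.1f}% relative")
# prints "8.32 points absolute, 15.4% relative"
```

In other words, the reported result is a drop of about 8.3 percentage points, or roughly 15% fewer word errors relative to the prior system, on a benchmark where nearly half of all words are still misrecognized.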
Māori Data Sovereignty Drives Indigenous TTS Model
Researchers developed indigenous-owned Māori text-to-speech AI voice models using fewer than eight hours of recordings, achieving a 6.78% word error rate while centering data sovereignty principles.
Comprehensive 37-Model TTS Benchmark Published
A developer released the first known comprehensive TTS benchmark covering 37 models with cross-platform Windows and Mac results, providing a new community reference for evaluating voice synthesis quality.
Streaming ASR Systems Face Technical Benchmark Study
A new comparative study from Smallest.ai benchmarks streaming ASR systems across speed, accuracy, and resilience under real-world conditions to help developers navigate trade-offs.
AI Used To Resurrect Voices Of Dead Pilots
Voice cloning technology is being applied to reconstruct the voices of deceased pilots, raising both practical utility and ethical questions around consent and memorial use.
Audio App Huxe Shuts Down Despite AI Tailwinds
Huxe, an audio generation app founded by former NotebookLM developers, has shut down, illustrating the difficulty single-modality consumer audio apps face as broader AI platforms rapidly absorb similar features.