June 4, 2026

Voice AI News: May 18–24, 2026

Key Voice AI updates: Sierra's $1B raise at $16B val, Hark's $700M, Google Gemini agents, OpenAI voice acquisition, ElevenLabs in Spotify audiobooks, new TTS mo

Sierra Raises Nearly $1 Billion At $16B Valuation

Enterprise conversational AI platform Sierra closed a ~$1 billion funding round led by Tiger and Google's GV, valuing the company at close to $16 billion as it expands its agent platform domestically and internationally.

cnbc.com

Hark Secures $700M Series A For Universal AI Interface

Brett Adcock's secretive agentic AI startup Hark raised $700 million and plans to release its first multi-modal models this summer to power a personal AI platform.

techcrunch.com

Vapi Raises Series B To Challenge Voice AI Rivals

Vapi completed a Series B funding round, positioning it to compete directly with Retell and ElevenLabs in the enterprise voice agent market.

x.com

Weekly update on Voice and Video AI! 🎉

As usual, there are many model and platform upgrades that you can check out in detail in the newsletter, but here are my top highlights:

💰 @Vapi_AI raised its Series B to keep up the fight with Retell and ElevenLabs on the Voice AI…
— Gustavo Garcia (@anarchyco) May 18, 2026

OpenAI Acquires Voice Cloning Team Behind Weights.gg

OpenAI reportedly acquired the voice cloning team behind Weights.gg, signaling a continued push to deepen its proprietary voice capabilities.

x.com

Sarvam AI Eyes $300M Round At $1.5B Valuation

Indian enterprise AI startup Sarvam AI, which specializes in voice-based solutions, is reportedly on track for a $300 million funding round at a $1.5 billion valuation.

techcrunch.com

OpenEvidence Launches First Native Speech-To-Speech Medical AI

OpenEvidence's Voice Mode is now live as the first native speech-to-speech clinical decision support interface, available free to users alongside an enterprise expansion with Cedars-Sinai.

businesswire.com

Spotify Integrates ElevenLabs TTS For Audiobook Creation

Spotify launched an ElevenLabs-powered audiobook creation tool within its Spotify for Authors platform, expanding support to 10 additional languages with more expressive, human-like voice models.

techcrunch.com

Google Beam Debuts Conversational AI Video Agent

Google demonstrated Sophie, a lifelike conversational AI agent for its Beam video calling platform designed to make AI interactions feel more personable and responsive.

theverge.com

Google Gemini Omni Adds Native Audio And Avatar Voice

Gemini Omni is a natively multimodal model trained on text, code, audio, images, and video that also lets users create videos with personal digital avatars with their own voice and likeness.

techcrunch.com

Google Launches Gemini Spark Personal Voice Agent

Gemini Spark is a $100/month personal AI agent capable of managing Gmail, workflows, and complex tasks continuously without requiring a device to remain open.

techcrunch.com

ElevenLabs Brings Voice AI Into University Classrooms

ElevenLabs launched free voice AI access for professors and an interactive Einstein voice agent for immersive learning through a partnership with CMG Worldwide and the Hebrew University of Jerusalem.

elevenlabs.io

Amazon Alexa Plus Debuts AI-Generated Podcast Feature

Alexa Plus now offers Alexa Podcasts, in which a pair of AI-generated hosts with synthesized voices break down any topic of a user's choice.

theverge.com

Speechify SIMBA 3.0 Enters Global TTS Top 10

Speechify's SIMBA 3.0 model broke into the global top 10 on the Artificial Analysis TTS leaderboard, outranking models from Google, OpenAI, and ElevenLabs.

prweb.com

StepAudio 2.5 Unifies ASR, TTS, And Real-Time Voice

StepAudio 2.5 is a new unified audio-language foundation model that matches or exceeds specialized systems in ASR, TTS, and real-time spoken interaction through task-tailored RLHF.

arxiv.org

Apple Upgrades VoiceOver And Real-Time Captions With AI

Apple announced iOS 27 accessibility updates powered by Apple Intelligence, including improved VoiceOver image recognition, on-device speech recognition for auto-generated captions, and enhanced voice control.

techcrunch.com

Mega-ASR Advances Adverse-Condition Speech Recognition

Researchers introduced Mega-ASR, which achieves a 45.69% word error rate on the VOiCES R4-B-F benchmark versus 54.01% for prior state-of-the-art systems by scaling real-world acoustic simulation.

arxiv.org
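
For readers less familiar with the metric, the Mega-ASR figures above are word error rates (WER): word-level edit distance divided by the number of reference words. Going from 54.01% to 45.69% is a drop of 8.32 points, or roughly a 15.4% relative reduction. The sketch below is a minimal illustration of how WER and that relative reduction are typically computed. The function names and sample sentences are hypothetical, and this is not the evaluation code used by the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences: one substitution and one deletion over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Figures reported in the item above (percent WER).
    print(relative_reduction(54.01, 45.69))
```

Note that even the improved system still gets close to half the words wrong on this benchmark, so the result is better read as progress under very adverse acoustic conditions than as production-ready accuracy.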

Māori Data Sovereignty Drives Indigenous TTS Model

Researchers developed indigenous-owned Māori text-to-speech AI voice models using fewer than eight hours of recordings, achieving a 6.78% word error rate while centering data sovereignty principles.

spectrum.ieee.org

Comprehensive 37-Model TTS Benchmark Published

A developer released the first known comprehensive TTS benchmark covering 37 models with cross-platform Windows and Mac results, providing a new community reference for evaluating voice synthesis quality.

reddit.com

Streaming ASR Systems Face Technical Benchmark Study

A new comparative study from Smallest.ai benchmarks streaming ASR systems across speed, accuracy, and resilience under real-world conditions to help developers navigate trade-offs.

smallest.ai

AI Used To Resurrect Voices Of Dead Pilots

Voice cloning technology is being applied to reconstruct the voices of deceased pilots, raising both practical utility and ethical questions around consent and memorial use.

techcrunch.com

Audio App Huxe Shuts Down Despite AI Tailwinds

Huxe, an audio generation app founded by former NotebookLM developers, has shut down, illustrating the difficulty single-modality consumer audio apps face as broader AI platforms rapidly absorb similar features.

techcrunch.com
