    Git Repo
    pulp-vision

    Suppresses background speakers and noise in real time using a speech enhancement model designed for Voice AI and conversational systems.

    About Hush

    Hush is an open-source speech enhancement model built specifically for Voice AI, featuring real-time background speaker suppression. It addresses the "cocktail party problem" for AI agents, ensuring they can isolate a single voice even in chaotic environments.

    For the Non-Technical Reader

    Think of Hush as a high-definition "audio filter" for phone-based AI. While traditional noise cancellation can remove the sound of a fan or a car engine, it often gets confused when another person is talking in the background. Hush is designed to recognize the primary caller and treat background voices like static noise, effectively "muting" the rest of the room. This means voice assistants can finally work reliably in crowded restaurants, busy kitchens, or noisy streets.

    For the Technical Reader

    Hush is a fully causal model designed for production-grade Voice AI pipelines. Key specifications include:

    • Model Size: 8 MB, optimized for edge or server-side CPU deployment.

    • Latency: Under 1 ms processing per 10 ms of audio with ~20 ms algorithmic latency (zero lookahead).

    • Training data: 10,000+ hours of speech and noise, specifically targeting the "competing speaker" problem that generic models often ignore.

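    • Compatibility: 16 kHz native sample rate, intended as a drop-in component for G.711, WebRTC, and SIP telephony systems (G.711 audio is sampled at 8 kHz, so it would need resampling to 16 kHz unless the integration already handles that).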
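
    The frame and sample-rate figures above reduce to simple arithmetic. The following is a minimal Python sketch, not Hush's actual API: the helper names (`samples_per_frame`, `split_into_frames`, `upsample_2x_linear`) are hypothetical, and the naive linear upsampler only illustrates that 8 kHz G.711 audio has to reach 16 kHz before a 16 kHz model can consume it. A production pipeline would use a proper resampler.

```python
SAMPLE_RATE_HZ = 16_000  # native sample rate quoted in the spec list
FRAME_MS = 10            # frame duration used in the quoted latency figure


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of PCM samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def split_into_frames(samples, frame_len):
    """Yield consecutive full frames; a trailing partial frame is left out
    so the caller can buffer it until more audio arrives."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]


def upsample_2x_linear(samples):
    """Naive 8 kHz -> 16 kHz upsampling by linear interpolation.
    Illustrative only (assumption): no anti-imaging filter is applied."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)               # original sample
        out.append((s + nxt) // 2)  # midpoint between this sample and the next
    return out


frame_len = samples_per_frame()
frames = list(split_into_frames(list(range(350)), frame_len))
print(frame_len, len(frames))
```

    At 16 kHz a 10 ms frame holds 160 samples, so a streaming caller would buffer incoming audio until 160 samples are available and carry any remainder into the next call.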

    Explore the code on GitHub, try the interactive audio demo, or read the Hugging Face model card.

    Why It Matters

    Most open-source speech enhancement models treat all human speech as signal to preserve, so they fail when background chatter is present. By treating background speech as noise, Hush bridges a critical gap between expensive proprietary solutions and the open-source community. Because it runs on CPU alone, startups building high-volume voice agents can avoid the cost of GPU clusters.
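
    To make the cost argument concrete, here is a rough capacity estimate in Python based only on the "under 1 ms per 10 ms of audio" figure quoted earlier. The function names and the 50% utilization headroom are assumptions for illustration, and real throughput depends on the hardware and on the rest of the pipeline.

```python
import math


def real_time_factor(processing_ms, audio_ms):
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_ms / audio_ms


def max_streams_per_core(processing_ms, audio_ms, utilization=0.5):
    """Estimated number of concurrent streams one core can serve while
    staying at or below the given utilization target (assumed value)."""
    return math.floor(audio_ms * utilization / processing_ms)


# Worst case from the quoted spec: 1 ms of compute per 10 ms frame.
rtf = real_time_factor(1.0, 10.0)
streams = max_streams_per_core(1.0, 10.0)            # assumed 50% headroom
streams_full = max_streams_per_core(1.0, 10.0, 1.0)  # theoretical ceiling, no headroom
print(f"RTF <= {rtf}, about {streams} streams per core (ceiling {streams_full})")
```

    Because the quoted figure is an upper bound on processing time, these counts are conservative estimates under the stated assumptions.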

    The "Voice AI Space Lab" Idea

    Imagine building a "Focus-First" Drive-Thru Agent. Using Hush, you could deploy an AI ordering system that filters out the music playing in the customer's car, the kids shouting in the back seat, and the traffic outside, so the order is far more likely to be captured correctly the first time.