    LuxTTS

    Git Repo
    ysharma3501

    LuxTTS is a rapid TTS voice cloning model that achieves high-quality, realistic speech generation at speeds exceeding 150x realtime.

    About LuxTTS

    This repository introduces LuxTTS, a lightweight text-to-speech (TTS) model designed for high-quality voice cloning and realistic speech generation.

    For the Non-Technical Reader

    Imagine you have a digital voice double that can speak in your unique tone and style. LuxTTS is a fast voice cloning tool that creates a realistic copy of a voice from just a short audio sample. It generates speech about 150 times faster than real time, so a minute of audio takes well under a second to produce. This means you can quickly create personalized voice messages, audiobooks, or even virtual assistants that sound just like you or anyone else you choose.

    For the Technical Reader

    LuxTTS is a distilled version of the ZipVoice architecture, optimized for speed and efficiency. Key features include:

    • Voice Cloning: Achieves state-of-the-art voice cloning quality, comparable to that of larger models.

    • High Clarity: Generates speech at a 48kHz sampling rate.

    • Speed: Reaches speeds of 150x realtime on a single GPU.

    • Efficiency: Fits within 1GB of VRAM.
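To make the speed and sampling-rate figures above concrete, here is a small arithmetic sketch. It does not call LuxTTS; the function names are hypothetical helpers introduced only for illustration, and "150x realtime" is read here as 150 seconds of audio produced per second of compute.

```python
def realtime_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute (hypothetical helper)."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


def generation_time(audio_seconds: float, rtf: float) -> float:
    """Compute time needed for a clip at a given realtime factor."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


def sample_count(audio_seconds: float, sample_rate_hz: int = 48_000) -> int:
    """Number of audio samples in a clip at the given sampling rate."""
    return round(audio_seconds * sample_rate_hz)


if __name__ == "__main__":
    # One minute of speech at the quoted 150x realtime figure.
    print(generation_time(60.0, 150.0))
    # Samples in a 10-second clip at 48 kHz.
    print(sample_count(10.0))
```

To measure the factor on your own hardware, time a synthesis call and pass the clip length and elapsed time to `realtime_factor`. The result depends on the GPU and on the length of the text.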

    The model uses a custom 48kHz vocoder and an improved sampling technique. It supports Apple's MPS (Metal Performance Shaders) backend and is currently implemented in float32, with float16 support planned for further speed improvements. The code and model are licensed under the Apache-2.0 license.
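The float32 versus float16 point can be illustrated with a rough estimate of weight memory. This is a minimal sketch: the parameter count below is a made-up placeholder, not the real size of LuxTTS, and the estimate covers weights only, not activations or vocoder buffers.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_mib(n_params: int, dtype: str) -> float:
    """Approximate memory for model weights alone, in MiB."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unsupported dtype: {dtype}")
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


if __name__ == "__main__":
    # HYPOTHETICAL parameter count, chosen only to show the arithmetic.
    hypothetical_params = 100_000_000
    for dtype in ("float32", "float16"):
        mib = weight_memory_mib(hypothetical_params, dtype)
        print(f"{dtype}: {mib:.1f} MiB")
```

Halving the bytes per parameter halves weight storage. Whether it also speeds up generation depends on the hardware and kernels, which the source describes only as a planned improvement.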

    Why It Matters

    LuxTTS's efficiency and open-source nature democratize access to high-quality voice cloning technology. Its small memory footprint allows it to run on readily available hardware, reducing costs and enabling broader adoption. The Apache-2.0 license promotes collaboration and innovation within the TTS community.

    The "Voice AI Space Lab" Idea

    Imagine building a "Voice Mirror" – a fun application where users can speak into their phone, and the app instantly responds in the voice of a famous historical figure, using LuxTTS for real-time voice cloning and text-to-speech conversion.

    The Collaborative CTA

    How can we leverage LuxTTS's speed and efficiency to create real-time, interactive voice experiences that were previously impossible? What are the ethical considerations of rapid voice cloning, and how can we ensure responsible use? Share your thoughts and ideas!

    GitHub Repository

    Hugging Face Demo

    Colab Demo