
    ComfyUI-Qwen-TTS

    Git Repo
    flybirdxx

    ComfyUI custom nodes for speech synthesis, voice cloning, and voice design using Alibaba's Qwen3-TTS model.

    About ComfyUI-Qwen-TTS

    This repository provides ComfyUI custom nodes for speech synthesis, voice cloning, and voice design, based on the open-source Qwen3-TTS project.

    For the Non-Technical Reader:

    Imagine you want a computer to speak with a specific voice. This tool offers three ways to do that: design a voice from scratch by describing it (for example, "a gentle female voice with a high pitch"), clone a voice from a short audio clip (5-15 seconds), or simply convert text to speech using high-quality voices. Think of it as having a voice actor in your computer, ready to perform any script you give it and even mimic other actors' voices.

    For the Technical Reader:

    The repository offers ComfyUI nodes leveraging Qwen3-TTS for speech synthesis. It supports both 12Hz and 25Hz speech tokenizer architectures. Key features include:

    • Voice Cloning: Zero-shot voice cloning from short reference audio.
    • Voice Design: Custom voice creation based on natural language descriptions.
    • Attention Mechanism Selection: Options include sageattn, flashattn, sdpa, and eager, with auto-detection and graceful fallback.
    • Memory Management: Optional model unloading after generation to free GPU memory.
    • Multilingual Support: Native support for 10 languages.
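
    The "auto-detection and graceful fallback" behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the preference order, the module names in REQUIRED_MODULE, and the function names are all assumptions made for the example.

```python
import importlib.util

# Preference order tried for "auto"; the ordering is an assumption for illustration.
PREFERENCE = ["sageattn", "flashattn", "sdpa", "eager"]

# Hypothetical mapping from backend name to the Python module it would need.
REQUIRED_MODULE = {
    "sageattn": "sageattention",
    "flashattn": "flash_attn",
    "sdpa": "torch",
    "eager": None,  # plain implementation, treated as always available
}


def is_available(backend):
    """Return True if the module assumed to back this option can be imported."""
    module = REQUIRED_MODULE[backend]
    if module is None:
        return True
    return importlib.util.find_spec(module) is not None


def pick_attention_backend(requested="auto", probe=is_available):
    """Pick the requested backend, or the next usable one further down the list."""
    if requested == "auto":
        candidates = PREFERENCE
    elif requested in PREFERENCE:
        # Start at the requested option and fall back to the ones after it.
        candidates = PREFERENCE[PREFERENCE.index(requested):]
    else:
        raise ValueError(f"unknown attention backend: {requested!r}")
    for backend in candidates:
        if probe(backend):
            return backend
    return "eager"  # last resort if the probe rejects everything


if __name__ == "__main__":
    print(pick_attention_backend("auto"))
```

    Passing the availability check in as `probe` keeps the selection logic separate from the environment, so the fallback order can be exercised without any GPU libraries installed.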

    The tool also supports loading custom fine-tuned models and speakers. It includes nodes such as VoiceClonePromptNode and DialogueInferenceNode. Generation parameters such as top_p, top_k, temperature, and repetition penalty are adjustable in all TTS nodes.
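
    For readers unfamiliar with these generation parameters, the sketch below shows what they conventionally do to a model's next-token scores. It is a generic, pure-Python illustration of common sampling controls, not code taken from the nodes or from Qwen3-TTS; the function name and argument defaults are assumptions.

```python
import math


def next_token_probs(logits, previous_tokens=(), temperature=1.0,
                     top_k=0, top_p=1.0, repetition_penalty=1.0):
    """Return {token_id: probability} after applying common sampling controls."""
    scores = list(logits)

    # Repetition penalty: push down scores of tokens that were already generated.
    for t in set(previous_tokens):
        if scores[t] > 0:
            scores[t] /= repetition_penalty
        else:
            scores[t] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scores = [s / temperature for s in scores]

    # Softmax (shifted by the max score for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalized cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalize so the surviving tokens sum to 1.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


if __name__ == "__main__":
    print(next_token_probs([2.0, 1.0, 0.0, -1.0], top_k=2))
```

    A sampler would then draw the next token from the returned distribution; lower temperature and tighter top_k or top_p values generally make output more stable, while looser values add variety.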

    Why It Matters:

    This project democratizes access to advanced voice AI. By providing an open-source implementation of Qwen3-TTS within ComfyUI, it lowers the barrier to entry for researchers, developers, and hobbyists. The ability to run this locally also offers privacy advantages compared to cloud-based solutions.

    The "Voice AI Space Lab" Idea:

    Imagine building a "talking book" application where users can select from a library of cloned voices (e.g., famous actors, family members) to read aloud their favorite ebooks. Or, create a multi-character interactive story where each character's voice is uniquely designed using the voice design feature.

    The Collaborative CTA:

    How can we ensure that voice cloning technologies are used ethically and responsibly, preventing misuse while still fostering innovation and creativity in the Voice AI space?

    GitHub Repository: ComfyUI-Qwen-TTS
