ComfyUI-Qwen-TTS
ComfyUI custom nodes for speech synthesis, voice cloning, and voice design using Alibaba's Qwen3-TTS model.
About ComfyUI-Qwen-TTS
The repository packages these capabilities as ComfyUI custom nodes built on the open-source Qwen3-TTS project.
For the Non-Technical Reader:
Imagine you want a computer to speak with a specific voice. This tool gives you three ways to do that: design a voice from scratch by describing it (for example, "a gentle female voice with a high pitch"), clone a voice from a short audio clip (5-15 seconds), or simply convert text to speech using high-quality voices. Think of it as having a voice actor in your computer, ready to perform any script you give it and even to mimic other actors' voices.
For the Technical Reader:
The repository offers ComfyUI nodes leveraging Qwen3-TTS for speech synthesis. It supports both 12Hz and 25Hz speech tokenizer architectures. Key features include:
- Voice Cloning: Zero-shot voice cloning from short reference audio.
- Voice Design: Custom voice creation based on natural language descriptions.
- Attention Mechanism Selection: Options include sageattn, flashattn, sdpa, and eager, with auto-detection and graceful fallback.
- Memory Management: Optional model unloading after generation to free GPU memory.
- Multilingual Support: Native support for 10 languages.
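The "auto-detection and graceful fallback" behaviour in the attention bullet above can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the function names, the preference order, and the Python module names used to probe each backend (`sageattention`, `flash_attn`, `torch`) are all assumptions made for the example.

```python
import importlib.util

# Hypothetical mapping from option name to the module that would provide it.
# The module names are assumptions for illustration only.
BACKEND_MODULES = {
    "sageattn": "sageattention",
    "flashattn": "flash_attn",
    "sdpa": "torch",
    "eager": None,  # plain attention, treated as always available
}

# Assumed preference order: same order as the options are listed in the post.
PREFERENCE = ["sageattn", "flashattn", "sdpa", "eager"]


def is_available(backend):
    """Return True if the module backing this option can be imported."""
    module = BACKEND_MODULES[backend]
    if module is None:
        return True
    return importlib.util.find_spec(module) is not None


def resolve_attention(requested="auto", available=is_available):
    """Pick an attention backend, falling back when the request cannot be met."""
    if requested != "auto":
        if requested not in BACKEND_MODULES:
            raise ValueError(f"unknown attention option: {requested!r}")
        if available(requested):
            return requested
        # The requested backend is missing: fall through to auto-detection.
    for backend in PREFERENCE:
        if available(backend):
            return backend
    return "eager"


if __name__ == "__main__":
    print(resolve_attention("auto"))
```

Passing the availability check in as a parameter keeps the selection logic testable on a machine without a GPU or any of the optional packages installed.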
The tool also supports loading custom fine-tuned models and speakers. It includes nodes such as VoiceClonePromptNode and DialogueInferenceNode. Generation parameters such as top_p, top_k, temperature, and repetition penalty are adjustable in all TTS nodes.
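To make the generation parameters less abstract, here is a small pure-Python sketch of how temperature, repetition penalty, top-k, and top-p typically reshape a next-token distribution. It follows one common formulation of these sampling controls and is not taken from the repository or from Qwen3-TTS; the function name `filter_logits` and its signature are hypothetical.

```python
import math


def filter_logits(logits, generated, temperature=1.0, top_k=0,
                  top_p=1.0, repetition_penalty=1.0):
    """Return {token_id: probability} for the tokens that survive filtering."""
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(generated):
        s = scores[tok]
        scores[tok] = s / repetition_penalty if s > 0 else s * repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]

    # Numerically stable softmax.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens (0 disables the filter).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]
    mass = sum(probs[i] for i in order)
    kept = {i: probs[i] / mass for i in order}

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass >= top_p.
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += kept[i]
        if cumulative >= top_p:
            break
    mass = sum(kept[i] for i in nucleus)
    return {i: kept[i] / mass for i in nucleus}


if __name__ == "__main__":
    example = [2.0, 1.0, 0.0, -1.0]
    print(filter_logits(example, generated=[0], top_k=3, top_p=0.9,
                        temperature=0.8, repetition_penalty=1.2))
```

In practice, lower temperature and smaller top_k or top_p tend to give more stable, repeatable speech, while higher values allow more variation at the cost of occasional artifacts.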
Why It Matters:
This project democratizes access to advanced voice AI. By providing an open-source implementation of Qwen3-TTS within ComfyUI, it lowers the barrier to entry for researchers, developers, and hobbyists. The ability to run this locally also offers privacy advantages compared to cloud-based solutions.
The "Voice AI Space Lab" Idea:
Imagine building a "talking book" application where users can select from a library of cloned voices (e.g., famous actors, family members) to read their favorite ebooks aloud. Another option is a multi-character interactive story in which each character's voice is created with the voice design feature.
The Collaborative CTA:
How can we ensure that voice cloning technologies are used ethically and responsibly, preventing misuse while still fostering innovation and creativity in the Voice AI space?
GitHub Repository: ComfyUI-Qwen-TTS
#VoiceAI #TTS