ComfyUI-Qwen-TTS

This repository provides ComfyUI custom nodes for speech synthesis, voice cloning, and voice design, based on the open-source Qwen3-TTS project.

For the Non-Technical Reader:

Imagine you want a computer to speak in a specific voice. This tool gives you three ways to do that: design a voice from scratch by describing it (for example, "a gentle female voice with a high pitch"), clone a voice from a short audio clip (5-15 seconds), or simply convert text to speech using high-quality voices. Think of it as having a voice actor in your computer, ready to perform any script you give it and even mimic other actors' voices!

For the Technical Reader:

The repository offers ComfyUI nodes leveraging Qwen3-TTS for speech synthesis. It supports both 12Hz and 25Hz speech tokenizer architectures. Key features include:

Voice Cloning: Zero-shot voice cloning from short reference audio.
Voice Design: Custom voice creation based on natural language descriptions.
Attention Mechanism Selection: Options include sageattn, flashattn, sdpa, and eager, with auto-detection and graceful fallback.
Memory Management: Optional model unloading after generation to free GPU memory.
Multilingual Support: Native support for 10 languages.
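
To make "auto-detection and graceful fallback" concrete, here is a minimal sketch of how such a selection could work. It is not the repository's code: the preference order, the mapping from option name to Python module, and the function names are all assumptions made for illustration.

```python
import importlib.util

# Preference order tried for "auto". The ordering is an assumption.
PREFERENCE = ["sageattn", "flashattn", "sdpa", "eager"]

# Hypothetical mapping from option name to the module it would need.
REQUIRED_MODULE = {
    "sageattn": "sageattention",
    "flashattn": "flash_attn",
    "sdpa": "torch",
    "eager": None,  # plain implementation, no extra dependency
}


def is_available(name):
    """Return True if the module behind this option can be imported."""
    module = REQUIRED_MODULE[name]
    if module is None:
        return True
    return importlib.util.find_spec(module) is not None


def pick_attention(requested="auto", available=is_available):
    """Pick the requested backend, falling back to later, safer options."""
    if requested == "auto":
        candidates = PREFERENCE
    elif requested in REQUIRED_MODULE:
        # Try the requested option first, then everything after it.
        candidates = PREFERENCE[PREFERENCE.index(requested):]
    else:
        raise ValueError(f"unknown attention option: {requested}")
    for name in candidates:
        if available(name):
            return name
    return "eager"  # last resort if the availability check rejects everything


if __name__ == "__main__":
    print(pick_attention("auto"))
```

The `available` argument exists only so the fallback logic can be exercised without installing any of the optional packages.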

The tool also supports loading custom fine-tuned models and speakers. It includes nodes such as VoiceClonePromptNode and DialogueInferenceNode. Generation parameters such as top_p, top_k, temperature, and repetition penalty are adjustable in all TTS nodes.
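
For readers unfamiliar with these generation parameters, the sketch below shows what they usually mean when a model picks its next token. It is a generic, pure-Python illustration of common sampling practice, not the code the nodes run. The function name `sample_token` and its signature are hypothetical.

```python
import math
import random


def sample_token(logits, history, temperature=1.0, top_k=0, top_p=1.0,
                 repetition_penalty=1.0, rng=random):
    """Sample one token index from raw scores (logits)."""
    scores = list(logits)

    # Repetition penalty: make already-generated tokens less likely.
    for tok in set(history):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scores = [s / temperature for s in scores]

    # Softmax turns scores into probabilities.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    kept_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / kept_total
        if cumulative >= top_p:
            break

    # Draw from the surviving tokens in proportion to their probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


if __name__ == "__main__":
    print(sample_token([1.0, 5.0, 2.0], history=[], temperature=0.8,
                       top_k=2, top_p=0.9))
```

In this sketch, lower temperature, smaller top_k, and smaller top_p all make output more predictable, while a repetition penalty above 1 discourages the model from repeating tokens it has already produced.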

Why It Matters:

This project makes advanced voice AI more accessible. By providing an open-source implementation of Qwen3-TTS within ComfyUI, it lowers the barrier to entry for researchers, developers, and hobbyists. Because it can run locally, it also offers privacy advantages over cloud-based solutions.

The "Voice AI Space Lab" Idea:

Imagine building a "talking book" application where users can select from a library of cloned voices (e.g., famous actors, family members) to read their favorite ebooks aloud. Or create a multi-character interactive story in which each character gets its own voice, built with the voice design feature.

The Collaborative CTA:

How can we ensure that voice cloning technologies are used ethically and responsibly, preventing misuse while still fostering innovation and creativity in the Voice AI space?

GitHub Repository: ComfyUI-Qwen-TTS

#VoiceAI #TTS
