MLX-qwen3TTS-with-frontend-for-voicecloning
Full-stack application for local, Apple Silicon-based text-to-speech and zero-shot voice cloning using the Qwen3-TTS model.
About MLX-qwen3TTS-with-frontend-for-voicecloning
This repository provides a full-stack application for text-to-speech (TTS) generation and voice cloning that runs the Qwen3-TTS model on Apple Silicon through Apple's MLX framework.
For the Non-Technical Reader
Imagine having a digital double of your voice. This tool creates one from a 10-second recording of you speaking. You can then type any text and hear it spoken in your cloned voice, so you can produce audio content without recording every line yourself. Possible uses include personalized audiobooks, custom voice prompts for smart home devices, or a synthetic voice for people who have lost their own.
For the Technical Reader
The application pairs a React + TypeScript + Tailwind CSS frontend with a FastAPI backend that serves the MLX-optimized Qwen3-TTS model. It uses the "Base" variant of the model, which supports zero-shot voice cloning. The browser records audio as .webm, and the backend uses ffmpeg to convert it to 16-bit PCM at 24 kHz for compatibility with the MLX audio encoders. On the first run, the Qwen3-TTS model (~1.7 GB) is downloaded from Hugging Face; after that, the model runs locally on Apple Silicon using the MLX framework.
Requirements: macOS on Apple Silicon (M1/M2/M3/M4), Python 3.10+ (3.13 recommended), and Node.js with npm.
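The .webm-to-PCM conversion step described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the repository's actual code: the helper names (build_ffmpeg_command, convert_webm_to_wav, is_expected_format) are hypothetical, and it assumes the backend shells out to the ffmpeg command-line tool and writes a WAV container. It uses ffmpeg's -ar option to resample to 24 kHz and the pcm_s16le codec for signed 16-bit little-endian PCM, then checks the result with Python's wave module.

```python
import subprocess
import wave
from pathlib import Path

# Hypothetical helper names; the repository's real code may differ.
# Assumption: the backend invokes the ffmpeg CLI and writes a WAV container.
TARGET_SAMPLE_RATE = 24_000  # 24 kHz, as stated in the README
TARGET_SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit PCM


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg argument list that converts a browser .webm
    recording into a 16-bit PCM WAV file sampled at 24 kHz."""
    return [
        "ffmpeg",
        "-y",                            # overwrite dst if it already exists
        "-i", str(src),                  # input: the .webm uploaded by the browser
        "-ar", str(TARGET_SAMPLE_RATE),  # resample audio to 24 kHz
        "-c:a", "pcm_s16le",             # encode as signed 16-bit little-endian PCM
        str(dst),
    ]


def convert_webm_to_wav(src: Path, dst: Path) -> None:
    """Run ffmpeg (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)


def is_expected_format(wav_path: Path) -> bool:
    """Return True if the WAV file is 16-bit PCM sampled at 24 kHz."""
    with wave.open(str(wav_path), "rb") as wf:
        return (
            wf.getframerate() == TARGET_SAMPLE_RATE
            and wf.getsampwidth() == TARGET_SAMPLE_WIDTH
        )
```

For example, an upload handler could save the incoming .webm to a temporary file, call convert_webm_to_wav, and pass the resulting WAV path on to the model. Keeping the command builder separate from the subprocess call makes the conversion logic testable without ffmpeg installed.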
Why It Matters
This project brings voice cloning to personal hardware using open-source components. Because it runs locally, voice recordings need not be sent to external servers, which improves privacy. MLX is built for Apple Silicon, so running the model on a machine you already own can reduce costs compared with cloud-based TTS services. And because the project is built on open-source tools, others can contribute to it and customize it.
The Collaborative CTA
How could the voice cloning process be made more robust to variations in recording quality and background noise? And beyond content creation, what other applications can you envision for personalized voice models?
GitHub Repository: MLX-qwen3TTS-with-frontend-for-voicecloning
#VoiceAI #TTS