
    DramaBox

    Git Repo
    resemble-ai

    Implements expressive prompt-driven text-to-speech and voice cloning using an IC-LoRA fine-tune of the LTX-2.3 audio model.

    About DramaBox

    DramaBox, developed by Resemble AI, is a highly expressive Text-to-Speech (TTS) model that bridges the gap between static voice synthesis and dynamic performance. Built on the LTX-2.3 audio model, it gives granular control over emotion, delivery, and non-verbal cues through simple text prompts.

    For the Non-Technical Reader

    Imagine you are a film director working with a voice actor. Instead of just giving them a script, you can provide stage directions like "she said with a heavy sigh" or "he laughed mid-sentence." DramaBox acts as that director. It doesn't just read words; it understands context and emotion. By providing a short 10-second clip of a voice, you can clone that specific sound and then use text prompts to make that voice whisper, laugh, or pause naturally. It transforms TTS from a robotic tool into a creative partner for storytelling, gaming, and content creation.

    For the Technical Reader

    DramaBox is an IC-LoRA fine-tune of the LTX-2.3 3.3B audio-only model. The architecture is a DiT (Diffusion Transformer), and the LoRA weights are merged into the base model for streamlined inference. Key technical components include:

    • Text Encoder: Uses Gemma-3-12b-it-bnb-4bit for high-level semantic understanding of prompts.

    • Hardware Requirements: Peak VRAM usage is approximately 24 GB, with a generation time of ~2.5 seconds on an H100.

    • Control Mechanism: Prompt-driven conditioning where stage directions (outside quotes) and literal sounds (inside quotes like "[laugh]") guide the DiT's output.

    • Safety: Integrated with Resemble Perth, an imperceptible neural watermark that survives compression and editing.
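
    The prompt convention described under Control Mechanism can be sketched in a few lines of Python. This is a minimal illustration, not the DramaBox API: the names Line and build_prompt are hypothetical, and the exact prompt layout (direction first, then the quoted speech, with bracketed sounds such as [laugh] inside the quotes) is an assumption based on the description above. Check the repository for the format the model actually expects.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One spoken line (hypothetical helper type, not part of DramaBox)."""
    direction: str  # stage direction, written outside the quotes
    speech: str     # spoken words; may contain bracketed sounds like [laugh]


def build_prompt(lines):
    """Join lines into one prompt string: direction outside, speech inside quotes."""
    parts = []
    for line in lines:
        # Inner double quotes would end the spoken span early, so swap them.
        speech = line.speech.strip().replace('"', "'")
        direction = line.direction.strip()
        quoted = f'"{speech}"'
        parts.append(f"{direction} {quoted}" if direction else quoted)
    return " ".join(parts)


if __name__ == "__main__":
    prompt = build_prompt([
        Line("She says with a heavy sigh,", "I told you [laugh] it would work."),
    ])
    print(prompt)
    # She says with a heavy sigh, "I told you [laugh] it would work."
```

    Keeping the direction and the speech in separate fields makes it harder to leak a stage direction into the quoted text, where it would presumably be read aloud rather than performed.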

    Why It Matters

    This release is a notable entry in the open-source versus proprietary debate. By building on the Lightricks LTX-2.3 base, Resemble AI is giving the community high-end expressive capabilities that were previously locked behind expensive APIs. The inclusion of robust watermarking also addresses the growing industry concern about AI safety and voice authenticity, offering a template for responsible open-weights deployment.

    The Voice AI Space Lab Idea

    Why not build an "Interactive NPC Narrator" for tabletop RPGs? Using DramaBox, a Dungeon Master could type out a character's dialogue and include emotional cues like (nervous stutter) or (booming authoritative tone). The model could generate the audio within seconds, allowing for a fully voiced, reactive world where the characters' emotions shift in real time based on the players' decisions.
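
    As a rough starting point for that idea, the sketch below turns a line typed by the Dungeon Master into a prompt with the cues moved outside the quotes. Everything here is an assumption for illustration: npc_line_to_prompt is a hypothetical helper, the parenthesized-cue syntax is the one from the paragraph above, and the resulting sentence shape is a guess at a workable stage direction, not a format documented by DramaBox.

```python
import re

# Matches a parenthesized cue such as "(nervous stutter)"; nesting is not handled.
CUE = re.compile(r"\(([^()]*)\)")


def npc_line_to_prompt(speaker, typed):
    """Move (cues) out of the typed line and into a stage direction."""
    cues = [c.strip() for c in CUE.findall(typed) if c.strip()]
    # Remove the cues from the spoken text and collapse leftover whitespace.
    spoken = " ".join(CUE.sub(" ", typed).split()).replace('"', "'")
    direction = f"{speaker} speaks"
    if cues:
        direction += " with a " + " and a ".join(cues)
    return f'{direction}: "{spoken}"'


if __name__ == "__main__":
    print(npc_line_to_prompt("The innkeeper",
                             "(nervous stutter) I-I saw nothing, I swear."))
    # The innkeeper speaks with a nervous stutter: "I-I saw nothing, I swear."
```

    The returned string would then be passed to whatever inference entry point the repository provides; that call is left out here because its signature is not given in this write-up.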

    Explore the repository: GitHub - DramaBox
    Try the demo: HuggingFace Space