SmartDJ
Implements a declarative audio editing framework using an audio language model and diffusion-based editor for automated sound manipulation.
About SmartDJ
SmartDJ is a framework for declarative audio editing. The user describes the intended result in natural language, and an Audio Language Model (ALM) works out the editing steps, narrowing the gap between complex audio engineering and user intent.
For the Non-Technical Reader
Think of SmartDJ as a sound engineer who lives inside your computer and works from your spoken or written instructions. Instead of spending hours learning complex software with hundreds of knobs and sliders, you tell the system what you want the final result to sound like. Whether you are a content creator who wants to enhance a voiceover or a musician looking to tweak a track, SmartDJ handles the "how" so you can focus on the "what."
For the Technical Reader
SmartDJ is built from two components. The SmartDJ-Planner is an Audio Language Model (ALM) that interprets a natural language instruction and generates an execution plan. The SmartDJ-Editor, a diffusion-based model optimized for precise audio transformations, then carries out that plan. At present the project has released the inference code for the diffusion editor; the ALM planner and the dataset synthesis pipeline are expected soon. The research is slated for ICLR 2026 and applies generative diffusion and language modeling to audio editing.
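The plan-then-edit flow described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the project's API: `EditStep`, `run_declarative_edit`, `toy_planner`, and `toy_editor` are hypothetical names, the plan schema is invented for the example, and both toy functions are simple stand-ins for the learned models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EditStep:
    """One atomic edit in a plan (hypothetical schema, not the project's format)."""
    operation: str  # e.g. "remove", "add", "change_volume"
    target: str     # the sound the step applies to, e.g. "traffic noise"


# A planner maps (audio, instruction) to an ordered list of steps.
Planner = Callable[[List[float], str], List[EditStep]]
# An editor applies a single step and returns the edited audio.
Editor = Callable[[List[float], EditStep], List[float]]


def run_declarative_edit(
    audio: List[float], instruction: str, planner: Planner, editor: Editor
) -> Tuple[List[float], List[EditStep]]:
    """Ask the planner for a plan, then apply each step in order."""
    plan = planner(audio, instruction)
    for step in plan:
        audio = editor(audio, step)
    return audio, plan


def toy_planner(audio: List[float], instruction: str) -> List[EditStep]:
    # Stand-in for the ALM: a fixed keyword rule instead of a learned model.
    if "quiet" in instruction.lower():
        return [EditStep("remove", "traffic noise")]
    return []


def toy_editor(audio: List[float], step: EditStep) -> List[float]:
    # Stand-in for the diffusion editor: "remove" just halves every sample.
    if step.operation == "remove":
        return [sample * 0.5 for sample in audio]
    return list(audio)


edited, plan = run_declarative_edit(
    [1.0, -1.0], "Make the street scene quiet", toy_planner, toy_editor
)
```

The point of the separation is that the caller only supplies the goal. Swapping the toy functions for real models would not change `run_declarative_edit`.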
Why It Matters
This project moves from imperative editing, where the user must perform every action, to declarative editing, where the user defines the goal. By open-sourcing these models, the Penn Waves Lab is lowering the barrier to entry for high-quality audio production. That could offer an alternative to expensive proprietary suites and enable more privacy-conscious, local-first audio workflows.
The Voice AI Space Lab Idea
Imagine building an "Adaptive Ambient Room." Using SmartDJ, you could create a system that listens to the mood of a room via a microphone and automatically edits a live stream of ambient music. It might add rain sounds if the room feels cozy, or increase the tempo and brightness if it detects a party atmosphere, all through natural language prompts generated by an LLM in the background.
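The prompt-generation half of that idea can be sketched in a few lines. This is only an illustration under assumptions: the mood labels, the `MOOD_TO_INSTRUCTION` table, and `instructions_for_moods` are hypothetical, a fixed lookup stands in for the background LLM, and mood detection from the microphone is out of scope here.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical mood labels mapped to editing instructions (a lookup table
# standing in for prompts that an LLM would generate).
MOOD_TO_INSTRUCTION: Dict[str, str] = {
    "cozy": "Add soft rain sounds in the background.",
    "party": "Increase the tempo and make the music brighter.",
}


def instructions_for_moods(moods: Iterable[str]) -> Iterator[str]:
    """Yield an instruction each time the detected mood changes."""
    last: Optional[str] = None
    for mood in moods:
        key = mood.strip().lower()
        if key == last:
            continue  # same mood as before: do not re-edit the stream
        last = key
        instruction = MOOD_TO_INSTRUCTION.get(key)
        if instruction is not None:
            yield instruction  # unknown moods produce no instruction


prompts = list(instructions_for_moods(["cozy", "cozy", "party", "unknown", "cozy"]))
```

Emitting an instruction only when the mood changes avoids re-editing the stream on every detection tick, which would matter if each edit is a full diffusion pass.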
Explore the repository here: https://github.com/penn-waves-lab/SmartDJ