sotto
Transcribes speech locally on macOS using Whisper and MLX, then cleans up the text with a local LLM before pasting it into applications.
About sotto
Sotto is a lightweight, fully-local dictation tool designed specifically for Apple Silicon macOS users. It bridges the gap between raw speech-to-text and polished writing by combining Whisper-based transcription with local LLM-driven text cleanup.
1. For the Non-Technical Reader
Imagine a "smart" version of your keyboard that listens when you hold a specific key. Unlike standard dictation, which often captures every "um" and "uh," Sotto acts like a professional editor. It listens to your voice, removes the verbal clutter, fixes the punctuation, and types the finished thought directly into whatever app you are using, whether that is Slack, an email, or a Word doc. Because it runs entirely on your Mac's hardware, your voice recordings never travel to the cloud, which keeps sensitive work private.
2. For the Technical Reader
Sotto leverages the MLX framework for high-performance inference on Apple Silicon. The architecture follows a two-stage pipeline: Whisper for Automatic Speech Recognition (ASR) with English/Japanese auto-detection, followed by a small local LLM for post-processing (cleaning up filler words and formatting). Key technical specs include:
- Memory: Approximately 4.5 GB resident RAM while running.
- Latency: ~2–3 seconds for a 10-second utterance after initial warmup.
- Privacy: Zero-persistence architecture; audio and transcripts reside only in RAM and never hit the disk or the network.
- Integration: Uses global event taps for hotkey monitoring (Right Option/Command) and accessibility APIs for simulated clipboard injection (Command+V).
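The two-stage flow described above (transcribe, then clean up) can be illustrated with a minimal Python sketch. This is not Sotto's actual code: `DictationPipeline`, `fake_transcribe`, and `rule_based_cleanup` are hypothetical names, the transcriber is a stub standing in for Whisper on MLX, and the cleanup stage is a toy regex filter standing in for the local LLM. It only shows how the two stages compose.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: in a real setup these would wrap
# a Whisper model and a small LLM running locally.
Transcriber = Callable[[bytes], str]
Cleaner = Callable[[str], str]

# Toy filler-word pattern (assumption: the real cleanup is done by an LLM).
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)


def rule_based_cleanup(text: str) -> str:
    """Stand-in for the LLM stage: drop fillers, tidy spacing and punctuation."""
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


@dataclass
class DictationPipeline:
    transcribe: Transcriber  # stage 1: audio -> raw text
    cleanup: Cleaner         # stage 2: raw text -> polished text

    def run(self, audio: bytes) -> str:
        # Audio and text stay in local variables only; nothing is written to disk.
        raw = self.transcribe(audio)
        return self.cleanup(raw)


def fake_transcribe(audio: bytes) -> str:
    """Stub ASR stage that ignores the audio and returns a fixed transcript."""
    return "um so the meeting is uh moved to friday"


pipeline = DictationPipeline(transcribe=fake_transcribe, cleanup=rule_based_cleanup)
result = pipeline.run(b"\x00" * 16)
print(result)
```

Keeping each stage behind a plain callable means either one could be swapped (a different Whisper size, a different cleanup model) without touching the rest of the flow.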
3. Why It Matters
As privacy regulations tighten, the demand for Edge AI solutions grows. Sotto demonstrates that high-quality, LLM-enhanced voice interfaces no longer require expensive cloud APIs or high-latency network calls. It provides a blueprint for "invisible" AI tools that respect user privacy while maintaining the performance levels expected of native desktop applications, effectively bypassing the "privacy tax" often associated with modern AI features.
4. The Voice AI Space Lab Idea
You could build a "Privacy-First Medical Scribe" for doctors. Because Sotto processes everything locally, a practitioner could dictate patient notes during a consultation without audio or transcripts leaving the device, which would ease HIPAA and data residency concerns. The local LLM could be further fine-tuned to recognize medical terminology and format spoken observations into structured SOAP notes that are pasted directly into a secure Electronic Health Record (EHR) system.
Explore the repository here: https://github.com/tomoima525/sotto