    sotto

    Git Repo
    tomoima525

    Transcribes speech locally on macOS using Whisper and MLX, then cleans the text with a local LLM before pasting it into the active application.

    About sotto

    Sotto is a lightweight, fully local dictation tool built for macOS on Apple Silicon. It bridges the gap between raw speech-to-text and polished writing by combining Whisper-based transcription with local LLM-driven text cleanup.

    1. For the Non-Technical Reader

    Imagine a "smart" version of your keyboard that listens when you hold a specific key. Unlike standard dictation that often captures every "um" and "uh," Sotto acts like a professional editor. It listens to your voice, removes the verbal clutter, fixes the punctuation, and types the finished thought directly into whatever app you are using—be it Slack, an email, or a Word doc. Because it runs entirely on your Mac's hardware, your voice recordings never travel to the cloud, ensuring total privacy for sensitive work.

    2. For the Technical Reader

    Sotto leverages the MLX framework for high-performance inference on Apple Silicon. The architecture follows a two-stage pipeline: Whisper for Automatic Speech Recognition (ASR) with English/Japanese auto-detection, followed by a small local LLM for post-processing (cleaning up filler words and formatting). Key technical specs include:

    • Memory: Approximately 4.5 GB resident RAM while running.
    • Latency: ~2–3 seconds for a 10-second utterance after initial warmup.
    • Privacy: Zero-persistence architecture; audio and transcripts reside only in RAM and never hit the disk or the network.
    • Integration: Uses global event taps for hotkey monitoring (Right Option/Command) and accessibility APIs for simulated clipboard injection (Command+V).
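
    The two-stage pipeline described above can be sketched roughly as follows. This is a minimal illustration, not Sotto's actual code: the names `dictate`, `rule_based_cleanup`, and `DictationResult` are hypothetical, the Whisper stage is represented by any callable that turns audio bytes into text, and a regex filter stands in for the local LLM cleanup. Audio and text stay in memory only, mirroring the zero-persistence design. The sketch also reports a real-time factor (processing time divided by audio length); the quoted 2–3 seconds for a 10-second utterance corresponds to a factor of about 0.2 to 0.3.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures for illustration; not Sotto's real API.
Transcriber = Callable[[bytes], str]  # raw audio -> raw transcript (Whisper's role)
Cleaner = Callable[[str], str]        # raw transcript -> polished text (LLM's role)

# A few English filler words; a real LLM stage would handle far more cases.
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)


def rule_based_cleanup(text: str) -> str:
    """Stand-in for the LLM stage: drop fillers, tidy spacing, add basic punctuation."""
    text = FILLERS.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?":
        text += "."
    return text


@dataclass
class DictationResult:
    text: str                # polished text, ready to paste
    seconds: float           # wall-clock time spent in both stages
    real_time_factor: float  # seconds / audio length; below 1.0 is faster than real time


def dictate(
    audio: bytes,
    audio_seconds: float,
    transcribe: Transcriber,
    cleanup: Cleaner = rule_based_cleanup,
    clock: Callable[[], float] = time.perf_counter,
) -> DictationResult:
    """Run ASR, then cleanup, keeping audio and text in memory only."""
    start = clock()
    raw = transcribe(audio)   # stage 1: speech recognition
    polished = cleanup(raw)   # stage 2: post-processing
    elapsed = clock() - start
    return DictationResult(polished, elapsed, elapsed / audio_seconds)


if __name__ == "__main__":
    # A fake transcriber replaces the real model so the sketch runs anywhere.
    fake_asr = lambda audio: "um so the meeting is uh moved to friday"
    result = dictate(b"\x00" * 16, audio_seconds=10.0, transcribe=fake_asr)
    print(result.text)  # So the meeting is moved to friday.
```

    Passing the stages in as callables keeps the sketch independent of any particular model runtime; an MLX-backed Whisper model and a local LLM would slot into `transcribe` and `cleanup` respectively.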

    3. Why It Matters

    As privacy regulations tighten, the demand for Edge AI solutions grows. Sotto demonstrates that high-quality, LLM-enhanced voice interfaces no longer require expensive cloud APIs or high-latency network calls. It provides a blueprint for "invisible" AI tools that respect user privacy while maintaining the performance levels expected of native desktop applications, effectively bypassing the "privacy tax" often associated with modern AI features.

    4. The Voice AI Space Lab Idea

    You could build a "Privacy-First Medical Scribe" for doctors. Because Sotto processes everything locally, a practitioner could dictate patient notes during a consultation without audio or transcripts leaving the device, which eases HIPAA and data residency concerns. The local LLM could be further fine-tuned to recognize medical terminology, formatting spoken observations into structured SOAP (Subjective, Objective, Assessment, Plan) notes that are pasted directly into a secure Electronic Health Record (EHR) system.
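
    As a rough sketch of what "structured SOAP notes" could mean in practice, the snippet below splits a transcript on spoken section cues. Everything here is hypothetical: `SoapNote` and `parse_spoken_soap` are illustrative names, the sample dictation is invented, and keyword splitting is a simple stand-in for the fine-tuned LLM described above (it would misfire if a cue word such as "plan" appeared mid-sentence).

```python
import re
from dataclasses import dataclass

SECTION_NAMES = ("subjective", "objective", "assessment", "plan")

# Matches a spoken section cue, optionally followed by ':' or ','.
SECTION_CUE = re.compile(
    r"\b(" + "|".join(SECTION_NAMES) + r")\b[:,]?\s*", re.IGNORECASE
)


@dataclass
class SoapNote:
    subjective: str = ""
    objective: str = ""
    assessment: str = ""
    plan: str = ""

    def render(self) -> str:
        """Format the note as one labeled line per section."""
        return "\n".join(
            f"{name.capitalize()}: {getattr(self, name)}" for name in SECTION_NAMES
        )


def parse_spoken_soap(transcript: str) -> SoapNote:
    """Naive keyword split; a stand-in for LLM-based structuring."""
    # With a capturing group, split() yields [preamble, cue1, body1, cue2, body2, ...].
    parts = SECTION_CUE.split(transcript)
    note = SoapNote()
    for cue, body in zip(parts[1::2], parts[2::2]):
        setattr(note, cue.lower(), body.strip(" .,"))
    return note


if __name__ == "__main__":
    # Invented sample dictation, for illustration only.
    spoken = (
        "Subjective: patient reports a mild headache. "
        "Objective: blood pressure 120 over 80. "
        "Assessment: tension headache. "
        "Plan: rest and hydration."
    )
    print(parse_spoken_soap(spoken).render())
```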

    Explore the repository here: https://github.com/tomoima525/sotto