    Speech Research Scientist | Bangalore

    Smallest

    Engineering
    Full-time
    On-site
    Bengaluru

    Posted on 4/5/2026

    Job Description

    Team: Core Speech Research
    Location: Bangalore, India
    Type: Full-time
    Experience: No fixed bar — skill and depth matter more than years

    About Smallest.ai

    Smallest.ai builds real-time voice intelligence systems operating at enterprise scale.
    We work across speech recognition, speech generation, and speech-to-speech systems with a strong focus on low latency, multilingual intelligence, and production reliability.

    Our goal is simple: Smaller models. Lower latency. Higher intelligence.

    Role Overview

    As a Speech Research Scientist, you will work on the core speech stack at Smallest.ai.

    You will research, train, evaluate, and productionize models across:

    • Speech to Text (ASR)

    • Text to Speech (TTS)

    • Speech to Speech (S2S)

    This is not an offline research role.
    You will work at the intersection of research, engineering, and real-world deployment.

    Core Research Areas

    A. Automatic Speech Recognition (ASR)

    • Streaming and non-streaming ASR

    • Multilingual and code-mixed speech

    • Low-latency decoding and inference

    • Long-context speech modeling

    • Robustness to accents, noise, and telephony audio

    B. Text to Speech (TTS)

    • Neural TTS and generative speech models

    • Controllable speech generation including emotion, style, pitch, rate, and prosody

    • Speaker adaptation and voice cloning

    • Stability, expressiveness, and naturalness optimization

    C. Speech to Speech (S2S)

    • End-to-end speech-to-speech models

    • Streaming voice-to-voice architectures

    • Codec-based or token-based speech representations

    • Low-latency conversational speech generation

    D. Multilingual and Speaker Intelligence

    • Multilingual speaker understanding

    • Cross-lingual speaker embeddings

    • Speaker identification and verification

    • Accent and dialect robustness

    • Low-resource language modeling

    E. Multi-Speaker Modeling

    • Multi-speaker diarization

    • Overlapping speech detection and separation

    • Speaker-aware ASR pipelines

    • Joint diarization and recognition modeling

    F. Duplex Conversational Models

    • Full-duplex speech models

      • Simultaneous listening and speaking

      • Interruption handling and barge-in detection

    • Half-duplex conversational models

      • Turn detection

      • Latency-aware response generation

    What You Will Build

    • Novel model architectures and training strategies

    • Large-scale multilingual datasets and pipelines

    • Evaluation frameworks for WER, DER, MOS, latency, and RTF

    • Streaming inference systems for real-time speech

    • Research prototypes converted into production models

    Your work will directly power live customer-facing systems.

    Required Skills

    • Strong background in speech processing or deep learning

    • Deep expertise in at least one of the following:

      • ASR

      • TTS

      • Speech-to-speech systems

    • Strong understanding of modern architectures:

      • Transformers, Conformers, diffusion or flow-based models

    • Experience with CTC, Transducer, and attention-based decoding

    • Strong proficiency in PyTorch

    • Experience training models at scale

    Strong Plus

    • Multilingual speech experience (Indic or European languages)

    • Speaker embeddings and diarization systems

    • Parameter-efficient fine-tuning methods such as LoRA

    • Streaming inference optimization

    • Deployment experience using ONNX, TensorRT, or Triton

    • Publications, open-source contributions, or serious personal research projects

    What We Care About

    • Depth over buzzwords

    • Clean experiments and reproducibility

    • Strong benchmarking discipline

    • Latency, memory, and throughput awareness

    • Research that translates into shipped systems

    We value people who ask:

    “How does this behave at scale?”

    Not just: “Does this work on the dataset?”

    Why Smallest.ai

    • Work on real-world speech systems at scale

    • Direct ownership from research to production

    • Close collaboration with founders and infrastructure teams

    • Fast iteration cycles with minimal bureaucracy

    • Competitive compensation and meaningful ESOPs

    • One of the deepest speech research stacks in India

    How to Apply

    It would be helpful if you could also share:

    • Resume

    • Research papers, GitHub repositories, or technical writing

    • Examples of models you trained or systems you built

    • A short note on which aspect of speech research excites you most

    Email: hetvi@smallest.ai