
    LLM and Memory Researcher | Bangalore

    Smallest

    Engineering
    Full-time
    On-site
    Bengaluru

    Posted on 4/18/2026

    Job Description

    LLM and Memory Researcher — Bangalore

    Team: Core AI Research
    Location: Bangalore, India
    Type: Full-time
    Experience: No fixed bar — depth and ownership matter more than years

    About Smallest.ai

    Smallest.ai builds real-time intelligence systems that operate under strict latency, cost, and reliability constraints.

    We work on small, fast, controllable language models designed to run in production — not just in demos.

    Our focus areas include:

    • Small Language Models (SLMs)

    • Long- and short-term memory systems

    • Streaming inference

    • Agent architectures that reason, adapt, and improve over time

    We optimize for: Smaller models. Faster tokens. Real memory.

    Role Overview

    As an LLM and Memory Researcher, you will design and train models that can:

    • Think under latency constraints

    • Use memory effectively across time

    • Adapt from interaction history

    • Operate in streaming environments

    • Power real-world agents and workflows

    You will work across model architecture, training, memory systems, and deployment.

    This role sits at the intersection of research, systems, and product intelligence.

    Core Research Areas

    A. Language Model Architecture

    • Small language model design (1B–8B class)

    • Dense and Mixture-of-Experts variants

    • Fast decoding architectures

    • KV-cache optimization and compression

    • Long-context and sliding-window attention

    B. Memory Systems

    • Short-term working memory

    • Long-term persistent memory

    • Retrieval-augmented memory (RAG-based)

    • Structured memory representations

    • Episodic and semantic memory modeling

    C. Training and Adaptation

    • Pretraining and continual training strategies

    • Instruction tuning and alignment

    • Preference learning and RLHF-style methods

    • Online adaptation and feedback loops

    • Parameter-efficient fine-tuning (LoRA, adapters, partial freeze)

    D. Reasoning and Planning

    • Multi-step reasoning under latency budgets

    • Tool use and function calling

    • Agent memory orchestration

    • Fast-think vs slow-think model architectures

    • Self-reflection and corrective reasoning

    E. Streaming Inference

    • Token-level streaming input and output

    • Interruptible generation

    • Partial context updates

    • Low-latency response formation

    What You Will Build

    • Novel memory architectures for LLMs

    • Training pipelines for small and efficient language models

    • Memory-aware inference engines

    • Evaluation frameworks for reasoning, memory retention, and hallucination

    • Research prototypes deployed into real production agents

    Your work will directly affect live systems running at scale.

    Required Skills

    • Strong foundation in machine learning and deep learning

    • Deep experience with large or small language models

    • Strong understanding of:

      • Transformer architectures

      • Attention mechanisms

      • Positional encoding and context modeling

    • Proficiency with PyTorch

    • Experience training or fine-tuning LLMs end-to-end

    Strong Plus

    • Experience with long-context modeling

    • Memory or retrieval systems beyond vanilla RAG

    • Reinforcement learning or RLHF pipelines

    • Agent frameworks or orchestration layers

    • Experience with model quantization and inference optimization

    • Publications, open-source work, or deep independent research

    What We Care About

    • First-principles thinking

    • Clear experimental design

    • Measurable gains, not vague improvements

    • Understanding trade-offs between quality, latency, and cost

    • Research that survives production constraints

    We value people who ask:

    “What happens after 10 million conversations?”

    Not just: “What score does this get on a benchmark?”

    Why Smallest.ai

    • Work on real deployed LLM systems

    • Build memory systems few companies attempt

    • Direct ownership from research to production

    • High autonomy and fast execution culture

    • Competitive compensation and meaningful ESOPs

    • Deep focus on small, fast, and efficient AI

    How to Apply

    It would be great if you could share:

    • Resume

    • Research papers, GitHub repositories, or technical writing

    • Examples of models you trained or systems you built

    • A short note on what aspect of LLM or memory research excites you most

    Email: hetvi@smallest.ai