LLM and Memory Researcher — Bangalore

Team: Core AI Research
Location: Bangalore, India
Type: Full-time
Experience: No fixed bar — depth and ownership matter more than years

About Smallest.ai

Smallest.ai builds real-time intelligence systems that operate under strict latency, cost, and reliability constraints.

We work on small, fast, controllable language models designed to run in production — not just in demos.

Our focus areas include:

Small Language Models (SLMs)
Long- and short-term memory systems
Streaming inference
Agent architectures that reason, adapt, and improve over time

We optimize for: Smaller models. Faster tokens. Real memory.

Role Overview

As an LLM and Memory Researcher, you will design and train models that can:

Think under latency constraints
Use memory effectively across time
Adapt from interaction history
Operate in streaming environments
Power real-world agents and workflows

You will work across model architecture, training, memory systems, and deployment.

This role sits at the intersection of research, systems, and product intelligence.

Core Research Areas

A. Language Model Architecture

Small language model design (1B–8B class)
Dense and Mixture-of-Experts variants
Fast decoding architectures
KV-cache optimization and compression
Long-context and sliding-window attention

B. Memory Systems

Short-term working memory
Long-term persistent memory
Retrieval-augmented memory (RAG)
Structured memory representations
Episodic and semantic memory modeling

C. Training and Adaptation

Pretraining and continual training strategies
Instruction tuning and alignment
Preference learning and RLHF-style methods
Online adaptation and feedback loops
Parameter-efficient fine-tuning (LoRA, adapters, partial freeze)

D. Reasoning and Planning

Multi-step reasoning under latency budgets
Tool use and function calling
Agent memory orchestration
Fast-think vs slow-think model architectures
Self-reflection and corrective reasoning

E. Streaming Inference

Token-level streaming input and output
Interruptible generation
Partial context updates
Low-latency response formation

What You Will Build

Novel memory architectures for LLMs
Training pipelines for small and efficient language models
Memory-aware inference engines
Evaluation frameworks for reasoning, memory retention, and hallucination
Research prototypes deployed into real production agents

Your work will directly affect live systems running at scale.

Required Skills

Strong foundation in machine learning and deep learning
Deep experience with large or small language models
Strong understanding of:
- Transformer architectures
- Attention mechanisms
- Positional encoding and context modeling
Proficiency with PyTorch
Experience training or fine-tuning LLMs end-to-end

Strong Plus

Experience with long-context modeling
Memory or retrieval systems beyond vanilla RAG
Reinforcement learning or RLHF pipelines
Agent frameworks or orchestration layers
Experience with model quantization and inference optimization
Publications, open-source work, or deep independent research

What We Care About

First-principles thinking
Clear experimental design
Measurable gains, not vague improvements
Understanding trade-offs between quality, latency, and cost
Research that survives production constraints

We value people who ask:

“What happens after 10 million conversations?”

Not just: “What score does this get on a benchmark?”

Why Smallest.ai

Work on real deployed LLM systems
Build memory systems few companies attempt
Direct ownership from research to production
High autonomy and fast execution culture
Competitive compensation and meaningful ESOPs
Deep focus on small, fast, and efficient AI

How to Apply

Email us your application, ideally including:

Resume
Research papers, GitHub repositories, or technical writing
Examples of models you trained or systems you built
A short note on what aspect of LLM or memory research excites you most

Email: hetvi@smallest.ai
