Model family · 4 sizes
Sarvam: which size runs locally?
Sarvam comes in four sizes, from 2B to 105B parameters. Bigger is generally more capable but needs more memory. Here is each size with its Q4_K_M weight size, the memory it needs, and the hardware that runs it.
- Sizes: 4
- Smallest: 2B
- Largest: 105B
- Runs from: 8GB
The Sarvam lineup
"Needs" is the sourced minimum memory for Q4_K_M with a small context. Larger context needs more.
Which Sarvam fits your memory
- 8GB: largest that fits is Sarvam-1 2B, best case on Apple M1 (8GB).
- 16GB: largest that fits is Sarvam-1 2B, best case on Nvidia GeForce RTX 4080 (16GB).
- 24GB: largest that fits is Sarvam-30B, best case on Nvidia GeForce RTX 4090 (24GB). Comfortable up to Sarvam-M 24B.
- 32GB: largest that fits is Sarvam-30B, best case on Nvidia GeForce RTX 5090 (32GB).
Best case means the most capable device at that memory size (usually a discrete GPU). A Mac with the same amount of memory sits roughly one rung lower; see the per-size breakdown on each memory budget page.
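The selection rule behind this section (take the largest size whose memory need fits the budget) can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the figures here: the function name `largest_that_fits` and the `NEEDS_GB` table are hypothetical, and the table holds only the two memory figures quoted in the FAQ (roughly 4 GB for Sarvam-1 2B and roughly 80 GB for Sarvam-105B). The needs for Sarvam-M 24B and Sarvam-30B are not stated in this text, so they are left out rather than guessed. The sketch also uses memory need as a stand-in for model size, on the assumption that a bigger model always needs more memory.

```python
from typing import Optional

# Approximate memory needed at Q4_K_M with a small context, in GB.
# Only the two figures quoted in the FAQ are included; the 24B and 30B
# entries are omitted because their needs are not stated in this text.
NEEDS_GB = {
    "Sarvam-1 2B": 4.0,
    "Sarvam-105B": 80.0,
}


def largest_that_fits(
    budget_gb: float, needs_gb: dict[str, float] = NEEDS_GB
) -> Optional[str]:
    """Return the model with the largest memory need that still fits the budget.

    Returns None when even the smallest model does not fit.
    """
    fitting = {name: need for name, need in needs_gb.items() if need <= budget_gb}
    if not fitting:
        return None
    # Memory need is used as a proxy for model size (an assumption of this sketch).
    return max(fitting, key=fitting.get)


print(largest_that_fits(8))   # an 8 GB machine
print(largest_that_fits(96))  # a high-memory machine
print(largest_that_fits(2))   # too small for any listed model
```

With a complete table of needs, the same function would reproduce the per-budget picks; with only two entries it is useful mainly as a template.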
FAQ
Which Sarvam size should I run locally?
Pick the largest size your memory allows. In the best case, that is up to Sarvam-1 2B on 8GB, up to Sarvam-1 2B on 16GB, up to Sarvam-30B on 24GB, and up to Sarvam-30B on 32GB. Smaller sizes run faster and leave headroom for context.
What is the smallest Sarvam model?
Sarvam-1 2B, at 2B parameters. It takes about 1.55 GB on disk at Q4_K_M and roughly 4 GB of memory to run. It is the one to use on phones and 8 GB machines.
What is the largest Sarvam model and what does it need?
Sarvam-105B, a 105B mixture-of-experts model. It takes about 64.2 GB at Q4_K_M and roughly 80 GB of memory to run. That is more than a typical 32 GB desktop offers, so it calls for a high-memory Mac or a multi-GPU rig.
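The memory figure is always larger than the weight size, and the difference is what the estimate allows for everything other than the weights, such as the small context mentioned earlier. A rough sketch of that arithmetic in Python, using only the two pairs of figures quoted in this FAQ. The helper name `overhead_gb` is hypothetical, and how the difference splits between context and other runtime overhead is not stated here.

```python
def overhead_gb(weights_gb: float, needs_gb: float) -> float:
    """Memory the estimate allows beyond the weights themselves, in GB."""
    return round(needs_gb - weights_gb, 2)


# (weight size at Q4_K_M, approximate memory needed), both in GB,
# taken from the smallest and largest entries in this FAQ.
figures = {
    "Sarvam-1 2B": (1.55, 4.0),
    "Sarvam-105B": (64.2, 80.0),
}

for name, (weights, needs) in figures.items():
    print(f"{name}: {overhead_gb(weights, needs)} GB beyond the weights")
```

A longer context raises the memory needed while the weight size stays fixed, so this margin is the first thing to check before increasing context length.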
Sources
Memory figures are estimates at Q4_K_M. See methodology.