Memory budget · 16 GB
Best local LLMs for 16GB
16GB is not a single ceiling. A 16GB Mac, a 16GB GPU and a 16GB laptop each leave a different amount free for model weights, so the largest model you can run depends on the memory type, not just the number.
- Usable range: 10.5–15 GB
- Models that fit: 45
- Memory types: 3
- Top pick: 21B
What 16GB actually gives you
Usable figures are sourced per device. Verdicts below use Q4_K_M, the community-default quant.
gpt-oss 20B (21B) runs comfortably on the most capable 16GB setup (Nvidia GeForce RTX 4080 (16GB), ~15 GB usable) at ~13.2 GB. Check it against your exact device on its model page.
Models ranked for 16GB
- gpt-oss 20B · 21B MoE · ~13.2 GB at Q4_K_M
- DeepSeek-V2-Lite · 16B MoE · ~12.2 GB at Q4_K_M
- Phi-4 14B · 14B · ~10.8 GB at Q4_K_M · Elo 1256
- Qwen2.5 14B · 14B · ~10.7 GB at Q4_K_M
- Qwen3 14B · 14B · ~10.7 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 14B · 14B · ~10.7 GB at Q4_K_M
- Qwen2.5 Coder 14B · 14B · ~10.1 GB at Q4_K_M
- Mistral Nemo 12B · 12.2B · ~8.6 GB at Q4_K_M
- Gemma 3 12B · 12B · ~8.9 GB at Q4_K_M · Elo 1342
- Llama 3.2 Vision 11B · 10.7B · ~9 GB at Q4_K_M
- Falcon3 10B · 10B · ~7.5 GB at Q4_K_M
- Gemma 2 9B · 9B · ~7.3 GB at Q4_K_M · Elo 1266
- GLM-4 9B · 9B · ~7.3 GB at Q4_K_M
- Qwen2.5-VL 7B · 8.29B · ~7.1 GB at Q4_K_M
- Llama 3.1 8B · 8B · ~6.4 GB at Q4_K_M · Elo 1211
- Qwen3 8B · 8B · ~6.5 GB at Q4_K_M
- DeepSeek-R1-Distill-Llama 8B · 8B · ~6.4 GB at Q4_K_M
- Mistral 7B · 7B · ~5.8 GB at Q4_K_M · Elo 1149
- Qwen2.5 7B · 7B · ~6.1 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 7B · 7B · ~6.1 GB at Q4_K_M
- Qwen2.5 Coder 7B · 7B · ~5.8 GB at Q4_K_M
- Gemma 3 4B · 4B · ~3.8 GB at Q4_K_M · Elo 1303
- Qwen3 4B · 4B · ~3.8 GB at Q4_K_M
- Phi-3.5-mini 3.8B · 3.82B · ~3.7 GB at Q4_K_M
- Phi-4-mini 3.8B · 3.8B · ~3.8 GB at Q4_K_M
- Qwen2.5-VL 3B · 3.75B · ~4.4 GB at Q4_K_M
- Qwen2.5 3B · 3.09B · ~3.3 GB at Q4_K_M
- Qwen2.5 Coder 3B · 3.09B · ~3 GB at Q4_K_M
- Llama 3.2 3B · 3B · ~3.2 GB at Q4_K_M · Elo 1166
- SmolLM3 3B · 3B · ~3 GB at Q4_K_M
- Gemma 2 2B · 2.61B · ~2.9 GB at Q4_K_M
- Granite 3.1 2B · 2.53B · ~2.8 GB at Q4_K_M
- Sarvam-1 2B · 2B · ~2.7 GB at Q4_K_M
- SmolLM2 1.7B · 1.7B · ~2.2 GB at Q4_K_M · Elo 1114
- Qwen3 1.7B · 1.7B · ~2.4 GB at Q4_K_M
- Qwen2.5 1.5B · 1.54B · ~2.2 GB at Q4_K_M
- Qwen2.5 Coder 1.5B · 1.54B · ~2 GB at Q4_K_M
- TinyLlama 1.1B · 1.1B · ~1.8 GB at Q4_K_M
- Llama 3.2 1B · 1B · ~1.8 GB at Q4_K_M · Elo 1110
- Gemma 3 1B · 1B · ~1.8 GB at Q4_K_M
- Qwen3 0.6B · 0.6B · ~1.5 GB at Q4_K_M
- Qwen2.5 0.5B · 0.494B · ~1.5 GB at Q4_K_M
- Qwen2.5 Coder 0.5B · 0.494B · ~1.4 GB at Q4_K_M
- SmolLM2 360M · 0.362B · ~1.2 GB at Q4_K_M
- SmolLM2 135M · 0.135B · ~1 GB at Q4_K_M
Each entry links to the full breakdown for that model on a real 16GB device. "Tight" means it fits but with little headroom, so close other apps before loading it.
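The comfortable/tight distinction can be sketched as a simple headroom check. This is a minimal illustration, not the site's published rule: the function name `verdict` is hypothetical, and the 1.5 GB headroom is an assumed margin chosen only for the example.

```python
def verdict(model_gb: float, usable_gb: float, headroom_gb: float = 1.5) -> str:
    """Classify how a model fits into a device's usable memory.

    headroom_gb is an assumed safety margin, not a figure from this page.
    """
    if model_gb > usable_gb:
        return "too large"
    if model_gb > usable_gb - headroom_gb:
        return "tight"
    return "comfortable"


# Sizes and usable budgets are the Q4_K_M estimates listed on this page.
print(verdict(13.2, 15.0))  # gpt-oss 20B on the RTX 4080 (16GB)
print(verdict(10.1, 10.5))  # Qwen2.5 Coder 14B on the Apple M5 (16GB)
print(verdict(13.2, 10.5))  # gpt-oss 20B on the Apple M5 (16GB)
```

A larger headroom makes the check more conservative, which may be worth it if you plan to use a long context or keep other apps open.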
The ceiling, per memory type
Apple M5 (16GB) (~10.5 GB usable)
Runs up to Mistral Nemo 12B (12.2B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~10.5 GB it can give a model.
Nvidia GeForce RTX 4080 (16GB) (~15 GB usable)
Runs up to gpt-oss 20B (21B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~15 GB it can give a model.
16GB RAM Laptop (CPU/iGPU only) (~12 GB usable)
Runs up to Qwen2.5 Coder 14B (14B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~12 GB it can give a model.
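One way to picture how a per-device ceiling falls out of the ranked list: walk the list from the top and stop at the first model that fits with room to spare. The sketch below uses only the largest entries and the usable figures from this page. The `ceiling` helper and its 1.5 GB headroom are assumptions for illustration; with these numbers that margin happens to reproduce the three ceilings above, but the site's actual rule is not stated.

```python
# (name, ~GB at Q4_K_M), in the page's ranked order (largest entries only).
RANKED = [
    ("gpt-oss 20B", 13.2),
    ("DeepSeek-V2-Lite", 12.2),
    ("Phi-4 14B", 10.8),
    ("Qwen2.5 14B", 10.7),
    ("Qwen3 14B", 10.7),
    ("DeepSeek-R1-Distill-Qwen 14B", 10.7),
    ("Qwen2.5 Coder 14B", 10.1),
    ("Mistral Nemo 12B", 8.6),
    ("Gemma 3 12B", 8.9),
]

# Usable memory per device, as quoted on this page.
USABLE_GB = {
    "Apple M5 (16GB)": 10.5,
    "Nvidia GeForce RTX 4080 (16GB)": 15.0,
    "16GB RAM Laptop (CPU/iGPU only)": 12.0,
}


def ceiling(usable_gb, headroom_gb=1.5):
    """Return the first ranked model that leaves headroom_gb free (assumed margin)."""
    for name, size_gb in RANKED:
        if size_gb <= usable_gb - headroom_gb:
            return name
    return None  # nothing in this shortlist fits comfortably


for device, usable in USABLE_GB.items():
    print(f"{device}: {ceiling(usable)}")
```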
16GB phones & tablets
Phones report 16GB too, but iOS and Android reserve more and the runtimes differ, so their usable pool is smaller.
Too large for any 16GB device
FAQ
How much of 16GB can a model actually use?
It depends on the memory type. Apple unified memory: about 10.5 GB (Apple M5 (16GB)); GPU VRAM: about 15 GB (Nvidia GeForce RTX 4080 (16GB)); System RAM (CPU only): about 12 GB (16GB RAM Laptop (CPU/iGPU only)). The rest is reserved for the OS, display and runtime overhead.
What is the best local LLM for 16GB?
gpt-oss 20B (21B) is the strongest model that runs comfortably at Q4_K_M on the most capable 16GB setup (Nvidia GeForce RTX 4080 (16GB), ~15 GB usable). On a tighter 16GB device the ceiling is lower, shown per row above.
Why does a 16GB GPU fit a bigger model than a 16GB Mac?
A discrete GPU gives almost all of its VRAM to the model (leave ~1 GB for the driver). Apple Silicon shares one unified pool with macOS, so roughly 66% is available to the GPU for weights. Same 16GB sticker, different usable budget, so the model ceiling differs.
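The arithmetic behind that answer, as a rough sketch. The defaults come from the figures quoted above (~1 GB for the driver, roughly 66% of unified memory); the function names are hypothetical, and real reserves vary with the OS, driver and whatever else is running.

```python
def usable_discrete_gpu(total_gb: float, driver_reserve_gb: float = 1.0) -> float:
    """VRAM left for the model after an assumed fixed driver reserve."""
    return total_gb - driver_reserve_gb


def usable_unified(total_gb: float, gpu_fraction: float = 0.66) -> float:
    """Share of a unified memory pool assumed to be available to the GPU."""
    return total_gb * gpu_fraction


print(usable_discrete_gpu(16))       # 15.0
print(round(usable_unified(16), 2))  # 10.56
```

The unified-memory result lands close to the ~10.5 GB quoted for the Apple M5 (16GB); the small gap comes from rounding "roughly 66%".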
Sources
Memory figures are estimates at Q4_K_M with a small context. See methodology.