Memory budget · 32 GB
Best local LLMs for 32GB
32GB is not a single ceiling. A 32GB Mac, a 32GB GPU and a 32GB Laptop each leave a different amount free for model weights, so the largest model you can run changes with the memory type, not just the number.
- Usable range
- 21–31 GB
- Models that fit
- 59
- Memory types
- 3
- Top pick
- 35B
What 32GB actually gives you
Usable figures are sourced per device (tap a card for the full profile). Verdicts below use Q4_K_M, the community-default quant.
Runs comfortably on the most capable 32GB setup (Nvidia GeForce RTX 5090 (32GB), ~31 GB usable) at ~22.3 GB. Check it against your exact device on its model page.
Models ranked for 32GB
- Mixtral 8x7B46.7B MoE · ~28.9 GB at Q4_K_M
- CRCommand R 35B35B · ~22.3 GB at Q4_K_M
- YiYi 1.5 34B34B · ~21.4 GB at Q4_K_M
- Qwen2.5 32B32B · ~22.1 GB at Q4_K_M
- Qwen3 32B32B · ~22 GB at Q4_K_M · Elo 1347
- DeepSeek-R1-Distill-Qwen 32B32B · ~22.1 GB at Q4_K_M
- Qwen2.5 Coder 32B32B · ~20.7 GB at Q4_K_M
- Granite 4.0 H Small32B MoE · ~20.4 GB at Q4_K_M
- Qwen3 30B-A3B30.5B MoE · ~20.7 GB at Q4_K_M · Elo 1383
- SSarvam-30B30B MoE · ~21.7 GB at Q4_K_M
- Gemma 2 27B27B · ~18.7 GB at Q4_K_M · Elo 1289
- Gemma 3 27B27B · ~18.6 GB at Q4_K_M · Elo 1366
- Mistral Small 3 24B24B · ~16.3 GB at Q4_K_M · Elo 1357
- SSarvam-M 24B24B · ~16.3 GB at Q4_K_M
- gpt-oss 20B21B MoE · ~13.2 GB at Q4_K_M
- DeepSeek-V2-Lite16B MoE · ~12.2 GB at Q4_K_M
- Phi-4 14B14B · ~10.8 GB at Q4_K_M · Elo 1256
- Qwen2.5 14B14B · ~10.7 GB at Q4_K_M
- Qwen3 14B14B · ~10.7 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 14B14B · ~10.7 GB at Q4_K_M
- Qwen2.5 Coder 14B14B · ~10.1 GB at Q4_K_M
- Mistral Nemo 12B12.2B · ~8.6 GB at Q4_K_M
- Gemma 3 12B12B · ~8.9 GB at Q4_K_M · Elo 1342
- Llama 3.2 Vision 11B10.7B · ~9 GB at Q4_K_M
- FNFalcon3 10B10B · ~7.5 GB at Q4_K_M
- Gemma 2 9B9B · ~7.3 GB at Q4_K_M · Elo 1266
- GLGLM-4 9B9B · ~7.3 GB at Q4_K_M
- Qwen2.5-VL 7B8.29B · ~7.1 GB at Q4_K_M
- Llama 3.1 8B8B · ~6.4 GB at Q4_K_M · Elo 1211
- Qwen3 8B8B · ~6.5 GB at Q4_K_M
- DeepSeek-R1-Distill-Llama 8B8B · ~6.4 GB at Q4_K_M
- Mistral 7B7B · ~5.8 GB at Q4_K_M · Elo 1149
- Qwen2.5 7B7B · ~6.1 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 7B7B · ~6.1 GB at Q4_K_M
- Qwen2.5 Coder 7B7B · ~5.8 GB at Q4_K_M
- Gemma 3 4B4B · ~3.8 GB at Q4_K_M · Elo 1303
- Qwen3 4B4B · ~3.8 GB at Q4_K_M
- Phi-3.5-mini 3.8B3.82B · ~3.7 GB at Q4_K_M
- Phi-4-mini 3.8B3.8B · ~3.8 GB at Q4_K_M
- Qwen2.5-VL 3B3.75B · ~4.4 GB at Q4_K_M
- Qwen2.5 3B3.09B · ~3.3 GB at Q4_K_M
- Qwen2.5 Coder 3B3.09B · ~3 GB at Q4_K_M
- Llama 3.2 3B3B · ~3.2 GB at Q4_K_M · Elo 1166
- SmolLM3 3B3B · ~3 GB at Q4_K_M
- Gemma 2 2B2.61B · ~2.9 GB at Q4_K_M
- Granite 3.1 2B2.53B · ~2.8 GB at Q4_K_M
- SSarvam-1 2B2B · ~2.7 GB at Q4_K_M
- SmolLM2 1.7B1.7B · ~2.2 GB at Q4_K_M · Elo 1114
- Qwen3 1.7B1.7B · ~2.4 GB at Q4_K_M
- Qwen2.5 1.5B1.54B · ~2.2 GB at Q4_K_M
- Qwen2.5 Coder 1.5B1.54B · ~2 GB at Q4_K_M
- TLTinyLlama 1.1B1.1B · ~1.8 GB at Q4_K_M
- Llama 3.2 1B1B · ~1.8 GB at Q4_K_M · Elo 1110
- Gemma 3 1B1B · ~1.8 GB at Q4_K_M
- Qwen3 0.6B0.6B · ~1.5 GB at Q4_K_M
- Qwen2.5 0.5B0.494B · ~1.5 GB at Q4_K_M
- Qwen2.5 Coder 0.5B0.494B · ~1.4 GB at Q4_K_M
- SmolLM2 360M0.362B · ~1.2 GB at Q4_K_M
- SmolLM2 135M0.135B · ~1 GB at Q4_K_M
Each chip links to the full breakdown for that model on a real 32GB device. "Tight" means it fits but with little headroom, close other apps.
The ceiling, per memory type
Apple M5 (32GB) (~21 GB usable)
Runs up to Gemma 3 27B (27B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~21 GB it can give a model.
Nvidia GeForce RTX 5090 (32GB) (~31 GB usable)
Runs up to Command R 35B (35B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~31 GB it can give a model.
32GB RAM Laptop (CPU/iGPU only) (~28 GB usable)
Runs up to Command R 35B (35B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~28 GB it can give a model.
Too large for any 32GB device
FAQ
How much of 32GB can a model actually use?
It depends on the memory type. Apple unified memory: about 21 GB (Apple M5 (32GB)); GPU VRAM: about 31 GB (Nvidia GeForce RTX 5090 (32GB)); System RAM (CPU only): about 28 GB (32GB RAM Laptop (CPU/iGPU only)). The rest is reserved for the OS, display and runtime overhead.
What is the best local LLM for 32GB?
Command R 35B (35B) is the strongest model that runs comfortably at Q4_K_M on the most capable 32GB setup (Nvidia GeForce RTX 5090 (32GB), ~31 GB usable). On a tighter 32GB device the ceiling is lower, shown per row above.
Why does a 32GB GPU fit a bigger model than a 32GB Mac?
A discrete GPU gives almost all of its VRAM to the model (leave ~1 GB for the driver). Apple Silicon shares one unified pool with macOS, so roughly 66% is available to the GPU for weights. Same 32GB sticker, different usable budget, so the model ceiling differs.
Sources
Memory figures are estimates at Q4_K_M with a small context. See methodology.