Best local LLMs for 24GB
24GB is not a single ceiling. A 24GB Mac and a 24GB GPU each leave a different amount free for model weights, so the largest model you can run changes with the memory type, not just the number.
- Usable range: 16–23 GB
- Models that fit: 58
- Memory types: 2
- Top pick: 32B
What 24GB actually gives you
Usable figures are sourced per device. Verdicts below use Q4_K_M, the community-default quant.
The top pick, Granite 4.0 H Small (32B), runs comfortably on the most capable 24GB setup (Nvidia GeForce RTX 4090 (24GB), ~23 GB usable) at ~20.4 GB. Check it against your exact device on its model page.
Models ranked for 24GB
- Command R 35B · 35B · ~22.3 GB at Q4_K_M
- Yi 1.5 34B · 34B · ~21.4 GB at Q4_K_M
- Qwen2.5 32B · 32B · ~22.1 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 32B · 32B · ~22.1 GB at Q4_K_M
- Qwen2.5 Coder 32B · 32B · ~20.7 GB at Q4_K_M
- Granite 4.0 H Small · 32B MoE · ~20.4 GB at Q4_K_M
- Qwen3 30B-A3B · 30.5B MoE · ~20.7 GB at Q4_K_M · Elo 1383
- Sarvam-30B · 30B MoE · ~21.7 GB at Q4_K_M
- Gemma 2 27B · 27B · ~18.7 GB at Q4_K_M · Elo 1289
- Gemma 3 27B · 27B · ~18.6 GB at Q4_K_M · Elo 1366
- Mistral Small 3 24B · 24B · ~16.3 GB at Q4_K_M · Elo 1357
- Sarvam-M 24B · 24B · ~16.3 GB at Q4_K_M
- gpt-oss 20B · 21B MoE · ~13.2 GB at Q4_K_M
- DeepSeek-V2-Lite · 16B MoE · ~12.2 GB at Q4_K_M
- Qwen2.5 14B · 14B · ~10.7 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 14B · 14B · ~10.7 GB at Q4_K_M
- Qwen2.5 Coder 14B · 14B · ~10.1 GB at Q4_K_M
- Mistral Nemo 12B · 12.2B · ~8.6 GB at Q4_K_M
- Gemma 3 12B · 12B · ~8.9 GB at Q4_K_M · Elo 1342
- Llama 3.2 Vision 11B · 10.7B · ~9 GB at Q4_K_M
- Falcon3 10B · 10B · ~7.5 GB at Q4_K_M
- Gemma 2 9B · 9B · ~7.3 GB at Q4_K_M · Elo 1266
- Qwen2.5-VL 7B · 8.29B · ~7.1 GB at Q4_K_M
- Llama 3.1 8B · 8B · ~6.4 GB at Q4_K_M · Elo 1211
- DeepSeek-R1-Distill-Llama 8B · 8B · ~6.4 GB at Q4_K_M
- Mistral 7B · 7B · ~5.8 GB at Q4_K_M · Elo 1149
- Qwen2.5 7B · 7B · ~6.1 GB at Q4_K_M
- DeepSeek-R1-Distill-Qwen 7B · 7B · ~6.1 GB at Q4_K_M
- Qwen2.5 Coder 7B · 7B · ~5.8 GB at Q4_K_M
- Gemma 3 4B · 4B · ~3.8 GB at Q4_K_M · Elo 1303
- Phi-3.5-mini 3.8B · 3.82B · ~3.7 GB at Q4_K_M
- Phi-4-mini 3.8B · 3.8B · ~3.8 GB at Q4_K_M
- Qwen2.5-VL 3B · 3.75B · ~4.4 GB at Q4_K_M
- Qwen2.5 3B · 3.09B · ~3.3 GB at Q4_K_M
- Qwen2.5 Coder 3B · 3.09B · ~3 GB at Q4_K_M
- Llama 3.2 3B · 3B · ~3.2 GB at Q4_K_M · Elo 1166
- SmolLM3 3B · 3B · ~3 GB at Q4_K_M
- Gemma 2 2B · 2.61B · ~2.9 GB at Q4_K_M
- Granite 3.1 2B · 2.53B · ~2.8 GB at Q4_K_M
- Sarvam-1 2B · 2B · ~2.7 GB at Q4_K_M
- SmolLM2 1.7B · 1.7B · ~2.2 GB at Q4_K_M · Elo 1114
- Qwen3 1.7B · 1.7B · ~2.4 GB at Q4_K_M
- Qwen2.5 1.5B · 1.54B · ~2.2 GB at Q4_K_M
- Qwen2.5 Coder 1.5B · 1.54B · ~2 GB at Q4_K_M
- TinyLlama 1.1B · 1.1B · ~1.8 GB at Q4_K_M
- Llama 3.2 1B · 1B · ~1.8 GB at Q4_K_M · Elo 1110
- Gemma 3 1B · 1B · ~1.8 GB at Q4_K_M
- Qwen3 0.6B · 0.6B · ~1.5 GB at Q4_K_M
- Qwen2.5 0.5B · 0.494B · ~1.5 GB at Q4_K_M
- Qwen2.5 Coder 0.5B · 0.494B · ~1.4 GB at Q4_K_M
- SmolLM2 360M · 0.362B · ~1.2 GB at Q4_K_M
- SmolLM2 135M · 0.135B · ~1 GB at Q4_K_M
Each entry links to the full breakdown for that model on a real 24GB device. "Tight" means the model fits but with little headroom, so close other apps before loading it.
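The verdicts can be pictured as a simple headroom check. The following is a minimal sketch, not the site's actual rule: `classify_fit` is a hypothetical helper, and the 2.5 GB headroom that separates "comfortable" from "tight" is an assumed threshold, since the page does not publish its cut-off. The footprints and usable budgets are the ones quoted on this page.

```python
def classify_fit(footprint_gb: float, usable_gb: float,
                 headroom_gb: float = 2.5) -> str:
    """Classify one model on one device.

    headroom_gb is an ASSUMED threshold for illustration only;
    the page does not state the exact value it uses.
    """
    if footprint_gb > usable_gb:
        return "too large"    # weights alone exceed the usable budget
    if footprint_gb + headroom_gb > usable_gb:
        return "tight"        # fits, but with little memory to spare
    return "comfortable"


# Q4_K_M footprints and usable budgets as quoted on this page.
print(classify_fit(20.4, 23.0))  # Granite 4.0 H Small on the RTX 4090
print(classify_fit(22.3, 23.0))  # Command R 35B on the RTX 4090
print(classify_fit(16.3, 16.0))  # Mistral Small 3 24B on the M4 Pro
```

A fixed headroom is only one plausible choice; a longer context window raises the memory a model needs at run time, so a real check would likely scale the margin with context length.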
The ceiling, per memory type
Apple M4 Pro (24GB) (~16 GB usable)
Runs up to gpt-oss 20B (21B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~16 GB it can give a model.
Nvidia GeForce RTX 4090 (24GB) (~23 GB usable)
Runs up to Granite 4.0 H Small (32B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~23 GB it can give a model.
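One way to reproduce these per-device ceilings is to keep only the models that leave some headroom and then take the largest by parameter count. This is a sketch under assumptions: `comfortable_ceiling` is a hypothetical helper, the 2.5 GB headroom is an assumed threshold, and ranking by parameter count is a guess at how "up to" is decided. The rows are a small sample copied from the ranked list.

```python
from typing import NamedTuple, Optional, Sequence


class Model(NamedTuple):
    name: str
    params_b: float      # parameter count in billions
    footprint_gb: float  # estimated memory at Q4_K_M


# A few rows from the ranked list (not the full set).
MODELS = [
    Model("Command R 35B", 35.0, 22.3),
    Model("Granite 4.0 H Small", 32.0, 20.4),
    Model("Gemma 3 27B", 27.0, 18.6),
    Model("Mistral Small 3 24B", 24.0, 16.3),
    Model("gpt-oss 20B", 21.0, 13.2),
    Model("Qwen2.5 14B", 14.0, 10.7),
]


def comfortable_ceiling(models: Sequence[Model], usable_gb: float,
                        headroom_gb: float = 2.5) -> Optional[Model]:
    """Largest model (by parameters) that still leaves headroom_gb free.

    headroom_gb is an assumed value, not a figure from the page.
    """
    fits = [m for m in models if m.footprint_gb + headroom_gb <= usable_gb]
    return max(fits, key=lambda m: m.params_b, default=None)


for device, usable in [("Apple M4 Pro (24GB)", 16.0),
                       ("Nvidia GeForce RTX 4090 (24GB)", 23.0)]:
    top = comfortable_ceiling(MODELS, usable)
    print(device, "->", top.name if top else "none")
```

With these sample rows and the assumed headroom, the sketch picks gpt-oss 20B for the 16 GB budget and Granite 4.0 H Small for the 23 GB budget, matching the ceilings stated above.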
FAQ
How much of 24GB can a model actually use?
It depends on the memory type. Apple unified memory gives a model about 16 GB on an Apple M4 Pro (24GB); GPU VRAM gives it about 23 GB on an Nvidia GeForce RTX 4090 (24GB). The rest is reserved for the OS, display and runtime overhead.
What is the best local LLM for 24GB?
Granite 4.0 H Small (32B) is the strongest model that runs comfortably at Q4_K_M on the most capable 24GB setup (Nvidia GeForce RTX 4090 (24GB), ~23 GB usable). On a tighter 24GB device the ceiling is lower, as shown per memory type above.
Why does a 24GB GPU fit a bigger model than a 24GB Mac?
A discrete GPU gives almost all of its VRAM to the model (leave ~1 GB for the driver). Apple Silicon shares one unified pool with macOS, so only roughly 66% of it (about 16 GB of 24 GB) is available to the GPU for weights. Same 24GB sticker, different usable budget, so the model ceiling differs.
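The arithmetic behind the two budgets is short enough to write out. This is a rough sketch: the function names are hypothetical, and the ~1 GB driver reserve and ~66% unified-memory share are the approximate figures from the answer above, not exact values for every device.

```python
def usable_discrete_gpu(total_gb: float, driver_reserve_gb: float = 1.0) -> float:
    """Discrete GPU: nearly all VRAM, minus a small driver reserve (approximate)."""
    return total_gb - driver_reserve_gb


def usable_unified(total_gb: float, gpu_fraction: float = 0.66) -> float:
    """Unified memory: only a fraction of the shared pool goes to the GPU (approximate)."""
    return total_gb * gpu_fraction


total = 24.0
print(f"Discrete GPU:   ~{usable_discrete_gpu(total):.0f} GB usable")
print(f"Unified memory: ~{usable_unified(total):.0f} GB usable")
```

Rounded to whole gigabytes, this gives the ~23 GB and ~16 GB budgets quoted for the two 24GB devices.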
Sources
Memory figures are estimates at Q4_K_M with a small context. See methodology.