localmodel.run

Memory budget · 32 GB

Best local LLMs for 32GB

32GB is not a single ceiling. A 32GB Mac, a 32GB GPU and a 32GB laptop each leave a different amount free for model weights, so the largest model you can run depends on the memory type, not just the number.

Usable range: 21–31 GB
Models that fit: 59
Memory types: 3
Top pick: 35B

What 32GB actually gives you

Usable figures are sourced per device. Verdicts below use Q4_K_M, the community-default quantization.

Top pick for 32GB at Q4_K_M

Command R 35B runs comfortably on the most capable 32GB setup (Nvidia GeForce RTX 5090 (32GB), ~31 GB usable) at ~22.3 GB. Check it against your exact device on its model page.

Models ranked for 32GB

Each chip links to the full breakdown for that model on a real 32GB device. "Tight" means it fits but with little headroom, so close other apps before loading it.
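The comfortable / tight / too-large labels can be read as a headroom check. The sketch below is a minimal illustration, not this site's actual rule: the function name `classify_fit` and the 10% headroom threshold are assumptions made for the example. The device and model figures are the ones quoted on this page.

```python
def classify_fit(model_gb: float, usable_gb: float, headroom: float = 0.10) -> str:
    """Label a model against a device's usable memory.

    `headroom` is a hypothetical threshold (10% here), not a figure
    taken from this page.
    """
    if model_gb > usable_gb:
        return "too large"
    if model_gb > usable_gb * (1.0 - headroom):
        return "tight"
    return "comfortable"


# The top pick on this page needs ~22.3 GB at Q4_K_M.
devices = [
    ("Nvidia GeForce RTX 5090 (32GB)", 31.0),
    ("32GB RAM Laptop (CPU/iGPU only)", 28.0),
    ("Apple M5 (32GB)", 21.0),
]
for name, usable in devices:
    print(f"{name}: {classify_fit(22.3, usable)}")
```

Under these assumed thresholds a ~22.3 GB model is comfortable on the 31 GB and 28 GB budgets and too large for the 21 GB one, which matches the per-device ceilings listed on this page.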

The ceiling, per memory type

Apple M5 (32GB) (~21 GB usable)

Runs up to Gemma 3 27B (27B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~21 GB it can give a model.

Nvidia GeForce RTX 5090 (32GB) (~31 GB usable)

Runs up to Command R 35B (35B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~31 GB it can give a model.

32GB RAM Laptop (CPU/iGPU only) (~28 GB usable)

Runs up to Command R 35B (35B) comfortably at Q4_K_M. Larger models either sit tight or spill past the ~28 GB it can give a model.

Too large for any 32GB device

FAQ

How much of 32GB can a model actually use?

It depends on the memory type. Apple unified memory: about 21 GB (Apple M5 (32GB)); GPU VRAM: about 31 GB (Nvidia GeForce RTX 5090 (32GB)); System RAM (CPU only): about 28 GB (32GB RAM Laptop (CPU/iGPU only)). The rest is reserved for the OS, display and runtime overhead.

What is the best local LLM for 32GB?

Command R 35B (35B) is the strongest model that runs comfortably at Q4_K_M on the most capable 32GB setup (Nvidia GeForce RTX 5090 (32GB), ~31 GB usable). On a tighter 32GB device the ceiling is lower, as shown per memory type above.

Why does a 32GB GPU fit a bigger model than a 32GB Mac?

A discrete GPU gives almost all of its VRAM to the model, holding back only about 1 GB for the driver. Apple Silicon shares one unified pool with macOS, so roughly 66% of it (about 21 GB of 32 GB) is available to the GPU for weights. Same 32GB sticker, different usable budget, so the model ceiling differs.
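The two rules of thumb in this answer can be written out as arithmetic. This is a rough sketch under the figures stated here (about 1 GB reserved on a discrete GPU, roughly 66% of unified memory available on Apple Silicon). The function name `usable_gb` and the memory-type labels are hypothetical. The page gives no formula for system RAM, so that case is deliberately left out.

```python
def usable_gb(total_gb: float, memory_type: str) -> float:
    """Estimate memory available to a model, using this page's rules of thumb."""
    if memory_type == "gpu_vram":
        # Discrete GPU: nearly all VRAM, minus ~1 GB for the driver.
        return total_gb - 1.0
    if memory_type == "apple_unified":
        # Apple Silicon: roughly 66% of the unified pool goes to the GPU.
        return total_gb * 0.66
    # No rule is stated on this page for other memory types.
    raise ValueError(f"no rule of thumb for memory type: {memory_type!r}")


print(usable_gb(32, "gpu_vram"))                  # 31.0
print(round(usable_gb(32, "apple_unified"), 1))   # 21.1
```

Both results line up with the ~31 GB and ~21 GB usable figures quoted for the RTX 5090 and the Apple M5.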

Sources

Memory figures are estimates at Q4_K_M with a small context. See methodology.
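For a sense of where such estimates come from, here is a back-of-the-envelope sketch. It is not this site's methodology. It assumes roughly 4.85 bits per weight for Q4_K_M, a commonly quoted approximation, plus a placeholder 1 GB for a small context and runtime overhead. The function name and both defaults are assumptions.

```python
def estimate_q4_k_m_gb(
    params_billion: float,
    bits_per_weight: float = 4.85,  # assumed average for Q4_K_M, not from this page
    overhead_gb: float = 1.0,       # placeholder for small context + runtime
) -> float:
    """Rough memory estimate in GB for a model quantized to Q4_K_M."""
    # billions of parameters * bits per weight / 8 bits per byte = GB of weights
    weights_gb = params_billion * bits_per_weight / 8.0
    return weights_gb + overhead_gb


print(round(estimate_q4_k_m_gb(35), 1))  # 22.2
```

Real footprints vary with architecture, context length and runtime, so treat the per-model figures on the model pages as the reference and this only as a sanity check.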