Model family · 7 sizes
Qwen2.5: which size runs locally?
Qwen2.5 comes in 7 sizes, from 0.494B to 72B parameters. Bigger is generally more capable but needs more memory. Here is each size with its Q4_K_M file size, the memory it needs, and the hardware that runs it.
- Sizes: 7
- Smallest: 0.494B
- Largest: 72B
- Runs from: 8GB
The Qwen2.5 lineup
- Qwen2.5 0.5B (0.494B) · ~0.491 GB Q4_K_M · needs ~1 GB
- Qwen2.5 1.5B (1.54B) · ~1.12 GB Q4_K_M · needs ~2 GB
- Qwen2.5 3B (3.09B) · ~2.1 GB Q4_K_M · needs ~4 GB
- Qwen2.5 7B (7B) · ~4.68 GB Q4_K_M · needs ~6 GB
- Qwen2.5 14B (14B) · ~8.99 GB Q4_K_M · needs ~11 GB
- Qwen2.5 32B (32B) · ~19.85 GB Q4_K_M · needs ~22 GB
- Qwen2.5 72B (72B) · ~47.42 GB Q4_K_M · needs ~50 GB · Elo 1303
"Needs" is the sourced minimum memory for Q4_K_M with a small context. Larger context needs more.
Which Qwen2.5 fits your memory
Largest that fits: Qwen2.5 3B (3.09B), best case on Apple M1 (8GB).
Largest that fits: Qwen2.5 14B (14B), best case on Nvidia GeForce RTX 4080 (16GB).
Largest that fits: Qwen2.5 32B (32B), best case on Nvidia GeForce RTX 4090 (24GB). Comfortable up to Qwen2.5 14B (14B).
Largest that fits: Qwen2.5 32B (32B), best case on Nvidia GeForce RTX 5090 (32GB).
Best case means the most capable device at that size (usually a discrete GPU). A Mac at the same size sits roughly one rung lower; see the per-size breakdown on each memory budget page.
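For a quick check against your own machine, the lookup can be sketched in a few lines of Python. This is a minimal sketch, not the method behind the picks above: it only compares the "needs" figures from the lineup against a memory budget. The names `LINEUP` and `largest_fit` are made up for illustration, and `usable_gb` is an assumption you supply yourself (the memory actually left for the model after the OS and other apps), so results can differ from the per-device picks on this page.

```python
# (name, approximate memory needed in GB), copied from the lineup list.
LINEUP = [
    ("Qwen2.5 0.5B", 1),
    ("Qwen2.5 1.5B", 2),
    ("Qwen2.5 3B", 4),
    ("Qwen2.5 7B", 6),
    ("Qwen2.5 14B", 11),
    ("Qwen2.5 32B", 22),
    ("Qwen2.5 72B", 50),
]


def largest_fit(usable_gb):
    """Return the largest model whose "needs" figure fits, or None if none do."""
    fitting = [name for name, needs_gb in LINEUP if needs_gb <= usable_gb]
    # LINEUP is ordered smallest to largest, so the last match is the largest.
    return fitting[-1] if fitting else None


# usable_gb is an assumed figure, not total installed memory.
print(largest_fit(5))    # Qwen2.5 3B
print(largest_fit(24))   # Qwen2.5 32B
print(largest_fit(0.5))  # None
```

Because the "needs" figures assume a small context, leave extra room in `usable_gb` if you plan to run long prompts.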
FAQ
Which Qwen2.5 size should I run locally?
Pick the largest size your memory allows. In the best case, 8GB runs up to Qwen2.5 3B, 16GB up to Qwen2.5 14B, and both 24GB and 32GB up to Qwen2.5 32B. Smaller sizes run faster and leave headroom for context.
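How much headroom context takes is mostly a matter of the key/value cache, which grows linearly with context length. Below is a rough sketch of the usual estimate, assuming a 16-bit (2-byte) cache. The function name `kv_cache_bytes` and the layer, head, and dimension values are illustrative placeholders, not figures from this page; read the real values from the model's own config.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_value=2):
    """Estimate KV cache size: one key and one value vector per layer per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * n_ctx


# Placeholder architecture values, assumed for illustration only.
small_ctx = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, n_ctx=4096)
large_ctx = kv_cache_bytes(n_layers=28, n_kv_heads=4, head_dim=128, n_ctx=32768)

print(f"4k context:  {small_ctx / 2**30:.2f} GiB")
print(f"32k context: {large_ctx / 2**30:.2f} GiB")
```

The cache at 32k tokens is eight times the cache at 4k, which is why a model that fits comfortably with a short prompt can run out of memory on a long one.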
What is the smallest Qwen2.5 model?
Qwen2.5 0.5B, with 0.494B parameters. It takes about 0.491 GB on disk at Q4_K_M and roughly 1 GB of memory to run. It is the one to use on phones and 8 GB machines.
What is the largest Qwen2.5 model and what does it need?
Qwen2.5 72B, with 72B parameters. It takes about 47.42 GB at Q4_K_M and roughly 50 GB of memory. That is more than a typical 32 GB desktop offers, so it calls for a high-memory Mac or a multi-GPU rig.
Sources
Memory figures are estimates at Q4_K_M. See methodology.