localmodel.run

Model family · 4 sizes

Llama: which size runs locally?

Llama comes in 4 sizes, from 1B to 70B parameters. Bigger models are generally more capable but need more memory. Below is each size with its file size at Q4_K_M quantization, the memory it needs to run, and the hardware that runs it.
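A Q4_K_M file size relates to parameter count through the average number of bits stored per weight. The sketch below turns this page's Llama 3.3 70B figure (42.52 GB) into that average and back again. It assumes GB means 10^9 bytes and uses the nominal "70B" label as the parameter count. The helper names are hypothetical. With the nominal "1B" label, the 1B model's 0.81 GB works out to about 6.5 bits per weight, because the label is a rounded parameter count. Treat the result as a rough average rather than an exact property of the format.

```python
def effective_bits_per_weight(file_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, given a file size in GB (1e9 bytes)."""
    # (GB * 1e9 bytes * 8 bits) / (params_billion * 1e9) simplifies to GB * 8 / params_billion
    return file_gb * 8 / params_billion


def estimate_file_gb(params_billion: float, bits_per_weight: float) -> float:
    """Inverse: approximate file size in GB for a parameter count and average bit width."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # Llama 3.3 70B figures from this page (nominal 70B parameters, 42.52 GB at Q4_K_M)
    bpw = effective_bits_per_weight(42.52, 70)
    print(f"Llama 3.3 70B at Q4_K_M: about {bpw:.2f} bits per weight")
    print(f"Round trip back to file size: {estimate_file_gb(70, bpw):.2f} GB")
```

The file size is only the floor for memory. The runtime also needs room for the context (KV cache) and working buffers, which is why the page lists roughly 48 GB to run a 42.52 GB file.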

Sizes: 4 · Smallest: 1B · Largest: 70B · Runs from: 8GB

The Llama lineup

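"Needs" is the sourced minimum memory to run each size at Q4_K_M with a small context window. A larger context needs more, because the key/value (KV) cache grows linearly with the number of tokens kept in context.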
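To see why context length matters, here is a minimal sketch of the standard KV-cache size arithmetic. For every token in context, each layer stores one key and one value vector per KV head. The model shape used here (32 layers, 8 KV heads, head dimension 128) is an assumption taken from the published Llama 3.1 8B configuration, not from this page. The sketch also assumes an fp16 cache (2 bytes per value). Some runtimes can store the cache at lower precision, which shrinks these numbers. The function and constant names are hypothetical.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem


# Assumed Llama 3.1 8B shape; fp16 cache via the default bytes_per_elem=2.
LLAMA_31_8B = dict(n_layers=32, n_kv_heads=8, head_dim=128)

if __name__ == "__main__":
    for ctx in (2_048, 8_192, 32_768):
        gib = kv_cache_bytes(context_tokens=ctx, **LLAMA_31_8B) / 2**30
        print(f"{ctx:>6} tokens -> {gib:.2f} GiB of KV cache")
```

Under these assumptions, each token costs 128 KiB. That means 8,192 tokens of context add about 1 GiB on top of the weights, and 32,768 tokens add about 4 GiB.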

Which Llama fits your memory

8GB: the largest that fits is Llama 3.2 3B, best case on Apple M1 (8GB).

16GB: the largest that fits is Llama 3.1 8B, best case on Nvidia GeForce RTX 4080 (16GB).

24GB: the largest that fits is Llama 3.1 8B, best case on Nvidia GeForce RTX 4090 (24GB).

32GB: the largest that fits is Llama 3.1 8B, best case on Nvidia GeForce RTX 5090 (32GB).

"Best case" means the most capable device with that much memory, usually a discrete GPU. A Mac with the same amount of memory typically fits roughly one size smaller; see the per-size breakdown on each memory budget page.
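The "largest that fits" rule above amounts to a simple filter. The sketch below keeps every size whose minimum memory, plus optional headroom for context, fits the budget, and returns the one with the biggest requirement. The function name and the headroom parameter are hypothetical. The example lineup holds only the two figures this page states (Llama 3.2 1B at roughly 2 GB and Llama 3.3 70B at roughly 48 GB). The 3B and 8B entries would come from the lineup table. With them included, a 32GB budget should resolve to Llama 3.1 8B as listed above.

```python
from typing import Optional


def largest_that_fits(lineup: list[tuple[str, float]], budget_gb: float,
                      headroom_gb: float = 0.0) -> Optional[str]:
    """Return the model with the largest memory need that still fits the budget, or None."""
    # Assumes a bigger minimum-memory figure means a bigger (more capable) model.
    fitting = [(need_gb, name) for name, need_gb in lineup
               if need_gb + headroom_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting)[1]


# Only the figures this page states; add the 3B and 8B rows from the lineup table.
SOURCED_LINEUP = [("Llama 3.2 1B", 2.0), ("Llama 3.3 70B", 48.0)]

if __name__ == "__main__":
    for budget in (8, 32, 64):
        print(budget, "GB ->", largest_that_fits(SOURCED_LINEUP, budget))
```

The headroom parameter is one way to reserve space for a longer context, since the "needs" figures assume a small context window.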

FAQ

Which Llama size should I run locally?

Pick the largest size your memory allows. In the best case, 8GB runs up to Llama 3.2 3B, while 16GB, 24GB, and 32GB all top out at Llama 3.1 8B, because the next size up (Llama 3.3 70B) needs roughly 48GB. Smaller sizes run faster and leave headroom for context.

What is the smallest Llama model?

Llama 3.2 1B, with 1B parameters. It is about 0.81 GB on disk at Q4_K_M and needs roughly 2 GB of memory to run. It is the one to use on phones, and it runs comfortably on 8 GB machines, although Llama 3.2 3B also fits there.

What is the largest Llama model and what does it need?

Llama 3.3 70B, with 70B parameters. It is about 42.52 GB on disk at Q4_K_M and needs roughly 48 GB of memory to run. That is more than a typical 32 GB desktop offers, so it needs a high-memory Mac or a multi-GPU rig.

Sources

Memory figures are estimates at Q4_K_M; see the methodology page.