localmodel.run

Efficiency · Elo per GB

The most efficient local LLMs

Capability per gigabyte, not raw score. This page ranks models by how much quality they deliver for the memory they need, which is the metric that matters when memory is the limit.

Coverage

Only 16 of 67 tracked models carry a public LMArena Elo, so only those can be scored. Many strong small and distilled models have no arena score and are not shown. Efficiency uses the sourced minimum memory at Q4_K_M.

The efficient frontier

Each step is the lowest-memory model whose Elo beats every model that needs less memory. Read it by budget: for a given amount of spare memory, the last step at or below that amount is the most capable scored model that fits.

Memory floor Elo
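The frontier rule can be sketched in a few lines of Python. This is a minimal illustration, not the site's own code: the names `SCORED` and `efficient_frontier` are made up for the example, and the (Min GB, Elo) pairs are copied from the ranked table on this page, identified by memory and Elo only.

```python
# (min_gb, elo) pairs copied from the "Ranked by Elo per GB" table.
# SCORED and efficient_frontier are illustrative names, not the site's code.
SCORED = [
    (2, 1110), (3, 1114), (4, 1303), (4, 1166),
    (8, 1266), (8, 1211), (8, 1149), (10, 1342),
    (12, 1256), (16, 1357), (20, 1366), (21, 1383),
    (20, 1289), (22, 1347), (48, 1318), (50, 1303),
]


def efficient_frontier(models):
    """Return the (min_gb, elo) steps whose Elo strictly beats every cheaper model."""
    frontier = []
    best_elo = float("-inf")
    # Walk models from least to most memory; at equal memory, try the higher Elo first.
    for min_gb, elo in sorted(models, key=lambda m: (m[0], -m[1])):
        if elo > best_elo:
            frontier.append((min_gb, elo))
            best_elo = elo
    return frontier


for min_gb, elo in efficient_frontier(SCORED):
    print(f"{min_gb:>3} GB  Elo {elo}")
```

On the table's figures this keeps seven steps, at 2, 3, 4, 10, 16, 20 and 21 GB. Every other row is matched or beaten by a model that needs no more memory.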

Ranked by Elo per GB

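| # | Model | Min GB | Elo | Elo/GB |
|---|-------|--------|-----|--------|
| 1 | Llama 3.2 1B | 2 | 1110 | 555 |
| 2 | | 3 | 1114 | 371 |
| 3 | | 4 | 1303 | 326 |
| 4 | | 4 | 1166 | 292 |
| 5 | | 8 | 1266 | 158 |
| 6 | | 8 | 1211 | 151 |
| 7 | | 8 | 1149 | 144 |
| 8 | | 10 | 1342 | 134 |
| 9 | | 12 | 1256 | 105 |
| 10 | | 16 | 1357 | 85 |
| 11 | | 20 | 1366 | 68 |
| 12 | | 21 | 1383 | 66 |
| 13 | | 20 | 1289 | 64 |
| 14 | | 22 | 1347 | 61 |
| 15 | | 48 | 1318 | 27 |
| 16 | | 50 | 1303 | 26 |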

Tiny models top this by asking for almost no memory. For raw quality regardless of size, see the Elo leaderboard; to fit a specific machine, start from your memory budget.
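The Elo/GB column is plain division, which a short Python sketch can reproduce. The names `ROWS` and `elo_per_gb` are hypothetical, and rounding to the nearest whole number is an assumption about how the published column was produced; it happens to match every row here.

```python
# (min_gb, elo, published_elo_per_gb) copied from the ranked table, in table order.
# ROWS and elo_per_gb are illustrative names; the rounding rule is an assumption.
ROWS = [
    (2, 1110, 555), (3, 1114, 371), (4, 1303, 326), (4, 1166, 292),
    (8, 1266, 158), (8, 1211, 151), (8, 1149, 144), (10, 1342, 134),
    (12, 1256, 105), (16, 1357, 85), (20, 1366, 68), (21, 1383, 66),
    (20, 1289, 64), (22, 1347, 61), (48, 1318, 27), (50, 1303, 26),
]


def elo_per_gb(elo, min_gb):
    """Efficiency score: arena Elo divided by minimum memory in GB."""
    return elo / min_gb


# Rank by the unrounded ratio, highest first.
ranked = sorted(ROWS, key=lambda r: elo_per_gb(r[1], r[0]), reverse=True)

for rank, (min_gb, elo, published) in enumerate(ranked, start=1):
    print(f"{rank:>2}  {min_gb:>2} GB  Elo {elo}  Elo/GB {round(elo_per_gb(elo, min_gb))}")
```

Because the divisor is memory, a 2 GB model with an Elo of 1110 scores 555, while a 21 GB model with the highest Elo in the table (1383) scores only 66.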

FAQ

What is the most efficient local LLM?

By capability per gigabyte, Llama 3.2 1B leads: an LMArena Elo of 1110 on roughly 2 GB of memory at Q4_K_M, or about 555 Elo per GB. Tiny models win this metric because they ask for so little memory; the efficient frontier above shows the smallest model that reaches each capability level.

Does this cover every model?

No. Only 16 of 67 tracked models carry a public LMArena Elo, so only those can be scored for efficiency. Many strong small and distilled models have no arena score and are not shown here; this is a ranking of what can be measured, not the whole catalog.

Why measure Elo per gigabyte?

Because on local hardware, memory is the binding constraint. Elo per GB answers a practical question: for the memory you can spare, what is the most capable model that fits? It rewards models that punch above their size.
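The practical question, "what is the most capable model that fits my memory?", is a filter followed by a maximum. A minimal sketch under the same assumptions as the rest of this page: the data are the (Min GB, Elo) pairs from the ranked table, and `SCORED` and `best_fit` are made-up names for illustration.

```python
# (min_gb, elo) pairs copied from the ranked table on this page.
# SCORED and best_fit are illustrative names, not part of any published tool.
SCORED = [
    (2, 1110), (3, 1114), (4, 1303), (4, 1166),
    (8, 1266), (8, 1211), (8, 1149), (10, 1342),
    (12, 1256), (16, 1357), (20, 1366), (21, 1383),
    (20, 1289), (22, 1347), (48, 1318), (50, 1303),
]


def best_fit(models, budget_gb):
    """Return the highest-Elo (min_gb, elo) pair that fits the budget, or None."""
    fitting = [m for m in models if m[0] <= budget_gb]
    if not fitting:
        return None
    # Highest Elo wins; on an Elo tie, prefer the model that needs less memory.
    return max(fitting, key=lambda m: (m[1], -m[0]))


for budget in (1, 8, 16, 24):
    print(budget, "GB ->", best_fit(SCORED, budget))
```

With an 8 GB budget the answer is the 4 GB model at Elo 1303, not any of the 8 GB models, which is the sense in which some models punch above their size.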

Sources

Elo is a snapshot from LMArena around 2026-06-15 and drifts over time. Memory is the sourced Q4_K_M minimum; see methodology.