Open-Weight AI Models Compared

Seven companies now release AI models you can download and run on your own hardware: Meta, Alibaba, DeepSeek, Mistral, Google, Microsoft, and Cohere. Between them, they cover everything from 3B-parameter models that run on a phone to 685B-parameter behemoths that need a data centre.

But “open” doesn’t mean what most people think it means. And the model you choose determines what you can actually do with it, legally and practically. This page tracks the models that matter for sovereign inference: who made them, what licence they carry, how much hardware they need, and whether you can run them without asking anyone’s permission.

For the practical case for (and against) running these locally, see Why Self-Host Your AI. For hardware setup, see Mac Studio DeAI Setup.

Open-weight vs open-source: the distinction that matters

The AI industry uses “open source” loosely. Most models marketed as open source are not. The distinction is simple and it matters.

Truly open source means the model weights ship under a licence approved by the Open Source Initiative (OSI); for the models here, that is Apache 2.0 or MIT. You can use the model commercially, modify it, redistribute it, fine-tune it, and deploy it, with no conditions beyond keeping the licence and copyright notices. Qwen 3, DeepSeek V3.2, DeepSeek R1, Mistral Large 3, Mistral Small 4, and Phi-4 all qualify.

Open weight, restricted licence means you can download and run the weights but the licence limits what you can do. Meta’s Llama licence requires any company with more than 700 million monthly active users to obtain a separate licence from Meta. Google’s Gemma Terms restrict redistribution of fine-tuned models. Cohere’s Command A is licensed for non-commercial use only. These restrictions may not affect individual users today, but they constrain what you can build on top of the models.

For sovereignty purposes, the difference is real. If you’re building infrastructure on top of a model, an Apache 2.0 or MIT licence means nobody can pull the rug. A restricted licence means someone can change the terms.
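The two categories above can be applied mechanically. A minimal sketch, assuming each model's licence is recorded as a short identifier; `classify_licence` and the identifier sets are hypothetical names for illustration, not part of this page's tooling.

```python
# "Apache-2.0", "MIT" and "CC-BY-NC-4.0" are SPDX identifiers; any other
# string (for example a Llama or Gemma licence label) is an assumed label.
OSI_APPROVED = {"Apache-2.0", "MIT"}
NON_COMMERCIAL = {"CC-BY-NC-4.0"}


def classify_licence(licence_id: str) -> str:
    """Map a licence identifier to the category used in the comparison table."""
    if licence_id in OSI_APPROVED:
        return "Open Source"
    if licence_id in NON_COMMERCIAL:
        return "Non-Commercial"
    # Everything else, including unknown licences, is treated as restricted.
    return "Restricted"
```

Defaulting unknown licences to "Restricted" is a deliberate choice in this sketch: a model should only be labelled open source when its licence is positively identified.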

The comparison

Data from HuggingFace API. Last refreshed: 2026-03-28. Downloads are 30-day rolling counts. Run npm run refresh:models to update.

Model	Developer · Family	Params	Context	Licence	Min RAM (Q4)	Downloads	Ollama
Qwen 3 8B	Alibaba · Qwen	8.2B	131K	Open Source	6 GB	9.3M	✓
Qwen 3 32B	Alibaba · Qwen	32.8B	131K	Open Source	20 GB	3.8M	✓
DeepSeek R1	DeepSeek · DeepSeek	685B (37B active)	128K	Open Source	380 GB	2.1M	✓
Phi-4	Microsoft · Phi	14B	16K	Open Source	10 GB	852K	✓
Qwen 3 235B	Alibaba · Qwen	235B (22B active)	131K	Open Source	143 GB	641K	✓
DeepSeek V3.2	DeepSeek · DeepSeek	685B (37B active)	262K	Open Source	380 GB	335K	✓
Mistral Small 4	Mistral AI · Mistral	119B (6.5B active)	256K	Open Source	74 GB	43K	✓
Mistral Large 3	Mistral AI · Mistral	675B (41B active)	256K	Open Source	407 GB	663	✓
Llama 3.1 8B	Meta · Llama	8B	128K	Restricted	6 GB	8.5M	✓
Llama 3.2 3B	Meta · Llama	3.2B	128K	Restricted	2.5 GB	7.0M	✓
Gemma 3 12B	Google · Gemma	12B	128K	Restricted	8 GB	2.5M	✓
Gemma 3 27B	Google · Gemma	27.4B	128K	Restricted	18 GB	1.1M	✓
Llama 3.3 70B	Meta · Llama	70.6B	128K	Restricted	42 GB	412K	✓
Llama 4 Scout	Meta · Llama	109B (17B active)	10M	Restricted	55 GB	255K	✓
Command A	Cohere · Command	111B	256K	Non-Commercial	67 GB	5K	✓
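The Min RAM (Q4) column tracks total parameter count closely. A rough sketch of that relationship, assuming about 4.8 bits per weight (a plausible average for 4-bit quantisation that keeps some layers at higher precision) and ignoring context cache and runtime overhead. This is an approximation for sanity-checking, not the method used to produce the table, and `estimate_q4_weights_gb` is a hypothetical helper.

```python
def estimate_q4_weights_gb(params_billions: float, bits_per_weight: float = 4.8) -> float:
    """Approximate size of the quantised weights in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return params_billions * bytes_per_weight


for name, params in [("Qwen 3 32B", 32.8), ("Llama 3.3 70B", 70.6), ("Qwen 3 235B", 235)]:
    print(f"{name}: ~{estimate_q4_weights_gb(params):.0f} GB")
```

The estimates come out at roughly 20, 42 and 141 GB, close to the 20, 42 and 143 GB listed. For mixture-of-experts models, use the total parameter count rather than the active count: every expert has to be resident in memory even though only some run per token.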

Which model for which job

Not every model suits every task. Hardware constraints and use case determine the right choice.

Laptop or mini PC (8-16GB RAM). Llama 3.2 3B, Qwen 3 8B, Gemma 3 12B, or Phi-4. All fit at Q4 quantisation on a 16GB machine; with only 8GB, stick to Llama 3.2 3B or Qwen 3 8B, because Gemma 3 12B (8GB) and Phi-4 (10GB) leave no headroom. Qwen 3 8B is the strongest all-rounder at this size with Apache 2.0 licensing. Phi-4 punches above its weight on reasoning but has a short 16K context window.

Mac Studio or workstation (32-64GB RAM). Qwen 3 32B is the sweet spot: 20GB at Q4, 22 tok/s on an M4 Max, Apache 2.0 licence, and quality that rivals 70B models from a year ago. Llama 3.3 70B fits at Q4 (42GB) on a 64GB machine but runs at 10-12 tok/s, which feels slow for interactive use. Gemma 3 27B is another strong option at 18GB.

Multi-GPU server or cloud. DeepSeek V3.2 and R1 are frontier-competitive under MIT licence but need 380GB+ RAM. Qwen 3 235B (Apache 2.0, 143GB) is the most attainable of the large unrestricted models, needing well under half the memory of the DeepSeek pair. Mistral Large 3 (Apache 2.0, 407GB) is the European sovereignty option.
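The hardware tiers above amount to a filter over the Min RAM (Q4) column. A small sketch of that filter, using a hand-copied subset of the comparison table; the `Model` class, `models_that_fit` function and the 4GB headroom default are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    licence: str       # "Open Source", "Restricted" or "Non-Commercial"
    min_ram_gb: float  # Min RAM (Q4) from the comparison table


# Subset of the comparison table, copied by hand.
MODELS = [
    Model("Llama 3.2 3B", "Restricted", 2.5),
    Model("Qwen 3 8B", "Open Source", 6),
    Model("Phi-4", "Open Source", 10),
    Model("Gemma 3 27B", "Restricted", 18),
    Model("Qwen 3 32B", "Open Source", 20),
    Model("Llama 3.3 70B", "Restricted", 42),
    Model("Qwen 3 235B", "Open Source", 143),
]


def models_that_fit(ram_gb: float, headroom_gb: float = 4.0,
                    open_source_only: bool = False) -> list[str]:
    """Return models whose Q4 footprint fits in ram_gb, largest first.

    headroom_gb is an assumed allowance for the OS and context cache.
    """
    usable = ram_gb - headroom_gb
    fits = [m for m in MODELS
            if m.min_ram_gb <= usable
            and (not open_source_only or m.licence == "Open Source")]
    return [m.name for m in sorted(fits, key=lambda m: m.min_ram_gb, reverse=True)]
```

For example, `models_that_fit(16)` returns Phi-4, Qwen 3 8B and Llama 3.2 3B, while `models_that_fit(64, open_source_only=True)` puts Qwen 3 32B first.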

Coding and development. Qwen 3 32B and DeepSeek R1 (distilled 32B) both handle code generation well at a size that runs locally. For editor-integrated completion via Continue.dev, a 24-32B model at 22-25 tok/s gives responsive autocomplete.
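To see why throughput matters for interactive work, it helps to turn tokens per second into wall-clock time. A back-of-envelope sketch, assuming a steady decode rate and ignoring prompt processing; the 40-token and 300-token lengths are illustrative guesses at a typical autocomplete and a typical chat answer.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a steady decode rate (prompt processing excluded)."""
    return output_tokens / tokens_per_second


print(f"40-token completion at 22 tok/s: {generation_seconds(40, 22):.1f} s")
print(f"300-token answer at 22 tok/s: {generation_seconds(300, 22):.1f} s")
print(f"300-token answer at 10 tok/s: {generation_seconds(300, 10):.1f} s")
```

Under these assumptions a short completion arrives in about 1.8 seconds at 22 tok/s, while a 300-token answer takes about 13.6 seconds at 22 tok/s and 30 seconds at 10 tok/s.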

RAG and document analysis. Context window matters here. Qwen 3 models support 131K tokens (with YaRN extension). DeepSeek V3.2 offers 262K. Mistral Small 4 supports 256K. For embedding generation, run nomic-embed-text or BGE-M3 alongside your generation model: they’re under 1GB and don’t compete for memory.
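A context window translates directly into how much retrieved text a single prompt can carry. A rough sketch, assuming 512-token chunks and fixed reservations for the instructions and the answer; all three figures are illustrative assumptions, and `chunks_that_fit` is a hypothetical helper.

```python
def chunks_that_fit(context_tokens: int, chunk_tokens: int = 512,
                    reserved_for_prompt: int = 1_000,
                    reserved_for_answer: int = 2_000) -> int:
    """Number of retrieved chunks that fit once prompt and answer space is reserved."""
    available = context_tokens - reserved_for_prompt - reserved_for_answer
    return max(available // chunk_tokens, 0)


# Window sizes assume "131K" means 131,072 tokens and "16K" means 16,384.
print(chunks_that_fit(131_072))  # 131K window, as listed for Qwen 3
print(chunks_that_fit(16_384))   # 16K window, as listed for Phi-4
```

With these numbers the 131K window holds 250 chunks and the 16K window holds 26, which is the practical gap between the two for document analysis.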

Privacy-first use cases. Any model running via Ollama or LM Studio on your own hardware provides Level 4 privacy (the highest on our privacy spectrum). No prompts leave the machine. For situations where data genuinely cannot touch an external server, local inference with an open-source model is the only option that provides both legal clarity and architectural guarantees.
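In practice, local inference means sending prompts to a server on localhost. A minimal sketch against Ollama's local HTTP API, assuming Ollama is running on its default port and that a model has been pulled under the tag `qwen3:8b` (the tag is an assumption; substitute whatever `ollama list` shows on your machine).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "qwen3:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3:8b") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Summarise the key risks in this clause.")` only works with the server running. The request targets localhost, so the prompt is not sent to any remote host.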

The sovereignty lens

Three questions determine whether a model is sovereign-compatible:

Can you run it without permission? Apache 2.0 and MIT: yes. Llama Community and Gemma Terms: conditionally. CC-BY-NC: not commercially.
Can you build on it without risk? If the licence allows commercial use and modification without restrictions, you can build products, fine-tune for clients, and redistribute. If not, your project depends on someone else’s licence terms staying favourable.
Can you run it on hardware you control? Models under 70GB at Q4 run on consumer hardware. Above that, you need data centre infrastructure, which may mean renting from a cloud provider and reintroducing a dependency.
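The three questions can be expressed as a simple checklist. A sketch under simplifying assumptions: a licence passes the first two questions only if it is Apache 2.0 or MIT (conditional licences are treated as failing), and the hardware question uses the 70GB Q4 threshold quoted above. `Assessment` and `assess` are hypothetical names.

```python
from dataclasses import dataclass

CONSUMER_RAM_LIMIT_GB = 70  # threshold quoted in the third question


@dataclass(frozen=True)
class Assessment:
    runs_without_permission: bool
    builds_without_risk: bool
    runs_on_own_hardware: bool

    @property
    def sovereign(self) -> bool:
        """True only when all three questions are answered yes."""
        return all((self.runs_without_permission,
                    self.builds_without_risk,
                    self.runs_on_own_hardware))


def assess(licence: str, min_ram_gb: float) -> Assessment:
    """Answer the three questions from a licence identifier and Q4 memory need."""
    # Simplification: conditional licences count as "no" for both licence questions.
    unrestricted = licence in {"Apache-2.0", "MIT"}
    return Assessment(unrestricted, unrestricted, min_ram_gb < CONSUMER_RAM_LIMIT_GB)
```

For instance, `assess("Apache-2.0", 20).sovereign` is true for a model like Qwen 3 32B, while `assess("MIT", 380).sovereign` is false because the full DeepSeek models exceed the hardware threshold.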

The models that score highest on all three: Qwen 3 32B (Apache 2.0, 20GB, runs on a Mac), Phi-4 (MIT, 10GB, runs on a laptop), and Qwen 3 8B (Apache 2.0, 6GB, runs on almost anything). DeepSeek’s MIT-licensed models are excellent but too large for consumer hardware in their full form; the distilled versions solve this.

How we keep this current

Open-weight models release constantly. This page uses a hybrid approach:

Automated data refresh. Download counts and community metrics update via the HuggingFace API (npm run refresh:models). The “last refreshed” date at the top of the comparison shows when data was last pulled.
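The refresh step is roughly a fetch and a format. The sketch below is not the actual `refresh:models` script, whose implementation is not shown here; it assumes the Hugging Face Hub model endpoint returns a JSON object with a `downloads` field, and the repo ID in the usage note is an example.

```python
import json
import urllib.request


def fetch_downloads(repo_id: str) -> int:
    """Read the download count for a model repo from the Hugging Face Hub API."""
    url = f"https://huggingface.co/api/models/{repo_id}"
    with urllib.request.urlopen(url) as resp:
        return int(json.loads(resp.read())["downloads"])


def format_downloads(count: int) -> str:
    """Format counts the way the comparison table does: 9.3M, 852K, 663."""
    if count >= 1_000_000:
        return f"{count / 1_000_000:.1f}M"
    if count >= 1_000:
        return f"{count / 1_000:.0f}K"
    return str(count)
```

A refresh for one row would then be `format_downloads(fetch_downloads("Qwen/Qwen3-8B"))`, which needs network access. The formatter can be tested offline.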

Manual editorial updates. New model families, licence changes, hardware requirements, and sovereignty assessments are updated during our monthly review process. We add models when they’re significant enough to change the comparison, not every time someone releases a 7B variant.

What we track. 15 models across seven families, chosen for relevance to sovereign and local inference. We exclude models that are research-only, not available on Ollama or LM Studio, or too niche to matter for our audience. If a model you care about is missing, it’s probably because it doesn’t meet one of those criteria.

For how these models fit into a broader sovereignty strategy, see The Sovereignty Stack.