Open-Weight AI Models Compared
A practical comparison of the best open-weight AI models for local and sovereign inference. Llama, Qwen, DeepSeek, Mistral, Gemma and more, with licence clarity and hardware requirements. Updated monthly.
Seven companies now release AI models you can download and run on your own hardware: Meta, Alibaba, DeepSeek, Mistral, Google, Microsoft, and Cohere. Between them, they cover everything from 3B-parameter models that run on a phone to 685B-parameter behemoths that need a data centre.
But “open” doesn’t mean what most people think it means. And the model you choose determines what you can actually do with it, legally and practically. This page tracks the models that matter for sovereign inference: who made them, what licence they carry, how much hardware they need, and whether you can run them without asking anyone’s permission.
For the practical case for (and against) running these locally, see Why Self-Host Your AI. For hardware setup, see Mac Studio DeAI Setup.
Open-weight vs open-source: the distinction that matters
The AI industry uses “open source” loosely. Most models marketed as open source are not. The distinction is simple and it matters.
Truly open source means the model weights ship under a licence approved by the Open Source Initiative (OSI), typically Apache 2.0 or MIT. You can use the model commercially, modify it, redistribute it, fine-tune it, and deploy it without restriction. Qwen 3, DeepSeek V3.2, DeepSeek R1, Mistral Large 3, Mistral Small 4, and Phi-4 all qualify.
Open weight, restricted licence means you can download and run the weights but the licence limits what you can do. Meta’s Llama models restrict use by applications with more than 700 million monthly active users. Google’s Gemma Terms restrict redistribution of fine-tuned models. Cohere’s Command A is licensed for non-commercial use only. These restrictions may not affect individual users today, but they constrain what you can build on top of them.
For sovereignty purposes, the difference is real. If you’re building infrastructure on top of a model, an Apache 2.0 or MIT licence means nobody can pull the rug. A restricted licence means someone can change the terms.
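In code form, that rug-pull risk reduces to a lookup. A minimal sketch in Python; the tier labels and licence identifier strings are our own illustrative choices, not official SPDX or HuggingFace identifiers:

```python
# Hypothetical mapping from licence identifiers to coarse sovereignty tiers.
# Both the keys and the tier labels are illustrative, not official.
LICENCE_TIERS = {
    "apache-2.0": "unrestricted",      # OSI-approved: use, modify, redistribute
    "mit": "unrestricted",             # OSI-approved
    "llama-community": "restricted",   # caps on very large deployments
    "gemma-terms": "restricted",       # limits on redistributing fine-tunes
    "cc-by-nc-4.0": "non-commercial",  # no commercial use at all
}

def sovereignty_tier(licence: str) -> str:
    """Classify a licence identifier into a coarse sovereignty tier."""
    return LICENCE_TIERS.get(licence.lower(), "unknown")

print(sovereignty_tier("Apache-2.0"))  # unrestricted
```

Only the "unrestricted" tier guarantees nobody can change the terms under you; everything else means re-reading the licence before you ship.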
The comparison
Data from HuggingFace API. Last refreshed: 2026-03-28. Downloads are 30-day rolling counts.
Run `npm run refresh:models` to update.
| Model | Maker | Params | Context | Licence | Min RAM (Q4) | Downloads | Ollama |
|---|---|---|---|---|---|---|---|
| Qwen 3 8B | Alibaba | 8.2B | 131K | Open Source | 6 GB | 9.3M | ✓ |
| Qwen 3 32B | Alibaba | 32.8B | 131K | Open Source | 20 GB | 3.8M | ✓ |
| DeepSeek R1 | DeepSeek | 685B (37B active) | 128K | Open Source | 380 GB | 2.1M | ✓ |
| Phi-4 | Microsoft | 14B | 16K | Open Source | 10 GB | 852K | ✓ |
| Qwen 3 235B | Alibaba | 235B (22B active) | 131K | Open Source | 143 GB | 641K | ✓ |
| DeepSeek V3.2 | DeepSeek | 685B (37B active) | 262K | Open Source | 380 GB | 335K | ✓ |
| Mistral Small 4 | Mistral AI | 119B (6.5B active) | 256K | Open Source | 74 GB | 43K | ✓ |
| Mistral Large 3 | Mistral AI | 675B (41B active) | 256K | Open Source | 407 GB | 663 | ✓ |
| Llama 3.1 8B | Meta | 8B | 128K | Restricted | 6 GB | 8.5M | ✓ |
| Llama 3.2 3B | Meta | 3.2B | 128K | Restricted | 2.5 GB | 7.0M | ✓ |
| Gemma 3 12B | Google | 12B | 128K | Restricted | 8 GB | 2.5M | ✓ |
| Gemma 3 27B | Google | 27.4B | 128K | Restricted | 18 GB | 1.1M | ✓ |
| Llama 3.3 70B | Meta | 70.6B | 128K | Restricted | 42 GB | 412K | ✓ |
| Llama 4 Scout | Meta | 109B (17B active) | 10M | Restricted | 55 GB | 255K | ✓ |
| Command A | Cohere | 111B | 256K | Non-Commercial | 67 GB | 5K | ✓ |
Which model for which job
Not every model suits every task. Hardware constraints and use case determine the right choice.
Laptop or Mini PC (8-16GB RAM). Llama 3.2 3B, Qwen 3 8B, Gemma 3 12B, or Phi-4. All fit comfortably at Q4 quantisation. Qwen 3 8B is the strongest all-rounder at this size with Apache 2.0 licensing. Phi-4 punches above its weight on reasoning but has a short 16K context window.
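The Min RAM (Q4) figures in the comparison table can be approximated with a back-of-the-envelope formula. A sketch, assuming roughly 4.8 bits per weight (typical of Q4_K_M-style quantisation) plus an illustrative 1 GB allowance for KV cache and runtime buffers; both constants are our assumptions, not measured values:

```python
def q4_ram_gb(params_billion: float,
              bits_per_weight: float = 4.8,   # ~Q4_K_M; an assumption, not measured
              overhead_gb: float = 1.0) -> int:  # illustrative KV-cache/runtime allowance
    """Rough minimum RAM, in GB, for a Q4-quantised model."""
    weights_gb = params_billion * bits_per_weight / 8  # bits -> bytes -> GB
    return round(weights_gb + overhead_gb)

print(q4_ram_gb(8.2))   # 6  (close to the table's Qwen 3 8B figure)
print(q4_ram_gb(70.6))  # 43 (the table lists 42 GB for Llama 3.3 70B)
```

The estimate drifts for mixture-of-experts models, where all parameters must be resident even though only the active subset computes each token.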
Mac Studio or workstation (32-64GB RAM). Qwen 3 32B is the sweet spot: 20GB at Q4, 22 tok/s on an M4 Max, Apache 2.0 licence, and quality that rivals 70B models from a year ago. Llama 3.3 70B fits at Q4 (42GB) but runs at 10-12 tok/s, which feels slow for interactive use. Gemma 3 27B is another strong option at 18GB.
Multi-GPU server or cloud. DeepSeek V3.2 and R1 are frontier-competitive under MIT licence but need 380GB+ RAM. Qwen 3 235B (Apache 2.0, 143GB) is the largest model you can run with genuinely no restrictions. Mistral Large 3 (Apache 2.0, 407GB) is the European sovereignty option.
Coding and development. Qwen 3 32B and DeepSeek R1 (distilled 32B) both handle code generation well at a size that runs locally. For editor-integrated completion via Continue.dev, a 24-32B model at 22-25 tok/s gives responsive autocomplete.
RAG and document analysis. Context window matters here. Qwen 3 models support 131K tokens (with YaRN extension). DeepSeek V3.2 offers 262K. Mistral Small 4 supports 256K. For embedding generation, run nomic-embed-text or BGE-M3 alongside your generation model: they’re under 1GB and don’t compete for memory.
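However the embeddings are generated, retrieval comes down to comparing vectors. A minimal cosine-similarity sketch; the three-dimensional vectors are toy stand-ins for the several-hundred-dimensional output of a model like nomic-embed-text:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy vectors standing in for real embedding output.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
storm = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, storm))  # True
```

A RAG pipeline embeds every document chunk once, embeds the query at ask time, and feeds the highest-similarity chunks into the generation model's context window.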
Privacy-first use cases. Any model running via Ollama or LM Studio on your own hardware provides Level 4 privacy (the highest on our privacy spectrum). No prompts leave the machine. For situations where data genuinely cannot touch an external server, local inference with an open-source model is the only option that provides both legal clarity and architectural guarantees.
The sovereignty lens
Three questions determine whether a model is sovereign-compatible:
1. Can you run it without permission? Apache 2.0 and MIT: yes. Llama Community and Gemma Terms: conditionally. CC-BY-NC: not commercially.
2. Can you build on it without risk? If the licence allows commercial use and modification without restrictions, you can build products, fine-tune for clients, and redistribute. If not, your project depends on someone else’s licence terms staying favourable.
3. Can you run it on hardware you control? Models under 70GB at Q4 run on consumer hardware. Above that, you need data centre infrastructure, which may mean renting from a cloud provider and reintroducing a dependency.
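The three questions fold into a single check. The 70GB consumer-hardware threshold comes from the question above, but the function itself, and its licence labels, are an illustrative sketch:

```python
PERMISSIONLESS = {"apache-2.0", "mit"}  # illustrative labels for OSI-approved licences
CONSUMER_LIMIT_GB = 70  # above this, Q4 weights need data-centre hardware

def sovereign_compatible(licence: str, min_ram_gb: float) -> bool:
    """True when all three questions get a yes: run without permission,
    build without risk, and run on hardware you control."""
    return licence in PERMISSIONLESS and min_ram_gb <= CONSUMER_LIMIT_GB

print(sovereign_compatible("apache-2.0", 20))   # True  (e.g. Qwen 3 32B)
print(sovereign_compatible("mit", 380))         # False (e.g. DeepSeek R1: too large)
```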
The models that score highest on all three: Qwen 3 32B (Apache 2.0, 20GB, runs on a Mac), Phi-4 (MIT, 10GB, runs on a laptop), and Qwen 3 8B (Apache 2.0, 6GB, runs on almost anything). DeepSeek’s MIT-licensed models are excellent but too large for consumer hardware in their full form; the distilled versions solve this.
How we keep this current
Open-weight models release constantly. This page uses a hybrid approach:
Automated data refresh. Download counts and community metrics update via the HuggingFace API (`npm run refresh:models`). The “last refreshed” date at the top of the comparison shows when data was last pulled.
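Most of a refresh script's work is extraction. A sketch of that step, run against a hand-written sample in the shape the HuggingFace model endpoint (`/api/models/{repo_id}`) returns; the field names reflect that API as we understand it, and the sample numbers are invented:

```python
def extract_metrics(api_response: dict) -> dict:
    """Pull the fields this comparison tracks out of a HuggingFace
    /api/models/{repo_id} response body."""
    return {
        "id": api_response.get("id", "unknown"),
        "downloads_30d": api_response.get("downloads", 0),  # HF reports a rolling count
        "likes": api_response.get("likes", 0),
    }

# Hand-written sample in the API's shape; numbers are invented.
sample = {"id": "Qwen/Qwen3-8B", "downloads": 9_300_000, "likes": 2100}
print(extract_metrics(sample)["downloads_30d"])  # 9300000
```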
Manual editorial updates. New model families, licence changes, hardware requirements, and sovereignty assessments are updated during our monthly review process. We add models when they’re significant enough to change the comparison, not every time someone releases a 7B variant.
What we track. 15 models across seven families, chosen for relevance to sovereign and local inference. We exclude models that are research-only, not available on Ollama or LM Studio, or too niche to matter for our audience. If a model you care about is missing, it’s probably because it doesn’t meet one of those criteria.
For how these models fit into a broader sovereignty strategy, see The Sovereignty Stack.