Three Bets on Open Weights
Three Chinese labs hit the open-weight frontier in a fortnight. Moonshot raised prices, DeepSeek dropped them, Alibaba's closed flagship sits at #12 on the only independent index. The story has a more interesting shape than the headlines suggest.
Three things happened in the space of a fortnight. Moonshot shipped Kimi K2.6 on 20 April. Alibaba shipped Qwen 3.6 across three SKUs between 16 and 22 April, holding the flagship closed-weights as it has been since Qwen2-Max in June 2024. DeepSeek shipped V4 on 24 April, open-sourced same day, with Huawei integration baked in. Any one of them is a story. Together they’re a fork in the road.
The lazy headline writes itself: open weights are catching up. The honest version is more useful. Three Chinese labs just made three different bets on what “open” means in commercial AI in 2026. Moonshot is open at the frontier and pricing premium. DeepSeek is open at the frontier and racing to the bottom on price, with hardware sovereignty as an upside option. Alibaba is open below the frontier and closed at the top, except their closed flagship isn’t actually at the frontier.
This is the moment to read the divergence, not the convergence.
What Kimi K2.6 Actually Is
Kimi K2.6 went public on 20 April 2026. It’s a one-trillion-parameter mixture-of-experts (MoE) model with 32 billion active parameters per token, a 256K context window, and native multimodal support for text, image, and experimental video. Moonshot released it under a modified MIT licence with a UI attribution clause that triggers at 100 million MAUs or $20 million per month in revenue. For everyone reading this, that doesn’t apply.
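The “active parameters” framing is the key to how a one-trillion-parameter model stays servable: in an MoE layer, a small router picks a handful of expert sub-networks per token, and only those experts’ weights do any work. A toy sketch of top-k routing, with illustrative shapes and counts rather than K2.6’s actual configuration:

```python
# Illustrative top-k MoE routing: only the chosen experts' parameters
# touch each token, which is how a model with huge total parameters can
# run with a much smaller "active" count per token. All sizes are toys,
# not K2.6's real configuration.
import numpy as np

rng = np.random.default_rng(0)

n_experts, top_k, d_model = 8, 2, 16

# Each "expert" here is a tiny feed-forward layer (toy: one weight matrix).
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector x through its top_k experts."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]                        # chosen expert indices
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top)), top

x = rng.standard_normal(d_model)
y, chosen = moe_forward(x)

# Active fraction per token: top_k of n_experts expert blocks do any compute.
print(f"experts used: {sorted(chosen.tolist())}, active fraction: {top_k / n_experts:.2f}")
```

The same logic, scaled up, is how 32 billion of a trillion parameters are live for any given token.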
The architecture ships in four variants: Instant, Thinking, Agent, and Agent Swarm. Agent Swarm is the new thing.
Fact: Agent Swarm scales K2.5’s existing 100-sub-agent / 1,500-step capability up to 300 sub-agents across 4,000 coordinated steps, with K2.6 self-reporting 86.3% on BrowseComp Swarm against GPT-5.4’s 78.4% under external orchestration.
Take: This is the shape of the next year of inference. Single-model agents are a rehearsal. Swarms are the performance. If you care about sovereignty, you want the swarm running on weights you control, not an API you can be cut off from.
The independent benchmark picture is sharper than the marketing. Artificial Analysis posted evaluations on 21 April. K2.6 scores 54 on Intelligence Index v4.0, ranked fourth. Anthropic, Google, and OpenAI tie at 57. That’s a three-point gap, not a lead. On agentic axes the story tightens: GDPval-AA Elo jumped 211 points from K2.5 to K2.6, a substantial step-change for a six-week release cycle. τ²-Bench Telecom at 96% is leading-tier agentic performance, independently verified. Hallucination at 39% is middle of the pack. Not a win.
Aider, the official SWE-bench leaderboard, and LiveCodeBench haven’t posted independent K2.6 numbers yet. So when you see 89.6 on LiveCodeBench v6 or 80.2 on SWE-bench Verified attributed to K2.6, those trace back to Moonshot’s own evaluation framework. Treat as self-reported until independent replication lands.
The honest summary: on the only fully independent index that’s posted scores so far (Artificial Analysis), K2.6 is the leading open-weight model and ranks fourth overall. It is not the best model full stop. The closed frontier still wins the aggregate index by roughly three points, and replication on Aider, LMArena, and the official SWE-bench boards is still pending.
One practical caveat. The released weights are 595 GB. This is not a model you run on a laptop. It’s a model you run on a multi-GPU rig, or one you consume through decentralised inference providers who’ve stood up the hardware. “Open” here means auditable and self-hostable by serious operators, not portable to your desk.
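A back-of-envelope check on what “multi-GPU rig” means, using the 595 GB figure and a common 80 GB accelerator. KV cache, activations, and framework overhead are real costs this floor ignores:

```python
# Back-of-envelope: minimum GPU count just to hold 595 GB of weights on
# 80 GB accelerators. Real deployments need headroom for KV cache and
# activations, so treat this as a floor, not a build plan.
import math

weights_gb = 595
gpu_mem_gb = 80

min_gpus = math.ceil(weights_gb / gpu_mem_gb)
print(f"minimum GPUs for weights alone: {min_gpus}")  # -> 8
```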
The Price Tag On The Lead
Moonshot raised prices alongside the release. K2.6 API pricing is now $0.95 per million input tokens on a cache miss (up from roughly $0.60 for K2.5), $0.16 on a cache hit, and $4.00 per million output tokens (up from roughly $2.50). In CNY, per 36kr, that’s ¥6.5 per million input and ¥27 per million output, a 58 to 62 percent bump.
That’s not “open source wins on price.” That’s Moonshot pricing for the lead, and pricing it before V4 lands. Which it now has.
DeepSeek V4: Open At The Frontier, Cheap As Chips
DeepSeek V4 Preview shipped on 24 April 2026, open-sourced the same day under MIT. Two models. V4-Pro is 1.6 trillion total parameters with 49 billion active per token. V4-Flash is 284 billion total with 13 billion active. Both have a 1 million token context window. Both are text-only.
The architecture introduces Hybrid Attention (Compressed Sparse Attention plus Heavily Compressed Attention) and Manifold-Constrained Hyper-Connections, trained with the Muon optimiser on 32 trillion-plus tokens. DeepSeek claims 27 percent of V3.2’s single-token inference FLOPs and 10 percent of the KV cache for 1M-token contexts. If those numbers hold up under independent inspection, V4 is a real efficiency step, not just a parameter increase.
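The KV-cache claim is where the 1M-token context becomes practical or not. The standard uncompressed footprint is 2 (K and V) × layers × KV heads × head dim × bytes per value, per token. DeepSeek hasn’t disclosed V4’s attention dimensions, so the numbers below are hypothetical placeholders, chosen only to show the scale a 10x reduction operates on:

```python
# Why KV-cache compression matters at 1M-token context. Standard
# per-token KV footprint: 2 (K and V) * layers * kv_heads * head_dim
# * bytes_per_value. All dimensions are hypothetical placeholders,
# not V4's disclosed configuration.
layers, kv_heads, head_dim, bytes_per_val = 60, 8, 128, 2  # 2 bytes = BF16

per_token = 2 * layers * kv_heads * head_dim * bytes_per_val  # bytes per token
full_ctx_gb = per_token * 1_000_000 / 1e9                     # at 1M tokens

print(f"per-token KV: {per_token / 1024:.0f} KiB")
print(f"uncompressed KV cache at 1M tokens: {full_ctx_gb:.0f} GB")
print(f"at the claimed 10 percent: {full_ctx_gb * 0.10:.0f} GB")
```

Even with made-up dimensions, the point survives: at 1M tokens an uncompressed cache is hundreds of gigabytes per sequence, which is why a 10x reduction is an economics claim, not a detail.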
V4-Pro benchmarks (self-reported on the model card): MMLU-Pro 87.5, GPQA Diamond 90.1, SWE-bench Verified 80.6, LiveCodeBench 93.5, BrowseComp 83.4, Codeforces 3206. One number worth flagging: HLE with tools is 48.2. K2.6 leads at 54.0. So even on DeepSeek’s own card, V4-Pro doesn’t beat K2.6 across the board. It beats K2.6 on coding and math, trails on broad reasoning under tools.
Now the price story. DeepSeek is running V4-Pro at a 75 percent promotional discount until 5 May 2026. At promo: $0.435 per million input tokens on a cache miss, $0.87 per million output. V4-Flash, no promo, is $0.14 input on a miss, $0.28 output. That’s roughly 4.6x cheaper than K2.6 on output at promo, and roughly 14x cheaper if you can route to Flash for less reasoning-heavy work.
Fact: V4-Pro’s promo output pricing ($0.87/M) is 78 percent below K2.6’s ($4.00/M). V4-Flash is 93 percent below.
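The percentages in the Fact above, and the multiples quoted elsewhere in this piece, are straight arithmetic on the published per-million output rates. A quick check, taking the promo rates at face value:

```python
# Output prices per million tokens, as quoted in the text.
# V4-Pro is at its promotional rate, valid until 5 May 2026.
k26, v4_pro, v4_flash, qwen_max = 4.00, 0.87, 0.28, 7.80

print(f"V4-Pro promo vs K2.6:   {100 * (1 - v4_pro / k26):.0f}% below")
print(f"V4-Flash vs K2.6:       {100 * (1 - v4_flash / k26):.0f}% below")
print(f"K2.6 over V4-Pro promo: {k26 / v4_pro:.1f}x")
print(f"Qwen Max over V4-Pro:   {qwen_max / v4_pro:.1f}x")
print(f"Qwen Max over K2.6:     {qwen_max / k26:.1f}x")
```

The post-promo V4-Pro rate is the number to re-run this with after 5 May.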
Take: Two open-weight Chinese frontier labs, two opposite pricing strategies in the same fortnight. One charges what the lead is worth. The other charges what their cost basis allows. Whichever framing is right will determine who’s still selling K2.6 access in three months.
The Huawei Question, Sharpened
The pre-launch coverage said V4 would train on Huawei’s Ascend 950PR. That story needs a more careful version now V4 has shipped.
Here’s what’s actually confirmed. Huawei has stated V4 is fully supported on Ascend 950-based supernode clusters for inference, and that Huawei chips were used for “part of V4-Flash’s training.” DeepSeek itself declined to disclose training hardware on the model card. The Information reports V4 development was delayed while DeepSeek worked closely with Huawei and Cambricon to rewrite the underlying architecture for domestic Chinese accelerators, in effect decoupling parts of the V4 stack from CUDA.
So the honest framing is narrower than “first frontier model trained on Ascend.” It’s: V4 was architecturally rebuilt for Chinese hardware, V4-Flash had partial Ascend training (per Huawei, not per DeepSeek), inference at scale is fully supported on Ascend supernodes, training disclosure for V4-Pro remains open. That’s still a meaningful sovereignty story. Huawei is targeting 750,000 Ascend 950PR units in 2026. ByteDance reportedly committed around $5.6 billion in orders. Alibaba confirmed large orders. The parallel compute stack is funded.
What this changes for the reader. Open weights on Nvidia was always a half-sovereignty story: you controlled the model, you didn’t control the chip. A frontier-grade Chinese open-weight model with confirmed Huawei inference support and probable Huawei training is the first credible attempt at a full-stack alternative to the US AI supply chain. It is not the finished article. It is the first credible attempt.
The catch is the same as for K2.6. None of this hardware is on your desk. Ascend 950PR is hyperscale-only. What it changes is the set of providers who can build independent inference infrastructure without routing through US export controls. That’s upstream of anything an end user touches, but it’s where the sovereignty story has to be won.
Qwen 3.6: The Bet That Doesn’t Match The Pattern
Alibaba shipped three models in the Qwen 3.6 family across April. The pattern doesn’t match Moonshot or DeepSeek’s, and that’s the most interesting part.
Qwen3.6-35B-A3B landed on 16 April under Apache 2.0. Open weights, MoE with 35 billion total and 3 billion active parameters per token, 256K native context (extensible to 1 million with YaRN scaling), genuinely multimodal across text, image, and video. A hybrid Gated DeltaNet plus Gated Attention architecture that’s a real departure from standard transformer designs. Self-reported benchmarks: SWE-bench Verified 73.4, MMLU-Pro 85.2, GPQA Diamond 86.0, AIME 2026 92.7. For 3 billion active parameters, that’s punching well above its weight class.
Qwen3.6-27B followed on 22 April. Apache 2.0, dense not MoE, 256K context, multimodal. The point of interest is the agentic coding numbers compared against frontier closed models: SWE-bench Verified 77.2 against Claude 4.5 Opus at 80.9. SWE-bench Pro 53.5 against Claude at 57.1. Terminal-Bench 2.0 dead heat at 59.3. These are Alibaba’s self-reports, not independent, but they’re consistent enough across benchmarks that “27B dense Chinese open model competitive with Claude 4.5 Opus on agentic coding” is at minimum a defensible characterisation.
Then on 20 April, between those two open releases, Alibaba dropped Qwen3.6-Max-Preview. No open weights. Hosted only. This continues a pattern Alibaba started with Qwen2-Max in June 2024 and tightened through Qwen3-Max-Preview last September: keep the Max-tier flagship closed, open-source the smaller SKUs. The free tier of Qwen Code shut down the same day. The closed-flagship strategy isn’t a new pivot; it’s been the strategy for nearly two years. What’s new is that it’s no longer paired with frontier-grade quality.
And here’s where it gets pointed. Artificial Analysis put Qwen3.6-Max-Preview at 52 on Intelligence Index, ranked 12th overall. K2.6 is at 54, ranked 4th. The closed frontier (Anthropic, Google, OpenAI) ties at 57. So on the only independent intelligence index that has scored Qwen3.6-Max-Preview as of writing, Alibaba’s closed flagship sits behind Moonshot’s open flagship and well behind the actual closed frontier. Pricing for Max-Preview, per Artificial Analysis, is $1.30 per million input tokens and $7.80 per million output. Roughly 9x V4-Pro’s promo output, roughly 2x K2.6’s.
Fact: Alibaba’s closed Max-Preview ranks #12 on Artificial Analysis Intelligence Index. Their own open Qwen3.6-27B reportedly matches Claude 4.5 Opus on Terminal-Bench. The open mid-tier model is the stronger statement of Alibaba’s actual capability.
Take: If you’re going to close your weights at the frontier, the model needs to be at the frontier. Otherwise you’re just gating mid-tier capability and charging frontier prices for it. Max-Preview is the canary worth watching for the open-weights thesis. Not because closing is wrong, but because closing without quality leadership is just monetisation.
Alibaba has not disclosed parameters, training compute, or strategic reasoning. The community has filled the gap with the obvious read: Alibaba Cloud needs to monetise Model Studio, an open Apache flagship makes that hard, so the flagship goes closed and the open SKUs hold the developer mindshare. Whether that’s stable in the face of Moonshot and DeepSeek refusing to make the same trade is the question worth tracking.
Three Bets, One Fortnight
Step back. Three Chinese labs, three flagship moves in two weeks, three different theories of where commercial open AI is going.
Moonshot bet that quality leadership is worth premium pricing. K2.6 leads the open-weight tier on Artificial Analysis Intelligence Index, the one fully independent index that’s scored it so far, and Moonshot raised prices 60 percent to charge for it. This works if Aider, LMArena, and the official SWE-bench boards confirm the lead and customers stick at $4 per million output tokens against $0.87 from DeepSeek. It doesn’t work if either condition fails.
DeepSeek bet that the cheapest credible frontier wins on a long enough timeline. V4-Pro at promotional pricing is 78 percent below K2.6 on output. V4-Flash is 93 percent below. The Huawei integration adds a sovereignty narrative on top, even if the training disclosure is narrower than Reuters first reported. This works if cost compounds into market share, and if Ascend supply meets the demand the parallel chain is provisioning for.
Alibaba bet they can split the field: keep developer mindshare with strong open mid-tier releases, monetise Model Studio with a closed flagship. The wrinkle is the closed flagship isn’t actually at the frontier. So Alibaba is asking buyers to pay frontier prices for #12 on the Intelligence Index. The bet only stabilises if Max-Preview’s quality jumps materially in the production release, or if Alibaba Cloud’s enterprise channel is sticky enough to absorb that gap.
The cross-cutting observation: all three bets only work if the open mid-tier doesn’t eat the closed frontier. Qwen3.6-27B comparable to Claude 4.5 Opus on Terminal-Bench is the most aggressive open-weight statement of the fortnight, and it came from the lab that holds its flagship closed. That contradiction is the story. Either Alibaba is right that mid-tier open and frontier closed can coexist, or the open 27B puts pricing pressure on Max-Preview that Alibaba can’t sustain.
Position
Bearish on the “closed frontier is structurally ahead” thesis that held until February. The independent gap is three points and shrinking. Bearish on Qwen3.6-Max-Preview as a commercial product at current pricing relative to both K2.6 and V4. Cautiously bullish on V4 as the default open model for serious operators on cost-per-token grounds, pending independent benchmark replication. K2.6 holds the quality lead until Aider and the official SWE-bench boards say otherwise, but Moonshot’s pricing power has a clock on it.
For sovereignty: V4 with Huawei integration is the first credible full-stack alternative, even if the training story is narrower than the headlines. Watch the tech report, the contracts ByteDance and Alibaba have placed, and whether the Ascend supply chain delivers at the unit volumes Huawei has guided.
Three watchpoints over the next month. One: independent benchmark replication of K2.6, V4-Pro, and the Qwen 3.6 open SKUs on Aider, LMArena, and the official leaderboards. If Qwen 27B confirms its self-reported numbers against Claude 4.5 Opus, the open mid-tier story gets sharper. Two: whether DeepSeek’s promotional V4-Pro pricing holds past 5 May or normalises higher. The promo is the headline; the post-promo rate is the actual product. Three: whether any of the three labs drift on licence terms with future releases. Open today is not a guarantee of open tomorrow.
Keep watching the Hugging Face repos. The press releases lag.