Latest open artifacts (#19): Qwen 3.5, GLM 5, MiniMax 2.5 — Chinese labs' latest push of the frontier
Welcome to the year of the horse!
It’s been a busy month at the top end of open-weights AI, with new flagship models from Qwen, MiniMax, Z.ai, Ant Ling, and StepFun. Still, all eyes are on DeepSeek V4’s pending release, around which the rumors keep accelerating. Outside of the large frontier models, this issue is a bit lighter on the long tail of niche modalities and model sizes.
With all these new releases, we’re tracking them with our new Relative Adoption Metrics (RAM), a measurement tool that normalizes model downloads relative to peer models in their size class. A RAM score above 1 means the model is on track to be a top-10 all-time downloaded model in its size class. The metric has already proven extremely useful, highlighting underrated models like GPT-OSS, which is off the charts in downloads: the most popular American open-weights model since Llama 3.1. We’re particularly interested to see how early adoption of the smaller Qwen 3.5 dense models compares to Qwen 3, balancing Qwen’s ever-growing brand against a trickier hybrid architecture that can push the limits of some open-source tools.
A summary of the RAM scores for some of the popular models released in late 2025 is below, highlighting Kimi K2 Thinking and some OCR models as clear winners. DeepSeek V3.2 and their other recent large models have wildly underperformed DeepSeek’s earlier 2025 releases.
The time axis here is days since release.
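For the curious, the gist of the normalization can be sketched in a few lines of Python. The size-class buckets and top-10 reference rates below are illustrative placeholders, not our production values:

```python
# Sketch of a RAM-style score. Assumes we track cumulative Hugging Face
# downloads per model; all bucket edges and reference rates are made up
# for illustration.

SIZE_CLASSES = [  # (upper bound on total params in billions, class label)
    (4, "<=4B"), (15, "<=15B"), (40, "<=40B"),
    (120, "<=120B"), (float("inf"), ">120B"),
]

# Hypothetical lifetime-average download rate (downloads/day) of the
# 10th most-downloaded model ever in each size class.
TOP10_RATE = {"<=4B": 90_000, "<=15B": 40_000, "<=40B": 15_000,
              "<=120B": 6_000, ">120B": 2_500}

def size_class(total_params_b: float) -> str:
    """Map a model's total parameter count to its peer size class."""
    for upper, label in SIZE_CLASSES:
        if total_params_b <= upper:
            return label

def ram_score(downloads: int, days_since_release: int,
              total_params_b: float) -> float:
    """Normalize a model's download rate by its size class's top-10 pace.
    A score above 1.0 means the model is tracking toward the all-time
    top 10 most-downloaded models of its size."""
    rate = downloads / max(days_since_release, 1)
    return rate / TOP10_RATE[size_class(total_params_b)]

# e.g., a 397B-total-parameter model with 600k downloads 10 days in:
print(round(ram_score(600_000, 10, 397), 2))  # -> 24.0
```

The key design choice is normalizing a download *rate* rather than a raw count, so a model that is ten days old can be compared fairly against models that have been up for years.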
- Qwen3.5-397B-A17B by Qwen: The long-awaited update to Qwen is finally here. It comes in various sizes, from 0.8B to 27B (dense) and 35B-A3B to 397B-A17B (MoE), some of them even with base models. All of them are multimodal, use reasoning by default, and are based on the Qwen-Next architecture with GDN layers (more on those below the list).
We tested these models over the last few days, and they are a clear upgrade over the previous version, with substantial improvements across the board that make them perfect workhorses for a wide range of tasks. Their style and instruction following have improved, and the models are even better at multilingual tasks, covering more languages.
However, the small models at least (still) tend to overthink. You can turn reasoning off via the chat template, as sketched after this list.
- Step-3.5-Flash by stepfun-ai: StepFun really stepped up its game (no pun intended), releasing a 196B-A11B MoE with strong metrics across the board. It is especially strong on math benchmarks, beating out models several times its size.
- GLM-5 by zai-org: A 744B-A40B release from the Zhipu team that drove such a surge in demand that they raised prices for their coding plan. It also comes with an accompanying tech report.
- MiniMax-M2.5 by MiniMaxAI: Despite its relatively small size, MiniMax-M2.5 rivals models such as GLM-5 and Kimi K2.5 and has quickly become a community favorite.
- OpenThinker-Agent-v1 by open-thoughts: The OpenThinkers, known for their open reasoning releases (such as OpenThoughts 3), are now tackling agentic reasoning. Their initial release includes SFT and RL data, as well as a “lite” set of terminal-based tasks for evaluating smaller models.
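On the Qwen 3.5 reasoning toggle mentioned above: assuming Qwen 3.5 keeps the `enable_thinking` switch from Qwen3’s chat template (we haven’t verified this against the new template, and the checkpoint name below is a guess), turning reasoning off looks roughly like this:

```python
from transformers import AutoTokenizer

# Hypothetical checkpoint name for illustration.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-35B-A3B")

messages = [{"role": "user", "content": "One-line summary of MoE, please."}]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # assumed: same flag Qwen3's template uses
                            # to suppress the <think>...</think> block
)
print(prompt)
```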
The subtle architectural differences among these models are covered in detail in a similar, more technically focused round-up from Sebastian Raschka, PhD; it’s a good complement if you’re looking to go deeper.
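One of those differences, the GDN (gated DeltaNet) layers in the new Qwen models, boils down to a delta-rule update on a fast-weight state with a per-token forget gate. Here is a minimal single-head sketch of that recurrence; real implementations use chunked, parallel kernels, and the per-token loop below is purely for clarity:

```python
import torch

def gated_delta_rule(q, k, v, alpha, beta):
    """Single-head gated delta-rule recurrence (the core idea behind
    GDN-style linear-attention layers), written as a naive loop.
      q, k, v: (T, d) per-token queries / keys / values
      alpha:   (T,) per-token forget gate in (0, 1)
      beta:    (T,) per-token write strength in (0, 1)
    """
    T, d = q.shape
    S = q.new_zeros(d, d)      # fast-weight state mapping keys -> values
    out = []
    for t in range(T):
        S = alpha[t] * S                           # decay (forget) old state
        err = S @ k[t] - v[t]                      # prediction error at k_t
        S = S - beta[t] * torch.outer(err, k[t])   # delta-rule correction
        out.append(S @ q[t])                       # read state with query
    return torch.stack(out)

# Toy usage with random inputs:
T, d = 8, 16
q, k, v = (torch.randn(T, d) for _ in range(3))
alpha, beta = torch.rand(T), torch.rand(T)
print(gated_delta_rule(q, k, v, alpha, beta).shape)  # torch.Size([8, 16])
```

The forget gate is what lets the state discard stale context over long sequences, which is the main thing separating gated DeltaNet from plain delta-rule linear attention.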
- Tri-21B-Think by trillionlabs: Korea’s Trillion Labs is a repeat guest in the Artifacts series. This time, they are releasing a 21B reasoning model with support for English, Korean, and Japanese.
- MiniCPM-SALA by openbmb: An English-and-Chinese 8B model with sparse attention, supporting a 1M-token context window.