Streaming experts

Simon Willison Blog, March 24, 2026

24th March 2026

I wrote about Dan Woods' experiments with streaming experts the other day (https://simonwillison.net/2026/Mar/18/llm-in-a-flash/). That's the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model, by instead streaming the necessary expert weights from SSD for each token that you process.
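
Here's a minimal sketch of the idea in Python. This is a toy illustration, not Dan's actual code: the `ExpertStore`, `route` and `moe_forward` names are hypothetical, each "expert" is just a small vector of floats saved to its own file, the gate normalization is simplified (real routers typically use a softmax), and the LRU cache policy is my assumption. The point it shows is that only the experts the router picks for a token are read from disk, while a small cache keeps recently used experts in RAM.

```python
import os
from array import array
from collections import OrderedDict


class ExpertStore:
    """Keeps at most `capacity` experts in RAM; the rest live on disk."""

    def __init__(self, directory, capacity):
        self.directory = directory
        self.capacity = capacity
        self.cache = OrderedDict()  # expert_id -> weights, oldest first
        self.disk_reads = 0         # counts cache misses that hit the disk

    def _path(self, expert_id):
        return os.path.join(self.directory, f"expert_{expert_id}.bin")

    def save(self, expert_id, weights):
        """Write one expert's weights to its own file as float32."""
        with open(self._path(expert_id), "wb") as f:
            array("f", weights).tofile(f)

    def get(self, expert_id):
        """Return an expert's weights, reading from disk only on a miss."""
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as recently used
            return self.cache[expert_id]
        weights = array("f")
        with open(self._path(expert_id), "rb") as f:
            weights.frombytes(f.read())
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return weights


def route(scores, k):
    """Pick the indices of the k highest router scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def moe_forward(x, scores, store, k):
    """Toy MoE layer for one token: a gate-weighted sum of expert outputs.

    Assumes the selected scores are positive so they can be normalized
    by their sum.
    """
    chosen = route(scores, k)
    total = sum(scores[i] for i in chosen)
    out = 0.0
    for i in chosen:
        w = store.get(i)  # the only place weights are pulled from disk
        out += (scores[i] / total) * sum(wi * xi for wi, xi in zip(w, x))
    return out
```

With 8 experts on disk, `capacity=2` and `k=2`, each token touches at most two expert files, and a token that routes to the same experts as the previous one triggers no disk reads at all.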

Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported (https://twitter.com/seikixtc/status/2036246162936910322) running the colossal Kimi K2.5, a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on an M2 Max MacBook Pro.
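
A rough back-of-envelope calculation shows why streaming is needed at all. The 4 bits per weight figure below is purely my assumption for illustration (the reports above don't state the quantization used), and `weights_gb` is a hypothetical helper. It also ignores everything else that needs memory, such as the KV cache and the OS.

```python
def weights_gb(params, bits_per_weight):
    """Size of `params` weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


BITS = 4  # assumed quantization, not taken from the reports

total_gb = weights_gb(10**12, BITS)        # all 1 trillion parameters
active_gb = weights_gb(32 * 10**9, BITS)   # the 32B active per token

print(f"whole model: {total_gb:.0f} GB, active per token: {active_gb:.0f} GB")
```

Under that assumption the full set of weights is several times larger than 96GB, while the weights actually needed for any single token fit comfortably, which is the gap that streaming experts from SSD exploits.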

And @anemll showed (https://twitter.com/anemll/status/2035901335984611412) that same Qwen3.5-397B-A17B model running on an iPhone, albeit at just 0.6 tokens/second - iOS repo here.

I think this technique has legs. Dan and his fellow tinkerers are continuing to run autoresearch loops in order to find yet more optimizations to squeeze more performance out of these models.

Update: Daniel Isaac has now got Kimi K2.5 working on a 128GB M4 Max at ~1.7 tokens/second.
