Streaming experts
24th March 2026
The other day I wrote about Dan Woods' experiments with streaming experts: the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model, by instead streaming the necessary expert weights from SSD for each token that you process.
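To make the idea concrete, here is a minimal toy sketch in Python. It is not Dan's implementation: every name in it (`write_toy_experts`, `StreamingExperts`, `moe_forward`), the tiny sizes, the flat file layout and the LRU cache are assumptions made up for illustration. It only shows the general shape of the trick: all expert weights live in a file on disk, a router picks a few experts per token, and only those experts are read into a small in-RAM cache.

```python
import math
import os
import random
import tempfile
from array import array
from collections import OrderedDict

# Toy sizes chosen for illustration; real models are vastly larger.
NUM_EXPERTS, D_MODEL, TOP_K = 8, 4, 2
EXPERT_FLOATS = D_MODEL * D_MODEL                  # one square matrix per expert
EXPERT_BYTES = EXPERT_FLOATS * array("f").itemsize


def write_toy_experts(path, seed=0):
    """Write every expert's weights back to back into one flat file."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for _ in range(NUM_EXPERTS):
            array("f", (rng.uniform(-1, 1) for _ in range(EXPERT_FLOATS))).tofile(f)


class StreamingExperts:
    """Holds at most `cache_size` experts in RAM; the rest stay on disk."""

    def __init__(self, path, cache_size):
        self.file = open(path, "rb")
        self.cache = OrderedDict()   # expert id -> weights, in LRU order
        self.cache_size = cache_size
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)      # mark as recently used
            return self.cache[expert_id]
        # Cache miss: seek to this expert's slice of the file and read only it.
        self.file.seek(expert_id * EXPERT_BYTES)
        weights = array("f")
        weights.fromfile(self.file, EXPERT_FLOATS)
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)         # evict least recently used
        return weights

    def close(self):
        self.file.close()


def matvec(weights, x):
    """Multiply a row-major D_MODEL x D_MODEL matrix by the vector x."""
    return [sum(weights[r * D_MODEL + c] * x[c] for c in range(D_MODEL))
            for r in range(D_MODEL)]


def moe_forward(x, router_logits, experts):
    """Run one token through its TOP_K highest-scoring experts and mix the results."""
    chosen = sorted(range(NUM_EXPERTS), key=lambda e: router_logits[e])[-TOP_K:]
    top = max(router_logits[e] for e in chosen)
    exps = [math.exp(router_logits[e] - top) for e in chosen]
    total = sum(exps)
    gates = [v / total for v in exps]              # softmax over the chosen experts
    out = [0.0] * D_MODEL
    for gate, eid in zip(gates, chosen):
        y = matvec(experts.get(eid), x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "experts.bin")
        write_toy_experts(path)
        experts = StreamingExperts(path, cache_size=3)
        rng = random.Random(1)
        for _ in range(20):                        # 20 toy "tokens"
            x = [rng.uniform(-1, 1) for _ in range(D_MODEL)]
            logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]
            moe_forward(x, logits, experts)
        print(f"disk reads: {experts.disk_reads} for {20 * TOP_K} expert uses")
        experts.close()
```

The cache is what keeps this from hitting the disk on every single expert use: any expert that was picked recently is served from RAM. In this toy the router scores are random, so the hit rate says nothing about how real models behave.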
Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported running the colossal Kimi K2.5, a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on an M2 Max MacBook Pro.
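Some back-of-envelope arithmetic shows why streaming is needed at all. This sketch assumes a hypothetical 4 bits per weight, since the posts quoted here don't state the quantization used, and it reads the 17B active figure for Qwen from the "A17B" in the model name. `weights_gb` and `REPORTED_RUNS` are illustrative names of my own.

```python
def weights_gb(params_billions, bits_per_weight):
    """Size of `params_billions` billion weights in decimal gigabytes."""
    return params_billions * bits_per_weight / 8


# ASSUMPTION: 4 bits per weight is a hypothetical quantization level chosen
# for illustration; the actual format used in these runs is not stated.
BITS_PER_WEIGHT = 4

# (name, total params in billions, active params in billions, RAM in GB)
REPORTED_RUNS = [
    ("Qwen3.5-397B-A17B", 397, 17, 48),
    ("Kimi K2.5", 1000, 32, 96),
]

for name, total_b, active_b, ram_gb in REPORTED_RUNS:
    total = weights_gb(total_b, BITS_PER_WEIGHT)
    active = weights_gb(active_b, BITS_PER_WEIGHT)
    print(f"{name}: all weights ~{total:.1f} GB, "
          f"active per token ~{active:.1f} GB, RAM {ram_gb} GB")
```

Under that 4-bit assumption the full Kimi K2.5 weights come to roughly 500 GB, far more than 96 GB of RAM, while the 32B weights active for any one token come to roughly 16 GB. The whole model can't fit in memory, but the slice needed for a single token can.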
And @anemll showed that same Qwen3.5-397B-A17B model running on an iPhone, albeit at just 0.6 tokens/second - iOS repo here.
I think this technique has legs. Dan and his fellow tinkerers are continuing to run autoresearch loops to find further optimizations that squeeze more performance out of these models.
Update: Daniel Isaac has now got Kimi K2.5 working on a 128GB M4 Max at ~1.7 tokens/second.
Simon Willison Blog
https://simonwillison.net/2026/Mar/24/streaming-experts/#atom-everything
