
Streaming experts

Simon Willison Blog, March 24, 2026



I wrote about Dan Woods' experiments with streaming experts the other day, the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model by instead streaming the necessary expert weights from SSD for each token that you process.
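To make the mechanism concrete, here is a minimal Python sketch of the general idea. It is not Dan Woods' code. A router scores the experts for a token and picks the top-k, and only those experts' weights are read from disk. A small least-recently-used cache stands in for the RAM budget. Everything here is a hypothetical toy: `ExpertStore`, `moe_forward`, the one-file-per-expert layout, float32 storage and the tiny square-matrix experts are all assumptions chosen for illustration. Real implementations stream quantized feed-forward weights and are far more heavily optimized.

```python
import math
import os
import tempfile
from array import array
from collections import OrderedDict


class ExpertStore:
    """Hypothetical toy: keeps each expert's weights in its own file on disk and
    holds at most `cache_size` experts in RAM, evicting the least recently used."""

    def __init__(self, directory, cache_size):
        self.directory = directory
        self.cache_size = cache_size
        self.cache = OrderedDict()  # expert_id -> array('f') of weights
        self.disk_reads = 0         # counts how often we had to hit storage

    def _path(self, expert_id):
        return os.path.join(self.directory, f"expert_{expert_id}.bin")

    def save(self, expert_id, weights):
        # Assumption: weights are stored as raw float32 values, row-major.
        with open(self._path(expert_id), "wb") as f:
            array("f", weights).tofile(f)

    def get(self, expert_id, count):
        # Cache hit: mark as most recently used and skip the disk read.
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)
            return self.cache[expert_id]
        # Cache miss: stream this expert's weights from disk.
        weights = array("f")
        with open(self._path(expert_id), "rb") as f:
            weights.fromfile(f, count)
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used expert
        return weights


def moe_forward(x, gate, store, top_k):
    """One MoE layer for a single token: route, load only the chosen experts,
    and return the softmax-weighted sum of their outputs plus the chosen ids."""
    dim = len(x)
    # Router: one score per expert (dot product of a gate row with the input).
    scores = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    chosen = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    # Softmax over the selected experts' scores only (numerically stable form).
    m = max(scores[e] for e in chosen)
    exps = [math.exp(scores[e] - m) for e in chosen]
    total = sum(exps)
    out = [0.0] * dim
    for e, ex in zip(chosen, exps):
        w = store.get(e, dim * dim)  # only these experts ever leave the disk
        p = ex / total
        for i in range(dim):
            row = w[i * dim:(i + 1) * dim]
            out[i] += p * sum(a * b for a, b in zip(row, x))
    return out, chosen
```

The cache size plays the role of the RAM limit: experts that keep getting routed to stay resident, while every other expert costs a disk read each time a token needs it.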

Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported running the colossal Kimi K2.5, a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on an M2 Max MacBook Pro.
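Some back-of-envelope arithmetic shows why these models need streaming at all. The sketch below assumes a uniform 4 bits per weight (the quantization actually used in these experiments isn't stated here) and ignores the KV cache and runtime overhead. It also treats every active parameter as if it were streamed. In practice the attention and other shared layers can stay resident in RAM, so only the expert weights have to come off the SSD. `footprint_gb` and `ASSUMED_BITS` are illustrative names.

```python
def footprint_gb(params: int, bits_per_weight: float) -> float:
    """Bytes needed to hold `params` weights, in decimal GB (1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 10**9


# (total parameters, active parameters per token), from the model descriptions
MODELS = {
    "Qwen3.5-397B-A17B": (397_000_000_000, 17_000_000_000),
    "Kimi K2.5": (1_000_000_000_000, 32_000_000_000),
}

ASSUMED_BITS = 4  # assumption: 4-bit quantization across the whole model

for name, (total, active) in MODELS.items():
    print(f"{name}: {footprint_gb(total, ASSUMED_BITS):.1f} GB total, "
          f"{footprint_gb(active, ASSUMED_BITS):.1f} GB active per token")
# Qwen3.5-397B-A17B: 198.5 GB total, 8.5 GB active per token
# Kimi K2.5: 500.0 GB total, 16.0 GB active per token
```

Under that assumption the full Qwen model is roughly four times the 48GB Dan was using, and Kimi K2.5 is more than five times the 96GB machine. The per-token active slice is a small fraction of either.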

And @anemll showed that same Qwen3.5-397B-A17B model running on an iPhone, albeit at just 0.6 tokens/second, with the iOS code available in a repo.

I think this technique has legs. Dan and his fellow tinkerers are continuing to run autoresearch loops in order to find yet more optimizations to squeeze more performance out of these models.

Update: Daniel Isaac has now got Kimi K2.5 running on a 128GB M4 Max at ~1.7 tokens/second.
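Because expert weights are read from storage for each token, decode speed is capped by how quickly those bytes can be pulled off the SSD. The rough sketch below estimates that cap under the simplifying assumption that SSD reads are the only bottleneck, ignoring compute, memory bandwidth and read latency. `max_tokens_per_second`, the cache-hit parameter and the example figures are all hypothetical. They are not measurements from the experiments above.

```python
def max_tokens_per_second(streamed_bytes_per_token: float,
                          ssd_bytes_per_second: float,
                          cache_hit_rate: float = 0.0) -> float:
    """Upper bound on decode speed if SSD reads were the only cost.

    cache_hit_rate is the (assumed) fraction of needed expert bytes already
    resident in RAM, so only the remainder has to be read from the SSD.
    """
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    bytes_from_ssd = streamed_bytes_per_token * (1.0 - cache_hit_rate)
    return ssd_bytes_per_second / bytes_from_ssd


# Hypothetical figures: 8 GB of expert weights per token, a 5 GB/s SSD.
print(max_tokens_per_second(8e9, 5e9))                      # 0.625
print(max_tokens_per_second(8e9, 5e9, cache_hit_rate=0.5))  # 1.25
```

The cache-hit term is why keeping frequently used experts resident in RAM matters: every expert served from memory is one fewer SSD read for that token.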

Original source

Simon Willison Blog

https://simonwillison.net/2026/Mar/24/streaming-experts/#atom-everything
