PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs
<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1s90wo4/prismml_announcing_1bit_bonsai_the_first/"> <img src="https://external-preview.redd.it/2MPuWzoesFag6DCF_VFzPwvyl7OF67TTNQpyVCauVaM.png?width=640&crop=smart&auto=webp&s=affc7807966287e6afa631a610e5ee9d92411731" alt="PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs" title="PrismML — Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs" /> </a> </td><td>   submitted by   <a href="https://www.reddit.com/user/brown2green"> /u/brown2green </a> <br/> <span><a href="https://prismml.com/news/bonsai-8b">[link]</a></span>   <span><a href="https://www.reddit.com/r/LocalLLaMA/comments/1s90wo4/prismml_announcing_1bit_bonsai_the_first/">[comments]</a></span> </td></tr></table>

More about llamareview
From Direct Classification to Agentic Routing: When to Use Local Models vs Azure AI
<p>In many enterprise workflows, classification sounds simple.</p> <p>An email arrives.<br><br> A ticket is created.<br><br> A request needs to be routed.</p> <p>At first glance, it feels like a straightforward model problem:</p> <ul> <li>classify the input</li> <li>assign a category</li> <li>trigger the next step</li> </ul> <p>But in practice, enterprise classification is rarely just about model accuracy.</p> <p>It is also about:</p> <ul> <li>latency</li> <li>cost</li> <li>governance</li> <li>data sensitivity</li> <li>operational fit</li> <li>fallback behavior</li> </ul> <p>That is where the architecture becomes more important than the model itself.</p> <p>In this post, I want to share a practical way to think about classification systems in enterprise environments.</p>
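The local-vs-cloud trade-off described above can be sketched as a confidence-threshold router: try the cheap local model first and escalate to a hosted model only when the local one is unsure. This is a minimal illustration, not the post's actual system; the function names (`classify_local`, `classify_cloud`), the keyword rule, and the 0.8 threshold are all hypothetical stand-ins rather than a real Azure AI API.

```python
# Hypothetical sketch of confidence-threshold routing between a small
# local classifier and a cloud fallback. All names and numbers here
# are illustrative assumptions, not a real API.

from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # 0.0 .. 1.0
    source: str        # "local" or "cloud"


def classify_local(text: str) -> Prediction:
    # Stand-in for a small on-prem model; here, a trivial keyword rule.
    if "refund" in text.lower():
        return Prediction("billing", 0.95, "local")
    return Prediction("general", 0.40, "local")


def classify_cloud(text: str) -> Prediction:
    # Stand-in for a hosted model call (e.g. an Azure AI endpoint).
    return Prediction("support", 0.90, "cloud")


def route(text: str, threshold: float = 0.8) -> Prediction:
    # Try the cheap local model first; escalate only when it is unsure.
    local = classify_local(text)
    if local.confidence >= threshold:
        return local
    return classify_cloud(text)
```

The threshold is where the governance and cost concerns from the list above surface: raising it sends more traffic (and more sensitive data) to the cloud; lowering it keeps more traffic local at the price of accuracy.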
1-bit llms on device?!
<!-- SC_OFF --><div class="md"><p>everyone's talking about the claude code stuff (rightfully so) but <a href="https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf">this paper</a> came out today, and the claims are pretty wild:</p> <ul> <li>1-bit 8b param model that fits in 1.15 gb of memory ...</li> <li>competitive with llama3 8B and other full-precision 8B models on benchmarks</li> <li>runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro</li> <li>they got it running on an iphone at ~40 tok/s</li> <li>4-5x more energy efficient</li> </ul> <p>also it's up on <a href="https://huggingface.co/prism-ml/Bonsai-8B-gguf">hugging face</a>! i haven't played around with it yet, but curious to know what people think about this one. caltech spinout from a famous professor</p></div><!-- SC_ON -->
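The 1.15 GB figure is easy to sanity-check: 8 billion weights at roughly 1 bit each is about 1 GB before any overhead for embeddings, quantization scales, or higher-precision layers. A minimal back-of-envelope sketch, where the 15% overhead factor is an assumption for illustration and not a number from the whitepaper:

```python
# Back-of-envelope model size check: params stored at ~1 bit each,
# plus an assumed overhead for embeddings/scales kept at higher
# precision. The 15% overhead is an illustrative guess, not a
# figure from the paper.

def model_size_gb(params: float, bits_per_weight: float,
                  overhead: float = 0.15) -> float:
    bytes_raw = params * bits_per_weight / 8  # raw weight bytes
    return bytes_raw * (1 + overhead) / 1e9   # decimal GB

size = model_size_gb(8e9, 1.0)  # ~1.15 GB, matching the post's claim
```

For comparison, the same 8B parameters at fp16 (16 bits per weight) would need roughly 16 GB, which is why the on-device numbers in the post are plausible at all.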
New build
<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1s95cpa/new_build/"> <img src="https://preview.redd.it/6gcwubqi5hsg1.jpeg?width=640&crop=smart&auto=webp&s=dcc11e379b3473a61e23b0d2d398400393fef9b4" alt="New build" title="New build" /> </a> </td><td> <!-- SC_OFF --><div class="md"><p>Seasonic 1600w titanium power supply</p> <p>Supermicro X13SAE-F</p> <p>Intel i9-13900k</p> <p>4x 32GB micron ECC udimms</p> <p>3x intel 660p 2TB m2 ssd</p> <p>2x micron 9300 15.36TB u2 ssd (not pictured)</p> <p>2x RTX 6000 Blackwell max-q</p> <p>Due to lack of pci lanes gpus are running at x8 pci 5.0</p> <p>I may upgrade to a better cpu to handle both cards at x16 once ddr5 ram prices go down.</p> <p>Would upgrading cpu and increasing ram channels really matter that much?</p> </div><!-- SC_ON --> </td></tr></table>
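The x8-vs-x16 question above comes down to nominal bandwidth arithmetic. A rough sketch, using the commonly cited ~3.94 GB/s per PCIe 5.0 lane after 128b/130b encoding and assuming DDR5-5600 (the post does not state the DIMM speed, so that figure is a guess):

```python
# Nominal peak bandwidth comparison for the x8 vs x16 question.
# ~3.94 GB/s per PCIe 5.0 lane (after 128b/130b encoding overhead);
# DDR5 moves 8 bytes per transfer per channel. DDR5-5600 is an
# assumed speed, since the build post doesn't specify it.

PCIE5_GBPS_PER_LANE = 3.94

def pcie_bw(lanes: int) -> float:
    return PCIE5_GBPS_PER_LANE * lanes

def ddr5_bw(mt_per_s: int, channels: int) -> float:
    return mt_per_s * 8 * channels / 1000  # GB/s

x8 = pcie_bw(8)            # ~31.5 GB/s per GPU
x16 = pcie_bw(16)          # ~63 GB/s per GPU
ram = ddr5_bw(5600, 2)     # ~89.6 GB/s dual-channel system RAM
```

So each card already gets ~31.5 GB/s at x8, which mostly matters for model loading and multi-GPU transfers rather than steady-state inference; whether x16 or more RAM channels pays off depends on how often weights and activations cross those links.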
More in Models
AI Weekly, 2026/3/27–4/1: Three Anthropic Shocks in One Week, Arm's First In-House Chip, Oracle Cuts 30,000 Jobs to Bet on AI
<blockquote> <p><strong>This week in one sentence:</strong> Anthropic took center stage, though half the spotlight came by accident.</p> </blockquote> <h2> I. Top Story: Anthropic's Triple Shock </h2> <p>No company made headlines more often this week than Anthropic, yet two of its three appearances were unplanned.</p> <h3> 1. IPO Plans Surface (3/27) </h3> <p>Bloomberg reports that Anthropic is weighing <strong>a listing as early as October</strong>, with a raise that could exceed <strong>US$60 billion</strong>. The company has reportedly held preliminary talks with Goldman Sachs, JPMorgan, and Morgan Stanley.</p> <p>Context:</p> <div class="table-wrapper-paragraph"><table> <thead> <tr> <th>Metric</th> <th>Figure</th> </tr> </thead> <tbody> <tr> <td>February funding-round valuation</td> <td>US$380 billion</td> </tr> <tr> <td>Paid subscription growth</td> <td>More than doubled since the start of 2026</td> </tr> <tr> <td>First-purchase win rate with enterprise customers</td> <td>About 70% (vs. OpenAI)</td> </tr> <tr> <td>New free users per day</td> <td>Over 1 million</td> </tr> </tbody> </table></div> <p>This is more than an IPO: it signals the AI industry's shift from "whose model is stronger" to "who becomes a public company first." With Anthropic and OpenAI both preparing to list in 2026, capital markets are starting to price the showdown between the two AI giants.</p>

Me: avoiding r/LocalLLaMA on April Fools’ Day so I don’t fall for fake model releases.
<table> <tr><td> <a href="https://www.reddit.com/r/LocalLLaMA/comments/1s973u2/me_avoiding_rlocalllama_on_april_fools_day_so_i/"> <img src="https://preview.redd.it/km4rhb1djhsg1.gif?width=320&crop=smart&s=5f34e0e2ec7deee2ee9daa7eb56bc4b0d0ccbaf6" alt="Me: avoiding r/LocalLLaMA on April Fools’ Day so I don’t fall for fake model releases." title="Me: avoiding r/LocalLLaMA on April Fools’ Day so I don’t fall for fake model releases." /> </a> </td><td> <!-- SC_OFF --><div class="md"><p>See y’all April 2nd. </p> </div><!-- SC_ON -->   submitted by   <a href="https://www.reddit.com/user/Porespellar"> /u/Porespellar </a> <br/> <span><a href="https://i.redd.it/km4rhb1djhsg1.gif">[link]</a></span>   <span><a href="https://www.reddit.com/r/LocalLLaMA/comments/1s973u2/me_avoiding_rlocalllama_on_april_fools_day_so_i/">[comments]</a></span> </td></tr></table>
