
1-bit llms on device?!

Reddit r/LocalLLaMA, by /u/hankybrd (https://www.reddit.com/user/hankybrd), April 1, 2026

everyone's talking about the claude code stuff (rightfully so) but this paper (https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf) came out today, and the claims are pretty wild:

- 1-bit 8B param model that fits in 1.15 GB of memory ...
- competitive with llama3 8B and other full-precision 8B models on benchmarks
- runs at 440 tok/s on a 4090, 136 tok/s on an M4 Pro
- they got it running on an iphone at ~40 tok/s
- 4-5x more energy efficient

also it's up on hugging face (https://huggingface.co/prism-ml/Bonsai-8B-gguf)! i haven't played around with it yet, but curious to know what people think about this one. caltech spinout from a famous professor
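
For a rough sanity check on the memory claim, here is a minimal back-of-envelope sketch. It assumes "8b" means exactly 8 billion parameters and that GB is decimal (1e9 bytes); neither is confirmed by the post, and the helper name `weight_memory_gb` is made up for illustration. It only counts weight storage, so it says nothing about KV cache or activations.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 8e9       # assumption: "8b" taken as exactly 8 billion parameters
REPORTED_GB = 1.15   # figure quoted in the post

fp16_gb = weight_memory_gb(N_PARAMS, 16)    # full-precision (FP16) baseline
one_bit_gb = weight_memory_gb(N_PARAMS, 1)  # idealized pure 1-bit storage

# Average bits per parameter implied by the reported footprint.
effective_bits = REPORTED_GB * 1e9 * 8 / N_PARAMS

print(f"FP16 weights:   {fp16_gb:.2f} GB")
print(f"pure 1-bit:     {one_bit_gb:.2f} GB")
print(f"implied bits/w: {effective_bits:.2f}")
print(f"FP16 / claimed: {fp16_gb / REPORTED_GB:.1f}x")
```

Under those assumptions, pure 1-bit weights alone would take about 1 GB, so the quoted 1.15 GB works out to a bit over one bit per parameter on average. Where the extra goes (scale factors, higher-precision layers, file metadata) is a guess here; the post doesn't give a breakdown.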
