
PrismML debuts energy-sipping 1-bit LLM in bid to free AI from the cloud

The Register AI/ML · Thomas Claburn (https://search.theregister.com/?author=Thomas%20Claburn) · April 4, 2026

Bonsai 8B model is competitive with other 8B models but 14x smaller and 5x more energy efficient

PrismML, an AI venture out of Caltech, has released a 1-bit large language model that outperforms weightier models, with the expectation that it will improve AI efficiency and viability on mobile devices, among other applications.

The model, dubbed Bonsai 8B, manages to be small and fast, with modest power demands and benchmark performance characteristics that rival much larger models.

"Our first proof point is 1-bit Bonsai 8B, a 1-bit model that fits into 1.15 GB of memory and delivers over 10x the intelligence density of its full-precision counterparts," the company said in a social media post. "It is 14x smaller, 8x faster, and 5x more energy efficient on edge hardware while remaining competitive with other models in its parameter-class."

AI models based on the Transformer architecture involve neural networks with millions or billions of weights, which control the strength of connections between neurons and influence how the model performs tasks. They're set during the training process and they take up memory space based on the precision used to represent them.

A model quantized at GGUF FP16 (16 bits) will take up much more space than one quantized at GGUF Q8_0 (8 bits) or GGUF Q4_0 (4 bits) or GGUF Q2_K (2 bits). That's excluding metadata and overhead that might increase actual storage space required. But given the same basic architecture, 16-bit models generally perform better than models quantized at lower levels.
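The storage arithmetic above is straightforward to sketch. The figures below cover raw weights only, excluding the metadata and per-block overhead the article notes, so real GGUF files come out somewhat larger (Bonsai 8B's stated 1.15 GB, for instance, includes its per-group scale factors):

```python
# Rough weight-storage estimate for an 8-billion-parameter model at
# common GGUF quantization widths. Ignores metadata and per-block
# overhead, so actual files are somewhat larger.
PARAMS = 8_000_000_000

def weight_bytes(bits_per_weight: float) -> float:
    """Bytes needed for the raw weights alone."""
    return PARAMS * bits_per_weight / 8

for name, bits in [("FP16", 16), ("Q8_0", 8), ("Q4_0", 4), ("Q2_K", 2), ("1-bit", 1)]:
    print(f"{name:>5}: {weight_bytes(bits) / 1e9:5.2f} GB")
```

At 16 bits per weight an 8B model needs 16 GB for weights alone; at 1 bit, 1 GB, which is why a 1-bit 8B model can fit in under 1.2 GB of memory.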

PrismML's Bonsai model family is based on an architecture where "each weight is represented only by its sign, {−1, +1}, while a shared scale factor is stored for each group of weights," as explained in the company's white paper [PDF], instead of a 16-bit or 32-bit floating point number. Researchers have been working on improved approaches to quantization for many years, described in papers like "BitNet: Bit-Regularized Deep Neural Networks" (2017) and "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" (2024).
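A minimal sketch of the sign-plus-shared-scale idea the white paper describes. The group size and the choice of scale (mean absolute value of the group, a common pick in the 1-bit literature) are illustrative assumptions here, not PrismML's actual training method:

```python
def quantize_1bit(weights, group_size=4):
    """Quantize weights to {-1, +1} signs plus one shared scale per group.

    Scale = mean |w| over the group; this is a common choice in the
    1-bit literature, and PrismML's actual scheme may differ.
    """
    signs, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        scales.append(sum(abs(w) for w in group) / len(group))
        signs.append([1 if w >= 0 else -1 for w in group])
    return signs, scales

def dequantize(signs, scales):
    """Reconstruct approximate weights: sign * group scale."""
    out = []
    for group, scale in zip(signs, scales):
        out.extend(s * scale for s in group)
    return out
```

Each weight costs 1 bit, plus one floating-point scale amortized over the whole group, which is how the per-weight storage stays close to (but slightly above) 1 bit.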

PrismML's approach is based on work done by Caltech electrical engineering professor Babak Hassibi and colleagues. The company claims that its 1-bit architecture avoids the tradeoffs that historically have accompanied low-bit quantization, specifically poor instruction following, errant multi-step reasoning, and unreliable tool use.


"We spent years developing the mathematical theory required to compress a neural network without losing its reasoning capabilities," said Babak Hassibi, CEO and founder of PrismML, in a statement. "We see 1-bit not as an endpoint, but as a starting point."

Hassibi argues that the company's 1-bit architecture establishes a new paradigm for AI that's focused on intelligence per unit of compute and energy.

To encourage others to think along these lines – remember when performance-per-watt became a thing? – PrismML proposes the measurement of intelligence density, a metric that shows its models in a good light.

"We define intelligence density as the negative of the log of the model's average error rate (across the same benchmark suite) divided by the model size," the company explains.
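That definition translates directly into code. The article does not specify the log base, so the natural log is assumed below, and the error rate passed in is purely illustrative rather than a published benchmark result:

```python
import math

def intelligence_density(avg_error_rate: float, size_gb: float) -> float:
    """PrismML's proposed metric: -log(average error rate) / model size.

    Log base is not stated in the article; natural log is assumed here.
    """
    return -math.log(avg_error_rate) / size_gb

# Illustrative only: a 1.15 GB model with a 30% average benchmark error
print(round(intelligence_density(0.30, 1.15), 2))
```

The metric rewards low error per gigabyte, so shrinking a model 14x while holding error roughly steady multiplies its density accordingly.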

Assessed this way, Qwen3 8B, which comes out a bit ahead of Bonsai 8B on various benchmarks (MMLU Redux, MuSR, GSM8K, etc.), scores just 0.10/GB for intelligence density, far short of Bonsai 8B at 1.06/GB.

Metrics may matter for marketing, but the more meaningful yardstick for PrismML's models is their potential to move AI out of cloud datacenters. The company foresees its models powering on-device agents, real-time robotics, secure enterprise systems, and other projects where memory bandwidth, power, or compliance constraints can hinder deployment.

"1-bit Bonsai 8B runs natively on Apple devices (Mac, iPhone, iPad) via MLX, on Nvidia GPUs via llama.cpp CUDA," the company says. "Model weights are available today under the Apache 2.0 License."

Two smaller models are also available: 1-bit Bonsai 4B and 1-bit Bonsai 1.7B. ®
