
TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling

arXiv, March 30, 2026

arXiv:2602.07374v2 (announce type: replace-cross)

Authors: Nisharg Nargund, Priyesh Shukla


Abstract: Large language models (LLMs) achieve remarkable performance but demand substantial computational resources, limiting deployment on edge devices and resource-constrained environments. We present TernaryLM, a 132M-parameter transformer trained natively with ternary quantization {-1, 0, +1} (log2(3) ≈ 1.58-bit effective precision), achieving significant memory reduction without sacrificing language modeling capability. Unlike post-training quantization approaches that quantize pre-trained full-precision models, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors. Our experiments demonstrate: (1) validation perplexity of 58.42 on TinyStories with a cross-seed standard deviation of ±0.17 PPL, confirming stable optimization; (2) strong downstream transfer with 82.47% F1 on MRPC, surpassing DistilBERT despite using 55x less pretraining data; (3) 2.4x memory reduction (498 MB vs 1,197 MB for an FP32 model of identical architecture) with latency parity; and (4) an implicit regularization effect whereby the ternary constraint yields a train/val ratio of 1.05x versus 3.51x for the FP32 baseline, demonstrating that discrete weights prevent overfitting on small corpora. We provide layer-wise sparsity analysis revealing that middle transformer layers (L5-L9) achieve 60-62% quantization sparsity versus 45-55% for boundary layers, establishing an actionable design principle for non-uniform precision allocation. Our implementation and trained models are publicly available at this https URL.
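To make the abstract's terms concrete, here is a minimal sketch of ternary quantization with one scaling factor per layer, a straight-through gradient, and a quantization-sparsity measure. It is not the authors' implementation. The abstract does not state how the scale is chosen, so the absmean rule below (scale = mean absolute weight) is an assumption, and the function names `ternarize`, `dequantize`, `ste_backward`, and `sparsity` are hypothetical.

```python
import math


def ternarize(weights, eps=1e-8):
    """Map a layer's weights to codes in {-1, 0, +1} plus one scale.

    Assumption: the scale is the mean absolute weight (absmean). The
    abstract only says "adaptive per-layer scaling factors".
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round w / scale to the nearest integer, then clip into [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Effective weights used in the forward pass: code * scale."""
    return [c * scale for c in codes]


def ste_backward(grad_wrt_quantized):
    """Straight-through estimator: rounding has zero gradient almost
    everywhere, so the backward pass treats it as the identity and hands
    the gradient to the latent full-precision weights unchanged."""
    return list(grad_wrt_quantized)


def sparsity(codes):
    """Fraction of weights quantized to exactly zero."""
    return sum(1 for c in codes if c == 0) / len(codes)


# Toy layer with six latent weights (made-up values for illustration).
latent = [0.9, -0.05, 0.4, -1.2, 0.02, 0.0]
codes, scale = ternarize(latent)
print(codes)            # [1, 0, 1, -1, 0, 0]
print(sparsity(codes))  # 0.5

# One SGD step on the latent weights, using a made-up gradient.
grad = ste_backward([0.1, -0.2, 0.0, 0.3, -0.1, 0.05])
latent = [w - 0.01 * g for w, g in zip(latent, grad)]

# Figures quoted in the abstract, checked by arithmetic.
bits_per_weight = math.log2(3)  # about 1.58 bits per ternary value
memory_ratio = 1197 / 498       # about 2.4x, FP32 vs ternary model
```

In a sketch like this, the latent weights stay in full precision during training and only the codes and the per-layer scale need to be stored for inference. The sparsity function corresponds to the layer-wise quantity the abstract reports (60-62% for middle layers versus 45-55% for boundary layers).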

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.07374 [cs.CL]

(or arXiv:2602.07374v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2602.07374

arXiv-issued DOI via DataCite

Submission history

From: Nisharg Nargund
[v1] Sat, 7 Feb 2026 05:35:17 UTC (520 KB)
[v2] Fri, 27 Mar 2026 15:09:36 UTC (907 KB)

Original source

arXiv

https://arxiv.org/abs/2602.07374
