
TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling

arXiv, March 30, 2026

Authors: Nisharg Nargund, Priyesh Shukla

Abstract: Large language models (LLMs) achieve remarkable performance but demand substantial computational resources, limiting deployment on edge devices and in resource-constrained environments. We present TernaryLM, a 132M-parameter transformer trained natively with ternary quantization {-1, 0, +1} (log2(3) ≈ 1.58-bit effective precision), achieving significant memory reduction without sacrificing language modeling capability. Unlike post-training quantization approaches that quantize pre-trained full-precision models, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors. Our experiments demonstrate: (1) validation perplexity of 58.42 on TinyStories with a cross-seed standard deviation of ±0.17 PPL, confirming stable optimization; (2) strong downstream transfer with 82.47% F1 on MRPC, surpassing DistilBERT despite using 55x less pretraining data; (3) 2.4x memory reduction (498 MB vs 1,197 MB for an FP32 model of identical architecture) with latency parity; and (4) an implicit regularization effect whereby the ternary constraint yields a train/val ratio of 1.05x versus 3.51x for the FP32 baseline, demonstrating that discrete weights prevent overfitting on small corpora. We provide layer-wise sparsity analysis revealing that middle transformer layers (L5-L9) achieve 60-62% quantization sparsity versus 45-55% for boundary layers, establishing an actionable design principle for non-uniform precision allocation. Our implementation and trained models are publicly available at this https URL.
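The abstract names three ingredients: ternary weights in {-1, 0, +1}, a per-layer scaling factor, and a straight-through estimator (STE). The abstract does not give the exact formulas, so the following is only a minimal sketch under stated assumptions: the scale is taken to be the mean absolute weight of the layer (the "absmean" rule, swapped in here for illustration), and the function names `ternarize`, `dequantize`, `sparsity`, and `ste_grad` are hypothetical, not taken from the TernaryLM code.

```python
import math


def ternarize(weights):
    """Map one layer's float weights to codes in {-1, 0, +1} plus one scale.

    Assumption (not confirmed by the abstract): the per-layer scale is the
    mean absolute value of the layer's weights.
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Divide by the scale, round to the nearest integer, clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Effective weights used in the forward pass: scale * code."""
    return [scale * c for c in codes]


def sparsity(codes):
    """Fraction of weights quantized to exactly zero."""
    return sum(1 for c in codes if c == 0) / len(codes)


def ste_grad(grad_wrt_quantized):
    """Straight-through estimator: the backward pass treats quantization as
    the identity, so the gradient reaches the latent float weights unchanged."""
    return list(grad_wrt_quantized)


# Information content of one ternary weight, in bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)
```

For example, `ternarize([0.9, -0.05, 0.4, -1.2, 0.02, 0.6])` yields the codes `[1, 0, 1, -1, 0, 1]`, and `sparsity` on those codes is one third. The "quantization sparsity" reported per layer in the abstract is presumably this kind of zero fraction. The reported memory figures are self-consistent: 1,197 MB / 498 MB is approximately 2.4.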

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.07374 [cs.CL]

(or arXiv:2602.07374v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2602.07374

arXiv-issued DOI via DataCite

Submission history

From: Nisharg Nargund
[v1] Sat, 7 Feb 2026 05:35:17 UTC (520 KB)
[v2] Fri, 27 Mar 2026 15:09:36 UTC (907 KB)
