
On-Policy Self-Distillation for Reasoning Compression

arXiv, March 31, 2026

arXiv:2603.05433v4 (announce type: replace)
Authors: Hejian Sang, Yuanda Xu, Zhengze Zhou, Ran He, Zhipeng Wang, Jiachen Sun


Abstract: Reasoning models think out loud, but much of what they say is noise. We introduce OPSDC (On-Policy Self-Distillation for Reasoning Compression), a method that teaches models to reason more concisely by distilling their own concise behavior back into themselves. The entire approach reduces to one idea: condition the same model on a "be concise" instruction to obtain teacher logits, and minimize per-token reverse KL on the student's own rollouts. No ground-truth answers, no token budgets, no difficulty estimators. Just self-distillation. Yet this simplicity belies surprising sophistication: OPSDC automatically compresses easy problems aggressively while preserving the deliberation needed for hard ones. On Qwen3-8B and Qwen3-14B, we achieve 57-59% token reduction on MATH-500 while improving accuracy by 9-16 points absolute. On AIME 2024, the 14B model gains 10 points with 41% compression. The secret? Much of what reasoning models produce is not just redundant; it is actively harmful, compounding errors with every unnecessary token. Code is available at this https URL.
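The per-token reverse KL named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes the loss is the full-vocabulary KL(student || teacher) averaged over the positions of one rollout, which the abstract does not spell out. The function names and the toy logits are hypothetical. In practice the teacher logits would come from the same model conditioned on a "be concise" instruction and scored on the student's own rollout tokens; here both are hard-coded lists.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl(student_logits: Sequence[float],
               teacher_logits: Sequence[float]) -> float:
    """KL(student || teacher) at a single token position.

    "Reverse" means the expectation is taken under the student distribution.
    """
    log_p = log_softmax(student_logits)  # student log-probabilities
    log_q = log_softmax(teacher_logits)  # teacher log-probabilities
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def per_token_reverse_kl_loss(student_seq: Sequence[Sequence[float]],
                              teacher_seq: Sequence[Sequence[float]]) -> float:
    """Mean reverse KL across the token positions of one rollout.

    Averaging over positions is an assumption made for this sketch.
    """
    if not student_seq or len(student_seq) != len(teacher_seq):
        raise ValueError("need two non-empty sequences of equal length")
    kls = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(kls) / len(kls)


# Toy rollout with 2 positions and a 3-token vocabulary (made-up numbers).
# Position 0: student and teacher agree. Position 1: the teacher is much
# more peaked than the student, so it contributes a positive KL term.
student = [[2.0, 0.5, -1.0], [0.1, 0.1, 0.1]]
teacher = [[2.0, 0.5, -1.0], [3.0, 0.0, -3.0]]
loss = per_token_reverse_kl_loss(student, teacher)
```

Because the expectation is taken under the student's distribution, the loss penalizes the student for placing probability on tokens the concise-conditioned teacher considers unlikely, and it is zero at positions where the two distributions coincide.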

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.05433 [cs.LG] (or arXiv:2603.05433v4 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.05433

arXiv-issued DOI via DataCite

Submission history

From: Hejian Sang
[v1] Thu, 5 Mar 2026 17:54:40 UTC (571 KB)
[v2] Sun, 8 Mar 2026 06:29:26 UTC (570 KB)
[v3] Tue, 17 Mar 2026 05:05:03 UTC (570 KB)
[v4] Sat, 28 Mar 2026 03:56:28 UTC (591 KB)
