
TED: Training-Free Experience Distillation for Multimodal Reasoning

arXiv, March 31, 2026

arXiv:2603.26778v1 Announce Type: cross

Authors: Shuozhi Yuan, Jinqing Wang, Zihao Liu, Miaomiao Yuan, Haoran Peng, Jin Zhao, Bingwen Wang, Haoyi Wang


Abstract: Knowledge distillation is typically realized by transferring a teacher model's knowledge into a student's parameters through supervised or reinforcement-based optimization. While effective, such approaches require repeated parameter updates and large-scale training data, limiting their applicability in resource-constrained environments. In this work, we propose TED, a training-free, context-based distillation framework that shifts the update target of distillation from model parameters to an in-context experience injected into the student's prompt. For each input, the student generates multiple reasoning trajectories, while a teacher independently produces its own solution. The teacher then compares the student trajectories with its reasoning and the ground-truth answer, extracting generalized experiences that capture effective reasoning patterns. These experiences are continuously refined and updated over time. A key challenge of context-based distillation is unbounded experience growth and noise accumulation. TED addresses this with an experience compression mechanism that tracks usage statistics and selectively merges, rewrites, or removes low-utility experiences. Experiments on multimodal reasoning benchmarks MathVision and VisualPuzzles show that TED consistently improves performance. On MathVision, TED raises the performance of Qwen3-VL-8B from 0.627 to 0.702, and on VisualPuzzles from 0.517 to 0.561 with just 100 training samples. Under this low-data, no-update setting, TED achieves performance competitive with fully trained parameter-based distillation while reducing training cost by over 5x, demonstrating that meaningful knowledge transfer can be achieved through contextual experience.
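The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Experience`, `ExperiencePool`, and `ted_step`, the thresholds, the substring-based correctness check, and the utility score (fraction of uses followed by a correct answer) are all assumptions made for illustration. The abstract says compression merges, rewrites, or removes low-utility experiences; this sketch only removes them and drops verbatim duplicates, since merging and rewriting would presumably require a model call. The student and teacher are passed in as plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experience:
    text: str
    uses: int = 0     # times this experience was in the student's prompt
    helpful: int = 0  # of those, times the student answered correctly

    def utility(self) -> float:
        # Hypothetical utility score; the paper's statistic may differ.
        return self.helpful / self.uses if self.uses else 0.0


@dataclass
class ExperiencePool:
    max_size: int = 8         # hypothetical cap on pool size
    min_uses: int = 2         # evidence required before pruning
    min_utility: float = 0.3  # hypothetical pruning threshold
    items: List[Experience] = field(default_factory=list)

    def as_prompt(self) -> str:
        # Text injected into the student's prompt.
        return "\n".join(f"- {e.text}" for e in self.items)

    def record(self, correct: bool) -> None:
        # Update usage statistics for every experience that was in context.
        for e in self.items:
            e.uses += 1
            e.helpful += int(correct)

    def add(self, text: str) -> None:
        # Skip verbatim duplicates.
        if all(e.text != text for e in self.items):
            self.items.append(Experience(text))

    def compress(self) -> None:
        # Remove experiences with enough evidence and low utility,
        # then keep at most max_size, highest utility first.
        kept = [e for e in self.items
                if e.uses < self.min_uses or e.utility() >= self.min_utility]
        kept.sort(key=lambda e: e.utility(), reverse=True)
        self.items = kept[: self.max_size]


def ted_step(
    pool: ExperiencePool,
    question: str,
    answer: str,
    student: Callable[[str, str], str],
    teacher_solve: Callable[[str], str],
    teacher_extract: Callable[[List[str], str, str], List[str]],
    k: int = 3,
) -> bool:
    """One training-free update: no model parameters change, only the pool."""
    context = pool.as_prompt()
    # The student generates k reasoning trajectories with the current context.
    trajectories = [student(context, question) for _ in range(k)]
    # Crude correctness check (assumption): the answer appears in a trajectory.
    correct = any(answer in t for t in trajectories)
    pool.record(correct)
    # The teacher solves independently, then extracts generalized experiences
    # by comparing student trajectories, its own solution, and the answer.
    reference = teacher_solve(question)
    for text in teacher_extract(trajectories, reference, answer):
        pool.add(text)
    pool.compress()
    return correct
```

In use, `ted_step` would be called once per training sample (the abstract reports 100 samples), and `pool.as_prompt()` would then be prepended to the student's prompt at inference time.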

Comments: 13 pages, 4 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26778 [cs.LG] (or arXiv:2603.26778v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26778

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Shuozhi Yuan
[v1] Wed, 25 Mar 2026 01:08:36 UTC (552 KB)
