AI NEWS HUB by Eigenvector

Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks

arXiv · Submitted on 16 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2) · March 31, 2026 · 2 min read
🧒 Explain Like I'm 5

Hey there, little explorer! Imagine you have a giant toy box full of building blocks, right?

Sometimes, grown-ups want to teach a robot how to build a super cool tower. But the robot gets confused by too many blocks.

This news is like saying, "What if we could take just a few special blocks out of that giant box?" These special blocks are like magic! Even though there are only a few, they teach the robot just as well as all the other blocks.

So, the robot learns faster and doesn't need a huge storage room for all the blocks. It's like making a super-summary of all the toys so the robot can learn the best way to play with less stuff! Isn't that neat?

arXiv:2603.14830v2 · Authors: Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi


Abstract: Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating costs of optimization and data storage. However, progress remains largely empirical. Mechanisms underlying the extraction of task-relevant information from the training process and the efficient encoding of such information into synthetic data points remain elusive. In this paper, we theoretically analyze practical algorithms of dataset distillation applied to the gradient-based training of two-layer neural networks with width $L$. By focusing on a non-linear task structure called multi-index model, we prove that the low-dimensional structure of the problem is efficiently encoded into the resulting distilled data. This dataset reproduces a model with high generalization ability for a required memory complexity of $\tilde{\Theta}(r^2d+L)$, where $d$ and $r$ are the input and intrinsic dimensions of the task. To the best of our knowledge, this is one of the first theoretical works that include a specific task structure, leverage its intrinsic dimensionality to quantify the compression rate and study dataset distillation implemented solely via gradient-based algorithms.
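To make the abstract's terms concrete, here is a minimal sketch in Python, not the paper's algorithm. It builds a toy multi-index task (labels depend on the $d$-dimensional input only through $r$ hidden directions), defines a two-layer network of width $L$, and evaluates the leading term $r^2d+L$ of the stated memory complexity. The ReLU activation, the product link function, the Gaussian inputs, and the specific values of `d`, `r`, `L`, and `n` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def multi_index_target(X, U, link):
    """Multi-index model: y = link(U^T x), so y depends on x only via r projections."""
    return link(X @ U)

def two_layer_net(X, W, b, a):
    """Two-layer network of width L = len(a); ReLU is an assumed activation."""
    return np.maximum(X @ W.T + b, 0.0) @ a

def memory_scale(r, d, L):
    """Leading term r^2 * d + L; constants and log factors of the bound are dropped."""
    return r * r * d + L

# Hypothetical sizes, chosen only for illustration.
d, r, L, n = 100, 2, 64, 10_000
rng = np.random.default_rng(0)

# Orthonormal basis of the hidden r-dimensional subspace.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

def link(z):
    # Hypothetical non-linear link function (product of the two projections).
    return z[:, 0] * z[:, 1]

X = rng.standard_normal((n, d))
y = multi_index_target(X, U, link)

# Projecting inputs onto span(U) leaves the labels unchanged:
# only the r-dimensional subspace carries task-relevant information.
y_proj = multi_index_target(X @ U @ U.T, U, link)
print(np.allclose(y, y_proj))

# Raw dataset size in floats versus the leading term of the bound.
print(n * d, memory_scale(r, d, L))
```

With these toy values the raw inputs occupy `n * d = 1,000,000` floats, while `r^2 * d + L` evaluates to 464. The comparison only illustrates how the bound scales with the intrinsic dimension $r$ rather than with the number of training points; the $\tilde{\Theta}$ notation hides constants and logarithmic factors, so it should not be read as an actual compression ratio.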

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2603.14830 [cs.LG]

(or arXiv:2603.14830v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.14830

arXiv-issued DOI via DataCite

Submission history

From: Yuri Kinoshita
[v1] Mon, 16 Mar 2026 05:14:34 UTC (302 KB)
[v2] Mon, 30 Mar 2026 13:52:03 UTC (291 KB)
