AI News Hub by Eigenvector

Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching

arXiv · Submitted on 27 Mar 2026 · March 31, 2026 · 2 min read

arXiv:2603.27044v1 · Announce Type: cross · Authors: Andrea Fraschini, Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli


Abstract: Deep Reinforcement Learning (DRL) is widely recognized as sample-inefficient, a limitation attributable in part to the high dimensionality and substantial functional redundancy inherent to the policy parameter space. A recent framework, which we refer to as Action-based Policy Compression (APC), mitigates this issue by compressing the parameter space $\Theta$ into a low-dimensional latent manifold $\mathcal{Z}$ using a learned generative mapping $g: \mathcal{Z} \to \Theta$. However, its performance is severely constrained by relying on immediate action-matching as a reconstruction loss, a myopic proxy for behavioral similarity that suffers from compounding errors across sequential decisions. To overcome this bottleneck, we introduce Occupancy-based Policy Compression (OPC), which enhances APC by shifting behavior representation from immediate action-matching to long-horizon state-space coverage. Specifically, we propose two principal improvements: (1) we curate the dataset generation with an information-theoretic uniqueness metric that delivers a diverse population of policies; and (2) we propose a fully differentiable compression objective that directly minimizes the divergence between the true and reconstructed mixture occupancy distributions. These modifications force the generative model to organize the latent space around true functional similarity, promoting a latent representation that generalizes over a broad spectrum of behaviors while retaining most of the original parameter space's expressivity. Finally, we empirically validate the advantages of our contributions across multiple continuous control benchmarks.
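To make the contrast between immediate action-matching and occupancy-matching concrete, here is a toy sketch in Python. It is not the authors' method and omits the learned generative mapping $g$, the uniqueness metric, and all gradient computation. It assumes a hypothetical six-state deterministic chain MDP, tabular policies given by the probability of moving right in each state, a uniformly weighted squared error over states as a stand-in for an action-matching loss, the discounted state occupancy $d_\pi(s) = (1-\gamma)\sum_t \gamma^t \Pr(s_t = s \mid \pi)$ truncated at a finite horizon, a uniform mixture over a policy population, and KL divergence as the divergence. The names `occupancy`, `mixture_occupancy`, `action_matching_loss`, and `kl_divergence` are illustrative only.

```python
import math

# Toy setup (hypothetical, not from the paper): a deterministic chain MDP with
# N_STATES states and two actions, 0 = move left, 1 = move right.
N_STATES = 6
GAMMA = 0.9
HORIZON = 200  # truncation point for the discounted sum


def step(state: int, action: int) -> int:
    """Deterministic transition; moves are clipped at both ends of the chain."""
    if action == 1:
        return min(state + 1, N_STATES - 1)
    return max(state - 1, 0)


def occupancy(policy: list[float], start: int = 0) -> list[float]:
    """Truncated discounted occupancy d(s) = (1 - gamma) * sum_t gamma^t P(s_t = s).

    policy[s] is the probability of choosing action 1 (move right) in state s.
    The state distribution is propagated exactly, so no sampling is involved.
    """
    dist = [0.0] * N_STATES
    dist[start] = 1.0
    occ = [0.0] * N_STATES
    for t in range(HORIZON):
        weight = (1 - GAMMA) * GAMMA**t
        for s in range(N_STATES):
            occ[s] += weight * dist[s]
        nxt = [0.0] * N_STATES
        for s, p in enumerate(dist):
            if p == 0.0:
                continue  # unreached states contribute nothing
            nxt[step(s, 1)] += p * policy[s]
            nxt[step(s, 0)] += p * (1.0 - policy[s])
        dist = nxt
    return occ


def mixture_occupancy(policies: list[list[float]]) -> list[float]:
    """Uniform mixture (an assumption) of the occupancies of a policy population."""
    occs = [occupancy(pol) for pol in policies]
    return [sum(o[s] for o in occs) / len(occs) for s in range(N_STATES)]


def action_matching_loss(pi_true: list[float], pi_rec: list[float]) -> float:
    """Stand-in for an action-matching loss: squared error of action
    probabilities, averaged uniformly over states (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(pi_true, pi_rec)) / N_STATES


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q) if pk > 0)


if __name__ == "__main__":
    true_pi = [0.0] * N_STATES                # always move left: stays in state 0
    recon_a = [0.5] + [0.0] * (N_STATES - 1)  # error in the one state it visits
    recon_b = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0]  # same-size error in an unreached state

    d_true = occupancy(true_pi)
    for name, rec in [("recon_a", recon_a), ("recon_b", recon_b)]:
        print(
            name,
            "action loss:", round(action_matching_loss(true_pi, rec), 4),
            "occupancy KL:", round(kl_divergence(d_true, occupancy(rec)), 4),
        )

    # Population-level comparison, in the spirit of matching mixture occupancies.
    population = [true_pi, [1.0] * N_STATES]
    reconstructed = [recon_a, [1.0] * N_STATES]
    mix_kl = kl_divergence(mixture_occupancy(population), mixture_occupancy(reconstructed))
    print("mixture KL:", round(mix_kl, 4))
```

Both reconstructions incur exactly the same action-matching loss, but only `recon_a`, whose error sits in the state the policy actually occupies, changes the occupancy distribution; `recon_b` deviates only in a state the policy never reaches, so its occupancy KL is zero. An occupancy-based comparison therefore weights errors by where the policy actually goes, which is the intuition behind moving from immediate action-matching to state-space coverage.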

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27044 [cs.LG]

(or arXiv:2603.27044v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27044

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Davide Tenedini
[v1] Fri, 27 Mar 2026 23:16:27 UTC (42,368 KB)
