Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching

arXiv [Submitted on 27 Mar 2026] · March 31, 2026

Authors: Andrea Fraschini, Davide Tenedini, Riccardo Zamboni, Mirco Mutti, Marcello Restelli

Abstract: Deep Reinforcement Learning (DRL) is widely recognized as sample-inefficient, a limitation attributable in part to the high dimensionality and substantial functional redundancy inherent to the policy parameter space. A recent framework, which we refer to as Action-based Policy Compression (APC), mitigates this issue by compressing the parameter space $\Theta$ into a low-dimensional latent manifold $\mathcal Z$ using a learned generative mapping $g:\mathcal Z \to \Theta$. However, its performance is severely constrained by relying on immediate action-matching as a reconstruction loss, a myopic proxy for behavioral similarity that suffers from compounding errors across sequential decisions. To overcome this bottleneck, we introduce Occupancy-based Policy Compression (OPC), which enhances APC by shifting behavior representation from immediate action-matching to long-horizon state-space coverage. Specifically, we propose two principal improvements: (1) we curate the dataset generation with an information-theoretic uniqueness metric that delivers a diverse population of policies; and (2) we propose a fully differentiable compression objective that directly minimizes the divergence between the true and reconstructed mixture occupancy distributions. These modifications force the generative model to organize the latent space around true functional similarity, promoting a latent representation that generalizes over a broad spectrum of behaviors while retaining most of the original parameter space's expressivity. Finally, we empirically validate the advantages of our contributions across multiple continuous control benchmarks.
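The abstract's contrast between immediate action-matching and long-horizon state occupancy can be illustrated with a small tabular toy. This sketch is not the paper's method: OPC targets continuous control with a differentiable objective over mixture occupancies, whereas here the five-state chain MDP, the helper names (`chain_mdp`, `state_occupancy`, `action_mismatch`), the discount factor, and the choice of KL divergence are all assumptions made for illustration. Two policies each differ from a reference policy in exactly one state, so an action-matching score rates them as equally close, yet their discounted state-occupancy distributions differ by very different amounts.

```python
import numpy as np

def chain_mdp(n_states=5):
    """Hypothetical deterministic chain: action 0 moves left, action 1 moves right."""
    P = np.zeros((n_states, 2, n_states))
    for s in range(n_states):
        P[s, 0, max(s - 1, 0)] = 1.0
        P[s, 1, min(s + 1, n_states - 1)] = 1.0
    return P

def deterministic_policy(actions):
    """One-hot action probabilities, shape (S, 2)."""
    return np.eye(2)[np.asarray(actions)]

def state_occupancy(P, policy, mu0, gamma=0.95):
    """Discounted state occupancy d = (1 - gamma) * mu0^T (I - gamma P_pi)^{-1}."""
    n_states = P.shape[0]
    # State-to-state kernel under the policy: P_pi[s, t] = sum_a pi(a|s) P[s, a, t]
    P_pi = np.einsum("sa,sat->st", policy, P)
    # Solve the transposed linear system instead of forming an explicit inverse.
    return (1.0 - gamma) * np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, mu0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with clipping so zero-probability states stay finite."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def action_mismatch(pi_a, pi_b):
    """Mean total-variation distance between action distributions, uniform over states."""
    return float(0.5 * np.abs(pi_a - pi_b).sum(axis=1).mean())

P = chain_mdp()
mu0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # always start in state 0

reference = deterministic_policy([1, 1, 1, 1, 1])    # always move right
early_change = deterministic_policy([0, 1, 1, 1, 1])  # differs only in the start state
late_change = deterministic_policy([1, 1, 1, 1, 0])   # differs only in the last state

d_ref = state_occupancy(P, reference, mu0)
d_early = state_occupancy(P, early_change, mu0)
d_late = state_occupancy(P, late_change, mu0)

mismatch_early = action_mismatch(reference, early_change)
mismatch_late = action_mismatch(reference, late_change)
kl_early = kl_divergence(d_ref, d_early)
kl_late = kl_divergence(d_ref, d_late)

print("action mismatch:", mismatch_early, mismatch_late)
print("occupancy KL:   ", kl_early, kl_late)
```

In this toy, the policy that deviates in the start state never leaves it, so its occupancy collapses onto a single state, while the policy that deviates in the last state still visits the whole chain. A per-state action loss cannot tell the two apart; an occupancy-based divergence can, which is the intuition behind replacing action-matching with occupancy matching as a reconstruction signal.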

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.27044 [cs.LG]

(or arXiv:2603.27044v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27044

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Davide Tenedini
[v1] Fri, 27 Mar 2026 23:16:27 UTC (42,368 KB)

Original source

arXiv

https://arxiv.org/abs/2603.27044
