
Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

arXiv, March 26, 2026

Authors: Jiajun Hu, Nuria Armengol Urpi, Jin Cheng


Abstract: Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset, and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study $\textit{online}$ zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a regularization critic. FB-MEBE promotes exploration by maximizing the entropy of the achieved behavior distribution. Additionally, a regularization critic shapes the recovered policies toward more natural and physically plausible behaviors. We empirically demonstrate that FB-MEBE achieves improved performance compared to other exploration strategies in a range of simulated downstream tasks, and that it renders natural policies that can be seamlessly deployed to hardware without further finetuning. Videos and code available on our website.
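For readers unfamiliar with the Forward-Backward framework that the abstract builds on, the sketch below illustrates how an FB agent can recover a policy for a new reward at test time, as commonly described in the FB literature: a task vector is estimated as the reward-weighted average of backward embeddings, z_r ≈ mean of r(s) B(s), and the policy acts greedily on F(s, a, z_r)ᵀ z_r. This is not FB-MEBE itself. The random linear maps `W_B` and `W_F`, the dimensions, the discrete action set, the rescaling of z to norm sqrt(d), and the toy reward are all assumptions made for illustration; a real quadruped controller would use trained networks and continuous actions.

```python
import math
import random

random.seed(0)
STATE_DIM, Z_DIM, NUM_ACTIONS = 4, 8, 3  # illustrative sizes (assumption)

def rand_matrix(rows, cols):
    """Random Gaussian matrix stored as a list of rows."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Hypothetical stand-ins for the learned B and F networks (random linear maps).
W_B = rand_matrix(STATE_DIM, Z_DIM)
W_F = [rand_matrix(STATE_DIM + Z_DIM, Z_DIM) for _ in range(NUM_ACTIONS)]

def matvec_t(x, W):
    """Return x^T W for a vector x and a matrix W given as a list of rows."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def backward(state):
    """B(s): embed a state into the Z_DIM-dimensional task space."""
    return matvec_t(state, W_B)

def forward(state, action, z):
    """F(s, a, z): one linear map per discrete action, applied to [s, z]."""
    return matvec_t(state + z, W_F[action])

def infer_task_vector(states, rewards):
    """Estimate z_r as the mean of r(s) * B(s), rescaled to norm sqrt(Z_DIM)."""
    z = [0.0] * Z_DIM
    for s, r in zip(states, rewards):
        for j, b in enumerate(backward(s)):
            z[j] += r * b / len(states)
    norm = math.sqrt(sum(v * v for v in z)) + 1e-8
    return [math.sqrt(Z_DIM) * v / norm for v in z]

def greedy_action(state, z):
    """pi_z(s) = argmax over actions of F(s, a, z)^T z."""
    def q_value(a):
        return sum(f * v for f, v in zip(forward(state, a, z), z))
    return max(range(NUM_ACTIONS), key=q_value)

# Reward-labeled states for a toy task: reward equals the first state coordinate.
states = [[random.gauss(0.0, 1.0) for _ in range(STATE_DIM)] for _ in range(256)]
rewards = [s[0] for s in states]

z_r = infer_task_vector(states, rewards)
action = greedy_action(states[0], z_r)
print(len(z_r), action)
```

No gradient step is taken once the reward is known, which is what "zero-shot" refers to here. The quality of z_r and of the resulting policy depends on how well the states seen during pretraining cover the relevant behaviors, which is the exploration problem the abstract addresses.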

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.25464 [cs.LG]

(or arXiv:2603.25464v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.25464

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Núria Armengol Urpí [v1] Thu, 26 Mar 2026 14:07:01 UTC (4,767 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25464v1
