
Decoupling Exploration and Policy Optimization: Uncertainty Guided Tree Search for Hard Exploration

arXiv, March 30, 2026

arXiv:2603.22273v2 (announce type: replace)

Authors: Zakaria Mhammedi, James Cohan

Abstract: The process of discovery requires active exploration -- the act of collecting new and informative data. However, efficient autonomous exploration remains a major unsolved problem. The dominant paradigm addresses this challenge by using Reinforcement Learning (RL) to train agents with intrinsic motivation, maximizing a composite objective of extrinsic and intrinsic rewards. We suggest that this approach incurs unnecessary overhead: while policy optimization is necessary for precise task execution, employing such machinery solely to expand state coverage may be inefficient. In this paper, we propose a new paradigm that explicitly separates exploration from exploitation and bypasses RL during the exploration phase. Our method uses a tree-search strategy inspired by the Go-With-The-Winner algorithm, paired with a measure of epistemic uncertainty to systematically drive exploration. By removing the overhead of policy optimization, our approach explores an order of magnitude more efficiently than standard intrinsic motivation baselines on hard Atari benchmarks. Further, we demonstrate that the discovered trajectories can be distilled into deployable policies using existing supervised backward learning algorithms, achieving state-of-the-art scores by a wide margin on Montezuma's Revenge, Pitfall!, and Venture without relying on domain-specific knowledge. Finally, we demonstrate the generality of our framework in high-dimensional continuous action spaces by solving the MuJoCo Adroit dexterous manipulation and AntMaze tasks in a sparse-reward setting, directly from image observations and without expert demonstrations or offline datasets. To the best of our knowledge, this has not been achieved before for the Adroit tasks.
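The abstract describes the exploration phase only at a high level, so the following is a minimal toy sketch of the general idea, not the authors' algorithm. It assumes a resettable simulator (here a hypothetical 1-D chain), uses visit counts as a crude stand-in for an epistemic uncertainty measure, and applies a Go-With-The-Winners style cloning step in which particles in rarely visited states are duplicated over the rest. All names and parameters (`step`, `explore`, `n_particles`, `rollout_len`, `clone`) are illustrative assumptions.

```python
import random
from collections import Counter


def step(state, action, n_states):
    """Toy deterministic chain: move left or right, clipped to [0, n_states - 1]."""
    return min(max(state + action, 0), n_states - 1)


def explore(n_states=50, n_particles=16, n_rounds=40, rollout_len=5,
            clone=True, seed=0):
    """Population-based exploration with no policy optimization.

    Returns a Counter mapping each visited state to its visit count.
    """
    rng = random.Random(seed)
    visits = Counter({0: 1})        # visit counts: a stand-in for uncertainty
    particles = [0] * n_particles   # every particle starts at state 0

    for _ in range(n_rounds):
        # 1) Expand: each particle takes a short random rollout.
        for i in range(n_particles):
            s = particles[i]
            for _ in range(rollout_len):
                s = step(s, rng.choice((-1, 1)), n_states)
                visits[s] += 1
            particles[i] = s

        if clone:
            # 2) Score: least-visited states are treated as most uncertain.
            ranked = sorted(particles, key=lambda s: visits[s])
            # 3) Go with the winners: copy the most uncertain half of the
            #    population over the rest (assumes the simulator can reset
            #    to an arbitrary previously seen state).
            winners = ranked[: max(1, n_particles // 2)]
            particles = [winners[i % len(winners)] for i in range(n_particles)]

    return visits


if __name__ == "__main__":
    with_cloning = explore(clone=True)
    random_only = explore(clone=False)
    print("states covered with cloning:", len(with_cloning))
    print("states covered with random rollouts only:", len(random_only))
```

Running the script prints the state coverage with and without the cloning step, which gives a rough feel for why reallocating particles toward uncertain states can widen coverage without training a policy. A realistic version would replace the visit counts with a learned uncertainty estimate and the chain with an actual environment.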

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2603.22273 [cs.LG]

(or arXiv:2603.22273v3 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.22273

arXiv-issued DOI via DataCite

Submission history

From: Zakaria Mhammedi
[v1] Mon, 23 Mar 2026 17:56:52 UTC (3,793 KB)
[v2] Fri, 27 Mar 2026 17:44:46 UTC (3,796 KB)
[v3] Mon, 30 Mar 2026 17:14:06 UTC (3,796 KB)

Original source: arXiv, https://arxiv.org/abs/2603.22273
