
Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning

arXiv, March 31, 2026 (10 min read)

arXiv:2510.25311v2. Announce Type: replace-cross. Authors: Sagalpreet Singh, Rishi Saket, Aravindan Raghuveer


Abstract: Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or a few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states while maximizing the expected return, which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochasticity to encourage exploration in search of an optimal policy, which may not necessarily lead to a dispersed marginal state distribution over rewarding states. Other RL algorithms that match a target distribution assume that the target is available a priori. This may be infeasible in large-scale systems where enumeration of all states is not possible and a state is determined to be a goal state only upon reaching it. We formalize the problem of maximizing the expected return while uniformly visiting the goal states as Multi Goal RL, in which an oracle classifier over the state space determines the goal states. We propose a novel algorithm that learns a high-return policy mixture with a marginal state distribution dispersed over the set of goal states. Our algorithm is based on optimizing a custom RL reward which is computed at each iteration, based on the current policy mixture, for a set of sampled trajectories. These trajectories are then used by an offline RL algorithm to update the policy mixture. We prove performance guarantees for our algorithm, showing efficient convergence bounds for optimizing a natural objective which captures the expected return as well as the dispersion of the marginal state distribution over the goal states. We design and perform experiments on synthetic MDPs and standard RL environments to evaluate the effectiveness of our algorithm.
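To make the idea of a "dispersed marginal state distribution over goal states" concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the custom reward or the offline RL update, so everything below is an illustrative assumption. The function names (`goal_visit_distribution`, `entropy`, `dispersion_bonus`), the use of Shannon entropy as the dispersion measure, and the negative-log bonus are all hypothetical choices. The only element taken from the abstract is the oracle classifier, modeled here as a callable `is_goal`.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Sequence

State = Hashable


def goal_visit_distribution(
    trajectories: Iterable[Sequence[State]],
    is_goal: Callable[[State], bool],
) -> Dict[State, float]:
    """Empirical distribution of visits restricted to goal states.

    `is_goal` stands in for the oracle classifier: a state is only known
    to be a goal once it has been reached and queried.
    """
    counts = Counter(s for traj in trajectories for s in traj if is_goal(s))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items()}


def entropy(dist: Dict[State, float]) -> float:
    """Shannon entropy (in nats); higher means more evenly spread visits."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def dispersion_bonus(
    state: State,
    dist: Dict[State, float],
    is_goal: Callable[[State], bool],
    eps: float = 1e-6,
) -> float:
    """Hypothetical per-state bonus: rarely visited goal states score higher.

    This is an assumed stand-in for a reward that depends on the current
    visitation distribution, not the reward defined in the paper.
    """
    if not is_goal(state):
        return 0.0
    return -math.log(dist.get(state, 0.0) + eps)


if __name__ == "__main__":
    goals = {5, 7, 9}                      # hypothetical goal set
    oracle = lambda s: s in goals          # hypothetical oracle classifier
    sampled = [[0, 1, 5], [0, 2, 5], [0, 3, 7]]
    dist = goal_visit_distribution(sampled, oracle)
    print(dist, entropy(dist))
    print({g: dispersion_bonus(g, dist, oracle) for g in sorted(goals)})
```

In this toy setup the never-visited goal receives the largest bonus and the most-visited goal the smallest, which is the qualitative behavior one would want from a reward that pushes a policy mixture toward uniform goal coverage.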

Comments: 27 pages, 6 figures

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2510.25311 [cs.LG] (or arXiv:2510.25311v2 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.25311 (arXiv-issued DOI via DataCite)

Submission history

From: Sagalpreet Singh
[v1] Wed, 29 Oct 2025 09:23:21 UTC (353 KB)
[v2] Sat, 28 Mar 2026 14:07:59 UTC (413 KB)

Original source: arXiv, https://arxiv.org/abs/2510.25311
