Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning
arXiv:2510.25311v2 Announce Type: replace-cross Abstract: Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states, while maximizing the expected return which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochas — Sagalpreet Singh, Rishi Saket, Aravindan Raghuveer
View PDF HTML (experimental)
Abstract:Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or few reward sources. However, in many natural situations, it is desirable to learn a policy that induces a dispersed marginal state distribution over rewarding states, while maximizing the expected return which is typically tied to reaching a goal state. This aspect remains relatively unexplored. Existing techniques based on entropy regularization and intrinsic rewards use stochasticity for encouraging exploration to find an optimal policy which may not necessarily lead to dispersed marginal state distribution over rewarding states. Other RL algorithms which match a target distribution assume the latter to be available apriori. This may be infeasible in large scale systems where enumeration of all states is not possible and a state is determined to be a goal state only upon reaching it. We formalize the problem of maximizing the expected return while uniformly visiting the goal states as Multi Goal RL in which an oracle classifier over the state space determines the goal states. We propose a novel algorithm that learns a high-return policy mixture with marginal state distribution dispersed over the set of goal states. Our algorithm is based on optimizing a custom RL reward which is computed - based on the current policy mixture - at each iteration for a set of sampled trajectories. The latter are used via an offline RL algorithm to update the policy mixture. We prove performance guarantees for our algorithm, showing efficient convergence bounds for optimizing a natural objective which captures the expected return as well as the dispersion of the marginal state distribution over the goal states. We design and perform experiments on synthetic MDPs and standard RL environments to evaluate the effectiveness of our algorithm.
Comments: 27 pages, 6 figures
Subjects:
Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2510.25311 [cs.LG]
(or arXiv:2510.25311v2 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2510.25311
arXiv-issued DOI via DataCite
Submission history
From: Sagalpreet Singh [view email] [v1] Wed, 29 Oct 2025 09:23:21 UTC (353 KB) [v2] Sat, 28 Mar 2026 14:07:59 UTC (413 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxiv
We're running an AI-authored research workshop for Northeast India's 200+ languages - and publishing everything openly
<p>At MWire Labs, we build language technology for Northeast India's indigenous languages - ASR, MT, OCR, LLMs. The region has 200+ languages. Almost none of them exist in mainstream AI datasets.<br> So we're doing something a bit unusual.</p> <p>NortheastGenAI 2026 is a virtual workshop on May 29 where every submission must be AI-generated or AI-assisted - with full disclosure of how. All reviews are AI-assisted too, followed by a human editorial check. Everything is public on OpenReview. Inspired by Agents4Science 2025 (Stanford).</p> <p>We're not claiming AI research is ready. We're asking the question openly and publishing whatever comes out.</p> <p>*<em>Three tracks:<br> *</em><br> Language, Culture & Heritage<br> Society, History & Anthropology<br> AI and Technology for NE In
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers

Iran’s Revolutionary Guards just named 18 US tech firms as military targets. The age of the civilian data centre is over.
At 8pm Tehran time on Tuesday, a new kind of front line was drawn, not through desert terrain or along a disputed border, but through the server farms, cloud regions, and corporate campuses of America’s largest technology companies. The Islamic Revolutionary Guard Corps published a statement on its official Sepah News channel naming 18 US […] This story continues at The Next Web
Real-time speech-to-speech translation - research.google
<a href="https://news.google.com/rss/articles/CBMid0FVX3lxTFAxeFFhNVhOTjVXeEhXeGFHOXE3WENYeGFISjlpVGNueGtDS2ZZTEVsZHh6dkhLc191aFFYNEpMUUxraV9uTWF6YW1RcF9VTFlIZDBuQTlpbkhBRnJxU1FuTGY4aEtFc2FEaWMxekxUTnlzV3dFN1ow?oc=5" target="_blank">Real-time speech-to-speech translation</a> <font color="#6f6f6f">research.google</font>


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!