
Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs

arXiv, March 25, 2026

Authors: Guy Zamir, Matthew Zurek, Yudong Chen


Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high "burn-in" costs and failing to adapt to benign instance-specific complexity. In this work, we address these shortcomings for two infinite-horizon objectives: the classical average-reward regret and the $\gamma$-regret. We develop a single tractable UCB-style algorithm applicable to both settings, which achieves the first optimal variance-dependent regret guarantees. Our regret bounds in both settings take the form $\tilde{O}(\sqrt{SA\,\text{Var}} + \text{lower-order terms})$, where $S,A$ are the state and action space sizes, and $\text{Var}$ captures cumulative transition variance. This implies minimax-optimal average-reward and $\gamma$-regret bounds in the worst case but also adapts to easier problem instances, for example yielding nearly constant regret in deterministic MDPs. Furthermore, our algorithm enjoys significantly improved lower-order terms for the average-reward setting. With prior knowledge of the optimal bias span $\Vert h^\star\Vert_{\text{sp}}$, our algorithm obtains lower-order terms scaling as $\Vert h^\star\Vert_{\text{sp}} S^2 A$, which we prove is optimal in both $\Vert h^\star\Vert_{\text{sp}}$ and $A$. Without prior knowledge, we prove that no algorithm can have lower-order terms smaller than $\Vert h^\star\Vert_{\text{sp}}^2 S A$, and we provide a prior-free algorithm whose lower-order terms scale as $\Vert h^\star\Vert_{\text{sp}}^2 S^3 A$, nearly matching this lower bound. Taken together, these results completely characterize the optimal dependence on $\Vert h^\star\Vert_{\text{sp}}$ in both leading and lower-order terms, and reveal a fundamental gap in what is achievable with and without prior knowledge.
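To make the variance-dependent leading term concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not define $\text{Var}$ precisely, so the sketch assumes it is the sum, over visited state-action pairs, of the one-step variance of a value-like vector `h` under the transition kernel. The function names, the toy kernels `P_det` and `P_unif`, the vector `h`, and the trajectory are all hypothetical choices made for illustration.

```python
import numpy as np


def one_step_variance(P, h):
    """Variance of h(s') for s' ~ P[s, a, :], computed for every (s, a).

    P has shape (S, A, S) and h has shape (S,); the result has shape (S, A).
    """
    mean = P @ h             # E[h(s')] for each (s, a)
    second = P @ (h ** 2)    # E[h(s')^2] for each (s, a)
    # Clip tiny negative values caused by floating-point round-off.
    return np.maximum(second - mean ** 2, 0.0)


def cumulative_variance(P, h, trajectory):
    """Sum one-step variances along visited (state, action) pairs.

    ASSUMPTION: this stands in for the abstract's Var; the paper's exact
    definition may differ.
    """
    var = one_step_variance(P, h)
    return float(sum(var[s, a] for s, a in trajectory))


def leading_term(S, A, cum_var):
    """sqrt(S * A * Var), ignoring constants and logarithmic factors."""
    return float(np.sqrt(S * A * cum_var))


S, A = 2, 2
h = np.array([0.0, 3.0])  # hypothetical value-like vector

# Deterministic toy kernel: action a always leads to state a.
P_det = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        P_det[s, a, a] = 1.0

# Fully random toy kernel: every (s, a) leads to a uniform next state.
P_unif = np.full((S, A, S), 0.5)

# Hypothetical trajectory of 300 visited (state, action) pairs.
trajectory = [(0, 0), (1, 1), (0, 1)] * 100

var_det = cumulative_variance(P_det, h, trajectory)
var_unif = cumulative_variance(P_unif, h, trajectory)

print("deterministic:", var_det, leading_term(S, A, var_det))
print("uniform:      ", var_unif, leading_term(S, A, var_unif))
```

Under these assumptions the deterministic kernel has zero cumulative variance, so the $\sqrt{SA\,\text{Var}}$ term vanishes and only the lower-order terms remain, which is consistent with the abstract's claim of nearly constant regret in deterministic MDPs. The uniform kernel accumulates variance linearly in the number of steps, which recovers worst-case $\sqrt{T}$-type growth of the leading term.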

Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)

Cite as: arXiv:2603.23926 [cs.LG]

(or arXiv:2603.23926v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.23926

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Guy Zamir
[v1] Wed, 25 Mar 2026 04:34:19 UTC (56 KB)

Original source

arXiv

https://arxiv.org/abs/2603.23926v1
