
Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs

arXiv, March 25, 2026

Guy Zamir, Matthew Zurek, Yudong Chen


Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high "burn-in" costs and failing to adapt to benign instance-specific complexity. In this work, we address these shortcomings for two infinite-horizon objectives: the classical average-reward regret and the $\gamma$-regret. We develop a single tractable UCB-style algorithm applicable to both settings, which achieves the first optimal variance-dependent regret guarantees. Our regret bounds in both settings take the form $\tilde{O}(\sqrt{SA\,\text{Var}} + \text{lower-order terms})$, where $S,A$ are the state and action space sizes, and $\text{Var}$ captures cumulative transition variance. This implies minimax-optimal average-reward and $\gamma$-regret bounds in the worst case, while also adapting to easier problem instances, for example yielding nearly constant regret in deterministic MDPs. Furthermore, our algorithm enjoys significantly improved lower-order terms for the average-reward setting. With prior knowledge of the optimal bias span $\Vert h^\star\Vert_\text{sp}$, our algorithm obtains lower-order terms scaling as $\Vert h^\star\Vert_\text{sp} S^2 A$, which we prove is optimal in both $\Vert h^\star\Vert_\text{sp}$ and $A$. Without prior knowledge, we prove that no algorithm can have lower-order terms smaller than $\Vert h^\star \Vert_\text{sp}^2 S A$, and we provide a prior-free algorithm whose lower-order terms scale as $\Vert h^\star\Vert_\text{sp}^2 S^3 A$, nearly matching this lower bound. Taken together, these results completely characterize the optimal dependence on $\Vert h^\star\Vert_\text{sp}$ in both leading and lower-order terms, and reveal a fundamental gap in what is achievable with and without prior knowledge.
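The quantities named in the abstract can be made concrete on a toy tabular MDP. The sketch below is not the authors' UCB-style learning algorithm; it assumes the MDP is fully known and uses relative value iteration (a standard planning method, swapped in here only for illustration) to compute the optimal gain $\rho^\star$ and a bias vector $h^\star$, then evaluates the span $\Vert h^\star\Vert_\text{sp}$, the average-reward regret $T\rho^\star - \sum_t r_t$, and a cumulative variance term. The exact definition of $\text{Var}$ is given in the paper; here we assume, as is common in variance-aware analyses, that it is $\sum_t \mathrm{Var}_{P(\cdot\mid s_t,a_t)}(h^\star)$ over visited state-action pairs. All function names and the two-state example are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical tabular encoding: P[s][a][s2] = Pr(s2 | s, a), R[s][a] = reward in [0, 1].
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def relative_value_iteration(P: Transitions, R: Rewards, ref: int = 0,
                             iters: int = 10_000, tol: float = 1e-12
                             ) -> Tuple[float, List[float]]:
    """Return (rho*, h*) with h*(ref) = 0, computed by relative value iteration.

    Assumes a communicating, aperiodic MDP (self-loops help) so that RVI converges.
    """
    S = len(P)
    h = [0.0] * S
    rho = 0.0
    for _ in range(iters):
        # Bellman optimality backup: (T h)(s) = max_a [ r(s,a) + sum_s2 P(s2|s,a) h(s2) ].
        Th = [max(R[s][a] + sum(p * h[s2] for s2, p in enumerate(P[s][a]))
                  for a in range(len(P[s])))
              for s in range(S)]
        rho = Th[ref]                  # h(ref) == 0, so the backup at ref estimates rho*
        new_h = [v - rho for v in Th]  # re-anchor so that h(ref) stays 0
        converged = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    return rho, h


def span(h: Sequence[float]) -> float:
    """Span seminorm ||h||_sp = max_s h(s) - min_s h(s)."""
    return max(h) - min(h)


def next_state_variance(p: Sequence[float], h: Sequence[float]) -> float:
    """Variance of h(s2) when s2 is drawn from the distribution p."""
    mean = sum(pi * hi for pi, hi in zip(p, h))
    return sum(pi * (hi - mean) ** 2 for pi, hi in zip(p, h))


def cumulative_variance(P: Transitions, h: Sequence[float],
                        trajectory: Sequence[Tuple[int, int]]) -> float:
    """ASSUMED form of Var: sum over visited (s_t, a_t) of Var_{P(.|s_t,a_t)}(h*)."""
    return sum(next_state_variance(P[s][a], h) for s, a in trajectory)


def average_reward_regret(rho_star: float, rewards: Sequence[float]) -> float:
    """Average-reward regret after T steps: T * rho* - sum_t r_t."""
    return len(rewards) * rho_star - sum(rewards)


# Toy 2-state, 2-action MDP. In state 0, action 0 idles (reward 0) and action 1 moves to
# state 1. In state 1, action 0 earns reward 1 and action 1 returns to state 0 (reward 0).
R_TOY: Rewards = [[0.0, 0.0], [1.0, 0.0]]
P_DET: Transitions = [[[1.0, 0.0], [0.0, 1.0]],
                      [[0.0, 1.0], [1.0, 0.0]]]    # deterministic: earning keeps you in state 1
P_STOCH: Transitions = [[[1.0, 0.0], [0.0, 1.0]],
                        [[0.1, 0.9], [1.0, 0.0]]]  # earning slips back to state 0 w.p. 0.1

if __name__ == "__main__":
    trajectory = [(0, 1), (1, 0), (1, 0), (1, 0)]  # a path that is feasible in both MDPs
    rewards = [R_TOY[s][a] for s, a in trajectory]
    for name, P in [("deterministic", P_DET), ("stochastic", P_STOCH)]:
        rho, h = relative_value_iteration(P, R_TOY)
        print(f"{name}: rho*={rho:.4f}, span(h*)={span(h):.4f}, "
              f"Var={cumulative_variance(P, h, trajectory):.4f}, "
              f"regret={average_reward_regret(rho, rewards):.4f}")
```

Because every next-state distribution in `P_DET` is a point mass, the assumed variance term is exactly zero there, which is consistent with the abstract's remark that the leading $\sqrt{SA\,\text{Var}}$ term vanishes for deterministic MDPs and only lower-order terms remain. Relative value iteration is used only because it is short; it requires a known, aperiodic MDP, whereas the paper's algorithm must learn from samples online.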

Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Optimization and Control (math.OC); Machine Learning (stat.ML)

Cite as: arXiv:2603.23926 [cs.LG]

(or arXiv:2603.23926v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.23926

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Guy Zamir [v1] Wed, 25 Mar 2026 04:34:19 UTC (56 KB)
