
Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version

arXiv, March 31, 2026

arXiv:2603.27874v1 (announce type: new). Authors: Masoud S. Sakha, Rushikesh Kamalapurkar, Sean Meyn


Abstract: Relative temporal-difference (TD) learning was introduced to mitigate the slow convergence of TD methods when the discount factor approaches one by subtracting a baseline from the temporal-difference update. While this idea has been studied in the tabular setting, stability guarantees with function approximation remain poorly understood. This paper analyzes relative TD learning with linear function approximation. We establish stability conditions for the algorithm and show that the choice of baseline distribution plays a central role. In particular, when the baseline is chosen as the empirical distribution of the state-action process, the algorithm is stable for any non-negative baseline weight and any discount factor. We also provide a sensitivity analysis of the resulting parameter estimates, characterizing both asymptotic bias and covariance. The asymptotic covariance and asymptotic bias are shown to remain uniformly bounded as the discount factor approaches one.
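
The abstract describes the method only in words, so the following is a minimal sketch of what a relative TD(0) update with linear function approximation might look like, not the authors' algorithm. The update form, the step-size schedule, the function name `relative_td0`, and the toy three-state chain are all assumptions made for illustration. The baseline is taken as the running average of observed features, standing in for the empirical distribution, and `delta` plays the role of the non-negative baseline weight.

```python
import numpy as np

def relative_td0(P, cost, features, gamma=0.99, delta=1.0, n_steps=5000, seed=0):
    """Hypothetical relative TD(0) with linear function approximation.

    Assumed update (an illustration, not taken from the paper):
        d_n   = c(X_n) + gamma * theta @ psi(X_{n+1}) - theta @ psi(X_n)
                - delta * theta @ psi_bar_n
        theta <- theta + a_n * d_n * psi(X_n)
    psi_bar_n is the running average of observed features (empirical baseline).
    Setting delta = 0 recovers ordinary TD(0).
    """
    rng = np.random.default_rng(seed)
    n_states, dim = features.shape
    theta = np.zeros(dim)
    psi_bar = np.zeros(dim)
    x = 0
    for n in range(1, n_steps + 1):
        x_next = rng.choice(n_states, p=P[x])
        psi, psi_next = features[x], features[x_next]
        # Running average of features: empirical-distribution baseline.
        psi_bar += (psi - psi_bar) / n
        baseline = delta * (theta @ psi_bar)
        td_error = cost[x] + gamma * (theta @ psi_next) - theta @ psi - baseline
        step = 10.0 / (n + 100)  # assumed vanishing step size
        theta += step * td_error * psi
        x = x_next
    return theta

# Toy three-state Markov chain with two features per state (all values assumed).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
cost = np.array([1.0, 0.5, 2.0])
features = np.array([[1.0, 0.0],
                     [0.5, 0.5],
                     [0.0, 1.0]])

theta = relative_td0(P, cost, features, gamma=0.99, delta=1.0)
theta_plain = relative_td0(P, cost, features, gamma=0.99, delta=0.0)
```

Comparing `theta` with `theta_plain` for discount factors close to one gives a rough feel for the effect of the baseline term in this toy setting; it does not reproduce the paper's bias or covariance results.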

Comments: Extended version for submission to the 2026 IEEE CDC

Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)

MSC classes: 68T05, 93E35, 62L20, 93E20

Cite as: arXiv:2603.27874 [cs.LG]

(or arXiv:2603.27874v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27874

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Sean Meyn. [v1] Sun, 29 Mar 2026 21:19:19 UTC (820 KB)
