Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessMCMC Island Hopping: An Intuitive Guide to the Metropolis-Hastings AlgorithmDEV CommunityOracle cut thousands of jobs in recent round of layoffs – CNBCSilicon RepublicAnthropic admits partial leak of Claude Code source, says no customer data exposed - Storyboard18Google News: ClaudeHow to Make Your WooCommerce Store Discoverable by ChatGPT (And Convert That Traffic)DEV Community38 Commits, Zero New Features — How I Made My Web App Production-ReadyDEV CommunityLWiAI Podcast #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention ResidualsLast Week in AIThe Leaked 'Employee-Grade' CLAUDE.md: How to Use It TodayDEV CommunityCanal+ Names Anne‑Laure Tingry Chief Data & AI Officer - The Hollywood ReporterGoogle News: AILouisiana scraps some, but not all, AI proposals after Trump threats - Louisiana IlluminatorGoogle News: AIAnthropic accidentally leaks Claude Code source in npm slipSilicon RepublicChina’s AI Is Spreading Fast. Here’s How to Stop the Security Risks - War on the RocksGoogle News: AI SafetyNH:STA S01E02 OpenPGP.jsDEV CommunityBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessMCMC Island Hopping: An Intuitive Guide to the Metropolis-Hastings AlgorithmDEV CommunityOracle cut thousands of jobs in recent round of layoffs – CNBCSilicon RepublicAnthropic admits partial leak of Claude Code source, says no customer data exposed - Storyboard18Google News: ClaudeHow to Make Your WooCommerce Store Discoverable by ChatGPT (And Convert That Traffic)DEV Community38 Commits, Zero New Features — How I Made My Web App Production-ReadyDEV CommunityLWiAI Podcast #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention ResidualsLast Week in AIThe Leaked 'Employee-Grade' CLAUDE.md: How to Use It TodayDEV CommunityCanal+ Names Anne‑Laure Tingry Chief Data & AI Officer - The Hollywood ReporterGoogle News: AILouisiana scraps some, but not all, AI proposals after Trump threats - Louisiana IlluminatorGoogle News: AIAnthropic accidentally leaks Claude Code source in npm slipSilicon RepublicChina’s AI Is Spreading Fast. Here’s How to Stop the Security Risks - War on the RocksGoogle News: AI SafetyNH:STA S01E02 OpenPGP.jsDEV Community

Diagnosing Non-Markovian Observations in Reinforcement Learning via Prediction-Based Violation Scoring

arXivMarch 31, 202610 min read0 views
Source Quiz

arXiv:2603.27389v1 Announce Type: cross Abstract: Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without diagnostic tools for such violations. This paper introduces a prediction-based scoring method that quantifies non-Markovian structure in observation trajectories. A random forest first removes nonlinear Markov-compliant — Naveen Mysore

View PDF HTML (experimental)

Abstract:Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without diagnostic tools for such violations. This paper introduces a prediction-based scoring method that quantifies non-Markovian structure in observation trajectories. A random forest first removes nonlinear Markov-compliant dynamics; ridge regression then tests whether historical observations reduce prediction error on the residuals beyond what the current observation provides. The resulting score is bounded in [0, 1] and requires no causal graph construction. Evaluation spans six environments (CartPole, Pendulum, Acrobot, HalfCheetah, Hopper, Walker2d), three algorithms (PPO, A2C, SAC), controlled AR(1) noise at six intensity levels, and 10 seeds per condition. In post-hoc detection, 7 of 16 environment-algorithm pairs, primarily high-dimensional locomotion tasks, show significant positive monotonicity between noise intensity and the violation score (Spearman rho up to 0.78, confirmed under repeated-measures analysis); under training-time noise, 13 of 16 pairs exhibit statistically significant reward degradation. An inversion phenomenon is documented in low-dimensional environments where the random forest absorbs the noise signal, causing the score to decrease as true violations grow, a failure mode analyzed in detail. A practical utility experiment demonstrates that the proposed score correctly identifies partial observability and guides architecture selection, fully recovering performance lost to non-Markovian observations. Source code to reproduce all results is provided at this https URL.

Comments: 15 pages, 3 figures, 5 tables. Under review at RLC 2026

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

MSC classes: 68T05, 62M10

ACM classes: I.2.6; I.2.8; G.3

Cite as: arXiv:2603.27389 [cs.LG]

(or arXiv:2603.27389v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.27389

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Naveen Mysore [view email] [v1] Sat, 28 Mar 2026 19:42:27 UTC (67 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Diagnosing …researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 166 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers