Diagnosing Non-Markovian Observations in Reinforcement Learning via Prediction-Based Violation Scoring
Naveen Mysore
Abstract: Reinforcement learning algorithms assume that observations satisfy the Markov property, yet real-world sensors frequently violate this assumption through correlated noise, latency, or partial observability. Standard performance metrics conflate Markov breakdowns with other sources of suboptimality, leaving practitioners without diagnostic tools for such violations. This paper introduces a prediction-based scoring method that quantifies non-Markovian structure in observation trajectories. A random forest first removes nonlinear Markov-compliant dynamics; ridge regression then tests whether historical observations reduce prediction error on the residuals beyond what the current observation provides. The resulting score is bounded in [0, 1] and requires no causal graph construction. Evaluation spans six environments (CartPole, Pendulum, Acrobot, HalfCheetah, Hopper, Walker2d), three algorithms (PPO, A2C, SAC), controlled AR(1) noise at six intensity levels, and 10 seeds per condition. In post-hoc detection, 7 of 16 environment-algorithm pairs, primarily high-dimensional locomotion tasks, show significant positive monotonicity between noise intensity and the violation score (Spearman rho up to 0.78, confirmed under repeated-measures analysis); under training-time noise, 13 of 16 pairs exhibit statistically significant reward degradation. An inversion phenomenon is documented in low-dimensional environments where the random forest absorbs the noise signal, causing the score to decrease as true violations grow, a failure mode analyzed in detail. A practical utility experiment demonstrates that the proposed score correctly identifies partial observability and guides architecture selection, fully recovering performance lost to non-Markovian observations. Source code to reproduce all results is provided at this https URL.
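The two-stage procedure described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the exact score formula, so the normalization used here (one minus the ratio of held-out residual errors, clipped to [0, 1]), the chronological three-way split, the hyperparameters, and the names `violation_score`, `ar1_noise`, and `simulate_oscillator` are all assumptions made for illustration. The sketch uses NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def ar1_noise(n_steps, n_dims, phi, sigma, rng):
    """AR(1) noise: e_t = phi * e_{t-1} + sigma * w_t, with w_t ~ N(0, I)."""
    noise = np.zeros((n_steps, n_dims))
    for t in range(1, n_steps):
        noise[t] = phi * noise[t - 1] + sigma * rng.standard_normal(n_dims)
    return noise


def simulate_oscillator(n_steps, rng, decay=0.98, omega=0.5):
    """Hypothetical toy system: damped 2-D rotation driven by process noise."""
    c, s = decay * np.cos(omega), decay * np.sin(omega)
    A = np.array([[c, -s], [s, c]])
    x = np.zeros((n_steps, 2))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + rng.standard_normal(2)
    return x


def violation_score(obs, history=3, alpha=1.0, seed=0):
    """Assumed score: share of held-out residual error removed by adding history."""
    obs = np.asarray(obs, dtype=float)
    if obs.ndim == 1:
        obs = obs[:, None]
    k = history
    current = obs[k:-1]                      # o_t
    target = obs[k + 1:]                     # o_{t+1}
    past = np.hstack([obs[k - j:-1 - j] for j in range(1, k + 1)])  # o_{t-1}..o_{t-k}
    with_history = np.hstack([current, past])

    # Chronological split: forest fit | ridge fit | ridge evaluation.
    n = len(current)
    a, b = n // 2, (3 * n) // 4
    m = b - a

    # Stage 1: random forest models o_t -> o_{t+1}; keep out-of-sample residuals.
    forest = RandomForestRegressor(
        n_estimators=50, min_samples_leaf=5, random_state=seed
    )
    y_fit = target[:a] if target.shape[1] > 1 else target[:a, 0]
    forest.fit(current[:a], y_fit)
    resid = target[a:] - forest.predict(current[a:]).reshape(n - a, -1)

    # Stage 2: ridge regression on the residuals, with and without history.
    def heldout_mse(features):
        model = Ridge(alpha=alpha).fit(features[a:b], resid[:m])
        return float(np.mean((resid[m:] - model.predict(features[b:])) ** 2))

    mse_current = heldout_mse(current)
    mse_history = heldout_mse(with_history)
    if mse_current <= 0.0:
        return 0.0
    return float(np.clip(1.0 - mse_history / mse_current, 0.0, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    states = simulate_oscillator(4000, rng)
    print("full state:", violation_score(states))
    print("position only:", violation_score(states[:, :1]))
    noisy = states + ar1_noise(len(states), 2, phi=0.9, sigma=1.0, rng=rng)
    print("full state + AR(1) noise:", violation_score(noisy))
```

In this toy setup the full state is Markov by construction, so its score should sit near zero, while hiding the second state dimension (a simple form of partial observability) should raise it noticeably. The AR(1) case is printed without a stated expectation, since the abstract itself reports that the score can invert in low-dimensional settings.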
Comments: 15 pages, 3 figures, 5 tables. Under review at RLC 2026
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
MSC classes: 68T05, 62M10
ACM classes: I.2.6; I.2.8; G.3
Cite as: arXiv:2603.27389 [cs.LG]
(or arXiv:2603.27389v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.27389
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Naveen Mysore
[v1] Sat, 28 Mar 2026 19:42:27 UTC (67 KB)