
On the Hardness of Reinforcement Learning with Transition Look-Ahead

arXiv, March 31, 2026

arXiv:2510.19372v2 Announce Type: replace-cross. Authors: Corentin Pla, Hugo Richard, Marc Abeille, Nadav Merlis, Vianney Perchet


Abstract: We study reinforcement learning (RL) with transition look-ahead, where the agent may observe which states would be visited upon playing any sequence of $\ell$ actions before deciding its course of action. While such predictive information can drastically improve the achievable performance, we show that using this information optimally comes at a potentially prohibitive computational cost. Specifically, we prove that optimal planning with one-step look-ahead ($\ell=1$) can be solved in polynomial time through a novel linear programming formulation. In contrast, for $\ell \geq 2$, the problem becomes NP-hard. Our results delineate a precise boundary between tractable and intractable cases for the problem of planning with transition look-ahead in reinforcement learning.
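To make the one-step setting ($\ell=1$) concrete, here is a minimal sketch of finite-horizon planning with one-step transition look-ahead. It rests on assumptions the abstract does not state: a small tabular MDP, deterministic rewards $r(s,a)$, and look-ahead next states $s'_a$ drawn independently for each action from $P(\cdot \mid s,a)$. Under those assumptions the value satisfies $V_h(s) = \mathbb{E}\big[\max_a \big(r(s,a) + V_{h+1}(s'_a)\big)\big]$, whereas without look-ahead the max sits outside the expectation. The function names `plan_no_lookahead` and `plan_one_step_lookahead` and the toy MDP are hypothetical. This is a brute-force dynamic program for illustration, not the authors' linear programming formulation.

```python
import itertools


def plan_no_lookahead(P, r, H):
    """Standard finite-horizon value iteration: the agent commits to an
    action before seeing where it leads. P[s][a][t] = Pr(next = t | s, a)."""
    n_states, n_actions = len(r), len(r[0])
    V = [0.0] * n_states  # terminal values V_H = 0
    for _ in range(H):
        V = [
            max(
                r[s][a] + sum(P[s][a][t] * V[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
    return V


def plan_one_step_lookahead(P, r, H):
    """One-step transition look-ahead, ASSUMING the next state of each
    action is sampled independently from P[s][a] (hypothetical model)."""
    n_states, n_actions = len(r), len(r[0])
    V = [0.0] * n_states  # terminal values V_H = 0
    for _ in range(H):
        new_V = []
        for s in range(n_states):
            expected = 0.0
            # Enumerate every joint look-ahead outcome (s'_0, ..., s'_{A-1}).
            for outcome in itertools.product(range(n_states), repeat=n_actions):
                prob = 1.0
                for a, t in enumerate(outcome):
                    prob *= P[s][a][t]
                if prob == 0.0:
                    continue
                # Having observed where each action leads, pick the best one.
                best = max(r[s][a] + V[outcome[a]] for a in range(n_actions))
                expected += prob * best
            new_V.append(expected)
        V = new_V
    return V


# Hypothetical toy MDP: 2 states, 2 actions, reward 1 in state 1,
# and every action moves to a uniformly random state.
P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
r = [[0.0, 0.0], [1.0, 1.0]]
print(plan_no_lookahead(P, r, H=2))        # [0.5, 1.5]
print(plan_one_step_lookahead(P, r, H=2))  # [0.75, 1.75]
```

On this toy MDP, seeing where each action leads raises the two-step value from state 0 from 0.5 to 0.75, a small instance of the performance gain the abstract describes. The enumeration costs $|S|^{|A|}$ outcomes per state; under the independence assumption, the expected maximum could instead be computed from products of per-action CDFs, so the brute force here is only for transparency.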

Subjects:

Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2510.19372 [stat.ML]

(or arXiv:2510.19372v2 [stat.ML] for this version)

https://doi.org/10.48550/arXiv.2510.19372

arXiv-issued DOI via DataCite

Submission history

From: Corentin Pla

[v1] Wed, 22 Oct 2025 08:47:18 UTC (139 KB)

[v2] Sat, 28 Mar 2026 15:01:01 UTC (84 KB)

Original source

arXiv

https://arxiv.org/abs/2510.19372




