
Persistent Robot World Models: Stabilizing Multi-Step Rollouts via Reinforcement Learning

arXiv, March 26, 2026

Authors: Jai Bardhan, Patrik Drozdik, Josef Sivic


Abstract: Action-conditioned robot world models generate future video frames of the manipulated scene given a robot action sequence, offering a promising alternative for simulating tasks that are difficult to model with traditional physics engines. However, these models are optimized for short-term prediction and break down when deployed autoregressively: each predicted clip feeds back as context for the next, so errors compound and visual quality degrades rapidly. We address this through the following contributions. First, we introduce a reinforcement learning (RL) post-training scheme that trains the world model on its own autoregressive rollouts rather than on ground-truth histories. We achieve this by adapting a recent contrastive RL objective for diffusion models to our setting and show that its convergence guarantees carry over exactly. Second, we design a training protocol that generates and compares multiple candidate variable-length futures from the same rollout state, reinforcing higher-fidelity predictions over lower-fidelity ones. Third, we develop efficient, multi-view visual fidelity rewards that combine complementary perceptual metrics across camera views and are aggregated at the clip level for a dense, low-variance training signal. Fourth, we show that our approach establishes a new state of the art for rollout fidelity on the DROID dataset, outperforming the strongest baseline on all metrics (e.g., LPIPS reduced by 14% on external cameras, SSIM improved by 9.1% on the wrist camera), winning 98% of paired comparisons, and achieving an 80% preference rate in a blind human study.
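The second and third contributions can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify which metrics, weights, or aggregation rule the reward uses, so PSNR and a single-window SSIM stand in here as two complementary fidelity scores, the equal weights are arbitrary, and every name (`clip_reward`, `rank_candidates`, `noisy_copy`, the camera keys) is hypothetical. Frames are flattened grayscale lists so the example runs with the Python standard library only.

```python
import math
import random
from statistics import mean
from typing import Dict, List, Tuple

Frame = List[float]               # flattened grayscale frame, values in [0, 1]
Clip = List[Frame]                # consecutive frames from one camera
MultiViewClip = Dict[str, Clip]   # camera name -> clip


def psnr_score(pred: Frame, target: Frame) -> float:
    """PSNR in dB, capped at 100 and rescaled to [0, 1]."""
    mse = mean((p - t) * (p - t) for p, t in zip(pred, target))
    if mse == 0:
        return 1.0
    return min(100.0, 10.0 * math.log10(1.0 / mse)) / 100.0


def global_ssim(pred: Frame, target: Frame,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    """SSIM over one window covering the whole frame (a simplification)."""
    mu_p, mu_t = mean(pred), mean(target)
    var_p = mean((p - mu_p) * (p - mu_p) for p in pred)
    var_t = mean((t - mu_t) * (t - mu_t) for t in target)
    cov = mean((p - mu_p) * (t - mu_t) for p, t in zip(pred, target))
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p * mu_p + mu_t * mu_t + c1) * (var_p + var_t + c2)
    return num / den


def clip_reward(pred: MultiViewClip, target: MultiViewClip,
                w_psnr: float = 0.5, w_ssim: float = 0.5) -> float:
    """Mean over views of the mean per-frame score (assumed aggregation).

    zip() stops at the shorter clip, so a variable-length candidate is
    scored against the matching prefix of the ground-truth clip.
    """
    per_view = []
    for view, target_clip in target.items():
        frame_scores = [
            w_psnr * psnr_score(p, t) + w_ssim * global_ssim(p, t)
            for p, t in zip(pred[view], target_clip)
        ]
        per_view.append(mean(frame_scores))
    return mean(per_view)


def rank_candidates(candidates: List[MultiViewClip],
                    target: MultiViewClip) -> Tuple[int, int, List[float]]:
    """Return indices of the highest- and lowest-reward candidates."""
    rewards = [clip_reward(c, target) for c in candidates]
    best = max(range(len(rewards)), key=rewards.__getitem__)
    worst = min(range(len(rewards)), key=rewards.__getitem__)
    return best, worst, rewards


def noisy_copy(clip: MultiViewClip, sigma: float, n_frames: int,
               rng: random.Random) -> MultiViewClip:
    """Toy stand-in for a sampled future: ground truth plus Gaussian noise."""
    return {
        view: [[min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in frame]
               for frame in frames[:n_frames]]
        for view, frames in clip.items()
    }


# Toy ground truth: two camera views, four frames of 64 pixels each.
rng = random.Random(0)
target = {
    view: [[((i * 7 + t + k) % 10) / 10 for i in range(64)] for t in range(4)]
    for k, view in enumerate(["external", "wrist"])
}
good = noisy_copy(target, sigma=0.02, n_frames=4, rng=rng)
bad = noisy_copy(target, sigma=0.20, n_frames=3, rng=rng)

best, worst, rewards = rank_candidates([good, bad], target)
print(best, worst, rewards)
```

In an RL post-training loop of the kind the abstract describes, the best and worst candidates from one rollout state would form the preferred and dispreferred pair passed to the contrastive objective; that objective is outside the scope of this sketch.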

Comments: 34 pages, 11 figures, 12 tables

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.25685 [cs.RO]

(or arXiv:2603.25685v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2603.25685

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jai Bardhan

[v1] Thu, 26 Mar 2026 17:36:08 UTC (5,756 KB)

Original source

arXiv

https://arxiv.org/abs/2603.25685v1
