Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessGeopolitics, AI, and Cybersecurity: Insights From RSAC 2026Dark ReadingThis International Fact-Checking Day, use these 5 tips to spot AI-generated contentFast Company TechFilevine Emphasizes Ethical AI and Autonomous Systems in Legal Tech Strategy - TipRanksGNews AI ethicsPriced Out by AI: The Memory Chip Crisis Hitting Every ConsumerHacker News AI TopShow HN: AgentDog – Open-source dashboard for monitoring local AI agentsHacker News AI TopAI Enforcement Accelerates as Federal Policy Stalls and States Step In - Morgan LewisGNews AI USAGemma 4 and Qwen3.5 on shared benchmarksReddit r/LocalLLaMA[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on BlackwellReddit r/MachineLearningThe energy and environmental impact of AI and how it undermines democracy - greenpeace.orgGNews AI energyShow HN: A TUI for checking and comparing cloud and AI pricingHacker News AI TopAttorney General Pam Bondi pushed outAxios TechShow HN: Screenbox – Self-hosted virtual desktops for AI agentsHacker News AI TopBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessGeopolitics, AI, and Cybersecurity: Insights From RSAC 2026Dark ReadingThis International Fact-Checking Day, use these 5 tips to spot AI-generated contentFast Company TechFilevine Emphasizes Ethical AI and Autonomous Systems in Legal Tech Strategy - TipRanksGNews AI ethicsPriced Out by AI: The Memory Chip Crisis Hitting Every ConsumerHacker News AI TopShow HN: AgentDog – Open-source dashboard for monitoring local AI agentsHacker News AI TopAI Enforcement Accelerates as Federal Policy Stalls and States Step In - Morgan LewisGNews AI USAGemma 4 and Qwen3.5 on shared benchmarksReddit r/LocalLLaMA[P] Gemma 4 running on NVIDIA B200 and AMD MI355X from the same inference stack, 15% throughput gain over vLLM on BlackwellReddit r/MachineLearningThe energy and environmental impact of AI and how it undermines democracy - greenpeace.orgGNews AI energyShow HN: A TUI for checking and comparing cloud and AI pricingHacker News AI TopAttorney General Pam Bondi pushed outAxios TechShow HN: Screenbox – Self-hosted virtual desktops for AI agentsHacker News AI Top
AI NEWS HUBbyEIGENVECTOREigenvector

DreamerAD: Efficient Reinforcement Learning via Latent World Model for Autonomous Driving

arXivby [Submitted on 25 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2)]April 2, 20262 min read1 views
Source Quiz

arXiv:2603.24587v2 Announce Type: replace Abstract: We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1 - achieving 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame) that prevents high-frequency RL interaction. Ou — Pengxuan Yang, Yupeng Zheng, Deheng Qian, Zebin Xing, Qichao Zhang, Linbo Wang, Yichen Zhang, Shaoyu Guo, Zhongpu Xia, Qiang Chen, Junyu Han, Lingyun Xu, Yifeng Pan, Dongbin Zhao

Authors:Pengxuan Yang, Yupeng Zheng, Deheng Qian, Zebin Xing, Qichao Zhang, Linbo Wang, Yichen Zhang, Shaoyu Guo, Zhongpu Xia, Qiang Chen, Junyu Han, Lingyun Xu, Yifeng Pan, Dongbin Zhao

View PDF HTML (experimental)

Abstract:We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning for autonomous driving by compressing diffusion sampling from 100 steps to 1 - achieving 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. While existing pixel-level diffusion world models enable safe imagination-based training, they suffer from multi-step diffusion inference latency (2s/frame) that prevents high-frequency RL interaction. Our approach leverages denoised latent features from video generation models through three key mechanisms: (1) shortcut forcing that reduces sampling complexity via recursive multi-resolution step compression, (2) an autoregressive dense reward model operating directly on latent representations for fine-grained credit assignment, and (3) Gaussian vocabulary sampling for GRPO that constrains exploration to physically plausible trajectories. DreamerAD achieves 87.7 EPDMS on NavSim v2, establishing state-of-the-art performance and demonstrating that latent-space RL is effective for autonomous driving.

Comments: authors update

Subjects:

Machine Learning (cs.LG); Robotics (cs.RO)

Cite as: arXiv:2603.24587 [cs.LG]

(or arXiv:2603.24587v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.24587

arXiv-issued DOI via DataCite

Submission history

From: Pengxuan Yang [view email] [v1] Wed, 25 Mar 2026 17:57:09 UTC (22,255 KB) [v2] Wed, 1 Apr 2026 13:02:07 UTC (22,255 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

Knowledge Map

Knowledge Map
TopicsEntitiesSource
DreamerAD: …researchpaperarxivmachine-lea…deep-learni…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 98 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!