
$AutoDrive\text{-}P^3$: Unified Chain of Perception-Prediction-Planning Thought via Reinforcement Fine-Tuning

arXiv · March 31, 2026

Authors: Yuqi Ye, Zijian Zhang, Junhong Lin, Shangkun Sun, Changhao Peng, Wei Gao

arXiv:2603.28116v1 · Announce Type: cross


Abstract: Vision-language models (VLMs) are increasingly being adopted for end-to-end autonomous driving systems due to their exceptional performance in handling long-tail scenarios. However, current VLM-based approaches suffer from two major limitations: 1) Some VLMs directly output planning results without chain-of-thought (CoT) reasoning, bypassing the crucial perception and prediction stages, which creates a significant domain gap and compromises decision-making capability; 2) Other VLMs can generate outputs for perception, prediction, and planning tasks but employ a fragmented decision-making approach in which these modules operate separately, leading to a significant lack of synergy that undermines true planning performance. To address these limitations, we propose ${AutoDrive\text{-}P^3}$, a novel framework that seamlessly integrates $\textbf{P}$erception, $\textbf{P}$rediction, and $\textbf{P}$lanning through structured reasoning. We introduce the ${P^3\text{-}CoT}$ dataset to facilitate coherent reasoning and propose ${P^3\text{-}GRPO}$, a hierarchical reinforcement learning algorithm that provides progressive supervision across all three tasks. Specifically, ${AutoDrive\text{-}P^3}$ progressively generates CoT reasoning and answers for perception, prediction, and planning: perception provides essential information for the subsequent prediction and planning stages, and perception and prediction together inform the final planning decisions, enabling safer and more interpretable autonomous driving. Additionally, to balance inference efficiency with performance, we introduce dual thinking modes: detailed thinking and fast thinking. Extensive experiments on both open-loop (nuScenes) and closed-loop (NAVSIMv1/v2) benchmarks demonstrate that our approach achieves state-of-the-art performance in planning tasks. Code is available at this https URL.
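The abstract does not say how ${P^3\text{-}GRPO}$ combines supervision across the three tasks, so the following is only an illustrative sketch, not the authors' algorithm. It shows the group-relative advantage normalization commonly associated with GRPO, applied to a hypothetical weighted sum of per-stage rewards. The stage weights, the reward values, and the function names `combined_reward` and `group_relative_advantages` are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical stage weights; the abstract gives no values for these.
STAGE_WEIGHTS = {"perception": 0.2, "prediction": 0.3, "planning": 0.5}


def combined_reward(stage_rewards, weights=STAGE_WEIGHTS):
    """Weighted sum of per-stage rewards for one sampled response (an assumption)."""
    return sum(weights[stage] * stage_rewards[stage] for stage in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same driving prompt, with made-up stage scores.
group = [
    {"perception": 1.0, "prediction": 1.0, "planning": 1.0},
    {"perception": 1.0, "prediction": 0.0, "planning": 0.0},
    {"perception": 0.0, "prediction": 0.0, "planning": 0.0},
]
rewards = [combined_reward(sample) for sample in group]
advantages = group_relative_advantages(rewards)
```

In this toy group, the response that scores well on all three stages receives a positive advantage, while the two weaker responses receive negative ones. A response that gets only perception right still ranks above one that gets nothing right, which is one simple way to picture supervision that rewards progress through the perception, prediction, and planning chain.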

Comments: Accepted at ICLR 2026 (International Conference on Learning Representations)

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28116 [cs.RO]

(or arXiv:2603.28116v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2603.28116

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yuqi Ye
[v1] Mon, 30 Mar 2026 07:28:41 UTC (11,136 KB)
