
SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving

arXiv cs.CV · by Wenchao Sun, Xuewu Lin, Keyu Chen, Zixiang Pei, Xiang Li, Yining Shi, Sifa Zheng · April 1, 2026


arXiv:2603.29163v1 Announce Type: new


Abstract: End-to-end multi-modal planning has been widely adopted to model the uncertainty of driving behavior, typically by scoring candidate trajectories and selecting the optimal one. Existing approaches generally fall into two categories: scoring a large static trajectory vocabulary, or scoring a small set of dynamically generated proposals. While static vocabularies often suffer from coarse discretization of the action space, dynamic proposals provide finer-grained precision and have shown stronger empirical performance on existing benchmarks. However, it remains unclear whether dynamic generation is fundamentally necessary, or whether static vocabularies can already achieve comparable performance when they are sufficiently dense to cover the action space. In this work, we start with a systematic scaling study of Hydra-MDP, a representative scoring-based method, revealing that performance consistently improves as trajectory anchors become denser, without exhibiting saturation before computational constraints are reached. Motivated by this observation, we propose SparseDriveV2 to push the performance boundary of scoring-based planning through two complementary innovations: (1) a scalable vocabulary representation with a factorized structure that decomposes trajectories into geometric paths and velocity profiles, enabling combinatorial coverage of the action space, and (2) a scalable scoring strategy with coarse factorized scoring over paths and velocity profiles followed by fine-grained scoring on a small set of composed trajectories. By combining these two techniques, SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM, with 89.15 Driving Score and 70.00 Success Rate on Bench2Drive with a lightweight ResNet-34 as backbone. Code and model are released at this https URL.
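The two ideas named in the abstract, a factorized path × velocity vocabulary and coarse-to-fine scoring, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `compose_trajectory` and `coarse_to_fine_select`, the arc-length interpolation used to combine a path with a velocity profile, the time step `dt`, the top-k sizes, and the `fine_score_fn` callback are all assumptions made for illustration only.

```python
import numpy as np


def compose_trajectory(path_xy, speeds, dt=0.5):
    """Hypothetical composition: walk along a polyline path at per-step speeds."""
    # Arc length at each path vertex.
    seg = np.linalg.norm(np.diff(path_xy, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Distance travelled by the end of each time step.
    dist = np.cumsum(np.asarray(speeds, dtype=float) * dt)
    # Interpolate x and y along the path (np.interp clamps past the path end).
    x = np.interp(dist, arc, path_xy[:, 0])
    y = np.interp(dist, arc, path_xy[:, 1])
    return np.stack([x, y], axis=1)


def coarse_to_fine_select(path_scores, vel_scores, fine_score_fn,
                          k_paths=4, k_vels=4):
    """Assumed two-stage selection over a factorized vocabulary."""
    # Coarse stage: paths and velocity profiles are ranked independently,
    # so only N_p + N_v scores are needed instead of N_p * N_v.
    top_paths = np.argsort(path_scores)[::-1][:k_paths]
    top_vels = np.argsort(vel_scores)[::-1][:k_vels]
    # Compose only the shortlisted factors into full candidates.
    candidates = [(int(p), int(v)) for p in top_paths for v in top_vels]
    # Fine stage: score each composed candidate jointly.
    fine = np.array([fine_score_fn(p, v) for p, v in candidates])
    return candidates[int(np.argmax(fine))], len(candidates)
```

Under these assumptions, a vocabulary of N_p paths and N_v velocity profiles covers N_p × N_v composed trajectories, while the coarse stage evaluates only N_p + N_v items and the fine stage evaluates at most `k_paths * k_vels` compositions. How SparseDriveV2 actually composes and scores trajectories is described in the paper itself, not here.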

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.29163 [cs.CV]

(or arXiv:2603.29163v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.29163

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Wenchao Sun
[v1] Tue, 31 Mar 2026 02:20:40 UTC (13,501 KB)

Original source

arXiv cs.CV

https://arxiv.org/abs/2603.29163
