Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets

arXiv · [Submitted on 26 Mar 2026] · March 30, 2026

arXiv:2603.25946v1 Announce Type: cross

Authors: Alex Koran, Dimitrios Sinodinos, Hadi Hojjati, Takuya Nanri, Fangge Chen, Narges Armanfard

Abstract: High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by the low driving scores on the CARLA Leaderboard. Despite collision-related infractions being the dominant failure mode in closed-loop evaluations, collision-aware representation learning has received limited attention. To address this gap, we first develop a Video-Language-Augmented Anomaly Detector (VLAAD), leveraging a Multiple Instance Learning (MIL) formulation to obtain stable, temporally localized collision signals for proactive prediction. To transition these capabilities into closed-loop simulations, we must overcome the limitations of existing simulator datasets, which lack multimodality and are frequently restricted to simple intersection scenarios. Therefore, we introduce CARLA-Collide, a large-scale multimodal dataset capturing realistic collision events across highly diverse road networks. Trained on this diverse simulator data, VLAAD serves as a collision-aware plug-in module that can be seamlessly integrated into existing E2E driving models. By integrating our module into a pretrained TransFuser++ agent, we demonstrate a 14.12% relative increase in driving score with minimal fine-tuning. Beyond closed-loop evaluation, we further assess the generalization capability of VLAAD in an open-loop setting using real-world driving data. To support this analysis, we introduce Real-Collide, a multimodal dataset of diverse dashcam videos paired with semantically rich annotations for collision detection and prediction. On this benchmark, despite containing only 0.6B parameters, VLAAD outperforms a multi-billion-parameter vision-language model, achieving a 23.3% improvement in AUC.
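The abstract names a Multiple Instance Learning (MIL) formulation but does not spell it out. As a rough illustration of the general idea only, the sketch below treats each video as a "bag" of per-segment scores, pools them into one video-level score, and applies a hinge-style ranking loss so that a collision video scores above a normal one. This is the generic MIL ranking setup, not VLAAD's actual objective; the function names, the top-k pooling, the margin, the threshold, and the example scores are all assumptions made for illustration.

```python
from typing import List, Sequence


def bag_score(segment_scores: Sequence[float], k: int = 1) -> float:
    """Pool per-segment scores into one video-level score (mean of the top-k)."""
    if not segment_scores:
        raise ValueError("a bag needs at least one segment score")
    k = max(1, min(k, len(segment_scores)))
    top = sorted(segment_scores, reverse=True)[:k]
    return sum(top) / k


def mil_ranking_loss(
    collision_bag: Sequence[float],
    normal_bag: Sequence[float],
    margin: float = 1.0,
    k: int = 1,
) -> float:
    """Hinge loss that is zero once the collision bag outscores the normal bag by `margin`."""
    return max(0.0, margin - bag_score(collision_bag, k) + bag_score(normal_bag, k))


def localize(segment_scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of segments whose score reaches the (assumed) threshold."""
    return [i for i, s in enumerate(segment_scores) if s >= threshold]


# Hypothetical per-segment scores; only video-level labels are assumed known.
collision_video = [0.1, 0.2, 0.9, 0.8]
normal_video = [0.1, 0.3, 0.2, 0.1]

loss = mil_ranking_loss(collision_video, normal_video)
flagged = localize(collision_video)
```

Only video-level labels enter the loss, yet the per-segment scores remain available afterwards, which is what allows a collision signal to be localized in time.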

Comments: 33 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2603.25946 [cs.CV]

(or arXiv:2603.25946v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.25946

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hadi Hojjati
[v1] Thu, 26 Mar 2026 22:32:52 UTC (4,020 KB)
