A Semantic Observer Layer for Autonomous Vehicles: Pre-Deployment Feasibility Study of VLMs for Low-Latency Anomaly Detection
arXiv:2603.28888v1 Announce Type: new
Abstract: Semantic anomalies, context-dependent hazards that pixel-level detectors cannot reason about, pose a critical safety risk in autonomous driving. We propose a \emph{semantic observer layer}: a quantized vision-language model (VLM) running at 1--2 Hz alongside the primary AV control loop, monitoring for semantic edge cases and triggering fail-safe handoffs when they are detected. Using Nvidia Cosmos-Reason1-7B with NVFP4 quantization and FlashAttention2, we achieve ~500 ms inference, a ~50x speedup over the unoptimized FP16 baseline (no quantization, standard PyTorch attention) on the same hardware, satisfying the observer timing budget. We benchmark accuracy, latency, and quantization behavior in static and video conditions, identify NF4 recall collapse (10.6%) as a hard deployment constraint, and present a hazard analysis mapping performance metrics to safety goals. The results establish a pre-deployment feasibility case for the semantic observer architecture on embodied-AI AV platforms.
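The observer architecture described in the abstract, a slow semantic monitor polling alongside a fast control loop and handing off when an anomaly (or a blown timing budget) is detected, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `classify_frame` is a hypothetical stand-in for the quantized Cosmos-Reason1-7B call, and `trigger_failsafe` for the AV stack's handoff mechanism; only the 1--2 Hz rate and ~500 ms budget come from the abstract.

```python
import time

# Figures stated in the abstract: ~500 ms inference at a 1-2 Hz observer rate.
OBSERVER_BUDGET_MS = 500.0
OBSERVER_PERIOD_S = 1.0  # 1 Hz, the lower bound of the stated range

def observer_step(classify_frame, frame, budget_ms=OBSERVER_BUDGET_MS):
    """Run one VLM anomaly check and report whether it met the timing budget.

    `classify_frame` is a placeholder for the quantized VLM inference call;
    it returns True when a semantic anomaly is detected in the frame.
    """
    start = time.perf_counter()
    anomaly = classify_frame(frame)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return anomaly, latency_ms, latency_ms <= budget_ms

def supervise(frames, classify_frame, trigger_failsafe):
    """Observer loop: poll frames at the observer rate and hand off to the
    fail-safe on a detected anomaly, or on a missed deadline, which a safety
    monitor should itself treat as a fault."""
    for frame in frames:
        anomaly, latency_ms, on_time = observer_step(classify_frame, frame)
        if anomaly or not on_time:
            trigger_failsafe(frame, anomaly, latency_ms)
            return False  # handoff issued; supervision stops here
        # Sleep off the remainder of the observer period before the next poll.
        time.sleep(max(0.0, OBSERVER_PERIOD_S - latency_ms / 1000.0))
    return True
```

Treating a missed deadline the same as a detected anomaly is one defensible reading of the paper's hazard-analysis framing (a silent observer is an unsafe observer), but the actual fail-safe policy is a design choice the abstract does not specify.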
Subjects:
Robotics (cs.RO)
Cite as: arXiv:2603.28888 [cs.RO]
(or arXiv:2603.28888v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2603.28888
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Aliasghar Arab
[v1] Mon, 30 Mar 2026 18:14:03 UTC (2,393 KB)