
StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation

arXiv · March 31, 2026

Authors: Yiran Shi, Dongqi Guo, Tianchen Zhao, Feng Gao, Liangzhi Shi, Chao Yu, ZhiJian Mo, Qihua Xiao, XiaoShuai Peng, Qingmin Liao, Yu Wang

Abstract: Vision-language-action (VLA) models have demonstrated exceptional performance in natural language-driven perception and control. However, their high computational cost poses significant efficiency challenges, particularly for resource-constrained edge platforms in real-world deployments. Moreover, because the stages of a VLA (observation, action generation, and execution) must proceed sequentially, each waiting for the preceding stage to complete, the system suffers from frequent halting and high latency. To address this, we conduct a systematic analysis to identify the challenges for fast and fluent generation, and propose enabling VLAs to asynchronously parallelize across VLA stages in a "streaming" manner. First, we eliminate the reliance on action chunking and adopt action flow matching, which learns the trajectory of action flows rather than denoising chunk-wise actions; this overlaps the latency of action generation and execution. Second, we design an action saliency-aware adaptive observation mechanism, thereby overlapping the latency of execution and observation. Without sacrificing performance, StreamingVLA achieves a $2.4\times$ latency speedup and reduces execution halting by $6.5\times$, improving the fluency of execution.
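To make the latency argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' method. It compares a strictly sequential observe/generate/execute loop with an idealized pipeline in which the three stages fully overlap. The stage durations, the `StageTimes` type, and all function names are hypothetical, chosen only for illustration; the sketch does not model action flow matching or the adaptive observation mechanism, and it is not intended to reproduce the reported $2.4\times$ or $6.5\times$ figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageTimes:
    """Per-step stage durations in milliseconds (hypothetical values)."""
    observe: int
    generate: int
    execute: int


def sequential_total(t: StageTimes, steps: int) -> int:
    """Every step waits for observation, then generation, then execution."""
    return steps * (t.observe + t.generate + t.execute)


def streaming_total(t: StageTimes, steps: int) -> int:
    """Idealized three-stage pipeline (an assumption, not the paper's model).

    The first step pays the full latency to fill the pipeline; every later
    step is limited only by the slowest stage.
    """
    fill = t.observe + t.generate + t.execute
    bottleneck = max(t.observe, t.generate, t.execute)
    return fill + (steps - 1) * bottleneck


def halting_time(total: int, t: StageTimes, steps: int) -> int:
    """Time during which the robot is not executing any action."""
    return total - steps * t.execute


if __name__ == "__main__":
    times = StageTimes(observe=30, generate=50, execute=40)  # illustrative only
    steps = 100
    seq = sequential_total(times, steps)
    stream = streaming_total(times, steps)
    print("sequential total (ms):", seq)
    print("streaming total (ms):", stream)
    print("sequential halting (ms):", halting_time(seq, times, steps))
    print("streaming halting (ms):", halting_time(stream, times, steps))
```

Under these assumed timings the pipeline is bounded by the slowest stage, which is why overlapping stages reduces both total latency and the time the robot spends idle between actions.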

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.28565 [cs.RO]

(or arXiv:2603.28565v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2603.28565

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yiran Shi
[v1] Mon, 30 Mar 2026 15:23:27 UTC (4,298 KB)
