
StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos

arXiv · March 30, 2026

Authors: Daeun Lee, Subhojyoti Mukherjee, Branislav Kveton, Ryan A. Rossi, Viet Dac Lai, Seunghyun Yoon, Trung Bui, Franck Dernoncourt, Mohit Bansal


Abstract: Streaming video understanding requires models not only to process temporally incoming frames, but also to anticipate user intention for realistic applications such as Augmented Reality (AR) glasses. While prior streaming benchmarks evaluate temporal reasoning, none measure whether Multimodal Large Language Models (MLLMs) can interpret or leverage human gaze signals within a streaming setting. To fill this gap, we introduce StreamGaze, the first benchmark designed to evaluate how effectively MLLMs utilize gaze for temporal and proactive reasoning in streaming videos. StreamGaze introduces gaze-guided past, present, and proactive tasks that comprehensively assess streaming video understanding. These tasks evaluate whether models can use real-time gaze signals to follow shifting attention and infer user intentions based only on past and currently observed frames. To build StreamGaze, we develop a gaze-video Question Answering (QA) generation pipeline that aligns egocentric videos with raw gaze trajectories through fixation extraction, region-specific visual prompting, and scanpath construction. This pipeline produces spatio-temporally grounded QA pairs that reflect human perceptual dynamics. Across all StreamGaze tasks, we observe substantial performance gaps between state-of-the-art MLLMs and human performance, highlighting key limitations in gaze-based temporal reasoning, intention modeling, and proactive prediction. We further provide detailed analyses of gaze prompting strategies, reasoning behaviors, and task-specific failure modes, offering insights into current limitations and directions for future research. All data and code are publicly available to support continued research in gaze-guided streaming video understanding.
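The abstract names fixation extraction and scanpath construction as pipeline steps but does not say which algorithms are used. The following is a minimal sketch of what such steps could look like, assuming dispersion-threshold identification (I-DT), a standard eye-tracking technique that may differ from the authors' method. The names `Fixation`, `extract_fixations`, and `scanpath_until`, the `(t_ms, x, y)` sample layout, and both threshold defaults are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# One raw gaze sample: (timestamp in ms, x, y), with x and y in normalized
# image coordinates. This layout is an assumption, not the benchmark's format.
Sample = Tuple[int, float, float]


@dataclass
class Fixation:
    start_ms: int  # timestamp of the first sample in the fixation
    end_ms: int    # timestamp of the last sample in the fixation
    x: float       # centroid x of the samples
    y: float       # centroid y of the samples


def _dispersion(window: List[Sample]) -> float:
    """Spread of a window: (max x - min x) + (max y - min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def extract_fixations(
    samples: List[Sample],
    max_dispersion: float = 0.05,  # illustrative threshold
    min_duration_ms: int = 100,    # illustrative threshold
) -> List[Fixation]:
    """Group time-sorted gaze samples into fixations with I-DT."""
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i < n:
        # Find the first index j whose timestamp is at least
        # min_duration_ms after sample i.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough remaining data for another fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the gaze stays tightly clustered.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # sample i belongs to a saccade; slide forward
    return fixations


def scanpath_until(fixations: List[Fixation], t_now_ms: int) -> List[Fixation]:
    """Ordered fixations that have already ended at time t_now_ms.

    This mirrors the streaming constraint in the abstract: only past and
    currently observed data may be used, never future gaze.
    """
    return [f for f in fixations if f.end_ms <= t_now_ms]
```

In this sketch, a question posed at time `t_now_ms` would be paired with `scanpath_until(fixations, t_now_ms)`, so later fixations never leak into the context given to the model.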

Comments: Accepted to CVPR 2026, Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

Cite as: arXiv:2512.01707 [cs.CV]

(or arXiv:2512.01707v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2512.01707

arXiv-issued DOI via DataCite

Submission history

From: Daeun Lee
[v1] Mon, 1 Dec 2025 14:15:44 UTC (4,693 KB)
[v2] Fri, 27 Mar 2026 17:30:08 UTC (4,748 KB)
