Do You See What I Am Pointing At? Gesture-Based Egocentric Video Question Answering
arXiv:2603.12533v2 Announce Type: replace Abstract: Understanding and answering questions based on a user's pointing gesture is essential for next-generation egocentric AI assistants. However, current Multimodal Large Language Models (MLLMs) struggle with such tasks due to the lack of gesture-rich data and their limited ability to infer fine-grained pointing intent from egocentric video. To address this, we introduce EgoPointVQA, a dataset and benchmark for gesture-grounded egocentric question answering, comprising 4000 synthetic and 400 real-world videos across multiple deictic reasoning task — Yura Choi, Roy Miles, Rolandos Alexandros Potamias, Ismail Elezi, Jiankang Deng, Stefanos Zafeiriou
View PDF HTML (experimental)
Abstract:Understanding and answering questions based on a user's pointing gesture is essential for next-generation egocentric AI assistants. However, current Multimodal Large Language Models (MLLMs) struggle with such tasks due to the lack of gesture-rich data and their limited ability to infer fine-grained pointing intent from egocentric video. To address this, we introduce EgoPointVQA, a dataset and benchmark for gesture-grounded egocentric question answering, comprising 4000 synthetic and 400 real-world videos across multiple deictic reasoning tasks. Built upon it, we further propose Hand Intent Tokens (HINT), which encodes tokens derived from 3D hand keypoints using an off-the-shelf reconstruction model and interleaves them with the model input to provide explicit spatial and temporal context for interpreting pointing intent. We show that our model outperforms others in different backbones and model sizes. In particular, HINT-14B achieves 68.1% accuracy, on average over 6 tasks, surpassing the state-of-the-art, InternVL3-14B, by 6.6%. To further facilitate the open research, we will release the code, model, and dataset. Project page: this https URL
Comments: Accepted to CVPR 2026
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.12533 [cs.CV]
(or arXiv:2603.12533v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.12533
arXiv-issued DOI via DataCite
Submission history
From: Yura Choi [view email] [v1] Fri, 13 Mar 2026 00:23:38 UTC (43,500 KB) [v2] Fri, 27 Mar 2026 20:16:51 UTC (43,499 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivGoogle's TurboQuant saves memory, but won't save us from DRAM-pricing hell
<h4>Chocolate Factory’s compression tech clears the way to cheaper AI inference, not more affordable memory</h4> <p>When Google unveiled <a target="_blank" rel="nofollow" href="https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/">TurboQuant</a>, an AI data compression technology that promises to slash the amount of memory required to serve models, many hoped it would help with a memory shortage that has seen prices triple since last year. Not so much.…</p>
Illinois Tech computer science researcher honored by IEEE Chicago Section - EurekAlert!
<a href="https://news.google.com/rss/articles/CBMiXEFVX3lxTE13OVpWMEk1Z3hlMkR2bHNBQ2dkazFwb3VqN3hCa29GWGJvSVlPa00zd2xUakRmYXFqQmc5OWU0eGl4a21FMDAwWUN2Q3p0M3FrbXBkNV8zN0cxaG1s?oc=5" target="_blank">Illinois Tech computer science researcher honored by IEEE Chicago Section</a> <font color="#6f6f6f">EurekAlert!</font>

My Journey to becoming a Quantum Engineer
<p>I have procrastinated on documenting this process for the longest time. But I think i am ready now (maybe). <br> Coming from a front end engineering background, I am fascinated by the work being done by the quantum engineers at IBM. I am not that great with maths and statistics but I believe anything can be learned with tons of practice and consistency. I want to use this platform to hold myself accountable (that is if i don't give up half way and delete all my posts. I'll try not to btw). </p> <p>This is an article describing <a href="https://www.ibm.com/think/topics/quantum-computing" rel="noopener noreferrer">what quantum computing is</a> and some of it's use cases.</p> <p>I became an IBM qiskit advocate late last year and I have been exposed to a lot of resources and networked a bun
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Illinois Tech computer science researcher honored by IEEE Chicago Section - EurekAlert!
<a href="https://news.google.com/rss/articles/CBMiXEFVX3lxTE13OVpWMEk1Z3hlMkR2bHNBQ2dkazFwb3VqN3hCa29GWGJvSVlPa00zd2xUakRmYXFqQmc5OWU0eGl4a21FMDAwWUN2Q3p0M3FrbXBkNV8zN0cxaG1s?oc=5" target="_blank">Illinois Tech computer science researcher honored by IEEE Chicago Section</a> <font color="#6f6f6f">EurekAlert!</font>
AI maps science papers to predict research trends two to three years ahead - Tech Xplore
<a href="https://news.google.com/rss/articles/CBMie0FVX3lxTE5aTkZYTWdaRDZwTXNRMldpMG1WZ1YzWDZTOHN5M183Z3A1ZTFYbnhEWTdPRmpvZnZFU0xodlRsNWxFaGxTcEpwalhJNmJpQWE5VjhaRS1tOXJIeTc5Z0JNblJ3dFd4WjRYZGJOX0NrWGt6ZmZJVTBpRm5wWQ?oc=5" target="_blank">AI maps science papers to predict research trends two to three years ahead</a> <font color="#6f6f6f">Tech Xplore</font>
AI inspires new research topics in materials science - Nanowerk
<a href="https://news.google.com/rss/articles/CBMiZ0FVX3lxTFBPWlJSM2ExeVQ3LVppTm45NHpEMW9YVkxscThCNDd2OVB0c3J1ZmVCbWNSZWZ0TjZwSzlOdEFXN2UtRk5LU1hxdXd4ZklldGxoM0FZSnhCd19PWkNHQ1ZRVDNwSHNUSk0?oc=5" target="_blank">AI inspires new research topics in materials science</a> <font color="#6f6f6f">Nanowerk</font>


Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!