HandVQA: Diagnosing and Improving Fine-Grained Spatial Reasoning about Hands in Vision-Language Models

arXiv, March 30, 2026

arXiv:2603.26362v1 (Announce Type: new)

Authors: MD Khalequzzaman Chowdhury Sayem, Mubarrat Tajoar Chowdhury, Yihalem Yimolal Tiruneh, Muneeb A. Khan, Muhammad Salman Ali, Binod Bhattarai, Seungryul Baek

Abstract: Understanding the fine-grained articulation of human hands is critical in high-stakes settings such as robot-assisted surgery, chip manufacturing, and AR/VR-based human-AI interaction. Despite achieving near-human performance on general vision-language benchmarks, current vision-language models (VLMs) struggle with fine-grained spatial reasoning, especially when interpreting complex, articulated hand poses. We introduce HandVQA, a large-scale diagnostic benchmark designed to evaluate VLMs' understanding of detailed hand anatomy through visual question answering. Built upon high-quality 3D hand datasets (FreiHAND, InterHand2.6M, FPHA), our benchmark includes over 1.6M controlled multiple-choice questions that probe spatial relationships between hand joints, such as angles, distances, and relative positions. We evaluate several state-of-the-art VLMs (LLaVA, DeepSeek, and Qwen-VL) in both base and fine-tuned settings, using lightweight fine-tuning via LoRA. Our findings reveal systematic limitations in current models, including hallucinated finger parts, incorrect geometric interpretations, and poor generalization. HandVQA not only exposes these critical reasoning gaps but also provides a validated path to improvement. We demonstrate that the 3D-grounded spatial knowledge learned from our benchmark transfers in a zero-shot setting, significantly improving model accuracy on novel downstream tasks such as hand gesture recognition (+10.33%) and hand-object interaction (+2.63%).
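To make the idea of a controlled multiple-choice question about joint geometry concrete, the following is a minimal sketch of how a joint angle and a joint distance could be derived from 3D keypoints and turned into a question with a ground-truth answer. It is not the authors' pipeline: the joint names, the coordinates, the angle thresholds, the answer labels, and the question wording are all assumptions made for illustration.

```python
import math

# Hypothetical 3D joint positions (millimetres) for one index finger.
# The values are invented; real positions would come from a 3D hand dataset.
index_finger = {
    "mcp": (0.0, 0.0, 0.0),
    "pip": (0.0, 40.0, 0.0),
    "dip": (0.0, 40.0, 25.0),
    "tip": (0.0, 40.0, 45.0),
}


def joint_distance(a, b):
    """Euclidean distance between two 3D joints."""
    return math.dist(a, b)


def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the bones toward parent and child."""
    u = tuple(p - j for p, j in zip(parent, joint))
    v = tuple(c - j for c, j in zip(child, joint))
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def angle_label(angle_deg):
    """Map an angle to a coarse label. Thresholds are illustrative assumptions."""
    if angle_deg >= 150.0:
        return "straight"
    if angle_deg >= 100.0:
        return "slightly bent"
    return "strongly bent"


def make_angle_question(finger_name, joints):
    """Build one multiple-choice item about the bend at the PIP joint."""
    angle = joint_angle(joints["mcp"], joints["pip"], joints["dip"])
    return {
        "question": f"How bent is the {finger_name} finger at its middle joint?",
        "options": ["straight", "slightly bent", "strongly bent"],
        "answer": angle_label(angle),
        "angle_deg": angle,
    }


def accuracy(predictions, items):
    """Fraction of items where the predicted option matches the ground truth."""
    correct = sum(p == item["answer"] for p, item in zip(predictions, items))
    return correct / len(items)


item = make_angle_question("index", index_finger)
span = joint_distance(index_finger["mcp"], index_finger["tip"])
```

Because the answer is computed from the 3D annotation rather than written by hand, a question of this kind can be generated at scale and scored automatically, which is what makes a benchmark with controlled answer options feasible.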

Comments: Accepted at CVPR 2026. Project page, code, and dataset: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26362 [cs.CV]

(or arXiv:2603.26362v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26362

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: MD Khalequzzaman Chowdhury Sayem

[v1] Fri, 27 Mar 2026 12:42:26 UTC (4,468 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26362
