Make Geometry Matter for Spatial Reasoning
GeoSR enhances vision-language models' spatial reasoning capabilities by strategically incorporating geometry tokens through masking and guided fusion mechanisms. (11 upvotes on HuggingFace)
Published on Mar 27
Authors:
,
Abstract
GeoSR enhances vision-language models' spatial reasoning capabilities by strategically incorporating geometry tokens through masking and guided fusion mechanisms.
AI-generated summary
Empowered by large-scale training, vision-language models (VLMs) achieve strong image and video understanding, yet their ability to perform spatial reasoning in both static scenes and dynamic videos remains limited. Recent advances try to handle this limitation by injecting geometry tokens from pretrained 3D foundation models into VLMs. Nevertheless, we observe that naive token fusion followed by standard fine-tuning in this line of work often leaves such geometric cues underutilized for spatial reasoning, as VLMs tend to rely heavily on 2D visual cues. In this paper, we propose GeoSR, a framework designed to make geometry matter by encouraging VLMs to actively reason with geometry tokens. GeoSR introduces two key components: (1) Geometry-Unleashing Masking, which strategically masks portions of 2D vision tokens during training to weaken non-geometric shortcuts and force the model to consult geometry tokens for spatial reasoning; and (2) Geometry-Guided Fusion, a gated routing mechanism that adaptively amplifies geometry token contributions in regions where geometric evidence is critical. Together, these designs unleash the potential of geometry tokens for spatial reasoning tasks. Extensive experiments on both static and dynamic spatial reasoning benchmarks demonstrate that GeoSR consistently outperforms prior methods and establishes new state-of-the-art performance by effectively leveraging geometric information. The project page is available at https://suhzhang.github.io/GeoSR/.
View arXiv page View PDF Project page GitHub 12 Add to collection
Get this paper in your agent:
hf papers read 2603.26639
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
No model linking this paper
Cite arxiv.org/abs/2603.26639 in a model README.md to link it from this page.
Datasets citing this paper 0
No dataset linking this paper
Cite arxiv.org/abs/2603.26639 in a dataset README.md to link it from this page.
Spaces citing this paper 0
No Space linking this paper
Cite arxiv.org/abs/2603.26639 in a Space README.md to link it from this page.
Collections including this paper 0
No Collection including this paper
Add this paper to a collection to link it from this page.
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!