HD-VGGT: High-Resolution Visual Geometry Transformer
Authors: Tianrun Chen, Yuanqi Hu, Yidong Han, Hanjie Xu, Deyi Ji, Qi Zhu, Chunan Yu, Xin Zhang, Cheng Chen, Chaotao Ding, Ying Zang, Xuanfu Li, Jin Ma, Lanyun Zhu
Abstract: High-resolution imagery is essential for accurate 3D reconstruction, as many geometric details only emerge at fine spatial scales. Recent feed-forward approaches, such as the Visual Geometry Grounded Transformer (VGGT), have demonstrated the ability to infer scene geometry from large collections of images in a single forward pass. However, scaling these models to high-resolution inputs remains challenging: the number of tokens in transformer architectures grows rapidly with both image resolution and the number of views, leading to prohibitive computational and memory costs. Moreover, we observe that visually ambiguous regions, such as repetitive patterns, weak textures, or specular surfaces, often produce unstable feature tokens that degrade geometric inference, especially at higher resolutions. We introduce HD-VGGT, a dual-branch architecture for efficient and robust high-resolution 3D reconstruction. A low-resolution branch predicts a coarse, globally consistent geometry, while a high-resolution branch refines details via a learned feature upsampling module. To handle unstable tokens, we propose Feature Modulation, which suppresses unreliable features early in the transformer. HD-VGGT leverages high-resolution images and supervision without full-resolution transformer costs, achieving state-of-the-art reconstruction quality.
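The scaling problem described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes a ViT-style patch tokenizer with a patch size of 14 pixels, ignores any special tokens, and assumes dense global self-attention whose cost grows with the square of the token count. The function names and the example resolutions are hypothetical and chosen only for illustration.

```python
def token_count(num_views: int, height: int, width: int, patch: int = 14) -> int:
    """Patch tokens for num_views images (assumed patch size; special tokens ignored)."""
    return num_views * (height // patch) * (width // patch)


def attention_pairs(tokens: int) -> int:
    """Token pairs compared by dense global self-attention (quadratic in tokens)."""
    return tokens * tokens


# Hypothetical setting: 8 views, with the side length doubled from 518 to 1036 pixels.
low = token_count(num_views=8, height=518, width=518)
high = token_count(num_views=8, height=1036, width=1036)

print(f"tokens at 518 px:  {low}")
print(f"tokens at 1036 px: {high}")
print(f"token growth:      {high / low:.1f}x")
print(f"attention growth:  {attention_pairs(high) / attention_pairs(low):.1f}x")
```

Under these assumptions, doubling the image side length quadruples the token count and multiplies the pairwise attention work by sixteen, which is the kind of cost a coarse low-resolution branch is meant to avoid.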
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.27222 [cs.CV]
(or arXiv:2603.27222v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.27222
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Deyi Ji [v1] Sat, 28 Mar 2026 10:29:07 UTC (3,398 KB)