
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

arXiv, March 31, 2026

arXiv:2412.14015v4 (announce type: replace)

Authors: Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang


Abstract: Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
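The abstract describes fusing the LiDAR prompt at multiple scales within the depth decoder but does not give the layer details. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes single-channel feature maps stored as nested lists, nearest-neighbour resizing, and a scalar `weight` standing in for whatever learned projection the real model uses. The names `resize_nearest`, `fuse_prompt`, and `fuse_multi_scale` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2D grid to (out_h, out_w)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_prompt(feature: Grid, lidar: Grid, weight: float) -> Grid:
    """Resize the LiDAR prompt to the feature map's size and add it in.

    `weight` is an assumed stand-in for a learned projection; with
    weight = 0.0 the feature map passes through unchanged.
    """
    h, w = len(feature), len(feature[0])
    prompt = resize_nearest(lidar, h, w)
    return [
        [feature[r][c] + weight * prompt[r][c] for c in range(w)]
        for r in range(h)
    ]


def fuse_multi_scale(features: List[Grid], lidar: Grid, weight: float = 1.0) -> List[Grid]:
    """Apply the same prompt fusion to every decoder scale."""
    return [fuse_prompt(f, lidar, weight) for f in features]


# Toy example: a 2x2 low-resolution LiDAR depth map (metres) fused into
# zero-valued decoder features at two scales (2x2 and 4x4).
lidar = [[1.0, 2.0], [3.0, 4.0]]
features = [
    [[0.0] * 2 for _ in range(2)],
    [[0.0] * 4 for _ in range(4)],
]
fused = fuse_multi_scale(features, lidar)
```

In this toy setup each decoder scale receives the same coarse metric signal at its own resolution, which is the property the abstract attributes to the prompt fusion design. A real model would operate on multi-channel tensors and learn the projection.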

Comments: CVPR 2025; Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2412.14015 [cs.CV]

(or arXiv:2412.14015v4 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2412.14015

arXiv-issued DOI via DataCite

Submission history

From: Haotong Lin
[v1] Wed, 18 Dec 2024 16:32:12 UTC (35,944 KB)
[v2] Tue, 22 Apr 2025 14:42:39 UTC (37,952 KB)
[v3] Fri, 13 Feb 2026 16:17:04 UTC (21,767 KB)
[v4] Sat, 28 Mar 2026 09:25:50 UTC (16,489 KB)

Original source: arXiv, https://arxiv.org/abs/2412.14015
