
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

arXiv, March 31, 2026

arXiv:2412.14015v4 (announce type: replace)

Authors: Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang


Abstract: Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
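The multi-scale prompt fusion mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbour resizing, and the per-channel 1x1 projection are all assumptions chosen to keep the example short. It only shows the general idea of resizing a low-resolution LiDAR depth map to each decoder scale, projecting it to the feature dimension, and adding it to the decoder features.

```python
import numpy as np

def resize_nearest(depth, out_h, out_w):
    """Resize a 2D depth map to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = depth.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return depth[rows[:, None], cols[None, :]]

def fuse_prompt(feature, lidar_depth, weight, bias):
    """Add a projected LiDAR prompt to one decoder feature map.

    feature:      (C, H, W) decoder feature at one scale
    lidar_depth:  (h, w) low-resolution metric depth (the prompt)
    weight, bias: (C,) parameters of a hypothetical 1x1 projection
    """
    _, h, w = feature.shape
    prompt = resize_nearest(lidar_depth, h, w)  # (H, W)
    # Per-channel affine map of the single-channel prompt, shape (C, H, W).
    projected = weight[:, None, None] * prompt[None] + bias[:, None, None]
    return feature + projected

def fuse_multiscale(features, lidar_depth, params):
    """Apply fuse_prompt at every decoder scale; params holds one (weight, bias) per scale."""
    return [fuse_prompt(f, lidar_depth, w, b) for f, (w, b) in zip(features, params)]
```

With all projection parameters set to zero, `fuse_multiscale` returns the decoder features unchanged, so a fusion of this shape can start from the behaviour of the unprompted model and learn how much to rely on the prompt.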

Comments: CVPR 2025; Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2412.14015 [cs.CV]

(or arXiv:2412.14015v4 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2412.14015

arXiv-issued DOI via DataCite

Submission history

From: Haotong Lin
[v1] Wed, 18 Dec 2024 16:32:12 UTC (35,944 KB)
[v2] Tue, 22 Apr 2025 14:42:39 UTC (37,952 KB)
[v3] Fri, 13 Feb 2026 16:17:04 UTC (21,767 KB)
[v4] Sat, 28 Mar 2026 09:25:50 UTC (16,489 KB)
