R3DP: Real-Time 3D-Aware Policy for Embodied Manipulation

arXiv · March 31, 2026 · 2 min read
Explain Like I'm 5

Hey there, little explorer! Imagine you have a robot friend who wants to play with blocks.

This news is about making our robot friend super smart! Usually, robots are a bit slow when they try to really see all the blocks, like how tall they are or where they are in 3D space.

But now, our robot friend has a new superpower called R3DP! It helps the robot look at the blocks very, very fast. It's like having a super speedy eye that can see everything in 3D, without getting slow.

So, the robot can quickly grab the right block, build a tall tower, or stack things perfectly, just like you would! It helps robots play and work much better and faster. Isn't that cool?

Authors: Yuhao Zhang, Wanxi Dong, Yue Shi, Yi Liang, Jingnan Gao, Qiaochu Yang, Yaxing Lyu, Zhixuan Liang, Yibin Liu, Congsheng Xu, Xianda Guo, Wei Sui, Yaohui Jin, Xiaokang Yang, Yanyan Xu, Yao Mu

Abstract: Embodied manipulation requires accurate 3D understanding of objects and their spatial relations to plan and execute contact-rich actions. While large-scale 3D vision models provide strong priors, their computational cost incurs prohibitive latency for real-time control. We propose Real-time 3D-aware Policy (R3DP), which integrates powerful 3D priors into manipulation policies without sacrificing real-time performance. A core innovation of R3DP is the asynchronous fast-slow collaboration module, which seamlessly integrates large-scale 3D priors into the policy without compromising real-time performance. The system maintains real-time efficiency by querying the pre-trained slow system (VGGT) only on sparse key frames, while simultaneously employing a lightweight Temporal Feature Prediction Network (TFPNet) to predict features for all intermediate frames. By leveraging historical data to exploit temporal correlations, TFPNet explicitly improves task success rates through consistent feature estimation. Additionally, to enable more effective multi-view fusion, we introduce a Multi-View Feature Fuser (MVFF) that aggregates features across views by explicitly incorporating camera intrinsics and extrinsics. R3DP offers a plug-and-play solution for integrating large models into real-time inference systems. We evaluate R3DP against multiple baselines across different visual configurations. R3DP effectively harnesses large-scale 3D priors to achieve superior results, outperforming single-view and multi-view DP by 32.9% and 51.4% in average success rate, respectively. Furthermore, by decoupling heavy 3D reasoning from policy execution, R3DP achieves a 44.8% reduction in inference time compared to a naive DP+VGGT integration.
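The key-frame scheduling idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The functions slow_features and predict_features are hypothetical stand-ins for VGGT and TFPNet, the fixed key_interval and the linear extrapolation are assumptions made for illustration, and the loop runs synchronously, whereas the paper describes an asynchronous module.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Feature = List[float]


def slow_features(frame: Sequence[float]) -> Feature:
    """Hypothetical stand-in for the expensive 3D model (VGGT in the paper)."""
    return [2.0 * x for x in frame]


def predict_features(history: Sequence[Feature]) -> Feature:
    """Hypothetical stand-in for TFPNet: extrapolate from the last two features.

    The real network is learned; linear extrapolation is only an assumption
    used to show that intermediate frames rely on history, not the slow model.
    """
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [b + (b - a) for a, b in zip(prev, last)]


def run_schedule(
    frames: Sequence[Sequence[float]],
    key_interval: int,
    history_len: int = 4,
) -> Tuple[List[Feature], int]:
    """Call the slow model only on every key_interval-th frame.

    Returns the per-frame features and the number of slow-model calls.
    """
    history: Deque[Feature] = deque(maxlen=history_len)
    outputs: List[Feature] = []
    slow_calls = 0
    for t, frame in enumerate(frames):
        if t % key_interval == 0:
            feat = slow_features(frame)  # sparse key frame: expensive path
            slow_calls += 1
        else:
            feat = predict_features(history)  # intermediate frame: cheap path
        history.append(feat)
        outputs.append(feat)
    return outputs, slow_calls


if __name__ == "__main__":
    frames = [[float(t)] for t in range(10)]
    feats, calls = run_schedule(frames, key_interval=4)
    print(f"{calls} slow calls for {len(feats)} frames")
```

In this sketch the policy receives a feature on every frame, while the expensive model runs on only a fraction of them. That is the mechanism the abstract credits for the reduced inference time. The actual key-frame spacing and predictor inputs are not stated in the abstract.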

Comments: Project Page: this https URL Github Repo: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.14498 [cs.RO] (or arXiv:2603.14498v2 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2603.14498

arXiv-issued DOI via DataCite

Submission history

From: Yuhao Zhang
[v1] Sun, 15 Mar 2026 17:30:49 UTC (4,140 KB)
[v2] Sat, 28 Mar 2026 07:15:57 UTC (4,140 KB)
