
R3DP: Real-Time 3D-Aware Policy for Embodied Manipulation

arXiv | March 31, 2026 | 2 min read

Explain Like I'm 5 (simple language)

Hey there, little explorer! Imagine you have a robot friend who wants to play with blocks.

This paper is about making our robot friend super smart! Usually, robots get slow when they try to really see the blocks: how tall they are and where they sit in 3D space.

But now our robot friend has a new superpower called R3DP! A big, careful 3D "eye" looks at the blocks only once in a while, and a small, speedy helper guesses what the big eye would see in between. That way the robot understands the blocks in 3D without getting slow.

So, the robot can quickly grab the right block, build a tall tower, or stack things perfectly, just like you would! It helps robots play and work much better and faster. Isn't that cool?


Authors: Yuhao Zhang, Wanxi Dong, Yue Shi, Yi Liang, Jingnan Gao, Qiaochu Yang, Yaxing Lyu, Zhixuan Liang, Yibin Liu, Congsheng Xu, Xianda Guo, Wei Sui, Yaohui Jin, Xiaokang Yang, Yanyan Xu, Yao Mu


Abstract: Embodied manipulation requires accurate 3D understanding of objects and their spatial relations to plan and execute contact-rich actions. While large-scale 3D vision models provide strong priors, their computational cost incurs prohibitive latency for real-time control. We propose Real-time 3D-aware Policy (R3DP), which integrates powerful 3D priors into manipulation policies without sacrificing real-time performance. A core innovation of R3DP is the asynchronous fast-slow collaboration module, which seamlessly integrates large-scale 3D priors into the policy without compromising real-time performance.

The system maintains real-time efficiency by querying the pre-trained slow system (VGGT) only on sparse key frames, while simultaneously employing a lightweight Temporal Feature Prediction Network (TFPNet) to predict features for all intermediate frames. By leveraging historical data to exploit temporal correlations, TFPNet explicitly improves task success rates through consistent feature estimation. Additionally, to enable more effective multi-view fusion, we introduce a Multi-View Feature Fuser (MVFF) that aggregates features across views by explicitly incorporating camera intrinsics and extrinsics. R3DP offers a plug-and-play solution for integrating large models into real-time inference systems.

We evaluate R3DP against multiple baselines across different visual configurations. R3DP effectively harnesses large-scale 3D priors to achieve superior results, outperforming single-view and multi-view DP by 32.9% and 51.4% in average success rate, respectively. Furthermore, by decoupling heavy 3D reasoning from policy execution, R3DP achieves a 44.8% reduction in inference time compared to a naive DP+VGGT integration.
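The key-frame idea in the abstract (call the expensive slow model only on sparse key frames and let a lightweight predictor fill in the frames between them) can be illustrated with a short Python sketch. This is not the authors' code. `KeyFrameFeatureScheduler`, `toy_slow_encoder`, `toy_fast_predictor`, the key-frame interval of 4 and the 3-entry history window are all hypothetical. Frames are represented by integer indices rather than images, and the slow encoder is called synchronously here, whereas R3DP runs it asynchronously alongside the policy. The toy predictor just extrapolates linearly from the last two features, standing in for a learned network such as TFPNet.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Feature = List[float]  # hypothetical flat feature vector


@dataclass
class KeyFrameFeatureScheduler:
    # Expensive encoder, called only on key frames (stand-in for a large 3D model such as VGGT).
    slow_encoder: Callable[[int], Feature]
    # Cheap predictor, called on intermediate frames (stand-in for a TFPNet-like network).
    fast_predictor: Callable[[Sequence[Feature], int], Feature]
    key_interval: int = 4  # hypothetical: every 4th frame is a key frame
    history_len: int = 3   # hypothetical: number of recent features kept
    history: List[Feature] = field(default_factory=list)
    step: int = 0
    slow_calls: int = 0

    def __call__(self, frame: int) -> Feature:
        if self.step % self.key_interval == 0:
            # Key frame: pay for the expensive encoder.
            feat = self.slow_encoder(frame)
            self.slow_calls += 1
        else:
            # Intermediate frame: predict from recent features only.
            feat = self.fast_predictor(tuple(self.history), frame)
        # Keep a bounded window of recent features for the predictor.
        self.history = (self.history + [feat])[-self.history_len:]
        self.step += 1
        return feat


def toy_slow_encoder(frame: int) -> Feature:
    # Toy "slow" feature: depends on the frame index.
    return [2.0 * frame, 1.0]


def toy_fast_predictor(history: Sequence[Feature], frame: int) -> Feature:
    # Linear extrapolation from the two most recent features (ignores the frame).
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [l + (l - p) for p, l in zip(prev, last)]


scheduler = KeyFrameFeatureScheduler(toy_slow_encoder, toy_fast_predictor)
features = [scheduler(t) for t in range(8)]
print(scheduler.slow_calls)  # 2 (frames 0 and 4)
```

With `key_interval=4`, only frames 0 and 4 of the eight reach the expensive encoder; the other six are served by the cheap predictor. In one plausible asynchronous variant, the slow call would instead be submitted to a background worker and its result merged into the history when it completes, so the control loop never blocks on it.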
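If the 44.8% figure is read as a relative reduction in per-step inference time against the naive DP+VGGT pipeline (the abstract does not give absolute timings), it corresponds to roughly a 1.8× speedup:

```latex
t_{\text{R3DP}} = (1 - 0.448)\, t_{\text{DP+VGGT}} = 0.552\, t_{\text{DP+VGGT}},
\qquad
\frac{t_{\text{DP+VGGT}}}{t_{\text{R3DP}}} = \frac{1}{0.552} \approx 1.81
```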

Comments: Project Page: this https URL; GitHub Repo: this https URL

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.14498 [cs.RO] (or arXiv:2603.14498v2 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2603.14498 (arXiv-issued DOI via DataCite)

Submission history

From: Yuhao Zhang
[v1] Sun, 15 Mar 2026 17:30:49 UTC (4,140 KB)
[v2] Sat, 28 Mar 2026 07:15:57 UTC (4,140 KB)

Original source: arXiv, https://arxiv.org/abs/2603.14498
