R3DP: Real-Time 3D-Aware Policy for Embodied Manipulation
Hey there, little explorer! Imagine you have a robot friend who wants to play with blocks.
This news is about making our robot friend super smart! Usually, robots are a bit slow when they try to really see all the blocks, like how tall they are or where they are in 3D space.
But now, our robot friend has a new superpower called R3DP! It helps the robot look at the blocks very, very fast. It's like having a super speedy eye that can see everything in 3D, without getting slow.
So, the robot can quickly grab the right block, build a tall tower, or stack things perfectly, just like you would! It helps robots play and work much better and faster. Isn't that cool?
arXiv:2603.14498v2 Announce Type: replace-cross Abstract: Embodied manipulation requires accurate 3D understanding of objects and their spatial relations to plan and execute contact-rich actions. While large-scale 3D vision models provide strong priors, their computational cost incurs prohibitive latency for real-time control. We propose Real-time 3D-aware Policy (R3DP), which integrates powerful 3D priors into manipulation policies without sacrificing real-time performance. A core innovation of R3DP is the asynchronous fast-slow collaboration module, which seamlessly integrates large-scale 3D — Yuhao Zhang, Wanxi Dong, Yue Shi, Yi Liang, Jingnan Gao, Qiaochu Yang, Yaxing Lyu, Zhixuan Liang, Yibin Liu, Congsheng Xu, Xianda Guo, Wei Sui, Yaohui Jin, Xiaokang Yang, Yanyan Xu, Yao Mu
Authors:Yuhao Zhang, Wanxi Dong, Yue Shi, Yi Liang, Jingnan Gao, Qiaochu Yang, Yaxing Lyu, Zhixuan Liang, Yibin Liu, Congsheng Xu, Xianda Guo, Wei Sui, Yaohui Jin, Xiaokang Yang, Yanyan Xu, Yao Mu
View PDF HTML (experimental)
Abstract:Embodied manipulation requires accurate 3D understanding of objects and their spatial relations to plan and execute contact-rich actions. While large-scale 3D vision models provide strong priors, their computational cost incurs prohibitive latency for real-time control. We propose Real-time 3D-aware Policy (R3DP), which integrates powerful 3D priors into manipulation policies without sacrificing real-time performance. A core innovation of R3DP is the asynchronous fast-slow collaboration module, which seamlessly integrates large-scale 3D priors into the policy without compromising real-time performance. The system maintains real-time efficiency by querying the pre-trained slow system (VGGT) only on sparse key frames, while simultaneously employing a lightweight Temporal Feature Prediction Network (TFPNet) to predict features for all intermediate frames. By leveraging historical data to exploit temporal correlations, TFPNet explicitly improves task success rates through consistent feature estimation. Additionally, to enable more effective multi-view fusion, we introduce a Multi-View Feature Fuser (MVFF) that aggregates features across views by explicitly incorporating camera intrinsics and extrinsics. R3DP offers a plug-and-play solution for integrating large models into real-time inference systems. We evaluate R3DP against multiple baselines across different visual configurations. R3DP effectively harnesses large-scale 3D priors to achieve superior results, outperforming single-view and multi-view DP by 32.9% and 51.4% in average success rate, respectively. Furthermore, by decoupling heavy 3D reasoning from policy execution, R3DP achieves a 44.8% reduction in inference time compared to a naive DP+VGGT integration.
Comments: Project Page: this https URL Github Repo: this https URL
Subjects:
Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.14498 [cs.RO]
(or arXiv:2603.14498v2 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2603.14498
arXiv-issued DOI via DataCite
Submission history
From: Yuhao Zhang [view email] [v1] Sun, 15 Mar 2026 17:30:49 UTC (4,140 KB) [v2] Sat, 28 Mar 2026 07:15:57 UTC (4,140 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
![[R] ICML Anonymized git repos for rebuttal](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-graph-nodes-a2pnJLpyKmDnxKWLd5BEAb.webp)
[R] ICML Anonymized git repos for rebuttal
A number of the papers I'm reviewing for have submitted additional figures and code through anonymized git repos (e.g. https://anonymous.4open.science/ ) to help supplement their rebuttal. Is this against any policy? I'm considering submitting additional graphs during the discussion phase for clarity, and would like to make sure that won't cause any issues submitted by /u/drahcirenoob [link] [comments]
![[D] Is research in semantic segmentation saturated?](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-microchip-RD7Ub6Tkp8JwbZxSThJdV5.webp)
[D] Is research in semantic segmentation saturated?
Nowadays I dont see a lot of papers addressing 2D semantic segmentation problem statements be it supervised, semi-supervised, domain adaptation. Is the problem statement saturated? Are there any promising research directions in segmentation except open-set segmentation? submitted by /u/Hot_Version_6403 [link] [comments]



![How Customers Are Using AI Search [2025 Research] - Bain & Company](https://d2xsxph8kpxj0f.cloudfront.net/310419663032563854/konzwo8nGf8Z4uZsMefwMr/default-img-robot-hand-JvPW6jsLFTCtkgtb97Kys5.webp)

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!