AffordGrasp: Cross-Modal Diffusion for Affordance-Aware Grasp Synthesis

arXiv · March 31, 2026 · 2 min read
arXiv:2603.08021v2 (announce type: replace-cross)

Authors: Xiaofei Wu, Yi Zhang, Yumeng Liu, Yuexin Ma, Yujiao Shi, Xuming He

Abstract: Generating human grasping poses that accurately reflect both object geometry and user-specified interaction semantics is essential for natural hand-object interactions in AR/VR and embodied AI. However, existing semantic grasping approaches struggle with the large modality gap between 3D object representations and textual instructions, and often lack explicit spatial or semantic constraints, leading to physically invalid or semantically inconsistent grasps. In this work, we present AffordGrasp, a diffusion-based framework that produces physically stable and semantically faithful human grasps with high precision. We first introduce a scalable annotation pipeline that automatically enriches hand-object interaction datasets with fine-grained structured language labels capturing interaction intent. Building upon these annotations, AffordGrasp integrates an affordance-aware latent representation of hand poses with a dual-conditioning diffusion process, enabling the model to jointly reason over object geometry, spatial affordances, and instruction semantics. A distribution adjustment module further enforces physical contact consistency and semantic alignment. We evaluate AffordGrasp across four instruction-augmented benchmarks derived from HO-3D, OakInk, GRAB, and AffordPose, and observe substantial improvements over state-of-the-art methods in grasp quality, semantic accuracy, and diversity.

Comments: CVPR 2026

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.08021 [cs.RO] (or arXiv:2603.08021v2 [cs.RO] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.08021 (arXiv-issued DOI via DataCite)

Submission history

From: Xiaofei Wu

[v1] Mon, 9 Mar 2026 06:56:35 UTC (16,745 KB)

[v2] Sat, 28 Mar 2026 15:33:35 UTC (16,617 KB)
