
RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video

arXiv, March 31, 2026

Authors: Haiyang Mei, Qiming Huang, Hai Ci, Mike Zheng Shou


Abstract: Accurate robot segmentation is a fundamental capability for robotic perception. It enables precise visual servoing for VLA systems, scalable robot-centric data augmentation, accurate real-to-sim transfer, and reliable safety monitoring in dynamic human-robot environments. Despite the strong capabilities of modern segmentation models, surprisingly it remains challenging to segment robots. This is due to robot embodiment diversity, appearance ambiguity, structural complexity, and rapid shape changes. Embracing these challenges, we introduce RobotSeg, a foundation model for robot segmentation in image and video. RobotSeg is built upon the versatile SAM 2 foundation model but addresses its three limitations for robot segmentation, namely the lack of adaptation to articulated robots, reliance on manual prompts, and the need for per-frame training mask annotations, by introducing a structure-enhanced memory associator, a robot prompt generator, and a label-efficient training strategy. These innovations collectively enable a structure-aware, automatic, and label-efficient solution. We further construct the video robot segmentation (VRS) dataset comprising over 2.8k videos (138k frames) with diverse robot embodiments and environments. Extensive experiments demonstrate that RobotSeg achieves state-of-the-art performance on both images and videos, establishing a strong foundation for future advances in robot perception.
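To give a rough sense of scale for the VRS dataset figures quoted in the abstract, and of why per-frame mask annotation is costly, the arithmetic can be sketched as follows. The counts are the rounded values from the abstract ("over 2.8k videos", "138k frames"), so the results are approximate. The `masks_needed` helper and its `stride` parameter are hypothetical, introduced only for illustration; they do not describe the paper's label-efficient training strategy.

```python
# Rounded figures quoted in the abstract; exact counts are not given there.
VIDEOS = 2_800     # "over 2.8k videos" (a lower bound)
FRAMES = 138_000   # "138k frames" (rounded)


def avg_frames_per_video(frames: int, videos: int) -> float:
    """Approximate mean clip length in frames."""
    return frames / videos


def masks_needed(total_frames: int, stride: int = 1) -> int:
    """Hypothetical annotation budget: one mask every `stride` frames.

    stride=1 corresponds to per-frame annotation. This is an illustrative
    assumption, not the annotation scheme used by the authors.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return -(-total_frames // stride)  # ceiling division


avg = avg_frames_per_video(FRAMES, VIDEOS)
print(f"approx. frames per video: {avg:.1f}")
print(f"masks for per-frame annotation: {masks_needed(FRAMES)}")
print(f"masks at a hypothetical stride of 10: {masks_needed(FRAMES, stride=10)}")
```

Because the video count is stated as a lower bound, the average of roughly 49 frames per video should be read as an upper estimate.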

Comments: CVPR 2026. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Cite as: arXiv:2511.22950 [cs.CV] (or arXiv:2511.22950v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2511.22950 (arXiv-issued DOI via DataCite)

Submission history

From: Haiyang Mei
[v1] Fri, 28 Nov 2025 07:51:02 UTC (42,632 KB)
[v2] Sat, 28 Mar 2026 11:43:43 UTC (42,095 KB)
