
SAM 3: Segment Anything with Concepts

arXiv [Submitted on 20 Nov 2025 (v1), last revised 28 Mar 2026 (this version, v2)] · March 31, 2026


Authors: Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer


Abstract: We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phrases (e.g., "yellow school bus"), image exemplars, or a combination of both. Promptable Concept Segmentation (PCS) takes such prompts and returns segmentation masks and unique identities for all matching object instances. To advance PCS, we build a scalable data engine that produces a high-quality dataset with 4M unique concept labels, including hard negatives, across images and videos. Our model consists of an image-level detector and a memory-based video tracker that share a single backbone. Recognition and localization are decoupled with a presence head, which boosts detection accuracy. SAM 3 doubles the accuracy of existing systems in both image and video PCS, and improves previous SAM capabilities on visual segmentation tasks. We open source SAM 3 along with our new Segment Anything with Concepts (SA-Co) benchmark for promptable concept segmentation.
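As an illustration only, the PCS interface and the presence-head idea described in the abstract might be sketched as follows. This is not the released SAM 3 API: the names ConceptPrompt, Instance, and combine_scores are hypothetical, and multiplying an image-level presence probability by each candidate's score is one assumed way to decouple recognition ("is the concept in the image at all?") from localization ("which candidates match it?"). The abstract does not state how the two are combined.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ConceptPrompt:
    """Hypothetical container: a short noun phrase, image exemplars, or both."""
    noun_phrase: Optional[str] = None
    # Assumed exemplar format: (x0, y0, x1, y1) boxes around example objects.
    exemplar_boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.noun_phrase is None and not self.exemplar_boxes:
            raise ValueError("a concept prompt needs a noun phrase, exemplars, or both")


@dataclass
class Instance:
    """Hypothetical PCS output record: one matching object instance."""
    instance_id: int          # unique identity for this instance
    score: float              # final confidence
    mask: List[List[int]]     # binary segmentation mask as rows of 0/1


def combine_scores(
    presence_prob: float,
    candidate_probs: List[float],
    threshold: float = 0.5,
) -> List[Tuple[int, float]]:
    """Assumed scoring rule: final = presence * per-candidate probability.

    Returns (candidate index, final score) pairs at or above the threshold.
    """
    kept = []
    for index, prob in enumerate(candidate_probs):
        final = presence_prob * prob
        if final >= threshold:
            kept.append((index, final))
    return kept


prompt = ConceptPrompt(noun_phrase="yellow school bus")
kept = combine_scores(presence_prob=0.9, candidate_probs=[0.8, 0.3, 0.6])
instances = [
    Instance(instance_id=i, score=s, mask=[[0]]) for i, s in kept
]
```

Under this assumed rule, a low presence probability suppresses every candidate at once, which is how a hard-negative prompt (a concept absent from the image) would yield no instances even if individual candidates look plausible.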

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2511.16719 [cs.CV]

(or arXiv:2511.16719v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2511.16719

arXiv-issued DOI via DataCite

Submission history

From: Christoph Feichtenhofer
[v1] Thu, 20 Nov 2025 18:59:56 UTC (37,393 KB)
[v2] Sat, 28 Mar 2026 16:54:56 UTC (37,496 KB)
