
AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models

arXiv, March 31, 2026

arXiv:2603.01305v2 (announce type: replace-cross)

Authors: Zhen Qu, Xian Tao, Xiaoyi Bao, Dingrong Wang, ShiChen Qu, Zhengtao Zhang, Xingang Wang


Abstract: Large multimodal models (LMMs) exhibit strong task generalization capabilities, offering new opportunities for zero-shot visual anomaly segmentation (ZSAS). However, existing LMM-based segmentation approaches still face fundamental limitations: anomaly concepts are inherently abstract and context-dependent, lacking stable visual prototypes, and the weak alignment between high-level semantic embeddings and pixel-level spatial features hinders precise anomaly localization. To address these challenges, we present AG-VAS (Anchor-Guided Visual Anomaly Segmentation), a new framework that expands the LMM vocabulary with three learnable semantic anchor tokens ([SEG], [NOR], and [ANO]), establishing a unified anchor-guided segmentation paradigm. Specifically, [SEG] serves as an absolute semantic anchor that translates abstract anomaly semantics into explicit, spatially grounded visual entities (e.g., holes or scratches), while [NOR] and [ANO] act as relative anchors that model the contextual contrast between normal and abnormal patterns across categories. To further enhance cross-modal alignment, we introduce a Semantic-Pixel Alignment Module (SPAM) that aligns language-level semantic embeddings with high-resolution visual features, along with an Anchor-Guided Mask Decoder (AGMD) that performs anchor-conditioned mask prediction for precise anomaly localization. In addition, we curate Anomaly-Instruct20K, a large-scale instruction dataset that organizes anomaly knowledge into structured descriptions of appearance, shape, and spatial attributes, facilitating effective learning and integration of the proposed semantic anchors. Extensive experiments on six industrial and medical benchmarks demonstrate that AG-VAS achieves consistent state-of-the-art performance in the zero-shot setting.
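To make the anchor idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the abstract does not specify how SPAM or AGMD combine the anchors with visual features, so the scoring rule below (a [SEG] similarity plus an [ANO] minus [NOR] contrast, passed through a sigmoid) is purely an assumption for illustration. The function names `expand_vocabulary` and `anchor_guided_mask`, the `threshold` parameter, and the tiny hand-written embeddings are all hypothetical.

```python
import math

# The three anchor tokens named in the abstract.
ANCHOR_TOKENS = ["[SEG]", "[NOR]", "[ANO]"]


def expand_vocabulary(vocab, new_tokens):
    """Return a copy of `vocab` with each new token assigned a fresh id.

    Hypothetical helper: real tokenizers have their own APIs for this.
    """
    expanded = dict(vocab)
    next_id = max(expanded.values()) + 1 if expanded else 0
    for token in new_tokens:
        if token not in expanded:
            expanded[token] = next_id
            next_id += 1
    return expanded


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def anchor_guided_mask(pixel_features, seg, nor, ano, threshold=0.5):
    """Toy anchor-conditioned mask prediction (assumed scoring rule).

    pixel_features: H x W grid of feature vectors.
    seg, nor, ano: anchor embeddings with the same dimension as the features.
    Returns an H x W binary mask where 1 marks a predicted anomaly.
    """
    mask = []
    for row in pixel_features:
        mask_row = []
        for feat in row:
            # Absolute anchor score plus the relative abnormal/normal contrast.
            logit = dot(feat, seg) + dot(feat, ano) - dot(feat, nor)
            mask_row.append(1 if sigmoid(logit) > threshold else 0)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    vocab = expand_vocabulary({"a": 0, "b": 1}, ANCHOR_TOKENS)
    print(vocab)

    features = [[[1.0, 0.0], [0.0, 1.0]],
                [[0.0, 1.0], [1.0, 0.0]]]
    print(anchor_guided_mask(features, seg=[1.0, 0.0], nor=[0.0, 1.0], ano=[1.0, 0.0]))
```

In the paper's setting the anchor embeddings would be learned and the features would come from the LMM and a high-resolution visual encoder; the sketch only shows the general shape of conditioning a per-pixel decision on anchor vectors.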

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.01305 [cs.CV] (or arXiv:2603.01305v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.01305 (arXiv-issued DOI via DataCite)

Submission history

From: Zhen Qu
[v1] Sun, 1 Mar 2026 22:25:23 UTC (3,049 KB)
[v2] Mon, 30 Mar 2026 07:42:53 UTC (10,640 KB)
