AG-VAS: Anchor-Guided Zero-Shot Visual Anomaly Segmentation with Large Multimodal Models

arXiv | March 31, 2026 | 10 min read

arXiv:2603.01305v2 Announce Type: replace-cross

Authors: Zhen Qu, Xian Tao, Xiaoyi Bao, Dingrong Wang, ShiChen Qu, Zhengtao Zhang, Xingang Wang

Abstract: Large multimodal models (LMMs) exhibit strong task generalization capabilities, offering new opportunities for zero-shot visual anomaly segmentation (ZSAS). However, existing LMM-based segmentation approaches still face fundamental limitations: anomaly concepts are inherently abstract and context-dependent and lack stable visual prototypes, and the weak alignment between high-level semantic embeddings and pixel-level spatial features hinders precise anomaly localization. To address these challenges, we present AG-VAS (Anchor-Guided Visual Anomaly Segmentation), a new framework that expands the LMM vocabulary with three learnable semantic anchor tokens ([SEG], [NOR], and [ANO]), establishing a unified anchor-guided segmentation paradigm. Specifically, [SEG] serves as an absolute semantic anchor that translates abstract anomaly semantics into explicit, spatially grounded visual entities (e.g., holes or scratches), while [NOR] and [ANO] act as relative anchors that model the contextual contrast between normal and abnormal patterns across categories. To further enhance cross-modal alignment, we introduce a Semantic-Pixel Alignment Module (SPAM) that aligns language-level semantic embeddings with high-resolution visual features, along with an Anchor-Guided Mask Decoder (AGMD) that performs anchor-conditioned mask prediction for precise anomaly localization. In addition, we curate Anomaly-Instruct20K, a large-scale instruction dataset that organizes anomaly knowledge into structured descriptions of appearance, shape, and spatial attributes, facilitating effective learning and integration of the proposed semantic anchors. Extensive experiments on six industrial and medical benchmarks demonstrate that AG-VAS achieves consistent state-of-the-art performance in the zero-shot setting.
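The abstract describes the anchors only at a high level. The sketch below illustrates two generic ideas it alludes to, under assumptions that are stated here and in the code comments. First, the relative [NOR]/[ANO] contrast is written as a per-pixel two-way softmax over cosine similarities, a formulation common in CLIP-style zero-shot anomaly segmentation. Second, the absolute [SEG] anchor is modeled as an embedding that is linearly projected into the pixel-feature space, a crude stand-in for SPAM. It is then dot-producted with each pixel feature to give mask logits. All function names, the 0.07 temperature, the plain linear projection, and the zero threshold are hypothetical. This is not the AG-VAS implementation, whose SPAM and AGMD are learned modules the abstract does not specify.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; the small epsilon guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def project(weights: Sequence[Vector], hidden: Vector) -> List[float]:
    """Hypothetical stand-in for semantic-to-pixel alignment: a plain linear map W @ h."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in weights]


def relative_anomaly_map(
    pixels: Sequence[Vector], nor: Vector, ano: Vector, temperature: float = 0.07
) -> List[float]:
    """Per-pixel probability of 'anomalous' from a two-way softmax over the
    cosine similarities to the [NOR] and [ANO] anchor embeddings.
    The temperature value is an assumption, not taken from the paper."""
    probs = []
    for p in pixels:
        s_nor = cosine(p, nor) / temperature
        s_ano = cosine(p, ano) / temperature
        m = max(s_nor, s_ano)  # subtract the max for numerical stability
        e_nor, e_ano = math.exp(s_nor - m), math.exp(s_ano - m)
        probs.append(e_ano / (e_nor + e_ano))
    return probs


def seg_mask(pixels: Sequence[Vector], seg: Vector, threshold: float = 0.0) -> List[int]:
    """Binary mask from dot-product logits between each pixel feature and the
    (projected) [SEG] anchor embedding; the threshold is hypothetical."""
    return [1 if sum(a * b for a, b in zip(p, seg)) > threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Toy 2-D features for a flattened 2x2 image; all values are made up.
    pixels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    nor, ano = [1.0, 0.0], [0.0, 1.0]
    print(relative_anomaly_map(pixels, nor, ano))  # low for pixels 0-1, high for 2-3
    # Pretend 3-D LMM hidden state at the [SEG] position, projected to the 2-D pixel space.
    seg = project([[0.0, -1.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 1.0, 0.0])
    print(seg_mask(pixels, seg))  # prints [0, 0, 1, 1]
```

Pure Python keeps the example dependency-free. A real model would operate on batched tensors of shape (H·W, D) and learn the anchor embeddings and the alignment and decoding modules end to end, rather than use fixed vectors and a hand-set threshold.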

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.01305 [cs.CV]

(or arXiv:2603.01305v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.01305

arXiv-issued DOI via DataCite

Submission history

From: Zhen Qu

[v1] Sun, 1 Mar 2026 22:25:23 UTC (3,049 KB)

[v2] Mon, 30 Mar 2026 07:42:53 UTC (10,640 KB)

Original source

arXiv

https://arxiv.org/abs/2603.01305
