
Bidirectional Multimodal Prompt Learning with Scale-Aware Training for Few-Shot Multi-Class Anomaly Detection

arXiv, March 31, 2026

Authors: Yujin Lee, Sewon Kim, Daeun Moon, Seoyoon Jang, Hyunsoo Yoon


Abstract: Few-shot multi-class anomaly detection is crucial in real industrial settings, where only a few normal samples are available while numerous object types must be inspected. This setting is challenging because defect patterns vary widely across categories while normal samples remain scarce. Existing approaches based on vision-language models typically depend on class-specific anomaly descriptions or auxiliary modules, limiting both scalability and computational efficiency. In this work, we propose AnoPLe, a lightweight multimodal prompt learning framework that removes reliance on anomaly-type textual descriptions and avoids any external modules. AnoPLe employs bidirectional interactions between textual and visual prompts, allowing class semantics and instance-level cues to refine one another and form class-conditioned representations that capture shared normal patterns across categories. To enhance localization, we design a scale-aware prefix trained on both global and local views, enabling the prompts to capture both global context and fine-grained details. In addition, an alignment loss propagates local anomaly evidence to global features, strengthening the consistency between pixel- and image-level predictions. Despite its simplicity, AnoPLe achieves strong performance on MVTec-AD, VisA, and Real-IAD under the few-shot multi-class setting, surpassing prior approaches while remaining efficient and free from expert-crafted anomaly descriptions. Moreover, AnoPLe generalizes well to unseen anomalies and extends effectively to the medical domain.
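The abstract does not give the form of the alignment loss or the scoring rule, so the following is only a rough sketch of the general idea, not AnoPLe's implementation. It assumes a CLIP-style setup in which a visual feature is compared against one generic "normal" and one generic "abnormal" text embedding, and it assumes the alignment term is the squared gap between the image-level score and the strongest pixel-level score. The function names `cosine`, `anomaly_prob`, and `alignment_loss`, the temperature value, and the max-pooling choice are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def anomaly_prob(feature, normal_text, abnormal_text, temperature=0.07):
    """P(abnormal) from a two-way softmax over text similarities.

    Assumption: a single generic normal/abnormal prompt pair, with no
    anomaly-type descriptions. The temperature is an illustrative value.
    """
    s_normal = cosine(feature, normal_text) / temperature
    s_abnormal = cosine(feature, abnormal_text) / temperature
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(s_normal, s_abnormal)
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def alignment_loss(image_score, pixel_scores):
    """Squared gap between the image-level score and the largest pixel-level score.

    Assumption: local evidence is summarized by max-pooling the pixel scores;
    the paper's actual loss may differ.
    """
    return (image_score - max(pixel_scores)) ** 2
```

Under these assumptions, an image whose global score is low while one patch scores high incurs a large penalty, which is one way that local anomaly evidence could be pushed into the image-level prediction.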

Comments: accepted to CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2408.13516 [cs.CV] (or arXiv:2408.13516v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2408.13516

arXiv-issued DOI via DataCite

Submission history

From: Yujin Lee

[v1] Sat, 24 Aug 2024 08:41:19 UTC (46,204 KB)

[v2] Mon, 30 Mar 2026 03:34:35 UTC (11,492 KB)

Original source: arXiv, https://arxiv.org/abs/2408.13516


