SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

arXiv cs.MA | Submitted on 25 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2) | April 2, 2026
Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregating heterogeneous models' outputs amplifies uncertainty and increases the risk of hallucinations. We propose SCoOP (Semantic-Consistent Opinion Pooling), a training-free uncertainty quantification (UQ) framework for multi-VLM systems based on uncertainty-weighted linear opinion pooling. The core idea is to treat each VLM as a probabilistic "expert," sample multiple outputs from it, map those outputs to a unified space, aggregate the experts' opinions, and produce a system-level uncertainty score. Unlike prior UQ methods designed for single models, SCoOP explicitly measures collective, system-level uncertainty across multiple VLMs, enabling effective hallucination detection and abstention on highly uncertain samples. On ScienceQA, SCoOP achieves an AUROC of 0.866 for hallucination detection, outperforming baselines (0.732-0.757) by approximately 10-13 percentage points. For abstention, it attains an AURAC of 0.907, exceeding baselines (0.818-0.840) by 7-9 percentage points. Despite these gains, SCoOP introduces only microsecond-level aggregation overhead relative to the baselines, which is trivial compared to typical VLM inference time (on the order of seconds). These results demonstrate that SCoOP provides an efficient and principled mechanism for uncertainty-aware aggregation, advancing the reliability of multimodal AI systems. Our code is publicly available at this https URL.
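The pipeline described in the abstract (sample, map to a unified space, pool, score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting formula, so the sketch assumes the unified space is a fixed set of answer labels, that each model's weight is one minus its normalized entropy, and that the system-level score is the normalized entropy of the pooled distribution. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter

def empirical_distribution(samples, labels):
    """Map one model's sampled answers onto the shared label space."""
    counts = Counter(samples)
    return [counts.get(label, 0) / len(samples) for label in labels]

def entropy(dist):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def pool_opinions(model_samples, labels):
    """Uncertainty-weighted linear opinion pool (assumed weighting scheme).

    Requires at least two labels so the maximum entropy is positive.
    """
    dists = [empirical_distribution(s, labels) for s in model_samples]
    max_entropy = math.log(len(labels))
    # Assumption: weight = 1 - normalized entropy, with a tiny floor so the
    # weights stay well defined even if every model is maximally uncertain.
    raw = [1.0 - entropy(d) / max_entropy + 1e-9 for d in dists]
    total = sum(raw)
    weights = [r / total for r in raw]
    # Linear opinion pool: weighted average of the per-model distributions.
    pooled = [
        sum(w * d[k] for w, d in zip(weights, dists))
        for k in range(len(labels))
    ]
    # Assumption: system-level uncertainty = normalized entropy of the pool.
    score = entropy(pooled) / max_entropy
    return pooled, weights, score

labels = ["A", "B", "C"]
samples = [
    ["A", "A", "A", "A"],  # a model that answers consistently
    ["A", "B", "C", "B"],  # a model whose samples disagree
]
pooled, weights, score = pool_opinions(samples, labels)
answer = labels[pooled.index(max(pooled))]
print(answer, round(score, 3))
```

In this toy case the consistent model receives most of the weight, so the pooled answer follows it. A system could abstain whenever the score exceeds a threshold chosen on validation data; that threshold is likewise an assumption and not a value from the paper.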

Comments: Accepted to ICLR 2026 Workshop on Agentic AI in the Wild: From Hallucinations to Reliable Autonomy

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Cite as: arXiv:2603.23853 [cs.AI] (or arXiv:2603.23853v2 [cs.AI] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.23853 (arXiv-issued DOI via DataCite)

Submission history

From: Chung-En Johnny Yu
[v1] Wed, 25 Mar 2026 02:30:48 UTC (1,921 KB)
[v2] Wed, 1 Apr 2026 03:28:41 UTC (1,925 KB)
