SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems
Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregating heterogeneous models' outputs amplifies uncertainty and increases the risk of hallucinations. We propose SCoOP (Semantic-Consistent Opinion Pooling), a training-free uncertainty quantification (UQ) framework for multi-VLM systems based on uncertainty-weighted linear opinion pooling. The core idea is to treat each VLM as a probabilistic "expert," sample multiple outputs, map them to a unified space, aggregate their opinions, and produce a system-level uncertainty score. Unlike prior UQ methods designed for single models, SCoOP explicitly measures collective, system-level uncertainty across multiple VLMs, enabling effective hallucination detection and abstention on highly uncertain samples. On ScienceQA, SCoOP achieves an AUROC of 0.866 for hallucination detection, outperforming baselines (0.732-0.757) by roughly 11-13 points. For abstention, it attains an AURAC of 0.907, exceeding baselines (0.818-0.840) by 7-9 points. Despite these gains, SCoOP introduces only microsecond-level aggregation overhead relative to the baselines, which is negligible compared to typical VLM inference time (on the order of seconds). These results demonstrate that SCoOP provides an efficient and principled mechanism for uncertainty-aware aggregation, advancing the reliability of multimodal AI systems. Our code is publicly available at this https URL.
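The pooling step described in the abstract admits a compact illustration. The Python sketch below shows one way uncertainty-weighted linear opinion pooling can be realized; it is a minimal illustration under our own assumptions (inverse-entropy expert weights, sampled answers already grouped into shared semantic clusters), not the paper's exact formulation.

```python
# Minimal sketch of uncertainty-weighted linear opinion pooling across VLMs.
# Assumptions (not from the paper): experts are weighted by inverse entropy,
# and each expert's samples are already mapped to shared semantic clusters.
import numpy as np

def pool_opinions(expert_probs: np.ndarray) -> tuple[np.ndarray, float]:
    """Aggregate per-expert distributions over shared semantic clusters.

    expert_probs: (n_experts, n_clusters) array; row i is expert i's
    empirical distribution over answer clusters from repeated sampling.
    Returns the pooled distribution and its entropy, used here as the
    system-level uncertainty score (higher = more uncertain).
    """
    eps = 1e-12
    # Per-expert entropy: a confident expert has a peaked distribution.
    expert_entropy = -np.sum(expert_probs * np.log(expert_probs + eps), axis=1)
    # Weight experts inversely to their uncertainty (one plausible choice).
    weights = 1.0 / (expert_entropy + eps)
    weights /= weights.sum()
    # Linear opinion pool: weighted mixture of the expert distributions.
    pooled = weights @ expert_probs
    system_uncertainty = float(-np.sum(pooled * np.log(pooled + eps)))
    return pooled, system_uncertainty

# Example: three VLMs, four semantic clusters of sampled answers.
probs = np.array([
    [0.7, 0.2, 0.1, 0.0],  # fairly confident expert
    [0.4, 0.3, 0.2, 0.1],  # less confident expert
    [0.8, 0.1, 0.1, 0.0],  # confident expert agreeing with the first
])
pooled, u = pool_opinions(probs)
print(pooled, u)  # abstain when u exceeds a validation-tuned threshold
```

In this sketch, each expert's distribution comes from sampling its answers repeatedly and counting how often each semantic cluster appears; the entropy of the pooled distribution then serves as the system-level score thresholded for hallucination detection and abstention.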
Comments: Accepted to the ICLR 2026 Workshop on Agentic AI in the Wild: From Hallucinations to Reliable Autonomy
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as: arXiv:2603.23853 [cs.AI]
(or arXiv:2603.23853v2 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.23853
arXiv-issued DOI via DataCite
Submission history
From: Chung-En Johnny Yu
[v1] Wed, 25 Mar 2026 02:30:48 UTC (1,921 KB)
[v2] Wed, 1 Apr 2026 03:28:41 UTC (1,925 KB)