
SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

arXiv cs.MA. Submitted on 25 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2).


Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregating heterogeneous models' outputs amplifies uncertainty and increases the risk of hallucinations. We propose SCoOP (Semantic-Consistent Opinion Pooling), a training-free uncertainty quantification (UQ) framework for multi-VLM systems through uncertainty-weighted linear opinion pooling. The core idea is to treat each VLM as a probabilistic "expert," sample multiple outputs, map them to a unified space, aggregate their opinions, and produce a system-level uncertainty score. Unlike prior UQ methods designed for single models, SCoOP explicitly measures collective, system-level uncertainty across multiple VLMs, enabling effective hallucination detection and abstention for highly uncertain samples. On ScienceQA, SCoOP achieves an AUROC of 0.866 for hallucination detection, outperforming baselines (0.732-0.757) by approximately 10-13%. For abstention, it attains an AURAC of 0.907, exceeding baselines (0.818-0.840) by 7-9%. Despite these gains, SCoOP introduces only microsecond-level aggregation overhead relative to the baselines, which is trivial compared to typical VLM inference time (on the order of seconds). These results demonstrate that SCoOP provides an efficient and principled mechanism for uncertainty-aware aggregation, advancing the reliability of multimodal AI systems. Our code is publicly available at this https URL.
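The pipeline the abstract outlines (sample from each VLM, map the outputs to a shared space, pool the opinions, score the system) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a multiple-choice setting in which the "unified space" is simply the list of answer options, and it assumes an entropy-based weighting rule in which more confident models receive larger weights. The function names, the example model names, and the weighting rule itself are all hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(samples, answer_space):
    """Map one model's sampled answers to probabilities over the shared answer space.

    Assumes `samples` is non-empty and every sample is a label in `answer_space`.
    """
    counts = Counter(samples)
    return [counts[a] / len(samples) for a in answer_space]


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def pool_opinions(distributions):
    """Linear opinion pool with uncertainty-based weights.

    Assumed weighting rule (not taken from the paper): each expert's raw weight
    is the gap between the maximum possible entropy and its own entropy, so
    low-entropy (confident) experts count more. Needs at least two answer options.
    """
    k = len(distributions[0])
    max_h = math.log(k)
    raw = [max_h - entropy(p) + 1e-9 for p in distributions]  # epsilon avoids all-zero weights
    total = sum(raw)
    weights = [r / total for r in raw]
    pooled = [sum(w * p[i] for w, p in zip(weights, distributions)) for i in range(k)]
    return pooled, weights


def system_uncertainty(pooled):
    """Entropy of the pooled distribution, normalized to the range [0, 1]."""
    return entropy(pooled) / math.log(len(pooled))


# Hypothetical example: three models, five sampled answers each.
answer_space = ["A", "B", "C", "D"]
samples_per_model = {
    "vlm_1": ["A", "A", "A", "A", "A"],
    "vlm_2": ["A", "A", "B", "A", "A"],
    "vlm_3": ["B", "C", "A", "D", "B"],
}

dists = [empirical_distribution(s, answer_space) for s in samples_per_model.values()]
pooled, weights = pool_opinions(dists)
score = system_uncertainty(pooled)
answer = answer_space[pooled.index(max(pooled))]

print("pooled distribution:", [round(x, 3) for x in pooled])
print("expert weights:", [round(w, 3) for w in weights])
print("system answer:", answer, "| uncertainty:", round(score, 3))
```

In a setup like this, a system could abstain whenever `score` exceeds a chosen threshold, and the same score could be ranked against correctness labels to compute a detection AUROC. Free-form answers would need a semantic grouping step before counting, which this sketch leaves out.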

Comments: Accepted to ICLR 2026 Workshop on Agentic AI in the Wild: From Hallucinations to Reliable Autonomy

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)

Cite as: arXiv:2603.23853 [cs.AI]

(or arXiv:2603.23853v2 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.23853

arXiv-issued DOI via DataCite

Submission history

From: Chung-En Johnny Yu
[v1] Wed, 25 Mar 2026 02:30:48 UTC (1,921 KB)
[v2] Wed, 1 Apr 2026 03:28:41 UTC (1,925 KB)

Original source

arXiv cs.MA

https://arxiv.org/abs/2603.23853



