Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes
Joseph Hoche, Andrei Bursuc, David Brellmann, Gilles Louppe, Pavel Izmailov, Angela Yao, Gianni Franchi
Abstract: Large Vision-Language Models (LVLMs) often produce plausible but unreliable outputs, making robust uncertainty estimation essential. Recent work on semantic uncertainty estimates relies on external models to cluster multiple sampled responses and measure their semantic consistency. However, these clustering methods are often fragile, highly sensitive to minor phrasing variations, and can incorrectly group or separate semantically similar answers, leading to unreliable uncertainty estimates. We propose Semantic Gaussian Process Uncertainty (SGPU), a Bayesian framework that quantifies semantic uncertainty by analyzing the geometric structure of answer embeddings, avoiding brittle clustering. SGPU maps generated answers into a dense semantic space, computes the Gram matrix of their embeddings, and summarizes their semantic configuration via the eigenspectrum. This spectral representation is then fed into a Gaussian Process Classifier that learns to map patterns of semantic consistency to predictive uncertainty, and that can be applied in both black-box and white-box settings. Across six LLMs and LVLMs on eight datasets spanning VQA, image classification, and textual QA, SGPU consistently achieves state-of-the-art calibration (ECE) and discriminative (AUROC, AUARC) performance. We further show that SGPU transfers across models and modalities, indicating that its spectral representation captures general patterns of semantic uncertainty.
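The spectral step described in the abstract (embed the sampled answers, form the Gram matrix, summarize it by its eigenspectrum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spectral_features`, the row normalization (which makes the Gram matrix a cosine-similarity matrix), the descending sort, and the normalization of eigenvalues to sum to one are all assumptions made for illustration, and the toy vectors stand in for embeddings from an unspecified embedding model.

```python
import numpy as np


def spectral_features(embeddings: np.ndarray) -> np.ndarray:
    """Eigenspectrum of the Gram matrix of answer embeddings.

    `embeddings` has shape (n_answers, dim), one row per sampled answer.
    Rows are assumed to be non-zero. Normalization choices here are
    illustrative assumptions, not taken from the paper.
    """
    x = np.asarray(embeddings, dtype=float)
    # Unit-normalize each row so the Gram matrix holds cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    gram = x @ x.T  # (n_answers, n_answers), symmetric positive semi-definite
    # eigvalsh returns real eigenvalues in ascending order for symmetric input.
    eigvals = np.linalg.eigvalsh(gram)[::-1]
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative round-off
    return eigvals / eigvals.sum()  # descending, sums to 1


# Toy embeddings (hypothetical): four identical answers vs. four unrelated ones.
consistent = np.tile([1.0, 0.0, 0.0, 0.0], (4, 1))
divergent = np.eye(4)

consistent_spectrum = spectral_features(consistent)
divergent_spectrum = spectral_features(divergent)
```

Under these assumptions, semantically identical answers concentrate the spectrum on a single eigenvalue, while mutually unrelated answers spread it evenly. A feature vector of this kind is what a downstream classifier (a Gaussian Process Classifier, per the abstract) would take as input to predict whether the model's answer is reliable.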
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2512.14177 [cs.CV]
(or arXiv:2512.14177v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2512.14177
arXiv-issued DOI via DataCite
Submission history
From: Joseph Hoche
[v1] Tue, 16 Dec 2025 08:15:24 UTC (2,200 KB)
[v2] Mon, 30 Mar 2026 12:43:20 UTC (1,325 KB)