
Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models

arXiv · March 31, 2026 · 2 min read

arXiv:2603.07619v2 · Announce type: replace

Authors: Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan


Abstract: Vision Language Models (VLMs) often hallucinate non-existent objects. Detecting hallucination is analogous to detecting deception: a single final statement is insufficient; one must examine the underlying reasoning process. Yet existing detectors rely mostly on final-layer signals. Attention-based methods assume hallucinated tokens exhibit low attention, while entropy-based ones use final-step uncertainty. Our analysis reveals the opposite: hallucinated objects can exhibit peaked attention due to contextual priors, and models often express high confidence because intermediate layers have already converged to an incorrect hypothesis. We show that the key to hallucination detection lies within the model's thought process, not its final output. By probing decoder layers, we uncover a previously overlooked behavior, overthinking: models repeatedly revise object hypotheses across layers before committing to an incorrect answer. Once the model latches onto a confounded hypothesis, that hypothesis can propagate through subsequent layers, ultimately causing hallucination. To capture this behavior, we introduce the Overthinking Score, a metric that measures how many competing hypotheses the model entertains and how unstable these hypotheses are across layers. This score significantly improves hallucination detection, reaching 78.9% F1 on MSCOCO and 71.58% on AMBER.
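The abstract does not state the formula for the Overthinking Score, so the following is only an illustrative sketch of the two ingredients it names: how many competing hypotheses appear across layers, and how often the leading hypothesis changes from one layer to the next. The function names, the toy vocabulary, the per-layer logits, and the way the two terms are combined are all assumptions made for illustration; they are not the authors' definition.

```python
# Hypothetical sketch, not the paper's metric: a toy proxy for "overthinking"
# computed from per-layer logits over a small candidate-object vocabulary.

def top1_per_layer(layer_logits, vocab):
    """Return the highest-scoring token at each layer (a per-layer readout)."""
    hypotheses = []
    for logits in layer_logits:
        best = max(range(len(logits)), key=logits.__getitem__)
        hypotheses.append(vocab[best])
    return hypotheses


def toy_overthinking_score(hypotheses):
    """Assumed combination: extra distinct hypotheses plus switch rate.

    - (distinct - 1): how many competing hypotheses appear beyond the first.
    - switches / (layers - 1): fraction of adjacent layers whose top-1 differs.
    """
    distinct = len(set(hypotheses))
    switches = sum(a != b for a, b in zip(hypotheses, hypotheses[1:]))
    instability = switches / max(len(hypotheses) - 1, 1)
    return (distinct - 1) + instability


# Made-up data: four decoder layers, three candidate objects.
vocab = ["dog", "cat", "frisbee"]

# The same object leads at every layer.
stable_logits = [
    [2.0, 0.5, 0.1],
    [2.5, 0.4, 0.2],
    [3.0, 0.3, 0.1],
    [3.5, 0.2, 0.1],
]

# The leading object changes at every layer.
unstable_logits = [
    [1.0, 0.8, 0.9],
    [0.7, 0.9, 1.2],
    [0.6, 1.4, 1.1],
    [0.5, 0.9, 2.0],
]

stable_score = toy_overthinking_score(top1_per_layer(stable_logits, vocab))
unstable_score = toy_overthinking_score(top1_per_layer(unstable_logits, vocab))
print(stable_score, unstable_score)
```

In this toy setup the stable sequence scores lower than the unstable one, which is the qualitative behavior the abstract describes. A real detector would read hypotheses from the model's actual decoder layers and would need a threshold tuned on labeled data.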

Comments: CVPR 2026 Findings

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.07619 [cs.CV] (or arXiv:2603.07619v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.07619 (arXiv-issued DOI via DataCite)

Submission history

From: Ta Duc Huy

[v1] Sun, 8 Mar 2026 13:07:32 UTC (3,975 KB)

[v2] Sun, 29 Mar 2026 04:48:21 UTC (3,981 KB)

Original source: arXiv, https://arxiv.org/abs/2603.07619



