Overthinking Causes Hallucination: Tracing Confounder Propagation in Vision Language Models
arXiv:2603.07619v2 Announce Type: replace
Authors: Abin Shoby, Ta Duc Huy, Tuan Dung Nguyen, Minh Khoi Ho, Qi Chen, Anton van den Hengel, Phi Le Nguyen, Johan W. Verjans, Vu Minh Hieu Phan
Abstract: Vision Language Models (VLMs) often hallucinate non-existent objects. Detecting hallucination is analogous to detecting deception: a single final statement is insufficient; one must examine the underlying reasoning process. Yet existing detectors rely mostly on final-layer signals. Attention-based methods assume that hallucinated tokens exhibit low attention, while entropy-based methods use final-step uncertainty. Our analysis reveals the opposite: hallucinated objects can exhibit peaked attention due to contextual priors, and models often express high confidence because intermediate layers have already converged to an incorrect hypothesis. We show that the key to hallucination detection lies within the model's thought process, not its final output. By probing decoder layers, we uncover a previously overlooked behavior, overthinking: models repeatedly revise object hypotheses across layers before committing to an incorrect answer. Once the model latches onto a confounded hypothesis, that hypothesis can propagate through subsequent layers, ultimately causing hallucination. To capture this behavior, we introduce the Overthinking Score, a metric that measures how many competing hypotheses the model entertains and how unstable these hypotheses are across layers. This score significantly improves hallucination detection: 78.9% F1 on MSCOCO and 71.58% on AMBER.
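The abstract describes the Overthinking Score only in words, so the following is a minimal illustrative sketch and not the paper's formula. It assumes that each decoder layer has already been probed to yield a top-1 object hypothesis (for example through a logit-lens style readout, which is an assumption here), and it computes two simple proxies: the number of distinct hypotheses and the fraction of adjacent layers where the hypothesis changes. The function name `hypothesis_instability` and the toy label sequences are hypothetical.

```python
from typing import Sequence, Tuple


def hypothesis_instability(layer_top1: Sequence[str]) -> Tuple[int, float]:
    """Return (distinct hypotheses, adjacent-layer switch rate).

    layer_top1 holds one top-1 object label per probed decoder layer,
    ordered from the earliest layer to the last. This is a hypothetical
    proxy, not the Overthinking Score defined in the paper.
    """
    if not layer_top1:
        raise ValueError("need at least one layer")
    # How many competing hypotheses appear anywhere in the stack.
    distinct = len(set(layer_top1))
    # How often the hypothesis is revised between consecutive layers.
    switches = sum(a != b for a, b in zip(layer_top1, layer_top1[1:]))
    switch_rate = switches / max(len(layer_top1) - 1, 1)
    return distinct, switch_rate


# Toy inputs (made up for illustration, not data from the paper).
stable = ["dog", "dog", "dog", "dog", "dog"]
unstable = ["dog", "cat", "dog", "frisbee", "frisbee"]

print(hypothesis_instability(stable))    # (1, 0.0)
print(hypothesis_instability(unstable))  # (3, 0.75)
```

Under these assumptions, a token whose per-layer hypotheses keep changing scores higher on both proxies than one that settles early, which matches the intuition the abstract gives for overthinking.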
Comments: CVPR2026 Findings
Subjects:
Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.07619 [cs.CV]
(or arXiv:2603.07619v2 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.07619
arXiv-issued DOI via DataCite
Submission history
From: Ta Duc Huy
[v1] Sun, 8 Mar 2026 13:07:32 UTC (3,975 KB)
[v2] Sun, 29 Mar 2026 04:48:21 UTC (3,981 KB)