Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs
Abstract: Large language models are often not just wrong, but \emph{confidently wrong}: when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mechanistic analysis of this inflated verbalized confidence in LLMs, organized around three axes: capturing verbalized confidence as a differentiable internal signal, identifying the circuits that causally inflate it, and leveraging these insights for targeted inference-time recalibration. Across two instruction-tuned LLMs on three datasets, we find that a compact set of MLP blocks and attention heads, concentrated in middle-to-late layers, consistently writes the confidence-inflation signal at the final token position. We further show that targeted inference-time interventions on these circuits substantially improve calibration. Together, our results suggest that verbalized overconfidence in LLMs is driven by identifiable internal circuits and can be mitigated through targeted intervention.
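The abstract reports that specific circuits write a confidence-inflation signal into the residual stream at the final token, and that intervening on them improves calibration. The paper's exact intervention is not given here; a minimal sketch of one common form of such an inference-time edit, directional ablation, is shown below. The "confidence-inflation direction" and all names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def ablate_direction(hidden, direction, alpha=1.0):
    """Remove (or dampen, via alpha < 1) the component of `hidden`
    lying along `direction`.

    hidden:    residual-stream activation at the final token, shape (d,)
    direction: hypothetical unit vector along which the implicated
               circuits write the confidence-inflation signal, shape (d,)
    alpha:     1.0 ablates the component entirely; 0.0 leaves it intact.
    """
    direction = direction / np.linalg.norm(direction)
    coef = hidden @ direction            # scalar projection onto the direction
    return hidden - alpha * coef * direction

# Toy demonstration with random stand-in activations (illustrative only).
rng = np.random.default_rng(0)
d = 16
v = rng.normal(size=d)                   # stand-in confidence-inflation direction
h = rng.normal(size=d)                   # stand-in final-token hidden state
h_edit = ablate_direction(h, v)
```

In practice such an edit would be applied with a forward hook on the identified MLP blocks or attention heads during generation; after full ablation the edited state carries no component along the targeted direction.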
Subjects:
Computation and Language (cs.CL)
Cite as: arXiv:2604.01457 [cs.CL]
(or arXiv:2604.01457v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2604.01457
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Tianyi Zhao [v1] Wed, 1 Apr 2026 23:06:58 UTC (662 KB)