AI News Hub by Eigenvector

Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs

arXiv cs.CL · by Tianyi Zhao, Yinhan He, Wendy Zheng, Yujie Zhang, Chen Chen · April 4, 2026



Abstract: Large language models are often not just wrong, but "confidently wrong": when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mechanistic analysis of this inflated verbalized confidence in LLMs, organized around three axes: capturing verbalized confidence as a differentiable internal signal, identifying the circuits that causally inflate it, and leveraging these insights for targeted inference-time recalibration. Across two instruction-tuned LLMs on three datasets, we find that a compact set of MLP blocks and attention heads, concentrated in middle-to-late layers, consistently writes the confidence-inflation signal at the final token position. We further show that targeted inference-time interventions on these circuits substantially improve calibration. Together, our results suggest that verbalized overconfidence in LLMs is driven by identifiable internal circuits and can be mitigated through targeted intervention.
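The abstract does not spell out how the inference-time intervention is implemented, but a common pattern in circuit-level work of this kind is directional ablation: remove (or scale down) the component of a residual-stream activation along a learned direction at the final token position. The sketch below illustrates that operation in isolation; the `direction` vector, the choice of layer, and the scaling factor `alpha` are all assumptions for illustration, not details from the paper.

```python
import numpy as np

def ablate_direction(hidden, direction, alpha=1.0):
    """Remove the component of `hidden` along `direction`.

    hidden:    (d,) activation at the final token position of some layer
    direction: (d,) hypothetical 'confidence-inflation' direction, e.g. one
               fit by contrasting activations on correct vs. incorrect
               high-confidence answers (an assumed procedure, not the paper's)
    alpha:     1.0 removes the component entirely; 0 < alpha < 1 scales it down
    """
    v = direction / np.linalg.norm(direction)   # unit vector along the direction
    proj = np.dot(hidden, v) * v                # component of hidden along v
    return hidden - alpha * proj

# Toy check: after full ablation the edited activation is orthogonal
# to the ablated direction, while orthogonal components are untouched.
h = np.array([2.0, 1.0, 0.5])
v = np.array([1.0, 0.0, 0.0])
h_edit = ablate_direction(h, v)
```

In a real model this function would typically run inside a forward hook on the identified MLP blocks or attention heads, applied only at the final token position, so that generation elsewhere is unaffected.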

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.01457 [cs.CL]

(or arXiv:2604.01457v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.01457

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tianyi Zhao [view email] [v1] Wed, 1 Apr 2026 23:06:58 UTC (662 KB)
