AI News Hub by Eigenvector

Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs

arXiv cs.CL · by Tianyi Zhao, Yinhan He, Wendy Zheng, Yujie Zhang, Chen Chen · April 4, 2026



Abstract: Large language models are often not just wrong, but "confidently wrong": when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mechanistic analysis of this inflated verbalized confidence in LLMs, organized around three axes: capturing verbalized confidence as a differentiable internal signal, identifying the circuits that causally inflate it, and leveraging these insights for targeted inference-time recalibration. Across two instruction-tuned LLMs on three datasets, we find that a compact set of MLP blocks and attention heads, concentrated in middle-to-late layers, consistently writes the confidence-inflation signal at the final token position. We further show that targeted inference-time interventions on these circuits substantially improve calibration. Together, our results suggest that verbalized overconfidence in LLMs is driven by identifiable internal circuits and can be mitigated through targeted intervention.
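The abstract does not spell out how the inference-time intervention is implemented, but a common pattern in circuit-level work of this kind is directional ablation: remove (or scale down) the component of a residual-stream activation along a learned direction at the final token position. The sketch below illustrates that operation in isolation; the `direction` vector, the choice of layer, and the scaling factor `alpha` are all assumptions for illustration, not details from the paper.

```python
import numpy as np

def ablate_direction(hidden, direction, alpha=1.0):
    """Remove the component of `hidden` along `direction`.

    hidden:    (d,) activation at the final token position of some layer
    direction: (d,) hypothetical 'confidence-inflation' direction, e.g. one
               fit by contrasting activations on correct vs. incorrect
               high-confidence answers (an assumed procedure, not the paper's)
    alpha:     1.0 removes the component entirely; 0 < alpha < 1 scales it down
    """
    v = direction / np.linalg.norm(direction)   # unit vector along the direction
    proj = np.dot(hidden, v) * v                # component of hidden along v
    return hidden - alpha * proj

# Toy check: after full ablation the edited activation is orthogonal
# to the ablated direction, while orthogonal components are untouched.
h = np.array([2.0, 1.0, 0.5])
v = np.array([1.0, 0.0, 0.0])
h_edit = ablate_direction(h, v)
```

In a real model this function would typically run inside a forward hook on the identified MLP blocks or attention heads, applied only at the final token position, so that generation elsewhere is unaffected.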

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.01457 [cs.CL]

(or arXiv:2604.01457v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.01457

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Tianyi Zhao [view email] [v1] Wed, 1 Apr 2026 23:06:58 UTC (662 KB)
