
Closing the Confidence-Faithfulness Gap in Large Language Models

arXiv [Submitted on 26 Mar 2026 (v1), last revised 1 Apr 2026 (this version, v2)]
April 2, 2026
Authors: Miranda Muqing Miao, Lyle Ungar

Abstract: Large language models (LLMs) tend to verbalize confidence scores that are largely detached from their actual accuracy, yet the geometric relationship governing this behavior remains poorly understood. In this work, we present a mechanistic interpretability analysis of verbalized confidence, using linear probes and contrastive activation addition (CAA) steering to show that calibration and verbalized confidence signals are encoded linearly but are orthogonal to one another, a finding consistent across three open-weight models and four datasets. Interestingly, when models are prompted to simultaneously reason through a problem and verbalize a confidence score, the reasoning process disrupts the verbalized confidence direction, exacerbating miscalibration. We term this the "Reasoning Contamination Effect." Leveraging this insight, we introduce a two-stage adaptive steering pipeline that reads the model's internal accuracy estimate and steers verbalized output to match it, substantially improving calibration alignment across all evaluated models.
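
The geometry described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the activations are hand-made 3-dimensional vectors, every function name (`caa_direction`, `probe_estimate`, `adaptive_alpha`, and so on) is hypothetical, and the rule for choosing the steering strength from the gap between the probe estimate and the stated confidence is an assumption made for illustration. The only part taken from standard practice is that a CAA steering vector is the difference between the mean activations of two contrastive sets.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def caa_direction(positive, negative):
    """CAA-style direction: mean(positive) minus mean(negative) activations."""
    mp, mn = mean_vector(positive), mean_vector(negative)
    return [p - q for p, q in zip(mp, mn)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity; 0 means the two directions are orthogonal."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def probe_estimate(hidden, weights, bias=0.0):
    """Stage 1 (assumed form): a logistic linear probe reading an accuracy estimate."""
    return 1.0 / (1.0 + math.exp(-(dot(hidden, weights) + bias)))

def adaptive_alpha(internal_estimate, verbalized_confidence, gain=1.0):
    """Assumed rule: steer in proportion to the gap between the two signals."""
    return gain * (internal_estimate - verbalized_confidence)

def steer(hidden, direction, alpha):
    """Stage 2: add a scaled steering vector to the hidden state."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Synthetic activations (made up): axis 0 tracks correctness,
# axis 1 tracks stated confidence.
correct = [[1.0, 0.2, 0.0], [1.2, -0.2, 0.1]]
incorrect = [[-1.0, 0.1, 0.0], [-1.2, -0.1, 0.1]]
high_conf = [[0.3, 1.0, 0.0], [-0.3, 1.0, 0.2]]
low_conf = [[0.2, -1.0, 0.2], [-0.2, -1.0, 0.0]]

accuracy_dir = caa_direction(correct, incorrect)
confidence_dir = caa_direction(high_conf, low_conf)
similarity = cosine(accuracy_dir, confidence_dir)

# Two-stage sketch: read the internal estimate, then steer stated confidence.
hidden = [1.0, -0.5, 0.0]
verbalized = 0.4  # hypothetical stated confidence for this example
internal = probe_estimate(hidden, accuracy_dir)
alpha = adaptive_alpha(internal, verbalized)
steered = steer(hidden, confidence_dir, alpha)
```

In this toy setup the two directions are orthogonal by construction, so steering along `confidence_dir` moves the hidden state's projection onto the confidence direction while leaving its projection onto `accuracy_dir` unchanged. In a real model the directions would be estimated from residual-stream activations at a chosen layer, and the probe would be trained rather than reused from the steering vector.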

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.25052 [cs.CL]

(or arXiv:2603.25052v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25052

arXiv-issued DOI via DataCite

Submission history

From: Muqing Miao
[v1] Thu, 26 Mar 2026 05:42:04 UTC (966 KB)
[v2] Wed, 1 Apr 2026 05:05:49 UTC (2,145 KB)
