
Closing the Confidence-Faithfulness Gap in Large Language Models

arXiv, March 26, 2026

Miranda Muqing Miao, Lyle Ungar


Abstract: Large language models (LLMs) tend to verbalize confidence scores that are largely detached from their actual accuracy, yet the geometric relationship governing this behavior remains poorly understood. In this work, we present a mechanistic interpretability analysis of verbalized confidence, using linear probes and contrastive activation addition (CAA) steering to show that calibration and verbalized confidence signals are encoded linearly but are orthogonal to one another, a finding consistent across three open-weight models and four datasets. Interestingly, when models are prompted to simultaneously reason through a problem and verbalize a confidence score, the reasoning process disrupts the verbalized confidence direction, exacerbating miscalibration. We term this the "Reasoning Contamination Effect." Leveraging this insight, we introduce a two-stage adaptive steering pipeline that reads the model's internal accuracy estimate and steers verbalized output to match it, substantially improving calibration alignment across all evaluated models.
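
The geometric claim in the abstract can be illustrated with a toy sketch. This is not the authors' code or pipeline: it assumes synthetic Gaussian vectors in place of real model activations, plants an "accuracy" signal and a "confidence" signal on two separate axes, and uses a mean-difference direction (the usual CAA-style construction) rather than a trained probe. The function names, labels, dimensions, and the signal strength of 2.0 are all hypothetical choices made for illustration.

```python
import numpy as np

def mean_difference_direction(pos, neg):
    """Unit vector pointing from the mean of `neg` to the mean of `pos` (CAA-style)."""
    diff = pos.mean(axis=0) - neg.mean(axis=0)
    return diff / np.linalg.norm(diff)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def match_confidence_to_accuracy(hidden, accuracy_dir, confidence_dir):
    """Toy two-stage steering on a batch of hidden states.

    Stage 1: read each state's projection onto accuracy_dir.
    Stage 2: shift each state along confidence_dir (assumed unit norm)
             until its projection onto confidence_dir equals that reading.
    """
    read = hidden @ accuracy_dir
    current = hidden @ confidence_dir
    return hidden + (read - current)[:, None] * confidence_dir

# Synthetic stand-in for activations (assumption: no real model is involved).
rng = np.random.default_rng(0)
n, d = 2000, 64
correct = rng.integers(0, 2, size=n)    # hypothetical "answer was correct" label
confident = rng.integers(0, 2, size=n)  # hypothetical "stated high confidence" label
hidden = rng.normal(size=(n, d))
hidden[:, 0] += 2.0 * correct           # planted accuracy signal on axis 0
hidden[:, 1] += 2.0 * confident         # planted confidence signal on axis 1

accuracy_dir = mean_difference_direction(hidden[correct == 1], hidden[correct == 0])
confidence_dir = mean_difference_direction(hidden[confident == 1], hidden[confident == 0])

# Near zero here because the two signals were planted on independent axes.
similarity = cosine(accuracy_dir, confidence_dir)
print(f"cosine(accuracy, confidence) = {similarity:.3f}")

steered = match_confidence_to_accuracy(hidden, accuracy_dir, confidence_dir)
```

In this toy setup the near-zero cosine is true by construction, so it only shows what "linearly encoded but orthogonal" means, not that real models behave this way. The same orthogonality is what lets the second stage move the confidence projection while leaving the accuracy reading almost unchanged.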

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.25052 [cs.CL]

(or arXiv:2603.25052v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.25052

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Muqing Miao
[v1] Thu, 26 Mar 2026 05:42:04 UTC (966 KB)
