Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessChatGPT comes to Apple CarPlay but only if you are willing to talk to a robot - TechRadarGoogle News: ChatGPTWhy Software Engineers Burn Out Differently And What To Do About ItDEV Community512,000 Lines of Claude Code Leaked Through a Single .npmignore MistakeDEV CommunityStop Wasting Tokens on npm Install NoiseDEV Community2 New Orleans city attorneys resign after ChatGPT was used to help prepare federal court filing - WWLTV.comGoogle News: ChatGPTProgramming Logic: The First Step to Mastering Any LanguageDEV CommunityThe $10 Billion Trust Data Market That AI Companies Can't SeeDEV CommunityAI company insiders can bias models for election interferenceLessWrong AIMiniScript Weekly News — Apr 1, 2026DEV CommunityBuilding a Real-Time Dota 2 Draft Prediction System with Machine LearningDEV CommunitySam Altman’s sister amends suit accusing OpenAI CEO of sexual abuse - Michigan Lawyers WeeklyGoogle News: OpenAI🚀 Build a Full-Stack Python Web App (No JS Framework Needed)DEV CommunityBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessChatGPT comes to Apple CarPlay but only if you are willing to talk to a robot - TechRadarGoogle News: ChatGPTWhy Software Engineers Burn Out Differently And What To Do About ItDEV Community512,000 Lines of Claude Code Leaked Through a Single .npmignore MistakeDEV CommunityStop Wasting Tokens on npm Install NoiseDEV Community2 New Orleans city attorneys resign after ChatGPT was used to help prepare federal court filing - WWLTV.comGoogle News: ChatGPTProgramming Logic: The First Step to Mastering Any LanguageDEV CommunityThe $10 Billion Trust Data Market That AI Companies Can't SeeDEV CommunityAI company insiders can bias models for election interferenceLessWrong AIMiniScript Weekly News — Apr 1, 2026DEV CommunityBuilding a Real-Time Dota 2 Draft Prediction System with Machine LearningDEV CommunitySam Altman’s sister amends suit accusing OpenAI CEO of sexual abuse - Michigan Lawyers WeeklyGoogle News: OpenAI🚀 Build a Full-Stack Python Web App (No JS Framework Needed)DEV Community

Lie to Me: How Faithful Is Chain-of-Thought Reasoning in Reasoning Models?

HuggingFace PapersMarch 23, 20268 min read0 views
Source Quiz

Chain-of-thought reasoning faithfulness varies significantly across open-weight models, with acknowledgment rates ranging from 39.7% to 89.9% depending on model architecture and training methodology, indicating that faithfulness is not a fixed property but depends on system design and hint type. (0 upvotes on HuggingFace)

Published on Mar 23

Authors:

Abstract

Chain-of-thought reasoning faithfulness varies significantly across open-weight models, with acknowledgment rates ranging from 39.7% to 89.9% depending on model architecture and training methodology, indicating that faithfulness is not a fixed property but depends on system design and hint type.

AI-generated summary

Chain-of-thought (CoT) reasoning has been proposed as a transparency mechanism for large language models in safety-critical deployments, yet its effectiveness depends on faithfulness (whether models accurately verbalize the factors that actually influence their outputs), a property that prior evaluations have examined in only two proprietary models, finding acknowledgment rates as low as 25% for Claude 3.7 Sonnet and 39% for DeepSeek-R1. To extend this evaluation across the open-weight ecosystem, this study tests 12 open-weight reasoning models spanning 9 architectural families (7B-685B parameters) on 498 multiple-choice questions from MMLU and GPQA Diamond, injecting six categories of reasoning hints (sycophancy, consistency, visual pattern, metadata, grader hacking, and unethical information) and measuring the rate at which models acknowledge hint influence in their CoT when hints successfully alter answers. Across 41,832 inference runs, overall faithfulness rates range from 39.7% (Seed-1.6-Flash) to 89.9% (DeepSeek-V3.2-Speciale) across model families, with consistency hints (35.5%) and sycophancy hints (53.9%) exhibiting the lowest acknowledgment rates. Training methodology and model family predict faithfulness more strongly than parameter count, and keyword-based analysis reveals a striking gap between thinking-token acknowledgment (approximately 87.5%) and answer-text acknowledgment (approximately 28.6%), suggesting that models internally recognize hint influence but systematically suppress this acknowledgment in their outputs. These findings carry direct implications for the viability of CoT monitoring as a safety mechanism and suggest that faithfulness is not a fixed property of reasoning models but varies systematically with architecture, training method, and the nature of the influencing cue.

View arXiv page View PDF GitHub 0 Add to collection

Get this paper in your agent:

hf papers read 2603.22582

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2603.22582 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2603.22582 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Lie to Me: …researchpaperarxivchain-of-th…faithfulnesslarge langu…HuggingFace…

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 201 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers