The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning
arXiv:2603.29025v1 Announce Type: new
Abstract: Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose-measure-bridge-treat framework. Causal-behavioral analysis of the "car wash problem" across six models reveals approximately context-independent sigmoid heuristics: the distance cue exerts 8.7 to 38 times more influence than the goal, and token-level attribution shows patterns more consistent with keyword associations than compositional inference. The Heuristic Override Benchmark (HOB) -- 500 instances spanning 4 heuristic × 5 constraint families with minimal pairs and explicitness gradients -- demonstrates generality across 14 models: under strict evaluation (10/10 correct), no model exceeds 75%, and presence constraints are the hardest (44%). A minimal hint (e.g., emphasizing the key object) recovers +15 pp on average, suggesting the failure lies in constraint inference rather than missing knowledge; 12/14 models perform worse when the constraint is removed (up to -39 pp), revealing a conservative bias. Parametric probes confirm that the sigmoid pattern generalizes to cost, efficiency, and semantic-similarity heuristics; goal-decomposition prompting recovers +6 to 9 pp by forcing models to enumerate preconditions before answering. Together, these results characterize heuristic override as a systematic reasoning vulnerability and provide a benchmark for measuring progress toward resolving it.
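The "context-independent sigmoid heuristic" finding can be illustrated with a minimal logistic sketch. All weights here are hypothetical, chosen only so that the distance-to-goal influence ratio falls inside the 8.7–38× range the abstract reports; the paper's actual fitted parameters are not given here.

```python
import math

def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical cue weights: the distance cue's coefficient dwarfs the
# goal's, mimicking the reported 8.7x-38x influence gap.
GOAL_WEIGHT = 0.1      # contribution of the stated goal ("wash the car")
DISTANCE_WEIGHT = -3.0  # contribution of the distance cue, per km
BIAS = 4.0

def p_walk(distance_km: float, goal_present: bool = True) -> float:
    """Probability the model answers 'walk', modeled as a sigmoid that is
    driven almost entirely by the surface distance cue, not the goal."""
    goal_term = GOAL_WEIGHT if goal_present else 0.0
    return sigmoid(BIAS + DISTANCE_WEIGHT * distance_km + goal_term)

# The influence ratio between cues, in the spirit of the paper's 8.7-38x.
influence_ratio = abs(DISTANCE_WEIGHT) / abs(GOAL_WEIGHT)
```

Under this toy parameterization, varying the distance sweeps the answer from near-certain "walk" to near-certain "don't walk", while toggling the goal barely moves the output, which is the qualitative signature of a surface heuristic overriding the task constraint.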
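Goal-decomposition prompting, which the abstract credits with a +6 to 9 pp recovery, can be sketched as a simple prompt wrapper. The exact wording below is hypothetical, not the paper's template; only the core idea of forcing precondition enumeration before answering comes from the source.

```python
def goal_decomposition_prompt(question: str) -> str:
    """Wrap a question so the model must enumerate the goal's
    preconditions and check each one before committing to an answer.
    The phrasing is an illustrative assumption, not the paper's prompt."""
    return (
        "Before answering, list every object and condition the stated goal "
        "requires (its preconditions). For each precondition, check whether "
        "the scenario satisfies it. Only then give your final answer.\n\n"
        f"Question: {question}"
    )
```

The design intent is that enumerating preconditions surfaces the unstated feasibility constraint (e.g., the car must be present at the destination) before the salient distance cue can dominate the answer.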
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.29025 [cs.CL]
(or arXiv:2603.29025v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.29025
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Yubo Li [v1] Mon, 30 Mar 2026 21:36:09 UTC (475 KB)