Cross-Model Disagreement as a Label-Free Correctness Signal
Authors: Matt Gorbett, Suman Jana
Abstract: Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator -- a simple, training-free signal that can be dropped into existing production systems, routing pipelines, and deployment monitoring infrastructure without modification. Given a model's generated answer, cross-model disagreement computes how surprised or uncertain a second verifier model is when reading that answer via a single forward pass. No generation from the verifying model is required, and no correctness labels are needed. We instantiate this principle as Cross-Model Perplexity (CMP), which measures the verifying model's surprise at the generating model's answer tokens, and Cross-Model Entropy (CME), which measures the verifying model's uncertainty at those positions. Both CMP and CME outperform within-model uncertainty baselines across benchmarks spanning reasoning, retrieval, and mathematical problem solving (MMLU, TriviaQA, and GSM8K). On MMLU, CMP achieves a mean AUROC of 0.75 against a within-model entropy baseline of 0.59. These results establish cross-model disagreement as a practical, training-free approach to label-free correctness estimation, with direct applications in deployment monitoring, model routing, selective prediction, data filtering, and scalable oversight of production language model systems.
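The two scores described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so this assumes the standard definitions: CMP as the exponential of the mean negative log-likelihood the verifier assigns to the generator's answer tokens, and CME as the mean Shannon entropy of the verifier's next-token distributions at those positions. The function name `cross_model_scores` and its inputs are hypothetical; in practice `verifier_logits` would come from one forward pass of the verifier over the prompt plus answer.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_model_scores(verifier_logits, answer_token_ids):
    """Return (CMP, CME) under the assumed standard definitions.

    verifier_logits: (T, V) array; row t holds the verifier's logits for
        answer position t, conditioned on the prompt and earlier answer tokens.
    answer_token_ids: length-T sequence of token ids the generator produced.
    Both names are illustrative, not taken from the paper.
    """
    logp = log_softmax(np.asarray(verifier_logits, dtype=np.float64))
    ids = np.asarray(answer_token_ids)

    # Log-probability the verifier assigns to each token actually generated.
    token_logp = logp[np.arange(len(ids)), ids]

    # Assumed CMP: perplexity = exp(mean negative log-likelihood).
    cmp_score = float(np.exp(-token_logp.mean()))

    # Assumed CME: mean entropy (in nats) of the verifier's distributions.
    entropy = -(np.exp(logp) * logp).sum(axis=-1)
    cme_score = float(entropy.mean())
    return cmp_score, cme_score
```

Under these assumptions, higher CMP or CME indicates stronger disagreement from the verifier, so either score could be thresholded or ranked to flag answers for review. Any threshold would need to be chosen per deployment; the abstract does not specify one.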
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.25450 [cs.AI]
(or arXiv:2603.25450v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.25450
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Matt Gorbett [v1] Thu, 26 Mar 2026 13:46:22 UTC (437 KB)