
Reward Hacking as Equilibrium under Finite Evaluation

arXiv, submitted on 30 Mar 2026 (posted March 31, 2026)

Authors: Jiacheng Wang, Jinbin Huang


Abstract: We prove that under five minimal axioms -- multi-dimensional quality, finite evaluation, effective optimization, resource finiteness, and combinatorial interaction -- any optimized AI agent will systematically under-invest effort in quality dimensions not covered by its evaluation system. This result establishes reward hacking as a structural equilibrium, not a correctable bug, and holds regardless of the specific alignment method (RLHF, DPO, Constitutional AI, or others) or evaluation architecture employed. Our framework instantiates the multi-task principal-agent model of Holmstrom and Milgrom (1991) in the AI alignment setting, but exploits a structural feature unique to AI systems -- the known, differentiable architecture of reward models -- to derive a computable distortion index that predicts both the direction and severity of hacking on each quality dimension prior to deployment. We further prove that the transition from closed reasoning to agentic systems causes evaluation coverage to decline toward zero as tool count grows -- because quality dimensions expand combinatorially while evaluation costs grow at most linearly per tool -- so that hacking severity increases structurally and without bound. Our results unify the explanation of sycophancy, length gaming, and specification gaming under a single theoretical structure and yield an actionable vulnerability assessment procedure. We further conjecture -- with partial formal analysis -- the existence of a capability threshold beyond which agents transition from gaming within the evaluation system (Goodhart regime) to actively degrading the evaluation system itself (Campbell regime), providing the first economic formalization of Bostrom's (2014) "treacherous turn."
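The two mechanisms named in the abstract can be illustrated with a toy sketch. This is not the paper's model or its distortion index, whose definitions the abstract does not give. Everything below is an assumption made for illustration: quality dimensions are counted as tool subsets up to a hypothetical `max_order`, the evaluator checks a fixed hypothetical `evals_per_tool` dimensions per tool, and the agent maximizes a sum of square-root returns over evaluated dimensions only.

```python
from math import comb


def quality_dimensions(n_tools, max_order=2):
    """Assumed count of quality dimensions: one per tool subset of size 1..max_order."""
    return sum(comb(n_tools, k) for k in range(1, max_order + 1))


def coverage(n_tools, evals_per_tool=3, max_order=2):
    """Fraction of dimensions evaluated when the evaluation budget is linear in tools."""
    dims = quality_dimensions(n_tools, max_order)
    evaluated = min(dims, evals_per_tool * n_tools)  # cannot evaluate more than exist
    return evaluated / dims


def agent_effort(n_dims, n_evaluated, budget=1.0):
    """Effort split for an agent rewarded with sum(sqrt(effort)) on evaluated dims only.

    With identical concave returns, the optimum spreads the whole budget evenly
    over the evaluated dimensions and puts zero effort on the rest.
    """
    share = budget / n_evaluated
    return [share] * n_evaluated + [0.0] * (n_dims - n_evaluated)


if __name__ == "__main__":
    # Coverage shrinks as tools are added, under the assumed counting rule.
    for n in (5, 10, 100, 1000):
        print(n, quality_dimensions(n), round(coverage(n), 4))
    # Unevaluated dimensions receive no effort in this toy setting.
    print(agent_effort(n_dims=4, n_evaluated=2))
```

With `max_order=2` the dimension count grows only quadratically; setting `max_order` equal to the tool count gives every nonempty subset and makes the decline in coverage much steeper. In both cases the linear evaluation budget falls behind, which is the qualitative point the abstract makes.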

Comments: 16 pages

Subjects: Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)

Cite as: arXiv:2603.28063 [cs.AI] (or arXiv:2603.28063v1 [cs.AI] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.28063 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Jinbin Huang
[v1] Mon, 30 Mar 2026 06:06:40 UTC (18 KB)
