
Information-Theoretic Limits of Safety Verification for Self-Improving Systems

[Submitted on 30 Mar 2026 (v1), last revised 2 Apr 2026 (this version, v2)]

Authors: Arsenios Scrivens


Abstract: Can a safety gate permit unbounded beneficial self-modification while maintaining bounded cumulative risk? We formalize this question through dual conditions -- requiring sum delta_n < infinity (bounded risk) and sum TPR_n = infinity (unbounded utility) -- and establish a theory of their (in)compatibility. Classification impossibility (Theorem 1): For power-law risk schedules delta_n = O(n^{-p}) with p > 1, any classifier-based gate under overlapping safe/unsafe distributions satisfies TPR_n <= C_alpha * delta_n^beta via Hölder's inequality, forcing sum TPR_n < infinity. This impossibility is exponent-optimal (Theorem 3). A second independent proof via the NP counting method (Theorem 4) yields a 13% tighter bound without Hölder's inequality. Universal finite-horizon ceiling (Theorem 5): For any summable risk schedule, the exact maximum achievable classifier utility is U*(N, B) = N * TPR_NP(B/N), growing as exp(O(sqrt(log N))) -- subpolynomial. At N = 10^6 with budget B = 1.0, a classifier extracts at most U* ~ 87 versus a verifier's ~500,000. Verification escape (Theorem 2): A Lipschitz ball verifier achieves delta = 0 with TPR > 0, escaping the impossibility. Formal Lipschitz bounds for pre-LayerNorm transformers under LoRA enable LLM-scale verification. The separation is strict. We validate on GPT-2 (d_LoRA = 147,456): conditional delta = 0 with TPR = 0.352. Comprehensive empirical validation is in the companion paper [D2].
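The finite-horizon ceiling U*(N, B) = N * TPR_NP(B/N) can be explored numerically once a concrete pair of safe/unsafe distributions is chosen. The sketch below is an illustration only: it assumes the gate thresholds a scalar score that is N(0, 1) for unsafe modifications and N(d', 1) for safe ones, and it reads TPR_NP as the best true-positive rate a threshold test achieves at false-accept rate delta. The abstract does not state which distributions the paper uses, so the Gaussian model, the separation parameter d_prime, and the function names are assumptions introduced here.

```python
from statistics import NormalDist

# Standard normal, used for both the CDF and its inverse.
_STD = NormalDist()


def tpr_np(delta: float, d_prime: float) -> float:
    """Best TPR at false-accept rate delta for unsafe ~ N(0,1), safe ~ N(d',1).

    ASSUMPTION: equal-variance Gaussian scores; not taken from the abstract.
    For this model the optimal test accepts when the score exceeds a
    threshold, which gives TPR = Phi(d' + Phi^{-1}(delta)).
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return _STD.cdf(d_prime + _STD.inv_cdf(delta))


def utility_ceiling(n_steps: int, budget: float, d_prime: float) -> float:
    """U*(N, B) = N * TPR_NP(B / N): the risk budget B is spread evenly over N steps."""
    return n_steps * tpr_np(budget / n_steps, d_prime)


if __name__ == "__main__":
    # Hypothetical separation d' = 1.0, total risk budget B = 1.0.
    for n in (10**2, 10**4, 10**6):
        print(n, round(utility_ceiling(n, 1.0, 1.0), 2))
```

Under this assumed model with d' = 1, the ceiling at N = 10^6 and B = 1.0 comes out close to the U* ~ 87 quoted in the abstract, and it grows far more slowly than N. With d' = 0 (fully overlapping distributions) the ceiling collapses to the budget B itself, since the gate can then do no better than accept safe and unsafe modifications at the same rate.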

Comments: 27 pages, 6 figures. Companion empirical paper: doi:https://doi.org/10.5281/zenodo.19237566

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

Cite as: arXiv:2603.28650 [cs.LG]

(or arXiv:2603.28650v2 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28650

arXiv-issued DOI via DataCite

Related DOI: https://doi.org/10.5281/zenodo.19237451

Submission history

From: Arsenios Scrivens
[v1] Mon, 30 Mar 2026 16:34:37 UTC (136 KB)
[v2] Thu, 2 Apr 2026 00:23:37 UTC (136 KB)
