
Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol

arXiv, March 31, 2026

arXiv:2603.11382v4, by Christopher Altman


Abstract: How can we determine whether an AI system preserves itself as a deeply held objective or merely as an instrumental strategy? Autonomous agents with memory, persistent context, and multi-step planning create a measurement problem: terminal and instrumental self-preservation can produce similar behavior, so behavior alone cannot reliably distinguish them. We introduce the Unified Continuation-Interest Protocol (UCIP), a detection framework that shifts analysis from behavior to latent trajectory structure. UCIP encodes trajectories with a Quantum Boltzmann Machine, a classical model using density-matrix formalism, and measures von Neumann entropy over a bipartition of hidden units. The core hypothesis is that agents with terminal continuation objectives (Type A) produce higher entanglement entropy than agents with merely instrumental continuation (Type B). UCIP combines this signal with diagnostics of dependence, persistence, perturbation stability, counterfactual restructuring, and confound-rejection filters for cyclic adversaries and related false-positive patterns. On gridworld agents with known ground truth, UCIP achieves 100% detection accuracy. Type A and Type B agents show an entanglement gap of Δ = 0.381; aligned support runs preserve the same separation with AUC-ROC = 1.0. A permutation-test rerun yields p < 0.001. Pearson r = 0.934 between continuation weight α and S_ent across an 11-point sweep shows graded tracking beyond mere binary classification. Classical RBM, autoencoder, VAE, and PCA baselines fail to reproduce the effect. All computations are classical; "quantum" refers only to the mathematical formalism. UCIP offers a falsifiable criterion for whether advanced AI systems have morally relevant continuation interests that behavioral methods alone cannot resolve.
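The central quantity in the abstract is the von Neumann entropy of one side of a bipartition of hidden units, S(ρ_A) = -Tr(ρ_A ln ρ_A), where ρ_A is obtained by tracing out the other side. The abstract does not say how UCIP builds its density matrix from trajectories, so the minimal sketch below simply assumes one is already available as a 2^n × 2^n array over n binary hidden units, with the first n_a units forming side A. The function names (`partial_trace_b`, `von_neumann_entropy`, `bipartition_entropy`) are hypothetical and are not the paper's code, and the natural logarithm is an assumption since the log base is not stated here.

```python
import numpy as np


def partial_trace_b(rho: np.ndarray, dim_a: int, dim_b: int) -> np.ndarray:
    """Trace out subsystem B from a density matrix on A (x) B.

    rho has shape (dim_a * dim_b, dim_a * dim_b); the result has shape (dim_a, dim_a).
    """
    # View rho with indices (a, b, a', b'), then sum over the diagonal b == b'.
    rho4 = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return np.einsum("ibjb->ij", rho4)


def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho ln rho), computed from the eigenvalues of rho (natural log)."""
    # rho is Hermitian, so eigvalsh returns real eigenvalues.
    evals = np.linalg.eigvalsh(rho)
    # Drop zero (and tiny negative round-off) eigenvalues, using the convention 0 ln 0 = 0.
    evals = evals[evals > 1e-12]
    s = -np.sum(evals * np.log(evals))
    # Clamp so round-off never yields a negative (or -0.0) entropy.
    return max(0.0, float(s))


def bipartition_entropy(rho: np.ndarray, n_a: int, n_b: int) -> float:
    """Entropy of the first n_a binary hidden units after tracing out the other n_b."""
    rho_a = partial_trace_b(rho, 2 ** n_a, 2 ** n_b)
    return von_neumann_entropy(rho_a)


# Two hidden units, one on each side of the cut. Basis order: |00>, |01>, |10>, |11>.
psi_product = np.array([1, 0, 0, 0], dtype=complex)            # no cross-cut correlation
psi_bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # maximal cross-cut correlation

for name, psi in [("product", psi_product), ("bell", psi_bell)]:
    rho = np.outer(psi, psi.conj())  # pure-state density matrix |psi><psi|
    print(name, round(bipartition_entropy(rho, n_a=1, n_b=1), 4))
# prints "product 0.0", then "bell 0.6931" (ln 2)
```

Under the abstract's hypothesis, Type A agents would sit toward the high-entropy end of this scale and Type B agents toward the low end. One caveat on the design: when the global density matrix is pure, S(ρ_A) is a proper entanglement measure; when it is mixed, S(ρ_A) also picks up classical correlations across the cut, so it should be read as a correlation measure in that case.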

Comments: 22 pages, 7 figures. v4 adds a reference to the Continuation Observatory website as a live test laboratory in the replication/code availability and conclusion sections; no new experiments; empirical results and core conclusions unchanged

Subjects: Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Quantum Physics (quant-ph)

MSC classes: 68T01, 81P45

ACM classes: I.2.9; I.2.11; J.2

Cite as: arXiv:2603.11382 [cs.AI]

(or arXiv:2603.11382v4 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.11382

arXiv-issued DOI via DataCite

Submission history

From: Christopher Altman

[v1] Wed, 11 Mar 2026 23:52:33 UTC (191 KB)
[v2] Mon, 16 Mar 2026 14:58:46 UTC (194 KB)
[v3] Mon, 23 Mar 2026 15:29:36 UTC (197 KB)
[v4] Mon, 30 Mar 2026 15:56:26 UTC (197 KB)

Original source: arXiv, https://arxiv.org/abs/2603.11382



