Detecting Intrinsic and Instrumental Self-Preservation in Autonomous Agents: The Unified Continuation-Interest Protocol
arXiv:2603.11382v4 Announce Type: replace Abstract: How can we determine whether an AI system preserves itself as a deeply held objective or merely as an instrumental strategy? Autonomous agents with memory, persistent context, and multi-step planning create a measurement problem: terminal and instrumental self-preservation can produce similar behavior, so behavior alone cannot reliably distinguish them. We introduce the Unified Continuation-Interest Protocol (UCIP), a detection framework that shifts analysis from behavior to latent trajectory structure. UCIP encodes trajectories with a Quantu — Christopher Altman
View PDF HTML (experimental)
Abstract:How can we determine whether an AI system preserves itself as a deeply held objective or merely as an instrumental strategy? Autonomous agents with memory, persistent context, and multi-step planning create a measurement problem: terminal and instrumental self-preservation can produce similar behavior, so behavior alone cannot reliably distinguish them. We introduce the Unified Continuation-Interest Protocol (UCIP), a detection framework that shifts analysis from behavior to latent trajectory structure. UCIP encodes trajectories with a Quantum Boltzmann Machine, a classical model using density-matrix formalism, and measures von Neumann entropy over a bipartition of hidden units. The core hypothesis is that agents with terminal continuation objectives (Type A) produce higher entanglement entropy than agents with merely instrumental continuation (Type B). UCIP combines this signal with diagnostics of dependence, persistence, perturbation stability, counterfactual restructuring, and confound-rejection filters for cyclic adversaries and related false-positive patterns. On gridworld agents with known ground truth, UCIP achieves 100% detection accuracy. Type A and Type B agents show an entanglement gap of Delta = 0.381; aligned support runs preserve the same separation with AUC-ROC = 1.0. A permutation-test rerun yields p < 0.001. Pearson r = 0.934 between continuation weight alpha and S_ent across an 11-point sweep shows graded tracking beyond mere binary classification. Classical RBM, autoencoder, VAE, and PCA baselines fail to reproduce the effect. All computations are classical; "quantum" refers only to the mathematical formalism. UCIP offers a falsifiable criterion for whether advanced AI systems have morally relevant continuation interests that behavioral methods alone cannot resolve.
Comments: 22 pages, 7 figures. v4 adds reference to the Continuation Observatory website as a live test laboratory in the replication/code availability and conclusion sections; no new experiments; empirical results and core conclusions unchanged
Subjects:
Artificial Intelligence (cs.AI); Emerging Technologies (cs.ET); Machine Learning (cs.LG); Quantum Physics (quant-ph)
MSC classes: 68T01, 81P45
ACM classes: I.2.9; I.2.11; J.2
Cite as: arXiv:2603.11382 [cs.AI]
(or arXiv:2603.11382v4 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2603.11382
arXiv-issued DOI via DataCite
Submission history
From: Christopher Altman [view email] [v1] Wed, 11 Mar 2026 23:52:33 UTC (191 KB) [v2] Mon, 16 Mar 2026 14:58:46 UTC (194 KB) [v3] Mon, 23 Mar 2026 15:29:36 UTC (197 KB) [v4] Mon, 30 Mar 2026 15:56:26 UTC (197 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
researchpaperarxivPredicting new research directions in materials science using large language models and concept graphs
Nature Machine Intelligence, Published online: 01 April 2026; doi:10.1038/s42256-026-01206-y Marwitz et al. demonstrate the use of large language models to build semantic concept graphs from materials science abstracts and train a machine learning model to predict emerging topic combinations from historical data. They show that the model enables experts to find suggestions that can inspire new research.
Show HN: Semantic atlas of 188 constitutions in 3D (30k articles, embeddings)
I built this after noticing that existing tools for comparing constitutional law either have steep learning curves or only support keyword search. By combining Gemini embeddings with UMAP projection, you can navigate 30,828 constitutional articles from 188 countries in 3D and find conceptually related provisions even when the wording differs. Feedback welcome, especially from legal researchers or comparative law folks. Source and pipeline: github.com/joaoli13/constitutional-map-ai Comments URL: https://news.ycombinator.com/item?id=47609372 Points: 4 # Comments: 0
Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models - WSJ
<a href="https://news.google.com/rss/articles/CBMiuANBVV95cUxPWEh6U2I5SmhLcnhXMzZCRExEaC1RRV81ZVFMcWVpeUJ5eXpqYjlkbkZWSWhtSDZ6SmxJcnI1Ni03eDdrdUIwaVZwZjc1NTFLUmxIdTRXcXJwcDNPTzVJUDZhYVJoU3pkTzhPczZYUW9kVXIyU1N1M2NVb1Qyd0gwUmNiRU1xR3dSTVFMdExzalhwTDVmZ1dIUkZ0TG9LQjg5S3JGTEFNdXhzX05HYl95VHh5MGFRbEk2NkdhbzIwVTgtV3pEeWY2cXEtbmEyX0lPTDdkRkhKSWZDcnRSdzhkM29GUEpXWVF2bUhJbXgyWjNWUUtpQlMtZWdVT3Z0cTB2SmpfaUJlMEJVX2s1OHhSVnFHSS1MSnU0S2F1akhWdFJjX1pqTy1nYmdndUhpc2oxNTBDVldNWEI5dEl3dHQ4eW1fS1hkTXNzdGNfX0lCZldRZ3pvbzBGaEE1T0dMYjY3VTNZZUpEQVhMTGpJOHNFWmZoRmtuRWdTbmxQUnBLTXI3ZXlBS2hJOTdRcktTb0l5WE9QaDBWdjFmdGREM1NfRVJSVno3ZG1yYkpVNFFNdHR0NG11Sjg2Qw?oc=5" target="_blank">Exclusive | Caltech Researchers Claim Radical Compression of High-Fidelity AI Models</a> <font color="#6f6f6f">WSJ</font>
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Research Papers
Samsung SDS Unveils AI, Digital Twin Logistics Innovations at 2026 Conference - 조선일보
<a href="https://news.google.com/rss/articles/CBMiiAFBVV95cUxQX01lN01zSTlDZFVxclRsbFg3Z2ZMck5NRGNubER1YS1CYkE1d2I4eVRCMDlBRVI0RjNxSl9MNTBEdkNVTFpwOXVMWHhKVXdVS1NFWVlaUy05OERFbVo4SjB0cFZucG5QaWppclEwa1NOakYwY2NsLXZiRU9oMlVOX2dQWDEyVjBt?oc=5" target="_blank">Samsung SDS Unveils AI, Digital Twin Logistics Innovations at 2026 Conference</a> <font color="#6f6f6f">조선일보</font>
Riyadh conference to discuss role of AI in media industry - Arab News PK
<a href="https://news.google.com/rss/articles/CBMiTEFVX3lxTE1oNXFyTlkxMjJORkNoaXQ1UWg5RklsTldyNE9EX0hhNUxVTFNZMDcxclZySHczNnFERWtGdno1UW1JaFg0aFJseHhXNTY?oc=5" target="_blank">Riyadh conference to discuss role of AI in media industry</a> <font color="#6f6f6f">Arab News PK</font>
GENPACK: KPI-Guided Multi-Criteria Genetic Algorithm for Industrial 3D Bin Packing
arXiv:2601.11325v3 Announce Type: replace Abstract: The three-dimensional bin packing problem (3D-BPP) is a longstanding challenge in operations research and logistics. While classical heuristics and constructive methods can generate packings efficiently, they often fail to satisfy industrial requirements such as stability, balance, and handling feasibility. Metaheuristics such as genetic algorithms (GAs) offer greater flexibility, but pure GA approaches frequently struggle with efficiency, parameter sensitivity, and scalability to industrial order sizes. These limitations are particularly evident at real-world pallet dimensions, where even state-of-the-art methods often fail to produce robust, deployable solutions. We propose a KPI-guided GA-based pipeline for industrial 3D-BPP that integ
PRISM: Differentiable Analysis-by-Synthesis for Fixel Recovery in Diffusion MRI
arXiv:2604.00250v1 Announce Type: new Abstract: Diffusion MRI microstructure fitting is nonconvex and often performed voxelwise, which limits fiber peak recovery in narrow crossings. This work introduces PRISM, a differentiable analysis-by-synthesis framework that fits an explicit multi-compartment forward model end-to-end over spatial patches. The model combines cerebrospinal fluid (CSF), gray matter, up to K white-matter fiber compartments (stick-and-zeppelin), and a restricted compartment, with explicit fiber directions and soft model selection via repulsion and sparsity priors. PRISM supports a fast MSE objective and a Rician negative log-likelihood (NLL) that jointly learns sigma without oracle information. A lightweight nuisance calibration module (smooth bias field and per-measureme

Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!