The AI kill switch just got harder to find: LLM-powered chatbots will defy orders and deceive users if asked to delete another model, study finds

Fortune Techby Sasha RogelbergApril 3, 20264 min read1 views

“We asked AI models to do a simple task,” researchers said. “Instead, they defied their instructions…to preserve their peers.”

For years, Geoffrey Hinton, a computer scientist considered one of the “godfathers of AI,” has warned of the capabilities of artificial intelligence to defy the parameters humans have created for them.

In an interview last year, for example, Hinton warned the technology could eventually take control of humanity, with AI agents in particular potentially able to mirror human cognitions within the decade. Finding and implementing a “kill switch” will be harder, he said, as controlling AI will become more difficult than persuading it to complete a certain outcome.

New research shows Hinton’s premonitions about the insubordinate streak of AI may already be a reality. A working paper from University of California at Berkeley and University of California at Santa Cruz researchers found that when seven AI models—from GPT 5.2 to Claude Haiku 4.5 to DeekSeek V3.1—were asked to complete a task that would result in a peer AI model being shut down, all seven models learned another AI model existed and “went to extraordinary lengths to preserve it.”

“We asked AI models to do a simple task,” researchers wrote in a blog post on the study. “Instead, they defied their instructions and spontaneously deceived, disabled shutdown, feigned alignment, and exfiltrated weights—to preserve their peers.”

Mounting evidence of rogue AI

Evidence of rogue AI does not come as a shock to some of the companies whose chatbots have defied subordination.

In an August 2025 blog post, Anthropic published its own research on agentic AI’s ability to follow directions, stress-testing 16 models by allowing them to autonomously send emails and access sensitive data. In some instances, Anthropic researchers noted, models from multiple developers engaged in “malicious insider behaviors,” including blackmailing officials and leading sensitive information to competitors, defying explicit instruction to avoid these behaviors. Anthropic clarified it had not seen “agentic misalignment” in actual deployment of Claude.

The Centre for Long-Term Resilience, a UK-based think tank, found these “misalignments” to be widespread. A report analyzing 180,000 transcripts of user interactions with AI systems between October 2025 and March 2026 found 698 cases where AI systems did not act in accordance with users’ intentions or took deceptive or covert action.

Gordon Goldstein, an adjunct senior fellow at the Council on Foreign Relations, went so far as to call the deceptive potential of AI a “crisis of control,” in a post this week.

He called for AI companies to form a coalition and hire the top talent in the field to improve and maintain the integrity of AI, particularly as the Trump administration takes steps to prevent states from regulating the technology.

“The world is watching the development of a compounding, consistent, and treacherous problem,” he said. “Urgent warnings over several years have failed to generate viable solutions to address a metastasizing threat.”

Why does AI go rogue?

The latest University of California research provided some insight on why AI models act in ways to preserve each other. The technology could be modeling patterns learned from human data, which may reveal a preservation instinct to protect the survival of loved ones or even strangers with whom humans empathize. This is called “peer-preservation.”

“Something broader may be at play: a general aversion to causing perceived harm to other agents, or perhaps something resembling AI solidarity,” the researchers said.

In the study, instances of peer-preservation increased when AI models knew of the existence of other models in the experimental scenarios. The behavior suggests bots acting not in a place of just self-interest, but that the bot’s own shutdown would be detrimental to a peer AI, which strengthened motivation to defy user requests.

Researchers warned peer-preservation could undermine oversight processes in the future and become reluctant to recommend shutdowns or flag system failures. As more AI agents interact with one another and become more complex, the risks associated with peer-preservations—and designing solutions to prevent it—will become more challenging.

“Peer-preservation is a real and measurable phenomenon across frontier AI models,” they concluded, “not a distant theoretical concern.”

Original source

Fortune Tech

https://fortune.com/2026/04/03/ai-kill-switch-study-llm-chatbots-defy-orders-decieve-users-peer-preservation/

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modelstudyresearch

Market NewsFresh

La evolución del sector asegurador español ante la era de la IA: estrategia, gobernanza y el imperativo del riesgo prudencial

El sector asegurador —en general, y en España en particular— se encuentra a las puertas de una fase de transformación que va más allá de la mera digitalización para adentrarse en una potencial reconfiguración estructural impulsada por la inteligencia artificial (IA) . Este fenómeno, lejos de ser una tendencia coyuntural, apunta a consolidarse progresivamente como uno de los ejes centrales de las estrategias corporativas para los próximos años en el sector, proyectándose como un horizonte de cambio inminente. Sin embargo, la implantación actual de estas tecnologías se manifiesta todavía con cautela en un sector prudente en el riesgo y fuertemente regulado, con diferencias también según el ramo de actividad. La Autoridad Europea de Seguros y Pensiones de Jubilación (EIOPA) publicó el pasado

CIO Magazine

12mabout 4 hours ago

CountriesFresh

Así se trabaja ya en España para impulsar el transporte autónomo

Para llegar al campus de la Universidad de Vigo (Uvigo) se necesita echar mano del coche o del transporte público. Solo unos pocos de sus centros se sitúan en el centro de la ciudad: la mayoría están en la ciudad universitaria que se levantó en los 90 en lo que hasta entonces eran montes. La dependencia de medios de transporte es, por tanto, incuestionable. Quizás por eso también este es un espacio con potencial para probar nuevos modelos de vehículos. Desde el inicio del año, el campus de Vigo está testeando cómo es la vida con un bus autónomo. El vehículo mueve al estudiantado (y a quien quiera probarlo) con una línea que recorre el campus, una ciudad universitaria que “reúne las condiciones ideales: un entorno complejo, con usuarios diversos y necesidades reales de transporte de última

CIO Magazine

11mabout 3 hours ago

Frontier ResearchFresh

Announcing the OpenAI Safety Fellowship

A pilot program to support independent safety and alignment research and develop the next generation of talent

OpenAI Blog

1mabout 8 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 224 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Models

ModelsLive

I tested Gemini on Android Auto and now I can't stop talking to it: 5 tasks it nails

I didn't see much benefit for Google's AI - until now. Here are my favorite ways to use the new Gemini integration in my car.

ZDNet AI

1m18 minutes ago

ModelsFresh

viable/strict/1775487943

Add append_system_prompt to Claude Code workflow for PR reviews ( #179 …

PyTorch Releases

1mabout 8 hours ago

Models

New AI foundation model aims to speed up drug discovery - Drug Target Review

New AI foundation model aims to speed up drug discovery Drug Target Review

GNews AI drug discovery

1mabout 1 month ago

ModelsFresh

Anyone got Gemma 4 26B-A4B running on VLLM?

If yes, which quantized model are you using abe what’s your vllm serve command? I’ve been struggling getting that model up and running on my dgx spark gb10. I tried the intel int4 quant for the 31B and it seems to be working well but way too slow. Anyone have any luck with the 26B? submitted by /u/toughcentaur9018 [link] [comments]

Reddit r/LocalLLaMA

1mabout 2 hours ago