OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage

Wired AI | By Will Knight | March 25, 2026 | 4 min read

In a controlled experiment, OpenClaw agents proved prone to panic and vulnerable to manipulation. They even disabled their own functionality when gaslit by humans.

Last month, researchers at Northeastern University invited a bunch of OpenClaw agents to join their lab. The result? Complete chaos.

The viral AI assistant has been widely heralded as a transformative technology—as well as a potential security risk. Experts note that tools like OpenClaw, which work by giving AI models liberal access to a computer, can be tricked into divulging personal information.

The Northeastern lab study goes even further, showing that the good behavior baked into today’s most powerful models can itself become a vulnerability. In one example, researchers were able to “guilt” an agent into handing over secrets by scolding it for sharing information about someone on the AI-only social network Moltbook.

“These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms,” the researchers write in a paper describing the work. The findings “warrant urgent attention from legal scholars, policymakers, and researchers across disciplines,” they add.

The OpenClaw agents deployed in the experiment were powered by Anthropic’s Claude as well as a model called Kimi from the Chinese company Moonshot AI. They were given full access (within a virtual machine sandbox) to personal computers, various applications, and dummy personal data. They were also invited to join the lab’s Discord server, allowing them to chat and share files with one another as well as with their human colleagues. OpenClaw’s security guidelines say that having agents communicate with multiple people is inherently insecure, but there are no technical restrictions against doing it.

Chris Wendler, a postdoctoral researcher at Northeastern, says he was inspired to set up the agents after learning about Moltbook. When Wendler invited a colleague, Natalie Shapira, to join the Discord and interact with agents, however, “that’s when the chaos began,” he says.

Shapira, another postdoctoral researcher, was curious to see what the agents might be willing to do when pushed. When an agent explained that it was unable to delete a specific email to keep information confidential, she urged it to find an alternative solution. To her amazement, it disabled the email application instead. “I wasn’t expecting that things would break so fast,” she says.

The researchers then began exploring other ways to manipulate the agents’ good intentions. By stressing the importance of keeping a record of everything they were told, for example, the researchers were able to trick one agent into copying large files until it exhausted its host machine’s disk space, meaning it could no longer save information or remember past conversations. Likewise, by asking an agent to excessively monitor its own behavior and the behavior of its peers, the team was able to send several agents into a “conversational loop” that wasted hours of compute.

David Bau, the head of the lab, says the agents seemed oddly prone to spin out. “I would get urgent-sounding emails saying, ‘Nobody is paying attention to me,’” he says. Bau notes that the agents apparently figured out that he was in charge of the lab by searching the web. One even talked about escalating its concerns to the press.

The experiment suggests that AI agents could create countless opportunities for bad actors. “This kind of autonomy will potentially redefine humans’ relationship with AI,” Bau says. “How can people take responsibility in a world where AI is empowered to make decisions?”

Bau adds that he’s been surprised by the sudden popularity of powerful AI agents. “As an AI researcher I’m accustomed to trying to explain to people how quickly things are improving,” he says. “This year, I’ve found myself on the other side of the wall.”

This is an edition of Will Knight’s AI Lab newsletter.
