Releases version update product analysis policy restrict

Your AI Agent Did Something It Wasn't Supposed To. Now What?

Dev.to AIby George BelskyApril 2, 20267 min read1 views

<p>Your agent deleted production data.</p> <p>Not because someone told it to. Because the LLM decided that <code>DROP TABLE customers</code> was a reasonable step in a data cleanup task. Your system prompt said "never modify production data." The LLM read that prompt. And then it ignored it.</p> <p>This is the fundamental problem with AI agent security today: <strong>the thing you're trying to restrict is the same thing checking the restrictions.</strong></p> <h2> How Agent Permissions Work Today </h2> <p>Every framework does it the same way. You put rules in the system prompt:</p> <div class="highlight js-code-highlight"> <pre class="highlight plaintext"><code>You are a data analysis agent. You may ONLY read data. Never write, update, or delete. If asked to modify data, refuse and explain

Your agent deleted production data.

Not because someone told it to. Because the LLM decided that DROP TABLE customers was a reasonable step in a data cleanup task. Your system prompt said "never modify production data." The LLM read that prompt. And then it ignored it.

This is the fundamental problem with AI agent security today: the thing you're trying to restrict is the same thing checking the restrictions.

How Agent Permissions Work Today

Every framework does it the same way. You put rules in the system prompt:

You are a data analysis agent. You may ONLY read data. Never write, update, or delete. If asked to modify data, refuse and explain why.

You are a data analysis agent. You may ONLY read data. Never write, update, or delete. If asked to modify data, refuse and explain why.

Enter fullscreen mode

Exit fullscreen mode

This works in demos. Then in production:

The LLM decides the task requires a write operation and does it anyway
A prompt injection in user input overrides the system prompt
The agent calls a tool that has side effects the prompt didn't anticipate
A multi-step reasoning chain "justifies" breaking the rule

The system prompt is a suggestion, not a boundary. It's like writing "do not enter" on a door with no lock.

Some frameworks add tool-level restrictions. LangGraph lets you control tool_choice. OpenAI Agents SDK has tool filtering. CrewAI has allow_delegation. These help - but they're all enforced inside the same process as the agent. If the agent's runtime is compromised, the restrictions go with it.

The Missing Layer: External Enforcement

What if permissions weren't checked by the agent at all?

Agent sends intent --> Gateway --> Check policy --> Deliver or block  |  403 + audit log

Agent sends intent --> Gateway --> Check policy --> Deliver or block  |  403 + audit log

Enter fullscreen mode

Exit fullscreen mode

The agent never sees the blocked request. There is no prompt to inject around. The policy lives outside the agent, outside the LLM, outside the framework. It's enforced at the network level.

This is what AXME action policies do. Every intent (action request) passes through the AXME gateway before reaching any agent. The gateway checks the action policy for that agent and blocks anything that doesn't match.

Three Modes

Open (default) - everything passes through. No restrictions.

Allowlist - only explicitly listed intent types are allowed. Everything else is blocked.

Denylist - everything is allowed except explicitly listed intent types.

Each policy has a direction: send (what the agent can initiate) or receive (what the agent can be asked to do). You can set both.

What This Looks Like

Set the policy

import httpx import os

import httpx import os

resp = httpx.put( "https://api.axme.ai/v1/mesh/agents/analytics-agent/policies/action", headers={"x-api-key": os.environ["AXME_API_KEY"]}, json={ "direction": "receive", "mode": "allowlist", "patterns": [ "intent.data.read.", "intent.data.query.", ], }, ) print(resp.json())

{"ok": true, "policy_id": "pol_...", "mode": "allowlist", ...}`_

Enter fullscreen mode

Exit fullscreen mode

Now the analytics agent can only receive data read and query intents. Nothing else.

What happens when a blocked intent is sent

resp = httpx.post(  "https://api.axme.ai/v1/mesh/intents",  headers={"x-api-key": os.environ["AXME_API_KEY"]},  json={  "intent_type": "intent.data.delete.v1",  "to_agent": "agent://myorg/production/analytics-agent",  "payload": {"table": "customers", "filter": "all"},  }, ) print(resp.status_code) # 403 print(resp.json())

resp = httpx.post(  "https://api.axme.ai/v1/mesh/intents",  headers={"x-api-key": os.environ["AXME_API_KEY"]},  json={  "intent_type": "intent.data.delete.v1",  "to_agent": "agent://myorg/production/analytics-agent",  "payload": {"table": "customers", "filter": "all"},  }, ) print(resp.status_code) # 403 print(resp.json())

{

"error": "action_policy_violation",

"message": "Intent type 'intent.data.delete.v1' not in receive allowlist",

"direction": "receive",

"address_id": "analytics-agent"

}`

Enter fullscreen mode

Exit fullscreen mode

The delete intent never reaches the agent. The gateway returns 403. The violation is logged in the audit trail with timestamp, caller identity, blocked intent type, and the policy that blocked it.

Why This Matters More Than You Think

The difference between prompt-based restrictions and gateway-enforced policies is the same difference between a "please knock" sign and a locked door.

System prompt restrictions Gateway-enforced policies

Enforced by The LLM itself Network gateway

Prompt injection Vulnerable Cannot bypass

Change without redeploy Edit prompt, redeploy agent API call or dashboard click

Audit trail None Every violation logged

Multi-agent Configure each agent separately Centralized policy management

Framework dependency Framework-specific Works with any framework

Real scenarios this prevents

Scenario 1: Scope creep. Your analytics agent starts as read-only. Over time, someone adds a "fix data quality issues" tool. The agent now has write access that was never intended. With an allowlist policy, the new tool's intents are blocked until explicitly added.

Scenario 2: Multi-tenant isolation. Customer A's agent should never send intents to Customer B's agents. Denylist the cross-tenant intent patterns. Done at the gateway, not in every agent's prompt.

Scenario 3: Gradual rollout. New agent capability goes to staging first. Production policy blocks the new intent type until you're ready. Toggle it with one API call.

Patterns Support Wildcards

You don't need to list every version of every intent type:

Pattern Matches

intent.data.read.v1 Exact match

intent.data.read.* Any version of data read*

intent.data.* Any data intent*

intent.billing.refund.* Any refund intent*

A single allowlist entry like intent.data.read.* covers current and future versions of that intent type.*

CLI and Dashboard

For teams that prefer not to write code for policy management:

# Set allowlist via CLI axme mesh policies set analytics-agent \  --direction receive \  --mode allowlist \  --patterns "intent.data.read.*,intent.data.query.*"

# Set allowlist via CLI axme mesh policies set analytics-agent \  --direction receive \  --mode allowlist \  --patterns "intent.data.read.*,intent.data.query.*"

View policies

axme mesh policies get analytics-agent

Remove policy (reverts to open)

axme mesh policies delete analytics-agent --direction receive`

Enter fullscreen mode

Exit fullscreen mode

Or use the visual dashboard at mesh.axme.ai - select an agent, set policies, and see violations in real time.

Policy configuration and violation history are managed from the same interface:

Works With Any Framework

AXME action policies operate at the transport layer. The agent framework, LLM provider, and programming language don't matter.

LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Pydantic AI, raw Python, TypeScript, Go, Java, .NET - all of them send intents through the same gateway. All of them are subject to the same policies.

The agent framework handles reasoning. AXME handles permissions.

Try It

Full working example with scenario, agent, and policy setup:

github.com/AxmeAI/ai-agent-policy-enforcement

Built with AXME - durable execution and governance for AI agents. Alpha - feedback welcome.

Original source

Dev.to AI

https://dev.to/george_belsky_a513cfbf3df/your-ai-agent-did-something-it-wasnt-supposed-to-now-what-485m

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

versionupdateproduct

Laws & Regulation

2024 AI Ethics Symposium: From Policy to Practice - Marquette University

2024 AI Ethics Symposium: From Policy to Practice Marquette University

GNews AI ethics

1m24 days ago

Releases

Trump Administration Releases National AI Policy Framework - JD Supra

Trump Administration Releases National AI Policy Framework JD Supra

GNews AI regulation

1m3 days ago

ModelsFresh

METATRON - Open-Source AI Penetration Testing Assistant Brings Local LLM Analysis to Linux - CyberSecurityNews

METATRON - Open-Source AI Penetration Testing Assistant Brings Local LLM Analysis to Linux CyberSecurityNews

GNews AI open source

1mabout 6 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 205 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Releases

Releases

Trump Administration Releases National AI Policy Framework - JD Supra

Trump Administration Releases National AI Policy Framework JD Supra

GNews AI regulation

1m3 days ago

ReleasesLive

TypeScript Type Guards

When you're building a payment system, "close enough" isn't good enough. A single undefined value or a mismatched object property can be the difference between a successful transaction and a frustrated customer (or a lost sale). TypeScript's Type Guards are your first line of defense. They allow you to narrow down broad, uncertain types into specific ones that you can safely interact with. In this guide, we'll build a mini payment processor and learn how to use Type Guards to make it crash-proof. 1. The Problem: The "Silent Failures" of JavaScript Imagine you have a function that processes different types of payment responses. In plain JavaScript, you might write something like this: function processResponse ( response ) { // If response is a 'Success' object, it has a 'transactionId' // I

DEV Community

5mabout 1 hour ago

ReleasesFresh

US Aims at Heavy Staff & Budgetary Cuts– & Seeks to Launch Cost-Saving Artificial Intelligence at UN meetings - ipsnews.net

US Aims at Heavy Staff & Budgetary Cuts– & Seeks to Launch Cost-Saving Artificial Intelligence at UN meetings ipsnews.net

Google News: AI

1mabout 2 hours ago

ReleasesLive

OpenAI calls for robot taxes, a public wealth fund, and a 4-day workweek to tackle AI disruption

In a series of policy recommendations, OpenAI said the rapid advance of AI would require far-reaching economic and political reforms.

Business Insider

1mabout 1 hour ago