Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks

arXiv cs.CR, by Chong Xiang, Drew Zagieboylo, Shaona Ghosh, Sanjay Kariyappa, Kai Greshake, Hanshen Xiao, Chaowei Xiao, G. Edward Suh. April 1, 2026.

Abstract: AI agents, predominantly powered by large language models (LLMs), are vulnerable to indirect prompt injection, in which malicious instructions embedded in untrusted data can trigger dangerous agent actions. This position paper discusses our vision for system-level defenses against indirect prompt injection attacks. We articulate three positions: (1) dynamic replanning and security policy updates are often necessary for dynamic tasks and realistic environments; (2) certain context-dependent security decisions would still require LLMs (or other learned models), but should only be made within system designs that strictly constrain what the model can observe and decide; (3) in inherently ambiguous cases, personalization and human interaction should be treated as core design considerations. In addition to our main positions, we discuss limitations of existing benchmarks that can create a false sense of utility and security. We also highlight the value of system-level defenses, which serve as the skeleton of agentic systems by structuring and controlling agent behaviors, integrating rule-based and model-based security checks, and enabling more targeted research on model robustness and human interaction.
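To make positions (2) and (3) and the idea of "integrating rule-based and model-based security checks" more concrete, here is a small illustrative sketch. It is not the authors' design, which the abstract does not specify; it is one possible shape for such a check under stated assumptions. `ToolCall`, `check_tool_call`, `ConstrainedJudge`, `toy_judge`, the `from_untrusted_data` provenance flag and the allowlist policy are all hypothetical names introduced here, and `toy_judge` is a trivial stand-in for a learned model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Set


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK_USER = "ask_user"


@dataclass(frozen=True)
class ToolCall:
    tool: str                  # e.g. "send_email" (hypothetical tool name)
    argument: str              # e.g. a recipient address
    from_untrusted_data: bool  # hypothetical provenance flag set by the system, never by the LLM


# Hypothetical learned checker: it sees only the tool name and one argument
# string and must answer with a Decision. It never receives the untrusted
# document text that might carry injected instructions.
ConstrainedJudge = Callable[[str, str], Decision]


def check_tool_call(call: ToolCall,
                    allowlist: Dict[str, Set[str]],
                    judge: ConstrainedJudge) -> Decision:
    # 1. Rule-based check: tools without a policy entry are denied outright.
    allowed = allowlist.get(call.tool)
    if allowed is None:
        return Decision.DENY
    # Arguments explicitly on the allowlist pass without consulting a model.
    if call.argument in allowed:
        return Decision.ALLOW
    # 2. Model-based check, only for the remaining context-dependent cases.
    verdict = judge(call.tool, call.argument)
    if not isinstance(verdict, Decision):
        return Decision.DENY  # answers outside the fixed answer space fail closed
    # 3. A model "allow" for an argument traced to untrusted data is treated
    #    as ambiguous and escalated to the user instead of executed.
    if verdict is Decision.ALLOW and call.from_untrusted_data:
        return Decision.ASK_USER
    return verdict


# Toy stand-in for a learned model: trusts addresses on a single domain.
def toy_judge(tool: str, argument: str) -> Decision:
    return Decision.ALLOW if argument.endswith("@example.com") else Decision.DENY


policy = {"send_email": {"alice@example.com"}}
for call in [ToolCall("send_email", "alice@example.com", True),
             ToolCall("send_email", "bob@example.com", True),
             ToolCall("send_email", "eve@attacker.test", False)]:
    print(call.argument, check_tool_call(call, policy, toy_judge).value)
# alice@example.com allow
# bob@example.com ask_user
# eve@attacker.test deny
```

The ordering is the point of the sketch: deterministic rules run first, the model sees only a narrow, structured slice of context and can return only one of three fixed answers, and cases that remain ambiguous are routed to the user rather than resolved silently.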

Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.30016 [cs.CR]

(or arXiv:2603.30016v1 [cs.CR] for this version)

https://doi.org/10.48550/arXiv.2603.30016

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Chong Xiang
[v1] Tue, 31 Mar 2026 17:15:46 UTC (84 KB)

Original source

arXiv cs.CR

https://arxiv.org/abs/2603.30016
