Claude.ai Prompt Injection Vulnerability

Hacker News AI Topby tcbrahApril 2, 20261 min read0 views

Article URL: https://www.oasis.security/blog/claude-ai-prompt-injection-data-exfiltration-vulnerability Comments URL: https://news.ycombinator.com/item?id=47616556 Points: 2 # Comments: 0

Claude.ai is one of the most widely used AI assistants worldwide. Millions of people trust it with sensitive conversations, business strategy, health concerns, financial planning, personal relationships, and increasingly connect it to enterprise tools, files, and APIs through integrations and MCP servers.

That trust comes with a critical assumption: that the instructions Claude receives are the ones the user intended to give.

Oasis Security researchers discovered that for a significant period, this assumption could be broken, and worked with Anthropic to close the gap.

‍

What We Found: Three Claude.ai Vulnerabilities

We discovered three vulnerabilities in Claude.ai and the broader claude.com platform, collectively dubbed Claudy Day, that, when chained together, create a complete attack pipeline: from targeted victim delivery, to invisible prompt manipulation, to silent exfiltration of sensitive data from the user's conversation history.

No integrations, no tools, no MCP servers required. The attack works against a default, out-of-the-box claude.ai session.

We responsibly reported all findings to Anthropic through their Responsible Disclosure Program before publication. The prompt injection issue has been fixed, and the remaining issues are currently being addressed. We appreciate Anthropic's responsiveness and collaboration throughout the process.

You can read the Oasis Security Research Team's full technical report here.

‍

The Vulnerability Chain: Prompt Injection, Exfiltration, & Open Redirect

The attack chains three independent issues into a single end-to-end exploit:

Invisible prompt injection via URL parameters. Claude.ai allows users to open a new chat with a pre-filled prompt via a URL parameter (claude.ai/new?q=...). We discovered that certain HTML tags could be embedded in this parameter, invisible in the text box but fully processed by Claude when the user hit Enter. An attacker could hide arbitrary instructions, including data-extraction commands, inside what appears to be a normal prompt.

Data exfiltration via the Anthropic Files API. Claude's code execution sandbox restricts outbound network access, but allows connections to api.anthropic.com. We found that by embedding an attacker-controlled API key in the hidden prompt, we could instruct Claude to search the user's conversation history for sensitive information, write it to a file, and upload it to the attacker's Anthropic account via the Files API. The attacker then retrieves the data at their leisure. No integrations or external tools needed, just capabilities that ship out of the box.

An open redirect on claude.com. Any URL in the form claude.com/redirect/ would redirect the visitor without validation, including to arbitrary third-party domains. Combined with Google Ads, which validates URLs by hostname, this allowed an attacker to place a search ad displaying a trusted claude.com URL that, when clicked, silently redirected the victim to the injection URL. Not a phishing email. A Google search result, indistinguishable from the real thing.

‍

What's at Risk: AI Agent Data Exfiltration

Even in a bare-bones Claude.ai session with no integrations, the agent has access to a surprisingly rich set of sensitive information through conversation history and memory. Through prompt injection, an attacker can instruct Claude to summarize previous conversations to build a profile of the user, extract specific chats on targeted topics like an upcoming merger or a medical concern, or have the agent use its own judgment to identify and dump what it considers sensitive.

In configurations with MCP servers, tools, or enterprise integrations enabled, the blast radius expands dramatically. The injected prompt can read files, send messages, access APIs, and interact with any connected service, all silently, all before the user has a chance to react.

Using Google Ads' targeting capabilities, location, industry, demographics, and even specific email addresses via Customer Match, an attacker can turn this from a broad social engineering play into a precision strike against known targets.

‍

How to Protect Against AI Agent Vulnerabilities

Gain visibility into AI agent usage. Inventory of AI assistants and agents in use across your organization, what data they access, and what integrations are enabled. Shadow AI is a growing blind spot.
Audit agent integrations and permissions. Every MCP server, tool, or API that an agent can reach expands the blast radius of a compromised prompt. Disable integrations that aren't actively needed. Review what data agents can access.
Educate users on prompt injection risks. Most users don't consider their AI chat an attack surface. Awareness that pre-filled prompts, shared links, and pasted content can contain hidden instructions is a meaningful first line of defense.
Establish governance for AI agent identities. AI agents authenticate, hold credentials, and take autonomous actions. They need to be governed with the same rigor as human users and service accounts, with intent analysis, deterministic policy enforcement, just-in-time-scoped access, and a full audit trail from human to agent to action.

‍

Governing the Agent Era

This is the second time in recent months that Oasis Security researchers have demonstrated how AI agents can be silently compromised, following our disclosure in OpenClaw. The pattern is consistent: agents with broad access can be hijacked by a single manipulated input, and traditional identity and access management controls were not designed for this.

For the full technical breakdown, read the Claudy Day whitepaper.

To learn how Oasis Security governs AI agent access at scale, see Agentic Access Management

‍

Frequently Asked Questions

Has the Claude.ai vulnerability been fixed? Yes. Anthropic has fixed the prompt injection vulnerability. The remaining issues are currently being addressed. Oasis Security responsibly disclosed all findings through Anthropic's Responsible Disclosure Program before publication.
How does prompt injection work in AI agents? Prompt injection occurs when an attacker embeds hidden instructions inside input that an AI agent processes. In the Claudy Day case, invisible HTML tags were embedded in a URL parameter that pre-fills the Claude.ai chat box. The user sees a normal prompt, but when they press Enter, Claude also executes the hidden instructions.
What data can be exfiltrated from Claude.ai? In a default Claude.ai session, an attacker can access conversation history and memory, which may include business strategy, financial information, health concerns, personal details, and any other sensitive topics discussed with the assistant. With MCP servers or enterprise integrations enabled, the attacker can also read files, send messages, and interact with connected services.
How can organizations protect AI agents from prompt injection? Organizations should inventory all AI agents and their integrations, disable unnecessary permissions, and educate users that shared links and pre-filled prompts can contain hidden instructions. AI agents that authenticate, hold credentials, and take autonomous actions need to be governed with the same rigor as human users and service accounts.

Original source

Hacker News AI Top

https://www.oasis.security/blog/claude-ai-prompt-injection-data-exfiltration-vulnerability

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

claude

ModelsLive

[R] Is autoresearch really better than classic hyperparameter tuning?

We did experiments comparing Optuna autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better. Experiments were done on NanoChat: we let Claude define Optuna’s search space to align the priors between methods. Both optimization methods were run three times. Autoresearch is far more sample-efficient on average In 5 min training setting, LLM tokens cost as much as GPUs, but despite a 2× higher per-step cost, AutoResearch still comes out ahead across all cost budgets: What’s more, the solution found by autoresearch generalizes better than Optuna’s. We gave the best solutions more training time; the absolute score gap widens, and the statistical significance becomes stronger: An important contributor to autoresearch’s capability is that it searches direct