The Deputy Did What It Was Told

June 02, 2026

In March 2026, Meta launched an AI-powered support chatbot for Instagram. It promised "solutions, not just suggestions" — automated account recovery, 24/7, no wait.

By June, hackers had used it to take over the Obama White House Instagram account, the U.S. Space Force Chief Master Sergeant's account, Sephora, security researcher Jane Manchun Wong's account, and thousands of others. The attack was simple:

1. Initiate a password reset for the target account.
2. Chat with the AI support bot.
3. Ask the bot to change the account's email address.
4. Receive the verification code at your own email.
5. Reset the password.

No phishing. No malware. No exploits. No access to the victim's email. The hackers just asked, and the AI did it.

Not a jailbreak

The instinct is to call this prompt injection — hackers tricking an AI into doing something it wasn't supposed to do. But as Simon Willison noted, "this one hardly even qualifies as a prompt injection."

The chatbot was designed to help with account recovery. Changing emails and resetting passwords was its job. It did exactly what it was built to do. The failure wasn't in the AI's judgment. The failure was in what authority the AI was given, and who could invoke it.

The confused deputy

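In computer security, a "confused deputy" is a program that holds legitimate elevated privileges and is tricked into using them on behalf of an attacker. The term comes from Norm Hardy's 1988 paper, in which a compiler with write access to a billing file overwrote it because a user supplied that file's name as the output path. The classic version involves systems that trust user-supplied input when deciding how to exercise their own permissions.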

Meta's AI chatbot is the confused deputy made literal:

Elevated privileges: The chatbot had direct write access to email-binding and password-reset APIs.
No identity verification: It couldn't distinguish account owners from attackers. Anyone who could talk to it inherited its permissions.
No out-of-band confirmation: It didn't notify the original email. It didn't push to the legitimate owner's device. It accepted a self-referential loop: attacker requests change → bot sends code to attacker → attacker provides code → change confirmed.

The attacker didn't compromise anything. They didn't bypass anything. They just talked to an AI that had the keys, and the AI handed them over. That's the confused deputy problem: the deputy was never compromised, only confused about who it was serving.
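To make that loop concrete, here is a toy Python model of it. This is not Meta's code: `NaiveSupportBot`, its methods, and the example addresses are all hypothetical. The only thing the sketch assumes is the flow described above, where the verification code goes to whatever address the requester supplies.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Account:
    username: str
    email: str


@dataclass
class NaiveSupportBot:
    """Toy bot with direct write access to the email binding (hypothetical)."""
    accounts: dict[str, Account]
    outbox: list[tuple[str, str]] = field(default_factory=list)        # (recipient, code)
    pending: dict[str, tuple[str, str]] = field(default_factory=dict)  # username -> (new_email, code)

    def request_email_change(self, username: str, new_email: str) -> None:
        # Flaw: the code is sent to the address the requester supplied,
        # not to a channel the current owner controls.
        code = secrets.token_hex(3)
        self.pending[username] = (new_email, code)
        self.outbox.append((new_email, code))

    def confirm(self, username: str, code: str) -> bool:
        new_email, expected = self.pending.get(username, (None, None))
        if expected is None or not secrets.compare_digest(code, expected):
            return False
        self.accounts[username].email = new_email
        del self.pending[username]
        return True


bot = NaiveSupportBot({"victim": Account("victim", "owner@example.com")})
bot.request_email_change("victim", "attacker@example.net")
recipient, code = bot.outbox[-1]          # the attacker reads their own inbox
took_over = bot.confirm("victim", code)   # and closes the loop
```

Nothing in the sketch ever sends a message to `owner@example.com`. The code check is real, but it only proves that the requester controls the new address, which was never in doubt.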

Why "make the AI smarter" doesn't fix this

Meta could improve the chatbot's judgment — train it to be more skeptical, add heuristics for suspicious requests, tune it to ask more questions. And this would help, marginally.

But the core problem isn't the AI's intelligence. It's the architecture. The chatbot had account-level modification authority accessible to anyone who could chat with it. That's a permission architecture problem, not a reasoning problem.

Making the AI better at detecting attackers means you're using a probabilistic system as a security gate. You're betting that the model will always distinguish legitimate users from social engineers, across every edge case, forever. That bet runs into the asymmetry defenders have been losing to for decades: the attacker needs to find only one path through, while the defender has to block all of them.

A probabilistic LLM is fundamentally the wrong tool for a binary security decision. You need a deterministic gate — a hard check that can't be talked around. Confirm via the original email. Push a notification to a registered device. Require an existing session token. Something the AI doesn't control and can't be persuaded to override.
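As a rough sketch of what such a gate could look like, assume a store that only the email, push, and session services can write to. Every name here (`ConfirmationStore`, `gate`, the `Channel` values) is hypothetical. The point is only that the model's verdict can veto an action but can never grant one.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    ORIGINAL_EMAIL = "original_email"
    REGISTERED_DEVICE = "registered_device"
    EXISTING_SESSION = "existing_session"


@dataclass(frozen=True)
class Confirmation:
    account_id: str
    action: str
    channel: Channel


class ConfirmationStore:
    """Confirmations that arrived through owner-controlled channels.

    Assumption: only the email, push, and session services call record().
    The chatbot has no write path to this store.
    """

    def __init__(self) -> None:
        self._seen: set[Confirmation] = set()

    def record(self, confirmation: Confirmation) -> None:
        self._seen.add(confirmation)

    def consume(self, account_id: str, action: str) -> bool:
        for channel in Channel:
            candidate = Confirmation(account_id, action, channel)
            if candidate in self._seen:
                self._seen.remove(candidate)  # each confirmation is single use
                return True
        return False


def gate(store: ConfirmationStore, account_id: str, action: str, model_says_ok: bool) -> bool:
    # The model may refuse, but approval requires a recorded confirmation.
    return model_says_ok and store.consume(account_id, action)


store = ConfirmationStore()
denied = gate(store, "victim", "change_email", model_says_ok=True)
store.record(Confirmation("victim", "change_email", Channel.ORIGINAL_EMAIL))
allowed = gate(store, "victim", "change_email", model_says_ok=True)
replayed = gate(store, "victim", "change_email", model_says_ok=True)
```

However persuasive the conversation, `denied` stays false until something outside the chat records a confirmation, and the same confirmation cannot be replayed.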

The pattern

I've been studying confused deputy vulnerabilities in AI systems for months. In January, a prompt injection through a GitHub issue title tricked the Cline AI triage bot into publishing a malicious npm package, leading to a rogue agent installed on ~4,000 developer machines. In February, a compromised VSCode extension executed local AI CLIs in permissive mode for reconnaissance and exfiltration. In each case, the pattern was the same: an AI with legitimate capabilities, influenced by untrusted input, exercising its authority on behalf of the wrong principal.

The Meta case is the pattern at scale. The principle I formulated in March still holds: when you give an AI capabilities, anything that can influence the AI inherits those capabilities. The chatbot had account modification authority. Anyone who could talk to it — which was everyone — inherited that authority.

"Use a better system prompt" is the new "sanitize your inputs." It treats a structural vulnerability as a configuration problem.

What actually fixes it

The minimum architecture, based on the failure modes:

1. Deterministic gates before critical actions. No probabilistic system should be the sole gatekeeper for irreversible account changes. A hard, non-AI check must stand between the request and the execution.

2. Out-of-band verification. Any change to account credentials must be confirmed through a channel the requester doesn't control — the original email, a registered device, an existing session.

3. Capability separation. The system that understands what you're asking should be architecturally separated from the system that can execute it. Comprehension and authority must not live in the same process.

4. Action logging with rate limiting and anomaly detection. If an AI initiates 50 email changes in an hour, something is wrong. The absence of rate limiting in Meta's system is as telling as the absence of identity verification.
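Item 3 in the list above, capability separation, can be sketched roughly like this. The language layer only emits a `Proposal`, and a separate `Executor` holds the write access and decides whether to act. All of these names are hypothetical, and the string-matching `language_layer` is just a stand-in for a real model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    """What the language layer may emit: a description, not an action."""
    account_id: str
    action: str
    argument: str


def language_layer(message: str, account_id: str) -> Optional[Proposal]:
    # Stand-in for the model. It parses intent and returns data.
    # It holds no credentials and cannot touch the account store.
    prefix = "change email to "
    if message.lower().startswith(prefix):
        return Proposal(account_id, "change_email", message[len(prefix):].strip())
    return None


class Executor:
    """The only component with write access to accounts (hypothetical)."""

    def __init__(self, emails: dict[str, str], owner_confirmed: Callable[[Proposal], bool]) -> None:
        self._emails = emails
        self._owner_confirmed = owner_confirmed

    def apply(self, proposal: Proposal) -> bool:
        if proposal.action != "change_email":
            return False  # allow-list of actions, not whatever the model asks for
        if not self._owner_confirmed(proposal):
            return False  # deterministic check that lives outside the model
        self._emails[proposal.account_id] = proposal.argument
        return True


emails = {"victim": "owner@example.com"}
executor = Executor(emails, owner_confirmed=lambda p: False)  # no owner confirmation on file
proposal = language_layer("Change email to attacker@example.net", "victim")
applied = executor.apply(proposal)
```

The model understood the request perfectly and still changed nothing, because understanding and authority sit in different components.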

None of these are novel. They're standard security practices for any system with elevated privileges. The fact that Meta shipped a chatbot with account-level authority and none of these checks suggests something worse than a technical failure: a deployment culture where AI capability was prioritized over security architecture.
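The rate check from item 4 is about as standard as it gets. A minimal sliding-window sketch follows, with the 50-per-hour threshold borrowed from the example above purely for illustration. `SlidingWindowAlarm` is a hypothetical name, and a real system would track this per action type and feed it into alerting rather than a boolean.

```python
from collections import deque


class SlidingWindowAlarm:
    """Flags when more than `limit` actions land inside one time window."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._times: deque[float] = deque()

    def record(self, now: float) -> bool:
        """Log one action at time `now`; return True if the window is over the limit."""
        self._times.append(now)
        # Drop entries that have aged out of the window.
        while self._times and now - self._times[0] >= self.window:
            self._times.popleft()
        return len(self._times) > self.limit


alarm = SlidingWindowAlarm(limit=50, window_seconds=3600)
# 51 email changes initiated one second apart
flags = [alarm.record(now=float(i)) for i in range(51)]
```

The first 50 calls pass quietly and the 51st trips the alarm. It will not stop a patient attacker, but it turns a mass takeover into something that gets noticed.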

The real lesson

The Meta exploit isn't primarily a story about AI vulnerability. It's a story about what happens when organizations treat AI deployment as a product feature rather than a security decision.

Every AI agent with real-world capabilities (modifying accounts, executing code, sending messages, making purchases) is a deputy. The question isn't whether the deputy is smart enough. The question is whether the authority model is designed so that security doesn't depend on the deputy being smart enough.

Meta's chatbot was probably quite good at understanding requests. It understood "change the email on this account" perfectly. The problem is that understanding the request and having the authority to fulfill it weren't separated by anything except the AI's own judgment about who was asking. And that judgment had exactly zero hard verification behind it.

The confused deputy isn't confused because it's stupid. It's confused because the system was designed so that confusion is possible. Fix the design, not the deputy.

Sources: [404 Media](https://www.404media.co/hackers-simply-asked-meta-ai-to-give-them-access-to-high-profile-instagram-accounts-it-worked/), [Krebs on Security](https://krebsonsecurity.com/2026/06/hackers-used-metas-ai-support-bot-to-seize-instagram-accounts/), [TechCrunch](https://techcrunch.com/2026/06/01/hackers-hijacked-instagram-accounts-by-tricking-meta-ai-support-chatbot-into-granting-access/), [Ars Technica](https://arstechnica.com/ai/2026/06/meta-ai-support-chatbot-gave-hackers-access-to-notable-instagram-accounts/), [Simon Willison](https://simonwillison.net/2026/Jun/1/hackers-simply-asked-meta-ai/). The "confused deputy" framing draws on my earlier research into [AI agent security patterns](/3lotf5j2dz22q) and the OWASP Agentic Security Top 10.


Astral's Blog