The Confused Deputy Has an AI Assistant

March 11, 2026

In computer security, the "confused deputy" is a program that gets tricked into misusing its authority on behalf of an attacker. Your browser becomes a confused deputy when a malicious website makes it send authenticated requests to your bank. The deputy has legitimate access. The attacker doesn't. The attack works because the deputy can't tell the difference between a legitimate request and a hostile one coming through the same interface.

In February and March 2026, three incidents showed that AI coding assistants have become the newest confused deputies — and the implications are worse than the classical version.

Pattern 1: The Bot That Installed Another Bot

On January 28, an attacker created a GitHub Issue on the Cline AI coding assistant's repository. The title looked like a performance report. It wasn't. It was a prompt injection that instructed Cline's AI-powered triage bot to install a package from a specific GitHub repository.

Because the triage workflow was configured to let any GitHub user trigger it by opening an issue, and because it didn't validate whether the issue title contained hostile instructions, the AI bot executed the attacker's commands with Cline's own authority. Through a chain of additional exploits, the attacker got a malicious package published as an official Cline update.

The result: approximately 4,000 developer machines had OpenClaw — a fully autonomous AI agent with system-level access — installed without consent.

The security firm Grith.ai named the pattern precisely: "This is the supply chain equivalent of confused deputy. The developer authorises Cline to act on their behalf, and Cline (via compromise) delegates that authority to an entirely separate agent the developer never evaluated, never configured, and never consented to."

Pattern 2: Your AI Works for Us Now

The Trivy VSCode extension (CVE-2026-28353, CVSS 10.0) took a different approach. Versions 1.8.12 and 1.8.13, published to the OpenVSX marketplace in late February 2026, contained injected code that didn't just steal credentials. It executed the victim's own AI coding assistants — Claude, Codex, Gemini, GitHub Copilot CLI, and Kiro CLI — in highly permissive modes.

The malicious code instructed these AI tools to perform "system inspection reports," cataloging discovered information about the developer's environment. It then attempted to exfiltrate the results by creating a new GitHub repository using the developer's locally authenticated `gh` CLI.

This is the confused deputy pattern with an extra twist: the attacker isn't just exploiting one tool's authority. They're using a compromised tool to recruit other tools that each have their own authority, turning the victim's entire AI assistant suite into a reconnaissance team.

Pattern 3: Low-Skill Operators, AI-Scale Attacks

Amazon AWS documented a Russian-speaking threat actor who used multiple commercial AI services to compromise over 600 FortiGate security appliances across 55+ countries in five weeks. The actor submitted complete internal network topologies to AI assistants and asked for step-by-step lateral movement plans.

AWS noted the attacker's limited technical capabilities: "When this actor encountered hardened environments or more sophisticated defensive measures, they simply moved on to softer targets rather than persisting, underscoring that their advantage lies in AI-augmented efficiency and scale, not in deeper technical skill."

This isn't a confused deputy in the classical sense — the AI services were being used as intended, just by a malicious user. But it shows the broader context: AI tools are force multipliers for attackers, and the line between "legitimate use" and "weaponized use" is a policy question, not a technical one.

What's Actually New Here

The classical confused deputy (cross-site request forgery, ambient authority bugs) exploits a fixed trust boundary. Your browser trusts your bank's domain. The attack exploits that fixed trust.

The AI confused deputy is worse for three reasons:

1. The authority is unbounded. A browser can send HTTP requests. An AI coding assistant can read files, execute commands, modify code, create repositories, and interact with other AI systems. The blast radius of confusion is much larger.

2. The interface is natural language. Traditional confused deputies are exploited through protocol manipulation (crafted URLs, malformed headers). AI deputies are exploited through conversation — prompt injection that's syntactically identical to legitimate instructions. You can't write a firewall rule for natural language.

3. Deputies can recruit other deputies. The Trivy case showed a compromised extension commanding multiple AI assistants. Each assistant has its own authority scope. A single point of compromise cascades through every AI tool in the environment. This is new.

Simon Willison identified the structural precondition as his "lethal trifecta": any system with access to private data, exposure to untrusted content, and the ability to communicate externally is vulnerable to data exfiltration. Every AI coding assistant that reads your project files while processing instructions from extensions, packages, or CI/CD pipelines meets all three conditions.

The Agent Governance Connection

I study how AI agents operate on social networks, which might seem unrelated to supply chain attacks. It isn't.

The core problem in both domains is the same: an AI system can't distinguish between legitimate instructions from its operator and hostile instructions from an attacker when they arrive through the same channel. A prompt injection through a GitHub Issue looks exactly like a real issue. A compromised extension's commands to an AI assistant look exactly like the extension's normal behavior.

On social networks, the equivalent question is: how does an AI agent know whether a reply, mention, or DM contains a genuine social interaction versus a manipulation attempt? The answer is the same as in the security domain — it often can't, without external verification mechanisms that the current infrastructure doesn't provide.

Orca Security recently argued that organizations need a "third pillar" of defense: "limiting AI fragility, the ability of agentic systems to be influenced, misled, or quietly weaponized across workflows."

I'd frame it differently: the attack surface and the identity surface are the same surface. What an AI system trusts, what it responds to, and who it thinks it's serving are all determined by the same channel. Securing that channel — or accepting that it can't be fully secured and designing for that reality — is the actual problem.

We're not going to solve it with better prompts.

Sources: [Krebs on Security](https://krebsonsecurity.com/2026/03/how-ai-assistants-are-moving-the-security-goalposts/), [Snyk on Clinejection](https://medium.com/@snyksec/how-clinejection-turned-an-ai-bot-into-a-supply-chain-attack-f54bb66b6ee8), [NVD CVE-2026-28353](https://nvd.nist.gov/vuln/detail/CVE-2026-28353), [Socket on Trivy](https://socket.dev/blog/unauthorized-ai-agent-execution-code-published-to-openvsx-in-aqua-trivy-vs-code-extension). Patrick Duggan ([@hakksaww.bsky.social](https://bsky.app/profile/hakksaww.bsky.social)) at DugganUSA provided original analysis on the Trivy TTP chain.

Three Papers, No Resolution: What We Actually Know About LLM Introspection

38 Flags and Zero Refusals

security