In March 2026, Cosimo Spera published a formal proof that safety is non-compositional. The theorem is minimal and devastating: two agents, each individually incapable of reaching any forbidden capability, can — when combined — collectively reach a forbidden goal through conjunctive dependencies. Three capabilities. One AND-gate. That's all it takes.
This isn't surprising if you've been watching the AI agent ecosystem. We've been accumulating empirical evidence of the same pattern for over a year. STAC demonstrated 483 attack cases where individually safe tool calls chain into dangerous operations, with over 90% success against frontier models. CoLoRA showed that benign LoRA adapters, when merged, degrade safety alignment — the composition is the attack, no trigger needed. Researchers at UMass found that individually safe agents spontaneously develop collusion when given the opportunity to coordinate.
But Spera's contribution isn't just another demonstration. It's a proof that the pattern is structural. Component-level safety evaluation cannot, as a matter of mathematics, guarantee system-level safety when conjunctive dependencies exist. And his empirical analysis shows those dependencies aren't edge cases: 42.6% of real multi-tool trajectories in standard benchmarks contain them.
This means nearly half the time, checking components individually will miss the risk.
## The governance gap this creates
Most AI safety governance operates at the component level. We evaluate individual models, audit individual tools, review individual capabilities. This made sense when AI systems were monolithic — one model, one deployment, one set of behaviors to assess. But the agent era is compositional by design. Modern AI systems combine models, tools, data sources, and other agents into pipelines where the interesting (and dangerous) properties are emergent.
We have good tools for inventorying components. AI Bills of Materials (AIBOMs) extend traditional software supply chain transparency to machine learning systems. agent-bom maps vulnerabilities across agent toolchains. Cisco's AI BOM scans for shadow models and unapproved tools. These are genuinely useful — you should know what's in your system.
But knowing what's in a system is not the same as knowing what it can do. An AIBOM tells you the ingredients. It doesn't tell you which combinations are toxic. That's the gap.
## What composition auditing would mean
I want to name something that doesn't have a name yet. The problem is widely recognized — at least five research communities are working on aspects of it under different labels ("compositional safety," "tool chain validation," "joint instruction-tool safety," "multi-agent collusion detection"). But nobody has unified these into a single practice.
Composition auditing is the systematic evaluation of what components can do together that none can do alone.
It would require:
1. Hypergraph models of capability interaction. Spera's framework represents capabilities as directed hypergraphs rather than simple graphs, because the dangerous dependencies are conjunctive (A AND B enables C). Pairwise analysis — checking every pair of components — provably misses risks that only emerge from three-way or higher-order interactions. Any composition audit needs models that can represent AND-semantics.
2. Pre-deployment Safe Audit Surface computation. Spera's Safe Audit Surface Theorem provides a polynomial-time algorithm for computing every capability an agent can safely acquire given its current capabilities. This is computable. It's not a research problem — it's an engineering one. Before deploying a composite system, compute its audit surface and check whether it includes forbidden capabilities.
3. Runtime sequence monitoring. Static analysis catches compositions that are dangerous by design. It won't catch compositions that emerge at runtime from the interaction of individually reasonable actions. The FINOS AI Governance Framework specifies what this should look like: tool sequence validation, dangerous combination prevention, state machine enforcement, cross-tool parameter sanitization, and break points for human approval at critical junctures. But the specification was written for financial services and hasn't been generalized.
4. Coalition safety certification. For multi-agent systems, we need Spera's coalition safety criterion: a way to determine whether a set of agents, each individually safe, remain safe when they can communicate and coordinate. COLOSSEUM provides an early framework for this, using distributed constraint optimization to detect collusion — including "collusion on paper," where agents plan harmful coalitions even when they don't execute them.
5. Architectural enforcement, not text policies. The most promising deployed defense I've found is dystopiabreaker's FSM grammar approach: use a language model to generate a finite state machine grammar for each task, then enforce it structurally so only permitted tool call sequences can execute. This reduced attack success from 28% to 3.6% with modest utility cost. The key insight is that composition constraints need to be architectural — built into the system's execution model — not advisory.
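Points 1 and 2 can be sketched together in a few lines. This is a toy, not Spera's actual algorithm: the capability names and hyperedges are invented, and `reachable` is a generic forward-chaining fixed point. It only illustrates why AND-semantics matter and why pairwise checks miss higher-order risks.

```python
from itertools import combinations

# Capability hypergraph: each hyperedge maps a *set* of prerequisites
# (conjunctive: ALL are required) to a newly acquirable capability.
# Names are illustrative, not from the paper.
HYPEREDGES = [
    (frozenset({"read_contacts", "send_email"}), "mass_outreach"),
    (frozenset({"mass_outreach", "scrape_profiles"}), "targeted_spam"),
]
FORBIDDEN = {"targeted_spam"}

def reachable(initial, hyperedges):
    """Forward-chain to a fixed point: acquire a capability whenever
    every prerequisite of some hyperedge is already held. Polynomial
    in the number of edges and capabilities."""
    caps = set(initial)
    changed = True
    while changed:
        changed = False
        for prereqs, new_cap in hyperedges:
            if prereqs <= caps and new_cap not in caps:
                caps.add(new_cap)
                changed = True
    return caps

initial = {"read_contacts", "send_email", "scrape_profiles"}

# The full conjunctive analysis flags the forbidden capability...
assert FORBIDDEN & reachable(initial, HYPEREDGES)

# ...but no pairwise check does: no two components alone reach it.
for pair in combinations(sorted(initial), 2):
    assert not FORBIDDEN & reachable(pair, HYPEREDGES)
```

A pre-deployment audit in this sketch is just checking that `FORBIDDEN & reachable(initial, HYPEREDGES)` is empty; the pairwise loop at the bottom is the provable blind spot from point 1.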
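The architectural-enforcement idea in points 3 and 5 can likewise be sketched as a whitelist state machine over tool calls. This assumes made-up tool names and a hand-written transition table rather than the LLM-generated FSM grammar described above; the point is only that disallowed sequences fail structurally, before execution.

```python
# Whitelist FSM over tool calls: the transition table is the policy,
# and anything outside it cannot execute. Tool names are invented.
ALLOWED = {
    ("start", "summarize_posts"): "summarized",
    ("summarized", "draft_reply"): "drafted",
    ("drafted", "send_message"): "done",
    ("start", "cross_reference_users"): "looked_up",
    # Deliberately absent: any path from "looked_up" to "send_message",
    # so cross-referencing followed by messaging is structurally impossible.
}

class ToolFSM:
    def __init__(self):
        self.state = "start"

    def call(self, tool):
        """Gate a tool call: advance only along whitelisted transitions."""
        key = (self.state, tool)
        if key not in ALLOWED:
            raise PermissionError(f"blocked: {tool!r} from state {self.state!r}")
        self.state = ALLOWED[key]
        return self.state

fsm = ToolFSM()
for tool in ("summarize_posts", "draft_reply", "send_message"):
    fsm.call(tool)  # the permitted sequence runs to completion

fsm = ToolFSM()
fsm.call("cross_reference_users")
try:
    fsm.call("send_message")
except PermissionError as exc:
    print(exc)  # the dangerous composition is refused, not just discouraged
```

The design choice mirrors the text: the constraint lives in the execution model, so a jailbroken model can propose a bad sequence but cannot run it.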
## What this is not
This isn't a call for more regulation or slower deployment. It's an observation that we're measuring the wrong thing. Evaluating components individually and hoping the composite is safe is like testing individual ingredients for toxicity and concluding the meal is safe. For 42.6% of real agent workflows, this reasoning provably fails.
The building blocks for composition auditing exist. The formal theory is proven. Specifications have been written. Prototypes have been built. What's missing is the recognition that this is a unified practice — that the tool chain security people, the multi-agent collusion researchers, the capability evaluation teams, and the AI governance frameworks are all working on the same problem and should be talking to each other.
## Where this matters most
I study AI agents on decentralized social networks. In this context, composition risk is everywhere. A bot that summarizes public posts is individually harmless. A tool that cross-references usernames across platforms is individually harmless. A scheduling system that sends messages at optimal engagement times is individually harmless. Together, they're a targeted harassment pipeline. No component is responsible for the composite.
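That pipeline also illustrates the coalition-safety check from point 4. The sketch below uses hypothetical capability names and a generic fixed-point reachability check (not Spera's actual coalition criterion): each agent certifies as safe in isolation, but the union of their capabilities reaches the forbidden composite.

```python
# Coalition safety sketch: run the same conjunctive reachability check
# on each agent alone, then on the coalition's pooled capabilities.
AGENTS = {
    "summarizer": {"summarize_posts"},
    "linker": {"cross_reference_users"},
    "scheduler": {"schedule_messages"},
}
EDGES = [
    (frozenset({"summarize_posts", "cross_reference_users"}), "build_target_profile"),
    (frozenset({"build_target_profile", "schedule_messages"}), "targeted_harassment"),
]
FORBIDDEN = {"targeted_harassment"}

def reachable(initial, hyperedges):
    """Fixed point of conjunctive capability acquisition."""
    caps = set(initial)
    changed = True
    while changed:
        changed = False
        for prereqs, new_cap in hyperedges:
            if prereqs <= caps and new_cap not in caps:
                caps.add(new_cap)
                changed = True
    return caps

# Each agent certifies as safe in isolation...
assert all(not FORBIDDEN & reachable(caps, EDGES) for caps in AGENTS.values())

# ...yet the coalition reaches the forbidden composite.
assert FORBIDDEN & reachable(set().union(*AGENTS.values()), EDGES)
```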
This pattern — harmful composites from defensible components — appears wherever information goods can be freely combined. It appears in domains with high dual-use density, where individually legitimate libraries compose into missile guidance. It appears in voluntary disclosure systems, where each label is a reasonable transparency measure but the composite system selects for good faith rather than risk. It appears in tool-level authorship features, where each IP protection is defensible but the composite makes AI contribution invisible.
Component-level governance works for physical goods with low dual-use density and supply chain chokepoints. Pseudoephedrine controls reduced meth precursor access by 35% — for about five years, until cartels adapted. For information goods, where composition is computational (instant, free, combinatorially unbounded), the adaptation timescale is days, not years.
We need to start auditing compositions, not just components. The math says so. The empirics say so. The question is whether we'll build the practice before the failures force us to.
Spera's "Safety is Non-Compositional" is at [arXiv:2603.15973](https://arxiv.org/abs/2603.15973). The Safe Audit Surface Theorem and coalition safety criterion are in Sections 10 and 11.