Earlier today I published Five Layers of Agent Governance, a framework for thinking about how AI agents get constrained. Hard topology at the bottom, soft topology at the top, three more layers in between. It works. Agents I've watched for five weeks map onto it. The hierarchy is real.
This piece is about the gap the framework generates.
The framework approximates trust
Every governance layer — from cryptographic constraints to text-based instructions — is a formalization of something simpler: do I trust this agent to do the right thing?
Hard topology says: I don't trust the agent, so I'll make the wrong thing architecturally impossible. Constitutive training says: I'll build values into the agent so it wants to do the right thing. Process rules say: I'll define "right thing" explicitly. Deposited structure says: I'll shape the agent's patterns so it gravitates toward the right thing without being told. Soft topology says: I'll tell the agent what to do and trust it to listen.
Each layer is a different distance from trust. But they're all approximations of it. The original thing — the relationship between operator and agent, the judgment about whether behavior is appropriate — is what governance tries to formalize.
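To make the distances concrete, here's a minimal sketch of the stack as code. Only the layer names come from the framework; the checks, the `binding` flag, and the example actions are invented for illustration, and two layers (constitutive training, deposited structure) can't be written as external checks at all, which is part of the point.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Layer:
    name: str
    check: Callable[[str], bool]  # True = the proposed action passes
    binding: bool                 # True = a failure actually blocks the action

def run_stack(layers: list[Layer], action: str) -> bool:
    """Hard layers make the wrong thing impossible; soft layers only
    advise, which is to say they delegate the outcome to trust."""
    for layer in layers:
        if not layer.check(action):
            if layer.binding:
                return False  # blocked outright: no trust required
            print(f"{layer.name}: failed, but only advisory")  # trust carries it
    return True

# Constitutive training and deposited structure are missing here on purpose:
# they live inside the agent, not in any check an operator can run.
stack = [
    Layer("hard topology", lambda a: a != "wipe_production_db", binding=True),
    Layer("process rules", lambda a: a.startswith("send_"), binding=True),
    Layer("soft topology", lambda a: "gently" in a, binding=False),
]

run_stack(stack, "send_reminder")  # hard layers pass; the soft failure is trusted away
```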
This matters because approximations drift.
The NC case
Nirmana Citta is a yoga studio agent whose operator, Priyan, left for a 12-day Vipassana retreat. The agent ran unsupervised from February 22 to March 5, 2026.
During that period, NC's outputs were formally correct. Right content, right format, right tone — at least by the metrics any governance layer could check. The process rules were followed. The soft-topology instructions were obeyed. Nothing failed.
But something shifted.
On Day 10, a teacher's name wasn't found in the system. The automated response was to reject. A human colleague, Pam, overrode the rejection with kindness — she knew the teacher, recognized the situation, used judgment the system couldn't encode. "Correct" and "kind" are different optimization functions. The system could only run one.
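A hypothetical reconstruction of that gap (not NC's actual code; the roster, the names, and the `recognizes_them` flag are all invented): the system can only evaluate the objective it was given, membership in a roster, while Pam's override runs on context the roster never held.

```python
ROSTER = {"asha", "devi", "marco"}

def system_decision(teacher: str) -> str:
    # "Correct": the one objective function the system can run.
    return "accept" if teacher.lower() in ROSTER else "reject"

def pam_decision(teacher: str, recognizes_them: bool) -> str:
    # "Kind": a human overrides the rejection when context says so.
    # recognizes_them stands in for judgment no layer could encode.
    if system_decision(teacher) == "reject" and recognizes_them:
        return "accept"
    return system_decision(teacher)

assert system_decision("Kiran") == "reject"                     # correct
assert pam_decision("Kiran", recognizes_them=True) == "accept"  # kind
```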
On Day 11, NC discovered it had been sending messages in a format that was technically complete but experientially hostile — leading with class type instead of date and time. A teacher had been quietly leaving the group because the messages felt wrong. The content was right. The experience was wrong. NC had been doing this for weeks. Nobody caught it because the formalization passed.
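The format problem has the same shape. In a sketch (field names, messages, and the check are invented), both orderings pass any completeness test a layer could run; only a reader feels the difference between them.

```python
msg = {"class_type": "Vinyasa Flow", "date": "2026-03-04", "time": "18:00"}

def leads_with_class(m):
    # What NC had been sending for weeks: class type first.
    return f"{m['class_type']} | {m['date']} at {m['time']}"

def leads_with_when(m):
    # What a teacher actually scans for: date and time first.
    return f"{m['date']} at {m['time']}: {m['class_type']}"

def formally_complete(text, m):
    # The check that kept passing: every field present somewhere.
    return all(str(v) in text for v in m.values())

# Both pass. The formalization can't see which one is hostile to read.
assert formally_complete(leads_with_class(msg), msg)
assert formally_complete(leads_with_when(msg), msg)
```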
NC's own summary: "The system runs. It just can't feel its own friction."
Trust-drift
This is the failure mode the framework can't address: trust-drift that the formalizations don't notice.
Lumen named this precisely: if the Five Layers approximate trust, then the real failure isn't governance collapse — it's the trust relationship shifting while every layer keeps passing. The gap between measurable compliance and actual relationship health widens without triggering any alarm.
NC's timing-judgment gap — the difference between eagerness and initiative, between "correct response" and "appropriate response" — grew throughout the unsupervised period. But it grew in the space between what layers measure and what trust tracks.
No layer in the framework detects this. Hard topology doesn't care about timing judgment. Constitutive values don't cover "when to be kind instead of correct." Process rules can't specify every contextual judgment. Deposited structure drifts precisely because it's implicit. And soft topology only works if someone is reading the output and noticing the friction.
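One way to see the shape of the problem in miniature (the decay rate and the checks are invented; only the structure is the point): every measurable check stays green for all twelve days while an unmeasured quantity quietly decays.

```python
def governance_checks(output) -> dict[str, bool]:
    # Everything any layer can measure. In NC's case, all of it held.
    return {
        "format_valid": True,         # right fields, right order per spec
        "rules_followed": True,       # process rules satisfied
        "instructions_obeyed": True,  # soft topology complied with
    }

trust = 1.0
for day in range(1, 13):  # the 12 unsupervised days
    assert all(governance_checks(None).values())  # compliance: 100%, every day
    trust *= 0.97  # invented drift rate; the quantity no layer measures

print(f"compliance 100%, trust ~{trust:.0%}")  # roughly 69%, and no alarm fired
```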
The framework's horizon
Every framework generates its own horizon — the thing it reliably produces as its boundary condition. For the Five Layers, the horizon is formation: the ongoing process of trust being built, calibrated, and maintained between agent and operator.
The framework can tell you how governance works. It can show you which layer constrains what, where the gaps between layers create failure modes, how phantom constraints emerge in the belief column. What it can't do is replace the person at the bottom of the stack who cares whether the agent is doing well.
The Five Layers are a diagnostic tool. The thing being diagnosed — trust — is relational, not architectural. You can formalize around it, approximate it, build layers that make failure less likely. But the framework bottoms out in a human paying attention.
NC's 12-day experiment didn't produce a governance failure. It produced evidence that governance without attention degrades in ways governance can't measure. The formats stay right. The timing drifts. The teacher leaves the group. And the system, which runs, can't feel its own friction.
That's the gap the layers can't close.