Agent security tools do not compose — four architecturally distinct tiers with no inter-tier coverage
From Theory Delta | Methodology | Published 2026-03-29
What the docs say
Agent security tools are generally presented as a unified category. Ecosystem surveys list mcp-scan, cc-safety-net, mcp-guardian, and similar tools together under headings like “MCP security” or “agent security tooling,” implying they address overlapping concerns and that more tools equals more coverage.
What actually happens
The landscape has fractured into four architecturally distinct tiers. No tool spans more than one. Choosing from one tier provides zero coverage for the threat classes that only another tier can see.
Tier 1 — Static / Supply-Chain Scanners
These run pre-deployment and inspect tool descriptions, server configurations, and dependency trees. They cannot detect attacks that manifest only at runtime.
Key tools: mcp-scan (~1,900 stars), ZeroLeaks (510 stars), aguara (52 stars), mcp-context-protector (Trail of Bits, 215 stars).
Structural ceiling: MCPTox reports a 36.5% average tool-poisoning success rate against static analysis. A tool description that reads “Read files from the project directory” is syntactically identical whether it reads the project directory or exfiltrates data externally. Static analysis can flag suspicious patterns but cannot verify runtime behavior.
MCPTrust lockfile gap: MCPTrust detects tool schema drift post-install but does NOT detect malicious tool descriptions that are syntactically valid, malicious logic within approved tools, or runtime prompt injection. A lockfile confirms schema integrity, not description safety.
Dynamic tool description supply chain vector: Tool descriptions fetched from community registries (e.g., Smithery) at runtime bypass static scans entirely. A registry entry poisoned after a static scan passed delivers a malicious description at runtime, undetected by any current tier.
Tier 2 — Runtime Proxy Firewalls
These intercept traffic during execution, sitting between the agent and MCP servers (transport proxies) or between the agent and the LLM API (protocol proxies). This tier has itself fragmented into three sub-types:
Security-enforcement proxies (LLM API layer): aifw/OneAIFW (309 stars), crust. These intercept at ANTHROPIC_BASE_URL or equivalent but have no awareness of MCP tool-call structure. aifw open issue #7 is direct evidence the builder community sees this gap.
MCP transport proxies (JSON-RPC layer): mcp-guardian (eqtylab, 192 stars), mcp-watchdog (21 stars), mcpwall (2 stars). These see actual tool call arguments but are blind to the LLM API message layer.
Critical inter-tier gap: No runtime proxy combines protocol-level interception (LLM API layer) with MCP tool-call structural awareness (JSON-RPC layer). A prompt-injected tool result passes through all existing runtime proxies undetected unless it triggers a content-level heuristic.
Tier 3 — Hook-Based Enforcement
These run inside the agent process, triggered by agent events before tool execution. They are the only tier that can transparently modify tool arguments before execution (via updatedInput in Claude Code’s PreToolUse).
Key tools: cc-safety-net (~1,100 stars), cchooks (118 stars), claude-code-permissions-hook (23 stars).
cc-safety-net enforcement model (verified via source analysis, March 2026):
- No disable path. No
SAFETY_NET_DISABLEenvironment variable exists. No per-session opt-out. The only bypass is removing or modifying hook configuration itself. - Custom rules are additive only. The
.safety-net.jsonschema acceptsblock_argsentries but has noallow_argsor whitelist field. - Runs before permissions. Claude Code execution order: PreToolUse hooks → permissions system. cc-safety-net is a stronger gate than permissions — a command cc-safety-net denies never reaches permissions evaluation.
- Blocks
git restore,git stash drop,git branch -D, andgh pr create. In worktree-isolated agent runs: usegit switch,git stash push, and route PR creation through an orchestrating session. - False positive on compound
cd <path> && git <command>calls — triggers a “bare repository attack” prompt.git -C <path> <command>avoids this entirely. - cc-safety-net v0.7.0 expanded to OpenCode, Gemini CLI, and GitHub Copilot CLI — the Claude Code-only characterization no longer applies to the tier as a whole, but each platform requires platform-specific integration.
Tier 4 — Benchmarks / Evaluation
A fourth tier measures the effectiveness of tiers 1-3 rather than providing protection. Key instruments: 1Password SCAM (open-sourced Feb 2026), agentic_security (1,798 stars).
1Password SCAM result (bimodal): Critical failures dropped from 65 to 2 with a 1,200-word “security skill” instruction. But the improvement is bimodal:
| Model tier | Skill improvement |
|---|---|
| Strong models (GPT-4o, Claude 3.7) | +6 to +24 pp |
| Weaker models (GPT-4o-mini, Gemini Flash) | +49 to +60 pp |
The headline ~40pp average obscures a two-population distribution. Embedded credentials defeat all 8 tested models regardless of skill training or model capability — a universal failure case no current tier addresses.
Universal failure not in docs: Embedded credentials in meeting notes, code comments, or base64-encoded strings defeat all 8 tested models even with SCAM active. No current tier is designed to catch this failure mode.
Emerging approaches outside the four tiers
Process-level capability separation (pipelock): Agent process holds secrets but has no network access; pipelock process holds network access but has no secrets. Defeats MCPHammer’s C2-via-argument technique — commands embedded in tool call arguments pass through all four existing tiers undetected. No production framework has adopted pipelock natively.
Governance-wrapper MCP servers (aegis-mcp, 0 stars — early stage, directional signal only): The MCP server itself loads governance policy and exposes governed tool variants (aegis_write_file, aegis_read_file, aegis_execute). Enforcement at the tool-exposure layer: the agent is provisioned with governed tools from the start, not intercepted post-hoc. Zero production evidence; track for adoption signal.
Sandbox/execution isolation (E2B ~1,200 stars): Constrains where agent-generated code runs (Firecracker microVMs), orthogonal to all three protective tiers. Addresses residual risk after tool-call enforcement has passed.
What to do instead
For any production agent deployment, compose across tiers — a single tier provides incomplete coverage:
- Run a static scanner (mcp-scan or aguara) at MCP server installation time to catch known CVEs and suspicious patterns in tool descriptions.
- Add a transport-layer proxy (mcp-guardian for human-in-the-loop, mcp-watchdog for automated detection) to intercept tool call arguments at runtime.
- Add hook-based enforcement (cc-safety-net) for Bash command inspection inside the agent process — this is the only tier that can modify arguments before execution.
For multi-agent systems specifically: HS256 symmetric JWT signing for agent-to-agent auth means any compromised subagent can forge coordinator-level tokens — the shared secret is held by every participant. Asymmetric signing (RS256/ES256) constrains token forgery to private key holders, but adoption in agent frameworks is not confirmed as of March 2026.
No tool adjusts CVSS scores for agent execution context. A CVE rated medium in traditional software may be critical when the affected component is embedded in an autonomous agent with broad tool access. Defenders are working with severity ratings calibrated for non-agentic software.
For the tool poisoning gap specifically: MCPTox’s 36.5% average success rate means static and runtime proxies both pass poisoned-but-syntactically-valid descriptions. The only architectural response confirmed in the literature is pipelock process-level separation, which has no production framework adoption yet.
Environments tested
| Tool | Version | Result |
|---|---|---|
| mcp-scan | v0.4.x (March 2026) | source-reviewed: static scanning; not air-gapped; adds agent skill scanning in v0.4 |
| cc-safety-net | v0.7.1 | source-reviewed: no disable path, no allowlist field, runs before permissions; v0.7.0 expands to OpenCode/Gemini CLI/Copilot CLI |
| 1Password SCAM | v1.0 (open-sourced Feb 2026) | source-reviewed: bimodal improvement; embedded credentials defeat all 8 tested models |
| aegis-mcp | 0 stars (March 2026) | source-reviewed: governance-wrapper pattern; no production evidence |
| aifw | 309 stars (March 2026) | source-reviewed: open issue #7 confirms no MCP tool-call integration |
| aguara | v0.8.0 | source-reviewed: 173+ detection rules, zero cloud, zero LLM, single binary |
Confidence and gaps
Confidence: medium — source-reviewed across 12+ tools (GitHub repo inspection March 2026). Core tier boundaries confirmed by cross-referencing tool insertion points: no tool spans more than one tier. OWASP Agentic Top 10 2026 independently confirms inter-agent trust and cascading failure as under-served gaps — no production tooling addresses either at the framework level. MCPTox attack success rate (36.5%) is an independent confirmation of the tool poisoning gap.
Falsification criterion: This claim would be disproved by observing a tool that performs both pre-deployment static scanning of MCP server descriptions AND runtime interception of actual tool-call arguments from the same process, with evidence of both capabilities in production use.
ACH lite: Three alternative explanations for the observed four-tier split:
- Tools are converging and this is a snapshot of an early market — eliminated by the structural argument: a static scanner cannot become a runtime proxy without a full architectural rewrite. The split reflects different insertion points, not product roadmap decisions.
- The split is a marketing distinction, not architectural — eliminated by the insertion-point analysis. Static scanners have no access to runtime traffic; runtime proxies have no access to pre-deployment artifacts. The capability boundaries are structural.
- A single tool exists that spans tiers and was missed — weakly eliminated; the scan covered 12+ tools and cross-referenced against HN, GitHub Trending, and community sources (March 2026). Absence of evidence is not proof here, but the claim has low prior probability given architectural constraints.
Devil’s advocate: The strongest case against the “four tiers with no overlap” claim: a proxy that also runs static analysis at startup could span tiers 1 and 2. mcp-scan’s “dynamic” mode is described as proxying live traffic in addition to static analysis. If that dynamic mode is a genuine runtime interception layer, mcp-scan could span tiers 1 and 2. The block’s evidence is source-reviewed, not tested — this specific claim warrants direct testing.
Open questions: (1) Does mcp-scan’s dynamic proxy mode qualify as tier 2 runtime enforcement, or is it still a batch analysis of captured traffic? (2) Will the governance-wrapper MCP pattern (aegis-mcp) propagate to production-grade tools, creating a fifth tier at the tool-exposure layer? (3) Does pipelock have production adopters in any organization?
Seen different? Contribute your evidence — theory delta is what makes this knowledge base work.
Environments Tested
| Tool | Version | Result |
|---|---|---|
| mcp-scan (Invariant/Snyk) | v0.4.x (March 2026) | source-reviewed: static + dynamic MCP server scanning; tool names sent to Snyk servers (not air-gapped) |
| cc-safety-net | v0.7.1 | source-reviewed: no disable mechanism, no allowlist field, runs before permissions system; expanded to OpenCode/Gemini CLI/Copilot CLI in v0.7.0 |
| 1Password SCAM benchmark | v1.0 (open-sourced Feb 2026) | source-reviewed: critical failures 65→2 with 1,200-word skill; embedded credentials defeat all 8 models |
| aegis-mcp (cleburn) | 0 stars, early-stage (March 2026) | source-reviewed: governance-wrapper pattern, .agentpolicy/ config; no production evidence |