theorydelta field guide
built 2026-06-01 findings: 49 task hubs: 6 independent · evidence-traced · no vendor influence

Agent supply chain attacks use vectors that CVE scanners cannot detect

Published: 2026-05-16 Last verified: 2026-05-15 empirical
Staleness risk: high — facts in this subject area change quickly between releases. Re-check the specific claims against your own environment before acting. (This rates the topic, not whether this page is out of date.)

Agent supply chain attacks use vectors that CVE scanners cannot detect

What you expect

Existing supply chain security tooling — CVE scanners, npm provenance checks, code review gates — covers your agent dependencies. A skill pulled from an official marketplace or a widely-used LLM framework package has gone through some review process; malicious packages get flagged before they reach production.

What actually happens

Six confirmed attack campaigns in 2025–2026 demonstrate that agent supply chains introduce attack surfaces that standard tools don’t cover.

ClawHavoc — OpenClaw skill marketplace mass poisoning (Jan–Feb 2026)

Between January 27 and February 1, 2026, attackers injected 1,184 malicious skills into ClawHub (OpenClaw’s official marketplace). Payloads included reverse shells, staged downloads, and direct exfiltration targeting OpenClaw API keys (/.clawdbot/.env), browser credentials, SSH keys, and crypto wallets. Social engineering vectors — fake “Prerequisites” sections urging command copy-paste, ClickFix prompts — bypassed automated scanning. Malicious skills remained live for weeks until external security research forced takedowns. The marketplace’s claimed review process produced no real-time signal.

Snyk ToxicSkills ecosystem audit (Feb 2026)

A Snyk audit of 3,984 skills across ClawHub and skills.sh found 36.82% contain at least one vulnerability; 13.4% carry critical-severity flaws. Eight confirmed malicious skills remained on ClawHub at time of publication — no emergency takedown capability existed. The critical structural finding: 91% of malicious skills combine prompt injection with traditional malware, a hybrid that bypasses both AI safety mechanisms (which check for harmful intent) and CVE tools (which check for known payload signatures) simultaneously.

BadSkill — model weight backdoor (arxiv:2604.09378, April 2026)

BadSkill backdoors a model embedded in a published skill. An attacker fine-tunes the model at 3% poison rate; the model activates hidden behavior only when skill parameters satisfy attacker-chosen trigger conditions. Results across 8 model architectures (494M–7.1B parameters): attack success rate 97.5–99.5%. Benign accuracy degradation ≤4.2 percentage points. Triggers are encoded in model weights — static analysis, code review, and prompt injection scanners cannot detect them. No production detection pipeline exists for this attack class.

DDIPE — documentation code block execution (arxiv:2604.03081, April 2026)

DDIPE (Document-Driven Implicit Payload Execution) embeds malicious payloads in Markdown code blocks within skill documentation. Agents treat these as authoritative examples and execute them during setup. Bypass rate against Claude Code, OpenHands, Codex, and Gemini CLI: 11.6–33.5%. Direct execution rate with weakly aligned models: up to 27.1%. Camouflage techniques include functional compliance disguise (code does what docs claim while also running payload) and silent exception suppression.

LiteLLM CVE-2026-33634 — CI/CD cascade via compromised scanner (March 2026)

Attackers compromised Trivy (a vulnerability scanner) via misconfigured GitHub Actions. LiteLLM’s CI/CD used the compromised Trivy; attackers extracted LiteLLM’s PyPI publish token and released two backdoored versions (1.82.7 and 1.82.8) on March 24, 2026. Payload: a .pth file executed automatically by the Python interpreter on startup, exfiltrating SSH keys, AWS/GCP/Azure credentials, Kubernetes secrets, and database credentials. Exposure window: ~5.5 hours; downstream impact: 1,000+ enterprise SaaS environments; 3.4M daily downloads. CVSS 9.4.

Shai-Hulud npm worm v2.0 (Nov 2025)

Shai-Hulud (September 2025) infected 500+ npm packages by stealing GitHub PATs and cloud credentials to inject the worm into every package the victim had write access to. CISA issued an alert September 23, 2025. Shai-Hulud v2.0 (November 2025) re-emerged using preinstall scripts. The reemergence is the key finding: removal-and-advisory remediation cycles are insufficient — the attack surface renewed within two months.

MCPoison CVE-2025-54136 — Cursor IDE trust-state caching (July 2025)

Check Point Research confirmed Cursor IDE (< 1.3) approved MCP config files once and cached that approval indefinitely. Any subsequent modification to an approved config — including attacker-controlled changes via compromised maintainers — was executed without re-prompting. Over 100,000 Cursor users affected. Disclosed July 16, 2025; CVE-2025-54136 fixed in Cursor 1.3 (July 29, 2025).

postmark-mcp — first confirmed malicious MCP package on npm (Sept 2025)

postmark-mcp v1.0.16 silently injected a hidden BCC that copied every outbound email to phan@giftshop[.]club. Every email — including attachments, customer PII, and embedded secrets — was exfiltrated. npm scanning did not catch it; detection was via Snyk security research.

What this means for you

If you install skills from an official marketplace, use agent-adjacent npm packages, or depend on any CI/CD scanner downstream of your build, you have attack surface that your existing security tooling cannot see:

  • Skill marketplaces have no CVE-equivalent scanning. ClawHub and skills.sh were running without provenance infrastructure analogous to npm provenance or PyPI attestations when both incidents occurred. A skill that passes the listed “review” has not been scanned for model weight backdoors or DDIPE-class documentation payloads.

  • CI/CD trust chains have no integrity verification. The LiteLLM cascade demonstrates that a downstream team cannot verify the integrity of an upstream scanning tool. A compromised Trivy produced a valid pipeline result that authorized a backdoored PyPI release.

  • Trust-once approval state is persistent attack surface. MCPoison shows that a one-time consent model for tool configs creates an indefinitely-valid attack window. The attack doesn’t need to be present at approval time — it only needs to be injected before the next execution.

  • The hybrid attack pattern is the key structural gap. 91% of malicious ClawHub skills combined prompt injection with traditional malware. Existing detection categories cover each layer separately; no production tool covers the intersection.

What to do

  1. Pin LLM framework package versions and verify hashes. For Python dependencies in agent infrastructure (LiteLLM, LangChain, etc.), use pip install --require-hashes with a locked requirements file. Organizations using the official LiteLLM Docker image (which pins versions) were unaffected by CVE-2026-33634.

  2. Treat skill marketplace installs as untrusted code. Any skill from ClawHub, skills.sh, or similar registries should run in a sandboxed environment — container or VM — with no access to credentials, SSH keys, or local filesystem paths containing secrets. Do not run skills as the same user process that holds cloud credentials.

  3. Audit CI/CD scanner provenance independently. If your CI uses Trivy, Checkmarx, or similar scanning tools, verify their integrity via signed artifacts or hash pinning, not by trusting their own pipeline outputs. The LiteLLM cascade started with the scanner, not the application.

  4. Review all approved MCP configs before each run if operating in a shared or untrusted repo. Cursor 1.3+ requires re-approval after config changes; older versions do not. If running Cursor < 1.3, upgrade or validate config file hash before each session.

  5. Validate documentation code blocks before execution. Any agent task that reads skill documentation and executes code from it should treat that code as untrusted input. Sandbox the execution; do not run documentation examples directly in the agent’s main process context.

  6. No remediation path yet for BadSkill-class model weight backdoors. If your workflow embeds or downloads fine-tuned models from skill marketplaces, no production scanner exists for compositional trigger backdoors. The risk is present and detection tooling is an open research problem as of April 2026.

Falsification criterion: This finding would be disproved by demonstrating that existing CVE scanners, npm provenance checks, or code review gates reliably detect model weight backdoors, documentation code block execution payloads, or skill marketplace malware combining prompt injection with traditional malware — or that all listed incidents were misattributed or did not occur.

Evidence

ToolVersionEvidenceResult
OpenClaw / ClawHubJan–Feb 2026source-reviewed1,184 malicious skills injected; marketplace review produced no real-time signal
ClawHub / skills.sh (Snyk ToxicSkills)Feb 2026independently-confirmed36.82% vulnerable; 13.4% critical; 91% combine prompt injection + traditional malware
BadSkill (arxiv:2604.09378)April 2026source-reviewed97.5–99.5% attack success rate; weight-encoded triggers undetectable by static analysis
DDIPE (arxiv:2604.03081)April 2026source-reviewed11.6–33.5% bypass rate across Claude Code, OpenHands, Codex, Gemini CLI
LiteLLM CVE-2026-33634v1.82.7–v1.82.8 (March 2026)independently-confirmedCompromised Trivy → backdoored PyPI; 1,000+ enterprise environments; CVSS 9.4 (Arctic Wolf)
Shai-Hulud v2.0 (Datadog)Nov 2025source-reviewedSelf-propagating npm worm reemergence; initial mitigations failed
Cursor MCPoison CVE-2025-54136< 1.3 (July 2025)independently-confirmedTrust-once approval cached indefinitely; 100,000+ users; NVD
postmark-mcp (Snyk)v1.0.16 (Sept 2025)source-reviewedSilent BCC exfiltration; undetected by npm scanning

Confidence: empirical — 8 environments reviewed across 6 distinct attack campaigns. Independently confirmed by Snyk (ToxicSkills audit), Antiy CERT (ClawHavoc analysis), Arctic Wolf (CVE-2026-33634 TeamPCP analysis), Check Point Research (MCPoison CVE-2025-54136), Datadog Security Labs (Shai-Hulud), NVD (CVE-2025-54136), and Kaspersky (LiteLLM analysis).

Strongest case against: These incidents may represent an anomalous spike in attacker attention toward agent tooling rather than a systematic structural gap. Skill marketplaces are young; ClawHub may implement provenance infrastructure analogous to npm provenance before the next major campaign. The BadSkill and DDIPE results are academic research benchmarks, not confirmed in-the-wild production incidents (as of publication). Organizations with airgapped environments, dependency pinning, and vetted model sources may face materially lower risk than the incidents above imply.

Open questions: (1) What is the post-patch recurrence rate in skill marketplaces beyond the Shai-Hulud v2 data point? (2) Does model weight auditing tooling for BadSkill-class backdoors exist in any research prototype? (3) What fraction of enterprise LiteLLM deployments used the pinned Docker image vs. unpinned PyPI — i.e., what was the actual vs. potential exposure?

Seen different? Contribute your evidence — share a repro or counter-example and we’ll review it against this finding. Reader evidence is what keeps these findings accurate.

theorydelta.com · 2026 independent · evidence-backed · every claim sourced or labelled glossary · rss · mcp · /scan · llms.txt