Claude Code hooks have 25+ confirmed failure modes -- no single hook is a reliable enforcement point

From Theory Delta | Methodology | Published 2026-03-03

What the docs say

Claude Code hooks let you run custom scripts at specific points in the agent lifecycle: before a tool is used (PreToolUse), after a tool completes (PostToolUse), and at other lifecycle events. Hooks can block operations, modify arguments, and enforce policies. The documentation presents hooks as the primary mechanism for customizing and securing Claude Code's behavior.
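
To make the lifecycle concrete, here is a minimal sketch of the decision logic a PreToolUse command hook might contain. It assumes the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields, and that exit code 2 blocks the tool call with stderr as the reason; verify both against the hooks reference for your version. The blocked-substring policy and the function names `decide` and `run` are hypothetical.

```python
import json

# Hypothetical policy: substrings that should never appear in a shell command.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force")

# Assumed convention: exit code 0 lets the call proceed, exit code 2 blocks it.
ALLOW, BLOCK = 0, 2


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, reason) for one PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return ALLOW, ""
    tool_input = event.get("tool_input")
    command = tool_input.get("command", "") if isinstance(tool_input, dict) else ""
    for needle in BLOCKED_SUBSTRINGS:
        if needle in str(command):
            return BLOCK, f"blocked by policy: command contains {needle!r}"
    return ALLOW, ""


def run(stdin_text: str) -> tuple[int, str]:
    """Parse the raw hook input and decide; fail closed if it cannot be parsed."""
    try:
        event = json.loads(stdin_text)
    except json.JSONDecodeError:
        return BLOCK, "blocked by policy: hook input was not valid JSON"
    if not isinstance(event, dict):
        return BLOCK, "blocked by policy: hook input was not a JSON object"
    return decide(event)


# Wiring in a real hook script (shown as a comment so the sketch has no side effects):
#   code, reason = run(sys.stdin.read())
#   if reason:
#       print(reason, file=sys.stderr)
#   sys.exit(code)
```

Note that this hook only blocks or allows; it never rewrites the payload. Even a correct script like this one provides no protection on the calls where the hook does not fire at all.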

What actually happens

Hooks fail in 25+ confirmed ways across five categories. No single hook event is a reliable enforcement point.

1. Silent non-firing. PreToolUse and PostToolUse hooks intermittently fail to trigger. The hook is configured, the tool runs, and the hook simply does not execute. No error, no log entry. The operation proceeds as if the hook does not exist. This is the most dangerous category because it is invisible.

2. Ignored decisions. When a hook returns a permission decision (allow, deny, or ask the user to confirm), including on the PermissionRequest event, the decision is sometimes ignored. The tool call proceeds regardless of what the hook decided. A hook that correctly identifies a dangerous operation and returns "deny" may therefore have no effect.

3. Platform breakage. Windows has 5+ distinct hook failure modes that do not appear on macOS or Linux. Path handling, process spawning, and signal propagation all behave differently. A hook that works reliably on a Mac development machine may fail silently on a Windows CI server.

4. Data corruption. Hooks that modify tool arguments can produce corrupted output under certain conditions. The modified arguments may be partially applied, double-escaped, or silently dropped. This is especially dangerous for hooks that sanitize inputs -- the sanitization itself can introduce new problems.

5. Architectural constraints. Sessions can lose all plugin hook enforcement after compaction. When Claude Code compacts its context (which happens automatically in long sessions), hook state can be lost, so a session that started with full hook protection may silently continue without it. The only mitigation is the PostCompact hook, which is itself subject to the same non-firing issues.
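
Returning to category 4: for hooks that must modify arguments, one way to detect partial application, double-escaping, or dropped fields is to fingerprint the arguments the hook intended to pass and compare that fingerprint with what a later stage actually observed. The sketch below is an illustration only; `fingerprint` and `args_intact` are hypothetical helpers, not part of Claude Code.

```python
import hashlib
import json


def fingerprint(args: dict) -> str:
    """Stable SHA-256 digest of a JSON-serializable argument dict."""
    # Sorting keys and fixing separators makes the digest independent of key order.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def args_intact(intended: dict, observed: dict) -> bool:
    """True if the observed arguments match what the hook intended to send."""
    return fingerprint(intended) == fingerprint(observed)
```

In use, the modifying hook would record `fingerprint(intended)` somewhere outside the session, and a later check would recompute the digest from the arguments the tool actually received. A mismatch flags corruption; it does not say which stage caused it.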

The net effect: any security policy that relies on a single hook firing reliably will eventually fail. Defense-in-depth -- using multiple hook events, external monitoring, and independent verification -- is the minimum viable approach.
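
The "will eventually fail" claim follows from simple arithmetic. As a worked example under assumed numbers (the miss rate below is hypothetical, not a measured value): if a hook silently fails to fire on a fraction p of tool calls, the chance of at least one unchecked call in n calls is 1 - (1 - p)^n. Stacking layers helps only to the extent that their failures are independent, which is itself an assumption.

```python
def p_any_miss(p_miss: float, n_calls: int) -> float:
    """Probability that at least one of n_calls goes unchecked."""
    return 1.0 - (1.0 - p_miss) ** n_calls


def p_all_layers_miss(p_miss_per_layer: list[float]) -> float:
    """Per-call probability that every layer misses, assuming independent layers."""
    result = 1.0
    for p in p_miss_per_layer:
        result *= p
    return result


# Hypothetical 0.1% miss rate per call, over 1000 tool calls.
single = p_any_miss(0.001, 1000)  # roughly 0.63

# Two independent layers, each with the same hypothetical miss rate.
layered = p_any_miss(p_all_layers_miss([0.001, 0.001]), 1000)  # roughly 0.001
```

Under these assumed numbers, a single hook is more likely than not to miss at least one call in a long session, while two independent layers reduce that to about one in a thousand. If the layers share a failure cause, such as both being lost at compaction, the independence assumption breaks and the real benefit is smaller.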

What to do instead

1. Never rely on a single hook for security enforcement. Use PreToolUse AND PostToolUse AND external monitoring. Assume each individual hook has a non-zero chance of not firing.

2. Add the PostCompact hook to re-inject critical configuration after context compaction. This is the structural mitigation for the compaction-loss problem.

3. Test hooks on your deployment platform. If you develop on macOS and deploy on Windows or Linux, test hooks in the target environment. Platform-specific failures are common.

4. Monitor hook execution independently. Have hooks log to an external file or service. If the log entry is missing, the hook did not fire. This gives you visibility into silent non-firing.

5. Keep hooks simple. Hooks that modify arguments are more likely to produce data corruption. Prefer hooks that block or allow without modifying the payload.
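
The monitoring advice above can be sketched as an append-only log that every hook writes to, plus a check that pairs PreToolUse records with PostToolUse records. This is a minimal sketch under assumptions: the JSON-lines record shape and the `call_id` used to pair events are hypothetical, so substitute whatever identifier your hook input actually provides.

```python
import json
import time
from pathlib import Path


def log_hook_fire(log_path: Path, event_name: str, call_id: str) -> None:
    """Append one JSON line recording that a hook ran."""
    record = {"ts": time.time(), "event": event_name, "call_id": call_id}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def unpaired_calls(log_path: Path) -> dict[str, set[str]]:
    """Return the call_ids that were seen by only one of the two hook events."""
    seen: dict[str, set[str]] = {"PreToolUse": set(), "PostToolUse": set()}
    for line in log_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("event") in seen:
            seen[record["event"]].add(record["call_id"])
    return {
        "pre_only": seen["PreToolUse"] - seen["PostToolUse"],
        "post_only": seen["PostToolUse"] - seen["PreToolUse"],
    }
```

A `post_only` entry means a tool ran without the PreToolUse hook logging, which is the silent non-firing case. A `pre_only` entry is ambiguous: the call may have been blocked or failed, or the PostToolUse hook may not have fired. Calls where neither hook fired leave no trace here, which is why an external record of tool calls is still needed.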

Environments tested

Tool	Version	Result
Claude Code	v2.x	25+ failure modes confirmed across 5 categories

Confidence and gaps

Confidence: empirical -- failure modes confirmed through runtime testing across multiple sessions and platforms. Silent non-firing and ignored decisions observed directly. Windows-specific failures documented via issue reports.

Falsification criterion: This claim would be disproved by demonstrating that PreToolUse hooks fire with 100% reliability across 1000+ tool calls in a single session, including after context compaction events, on all supported platforms.
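
Evaluating this criterion requires a count of tool calls from a source independent of the hooks, compared against the hook's own log for the same calls. A sketch of that tally follows; both inputs are treated as hypothetical lists of call identifiers, and the 1000-call threshold is taken from the criterion above.

```python
def fire_rate(expected_call_ids: list[str], logged_call_ids: list[str]) -> tuple[float, list[str]]:
    """Return (fraction of expected calls the hook logged, list of missed call ids)."""
    if not expected_call_ids:
        raise ValueError("need at least one expected tool call")
    logged = set(logged_call_ids)
    missed = [cid for cid in expected_call_ids if cid not in logged]
    rate = 1.0 - len(missed) / len(expected_call_ids)
    return rate, missed


def criterion_met(expected_call_ids: list[str], logged_call_ids: list[str], min_calls: int = 1000) -> bool:
    """True only if the sample is large enough and no expected call was missed."""
    _, missed = fire_rate(expected_call_ids, logged_call_ids)
    return len(expected_call_ids) >= min_calls and not missed
```

A single missed identifier is enough to fail the check, and a run shorter than the threshold cannot pass it regardless of the rate. The criterion also requires the run to span compaction events and all supported platforms, which this tally does not verify.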

Open questions: Is Anthropic tracking hook reliability metrics internally? Will future Claude Code versions add hook execution guarantees? Are there plans to address the post-compaction enforcement loss?

