Error suppression patterns that are harmless to humans are catastrophic to agents
From Theory Delta | Methodology | Published 2026-03-29
What the docs say
Shell scripting guides treat 2>/dev/null, || true, and set +e as style preferences or minor code smells. Python documentation presents broad except: blocks as acceptable for quick scripts. Neither human-oriented guide flags these as structurally dangerous — the assumption is that the person running the script will notice if something looks wrong.
What actually happens
Agents have exactly one feedback channel: exit code + stdout/stderr. When error suppression patterns destroy this channel, agents cannot detect, diagnose, or recover from failures. Unlike humans, agents cannot observe unexpected side effects, notice a file is missing, or apply intuition that “something feels wrong.” Exit code 0 with empty stderr means success — there is no second opinion.
The “Tools Fail” paper (arXiv:2406.19228) provides the strongest quantitative evidence. When tools returned incorrect output without error signals, GPT-3.5 accuracy dropped from 98.7% to 22.7% — a 76 percentage point collapse. The failure mode is not “agent notices something is wrong but cannot fix it.” It is “agent does not notice anything is wrong.” Models copied incorrect tool output and generated hallucinated justifications for why it was correct.
This pattern appears across the full agent toolchain, not just bash scripts:
SDK layer: Vercel AI SDK streamText() silently swallows errors by default — no logs, no stack trace unless an onError callback is supplied. Fixed in v4.1.22, but the silent default shipped in all prior versions.
CLI layer: OpenAI Codex CLI issue #9091 documents a regression between v0.36 and v0.72 where all commands returned exit code 0 with empty stdout and stderr. The CLI executed commands but output capture was broken. The agent could not observe any command output and could not detect whether commands succeeded or failed. Root cause: local shell environment configuration — but the failure mode (stdout invisible to agent) is the same regardless of cause. Related issues #7317, #6033, and #6991 report the same class of problem, confirming it is a recurring failure mode in agent tooling.
Shell layer: Bash 3.2 (macOS default) silently crashes agent scripts on empty array expansion under set -u — ${arr[@]} expands to nothing and the script terminates with an unbound variable error. This affected 27 sites in production agent scripts before detection. Scripts that passed CI on Linux (bash 5.x) silently failed on macOS developer environments. Bash 3.2 also does not support declare -A associative arrays — scripts using them return exit code 0 with no output rather than surfacing the unsupported syntax error.
Observability layer: AgentOps silent 401 authentication failures occur without visible error output by default. An agent using AgentOps for observability may be operating without any telemetry while believing instrumentation is active. The observability layer itself silently fails — compounding the problem that silent failures in agent scripts are already hard to detect.
CI layer: SFEIR Institute documentation on Claude Code headless mode reports approximately 8% of CI/CD executions produce a silent failure: exit code 0 returned, but response is empty, incomplete, or irrelevant. “Always validate response content — an exit code 0 does not guarantee a usable response.”
Compounding factor: Dagster’s “Dignified Python” analysis found that LLMs default to broad exception swallowing because try: ... except: pass patterns dominate training data. Agents writing scripts actively introduce error suppression unless their context explicitly prohibits it. This creates a compounding failure: agents both cannot detect silent errors and generate code that produces them.
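The shell-layer failure above has a portable workaround. A minimal sketch of the `${arr[@]+...}` guard, which expands to the array's elements when any exist and to nothing when the array is empty, and behaves the same from bash 3.2 through 5.x:

```bash
#!/bin/bash
set -euo pipefail

# BAD on bash 3.2 (and 4.0-4.3): with set -u, "${arr[@]}" on an empty
# array aborts with "unbound variable". The ${arr[@]+...} guard expands
# to the elements when present and to nothing when empty, on every version.
arr=()
set -- ${arr[@]+"${arr[@]}"}
echo "empty array: $# arguments"      # 0 arguments, no crash

arr=(one two)
set -- ${arr[@]+"${arr[@]}"}
echo "populated array: $# arguments"  # 2 arguments
```

The same guard works anywhere an array is expanded, e.g. `some_command ${flags[@]+"${flags[@]}"}`.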
The difference between human and agent execution is not degree but kind:
| Property | Human-executed script | Agent-executed script |
|---|---|---|
| Visual feedback | Terminal output visible | None — only structured return value |
| Error detection | Intuition, domain knowledge | Exit code + stderr only |
| Recovery path | Re-run, inspect logs, ask a colleague | Retry same command, or hallucinate success |
| Suppression impact | Inconvenient — human will eventually notice | Catastrophic — agent proceeds on false premise |
Note: the observatory prevalence claim (58% of repos exhibit error suppression) is sourced from an internal scan tagged as [legacy, unverified] in the source block and should not be treated as a confirmed statistic until independently reproduced.
This finding would be disproved by: a controlled study showing agent task accuracy is not significantly degraded by silent tool failures in real-world (non-synthetic) codebases, or by agent architectures that achieve >80% accuracy on tasks where tool failures are silently suppressed, at a scale comparable to the Tools Fail paper.
What to do instead
Add set -euo pipefail as the first line of every agent-executed bash script. This converts the largest class of silent failures into explicit errors that agents can detect and act on. The pipefail flag is especially important: without it, failing_command | grep pattern returns the exit code of grep, not the failing command.
```bash
#!/bin/bash
set -euo pipefail
```
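A two-line demonstration of what pipefail changes:

```bash
#!/bin/bash
# A pipeline's default exit status is the LAST command's, so an upstream
# failure vanishes. pipefail propagates the first failing status instead.
default=$(bash -c 'false | cat; echo $?')
strict=$(bash -c 'set -o pipefail; false | cat; echo $?')
echo "without pipefail: $default"   # without pipefail: 0
echo "with pipefail:    $strict"    # with pipefail:    1
```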
For commands where non-zero exit is not an error (e.g., grep returning 1 for “no matches”), use targeted handling with logging rather than blanket suppression:
```bash
# BAD: suppresses all error information
command 2>/dev/null || true

# GOOD: capture and surface errors explicitly
if ! output=$(command 2>&1); then
  echo "ERROR: command failed with: $output" >&2
  exit 1
fi

# Acceptable: grep exit 1 means "no matches", which is not an error here.
# Treat only exit 1 as success; real errors (grep exit 2) still fail.
matches=$(grep -r "pattern" dir/) || [ $? -eq 1 ]
```
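Beyond strict mode, an ERR trap can report which command failed before set -e aborts the script, giving the agent a concrete diagnostic on stderr instead of a bare non-zero exit. A sketch (the trap's message wording is illustrative):

```bash
#!/bin/bash
set -euo pipefail

# Sketch: an ERR trap logs WHICH command failed before set -e exits.
demo=$(mktemp)
cat > "$demo" <<'EOF'
set -euo pipefail
trap 'echo "ERROR: \"$BASH_COMMAND\" failed at line $LINENO" >&2' ERR
false  # stands in for any failing command
EOF

bash "$demo" 2>&1 || true   # prints: ERROR: "false" failed at line 3
rm -f "$demo"
```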
For Python code agents execute or write:
```python
import sys

# BAD: swallows all errors (a bare except even catches KeyboardInterrupt)
try:
    result = risky_operation()
except:
    pass

# GOOD: catch specific exceptions, surface the error, re-raise
try:
    result = risky_operation()
except SpecificError as e:
    print(f"ERROR: {e}", file=sys.stderr)
    raise
```
When agents write scripts, include an explicit rule in their context: “Never use 2>/dev/null, || true, or bare except: blocks. Use set -euo pipefail. Catch specific exceptions and re-raise or log to stderr.”
The Tools Fail paper also evaluated a cheap mitigation: adding a simple disclaimer (“The tool can sometimes give incorrect answers”) to tool descriptions improved silent error detection by roughly 30 percentage points (GPT-3.5: from 23% to 53% zero-shot). This can be added to tool definitions without changing the tool implementation.
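The SFEIR guidance above — validate response content, not just exit code — can be encoded as a wrapper that rejects exit code 0 with empty output. A sketch, where `run_agent` is a hypothetical helper (the `echo`/`true` calls below are stand-ins; substitute your actual headless agent invocation):

```bash
#!/bin/bash
set -euo pipefail

# Sketch: treat "exit code 0 but empty output" as a failure in CI.
run_agent() {
  local response
  if ! response=$("$@" 2>&1); then
    echo "ERROR: agent command failed: $response" >&2
    return 1
  fi
  # Exit 0 alone is not success: require non-whitespace output too.
  if [ -z "$(printf '%s' "$response" | tr -d '[:space:]')" ]; then
    echo "ERROR: exit code 0 but empty response" >&2
    return 1
  fi
  printf '%s\n' "$response"
}

run_agent echo "hello from agent"  # non-empty output: passes through
if run_agent true; then            # exit 0 but NO output: rejected
  echo "unexpected success"
else
  echo "caught silent failure"
fi
```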
Environments tested
| Tool | Version | Result |
|---|---|---|
| GPT-3.5 (OpenAI) | Unspecified (2024) | independently-confirmed: 98.7% → 22.7% accuracy with silent tool failures (arXiv:2406.19228) |
| Claude Code (Anthropic) | March 2026 headless | source-reviewed: ~8% of CI/CD runs return exit code 0 with empty/irrelevant response (SFEIR Institute) |
| Codex CLI (OpenAI) | v0.36–v0.72 regression window | source-reviewed: all commands returned exit code 0 with empty stdout/stderr (#9091) |
| Vercel AI SDK | < v4.1.22 | source-reviewed: streamText() swallows errors silently by default (#4726) |
Confidence and gaps
Confidence: medium — primary quantitative evidence from one academic paper (arXiv:2406.19228), independently confirmed by GitHub issues in three separate tools. Not tested by execution in Theory Delta’s environment. The Tools Fail paper uses synthetic calculator tools, not real-world bash scripts; the magnitude of accuracy degradation in production agentic systems may differ.
Strongest case against: The 76pp accuracy drop is from a controlled study with synthetic faulty tools, not production code. Real agent pipelines have more context, more tool calls, and more opportunities for indirect failure detection than the paper’s task design. The Codex CLI issue (#9091) was caused by local shell environment configuration, not a systematic platform bug. Experienced shell programmers already know set -euo pipefail — this finding may not change behavior for practitioners who already follow strict mode conventions.
Open questions: Does the accuracy degradation replicate on more capable models (Claude 3.5+, GPT-4o)? Does set -euo pipefail actually improve end-task accuracy in agent-executed scripts, or just surface more visible errors? What is the real-world prevalence of error suppression in agent-executed scripts (the internal observatory stat is unverified)?
Seen different? Contribute your evidence — theory delta is what makes this knowledge base work.