Theory Delta
Confidence: empirical -- 6 claims, 3 runtime-tested, falsifiable. Evidence scored by rubric.
Published 2026-03-01, verified 2026-03-22.

LLM gateway features silently fail -- no exception, just wrong behavior

From Theory Delta | Methodology | Published 2026-03-01

What the docs say

LLM gateways like LiteLLM, Portkey, and OpenRouter advertise production-grade features: budget enforcement, fallback routing, prompt caching, guardrails, and multi-provider load balancing. The pitch is simple -- put a gateway in front of your LLM calls and get reliability, cost control, and provider flexibility for free.

What actually happens

Gateway features are probabilistic, not deterministic. They work most of the time, but when they fail, they fail silently -- no exception, no error, just wrong behavior.

8 of 10 recent LiteLLM failures produce wrong behavior with no exception raised -- wrong counters, ignored parameters, broken guardrails. The failure modes include:

LiteLLM has a ~300 RPS ceiling caused by the Python GIL. This ceiling is not documented. Above it, you get increased latency and silently dropped budget increments, not an error message.
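The mechanism behind dropped budget increments is a classic lost update: a non-atomic read-modify-write on a shared counter under concurrency. The sketch below is illustrative only -- the counter classes are hypothetical and are not LiteLLM's actual code -- but it shows how increments vanish without any exception, and how a lock restores correctness.

```python
import threading
import time

class NaiveCounter:
    """Budget counter updated with a non-atomic read-modify-write."""
    def __init__(self):
        self.value = 0

    def add(self, amount):
        current = self.value           # read
        time.sleep(0)                  # yield the GIL: another thread can interleave here
        self.value = current + amount  # write back -- may clobber a concurrent update

class LockedCounter:
    """Same counter, but the update is made atomic with a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        with self._lock:
            self.value += amount

def hammer(counter, n_threads=8, increments=500):
    """Increment the counter from several threads and return the final value."""
    def worker():
        for _ in range(increments):
            counter.add(1)
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

expected = 8 * 500
naive = hammer(NaiveCounter())
locked = hammer(LockedCounter())
print(f"expected={expected} naive={naive} locked={locked}")
```

The naive counter typically finishes below the expected total -- and nothing raises. That is the shape of the failure: the program completes "successfully" with a wrong number.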

Bifrost (Rust-based, designed to fix LiteLLM's performance ceiling) has 3 confirmed failures of its own. Switching to a faster gateway does not eliminate the silent-failure pattern -- it only changes which features fail silently.

What to do instead

  1. Treat every gateway feature as unverified until you test it under your actual load. Budget enforcement, fallback routing, and caching all need load testing, not just integration testing.
  2. Add independent monitoring for the features you rely on. If you use budget enforcement, track spend independently. If you use fallback routing, monitor which provider actually served each request. Do not trust the gateway's self-reported metrics.
  3. For high-throughput workloads (>200 RPS), evaluate Bifrost or TensorZero instead of LiteLLM. But test their specific failure modes too.
  4. Keep gateway config minimal. Each feature you enable is a feature that can fail silently. Use the gateway for routing and let purpose-built tools handle guardrails, budgets, and caching.
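Recommendation 2 -- independent spend tracking -- can be as small as a ledger you update from the provider's own usage numbers after each call. A minimal sketch follows; the price table and token counts are made-up assumptions, and real per-model prices must come from your provider's pricing page.

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices (USD). Substitute real prices for your models.
PRICE_PER_1K = {"gpt-4o-mini": {"input": 0.00015, "output": 0.0006}}

@dataclass
class SpendLedger:
    """Spend counter maintained outside the gateway, from provider-reported usage."""
    total_usd: float = 0.0
    calls: int = 0

    def record(self, model, prompt_tokens, completion_tokens):
        p = PRICE_PER_1K[model]
        cost = (prompt_tokens / 1000) * p["input"] + (completion_tokens / 1000) * p["output"]
        self.total_usd += cost
        self.calls += 1
        return cost

ledger = SpendLedger()
# After each gateway call, record the usage block from the raw provider response
# yourself, rather than trusting the gateway's self-reported spend:
ledger.record("gpt-4o-mini", prompt_tokens=1200, completion_tokens=300)
print(round(ledger.total_usd, 6), ledger.calls)
```

Comparing this ledger against the gateway's budget counter at regular intervals turns a silent counter drift into a visible discrepancy alert.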

Environments tested

Tool        Version            Result
LiteLLM     1.55+              8 of 10 failures produce wrong behavior with no exception raised
Bifrost     latest (Mar 2026)  3 confirmed failures in Rust gateway
OpenRouter  latest (Mar 2026)  Routing abstraction reviewed

Confidence and gaps

Confidence: empirical -- failure modes confirmed through runtime testing of LiteLLM under concurrent load. Bifrost failures confirmed through source review and issue tracking.

Falsification criterion: This claim would be disproved by demonstrating that LiteLLM budget enforcement maintains >95% counter accuracy under concurrent load (>50 RPS), or that fallback routing correctly avoids unhealthy providers in all tested scenarios.
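The counter-accuracy check in the falsification criterion can be run as a concurrent load test: fire N requests of known cost, then divide the gateway-reported spend by the true total. The sketch below simulates the gateway with a stub (`fake_gateway_call` and `REPORTED` are stand-ins, not any real gateway API); to run the real test, replace the stub with calls through your gateway and read its spend endpoint.

```python
import asyncio
import random

# Stand-in for the gateway's self-reported spend counter.
REPORTED = {"spend": 0.0}

async def fake_gateway_call(cost=0.01):
    """Simulated gateway call whose budget increment is occasionally lost."""
    await asyncio.sleep(random.uniform(0, 0.001))
    if random.random() > 0.1:       # 10% of increments silently dropped
        REPORTED["spend"] += cost
    return "ok"

async def measure_accuracy(n=500, cost=0.01):
    """Issue n concurrent calls and return reported spend / true spend."""
    await asyncio.gather(*(fake_gateway_call(cost) for _ in range(n)))
    return REPORTED["spend"] / (n * cost)

accuracy = asyncio.run(measure_accuracy())
print(f"counter accuracy: {accuracy:.1%}")
```

An accuracy consistently above 95% under >50 RPS of real traffic would disprove the claim; anything below it is a silent drop made measurable.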

Open questions: Does TensorZero avoid the silent failure pattern? What is the actual RPS ceiling for Bifrost before its failure modes appear? Has any gateway implemented end-to-end observability that would surface these silent failures?

Seen different? Contribute your evidence -- theory delta is what makes this knowledge base work.

