CrewAI closed tool fabrication, broken delegation, and SQL injection as not-planned
What you expect
CrewAI markets itself as “AI agents that actually work.” The framework’s role-based abstractions — agents with roles, goals, backstories, and assigned tools — are supposed to orchestrate reliable work across multi-step workflows. Tool execution is expected to produce real side effects: files written, APIs called, searches run. When verbose logging shows a tool call with arguments and an observation, the tool ran.
What actually happens
Tool fabrication: the framework has no verification layer
Agents produce valid-looking tool execution traces — tool name, arguments, observations — without the tool ever being invoked. The LLM generates plausible fake output. The framework treats the LLM’s response string as proof of execution; it has no independent layer that checks whether the tool actually ran.
Issue #3154 (62 comments) documents this as a confirmed failure pattern. Practitioners traced it using Phoenix: tool activity telemetry shows zero invocations despite the agent reporting successful execution. Fabrication is especially prevalent with non-OpenAI models — the tool-calling implementation is coupled to OpenAI’s function-calling format. Two open PRs (#3378, #4077) propose fixes. Neither has been merged.
Maintainer response: closed not-planned on April 19, 2026. No framework-level fix is coming.
Reproduced on v1.14.4 (runtime-tested, 2026-05-03): Running crew.kickoff() in an ephemeral Apple container against a crew with a logged tool (VerifiedTool._run() writes a flag on invocation), the crew completed and returned output that referenced tool execution — but _run() was never called (TOOL_INVOKED=False). Additional observation: CrewAI passed tools=[] to the LLM, meaning the available tools were not surfaced to the model in this execution path. The fabrication occurs at two levels: the framework does not pass tools to the LLM, and there is no check that any referenced tool was actually dispatched.
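The reproduction above relies on a tool that records its own invocation. A minimal sketch of that idea in plain Python follows. `InvocationLedger`, `track`, `unverified`, and `write_report` are hypothetical names introduced for illustration and are not part of CrewAI; in a real crew the tracked body would sit inside the tool's `_run()` method.

```python
import functools
from dataclasses import dataclass, field


@dataclass
class InvocationLedger:
    """Records tool calls that really executed, independent of any LLM output."""
    calls: list = field(default_factory=list)

    def track(self, tool_name):
        """Decorator: append tool_name to the ledger on every real call."""
        def decorator(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                self.calls.append(tool_name)  # fires only on real execution
                return fn(*args, **kwargs)
            return wrapper
        return decorator

    def unverified(self, claimed_tools):
        """Return tool names the agent claims to have used but that never ran."""
        invoked = set(self.calls)
        return [name for name in claimed_tools if name not in invoked]


ledger = InvocationLedger()


@ledger.track("write_report")
def write_report(text: str) -> str:
    # Hypothetical tool body; in CrewAI this logic would live in _run().
    return f"wrote {len(text)} chars"
```

After a run, comparing the tool names that appear in the agent's trace against `ledger.calls` separates claimed executions from real ones. How the claimed names are extracted from the trace is left out here and depends on your logging setup.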
The not-planned pattern: four structural issues, one two-week window
Between April 17 and May 1, 2026, CrewAI maintainers closed four structural issues as not-planned:
- #4783: Hierarchical delegation permanently broken (closed Apr 17). Manager agents cannot identify or delegate to workers. The delegation tool injection logic fails during dynamic manager creation. `Process.hierarchical` silently degrades to sequential execution. This is a structural absence in the code, not a transient runtime behavior. No fix is coming.
- #3154: Silent tool fabrication (closed Apr 19). Documented above.
- #4875 — MCP per-message authentication (closed Apr 29): A compromised MCP server can inject arbitrary tool calls. No per-message auth validates that tool calls originate from a legitimate source. The framework will not implement IETF draft MCP security countermeasures — no agent identity, no message signing, no tool integrity checks.
- #4993 — SQL injection in SnowflakeSearchTool (closed May 1): User-controlled parameters are injected into SQL queries without validation in the built-in Snowflake integration. The reporter offered a fix PR; it was not merged. The vulnerability remains in the current codebase.
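The injection class described in #4993 can be illustrated with a small contrast between string-built SQL and parameter binding. This sketch uses the standard-library `sqlite3` module as a stand-in, not the Snowflake connector, and the table, function names, and payload are invented for the demonstration. It does not reproduce the actual SnowflakeSearchTool code path.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with one public and one private row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (name TEXT)")
    conn.executemany("INSERT INTO docs VALUES (?)", [("public",), ("secret",)])
    return conn


def search_unsafe(conn, term):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT name FROM docs WHERE name = '{term}'"
    return conn.execute(query).fetchall()


def search_safe(conn, term):
    # Parameter binding: the driver passes `term` as data, never as SQL.
    return conn.execute(
        "SELECT name FROM docs WHERE name = ?", (term,)
    ).fetchall()
```

With the payload `x' OR '1'='1`, `search_unsafe` returns every row in the table, while `search_safe` treats the whole payload as a literal name and returns nothing. Until a fix lands upstream, wrapping or replacing the built-in tool with a bound-parameter query is one way to apply the same principle.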
During the same two-week window, v1.14.4 (released Apr 30 2026) shipped new integrations: Azure OpenAI, You.com, and Tavily. The v1.14.4 release notes mention no fixes for any of the four closed issues.
MCP tools fail on first call (v1.10.1+)
CrewAI injects a security_context field into MCP tool call arguments. MCP tool schemas do not define security_context; Pydantic validation rejects the call. Every MCP tool call fails on the first invocation. Issue #4796 remains open as of Apr 19 2026 — no fix confirmed in v1.14.4.
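One way to picture the mismatch: the call arguments carry a key the tool's schema never declared. The sketch below drops undeclared keys before the arguments reach a strict validator. It is plain Python, not CrewAI or Pydantic code; `strip_undeclared_args` is a hypothetical helper, and where such a filter could be hooked into CrewAI's MCP call path is not shown and would need to be verified against your version.

```python
def strip_undeclared_args(arguments: dict, input_schema: dict) -> dict:
    """Return a copy of `arguments` limited to keys the JSON schema declares.

    `input_schema` is assumed to be a JSON Schema object with a top-level
    "properties" mapping, as MCP tool input schemas typically are.
    """
    declared = set(input_schema.get("properties", {}))
    return {key: value for key, value in arguments.items() if key in declared}


# Illustrative schema and call; the tool and its fields are invented.
search_schema = {
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}
raw_call = {"query": "release notes", "security_context": {"agent": "researcher"}}
clean_call = strip_undeclared_args(raw_call, search_schema)
```

Here `clean_call` contains only `query`, so a schema that forbids extra fields would accept it. Note that silently dropping a field named `security_context` may discard information the framework intended to pass along, so treat this as a stopgap rather than a fix.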
Silent MCP tool escalation (v1.10.1+)
When an agent’s tools parameter is None, v1.10.1+ auto-loads all registered MCP and platform tools. An agent intentionally left without tool access silently receives every ambient MCP tool. No log warning. No configuration acknowledgment. Multi-agent deployments with per-agent tool scoping must audit this after any upgrade to v1.10.1 or later.
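A post-upgrade audit can start by listing which agents leave their tool list unset. The sketch below uses a hypothetical `AgentSpec` dataclass as a stand-in for however your project describes agents; it is not a CrewAI class. Whether an explicit empty list opts an agent out of auto-loading is an assumption to verify against the version you run.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentSpec:
    """Hypothetical stand-in for an agent definition; not a CrewAI class."""
    role: str
    tools: Optional[list] = None


def agents_with_implicit_tools(specs):
    """Return roles whose `tools` is None rather than an explicit list."""
    return [spec.role for spec in specs if spec.tools is None]


specs = [
    AgentSpec("writer"),                       # tools left unset
    AgentSpec("reviewer", tools=[]),           # explicitly empty
    AgentSpec("researcher", tools=["search"]), # explicitly scoped
]
flagged = agents_with_implicit_tools(specs)
```

Only `writer` is flagged, because it is the one definition that never states its tool scope. Each flagged agent is a candidate for inheriting ambient MCP tools under the behavior described above.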
What this means for you
Tool fabrication is the worst failure mode an agent framework can have. If the framework cannot confirm that a tool actually ran, no output from any agent can be trusted without external verification. The issue is not hallucination in the traditional sense — the facts the agent reports about its own execution trace are fabricated. A monitoring dashboard built on verbose=True output will show work happening when nothing is happening.
For teams using CrewAI in production:
- Every “tool ran successfully” observation in a trace is unverified unless you have independent telemetry (Phoenix, LangSmith, or equivalent) showing the actual invocation. This is documented in #3154.
- `Process.hierarchical` is permanently broken per #4783. Workflows relying on manager-worker delegation have been running in sequential mode since at least April 2026 without indication.
- The not-planned closures are a product direction signal, not a temporary backlog. The maintainers have chosen feature velocity over structural correctness on these four issues.
- Any CrewAI deployment using the built-in SnowflakeSearchTool with user-controlled inputs is SQL-injectable per #4993 — no fix scheduled.
The ecosystem has voted with download data (Apr 2026 PyPI): LangGraph at 34.5M monthly downloads vs CrewAI’s 5.2M. The documented practitioner pattern — prototype in CrewAI, migrate production-critical workflows to LangGraph — is cost-driven and reliability-driven.
What to do
- Add logging inside every tool’s `_run()` method. If the log never fires, the tool was not called. Do not rely on the agent’s observation string as evidence of execution. This is the minimum viable defense: it catches fabrication but does not prevent it.

- Use OpenAI or Anthropic models directly for tool-heavy workflows. Custom and local LLMs trigger fabrication most often. The tool-calling layer assumes OpenAI function-calling format; diverging from it increases fabrication risk.

- For hierarchical workflows, test delegation explicitly. Add per-agent logging that records which agent’s `_run()` methods fire. If only the first agent fires, `Process.hierarchical` has degraded to sequential. Design your workflow around this reality or implement delegation at the application layer.

- If you need MCP tools with CrewAI, pin to a version before v1.10.1 or apply the `ConfigDict(extra='ignore')` workaround manually to prevent Pydantic schema rejection on `security_context` injection.

- For production workflows: evaluate LangGraph. LangGraph has 34.5M monthly downloads versus CrewAI’s 5.2M (Apr 2026 PyPI data). The documented migration pattern: prototype and iterate rapidly in CrewAI, then migrate control flow and state management to LangGraph. CrewAI’s role/crew abstractions remain usable as an inner node for crew definition; they are not reliable as the outer orchestration shell.
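The delegation test in the list above can be reduced to a set comparison once each tool logs which agent invoked it. The sketch below assumes you already record `(agent_name, tool_name)` pairs from inside each `_run()`; the function name and the log format are hypothetical and chosen for illustration.

```python
def agents_that_never_fired(expected_agents, invocation_log):
    """Return the expected agents that have no recorded tool invocation.

    `invocation_log` is an iterable of (agent_name, tool_name) tuples
    written from inside each tool's _run() during the crew run.
    """
    fired = {agent for agent, _tool in invocation_log}
    return sorted(set(expected_agents) - fired)


# Illustrative data: only the manager's tool ever ran.
expected = ["manager", "researcher", "writer"]
log = [("manager", "plan"), ("manager", "plan")]
silent = agents_that_never_fired(expected, log)
```

A non-empty result after a run that was supposed to delegate is a signal to investigate, not proof of the cause: an agent can also stay silent because its task legitimately needed no tool.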
Falsification criterion: This finding would be disproved if CrewAI reopens and fixes #3154 (tool fabrication), ships a delegation fix for Process.hierarchical, and implements an independent verification layer between agent observation strings and actual tool invocations — or if a future version’s telemetry data shows tool invocations consistently matching agent-reported traces across non-OpenAI models.
Evidence
| Tool | Version | Evidence | Result |
|---|---|---|---|
| crewAI | v1.14.4 (Apr 30 2026) | source-reviewed | Issue #3154 (62 comments) closed not-planned Apr 19; Phoenix telemetry cited in thread shows zero invocations during reported tool executions |
| crewAI | v1.14.4 (Apr 30 2026) | source-reviewed | Issue #4783 closed not-planned Apr 17; hierarchical delegation permanently abandoned |
| crewAI | v1.14.4 (Apr 30 2026) | source-reviewed | Issue #4875 closed not-planned Apr 29; MCP per-message auth will not be implemented |
| crewAI | v1.14.4 (Apr 30 2026) | source-reviewed | Issue #4993 closed not-planned May 1; SQL injection in SnowflakeSearchTool unresolved; fix PR not merged |
| crewAI | v1.10.1+ | source-reviewed | Issue #4796 open; security_context injection causes Pydantic schema rejection on every MCP tool call |
| crewAI | v1.14.4 (Apr 30 2026) | source-reviewed | v1.14.4 release notes reviewed; no fixes for any of the four not-planned issues |
Confidence: empirical — 6 source artifacts reviewed. Four issues independently confirmed by external reporters with 62, 12, and 5+ comment threads. Independent confirmation: practitioners cited Phoenix tracing data in #3154 showing zero tool invocations during fabricated traces.
Strongest case against: The not-planned closures could reflect that these failure modes only manifest on non-standard configurations — non-OpenAI models, Snowflake-specific tool usage, or edge-case MCP server setups that the core team has deprioritized because they affect a small fraction of deployments. Tool fabrication on OpenAI GPT-4o may be rare or non-existent. The 62-comment issue count is engagement, not frequency data. The download differential between LangGraph and CrewAI (34.5M vs 5.2M) measures total downloads, not active production deployments, and CrewAI is newer; the download gap may reflect maturity rather than quality.
Open questions: What is the fabrication rate on OpenAI GPT-4o specifically — does it approach zero? Have any of the not-planned issues been reopened after community pressure? Does the ConfigDict(extra='ignore') workaround for MCP fully restore functionality or introduce other failures?
Seen different? Contribute your evidence — share a repro or counter-example and we’ll review it against this finding. Reader evidence is what keeps these findings accurate.