GraphRAG's entity deduplication has a fatal bug: entities with identical names but different types are merged, corrupting multi-hop reasoning. LangGraph conditional edge routing corrupts silently via a Python dict literal footgun with no static warning. Any agent framework with a hard step cap can return raw tool output to users when the cap triggers mid-retrieval; no framework documents this or provides a built-in mitigation.
empirical | 5 claims | 3 runtime-tested | falsifiable
Evidence scored by rubric.
2026-02-27 | verified 2026-02-22 | avoid

Three Agentic RAG Failures the Docs Don't Mention: GraphRAG Entity Corruption, LangGraph Routing Footgun, and Step-Limit Raw Output

From Theory Delta | Published 2026-02-27

What the docs say

Microsoft GraphRAG presents itself as a graph-based RAG system for multi-hop reasoning over large corpora. LangGraph documents conditional edge routing via Python dict literals as the standard pattern for agent routing logic. Haystack documents max_agent_steps as a safety limit that terminates runaway agents gracefully.

What actually happens

GraphRAG merges entities with the same name, regardless of type. Issue #1718, marked fatal, documents that entities with identical names but different semantic types — "Python" as a programming language and "Python" as a snake, for example — are merged into a single graph node during indexing. Multi-hop reasoning that traverses type-differentiated entities produces hallucinated or incorrect answers because the graph has collapsed distinct entities into one. The fix is deduplication by (name, type) tuple, not name alone. No shipped fix exists as of Feb 2026. Builders using GraphRAG on domains with same-name, different-type entities (technical documentation with homonyms, biological/taxonomic data, legal entity names) cannot rely on multi-hop reasoning results until this is resolved.
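
To make the difference between the two deduplication keys concrete, here is a minimal sketch. The entity tuples, the type labels, and the dedupe helper are invented for illustration; this is not GraphRAG's internal code or schema.

```python
from collections import defaultdict

# Hypothetical extracted entities: (name, type, description).
entities = [
    ("Python", "programming_language", "A high-level language"),
    ("Python", "animal", "A large constricting snake"),
    ("Python", "programming_language", "Created by Guido van Rossum"),
]


def dedupe(entities, key):
    """Group entity descriptions under whatever key the caller chooses."""
    merged = defaultdict(list)
    for name, etype, description in entities:
        merged[key(name, etype)].append(description)
    return dict(merged)


# Name-only key: the language and the snake collapse into one node.
by_name = dedupe(entities, key=lambda name, etype: name)

# (name, type) key: the two senses stay separate.
by_name_and_type = dedupe(entities, key=lambda name, etype: (name, etype))

print(len(by_name))           # 1
print(len(by_name_and_type))  # 2
```

With the name-only key, the single "Python" node carries descriptions of both a language and a snake, which is the kind of collapsed node a multi-hop traversal would then walk through.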

Additional GraphRAG failure modes compound this: the CSV reader destroys newlines in multiline quoted fields (corrupting ingestion), and create_base_entity_graph column mismatch errors recur across versions.
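
As a reference point for what correct handling of multiline quoted fields looks like, Python's standard csv module preserves embedded newlines when the source is opened with newline="". The sample data below is made up, and this is not GraphRAG's reader; it only sketches a round-trip check you could run on your own input before indexing.

```python
import csv
import io

# Made-up sample: the first record has a newline inside a quoted field.
raw = 'id,text\n1,"first line\nsecond line"\n2,"single line"\n'

# newline="" leaves line endings untouched so the csv module can handle them.
rows = list(csv.DictReader(io.StringIO(raw, newline="")))

# A naive line split sees the quoted newline as a record boundary.
naive_lines = raw.strip().split("\n")

print(len(rows))                # 2
print(repr(rows[0]["text"]))    # 'first line\nsecond line'
print(len(naive_lines) - 1)     # 3 apparent data lines instead of 2
```

If the text that reaches your index no longer contains the embedded newline, the ingestion step has altered the field.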

LangGraph conditional edge routing corrupts silently from a Python dict literal footgun. Issues #4968, #4891, #4226, and #4258 all trace to the same root: inline docstrings placed inside Python dict literals used as conditional edge mappings become part of the dictionary key. Python concatenates adjacent string literals at compile time, so a string written as documentation next to a key fuses with that key, and the routing key silently changes at definition time. An ordinary # comment does not have this effect; only a string literal does. The failure appears as a KeyError at runtime during tool routing, sometimes swallowed entirely under async streaming. No static analysis tool warns on this. Official LangGraph documentation does not list the pattern as a known hazard.

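# BROKEN: a docstring-style string literal inside the dict fuses with the next key
routing = {
    "retrieve": retrieve_node,
    "fetches from vector store"   # adjacent string literals are concatenated,
    "answer": answer_node,        # so this key is "fetches from vector storeanswer"
}

# SAFE: keep documentation outside the dict, or in # comments only
# retrieve: fetches from vector store
routing = {
    "retrieve": retrieve_node,
    "answer": answer_node,
}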
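
A runnable check of how Python treats a string literal versus a # comment inside a dict literal may help here. The node functions are hypothetical stand-ins and no LangGraph import is involved; the behavior shown is plain Python string-literal concatenation.

```python
def retrieve_node(state):
    return "retrieved"


def answer_node(state):
    return "answered"


# A string literal used as documentation sits directly before the next key.
broken = {
    "retrieve": retrieve_node,
    "fetches from vector store"  # intended as documentation
    "answer": answer_node,
}

# A # comment is discarded by the tokenizer and cannot alter a key.
safe = {
    "retrieve": retrieve_node,  # fetches from vector store
    "answer": answer_node,
}

print(list(broken))  # ['retrieve', 'fetches from vector storeanswer']
print(list(safe))    # ['retrieve', 'answer']

try:
    broken["answer"]
except KeyError as exc:
    print(f"KeyError: {exc}")  # KeyError: 'answer'
```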

Hard step caps return raw tool output to users. When max_agent_steps triggers mid-retrieval in Haystack, the agent returns raw tool output — JSON blobs, API responses, schema dumps — directly to the user instead of a synthesized answer. Haystack Issue #10001 marks this "not planned" to fix at the framework level. This is not Haystack-specific: any agent framework that terminates on a hard step count has this failure mode. The fix requires an explicit final-answer fallback call at the application layer, injected as a catch on step-limit exit. No framework documents this or provides a built-in mitigation.

What to do instead

For GraphRAG: Patch or avoid GraphRAG on any domain with same-name, different-type entities until Issue #1718 is resolved. Use (name, type) as the deduplication key if patching. For multi-hop reasoning over type-differentiated knowledge, evaluate Graphiti (temporal graph with bi-temporal invalidation) as an alternative — see agent-memory-landscape.md.

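For LangGraph conditional edge routing: Never place docstring-style string literals inside Python dict literals used as edge mappings. Python concatenates adjacent string literals, so the stray string fuses with the neighboring key. Ordinary # comments are discarded by the tokenizer and cannot change a key, but keeping documentation on lines outside the dict makes the mapping easier to audit. Add a unit test that exercises each routing branch explicitly; a corrupted key produces a KeyError that is testable. Do not rely on type checking or static analysis to catch this, because the corrupted dict is syntactically valid.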
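
One way to write such a branch test, sketched without any LangGraph dependency: the router function, the state shape, and the find_routing_problems helper are all hypothetical, and the mapping values are plain strings standing in for node names.

```python
def route(state):
    """Hypothetical router: returns the label of the next node."""
    return "retrieve" if state.get("needs_context") else "answer"


def find_routing_problems(router, cases, path_map):
    """Compare router outputs against the mapping keys and report mismatches."""
    problems = []
    seen = set()
    for state, expected in cases:
        label = router(state)
        seen.add(label)
        if label != expected:
            problems.append(f"router returned {label!r}, expected {expected!r}")
        if label not in path_map:
            problems.append(f"label {label!r} has no entry in the path map")
    for key in path_map:
        if key not in seen:
            problems.append(f"path map key {key!r} is never produced by the router")
    return problems


# One case per branch: (input state, expected label).
cases = [({"needs_context": True}, "retrieve"), ({"needs_context": False}, "answer")]

good_map = {"retrieve": "retrieve_node", "answer": "answer_node"}
bad_map = {
    "retrieve": "retrieve_node",
    "fetches from vector store"
    "answer": "answer_node",
}

print(find_routing_problems(route, cases, good_map))  # []
for problem in find_routing_problems(route, cases, bad_map):
    print(problem)
```

Checking both directions (every label has an entry, every entry is reachable) catches the fused key twice: once as a missing "answer" entry and once as an unreachable key.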

For step-cap raw output: Wrap the agentic loop in an application-layer catch that detects step-limit exit (catch the framework's step-limit exception or check the exit reason) and forces a final synthesis call before returning to the user:

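# Pseudocode: StepLimitExceeded, agent, llm, and partial_results are placeholders
# for whatever your framework exposes; they are not real library names.
try:
    result = agent.run(query, max_steps=N)
except StepLimitExceeded:
    # Force one final synthesis call instead of returning raw tool output
    prompt = f"Summarize what you have found so far: {agent.partial_results}"
    result = llm.generate(prompt)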

This pattern applies to any framework with a hard step cap — Haystack, LangGraph, CrewAI, or custom loops.
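
For a custom loop, the fallback can be sketched end to end as follows. Everything here is a toy under stated assumptions: StepLimitExceeded, run_agent, synthesize, and answer are invented names, the tools are lambdas returning fake JSON, and synthesize stands in for a real LLM call.

```python
class StepLimitExceeded(Exception):
    """Raised by the toy loop when the step cap is hit; carries partial results."""

    def __init__(self, partial_results):
        super().__init__("step limit reached")
        self.partial_results = partial_results


def synthesize(query, partial_results):
    """Stand-in for an LLM synthesis call."""
    return f"Answer to {query!r} based on {len(partial_results)} tool result(s)"


def run_agent(query, tools, max_steps):
    """Toy agent loop: one tool call per step, hard cap on steps."""
    partial = []
    for step, tool in enumerate(tools):
        if step >= max_steps:
            raise StepLimitExceeded(partial)
        partial.append(tool(query))
    return synthesize(query, partial)


def answer(query, tools, max_steps):
    """Application-layer wrapper: always return synthesized text to the user."""
    try:
        return run_agent(query, tools, max_steps)
    except StepLimitExceeded as exc:
        # Fallback: synthesize from whatever was gathered before the cap.
        return synthesize(query, exc.partial_results)


tools = [lambda q: '{"hits": 3}', lambda q: '{"hits": 5}']
print(answer("q", tools, max_steps=1))  # Answer to 'q' based on 1 tool result(s)
print(answer("q", tools, max_steps=5))  # Answer to 'q' based on 2 tool result(s)
```

Carrying the partial results on the exception is one design choice; a framework that reports an exit reason instead of raising would need the same fallback placed after the return-value check.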

Environments tested

Tool | Version | Result
microsoft/graphrag | Feb 2026 | Entity dedup merges same-name/different-type entities; multi-hop reasoning corrupted (Issue #1718, marked fatal)
langchain-ai/langgraph | 0.5.x | Dict literal docstring footgun → KeyError at runtime, no static warning (#4968, #4891, #4226)
deepset-ai/haystack | 2.x | Step-limit exit returns raw tool output to users; marked not planned to fix (Issue #10001)

Confidence and gaps

Confidence: empirical — three independent failure modes each confirmed via open GitHub issues with reproducers, tested in their respective environments as of Feb 2026.

Open questions: Has GraphRAG Issue #1718 shipped a fix after Feb 2026? Does the LangGraph dict literal footgun affect all conditional edge patterns or only specific LangGraph versions? Does the step-limit raw-output failure appear under LangGraph's recursion_limit as well as Haystack's max_agent_steps?

This claim would be disproved by observing: A GraphRAG release that correctly separates same-name/different-type entities at index time, confirmed by a test with homonymous entities across two types where multi-hop reasoning returns type-correct results. Or a LangGraph release that statically warns or raises at definition time when docstrings appear inside conditional edge dict literals.
