---
slug: graph-rag
type: pattern
summary: "Retrieve from a knowledge graph instead of a flat document store, so an agent can chain facts across relationships to answer multi-hop questions."
created: 2026-07-01
updated: 2026-07-01
related:
  retrieval:
    relation: specializes
    note: "GraphRAG is retrieval over a graph rather than a flat document store; the retrieve-augment-generate cycle is the same, the index shape is not."
  context-engineering:
    relation: uses
    note: "Graph retrieval is one context-assembly strategy within the broader discipline of managing what the model sees."
  codebase-map:
    relation: related
    note: "A codebase map is a hand-built structural index of one corpus; GraphRAG generalizes that idea to any corpus and lets the agent traverse it at query time."
  memory:
    relation: contrasts-with
    note: "Memory persists agent-specific learnings across sessions; GraphRAG queries a shared, structured corpus at query time."
  rag-poisoning:
    relation: threatened-by
    note: "Graph nodes and edges are populated from source documents, so a poisoned corpus can plant false facts and false relationships."
  prompt-injection:
    relation: threatened-by
    note: "Node text and edge labels can carry injected instructions just as retrieved documents can."
---

# GraphRAG

> **Pattern**
>
> A named solution to a recurring problem.

*GraphRAG retrieves from a knowledge graph instead of a flat document store, so an agent can chain facts across explicit relationships to answer questions a single similarity search cannot.*

*Also known as: Graph Retrieval-Augmented Generation, Knowledge-Graph RAG*

Ordinary [Retrieval](retrieval.md) treats your corpus as a bag of chunks. Ask it a question, and it hands back the passages that look most like the question. That works well when the answer sits inside a passage. It falls apart when the answer lives in the *connections between* passages. Ask "which service that my team owns depends on a library another team just deprecated?" and no single chunk says that.
The answer is a path: your team owns service A, service A imports library L, library L is marked deprecated by another team. GraphRAG is what you reach for when the answer is a path rather than a paragraph.

## Understand This First

- [Retrieval](retrieval.md): GraphRAG is a specialization of retrieval; understand flat retrieval first.
- [Context Engineering](context-engineering.md): graph retrieval is one technique within the broader discipline of managing what the model sees.

## Context

At the **agentic** level, GraphRAG sits alongside flat [Retrieval](retrieval.md) as the second major way to bring outside knowledge into an [agent](agent.md)'s [context window](context-window.md). Flat retrieval indexes documents as vectors and matches by similarity. GraphRAG indexes the corpus as a graph: entities become nodes, the relationships between them become typed edges, and retrieval becomes traversal.

You reach for it once your agent moves past single-document question answering and starts reasoning over a body of knowledge that has structure: a codebase with its dependency web, an organization's services and the teams that own them, a research corpus where papers cite and contradict each other. In all of these, the facts you care about are relationships, and relationships are exactly what a flat vector store throws away when it chops the corpus into independent chunks.

## Problem

How do you answer a question whose answer requires chaining several facts together, when no single document contains the chain?

A developer asks their agent, "If we upgrade the payments library to v3, what breaks?" The answer requires walking a chain: the payments library is used by the checkout service, the checkout service exposes an endpoint the mobile app calls, and the mobile app's contract test asserts the old response shape. Each of those facts lives in a different file, and flat retrieval will happily return the three most payments-related chunks it can find.
Those are probably three paragraphs about payments, none of which mention the mobile app at all. The similarity score can't know that the *connection* between checkout and mobile is what the question turns on.

This is **multi-hop reasoning**: the answer is more than one relationship-step away from the question.

## Forces

- **Similarity is not connection.** Two facts can be highly relevant to each other and yet look nothing alike as text. Vector search scores surface similarity; it cannot see that fact A implies fact B.
- **Chunking severs relationships.** Splitting documents into independent passages is what makes flat retrieval scalable, and it is also what destroys the links between them.
- **Graphs cost more to build and maintain.** Extracting entities and relations from raw documents, keeping the graph fresh as the corpus changes, and storing it are real engineering work that a flat vector index does not require.
- **Traversal can explode.** Follow every edge from every node and you retrieve the whole graph. The retrieval has to be bounded to a subgraph, not the universe.
- **Not every question needs a path.** Many queries are answered by a single passage. Building a graph for those is expensive overkill.

## Solution

**Model the corpus as entities and typed relationships, then retrieve a connected subgraph instead of a ranked list of chunks.** GraphRAG keeps the retrieve-augment-generate cycle of flat retrieval, but changes what gets indexed and what gets returned.

**Build the graph.** Process the corpus once to extract entities (the nodes) and the relationships between them (the typed edges). For a codebase, nodes are files, functions, and modules; edges are "imports," "calls," and "tests." For an organization, nodes are teams, services, and libraries; edges are "owns," "depends-on," and "deprecates."
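
To make the node-and-typed-edge model concrete, here is a minimal sketch of the organizational example as a list of typed edges, with a small path query for the question from the introduction. The node ids, the `Edge` type, and the helpers `build_index` and `at_risk_services` are all hypothetical names chosen for illustration, and the data is a hand-written toy graph rather than the output of any real extraction step.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # node id, e.g. "service:checkout"
    relation: str  # edge type, e.g. "owns", "depends-on", "deprecates"
    target: str    # node id the edge points at


# Hypothetical toy data; a real graph would be produced by extraction.
EDGES = [
    Edge("team:payments", "owns", "service:checkout"),
    Edge("team:payments", "owns", "service:invoicing"),
    Edge("service:checkout", "depends-on", "library:authlib"),
    Edge("service:invoicing", "depends-on", "library:pdfgen"),
    Edge("team:identity", "deprecates", "library:authlib"),
]


def build_index(edges):
    """Index edges both ways so each typed hop is a dictionary lookup."""
    outgoing = defaultdict(set)  # (source, relation) -> set of targets
    incoming = defaultdict(set)  # (target, relation) -> set of sources
    for edge in edges:
        outgoing[(edge.source, edge.relation)].add(edge.target)
        incoming[(edge.target, edge.relation)].add(edge.source)
    return outgoing, incoming


def at_risk_services(team, outgoing, incoming):
    """Follow team -owns-> service -depends-on-> library, keeping only
    libraries that some *other* team has marked deprecated."""
    hits = []
    for service in sorted(outgoing[(team, "owns")]):
        for library in sorted(outgoing[(service, "depends-on")]):
            deprecators = incoming[(library, "deprecates")] - {team}
            for other_team in sorted(deprecators):
                hits.append((service, library, other_team))
    return hits


outgoing, incoming = build_index(EDGES)
print(at_risk_services("team:payments", outgoing, incoming))
# [('service:checkout', 'library:authlib', 'team:identity')]
```

Each tuple in the result is the full path, not just the matching service, which is the property that lets an answer carry its own chain of supporting facts. A production system would more likely keep the edges in a graph database or a graph library than in plain dictionaries; the dictionaries here only keep the sketch self-contained.
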
The extraction is usually itself a model-driven step: you prompt a model to read each document and emit the entities and relations it finds, then merge those into a single graph. Some systems also compute *community summaries*, short descriptions of tightly connected clusters, so the agent can retrieve a summary of a whole region rather than every node in it.

**Retrieve a subgraph.** When a query arrives, find the entry-point nodes (often by the same vector similarity flat retrieval uses, applied to node text) and then traverse outward along edges to a bounded depth. The result is not the top-k similar chunks; it is a connected subgraph: the entities involved, the edges between them, and any relevant community summaries. This is what carries the multi-hop structure the model needs.

**Augment and generate.** Serialize the subgraph into the context window in a form the model can read: a list of facts and relationships, or a small labeled diagram in text. The model then reasons over the *structure*, not just the content, and can follow the chain from question to answer because the chain is now sitting in its context.

> **💡 Tip**
>
> Start flat, and only add a graph when you can point to specific questions that flat retrieval keeps getting wrong: questions where you can see the answer is a chain of two or more relationships. The graph is worth its maintenance cost exactly when multi-hop questions are common, and it's dead weight when they're rare.

## How It Plays Out

A platform team runs an agent that answers "what breaks if I change this?" for a codebase of 300 services. They build a graph where nodes are services, endpoints, and shared libraries, and edges are "calls," "depends-on," and "owned-by." A developer asks about deprecating an authentication library. Flat retrieval would have surfaced the library's own docs and called it done.
The graph agent instead traverses out from the library node: it finds the eleven services that depend on it, the three of those that expose public endpoints, and the two teams that own them. The agent's answer names the blast radius and the people to notify, because the graph made the second and third hops visible.

Consider a research assistant working over a corpus of internal design documents. A product manager asks which past decisions conflict with a proposed architecture. The relevant conflict isn't stated in any one document: it's that a decision recorded in 2024 assumed synchronous processing, and a later document quietly moved to an event-driven model. Flat retrieval returns both documents if you're lucky, but the agent has to notice the contradiction on its own. With a graph that has "supersedes" and "contradicts" edges extracted across the corpus, the conflict is an edge the agent can retrieve directly, not a subtlety it has to reconstruct from two disconnected passages.

## Consequences

GraphRAG changes the retrieval question from "which passages resemble the query?" to "which connected region of the corpus does the query touch?" That buys multi-hop reasoning, and it costs you a graph to build and keep current.

**Benefits:**

- Multi-hop questions become answerable, because the relationships the answer depends on are retrieved explicitly rather than left for the model to infer.
- Answers can carry provenance as a path: not just "here is the answer" but "here is the chain of facts that leads to it," which is far easier to audit than a single opaque similarity match.
- The graph is inspectable. You can look at the extracted entities and edges, correct wrong ones, and see exactly why the agent retrieved what it did.

**Liabilities:**

- Building and maintaining the graph is ongoing work. Entity extraction is imperfect, and a stale graph gives confidently wrong answers about a corpus that has moved on.
- Extraction errors compound.
A wrong edge is worse than a missing document, because the agent will reason *across* it and reach a conclusion the corpus never supported.
- The graph is a [trust boundary](trust-boundary.md), same as any retrieval corpus. An attacker who can plant documents can plant false entities and false relationships, and node text can smuggle injected instructions the way any retrieved passage can. This is [RAG Poisoning](rag-poisoning.md) with an extra surface.
- For questions a single passage would answer, the graph adds cost and latency for nothing. GraphRAG is a specialization, not a replacement for flat [Retrieval](retrieval.md).

## Sources

The graph-based approach to retrieval-augmented generation was named and popularized by a Microsoft Research team in their 2024 report *["From Local to Global: A Graph RAG Approach to Query-Focused Summarization"](https://arxiv.org/abs/2404.16130)*, which introduced the entity-graph-plus-community-summary pipeline that most later systems build on.

The pattern rests on the older idea of the knowledge graph as a way to represent entities and their relationships for querying, a lineage that runs back through decades of work in knowledge representation and semantic-web research well before language models made automated extraction cheap.

Retrieval-augmented generation itself (the retrieve-augment-generate cycle that GraphRAG specializes) was introduced by Patrick Lewis and colleagues in *["Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks"](https://arxiv.org/abs/2005.11401)* (2020); see this book's [Retrieval](retrieval.md) entry for that foundation.

---

- [Next: ReAct](react.md)
- [Previous: Retrieval](retrieval.md)