Prompt Chaining

Pattern

A named solution to a recurring problem.

Break a task into a fixed sequence of model calls, where each call works on the previous one’s output, trading some added latency for higher accuracy.

If you have ever pasted a model’s answer back into the same model with a follow-up instruction (“now summarize that,” “now turn that into JSON,” “now check it for errors”), you have already built a prompt chain by hand. The pattern just names the thing and makes it deliberate: instead of one prompt asked to do everything at once, you write a short pipeline of focused steps and let each one do a single job well.

Understand This First

Prompt — each step in the chain is its own prompt with one focused responsibility.
Structured Outputs — each step’s output has to be parseable for the next step to consume it.
Agent — a chain is the fixed-path alternative to an agent that picks its own next step.

Context

At the agentic level, prompt chaining is the simplest of the workflow patterns: a fixed, ordered sequence of model calls, wired together so the output of one becomes the input of the next. It sits below the branching and looping workflows in the same section. Parallelization fans independent work out sideways, Orchestrator-Workers lets a lead agent invent the subtasks, Generator-Evaluator loops a writer against a judge, and Model Routing picks a different model per request. Prompt chaining is the straight line they all bend away from, and the base case worth understanding before any of them.

The distinction that matters is between a workflow and an agent. In a chain, you decide the steps and their order ahead of time; the model fills in the content of each step but never chooses what comes next. In an agent, the model decides its own path at runtime. A chain is what you reach for when the task decomposes cleanly into known subtasks and you would rather have a predictable pipeline than an open-ended loop.

Problem

A single prompt that asks a model to do several things at once (research, then outline, then draft, then fact-check, then format) tends to do all of them at half quality. The model spreads its attention across the whole job, drops requirements, and produces output you can’t easily inspect when one part comes out wrong. You’re left re-running the entire prompt and hoping the next roll of the dice is better.

How do you get a model to do a multi-part task reliably, in a way you can debug step by step, without handing the whole thing to an autonomous agent you then have to supervise?

Forces

Accuracy vs. latency: splitting one call into five focused calls raises quality but adds round-trips, so the chain is slower than a single prompt.
Decomposability: chaining only helps when the task breaks into a fixed sequence of subtasks; a task that needs to branch unpredictably wants an agent, not a chain.
Debuggability vs. simplicity: more steps mean more places to inspect and more glue code to maintain.
Drift between steps: each handoff is a chance for the next step to misread the last one’s output unless that output is structured.
Cost: more model calls per task means a higher per-task bill, which has to be weighed against the accuracy gain.

Solution

Decompose the task into a fixed sequence of steps, give each step a single focused responsibility, and pass each step’s output as the next step’s input. Where a step can fail in a way the chain shouldn’t proceed past, insert a programmatic gate — an ordinary code check between two model calls that confirms the work is still on track before continuing.

The discipline is in the decomposition. Each step should be small enough that you could write its prompt, read its output, and judge it correct on its own. A research step gathers facts; a drafting step turns facts into prose; a formatting step turns prose into the final shape. Because each step is narrow, you can tune its prompt without disturbing the others, and when the chain produces a bad result you can see exactly which step went wrong by reading the intermediate outputs.
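A minimal sketch of that decomposition, assuming a hypothetical `call_model` function that takes a prompt string and returns the model's text. The step prompts, the `{input}` placeholder, and the `fake_model` stand-in are illustrative choices, not part of any particular library.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt string, returns the model's text.
ModelFn = Callable[[str], str]


def run_chain(step_prompts: List[str], task_input: str,
              call_model: ModelFn) -> Tuple[str, List[str]]:
    """Run each step in order, feeding each output into the next step's prompt."""
    trace: List[str] = []  # every intermediate output, kept for debugging
    current = task_input
    for template in step_prompts:
        prompt = template.format(input=current)
        current = call_model(prompt)
        trace.append(current)
    return current, trace


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"<{prompt}>"


# One narrow responsibility per step: research, draft, format.
steps = [
    "List the key facts in: {input}",
    "Write a paragraph from these facts: {input}",
    "Format as a bulleted summary: {input}",
]
final, trace = run_chain(steps, "raw notes", fake_model)
```

Keeping `trace` is the point of the sketch: when the final result is wrong, reading `trace[0]`, `trace[1]`, and `trace[2]` in turn shows which step misfired.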

Make each step’s output machine-consumable. If step two has to parse step one’s answer, step one should emit structured output (JSON, a fixed schema, a delimited list) rather than free prose the next step has to interpret. The cleaner the handoff format, the less the chain drifts.
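One way to enforce a clean handoff is to parse and validate each step's output before building the next prompt. The sketch below assumes step one was asked to emit a JSON object; the key names in `REQUIRED_KEYS` and both helper functions are hypothetical.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = {"symptoms", "component"}  # hypothetical schema for this sketch


def parse_handoff(raw: str) -> Dict[str, Any]:
    """Parse one step's output and fail loudly if it is not the agreed shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"step output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("step output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"step output is missing keys: {sorted(missing)}")
    return data


def build_next_prompt(data: Dict[str, Any]) -> str:
    """The next step reads named fields instead of interpreting free prose."""
    symptoms = "; ".join(data["symptoms"])
    return (f"Draft a reproduction summary for {data['component']}. "
            f"Symptoms: {symptoms}")


step_one_output = '{"symptoms": ["crash on save", "data loss"], "component": "editor"}'
prompt = build_next_prompt(parse_handoff(step_one_output))
```

Because the second prompt is assembled from named fields, a malformed first-step answer surfaces as a `ValueError` at the seam rather than as a subtly wrong draft later on.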

Put gates where a downstream step would waste effort or compound an error if it ran on bad input. A gate is not another model call; it is plain code that asks a yes-or-no question (does the outline have the required sections? does the JSON parse? is the word count in range?) and stops or reroutes the chain when the answer is no. Gates are what turn a chain from a hopeful sequence into a checked pipeline.
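The three example questions above can each be written as an ordinary predicate. This is a sketch under assumptions: the section names, the word-count bounds, and the `GateFailed` exception are invented for illustration.

```python
import json
from typing import Callable, List, Tuple

Gate = Callable[[str], bool]  # a gate answers yes or no about one output


def has_required_sections(outline: str,
                          required=("Summary", "Steps", "Expected")) -> bool:
    return all(section in outline for section in required)


def parses_as_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def word_count_in_range(text: str, low: int = 5, high: int = 200) -> bool:
    return low <= len(text.split()) <= high


class GateFailed(Exception):
    """Raised when a gate says no, so the chain stops instead of continuing."""


def check_gates(output: str, gates: List[Tuple[str, Gate]]) -> None:
    for name, gate in gates:
        if not gate(output):
            raise GateFailed(name)


outline_gates = [
    ("has required sections", has_required_sections),
    ("word count in range", word_count_in_range),
]
```

Calling `check_gates(step_output, outline_gates)` between two model calls either returns quietly or raises with the name of the failed check, which the caller can use to stop or reroute.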

Tip

Reach for a chain when you can name the steps in advance. If you find yourself unable to say what step three is until you see the output of step two, the task probably wants an agent or an orchestrator, not a fixed chain.

How It Plays Out

A team builds a feature that turns a support ticket into a structured bug report. They chain four steps: extract the reported symptoms, classify the affected component, draft a reproduction summary, and format the result as the JSON their tracker ingests. Between the classify step and the draft step they add a gate: if the classifier returns “unknown component,” the chain stops and routes the ticket to a human instead of drafting a report against a component that doesn’t exist. Each step is a short prompt they can test in isolation, and when a report comes out wrong they read the intermediate outputs and find the one step that misfired.
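The ticket pipeline might look roughly like the following. Everything named here is an assumption made for the sketch: the `call_model(step_name, text)` interface, the step names, the result dictionary, and the canned `make_fake_model` that lets it run without a real model.

```python
import json
from typing import Callable, Dict

# Hypothetical interface: (step_name, input_text) -> model output text.
ModelFn = Callable[[str, str], str]


def ticket_to_bug_report(ticket: str, call_model: ModelFn) -> Dict[str, object]:
    symptoms = call_model("extract_symptoms", ticket)
    component = call_model("classify_component", symptoms).strip().lower()

    # Gate: plain code, not a model call. Stop before drafting against
    # a component that does not exist and hand the ticket to a person.
    if component == "unknown component":
        return {"status": "needs_human", "ticket": ticket, "symptoms": symptoms}

    summary = call_model("draft_repro_summary", f"{component}\n{symptoms}")
    report_json = call_model("format_as_json", summary)
    return {"status": "ok", "report": json.loads(report_json)}


def make_fake_model(component: str) -> ModelFn:
    """Canned responses standing in for real model calls."""
    def fake(step: str, text: str) -> str:
        if step == "extract_symptoms":
            return "app crashes when saving"
        if step == "classify_component":
            return component
        if step == "draft_repro_summary":
            return "Open a file, press save, observe crash."
        return json.dumps({"summary": text})  # format_as_json
    return fake


ok = ticket_to_bug_report("It crashed!", make_fake_model("editor"))
routed = ticket_to_bug_report("It crashed!", make_fake_model("Unknown component"))
```

Passing the model in as a parameter is a design choice that makes each step testable in isolation, which is the property the team relies on when a report comes out wrong.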

Consider an agentic coding workflow. A developer wants the agent to translate a plain-English spec into a tested function. Rather than one prompt asking for everything, they chain it: first the agent extracts acceptance criteria from the spec, then it writes the function against those criteria, then it writes tests, then a gate runs the tests. If the tests fail, the chain doesn’t ship; it feeds the failure into a fix step and runs the gate again. The fixed sequence makes the agent’s behavior predictable: the developer always knows criteria come before code, code before tests, tests before merge.
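A sketch of that workflow follows. The step names, the `max_fixes` cap on fix attempts, and the in-process test runner are assumptions added for illustration; the cap in particular is not from the scenario above, but it keeps the fix loop from running indefinitely.

```python
from typing import Callable, Optional

ModelFn = Callable[[str, str], str]               # hypothetical: (step, input) -> output
TestRunner = Callable[[str, str], Optional[str]]  # (code, tests) -> failure text or None


def spec_to_function(spec: str, call_model: ModelFn, run_tests: TestRunner,
                     max_fixes: int = 2) -> str:
    criteria = call_model("extract_criteria", spec)
    code = call_model("write_function", criteria)
    tests = call_model("write_tests", criteria)

    for attempt in range(max_fixes + 1):
        failure = run_tests(code, tests)  # the gate: plain code, no model call
        if failure is None:
            return code
        if attempt < max_fixes:
            code = call_model("fix_function", f"{code}\n# failure: {failure}")
    raise RuntimeError("tests still failing after fix attempts; not shipping")


def run_tests_in_process(code: str, tests: str) -> Optional[str]:
    # Sketch only: a real pipeline would run generated code in a sandbox.
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def fake_model(step: str, text: str) -> str:
    canned = {
        "extract_criteria": "add(a, b) returns the sum of a and b",
        "write_function": "def add(a, b):\n    return a - b\n",  # deliberately buggy
        "write_tests": "assert add(2, 3) == 5\n",
        "fix_function": "def add(a, b):\n    return a + b\n",
    }
    return canned[step]


shipped = spec_to_function("Add two numbers.", fake_model, run_tests_in_process)
```

Here the first draft fails the gate, the fix step produces a corrected function, and the second gate run passes, so `shipped` holds the fixed code.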

Warning

A chain is only as deterministic as its steps. Each model call is still probabilistic, so a long chain can compound small errors across steps. Keep chains short, gate the steps where a mistake is expensive, and prefer a checked five-step chain over an unchecked fifteen-step one.
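To see how quickly small errors compound, assume for illustration that every step succeeds independently with the same probability. The 0.95 figure below is a made-up number for the arithmetic, not a measurement of any model.

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


five = chain_success_rate(0.95, 5)
fifteen = chain_success_rate(0.95, 15)
print(f"5 steps:  {five:.3f}")     # 5 steps:  0.774
print(f"15 steps: {fifteen:.3f}")  # 15 steps: 0.463
```

Under that simplification, an ungated fifteen-step chain finishes cleanly less than half the time, while a five-step chain does so about three times in four. Gates change the picture by catching a failure at the step where it happens instead of letting it propagate.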

Consequences

Benefits. Chaining trades latency for accuracy: each focused step does its one job better than a single prompt asked to do everything. The chain is debuggable, because every intermediate output is inspectable and you can pinpoint which step failed. It’s predictable, because the path is fixed in advance rather than chosen by the model at runtime, which makes it easier to reason about than an agent loop and a natural fit when you want a deterministic control flow (the content of each step is still probabilistic). And each step is independently tunable: you can improve one prompt without touching the rest.

Liabilities. A chain is slower and more expensive than a single call, because it makes several round-trips where one prompt made one. It only fits tasks that decompose into a known, fixed sequence; a task that needs to branch on what it discovers will strain against the rigid path and is better served by an orchestrator or an agent. Each handoff is a seam where the next step can misread the last unless the output is structured. And errors can compound: a small mistake early in a long chain propagates forward, which is why gates and short chains matter.

Sources

Anthropic’s Building Effective Agents (December 2024) names prompt chaining the first of its workflow patterns, defining it as decomposing a task into a sequence of steps where each model call processes the previous one’s output, and introduces the programmatic “gate” check between steps. The article’s latency-for-accuracy framing and the gate concept come directly from this treatment.
The Spring AI reference documents the “Chain Workflow” as the simplest foundational pattern, where each step has a focused responsibility and the output of one becomes the input of the next. It offers the practitioner’s guidance to begin with basic workflows before adding complexity.
The broader idea of composing language-model calls into reasoning pipelines emerged across the practitioner community in 2023–2024, as engineers building on early LLM APIs found that decomposing a task into focused, chained sub-calls produced more reliable, more debuggable results than a single monolithic prompt.
