
Plan-and-Execute

Pattern

A reusable solution you can apply to your work.

Split the agent into a planner that thinks once, an executor that runs each step, and a re-planner that only re-engages when the plan needs to change, so the expensive reasoning model isn’t paying to re-derive the same plan on every tool call.

Also known as: Plan-and-Solve Prompting, ReWOO (Reasoning WithOut Observation), LLMCompiler.

Understand This First

  • ReAct — the contrast point; Plan-and-Execute is the deliberate alternative to ReAct’s per-step re-planning.
  • Agent — Plan-and-Execute is one architectural choice for what’s running inside the agent loop.
  • Tool — the executor’s entire job is calling tools; the planner mostly never touches them.

Context

At the agentic level, Plan-and-Execute is an architectural choice: who does the thinking, who does the doing, and how often the thinking has to repeat. The default architecture in 2026, ReAct, interleaves a thought, a tool call, and an observation on every single step. That’s the right shape when the next correct move depends on what the last tool call returned. It’s the wrong shape when the plan is roughly stable and you’re paying a large reasoning model to re-derive the same plan two hundred times in a row.

Three architectural choices show up in practice. ReAct is the inner loop: one model, every step. Plan Mode is the human-review variant: the agent proposes, you approve, the agent executes. Plan-and-Execute is the autonomous separation: a planner LLM produces a multi-step plan up front, an executor (often a smaller model or a deterministic runner) carries out each step, and a re-planner checks after each step or batch whether to finish, continue, or revise. The split is the whole point.

Problem

How do you keep an agent from spending its biggest token budget on the part of the work that doesn’t change?

A code-migration agent walking 200 files with the same six-step transformation per file doesn’t need a fresh plan after every file. A research agent exploring ten parallel hypotheses doesn’t need to think about hypothesis seven before it starts running hypothesis one. ReAct re-plans on every observation because that’s what its design is for, and on tasks where the plan is mostly stable, the re-planning is wasted spend. The per-step LLM call is the dominant cost in production agent systems, and most of those calls are repeating yesterday’s reasoning.

Forces

  • Adaptability vs. cost. Re-thinking on every step lets the agent adjust to surprises. It also means paying the planner’s token cost a hundred times when the plan barely shifts.
  • Planner quality vs. executor cost. A weak planner produces a brittle plan that the executor can’t follow. A strong planner is expensive to call. Splitting the roles lets each one match its model.
  • Replan frequency vs. throughput. Replan after every step and you’ve reinvented ReAct. Replan never and the agent flounders the first time a step fails. The right cadence is somewhere in between, and it varies per task.
  • Observation-driven vs. plan-driven control. ReAct lets the latest observation pull the agent in any direction. Plan-and-Execute holds the plan as the anchor and only revisits it on explicit signals. Each shape suits different tasks.

Solution

Separate the agent into three roles and run them on different cadences:

  1. Planner. The planner sees the goal and produces the full plan up front: an ordered list of steps, a DAG of steps with dependencies, or a structured program with placeholders for tool outputs. The planner is typically a strong reasoning model (Claude Opus, GPT-5 reasoning mode, the largest model the budget supports). It runs once per task, sometimes once per major checkpoint.

  2. Executor. The executor takes one step at a time and carries it out. It calls the named tool with the named arguments, captures the result, and returns. It does not reason about the plan; it reasons only enough to fill in the next argument or parse the last observation. The executor can be a small fast model (Haiku, GPT-5 mini), a deterministic tool runner with no model at all, or a subagent specialized for the step type.

  3. Re-planner. Between steps or after a batch of steps, the re-planner looks at what happened and decides whether to finish, continue with the existing plan, or revise. The re-planner is the same model class as the planner, called sparingly. Its job is to answer the question ReAct asks on every step: does the plan still hold?

The architectural rule that unlocks the cost win: the planner sees the goal, the executor sees one step plus context. The planner does not see step-level observations. The executor does not see the full plan. That separation is what lets each role run on its own cadence with its own model.
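The loop can be sketched in a few lines of Python. Everything here is a stub: plan_task, execute_step, and replan are hypothetical stand-ins for calls to a strong planner model, a cheap executor model, and the re-planner, not a real framework API. The point is the shape of the control flow, with each role on its own cadence:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    plan: list[str] = field(default_factory=list)    # remaining steps
    results: list[str] = field(default_factory=list)  # observations so far

def plan_task(goal: str) -> list[str]:
    # Planner: sees only the goal, runs once up front.
    # Stand-in for a strong reasoning model.
    return [f"step {i}: work toward {goal}" for i in range(1, 4)]

def execute_step(step: str) -> str:
    # Executor: sees one step plus context, never the full plan.
    # Stand-in for a small model or deterministic tool runner.
    return f"done: {step}"

def replan(state: AgentState) -> list[str]:
    # Re-planner: same model class as the planner, called sparingly.
    # Here it always decides "continue": no surprises were observed.
    return state.plan

def run(goal: str) -> AgentState:
    state = AgentState(goal=goal, plan=plan_task(goal))
    while state.plan:
        step = state.plan.pop(0)
        state.results.append(execute_step(step))
        state.plan = replan(state)  # finish / continue / revise
    return state

state = run("migrate ORM calls")
```

Note that the planner's output is the only thing the executor ever consumes; swapping the stubs for real model calls preserves the cadence split.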

Three named variants ship in 2026 that make different choices about how to specify the plan and when to re-engage the planner.

Vanilla Plan-and-Execute (LangChain’s langgraph tutorial) emits a plain ordered list of steps, runs them one at a time, and calls the re-planner between batches. Simplest to implement; matches most code-migration and form-filling tasks.

ReWOO (Xu et al., 2023) emits a plan with placeholder variables, like step 3: search the web for $RESULT_OF_STEP_2, and the executor fills them in by running tools; reasoning never re-enters the loop. The cost saving is dramatic on tasks where the plan is structurally stable.
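A minimal sketch of the placeholder mechanism, with a made-up $E1 variable syntax and stubbed tools (the real ReWOO plan format differs in detail; the structure, plan once then substitute evidence forward, is the same):

```python
import re

# Plan emitted once by the planner: (variable, tool, argument) triples.
# Later arguments reference earlier outputs via $E1-style placeholders.
plan = [
    ("E1", "search", "caching benchmarks 2023"),
    ("E2", "summarize", "top result of $E1"),
    ("E3", "draft", "report using $E2"),
]

def run_tool(name: str, arg: str) -> str:
    # Stand-in for real tool calls; echoes its inputs for inspection.
    return f"<{name}:{arg}>"

def execute(plan):
    evidence = {}
    for var, tool, arg in plan:
        # Fill placeholders from earlier steps, then run the tool.
        # No reasoning model is consulted anywhere in this loop.
        resolved = re.sub(r"\$(E\d+)", lambda m: evidence[m.group(1)], arg)
        evidence[var] = run_tool(tool, resolved)
    return evidence

results = execute(plan)
```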

LLMCompiler (Kim et al., 2023) emits the plan as a directed acyclic graph with explicit data dependencies. The executor runs independent nodes in parallel and resolves data flow between them. Same planner-executor split, plus parallelism scheduling: wall-clock time on independent-hypothesis tasks drops from minutes to seconds.
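The DAG-execution idea can be sketched with a thread pool. The node functions are stubs and the scheduler is simplified (the real LLMCompiler streams the plan and dispatches nodes as their dependencies resolve), but the key property survives: nodes with no edges between them run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Each node: (list of dependency node names, function of resolved deps).
# h1-h3 are independent hypotheses; consolidate waits on all three.
dag = {
    "h1": ([], lambda deps: "result-1"),
    "h2": ([], lambda deps: "result-2"),
    "h3": ([], lambda deps: "result-3"),
    "consolidate": (["h1", "h2", "h3"], lambda deps: "+".join(deps)),
}

def run_dag(dag):
    done = {}
    with ThreadPoolExecutor() as pool:
        remaining = dict(dag)
        while remaining:
            # Schedule every node whose dependencies are all resolved.
            ready = [n for n, (deps, _) in remaining.items()
                     if all(d in done for d in deps)]
            futures = {}
            for n in ready:
                deps, fn = remaining.pop(n)
                futures[n] = pool.submit(fn, [done[d] for d in deps])
            # Collect the batch; completed nodes unblock their dependents.
            for n, fut in futures.items():
                done[n] = fut.result()
    return done

out = run_dag(dag)
```

In this sketch h1, h2, and h3 run in the same batch; only consolidate waits for a second round.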

Which variant fits depends on how rigid the plan is and how parallel the work is. All three share the architectural core: separate planning from execution, run each role on its own cadence with its own model class, and re-plan only when the plan demands it.

Tip

Pick Plan-and-Execute when you can describe the task as “for each X, do Y” or “explore these N hypotheses.” Pick ReAct when each step’s outcome substantially changes what the next step should be. Pick Plan Mode when the plan needs human eyes before the agent touches anything. Each of the three patterns answers a different architectural question, so the right one depends on which question the task is actually asking.

How It Plays Out

A team is migrating 200 Python files from a deprecated ORM to its successor. The transformation is the same six steps per file: parse the queries, identify the deprecated calls, write the new equivalents, update the imports, run the affected tests, commit if green. ReAct on this task burns 200 planner LLM calls re-deriving the same six steps every time. Plan-and-Execute does it once: the planner produces the rule “for each .py file under src/, apply steps 1-6, fall through to the re-planner only on test failure.” The executor (a small model with file-edit and pytest tools) runs 1,200 deterministic steps. The re-planner fires three times across the whole migration, each time on a query with a wrinkle the planner didn’t anticipate. Cost drops by a factor that more than pays for the engineering effort to set the architecture up.

A research agent is asked to evaluate ten possible architectures for a new caching layer. Each evaluation involves reading a paper, prototyping the approach, running a benchmark, and recording the result. The hypotheses are independent; there’s no reason to evaluate them in series. The team uses the LLMCompiler variant: the planner emits a DAG with ten parallel nodes plus a final consolidation node. The executor runs the ten evaluations concurrently across ten subagent threads. The re-planner consolidates. Wall-clock time on what would have been a 25-minute serial ReAct trace drops to four minutes. The architectural decision (separating planning from execution and emitting the plan as a DAG) is what made parallelism a one-line change instead of a refactor.

A debugging agent gets pointed at a flaky test and given a Plan-and-Execute architecture. The planner produces what looks like a clean six-step plan: reproduce the failure, isolate the offending test, identify the source of nondeterminism, write a fix, re-run, commit. The executor starts on step one. The first reproduction succeeds: the test passes this time. Step two now has nothing to isolate. The executor flounders, the re-planner re-engages, and the planner produces a new plan that step three undermines five minutes later. Each step substantively changes what the next correct move is, which is exactly the shape ReAct exists for. The team rewires the agent: ReAct for the diagnosis, Plan-and-Execute for the fix-and-deploy phase once the diagnosis is in hand. Two architectures, used where each one is right.

Where the Plan Breaks

Plan-and-Execute fails in predictable ways. The recurring traps:

  • Brittle plans on changing environments. When the first observation invalidates the plan, the executor flounders and the re-planner ends up doing the work the planner should have done. The repair is recognizing this earlier. If your task is intrinsically observation-driven, ReAct is the right pattern, not Plan-and-Execute with aggressive re-plan triggers.
  • Per-task amortization fails on small jobs. The planner call is a fixed cost per task. On tasks of three or four steps, the planner overhead dominates and ReAct is cheaper. Plan-and-Execute starts paying off around fifteen to twenty steps and dominates above fifty.
  • Re-plan logic that can’t decide when to give up. The re-planner’s job is to know when the plan is salvageable and when to throw it out. A re-planner that always patches the existing plan creates Frankenstein plans that grow new appendages forever. A re-planner that always discards and starts over loses the work the executor already did. The signal worth tuning: how much of the original plan’s preconditions still hold.
  • Hidden coupling between steps. A plan that looks parallel often has implicit dependencies: the second hypothesis modifies the same database the first one is reading. The LLMCompiler variant exposes this through explicit dependency edges; the vanilla variant hides it and the executor races itself.
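The amortization trap can be made concrete with a back-of-the-envelope cost model. Every constant below is an illustrative assumption, chosen to land near the fifteen-to-twenty-step break-even described above, not a measured price:

```python
# Planner-class token spend, ReAct vs Plan-and-Execute.
# All numbers are illustrative assumptions, not measured costs.
REACT_TOKENS_PER_STEP = 500   # short interleaved thought on every step
PLAN_TOKENS = 5_000           # one full planning pass by a strong model
REPLAN_RATE = 0.04            # re-planner fires on ~4% of steps

def react_cost(steps: int) -> float:
    # ReAct pays reasoning tokens on every single step.
    return steps * REACT_TOKENS_PER_STEP

def pae_cost(steps: int) -> float:
    # Plan-and-Execute pays one plan up front plus occasional re-plans.
    return PLAN_TOKENS + steps * REPLAN_RATE * PLAN_TOKENS

# The curves cross where 500*s = 5000 + 200*s, i.e. around 17 steps:
# below that, the fixed planner call dominates and ReAct is cheaper.
```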

Consequences

The cost per useful action drops, often substantially. LangChain’s published measurements on canonical Plan-and-Execute benchmarks report three-to-five-times reductions in planner-token spend versus ReAct on tasks where the plan is stable. The DAG-based LLMCompiler variant adds wall-clock latency wins on top: independent steps that ran in series under ReAct now run in parallel under the executor.

Two costs land back on the team. Debugging gets harder. ReAct failures are local: one step went wrong, you read the trace at that step. Plan-and-Execute failures are global: the plan was wrong, which means every executor step that ran since the planner spoke might be salvage or might be garbage. The re-planner trace is now part of the debugging surface, and it’s a more complex object than a ReAct loop’s per-step log. The second cost: the planner becomes the highest-leverage prompt to get right. A weak planner produces a plan the executor can’t follow, and no amount of executor tuning rescues a bad plan. Teams that adopt Plan-and-Execute end up investing in planner prompt engineering and planner evaluation in a way ReAct never demanded.

The architectural decision shapes everything around it. The executor is a natural place to apply Model Routing: small cheap model for steps the planner already specified, large model only on the planner and re-planner. The re-planner is a natural place to consume verification loop output, since the verification check produces the signal the re-planner needs to decide what to do next. Reflexion layers cleanly on the re-planner, converting failures into post-mortems that improve the next plan. Plan-and-Execute is the architectural decision that opens the door to those compositions; once the planner-executor split is in place, the rest of the agent surface can be tuned around it.

Sources

  • Lei Wang, Wanyu Xu, Yihuai Lan, Zhiqiang Hu, Yunshi Lan, Roy Ka-Wei Lee, and Ee-Peng Lim introduced the prompting variant of the architecture in Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (arXiv:2305.04091, ACL 2023). The paper distinguished “devise a plan, then carry it out” from one-shot chain-of-thought and gave the architecture its first academic anchor.
  • Binfeng Xu, Zhiyuan Peng, Bowen Lei, Subhabrata Mukherjee, Yuchen Liu, and Dongkuan Xu introduced ReWOO in ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models (arXiv:2305.18323, 2023), the first formalization of a planner-executor split where reasoning never re-enters the loop.
  • Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, and Amir Gholami introduced LLMCompiler in An LLM Compiler for Parallel Function Calling (arXiv:2312.04511, 2023), adding a directed-acyclic-graph executor that resolves dependencies and runs independent steps in parallel.
  • The LangChain blog post Plan-and-Execute Agents (Feb 13, 2024) gave the architecture its working name, codified the planner / executor / re-planner roles, and reported the first widely-cited measurements of cost and latency wins versus ReAct.
  • The official LangGraph Plan-and-Execute tutorial made the architecture buildable end-to-end in a single notebook, which is what moved Plan-and-Execute from a paper formalism to the de-facto reference implementation in 2025-2026.

Further Reading

  • The LangGraph notebooks for Plan-and-Execute, ReWOO, and LLMCompiler walk through working implementations of all three variants with annotated code.
  • The LangChain deepagents framework is a 2026 production codification of Plan-and-Execute with planning tool, filesystem backend, and subagent spawning baked in.