
ReAct

Pattern


Interleave a thought, an action, and an observation on every step, so the agent can plan against what it actually sees instead of what it first assumed.

Also known as: Reasoning and Acting, Thought-Action-Observation Loop, ReAct Agent.

Understand This First

  • Agent – ReAct is the inner loop that most coding agents run on.
  • Tool – each action step calls a tool and reads its result.
  • Context Window – every thought, action, and observation consumes tokens.

Context

At the agentic level, ReAct is the step-by-step cycle that turns a model into an agent. On every step the agent produces a short piece of reasoning (a thought), picks one tool to call with specific arguments (an action), and then reads what that call returned (an observation). The next thought is written against the observation that just arrived, not against whatever the agent guessed when the task began.

Almost every coding agent you have used runs ReAct under the hood, whether or not the product names it. Claude Code, Codex, Cursor, Copilot Chat, Aider, and most LangGraph agents all drive a thought-action-observation loop with varying window sizes, stop conditions, and surface polish. Once you can name the loop, the vocabulary for everything built on top of it snaps into place: plan mode, verification loops, steering, and the failure modes they guard against.

Problem

How do you get useful work out of a language model in a partially unknown environment, where the next correct move depends on facts the model will not have until it looks?

A model asked to fix a bug can reason about what the bug probably is. It can write what the fix probably should be. What it can’t do, by itself, is check any of that. Without some way to look at the code, run the tests, and adjust, the model is doing a plausible performance of debugging on a codebase it can’t see. That performance is fast and confident, and it’s wrong often enough that anyone who has tried it learned quickly to stop.

A pure “plan everything up front, then execute” approach fails for the same reason. The plan is written before the codebase has been read. The first tool call reveals something the plan didn’t account for, and the agent now has to either ignore the new information or throw out the plan.

Forces

  • The model cannot see the environment without acting. Every useful fact about the codebase, the tests, or the runtime requires a tool call.
  • Acting without thinking produces random tool calls. The agent flails: grep, read, grep again, with no accumulating understanding.
  • Thinking without acting produces confident fiction. The model fills gaps with plausible guesses, and the guesses are often wrong in exactly the ways that matter.
  • Every thought, action, and observation spends context window tokens. A loop that never terminates will exhaust its budget before it finishes the task.
  • The loop needs an honest stop condition. The agent must be able to decide “I have enough” and end the cycle, or a human has to end it.

Solution

Drive the agent through a loop with three steps on every turn:

  1. Thought. The agent writes a short piece of reasoning: what it currently believes, what it does not yet know, and which single action would close the biggest gap. The thought is conditioned on every prior observation in the window.
  2. Action. The agent emits one tool call with concrete arguments: a grep, a file read, a test run, a code edit. One action per step, not five. The discipline of picking one keeps the agent’s reasoning tied to a specific next move.
  3. Observation. The tool runs and returns its output: the matching lines, the file contents, the test results, the diff that was applied. The observation is appended to the conversation and becomes an input to the next thought.

The loop continues until one of three things happens: the agent concludes the task is done and says so, the agent hits an explicit stop signal from the harness (context pressure, turn limit, approval policy), or a human interrupts.
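The whole cycle fits in a few lines of code. Here is a minimal sketch, assuming a hypothetical `model` callable that returns one thought plus either a single tool call or a final answer; the dictionary shape, tool registry, and turn limit are illustrative, not any vendor's actual API.

```python
# Minimal thought-action-observation loop. `model` and `tools` are
# hypothetical stand-ins for a real LLM call and a real tool registry.

def react_loop(task, model, tools, max_turns=20):
    transcript = [("task", task)]
    for _ in range(max_turns):                  # harness stop signal: turn limit
        step = model(transcript)                # one thought, at most one action
        transcript.append(("thought", step["thought"]))
        if step.get("final"):                   # agent decides "I have enough"
            return step["final"], transcript
        name, args = step["action"]
        observation = tools[name](**args)       # run exactly one tool
        transcript.append(("action", (name, args)))
        transcript.append(("observation", observation))
    return None, transcript                     # budget exhausted before finishing
```

The key design point is that `model` always sees the full transcript, so each new thought is conditioned on every observation that has arrived so far.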

What makes ReAct work is the interleaving. In pure chain-of-thought prompting, the model reasons for many steps before producing a final answer, with no intermediate grounding. In pure action-only agents, the model calls tools reflexively, without reasoning about what the results mean. Interleaving lets reasoning adjust to reality on every step. The agent that finds an unexpected null in a query result can write a thought about it before choosing whether to add a guard, change the query, or look at the data.

The Yao et al. (2022) paper that introduced ReAct showed the result empirically: on tasks that required both knowledge and action (HotpotQA, FEVER, ALFWorld, WebShop), interleaving beat reasoning-only and acting-only baselines by significant margins. The insight generalized fast. Within two years, the ReAct loop was the default architecture for coding agents across the industry.

Tip

When reading an agent transcript, look for the three-part rhythm: a thought, one tool call, and the result. If you see five tool calls in a row with no reasoning between them, the agent has dropped out of ReAct and into flailing. If you see five paragraphs of reasoning with no tool calls, the agent is talking itself into a plan instead of testing it. Both failure modes have names and cures once you’ve learned to see the rhythm.
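That rhythm check can be mechanized. The sketch below is a toy transcript linter, assuming each step has already been labeled `"thought"`, `"action"`, or `"observation"`; the threshold of five is illustrative.

```python
# Toy linter for the ReAct rhythm: flags long runs of actions with no
# interleaved reasoning (flailing) or reasoning with no tool calls (stalling).

def diagnose_rhythm(kinds, limit=5):
    actions_since_thought = 0
    thoughts_since_action = 0
    for kind in kinds:
        if kind == "thought":
            actions_since_thought = 0
            thoughts_since_action += 1
        elif kind == "action":
            thoughts_since_action = 0
            actions_since_thought += 1
        # observations reset nothing: they sit between actions
        if actions_since_thought >= limit:
            return "flailing: tool calls with no reasoning between them"
        if thoughts_since_action >= limit:
            return "stalling: reasoning with no tool calls"
    return "healthy rhythm"
```

Observations deliberately reset neither counter: an agent that greps five times in a row is flailing even though each grep returned output.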

How It Plays Out

You ask an agent to find where UserService is defined. The agent thinks: “I don’t know the file layout; the fastest way to locate a class definition is a grep for class UserService.” It runs the grep and observes three hits: src/domain/user_service.ts, tests/user_service_test.ts, and tests/mocks/user_service_mock.ts. It thinks: “The one under src/domain/ is the production class; the others are test files. I should read the production file.” It reads the file, observes a 180-line class, and reports the location with a one-line summary of what the class does. Two thoughts, two actions, two observations, and a clean answer. That’s ReAct on an easy task.

A team is debugging a test that fails intermittently on CI but passes locally. The agent begins by thinking: “Flakiness usually points at time, order, or shared state. I should look at how the test sets up state.” It reads the test file, observes that the test mutates a module-level cache, and thinks: “If another test uses the same cache and runs first in CI’s parallel worker, that would explain the order dependence.” It runs the test suite with the flaky test in isolation, observes a pass, and runs it alongside its neighbors, observes the failure. The loop made the diagnosis reproducible, which is the first real step toward a fix. Without interleaved reasoning, the agent would have either stared at the test file guessing or run tests at random until something matched.

An engineer gives an agent a migration task: convert forty-two database queries from a deprecated ORM to its successor. Each iteration of the agent’s ReAct loop reads one query, thinks about the structural difference between the old and new API, writes the edit, runs the affected test, and observes the result. If the test passes, the agent moves to the next query. If it fails, the agent reads the failure and iterates on the edit within the same ReAct loop until the test passes or the agent decides the case needs human attention. The migration is thirty-nine one-step loops and three that went multi-step because the query had a wrinkle. At no point does the agent try to plan all forty-two changes up front; the plan is re-derived on every step from what the last test actually did. That’s ReAct doing useful work at scale.
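The migration example has a recognizable per-item shape: edit, test, retry or escalate. A sketch, with hypothetical `edit_query` and `run_test` callables standing in for the agent's tool calls and an illustrative retry limit:

```python
# Per-query migration driver: each query gets its own small ReAct loop,
# and the plan is re-derived from the last test result, not fixed up front.

def migrate(queries, edit_query, run_test, max_attempts=3):
    done, escalated = [], []
    for query in queries:
        for attempt in range(1, max_attempts + 1):
            edit = edit_query(query, attempt)   # thought + edit action
            if run_test(edit):                  # observation: did the test pass?
                done.append(query)
                break
        else:
            escalated.append(query)             # needs human attention
    return done, escalated
```

Most queries finish in one attempt; the few with a wrinkle loop a couple of times, and anything that exhausts the retry budget lands in the escalation list instead of silently shipping a broken edit.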

Where the Loop Breaks

The loop is reliable in the common case but not self-correcting in every failure mode. The recurring traps worth recognizing:

  • Runaway loops. The agent keeps acting and reasoning without making progress. This is the failure that Ralph Wiggum Loop documents and the harness-level steering loop is built to interrupt. Detection is usually external: a turn counter, a repeated-observation check, or a human noticing the spin.
  • Observation overload. A single tool call returns fifty thousand tokens. The observation dominates the context window and pushes older thoughts out. The cure is tighter tool contracts: head-limited outputs, truncation, pagination, or a specific subagent that summarizes before returning.
  • Premature termination. The agent concludes too early because it thinks it is done. This is typically a reasoning failure, not a loop failure, and it is what verification loops and independent evals catch.
  • Brittle parsing. In early ReAct implementations, the agent’s thought and action were parsed from a single text string. Malformed output broke the loop. Structured tool-calling APIs from the major model vendors have mostly eliminated this failure; it still appears in hand-rolled implementations.
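The cure for observation overload is mechanical. A minimal sketch of a head-limited tool contract, using a crude four-characters-per-token estimate; both the estimate and the budget are illustrative:

```python
# Head-limited observation: cap what a tool call can return so one large
# output cannot push older thoughts out of the context window.

def truncate_observation(output, max_tokens=2000, chars_per_token=4):
    budget = max_tokens * chars_per_token
    if len(output) <= budget:
        return output
    omitted = len(output) - budget
    # keep the head, and tell the agent explicitly that output was cut
    return output[:budget] + f"\n[... truncated: {omitted} characters omitted ...]"
```

The explicit truncation marker matters: the agent's next thought can reason "the output was cut, I should narrow the query" instead of silently acting on a partial result.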

Consequences

Naming ReAct gives readers and teams a shared word for something they already use every day. Debugging conversations get sharper: “the loop is fine, the tool’s output is too big” means something specific now. Comparing agents gets easier: two coding agents with different UIs are probably running similar ReAct loops with different stop conditions, and once you see that, you can reason about which one to pick for a given task.

The pattern also shapes what goes around it. Plan mode inserts a deliberate reasoning-heavy phase before handing the same loop a richer starting context. Verification loops wrap the ReAct loop’s output in a test-based check rather than trusting it. Steering loops are the harness primitive that watches a running ReAct loop and corrects it in flight. Each of these patterns assumes the ReAct inner loop is already there; once you’ve named it, you can reason about the layers on top.

The costs are real. Every step spends tokens on thought and observation, not only action, which makes ReAct more expensive per unit of work than a pure action-only agent would be on tasks where the model already knows what to do. The interleaving also couples reasoning to whatever the most recent observation was, which can let an unexpected result pull the agent sideways from its original plan. Longer horizons amplify this. Beyond a few dozen steps, the agent often needs external structure to stay anchored: a progress log, a plan file, or a checkpoint.

  • Depends on: Agent – ReAct is the inner loop definition of what “agent” means in 2026.
  • Depends on: Tool – every action is a tool call; without tools there is no ReAct, only chain-of-thought.
  • Used by: Plan Mode – plan mode structures two ReAct passes (explore first, implement second) with a human review between them.
  • Used by: Verification Loop – the outer loop that checks the ReAct loop’s output against tests.
  • Used by: Research, Plan, Implement – three ReAct passes with changed prompts and allowed tools between them.
  • Used by: Ralph Wiggum Loop – restarts a ReAct loop with fresh context once per task.
  • Corrected by: Steering Loop – the harness watches a running ReAct loop and intervenes when it drifts.
  • Contrasts with: Generator-Evaluator – generator-evaluator is two agents criticizing each other; ReAct is one agent criticizing its last observation.
  • Related: Prompt – ReAct is the prompting strategy that produces a thought-action-observation trace.
  • Related: Context Engineering – what the agent sees in each observation is a context-engineering decision.
  • Failure mode: Ralph Wiggum Loop is also the antidote to runaway ReAct over long horizons.

Sources

  • Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao introduced the pattern in ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629, 2022; published at ICLR 2023). The paper gave the loop its name and the empirical evidence that interleaving beat reasoning-only or acting-only baselines.
  • The ReAct prompting template was popularized through the promptingguide.ai reference, LangChain’s early agent implementations, and the LangGraph Thought-Action-Observation node primitive, which together made the loop easy to adopt without re-reading the paper.
  • Anthropic’s tool-use API and OpenAI’s function-calling API turned the original text-parsed ReAct trace into structured JSON, eliminating the brittle-parsing failure that early implementations suffered from.
  • The widespread mid-2020s adoption of ReAct as the default coding-agent architecture emerged as a community practice among agentic coding teams; no single author owns that shift, though the Yao et al. paper is the universal reference.