REPL
The REPL wraps an agent in a persistent read-evaluate-print-loop shell so a human can direct it conversationally, one turn at a time, with the session state preserved across turns.
Also known as: Read-Eval-Print Loop, Interactive Shell, Conversational Shell
Understand This First
- Harness (Agentic) – the harness is what implements the REPL; this article explains the interaction shape the harness wraps.
- Agent – the thing that runs inside each loop turn.
- Tool – tool calls happen inside the evaluate step of each turn.
Context
At the agentic level, a REPL is the interaction shape that most coding agents inhabit. Claude Code, Aider, Codex CLI, sgpt, and the rest of the agents that live in your terminal all run as a read-evaluate-print-loop: read the human’s input, evaluate it by planning and invoking tools, print the transcript, and loop back with the session state intact. The shape is older than the agents that use it. Lisp pioneered the REPL in the 1960s, and it’s since become the default way humans interact with a running computation: Python’s interpreter, Node’s shell, the browser’s devtools console, IPython and Jupyter, and every Unix shell in common use.
Agentic coding inherits this lineage. The twist is what happens in the E (evaluate) step. A traditional REPL evaluates an expression and returns a value. An agentic REPL evaluates a natural-language request by running a ReAct loop against a model, calling tools, and streaming a transcript back. The outer loop is the same. The inner behavior is different, and that’s what lets the pattern feel familiar and unfamiliar at the same time.
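The difference can be made concrete with a minimal sketch, in which every name is hypothetical: the classic evaluate is a single deterministic step that returns a value, while the agentic evaluate is an inner loop that keeps calling tools until the model is ready to reply.

```python
def classic_evaluate(expr, env):
    # Traditional REPL: one deterministic step, returns a value.
    return eval(expr, env)

def agentic_evaluate(request, transcript, model, tools):
    # Agentic REPL: the "E" step is itself a loop. The model may
    # request several tool calls before it is ready to print.
    transcript.append({"role": "user", "content": request})
    while True:
        step = model(transcript)  # plan the next action
        if step["type"] == "tool_call":
            result = tools[step["name"]](**step["args"])
            transcript.append({"role": "tool", "content": result})
        else:
            transcript.append({"role": "assistant", "content": step["content"]})
            return step["content"]
```

The outer loop that drives either function is identical; only what happens between read and print changes.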
Problem
How do you give a human productive, interactive access to a stateful, nondeterministic reasoner that may need to run for minutes and touch dozens of files per turn?
Two obvious shapes fail. A one-shot prompt (the agent takes a request, returns an answer, forgets everything) throws away the session state the agent just earned, and forces the human to rebuild context every time. A background job (submit a task, come back when it’s done) hides exactly the signals the human needs to steer: what the agent is trying, what it’s finding, where it’s stuck. Neither shape supports the tight collaboration that the work actually calls for.
Forces
- The agent is stateful within a session. Tool results, partial plans, and corrections build up across turns; throwing them away between turns is wasteful and alienating.
- The human needs to steer continuously. Complex work rarely survives a one-shot prompt intact; the human wants to interrupt, redirect, approve, and resume.
- Each turn is nondeterministic and can be long. The agent may plan, call tools, revise, and call more tools before it’s ready to print. The interface has to make that progress visible without demanding that the human babysit every token.
- History is the durable artifact. The transcript is what lets you go back, audit what the agent did, and resume after an interruption.
Solution
Wrap the agent in a read-evaluate-print-loop and make each phase observable, interruptible, and persistent.
Each turn has four phases. Read accepts the human’s input: a prompt, a slash command, a pasted file, an approval response. Evaluate runs the agent: the model plans, tools are called (sometimes with inline approval prompts), intermediate output streams back. Print emits the result and updates the transcript, which now includes the new request, the model’s reasoning trace where appropriate, tool calls and their outputs, and the agent’s reply. Loop returns control to the prompt with all of that session state still in memory for the next turn.
The phases need a few properties to make the pattern work for real agentic use. The loop must yield cleanly between turns: the human should be able to interrupt a running turn, paste in a correction, and resume without losing the transcript. Approval Policy checkpoints are natural yield points inside the evaluate step. Slash commands are a second class of input the read step recognizes: they parse before the request goes to the model, so the harness can handle them locally without spending tokens. Session state (the transcript plus any extracted plans, memory edits, and tool cache) persists across turns and is usually resumable across restarts through a session file or database.
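A minimal sketch of the outer loop under assumed interfaces (the agent callable, the command table, and the session-file format are all hypothetical, not any particular harness's): inputs are read one at a time, slash commands are handled locally before anything reaches the model, and the transcript is persisted after every evaluated turn so the session is resumable across restarts.

```python
import json
import os

def repl(inputs, session_path, agent, commands):
    # Outer loop over an iterable of human inputs (stdin in a real
    # harness). Session state is the transcript, reloaded at startup
    # and saved after every turn so the session survives restarts.
    transcript = []
    if os.path.exists(session_path):
        with open(session_path) as f:
            transcript = json.load(f)
    outputs = []
    for line in inputs:                          # Read
        if line.startswith("/"):                 # slash commands parse here,
            outputs.append(commands[line](transcript))  # locally, no tokens spent
            continue
        outputs.append(agent(line, transcript))  # Evaluate (mutates transcript)
        with open(session_path, "w") as f:       # Print + persist; Loop with
            json.dump(transcript, f)             # state intact for the next turn
    return outputs
```

Resumability falls out of the persistence step: restarting the process and calling `repl` again with the same session path picks up where the transcript left off.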
The pattern isn’t universal. A batch or one-shot agent (a cron-scheduled refactor run, a CI-time security review, a code-completion call) is a different shape: it’s a filter, not a shell. It takes input, produces output, and exits. Both shapes are valid. The REPL shape is the right one when a human needs to collaborate with the agent turn by turn; the batch shape is the right one when the work is specified tightly enough that no per-turn steering is needed.
When you’re designing or choosing an agent harness, ask which REPL phases it lets you observe and intervene in. A harness that hides the evaluate step behind a spinner is hard to steer. A harness that streams tool calls, surfaces approval prompts inline, and preserves an auditable transcript is doing the REPL job well.
How It Plays Out
One turn inside Claude Code, in detail. The human types a request: “Refactor the payment module so that the retry policy lives in its own file.” Read picks up the prompt and appends it to the session transcript. Evaluate hands the transcript to the model; the model plans, decides it needs to read four files, and requests a tool call. The harness’s approval policy is set to allow file reads without asking, so the reads fire, their results stream back into the model’s next step, and the model drafts a patch. The patch involves a write, which the approval policy gates: the REPL yields, the human sees a diff, types y, and the write completes. Tests run as a follow-up tool call, pass, and the model prints a summary. Control returns to the prompt. The transcript now contains the request, four reads, one approved write, test output, and the summary — ready for the next turn.
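The gating in that walkthrough can be sketched as a small policy check inside the evaluate step (the function, the policy format, and the tool names are illustrative assumptions, not Claude Code's actual API): tools marked "allow" fire immediately, while tools marked "ask" yield the loop to the human before the call completes.

```python
def gated_tool_call(name, args, policy, ask_human, tools):
    # Approval gate inside the evaluate step. The policy maps tool
    # names to "allow" or "ask"; "ask" yields to the human, who sees
    # the proposed call and answers y/n before the call proceeds.
    if policy.get(name, "ask") == "ask":
        if ask_human(name, args) != "y":
            return {"ok": False, "result": "denied by user"}
    return {"ok": True, "result": tools[name](**args)}
```

In the walkthrough above, `read` would be mapped to "allow" (the four reads fire without prompting) and `write` to "ask" (the diff is shown and the human types y before the patch lands).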
A batch-shaped contrast: a weekend refactoring agent that runs overnight. There’s no REPL. The human hands it a plan file, it runs to completion, and posts a pull request. No per-turn steering, no interactive approvals, no transcript the human reads in real time. The inputs and outputs look similar to a REPL session; the shape of the interaction is different, and so is the kind of trust the human extends. Knowing which shape you’re in keeps the UX expectations aligned with what the agent can actually do.
A developer using IPython as a data-exploration REPL and Claude Code as a coding REPL side by side notices the family resemblance: both let you hold state across turns, iterate cheaply, and recover from mistakes without losing context. The difference is what the evaluate step does. That symmetry is why agentic coding feels familiar to experienced programmers and also why the rough edges (when compaction silently drops history, when an approval prompt fires mid-typing, when the transcript scrolls past the viewport) feel like REPL bugs rather than AI bugs. They are REPL bugs. The shape is what’s being engineered.
Consequences
Naming the REPL gives the rest of the agentic vocabulary a stable substrate. Persistent session state, slash commands, inline approvals, transcript audits, interruption and resume, and human steering at turn boundaries all follow from the shape. Readers who understand REPLs already understand ninety percent of how a coding agent’s UI works; the rest is the evaluate step’s internals.
The cost is the usual REPL cost, amplified. Session state grows until something has to give: compaction summarizes older history, handoff transfers context to a fresh session, a thread-per-task boundary starts a new REPL for a different subproblem. Each of those is a destructive edit to the transcript, and the agent won’t tell you what it lost. The REPL also ties the human to the terminal: while one session is running, it’s harder to use that harness for something else, which is why parallelization and worktree isolation exist.
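A toy compaction pass makes the cost visible (the function and its halving strategy are illustrative, with a character budget standing in for tokens; real harnesses summarize with the model itself): once the transcript outgrows the budget, the older half is replaced by a single summary entry, and whatever detail lived in those turns is gone.

```python
def compact(transcript, budget, summarize):
    # When the transcript outgrows the budget, replace the older half
    # with one summary entry. This is a destructive edit: detail in
    # the summarized turns is lost and the agent won't report what.
    size = sum(len(entry) for entry in transcript)
    if size <= budget:
        return transcript
    cut = len(transcript) // 2
    return [summarize(transcript[:cut])] + transcript[cut:]
```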
There’s also a design trap. It’s tempting to treat the REPL as the only valid shape for an agent and to retrofit every workflow into a conversational session. Batch shapes are fine. Scheduled shapes are fine. An agent that should be a filter shouldn’t be forced into a shell just because shells are what we’re used to. Pick the shape that matches the task.
Related Patterns
- Depends on: Harness (Agentic) – the REPL is implemented by the harness.
- Wraps: Agent – the loop evaluates an agent turn per iteration.
- Uses: ReAct – the inner behavior inside the evaluate step is typically a ReAct loop.
- Uses: Tool – tools are called during evaluate, and their outputs feed the transcript.
- Uses: Context Engineering – the transcript is the context the loop carries forward; engineering it is a per-turn discipline.
- Hosts: Human in the Loop – each loop boundary is a natural human checkpoint.
- Gates with: Approval Policy – approval prompts fire inside the evaluate step and yield the loop to the human.
- Pairs with: Memory – REPL state is short-term memory; memory carries information across REPL sessions.
- Truncates with: Compaction – when the transcript outgrows the context window, compaction rewrites the loop’s working set.
- Contrasts with: Handoff – handoff is what happens when you leave a REPL session for a new one.
- Contrasts with: Batch and one-shot agents – a REPL is a shell; a filter takes input once, runs, and exits.
Sources
The read-eval-print loop originated in Lisp in the 1960s, where it was the primary way programmers interacted with the running language system. John McCarthy’s group at MIT and the early Maclisp and Interlisp communities established the pattern; it spread to every major interactive language afterward. Harold Abelson and Gerald Jay Sussman’s Structure and Interpretation of Computer Programs (MIT Press, 1985) codified the REPL as a teaching substrate and popularized it across computer-science curricula.
Python’s interactive interpreter, Node’s shell, and the IPython and Jupyter projects are the modern general-purpose REPLs that most working programmers encounter. Fernando Pérez’s IPython work (starting in 2001) pushed the pattern toward rich display, persistent kernel state, and first-class tooling integration — the direct ancestors of the agentic coding REPL’s slash commands, approval prompts, and transcript displays.
The application of the REPL shape to coding agents is a 2024-2026 development. Anthropic’s Claude Code documentation describes the agent as an “interactive session” without naming the shape as a REPL; the naming gap closed first in practitioner writing. The pattern’s recognition as the dominant agentic-coding UX emerged from the community observing that Claude Code, Aider, Codex CLI, and others had independently converged on the same interaction shape.
Further Reading
- Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs – the REPL as a teaching substrate; the first chapters model what a thoughtful read-eval-print loop looks like.
- Python’s Interactive Interpreter documentation – the most widely used modern REPL, and the reference point most working programmers already share.
- Jupyter Project documentation – the richest non-agentic REPL in common use, with persistent kernel state and extension points that prefigure slash commands, inline approvals, and transcript rendering.