Code Mode
Instead of showing an agent every tool’s schema and having it emit JSON calls one step at a time, give it a small API and let it write code that calls those tools inside a sandbox.
Also known as: Code-Mode MCP, Code Execution with MCP, Tools as Code.
Understand This First
- MCP (Model Context Protocol) – the tool-exchange protocol that Code Mode restructures.
- Tool – the callable capability being wrapped.
- Sandbox – where the model’s generated code actually runs.
- Context Window – the bounded working memory the pattern conserves.
- Context Rot – the failure mode Code Mode mitigates at scale.
Context
A modern agent can connect to hundreds or thousands of tools through MCP servers. Each tool comes with a name, a description, and an input schema, and the agent’s harness loads these definitions into the context window so the model knows what is available. For small tool sets this is fine. For an enterprise surface with a few thousand endpoints, it doesn’t stay fine for long.
The classic MCP loop works like a phone call: the agent picks one tool, emits a JSON call, waits for the full response to come back through the model, reads it, picks the next tool. Every intermediate result passes through the context window. Every decision costs a round trip. When the model needs to join five API responses, filter the result, and keep only the three rows that matter, it must ferry all of that data through its own brain.
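To make the round-trip cost concrete, here is a deliberately simplified sketch of that loop. The names and shapes are illustrative, not the real MCP wire format: the point is only that each tool result is appended verbatim to the context the model sees on its next turn.

```typescript
// Simplified sketch of the one-call-per-turn loop. Names and payload
// shapes are hypothetical, not the actual MCP wire format.
type ToolCall = { name: string; arguments: Record<string, unknown> };

const context: string[] = []; // everything the model "sees" next turn

const tools: Record<string, (args: Record<string, unknown>) => unknown> = {
  // A hypothetical tool returning a large payload.
  listOrders: () =>
    Array.from({ length: 10_000 }, (_, i) => ({ id: i, total: i % 500 })),
};

function classicTurn(call: ToolCall): void {
  const result = tools[call.name](call.arguments);
  // The full payload round-trips through the model's context window,
  // even if only three rows of it will ever matter.
  context.push(JSON.stringify(result));
}

classicTurn({ name: "listOrders", arguments: {} });
// context[0] now holds the entire 10,000-row payload as model-visible text.
```

Five such turns means five full payloads in context, which is exactly the cost Code Mode is designed to avoid.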
Code Mode sits at the boundary between the harness and the tool layer. It asks a different question: what if the agent wrote a short program instead of a sequence of JSON calls? That’s the whole idea.
Problem
How do you give an agent access to a large surface of tools without drowning it in schemas, without piping every intermediate result back through the model, and without losing the ability to compose multiple calls into a single coherent step?
The classic tool-use pattern breaks down at scale in three ways. Loading thousands of tool schemas eats a huge fraction of the context window before the agent has done any work. Piping raw API responses back through the model turns a 150,000-token payload into 150,000 tokens of context rot. And composing five tool calls into one logical action (fetch orders, fetch customers, join them, filter by date, return the top three) costs five full round trips through the model, each with its own opportunity to wander off.
Forces
- Context economics. Every tool schema and every intermediate response competes for space with the agent’s actual working memory. Schemas alone can cost over a million tokens on realistic enterprise surfaces.
- Model skill asymmetry. Modern models are markedly better at writing code than at composing long chains of step-by-step JSON tool calls. Training corpora have more code than tool-call transcripts.
- Composition and filtering. Most useful work is not a single tool call. It is fetch, join, filter, reduce. Forcing that through one-call-per-turn is expensive and brittle.
- Safety and auditability. Running model-written code is a different risk profile than running discrete, pre-audited tool calls. The sandbox becomes load-bearing.
- Discoverability. If the agent cannot see every tool’s schema up front, it needs another way to find out what is available when it needs it.
Solution
Expose tools to the agent as a small programming-language API (typically TypeScript), and give the model two operations: one to search for available tools, and one to execute a block of code against them inside an isolated sandbox. The model produces a short program. The harness runs it. Intermediate data stays in the sandbox. Only the distilled result returns to the context window.
Concretely, the harness provides two tools in the classic MCP sense:
- search(query): returns a compact list of relevant tool signatures, on demand. The model does not need every schema up front; it looks up what it needs when it needs it.
- execute(code): runs a TypeScript snippet inside a locked-down runtime. The snippet calls tool functions directly, chains their results, filters and joins in memory, and returns a value.
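The harness side of those two operations can be sketched in a few lines. This is an in-process illustration with made-up tool names and signatures; a real harness would run execute inside an isolated sandbox, not via the Function constructor in its own process.

```typescript
// Sketch of the two harness-side operations. Registry contents and tool
// names are hypothetical; a production harness isolates `execute`.
type ToolFn = (args: any) => Promise<unknown>;

const registry: Record<string, { signature: string; fn: ToolFn }> = {
  "orders.list": {
    signature: "orders.list({ since: string }): Promise<Order[]>",
    fn: async ({ since }) => [{ id: "c1", total: 250, since }],
  },
  "customers.batchGet": {
    signature: "customers.batchGet(ids: string[]): Promise<Record<string, Customer>>",
    fn: async (ids) => Object.fromEntries(ids.map((id: string) => [id, { id }])),
  },
};

// search(query): return compact signatures for matching tools, on demand.
function search(query: string): string[] {
  return Object.entries(registry)
    .filter(([name]) => name.includes(query))
    .map(([, t]) => t.signature);
}

// execute(code): run a snippet against a namespaced `tools` object and
// return only its distilled result to the caller.
async function execute(code: string): Promise<unknown> {
  const tools = {
    orders: { list: registry["orders.list"].fn },
    customers: { batchGet: registry["customers.batchGet"].fn },
  };
  const fn = new Function("tools", `return (async () => { ${code} })();`);
  return fn(tools);
}
```

With this shape, search("orders") returns one short signature line instead of a full schema, and execute returns only whatever the snippet’s final return statement produces.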
The model writes something like:
const orders = await tools.orders.list({ since: "2026-04-01" });
const customers = await tools.customers.batchGet(
  orders.map(o => o.customerId)
);
return orders
  .map(o => ({ ...o, customer: customers[o.customerId] }))
  .filter(o => o.total > 100)
  .slice(0, 3);
That snippet runs once. The 10,000-row orders list and the 10,000-row customer list never touch the context window. Only the three-row result does.
The sandbox is the load-bearing part of the design. Generated code is arbitrary code, and if it can escape its runtime it can reach anything the harness can reach. The usual ingredients (process isolation, no filesystem access, no ambient network, strict timeouts, capability-scoped APIs) are not optional here. They are the pattern.
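Two of those ingredients, capability-scoped APIs and strict timeouts, can at least be sketched in-process. This is illustrative only: real isolation requires a separate process or runtime, and the names here are hypothetical.

```typescript
// Sketch of two sandbox ingredients: a capability-scoped API surface
// (only explicitly granted tools are reachable) and a hard timeout.
// In-process only; real isolation needs a separate process or runtime.
type Capability = (args: any) => Promise<unknown>;

function scopedTools(granted: Record<string, Capability>): Record<string, Capability> {
  // Copy then freeze, so generated code cannot attach or swap capabilities.
  return Object.freeze({ ...granted });
}

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((_, reject) =>
      setTimeout(() => reject(new Error(`sandbox timeout after ${ms}ms`)), ms),
    ),
  ]);
}

// Usage: grant exactly the capabilities this one task needs, nothing ambient.
const sandboxTools = scopedTools({
  "orders.list": async () => [{ id: "c1", total: 250 }],
});
```

Everything the generated code can reach flows through sandboxTools; if a capability is not in that object, it does not exist from the snippet’s point of view.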
When you adopt Code Mode, start by putting just one or two tools behind the sandbox and keeping the rest on the classic MCP path. Watch what the agent writes. The generated code is a useful signal about whether your API shapes are sensible or whether the model is fighting them.
How It Plays Out
A small team runs a customer-support agent against an internal platform with about 2,400 endpoints exposed through MCP. The classic loop works for simple tickets and falls over the moment the agent needs to cross-reference accounts, invoices, and usage logs. They move to Code Mode: the agent now calls search("invoices overdue"), gets back three relevant tool signatures, writes a fifteen-line TypeScript block that joins the three data sets, and returns a short summary. The daily token bill drops by roughly 80% on the multi-step tickets, and response latency falls because the model stops narrating every intermediate step.
A different team tries the same move and discovers a subtler benefit. Their agent used to get lost in long tool chains; a mistake in step two would quietly poison steps three through seven. With Code Mode, the agent writes the whole plan at once, in code, and the sandbox either returns a clean value or throws an error the agent can actually read. Debugging becomes “read this stack trace” instead of “reconstruct what the agent was thinking six turns ago.” That’s a real change in how the team spends its time.
The sandbox is the whole security story. An agent that can write code has every capability the runtime grants it: network access, environment variables, filesystem handles. Don’t let Code Mode graduate from a prototype to a production surface until you’ve decided, explicitly and in writing, what the sandbox can and can’t touch.
Consequences
Benefits.
- Token usage drops sharply on complex tasks, often by more than half, and sometimes by 80% or more when the work is genuinely multi-step.
- The agent composes rather than narrates. A join, a filter, and a reduction become one step instead of five.
- Intermediate data stays out of the context window, which protects against context rot on long-running tasks.
- The generated code is inspectable. A human reviewer can read a fifteen-line program much faster than a seven-turn JSON call trace.
Liabilities.
- The sandbox becomes a critical component. An escape means the agent has free run of whatever the runtime can reach.
- Per-tool approval policies become harder. When five tools are called inside one execute(), the traditional approval policy that gates each call individually doesn’t cleanly apply.
- Failure modes shift. Instead of a bad tool call, you now face runtime errors, timeouts, non-terminating loops, and the occasional syntax mistake.
- Observability changes shape. Intermediate tool calls inside execute() still need logging, but they happen in a different process; your tracing story needs to cover both the model turn and the sandbox run.
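One way the sandbox half of that tracing story might look (a sketch with hypothetical names): wrap every capability handed to execute() so each intermediate call is recorded before it runs, then ship the log alongside the sandbox result.

```typescript
// Sketch: a tracing wrapper around the capabilities the sandbox receives.
// Names and event shape are illustrative.
type ToolFn = (args: any) => Promise<unknown>;

interface TraceEvent { tool: string; args: unknown; at: number }

function traced(
  tools: Record<string, ToolFn>,
  log: TraceEvent[],
): Record<string, ToolFn> {
  const out: Record<string, ToolFn> = {};
  for (const [name, fn] of Object.entries(tools)) {
    out[name] = async (args) => {
      // Record the call before it runs, so even a crashing call is logged.
      log.push({ tool: name, args, at: Date.now() });
      return fn(args);
    };
  }
  return out;
}
```

The harness hands traced(tools, log) to the sandbox instead of the raw capabilities; after the run, the log covers the sandbox side while the harness’s normal tracing covers the model turn.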
Related Patterns
- Refines: MCP (Model Context Protocol) – Code Mode is a way of consuming MCP tools that restructures how schemas and results move through the context window.
- Uses: Tool – the underlying capabilities Code Mode composes.
- Uses: Sandbox – the isolation boundary that makes running model-written code tolerable.
- Complements: Retrieval – both pull only what is needed into the context window, one for documents, the other for tool results.
- Complements: Handoff – an execute() call is a structured handoff to a code interpreter and back.
- Mitigates: Context Rot – intermediate data never reaches the model.
- Constrained by: Least Privilege – the sandbox must grant only the capabilities the generated code actually needs.
- Contrasts with: classic per-call JSON tool use, where every schema loads up front and every result round-trips through the model.
Sources
Cloudflare introduced the name in two March 2026 engineering posts, “Code Mode: the better way to use MCP” and “Code Mode: give agents an entire API in 1,000 tokens,” which argued the architectural case and reported token savings of roughly 32% on simple tasks and 81% on complex batch operations against their own 2,500-endpoint MCP surface.
Anthropic’s engineering note “Code execution with MCP: building more efficient AI agents” makes the same structural argument from a model-provider vantage point, framing code execution as the natural next step for agents wiring together large tool sets.
The broader vocabulary (search-and-execute, sandbox-bounded tool composition, TypeScript as the agent’s working surface) has been picked up across the agentic tooling community through 2026, including the universal-tool-calling-protocol project, which ships a library that adapts MCP and UTCP tools into code-mode form for harnesses outside Cloudflare’s stack.
Further Reading
- Cloudflare, “Code Mode: the better way to use MCP” (https://blog.cloudflare.com/code-mode/) – the original framing, with benchmarks and architectural diagrams.
- Anthropic, “Code execution with MCP: building more efficient AI agents” (https://www.anthropic.com/engineering/code-execution-with-mcp) – the model-provider perspective on why code execution scales where JSON tool calls do not.
- universal-tool-calling-protocol/code-mode on GitHub (https://github.com/universal-tool-calling-protocol/code-mode) – a portable implementation that works outside Cloudflare’s runtime.