Context Editing
Clear already-processed tool results and stale thinking from the context, replacing each with a short placeholder, so a long-running agent stays under its window ceiling without summarizing.
Also known as: Tool Result Clearing, Context Clearing.
Understand This First
- Context Window — the finite resource that context editing manages.
- Context Engineering — the parent discipline; context editing is one of its moves.
Context
Context editing keeps a long-running session under its context window limit by deleting material the agent has already used. Picture an agent forty tool calls into a task. Turns ago it ran a search that returned 3,000 lines, read the three files that mattered, and moved on. Those 3,000 lines still sit in the window, crowding out room for the work ahead. Context editing clears them: the raw output goes and a one-line marker takes its place, so the model still sees that a search happened without carrying its full weight.
The mechanism is server-side and automatic. You configure a trigger (when total tokens cross a threshold) and a keep count (how many recent tool results to preserve), and the platform prunes older results as the agent works. Two strategies are named consistently across the sources that describe this: tool result clearing drops the raw output of old tool calls once the agent has acted on them, and thinking-block clearing discards older extended-thinking traces while keeping recent reasoning. Both replace what they remove with a placeholder rather than a summary. That is the whole idea, and it is the lightest-touch of the three techniques for surviving a full window.
Problem
How do you keep a heavy-tool-use agent under its window ceiling when most of what fills the window is old output the agent will never look at again?
A long agentic task accumulates exhaust. Every search, every file read, every command leaves its full result in the conversation, and by the fiftieth turn the window is mostly a graveyard of payloads the agent consumed hours ago. You could summarize the whole history, but summarizing is lossy and expensive, and it flattens material the agent didn’t need to keep at all. You could route each payload to disk on the way in, but that means wrapping every tool before the session starts. Sometimes the cheapest move is the bluntest one: the old results are genuinely finished business, so just delete them. The question is how to do that safely, without dropping something the agent still needs.
Forces
- Bluntness vs. fidelity. Clearing is cheaper than summarizing and simpler than offloading, but it keeps nothing. A cleared result is gone, and the agent cannot fetch it back the way it could a file on disk.
- Recency is a proxy, not a guarantee. Keeping the most recent N results assumes old means finished. Usually true; occasionally a result from thirty turns ago is exactly what the agent needs on turn fifty.
- Cache interaction. Clearing a span near the front of the window invalidates every cached prefix after it. On a workload tuned for prompt caching, an aggressive clear can cost more in cache misses than it saves in window space.
- What the placeholder admits. The marker left behind tells the model that something was cleared. Too terse and the agent forgets the call ever happened; too detailed and you have reinvented summarization.
- Configuration is load-bearing. Trigger threshold, keep count, and which tools are exempt from clearing all have to be set for the specific task. The defaults rarely fit a workload that has a few results it must never lose.
Solution
Configure the platform to clear old tool results and stale thinking automatically once the window crosses a threshold, and pair it with a durable store for anything the agent must keep. Set a trigger (clear when total tokens exceed some ceiling), a keep count (preserve the most recent N tool results untouched), and an exemption list (tools whose output must never be cleared). As the agent works past the threshold, the platform drops older results and leaves a placeholder in each cleared span.
The decision that governs everything is clear vs. summarize vs. offload, and context editing is the answer when the old material is genuinely finished business.
- Clear (this pattern) when the old results are done: consumed, acted on, and unlikely to be revisited. It is the cheapest option, with no summarization pass, no wrapper, and no copy kept. The cost is that the material is gone for good.
- Summarize with Compaction when the history itself carries meaning the agent needs going forward: decisions made, dead ends ruled out, the shape of the work so far. Compaction distills that into a summary; context editing would throw it away.
- Offload with Context Offloading when a payload is large but might be needed again. Offloading writes it to disk and hands back a reference the agent can re-read; context editing keeps no such handle.
The three compose rather than compete. A production long-horizon agent often runs all three: offload the payloads it might revisit, clear the tool exhaust it won’t, and compact the conversation history when even the trimmed window fills. The move that makes clearing safe is pairing it with Memory: before a clear fires, the agent writes anything it needs to keep (a finding, a decision, a file path) into a durable store, so deletion removes only what was truly spent. Clearing without a memory discipline is where this pattern bites: the agent drops a result it turns out to need, and unlike compaction, there is no summary that even hints at what was there.
Two settings decide whether it helps or hurts. Set the keep count high enough that the agent’s recent working set survives every clear, and exempt any tool whose output is reference material the agent samples throughout the task, not exhaust it consumes once. And watch the prompt caching interaction: schedule clears so they fall at cache-friendly boundaries rather than firing mid-prefix on every turn.
Turn on the memory tool before you turn on clearing. Context editing is safe exactly to the degree that the agent has already written down what it needs to keep. Clearing without a place to save findings first is how a long session quietly loses the one result it can’t re-derive.
How It Plays Out
A support agent runs a hundred-turn troubleshooting session, calling a search tool on nearly every turn. Most searches return a few thousand tokens the agent scans once to pick the right document, then never reads again. With clearing configured to keep the last five tool results and trigger at 70% of the window, the session runs the full hundred turns without ever hitting the ceiling: each search result lives long enough to be used, then gets cleared to a placeholder as newer results push it out. Nothing was summarized and no wrapper was written; the old exhaust simply stopped taking up room. The vendor that shipped this mechanism reports roughly an 84% token reduction on a hundred-turn web-search evaluation and a meaningful quality lift from clearing alone, larger still when the memory tool is paired in. Treat those figures as a vendor-reported result on their benchmark, not a guarantee for your workload.
Now the contrast. A different agent is doing a multi-file refactor where each step depends on decisions made earlier: which module owns the new interface, why an alternative was rejected, what convention the last edit established. Here clearing is the wrong move. The value is in the history, not the payloads, so this session wants Compaction, which preserves the decisions in a summary. And a third agent runs a research task that pulls a few large reports it must compare against each other late in the session. Clearing would delete a report the moment a newer one arrives; this one wants Context Offloading, so each report goes to disk and stays retrievable. Same window pressure, three different answers, and the tool-result-clearing agent is the one whose old material was truly finished business.
Keep count set too low is the classic failure. If clearing preserves only the last two results but the agent’s current step needs the output of a call it made five turns ago, that output is already a placeholder, and the agent will confidently proceed on a gap it cannot see. Size the keep count to the task’s real working set, not to whatever squeezes the window smallest.
Consequences
Context editing is the cheapest way to extend a heavy-tool-use session. There is no summarization pass to pay for and no tool wrapper to build: you set a threshold and a keep count, and old exhaust stops accumulating. For workloads dominated by tool output the agent consumes once (search, retrieval, repeated lookups), it keeps a session under its ceiling with almost no engineering cost, and it composes cleanly with compaction and offloading rather than replacing them.
The cost is that clearing keeps nothing. A cleared result is gone, with no summary and no reference, so the pattern is only as safe as the memory discipline behind it. Get the keep count or the exemptions wrong and the agent drops something it needed, then proceeds on a gap that leaves no trace — the same quiet failure mode that makes lossy compaction dangerous, minus even the summary. Clearing also fights prompt caching: every clear near the front of the window invalidates the cached prefix after it, so on a cache-tuned workload an aggressive setting can spend more than it saves. Tune the trigger, the keep count, and the exemptions for the specific task; the defaults are a starting point, not an answer.
Related Articles
Sources
- The clear-versus-summarize distinction, and the two named strategies (tool result clearing and thinking-block clearing), were developed by Anthropic as a configurable capability on its Claude platform, documented alongside a cookbook on managing context through memory, compaction, and tool clearing. The vendor-reported evaluation numbers cited above come from that work.
- The surrounding discipline, which treats the context window as a finite resource to be curated rather than filled, comes from the body of context-engineering practice that emerged across the agent-building community through 2025. Clearing is one specific move within it.
- The pattern’s generality beyond a single vendor is visible in independent reimplementations of the same clear-and-placeholder mechanism for other model providers, which is a signal that clearing is a technique rather than one platform’s feature.