Compaction

Concept

A foundational idea to recognize and understand.

When the conversation outgrows the model’s memory, compaction distills what matters so the work can continue.

Understand This First

Context Window — compaction exists because context windows have hard limits.
Harness (Agentic) — most harnesses perform compaction automatically.

Compaction is the summarization of prior conversation history to free up space in the context window. Older parts of a conversation (early explorations, dead-end approaches, resolved sub-problems) get condensed into a short summary that captures decisions, current state, and remaining work. The summary, followed by the most recent exchanges that are still actively relevant, then stands in for the full history.
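
The replacement step can be sketched as a function over a message list. This is a minimal sketch, not any harness's actual implementation: `Message`, `compact`, and the caller-supplied `summarize` function (in practice, usually a model call) are hypothetical names, and keeping a fixed number of recent messages is an assumed policy. Real harnesses may choose the retained tail by token count or relevance instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    role: str     # e.g. "user", "assistant", "system"
    content: str


def compact(history: list[Message],
            summarize: Callable[[list[Message]], str],
            keep_recent: int = 4) -> list[Message]:
    """Condense all but the last `keep_recent` messages into one summary message.

    `summarize` is hypothetical and supplied by the caller, e.g. a model call
    that returns decisions, state, remaining work, and constraints as text.
    """
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to condense yet
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    summary = Message("system", "Summary of earlier work:\n" + summarize(older))
    # Destructive: the older messages are not carried forward.
    return [summary] + recent
```

Everything outside the retained tail survives only to the extent that `summarize` mentions it, which is exactly where the loss described below comes from.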

The harness or the agent itself performs the compaction. Some harnesses do it automatically when context usage approaches a configurable threshold; Claude Code, for instance, keeps a reserve-token floor and compacts whenever the remaining free space threatens to dip below it. Other harnesses require an explicit request (“summarize our progress so far and continue”), and a few platforms expose compaction as an API endpoint that any harness can call.
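
An automatic trigger of this kind reduces to simple arithmetic. The sketch below shows two hypothetical variants, a reserve-floor check and a percentage-of-window check. The function names and every number used with them are illustrative assumptions, not Claude Code's actual settings or code.

```python
def should_compact(used_tokens: int, window_tokens: int, reserve_tokens: int,
                   next_turn_estimate: int = 0) -> bool:
    """Reserve-floor trigger: compact when the free space left after the
    (estimated) next turn would fall below the reserved token floor."""
    free_after_next_turn = window_tokens - used_tokens - next_turn_estimate
    return free_after_next_turn < reserve_tokens


def should_compact_ratio(used_tokens: int, window_tokens: int,
                         threshold: float = 0.8) -> bool:
    """Percentage trigger: compact once usage reaches a fraction of the window."""
    return used_tokens >= threshold * window_tokens
```

With an assumed 200,000-token window and 40,000-token reserve, `should_compact(150_000, 200_000, 40_000)` is `False` (50,000 tokens free), while passing `next_turn_estimate=15_000` tips it to `True`. Estimating the next turn is what lets a harness compact before it overruns the window rather than after.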

A good compaction is a faithful, lossy snapshot. It captures four things: the decisions made (what approaches were chosen and why, what alternatives were rejected); the current state (what files have been modified, what tests pass or fail, what the code looks like now); the remaining work (what still needs doing, in what order); and the key constraints (any conventions established during the conversation that must keep being honored). The summary is shorter than the original by an order of magnitude, but a competent agent reading it should still be able to pick up the work without re-litigating settled questions.
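
Those four elements can double as a template. The sketch below uses a hypothetical `CompactionSummary` class to render them as labeled sections and to flag any that came back empty, a cheap first check before trusting a summary. The section names and plain-text layout are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionSummary:
    # Hypothetical schema mirroring the four elements of a good compaction.
    decisions: list[str] = field(default_factory=list)       # choices made and why; rejected alternatives
    current_state: list[str] = field(default_factory=list)   # modified files, test status, code as it stands
    remaining_work: list[str] = field(default_factory=list)  # next steps, in order
    constraints: list[str] = field(default_factory=list)     # conventions that must keep being honored

    def _sections(self) -> list[tuple[str, list[str]]]:
        return [
            ("Decisions", self.decisions),
            ("Current state", self.current_state),
            ("Remaining work", self.remaining_work),
            ("Constraints", self.constraints),
        ]

    def missing_sections(self) -> list[str]:
        """Names of empty sections. An empty "Remaining work" can be legitimate
        when the task is done; the others usually signal a lossy summary."""
        return [name for name, items in self._sections() if not items]

    def render(self) -> str:
        """Plain-text form suitable for standing in for the condensed history."""
        lines: list[str] = []
        for name, items in self._sections():
            lines.append(f"{name}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                lines.append("- (none recorded)")
        return "\n".join(lines)
```

Rendering an explicit "(none recorded)" line rather than omitting a section keeps the gap visible to whoever reviews the summary.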

The term draws an analogy from database compaction (merging and deduplicating stored data), applied to the conversational context that accumulates during agent work. The mechanism is a destructive edit to working memory, and the loss is real: any fact discarded at compaction time may turn out to matter later, and the agent won’t flag what it lost.

Why It Matters

Long agentic tasks routinely outrun the context window. A multi-file refactor, an extended debugging session, a feature implementation that spans many components: each can fill the window before the work is done. When the window saturates, the agent’s output degrades in characteristic ways: it forgets earlier decisions, contradicts its own work, or loses track of the overall plan. Compaction is what keeps a long task from collapsing under its own history.

Without the concept, practitioners have no shared vocabulary for that moment. They describe it loosely (“the agent forgot what we were doing”), they reach for ad-hoc remedies (“start a new chat”), and they conflate context exhaustion with model unreliability. The vocabulary matters because compaction sits next to its alternatives: thread-per-task starts fresh and accepts the cost of losing context; compaction holds on at the cost of summarization loss. The naming forces the tradeoff into view.

Compaction also names the seam where agentic systems differ most from earlier conversational AI. The window is finite; long work is not. Every framework that runs an agent across hours of work makes a compaction decision, whether explicitly or by accident. Naming it makes the decision designable.

How to Recognize It

Three signals tell you compaction is in play, or that it should be:

The agent starts repeating itself. A debugging session is ninety minutes in, and the agent suggests an approach it already tried an hour ago. That’s the tell that early context has scrolled out of effective attention. The fix is to compact, either automatically (if the harness supports it) or with an explicit request to summarize progress before continuing.

A summary block appears mid-conversation. Many harnesses surface compaction visually: a folded “earlier in this conversation” block, a notice that the assistant has summarized prior turns, a token-count reset. If you see one, the harness has compacted. Read the summary before trusting the next exchange: a compaction is a destructive edit, and the agent will not flag what it lost.

Context usage falls despite continued work. If you can see the token meter (in Claude Code, in API instrumentation, in a harness sidebar), watch what it does over a long session. Steady growth followed by a sharp drop is compaction firing. Smooth growth past 80% of the window without a drop usually means no compaction is configured, and you’re heading toward a hard wall.
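
Reading the meter can be automated if your harness exposes usage readings. Given a series of token counts, the hypothetical helpers below flag sharp drops as likely compaction events and warn when usage passes 80% of the window with no drop recorded all session. The 30% drop threshold is an arbitrary assumption; tune it to how your harness reports usage.

```python
def compaction_events(usage: list[int], drop_fraction: float = 0.3) -> list[int]:
    """Indices of readings where usage fell by more than `drop_fraction`
    of the previous reading: the sharp drop that signals compaction firing."""
    events = []
    for i in range(1, len(usage)):
        prev, cur = usage[i - 1], usage[i]
        if prev > 0 and (prev - cur) / prev > drop_fraction:
            events.append(i)
    return events


def heading_for_wall(usage: list[int], window_tokens: int,
                     warn_ratio: float = 0.8) -> bool:
    """True when the latest reading is past `warn_ratio` of the window and no
    drop has been seen all session, suggesting compaction is not configured."""
    if not usage:
        return False
    return usage[-1] > warn_ratio * window_tokens and not compaction_events(usage)
```

For readings of `[20_000, 60_000, 110_000, 150_000, 30_000, 70_000]`, `compaction_events` reports index 4, the point where usage fell from 150,000 to 30,000.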

Automatic and manual triggers each carry costs. Automatic compaction stays out of your way but may quietly discard something you wanted to keep. Manual compaction keeps you in the loop at the cost of interrupting flow. Either way, review the summary before you trust it.

Tip

Don’t wait for the context window to fill. Periodically ask the agent to summarize progress during long tasks. These mid-session checkpoints catch misunderstandings early and give you a recovery point if something goes wrong later.

How It Plays Out

A developer is debugging a concurrency issue that spans five modules. After ninety minutes and hundreds of messages, the agent starts repeating suggestions it made an hour ago. That’s the tell: the early context has scrolled out of effective memory. She asks the agent to compact: “Summarize what we’ve tried, what we’ve learned, and what we should try next.” The summary captures three failed hypotheses, two promising leads, and the current state of the code. The conversation picks up from the summary with renewed focus.

Automatic compaction is less dramatic but more common. A harness detects that the context has reached eighty percent capacity and compacts in the background. It keeps the current task description, the list of modified files, recent test results, and the active plan. Older exchanges get condensed to a few sentences each. The agent keeps working, and the developer may not notice it happened until the next time they scroll back and find the early turns are gone.

Example Prompt

“We’ve been working on this for a while and the context is getting long. Summarize what we’ve accomplished, what’s still broken, and what approach we should try next. Then continue from that summary.”

A platform team running a long-horizon agent job (say, a four-hour code audit) bakes compaction into the harness explicitly. They set the threshold low (60% of the window), capture the summary into a progress log at each compaction event, and treat the log as the durable record. When the agent finishes, the log is the artifact; the in-conversation summaries are scaffolding.
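
That setup can be sketched as a small hook: check the fill ratio, and at or above 60% produce a summary and append it to a JSON Lines progress log on disk. `maybe_compact`, `record_compaction`, `read_progress_log`, the record fields, and the no-argument `summarize` callable are all hypothetical. A real harness would also replace the live history with the summary, which this sketch leaves out.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

COMPACTION_THRESHOLD = 0.60  # compact early, as in the platform-team scenario


def record_compaction(log_path: Path, summary: str,
                      used_tokens: int, window_tokens: int) -> dict:
    """Append one compaction event to a JSON Lines progress log."""
    record = {
        "timestamp": time.time(),
        "fill_ratio": round(used_tokens / window_tokens, 3),
        "summary": summary,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_progress_log(log_path: Path) -> list[dict]:
    """Load every recorded summary, oldest first: the durable record of the job."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def maybe_compact(used_tokens: int, window_tokens: int,
                  summarize: Callable[[], str], log_path: Path) -> Optional[dict]:
    """If the window is at least COMPACTION_THRESHOLD full, summarize and log.
    Replacing the live history with the summary is left to the harness."""
    if used_tokens / window_tokens < COMPACTION_THRESHOLD:
        return None
    return record_compaction(log_path, summarize(), used_tokens, window_tokens)
```

Appending at every compaction event yields an ordered trail of summaries that outlives the conversation, which is what lets the log serve as the final artifact.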

Consequences

Compaction extends the useful life of a conversation, letting complex tasks proceed without losing all accumulated context. It’s most valuable for work that resists decomposition into independent subagent subtasks: genuinely sequential work where each step depends on what came before.

The cost is information loss. Summarization discards detail. A fact that seemed unimportant at compaction time may prove critical later. The mitigations are mechanical: keep summaries thorough about decisions and state even at the expense of verbosity, and maintain a progress log outside the conversation as a durable backup the agent can re-read.

Compaction also shifts the failure mode. Without it, long tasks fail loudly when the window saturates. With it, they fail quietly when a critical detail is summarized away and the agent proceeds on an incomplete picture. The loud failure is easier to notice; the quiet one is more dangerous. Reviewing the summary before trusting the next stretch of work is the discipline that pays this cost down.

Sources

The concept of compaction as conversation summarization emerged from the agentic coding community in 2024–2025 as context windows became the primary bottleneck in extended agent sessions. Anthropic’s Claude Code introduced automatic compaction with configurable thresholds, establishing the pattern of harness-managed context recycling.