
Compaction

Concept

A foundational idea to recognize and understand.

When the conversation outgrows the model’s memory, compaction distills what matters so the work can continue.

Understand This First

Context

Compaction is the summarization of prior conversation history to free up space in the context window. It lets you extend the useful life of a conversation when the thread-per-task approach won’t work, either because the task is genuinely long-running or because starting over would lose too much hard-won context.

The harness or the agent itself performs the compaction. Older parts of the conversation (early explorations, dead-end approaches, resolved sub-problems) get condensed into a summary that captures decisions, current state, and remaining work. That summary replaces the full history, paired with the most recent exchanges that are still actively relevant.
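The replacement step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any particular harness's implementation. Messages are assumed to be plain `(role, text)` pairs. `summarize` is a hypothetical stand-in for the model call that condenses older turns. Placing the summary in a user-role message is also an assumption made for this sketch.

```python
from typing import Callable

# Hypothetical, simplified message shape: (role, text).
Message = tuple[str, str]


def compact(
    history: list[Message],
    keep_recent: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace all but the last `keep_recent` messages with one summary message."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to condense
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    summary = summarize(older)  # hypothetical model call
    # The summary stands in for the condensed turns; recent exchanges stay verbatim.
    # Using the "user" role for the summary is an assumption of this sketch.
    return [("user", f"Summary of earlier conversation:\n{summary}")] + recent
```

For example, `compact(history, keep_recent=2, summarize=...)` on a ten-message history returns three messages: one summary covering the first eight, followed by the last two unchanged.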

Problem

How do you continue a productive conversation with an agent when the context window is full but the task isn’t done?

Long, complex tasks (multi-file refactorings, extended debugging sessions, feature implementations that span many components) can exhaust the context window before the work is complete. When this happens, the agent’s output quality degrades: it forgets earlier decisions, contradicts its own work, or loses track of the overall plan. Starting a completely fresh thread risks losing context about what’s been tried, what’s worked, and what remains.

Forces

  • Context window limits are hard. Once the window is full, something has to be dropped before anything new fits.
  • Long tasks exist. Not everything fits neatly into a single-thread conversation.
  • Context quality degrades gradually. The agent doesn’t announce that it’s forgetting; it just gets worse.
  • Summary loss. Any summarization discards detail that might later prove important.

Solution

When a conversation approaches context limits, compact the history: summarize what’s been accomplished, what decisions have been made, what the current state is, and what work remains. Replace the full conversation history with this summary plus the most recent, actively relevant exchanges.

Good compaction captures:

Decisions made. What approaches were chosen and why. What alternatives were considered and rejected.

Current state. What files have been modified, what tests are passing or failing, what the code looks like now.

Remaining work. What still needs to be done, in what order.

Key constraints. Any constraints or conventions established during the conversation that the agent needs to continue following.
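One way to keep any of those four categories from being skipped is to give the summary a fixed shape. The sketch below uses a hypothetical structure, not a format any harness is known to require. The field names simply mirror the list above, and `render` produces text that could be placed at the head of the compacted conversation.

```python
from dataclasses import dataclass, field


@dataclass
class CompactionSummary:
    # Hypothetical structure; each field mirrors one category a good compaction captures.
    decisions: list[str] = field(default_factory=list)       # choices made, with reasons
    current_state: list[str] = field(default_factory=list)   # modified files, test status
    remaining_work: list[str] = field(default_factory=list)  # next steps, in order
    constraints: list[str] = field(default_factory=list)     # conventions to keep following

    def render(self) -> str:
        """Render the summary as plain text, one titled section per category."""
        sections = [
            ("Decisions made", self.decisions),
            ("Current state", self.current_state),
            ("Remaining work", self.remaining_work),
            ("Key constraints", self.constraints),
        ]
        lines: list[str] = []
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                # An explicit placeholder keeps an empty category visible on review.
                lines.append("- (none recorded)")
        return "\n".join(lines)
```

Rendering empty categories as "(none recorded)" is a deliberate choice. It makes a missing section stand out when a person reviews the summary, instead of letting it silently disappear.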

Some harnesses compact automatically when the context approaches its limit. Claude Code, for instance, triggers compaction at a configurable threshold and condenses the conversation without interrupting your workflow. The threshold is usually expressed as a reserve token floor: keep at least this much headroom free, and compact whenever the running total threatens to dip below it. Other harnesses require you to request compaction explicitly (“summarize our progress so far and continue”), and a few platforms expose it as an API endpoint that any harness can call.
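The reserve-floor rule reduces to a single comparison. In this sketch, the window size and reserve are illustrative parameters, not Claude Code's actual defaults or setting names. Token counts are assumed to come from whatever counter the harness already uses.

```python
def should_compact(tokens_used: int, window_size: int, reserve_floor: int) -> bool:
    """Return True when free headroom has fallen below the reserve floor.

    Illustrative sketch: `tokens_used` should include the expected size of the
    next request, so the check fires before the window is actually exhausted.
    """
    if not 0 <= reserve_floor < window_size:
        raise ValueError("reserve_floor must be in [0, window_size)")
    headroom = window_size - tokens_used
    # "Keep at least this much headroom free": compact only once headroom dips below it.
    return headroom < reserve_floor
```

With a 200,000-token window and a 40,000-token reserve, `should_compact(150_000, 200_000, 40_000)` is `False`, and `should_compact(165_000, 200_000, 40_000)` is `True`. Compaction triggers once usage passes 160,000 tokens.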

Automatic and manual triggers each carry a real cost. Automatic compaction stays out of your way but may quietly discard something you wanted to keep. Manual compaction keeps you in the loop at the cost of interrupting flow. Either way, review the summary before you trust it. A compaction is a destructive edit to your working memory, and the agent will not flag what it lost.
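Part of that review can be mechanical. The hypothetical check below takes facts you know must survive, such as the paths of modified files (the ones shown are illustrative), and reports any that the summary never mentions. Plain substring matching is a deliberate simplification. It catches omissions, but it cannot tell whether a fact that is mentioned is described correctly.

```python
def missing_facts(summary: str, must_keep: list[str]) -> list[str]:
    """Return the required facts that do not appear anywhere in the summary text."""
    lowered = summary.lower()
    # Case-insensitive substring check; preserves the order of `must_keep`.
    return [fact for fact in must_keep if fact.lower() not in lowered]
```

For a summary reading "Modified scheduler.py and worker.py; tests in test_queue.py still fail.", calling `missing_facts(summary, ["scheduler.py", "worker.py", "locks.py"])` returns `["locks.py"]`. That result tells you the summary dropped one file you know was touched.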

Tip

Don’t wait for the context window to fill. Periodically ask the agent to summarize progress during long tasks. These mid-session checkpoints catch misunderstandings early and give you a recovery point if something goes wrong later.

How It Plays Out

A developer is debugging a concurrency issue that spans five modules. After ninety minutes and hundreds of messages, the agent starts repeating suggestions it made an hour ago. That’s the tell: the early context has scrolled out of effective memory. The developer asks the agent to compact: “Summarize what we’ve tried, what we’ve learned, and what we should try next.” The summary captures three failed hypotheses, two promising leads, and the current state of the code. The conversation picks up from the summary with renewed focus.

Example Prompt

“We’ve been working on this for a while and the context is getting long. Summarize what we’ve accomplished, what’s still broken, and what approach we should try next. Then continue from that summary.”

Automatic compaction is less dramatic but more common. A harness detects that the context has reached eighty percent capacity and compacts in the background. It keeps the current task description, the list of modified files, recent test results, and the active plan. Older exchanges get condensed to a few sentences each. The agent keeps working. The developer may not even notice it happened.
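A harness loop along those lines might look like the sketch below. The 80% trigger comes from the example above. The word-count token estimate, the `pinned` flag that protects items like the task description, and the `summarize` callback are all assumptions made for illustration. A real harness would use the model's tokenizer and its own message format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Msg:
    role: str
    text: str
    pinned: bool = False  # hypothetical flag: task description, active plan, etc.


def estimate_tokens(messages: list[Msg]) -> int:
    # Crude stand-in for a real tokenizer: roughly one token per word.
    return sum(len(m.text.split()) for m in messages)


def maybe_compact(
    messages: list[Msg],
    capacity: int,
    summarize: Callable[[list[Msg]], str],  # hypothetical model call
    trigger: float = 0.8,
    keep_recent: int = 4,
) -> list[Msg]:
    """Compact once estimated usage reaches `trigger` of `capacity`."""
    if estimate_tokens(messages) < trigger * capacity:
        return messages  # under the threshold: leave history untouched
    split = max(len(messages) - keep_recent, 0)
    older, recent = messages[:split], messages[split:]
    pinned = [m for m in older if m.pinned]           # kept verbatim
    condensable = [m for m in older if not m.pinned]  # folded into the summary
    if not condensable:
        return messages
    summary = Msg("user", "Earlier work, condensed: " + summarize(condensable))
    return pinned + [summary] + recent
```

Pinning is what lets the task description and active plan survive every compaction intact. Only the unpinned exploratory turns get condensed.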

Consequences

Compaction extends the useful life of a conversation, letting complex tasks proceed without losing all accumulated context. It's most valuable for tasks that resist being split into independent subtasks for subagents, where the work is genuinely sequential and each step depends on the previous one.

The cost is information loss. Summarization discards detail. A fact that seemed unimportant at compaction time may prove critical later. You can mitigate this by keeping summaries thorough about decisions and state, even at the expense of verbosity, and by maintaining a progress log outside the conversation as a durable backup.
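Keeping that durable backup can be as simple as appending each summary to a file before the conversation history is replaced. The file name `PROGRESS.md` and the timestamped-heading format are hypothetical choices for this sketch. The point is only that the full summary lands somewhere compaction cannot touch.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_to_progress_log(log_path: Path, summary: str) -> None:
    """Append a compaction summary to a log file that lives outside the conversation."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier entries are never overwritten, so each summary is preserved.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(f"## Compaction at {stamp}\n\n{summary.strip()}\n\n")
```

Calling `append_to_progress_log(Path("PROGRESS.md"), summary_text)` just before each compaction leaves a running history. You can reread that history later if a detail the summary dropped turns out to matter.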

Sources

The concept of compaction as conversation summarization emerged from the agentic coding community in 2024–2025 as context windows became the primary bottleneck in extended agent sessions. Anthropic's Claude Code introduced automatic compaction with configurable thresholds, establishing the pattern of harness-managed context recycling. The term borrows from database compaction (merging and deduplicating stored data) and applies the idea to the conversational context that accumulates during agent work.

  • Depends on: Context Window – compaction addresses context window limits.
  • Contrasts with: Thread-per-Task – starting fresh is the alternative to compacting.
  • Enables: Progress Log – compaction summaries can feed the progress log.
  • Uses: Context Engineering – compaction is a context engineering technique.
  • Depends on: Harness (Agentic) – many harnesses perform compaction automatically.