Context Window
Understand This First
- Model — the context window is a property of the model.
Context
In agentic work, the context window is the bounded working memory available to the model. Everything the model can “see” during a single interaction (the system prompt, the conversation history, any files or documents provided, and the model’s own previous responses) must fit within this window. It is measured in tokens (roughly, word fragments), and its size varies by model, from tens of thousands to over a million tokens.
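A common rule of thumb (an approximation, not a real tokenizer) is that one token corresponds to roughly four characters of English text. A minimal sketch of budget estimation under that assumption:

```typescript
// Rough token estimate using the common ~4 characters-per-token
// heuristic for English text. Real tokenizers are model-specific;
// use the model vendor's tokenizer for exact counts.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check whether a set of context pieces is likely to fit a window.
function fitsWindow(pieces: string[], windowTokens: number): boolean {
  const total = pieces.reduce((sum, p) => sum + estimateTokens(p), 0);
  return total <= windowTokens;
}
```

The heuristic is only for quick sanity checks; code and non-English text tokenize at noticeably different rates.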
The context window is the single most important constraint in agentic coding. It determines how much code an agent can consider at once, how long a conversation can run before losing coherence, and how much guidance you can provide in instruction files and prompts.
Problem
How do you work effectively with an agent when its memory is bounded and everything outside the window is invisible?
The context window creates an asymmetry: you, the human, can walk away and come back with your full memory intact. The model can’t. Once information falls outside the window (because the conversation grew too long, or because a file wasn’t included) the model proceeds as if that information doesn’t exist. It won’t tell you it has forgotten; it will generate plausible output based on whatever it still has.
Forces
- Larger windows allow more context but increase cost and can decrease response quality as the model attends to more material.
- Conversation length grows naturally as work progresses, eventually pushing early context out.
- Relevant information is scattered across many files, but including all of them may exceed the window or dilute focus.
- The model can’t request information it doesn’t know it lacks. It works with what it has.
Solution
Treat the context window as a scarce resource and manage it deliberately. This is the foundation of context engineering.
Include what matters most, earliest. Models tend to attend most strongly to the beginning and end of their context. Put project conventions, critical constraints, and the current task description early.
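One way to make the ordering concrete is to assemble the prompt from prioritized sections, so conventions and the task description always precede bulk material. A sketch (the section labels and priority scheme are illustrative, not a standard API):

```typescript
interface ContextPiece {
  label: string;
  text: string;
  priority: number; // lower number = earlier in the prompt
}

// Assemble context with the highest-priority material first, so that
// conventions and the task description sit at the start of the window.
function assembleContext(pieces: ContextPiece[]): string {
  return [...pieces]
    .sort((a, b) => a.priority - b.priority)
    .map((p) => `## ${p.label}\n${p.text}`)
    .join("\n\n");
}
```
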
Exclude what doesn’t matter. If the model is working on one file, it doesn’t need the entire codebase. Provide the relevant file and its immediate dependencies. This is why good code architecture (with clear module boundaries and minimal coupling) directly improves agentic workflows.
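Selecting “the relevant file and its immediate dependencies” can be sketched as a one-level walk over an import graph (in a real agent the map would come from parsing import statements; the structure here is an assumption for illustration):

```typescript
// Given a map from file path to its direct imports, select only the
// target file and its immediate dependencies -- deliberately not the
// transitive closure, which would pull in most of the codebase.
function selectContext(
  target: string,
  imports: Map<string, string[]>
): string[] {
  const deps = imports.get(target) ?? [];
  return [target, ...deps];
}
```
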
Watch for context exhaustion. Long conversations degrade in quality as the window fills. If you notice an agent repeating earlier mistakes, ignoring instructions it previously followed, or producing lower-quality output, the context may be saturated. Start a fresh thread with a focused summary of the current state. See Compaction and Thread-per-Task.
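If your tooling exposes token usage, the “start a fresh thread” decision can be made mechanical. A sketch, where the 80% threshold is an illustrative choice rather than a published cutoff:

```typescript
// Decide when a conversation is approaching saturation and should be
// restarted with a focused summary. Threshold is a judgment call.
function shouldStartFreshThread(
  usedTokens: number,
  windowTokens: number,
  threshold = 0.8
): boolean {
  return usedTokens / windowTokens >= threshold;
}
```

In practice, quality often degrades before the window is literally full, so behavioral signals (repeated mistakes, ignored instructions) matter as much as the raw count.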
Use the agent’s tools to extend its reach. An agent that can read files, search codebases, and run commands doesn’t need everything preloaded into context. It can fetch what it needs on demand. This is why tools matter so much: they turn the context window from a hard limit into a soft one.
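The fetch-on-demand idea can be sketched as building context per task step from a file-reading tool, rather than preloading everything up front (`readFile` here is a stand-in for a real filesystem tool, not a specific agent API):

```typescript
type ReadFile = (path: string) => string;

// On-demand retrieval: pull file contents into context only when the
// current step needs them, keeping the rest of the window free.
function contextForStep(
  step: { description: string; files: string[] },
  readFile: ReadFile
): string {
  const fetched = step.files.map((f) => `// ${f}\n${readFile(f)}`);
  return [step.description, ...fetched].join("\n\n");
}
```
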
If an agent starts ignoring your project conventions or producing code that contradicts earlier instructions, the context window may have pushed those instructions out of the model’s effective memory. Restate the instructions or start a fresh conversation thread.
How It Plays Out
A developer has been working with an agent for an hour, building out a module. The early conversation established that the project uses TypeScript with strict null checks and a specific error-handling convention. By the sixtieth message, the agent starts returning JavaScript with loose typing and try/catch blocks. The developer’s instructions haven’t changed. They’ve simply scrolled out of the model’s effective attention.
A team structures their codebase with small, well-documented modules. When an agent needs to modify a module, it reads only that module and its interface contracts. The small module size means the agent can hold the complete picture within its window. A competing codebase with tangled dependencies requires the agent to load five files to understand one function, burning most of its window on navigation.
A third developer scopes the agent’s reading explicitly: “Read src/auth/middleware.ts and src/auth/types.ts, then add rate limiting to the login endpoint. Don’t read other files unless you need to check an import.”
Consequences
Understanding the context window makes you a more effective director of AI agents. You learn to provide focused context, start fresh conversations when quality degrades, and structure codebases for agent-friendliness.
The cost is ongoing attention management. You must decide what to include and what to leave out, and those decisions affect the quality of the agent’s work. Over time, tools like compaction, instruction files, and memory reduce this burden, but they are themselves patterns that require understanding and practice.
Related Patterns
- Depends on: Model — the context window is a property of the model.
- Enables: Context Engineering — context engineering is the practice of managing the window deliberately.
- Enables: Compaction — compaction addresses context exhaustion.
- Enables: Thread-per-Task — fresh threads reset the context window.
- Uses: Local Reasoning — code that supports local reasoning requires less context.