Progressive Disclosure
Understand This First
- Context Window – the finite resource that forces the question of what to load when.
- Context Engineering – the broader discipline; progressive disclosure is one of its core moves.
Context
At the agentic level, progressive disclosure is a design principle for loading instructions, tool definitions, and reference material into an agent’s working memory only when they become relevant, not eagerly and up front. The name comes from human-computer interaction, where good interfaces reveal complexity as the user needs it instead of showing every feature on every screen. The same idea now organizes how agents find the material they need to do a task.
The principle reshapes how you build the artifacts that govern an agent’s behavior. An instruction file stops being a monolithic rulebook and becomes a short index with pointers. A skill stops being a single long document and becomes a small metadata header that loads its body only when the classifier decides the skill applies. A tool definition stops being preloaded and gets registered only for the subset of work that needs it.
Problem
How do you give an agent enough guidance to do the work well, without drowning it in material that has nothing to do with the task in front of it?
Every token you load eagerly is a token that crowds out what actually matters. If the agent’s CLAUDE.md lists thirty project gotchas, eight are relevant to today’s task and the rest are noise. If the harness preloads forty tool definitions, the agent has to scan past thirty-five irrelevant ones to find the three it will call. The more you try to cover in advance, the less the agent attends to any of it, and the shorter the effective working memory becomes for the real problem.
The naive response is “just load everything, the model will sort it out.” Modern context windows are large enough that this feels safe. It isn’t. Loaded context competes for the model’s attention, degrades judgment on the foreground task, and accelerates context rot. The alternative (loading nothing) is worse. You need a third option: the agent pulls what it needs, when it needs it, and ignores the rest.
Forces
- Coverage vs. attention. You want to cover every situation the agent might hit. You also want the agent’s attention focused on the current task.
- Predictability vs. flexibility. Eager loading is predictable: you know exactly what the agent sees. On-demand loading is flexible: the agent assembles the right context per task, but you trade away some of that predictability.
- Discovery cost. If material lives somewhere the agent cannot find, it may as well not exist. Progressive disclosure requires a small, always-present index that makes the rest discoverable.
- Classifier accuracy. When the agent decides what to load next, mistakes happen. The system must tolerate loading a skill that does not apply, and missing one that does.
- Author effort. Writing material so that it loads in layers (headline, body, supplements) costs more upfront than dumping everything into one file.
Solution
Structure the agent’s knowledge in three tiers and load them on demand.
Tier 1: the always-loaded index. A small metadata layer that tells the agent what is available and when each piece applies. For Anthropic’s Agent Skills, this is the frontmatter at the top of every SKILL.md file: a name, a one-line description, and a trigger hint. Roughly a hundred tokens per skill. Every session sees this layer. It is the table of contents the agent reads before deciding what to open next.
Tier 2: the on-demand body. When the agent’s classifier decides a skill, instruction section, or reference document applies to the current task, it loads the full body into context. The body is written assuming the agent already saw the metadata and decided to open it, so it can start directly with the substance.
Tier 3: supplements. Scripts, schemas, large examples, and reference tables the body may or may not need. These load only when the body explicitly references them. A skill for writing database migrations might bundle a naming script, an example migration, and a schema cheatsheet, all sitting in Tier 3 until the body actually pulls them in.
The same three tiers apply to instruction files. A short top-level CLAUDE.md stays under sixty lines and points at deeper documents: docs/architecture.md, docs/testing.md, docs/deployment.md. The agent reads the top-level file on every session, follows the pointers only when the current task warrants it. Tool registration follows the same shape: register the handful of tools that the session’s task type needs, discover the rest only when the agent asks for them.
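A top-level instruction file written this way might look like the following sketch. The conventions and commands are hypothetical; the docs/ paths follow the examples above:

```markdown
# CLAUDE.md — always-true conventions only

- Use 4-space indentation; run the linter before committing.
- Every new module needs a unit test in the same package.

## Read these only when the task calls for them

- docs/architecture.md — read before changing module boundaries.
- docs/testing.md — read when adding or modifying tests.
- docs/deployment.md — read only when the task involves deploying.
```

Each pointer line is itself a Tier 1 description: it names the document and says when it applies, so the agent can skip it confidently on unrelated tasks.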
Two authoring practices make progressive disclosure work.
Write for classification. The Tier 1 description has to make the load decision easy. A vague description like “helps with testing” forces the agent to load the body just to find out. A specific description like “use when adding a Python unit test to an existing pytest suite” lets the agent skip it confidently when the task doesn’t match. Treat the description as a contract: it’s the only thing the agent sees before deciding whether to pay the cost of loading the body.
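As skill frontmatter, the difference might look like this. The field names are illustrative, not Anthropic's exact schema:

```yaml
# Too vague: the agent must load the body just to find out.
name: test-helper
description: Helps with testing.

# Specific: the agent can load or skip with confidence.
name: pytest-unit-test
description: >
  Use when adding a Python unit test to an existing pytest suite.
  Does not apply to integration tests or non-Python code.
```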
Let the body point outward. Tier 3 supplements should be referenced by path from the body, not pasted inline. A skill body that says “see examples/complex_migration.sql for the multi-step case” lets the agent fetch the example only when the user’s task needs it. A body that pastes the example inline forces every invocation of the skill to carry those tokens, whether they matter or not.
Before adding anything to your project’s top-level instruction file, ask: does the agent need this on every single task? If the answer is no, push it into a deeper document and point the top-level file at it with a one-line description of when the deeper document applies.
How It Plays Out
A team’s CLAUDE.md had grown to 300 lines. It covered coding style, testing conventions, deployment steps, incident procedures, onboarding notes, and a dozen project-specific gotchas. Every session loaded all of it. When the team audited a week of agent output, they found that the style rules were followed but the deployment section, loaded every time, went unused on nearly every task. They cut CLAUDE.md to forty lines of always-true conventions and moved the rest into five focused documents under docs/, each referenced by one line in the top-level file. The agent now reads deployment steps only when it’s actually deploying. Average session context dropped by about 15%, and adherence to the style rules went up, not down, because those rules were no longer buried.
A developer writes a skill for generating database migration files. The skill’s frontmatter says: “Use when creating a new database migration in this project. Applies to Postgres migrations only.” The body explains the naming convention, the up/down structure, the review checklist, and points at a scripts/validate-migration.sh helper. A reference library of example migrations sits in examples/, linked from the body but not included inline. When the agent is asked to write a Ruby unit test, the skill’s Tier 1 description makes it obvious this skill does not apply, and the body never loads. When the agent is asked to add a migration for a new users.verified_at column, the description matches, the body loads, and the reference example for adding a nullable column loads only after the body signals it is needed.
“Restructure our CLAUDE.md using progressive disclosure. Extract sections that only matter for specific tasks (deployment, testing, incident response) into separate files under docs/. Leave a one-line pointer in CLAUDE.md for each extracted file, naming when the agent should read it.”
A harness team building an agentic framework started with eager tool registration: forty tools visible on every turn. Token usage was fine but tool-choice accuracy suffered; the model regularly picked a plausible-but-wrong tool from the long list. They rewrote the harness to register tools in tiers: a core set of six always-visible tools (file read, file write, shell, search, list directory, and an index of available tool-groups), plus groups that load on demand when the agent asks for them by name. Tool-choice accuracy improved measurably, and an unexpected second benefit followed: the agent learned to ask for specialized tool-groups explicitly, which made its reasoning more legible to the humans reviewing its work.
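The tiered registration the harness team landed on can be sketched as follows. The tool names and group contents are illustrative, not taken from any real harness:

```python
# Tier 1: six core tools, always visible, including an index of groups.
CORE_TOOLS = ["file_read", "file_write", "shell", "search",
              "list_dir", "list_tool_groups"]

# Tier 2: specialized groups, registered only on explicit request.
TOOL_GROUPS = {
    "git": ["git_diff", "git_commit", "git_log"],
    "database": ["run_query", "explain_plan"],
}

class ToolRegistry:
    def __init__(self):
        self.visible = list(CORE_TOOLS)

    def list_tool_groups(self):
        """Core tool: lets the agent discover what it can ask for."""
        return sorted(TOOL_GROUPS)

    def load_group(self, name):
        """Expand the visible set only when the agent asks by name."""
        self.visible.extend(t for t in TOOL_GROUPS[name]
                            if t not in self.visible)
        return self.visible
```

The side effect the team observed falls out of the design: because the agent must call `load_group` by name, every expansion of its toolset leaves an explicit, reviewable trace in the transcript.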
Consequences
Progressive disclosure turns context from a liability into a resource. The agent’s attention stays focused on material that matters for the current task. Large bodies of expertise can exist without crowding out the foreground. Author effort pays off repeatedly: one well-structured skill serves dozens of future invocations without bloating any of them. Systems that apply the principle scale further, accommodating more skills, more tools, and more conventions, without the quality degradation that eager loading produces.
The costs are real. You have to write in layers, which is harder than writing in one long document. You have to design Tier 1 descriptions well enough that the classifier makes good load decisions. You have to tolerate occasional misses: a skill that should have loaded and didn’t, or one that loaded and didn’t apply. Debugging an agent that chose not to open the right document requires tooling that exposes which tiers were consulted. And you have to maintain the discipline over time, because the path of least resistance when adding something new is to paste it into the top-level file where everyone will see it. That is exactly the anti-pattern this whole approach exists to prevent.
The opposite, eager context loading, is so common that it deserves its own warning. Teams reach for it out of anxiety: “what if the agent misses something important?” The honest answer is that if you load everything, the agent will miss something important, because the signal gets diluted in noise. Progressive disclosure is the discipline that trades a small risk of missing context for a large gain in the quality of what the agent does attend to.
Related Patterns
- Depends on: Context Window – the finite resource that makes disciplined loading necessary.
- Uses: Context Engineering – progressive disclosure is one of the core moves of the broader discipline.
- Contrasts with: Context Rot – the degradation progressive disclosure is designed to prevent.
- Enables: Skill – the canonical application; skills are structured in exactly the three tiers progressive disclosure describes.
- Enables: Instruction File – the same three-tier shape applies to project-level guidance.
- Related: Compaction – the recovery move when eager loading has already happened; progressive disclosure is the prevention.
- Related: Retrieval – retrieval is a concrete implementation of progressive disclosure over an external corpus.
- Related: Tool – tool registration benefits from the same tiered approach.
Sources
- Progressive disclosure as a design principle for user interfaces was articulated by Jakob Nielsen and colleagues at the Nielsen Norman Group, who defined it as deferring advanced or rarely used features to secondary screens so initial interfaces stay simple. The idea predates them in usability research, but the 1990s NN/g writings made the name standard.
- Anthropic’s Agent Skills documentation explicitly names progressive disclosure as “the core design principle that makes Agent Skills flexible and scalable,” specifying the three-tier model used in this article: metadata always loaded, body loaded on demand, supplementary files loaded only when referenced.
- The practice of structuring agent instruction files in layers (a short top-level file with pointers to deeper documents loaded only when relevant) emerged from the agentic coding community in late 2025 and early 2026 as projects hit the limits of monolithic CLAUDE.md files. Several independent practitioners published versions of the same advice within a few months, treating context-window crowding as the shared problem.
- The broader observation that eager loading degrades model attention comes from the context engineering discipline as a whole and connects to research on long-context attention decay. The Context Rot article traces this line in more depth.