Test Impact Analysis

Pattern

A named solution to a recurring problem.

Run only the tests a change can affect, chosen from a change-to-test map, so feedback stays fast as the rate of changes climbs.

Also known as: TIA, Predictive Test Selection, Test Selection

A suite that takes forty minutes is fine when you push twice a day. It stops being fine the moment a coding agent pushes twice an hour, and stops being usable when three agents do. Test Impact Analysis sits between a changed diff and the suite and answers one question before the runner starts: which tests could this change actually break? It runs those, skips the rest, and returns the answer in a fraction of the time. The trick is doing it without quietly skipping the test that would have caught the bug.

Understand This First

Test — the suite TIA selects from; you need a body of tests before selection means anything.
Regression — the failure class TIA is built to catch fast.
Continuous Integration — the pipeline stage where the selected slice runs.

Context

You direct changes into a codebase with a real test suite: hundreds or thousands of tests, the kind that takes long enough to run that nobody runs the whole thing on every save. The suite is your regression defense. It exists to tell you when a change broke a behavior that used to work. That defense only helps if it runs, and it only stays in the loop if it is cheap enough that running it is the default.

This is a tactical, operational concern, and it gets sharper under agentic coding. When a human pushes a few times a day, a slow suite is an annoyance. When an agent is iterating (edit, run, read the failure, edit again), a slow suite is the bottleneck that sets the agent’s whole pace. The agent can’t take its next step until the runner finishes, so every minute the suite costs is multiplied by every iteration. Test Impact Analysis attacks that minute directly.

Problem

The full suite is the safe choice and the slow choice, and those two facts pull against each other harder the more often you change the code. Run everything on every change and you’re certain you caught every regression the suite can catch, but the feedback is slow and the compute bill grows with your edit rate. Run a hand-picked subset and you’re fast, but now you’re betting that the subset covers what the change touched, and that bet fails silently: a regression in a test you skipped looks exactly like a green build.

So the question isn’t whether to run fewer tests; under high edit volume you have to. The question is how to run fewer tests without losing the regressions the full suite would have caught. A wrong answer doesn’t announce itself. It ships a green check over a broken behavior.

Forces

Speed versus coverage. Fewer tests mean faster feedback and lower cost, but a smaller slice can miss the regression the change introduced.
Map accuracy versus map cost. A precise change-to-test map needs coverage data or dependency tracing that’s expensive to build and goes stale every time the code moves.
Trust versus convenience. A selector you don’t trust gets bypassed (“just run everything”), erasing its benefit; a selector you trust too much hides the tests it quietly stopped running.
Static reach versus dynamic reach. Dependency maps see imports and call graphs, but reflection, dynamic dispatch, configuration, and generated code create edges the map can’t see.
Agent incentive. An agent optimizing for a green check has every reason to prefer the smallest slice that passes — which is exactly the slice that proves nothing.

Solution

Build a trustworthy map from changed files to affected tests, run the impacted slice on every change, and back it with full-suite runs that catch what the map misses. The map is the core asset. It comes from one of three sources, in rising order of power: a static dependency graph (which tests import which modules), recorded coverage data (which tests actually executed which lines on their last run), or a learned model that predicts affected tests from the history of which changes broke which tests. Coverage-based maps are the common middle ground: instrument the suite once, record which test touched which file, and consult that record when a diff arrives.
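The coverage-based map can be sketched in a few lines. This is a minimal illustration, not any particular tool's implementation: the function names (`build_file_to_tests`, `impacted_tests`) and the sample coverage record are hypothetical, and a real map would be produced by a coverage tool rather than written by hand.

```python
from collections import defaultdict


def build_file_to_tests(coverage_by_test):
    """Invert per-test coverage (test -> files it executed) into file -> tests."""
    file_to_tests = defaultdict(set)
    for test, files in coverage_by_test.items():
        for path in files:
            file_to_tests[path].add(test)
    return dict(file_to_tests)


def impacted_tests(changed_files, file_to_tests):
    """Union of the tests whose recorded coverage touched any changed file."""
    selected = set()
    for path in changed_files:
        selected |= file_to_tests.get(path, set())
    return selected


# Hypothetical record from the last instrumented run of the suite.
coverage_by_test = {
    "test_login": {"auth.py", "session.py"},
    "test_checkout": {"cart.py", "payment.py"},
    "test_profile": {"auth.py", "profile.py"},
}

file_to_tests = build_file_to_tests(coverage_by_test)
print(sorted(impacted_tests(["auth.py"], file_to_tests)))
```

Note that a changed file with no entry in the map contributes no tests here. On its own that is exactly the silent miss the pattern warns about, which is why selection has to be paired with fallbacks.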

Selection alone is not safe; the safety lives in the fallbacks. Always run newly added and recently changed tests unconditionally, because the map has no history for them. Always carry forward tests that failed last time. When the map is missing, stale, or the change touches something the map can’t resolve (a build file, a shared dependency, a config the whole system reads), fall back to the full suite rather than guessing. And run the full suite on a schedule regardless, nightly or on every merge to the main line, so a dependency the map never saw surfaces within a day instead of in production. The selected slice is the fast path; the periodic full run is the net under it.
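One way to wire these fallbacks around the map lookup is sketched below. It assumes tests are identified by their file paths, so a changed test file is itself a test to run, and it treats any changed non-test file without a map entry as unresolvable. The `Selection` type, the `select` function, and the `map_is_fresh` flag are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    tests: set
    full_suite: bool
    reason: str


def select(changed_files, file_to_tests, all_tests, previously_failed, map_is_fresh=True):
    """Pick the impacted slice, falling back to the full suite when unsure."""
    all_tests = set(all_tests)
    if not map_is_fresh:
        return Selection(all_tests, True, "map missing or stale")

    changed = set(changed_files)
    # New or edited test files always run: the map has no history for them.
    selected = changed & all_tests
    # Tests that failed last time are carried forward.
    selected |= set(previously_failed) & all_tests

    for path in sorted(changed - all_tests):
        if path not in file_to_tests:
            # Build files, configs, and brand-new modules land here.
            return Selection(all_tests, True, f"no map entry for {path}")
        selected |= file_to_tests[path]
    return Selection(selected, False, "map resolved every changed file")


# Hypothetical suite and map.
all_tests = {
    "tests/test_auth.py", "tests/test_cart.py",
    "tests/test_profile.py", "tests/test_search.py",
}
file_to_tests = {
    "auth.py": {"tests/test_auth.py", "tests/test_profile.py"},
    "cart.py": {"tests/test_cart.py"},
    "search.py": {"tests/test_search.py"},
}

fast = select(["auth.py"], file_to_tests, all_tests, previously_failed={"tests/test_cart.py"})
safe = select(["config/app.yaml"], file_to_tests, all_tests, previously_failed=set())
print(sorted(fast.tests), fast.full_suite)
print(safe.full_suite, safe.reason)
```

This sketch is deliberately conservative: a documentation-only change would also trigger a full run, and a real selector would likely carve out known-harmless paths. The scheduled full run sits outside this function entirely; it is a pipeline setting, not a selection decision.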

Make the skipped tests visible. A run that selected 40 of 3,000 tests should say so, on the record, where a reviewer and an auditor can both see it. Selection that hides what it skipped is indistinguishable from a suite that’s silently rotting. Selection that reports what it skipped is an optimization a team can trust and an agent can’t game, because the slice it ran is evidence, not a claim.
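As a rough illustration of what "on the record" could look like, the sketch below builds a selection record that a build summary might store. The field names and the `selection_record` helper are assumptions made for this example, not a standard format.

```python
import json


def selection_record(all_tests, selected, reason):
    """Summarize what ran and what was skipped so the slice is auditable."""
    all_tests, selected = set(all_tests), set(selected)
    unknown = selected - all_tests
    if unknown:
        raise ValueError(f"selected tests not in the suite: {sorted(unknown)}")
    skipped = all_tests - selected
    return {
        "total": len(all_tests),
        "ran_count": len(selected),
        "skipped_count": len(skipped),
        "reason": reason,
        "ran": sorted(selected),
        "skipped": sorted(skipped),
    }


# Hypothetical suite of 3,000 tests with 40 selected, as in the text.
suite = [f"test_{i:04d}" for i in range(3000)]
record = selection_record(suite, suite[:40], "coverage map resolved every changed file")

headline = {key: record[key] for key in ("total", "ran_count", "skipped_count", "reason")}
print(json.dumps(headline, indent=2))
```

Keeping the full `ran` and `skipped` lists in the record, not only the counts, is what lets a reviewer check later whether a specific test was part of the evidence.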

flowchart LR
  A[Changed files in the diff] --> B{Change-to-test map}
  B -->|map resolves| C[Run the impacted slice]
  B -->|map stale or unresolvable| D[Run the full suite]
  C --> E[Report which tests ran and which were skipped]
  D --> E
  F[Scheduled full run] --> E

The impacted slice gives fast feedback on every change; the unresolvable-change fallback and the scheduled full run are the two paths that keep a missed dependency from shipping.

How It Plays Out

A platform team’s suite has grown to 9,000 tests and a 35-minute run. Developers have started pushing speculative commits just to let CI tell them what broke, which clogs the pipeline. The team turns on coverage-based selection: the runner records a per-test file map, and each push runs only the tests whose recorded coverage touches a changed file. Median CI time drops to four minutes. The safety net is a nightly full run plus an unconditional full run on every merge to the main branch, so the small share of regressions the coverage map misses (perhaps one percent, usually from reflection or a shared config file) surfaces that night rather than in a customer report. The skipped-test count lands in each build’s summary, so when a developer asks “did CI actually test my change,” the answer is a number, not a shrug.

A team running coding agents against a large service hits a different version of the same wall. Each agent iterates dozens of times per task, and at 35 minutes a run the agents spend more time waiting than working. Selection cuts the inner-loop suite to the slice each edit touches, so the agent’s verification loop tightens from 35 minutes to under two minutes and the agent can actually iterate. The team holds one line firmly: the agent never chooses its own slice. The selector is a separate, independent step keyed off the diff, and the full suite still gates the merge. This is the safety contract that matters most under agents: a system that let the agent pick which tests confirm its change would be letting the change grade its own homework.
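To put rough numbers on the waiting cost, here is a back-of-the-envelope calculation. The iteration count of 24 is an assumption (the scenario says only "dozens"), and two minutes is used as the upper bound for the selected slice.

```python
def waiting_minutes(iterations, minutes_per_run):
    """Total time an agent spends blocked on the test runner for one task."""
    return iterations * minutes_per_run


iterations = 24  # assumed for illustration; the scenario says only "dozens"
full = waiting_minutes(iterations, 35)
sliced = waiting_minutes(iterations, 2)

print(f"full suite: {full} min ({full / 60:.1f} h)")
print(f"selected slice: {sliced} min")
```

Under these assumptions a single task costs 14 hours of waiting on the full suite against 48 minutes on the selected slice, per agent.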

Warning

The dangerous failure mode is the silent one: the change touches a behavior through an edge the map can’t see (a reflective call, a generated file, a config key) and the impacted slice comes back green because the test that covers that behavior wasn’t selected. Treat any change to build files, shared dependencies, or system-wide config as a full-suite trigger, not a candidate for selection.
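A trigger check of this kind might look like the sketch below. The pattern list is hypothetical and depends entirely on the repository layout. The check uses `fnmatch.fnmatchcase` from the Python standard library, whose `*` also matches path separators, so `config/*` covers nested files.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger patterns; a real list depends on the repository.
FULL_SUITE_TRIGGERS = (
    "Makefile",
    "pyproject.toml",
    "*.lock",
    "config/*",
    "generated/*",
)


def full_suite_trigger(changed_files, patterns=FULL_SUITE_TRIGGERS):
    """Return the first changed path that must force a full run, or None."""
    for path in changed_files:
        for pattern in patterns:
            if fnmatchcase(path, pattern):
                return path
    return None


print(full_suite_trigger(["src/auth.py", "config/app.yaml"]))
print(full_suite_trigger(["src/auth.py"]))
```

Running this check before consulting the map means a trigger path forces the full suite even when the map happens to hold an entry for it.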

Example Prompt

“Run the impact-selected test slice for this change and report it: list the tests selected, the tests skipped, and the reason for the selection. If the diff touches a build file, a shared dependency, or anything the coverage map can’t resolve to specific tests, run the full suite instead and say why. Do not narrow the selection to make the build pass.”

Consequences

Benefits. Feedback gets dramatically faster and cheaper without abandoning the regression defense, which is the whole point: a team can keep a large suite and still get an answer in minutes. Under agentic workflows the gain compounds, because every saved minute is multiplied across every iteration of every agent; a tight selected slice is often what makes agent-driven iteration economically viable at all. The selection record also becomes a useful artifact in its own right: paired with build provenance, it tells you not just that the build passed but exactly which behaviors were checked to make that claim.

Liabilities. The map is a maintenance burden and a trust liability. It needs rebuilding as the code moves, and a stale map silently selects the wrong slice. The technique can’t see edges absent from its dependency or coverage data (reflection, dynamic dispatch, configuration, generated code), so it always carries a residual miss rate that only the periodic full run catches. There’s a real failure mode where a team trusts selection too much, lets the full-suite cadence lapse, and accumulates regressions in the unselected long tail until one reaches production. And under agents the contract is load-bearing in a specific way: the moment the entity producing the change also influences which tests run, selection stops being an optimization and becomes a way to launder a broken change past the gate. Keep the selector independent of the change author, human or agent, or don’t run it.

Sources

Paul Hammant’s The Rise of Test Impact Analysis, hosted on Martin Fowler’s site, traces the technique’s lineage and names its core mechanism: not every test exercises every source file, so coverage or instrumentation gathered while tests run is the intelligence that lets a pipeline pick the relevant ones. It also documents the standard safety carve-outs: newly added tests and previously failing tests always run.
Microsoft’s engineering teams ran the most sustained public effort to productize the technique across Visual Studio and Azure DevOps; their documentation of the per-test source map and the always-run handling of new and recently failed tests is the practitioner reference for how the fallbacks fit together. Google’s earlier internal selective-execution work, using bytecode instrumentation to skip unaffected tests, established the same idea years before it was widely available.
The argument that a fast, credible automated suite is the only thing that lets a team change code often without accumulating regressions runs throughout Jez Humble and David Farley’s Continuous Delivery (Addison-Wesley, 2010); test selection is one of the optimizations that keeps that suite inside the per-change budget as a codebase grows.
The agentic framing is the test-selection instance of a broader independence principle the agentic engineering community discusses under evaluation gates and verification loops: a coding agent must never select the tests that confirm its own change, so impact-based selection has to be an independent step keyed off the diff. The discipline software engineering built for human change rates is the same discipline that keeps agent change rates honest; only the volume is new.
