Progressive Delivery
Release a change to a small, growing slice of traffic behind observability gates, promoting or halting it on live signal rather than shipping it to everyone at once.
Also known as: Gradual Rollout, Phased Rollout
You have probably watched this without a name for it. A new version ships, but only 2% of users get it. A dashboard turns green. Someone bumps it to 10%, then 50%, then everyone. Or the error rate twitches and it snaps back to zero before most people notice. James Governor of RedMonk coined “progressive delivery” in 2018 for a discipline teams had been improvising for years: don’t ship to everyone and hope; ship to a few, watch, and let the live signal decide who gets it next.
Understand This First
- Continuous Delivery — the change must already be releasable on demand before you can stage how it releases.
- Feature Flag — the runtime switch that changes exposure without redeploying.
- Observability — the signal that gates each stage.
Context
This operational pattern sits above the mechanics of shipping. Continuous Delivery gives you a change that could go to production at any moment; Deployment gives you the mechanisms to get it there (blue-green environments, canary slices, rolling updates); Feature Flags give you a runtime dial on who sees what. Progressive Delivery composes those parts into one answer to “now that the change is ready, how does it reach users?”: incrementally, reversibly, and on evidence.
Agentic coding raises the stakes. An agent can produce a working change in minutes, so the bottleneck is no longer writing it but deciding whether it’s safe to widen. Progressive Delivery structures that decision, and an agent can take part directly: read the canary’s error rate, compare it to the threshold, call the next move.
Problem
A change passes every test and is ready to ship. You still face the worst moment in the pipeline: the cutover. Flip it for everyone and a defect no test caught hits all of them at once; hold it for a long manual bake and you lose the speed that made the change cheap. Tests prove the change behaves on the inputs you imagined, not the load production throws at it or the data you never sampled. How do you put it in front of real traffic without betting the whole user base on it surviving first contact?
Forces
- A change that passes every test can still fail on real traffic, real load, or real data the tests never covered.
- Shipping to everyone at once maximizes both the speed of feedback and the blast radius when that feedback is bad.
- Staging a rollout adds coordination, infrastructure, and a window where two versions run side by side.
- The promote-or-halt decision needs a signal trustworthy enough to act on automatically, and most teams’ telemetry is noisier than they admit.
- Speed pressure pushes toward “just ship it”; the cost of a bad release pushes toward “bake it forever.” Neither extreme is right.
Solution
Release in stages, gate each stage on live signal, and keep every stage reversible. Expose the change to a small slice of traffic, watch the few metrics that define “healthy,” and widen only when the signal clears a threshold you set in advance. If it doesn’t clear, halt; because each stage is reversible, that costs a flag flip, not an outage.
The stages reuse well-known mechanics; the discipline is naming the gate, not inventing machinery:
- Canary: route a few percent of traffic to the new version and compare its health against the stable fleet. If the canary suffers, the rest of the flock never goes down the mine.
- Blue-green: stand the new version beside the old and shift traffic between them, so a halt is an instant switch rather than a redeploy.
- Ring rollout: widen by audience instead of by percentage. Internal users first, then a beta ring, then everyone.
- Flag-gated exposure: use Feature Flags to control the percentage independently of deployed code, keeping “deployed” and “exposed” decoupled.
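One common way to implement a percentage dial is deterministic hash bucketing. The sketch below is an illustration under assumptions, not any particular flag vendor's API: the function names `bucket` and `is_exposed` are hypothetical, and it assumes a stable user identifier is available at request time.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, 10000)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the modulus gives 0.01% resolution.
    return int.from_bytes(digest[:8], "big") % 10_000


def is_exposed(flag: str, user_id: str, percent: float) -> bool:
    """True if this user falls inside the current exposure percentage."""
    return bucket(flag, user_id) < round(percent * 100)
```

Because each user's bucket never changes, raising the percentage only adds users: anyone exposed at 5% is still exposed at 25%, and setting the percentage to 0 removes everyone without a redeploy.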
Before the rollout, define what “healthy” means: a Service Level Objective on error rate, latency, or a business metric. Each stage then runs as a Feedback Loop: expose, measure against the SLO, widen or revert. The threshold decides, not someone’s nerve at 4 p.m. on a Friday. Where the signal is clean and the blast radius small, the gate advances itself; where it’s ambiguous or the stakes high, keep a Human in the Loop on the button. An agent can occupy that gate too: hand it an honest threshold, and it holds while the numbers are noisy, advances when they’re clean, and triggers a Rollback the moment they cross the line.
How It Plays Out
A team ships a rewritten checkout service behind the new_checkout flag, with an agent driving the ladder on a standing instruction: “Advance 1% → 5% → 25% → 100%, fifteen minutes per stage. Hold if canary error rate exceeds the stable fleet by more than 0.5 points; roll back if it exceeds 2 points.” The agent watches the canary’s error rate and p99 latency against the stable fleet, clears 1% and 5%, then stalls at 25% when p99 climbs and error rate drifts to +0.7. It posts a summary instead of promoting. A human reads it, traces the latency to a slow query the staging data never triggered, and agrees the drift is real; the agent then flips the flag back to 0%. No user-facing incident: the bad version touched a quarter of traffic at worst, for the minutes it took to halt. The strategy ran itself up to the judgment call, then handed the call to a person.
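The standing instruction in this scenario can be sketched as a small decision function. This is a minimal illustration, not the team's actual tooling: the names `Decision`, `decide`, and `next_percent` are hypothetical, error rates are assumed to be expressed in percent so that "points" means percentage points, and the p99 latency check is left out because the instruction gives no threshold for it.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote"
    HOLD = "hold"
    ROLLBACK = "rollback"


STAGES = [1, 5, 25, 100]   # exposure ladder, in percent
HOLD_DELTA = 0.5           # hold above this gap, in percentage points
ROLLBACK_DELTA = 2.0       # roll back above this gap, in percentage points


def decide(canary_error_pct: float, stable_error_pct: float) -> Decision:
    """Compare the canary's error rate with the stable fleet's."""
    delta = canary_error_pct - stable_error_pct
    if delta > ROLLBACK_DELTA:
        return Decision.ROLLBACK
    if delta > HOLD_DELTA:
        return Decision.HOLD
    return Decision.PROMOTE


def next_percent(current: int, decision: Decision) -> int:
    """Exposure to apply after the gate has ruled on the current stage."""
    if decision is Decision.ROLLBACK:
        return 0
    if decision is Decision.HOLD:
        return current
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]
```

With a stable error rate of 1.0% and a canary at 1.7%, `decide` returns `Decision.HOLD`, which matches the stall at 25%: the gap of 0.7 points is above the hold line and below the rollback line, so the next move goes to a person.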
“Roll the new ranking service out progressively behind a feature flag. Start at 2% of traffic and double the exposure every 20 minutes if the canary’s error rate and p95 latency stay within 10% of the stable fleet. Halt and alert me if either metric breaches that band, and roll back automatically if error rate doubles.”
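To see what that prompt commits the rollout to, the schedule it implies can be worked out directly. The sketch below makes two assumptions that the prompt leaves open: exposure is capped at 100% once doubling overshoots, and "within 10%" is read as a one-sided relative band, so a canary that performs better than the stable fleet never trips the gate. The names `doubling_schedule` and `within_band` are hypothetical.

```python
def doubling_schedule(start_percent: int = 2, cap: int = 100) -> list[int]:
    """Exposure stages produced by doubling from start_percent up to cap."""
    stages = []
    percent = start_percent
    while percent < cap:
        stages.append(percent)
        percent *= 2
    stages.append(cap)  # final stage: everyone
    return stages


def within_band(canary: float, stable: float, tolerance: float = 0.10) -> bool:
    """True if the canary metric is no more than `tolerance` worse than stable.

    Assumes a metric where lower is better (error rate, latency).
    """
    return canary <= stable * (1 + tolerance)


STAGE_MINUTES = 20
stages = doubling_schedule()
# One wait between each pair of consecutive stages.
minutes_to_full_exposure = (len(stages) - 1) * STAGE_MINUTES
```

Starting at 2%, the stages are 2, 4, 8, 16, 32, 64, and 100, so a clean rollout reaches everyone after six 20-minute waits, or two hours at the earliest.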
A gate is only as trustworthy as the signal behind it. Auto-promotion on a noisy or sparse metric is worse than no gate at all: it greenlights a bad release with the appearance of evidence. If the canary gets too little traffic for a stable reading, lengthen the stage, widen the canary, or keep a human on the button. Don’t let a confident-looking number that means nothing advance the rollout.
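A quick arithmetic check shows whether a stage can produce a usable reading at all. The sketch below is illustrative only: the function names are hypothetical, and the minimum sample size is a placeholder that a team would set from its own traffic and metric variance, not a recommended value.

```python
import math


def expected_canary_requests(total_rpm: float, percent: float,
                             stage_minutes: float) -> float:
    """Requests the canary should see during one stage."""
    return total_rpm * percent / 100 * stage_minutes


def minutes_needed(total_rpm: float, percent: float,
                   min_requests: int) -> int:
    """Stage length needed for the canary to collect min_requests."""
    canary_rpm = total_rpm * percent / 100
    if canary_rpm <= 0:
        raise ValueError("canary receives no traffic at this exposure")
    return math.ceil(min_requests / canary_rpm)
```

At 600 requests per minute, a 1% canary sees about 90 requests in a fifteen-minute stage. If the gate needs 1,000 requests for a stable reading, that stage would have to run 167 minutes, whereas widening the canary to 10% brings it down to 17. Either adjustment is better than promoting on 90 requests.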
Consequences
Benefits. A defect’s blast radius shrinks to the current stage instead of the whole user base, and time-to-detect shrinks with it, because you’re watching a small instrumented slice on purpose. Halting is cheap, a flag flip rather than an emergency redeploy, so the cost of being wrong drops and the team grows braver about shipping. The rollout becomes a sequence of small, evidence-backed decisions concrete enough to automate or hand to an agent.
Liabilities. You run two versions at once during every rollout, which complicates state, data migrations, and any code that assumes a single live version. The gating signal becomes critical infrastructure: thin observability or wishful SLOs make the gate confident and wrong. Staged rollouts take longer in wall-clock time than a single cutover. And the machinery (flags, canary routing, automated analysis) has to exist before any of this pays off, so the first progressive rollout costs far more than the hundredth.
Sources
- James Governor of RedMonk coined “progressive delivery” in 2018 to name the discipline of releasing changes incrementally behind observability gates, framing it as the successor question to continuous delivery: not just can you ship, but how widely and how fast should an individual change spread.
- Jez Humble and David Farley established the underlying releasable-on-demand foundation in Continuous Delivery (Addison-Wesley, 2010), which named the deployment pipeline that progressive delivery sits on top of.
- The canary and blue-green release techniques predate the umbrella term and emerged from the large-scale web operations community through the 2000s and 2010s. Dan North and others at ThoughtWorks, along with operations teams at companies running continuous deployment at scale, developed the practice of staged, monitored rollouts that the named discipline later generalized.