Feedback Flywheel

Pattern

A reusable solution you can apply to your work.

A cross-session retrospective loop that harvests corrections from AI-assisted work, distills them into rules, and feeds those rules back into the team’s instruction files so each session’s frustrations become the next session’s defaults.

“We are what we repeatedly do. Excellence, then, is not an act, but a habit.” — Will Durant, paraphrasing Aristotle

Also known as: Retrospective Loop, Rule Harvesting, Institutional Learning Loop

Understand This First

Context

At the organizational level, the feedback flywheel sits above Steering Loop and Feedback Sensor. Those patterns operate inside a single session: the agent acts, sensors check, the loop corrects. They handle today’s task. The feedback flywheel handles what happens between sessions, across days and weeks, when a team asks: “Why do we keep correcting the same thing?”

Most teams using AI coding tools hit this wall. The agent generates code that compiles and passes tests, but violates a convention, misunderstands a domain rule, or structures files in a way the team doesn’t want. A developer fixes it. The next day, a different developer makes the same fix. Nobody writes the rule down. The knowledge stays locked in individual sessions, evaporating when the context window closes.

Problem

How do you turn repeated corrections into permanent improvements when each agent session starts fresh with no memory of past mistakes?

Sessions are ephemeral. An agent that learned from a correction at 2 PM has forgotten it by the next morning. Developers who notice the same problem three times grumble but don’t formalize the fix. The team’s experience with their AI tools grows, but the tools themselves don’t improve because nobody closes the loop between “I fixed this again” and “the agent should know this.”

Forces

  • Sessions are stateless. Each new conversation starts from the instruction file and whatever context the developer provides. Corrections made mid-session don’t persist.
  • Corrections are scattered. Different developers make different corrections at different times. No single person sees the full picture of what the team keeps fixing.
  • Writing rules takes effort. Even when someone notices a recurring problem, formalizing it into a clear, machine-readable rule feels like a distraction from the actual work.
  • Rules can accumulate without review. If everyone adds rules but nobody prunes them, instruction files grow into contradictory, bloated documents that the agent struggles to follow.
  • The signal is noisy. Not every correction reveals a systemic problem. Some are one-off mistakes, context-dependent judgments, or personal preferences that shouldn’t become team rules.

Solution

Capture corrections in structured session logs, run periodic retrospectives to find root causes, and feed validated rules back into the team’s instruction files and commands. Track first-pass acceptance rate as the metric that tells you whether the flywheel is turning.

The flywheel has three moving parts: capture, distill, and codify.

Capture. When a developer corrects agent output, they note what was wrong and what the fix was. This doesn’t need to be elaborate. A structured log entry with three fields works: the file or area, the correction, and a one-line description of why. Some teams build this into their harness as an automatic prompt after each session. Others use a shared document or channel. The format matters less than the habit.
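
A minimal sketch of such a log entry, assuming a shared JSONL file; the path and field names are illustrative, not a prescribed schema:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path("corrections.jsonl")  # hypothetical shared log file

def log_correction(area: str, correction: str, why: str, author: str) -> None:
    """Append one structured correction entry to the shared log."""
    entry = {
        "date": datetime.now(timezone.utc).isoformat(),
        "author": author,
        "area": area,              # file or area the fix touched
        "correction": correction,  # what was changed
        "why": why,                # one-line rationale
    }
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(entry) + "\n")

log_correction(
    area="services/orders.py",
    correction="replaced bare `except:` with `except KeyError`",
    why="team convention: catch specific exception types",
    author="maria",
)
```

An append-only line-per-entry format keeps the capture habit cheap: no locking, no schema migration, and the distill step can parse it with a one-liner.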

Distill. On a regular cadence (weekly or biweekly), the team reviews the correction log. The goal isn’t to discuss every entry but to spot clusters: the same correction appearing three or more times, or showing up across different developers. Those clusters are the flywheel’s raw material. A correction that appears once might be noise. One that appears five times from three developers is a missing rule.
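
The clustering step is a few lines of counting; the log entries and thresholds below are illustrative, assuming corrections have been normalized to a short description:

```python
from collections import defaultdict

# Hypothetical log entries: (author, normalized correction description).
log = [
    ("maria", "catch specific exceptions"),
    ("tom",   "catch specific exceptions"),
    ("maria", "move tests into tests/ directory"),
    ("alex",  "catch specific exceptions"),
    ("tom",   "rename ad-hoc helper"),
]

authors = defaultdict(set)  # correction -> developers who made it
counts = defaultdict(int)   # correction -> number of occurrences
for author, correction in log:
    authors[correction].add(author)
    counts[correction] += 1

# Flag clusters: three or more occurrences, or the same fix
# made independently by at least two developers.
candidates = sorted(
    c for c in counts if counts[c] >= 3 or len(authors[c]) >= 2
)
print(candidates)  # ['catch specific exceptions']
```

The two-developer threshold matters as much as the count: a fix made by several people is evidence of a missing team rule rather than one person's preference.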

Codify. The team writes the rule into the appropriate instruction file, custom command, or linter configuration. The rule should be specific enough for an agent to follow: not “use better names” but “prefix all database query functions with fetch_ and all mutation functions with update_.” After codifying, the team verifies that the rule actually changes agent behavior by running a representative task.
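
When the codification target is a lint check rather than prose, the naming rule above can be sketched with Python's ast module. The heuristic here, treating any function that calls a `.execute(...)` method as a database function, is an assumption for illustration, not part of the pattern:

```python
import ast

# Illustrative agent output to check against the naming rule.
SOURCE = '''
def get_user(cursor, user_id):
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
    return cursor.fetchone()

def fetch_orders(cursor):
    cursor.execute("SELECT * FROM orders")
    return cursor.fetchall()
'''

def naming_violations(source: str) -> list[str]:
    """Functions that touch the database but lack a fetch_/update_ prefix."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            touches_db = any(
                isinstance(n, ast.Attribute) and n.attr == "execute"
                for n in ast.walk(node)
            )
            if touches_db and not node.name.startswith(("fetch_", "update_")):
                bad.append(node.name)
    return bad

print(naming_violations(SOURCE))  # ['get_user']
```

A check like this doubles as the verification step: run it over the output of a representative task before and after adding the rule to the instruction file.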

The metric that tells you whether this works is first-pass acceptance rate: the percentage of agent-generated outputs accepted without modification. A rising rate means the instruction files are improving. A flat rate means the retrospectives aren’t producing actionable rules, or the rules aren’t reaching the agent. A falling rate means something has changed (new team members, unfamiliar codebase area, model update) and the flywheel needs to respond.
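
The metric itself is simple arithmetic; a sketch, assuming each week's sessions are summarized with hypothetical field names:

```python
# Hypothetical per-week session summaries; field names are illustrative.
sessions = [
    {"week": 1, "accepted_unmodified": 8,  "total_outputs": 20},
    {"week": 2, "accepted_unmodified": 12, "total_outputs": 20},
    {"week": 3, "accepted_unmodified": 16, "total_outputs": 20},
]

rates = [s["accepted_unmodified"] / s["total_outputs"] for s in sessions]
print([f"{r:.0%}" for r in rates])  # ['40%', '60%', '80%']

# The trend is the signal: compare the latest week to the trailing
# average rather than reading any single number in isolation.
trailing_avg = sum(rates[:-1]) / len(rates[:-1])
improving = rates[-1] > trailing_avg
print(improving)  # True
```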

Tip

Don’t wait for a formal retrospective to codify an obvious rule. If you’ve corrected the same thing three times in one week, write the rule now. The retrospective catches what individuals miss, but it shouldn’t be the only entry point.

How It Plays Out

A four-person team adopts an AI coding assistant for a Python backend. In the first two weeks, three developers independently correct the agent’s habit of using bare except clauses instead of catching specific exceptions. Each developer fixes it in their session and moves on. At the first weekly retrospective, the correction log shows seven instances of the same fix. The team adds a rule to their project instruction file: “Never use bare except clauses. Always catch specific exception types. Use except ValueError or except KeyError, not except Exception unless the function is a top-level error boundary.” The following week, zero corrections for exception handling. First-pass acceptance rate for error-handling code jumps from around 40% to over 80%.
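
The bare-except half of that rule can also be enforced mechanically, so the instruction file and a sensor catch it from both sides; a minimal lint sketch using Python's ast module:

```python
import ast

def bare_excepts(source: str) -> list[int]:
    """Line numbers of `except:` clauses that name no exception type."""
    return [
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

snippet = """
try:
    process(order)
except:
    pass
"""
print(bare_excepts(snippet))  # [4]
```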

A frontend team tracks corrections for a month and finds that 60% cluster around three issues: the agent uses inline styles instead of CSS modules, it drops test files in the wrong directory, and it imports a deprecated utility. They codify all three as rules, and first-pass acceptance rate climbs from 55% to 72% over three weeks.

Then a new team member joins who works in a different part of the codebase, and the rate dips. The retrospective reveals that the rules assumed a directory structure that doesn’t apply to her area. The team refines the rules to be path-aware. The rate recovers, but more importantly, the team has learned something about how rules age: they’re only as portable as their assumptions.

A solo developer keeps a simple text file of corrections. After two weeks, a third of her entries involve the agent generating functions longer than 30 lines. She adds a rule to her instruction file capping function length and specifying decomposition. The correction rate drops, but a new problem appears: the agent now creates too many tiny helper functions that do almost nothing. Her next rule sets a floor on meaningful work per function. Two rules, two weeks, and the agent’s output has noticeably improved.
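
A function-length cap like hers is straightforward to check mechanically; a sketch using Python's ast module, with the 30-line threshold mirroring her rule:

```python
import ast

def long_functions(source: str, max_lines: int = 30) -> list[str]:
    """Names of functions whose definitions span more than max_lines lines."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.end_lineno - node.lineno + 1 > max_lines
    ]

short = "def ok():\n    return 1\n"
big = "def big():\n" + "\n".join(f"    x{i} = {i}" for i in range(40))
print(long_functions(short + big))  # ['big']
```

The counter-rule, a floor on meaningful work per function, resists this kind of mechanical check; that is exactly the sort of judgment call that belongs in the instruction file as prose rather than in a linter.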

Consequences

The feedback flywheel turns a team’s accumulated experience into durable, machine-readable rules. Over weeks, the agent’s output aligns more closely with the team’s standards, reducing the correction burden and freeing developers to focus on design and judgment rather than cleanup.

The payoff compounds. Each rule makes every future session slightly better, across every developer on the team. A team with 50 well-tested rules in their instruction file gets noticeably different agent output than a team with none, even when both use the same model.

The costs are real. Retrospectives take time, and if the team treats them as bureaucracy rather than productive work, attendance and quality drop. Rule bloat is a persistent risk: instruction files that grow past a few hundred lines start contradicting themselves or exceed the agent’s ability to follow them all. Teams need a pruning discipline alongside the capture discipline. Rules that haven’t prevented a correction in months are candidates for removal.
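
One way to sketch that pruning discipline, assuming each rule carries metadata about when it last prevented a correction; both the field names and the 90-day threshold are illustrative:

```python
from datetime import date

# Each rule records the date it last prevented a correction.
rules = [
    {"id": "no-bare-except",  "last_hit": date(2026, 1, 20)},
    {"id": "css-modules",     "last_hit": date(2025, 9, 2)},
    {"id": "test-dir-layout", "last_hit": date(2026, 2, 1)},
]

STALE_AFTER_DAYS = 90
today = date(2026, 2, 10)

stale = [r["id"] for r in rules
         if (today - r["last_hit"]).days > STALE_AFTER_DAYS]
print(stale)  # ['css-modules'] (candidates for removal, not automatic deletion)
```

Stale rules go on the retrospective agenda as removal candidates; a rule may be quiet because it works, so the final call stays with the team.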

There’s also a measurement trap. First-pass acceptance rate is the best available metric, but it can be gamed: a developer who lowers their standards accepts more output, and the rate rises without real improvement. Use it as a trend indicator alongside qualitative judgment, not as a target to optimize in isolation.

  • Wraps: Steering Loop – the steering loop handles within-session correction; the feedback flywheel handles cross-session learning.
  • Feeds into: Instruction File – instruction files are the primary artifact the flywheel produces and maintains.
  • Depends on: Feedback Sensor – sensors surface the in-session signals that developers capture as corrections.
  • Related: Feedforward – feedforward controls are the mechanism through which harvested rules reach the agent.
  • Related: Shift-Left Feedback – shifting feedback earlier in the session and harvesting rules across sessions are complementary; both reduce correction cost.
  • Contrasts with: Memory – memory persists facts from session to session for a single agent; the flywheel persists validated rules across a team.
  • Related: Metric – first-pass acceptance rate is the flywheel’s leading metric.
  • Related: Feedback Loop – the feedback flywheel is a specific, cross-session instance of the general feedback loop pattern.
  • Related: Skill – skills and custom commands are another codification target for harvested rules.

Sources

  • Rahul Garg introduced the Feedback Flywheel as a named pattern in “Patterns for Reducing Friction in AI-Assisted Development” (martinfowler.com, February 2026), describing the cross-session retrospective loop with first-pass acceptance rate as the leading metric.
  • The concept of retrospective-driven process improvement has roots in the agile community, particularly Norm Kerth’s Project Retrospectives: A Handbook for Team Reviews (2001), which established the practice of structured team reflection as a tool for institutional learning.
  • Jim Collins popularized the flywheel metaphor in Good to Great (2001), describing how small, consistent pushes in a coherent direction compound into unstoppable momentum. The feedback flywheel applies this dynamic to AI-assisted development: each harvested rule is a push that makes the next session slightly better.

Further Reading