Red/Green TDD
Also known as: Red-Green-Refactor
Understand This First
Context
You’ve decided to practice Test-Driven Development. You understand the principle (write tests first) but you need a concrete, mechanical process you can follow without ambiguity. This is a tactical pattern: the specific loop that makes TDD work in practice.
The name comes from test runner output: a failing test shows as red, a passing test shows as green.
Problem
“Write tests first” is good advice but vague. How much code should you write at a time? When should you stop adding to the implementation? When is it safe to clean things up? Without a clear rhythm, developers oscillate between writing too much code at once (losing the benefits of test-first design) and getting paralyzed by the question of what to test next.
Forces
- Large steps make it hard to locate the source of a failure.
- Tiny steps can feel tediously slow.
- Without a refactoring phase, code accumulates mess even when tests pass.
- Skipping the “red” phase means you don’t know if the test actually tests anything.
- The temptation to write “just a little more code” before running the tests undermines the discipline.
Solution
Follow a strict three-step loop:
Red. Write a single test that describes one small behavior the system doesn’t yet support. Run it. Watch it fail. The failure confirms that the test is actually checking something; a test that passes immediately hasn’t proven anything new.
Green. Write the simplest code that makes the failing test pass. Don’t worry about elegance, performance, or generality. Don’t write code for the next test. Just make this one test pass, doing as little as possible.
Refactor. Now that all tests pass, look at the code you just wrote and the code around it. Is there duplication? An unclear name? A clumsy structure? Clean it up. Run the tests after each change to make sure they still pass. The test suite is your safety net during this phase.
Then start the loop again with a new failing test.
The discipline that matters most is never skipping the red step. If you write code without a failing test, you’ve left the loop. If you write a test that already passes, you haven’t proven anything new. The red step is what keeps you honest.
How It Plays Out
A developer is building a stack data structure. Red: They write test_push_increases_size; it fails because there’s no Stack class yet. Green: They create Stack with a push method and a size property, using the simplest implementation (a list). The test passes. Refactor: Nothing to clean up yet. Red: They write test_pop_returns_last_pushed; it fails. Green: They add a pop method. The test passes. Refactor: They notice push and pop could share a clearer internal naming. They rename and re-run tests. All green. The stack grows feature by feature, always covered by tests.
In agentic coding, the red/green loop gives AI agents a tight feedback cycle. You write a failing test (red). You ask the agent to make it pass (green). The agent writes code, runs the test, and iterates until it’s green. Then you, or the agent, refactor. Each cycle is small enough that if the agent goes off track, you catch it immediately. This is far more reliable than asking an agent to “build a whole feature” in one shot.
A typical agentic red/green session might look like:
- Human writes:
test_discount_applies_to_orders_over_100 - Agent implements: a discount function that checks order total
- Test goes green
- Human writes:
test_discount_does_not_apply_under_100 - Agent adjusts the implementation
- Both tests green
- Human or agent refactors
“I’ve written a failing test: test_discount_applies_to_orders_over_100. Read the test, understand what it expects, and write the minimum code to make it pass. Don’t add anything the test doesn’t require.”
Consequences
The red/green loop enforces small, incremental progress. You always know where you are: either you have a failing test to fix, or all tests pass and you’re free to clean up or write the next test. This predictability reduces anxiety and prevents the “big bang” approach where you write hundreds of lines and then debug for hours.
The cost is pace. Red/green TDD feels slow, especially at the start of a project when you’re writing more test code than production code. It also requires a fast Harness; if running the test suite takes minutes, the loop breaks down. For TDD to work, tests must run in seconds.
Related Patterns
- Refines: Test-Driven Development — this is TDD’s core mechanical loop.
- Depends on: Test, Harness — you need fast, reliable test infrastructure.
- Enables: Refactor — the refactoring phase is built into every cycle.
- Contrasts with: Regression — red/green TDD prevents regressions by catching them as they’re introduced.