Verification Loop

Pattern

A reusable solution you can apply to your work.

Understand This First

  • Agent – the verification loop is the agent’s primary quality assurance mechanism.
  • Tool – the agent needs tools to run tests and read results.

Context

At the agentic level, the verification loop is the cycle of change, test, inspect, and iterate that makes agentic coding reliable. It’s the mechanism by which an agent confirms that its changes actually work, not through confidence, but through evidence.

The verification loop is what separates agentic coding from “generate and hope.” A model generates plausible code, but plausible isn’t correct. The loop closes the gap by running tests, checking output, and feeding results back to the agent for correction.

Problem

How do you ensure that agent-generated changes actually work, when the agent’s default output is optimized for plausibility rather than correctness?

An agent that writes code without verifying it is like a developer who never runs their tests. The code might be right. It often is. But when it isn’t, the errors compound: the next change builds on a broken foundation, and the agent doesn’t notice because it isn’t checking.

Forces

  • Agent confidence doesn’t correlate with correctness. The model sounds equally sure about right and wrong code.
  • Fast iteration is one of the agent’s strengths, making verify-and-retry cheap.
  • Test infrastructure must exist for verification to work. The loop is only as good as the checks it runs.
  • Verification scope must be calibrated. Running the full test suite after every small change is wasteful; running nothing is reckless.

Solution

Build verification into the agent’s workflow as a mandatory step, not an optional one. The basic loop is:

  1. Change. The agent modifies code based on the task or the previous iteration’s feedback.
  2. Test. The agent runs relevant tests, linters, type checks, or other automated checks.
  3. Inspect. The agent reads the results. If everything passes, the task may be complete. If something fails, the agent analyzes the failure.
  4. Iterate. The agent uses the failure information to make a corrective change and returns to step 2.

Steps 2-4 are what the agent does naturally when given access to test tools and trained to use them. Most capable agents, when told “fix this and make sure the tests pass,” will automatically run tests, read failures, and iterate. Your job is to ensure the infrastructure exists and the agent knows how to invoke it.
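The loop's control flow can be sketched in a few lines. Everything here is hypothetical scaffolding, not part of any real agent harness: assume the harness exposes a change step and a check step as callables, with an iteration cap so a stuck agent can't loop forever.

```python
# Minimal sketch of the change-test-inspect-iterate loop.
# `make_change(feedback)` and `run_checks()` are hypothetical
# stand-ins for an agent harness: the first asks the agent to edit
# code (given the previous failure output, if any), the second runs
# tests/linters and returns (passed, output).

from typing import Callable, Optional, Tuple

MAX_ITERATIONS = 5  # cap retries so a stuck agent escalates instead of spinning

def verification_loop(
    make_change: Callable[[Optional[str]], None],
    run_checks: Callable[[], Tuple[bool, str]],
    max_iterations: int = MAX_ITERATIONS,
) -> bool:
    feedback: Optional[str] = None
    for _ in range(max_iterations):
        make_change(feedback)          # 1. Change (informed by prior failures)
        passed, output = run_checks()  # 2. Test
        if passed:                     # 3. Inspect: all green, done
            return True
        feedback = output              # 4. Iterate: feed failure text back
    return False                       # gave up; hand off to a human
```

The key design choice is that failure output flows back into the next change step: the agent corrects against evidence, not against its own confidence.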

Verification works at multiple granularities. Unit tests catch functional errors quickly. Type checkers catch structural errors. Linters catch style violations and common mistakes. Integration tests catch issues at boundaries. A good verification loop uses the fastest checks first and escalates to slower, broader checks as the change stabilizes.
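The fastest-first ordering amounts to an ordered list of checks that short-circuits on the first failure, so the agent gets feedback as cheaply as possible. A minimal sketch, with the specific check names and ordering purely illustrative:

```python
# Sketch of fastest-checks-first escalation: run cheap checks before
# expensive ones and stop at the first failure. Check names and the
# example ordering below are illustrative, not prescriptive.

from typing import Callable, List, Optional, Tuple

def run_tiered_checks(
    checks: List[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run checks in order; return the name of the first failing
    check, or None if everything passes."""
    for name, check in checks:
        if not check():
            return name  # fail fast: skip slower checks downstream
    return None

# A typical ordering, fastest to slowest, might look like:
# checks = [
#     ("type-check",  lambda: run("mypy src/")),
#     ("lint",        lambda: run("ruff check src/")),
#     ("unit tests",  lambda: run("pytest tests/unit")),
#     ("integration", lambda: run("pytest tests/integration")),
# ]
```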

Warning

Don’t trust agent-generated tests as your only verification. An agent can write code and tests that agree with each other while both being wrong. Use existing tests, human-written tests, and manual inspection as anchors. See Smell (AI Smell) for more on this failure mode.

How It Plays Out

An agent is asked to add input validation to an API endpoint. It writes the validation logic, runs the existing test suite, and discovers that two tests fail because they were sending invalid input that the old code silently accepted. The agent examines the tests, determines they should be updated to send valid input, makes the corrections, reruns the suite, and all tests pass. Without the verification loop, the validation would have shipped alongside broken tests.

A developer configures their agent’s harness to automatically run type checks after every file save. The agent writes a function that returns string | null but the caller expects string. The type checker catches the mismatch immediately, and the agent adds a null check before moving on. The bug never reaches a test; it was caught at the fastest verification level.

Here’s what a verification loop looks like in practice. A developer asks the agent to add rate limiting to an API endpoint. The agent writes the implementation, then runs the tests:

Developer prompt:
  "Add rate limiting to POST /api/upload. Limit to 10 requests
  per minute per API key. Run the tests after you make the change."

Agent writes rate_limiter.py, updates the route handler,
then runs the test suite:

  $ pytest tests/api/test_upload.py -v
  FAILED test_upload_returns_201 - AssertionError: 429 != 201
  FAILED test_upload_large_file - AssertionError: 429 != 201
  3 passed, 2 failed

Agent analyzes the failures:
  "Two existing tests are failing because they don't include an
  API key in their request headers. My rate limiter rejects
  keyless requests with 429. The tests need updating — they
  should include a valid test API key."

Agent fixes the test fixtures to include X-API-Key headers,
then reruns:

  $ pytest tests/api/test_upload.py -v
  5 passed, 0 failed

Agent reports:
  "Rate limiting is in place. Two existing tests needed updated
  fixtures because they were sending unauthenticated requests
  that the rate limiter now rejects. All tests pass."

Without the loop, the agent would have shipped the rate limiter alongside two broken tests. The loop caught the interaction between new code and old assumptions, and the agent fixed both sides before reporting success.

Example Prompt

“Add input validation to the /register endpoint. After writing the code, run the full test suite. If any test fails, read the failure output and fix the issue. Repeat until all tests pass.”

Consequences

The verification loop makes agentic coding reliable. It catches errors while the agent still has the context to fix them, reducing the chance that broken code reaches code review or production. It also builds a healthy habit: treat agent output as a hypothesis to be tested, not a fact to be trusted.

The cost is infrastructure. You need tests, linters, type checkers, and a way for the agent to invoke them. Projects with weak test coverage get less benefit from the verification loop because there are fewer checks to run. This creates a virtuous cycle: the more you invest in test infrastructure, the more productive your agents become.

  • Depends on: Agent – the verification loop is the agent’s primary quality assurance mechanism.
  • Depends on: Tool – the agent needs tools to run tests and read results.
  • Uses: Plan Mode – planning produces expectations that verification can check against.
  • Enables: Eval – evals are verification loops applied to the agent’s overall performance.
  • Refined by: Human in the Loop – some verification steps require human judgment.
  • Uses: Smell (AI Smell) – AI smell detection is a form of verification that automated tools can’t yet perform.
  • Prevents: Vibe Coding – systematic verification is the antidote to accepting code without checking it.
  • Used by: Code Review – code review is the human-mediated verification step in a change workflow.
  • Refines: Feedback Loop – the agent-specific feedback loop: generate, test, read results, regenerate.
  • Related: Happy Path – agents retry off the happy path until they find it again.
  • Related: Printf Debugging – agents use printf debugging as part of their verify step: insert prints, run, read output, fix.
  • Supported by: Test-Driven Development – a failing test gives an agent a concrete exit condition to loop against.

Sources

  • Norbert Wiener formalized the feedback loop as a general principle of control in Cybernetics: or Control and Communication in the Animal and the Machine (1948). The verification loop’s core structure (act, observe the result, correct) is a direct instance of Wiener’s cybernetic cycle applied to software construction.
  • Kent Beck codified the tight test-feedback cycle in Test-Driven Development: By Example (2003). The verification loop’s change-test-inspect-iterate rhythm is a generalization of Beck’s red-green-refactor, extended from human developers to autonomous agents.
  • The application of closed-loop verification to LLM-generated code emerged as a community practice among agentic coding practitioners in 2023-2024, as teams discovered that treating model output as a hypothesis to be tested, not a result to be trusted, was essential for reliability.