Code Review
“Ask a programmer to review ten lines of code, he’ll find ten issues. Ask him to review five hundred lines and he’ll say it looks good.” — Giray Ozil
Code review is the practice of having someone other than the author examine code changes before they merge. It catches defects, enforces standards, and spreads knowledge across the team.
Also known as: Peer Review, Pull Request Review, Change Review
Understand This First
- Test – tests verify behavior mechanically; reviews verify intent and design.
- Coding Convention – conventions give reviewers a shared standard to check against.
- Acceptance Criteria – criteria define what “done” means for the change under review.
Context
You’re working on a team where multiple people (or agents) contribute code to the same codebase. Changes land frequently, and each one carries risk: it might introduce a bug, violate a design convention, duplicate existing functionality, or solve the wrong problem entirely. This is a tactical pattern, applied at the point where new code meets the existing system.
Code review sits at the intersection of Testing and human judgment. Tests verify what the machine can check. Reviews verify what tests can’t: that the code does what the team intended, that it fits the system’s Architecture, that it handles cases the author didn’t think of, and that a future reader will be able to follow it.
Problem
The author of a piece of code is the worst person to evaluate it. They know what they meant, so they read what they meant rather than what they wrote. Errors that would be obvious to a fresh reader slip past because the author’s mental model fills in the gaps. How do you catch the defects, design problems, and misunderstandings that the author can’t see?
Forces
- The author’s familiarity with the change blinds them to its flaws.
- Tests catch behavioral bugs but miss design problems, naming confusion, and duplicated logic.
- Thorough reviews take time, and every hour spent reviewing is an hour not spent building.
- Large changes are harder to review well. Reviewers lose concentration and start rubber-stamping.
- Knowledge about the codebase concentrates in whoever wrote a given module, creating a single point of failure when that person leaves.
Solution
Require that every code change, before it merges into the shared codebase, is examined by at least one person who didn’t write it.
The reviewer reads the diff with a specific focus: does this change do what it claims? Does it introduce risk the author may not have seen? Does it follow the team’s conventions? Could a future reader understand it without the author’s context?
Keep changes small. A diff under 200 lines gets a thorough reading; a diff over 500 lines gets a skim at best. If a feature is too large for a single reviewable change, break it into stacked or sequential pull requests that each make sense on their own.

Review for intent before style. Does the change solve the right problem? Only after that question is settled does it matter whether the variable names are consistent.

Write comments that teach. A review isn’t a list of demands. Explain why something matters, not just what should change.
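The size guidance above can be enforced mechanically. Here is a minimal sketch of a CI gate that counts changed lines with `git diff --numstat` and flags oversized diffs; the 200/500 thresholds come from the text, while the function names and the warn/fail split are illustrative choices, not a standard tool.

```python
import subprocess

# Thresholds from the guidance above: under 200 lines reviews stay
# thorough; over 500 lines they degrade into skims.
SOFT_LIMIT = 200
HARD_LIMIT = 500

def changed_line_count(base: str = "origin/main") -> int:
    """Count added + removed lines in the working branch relative to base."""
    out = subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, removed, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" for line counts
            total += int(added) + int(removed)
    return total

def check_diff_size(total: int) -> str:
    """Classify a diff: 'ok' merges freely, 'warn' asks the author to
    consider splitting, 'fail' blocks until the change is broken up."""
    if total > HARD_LIMIT:
        return "fail"
    if total > SOFT_LIMIT:
        return "warn"
    return "ok"
```

A gate like this works best as advice rather than a hard block: some changes (generated files, lockfiles) are legitimately large, so teams usually pair the check with an override label.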
In agentic workflows, AI agents generate code faster than any human can write it, so the review queue fills faster too. The response isn’t to skip reviews. It’s to layer them. Automated reviewers handle the mechanical layer: style compliance, security patterns, test coverage, complexity metrics. Human reviewers focus on what machines still can’t judge well: whether the design fits the larger system, whether the abstraction is right, and whether the change actually solves the user’s problem. The Generator-Evaluator pattern formalizes this split: the agent generates, something else evaluates.
When an agent opens a pull request, treat the review the same way you’d review a junior developer’s work. Read the intent, check the edge cases, verify it matches the spec. The agent writes fast, but “fast” and “correct” are different things.
How It Plays Out
A startup uses Claude Code to implement a payment webhook handler. The agent produces working code in a few minutes: it parses the webhook payload, validates the signature, and updates the order status. The tests pass. But the human reviewer notices that the handler doesn’t check for duplicate delivery. Webhooks are inherently at-least-once, so the same event can arrive twice. Without an Idempotency check, a customer could be charged twice. The agent didn’t make a coding mistake. It got it wrong because the spec didn’t mention idempotency and the agent had no reason to infer it. One review comment, five minutes, saved a billing incident.
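The fix the reviewer asked for can be sketched in a few lines. This is an illustrative model, not a real payment provider’s SDK: the event shape, `processed_event_ids`, and the in-memory stores are assumptions, and a production handler would record seen event IDs in a durable store (for example, a database table with a unique constraint) so the check survives restarts.

```python
from dataclasses import dataclass

@dataclass
class WebhookEvent:
    event_id: str   # providers attach a unique ID to each event delivery
    order_id: str
    status: str

# In-memory stand-ins for durable storage, for illustration only.
processed_event_ids: set[str] = set()
orders: dict[str, str] = {}

def handle_webhook(event: WebhookEvent) -> bool:
    """Apply the event exactly once. Returns False for duplicates.

    Webhooks are at-least-once: the same event can arrive twice, so the
    handler must be idempotent or a customer could be charged twice.
    """
    if event.event_id in processed_event_ids:
        return False  # duplicate delivery: acknowledge but do nothing
    processed_event_ids.add(event.event_id)
    orders[event.order_id] = event.status
    return True
```

Note that the duplicate path still returns successfully; the provider only needs an acknowledgment, and re-raising an error would trigger yet another retry.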
A platform team with 40 engineers and heavy agent usage sees review turnaround climb past two days. Pull requests pile up, developers context-switch away, and by the time review comments arrive the author has moved on. The team restructures: trunk-based development with short-lived branches, diffs capped at 300 lines, and an automated pre-review bot that checks formatting, test coverage, and known security patterns. The bot approves or rejects the mechanical layer instantly. Human reviewers now see smaller, pre-filtered changes and turn them around in hours. Review stops being a bottleneck and becomes a fast Feedback Loop.
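The mechanical layer of such a bot can be modeled as a set of predicates over the diff. This is a hedged sketch under the assumptions in the scenario (the 300-line cap, test and formatting checks); the `Diff` fields and check names are hypothetical, and a real bot would compute them from CI output rather than take them as inputs.

```python
from typing import Callable, NamedTuple

class Diff(NamedTuple):
    line_count: int   # total changed lines
    has_tests: bool   # did the change touch or add tests?
    formatted: bool   # did the formatter run clean?

CheckFn = Callable[[Diff], bool]

# The mechanical layer: each check is cheap, objective, and automatable.
CHECKS: dict[str, CheckFn] = {
    "size under 300 lines": lambda d: d.line_count <= 300,
    "includes tests": lambda d: d.has_tests,
    "formatter clean": lambda d: d.formatted,
}

def pre_review(diff: Diff) -> tuple[bool, list[str]]:
    """Return (approved, names of failed checks).

    Only diffs that pass every mechanical check reach a human reviewer,
    who then judges design and intent on a smaller, pre-filtered change.
    """
    failures = [name for name, check in CHECKS.items() if not check(diff)]
    return (not failures, failures)
```

The point of the structure is the split itself: everything in `CHECKS` is delegated to the machine, and everything outside it stays with the human.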
Consequences
Code review distributes knowledge. When two people examine every change, the team develops shared understanding of the codebase. Nobody becomes the only person who knows how the billing module works. Knowledge distribution also raises the team’s floor. Junior developers absorb patterns from senior reviewers’ comments, and seniors discover blind spots when juniors ask questions they hadn’t considered.
The cost is real. Review takes time, and that time comes from somewhere. Teams that treat review as a checkbox, approving without reading, get none of the benefits and all of the delay. Teams that treat review as an interrogation create resentment and slow delivery to a crawl. The sweet spot is reviews that are fast, focused, and framed as collaboration rather than gatekeeping.
In agent-heavy codebases, the bottleneck is shifting shape. The volume of changes rises (DORA’s 2025 report measured 98% more pull requests and 154% larger diffs in AI-assisted teams), while the need for human judgment stays constant. AI-generated code doesn’t need more review; it needs different review. Automated tools handle mechanical checks, human reviewers handle design and intent, and the human share of that work narrows over time as tooling improves.
Related Patterns
- Depends on: Test – tests handle behavioral verification; reviews handle design verification. Both are needed.
- Depends on: Coding Convention – conventions make reviews faster because both author and reviewer share the same expectations.
- Depends on: Acceptance Criteria – criteria define the “done” bar the reviewer checks against.
- Uses: Feedback Sensor – automated review tools act as feedback sensors on the diff.
- Uses: Verification Loop – code review is the human-mediated verification step in a change workflow.
- Complements: Generator-Evaluator – the review is the evaluation half of the generate-evaluate cycle.
- Related: Happy Path – reviews catch the non-happy-path cases the author and their tests may have missed.
- Related: Feedback Loop – review is a feedback loop between the author and the team’s standards.
- Related: Refactor – reviews often surface refactoring opportunities that the author, focused on the immediate task, didn’t see.
- Enables: Shift-Left Feedback – reviewing changes before merge shifts defect detection earlier in the lifecycle.
Sources
Michael Fagan published “Design and Code Inspections to Reduce Errors in Program Development” (1976), the first formal study of code inspection as an engineering practice. Fagan demonstrated that structured inspections found 60-90% of defects before testing, establishing inspection as the most cost-effective defect-removal technique then known.
Google’s engineering practices documentation codified code review as a required step for every change, regardless of the author’s seniority. Their published guide emphasizes reviewer responsibility: approve for correctness, clarity, consistency, and coverage, in that order.
The DORA State of DevOps Report (2025) quantified the agentic review bottleneck: AI-assisted developers merge 98% more pull requests at 154% larger size, while code review time increased 91%. This data frames the emerging need for automated pre-review and structured change sizing.
Karl Wiegers’s “Humanizing Peer Reviews” (2002) addressed the interpersonal dimension, arguing that reviews fail not because the technique is wrong but because teams treat them as adversarial rather than collaborative. His guidelines for review etiquette remain widely cited in engineering team handbooks.