Agentic Pull Request
Treat an agent’s work as a reviewable change request, not a raw diff: branch, commits, test evidence, session link, and rationale, with a review surface where reviewer comments become the agent’s next instructions.
Also known as: Agent PR, AI Pull Request
The pull request was invented so a maintainer could say “I like where this is going, but fix these three things before I merge.” It bundles code, evidence, and conversation into one reviewable object. When an agent writes the code, that object matters more: reviewer comments become instructions the agent can execute. The PR stops being a courtesy to future readers and becomes the live contract between a human reviewer and a machine that writes code.
Understand This First
- Code Review — the human discipline the agentic PR is built to be reviewed by.
- Bounded Autonomy — how far the agent may go before it must stop and present a reviewable change.
- Approval Policy — the rules that decide which PRs merge unsupervised.
Context
A coding agent has finished a task. It edited eleven files, added a test, and ran the suite. Now what? The agent has to hand its work to a human in a form the human can judge, and the team has to decide what may merge on the agent’s say-so and what may not. This is the handoff between an agent doing work and a human accepting it, and most teams already have a machine built for that handoff: the pull request.
This pattern is operational, and it sits where Bounded Autonomy and Approval Policy meet the daily mechanics of shipping. It applies the moment an agent’s output is headed for a shared branch that people and other agents depend on. Codex, GitHub Copilot cloud agent, and Claude Code all organize serious work around PR-shaped artifacts, because the PR is the one interface every team with a remote already understands. The agent doesn’t need a new review tool. It needs to fit the one that exists.
Problem
A diff is not a change request. Handed a bare patch, a reviewer has the what but none of the why: no record of what the agent was asked to do, what it tried, what it verified, or where it was unsure. The reviewer has to reconstruct intent from the changes themselves, the same trap that makes reviewing your own code hard.
A one-shot diff also has no place to put the next instruction. The reviewer spots a missing edge case and wants to say “handle the duplicate-delivery case.” With a raw patch, that comment goes nowhere actionable: someone has to translate it back into a new agent session by hand. The artifact and the conversation have come apart. How do you package agent work so a human can judge it quickly and feed corrections back without leaving the review?
Forces
- Reviewable now versus mergeable later. A reviewer wants enough context to judge the change in minutes. More context means more for the agent to assemble and more for the human to read. The PR has to carry what review needs and not bury it.
- Trust the evidence versus re-run it. Test output the agent pastes in is fast to read but easy to fake and quick to go stale. Test output a fresh CI run produces is trustworthy but slower. The PR has to make the difference between the two visible.
- Conversation versus throughput. The PR’s value is the back-and-forth, but every round costs the reviewer attention. Agents generate changes faster than humans can discuss them, so the review surface can flood.
- Uniform contract versus per-task reality. A single PR template is easy to enforce, but a documentation fix and a new feature don’t deserve the same scrutiny. The PR-acceptance evidence is consistent on this: no single agent wins across all task types, and low-risk changes are accepted far more readily than risky ones. The bar should move with the task.
- Autonomy versus the gate. The more the agent can merge on its own, the faster the team moves and the less a human sees. Where the merge line sits is a policy decision, not a default.
Solution
Have the agent present its work as a complete pull request, not a loose patch. The PR should include a branch, atomic commits, reviewer-facing explanation, test evidence, a link back to the session that produced it, and a review surface where comments feed the next agent turn. Make the PR the contract, and make the agent responsible for filling it out the way you’d expect a careful colleague to.
A good agentic PR carries six things, and a reviewer should be able to find each in seconds:
- A scoped branch and clean commits. One logical change per PR, with commits a reviewer can read in order. A 1,200-line PR gets skimmed; a 200-line PR gets read.
- A description written for the reviewer. What the agent was asked to do, what it changed, and why, in the Progress Log sense: a narrative, not a restatement of the diff.
- Test evidence the reviewer can trust. Not “tests pass” in prose, but a CI run on the actual branch. Pasted output is a claim; a green check from a fresh run is evidence.
- A link to the session. A pointer to the Agent Trace so a reviewer who wants to know how the agent got here can follow the reasoning, not guess at it.
- A rationale for the non-obvious. Where the agent made a judgment call, a sentence on why. The choices a reviewer would otherwise have to interrogate are the ones worth stating up front.
- Authorship that’s honest. The PR is the natural place to record Agent Provenance: which agent, which model, under which instructions, produced this.
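The six items above can be read as a checklist that a bot or pre-review hook runs before a human is asked to look. The following is a minimal sketch under that assumption; the `AgenticPR` record, its field names, and `missing_items` are hypothetical and do not come from any vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class AgenticPR:
    """Hypothetical record of what the agent submitted (illustrative fields only)."""
    branch: str = ""
    commit_count: int = 0
    description: str = ""       # reviewer-facing narrative, not a restated diff
    ci_run_url: str = ""        # link to a CI run on the branch, not pasted output
    session_url: str = ""       # pointer to the agent trace
    rationale: str = ""         # notes on the non-obvious judgment calls
    provenance: dict = field(default_factory=dict)  # agent, model, instructions


def missing_items(pr: AgenticPR) -> list[str]:
    """Return the names of the contract items this PR does not carry."""
    checks = {
        "scoped branch and commits": bool(pr.branch) and pr.commit_count > 0,
        "reviewer-facing description": bool(pr.description.strip()),
        "CI evidence": bool(pr.ci_run_url),
        "session link": bool(pr.session_url),
        "rationale": bool(pr.rationale.strip()),
        "provenance": all(
            pr.provenance.get(key) for key in ("agent", "model", "instructions")
        ),
    }
    return [name for name, present in checks.items() if not present]
```

A check like this only confirms that each item is present. Whether the description is accurate or the rationale is sound remains the judgment of the human owner and the reviewer.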
Before requesting review, a human owner should read the PR body and scan the diff. Agent-written descriptions can be verbose, stale, or overconfident. The owner signs that the PR reflects the request and is ready for someone else’s attention.
The cross-cutting move is to make reviewer comments the agent’s next instructions. When a reviewer writes “this doesn’t handle the empty-list case,” the agent reads the comment, makes the fix, pushes a new commit, and replies. The PR becomes a Steering Loop: the human steers, the agent acts, the artifact updates in place. That loop only works if the PR is the durable record both sides return to. The artifact and the conversation have to live in the same place.
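One way to picture the loop is as a function that turns the open review threads into the agent's next work list. This is a sketch only: `ReviewComment`, its fields, and `next_instructions` are assumed names for illustration, and a real integration would read comments from the hosting platform's API.

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """Hypothetical shape of one review comment on the PR."""
    comment_id: int
    author: str
    path: str
    body: str
    resolved: bool = False


def next_instructions(comments, agent_login):
    """Turn unresolved human comments into an ordered list of agent instructions."""
    instructions = []
    for comment in comments:
        # Skip settled threads and the agent's own replies.
        if comment.resolved or comment.author == agent_login:
            continue
        instructions.append(
            f"[comment {comment.comment_id}] {comment.path}: {comment.body}"
        )
    return instructions
```

Keeping the comment id in each instruction lets the agent reply on the same thread after it pushes the fix, so the conversation and the artifact stay in one place.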
Two policies sit on top of the artifact. The first is the merge line, governed by your Approval Policy, which decides what an agent may merge unsupervised: maybe a docs typo, never a database migration. The second is a bar that moves with the task, because the evidence says one bar doesn’t fit all work. A dependency bump and a new auth flow are both PRs; they shouldn’t clear the same gate.
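Written down, a merge line can be as small as a function from task type and touched paths to a gate. The sketch below assumes a hypothetical policy: the protected path globs, the set of auto-mergeable task types, and the name `merge_decision` are all illustrative choices, not a recommended default.

```python
from fnmatch import fnmatchcase

# Hypothetical policy values; a real team would write down and audit its own.
PROTECTED_GLOBS = ("billing/*", "migrations/*", "auth/*")
AUTO_MERGE_TASKS = frozenset({"docs"})


def merge_decision(task_type, changed_paths, ci_green):
    """Return "blocked", "human-review", or "auto-merge" for one PR."""
    if not ci_green:
        return "blocked"  # nothing merges without a passing run on the branch
    touches_protected = any(
        fnmatchcase(path, glob)
        for path in changed_paths
        for glob in PROTECTED_GLOBS
    )
    if touches_protected:
        return "human-review"  # protected paths override the task type
    if task_type in AUTO_MERGE_TASKS:
        return "auto-merge"
    return "human-review"  # default to a human when the policy is silent
```

For example, `merge_decision("docs", ["README.md"], True)` returns `"auto-merge"`, while the same task type touching `billing/README.md` returns `"human-review"`. Defaulting to a human when no rule matches keeps the line from widening by omission.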
flowchart LR
T[Task] --> A[Agent run]
A --> PR[Pull request: branch, commits, evidence, rationale, trace link]
PR --> R{Human review}
R -->|comments| A
R -->|approve| M[Merge]
How It Plays Out
A developer asks a coding agent to add pagination to a list endpoint. Twenty minutes later the agent opens a PR: a four-commit branch, a description that explains the response-shape choice, a green CI run, and a link to the session. The reviewer reads it in three minutes, leaves one comment (“default page size should be 50, not 100”), and the agent pushes a fix commit and replies within the minute. What would have been a half-day of Slack messages and a local checkout is a single PR thread with two human messages in it. The contract did the work.
A platform team turns on a background agent to clear a backlog of small bugs overnight. By morning there are nine PRs waiting. Eight are tidy: scoped, tested, explained. The ninth touches the billing service and rewrites a function nobody asked it to rewrite. Because the team’s approval policy routes anything touching billing/ to a named human and blocks auto-merge there, the risky PR is sitting in review while the eight safe ones have already merged on a passing CI gate. The team didn’t have to watch the agent work. The PR contract and the merge policy watched it for them.
A team studying its own data notices a pattern that matches the public literature: the agent’s documentation and test-scaffolding PRs get approved almost on sight, while new-feature PRs bounce two or three times before merging. Rather than treat that as the agent failing, they treat it as task types deserving different gates. Docs PRs from the agent now auto-merge on a passing build; feature PRs always get a human reviewer and a design note in the description. Acceptance rates climb, not because the agent got better, but because the gate finally matched the risk.
Make the agent’s CI run the source of truth, not its self-report. An agent that writes “all tests pass” in the PR body and an agent whose PR shows a green check from a fresh run on the branch are making different promises. Require the second. The cheapest review you can do is reading a passing pipeline you trust; the most expensive is re-litigating a claim you can’t.
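The difference between the two promises can be made mechanical. In this sketch the PR body is deliberately not an input: the only thing that counts is a successful check recorded against the PR's current head commit. The `CheckRun` record and `evidence_is_trustworthy` are hypothetical names, and the set of required checks is an assumption a team would define for itself.

```python
from dataclasses import dataclass


@dataclass
class CheckRun:
    """Hypothetical record of one CI check result."""
    name: str
    head_sha: str      # the commit the check actually ran against
    conclusion: str    # e.g. "success" or "failure"


def evidence_is_trustworthy(head_sha, check_runs, required_checks):
    """True only if every required check succeeded on the PR's current head commit."""
    required = set(required_checks)
    if not required:
        return False  # no required checks means no evidence, not a pass
    passed_on_head = {
        run.name
        for run in check_runs
        if run.head_sha == head_sha and run.conclusion == "success"
    }
    return required <= passed_on_head
```

Matching on the head commit is what catches stale evidence: a green run from before the agent's latest push does not count.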
Consequences
Benefits. Review gets faster, because the reviewer judges a complete change request instead of reconstructing one. Feedback gets cheaper, because a comment is an instruction the agent can act on in seconds rather than a ticket someone has to reopen as a new session. The merge gate becomes explicit policy you can tune per task type, so low-risk work flows and high-risk work stops for a human. The PR becomes the one place authorship, evidence, and rationale all live together, which is exactly what an auditor or incident responder wants to find later.
Liabilities. The review surface can flood. An agent that opens twenty PRs a day will outrun any human reviewer who treats each one as a careful read, and the failure mode is rubber-stamping, which is worse than no review because it looks like review. The evidence is only as good as the gate that produced it: a green check from a weak test suite is a false comfort. And the better the contract works, the more tempting it is to widen the auto-merge line until the human is no longer really in the loop. That is the slow drift toward a Dark Factory, where the reviewable PR has quietly stopped being reviewed.
Failure modes to name.
- The diff dump. The agent opens a PR with no description, no rationale, and “tests pass” as the only evidence. The reviewer is back to archaeology. The fix is to make the description and a real CI run required, not optional.
- The mega-PR. The agent bundles a week of work into one 2,000-line change because it could. It gets skimmed and approved, and the bugs ship. The fix is to scope the agent to one logical change per PR.
- The trusted self-report. The PR body claims green tests the pipeline never ran. The fix is to require evidence from CI on the branch, where the agent can’t author the result.
- The runaway gate. Auto-merge starts at docs typos and creeps outward until a migration lands unreviewed. The fix is an explicit approval policy with the merge line written down and audited, not a default that drifts.
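The four failure modes above are concrete enough to lint for. The following sketch flags each one with a crude heuristic; the function name, the word and line thresholds, and the auto-merge task set are all assumptions chosen for illustration, and a real check would need tuning against a team's own PRs.

```python
import re


def named_failure_modes(body, changed_lines, ci_green_on_branch, auto_merged,
                        task_type, max_changed_lines=400, min_body_words=30,
                        auto_merge_tasks=frozenset({"docs"})):
    """Return the failure modes this PR appears to exhibit (heuristic only)."""
    found = []
    # A prose claim such as "tests pass" or "all tests passed" in the body.
    claims_green = re.search(r"\btests?\s+pass", body, re.IGNORECASE) is not None
    if len(body.split()) < min_body_words:
        found.append("diff dump")            # little or no description
    if changed_lines > max_changed_lines:
        found.append("mega-PR")              # too large to be read carefully
    if claims_green and not ci_green_on_branch:
        found.append("trusted self-report")  # claim without a CI run behind it
    if auto_merged and task_type not in auto_merge_tasks:
        found.append("runaway gate")         # merged outside the written policy
    return found
```

A short body is only a proxy for a missing description, and a line count is only a proxy for scope, so the output is best treated as a prompt for a human to look rather than as a verdict.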
Sources
The pull request as a unit of collaboration comes from the distributed-version-control world; GitHub popularized the PR-as-review-surface model that the agentic version inherits wholesale. The agentic adaptation is now explicit in product documentation. GitHub’s Copilot cloud agent works on a branch before opening a pull request. OpenAI Codex code review in GitHub can review a PR and then fix issues in the same branch when asked. Claude Code GitHub Actions can create pull requests and implement fixes from PR or issue comments. The convergence matters more than any single vendor: agents are being routed through the PR because it is the review surface teams already trust.
The empirical grounding for treating the agentic PR as a studied object, rather than a vendor convenience, comes from a pair of 2026 large-scale corpora. The AIDev dataset catalogs 932,791 agent-authored pull requests across 116,211 repositories, large enough to study how agent contributions move through review at scale. A companion task-stratified acceptance study found that no single agent wins across all task types, and that documentation and low-risk PRs are accepted far more readily than new-feature PRs: the evidence behind the “let the bar move with the task” guidance above. Work on why agentic PRs fail in review and on the security posture of agent-authored PRs rounds out the picture, both arguing that the PR is where the risks of agent-written code become visible and reviewable.
The human half of the contract is the Code Review tradition, whose foundational result is Michael Fagan’s Design and Code Inspections to Reduce Errors in Program Development (1976): structured inspection finds most defects before testing. The agentic PR is that discipline pointed at a faster author, and Fagan’s core finding holds whether the author is a person or a model: a reviewer who didn’t write the code catches what the author cannot.
Further Reading
- AIDev: A Large-Scale Dataset of AI-Authored Pull Requests — the corpus behind the claim that agentic PRs are now common enough to study empirically.
- Task-stratified PR-acceptance study — the evidence that acceptance varies sharply by task type and that no single agent dominates.
- Google’s engineering practices: code review — the reviewer-side discipline the agentic PR is meant to be reviewed by, applied to every change regardless of author.
- GitHub: Agent pull requests are everywhere. Here’s how to review them — a current practitioner guide focused on the reviewer habits agent-authored PRs demand.