---
slug: background-agent
type: pattern
summary: "Delegate a bounded coding task to an isolated agent that works away from the live conversation and returns a reviewable artifact when it finishes."
created: 2026-06-14
updated: 2026-06-14
related:
  agent:
    relation: depends-on
    note: "A background agent is an agent given a task to pursue outside the live human conversation."
  thread-per-task:
    relation: uses
    note: "Each background task should run as its own focused thread."
  worktree-isolation:
    relation: uses
    note: "The background agent needs an isolated checkout so its changes cannot collide with the human's current work."
  task-horizon:
    relation: scopes
    note: "Task horizon decides how long and complex a background task can be before it needs checkpoints or decomposition."
  verification-loop:
    relation: uses
    note: "The agent should verify its work before returning an artifact."
  progress-log:
    relation: uses
    note: "The returned artifact needs a human-readable record of what the agent did and what it checked."
  externalized-state:
    relation: uses
    note: "Plans, logs, and intermediate artifacts make unattended work inspectable."
  handoff:
    relation: produces
    note: "A finished background task returns through a handoff back to the human or parent workflow."
  agentic-pull-request:
    relation: produces
    note: "A pull request is the most common reviewable artifact returned by a background coding agent."
  subagent:
    relation: contrasts-with
    note: "A subagent works inside a parent workflow; a background agent runs as an independent session or cloud task."
  bounded-autonomy:
    relation: depends-on
    note: "Unattended work needs explicit autonomy limits before it starts."
  approval-policy:
    relation: uses
    note: "The approval policy decides what the background agent may do without interrupting a human."
  delegation-chain:
    relation: extends
    note: "Launching a background agent creates another link in the path from human intent to action."
---

# Background Agent

> **Pattern**
>
> A named solution to a recurring problem.

*Delegate a bounded coding task to an isolated agent that works away from the live conversation and returns a reviewable artifact when it finishes.*

*Also known as: Background Coding Agent, Cloud Coding Agent, Asynchronous Coding Agent*

You don't need to watch every agent step in real time. Some tasks are better handed off: fix this small bug, update these tests, investigate this issue, prepare a draft change. A background agent names the operating contract for that handoff. You give the agent a task, boundaries, and an environment. It works while you do something else, then returns evidence you can review.

## Understand This First

- [Agent](agent.md): the worker doing the task.
- [Thread-per-Task](thread-per-task.md): each background run needs a focused thread.
- [Worktree Isolation](worktree-isolation.md): unattended changes need a separate checkout or branch.
- [Bounded Autonomy](bounded-autonomy.md): the agent's freedom must be set before it starts.

## Context

At the **agentic** level, a background agent is an agent session you launch and then stop supervising turn by turn. It may run in a cloud sandbox, a GitHub Actions runner, a local background session, or a separate worktree. The hosting model is secondary. The pattern is asynchronous delegation: the human assigns the work, leaves the inner loop, and reviews a finished artifact later.

This pattern sits between live pairing and full automation. In live pairing, the human and agent share a [REPL](repl.md)-like session, making decisions at every turn. In full automation, the system runs without expecting human review. A background agent keeps the human out of the inner loop but inside the outer loop. The agent can plan, edit, run checks, and produce a result. The human still decides whether that result is acceptable.

## Problem

How do you use an agent for work that does not need constant supervision without turning it into an unreviewed automation?

Many agent tasks are too slow to watch and too risky to trust blindly. If you supervise every shell command and file edit, you lose the wall-clock benefit. If you let the agent run without boundaries, it may drift, touch unrelated files, or return a confident summary without enough evidence.

You can't fix that with a longer chat transcript. The workflow needs a middle position: unattended execution inside a bounded sandbox, followed by review.

## Forces

- **Wall-clock overlap matters.** The agent's minutes should not always consume the human's minutes.
- **Unattended work needs a box.** The longer the human is away, the more important scope, permissions, and isolation become.
- **Evidence must replace conversation.** If you were not present for the work, the returned artifact has to show what happened.
- **Task size has a ceiling.** Past the agent's [Task Horizon](task-horizon.md), background work drifts unless it is decomposed or checkpointed.
- **Review bandwidth is finite.** Ten background agents can produce ten artifacts faster than a human can review them.

## Solution

**Launch background agents only for bounded tasks.** Give each run an isolated environment, explicit authority, and a required return artifact. Treat the background run as a contract: what the agent may touch, what it must verify, when it must stop, and what evidence it must bring back.

A good background-agent dispatch has five parts.

**Task.** State one concrete outcome. "Fix issue #218 by adding pagination to the orders endpoint" is a background task. "Improve the API layer" is not.

**Boundary.** Name the allowed files, services, commands, and risk level. If the task can touch production data, secrets, access control, billing, migrations, or release settings, it probably should not run unattended.


**Environment.** Give the agent an isolated branch, worktree, container, or cloud runner. The human's active workspace should not change while the agent works.

**Verification.** Tell the agent which checks must pass before it returns. Relevant tests, linters, type checks, and manual inspection notes belong in the prompt, not in a hope that the agent will infer them.

**Return artifact.** Require a reviewable result: a branch, patch, draft pull request, investigation report, failing-test reproduction, or explicit "I could not finish" report. The artifact should include the task, files changed, checks run, failures hit, and remaining risks.

This is why background-agent products converge on pull requests and session logs. Codex tasks run in isolated cloud environments and return evidence for review. GitHub Copilot cloud agent works in a GitHub Actions environment, edits a branch, and can open a PR. Claude Code can respond to GitHub issues or PR comments, create pull requests, and move sessions into the background. Different products, same pattern: isolate the work, let it run, bring back a reviewable artifact.

> **⚠️ Warning**
>
> Do not confuse "background" with "trusted." A background agent is less visible than a live agent, not safer. If the task's failure would be expensive, narrow the boundary, add checkpoints, or keep the human in the loop.

## How It Plays Out

A developer assigns a small bug to a background agent from an issue: "When a user has no saved addresses, checkout throws a 500. Reproduce it, fix it, and add a regression test." The agent starts in a cloud runner with the repo loaded, creates a branch, finds the nil-address path, adds the guard and test, and opens a draft PR after the checkout test passes. The developer reads the PR body, sees the failing reproduction and the passing test run, reviews the diff, and merges after one small naming comment. She did not watch the agent work. She reviewed the artifact it returned.

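
The five dispatch parts from the Solution can be written down as a small record before a run like this one is launched. The sketch below is one possible shape, not any product's API: `Dispatch`, its field names, the example paths, the check name, and the `SENSITIVE_HINTS` substring heuristic are all hypothetical and exist only to illustrate the contract.

```python
from __future__ import annotations

from dataclasses import dataclass

# Areas the Boundary step says should probably not run unattended.
# Substring matching is an illustrative heuristic, not a real policy engine.
SENSITIVE_HINTS = ("secret", "billing", "migration", "release")


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical record of the five dispatch parts plus a stop rule."""

    task: str
    allowed_paths: tuple[str, ...]
    environment: str
    required_checks: tuple[str, ...]
    return_artifact: str
    stop_rule: str

    def problems(self) -> list[str]:
        """List the reasons this dispatch is not ready to run unattended."""
        found = []
        if not self.task.strip():
            found.append("task: state one concrete outcome")
        if not self.allowed_paths:
            found.append("boundary: name the allowed paths")
        for path in self.allowed_paths:
            if any(hint in path.lower() for hint in SENSITIVE_HINTS):
                found.append(f"boundary: {path} looks sensitive, keep a human in the loop")
        if not self.environment.strip():
            found.append("environment: name an isolated branch, worktree, or runner")
        if not self.required_checks:
            found.append("verification: list the checks that must pass")
        if not self.return_artifact.strip():
            found.append("return artifact: say what must come back")
        if not self.stop_rule.strip():
            found.append("stop rule: say when to give up and report")
        return found

    def to_prompt(self) -> str:
        """Render the contract as plain text for the agent's prompt."""
        return "\n".join(
            [
                f"Task: {self.task}",
                f"Boundary: only touch {', '.join(self.allowed_paths)}",
                f"Environment: {self.environment}",
                f"Verification: {', '.join(self.required_checks)} must pass",
                f"Return artifact: {self.return_artifact}",
                f"Stop rule: {self.stop_rule}",
            ]
        )


# The checkout bug from the scenario above, with made-up paths and check names.
checkout_fix = Dispatch(
    task="Reproduce the 500 on checkout with no saved addresses, fix it, add a regression test.",
    allowed_paths=("app/checkout/", "tests/checkout/"),
    environment="fresh branch on a cloud runner",
    required_checks=("checkout tests",),
    return_artifact="draft pull request listing files changed, checks run, and remaining risks",
    stop_rule="If the bug cannot be reproduced in 20 minutes, stop and report what was tried.",
)

# A dispatch that leaves most of the contract blank and points at a sensitive path.
vague = Dispatch(
    task="Improve the API layer",
    allowed_paths=("config/secrets/",),
    environment="",
    required_checks=(),
    return_artifact="",
    stop_rule="",
)

if not checkout_fix.problems():
    print(checkout_fix.to_prompt())
for problem in vague.problems():
    print("not ready:", problem)
```

A check like this only catches missing parts and obviously sensitive paths. It cannot tell that "Improve the API layer" is too broad to be one concrete outcome; that judgment stays with the person writing the dispatch.
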

A platform team gives background agents a nightly maintenance lane. Each agent gets one low-risk task: update a flaky test fixture, remove an unused feature flag, or refresh generated API docs. The approval policy allows auto-merge for docs and generated files after CI, but routes code changes to a human reviewer. By morning, six PRs are waiting. Four are already merged, one needs review, and one failed because the agent could not reproduce the issue. That failed report is still useful because it names the command, environment, and missing precondition.

A team tries the pattern on the wrong task: "modernize the billing service." The background agent runs for two hours, touches twenty-three files, rewrites a migration, and returns a huge PR with a green unit-test suite but no integration evidence. Review stalls. The problem was not that background agents are bad. The problem was task shape. The work exceeded the agent's horizon, crossed sensitive boundaries, and returned an artifact too large for confident review. The fix is to decompose the job: one background agent maps the billing call sites, another writes a plan, and each implementation run touches one bounded slice.

> **💡 Tip**
>
> Write the stop rule into the prompt. "If you cannot reproduce the bug in 20 minutes, stop and report what you tried" is better than letting the agent keep searching until it invents progress.

## Consequences

**Benefits.** Background agents turn agentic coding into parallel wall-clock work. They are well suited for backlog items, small bug fixes, test improvements, documentation updates, codebase investigations, and other tasks where the outcome can be judged from an artifact. The human spends attention at the decision and review boundary: what to delegate, what came back, and whether the evidence is enough.

They also improve traceability when the return artifact is designed well.
A branch plus CI run plus session log tells a clearer story than a live chat transcript buried in one developer's tool. The same artifact can feed [Code Review](code-review.md), [Agent Provenance](agent-provenance.md), and later incident analysis.

**Liabilities.** Background agents can flood review queues. They can also make weak prompts look productive, because a branch and confident summary feel like progress even when the task was misunderstood. The more agents run unattended, the more you need limits on task size, file scope, runtime, and merge authority.

Review cannot stop at a green CI badge. For non-trivial changes, trace the critical path, check boundary cases and permissions, and make sure the agent did not weaken tests or workflows to make the run pass.

The pattern depends on review discipline. A background agent whose work is rubber-stamped becomes a path into [Dark Factory](dark-factory.md): code moves without meaningful human inspection. The safe operating rule is simple: background agents may work while you're away, but they don't get to decide that high-risk work is acceptable.

## Sources

- OpenAI introduced [Codex](https://openai.com/index/introducing-codex/) as a cloud-based software engineering agent that can run many tasks in parallel, with each task in an isolated cloud sandbox preloaded with the repository.
- OpenAI's [harness-engineering essay](https://openai.com/index/harness-engineering/) describes regular background Codex tasks that scan for deviations, update quality grades, and open targeted refactoring PRs, a production example of the maintenance-lane form of this pattern.
- GitHub's [Copilot cloud agent documentation](https://docs.github.com/en/copilot/concepts/agents/cloud-agent/about-cloud-agent) describes background repository work in a GitHub Actions-powered environment, including planning, branch changes, iteration, and optional pull request creation.
- GitHub's [agent pull request review guide](https://github.blog/ai-and-ml/generative-ai/agent-pull-requests-are-everywhere-heres-how-to-review-them/) describes the review-bandwidth problem created by agent-generated pull requests and recommends checks for CI weakening, duplicate utilities, boundary behavior, and evidence.
- Anthropic's [Claude Code GitHub Actions documentation](https://code.claude.com/docs/en/github-actions) describes issue and PR comment triggers where Claude can implement features, fix bugs, and create pull requests while following project standards.
- Anthropic's [agent-view documentation](https://code.claude.com/docs/en/agent-view) describes moving a Claude Code session into the background and starting background sessions from the shell, giving the same asynchronous shape in a local or remote session model.
- Hao Li, Haoxiang Zhang, and Ahmed E. Hassan introduced the AIDev dataset in [*AIDev: Studying AI Coding Agents on GitHub*](https://arxiv.org/abs/2602.09185) (arXiv:2602.09185, 2026), cataloging 932,791 agent-authored pull requests across 116,211 repositories.

---

- [Next: Compaction](compaction.md)
- [Previous: Worktree Isolation](worktree-isolation.md)