Agentic Engineering
“‘Agentic’ because the new default is that you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight. ‘Engineering’ to emphasize that there is an art and science and expertise to it.” — Andrej Karpathy
The professional discipline of orchestrating coding agents to produce production software: the human writes the spec, supervises the work, and reviews the output, while the agents write almost all of the code.
Understand This First
- Vibe Coding — the predecessor it supersedes; agentic engineering is what you do when you take the same workflow seriously.
- Agent — the unit of work being orchestrated.
- Compound Engineering — the discipline that lets the practice get cheaper over time.
- Harness Engineering — the infrastructure layer that makes orchestration reliable.
Context
In February 2026, Andrej Karpathy posted that he was retiring “vibe coding” as the default name for what he was actually doing day to day. The replacement was agentic engineering: the same model-driven workflow, but no longer pretending the output was a weekend toy. Within ten weeks the term had been picked up by Anthropic’s Trends Report, training programs, vendor docs, and a steady stream of practitioner writeups. Glide’s writeup pinned down the definition: humans now write under one percent of code directly, instead orchestrating multiple specialized AI agents that plan, implement, and test in parallel under supervision.
The shift matters because it names the sober middle ground that practitioners had been working in without a label. On one side sits Vibe Coding, the let-it-rip workflow that Karpathy himself originated and then disowned for production use. On the other sits the older default of writing every line by hand. Agentic engineering is the position most working developers actually occupy in 2026: the agents do the typing, but a human is responsible for the result, reads the diffs, and engineers the conditions under which the agents can be trusted with more.
Problem
Once a coding agent is genuinely capable, the developer’s job changes shape. You’re no longer the primary author. You’re the supervisor of an unevenly skilled team that works at machine speed, never gets tired, and occasionally produces something confidently wrong. The skills that mattered when you wrote every line (fast typing, deep familiarity with the standard library, holding the whole module in your head) recede. New skills come forward: writing a spec the agent can execute against, decomposing work into chunks an agent can finish, reading diffs faster than you used to write them, knowing which kinds of mistakes to look for in which kinds of output.
There had been no agreed name for this role. “Software engineer using AI assistance” understates how much has changed. “Vibe coder” overstates the abdication of responsibility, and after the security incident reports of late 2025 and early 2026, the term started carrying enough reputational damage that serious practitioners stopped applying it to themselves. Without a name, the practice was being learned in isolation, recipe by recipe, with no shared vocabulary for what made a good supervisor different from a bad one.
Forces
- Capability has moved past the tool boundary. Agents that genuinely write production code change what “doing the work” means. Treating them as fancy autocomplete misses the actual lever; treating them as autonomous coworkers misses what they still get wrong.
- The reputational cost of “vibe coding” rose fast. The original term implied accepting output without reading it. Once production incidents started getting attributed to that workflow, the label became unsafe to wear in professional contexts, which left a vocabulary hole.
- Oversight is expensive but skipping it is more expensive. Reading every diff slows the human down; not reading them ships defects at machine speed. The practice has to find a stable point where supervision is meaningful but not the bottleneck.
- The 99/1 ratio rewards different skills than the 0/100 ratio did. Spec-writing, decomposition, agent supervision, and reviewing-at-speed are the new core skills. Knowing every API call by heart matters less.
- The practice is repo-local in the same way harness work is. What makes agentic engineering effective in this codebase is partly the conventions, the tests, and the harness, none of which transfer cleanly to the next project.
- There is genuine disagreement about how much oversight is enough. Anthropic’s own 2026 Trends Report finds developers using AI in 60% of work but fully delegating only 0–20% of tasks. The 80–100% supervision band is currently load-bearing; predictions that it will compress vary widely.
Solution
Treat orchestrating coding agents as a real engineering discipline, with named practices, accumulating expertise, and explicit standards for supervision. The change isn’t that you stopped doing software engineering. It’s that the surface area you do it on moved. You spend more time writing the brief and the spec, more time on plan and review, and less time typing the implementation.
Four practices distinguish the discipline as it has stabilized in 2026:
- Structured oversight. A human stays accountable for the output. The level of automation rises with experience; the accountability does not. Practical mechanisms include code review on every meaningful change, bounded autonomy that constrains what agents can do without asking, and an approval policy for irreversible operations.
- Goal-driven decomposition. The supervisor breaks work into pieces an agent (or subagent) can finish in a bounded session, then specifies done-when conditions for each piece. Plan Mode, specs, and explicit task lists are the durable artifacts the orchestration runs on.
- Iterative verification. The agents run inside a verification loop: change, test, inspect, iterate. The supervisor’s job is to make sure the loop closes. That means tests are real, failures are surfaced rather than papered over, and the agent isn’t fooling itself with happy-path-only checks.
- Governance and traceability. What the agents do is recorded. Agent traces, progress logs, and decision records make the work auditable after the fact. When something goes wrong, you can read what actually happened, not just what the agent reported.
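The oversight and governance practices above can be made concrete with a small sketch. The following is hypothetical — the `Verdict`/`Action` names and the category sets are illustrative, not from any real harness — but it shows how bounded autonomy and an approval policy meet at the action level: reversible work proceeds autonomously, irreversible work escalates to the supervisor, and forbidden work is refused outright.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Verdict(Enum):
    ALLOW = auto()   # agent may proceed without asking
    ASK = auto()     # requires explicit human approval
    DENY = auto()    # never permitted in this harness

@dataclass
class Action:
    kind: str        # e.g. "edit", "run_tests", "deploy"
    target: str

# Hypothetical policy: which action kinds are irreversible or off-limits
# is a per-repo decision the supervisor makes, not a universal list.
IRREVERSIBLE = {"deploy", "delete_branch", "force_push"}
FORBIDDEN = {"rotate_secrets"}

def evaluate(action: Action) -> Verdict:
    """Classify a proposed agent action under the bounded-autonomy policy."""
    if action.kind in FORBIDDEN:
        return Verdict.DENY
    if action.kind in IRREVERSIBLE:
        return Verdict.ASK
    return Verdict.ALLOW

assert evaluate(Action("edit", "billing.py")) is Verdict.ALLOW
assert evaluate(Action("deploy", "prod")) is Verdict.ASK
assert evaluate(Action("rotate_secrets", "ci")) is Verdict.DENY
```

In a real harness the `ASK` branch would block on human approval, and every verdict would be written to the agent trace so the decision is auditable afterward.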
The practice rides on two adjacent disciplines that this article does not subsume. Harness Engineering is the infrastructure layer underneath: the configuration of tools, subagents, hooks, and policies that turns a general model into a reliable worker on this codebase. Compound Engineering is the time-axis discipline: it captures every shipped lesson onto a durable surface so the work gets cheaper as it runs. Agentic engineering is the umbrella discipline the working developer is doing; the other two are the supporting structures that make it scale.
When you find yourself reaching for “vibe coding” to describe your own day-to-day work, stop and ask whether you mean it. If you read the diffs, run the tests, write the spec, and own the result, you’re not vibe coding; you’re doing agentic engineering. The names matter because they describe different relationships with the output. Pick the one that’s true, and use it.
Distinguishing from neighbors
A handful of related terms are close enough that readers reasonably ask how they differ.
Vibe Coding is the anti-pattern version of the same workflow. Same agents, same prompt-driven loop, but the developer accepts output without reading it. Karpathy coined “vibe coding” for throwaway projects and then introduced “agentic engineering” specifically to mark the boundary between that workflow and serious production use. The distinction is not about tooling; it’s about whether anyone reads what the agent wrote.
Compound Engineering is one specific discipline within agentic engineering — the one that makes the practice compound across sessions by codifying lessons onto durable surfaces. A team can do agentic engineering without compound engineering and find that month seven feels exactly like month one. Agentic engineering describes the day-to-day workflow; compound engineering is the time-axis investment that determines whether it gets cheaper or stays flat.
Harness Engineering is the infrastructure underneath. Where agentic engineering is what the working developer does, harness engineering is what the platform person does to make agentic engineering reliable on a particular codebase. The two roles can be held by the same human or different ones; on small teams they are always the same person.
How It Plays Out
A senior engineer at a mid-size company has stopped writing implementation code as their first move. The morning starts with reading agent traces from the overnight run, accepting two PRs the critic agent already vouched for, and rejecting one where the test coverage looked plausible but the test was checking the wrong invariant. By 10am they’re writing a spec for the day’s larger piece of work, a refactor of a billing module, and decomposing it into five tasks small enough that each can be handed to a subagent with a clear done-when. The actual coding starts at 11. By 5pm three of the five tasks are merged, one is in review, and one bounced back to the spec because the agent surfaced a question the engineer hadn’t thought to answer. None of the day’s typing was implementation code, and the team shipped more in a day than they used to in three. That’s the practice.
A two-person startup runs a single Codex-based harness with a planner-writer-critic topology. The founder writes the briefs in the morning, kicks off the harness, and works on customer calls while it runs. Every hour or so a notification surfaces a PR for review. The founder reads each diff against the original brief (not against the implementation choices, just against the intent) and approves or sends back with a one-paragraph correction. Three times a week she pulls up the progress logs and looks for patterns: classes of mistakes the critic isn’t catching, conventions the writer keeps forgetting. Those patterns turn into instruction-file updates, new subagent specializations, or hook additions. She is doing agentic engineering at the working level and harness engineering on the maintenance cadence. Together they let two people ship what used to take a team of eight.
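The planner-writer-critic topology in that scenario can be sketched as a pipeline. Every name below is hypothetical, and each stage is a stub standing in for a model call in a real harness; the point is only the shape of the composition the founder is supervising.

```python
def planner(brief: str) -> list[str]:
    # Split the brief into sub-tasks (stubbed: one task per sentence).
    return [s.strip() for s in brief.split(".") if s.strip()]

def writer(task: str) -> str:
    # Produce a candidate change for one sub-task (stubbed).
    return f"diff for: {task}"

def critic(diff: str) -> bool:
    # Vouch for the change or bounce it back (stubbed check).
    return diff.startswith("diff for:")

def run_harness(brief: str) -> tuple[list[str], list[str]]:
    """Route each planned sub-task through writer, then critic."""
    approved, bounced = [], []
    for task in planner(brief):
        diff = writer(task)
        (approved if critic(diff) else bounced).append(diff)
    return approved, bounced

approved, bounced = run_harness("Add retry to billing client. Update its tests")
# approved holds two candidate diffs; bounced is empty
```

The human review described in the scenario sits downstream of `approved`: the critic vouching for a PR narrows what the supervisor reads, it does not replace the reading.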
A junior engineer in their first year on the job is learning agentic engineering as their default mode. They have never spent a long stretch writing implementation code without an agent. Their early growth pains are different from the previous generation’s: they can spec a task, but their specs are too vague; they can read diffs, but they read them too fast; they trust the agent’s tests until the day a passing suite ships a regression. Their senior pairs them with a mentor focused specifically on supervision skills: how to read a diff at the speed an agent produces them, how to design a spec that fails closed when the agent misunderstands, and when to break a piece of work into smaller pieces. The mentor’s job is teaching the discipline of agentic engineering, not the syntax of the language. Six months in, the junior is supervising work at the rate the seniors do, and starting to develop a feel for which kinds of mistakes show up in which kinds of code.
“You are working as part of an agentic-engineering workflow. I am the supervisor; you are the implementer. Before writing any code, restate the spec back to me in your own words, list any ambiguities you can see, and propose the decomposition into sub-tasks you intend to use. Wait for my approval before starting implementation.”
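A spec written in that style decomposes into sub-tasks with machine-checkable done-when conditions, which is also what closes the verification loop. A minimal sketch, with hypothetical names throughout and a stubbed implementer standing in for the agent:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    brief: str
    done_when: Callable[[], bool]   # machine-checkable completion condition
    done: bool = False

def run_verification_loop(tasks: list[Task],
                          implement: Callable[[Task], None],
                          max_rounds: int = 3) -> list[Task]:
    """Change, test, inspect, iterate: retry each task until its
    done-when condition holds or the round budget runs out."""
    for task in tasks:
        for _ in range(max_rounds):
            implement(task)          # the agent's attempt (stubbed by the caller)
            if task.done_when():
                task.done = True
                break
    # Anything still not done bounces back to the supervisor for a better spec.
    return [t for t in tasks if not t.done]

# Usage with a stubbed "agent": the done-when flips once the work is done.
state = {"renamed": False}
task = Task("rename-field",
            "Rename amount_cents to amount_minor_units everywhere",
            done_when=lambda: state["renamed"])
leftover = run_verification_loop([task],
                                 implement=lambda t: state.update(renamed=True))
assert leftover == [] and task.done
```

The supervisor’s leverage lives in `done_when`: if the condition is a real test rather than a plausibility check, the loop cannot close on a happy-path illusion.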
Consequences
The wins map to the discipline’s claims. Throughput goes up substantially because the typing stops being the bottleneck. Senior engineers spend more of their day on the parts of the work that benefit most from senior judgment (specs, decomposition, review, harness investment) and less on parts that don’t. Smaller teams ship more software, because the cost of executing on a clear specification has fallen sharply. The discipline also produces a clearer separation between “what we want” and “how we got it,” because both the spec and the agent trace are first-class artifacts rather than tacit knowledge.
The costs are honest, and several of them are still being learned. Skill atrophy is real: practitioners who spent years building muscle for fast implementation work report that those skills decay when they aren’t used daily, which becomes a problem the day the agent gets stuck on something only the human can finish. Supervision skills are not the same as implementation skills, and senior engineers who don’t actively develop the new skills can become the bottleneck rather than the throughput multiplier. Specs that worked fine when humans read them turn out to be too vague for agents, which forces a discipline of writing harder specs that some teams find unfamiliar. Code-review load grows because more code is being produced; teams that don’t invest in faster review pipelines drown in PRs.
The deepest cost is the comprehension question. When the agents write almost all of the code, the working developer’s understanding of the codebase shifts from line-level to architectural. That’s fine for some kinds of changes and dangerous for others. Teams that adopt agentic engineering without a deliberate practice for keeping at least one human deeply familiar with each subsystem accumulate the comprehension debt that the Vibe Coding article warns about, just at a slower rate. The practice is not a substitute for understanding the system; it’s a discipline that makes understanding the system feasible at higher throughput, if the team invests in keeping that understanding current.
The largest open question is how much of the supervision load will compress as agents get more reliable. If it compresses a lot, agentic engineering shades toward something closer to product management. If it compresses little, the supervisor role stays central for the foreseeable future. Both scenarios reward investing in the named practices now: the supervision skills, the spec discipline, the harness work, the compound-engineering loops. Whichever way the curve bends, those investments hold their value.
Related Patterns
| Relationship | Pattern | Note |
|---|---|---|
| Contrasts with | Vibe Coding | Vibe coding is the unsupervised version of the same workflow; agentic engineering is what you call it once you take responsibility for the output. |
| Depends on | Harness Engineering | Agentic engineering at production scale is only as reliable as the harness around the agents; harness engineering is the infrastructure layer the practice rests on. |
| Detects | Smell (AI Smell) | Recognizing AI smells in agent output is one of the supervisor's core skills. |
| Refined by | Compound Engineering | Compound engineering is the specific discipline that makes agentic engineering compound across sessions; without it, the practice plateaus. |
| Uses | Agent Trace | Traces are how a supervisor reads what the agents actually did, not just what they reported. |
| Uses | Approval Policy | The mechanism that operationalizes bounded autonomy at the action level. |
| Uses | Bounded Autonomy | The governance frame that decides what each agent is allowed to do without asking. |
| Uses | Brief | The supervisor writes the brief; the agents implement against it. |
| Uses | Code Review | Reviewing agent output is non-negotiable for the practice; review is where the supervisor's judgment lands. |
| Uses | Human in the Loop | The supervision posture; agentic engineering is the practice of staying in the loop deliberately, not by default. |
| Uses | Plan Mode | Plan-then-execute is the dominant inner-loop discipline of agentic engineering. |
| Uses | Subagent | The dominant scale-out mechanism: split the work across specialized agents and supervise the composition. |
| Uses | Verification Loop | The change-test-inspect-iterate cycle the agent runs inside; agentic engineering is the practice of making sure that loop closes. |
Sources
- Andrej Karpathy introduced the term in a public statement in February 2026, framing the change as both descriptive (“the new default is that you are not writing the code directly 99% of the time”) and prescriptive (“‘Engineering’ to emphasize that there is an art and science and expertise to it”). The naming choice was deliberate: Karpathy had coined “vibe coding” the previous year and was retiring it for serious work after watching the term get associated with shipped defects.
- Anthropic, 2026 Agentic Coding Trends Report. The report uses agentic engineering as the framing for the practice professional engineers have settled into, and provides the empirical anchors used in this article: AI used in roughly 60% of developer work, full delegation in only 0–20% of tasks, the 80–100% supervision band as the current operating range.
- The 99/1 framing and the four named practices (structured oversight, goal-driven decomposition, iterative verification, governance and traceability) crystallized in practitioner writeups during the first quarter of 2026, with multiple independent treatments converging on roughly the same set. The decomposition into four practices is a synthesis, not a single author’s contribution.
- Frederick Brooks’s The Mythical Man-Month (1975) supplies the older intellectual ancestor: the observation that the hardest part of large-scale software work is conceptual integrity, not raw production volume. Agentic engineering is an instance of that insight. When production volume is no longer the constraint, what becomes central is the conceptual work the supervisor does: writing the spec, decomposing the work, and reviewing the result.
- Donald Schön’s The Reflective Practitioner (1983) frames the supervisor’s role as reflection-in-action: a professional working with a partly-autonomous medium, reading what the medium produces, and adjusting the work in flight. The framing applies cleanly to the agentic engineering supervisor, who reads agent output, recognizes patterns of mistake, and adjusts the brief, the spec, or the harness accordingly.
Further Reading
- Anthropic, 2026 Agentic Coding Trends Report — the most rigorous current snapshot of how the practice is being adopted across the developer population, with adoption ratios broken down by task type and seniority.
- Glide, “What is agentic engineering? How AI engineering has evolved past vibe coding in 2026” — a clean working definition with the four named practices laid out, useful for orienting newcomers to the term.