Harness (Agentic)

Pattern

A named solution to a recurring problem.

Understand This First

Model – the harness wraps a model.
Tool – the harness manages tool access.

At the agentic level, a harness is the software layer that wraps a model and turns it into a usable agent. The model provides intelligence. The harness provides everything else: the loop, the tools, the context engineering, the approval policies, and the interface that puts it all in front of a human. Without a harness, a model is a function that takes text and returns text. With one, it’s an agent that can read files, run commands, and iterate toward outcomes.

Claude Code, Cursor, Windsurf, Aider, and custom applications built with agent SDKs are all harnesses. Each makes different choices about tool exposure, autonomy, and user interface, but they share a purpose: making the model practically useful for real work.

Problem

How do you bridge the gap between a model’s raw capability and the practical requirements of getting work done?

A model alone can’t read your codebase, run your tests, or modify your files. It can’t remember what it did last session or enforce your project’s conventions. It doesn’t know when to ask for permission and when to act. Every one of these capabilities must come from something outside the model.

Forces

Models are stateless. They need external systems to persist state, manage conversations, and carry context across turns.
Tool access cuts both ways. Too few tools and the agent is helpless; too many and it picks the wrong one or causes damage.
Safety boundaries must be enforced externally. Whatever sense the model has of what it should and shouldn’t do, that judgment is not a guarantee, so hard limits have to live outside it.
The interface shapes the experience. A clumsy harness makes agentic coding feel slower than typing the code yourself.

Solution

A harness provides several capabilities:

The agent loop. The harness orchestrates the cycle of prompt, response, tool call, observation, and next step. It manages the back-and-forth between the model and the tools until the task is complete or the agent needs human input.
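The loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular harness implements it: `ToolCall`, `FinalAnswer`, `run_agent_loop`, and `scripted_model` are hypothetical names, and the "model" is a scripted stand-in so the example runs without a network call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class ToolCall:
    name: str
    argument: str


@dataclass
class FinalAnswer:
    text: str


Step = Union[ToolCall, FinalAnswer]


def run_agent_loop(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 10,
) -> str:
    """Alternate between model and tools until the model gives a final answer."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        step = model(transcript)
        if isinstance(step, FinalAnswer):
            return step.text
        tool = tools.get(step.name)
        if tool is None:
            observation = f"error: unknown tool {step.name!r}"
        else:
            observation = tool(step.argument)
        # The observation is fed back so the model can decide the next step.
        transcript.append(f"{step.name}({step.argument}) -> {observation}")
    return "stopped: step budget exhausted, needs human input"


def scripted_model(transcript: List[str]) -> Step:
    # Stand-in for a real model: request one tool call, then answer.
    if len(transcript) == 1:
        return ToolCall("word_count", "the quick brown fox")
    return FinalAnswer(f"done after seeing: {transcript[-1]}")


tools = {"word_count": lambda text: str(len(text.split()))}
result = run_agent_loop(scripted_model, tools, "count the words")
print(result)
```

The step budget is one simple way to guarantee the loop hands control back to a human instead of running indefinitely.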

Tool management. The harness decides which tools the agent can access and how they’re invoked. It might expose file reading, file writing, shell commands, web search, and MCP servers, each with its own permissions and constraints.
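One way to picture tool management is a registry that knows which tools change state and exposes only the ones the current mode permits. The sketch below assumes a single read-only versus read-write switch; `Tool`, `ToolRegistry`, and the two example tools are hypothetical, and real harnesses typically use finer-grained permissions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[str], str]
    mutates_state: bool  # True for tools that write files, run commands, etc.


class ToolRegistry:
    def __init__(self, allow_mutation: bool) -> None:
        self._tools: Dict[str, Tool] = {}
        self._allow_mutation = allow_mutation

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def exposed_names(self) -> List[str]:
        """Names the agent is allowed to see and call in this mode."""
        return sorted(
            name
            for name, tool in self._tools.items()
            if self._allow_mutation or not tool.mutates_state
        )

    def invoke(self, name: str, argument: str) -> str:
        if name not in self.exposed_names():
            raise PermissionError(f"tool {name!r} is not exposed to the agent")
        return self._tools[name].run(argument)


# A read-only "review" mode: the write tool is registered but never exposed.
review_mode = ToolRegistry(allow_mutation=False)
review_mode.register(Tool("read_file", lambda path: f"<contents of {path}>", False))
review_mode.register(Tool("write_file", lambda path: f"wrote {path}", True))
print(review_mode.exposed_names())
```

Filtering at the registry means the model is never offered a tool it cannot use, and a call that slips through anyway is still rejected at invocation time.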

Context assembly. The harness loads instruction files, includes memory entries, manages conversation history, and handles compaction when the context window fills. A good harness does this transparently. You focus on the task; it worries about what the model can see.
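As a rough illustration of context assembly, the sketch below always keeps instructions and memory, then fits as many of the most recent turns as the budget allows. It rests on several simplifying assumptions: `assemble_context` is a hypothetical function, character counts stand in for tokens, and dropping old turns stands in for compaction, which a real harness might do by summarizing instead.

```python
from typing import List


def assemble_context(
    instructions: str,
    memory: List[str],
    history: List[str],
    budget_chars: int,
) -> List[str]:
    """Build the model's view: fixed parts first, then the newest turns that fit."""
    fixed = [instructions, *memory]
    remaining = budget_chars - sum(len(part) for part in fixed)

    kept: List[str] = []
    for turn in reversed(history):  # walk from newest to oldest
        if len(turn) > remaining:
            break
        kept.append(turn)
        remaining -= len(turn)

    parts = list(fixed)
    dropped = len(history) - len(kept)
    if dropped:
        # Marker is not counted against the budget, to keep the sketch short.
        parts.append(f"[{dropped} earlier turn(s) omitted]")
    parts.extend(reversed(kept))  # restore chronological order
    return parts


instructions = "Use pytest."
memory = ["Repo uses src layout."]
history = ["turn one: aaaa", "turn two: bbbb", "turn three: cc"]
# Budget leaves room for the fixed parts plus 30 characters of history.
budget = len(instructions) + len(memory[0]) + 30
context = assemble_context(instructions, memory, history, budget)
print(context)
```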

Approval and safety. The harness enforces approval policies: which actions the agent can take autonomously and which require human confirmation. This is the primary safety mechanism in agentic workflows.
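An approval policy can be sketched as a function that maps a proposed shell command to allow, ask, or deny. The version below is illustrative only: the `AUTO_ALLOWED` set, `decide`, and `gate` are hypothetical, and it checks just the program name, which a real policy would not rely on. Nothing here executes a command.

```python
import shlex
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical allowlist of programs the agent may run without confirmation.
AUTO_ALLOWED = {"ls", "cat", "pytest"}


def decide(command: str) -> Decision:
    """Classify a proposed command; anything unrecognized requires approval."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return Decision.DENY  # malformed quoting: refuse rather than guess
    if not argv:
        return Decision.DENY
    if argv[0] in AUTO_ALLOWED:
        return Decision.ALLOW
    return Decision.ASK


def gate(command: str, confirm: Callable[[str], bool]) -> bool:
    """Return True if the command may run, consulting the human when needed."""
    decision = decide(command)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.ASK:
        return confirm(command)
    return False


print(decide("pytest -q"))      # Decision.ALLOW
print(decide("rm -rf build"))   # Decision.ASK
```

Defaulting unknown commands to `ASK` rather than `ALLOW` means the policy fails toward human review when it meets something it was not written for.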

User interface. Whether it is a terminal, an IDE panel, or a web app, the harness presents the agent’s work in a way that supports human review and direction.

Tip

Choose a harness that matches your workflow. If you work in a terminal, a CLI-based harness keeps you in your environment. If you work in an IDE, an integrated harness reduces context switching. The best harness is the one you actually use consistently.

How It Plays Out

A developer uses a CLI-based harness to work on a Python project. The harness reads the project’s CLAUDE.md file on startup, loading coding conventions and architectural decisions into the context. When the developer asks for a new feature, the harness lets the agent read relevant files, write new code, and run the test suite, pausing for approval before any destructive operation. The developer works at a higher level of abstraction, directing rather than typing.

A platform team builds a custom harness using an agent SDK to automate pull-request reviews. When a PR is opened, the harness spins up an agent that reads the diff, runs the test suite, checks for naming-convention violations, and posts a review with inline comments. The model does the reasoning; the harness wires it into GitHub webhooks, the CI runner, and the team’s style-guide document. Nobody on the team could have built the reasoning. Nobody at the model provider could have built the integration. The harness is the seam where both halves meet.

Example Prompt

“I’m starting a new Python project. Set up your harness to load the project’s CLAUDE.md, use pytest for testing, and pause for approval before any destructive shell command.”

Consequences

A good harness makes agentic coding feel natural and productive. It handles the mechanics of tool invocation, context management, and approval flow so that the human can focus on direction and review.

The cost is dependency. Different harnesses make different tradeoffs about autonomy, tool exposure, and context management, and switching means adjusting your workflow. The harness itself is software with bugs, limitations, and opinions that shape your work. Understanding what your harness does behind the scenes, especially around context assembly and approval policies, helps you work with it rather than against it.

Sources

Birgitta Böckeler coined the term “harness engineering” in her work with Martin Fowler at Thoughtworks (2024-2025), framing the harness as a distinct engineering discipline rather than a configuration detail. Their Exploring Generative AI series treats the harness as the primary locus of engineering judgment in agentic systems.
The agent loop that the harness orchestrates traces back to Stuart Russell and Peter Norvig’s perceive-reason-act cycle in Artificial Intelligence: A Modern Approach (1995). Their formulation of an agent as anything that perceives its environment and acts upon it through actuators maps directly to the harness’s role: it provides the sensors (tools that read state) and actuators (tools that change state) that the model reasons over.
