Agent-Computer Interface (ACI)

Concept

A foundational idea to recognize and understand.

The Agent-Computer Interface is the set of tools, affordances, and interaction formats through which a language-model agent acts on a computer, deliberately designed for the agent’s cognition rather than a human’s.

Also known as: Agent Interface, Tool Surface (loosely)

Understand This First

  • Tool – the unit that an ACI is composed of and shapes.
  • Affordance – the human-side analog this concept mirrors.
  • Model – the entity whose perception and reasoning the ACI is tuned for.

What It Is

A computer interface is a negotiated surface between two parties. For sixty years the parties have been a human and a machine, and the discipline that studies the negotiation is Human-Computer Interaction: pointing devices, undo stacks, visual scanning, kinesthetic memory, keyboard shortcuts. When the party on the human side gets replaced by a language model, almost every assumption under HCI breaks. The model can’t scan a screen. It has no visual working memory. It reads text one token at a time. Its attention thins as context grows. It can’t hover, right-click, or notice a flashing cursor.

The Agent-Computer Interface is what you design for that user. Same computer, different cognition. The Princeton SWE-agent paper (Yang et al., NeurIPS 2024) named the idea and showed, on the SWE-bench benchmark, that rethinking the command surface around a language model’s perceptual limits could lift a coding agent from near-zero to state-of-the-art performance using the same underlying model. Where HCI asks “what will a human notice, understand, and remember?”, ACI asks “what will a language model see in its context window, parse into an action, and recover from when it fails?”

Concretely, an ACI is the union of the names you give tools, the descriptions you write for them, the shape of their inputs, the shape of their outputs, the errors they surface, and the way they compose. Every one of those is a design choice, and most of them were never ACI-conscious when the tool was built.
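That union can be made concrete with a sketch. The following is a hypothetical tool specification (the names, description text, and schema are illustrative, not from any particular product); each field corresponds to one of the design choices above:

```python
# Hypothetical tool specification: every field below is an ACI design choice.
read_file_tool = {
    "name": "read_file",  # naming: verb_noun the model can parse, not fs_op
    "description": (
        "Read a slice of a text file with line-numbered output. "
        "Use this before editing a file; use a search tool first "
        "if you do not yet know which file you need."
    ),  # description: says *when* to reach for it, not just *what* it does
    "input_schema": {  # shape of inputs: typed, bounded, minimal
        "type": "object",
        "properties": {
            "path":  {"type": "string"},
            "start": {"type": "integer", "minimum": 1},
            "end":   {"type": "integer", "minimum": 1},
        },
        "required": ["path"],
    },
}
```

Output shape and error behavior live in the tool's implementation rather than its schema, but they are equally part of the same surface.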

Why It Matters

Three forces put ACI at the center of how well a coding agent performs:

  1. The model is roughly fixed; the interface isn’t. You can’t retrain a foundation model for your environment, but you can redesign every tool the agent sees this afternoon. The room to improve an agent’s behavior without touching the model lives here.
  2. Tools designed for humans underperform for agents. A find that returns opaque paths works for a human who’ll scan and re-run it. It wastes tokens for an agent that has to guess. A REST endpoint returning forty fields helps a UI developer pick what to render. It dilutes the agent’s attention across thirty-five fields it didn’t need.
  3. The empirical evidence is dramatic. When the SWE-agent authors replaced raw bash with a small ACI (line-numbered file viewer, a bounded edit command, a built-in find_file, in-line linter feedback on every write), the same model class jumped from single digits to 12.5% pass rate on SWE-bench. That’s not a tweak. It’s a different product.
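The second force, response dilution, is easy to sketch. A minimal illustration (field names are hypothetical): project a forty-field payload down to the handful of fields the agent will actually use for its next decision.

```python
# Hypothetical response shaping: forward only the fields the agent needs
# for its next decision, instead of the raw forty-field API payload.
AGENT_FIELDS = ("id", "title", "state", "author", "updated_at")

def shape_for_agent(raw: dict) -> dict:
    """Project a wide REST payload down to an agent-sized one."""
    return {k: raw[k] for k in AGENT_FIELDS if k in raw}

# Thirty-five fields the agent doesn't need, plus the five it does.
raw = {f"field_{i}": i for i in range(35)} | {
    "id": "PR-42", "title": "Fix race condition", "state": "open",
    "author": "kim", "updated_at": "2025-01-02",
}
shaped = shape_for_agent(raw)   # five fields, not forty
```

Every field you drop is attention the model keeps for the decision itself.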

How to Recognize It

You’re looking at an ACI whenever you design, evaluate, or criticize:

  • Tool names. read_file versus fs_op. search_symbols versus grep_wrapper. The name is half the description the model sees.
  • Tool descriptions. A one-line description the author pasted from a docstring versus a three-paragraph description that tells the model when to reach for this tool versus a nearby one.
  • Input schemas. Free-form string arguments versus typed, structured inputs that make bad calls unrepresentable.
  • Output shape. Returning everything the underlying API returned versus returning the five fields the agent will actually use for its next decision.
  • Error behavior. A cryptic stack trace versus an error message that names what went wrong and what to try next.
  • Surface size. Sixteen narrow tools competing for the agent’s attention versus a consolidated handful with clear division of labor. (The antipattern when this goes wrong is tool sprawl: a catalog that has grown past the model’s ability to select cleanly among its members.)
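The input-schema point bears a sketch. Below, a hypothetical contrast between a free-form argument and a typed input that constrains bad calls at the boundary (the names and bounds are illustrative; `Literal` is enforced by a type checker, while the limit is checked at runtime):

```python
from dataclasses import dataclass
from typing import Literal

# Free-form: the model can pass anything; errors surface late and cryptically.
def search_loose(query: str) -> list:
    ...

# Typed: the result type is an enumerated choice, the scope is explicit,
# and the output size is bounded so the model never gets a wall of text.
@dataclass
class SearchInput:
    query: str
    type: Literal["content", "symbols", "files"] = "content"
    scope: str = "."      # directory to search under
    limit: int = 20       # bound how much output the model must read

    def __post_init__(self) -> None:
        if not 1 <= self.limit <= 100:
            raise ValueError("limit must be between 1 and 100")
```

The typed version turns a whole class of bad calls into immediate, legible failures instead of confusing downstream output.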

If a tool was ported into an agent’s catalog without anyone asking “how will a model read this?”, it isn’t ACI-conscious yet. Most aren’t.

How It Plays Out

A team wraps a codebase search capability for a coding agent. The naive version is one tool, grep(pattern), returning matching lines as raw text. The agent gets a wall of paths and line snippets, has to re-prompt itself to narrow, and searches the same thing twice.

A better version splits the capability into three tools with structured output: find_files(glob), search_content(regex, path), read_file(path, start, end). The agent can now ask for files, then narrow, then read. Precision rises; token usage drops.

An ACI-conscious version consolidates them back into one tool, search(query, type, scope, cursor), with a typed schema, paginated results, stable identifiers the agent can pass to a follow-up read, and a response shape that omits fields the agent doesn’t need for its next step. The tool’s description includes two concrete examples of when to use type=symbols versus type=content. The error messages teach: an invalid glob returns "your glob matched zero files under 'src/tests'; did you mean 'src/test'?" instead of "no matches".
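The teaching error is straightforward to implement. A minimal sketch (the helper and its signature are hypothetical, not from SWE-agent or any shipping tool):

```python
import difflib
from glob import glob

def search_files(pattern: str, known_dirs: list[str]):
    """Return matching paths, or an error message that teaches.

    On zero matches, name what went wrong and suggest a likely fix,
    rather than returning a bare "no matches".
    """
    matches = glob(pattern, recursive=True)
    if matches:
        return matches
    searched = "/".join(pattern.split("/")[:-1]) or "."
    close = difflib.get_close_matches(searched, known_dirs, n=1)
    hint = f"; did you mean '{close[0]}'?" if close else ""
    return f"your glob matched zero files under '{searched}'{hint}"
```

The few lines of `difflib` cost nothing, and they convert a dead end into the agent's next action.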

Now compare to Claude Code’s own tool surface. Read takes an optional offset and limit, returns line-numbered output, and enforces “you must Read a file before Edit.” Edit replaces a literal string and fails loudly if the string isn’t unique, forcing the agent to quote enough surrounding context to disambiguate. Bash has a timeout, a background-run option, and a sandbox policy. None of those choices are accidental. Each is an ACI decision about how a model should interact with a filesystem and a shell, made on the model’s behalf.
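The uniqueness rule on Edit is worth seeing as code. A toy reimplementation of the contract (a sketch of the behavior described above, not Claude Code's actual implementation):

```python
# Toy sketch of the "fail loudly if the string isn't unique" edit rule.
def edit(text: str, old: str, new: str) -> str:
    count = text.count(old)
    if count == 0:
        raise ValueError("old_string not found in file")
    if count > 1:
        raise ValueError(
            f"old_string occurs {count} times; include more "
            "surrounding context to make it unique"
        )
    return text.replace(old, new, 1)
```

Both failure modes are teaching errors: one tells the agent it misread the file, the other tells it exactly how to repair the call.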

Consequences

Benefits. A well-designed ACI is the biggest change you can make to a coding agent’s behavior without touching the model. Pass rates climb. Token cost drops. Sessions finish faster. The agent’s mistakes become more predictable and therefore more fixable; instead of wild flailing, you see it reach for the wrong tool in a specific way, which tells you which tool to redesign. Teams that treat the ACI as a first-class artifact report that the same model, wrapped in a better interface, starts behaving like a more senior engineer.

Liabilities. ACI design is engineering work. Good tool descriptions take time to write and revise. Response-shape choices need telemetry to validate. Consolidated tools look elegant and fail in new ways when the consolidation hid a distinction the model actually needed. And ACI choices drift: yesterday’s ideal tool becomes today’s legacy surface when the codebase or the model changes. The discipline pays off, but it pays off on every future session rather than as a one-time refactor.

There’s also a portability ceiling. An ACI tuned for your repository and your model won’t be the best ACI for someone else’s. Communities will publish sensible defaults; the local wins go to teams that tune from there.

Relationships

  • Composed of: Tool – ACI design is what you do when you design tools for an agent rather than a human.
  • Protocol for: MCP – MCP standardizes how tools are exposed, but the design of each tool’s name, description, schema, and response shape is still an ACI decision.
  • Specializes: Harness Engineering – ACI design is one of the harness engineer’s central jobs.
  • Sibling of: Affordance – HCI affordances for humans; ACI affordances for language-model agents.
  • Expressed through: Feedforward – tool descriptions, names, and schemas are feedforward controls the agent reads before acting.
  • Flows into: Context Engineering – tool outputs are context; shaping outputs shapes what the model sees next turn.
  • Enabled by: Harness (Agentic) – the harness is where the ACI is implemented.
  • Improves: Harnessability – an environment with a well-designed ACI is more harnessable.

Sources

Yang, Jimenez, Wettig, Lieret, Yao, Narasimhan, and Press’s SWE-agent paper (SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, NeurIPS 2024) introduced the term and the empirical result that makes it hard to ignore: the same model class, rewrapped in an agent-tuned tool surface, moves from near-zero to state-of-the-art on real software engineering tasks. The phrase “Agent-Computer Interface” comes from this paper.

Donald Norman’s The Design of Everyday Things (1988) and the broader HCI tradition supply the intellectual scaffolding that ACI inherits. Norman’s framing of affordances, signifiers, mapping, and feedback is the lens the SWE-agent authors explicitly borrowed; ACI is HCI with the user replaced.

Stuart Russell and Peter Norvig’s perceive-reason-act framing in Artificial Intelligence: A Modern Approach (1995) names the architecture an agent needs: sensors, actuators, and a reasoning layer between them. The ACI is the designed shape of those sensors and actuators for a reasoner whose perceptual channel is bounded text.

The modern practitioner vocabulary around tool descriptions, response shaping, and semantic identifiers emerged from the agentic coding community through 2024 and 2025, with frontier labs publishing operating-guide material that converged on a shared set of rules: write rich descriptions, prefer semantic identifiers over opaque IDs, consolidate broad-use tools, namespace as catalogs grow, and shape responses to the next decision the agent has to make.

Further Reading