Tool
Context
At the agentic level, a tool is a callable capability exposed to an agent. Tools are what transform a language model from a text generator into something that can interact with the real world: reading files, writing code, running commands, searching the web, querying databases, or calling APIs.
Without tools, an agent is a chatbot: it can discuss code but not touch it. With tools, it becomes a collaborator that can inspect, modify, test, and iterate. The set of tools available to an agent defines the boundary of what it can do.
Problem
How do you give a model the ability to take actions in the real world while keeping those actions safe, predictable, and useful?
A model generates text. But fixing a bug requires reading a file, understanding the error, editing the code, and running a test. Each of those steps requires a capability the model doesn’t inherently have. Tools provide those capabilities, but each tool also introduces a surface for mistakes, misuse, or unintended consequences.
Forces
- Capability: more tools make the agent more capable, but also increase the chance of unintended actions.
- Complexity: each tool adds to the model’s decision space, potentially confusing it about which tool to use when.
- Safety: some tools (file deletion, shell commands, network requests) can cause real damage if misused.
- Discoverability: the agent must know what tools are available and what they do, all within its finite context window.
Solution
Design tools as focused, well-described capabilities that do one thing clearly. A good tool has:
- A clear name that communicates its purpose. read_file is better than fs_op; run_tests is better than execute.
- A precise description that tells the model when and how to use it. The model selects tools based on their descriptions, so clarity here directly affects quality of use.
- Bounded scope. A tool that reads a file is safer and more predictable than a tool that executes arbitrary shell commands. When you must expose powerful tools, pair them with approval policies that require human confirmation for dangerous operations.
- Structured input and output. Tools that accept and return structured data (JSON, typed parameters) are easier for models to use correctly than tools that require free-form text parsing.
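These properties can be made concrete in a tool definition. The sketch below uses the JSON-schema style common to function-calling APIs; the tool name, description text, and parameter fields are illustrative, not tied to any particular harness or vendor.

```python
# A minimal sketch of a focused tool definition, in the JSON-schema style
# used by most function-calling APIs. All names here are illustrative.
READ_FILE_TOOL = {
    "name": "read_file",  # clear name: says exactly what it does
    "description": (
        "Read a UTF-8 text file and return its contents. "
        "Use this before editing a file so that changes are grounded "
        "in the current state of the code."
    ),  # precise description: tells the model when and how to use it
    "parameters": {  # structured input, not free-form text
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "File path relative to the project root",
            },
        },
        "required": ["path"],
    },
}
```

Note how each of the four properties above maps to a field: the name, the description, the narrow read-only scope, and the typed parameter schema.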
The harness manages the inventory of available tools and mediates between the model’s tool-call requests and the actual execution. Some tools are built into the harness (file read/write, shell access). Others are provided by external MCP servers that extend the agent’s capabilities dynamically.
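A harness's mediation role can be sketched as a dispatch function that sits between the model's tool-call request and actual execution. This is a simplified illustration, not any specific harness's implementation; the tool names and the approval callback are assumptions for the example.

```python
import subprocess
from typing import Callable

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def run_shell(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout

# The harness's tool inventory: name -> callable.
TOOLS: dict[str, Callable[..., str]] = {"read_file": read_file, "run_shell": run_shell}

# Tools that can cause real damage are gated behind human approval.
DANGEROUS = {"run_shell"}

def execute_tool_call(name: str, args: dict,
                      approve: Callable[[str, dict], bool]) -> str:
    """Mediate one tool call: validate the name, apply the approval
    policy for dangerous tools, then execute and return the result."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    if name in DANGEROUS and not approve(name, args):
        return "denied: human approval required"
    try:
        return TOOLS[name](**args)
    except Exception as exc:
        # Errors go back to the model as text, not as crashes.
        return f"error: {exc}"
```

Because every call flows through one choke point, the harness can log each invocation, enforce approval policies uniformly, and surface results to both the model and the human reviewer.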
When an agent has access to too many tools, it can waste time deliberating about which one to use, or it can choose poorly. If you notice an agent picking the wrong tool for a task, consider whether the tool set is too broad. A focused set of well-described tools outperforms a sprawling catalog of vaguely described ones.
How It Plays Out
An agent is asked to fix a failing test. It uses a read_file tool to examine the test and the code under test, identifies the mismatch, uses a write_file tool to apply the fix, and uses a run_tests tool to verify the fix works. Each tool invocation is a discrete, reviewable step. The human can see exactly what the agent read, what it changed, and what it tested.
A team exposes a custom tool that queries their internal documentation wiki. When the agent encounters an unfamiliar internal API, it searches the wiki rather than guessing (and hallucinating). The tool is simple (it takes a search query and returns matching pages) but it eliminates an entire category of AI smells by grounding the agent in real documentation.
“Add a tool to the MCP server that queries our Postgres database for order history. It should accept a customer_id and date range, return JSON, and never allow write operations. Write tests that verify it rejects SQL injection attempts.”
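A tool like the one requested above can be kept read-only by construction: a fixed SELECT statement with bound parameters, so caller input is never interpreted as SQL. The sketch below substitutes SQLite for Postgres to stay self-contained; the orders table, column names, and function name are all illustrative assumptions.

```python
import json
import sqlite3  # stand-in for Postgres, to keep the sketch self-contained

def order_history(conn: sqlite3.Connection, customer_id: int,
                  start: str, end: str) -> str:
    """Return a customer's order history as JSON.

    The SQL is a fixed SELECT with bound parameters (?), so caller
    input is treated as data, never as SQL: no injection, no writes.
    """
    rows = conn.execute(
        "SELECT id, placed_at, total FROM orders "
        "WHERE customer_id = ? AND placed_at BETWEEN ? AND ? "
        "ORDER BY placed_at",
        (customer_id, start, end),
    ).fetchall()
    return json.dumps(
        [{"id": r[0], "placed_at": r[1], "total": r[2]} for r in rows]
    )
```

An injection attempt such as passing "2024-12-31'; DROP TABLE orders; --" as the end date is simply compared as a string literal; the bound parameter never becomes executable SQL.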
Consequences
Tools are what take an agent out of the chat window and into your codebase. With a decent set of tools, an agent can read, change, and verify real files; without them, it can only describe what it would do. Well-designed tools also make agent behavior reviewable: every action is a named call with visible arguments and results, not a black-box judgment.
The cost is the tool layer itself. Each tool has to be implemented, documented, and kept working as the environment changes. Tools that are too permissive create safety risks; tools that are too restrictive frustrate the agent and the user. Calibrating capability and approval policy tool by tool is continuous work, not a one-time design decision.
Related Patterns
- Enables: Agent — tools are what make an agent more than a chatbot.
- Depends on: Harness (Agentic) — the harness manages tool access and execution.
- Refined by: MCP (Model Context Protocol) — MCP standardizes how tools are discovered and invoked.
- Uses: Approval Policy — dangerous tools require human approval before execution.
- Related: Tool Poisoning — tools are the mechanism exploited by poisoning attacks.
Sources
- Shunyu Yao, Jeffrey Zhao, and colleagues introduced the ReAct framework (2022), which formalized the interleaved reasoning-and-acting loop that makes tool use systematic for language models rather than ad hoc.
- Timo Schick and colleagues at Meta demonstrated with Toolformer (2023) that language models can learn to use external tools — calculators, search engines, translators — in a self-supervised way, without explicit tool-use training data.
- Reiichiro Nakano and colleagues at OpenAI built WebGPT (2021), an early demonstration that a language model could use a real tool (a web browser) to answer questions more accurately than it could from memory alone.
- OpenAI introduced function calling as a standard API feature in June 2023, turning tool use from a research technique into a production capability available to any developer building on their models.