Pinning

Pattern

A recurring solution to a recurring problem.

Pinning is the discipline of explicitly fixing a choice so downstream work can rely on it not changing without a deliberate, traceable update.

You have already depended on pinning if npm ci saved you from a surprise upgrade or a lockfile made yesterday’s build reproducible. Agentic systems make the same move more important: the model, prompt, tool schema, and fixture can drift as surely as a library can. Pinning names the habit of making those choices explicit enough that future work can reproduce or change them on purpose.

Understand This First

Dependency — version pinning is the canonical instance, and the place this discipline first appears.
Version Control — every pin is a versioned statement; without VCS, pinning is just wishing.

Context

At the heuristic level, pinning is a discipline that runs across the whole stack. You pin a library version in a lockfile. You pin a model id in a config constant. You pin a prompt by checking it into the repository. You pin a schema by freezing it at a versioned boundary. You pin a decision by writing an Architecture Decision Record. The mechanics differ; the move is the same: make the choice durable enough that drift has to announce itself.

In agentic coding, the surface area for silent drift has exploded. A prompt that worked last week may behave differently today because the model alias rolled forward. A tool’s JSON shape may shift because the MCP server gained a field. A fixture pulled from a live API may not match the snapshot in your test directory. Pinning is the response to all of these: pick the version, the id, the prompt text, the schema revision, the data snapshot, and write it down somewhere a future build will read.

Problem

How do you keep the things you depend on from changing under you without warning?

The default in every modern toolchain is “latest, please.” Latest npm package. Latest Docker image tag. Latest model alias. Latest API version. Each of those defaults is a small bet that nothing important will change between now and the next time you build. The bet pays off most days. The day it doesn’t, you spend hours bisecting yesterday’s working code against today’s broken run, and the answer is always the same: something moved that you didn’t ask to move.

Forces

Freshness has real value. Security patches, bug fixes, and capability improvements only reach you if you pull in newer versions. Pinning forever means rotting forever.
Stability has real value too. Reproducible builds, deterministic tests, deterministic agent runs, A/B comparisons, and incident forensics all need the inputs to hold still long enough to study them.
Defaults push toward drift. Package managers prefer “latest compatible” ranges. Cloud APIs deprecate behaviors quietly. Model providers ship new behavior under unchanged aliases. The path of least resistance is the path of silent change.
Pinning is cheap to add and expensive to maintain. A lockfile takes a moment to commit. Updating it deliberately, with attention to what changed, is the work that pinning shifts onto you instead of letting it ambush you later.

Solution

For every input that affects behavior, replace the implicit “latest” with an explicit, immutable identifier. Then define how the pin moves.

A real pin has two parts. The immutable identifier is something that means exactly one thing forever: a SHA-256 digest, a fully qualified model id, an exact version number, a content hash. Aliases like latest, stable, claude-3-5-sonnet, or even ^4.0.0 are not pins; they are placeholders that resolve to whatever the upstream wants them to resolve to today. The deliberate update process is what keeps the pin from rotting: a scheduled review, a renovate-bot pull request, an ADR supersession, a planned model-version evaluation. Pinning without an un-pin discipline is fossilization.

What deserves a pin in agentic coding work:

Model id. Use the dated, qualified id (claude-opus-4-7, gpt-5-2026-04-15), never the alias.
Prompt text. Check prompts into the repository. The file is the pin. Treat changes the way you treat code changes.
Tool and schema definitions. Versioned MCP server contracts, JSON schemas, and tool descriptions, with consumers tested against a specific revision.
Dependency versions. Lockfiles, exact versions, hash-checked installs. Range specifiers are not pins.
Fixtures and golden outputs. A captured response from a flaky upstream is a pin you can run tests against.
Cache prefixes. Prompt caching only pays off when the prefix is byte-identical run to run. The prefix is a pin whether you call it one or not.

Leave some parts fluid: internal data structures, refactorings inside a single module, or anything covered by tests strict enough to catch a regression. The skill is knowing which inputs need to stand still and which need to move.

Warning

Pinning to an alias is not pinning. claude-opus-latest, node:lts, python:3, and ^4.0.0 all look like pins and behave like roulette. Real pins resolve to the same bytes today, tomorrow, and a year from now. If you cannot answer “what exact thing does this resolve to?” with a single immutable identifier, you have not pinned anything.

How It Plays Out

A team runs a nightly evaluation pipeline that compares two prompt versions on the same dataset. The first month’s results are unreadable: scores swing five points night to night for reasons nobody can pin down. Someone notices that the model id in the config is claude-opus-latest, which the provider has rolled forward twice. The team replaces the alias with the dated id, captures the dataset as a fixture in the repository, and locks the prompt-evaluation loop to a single combination of (model, dataset, prompt template). Scores stop drifting. The A/B becomes meaningful for the first time.

A developer ships a feature that stops working three weeks later. Bisecting the repository shows no commit that broke it. Bisecting the lockfile shows that a transitive dependency’s caret range pulled in a minor version that changed an undocumented behavior. The team replaces caret ranges with exact versions in their direct dependencies and runs npm ci instead of npm install in CI. The drift category shrinks; the next surprise comes from a different category they hadn’t pinned yet, and they pin that too.

An agent that maintains a customer-facing chatbot starts producing slightly worse outputs over a week. Nothing in the agent’s code or prompt has changed. The investigation eventually finds the cause: the agent calls gpt-4o, which the provider quietly updated. The team switches to gpt-4o-2024-11-20, adds a quarterly model-review ADR to their cadence, and writes a runbook for evaluating model upgrades against a held-out test set before bumping the pin. The next provider update no longer reaches production by accident.

A platform team rewrites a public JSON API and ships the change without a version bump. Three downstream services break overnight. The post-incident fix is structural: the boundary now carries a version in the URL, the schema is pinned per version, and changes to a published version are forbidden by review policy. New behavior ships under a new version; the old one stays frozen until consumers migrate.

Tip

For agent configurations, treat the model id, the system prompt, and the tool definitions as a single pinned bundle. When any of them changes, the bundle gets a new version, and the change is reviewed the way a code change would be. This is the smallest unit of behavior you can reproduce, evaluate, or roll back.

Consequences

Pinning makes behavior reproducible. The same inputs produce the same outputs because the inputs actually stay the same. Tests become trustworthy: a green run today means the same thing it meant last week. Incident forensics gets easier because you can rebuild the exact stack that was running when something broke. Agent evaluations become honest because the model and prompt under test are the model and prompt that ran.

The cost is the maintenance work pinning relocates rather than removes. Security patches don’t reach you for free anymore; you have to pull them in. Bug fixes upstream don’t fix your build until you bump the version. The deliberate update process becomes load-bearing infrastructure: scheduled review cadences, automated PRs that propose updates, evaluation suites that compare old and new behavior. Skip that work and pinning turns into fossilization, which is its own smell.

There’s also a real tension with YAGNI. Pinning every transitive choice forever is over-application: you carry upgrade debt for things that didn’t need to be frozen. The discipline is to pin the inputs whose drift would cost you, and to leave the rest free to evolve. The opposite mistake, drift by alias, is more common in agentic work. A prompt that depends on latest or a tool that depends on an unversioned schema looks fine until the day it doesn’t. The job of the reviewer is to spot the unpinned input before the next silent change.

Sources

The phrase “Don’t update without thinking” runs through Kent Beck and Cynthia Andres’s Extreme Programming Explained: Embrace Change (Addison-Wesley, 2nd ed. 2004) as part of the daily practices that make refactoring safe. Pinning is the artifact-side complement to that discipline.
The Twelve-Factor App methodology (12factor.net) made dependency declaration a first-class concern in the cloud-native era. Factor II (“Explicitly declare and isolate dependencies”) is the canonical statement that implicit dependencies are a liability and explicit, pinned ones are an asset.
The reproducible-builds movement (reproducible-builds.org) developed the engineering discipline behind byte-for-byte deterministic outputs from pinned inputs. The work clarified what real pinning costs and what it makes possible.
Nix and Guix, with their content-addressed store and lockfile semantics, demonstrated the strongest form of dependency pinning available in mainstream tooling. The Nix model treats every input, down to the compiler, as a pinned hash.
Michael Nygard’s Documenting Architecture Decisions (Cognitect, 2011) introduced the ADR format, which is pinning applied to decisions: capture the choice, the date, and the context, so future readers know the call was deliberate.
This article’s agentic-coding framing applies the dependency-pinning discipline to model ids, prompts, tool schemas, fixtures, and cache prefixes: the inputs that make agentic runs reproducible or unstable.

Keyboard shortcuts