Determinism

Concept

Vocabulary that names a phenomenon.

Determinism is the property that the same inputs to the same code produce the same outputs every time, and naming it is what lets a team reason about which parts of their system they can trust to be repeatable.

Understand This First

Algorithm — determinism is what makes algorithms testable and reproducible.
Side Effect — side effects are the primary source of nondeterminism in software.

What It Is

Determinism is the property that a piece of code, given the same inputs and starting state, produces the same outputs every time. A pure function that adds two numbers is deterministic: 2 + 2 returns 4 on every machine, in every run, forever. A function that reads the current time, generates a random value, or hits the network is not: identical inputs can produce different outputs because the function depends on something the caller didn’t supply.
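Here is a minimal Python sketch of the contrast. The function names are hypothetical. The first function depends only on its arguments. The other two each read a hidden input that the caller never passes in: a random value in one case, the wall clock in the other.

```python
import random
import time

def add(a: int, b: int) -> int:
    # Deterministic: the result depends only on the arguments.
    return a + b

def add_with_jitter(a: int, b: int) -> float:
    # Nondeterministic: mixes in a random value the caller never supplied.
    return a + b + random.random()

def seconds_until(deadline: float) -> float:
    # Nondeterministic: reads the wall clock, an undeclared input.
    return deadline - time.time()
```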

The word names a binary at the function level and a spectrum at the system level. A single function either is or isn’t deterministic. A program made of many functions is deterministic to the extent that the parts you care about (the calculation, the decision, the transformation) are insulated from the parts that read the wall clock or the network. The common formulation is “functional core, imperative shell”: a deterministic core that handles the logic, surrounded by a thin nondeterministic shell that handles the outside world and passes its readings into the core as explicit inputs.
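The following is a minimal Python sketch of the functional-core, imperative-shell split. It assumes a hypothetical token-expiry check. The core takes the current time as an ordinary parameter, and only the thin shell reads the clock.

```python
from datetime import datetime, timedelta, timezone

# Functional core: pure decision logic. "now" is an explicit input.
def is_expired(issued_at: datetime, ttl: timedelta, now: datetime) -> bool:
    return now >= issued_at + ttl

# Imperative shell: reads the outside world (here, the clock) and passes
# the reading into the core as an ordinary argument.
def check_token(issued_at: datetime, ttl: timedelta) -> bool:
    now = datetime.now(timezone.utc)
    return is_expired(issued_at, ttl, now)
```

Tests can exercise `is_expired` with any fixed `now` without patching the clock. Only `check_token` touches the outside world.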

Determinism contrasts with several distinct sources of variation that practitioners often lump together. Randomness, system clocks, network calls, file system state, thread scheduling, floating-point rounding that shifts with evaluation order or platform, and uninitialized memory each introduce a different kind of nondeterminism, and isolating each one takes its own technique. In agentic coding, the agent itself adds a fresh source: the same prompt to the same model can produce different code on different runs. Determinism is the vocabulary that lets a team name what’s stable and what isn’t, separately for each layer of the stack.
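Two of those sources are sketched below in Python. These are illustrations rather than a complete catalogue. Randomness is isolated by passing in a seeded generator (`shuffle_deck` is a hypothetical name). Floating-point addition is shown to depend on evaluation order, which is why parallel reductions that regroup terms can change a result.

```python
import random

def shuffle_deck(deck: list[str], rng: random.Random) -> list[str]:
    # Randomness isolated: the caller supplies the generator, so a fixed
    # seed reproduces the same order on every run.
    shuffled = list(deck)
    rng.shuffle(shuffled)
    return shuffled

deck = ["A", "K", "Q", "J", "10"]
first = shuffle_deck(deck, random.Random(42))
second = shuffle_deck(deck, random.Random(42))
print(first == second)  # True: same seed, same output

# Floating-point addition is not associative: grouping the same values
# differently can produce a different result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False
```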

Why It Matters

Without the word, people describe the missing property in fragments: “flaky test,” “works on my machine,” “I can’t reproduce that bug,” “the agent gave a different answer this time.” Each fragment is a symptom of the same underlying issue: somewhere in the chain, the output depends on something the inputs don’t capture. Without a single name for that property, the team can’t argue cleanly about which parts of the system should preserve it.

Determinism is also the foundation of every verification strategy a team uses. Tests assume deterministic behavior: a test that passes once and fails once is no test at all. Debugging assumes deterministic behavior: a bug you can’t reproduce is a bug you can’t fix. Type checking, property-based testing, contract testing, formal methods: each of these is far easier to apply, and its verdict far easier to trust, when the program behaves the same way each time it runs. When determinism is lost, all of these tools degrade silently.

The concept matters especially in agentic coding because the agent’s output is inherently nondeterministic. The same prompt won’t produce the same code twice, and that isn’t a bug to fix; it’s a property of the medium. The discipline isn’t to force the agent to be deterministic. It’s to verify its output through deterministic means. You accept nondeterminism in the generation step and enforce determinism in the acceptance criteria: run the tests, check the types, validate the behavior. Naming determinism is what makes that separation legible.

How to Recognize It

You spot deterministic code by what it doesn’t touch. A function that takes its inputs as parameters and returns a value, without reading the clock, the file system, the network, or a global variable, is deterministic. A function that does any of those things is not. The test is mechanical: list every input the function reads, every output it produces, and every side effect it triggers. If the inputs fully determine the outputs and there are no side effects, it’s deterministic.
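The checklist above is a static inspection. A cheap dynamic complement is to call a function repeatedly with identical arguments and compare the results. The helper below (`looks_deterministic`, a hypothetical name) sketches that heuristic. It can only disprove determinism, never prove it.

```python
from typing import Any, Callable

def looks_deterministic(fn: Callable[..., Any], *args: Any, runs: int = 50) -> bool:
    # Heuristic probe: call fn repeatedly with identical arguments and check
    # that every result equals the first. A True result does NOT prove
    # determinism (a hidden input may simply not have changed yet);
    # a False result proves the function is nondeterministic.
    first = fn(*args)
    return all(fn(*args) == first for _ in range(runs - 1))
```

A function that mutates its arguments can fool the probe, because later calls no longer see the same inputs.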

Nondeterminism announces itself in characteristic failure modes:

Intermittent test failures. A test passes ten times in a row, then fails on the eleventh. Often the bug isn’t in the code under test; it’s in the test’s dependence on something outside its declared inputs (shared state, clock, ordering, parallelism).
“Works on my machine.” The same code produces different behavior on different developers’ machines, in CI, or in production. Some environment variable, file path, locale, or installed library is feeding into the function without being declared.
Order-dependent tests. Test A passes when run alone but fails when run after Test B. The tests share state that one of them mutates and the other reads, and the order in which they run determines the outcome.
Heisenbugs. A bug disappears when you add logging or run under the debugger. The added observation perturbs timing or memory layout enough to change which nondeterministic path is taken.
Drift across runs of the same agent. You re-run the same prompt to generate the same function, and you get a meaningfully different implementation. The function is deterministic; the agent that generates it isn’t.

The deeper signal is what the team reaches for when a flake shows up. If the response is “retry the build” or “mark the test as flaky,” the team has accepted nondeterminism it could remove. If the response is “find what changed between runs and feed it in as a parameter,” the team is using the vocabulary the concept gives them.

Tip

When you ask an AI agent to generate a function, check whether it introduces hidden nondeterminism: calls to the current time, random values, or external services embedded inside what should be pure logic. Ask the agent to extract those dependencies as parameters instead.
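Here is a before-and-after sketch of that extraction, assuming a hypothetical order-building function. The first version calls `uuid.uuid4()` and `datetime.now()` inline. The second takes the ID and timestamp as parameters, which leaves a thin shell to supply the real values.

```python
import uuid
from datetime import datetime, timezone

# Before: hidden nondeterminism buried inside what looks like pure logic.
def make_order_hidden(item: str, qty: int) -> dict:
    return {
        "id": str(uuid.uuid4()),
        "created": datetime.now(timezone.utc).isoformat(),
        "item": item,
        "qty": qty,
    }

# After: the ID and timestamp are explicit inputs, so the function is pure.
def make_order(item: str, qty: int, order_id: str, created: datetime) -> dict:
    return {"id": order_id, "created": created.isoformat(), "item": item, "qty": qty}

# Shell: supplies the real ID and clock reading at the edge.
def make_order_now(item: str, qty: int) -> dict:
    return make_order(item, qty, str(uuid.uuid4()), datetime.now(timezone.utc))
```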

How It Plays Out

A billing system calculates monthly charges. The calculation depends on usage data and rate tables, both of which can be made deterministic inputs. The developer structures the calculation as a pure function: given these usage records and these rates, the charge is exactly this amount. The function that fetches usage data from the database lives outside the calculation, in the nondeterministic shell. The billing logic itself can be tested with fixed inputs and expected outputs, every time. When a customer disputes a charge, the developer reproduces the calculation by feeding the same inputs into the same function and watches it produce the same number.
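The following is a minimal sketch of that billing core in Python. The `UsageRecord` shape, the resource names, and the rates are hypothetical. The function reads nothing except its two arguments, so a disputed charge can be recomputed exactly. The database fetch that produces the usage records would live in the shell and is not shown.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record shape: a metered resource and the quantity used.
    resource: str
    quantity: Decimal

def calculate_charge(usage: list[UsageRecord], rates: dict[str, Decimal]) -> Decimal:
    # Pure calculation: no database, no clock. The same records and rates
    # always produce the same charge. Decimal avoids binary float rounding.
    total = Decimal("0")
    for record in usage:
        total += record.quantity * rates[record.resource]
    # Round to cents using the default context rounding (ROUND_HALF_EVEN).
    return total.quantize(Decimal("0.01"))
```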

A team notices that their integration tests pass locally but fail intermittently on the build server. Investigation reveals that two tests depend on the order in which they run; one test leaves data behind that the other consumes. The tests are nondeterministic because they depend on shared mutable state (the database). The fix isn’t to mark the tests as flaky and retry on failure; it’s to make each test self-contained: set up its own state, run, and clean up. The tests become deterministic, and the intermittent failures disappear.
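The sketch below shows the self-contained fix using Python’s `unittest`. An in-memory SQLite database stands in for the team’s real database; this is an assumption, since a shared server database would typically need per-test transactions or schemas instead. Each test builds its own state in `setUp` and discards it in `tearDown`, so the suite passes in either order.

```python
import sqlite3
import unittest

class InvoiceTests(unittest.TestCase):
    def setUp(self) -> None:
        # Each test gets a fresh in-memory database, so no test can see
        # rows another test left behind, whatever order they run in.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")

    def tearDown(self) -> None:
        self.db.close()

    def test_insert_one(self) -> None:
        self.db.execute("INSERT INTO invoices (amount) VALUES (10.0)")
        (count,) = self.db.execute("SELECT COUNT(*) FROM invoices").fetchone()
        self.assertEqual(count, 1)

    def test_starts_empty(self) -> None:
        (count,) = self.db.execute("SELECT COUNT(*) FROM invoices").fetchone()
        self.assertEqual(count, 0)
```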

Example Prompt

“Extract the billing calculation into a pure function that takes usage records and rate tables as parameters and returns the charge amount. Move the database fetch and the current-time call outside this function.”

A platform team running an agent against a large refactor finds that the agent produces a different rewrite on each run. They don’t try to make the agent deterministic; they can’t. They make the acceptance criteria deterministic instead: a fixed test suite, a fixed type check, a fixed lint pass. The agent generates a candidate, the deterministic gates accept or reject it, and the team trusts the gates rather than the generation. The agent is the imperative shell; the gates are the functional core.
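Here is a minimal sketch of a deterministic gate. The fixed cases and the two candidate implementations are hypothetical stand-ins for agent output. This gate covers only the “fixed test suite” part; type checking and linting are left to their own tools. The verdict depends only on how each candidate behaves on fixed inputs, never on how the candidate was generated.

```python
from typing import Callable

# Fixed acceptance cases: identical inputs and expectations on every run.
# (Hypothetical cases for a slug-making function.)
CASES: list[tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Trim  me ", "trim-me"),
    ("already-slugged", "already-slugged"),
]

def gate(candidate: Callable[[str], str]) -> bool:
    # Deterministic gate: accept only if the candidate matches every case.
    # A candidate that raises is rejected rather than crashing the gate.
    try:
        return all(candidate(given) == expected for given, expected in CASES)
    except Exception:
        return False

# Two different "generations" of the same function, as an agent might produce.
def candidate_one(text: str) -> str:
    return "-".join(text.lower().split())

def candidate_two(text: str) -> str:
    # Mishandles repeated inner spaces: "trim  me" becomes "trim--me".
    return text.strip().lower().replace(" ", "-")
```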

Consequences

Deterministic systems are far easier to test, debug, and reason about. When a bug is reported, you can reproduce it by supplying the same inputs. When a test fails, you know it’ll fail again the same way, so you can diagnose it without guessing at timing or environmental differences. Refactoring becomes safer because you can compare outputs before and after a change and know that any difference is your change, not noise.
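One way to exploit that property during a refactor is a before-and-after comparison, sketched here with hypothetical `total_old` and `total_new` functions. Both are deterministic. Running them over fixed inputs, plus inputs drawn from a seeded generator, leaves the refactor as the only possible source of a mismatch.

```python
import random

def total_old(prices: list[int]) -> int:
    # Original implementation: sum the positive prices.
    total = 0
    for p in prices:
        if p > 0:
            total += p
    return total

def total_new(prices: list[int]) -> int:
    # Refactored implementation that should behave identically.
    return sum(p for p in prices if p > 0)

# Hand-picked edge cases plus generated ones. The generator is seeded,
# so the generated inputs are the same on every run.
rng = random.Random(0)
inputs = [[], [5], [3, -2, 7], [-1, -1], [0, 10, 20]]
inputs += [[rng.randint(-50, 50) for _ in range(rng.randint(0, 10))] for _ in range(200)]

mismatches = [xs for xs in inputs if total_old(xs) != total_new(xs)]
print(mismatches)  # []
```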

The cost is that strict determinism takes discipline. Side effects must be quarantined to the shell, dependencies must be passed in rather than reached for, and some convenient idioms (sprinkling timestamps through the code, calling a UUID library inline, reading a config file from disk wherever it’s needed) have to be rewritten as explicit parameters. The discipline is real overhead until it becomes habit, and a fully pure codebase is impractical: somewhere, the program has to read the clock and talk to the network.

The deeper consequence is what the concept does to the team’s reasoning. Once a team has the word, they can argue about which parts of the system should be deterministic and which can stay nondeterministic. They can spot a flaky test and ask the right question (what input is unmeasured?) instead of just retrying it. They can accept the agent’s nondeterminism without losing their grip on the system’s verifiability. The vocabulary is what makes that decomposition possible.

Sources

Alan Turing’s 1936 paper On Computable Numbers, with an Application to the Entscheidungsproblem formalized the idea of a deterministic machine whose behavior is fully determined by its current state and input symbols. This is the theoretical foundation for determinism in computing.
Michael Rabin and Dana Scott introduced nondeterministic automata in their 1959 paper Finite Automata and Their Decision Problems, giving the formal counterpart to deterministic computation and launching decades of complexity theory research.
Gary Bernhardt coined the phrase “functional core, imperative shell” in his 2012 Destroy All Software screencast and his Boundaries talk at SCNA 2012. The pattern of isolating deterministic pure logic from nondeterministic I/O at the edges has become a widely adopted architectural strategy.
