Consistency

Consistency is the property that everyone reading a system’s data sees a story that adds up; the word is what lets a team talk about which kind of “adds up” they actually need, and pay for only that.

Concept

Vocabulary that names a phenomenon.

What It Is

Consistency is the property that the data inside a system agrees with itself and with the rules the system is supposed to enforce. An account balance reflects every completed transaction. An inventory count matches what is on the shelf. Two services reading the same record see the same record. When that property holds, the system is consistent; when it breaks, two observers can look at the same system and walk away with two different stories about reality.

The word does a lot of work. It covers three distinct layers, and it pays to keep them separate, because conflating them is where bugs come from:

Application-level consistency is the rule the business cares about. The sum of debits equals the sum of credits. Every shipped order has a paid invoice. No two users hold the same seat reservation. These rules are not properties of any database; they are properties of the model the database is being used to represent, and they are violated by code that updates one row without the other.
Transactional consistency is the rule a relational database promises (the C in ACID). Every transaction takes the database from one valid state to another, where “valid” means the constraints declared in the schema hold. Foreign keys point at rows that exist. CHECK constraints pass. Unique indexes are unique. The database does not know the business rule about debits and credits; it knows only the constraints the schema told it to enforce.
Replica consistency is the rule a distributed store promises about copies. After a write, when can a reader on a different replica be guaranteed to see the new value? Strong consistency means every reader sees the latest write immediately. Eventual consistency means readers will converge on the latest write given enough time. In between live a menu of intermediate guarantees (read-your-writes, monotonic reads, session, causal) that production systems pick from when “always strong” is too expensive and “anything goes” is too dangerous.

When practitioners argue about consistency they are usually arguing across these layers without naming which one they mean. “The system is inconsistent” can mean “the schema has a foreign key violation,” or “the cache and the database disagree,” or “the business invariant that every order has an invoice doesn’t hold.” All three are real failures; the cures are different; the vocabulary is what lets the team talk about which one they have.

The classical formal result behind the third layer is the CAP theorem. During a network partition, a distributed store can keep answering every request, writes included (availability), or keep promising every reader the latest write (consistency), but not both. The theorem is often summarized as “pick two of three” (consistency, availability, partition tolerance), and that summary is the cause of more confused architecture meetings than any other piece of folklore in the field, because partition tolerance is not optional. Networks partition. A real system makes a partition-time choice: stay consistent and stop serving, or stay available and let replicas diverge. The team’s job is to know which choice their store is making for them.
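
The partition-time choice can be made concrete with a toy model. The sketch below is not how any real store replicates data; `Replica`, `Unavailable`, and the "CP"/"AP" mode labels are names invented for illustration. Two in-memory replicas mirror each other until a partition is simulated, at which point one mode refuses the write and the other accepts it and diverges.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a replica refuses a request in order to stay consistent."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP": hypothetical labels for the partition-time choice
    value: int = 0
    partitioned: bool = False  # True while this replica cannot reach its peers
    peers: list = field(default_factory=list)

    def write(self, value: int) -> None:
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent: stop serving rather than diverge.
                raise Unavailable("cannot reach peers; refusing write")
            # Stay available: accept the write locally and let replicas diverge.
            self.value = value
            return
        # No partition: apply locally and copy to every peer.
        self.value = value
        for peer in self.peers:
            peer.value = value


def run(mode: str):
    """Write once normally, then once during a simulated partition."""
    a, b = Replica(mode), Replica(mode)
    a.peers, b.peers = [b], [a]
    a.write(1)                                # replicated to both copies
    a.partitioned = b.partitioned = True      # the network splits
    try:
        a.write(2)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, a.value, b.value
```

Calling `run("CP")` returns `(False, 1, 1)`: the write is refused and both copies still agree. Calling `run("AP")` returns `(True, 2, 1)`: the write is accepted and the two replicas now tell different stories.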

For agentic coding the surface tightens. An agent that writes code touching state will quietly conflate the three layers unless someone has named them in the project’s vocabulary. The agent will reach for a cache, read a value, decide based on it, and write back to the database — and not notice that the cache and the database can disagree under load. The agent will write a feature that updates two tables and not wrap them in a transaction, because the schema doesn’t require it, only the business does. The agent will treat “the test passes once on a single-node SQLite” as evidence that the production multi-replica Postgres deployment will behave the same way. None of this is the agent being careless; it’s the agent operating on the layer the prompt named, and the prompt rarely names all three.

Why It Matters

A team that hasn’t separated the three layers will accept defenses at one layer as evidence that the others are covered, and they aren’t. A team that has the vocabulary asks the question every layer needs answered: what rule does this layer enforce, what rule does it not, and where does that gap get covered?

The cost of getting it wrong is not abstract. Two customers buy the last unit in stock at the same instant because the application checked inventory and decremented it in two separate statements rather than one. Money disappears from a reconciliation report because the bank-transfer code wrote the debit, crashed before the credit, and the schema had no constraint that required them to ride together. A customer-service agent reads a customer’s address from a notification-service replica that hasn’t caught up to the address change the customer made an hour ago, and a package goes to the old apartment. None of these are exotic bugs; all of them are routine consequences of asking one layer to enforce a rule that lives in a different layer.

Naming the layers is also what makes performance honest. Strong consistency is not free; coordination is the work that gives it to you, and coordination has a latency floor. A team that demands strong consistency everywhere ends up with a system that is slow and brittle, then quietly relaxes the guarantee in the places where it hurts most, and forgets to write down which places those are. Six months later somebody builds a feature on top of an “obviously consistent” subsystem and discovers it wasn’t. The discipline isn’t to make everything strong; it’s to be explicit about which data needs which guarantee, write that decision down, and design the rest of the system around the decisions actually made.

For agentic workflows the discipline gets more pointed. The agent’s prompt is the place where the layer gets named, or doesn’t. “Wrap the read and write in a transaction” names the transactional layer. “Read the address from the source of truth, not the notification cache” names the replica layer. “Enforce the rule that every shipped order has a paid invoice” names the application layer. The team that treats consistency as one word is going to ship code that’s consistent at the wrong layer, and the agent will help them do it confidently. The team that names the layer the agent has to operate at gets code that defends the layer it was supposed to defend.

How to Recognize It

You’re looking at a consistency question whenever two facts inside the system are supposed to imply each other and the code that maintains them is more than a single atomic operation. The questions to ask are layer-specific; trying to ask all three at once is what produces vague design discussions.

At the application layer, look for invariants the business has but the schema doesn’t:

Two writes that have to happen together for the model to make sense (debit one account, credit another; create an order row and an order-items row; move money out of one bucket and into the next).
A rule that says “every X has exactly one Y” where the schema allows zero or many because the foreign key isn’t constrained that tightly.
A multi-step workflow where a partial completion leaves the system in a state no human ever wanted to represent (“invoice paid, order not yet shipped, customer charged twice if they retry”).
Read-then-decide-then-write sequences where the value can change between the read and the write (the textbook race condition; the agent’s helpful-but-wrong “check balance, then debit” code is this same shape).
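
The read-then-decide-then-write shape is easy to reproduce deterministically. The following is a minimal sketch using Python's standard `sqlite3` module with an in-memory database; the `accounts` table and the helper names are assumptions made for the example. `racy_debits` interleaves two requests by hand so both decide on the same stale read, and `guarded_debit` shows one possible alternative that folds the check into the write itself.

```python
import sqlite3


def make_db(balance):
    """Create a throwaway in-memory database with a single account (id 1)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    db.execute("INSERT INTO accounts (id, balance) VALUES (1, ?)", (balance,))
    db.commit()
    return db


def read_balance(db):
    return db.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]


def racy_debits(db, amount):
    """Two requests interleaved by hand: both read, then both decide and write."""
    seen_a = read_balance(db)   # request A reads
    seen_b = read_balance(db)   # request B reads the same value before A writes
    for seen in (seen_a, seen_b):
        if seen >= amount:      # each decision uses its own stale read
            db.execute("UPDATE accounts SET balance = balance - ? WHERE id = 1", (amount,))
    db.commit()
    return read_balance(db)


def guarded_debit(db, amount):
    """Check and write in one statement; returns True if the debit was applied."""
    cur = db.execute(
        "UPDATE accounts SET balance = balance - ? WHERE id = 1 AND balance >= ?",
        (amount, amount),
    )
    db.commit()
    return cur.rowcount == 1
```

Starting from a balance of 100, `racy_debits(db, 80)` leaves the account at -60 because both requests passed the check. With `guarded_debit`, the first debit of 80 succeeds, the second is refused, and the balance ends at 20.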

At the transactional layer, look for what the schema does and does not enforce:

A CHECK constraint that a developer left off because “the application validates that”; six months later a different code path bypasses the application and writes the bad row directly.
A foreign key declared but not indexed, so the cascading update on the parent takes the table offline under load.
A unique index missing on a column the application treats as the primary user-facing identifier.
A column with no NOT NULL because “we’ll always set it”; six months later a migration sets it to null on a million rows because nobody remembered the invariant.
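
What the schema does and does not enforce can be checked directly by trying to write the bad row. This sketch uses Python's standard `sqlite3` module; the `users` and `accounts` tables, their columns, and the `rejected` helper are hypothetical and exist only to show each constraint type refusing a write regardless of which code path attempted it.

```python
import sqlite3

# Hypothetical schema: every constraint here is a rule the database enforces
# no matter which application code path performs the write.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE accounts (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
"""


def open_db():
    db = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    db.execute("PRAGMA foreign_keys = ON")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
    db.commit()
    return db


def rejected(db, sql, params=()):
    """Return True if the database refused the statement on a constraint."""
    try:
        db.execute(sql, params)
        db.commit()
        return False
    except sqlite3.IntegrityError:
        db.rollback()
        return True
```

Against this schema, a negative balance, an account pointing at a missing user, a duplicate email, and a NULL balance are all rejected, while a valid account row is accepted. Drop any one constraint from `SCHEMA` and the corresponding bad row goes in silently.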

At the replica layer, look for places where the data lives in more than one machine and the application is reading from a copy:

A cache in front of a database, with no rule about how the cache stays current and no metric on how stale it is.
A read replica behind a leader, with the application reading from the replica because it’s faster and not realizing the replica lags under load.
A search index built off the database, where “the user just created this and immediately searched for it” returns nothing for a beat and the team calls it a bug rather than the indexing pipeline doing its job.
A multi-region deployment where two regions can both accept writes for the same record and the conflict-resolution policy is the default the database chose.
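
Replica lag is easier to reason about with a model small enough to hold in one's head. The sketch below is a pure-Python simulation, not a client for any real database; `Leader`, `Replica`, `catch_up`, and `read_address` are illustrative names. The replica applies the leader's write log only when told to, which makes the stale read and the lag metric visible, and `read_address` shows one possible routing rule for reads that must be current.

```python
class Leader:
    """Accepts writes and keeps an ordered log for replicas to replay."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def read(self, key):
        return self.data.get(key)


class Replica:
    """Serves reads from a copy that only advances when catch_up() runs."""

    def __init__(self, leader):
        self.leader = leader
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self, max_entries=None):
        pending = self.leader.log[self.applied:]
        if max_entries is not None:
            pending = pending[:max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self):
        """Staleness metric: log entries the replica has not applied yet."""
        return len(self.leader.log) - self.applied

    def read(self, key):
        return self.data.get(key)


def read_address(leader, replica, key, must_be_current):
    """Route reads that cannot tolerate staleness to the leader."""
    source = leader if must_be_current else replica
    return source.read(key)
```

If the leader records an address as "old", the replica catches up, and the leader then records "new", the replica still answers "old" and reports a lag of 1 until `catch_up()` runs again. A read routed with `must_be_current=True` sees "new" throughout.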

A few signs that all three layers are in play in the same incident:

A reconciliation report that doesn’t reconcile and the team can’t immediately tell whether the application wrote bad data, the schema accepted bad data, or the replicas haven’t caught up. Until the question is split, every theory is plausible and none can be tested.
“It looks right in production but the test environment shows it wrong,” or vice versa, where the difference is a single-node test database versus a multi-replica production cluster; that gap is the replica layer announcing itself.
An agent’s confident “the feature works” combined with a downstream report that says it doesn’t; the gap is somewhere in the three layers and the agent’s self-report didn’t include the consistency model it assumed.

Warning

“Eventually consistent” is not the same as “consistent eventually.” The first is a deliberate architectural choice: replicas are guaranteed to converge once writes stop, and the system names the weaker guarantees (and, ideally, the staleness window) readers get in the meantime. The second is “we hope it works out.” A system that says “eventually consistent” and means the second one is going to surprise its operators, and the surprise will be expensive.

How It Plays Out

A team running an e-commerce site holds a flash sale for the last hundred units of a hot item. The checkout flow reads the inventory count, displays it to the user, and on click decrements the count and creates the order. Under steady-state traffic the gap between read and write is small enough that the race rarely fires. Under flash-sale traffic the race fires every second; two customers see “1 left,” both click, both orders go through, and the warehouse has one unit and two paid orders. The fix is to wrap the inventory read and decrement in a database transaction with a row-level lock, so the two operations form one atomic unit and the second customer’s transaction sees the post-decrement zero rather than the pre-decrement one. The team puts the fix in and the bug goes away; what they actually changed is the transactional layer, by promoting an application-level invariant (“inventory and orders agree”) into a guarantee the database’s transaction machinery now enforces.
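
The shape of that fix can be sketched with Python's standard `sqlite3` module. Two assumptions are worth flagging loudly: the `inventory` and `orders` tables and the `checkout` function are invented for the example, and SQLite has no row-level locks, so the sketch swaps in `BEGIN IMMEDIATE`, which takes the database write lock before the read. On a server database the equivalent would more likely be a row lock taken inside the transaction.

```python
import sqlite3


def open_shop(stock):
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT statements below are the only transaction boundaries.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE inventory (sku TEXT PRIMARY KEY, "
        "count INTEGER NOT NULL CHECK (count >= 0))"
    )
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "sku TEXT NOT NULL, customer TEXT NOT NULL)"
    )
    db.execute("INSERT INTO inventory (sku, count) VALUES ('hot-item', ?)", (stock,))
    return db


def checkout(db, sku, customer):
    """Read, decrement, and create the order as one atomic unit."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading the count
    try:
        row = db.execute("SELECT count FROM inventory WHERE sku = ?", (sku,)).fetchone()
        sold = row is not None and row[0] > 0
        if sold:
            db.execute("UPDATE inventory SET count = count - 1 WHERE sku = ?", (sku,))
            db.execute("INSERT INTO orders (sku, customer) VALUES (?, ?)", (sku, customer))
        db.execute("COMMIT")
        return sold
    except Exception:
        db.execute("ROLLBACK")  # undo the decrement if the order insert failed
        raise
```

With one unit in stock, the first `checkout` returns `True` and the second returns `False`; the table ends with a count of zero and exactly one order. Because the lock is held across the read and the write, a second connection would wait (or time out) rather than read the pre-decrement count.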

A platform team migrates their notification service to a new database with a read replica. The old service ran off a single node; the new one routes reads to the replica for performance, and the replica lags the leader by anywhere from milliseconds to a few seconds depending on load. The notification service’s job is to send “your order has shipped” emails, and it reads the customer’s current address from the replica. For most customers this is fine. For the customer who updated their address seconds before the order shipped, the notification quotes the old apartment, because the address update hasn’t reached the replica yet under the day’s load. The fix has two parts: address reads for shipping-relevant operations route to the leader (or use a stronger read consistency level), and the team writes down which other operations have the same shape and need the same routing. The team learned something specific about the replica layer, which their architecture had until then treated as one undifferentiated database.

A coding agent is asked to build a “transfer credit between accounts” endpoint in a brand-new application. The agent writes two SQL UPDATE statements in sequence, one to debit the source account and one to credit the destination, runs the new endpoint against the test database, sees that both accounts move correctly, declares the feature done, and merges. The next week’s reconciliation report shows the system has lost $4,300. Investigation reveals that the application crashes occasionally between the two UPDATE statements (an unrelated bug in the request middleware); when it does, the debit has happened and the credit hasn’t, and no schema constraint requires the two to ride together. The fix is to wrap both statements in a transaction so the crash rolls both back, and to add a daily invariant check that transfers conserve money (the sum of all balances is unchanged by any transfer, since every debit has a matching credit), so the next time something like this drifts the team finds out within a day rather than a quarter. What the agent missed was the application layer. The business invariant (“debits and credits balance”) was real, the schema didn’t enforce it, and the prompt didn’t name it. Two updates in a row is correct code for a layer where the invariant doesn’t exist and wrong code for the layer where it does.
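
A minimal sketch of the transactional fix and the conservation check, using Python's standard `sqlite3` module. The `accounts` table, the `crash_midway` flag, and the `SimulatedCrash` exception are all assumptions made for the example; an exception stands in for the process dying, which is a simplification, since after a real crash it is the database's recovery that discards the uncommitted debit.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for the process dying between the two UPDATE statements."""


def open_bank():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    db.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 500), (2, 100)]
    )
    db.commit()
    return db


def transfer(db, src, dst, amount, crash_midway=False):
    # The connection used as a context manager commits if the block finishes
    # and rolls back if an exception escapes, so the two updates ride together.
    with db:
        db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if crash_midway:
            raise SimulatedCrash()
        db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(db):
    return dict(db.execute("SELECT id, balance FROM accounts ORDER BY id"))


def total(db):
    """Invariant check: transfers must leave this sum unchanged."""
    return db.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
```

A normal `transfer(db, 1, 2, 50)` moves the balances from 500 and 100 to 450 and 150. Repeating it with `crash_midway=True` raises `SimulatedCrash` and leaves both balances where they were, with `total(db)` still at 600. Comparing `total` before and after a batch of transfers is the cheapest form of the daily invariant check.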

Example Prompt

“This endpoint debits one account and credits another. The two updates must succeed together or fail together; if one happens without the other, the system has lost money. Wrap the two statements in a database transaction with the appropriate isolation level, and add an integration test that simulates a crash between them and verifies neither balance changed.”

Consequences

Treating consistency as a named layered question, rather than as one word the team uses to mean three different things, changes what the team’s defensive investment is for. The team stops trying to make “the system consistent” and starts asking, of each piece of data, which guarantee does this need, at which layer, and what is going to enforce it? That question has answers; the previous question has only opinions.

Benefits. A team that has separated the layers will reach for the right defense for the actual problem. Application invariants get expressed as transactions, as schema constraints where possible, and as periodic reconciliation checks where the constraint can’t be declared. Schema constraints get treated as part of the model, not as an afterthought, because the team understands that “the application validates that” is a defense that’s one bug away from failing. Replica behavior gets named explicitly in the architecture documents, so the next engineer who builds on top of “the database” knows whether they’re getting strong consistency or something weaker. The team’s mental model becomes precise enough that an outside reviewer can ask “which layer enforces this rule?” and get a real answer for every rule that matters.

Liabilities. Every additional consistency guarantee costs something to produce. Transactions cost lock time and throughput; schema constraints cost migration effort and rule out shapes the application might later want; strong replica consistency costs latency under partition. A team that doesn’t budget for the cost will reach for the strongest guarantee everywhere and then quietly relax it in the places where the performance hurts, without writing the relaxations down. Six months later the team’s architectural documents say one thing and the system behaves another way, and a feature gets built on top of an assumption that no longer holds. The discipline is not “demand strong consistency”; it’s “be explicit about which data needs which guarantee, document the decisions, and design the rest of the system around the decisions actually made.”

For agentic workflows the consequence is sharper still. An agent will produce code that’s consistent at the layer the prompt named and inconsistent at the layers the prompt didn’t. The team that prompts only at the application layer will get application-correct code that has race conditions; the team that prompts only at the transactional layer will get transaction-correct code that violates a business invariant the schema doesn’t know about. The remedy is not to write longer prompts; it’s to make consistency a topic the codebase itself has vocabulary for — invariant checks the agent can see, transaction boilerplate the agent can pattern-match, documented replica-routing rules the agent can read — so that the agent has the vocabulary too. The agent is a fast writer of code in the codebase it’s reading. If the codebase names its consistency layers, the agent’s code will name them too.

Sources

Jim Gray’s “The Transaction Concept: Virtues and Limitations” (VLDB 1981) defined the transaction as the unit of consistency — “all or nothing, before or after” — and gave the field the vocabulary used here for atomic operations and serializable updates. Theo Härder and Andreas Reuter’s “Principles of Transaction-Oriented Database Recovery” (ACM Computing Surveys, 1983) coined the ACID acronym (Atomicity, Consistency, Isolation, Durability) that is now the standard rubric for what a transactional database guarantees, and supplies the precise sense of “consistency” used in this article’s transactional-layer discussion.
Eric Brewer introduced the CAP trade-off in his 2000 PODC keynote “Towards Robust Distributed Systems”, arguing that under network partition a system must choose between consistency and availability. Seth Gilbert and Nancy Lynch turned the conjecture into a theorem two years later in “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services” (ACM SIGACT News, 2002). Brewer revisited and refined the framing in “CAP Twelve Years Later: How the ‘Rules’ Have Changed” (IEEE Computer, 2012), clarifying that real systems explicitly handle partitions rather than literally pick “two of three” — the point on which the article’s “pick two of three is folklore” framing rests.
Werner Vogels’s “Eventually Consistent” (ACM Queue, 2008) gave the eventual-consistency model its modern name and worked out the practical menu of weaker guarantees (read-your-writes, monotonic reads, session, causal) that production systems use when strong consistency is too expensive. The article above adopts that menu directly.
The agent-specific framing — that an agent’s output is consistent at the layer the prompt named, and silently inconsistent at the layers the prompt did not — is implicit in the working literature on coding agents. The Anthropic engineering team’s discussions of agent loops, tool use, and prompting discipline and the broader practitioner conversation around production-grade coding agents converge on the operational rule used here: the codebase’s vocabulary is what the agent has access to, and a codebase that names its consistency model gets code that respects it.
