--- slug: schema-database type: concept summary: "The data model rendered as a contract the database enforces: the columns, types, keys, and constraints the storage engine will accept." created: 2026-04-04 updated: 2026-05-23 related: consistency: relation: refined-by note: "Constraints in the schema are one mechanism for maintaining consistency." crud: relation: enables note: "CRUD operations run against the schema." data-model: relation: uses note: "The schema implements the data model in a specific database." data-normalization: relation: enables note: "Normalization is a technique applied to schema design." data-structure: relation: contrasts-with note: "Schemas organize data at rest; data structures organize data in motion." database: relation: uses note: "The schema lives inside a database system." domain-model: relation: informed-by note: "The database schema encodes domain model entities as tables and constraints." schema-serialization: relation: contrasts-with note: "Serialization schemas define data shape for transmission, not storage." serialization: relation: contrasts-with note: "Database schemas define storage shape; serialization defines transmission shape." --- # Schema (Database) *A database schema is the data model rendered as a contract the database itself can enforce: the columns, types, keys, and constraints that turn "what the system knows" into "what the storage engine will accept."* > **Concept** > > Vocabulary that names a phenomenon. ## Understand This First - [Data Model](data-model.md) — the schema implements the data model in a specific database. - [Database](database.md) — the schema lives inside a database system. ## What It Is A database schema is the specification of how a particular database will hold a particular system's data. It names the tables (or collections, or vertex types, depending on the engine), declares the columns each table carries, fixes the type of every column, and writes down the constraints that the database itself will refuse to violate — primary keys, foreign keys, uniqueness, not-null, check expressions, defaults. It sits one level down from the [Data Model](data-model.md). The data model says *what the system knows about*; the schema says *how the storage engine is going to be told about it.* A data model says "a book has a title and an author." A schema says "the `books` table has a `title` column of type `VARCHAR(255) NOT NULL` and an `author_id` column of type `BIGINT` that is a foreign key into `authors(id) ON DELETE RESTRICT`." The data model is a sketch the team can argue about with a marker; the schema is something the database server will load, validate, and start enforcing the moment it's committed. The term covers three places where the word is used and that practitioners conflate at their peril: - **The declared schema.** The DDL — `CREATE TABLE`, `CREATE INDEX`, `CREATE TYPE` — that defines what the database believes the world looks like. In a relational database this is explicit; in a document database it may be implicit (a JSON Schema validator, an ORM definition, or a coding convention that everyone is supposed to follow). - **The applied schema.** What the running database actually has loaded right now, after every migration that has been applied to it. The declared schema and the applied schema drift when a migration is forgotten, run partially, or applied in a different order across environments. - **The implied schema.** What the data in the database actually conforms to, regardless of what the DDL says. In a strict relational database with constraints, the implied schema and the declared schema are the same. In a document database with no enforcement, the implied schema can be a half-dozen overlapping shapes — older documents missing fields the application now requires, newer documents carrying fields no validator checks, rows in production that the developer's local schema can no longer represent. A schema is also what distinguishes a database from a file. A pile of JSON documents on disk is storage; it becomes a database when the storage engine knows enough about the shape to index it, enforce constraints on it, and answer queries that go beyond "give me the bytes." The schema is where that knowledge lives. For agentic coding the gap that matters is between what the prompt names and what the agent writes. A prompt that names entities and attributes ("a tasks table with title, status, and an assignee") will produce a schema, but the agent's default is to under-constrain: nullable columns where they should be required, no foreign keys, no indexes, status as free text instead of an enum, no `ON DELETE` clause. The code runs; the data slowly corrupts. The team's job is to extend the prompt or the codebase's existing migrations far enough that the agent has the constraint vocabulary it needs. ## Why It Matters A team without a written, enforced schema is letting its application code carry the load that the database was designed to carry for free. Every check that should live in the schema and doesn't — "this field must be present," "this id must point at a real row," "this enum value is one of four" — becomes a check in application code, then a missed check in a different part of the application, then a row in production that no current code path can have created and that nobody can explain. The schema is where invariants get cheap. A `NOT NULL` constraint is one keyword in DDL; the same guarantee written in application code is a validation function, a test, a code reviewer remembering to look for it, and a runtime exception when somebody forgets. A foreign key with `ON DELETE RESTRICT` is one clause in DDL; the same guarantee written in application code is a "did we remember to check?" review item on every endpoint that deletes anything. The schema turns these into properties of the data itself: the database refuses to store a row that breaks them, and the team gets to stop carrying the rule in their heads. The schema is also how the team's understanding of its own data stays honest. A schema is read at every onboarding, referenced at every code review that touches storage, and diffed at every migration. A team that has its schema in version control has the system's data shape in version control; the migration history is the diary of how the team's understanding of its domain has changed. A team that doesn't has only what the production database happens to hold right now, and reconstructing the history is archeology. For agentic workflows the discipline tightens. An agent reads the schema before it reads anything else, because the schema is the densest description of the system in the codebase. A schema with named enums, foreign keys, and check constraints teaches the agent what the rules are. A schema with `TEXT` everywhere and no constraints teaches the agent that anything goes, and the code the agent ships will reflect that, with no validation, no foreign keys, and a quiet trail of bad data in its wake. The team isn't writing the schema only for the database; the team is also writing it for every coding agent that's going to read the codebase next year. ## How to Recognize It You're looking at a schema question whenever the conversation is about *what shape the data is allowed to take*, *where in the system the rule is enforced*, or *what the database will refuse to do.* Specific signs: **The "should this be nullable?" debate.** A column is being added and somebody asks whether it should be `NOT NULL`. The right answer almost always exists in the domain — either the system genuinely doesn't have the value yet, in which case it's nullable and the application has to handle the absence, or the system always has the value, in which case the constraint should be there and the absence should be impossible. The wrong answer is "let's make it nullable to be safe," which moves the question to every reader of the column for the rest of the table's life. **Enum vs. string.** A column holds one of a small fixed set of values — `pending`, `done`, `archived`; `light`, `dark`, `system`. If the set is enforced (an enum type, a check constraint, a foreign key into a reference table), the schema carries the rule and the database refuses bad values. If the column is plain text, the rule lives only in application code and the column will eventually accumulate `Done`, `done `, `Complete`, `done.`, and `pending review`. **Foreign keys missing.** A `customer_id` column exists but there's no foreign key into `customers`. The team is relying on application code to make sure a customer always exists when an order references one. This works until a delete somewhere leaves orphans, or a bulk import inserts orders whose `customer_id` points at nothing. The fix is one DDL line; the cost of skipping it is a slow accumulation of dangling references that show up as null-pointer errors in unrelated parts of the codebase. **No `ON DELETE` policy.** A foreign key exists but the deletion behavior was never declared. The database defaults to `NO ACTION` (block the delete if there are children), and the team discovers this the first time they try to remove a customer who happens to have an order. The right answer (cascade, restrict, set null, set default) depends on the domain; the wrong answer is to find out from a production error at the moment a customer-success request is sitting on someone's desk. **Migration history that doesn't add up.** Two engineers run the same set of migration files against two empty databases and end up with different schemas — different column order, different indexes, different defaults. The migration tool is being used to add columns but not to converge structure, and "the schema" is no longer a thing the team can name without saying which environment. **A documentation page that calls itself the schema.** The team has a wiki page describing the database tables, written six months ago, that no engineer has updated since. The actual schema is whatever the database currently has loaded; the wiki page is fiction. The wiki page is also what new engineers read first. **Schema-less drift in a document store.** The team picked MongoDB because "the schema is flexible." Six months in, the application code branches on missing fields, defaults the absent ones, coerces strings that used to be numbers, and copes with a half-dozen historical shapes. The flexibility hasn't been free; it has been paid for in defensive application code, and the schema does exist — it just lives, distributed, in every code path that reads from the collection. **Agent code that "works" but doesn't constrain.** An agent is asked to add a new table for some feature. It writes a migration that creates a table with all `TEXT` columns, no foreign keys, no `NOT NULL`, no indexes. The endpoints work in development; the data shape it creates will misbehave at scale and corrupt under concurrent edits. The agent isn't being lazy; it's reaching for the smallest DDL that gets the test green, and the test didn't name the constraints. > **⚠️ Warning** > > A column without `NOT NULL` is a column where every reader has to think about the null case. A multiplication of small defenses, scattered across the application, costs vastly more than the single constraint in the schema would have cost. "Let's leave it nullable for flexibility" is almost always a way of saying "let's pay the constraint's cost out of the application's pocket, not the database's." ## How It Plays Out A back-office team builds a task-management feature. The first migration creates a `tasks` table with `id`, `title TEXT`, `status TEXT`, `assignee_id BIGINT`, and `created_at TIMESTAMPTZ`. Six weeks later the support queue starts seeing odd reports: tasks that say "Done" in one view and "done" in another; tasks assigned to user IDs that no longer exist; tasks with no creation timestamp because some import path forgot it. The fix isn't a feature; it's a migration. The team adds a `task_status` enum (`pending`, `done`, `archived`); a foreign key from `assignee_id` to `users(id) ON DELETE RESTRICT`; `NOT NULL` on `title`, `status`, `created_at`; and a default of `now()` on `created_at`. The next day every one of those bug classes disappears from the codebase, because the database is now refusing to create the rows that produced them. The team paid for six weeks of confusion in exchange for not writing one migration up front. A platform team migrates a service from a monolith. They pull the table definitions out of the monolith's migration history, port them to the new service's repository, and start running. Two weeks later they find that one column they thought was `NOT NULL` is actually nullable in production, because somewhere in the monolith's history there was a migration that removed the constraint to fix an emergency and a migration that re-added it, but the re-add only ran in staging and main was carrying the older definition. The lesson the team writes down: the *declared* schema and the *applied* schema are not the same thing, and the production database is the source of truth for the applied one. They add a step to every deployment that diffs the applied schema against the declared one and fails the deploy if they disagree. A coding agent is asked to "add a notification preferences feature to the user settings." The codebase has a `users` table with seventeen columns, no foreign keys to anything new, and a coding convention (visible in three other tables in the migrations directory) of using `created_at`, `updated_at`, and `deleted_at TIMESTAMPTZ` for soft-deletable tables. The agent generates a migration that creates a `notification_preferences` table with a `user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE`, the three timestamp columns, an `email_enabled BOOLEAN NOT NULL DEFAULT true`, an `sms_enabled BOOLEAN NOT NULL DEFAULT false`, and a unique constraint on `user_id`. The migration is shaped right because the existing migrations were shaped right and the agent pattern-matched. The team didn't have to write a longer prompt; they had written a longer migration history, and the agent read it. The same agent in a codebase whose existing migrations use `TEXT` for everything and no foreign keys produces a migration for the same feature that uses `TEXT` for everything and no foreign keys. The agent isn't choosing; the agent is mirroring. The codebase teaches the agent the schema vocabulary the agent will use; the team's job is to be sure the codebase is teaching the right thing. > **💡 Example Prompt** > > "Before writing this migration, list the constraints the table should carry — NOT NULL on every column that must always have a value, foreign keys with explicit ON DELETE behavior, unique constraints where uniqueness is required, check constraints or enum types where the value is from a fixed set, and indexes on every column we'll query by. State the constraint and the reason. I'll review before you generate the DDL." ## Consequences **Benefits.** A well-defined schema turns invariants into properties of the data instead of properties of the code. The database refuses to accept rows that violate the rules, which means the application stops carrying the rules and stops failing in surprising ways when one path forgets to check. Queries become faster because indexes and constraints give the planner the information it needs. The schema in version control becomes the team's most authoritative documentation of the system's data shape, more reliable than any wiki page because it's diff-able, reviewable, and tested on every deploy. Onboarding shortens: a new engineer reads the migration history and learns the system's domain in an afternoon. For coding agents, the constraint vocabulary in the schema teaches the agent — by example, every time it reads — to produce migrations that match the team's standards. The schema also makes change-cost legible. A new column is cheap. A new table is medium. A type change on a large column is expensive. A change to a foreign key cascade behavior may be invisible at the SQL level but expensive at the application level because deletion semantics shift under whoever's code depended on them. A team that reads schema changes the way it reads code changes catches the expensive ones before they ship. **Liabilities.** Schemas cost effort to maintain and effort to change. Every constraint that's right makes the data better and the next change harder; a `NOT NULL` column that turns out to occasionally need to be null requires a migration, a backfill, and a coordinated application update. A foreign key that turns out to span a table the team wants to shard requires removing the foreign key, which means losing the database's enforcement and reconstructing the check in application code. Schemas reward thinking the domain through carefully up front and punish thinking it through carelessly; teams that rush to ship and then keep ratcheting constraints onto an already-populated table pay a migration tax forever. Schemas are also a place where premature constraint costs the team. A `VARCHAR(255)` chosen at random becomes the upper bound on every value the team can ever store in that column, and the constraint will be discovered the day a user wants to enter a longer title and the application rejects them. The discipline is to constrain what's known to be true about the domain and leave free what genuinely is variable; "I made it `VARCHAR(50)` to be safe" is the same trap as "I made it nullable to be safe" with a different cost profile. Finally, the schema can become a place where the team's product confusion hides. A team that can't decide whether a `users` table should have a `deleted_at` column or a separate `archived_users` table, or whether a `subscription` row with `status: cancelled` is the same shape as one with `status: active`, is often really arguing about a domain question that the schema has surfaced but can't answer. Treat a stalled schema-design discussion as a signal to ask the product owner what the entity actually represents, not as a problem to be solved with cleverer DDL. For agentic workflows the consequence is sharp. An agent will mirror the schema discipline of the codebase it reads. The schema is the densest, most reviewable, most enforced piece of documentation the codebase has; investing in it pays for itself every time an agent edits the system, because the agent reads the constraints before it writes new code and pattern-matches off them. A team that wants its agents to ship migrations the team would have approved makes sure the existing migrations are the migrations the team would have approved. ## Sources - E.F. Codd's ["A Relational Model of Data for Large Shared Data Banks"](https://dl.acm.org/doi/10.1145/362384.362685) (*Communications of the ACM*, 1970) introduced the relational model that underlies the modern notion of a database schema — tables, tuples, keys, the separation of logical structure from physical storage. The article's framing of the schema as the contract the database itself enforces descends from Codd's framing. - C.J. Date's [*An Introduction to Database Systems*](https://openlibrary.org/works/OL1840931W) (Addison-Wesley, multiple editions since 1975) is the canonical textbook treatment of relational schema design — primary keys, foreign keys, normalization, constraints — and is where most working engineers first met the vocabulary used above. - Martin Fowler's [*Patterns of Enterprise Application Architecture*](https://martinfowler.com/eaaCatalog/) (Addison-Wesley, 2002) cataloged the patterns by which application code maps to and from a relational schema — Data Mapper, Active Record, Foreign Key Mapping, Embedded Value — and is the canonical reference for the interplay between schema design and the code that sits on top of it. The "constraint vocabulary teaches the agent" framing in this article is a 2026 extension of Fowler's older observation that an application's quality is bounded by the quality of the mapping between its objects and its tables. - The PostgreSQL [documentation on constraints](https://www.postgresql.org/docs/current/ddl-constraints.html) is the working practitioner's reference for what a schema can enforce — check constraints, foreign keys with `ON DELETE` policies, exclusion constraints, deferrable constraints — and is the document most often open when an engineer is deciding which rule the schema can carry and which the application has to. --- - [Next: Schema (Serialization)](schema-serialization.md) - [Previous: Data Model](data-model.md)