Primitive Obsession
Using raw strings, integers, booleans, or loose maps where a named domain type would carry the meaning and enforce the rules.
Also known as: Primitive Typing, Stringly Typed Code, Type-Code Obsession
Primitive obsession is not a dislike of primitive values. Strings, numbers, and booleans are useful building blocks. The trap begins when a real domain concept gets flattened into one of them, then every caller has to remember what the value means, which values are allowed, and which combinations are impossible.
Understand This First
- Value Object — the usual corrective pattern for domain values with rules and no identity.
- Make Illegal States Unrepresentable — the broader design principle behind replacing loose values with tighter types.
- Ubiquitous Language — the shared vocabulary primitive obsession erases from code.
Symptoms
- A
status,role,currency, orcountrytravels through the codebase as a string. - A function accepts
amount: floatandcurrency: string, so every caller can pass dollars to code that expects euros. - Several booleans describe one state:
is_paid,is_shipped,is_cancelled,is_refunded. - Validation logic for the same raw value appears in controllers, serializers, tests, and UI code.
- An agent invents a
dict[str, Any]or generic JSON blob because the prompt didn’t provide a domain type. - Reviewers ask, “What does this number mean?” or “Which strings are valid here?”
- The code has comments like
// status is one of pending, active, suspended, closedbecause the type system doesn’t know that.
Why It Happens
Primitive obsession starts with speed. A string is easy to add. A new class, enum, value object, or tagged union feels like design work, and design work feels expensive when the feature is small. So the first version stores "premium" as a string, 0.85 as a threshold, or true, false, false as state.
Serialization also pushes teams toward primitives. APIs, databases, queues, and config files exchange strings and numbers, so it feels natural to keep those shapes inside the application too. The boundary format leaks inward. A value that should be parsed once at the edge stays loose all the way through the domain code.
Agents amplify the habit. When a prompt says “add a priority field,” the agent sees a thousand tutorial examples where priority is a string. Unless the surrounding code or instruction file names a Priority type, the agent will often add "low", "normal", and "high" as raw text. The patch works until the next agent writes "urgent" or "High" somewhere else.
The deeper cause is a missing domain model. If the team hasn’t named Money, EmailAddress, OrderStatus, or DeploymentEnvironment as concepts, the code can’t carry those names. The primitives are not the problem by themselves. They are evidence that the meaning never found a proper home.
The Harm
Primitive obsession spreads rules across the codebase. Every raw value needs its own validation, parsing, comparison, formatting, and error handling. If OrderStatus is a string, every function that touches it needs to know the allowed strings. If Money is two unrelated fields, every function that adds amounts needs to remember to check currency first.
It also creates invalid states. Three booleans can express eight combinations even when the business only allows four. A string can hold "admin", "Admin", "administrator", "", or "🤷". A float can hold negative money, NaN, and values rounded in ways no payment processor accepts. The system then grows defensive branches to handle states that should never have existed.
In agentic coding, the harm is review load. A human reviewer has to inspect every primitive-bearing patch for hidden domain assumptions. Did the agent use the canonical status strings? Did it preserve timezone meaning? Did it compare currency before adding amounts? Did it pass tenant IDs as strings to code that expects user IDs? The more meaning lives outside the types, the more supervision the human has to do by hand.
Primitive obsession also weakens future prompts. Agents read the local code as instruction. If the code teaches them that statuses are strings and money is a float, they will keep producing more of that shape. The antipattern becomes self-reinforcing.
The Way Out
Promote domain concepts into named types at the point where they first gain rules. The type does not have to be large. It only has to make the meaning explicit and keep invalid values from spreading.
Use four moves:
Name the concept. If a value has domain meaning, give it a domain name. EmailAddress, Money, OrderStatus, TenantId, and DeploymentEnvironment tell the next reader what the value is before they inspect its contents.
Constrain the value. Use enums for closed sets, value objects for structured values, and tagged unions for state that varies by case. Parse raw input once at the boundary, then pass the constrained type through the rest of the system.
Move behavior to the type. A Money type should know how to add money and reject mixed currencies. An EmailAddress type should validate format on construction. An OrderStatus type should expose allowed transitions. Don’t make every caller rediscover the rules.
Teach the agent the type. Put the domain types in the prompt context or instruction file before asking for code. “Use OrderStatus, not raw strings. New statuses must be added to the enum and the transition table. Don’t compare status values as text.” That instruction gives the agent a local convention to follow.
When reviewing an agent’s patch, search for new string, number, boolean, and dict fields. For each one, ask whether it is a real domain concept wearing a primitive costume. If yes, ask the agent to extract the named type before the shape spreads.
How It Plays Out
A subscription service stores plan tiers as strings: "free", "pro", and "enterprise". An agent adds a billing feature and checks for "paid" in one branch because the prompt said “paid plans.” Tests pass for the pro case but fail in production for enterprise customers. The team replaces the string with a PlanTier enum and exposes is_billable() on the type. Future code asks the domain question directly instead of guessing which strings imply payment.
A payments module passes amount: float and currency: string through twenty functions. One path adds 10.00 USD to 10.00 EUR because both are floats by the time they reach the accumulator. The fix is a Money value object that stores a decimal amount and currency together. Its add() method rejects mixed currencies unless a conversion step has already produced a common currency. The bug disappears because the invalid operation no longer has an easy expression.
A workflow engine models job state with booleans: started, finished, failed, cancelled. The agent asked to add retries writes a branch for failed && finished because the data shape permits it. The state machine never meant to allow that combination. The team replaces the booleans with a tagged union: Queued, Running, Succeeded, Failed(reason), Cancelled(by). The retry code becomes shorter because each case carries only the fields it can legally have.
Do not fix primitive obsession by wrapping every value blindly. PageNumber, RetryCount, and PercentComplete may earn names; a local loop index probably doesn’t. The test is domain meaning plus rules, not discomfort with primitives in general.
Consequences
Replacing primitives with domain types makes code easier to read and safer to change. The names carry the ubiquitous language into the implementation. Constructors and enums reject invalid values early. Agents given those types generate code that follows the model instead of inventing local conventions.
The cost is extra structure. Small systems can drown in tiny wrappers if every value becomes a class. Serialization also needs care: strict internal types still have to cross loose external boundaries such as JSON, forms, CSV files, databases, and tool outputs. That conversion code belongs at the boundary. Once the value is inside the system, it should carry its meaning with it.
The judgment call is timing. Extract too early and you create ceremony. Extract too late and the primitive shape spreads through APIs, tests, fixtures, and stored data. A practical rule: when the second validation check appears, or when a value needs a second field to make sense, promote the concept.
Related Patterns
Sources
- Martin Fowler and Kent Beck’s Refactoring: Improving the Design of Existing Code names primitive obsession as a code smell and gives the core remedy: replace loose data values with objects that carry behavior and meaning. Fowler’s online catalog entry for Replace Primitive with Object shows the small refactoring step behind the larger design move.
- Eric Evans’s Domain-Driven Design provides the domain-modeling frame this article uses: the important concepts in the domain should appear in the model, and the model should speak the team’s ubiquitous language. Domain Language’s DDD resources page is the stable public pointer for Evans’s book and surrounding work.
- Yaron Minsky’s Jane Street writing on Effective ML Revisited gives the type-design principle that makes this antipattern costly in practice: invalid states should be impossible to represent, not merely checked after they appear.