Smell (Code Smell)

A code smell is a surface feature of working code that hints at a deeper design problem; the word gives a reviewer fast vocabulary for the structural intuitions they already have.

Concept

Vocabulary that names a phenomenon.

“A code smell is a surface indication that usually corresponds to a deeper problem in the system.” — Martin Fowler

What It Is

A code smell is a recognizable feature in source code that suggests, but does not prove, a design problem. The code compiles. The tests pass. Something about the structure still makes it harder to understand, change, or extend than it should be. The metaphor is deliberate: a smell is a clue, not a verdict. You notice it, you investigate, and you decide whether there’s something rotten or whether the room is just unusual.

Kent Beck coined the term in the late 1990s while helping Martin Fowler with the book that became Refactoring. The point of the word was to give developers a shared name for the structural intuitions they already had. Before the vocabulary, a reviewer who looked at a 200-line function and felt uneasy had only “this looks bad” to work with. After the vocabulary, they could say “this is a Long Method” and point at a known catalog of remedies. The named smells in the original Refactoring catalog (Long Method, Feature Envy, Shotgun Surgery, Primitive Obsession, Duplicated Code, Large Class, and more than a dozen others) became the working vocabulary of code review for the next two decades.

A smell is not a bug. A bug is wrong behavior; a smell is a structural feature that makes the next bug more likely or the next change more expensive. A smell is not a rule violation, either. A linter flags rule violations and you fix them mechanically; a smell needs a human (or an agent under human supervision) to decide whether it’s actually a problem in this code, on this team, at this point in the system’s life. The whole point of treating smells as heuristics rather than rules is that the answer is sometimes “yes, this is fine.”

It helps to keep three close-but-distinct ideas separate, because the conversation tangles when they get conflated:

  • A smell is a surface feature you observe: a long function, a duplicated block, a primitive type used where a domain type would be clearer.
  • The underlying design problem is the structural weakness the smell hints at: one responsibility split across two places, two responsibilities crammed into one, missing abstraction, leaky boundary.
  • The remedy is the refactoring that, if applied, would remove the smell by addressing the underlying problem: Extract Function, Move Method, Replace Primitive with Object, Consolidate Duplicate Code.

The smell is the cheap part: it’s what a reviewer sees in seconds. The diagnosis (what design problem is the smell pointing to?) and the prescription (which remedy actually helps?) are the expensive parts that require judgment.

Why It Matters

Design problems rarely announce themselves. A function that’s slightly too long works today. A class with one too many responsibilities passes all its tests. The damage is cumulative: each small compromise makes the next change a little harder, until the codebase becomes resistant to modification. By the time someone says “we need to rewrite this,” the cost is enormous. Smells are the early-warning system that lets a team intervene while interventions are still cheap.

Without the vocabulary, the conversation that should happen during code review either doesn’t happen or happens vaguely. A reviewer who feels something is wrong but can’t name it tends either to wave the change through (“I don’t have a concrete objection”) or to push back in ways that read as personal taste (“I just don’t like it”). A reviewer who can name the smell can be specific: this is Feature Envy, this method belongs on the other class, the symptom is that it reaches across three accessors to do its work. The conversation moves from “I don’t like this” to “here’s a known structural issue and here’s the known remedy,” which is a conversation the author can actually engage with.

The vocabulary matters more under agentic coding, not less. An agent generates code prolifically, and not all of it is well-structured. A reviewer supervising agent output is reviewing more code per hour than they ever did when humans wrote everything, and the supervisor’s job is to spot the structural issues that the agent can’t see in its own output. A reviewer working from a vocabulary of named smells can scan an agent’s pull request and surface “Long Method on process_order, Feature Envy on validate_customer, Primitive Obsession around the address fields” in seconds, then ask the agent to refactor with specific instructions. A reviewer without the vocabulary either passes the structurally weak code through or stalls trying to articulate the problem from scratch.

The other reason the vocabulary matters under agentic coding: agents have characteristic smells. A model trained on a wide range of codebases tends to produce certain shapes more often than human-written code does: overly elaborate class hierarchies, defensive validation duplicated at every layer, primitive obsession around domain values that “should” be typed. Knowing the named smells means you also know which ones to look for first when the author is a model rather than a person, and you can write review checklists that target them.

How to Recognize It

The original Refactoring catalog and the community that grew up around it have given the field a working vocabulary of named smells. The ones that come up most often, with the structural problem each one usually points to:

  • Long Method / Long Function. A function does so many things you can’t hold it in your head. The function is usually doing two or three conceptually separate things that should be named, extracted, and called from the original.
  • Feature Envy. A method uses more data from another class than from its own. It probably belongs on the other class, where the data lives.
  • Shotgun Surgery. A single conceptual change requires edits in many files. The related logic is scattered and should be consolidated into a place that owns it.
  • Primitive Obsession. Raw strings, integers, or booleans appear where a domain type would be clearer. Money is a float. An email address is a str. A user ID and a product ID are both int. See Make Illegal States Unrepresentable.
  • Duplicated Code. The same logic appears in two or more places. When one copy gets fixed, the others don’t.
  • God Object / God Class. A single class knows too much and does too much. It violates Separation of Concerns — usually because nobody pushed back on the first three responsibilities that landed there, and now the fourth feels normal.
  • Speculative Generality. Code carries abstractions, hooks, or configuration knobs that no caller uses. The author was building for an imagined future that hasn’t arrived, and may never.
  • Comments as Apology. A comment explains why a confusing piece of code is the way it is, in detail. The code wants to be refactored into a shape that doesn’t need the comment.
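As one illustration of the list above, the Primitive Obsession cases (money as a float, an email address as a str, two kinds of ID that are both int) can be sketched with small domain types. This is a minimal sketch, not a recommended library design: Money, EmailAddress, UserId, and ProductId are hypothetical names, the email check is deliberately crude, and NewType only helps when a static type checker is in use.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import NewType

# Hypothetical ID types. A static type checker treats them as distinct,
# so passing a ProductId where a UserId is expected gets flagged.
UserId = NewType("UserId", int)
ProductId = NewType("ProductId", int)


@dataclass(frozen=True)
class Money:
    """An amount in a single currency, stored as an exact decimal."""
    amount: Decimal
    currency: str

    def __add__(self, other: "Money") -> "Money":
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.amount + other.amount, self.currency)


@dataclass(frozen=True)
class EmailAddress:
    """A string that has passed a deliberately minimal shape check."""
    value: str

    def __post_init__(self) -> None:
        local, at, domain = self.value.partition("@")
        if not (local and at and domain):
            raise ValueError(f"not an email address: {self.value!r}")


# The primitive version: binary floats cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# The typed version keeps the arithmetic exact and rejects mixed currencies.
total = Money(Decimal("0.10"), "USD") + Money(Decimal("0.20"), "USD")
```

Whether types like these are worth introducing is still a judgment call; the sketch only shows what the remedy tends to look like once the smell has been confirmed as a real problem.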

A few signs that the named smells are doing their work in practice:

  • A reviewer points at a section of code and names the smell without searching for it. The vocabulary has become reflexive.
  • A pull request comment cites a specific smell and a specific known remedy (“Long Method → Extract Function, suggest pulling the validation block into validate_payload”). The conversation moves past taste.
  • Code review checklists include smells the team has decided are worth scanning for every time. The vocabulary becomes infrastructure.
  • An agent’s prompt names the smells the reviewer wants caught (“flag any Long Method over 50 lines, any Primitive Obsession around Money or Address”). The vocabulary is shared with the model.
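A checklist item such as "flag any Long Method over 50 lines" can be partly automated. The sketch below uses Python's standard ast module to list functions above a line threshold; find_long_functions and its max_lines parameter are hypothetical names, and the output is only a pointer to where a reviewer might look, not a verdict on the code.

```python
import ast


def find_long_functions(source: str, max_lines: int = 50) -> list[tuple[str, int]]:
    """Return (name, line_count) for each function longer than max_lines.

    Line counts run from the `def` line to the last line of the body,
    so blank lines and comments inside the function are included.
    """
    tree = ast.parse(source)
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                findings.append((node.name, length))
    # Longest first, so the most likely candidates lead the list.
    return sorted(findings, key=lambda item: item[1], reverse=True)


# A synthetic module: one short function and one with a 60-line body.
sample = "def short():\n    return 1\n\ndef long_one():\n" + "    x = 0\n" * 60
long_functions = find_long_functions(sample)
```

A length threshold catches only the crudest form of the smell. A long function that reads clearly would still be flagged, and a short one doing three unrelated things would not, so the list is a starting point for review rather than a replacement for it.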

Note

When reviewing agent-generated code, check for these common smells first: overly elaborate class hierarchies (the agent reached for enterprise patterns), duplicated validation logic (the agent didn’t extract a shared function), and primitive obsession (strings used where typed values would be safer). Agents rarely produce true god classes on their own, but they frequently produce long methods and feature envy.

Smells are heuristics, not rules. A long function that reads clearly and does one conceptual thing may not need refactoring. A small amount of duplication may be preferable to a bad abstraction. The smell tells you where to look; your judgment decides what to do.

How It Plays Out

A developer reviews an agent’s pull request and notices a 200-line function in the diff. The function works; all the tests pass. The developer recognizes the Long Method smell and asks the agent to refactor it into smaller functions with descriptive names. The refactored version is easier to test, easier to read, and reveals a subtle boundary between two responsibilities that the long version had blurred. The reviewer didn’t need to articulate that boundary from first principles; the smell pointed at it and the refactoring exposed it.
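A much smaller version of that refactoring might look like the sketch below. The order structure and all function names are hypothetical; the point is only that each extracted function names one conceptual step, and that behavior is unchanged.

```python
# Before: one function validates, totals, and formats.
def process_order_long(order: dict) -> str:
    if not order.get("items"):
        raise ValueError("order has no items")
    for item in order["items"]:
        if item["quantity"] <= 0:
            raise ValueError("quantity must be positive")
    total_cents = 0
    for item in order["items"]:
        total_cents += item["unit_price_cents"] * item["quantity"]
    return f"Order {order['id']}: {total_cents / 100:.2f}"


# After: the same behavior, with each step extracted and named.
def validate_order(order: dict) -> None:
    if not order.get("items"):
        raise ValueError("order has no items")
    for item in order["items"]:
        if item["quantity"] <= 0:
            raise ValueError("quantity must be positive")


def order_total_cents(order: dict) -> int:
    return sum(item["unit_price_cents"] * item["quantity"] for item in order["items"])


def format_summary(order_id: int, total_cents: int) -> str:
    return f"Order {order_id}: {total_cents / 100:.2f}"


def process_order(order: dict) -> str:
    validate_order(order)
    return format_summary(order["id"], order_total_cents(order))


sample_order = {
    "id": 7,
    "items": [
        {"unit_price_cents": 250, "quantity": 2},
        {"unit_price_cents": 199, "quantity": 1},
    ],
}
```

After the extraction, validation and pricing can be tested separately, which is where a blurred boundary between responsibilities usually becomes visible.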

A team notices that every time they add a new payment type, they have to change code in seven files. They recognize the Shotgun Surgery smell and consolidate the payment logic into a single module with a clear extension point. Future payment types require changes in one place. The conversation that produced the consolidation took five minutes once someone named the smell; before the vocabulary, the same team had been quietly absorbing the cost for a year.

A senior engineer is pairing with a junior on a code review. The junior says “this looks bad but I’m not sure why.” The senior points at the same function and says “Feature Envy: this method reads four fields from Customer and writes to one of them, but it’s defined on Order. It belongs on Customer.” The junior sees it now, and in their next code review they spot the same shape in a different class without prompting. The vocabulary is doing the teaching.
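The shape the senior describes can be sketched as follows. The loyalty-points rule and every field name here are invented for illustration; only the structure (a method on Order that reads four Customer fields and writes one) comes from the scenario.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    tier: str
    years_active: int
    lifetime_spend_cents: int
    loyalty_points: int = 0

    def award_points(self, order_total_cents: int) -> None:
        """After: the rule lives next to the data it reads and writes."""
        multiplier = 2 if self.tier == "gold" else 1
        if self.years_active >= 5:
            multiplier += 1
        if self.lifetime_spend_cents >= 100_000:
            multiplier += 1
        self.loyalty_points += multiplier * (order_total_cents // 100)


@dataclass
class Order:
    customer: Customer
    total_cents: int

    def award_points_envious(self) -> None:
        """Before: defined on Order, but nearly every line touches Customer."""
        c = self.customer
        multiplier = 2 if c.tier == "gold" else 1
        if c.years_active >= 5:
            multiplier += 1
        if c.lifetime_spend_cents >= 100_000:
            multiplier += 1
        c.loyalty_points += multiplier * (self.total_cents // 100)

    def award_points(self) -> None:
        """After: Order passes its one relevant value and delegates."""
        self.customer.award_points(self.total_cents)
```

Both versions compute the same result. The difference is that a change to the points rule now touches Customer alone, and Order no longer needs to know which customer fields the rule depends on.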

Example Prompt

“This function is 200 lines long. Refactor it into smaller functions with descriptive names. Each one should do a single conceptual thing. Run the tests after each extraction to make sure nothing breaks. If the extraction reveals that two of the pieces really belong on a different class, flag that and propose where they should move.”

Consequences

A shared vocabulary of smells makes code review sharper. Instead of vague discomfort (“something feels off”), a reviewer can name the issue and point to a known remedy. Smells caught early are cheap to fix; smells ignored compound over time and become the structural debt that eventually forces a rewrite.

Benefits. The vocabulary teaches. A team that uses named smells in code review trains its members to spot the same shapes elsewhere, and the next reviewer doesn’t have to relearn the structural intuitions from scratch. The vocabulary also bounds the conversation: “this is Feature Envy” is something the author can engage with, where “I just don’t like it” isn’t. Under agentic coding, the named smells become a shared checklist between the human reviewer and the agent: prompts can name the smells the reviewer wants caught, and the agent can be asked to self-flag its own output for them.

Liabilities. Smells can be applied dogmatically. Not every Long Method is actually doing too much; some long functions are long because the work genuinely is. Not every duplication is bad; sometimes two pieces of code that look similar are responding to genuinely different requirements and forcing them together is the worse outcome. A team that treats the smell catalog as a rulebook rather than a heuristic will spend cycles refactoring code that didn’t need it and producing the wrong abstraction in its place. The smell tells you where to look; the smell does not tell you what to do.

Refactoring without purpose is the most common failure mode. A team that’s just learned the vocabulary tends to want to fix every smell they see. Stable, rarely-changed, well-tested code with mild smells is usually fine where it is; the return on refactoring it is low. The best use of the vocabulary is to prioritize: focus on smelly code that’s also frequently modified or actively painful to extend. That’s where the return on refactoring is highest, and the case for spending the time is clearest.
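One way to make that prioritization concrete is to rank files by how smelly they are and how often they change. The sketch below assumes a simple product of the two as the score, which is an illustrative choice rather than an established metric; FileStats, refactoring_priority, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    path: str
    smell_count: int        # e.g. findings noted during review
    commits_last_year: int  # e.g. counted from version-control history


def refactoring_priority(files: list[FileStats], min_commits: int = 1) -> list[FileStats]:
    """Rank files that are both smelly and actively changed.

    Files with no smells, or with fewer than min_commits changes, are
    left out: stable code with mild smells is usually fine where it is.
    """
    candidates = [
        f for f in files
        if f.smell_count > 0 and f.commits_last_year >= min_commits
    ]
    return sorted(
        candidates,
        key=lambda f: f.smell_count * f.commits_last_year,
        reverse=True,
    )


# Hypothetical data for illustration only.
sample_files = [
    FileStats("legacy_report.py", smell_count=9, commits_last_year=0),
    FileStats("utils.py", smell_count=1, commits_last_year=12),
    FileStats("billing.py", smell_count=4, commits_last_year=30),
    FileStats("api.py", smell_count=0, commits_last_year=50),
]
ranked = refactoring_priority(sample_files)
```

In this sample the smelliest file, legacy_report.py, drops out entirely because nobody has touched it, while billing.py leads the list because it is both smelly and frequently modified.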

Sources

  • Kent Beck coined the term “code smell” in the late 1990s while helping Martin Fowler with Refactoring. The metaphor — something that doesn’t look wrong but smells wrong — gave developers a shared vocabulary for structural intuition.
  • Ward Cunningham’s WikiWikiWeb (c2.com, also called WardsWiki) is where the concept was first discussed publicly. The CodeSmell page there served as the community’s working notebook through the late 1990s and early 2000s and seeded much of the refactoring vocabulary that later appeared in print.
  • Martin Fowler and Kent Beck catalogued twenty-two code smells and their remedies in Refactoring: Improving the Design of Existing Code (Addison-Wesley, 1999, with contributions from Kent Beck, John Brant, William Opdyke, and Don Roberts; 2nd ed. 2018). Chapter 3, “Bad Smells in Code,” co-authored with Beck, remains the canonical reference for the concept.
  • Martin Fowler’s bliki entry CodeSmell (martinfowler.com, 2006) is the source of the epigraph’s surface-indication definition and the short-form treatment most practitioners quote today.
  • Arthur Riel formalized the “God Class” anti-pattern in Object-Oriented Design Heuristics (Addison-Wesley, 1996), identifying the tendency of procedural-minded developers to concentrate behavior in a single controller class.

Further Reading

  • Sandi Metz and Katrina Owen, 99 Bottles of OOP (2nd ed., 2020) — a practical demonstration of identifying and addressing smells through incremental refactoring.