Smell (AI Smell)

Concept

A foundational idea to recognize and understand.

Understand This First

  • Human in the Loop – AI smell detection is a human capability that agents can’t reliably perform on their own output.

Context

At the heuristic level, an AI smell is a surface pattern in model-generated output that suggests the content was produced for plausibility rather than understanding. Just as a code smell hints at a structural problem in human-written code, an AI smell hints that the model is pattern-matching from training data rather than reasoning about the specific problem at hand.

This pattern is unique to the agentic coding era. As AI agents take on more of the work of writing code, documentation, and tests, the humans directing them need a vocabulary for recognizing when the output looks right but isn’t right. An AI smell doesn’t prove the output is wrong, but it raises a flag worth investigating.

Problem

How do you tell the difference between AI output that reflects genuine understanding of your problem and output that merely resembles correct answers?

Large language models generate text by predicting plausible continuations. This means they produce output that reads fluently and follows conventions, even when the content is factually wrong, logically inconsistent, or disconnected from your specific context. The danger isn’t obvious garbage; it’s confident, well-formatted, subtly incorrect work that passes a casual review.

Forces

  • Fluency masks errors. Well-written prose and clean code formatting create an illusion of correctness.
  • Confidence is uniform. The model doesn’t signal uncertainty. A hallucinated fact reads with the same tone as a verified one.
  • Volume overwhelms review. When an agent produces a thousand lines of code, the reviewer’s attention is finite.
  • Familiarity bias skews judgment. Reviewers accept output that matches patterns they recognize, even when those patterns don’t fit the current context.

Solution

Develop the habit of scanning AI output for these common AI smells:

Plausible but fabricated references. The agent cites a function, API, library version, or configuration option that doesn’t exist. It looks real because it follows naming conventions, but it was confabulated from training patterns.
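
Some fabrications of this kind can be caught mechanically. As a minimal sketch (the `descending=True` call is a hypothetical example of agent output, not from this article), Python's `inspect.signature` can confirm whether a keyword argument an agent used actually exists on the function it's passed to:

```python
import inspect

# Hypothetical agent output: sorted(items, descending=True)
# "descending" follows naming conventions, but sorted() has no such parameter.
sig = inspect.signature(sorted)

assert "descending" not in sig.parameters  # confabulated keyword
assert "reverse" in sig.parameters         # the real parameter
```

This catches only one narrow class of fabrication; nonexistent endpoints, options, and library versions still need checking against the real documentation.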

Symmetry without substance. The agent produces a beautifully parallel structure (three examples, each with the same format) but the examples don’t actually illustrate different things. The structure is decorative, not informative.

Confident hedging. Phrases like “this is generally considered best practice” or “most developers agree” that sound authoritative but commit to nothing. The model is averaging across its training data rather than making a specific claim.

Cargo-cult patterns. The agent applies a design pattern (dependency injection, observer pattern, middleware chain) because it frequently appears in similar codebases, not because the current problem requires it. The pattern is structurally present but serves no purpose. See YAGNI.
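
As an illustrative sketch (all names here are hypothetical), the smell in miniature: a strategy interface and a factory wrapped around logic that never needed either.

```python
from abc import ABC, abstractmethod

class GreeterStrategy(ABC):            # abstraction with exactly one implementation
    @abstractmethod
    def greet(self, name: str) -> str: ...

class DefaultGreeter(GreeterStrategy):
    def greet(self, name: str) -> str:
        return f"Hello, {name}"

class GreeterFactory:                  # factory that can only ever build one thing
    def create(self) -> GreeterStrategy:
        return DefaultGreeter()

# Everything above reduces to:
def greet(name: str) -> str:
    return f"Hello, {name}"

assert GreeterFactory().create().greet("Ada") == greet("Ada")
```

The pattern is structurally present and the code runs, but no second strategy exists or is planned, so the indirection only adds reading cost.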

Shallow error handling. The agent wraps code in try/catch blocks or adds error returns, but the handling logic is generic: logging the error and re-throwing, or returning a default value that’s never correct. It looks like the code handles errors, but it actually suppresses them.
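
A hedged sketch of this smell (the function and its default are hypothetical): the structure of error handling is present, but the handler hides failures instead of surfacing them.

```python
import json
import logging

def load_config(path: str) -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except Exception:                  # smell: blanket catch of every error
        logging.exception("config load failed")
        return {}                      # smell: a default that is never correct

# A missing file and a corrupt file now look identical to every caller:
assert load_config("/no/such/file.json") == {}
```

The caller can no longer distinguish "no config" from "broken config" from "wrong path", which is usually worse than letting the exception propagate.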

Tests that test the implementation. The agent writes tests that mirror the code’s structure rather than its requirements. The tests pass, but they’d also pass if the code were subtly wrong because they’re testing what the code does rather than what the code should do.

Agent Struggle as a Code Quality Signal

The smells above are all about problems in the agent’s output. But there’s an inverse worth knowing: when the agent struggles with existing code, that struggle itself is a signal about your codebase.

If an agent repeatedly introduces bugs in a particular module, misunderstands the control flow, or asks clarifying questions about the same area, that module likely has poor Local Reasoning properties. Hidden state, implicit conventions, tangled dependencies – the same things that trip up a new team member will trip up an agent, only faster and more visibly. The agent acts as a canary: its confusion reveals structural problems that experienced developers have learned to work around but never fixed.

This reframes agent failure. Instead of asking “why is the agent so bad at this?” ask “what is it about this code that makes it hard to work with?” A codebase where agents perform well is usually a codebase where humans perform well too.

Warning

The most dangerous AI smell is code that works perfectly for the test cases the agent generated alongside it. Always verify that agent-written tests reflect your requirements, not the agent’s own implementation choices. Write at least a few tests yourself to anchor the suite in real expectations.

How It Plays Out

A developer asks an agent to integrate with a third-party API. The agent produces a clean client library with methods for every endpoint, complete with type definitions and error handling. The developer notices the base URL is wrong, two of the endpoints don’t exist, and the authentication header uses a format the API doesn’t support. The code looks like a professional API client because the model has seen thousands of them, but it was generated from plausibility, not from the actual API documentation.

A team reviews agent-generated documentation and notices that every function’s docstring follows the same template: “This function takes X and returns Y. It handles Z errors gracefully.” The descriptions are fluent but generic. They describe what the function signature already says, not what the function’s purpose or edge cases are. The documentation passes a superficial review but adds no value.

A team notices that agents consistently produce broken code in their billing module. Every modification requires multiple correction cycles. At first they blame the agent, but a new hire reports the same experience: the module has undocumented coupling to three other systems, configuration values that change meaning depending on the time of day, and variable names inherited from a system retired two years ago. The agent’s struggle wasn’t a failure of AI – it was a readout of accumulated technical debt.

Example Prompt

“Review the API client you just generated. Check that every endpoint URL, request field, and authentication header matches the documentation I provided. Flag anything you inferred rather than read from the docs.”

Consequences

Recognizing AI smells makes you a more effective director of AI agents. You learn to trust but verify, accepting the agent’s productivity while maintaining the critical eye that catches plausible nonsense before it reaches production.

The cost is vigilance. Smell detection requires reading AI output carefully, which partially offsets the speed advantage of using agents. Over time, you develop a calibrated sense of when to trust and when to probe, but the initial learning curve requires slowing down and checking more than feels necessary.

There’s also a social dimension: teams need to normalize questioning AI output without treating it as a failure of the agent or the person who prompted it. AI smells are inherent to how models work, not evidence of bad prompting.

  • Refines: Smell (Code Smell) — AI smells extend the smell concept to model-generated output.
  • Uses: YAGNI — cargo-cult patterns in AI output are a form of speculative generality.
  • Enables: Verification Loop — recognizing AI smells motivates systematic verification of agent output.
  • Depends on: Human in the Loop — AI smell detection is a human capability that agents can’t reliably perform on their own output.
  • Informed by: Local Reasoning — code that resists local reasoning causes agents to struggle, making the agent’s difficulty a diagnostic signal.

Sources

  • Kent Beck coined the term “code smell” in the late 1990s while collaborating with Martin Fowler on Refactoring: Improving the Design of Existing Code (1999). The metaphor of surface symptoms hinting at deeper structural problems is the foundation this article extends to AI-generated output.
  • Wikipedia editors compiled “Signs of AI Writing” (2025), a field guide cataloging recurring patterns in AI-generated text observed across thousands of edits. Many of the specific smells described here — confident hedging, symmetry without substance, plausible fabrication — align with patterns the guide documents.
  • Adam Tornhill and the CodeScene team published “AI-Ready Code: How Code Health Determines AI Performance” (2026), demonstrating empirically that AI agents produce more defects in unhealthy code. Their research supports the “agent struggle as code quality signal” framing: when agents fail repeatedly in a module, the code’s structural health is often the root cause.