Exploratory Testing

Pattern

A reusable solution you can apply to your work.

Learn the system, design a probe, run it, and let what you observe decide what to probe next, all in the same short session.

Also known as: Session-Based Exploratory Testing (SBET), Charter-Based Testing

Understand This First

  • Test – the executable artifact that locks in what you already know; exploration looks for what you don’t.
  • Test Oracle – you still need a way to decide pass or fail, even when you didn’t plan the check in advance.

Context

You have a scripted test suite. Unit tests are green. Integration tests pass. A continuous integration run shows all lights blue. Then a user tries something nobody thought of and the whole thing falls over. This is a tactical pattern: a deliberate activity, not a substitute for automation, that catches the class of bug scripted tests are blind to.

The situation gets worse when an agent writes the code. Agents tend to produce tests that mirror the happy path they imagined, not tests that probe the seams of the system they actually built. You end up with a green suite and a fragile product. Exploratory testing is where a human closes that gap.

Problem

Scripted tests only check what you predicted. Every test you write is an assertion about behavior you already had in mind. But most interesting bugs live in territory nobody thought to look at: the timing window between two requests, the postal code the validator never saw, the stale session token that still technically parses. How do you find defects in a space too large and too surprising to enumerate in advance?

Forces

  • Writing scripts for every conceivable scenario is impossible and produces a test suite nobody can maintain.
  • Unstructured “clicking around” finds bugs by accident, but it’s slow, unreproducible, and invisible to the rest of the team.
  • Bug discovery depends on intuition about where the system is likely to fail, and intuition improves only when exercised.
  • Automation and exploration compete for the same tester hours; one without the other is incomplete.
  • Agent-generated code passes agent-generated tests, so agent-heavy workflows quietly narrow the territory the test suite covers.

Solution

Run time-boxed sessions against the system. Each session is driven by a charter that names the mission, scope, and risks to investigate, but leaves the specific steps open. Inside the session, form hypotheses about where the software might fail. Probe them, observe what happens, and use what you learn to decide what to try next. After the session, debrief: what was tested, what surprised you, what bugs were found, what new charters does this suggest?

The charter is the key artifact. It’s a paragraph, sometimes a sentence. “Explore the checkout flow with cart sizes between 50 and 500 items, focusing on pagination and timeout behavior.” It focuses attention without telling you what to click. Session length is usually 45 to 90 minutes: long enough to get into the flow, short enough to stay sharp.

Keep notes as you go: what you tried, what you saw, what you noticed in passing. These notes are the primary output, along with any defects you file. They let you pick up a follow-up session, hand the mission to a teammate, or turn a reproducible finding into a new scripted test.
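
The charter and session notes described above can be sketched as a small data structure. This is a minimal illustration, not a prescribed format; the field names and the `debrief` summary are assumptions chosen for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Charter:
    """A session charter: mission, scope, and risks -- not steps."""
    mission: str
    scope: str
    risks: list[str]
    timebox_minutes: int = 60

@dataclass
class SessionNotes:
    """Running notes kept during the session; the primary output."""
    charter: Charter
    observations: list[str] = field(default_factory=list)
    bugs: list[str] = field(default_factory=list)
    follow_up_charters: list[str] = field(default_factory=list)

    def debrief(self) -> str:
        """Summarize the session so the learning doesn't evaporate."""
        return (
            f"Mission: {self.charter.mission}\n"
            f"Bugs filed: {len(self.bugs)}\n"
            f"New charters proposed: {len(self.follow_up_charters)}"
        )
```

Keeping the charter as data rather than prose makes it trivial to hand a follow-up session to a teammate: copy the object, change the mission, keep the scope.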

Three disciplines keep exploratory testing from degenerating into aimless clicking:

  • Charters define the session. A session without a charter is a stroll. A charter without a session is a wish.
  • Debriefs close the session. Either in writing or in a short conversation, you summarize what happened. No debrief means the learning evaporates.
  • Oracles are explicit. Even when you didn’t plan a specific check, you decide before probing: if the next action produces X, call that a bug. A hunch is fine; an articulated hunch is better.
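
The third discipline, stating the oracle before probing, can be made concrete. The sketch below uses a deliberately buggy toy token checker (`validate_token` is an assumed stand-in, not a real API) to show the shape: articulate the hunch as a predicate, then run the probe against it.

```python
import time

def validate_token(token: dict) -> bool:
    # Buggy toy implementation: it parses the token but never checks
    # the expiry field -- exactly the kind of seam exploration finds.
    return "user" in token and "expires_at" in token

def probe(action, oracle, note):
    """One probe: run the action, judge the result with the
    oracle that was stated before the action ran."""
    observed = action()
    verdict = "ok" if oracle(observed) else "BUG"
    return f"{verdict}: {note} (observed {observed!r})"

stale = {"user": "alice", "expires_at": time.time() - 3600}

# Articulated hunch, decided before probing: a stale token that
# still technically parses must be rejected.
report = probe(
    action=lambda: validate_token(stale),
    oracle=lambda accepted: accepted is False,
    note="stale-but-parseable token is rejected",
)
```

The point is not the helper function; it is that the oracle lambda exists before the action runs, so "is this a bug?" was answered in advance.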

How It Plays Out

A tester charters a session on a new search feature: “Explore search with queries containing mixed scripts, emoji, and punctuation, for 60 minutes, focusing on ranking and pagination.” She doesn’t write a test plan. She types queries. The first Arabic query reverses the pagination arrows. A query with a combining diacritic returns zero results even though the same word without the mark returns three pages. Punctuation is handled inconsistently: a search for “C++” silently strips the pluses. None of these were in the original test suite. The debrief produces four bug reports and two new charters for next week.
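
The zero-results finding in the story above is typically a Unicode normalization mismatch: "é" can be a single precomposed code point (NFC) or "e" plus a combining accent (NFD), and a naive exact-match index treats the two as different strings. A minimal sketch of the bug class and one common fix (normalizing both index and query to one form):

```python
import unicodedata

nfc = "caf\u00e9"                         # "café" as one precomposed code point
nfd = unicodedata.normalize("NFD", nfc)   # "cafe" + U+0301 combining accent

assert nfc != nfd                                   # raw strings differ...
assert unicodedata.normalize("NFC", nfd) == nfc     # ...but it's the same text

def normalize_query(q: str) -> str:
    """Normalize to one form (and casefold) before matching,
    on both the indexing side and the query side."""
    return unicodedata.normalize("NFC", q).casefold()
```

A search stack that applies `normalize_query` on ingest but not on lookup (or vice versa) reproduces the "zero results with the mark, three pages without it" behavior exactly.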

A team ships a feature built by an AI agent. The agent wrote the code, wrote unit tests, and ran them. Everything is green. A developer charters a 45-minute session: “Explore the new export feature with files at the boundary of the size limit (large files, slightly over the limit, slightly under, and zero-byte files).” Within ten minutes he finds that a 0-byte file produces a corrupt download, and a file one byte over the limit silently truncates without warning. The agent hadn’t imagined those inputs, so the tests the agent wrote didn’t cover them.
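
The developer's charter above is classic boundary-value probing, and the probe loop is short enough to write down. The sketch below uses a toy `export` function that reproduces the two bugs from the story; the function, the limit, and the sentinel return value are assumptions for illustration, not the real feature's API.

```python
LIMIT = 10_000_000  # assumed export size limit, for illustration

def export(size_bytes: int) -> int:
    """Toy stand-in returning the exported byte count, seeded with
    the two bugs from the story: zero-byte input corrupts the
    download, and one-over-the-limit silently truncates."""
    if size_bytes == 0:
        return -1                    # corrupt download
    return min(size_bytes, LIMIT)    # silent truncation past the limit

# Probe the classic boundary set: zero, just under, at, just over.
# Oracle, stated in advance: the exported size equals the input size
# (anything else should have been a loud error, not a quiet result).
findings = []
for size in (0, LIMIT - 1, LIMIT, LIMIT + 1):
    out = export(size)
    if out != size:
        findings.append((size, out))
```

Ten minutes of this kind of loop is exactly how the session in the story surfaced both defects the agent's own tests missed.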

Tip

After an agent writes and tests a feature, charter a 30-minute exploratory session aimed at the seams: the boundaries between units the agent tested in isolation, the timing between events the agent didn’t simulate, and the inputs the agent’s happy-path tests didn’t include. You’ll find bugs faster than by reading the diff.

Pair testing has emerged as a natural extension. One tester drives while another observes and suggests angles. The driver focuses; the observer notices. An AI pair tester plays the same role — a second model running alongside the human, proposing inputs the human hasn’t tried, flagging response-time drift, and recalling similar defect classes from other parts of the codebase. The human keeps the agency; the model keeps the attention from drifting.

Consequences

Exploratory testing finds bugs that scripted tests never will, especially on the kinds of systems agents now produce at speed. It also builds tester expertise in a way scripted execution does not: every session teaches you something about how the product behaves under pressure.

The costs are real. Sessions require concentration and can’t be outsourced to the build server. The findings are only as good as the tester; a novice session covers less ground than an expert one. Reproducing a bug found during exploration sometimes takes as long as finding it. And the practice is hard to measure — “hours of exploration” is a weak metric compared to “tests passing,” so teams that only count what they can automate tend to underinvest.

The usual mistake is treating exploratory testing as the whole testing strategy or as a fallback for when automation is inconvenient. It’s neither. Scripted tests (and, above them, the Test Pyramid) hold the line on what you already know. Exploration finds what you don’t know yet. Teams need both.

  • Complements: Test Pyramid – the pyramid allocates effort across the kinds of tests you can script; exploration covers the cases scripts never reach.
  • Uses: Test Oracle – even an unplanned probe needs a way to decide whether the observed result is correct.
  • Finds: Failure Mode – each exploratory session tries to discover failure modes no scripted test predicted.
  • Goes beyond: Happy Path – exploration looks for the paths the original requirements implicitly excluded.
  • Feeds: Regression – bugs found by exploration become scripted tests that guard against regression.
  • Contrasts with: Red/Green TDD – TDD writes the check first and then the code; exploratory testing runs the code first and then decides what checks it deserved.
  • Guided by: Verification Loop – exploratory findings feed back into the loop that refines what the agent checks.
  • Informs: Eval – surprise failures found during exploration often become the hardest items in an eval set.

Sources

Cem Kaner coined the term “exploratory testing” in 1984 and developed it through the 1990s as a counterweight to heavyweight test-plan documents. James Bach and Michael Bolton refined the practice into Session-Based Test Management (SBTM) around 2000, introducing the charter as the unit of test design and the debrief as the mechanism for turning session notes into shared knowledge. Jonathan Bach’s original “Session-Based Test Management” paper (2000) is the canonical description of the session structure.

Elisabeth Hendrickson’s Explore It! (2013) is the most accessible book-length treatment for practitioners, organizing the activity around heuristics for where to probe and how to reason about results.

The AI pair-testing variant emerged from the agentic-coding community in 2025 and 2026 as a response to the flood of agent-generated code that passed its own tests; conference programs on testing AI-assisted code began describing it as a first-class practice during that period.

Further Reading

  • Cem Kaner, James Bach, and Bret Pettichord, Lessons Learned in Software Testing (2001) – a distilled set of heuristics from the testers who shaped the practice.
  • Elisabeth Hendrickson, Explore It!: Reduce Risk and Increase Confidence with Exploratory Testing (Pragmatic Bookshelf, 2013) – the clearest practical guide for anyone starting a session.
  • James Bach, “Exploratory Testing Explained” (satisfice.com) – the short essay that introduced the vocabulary most practitioners still use.