---
slug: risk-spike
type: pattern
summary: "A time-boxed, throwaway probe aimed at the single highest-risk unknown in a piece of work, run before you commit an agent to the full build."
created: 2026-06-16
related:
  research-plan-implement:
    relation: complements
    note: "The spike is the empirical move when research and planning alone can't settle a feasibility question."
  plan-mode:
    relation: complements
    note: "Plan mode reasons about an approach; a spike tests whether the riskiest part of that approach actually works."
  jagged-frontier:
    relation: detects
    note: "A spike is how you locate the frontier cheaply for one specific task instead of guessing whether the model can do it."
  tradeoff:
    relation: supports
    note: "A spike supplies the evidence a tradeoff decision turns on."
  judgment:
    relation: supports
    note: "A spike converts a guess into evidence, so judgment runs on data rather than hope."
  build-vs-dont-build-judgment:
    relation: supports
    note: "Spiking the riskiest unknown is how you earn the build/don't-build call before committing."
  yagni:
    relation: complements
    note: "Risk ordering applies YAGNI to investigation: don't prototype the easy parts if the hard part might not work."
  speculative-generality:
    relation: prevents
    note: "A spike tests the one load-bearing assumption instead of building speculative scaffolding around it."
  smoke-test:
    relation: contrasts-with
    note: "A smoke test checks that built software runs; a spike checks that an approach is viable in the first place."
  exploratory-testing:
    relation: contrasts-with
    note: "Exploratory testing probes finished software for surprises; a spike probes an unbuilt idea for feasibility."
  verification-loop:
    relation: contrasts-with
    note: "The verification loop confirms the agent's output is correct; a spike confirms the approach was worth building at all."
  greenfield-and-brownfield:
    relation: related
    note: "A spike is throwaway by contract; keeping spike code is the antipattern."
  task-horizon:
    relation: supports
    note: "Spiking the riskiest sub-task de-risks a long autonomous run before the agent commits to it."
  task-decomposition:
    relation: supports
    note: "Risk ordering tells you which decomposed sub-task to probe first."
  production-readiness-cliff:
    relation: mitigates
    note: "A spike surfaces the hard 20% early, before a slick demo hides it."
  specification:
    relation: upstream-of
    note: "A spike resolves the feasibility question before you commit effort to a specification."
---
# Risk Spike

> **Pattern**
>
> A named solution to a recurring problem.

> "Walking on water and developing software from a specification are easy if both are frozen."
> — Edward V. Berard

*A short, throwaway probe of the one unknown most likely to sink the work, run first, before you commit to the full build.*

*Also known as: Spike, Risk-Reduction Spike, Tracer Probe*

> **📝 Where the name comes from**
>
> "Spike" comes from Extreme Programming in the late 1990s. One way to picture it is a railroad spike: a single deep strike that drives through to bedrock, telling you what you're standing on before you lay the whole track. A spike isn't the feature; it's the one quick experiment that tells you whether the feature is even buildable the way you're imagining it.

## Understand This First

- [Tradeoff](tradeoff.md) — a spike supplies the evidence a tradeoff turns on.
- [Jagged Frontier](jagged-frontier.md) — the reason you often can't know whether the model can do a thing until it tries.

## Context

This is a **strategic** pattern, applied before a [Specification](specification.md) is committed and before you point an agent at a long build. You've got a task with a real unknown in it: an unfamiliar API, a toolchain nobody on the team has driven, a constraint you're not sure the approach can satisfy, a model capability you're not sure exists. Research narrows the question but can't close it. The only way to know is to try.

In pre-agent software work, spikes were a deliberate investment. You spent a day or two writing code to answer a single question, then threw it away. That cost made spikes something you reserved for the genuinely scary parts. Agents change the math. Throwaway code is now cheap to generate and cheap to discard, so the same cost-benefit reasoning that once rationed spikes now favors probing almost any load-bearing assumption before you build on it.

## Problem

Pointing an agent at a large, ambiguous task and letting it run is the fastest way to burn tokens, time, and trust on an approach that was never going to work. The failure rarely announces itself early. The agent produces plausible code, the demo looks fine, and the wall shows up three days in, when the unfamiliar API turns out not to support the operation you assumed, or the model can't reliably satisfy the one constraint the whole design rests on.

How do you find out whether an approach is viable for the cost of a short experiment, instead of the cost of a doomed build?

## Forces

- The riskiest unknown is usually not the most visible one; the parts that look hard are often routine, and the part that looks routine is what kills you.
- Research and planning reduce uncertainty but can't resolve a feasibility question that depends on a specific tool, model, or environment behaving a certain way.
- The [Jagged Frontier](jagged-frontier.md) means model capability is unpredictable per task: you can't reason your way to whether the agent can do something it has never been asked to do.
- A probe that lingers becomes load-bearing. Once spike code ships, it stops being an experiment and starts being technical debt.
- Running the cheap experiment up front feels slower than just starting the build, right up until the build collapses.

## Solution

**Identify the single highest-risk unknown, build the cheapest experiment that could show the approach is unworkable, run that first, and throw the code away once it has answered the question.**

The discipline has three rules.

**Order by risk.** Don't probe the easy parts. Find the assumption that, if false, kills the whole approach, and spike *that*. If the hard part doesn't work, you want to know on day one, not after you've built the easy 80% around it. A spike that confirms something you were already confident about is wasted motion.
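One way to make the ordering explicit is to write the assumptions down and sort them before choosing what to probe. The sketch below is a minimal illustration, not a prescribed method: the `Assumption` fields, the rough 0-to-1 `doubt` estimate, and the example entries are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    """One thing the design takes for granted (all fields are illustrative)."""
    name: str
    doubt: float          # rough guess, 0.0-1.0, that the assumption is false
    kills_approach: bool  # if it is false, does the whole approach die?


def pick_spike(assumptions: list[Assumption]) -> Assumption:
    """Return the assumption to spike first.

    Approach-killing assumptions always outrank survivable ones; within
    each group, the one you doubt most comes first.
    """
    return max(assumptions, key=lambda a: (a.kills_approach, a.doubt))


candidates = [
    Assumption("login form renders", doubt=0.05, kills_approach=False),
    Assumption("vector extension is fast enough", doubt=0.5, kills_approach=True),
    Assumption("CSV export handles unicode", doubt=0.6, kills_approach=False),
]

first = pick_spike(candidates)
print(first.name)
```

Note that the CSV export has the highest doubt but is not chosen: it can fail without taking the approach down with it, so it can wait.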

**Time-box it.** A spike has a deadline measured in minutes or a couple of hours, not days. The goal isn't a working feature; it's a yes-or-no answer to one question. When the box closes, you decide: the approach is viable, the approach is dead, or you need a different spike. With agents the box is often a single focused session: one prompt aimed at the scariest part of the task.
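If the experiment can be run as a script, the box can be enforced mechanically. The following is a sketch under one assumed convention: the throwaway script exits with status 0 when the answer is "yes" and non-zero when it is "no". The function name `run_spike` and the verdict strings are hypothetical.

```python
from __future__ import annotations

import subprocess
import sys


def run_spike(cmd: list[str], box_seconds: float) -> str:
    """Run a throwaway experiment and map the outcome to one of three verdicts.

    Assumed convention: exit status 0 means the question was answered "yes",
    any other status means "no".
    """
    try:
        result = subprocess.run(cmd, timeout=box_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # The box closed without an answer. Re-scope the question
        # rather than extending the deadline.
        return "need a different spike"
    return "viable" if result.returncode == 0 else "dead"


# A stand-in experiment that succeeds immediately.
print(run_spike([sys.executable, "-c", "raise SystemExit(0)"], box_seconds=30))
```

A non-zero exit can also mean the probe itself was buggy, so a "dead" verdict deserves a quick look at the output before the approach is abandoned.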

**Throw it away.** Spike code is disposable by contract. It skips error handling, ignores edge cases, hardcodes whatever it needs, and exists only to answer the question. Keeping it is the antipattern: you inherit code written to a throwaway standard and treat it as a foundation. Answer the question, record what you learned, and discard the code. The knowledge is the deliverable, not the prototype.
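The throwaway contract can be made structural by running the spike in a scratch directory that deletes itself, so that only the written finding survives. This is a minimal sketch; the `SpikeFinding` record and its field names are hypothetical, and the finding shown is a placeholder rather than a real result.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class SpikeFinding:
    """What survives the spike (field names are illustrative)."""
    question: str
    verdict: str   # "viable", "dead", or "need a different spike"
    evidence: str  # the observation that settled it


with tempfile.TemporaryDirectory() as scratch:
    scratch_path = Path(scratch)
    # Throwaway code lives only here: hardcoded values, no error handling.
    (scratch_path / "probe.py").write_text("print('spike')\n")

    # Placeholder finding; a real spike records what it actually observed.
    finding = SpikeFinding(
        question="Does the approach survive its riskiest assumption?",
        verdict="viable",
        evidence="placeholder: describe the observation here",
    )

# Leaving the block deleted the directory and everything in it.
note = json.dumps(asdict(finding), indent=2)
print(note)
print(scratch_path.exists())
```

The design choice is that keeping the code would take deliberate effort, while keeping the knowledge takes none.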

With an agent, two things change the economics of a spike. First, it is nearly free: the cost of generating throwaway code has collapsed, so the experiment that used to cost an afternoon now costs one prompt. Second, it is often the *only* way to locate the frontier for a specific task: instead of guessing whether the model can drive an unfamiliar toolchain, you ask it to, watch what happens, and find out for the price of a discarded attempt.

## How It Plays Out

A founder wants to add semantic search to a product and assumes the database's new vector extension will handle it. Before committing an agent to wire search through the whole stack, she spends twenty minutes spiking the riskiest part: she has the agent stand up the extension, load a few hundred rows, and run one nearest-neighbor query. It works, but the query takes 900 milliseconds on 400 rows, far too slow for the real corpus. The spike cost twenty minutes and killed a design that would have taken three days to build before anyone discovered it was broken. She throws the spike code away and writes the [Specification](specification.md) around a dedicated vector store instead.
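The measurement in a spike like this can be a few lines of timing code. The sketch below shows one possible shape, not what the founder actually ran: `stand_in_query` is a placeholder for the real nearest-neighbor call, and the 100 ms budget is a hypothetical figure, since the story does not state one.

```python
import time
from typing import Callable


def median_latency_ms(query: Callable[[], object], runs: int = 5) -> float:
    """Time `query` several times and return the median latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]


def latency_verdict(latency_ms: float, budget_ms: float) -> str:
    """Turn a measurement into the spike's yes-or-no answer."""
    return "viable" if latency_ms <= budget_ms else "dead"


def stand_in_query() -> None:
    """Placeholder for the real nearest-neighbor call against the database."""
    sum(i * i for i in range(10_000))


BUDGET_MS = 100.0  # hypothetical latency budget; the story does not state one

measured = median_latency_ms(stand_in_query)
print(f"stand-in query: {measured:.2f} ms -> {latency_verdict(measured, BUDGET_MS)}")

# The figure from the story, checked against the same hypothetical budget.
print(latency_verdict(900.0, BUDGET_MS))  # prints "dead"
```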

A team is migrating a service to a new framework and isn't sure the agent can satisfy a hard constraint: every request must carry a trace ID through three internal hops. Rather than reason about it, they spike it. One prompt, one throwaway endpoint, three hops, check the logs. The trace survives. Now the [build-vs-don't-build judgment](build-vs-dont-build-judgment.md) runs on evidence instead of hope, and the long autonomous run that follows is de-risked at its single scariest point.
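The "check the logs" step can itself be a one-function assertion. The sketch below assumes the logs have already been parsed into records with `hop` and `trace_id` keys; that record shape, the hop names, and the trace ID value are all hypothetical.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def trace_survived(records: Iterable[Mapping[str, str]],
                   trace_id: str,
                   expected_hops: set[str]) -> bool:
    """True if every expected hop logged the given trace ID at least once."""
    hops_seen = {r.get("hop") for r in records if r.get("trace_id") == trace_id}
    return expected_hops <= hops_seen


# Hypothetical parsed log records from one request through three hops.
records = [
    {"hop": "gateway", "trace_id": "abc123"},
    {"hop": "orders", "trace_id": "abc123"},
    {"hop": "billing", "trace_id": "abc123"},
]

print(trace_survived(records, "abc123", {"gateway", "orders", "billing"}))  # prints True
```

If any hop drops or regenerates the ID, the subset check fails and the spike has its answer.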

> **💡 Spiking the frontier**
>
> When you don't know whether an agent can handle an unfamiliar API or toolchain, don't ask it to build the feature. Ask it to do the one hardest thing the feature depends on, in isolation, with no surrounding code. You'll locate the [Jagged Frontier](jagged-frontier.md) for that task in a single session — and you'll know whether to commit before you've spent anything you'd regret losing.

> **⚠️ Warning**
>
> A spike that you keep is no longer a spike. The moment throwaway code becomes the basis for real work, you've inherited code written to a throwaway standard and called it a foundation. If the spike taught you the approach is viable, write the real thing from scratch with what you learned. Don't promote the prototype.

## Consequences

**Benefits.** A spike converts an expensive unknown into cheap knowledge before any of the expense is sunk. It surfaces the hard part early, which is exactly where the [Production-Readiness Cliff](production-readiness-cliff.md) hides: the slick demo that conceals the load-bearing 20% that doesn't work yet. It gives [judgment](judgment.md) and [tradeoff](tradeoff.md) decisions real evidence to run on. And with agents it's nearly free, so the discipline that used to be reserved for the scariest unknowns now applies to almost any assumption your design rests on.

**Liabilities.** Running the experiment first feels slower than diving into the build, and the impatience feels justified whenever the unknown turns out fine. Risk ordering takes judgment of its own: spike the wrong unknown and you've answered a question that wasn't going to kill you while the real risk waits untouched. And the throwaway contract demands discipline most teams find hard to keep. The temptation to keep working code, even bad working code, is strong, and a spike that quietly becomes production code is worse than no spike at all.

## Sources

The spike originated in the Extreme Programming community in the late 1990s, where Kent Beck and Ward Cunningham used "spike solution" for a quick throwaway program written to answer a single technical question. The risk-first variant, ordering spikes by what is most likely to kill the approach, was sharpened in the broader agile planning tradition. The epigraph is from Edward V. Berard's *[Essays on Object-Oriented Software Engineering](https://openlibrary.org/works/OL4301697W)* (1993). The agent-native framing here, that throwaway code is now nearly free and that a spike is often the only way to probe the [Jagged Frontier](jagged-frontier.md) for a specific task, is this Encyclopedia's own.
