Dark Factory

Pattern

A reusable solution you can apply to your work.

A Dark Factory is a software operating model in which coding agents write, test, and ship production code with no human writing or reviewing the code itself; humans set the goals, scenarios, and constraints and let the factory run.

“Code must not be written by humans. Code must not be reviewed by humans.” — StrongDM Engineering, public manifesto (2026)

Also known as: Software Factory, Lights-Out Coding, Level 4 / Level 5 Agentic Development

Understand This First

  • Bounded Autonomy – the governance model at the opposite end of the spectrum; Dark Factory is what bounded autonomy looks like when every tier is set to “act without asking.”
  • Harness (Agentic) – a mature harness is the substrate a Dark Factory runs on.
  • Verification Loop – without a tight, reliable verification loop, a Dark Factory ships defects at speed.
  • AgentOps – production monitoring replaces human code review as the primary feedback signal.

Context

The term borrows from manufacturing. A “dark factory” is a production facility that runs without human workers on the floor: the lights stay off because the robots do not need them. Dan Shapiro coined the software version to name an operating model that was, until 2026, mostly theoretical. StrongDM’s engineering team made it concrete by publishing a manifesto with two rules: code is not written by humans, and code is not reviewed by humans. Humans set the intent, describe the scenarios the system must handle, and define the constraints. Everything from the first line of code to the production deploy happens between agents.

This sits at the agentic and operational level. It is not a coding technique. It is a claim about where the human belongs in the software lifecycle: outside the code, at the specification and governance layer. Dark Factory names the far end of a spectrum whose other end is the traditional workflow where a human writes every character and reviews every change.

Practitioners have converged on a rough five-level ladder to describe positions along this spectrum:

  1. Human-written, human-reviewed. Autocomplete at most.
  2. Agent-assisted authoring. The agent drafts; a human reviews every line.
  3. Agent-authored, human-reviewed. The agent writes whole features; a human reads the diff.
  4. Agent-authored, agent-reviewed, human spot-checks. A human still looks, but only at flagged changes.
  5. Dark Factory. No human writes or reviews code. Humans work only at the specification, scenario, and policy layer.

Level 5 is where “Dark Factory” strictly applies. Level 4 is the common preparatory state.
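The ladder above can be sketched as a small policy table. This is a hypothetical encoding for illustration; the enum names and the `human_reads_code` helper are not part of any published framework.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """Five-level ladder from fully human to Dark Factory (Level 5)."""
    HUMAN_WRITTEN = 1   # human writes, human reviews; autocomplete at most
    AGENT_ASSISTED = 2  # agent drafts, human reviews every line
    AGENT_AUTHORED = 3  # agent writes whole features, human reads the diff
    AGENT_REVIEWED = 4  # agent reviews too; human spot-checks flagged changes
    DARK_FACTORY = 5    # no human writes or reviews code

def human_reads_code(level: AutonomyLevel) -> bool:
    """A human still looks at diffs at every level below the Dark Factory."""
    return level < AutonomyLevel.DARK_FACTORY
```

Note that the predicate flips only at Level 5: Level 4 still has a human in the code-review loop, which is why it is the common preparatory state rather than a Dark Factory proper.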

Problem

As agents become capable enough to write entire features end to end, human code review becomes the bottleneck. A team whose agents write code in minutes can spend hours waiting for a reviewer, and reviewer attention degrades sharply as diffs grow. At the same time, reviewing agent-authored code well is genuinely hard: the patterns are unfamiliar, the volume is relentless, and the signal that a line deserves a pause is weaker than for human-authored code.

You are left with a choice. Either the human stays in the loop and accepts that review is now the constraint on delivery, or you take the human out of code-level review and redesign everything else in the lifecycle to make that safe. Dark Factory is the second choice, taken seriously.

Forces

  • Review cost scales with code volume, not code value. When agents generate 100x more code, line-by-line review becomes uneconomic long before it becomes impossible.
  • Humans review agent-authored code worse than they think. Diffs look plausible, explanations sound confident, and attention fades. The signal-to-noise ratio for human reviewers is collapsing just as the volume rises.
  • Specifications and scenarios scale with product complexity, not code size. You can write a specification for a billing system once and have it survive many refactors. You cannot review every refactor.
  • Preconditions are exacting. A Dark Factory needs codified intent, a strong test oracle, a mature harness, reliable simulation environments, and production telemetry that catches what tests miss. Miss any of these and the factory ships defects at industrial scale.
  • Accountability does not disappear. Regulators, customers, and the team’s own conscience all still need someone to answer for what the system does. The human moves, but does not leave.

Solution

Redesign the software lifecycle so that humans work at the layer above code, and the factory between their specifications and the production system runs without human hands on the keyboard. Three moves make this work:

Move the human up one level. Humans stop writing and reviewing code. They write and review specifications, scenarios, constraints, and production policies. The artifacts that used to be informal (user stories, acceptance criteria) become first-class inputs that agents can read, execute, and regenerate code from. The artifacts that used to be secondary (tests, invariants, performance budgets) become the primary contract.

Replace human review with stacked automated checks. The code review that a human used to do is decomposed and redistributed across the pipeline. Agents generate code against a specification. A second agent critiques it against the same specification. Property-based tests, simulation runs, and scenario replays exercise it far beyond what hand-written unit tests ever did. Static analysis, security scanners, and Architecture Fitness Functions enforce constraints the specification cannot capture. Production traffic runs through canary deploys and feature flags so the real world becomes the final review surface, with automatic rollback when domain metrics move the wrong way.

Treat production telemetry as the primary feedback sensor. Because no human reads the diff, the system needs to know quickly and precisely when the deployed behavior diverges from the specification. AgentOps dashboards, domain-oriented metrics, and error budgets become the governance layer. A Dark Factory that cannot detect its own regressions is not a factory; it is a defect machine.
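A minimal version of "telemetry as the primary sensor" is an automatic rollback trigger: compare a domain metric under the canary against its pre-deploy baseline and roll back without asking anyone when the regression exceeds the error budget. The metric name, fields, and thresholds here are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class MetricWindow:
    name: str
    baseline: float  # value observed before the deploy
    current: float   # value observed under the canary
    budget: float    # max tolerated relative regression (0.5 = 50%)

def should_rollback(m: MetricWindow) -> bool:
    """Roll back when the canary regresses past its error budget."""
    if m.baseline == 0:
        return m.current > 0  # any errors where there were none
    regression = (m.current - m.baseline) / m.baseline
    return regression > m.budget
```

For example, a checkout error rate jumping from 0.2% to 0.9% against a 50% budget trips the rollback; a drift to 0.21% does not. The important property is that the decision is mechanical, because no human is reading the diff that caused it.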

The payoff is real: a small team can ship a large surface area, because the only human-time-bounded work left is specifying and supervising. The cost is equally real: the preconditions are expensive, and the failure mode is delivering broken software faster than you can catch it.

Warning

Do not try to run at Level 5 on a codebase that cannot be tested well. A Dark Factory inherits the quality of its test oracle. If your tests let bad code pass today, a Dark Factory will ship bad code a hundred times faster tomorrow. Harden the oracle before removing the reviewer.

How It Plays Out

A small infrastructure startup decides to run its internal tools as a Dark Factory. They invest two months up front in a specification system: every feature begins life as a markdown brief with acceptance scenarios written in a structured format. Agents consume the brief, generate the service, a second agent critiques it against the brief, a test suite validates behavior, and the change lands behind a feature flag. A human PM writes briefs; a human SRE watches production dashboards; no engineer reviews a diff. Over six months the team ships ten times the feature volume of a comparable team running Level 3. Their first incident arrives when an agent interprets an ambiguous scenario as “silent retry on failure” and the team watches a bill triple overnight before the alert fires. They codify the missing constraint as an invariant, add a cost-per-request fitness function, and keep running.

A financial services firm tries the same approach for a customer-facing billing service and aborts after three weeks. Regulatory requirements mandate human sign-off on any change touching customer funds. The team can get to Level 4 inside the firm’s walls, but Level 5 is legally out of reach on that surface. They reclassify: internal tools run as a Dark Factory; the billing service runs at Level 3 with full human review. The framework accommodates the split because the governance tier is a property of the code path, not the team.
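The split the firm lands on, with the governance tier attached to the code path rather than the team, can be expressed as a simple routing table. The paths, tiers, and longest-prefix rule below are invented for illustration, not taken from any real policy system.

```python
# Hypothetical mapping from code path to governance tier.
# Tier 5 = Dark Factory (no human review); tier 3 = human reads every diff.
TIER_BY_PATH = {
    "internal/tools/": 5,
    "services/billing/": 3,  # regulator mandates human sign-off here
}

DEFAULT_TIER = 3  # unknown surfaces fall back to full human review

def tier_for(path: str) -> int:
    """Longest-prefix match decides the tier for a changed file."""
    best, best_len = DEFAULT_TIER, -1
    for prefix, tier in TIER_BY_PATH.items():
        if path.startswith(prefix) and len(prefix) > best_len:
            best, best_len = tier, len(prefix)
    return best
```

Defaulting unknown paths to the most conservative tier matches the spirit of the story: a surface earns Level 5 explicitly, it never drifts into it.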

A sole developer experiments with a weekend project. He writes a short specification, points an agent at it, and walks away. The agent produces three iterations, each one complete and self-tested, each one subtly wrong in a way his specification failed to pin down. He realizes the specification, not the code, is where the real work lives. He spends the rest of the weekend rewriting the specification rather than the code, and the fourth iteration works. He has, in miniature, learned the central discipline of a Dark Factory: the artifact you maintain is not the code.

Consequences

A working Dark Factory collapses the lead time between “we want this” and “it is in production.” Small teams become capable of surface areas that used to require large ones. The human workload shifts from mechanical translation (requirement → code) to creative and governance work (what should we build, how will we know if it is right, what must never be true).

The costs are unforgiving. The preconditions are expensive: a mature harness, codified specifications, a strong test oracle, reliable simulation, production telemetry rich enough to catch silent failures, and an organization culturally prepared to trust automated verification over human judgment. Each of these takes months to build and can be undermined in a single bad quarter. Teams that try to run a Dark Factory on top of a weak oracle discover that the factory ships their quality problems at full speed.

There is also a trust and accountability dimension that tooling does not solve. Stanford’s CodeX center framed the question sharply: “Built by agents, tested by agents, trusted by whom?” When something goes wrong in a Dark Factory, the humans responsible cannot appeal to “the engineer who wrote this had a reason.” Ownership attaches to the specification author, the governance layer, and the production operator, in ways most organizations have not yet worked out. Regulators, auditors, and customers are still catching up to what this means, and the legal precedent is thin.

Finally, there is a skills question. A team that runs at Level 5 for a year does not produce engineers who can debug code; it produces engineers who can debug specifications and systems. That is probably the right skill for the long run, but the transition is real, and a team that cannot drop back to Level 3 during an outage is fragile in a way that a traditional team is not.

  • Contrasts with: Bounded Autonomy – bounded autonomy graduates human oversight across tiers; Dark Factory removes the code-level tier entirely.
  • Contrasts with: Human in the Loop – HITL keeps a person inside the control structure; Dark Factory moves that person one layer up, out of code review.
  • Contrasts with: Approval Policy – approval policies gate code-level actions; Dark Factory policies gate specification- and deployment-level actions instead.
  • Depends on: Harness (Agentic) – the harness is the machinery the factory runs on.
  • Depends on: Verification Loop – the verification loop is what replaces human review at the code layer.
  • Depends on: Generator-Evaluator – one agent writes, another critiques; the evaluator stands in for the missing human reviewer.
  • Depends on: Test Oracle – the factory inherits the quality of its oracle; a weak oracle makes Dark Factory dangerous.
  • Depends on: AgentOps – production telemetry becomes the primary feedback signal when the diff is no longer read.
  • Depends on: Architecture Fitness Function – fitness functions enforce the architectural constraints that specifications cannot capture.
  • Related: Agent Teams – a Dark Factory is typically implemented as a team of specialized agents rather than a single monolith.
  • Related: Subagent – subagents handle the sub-tasks (generation, critique, verification) that together replace human review.
  • Related: Steering Loop – the steering loop still operates, but the steering inputs come from specifications and telemetry, not code review.
  • Risks: AI Smell – at industrial scale, subtle smells compound into systemic failures.
  • Risks: Shadow Agent – unregistered agents inside a Dark Factory undermine the whole governance story.
  • Risks: Prompt Injection – without a human reviewer in the loop, injected instructions that reach the specification or the agent pipeline are harder to catch.
  • Risks: Approval Fatigue – the inverted failure mode: a Dark Factory removes approvals entirely, and teams that run one on a weak oracle discover that removing approvals is not the same as not needing them.

Sources

Dan Shapiro coined the “Dark Factory” framing for agent-driven software development, drawing on the existing industrial term for lights-out manufacturing facilities. The manufacturing analogy is older than the software use, but Shapiro’s application to coding is the lineage most subsequent writers cite.

StrongDM’s public engineering manifesto is the most concrete reference implementation: two explicit rules (“Code must not be written by humans,” “Code must not be reviewed by humans”), a description of a “digital twin universe” for scenario simulation, and named sub-patterns (Gene Transfusion, Semports, Pyramid Summaries) for the specification and testing layers. Their team’s willingness to publish the rules in enforceable form is what made the concept concrete enough for others to argue about.

Stanford Law School’s CodeX center raised the durable question that every Dark Factory adopter eventually has to answer: “Built by agents, tested by agents, trusted by whom?” Their February 2026 analysis is the clearest statement of the accountability gap that tooling alone cannot close, and it shapes the Consequences discussion above.

The five-level framework for positioning teams along the human-to-agent spectrum emerged from the agentic coding practitioner community in early 2026, with multiple independent writers converging on the same ladder structure. It is not attributable to a single author; by April 2026 the levels had become common vocabulary across newsletters, conference talks, and team internal documents.