ADR-0003: Doc-first, then test-driven

  • Status: Accepted
  • Date: 2026-05-21
  • Deciders: @karasu
  • Supersedes:
  • Superseded by:

Context

kaged is being built in deliberate reaction to the agentic-tooling pattern of "ship first, document if anyone complains." The manifesto's promise, "we will not vibe-code the harness we built to escape vibe-coded harnesses," is only credible if the process backing it is structural, not aspirational.

There are a few process shapes that could deliver that promise:

  1. Code-first, docs-as-followup. The dominant pattern. Fast, comfortable, and the one we are explicitly reacting against.
  2. Doc-first, then code. Specs are written before implementation. Code is reviewed against specs.
  3. Doc-first, then test-driven. Specs first. Then failing tests that encode the spec. Then implementation. Then refactor.
  4. Spec-by-test. Skip prose; the test suite is the spec. This is the tradition in Smalltalk and some Lisp communities.

The question is load-bearing because it sets the cadence and the gating criteria for every future PR. Pick wrong and either:

  • The project stalls in documentation theatre and never ships, or
  • The docs become marketing-grade fiction with no enforcement coupling them to behavior.

This ADR locks the answer so future maintainers (and AI coding tools working on kaged) have a clear, mechanical rule rather than a stylistic preference.

Decision

Every load-bearing change to kaged is doc-first then test-driven. A change starts as a doc (spec, ADR, or RFC). Specs are merged before code. Code arrives as failing tests first, then implementation, then refactor.

The pipeline, in order:

  1. Doc PR lands first. New feature → updated or new spec in docs/specs/. New load-bearing decision → ADR in docs/adr/. Exploratory direction → RFC in rfcs/, distilled into an ADR afterwards.
  2. Code PR #1 implements the spec as failing tests. No production code, just the test suite the spec implies. CI must run the tests and they must fail (or be marked skip with a link to the next PR).
  3. Code PR #2 makes the failing tests pass. Production code lands. Tests go green.
  4. Code PR #3 (optional) refactors with the green tests as a safety net. Cleanup, perf, ergonomics.
  5. If implementation reveals the spec is wrong, the spec is amended first in a new PR. Then steps 2–3 repeat from the amended baseline.
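
The ordering above can be read as a small state machine over PR kinds. The following is a minimal sketch, not part of kaged's tooling: the `PR` enum, `ALLOWED_NEXT`, and `first_violation` are hypothetical names, and the choice to allow consecutive doc PRs is an assumption made for illustration.

```python
from enum import Enum


class PR(Enum):
    """Hypothetical labels for the PR kinds named in the pipeline."""
    DOC = "doc"            # step 1, or a spec amendment (step 5)
    TESTS = "tests"        # step 2: failing tests only
    IMPL = "impl"          # step 3: production code, tests go green
    REFACTOR = "refactor"  # step 4: optional cleanup


# Which PR kinds may follow each kind. None means "nothing merged yet".
# A DOC PR after TESTS or IMPL models step 5: amend the spec first,
# then repeat from the failing-tests step.
ALLOWED_NEXT = {
    None: {PR.DOC},
    PR.DOC: {PR.DOC, PR.TESTS},  # assumption: a spec may be revised before tests land
    PR.TESTS: {PR.IMPL, PR.DOC},
    PR.IMPL: {PR.REFACTOR, PR.DOC},
    PR.REFACTOR: {PR.DOC},
}


def first_violation(sequence):
    """Return the index of the first out-of-order PR, or None if the order is valid."""
    previous = None
    for index, kind in enumerate(sequence):
        if kind not in ALLOWED_NEXT[previous]:
            return index
        previous = kind
    return None
```

Under these assumptions, `first_violation([PR.DOC, PR.IMPL])` flags index 1, because production code arrived before any failing tests.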

A change is "load-bearing" if it affects the public surface of any package, the DSL grammar, the wire protocol of an API, the database schema, a security boundary, or operator-facing behavior. Internal refactors that change nothing externally are exempt — they go straight to PR with tests.
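
One way to make the "load-bearing" test mechanical is to have each PR declare which surfaces it touches. The sketch below assumes such a declaration exists. The surface labels and function names are invented for illustration and simply mirror the categories listed in the paragraph above.

```python
# Hypothetical labels, one per category in the load-bearing definition.
LOAD_BEARING_SURFACES = frozenset({
    "public-package-surface",
    "dsl-grammar",
    "api-wire-protocol",
    "database-schema",
    "security-boundary",
    "operator-facing-behavior",
})


def is_load_bearing(touched_surfaces):
    """True if the change touches at least one load-bearing surface."""
    return bool(LOAD_BEARING_SURFACES & set(touched_surfaces))


def required_process(touched_surfaces):
    """Name the process a change must follow (labels are illustrative)."""
    if is_load_bearing(touched_surfaces):
        return "doc-first-then-tdd"
    # Internal refactors are exempt: straight to PR, with tests.
    return "direct-pr-with-tests"
```

A PR that declares only an internal refactor maps to `"direct-pr-with-tests"`; one that also touches `"dsl-grammar"` maps to `"doc-first-then-tdd"`.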

Consequences

What this commits us to

  • Slower v0. The doc layer has to exist before any meaningful code lands. The README has called the project "pre-alpha · documentation-first · no code yet" since day one; this ADR ratifies that.
  • CI must enforce the test gate. A merged spec without tests is a TODO. A green test suite without a corresponding spec is a leak. Both need to be caught.
  • The doc system is part of the engineering deliverable. The docs/README.md document-types contract, the ADR/RFC distinction, the spec-amendment protocol — these are tooling, not preface. They get maintained like code.
  • Reviewers must read the spec before reviewing the code. PR templates will require linking the spec being implemented, or amending it in the same PR.
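
The CI bullet above describes two failure modes: a spec with no tests (a TODO) and a test with no spec (a leak). As a rough sketch of how a gate might report both, assume each test is tagged with the identifier of the spec it encodes. That tagging scheme, the function name, and the identifiers in the usage example are all hypothetical.

```python
def coupling_report(spec_ids, test_to_spec):
    """Cross-check merged specs against the specs that tests claim to encode.

    spec_ids: identifiers of merged specs.
    test_to_spec: maps each test name to the spec id it encodes,
                  or None if the test is not tagged with a spec.
    """
    specs = set(spec_ids)
    covered = {spec for spec in test_to_spec.values() if spec is not None}
    return {
        # Merged specs that no test refers to.
        "todo": sorted(specs - covered),
        # Tests that are untagged or point at a spec that is not merged.
        "leak": sorted(
            test for test, spec in test_to_spec.items() if spec not in specs
        ),
    }
```

For example, with made-up identifiers:

```python
report = coupling_report(
    ["sandbox", "dsl"],
    {
        "test_sandbox_denies_net": "sandbox",
        "test_orphan": None,
        "test_ghost": "plugin-host",
    },
)
# report["todo"] lists "dsl"; report["leak"] lists "test_ghost" and "test_orphan".
```

A gate built this way would fail the build when either list is non-empty.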

What this forecloses

  • No "let's just prototype it real quick" branches that merge. Prototypes are fine; they don't merge. They produce input to specs.
  • No "the test is the spec" minimalism. Tests are necessary but not sufficient. Operators reading the project from the outside need prose to orient.
  • No silent behavioral drift. Behavior change without spec change is a process violation, full stop. Caught in review.
  • No "we'll write the spec after we ship" promises. The shipped behavior is the spec at that point, and we have lost the contract.

What becomes easier

  • Onboarding. New contributors (human or model) can read the spec and know what the code is supposed to do without reverse-engineering it.
  • Refactoring. A test suite encoded from a spec is a contract, not a coincidence. Refactors can be aggressive because the spec is the invariant.
  • Reasoning about correctness. "Does this match the spec?" is a tractable question. "Does this match the code's intent?" usually isn't.
  • AI-assisted contribution. A doc-first repo is one the kaged daemon itself can eventually work in. Specs are the legible interface a model needs to make non-vandalous changes.

What becomes harder

  • Anything urgent. Hotfixes for security or correctness still follow the process — they just compress it (spec amendment + tests + code in one PR, reviewed together). This is friction by design.
  • Speculative work. You can't merge "I tried X to see what happened." That stays in a branch or in an RFC.
  • Process-light contributions. A typo PR is fine. A "while I was in here I also refactored module Y" PR is not. Each change is atomic against a doc.

Alternatives considered

Alternative A — Code-first, docs-as-followup (the industry default)

Why tempting: Fastest path from idea to running code. Comfortable. Every IDE, AI tool, and CI pipeline is optimized for it.

Why rejected: It is the pattern we are explicitly defining ourselves against. The README, manifesto, and vision all say so. Adopting it would make the marketing copy a lie. Practically, the docs that ship under this model are either out of date or aspirational, and the operator has no way to tell which.

Alternative B — Doc-first, then code (skip TDD)

Why tempting: Captures the spec-as-contract benefit without the TDD ceremony. Faster than the doc-first, then test-driven process chosen here.

Why rejected: The link between "the spec said X" and "the code does X" becomes manual. Reviewers have to read both and notice drift. Tests are how we mechanically couple the two. Without TDD, doc-first degrades into doc-followed-by-code-that-vaguely-matches.

Alternative C — Spec-by-test (Smalltalk tradition)

Why tempting: Lowest doc overhead. Tests are executable so they cannot rot. Sometimes elegant.

Why rejected: Tests are a great contract but a terrible introduction. Operators reading kaged from the outside need narrative, whether they want to integrate a plugin, write a project DSL, or understand whether the sandbox model fits their threat model. Tests don't explain why. Specs do. Compounding this, kaged is mid-complexity software (a daemon, a DSL, a sandbox model, a plugin host), and spec-by-test scales poorly past trivial systems.

Alternative D — Two-track: docs land async, code lands when ready

Why tempting: Lets the doc and code work proceed in parallel. Less blocking.

Why rejected: Async docs are no docs. The whole value of the gate is that the spec is the contract the code is reviewed against. Asynchronous tracks mean the contract is whatever someone wrote down later. Same failure mode as Alternative A with extra steps.

References

  • docs/README.md — the document-types contract this ADR ratifies
  • docs/00-manifesto.md — principle 6, "Doc-first, then TDD"
  • ADR-0001 — sets up the "we are not inheriting an existing codebase" position, which is what makes doc-first plausible at all