ADR-0006: Project DSL is YAML with JSON Schema + Zod validation

Status: Accepted
Date: 2026-05-21
Deciders: @karasu
Supersedes: —
Superseded by: —
Follows from: RFC-0001

Context

RFC-0001 lays out the full design space: YAML, TOML, JSON/JSON5, TypeScript-as-config, a custom DSL (PKL / Dhall / Skylark variants), and a hybrid (YAML-canonical + optional TS-emit).

The constraints (full list in the RFC) reduce to:

The DSL is a security boundary. The cage allowlist, the prompt references, the interconnect graph — all of these gate what the model can do. The format must keep this boundary obvious.
The DSL is operator-authored. Not generated, not "imported from your IDE." Operators hand-write these files.
The DSL is operator-reviewed. Diffs are read. Pull requests on a project's DSL file are how teams notice "wait, when did we grant network access to that subagent?"
The DSL must not execute. A config file that can read env vars, call out to the network, or import arbitrary code is not a config file; it's a script. We do not run scripts as security-critical input.
The DSL must be parseable outside Bun. Plugins in other languages, external linters, IDE plugins — all should be able to read the format without embedding our runtime.
The DSL must be ergonomic for the audience. YAML-shaped tooling is the lingua franca of the self-hosted operator population: Kubernetes, Compose, GitHub Actions, Ansible, Home Assistant. The cognitive cost is already paid.

The RFC's lean (and my proposed pick) is YAML with JSON Schema validation, plus a Zod schema inside the daemon to give us TypeScript-level type safety at the API boundary.

Decision

The kaged project DSL is YAML. Schema is defined as JSON Schema (published at kaged.dev/schema/vN.json) and mirrored as a Zod schema inside the daemon. The DSL file is non-executable, fully declarative, and parsed once at load time with strict mode (unknown fields are errors). Schema version is encoded in a top-level version: field. There is no TS-as-config in v0; that path is deferred to a v1.x escape hatch if operator pressure justifies it.

Canonical form

Filename: .kaged/project.yaml at the root of a project directory.
File extension: .yaml (not .yml).

Schema declaration: Optional editor hint at the top of the file:

# yaml-language-server: $schema=https://kaged.dev/schema/v1.json
version: 1
project: music-site
...

Schema location: Published at https://kaged.dev/schema/vN.json, also shipped inside the kaged binary at a known path for offline use.
Encoding: UTF-8, LF line endings, no BOM. Strict.
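
The encoding rule can be enforced on the raw bytes before any YAML parsing happens. A minimal sketch, assuming a hypothetical checkEncoding helper and illustrative error wording (neither is part of kaged's actual API):

```typescript
// Sketch of a pre-parse encoding gate. The function name and messages are hypothetical.
function checkEncoding(bytes: Uint8Array): string[] {
  const problems: string[] = [];

  // A UTF-8 byte order mark is the three bytes EF BB BF at the start of the file.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    problems.push("file starts with a UTF-8 BOM");
  }

  // In UTF-8, byte 0x0D can only be a carriage return, so any hit means CRLF or bare CR.
  if (bytes.includes(0x0d)) {
    problems.push("file contains CR characters; only LF line endings are accepted");
  }

  // With fatal: true the decoder throws on malformed input instead of substituting U+FFFD.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    problems.push("file is not valid UTF-8");
  }

  return problems;
}
```

An empty result means the file passes; anything else would be reported and the load refused.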

Schema versioning

Version field: version: 1 (integer) at the top level. Required.
Major version = breaking changes. Bumping major requires a migration.
Minor changes are additive. New optional fields can be added without a major bump; old kaged versions ignore them (relaxed mode for forward-compat) but emit a warning.
Migration path: kaged ships a kaged dsl migrate <file> command that rewrites a file written for an older major (for example v1) into the current major, or errors with a diff if it can't.
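
The version rules above could translate into a gate like the one below. This is a sketch under assumptions: SUPPORTED_MAJOR, checkVersion, and the error strings are hypothetical, and refusing older majors with a pointer to kaged dsl migrate (rather than migrating implicitly) is my reading, not a committed behavior.

```typescript
// Hypothetical constant: the schema major this build of the daemon understands.
const SUPPORTED_MAJOR: number = 1;

type VersionCheck =
  | { ok: true; version: number }
  | { ok: false; error: string };

// `parsed` is the already-parsed YAML document, whichever YAML 1.2 parser produced it.
function checkVersion(parsed: unknown): VersionCheck {
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "top level must be a mapping" };
  }
  const version = (parsed as Record<string, unknown>)["version"];
  if (version === undefined) {
    return { ok: false, error: "missing required field: version" };
  }
  // The string "1" and the float 1.5 are both rejected: the field is an integer.
  if (typeof version !== "number" || !Number.isInteger(version) || version < 1) {
    return { ok: false, error: "version must be a positive integer" };
  }
  if (version < SUPPORTED_MAJOR) {
    return { ok: false, error: `version ${version} is older than ${SUPPORTED_MAJOR}; run: kaged dsl migrate <file>` };
  }
  if (version > SUPPORTED_MAJOR) {
    return { ok: false, error: `version ${version} is newer than this kaged supports (${SUPPORTED_MAJOR})` };
  }
  return { ok: true, version };
}
```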

Strictness

Unknown fields are errors, not warnings. Silent ignoring is how bad configs ship.
Type mismatches are errors. A string where a boolean is expected fails to load.
Path references are validated. If system_prompt: ./prompts/primary.md points to a file that doesn't exist, the DSL fails to load. We catch this at parse time, not at runtime.
Cross-references are validated. If interconnect.from: scraper references a subagent name that isn't defined, the DSL fails to load.
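
The path and cross-reference checks run after schema validation, on the typed document. A minimal sketch using the field names from the sample in this ADR; the checkReferences helper, the injected fileExists callback, and treating "primary" as a reserved agent name are all assumptions for illustration:

```typescript
interface AgentDecl {
  name: string;
  system_prompt: string;
  can_be_called_by?: string[];
}

interface ProjectDoc {
  primary: { system_prompt: string };
  subagents: AgentDecl[];
  interconnect?: { from: string; to: string; on: string }[];
}

// fileExists is injected so the check stays testable; a loader could pass
// a wrapper around the filesystem that resolves paths against the project root.
function checkReferences(doc: ProjectDoc, fileExists: (path: string) => boolean): string[] {
  const errors: string[] = [];
  // Assumption: "primary" is a reserved name that other agents may reference.
  const known = new Set<string>(["primary", ...doc.subagents.map((s) => s.name)]);

  // Path references: every system_prompt must point at an existing file.
  const prompts = [doc.primary.system_prompt, ...doc.subagents.map((s) => s.system_prompt)];
  for (const p of prompts) {
    if (!fileExists(p)) errors.push(`system_prompt not found: ${p}`);
  }

  // Cross-references: callers and interconnect endpoints must name defined agents.
  for (const s of doc.subagents) {
    for (const caller of s.can_be_called_by ?? []) {
      if (!known.has(caller)) errors.push(`${s.name}.can_be_called_by: unknown agent "${caller}"`);
    }
  }
  (doc.interconnect ?? []).forEach((edge, i) => {
    for (const end of [edge.from, edge.to]) {
      if (!known.has(end)) errors.push(`interconnect[${i}]: unknown agent "${end}"`);
    }
  });
  return errors;
}
```

Collecting every error before failing, rather than stopping at the first, lets an operator fix a file in one pass.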

What the DSL contains (high-level shape; full spec in `docs/specs/project-dsl.md`)

# yaml-language-server: $schema=https://kaged.dev/schema/v1.json
version: 1
project: music-site

primary:
  model: claude-sonnet-4.6
  system_prompt: ./prompts/primary.md

subagents:
  - name: scraper
    model: claude-haiku
    system_prompt: ./prompts/scraper.md
    cage:
      fs:
        - { mode: ro, path: /data }
      net:
        allow:
          - "*.bandcamp.com"
          - "*.soundcloud.com"
      state: ephemeral
    can_be_called_by: [primary]

  - name: deployer
    model: claude-sonnet-4.6
    system_prompt: ./prompts/deployer.md
    cage:
      fs:
        - { mode: rw, path: /projects/music-site }
      net:
        allow: ["k3s.local:6443"]
      state: ephemeral
    can_be_called_by: [primary, scraper]

interconnect:
  - from: scraper
    to: deployer
    on: found_release

What is deferred (will be its own ADR / spec amendment)

No imports / includes. A DSL file is a single file. If repeated patterns emerge, we add cage_profiles: (named reusable cages) in a minor version. We do not add general !include directives.
No environment variable interpolation in the security-sensitive fields. Net allowlists, fs paths, and model names are not ${VAR}-interpolated. (Prompts and project names may interpolate from a documented allowlist of operator-supplied vars in a future minor. Open question.)
No TS-as-config. Deferred to a v1.x feature: a defineProject() helper that emits YAML at build time. Operators may run it manually; kaged only ever reads YAML.
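
To make the no-interpolation rule concrete: a linter or the loader could flag ${VAR}-style placeholders in the security-sensitive fields. This sketch assumes such placeholders are reported as errors rather than passed through literally, which this ADR does not specify; findInterpolation and the type names are hypothetical.

```typescript
// Matches ${VAR}-style placeholders anywhere in a string.
const PLACEHOLDER = /\$\{[^}]*\}/;

interface CageDecl {
  fs?: { mode: string; path: string }[];
  net?: { allow?: string[] };
}

interface SensitiveAgent {
  name: string;
  model: string;
  cage?: CageDecl;
}

// Reports every security-sensitive field (model, fs paths, net allowlist)
// that contains a placeholder.
function findInterpolation(agents: SensitiveAgent[]): string[] {
  const hits: string[] = [];
  const check = (where: string, value: string): void => {
    if (PLACEHOLDER.test(value)) hits.push(`${where}: "${value}"`);
  };
  for (const a of agents) {
    check(`${a.name}.model`, a.model);
    for (const mount of a.cage?.fs ?? []) check(`${a.name}.cage.fs.path`, mount.path);
    for (const host of a.cage?.net?.allow ?? []) check(`${a.name}.cage.net.allow`, host);
  }
  return hits;
}
```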

Consequences

What this commits us to

A published JSON Schema as a first-class artifact. It lives at kaged.dev/schema/v1.json. We version it. We never break v1 once we cut it. We add v2 when we need breaking changes.
A Zod schema kept in sync with the JSON Schema. Two sources of truth is a smell, but the tradeoff is intentional: JSON Schema is the publishable contract; Zod is the runtime-typed contract inside the daemon. We expect to auto-generate one from the other (most likely Zod → JSON Schema via a converter) so they cannot drift.
A strict parser path. No YAML 1.1 mistakes such as the Norway problem, where an unquoted NO (the country code) is read as the boolean false. We use a YAML 1.2 parser (yaml npm package or equivalent) with strict mode.
A kaged dsl validate command. Operators can validate a DSL file from the CLI without booting a session. CI on operator repos can run it pre-merge.
A kaged dsl migrate command. Forward migration between schema major versions.
First-class editor support. VSCode's YAML extension reads the $schema directive and gives inline validation + autocomplete out of the box. We test this and document it in the operator guide.
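
For readers who have not hit the Norway problem mentioned above: YAML 1.1 treats many plain words as booleans, while the YAML 1.2 core schema only accepts spellings of true and false. The sketch below compares the two rules on a single unquoted scalar. It models only boolean resolution, not a full parser, and isBoolUnder is a hypothetical helper.

```typescript
// Boolean spellings listed by the YAML 1.1 bool type definition.
const YAML_1_1_BOOL =
  /^(y|Y|yes|Yes|YES|n|N|no|No|NO|true|True|TRUE|false|False|FALSE|on|On|ON|off|Off|OFF)$/;

// Boolean spellings accepted by the YAML 1.2 core schema.
const YAML_1_2_BOOL = /^(true|True|TRUE|false|False|FALSE)$/;

// Returns whether an unquoted scalar would resolve to a boolean under the given spec.
function isBoolUnder(spec: "1.1" | "1.2", plainScalar: string): boolean {
  return (spec === "1.1" ? YAML_1_1_BOOL : YAML_1_2_BOOL).test(plainScalar);
}
```

Under 1.1, a list of country codes containing an unquoted NO yields a boolean in the middle of the list; under 1.2 it stays the string "NO". In an allowlist, that difference is a security bug, not a cosmetic one.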

What this forecloses

No Turing-complete config. Conditional logic in the DSL is not happening. Computed values are not happening. If an operator wants those, they write a script that emits a project.yaml and points kaged at the emitted file.
No "smart" defaults that grant access. Every allowlist must be explicit. No cage: { fs: auto } shortcut that infers paths.
No silent unknown-field tolerance. A typo such as subagnets: isn't silently ignored. It fails loudly.
No .yml alias for the filename. Reduces "did I name it right?" noise. We pick one and stick to it.
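
The unknown-field rule is cheap to implement and easy to reason about. A minimal sketch for the top level only; the key list is taken from the sample in this ADR rather than from the real schema, and both helper names are hypothetical:

```typescript
// Top-level keys as they appear in this ADR's sample; the authoritative list belongs to the spec.
const KNOWN_TOP_LEVEL: readonly string[] = ["version", "project", "primary", "subagents", "interconnect"];

// Returns every key of `doc` that is not in the allowed list.
function unknownKeys(doc: Record<string, unknown>, allowed: readonly string[]): string[] {
  return Object.keys(doc).filter((key) => !allowed.includes(key));
}

// Fails loudly instead of dropping the unrecognized field.
function assertNoUnknownKeys(doc: Record<string, unknown>): void {
  const extra = unknownKeys(doc, KNOWN_TOP_LEVEL);
  if (extra.length > 0) {
    throw new Error(`unknown top-level field(s): ${extra.join(", ")}`);
  }
}
```

With a tolerant parser, a file containing subagnets: would load as a project with no subagents at all; here it refuses to load and names the offending key.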

What becomes easier

Onboarding. Operators who have ever written a Kubernetes manifest or a Compose file can author a kaged project DSL from a sample in five minutes.
Tooling. yq works. kustomize-style patches work. git diff is readable. Every operator's existing YAML toolchain is now kaged-compatible.
Reviewing security changes. A PR that changes a cage block is a YAML diff. It shows up clearly in any code review tool.
Schema documentation. JSON Schema fields can carry inline description: text; editor tooltips show them automatically. The schema is the operator manual.
External validators. Anyone can write a kaged-DSL linter in any language. Just consume the public JSON Schema.
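
As an illustration of the schema doubling as documentation, the sketch below walks a schema fragment and prints one line per field from its description text. The fragment is hypothetical and far smaller than the real schema, which is a spec deliverable; the interface and function names are also made up for this example.

```typescript
interface FieldSchema {
  type: string;
  description?: string;
}

interface ObjectSchema {
  required?: readonly string[];
  properties: Record<string, FieldSchema>;
}

// Hypothetical fragment; the field descriptions are illustrative wording.
const projectSchemaFragment: ObjectSchema = {
  required: ["version", "project"],
  properties: {
    version: { type: "integer", description: "Schema major version." },
    project: { type: "string", description: "Project name." },
  },
};

// Produces one human-readable line per field, the same text an editor tooltip would show.
function describeFields(schema: ObjectSchema): string[] {
  const required = new Set(schema.required ?? []);
  return Object.entries(schema.properties).map(([field, spec]) => {
    const flag = required.has(field) ? "required" : "optional";
    return `${field} (${spec.type}, ${flag}): ${spec.description ?? "no description"}`;
  });
}
```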

What becomes harder

Composition / reuse. Operators with five projects that share a cage profile will copy-paste. The escape hatches (cage_profiles: minor feature, TS-emit v1.x) are deferred. Some operators will be frustrated by this in the interim.
Validating prompts. JSON Schema can't say "this string is a path to an existing markdown file with a particular frontmatter shape." We validate that programmatically at load time, not via schema.
Authoring complex graphs. A project with 12 subagents and 30 interconnects is a lot of YAML. We accept this as a v0 problem. The web UI may eventually render and edit the topology graphically.
Maintaining two schemas (JSON + Zod). Mitigated by code-gen from one source. Not zero ongoing cost.

Open questions deliberately left open

These come up next, in docs/specs/project-dsl.md:

Exact JSON Schema for each field. This ADR commits to the approach; the full schema is a spec deliverable.
Prompt file format. Markdown with optional YAML frontmatter? Pure markdown? Versioned alongside the DSL? Separate ADR if it gets contentious.
What an interconnect "event" actually is. The on: found_release syntax sketched in the architecture doc is illustrative. The real event taxonomy is a spec question.
Cage state: semantics. ephemeral vs persistent vs scratch — naming and semantics need pinning down.
Whether primary may declare can_be_called_by: [other_primary] to support cross-project interconnect, which is deferred to v2. Probably yes, but we don't need to commit now.

Alternatives considered

(Full design-space exploration is in RFC-0001. Summarized here.)

Alternative A — TOML

Why rejected: Nested structures (cages with fs + net) read awkwardly. [[subagents]] repetition gets noisy. Fewer operators work in TOML day to day. No upside that YAML doesn't already provide.

Alternative B — JSON / JSON5

Why rejected: Verbose to hand-write. No comments in vanilla JSON; JSON5 is less universal. Reads as data, not as a spec. Loses to YAML on operator ergonomics, ties on machine-readability.

Alternative C — TypeScript as config

Why rejected: Executable config violates the trust-boundary principle. Sandboxing the loader is real engineering for a benefit (type-checking, composition) we can get most of via JSON Schema in editors. Deferred to a v1.x optional escape hatch.

Alternative D — Custom DSL (PKL, Dhall, Skylark variant, or invented)

Why rejected: Engineering cost is enormous (parser, formatter, LSP, editor plugins, error messages, docs). Operator onboarding cost is enormous (a new language to learn). The brand argument ("opinionated craft warrants its own language") is real but premature. We can graduate in v2 if YAML proves insufficient. Building a DSL in v0 is the kind of yak-shave that ships nothing.

Alternative E — Hybrid: YAML canonical, TS escape hatch from day one

Why rejected for v0: Adds a second surface to spec, document, test, and maintain. Operator confusion ("which is the real file?") is real. We can add the TS-emit path in a v1.x without breaking anything, and we'll have actual usage data on whether operators want it. Premature.

References

RFC-0001 — the full design-space exploration
docs/02-architecture.md — DSL sketch used as architecture pseudocode
docs/03-glossary.md — terms the DSL uses (project, primary, subagent, cage, interconnect, plugin)
ADR-0004 — the runtime that parses the DSL
JSON Schema: https://json-schema.org/
Zod: https://zod.dev/
VSCode YAML extension: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml
Original discussion: design conversation with colleagues, 2026-05-21

Amendments

2026-05-26 — DSL content reorganized around recursive `AgentSpec` (ADR-0022)

DSL format is unchanged (still YAML 1.2, still validated against JSON Schema + Zod, still strict mode). DSL content gets a recursive AgentSpec shape per ADR-0022: PrimaryAgent and Subagent collapse into a single AgentSpec type used at every position in the agent tree. cage and tools move from project-level / subagent-only to per-agent on AgentSpec. cage_defaults, can_be_called_by, interconnect, and the project-level tools: block are removed. Schema stays at version: 1 (pre-alpha, no migration support). Full spec amendments in docs/specs/project-dsl.md.
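
A rough picture of what "a single AgentSpec type used at every position in the agent tree" means in TypeScript terms. Only cage and tools are named by this amendment; agentName, children, the string-array type of tools, and the agentPaths helper are hypothetical placeholders, and ADR-0022 holds the real shape.

```typescript
// Hypothetical recursive shape; see ADR-0022 for the authoritative definition.
interface AgentSpec {
  agentName: string;               // hypothetical field name
  cage?: Record<string, unknown>;  // per-agent cage, contents left open here
  tools?: string[];                // per-agent tools, element type assumed
  children?: AgentSpec[];          // hypothetical name for nested agents
}

// Walks the tree depth-first and returns one slash-separated path per agent.
function agentPaths(agent: AgentSpec, prefix: string[] = []): string[] {
  const here = [...prefix, agent.agentName];
  const nested = (agent.children ?? []).flatMap((child) => agentPaths(child, here));
  return [here.join("/"), ...nested];
}
```

Because the same type appears at every level, one recursive function covers the whole tree, with no separate code path for the primary agent.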