ADR-0006: Project DSL is YAML with JSON Schema + Zod validation
- Status: Accepted
- Date: 2026-05-21
- Deciders: @karasu
- Supersedes: —
- Superseded by: —
- Follows from: RFC-0001
Context
RFC-0001 lays out the full design space: YAML, TOML, JSON/JSON5, TypeScript-as-config, a custom DSL (PKL / Dhall / Skylark variants), and a hybrid (YAML-canonical + optional TS-emit).
The constraints (full list in the RFC) reduce to:
- The DSL is a security boundary. The cage allowlist, the prompt references, the interconnect graph — all of these gate what the model can do. The format must keep this boundary obvious.
- The DSL is operator-authored. Not generated, not "imported from your IDE." Operators hand-write these files.
- The DSL is operator-reviewed. Diffs are read. Pull requests on a project's DSL file are how teams notice "wait, when did we grant network access to that subagent?"
- The DSL must not execute. A config file that can read env vars, call out to the network, or import arbitrary code is not a config file; it's a script. We do not run scripts as security-critical input.
- The DSL must be parseable outside Bun. Plugins in other languages, external linters, IDE plugins — all should be able to read the format without embedding our runtime.
- The DSL must be ergonomic for the audience. YAML-shaped tooling is the lingua franca of the self-hosted operator population. Kubernetes, Compose, GitHub Actions, Ansible, Home Assistant. The cognitive cost is already paid.
The RFC's lean (and my proposed pick) is YAML with JSON Schema validation, plus a Zod schema inside the daemon to give us TypeScript-level type safety at the API boundary.
Decision
The kaged project DSL is YAML. The schema is defined as JSON Schema (published at kaged.dev/schema/vN.json) and mirrored as a Zod schema inside the daemon. The DSL file is non-executable, fully declarative, and parsed once at load time with strict mode (unknown fields are errors). Schema version is encoded in a top-level version: field. There is no TS-as-config in v0; that path is deferred to a v1.x escape hatch if operator pressure justifies it.
Canonical form
- Filename: .kaged/project.yaml at the root of a project directory.
- File extension: .yaml (not .yml).
- Schema declaration: Optional editor hint at the top of the file:
    # yaml-language-server: $schema=https://kaged.dev/schema/v1.json
    version: 1
    project: music-site
    ...
- Schema location: Published at https://kaged.dev/schema/vN.json, also shipped inside the kaged binary at a known path for offline use.
- Encoding: UTF-8, LF line endings, no BOM. Strict.
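A minimal sketch of how the filename and encoding rules above could be checked before parsing. The function name checkCanonicalForm is hypothetical, and UTF-8 validity checking is left out to keep the sketch dependency-free; only the filename, BOM, and line-ending rules are covered.

```typescript
// Hypothetical helper (not part of kaged): checks some canonical-form rules.
// Returns human-readable problems; an empty array means the file passes.
function checkCanonicalForm(path: string, bytes: Uint8Array): string[] {
  const problems: string[] = [];

  // The file must be named .kaged/project.yaml; there is no .yml alias.
  if (!/(^|\/)\.kaged\/project\.yaml$/.test(path)) {
    problems.push(`expected .kaged/project.yaml, got ${path}`);
  }

  // A UTF-8 BOM is the byte sequence EF BB BF at the start of the file.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    problems.push("file starts with a UTF-8 BOM");
  }

  // Any CR byte (0x0D) means CRLF or bare-CR line endings.
  if (bytes.includes(0x0d)) {
    problems.push("file contains CR characters; use LF line endings");
  }

  return problems;
}
```

Returning a list rather than throwing on the first problem lets a command such as kaged dsl validate report everything in one pass.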
Schema versioning
- Version field: version: 1 (integer) at the top level. Required.
- Major version = breaking changes. Bumping major requires a migration.
- Minor changes are additive. New optional fields can be added without a major bump; old kaged versions ignore them (relaxed mode for forward-compat) but emit a warning.
- Migration path: kaged ships kaged dsl migrate <file> that rewrites a v1 file into the current major, or errors with a diff if it can't.
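The version rules above can be sketched as a small gate that runs on the parsed document before any other validation. The names checkVersion and VersionCheck, and the currentMajor parameter, are assumptions made for illustration; the ADR does not specify how the daemon implements this.

```typescript
// Hypothetical result type: either the file loads, needs migrating, or is rejected.
type VersionCheck =
  | { kind: "ok" }
  | { kind: "needs-migration"; from: number; to: number }
  | { kind: "error"; message: string };

// currentMajor is the schema major this (hypothetical) build understands.
function checkVersion(doc: Record<string, unknown>, currentMajor = 1): VersionCheck {
  const v = doc["version"];

  // The field is required.
  if (v === undefined) {
    return { kind: "error", message: "version is required" };
  }
  // It must be an integer, so the string "1" or the number 1.5 is rejected.
  if (typeof v !== "number" || !Number.isInteger(v) || v < 1) {
    return { kind: "error", message: `version must be a positive integer, got ${JSON.stringify(v)}` };
  }
  // A newer major than this build knows about cannot be loaded.
  if (v > currentMajor) {
    return { kind: "error", message: `schema v${v} is newer than supported v${currentMajor}` };
  }
  // An older major is a candidate for `kaged dsl migrate`.
  if (v < currentMajor) {
    return { kind: "needs-migration", from: v, to: currentMajor };
  }
  return { kind: "ok" };
}
```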
Strictness
- Unknown fields are errors, not warnings. Silent ignoring is how bad configs ship.
- Type mismatches are errors. A string where a boolean is expected fails to load.
- Path references are validated. If system_prompt: ./prompts/primary.md points to a file that doesn't exist, the DSL fails to load. We catch this at parse time, not at runtime.
- Cross-references are validated. If interconnect.from: scraper references a subagent name that isn't defined, the DSL fails to load.
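As a rough illustration of the unknown-field and cross-reference rules above, here is a dependency-free sketch that runs on an already-parsed document. The key set is taken from the sample in this ADR, not from the real schema, and strictCheck is a hypothetical name. Checking the to endpoint as well as from is an assumption, and the sketch assumes field types were validated earlier.

```typescript
// Illustrative top-level keys, drawn from the sample file in this ADR.
const TOP_LEVEL_KEYS = new Set(["version", "project", "primary", "subagents", "interconnect"]);

interface Subagent { name: string }
interface Edge { from: string; to: string }

// Hypothetical helper: returns one message per violation, empty when the document passes.
function strictCheck(doc: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // Unknown fields are errors, never warnings.
  for (const key of Object.keys(doc)) {
    if (!TOP_LEVEL_KEYS.has(key)) errors.push(`unknown field: ${key}`);
  }

  // Casts assume the shapes were already type-checked by an earlier pass.
  const subagents = (doc["subagents"] ?? []) as Subagent[];
  const edges = (doc["interconnect"] ?? []) as Edge[];

  // Every interconnect endpoint must name a defined subagent.
  const names = new Set(subagents.map((s) => s.name));
  for (const edge of edges) {
    for (const end of [edge.from, edge.to]) {
      if (!names.has(end)) errors.push(`interconnect references undefined subagent: ${end}`);
    }
  }
  return errors;
}
```

With this shape, a misspelled subagnets: key surfaces as an unknown-field error instead of silently producing a project with no subagents.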
What the DSL contains (high-level shape; full spec in docs/specs/project-dsl.md)
# yaml-language-server: $schema=https://kaged.dev/schema/v1.json
version: 1
project: music-site
primary:
  model: claude-sonnet-4.6
  system_prompt: ./prompts/primary.md
subagents:
  - name: scraper
    model: claude-haiku
    system_prompt: ./prompts/scraper.md
    cage:
      fs:
        - { mode: ro, path: /data }
      net:
        allow:
          - "*.bandcamp.com"
          - "*.soundcloud.com"
      state: ephemeral
    can_be_called_by: [primary]
  - name: deployer
    model: claude-sonnet-4.6
    system_prompt: ./prompts/deployer.md
    cage:
      fs:
        - { mode: rw, path: /projects/music-site }
      net:
        allow: ["k3s.local:6443"]
      state: ephemeral
    can_be_called_by: [primary, scraper]
interconnect:
  - from: scraper
    to: deployer
    on: found_release
What is deferred (will be its own ADR / spec amendment)
- No imports / includes. A DSL file is a single file. If repeated patterns emerge, we add cage_profiles: (named reusable cages) in a minor version. We do not add general !include directives.
- No environment variable interpolation in the security-sensitive fields. Net allowlists, fs paths, and model names are not ${VAR}-interpolated. (Prompts and project names may interpolate from a documented allowlist of operator-supplied vars in a future minor. Open question.)
- No TS-as-config. Deferred to a v1.x feature: a defineProject() helper that emits YAML at build time. Operators may run it manually; kaged only ever reads YAML.
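One way the no-interpolation rule above could be enforced is to reject any ${VAR}-style placeholder in the security-sensitive fields at load time. This is a sketch under assumptions: the Agent and Cage shapes mirror the sample file, findInterpolation is a hypothetical name, and treating a placeholder as a hard error (rather than a literal string) is one possible reading of the rule.

```typescript
// Shapes mirror the sample in this ADR; they are not the real schema.
interface Cage {
  fs?: { mode: string; path: string }[];
  net?: { allow?: string[] };
}
interface Agent { name?: string; model: string; cage?: Cage }

// Matches ${VAR}-style placeholders. No global flag, so test() is stateless.
const INTERPOLATION = /\$\{[^}]*\}/;

// Hypothetical helper: reports every security-sensitive field containing a placeholder.
function findInterpolation(agents: Agent[]): string[] {
  const errors: string[] = [];
  agents.forEach((agent, i) => {
    const label = agent.name ?? `agent[${i}]`;
    // Collect (field path, value) pairs for model names, fs paths, and net allowlists.
    const fields: [string, string][] = [
      ["model", agent.model],
      ...(agent.cage?.fs ?? []).map((e, j): [string, string] => [`cage.fs[${j}].path`, e.path]),
      ...(agent.cage?.net?.allow ?? []).map((h, j): [string, string] => [`cage.net.allow[${j}]`, h]),
    ];
    for (const [field, value] of fields) {
      if (INTERPOLATION.test(value)) {
        errors.push(`${label}: ${field} must be a literal, got ${value}`);
      }
    }
  });
  return errors;
}
```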
Consequences
What this commits us to
- A published JSON Schema as a first-class artifact. It lives at kaged.dev/schema/v1.json. We version it. We never break v1 once we cut it. We add v2 when we need breaking changes.
- A Zod schema kept in sync with the JSON Schema. Two sources of truth is a smell, but the tradeoff is intentional: JSON Schema is the publishable contract; Zod is the runtime-typed contract inside the daemon. We plan to auto-generate one from the other (most likely Zod → JSON Schema via a converter) so they cannot drift.
- A strict parser path. No YAML 1.1 mistakes (the Norway problem, where an unquoted no parses as boolean false). We use a YAML 1.2 parser (the yaml npm package or equivalent) with strict mode.
- A kaged dsl validate command. Operators can validate a DSL file from the CLI without booting a session. CI on operator repos can run it pre-merge.
- A kaged dsl migrate command. Forward migration between schema major versions.
- First-class editor support. VSCode's YAML extension reads the $schema directive and gives inline validation + autocomplete out of the box. We test this and document it in the operator guide.
What this forecloses
- No Turing-complete config. Conditional logic in the DSL is not happening. Computed values are not happening. If an operator wants those, they write a script that emits a project.yaml and points kaged at the emitted file.
- No "smart" defaults that grant access. Every allowlist must be explicit. No cage: { fs: auto } shortcut that infers paths.
- No silent unknown-field tolerance. A typo such as subagnets: doesn't get silently ignored. It fails loudly.
- No .yml alias for the filename. Reduces "did I name it right?" noise. We pick one and stick to it.
What becomes easier
- Onboarding. Operators who have ever written a Kubernetes manifest or a Compose file can author a kaged project DSL from a sample in five minutes.
- Tooling. yq works. kustomize-style patches work. git diff is readable. Every operator's existing YAML toolchain is now kaged-compatible.
- Reviewing security changes. A PR that changes a cage block is a YAML diff. It shows up clearly in any code review tool.
- Schema documentation. JSON Schema fields can carry inline description: text; editor tooltips show them automatically. The schema is the operator manual.
- External validators. Anyone can write a kaged-DSL linter in any language. Just consume the public JSON Schema.
What becomes harder
- Composition / reuse. Operators with five projects that share a cage profile will copy-paste. The escape hatches (cage_profiles: minor feature, TS-emit v1.x) are deferred. Some operators will be frustrated by this in the interim.
- Validating prompts. YAML can't say "this string is a path to an existing markdown file with a particular frontmatter shape." We validate that programmatically at load time, not via schema.
- Authoring complex graphs. A project with 12 subagents and 30 interconnects is a lot of YAML. We accept this as a v0 problem. The web UI may eventually render and edit the topology graphically.
- Maintaining two schemas (JSON + Zod). Mitigated by code-gen from one source. Not zero ongoing cost.
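The programmatic prompt check mentioned in the list above might look something like the following sketch. It covers only file existence, since the frontmatter shape is still an open question. The name checkPromptPaths is hypothetical, and the existence test is injected so the caller can supply a filesystem lookup of its choice. Which directory relative paths resolve against is also an assumption left to the caller via baseDir.

```typescript
// Injected existence check, so the sketch has no filesystem dependency of its own.
type Exists = (path: string) => boolean;

// Hypothetical helper. prompts maps an agent name to its system_prompt value.
function checkPromptPaths(
  prompts: Record<string, string>,
  baseDir: string,
  exists: Exists,
): string[] {
  const errors: string[] = [];
  for (const [agent, ref] of Object.entries(prompts)) {
    // Simplistic resolution: strip a leading "./" and join onto baseDir.
    const relative = ref.startsWith("./") ? ref.slice(2) : ref;
    const resolved = `${baseDir}/${relative}`;
    if (!exists(resolved)) {
      errors.push(`${agent}: system_prompt ${ref} not found at ${resolved}`);
    }
  }
  return errors;
}
```

Injecting the existence check also makes the rule easy to unit-test with an in-memory set of paths.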
Open questions deliberately left open
These come up next, in docs/specs/project-dsl.md:
- Exact JSON Schema for each field. This ADR commits to the approach; the full schema is a spec deliverable.
- Prompt file format. Markdown with optional YAML frontmatter? Pure markdown? Versioned alongside the DSL? Separate ADR if it gets contentious.
- What an interconnect "event" actually is. The on: found_release syntax sketched in the architecture doc is illustrative. The real event taxonomy is a spec question.
- Cage state: semantics. ephemeral vs persistent vs scratch: naming and semantics need pinning down.
- Whether primary may have can_be_called_by: [other_primary] for the cross-project interconnect deferred to v2. Probably yes, but we don't need to commit now.
Alternatives considered
(Full design-space exploration is in RFC-0001. Summarized here.)
Alternative A — TOML
Why rejected: Nested structures (cages with fs + net) read awkwardly. [[subagents]] repetition gets noisy. Operator community is smaller. No upside that YAML doesn't already provide.
Alternative B — JSON / JSON5
Why rejected: Verbose to hand-write. No comments in vanilla JSON; JSON5 is less universal. Reads as data, not as a spec. Loses to YAML on operator ergonomics, ties on machine-readability.
Alternative C — TypeScript as config
Why rejected: Executable config violates the trust-boundary principle. Sandboxing the loader is real engineering for a benefit (type-checking, composition) we can get most of via JSON Schema in editors. Deferred to a v1.x optional escape hatch.
Alternative D — Custom DSL (PKL, Dhall, Skylark variant, or invented)
Why rejected: Engineering cost is enormous (parser, formatter, LSP, editor plugins, error messages, docs). Operator onboarding cost is enormous (a new language to learn). The brand argument ("opinionated craft warrants its own language") is real but premature. We can graduate in v2 if YAML proves insufficient. Building a DSL in v0 is the kind of yak-shave that ships nothing.
Alternative E — Hybrid: YAML canonical, TS escape hatch from day one
Why rejected for v0: Adds a second surface to spec, document, test, and maintain. Operator confusion ("which is the real file?") is real. We can add the TS-emit path in a v1.x without breaking anything, and we'll have actual usage data on whether operators want it. Premature.
References
- RFC-0001: the full design-space exploration
- docs/02-architecture.md: DSL sketch used as architecture pseudocode
- docs/03-glossary.md: terms the DSL uses (project, primary, subagent, cage, interconnect, plugin)
- ADR-0004: the runtime that parses the DSL
- JSON Schema: https://json-schema.org/
- Zod: https://zod.dev/
- VSCode YAML extension: https://marketplace.visualstudio.com/items?itemName=redhat.vscode-yaml
- Original discussion: design conversation with colleagues, 2026-05-21
Amendments
2026-05-26 — DSL content reorganized around recursive AgentSpec (ADR-0022)
DSL format is unchanged (still YAML 1.2, still validated against JSON Schema + Zod, still strict mode). DSL content gets a recursive AgentSpec shape per ADR-0022: PrimaryAgent and Subagent collapse into a single AgentSpec type used at every position in the agent tree. cage and tools move from project-level / subagent-only to per-agent on AgentSpec. cage_defaults, can_be_called_by, interconnect, and the project-level tools: block are removed. Schema stays at version: 1 (pre-alpha, no migration support). Full spec amendments in docs/specs/project-dsl.md.