ADR-0022: Agents are recursive; tools and cage are per-agent

  • Status: Accepted
  • Date: 2026-05-26
  • Deciders: @karasu
  • Supersedes:
  • Superseded by:

Context

The DSL today has five structural problems that have been worked around but never fixed, and the workarounds are starting to compound.

Problem 1: subagents are half-agents. A Subagent in docs/specs/project-dsl.md has model, system_prompt, cage, can_be_called_by, and parameters. It has no tools field. A subagent's tool surface is whatever the project's top-level tools: block enables, narrowed by the subagent's cage. Two consequences fall out of this: a subagent cannot be a specialist with its own curated toolset, and a subagent cannot itself have subagents. The "subagent" type is poorer than the "primary" type for no defensible reason.

Problem 2: cage and tools are project-level constructs masquerading as per-agent ones. Cage lives in two places (subagents.<name>.cage and cage_defaults at the top level) but never on the primary. Tools live exclusively at the project level. The primary "runs uncaged" by convention encoded only in docs/specs/agent.md prose, not in the schema. There is no path in the current DSL to express "this primary is a coordinator that has only issue and workflow tools" — the primary always inherits whatever the project enables.

Problem 3: project-reference subagents (added 2026-05-26) inline a nested project's primary as a callable subagent of the parent. The compiled tree carries a wrapped _compiled field alongside the original path / name / description / overrides reference fields. This shape differs from a regular subagent's, so downstream consumers (the harness, the synthesized endpoint, the UI DSL viewer) have to branch on which shape they are holding. The recent federated-config.md amendment un-deferred this work but kept the wrapper shape as a temporary measure.

Problem 4: interconnect overlaps can_be_called_by. interconnect declares event-routed dispatch (scraper -> deployer: on_event(found_release)); can_be_called_by declares the capability to call. In practice the LLM running in the primary already decides what to dispatch to whom based on its reasoning. Declarative event routes add a parallel mechanism that no one is using and that complicates the call graph.

Problem 5: there is no "project-management" tool namespace. Issues (ADR-0020) and workflows (ADR-0019) are first-class concepts, but the agent-facing tool surface for them does not exist as a namespace. Currently issues are operator-UI-only and workflows are invoked via dedicated endpoints; an agent that wants to file or update an issue, or trigger a workflow as part of its reasoning, has no tool to call.

These problems point at one underlying simplification: the primary and a subagent should be the same kind of thing, and that kind of thing owns its own tools and cage. Project references, when flattened, are also the same kind of thing. The DSL's content reorganizes around a single recursive shape.

This ADR makes that reorganization the new ground floor.

Decision

The DSL collapses PrimaryAgent and Subagent into a single recursive AgentSpec. AgentSpec carries its own cage and its own tools. A project's primary field is an AgentSpec; a subagent of any agent is an AgentSpec; an agent's subagents field is Record<string, AgentSpec>, recursively. Project-reference subagents flatten into AgentSpec subtrees at compile time. can_be_called_by, interconnect, cage_defaults, and the project-level tools: block are removed. The tree structure is the call graph, and no agent inherits cage or tool settings from the project or from its parent; only the root agent receives positional tool defaults.

The specific changes:

  1. One agent shape. AgentSpec = { model, system_prompt, description?, parameters?, cage, tools?, subagents? }. Used identically for ProjectDsl.primary and for every entry under any subagents map at any depth.

  2. Recursive nesting. AgentSpec.subagents is Record<string, AgentSpec> and may itself contain agents with their own subagents. Depth is bounded (16 levels, same as the existing project-reference depth limit). The tree is the call graph: a parent agent can call its direct children; sibling and cross-tree calls do not exist.

  3. cage moves to the agent. AgentSpec.cage is required on every agent including the primary. cage_defaults is removed from the project schema. Each agent declares its own cage; there is no inheritance between parent and child cages because cages are per-process and a child runs in its own sandbox context.

  4. tools moves to the agent. AgentSpec.tools is an optional per-agent override map with the existing ToolOverride shape. The project-level tools: block is removed. Tool resolution per agent: built-in defaults → role-based defaults (see below) → agent's own tools: block → cage filter at dispatch.

  5. Role-based defaults are positional. The agent reachable via ProjectDsl.primary (the "root agent") gets kaged.issue.* and kaged.workflow.* enabled by default. Every other agent in the tree starts with an empty tool set; the operator opts in per agent. There is no schema-level "primary vs subagent" distinction — the default is a property of position.

  6. can_be_called_by is removed. Sibling-to-sibling and grandparent-to-grandchild dispatch are no longer expressible. If two agents need to be called by a third, they become children of that third agent.

  7. interconnect is removed. Reasoning-driven dispatch via the synthetic agent-{key} tools is the only dispatch mechanism. The LLM in the parent decides when to call which child.

  8. Project references flatten into AgentSpec subtrees. A subagents.<name>: { path: project:/..., name?, description?, overrides? } entry resolves at compile time as follows: the nested project's primary becomes this entry (with overrides applied), and the nested project's subagents become this entry's subagents, recursively. After flattening there is no residual path: or _compiled wrapper, only AgentSpec.

  9. New tool namespaces. kaged.issue.* (create, update, comment, transition, list, get) and kaged.workflow.* (trigger, list, status). These tools are registered in the built-in tool registry per docs/specs/agent-tooling.md. Each carries a root-only principal-scope tag, and the schema rejects them on any non-root agent.

  10. Issue bubble-up is the only subagent-issue pattern. Subagents do not have kaged.issue.* access. Issue context that a subagent needs to act on is part of the delegation message the parent sends. Subagent return values bubble back to the parent, which decides whether to update the issue. This keeps subagents domain-blind and the audit trail single-rooted.

  11. The synthesized endpoint is total truth. GET /api/v1/projects/:id/dsl/synthesized returns the fully flattened, resolved DSL: project references expanded to AgentSpec trees, all tools: overrides applied, all defaults materialized, no path: or wrapper fields remaining. This is the shape the harness consumes.
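Items 4 and 5 describe the per-agent resolution order. The minimal TypeScript sketch below makes that order concrete, under several assumptions:

- ToolOverride is modeled as { enabled: boolean }.
- A trailing .* matches a namespace prefix.
- The built-in defaults are passed in as a list of patterns.
- The cage filter is a caller-supplied predicate.

None of the names here (resolveTools, ResolveInput, cageAllows) are the @kaged/agent-tooling API. The root-only rejection in item 9 happens earlier, at parse time, and is not repeated here.

```typescript
// Hypothetical shape; the real ToolOverride is defined in docs/specs/agent-tooling.md.
type ToolOverride = { enabled: boolean };

interface ResolveInput {
  registry: string[];                      // every registered tool name
  builtinDefaults?: string[];              // step 1: built-in defaults (patterns)
  isRoot: boolean;                         // step 2: role defaults are positional
  tools?: Record<string, ToolOverride>;    // step 3: the agent's own tools: block
  cageAllows?: (tool: string) => boolean;  // step 4: assumed cage predicate
}

// Positional defaults for the agent at ProjectDsl.primary.
const ROOT_DEFAULTS = ["kaged.issue.*", "kaged.workflow.*"];

// "file.*" matches "file.read"; any other pattern must match exactly.
function matches(pattern: string, name: string): boolean {
  return pattern.endsWith(".*")
    ? name.startsWith(pattern.slice(0, -1))
    : pattern === name;
}

function resolveTools({
  registry,
  builtinDefaults = [],
  isRoot,
  tools = {},
  cageAllows = () => true,
}: ResolveInput): string[] {
  const enabled = new Set<string>();
  // Enable or disable every registered tool the pattern matches.
  const apply = (pattern: string, on: boolean): void => {
    for (const name of registry) {
      if (!matches(pattern, name)) continue;
      if (on) enabled.add(name);
      else enabled.delete(name);
    }
  };

  for (const p of builtinDefaults) apply(p, true);                  // 1. built-in
  if (isRoot) for (const p of ROOT_DEFAULTS) apply(p, true);        // 2. role-based
  for (const [p, o] of Object.entries(tools)) apply(p, o.enabled);  // 3. agent override
  return registry.filter((n) => enabled.has(n) && cageAllows(n));   // 4. cage filter
}
```

With empty built-in defaults, the results look like this:

- A root agent with no tools: block resolves to only its kaged.issue.* and kaged.workflow.* tools (flavor B).
- The scraper from the example below resolves to exactly file.read and search.grep.

Because overrides are applied in declaration order, a later "file.write": { enabled: false } can carve an exception out of an earlier "file.*".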

The unified shape

Before (current v1; this pair of examples illustrates only the structural change):

version: 1
project: example

primary:
  model: smart-generalist
  system_prompt: project:/prompts/primary.md

cage_defaults:
  fs: []
  net: { allow: [] }
  state: ephemeral

subagents:
  scraper:
    model: low-cost-fast
    system_prompt: project:/prompts/scraper.md
    cage:
      fs: [{ mode: ro, path: data }]
      net: { allow: ["*.example.com"] }
      state: ephemeral
    can_be_called_by: [primary]

  deployer:
    model: smart-careful
    system_prompt: project:/prompts/deployer.md
    cage: disabled
    can_be_called_by: [primary, scraper]

  child_project:
    path: project:/sub/builder
    overrides:
      primary:
        model: smart-careful

interconnect:
  release_pipeline:
    from: scraper
    to: deployer
    on: found_release

tools:
  "dap.*": null

After:

version: 1
project: example

primary:
  model: smart-generalist
  system_prompt: project:/prompts/primary.md
  cage: disabled                            # interim; see § Interim state
  # tools: implicit — root agent gets kaged.issue.* and kaged.workflow.*
  subagents:
    scraper:
      model: low-cost-fast
      system_prompt: project:/prompts/scraper.md
      cage:
        fs: [{ mode: ro, path: data }]
        net: { allow: ["*.example.com"] }
        state: ephemeral
      tools:
        "file.read": { enabled: true }
        "search.grep": { enabled: true }

    deployer:
      model: smart-careful
      system_prompt: project:/prompts/deployer.md
      cage: disabled
      tools:
        "file.*": { enabled: true }
        "kaged.workflow.trigger": { enabled: true }   # PARSE ERROR: only root may have kaged.*

    child_project:
      path: project:/sub/builder              # flattened at compile time
      overrides:
        model: smart-careful

After flattening, the synthesized endpoint returns child_project as a plain AgentSpec: the nested project's primary with the overrides applied (here, model: smart-careful), its own subagents map inlined, and no path: remaining. The shape is uniform from top to bottom.
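The TypeScript sketch below shows compile-time flattening (item 8). The names flatten, load, ProjectRef, and AgentEntry are hypothetical. load stands in for reading and compiling a referenced project, and returns that project's primary.

Assumptions and simplifications:

- Overrides are applied as a shallow spread, which simplifies the ADR-0015 merge semantics.
- The name? field is not handled.
- Reference paths are compared as plain strings.
- Depth is capped at 16 levels.
- A path that reappears on the current reference chain is reported as a cycle.

```typescript
type Cage =
  | "disabled"
  | { fs: { mode: "ro" | "rw"; path: string }[]; net: { allow: string[] }; state: string };

interface AgentSpec {
  model: string;
  system_prompt: string;
  description?: string;
  cage: Cage;
  tools?: Record<string, { enabled: boolean }>;
  subagents?: Record<string, AgentEntry>;
}

// A subagents entry that points at another project instead of declaring an agent.
interface ProjectRef {
  path: string;
  description?: string;
  overrides?: Partial<AgentSpec>;
}

type AgentEntry = AgentSpec | ProjectRef;

const MAX_DEPTH = 16;

function isRef(entry: AgentEntry): entry is ProjectRef {
  return "path" in entry;
}

function flatten(
  entry: AgentEntry,
  load: (path: string) => AgentSpec,
  depth = 1,
  chain: string[] = [],
): AgentSpec {
  if (depth > MAX_DEPTH) throw new Error(`agent tree deeper than ${MAX_DEPTH} levels`);
  let spec: AgentSpec;
  if (isRef(entry)) {
    if (chain.includes(entry.path)) {
      throw new Error(`project reference cycle: ${[...chain, entry.path].join(" -> ")}`);
    }
    chain = [...chain, entry.path];
    // The nested project's primary becomes this entry, with overrides on top.
    spec = { ...load(entry.path), ...entry.overrides };
    if (entry.description !== undefined) spec.description = entry.description;
  } else {
    spec = { ...entry };
  }
  if (spec.subagents) {
    // Recurse so nested references are replaced too; no ProjectRef survives.
    const children: Record<string, AgentSpec> = {};
    for (const [key, child] of Object.entries(spec.subagents)) {
      children[key] = flatten(child, load, depth + 1, chain);
    }
    spec.subagents = children;
  }
  return spec;
}
```

Every ProjectRef is replaced by the AgentSpec it resolves to before recursing. The returned tree therefore contains no path: keys, which is the property the synthesized endpoint (item 11) relies on.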

Consequences

What this commits us to

  • A single AgentSpec type in @kaged/dsl, used for every position in the agent tree. The schema mirror (Zod + JSON Schema) gets a recursive type alias.
  • A documented "root agent" concept in docs/specs/project-dsl.md — the agent at ProjectDsl.primary — with explicit default-tool rules attached to that position.
  • Two new tool namespaces in docs/specs/agent-tooling.md: kaged.issue.* and kaged.workflow.*. Each tool definition declares a principal scope (root-only initially), and enabling a root-only tool on a non-root agent is rejected.
  • A synthesized endpoint that returns a fully inlined tree. Implementation work in packages/dsl (compiler) and packages/daemon (handler) per docs/specs/http-api.md.
  • A clear, single rule for the call graph: parent → direct children, period. Documented in docs/specs/agent.md and reflected in the Mastra supervisor wiring.
  • A documented issue bubble-up pattern in docs/specs/issues.md — issues live with the root agent; subagents receive issue context as delegation framing; subagent returns are interpreted by the parent.
  • A regenerated set of example DSL files in docs/dsl/examples/ demonstrating the new shape. Existing examples are rewritten in place; the only change to the file list is that defaults.yaml may give way to a nested.yaml (see § Spec amendments required).

What this forecloses

  • No sibling-to-sibling dispatch. Two top-level subagents cannot call each other directly. If the operator wants that pattern, they nest one under the other or move both under a shared parent.
  • No grandparent-to-grandchild dispatch. A root agent cannot reach across a child to invoke a grandchild. Each level of the tree mediates the level below it.
  • No declarative event-routed dispatch. interconnect is gone. If reasoning-driven dispatch is insufficient for a workflow, the workflow's prompt orchestrates the steps; the DSL no longer carries event hooks.
  • No project-level cage defaults. Each agent declares its own cage. The repetition this introduces in projects with many similarly-caged subagents is real and accepted as the cost of explicitness; a project-level helper macro is a plausible v1.x addition.
  • No project-level tool overrides. Each agent declares its own tool surface. A project-wide "disable DAP" requires touching each agent (or, more typically, having only one or two agents to which DAP would apply).
  • No kaged.issue.* or kaged.workflow.* on non-root agents. The schema rejects them, so a subagent cannot file or transition issues. This is enforced, not merely discouraged.
  • The _compiled wrapper shape from the 2026-05-26 project-reference amendment. It existed only as long as project references had a different shape from regular subagents. With unification, the wrapper has no remaining purpose.
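The schema-level rejection of kaged.* tools on non-root agents can be sketched in TypeScript as a tree walk. It flags any tools: entry below the root that matches a root-only tool. checkPrincipalScope, the dotted error paths, and the registry map are hypothetical.

The sketch also makes two assumptions:

- Any mention of a root-only tool on a non-root agent is rejected, even { enabled: false }. This is one reading of "rejects them", not a settled rule.
- A trailing .* is treated as a namespace prefix.

```typescript
type PrincipalScope = "root-only" | "any";

interface AgentSpec {
  tools?: Record<string, { enabled: boolean }>;
  subagents?: Record<string, AgentSpec>;
}

// "kaged.*" matches "kaged.issue.create"; any other pattern must match exactly.
function matches(pattern: string, name: string): boolean {
  return pattern.endsWith(".*")
    ? name.startsWith(pattern.slice(0, -1))
    : pattern === name;
}

// One error per tools: entry on a non-root agent that names any root-only tool.
function checkPrincipalScope(
  agent: AgentSpec,
  registry: Record<string, PrincipalScope>,
  path = "primary",
  isRoot = true,
): string[] {
  const errors: string[] = [];
  if (!isRoot) {
    for (const pattern of Object.keys(agent.tools ?? {})) {
      const hits = Object.keys(registry).filter(
        (name) => registry[name] === "root-only" && matches(pattern, name),
      );
      if (hits.length > 0) {
        errors.push(`${path}.tools["${pattern}"]: root-only tool(s) ${hits.join(", ")}`);
      }
    }
  }
  // Every agent below the root is non-root, at any depth.
  for (const [key, child] of Object.entries(agent.subagents ?? {})) {
    errors.push(...checkPrincipalScope(child, registry, `${path}.subagents.${key}`, false));
  }
  return errors;
}
```

Run against the After example, this reports exactly one error: the deployer's "kaged.workflow.trigger" entry.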

What becomes easier

  • Reading a DSL file. One shape, recursive, no special cases for "primary" vs "subagent" vs "project-reference." The reader's mental model is a single tree.
  • Writing a flavor-B project (primary coordinates; subagents work). The operator declares primary with no tools: override (gets PM tools by default), then declares primary.subagents.<worker> with the work tools each needs. There is no schema flag or mode toggle to remember.
  • Writing a flavor-A project (primary does work directly). The operator overrides primary.tools to add file.*, lsp.*, etc. Same shape, different content.
  • Reasoning about cage scope. Each agent is a node with its own cage. There is no inheritance to chase, no default to remember.
  • Building the synthesized endpoint, the DSL viewer, the audit log, and any other consumer of compiled DSL. They all walk one shape.
  • Adding new agent positions in the future. A "scheduled agent" or a "system agent" is structurally just another AgentSpec.

What becomes harder

  • Repeating boilerplate. A project with eight subagents that all need fs: [{ mode: ro, path: data }] writes that line eight times. Documented tradeoff; addressed later with macros or templates if it becomes painful.
  • Migrating any in-repo DSL that uses a removed construct (can_be_called_by, interconnect, cage_defaults, or the project-level tools: block). The dogfood .kaged/project.yaml and every example in docs/dsl/examples/ must be rewritten. The repo is pre-alpha, so there are no external operators to migrate.
  • Designing prompts for nested-subagent chains. A grandchild's prompt must assume the parent's framing without seeing the grandparent's framing — same constraint as today, but now possible at arbitrary depth. Prompt-authoring guidance updated in docs/dsl/.
  • Reasoning about token cost in deep trees. Each delegation forwards filtered messages; deep trees forward more times. Existing audit infrastructure already records per-agent tokens; no new mechanism needed, but operators should be aware.

Spec amendments required

The decision above is the contract. The following spec amendments implement it. Each lands in its own PR per ADR-0003; each cites this ADR in its ## Amendments section.

  1. docs/specs/project-dsl.md: Define AgentSpec as the single recursive shape. Rewrite the primary and subagents sections to reference it. Add an ## AgentSpec section. Remove the cage_defaults section. Remove the interconnect section. Remove the top-level tools: section (moving it to AgentSpec.tools). Rewrite the project-reference flattening section to describe AgentSpec subtree output. Update the JSON Schema in Appendix A. Update the top-level shape example.
  2. docs/specs/agent-tooling.md: Add the kaged.issue.* and kaged.workflow.* namespaces with full tool definitions. Add a principal_scope: "root-only" | "any" field to ToolDefinition. Document tool resolution per agent (built-in → role-default → agent override → cage filter). Update the namespace table.
  3. docs/specs/agent.md: Update the Mastra supervisor section so the harness walks the recursive AgentSpec tree, registering each child as a synthetic agent-{key} tool on its parent. No can_be_called_by checks. No event routing. Per-agent tool resolution.
  4. docs/specs/federated-config.md: Update the project-reference flattening section so the output is an AgentSpec subtree, not a wrapped _compiled shape. Cycle detection unchanged. Depth limit unchanged.
  5. docs/specs/http-api.md: Update the GET /api/v1/projects/:id/dsl/synthesized section so the response is the fully flattened DSL, with no path: entries and all overrides applied. Document the shape contract.
  6. docs/specs/issues.md: Add an ## Issue bubble-up section. Subagents do not access issues directly, delegation framing carries issue context, and subagent returns feed the root agent's decision on whether to update the issue.
  7. docs/specs/workflows.md: Update the tool intersection logic so workflows compose against the root agent's tool surface, not the (removed) project-level tools: block.
  8. docs/dsl/examples/: Rewrite every example file: single-subagent.yaml, defaults.yaml, insecure.yaml, portable.yaml, and any others present. Remove defaults.yaml if cage_defaults was its only reason to exist, and replace it with a nested.yaml showing recursive subagents.
  9. docs/dsl/README.md: Update the operator-facing prose to describe the recursive shape. Update the example index.
  10. .kaged/project.yaml (dogfood): Update to the new shape.
  11. JSON Schema at kaged.dev/schema/v1.json: Republish with the recursive AgentSpec shape. See the Schema version paragraph under § Interim state.
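Amendment 3 describes the walk the harness performs over the tree. In the TypeScript sketch below, each agent is keyed by a dotted path and receives one synthetic agent-{key} tool per direct child and nothing else, so the call graph is exactly the tree. syntheticTools and the dotted-path keys are hypothetical. The actual Mastra registration calls are deliberately not shown, since this ADR does not specify them.

```typescript
interface AgentSpec {
  model: string;
  subagents?: Record<string, AgentSpec>;
}

// Map from agent path to the synthetic tools that agent may call.
function syntheticTools(
  agent: AgentSpec,
  path = "primary",
  out: Map<string, string[]> = new Map(),
): Map<string, string[]> {
  const children = Object.entries(agent.subagents ?? {});
  // Only direct children become callable; grandchildren are reached through them.
  out.set(path, children.map(([key]) => `agent-${key}`));
  for (const [key, child] of children) {
    syntheticTools(child, `${path}.${key}`, out);
  }
  return out;
}
```

No can_be_called_by lookup or event table is needed. Whether agent X can call agent Y reduces to whether Y is a key in X's subagents map.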

After spec PRs land:

  • Tests. Each amended spec is followed by failing tests, per ADR-0003: parser tests (@kaged/dsl), tool registry tests (@kaged/agent-tooling), harness tests (@kaged/harness), daemon endpoint tests (@kaged/daemon), and example-validation tests (a CI walk of docs/dsl/examples/).
  • Code. Implementation follows tests. Approximate package surface: @kaged/dsl (schema + compiler), @kaged/agent-tooling (new tool registrations + principal-scope check), @kaged/harness (recursive Mastra wiring), @kaged/daemon (synthesized endpoint shape, issue-tool dispatch).
  • STATUS.md sync. Per the AGENTS.md hard sync rule, code changes land with matching STATUS.md entries in the same diff.

Existing ADRs amended

This ADR does not supersede any existing ADR. It amends the following:

  • ADR-0006 — DSL format unchanged (still YAML 1.2, still validated, still strict-mode). DSL content gets the recursive AgentSpec shape. An amendment block is added to ADR-0006 noting the content reorganization.
  • ADR-0009 — Sandbox mechanism unchanged. Cage location in the DSL moves from subagents.<name>.cage + cage_defaults to per-agent AgentSpec.cage. The primary now has a cage field; the supervisor work to actually cage the primary process is deferred to a follow-up (see § Interim state). An amendment block is added to ADR-0009.
  • ADR-0015 — Merge semantics unchanged. Project-reference flattening output changes from wrapped _compiled to direct AgentSpec subtree. Section 7 ("Compiled Contextualization") is amended in the federated-config spec.
  • ADR-0019 — Workflow model unchanged. Tool intersection logic now operates against the root agent's tool surface instead of the removed project-level tools: block. Amendment block added to ADR-0019 and to workflows.md.

Interim state

Two parts of this ADR need interim handling while the spec amendments land:

Primary cage runtime. The schema requires cage on every AgentSpec, including the primary. The supervisor does not cage the primary process today; the primary runs under the daemon's UID. Until the supervisor is extended to spawn the primary in its own bwrap context, the only legal value for the root agent's cage is disabled, and the DSL parser emits a parse-time error for any other value. The full cage block is accepted on every non-root agent. A follow-up ADR (or amendment to ADR-0009) schedules the supervisor work; the eventual default for the root agent is "locked to project root, no network, ephemeral state."
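The interim rule combines two checks: cage is required everywhere, and the root agent may only use disabled. The TypeScript sketch below shows one way to express this as a parse-time walk. ParsedAgent, checkCages, and the error strings are hypothetical and are not the @kaged/dsl parser.

```typescript
type Cage =
  | "disabled"
  | { fs: { mode: "ro" | "rw"; path: string }[]; net: { allow: string[] }; state: string };

// Parsed input: cage may be missing, which is exactly what the check reports.
interface ParsedAgent {
  cage?: Cage;
  subagents?: Record<string, ParsedAgent>;
}

// Every agent needs a cage; only the root is restricted to "disabled" (interim).
function checkCages(agent: ParsedAgent, path = "primary", isRoot = true): string[] {
  const errors: string[] = [];
  if (agent.cage === undefined) {
    errors.push(`${path}.cage: required`);
  } else if (isRoot && agent.cage !== "disabled") {
    errors.push(`${path}.cage: root agent must be "disabled" until the supervisor can cage it`);
  }
  for (const [key, child] of Object.entries(agent.subagents ?? {})) {
    errors.push(...checkCages(child, `${path}.subagents.${key}`, false));
  }
  return errors;
}
```

When the supervisor work lands, only the isRoot branch needs to change. The required-everywhere check stays as-is.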

Schema version. This is a breaking change to the DSL's content shape (cage_defaults removed, can_be_called_by removed, interconnect removed, project-level tools removed, recursive subagents added). The repo is pre-alpha and the established pattern from the named-object-map rewrite (2026-05-25) is "no migration support; new format enforced directly" at version: 1. We continue that pattern here: schema stays at version: 1, no migration tooling, dogfood and examples are rewritten in the same PR chain.

Alternatives considered

Alternative A — Keep primary and Subagent as separate types; add tools and recursive subagents to both

Why tempting: smaller schema change. Preserves the conceptual distinction between "the entry point" and "a worker."

Why rejected: the distinction is only meaningful by position. Two types that are structurally identical except for which fields are technically optional are a smell, not a design. Maintaining the dual-type system means every consumer (parser, harness, viewer, audit) branches on type when it could walk one shape. The position-based default-tool rule is cleaner than a type-based one.

Alternative B — Explicit kind: discriminator on each agent (simple vs coordinator)

Why tempting: a glance at the file tells you the operator's intent. Schema can validate "coordinator may not have work tools."

Why rejected: invents categories the operator didn't ask for and doesn't need. The shape of tools: already says everything: an agent with file tools is doing work; an agent with only kaged.* tools is coordinating. A kind: field is metadata duplicating the data. Per the discussion preceding this ADR: "we can invent C, D, E ideas without having to label them."

Alternative C — Keep can_be_called_by; allow sibling dispatch

Why tempting: existing examples use it. Real workloads sometimes look like "scraper finds X; deployer ships X" without needing a coordinator agent in between.

Why rejected: in recursive-agent systems the tree's hierarchy is typically the call graph, and adding sibling capability requires every consumer to track a side-channel call graph in addition to the tree. The "scraper → deployer" pattern is well-modeled by a coordinator parent with both as children; the LLM in the coordinator does the routing the can_be_called_by edge used to do. The simplification is worth the small refactor cost in existing examples.

Alternative D — Keep interconnect for event-routed dispatch

Why tempting: deterministic event routes are easier to reason about than LLM-driven dispatch.

Why rejected: kaged is built on LLM reasoning. If a workload genuinely needs deterministic step-graph execution, that is a workflow (ADR-0019) — a separate, intentional construct. The DSL's agent graph is for reasoning-driven dispatch. Two parallel dispatch mechanisms split the audit story and the operator's mental model. Pick one; we pick reasoning.

Alternative E — Per-agent tools but keep project-level tools as defaults

Why tempting: avoids per-agent boilerplate when many agents share a tool surface.

Why rejected: the role-based defaults (root gets PM tools; subagents start empty) already handle the common cases. The cases that remain are genuinely heterogeneous — a scraper, a deployer, and a builder rarely want the same tool surface. The "project-level defaults that some agents inherit and others don't" model would put us back to needing inheritance rules, which is the complexity we're escaping.

Alternative F — Defer the recursive change; only do cage/tools per-agent in this ADR

Why tempting: smaller blast radius. Easier review.

Why rejected: the recursive change is what makes the per-agent shape coherent. Project references already need to flatten into something; the choice is "flatten into a wrapper" or "flatten into AgentSpec." If we ship per-agent tools/cage without the recursive shape, project references stay weird and the next ADR has to do the unification anyway. One coherent change beats two half-changes.

Open questions

These are not blockers for accepting the ADR, but each needs a decision during the spec-amendment phase:

  1. Guest-facing agent topology. The current workflow spec assumes the workflow itself is the agent the guest interacts with. The conversation preceding this ADR raised whether there should be a dedicated guest-facing agent (a "concierge") in the project DSL that triages guest requests, picks workflows, and gathers inputs in natural language. This is genuinely a separate decision and belongs in a follow-up ADR (probably 0023). It does not change the shape decided here — a concierge would simply be another agent position the operator can declare. Left open.

  2. kaged.subagent.invoke as an explicit tool. Mastra exposes child agents as synthetic agent-{key} tools today. Should the synthetic tools also be visible in the tool registry under a kaged.subagent.* namespace for audit-log uniformity and configurability? Lean yes — it makes the operator-facing tool list complete — but defer to the agent-tooling spec amendment.

  3. Per-agent prompt-file naming convention. With recursive nesting, prompt files for deeply-nested agents proliferate. A naming convention (prompts/<agent-path>.md?) would help. Operator preference, not a schema concern. Documented in docs/dsl/ during amendment.

  4. Lint-time warning for child cage broader than parent cage. Recommended; not blocking. The parser can emit a warning when an agent's cage grants more than its parent's cage grants (e.g., parent is fs: [{mode: ro, path: src}] and child is cage: disabled). Defer the precise warning text and conditions to the project-dsl.md amendment.
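Open question 4 could be implemented with a rule like the one in this TypeScript sketch. The precise conditions are deferred to the project-dsl.md amendment, so every rule here is an assumption:

- An uncaged parent makes no child "broader".
- A disabled child under a caged parent always warns.
- An fs grant is covered by a parent grant on the same path or an ancestor path, provided the parent's mode is at least as permissive (rw covers ro).
- Network hosts are compared as exact strings, with no glob reasoning.
- state is ignored.

broaderThanParent is a hypothetical name.

```typescript
type FsGrant = { mode: "ro" | "rw"; path: string };
type Cage = "disabled" | { fs: FsGrant[]; net: { allow: string[] }; state: string };

// A parent grant covers a child grant on the same or a nested path with no wider mode.
function covers(parent: FsGrant, child: FsGrant): boolean {
  const pathOk = child.path === parent.path || child.path.startsWith(parent.path + "/");
  const modeOk = parent.mode === "rw" || child.mode === "ro";
  return pathOk && modeOk;
}

// Returns lint warnings (not errors) for each way the child exceeds the parent.
function broaderThanParent(parent: Cage, child: Cage): string[] {
  if (parent === "disabled") return [];  // nothing is broader than uncaged
  if (child === "disabled") return ["child is uncaged but its parent is caged"];
  const warnings: string[] = [];
  for (const g of child.fs) {
    if (!parent.fs.some((p) => covers(p, g))) {
      warnings.push(`fs ${g.mode} ${g.path} is not granted to the parent`);
    }
  }
  for (const host of child.net.allow) {
    if (!parent.net.allow.includes(host)) {
      warnings.push(`net ${host} is not allowed for the parent`);
    }
  }
  return warnings;
}
```

Under the interim root cage (disabled), this rule never fires for the root's direct children. It only becomes useful below a caged agent.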

References

  • ADR-0003 — doc-first then TDD; the process this ADR feeds into
  • ADR-0006 — DSL format; amended by this ADR (content, not format)
  • ADR-0009 — sandbox mechanism; amended by this ADR (cage location + primary cage scheduled)
  • ADR-0011 — portability; preserved (paths still project-relative; aliases unchanged)
  • ADR-0012 — Mastra supervisor pattern; the harness wiring this ADR adjusts
  • ADR-0015 — federated config; amended by this ADR (project-reference flattening shape)
  • ADR-0019 — workflows; amended by this ADR (tool intersection target)
  • ADR-0020 — issues; this ADR adds the kaged.issue.* tool namespace and the bubble-up pattern
  • docs/specs/project-dsl.md — the spec carrying most of the change
  • docs/specs/agent-tooling.md — the spec gaining new namespaces
  • docs/specs/federated-config.md — the spec amended for flattening
  • Original discussion: design conversation with colleagues, 2026-05-26