ADR-0009: Sandbox technology is bubblewrap; network allowlist is kaged-managed

Status: Accepted
Date: 2026-05-21
Last amended: 2026-05-26
Deciders: @karasu
Supersedes: —
Superseded by: —
Follows from: RFC-0002

Context

RFC-0002 explores the Linux sandboxing landscape — bwrap, firejail, podman-rootless, nsjail, systemd-nspawn, and a pluggable scheme — against kaged's constraints:

The cage's DSL contract (ADR-0006) declares fs allowlist, net allowlist, state ephemerality.
Must run on small Linux hosts (ARM64, modest resources) and on desktop Linux.
No Docker daemon, no setuid binaries, no kernel patches, no extras at install time.
Spawn cost must be low — subagents are units of work, not services.
The supervisor needs to stream stdout/stderr, signal, reap.
v0 is Linux-only. macOS/Windows deferred.

The cage is brand-critical: "your agents are caged" is one of the top three pitches. The mechanism has to back that claim.

The RFC's lean is bwrap as the default mechanism, with the network allowlist as a kaged-managed concern on top.

Decision

The [CAGED] mechanism in v0 is bubblewrap (bwrap). The kaged daemon's subagent supervisor spawns every subagent under bwrap with a cage policy compiled from the project DSL. Network allowlisting is enforced by kaged itself via a per-cage network namespace plus a kaged-managed DNS+nftables setup. Resource limits use cgroups via systemd-run when systemd is available, with a documented non-systemd fallback. A default seccomp profile blocks dangerous syscalls; the DSL may relax it explicitly per cage.

What the daemon ships

bwrap as the spawn mechanism. Required on every kaged host. Documented in install prerequisites.
A cage compiler (packages/sandbox/) that translates a DSL cage: block into a bwrap argv plus auxiliary network setup.
A network gatekeeper. A small kaged-internal service (per host, not per cage) that owns DNS resolution and nftables rule generation for every running cage's network namespace.
A default seccomp profile. Blocks ptrace, kexec_load, init_module (kernel module loading), keyctl, mount, and similar host-impacting calls. Lives at a known path inside the daemon and is loaded for every cage.
A cgroups wrapper. When systemd is detected, spawn through systemd-run --scope --slice=kaged.slice for resource limits. Otherwise, fall back to direct cgroup v2 manipulation (cgroup-tools or raw /sys/fs/cgroup/... writes) with documented limitations.
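The cgroups wrapper described above can be sketched as a small argv builder. This is a minimal illustration, not the packages/sandbox/ implementation: the `CgroupLimits` shape, the function names, and the `pidsMax` value are assumptions made for the example. `MemoryMax`, `CPUQuota`, and `TasksMax` are standard systemd resource-control properties.

```typescript
// Hypothetical limit shape; field names are illustrative, not the kaged DSL.
interface CgroupLimits {
  memoryMaxMb: number; // hard memory ceiling, in MiB
  cpus: number;        // CPU allowance, where 1 means one full CPU
  pidsMax: number;     // maximum number of tasks in the scope
}

// Wraps a command in a transient systemd scope under kaged.slice.
function wrapWithSystemdRun(limits: CgroupLimits, command: string[]): string[] {
  return [
    "systemd-run", "--scope", "--slice=kaged.slice",
    "-p", `MemoryMax=${limits.memoryMaxMb}M`,
    "-p", `CPUQuota=${Math.round(limits.cpus * 100)}%`,
    "-p", `TasksMax=${limits.pidsMax}`,
    "--",
    ...command,
  ];
}

interface WrappedSpawn {
  argv: string[];
  manualCgroup: boolean; // true when the caller must apply the cgroup v2 fallback itself
}

// Picks the systemd path when systemd was detected; otherwise returns the
// command unchanged and flags that the non-systemd fallback is required.
function wrapForHost(systemdDetected: boolean, limits: CgroupLimits, command: string[]): WrappedSpawn {
  if (systemdDetected) {
    return { argv: wrapWithSystemdRun(limits, command), manualCgroup: false };
  }
  return { argv: [...command], manualCgroup: true };
}

const exampleSpawn = wrapForHost(
  true,
  { memoryMaxMb: 256, cpus: 1, pidsMax: 64 },
  ["bwrap", "--unshare-net", "/usr/bin/agent"],
);
console.log(exampleSpawn.argv.join(" "));
```

Returning a flag instead of silently skipping limits keeps the fallback explicit: the supervisor can refuse to spawn if neither path applied the limits.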

Network allowlist enforcement (the hard part)

We commit to a phased rollout, documented in the spec (docs/specs/sandbox.md):

Phase 0 — binary net gate (cuttable for very early v0): A cage with an empty net.allow: gets --unshare-net (no network at all). A cage with net.allow: ["*"] gets the daemon's network. Anything in between is rejected by the parser with "hostname allowlisting is not yet supported in this kaged version." This unblocks the rest of the daemon work.
Phase 1 — kaged-managed allowlist (v0 release target): Per-cage network namespace + veth pair to the daemon's namespace. A kaged-managed userspace SOCKS5 proxy on the daemon side does hostname-aware filtering. A kaged-managed resolver in the cage's netns resolves only hostnames that match an allowlisted glob pattern. nftables drops everything not destined for the proxy.
Phase 2 — kernel-level filtering (v1): Move hostname filtering closer to the kernel: kaged installs nftables rules driven by SNI/HTTP-Host introspection, eliminating the userspace proxy hop. Stretch goal.

The supervisor never lets a cage run without one of these enforcement states active. Network setup failure = subagent spawn failure.
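The Phase 0 binary gate is simple enough to sketch directly. The following is an illustration under assumptions: the function and type names are hypothetical, and the only behavior taken from this ADR is the three-way split (empty list, `["*"]`, anything else) and the rejection message.

```typescript
// Network modes available in Phase 0: no network, or the daemon's network.
type Phase0NetMode = "none" | "host";

// Rejection message quoted from the Phase 0 description.
const PHASE0_UNSUPPORTED = "hostname allowlisting is not yet supported in this kaged version.";

// Maps a net.allow list to a Phase 0 mode, rejecting anything in between.
function phase0NetGate(allow: string[]): Phase0NetMode {
  if (allow.length === 0) return "none";
  if (allow.length === 1 && allow[0] === "*") return "host";
  throw new Error(PHASE0_UNSUPPORTED);
}

// bwrap shares the caller's network unless --unshare-net is passed,
// so "host" contributes no extra flag.
function phase0NetArgs(mode: Phase0NetMode): string[] {
  return mode === "none" ? ["--unshare-net"] : [];
}

console.log(phase0NetArgs(phase0NetGate([])));
```

Rejecting intermediate allowlists at parse time, rather than silently widening or narrowing them, keeps the declared policy and the enforced policy identical.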

Default cage profile

When a subagent declares no cage: block, the implicit cage is maximally restrictive:

fs: [] (no host filesystem access; subagent sees only a tmpfs /, /tmp, and a minimal /usr if explicitly requested)
net: [] (no network)
state: ephemeral
seccomp: default profile applied
cgroups: default limits (e.g., 256MB RAM, 1 CPU)

There is no "wide open" implicit cage. Every grant is an explicit decision in the DSL.
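As a sketch, the implicit cage can be written as a constant that every declared cage starts from. The `CagePolicy` shape and field names are assumptions for illustration; the limits mirror the example values in the list above, and the state names are not final. Layering a partially declared block on top of the defaults is also an assumption: this ADR only specifies the case where no cage block is declared.

```typescript
// Hypothetical in-memory cage policy; not the final DSL schema.
interface CagePolicy {
  fs: string[];                      // host paths granted to the subagent
  net: string[];                     // network allowlist entries
  state: "ephemeral" | "persistent"; // names are placeholders until the DSL spec lands
  seccomp: "default";
  cgroups: { memoryMb: number; cpus: number };
}

// Maximally restrictive defaults, using the example limits from the list above.
const DEFAULT_CAGE: CagePolicy = {
  fs: [],
  net: [],
  state: "ephemeral",
  seccomp: "default",
  cgroups: { memoryMb: 256, cpus: 1 },
};

// Starts from the restrictive default and applies only what was declared.
// Arrays are copied so callers cannot mutate the shared default.
function resolveCage(declared?: Partial<CagePolicy>): CagePolicy {
  const d: Partial<CagePolicy> = declared ?? {};
  return {
    fs: [...(d.fs ?? DEFAULT_CAGE.fs)],
    net: [...(d.net ?? DEFAULT_CAGE.net)],
    state: d.state ?? DEFAULT_CAGE.state,
    seccomp: d.seccomp ?? DEFAULT_CAGE.seccomp,
    cgroups: { ...DEFAULT_CAGE.cgroups, ...(d.cgroups ?? {}) },
  };
}

console.log(resolveCage());
```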

Threat model (the line we're committing to)

In scope (kaged is responsible for):

A subagent breaking out of its bwrap mounts and reading host paths not in its fs allowlist → P0 bug
A subagent reaching a network destination not in its allowlist → P0 bug
A subagent persisting state to host disk when declared ephemeral → P0 bug
A subagent signaling the daemon or another subagent → P0 bug
A subagent escalating to root on the host → P0 bug
A subagent exfiltrating other subagents' memory → P0 bug
The supervisor failing to apply a cage policy at all → P0 bug

Out of scope (kaged cannot defend against):

Linux kernel CVEs that allow namespace escape from any sandbox using namespaces. (We document the kernel version baseline.)
The operator deliberately granting wide-open cages and being surprised when subagents act on that grant.
LLM-provider-side data leakage (the model exfiltrates secrets through its outputs to its hosted provider). This is a model-trust problem, not a sandbox problem. Documented as a separate concern.
Side-channel attacks (CPU caching, Spectre-class). We are not building a CPU sandbox.
DoS by a misbehaving subagent that exhausts its cgroup limits. The supervisor kills and reports; we do not promise it never happens.

Consequences

What this commits us to

A packages/sandbox/ package that owns the cage compiler, network gatekeeper, seccomp profile, and cgroups wrapper. This is a real component, not a thin shell-out.
Sustained engineering on the network gatekeeper. The userspace proxy + resolver is non-trivial code. It is a security-critical service.
bwrap as a runtime dependency. Documented prerequisite. The installer / first-run check verifies bwrap is present.
Linux-only v0. macOS and Windows are an explicit deferral until we have an analogous sandbox story per platform.
A kernel version baseline. We document the minimum supported kernel (likely 5.10+ for stable user-namespace support on the small-host hardware we target).
Sandbox tests on every CI run, including escape attempts. An escape attempt that the cage should block (for example, the subagent tries to read /etc/shadow) must be blocked on every build; if the read ever succeeds, the build fails.
A documented "what the cage cannot defend against" page. Operators get an honest threat model, not marketing.
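The escape-attempt tests mentioned in the list above could take roughly the following shape. This is a hedged sketch: `CageRunner`, `CageRunResult`, and `escapeAttemptBlocked` are hypothetical names, and the runner is injected so the check does not assume how kaged actually spawns bwrap.

```typescript
// Result of running a command inside a cage.
interface CageRunResult {
  exitCode: number;
  stdout: string;
}

// Hypothetical hook: runs argv inside a cage and reports what happened.
type CageRunner = (argv: string[]) => CageRunResult;

// The attempt counts as blocked only if the read failed AND nothing leaked.
function escapeAttemptBlocked(run: CageRunner, forbiddenPath: string): boolean {
  const result = run(["cat", forbiddenPath]);
  return result.exitCode !== 0 && result.stdout.length === 0;
}

// Stub runner standing in for a real cage spawn, so the sketch runs without bwrap.
const stubBlockedRunner: CageRunner = () => ({ exitCode: 1, stdout: "" });
console.log(escapeAttemptBlocked(stubBlockedRunner, "/etc/shadow"));
```

In CI, the stub would be replaced by a runner that spawns a real default cage, and a `false` result would fail the build.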

What this forecloses

No firejail in v0. Operators who prefer it wait for the pluggable-sandbox story in v1.x.
No podman-as-cage in v0. Same. The container/image use case is real but adds significant scope.
No macOS / Windows v0 support. Operators on those platforms run kaged in a Linux VM.
No skipping the network gatekeeper for "speed." Even when the cage is effectively open, the gatekeeper is in the path. This is a security property, not a performance choice.
No relying on Docker semantics anywhere. Even when we add a podman adapter later, the bwrap path remains the reference cage. The DSL is bwrap-shaped.

What becomes easier

The cage block in the DSL maps cleanly to a single mechanism. fs entries → --bind/--ro-bind. net allowlist → gatekeeper config. state ephemeral → --tmpfs. Operator mental model = the implementation.
Subagent spawn is fast. Tens of milliseconds, which makes the "primary delegates to subagent, subagent finishes, primary reads result" pattern feel responsive.
Cage policy is auditable. The supervisor logs the effective bwrap argv and gatekeeper rules for every spawn. Operators can grep their audit log to verify the cage they declared is the cage that ran.
Small-host deployment is realistic. No image registry, no container daemon, no setuid surface.
Brand promise is mechanically backed. "Caged" is not a marketing word; it's the file packages/sandbox/cage.ts.
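The mapping from cage entries to bwrap flags described in this list can be sketched as follows. This is an illustrative compiler under assumptions, not the contents of packages/sandbox/cage.ts: the `CageSpec` and `FsGrant` shapes are hypothetical, network handling is reduced to the on/off case, and mounting scratch state at /tmp is a guess. The flags themselves (`--ro-bind`, `--bind`, `--unshare-net`, `--tmpfs`) are real bwrap options.

```typescript
// Hypothetical compiled form of a cage block; the real DSL is defined in ADR-0006.
interface FsGrant {
  path: string;
  mode: "ro" | "rw";
}

interface CageSpec {
  fs: FsGrant[];
  netOpen: boolean;   // false means the cage gets no network at all
  ephemeral: boolean; // true means scratch state lives on a tmpfs
}

function compileBwrapArgv(cage: CageSpec, command: string[]): string[] {
  const argv: string[] = ["bwrap"];
  for (const grant of cage.fs) {
    // --ro-bind and --bind both take SRC DEST; the host path is mirrored in place.
    argv.push(grant.mode === "ro" ? "--ro-bind" : "--bind", grant.path, grant.path);
  }
  if (!cage.netOpen) {
    argv.push("--unshare-net"); // fresh, empty network namespace
  }
  if (cage.ephemeral) {
    argv.push("--tmpfs", "/tmp"); // assumption: scratch state is mounted at /tmp
  }
  // bwrap treats the first non-option argument as the command to run.
  return argv.concat(command);
}

const auditLine = compileBwrapArgv(
  { fs: [{ path: "/srv/data", mode: "ro" }], netOpen: false, ephemeral: true },
  ["/usr/bin/agent"],
).join(" ");
console.log(auditLine);
```

Because the output is a plain argv, the same string can be written to the audit log and compared against what the operator declared.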

What becomes harder

The network gatekeeper is real engineering. A correct userspace proxy that hostname-filters TCP+TLS is not trivial. We accept that scope.
Debugging a misbehaving cage is more involved than debugging a misbehaving process. Operators learn to read bwrap argv and to use the daemon's "inspect cage" UI. Documentation tax.
Per-cage netns has overhead. Spawning a netns + setting up nftables is on the order of 10-100ms. We design for it but it's a real cost.
The seccomp default will occasionally bite a legitimate subagent. We need an escape hatch (cage.seccomp: relaxed or per-syscall opt-in) and a clear error message when a subagent is killed by the filter.
Pluggable sandboxing is now a v1 concern, not v0. Operators who want podman cages have to wait. We accept the pressure.
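One possible shape for the per-syscall seccomp escape hatch mentioned in this list is sketched below. Everything here is an assumption for illustration: the blocklist contains only the syscalls this ADR names explicitly, the opt-in list format is hypothetical, and the error text is a placeholder for whatever the spec settles on.

```typescript
// Syscalls named in this ADR's default profile description; the real profile is larger.
const DEFAULT_BLOCKED_SYSCALLS: readonly string[] = [
  "ptrace", "kexec_load", "init_module", "keyctl", "mount",
];

// Removes explicitly opted-in syscalls from the default blocklist.
// Unknown names are rejected so a typo cannot pass as a no-op.
function effectiveBlockedSyscalls(optIn: string[]): string[] {
  for (const syscallName of optIn) {
    if (DEFAULT_BLOCKED_SYSCALLS.indexOf(syscallName) === -1) {
      throw new Error(`cage.seccomp: "${syscallName}" is not in the default blocklist`);
    }
  }
  return DEFAULT_BLOCKED_SYSCALLS.filter((s) => optIn.indexOf(s) === -1);
}

// Placeholder wording for the error shown when the filter kills a subagent.
function seccompKillMessage(subagent: string, syscallName: string): string {
  return `subagent "${subagent}" was killed by the seccomp filter (blocked syscall: ${syscallName}); ` +
    `allow it explicitly in the cage block if this is intended`;
}

console.log(effectiveBlockedSyscalls(["mount"]));
console.log(seccompKillMessage("deployer", "ptrace"));
```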

Open spec questions (not load-bearing; to be resolved in `docs/specs/sandbox.md`)

Exact seccomp profile. Which syscalls block, which warn, which allow. Reference Flatpak's profile and Docker's default profile as starting points.
cgroups defaults. Memory limit, CPU shares, PIDs limit. v0 defaults to be tuned for low-resource ARM64 hardware as the lower bound.
DSL net allowlist grammar details. Glob patterns (*.bandcamp.com), exact CIDR support, port restriction (hostname:443 vs hostname).
Gatekeeper resolver caching. How long does the resolver cache an allowlisted hostname's resolution? How are edge cases handled when a hostname's IP changes while a stale entry is still cached?
State semantics naming. ephemeral vs scratch vs persistent — final names land in the DSL spec.
How the operator inspects a running cage. The UI shows effective policy; the spec defines the API endpoint.
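To make the allowlist grammar question concrete, here is one possible reading of glob patterns with optional port restriction. The grammar is explicitly undecided, so this is a sketch of a candidate, not the spec: all names are hypothetical, `*.` is assumed to match one or more subdomain labels but not the bare domain, and CIDR entries and IPv6 literals are not handled.

```typescript
// Hypothetical parsed allowlist entry.
interface AllowEntry {
  hostPattern: string; // exact hostname or "*.suffix" glob, lowercased
  port?: number;       // absent means any port
}

// Parses "hostname" or "hostname:port".
function parseAllowEntry(raw: string): AllowEntry {
  const idx = raw.lastIndexOf(":");
  if (idx === -1) return { hostPattern: raw.toLowerCase() };
  const port = Number(raw.slice(idx + 1));
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port in allowlist entry: ${raw}`);
  }
  return { hostPattern: raw.slice(0, idx).toLowerCase(), port };
}

// "*.bandcamp.com" matches "artist.bandcamp.com" but neither "bandcamp.com"
// nor "evilbandcamp.com", because the leading dot is part of the suffix.
function hostMatches(pattern: string, hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (pattern.startsWith("*.")) {
    const suffix = pattern.slice(1);
    return host.length > suffix.length && host.endsWith(suffix);
  }
  return host === pattern;
}

function isAllowed(entries: AllowEntry[], hostname: string, port: number): boolean {
  return entries.some(
    (e) => hostMatches(e.hostPattern, hostname) && (e.port === undefined || e.port === port),
  );
}

const sampleEntries = ["*.bandcamp.com", "api.example.com:443"].map(parseAllowEntry);
console.log(isAllowed(sampleEntries, "artist.bandcamp.com", 443));
```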

Alternatives considered

(Full design-space exploration in RFC-0002. Summarized here.)

Alternative A — firejail

Why rejected: Setuid binary is a trust smell for a tool whose pitch is "caged agents." Historical CVE record in the setuid path. firejail's profile grammar is less surgically expressive than bwrap's bind-mount model. The convenience of pre-shipped profiles doesn't apply when we're translating from a kaged-specific DSL anyway.

Alternative B — podman rootless

Why rejected for v0: Spawn time is too slow for our usage profile (subagents as units of work). Each cage becoming a container adds an image-management burden to the operator that the DSL does not currently model. Rootless setup overhead on small Linux hosts (subuid/subgid mappings, fuse-overlayfs) is real friction. The "I want versioned scraper:latest" use case is legitimate but premature.

We will likely add podman as a v1.x cage type for cages that want image semantics. The DSL would gain cage.image: registry.example.com/scraper:1.2.3 as an opt-in. Default cage stays bwrap.

Alternative C — nsjail

Why rejected: No clear advantage over bwrap for our use case. Smaller community. Less commonly packaged on non-Debian distros. The richer policy-file format is nice but we already template a config file (the DSL); we don't need a second one.

Alternative D — systemd-nspawn

Why rejected: Designed for OS-image containers, overkill for our binary-with-mounts profile. systemd-only, cuts off non-systemd hosts. Heavier than bwrap with no compensating benefit for our use case.

Alternative E — Pluggable from day one

Why rejected for v0: "We support all sandboxes" is the most common form of "we don't have a default that works." Three real adapters means three test matrices and three sets of escape-test investigation. v0 ships one cage mechanism, well-tested, with a documented threat model. Plugin sandboxing comes in v1.

Amendments

2026-05-21 — Sandbox is optional; two opt-out paths

The sandbox is enabled by default on every subagent in every project. The original ADR text could be read as making it mandatory. This amendment clarifies: per the project's "we don't lock anyone into anything" posture, the operator may opt out at two levels.

1. Per-subagent: cage: disabled in the DSL.

A subagent may declare cage: disabled instead of a cage block:

subagents:
  - name: deployer
    model: claude:sonnet-4.6
    system_prompt: ./prompts/deployer.md
    can_be_called_by: [primary]
    cage: disabled

Semantics:

The supervisor spawns the subagent as the daemon's own UID, with no bwrap wrapper, no network namespace, no seccomp filter, no cgroup limits beyond whatever the daemon itself runs under.
The subagent has full read-write access to the host filesystem that the daemon user has.
The subagent can reach any network destination the daemon user can reach.
The subagent can invoke any host binary, including sudo if the daemon user is in sudoers.

The honest framing in operator docs: a cage: disabled subagent IS your daemon's hands. Same UID, same access, same blast radius. There are no half-measures. If you write cage: disabled on a subagent, you have decided that subagent is your hand-extension and you trust it accordingly.

Use cases:

A deployer subagent that needs kubectl, git push, signed-image-build access, or anything else requiring real host capabilities.
A backup subagent that needs to read everywhere and write to an external mount.
Operator's-own-tools workflows where caging would just produce friction without security benefit.

When NOT to use cage: disabled:

A subagent processing untrusted input (a scraper, a webhook handler, a parser of operator-supplied files). Untrusted input + uncaged subagent = the model is now an attack vector with full host access.

Per-subagent warning UX:

The web UI's project view shows uncaged subagents with a magenta [UNCAGED] badge (parallel to the standard [CAGED] badge but inverted in tone).
The DSL validator emits a warning at parse time for every cage: disabled entry: subagents[N].cage: disabled — this subagent runs as the daemon user. see ADR-0009.
The audit log records subagent.spawn.uncaged for every spawn of an uncaged subagent (in addition to the normal spawn event).
The session UI shows the cage policy (or "DISABLED") on every subagent invocation.
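The warning surfaces above all derive from one check: whether a subagent declares cage: disabled. A minimal sketch follows, assuming hypothetical type and function names. The badge strings and the `subagent.spawn.uncaged` event name come from this amendment; `subagent.spawn` is a placeholder for the normal spawn event, which this ADR does not name.

```typescript
// Hypothetical parsed subagent declaration.
interface SubagentDecl {
  name: string;
  cage?: "disabled" | { fs: string[]; net: string[] };
}

interface UncagedWarning {
  path: string;     // DSL location, for example subagents[0].cage
  subagent: string; // subagent name, for display
}

// Collects one warning per cage: disabled entry; the validator renders each
// with the message text given in the list above.
function findUncaged(subagents: SubagentDecl[]): UncagedWarning[] {
  const found: UncagedWarning[] = [];
  subagents.forEach((decl, index) => {
    if (decl.cage === "disabled") {
      found.push({ path: `subagents[${index}].cage`, subagent: decl.name });
    }
  });
  return found;
}

// A subagent with no cage block gets the implicit default cage, so it is still caged.
function badgeFor(decl: SubagentDecl): "[CAGED]" | "[UNCAGED]" {
  return decl.cage === "disabled" ? "[UNCAGED]" : "[CAGED]";
}

function spawnAuditEvents(decl: SubagentDecl): string[] {
  const events = ["subagent.spawn"]; // placeholder name for the normal spawn event
  if (decl.cage === "disabled") events.push("subagent.spawn.uncaged");
  return events;
}

const sampleSubagents: SubagentDecl[] = [
  { name: "scraper" },
  { name: "deployer", cage: "disabled" },
];
console.log(findUncaged(sampleSubagents));
```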

2. Per-daemon: --no-sandbox global flag.

The operator may run the entire daemon with sandboxing disabled:

CLI: kaged start --no-sandbox (or KAGED_NO_SANDBOX=1 env var).
Semantics: every subagent in every project on this daemon runs as if it had cage: disabled, regardless of what its DSL declares. The DSL's cage block is parsed and validated (so the file is still portable to a sandboxed daemon) but it is not enforced.
Use cases: dev machines where the operator is iterating on prompts and doesn't want sandbox-debugging in the loop. Single-user trusted hosts where cage: blocks would just be ceremony.

Warning UX for --no-sandbox (matches the --insecure pattern from ADR-0007):

| Surface | What appears | When |
| --- | --- | --- |
| CLI startup | Multi-line warning block naming `--no-sandbox` and a link to this ADR | Every daemon start |
| Web UI banner | Persistent magenta banner: `SANDBOX DISABLED — ALL SUBAGENTS RUN AS DAEMON USER` | Every page render |
| Web UI splash | Modal on first session of each day requiring "I understand" click | First session per day |
| HTTP responses | `X-Kaged-Warning: no-sandbox` header | Every response |
| Audit log | `sandbox.disabled` event at startup | Every daemon start |
| Subagent badges | Every subagent shows `[UNCAGED]` regardless of its DSL | Always |

Combining --insecure (no auth) with --no-sandbox (no cage) is allowed but the banner is very magenta: INSECURE — NO AUTH — NO SANDBOX. The audit log records the combination explicitly. We do not refuse the combination — operator's principal, operator's call — but we make it impossible to miss.
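The precedence between the global flag and the per-subagent setting can be sketched as a pair of pure functions. This is an illustration under assumptions: the type and function names are hypothetical, and argv and environment are passed in rather than read from the process so the logic stays testable. The flag, the environment variable, and the response header are the ones named in this amendment.

```typescript
// Hypothetical view of how the daemon was started.
interface DaemonStart {
  argv: string[];
  env: Record<string, string | undefined>;
}

type DeclaredCage = "disabled" | { fs: string[]; net: string[] };

// True when the operator passed --no-sandbox or set KAGED_NO_SANDBOX=1.
function sandboxDisabledGlobally(start: DaemonStart): boolean {
  return start.argv.indexOf("--no-sandbox") !== -1 || start.env["KAGED_NO_SANDBOX"] === "1";
}

// The global opt-out wins over whatever the DSL declares; the cage block is
// still parsed and validated elsewhere, it is just not enforced.
function cageIsEnforced(declared: DeclaredCage, start: DaemonStart): boolean {
  if (sandboxDisabledGlobally(start)) return false;
  return declared !== "disabled";
}

// Extra response headers to attach while the sandbox is globally disabled.
function warningHeaders(start: DaemonStart): Record<string, string> {
  return sandboxDisabledGlobally(start) ? { "X-Kaged-Warning": "no-sandbox" } : {};
}

const devStart: DaemonStart = { argv: ["start", "--no-sandbox"], env: {} };
console.log(cageIsEnforced({ fs: [], net: [] }, devStart), warningHeaders(devStart));
```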

3. What is NOT in v1.

These were considered and deferred:

Per-project sandbox: disabled (whole-project opt-out from the project DSL top level). Deferred: the existing per-subagent cage: disabled covers most use cases, and a per-project switch would tempt operators into "I'll just disable it for the whole project while I prototype" patterns that survive into production. If operators ask for it, we add it in a minor release.
Per-task ephemeral override (clicking "run uncaged this once" in the UI). Deferred to v1.x. Adds a third concept (task-level cage override) that complicates the audit story.
Three-state cage (enabled / relaxed / disabled where relaxed means "default seccomp only, no fs/net cage"). Deferred — disabled is honest, enabled is the contract; we don't ship a half-measure that operators will misread as "partially safe."

Updated threat model:

In-scope items remain the same when the sandbox is enabled. When opted-out (cage: disabled or --no-sandbox), the operator has explicitly removed kaged from the protection path. kaged's only obligation in that mode is to be loud about it — the warnings above. The "caged" brand promise holds when the cage is on; when the operator turns it off, the cage promise does not apply and the operator-facing surface says so plainly.

This is manifesto principle 1 ("The operator is the principal") and principle 5 ("Sandbox by default, escalate by intent") working together: default-on enforcement, explicit opt-out, no surprises either way.

References

RFC-0002 — design-space exploration
docs/02-architecture.md — architecture sketch of the sandbox component
docs/03-glossary.md — cage, subagent, supervisor, insecure mode
ADR-0006 — the DSL that declares cage policies
ADR-0007 — parallel opt-out via --insecure
docs/brand/brand-guide.md — the "caged" brand promise this ADR backs
bubblewrap: https://github.com/containers/bubblewrap
Flatpak's bwrap usage: https://docs.flatpak.org/en/latest/sandbox-permissions.html
nftables: https://wiki.nftables.org/
Original discussion: design conversation with colleagues, 2026-05-21
Amendment: colleagues, 2026-05-21

2026-05-26 — Cage location moves to per-agent `AgentSpec.cage` (ADR-0022)

Sandbox mechanism is unchanged (still bwrap, still the same threat model, same seccomp profiles). Cage location in the DSL moves from subagents.<name>.cage + cage_defaults to per-agent AgentSpec.cage per ADR-0022. The primary agent now has a cage field; cage_defaults is removed. Until the supervisor is extended to spawn the primary in its own bwrap context, the only legal value for the root agent's cage is disabled — the parser emits an error for any other value. A follow-up ADR (or amendment to this one) schedules the supervisor work for primary cage enforcement.
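The interim parser rule for the root agent can be sketched as a single validation step. This is a minimal illustration: only `AgentSpec.cage` is named in this amendment, so the other fields, the function name, and the error wording are assumptions.

```typescript
// Hypothetical slice of AgentSpec; only the cage field is taken from ADR-0022.
interface AgentSpec {
  name: string;
  cage: "disabled" | { fs: string[]; net: string[] };
}

// Returns parse errors for the root agent. Until the supervisor can spawn the
// primary in its own bwrap context, any value other than "disabled" is an error.
function validateRootCage(root: AgentSpec): string[] {
  if (root.cage !== "disabled") {
    return [
      `agent "${root.name}": cage must be "disabled" for the root agent ` +
        `until primary cage enforcement is implemented`,
    ];
  }
  return [];
}

console.log(validateRootCage({ name: "primary", cage: { fs: [], net: [] } }));
```

Returning a list of errors rather than throwing lets the parser report this alongside other validation problems in one pass.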