ADR-0007: Authentication is an OAuth proxy sidecar in front of the daemon

Status: Accepted
Date: 2026-05-21
Last amended: 2026-06-11 (unified user identity and shared SSO, ADR-0036)
Deciders: @karasu
Supersedes: —
Superseded by: —

Context

kaged is reachable from the public internet through a tunnel (Cloudflare Tunnel in the reference deployment). The architecture diagram (docs/02-architecture.md) shows the access path:

operator → tunnel → oauth sidecar → kaged daemon

The daemon must believe two things about every incoming request:

The request crossed the OAuth boundary. It came from an authenticated operator, not an arbitrary internet caller who reached the tunnel hostname.
The identity attached to the request is trustworthy. The daemon should not have to validate JWTs, manage refresh tokens, or know about Google/GitHub/whatever provider the operator chose.

There are a few architectures for this:

1. Bake auth into the daemon. The daemon implements OIDC, OAuth2, session management, the works. Mature pattern in monoliths.
2. Push auth to the tunnel layer. Cloudflare Access, Tailscale identity headers, etc. The tunnel terminates auth and forwards identified requests.
3. Put a sidecar between the tunnel and the daemon. A dedicated reverse proxy (oauth2-proxy, Pomerium, Authentik, traefik-forward-auth) handles the OAuth dance; the daemon trusts headers.
4. Mutual TLS with operator-issued certs. The operator installs a client cert on every device.

The constraint set:

Multi-device access. Operators use phone, laptop, tablet. The auth flow must work in every browser and must not require a cert per device. This rules out option 4 (mutual TLS) for v0.
Operator's existing deployment pattern. The operator already runs a Cloudflare Tunnel + OAuth-proxy sidecar pattern across deployed infra. kaged should slot in without reinventing.
The daemon stays simple. kaged's complexity budget is best spent on the sandbox, the DSL, the session manager. Implementing OIDC correctly is its own engineering investment.
No vendor lock-in. Operators not on Cloudflare should still have a sane auth path. The pattern must work behind Tailscale, behind Authelia, behind a homelab Keycloak.
Audit-friendly. Auth events (logins, refreshes, denials) belong in a log the operator can read. They should not be tangled in the daemon's application logs.
No telemetry. Per the manifesto, kaged does not phone home. The auth layer cannot either.

Decision

Authentication for kaged is performed by a separate OAuth proxy sidecar process that sits in front of the daemon. The daemon trusts identity headers set by the sidecar. The daemon implements zero OIDC/OAuth flow logic. The sidecar's specific choice (oauth2-proxy, Pomerium, Authelia, traefik-forward-auth, or a hand-rolled equivalent) is the operator's call; the daemon only requires a documented header contract.

The header contract

The daemon expects every request to carry:

X-Kaged-User-Id — a stable identifier for the operator (email, SSO subject claim, or equivalent). Required. Absence = 401.
X-Kaged-User-Email — operator's email, for display and audit. Optional but recommended.
X-Kaged-User-Groups — comma-separated group list, for the eventual multi-operator phase. Optional in v0 (single operator).
X-Kaged-Auth-Nonce — a secret shared by the sidecar and the daemon, generated at daemon startup and sent on every request. Required. Prevents header spoofing if the daemon is ever exposed without the sidecar.

The daemon rejects any request missing X-Kaged-User-Id or with a wrong X-Kaged-Auth-Nonce with HTTP 401. There is no fallback "well it came from localhost so we'll allow it" — localhost binding is the operator's choice in the deployment topology, not a trust assertion in the daemon code.
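
The rejection rule above can be sketched as a single function. This is a minimal illustration, not the daemon's actual code: the function name and the plain-mapping representation of request headers are assumptions made for the example. It returns the user id on success and None when the request should receive a 401.

```python
import hmac
from typing import Mapping, Optional


def check_sidecar_headers(headers: Mapping[str, str], expected_nonce: str) -> Optional[str]:
    """Return the operator's user id, or None if the request must get HTTP 401."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    user_id = lowered.get("x-kaged-user-id", "").strip()
    nonce = lowered.get("x-kaged-auth-nonce", "")
    if not user_id or not nonce:
        return None
    # compare_digest avoids leaking the nonce through timing differences.
    if not hmac.compare_digest(nonce.encode(), expected_nonce.encode()):
        return None
    return user_id
```

Note that the function never consults the peer address, which matches the "no localhost fallback" rule: only the headers decide.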

The bind contract

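By default the daemon binds to 127.0.0.1 (or a Unix socket). The sidecar is the only client that should reach it. Operators who bind the daemon to 0.0.0.0 and expose it directly are making an explicit, documented operational mistake; the startup self-check (see Consequences) refuses to start on a non-loopback bind without a sidecar unless the operator overrides it, and the override is logged loudly.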

Reference deployment

The documented "blessed" deployment uses:

Tunnel: Cloudflare Tunnel (cloudflared).
Sidecar: oauth2-proxy configured with whichever OIDC provider the operator wants (Google, GitHub, GitLab, Authentik, Keycloak, etc.).
Daemon: Bind to 127.0.0.1:<port>; sidecar upstreams to that port; injects the X-Kaged-* headers from the OIDC id_token claims.

The kaged repo ships an example compose.yaml and systemd units for this topology in examples/deployment/cloudflare-oauth2-proxy/.

Alternative deployments

Operators on different stacks substitute the sidecar:

Tailscale Funnel + tsidp/tsnet identity: The sidecar consumes Tailscale identity headers and re-emits them as X-Kaged-*. A small adapter, documented.
Authelia: Use Authelia's forward-auth mode; bind it to the same upstream contract.
Self-hosted Keycloak + traefik-forward-auth: Same.
Local-only: For development, an example "passthrough" sidecar that injects fixed headers. Strictly local-only; documented as such.

The kaged daemon does not care which the operator chose, as long as the header contract is met.
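
Whatever sidecar is chosen, its adapter step reduces to "attach the contract headers before forwarding". The sketch below shows that step for the local-only passthrough case. The function name is hypothetical, and stripping client-supplied X-Kaged-* headers before injecting is an assumption of this example (the ADR does not mandate it), though it seems a sensible default for any adapter.

```python
from typing import Dict, Mapping


def build_upstream_headers(incoming: Mapping[str, str], user_id: str, nonce: str) -> Dict[str, str]:
    """Return the headers a sidecar would forward to the daemon."""
    # Drop anything the client sent under the X-Kaged- prefix so it cannot
    # smuggle its own identity past the sidecar (assumption, see lead-in).
    upstream = {
        name: value
        for name, value in incoming.items()
        if not name.lower().startswith("x-kaged-")
    }
    upstream["X-Kaged-User-Id"] = user_id
    upstream["X-Kaged-Auth-Nonce"] = nonce
    return upstream
```

A real adapter (Tailscale, Authelia) would take user_id from its own verified identity source instead of a fixed value.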

Consequences

What this commits us to

A header contract that is now a public API. Sidecar authors target it. Once cut, breaking it requires a major version bump and a migration note for every deployment.
An X-Kaged-Auth-Nonce shared-secret mechanism. The nonce is generated at daemon startup, passed to the sidecar via env var or config file, and rotated on restart. The daemon ships a kaged auth nonce subcommand that prints the current value for sidecar configuration.
Documentation of reference deployments. Cloudflare + oauth2-proxy is the blessed path. Others are listed but not blessed. We test the blessed path in CI (or as part of release validation).
A startup self-check. The daemon refuses to start if it detects it's bound to a public interface without a sidecar (heuristic: bound to non-loopback + no KAGED_AUTH_SIDECAR_NONCE env var set). Operators can override with KAGED_INSECURE_BIND=1 if they really know what they're doing, and the override is logged loudly.
An audit-log integration. Every authenticated request's user_id is logged. The auth events themselves (login, logout, refresh) live in the sidecar's logs, not the daemon's; the operator's playbook tells them where to look.
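
The startup self-check heuristic in the list above (non-loopback bind, no KAGED_AUTH_SIDECAR_NONCE, optional KAGED_INSECURE_BIND=1 override) might look roughly like this. The function name and the three return values are made up for the example, and Unix-socket binds are not modeled.

```python
import ipaddress
from typing import Mapping


def startup_bind_check(bind_host: str, env: Mapping[str, str]) -> str:
    """Return "ok", "override" (start, but log loudly), or "refuse"."""
    try:
        is_loopback = ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal; only the conventional hostname counts as loopback.
        is_loopback = bind_host == "localhost"
    if is_loopback or env.get("KAGED_AUTH_SIDECAR_NONCE"):
        return "ok"
    if env.get("KAGED_INSECURE_BIND") == "1":
        return "override"
    return "refuse"
```

In practice the caller would pass os.environ as env; taking it as a parameter keeps the check easy to test.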

What this forecloses

No daemon-side OIDC implementation. We do not implement the OAuth flow. We do not parse JWTs. We do not manage tokens. Operators who want kaged without a sidecar can't have it — they can run the documented passthrough sidecar locally and call that the auth layer.
No "I'll just hit the daemon directly with curl" workflow for ops. Scripts targeting the kaged API target the sidecar's URL with a proper auth header. The daemon's internal API is not a stable public surface for ad-hoc tooling.
No multi-user-from-day-one. v0 assumes a single operator (or a small trusted group all auth'd through the same sidecar). RBAC, per-project ACLs, fine-grained scopes — all v2.

What becomes easier

Reuses operator stacks. Anyone already running Cloudflare Tunnel + oauth2-proxy, or Tailscale Funnel, or Authelia, plugs kaged in without learning a new auth model.
Daemon stays small. No OIDC dependencies, no token caches, no certificate management. The complexity stays in the dedicated tool that already exists for this.
Switching providers is sidecar-only. Operator moves from Google OIDC to a self-hosted Keycloak? Update the sidecar config; kaged doesn't change.
Provides a real session story for the web UI. The sidecar issues a cookie; the browser carries it; the daemon trusts the headers the sidecar attaches. This is exactly the multi-device story we need (ADR-0002).
Telemetry-free by construction. kaged itself never speaks to an identity provider. There's nothing to leak.

What becomes harder

More moving parts. Operators install two processes (sidecar + daemon) plus a tunnel. The blessed-path docs need to make this trivial; the install guide is non-trivially longer than "run a single binary."
Local development friction. No "just hit localhost:port" without auth. We mitigate with the passthrough sidecar for dev, but the friction is real.
One more place for a misconfiguration. A sidecar misconfigured to upstream to the wrong port, or to set wrong headers, or to skip auth in a debug mode — any of these is now part of the operator's responsibility. The startup self-check catches the most dangerous cases; the rest is documentation.
No CLI auth (yet). A future kaged CLI that calls the API needs to go through the same sidecar. We'll define a --auth-token mechanism or a CLI-specific path through the sidecar later. v0 is web-UI-only for non-trivial actions.

Implementation notes (not normative)

Cookies vs Bearer: Sidecar typically sets a cookie for browser sessions. The daemon doesn't read cookies — only headers. The sidecar translates cookie → header in the forward step. WebSocket upgrades carry the cookie too; the sidecar must inject the headers on the upgrade request.
WebSocket auth: The PTY broker and the agent-output stream are WebSocket endpoints. The sidecar must support WebSocket forward-auth. oauth2-proxy and Pomerium both do; document the relevant flags.
CSRF: State-changing endpoints require a CSRF token in addition to the auth cookie. The daemon issues and verifies the token; the sidecar is uninvolved.
Same-site cookies: The cookie the sidecar sets should be SameSite=Lax at minimum; Strict if no cross-origin features are needed.
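
The CSRF note says only that the daemon issues and verifies the token. One common way to do that without server-side storage is to derive the token from the session identifier with an HMAC. The sketch below shows that approach; it is an assumption for illustration (the key, function names, and derivation are not specified by this ADR).

```python
import hashlib
import hmac
import secrets

# Hypothetical per-startup key held only by the daemon.
CSRF_KEY = secrets.token_bytes(32)


def issue_csrf_token(session_id: str, key: bytes = CSRF_KEY) -> str:
    """Derive a token bound to one session."""
    return hmac.new(key, session_id.encode(), hashlib.sha256).hexdigest()


def verify_csrf_token(session_id: str, token: str, key: bytes = CSRF_KEY) -> bool:
    """Check a presented token in constant time."""
    expected = issue_csrf_token(session_id, key)
    return hmac.compare_digest(expected.encode(), token.encode())
```

Because the token depends on the session, a token lifted from one session is useless in another, and the sidecar never needs to see the key.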

Alternatives considered

Alternative A — Bake OAuth into the daemon

Why tempting: Single binary deploy. No second process. Total control over the flow.

Why rejected: OIDC done correctly is a real engineering investment we don't need. Multiple maintained sidecars (oauth2-proxy, Pomerium, Authelia, etc.) already do this well. Reinventing them in the daemon means we either ship a worse implementation or spend disproportionate calories matching their feature set. The complexity budget is better spent on cages, DSL, and session management.

Alternative B — Push auth to the tunnel layer (Cloudflare Access)

Why tempting: Operators on Cloudflare get auth "for free" via Access. No sidecar needed.

Why rejected: Couples kaged to Cloudflare. Operators on Tailscale, on a VPN, on a self-hosted reverse proxy don't have an equivalent "tunnel layer auth" — for them, auth is exactly the sidecar pattern we're picking. Better to standardize on the sidecar contract and let Cloudflare Access slot in as one of many ways to satisfy it.

Alternative C — mTLS with operator client certs

Why tempting: No passwords, no OAuth dance, strong cryptographic identity.

Why rejected: Multi-device UX is bad. Provisioning a cert on a phone browser is hostile. Revocation requires CRL/OCSP infra. Operators do this for some homelab use cases, but it's a complement to a web-friendly auth path, not a replacement. We can add mTLS support to the sidecar layer in v1.x for operators who want it.

Alternative D — No auth (rely on tunnel + IP allowlist)

Why tempting: Simplest. Trust the tunnel + a static allowlist.

Why rejected: Insufficient. A static IP allowlist doesn't survive the operator changing networks (hotel WiFi, mobile data), and roaming is the whole point of the tunnel. Once you've decided not to trust your IP, you need an actual identity layer. This is the whole reason OAuth proxies exist.

Alternative E — Magic-link / passwordless email auth, daemon-implemented

Why tempting: Simple flow, no provider dependency, single binary.

Why rejected: Requires the daemon to send email. The daemon is supposed to be a sealed, offline-capable resident on the operator's host. Adding SMTP outbound or a third-party email service introduces telemetry-shaped problems. Sidecar handles email-based flows if the operator picks Authelia or similar; kaged stays out of it.

Amendments

2026-05-21 — Trust model and `--insecure` flag

Two clarifications added after the initial accept:

1. The daemon trusts sidecar headers without cryptographic validation.

When the sidecar is configured, the daemon checks that requests carry X-Kaged-User-Id and a matching X-Kaged-Auth-Nonce. The daemon does not validate the user identity against an OIDC provider, does not verify JWT signatures, does not check token expiry. That work was already done by the sidecar — that is the point of having a sidecar.

The trust model is therefore: the sidecar is trusted, the network path between sidecar and daemon is trusted, anything reaching the daemon with a valid nonce is trusted. This means:

The nonce is the only secret the daemon checks. Whoever holds the nonce IS the operator from the daemon's perspective.
The nonce must be kept out of logs, environment dumps, and config files with permissive modes (anything group- or world-readable, such as 0644 or 0755).
The daemon's default loopback bind (127.0.0.1) means the only client that should ever see the nonce is the sidecar itself.
We do not test the OAuth flow as part of kaged's CI surface. That's the sidecar's job; sidecars have their own test suites.

This is documented honestly in operator-facing docs: kaged outsources auth, kaged trusts the outsourcee, and that's a deliberate trade for not reinventing OIDC.

2. --insecure flag bypasses auth entirely.

Per the project's "don't lock anyone into anything" posture, the operator may run kaged without any auth layer. Use cases:

Local development. The operator hits http://localhost:port from a browser on the same machine and wants no friction.
Trusted private network. A single-user LAN where the operator wants the daemon reachable from any device without standing up a sidecar.
First-run / install verification. A new operator just wants to see the UI work before configuring auth.

The flag:

CLI: kaged start --insecure (or KAGED_INSECURE=1 env var).
Semantics: no X-Kaged-Auth-Nonce check; no X-Kaged-User-Id requirement; all requests accepted. The daemon synthesizes a user_id of insecure-mode for audit-log purposes.
Network binding: with --insecure the daemon still defaults to 127.0.0.1. Operators who want to expose --insecure to the network must also pass --bind 0.0.0.0 (or equivalent). The combination is doubly-loud (see warning UX below).
No sidecar required. With --insecure the daemon can be reached directly from any client on a reachable network address.

Warning UX (mandatory and un-dismissable):

Surface	What appears	When
CLI startup	Multi-line warning block to stderr naming `--insecure`, the bind address, and a link to this ADR	Every daemon start
CLI commands	Single-line warning prefix on every `kaged` subcommand's output	Every command
Web UI	Persistent magenta banner across every page reading `INSECURE MODE — AUTH BYPASSED` with no close button	Every page render
Web UI splash	Modal dialog on the first session of each day requiring an "I understand" click	First session per day; the banner does NOT go away after acknowledgment
HTTP responses	`X-Kaged-Warning: insecure-mode` header on every response	Every response
Audit log	`auth.insecure_mode` event with timestamp, bind address, and the operator-configured reason (if any)	Daemon start and every session-attach

The magenta color is intentional. Per the brand guide, magenta is the secondary accent for personality and danger states. --insecure is a danger state.

What --insecure does not affect:

The sandbox layer. --insecure only bypasses auth. Subagent cages still apply. To disable the sandbox, see ADR-0009 and --no-sandbox.
The CSRF token check on state-changing endpoints. Even in --insecure, CSRF protection is on. Browser-level safety is independent of operator-level trust.
The audit log. Every action is logged regardless of auth mode. --insecure makes the user_id useless for distinguishing operators, but it does not make the system silent.

Documented threat model with --insecure:

In scope (operator's responsibility): restrict network access to the daemon. Bind to loopback, use firewall rules, or use a VPN. The daemon does no auth; the network layer must.
Out of scope (kaged is not protecting against): anyone who can reach the daemon's address. They are the operator from the daemon's perspective.

This is the "the operator is the principal" manifesto principle taken to its limit: if you say it's fine, kaged trusts you. The warnings exist so you cannot forget you said it's fine.

2026-05-21 — Per-user deployment mode auth default

ADR-0010 introduced per-user as a first-class deployment mode. In that mode, the sidecar contract is overkill: the operator is running their own daemon under their own UID for their own browser. The OS user boundary already separates this operator's kaged from anyone else on the box.

For per-user mode, the default auth model is loopback bind + cookie-bound nonce, no sidecar required.

How it works:

The daemon starts under the operator's UID, binds to 127.0.0.1:<port>. The port is picked by the OS (or operator-configured).
At startup, the daemon generates a per-startup nonce and writes it to a file at $XDG_RUNTIME_DIR/kaged/auth-cookie (mode 0600, owned by the operator). The file path is also printed to the daemon's stderr at startup.
The daemon also prints a one-time launch URL to stderr: http://127.0.0.1:<port>/launch?token=<nonce>. The operator opens this URL in a browser.
The /launch endpoint validates the token, sets a long-lived session cookie (kaged_session=<derived-from-nonce>, HttpOnly, Secure=false because localhost, SameSite=Lax), and redirects to /.
Subsequent requests carry the cookie. The daemon validates the cookie against the per-startup nonce. No sidecar headers are required.
CSRF protection still applies on state-changing endpoints, same as before.
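
The startup and /launch steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the HMAC used for the "derived-from-nonce" cookie value is one possible derivation, not necessarily the one kaged uses.

```python
import hashlib
import hmac
import os
import secrets
from pathlib import Path
from typing import Tuple


def write_auth_cookie_file(runtime_dir: str) -> Tuple[str, Path]:
    """Generate the per-startup nonce and store it with mode 0600."""
    nonce = secrets.token_urlsafe(32)
    directory = Path(runtime_dir) / "kaged"
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = directory / "auth-cookie"
    # Create with 0600 up front so the file is never briefly readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(nonce)
    os.chmod(path, 0o600)  # also tighten a pre-existing file
    return nonce, path


def launch_url(port: int, nonce: str) -> str:
    """Build the one-time URL printed to stderr."""
    return f"http://127.0.0.1:{port}/launch?token={nonce}"


def session_cookie_value(nonce: str) -> str:
    """Derive the cookie so it never contains the raw nonce (assumed derivation)."""
    return hmac.new(nonce.encode(), b"kaged_session", hashlib.sha256).hexdigest()


def cookie_is_valid(cookie_value: str, nonce: str) -> bool:
    """Validate a presented cookie against the per-startup nonce."""
    return hmac.compare_digest(cookie_value.encode(), session_cookie_value(nonce).encode())
```

Because the cookie is derived from the per-startup nonce, restarting the daemon invalidates every outstanding cookie without any session store.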

This is NOT --insecure. The cookie nonce gates access. Anyone reaching the daemon's loopback port without the cookie gets a 401. The "trust the OS user boundary" claim is mechanical: only processes running as this UID (or as root) can read $XDG_RUNTIME_DIR/kaged/auth-cookie, so only this UID's browsers can pick up the launch token.

Header contract in per-user mode:

The daemon synthesizes the same headers internally that the sidecar would have set in system-wide mode:

Header (synthesized)	Value
`X-Kaged-User-Id`	The operator's UNIX username (`whoami`)
`X-Kaged-User-Email`	`<username>@localhost` (placeholder; no real email is known)
`X-Kaged-User-Groups`	empty

This means every internal code path treats per-user mode and system-wide-with-sidecar mode identically once auth is resolved. There is no per-mode branch in session, project, or audit code — just at the auth gate.
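
As a small illustration of the table above, the synthesized headers are a pure function of the operator's username. The function name here is hypothetical.

```python
from typing import Dict


def synthesize_identity_headers(username: str) -> Dict[str, str]:
    """Build the headers the sidecar would have set, for per-user mode."""
    return {
        "X-Kaged-User-Id": username,
        # Placeholder address; no real email is known in per-user mode.
        "X-Kaged-User-Email": f"{username}@localhost",
        "X-Kaged-User-Groups": "",
    }
```

A caller could obtain the username with Python's getpass.getuser(), which usually agrees with whoami but consults environment variables first.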

Per-user with --insecure:

Still supported, still loud. Per-user --insecure skips the cookie-nonce check too, so anyone reaching 127.0.0.1:<port> is accepted. Same warning UX as system-wide --insecure.

Per-user behind a tunnel:

The per-user daemon can be put behind a tunnel + sidecar by the operator (Tailscale Funnel + oauth2-proxy is the typical setup). When the daemon detects external auth headers (X-Kaged-User-Id from upstream), the cookie-nonce check is skipped for that request — the sidecar is the authority. Operators configuring this set auth.mode = "sidecar" in local config to make the choice explicit (rather than relying on auto-detect, which is brittle for security gates).

Three resolved modes:

`auth.mode`	What's checked	When to use
`loopback` (default for per-user)	Cookie nonce against per-startup secret	Single-operator, single-machine
`sidecar` (default for system-wide)	`X-Kaged-User-Id` + `X-Kaged-Auth-Nonce` from sidecar	Tunneled or multi-operator; behind oauth2-proxy or equivalent
`insecure`	Nothing	Local dev, trusted LAN, single-trust environments

The values are stable across config sources (env vars accept loopback, sidecar, insecure). The --insecure CLI flag remains a shorthand for auth.mode=insecure.
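
The three modes can be read as one dispatch at the auth gate. The sketch below is an assumption-laden illustration: the function signature, the plain-mapping header and cookie arguments, and the parameter names are all invented for the example. It returns the resolved user_id, or None when the request should get a 401.

```python
import hmac
from typing import Mapping, Optional


def _matches(presented: str, expected: str) -> bool:
    # Empty values never match, so an unset secret cannot authenticate anyone.
    return bool(presented) and bool(expected) and hmac.compare_digest(
        presented.encode(), expected.encode()
    )


def resolve_user_id(
    mode: str,
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    sidecar_nonce: str,
    expected_cookie: str,
    local_user: str,
) -> Optional[str]:
    """Resolve the request's user_id for the configured auth.mode."""
    if mode == "insecure":
        return "insecure-mode"  # synthesized id used for the audit log
    if mode == "sidecar":
        user_id = headers.get("X-Kaged-User-Id", "")
        if user_id and _matches(headers.get("X-Kaged-Auth-Nonce", ""), sidecar_nonce):
            return user_id
        return None
    if mode == "loopback":
        if _matches(cookies.get("kaged_session", ""), expected_cookie):
            return local_user
        return None
    raise ValueError(f"unknown auth.mode: {mode!r}")
```

Everything after this function sees only a user_id, which is what keeps session, project, and audit code free of per-mode branches.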

2026-06-11: Unified user identity and shared SSO (ADR-0036)

ADR-0007 stated the daemon validates zero JWTs. ADR-0036 amends that narrowly. When auth.sharedsso.enabled = true, the daemon verifies ES256 token signatures (one JWKS fetch and one signature check via jose, ~50 LOC) at the session-bootstrap endpoint only (POST /api/v1/auth/sso).

The daemon still implements zero OAuth or OIDC flows, meaning no redirects, no token exchange, no refresh, and no provider knowledge.

ADR-0007's actual fear was implementing OIDC. Verifying a signature against a configured issuer isn't that.

Sidecar mode is untouched. It remains the recommendation for tunneled single-operator deployments that don't need in-daemon identity.

A sidecar can't absorb this because it authenticates but can't mint kaged's user sessions, distinguish principal classes, or drive the TOFU lifecycle. The guest and member requirements force identity into the daemon regardless.

See ADR-0036 and docs/specs/sso-relay.md.

References

docs/02-architecture.md — the in-front diagram showing the sidecar's position
ADR-0002 — web-first decision; HTTP/WS auth is the relevant surface
ADR-0009 — the parallel --no-sandbox escape hatch
ADR-0010 — deployment modes that determine the default auth mode
oauth2-proxy: https://oauth2-proxy.github.io/oauth2-proxy/
Pomerium: https://www.pomerium.com/
Authelia: https://www.authelia.com/
Cloudflare Tunnel: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/
XDG Base Directory Specification ($XDG_RUNTIME_DIR): https://specifications.freedesktop.org/basedir-spec/
Original discussion: design conversation with colleagues, 2026-05-21
Amendment (--insecure): colleagues, 2026-05-21
Amendment (per-user mode): colleagues, 2026-05-21
Amendment (unified user identity and shared SSO): colleagues, 2026-06-11