ADR-0016: Streaming-first UI — live data and operator abort are non-negotiable

Status: Accepted
Date: 2026-05-24
Deciders: @ash
Supersedes: —
Superseded by: —

Context

kaged is an operator-facing daemon that dispatches LLM requests on the operator's behalf. LLM calls are slow (seconds to minutes for thinking models), expensive, and can go wrong mid-stream. The operator needs three capabilities at all times:

1. See what the model is producing as it produces it. Waiting for a complete response before showing anything is unacceptable: it turns a 30-second thinking-model call into a 30-second blank screen.
2. Stop a run the moment it goes wrong. If the model is hallucinating, looping, or burning tokens on the wrong approach, the operator must be able to abort immediately, not after the response is complete.
3. Know the current state of a run without polling or refreshing. Session state (idle → running → done/failed), message arrival, and error conditions must propagate to the UI in real time via the existing WebSocket channel.

These are not features. They are the minimum viable interface for a tool that spends the operator's money and time on their behalf.

Decision

kaged's UI will always stream live data from active runs and provide immediate abort capability. No AI-related operation may run without the operator being able to observe and stop it.

Concretely:

1. The daemon publishes lifecycle events on the `events` WebSocket channel

When a run starts, the daemon publishes run.started on the events channel. When a run ends (success, failure, or abort), it publishes run.ended. These events trigger query cache invalidation in the UI so session state and message lists stay current without manual refresh.
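
A minimal sketch of how the UI side might turn these lifecycle events into cache invalidations. The event fields (sessionId, runId), the query-key layout, and the names keysToInvalidate and handleEventFrame are assumptions made for illustration; the actual frame shape is defined in the daemon spec.

```typescript
// Hypothetical lifecycle event shape; only the event names come from this ADR.
type LifecycleEvent = {
  type: "run.started" | "run.ended";
  sessionId: string;
  runId: string;
};

// Query keys modeled as string arrays (an assumed convention).
type QueryKey = readonly string[];

// Map a lifecycle event to the cache entries that should be refetched.
function keysToInvalidate(event: LifecycleEvent): QueryKey[] {
  const session: QueryKey = ["sessions", event.sessionId];
  const messages: QueryKey = ["sessions", event.sessionId, "messages"];
  // A started run changes session state; an ended run also finalizes messages.
  return event.type === "run.started" ? [session] : [session, messages];
}

// Parse one frame from the events channel and invalidate the affected keys.
// Frames with an unrecognized type are ignored here.
function handleEventFrame(raw: string, invalidate: (key: QueryKey) => void): void {
  const parsed = JSON.parse(raw) as { type?: string };
  if (parsed.type !== "run.started" && parsed.type !== "run.ended") return;
  for (const key of keysToInvalidate(parsed as LifecycleEvent)) invalidate(key);
}
```

The invalidate callback stands in for whatever the query cache exposes, which keeps the mapping from events to keys testable without a live socket.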

2. The `output` channel carries streaming tokens to the UI

The daemon already publishes message.start, message.delta, message.end (and tool events) on the output channel via publishHarnessEvent. The UI must consume these — accumulating deltas into a live in-flight message bubble — rather than silently dropping them.
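
One way the delta accumulation could look, written as a pure reducer so it can sit behind React state. The field names (messageId, text) and the InFlightMessage type are assumptions, not the actual HarnessOutputEvent definitions, and tool events are left out to keep the sketch short.

```typescript
// Hypothetical subset of output-channel events; field names are assumed.
type OutputEvent =
  | { type: "message.start"; messageId: string }
  | { type: "message.delta"; messageId: string; text: string }
  | { type: "message.end"; messageId: string };

// The in-flight bubble the UI renders while a message is still streaming.
interface InFlightMessage {
  messageId: string;
  content: string;
  done: boolean;
}

// Fold one output event into the current bubble and return the next state.
function applyOutputEvent(
  current: InFlightMessage | null,
  event: OutputEvent,
): InFlightMessage | null {
  if (event.type === "message.start") {
    return { messageId: event.messageId, content: "", done: false };
  }
  // Frames for a message other than the one being tracked leave state as-is.
  if (current === null || current.messageId !== event.messageId) return current;
  if (event.type === "message.delta") {
    return { ...current, content: current.content + event.text };
  }
  return { ...current, done: true }; // message.end
}
```

Because the reducer returns a new object on every change, it can be passed directly to a state hook such as useReducer, and the bubble re-renders once per delta.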

3. The operator can abort any active run

The daemon maintains a per-run AbortController registry. The existing POST /sessions/:id/runs/:rid/cancel endpoint calls abort() on the run's controller, and the resulting abort signal propagates through the harness to the LLM provider's SSE stream. The UI surfaces a stop button whenever a run is active.
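
A sketch of what a per-run registry could look like. The class name RunRegistry and its method names are hypothetical; only AbortController and AbortSignal are standard APIs.

```typescript
// Hypothetical registry: one AbortController per active run, keyed by run id.
class RunRegistry {
  private readonly controllers = new Map<string, AbortController>();

  // Called when a run starts; the returned signal is handed to the harness.
  register(runId: string): AbortSignal {
    const controller = new AbortController();
    this.controllers.set(runId, controller);
    return controller.signal;
  }

  // Called by the cancel endpoint; returns false if the run is not active.
  cancel(runId: string): boolean {
    const controller = this.controllers.get(runId);
    if (controller === undefined) return false;
    controller.abort();
    return true;
  }

  // Called when a run ends, so finished runs do not accumulate.
  release(runId: string): void {
    this.controllers.delete(runId);
  }
}
```

In this arrangement the harness would pass the signal on to its HTTP client (for example as the signal option of fetch), so that aborting the controller tears down the provider stream rather than merely hiding its output.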

4. Thinking tokens are visible by default

message.delta events carry a kind field ("text" or "thinking"). The UI renders both. A future preference toggle may hide thinking tokens, but the default is visibility — the operator should see what the model is reasoning about.
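
To illustrate the default, the sketch below groups consecutive deltas of the same kind into render segments, with thinking shown unless a preference turns it off. The Delta and Segment types, the text field, and the showThinking parameter are assumptions; only the kind values come from this ADR.

```typescript
type DeltaKind = "text" | "thinking";

// Hypothetical delta payload; only the kind field is specified by this ADR.
interface Delta {
  kind: DeltaKind;
  text: string;
}

// A contiguous run of same-kind content, rendered as one block in the bubble.
interface Segment {
  kind: DeltaKind;
  text: string;
}

// Merge deltas into segments. Thinking is visible unless explicitly disabled.
function toSegments(deltas: readonly Delta[], showThinking = true): Segment[] {
  const segments: Segment[] = [];
  for (const delta of deltas) {
    if (delta.kind === "thinking" && !showThinking) continue;
    const last = segments[segments.length - 1];
    if (last !== undefined && last.kind === delta.kind) {
      last.text += delta.text; // extend the current block
    } else {
      segments.push({ kind: delta.kind, text: delta.text }); // start a new block
    }
  }
  return segments;
}
```

Defaulting showThinking to true means a caller has to opt out deliberately, which matches the visibility-by-default rule.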

5. Persistence is deferred, not blocking

The streaming path is: LLM → harness → WS → UI (real-time). Persistence to SQLite happens after the stream completes (or on abort). The UI never waits for persistence to display content. This is the "stream first, store at end" principle.
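
The ordering can be sketched as a single function: each delta is published as soon as it arrives, and storage is touched only once, at the end. The function name streamThenPersist and its parameters are hypothetical stand-ins for the harness stream, the WS publisher, and the SQLite writer.

```typescript
// Stream first, store at end. All parameter names here are illustrative.
async function streamThenPersist(
  source: AsyncIterable<string>,
  signal: AbortSignal,
  publish: (delta: string) => void,
  persist: (content: string, aborted: boolean) => Promise<void>,
): Promise<void> {
  let content = "";
  try {
    for await (const delta of source) {
      if (signal.aborted) break; // operator hit stop: quit consuming the stream
      content += delta;
      publish(delta); // real-time path: never waits on storage
    }
  } finally {
    // Deferred path: runs once, after completion, abort, or a stream error.
    await persist(content, signal.aborted);
  }
}
```

Putting the write in a finally block means partial output is still stored when the run is aborted or the stream fails, without ever placing a database write between the model and the screen.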

Consequences

What this commits us to

Every daemon code path that starts a run must publish run.started/run.ended on the events channel.
Every daemon code path that starts a run must register its AbortController so the cancel endpoint can reach it.
The UI must handle output channel frames and render streaming content — no silent drops.
The UI must show a stop/abort control whenever session.state === "running".
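
One way to make the first two commitments hard to forget is to route every run through a single wrapper. The sketch below assumes hypothetical names (withRun, RunDeps, Outcome) and a simplified event payload; it is not the daemon's actual API.

```typescript
type Outcome = "success" | "failure" | "abort";

// Hypothetical lifecycle payload and dependencies, injected for testability.
interface RunEvent {
  type: "run.started" | "run.ended";
  runId: string;
  outcome?: Outcome;
}

interface RunDeps {
  publish: (event: RunEvent) => void;
  controllers: Map<string, AbortController>;
}

// Wrap a run so registration and both lifecycle events always happen.
async function withRun<T>(
  deps: RunDeps,
  runId: string,
  body: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const controller = new AbortController();
  deps.controllers.set(runId, controller); // the cancel endpoint can now reach it
  deps.publish({ type: "run.started", runId });
  let outcome: Outcome = "failure";
  try {
    const result = await body(controller.signal);
    outcome = controller.signal.aborted ? "abort" : "success";
    return result;
  } catch (err) {
    if (controller.signal.aborted) outcome = "abort";
    throw err;
  } finally {
    deps.controllers.delete(runId);
    deps.publish({ type: "run.ended", runId, outcome });
  }
}
```

A code path that starts runs only through such a wrapper cannot skip run.ended or leave a stale controller behind, even when the body throws.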

What this forecloses

"Fire and forget then poll" patterns for LLM dispatch. The WebSocket is the primary data path; REST is the persistence/query layer.
Hiding model output until the response is complete. Buffering the full response before display is not allowed as a default.

What becomes easier

Operator trust — they can see what's happening and stop it.
Debugging — streaming output reveals model behavior in real time.
Cost control — abort before the model burns through a long, wrong response.
Thinking model support — extended thinking is visible, not a black box.

What becomes harder

UI complexity increases: streaming state management (in-flight bubbles, partial content, abort transitions) adds non-trivial state to the React UI.
Testing streaming paths requires more infrastructure than testing REST-only flows.

Alternatives considered

Alternative A — Poll-based refresh

Periodically refetch messages via REST. Simpler UI code, but it adds latency of up to one poll interval, wastes bandwidth on polls that return nothing new, and leaves the operator refreshing manually whenever the interval is too long. For a 30-second thinking-model call, a long interval misses the action and a short one hammers the server. Rejected: fundamentally wrong for real-time operator control.

Alternative B — Stream but don't show thinking tokens

Only render kind: "text" deltas; suppress kind: "thinking". Reduces visual noise for non-technical operators. Rejected as default: the operator is choosing to run a thinking model; hiding the thinking defeats the purpose. Retained as a future user preference toggle (opt-out, not opt-in).

References

docs/specs/daemon.md — daemon spec, WebSocket framing, events channel
docs/specs/http-api.md — session/run endpoints including cancel
packages/daemon/src/runtime/ws-registry.ts — publishHarnessEvent on output channel
packages/harness/src/types.ts — HarnessOutputEvent type definitions
packages/ui/src/lib/use-session-socket.ts — WebSocket hook with handleEventInvalidation