ADR-0031: An assistant turn is an ordered transcript of parts, not a flattened bubble

  • Status: Proposed
  • Date: 2026-06-03
  • Deciders: @karasu
  • Supersedes:
  • Superseded by:

Context

ADR-0016 established the streaming-first session UI: live tokens, thinking visibility, deferred persistence. It got the transport right: the harness emits an ordered HarnessOutputEvent stream (message.start → interleaved message.delta {kind} / message.tool_call / message.tool_result → message.end) and the daemon relays it on the output channel with a monotonic per-channel seq. It did not specify how that ordered stream is assembled into a turn for display, and the implementation that grew under it buckets the stream by kind. The result is the bug this ADR fixes.

What the operator sees today

A single primary turn renders as three stacked sections, in this fixed order regardless of what actually happened:

  1. one collapsible thinking blob at the top (all reasoning across the whole turn, concatenated),
  2. the text as one Markdown blob,
  3. a flat list of tool cards at the bottom.

A real discovery loop does not happen in that order. A trace of the first working loop (primary, kimi-k2p6-turbo, 11 model steps, ~70s) makes this concrete. The persisted content field reads:

"Let me read the key files to understand the DSL schema and API types.Let me read the existing task screen and the router to understand the current task UI surface.Now I have a complete picture. Let me create the implementation task based on this discovery.Done. I've created #11…"

Those are four separate text segments emitted in four different model steps, with tool batches (search_glob ×4, file_read ×5, search_grep ×3, …, kaged_issue_create) firing between them. Flattened, they read as glued-together nonsense ("API types.Let me read") followed by a wall of ~30 tool cards divorced from the prose that motivated each one. The narration "Let me read the key files… Done, I've created #11" only makes sense with the reads sitting inside it. As a paragraph with the tools swept to the bottom, it is, in the operator's words, stupid.

This directly violates the manifesto's operator-visibility commitment. kaged exists so the operator can see what the agent did. A turn is a sequence of actions and the reasoning around them; collapsing it into a press-release paragraph plus an appendix of tool calls hides the actual causal flow — exactly the opacity kaged was built against. Compare opencode, which renders the turn as a literal timeline: think → text → tool → think → text → tool → … in the order it occurred.

The load-bearing fact: this is not a storage problem

The ordered timeline already exists in storage, intact, today. persistAssistantMessage (packages/daemon/src/runtime/primary-runner.ts) writes metadata.contentBlocks = assistant.content: the full (TextContent | ThinkingContent | ToolCall)[] array in chronological order, spanning all loop steps. This is why text from step 1 and step 4 ends up adjacent in the flattened field: in contentBlocks the two segments were separated only by the tool blocks, which the flattener dropped. metadata.toolResults holds each result keyed by tool-call id. No schema migration is required. No new columns. The data is right; the assembly is wrong.
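
To make the loss concrete, here is a minimal sketch of the flattening step. The block shapes are simplified and hypothetical (the real TextContent, ThinkingContent, and ToolCall types live in the harness and may carry more fields), and flattenText is an illustrative name, not the actual function in the codebase.

```typescript
// Hypothetical, simplified block shapes; discriminator names are assumptions.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string };

// Keeps only text blocks and joins them with no separator.
// Everything that sat between two text segments is silently dropped.
function flattenText(blocks: readonly ContentBlock[]): string {
  let out = "";
  for (const block of blocks) {
    if (block.type === "text") out += block.text;
  }
  return out;
}

const contentBlocks: ContentBlock[] = [
  { type: "text", text: "Let me read the key files." },
  { type: "toolCall", id: "call_1", name: "file_read" },
  { type: "text", text: "Now I have a complete picture." },
];

const flattened = flattenText(contentBlocks);
// flattened === "Let me read the key files.Now I have a complete picture."
```

The array still knows that a tool call sat between the two sentences; only the flattened string has forgotten it.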

The flattening happens in exactly two places, both downstream of storage:

  1. reconstructMessageParts (session-handlers.ts) — builds the REST parts[] array. It already interleaves text + tool_call + tool_result in order (tests confirm: ["text","tool_call","tool_result","text"]). But it hoists thinking out of the timeline into a separate concatenated thinking string (thinking += block.thinking), discarding its position. content (the flattened text) is carried alongside as a peer field.

  2. MessageBubble / StreamingBubble (session-messages.tsx, session-pane.tsx) — render enrichment.thinking (blob) → <Markdown>{message.content}</Markdown> (blob) → message.parts.map() (tool cards only; text parts are explicitly return null-ed, so the ordered text in parts is thrown away in favour of the flattened content). The live StreamingState mirrors this: separate text, thinking, and toolCalls[] accumulators, so even mid-stream the interleaving is destroyed.

So the "bubble" is a lossy view over a faithful record. The fix is to make the view faithful too — and it is mostly deletion of the bucketing logic.

Decision

A model turn is rendered as an ordered transcript of typed parts — thinking, text, and tool_call+tool_result — in the exact sequence the model produced them, both live and persisted. parts[] (carrying thinking as a first-class ordered part) becomes the single source of truth for rendering a turn. The flattened content string is demoted to a derived field used only for LLM history reconstruction, search, and fallback rendering of legacy messages that predate parts.

The "bubble" stops being one fluent reply with a thinking header and a tool appendix. It becomes a transcript.
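
A rough sketch of what positional reconstruction could look like follows. Only the { type: "thinking"; thinking: string } variant is specified by this ADR; the other field names (id, name, args, toolCallId, content, isError), the block discriminators, and the function name reconstructPartsSketch are assumptions made for illustration.

```typescript
// Hypothetical, simplified input shapes.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "toolCall"; id: string; name: string; args: unknown };

type ToolResult = { content: string; isError: boolean };

// The thinking variant matches this ADR; other variants are assumed shapes.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_call"; id: string; name: string; args: unknown }
  | { type: "tool_result"; toolCallId: string; content: string; isError: boolean };

function reconstructPartsSketch(
  blocks: readonly ContentBlock[],
  toolResults: Record<string, ToolResult>,
): MessagePart[] {
  const parts: MessagePart[] = [];
  for (const block of blocks) {
    switch (block.type) {
      case "text":
        parts.push({ type: "text", text: block.text });
        break;
      case "thinking":
        // Pushed in position instead of concatenated into a side string.
        parts.push({ type: "thinking", thinking: block.thinking });
        break;
      case "toolCall": {
        parts.push({ type: "tool_call", id: block.id, name: block.name, args: block.args });
        // Results are looked up by tool-call id and placed right after the call.
        const result = toolResults[block.id];
        if (result !== undefined) {
          parts.push({
            type: "tool_result",
            toolCallId: block.id,
            content: result.content,
            isError: result.isError,
          });
        }
        break;
      }
    }
  }
  return parts;
}

const parts = reconstructPartsSketch(
  [
    { type: "thinking", thinking: "Need the schema first." },
    { type: "text", text: "Let me read the key files." },
    { type: "toolCall", id: "call_1", name: "file_read", args: { path: "schema.ts" } },
    { type: "thinking", thinking: "Enough context now." },
    { type: "text", text: "Done." },
  ],
  { call_1: { content: "export type Task = {}", isError: false } },
);
const order = parts.map((p) => p.type);
// order: thinking, text, tool_call, tool_result, thinking, text
```

Each thinking burst stays next to the action it preceded, which is the property the flattened view loses.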

Specifics

  1. thinking becomes an ordered MessagePart. Extend the MessagePart union (in both packages/daemon/src/runtime/session-handlers.ts and packages/ui/src/lib/api-types.ts) with { type: "thinking"; thinking: string }. reconstructMessageParts pushes thinking blocks into parts in position instead of concatenating into a side string. The function's separate thinking return value is retained transitionally for back-compat hydration but is no longer the rendering input.

  2. parts[] is the render contract. MessageBubble renders message.parts in order: thinking → collapsible <details> (tier-5 reasoning surface, kaged-prose-thinking, opacity 0.8, no motion); text → <Markdown>; tool_call → ToolCallCard paired with its tool_result; an orphan tool_result (no matching call in the same turn) renders standalone. The top-level enrichment.thinking blob and the standalone <Markdown>{message.content}</Markdown> are removed from the model-turn render path.

  3. content is demoted, not deleted. It remains the flattened concatenation of text parts. It is the input to reconstructMessages / reconstructCompactableMessages (LLM history — unaffected, already correct), to search, and to the fallback render branch: when parts is absent (legacy messages persisted before this ADR), MessageBubble falls back to the current content + hoisted-thinking rendering. No migration; old turns degrade gracefully to the old look.

  4. Live streaming interleaves. Replace StreamingState's { text, thinking, toolCalls[] } buckets with a single ordered parts: StreamingPart[] accumulator. The output-channel frames already arrive in stream order with monotonic seq; message.delta already carries kind: "text" | "thinking". The accumulator appends to the current open part when the incoming kind matches, and opens a new part when the kind changes or a tool_call/tool_result arrives. No new transport field is needed: arrival order is the order. StreamingBubble renders the same transcript component as MessageBubble, so live and persisted views are identical by construction (the streaming view simply has a trailing open part and [RUNNING] tool states).

  5. Per-step thinking is preserved, not merged. The persisted turn is the ordered concatenation of every loop step's content blocks. Implementation must confirm the harness retains each step's thinking block as a distinct, positioned entry in assistant.content (rather than dropping or merging reasoning across steps); if it currently merges, that is fixed here so each reasoning burst sits with the actions it preceded.

  6. No new badge vocabulary; align the tool card to the formal set. Per the brand guide, tool-call state uses the bracket-lock badges already standardised: [RUNNING] (amber), [OK] (cyan), [FAILED] (magenta). ToolCallCard's current ad-hoc lowercase running/done/error is brought onto the StatusBadge vocabulary. Code surfaces stay tier-5 sacred: zero animation on the transcript body, syntax-highlighted JSON in the brand palette only.
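
The accumulator rule in item 4 can be sketched as a pure reducer over incoming frames. This is an illustration under assumptions, not the handleOutput implementation: the frame fields delta and name, the StreamingPart field names, and the function name appendFrame are hypothetical; only message.delta.kind and message.tool_result.{tool_call_id,is_error} are named in this ADR.

```typescript
// Hypothetical frame shapes for the output channel.
type OutputFrame =
  | { type: "message.delta"; kind: "text" | "thinking"; delta: string }
  | { type: "message.tool_call"; tool_call_id: string; name: string }
  | { type: "message.tool_result"; tool_call_id: string; is_error: boolean };

// Hypothetical StreamingPart shape, mirroring the persisted part kinds.
type StreamingPart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_call"; id: string; name: string }
  | { type: "tool_result"; toolCallId: string; isError: boolean };

// Appends to the trailing open part when the kind matches,
// otherwise opens a new part. Arrival order is the only ordering input.
function appendFrame(parts: readonly StreamingPart[], frame: OutputFrame): StreamingPart[] {
  const last = parts[parts.length - 1];
  switch (frame.type) {
    case "message.delta":
      if (frame.kind === "text") {
        if (last !== undefined && last.type === "text") {
          return [...parts.slice(0, -1), { type: "text", text: last.text + frame.delta }];
        }
        return [...parts, { type: "text", text: frame.delta }];
      }
      if (last !== undefined && last.type === "thinking") {
        return [...parts.slice(0, -1), { type: "thinking", thinking: last.thinking + frame.delta }];
      }
      return [...parts, { type: "thinking", thinking: frame.delta }];
    case "message.tool_call":
      return [...parts, { type: "tool_call", id: frame.tool_call_id, name: frame.name }];
    case "message.tool_result":
      return [...parts, { type: "tool_result", toolCallId: frame.tool_call_id, isError: frame.is_error }];
  }
}

const frames: OutputFrame[] = [
  { type: "message.delta", kind: "thinking", delta: "Need the schema. " },
  { type: "message.delta", kind: "thinking", delta: "Read it first." },
  { type: "message.delta", kind: "text", delta: "Let me read " },
  { type: "message.delta", kind: "text", delta: "the key files." },
  { type: "message.tool_call", tool_call_id: "call_1", name: "file_read" },
  { type: "message.tool_result", tool_call_id: "call_1", is_error: false },
  { type: "message.delta", kind: "text", delta: "Done." },
];

const streamed = frames.reduce<StreamingPart[]>(appendFrame, []);
// streamed part types: thinking, text, tool_call, tool_result, text
```

Returning a new array on every frame keeps the reducer easy to test and friendly to state containers that compare by reference; an in-place variant would work equally well if allocation matters.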

Consequences

What this commits us to

  • parts[] is the canonical render shape for a turn. Any new content kind (e.g. images, citations) is added to the MessagePart union and gets a position in the transcript, not a new stacked section.
  • Live and persisted renders share one transcript component. They cannot drift, because there is one renderer.
  • The content field is permanently a derived projection of parts, never the inverse. Writers populate contentBlocks; content is computed from it.
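
One way the shared transcript component might prepare its input is to turn parts into an ordered list of render items before any markup is produced. The sketch below is framework-free and hypothetical throughout: RenderItem, toRenderItems, and the part field names are illustrative, and the mapping of results onto [OK] and [FAILED] is an assumption about how the existing badge vocabulary would be applied.

```typescript
// Hypothetical, simplified part shapes.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "tool_call"; id: string; name: string }
  | { type: "tool_result"; toolCallId: string; isError: boolean };

type Badge = "[RUNNING]" | "[OK]" | "[FAILED]";

type ToolItem = { kind: "tool"; callId: string; name: string; badge: Badge };
type RenderItem =
  | { kind: "thinking"; thinking: string }
  | { kind: "text"; text: string }
  | ToolItem
  | { kind: "orphan_result"; toolCallId: string; badge: Badge };

// Walks parts in array order. A result updates the card of its matching call;
// a result with no matching call in the same turn becomes a standalone item.
function toRenderItems(parts: readonly MessagePart[]): RenderItem[] {
  const items: RenderItem[] = [];
  const callsById = new Map<string, ToolItem>();
  for (const part of parts) {
    switch (part.type) {
      case "thinking":
        items.push({ kind: "thinking", thinking: part.thinking });
        break;
      case "text":
        items.push({ kind: "text", text: part.text });
        break;
      case "tool_call": {
        // A call with no result yet is shown as still running.
        const item: ToolItem = { kind: "tool", callId: part.id, name: part.name, badge: "[RUNNING]" };
        callsById.set(part.id, item);
        items.push(item);
        break;
      }
      case "tool_result": {
        const badge: Badge = part.isError ? "[FAILED]" : "[OK]";
        const call = callsById.get(part.toolCallId);
        if (call !== undefined) {
          call.badge = badge;
        } else {
          items.push({ kind: "orphan_result", toolCallId: part.toolCallId, badge });
        }
        break;
      }
    }
  }
  return items;
}

const items = toRenderItems([
  { type: "text", text: "Let me read the key files." },
  { type: "tool_call", id: "call_1", name: "file_read" },
  { type: "tool_result", toolCallId: "call_1", isError: false },
  { type: "tool_call", id: "call_2", name: "search_grep" },
  { type: "tool_result", toolCallId: "call_9", isError: true },
]);
// items: text, tool call_1 [OK], tool call_2 [RUNNING], orphan_result call_9 [FAILED]
```

Because the same function can run over a persisted parts[] array or a live accumulator, a call without a result naturally reads as [RUNNING] in the streaming view with no separate code path.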

What this forecloses

  • The "executive summary on top, receipts on the bottom" layout. If a future need arises to also show a turn-level summary, it is an explicit additional element, not a replacement for the transcript.

What becomes easier

  • Reading a discovery loop. The reasoning sits with the reads it motivated; the issue-create sits under the prose that announces it.
  • Subagent turns, replays, and (later) guest read-only session views — all render through the same faithful transcript.
  • Debugging agent behaviour from the UI alone, without dropping into a Langfuse trace to recover ordering.

What becomes harder

  • Nothing structurally. The change is net-negative LOC in the render path (bucketing logic deleted). The one cost is a transitional fallback branch for legacy parts-less messages, retired whenever those age out.
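
The transitional fallback could be as small as one branch on whether parts is present. The sketch assumes a simplified message shape (ApiMessage, RenderPlan, and planRender are hypothetical names) and treats only a missing parts field as legacy; whether an empty array should also fall back is left open here.

```typescript
// Hypothetical, simplified shapes for illustration only.
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };

type ApiMessage = { content: string; thinking?: string; parts?: MessagePart[] };

type RenderPlan =
  | { mode: "transcript"; parts: MessagePart[] }
  | { mode: "legacy"; thinking: string | null; content: string };

// parts present: render the ordered transcript.
// parts absent (legacy message): keep the old content + hoisted-thinking look.
function planRender(message: ApiMessage): RenderPlan {
  if (message.parts !== undefined) {
    return { mode: "transcript", parts: message.parts };
  }
  return { mode: "legacy", thinking: message.thinking ?? null, content: message.content };
}

const legacyPlan = planRender({ content: "Done.", thinking: "Checked the schema." });
const transcriptPlan = planRender({
  content: "Done.",
  parts: [
    { type: "thinking", thinking: "Checked the schema." },
    { type: "text", text: "Done." },
  ],
});
// legacyPlan.mode === "legacy"; transcriptPlan.mode === "transcript"
```

Keeping the decision in one function makes the branch easy to delete once parts-less messages have aged out.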

Alternatives considered

Alternative A — Persist one message per loop step

Stop aggregating the 11 steps into one primary record; write a separate message per step so ordering falls out of created_at. Rejected: it fragments the turn at the storage layer for a presentation problem the storage layer doesn't have. contentBlocks already preserves cross-step order in one record; splitting would complicate compaction (reconstructCompactableMessages couples tool pairs per record), checkpointing, and the run↔message relationship for zero gain.

Alternative B — Add an explicit contentIndex/timestamp to every part

Carry a positional index or per-part timestamp through the transport and persist it. Rejected as over-engineering: contentBlocks is already an array (order is intrinsic), and the WS stream already arrives in order with monotonic seq. Position is free; a parallel ordering field is redundant state that can disagree with the array it describes.

Alternative C — Render-only fix, leave thinking hoisted

Fix MessageBubble to interleave text and tools from parts, but keep thinking as the top blob (it's "just reasoning"). Rejected: the whole point is the timeline. A thinking burst that precedes a specific tool batch is evidence of why that batch ran; floating it to the top severs that. Thinking is a positioned part like any other.

Amendment checklist

Doc-first, then TDD. Accept this ADR, then execute in one chain:

Specs

  • docs/specs/http-api.md: in the output channel table, document message.delta.kind and message.tool_result.{tool_call_id,is_error} (the table currently omits both, though the implementation in ws-registry.ts already sends them). For REST GET …/messages, document the parts[] shape including the new thinking part, and state that content is a derived/compat field, not the render source.
  • docs/specs/agent.md — note that turn rendering consumes parts order positionally; HarnessOutputEvent ordering is the contract (no new field). Confirm per-step thinking retention requirement.
  • UI session-render spec (the message-stream surface under docs/specs/ui/) — replace the "thinking header / content / tool list" description with the ordered-transcript model; specify the legacy fallback branch; specify shared component for live + persisted.
  • docs/brand/brand-guide.md — no change needed; cite the existing badge vocabulary ([RUNNING]/[OK]/[FAILED]) for the tool-card alignment.

Code

  • packages/ui/src/lib/api-types.ts — add { type: "thinking"; thinking: string } to MessagePart.
  • packages/daemon/src/runtime/session-handlers.ts — add the thinking variant to the daemon MessagePart union; reconstructMessageParts pushes thinking in order; mapMessageItem keeps emitting content (compat) and may drop the separate thinking field once the UI no longer reads it.
  • packages/ui/src/components/screens/session-messages.tsx: MessageBubble renders parts as an ordered transcript; remove the standalone content blob and enrichment.thinking header from the parts-present path; keep the fallback for parts-absent messages. Extract a shared TurnTranscript component. Align ToolCallCard state badges to StatusBadge.
  • packages/ui/src/components/screens/session-pane.tsx: replace the text/thinking/toolCalls buckets with StreamingState.parts: StreamingPart[]; handleOutput appends/opens parts by arrival; StreamingBubble renders TurnTranscript.
  • packages/harness/ — verify (and fix if needed) that per-step thinking blocks survive into assistant.content in order.

Tests

  • message-parts-reconstruction.test.ts — assert thinking appears as a positioned part, e.g. ["thinking","text","tool_call","tool_result","thinking","text"]; assert multi-step interleaving is preserved.
  • UI component test — TurnTranscript renders parts in array order; parts-absent message uses the legacy fallback.
  • UI streaming test — interleaved delta(text) / delta(thinking) / tool_call / tool_result frames produce an ordered transcript matching final persisted order.

References

  • ADR-0016 — streaming-first UI (this refines its render model; transport unchanged)
  • ADR-0012 — Mastra substrate; per-turn aggregation lives in the harness
  • packages/daemon/src/runtime/primary-runner.ts: persistAssistantMessage (contentBlocks + toolResults are the faithful record)
  • packages/daemon/src/runtime/session-handlers.ts: reconstructMessageParts, MessagePart
  • packages/ui/src/components/screens/session-messages.tsx, session-pane.tsx — the two flattening render paths
  • packages/daemon/src/runtime/ws-registry.ts: harnessEventToPayload (frames already carry kind and ordering via seq)
  • docs/specs/agent.md: HarnessOutputEvent shape
  • docs/brand/brand-guide.md — tier-5 sacred code surfaces, motion ladder, bracket-lock badge vocabulary
  • Trace bdc19ad5… — the first working loop; the concrete evidence