Spec: Agent Harness
- Status: Draft
- Last amended: 2026-06-03 (agent execution limits: maxSteps and maxOutputTokens passed to Mastra; stop reason surface)
- Constrained by: ADR-0012, ADR-0013, ADR-0014, ADR-0011, ADR-0004, ADR-0009, ADR-0022, ADR-0023, ADR-0024
- Implements: packages/harness/ (planned)
Purpose
This spec defines the agent harness — the layer that compiles the operator's project DSL into running Mastra agents, wires up the Processor pipeline for prompt observability and policy enforcement, manages provider routing, and integrates with Langfuse for optional tracing.
This document is normative for:
- The DSL → Mastra compilation boundary: how project config becomes Agent and Workflow instances.
- The Processor pipeline: the ordered chain that observes, audits, and enforces policy on every LLM call.
- The supervisor/sub-agent topology: how the primary agent dispatches to caged subagents.
- The checkpoint bridge: how Mastra's suspend/resume maps to kaged's checkpoint protocol.
- The provider configuration: model aliases, fallback arrays, per-call routing.
- The observability integration: Langfuse tracing via Mastra's native exporter, structured-log fallback.
- The prompt management lifecycle: file-watched prompts, hot-reload at message boundaries.
- The cancellation path: abortSignal wiring from operator cancel to in-flight LLM calls.
- The plugin hook firing points: where in runPrimary and the message-reconstruction path each lifecycle hook fires (ADR-0023).
- Context compaction: when, where, and how kaged-controlled context-window management runs at the harness boundary (ADR-0024).
It is not normative for:
- The tools agents can call (that's agent-tooling.md).
- The session state machine, run lifecycle, or PTY broker (that's session-manager.md).
- The sandbox mechanism (that's sandbox.md).
- The DSL syntax itself (that's project-dsl.md).
- The HTTP/WS API surface (that's http-api.md).
- The daemon process model (that's daemon.md).
This spec is about how the agent thinks — the substrate beneath the session manager's run model and above the raw LLM provider calls.
Constraints (from ADRs)
| Constraint | Source |
|---|---|
| Mastra v1.x is the agentic substrate; version-pinned dependency on @mastra/core | ADR-0012 |
| kaged owns all prompts — every system prompt readable and editable by the operator | ADR-0012 (manifesto) |
| Mastra Cloud, Studio, Workspace, and RAG primitives are excluded | ADR-0012 |
| All provider calls route through @kaged/llm via a LanguageModelV2 shim; no @ai-sdk/<provider> deps | ADR-0014 |
| Langfuse is optional; kaged runs without it; structured-log fallback | ADR-0013 |
| Prompt management is file-based, not Langfuse-hosted | ADR-0013 |
| Project DSL is the portable artifact; the substrate it compiles to is implementation detail | ADR-0011 |
| Runtime is Bun + TypeScript | ADR-0004 |
| Subagents run in cages; the harness delegates cage spawning to the sandbox subsystem | ADR-0009 |
| Project plugins subscribe to lifecycle hooks (on_session_start, on_session_idle, pre_compact, post_compact); the harness is the firing point | ADR-0023 |
| on_session_start and on_session_idle fire only on the primary agent (sessions are primary-owned); pre_compact and post_compact are per-agent | ADR-0023 |
| Context compaction is kaged-owned at the reconstructMessages() boundary; Mastra's internal trimming is neutralized | ADR-0024 |
| Compaction is pre-call (proactive) with reactive fallback on provider context-length errors | ADR-0024 |
| Compaction is per-agent; subagents inherit defaults from parent | ADR-0024 |
| Compactor plugin failures fall back to the drop strategy; compaction never stalls | ADR-0024 |
| Compaction operates between LLM calls, never during a streaming response | ADR-0024 |
| Tool-call / tool-result message pairs are atomic across compaction | ADR-0024 |
Architecture
┌──────────────────────────────────────┐
│ Session Manager │
│ (dispatches runs, owns lifecycle) │
└──────────────┬───────────────────────┘
│ startRun(session, message)
▼
┌──────────────────────────────────────┐
│ Agent Harness │
│ │
│ ┌────────────────────────────────┐ │
│ │ DSL Compiler │ │
│ │ project.yaml → Mastra config │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Processor Pipeline │ │
│ │ audit → policy → observe │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Provider Router │ │
│ │ alias → model → fallback │ │
│ └────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Checkpoint Bridge │ │
│ │ Mastra suspend ↔ kaged pause │ │
│ └────────────────────────────────┘ │
│ │
└───┬──────────┬──────────┬────────────┘
│ │ │
┌────────▼──┐ ┌───▼────┐ ┌──▼──────────┐
│ @mastra/ │ │Langfuse│ │ Provider │
│ core │ │Exporter│ │ SDKs │
└───────────┘ └────────┘ └─────────────┘
Five subsystems in packages/harness/:
1. DslCompiler: reads the project DSL and produces Mastra Agent configurations (primary + subagents), Workflow definitions, and tool registrations. The DSL is the input; Mastra config objects are the output.
2. ProcessorPipeline: an ordered chain of Mastra Processor instances that intercept every LLM call. Responsible for prompt auditing, policy enforcement, and pre-generation observation.
3. ProviderRouter: resolves model aliases from the operator's local config, constructs fallback arrays, and routes each generate/stream call to the correct provider.
4. CheckpointBridge: translates between Mastra's suspend/resume primitive and kaged's checkpoint protocol (defined in session-manager.md).
5. ObservabilityExporter: conditionally registers Mastra's @mastra/langfuse exporter when Langfuse is configured; falls back to structured JSON logs to stdout.
Storage strategy (v0)
Mastra Agents can be constructed without memory and without a storage adapter. v0 takes that path: every primary Agent is constructed stateless, and kaged's existing storage layer (@kaged/storage, bun:sqlite) is the source of truth for all session state.
What kaged owns (not Mastra)
- Message history. MessageRecord rows in bun:sqlite. The harness reads them on every run and reconstructs Mastra's messages argument from messages WHERE session_id = ? AND NOT superseded ORDER BY created_at.
- Run state. RunRecord rows, transitioned by the session-manager state machine.
- Checkpoints. CheckpointRecord rows. The checkpoint bridge (see § Checkpoint bridge) serializes the MessageList Mastra was operating on at suspend-time, but the snapshot lives in kaged's table, not Mastra's.
- Operator identity. created_by on session records; X-Kaged-User-Id on requests. Mastra never sees operator identity.
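The message-history query above can be sketched in memory as follows. This is a minimal illustration, not the @kaged/storage implementation: the MessageRow shape and the sample data are assumptions, and only the filter and ordering come from the query text.

```typescript
// Hypothetical row shape; the real MessageRecord is defined by kaged's storage layer.
interface MessageRow {
  id: string;
  sessionId: string;
  role: "user" | "assistant" | "tool";
  content: string;
  createdAt: number; // epoch milliseconds
  superseded: boolean;
}

// In-memory equivalent of:
//   messages WHERE session_id = ? AND NOT superseded ORDER BY created_at
function reconstructMessages(rows: MessageRow[], sessionId: string): MessageRow[] {
  return rows
    .filter((row) => row.sessionId === sessionId && !row.superseded)
    .sort((a, b) => a.createdAt - b.createdAt); // filter() returned a copy, so the input is not mutated
}

const sampleRows: MessageRow[] = [
  { id: "m3", sessionId: "s1", role: "assistant", content: "second reply", createdAt: 30, superseded: false },
  { id: "m1", sessionId: "s1", role: "user", content: "hello", createdAt: 10, superseded: false },
  { id: "m2", sessionId: "s1", role: "assistant", content: "rolled-back reply", createdAt: 20, superseded: true },
  { id: "x1", sessionId: "s2", role: "user", content: "other session", createdAt: 15, superseded: false },
];

const history = reconstructMessages(sampleRows, "s1");
```

Because the history is rebuilt from rows on every run, a daemon restart or a rollback changes what the next run sees without any agent-side state to repair.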
What Mastra owns
- The agent loop semantics: how multi-turn tool calls, supervisor delegations, and suspend/resume flow within a single agent.stream(...) invocation.
- The Processor pipeline lifecycle (audit, policy, observability run via Mastra's Processor hook order).
- The provider call mechanics, but only because we plug @kaged/llm's LanguageModelV2 into Agent.model (see § Provider abstraction).
Why stateless
- Bun-pure storage. Mastra's default storage adapters (@mastra/libsql, @mastra/pg, etc.) target Node-shaped runtimes. kaged's storage is bun:sqlite. Constructing Agents stateless removes any need to either ship a bun:sqlite-backed MemoryStorage adapter or pull a parallel SQLite driver into the daemon for Mastra's use.
- Single source of truth. Two storage layers (kaged's and Mastra's) would diverge under the multi-operator, multi-session shapes the daemon already handles. One is simpler and matches how sessions are already modeled.
- Survives daemon restart. Mastra holds no state across calls; the next agent.stream(...) reconstructs context from MessageRecord rows. Restarting the daemon mid-session loses only the in-flight LLM connection (already handled by run-cancellation), not history.
When this could change
A future ADR or amendment may add Mastra-side memory if v0.x or v1 introduces features that require it (e.g., observational memory across runs that aren't fully expressible through kaged's MessageRecord). Per ADR-0014, any Mastra-side persistence would be bun:sqlite-backed via a custom MemoryStorage adapter, not via @mastra/libsql or any better-sqlite3-based dependency.
DSL compilation
Compilation boundary
The project DSL (project-dsl.md) defines agents, their prompts, their tools, and their cages. The harness compiles this into Mastra runtime objects. The operator never sees Mastra types; the DSL is the interface.
project.yaml Mastra runtime
───────────── ──────────────
primary:
model: "fast" → Agent({ model: resolved("fast"), ... })
system_prompt: ./prompts/pri.md → instructions: fileContent("./prompts/pri.md")
cage: disabled → cagePolicy: null (root agent, interim)
tools: → tools: { ...toolRegistry.resolve(["file.*", ...]) }
"file.*": { enabled: true }
"code.lsp": { enabled: true }
# kaged.issue.* and kaged.workflow.* enabled by role-based default (root agent)
subagents:
researcher:
model: "smart" → Agent({ model: resolved("smart"), ... })
system_prompt: ./prompts/res.md
cage: → cagePolicy: { fs: [...], net: [...], ... }
fs: [{ path: ./src, mode: ro }]
net: { allow: ["api.github.com:443"] }
tools: → tools: { ...toolRegistry.resolve(["search.*"]) }
"search.*": { enabled: true }
What the compiler produces
For each project load (or DSL hot-reload), the compiler emits:
interface CompiledAgentNode {
config: MastraAgentConfig; // Mastra Agent config for this node
cagePolicy: CagePolicy | null; // null when cage: disabled (root agent interim)
tools: ResolvedToolSet; // per-agent resolved tools
children: Record<string, CompiledAgentNode>; // recursive subtree
}
interface CompiledProject {
root: CompiledAgentNode; // the primary agent (tree root)
workflows: MastraWorkflowConfig[]; // if DSL declares workflows
promptFiles: PromptFileRef[]; // paths to watch for hot-reload
modelAliases: Record<string, ResolvedModel>; // from local config
}
The CompiledAgentNode is recursive — it mirrors the AgentSpec tree from project-dsl.md. Each node carries its own Mastra config, cage policy, resolved tools, and children. The compiler walks the AgentSpec tree depth-first; project references are flattened into CompiledAgentNode subtrees at compile time (per federated-config.md).
What stays outside Mastra
The compiler deliberately excludes from Mastra config:
- Cage policies. Mastra never sees CagePolicy. The session manager passes cage policies to the sandbox subsystem when spawning subagent processes. Mastra's agents config receives the sub-agent as a participant; the sandbox enforces isolation independently.
- kaged-internal state. Session IDs, run IDs, operator identity, and audit log handles flow through ToolCallContext (per agent-tooling.md), not through Mastra.
- UI state. The harness emits events to the session manager; the session manager streams to the UI. Mastra has no UI awareness.
Compilation triggers
| Trigger | Action |
|---|---|
| Daemon startup + project load | Full compilation. Agents constructed. |
| DSL file change (file watcher) | Re-compile. New agents available for next session. Active sessions continue with their compiled snapshot (per session-manager.md). |
| Prompt file change (file watcher) | Prompt content reloaded. Active sessions pick up changes at next message boundary. No re-compilation of agent topology. |
| Local config change (model aliases) | Provider router updated. Active sessions pick up changes at next LLM call. |
Processor pipeline
The Processor pipeline is the core mechanism that satisfies the manifesto principle: every system prompt is readable by the operator. Per ADR-0012's audit, no code path adds messages after the Processor pipeline runs and before the provider call. The message array exiting the pipeline is what the LLM sees.
Pipeline order
Processors execute in registration order. kaged registers three:
1. AuditProcessor — logs the full pre-generation message array
2. PolicyProcessor — enforces kaged policies (token budgets, content gates)
3. ObservabilityProcessor — exports trace data to Langfuse (or structured logs)
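To illustrate why registration order matters, here is a minimal sketch of an ordered chain in which each processor receives the previous processor's output. The PipelineState and SketchProcessor types and the runPipeline helper are hypothetical simplifications; Mastra runs its own processor chain, and its real Processor interface is richer than this.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface PipelineState {
  messages: string[];
  systemMessages: string[];
}

interface SketchProcessor {
  name: string;
  processInput(state: PipelineState): PipelineState;
}

// Runs processors in registration order; each one receives the previous output.
function runPipeline(processors: SketchProcessor[], initial: PipelineState): PipelineState {
  return processors.reduce((state, processor) => processor.processInput(state), initial);
}

const observed: string[] = [];

const auditSketch: SketchProcessor = {
  name: "audit",
  processInput(state) {
    observed.push(`audit saw ${state.messages.length} messages`);
    return state; // pass through unmodified
  },
};

const policySketch: SketchProcessor = {
  name: "policy",
  // Stand-in content gate: drop any message containing the marker "SECRET".
  processInput(state) {
    return { ...state, messages: state.messages.filter((m) => !m.includes("SECRET")) };
  },
};

const observeSketch: SketchProcessor = {
  name: "observe",
  processInput(state) {
    observed.push(`observe saw ${state.messages.length} messages`);
    return state;
  },
};

const pipelineResult = runPipeline([auditSketch, policySketch, observeSketch], {
  messages: ["hello", "SECRET token"],
  systemMessages: ["You are the primary agent."],
});
```

In this sketch the audit step records the array before policy runs, while the observe step records what is left after the gate, which matches the audit → policy → observe order listed above.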
AuditProcessor
Fires on every processInput and processInputStep invocation. Records the complete message array (system messages + user/assistant messages) to the kaged audit log.
class AuditProcessor implements Processor {
processInput({ messages, systemMessages, messageList }) {
auditLog.write({
event: "agent.pre_generation",
system_messages: systemMessages,
messages: messages,
timestamp: Date.now(),
});
// Pass through unmodified
return { messages, systemMessages };
}
processInputStep({ messages, systemMessages, stepNumber, tools }) {
auditLog.write({
event: "agent.pre_generation_step",
step: stepNumber,
tool_count: Object.keys(tools).length,
timestamp: Date.now(),
});
return { messages, systemMessages };
}
}
This processor is always registered, even when Langfuse is disabled. The kaged audit log is the non-optional record.
PolicyProcessor
Enforces operator-configured policies before the LLM call proceeds. Can modify messages or abort execution.
Responsibilities:
- Token budget enforcement. If the message array exceeds the configured context window budget, truncate or abort with a clear error. The operator configures budget thresholds in the DSL or local config.
- Content gates. If the operator configures content restrictions (e.g., "subagent X must not see file Y's content"), the PolicyProcessor strips or redacts matching content from the message array before it reaches the LLM.
- Abort on violation. If a policy is violated and cannot be remediated by message modification, the processor calls
abort()with a descriptive error. The session manager surfaces this as a run failure.
class PolicyProcessor implements Processor {
processInput({ messages, systemMessages, abort }) {
const totalTokens = estimateTokens(messages, systemMessages);
if (totalTokens > this.budget.hard_limit) {
abort("Token budget exceeded: " + totalTokens + " > " + this.budget.hard_limit);
return { messages, systemMessages };
}
// Apply content gates...
return { messages, systemMessages };
}
}
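The PolicyProcessor above calls estimateTokens without defining it. As a rough sketch only, the helper below assumes about four characters per token, which is a common approximation and not a real tokenizer; the TextMessage shape, the checkHardLimit helper, and the BudgetDecision type are likewise hypothetical.

```typescript
// Hypothetical message shape: only the text content matters for the estimate.
interface TextMessage {
  content: string;
}

// Rough heuristic, not a tokenizer: assumes about 4 characters per token.
function estimateTokens(messages: TextMessage[], systemMessages: TextMessage[]): number {
  const chars = [...systemMessages, ...messages].reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

type BudgetDecision = { ok: true } | { ok: false; reason: string };

// Returns a decision instead of aborting, so the caller chooses how to surface it.
function checkHardLimit(totalTokens: number, hardLimit: number): BudgetDecision {
  if (totalTokens > hardLimit) {
    return { ok: false, reason: `Token budget exceeded: ${totalTokens} > ${hardLimit}` };
  }
  return { ok: true };
}
```

A processor could pass the reason string to abort() when ok is false. A provider-specific tokenizer would give tighter estimates than this character count.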
ObservabilityProcessor
When Langfuse is configured, exports pre-generation message snapshots as Langfuse trace spans. When Langfuse is not configured, writes structured JSON to stdout (the fallback path from ADR-0013).
class ObservabilityProcessor implements Processor {
processInput({ messages, systemMessages }) {
if (this.langfuse) {
this.langfuse.span({
name: "pre_generation",
input: { system_messages: systemMessages, messages },
});
} else {
structuredLog("agent.trace", { system_messages: systemMessages, messages });
}
return { messages, systemMessages };
}
}
Processor visibility guarantees
Per the ADR-0012 audit:
| Hook | When | What kaged sees |
|---|---|---|
| processInput() | Once at start of execution | Full messages + systemMessages (agent instructions, memory context, user-provided) |
| processInputStep() | Every agentic loop step (including tool-call continuations) | Same as above, plus stepNumber; can override tools and toolChoice |
| processOutputStream() | During streaming output | Output chunks from the LLM |
Nothing is injected after the pipeline. The message array the pipeline emits is the message array the LLM receives.
Supervisor pattern (recursive)
Per ADR-0022, the agent tree is recursive. Every agent that has subagents is a Mastra supervisor over its direct children. The tree structure is the call graph: a parent calls its direct children; sibling and cross-tree calls do not exist. There is no can_be_called_by check and no event-routed dispatch (interconnect).
How it works
- The DSL compiler walks the AgentSpec tree depth-first, producing a CompiledAgentNode at each level. For every node that has children, the compiler registers the children on the parent's agents config:
// Recursive — called for each node in the AgentSpec tree
function buildAgentNode(
spec: AgentSpec,
key: string,
treePath: string, // e.g. "primary", "primary.subagents.researcher"
): CompiledAgentNode {
const children: Record<string, CompiledAgentNode> = {};
const childAgents: Record<string, Agent> = {};
for (const [childKey, childSpec] of Object.entries(spec.subagents ?? {})) {
const childPath = `${treePath}.subagents.${childKey}`;
const childNode = buildAgentNode(childSpec, childKey, childPath);
children[childKey] = childNode;
childAgents[childKey] = childNode.config.agent;
}
const agent = new Agent({
name: key,
instructions: () => promptStore.get(treePath),
model: kagedModel(resolveRoute(spec.model)),
tools: resolveToolsForAgent(spec, treePath),
agents: childAgents, // direct children only
});
return {
config: { agent },
cagePolicy: spec.cage === "disabled" ? null : compileCage(spec.cage),
tools: resolveToolsForAgent(spec, treePath),
children,
};
}
- Mastra converts each child agent into a synthetic tool: agent-researcher, agent-writer. The tool's description is the child's description field, which the operator provides via the DSL.
- The parent's instructions contain the routing logic. This is operator-authored in the DSL's system prompt file. No framework-generated routing prompt is injected.
- The LLM in the parent decides when to call which agent-{key} tool based on the instructions and descriptions. Delegation is reasoning-driven, not declarative.
- Depth limit. The tree is bounded at 16 levels (same as the project-reference depth limit in federated-config.md). The compiler rejects deeper nesting at parse time.
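The depth limit can be sketched as a recursive measurement followed by a guard. The SpecNode type, the helper names, and the choice to count the root as level 1 are assumptions made for illustration; the spec states only the bound of 16.

```typescript
// Hypothetical minimal stand-in for AgentSpec: only the subagents field matters here.
interface SpecNode {
  subagents?: Record<string, SpecNode>;
}

const MAX_TREE_DEPTH = 16;

// Depth of the tree, counting the root as level 1 (an assumption).
function treeDepth(spec: SpecNode): number {
  const children = Object.values(spec.subagents ?? {});
  return 1 + children.reduce((max, child) => Math.max(max, treeDepth(child)), 0);
}

function assertDepthWithinLimit(root: SpecNode): void {
  const depth = treeDepth(root);
  if (depth > MAX_TREE_DEPTH) {
    throw new Error(`Agent tree depth ${depth} exceeds limit ${MAX_TREE_DEPTH}`);
  }
}

// Builds a single-branch tree with the given number of levels, for testing.
function chain(levels: number): SpecNode {
  let node: SpecNode = {};
  for (let i = 1; i < levels; i++) {
    node = { subagents: { child: node } };
  }
  return node;
}
```

For example, assertDepthWithinLimit(chain(16)) returns normally, while assertDepthWithinLimit(chain(17)) throws.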
Per-agent tool resolution
Each agent in the tree has its own resolved tool set. The resolution chain (from agent-tooling.md):
1. Built-in registry (all tools exist but are not enabled by default)
2. Role-based defaults (root agent gets kaged.issue.* and kaged.workflow.*; all others start empty)
3. Agent's tools: block (operator opts in per agent)
4. principal_scope enforcement (schema rejects kaged.* on non-root agents)
5. Cage filter at dispatch time
The resolveToolsForAgent() function applies steps 1–4 at compile time. Step 5 is runtime.
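The compile-time steps can be sketched as below. This is a simplified illustration, not the resolveToolsForAgent() implementation: the registry contents (apart from file.read and code.lsp), the trailing-wildcard matching rule, and the treatment of enabled: false as removing a tool are all assumptions.

```typescript
// Hypothetical registry; only file.read and code.lsp are names taken from this spec.
const BUILT_IN_TOOLS = [
  "file.read", "file.write", "code.lsp", "search.web",
  "kaged.issue.create", "kaged.workflow.run",
];

type ToolsBlock = Record<string, { enabled: boolean }>;

// Matches an exact tool name or a trailing-wildcard pattern such as "file.*".
function matchesPattern(pattern: string, tool: string): boolean {
  return pattern.endsWith(".*")
    ? tool.startsWith(pattern.slice(0, -1)) // "file.*" becomes the prefix "file."
    : pattern === tool;
}

function resolveToolsSketch(toolsBlock: ToolsBlock, isRoot: boolean): string[] {
  // Step 4: kaged.* entries are rejected on non-root agents.
  for (const pattern of Object.keys(toolsBlock)) {
    if (!isRoot && pattern.startsWith("kaged.")) {
      throw new Error(`principal_scope violation: ${pattern} on non-root agent`);
    }
  }
  const enabled = new Set<string>();
  // Step 2: role-based defaults for the root agent; everyone else starts empty.
  if (isRoot) {
    for (const tool of BUILT_IN_TOOLS) {
      if (matchesPattern("kaged.issue.*", tool) || matchesPattern("kaged.workflow.*", tool)) {
        enabled.add(tool);
      }
    }
  }
  // Step 3: the agent's tools: block opts tools in (or, by assumption, out).
  for (const [pattern, options] of Object.entries(toolsBlock)) {
    for (const tool of BUILT_IN_TOOLS) {
      if (!matchesPattern(pattern, tool)) continue;
      if (options.enabled) enabled.add(tool);
      else enabled.delete(tool);
    }
  }
  // Step 1: the built-in registry defines the universe and the output order.
  return BUILT_IN_TOOLS.filter((tool) => enabled.has(tool));
}
```

Step 5 is absent on purpose: the cage filter depends on runtime dispatch state and cannot be decided at compile time.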
Delegation hooks
kaged registers three hooks on every supervisor (any agent with children):
| Hook | Purpose |
|---|---|
| onDelegationStart | Logs the delegation to the audit log with the full tree-position path. Can modify the prompt sent to the child or abort the delegation (e.g., if the child's cage is not ready). |
| onDelegationComplete | Logs the result. Can provide feedback to the parent or bail on the delegation chain. |
| messageFilter | Controls which parent messages reach each child. kaged uses this to strip messages containing content outside the child's cage allowlist. |
Message hygiene
Mastra's stripParentToolParts() removes parent-tool-call references from messages forwarded to children (preventing children from seeing tools they don't have). kaged's messageFilter adds cage-aware content filtering on top.
Cage integration
When a parent agent delegates to a child, the session manager:
1. Receives the delegation via onDelegationStart.
2. Looks up the child's CagePolicy from the CompiledAgentNode.
3. If cagePolicy is null (child has cage: disabled), the child runs in the daemon's process context, the same as the root agent.
4. If cagePolicy is non-null, spawns the child's process in a cage (per sandbox.md).
5. The Mastra agent runs inside the caged process; its tools are mediated by ToolPermissions (per agent-tooling.md).
The root agent's cage must be disabled in the current interim state (ADR-0022 § Interim state). The supervisor infrastructure to cage the primary process is scheduled for a follow-up ADR.
Checkpoint bridge
kaged's checkpoint protocol (defined in session-manager.md) enables operator pause, resume, and rollback of agent execution at message boundaries.
v0 model: kaged-native stateless checkpoints
v0 does not use Mastra's Workflow suspend()/resume() primitive. Mastra Agents are constructed stateless (§ Storage strategy), and Mastra Workflow suspend requires Mastra-side snapshot persistence (StorageAdapter) that kaged does not provide. Instead, checkpoints are implemented natively using kaged's existing storage layer.
A checkpoint is a pointer into the persisted message history. No separate snapshot blob is stored. The full state at any checkpoint is reconstructible from MessageRecord rows up to the messageCursor plus the prompt content at that time.
Checkpoint triggers
| Trigger | Mechanism |
|---|---|
| Operator pauses (⏸) | Daemon fires abortController.abort(). In-flight agent.stream() cancels. Harness captures partial output and returns with finishReason: "aborted". Daemon persists the partial assistant message, creates CheckpointRecord, transitions session running → paused. |
| Model calls checkpoint tool | Tool handler sets a checkpointRequested signal on shared run context and returns a result indicating execution will pause ("Checkpoint taken. Awaiting operator."). The current generation completes naturally. runPrimary returns with checkpointRequested populated on the result. Daemon creates CheckpointRecord, transitions session running → paused. |
Checkpoint record
Created by the daemon's capture_checkpoint effect handler:
const record: CheckpointRecord = {
id: generateId(),
sessionId,
runId,
createdAt: Date.now(),
createdBy: initiator, // "operator" | "model"
reason: detail ?? null,
messageCursor: lastMessageId, // last MessageRecord.id before the pause
resumedAt: null,
rolledBack: false,
supersededBy: null,
};
storage.createCheckpoint(record);
The messageCursor points to the last MessageRecord.id persisted before the checkpoint. For operator pause, this is the partial assistant message from the aborted stream. For model-requested checkpoints, this is the complete assistant message from the naturally-finished stream.
runPrimary result extension
RunPrimaryResult carries an optional checkpointRequested field so the daemon can distinguish a normal completion from one that should trigger a checkpoint:
interface RunPrimaryResult {
// ... existing fields ...
checkpointRequested?: {
detail?: string; // model-provided reason
};
}
When checkpointRequested is present, the daemon creates a CheckpointRecord with createdBy: "model" and transitions the session to paused instead of idle. When absent, the daemon follows the normal run_completed → idle path.
For operator-initiated pause, the daemon does not wait for the runPrimary result — it fires abortController.abort() immediately and creates the checkpoint in its own handler, independent of the harness return path.
checkpoint tool
The checkpoint tool is a kaged-internal tool registered on the primary agent (defined in agent-tooling.md). Its behavior:
1. The model calls checkpoint({ reason?: string }).
2. The tool handler sets a shared checkpointRequested flag accessible to the runPrimary closure.
3. The tool returns { status: "checkpoint_taken", message: "Execution paused. Awaiting operator." }.
4. The model sees the result. Since the tool's description instructs it to stop after calling checkpoint, the model produces a final text response and the generation ends naturally.
5. runPrimary returns with checkpointRequested populated.
The checkpoint tool's description in the tool registry is: "Pause execution and yield control to the operator. Call this when you need human review, approval, or input before proceeding. After calling this tool, produce a brief summary of your current state and stop."
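The handler side of this behavior can be sketched as a closure over a shared run context. The RunContextSketch type and the makeCheckpointTool factory are hypothetical; the actual tool registration is defined in agent-tooling.md and may differ in shape.

```typescript
// Hypothetical shared context that the runPrimary closure would also read.
interface RunContextSketch {
  checkpointRequested?: { detail?: string };
}

// Builds a checkpoint tool bound to one run's context.
function makeCheckpointTool(ctx: RunContextSketch) {
  return {
    description:
      "Pause execution and yield control to the operator. Call this when you need human review, " +
      "approval, or input before proceeding. After calling this tool, produce a brief summary of " +
      "your current state and stop.",
    execute(args: { reason?: string }) {
      // Record the request; the generation itself is not interrupted here.
      ctx.checkpointRequested = args.reason === undefined ? {} : { detail: args.reason };
      return {
        status: "checkpoint_taken" as const,
        message: "Execution paused. Awaiting operator.",
      };
    },
  };
}

const runContext: RunContextSketch = {};
const checkpointTool = makeCheckpointTool(runContext);
const toolResult = checkpointTool.execute({ reason: "need schema review" });
```

The design choice illustrated here is that the tool only raises a flag: the stream finishes on its own, and the caller inspects the context afterwards to decide whether the session goes to paused or idle.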
Resume
1. Operator sends a resume request (POST /api/v1/sessions/:id/resume).
2. Daemon loads the CheckpointRecord via storage.getCheckpoint(checkpointId).
3. Checks for prompt edits via storage.listPromptEdits(checkpointId). If edits exist, the apply_prompt_edits effect updates the prompt store.
4. Reconstructs the message history from MessageRecord rows, using the same reconstructMessages() path as dispatchPrimary, which already filters superseded messages.
5. Creates a new run.
6. Calls runPrimary with the reconstructed messages. The model continues naturally from the full context.
7. Updates the checkpoint: storage.updateCheckpoint(checkpointId, { resumedAt: Date.now() }).
8. Session transitions paused → running (via session machine resume event).
Resume is a new run, not a continuation of the old one. The model sees the full message history and generates a new response. This is the intended behavior: at a message boundary, the next generation is always a fresh agent.stream() call with reconstructed context.
Rollback
1. Operator sends a rollback request (POST /api/v1/sessions/:id/rollback).
2. Session machine emits kill_post_checkpoint_subagents and supersede_messages_after side effects.
3. The supersede_messages_after effect marks all MessageRecord rows created after the rollback target checkpoint's messageCursor as superseded = true.
4. Checkpoint's rolledBack flag is set to true.
5. Session transitions paused → idle.
6. The next post_message reconstructs messages from non-superseded rows only, effectively rewinding history to the checkpoint.
The rollback target checkpoint is preserved — it is not deleted.
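The supersede step can be sketched over an in-memory list. The MessageStub shape and the function below are illustrative assumptions; in particular, comparing by createdAt to decide what counts as "after" the cursor is a guess, and the real effect may order by row id or sequence number instead.

```typescript
// Hypothetical reduced message shape for the sketch.
interface MessageStub {
  id: string;
  createdAt: number; // epoch milliseconds
  superseded: boolean;
}

// Marks every message created after the cursor message as superseded.
// Returns a new array; existing rows are copied only when they change.
function supersedeMessagesAfter(messages: MessageStub[], cursorId: string): MessageStub[] {
  const cursor = messages.find((m) => m.id === cursorId);
  if (cursor === undefined) {
    throw new Error(`Unknown messageCursor: ${cursorId}`);
  }
  const cutoff = cursor.createdAt;
  return messages.map((m) => (m.createdAt > cutoff ? { ...m, superseded: true } : m));
}

const beforeRollback: MessageStub[] = [
  { id: "m1", createdAt: 1, superseded: false },
  { id: "m2", createdAt: 2, superseded: false }, // checkpoint messageCursor
  { id: "m3", createdAt: 3, superseded: false },
];

const afterRollback = supersedeMessagesAfter(beforeRollback, "m2");
const visibleIds = afterRollback.filter((m) => !m.superseded).map((m) => m.id);
```

Nothing is deleted: superseded rows stay in storage for audit purposes and are simply skipped when the next run reconstructs history.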
Prompt editing during pause
While paused, the operator can edit agent prompts via the checkpoint inspection API. Edits are stored in the prompt_edits table:
interface PromptEditRecord {
id: string;
checkpointId: string;
sessionId: string;
target: string; // agent name ("primary", or subagent name)
oldHash: string;
newHash: string;
newContent: string;
editedAt: number;
}
On resume, the apply_prompt_edits effect reads the edits for the checkpoint and updates the prompt store (on-disk files or the in-memory prompt cache). The resumed run uses the new prompt content.
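A minimal sketch of applying stored edits to an in-memory prompt cache follows. The PromptEditStub type is a reduced, hypothetical view of PromptEditRecord, and applying edits in editedAt order so that the latest edit per target wins is an assumption; the spec does not state how multiple edits to one target are ordered, and hash verification is omitted.

```typescript
// Reduced, hypothetical view of PromptEditRecord: only the fields this sketch needs.
interface PromptEditStub {
  target: string; // agent name
  newContent: string;
  editedAt: number; // epoch milliseconds
}

// Returns a new prompt map with edits applied oldest-first, so the latest edit wins.
function applyPromptEdits(
  prompts: Map<string, string>,
  edits: PromptEditStub[],
): Map<string, string> {
  const next = new Map(prompts);
  edits
    .slice() // copy so the caller's array keeps its order
    .sort((a, b) => a.editedAt - b.editedAt)
    .forEach((edit) => next.set(edit.target, edit.newContent));
  return next;
}

const promptsAtPause = new Map<string, string>([
  ["primary", "You are the primary agent."],
  ["researcher", "You research."],
]);

const promptsOnResume = applyPromptEdits(promptsAtPause, [
  { target: "primary", newContent: "Second edit.", editedAt: 200 },
  { target: "primary", newContent: "First edit.", editedAt: 100 },
]);
```

Returning a fresh map keeps the pre-resume prompts available for comparison, which could be useful when emitting an audit event for the change.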
Limitations
- No mid-generation pause. abortController.abort() cancels the in-flight HTTP connection. Partial output up to the abort point is captured and persisted, but the model's response is incomplete. The operator sees the partial output and can resume (which starts a fresh generation) or rollback.
- No Mastra-side state preservation. Since Agents are stateless, there is no Mastra MessageList to serialize. Context is reconstructed from kaged's MessageRecord rows on every run. This is by design (§ Storage strategy, § Survives daemon restart).
- Resume cannot continue mid-tool-loop. If the model was in the middle of a multi-step tool-use loop when paused, resume starts a fresh generation. The model sees the tool calls and results from the partial run in its message history and decides how to proceed. This matches the spec's message-boundary semantics.
When this could change
A future amendment may adopt Mastra Workflow suspend()/resume() if kaged adds a bun:sqlite-backed StorageAdapter for Mastra. This would enable mid-step suspension (between workflow steps) without aborting the HTTP connection. The checkpoint bridge helpers in checkpoint-bridge.ts (suspend(), resume(), serializeSnapshot(), deserializeSnapshot(), snapshotFromMessages()) are retained as pure serialization utilities for that future path.
Provider configuration
Model aliases
Operators configure model aliases in local config (local-config.md). The DSL references aliases; the harness resolves them at runtime.
# local.toml
[models]
fast = "anthropic:claude-sonnet-4-20250514"
smart = "anthropic:claude-opus-4-20250514"
cheap = "openai:gpt-4.1-mini"
local = "ollama:llama-4-scout"
# project.yaml
agents:
primary:
model: "smart" # resolved to anthropic:claude-opus-4-20250514
subagents:
researcher:
model: "fast" # resolved to anthropic:claude-sonnet-4-20250514
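Alias resolution for the single-target case can be sketched as a lookup followed by a provider:model split. The ModelRef type and both helpers are hypothetical, and splitting on the first colon only (so that model names containing colons survive) is an assumption about the reference format.

```typescript
interface ModelRef {
  provider: string;
  model: string;
}

// Splits "provider:model" on the first colon only.
function parseModelRef(ref: string): ModelRef {
  const sep = ref.indexOf(":");
  if (sep <= 0 || sep === ref.length - 1) {
    throw new Error(`Invalid model reference: ${ref}`);
  }
  return { provider: ref.slice(0, sep), model: ref.slice(sep + 1) };
}

// Looks up an alias from the [models] table and parses its target.
function resolveAlias(aliases: Record<string, string>, alias: string): ModelRef {
  const target = aliases[alias];
  if (target === undefined) {
    throw new Error(`Unknown model alias: ${alias}`);
  }
  return parseModelRef(target);
}

const localAliases: Record<string, string> = {
  fast: "anthropic:claude-sonnet-4-20250514",
  smart: "anthropic:claude-opus-4-20250514",
};

const researcherModel = resolveAlias(localAliases, "fast");
```

Failing loudly on an unknown alias at compile time is one reasonable choice here, since a project that references an alias missing from local config cannot run any agent that uses it.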
Fallback arrays
Operators can configure fallback chains for resilience:
[models]
smart = ["anthropic:claude-opus-4-20250514", "openai:o3", "ollama:llama-4-scout"]
The provider router tries each in order. If the first provider returns an error (rate limit, outage, timeout), the router falls back to the next. Fallback is per-call, not per-session.
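Per-call fallback can be sketched as a loop that tries each route in order and returns the first success. The callWithFallback helper and the fakeCall stub are hypothetical. As a simplification, this sketch falls back on any thrown error, whereas a real router would probably distinguish retryable failures (rate limit, outage, timeout) from errors such as invalid requests.

```typescript
// Tries each route in order; returns the first successful result.
async function callWithFallback<T>(
  routes: string[],
  call: (route: string) => Promise<T>,
): Promise<T> {
  const errors: string[] = [];
  for (const route of routes) {
    try {
      return await call(route);
    } catch (err) {
      // Remember why this route failed, then move on to the next one.
      errors.push(`${route}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}

// Stub provider call for illustration: the first provider is rate limited.
const attempted: string[] = [];
async function fakeCall(route: string): Promise<string> {
  attempted.push(route);
  if (route.startsWith("anthropic:")) {
    throw new Error("rate limited");
  }
  return `ok from ${route}`;
}

const smartRoutes = ["anthropic:claude-opus-4-20250514", "openai:o3", "ollama:llama-4-scout"];
```

Calling callWithFallback(smartRoutes, fakeCall) resolves with the second route's result and never reaches the third. Because the loop restarts from the first route on every call, a provider that recovers is used again immediately, which is what per-call (rather than per-session) fallback implies.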
Provider credentials
Provider API keys are configured in local config (per local-config.md), never in the project DSL (per ADR-0011 — project files are portable; credentials are operator-local).
# local.toml
[providers.anthropic]
api_key = "${KAGED_ANTHROPIC_API_KEY}"
[providers.openai]
api_key = "${KAGED_OPENAI_API_KEY}"
Environment variable references are resolved at config load time. Raw API keys in config files are supported but discouraged (documented as a security consideration).
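Resolving ${NAME} references at config load time can be sketched as a single regex pass. The resolveEnvRefs helper is hypothetical, and both the accepted variable-name pattern and the decision to throw on an unset variable are assumptions; the spec does not say how missing variables are handled.

```typescript
// Replaces each ${NAME} in a config value with the matching environment variable.
function resolveEnvRefs(value: string, env: Record<string, string | undefined>): string {
  return value.replace(/\$\{([A-Z_][A-Z0-9_]*)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

// The env map is passed in explicitly so the function stays testable.
const sampleEnv = { KAGED_ANTHROPIC_API_KEY: "sk-test-123" };
const resolvedKey = resolveEnvRefs("${KAGED_ANTHROPIC_API_KEY}", sampleEnv);
const rawKey = resolveEnvRefs("sk-raw-456", sampleEnv); // raw values pass through unchanged
```

Failing at load time keeps a missing credential from surfacing later as an opaque provider authentication error in the middle of a run.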
Provider abstraction
Per ADR-0014, all LLM calls route through @kaged/llm. The harness does not depend on @ai-sdk/<provider> packages. Instead, @kaged/llm exposes a LanguageModelV2 factory — the Vercel AI SDK provider interface that Mastra v1.x consumes — which the harness uses as the model field on every Mastra Agent.
import { kagedModel } from "@kaged/llm/mastra";
const agent = new Agent({
id: "primary",
name: "Primary",
instructions: () => promptStore.get("primary"),
model: kagedModel(resolvedRoute), // ← LanguageModelV2 backed by @kaged/llm
tools: { /* ... */ },
});
kagedModel(route) returns a LanguageModelV2 whose doStream / doGenerate methods map Mastra's LanguageModelV2CallOptions to kaged's Context + StreamOptions, call streamModel() / completeModel(), and translate the resulting StreamEvents back into LanguageModelV2StreamParts. The mapping lives in packages/llm/src/mastra-model.ts and is the only Mastra-aware code in @kaged/llm.
Implications:
- One provider code path. Agent loop, provider test endpoint, and ad-hoc calls all route through streamModel / completeModel.
- Custom headers, retry policy, OAuth refresh, and telemetry: all extension points live in @kaged/llm.
- No transitive @ai-sdk/<provider> dependencies (only @ai-sdk/provider-v5 for interface types, pinned to whichever version Mastra v1.x targets).
- OAuth / subscription providers are supported by design: @kaged/llm is operator-owned code and may ship adapters Mastra / Vercel won't (see ADR-0014 and llm.md for the OAuth strategy). v0 ships API-key providers only.
Observability
Langfuse integration
When the operator configures Langfuse credentials in local.toml, the daemon initializes a Mastra observability instance with a LangfuseExporter. Before each primary run, agent.__registerMastra(mastra) injects the observability context into the agent. Mastra then automatically creates hierarchical traces with correct nesting:
- invoke_agent span (type: AGENT): top-level run context
- chat model span (type: GENERATION): LLM call with model, provider, usage
- model_step spans: individual reasoning/generation steps
- Tool execution spans: tool name, input, output, timing
For multi-step runs (tool calls followed by additional LLM generation), Mastra creates additional generation and tool spans automatically. Tool definitions, provider/model metadata, and token usage are enriched by Mastra without manual code.
When Langfuse is not configured, kaged runs normally and emits no Langfuse traces.
# local.toml
[langfuse]
enabled = true
base_url = "http://langfuse.local:3000"
public_key_env = "KAGED_LANGFUSE_PUBLIC_KEY"
secret_key_env = "KAGED_LANGFUSE_SECRET_KEY"
The observability instance is initialized from daemon-side local config:
import { initMastraObservability } from "@kaged/harness";
if (pubKey && secKey) {
initMastraObservability({
enabled: true,
baseUrl: config.langfuse.base_url ?? "https://cloud.langfuse.com",
publicKey: pubKey,
secretKey: secKey,
});
}
Per ADR-0013, kaged uses Mastra's native observability pipeline rather than maintaining a separate manual Langfuse SDK integration. This ensures correct trace hierarchy, automatic metadata enrichment, and compatibility across Mastra version upgrades without manual tracing code that drifts from the actual execution flow.
Structured-log fallback
When Langfuse is not configured (the default), the ObservabilityProcessor writes structured JSON logs to stdout. These are sufficient for basic debugging and integrate with any log aggregator (Loki, Datadog, etc.).
Log format:
{
"level": "info",
"event": "agent.generation",
"session_id": "01HXAB...",
"run_id": "01HXAC...",
"agent": "primary",
"model": "anthropic:claude-sonnet-4-20250514",
"tokens_in": 1523,
"tokens_out": 847,
"duration_ms": 2340,
"tool_calls": ["file.read", "code.lsp"],
"timestamp": "2026-05-22T14:30:00.000Z"
}
What is NOT in Langfuse
Per ADR-0013:
- Prompt management. Prompts are files on disk, not Langfuse-hosted.
- A/B testing. Not a first-party feature. Operators version prompt files and compare traces.
- Prompt audit log. The git log of prompt files is the audit log.
Prompt management
Prompts are files
All prompts that shape agent behavior live in the project directory as files referenced from the DSL:
# project.yaml
agents:
  primary:
    system_prompt: ./prompts/primary.md
    subagents:
      researcher:
        system_prompt: ./prompts/researcher.md
The harness reads prompt files at compilation time and registers them for file watching.
Hot-reload
The daemon's file watcher (per daemon.md) monitors prompt files referenced by the active project. On change:
- The harness re-reads the file content.
- At the next message boundary (never mid-generation), the agent's `instructions` are updated with the new content.
- The session manager emits a `prompt.reloaded` audit event.
- The UI shows an indicator: "System prompt updated."
This relies on the `instructions` field of Mastra's `Agent`, which accepts a function for dynamic instructions. The harness provides a function that reads from the current file content:
const agent = new Agent({
  instructions: () => promptStore.get("primary"),
  // ...
});
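One way to honor the "never mid-generation" rule is to stage watcher updates and apply them only at a message boundary. The sketch below assumes a hypothetical `PromptStore` class with `stage` and `commitAtMessageBoundary` methods; the spec only names `promptStore.get`.

```typescript
// Hypothetical prompt store: a sketch under stated assumptions, not the harness's implementation.
class PromptStore {
  private active = new Map<string, string>();
  private pending = new Map<string, string>();

  // Called at compilation time with the initial file content.
  register(agent: string, content: string): void {
    this.active.set(agent, content);
  }

  // Called by the file watcher. New content is staged, not applied.
  stage(agent: string, content: string): void {
    this.pending.set(agent, content);
  }

  // Called at a message boundary. Returns the agents whose prompt changed,
  // so the caller can emit one prompt.reloaded audit event per agent.
  commitAtMessageBoundary(): string[] {
    const reloaded: string[] = [];
    this.pending.forEach((content, agent) => {
      if (this.active.get(agent) !== content) {
        this.active.set(agent, content);
        reloaded.push(agent);
      }
    });
    this.pending.clear();
    return reloaded;
  }

  get(agent: string): string {
    const content = this.active.get(agent);
    if (content === undefined) throw new Error(`no prompt registered for ${agent}`);
    return content;
  }
}

const promptStore = new PromptStore();
promptStore.register("primary", "You are the primary agent.");
const instructions = (): string => promptStore.get("primary");

promptStore.stage("primary", "You are the primary agent. Be concise.");
const duringGeneration = instructions(); // still the old content
const reloadedAgents = promptStore.commitAtMessageBoundary();
const atNextMessage = instructions(); // the new content
console.log(duringGeneration, reloadedAgents, atNextMessage);
```

Because the agent reads through the `instructions` function on every call, no agent rebuild is needed when the file changes.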
Prompt visibility
The operator can view all active prompts via the UI or API:
- `GET /api/v1/sessions/:id/prompts` returns the current system prompts for the primary and all active subagents.
- At a checkpoint, prompts are editable (per `session-manager.md`).
The AuditProcessor logs the full system message array on every generation. The operator can always see exactly what the LLM received.
Cancellation
abortSignal wiring
When the operator cancels a run (per session-manager.md):
- The session manager calls `abortController.abort()`.
- The `abortSignal` propagates to Mastra's in-flight `generate`/`stream` call.
- Mastra aborts the HTTP connection to the provider.
- If sub-agents are active, the session manager also sends `SIGTERM` to their caged processes.
- The harness captures partial output from the aborted generation.
The abortSignal is passed to Mastra's Agent.generate() / Agent.stream():
const result = await primaryAgent.generate(messages, {
  abortSignal: runAbortController.signal,
});
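The "capture partial output" step can be pictured with a synchronous stand-in for the provider stream. This is only a sketch: `consumeStream` and `PartialResult` are hypothetical names, and a real run consumes Mastra's async stream rather than an array.

```typescript
interface PartialResult {
  text: string;
  finishReason: "stop" | "aborted";
}

// Consumes chunks until the signal is aborted, keeping whatever text arrived so far.
// `onChunk` stands in for anything that can run between chunks (e.g. an operator cancel).
function consumeStream(
  chunks: readonly string[],
  signal: AbortSignal,
  onChunk: (count: number) => void,
): PartialResult {
  let text = "";
  let count = 0;
  for (const chunk of chunks) {
    if (signal.aborted) return { text, finishReason: "aborted" };
    text += chunk;
    count += 1;
    onChunk(count);
  }
  return { text, finishReason: "stop" };
}

const runAbortController = new AbortController();
const partial = consumeStream(["Hello", ", ", "world"], runAbortController.signal, (count) => {
  if (count === 2) runAbortController.abort(); // operator cancels after two chunks
});
console.log(partial); // { text: "Hello, ", finishReason: "aborted" }
```

Checking the signal between chunks means the text received before the cancel is preserved and can be persisted with an aborted finish reason.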
Plugin hook firing
Per ADR-0023, the harness is the firing point for project-plugin lifecycle hooks. The plugin host (per plugin-host.md § Lifecycle hooks) defines the wire protocol; this section is normative for where in the run lifecycle each hook fires.
Hook firing summary
| Hook | Where in `runPrimary` | Scope | Cardinality |
|---|---|---|---|
| `on_session_start` | Inside `reconstructMessages()`, after history reconstruction and before the system prompt is finalized | Primary only (sessions are primary-owned) | Once per session (first run only; tracked by the session's `recalled_at` column) |
| `on_session_idle` | Outside `runPrimary`; fired by the session manager's idle detector after a debounce window with no run activity | Primary only | Once per idle event |
| `pre_compact` | Inside the compaction pipeline, before strategy (see Compaction) | Per-agent | Once per compaction event for the affected agent |
| `post_compact` | Inside the compaction pipeline, after strategy (see Compaction) | Per-agent | Once per compaction event for the affected agent |
on_session_start firing
The harness fires on_session_start exactly once per session, on the first runPrimary invocation for that session. The session manager tracks this via a recalled_at column on the session record; the harness reads this column in reconstructMessages() and skips firing if non-null.
Firing order inside reconstructMessages():
1. Read MessageRecord rows for this session (non-superseded, ordered)
2. Build the base system prompt from the agent's `instructions` function
3. IF session.recalled_at IS NULL AND agent_path == "primary":
     For each plugin declared on the primary with `on_session_start` in hooks:
       Call plugin.kaged.hook.on_session_start({ _context })
       If result.inject is non-empty:
         Wrap in <plugin:NAME>...</plugin:NAME>
         Append to the system prompt array (after the base instructions)
     Set session.recalled_at = NOW()
4. Continue with normal message reconstruction
Multiple plugins on the primary fire in manifest-declaration order. Each plugin's inject content is appended in order. The audit log (agent.pre_generation event from the AuditProcessor) sees the assembled system prompt — including all <plugin:NAME> blocks — exactly as the LLM will receive it.
Failure handling. A hook that throws, times out, or returns a malformed result is logged (plugin.hook.failed / plugin.hook.timeout per plugin-host.md) and treated as if the plugin returned null. The session continues with no inject from that plugin.
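The firing order and the failure rule can be sketched together. The version below is synchronous for brevity (real hooks are JSON-RPC calls), and `StartHookPlugin`, `SessionRow`, and `buildSystemPrompt` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shapes for illustration; real hooks are async JSON-RPC calls.
interface SessionStartResult { inject?: string }
interface StartHookPlugin {
  name: string;
  hooks: string[];
  onSessionStart: () => SessionStartResult | null; // may throw
}
interface SessionRow { recalledAt: number | null }

function buildSystemPrompt(
  session: SessionRow,
  agentPath: string,
  baseInstructions: string,
  plugins: StartHookPlugin[], // already in manifest-declaration order
  now: number,
): string[] {
  const systemPrompt = [baseInstructions];
  if (session.recalledAt !== null || agentPath !== "primary") return systemPrompt;

  for (const plugin of plugins) {
    if (plugin.hooks.indexOf("on_session_start") === -1) continue;
    let result: SessionStartResult | null;
    try {
      result = plugin.onSessionStart();
    } catch {
      result = null; // a failed hook is treated as a null result
    }
    if (result && result.inject) {
      systemPrompt.push(`<plugin:${plugin.name}>${result.inject}</plugin:${plugin.name}>`);
    }
  }
  session.recalledAt = now; // later runs in this session skip the hook
  return systemPrompt;
}

const demoSession: SessionRow = { recalledAt: null };
const demoPlugins: StartHookPlugin[] = [
  { name: "memory", hooks: ["on_session_start"], onSessionStart: () => ({ inject: "User prefers tabs." }) },
  { name: "flaky", hooks: ["on_session_start"], onSessionStart: () => { throw new Error("boom"); } },
];
const firstRun = buildSystemPrompt(demoSession, "primary", "You are the primary agent.", demoPlugins, 1);
const secondRun = buildSystemPrompt(demoSession, "primary", "You are the primary agent.", demoPlugins, 2);
console.log(firstRun, secondRun);
```

The throwing plugin contributes nothing and does not block the session, and the second call returns only the base instructions because `recalledAt` is already set.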
on_session_idle firing
Idle detection is not part of runPrimary — it is a session-manager concern that fires after the last run completes and the session has been idle for the debounce window.
Firing flow:
1. Run completes; session transitions to idle state
2. Session manager starts an idle timer (debounce window)
3. If a new run starts before the timer fires: cancel the timer
4. If the timer fires:
     For each plugin declared on the primary with `on_session_idle` in hooks:
       Fetch the message transcript (all non-superseded messages since session start
         or since last idle fire; plugins decide via their config)
       Call plugin.kaged.hook.on_session_idle({ _context, transcript })
       Response is not expected; treat as fire-and-forget after JSON-RPC ack
     Mark session.last_idle_fire = NOW()
The debounce window is configurable per plugin in the manifest (plugins may want different idle thresholds). If a plugin specifies none, the session manager's default of 30 seconds applies.
Restart semantics. Per ADR-0023, pending idle timers are not restored across daemon restart. After a restart, the session-manager re-arms the timer on the next genuine run completion.
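The debounce behavior can be modeled without real timers by passing the clock in explicitly. `IdleTracker` is a hypothetical name, and the polling style is an assumption made to keep the sketch deterministic; a daemon would more likely use a timer.

```typescript
// In-memory only: nothing here survives a restart, which matches the
// "pending idle timers are not restored" rule.
class IdleTracker {
  private deadline: number | null = null;

  constructor(private readonly debounceMs: number = 30_000) {}

  // A run completed: arm (or re-arm) the idle window.
  onRunCompleted(now: number): void {
    this.deadline = now + this.debounceMs;
  }

  // A new run started before the window elapsed: cancel it.
  onRunStarted(): void {
    this.deadline = null;
  }

  // Returns true exactly once per armed window, when the window has elapsed.
  poll(now: number): boolean {
    if (this.deadline === null || now < this.deadline) return false;
    this.deadline = null;
    return true;
  }
}

const tracker = new IdleTracker();
tracker.onRunCompleted(0);
const tooEarly = tracker.poll(10_000);    // window not elapsed yet
tracker.onRunStarted();                   // new run cancels the window
const afterCancel = tracker.poll(40_000); // nothing armed
tracker.onRunCompleted(50_000);
const fired = tracker.poll(80_000);       // 30 s elapsed since completion
const firedTwice = tracker.poll(90_000);  // already consumed
console.log(tooEarly, afterCancel, fired, firedTwice);
```

Per-plugin thresholds could be modeled as one tracker per plugin, each constructed with that plugin's configured window.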
pre_compact and post_compact firing
`pre_compact` and `post_compact` fire inside the compaction pipeline; see Compaction below for the full firing-point detail.
Subagent semantics
Plugins declared on subagents:
- Receive `pre_compact` and `post_compact` when the subagent's context window is compacted (each agent has its own window, per ADR-0024).
- Never receive `on_session_start` or `on_session_idle`. These are primary-only because subagents have no sessions. The plugin host emits a `plugin.hook.illegal` warning at load time when a subagent's plugin declaration includes these hooks; the daemon still starts but the hooks never fire.
Context populated
For every hook fire, the harness populates the canonical PluginCallContext (per plugin-host.md):
- `operator_id`: from the request that started the run (or the session creator for `on_session_idle`).
- `project_id`: the normalized project root of the current project load.
- `agent_path`: the canonical path of the agent the hook is firing for. For session-lifecycle hooks this is always `"primary"`; for `pre_compact` and `post_compact` it is the path of the agent whose window is being compacted.
- `session_id`: the session ID.
- `request_id`: a freshly generated request ID, distinct from any HTTP request ID. Hook firings have their own trace IDs.
Compaction
Per ADR-0024, context compaction is kaged-owned at the harness boundary. This section is the normative spec for how compaction runs inside runPrimary and reconstructMessages().
Where compaction lives
The harness is the single owner of context-window management. Mastra's internal trimming is neutralized — the harness never lets Mastra's message-list exceed the configured threshold so Mastra has nothing to trim. Specifically:
- The harness reconstructs the message list from `MessageRecord` rows on every run (the existing `reconstructMessages()` path).
- Before the reconstructed list is handed to `agent.stream(...)`, the harness runs the compaction pipeline.
- The pipeline either passes the list through unchanged (no compaction needed) or replaces it with a compacted version.
- Mastra only ever sees the post-compaction list.
This means Mastra never trims, because the list it receives is always within bounds. If a context-length error still occurs (estimator wrong, model metadata stale, etc.), the harness compacts reactively and retries; see § Reactive fallback retry.
The compaction pipeline
┌───────────────────────────────────────────────────────────┐
│ reconstructMessages() — build candidate list │
│ (read MessageRecord rows; apply on_session_start; │
│ drop tool calls per include_tool_results_in_context) │
└──────────────────────┬────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Tool output pruning pre-pass (per § Tool output pruning) │
│ replace stale large tool results with short notices │
│ lightweight — no superseding, no CompactionRecord │
└──────────────────────┬────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────┐
│ Token estimation (per @kaged/llm) │
│ compute total estimated tokens for the candidate list │
│ + system prompt + model's reserved-output budget │
│ (uses FALLBACK_CONTEXT_WINDOW when model meta absent) │
└──────────────────────┬────────────────────────────────────┘
│
estimate < upper_threshold?
│ │
yes │ │ no
│ ▼
│ ┌──────────────────────────────┐
│ │ Compaction triggered │
│ │ audit: compaction.triggered │
│ └────────────┬─────────────────┘
│ │
│ ▼
│ ┌──────────────────────────────┐
│ │ 1. Fire pre_compact (observer) │
│ │ for each subscribed plugin │
│ │ in manifest-declaration │
│ │ order │
│ │ Apply retain[] / inject │
│ └────────────┬─────────────────┘
│ │
│ ▼
│ ┌──────────────────────────────┐
│ │ 2. Apply strategy: │
│ │ drop / summarize / delegate │
│ │ / checkpoint │
│ │ (delegate = compactor plug) │
│ └────────────┬─────────────────┘
│ │
│ ▼
│ ┌──────────────────────────────┐
│ │ 3. No-op guard: │
│ │ if 0 messages superseded │
│ │ → return unchanged │
│ │ audit: compaction.noop │
│ │ (skipped for dry-run) │
│ └────────────┬─────────────────┘
│ │
│ ▼
│ ┌──────────────────────────────┐
│ │ 4. Mark superseded; persist │
│ │ CompactionRecord; audit │
│ │ compaction.completed │
│ └────────────┬─────────────────┘
│ │
└───────────────┘
│
▼
┌───────────────────────────────────┐
│ Hand compacted list to │
│ agent.stream(...) │
└───────────────────────────────────┘
Trigger semantics
Pre-call (proactive). Before each LLM call, the harness:
- Reconstructs the candidate message list.
- Runs tool output pruning (see § Tool output pruning below) to reduce stale tool-result bloat.
- Estimates token usage via `@kaged/llm`'s estimator (see `llm.md` § Token estimation). When model metadata is unavailable, the estimator uses a fallback context window of 128,000 tokens to ensure `fraction` is always positive and proactive compaction can trigger.
- Adds the system prompt size and the model's reserved-output budget (from the model alias's metadata).
- Compares against the agent's configured upper threshold (default 0.85 of the model's context window).
- If over: fire compaction. Compact until the estimate is below the lower threshold (default 0.60; hysteresis prevents oscillation).
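The proactive check reduces to a small calculation. The sketch below assumes hypothetical names (`TriggerInput`, `checkTrigger`) and takes the token counts as already estimated; only the 0.85 / 0.60 defaults and the 128,000-token fallback come from the text above.

```typescript
interface CompactionThresholds { upper: number; lower: number }

const DEFAULT_THRESHOLDS: CompactionThresholds = { upper: 0.85, lower: 0.6 };
const FALLBACK_CONTEXT_WINDOW = 128_000;

interface TriggerInput {
  messageTokens: number;
  systemPromptTokens: number;
  reservedOutputTokens: number;
  contextWindow: number | null; // null when model metadata is unavailable
}

function checkTrigger(input: TriggerInput, thresholds: CompactionThresholds = DEFAULT_THRESHOLDS) {
  const contextWindow = input.contextWindow ?? FALLBACK_CONTEXT_WINDOW;
  const total = input.messageTokens + input.systemPromptTokens + input.reservedOutputTokens;
  const fraction = total / contextWindow;
  return {
    fraction,
    // Compaction fires once the estimate reaches the upper threshold...
    triggered: fraction >= thresholds.upper,
    // ...and then runs until the estimate is below the lower one (hysteresis).
    targetTokens: Math.floor(contextWindow * thresholds.lower),
  };
}

const unknownModel = checkTrigger({
  messageTokens: 100_000,
  systemPromptTokens: 5_000,
  reservedOutputTokens: 8_000,
  contextWindow: null, // falls back to 128,000
});
const largeModel = checkTrigger({
  messageTokens: 100_000,
  systemPromptTokens: 5_000,
  reservedOutputTokens: 8_000,
  contextWindow: 200_000,
});
console.log(unknownModel.triggered, largeModel.triggered); // true false
```

The gap between the two thresholds is what prevents oscillation: a list compacted to just under 0.60 has to grow by a quarter of the window before the next compaction fires.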
Reactive (post-failure). If the estimate is wrong and the provider returns a context-length error (specific error codes per @kaged/llm), the harness:
- Catches the error (detected via `isContextLengthError`, which uses regex patterns for OpenAI, Anthropic, Google, and Azure error messages plus HTTP 413).
- Checks the `reactiveRetryAttempted` flag. If already set, re-throws immediately (single retry cap).
- Sets `reactiveRetryAttempted = true`.
- Treats the failure as a forced compaction trigger (`trigger: "provider_overflow_retry"`).
- Compacts the message list.
- No-op guard: if compaction superseded zero messages (`compacted: false`), the retry is skipped and the original error propagates. Retrying with unchanged context would hit the same error.
- Retries the call once with the compacted message list. If the retry also fails on context length, the run is marked failed with a `context_overflow` error.
Context-overflow error messages from failed provider calls are not persisted to the message history — they would consume context on subsequent runs and provide no value to the model.
The reactiveRetryAttempted flag also gates a secondary detection path: when runPrimary returns finishReason: "error" with a context-length error message (rather than throwing), the same compaction-and-retry flow applies, subject to the same single-attempt cap.
The audit log captures both pre-call and reactive triggers with distinct reason fields (threshold_crossed vs provider_overflow_retry).
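The reactive path amounts to a retry loop with a one-shot flag and a no-op guard. The sketch below is synchronous and uses hypothetical helpers: `dispatchWithReactiveRetry` is an illustrative name, and the regex in this `isContextLengthError` is a placeholder, not the provider-specific patterns in `@kaged/llm`.

```typescript
// Placeholder detector; the real one matches provider-specific messages and HTTP 413.
function isContextLengthError(err: unknown): boolean {
  return err instanceof Error && /context length/i.test(err.message);
}

interface DispatchOutcome { output: string; reactiveRetryAttempted: boolean }

function dispatchWithReactiveRetry(
  messages: string[],
  callModel: (msgs: string[]) => string, // throws on overflow
  compact: (msgs: string[]) => { messages: string[]; compacted: boolean },
): DispatchOutcome {
  let reactiveRetryAttempted = false;
  let current = messages;
  for (;;) {
    try {
      return { output: callModel(current), reactiveRetryAttempted };
    } catch (err) {
      if (!isContextLengthError(err)) throw err;
      if (reactiveRetryAttempted) throw err; // single retry cap
      reactiveRetryAttempted = true;
      const result = compact(current);       // trigger: "provider_overflow_retry"
      if (!result.compacted) throw err;      // no-op guard: same context would fail again
      current = result.messages;
    }
  }
}

const overflowsAboveTwo = (msgs: string[]): string => {
  if (msgs.length > 2) throw new Error("maximum context length exceeded");
  return `ok:${msgs.length}`;
};
const keepLastTwo = (msgs: string[]) => ({ messages: msgs.slice(-2), compacted: msgs.length > 2 });

const outcome = dispatchWithReactiveRetry(["a", "b", "c", "d"], overflowsAboveTwo, keepLastTwo);
console.log(outcome); // { output: "ok:2", reactiveRetryAttempted: true }
```

In this sketch a second overflow or a no-op compaction simply re-throws; marking the run failed with `context_overflow` is left to the caller.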
Strategy execution
The four strategies (configured in AgentSpec.compaction.strategy):
drop
The default / fallback. When configured as strategy: "drop", the harness first attempts summarization if both conditions are met: (1) a summarizeFn is available, and (2) the agent's compaction config includes a summarize block. If summarization succeeds, the result is used — this preserves more context than a pure drop. If summarization throws, the pipeline falls back to pure drop silently.
When summarization is not available (no summarizeFn or no summarize config), or when the summarization attempt fails, the harness drops oldest non-superseded, non-always-keep messages until the estimate is below the lower threshold. Drops happen in tool-pair-atomic units (tool call + tool result drop together, never split). The dropped messages are marked superseded = true.
No plugin compactor is called. Observer hooks still fire before the drop.
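A pure drop pass can be sketched as a walk from oldest to newest that supersedes whole units. `DropMsg`, `toolPairId`, and `dropOldest` are hypothetical names; per-message token counts are assumed to be precomputed.

```typescript
interface DropMsg {
  id: string;
  tokens: number;
  alwaysKeep: boolean;
  superseded: boolean;
  toolPairId?: string; // shared by a tool call and its result
}

// Marks the oldest droppable units as superseded until the estimate is below targetTokens.
// `messages` is ordered oldest first. Returns the IDs superseded by this pass.
function dropOldest(messages: DropMsg[], targetTokens: number): string[] {
  let estimate = 0;
  for (const m of messages) if (!m.superseded) estimate += m.tokens;

  const supersededIds: string[] = [];
  for (const m of messages) {
    if (estimate < targetTokens) break;
    if (m.superseded || m.alwaysKeep) continue;
    // A tool call and its result form one unit and are dropped together.
    const unit = m.toolPairId === undefined
      ? [m]
      : messages.filter((x) => x.toolPairId === m.toolPairId && !x.superseded);
    if (unit.some((x) => x.alwaysKeep)) continue;
    for (const x of unit) {
      x.superseded = true;
      estimate -= x.tokens;
      supersededIds.push(x.id);
    }
  }
  return supersededIds;
}

const transcript: DropMsg[] = [
  { id: "task", tokens: 100, alwaysKeep: true, superseded: false },
  { id: "m2", tokens: 300, alwaysKeep: false, superseded: false },
  { id: "call1", tokens: 200, alwaysKeep: false, superseded: false, toolPairId: "p1" },
  { id: "result1", tokens: 5_000, alwaysKeep: false, superseded: false, toolPairId: "p1" },
  { id: "m3", tokens: 400, alwaysKeep: false, superseded: false },
  { id: "m4", tokens: 500, alwaysKeep: false, superseded: false },
];
const dropped = dropOldest(transcript, 1_500);
console.log(dropped); // ["m2", "call1", "result1"]
```

The initial task survives because it is always-keep, and the call/result pair leaves together even though dropping the result alone would already have met the target.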
summarize
The harness invokes a summarizer model (resolved via the operator's model alias, configured in agent.compaction.summarize.model) with the operator-authored summarizer prompt (referenced via project:/ URI in agent.compaction.summarize.prompt). The summarizer receives the window of messages to compact and returns a summary message.
The returned summary becomes a single new MessageRecord (role: "system", metadata: { "kind": "compaction_summary", ... }). The compacted messages are marked superseded = true. The reconstructed list becomes: [always-keep, summary, recent messages].
Summarizer cost is tracked separately in the session's stats — see § Cost surfacing below.
delegate
The harness calls the designated compactor plugin (named in agent.compaction.delegate.plugin) via pre_compact with role: compactor. The plugin returns a CompactorResult (per plugin-host.md § Plugin roles). The harness replaces the message list with the result's messages, marks result.superseded as superseded, and creates a CompactionRecord with result.summary.
If the plugin fails (throws / times out / returns invalid CompactorResult), the harness falls back to drop and logs the full failure chain.
checkpoint
The harness creates a CheckpointRecord (per session-manager.md § Checkpoint protocol) with reason: compaction_pending, marks the session paused, and ends the run with finishReason: "awaiting_compaction". The operator inspects the proposed compaction (the configured fallback sub-strategy — drop by default) and approves, edits, or rejects via the Compactor UI (per ui/compactor.md). On resume, the operator-approved compaction is applied.
Always-keep set
The following messages are never compacted regardless of strategy:
- The system prompt(s) — always passed through.
- The first operator message in the session (the initial task).
- Messages whose `metadata.always_keep = true`. Plugins can set this metadata via `agent-tooling.md` message annotations.
- Operator-configured always-keep predicates (per `agent.compaction.always_keep`).
The compaction pipeline filters always-keep messages out of the compaction candidate set before any strategy runs. If a compactor plugin returns a messages list that omits an always-keep message, the harness emits compactor_dropped_always_keep and falls back to drop.
Atomicity of tool-call/tool-result pairs
Tool calls and their results are coupled. The compaction pipeline treats them as atomic units:
- A `drop` strategy drops the pair together (never the call without the result, or vice versa).
- A `summarize` strategy receives the full pair as one logical unit; the summarizer sees both call and result.
- A `delegate` strategy: the `messages_being_compacted` array sent to the compactor includes the full pair; the compactor must respect the coupling (if it returns a list with a call but no result, the harness emits `compactor_split_tool_pair` and falls back to `drop`).
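The two checks the harness applies to a compactor's returned list (always-keep preserved, tool pairs intact) can be sketched as one validation function. `PairMsg` and `validateCompactorMessages` are hypothetical names, and treating a result without its call as a split is an assumption extending the "or vice versa" rule above.

```typescript
interface PairMsg {
  id: string;
  kind: "text" | "tool_call" | "tool_result";
  toolCallId?: string; // links a tool_result to its tool_call
}

type CompactorViolation = "compactor_dropped_always_keep" | "compactor_split_tool_pair";

// Returns the audit event name for the first violation found, or null if the list is acceptable.
function validateCompactorMessages(returned: PairMsg[], alwaysKeepIds: string[]): CompactorViolation | null {
  const returnedIds = returned.map((m) => m.id);
  if (alwaysKeepIds.some((id) => returnedIds.indexOf(id) === -1)) {
    return "compactor_dropped_always_keep";
  }
  const callIds = returned.filter((m) => m.kind === "tool_call").map((m) => m.toolCallId);
  const resultIds = returned.filter((m) => m.kind === "tool_result").map((m) => m.toolCallId);
  const split =
    callIds.some((id) => resultIds.indexOf(id) === -1) ||
    resultIds.some((id) => callIds.indexOf(id) === -1);
  return split ? "compactor_split_tool_pair" : null;
}

const firstTask: PairMsg = { id: "task", kind: "text" };
const toolCall: PairMsg = { id: "c1", kind: "tool_call", toolCallId: "t1" };
const toolResult: PairMsg = { id: "r1", kind: "tool_result", toolCallId: "t1" };

const intact = validateCompactorMessages([firstTask, toolCall, toolResult], ["task"]);
const splitPair = validateCompactorMessages([firstTask, toolCall], ["task"]);
const missingTask = validateCompactorMessages([toolCall, toolResult], ["task"]);
console.log(intact, splitPair, missingTask);
```

Either violation would send the pipeline down the `drop` fallback, so a misbehaving compactor cannot leave the provider with an orphaned tool call.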
Tracking and persistence
A CompactionRecord is written for every compaction event:
interface CompactionRecord {
  id: string;
  sessionId: string;
  runId: string; // the run that triggered (or the one immediately after for idle-driven)
  agentPath: string; // which agent's window was compacted
  createdAt: number;
  trigger: "threshold_crossed" | "provider_overflow_retry" | "operator_manual" | "scheduled";
  strategy: "drop" | "summarize" | "delegate" | "checkpoint";
  thresholdEstimate: number; // estimated token usage at trigger time
  afterEstimate: number; // estimated token usage after compaction
  windowUpper: number; // configured upper threshold
  windowLower: number; // configured lower threshold
  supersededMessageIds: string[]; // messages marked superseded by this event
  summaryMessageId: string | null; // new MessageRecord ID, if summarize/delegate
  summary: string | null; // human-readable summary (audit log + UI)
  pluginsFired: { name: string, role: "observer" | "compactor", duration_ms: number, result_kind: "inject" | "retain" | "compactor_result" | "null" | "error" }[];
  pluginCost: { provider: string, model: string, input_tokens: number, output_tokens: number, cost_usd: number } | null;
  fallbackOccurred: boolean; // true if a delegate/summarize fell back to drop
  fallbackReason: string | null; // human-readable reason if fallback
  operatorFlag: "good" | "bad" | "neutral" | null; // operator feedback (per ADR-0024)
  operatorNotes: string | null;
}
The schema migration is in @kaged/storage; see also session-manager.md § CompactionRecord.
Dry-run mode
The harness exposes a dry-run path: given a session and an agent, compute what compaction would do against the current message list without committing. The dry-run:
- Runs the trigger check (always considered "triggered" in dry-run).
- Fires observer hooks (with a `dry_run: true` flag in `_context` so plugins can skip side effects).
- Executes the strategy.
- Returns the proposed `CompactionRecord` (without persisting) and the proposed `messages` list.
The Compactor UI uses this to preview compaction before the operator commits. The dry-run path is the strategy-preview implementation; manual-compact uses the same code path but commits.
The dry-run is exposed via POST /api/v1/sessions/:id/compactions/dry-run (per http-api.md).
Manual compaction
Operators can trigger compaction at any time via the Compactor UI or directly via POST /api/v1/sessions/:id/compact. The endpoint accepts an optional strategy override and an optional agent_path (defaults to primary). The harness runs the same pipeline as automatic compaction; the trigger field on the resulting CompactionRecord is operator_manual.
Per-agent semantics
Each agent in the recursive tree has its own context window and its own compaction config. The harness tracks per-agent estimates and fires compaction independently for each agent. A subagent that has not declared the memory plugin (or any compaction-relevant plugin) still has its window managed — the drop strategy is the default fallback.
Cost surfacing
When the summarize strategy invokes a model, its cost is tracked separately in the session's stats:
- A new field `stats.compactor_cost: CostBreakdown | null` on `RunPrimaryResult` (parallel to `stats.cost` for the primary call).
- `MessageRecord.metadata.compactor_cost` for the summary message (if `summarize` or `delegate` produced one).
- The session's aggregate cost view (per `http-api.md` session-detail response) sums both: `primary: $X, compactor: $Y, total: $Z`.
The Compactor UI surfaces compactor cost prominently in the per-session stats panel.
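The aggregate view is a straightforward sum over per-run stats. In this sketch `RunStatsSketch` and `aggregateSessionCost` are hypothetical, and the cost breakdown is reduced to its `total` field.

```typescript
// Simplified stand-ins: only the `total` of each breakdown is modeled.
interface CostTotal { total: number }
interface RunStatsSketch {
  cost: CostTotal | null;           // primary call; null when model metadata is unavailable
  compactor_cost: CostTotal | null; // null when no summarizer/compactor model ran
}

function aggregateSessionCost(runs: RunStatsSketch[]): { primary: number; compactor: number; total: number } {
  let primary = 0;
  let compactor = 0;
  for (const run of runs) {
    primary += run.cost ? run.cost.total : 0;
    compactor += run.compactor_cost ? run.compactor_cost.total : 0;
  }
  return { primary, compactor, total: primary + compactor };
}

const sessionCost = aggregateSessionCost([
  { cost: { total: 0.5 }, compactor_cost: null },
  { cost: { total: 0.25 }, compactor_cost: { total: 0.125 } },
]);
console.log(sessionCost); // { primary: 0.75, compactor: 0.125, total: 0.875 }
```

Keeping the two sums separate until the last step is what lets the UI show compactor spend on its own line.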
Audit events
The harness emits these audit events for compaction (in addition to the plugin host's hook-firing events):
| Event | When | Data |
|---|---|---|
| `compaction.triggered` | A compaction event begins | session, run, agent, trigger, threshold_estimate, strategy |
| `compaction.completed` | Compaction succeeded | session, run, agent, after_estimate, superseded_count, summary_message_id, plugins_fired, duration_ms |
| `compaction.noop` | Strategy executed but superseded zero messages | session, run, agent, reason, strategy |
| `compaction.failed` | A strategy failed and fallback ran | session, run, agent, attempted_strategy, fallback_strategy, reason |
| `compaction.flagged` | Operator attached a flag to a compaction event | session, compaction_id, flag, notes_length |
Reactive fallback retry
The reactive path enforces a single retry cap via a reactiveRetryAttempted flag scoped to the dispatch cycle. Once set, no further reactive compaction is attempted regardless of subsequent errors.
The no-op guard applies to the reactive path: if compaction superseded zero messages, the retry is skipped and the original context-length error propagates. This prevents infinite loops where compaction "succeeds" but context is unchanged.
Context-overflow error messages from failed provider calls are not persisted to the message history — they would consume context on subsequent runs.
If the single retry also fails on context length, the run is marked failed and the session transitions to a degraded state (context_overflow). The operator sees this in the UI as a clear error with the option to manually compact and retry.
Tool output pruning
Before the compaction pipeline runs, the harness applies a lightweight pruning pre-pass to reduce context pressure from stale tool-result messages. Pruning is not compaction — it does not mark messages superseded, does not produce a CompactionRecord, and does not fire plugin hooks. It operates on the CompactableMessage[] list in place.
The pruning algorithm:
- Walk tool-result messages from newest to oldest.
- Accumulate token counts. The most recent `protectTokens` (default 40,000) worth of tool output is left intact.
- Beyond the protection window, replace large tool-result content with a short `[Pruned — N tokens]` notice.
- Skip tool results whose `toolName` is in the `protectedTools` set (default: `read`, `skill`).
- Skip results where the pruned notice would be as large as or larger than the original content (no net savings).
- Only apply pruning if the total savings across all candidates exceeds `minimumSavings` (default 20,000 tokens). If not, return the message list unchanged.
Pruning runs as a pre-pass in dispatchPrimary (daemon side), before runCompactionPipeline(). The pruned message list feeds into the compaction pipeline as its input. If pruning reduces context below the upper threshold, compaction does not fire — pruning alone was sufficient.
interface PruneConfig {
  readonly protectTokens: number; // default 40_000
  readonly minimumSavings: number; // default 20_000
  readonly protectedTools: readonly string[]; // default ["read", "skill"]
}

interface PruneResult {
  readonly messages: readonly CompactableMessage[];
  readonly prunedCount: number;
  readonly tokensSaved: number;
}

function pruneToolOutputs(
  messages: readonly CompactableMessage[],
  systemPrompt: string,
  modelMeta: ModelMeta | null,
  config?: PruneConfig,
): PruneResult;
Implementation: @kaged/harness pruning.ts.
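A simplified sketch of the pruning algorithm follows. It is not the `pruning.ts` implementation: `PruneMsg`, `PruneOptions`, and `pruneSketch` are hypothetical, a crude four-characters-per-token estimate stands in for the `@kaged/llm` estimator, and a result that straddles the protection boundary is assumed to stay intact.

```typescript
interface PruneMsg { id: string; role: "user" | "assistant" | "tool"; toolName?: string; content: string }
interface PruneOptions { protectTokens: number; minimumSavings: number; protectedTools: readonly string[] }

// Assumption: 4 characters per token, standing in for the real estimator.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function pruneSketch(messages: readonly PruneMsg[], opts: PruneOptions) {
  const replacements = new Map<number, string>(); // message index -> notice
  let seenToolTokens = 0;
  let tokensSaved = 0;

  for (let i = messages.length - 1; i >= 0; i--) { // newest to oldest
    const m = messages[i];
    if (m === undefined || m.role !== "tool") continue;
    const tokens = estimateTokens(m.content);
    const insideProtection = seenToolTokens < opts.protectTokens;
    seenToolTokens += tokens;
    if (insideProtection) continue;
    if (m.toolName !== undefined && opts.protectedTools.indexOf(m.toolName) !== -1) continue;
    const notice = `[Pruned \u2014 ${tokens} tokens]`;
    const saved = tokens - estimateTokens(notice);
    if (saved <= 0) continue; // no net savings
    replacements.set(i, notice);
    tokensSaved += saved;
  }

  // Apply nothing unless the total savings clear the minimum.
  if (tokensSaved <= opts.minimumSavings) {
    return { messages, prunedCount: 0, tokensSaved: 0 };
  }
  const pruned = messages.map((m, i) => {
    const notice = replacements.get(i);
    return notice === undefined ? m : { ...m, content: notice };
  });
  return { messages: pruned, prunedCount: replacements.size, tokensSaved };
}

// Small thresholds so the example stays readable; production defaults are far larger.
const pruneDemo = pruneSketch(
  [
    { id: "old-bash", role: "tool", toolName: "bash", content: "x".repeat(400) },
    { id: "old-read", role: "tool", toolName: "read", content: "r".repeat(400) },
    { id: "new-bash", role: "tool", toolName: "bash", content: "y".repeat(80) },
  ],
  { protectTokens: 10, minimumSavings: 20, protectedTools: ["read", "skill"] },
);
console.log(pruneDemo.prunedCount, pruneDemo.tokensSaved);
```

Only the old `bash` result is replaced: the newest result sits inside the protection window and the `read` result belongs to a protected tool. The input list is never mutated, so a caller can compare before and after.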
Disabling Mastra's internal trimming
Mastra v1.x has its own context-management logic. The harness disables it via the Mastra Agent constructor config (passing a messageList that ignores Mastra's MessageListBudgetOptions, or by using the bare LanguageModelV2 interface when feasible — see mastra-adapter.ts). The exact mechanism is implementation detail; the contract is that Mastra must not trim the message list the harness hands to agent.stream(...).
If Mastra's trimming cannot be cleanly disabled in some future version, the harness escape hatch is to bypass Agent.stream entirely and call @kaged/llm's streamModel directly for that call (per the existing per-call escape hatch in § Escape hatch).
Streaming relay & daemon integration
The harness exposes one entry point to the daemon: runPrimary. The daemon calls it from handlePostMessage (see http-api.md) as a fire-and-forget operation after the 201 response. The function builds the Mastra Agent, calls agent.stream(...), relays output over the daemon's WebSocket transport, and persists the final AssistantMessage.
Runtime entry point
interface RunPrimaryInput {
  sessionId: string;
  runId: string;
  compiledProject: CompiledProject;
  providerRoute: ProviderRoute; // resolved from compiled.primary.model + local-config
  messages: Message[]; // reconstructed from MessageRecord rows
  abortSignal: AbortSignal;
  publish: (event: HarnessOutputEvent) => void; // harness-owned; daemon translates to WsFrame
  maxSteps?: number; // from DSL or session override; passed to Mastra
  maxOutputTokens?: number; // from DSL or session override; passed to provider
}

interface RunPrimaryResult {
  assistantMessage: AssistantMessage; // for storage.createMessage
  usage: Usage;
  finishReason: "stop" | "length" | "toolUse" | "error" | "aborted";
}

function runPrimary(input: RunPrimaryInput): Promise<RunPrimaryResult>;
runPrimary is fully async. The session-manager state machine guarantees one in-flight run per session; calling runPrimary twice for the same runId is undefined behavior at the harness layer (the state machine is the gate).
The publish callback receives a harness-owned event type, not a daemon WsFrame. This preserves the ADR-0011 substrate-portability commitment: if Mastra is replaced (the ADR-0012 escape hatch), the harness's outbound contract to the daemon does not change. The daemon owns the WsFrame envelope (channel, seq, transport).
Harness output event shape
type HarnessOutputEvent =
  | {
      type: "message.start";
      runId: string;
      messageId: string;
      provider: string; // kaged provider name (e.g. "anthropic", "openai")
      model: string; // model ID (e.g. "claude-sonnet-4-20250514")
    }
  | { type: "message.delta"; messageId: string; delta: string; kind: "text" | "thinking" }
  | { type: "message.tool_call"; messageId: string; id: string; name: string; arguments: unknown }
  | { type: "message.tool_result"; messageId: string; toolCallId: string; content: string; isError: boolean }
  | {
      type: "message.end";
      messageId: string;
      stopReason: "stop" | "length" | "toolUse" | "error" | "aborted";
      usage?: Usage;
      errorMessage?: string;
      stats?: {
        /** Time to first token in milliseconds. */
        ttft: number | null;
        /** Total generation duration in milliseconds (from stream open to final token). */
        duration: number;
        /** Tokens per second (output tokens / duration). */
        tps: number;
        /** Cost breakdown in USD, computed via @kaged/llm calculateCost. Null when model metadata unavailable. */
        cost: { input: number; output: number; reasoning: number; cacheRead: number; cacheWrite: number; total: number } | null;
      };
    };
The type strings match kaged's WsOutputType 1:1. The daemon wraps each event into a WsFrame on the output channel with monotonically increasing seq.
message.start enrichment. The provider and model fields identify which provider and model are servicing this run. The UI uses these to render a provider:model label on the message bubble header. These values come from the resolved ProviderRoute — they are always available (the harness cannot start a stream without a resolved route).
message.end enrichment. The optional stats object carries post-completion timing and cost data. The harness computes these from its own instrumentation:
- `ttft`: elapsed ms from `fetch` call to first `text_delta` or `thinking_delta` event. `null` if the stream errored before any content token (e.g. immediate 429). Stored on the `AssistantMessage` as `ttft`.
- `duration`: elapsed ms from stream open to final token (not including post-stream persistence). Stored on the `AssistantMessage` as `duration`.
- `tps`: output tokens divided by duration in seconds. A derived convenience; the UI displays it directly.
- `cost`: dollar cost breakdown from `@kaged/llm`'s `calculateCost(usage, modelMeta)`. `null` when model metadata is unavailable (unknown model, self-hosted without config). When present, the UI renders the total; the breakdown is available on hover/expand.
stats is absent (not null) when the run terminated before any meaningful measurement (e.g. immediate abort before stream opened). The UI treats missing stats as "no data available."
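The timing fields can be derived from a handful of timestamps. In the sketch below `TimingMarks` and `computeStreamStats` are hypothetical names, and returning a `tps` of 0 for a zero-length stream is an assumption the spec does not state.

```typescript
interface TimingMarks {
  fetchStartedAt: number;        // ms timestamp when the provider request was issued
  streamOpenedAt: number;        // ms timestamp when the stream opened
  firstContentAt: number | null; // first text or thinking delta; null if none arrived
  finalTokenAt: number;          // ms timestamp of the final token
}

function computeStreamStats(marks: TimingMarks, outputTokens: number) {
  const ttft = marks.firstContentAt === null ? null : marks.firstContentAt - marks.fetchStartedAt;
  const duration = marks.finalTokenAt - marks.streamOpenedAt;
  // Assumption: report 0 rather than dividing by zero for an empty stream.
  const tps = duration > 0 ? outputTokens / (duration / 1000) : 0;
  return { ttft, duration, tps };
}

const sampleStats = computeStreamStats(
  { fetchStartedAt: 1_000, streamOpenedAt: 1_100, firstContentAt: 1_350, finalTokenAt: 3_100 },
  1_000,
);
console.log(sampleStats); // { ttft: 350, duration: 2000, tps: 500 }
```

Note that `ttft` and `duration` start from different reference points (request issue versus stream open), so `ttft` can include connection setup that `duration` excludes.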
Stream event mapping
Mastra emits a ReadableStream<ChunkType> (agent.stream(...).fullStream). The harness consumes it, maps each chunk to a HarnessOutputEvent, and calls publish(event). The daemon wraps each event into a WsFrame on the output channel (http-api.md).
| Mastra `ChunkType.type` | `HarnessOutputEvent.type` (same string as `WsOutputType`) | Event payload shape |
|---|---|---|
| (start of stream) | `message.start` | `{ runId, messageId, provider, model }` (kaged-generated; one per stream; provider+model from resolved route) |
| `text-delta` | `message.delta` | `{ messageId, delta, kind: "text" }` |
| `reasoning-delta` | `message.delta` | `{ messageId, delta, kind: "thinking" }` (v0 surfaces as a delta with a `kind` discriminator) |
| `tool-call` | `message.tool_call` | `{ messageId, id, name, arguments }` |
| `tool-result` | `message.tool_result` | `{ messageId, toolCallId, content, isError }` |
| `step-start` | (suppressed) | Not surfaced in v0; logged via observability pipeline only |
| `step-complete` | (suppressed) | Same |
| `source` | (suppressed) | v0 does not surface citations; deferred |
| `file` | (suppressed) | v0 does not surface file outputs; deferred |
| `text-start` / `text-end` | (suppressed) | Boundaries inferred from the delta stream; not transported in v0 |
| `reasoning-start` / `reasoning-end` | (suppressed) | Same |
| `finish` | `message.end` | `{ messageId, stopReason, usage, stats: { ttft, duration, tps, cost } }` |
| `error` | `message.end` | `{ messageId, stopReason: "error", errorMessage }` (terminal; run also transitions to `failed`; `stats` may be absent if error occurred before stream opened) |
step-start / step-complete chunks are emitted by Mastra for every loop iteration including tool-call continuations. v0 collapses these into a single message envelope because the UI renders one message per run for now. Future versions may expose step boundaries; the protocol slot is available (WsEventType already includes run.started / run.ended; new types may be added).
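The mapping can be expressed as a single switch that returns `null` for suppressed chunk types. The chunk payload field names here (`text`, `toolName`, `args`) are assumptions for illustration and may not match Mastra's actual `ChunkType`; only the `type` strings and the output shapes follow the table.

```typescript
// Hypothetical chunk shapes: the `type` strings follow the table, the payload fields are assumed.
type ChunkSketch =
  | { type: "text-delta"; text: string }
  | { type: "reasoning-delta"; text: string }
  | { type: "tool-call"; id: string; toolName: string; args: unknown }
  | { type: "step-start" | "step-complete" | "source" | "file" };

// A subset of HarnessOutputEvent, enough for the mapped cases below.
type MappedEvent =
  | { type: "message.delta"; messageId: string; delta: string; kind: "text" | "thinking" }
  | { type: "message.tool_call"; messageId: string; id: string; name: string; arguments: unknown };

// Returns the event to publish, or null when the chunk is suppressed in v0.
function mapChunk(chunk: ChunkSketch, messageId: string): MappedEvent | null {
  switch (chunk.type) {
    case "text-delta":
      return { type: "message.delta", messageId, delta: chunk.text, kind: "text" };
    case "reasoning-delta":
      return { type: "message.delta", messageId, delta: chunk.text, kind: "thinking" };
    case "tool-call":
      return { type: "message.tool_call", messageId, id: chunk.id, name: chunk.toolName, arguments: chunk.args };
    default:
      return null; // step, source, and file chunks are not surfaced
  }
}

const textEvent = mapChunk({ type: "text-delta", text: "hi" }, "m1");
const thinkingEvent = mapChunk({ type: "reasoning-delta", text: "hmm" }, "m1");
const stepEvent = mapChunk({ type: "step-start" }, "m1");
console.log(textEvent, thinkingEvent, stepEvent);
```

Returning `null` rather than throwing for unknown chunk types keeps the relay tolerant of new chunk kinds in later Mastra versions; the caller publishes only non-null results.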
WebSocket relay topology
The daemon maintains a session → socket registry. The registry is populated when WebSocket connections open and drained when they close.
// In packages/daemon/src/runtime/ws-registry.ts (planned)
const subscribers = new Map<string, Set<ServerWebSocket<WsSessionData>>>();
export function registerSocket(sessionId: string, ws: ServerWebSocket<WsSessionData>): void;
export function unregisterSocket(sessionId: string, ws: ServerWebSocket<WsSessionData>): void;
export function publishHarnessEvent(sessionId: string, event: HarnessOutputEvent): void;
publishHarnessEvent wraps the harness event into a WsFrame (channel output, monotonically increasing seq per subscriber socket), serializes to JSON, and sends to every subscriber. Sessions with no subscribers (operator UI not connected, or connected to a different session) are skipped — the run still completes and persists; only the live relay is dark.
runPrimary's publish callback is a closure that calls publishHarnessEvent(sessionId, event). The harness has no direct reference to ServerWebSocket and no knowledge of the WS frame envelope.
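A minimal in-memory version of the registry illustrates the per-subscriber `seq` and the "dark relay" case. `SocketLike`, the `...Sketch` function names, and the `payload` envelope key are assumptions; the real registry holds `ServerWebSocket` instances and the daemon defines the actual `WsFrame` shape.

```typescript
// Minimal stand-in for ServerWebSocket: only `send` is needed here.
interface SocketLike { send(data: string): void }
interface Subscriber { socket: SocketLike; seq: number }

const wsSubscribers = new Map<string, Subscriber[]>();

function registerSocketSketch(sessionId: string, socket: SocketLike): void {
  const list = wsSubscribers.get(sessionId) ?? [];
  list.push({ socket, seq: 0 });
  wsSubscribers.set(sessionId, list);
}

function unregisterSocketSketch(sessionId: string, socket: SocketLike): void {
  const remaining = (wsSubscribers.get(sessionId) ?? []).filter((s) => s.socket !== socket);
  if (remaining.length === 0) wsSubscribers.delete(sessionId);
  else wsSubscribers.set(sessionId, remaining);
}

// Wraps the event in a frame and sends it to every subscriber. Returns the delivery count.
function publishSketch(sessionId: string, harnessEvent: { type: string }): number {
  const list = wsSubscribers.get(sessionId);
  if (list === undefined) return 0; // no live relay; the run still completes and persists
  for (const sub of list) {
    sub.seq += 1; // monotonically increasing per subscriber socket
    sub.socket.send(JSON.stringify({ channel: "output", seq: sub.seq, payload: harnessEvent }));
  }
  return list.length;
}

const received: string[] = [];
const fakeSocket: SocketLike = { send: (data) => { received.push(data); } };
registerSocketSketch("s1", fakeSocket);
publishSketch("s1", { type: "message.start" });
publishSketch("s1", { type: "message.end" });
const deliveredElsewhere = publishSketch("other-session", { type: "message.start" });
unregisterSocketSketch("s1", fakeSocket);
const deliveredAfterClose = publishSketch("s1", { type: "message.delta" });
console.log(received.length, deliveredElsewhere, deliveredAfterClose);
```

The `publish` closure handed to `runPrimary` would then be `(event) => publishSketch(sessionId, event)`, which keeps socket types out of the harness entirely.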
Persistence: AssistantMessage → MessageRecord
Mastra's AssistantMessage is rich — text content, thinking content, tool calls, usage, stop reason, latency. kaged's MessageRecord has content: string and metadata: JSON. The mapping:
- `MessageRecord.role` = `"primary"`
- `MessageRecord.content` = concatenation of all `TextContent` blocks (the human-readable transcript)
- `MessageRecord.metadata` = JSON encoding of:
  - `provider`, `model`
  - `usage` (input / output / cache / cost)
  - `stopReason`
  - `duration`, `ttft`
  - `contentBlocks`: the full structured `(TextContent | ThinkingContent | ToolCall)[]` array, so a future version can reconstruct the message faithfully without re-running the LLM
  - `errorMessage`, `errorStatus` if the run terminated in `error`
The UI's chat transcript renders content (plain text). The metadata.contentBlocks field is for replay, audit-log inspection, and future UI features (collapsed thinking sections, inline tool-call cards).
When MessageRecord is read back for the next run's context (rebuilding Mastra's messages argument), the harness prefers metadata.contentBlocks when present and falls back to wrapping content in a single TextContent block.
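The write and read paths above can be sketched as follows. The block shapes, the `MessageRecordLike` type, and both function names are assumptions made for illustration; joining text blocks with an empty separator is also an assumption, since the spec only says "concatenation".

```typescript
// Assumed block shapes; the real types live in the harness and @kaged/llm.
type TextContent = { type: "text"; text: string };
type ThinkingContent = { type: "thinking"; thinking: string };
type ToolCall = { type: "toolCall"; id: string; name: string; arguments: unknown };
type ContentBlock = TextContent | ThinkingContent | ToolCall;

// Hypothetical subset of MessageRecord used by this sketch.
interface MessageRecordLike {
  role: string;
  content: string;
  metadata: { contentBlocks?: ContentBlock[]; [key: string]: unknown };
}

// Write path: flatten text blocks into `content`, keep the structured array in metadata.
function toMessageRecord(
  blocks: ContentBlock[],
  extra: Record<string, unknown> = {},
): MessageRecordLike {
  const content = blocks
    .filter((b): b is TextContent => b.type === "text")
    .map((b) => b.text)
    .join("");
  return { role: "primary", content, metadata: { ...extra, contentBlocks: blocks } };
}

// Read path: prefer metadata.contentBlocks, else wrap `content` in one TextContent block.
function blocksFromRecord(record: MessageRecordLike): ContentBlock[] {
  const stored = record.metadata.contentBlocks;
  if (stored !== undefined) return stored;
  return [{ type: "text", text: record.content }];
}
```

Records written before `contentBlocks` existed still round-trip through the fallback branch, at the cost of losing thinking and tool-call structure.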
Fire-and-forget pattern
handlePostMessage returns 201 immediately after persisting the operator message and starting the run. runPrimary is launched without await. If runPrimary rejects, the harness catches the rejection, publishes a message.end frame with stop_reason: "error", transitions the run to failed, and transitions the session to idle. Unhandled rejection from runPrimary is a kaged bug, not a normal failure mode.
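A minimal sketch of this launch-without-await shape. `RunHooks`, `launchRun`, and `handlePostMessageSketch` are hypothetical names; the real handler also persists the operator message and creates the run before responding.

```typescript
// Hypothetical hooks standing in for the publish callback and storage transitions.
interface RunHooks {
  publish(event: { type: "message.end"; stop_reason: "error" }): void;
  markRunFailed(): void;
  markSessionIdle(): void;
}

// Starts the run and attaches the rejection handler, so a rejected run
// never surfaces as an unhandled rejection.
function launchRun(runPrimary: () => Promise<void>, hooks: RunHooks): Promise<void> {
  return Promise.resolve()
    .then(runPrimary)
    .catch(() => {
      hooks.publish({ type: "message.end", stop_reason: "error" });
      hooks.markRunFailed();
      hooks.markSessionIdle();
    });
}

// Handler shape: respond 201 without awaiting the run.
function handlePostMessageSketch(
  runPrimary: () => Promise<void>,
  hooks: RunHooks,
): { status: number } {
  void launchRun(runPrimary, hooks);
  return { status: 201 };
}
```

Wrapping the call in `Promise.resolve().then(...)` also routes a synchronous throw from runPrimary through the same handler.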
Escape hatch
Per ADR-0012, the escape path if Mastra becomes unsuitable is:
- Pin @mastra/core to the last good version.
- Evaluate replacement (VoltAgent, hand-rolled harness, or direct @kaged/llm calls without an agent loop).
- Replace the DSL compiler behind the stable DSL contract from ADR-0011.
The DSL is the portable artifact. The harness is implementation detail. The operator's project files do not change if the substrate changes.
For specific cases where Mastra's abstractions are insufficient, the harness can call @kaged/llm directly (streamModel / completeModel) without constructing a Mastra Agent. Per ADR-0014, the provider layer is the same in either case — @kaged/llm is both Mastra's model and the per-call escape hatch. The escape hatch is used surgically, not as a pattern.
Failure modes
| Failure | Detection | Recovery | Operator impact |
|---|---|---|---|
| Mastra Agent.generate throws | Exception in harness | Run marked failed. Session → idle. | "Agent generation failed: [error]." |
| Provider rate limited | HTTP 429 from provider | Provider router tries next fallback. If all exhausted, run fails. | "All providers rate-limited. Try again later." (or transparent fallback) |
| Provider unreachable | HTTP timeout / connection refused | Provider router tries next fallback. | "Provider unreachable." (or transparent fallback) |
| Processor calls abort() | PolicyProcessor detects violation | Run marked failed with policy violation details. | "Policy violation: [detail]. Check your project config." |
| Langfuse exporter fails | Network error to Langfuse endpoint | Exporter logs warning; agent continues. Tracing is best-effort. | Warning in logs. No operator-visible impact. Agent continues. |
| Prompt file not found | File read error at compilation or hot-reload | Compilation fails with clear error. Agent not started. | "Prompt file not found: ./prompts/primary.md" |
| Prompt file change during generation | File watcher fires mid-generation | Change queued. Applied at next message boundary, not mid-generation. | Seamless. Next generation uses new prompt. |
| Model alias not found in local config | Alias resolution failure | Compilation fails with clear error. | "Model alias 'fast' not found in local config." |
| Checkpoint creation fails | Storage write error during createCheckpoint | Session stays running. Run completes normally without checkpoint. | Warning: "Checkpoint could not be saved." |
| Resume fails (checkpoint not found) | getCheckpoint returns null | Session stays paused. Operator can retry with valid checkpoint or rollback. | "Resume failed: checkpoint not found." |
| Abort during stream yields no content | abortSignal fires before any tokens received | Empty partial message persisted. Checkpoint points to empty message. | Operator sees empty assistant message. Resume starts fresh generation. |
| Sub-agent delegation fails | onDelegationStart aborts, or sub-agent cage fails to spawn | Delegation error returned to supervisor. Supervisor may retry or fail. | "Sub-agent [name] could not be started: [reason]." |
| Context window exceeded | Token count > model limit | PolicyProcessor truncates or aborts (configurable). | "Context window budget exceeded." or automatic truncation. |
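
The two provider rows in the table (rate limited, unreachable) share one recovery shape: try the next entry in the fallback array, and fail the run only when the array is exhausted. A minimal sketch, assuming a hypothetical `callWithFallback` helper and treating every thrown error as fallback-eligible (the real router may distinguish error classes):

```typescript
// Hypothetical per-model call; in kaged this would go through @kaged/llm.
type ProviderCall<T> = (modelId: string) => Promise<T>;

async function callWithFallback<T>(
  chain: string[],
  call: ProviderCall<T>,
): Promise<{ modelId: string; result: T }> {
  const failures: string[] = [];
  for (const modelId of chain) {
    try {
      // First success wins; later entries are never contacted.
      return { modelId, result: await call(modelId) };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${modelId}: ${reason}`);
    }
  }
  // Every entry failed: surface one error that names each attempt.
  throw new Error(`All providers failed: ${failures.join("; ")}`);
}
```

Collecting the per-entry reasons keeps the final operator-facing error specific even when the individual fallbacks were transparent.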
Excluded Mastra surfaces
Per ADR-0012, the following Mastra features are explicitly not used:
| Surface | Reason |
|---|---|
| Mastra Cloud | Vendor-hosted. kaged is self-hosted. |
| Mastra Studio | kaged ships its own UI. |
| Mastra Workspace / Skills | Conceptually collides with kaged's [CAGED] sandbox. The SkillsProcessor (which injects <available_skills> XML into system messages) never fires because kaged does not configure a Workspace. |
| Mastra RAG primitives | Not required for v0. |
If any of these are adopted in a future version, a follow-up ADR is required.
Testing notes
DSL compilation tests
- Minimal DSL. Compile a project with one primary agent, no subagents. Assert a valid MastraAgentConfig is produced.
- Subagents. Compile a project with three subagents. Assert each is registered on the primary's agents config.
- Cage policies excluded from Mastra. Compile a project with caged subagents. Assert CagePolicy is in the compiled output but not in any Mastra config object.
- Prompt file resolution. Compile with system_prompt: ./prompts/primary.md. Assert prompt content is read from that file.
- Model alias resolution. Compile with model: "fast". Assert resolution to the concrete model identifier from local config. Assert error for unknown aliases.
- Re-compilation. Change the DSL. Assert new compilation produces different agents. Assert active sessions are unaffected.
Processor pipeline tests
- AuditProcessor logs all messages. Generate a response. Assert the audit log contains the full system + user message array.
- AuditProcessor logs every step. Generate a multi-step response (tool call + continuation). Assert processInputStep logged for each step with incrementing stepNumber.
- PolicyProcessor enforces budget. Set a token budget. Send messages exceeding it. Assert abort() is called.
- PolicyProcessor passes valid requests. Send messages within budget. Assert pass-through.
- Pipeline order. Assert AuditProcessor fires before PolicyProcessor. Assert PolicyProcessor fires before ObservabilityProcessor.
- No post-pipeline injection. Capture the message array exiting the pipeline. Capture what the LLM receives (via mock provider). Assert they are identical.
Supervisor tests
- Child agent delegation. Parent delegates to a child. Assert onDelegationStart fires with the correct tree-position path. Assert the child receives the correct messages.
- Recursive delegation. Parent delegates to child A, which delegates to grandchild B. Assert both onDelegationStart hooks fire with correct paths ("primary.subagents.A", "primary.subagents.A.subagents.B").
- Message filter. Configure a message filter that strips messages with content outside the child's cage. Assert filtered messages do not reach the child.
- Delegation hooks. Assert onDelegationStart and onDelegationComplete fire in order. Assert onDelegationStart can abort a delegation.
- Per-agent tool resolution. Compile a project with root agent (no tools: override) and a child with "file.*": { enabled: true }. Assert root gets kaged.issue.* and kaged.workflow.* by default. Assert child gets only file.* tools.
- Depth limit. Compile a project with 17 levels of nesting. Assert parse-time rejection.
- No sibling dispatch. Assert child A cannot call child B (no agent-B tool registered on A). Only the parent has agent-A and agent-B.
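
The tree-position paths and the depth limit exercised by these tests can be sketched with a small recursive walk. `AgentSpecLike` and `collectAgentPaths` are hypothetical names, and counting the root as level 1 is an assumption; the spec only fixes the bound at 16 levels and the path format shown in the recursive delegation test.

```typescript
// Hypothetical subset of AgentSpec: a name plus optional children.
interface AgentSpecLike {
  name: string;
  children?: AgentSpecLike[];
}

const MAX_DEPTH = 16;

// Depth-first walk that records each agent's tree-position path and
// rejects trees deeper than MAX_DEPTH (root counted as level 1).
function collectAgentPaths(root: AgentSpecLike): string[] {
  const paths: string[] = [];
  const visit = (node: AgentSpecLike, path: string, depth: number): void => {
    if (depth > MAX_DEPTH) {
      throw new Error(`Agent tree exceeds ${MAX_DEPTH} levels at ${path}`);
    }
    paths.push(path);
    for (const child of node.children ?? []) {
      visit(child, `${path}.subagents.${child.name}`, depth + 1);
    }
  };
  visit(root, "primary", 1);
  return paths;
}
```

Since a path is derived only from the parent's path, a child never learns a sibling's path, which mirrors the no-sibling-dispatch rule.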
Checkpoint bridge tests
- Operator pause → checkpoint created. Trigger operator pause via abortController.abort(). Assert runPrimary returns with finishReason: "aborted". Assert daemon creates a CheckpointRecord with createdBy: "operator" and messageCursor pointing to the persisted partial message.
- Model-initiated checkpoint. Mock a primary that calls the checkpoint tool. Assert runPrimary returns with checkpointRequested populated. Assert daemon creates a CheckpointRecord with createdBy: "model" and the model's reason.
- Resume reconstructs messages. Create a checkpoint, then resume. Assert a new run is created. Assert runPrimary is called with messages reconstructed from non-superseded MessageRecord rows. Assert CheckpointRecord.resumedAt is updated.
- Resume with edited prompts. Create a checkpoint, add prompt edits, resume. Assert apply_prompt_edits effect fires before the new run. Assert the resumed runPrimary call uses the updated prompt content.
- Rollback supersedes messages. Create a checkpoint at message 5, post more messages, then rollback to that checkpoint. Assert messages after the checkpoint's messageCursor are marked superseded = true. Assert CheckpointRecord.rolledBack is set to true. Assert session transitions to idle.
- Rollback preserves checkpoint. After rollback, assert the target checkpoint record still exists in storage (not deleted).
- Session state transitions. Assert running → paused on checkpoint. Assert paused → running on resume with pending work. Assert paused → idle on resume without pending work. Assert paused → idle on rollback.
- Serialization helpers round-trip. snapshotFromMessages() → serializeSnapshot() → deserializeSnapshot(). Assert the result matches the original snapshot. (These test the retained pure helpers, not the runtime checkpoint path.)
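
The rollback and resume assertions above rest on two small operations: marking rows after the cursor as superseded, and rebuilding context from the rows that remain. A minimal in-memory sketch, assuming hypothetical `StoredMessage` and `CheckpointLike` shapes and a message array already in chronological order (the real path runs against @kaged/storage):

```typescript
// Hypothetical in-memory stand-ins for MessageRecord and CheckpointRecord.
interface StoredMessage {
  id: string;
  content: string;
  superseded: boolean;
}

interface CheckpointLike {
  messageCursor: string; // id of the last message covered by the checkpoint
  rolledBack: boolean;
}

// Rollback: supersede everything after the cursor; the checkpoint itself is kept.
function rollbackTo(messages: StoredMessage[], checkpoint: CheckpointLike): void {
  const cursorIndex = messages.findIndex((m) => m.id === checkpoint.messageCursor);
  if (cursorIndex === -1) throw new Error("Checkpoint cursor not found in message list");
  messages.slice(cursorIndex + 1).forEach((m) => {
    m.superseded = true;
  });
  checkpoint.rolledBack = true;
}

// Resume and every normal run: rebuild context from non-superseded rows only.
function reconstructMessages(messages: StoredMessage[]): StoredMessage[] {
  return messages.filter((m) => !m.superseded);
}
```

Superseded rows stay in storage, so the audit trail survives a rollback even though later runs no longer see those messages.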
Observability tests
- Langfuse configured. Set credentials. Assert the daemon initializes the Langfuse client singleton and traces are exported (mock Langfuse endpoint).
- Langfuse not configured. Omit credentials. Assert no exporter is registered. Assert structured logs are written to stdout.
- Langfuse failure. Configure Langfuse with an unreachable endpoint. Assert the agent continues normally. Assert a warning is logged.
Provider routing tests
- Single model. Configure fast = "anthropic:claude-sonnet-4-20250514". Assert the correct provider is called.
- Fallback chain. Configure smart = ["anthropic:...", "openai:..."]. Mock first provider to return 429. Assert second provider is tried.
- All providers down. Configure a fallback chain. Mock all providers to fail. Assert the run fails with a clear error.
- Credential resolution. Configure api_key = "${KAGED_ANTHROPIC_API_KEY}". Set the env var. Assert the key is resolved correctly.
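
For the credential resolution case, a minimal sketch of `${VAR}` interpolation. `resolveCredential` is a hypothetical helper, the environment is passed in explicitly so the function stays testable, and throwing on an unset variable is an assumption rather than specified behavior.

```typescript
// Replaces each ${NAME} placeholder with the matching environment value.
function resolveCredential(
  template: string,
  env: Record<string, string | undefined>,
): string {
  return template.replace(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined || value === "") {
      // Assumption: an unresolved placeholder is an error, not an empty key.
      throw new Error(`Environment variable ${name} is not set`);
    }
    return value;
  });
}
```

A test can then pass a plain object as `env` instead of mutating the process environment.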
Prompt management tests
- Hot-reload. Change a prompt file. Assert the agent uses the new content at the next message boundary.
- No mid-generation reload. Change a prompt file during generation. Assert the current generation completes with the old prompt.
- Prompt visibility. Fetch prompts via API. Assert the current content matches the file.
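
The hot-reload and no-mid-generation-reload cases describe a queue-then-apply rule. A minimal sketch, assuming a hypothetical `PromptSlot` class driven by the file watcher on one side and the run loop on the other:

```typescript
// Holds the active prompt plus at most one queued replacement.
class PromptSlot {
  private active: string;
  private pending: string | undefined = undefined;
  private generating = false;

  constructor(initial: string) {
    this.active = initial;
  }

  // Called by the file watcher. Mid-generation changes are queued, not applied.
  onFileChanged(next: string): void {
    if (this.generating) {
      this.pending = next;
    } else {
      this.active = next;
    }
  }

  // Message boundary: apply any queued change, then pin the prompt for this generation.
  beginGeneration(): string {
    this.applyPending();
    this.generating = true;
    return this.active;
  }

  endGeneration(): void {
    this.generating = false;
    this.applyPending();
  }

  current(): string {
    return this.active;
  }

  private applyPending(): void {
    if (this.pending !== undefined) {
      this.active = this.pending;
      this.pending = undefined;
    }
  }
}
```

Only the latest queued change is kept, so several saves during one generation collapse into a single reload at the boundary.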
Open questions
- Token budget strategy. When the context window is exceeded, should the harness truncate (drop oldest messages) or abort? Both are valid; the operator should probably configure the behavior. Provisional: abort by default, truncation opt-in via DSL config.
- Processor ordering extensibility. v0 has three hardcoded processors. Should operators be able to register custom processors via the DSL or plugins? Deferred to v0.x.
- Observational memory. Mastra has a token-aware truncation and "observational memory" feature. Should kaged use it, or implement its own context window management? Provisional: use Mastra's, but the PolicyProcessor enforces a hard ceiling.
- Multi-model per session. Can different runs in the same session use different models (e.g., operator changes the alias mid-session)? Provisional: yes — the provider router resolves aliases per-call, not per-session.
- Mastra version pinning strategy. Pin to exact patch (1.2.3) or minor range (^1.2.0)? Exact pin is safest; minor range keeps security patches flowing. Provisional: exact pin in package.json, with a monthly review cadence for bumps.
Amendments
2026-05-23 — Provider strategy via @kaged/llm shim; v0 stateless storage; daemon integration spec
Driven by ADR-0014 and the v0 wiring of handlePostMessage through the harness:
- Provider abstraction reversed. Previous wording said the harness "passes the resolved model identifier and credentials to Mastra's Agent config; Mastra handles the rest." Per ADR-0014, all provider calls route through @kaged/llm via its LanguageModelV2 shim. The harness uses kagedModel(route) as Agent.model and does not depend on @ai-sdk/<provider> packages.
- Escape hatch corrected. Previous wording named "direct calls to the Vercel AI SDK (generateText / streamText)" as the per-call escape. Per ADR-0014 the escape hatch is direct @kaged/llm calls (streamModel / completeModel). The provider layer is the same in both cases.
- Storage strategy added (new § Storage strategy). v0 constructs Mastra Agents stateless; kaged's @kaged/storage is the source of truth. Mastra holds no cross-call state. A future amendment may add a bun:sqlite-backed MemoryStorage adapter if Mastra-side persistence becomes necessary.
- Streaming relay added (new § Streaming relay & daemon integration). Defines the runPrimary entry point the daemon calls, the ChunkType → WsOutputType mapping, the WebSocket session-registry pattern, the AssistantMessage → MessageRecord persistence mapping, and the fire-and-forget pattern for handlePostMessage.
- ADR-0014 added to the Constrained-by list and constraint table.
- Runtime entry-point publish callback re-typed. Earlier draft said publish: (frame: WsFrame) => void, which leaked the daemon's transport envelope into the harness contract. Per ADR-0011, the harness's outbound contract must be substrate-portable. Re-defined as publish: (event: HarnessOutputEvent) => void with a harness-owned event type. The WS-frame envelope (channel, seq, JSON serialization) is the daemon's concern and lives in packages/daemon/src/runtime/ws-registry.ts.
2026-05-24 — Enriched harness output events (provider:model, timing, cost)
Driven by streaming-first enrichment work (ADR-0016) and @kaged/llm model metadata catalog:
- message.start enriched with provider and model. The event now carries the resolved provider name and model ID from the ProviderRoute. The UI uses these to render a provider:model label on the message bubble header. These fields are always present: the harness cannot start a stream without a resolved route.
- message.end enriched with stats object. A new optional stats field carries post-completion instrumentation: ttft (time to first token, ms), duration (total generation time, ms), tps (tokens per second), and cost (USD breakdown from @kaged/llm's calculateCost). stats is absent when the run terminated before meaningful measurement. cost is null when model metadata is unavailable.
- Stream event mapping table updated. message.start row now shows { runId, messageId, provider, model }. finish row now shows { messageId, stopReason, usage, stats }.
- Harness instrumentation contract documented. The harness is responsible for capturing TTFT (first content delta timestamp minus stream-open timestamp), duration (stream-open to final token), and calling calculateCost from @kaged/llm with the completed Usage and looked-up ModelMeta. These are harness concerns; the daemon passes them through unchanged.
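
The timing half of that contract reduces to three subtractions and a division. A minimal sketch, assuming hypothetical `StreamTimings` and `computeStats` names and assuming tps is output tokens divided by total duration (the amendment does not say whether tps excludes the time before the first token); cost is left out because it depends on calculateCost from @kaged/llm.

```typescript
// Millisecond timestamps captured by the harness while streaming.
interface StreamTimings {
  openedAt: number; // stream-open timestamp
  firstDeltaAt: number; // first content delta timestamp
  finalTokenAt: number; // final token timestamp
}

interface TimingStats {
  ttft: number; // ms
  duration: number; // ms
  tps: number; // tokens per second
}

function computeStats(timings: StreamTimings, outputTokens: number): TimingStats {
  const ttft = timings.firstDeltaAt - timings.openedAt;
  const duration = timings.finalTokenAt - timings.openedAt;
  // Guard against a zero-length stream to avoid dividing by zero.
  const tps = duration > 0 ? outputTokens / (duration / 1000) : 0;
  return { ttft, duration, tps };
}
```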
2026-05-24 — Subagent topology implementation (item 28)
Driven by item 28 (subagent topology in @kaged/harness). Implementation in mastra-adapter.ts, tool-adapter.ts, delegation.ts, runtime.ts.
- Supervisor pattern implemented. buildSubagents() in mastra-adapter.ts iterates CompiledProject.subagents, constructs a Mastra Agent per entry with the subagent's description, instructions, and resolved tools, and registers them on the primary's agents config. Matches the § Supervisor pattern / How it works steps 1–4 exactly.
- Tool injection via @kaged/agent-tooling. resolveToolsForAgent() calls ToolRegistry.resolve() with the agent's tool globs, then converts each ToolDefinition to a Mastra ToolAction via toolDefinitionsToRecord() (JSON Schema → Zod conversion in tool-adapter.ts). Tools are per-agent, not shared.
- Delegation hooks partially implemented. DelegationHooks interface in delegation.ts declares onDelegationStart and onDelegationComplete. buildDelegationConfig() wires audit-log entries (delegation start/complete with subagent key, prompt snippet, timing) and cage-policy lookup from CompiledProject.cagePolicies. The config is an execution option passed to agent.stream(), not a constructor param.
- messageFilter deferred. The spec's third hook (messageFilter for cage-aware content stripping) is not yet implemented. It depends on the sandbox runtime (item 30) to define what "outside the cage allowlist" means at runtime.
- Cage spawning deferred. § Cage integration steps 3–4 (spawning the sub-agent's process in a cage, tool mediation via ToolPermissions) depend on @kaged/sandbox (item 30). The onDelegationStart hook has a placeholder for cage-not-ready abort but does not perform actual spawning.
- SubagentTopologyDeps factored out. Topology concerns (ToolRegistry, delegationHooks, streamDefaults) are separated from AgentFactoryDeps into a dedicated SubagentTopologyDeps interface, keeping the single-primary path unchanged.
2026-05-25 — Checkpoint bridge: v0 kaged-native stateless model (item 32)
Driven by item 32 (checkpoint bridge implementation) and design analysis of Mastra Workflow suspend()/resume() vs kaged-native checkpoints:
- Mastra Workflow suspend/resume removed from v0 checkpoint bridge. Previous wording described the checkpoint bridge as a translation layer between Mastra's suspend(payload) / run.resume({ step, resumeData }) and kaged's checkpoint protocol. v0 Agents are constructed stateless (§ Storage strategy); Mastra Workflow suspend requires Mastra-side snapshot persistence via a StorageAdapter that kaged does not provide. The bridge now operates entirely within kaged's storage layer.
- Checkpoint is a message-cursor pointer, not a snapshot blob. Previous wording described checkpoints as serialized MessageList states. v0 checkpoints store only a messageCursor (the MessageRecord.id at pause time). Full state is reconstructible from MessageRecord rows up to the cursor, via the same reconstructMessages() path used by every run.
- Two trigger mechanisms defined. Operator pause uses abortController.abort() → partial message persisted → CheckpointRecord created. Model-initiated checkpoint uses a checkpoint tool → tool sets checkpointRequested flag → generation completes naturally → runPrimary returns with checkpointRequested → daemon creates CheckpointRecord.
- RunPrimaryResult extended with checkpointRequested. Optional field that signals the daemon to create a checkpoint and transition to paused instead of idle.
- Resume is a new run, not a continuation. Previous wording said resume "continues from the suspended point" via Mastra's run.resume(). v0 resume creates a new run, reconstructs messages from non-superseded rows, and calls runPrimary; the model continues naturally from full context.
- Checkpoint bridge test notes rewritten. Tests now verify kaged-native checkpoint creation, message cursor accuracy, resume message reconstruction, rollback message superseding, and session state transitions, not Mastra suspend/resume mechanics.
- Failure modes updated. Replaced Mastra-specific suspend()/resume() validation errors with kaged-native failure modes (storage write errors, checkpoint not found, abort before content).
- Pure serialization helpers retained. checkpoint-bridge.ts functions (suspend(), resume(), serializeSnapshot(), deserializeSnapshot(), snapshotFromMessages()) are kept as utilities for a future Mastra-side persistence path. They are not used in the v0 runtime checkpoint flow.
2026-06-03 — Agent execution limits and stop reason surface
- RunPrimaryInput extended. Two new optional fields: maxSteps (integer) and maxOutputTokens (integer). Sourced from the DSL's AgentSpec.max_steps / AgentSpec.max_output_tokens or from a session-level UI override.
- agent.stream() execution options updated. The harness passes maxSteps and maxTokens (mapped to maxOutputTokens) to Mastra's agent.stream() call. This replaces the previous default behavior where Mastra used maxSteps = 5 and the provider used its own opaque default.
- Stop reason already surfaced. The message.end event already carries stopReason ("stop" | "length" | "toolUse" | "error" | "aborted"). The UI uses this to display why a run ended and to offer an auto-continue action when stopReason === "length".
- Escape hatch preserved. If the operator does not set max_steps or max_output_tokens, the harness passes nothing to Mastra, preserving the previous default behavior. This is the migration path for existing projects.
- Test notes added. Tests verify: maxSteps reaches Mastra, maxOutputTokens reaches the provider request body, missing fields result in no parameter being passed, and stop reason "length" triggers the auto-continue affordance in the UI.
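
The "passes nothing when unset" rule can be sketched as an options builder that omits keys rather than sending undefined. `RunLimits` and `buildStreamLimits` are hypothetical names, and the output key names simply follow the wording of this amendment; they are not a verified description of Mastra's option surface.

```typescript
// Hypothetical subset of RunPrimaryInput covering the two new fields.
interface RunLimits {
  maxSteps?: number;
  maxOutputTokens?: number;
}

// Builds the limit-related execution options. Keys are added only when the
// operator set a value, so unset limits leave the previous defaults in force.
function buildStreamLimits(input: RunLimits): Record<string, number> {
  const options: Record<string, number> = {};
  if (input.maxSteps !== undefined) options["maxSteps"] = input.maxSteps;
  // Assumption: the harness-side maxOutputTokens is passed under the maxTokens key.
  if (input.maxOutputTokens !== undefined) options["maxTokens"] = input.maxOutputTokens;
  return options;
}
```

Spreading the result into the execution options keeps the no-limits path identical to the pre-amendment call.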
2026-05-27 — ADR-0023 & ADR-0024: plugin hook firing + context compaction
- New § Plugin hook firing. The harness is the firing point for project-plugin lifecycle hooks. on_session_start and on_session_idle fire only on plugins declared on the primary (sessions are primary-owned). pre_compact and post_compact fire per-agent. Firing order, context population, and failure handling specified.
- on_session_start is once-per-session. Tracked via session.recalled_at. The harness fires the hook in reconstructMessages() on the first run of the session, appends <plugin:NAME>...</plugin:NAME>-wrapped inject content to the system prompt, and records the firing.
- on_session_idle fires from the session manager, not runPrimary. Idle detection runs after the last run completes and the session has been idle for the debounce window (default 30s; per-plugin configurable). Pending timers are not restored across daemon restart.
- New § Compaction. Context compaction is kaged-owned at the reconstructMessages() boundary. Mastra's internal trimming is neutralized: the harness never lets the message list exceed thresholds, so Mastra has nothing to trim.
- Compaction pipeline defined. Token estimation pre-call → observer hooks fire → strategy applies (drop/summarize/delegate/checkpoint) → mark superseded → persist CompactionRecord → hand compacted list to agent.stream().
- Hard threshold with hysteresis. Default upper threshold 0.85; lower 0.60. Compaction runs until below the lower threshold to avoid oscillation. Both thresholds operator-configurable per agent.
- Reactive fallback. If the estimate is wrong and the provider returns a context-length error, the harness compacts reactively and retries once. Second failure marks the run failed with context_overflow.
- compactor plugin role. Plugins declared with role: compactor receive pre_compact with role: compactor in params and return a CompactorResult that replaces the strategy step. The harness validates the result (always-keep preserved, tool pairs intact, valid superseded subset); invalid results fall back to drop.
- Always-keep set defined. System prompt, first operator message, messages with metadata.always_keep = true, and operator-configured predicates. The compaction pipeline filters always-keep out of the candidate set before any strategy runs.
- Tool-call/tool-result pair atomicity. Coupled pairs are the compaction unit. Drop/summarize/delegate strategies treat them as one logical message; splitting is a contract violation.
- CompactionRecord shape specified. Persisted in @kaged/storage (see session-manager.md § CompactionRecord). Includes operator-feedback fields (operatorFlag, operatorNotes) per ADR-0024.
- Dry-run path. The compaction pipeline runs without committing; observer hooks see dry_run: true in context and skip side effects. Used by the Compactor UI for strategy preview and by manual-compact for the commit flow.
- Cost surfacing. Summarizer model calls produce a compactor_cost field on RunPrimaryResult.stats and on the summary message's metadata. Session aggregate cost view surfaces primary: $X, compactor: $Y.
- New audit events. compaction.triggered, compaction.completed, compaction.failed, compaction.flagged. Plugin-host's hook-firing events (per plugin-host.md) capture the per-plugin half.
- Mastra internal trimming disabled. Configured via Mastra Agent constructor; escape hatch is the existing per-call bypass to streamModel. The contract: Mastra never trims the message list the harness handed it.
- Constraints table updated. Six new constraints added covering hook firing, compaction ownership, per-agent semantics, compactor failure fallback, atomic tool pairs, and between-call execution.
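
The hysteresis, always-keep, and pair-atomicity rules above can be sketched for the drop strategy alone. `CompactionUnit` and `compactByDrop` are hypothetical names; each unit stands for either a single message or a coupled tool-call/tool-result pair, and dropping oldest-first is an assumption. Observer hooks, the other strategies, and CompactionRecord persistence are omitted.

```typescript
// One compaction unit: a single message or an atomic tool-call/tool-result pair.
interface CompactionUnit {
  id: string;
  tokens: number; // estimated tokens for the whole unit
  alwaysKeep: boolean;
}

interface DropResult {
  kept: CompactionUnit[];
  superseded: string[]; // ids to mark superseded
}

// Triggers above `upper`, then drops oldest candidates until usage is below `lower`.
function compactByDrop(
  units: CompactionUnit[],
  contextWindow: number,
  upper = 0.85,
  lower = 0.6,
): DropResult {
  let total = units.reduce((sum, u) => sum + u.tokens, 0);
  if (total / contextWindow <= upper) return { kept: units, superseded: [] };

  const dropped = new Set<string>();
  for (const unit of units) {
    if (total / contextWindow < lower) break; // below the lower threshold: stop
    if (unit.alwaysKeep) continue; // never a candidate
    dropped.add(unit.id);
    total -= unit.tokens;
  }
  return {
    kept: units.filter((u) => !dropped.has(u.id)),
    superseded: [...dropped],
  };
}
```

Because units are dropped whole, a tool call can never be separated from its result, and the gap between the two thresholds means the next run does not immediately trigger compaction again.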
2026-05-26 — ADR-0022: recursive agent tree, per-agent tool resolution, no can_be_called_by/interconnect
Per ADR-0022:
- CompiledProject restructured. Previous shape had a flat primary: MastraAgentConfig and subagents: Record<string, MastraAgentConfig> alongside cagePolicies: Record<string, CagePolicy>. New shape uses a recursive CompiledAgentNode carrying its own config, cagePolicy, tools, and children. CompiledProject.root is the tree root (primary agent). There is no separate subagents or cagePolicies map; everything is in the tree.
- Supervisor pattern rewritten for recursive tree walk. The compiler walks the AgentSpec tree depth-first via buildAgentNode(). Every agent with children is a Mastra supervisor over its direct children. The tree structure is the call graph: no can_be_called_by checks, no event-routed dispatch (interconnect). Depth bounded at 16 levels.
- Per-agent tool resolution documented. Each agent's tool set is resolved independently via resolveToolsForAgent() at compile time. Resolution chain: built-in registry → role-based defaults (root gets kaged.issue.* / kaged.workflow.*) → agent's tools: block → principal_scope enforcement → cage filter at dispatch. Cross-references agent-tooling.md § Per-agent tool resolution.
- DSL compilation example updated. Example now shows per-agent tools: and cage: on both root and child agents instead of project-level tool declarations.
- Cage integration updated. Root agent has cage: disabled (interim state). Children with cage: disabled run in daemon process context; children with full cage block are spawned in cages. Previous wording said "the supervisor (primary) runs uncaged; only sub-agents are caged"; this is replaced with cage-per-agent semantics.
- Supervisor tests expanded. Added recursive delegation test, per-agent tool resolution test, depth limit test, and no-sibling-dispatch test.
- ADR-0022 added to constrained-by list.
References
- ADR-0004 — Bun + TypeScript runtime
- ADR-0009 — sandbox mechanism for caged subagents
- ADR-0011 — project portability (DSL is the stable contract)
- ADR-0012 — Mastra as agentic substrate (includes prompt-injection audit)
- ADR-0013 — Langfuse as optional observability
- ADR-0022 — recursive agents, per-agent tools and cage
- ADR-0023 — project-plugin lifecycle hooks
- ADR-0024 — context compaction owned by the harness
- agent-tooling.md: tools available to agents, per-agent resolution
- daemon.md: process model, file watcher, startup sequence
- federated-config.md: project-reference flattening into AgentSpec subtrees
- local-config.md: model aliases, provider credentials
- project-dsl.md: DSL syntax, AgentSpec type (compiled by the harness)
- sandbox.md: cage spawn/teardown
- session-manager.md: session lifecycle, checkpoint protocol, run model
- Mastra docs: https://mastra.ai/docs
- Mastra GitHub: https://github.com/mastra-ai/mastra