ADR-0012: Agentic substrate is Mastra v1.x

Status: Accepted
Date: 2026-05-22
Deciders: @karasu
Supersedes: —
Superseded by: —

Context

kaged owns the primary agent loop. That commitment is about ownership of the loop's semantics — what prompts run, what tools are exposed, what policies apply, when the operator can pause, what runs in a cage. It is silent on implementation.

There are two implementation paths:

1. Build the loop from scratch. This means writing provider adapters (Anthropic, OpenAI, Google, local), tool-call shape normalization, streaming/cancellation machinery, a suspend/resume primitive for debug checkpoints, and token budget management. Honest estimate: 4-8 weeks of focused harness work before kaged's own differentiators (DSL routing, sandbox supervision, mobile UI) get any attention.
2. Use an existing TS agent framework as plumbing. The loop is already implemented; kaged contributes the topology, the prompts, the policies, the cages, and the UI. The framework is a dependency, not a master.

The TS agent framework landscape in May 2026 has a clear shortlist for path 2:

Mastra (v1.0, January 2026) — supervisor pattern + sub-agents, typed suspendSchema/resumeSchema for human-in-the-loop pause/resume, Processors that intercept message transformation, abortSignal cancellation, ~19.4k GitHub stars, ~300k weekly npm downloads, $13M YC W25 funding, MIT core.
Vercel AI SDK + Agent — closer to the metal, more assembly required, fewer batteries.
VoltAgent — youngest of the batteries-included frameworks, similar shape to Mastra.

The decision is load-bearing because the harness sits beneath kaged's project DSL, sandbox supervision, and debug-checkpoint UX. Choosing badly here means either (a) months of avoidable harness work, or (b) a framework that constrains kaged's manifesto promises.

Decision

kaged adopts Mastra v1.x as the agentic substrate for the primary agent loop. kaged's project DSL compiles to Mastra Agent + Workflow configurations. The DSL is the operator-facing surface; Mastra is implementation detail. Langfuse is the observability substrate (see ADR-0013); tracing is emitted via Mastra's native @mastra/observability + @mastra/langfuse pipeline, not a separate manual SDK integration.
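
As a rough illustration of the compile step, the sketch below turns a hypothetical DSL fragment into plain, substrate-neutral configuration objects. The names `ProjectDsl`, `AgentSpec`, `AgentConfig`, `SubstrateCompiler`, and `planCompiler` are assumptions made for this sketch. The output shape is a placeholder, not Mastra's actual Agent or Workflow constructor options; a Mastra-specific adapter would map these objects onto the real API.

```typescript
// Hypothetical DSL shape; kaged's real project DSL is specified elsewhere.
interface AgentSpec {
  name: string;
  instructions: string;
  model: string;
  tools: string[];
}

interface ProjectDsl {
  primary: AgentSpec;
  subagents: AgentSpec[];
}

// Substrate-neutral output (an assumption of this sketch, not a Mastra type).
interface AgentConfig {
  id: string;
  instructions: string;
  model: string;
  toolIds: string[];
  subAgentIds: string[];
}

// The portability surface: replacing the substrate means replacing
// the implementation behind this interface, not the DSL.
interface SubstrateCompiler<Out> {
  compile(project: ProjectDsl): Out;
}

function toConfig(spec: AgentSpec, subAgentIds: string[]): AgentConfig {
  return {
    id: spec.name,
    instructions: spec.instructions,
    model: spec.model,
    toolIds: spec.tools.slice(),
    subAgentIds,
  };
}

const planCompiler: SubstrateCompiler<AgentConfig[]> = {
  compile(project) {
    const subs = project.subagents.map((spec) => toConfig(spec, []));
    const primary = toConfig(project.primary, subs.map((s) => s.id));
    return [primary].concat(subs);
  },
};

const compiledPlan = planCompiler.compile({
  primary: { name: "primary", instructions: "Route work.", model: "example-model", tools: ["shell"] },
  subagents: [{ name: "researcher", instructions: "Find facts.", model: "example-model", tools: [] }],
});
```

Keeping the compiler behind an interface is what lets the DSL stay stable while the target changes.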

The decision is specifically for v1.x and pinned. Major-version upgrades require a follow-up ADR.
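
One way to make the pin checkable is a small guard that a CI step could run against the installed package version. This is a minimal sketch: `parseMajor` and `isWithinPinnedMajor` are hypothetical helpers, and how the installed version string is obtained is left out.

```typescript
// Extract the major version from a semver-like string such as "1.4.2"
// or "1.0.0-beta.3". Returns null when the string is not in that form.
function parseMajor(version: string): number | null {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version.trim());
  return match ? Number(match[1]) : null;
}

// True only when the installed version belongs to the pinned major line.
function isWithinPinnedMajor(installed: string, pinnedMajor: number): boolean {
  return parseMajor(installed) === pinnedMajor;
}

const pinnedOk = isWithinPinnedMajor("1.4.2", 1);
const needsAdr = !isWithinPinnedMajor("2.0.0", 1);
```

A failed check would be the prompt to write the follow-up ADR rather than bump silently.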

Specific primitives kaged depends on

Agent and supervisor/sub-agent pattern → primary agent + dispatched subagents
Workflow with suspendSchema / resumeSchema → debug checkpoint protocol (the operator-pause primitive)
Processors (message interceptors before/after generation) → the hook where kaged wraps tool dispatch with cage spawning, audits prompts, and enforces policy
abortSignal → operator-initiated cancellation
Provider abstraction → cross-provider routing per call, model fallback arrays
Token-aware truncation and observational memory → context window management
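
To make the cancellation primitive concrete, here is a minimal sketch of cooperative cancellation using the standard `AbortController`. The `runSteps` loop and its step functions are hypothetical stand-ins for the agent loop, kept synchronous for brevity; a real loop would be asynchronous and would also pass the signal down to the provider call.

```typescript
interface LoopResult {
  completedSteps: string[];
  cancelled: boolean;
}

// Runs steps in order and checks the signal at every step boundary.
// A step that has already started is allowed to finish.
function runSteps(steps: Array<() => string>, signal: AbortSignal): LoopResult {
  const completedSteps: string[] = [];
  for (const step of steps) {
    if (signal.aborted) {
      return { completedSteps, cancelled: true };
    }
    completedSteps.push(step());
  }
  return { completedSteps, cancelled: false };
}

const operatorCancel = new AbortController();

const abortResult = runSteps(
  [
    () => "plan",
    () => {
      // Simulates the operator pressing cancel while this step is running.
      operatorCancel.abort();
      return "tool-call";
    },
    () => "summarize",
  ],
  operatorCancel.signal,
);
```

In this run the third step never starts: the loop sees the aborted signal at the next boundary and returns with `cancelled` set.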

Specific Mastra surfaces kaged will NOT use

Mastra Cloud — vendor-hosted; kaged is self-hosted.
Mastra Studio — Mastra's hosted dev/observability surface. kaged ships its own UI; Langfuse covers observability.
Mastra's Workspace sandbox abstraction — collides conceptually with kaged's [caged] sandbox. Toggle off / ignore at the substrate level; kaged owns sandboxing. Revisit as a design decision when sandbox supervision spec is written (specs/sandbox.md); not a blocker now.
Mastra's RAG primitives — not required for v0.

Consequences

What this commits us to

A version-pinned dependency on @mastra/core, @mastra/observability, and @mastra/langfuse in the harness. No direct langfuse SDK dependency — Mastra's observability pipeline handles the Langfuse exporter.
An interface layer in kaged that compiles the project DSL into Mastra agent + workflow configurations. This layer is the substrate-portability surface (see "What becomes harder" below).
Audit of Mastra's prompt-construction behavior to verify the manifesto principle that all system prompts are operator-readable (see Exploratory Note below).
Following Mastra's release cadence closely enough to know when to bump or pin. Their changelog has shipped meaningful changes weekly through March 2026.

What this forecloses

We cannot easily switch substrates mid-release without rewriting the DSL → substrate compiler. Mitigated by ADR-0011 project portability: the DSL is the portable artifact; the substrate it compiles to is implementation detail and can be replaced behind a stable DSL contract.
Adopting Mastra's deeper hosted offerings (Cloud, Studio) is off the table — see "specific surfaces kaged will NOT use" above.

What becomes easier

Suspend/resume for debug checkpoints: implemented, typed, with documented schemas. The single most-leveraged Mastra primitive for kaged.
Tool-call shape normalization across providers: handled.
Streaming, cancellation, retry, rate limiting: handled.
Token budget management and history compression: handled (observational memory).
Tracing: kaged uses Mastra's native observability pipeline via __registerMastra(). Mastra creates correctly hierarchical traces automatically — agent spans, generation spans, tool execution spans — with metadata enrichment (provider, model, tool definitions, usage). No manual span management in kaged code.
Time to first working primary agent in kaged: days, not weeks.

What becomes harder

A second engineer touching the harness has to learn Mastra's mental model before they can be productive on kaged's loop code.
If Mastra ships a breaking change in Agent semantics or Processor lifecycle, kaged inherits the upgrade work.
Debugging the loop now requires understanding two surfaces (kaged's DSL semantics and Mastra's implementation of them). The Processor system makes this tractable — every transformation is observable — but it's not zero.
We carry a dependency on a VC-funded open-source project. Mitigations in "Vendor risk" below.

Vendor risk

Mastra is YC W25, $13M raised, made by the team behind Gatsby. The Gatsby trajectory (open-source darling → acquisition → effectively dead) is a salient reference. Two mitigations:

MIT core is forkable. If the project goes bad, the version we're pinned to remains usable, and a hard fork is feasible if necessary.
Stay in the OSS lane. kaged's dependency is the npm package and the documented APIs. We do not adopt Mastra Cloud, Studio, or any backend-dependent feature. This keeps the dependency narrow enough to fork or replace.

If Mastra is acquired, sunset, or otherwise diverges from kaged's interests, the escape path is: pin to the last good version, evaluate VoltAgent or a hand-rolled harness, replace the DSL → substrate compiler behind the stable DSL contract from ADR-0011.

Exploratory note — prompt-injection audit

The kaged manifesto commits to every system prompt being readable and editable by the operator. Mastra's behavior under the hood needs to be verified against this commitment before harness code lands.

What to verify

What prompts does Mastra inject by default? When Agent constructs its messages array for an LLM call, are there framework-default system prompts? If yes — what are they? Can they be disabled, overridden, or made visible?
What does the Processor hook expose? Per Mastra's docs, Processors "intercept and transform messages before or after generation." Verify the full pre-generation message array is observable from a Processor — including any Mastra-injected content.
Does the supervisor pattern inject coordination prompts? When a supervisor dispatches to a sub-agent, does Mastra add its own routing prompt? Can it be replaced with kaged's?
Tool descriptions — when tools are passed to the model, what shape does Mastra send? Are there framework-added descriptions or annotations beyond what kaged provides?

How to verify

A one-day source read of @mastra/core, focused on the Agent.generate / Agent.stream paths, the Workflow suspend/resume primitives, and the Processor lifecycle. The codebase is TypeScript and the OSS license permits this freely.

Expected outcome

Best case: every prompt construction step is observable via a Processor, including Mastra-injected content. kaged adds a Processor that logs the full pre-generation message array to Langfuse via Mastra's observability pipeline, satisfying the manifesto.

Acceptable case: there are framework-default prompts, but they are documented, overridable, or disable-able. kaged either turns them off, or surfaces them in the operator UI alongside kaged-owned prompts as "Mastra defaults" so the operator can see them.

Unacceptable case: there are hidden prompts that cannot be observed or overridden. In that case, this ADR is amended to Rejected and a follow-up ADR replaces Mastra with either Vercel AI SDK + Agent or a hand-rolled loop.

Timing

This audit happens before packages/harness/ lands. It can run in parallel with the rest of doc-first work. If the audit fails the manifesto bar, ADR-0012 is amended before any harness code is committed.

Alternatives considered

Alternative A — Build the loop from scratch

Why tempting: Full control. No framework dependency. Manifesto promises are trivially verifiable because every line is ours.

Why rejected: 4-8 weeks of pure harness work before kaged's differentiators get any attention. The suspend/resume primitive alone is non-trivial to design well; Mastra has already shipped it as a typed schema-driven API. Rebuilding it would delay the things that make kaged kaged: DSL routing, sandbox supervision, mobile UI. ADR-0011 already documents reinvention as a non-virtue (portability via interface, not via re-implementation).

Alternative B — Vercel AI SDK + Agent

Why tempting: Closer to the metal. Smaller surface area. Vercel has strong long-term commitment to the SDK.

Why rejected: Fewer batteries — no supervisor pattern, no typed suspend/resume primitive, no Processor hooks. We'd build more ourselves. Acceptable if Mastra fails the prompt-injection audit; not the first choice while it stands.

Alternative C — VoltAgent

Why tempting: Structurally similar to Mastra with a broader console (observability, evals, prompts, guardrails, deployment in one place).

Why rejected: Younger than Mastra at v1.0, smaller production track record. The "broader console" feature is partly a negative for kaged — kaged ships its own UI; we don't want to compete with or be confused with VoltOps Console. Mastra's narrower posture (framework + minimal Studio) plays better with kaged's "we own the UX" stance.

Alternative D — LangChain.js

Why tempting: Mature, large ecosystem.

Why rejected: LangChain.js has a reputation for being heavy and over-abstracted in TS. Mastra was explicitly built as a less-LangChain alternative for TypeScript, and the ecosystem now treats Mastra as the modern choice. Bringing LangChain into kaged would feel like a 2024 decision.

References

ADR-0011 — project portability (the escape hatch that makes substrate replaceable behind the DSL)
ADR-0013 — observability substrate is Langfuse (the natural pair to this decision)
Mastra docs: https://mastra.ai/docs
Mastra GitHub: https://github.com/mastra-ai/mastra
Mastra v1.0 announcement: January 2026
Mastra changelog: https://mastra.ai/blog/category/changelogs
Prior production experience with Mastra in a customer-support agent project — proven operational footprint

Amendments

2026-05-22 — Prompt-injection audit completed: PASS (best case)

The exploratory note above required a source-level audit of @mastra/core before harness code lands. The audit was conducted against the Mastra v1.x codebase on GitHub (mastra-ai/mastra, packages/core/src/agent/) and the official documentation at mastra.ai/docs. Four questions were verified; all four pass. A fifth check, on suspend/resume suitability, was added as a bonus and is rated separately.

1. Default-injected prompts — none

The Agent class has an instructions field typed as AgentInstructions = SystemMessage. This is the developer's system prompt. There is no hardcoded framework-default preamble, fallback system prompt, or hidden instruction string. When instructions is omitted, no system message is added to the LLM call.

The getInstructions() method resolves the developer-provided value, which can be a string, a CoreSystemMessage, an array of system messages, or a function (dynamic instructions per request context). The addSystemMessage() helper in prepare-memory-step.ts adds only the developer's instructions to the MessageList.

One conditional injection: if the agent has a Mastra Workspace with skills configured, a SkillsProcessor injects an <available_skills> XML block into system messages. This is observable (it runs through the Processor pipeline) and controllable (kaged can intercept/remove it). Since this ADR explicitly excludes Mastra's Workspace abstraction (see "Specific Mastra surfaces kaged will NOT use"), this processor never fires in kaged's usage.

2. Processor visibility — full; nothing added post-Processor

The Processor interface exposes three hooks, two on the input side and one on the output stream:

Hook	When	What it sees
`processInput()`	Once at start of execution	`messages` (user + assistant), `systemMessages` (all system messages including agent instructions and memory context), full `MessageList` instance
`processInputStep()`	Every step of the agentic loop, including tool-call continuations	Same as above, plus `stepNumber`; can override `tools` and `toolChoice`
`processOutputStream()`	During streaming output	Output chunks from the LLM

processInput receives systemMessages: CoreMessage[] documented as "all system messages (agent instructions, memory context, user-provided). Can be modified and returned." A Processor can return { messages, systemMessages } to fully replace both arrays. It can also call abort() to halt execution entirely.

No code path adds messages after the Processor pipeline runs and before the provider call. The message array exiting the Processor pipeline is what the LLM sees.

Implication for kaged: kaged registers a Processor that logs the complete pre-generation message array to Langfuse via Mastra's observability pipeline. Every prompt the operator's agent sends to the LLM is observable. This satisfies the manifesto principle.
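
A minimal sketch of such an audit Processor follows. The types are local stand-ins that mirror the shape described in this audit (`processInput` receiving and returning `messages` and `systemMessages`); they are not Mastra's exported types. `createPromptAuditProcessor` and `PromptSink` are hypothetical names, and the sink stands in for whatever forwards records to the observability pipeline.

```typescript
// Local stand-in types (assumptions), not imports from @mastra/core.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ProcessInputArgs {
  messages: ChatMessage[];
  systemMessages: ChatMessage[];
}

interface InputProcessorLike {
  id: string;
  processInput(args: ProcessInputArgs): ProcessInputArgs;
}

type PromptRecord = { at: string; prompt: ChatMessage[] };
type PromptSink = (record: PromptRecord) => void;

function createPromptAuditProcessor(sink: PromptSink): InputProcessorLike {
  return {
    id: "kaged-prompt-audit",
    processInput({ messages, systemMessages }) {
      // Record system messages first, then the conversation.
      // This ordering is an assumption of the sketch.
      sink({ at: new Date().toISOString(), prompt: systemMessages.concat(messages) });
      // Observe only: both arrays are returned unchanged.
      return { messages, systemMessages };
    },
  };
}

const auditLog: PromptRecord[] = [];
const auditProcessor = createPromptAuditProcessor((record) => {
  auditLog.push(record);
});

const auditInput: ProcessInputArgs = {
  systemMessages: [{ role: "system", content: "You are the primary agent." }],
  messages: [{ role: "user", content: "List the open tasks." }],
};
const auditOutput = auditProcessor.processInput(auditInput);
```

Registering this as the last processor in the chain would matter in practice, so that it records what later processors produce rather than what they receive.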

3. Supervisor routing prompts — developer-authored, not framework-generated

The supervisor pattern converts sub-agents into synthetic tools named agent-{key} (e.g., agent-researchAgent). Each synthetic tool's description is the sub-agent's description field — developer-provided. The supervisor's own instructions field is where routing logic lives ("Delegate to research-agent for facts, then writing-agent for content") — also developer-authored.

The LLM decides when to call which agent-{key} tool based on the supervisor's instructions and each tool's description. No framework-generated routing prompt is injected.

When delegating, stripParentToolParts() removes parent-tool-call references from messages forwarded to sub-agents (preventing sub-agents from seeing tools they don't have). This is a cleanup step, not a prompt injection.

Additional delegation hooks are developer-controlled: onDelegationStart (can modify the prompt or abort), onDelegationComplete (can provide feedback or bail), messageFilter (controls which parent messages reach each sub-agent).

Implication for kaged: kaged's DSL compiles to supervisor instructions and sub-agent description fields. The operator controls all routing prompts. Mastra provides the tool-wrapping mechanism; kaged provides the words.
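
The mapping described above can be sketched as follows. This only illustrates the `agent-{key}` naming and the description pass-through reported by the audit; it is not Mastra's implementation. `SubAgentSpec`, `compileSupervisor`, and `operatorVisiblePrompts` are hypothetical names.

```typescript
interface SubAgentSpec {
  key: string;
  description: string; // operator-authored; used verbatim as the tool description
}

interface DelegationTool {
  name: string;
  description: string;
}

interface SupervisorPlan {
  instructions: string; // operator-authored routing logic
  tools: DelegationTool[];
}

function compileSupervisor(instructions: string, subAgents: SubAgentSpec[]): SupervisorPlan {
  const tools = subAgents.map(({ key, description }) => ({
    name: `agent-${key}`,
    description,
  }));
  return { instructions, tools };
}

// Every string the model sees for routing, in one list for the operator UI.
function operatorVisiblePrompts(plan: SupervisorPlan): string[] {
  return [plan.instructions].concat(plan.tools.map((t) => `${t.name}: ${t.description}`));
}

const supervisorPlan = compileSupervisor(
  "Delegate to researchAgent for facts, then writingAgent for content.",
  [
    { key: "researchAgent", description: "Finds and verifies facts." },
    { key: "writingAgent", description: "Drafts prose from verified facts." },
  ],
);
const routingPrompts = operatorVisiblePrompts(supervisorPlan);
```

Because both inputs come from the DSL, the list returned by `operatorVisiblePrompts` is the complete routing surface under the audit's findings.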

4. Tool descriptions — pass-through

Tools registered with the Tool class carry developer-provided description and inputSchema. The framework does not modify, annotate, or append to tool descriptions before sending them to the model. The only framework-generated tool descriptions are the synthetic agent-{key} tools for sub-agents, which use the sub-agent's description field verbatim.

5. Suspend/resume suitability (bonus — kaged's most-leveraged primitive)

Workflow steps declare suspendSchema and resumeSchema (Zod schemas). Calling suspend(payload) halts execution, validates the payload against suspendSchema, serializes state, and returns control. Calling run.resume({ step, resumeData }) validates resumeData against resumeSchema and continues from the suspended point. Multiple steps can suspend independently (result.suspended is an array).
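
The suspend/resume contract can be illustrated with a small self-contained model. This is a sketch under stated assumptions: hand-rolled validators stand in for the Zod schemas, and `CheckpointStep`, `parseSuspend`, and `parseResume` are hypothetical names. It does not use or reproduce Mastra's Workflow API.

```typescript
interface SuspendPayload {
  reason: string;
}

interface ResumeData {
  approved: boolean;
}

// Stand-in for suspendSchema: accepts only objects with a string `reason`.
function parseSuspend(value: unknown): SuspendPayload {
  const reason = (value as { reason?: unknown } | null)?.reason;
  if (typeof reason !== "string") {
    throw new Error("suspend payload must include a string `reason`");
  }
  return { reason };
}

// Stand-in for resumeSchema: accepts only objects with a boolean `approved`.
function parseResume(value: unknown): ResumeData {
  const approved = (value as { approved?: unknown } | null)?.approved;
  if (typeof approved !== "boolean") {
    throw new Error("resume data must include a boolean `approved`");
  }
  return { approved };
}

class CheckpointStep {
  private pending: SuspendPayload | null = null;

  suspend(payload: unknown): { status: "suspended"; payload: SuspendPayload } {
    this.pending = parseSuspend(payload);
    return { status: "suspended", payload: this.pending };
  }

  resume(resumeData: unknown): { status: "completed" | "rejected" } {
    if (this.pending === null) {
      throw new Error("step is not suspended");
    }
    // Validate before clearing, so invalid data leaves the step suspended.
    const data = parseResume(resumeData);
    this.pending = null;
    return { status: data.approved ? "completed" : "rejected" };
  }
}

const debugStep = new CheckpointStep();
const suspendedResult = debugStep.suspend({ reason: "operator breakpoint before tool call" });
const resumedResult = debugStep.resume({ approved: true });
```

Validating on both edges is the property kaged relies on: a checkpoint can only be entered and left with data of a known shape.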

The MessageList has serialize() / deserialize() methods covering system messages, tagged system messages, user messages, and memory info.

Limitations for kaged's checkpoint use case:

No built-in rollback. resume() continues forward from the suspended point. kaged's "rollback to an earlier checkpoint" requires kaged to maintain its own checkpoint snapshots (serialized MessageList states) and create a new run from a previous state. This is kaged-owned work, not a Mastra gap.
Suspend is per-step in workflows, not per-token in agent generation. For "operator hits pause mid-generation," the abortSignal handles cancellation; kaged then checkpoints at the next step boundary. This matches the spec in session-manager.md (checkpoints occur at message boundaries, never mid-generation).
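
The kaged-owned rollback described in the first limitation could look roughly like the sketch below. `CheckpointStore` and `SessionState` are hypothetical, and `JSON.stringify` / `JSON.parse` stand in for the MessageList `serialize()` / `deserialize()` methods mentioned above.

```typescript
// Hypothetical session state; in kaged this would be a serialized MessageList.
interface SessionState {
  messages: string[];
}

interface Checkpoint {
  id: number;
  serializedState: string;
}

class CheckpointStore {
  private checkpoints: Checkpoint[] = [];

  // Snapshots the state at a step boundary and returns the checkpoint id.
  save(state: SessionState): number {
    const id = this.checkpoints.length;
    this.checkpoints.push({ id, serializedState: JSON.stringify(state) });
    return id;
  }

  // Rollback does not rewind a run. It returns a fresh copy of the
  // earlier state, which the caller uses to seed a new run.
  restore(id: number): SessionState {
    const found = this.checkpoints.find((c) => c.id === id);
    if (!found) {
      throw new Error(`unknown checkpoint ${id}`);
    }
    return JSON.parse(found.serializedState) as SessionState;
  }
}

const store = new CheckpointStore();
const liveState: SessionState = { messages: ["user: start"] };
const firstCheckpoint = store.save(liveState);

liveState.messages.push("assistant: ran tool");
const secondCheckpoint = store.save(liveState);

// Seed for a new run, starting from the earlier checkpoint.
const rolledBack = store.restore(firstCheckpoint);
```

Storing serialized copies rather than references keeps earlier checkpoints immune to later mutation of the live state.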

Audit verdict

Question	Result	Summary
Default injected prompts	Pass	`instructions` is developer-controlled; no framework preamble
Processor visibility	Pass (best case)	Full message array observable; nothing added post-Processor
Supervisor routing prompts	Pass	Developer-authored via `instructions` + sub-agent `description`; no hidden routing prompt
Tool description annotations	Pass	Pass-through; no framework annotations
Suspend/resume suitability	Acceptable	Typed schemas, serializable state; rollback is kaged's responsibility

Per the exploratory note's criteria, this is the best-case outcome: "every prompt construction step is observable via a Processor, including Mastra-injected content." ADR-0012 status remains Accepted. No amendment to the decision or status is required. Harness code (packages/harness/) may proceed.

2026-06-02 — Observability migrated to Mastra native pipeline

What changed: The Decision and Consequences sections were updated to reflect that observability tracing is now handled by Mastra's native @mastra/observability + @mastra/langfuse pipeline rather than a separate direct Langfuse SDK integration. The langfuse npm package was removed from @kaged/harness dependencies. The __registerMastra(mastra) method is used to inject observability context before each agent run.

Why: Mastra's integrated observability produces correct hierarchical traces (agent → generation → tool → generation) with automatic metadata enrichment, tool definitions, and usage data. The manual SDK approach produced flat traces with missing metadata. Using the platform's built-in capability is the correct posture per ADR-0013's amended principle. See ADR-0013 2026-06-02 amendment for full details.