Spec: LLM Provider Interface

Status: Draft
Last amended: 2026-06-29 (Antigravity native module implemented in @kaged/llm)
Constrained by: ADR-0004, ADR-0011, ADR-0012, ADR-0024, ADR-0026, ADR-0028, ADR-0041, ADR-0049
Implements: packages/llm/

Purpose

This spec defines @kaged/llm — the package that resolves, loads, and wraps LLM provider modules. It takes a resolved route (provider name, model ID, credentials) and a conversation context, returns a stream of typed events representing the assistant's response, and exposes the wrapped model as a LanguageModelV2 for Mastra.

Per ADR-0049 (as amended 2026-06-29), the standard provider drivers are statically bundled into the daemon binary at build time, not loaded from an operator-local on-disk store. A provider yields a LanguageModelV2 in one of two ways:

Catalog providers — the @ai-sdk/* package the models.dev catalog names in its npm field, instantiated with the catalog api baseURL and env credentials. The package is one of the drivers bundled into the binary (§ Bundled drivers).
Provider plugins (kaged-convention) — for providers absent from models.dev, whose wire/auth diverges from any stock package, or whose package is outside the bundled set. A plugin additionally exports a catalog contribution, an optional usage fetcher (ADR-0026), and its own auth flow (ADR-0028). Antigravity is the reference plugin.

@kaged/llm itself is the resolver + loader + middleware: resolve a route → look the package up in the bundled-driver registry → wrap it in kaged's provider-agnostic middleware (Langfuse spans, x-litellm-model-id capture, spend/usage events, retry/fallback). The package no longer contains provider-specific wire code; the six hand adapters and their SSE/partial-JSON parsers are deleted (see § Amendments — 2026-06-25). The on-disk provider store described in the as-accepted ADR-0049 was retired on 2026-06-29 because a --compile'd binary cannot resolve dynamically-imported on-disk modules' transitive deps (see § Bundled drivers).

This package is normative for:

The provider resolution contract: how a (providerName, modelId) route resolves to a loaded LanguageModelV2.
The bundled-driver registry: which @ai-sdk/* packages are compiled into the binary (§ Bundled drivers).
The provider-plugin contract: what a kaged-convention plugin exports beyond a LanguageModelV2 factory.
The middleware stack applied to every resolved model (the new home of retry, telemetry, spend events, x-litellm-model-id capture).
The streaming event protocol (the kaged-side StreamEvent shape Mastra consumes via the LanguageModelV2 shim).
The message, tool, and context types that cross the package boundary.
The error taxonomy for provider failures (surfaced provider-agnostically through middleware).
AbortSignal integration for request cancellation.

It is not normative for:

Model alias resolution or fallback chains (that's provider-router in @kaged/harness).
Credential storage or operator-local config (that's @kaged/local-config).
Session state machines or run lifecycle (that's session-manager.md).
The WebSocket relay from daemon to UI (that's http-api.md).
The models.dev catalog itself — provenance, schema, and refresh cadence live with the kaged-models distribution at models.kaged.dev (ADR-0049 §5, delivered). This package consumes the catalog JSON as a read-only input.
Provider wire protocols — those live in the loaded @ai-sdk/* package or the provider plugin. This package does not parse SSE, build provider-specific request bodies, or implement retry inside the wire path.

Per ADR-0049, this package remains the single provider path for kaged. It exposes a LanguageModelV2 (see § Mastra integration) that Mastra v1.x consumes as Agent.model, so the agent loop and the direct call path both route through the same resolver + middleware. ADR-0014's "LanguageModelV2 is the integration boundary" call is preserved; its "no @ai-sdk/* tree / hand-written adapters" stance is reversed.

Constraints (from ADRs)

Constraint	Source
Runtime is Bun; no Node-isms in production daemon code	ADR-0004
Projects are portable; provider credentials are operator-local	ADR-0011
Mastra v1.x is the agentic substrate; `Agent.model` is a `LanguageModelV2`	ADR-0012
Standard provider drivers are statically bundled into the daemon binary; `@kaged/llm` is resolver + bundled-driver registry + middleware	ADR-0049 (amended 2026-06-29)
Every resolved model is wrapped in kaged's middleware (Langfuse, spend, retry, capture)	ADR-0049
Container image bundles the catalog JSON + pre-seeds the default provider set; `bun` is present for installs	ADR-0041, ADR-0049
Operator overrides (incl. per-model `packageOverride`) live in the DB and merge over the catalog base	ADR-0026
Provider auth flows (OAuth etc.) are owned by provider plugins, not by core	ADR-0028 (as amended by ADR-0049)

Types

All types are kaged's own. They are informed by pi-ai's shapes but are not imported from it.

Message types

interface TextContent {
  type: "text";
  text: string;
}

interface ThinkingContent {
  type: "thinking";
  thinking: string;
}

interface ImageContent {
  type: "image";
  data: string;        // base64
  mimeType: string;    // e.g. "image/png"
}

interface ToolCall {
  type: "toolCall";
  id: string;
  name: string;
  arguments: Record<string, unknown>;
}

interface UserMessage {
  role: "user";
  content: string | (TextContent | ImageContent)[];
  timestamp: number;
}

interface AssistantMessage {
  role: "assistant";
  content: (TextContent | ThinkingContent | ToolCall)[];
  provider: string;
  model: string;
  usage: Usage;
  stopReason: StopReason;
  errorMessage?: string;
  timestamp: number;
  duration?: number;
  ttft?: number;     // time to first token (ms)
}

interface ToolResultMessage {
  role: "toolResult";
  toolCallId: string;
  toolName: string;
  content: (TextContent | ImageContent)[];
  isError: boolean;
  timestamp: number;
}

interface SystemMessage {
  role: "system";
  content: string;
}

type Message = UserMessage | SystemMessage | AssistantMessage | ToolResultMessage;

Context

interface Tool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;  // JSON Schema object
  strict?: boolean;
}

interface Context {
  systemPrompt?: string[];
  messages: Message[];
  tools?: Tool[];
}

Usage & Stop

interface Usage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  totalTokens: number;
  reasoningTokens?: number;
  cost: {
    input: number;
    output: number;
    reasoning: number;
    cacheRead: number;
    cacheWrite: number;
    total: number;
  };
}

type StopReason = "stop" | "length" | "toolUse" | "error" | "aborted";

Stream events

The event stream is an AsyncIterable<StreamEvent> with a .result() method that resolves to the final AssistantMessage.

type StreamEvent =
  | { type: "start"; partial: AssistantMessage }
  | { type: "text_start"; contentIndex: number; partial: AssistantMessage }
  | { type: "text_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
  | { type: "text_end"; contentIndex: number; content: string; partial: AssistantMessage }
  | { type: "thinking_start"; contentIndex: number; partial: AssistantMessage }
  | { type: "thinking_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
  | { type: "thinking_end"; contentIndex: number; content: string; partial: AssistantMessage }
  | { type: "toolcall_start"; contentIndex: number; partial: AssistantMessage }
  | { type: "toolcall_delta"; contentIndex: number; delta: string; partial: AssistantMessage }
  | { type: "toolcall_end"; contentIndex: number; toolCall: ToolCall; partial: AssistantMessage }
  | { type: "done"; reason: "stop" | "length" | "toolUse"; message: AssistantMessage }
  | { type: "error"; reason: "error" | "aborted"; error: AssistantMessage };

These events match pi-ai's AssistantMessageEvent shape exactly. This is intentional — the event protocol is battle-tested and the daemon's WebSocket relay can forward them without transformation.
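As a rough illustration of consuming this protocol, the sketch below folds text_delta events into a final string and stops at the terminal done / error event. It uses a reduced event union (SketchEvent) and a fake generator (fakeStream); both are illustrative stand-ins, not part of @kaged/llm, and the real events also carry the partial AssistantMessage.

```typescript
// Reduced subset of the StreamEvent union, enough for this sketch (assumption:
// the `partial` payloads are omitted for brevity).
type SketchEvent =
  | { type: "start" }
  | { type: "text_delta"; contentIndex: number; delta: string }
  | { type: "text_end"; contentIndex: number; content: string }
  | { type: "done"; reason: "stop" | "length" | "toolUse" }
  | { type: "error"; reason: "error" | "aborted" };

// Hypothetical stand-in for a provider stream.
async function* fakeStream(): AsyncGenerator<SketchEvent> {
  yield { type: "start" };
  yield { type: "text_delta", contentIndex: 0, delta: "Hel" };
  yield { type: "text_delta", contentIndex: 0, delta: "lo" };
  yield { type: "text_end", contentIndex: 0, content: "Hello" };
  yield { type: "done", reason: "stop" };
}

// Accumulate text per content block until a terminal event arrives.
async function collectText(
  events: AsyncIterable<SketchEvent>,
): Promise<{ text: string; reason: string }> {
  const blocks = new Map<number, string>();
  for await (const ev of events) {
    switch (ev.type) {
      case "text_delta":
        blocks.set(ev.contentIndex, (blocks.get(ev.contentIndex) ?? "") + ev.delta);
        break;
      case "done":
      case "error":
        return { text: [...blocks.values()].join(""), reason: ev.reason };
      default:
        break; // start / text_end carry no new text in this sketch
    }
  }
  // Assumption: a stream that ends without a terminal event is treated as aborted.
  return { text: [...blocks.values()].join(""), reason: "aborted" };
}
```

A relay that forwards events unchanged would not need this accumulation; it is only useful to callers that want the final text without awaiting .result().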

Provider resolution interface

@kaged/llm's primary entry is resolveModel — given a route, return a middleware-wrapped LanguageModelV2. The streamModel / completeModel conveniences drive that model for one-shot calls (used by the daemon's provider test endpoint). All three are public API.

import type { LanguageModelV2 } from "@ai-sdk/provider";

/** What the harness hands to @kaged/llm after route resolution. */
interface ProviderRoute {
  providerName: string;
  modelId: string;
  /** Local/config override of the npm package to load. For custom providers this is the
   *  `npm` field from local.toml; for catalog providers it overrides the catalog's `npm`
   *  field at the provider level. */
  npmPackage?: string;
  /** Resolved by the harness from local-config (apiKey literal or apiKey_env).
   *  Optional — provider plugins that own their auth flow (ADR-0028) may ignore this. */
  apiKey?: string;
  /** Optional override of the catalog's `api` baseURL. */
  baseUrl?: string;
  /** Provider-specific defaults merged into every call's options. */
  defaultOptions?: Record<string, unknown>;
  /** Optional per-model npm package override passed directly on the route. Lower precedence
   *  than `context.overrides`; kept for callers that have already resolved the override. */
  packageOverride?: string;
}

/** Context passed to resolveModel. */
interface ResolveContext {
  /** Catalog snapshot to resolve against (vendored models.dev JSON; see § Catalog metadata). */
  catalog: CatalogSnapshot;
  /** Operator overrides (ADR-0026), including per-model `packageOverride`. */
  overrides: ModelOverride[];
  /** Abort the resolution (e.g. caller cancelled). */
  signal?: AbortSignal;
}

/** Injectable dependencies for resolveModel. The default implementation looks the package up
 *  in the bundled-driver registry (§ Bundled drivers); callers pass alternatives for unit tests. */
interface ResolveModelDeps {
  /** Look up the bundled driver module for a package name. Returns undefined if the package
   *  is not a bundled driver (caller treats it as a provider plugin or errors). */
  loadDriver?: (pkg: string) => { module: unknown; factory: string } | undefined;
  /** Wrap the instantiated model before returning it. */
  wrapMiddleware?: (model: LanguageModelV2) => LanguageModelV2;
}

interface StreamOptions {
  signal?: AbortSignal;
  temperature?: number;
  maxTokens?: number;
  topP?: number;
  stopSequences?: string[];
  reasoning?: EffortLevel;
  headers?: Record<string, string>;
}

type EffortLevel = "minimal" | "low" | "medium" | "high";

/** Resolve a route to a middleware-wrapped LanguageModelV2 from the bundled driver registry. */
function resolveModel(
  route: ProviderRoute,
  context: ResolveContext,
): Promise<LanguageModelV2>;

/** Convenience: resolve + stream a one-shot call. Drives doStream on the wrapped model. */
function streamModel(
  route: ProviderRoute,
  context: Context,
  options?: StreamOptions,
): LlmEventStream;

/** Convenience: resolve + await a one-shot full response. */
function completeModel(
  route: ProviderRoute,
  context: Context,
  options?: StreamOptions,
): Promise<AssistantMessage>;

Resolution algorithm

resolveModel(route, context) performs:

Resolve the package name. Look up route.providerName in context.catalog.providers using the catalog provider id (route.catalogId ?? route.providerName). Package resolution follows this precedence:
- route.packageOverride (per-model route-level override) when set.
- context.overrides per-model packageOverride for (providerName, modelId) (ADR-0026 extension).
- route.npmPackage (provider-level npm override from local config, e.g. for custom providers).
- Catalog model-level npm field.
- Catalog provider-level npm field. If the resolved package is absent from the bundled-driver registry (below), treat it as a provider plugin (see § Provider plugin contract).
Look up the bundled driver. Look the resolved package name up in the static bundled-driver registry (§ Bundled drivers). The standard @ai-sdk/* drivers are statically imported into the daemon and compiled into the binary by bun build --compile; there is no on-disk provider store and no runtime install. If the package is in the registry, proceed to the Instantiate step. If it is absent and the provider is not a plugin, throw a driver_not_bundled error naming the package (the caller surfaces it to the UI, directing the operator to add a provider plugin). Why bundled, not on-disk: per ADR-0049 § Amendments 2026-06-29, a --compile'd daemon cannot resolve a dynamically-imported on-disk module's bare-specifier transitive dependencies (e.g. @ai-sdk/provider-utils) against an external node_modules; static bundling resolves them at build time.
Instantiate.
- Catalog package: call the bundled package's factory (e.g. createAnthropic, createOpenAI, createGoogleGenerativeAI, createOpenAICompatible) with modelId, the catalog's api baseURL (or route.baseUrl if overridden), and credentials from route.apiKey resolved by the harness. Per-model deviations (e.g. /v1/responses models routed through @ai-sdk/openai rather than the provider's default @ai-sdk/openai-compatible) are honoured via the packageOverride mechanism. Custom/baseURL-only providers (e.g. a self-hosted gateway configured with npmPackage: "@ai-sdk/openai-compatible" + a custom baseUrl) resolve to the bundled @ai-sdk/openai-compatible.
- Provider plugin: call the plugin's exported factory (route, context) → LanguageModelV2. The plugin owns its auth flow (ADR-0028, as amended by ADR-0049) and any custom wire logic. Antigravity is the reference plugin (see § Provider plugin contract).
Wrap with middleware. Pass the instantiated model through wrapLanguageModel(...) once, applying the kaged middleware stack: Langfuse MODEL_GENERATION spans, x-litellm-model-id capture, spend/usage events and limits (ADR-0026), retry/fallback (§ Retry policy). The middleware is provider-agnostic — applied identically to every resolved model regardless of whether it came from a package or a plugin.
Return the wrapped LanguageModelV2.
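The lookup, instantiate, and wrap steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: SketchModel stands in for LanguageModelV2, bundledDrivers is a hypothetical registry shape, and wrapSketch only tags the model instead of applying real middleware. Package-name precedence and plugin handling are deliberately left out.

```typescript
// Stand-in for LanguageModelV2 (the real type lives in "@ai-sdk/provider").
interface SketchModel {
  provider: string;
  modelId: string;
  middleware: string[];
}

interface FactoryOptions {
  apiKey?: string;
  baseURL?: string;
}

type DriverFactory = (options: FactoryOptions) => (modelId: string) => SketchModel;

// Hypothetical registry. In the daemon the entries would come from static imports
// so the bundler can compile them into the binary.
const bundledDrivers = new Map<string, DriverFactory>([
  [
    "@ai-sdk/openai-compatible",
    (options) => (modelId) => ({
      provider: options.baseURL ?? "default",
      modelId,
      middleware: [],
    }),
  ],
]);

class DriverNotBundledError extends Error {
  readonly code = "driver_not_bundled";
  readonly pkg: string;
  constructor(pkg: string) {
    super(`driver not bundled: ${pkg}`);
    this.pkg = pkg;
  }
}

// Placeholder for the middleware stack; it only records which layers would apply.
function wrapSketch(model: SketchModel): SketchModel {
  return { ...model, middleware: ["telemetry", "spend", "retry"] };
}

function resolveSketch(pkg: string, modelId: string, options: FactoryOptions): SketchModel {
  const factory = bundledDrivers.get(pkg); // step: look up the bundled driver
  if (factory === undefined) throw new DriverNotBundledError(pkg);
  const model = factory(options)(modelId); // step: instantiate
  return wrapSketch(model); // step: wrap once, identically for every model
}
```

Keeping the registry a plain map from package name to factory means an unknown package fails fast with a typed error instead of attempting a dynamic import.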

The streamModel and completeModel conveniences call resolveModel (with a default ResolveContext provided by the daemon — see § Integration with harness and daemon) and then drive the wrapped model's doStream / doGenerate, translating the resulting Vercel LanguageModelV2StreamParts into kaged's StreamEvent shape (see § Mastra integration).

Why resolveModel is primary. The harness calls it once per agent-loop setup to construct Agent.model. The daemon's provider test endpoint uses streamModel / completeModel for one-shot calls. Same resolver, same middleware, same wrapped LanguageModelV2 — the convenience functions just hide the resolve-then-drive step. There is no second provider path.

Catalog-driven provider resolution

Per ADR-0049 §4–§5, the catalog is the source of truth for which providers exist, which npm package serves each, and what baseURL/credentials each requires. Kaged consumes the models.dev api.json schema, mirrored and republished at models.kaged.dev (delivered alongside ADR-0049). The daemon image bundles a vendored snapshot of the combined catalog that the operator refreshes on their own cadence (never on the hot path).

Live endpoints at models.kaged.dev:

Endpoint	Shape
`/api.json`	Provider entries only
`/models.json`	Model entries only
`/catalog.json`	Combined (providers + models) — the shape the daemon bundles
`/manifest.json`	Provenance (`source_commit`, `fetched_at`, `schema_version`)
`/logos/{provider}.svg`	Per-provider logos (+ `/logos/labs/{lab}.svg`)

Catalog snapshot

The CatalogSnapshot type mirrors the combined catalog.json document from models.kaged.dev. The daemon loads it once at startup (and on operator-initiated refresh) and reads from it via the helpers in § Catalog operations.

/** Vendored models.dev snapshot, sourced from https://models.kaged.dev/catalog.json. */
interface CatalogSnapshot {
  /** Schema version from models.dev. */
  schemaVersion: string;
  /** Source commit from the kaged-models repo (provenance). */
  sourceCommit: string;
  /** When the snapshot was fetched/generated by kaged-models CI. */
  fetchedAt: number;             // epoch ms
  /** Provider entries keyed by canonical name. */
  providers: Record<string, CatalogProvider>;
  /** Model entries keyed by canonical "<provider>/<modelId>". */
  models: Record<string, CatalogModel>;
}

interface CatalogProvider {
  /** Canonical provider name (e.g. "anthropic", "openai", "google"). */
  name: string;
  /** Human-readable label. */
  label: string;
  /** npm package that serves this provider's models (e.g. "@ai-sdk/anthropic").
   *  The resolver looks this package up in the bundled-driver registry (§ Bundled drivers). */
  npm: string;
  /** Default base URL the package should target. */
  api: string;
  /** Environment variable name(s) the package reads for credentials (informational;
   *  kaged resolves credentials via local-config and passes them to the factory directly). */
  env: string[];
  /** Logos (vendored under $KAGED_HOME/catalog/logos and self-served — never hot-linked). */
  logo?: string;
}

interface CatalogModel {
  /** Canonical "provider/modelId" key. */
  key: string;
  /** Display name. */
  name: string;
  /** Per-model npm override (rare; e.g. "/v1/responses" models served by @ai-sdk/openai
   *  rather than the provider's default package). The loader honours this if present,
   *  subject to the operator's ADR-0026 `packageOverride` taking precedence. */
  npm?: string;
  /** Model-level metadata (capabilities, pricing, context limits). See § Model metadata catalog. */
  meta: ModelMeta;
}

Catalog operations

@kaged/llm exposes thin read-only helpers over the loaded snapshot. The daemon consumes these for the UI's provider/model pickers and for resolver inputs; nothing here writes to the snapshot.

Function	Returns	Purpose
`loadCatalog()`	`CatalogSnapshot`	Load the bundled snapshot. Cheap; cached after first call.
`listProviders(catalog)`	`CatalogProvider[]`	Provider entries for the picker.
`listModels(catalog, providerName?)`	`CatalogModel[]`	Model entries, optionally scoped to a provider.
`lookupProvider(catalog, name)`	`CatalogProvider \| undefined`	Provider entry by name.
`lookupModel(catalog, provider, modelId)`	`CatalogModel \| undefined`	Model entry by provider + modelId.
`resolvePackageName(catalog, overrides, provider, modelId)`	`string`	Effective npm package: operator `packageOverride` → catalog model-level `npm` → catalog provider `npm`.
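The precedence in the resolvePackageName row could look something like the sketch below. It assumes the override value is JSON-encoded (as in the ModelOverride storage shape), uses reduced catalog types, and throws when nothing matches; that last behaviour is an assumption made here because the table only gives a string return type.

```typescript
// Reduced shapes for illustration; the real types carry more fields.
interface SketchOverride {
  provider: string;
  modelId: string;
  field: string;
  value: string; // JSON-encoded
}

interface SketchCatalog {
  providers: Record<string, { npm: string }>;
  models: Record<string, { npm?: string }>;
}

function resolvePackageNameSketch(
  catalog: SketchCatalog,
  overrides: SketchOverride[],
  provider: string,
  modelId: string,
): string {
  // 1. Operator packageOverride for this (provider, model).
  const row = overrides.find(
    (o) => o.provider === provider && o.modelId === modelId && o.field === "packageOverride",
  );
  if (row !== undefined) {
    const parsed: unknown = JSON.parse(row.value);
    if (typeof parsed === "string" && parsed.length > 0) return parsed;
  }
  // 2. Catalog model-level npm.
  const modelNpm = catalog.models[`${provider}/${modelId}`]?.npm;
  if (modelNpm) return modelNpm;
  // 3. Catalog provider-level npm.
  const providerNpm = catalog.providers[provider]?.npm;
  if (providerNpm) return providerNpm;
  // Assumption: no match is an error in this sketch.
  throw new Error(`no package for ${provider}/${modelId}`);
}
```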

What this replaces

This section replaces the pre-ADR-0049 § API shape resolution and § Driver catalog. The hand-maintained providerName → ApiShape map, the hardcoded driver table, the listDrivers() / knownProviders() / resolveApiShape() / getDefaultBaseUrl() / getDriverTestModel() helpers, and the DriverInfo / DriverAuthMode types are deleted. Provider and model metadata now flows from the catalog snapshot, not from kaged-maintained constants.

The daemon's GET /api/v1/local/providers endpoint (UI integration) is amended to relay catalog-derived data: the response shape gains providers: CatalogProvider[] and models: CatalogModel[] drawn from the snapshot. Availability is determined by the bundled-driver registry plus compiled-in plugins (the installed: string[] field tied to the retired $KAGED_HOME/providers/installed.json store is dropped — see ADR-0049 § Amendments 2026-06-29). The UI renders provider config forms from providers; the per-driver fields (label, defaultBaseUrl, authModes, testModel) are derived from the catalog entry plus the provider plugin contract (if the provider is a plugin). Local drivers (ollama, vllm, lm-studio, litellm) are catalog entries with api pointing at localhost URLs; their authModes include none first based on a kaged-side classification of local vs. remote providers. The HTTP API spec carries the precise response shape.

Catalog refresh and removal surfacing

The catalog snapshot is bundled into @kaged/llm at build time (the most recent snapshot available when the package is built) and refreshed from models.kaged.dev only on the operator's schedule. It is never auto-fetched on the request hot path (ADR-0049 §5) and never refreshed as a side effect of any other action. Refresh is whole-catalog: the daemon swaps in a new snapshot and re-reads.

The data-flow model is catalog as live base layer + operator overrides on top, with local config storing the minimum required to keep a provider/model working even if the catalog snapshot loses the entry:

What lives where

Layer	Stored?	Contents
Catalog snapshot (vendored in `@kaged/llm`, refreshable)	Bundled + operator-refreshed	Full metadata for every provider/model the snapshot covers: capabilities, pricing, context limits, display names, `npm` package, `api` baseURL. Referenced live at call time, not copied.
Operator's `local.toml` provider entry	Always	`npm` (= the `npm` package name from the catalog at add-time), `credentials` map satisfying the catalog's `env` requirements, optional `base_url`, optional `default_options`.
Operator's `local.toml` model entry (within provider)	Always	`id` (the model ID as the provider expects it) + `name` (nice display name).
Operator overrides (DB, ADR-0026)	When set	Per-(provider, model, field) overrides, sparse. Includes the new `packageOverride` field.

Everything beyond the minimum (capabilities, pricing, context limits, deprecation dates, etc.) is read live from the catalog snapshot when the entry is present and falls back to overrides-only when it isn't.

Resolution at call time

lookupModelMeta(provider, modelId) reads the catalog snapshot. resolveModelMeta(provider, modelId, overrides) (ADR-0026) merges:

Catalog entry for (provider, modelId) when present (the base layer).
Operator overrides on top (highest priority).

When the catalog lacks the entry, the base layer is null and only overrides apply, the same pattern as the pre-ADR-0049 "model not in LiteLLM" path. Missing metadata is never fatal; the harness falls back to conservative defaults (e.g. compaction uses a fallback context window).
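A minimal sketch of the merge, assuming overrides address fields by dotted path (e.g. "pricing.input") and carry JSON-encoded values. SketchMeta is a reduced ModelMeta, and EMPTY_META (the defaults used when the catalog has no entry) and the choice to return null when there is neither a base nor any override are assumptions of this sketch.

```typescript
interface SketchMeta {
  maxInputTokens: number | null;
  maxOutputTokens: number | null;
  pricing: { input: number; output: number; reasoning: number | null };
}

interface SketchOverride {
  provider: string;
  modelId: string;
  field: string;
  value: string; // JSON-encoded
}

// Assumed defaults for an overrides-only meta.
const EMPTY_META: SketchMeta = {
  maxInputTokens: null,
  maxOutputTokens: null,
  pricing: { input: 0, output: 0, reasoning: null },
};

// Write `value` at a dotted path; paths through missing objects are ignored.
function applyField(target: Record<string, unknown>, path: string, value: unknown): void {
  const segments = path.split(".");
  const last = segments.pop();
  if (last === undefined) return;
  let node = target;
  for (const segment of segments) {
    const next = node[segment];
    if (typeof next !== "object" || next === null) return;
    node = next as Record<string, unknown>;
  }
  node[last] = value;
}

function resolveMetaSketch(
  base: SketchMeta | null,
  overrides: SketchOverride[],
  provider: string,
  modelId: string,
): SketchMeta | null {
  const rows = overrides.filter(
    (o) => o.provider === provider && o.modelId === modelId && o.field !== "packageOverride",
  );
  if (base === null && rows.length === 0) return null;
  // Deep-copy so the catalog snapshot itself is never mutated.
  const merged = JSON.parse(JSON.stringify(base ?? EMPTY_META)) as SketchMeta;
  for (const row of rows) {
    applyField(merged as unknown as Record<string, unknown>, row.field, JSON.parse(row.value));
  }
  return merged;
}
```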

Removal surfacing on sync

Operator-initiated sync is the only path by which the daemon's catalog snapshot changes. The sync flow uses two endpoints:

POST /api/v1/local/catalog/sync fetches the latest snapshot from models.kaged.dev and returns a diff against the current bundled snapshot. It does not apply the changes.
POST /api/v1/local/catalog/sync/apply applies the confirmed sync. It accepts a list of keep decisions for configured rows the new snapshot drops.

The diff screen shows:

Added: providers and models present in the new snapshot but absent from the old.
Removed: providers and models present in the old snapshot but absent from the new.
Changed: entries whose metadata differs between snapshots. The UI summarizes this as "metadata updated".

For each removed entry that the operator has configured (i.e. there is matching local.toml provider or model row), the UI presents a "keep" checkbox, checked by default. The operator chooses:

Keep checked (default): local config row is preserved; the catalog simply no longer contributes a base layer for that entry. Future lookupModelMeta returns null for it; resolveModelMeta produces a meta built from overrides only. The provider/model keeps working because local config still has the minimum (npm package, model ID, display name) and the driver is bundled into the daemon binary (per § Bundled drivers).
Uncheck: local config row is deleted as part of the sync apply step. Operator has explicitly chosen to drop it.

Catalog sync never silently modifies operator config or removes routes. The only writes to local config during sync are the explicit operator-confirmed deletions from the "keep" checkbox.
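To make the diff and the keep decision concrete, here is a hedged sketch. diffEntries compares two keyed maps (providers or models), and rowsToDelete returns the configured rows the operator explicitly unchecked. The function names, the keep-decision shape, and the JSON.stringify comparison (which is sensitive to key order) are simplifications assumed for illustration.

```typescript
type Entries = Record<string, unknown>;

interface SnapshotDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

function diffEntries(oldEntries: Entries, newEntries: Entries): SnapshotDiff {
  const oldKeys = new Set(Object.keys(oldEntries));
  const newKeys = new Set(Object.keys(newEntries));
  const added = [...newKeys].filter((k) => !oldKeys.has(k));
  const removed = [...oldKeys].filter((k) => !newKeys.has(k));
  // Naive structural comparison; a real implementation might normalize key order.
  const changed = [...newKeys].filter(
    (k) => oldKeys.has(k) && JSON.stringify(oldEntries[k]) !== JSON.stringify(newEntries[k]),
  );
  return { added, removed, changed };
}

// Only removed entries that are configured AND explicitly unchecked are deleted.
// A missing decision counts as "keep", mirroring the checked-by-default checkbox.
function rowsToDelete(
  removed: string[],
  configured: Set<string>,
  keep: Record<string, boolean>,
): string[] {
  return removed.filter((k) => configured.has(k) && keep[k] === false);
}
```

Treating an absent decision as "keep" preserves the property that sync never removes operator config without an explicit choice.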

When a whole provider disappears

If a provider entry is removed from a refreshed catalog snapshot:

The operator's local.toml still has the provider row with its npm package name and credentials.
The driver is still bundled into the daemon binary (catalog refresh cannot remove a bundled driver).
The provider becomes functionally a custom provider: resolution falls back to "no catalog entry, use local config + overrides only." Calls proceed; metadata is sparse (overrides-only or null-defaulted) but the wire works because the driver is resolved by package name from the bundled registry.
The UI surfaces the provider as "no longer in catalog" cosmetically; no functional impact unless the operator chooses to remove it via the sync confirmation.

This is the resilience property: as long as the operator's local config has the npm package name and the driver for that package is bundled into the daemon binary, the provider works, regardless of what the catalog does.

Provider config shape post-ADR-0049

The provider configuration in local.toml uses a catalog-linked shape. The npm field is primary and required, replacing the old driver field.

npm (required): the AI SDK package name, resolved against the bundled-driver registry.
catalog_id (optional): links the provider back to the models.dev catalog. If absent, it is a custom provider.
name (optional): the user-facing label. Defaults to the catalog name.
base_url (optional): overrides the catalog's default API endpoint.
credentials (optional): a map satisfying the catalog's environment requirements.
header_mappings (optional): maps environment variable names to request header names.
discover_endpoint (optional): the /v1/models URL for custom discovery.

The old driver field is removed from the schema. The loader silently prunes any rows missing the required npm field at load time.
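The pruning rule can be illustrated with a small sketch. It assumes the parsed local.toml provider section arrives as an object keyed by provider name; that layout, and the names ProviderRow, pruneProviderRows, and isCustomProvider, are assumptions made for the example rather than the loader's actual API.

```typescript
// Field names follow the list above; the keyed-object layout is an assumption.
interface ProviderRow {
  npm?: string;
  catalog_id?: string;
  name?: string;
  base_url?: string;
  credentials?: Record<string, string>;
  header_mappings?: Record<string, string>;
  discover_endpoint?: string;
}

type ValidProviderRow = ProviderRow & { npm: string };

// Drop rows without the required `npm` field, without raising an error.
function pruneProviderRows(rows: Record<string, ProviderRow>): Record<string, ValidProviderRow> {
  const kept: Record<string, ValidProviderRow> = {};
  for (const [name, row] of Object.entries(rows)) {
    if (typeof row.npm === "string" && row.npm.length > 0) {
      kept[name] = { ...row, npm: row.npm };
    }
  }
  return kept;
}

// A row without catalog_id is a custom provider.
function isCustomProvider(row: ProviderRow): boolean {
  return row.catalog_id === undefined;
}
```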

Custom provider add flow

Operators can add custom providers that are not present in the catalog. This supports custom or self-hosted endpoints like local Ollama or vLLM instances.

Wire shape

Custom providers use the following configuration shape:

npm (required): the AI SDK package name to load.
base_url (optional): the custom API endpoint.
credentials (optional): a map of environment variables to header values for custom authentication.
header_mappings (optional): maps environment variable names to request header names.

Discovery

Custom providers can declare a /models discovery endpoint. The daemon fetches this endpoint to discover available models. It supplements the discovered list with any matching catalog entries, matched by the npm package name. The operator can also add models manually.

UI representation

Providers that are absent from the catalog display a "Custom Integration" badge in the UI. The detail view renders the package name, base URL, and credential mappings instead of catalog metadata.

Model discovery

The catalog snapshot is the primary source of "which models exist" — see § Catalog-driven provider resolution. The daemon's model catalog endpoints (http-api.md § Model catalog) read from the snapshot via listModels(catalog, providerName?) and listProviders(catalog). Persistence is handled by @kaged/local-config.

`listModels(catalog, providerName?)`

Returns model entries from the catalog snapshot, optionally scoped to a provider. Pure, synchronous, no network. See § Catalog operations for the signature.

Model display names

Model display names follow a simple precedence:

Catalog name — the CatalogModel.name field from the bundled snapshot (primary source for catalog models).
Operator-supplied name — the name field in the operator's local.toml model entry (for custom models or when the operator wants to override the catalog's display name).
Raw id — if neither is present, the model's raw id is used as-is. The operator can add a name if they want a nicer display.

No automatic name derivation (the pre-ADR-0049 humanizeModelId function is deleted — it was unnecessary functionality given the catalog already provides names).

Deprecation of per-provider listModels and models/refresh

The per-provider listModels function and the POST /api/v1/local/providers/:name/models/refresh endpoint are removed. Catalog providers use the catalog sync flow instead; custom providers use the model discovery flow, in which the daemon fetches the provider's declared /models discovery endpoint to list available models. Together these replace the old live model reconciliation flow.

Model metadata catalog

@kaged/llm is the single source of truth for model metadata — capabilities, pricing, and context limits. The daemon and UI consume this metadata via the shared types; they never parse pricing data themselves.

Data source

Per ADR-0049 §5, the base catalog is a vendored snapshot of models.dev (api.json schema), mirrored and republished at models.kaged.dev (delivered alongside ADR-0049). The snapshot is bundled into @kaged/llm at build time (the most recent at the time the package is built) and referenced live at call time — operator local.toml stores only the minimum (provider npm package name, model id + display name); the catalog contributes the rest (capabilities, pricing, context limits, deprecation) when the entry is present. Operator-initiated sync refreshes the snapshot from models.kaged.dev; the operator confirms any removals (see § Catalog refresh and removal surfacing). This replaces the pre-ADR-0049 LiteLLM model_prices_and_context_window.json snapshot as the base layer; ADR-0026's override/spend/usage machinery is preserved unchanged.

The catalog keys are canonical "provider/modelId" strings (e.g. "anthropic/claude-sonnet-4-20250514", "openai/gpt-4.1-mini") — matching kaged's native convention cleanly, so the LiteLLM-era key normalization layer is no longer required.

`ModelMeta`

The extracted metadata type for a single model. This is the shape the daemon and UI consume — never the raw catalog JSON.

interface ModelMeta {
  /** Catalog key (e.g. "anthropic/claude-sonnet-4-20250514"). */
  key: string;

  /** Catalog provider identifier (e.g. "anthropic", "openai", "google").
   *  The field name `litellmProvider` is kept stable across the LiteLLM→models.dev migration
   *  for caller compatibility; pre-ADR-0049 it held the LiteLLM provider, and it now holds
   *  the catalog provider. */
  litellmProvider: string;

  /** Model mode — only "chat" models are relevant for kaged's agent loop. */
  mode: string;

  // --- Context limits ---
  maxInputTokens: number | null;
  maxOutputTokens: number | null;

  // --- Pricing (USD per token) ---
  pricing: {
    input: number;
    output: number;
    reasoning: number | null;          // null = same as output
    cacheRead: number | null;
    cacheWrite: number | null;
  };

  // --- Capabilities ---
  capabilities: {
    reasoning: boolean;
    vision: boolean;
    functionCalling: boolean;
    streaming: boolean;
    promptCaching: boolean;
    responseSchema: boolean;
    systemMessages: boolean;
    webSearch: boolean;
    audioInput: boolean;
    audioOutput: boolean;
    pdf: boolean;
  };

  // --- Deprecation ---
  deprecationDate: string | null;      // ISO 8601 date string or null
}

Fields not present in the catalog entry default to null (limits, optional pricing) or false (capabilities). pricing.reasoning defaults to null, meaning the caller should fall back to pricing.output for reasoning tokens when computing cost.

`lookupModelMeta`

Look up model metadata by kaged's "provider:model" identifier. Reads from the bundled catalog snapshot.

function lookupModelMeta(provider: string, modelId: string): ModelMeta | null;

With models.dev as the source, the catalog's "provider/modelId" key matches kaged's "provider:model" convention after a separator swap (: → /). No per-provider prefix mapping is required (the LiteLLM-era "google" → "gemini/" mapping is gone — models.dev uses "google" as the provider name). If no match is found, returns null — the caller proceeds without metadata. Missing metadata is never a fatal error.
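The separator swap might be sketched like this. The helper names are hypothetical, and splitting on the first colon only is an assumption made here so that model IDs which themselves contain a colon survive intact.

```typescript
// Convert kaged's "provider:model" identifier into a catalog "provider/modelId" key.
// Returns null when the identifier has no provider or no model part.
function toCatalogKey(id: string): string | null {
  const idx = id.indexOf(":");
  if (idx <= 0 || idx === id.length - 1) return null;
  return `${id.slice(0, idx)}/${id.slice(idx + 1)}`;
}

// Look up an entry by provider + modelId; a miss returns null and is never fatal.
function lookupSketch<T>(models: Record<string, T>, provider: string, modelId: string): T | null {
  return models[`${provider}/${modelId}`] ?? null;
}
```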

`calculateCost`

A pure utility that computes the dollar cost of a completed LLM call from token counts and pricing metadata. Unchanged from pre-ADR-0049.

interface CostInput {
  usage: Usage;
  meta: ModelMeta | null;
}

interface CostBreakdown {
  input: number;
  output: number;
  reasoning: number;
  cacheRead: number;
  cacheWrite: number;
  total: number;
}

function calculateCost(input: CostInput): CostBreakdown;

If meta is null (unknown model), all costs are 0. If meta.pricing.reasoning is null, reasoning tokens are priced at meta.pricing.output. The function is pure — no side effects, no network calls.
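A minimal sketch of these rules, under stated assumptions: the Usage type is defined elsewhere in this spec, so the SketchUsage field names below are hypothetical; the sketch also assumes reasoning tokens are counted separately from output tokens and that a null cache price contributes 0.

```typescript
// Hypothetical shapes for illustration; the real Usage and ModelMeta types live elsewhere.
interface SketchUsage {
  inputTokens: number;
  outputTokens: number;
  reasoningTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

interface SketchPricing {
  input: number;
  output: number;
  reasoning: number | null;
  cacheRead: number | null;
  cacheWrite: number | null;
}

interface SketchBreakdown {
  input: number;
  output: number;
  reasoning: number;
  cacheRead: number;
  cacheWrite: number;
  total: number;
}

function calculateCostSketch(usage: SketchUsage, pricing: SketchPricing | null): SketchBreakdown {
  // Unknown model: every component is 0.
  if (pricing === null) {
    return { input: 0, output: 0, reasoning: 0, cacheRead: 0, cacheWrite: 0, total: 0 };
  }
  const input = usage.inputTokens * pricing.input;
  const output = usage.outputTokens * pricing.output;
  // A null reasoning price falls back to the output price.
  const reasoning = usage.reasoningTokens * (pricing.reasoning ?? pricing.output);
  // Assumption: a null cache price is treated as free.
  const cacheRead = usage.cacheReadTokens * (pricing.cacheRead ?? 0);
  const cacheWrite = usage.cacheWriteTokens * (pricing.cacheWrite ?? 0);
  return { input, output, reasoning, cacheRead, cacheWrite, total: input + output + reasoning + cacheRead + cacheWrite };
}
```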

Operator overrides and the resolution pipeline

Per ADR-0026, operators can override any metadata field per provider+model. Overrides are stored in the DB (@kaged/storage, model_overrides table), not in local.toml. The resolution order (post-ADR-0049):

Operator override from DB (highest priority)
Bundled models.dev snapshot (lowest priority)

This allows operators to correct stale pricing, fix wrong context windows, or add metadata for models not in the catalog (e.g. self-hosted fine-tunes behind Ollama/vLLM).

Override storage shape

interface ModelOverride {
  provider: string;        // kaged provider name (e.g. "anthropic", "ollama")
  modelId: string;         // model ID (e.g. "claude-sonnet-4-20250514", "my-llama3")
  field: string;           // field name from ModelMeta (e.g. "maxInputTokens", "pricing.input") — or "packageOverride"
  value: string;           // JSON-encoded value (number, boolean, string, or null)
  updatedAt: number;       // epoch ms
}

Sparse — only overridden fields have rows. The primary key is (provider, modelId, field).

Overridable fields

All scalar fields on ModelMeta, plus the new packageOverride field added by ADR-0049 Q6:

Field	Type	Notes
`maxInputTokens`	`number \| null`	Context window for compaction thresholds
`maxOutputTokens`	`number \| null`	Max output tokens
`pricing.input`	`number`	USD per input token
`pricing.output`	`number`	USD per output token
`pricing.reasoning`	`number \| null`	USD per reasoning token
`pricing.cacheRead`	`number \| null`	USD per cache read token
`pricing.cacheWrite`	`number \| null`	USD per cache write token
`capabilities.reasoning`	`boolean`
`capabilities.vision`	`boolean`
`capabilities.functionCalling`	`boolean`
`capabilities.promptCaching`	`boolean`
`capabilities.responseSchema`	`boolean`
`capabilities.systemMessages`	`boolean`
`capabilities.webSearch`	`boolean`
`capabilities.audioInput`	`boolean`
`capabilities.audioOutput`	`boolean`
`capabilities.pdf`	`boolean`
`deprecationDate`	`string \| null`
`tokenizer`	`string`	`"tiktoken" \| "gemini" \| "llama" \| "unknown"`
`packageOverride`	`string`	New (ADR-0049 Q6). Per-model npm package override. If set, `resolveModel` uses this package instead of the catalog's `npm` field for this `(provider, modelId)`.

Nested fields use dot notation in the field column (e.g. pricing.input, capabilities.vision).

`resolveModelMeta`

The merge function. Used by callers that need override-aware metadata (harness, daemon, compaction).

interface ResolvedModelMeta {
  meta: ModelMeta;                         // the merged metadata result
  package: string | null;                  // effective npm package for this model
  sources: Record<string, "override" | "default">;  // per-field origin tracking
}

function resolveModelMeta(
  provider: string,
  modelId: string,
  overrides: ModelOverride[],
  catalog?: CatalogLike,
): ResolvedModelMeta;

Behavior:

Start with the catalog default (lookupModelMeta). If no catalog entry exists, start from a null-default ModelMeta (all fields null/false, key and litellmProvider synthesized from inputs).
For each override, apply the value to the corresponding field. Dot-notation fields set nested values (e.g. pricing.input sets meta.pricing.input).
The sources map tracks which fields came from overrides vs defaults, enabling the UI to render visual distinction.
The package field is the effective npm package for this (provider, modelId), resolved in the following order:
1. A packageOverride row in overrides.
2. The catalog model's npm field.
3. The catalog provider's npm field.
4. null if no catalog is supplied and no packageOverride exists.

packageOverride is not part of ModelMeta itself — it lives only in the override table. The package field on ResolvedModelMeta exposes the effective package so callers (including the daemon's model metadata endpoint) can render it without a separate call.
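The four-step package resolution order can be sketched as below. The SketchCatalog shape and the function name effectivePackage are assumptions for illustration; the real CatalogLike type is defined elsewhere in this spec.

```typescript
// Hypothetical shapes; only the resolution order is taken from the text above.
interface PackageOverrideRow {
  provider: string;
  modelId: string;
  field: string;
  value: string; // JSON-encoded
}

interface SketchCatalog {
  providers: Record<string, { npm?: string; models: Record<string, { npm?: string }> }>;
}

function effectivePackage(
  provider: string,
  modelId: string,
  rows: PackageOverrideRow[],
  catalog?: SketchCatalog,
): string | null {
  // 1. A packageOverride row for this exact (provider, modelId) wins.
  const row = rows.find(
    (r) => r.provider === provider && r.modelId === modelId && r.field === "packageOverride",
  );
  if (row !== undefined) {
    const parsed: unknown = JSON.parse(row.value);
    if (typeof parsed === "string") return parsed;
  }
  // 2. Model-level npm, then 3. provider-level npm, then 4. null.
  const entry = catalog?.providers[provider];
  return entry?.models[modelId]?.npm ?? entry?.npm ?? null;
}
```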

Models not in the catalog. When lookupModelMeta returns null, the override system builds a ModelMeta entirely from overrides. Missing fields default to null (numeric), false (capabilities), or "unknown" (tokenizer). The key field is set to {provider}/{modelId}, litellmProvider is set to the provider name.

Example: Override context window for a self-hosted model

const overrides: ModelOverride[] = [
  { provider: "ollama", modelId: "llama3.1:70b", field: "maxInputTokens", value: "131072", updatedAt: Date.now() },
  { provider: "ollama", modelId: "llama3.1:70b", field: "pricing.input", value: "0", updatedAt: Date.now() },
  { provider: "ollama", modelId: "llama3.1:70b", field: "pricing.output", value: "0", updatedAt: Date.now() },
];

const resolved = resolveModelMeta("ollama", "llama3.1:70b", overrides);
// resolved.meta.maxInputTokens === 131072  (from override)
// resolved.meta.pricing.input === 0        (from override)
// resolved.meta.capabilities.vision === false (default, no override)
// resolved.sources["maxInputTokens"] === "override"
// resolved.sources["capabilities.vision"] === "default"
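The merge step behind this example can be sketched as follows. This is an illustrative reading of the behavior list, not the package's implementation: the function names are hypothetical, the metadata is treated as a plain JSON object, and packageOverride rows are skipped because that field is not part of ModelMeta.

```typescript
type FieldSource = "override" | "default";

interface OverrideRowSketch {
  field: string; // dot-notation path, e.g. "pricing.input"
  value: string; // JSON-encoded value
}

/** Collect dot-notation paths of every leaf (non-object) value. */
function leafPaths(obj: Record<string, unknown>, prefix = ""): string[] {
  const out: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "object" && value !== null) {
      out.push(...leafPaths(value as Record<string, unknown>, path));
    } else {
      out.push(path);
    }
  }
  return out;
}

function mergeOverrides(base: Record<string, unknown>, rows: OverrideRowSketch[]) {
  // Copy so the catalog default is never mutated.
  const meta = JSON.parse(JSON.stringify(base)) as Record<string, unknown>;
  const sources: Record<string, FieldSource> = {};
  for (const path of leafPaths(meta)) sources[path] = "default";

  for (const row of rows) {
    if (row.field === "packageOverride") continue; // lives only in the override table
    const segments = row.field.split(".");
    const leaf = segments.pop();
    if (leaf === undefined) continue;
    let node = meta;
    for (const segment of segments) {
      const next = node[segment];
      if (typeof next !== "object" || next === null) node[segment] = {};
      node = node[segment] as Record<string, unknown>;
    }
    node[leaf] = JSON.parse(row.value);
    sources[row.field] = "override";
  }
  return { meta, sources };
}
```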

Example: Route `/v1/responses` models through `@ai-sdk/openai` instead of the default `@ai-sdk/openai-compatible`

const overrides: ModelOverride[] = [
  { provider: "openai", modelId: "gpt-5", field: "packageOverride", value: '"@ai-sdk/openai"', updatedAt: Date.now() },
];

// In the resolver:
resolvePackageName(catalog, overrides, "openai", "gpt-5");
// → "@ai-sdk/openai"   (override wins over catalog's default `npm` field for openai)

What this is NOT

Not a runtime fetcher. v0 does not fetch the catalog at runtime. The bundled snapshot is the base. Refresh is operator-chosen (cron / manual / never), never on the hot path.
Not exhaustive. The catalog covers models present in models.dev's dataset. Local/self-hosted models (Ollama, vLLM, LM Studio) are unlikely to appear; their metadata comes from operator overrides or defaults to null.
Not the live model list. The model discovery flow fetches live model IDs from custom provider APIs. lookupModelMeta() enriches those IDs with static metadata from the bundled snapshot. They are independent: a model can appear in discovery without metadata, and the catalog can contain models the operator hasn't configured.

Token estimation

Per ADR-0024, the harness needs to estimate token usage before each LLM call to decide whether compaction should fire. @kaged/llm exposes the estimator.

`estimateTokens`

interface EstimateInput {
  messages: Message[];                   // the candidate message list
  systemPrompt: string | string[];       // system prompt(s)
  modelMeta: ModelMeta | null;           // resolved via lookupModelMeta
  reservedOutputTokens?: number;         // budget reserved for the LLM's response (default 4096)
}

interface EstimateResult {
  inputTokens: number;                   // estimated input tokens (messages + system)
  reservedOutputTokens: number;          // echoed back; the harness uses this for the threshold check
  totalTokens: number;                   // inputTokens + reservedOutputTokens
  fraction: number;                      // totalTokens / contextWindow
  contextWindow: number | null;          // from modelMeta; null if metadata unavailable
  algorithm: "tiktoken" | "fallback";    // which estimator was used
}

function estimateTokens(input: EstimateInput): EstimateResult;

Behavior:

Algorithm preference. When modelMeta.tokenizer === "tiktoken" (most OpenAI and Anthropic models map cleanly), the estimator uses a local tiktoken implementation. When the model uses a different tokenizer (Gemini, Llama, etc.) or modelMeta is null, the estimator falls back to a character-count heuristic (chars / 3.5, rounded up). The algorithm field reports which was used.
Conservative. The estimator over-estimates rather than under-estimates. Wrong-direction errors (estimating too few tokens, then hitting the context-length limit at the provider) are caught by the reactive-fallback path in the harness, but they are expensive, so the estimator errs on the side of caution.
System prompt counts. All system prompt content (including plugin-injected memory wrapped in <plugin:NAME> blocks) is counted.
Tool calls and results. Counted as part of the message they belong to. Tool-result bodies can be large; the estimator includes them in full.
reservedOutputTokens default. 4096. The harness can override per-call based on the operator's expected response length (a one-shot summarizer call might reserve 1500; a long-form coding session might reserve 8000).

fraction is the operative number. The harness compares fraction against the agent's configured upper threshold (default 0.85 per ADR-0024). When fraction >= upper_threshold, the harness triggers compaction.
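The fallback heuristic and the threshold check can be sketched as below. The function names are hypothetical, rounding the total (rather than each message) is an assumption, and returning null when no context window is known is a choice made for this sketch; the caller then applies its own fallback.

```typescript
const CHARS_PER_TOKEN = 3.5;

/** Character-count heuristic: total chars / 3.5, rounded up. */
function fallbackTokenEstimate(texts: string[]): number {
  const chars = texts.reduce((sum, text) => sum + text.length, 0);
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

/** Fraction of the context window consumed, or null when the window is unknown. */
function contextFraction(
  inputTokens: number,
  reservedOutputTokens: number,
  contextWindow: number | null,
): number | null {
  if (contextWindow === null || contextWindow <= 0) return null;
  return (inputTokens + reservedOutputTokens) / contextWindow;
}

/** Threshold check as the harness would apply it (0.85 is the documented default). */
function shouldCompact(fraction: number | null, upperThreshold = 0.85): boolean {
  return fraction !== null && fraction >= upperThreshold;
}
```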

contextWindow: null handling. When the resolved model is unknown to the metadata catalog (local Ollama model, brand-new release not yet in the bundled snapshot), modelMeta is null and contextWindow is null. The harness falls back to a conservative default (32k tokens) and emits a warning per call. Operators with unknown models should add metadata via local-config overrides (future work; not v0).

Example use from harness

import { estimateTokens, lookupModelMeta } from "@kaged/llm";

const modelMeta = lookupModelMeta(route.provider, route.model);
const estimate = estimateTokens({
  messages: candidateList,
  systemPrompt: assembledSystemPrompt,
  modelMeta,
  reservedOutputTokens: agentConfig.compaction?.reservedOutputTokens ?? 4096,
});

if (estimate.contextWindow !== null && estimate.fraction >= (agentConfig.compaction?.upper_threshold ?? 0.85)) {
  await runCompactionPipeline({ /* ... */ });
}

Performance

Tiktoken path: ~1-3ms for a typical session message list (50-200 messages).
Fallback path: ~0.1ms (string-length arithmetic).

Estimation runs before every LLM call. The cost is acceptable; it does not dominate latency.

`tokenizer` field on `ModelMeta`

ModelMeta is extended with an optional tokenizer: "tiktoken" | "gemini" | "llama" | "unknown" field. The bundled models.dev snapshot is augmented with this field where it can be determined from the catalog data; when it cannot, the field defaults to "unknown" and the estimator uses the fallback path.

Out of scope

Exact token counting via the provider's native tokenizer API. v0 estimates locally. Some providers offer a /tokenize endpoint; the harness may use these in the reactive-fallback path in a future amendment.
Per-prompt warm-cache of token counts. The estimator is stateless; computing the same message list twice does the work twice. A future cache keyed on message content hashes is plausible if profiling shows it matters.

Provider usage reporting

@kaged/llm exposes a usage reporting interface for querying provider quota and consumption data. The daemon calls fetchers and relays the results to the UI for budget dashboards and routing/backoff decisions.

Post-ADR-0049, fetchers are contributed by provider plugins via the plugin contract's fetchUsage field, or shipped in core for catalog providers whose quota endpoint is worth supporting but whose overall wire doesn't justify a plugin. Either way, the daemon treats them uniformly: it looks up the fetcher by provider name and calls it on demand.

Types

All usage types are kaged's own. They provide a normalized schema for representing provider quota limits, consumption windows, and budget status. Unchanged from pre-ADR-0049.

type UsageUnit = "percent" | "tokens" | "requests" | "usd" | "minutes" | "bytes" | "unknown";

type UsageStatus = "ok" | "warning" | "exhausted" | "unknown";

interface UsageWindow {
  id: string;                       // stable identifier (e.g. "quota", "5h", "7d")
  label: string;                    // human label (e.g. "Quota", "5 Hour", "7 Day")
  durationMs?: number;              // window duration when known
  resetsAt?: number;                // absolute reset timestamp (ms since epoch)
}

interface UsageAmount {
  used?: number;
  limit?: number;
  remaining?: number;
  usedFraction?: number;            // 0..1
  remainingFraction?: number;       // 0..1
  unit: UsageUnit;
}

interface UsageScope {
  provider: string;
  accountId?: string;
  projectId?: string;
  modelId?: string;
  tier?: string;
  windowId?: string;
  shared?: boolean;                 // quota shared across models in the provider
}

interface UsageLimit {
  id: string;                       // unique per limit entry (e.g. "zai:tokens", "gemini-3-flash:free:default")
  label: string;                    // display label
  scope: UsageScope;
  window?: UsageWindow;
  amount: UsageAmount;
  status?: UsageStatus;
}

interface UsageReport {
  provider: string;
  fetchedAt: number;                // epoch ms
  limits: UsageLimit[];
  metadata?: Record<string, unknown>;
}

interface UsageFetchOptions {
  apiKey?: string;                  // for API-key-authenticated providers (zai)
  accessToken?: string;             // for OAuth providers (antigravity, via plugin)
  projectId?: string;               // required by some OAuth providers
  baseUrl?: string;
  signal?: AbortSignal;
}

UsageReport is the primary shape the UI consumes. A single report contains multiple UsageLimit entries — one per quota window or limit type the provider exposes. The UI renders these as a budget dashboard: progress bars from usedFraction, status badges from status, reset countdowns from window.resetsAt.

Fetcher registry

The daemon holds a provider → fetcher registry populated at startup:

Core fetchers for catalog providers whose quota endpoint lives outside any plugin. Currently:
- fetchZaiUsage — Z.AI coding plan quota (GET /api/monitor/usage/quota/limit on https://api.z.ai). Auth: apiKey. The base URL is extracted from the baseUrl origin (strips /api/anthropic); Authorization carries the raw API key. Returns two UsageLimit entries (zai:tokens, zai:requests) sharing a UsageWindow with resetsAt from nextResetTime. Status: exhausted at usedFraction >= 1, warning at >= 0.9.
Plugin-contributed fetchers for plugins that export fetchUsage in their ProviderPlugin contract (§ Provider plugin contract). Antigravity is the reference: its plugin contributes fetchAntigravityUsage (formerly in core, migrating with the plugin), which fetches per-model quota from Antigravity's fetchAvailableModels endpoint and returns per-model/per-tier/per-window UsageLimit entries with unit: "percent".

All fetchers are async, return null on failure (non-ok response, invalid payload, missing credentials), and never throw — the daemon handles the null gracefully.
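Two pieces of this contract lend themselves to a short sketch: deriving a status from usedFraction, and the never-throw wrapper. The thresholds are the ones stated for the Z.AI fetcher; applying them to other providers is an assumption, and both function names are hypothetical.

```typescript
// Copied from the Types section so the sketch is self-contained.
type UsageStatus = "ok" | "warning" | "exhausted" | "unknown";

/** Z.AI thresholds: exhausted at >= 1, warning at >= 0.9. */
function statusFromFraction(usedFraction: number | undefined): UsageStatus {
  if (usedFraction === undefined || Number.isNaN(usedFraction)) return "unknown";
  if (usedFraction >= 1) return "exhausted";
  if (usedFraction >= 0.9) return "warning";
  return "ok";
}

/** Wrap any fetcher body so failures become null instead of exceptions. */
async function neverThrow<T>(run: () => Promise<T | null>): Promise<T | null> {
  try {
    return await run();
  } catch {
    return null; // non-ok response, invalid payload, missing credentials, and so on
  }
}
```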

Adding new fetchers

Post-ADR-0049, two paths:

Plugin path (preferred for new OAuth/custom providers): the provider's plugin exports fetchUsage as part of its ProviderPlugin contract. The daemon's registry picks it up at plugin-load time. The plugin owns the fetcher's lifecycle alongside its wire and auth code.
Core path (for catalog API-key providers whose quota endpoint is straightforward): add packages/llm/src/usage/<provider>.ts exporting async function fetch<Provider>Usage(options: UsageFetchOptions): Promise<UsageReport | null>, export from packages/llm/src/index.ts, add an entry to the core-fetcher list above. The daemon wires it to the appropriate credential source.

The plugin path is preferred because it keeps provider-specific concerns together. The core path remains available for providers where the wire is a stock @ai-sdk/* package but the quota endpoint is custom (Z.AI is the canonical example).

What this is NOT

Not a local accumulator. These fetchers query the provider's own quota API. They don't track usage locally by counting tokens in SQLite — that's the spend-event pipeline in the middleware (§ Middleware stack, ADR-0026).
Not polled automatically. The daemon decides when to fetch (on session start, on a timer, after a 429). The fetcher is a pure query function.
Not a routing gate. The daemon/harness may use usage data to inform fallback decisions, but the fetcher itself doesn't block or reroute calls. The spend-limit gate (which does block) is middleware, not the fetcher.

Bundled drivers

Per ADR-0049 § Amendments 2026-06-29, the standard catalog drivers are statically imported into the daemon and compiled into the binary by bun build --compile. There is no operator-local provider store, no runtime bun add, and no on-disk node_modules for providers. (The as-accepted ADR-0049 $KAGED_HOME/providers store, installed.json, and installProvider flow are retired — a --compile'd daemon cannot resolve a dynamically-imported on-disk module's bare-specifier transitive dependencies, but it resolves statically-bundled imports at build time.)

Bundled driver registry

@kaged/llm exposes a static registry mapping each supported npm package name to its imported factory. The packages are normal daemon dependencies, so bun build --compile bundles them (and their transitive dependencies) into the binary.

/** Statically-imported provider modules, keyed by npm package name. Bundled into the
 *  daemon binary at build time; the resolver looks packages up here, never on disk. */
const BUNDLED_DRIVERS: Record<string, { module: unknown; factory: string }>;

/** True if `pkg` is a bundled standard driver (vs. a provider plugin / unsupported). */
function isBundledDriver(pkg: string): boolean;

The bundled set is the full set named in the resolver's factory map:

npm package	factory	provider(s)
`@ai-sdk/anthropic`	`createAnthropic`	Anthropic
`@ai-sdk/openai`	`createOpenAI`	OpenAI (incl. `/v1/responses`)
`@ai-sdk/openai-compatible`	`createOpenAICompatible`	the OpenAI-API-compatible long tail (custom/baseURL, self-hosted gateways, Ollama, vLLM, GLM, …)
`@ai-sdk/google`	`createGoogleGenerativeAI`	Google Generative AI
`@ai-sdk/google-vertex`	`createVertex`	Google Vertex
`@ai-sdk/groq`	`createGroq`	Groq
`@ai-sdk/xai`	`createXai`	xAI
`@ai-sdk/mistral`	`createMistral`	Mistral
`@ai-sdk/cerebras`	`createCerebras`	Cerebras
`@ai-sdk/cohere`	`createCohere`	Cohere
`@ai-sdk/togetherai`	`createTogetherAI`	Together AI
`@ai-sdk/deepinfra`	`createDeepInfra`	DeepInfra
`@ai-sdk/perplexity`	`createPerplexity`	Perplexity
`@openrouter/ai-sdk-provider`	`createOpenRouter`	OpenRouter

Adding a stock driver is a daemon dependency addition plus a rebuild — not a runtime install. A provider whose resolved package is not in this registry and is not a provider plugin produces a driver_not_bundled error directing the operator to a provider plugin.

Catalog vs. bundled set. The catalog (models.kaged.dev) still names which providers exist and which npm package serves each; the bundled registry is the subset of those packages that are compiled in. A catalog provider whose npm is outside the bundled set is reachable only via a provider plugin.
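A registry lookup that distinguishes the two resolver failures might look like the sketch below. The registry contents are stand-ins (plain objects under hypothetical @example/* names) so the block runs without the real @ai-sdk/* dependencies; lookupDriver and its result shape are assumptions.

```typescript
interface DriverEntry {
  module: Record<string, unknown>;
  factory: string;
}

// Stand-in registry. The real one holds statically imported provider modules.
const SKETCH_DRIVERS: Record<string, DriverEntry> = {
  "@example/good-driver": {
    module: { createGood: (opts: { baseURL?: string }) => ({ kind: "model", opts }) },
    factory: "createGood",
  },
  "@example/broken-driver": { module: {}, factory: "createBroken" },
};

type LookupResult =
  | { ok: true; factory: (...args: unknown[]) => unknown }
  | { ok: false; errorClass: "driver_not_bundled" | "package_load_failed"; message: string };

function lookupDriver(pkg: string, provider: string): LookupResult {
  const entry = Object.prototype.hasOwnProperty.call(SKETCH_DRIVERS, pkg)
    ? SKETCH_DRIVERS[pkg]
    : undefined;
  if (entry === undefined) {
    return {
      ok: false,
      errorClass: "driver_not_bundled",
      message: `${pkg} (provider ${provider}) is not bundled; use a provider plugin`,
    };
  }
  const candidate = entry.module[entry.factory];
  if (typeof candidate !== "function") {
    return {
      ok: false,
      errorClass: "package_load_failed",
      message: `${pkg} does not export ${entry.factory}`,
    };
  }
  return { ok: true, factory: candidate as (...args: unknown[]) => unknown };
}
```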

Middleware stack

Per ADR-0049 §3, every resolved model — package or plugin — is wrapped once with kaged's middleware before being handed to Mastra or driven directly. This is where ADR-0014's "inline control" over provider calls now lives: provider-agnostic, applied uniformly.

The middleware uses the AI SDK's wrapLanguageModel({ model, middleware: [...] }) pattern. Each layer is a small, single-purpose middleware object implementing transformParams / wrapStream / onError hooks as appropriate. The order is significant — earlier layers see the call first on the way in, last on the way out.

#	Layer	Responsibility	Source
1	`x-litellm-model-id` capture	Inject the kaged-side `(provider, modelId)` into the request as the `x-litellm-model-id` header so providers that echo it (LiteLLM-style routers, internal gateways) can be cross-referenced. No-op for providers that ignore it.	ADR-0026
2	Spend limit gate	Before the call: query `provider_spend_events` for the current rolling window (5h, 7d) and compare against `provider_spend_limits`. Reject (hard block) if exceeded. Surface a structured error with which limit, current spend, and reset time.	ADR-0026
3	Retry / fallback	Retry transient failures (429, 5xx, network errors) with exponential backoff + jitter. Honor `Retry-After`. Limited attempt count, capped delay. If a fallback chain is configured (future; not v0), try the next provider after final-attempt failure. See § Retry policy.	ADR-0049
4	Langfuse `MODEL_GENERATION` span	Open a span around the call with provider/model/usage attributes; close on completion. Honors the operator's Langfuse configuration (ADR-0013) — no-op when Langfuse is disabled.	ADR-0013, ADR-0049
5	Usage / spend event	After the call: extract `Usage` from the response, write a `provider_spend_events` row with the computed cost (via `calculateCost`). Invalidates the provider's `provider_usage_cache` row (the call changed usage; cached report is stale).	ADR-0026

Each layer is independently testable. The wrapping is provider-agnostic — the same stack applies to @ai-sdk/anthropic, @ai-sdk/openai, and the Antigravity plugin identically. A provider that needs custom behavior at one of these layers (e.g. Antigravity's per-frame usage extraction) handles that inside its own factory before returning the LanguageModelV2 — the middleware above wraps whatever the factory produced.
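The ordering rule (first in on the way in, last out on the way out) can be illustrated with plain functions. This sketch deliberately does not use the AI SDK's wrapLanguageModel; Call, Layer, compose, and trace are hypothetical names that only mirror the nesting behavior.

```typescript
type Call = (trail: string[]) => string[];
type Layer = (next: Call) => Call;

/** Nest layers so that layers[0] is outermost, as with an ordered middleware list. */
function compose(layers: Layer[], core: Call): Call {
  return layers.reduceRight((next, layer) => layer(next), core);
}

/** A layer that records when it sees the call going in and coming back out. */
const trace = (name: string): Layer => (next) => (trail) => {
  const result = next([...trail, `${name}:in`]);
  return [...result, `${name}:out`];
};

const sketchCall = compose(
  [trace("spend-gate"), trace("retry"), trace("langfuse")],
  (trail) => [...trail, "provider"],
);

const sketchTrail = sketchCall([]);
// sketchTrail: spend-gate:in, retry:in, langfuse:in, provider,
//              langfuse:out, retry:out, spend-gate:out
```

Because the spend gate sits outside the retry layer in this arrangement, a blocked call is rejected once and never retried.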

Why this is in @kaged/llm rather than the daemon. The middleware is provider-call machinery; the daemon is the lifecycle/routing layer. Keeping the middleware here means both call paths (agent-loop via Mastra, direct via streamModel/completeModel) share it without the daemon having to inject it twice. The daemon's role is to provide the ResolveContext (catalog snapshot, overrides, install-gating callback) — it does not wrap the middleware itself.

Provider plugin contract

Per ADR-0049 §4 (as amended 2026-06-29), a provider plugin serves a provider absent from models.dev or whose wire/auth diverges from any stock @ai-sdk/* package. Antigravity is the reference plugin. Loading model (v0.2.4): the as-accepted ADR-0049 loaded plugins as in-store modules dynamically imported by absolute path; that was retired when the provider store was retired (a bun --compile binary cannot resolve a dynamically-imported on-disk module's transitive bare-specifier deps — see § Bundled drivers). First-party plugins are now compiled into the daemon binary (a static import registered in the resolver, exactly like a bundled driver, but yielding its LanguageModelV2 via createModel rather than a stock factory). A general operator-installable third-party plugin loader is deferred (the store can't do it under --compile; the subprocess-host alternative was rejected in ADR-0049 § Alternative F). The export contract below is unchanged; only where the module comes from changed (compiled-in, not store-loaded).

What a plugin exports

A provider plugin is a TypeScript/JavaScript module that default-exports a factory and named-exports the supporting metadata:

import type { LanguageModelV2 } from "@ai-sdk/provider";

export interface ProviderPlugin {
  /** Canonical provider name (must match the key the plugin's catalog contribution uses). */
  name: string;

  /** Factory: produce a LanguageModelV2 for a specific route.
   *  The plugin owns its auth resolution (ADR-0028) and any custom wire logic.
   *  The kaged middleware wraps whatever this returns. */
  createModel(
    route: ProviderRoute,
    context: PluginResolveContext,
  ): LanguageModelV2 | Promise<LanguageModelV2>;

  /** Catalog contribution: the plugin's own provider entry + model entries,
   *  merged into the effective catalog (operator-overridable) so the plugin's
   *  models appear in pickers and resolve metadata identically to catalog models. */
  catalogContribution: {
    provider: CatalogProvider;
    models: CatalogModel[];
  };

  /** Optional: usage fetcher (ADR-0026 `UsageReport`). Plugins that expose a
   *  quota endpoint contribute it here; the daemon polls it on demand like any
   *  other provider usage fetcher. */
  fetchUsage?: (options: UsageFetchOptions) => Promise<UsageReport | null>;

  /** Optional: declare which auth modes this plugin supports.
   *  Drives the UI's credentials rendering the same way the pre-ADR-0049 DriverInfo.authModes did. */
  authModes?: ("api_key" | "oauth" | "none")[];
}

export interface PluginResolveContext {
  /** Operator's resolved credentials from local-config (apiKey/env). Plugins that
   *  own their OAuth flow (ADR-0028) may ignore this and use their own token store. */
  apiKey?: string;
  /** Operator overrides for this provider (ADR-0026), including packageOverride. */
  overrides: ModelOverride[];
  /** Abort the resolution. */
  signal?: AbortSignal;
}

export default function plugin(): ProviderPlugin;

Resolution path for plugins

When resolveModel resolves a providerName whose package is not in the bundled-driver registry but which is a registered compiled-in plugin, it follows the plugin path:

Look the providerName up in the compiled-in plugin registry (a static map analogous to the bundled-driver registry; plugins register at build time, not via $KAGED_HOME/providers). If absent and the package is also not a bundled driver, throw driver_not_bundled.
Take the registered ProviderPlugin object (already in memory — no dynamic import, no on-disk resolution).
Call plugin.createModel(route, context) to get the LanguageModelV2.
Wrap with the standard middleware stack.
Return.

The plugin's catalogContribution is loaded once at daemon startup and merged with the bundled snapshot to form the effective catalog. Operator overrides (ADR-0026) apply over the contribution exactly as they do over catalog entries — same merge path, same precedence.

Antigravity as the reference plugin

Antigravity is the most divergent provider in the kaged roster (Google Cloud Code proxy, /v1internal:streamGenerateContent, custom envelope, per-frame usage, OAuth + projectId). It is implemented as a native module in @kaged/llm/src/antigravity/ — not a separate plugin repo, not a dynamically-loaded module. It is compiled into the daemon binary alongside the bundled @ai-sdk/* drivers.

It is a fetch-wrapper, not a standalone driver. Antigravity targets Google models, so the module does not implement its own doStream/doGenerate; instead createModel(route) instantiates the bundled @ai-sdk/google driver with a custom fetch override. The custom fetch:

Rewrites the URL from generativelanguage.googleapis.com to cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse.
Wraps the request body in the Antigravity envelope: { project, model, request: <gemini body>, userAgent, requestId }.
Swaps auth: strips x-api-key, adds Authorization: Bearer <token> + the Antigravity-specific headers (User-Agent, X-Goog-Api-Client, Client-Metadata).
Sanitizes tool schemas (strips const/$ref/$defs/default/examples, adds empty-schema placeholders, sanitizes tool names) for Antigravity's strict protobuf validation.
Resolves thinking config per model family: Claude uses thinking_budget (snake_case); Gemini 3 uses thinkingLevel strings; Gemini 2.5 uses numeric thinkingBudget.
Manages thought-signature caching for Claude multi-turn tool calls (sentinel injection for cache misses; stripping unsigned thinking blocks before tool_use parts).
Unwraps the response: each SSE frame's { response: { candidates... } } is stripped to just { candidates... } so @ai-sdk/google's parser sees a normal Gemini stream.
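The envelope wrap and the per-frame unwrap are the two most mechanical steps in this list, and can be sketched as pure functions. The function names and the userAgent placeholder value are assumptions; only the envelope field names and the { response: ... } frame shape come from the description above.

```typescript
interface AntigravityEnvelope {
  project: string;
  model: string;
  request: unknown; // the unmodified Gemini request body
  userAgent: string;
  requestId: string;
}

function wrapInEnvelope(
  geminiBody: unknown,
  route: { project: string; model: string },
  requestId: string,
): AntigravityEnvelope {
  return {
    project: route.project,
    model: route.model,
    request: geminiBody,
    userAgent: "kaged-sketch", // placeholder value, not the real header string
    requestId,
  };
}

/** Strip the outer { response: ... } wrapper from one SSE line; pass others through. */
function unwrapSseLine(line: string): string {
  if (!line.startsWith("data:")) return line;
  const payload = line.slice("data:".length).trim();
  try {
    const parsed = JSON.parse(payload);
    if (typeof parsed === "object" && parsed !== null && "response" in parsed) {
      return `data: ${JSON.stringify(parsed.response)}`;
    }
  } catch {
    // Not JSON (for example a keep-alive comment): leave the line untouched.
  }
  return line;
}
```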

The module exports:

createModel(route) — wraps createGoogleGenerativeAI({ fetch: customFetch }) and returns a LanguageModelV2. Registered in the bundled-driver registry under "antigravity".
OAuth functions (buildAuthorizationUrl, exchangeCode, refreshTokens, ensureFreshTokens) — used by the daemon's auth handler.
Token store at $KAGED_HOME/oauth/antigravity-tokens.json — atomic read/write, auto-refresh on expiry.
Quota protection — fetchAvailableModels fetches per-model remainingFraction from Antigravity; the custom fetch blocks requests when usage exceeds 90% (configurable), preventing Google from penalizing accounts that fully exhaust quota.

The daemon resolves Antigravity credentials at call time in resolveModelRoute (primary-runner.ts): if the provider is antigravity, it reads the OAuth token store via ensureFreshTokens(kagedHome), injects accessToken into route.apiKey and projectId into route.defaultOptions. The OAuth login flow is exposed via POST /api/v1/local/providers/antigravity/auth/start (builds PKCE URL + spins up a temporary Bun.serve on port 51121 for the Google callback) and GET .../auth/status (checks token store). The UI renders a "Connect" button for antigravity providers that triggers this flow.

Implementation provenance. The wire logic (OAuth, envelope, tool-schema sanitization, thinking config, signature caching) is a native reimplementation informed by the opencode-antigravity-auth plugin by Jens (MIT). Multi-account rotation, Gemini-CLI quota fallback, and the opencode-plugin bridge concept were dropped; single-account with quota protection is the v0 scope.

What this replaces

This section replaces the pre-ADR-0049 § Provider adapter contract (which described the six hand adapters: anthropic-messages, openai-completions, openai-responses, openai-codex-responses, google-generative-ai, antigravity), the § SSE parser (hand-adapter infrastructure for parsing raw HTTP responses), and the § Partial-JSON parser (hand-adapter infrastructure for streaming tool-call arguments). All three are deleted — the loaded @ai-sdk/* packages own their wire parsing; provider plugins own theirs. Kaged no longer parses SSE or partial JSON in core.

The openai-responses reasoning-capture tech debt (STATUS.md § Known tech debt) resolves by deletion: the @ai-sdk/openai package parses reasoning output correctly, and the stub adapter that dropped it is gone.

Error taxonomy

Provider errors surface as StreamEvent with type: "error" and the error detail in error.errorMessage, exactly as in pre-ADR-0049. The @kaged/llm package does not throw exceptions for provider failures — errors are events.

Post-ADR-0049, the error originates in the loaded @ai-sdk/* package or provider plugin (whichever the resolver loaded) and is caught and classified by the retry/error middleware layer (§ Middleware stack). The classification maps provider-specific exceptions and HTTP states onto kaged's stable error taxonomy:

Error class	Cause	`errorMessage` contains
`auth_failed`	401/403 from provider	HTTP status + provider error body
`rate_limited`	429 from provider	HTTP status + `Retry-After` if present
`context_too_long`	400 with context-length signal	Provider's error message
`model_not_found`	404 or model-not-available	Provider + model ID
`provider_error`	500/502/503 from provider	HTTP status + body excerpt
`network_error`	DNS failure, connection refused, timeout	Error message from the underlying fetch
`aborted`	`AbortSignal` triggered	`"Request aborted"`
`parse_error`	Malformed response from provider	Raw data excerpt
`empty_response`	HTTP 200 with no usable content	Provider returned an empty response
`driver_not_bundled`	Resolver could not find the resolved package in the bundled-driver registry and the provider is not a plugin	Package name + provider name + plugin hint
`package_load_failed`	The bundled driver did not export the expected factory, or instantiation failed	Package name + error
`spend_limit_exceeded`	Middleware spend-limit gate blocked the call	Which limit (5h/7d/pct), current spend, reset time

The first nine classes (auth_failed through empty_response) come from the loaded package/plugin via the middleware's error trap. The last three (driver_not_bundled, package_load_failed, spend_limit_exceeded) originate inside @kaged/llm itself — resolver failures and middleware gate failures. They surface as the same StreamEvent shape so callers don't need to special-case them.
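The HTTP-status rows of the table can be sketched as a small classifier. The function name is hypothetical, the context-length regex is an illustrative guess at what a "context-length signal" looks like, treating every 5xx as provider_error is an assumption (the table lists 500/502/503), and null marks statuses the table does not cover.

```typescript
type HttpErrorClass =
  | "auth_failed"
  | "rate_limited"
  | "context_too_long"
  | "model_not_found"
  | "provider_error";

function classifyHttpStatus(status: number, body: string): HttpErrorClass | null {
  if (status === 401 || status === 403) return "auth_failed";
  if (status === 429) return "rate_limited";
  if (status === 404) return "model_not_found";
  // Assumption: providers signal context overflow in the error body text.
  if (status === 400 && /context.{0,20}length|too long|maximum.{0,20}tokens/i.test(body)) {
    return "context_too_long";
  }
  if (status >= 500 && status <= 599) return "provider_error";
  return null; // not covered by the taxonomy table
}
```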

Retry policy

Per ADR-0049 §3, retry lives in the middleware stack — applied identically to every resolved model regardless of provider. The defaults are unchanged from pre-ADR-0049:

Property	Default	Notes
Max attempts	3	Includes the initial request.
Base delay	1000 ms	Doubled on each attempt.
Max delay	30000 ms	Caps exponential backoff and any in-run `Retry-After` wait; longer advised back-offs are handed off to Tier 2 (§ Wiring and the open-call invariant).
Jitter	±25%	Applied to avoid thundering herds.
Retryable statuses	429, 5xx	4xx other than 429 are client errors and are surfaced immediately.
Retryable network errors	connection, network, fetch, timeout, ECONNREFUSED, ENOTFOUND, ETIMEDOUT	DNS/refused/timeout errors are retried.
`Retry-After`	honored	`retry-after-ms` header is preferred; `retry-after` seconds are converted to ms.

The middleware returns the first successful response. If all attempts fail, the final error is surfaced as a StreamEvent with type: "error". AbortSignal aborts the in-flight attempt and prevents further retries.
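The defaults in the table can be sketched as three pure helpers. This is an illustration under stated assumptions, not the contents of middleware/retry.ts: the function names are hypothetical, the cap is assumed to apply after jitter, and a `retry-after` value given as an HTTP-date is ignored rather than parsed.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30_000;
const JITTER = 0.25; // ±25%

// 429 and 5xx are retryable; other 4xx are surfaced immediately.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// retryIndex 0 is the wait before the first retry; the delay doubles each time.
// `random` is injectable so tests can pin the jitter.
function backoffDelayMs(retryIndex: number, random: () => number = Math.random): number {
  const exponential = BASE_DELAY_MS * 2 ** retryIndex;
  const jitterFactor = 1 + (random() * 2 - 1) * JITTER; // in [0.75, 1.25)
  // Assumption: the cap is applied after jitter, so the result never exceeds it.
  return Math.min(MAX_DELAY_MS, Math.round(exponential * jitterFactor));
}

function parseNonNegative(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isFinite(n) && n >= 0 ? n : undefined;
}

// `retry-after-ms` is preferred; `retry-after` seconds are converted to ms.
// Header names are assumed to be lower-cased already.
function advisedDelayMs(headers: Record<string, string | undefined>): number | undefined {
  const ms = parseNonNegative(headers["retry-after-ms"]);
  if (ms !== undefined) return ms;
  const seconds = parseNonNegative(headers["retry-after"]);
  return seconds === undefined ? undefined : seconds * 1000;
}
```

With three attempts including the initial request, at most two of these delays are ever slept.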

Successful HTTP 200 responses that contain no usable content (empty body, no content blocks) are treated as an error rather than a silent empty completion — same posture as pre-ADR-0049. The middleware enforces this provider-agnostically.

Wiring and the open-call invariant (ADR-0052)

The retry middleware is applied by composing it into the wrapMiddleware the daemon passes to resolveModel (via ResolveModelDeps.wrapMiddleware). It is Tier 1 of the two-tier retry model in ADR-0052: fast, automatic, in-run, invisible to the operator.

Open-call only. The middleware wraps the model's doStream / doGenerate invocation, not stream consumption. It retries a failure to open the call; it does not retry once the stream has begun yielding content. This preserves streaming-first UX (ADR-0016): the harness publishes deltas live, so an automatic in-run retry after the first delta would replay content to the operator. Invariant: once the first delta is published, no automatic in-run retry occurs.
Cancellation. The middleware respects the run's AbortSignal: an in-flight retry back-off sleep is aborted when the run is cancelled (POST /sessions/:id/runs/:rid/cancel). This is how Tier 1 is operator-cancellable.
Long Retry-After hand-off. When a provider advises a back-off longer than the Max delay (30 s) cap, Tier 1 does not burn attempts idling. The failure is surfaced with the provider's advised absolute next-eligible time, which the daemon persists as retry_after_until on the failed run (see http-api.md message fields) so Tier 2 (the operator-visible frontend loop) can schedule rather than hammer.
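The long Retry-After hand-off reduces to a single decision per failed open. The sketch below is illustrative only: `planRetry` and `RetryPlan` are hypothetical names, and the ISO-8601 string used for retry_after_until is an assumption, since the persisted format is defined in http-api.md rather than here.

```typescript
const MAX_DELAY_MS = 30_000;

type RetryPlan =
  | { kind: "wait"; delayMs: number }              // Tier 1 sleeps, then retries the open
  | { kind: "handoff"; retryAfterUntil: string };  // surfaced to the daemon for Tier 2

// advisedMs: the provider's Retry-After in ms, if any.
// backoffMs: the exponential backoff Tier 1 would otherwise sleep.
// nowMs:     current epoch time, injected so the function stays pure.
function planRetry(advisedMs: number | undefined, backoffMs: number, nowMs: number): RetryPlan {
  if (advisedMs !== undefined && advisedMs > MAX_DELAY_MS) {
    // Do not burn attempts idling: report the absolute next-eligible time instead.
    return { kind: "handoff", retryAfterUntil: new Date(nowMs + advisedMs).toISOString() };
  }
  return { kind: "wait", delayMs: Math.min(MAX_DELAY_MS, advisedMs ?? backoffMs) };
}
```

The "wait" branch is the only one that sleeps, and that sleep is the one the run's AbortSignal must be able to interrupt.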

Retry classification for Tier 2 (ADR-0052)

The frontend Tier 2 loop must not blindly retry every failure. The daemon derives a normalized { retryable, retry_class, retry_after_until? } from the error taxonomy above and persists it on the failed run:

Error class	`retryable` (Tier 2 auto-arms)
`rate_limited`, `provider_error`, `network_error`	yes
`context_too_long`	no — daemon already compaction-retried once (ADR-0024)
`auth_failed`, `model_not_found`, `spend_limit_exceeded`, `driver_not_bundled`, `package_load_failed`	no — not transient
`aborted`	no — operator cancelled
`parse_error`, `empty_response`	no — replay unlikely to differ

Tier 2 gates auto-retry on retryable, never on a coarse run_failed alone.
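The classification table maps directly onto a lookup. A minimal sketch follows, assuming that retry_class carries the error class verbatim and that retry_after_until is only attached to retryable failures; neither detail is fixed by this section, and `toRetryInfo` is a hypothetical name.

```typescript
type ErrorClass =
  | "auth_failed" | "rate_limited" | "context_too_long" | "model_not_found"
  | "provider_error" | "network_error" | "aborted" | "parse_error"
  | "empty_response" | "driver_not_bundled" | "package_load_failed"
  | "spend_limit_exceeded";

interface RetryInfo {
  retryable: boolean;
  retry_class: ErrorClass;      // assumption: the error class, unchanged
  retry_after_until?: string;   // absolute next-eligible time, when advised
}

// Only the transient classes auto-arm Tier 2.
const TIER2_RETRYABLE: ReadonlySet<ErrorClass> = new Set<ErrorClass>([
  "rate_limited",
  "provider_error",
  "network_error",
]);

function toRetryInfo(errorClass: ErrorClass, retryAfterUntil?: string): RetryInfo {
  const info: RetryInfo = {
    retryable: TIER2_RETRYABLE.has(errorClass),
    retry_class: errorClass,
  };
  if (info.retryable && retryAfterUntil !== undefined) {
    info.retry_after_until = retryAfterUntil;
  }
  return info;
}
```

A frontend loop would then check `retryable` and nothing else before scheduling another attempt.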

Auth model

Per ADR-0049, auth flows for non-catalog providers are owned by provider plugins; for catalog providers, credentials are injected via the catalog's env field plus kaged's existing apiKey resolution. The @kaged/llm package itself owns no provider-specific OAuth flow post-ADR-0049: the src/oauth/ module that landed under ADR-0028 migrates to the Antigravity plugin (and is the template for future OAuth plugins).

API-key resolution (in `@kaged/local-config`, not here)

For catalog providers using API keys, @kaged/local-config resolves the provider's credentials map. Each entry is either a literal value or an env variable name; the resolved credential is passed as route.apiKey to resolveModel and the loaded package's factory receives it. @kaged/llm itself never reads environment variables or config files directly.
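The literal-or-env-name rule can be sketched as follows. The real shape lives in @kaged/local-config and is not specified here, so the tagged-union entry type, the function names, and the choice to drop unresolved entries are all assumptions made for illustration.

```typescript
// Hypothetical entry shape: either a literal value or the name of an env variable.
type CredentialEntry =
  | { kind: "literal"; value: string }
  | { kind: "env"; name: string };

type Env = Record<string, string | undefined>;

function resolveCredential(entry: CredentialEntry, env: Env): string | undefined {
  return entry.kind === "literal" ? entry.value : env[entry.name];
}

// Resolve a provider's whole credentials map; unresolved entries are omitted
// (an assumption, the real resolver may instead report them as errors).
function resolveCredentials(
  map: Record<string, CredentialEntry>,
  env: Env,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const [key, entry] of Object.entries(map)) {
    const value = resolveCredential(entry, env);
    if (value !== undefined) resolved[key] = value;
  }
  return resolved;
}
```

Passing `env` in explicitly mirrors the boundary described above: the environment is read once by the config layer, and the resolved string is what travels on as route.apiKey.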

OAuth providers

A class of providers exists that the catalog names but for which kaged must run an OAuth flow against a consumer subscription (Claude Pro, ChatGPT Plus, GitHub Copilot, Antigravity, etc.). The terms of service for programmatic use of these subscriptions are a gray area that corporate vendors choose to avoid. kaged is operator-owned and self-hosted; whether to use an OAuth-backed personal subscription with kaged is a decision the operator makes about their own account and provider relationship.

Per ADR-0049, kaged makes this an explicit architectural slot. Two paths:

Catalog provider with OAuth-bearing package: the @ai-sdk/* package the catalog names handles bearer-token injection inline. Kaged resolves a fresh token (refreshed via the harness's existing token-resolution path) and passes it as route.apiKey. No plugin needed. Claude OAuth falls here (@ai-sdk/anthropic + bearer).
Provider plugin with own auth: for providers whose OAuth flow or wire requires custom logic absent from any stock package, the plugin owns its auth entirely (ADR-0028 PKCE, token store, refresh). Antigravity is the reference: its plugin owns the Google OAuth flow, the projectId resolution, the bearer-token refresh, and the wire-level envelope. The pre-ADR-0049 @kaged/llm/oauth/ module migrates into the Antigravity plugin (and serves as the template for future OAuth plugins such as Copilot and Codex).

The operator-owns-TOS-choice stance is normative; the architecture does not need to be revisited when new OAuth providers land.

OAuth token storage (unchanged)

For plugins that own their auth, the token storage shape defined in ADR-0028 still applies — file at $XDG_CONFIG_HOME/kaged/oauth/<provider>-tokens.json, Zod-validated, atomic writes. The storage location is shared across plugins; each plugin owns only its own provider's file. The schema (refresh token, access token, expiry, optional metadata) is unchanged.
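A write-to-temp-then-rename sketch of the atomic-write requirement follows. It is illustrative only: the field names (refreshToken, accessToken, expiresAt) are assumed rather than taken from ADR-0028, the hand-rolled type guard stands in for the Zod schema so the example has no dependencies, and all function names are hypothetical.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Assumed field names; the spec lists the fields but not their spelling.
interface OAuthTokens {
  refreshToken: string;
  accessToken: string;
  expiresAt: number; // epoch ms (assumption)
  metadata?: Record<string, unknown>;
}

function tokenFilePath(configHome: string, provider: string): string {
  return join(configHome, "kaged", "oauth", `${provider}-tokens.json`);
}

// Stand-in for the Zod schema: a minimal structural check.
function isOAuthTokens(value: unknown): value is OAuthTokens {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["refreshToken"] === "string" &&
    typeof record["accessToken"] === "string" &&
    typeof record["expiresAt"] === "number"
  );
}

// Atomic write: write a sibling temp file, then rename over the target.
// A rename within one directory replaces the file in a single step on POSIX.
function writeTokensAtomic(path: string, tokens: OAuthTokens): void {
  mkdirSync(dirname(path), { recursive: true });
  const tempPath = `${path}.${process.pid}.tmp`;
  writeFileSync(tempPath, JSON.stringify(tokens, null, 2), { mode: 0o600 });
  renameSync(tempPath, path);
}

function readTokens(path: string): OAuthTokens {
  const parsed: unknown = JSON.parse(readFileSync(path, "utf8"));
  if (!isOAuthTokens(parsed)) throw new Error(`invalid token file: ${path}`);
  return parsed;
}

// Usage: round-trip a token set through a throwaway config home.
function roundTripExample(): OAuthTokens {
  const configHome = mkdtempSync(join(tmpdir(), "kaged-oauth-"));
  const path = tokenFilePath(configHome, "antigravity");
  writeTokensAtomic(path, { refreshToken: "r", accessToken: "a", expiresAt: 1 });
  return readTokens(path);
}
```

Because each plugin writes only its own `<provider>-tokens.json`, the temp-file name needs to be unique per writer, not per provider.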

Mastra integration

Per ADR-0049, the LanguageModelV2 Mastra consumes is the same wrapped object the resolver returns — no translation shim between kaged and Mastra. ADR-0014's "LanguageModelV2 is the integration boundary" call is preserved and made more literal: the boundary object is now produced by the resolver + middleware directly, not by a kaged-authored shim translating kaged events to Vercel parts.

Public API

import { kagedModel } from "@kaged/llm/mastra";       // separate entry point
import type { LanguageModelV2 } from "@ai-sdk/provider";

/** Resolve a route to a middleware-wrapped LanguageModelV2 suitable for Mastra's Agent.model.
 *  Thin wrapper around resolveModel with a default ResolveContext provided by the daemon. */
function kagedModel(
  route: ProviderRoute,
  context?: ResolveContext,
): Promise<LanguageModelV2>;

kagedModel(route) is a thin async wrapper around resolveModel. It returns the wrapped LanguageModelV2 directly — the same object that drives the agent loop. There is no event-shape translation at this boundary; Mastra consumes Vercel parts natively.

Where translation still happens

Translation is no longer at the Mastra boundary; it is at the direct-call boundary. The kaged StreamEvent shape (§ Types) remains the type the daemon publishes to the UI via WebSocket and the type LlmEventStream yields. The convenience functions streamModel / completeModel (§ Provider resolution interface) drive the wrapped model's doStream / doGenerate and translate the resulting Vercel LanguageModelV2StreamParts into kaged's StreamEvent shape.

Mapping (Vercel → kaged), inside streamModel:

Vercel { type: "text-delta", textDelta } → kaged StreamEvent.text_delta
Vercel { type: "reasoning", textDelta } → kaged StreamEvent.thinking_delta
Vercel { type: "tool-call", toolCallId, toolName, args } → kaged StreamEvent.toolcall_end
Vercel { type: "finish", finishReason, usage } → kaged StreamEvent.done
Vercel { type: "error", error } → kaged StreamEvent.error

This is the inverse of the pre-ADR-0049 mapping direction (which translated kaged events → Vercel parts for the shim). The inversion is intentional: the loaded @ai-sdk/* packages own their wire format and produce Vercel parts natively; kaged normalizes to its own event shape only for the direct-call path that needs it.
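The mapping list can be sketched as one exhaustive switch. The Vercel part shapes below are copied from the list above; the kaged-side payload fields (delta, toolCallId, and so on) are placeholders, since the real StreamEvent shape is defined in § Types. Only error.errorMessage is taken from this spec, and `toStreamEvent` is a hypothetical name.

```typescript
// Vercel-side parts, as listed in the mapping above. `usage` is left opaque.
type VercelPart =
  | { type: "text-delta"; textDelta: string }
  | { type: "reasoning"; textDelta: string }
  | { type: "tool-call"; toolCallId: string; toolName: string; args: string }
  | { type: "finish"; finishReason: string; usage: unknown }
  | { type: "error"; error: unknown };

// kaged-side events. Field names other than error.errorMessage are assumptions.
type StreamEvent =
  | { type: "text_delta"; delta: string }
  | { type: "thinking_delta"; delta: string }
  | { type: "toolcall_end"; toolCallId: string; toolName: string; args: string }
  | { type: "done"; finishReason: string; usage: unknown }
  | { type: "error"; error: { errorMessage: string } };

function toStreamEvent(part: VercelPart): StreamEvent {
  switch (part.type) {
    case "text-delta":
      return { type: "text_delta", delta: part.textDelta };
    case "reasoning":
      return { type: "thinking_delta", delta: part.textDelta };
    case "tool-call":
      return {
        type: "toolcall_end",
        toolCallId: part.toolCallId,
        toolName: part.toolName,
        args: part.args,
      };
    case "finish":
      return { type: "done", finishReason: part.finishReason, usage: part.usage };
    case "error": {
      const message = part.error instanceof Error ? part.error.message : String(part.error);
      return { type: "error", error: { errorMessage: message } };
    }
  }
}
```

Because the switch covers every member of the union, adding a new part type to `VercelPart` makes the function fail to compile until a mapping is written for it.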

Why a separate entry point (`@kaged/llm/mastra`)

kagedModel is still exported from @kaged/llm/mastra rather than the main index for two reasons:

Mastra is optional at this boundary. A consumer that only wants the resolver + middleware (e.g. a future non-Mastra agent loop) imports resolveModel from the main index and drives the LanguageModelV2 itself. It does not need the kagedModel wrapper, which exists only to inject the daemon's default ResolveContext.
Type-only isolation. The main @kaged/llm exports don't depend on @ai-sdk/provider types at the surface (callers pass ProviderRoute and receive LlmEventStream). The @kaged/llm/mastra entry point pulls in LanguageModelV2 for the kagedModel return type. This keeps the dependency surface tight.

@kaged/harness imports @kaged/llm/mastra. The daemon's provider test endpoint imports the main @kaged/llm.

Integration with harness and daemon

Per ADR-0049 and agent.md, @kaged/llm is consumed two ways. The same resolver + middleware run in both:

Primary path — Mastra agent loop

daemon (handlePostMessage)
  → harness (runPrimary)
  → harness (routeModel → ProviderRoute)
  → @kaged/llm.kagedModel(route, daemonResolveContext) → Promise<LanguageModelV2>
     ↳ resolveModel: catalog lookup → bundled-driver registry lookup → instantiate → wrap with middleware
  → Mastra (new Agent({ model: <wrapped LanguageModelV2> }).stream(messages))
  → Mastra calls LanguageModelV2.doStream(opts) on the wrapped model
  → middleware runs (capture, spend gate, retry, langfuse, usage event)
  → underlying @ai-sdk/* package or plugin handles the wire
  → Mastra emits ChunkType on fullStream
  → harness maps ChunkType → WsFrame
  → daemon publishes WsFrame to session subscribers

The agent loop, tool dispatch, supervisor / sub-agent topology, Processor pipeline, and suspend / resume checkpoints are all Mastra's responsibility. @kaged/llm is the resolver + middleware Mastra's Agent.model calls into.

Direct path — provider test, ad-hoc calls (no agent loop)

daemon (handleTestProvider, etc.)
  → @kaged/llm.completeModel(route, context, options)
     ↳ resolveModel → drive doGenerate → translate Vercel parts → kaged StreamEvents
  → returns AssistantMessage

The provider test endpoint and any future "I just want to ping the provider" call path uses completeModel / streamModel directly. Same resolver, same middleware, same underlying LanguageModelV2 — the convenience functions just add the Vercel→kaged event translation the direct path needs.

Why the same code in both paths

Per ADR-0049, kaged maintains one resolution + middleware path, not two. The middleware (capture, spend gate, retry, Langfuse, usage event) applies identically in both paths because it wraps the model before any caller sees it. The loaded @ai-sdk/* package or provider plugin handles the wire in both paths. Custom headers, retry policy, telemetry, spend tracking — all live in the middleware layer and apply uniformly.
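The "wrap once, before any caller sees it" property can be illustrated with a toy composition. This is a deliberately simplified sketch: the real stack wraps an async LanguageModelV2 via wrapLanguageModel, whereas here a model is a plain synchronous function, the layers only record their names, and the nesting order shown (first listed is outermost) is an assumption rather than the documented stack order.

```typescript
// Toy stand-ins: a "model" is a function, a layer wraps one model in another.
type Call = (prompt: string) => string;
type Layer = (next: Call) => Call;

// The first layer in the list becomes the outermost wrapper.
function wrapModel(base: Call, layers: Layer[]): Call {
  return layers.reduceRight<Call>((next, layer) => layer(next), base);
}

// A layer that records its name, then delegates.
function tracingLayer(name: string, trace: string[]): Layer {
  return (next) => (prompt) => {
    trace.push(name);
    return next(prompt);
  };
}

// Layer names follow the list in the paragraph above.
const LAYER_NAMES = ["capture", "spend-gate", "retry", "langfuse", "usage-event"];

function buildWrappedModel(base: Call, trace: string[]): Call {
  return wrapModel(base, LAYER_NAMES.map((name) => tracingLayer(name, trace)));
}
```

Both call paths receive the value returned by `buildWrappedModel`, so neither can reach the base model without passing through every layer.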

The daemon's role is to provide the ResolveContext (catalog snapshot, operator overrides) and to wire kagedModel into Mastra's Agent.model field. The install-gating callback is gone along with the on-disk store (see § Amendments, 2026-06-29). The daemon does not duplicate middleware logic.

Package structure

Post-ADR-0049, the package loses its provider-specific code (adapters, SSE parser, partial-JSON parser, OAuth module; all migrate to plugins or are deleted) and gains the resolver, the bundled-driver registry, and the middleware stack.

packages/llm/
  package.json
  tsconfig.json
  src/
    index.ts                 # public API: resolveModel, streamModel, completeModel, types re-export
    mastra.ts                # separate entry point: kagedModel (resolveModel + daemon-default context)
    types.ts                 # Message, Context, Tool, StreamEvent, Usage, etc. (unchanged boundary types)
    stream.ts                # LlmEventStream class (AsyncIterable + result()) — used by direct-call path
    route.ts                 # ProviderRoute, ResolveContext, StreamOptions, EffortLevel

    # Catalog layer (replaces provider-map.ts + hand driver catalog)
    catalog/
      snapshot.ts            # CatalogSnapshot, CatalogProvider, CatalogModel types
      load.ts                # loadCatalog(), cache
      lookup.ts              # listProviders, listModels, lookupProvider, lookupModel, resolvePackageName
      

    # Model metadata (LiteLLM base swapped for models.dev; ADR-0026 override pipeline preserved)
    model-meta.ts            # ModelMeta type, lookupModelMeta, resolveModelMeta, calculateCost
    estimate-tokens.ts       # estimateTokens (ADR-0024; unchanged)
    overrides.ts             # ModelOverride type, packageOverride merge logic (ADR-0026 extension)

    # Resolver + bundled-driver registry + middleware (the core)
    resolve/
      resolve-model.ts       # resolveModel: catalog lookup → bundled-driver registry lookup → instantiate → wrap
      bundled-drivers.ts     # static registry of the 14 @ai-sdk drivers (loadDriver, isBundledDriver, bundledDriverPackages)
    middleware/
      index.ts               # wrapLanguageModel + middleware stack ordering
      capture.ts             # x-litellm-model-id injection
      spend-gate.ts          # ADR-0026 spend-limit gate (hard block)
      retry.ts               # retry/fallback with exponential backoff + Retry-After honoring
      langfuse.ts            # MODEL_GENERATION span (no-op when Langfuse disabled)
      usage-event.ts         # ADR-0026 spend event write + provider_usage_cache invalidation

    # Convenience wrappers over resolveModel for the direct-call path
    dispatch.ts              # streamModel/completeModel: resolveModel → drive doStream/doGenerate → map Vercel→StreamEvent

    # Plugin contract types (first-party plugins are compiled into the daemon — these are the types they implement)
    plugin-types.ts          # ProviderPlugin, PluginResolveContext

    # Usage fetchers (catalog-provider quota endpoints; plugin fetchers travel with their plugins)
    usage-types.ts           # UsageReport, UsageLimit, UsageWindow, UsageFetchOptions, etc.
    usage/
      zai.ts                 # Z.AI coding plan quota (core; plugin-eligible later)

    # Mastra shim
    mastra-model.ts          # kagedModel implementation (thin wrapper; re-exported via mastra.ts)

    # Bundled catalog snapshot (replaces data/litellm-pricing.json)
    data/
      catalog.json           # bundled models.dev snapshot (operator-refreshed; bundled in the image)
      manifest.json          # { source_commit, fetched_at, schema_version } — provenance

  __tests__/
    types.test.ts
    route.test.ts
    catalog/
      load.test.ts           # catalog snapshot loading
      lookup.test.ts         # listProviders/listModels/lookupProvider/lookupModel/resolvePackageName
      
    resolve/
      resolve-model.test.ts  # resolver algorithm: catalog hit/miss, bundled-driver lookup, driver_not_bundled, override, middleware wrap
    middleware/
      capture.test.ts
      spend-gate.test.ts
      retry.test.ts          # exponential backoff, jitter, Retry-After, abort handling
      langfuse.test.ts
      usage-event.test.ts
    dispatch.test.ts         # streamModel/completeModel: resolveModel mock + Vercel→StreamEvent mapping
    model-meta.test.ts       # lookupModelMeta, resolveModelMeta, calculateCost, packageOverride merge
    estimate-tokens.test.ts  # unchanged
    plugin-types.test.ts     # ProviderPlugin contract shape
    usage/
      zai.test.ts
    mastra-model.test.ts     # kagedModel: resolveModel mock + daemon-default context

What was deleted

The following packages/llm/src/ files are removed by this amendment (their responsibilities migrate to plugins or to the new resolver/middleware layout):

provider-map.ts — replaced by the catalog snapshot.
dispatch.ts (in old form) — replaced by the resolver + the new thin dispatch.ts for the direct-call path.
models.ts (in old form) — replaced by catalog/lookup.ts. The old listModels() (live API fetcher) is deprecated and removed.
sse-parser.ts — deleted; the loaded @ai-sdk/* packages own their SSE parsing.
partial-json.ts — deleted; same reason.
data/litellm-pricing.json — replaced by data/catalog.json (models.dev snapshot).
providers/anthropic.ts, openai-completions.ts, openai-responses.ts, google.ts, antigravity.ts, codex/*, copilot/* are all deleted. Catalog providers resolve their @ai-sdk/* packages from the bundled-driver registry; Antigravity (and Codex, Copilot if they remain non-catalog) become provider plugins.
oauth/* — migrates wholesale into the Antigravity plugin (and serves as the template for future OAuth plugins). The @kaged/llm/oauth/ entry point is removed; plugins own their auth.
usage/antigravity.ts — migrates into the Antigravity plugin's fetchUsage.
antigravity-schema.ts, gemini-schema.ts — deleted; schema normalization is owned by the loaded packages or the Antigravity plugin.
providers/retry.ts (old form) — replaced by middleware/retry.ts.

The corresponding __tests__/ files are deleted alongside their sources. New tests follow the new layout above.

Testing notes

Post-ADR-0049, the test surface shifts from per-provider adapter tests (mocking fetch to return canned SSE) to resolver, middleware, and catalog tests. The testing posture:

Resolver tests (resolve/resolve-model.test.ts): mock the catalog (catalog snapshot fixtures) and inject a fake loadDriver returning a fake bundled-driver module. Assert the resolution algorithm step-by-step: catalog hit/miss, bundled-driver registry lookup, packageOverride precedence, custom-provider (npmPackage + baseUrl) riding a bundled driver, instantiation, middleware wrapping. The not-bundled path verifies the driver_not_bundled error; the already-aborted-signal path verifies abort.
Middleware tests (one per layer, under middleware/): each layer is independently testable.
- capture.test.ts: assert x-litellm-model-id is set on the outbound request for catalog and plugin models.
- spend-gate.test.ts: given current spend near/exceeding a limit, assert the gate blocks the call with the correct error class (spend_limit_exceeded) and message (which limit, current spend, reset time). Given current spend under the limit, assert the call proceeds.
- retry.test.ts: feed transient errors (429, 503, network errors), assert retry attempts with exponential backoff and jitter; honor Retry-After headers; abort the in-flight attempt and prevent further retries when AbortSignal fires.
- langfuse.test.ts: assert a MODEL_GENERATION span opens and closes around the call; assert it's a no-op when Langfuse is disabled in config.
- usage-event.test.ts: assert a provider_spend_events row is written with the correct cost (via calculateCost from the model's metadata); assert the provider_usage_cache row for the provider is invalidated.
Dispatch tests (dispatch.test.ts): mock resolveModel to return a fake LanguageModelV2 whose doStream yields canned Vercel parts. Assert streamModel translates them into the correct kaged StreamEvent sequence and final AssistantMessage shape.
Error tests: feed errors from the underlying model (401, 429, 500, network), assert the middleware's retry layer classifies them onto kaged's error taxonomy (§ Error taxonomy). Assert resolver-originated errors (driver_not_bundled, package_load_failed) surface as the same StreamEvent shape.
Abort tests: fire AbortController.abort() mid-stream, assert aborted event and partial AssistantMessage.
Catalog tests (catalog/load.test.ts, catalog/lookup.test.ts): given a fixture catalog JSON, assert loadCatalog parses it correctly; assert listProviders, listModels, lookupProvider, and lookupModel return the fixture's entries; assert resolvePackageName honors precedence (override → model-level npm → provider-level npm).
Model metadata tests (model-meta.test.ts): the pre-ADR-0049 test cases survive with the source swapped from LiteLLM to models.dev — lookupModelMeta returns the correct ModelMeta from the bundled snapshot, unknown-model returns null, capability/pricing extraction, calculateCost with/without metadata, reasoning fallback. Plus a new case: resolvePackageName honoring packageOverride.
Plugin contract tests (plugin-types.test.ts): the ProviderPlugin interface is structural; tests assert a reference plugin (Antigravity) satisfies it and that the resolver's plugin-lookup path (provider resolves to a registered compiled-in plugin → call createModel) works end-to-end with a fixture plugin module.
Mastra shim test (mastra-model.test.ts): construct via kagedModel(route, context) with a mock ResolveContext. Assert it returns the wrapped LanguageModelV2 produced by resolveModel. There is no event-shape translation at this boundary anymore — the test verifies the wrap, not a mapping.

All tests use bun:test. No live provider calls in unit tests. Integration tests against real providers are manual/operator-initiated (not in CI).

Open questions

Architectural questions are closed. The implementation-phase questions that remain are surfaced here so reviewers know what is not yet decided at the code level (none block the spec being the contract):

✅ ~~OAuth stored in local config by extending ProviderSchema~~ — resolved pre-ADR-0049. Post-ADR-0049 amendment: OAuth flows owned by provider plugins, not by @kaged/llm core. See § Auth model.
✅ ~~Streaming: dual call path~~ — preserved through ADR-0049; primary via Mastra (wrapped LanguageModelV2), direct via streamModel / completeModel. See § Integration with harness and daemon.
✅ ~~Package scope~~ — resolver + loader + middleware (post-ADR-0049); not a general-purpose LLM framework.
✅ ~~Tool surface~~ — kaged Tool shape (JSON Schema parameters), unchanged.

Implementation-phase (not spec-blocking):

Antigravity integration delivered. Implemented on 2026-06-29 as a native module in @kaged/llm/src/antigravity/ (a fetch-wrapper over the bundled @ai-sdk/google driver, compiled into the daemon), not as a separate plugin repo. Wire logic is a native reimplementation informed by opencode-antigravity-auth. See § Amendments (2026-06-29) and § Provider plugin contract.
Compiled-in plugin registration seam. How a first-party plugin registers into the resolver's compiled-in plugin registry (the analogue of the bundled-driver registry) is an implementation detail to settle when the first plugin lands. No dynamic-import passthrough is needed — there is no store (per § Amendments 2026-06-29).
Driver set already bundled. The 14 @ai-sdk/* drivers are statically imported in packages/llm/src/resolve/bundled-drivers.ts and compiled in by bun build --compile; no pre-seed/image step (the Dockerfile store COPY was removed).
Per-provider disposition for non-catalog OAuth providers (Copilot, Codex). Deferred per ADR-0049 Q5 — single-user pre-alpha, no compatibility surface to manage. Decided per-provider when added.

Already delivered (no longer open):

✅ kaged-models repo + models.kaged.dev publish pipeline — shipped alongside ADR-0049 acceptance. Live at models.kaged.dev with /api.json, /models.json, /catalog.json, /manifest.json, /logos/{provider}.svg. See ADR-0049 §5 for the endpoint table.

Amendments

2026-06-29 — Antigravity native module implemented in @kaged/llm

The Antigravity integration is implemented as a native module in @kaged/llm/src/antigravity/ (13 files), not as a separate plugin repo or a dynamically-loaded module. It is a fetch-wrapper over the bundled @ai-sdk/google driver, compiled into the daemon binary. The module includes: OAuth (PKCE + token exchange + refresh + projectId discovery), request transform (envelope wrap + URL rewrite + header swap + tool-schema sanitization + thinking config + thought-signature caching for Claude multi-turn tools), response transform (SSE .response unwrap), quota protection (blocks when usage exceeds 90%), and a createModel(route) factory registered in the bundled-driver registry under "antigravity". The daemon resolves OAuth tokens from $KAGED_HOME/oauth/antigravity-tokens.json via ensureFreshTokens(kagedHome) in resolveModelRoute. Login flow via POST /api/v1/local/providers/antigravity/auth/start (temp Bun.serve on port 51121 for the Google callback) + GET .../auth/status. The § Antigravity reference plugin section above is rewritten to match. Multi-account rotation and Gemini-CLI quota fallback dropped (single-account with quota protection is v0 scope). Wire logic is a native reimplementation informed by opencode-antigravity-auth (MIT, by Jens).

2026-06-29 — ADR-0049 amended: bundled drivers replace the on-disk store; provider plugins compiled-in

The on-disk provider store ($KAGED_HOME/providers, installed.json, installProvider/bun add, absolute-path entry resolution, opaque dynamic import) was retired. A bun build --compile daemon cannot resolve a dynamically-imported on-disk module's transitive bare-specifier dependencies (the Cannot find module '@ai-sdk/provider-utils' failure, confirmed live; oven-sh/bun#27058), so the store could never load a provider in the container. See ADR-0049 § Amendments 2026-06-29.

Changes to this spec:

§ Bundled drivers replaces § Provider store + § Install flow: the standard @ai-sdk/* drivers (the 14 in the factory map) are statically imported and compiled into the daemon binary; resolveModel looks packages up in a bundled-driver registry, not on disk. @ai-sdk/provider bumped from v3/v5 references to v4 to match the bundled drivers and ai@7.
§ Resolution algorithm rewritten: catalog lookup → bundled-driver registry lookup → instantiate → middleware. The storeCheck/entryPath/dynamicImport/requestInstall deps and the package_not_installed error are removed; ResolveModelDeps now carries loadDriver + wrapMiddleware, and the resolver-originated error is driver_not_bundled.
§ Provider plugin contract: the export contract (createModel / catalogContribution / fetchUsage / authModes) is unchanged, but plugins are now compiled into the daemon (registered in a compiled-in plugin registry), not loaded from $KAGED_HOME/providers. The @ai-sdk/provider-v5 type import is corrected to @ai-sdk/provider (v4). A general operator-installable third-party plugin loader is deferred.
§ Antigravity reference plugin: clarified that Antigravity is a fetch-wrapper over the bundled @ai-sdk/google driver (custom fetch injecting OAuth + the Antigravity envelope), not a standalone driver; its source lives in kaged-plugin-antigravity with opencode-antigravity-auth vendored, compiled into the daemon.

Shipped in daemon v0.2.4.
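The registry lookup this amendment describes can be sketched as a static map. The names loadDriver, isBundledDriver, and bundledDriverPackages appear in § Package structure, but their signatures are not given there, so the ones below are assumptions. The stub drivers stand in for statically imported @ai-sdk/* modules so the example runs without those packages, and only two of the 14 entries are shown.

```typescript
// Stand-in for a driver module's surface; the real modules export provider factories.
interface DriverModule {
  createModel(modelId: string): { provider: string; modelId: string };
}

type LoadResult =
  | { ok: true; driver: DriverModule }
  | { ok: false; errorClass: "driver_not_bundled"; errorMessage: string };

function stubDriver(provider: string): DriverModule {
  return { createModel: (modelId) => ({ provider, modelId }) };
}

// In the daemon, each value would be a statically imported module so that
// `bun build --compile` includes it in the binary.
const BUNDLED_DRIVERS = new Map<string, DriverModule>([
  ["@ai-sdk/openai", stubDriver("openai")],
  ["@ai-sdk/google", stubDriver("google")],
]);

function isBundledDriver(packageName: string): boolean {
  return BUNDLED_DRIVERS.has(packageName);
}

function bundledDriverPackages(): string[] {
  return Array.from(BUNDLED_DRIVERS.keys());
}

function loadDriver(packageName: string, providerName: string): LoadResult {
  const driver = BUNDLED_DRIVERS.get(packageName);
  if (driver !== undefined) return { ok: true, driver };
  return {
    ok: false,
    errorClass: "driver_not_bundled",
    // Package name + provider name + plugin hint, per the error taxonomy.
    errorMessage:
      `${packageName} is not a bundled driver (provider "${providerName}"); ` +
      `a provider plugin may be required`,
  };
}
```

Because the map is built from static values, there is no file-system access and no dynamic import anywhere on the lookup path.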

2026-06-26 — Provider store implementation: route/npmPackage, ResolveModelDeps, install storeRoot

Implementation-phase amendment landing alongside the ADR-0049 provider-store code:

ProviderRoute gains npmPackage and packageOverride. npmPackage carries the provider-level npm package name from local config (used for custom providers and to override the catalog's provider-level npm). packageOverride is a route-level per-model override for callers that have already resolved it.
Package resolution precedence clarified. The resolver now evaluates overrides in this order: route.packageOverride → context.overrides packageOverride → route.npmPackage → catalog model-level npm → catalog provider-level npm.
ResolveModelDeps documented. The resolver accepts injectable storeCheck, entryPath, dynamicImport, and wrapMiddleware hooks for unit testing. storeCheck and entryPath may return Promises.
InstallProviderOptions gains storeRoot. The install flow needs the absolute path to $KAGED_HOME/providers; the caller (daemon) provides it.
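The package-resolution precedence listed in this amendment is a first-defined-wins chain. A minimal sketch follows, with a hypothetical flattened input type; the real resolvePackageName reads these values from the route, the ResolveContext, and the catalog rather than from a single object.

```typescript
// Hypothetical flattened view of every place a package name can come from.
interface PackageSources {
  routePackageOverride?: string;   // route.packageOverride
  contextPackageOverride?: string; // context.overrides packageOverride
  routeNpmPackage?: string;        // route.npmPackage
  catalogModelNpm?: string;        // catalog model-level npm
  catalogProviderNpm?: string;     // catalog provider-level npm
}

// Returns the first defined source, in the documented order,
// or undefined when nothing names a package.
function pickPackageName(sources: PackageSources): string | undefined {
  return (
    sources.routePackageOverride ??
    sources.contextPackageOverride ??
    sources.routeNpmPackage ??
    sources.catalogModelNpm ??
    sources.catalogProviderNpm
  );
}
```

Using `??` rather than `||` means only a missing value falls through; whether an empty string should count as missing is left open here.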

2026-06-26 — ADR-0049: finalize provider model metadata + tokenizer

Per the ADR-0049 provider cleanup:

driver field removed from ProviderRoute. @kaged/llm no longer accepts a driver on the route; the daemon resolves providers by catalog_id and npm.
tokenizer made optional on ModelMeta. The bundled catalog may now declare an explicit tokenizer per model; when absent, @kaged/llm derives one from the provider key (tiktoken/gemini/llama/unknown). The estimator uses tiktoken when declared/derived, otherwise the conservative character-count fallback.
Provider usage fetcher deferred. The legacy DRIVER_FETCHER_MAP and fetchZaiUsage wiring are removed from @kaged/llm. Usage fetching is now an open topic; future work will key fetchers to catalog entries.

2026-06-25 — ADR-0049: provider store, dynamic loading, middleware seam (major rewrite)

Driven by ADR-0049 (Accepted 2026-06-25). This is a structural rewrite of the spec — not a narrow amendment. The durable surface (boundary types, token estimation, the dual call path) is preserved; the provider layer is replaced wholesale.

Architectural shift. @kaged/llm transitions from a hand-adapter provider interface (six adapters, SSE parser, partial-JSON parser, OAuth module) to a resolver + loader + middleware that dynamically loads @ai-sdk/* packages (or kaged-convention provider plugins) from an operator-local store at $KAGED_HOME/providers. The loaded module yields a LanguageModelV2; kaged wraps it in provider-agnostic middleware (capture, spend gate, retry, Langfuse, usage event) before exposing it to Mastra or the direct-call path.

Sections rewritten or replaced.

§ Purpose reframed: from "talks to LLM providers" to "resolves, loads, and wraps LLM provider modules." "Not normative for" list extended to cover provider wire protocols (now owned by loaded packages/plugins) and the models.dev catalog itself (owned by kaged-models).
§ Constraints table updated: removed the "no official SDKs / pure fetch-based" constraint (reversed by ADR-0049); added the store + dynamic-load + middleware + bundled-catalog + operator-override constraints. § Design rationale deleted (it justified the hand-adapter stance that's now reversed).
§ Provider adapter interface → § Provider resolution interface. New primary entry resolveModel(route, context) → Promise<LanguageModelV2> with the resolution algorithm documented step-by-step (catalog lookup → store check → install gate → opaque dynamic import by absolute path → instantiate → wrap with middleware). streamModel / completeModel retained as conveniences that internally call resolveModel and add Vercel→StreamEvent translation.
§ API shape resolution + § Driver catalog → § Catalog-driven provider resolution. The hand-maintained providerName → ApiShape map, the hardcoded driver table, and the DriverInfo / DriverAuthMode / listDrivers() / knownProviders() / resolveApiShape() / getDefaultBaseUrl() / getDriverTestModel() surface is deleted. Provider and model metadata flows from the bundled models.dev catalog snapshot via CatalogProvider / CatalogModel types and read-only helpers (loadCatalog, listProviders, listModels, lookupProvider, lookupModel, resolvePackageName).
§ Model discovery. listModels() is deprecated and removed. The catalog snapshot is the primary source. Custom providers use the model discovery flow.
§ Model metadata catalog. Base source swapped: LiteLLM model_prices_and_context_window.json → vendored models.dev snapshot. The ModelMeta type is preserved (the litellmProvider field name is kept for caller compatibility; semantics now "catalog provider"). Key normalization simplified — models.dev uses "provider/modelId" natively, matching kaged's convention cleanly. The ADR-0026 override pipeline is preserved unchanged.
§ Operator overrides. New packageOverride field added per ADR-0049 Q6: a per-model npm package override, stored in the ADR-0026 model_overrides table and honoured by resolvePackageName in the order operator override → catalog model-level npm → catalog provider npm.
§ Provider adapter contract → § Provider store + § Install flow + § Middleware stack + § Provider plugin contract. The six hand adapters (anthropic-messages, openai-completions, openai-responses, openai-codex-responses, google-generative-ai, antigravity) and their per-adapter documentation are deleted. The store layout ($KAGED_HOME/providers/package.json, bun.lock, node_modules/, installed.json) is documented. The install flow (installProvider / uninstallProvider shelling out to bun add / bun remove) is documented. The middleware stack (capture / spend-gate / retry / Langfuse / usage-event layers, applied via wrapLanguageModel) is documented. The provider plugin contract (ProviderPlugin interface with createModel / catalogContribution / optional fetchUsage / authModes) is documented, with Antigravity as the reference plugin.
§ SSE parser + § Partial-JSON parser — DELETED. Hand-adapter infrastructure. The loaded @ai-sdk/* packages own their SSE parsing; provider plugins own theirs. Kaged no longer parses SSE or partial JSON in core.
§ Error taxonomy. The nine pre-ADR-0049 error classes (auth_failed through empty_response) survive — they originate in loaded packages/plugins and are caught + classified by the middleware. Three new classes added: package_not_installed, package_load_failed (resolver-originated), and spend_limit_exceeded (middleware-gate-originated).
§ Retry policy. Defaults unchanged. Location changed: lives in the middleware stack (middleware/retry.ts), not in per-adapter code. Applied uniformly to every resolved model.
§ Auth model. OAuth flows owned by provider plugins for non-catalog providers (Antigravity is the reference; the pre-ADR-0049 @kaged/llm/oauth/ module migrates into the Antigravity plugin). Catalog providers using OAuth ride the catalog-named @ai-sdk/* package with kaged-resolved bearer tokens. The operator-owns-TOS-choice stance is normative; the architecture no longer needs revisiting when new OAuth providers land.
§ Mastra integration. The LanguageModelV2 shim largely collapses. kagedModel(route, context) becomes a thin async wrapper around resolveModel — it returns the wrapped LanguageModelV2 directly. No event-shape translation at the Mastra boundary anymore; Mastra consumes Vercel parts natively. Translation moves to the direct-call boundary (streamModel / completeModel map Vercel parts → kaged StreamEvents for the daemon's WebSocket relay and the LlmEventStream type).
§ Integration with harness and daemon. Both call paths redrawn: the primary path goes through resolveModel → wrapped LanguageModelV2 → Mastra; the direct path goes through resolveModel → streamModel translation. Same resolver, same middleware.
§ Package structure. Radical rewrite. New layout under src/ for catalog/, resolve/, store/, middleware/, plugin-types.ts. Old provider-map.ts, dispatch.ts (old form), models.ts (old form), sse-parser.ts, partial-json.ts, data/litellm-pricing.json, providers/* (all six adapters + Codex + Copilot), oauth/*, usage/antigravity.ts, antigravity-schema.ts, gemini-schema.ts, providers/retry.ts are deleted. Corresponding __tests__/ files deleted alongside.
§ Testing notes. Test surface shifts from per-provider adapter tests (mocking fetch to return canned SSE) to resolver tests, middleware tests (one per layer), catalog tests, dispatch tests, install-flow tests, and plugin-contract tests. Specific test cases documented per layer.
§ Token estimation — unchanged. The estimator consumes ModelMeta; the source swap (LiteLLM → models.dev) is transparent to it.
§ Types (Message / Context / Usage / StreamEvent) — unchanged. These are the boundary types Mastra and the daemon consume.
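The packageOverride precedence noted under § Operator overrides in this amendment (operator override → catalog model-level npm → catalog provider npm) can be sketched as below. This is a minimal illustration, not the shipped resolvePackageName: the three interface shapes and the function name are hypothetical, and only the npm and packageOverride field names come from this spec.

```typescript
// Hypothetical shapes for illustration; only `npm` and `packageOverride`
// are field names taken from the spec text.
interface CatalogProviderLike { npm?: string }
interface CatalogModelLike { npm?: string }
interface ModelOverrideLike { packageOverride?: string }

// Precedence: operator override, then model-level npm, then provider-level npm.
// Returns null when no layer names a package.
function resolvePackageNameSketch(
  provider: CatalogProviderLike,
  model: CatalogModelLike,
  override?: ModelOverrideLike,
): string | null {
  return override?.packageOverride ?? model.npm ?? provider.npm ?? null;
}

// A provider that defaults to the OpenAI-compatible package, with one model
// pinned to @ai-sdk/openai by the operator.
const picked = resolvePackageNameSketch(
  { npm: "@ai-sdk/openai-compatible" },
  {},
  { packageOverride: "@ai-sdk/openai" },
);
console.log(picked);
```

Nullish coalescing keeps the three layers in one expression, so adding a further layer later would be a one-token change.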

Constrained-by list. Added ADR-0049; also added ADR-0041 (relevant to the image pre-seed + bundled catalog + bun-on-PATH requirements). Removed ADR-0014 — it's now Superseded (partial) by ADR-0049.

Implementation order (per ADR-0003 doc-first → TDD). This spec amendment lands first. Code follows via TDD: failing tests against the new resolver/middleware contract, then an implementation that deletes the hand adapters, parsers, and OAuth module and lands the resolver, loader, and middleware. The STATUS.md "Additional OAuth provider configs" gap is superseded by the Antigravity-plugin migration.

Known tech debt resolved by this amendment. The openai-responses reasoning-capture gap (STATUS.md § Known tech debt, also documented in the 2026-06-03 amendment below) resolves by deletion — the stub adapter is gone, and @ai-sdk/openai parses reasoning output correctly.

2026-06-03 — `openai-responses` reasoning capture noted as a known gap

openai-responses adapter documented as a reasoning stub. The v0 adapter list now records that the generic openai-responses adapter requests reasoning but never parses reasoning output events (response.reasoning_summary_text.* / reasoning items), so reasoning content is dropped for any reasoning model routed through it. This is captured as tech debt in STATUS.md; the openai-codex-responses adapter is the reference implementation for closing it. No code change accompanies this amendment — it documents existing behavior surfaced during the reasoning-ordering fixes across the streaming adapters.

2026-05-31 — GitHub Copilot driver + device-code OAuth flow

New copilot driver added. GitHub Copilot joins the driver catalog as an openai-completions provider with default base URL https://api.githubcopilot.com, auth mode oauth, and default test model gpt-4o.
Device-code OAuth flow added. @kaged/llm/oauth now supports providers that authenticate via device code instead of a PKCE browser redirect. ProviderOAuthConfig gains an optional deviceCode configuration, and login-start results can return userCode / verificationUri for daemon→UI relay.
Copilot-specific request headers documented. Requests may add X-Initiator, Copilot-Vision-Request, and Openai-Intent dynamically based on the conversation and image input.
Enterprise URL resolution documented. Copilot keeps the public default host (api.githubcopilot.com) but can derive enterprise hosts as https://copilot-api.{ghe-domain} when the operator authenticated against GitHub Enterprise.
Post-login model policy activation added. Copilot runs a best-effort post-login hook that enables known models requiring policy acceptance before first use.
Token exchange behavior documented. Login stores the long-lived GitHub OAuth token. At request time, the OpenAI-compatible adapter exchanges it against GET https://api.github.com/copilot_internal/v2/token to obtain the short-lived Copilot API bearer token.
Catalog tables updated. Both the API-shape resolution table and the v0 driver catalog now include copilot.
Package structure updated. Added Copilot provider constants/helpers (src/providers/copilot/) plus device-code OAuth flow support in src/oauth/device-code.ts and matching tests under __tests__/oauth/ and __tests__/providers/copilot/.
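The enterprise URL rule in this amendment (public default host, or https://copilot-api.{ghe-domain} after a GitHub Enterprise login) could look roughly like this. The function name, the gheDomain parameter, and the tolerance for a pasted URL are assumptions made for illustration; the spec text only fixes the two URL forms.

```typescript
const COPILOT_PUBLIC_BASE_URL = "https://api.githubcopilot.com";

// `gheDomain` is a hypothetical parameter: the GitHub Enterprise host the
// operator authenticated against, or undefined for github.com logins.
function copilotBaseUrlSketch(gheDomain?: string): string {
  if (!gheDomain) return COPILOT_PUBLIC_BASE_URL;
  // Assumption: tolerate a pasted URL by dropping the scheme and trailing slashes.
  const host = gheDomain.replace(/^https?:\/\//, "").replace(/\/+$/, "");
  return `https://copilot-api.${host}`;
}

console.log(copilotBaseUrlSketch());
console.log(copilotBaseUrlSketch("example.ghe.com"));
```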

2026-05-27 — ADR-0024: `estimateTokens` API for pre-call compaction threshold check

Per ADR-0024:

New § Token estimation added (under § ModelMeta). Documents estimateTokens() — the function the harness calls before every LLM call to compute the current token usage and compare against the agent's configured compaction threshold.
EstimateInput and EstimateResult types defined. Inputs: messages, system prompt, model metadata, optional reserved output budget. Outputs: input tokens, reserved output tokens, total, fraction-of-context-window, context window size, algorithm used.
Algorithm selection. Tiktoken when the model uses an OpenAI-compatible tokenizer; character-count fallback otherwise. The estimator is biased to over-estimate rather than under-estimate; the reactive fallback in the harness catches the cases where it still under-estimates.
tokenizer field added to ModelMeta. Values: "tiktoken" | "gemini" | "llama" | "unknown". Used by the estimator to choose the algorithm; populated from the bundled LiteLLM snapshot where determinable, defaults to "unknown".
Unknown-model handling. When lookupModelMeta returns null (model not in catalog), the estimator falls back to a 32k conservative default for contextWindow and emits a per-call warning. Operators with unknown models will eventually have a local-config override path (future work; not v0).
Constrained-by list extended with ADR-0024.
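To make the character-count fallback and the unknown-model default concrete, here is a hedged sketch. It is not the shipped estimateTokens: the type names, field names, the 3-characters-per-token ratio, and the exact value used for "32k" are all assumptions chosen for illustration. The result fields mirror the outputs listed above (input tokens, reserved output tokens, total, fraction of context window, context window size, algorithm).

```typescript
// Hypothetical input/result shapes modelled on the EstimateInput /
// EstimateResult description; field names are assumptions.
interface EstimateSketchInput {
  texts: string[];               // message bodies plus system prompt, flattened
  contextWindow: number | null;  // null when the model is not in the catalog
  reservedOutputTokens?: number;
}

interface EstimateSketchResult {
  inputTokens: number;
  reservedOutputTokens: number;
  totalTokens: number;
  fractionOfContextWindow: number;
  contextWindow: number;
  algorithm: "char-count";
}

const FALLBACK_CONTEXT_WINDOW = 32000; // "32k conservative default"; exact value assumed
const CHARS_PER_TOKEN = 3;             // assumed ratio, set low so the estimate errs high

function estimateTokensSketch(input: EstimateSketchInput): EstimateSketchResult {
  const chars = input.texts.reduce((sum, text) => sum + text.length, 0);
  // Round up so short inputs never estimate to zero tokens.
  const inputTokens = Math.ceil(chars / CHARS_PER_TOKEN);
  const reservedOutputTokens = input.reservedOutputTokens ?? 0;
  if (input.contextWindow === null) {
    console.warn("model not in catalog; assuming a 32k context window");
  }
  const contextWindow = input.contextWindow ?? FALLBACK_CONTEXT_WINDOW;
  const totalTokens = inputTokens + reservedOutputTokens;
  return {
    inputTokens,
    reservedOutputTokens,
    totalTokens,
    fractionOfContextWindow: totalTokens / contextWindow,
    contextWindow,
    algorithm: "char-count",
  };
}

const estimate = estimateTokensSketch({
  texts: ["You are a helpful agent.", "Summarise the open pull requests."],
  contextWindow: null,
  reservedOutputTokens: 1000,
});
console.log(estimate.fractionOfContextWindow);
```

A caller such as the harness would compare fractionOfContextWindow against the agent's configured compaction threshold before dispatching the call.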

2026-05-23 — `LanguageModelV2` shim + dual call-path + OAuth provider role

Driven by ADR-0014:

LanguageModelV2 shim added (new § Mastra integration). kagedModel(route) is a factory exported from @kaged/llm/mastra that returns a Vercel-AI-SDK-shaped LanguageModelV2. Mastra v1.x uses this as Agent.model. The shim is the only Mastra-aware code in @kaged/llm.
Integration with harness rewritten. Replaced the previous single-path call chain with the dual-path description: primary path (agent loop via Mastra) and direct path (provider test, ad-hoc calls). Same provider adapters run in both.
OAuth providers section rewritten. Previously "OAuth (future)" with a forward-compat note; now "OAuth providers (kaged's distinctive path)". The ability of @kaged/llm to ship OAuth / subscription adapters that Mastra and Vercel won't ship is the reason the dual-path architecture exists, not an afterthought. Operator-owns-TOS-choice stance documented. v0 status unchanged: only API keys ship in v0.
Package structure updated. Added mastra.ts (entry point) and mastra-model.ts (implementation) to the src/ listing, plus mastra-model.test.ts to __tests__/.
Constraint table + Constrained-by list updated. New row pointing at ADR-0014. The Constrained-by list now includes ADR-0012 and ADR-0014.
Open questions cross-referenced. Items #1 and #2 cite ADR-0014 for the now-concrete resolutions.

2026-05-23 — Driver catalog spec

New § Driver catalog added (under § API shape resolution). Documents DriverInfo, DriverAuthMode, listDrivers(), and the full v0 driver table with labels, auth modes, base URLs, local flags, and test models.
UI integration contract documented. Specifies how the daemon relays known_drivers: DriverInfo[] and how the UI consumes it (driver select rendering, base URL pre-fill, conditional credentials, contextual badges). No driver metadata hardcoded in the UI. (Legacy; the driver catalog was superseded by ADR-0049. Provider selection is now catalog-driven.)

2026-05-23 — Model discovery functions

New § Model discovery added (under § Driver catalog). Documents listModels() and humanizeModelId() — the two functions the daemon's model catalog endpoints consume.
listModels() fetches live model lists from provider APIs. Covers all four API shapes with per-shape extraction logic (OpenAI /v1/models, Anthropic paginated /v1/models, Google paginated /v1beta/models with generateContent filter). Returns { ok, models, error? } — never throws.
humanizeModelId() generates display names from model IDs (hyphen/underscore → space, title-case). Used as fallback when operators haven't set an explicit name in their model catalog.
"Not normative for" list updated. Replaced the stale "deferred" model catalogs note with accurate scope: this package provides discovery functions; persistence is @kaged/local-config's responsibility.

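The humanizeModelId() rule described above (hyphen/underscore → space, title-case) can be sketched in a few lines. The function name below is deliberately different from the real export, and the treatment of repeated separators and of letters after the first is an assumption; the real function may special-case acronyms or version suffixes.

```typescript
// Sketch only: split on hyphens/underscores, capitalise the first letter of
// each word, and join with spaces. Remaining letters are left as written.
function humanizeModelIdSketch(modelId: string): string {
  return modelId
    .split(/[-_]+/)
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

console.log(humanizeModelIdSketch("claude-sonnet-4")); // "Claude Sonnet 4"
console.log(humanizeModelIdSketch("gpt_4o-mini"));     // "Gpt 4o Mini"
```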
2026-05-24 — Model metadata catalog + pricing

Driven by streaming-first enrichment work (provider:model labels, post-message stats bar, cost tracking in UI):

Usage.cost.reasoning field added. The cost object inside Usage gains a reasoning: number field to separately track the dollar cost of reasoning/thinking tokens. Previously, reasoning tokens were silently lumped into the output cost; now callers can display them distinctly.
New § Model metadata catalog added (under § Model discovery). Defines:
- ModelMeta — the extracted metadata type for a single model (context limits, pricing per-token, capability flags, deprecation date). Sourced from LiteLLM's community-maintained JSON; @kaged/llm ships a bundled snapshot updated at release time.
- lookupModelMeta(provider, modelId) — maps kaged's "provider:model" convention to LiteLLM keys and returns ModelMeta | null. Missing metadata is never fatal.
- calculateCost(usage, meta) — pure utility computing dollar cost from token counts and pricing metadata. Falls back to output pricing for reasoning tokens when pricing.reasoning is null. Returns all-zero when metadata is unavailable.
- Operator overrides — local config can override pricing and capabilities per model, taking precedence over the bundled catalog.
Package structure updated. Added model-meta.ts (types + functions), data/litellm-pricing.json (bundled snapshot), and model-meta.test.ts.
Testing notes updated. Added model metadata test cases: key normalization, unknown model, capability extraction, pricing extraction, calculateCost with/without metadata, reasoning price fallback.
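The calculateCost behaviour described in this amendment (reasoning priced separately, falling back to the output rate when pricing.reasoning is null, all-zero without metadata) might be sketched as follows. The interface and function names are hypothetical, and the sketch assumes the output token count excludes reasoning tokens, which the text above does not state.

```typescript
// Hypothetical shapes; rates are USD per token.
interface PricingSketch { input: number; output: number; reasoning: number | null }
interface TokenCountsSketch { input: number; output: number; reasoning: number }
interface CostSketch { input: number; output: number; reasoning: number; total: number }

function calculateCostSketch(
  tokens: TokenCountsSketch,
  pricing: PricingSketch | null,
): CostSketch {
  // Missing metadata is never fatal: report zero cost instead of throwing.
  if (pricing === null) return { input: 0, output: 0, reasoning: 0, total: 0 };
  // Reasoning tokens fall back to the output rate when no reasoning rate exists.
  const reasoningRate = pricing.reasoning ?? pricing.output;
  const input = tokens.input * pricing.input;
  const output = tokens.output * pricing.output;
  const reasoning = tokens.reasoning * reasoningRate;
  return { input, output, reasoning, total: input + output + reasoning };
}

// Deliberately unrealistic round rates so the arithmetic is easy to follow.
const cost = calculateCostSketch(
  { input: 10, output: 4, reasoning: 3 },
  { input: 0.5, output: 2, reasoning: null },
);
console.log(cost); // { input: 5, output: 8, reasoning: 6, total: 19 }
```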

2026-05-25 — Antigravity provider adapter

Adds Antigravity (Google Cloud Code proxy) as the fifth API shape and provider adapter:

New antigravity API shape added. ApiShape union extended. Antigravity gets its own shape rather than reusing google-generative-ai — URL structure (/v1internal:streamGenerateContent, model in body not path), auth mechanism (Bearer token / OAuth, not API key), request envelope ({ model, request: { ...innerBody } }), and rate-limit semantics all differ.
streamAntigravity adapter implemented (providers/antigravity.ts). Full streaming adapter with: Antigravity envelope wrapping, Bearer token auth, Antigravity-specific headers (User-Agent, X-Goog-Api-Client, Client-Metadata), line-based SSE parsing (Antigravity-specific wire format; see 2026-06-03 amendment), thinking/text/toolCall streaming, per-frame usage extraction, abort handling, and rate-limit-aware 429 error handling.
RateLimitInfo type exported. Structured rate-limit info (retryAfterMs, reason, quotaResetTime, message) extracted from 429 responses. Parses Go-style compound durations (1h16m0.667s), retry-after-ms/retry-after headers, and structured error body details (RetryInfo, ErrorInfo, QuotaFailure). Surfaced in errorMessage for upstream rotation/backoff decisions.
Thinking budget differs by model family. Claude models (detected by modelId.includes("claude")) use different budget ranges than Gemini models — Claude omits includeThoughts and uses higher budgets at each effort level.
Thinking blocks stripped from outgoing requests. Assistant message history omits thinking blocks when building contents — Antigravity generates fresh thinking each turn (matching the reference plugin's approach). If stripping leaves a history turn with no valid parts, the adapter omits that turn instead of sending an empty contents entry.
listModels returns informational error for antigravity. Antigravity does not expose a model listing endpoint; models must be configured in the project DSL.
Driver catalog updated. New antigravity entry: label "Antigravity", base URL https://cloudcode-pa.googleapis.com, auth mode oauth, test model gemini-2.5-flash.
Dispatch wired. dispatch.ts routes antigravity shape to streamAntigravity.
Package structure updated. Added providers/antigravity.ts and __tests__/providers/antigravity.test.ts.
31 new tests. Text streaming, per-frame usage tracking, tool calls, thinking, rate-limit extraction (structured body, message fallback, header), safety filter, request format (URL, Bearer auth, envelope, headers, thinking budgets per model family, thinking block stripping).
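As an illustration of the Go-style compound durations mentioned under RateLimitInfo (for example 1h16m0.667s), here is a minimal parsing sketch that produces a retryAfterMs-style millisecond value. The function name is hypothetical and the sketch only handles the h, m, s, and ms units; it makes no claim about which units the deleted adapter accepted.

```typescript
// Milliseconds per supported unit. Go also has "us"/"ns"; omitted here.
const UNIT_MS: Record<string, number> = { h: 3600000, m: 60000, s: 1000, ms: 1 };

// Returns the duration in whole milliseconds, or null if the text is not a
// pure sequence of <number><unit> segments.
function parseGoDurationMsSketch(text: string): number | null {
  // "ms" is listed before "m" so that "500ms" is not read as 500 minutes.
  const pattern = /(\d+(?:\.\d+)?)(ms|h|m|s)/g;
  let total = 0;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const value = parseFloat(match[1] ?? "0");
    const unitMs = UNIT_MS[match[2] ?? ""] ?? 0;
    total += value * unitMs;
    consumed += match[0].length;
  }
  // Every character must belong to a segment; otherwise reject the input.
  if (consumed === 0 || consumed !== text.length) return null;
  return Math.round(total);
}

console.log(parseGoDurationMsSketch("1h16m0.667s")); // 4560667
```

Rounding at the end absorbs floating-point noise from fractional seconds, so the result is always an integer millisecond count.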

2026-06-03 — Antigravity SSE and history sanitation

Antigravity uses a line-based SSE parser, not the shared parseSseStream. Antigravity's wire format sends data: {json}\n lines where each line is a complete event — it does not use \n\n double-newline framing like standard SSE. The adapter uses a local parseAntigravityStream generator that splits on \n, processes each data:-prefixed line individually, and buffers partial lines across network chunks. This matches the battle-tested OpenCode reference plugin (createStreamingTransformer in reference/opencode-antigravity-auth/). The shared parseSseStream (which waits for \n\n boundaries) must NOT be used for Antigravity — doing so causes silent output loss.
Antigravity strips empty sanitized history turns. During contents construction, empty text parts are omitted. If a user or assistant history message has no remaining valid parts after provider-specific sanitation (for example, an assistant turn containing only stripped thinking), the adapter omits the entire history turn rather than sending { parts: [] }.
Testing notes updated. Antigravity provider tests cover split/coalesced SSE frames carrying thinking/text output and empty sanitized history turns.
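The line-based framing described in this amendment (split on \n, handle each data:-prefixed line on its own, carry partial lines across network chunks) can be sketched as below. The class and method names are hypothetical, and the sketch is synchronous for brevity; it does not reproduce the adapter's parseAntigravityStream generator, and skipping malformed JSON is an assumption.

```typescript
// Accumulates decoded text chunks and yields the JSON payload of every
// complete `data:` line. No blank-line (\n\n) boundary is required.
class LineBufferedSseSketch {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is "" if the chunk ended on a newline, otherwise a
    // partial line that must wait for the next chunk.
    this.buffer = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const raw of lines) {
      const line = raw.trimEnd(); // tolerate \r\n line endings
      if (!line.startsWith("data:")) continue;
      const payload = line.slice("data:".length).trim();
      if (payload.length === 0) continue;
      try {
        events.push(JSON.parse(payload));
      } catch {
        // Assumption: malformed lines are skipped rather than aborting the stream.
      }
    }
    return events;
  }
}

const parser = new LineBufferedSseSketch();
console.log(parser.push('data: {"a":1}\nda'));                // [ { a: 1 } ]
console.log(parser.push('ta: {"b":2}\ndata: {"c":3}\n'));      // [ { b: 2 }, { c: 3 } ]
```

The second call shows the case the testing notes mention: a frame split across two chunks followed by two frames coalesced into one chunk.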

2026-05-30 — ADR-0028: OAuth provider module

Per ADR-0028:

OAuth provider module added (src/oauth/). Generic framework for any 3rd-party OAuth-backed LLM provider. 12 modules, including types.ts (ProviderOAuthConfig, ProviderTokens), pkce.ts (PKCE via crypto.subtle), token-store.ts (per-provider JSON at $XDG_CONFIG_HOME/kaged/oauth/<provider>-tokens.json), authorize.ts (auth URL construction from config), callback-server.ts (temporary Bun.serve()), token-exchange.ts (code exchange with post-login hook support), refresh.ts (proactive + reactive refresh, resolveOAuthCredentials), login.ts, logout.ts, status.ts, index.ts (barrel).
Driver catalog extended. ProviderOAuthConfig declarations in PROVIDER_OAUTH_CONFIGS alongside existing DRIVER_AUTH_MODES. DriverInfo gains optional oauth field. resolveOAuthConfig(driverName) export.
Antigravity config registered. Full Google OAuth config with post-login hook for project ID resolution (fetchProjectId + onboardUser logic migrated from daemon).
Package export added. @kaged/llm/oauth entry point for daemon consumption.
Constrained-by list extended with ADR-0028.
Package structure updated. Added src/oauth/ (12 files) and __tests__/oauth/ (3 test files).
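For readers unfamiliar with the authorize step, the following sketch shows auth URL construction from a config for an authorization-code + PKCE flow. The query parameter names are the standard OAuth 2.0 / PKCE ones; the config shape, field names, function name, and example endpoint are hypothetical and are not taken from ProviderOAuthConfig or authorize.ts.

```typescript
// Hypothetical subset of a provider OAuth config.
interface OAuthConfigSketch {
  authorizationEndpoint: string;
  clientId: string;
  scopes: string[];
}

interface AuthorizeParamsSketch {
  redirectUri: string;   // where the temporary callback server listens
  state: string;         // opaque anti-CSRF value echoed back by the provider
  codeChallenge: string; // base64url(SHA-256(code_verifier))
}

function buildAuthorizeUrlSketch(
  config: OAuthConfigSketch,
  params: AuthorizeParamsSketch,
): string {
  const url = new URL(config.authorizationEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", config.clientId);
  url.searchParams.set("redirect_uri", params.redirectUri);
  url.searchParams.set("scope", config.scopes.join(" "));
  url.searchParams.set("state", params.state);
  url.searchParams.set("code_challenge", params.codeChallenge);
  url.searchParams.set("code_challenge_method", "S256");
  return url.toString();
}

const authorizeUrl = buildAuthorizeUrlSketch(
  { authorizationEndpoint: "https://auth.example.com/authorize", clientId: "example-client", scopes: ["openid", "email"] },
  { redirectUri: "http://127.0.0.1:8765/callback", state: "state-123", codeChallenge: "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM" },
);
console.log(authorizeUrl);
```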

2026-05-30 — ADR-0026: Model metadata overrides + cost management + usage pipeline

Per ADR-0026:

§ Operator overrides rewritten (under § Model metadata catalog). Replaced the local.toml override description with the full DB-backed override system: ModelOverride storage shape, sparse key-value schema, overridable fields table with dot-notation for nested fields, resolveModelMeta function with ResolvedModelMeta return type (merged result + per-field source tracking), examples for self-hosted models and stale-pricing corrections.
resolveModelMeta replaces lookupModelMeta for override-aware callers. The harness, compaction, and token estimator call resolveModelMeta instead of lookupModelMeta. The merge path is: LiteLLM default → apply overrides → return. Models not in LiteLLM get a synthetic ModelMeta built from overrides only.
Context window overrides feed compaction. Per the ADR-0024 amendment, maxInputTokens is overridable. The compaction system uses the effective (merged) context window for threshold calculation.
Constrained-by list extended with ADR-0026.
Provider usage pipeline documented (§ Provider usage reporting extended). On-demand fetch with DB cache (provider_usage_cache table), cache invalidated after every LLM call to that provider, manual refresh endpoint for out-of-band usage.
Cost accumulation per provider. New provider_spend_events table records cost per LLM call. Daemon sums events per rolling window (5h, 7d) and compares against configured limits before each call.
Spend limit enforcement. provider_spend_limits table stores per-provider limits (absolute USD per window, percentage of rolling window for quota-based providers). Enforcement is a hard block before LLM dispatch — not a soft warning.
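The merge path described for resolveModelMeta (catalog default, then sparse dot-notation overrides, with per-field source tracking and a synthetic result for models absent from the catalog) can be sketched as follows. All names here are hypothetical, the metadata is modelled as a plain record rather than the real ModelMeta type, and only maxInputTokens and pricing are field names taken from this spec.

```typescript
type FieldSourceSketch = "catalog" | "override";

interface ResolvedSketch {
  meta: Record<string, unknown>;
  sources: Record<string, FieldSourceSketch>; // keyed by dot-notation path
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Lists every leaf field as a dot-notation path, e.g. "pricing.input".
function leafPaths(obj: Record<string, unknown>, prefix = ""): string[] {
  const out: string[] = [];
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (isPlainObject(value)) out.push(...leafPaths(value, path));
    else out.push(path);
  }
  return out;
}

// Writes `value` at a dot-notation path, creating intermediate objects.
function setPath(target: Record<string, unknown>, path: string, value: unknown): void {
  const keys = path.split(".");
  let node = target;
  for (let i = 0; i < keys.length - 1; i++) {
    const key = keys[i] as string;
    if (!isPlainObject(node[key])) node[key] = {};
    node = node[key] as Record<string, unknown>;
  }
  node[keys[keys.length - 1] as string] = value;
}

function resolveModelMetaSketch(
  base: Record<string, unknown> | null,       // null: model not in the catalog
  overrides: Record<string, unknown>,         // sparse, dot-notation keys
): ResolvedSketch {
  // Deep-copy so the bundled catalog entry is never mutated.
  const meta: Record<string, unknown> =
    base === null ? {} : (JSON.parse(JSON.stringify(base)) as Record<string, unknown>);
  const sources: Record<string, FieldSourceSketch> = {};
  for (const path of leafPaths(meta)) sources[path] = "catalog";
  for (const [path, value] of Object.entries(overrides)) {
    setPath(meta, path, value);
    sources[path] = "override";
  }
  return { meta, sources };
}

const resolved = resolveModelMetaSketch(
  { maxInputTokens: 128000, pricing: { input: 1, output: 2 } },
  { "pricing.input": 3 },
);
console.log(resolved.sources); // pricing.input is "override", the rest "catalog"
```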

2026-05-29 — Z.AI driver + provider usage reporting

New zai driver added. Maps to anthropic-messages API shape with base URL https://api.z.ai/api/anthropic. Label "Z.AI (GLM Coding Plan)". Test model glm-5.1. Auth mode api_key. The existing streamAnthropic adapter handles non-Anthropic base URLs by switching from X-Api-Key to Authorization: Bearer — no new provider adapter needed.
New § Provider usage reporting added (under § Driver catalog). Defines the normalized types (UsageReport, UsageLimit, UsageWindow, UsageAmount, UsageScope, UsageFetchOptions) and two provider-specific fetchers:
- fetchZaiUsage — queries GET /api/monitor/usage/quota/limit on https://api.z.ai. Parses TOKENS_LIMIT and TIME_LIMIT entries into UsageLimit entries with usedFraction/remainingFraction, status classification (ok/warning/exhausted), and resetsAt from the API's nextResetTime.
- fetchAntigravityUsage — queries POST /v1internal:fetchAvailableModels on Antigravity's endpoint. Parses per-model, per-tier quota info (fraction-based) into flat UsageLimit entries. Requires OAuth accessToken + projectId.
UsageFetchOptions supports both auth patterns. apiKey for API-key providers (zai), accessToken + projectId for OAuth providers (antigravity). Each fetcher validates its own requirements at the top and returns null if missing.
API shape resolution table updated. New zai row.
Driver catalog table updated. New zai entry with full metadata.
Package structure updated. Added usage-types.ts, usage/zai.ts, usage/antigravity.ts.
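To illustrate how a fetcher might normalize a raw quota reading into a UsageLimit-style entry with usedFraction / remainingFraction and an ok / warning / exhausted status, here is a hedged sketch. The type and function names are hypothetical, and the 0.8 warning threshold is an assumption: the text above names the three statuses but not the cut-offs.

```typescript
type UsageStatusSketch = "ok" | "warning" | "exhausted";

interface UsageLimitSketch {
  usedFraction: number;
  remainingFraction: number;
  status: UsageStatusSketch;
}

const WARNING_THRESHOLD = 0.8; // assumed cut-off, not taken from the spec

function toUsageLimitSketch(used: number, limit: number): UsageLimitSketch {
  // Clamp to [0, 1]; a non-positive limit is treated as fully used.
  const usedFraction = limit > 0 ? Math.min(Math.max(used / limit, 0), 1) : 1;
  const remainingFraction = 1 - usedFraction;
  const status: UsageStatusSketch =
    usedFraction >= 1 ? "exhausted" : usedFraction >= WARNING_THRESHOLD ? "warning" : "ok";
  return { usedFraction, remainingFraction, status };
}

console.log(toUsageLimitSketch(50, 100).status);  // "ok"
console.log(toUsageLimitSketch(120, 100).status); // "exhausted"
```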

References

ADR-0004: Runtime — Bun + TypeScript
ADR-0011: Project portability
ADR-0012: Agentic substrate is Mastra v1.x
ADR-0014: All LLM providers route through @kaged/llm; Mastra integrates via a LanguageModelV2 shim — Superseded (partial) by ADR-0049; the durable "LanguageModelV2 is the integration boundary" call is preserved.
ADR-0024: Context compaction is kaged-owned, layered, observable, and operator-tunable
ADR-0026: Cost management, model metadata overrides, and provider usage tracking
ADR-0028: 3rd-party OAuth provider auth
ADR-0041: Containerised daemon — image pre-seed, bundled catalog, bun on PATH
ADR-0049: Providers are dynamically-loaded modules from an operator-local store — the ADR this amendment implements
Spec: Agent harness — Mastra integration, LanguageModelV2 consumer
Spec: Local config — credential storage
Spec: HTTP API — WebSocket relay, /api/v1/local/providers response shape
Spec: Session manager — run state machine
models.dev — MIT; the catalog source kaged-models mirrors and republishes to models.kaged.dev
opencode-antigravity-auth — reference custom-provider OAuth plugin (the pattern the Antigravity plugin follows)
reference/oh-my-pi/packages/ai/ — pi-ai reference implementation (historical wire-protocol source for the deleted hand adapters; retained for institutional memory)

