ADR-0028: 3rd-party OAuth provider auth — token lifecycle and credential management

  • Status: Proposed
  • Date: 2026-05-30
  • Deciders: @karasu
  • Supersedes:
  • Superseded by:

Context

Kaged routes LLM calls through @kaged/llm (per ADR-0014). Some LLM providers require short-lived OAuth access tokens rather than static API keys. These tokens typically expire in ~1 hour. The daemon's credential resolution (resolveCredentials() in primary-runner.ts) currently reads a static access_token from local.toml — an operator must manually paste a token that becomes invalid within the hour.

Per ADR-0007, operator authentication (who is using kaged) is handled by an OAuth sidecar proxy. That ADR governs operator identity, not provider credentials. This ADR is about the latter: how kaged acquires, stores, refreshes, and injects provider-specific OAuth tokens for LLM API access. These are fundamentally different problems at different layers.

The current state:

  1. Static tokens don't work for OAuth-backed providers. Access tokens expire in ~1 hour. Manual token rotation is not viable for multi-hour sessions or unattended operation.
  2. No login flow exists. Operators cannot initiate an OAuth authorization code flow from within kaged. They must manually extract tokens from browser devtools or other tools.
  3. No refresh mechanism exists. When a token expires mid-session, the next LLM call fails with an auth error. There is no proactive or reactive refresh.
  4. No persistent token storage. Tokens live only in local.toml as static strings. There is no token lifecycle management.

The first OAuth-backed provider is already wired in @kaged/llm. It works with valid access tokens passed via route.apiKey. The provider map marks it as oauth-only. The plumbing exists; the token lifecycle is the gap.

Multi-hour sessions (recursive agents per ADR-0022, compaction summarizers per ADR-0024) require tokens that survive the session duration.

Decision

@kaged/llm gains a provider OAuth module that owns the full OAuth lifecycle for any 3rd-party LLM provider: PKCE-based authorization code grant via browser redirect, token exchange, persistent refresh token storage (Zod-validated JSON per provider), proactive access token refresh with expiry buffer, and integration into the credential resolution path. Each OAuth-backed provider registers its OAuth configuration (endpoints, scopes, client credentials) in the driver catalog. The module lives in @kaged/llm — it is provider infrastructure, not daemon routing logic. The daemon calls into @kaged/llm for credential resolution; @kaged/llm handles token acquisition, storage, and refresh. v0 ships single-account support per provider. Multi-account rotation is deferred to a future amendment.

Specifics

1. Provider OAuth configuration

Each OAuth-backed provider in the @kaged/llm driver catalog declares its OAuth configuration:

interface ProviderOAuthConfig {
  /** OAuth authorization endpoint URL. */
  authEndpoint: string;
  /** OAuth token exchange endpoint URL. */
  tokenEndpoint: string;
  /** OAuth client ID (public — PKCE provides client-side security). */
  clientId: string;
  /** OAuth client secret (required by some providers for token exchange). */
  clientSecret?: string;
  /** OAuth scopes to request. */
  scopes: string[];
  /** Callback server port (default 51121, configurable). */
  callbackPort?: number;
  /** How long before expiry to proactively refresh (default 60s). */
  expiryBufferMs?: number;
  /** Optional: endpoint to fetch user identity after token exchange. */
  userinfoEndpoint?: string;
  /** Optional: extra provider-specific post-login steps. */
  postLoginHook?: (tokens: OAuthTokens) => Promise<Record<string, unknown>>;
}

Provider-specific constants (client IDs, endpoints, scopes) are defined in the driver catalog alongside the existing DriverInfo entries. No confidential secrets are hardcoded: the client ID, and the client secret where a provider requires one, are public values already embedded in the upstream provider's client applications. PKCE provides client-side security.
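As an illustration of what a catalog entry might look like, the sketch below declares a configuration for a hypothetical provider and resolves the two documented defaults. The endpoints, client ID, scopes, and the withDefaults helper are placeholders invented for this example, not values from any real provider or from the kaged codebase.

```typescript
// Shape from the interface above, reduced to the fields this sketch uses.
interface ProviderOAuthConfig {
  authEndpoint: string;
  tokenEndpoint: string;
  clientId: string;
  clientSecret?: string;
  scopes: string[];
  callbackPort?: number;
  expiryBufferMs?: number;
}

// Defaults as stated in the interface comments.
const DEFAULT_CALLBACK_PORT = 51121;
const DEFAULT_EXPIRY_BUFFER_MS = 60_000;

// Hypothetical provider entry; every value below is a placeholder.
const exampleProviderOAuth: ProviderOAuthConfig = {
  authEndpoint: "https://auth.example.com/oauth/authorize",
  tokenEndpoint: "https://auth.example.com/oauth/token",
  clientId: "example-public-client-id",
  scopes: ["openid", "email"],
};

// Hypothetical helper: fill in the optional fields once so callers
// never have to repeat the defaults.
function withDefaults(config: ProviderOAuthConfig) {
  return {
    ...config,
    callbackPort: config.callbackPort ?? DEFAULT_CALLBACK_PORT,
    expiryBufferMs: config.expiryBufferMs ?? DEFAULT_EXPIRY_BUFFER_MS,
  };
}

const resolved = withDefaults(exampleProviderOAuth);
console.log(resolved.callbackPort, resolved.expiryBufferMs); // 51121 60000
```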

2. Authorization flow (login)

Triggered by the operator via UI button or daemon API call. Steps:

  1. Generate PKCE verifier + challenge. Uses the global crypto.subtle (WebCrypto API, built into Bun): the challenge is the base64url-encoded SHA-256 hash of a cryptographically random verifier. No external dependency.
  2. Build authorization URL. authEndpoint with query params: client_id, response_type=code, redirect_uri, scope, code_challenge, code_challenge_method=S256, state (base64url-encoded JSON containing verifier and provider-specific metadata), access_type=offline, prompt=consent.
  3. Start local callback server. A temporary Bun.serve() on the configured port that listens for GET /oauth-callback?code=...&state=.... The server must be listening before the browser opens so the redirect cannot arrive first. It runs for a maximum of 5 minutes, then shuts down.
  4. Open browser. Launch the authorization URL in the system browser by spawning the platform opener (open on macOS, xdg-open on Linux).
  5. Receive callback. Extract code and state from the callback request. Decode state to recover the PKCE verifier and provider context.
  6. Exchange code for tokens. POST to tokenEndpoint with grant_type=authorization_code, code, redirect_uri, client_id, client_secret (if configured), code_verifier. Response contains access_token, refresh_token, expires_in.
  7. Fetch user info. (If userinfoEndpoint is configured) GET the endpoint with the access token to resolve the operator's identity for display.
  8. Run post-login hook. (If postLoginHook is configured) Execute provider-specific post-login steps (e.g. resolving a project ID, fetching additional metadata).
  9. Store tokens. Persist to the provider's token store file (see §3).
  10. Respond to browser. Serve a simple HTML success page that auto-closes or shows a "return to kaged" link.
  11. Notify UI. Emit a daemon event so the UI can update the provider credential status display.
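Steps 1 and 2 of the flow can be sketched with WebCrypto alone. The example below is a minimal illustration, not the module's implementation: the function names, the placeholder endpoint and client ID, and the localhost host in redirect_uri are assumptions. The state payload follows step 2 as written.

```typescript
// Encode raw bytes as unpadded base64url (RFC 4648, section 5).
function base64url(bytes: Uint8Array): string {
  let binary = "";
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// S256 challenge: base64url(SHA-256(verifier)), per RFC 7636.
async function challengeFor(verifier: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(verifier));
  return base64url(new Uint8Array(digest));
}

// 32 random bytes encode to a 43-character verifier, the RFC 7636 minimum.
async function createPkcePair(): Promise<{ verifier: string; challenge: string }> {
  const verifier = base64url(crypto.getRandomValues(new Uint8Array(32)));
  return { verifier, challenge: await challengeFor(verifier) };
}

// state is base64url-encoded JSON, as described in step 2.
function encodeState(payload: Record<string, unknown>): string {
  return base64url(new TextEncoder().encode(JSON.stringify(payload)));
}

function buildAuthorizationUrl(p: {
  authEndpoint: string;
  clientId: string;
  redirectUri: string;
  scopes: string[];
  challenge: string;
  state: string;
}): string {
  const url = new URL(p.authEndpoint);
  url.searchParams.set("client_id", p.clientId);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("redirect_uri", p.redirectUri);
  url.searchParams.set("scope", p.scopes.join(" "));
  url.searchParams.set("code_challenge", p.challenge);
  url.searchParams.set("code_challenge_method", "S256");
  url.searchParams.set("state", p.state);
  url.searchParams.set("access_type", "offline");
  url.searchParams.set("prompt", "consent");
  return url.toString();
}

// Usage with placeholder values.
async function demo(): Promise<string> {
  const { verifier, challenge } = await createPkcePair();
  return buildAuthorizationUrl({
    authEndpoint: "https://auth.example.com/oauth/authorize", // placeholder
    clientId: "example-public-client-id", // placeholder
    redirectUri: "http://localhost:51121/oauth-callback", // host is an assumption
    scopes: ["openid", "email"],
    challenge,
    state: encodeState({ verifier, provider: "example" }),
  });
}

demo().then((url) => console.log(url));
```

The verifier is kept as the string that was hashed, so the same value can later be sent as code_verifier in the token exchange of step 6.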

3. Token storage

File: $XDG_CONFIG_HOME/kaged/oauth/<provider-name>-tokens.json. When the XDG_CONFIG_HOME env var is unset, the base directory falls back to ~/.config, giving ~/.config/kaged/oauth/<provider-name>-tokens.json. One file per provider.

Schema (Zod-validated):

import { z } from "zod";

const ProviderTokenSchema = z.object({
  provider: z.string(),          // driver name
  identity: z.string().optional(), // email or user handle (from userinfo)
  refreshToken: z.string().min(1),
  accessToken: z.string().min(1),
  expiresAt: z.number(),         // Unix epoch ms, absolute
  obtainedAt: z.number(),        // Unix epoch ms
  // Provider-specific (project ID, etc.); the explicit key type keeps z.record valid across Zod major versions.
  metadata: z.record(z.string(), z.unknown()).optional(),
});

File permissions: 0600 (owner read/write only). The file contains refresh tokens — treat as a secret.

Write pattern: atomic write (write to .tmp file, rename). No file locking for v0 (single daemon process).
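The path resolution and write pattern might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, Zod validation is left out to keep the example dependency-free, and the node:fs/promises APIs are used because Bun implements them.

```typescript
import { chmod, mkdir, readFile, rename, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

// Resolve the token store path. `env` and `home` are parameters so the
// function stays pure; a caller would pass process.env and os.homedir().
function tokenStorePath(
  provider: string,
  env: Record<string, string | undefined>,
  home: string,
): string {
  const configHome = env.XDG_CONFIG_HOME || join(home, ".config");
  return join(configHome, "kaged", "oauth", `${provider}-tokens.json`);
}

// Write to a sibling .tmp file, then rename over the target, so a reader
// never observes a half-written token file.
async function writeTokensAtomic(path: string, tokens: unknown): Promise<void> {
  await mkdir(dirname(path), { recursive: true });
  const tmp = `${path}.tmp`;
  await writeFile(tmp, JSON.stringify(tokens, null, 2), { mode: 0o600 });
  // The mode option only applies when the file is created, so enforce
  // 0600 explicitly in case a stale .tmp file already existed.
  await chmod(tmp, 0o600);
  await rename(tmp, path);
}

// Read the raw JSON back; schema validation would happen after this step.
async function readTokens(path: string): Promise<unknown> {
  return JSON.parse(await readFile(path, "utf8"));
}
```

Because rename replaces the target in a single step on POSIX filesystems when both paths are on the same volume, keeping the .tmp file in the same directory as the target is what makes the write atomic.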

4. Token refresh

Proactive refresh. Before each LLM call that requires an OAuth token, the system checks token expiry. If the access token is expired or within expiryBufferMs of expiry:

  1. POST to tokenEndpoint with grant_type=refresh_token, refresh_token, client_id, client_secret (if configured).
  2. Response contains a new access_token and expires_in.
  3. Update accessToken and expiresAt in the token store.
  4. Return the fresh access token.
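The four steps above can be sketched as a single function. The type and function names here are hypothetical, the fetch implementation is injected so the logic can be exercised without a network, and persisting the result to the token store is left to the caller.

```typescript
// Minimal fetch shape used by this sketch; the global fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface StoredTokens {
  accessToken: string;
  refreshToken: string;
  expiresAt: number; // Unix epoch ms
}

interface RefreshConfig {
  tokenEndpoint: string;
  clientId: string;
  clientSecret?: string;
  expiryBufferMs?: number;
}

// True when the token is expired or within the buffer of expiring.
function needsRefresh(expiresAt: number, now: number, bufferMs = 60_000): boolean {
  return now >= expiresAt - bufferMs;
}

async function getFreshTokens(
  stored: StoredTokens,
  config: RefreshConfig,
  fetchImpl: FetchLike,
  now: number = Date.now(),
): Promise<StoredTokens> {
  if (!needsRefresh(stored.expiresAt, now, config.expiryBufferMs)) return stored;

  const body = new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: stored.refreshToken,
    client_id: config.clientId,
  });
  if (config.clientSecret) body.set("client_secret", config.clientSecret);

  const res = await fetchImpl(config.tokenEndpoint, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: body.toString(),
  });
  if (!res.ok) throw new Error(`token refresh failed with HTTP ${res.status}`);

  const json = (await res.json()) as { access_token: string; expires_in: number };
  // expires_in is in seconds; the store keeps an absolute timestamp in ms.
  return { ...stored, accessToken: json.access_token, expiresAt: now + json.expires_in * 1000 };
}
```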

Reactive refresh (on 401). If an LLM call fails with a 401/auth error, the system attempts one refresh and retries the call once. If the refresh fails, the error surfaces to the operator.
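A minimal sketch of the retry-once behavior follows. The wrapper name and its callback parameters are hypothetical; in particular, how an auth error is recognized is passed in as a predicate because it depends on the provider driver.

```typescript
// Run `call` with the current token. On an auth error, refresh once and
// retry once. Any other error, a failed refresh, or a second auth
// failure propagates to the caller.
async function callWithAuthRetry<T>(
  getToken: () => Promise<string>,
  forceRefresh: () => Promise<string>,
  call: (token: string) => Promise<T>,
  isAuthError: (err: unknown) => boolean,
): Promise<T> {
  try {
    return await call(await getToken());
  } catch (err) {
    if (!isAuthError(err)) throw err;
    const token = await forceRefresh(); // rejects if the refresh itself fails
    return call(token); // single retry, no further recovery
  }
}

// Hypothetical predicate: treat objects carrying status 401 as auth errors.
function isUnauthorized(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 401;
}
```

Capping the retry at one attempt keeps a revoked refresh token from turning into a loop: the second failure reaches the operator immediately.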

No background refresh timer in v0. The proactive check on each call is sufficient. A background timer is a future optimization.

5. Integration with resolveCredentials()

resolveCredentials() in primary-runner.ts currently reads access_token from local.toml for OAuth drivers. The change:

  • When authModes.includes("oauth") for a driver, call the @kaged/llm OAuth module to resolve credentials.
  • The OAuth module loads the provider's token store, checks expiry, refreshes if needed, returns the fresh accessToken and any provider-specific metadata.
  • The existing local.toml access_token field remains as a fallback for operators who prefer manual tokens.
  • The resolution order becomes: (1) OAuth token store via @kaged/llm (if a token file exists and a valid access token can be obtained, refreshing if needed), (2) local.toml access_token / access_token_env, (3) return null (credentials unresolved).
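The resolution order might be expressed as below. This is a sketch, not the actual resolveCredentials() signature: the CredentialSources shape, the oauthResolve callback standing in for the @kaged/llm OAuth module, and the decision to fall through when the OAuth path throws are all assumptions.

```typescript
interface ResolvedCredentials {
  accessToken: string;
  source: "oauth-store" | "local-toml" | "env";
  metadata?: Record<string, unknown>;
}

// Hypothetical inputs; the real function reads these from daemon state.
interface CredentialSources {
  authModes: string[];
  // Stand-in for the OAuth module: returns null when no token file exists.
  oauthResolve: () => Promise<{ accessToken: string; metadata?: Record<string, unknown> } | null>;
  localToml: { access_token?: string; access_token_env?: string };
  env: Record<string, string | undefined>;
}

async function resolveOAuthDriverCredentials(
  src: CredentialSources,
): Promise<ResolvedCredentials | null> {
  // (1) OAuth token store, refreshed by the module if needed.
  if (src.authModes.includes("oauth")) {
    try {
      const fromStore = await src.oauthResolve();
      if (fromStore) return { ...fromStore, source: "oauth-store" };
    } catch {
      // Assumption: a failed refresh falls through to the manual token.
    }
  }
  // (2) Manual token from local.toml, literal first, then env indirection.
  if (src.localToml.access_token) {
    return { accessToken: src.localToml.access_token, source: "local-toml" };
  }
  const envName = src.localToml.access_token_env;
  const fromEnv = envName ? src.env[envName] : undefined;
  if (fromEnv) return { accessToken: fromEnv, source: "env" };
  // (3) Credentials unresolved.
  return null;
}
```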

6. Daemon HTTP endpoints

For each OAuth-backed provider, three endpoints under /api/v1/local/providers/<provider>/auth/:

  • POST /login — Initiates the OAuth flow. Opens browser, starts callback server. Returns { ok: true, redirectUrl: string } immediately so the UI can show "Opening browser...".
  • GET /status — Returns current auth state: { authenticated: boolean, identity?: string, expiresAt?: number, metadata?: Record<string, unknown> }. Does not expose tokens.
  • POST /logout — Deletes the provider's token store file. Returns { ok: true }.

These are local-only endpoints (per the daemon's loopback binding constraint).
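To illustrate the "does not expose tokens" constraint on GET /status, the sketch below derives the response body from a stored token record by copying only the displayable fields. The function name is hypothetical, and treating the presence of a token record as authenticated: true is an assumption about how the endpoint would define that flag.

```typescript
// Stored record, matching the token store fields in §3.
interface StoredProviderTokens {
  provider: string;
  identity?: string;
  refreshToken: string;
  accessToken: string;
  expiresAt: number;
  obtainedAt: number;
  metadata?: Record<string, unknown>;
}

// Response shape given for GET /status.
interface AuthStatus {
  authenticated: boolean;
  identity?: string;
  expiresAt?: number;
  metadata?: Record<string, unknown>;
}

// Build the status payload from an allow-list of fields, so a token can
// never leak into the response by accident (for example via object spread).
function toAuthStatus(tokens: StoredProviderTokens | null): AuthStatus {
  if (!tokens) return { authenticated: false };
  const status: AuthStatus = { authenticated: true, expiresAt: tokens.expiresAt };
  if (tokens.identity !== undefined) status.identity = tokens.identity;
  if (tokens.metadata !== undefined) status.metadata = tokens.metadata;
  return status;
}

const example = toAuthStatus({
  provider: "example",
  identity: "operator@example.com",
  refreshToken: "secret-refresh",
  accessToken: "secret-access",
  expiresAt: 1_800_000_000_000,
  obtainedAt: 1_799_996_400_000,
});
console.log(JSON.stringify(example));
```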

7. UI changes

The existing provider config screen (packages/ui/src/components/screens/config-providers.tsx) already has driverUsesOAuth() detection and renders access_token + project_id inputs for OAuth drivers. The changes:

  • When the driver uses OAuth auth mode, replace the manual access_token input with:
    • Login button ("Sign in with [provider]") — calls POST /login, then shows "Waiting for browser..." with a spinner.
    • Status display — after successful login, shows identity, metadata, and token expiry. Polls GET /status every 60 seconds.
    • Logout button — calls POST /logout.
  • Keep the manual access_token / access_token_env fields as an "Advanced" collapsible section for operators who want to bypass the OAuth flow.

8. What v0 does NOT ship

These are explicitly deferred to future amendments:

  • Multi-account support. v0 stores one set of tokens per provider. Multi-account rotation with health scoring and rate-limit awareness is a substantial addition that deserves its own spec.
  • Plugin extraction. The OAuth logic lives in @kaged/llm. The plugin-host (per plugin-host spec) has no provider-auth hook point. When such a hook is added (future ADR), this logic can be extracted into a plugin.
  • Background refresh timer. Tokens are refreshed on-demand (before each call or on 401). A proactive background timer is a future optimization.
  • Generic provider registration. v0 ships with hardcoded provider OAuth configs in the driver catalog. A future amendment may allow operators to register custom OAuth providers via configuration.

Consequences

What this commits us to

  • An OAuth module in @kaged/llm with PKCE implementation, token storage, and refresh logic — reusable across all OAuth-backed providers.
  • Provider OAuth configs declared in the @kaged/llm driver catalog alongside existing DriverInfo entries.
  • PKCE implementation using the global crypto.subtle (WebCrypto, built into Bun), with no external OAuth library dependency.
  • A local HTTP server (temporary Bun.serve()) for the OAuth callback — lives for the duration of the login flow only.
  • Zod-validated JSON files at $XDG_CONFIG_HOME/kaged/oauth/<provider>-tokens.json storing refresh tokens, one per provider.
  • Changes to resolveCredentials() in primary-runner.ts to delegate to @kaged/llm for OAuth credential resolution.
  • Three new daemon API endpoints per OAuth-backed provider for auth login/status/logout.
  • UI changes to the provider config screen: login button, status display, logout for OAuth drivers.
  • Spec amendments to docs/specs/http-api.md (new endpoints) and docs/specs/ui/README.md (new UI flow).
  • STATUS.md update.

What this forecloses

  • Using an external OAuth library (e.g., @openauthjs/openauth). Bun's built-in crypto.subtle provides everything needed for PKCE. Adding an npm dependency for this would violate the minimal-dependency posture (per AGENTS.md: "Bun built-ins first").
  • Plugin-based auth in v0. The @kaged/llm placement is correct for the current architecture. When the plugin-host gains auth middleware hooks, this module can be extracted.
  • Refresh token rotation in v0. Some providers rotate refresh tokens; others don't. The token store schema accommodates this but v0 doesn't implement rotation logic.
  • Multiple accounts per provider in v0. Single account per provider.

What becomes easier

  • Adding new OAuth-backed providers. Once the framework exists in @kaged/llm, adding a new provider is a matter of defining a ProviderOAuthConfig in the driver catalog — the PKCE, token storage, refresh, and UI plumbing are shared.
  • Long-running sessions. Tokens refresh transparently. Multi-hour sessions with recursive agents and compaction summarizers work without interruption.
  • Future multi-account support. The token store schema is designed to accommodate multiple accounts per provider with minimal changes.

What becomes harder

  • @kaged/llm surface area. The OAuth module adds ~500-800 LOC. It is self-contained but must be maintained.
  • Browser dependency. The login flow requires a system browser. Headless/server environments need an alternative flow (manually copying the URL to a different machine). The daemon should log the authorization URL so operators can open it manually if the browser launch fails.
  • Token store security. The JSON files contain refresh tokens. File permissions must be correct. The daemon must not log tokens or expose them in API responses.
  • Port availability. The callback server needs a configurable port (default 51121). If the port is in use, the login flow must detect this and inform the operator.

Alternatives considered

Alternative A — Static tokens only (current state)

Keep the manual access_token field in local.toml and require operators to refresh tokens manually.

Why tempting: Zero implementation effort. No new code, no new files, no new endpoints.

Why rejected: Access tokens expire in ~1 hour. Multi-hour sessions are the primary use case for kaged (recursive agents, compaction summarizers). Manual token rotation every hour is not viable for unattended or long-running sessions. The static token field remains as a fallback, but the primary flow must be automated.

Alternative B — Per-provider daemon modules

Build a separate auth module in the daemon for each OAuth-backed provider, similar to the existing antigravity-auth/ directory pattern.

Why tempting: Isolates provider-specific logic. Each provider gets its own directory with its own constants.

Why rejected: Every new OAuth provider would duplicate the same PKCE flow, token storage, refresh logic, and callback server. The only per-provider differences are endpoints, scopes, and client credentials — all of which are configuration. A shared framework in @kaged/llm is the right abstraction boundary since OAuth is provider infrastructure, not daemon routing logic.

Alternative C — Plugin-based auth from the start

Build the auth subsystem as plugins using the plugin-host's JSON-RPC protocol, adding a new provider_auth hook capability to the plugin-host spec.

Why tempting: Correct long-term architecture. Plugin isolation keeps provider-specific code opt-in. The plugin-host's sandboxing (Bubblewrap) adds a security boundary.

Why rejected: The plugin-host has no provider-auth hook point. Adding one requires:

  1. A new ADR for the hook interface.
  2. Amendments to the plugin-host spec (new capability type, new hook signature).
  3. Changes to the daemon's resolveCredentials() path to call into the plugin host.
  4. The plugin itself must run an HTTP server for the OAuth callback — but plugins communicate via JSON-RPC stdio, not HTTP. The daemon would need to proxy HTTP callbacks to the plugin.
  5. The token store would need to live in the plugin's brokered SQLite storage, which is not yet implemented (per plugin-host spec gap analysis).

This amounts to 3-4 ADRs and significant plugin-host infrastructure work before a single line of auth code is written. The pragmatic v0 keeps the module internal to @kaged/llm, with a clear extraction path once the plugin-host gains auth hooks.

Alternative D — External sidecar for token management

Run a separate process (sidecar) that handles the OAuth flow and serves tokens to the daemon via a local API or file.

Why tempting: Separation of concerns. The sidecar can be restarted independently. Matches the pattern of ADR-0007's OAuth proxy sidecar for operator auth.

Why rejected: ADR-0007's sidecar pattern is for operator identity — a different layer. Provider credential management is an LLM provider concern. Adding a separate process for this adds operational complexity (another process to start, monitor, configure) without clear benefit. The token lifecycle is simple enough to live in-process.

Alternative E — Use @openauthjs/openauth or similar OAuth library

Import an OAuth library to handle PKCE, token exchange, and refresh.

Why tempting: Battle-tested code. Less to write and test.

Why rejected: The OAuth flow here is a single authorization code grant with PKCE, approximately 50 lines of cryptographic operations. Bun's built-ins cover all of it: crypto.subtle for SHA-256, crypto.getRandomValues() for random bytes, and Buffer for base64url encoding. Adding an npm dependency for this violates the project's "Bun built-ins first" posture and adds a supply chain dependency for trivially implementable functionality.

Alternative F — Token refresh via background timer

Run a setInterval that proactively refreshes tokens every N minutes, regardless of whether LLM calls are being made.

Why tempting: Tokens are always fresh. No on-demand refresh latency.

Why rejected: It generates network traffic while the daemon is idle and keeps exercising refresh tokens that would otherwise sit unused. The proactive check before each LLM call is sufficient: if no calls are being made, no tokens need refreshing, and if calls are being made, the check is essentially free (it compares two timestamps). A background timer adds complexity (startup ordering, shutdown cleanup, error handling) for no practical benefit in v0.

References