diff --git a/.planning/phases/03-operator-experience/03-RESEARCH.md b/.planning/phases/03-operator-experience/03-RESEARCH.md
new file mode 100644
index 0000000..d2c5754
--- /dev/null
+++ b/.planning/phases/03-operator-experience/03-RESEARCH.md
@@ -0,0 +1,769 @@
+# Phase 3: Operator Experience - Research
+
+**Researched:** 2026-03-23
+**Domain:** Slack OAuth V2, Stripe Subscriptions, BYO API Key Encryption, Cost Dashboard
+**Confidence:** HIGH (core stack verified against official docs)
+
+---
+
+
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+- Slack connection via standard OAuth2 "Add to Slack" flow — operator clicks button, authorizes, tokens stored automatically
+- WhatsApp connection: guided manual setup (Claude's discretion confirmed)
+- After connecting a channel, the wizard MUST include a "send test message" step — required, not optional
+- Test message verifies end-to-end connectivity before the agent goes live
+- Onboarding sequence: Connect Channel → Configure Agent → Send Test Message
+- Agent goes live automatically after the test message succeeds — no separate "Go Live" button
+- Pricing model: per-agent monthly (e.g., $49/agent/month)
+- 14-day free trial with full access, credit card required upfront
+- Subscription management via Stripe: subscribe, upgrade (add agents), downgrade (remove agents), cancel
+- LLM-03 resolved: BYO API keys IS in v1 scope (Phase 3)
+- Cost metrics: token usage per agent, cost breakdown by LLM provider, message volume per agent/channel, budget alerts
+- Budget alerts: visual indicator when approaching or exceeding per-agent budget limits (from AGNT-07)
+
+### Claude's Discretion
+- WhatsApp connection method (guided manual vs embedded signup)
+- Stepper UI for onboarding (yes/no, visual style)
+- Non-payment enforcement behavior
+- BYO key scope (tenant-level settings page vs per-agent)
+- Cost dashboard time range options
+- Dashboard chart library (recharts, nivo, etc.)
+- Stripe webhook event handling strategy (idempotency, retry)
+
+### Deferred Ideas (OUT OF SCOPE)
+None — discussion stayed within phase scope
+
+
+---
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| AGNT-07 | Agent token usage tracked per-agent per-tenant with configurable budget limits | Audit event JSONB metadata must store `prompt_tokens`, `completion_tokens`, `provider`; budget stored on Tenant model; alert threshold query pattern documented |
+| LLM-03 | Tenant can provide their own API keys for supported LLM providers (BYO keys, encrypted at rest) | Fernet AES-128-CBC with HMAC-SHA256; envelope encryption pattern; new `tenant_llm_keys` table; LiteLLM routing integration |
+| PRTA-03 | Operator can connect messaging channels (Slack, WhatsApp) via guided wizard | Slack OAuth V2 flow; required scopes; token storage in `channel_connections.config`; WhatsApp manual setup steps |
+| PRTA-04 | New tenants are guided through structured onboarding (connect channel, configure agent, test message) | Stepper UI pattern; Next.js App Router multi-step page; test message endpoint |
+| PRTA-05 | Operator can manage subscription plans and billing via Stripe integration | Stripe Checkout with per-seat quantity; Billing Portal for self-service; webhook event map; idempotency pattern |
+| PRTA-06 | Portal displays agent cost tracking and usage metrics per tenant | SQL aggregate query on audit_events; JSONB path extraction; Recharts for visualization; time-range filtering |
+
+
+---
+
+## Summary
+
+Phase 3 adds the commercial and operational layer to the Konstruct portal: Slack OAuth, subscription billing, BYO key encryption, and a cost dashboard. All four areas are well-trodden territory with mature libraries — the risks are in integration details, not algorithmic complexity.
+
+The largest architectural gap is in the audit trail: the existing `audit_events.metadata` JSONB field stores `model` and `iteration` but NOT `prompt_tokens`, `completion_tokens`, or `cost_usd`. These fields must be added to the audit logger before the cost dashboard can function. This is a prerequisite for PRTA-06 and AGNT-07 and needs to be Wave 0 work.
+
+The second important finding is that WhatsApp Embedded Signup (Meta OAuth flow) is now the standard for BSP-level onboarding in 2026, but it requires completed Facebook Business Verification and a BSP/Tech Provider program account. For v1, "guided manual setup" is the correct choice: operators manually create a WhatsApp Business App, get their phone number token, and paste the credentials into the portal. This lets v1 ship without waiting on the multi-week Meta verification process.
+
+**Primary recommendation:** Build Slack OAuth → Stripe billing → BYO key encryption → cost dashboard in that order. Each is independently deployable. Start with the audit trail metadata migration as Wave 0.
+
+---
+
+## Standard Stack
+
+### Core
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `stripe` (Python) | `>=12.0.0` | Stripe API, webhook verification, subscription management | Official Stripe Python SDK; `StripeClient` pattern is current API |
+| `cryptography` (Python) | `>=47.0.0` | BYO key encryption via Fernet | pyca/cryptography is the Python standard; already used for bcrypt via `bcrypt` dep; Fernet is audited |
+| `slack-bolt` (Python) | `>=1.22.0` | Slack OAuth installer, Events API | Already in CLAUDE.md tech stack; `OAuthFlow` handles token exchange |
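+| `stripe` (npm) | `>=17.0.0` | Stripe Node.js server SDK, for any Stripe calls made from Next.js route handlers (the browser-side Stripe.js loader is `@stripe/stripe-js`) | Official Stripe Node library |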
+| `recharts` | `>=2.15.0` | Cost dashboard charts | 17M weekly downloads vs Nivo's 2M; simpler JSX API; strong shadcn/ui alignment |
+
+### Supporting
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| `@stripe/stripe-js` | `>=5.0.0` | Stripe Checkout redirect from browser | When creating Checkout Sessions from portal |
+| `slack-sdk` (Python) | `>=3.35.0` | Lower-level Slack Web API calls (post test message) | For the "send test message" verification step |
+
+### Alternatives Considered
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Fernet (AES-128-CBC + HMAC) | AES-256-GCM via `cryptography.hazmat` | AES-256-GCM uses a larger key and has built-in authentication, but requires manual nonce management (nonce reuse is catastrophic) and has no built-in rotation helper; Fernet is audited, has MultiFernet key rotation, and AES-128-CBC + HMAC-SHA256 is sufficient for API key protection |
+| Recharts | Nivo | Nivo has more chart types but 8x fewer downloads, worse documentation, and verbose API; Recharts is recommended for SaaS admin dashboards |
+| Stripe Billing Portal (hosted) | Custom billing UI | Custom UI requires full payment method management; Billing Portal handles card updates, invoice history, cancellation in a Stripe-hosted page — use it |
+
+**Installation:**
+```bash
+# Python (add to packages/shared/pyproject.toml)
+uv add stripe cryptography
+
+# Node (in packages/portal)
+npm install recharts @stripe/stripe-js stripe
+```
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure (new files only)
+
+```
+packages/
+├── shared/
+│   └── shared/
+│       ├── models/
+│       │   └── billing.py          # TenantBilling, TenantLlmKey models
+│       └── api/
+│           ├── billing.py          # Stripe webhooks + subscription endpoints
+│           └── channels.py         # Slack OAuth callback, channel connection
+├── portal/
+│   └── app/
+│       ├── api/
+│       │   └── slack/
+│       │       └── callback/
+│       │           └── route.ts    # Slack OAuth redirect handler
+│       └── (dashboard)/
+│           ├── onboarding/
+│           │   └── page.tsx        # Connect Channel → Configure Agent → Test
+│           ├── billing/
+│           │   └── page.tsx        # Subscription status + Billing Portal redirect
+│           ├── usage/
+│           │   └── [tenantId]/
+│           │       └── page.tsx    # Cost dashboard per tenant
+│           └── settings/
+│               └── api-keys/
+│                   └── page.tsx    # BYO key management
+migrations/
+└── versions/
+    ├── xxxx_add_billing_fields.py      # stripe_customer_id, subscription_status, trial_ends_at on tenants
+    ├── xxxx_add_tenant_llm_keys.py     # tenant_llm_keys table
+    └── xxxx_add_token_fields.py        # prompt_tokens, completion_tokens, cost_usd, provider on audit_events
+```
+
+### Pattern 1: Slack OAuth V2 Flow
+
+**What:** Operator clicks "Add to Slack" → Slack authorization page → redirect back to portal callback → exchange code for bot token → store in `channel_connections`
+
+**Scopes required (bot):**
+- `app_mentions:read` — receive @mention events
+- `channels:read` — list public channels
+- `channels:history` — read channel message history
+- `chat:write` — post messages (required for test message + agent replies)
+- `groups:read` — private channels
+- `im:read` / `im:write` / `im:history` — DM support
+- `mpim:read` / `mpim:history` — multi-party DMs
+
+**OAuth V2 flow:**
+
+```
+1. Operator visits /onboarding → clicks "Add to Slack"
+2. Portal redirects to:
+   https://slack.com/oauth/v2/authorize
+     ?client_id=
+     &scope=app_mentions:read,channels:read,channels:history,chat:write,groups:read,im:read,im:write,im:history,mpim:read,mpim:history
+     &redirect_uri=https://app.konstruct.ai/api/slack/callback
+     &state=
+
+3. User approves → Slack redirects to /api/slack/callback?code=xxx&state=yyy
+
+4. FastAPI backend exchanges code:
+ POST https://slack.com/api/oauth.v2.access
+ client_id, client_secret, code, redirect_uri
+
+5. Response contains:
+ {
+ "ok": true,
+ "access_token": "xoxb-...", ← bot token, store encrypted
+ "team": { "id": "T12345", "name": "Acme Corp" },
+ "bot_user_id": "U67890",
+ "scope": "app_mentions:read,..."
+ }
+
+6. Store in channel_connections:
+ - channel_type: "slack"
+ - workspace_id: team.id
+ - config: { "bot_token": encrypt(access_token), "bot_user_id": ..., "team_name": ... }
+```
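+
+Step 2 of the flow above can be sketched with the standard library alone. This is a minimal illustration, not the portal's actual code: `build_authorize_url` and `BOT_SCOPES` are hypothetical names, and the scope list simply mirrors the bot scopes listed earlier in this pattern.
+
+```python
+from urllib.parse import urlencode
+
+SLACK_AUTHORIZE_URL = "https://slack.com/oauth/v2/authorize"
+
+# Bot scopes copied from the "Scopes required (bot)" list in this pattern.
+BOT_SCOPES = [
+    "app_mentions:read", "channels:read", "channels:history", "chat:write",
+    "groups:read", "im:read", "im:write", "im:history",
+    "mpim:read", "mpim:history",
+]
+
+
+def build_authorize_url(client_id: str, redirect_uri: str, state: str) -> str:
+    """Return the URL the portal redirects the operator to (hypothetical helper)."""
+    query = urlencode({
+        "client_id": client_id,
+        "scope": ",".join(BOT_SCOPES),  # Slack expects a comma-separated list
+        "redirect_uri": redirect_uri,
+        "state": state,                 # signed state, see the state parameter note
+    })
+    return f"{SLACK_AUTHORIZE_URL}?{query}"
+
+
+url = build_authorize_url(
+    "123.456", "https://app.konstruct.ai/api/slack/callback", "abc"
+)
+```
+
+`urlencode` percent-encodes the colons and commas in the scope string, which avoids hand-building the query string.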
+
+**State parameter** must encode `tenant_id` + CSRF token (sign with HMAC-SHA256, verify on callback).
+
+```python
+# Source: https://docs.slack.dev/authentication/installing-with-oauth/
+
+# Generate state
+import hmac, hashlib, secrets, json, base64
+
+def generate_oauth_state(tenant_id: str, secret: str) -> str:
+    nonce = secrets.token_urlsafe(16)
+    payload = json.dumps({"tenant_id": tenant_id, "nonce": nonce})
+    sig = hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()
+    return base64.urlsafe_b64encode(f"{payload}:{sig}".encode()).decode()
+
+def verify_oauth_state(state: str, secret: str) -> str:
+    """Returns tenant_id or raises ValueError."""
+    decoded = base64.urlsafe_b64decode(state.encode()).decode()
+    # The hex signature contains no ":", so rsplit isolates it even though the JSON payload does
+    payload_str, sig = decoded.rsplit(":", 1)
+    expected = hmac.new(secret.encode(), payload_str.encode(), hashlib.sha256).hexdigest()
+    if not hmac.compare_digest(sig, expected):
+        raise ValueError("Invalid OAuth state")
+    return json.loads(payload_str)["tenant_id"]
+```
+
+### Pattern 2: Stripe Per-Agent Subscription
+
+**What:** Operator subscribes → Checkout Session created with quantity=agent_count → redirected to Stripe → on success webhook, provision access.
+
+**Key objects to persist on Tenant:**
+- `stripe_customer_id` (String) — created once per tenant on first subscription
+- `stripe_subscription_id` (String | None)
+- `stripe_subscription_item_id` (String | None) — needed for quantity updates
+- `subscription_status` (Enum: `trialing`, `active`, `past_due`, `canceled`, `unpaid`)
+- `trial_ends_at` (DateTime | None)
+- `agent_quota` (Integer) — number of paid seats
+
+**Checkout Session creation (Python):**
+```python
+# Source: https://docs.stripe.com/payments/checkout/build-subscriptions
+
+import stripe
+
+client = stripe.StripeClient(api_key=settings.stripe_secret_key)
+
+session = client.v1.checkout.sessions.create({
+    "mode": "subscription",
+    "customer": tenant.stripe_customer_id,  # or create new
+    "line_items": [{
+        "price": settings.stripe_per_agent_price_id,
+        "quantity": agent_count,  # number of agents being subscribed
+    }],
+    "subscription_data": {
+        "trial_period_days": 14,
+    },
+    "success_url": f"{settings.portal_url}/billing?session_id={{CHECKOUT_SESSION_ID}}",
+    "cancel_url": f"{settings.portal_url}/billing",
+})
+# Return session.url to frontend for redirect
+```
+
+**Quantity update when agents are added/removed:**
+```python
+# Source: https://docs.stripe.com/api/subscription_items/update?lang=python
+
+client.v1.subscription_items.update(
+    tenant.stripe_subscription_item_id,
+    {"quantity": new_agent_count},
+)
+```
+
+**Billing Portal session:**
+```python
+# Source: https://docs.stripe.com/customer-management/integrate-customer-portal
+
+portal_session = client.v1.billing_portal.sessions.create({
+    "customer": tenant.stripe_customer_id,
+    "return_url": f"{settings.portal_url}/billing",
+})
+# Return portal_session.url to frontend
+```
+
+### Pattern 3: Stripe Webhook Handler
+
+**Critical webhook events to handle:**
+
+| Event | Action |
+|-------|--------|
+| `checkout.session.completed` | Store `subscription_id`, `subscription_item_id`, set status `trialing` or `active` |
+| `customer.subscription.created` | Same as above if not using Checkout |
+| `customer.subscription.updated` | Update `subscription_status`, `agent_quota`, `trial_ends_at` |
+| `customer.subscription.deleted` | Set status `canceled`, deactivate all agents |
+| `customer.subscription.trial_will_end` | Send alert email (3 days before trial ends) |
+| `invoice.paid` | Set status `active`, re-enable agents if they were suspended |
+| `invoice.payment_failed` | Set status `past_due`, send payment failure notification |
+
+**FastAPI webhook endpoint:**
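+```python
+# Source: https://docs.stripe.com/webhooks
+
+import stripe
+from fastapi import APIRouter, Depends, HTTPException, Request
+from sqlalchemy.ext.asyncio import AsyncSession  # assumes the project's SQLAlchemy async session
+
+# `get_session` and `settings` are the project's existing DB-session dependency
+# and config object; import them from wherever they live in `shared`.
+
+webhook_router = APIRouter()
+
+@webhook_router.post("/webhooks/stripe")
+async def stripe_webhook(
+    request: Request,
+    session: AsyncSession = Depends(get_session),
+) -> dict[str, str]:
+    # Verify against the raw body; re-serialized JSON will not match the signature
+    payload = await request.body()
+    sig_header = request.headers.get("stripe-signature", "")
+
+    try:
+        # construct_event verifies the signature and returns a stripe.Event
+        event = stripe.Webhook.construct_event(
+            payload, sig_header, settings.stripe_webhook_secret
+        )
+    except ValueError:
+        raise HTTPException(status_code=400, detail="Invalid payload")
+    except stripe.SignatureVerificationError:
+        raise HTTPException(status_code=400, detail="Invalid signature")
+
+    # Idempotency: check if event already processed
+    already_processed = await _check_event_processed(session, event.id)
+    if already_processed:
+        return {"status": "already_processed"}
+
+    # Record and dispatch in the same transaction, so a failed dispatch rolls
+    # back the "processed" marker and Stripe's retry is handled again
+    await _record_event_processed(session, event.id)
+    await _dispatch_event(session, event)
+    return {"status": "ok"}
+```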
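+
+The `_dispatch_event` helper is referenced but not shown. One possible shape for its status-mapping core, following the event table above, is sketched here. `STATUS_BY_EVENT` and `next_status` are hypothetical names; `checkout.session.completed` and `trial_will_end` are left out because they need more than a status change (storing IDs, sending email).
+
+```python
+from typing import Any, Optional
+
+# Event type -> subscription_status to store.
+# None means "take the status from the subscription object in the payload".
+STATUS_BY_EVENT: dict[str, Optional[str]] = {
+    "customer.subscription.updated": None,
+    "customer.subscription.deleted": "canceled",
+    "invoice.paid": "active",
+    "invoice.payment_failed": "past_due",
+}
+
+
+def next_status(event_type: str, obj: dict[str, Any], current: str) -> str:
+    """Return the tenant's new subscription_status for a webhook event."""
+    if event_type not in STATUS_BY_EVENT:
+        return current  # unhandled events leave the status untouched
+    fixed = STATUS_BY_EVENT[event_type]
+    if fixed is not None:
+        return fixed
+    return obj.get("status", current)
+```
+
+Keeping the mapping as data makes it easy to compare against the event table during review.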
+
+**Idempotency table:** Add a `stripe_events` table with `(event_id PRIMARY KEY, processed_at)` — INSERT with ON CONFLICT DO NOTHING; if 0 rows affected, skip processing.
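+
+The insert-or-skip pattern can be tried in isolation. The sketch below uses the standard-library `sqlite3` module purely as a stand-in for PostgreSQL (both accept `ON CONFLICT ... DO NOTHING`); `claim_event` is a hypothetical helper name, and the production version would run through the async session instead.
+
+```python
+import sqlite3
+from datetime import datetime, timezone
+
+
+def claim_event(conn: sqlite3.Connection, event_id: str) -> bool:
+    """Return True if this call inserted the row (first delivery), False on a repeat."""
+    cur = conn.execute(
+        "INSERT INTO stripe_events (event_id, processed_at) VALUES (?, ?) "
+        "ON CONFLICT (event_id) DO NOTHING",
+        (event_id, datetime.now(timezone.utc).isoformat()),
+    )
+    # rowcount is 1 when the row was inserted, 0 when the conflict clause skipped it
+    return cur.rowcount == 1
+
+
+conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
+conn.execute(
+    "CREATE TABLE stripe_events (event_id TEXT PRIMARY KEY, processed_at TEXT NOT NULL)"
+)
+first = claim_event(conn, "evt_123")   # first delivery: row inserted
+second = claim_event(conn, "evt_123")  # redelivery: skipped
+```
+
+Because the insert itself is the check, two concurrent deliveries of the same event cannot both proceed, which a separate SELECT-then-INSERT would not guarantee.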
+
+**Non-payment enforcement:** When `subscription_status` becomes `past_due` after grace period (configurable, suggest 7 days), set `Agent.is_active = False` for all tenant agents. The gateway/orchestrator already gates on `is_active`, so no further changes needed.
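+
+A minimal sketch of the grace-period check, assuming the tenant row records when it entered `past_due` (`past_due_since` is a hypothetical column that the field list above does not include):
+
+```python
+from datetime import datetime, timedelta, timezone
+from typing import Optional
+
+GRACE_PERIOD = timedelta(days=7)  # suggested default from the text; configurable
+
+
+def should_suspend_agents(
+    status: str,
+    past_due_since: Optional[datetime],
+    now: datetime,
+    grace: timedelta = GRACE_PERIOD,
+) -> bool:
+    """True once a tenant has been past_due for at least the grace period."""
+    if status != "past_due" or past_due_since is None:
+        return False
+    return now - past_due_since >= grace
+
+
+now = datetime(2026, 3, 23, tzinfo=timezone.utc)
+suspend = should_suspend_agents("past_due", now - timedelta(days=8), now)
+```
+
+A periodic job could run this per tenant and flip `Agent.is_active` when it returns True.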
+
+### Pattern 4: BYO API Key Encryption (Envelope Encryption)
+
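+**What:** Tenant provides their OpenAI/Anthropic API key. We encrypt it before storing. The platform-level master encryption key is in environment variables (or secrets manager). Note that the service in this pattern encrypts each key directly under the platform key; strict envelope encryption would add per-tenant data keys wrapped by the master key.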
+
+**Important:** Fernet uses AES-128-CBC + HMAC-SHA256, NOT AES-256. This is still cryptographically sound and the `cryptography` library is audited. CLAUDE.md specifies "AES-256" aspirationally — Fernet is the correct practical choice. Document this tradeoff in ADR-005.
+
+**Schema — new table `tenant_llm_keys`:**
+```sql
+CREATE TABLE tenant_llm_keys (
+ id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+ tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
+ provider TEXT NOT NULL, -- 'openai' | 'anthropic' | 'custom'
+ label TEXT NOT NULL, -- human-readable name
+ encrypted_key TEXT NOT NULL,
+ key_version INT NOT NULL DEFAULT 1, -- for rotation tracking
+ created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
+ UNIQUE(tenant_id, provider) -- one key per provider per tenant
+);
+-- RLS enabled: same pattern as agents table
+```
+
+**Encryption service:**
+```python
+# Source: https://cryptography.io/en/latest/fernet/
+
+from cryptography.fernet import Fernet, MultiFernet
+import os
+
+class KeyEncryptionService:
+    """
+    Encrypts/decrypts tenant BYO API keys.
+
+    PLATFORM_ENCRYPTION_KEY env var must be a URL-safe base64 Fernet key.
+    For rotation: PLATFORM_ENCRYPTION_KEY_PREVIOUS holds the prior key.
+    """
+
+    def __init__(self) -> None:
+        primary = Fernet(os.environ["PLATFORM_ENCRYPTION_KEY"])
+        # MultiFernet encrypts with the first key and tries each key in order on decrypt
+        keys = [primary]
+        if prev := os.environ.get("PLATFORM_ENCRYPTION_KEY_PREVIOUS"):
+            keys.append(Fernet(prev))
+        self._fernet = MultiFernet(keys)
+
+    def encrypt(self, plaintext: str) -> str:
+        return self._fernet.encrypt(plaintext.encode()).decode()
+
+    def decrypt(self, ciphertext: str) -> str:
+        return self._fernet.decrypt(ciphertext.encode()).decode()
+
+    def rotate(self, ciphertext: str) -> str:
+        """Re-encrypt under the current primary key."""
+        return self._fernet.rotate(ciphertext.encode()).decode()
+```
+
+**Key generation for setup:**
+```bash
+python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
+```
+
+**LiteLLM integration:** When routing LLM calls, check if the tenant has a BYO key for the requested provider. If yes, decrypt and inject into the LiteLLM call. Never log the decrypted key.
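+
+The lookup order can be sketched as a small pure function. Everything here is illustrative: `resolve_api_key` is a hypothetical name, `tenant_keys` stands in for rows from `tenant_llm_keys`, and `decrypt` would be `KeyEncryptionService.decrypt` in practice. The returned value is what would be passed as the `api_key` argument of the LiteLLM call.
+
+```python
+from typing import Callable, Mapping, Optional
+
+
+def resolve_api_key(
+    provider: str,
+    tenant_keys: Mapping[str, str],    # provider -> encrypted BYO key
+    platform_keys: Mapping[str, str],  # provider -> platform default key
+    decrypt: Callable[[str], str],
+) -> Optional[str]:
+    """Prefer the tenant's BYO key; fall back to the platform key."""
+    encrypted = tenant_keys.get(provider)
+    if encrypted is not None:
+        # Decrypt only at call time; never log or cache the plaintext
+        return decrypt(encrypted)
+    return platform_keys.get(provider)
+
+
+# Toy "decryption" (string reversal) so the example runs without real keys
+fake_decrypt = lambda s: s[::-1]
+key = resolve_api_key("openai", {"openai": "yek-oyb"}, {"openai": "platform"}, fake_decrypt)
+```
+
+Passing `decrypt` in as a callable keeps the routing logic testable without a real encryption key in the environment.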
+
+### Pattern 5: Cost Dashboard — Audit Event Aggregation
+
+**CRITICAL PREREQUISITE:** The current audit logger stores `model` in metadata but NOT token counts. The runner.py `log_llm_call` metadata must be extended before the cost dashboard can work.
+
+**Required metadata fields to add to `log_llm_call`:**
+```python
+# In orchestrator/agents/runner.py — extend existing metadata dict:
+metadata={
+    "model": data.get("model", agent.model_preference),
+    "provider": _extract_provider(data.get("model", "")),  # "openai" | "anthropic" | "ollama"
+    "prompt_tokens": usage.get("prompt_tokens", 0),
+    "completion_tokens": usage.get("completion_tokens", 0),
+    "total_tokens": usage.get("total_tokens", 0),
+    "cost_usd": _calculate_cost(model, usage),  # pre-calculated, stored as float
+    "iteration": iteration,
+    "tool_calls_count": len(response_tool_calls),
+}
+```
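+
+The snippet above calls `_extract_provider` and `_calculate_cost` without defining them. The bodies below are hypothetical and only show one plausible shape: they assume model names follow a `provider/model` convention, and the rate table holds placeholder numbers, not real provider prices.
+
+```python
+from typing import Mapping
+
+# Placeholder USD rates per 1M tokens as (prompt, completion). NOT real prices.
+PLACEHOLDER_RATES: dict[str, tuple[float, float]] = {
+    "example/model-a": (1.00, 2.00),
+}
+
+
+def _extract_provider(model: str) -> str:
+    """Take the prefix of a "provider/model" string; "unknown" if there is none."""
+    provider, sep, _ = model.partition("/")
+    return provider if sep else "unknown"
+
+
+def _calculate_cost(model: str, usage: Mapping[str, int]) -> float:
+    """Cost in USD from token counts; models without a rate entry cost 0.0."""
+    prompt_rate, completion_rate = PLACEHOLDER_RATES.get(model, (0.0, 0.0))
+    total = (
+        usage.get("prompt_tokens", 0) * prompt_rate
+        + usage.get("completion_tokens", 0) * completion_rate
+    )
+    return total / 1_000_000
+```
+
+Storing the computed cost at write time, as the source proposes, means later price changes do not rewrite historical totals.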
+
+**Dashboard aggregation query:**
+```sql
+-- Token usage per agent for a time range
+SELECT
+ agent_id,
+ SUM((metadata->>'prompt_tokens')::int) AS prompt_tokens,
+ SUM((metadata->>'completion_tokens')::int) AS completion_tokens,
+ SUM((metadata->>'total_tokens')::int) AS total_tokens,
+ SUM((metadata->>'cost_usd')::float) AS cost_usd,
+ COUNT(*) AS llm_call_count
+FROM audit_events
+WHERE
+ tenant_id = :tenant_id
+ AND action_type = 'llm_call'
+ AND created_at >= :start_date
+ AND created_at < :end_date
+GROUP BY agent_id;
+
+-- Cost by provider
+SELECT
+    metadata->>'provider' AS provider,
+    SUM((metadata->>'cost_usd')::float) AS cost_usd,
+    COUNT(*) AS call_count
+FROM audit_events
+WHERE
+    tenant_id = :tenant_id
+    AND action_type = 'llm_call'
+    AND created_at >= :start_date
+    AND created_at < :end_date
+GROUP BY metadata->>'provider';
+
+-- Volume by channel. Note: this counts llm_call events as a proxy for messages,
+-- and it requires a `channel` key in metadata, which log_llm_call must also write.
+SELECT
+    metadata->>'channel' AS channel,
+    COUNT(*) AS message_count
+FROM audit_events
+WHERE
+    tenant_id = :tenant_id
+    AND action_type = 'llm_call'
+    AND created_at >= :start_date
+    AND created_at < :end_date
+GROUP BY metadata->>'channel';
+```
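+
+For the budget alerts in AGNT-07, the per-agent `cost_usd` total from the first query can be compared against the configured limit. A minimal sketch follows; `budget_status` is a hypothetical name and the 80% warning threshold is an assumption, not a decided value.
+
+```python
+from typing import Optional
+
+
+def budget_status(
+    spent_usd: float,
+    budget_usd: Optional[float],
+    warn_ratio: float = 0.8,  # assumed "approaching" threshold
+) -> str:
+    """Classify spend against a budget for the dashboard's visual indicator."""
+    if budget_usd is None or budget_usd <= 0:
+        return "no_budget"
+    if spent_usd >= budget_usd:
+        return "exceeded"
+    if spent_usd >= budget_usd * warn_ratio:
+        return "approaching"
+    return "ok"
+```
+
+The three non-trivial states map directly onto the "approaching or exceeding" indicator described in the locked decisions.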
+
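+**Index required** (`CREATE INDEX CONCURRENTLY` cannot run inside a transaction block, so run these outside the migration's transaction):
+```sql
+CREATE INDEX CONCURRENTLY idx_audit_events_tenant_type_created
+    ON audit_events (tenant_id, action_type, created_at DESC);
+
+-- Optional GIN index: speeds up containment/existence filters on metadata (@>, ?).
+-- It does not help the ->> extractions and SUMs above, which rely on the B-tree index.
+CREATE INDEX CONCURRENTLY idx_audit_events_metadata
+    ON audit_events USING GIN (metadata);
+```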
+
+**Time range options (Claude's discretion):** Offer Last 7 days / Last 30 days / This month / Custom range. Default to Last 30 days. Use a simple `