| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 03-operator-experience | 01 | execute | 1 | | | true | | |
Purpose: Every portal UI feature in Phase 3 depends on backend APIs and database schema. This plan ships all backend infrastructure so Plans 02-04 can focus on frontend.

Output: New DB tables/fields, billing + channel + LLM key + usage API endpoints, encryption service, enhanced audit logger, and comprehensive test scaffolds.
<execution_context> @/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md @/home/adelorenzo/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/03-operator-experience/03-CONTEXT.md @.planning/phases/03-operator-experience/03-RESEARCH.md

From packages/shared/shared/models/tenant.py:

```python
class Tenant(Base):
    __tablename__ = "tenants"
    id: Mapped[uuid.UUID]
    name: Mapped[str]
    slug: Mapped[str]
    settings: Mapped[dict[str, Any]]
    created_at: Mapped[datetime]
    updated_at: Mapped[datetime]
    agents: Mapped[list[Agent]] = relationship(...)
    channel_connections: Mapped[list[ChannelConnection]] = relationship(...)


class Agent(Base):
    __tablename__ = "agents"
    id: Mapped[uuid.UUID]
    tenant_id: Mapped[uuid.UUID]
    name: Mapped[str]
    role: Mapped[str]
    is_active: Mapped[bool]
    # ... other fields


class ChannelConnection(Base):
    __tablename__ = "channel_connections"
    # channel_type, workspace_id, tenant_id, config (JSON), is_active
```

From packages/shared/shared/models/audit.py:

```python
class AuditEvent(AuditBase):
    __tablename__ = "audit_events"
    id: Mapped[uuid.UUID]
    tenant_id: Mapped[uuid.UUID]
    agent_id: Mapped[uuid.UUID | None]
    action_type: Mapped[str]  # "llm_call" | "tool_invocation" | "escalation"
    metadata: Mapped[dict[str, Any]]  # JSONB
    created_at: Mapped[datetime]
```

From packages/shared/shared/config.py:

```python
class Settings(BaseSettings):
    # existing: database_url, redis_url, slack_bot_token, slack_signing_secret, ...
```

From packages/orchestrator/orchestrator/audit/logger.py:

```python
class AuditLogger:
    async def log_llm_call(self, tenant_id, agent_id, user_id, input_summary, output_summary, latency_ms, metadata=None): ...
    async def log_tool_call(self, tool_name, args, result, tenant_id, agent_id, latency_ms, error=None): ...
    async def log_escalation(self, ...): ...
```

From packages/shared/shared/api/portal.py:

```python
portal_router = APIRouter(prefix="/api/portal")
# Existing: /auth/verify, /auth/register, /tenants CRUD, /tenants/{id}/agents CRUD
```
3. Create `packages/shared/shared/models/billing.py`:
- `TenantLlmKey` model: id (UUID PK), tenant_id (FK tenants.id CASCADE), provider (TEXT NOT NULL), label (TEXT NOT NULL), encrypted_key (TEXT NOT NULL), key_version (INT DEFAULT 1), created_at. UNIQUE(tenant_id, provider). Use AuditBase (same as audit_events — separate declarative base).
- `StripeEvent` model: event_id (TEXT PK), processed_at (TIMESTAMPTZ DEFAULT now()). Use Base from tenant.py.
- Note: tenant_llm_keys needs RLS enabled (same pattern as agents table).
4. Add billing fields to `Tenant` model in tenant.py:
- stripe_customer_id: Mapped[str | None] (String(255), nullable=True)
- stripe_subscription_id: Mapped[str | None] (String(255), nullable=True)
- stripe_subscription_item_id: Mapped[str | None] (String(255), nullable=True)
- subscription_status: Mapped[str] (String(50), default="none") — values: none, trialing, active, past_due, canceled, unpaid
- trial_ends_at: Mapped[datetime | None] (DateTime(timezone=True), nullable=True)
- agent_quota: Mapped[int] (Integer, default=0)
5. Add budget field to `Agent` model in tenant.py:
- budget_limit_usd: Mapped[float | None] (Float, nullable=True, default=None) — NULL means no limit
6. Create `packages/shared/shared/crypto.py` — KeyEncryptionService:
- Uses MultiFernet with PLATFORM_ENCRYPTION_KEY (required) and PLATFORM_ENCRYPTION_KEY_PREVIOUS (optional)
- Methods: encrypt(plaintext: str) -> str, decrypt(ciphertext: str) -> str, rotate(ciphertext: str) -> str
- See research Pattern 4 for exact implementation
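Research Pattern 4 is not reproduced in this plan. As a rough sketch only, assuming the `cryptography` package's `Fernet` and `MultiFernet` and url-safe base64 Fernet keys in both settings (for example from `Fernet.generate_key()`), the service might look like this. Re-raising `InvalidToken` as `ValueError` is a choice of this sketch, not something the plan specifies.

```python
from __future__ import annotations

from cryptography.fernet import Fernet, InvalidToken, MultiFernet


class KeyEncryptionService:
    """Sketch only: MultiFernet-backed encryption for BYO API keys."""

    def __init__(self, primary_key: str, previous_key: str = "") -> None:
        if not primary_key:
            raise ValueError("PLATFORM_ENCRYPTION_KEY must be set")
        # The first Fernet encrypts; MultiFernet tries each key in order on decrypt.
        fernets = [Fernet(primary_key)]
        if previous_key:
            fernets.append(Fernet(previous_key))
        self._fernet = MultiFernet(fernets)

    def encrypt(self, plaintext: str) -> str:
        return self._fernet.encrypt(plaintext.encode()).decode()

    def decrypt(self, ciphertext: str) -> str:
        try:
            return self._fernet.decrypt(ciphertext.encode()).decode()
        except InvalidToken as exc:
            raise ValueError("ciphertext is invalid or was encrypted with an unknown key") from exc

    def rotate(self, ciphertext: str) -> str:
        # Decrypts with whichever configured key matches, then re-encrypts under the primary key.
        try:
            return self._fernet.rotate(ciphertext.encode()).decode()
        except InvalidToken as exc:
            raise ValueError("ciphertext cannot be rotated: no configured key matches") from exc
```

Under this sketch, one plausible rotation is: move the current key into PLATFORM_ENCRYPTION_KEY_PREVIOUS, set a fresh PLATFORM_ENCRYPTION_KEY, run `rotate()` over the stored `encrypted_key` values (incrementing `key_version`), then clear the previous key once no row depends on it.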
7. Add to `packages/shared/shared/config.py`:
- platform_encryption_key: str = Field(default="", description="Fernet key for BYO API key encryption")
- platform_encryption_key_previous: str = Field(default="", description="Previous Fernet key for rotation")
- stripe_secret_key: str = Field(default="", description="Stripe secret API key")
- stripe_webhook_secret: str = Field(default="", description="Stripe webhook endpoint secret")
- stripe_per_agent_price_id: str = Field(default="", description="Stripe Price ID for per-agent monthly plan")
- portal_url: str = Field(default="http://localhost:3000", description="Portal base URL for Stripe redirects")
- slack_client_id: str = Field(default="", description="Slack OAuth app client ID")
- slack_client_secret: str = Field(default="", description="Slack OAuth app client secret")
- slack_oauth_redirect_uri: str = Field(default="http://localhost:3000/api/slack/callback", description="Slack OAuth redirect URI")
- oauth_state_secret: str = Field(default="", description="HMAC secret for OAuth state parameter signing")
8. Create Alembic migration `005_billing_and_usage.py`:
- ADD COLUMNS to tenants: stripe_customer_id, stripe_subscription_id, stripe_subscription_item_id, subscription_status, trial_ends_at, agent_quota
- ADD COLUMN to agents: budget_limit_usd
- CREATE TABLE tenant_llm_keys with RLS enabled (same FORCE ROW LEVEL SECURITY pattern as agents)
- CREATE TABLE stripe_events (event_id TEXT PK, processed_at TIMESTAMPTZ DEFAULT now())
- CREATE INDEX idx_audit_events_tenant_type_created ON audit_events (tenant_id, action_type, created_at DESC)
- GRANT SELECT, INSERT, DELETE on tenant_llm_keys to konstruct_app (DELETE needed for key removal)
- GRANT SELECT, INSERT on stripe_events to konstruct_app
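The RLS policy already used on `agents` is not shown in this plan, so the sketch below assumes the common pattern of comparing `tenant_id` against a per-session setting. The setting name `app.current_tenant`, the policy name `tenant_isolation`, and both helper functions are hypothetical and should be aligned with whatever the `agents` migration actually uses. The statements are plain strings so they could be passed to Alembic's `op.execute()` inside `upgrade()`.

```python
from __future__ import annotations

# Hypothetical defaults: match these to the existing agents-table policy.
TENANT_SETTING = "app.current_tenant"
APP_ROLE = "konstruct_app"


def rls_statements(table: str, setting: str = TENANT_SETTING) -> list[str]:
    """DDL that enables and forces tenant-scoped row level security on `table`."""
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}')::uuid)",
    ]


def migration_005_raw_sql() -> list[str]:
    """Raw-SQL parts of migration 005; column and table creation (op.add_column etc.) omitted."""
    return [
        *rls_statements("tenant_llm_keys"),
        "CREATE INDEX idx_audit_events_tenant_type_created "
        "ON audit_events (tenant_id, action_type, created_at DESC)",
        f"GRANT SELECT, INSERT, DELETE ON tenant_llm_keys TO {APP_ROLE}",
        f"GRANT SELECT, INSERT ON stripe_events TO {APP_ROLE}",
    ]


# Inside the Alembic revision this might be used roughly as:
#   def upgrade() -> None:
#       ...create tables / add columns...
#       for stmt in migration_005_raw_sql():
#           op.execute(stmt)
```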
9. Write test scaffolds:
- tests/unit/test_key_encryption.py — test encrypt/decrypt roundtrip, rotation, invalid ciphertext
- tests/unit/test_budget_alerts.py — test threshold logic (no limit, under 80%, at 80%, at 100%+)
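The threshold logic this scaffold exercises fits in a small pure function. The cut-offs follow the budget-alerts endpoint in Task 2 ("ok" below 80%, "warning" from 80%, "exceeded" at 100% or more); `budget_status` is a hypothetical name, and returning `None` for a NULL `budget_limit_usd` is an assumption. Comparing spend against `0.8 * limit` instead of dividing by the limit also avoids a division by zero for a 0 USD budget.

```python
from __future__ import annotations

WARNING_RATIO = 0.8  # 80% of the monthly limit


def budget_status(spent_usd: float, limit_usd: float | None) -> str | None:
    """Classify current-month spend against an agent's budget_limit_usd.

    Returns None when no limit is configured (NULL budget_limit_usd).
    """
    if limit_usd is None:
        return None
    if spent_usd >= limit_usd:
        return "exceeded"
    if spent_usd >= limit_usd * WARNING_RATIO:
        return "warning"
    return "ok"
```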
cd /home/adelorenzo/repos/konstruct && pytest tests/unit/test_key_encryption.py tests/unit/test_budget_alerts.py -x -v
- stripe and cryptography in shared pyproject.toml, recharts and @stripe/stripe-js in portal package.json
- Tenant model has billing fields, Agent model has budget_limit_usd
- TenantLlmKey and StripeEvent models exist in billing.py
- KeyEncryptionService passes encrypt/decrypt/rotate tests
- Budget alert threshold logic passes at all levels
- Alembic migration 005 exists with all schema changes (including DELETE grant on tenant_llm_keys)
- Config has all new settings fields
Task 2: Backend API endpoints — channels, billing, usage aggregation, and audit logger enhancement
packages/shared/shared/api/channels.py,
packages/shared/shared/api/billing.py,
packages/shared/shared/api/usage.py,
packages/shared/shared/api/portal.py,
packages/orchestrator/orchestrator/agents/runner.py,
tests/unit/test_slack_oauth.py,
tests/unit/test_stripe_webhooks.py,
tests/unit/test_usage_aggregation.py
- test_generate_oauth_state: generate_oauth_state(tenant_id, secret) produces base64-encoded string containing tenant_id
- test_verify_oauth_state_valid: verify_oauth_state(valid_state, secret) returns correct tenant_id
- test_verify_oauth_state_tampered: verify_oauth_state(tampered_state, secret) raises ValueError
- test_stripe_webhook_idempotency: processing same event_id twice returns "already_processed" on second call
- test_stripe_subscription_updated: customer.subscription.updated event updates tenant subscription_status
- test_stripe_cancellation: customer.subscription.deleted event sets status=canceled and deactivates agents
- test_usage_group_by_agent: aggregation query groups prompt_tokens, completion_tokens, cost_usd by agent_id
- test_usage_group_by_provider: aggregation query groups cost_usd by provider
1. Create `packages/shared/shared/api/channels.py`:
- `generate_oauth_state(tenant_id: str, secret: str) -> str` — HMAC-SHA256 signed state with nonce (see research Pattern 1)
- `verify_oauth_state(state: str, secret: str) -> str` — returns tenant_id or raises ValueError
- `GET /api/portal/channels/slack/install?tenant_id={id}` — generates state, returns Slack OAuth authorize URL with scopes: app_mentions:read,channels:read,channels:history,chat:write,im:read,im:write,im:history
- `GET /api/portal/channels/slack/callback?code={code}&state={state}`: verifies state, exchanges the code via POST to https://slack.com/api/oauth.v2.access, encrypts the bot token (the `access_token` field of the response) with KeyEncryptionService, and stores it in channel_connections (channel_type="slack", workspace_id=team.id, config={bot_token, bot_user_id, team_name})
- `POST /api/portal/channels/whatsapp/connect` — accepts {tenant_id, phone_number_id, waba_id, system_user_token}, validates by calling GET https://graph.facebook.com/v22.0/{phone_number_id} with token, encrypts token, stores in channel_connections
- `POST /api/portal/channels/{tenant_id}/test` — accepts {channel_type}, loads channel_connection for tenant, sends test message ("Konstruct connected successfully") via the appropriate channel SDK, returns success/failure
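Research Pattern 1 is not reproduced in this plan, so the following is only one way to build an HMAC-SHA256 signed state with a nonce using the standard library. The `tenant_id:nonce:issued_at:signature` layout, the 10-minute expiry, and the `build_slack_install_url` helper are assumptions; the authorize endpoint and scope list come from the steps above. The nonce makes each state unique, but blocking replay within the expiry window would additionally require remembering used nonces server-side, which this sketch does not do.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import secrets
import time
from urllib.parse import urlencode

STATE_MAX_AGE_SECONDS = 600  # assumption: a state is valid for 10 minutes
SLACK_SCOPES = "app_mentions:read,channels:read,channels:history,chat:write,im:read,im:write,im:history"


def _sign(payload: str, secret: str) -> str:
    return hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()


def generate_oauth_state(tenant_id: str, secret: str) -> str:
    """Return base64url('tenant_id:nonce:issued_at:hmac')."""
    payload = f"{tenant_id}:{secrets.token_urlsafe(16)}:{int(time.time())}"
    return base64.urlsafe_b64encode(f"{payload}:{_sign(payload, secret)}".encode()).decode()


def verify_oauth_state(state: str, secret: str, max_age: int = STATE_MAX_AGE_SECONDS) -> str:
    """Return the tenant_id embedded in `state`, or raise ValueError."""
    try:
        decoded = base64.urlsafe_b64decode(state.encode()).decode()
        tenant_id, nonce, issued_at, signature = decoded.rsplit(":", 3)
        issued = int(issued_at)
    except ValueError as exc:  # covers binascii.Error, UnicodeDecodeError, bad unpacking
        raise ValueError("malformed OAuth state") from exc
    expected = _sign(f"{tenant_id}:{nonce}:{issued_at}", secret)
    # Compare as bytes so tampered, non-ASCII signatures cannot raise TypeError.
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("OAuth state signature mismatch")
    if time.time() - issued > max_age:
        raise ValueError("OAuth state expired")
    return tenant_id


def build_slack_install_url(client_id: str, redirect_uri: str, state: str) -> str:
    query = urlencode(
        {"client_id": client_id, "scope": SLACK_SCOPES, "redirect_uri": redirect_uri, "state": state}
    )
    return f"https://slack.com/oauth/v2/authorize?{query}"
```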
2. Create `packages/shared/shared/api/billing.py`:
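- `POST /api/portal/billing/checkout`: accepts {tenant_id, agent_count}, creates a Stripe Customer if none exists (lazy creation per research recommendation), creates a Checkout Session with mode="subscription", line_items=[{price: stripe_per_agent_price_id, quantity: agent_count}] and subscription_data.trial_period_days=14 (Checkout takes the trial under subscription_data, not as a top-level parameter), and returns session.url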
- `POST /api/portal/billing/portal` — accepts {tenant_id}, creates Stripe Billing Portal session, returns portal_session.url
- `POST /api/webhooks/stripe`: Stripe webhook handler. Reads the raw body with `await request.body()` (signature verification needs the exact bytes Stripe sent, not re-serialized JSON), verifies the signature with stripe.Webhook.construct_event(), checks idempotency via the StripeEvent table (INSERT ON CONFLICT DO NOTHING), and dispatches to a handler per event type:
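- checkout.session.completed: store subscription_id (the session's `subscription` field) and subscription_item_id (read from the subscription's items, which requires retrieving the subscription), set status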
- customer.subscription.updated: update subscription_status, agent_quota, trial_ends_at
- customer.subscription.deleted: set status=canceled, set Agent.is_active=False for all tenant agents
- invoice.paid: set status=active, re-enable agents
- invoice.payment_failed: set status=past_due
- Use StripeClient pattern (not legacy stripe.api_key): `client = stripe.StripeClient(api_key=settings.stripe_secret_key)`
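The idempotency check can be sketched independently of Stripe itself. This version uses the standard library's `sqlite3` only so it runs standalone; SQLite 3.24+ accepts the same `INSERT ... ON CONFLICT DO NOTHING` form as PostgreSQL. The handler registry, the "processed" return value, and the rollback-on-failure behaviour are assumptions of the sketch (only "already_processed" comes from the plan). Rolling back the `stripe_events` insert when a handler raises means a redelivery of the same event is processed again instead of being skipped as a duplicate.

```python
from __future__ import annotations

import sqlite3
from typing import Any, Callable, Mapping

Handler = Callable[[dict[str, Any]], None]

SCHEMA = (
    "CREATE TABLE IF NOT EXISTS stripe_events ("
    "event_id TEXT PRIMARY KEY, processed_at TEXT DEFAULT CURRENT_TIMESTAMP)"
)


def record_event_once(conn: sqlite3.Connection, event_id: str) -> bool:
    """Insert event_id; True if newly recorded, False if it was already there."""
    cur = conn.execute(
        "INSERT INTO stripe_events (event_id) VALUES (?) ON CONFLICT (event_id) DO NOTHING",
        (event_id,),
    )
    return cur.rowcount == 1


def process_stripe_event(
    conn: sqlite3.Connection, event: Mapping[str, Any], handlers: Mapping[str, Handler]
) -> str:
    """Dispatch an already signature-verified Stripe event at most once per event id."""
    if not record_event_once(conn, event["id"]):
        conn.rollback()  # close the implicit transaction opened by the no-op INSERT
        return "already_processed"
    try:
        handler = handlers.get(event["type"])
        if handler is not None:  # unhandled event types are recorded and ignored
            handler(event["data"]["object"])
    except Exception:
        conn.rollback()  # forget the event id so a redelivery is processed
        raise
    conn.commit()
    return "processed"
```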
3. Create `packages/shared/shared/api/usage.py`:
- `GET /api/portal/usage/{tenant_id}/summary?start_date={}&end_date={}` — returns per-agent token usage and cost (SQL aggregate on audit_events WHERE action_type='llm_call', GROUP BY agent_id). Use CAST(:metadata AS jsonb) pattern for asyncpg.
- `GET /api/portal/usage/{tenant_id}/by-provider?start_date={}&end_date={}` — returns cost grouped by provider
- `GET /api/portal/usage/{tenant_id}/message-volume?start_date={}&end_date={}` — returns message count grouped by channel
- `GET /api/portal/usage/{tenant_id}/budget-alerts` — for each agent with budget_limit_usd, compare current month cost_usd sum against limit. Return status: "ok" (<80%), "warning" (80-99%), "exceeded" (>=100%).
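- These queries filter on tenant_id, action_type, and created_at, so they should be served by the composite index idx_audit_events_tenant_type_created from migration 005.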
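The per-agent summary might be expressed roughly as below. The SQL assumes the metadata keys written by the audit-logger change in step 5 and a half-open `[start_date, end_date)` window, both assumptions. `aggregate_usage` is a hypothetical pure-Python mirror of the same grouping (it expects rows already filtered to one tenant), which could be handy for mock-data unit tests that should not need a database.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterable

# PostgreSQL text for the /summary endpoint (bind params in SQLAlchemy's :name style).
PER_AGENT_SUMMARY_SQL = """
SELECT agent_id,
       SUM(CAST(metadata->>'prompt_tokens' AS BIGINT))     AS prompt_tokens,
       SUM(CAST(metadata->>'completion_tokens' AS BIGINT)) AS completion_tokens,
       SUM(CAST(metadata->>'cost_usd' AS NUMERIC))         AS cost_usd
FROM audit_events
WHERE tenant_id = :tenant_id
  AND action_type = 'llm_call'
  AND created_at >= :start_date
  AND created_at < :end_date
GROUP BY agent_id
"""


def aggregate_usage(events: Iterable[dict[str, Any]], by: str) -> dict[str, dict[str, float]]:
    """Group llm_call events by 'agent' (agent_id) or 'provider' (metadata.provider)."""
    if by not in ("agent", "provider"):
        raise ValueError("by must be 'agent' or 'provider'")
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0}
    )
    for event in events:
        if event["action_type"] != "llm_call":
            continue  # same filter as the WHERE clause
        meta = event["metadata"]
        key = str(event["agent_id"]) if by == "agent" else meta.get("provider", "unknown")
        bucket = totals[key]
        bucket["prompt_tokens"] += meta.get("prompt_tokens", 0)
        bucket["completion_tokens"] += meta.get("completion_tokens", 0)
        bucket["cost_usd"] += meta.get("cost_usd", 0.0)
    return dict(totals)
```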
4. Register the new routers in the appropriate main.py files: add channels_router, billing_router, llm_keys_router, and usage_router to the FastAPI app. The Stripe webhook route should live on a separate prefix (/api/webhooks/stripe) without portal auth; it is authenticated by the Stripe signature check instead.
5. Enhance audit logger — in `packages/orchestrator/orchestrator/agents/runner.py`, extend the metadata dict passed to `log_llm_call()` to include:
- prompt_tokens: extracted from LiteLLM response usage object
- completion_tokens: extracted from LiteLLM response usage object
- total_tokens: prompt + completion
- cost_usd: use litellm.completion_cost() if available, otherwise estimate from model pricing table
- provider: extract from model string (e.g., "anthropic/claude-sonnet-4" -> "anthropic")
These fields are CRITICAL — the cost dashboard (Plan 04) queries them from audit_events.metadata JSONB.
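A sketch of the metadata extension, assuming LiteLLM's `litellm.completion_cost(completion_response=...)` for pricing and the `usage.prompt_tokens` / `usage.completion_tokens` fields on its responses. `build_llm_call_metadata`, the injectable `cost_fn`, the `(input, output)` per-million-token shape of the fallback pricing table, and mapping un-prefixed model names to `"unknown"` are all assumptions, not existing code. Injecting `cost_fn` keeps the sketch testable without LiteLLM installed.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping


def _litellm_cost(response: Any) -> float:
    import litellm  # lazy import: the fallback path still works if LiteLLM is absent

    return float(litellm.completion_cost(completion_response=response))


def provider_from_model(model: str) -> str:
    """'anthropic/claude-sonnet-4' -> 'anthropic'; un-prefixed names -> 'unknown' (assumption)."""
    return model.split("/", 1)[0] if "/" in model else "unknown"


def _field(obj: Any, name: str) -> Any:
    # LiteLLM responses allow attribute access; plain dicts are accepted for testing.
    return obj.get(name) if isinstance(obj, Mapping) else getattr(obj, name, None)


def build_llm_call_metadata(
    response: Any,
    model: str,
    *,
    cost_fn: Callable[[Any], float] = _litellm_cost,
    pricing: Mapping[str, tuple[float, float]] | None = None,
) -> dict[str, Any]:
    """Token and cost fields to merge into the metadata passed to log_llm_call()."""
    usage = _field(response, "usage")
    prompt = int(_field(usage, "prompt_tokens") or 0) if usage is not None else 0
    completion = int(_field(usage, "completion_tokens") or 0) if usage is not None else 0
    try:
        cost = float(cost_fn(response))
    except Exception:
        # Hypothetical fallback table: model -> (USD per 1M input tokens, USD per 1M output tokens).
        in_rate, out_rate = (pricing or {}).get(model, (0.0, 0.0))
        cost = (prompt * in_rate + completion * out_rate) / 1_000_000
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
        "cost_usd": cost,
        "provider": provider_from_model(model),
    }
```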
6. Write test files:
- tests/unit/test_slack_oauth.py — test state generation, verification, and tampered state rejection
- tests/unit/test_stripe_webhooks.py — test idempotency (duplicate event skipped), subscription update, cancellation with agent deactivation
- tests/unit/test_usage_aggregation.py — test per-agent grouping and per-provider grouping with mock audit data
cd /home/adelorenzo/repos/konstruct && pytest tests/unit/test_slack_oauth.py tests/unit/test_stripe_webhooks.py tests/unit/test_usage_aggregation.py -x -v
- Slack OAuth install URL generation and callback exchange work (state HMAC verified)
- WhatsApp manual connect endpoint validates token and stores encrypted
- Test message endpoint sends via appropriate channel
- Stripe checkout session creation, billing portal session, and webhook handler all functional
- Webhook idempotency prevents duplicate processing
- Subscription cancellation deactivates all tenant agents
- Usage aggregation returns per-agent and per-provider data from audit_events
- Budget alerts return correct status for each threshold level
- Audit logger now includes token counts and cost in LLM call metadata
- All unit tests pass
Task 3: LLM key CRUD API endpoints
packages/shared/shared/api/llm_keys.py,
tests/unit/test_llm_keys_crud.py
- test_create_llm_key: POST with {provider, label, api_key} encrypts the key and returns {id, provider, label, key_hint, created_at} (no key material in the response)
- test_list_llm_keys_redacted: GET returns list with provider, label, created_at, key_hint (last 4 chars) — never the full key
- test_delete_llm_key: DELETE removes key, subsequent GET no longer includes it
- test_create_duplicate_provider: POST with same tenant_id+provider returns 409 Conflict (UNIQUE constraint)
- test_delete_nonexistent_key: DELETE with unknown key_id returns 404
1. Create `packages/shared/shared/api/llm_keys.py`:
- `llm_keys_router = APIRouter(prefix="/api/portal/tenants/{tenant_id}/llm-keys")`
- `GET /api/portal/tenants/{tenant_id}/llm-keys`: list all BYO keys for the tenant. Return a list of {id, provider, label, key_hint, created_at}. key_hint = last 4 characters of the original key, computed at creation and stored in the key_hint column (step 2 below). NEVER decrypt the key for listing.
- `POST /api/portal/tenants/{tenant_id}/llm-keys` — accepts {provider: str, label: str, api_key: str}. Encrypt api_key using KeyEncryptionService.encrypt(). Store key_hint (last 4 chars of api_key) for display. Insert into tenant_llm_keys table. Return {id, provider, label, key_hint, created_at} with 201 status. Handle UNIQUE(tenant_id, provider) conflict -> return 409.
- `DELETE /api/portal/tenants/{tenant_id}/llm-keys/{key_id}` — delete the key row. Verify tenant_id matches to prevent cross-tenant deletion. Return 204 on success, 404 if not found.
- Use the same dependency injection pattern as existing portal endpoints (get_db session, tenant authorization).
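The CRUD semantics above (encrypt on write, `key_hint` on read, 409 on a duplicate provider, 404 for a missing or foreign key) can be sketched without FastAPI or a database. `InMemoryLlmKeyRepo`, `ConflictError`, and `NotFoundError` are hypothetical stand-ins: in the real endpoints the duplicate check would come from catching the database's UNIQUE(tenant_id, provider) violation, and the exceptions would map to HTTP 409 and 404 responses. Rejecting keys shorter than 8 characters is also an assumption, added so `key_hint` can never reveal a whole key.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


class ConflictError(Exception):
    """Duplicate (tenant_id, provider); maps to HTTP 409."""


class NotFoundError(Exception):
    """Unknown key_id or key owned by another tenant; maps to HTTP 404."""


@dataclass
class _KeyRow:
    id: uuid.UUID
    tenant_id: str
    provider: str
    label: str
    encrypted_key: str
    key_hint: str
    created_at: datetime


class InMemoryLlmKeyRepo:
    def __init__(self, encrypt: Callable[[str], str]) -> None:
        self._encrypt = encrypt  # e.g. a KeyEncryptionService instance's encrypt method
        self._rows: dict[uuid.UUID, _KeyRow] = {}

    def create(self, tenant_id: str, provider: str, label: str, api_key: str) -> dict[str, Any]:
        if len(api_key) < 8:
            raise ValueError("api_key too short")  # hypothetical guard: key_hint must not equal the key
        if any(r.tenant_id == tenant_id and r.provider == provider for r in self._rows.values()):
            raise ConflictError(f"{provider} key already exists for tenant")
        row = _KeyRow(
            id=uuid.uuid4(),
            tenant_id=tenant_id,
            provider=provider,
            label=label,
            encrypted_key=self._encrypt(api_key),
            key_hint=api_key[-4:],
            created_at=datetime.now(timezone.utc),
        )
        self._rows[row.id] = row
        return self._redacted(row)

    def list_keys(self, tenant_id: str) -> list[dict[str, Any]]:
        # Listing never decrypts: only stored, non-secret columns are returned.
        return [self._redacted(r) for r in self._rows.values() if r.tenant_id == tenant_id]

    def delete(self, tenant_id: str, key_id: uuid.UUID) -> None:
        row = self._rows.get(key_id)
        if row is None or row.tenant_id != tenant_id:
            raise NotFoundError(str(key_id))  # another tenant's key looks the same as a missing one
        del self._rows[key_id]

    @staticmethod
    def _redacted(row: _KeyRow) -> dict[str, Any]:
        return {
            "id": row.id,
            "provider": row.provider,
            "label": row.label,
            "key_hint": row.key_hint,
            "created_at": row.created_at,
        }
```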
2. Update migration 005 if needed: add `key_hint` column (VARCHAR(4), nullable=True) to tenant_llm_keys table for storing the last 4 chars safely without decryption on list.
3. Write tests in `tests/unit/test_llm_keys_crud.py`:
- test_create_llm_key: verify encrypted_key is stored (not plaintext), response has no api_key field
- test_list_llm_keys_redacted: verify response never contains encrypted_key or plaintext key, only key_hint
- test_delete_llm_key: verify removal and 204 status
- test_create_duplicate_provider: verify 409 on UNIQUE violation
- test_delete_nonexistent_key: verify 404
cd /home/adelorenzo/repos/konstruct && pytest tests/unit/test_llm_keys_crud.py -x -v
- GET /api/portal/tenants/{tenant_id}/llm-keys returns redacted key list (provider, label, key_hint, created_at)
- POST creates encrypted key and returns 201 with no secret in response
- DELETE removes key and returns 204
- Duplicate provider per tenant returns 409
- Cross-tenant deletion prevented by tenant_id check
- All 5 tests pass
All test scaffolds created and passing:
- `pytest tests/unit/test_key_encryption.py -x` — Fernet encrypt/decrypt/rotate
- `pytest tests/unit/test_budget_alerts.py -x` — threshold logic
- `pytest tests/unit/test_slack_oauth.py -x` — OAuth state HMAC
- `pytest tests/unit/test_stripe_webhooks.py -x` — idempotency, status updates, cancellation
- `pytest tests/unit/test_usage_aggregation.py -x` — SQL aggregates
- `pytest tests/unit/test_llm_keys_crud.py -x` — LLM key CRUD operations
- `pytest tests/unit -x -q` — full unit suite still green
<success_criteria>
- All 6 test files pass with 0 failures
- Alembic migration 005 exists and is syntactically valid
- New API routers registered and importable (including llm_keys_router)
- KeyEncryptionService encrypt/decrypt roundtrip works
- LLM key CRUD endpoints return redacted data (never expose plaintext keys)
- Audit logger metadata includes prompt_tokens, completion_tokens, cost_usd, provider
- Existing test suite remains green </success_criteria>