- Create 03-01-SUMMARY.md with full plan documentation
- Update STATE.md: progress 79%, 4 new decisions, session stopped at 03-01
- Update ROADMAP.md: Phase 3 plan progress (1/4 summaries)
- Update REQUIREMENTS.md: mark AGNT-07, LLM-03, PRTA-03, PRTA-05, PRTA-06 complete
phase: 03-operator-experience
plan: 01
subsystem: api
tags: [stripe, fernet, encryption, billing, oauth, hmac, postgresql, alembic, fastapi, audit]
# Dependency graph
requires:
- phase: 02-agent-features
provides: audit_events table, JSONB metadata pattern, RLS framework, AuditBase declarative base
provides:
- Fernet-based KeyEncryptionService with MultiFernet key rotation (crypto.py)
- TenantLlmKey ORM model with encrypted BYO API key storage
- StripeEvent ORM model for webhook idempotency
- Stripe billing fields on Tenant model (stripe_customer_id, subscription_status, agent_quota, trial_ends_at)
- Budget limit field on Agent model (budget_limit_usd)
- Alembic migration 005 (billing columns, tenant_llm_keys, stripe_events, composite audit index)
- Slack OAuth state HMAC generation and verification (channels.py)
- Slack OAuth install URL and callback endpoints
- WhatsApp manual connect endpoint with Meta Graph API token validation
- Stripe Checkout session and Billing Portal session endpoints (billing.py)
- Stripe webhook handler with idempotency, subscription lifecycle management, agent deactivation on cancel
- LLM key CRUD: GET (redacted list), POST (encrypt + store), DELETE (204/404) (llm_keys.py)
- Usage aggregation endpoints: per-agent tokens/cost, per-provider cost, message volume, budget alerts (usage.py)
- compute_budget_status helper: ok/warning/exceeded thresholds at 80% and 100%
- Audit logger enhanced with prompt_tokens, completion_tokens, cost_usd, provider in LLM call metadata
- 32 unit tests passing across all new modules
affects:
- 03-02 (channel connection UI — depends on channels.py endpoints)
- 03-03 (billing UI — depends on billing.py and usage.py endpoints)
- 03-04 (cost dashboard — depends on audit_events.metadata JSONB with token/cost fields)
# Tech tracking
tech-stack:
added:
- stripe>=10.0.0 (Stripe API client with StripeClient pattern)
- cryptography>=42.0.0 (Fernet symmetric encryption via MultiFernet)
- recharts (portal, chart library for cost dashboard)
- "@stripe/stripe-js" (portal, Stripe.js for client-side checkout)
patterns:
- Fernet MultiFernet for BYO key encryption with key rotation support
- HMAC-SHA256 signed OAuth state with embedded nonce (CSRF protection)
- StripeClient(api_key=...) pattern — NOT legacy stripe.api_key module-level approach
- Stripe webhook idempotency via StripeEvent INSERT ... ON CONFLICT guard
- compute_budget_status pure function — threshold logic decoupled from DB for unit testing
- _aggregate_rows_by_agent/_provider helpers — in-memory aggregation for unit testing without DB
- AuditEvent.event_metadata column attribute maps to DB column "metadata" (SQLAlchemy 2.0 reserved name workaround)
key-files:
created:
- packages/shared/shared/crypto.py
- packages/shared/shared/models/billing.py
- packages/shared/shared/api/channels.py
- packages/shared/shared/api/billing.py
- packages/shared/shared/api/llm_keys.py
- packages/shared/shared/api/usage.py
- migrations/versions/005_billing_and_usage.py
- tests/unit/test_key_encryption.py
- tests/unit/test_budget_alerts.py
- tests/unit/test_slack_oauth.py
- tests/unit/test_stripe_webhooks.py
- tests/unit/test_usage_aggregation.py
- tests/unit/test_llm_keys_crud.py
modified:
- packages/shared/shared/config.py (added encryption, stripe, slack oauth settings)
- packages/shared/shared/models/tenant.py (billing fields on Tenant, budget_limit_usd on Agent)
- packages/shared/shared/models/audit.py (renamed metadata → event_metadata attribute)
- packages/shared/shared/api/__init__.py (export all new routers)
- packages/orchestrator/orchestrator/agents/runner.py (token metadata in audit log)
key-decisions:
- "AuditEvent ORM attribute renamed from 'metadata' to 'event_metadata' — SQLAlchemy 2.0 DeclarativeBase reserves 'metadata' as MetaData object; mapped_column('metadata', ...) preserves DB column name"
- "HMAC OAuth state format: base64url(payload_json).base64url(hmac_sig) with nonce — prevents replay and forgery"
- "StripeClient(api_key=settings.stripe_secret_key) — new v14+ API, thread-safe, replaces legacy stripe.api_key module-level assignment"
- "Webhook idempotency via StripeEvent INSERT + flush + IntegrityError catch — handles concurrent duplicate delivery gracefully"
- "compute_budget_status is a pure function — decoupled from DB so unit tests verify threshold logic without SQL"
- "LLM key listing returns key_hint (last 4 chars) — portal can display ...ABCD without decrypting ciphertext"
patterns-established:
- "Encryption service pattern: KeyEncryptionService wraps MultiFernet, accepts primary_key and optional previous_key for rotation window"
- "Budget alert thresholds: <80% = ok, 80-99% = warning, >=100% = exceeded"
- "Audit metadata fields for cost tracking: prompt_tokens, completion_tokens, total_tokens, cost_usd, provider extracted from model string"
- "Cross-tenant deletion protection: DELETE endpoint queries WHERE key_id = X AND tenant_id = Y"
requirements-completed: [AGNT-07, LLM-03, PRTA-03, PRTA-05, PRTA-06]
# Metrics
duration: 22min
completed: 2026-03-24
Phase 3 Plan 01: Backend Foundation for Operator Experience Summary
Fernet encryption service, Stripe billing integration, HMAC Slack OAuth, LLM key CRUD, usage aggregation endpoints, and 32 unit tests — all backend APIs for Phase 3 portal UI
Performance
- Duration: 22 min
- Started: 2026-03-24T03:14:36Z
- Completed: 2026-03-24T03:36:11Z
- Tasks: 3 (all TDD)
- Files modified: 20
Accomplishments
- Full Fernet/MultiFernet encryption service for BYO API keys with key rotation support
- Complete Stripe billing stack: lazy customer creation, Checkout, Billing Portal, webhook handler with full subscription lifecycle (trialing → active → canceled → agent deactivation)
- Slack OAuth HMAC-signed state generation/verification and full callback flow; WhatsApp manual connect with Meta API token validation
- LLM key CRUD endpoints that never expose plaintext or encrypted keys (key_hint display pattern)
- Usage aggregation: per-agent token counts, per-provider cost, message volume, budget threshold alerts
- Audit logger enhanced with cost/token metadata for cost dashboard queries
- Migration 005 with all billing schema changes, RLS on tenant_llm_keys, composite index on audit_events
Task Commits
Each task was committed atomically:
- Task 1: DB migrations, models, encryption service, and test scaffolds - 215e67a (feat)
- Task 2: Backend API endpoints (channels, billing, usage aggregation, and audit logger enhancement) - 4cbf192 (feat)
- Task 3: LLM key CRUD API endpoints - 3c8fc25 (feat)
Files Created/Modified
- packages/shared/shared/crypto.py - KeyEncryptionService with MultiFernet encrypt/decrypt/rotate
- packages/shared/shared/models/billing.py - TenantLlmKey (RLS, UNIQUE provider per tenant) and StripeEvent (idempotency) models
- packages/shared/shared/models/tenant.py - Added 6 billing columns to Tenant, budget_limit_usd to Agent
- packages/shared/shared/api/channels.py - Slack OAuth state generation/verification, install URL, callback, WhatsApp connect, test endpoint
- packages/shared/shared/api/billing.py - Stripe Checkout, billing portal, webhook handler with full subscription lifecycle
- packages/shared/shared/api/llm_keys.py - LLM key CRUD: GET (redacted), POST (encrypt+store), DELETE (204/404)
- packages/shared/shared/api/usage.py - Usage summary, by-provider, message volume, budget alerts, in-memory aggregation helpers
- packages/shared/shared/config.py - Added platform_encryption_key, stripe_*, and slack_oauth settings
- packages/shared/shared/models/audit.py - Renamed metadata column attribute to event_metadata
- packages/shared/shared/api/__init__.py - Exports all 5 new routers
- packages/orchestrator/orchestrator/agents/runner.py - Enhanced audit metadata with token counts and cost_usd
- migrations/versions/005_billing_and_usage.py - Full schema migration for billing, RLS, grants, index
- tests/unit/test_key_encryption.py - 4 encryption tests (roundtrip, random IV, invalid token, rotation)
- tests/unit/test_budget_alerts.py - 8 threshold tests (none, 50%, 79%, 80%, 95%, 100%, 120%, 0%)
- tests/unit/test_slack_oauth.py - 6 OAuth state tests (generate, verify, tamper, wrong secret, nonce diff)
- tests/unit/test_stripe_webhooks.py - 3 webhook tests (idempotency, sub updated, cancellation+deactivation)
- tests/unit/test_usage_aggregation.py - 6 aggregation tests (per-agent single/multi/empty, per-provider single/multi/empty)
- tests/unit/test_llm_keys_crud.py - 5 CRUD tests (create, list redacted, delete, duplicate 409, nonexistent 404)
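The threshold cases covered by test_budget_alerts.py (below 80% is ok, 80-99% is warning, 100% and above is exceeded) can be illustrated with a small pure function. This is a sketch only: the signature, the plain-string return value, and the treatment of a missing or zero budget as "ok" are assumptions, and the real compute_budget_status in usage.py may differ.

```python
from typing import Optional


def compute_budget_status(spent_usd: float, budget_limit_usd: Optional[float]) -> str:
    """Classify spend against a budget as 'ok', 'warning', or 'exceeded'."""
    # Assumption: no budget configured (None or non-positive) means no alert.
    if budget_limit_usd is None or budget_limit_usd <= 0:
        return "ok"
    ratio = spent_usd / budget_limit_usd
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= 0.8:
        return "warning"
    return "ok"
```

Because the function takes plain numbers and touches no database, the boundary cases at 79%, 80%, and 100% can be checked directly in a unit test.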
Decisions Made
- AuditEvent.event_metadata attribute name: SQLAlchemy 2.0 DeclarativeBase has metadata as a reserved attribute (MetaData object). The Python attribute was renamed to event_metadata, with mapped_column("metadata", ...) preserving the DB column name. The AuditLogger uses raw SQL text(), so this only affects ORM read queries.
- StripeClient(api_key=...) pattern over legacy stripe.api_key = ...: thread-safe, explicit per-client key, v14+ recommended approach.
- Webhook idempotency: INSERT StripeEvent row, flush, catch IntegrityError on concurrent duplicate delivery. This handles Stripe's at-least-once delivery guarantee.
- compute_budget_status as pure function: makes threshold logic easily unit-testable without DB setup.
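The insert-then-catch idempotency guard can be sketched as follows. To keep the example self-contained it uses the standard-library sqlite3 module rather than the SQLAlchemy session, flush, and StripeEvent model the plan describes. The table layout and the function names are hypothetical.

```python
import sqlite3


def make_connection() -> sqlite3.Connection:
    # In-memory stand-in for the stripe_events table; the primary key
    # on event_id is what makes a duplicate insert fail.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stripe_events (event_id TEXT PRIMARY KEY)")
    return conn


def record_event_once(conn: sqlite3.Connection, event_id: str) -> bool:
    """Return True if the event is new, False if it was already recorded."""
    try:
        # 'with conn' commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO stripe_events (event_id) VALUES (?)", (event_id,)
            )
    except sqlite3.IntegrityError:
        return False
    return True


def handle_webhook(conn: sqlite3.Connection, event: dict) -> str:
    # Claim the event id first; only the delivery that wins the insert
    # goes on to apply subscription changes.
    if not record_event_once(conn, event["id"]):
        return "duplicate"
    return "processed"
```

Letting the unique constraint decide, rather than running a SELECT before the INSERT, means two concurrent deliveries of the same event cannot both pass the check.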
Deviations from Plan
Auto-fixed Issues
1. [Rule 1 - Bug] Renamed AuditEvent.metadata to event_metadata
- Found during: Task 2 (billing.py import of AuditBase triggered SQLAlchemy class evaluation)
- Issue: SQLAlchemy 2.0 DeclarativeBase reserves metadata as the MetaData object. When billing.py imported AuditBase from audit.py, the AuditEvent class definition triggered InvalidRequestError: Attribute name 'metadata' is reserved
- Fix: Renamed attribute to event_metadata with mapped_column("metadata", ...) to preserve DB column name. AuditLogger unaffected (uses raw SQL text())
- Files modified: packages/shared/shared/models/audit.py
- Verification: All 32 tests pass including all audit-related tests
- Committed in: 4cbf192 (Task 2 commit)
Total deviations: 1 auto-fixed (Rule 1 - bug)
Impact on plan: Fix was necessary for correctness; no scope change. AuditLogger raw SQL path was unaffected, only ORM read path changed attribute name.
Issues Encountered
None beyond the auto-fixed bug above.
User Setup Required
The following environment variables must be added before running billing/channel features:
- PLATFORM_ENCRYPTION_KEY - Fernet key (python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
- PLATFORM_ENCRYPTION_KEY_PREVIOUS - (optional) previous key for rotation window
- STRIPE_SECRET_KEY - Stripe secret API key (sk_test_... or sk_live_...)
- STRIPE_WEBHOOK_SECRET - Stripe webhook signing secret (whsec_...)
- STRIPE_PER_AGENT_PRICE_ID - Stripe Price ID for per-agent monthly plan
- SLACK_CLIENT_ID - Slack OAuth app client ID
- SLACK_CLIENT_SECRET - Slack OAuth app client secret
- OAUTH_STATE_SECRET - HMAC secret for OAuth state signing (any random hex string)
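How the primary and previous encryption keys might work together during a rotation window is sketched below. It assumes the cryptography package is installed and follows the KeyEncryptionService description in this summary (primary_key plus optional previous_key, with encrypt, decrypt, and rotate). The exact signatures in crypto.py are not shown here, so treat these as illustrative.

```python
from typing import Optional

from cryptography.fernet import Fernet, MultiFernet


class KeyEncryptionService:
    """Sketch of a MultiFernet wrapper; signatures are assumptions."""

    def __init__(self, primary_key: str, previous_key: Optional[str] = None) -> None:
        # MultiFernet encrypts with the first key and tries each key in
        # order when decrypting, which is what allows a rotation window.
        keys = [Fernet(primary_key)]
        if previous_key:
            keys.append(Fernet(previous_key))
        self._fernet = MultiFernet(keys)

    def encrypt(self, plaintext: str) -> str:
        return self._fernet.encrypt(plaintext.encode()).decode()

    def decrypt(self, ciphertext: str) -> str:
        return self._fernet.decrypt(ciphertext.encode()).decode()

    def rotate(self, ciphertext: str) -> str:
        # Re-encrypts an existing token under the primary key.
        return self._fernet.rotate(ciphertext.encode()).decode()
```

In this sketch, a token written under the old key stays readable while PLATFORM_ENCRYPTION_KEY_PREVIOUS is set, and calling rotate on each stored ciphertext lets the previous key be dropped afterwards.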
Next Phase Readiness
- All backend APIs ready for Phase 3 Plans 02-04 frontend work
- channel_connections, tenant_llm_keys, stripe_events tables ready post-migration 005
- Usage aggregation queries depend on audit_events.metadata having prompt_tokens/cost_usd (populated by enhanced runner.py)
- Plan 02 (channel connection UI) can use: channels_router endpoints
- Plan 03 (billing UI) can use: billing_router, usage_router endpoints
- Plan 04 (cost dashboard) can use: usage_router + budget alerts, audit_events composite index
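The dependency on prompt_tokens and cost_usd in the audit metadata can be made concrete with an in-memory aggregation sketch. The row shape (a dict with agent_id and a metadata dict) and the function name are assumptions for illustration; the helpers in usage.py may take a different input.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def aggregate_rows_by_agent(
    rows: Iterable[Dict[str, Any]],
) -> Dict[str, Dict[str, float]]:
    """Sum token counts and cost per agent from audit-event-like rows."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"prompt_tokens": 0, "completion_tokens": 0, "cost_usd": 0.0}
    )
    for row in rows:
        # Rows written before the runner.py enhancement may lack these
        # fields, so missing values count as zero.
        meta = row.get("metadata") or {}
        bucket = totals[row["agent_id"]]
        bucket["prompt_tokens"] += int(meta.get("prompt_tokens", 0))
        bucket["completion_tokens"] += int(meta.get("completion_tokens", 0))
        bucket["cost_usd"] += float(meta.get("cost_usd", 0.0))
    return dict(totals)
```

Keeping the summation separate from the query means it can be tested with plain dictionaries, with no database fixture needed.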
Self-Check: PASSED
All 14 artifact files exist. All 3 commits verified: 215e67a, 4cbf192, 3c8fc25. All 32 tests passing.
Phase: 03-operator-experience
Completed: 2026-03-24