Commit Graph

137 Commits

Author SHA1 Message Date
7b0594e7cc test(04-rbac-01): unit tests for RBAC guards, invitation system, portal auth
- test_rbac_guards.py: 11 tests covering platform_admin pass-through,
  customer_admin/operator 403 rejection, tenant membership checks,
  and platform_admin bypass for tenant-scoped guards
- test_invitations.py: 11 tests covering HMAC token roundtrip,
  tamper/expiry rejection, invitation create/accept/resend/list
- test_portal_auth.py: 7 tests covering role field (not is_admin),
  tenant_ids list, active_tenant_id, platform_admin all-tenants,
  customer_admin own-tenants-only
- All 27 tests pass
2026-03-24 13:55:55 -06:00
d59f85cd87 feat(04-rbac-01): RBAC guards + invite token + email + invitation API
- rbac.py: PortalCaller dataclass + get_portal_caller dependency (header-based)
- rbac.py: require_platform_admin (403 for non-platform_admin)
- rbac.py: require_tenant_admin (platform_admin bypasses; customer_admin
  checks UserTenantRole; operator always rejected)
- rbac.py: require_tenant_member (platform_admin bypasses; all roles
  checked against UserTenantRole)
- invite_token.py: generate_invite_token (HMAC-SHA256, base64url, 48h TTL)
- invite_token.py: validate_invite_token (timing-safe compare_digest, TTL check)
- invite_token.py: token_to_hash (SHA-256 for DB storage)
- email.py: send_invite_email (sync smtplib, skips if smtp_host empty)
- invitations.py: POST /api/portal/invitations (create, requires tenant admin)
- invitations.py: POST /api/portal/invitations/accept (accept invitation)
- invitations.py: POST /api/portal/invitations/{id}/resend (regenerate token)
- invitations.py: GET /api/portal/invitations (list pending)
- portal.py: AuthVerifyResponse now returns role+tenant_ids+active_tenant_id
- portal.py: auth/register gated behind require_platform_admin
- tasks.py: send_invite_email_task Celery task (fire-and-forget)
- gateway/main.py: invitations_router mounted
2026-03-24 13:52:45 -06:00
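
Note on d59f85cd87: the invite token helpers listed above (HMAC-SHA256, base64url, 48h TTL, timing-safe compare, SHA-256 hash for storage) could look roughly like the sketch below. Only the three function names and the listed properties come from the commit message. The token layout (`payload.signature`), the `invitation_id:issued_at` payload, the parameter names, and the `now` override are assumptions for illustration, not the contents of invite_token.py.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

INVITE_TTL_SECONDS = 48 * 3600  # 48h TTL, as stated in the commit message


def _b64url(data: bytes) -> str:
    # base64url without padding (layout is an assumption)
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding stripped by _b64url before decoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def generate_invite_token(invitation_id: str, secret: str,
                          now: Optional[float] = None) -> str:
    """Sign 'invitation_id:issued_at' with HMAC-SHA256 (hypothetical payload)."""
    issued_at = int(time.time() if now is None else now)
    payload = f"{invitation_id}:{issued_at}".encode()
    sig = hmac.new(secret.encode(), payload, hashlib.sha256).digest()
    return f"{_b64url(payload)}.{_b64url(sig)}"


def validate_invite_token(token: str, secret: str,
                          now: Optional[float] = None) -> str:
    """Return the invitation id, or raise ValueError on tamper/expiry."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = _b64url_decode(payload_b64)
        sig = _b64url_decode(sig_b64)
        invitation_id, issued_raw = payload.decode().rsplit(":", 1)
        issued_at = int(issued_raw)
    except (ValueError, UnicodeDecodeError) as exc:
        raise ValueError("malformed invite token") from exc
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).digest()
    # Timing-safe comparison, as the commit message calls out
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid invite token signature")
    current = time.time() if now is None else now
    if current - issued_at > INVITE_TTL_SECONDS:
        raise ValueError("invite token expired")
    return invitation_id


def token_to_hash(token: str) -> str:
    """SHA-256 hex digest, so only a hash of the token is stored in the DB."""
    return hashlib.sha256(token.encode()).hexdigest()
```

Storing only `token_to_hash(token)` means a leaked `portal_invitations` row cannot be replayed as a valid invite link.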
f710c9c5fe feat(04-rbac-01): DB migration 006 + RBAC ORM models + config fields
- Migration 006: adds role TEXT+CHECK column to portal_users, backfills
  is_admin -> platform_admin/customer_admin, drops is_admin
- Migration 006: creates user_tenant_roles table (UNIQUE user_id+tenant_id)
- Migration 006: creates portal_invitations table with token_hash, status, expires_at
- PortalUser: replaced is_admin (bool) with role (str, default customer_admin)
- Added UserRole enum (PLATFORM_ADMIN, CUSTOMER_ADMIN, CUSTOMER_OPERATOR)
- Added UserTenantRole ORM model with FK cascade deletes
- Added PortalInvitation ORM model with token_hash unique constraint
- Settings: added invite_secret, smtp_host, smtp_port, smtp_username,
  smtp_password, smtp_from_email fields
2026-03-24 13:49:16 -06:00
2aecc5c787 fix(04-rbac): revise plans based on checker feedback 2026-03-24 13:46:03 -06:00
bf4adf0b21 docs(04-rbac): create phase plan — 3 plans in 3 waves 2026-03-24 13:37:36 -06:00
4706a87355 docs(04): add research and validation strategy 2026-03-24 13:28:17 -06:00
0dc21c6ee5 docs(04-rbac): research phase RBAC domain 2026-03-24 13:27:22 -06:00
dc758e9e3a docs(state): record phase 4 context session 2026-03-24 13:09:47 -06:00
52a30dd8e1 docs(04): capture phase context 2026-03-24 13:09:47 -06:00
7252845455 docs: add Phase 4 — RBAC with 3-tier roles and invitation flow
Three roles: platform admin (full SaaS), customer admin (tenant-scoped),
customer operator (read-only). Email invitation flow for tenant user
onboarding. 6 new requirements (RBAC-01 through RBAC-06).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:40:43 -06:00
0e0ea5fb66 fix: runtime deployment fixes for Docker Compose stack
- Add .gitignore for __pycache__, node_modules, .playwright-mcp
- Add CLAUDE.md project instructions
- docker-compose: remove host port exposure for internal services,
  remove Ollama container (use host), add CORS origin, bake
  NEXT_PUBLIC_API_URL at build time, run alembic migrations on
  gateway startup, add CPU-only torch pre-install
- gateway: add CORS middleware, graceful Slack degradation without
  bot token, fix None guard on slack_handler
- gateway pyproject: add aiohttp dependency for slack-bolt async
- llm-pool pyproject: install litellm from GitHub (removed from PyPI),
  enable hatch direct references
- portal: enable standalone output in next.config.ts
- Remove orphaned migration 003_phase2_audit_kb.py (renamed to 004)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:26:34 -06:00
d936bcf361 docs(phase-3): complete phase execution 2026-03-24 00:58:18 -06:00
80fd097e25 docs(03-05): complete gap closure plan — router wiring and field name fixes
- Add 03-05-SUMMARY.md
- Update STATE.md: advance metrics, record decision, update session
- Update ROADMAP.md: Phase 3 now shows 5/5 plans complete
2026-03-24 00:55:36 -06:00
7c8d219835 fix(03-05): fix Slack OAuth and budget alert field name mismatches
- Slack callback: check data.ok (not data.success) to match backend response
- SlackInstallResponse: use url + state fields (not authorize_url)
- connect-channel.tsx: update all authorize_url refs to url
- BudgetAlert: use current_usd (not current_cost_usd) to match backend Pydantic model
- usage page: update alert.current_cost_usd to alert.current_usd
2026-03-24 00:54:21 -06:00
c47cc2f5bf feat(03-05): mount Phase 3 API routers on gateway FastAPI app
- Import all 6 Phase 3 routers from shared.api (portal, billing, channels, llm_keys, usage, webhook)
- Add include_router() calls after existing whatsapp_router
- Update module docstring to document portal API endpoints
2026-03-24 00:53:32 -06:00
60c393b137 docs(03): create gap closure plan for router mounting and field name fixes 2026-03-23 22:39:34 -06:00
2416fe36b1 docs(03-04): mark plan complete after human-verify approval 2026-03-23 21:52:20 -06:00
f324beefba docs(03-02): mark plan complete — human-verify approved, state updated to 100%
- STATE.md: percent 86->100, position updated to all phases complete
- ROADMAP.md: Phase 3 Operator Experience marked 4/4 Complete
- Decisions from 03-02 added to accumulated context

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:52:08 -06:00
be61f94941 docs(03-03): complete billing management page plan — human-verify approved
- Updated SUMMARY.md: Task 2 (human-verify) marked approved, plan fully complete
- STATE.md: progress updated to 100%, decisions recorded, session updated
- ROADMAP.md: phase 3 plan progress updated (4/4 summaries complete)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:51:46 -06:00
521cec46f7 docs(03-02): complete onboarding wizard and BYO API key management plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:49:52 -06:00
b73f6bf7da docs(03-04): complete usage & cost dashboard plan
- Added Recharts cost tracking dashboard at /usage/[tenantId]
- UsageChart, ProviderCostChart, MessageVolumeChart components
- BudgetAlertBadge with ok/warning/exceeded color coding
- TanStack Query hooks for usage summary, provider costs, message volume, budget alerts
- Time range selector (Last 7/30 days, This month, Last 3 months)
- Usage nav link and /usage tenant picker index page
- Installed recharts (was in package.json but missing from node_modules)
- Portal builds cleanly with /usage and /usage/[tenantId] routes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:49:13 -06:00
11c1e52ea0 feat(03-02): onboarding wizard, Slack OAuth, WhatsApp connect, and BYO API keys UI
- Slack OAuth callback route handler (/api/slack/callback)
- Onboarding wizard: 3-step stepper (connect channel -> configure agent -> test message)
- Connect Channel: Slack OAuth button + WhatsApp manual credentials form
- Configure Agent: links to Agent Designer, Next enabled only with active agent
- Test Message: per-channel test buttons, required step, no separate Go Live button
- BYO API key management settings page at /settings/api-keys
- API Keys nav link in sidebar
- recharts installed (was missing, blocked portal build)
2026-03-23 21:48:06 -06:00
67632c11ce docs(03-03): complete billing management page plan — paused at human-verify checkpoint
- Billing page at /billing with SubscriptionCard, status badge, agent count +/- adjuster
- BillingStatus component with 6 subscription states (none/trialing/active/past_due/canceled/unpaid)
- TanStack Query mutation hooks: useCreateCheckoutSession, useCreateBillingPortalSession, useUpdateSubscriptionQuantity
- Billing nav link added to dashboard sidebar (CreditCard icon)
- Past-due warning banner and Checkout success toast with session_id URL cleanup
- Stopped at Task 2 checkpoint:human-verify awaiting visual confirmation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:47:31 -06:00
e0342f8ec1 docs(03-01): complete backend foundation plan — billing, encryption, HMAC OAuth, LLM key CRUD, usage aggregation
- Create 03-01-SUMMARY.md with full plan documentation
- Update STATE.md: progress 79%, 4 new decisions, session stopped at 03-01
- Update ROADMAP.md: Phase 3 plan progress (1/4 summaries)
- Update REQUIREMENTS.md: mark AGNT-07, LLM-03, PRTA-03, PRTA-05, PRTA-06 complete
2026-03-23 21:38:10 -06:00
3c8fc255bc feat(03-01): LLM key CRUD API endpoints with encryption
- Create llm_keys.py: GET list (redacted, key_hint only), POST (encrypt + store), DELETE (204 or 404)
- LlmKeyResponse never exposes encrypted_key or raw api_key
- 409 returned on duplicate (tenant_id, provider) key
- Cross-tenant deletion prevented by tenant_id verification in DELETE query
- Update api/__init__.py to export llm_keys_router
- All 5 LLM key CRUD tests passing (32 total unit tests green)
2026-03-23 21:36:08 -06:00
4cbf192fa5 feat(03-01): backend API endpoints — channels, billing, usage, and audit logger enhancement
- Create channels.py: HMAC-signed OAuth state generation/verification, Slack OAuth install/callback, WhatsApp manual connect, test message endpoint
- Create billing.py: Stripe Checkout session, billing portal session, webhook handler with idempotency (StripeEvent table), subscription lifecycle management
- Update usage.py: add _aggregate_rows_by_agent and _aggregate_rows_by_provider helpers (unit-testable without DB), complete usage endpoints
- Fix audit.py: rename 'metadata' attribute to 'event_metadata' (SQLAlchemy 2.0 DeclarativeBase reserves 'metadata')
- Enhance runner.py: audit log now includes prompt_tokens, completion_tokens, total_tokens, cost_usd, provider in LLM call metadata
- Update api/__init__.py to export all new routers
- All 27 unit tests passing
2026-03-23 21:24:08 -06:00
215e67a7eb feat(03-01): DB migrations, models, encryption service, and test scaffolds
- Add stripe and cryptography to shared pyproject.toml
- Add recharts, @stripe/stripe-js, stripe to portal package.json (submodule)
- Add billing fields to Tenant model (stripe_customer_id, subscription_status, agent_quota, trial_ends_at)
- Add budget_limit_usd to Agent model
- Create TenantLlmKey and StripeEvent models in billing.py (AuditBase and Base respectively)
- Create KeyEncryptionService (MultiFernet encrypt/decrypt/rotate) in crypto.py
- Create compute_budget_status helper in usage.py (threshold logic: ok/warning/exceeded)
- Add platform_encryption_key, stripe_, slack_oauth settings to config.py
- Create Alembic migration 005 with all schema changes, RLS, grants, and composite index
- All 12 tests passing (key encryption roundtrip, rotation, budget thresholds)
2026-03-23 21:19:09 -06:00
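
Note on 215e67a7eb: the ok/warning/exceeded threshold logic of `compute_budget_status` might be sketched as follows. The commit message names the helper and its three states only. The 80% warning ratio, the parameter names, and treating a missing or non-positive limit as "ok" are assumptions made for this illustration.

```python
from typing import Optional

WARNING_RATIO = 0.8  # assumed threshold; the commit message does not state it


def compute_budget_status(current_usd: float,
                          budget_limit_usd: Optional[float]) -> str:
    """Classify spend against a budget as 'ok', 'warning', or 'exceeded'."""
    # Assumption: no limit (or a non-positive one) means nothing to alert on
    if budget_limit_usd is None or budget_limit_usd <= 0:
        return "ok"
    ratio = current_usd / budget_limit_usd
    if ratio >= 1.0:
        return "exceeded"
    if ratio >= WARNING_RATIO:
        return "warning"
    return "ok"
```

Keeping this a pure function of two numbers is what makes it unit-testable without a database, in the same spirit as the aggregation helpers mentioned in 4cbf192fa5.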
ac606cf9ff fix(03): revise plans based on checker feedback 2026-03-23 21:10:23 -06:00
1ff61d9ba4 docs(03-operator-experience): create phase plan 2026-03-23 21:03:30 -06:00
a42fa5f38a docs(03): add research and validation strategy 2026-03-23 20:55:12 -06:00
c4ebcf0de4 docs(03): research operator experience phase 2026-03-23 20:54:13 -06:00
a8f48df305 docs: resolve LLM-03 conflict — BYO keys confirmed for v1 Phase 3 2026-03-23 20:06:30 -06:00
c76b1ee3ce docs(state): record phase 3 context session 2026-03-23 20:06:09 -06:00
1672b4cc81 docs(03): capture phase context 2026-03-23 20:06:09 -06:00
c5a4515f8c docs(phase-2): complete phase execution 2026-03-23 19:20:04 -06:00
43cf7d4e63 docs(02-06): complete escalation and WhatsApp routing re-wire plan summary
- Created 02-06-SUMMARY.md documenting escalation wiring, WhatsApp outbound routing, and tier-2 scoping
- Updated STATE.md: advanced progress to 100%, recorded metrics and decisions
- Updated ROADMAP.md: Phase 2 marked Complete (6/6 plans)
2026-03-23 19:16:56 -06:00
bd217a4113 feat(02-06): re-wire escalation and WhatsApp outbound routing in pipeline
- Move key imports to module level in tasks.py for testability and clarity
- Pop WhatsApp extras (phone_number_id, bot_token) in handle_message before model_validate
- Build unified extras dict and extract wa_id from sender.user_id
- Change _process_message signature to accept extras dict
- Add _build_response_extras() helper for channel-aware extras assembly
- Replace all _update_slack_placeholder calls in _process_message with _send_response()
- Add escalation pre-check: skip LLM when Redis escalation_status_key == 'escalated'
- Add escalation post-check: check_escalation_rules after run_agent; call escalate_to_human
  when rule matches and agent.escalation_assignee is set
- Add _build_conversation_metadata() helper (billing keyword v1 detection)
- Add channel parameter to build_system_prompt(), build_messages_with_memory(),
  build_messages_with_media() for WhatsApp tier-2 business-function scoping
- WhatsApp scoping appends 'You only handle: {topics}' when tool_assignments non-empty
- Pass msg.channel to build_messages_with_memory() in _process_message
- All 26 new tests pass; all existing escalation/WhatsApp tests pass (no regressions)
2026-03-23 19:15:20 -06:00
77c9cfc825 test(02-06): add failing tests for escalation wiring and WhatsApp outbound routing
- Tests for handle_message WhatsApp extra extraction (phone_number_id, bot_token)
- Tests for _send_response routing to Slack and WhatsApp
- Tests for _process_message using _send_response (not _update_slack_placeholder directly)
- Tests for escalation pre-check (skip LLM when already escalated)
- Tests for escalation post-check (check_escalation_rules + escalate_to_human)
- Tests for _build_conversation_metadata billing keyword extraction
- Tests for build_system_prompt WhatsApp tier-2 scoping (Task 2)
- Tests for build_messages_with_memory channel parameter passthrough
2026-03-23 19:08:59 -06:00
48d9ef0c29 docs(02-agent-features): create gap closure plan for escalation and WhatsApp outbound wiring 2026-03-23 19:03:24 -06:00
d921ed776a docs(02-05): complete multimodal media support plan summary
- Add 02-05-SUMMARY.md with full task documentation and deviations
- Update STATE.md: advance to plan 5 of 5 in phase 02, add decisions
- Update ROADMAP.md: phase 2 now 5/5 plans complete (Complete status)
2026-03-23 15:21:38 -06:00
669c0b52b3 feat(02-05): multimodal LLM interpretation with image_url content blocks
- Add supports_vision(model_name) to builder.py — detects vision-capable models
  (claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*)
  with provider prefix stripping support
- Add generate_presigned_url(storage_key, expiry=3600) to builder.py — generates
  1-hour MinIO presigned URLs via boto3 S3 client
- Add build_messages_with_media() to builder.py — extends build_messages_with_memory()
  with media injection: IMAGE -> image_url blocks for vision models / text fallback for
  non-vision models, DOCUMENT -> text reference with presigned URL
- image_url blocks use 'detail: auto' per OpenAI/LiteLLM multipart format
- Add 27 unit tests in test_multimodal_messages.py (TDD)
2026-03-23 15:09:18 -06:00
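
Note on 669c0b52b3: a minimal sketch of the `supports_vision` check, using the model patterns listed in the commit message. Matching with `fnmatch`, lowercasing the name, and stripping everything up to the last `/` as the "provider prefix" are assumptions here, not necessarily how builder.py does it.

```python
from fnmatch import fnmatchcase

# Patterns copied from the commit message
VISION_MODEL_PATTERNS = (
    "claude-3*",
    "gpt-4o*",
    "gpt-4-vision*",
    "gemini-pro-vision*",
    "gemini-1.5*",
    "gemini-2*",
)


def supports_vision(model_name: str) -> bool:
    """Return True if the model name matches a known vision-capable pattern."""
    # Assumed prefix format: 'provider/model', e.g. 'anthropic/claude-3-opus'
    bare_name = model_name.rsplit("/", 1)[-1].lower()
    return any(fnmatchcase(bare_name, pattern) for pattern in VISION_MODEL_PATTERNS)
```

A caller such as `build_messages_with_media()` could use the result to choose between an `image_url` content block and the text fallback.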
9dd7c481a3 feat(02-05): Slack file_share extraction and channel-aware outbound routing
- Add gateway/channels/slack_media.py with is_file_share_event, media_type_from_mime,
  build_slack_storage_key, build_attachment_from_slack_file, download_and_store_slack_file
- Add _send_response() helper to orchestrator/tasks.py for channel-aware dispatch
  (Slack -> chat.update, WhatsApp -> send_whatsapp_message)
- Add send_whatsapp_message import to orchestrator/tasks.py for WhatsApp outbound
- Add boto3>=1.35.0 to gateway dependencies for MinIO S3 client
- Add 23 unit tests in test_slack_media.py (TDD)
2026-03-23 15:06:45 -06:00
eba6c85188 docs(02-02): complete tool framework and audit logging plan
- 02-02-SUMMARY.md: tool registry, executor, 4 built-in tools, immutable audit trail
- STATE.md: progress 89%, decisions recorded, session updated
- ROADMAP.md: phase 2 plan progress updated (4 of 5 summaries)
- REQUIREMENTS.md: AGNT-04 and AGNT-06 marked complete
2026-03-23 15:02:27 -06:00
44fa7e6845 feat(02-02): wire tool-call loop into agent runner and orchestrator pipeline
- runner.py: multi-turn tool-call loop (LLM -> tool -> observe -> respond)
- runner.py: max 5 iterations guard against runaway tool chains
- runner.py: confirmation gate — returns confirmation msg, stops loop
- runner.py: audit logging for every LLM call via audit_logger
- tasks.py: AuditLogger initialized at task start with session factory
- tasks.py: tool registry built from agent.tool_assignments
- tasks.py: pending tool confirmation flow via Redis (10 min TTL)
- tasks.py: memory persistence skipped for confirmation request responses
- llm-pool/router.py: LLMResponse model with content + tool_calls fields
- llm-pool/router.py: tools parameter forwarded to litellm.acompletion()
- llm-pool/main.py: CompleteRequest accepts optional tools list
- llm-pool/main.py: CompleteResponse includes tool_calls field
- Migration renamed to 004 (003 was already taken by escalation migration)
- [Rule 1 - Bug] Renamed 003_phase2_audit_kb.py -> 004 to fix duplicate revision ID
2026-03-23 15:00:17 -06:00
d1bcdef0f5 docs(02-04): complete human escalation handoff plan
- Summary with decisions, metrics, and self-check
- STATE.md: advance progress to 78%, add decisions, record session
- ROADMAP.md: update phase 2 plan progress (3 of 5 complete)
- REQUIREMENTS.md: mark AGNT-05 complete
2026-03-23 14:55:22 -06:00
f49927888e feat(02-02): tool registry, executor, and 4 built-in tools
- ToolDefinition Pydantic model with JSON Schema parameters + handler
- BUILTIN_TOOLS: web_search, kb_search, http_request, calendar_lookup
- http_request requires_confirmation=True (outbound side effects)
- get_tools_for_agent filters by agent.tool_assignments
- to_litellm_format converts to OpenAI function-calling schema
- execute_tool: jsonschema validation before handler call
- execute_tool: confirmation gate for requires_confirmation=True
- execute_tool: audit logging on every invocation (success + failure)
- web_search: Brave Search API with BRAVE_API_KEY env var
- kb_search: pgvector cosine similarity with HNSW index
- http_request: 30s timeout, 1MB cap, GET/POST/PUT/DELETE only
- calendar_lookup: Google Calendar events.list read-only
- jsonschema dependency added to orchestrator pyproject.toml
- [Rule 1 - Bug] Added missing execute_tool import in test
2026-03-23 14:54:14 -06:00
a025cadc44 feat(02-04): wire escalation into orchestrator pipeline
- Add escalation pre-check in _process_message: assistant mode for escalated threads
- Add escalation post-check after LLM response: calls escalate_to_human on rule match
- Load Slack bot token unconditionally (needed for escalation DM, not just placeholders)
- Add keyword-based conversation metadata detector (billing keywords, attempt counter)
- Add no-op audit logger stub (replaced by real AuditLogger from Plan 02 when available)
- Add escalation_assignee and natural_language_escalation fields to Agent model
- Add Alembic migration 003 for new Agent columns
2026-03-23 14:53:45 -06:00
420294b8fe test(02-02): add failing tool registry and executor unit tests
- Tests for BUILTIN_TOOLS (4 tools present, correct fields, confirmation flags)
- Tests for get_tools_for_agent filtering and to_litellm_format conversion
- Tests for execute_tool: valid args, invalid args, unknown tool, confirmation flow
- Tests for audit logger called on every invocation
2026-03-23 14:51:42 -06:00
4047b552a7 feat(02-04): implement escalation handler (rule evaluator, transcript, DM delivery)
- check_escalation_rules: condition parser for 'keyword AND count > N' and NL phrases
- build_transcript: formats messages as Slack mrkdwn, truncates at 3000 chars
- escalate_to_human: opens DM, posts transcript, sets Redis key, logs audit event
2026-03-23 14:50:56 -06:00
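
Note on 4047b552a7: the transcript formatting with its 3000-character cap could be sketched like this. Only the function name, the Slack mrkdwn target, and the 3000-character limit come from the commit message. The message shape (`role`/`content` dicts), the bold-role line format, and keeping the most recent text when truncating are assumptions.

```python
from typing import Iterable, Mapping

MAX_TRANSCRIPT_CHARS = 3000  # limit stated in the commit message
_TRUNCATION_NOTE = "_(earlier messages truncated)_\n"  # hypothetical marker


def build_transcript(messages: Iterable[Mapping[str, str]],
                     limit: int = MAX_TRANSCRIPT_CHARS) -> str:
    """Format messages as Slack mrkdwn lines, capped at `limit` characters."""
    # '*role*' renders as bold in Slack mrkdwn
    lines = [f"*{m['role']}*: {m['content']}" for m in messages]
    text = "\n".join(lines)
    if len(text) <= limit:
        return text
    # Assumption: keep the tail so the assignee sees the latest exchange
    keep = limit - len(_TRUNCATION_NOTE)
    return _TRUNCATION_NOTE + text[-keep:]
```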
30b9f60668 feat(02-02): audit model, KB model, migration, and audit logger
- AuditEvent ORM model with tenant_id, action_type, latency_ms, metadata
- KnowledgeBaseDocument and KBChunk ORM models for vector KB
- Migration 003: audit_events (immutable via REVOKE), kb_documents, kb_chunks
  with HNSW index and RLS on all tables
- AuditLogger with log_llm_call, log_tool_call, log_escalation methods
- audit_events immutability enforced at DB level (UPDATE/DELETE rejected)
- [Rule 1 - Bug] Fixed CAST(:metadata AS jsonb) for asyncpg compatibility
2026-03-23 14:50:51 -06:00