- complete_stream() in router.py yields token strings via acompletion(stream=True)
- POST /complete/stream returns NDJSON: chunk lines then a done line
- Streaming path does not support tool calls (plain text only)
- Non-streaming POST /complete endpoint unchanged
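A minimal sketch of the NDJSON framing described above: one JSON object per streamed token, then a terminating done line. The field names ("type", "text") and the fake token source are assumptions for illustration. The real payload shape of /complete/stream is not shown in this commit.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_token_stream() -> AsyncIterator[str]:
    """Stand-in for complete_stream(): yields token strings."""
    for token in ["Hello", ", ", "world"]:
        await asyncio.sleep(0)  # simulate awaiting the next streamed chunk
        yield token


async def to_ndjson(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Wrap each token in a 'chunk' line, then emit one final 'done' line.

    Hypothetical field names; each line is a complete JSON document.
    """
    async for token in tokens:
        yield json.dumps({"type": "chunk", "text": token}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"


async def collect(agen: AsyncIterator[str]) -> list[str]:
    return [line async for line in agen]


lines = asyncio.run(collect(to_ndjson(fake_token_stream())))
```

In FastAPI, an async generator like to_ndjson can be handed to StreamingResponse with media_type="application/x-ndjson"; a client reassembles the text by joining the "text" fields of the chunk lines until it sees the done line.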
LLM responses can take >60s (especially with local models). The
WebSocket listener was timing out before the response arrived,
causing agent replies to appear in logs but not in the chat UI.
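A sketch of the listener-side fix, assuming the WebSocket handler waits on a queue fed by the Redis pub-sub subscription. The 180 s value and the names wait_for_reply/simulate are hypothetical; the commit only states that replies can exceed 60 s.

```python
import asyncio
import contextlib
from typing import Optional

# Hypothetical timeout, chosen to exceed the >60 s replies of slow local models.
RESPONSE_TIMEOUT_S = 180.0


async def wait_for_reply(queue: asyncio.Queue, timeout: float = RESPONSE_TIMEOUT_S) -> Optional[str]:
    """Wait for the agent reply; return None instead of raising on timeout."""
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    except asyncio.TimeoutError:
        return None


async def simulate(reply_delay: float, timeout: float) -> Optional[str]:
    """Publish a reply after reply_delay seconds and wait up to timeout for it."""
    queue: asyncio.Queue = asyncio.Queue()

    async def publisher() -> None:
        await asyncio.sleep(reply_delay)  # simulates a slow LLM response
        await queue.put("agent reply")

    task = asyncio.create_task(publisher())
    try:
        return await wait_for_reply(queue, timeout)
    finally:
        # Clean up the publisher whether or not the reply arrived in time.
        task.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await task
```

A timeout shorter than the model's latency reproduces the bug: the reply is published (and logged) after the listener has already given up.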
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Chat API queries on web_conversations need the tenant context set before
RLS policies will allow the SELECT. Also fixes the crypto.randomUUID fallback.
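An in-memory sketch of why the SELECT came back empty: an RLS policy that compares each row's tenant_id with the current tenant setting matches nothing when no tenant is set. The ContextVar, the sample rows, and the helper names are hypothetical stand-ins for the real context var and Postgres policy.

```python
from contextvars import ContextVar
from typing import Optional

# Hypothetical stand-in for the per-request tenant context var.
current_tenant: ContextVar[Optional[str]] = ContextVar("current_tenant", default=None)

WEB_CONVERSATIONS = [
    {"id": "c1", "tenant_id": "t-acme"},
    {"id": "c2", "tenant_id": "t-globex"},
]


def select_conversations() -> list[dict]:
    """Mimic an RLS USING clause: rows are visible only for the current tenant."""
    tenant = current_tenant.get()
    if tenant is None:
        return []  # no tenant context: the policy matches no rows
    return [row for row in WEB_CONVERSATIONS if row["tenant_id"] == tenant]


def list_for_tenant(tenant_id: str) -> list[dict]:
    """Set the tenant context before querying, then restore the previous value."""
    token = current_tenant.set(tenant_id)
    try:
        return select_conversations()
    finally:
        current_tenant.reset(token)
```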
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add OLLAMA_MODEL setting to shared config (default: qwen3:32b)
- LLM router reads from settings instead of hardcoded model name
- Create .env file with all configurable settings documented
- docker-compose passes OLLAMA_MODEL to llm-pool container
To change the model: edit OLLAMA_MODEL in .env and restart llm-pool.
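A stdlib sketch of the setting, assuming the value is read from the environment when the settings object is built. The real shared config may use a settings library instead; Settings and litellm_model_name are hypothetical names. LiteLLM addresses Ollama models with an "ollama/" prefix.

```python
import os
from dataclasses import dataclass, field


def _env(name: str, default: str) -> str:
    """Read a setting from the environment, falling back to a default."""
    return os.environ.get(name, default)


@dataclass(frozen=True)
class Settings:
    # Default from the commit; override by setting OLLAMA_MODEL in .env.
    ollama_model: str = field(default_factory=lambda: _env("OLLAMA_MODEL", "qwen3:32b"))


def litellm_model_name(settings: Settings) -> str:
    """Model identifier in LiteLLM's provider/model form."""
    return f"ollama/{settings.ollama_model}"
```

Because the value is captured when Settings is constructed, a running process keeps the old model until it is restarted, which matches the restart step above.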
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added balanced/economy/local groups alongside fast/quality so all 5
agent model_preference values resolve to real provider groups.
All default to local Ollama qwen3:32b, with commercial providers as fallback.
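A sketch of the resolution rule: every model_preference maps to a group whose first deployment is the local model, with commercial providers after it. The commercial entries are placeholders, and resolve_group is a hypothetical helper, not the router's API.

```python
# Ordered deployments per group: local Ollama first, commercial providers as fallback.
LOCAL = "ollama/qwen3:32b"
COMMERCIAL_FALLBACKS = ["anthropic/<model>", "openai/<model>"]  # placeholders

MODEL_GROUPS: dict[str, list[str]] = {
    name: [LOCAL, *COMMERCIAL_FALLBACKS]
    for name in ("fast", "quality", "balanced", "economy", "local")
}


def resolve_group(model_preference: str) -> list[str]:
    """Return the ordered deployments to try for an agent's model_preference."""
    try:
        return MODEL_GROUPS[model_preference]
    except KeyError:
        raise ValueError(f"unknown model_preference: {model_preference!r}") from None
```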
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The gateway never called configure_rls_hook(engine), so SET LOCAL
app.current_tenant was never set for any DB operation through the
portal API endpoints. All tenant-scoped writes (agent creation, etc.)
failed with "new row violates row-level security policy."
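A sketch of the statement such a hook could issue at the start of each transaction. SET LOCAL does not accept bind parameters, so this version uses PostgreSQL's set_config(name, value, true), the transaction-local equivalent. The UUID check and the helper name rls_statement are assumptions; the real configure_rls_hook is not shown here.

```python
import uuid
from typing import Optional


def rls_statement(tenant_id: Optional[str]) -> Optional[tuple[str, dict]]:
    """Build (sql, params) that scope the current transaction to one tenant.

    Returns None when no tenant is in context. Assumes tenant IDs are UUIDs.
    """
    if tenant_id is None:
        return None
    uuid.UUID(tenant_id)  # raises ValueError for anything that is not a UUID
    return (
        "SELECT set_config('app.current_tenant', :tenant_id, true)",
        {"tenant_id": tenant_id},
    )
```

Registering a function like this on a connection or session "begin" event is one way to guarantee it runs before every tenant-scoped write.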
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create gateway/channels/web.py with normalize_web_event() and /chat/ws/{conversation_id}
WebSocket endpoint (auth via first JSON message, typing indicator, Redis pub-sub response)
- Create shared/api/chat.py with GET/POST/DELETE /api/portal/chat/conversations* REST API
with require_tenant_member RBAC enforcement and RLS context var setup
- Add chat_router to shared/api/__init__.py exports
- Mount chat_router and web_chat_router in gateway/main.py (Phase 6 Web Chat routers)
- All 19 unit tests pass; full 313-test suite green
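A sketch of "auth via first JSON message": the server accepts the socket, reads one frame, and refuses to continue unless it is a well-formed auth message. The message fields (type, userId, tenantId) are hypothetical; the commit does not show the real schema.

```python
import json
from dataclasses import dataclass


class AuthError(ValueError):
    """Raised when the first WebSocket frame is not a valid auth message."""


@dataclass(frozen=True)
class ChatAuth:
    user_id: str
    tenant_id: str


def parse_first_message(raw: str) -> ChatAuth:
    """Validate the first frame received on /chat/ws/{conversation_id}."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise AuthError("first message must be JSON") from exc
    if not isinstance(msg, dict) or msg.get("type") != "auth":
        raise AuthError("first message must have type 'auth'")
    user_id, tenant_id = msg.get("userId"), msg.get("tenantId")
    # Both identifiers must be non-empty strings.
    if not (isinstance(user_id, str) and user_id and isinstance(tenant_id, str) and tenant_id):
        raise AuthError("auth message must include userId and tenantId")
    return ChatAuth(user_id=user_id, tenant_id=tenant_id)
```

On AuthError the handler would typically close the socket with code 1008 (policy violation) rather than process further frames.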
- Add ChannelType.WEB = 'web' to shared/models/message.py
- Add webchat_response_key() to shared/redis_keys.py
- Create WebConversation and WebConversationMessage ORM models (SQLAlchemy 2.0)
- Create migration 008_web_chat.py with RLS, indexes, and channel_type CHECK update
- Pop conversation_id/portal_user_id extras in handle_message before model_validate
- Add web case to _build_response_extras and _send_response (Redis pub-sub publish)
- Import webchat_response_key in orchestrator/tasks.py
- Write 19 unit tests covering CHAT-01 through CHAT-05 (all pass)
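A sketch of popping web-chat extras before validation so the message model only sees its own fields, while the extras are kept for routing the reply. The key layout in webchat_response_key here is hypothetical; the real format in shared/redis_keys.py is not shown.

```python
from typing import Any

# Channel-specific keys carried alongside the message payload.
WEB_EXTRA_KEYS = ("conversation_id", "portal_user_id")


def split_extras(payload: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (message_fields, extras) without mutating the caller's dict."""
    data = dict(payload)
    extras = {key: data.pop(key) for key in WEB_EXTRA_KEYS if key in data}
    return data, extras


def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
    """Hypothetical pub-sub channel name for one conversation's replies."""
    return f"{tenant_id}:webchat:response:{conversation_id}"
```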
- Add AgentTemplate ORM model to tenant.py (global, not tenant-scoped)
- Create migration 007 with agent_templates table and 7 seed templates
- Create shared/prompts/system_prompt_builder.py with build_system_prompt()
- AI transparency clause always present (non-negotiable per Phase 1 decision)
- Unit tests pass (17 tests, all sections verified)
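A simplified sketch of section-based prompt assembly in which the transparency clause is appended unconditionally. AgentSpec, build_prompt_sketch, and the clause wording are hypothetical; the real build_system_prompt() takes agent data whose shape is not shown in this commit.

```python
from dataclasses import dataclass, field

# Hypothetical wording; the real clause text is not part of this commit.
AI_TRANSPARENCY_CLAUSE = "You are an AI assistant. If asked, always disclose that you are an AI."


@dataclass
class AgentSpec:
    name: str
    role: str
    persona: str = ""
    instructions: list[str] = field(default_factory=list)


def build_prompt_sketch(agent: AgentSpec) -> str:
    """Join the non-empty sections; the transparency clause is always last."""
    sections = [f"You are {agent.name}, {agent.role}."]
    if agent.persona:
        sections.append(f"Personality: {agent.persona}")
    if agent.instructions:
        sections.append("\n".join(f"- {line}" for line in agent.instructions))
    sections.append(AI_TRANSPARENCY_CLAUSE)  # never optional
    return "\n\n".join(sections)
```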
- Add require_platform_admin guard to GET/POST /tenants, PUT/DELETE /tenants/{id}
- Add require_tenant_member to GET /tenants/{id}, GET agents, GET agent/{id}
- Add require_tenant_admin to POST agents, PUT/DELETE agents
- Add require_tenant_admin to billing checkout and portal endpoints
- Add require_tenant_admin to channels slack/install and whatsapp/connect
- Add require_tenant_member to channels /{tid}/test
- Add require_tenant_admin to all llm_keys endpoints
- Add require_tenant_member to all usage GET endpoints
- Add POST /tenants/{tid}/agents/{aid}/test (require_tenant_member for operators)
- Add GET /tenants/{tid}/users with pending invitations (require_tenant_admin)
- Add GET /admin/users with tenant filter/role filter (require_platform_admin)
- Add POST /admin/impersonate with AuditEvent logging (require_platform_admin)
- Add POST /admin/stop-impersonation with AuditEvent logging (require_platform_admin)
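A plain-Python sketch of the three guard levels applied above. In the real code these are presumably FastAPI dependencies; here Caller, the role strings "admin"/"operator", and the rule that platform admins pass tenant checks are all assumptions.

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    """Stand-in for an HTTP 403 response."""


@dataclass
class Caller:
    user_id: str
    is_platform_admin: bool = False
    # tenant_id -> role within that tenant ("admin" or "operator"); hypothetical
    memberships: dict[str, str] = field(default_factory=dict)


def require_platform_admin(caller: Caller) -> None:
    if not caller.is_platform_admin:
        raise Forbidden("platform admin required")


def require_tenant_member(caller: Caller, tenant_id: str) -> None:
    # Any role in the tenant is enough (e.g. operators testing an agent).
    if caller.is_platform_admin or tenant_id in caller.memberships:
        return
    raise Forbidden("not a member of this tenant")


def require_tenant_admin(caller: Caller, tenant_id: str) -> None:
    if caller.is_platform_admin or caller.memberships.get(tenant_id) == "admin":
        return
    raise Forbidden("tenant admin required")
```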
- Slack callback: check data.ok (not data.success) to match backend response
- SlackInstallResponse: use url + state fields (not authorize_url)
- connect-channel.tsx: update all authorize_url refs to url
- BudgetAlert: use current_usd (not current_cost_usd) to match backend Pydantic model
- usage page: update alert.current_cost_usd to alert.current_usd
- Slack OAuth callback route handler (/api/slack/callback)
- Onboarding wizard: 3-step stepper (connect channel -> configure agent -> test message)
- Connect Channel: Slack OAuth button + WhatsApp manual credentials form
- Configure Agent: links to Agent Designer, Next enabled only with active agent
- Test Message: per-channel test buttons, required step, no separate Go Live button
- BYO API key management settings page at /settings/api-keys
- API Keys nav link in sidebar
- recharts installed (it was missing and blocked the portal build)
- Create llm_keys.py: GET list (redacted, key_hint only), POST (encrypt + store), DELETE (204 or 404)
- LlmKeyResponse never exposes encrypted_key or raw api_key
- 409 returned on duplicate (tenant_id, provider) key
- Cross-tenant deletion prevented by tenant_id verification in DELETE query
- Update api/__init__.py to export llm_keys_router
- All 5 LLM key CRUD tests passing (32 total unit tests green)
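An in-memory sketch of the CRUD rules listed above: responses carry only a key_hint, a second key for the same (tenant_id, provider) is a conflict, and DELETE matches on both id and tenant_id. The hint format and the placeholder "encryption" are assumptions; the real endpoint encrypts with KeyEncryptionService.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Stand-in for HTTP 409."""


@dataclass(frozen=True)
class LlmKeyView:
    """What the API returns: no encrypted_key, no raw api_key."""
    id: int
    provider: str
    key_hint: str


def make_key_hint(api_key: str) -> str:
    # Hypothetical hint format: last four characters only.
    return "..." + api_key[-4:]


class KeyStore:
    def __init__(self) -> None:
        self._rows: dict[int, dict] = {}
        self._next_id = 1

    def create(self, tenant_id: str, provider: str, api_key: str) -> LlmKeyView:
        if any(r["tenant_id"] == tenant_id and r["provider"] == provider for r in self._rows.values()):
            raise Conflict("a key already exists for this provider")
        row = {
            "id": self._next_id,
            "tenant_id": tenant_id,
            "provider": provider,
            "encrypted_key": "<ciphertext>",  # placeholder; real code stores encrypted bytes
            "key_hint": make_key_hint(api_key),
        }
        self._rows[row["id"]] = row
        self._next_id += 1
        return LlmKeyView(row["id"], provider, row["key_hint"])

    def list_keys(self, tenant_id: str) -> list[LlmKeyView]:
        return [LlmKeyView(r["id"], r["provider"], r["key_hint"])
                for r in self._rows.values() if r["tenant_id"] == tenant_id]

    def delete(self, tenant_id: str, key_id: int) -> int:
        """Return 204 if deleted; 404 if missing or owned by another tenant."""
        row = self._rows.get(key_id)
        if row is None or row["tenant_id"] != tenant_id:
            return 404
        del self._rows[key_id]
        return 204
```

Returning 404 rather than 403 for another tenant's key avoids confirming that the key exists at all.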
- Add stripe and cryptography to shared pyproject.toml
- Add recharts, @stripe/stripe-js, stripe to portal package.json (submodule)
- Add billing fields to Tenant model (stripe_customer_id, subscription_status, agent_quota, trial_ends_at)
- Add budget_limit_usd to Agent model
- Create TenantLlmKey and StripeEvent models in billing.py (AuditBase and Base respectively)
- Create KeyEncryptionService (MultiFernet encrypt/decrypt/rotate) in crypto.py
- Create compute_budget_status helper in usage.py (threshold logic: ok/warning/exceeded)
- Add platform_encryption_key, stripe_, slack_oauth settings to config.py
- Create Alembic migration 005 with all schema changes, RLS, grants, and composite index
- All 12 tests passing (key encryption roundtrip, rotation, budget thresholds)
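A sketch of the ok/warning/exceeded threshold logic. The 80% warning ratio is an assumption, not stated in the commit, and budget_status is a hypothetical stand-in whose signature may differ from compute_budget_status. Decimal avoids float rounding in currency comparisons.

```python
from decimal import Decimal
from typing import Optional

WARNING_RATIO = Decimal("0.8")  # hypothetical threshold


def budget_status(current_usd: Decimal, budget_limit_usd: Optional[Decimal]) -> str:
    """Classify spend against an agent's budget as ok / warning / exceeded."""
    if budget_limit_usd is None:
        return "ok"  # no budget configured for this agent
    if current_usd >= budget_limit_usd:
        return "exceeded"
    if current_usd >= budget_limit_usd * WARNING_RATIO:
        return "warning"
    return "ok"
```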
- Move key imports to module level in tasks.py for testability and clarity
- Pop WhatsApp extras (phone_number_id, bot_token) in handle_message before model_validate
- Build unified extras dict and extract wa_id from sender.user_id
- Change _process_message signature to accept extras dict
- Add _build_response_extras() helper for channel-aware extras assembly
- Replace all _update_slack_placeholder calls in _process_message with _send_response()
- Add escalation pre-check: skip LLM when Redis escalation_status_key == 'escalated'
- Add escalation post-check: check_escalation_rules after run_agent; call escalate_to_human
when rule matches and agent.escalation_assignee is set
- Add _build_conversation_metadata() helper (billing keyword v1 detection)
- Add channel parameter to build_system_prompt(), build_messages_with_memory(),
build_messages_with_media() for WhatsApp tier-2 business-function scoping
- WhatsApp scoping appends 'You only handle: {topics}' when tool_assignments non-empty
- Pass msg.channel to build_messages_with_memory() in _process_message
- All 26 new tests pass; all existing escalation/WhatsApp tests pass (no regressions)
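A sketch of the channel-aware scoping rule: only WhatsApp prompts with non-empty tool_assignments get the "You only handle: {topics}" line. Joining topics with ", " and the helper name apply_channel_scoping are assumptions.

```python
from typing import Optional, Sequence


def whatsapp_scope_suffix(channel: str, tool_assignments: Sequence[str]) -> Optional[str]:
    """Return the business-function scoping line for WhatsApp, if any."""
    if channel != "whatsapp" or not tool_assignments:
        return None
    return f"You only handle: {', '.join(tool_assignments)}"


def apply_channel_scoping(base_prompt: str, channel: str, tool_assignments: Sequence[str]) -> str:
    """Append the scoping line to the system prompt when it applies."""
    suffix = whatsapp_scope_suffix(channel, tool_assignments)
    return base_prompt if suffix is None else f"{base_prompt}\n\n{suffix}"
```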
- Add supports_vision(model_name) to builder.py — detects vision-capable models
(claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*)
with provider prefix stripping support
- Add generate_presigned_url(storage_key, expiry=3600) to builder.py — generates
1-hour MinIO presigned URLs via boto3 S3 client
- Add build_messages_with_media() to builder.py — extends build_messages_with_memory()
with media injection: IMAGE -> image_url blocks for vision models / text fallback for
non-vision models, DOCUMENT -> text reference with presigned URL
- image_url blocks use 'detail: auto' per OpenAI/LiteLLM multipart format
- Add 27 unit tests in test_multimodal_messages.py (TDD)
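A sketch of the vision check and media injection described above. The prefix list comes from the commit; stripping everything up to the last "/" is one way to handle provider prefixes, and the text-fallback wording is hypothetical. The image_url block follows the OpenAI multipart content format that LiteLLM accepts.

```python
from typing import Union

VISION_PREFIXES = (
    "claude-3", "gpt-4o", "gpt-4-vision", "gemini-pro-vision", "gemini-1.5", "gemini-2",
)


def supports_vision(model_name: str) -> bool:
    """Prefix match after stripping a provider prefix such as 'anthropic/'."""
    bare = model_name.rsplit("/", 1)[-1].lower()
    return bare.startswith(VISION_PREFIXES)


def user_content(text: str, image_urls: list[str], model_name: str) -> Union[str, list[dict]]:
    """Build user-message content: image blocks for vision models, text otherwise."""
    if not image_urls:
        return text
    if not supports_vision(model_name):
        # Hypothetical fallback wording for non-vision models.
        refs = "\n".join(f"[Image attached: {url}]" for url in image_urls)
        return f"{text}\n{refs}"
    blocks: list[dict] = [{"type": "text", "text": text}]
    blocks += [{"type": "image_url", "image_url": {"url": url, "detail": "auto"}} for url in image_urls]
    return blocks
```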
- Add escalation pre-check in _process_message: assistant mode for escalated threads
- Add escalation post-check after LLM response: calls escalate_to_human on rule match
- Load Slack bot token unconditionally (needed for escalation DM, not just placeholders)
- Add keyword-based conversation metadata detector (billing keywords, attempt counter)
- Add no-op audit logger stub (replaced by real AuditLogger from Plan 02 when available)
- Add escalation_assignee and natural_language_escalation fields to Agent model
- Add Alembic migration 003 for new Agent columns
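A sketch of the escalation pre-check and the keyword-based metadata detector, using a dict in place of Redis. The key layout, the keyword list, and the choice to return None for an escalated thread are assumptions; the commit names the behaviour but not these details.

```python
from typing import Callable, Optional

# Hypothetical keywords; the commit only says "billing keywords".
BILLING_KEYWORDS = ("invoice", "refund", "charge", "billing", "payment")


def escalation_status_key(tenant_id: str, thread_id: str) -> str:
    """Hypothetical key layout for a thread's escalation flag."""
    return f"{tenant_id}:escalation:{thread_id}"


def conversation_metadata(text: str, previous_attempts: int) -> dict:
    """Keyword-based v1 detector: flag billing topics and count attempts."""
    lowered = text.lower()
    return {
        "billing_related": any(word in lowered for word in BILLING_KEYWORDS),
        "attempt_count": previous_attempts + 1,
    }


def handle(store: dict, tenant_id: str, thread_id: str, text: str,
           run_agent: Callable[[str], str]) -> Optional[str]:
    """Skip the LLM entirely once a thread has been escalated to a human."""
    if store.get(escalation_status_key(tenant_id, thread_id)) == "escalated":
        return None  # a human owns this thread now
    return run_agent(text)
```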
- AuditEvent ORM model with tenant_id, action_type, latency_ms, metadata
- KnowledgeBaseDocument and KBChunk ORM models for vector KB
- Migration 003: audit_events (immutable via REVOKE), kb_documents, kb_chunks
with HNSW index and RLS on all tables
- AuditLogger with log_llm_call, log_tool_call, log_escalation methods
- audit_events immutability enforced at DB level (UPDATE/DELETE rejected)
- [Rule 1 - Bug] Fixed CAST(:metadata AS jsonb) for asyncpg compatibility
- Add MediaType(StrEnum) and MediaAttachment(BaseModel) to shared/models/message.py
- Add media: list[MediaAttachment] field to MessageContent
- Add whatsapp_app_secret, whatsapp_verify_token, and MinIO settings to shared/config.py
- Add normalize_whatsapp_event() to gateway/normalize.py (text, image, document support)
- Create whatsapp.py adapter with verify_whatsapp_signature() and verify_hub_challenge()
- 30 new passing tests (signature verification + normalizer)
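A stdlib sketch of the two webhook checks, following Meta's documented scheme: POST bodies are signed in the X-Hub-Signature-256 header as "sha256=" plus the hex HMAC-SHA256 of the raw body keyed with the app secret, and the GET verification request must echo hub.challenge when hub.mode is "subscribe" and hub.verify_token matches. The function names are hypothetical, not the adapter's real signatures.

```python
import hashlib
import hmac
from typing import Optional

SIG_PREFIX = "sha256="


def check_signature(app_secret: str, raw_body: bytes, header_value: Optional[str]) -> bool:
    """Verify X-Hub-Signature-256 against the raw (unparsed) request body."""
    if not header_value or not header_value.startswith(SIG_PREFIX):
        return False
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, header_value[len(SIG_PREFIX):])


def hub_challenge_response(params: dict[str, str], verify_token: str) -> Optional[str]:
    """Return hub.challenge to echo back if the subscription request is valid."""
    if params.get("hub.mode") == "subscribe" and params.get("hub.verify_token") == verify_token:
        return params.get("hub.challenge")
    return None
```

The signature must be computed over the exact bytes received; re-serializing parsed JSON will generally produce a different digest.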
- Initialize Next.js 16 project in packages/portal/ with TypeScript, Tailwind 4, shadcn/ui
- Auth.js v5 with Credentials provider calling FastAPI /auth/verify endpoint
- proxy.ts (Next.js 16 replacement for middleware.ts) protects all routes
- Login page with React Hook Form + zod validation (standard-schema resolver for zod v4 compat)
- Agent Designer: prominent dedicated module with Identity, Personality, Configuration,
Capabilities, Escalation, and Status sections; employee-centric language throughout
- Tenant CRUD: list, create (slug auto-gen), view/edit, delete with confirmation
- TanStack Query hooks for all API operations with proper cache invalidation
- Route group (dashboard) provides shared Nav sidebar + QueryClientProvider
- Update docker-compose.yml to add portal service on port 3000
- Deviations: middleware.ts renamed to proxy.ts in Next.js 16; zodResolver replaced with
standardSchemaResolver for zod v4 + @hookform/resolvers v5 compatibility
- Create llm_pool/router.py: LiteLLM Router with fast (Ollama) and quality (Anthropic/OpenAI) model groups
- Configure fallback chain: quality providers fail -> fast group
- Pin LiteLLM to ==1.82.5 (avoid September 2025 OOM regression in later releases)
- Create llm_pool/main.py: FastAPI service on port 8004 with /complete and /health endpoints
- Add providers/__init__.py: reserved for future per-provider customization
- Update docker-compose.yml: add llm-pool and celery-worker service stubs
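LiteLLM's Router can declare a fallback like this through its fallbacks setting. The plain-Python sketch below only illustrates the control flow (try every quality deployment, then fall back to the fast group) with stand-in provider callables; none of these names come from llm_pool/router.py.

```python
from typing import Callable, Sequence

Call = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every deployment in every group has failed."""


def complete_with_fallback(prompt: str, chain: Sequence[tuple[str, Sequence[Call]]]) -> tuple[str, str]:
    """Try each group's deployments in order; return (group_name, completion)."""
    errors: list[str] = []
    for group_name, deployments in chain:
        for call in deployments:
            try:
                return group_name, call(prompt)
            except Exception as exc:  # provider error: move on to the next deployment
                errors.append(f"{group_name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only.
def unavailable(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def local_ollama(prompt: str) -> str:
    return f"local reply to {prompt!r}"
```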