Commit Graph

17 Commits

Author SHA1 Message Date
9090b54f43 feat(streaming): add run_agent_streaming() to orchestrator runner
- run_agent_streaming() calls POST /complete/stream and yields token strings
- Reads NDJSON lines from the streaming response, yields on 'chunk' events
- On 'error' or connection failure, yields the fallback response string
- Tool calls are not supported in the streaming path
- Existing run_agent() (non-streaming, tool-call loop) is unchanged
2026-03-25 17:57:00 -06:00
c72beb916b feat(06-01): add web channel type, Redis key, ORM models, migration, and tests
- Add ChannelType.WEB = 'web' to shared/models/message.py
- Add webchat_response_key() to shared/redis_keys.py
- Create WebConversation and WebConversationMessage ORM models (SQLAlchemy 2.0)
- Create migration 008_web_chat.py with RLS, indexes, and channel_type CHECK update
- Pop conversation_id/portal_user_id extras in handle_message before model_validate
- Add web case to _build_response_extras and _send_response (Redis pub-sub publish)
- Import webchat_response_key in orchestrator/tasks.py
- Write 19 unit tests covering CHAT-01 through CHAT-05 (all pass)
2026-03-25 10:26:34 -06:00
d59f85cd87 feat(04-rbac-01): RBAC guards + invite token + email + invitation API
- rbac.py: PortalCaller dataclass + get_portal_caller dependency (header-based)
- rbac.py: require_platform_admin (403 for non-platform_admin)
- rbac.py: require_tenant_admin (platform_admin bypasses; customer_admin
  checks UserTenantRole; operator always rejected)
- rbac.py: require_tenant_member (platform_admin bypasses; all roles
  checked against UserTenantRole)
- invite_token.py: generate_invite_token (HMAC-SHA256, base64url, 48h TTL)
- invite_token.py: validate_invite_token (timing-safe compare_digest, TTL check)
- invite_token.py: token_to_hash (SHA-256 for DB storage)
- email.py: send_invite_email (sync smtplib, skips if smtp_host empty)
- invitations.py: POST /api/portal/invitations (create, requires tenant admin)
- invitations.py: POST /api/portal/invitations/accept (accept invitation)
- invitations.py: POST /api/portal/invitations/{id}/resend (regenerate token)
- invitations.py: GET /api/portal/invitations (list pending)
- portal.py: AuthVerifyResponse now returns role+tenant_ids+active_tenant_id
- portal.py: auth/register gated behind require_platform_admin
- tasks.py: send_invite_email_task Celery task (fire-and-forget)
- gateway/main.py: invitations_router mounted
2026-03-24 13:52:45 -06:00
4cbf192fa5 feat(03-01): backend API endpoints — channels, billing, usage, and audit logger enhancement
- Create channels.py: HMAC-signed OAuth state generation/verification, Slack OAuth install/callback, WhatsApp manual connect, test message endpoint
- Create billing.py: Stripe Checkout session, billing portal session, webhook handler with idempotency (StripeEvent table), subscription lifecycle management
- Update usage.py: add _aggregate_rows_by_agent and _aggregate_rows_by_provider helpers (unit-testable without DB), complete usage endpoints
- Fix audit.py: rename 'metadata' attribute to 'event_metadata' (SQLAlchemy 2.0 DeclarativeBase reserves 'metadata')
- Enhance runner.py: audit log now includes prompt_tokens, completion_tokens, total_tokens, cost_usd, provider in LLM call metadata
- Update api/__init__.py to export all new routers
- All 27 unit tests passing
2026-03-23 21:24:08 -06:00
bd217a4113 feat(02-06): re-wire escalation and WhatsApp outbound routing in pipeline
- Move key imports to module level in tasks.py for testability and clarity
- Pop WhatsApp extras (phone_number_id, bot_token) in handle_message before model_validate
- Build unified extras dict and extract wa_id from sender.user_id
- Change _process_message signature to accept extras dict
- Add _build_response_extras() helper for channel-aware extras assembly
- Replace all _update_slack_placeholder calls in _process_message with _send_response()
- Add escalation pre-check: skip LLM when Redis escalation_status_key == 'escalated'
- Add escalation post-check: check_escalation_rules after run_agent; call escalate_to_human
  when rule matches and agent.escalation_assignee is set
- Add _build_conversation_metadata() helper (billing keyword v1 detection)
- Add channel parameter to build_system_prompt(), build_messages_with_memory(),
  build_messages_with_media() for WhatsApp tier-2 business-function scoping
- WhatsApp scoping appends 'You only handle: {topics}' when tool_assignments non-empty
- Pass msg.channel to build_messages_with_memory() in _process_message
- All 26 new tests pass; all existing escalation/WhatsApp tests pass (no regressions)
2026-03-23 19:15:20 -06:00
669c0b52b3 feat(02-05): multimodal LLM interpretation with image_url content blocks
- Add supports_vision(model_name) to builder.py — detects vision-capable models
  (claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*)
  with provider prefix stripping support
- Add generate_presigned_url(storage_key, expiry=3600) to builder.py — generates
  1-hour MinIO presigned URLs via boto3 S3 client
- Add build_messages_with_media() to builder.py — extends build_messages_with_memory()
  with media injection: IMAGE -> image_url blocks for vision models / text fallback for
  non-vision models, DOCUMENT -> text reference with presigned URL
- image_url blocks use 'detail: auto' per OpenAI/LiteLLM multipart format
- Add 27 unit tests in test_multimodal_messages.py (TDD)
2026-03-23 15:09:18 -06:00
9dd7c481a3 feat(02-05): Slack file_share extraction and channel-aware outbound routing
- Add gateway/channels/slack_media.py with is_file_share_event, media_type_from_mime,
  build_slack_storage_key, build_attachment_from_slack_file, download_and_store_slack_file
- Add _send_response() helper to orchestrator/tasks.py for channel-aware dispatch
  (Slack -> chat.update, WhatsApp -> send_whatsapp_message)
- Add send_whatsapp_message import to orchestrator/tasks.py for WhatsApp outbound
- Add boto3>=1.35.0 to gateway dependencies for MinIO S3 client
- Add 23 unit tests in test_slack_media.py (TDD)
2026-03-23 15:06:45 -06:00
44fa7e6845 feat(02-02): wire tool-call loop into agent runner and orchestrator pipeline
- runner.py: multi-turn tool-call loop (LLM -> tool -> observe -> respond)
- runner.py: max 5 iterations guard against runaway tool chains
- runner.py: confirmation gate — returns confirmation msg, stops loop
- runner.py: audit logging for every LLM call via audit_logger
- tasks.py: AuditLogger initialized at task start with session factory
- tasks.py: tool registry built from agent.tool_assignments
- tasks.py: pending tool confirmation flow via Redis (10 min TTL)
- tasks.py: memory persistence skipped for confirmation request responses
- llm-pool/router.py: LLMResponse model with content + tool_calls fields
- llm-pool/router.py: tools parameter forwarded to litellm.acompletion()
- llm-pool/main.py: CompleteRequest accepts optional tools list
- llm-pool/main.py: CompleteResponse includes tool_calls field
- Migration renamed to 004 (003 was already taken by escalation migration)
- [Rule 1 - Bug] Renamed 003_phase2_audit_kb.py -> 004 to fix duplicate revision ID
2026-03-23 15:00:17 -06:00
f49927888e feat(02-02): tool registry, executor, and 4 built-in tools
- ToolDefinition Pydantic model with JSON Schema parameters + handler
- BUILTIN_TOOLS: web_search, kb_search, http_request, calendar_lookup
- http_request requires_confirmation=True (outbound side effects)
- get_tools_for_agent filters by agent.tool_assignments
- to_litellm_format converts to OpenAI function-calling schema
- execute_tool: jsonschema validation before handler call
- execute_tool: confirmation gate for requires_confirmation=True
- execute_tool: audit logging on every invocation (success + failure)
- web_search: Brave Search API with BRAVE_API_KEY env var
- kb_search: pgvector cosine similarity with HNSW index
- http_request: 30s timeout, 1MB cap, GET/POST/PUT/DELETE only
- calendar_lookup: Google Calendar events.list read-only
- jsonschema dependency added to orchestrator pyproject.toml
- [Rule 1 - Bug] Added missing execute_tool import in test
2026-03-23 14:54:14 -06:00
a025cadc44 feat(02-04): wire escalation into orchestrator pipeline
- Add escalation pre-check in _process_message: assistant mode for escalated threads
- Add escalation post-check after LLM response: calls escalate_to_human on rule match
- Load Slack bot token unconditionally (needed for escalation DM, not just placeholders)
- Add keyword-based conversation metadata detector (billing keywords, attempt counter)
- Add no-op audit logger stub (replaced by real AuditLogger from Plan 02 when available)
- Add escalation_assignee and natural_language_escalation fields to Agent model
- Add Alembic migration 003 for new Agent columns
2026-03-23 14:53:45 -06:00
4047b552a7 feat(02-04): implement escalation handler (rule evaluator, transcript, DM delivery)
- check_escalation_rules: condition parser for 'keyword AND count > N' and NL phrases
- build_transcript: formats messages as Slack mrkdwn, truncates at 3000 chars
- escalate_to_human: opens DM, posts transcript, sets Redis key, logs audit event
2026-03-23 14:50:56 -06:00
30b9f60668 feat(02-02): audit model, KB model, migration, and audit logger
- AuditEvent ORM model with tenant_id, action_type, latency_ms, metadata
- KnowledgeBaseDocument and KBChunk ORM models for vector KB
- Migration 003: audit_events (immutable via REVOKE), kb_documents, kb_chunks
  with HNSW index and RLS on all tables
- AuditLogger with log_llm_call, log_tool_call, log_escalation methods
- audit_events immutability enforced at DB level (UPDATE/DELETE rejected)
- [Rule 1 - Bug] Fixed CAST(:metadata AS jsonb) for asyncpg compatibility
2026-03-23 14:50:51 -06:00
45b957377f feat(02-01): wire two-layer memory into orchestrator pipeline
- builder.py: add build_messages_with_memory() — injects pgvector context as
  system message + sliding window history before current user turn
- runner.py: accept optional messages parameter; fallback to simple build for
  backward compat (existing tests unaffected)
- tasks.py: memory pipeline in _process_message — load short-term + retrieve
  long-term before LLM call; append turns to Redis + dispatch embed_and_store
  fire-and-forget after response
- tasks.py: add embed_and_store Celery task (sync def + asyncio.run()) for
  async pgvector backfill — never blocks the LLM response pipeline
- memory/embedder.py: lazy singleton SentenceTransformer (all-MiniLM-L6-v2)
  with embed_text() / embed_texts() helpers
- All 202 tests pass (196 existing + 6 new memory integration tests)
2026-03-23 14:45:21 -06:00
28a5ee996e feat(02-01): add two-layer memory system — Redis sliding window + pgvector long-term
- ConversationEmbedding ORM model with Vector(384) column (pgvector)
- memory_short_key, escalation_status_key, pending_tool_confirm_key in redis_keys.py
- orchestrator/memory/short_term.py: RPUSH/LTRIM sliding window (get_recent_messages, append_message)
- orchestrator/memory/long_term.py: pgvector HNSW cosine search (retrieve_relevant, store_embedding)
- Migration 002: conversation_embeddings table, HNSW index, RLS with FORCE, SELECT/INSERT only
- 10 unit tests (fakeredis), 6 integration tests (pgvector) — all passing
- Auto-fix [Rule 3]: postgres image updated to pgvector/pgvector:pg16 (extension required)
2026-03-23 14:41:57 -06:00
6f30705e1a feat(01-03): Channel Gateway (Slack adapter) and Message Router
- gateway/normalize.py: normalize_slack_event -> KonstructMessage (strips bot mention)
- gateway/channels/slack.py: register_slack_handlers for app_mention + DM events
  - rate limit check -> ephemeral rejection on exceeded
  - idempotency dedup (Slack retry protection)
  - placeholder 'Thinking...' message posted in-thread before Celery dispatch
  - auto-follow engaged threads with 30-minute TTL
  - HTTP 200 returned immediately; all LLM work dispatched to Celery
- gateway/main.py: FastAPI on port 8001, /slack/events + /health
- router/tenant.py: resolve_tenant workspace_id -> tenant_id (RLS-bypass query)
- router/ratelimit.py: check_rate_limit Redis token bucket, RateLimitExceeded exception
- router/idempotency.py: is_duplicate + mark_processed (SET NX, 24h TTL)
- router/context.py: load_agent_for_tenant with RLS ContextVar setup
- orchestrator/tasks.py: handle_message now extracts placeholder_ts/channel_id,
  calls _update_slack_placeholder via chat.update after LLM response
- docker-compose.yml: gateway service on port 8001
- pyproject.toml: added redis, konstruct-router, konstruct-orchestrator deps
2026-03-23 10:27:59 -06:00
8257c554d7 feat(01-02): Celery orchestrator — handle_message task, system prompt builder, LLM pool runner
- Create orchestrator/main.py: Celery app with Redis broker/backend, task_acks_late=True, 10-min timeout
- Create orchestrator/tasks.py: SYNC def handle_message (critical pattern: asyncio.run for async work)
  - Deserializes KonstructMessage, sets RLS context, loads agent from DB, calls run_agent
  - Retries up to 3x on deserialization failure
- Create orchestrator/agents/builder.py: build_system_prompt assembles system_prompt + identity + persona + AI transparency clause
- Create orchestrator/agents/runner.py: run_agent posts to llm-pool /complete via httpx, returns polite fallback on error
- Add Celery[redis] dependency to orchestrator pyproject.toml
- Create tests/integration/test_llm_fallback.py: 7 tests for fallback routing and 503 on total failure (LLM-01)
- Create tests/integration/test_llm_providers.py: 12 tests verifying all three providers configured correctly (LLM-02)
- All 19 integration tests pass
2026-03-23 10:06:44 -06:00
5714acf741 feat(01-foundation-01): monorepo scaffolding, Docker Compose, and shared data models
- pyproject.toml: uv workspace with 5 member packages (shared, gateway, router, orchestrator, llm-pool)
- docker-compose.yml: PostgreSQL 16 + Redis 7 + Ollama services on konstruct-net
- .env.example: all required env vars documented, konstruct_app role (not superuser)
- scripts/init-db.sh: creates konstruct_app role at DB init time
- packages/shared/shared/config.py: Pydantic Settings loading all env vars
- packages/shared/shared/models/message.py: KonstructMessage, ChannelType, SenderInfo, MessageContent
- packages/shared/shared/models/tenant.py: Tenant, Agent, ChannelConnection SQLAlchemy 2.0 models
- packages/shared/shared/models/auth.py: PortalUser model for admin portal auth
- packages/shared/shared/db.py: async SQLAlchemy engine, session factory, get_session dependency
- packages/shared/shared/rls.py: current_tenant_id ContextVar and configure_rls_hook with parameterized SET LOCAL
- packages/shared/shared/redis_keys.py: tenant-namespaced key constructors (rate_limit, idempotency, session, engaged_thread)
2026-03-23 09:49:28 -06:00