diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 86ae884..e284531 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -13,7 +13,7 @@ Konstruct ships in three coarse phases ordered by dependency: first build the se
 Decimal phases appear between their surrounding integers in numeric order.
 
 - [x] **Phase 1: Foundation** - Secure multi-tenant pipeline with Slack end-to-end and basic agent response (completed 2026-03-23)
-- [ ] **Phase 2: Agent Features** - Persistent memory, tool framework, WhatsApp integration, and human escalation
+- [x] **Phase 2: Agent Features** - Persistent memory, tool framework, WhatsApp integration, and human escalation (completed 2026-03-23)
 - [ ] **Phase 3: Operator Experience** - Admin portal, tenant onboarding, and Stripe billing
 
 ## Phase Details
@@ -79,7 +79,7 @@ Phases execute in numeric order: 1 → 2 → 3
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 1. Foundation | 4/4 | Complete | 2026-03-23 |
-| 2. Agent Features | 4/5 | In Progress| |
+| 2. Agent Features | 5/5 | Complete | 2026-03-23 |
 | 3. Operator Experience | 0/2 | Not started | - |
 
 ---
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 58bfe92..492e48c 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,8 +3,8 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: planning
-stopped_at: Completed 02-agent-features/02-02-PLAN.md
-last_updated: "2026-03-23T21:02:15.263Z"
+stopped_at: Completed 02-agent-features/02-05-PLAN.md
+last_updated: "2026-03-23T21:35:00.000Z"
 last_activity: 2026-03-23 — Roadmap created, ready for Phase 1 planning
 progress:
   total_phases: 3
@@ -25,12 +25,12 @@ See: .planning/PROJECT.md (updated 2026-03-22)
 
 ## Current Position
 
-Phase: 1 of 3 (Foundation)
-Plan: 0 of 3 in current phase
-Status: Ready to plan
-Last activity: 2026-03-23 — Roadmap created, ready for Phase 1 planning
+Phase: 2 of 3 (Agent Features)
+Plan: 5 of 5 in current phase
+Status: In progress
+Last activity: 2026-03-23 — Completed 02-05 multimodal media support and WhatsApp outbound routing
 
-Progress: [░░░░░░░░░░] 0%
+Progress: [████████░░] 78%
 
 ## Performance Metrics
 
@@ -58,6 +58,7 @@ Progress: [░░░░░░░░░░] 0%
 | Phase 02-agent-features P02-01 | 9m 22s | 2 tasks | 15 files |
 | Phase 02-agent-features P04 | 5m | 2 tasks | 7 files |
 | Phase 02-agent-features P02 | 12m 22s | 3 tasks | 19 files |
+| Phase 02-agent-features P05 | ~25m | 2 tasks | 6 files |
 
 ## Accumulated Context
 
@@ -95,6 +96,10 @@ Recent decisions affecting current work:
 - [Phase 02-agent-features]: CAST(:metadata AS jsonb) for asyncpg JSONB params — :: cast syntax fails with named params
 - [Phase 02-agent-features]: Migration 004 (not 003) for audit_events — 003_escalation_fields.py claimed revision 003 first
 - [Phase 02-agent-features]: AuditLogger uses raw INSERT text() — ORM model would allow accidental SQLAlchemy UPDATE/DELETE on audit rows
+- [Phase 02-agent-features]: boto3 added to gateway pyproject.toml explicitly — was used via local import in whatsapp.py but never declared, causing ModuleNotFoundError in tests
+- [Phase 02-agent-features]: boto3 patched at import site patch('boto3.client') not patch('module.boto3') — local imports inside async functions require patching the actual module, not the module attribute
+- [Phase 02-agent-features]: build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, all memory context preserved alongside image_url blocks
+- [Phase 02-agent-features]: AUDIO/VIDEO attachments text-referenced only in v1 — OpenAI image_url blocks support images only, not audio/video
 
 ### Pending Todos
 
@@ -106,6 +111,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-03-23T21:02:15.260Z
-Stopped at: Completed 02-agent-features/02-02-PLAN.md
+Last session: 2026-03-23T21:35:00.000Z
+Stopped at: Completed 02-agent-features/02-05-PLAN.md
 Resume file: None
diff --git a/.planning/phases/02-agent-features/02-05-SUMMARY.md b/.planning/phases/02-agent-features/02-05-SUMMARY.md
new file mode 100644
index 0000000..7adadef
--- /dev/null
+++ b/.planning/phases/02-agent-features/02-05-SUMMARY.md
@@ -0,0 +1,133 @@
+---
+phase: 02-agent-features
+plan: "05"
+subsystem: orchestrator-media
+tags: [multimodal, media, slack, whatsapp, minio, vision, llm]
+dependency_graph:
+  requires: ["02-02", "02-03"]
+  provides: ["multimodal-llm-messages", "slack-media-extraction", "whatsapp-outbound-routing"]
+  affects: ["orchestrator/tasks.py", "orchestrator/agents/builder.py", "gateway/channels/slack_media.py"]
+tech_stack:
+  added: ["boto3>=1.35.0 (gateway dependency)"]
+  patterns:
+    - "OpenAI multipart content format: {role: user, content: [{type: text}, {type: image_url}]}"
+    - "Vision model detection via regex pattern matching on model name"
+    - "Channel-aware outbound routing via _send_response() dispatcher"
+    - "TDD: RED (failing tests) -> GREEN (passing implementation)"
+key_files:
+  created:
+    - packages/gateway/gateway/channels/slack_media.py
+    - tests/unit/test_slack_media.py
+    - tests/unit/test_multimodal_messages.py
+  modified:
+    - packages/orchestrator/orchestrator/tasks.py
+    - packages/orchestrator/orchestrator/agents/builder.py
+    - packages/gateway/pyproject.toml
+key_decisions:
+  - "boto3 added to gateway pyproject.toml — not previously installed despite whatsapp.py using it via local import; uv add --directory installs at workspace root level"
+  - "boto3.client patched at import site in tests (patch('boto3.client')) — boto3 is a local import inside async functions, not a module-level attribute"
+  - "build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, not a replacement, preserving all memory context"
+  - "AUDIO/VIDEO attachments text-referenced only in v1 — image_url blocks are image-only in OpenAI/LiteLLM format"
+  - "Non-vision model graceful degradation: IMAGE -> '[Image attached: filename]' appended to text content"
+metrics:
+  duration: "~25 minutes"
+  completed_date: "2026-03-23"
+  tasks_completed: 2
+  files_changed: 6
+---
+
+# Phase 2 Plan 05: Cross-Channel Media Support and Multimodal LLM Interpretation Summary
+
+Multimodal media pipeline wired end-to-end: Slack file_share events download to MinIO, WhatsApp responses routed via send_whatsapp_message, and LLM prompts enriched with image_url content blocks for vision-capable models.
+
+## What Was Built
+
+### Task 1: Slack file_share extraction and channel-aware outbound routing (commit: 9dd7c48)
+
+**New: `packages/gateway/gateway/channels/slack_media.py`**
+
+Provides the complete Slack file_share pipeline:
+- `is_file_share_event(event)` — detects `subtype == "file_share"`
+- `media_type_from_mime(mime_type)` — maps MIME types to MediaType enum (IMAGE/DOCUMENT/AUDIO/VIDEO)
+- `build_slack_storage_key(tenant_id, agent_id, message_id, filename)` — generates `{tenant_id}/{agent_id}/{message_id}/{filename}` storage key
+- `build_attachment_from_slack_file(file_info, storage_key)` — creates MediaAttachment from Slack file metadata
+- `download_and_store_slack_file(...)` — async: downloads via bot token, uploads to MinIO, returns (storage_key, presigned_url)
+
+**Updated: `packages/orchestrator/orchestrator/tasks.py`**
+
+Added `_send_response(channel, text, extras)` helper function for channel-aware outbound routing:
+- `channel == "slack"` → calls `_update_slack_placeholder()` with `bot_token`, `channel_id`, `placeholder_ts` from extras
+- `channel == "whatsapp"` → calls `send_whatsapp_message()` with `phone_number_id`, `bot_token` (access_token), `wa_id` from extras
+- Unknown channels → logs warning and returns (no crash)
+
+Added `send_whatsapp_message` import from `gateway.channels.whatsapp` at module top.
+
+**Updated: `packages/gateway/pyproject.toml`**
+
+Added `boto3>=1.35.0` as an explicit gateway dependency (was used via local import in whatsapp.py but never declared).
+
+### Task 2: Multimodal LLM interpretation with image_url content blocks (commit: 669c0b5)
+
+**Updated: `packages/orchestrator/orchestrator/agents/builder.py`**
+
+Added three new functions:
+
+1. `supports_vision(model_name: str) -> bool`
+   - Strips provider prefixes (`anthropic/`, `openai/`) before matching
+   - Returns True for: claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*
+   - Returns False for: gpt-3.5-turbo, gpt-4, ollama/llama3, etc.
+
+2. `generate_presigned_url(storage_key: str, expiry: int = 3600) -> str`
+   - Uses boto3 S3 client with MinIO endpoint from shared settings
+   - Default 1-hour expiry
+   - Returns presigned GET URL string
+
+3. `build_messages_with_media(agent, current_message, media_attachments, recent_messages, relevant_context) -> list[dict]`
+   - Extends `build_messages_with_memory()` with media injection
+   - IMAGE + vision model → multipart content list with `image_url` blocks (`detail: "auto"`)
+   - IMAGE + non-vision model → appends `[Image attached: filename]` text
+   - DOCUMENT → appends `[Document attached: filename - presigned_url]` text (any model)
+   - AUDIO/VIDEO → appends `[Audio/Video attached: filename]` text
+   - Missing storage_key → skipped gracefully (logged at DEBUG)
+   - Memory context (recent + relevant) fully preserved alongside media
+
+## Tests
+
+| File | Tests | Coverage |
+|------|-------|----------|
+| `tests/unit/test_slack_media.py` | 23 | file_share detection, MIME mapping, storage key format, MinIO upload (mocked), bot token auth, channel-aware routing |
+| `tests/unit/test_multimodal_messages.py` | 27 | supports_vision (14 models), presigned URL (expiry, key), build_messages_with_media (8 scenarios) |
+| **Total new tests** | **50** | |
+
+Full suite: 308 tests, all passing.
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 3 - Blocking] boto3 not installed despite plan assumption**
+- **Found during:** Task 1 (GREEN phase) — test patching `boto3.client` failed with `ModuleNotFoundError`
+- **Issue:** Plan stated "boto3 may already be installed from Plan 02-03" — it was used via local import in whatsapp.py but never declared as a dependency, so it wasn't installed in the venv
+- **Fix:** Added `boto3>=1.35.0` to `packages/gateway/pyproject.toml` dependencies and ran `uv add boto3` to install at workspace root level
+- **Files modified:** `packages/gateway/pyproject.toml`, `pyproject.toml` (lockfile update), `uv.lock`
+- **Commit:** 9dd7c48
+
+**2. [Rule 1 - Bug] boto3 test patch target mismatch**
+- **Found during:** Task 1 (GREEN phase) — `patch("gateway.channels.slack_media.boto3")` failed with `AttributeError: does not have the attribute 'boto3'`
+- **Issue:** boto3 is imported inside the async function (`import boto3` as a local import), not at module level. Patching the module attribute doesn't work for local imports.
+- **Fix:** Changed test patch target to `patch("boto3.client")` which patches the actual boto3 module's client factory — works regardless of where the import happens
+- **Files modified:** `tests/unit/test_slack_media.py`
+- **Commit:** 9dd7c48
+
+## Self-Check: PASSED
+
+Files created/verified:
+- FOUND: packages/gateway/gateway/channels/slack_media.py
+- FOUND: tests/unit/test_slack_media.py
+- FOUND: tests/unit/test_multimodal_messages.py
+
+Commits verified:
+- FOUND: 9dd7c48 feat(02-05): Slack file_share extraction and channel-aware outbound routing
+- FOUND: 669c0b5 feat(02-05): multimodal LLM interpretation with image_url content blocks
+
+Test suite: 308/308 passed
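Task 1 in the summary describes `media_type_from_mime()` and `build_slack_storage_key()` only by their inputs and outputs. Below is a minimal sketch of one way to get that behaviour. It rests on three assumptions, none taken from `slack_media.py`: the mapping inspects only the MIME top-level type, anything other than image, audio, or video falls back to DOCUMENT, and the enum values are chosen for illustration.

```python
from enum import Enum


class MediaType(Enum):
    # The four categories named in the summary; member values are assumed.
    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


def media_type_from_mime(mime_type: str) -> MediaType:
    """Map a MIME type to a MediaType using only its top-level type.

    Assumption: anything that is not image/*, audio/* or video/* (PDFs,
    spreadsheets, unknown or empty types) is treated as a DOCUMENT.
    """
    top_level = mime_type.split("/", 1)[0].strip().lower()
    if top_level == "image":
        return MediaType.IMAGE
    if top_level == "audio":
        return MediaType.AUDIO
    if top_level == "video":
        return MediaType.VIDEO
    return MediaType.DOCUMENT


def build_slack_storage_key(tenant_id: str, agent_id: str, message_id: str, filename: str) -> str:
    """Build the documented {tenant_id}/{agent_id}/{message_id}/{filename} key."""
    return f"{tenant_id}/{agent_id}/{message_id}/{filename}"
```

Because the tenant ID is the first path segment, all objects for one tenant share a common prefix, which makes per-tenant listing or cleanup straightforward. The summary does not say whether `filename` is sanitized. In this sketch, a Slack filename containing `/` would add extra path segments to the key.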
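The summary describes the `_send_response(channel, text, extras)` helper added to `tasks.py` as one branch per channel, plus a warning for unknown channels. The sketch below expresses the same routing as a small registry of async senders, so it runs without Slack or WhatsApp credentials. Several pieces are hypothetical:

- `send_response`, `fake_slack`, `fake_whatsapp`, `SENDERS`, and `route_sync` are invented names.
- The fakes only record which `extras` keys each channel reads, as listed in the summary.
- The boolean return value exists only to make the routing observable.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, List, Tuple

logger = logging.getLogger("send_response_sketch")

# A sender receives the response text plus the channel-specific extras.
Sender = Callable[[str, Dict[str, Any]], Awaitable[None]]


async def send_response(
    channel: str, text: str, extras: Dict[str, Any], senders: Dict[str, Sender]
) -> bool:
    """Dispatch `text` to the sender registered for `channel`.

    Unknown channels log a warning and return False instead of raising.
    """
    sender = senders.get(channel)
    if sender is None:
        logger.warning("No outbound sender for channel %r; dropping response", channel)
        return False
    await sender(text, extras)
    return True


# Recording fakes standing in for _update_slack_placeholder() and
# send_whatsapp_message(); they capture the extras keys each channel reads.
sent: List[Tuple[str, Tuple[Any, ...]]] = []


async def fake_slack(text: str, extras: Dict[str, Any]) -> None:
    sent.append(("slack", (extras["bot_token"], extras["channel_id"], extras["placeholder_ts"], text)))


async def fake_whatsapp(text: str, extras: Dict[str, Any]) -> None:
    # Per the summary, bot_token carries the WhatsApp access token here.
    sent.append(("whatsapp", (extras["phone_number_id"], extras["bot_token"], extras["wa_id"], text)))


SENDERS: Dict[str, Sender] = {"slack": fake_slack, "whatsapp": fake_whatsapp}


def route_sync(channel: str, text: str, extras: Dict[str, Any]) -> bool:
    """Run the async dispatcher from synchronous code."""
    return asyncio.run(send_response(channel, text, extras, SENDERS))
```

With a registry, adding a channel does not require editing the dispatcher itself. With only two channels, the explicit branches the summary describes are an equally reasonable choice.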
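The summary lists the model families that `supports_vision()` accepts and says provider prefixes are stripped before matching. The following is a minimal regex-based sketch under those assumptions. The exact patterns, the lowercasing, and stripping only `anthropic/` and `openai/` are inferred from the listed behaviour, not copied from `builder.py`.

```python
import re

# Hypothetical patterns for the families listed in the summary.
_VISION_PATTERNS = tuple(
    re.compile(p)
    for p in (
        r"claude-3",
        r"gpt-4o",
        r"gpt-4-vision",
        r"gemini-pro-vision",
        r"gemini-1\.5",
        r"gemini-2",
    )
)

# Only these provider prefixes are stripped, per the summary.
_PROVIDER_PREFIXES = ("anthropic/", "openai/")


def supports_vision(model_name: str) -> bool:
    """Return True if the model name belongs to a known vision-capable family."""
    name = model_name.strip().lower()
    for prefix in _PROVIDER_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    # Pattern.match anchors at the start of the prefix-stripped name.
    return any(pattern.match(name) for pattern in _VISION_PATTERNS)
```

Because only those two prefixes are stripped, a name such as `ollama/llama3` is matched as-is and returns False, consistent with the summary's examples. Matching from the start of the name avoids accidental hits on names that merely contain a family string. Plain `gpt-4` still returns False because no pattern is a prefix of it.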
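For `build_messages_with_media()`, the summary specifies two behaviours. The first is an OpenAI-style multipart user message: a `text` block followed by `image_url` blocks with `detail: "auto"`. The second is a text fallback for everything else. The sketch below applies those rules to an already-built message list. Its assumptions:

- `Attachment`, its `url` field, and `add_media_to_last_user_message` are hypothetical.
- The newline used to join fallback notes is a guess.
- Presigned URL generation and the memory-building step are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Optional


class MediaType(Enum):
    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class Attachment:
    """Hypothetical stand-in for MediaAttachment after URL resolution."""
    media_type: MediaType
    filename: str
    url: Optional[str]  # None models an attachment with no storage_key


def add_media_to_last_user_message(
    messages: List[Dict[str, Any]], attachments: List[Attachment], vision: bool
) -> List[Dict[str, Any]]:
    """Return a copy of `messages` whose final (user) message carries the media."""
    enriched = [dict(m) for m in messages]  # shallow copies; inputs stay untouched
    last = enriched[-1]
    notes: List[str] = []
    image_blocks: List[Dict[str, Any]] = []
    for att in attachments:
        if att.url is None:
            continue  # skipped, as the summary describes for a missing storage_key
        if att.media_type is MediaType.IMAGE and vision:
            image_blocks.append(
                {"type": "image_url", "image_url": {"url": att.url, "detail": "auto"}}
            )
        elif att.media_type is MediaType.IMAGE:
            notes.append(f"[Image attached: {att.filename}]")
        elif att.media_type is MediaType.DOCUMENT:
            notes.append(f"[Document attached: {att.filename} - {att.url}]")
        else:
            notes.append(f"[Audio/Video attached: {att.filename}]")
    text = "\n".join([last["content"], *notes])
    if image_blocks:
        last["content"] = [{"type": "text", "text": text}, *image_blocks]
    else:
        last["content"] = text
    return enriched
```

Working on copies mirrors the summary's "additive, not a replacement" decision. The memory-derived messages come back unchanged, and only the final user turn gains media.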
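Deviation 2 depends on how `unittest.mock.patch` interacts with imports done inside a function body. The sketch below reproduces both outcomes, using the standard-library `uuid` module as a stand-in for `boto3`. The names `make_object_id`, `media_module`, and `demo` are hypothetical.

```python
import types
from typing import Tuple
from unittest import mock


def make_object_id() -> str:
    # Local import, like `import boto3` inside the async functions in slack_media.py.
    import uuid
    return str(uuid.uuid4())


# Stand-in for the module under test: it exposes the function but, because the
# import is local, it has no module-level `uuid` attribute.
media_module = types.ModuleType("fake_media_module")
media_module.make_object_id = make_object_id


def demo() -> Tuple[bool, str]:
    # 1) Patching the module attribute fails, mirroring the AttributeError
    #    seen with patch("gateway.channels.slack_media.boto3").
    try:
        with mock.patch.object(media_module, "uuid"):
            pass
        attribute_patch_failed = False
    except AttributeError:
        attribute_patch_failed = True

    # 2) Patching the attribute on the real module works: the local import
    #    fetches `uuid` from sys.modules at call time and sees the mock.
    with mock.patch("uuid.uuid4", return_value="fixed-id"):
        patched_value = media_module.make_object_id()
    return attribute_patch_failed, patched_value
```

This works because the function body looks up `uuid.uuid4` on the real module at call time. It would not help if the code under test ran `from boto3 import client` at module load, because that binds the original function before the patch starts. In that case, the patch target would be the name in the importing module.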