docs(02-05): complete multimodal media support plan summary
- Add 02-05-SUMMARY.md with full task documentation and deviations
- Update STATE.md: advance to plan 5 of 5 in phase 02, add decisions
- Update ROADMAP.md: phase 2 now 5/5 plans complete (Complete status)
@@ -13,7 +13,7 @@ Konstruct ships in three coarse phases ordered by dependency: first build the se
 Decimal phases appear between their surrounding integers in numeric order.

 - [x] **Phase 1: Foundation** - Secure multi-tenant pipeline with Slack end-to-end and basic agent response (completed 2026-03-23)
-- [ ] **Phase 2: Agent Features** - Persistent memory, tool framework, WhatsApp integration, and human escalation
+- [x] **Phase 2: Agent Features** - Persistent memory, tool framework, WhatsApp integration, and human escalation (completed 2026-03-23)
 - [ ] **Phase 3: Operator Experience** - Admin portal, tenant onboarding, and Stripe billing

 ## Phase Details
@@ -79,7 +79,7 @@ Phases execute in numeric order: 1 → 2 → 3
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
 | 1. Foundation | 4/4 | Complete | 2026-03-23 |
-| 2. Agent Features | 4/5 | In Progress| |
+| 2. Agent Features | 5/5 | Complete | 2026-03-23 |
 | 3. Operator Experience | 0/2 | Not started | - |

 ---
@@ -3,8 +3,8 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: planning
-stopped_at: Completed 02-agent-features/02-02-PLAN.md
-last_updated: "2026-03-23T21:02:15.263Z"
+stopped_at: Completed 02-agent-features/02-05-PLAN.md
+last_updated: "2026-03-23T21:35:00.000Z"
 last_activity: 2026-03-23 — Roadmap created, ready for Phase 1 planning
 progress:
   total_phases: 3
@@ -25,12 +25,12 @@ See: .planning/PROJECT.md (updated 2026-03-22)

 ## Current Position

-Phase: 1 of 3 (Foundation)
-Plan: 0 of 3 in current phase
-Status: Ready to plan
-Last activity: 2026-03-23 — Roadmap created, ready for Phase 1 planning
+Phase: 2 of 3 (Agent Features)
+Plan: 5 of 5 in current phase
+Status: In progress
+Last activity: 2026-03-23 — Completed 02-05 multimodal media support and WhatsApp outbound routing

-Progress: [░░░░░░░░░░] 0%
+Progress: [████████░░] 78%

 ## Performance Metrics
@@ -58,6 +58,7 @@ Progress: [░░░░░░░░░░] 0%
 | Phase 02-agent-features P02-01 | 9m 22s | 2 tasks | 15 files |
 | Phase 02-agent-features P04 | 5m | 2 tasks | 7 files |
 | Phase 02-agent-features P02 | 12m 22s | 3 tasks | 19 files |
+| Phase 02-agent-features P05 | ~25m | 2 tasks | 6 files |

 ## Accumulated Context
@@ -95,6 +96,10 @@ Recent decisions affecting current work:
 - [Phase 02-agent-features]: CAST(:metadata AS jsonb) for asyncpg JSONB params — :: cast syntax fails with named params
 - [Phase 02-agent-features]: Migration 004 (not 003) for audit_events — 003_escalation_fields.py claimed revision 003 first
 - [Phase 02-agent-features]: AuditLogger uses raw INSERT text() — ORM model would allow accidental SQLAlchemy UPDATE/DELETE on audit rows
+- [Phase 02-agent-features]: boto3 added to gateway pyproject.toml explicitly — was used via local import in whatsapp.py but never declared, causing ModuleNotFoundError in tests
+- [Phase 02-agent-features]: boto3 patched at import site patch('boto3.client') not patch('module.boto3') — local imports inside async functions require patching the actual module, not the module attribute
+- [Phase 02-agent-features]: build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, all memory context preserved alongside image_url blocks
+- [Phase 02-agent-features]: AUDIO/VIDEO attachments text-referenced only in v1 — OpenAI image_url blocks support images only, not audio/video

 ### Pending Todos
@@ -106,6 +111,6 @@ None yet.

 ## Session Continuity

-Last session: 2026-03-23T21:02:15.260Z
-Stopped at: Completed 02-agent-features/02-02-PLAN.md
+Last session: 2026-03-23T21:35:00.000Z
+Stopped at: Completed 02-agent-features/02-05-PLAN.md
 Resume file: None
133
.planning/phases/02-agent-features/02-05-SUMMARY.md
Normal file
@@ -0,0 +1,133 @@
---
phase: 02-agent-features
plan: "05"
subsystem: orchestrator-media
tags: [multimodal, media, slack, whatsapp, minio, vision, llm]
dependency_graph:
  requires: ["02-02", "02-03"]
  provides: ["multimodal-llm-messages", "slack-media-extraction", "whatsapp-outbound-routing"]
  affects: ["orchestrator/tasks.py", "orchestrator/agents/builder.py", "gateway/channels/slack_media.py"]
tech_stack:
  added: ["boto3>=1.35.0 (gateway dependency)"]
  patterns:
    - "OpenAI multipart content format: {role: user, content: [{type: text}, {type: image_url}]}"
    - "Vision model detection via regex pattern matching on model name"
    - "Channel-aware outbound routing via _send_response() dispatcher"
    - "TDD: RED (failing tests) -> GREEN (passing implementation)"
key_files:
  created:
    - packages/gateway/gateway/channels/slack_media.py
    - tests/unit/test_slack_media.py
    - tests/unit/test_multimodal_messages.py
  modified:
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/gateway/pyproject.toml
key_decisions:
  - "boto3 added to gateway pyproject.toml — not previously installed despite whatsapp.py using it via local import; uv add --directory installs at workspace root level"
  - "boto3.client patched at import site in tests (patch('boto3.client')) — boto3 is a local import inside async functions, not a module-level attribute"
  - "build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, not a replacement, preserving all memory context"
  - "AUDIO/VIDEO attachments text-referenced only in v1 — image_url blocks are image-only in OpenAI/LiteLLM format"
  - "Non-vision model graceful degradation: IMAGE -> '[Image attached: filename]' appended to text content"
metrics:
  duration: "~25 minutes"
  completed_date: "2026-03-23"
  tasks_completed: 2
  files_changed: 6
---

# Phase 2 Plan 05: Cross-Channel Media Support and Multimodal LLM Interpretation Summary

Multimodal media pipeline wired end-to-end: Slack `file_share` uploads are downloaded and stored in MinIO, WhatsApp responses are routed through `send_whatsapp_message`, and LLM prompts are enriched with `image_url` content blocks for vision-capable models.

## What Was Built

### Task 1: Slack file_share extraction and channel-aware outbound routing (commit: 9dd7c48)

**New: `packages/gateway/gateway/channels/slack_media.py`**

Provides the complete Slack file_share pipeline:
- `is_file_share_event(event)` — detects `subtype == "file_share"`
- `media_type_from_mime(mime_type)` — maps MIME types to MediaType enum (IMAGE/DOCUMENT/AUDIO/VIDEO)
- `build_slack_storage_key(tenant_id, agent_id, message_id, filename)` — generates `{tenant_id}/{agent_id}/{message_id}/{filename}` storage key
- `build_attachment_from_slack_file(file_info, storage_key)` — creates MediaAttachment from Slack file metadata
- `download_and_store_slack_file(...)` — async: downloads via bot token, uploads to MinIO, returns (storage_key, presigned_url)
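
A minimal sketch of the three pure helpers. It assumes a string-valued `MediaType` enum (defined here as a hypothetical stand-in for the project's enum) and classifies by top-level MIME type. Falling back to DOCUMENT for unrecognised MIME types is an assumption; the summary does not specify it. The sample event uses the `name` and `mimetype` fields that Slack file objects carry.

```python
from enum import Enum


class MediaType(str, Enum):
    """Hypothetical stand-in for the project's MediaType enum."""

    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


def is_file_share_event(event: dict) -> bool:
    # Slack tags message events that carry uploaded files with subtype "file_share".
    return event.get("subtype") == "file_share"


def media_type_from_mime(mime_type: str) -> MediaType:
    # Classify on the top-level MIME type ("image/png" -> "image").
    # Falling back to DOCUMENT for anything else is an assumption of this sketch.
    top_level = (mime_type or "").split("/", 1)[0].lower()
    return {
        "image": MediaType.IMAGE,
        "audio": MediaType.AUDIO,
        "video": MediaType.VIDEO,
    }.get(top_level, MediaType.DOCUMENT)


def build_slack_storage_key(tenant_id: str, agent_id: str, message_id: str, filename: str) -> str:
    # Layout from the summary: {tenant_id}/{agent_id}/{message_id}/{filename}.
    return f"{tenant_id}/{agent_id}/{message_id}/{filename}"


# Usage with a trimmed-down file_share event (illustrative values).
event = {"type": "message", "subtype": "file_share", "files": [{"name": "chart.png", "mimetype": "image/png"}]}
if is_file_share_event(event):
    for file_info in event["files"]:
        kind = media_type_from_mime(file_info["mimetype"])
        key = build_slack_storage_key("tenant-1", "agent-1", "msg-1", file_info["name"])
        print(kind.value, key)
```

The tenant ID is the first path segment, so every object for a tenant sits under one prefix in the bucket.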

**Updated: `packages/orchestrator/orchestrator/tasks.py`**

Added `_send_response(channel, text, extras)` helper function for channel-aware outbound routing:
- `channel == "slack"` → calls `_update_slack_placeholder()` with `bot_token`, `channel_id`, `placeholder_ts` from extras
- `channel == "whatsapp"` → calls `send_whatsapp_message()` with `phone_number_id`, `bot_token` (access_token), `wa_id` from extras
- Unknown channels → logs warning and returns (no crash)

Added `send_whatsapp_message` import from `gateway.channels.whatsapp` at module top.

**Updated: `packages/gateway/pyproject.toml`**

Added `boto3>=1.35.0` as an explicit gateway dependency (was used via local import in whatsapp.py but never declared).

### Task 2: Multimodal LLM interpretation with image_url content blocks (commit: 669c0b5)

**Updated: `packages/orchestrator/orchestrator/agents/builder.py`**

Added three new functions:

1. `supports_vision(model_name: str) -> bool`
   - Strips provider prefixes (`anthropic/`, `openai/`) before matching
   - Returns True for: `claude-3*`, `gpt-4o*`, `gpt-4-vision*`, `gemini-pro-vision*`, `gemini-1.5*`, `gemini-2*`
   - Returns False for: `gpt-3.5-turbo`, `gpt-4`, `ollama/llama3`, etc.

2. `generate_presigned_url(storage_key: str, expiry: int = 3600) -> str`
   - Uses boto3 S3 client with MinIO endpoint from shared settings
   - Default 1-hour expiry
   - Returns presigned GET URL string

3. `build_messages_with_media(agent, current_message, media_attachments, recent_messages, relevant_context) -> list[dict]`
   - Extends `build_messages_with_memory()` with media injection
   - IMAGE + vision model → multipart content list with `image_url` blocks (`detail: "auto"`)
   - IMAGE + non-vision model → appends `[Image attached: filename]` text
   - DOCUMENT → appends `[Document attached: filename - presigned_url]` text (any model)
   - AUDIO/VIDEO → appends `[Audio/Video attached: filename]` text
   - Missing storage_key → skipped gracefully (logged at DEBUG)
   - Memory context (recent + relevant) fully preserved alongside media
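
A sketch of the vision check and the per-attachment rules for a single user turn. It makes four assumptions:

- Attachments are plain dicts with `media_type`, `filename`, and `storage_key` keys.
- A `presign` callable stands in for `generate_presigned_url()`, so the sketch runs without MinIO.
- Text annotations are joined with newlines.
- `build_user_content` is a hypothetical name for illustration.

The memory-context wrapping done by `build_messages_with_memory()` is left out.

```python
import logging
import re
from typing import Callable, Union

logger = logging.getLogger(__name__)

# Provider prefixes stripped before matching, and the vision-capable name patterns listed above.
_PROVIDER_PREFIX = re.compile(r"^(?:anthropic|openai)/")
_VISION_MODEL = re.compile(r"^(?:claude-3|gpt-4o|gpt-4-vision|gemini-pro-vision|gemini-1\.5|gemini-2)")


def supports_vision(model_name: str) -> bool:
    bare = _PROVIDER_PREFIX.sub("", model_name.strip().lower())
    return _VISION_MODEL.match(bare) is not None


def build_user_content(
    model_name: str,
    text: str,
    attachments: list[dict],
    presign: Callable[[str], str],
) -> Union[str, list[dict]]:
    """Hypothetical helper: build the `content` of one user message."""
    vision = supports_vision(model_name)
    notes = [text]
    image_blocks = []
    for att in attachments:
        key = att.get("storage_key")
        name = att.get("filename", "file")
        if not key:
            logger.debug("Skipping attachment without storage_key: %s", name)
            continue
        kind = att["media_type"]
        if kind == "image" and vision:
            image_blocks.append({"type": "image_url", "image_url": {"url": presign(key), "detail": "auto"}})
        elif kind == "image":
            notes.append(f"[Image attached: {name}]")
        elif kind == "document":
            notes.append(f"[Document attached: {name} - {presign(key)}]")
        else:  # audio or video: referenced by name only
            notes.append(f"[Audio/Video attached: {name}]")
    joined = "\n".join(notes)
    if not image_blocks:
        return joined  # plain string content when no images are inlined
    return [{"type": "text", "text": joined}, *image_blocks]
```

The pattern list is an allow-list. A model whose name matches none of the patterns gets only the text annotations until the list is extended.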

## Tests

| File | Tests | Areas covered |
|------|-------|---------------|
| `tests/unit/test_slack_media.py` | 23 | file_share detection, MIME mapping, storage key format, MinIO upload (mocked), bot token auth, channel-aware routing |
| `tests/unit/test_multimodal_messages.py` | 27 | supports_vision (14 models), presigned URL (expiry, key), build_messages_with_media (8 scenarios) |
| **Total new tests** | **50** | |

Full suite: 308 tests, all passing.

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 3 - Blocking] boto3 not installed despite plan assumption**
- **Found during:** Task 1 (GREEN phase) — test patching `boto3.client` failed with `ModuleNotFoundError`
- **Issue:** Plan stated "boto3 may already be installed from Plan 02-03" — it was used via local import in whatsapp.py but never declared as a dependency, so it wasn't installed in the venv
- **Fix:** Added `boto3>=1.35.0` to `packages/gateway/pyproject.toml` dependencies and ran `uv add boto3` to install it at the workspace root level
- **Files modified:** `packages/gateway/pyproject.toml`, root `pyproject.toml`, `uv.lock` (lockfile update)
- **Commit:** 9dd7c48

**2. [Rule 1 - Bug] boto3 test patch target mismatch**
- **Found during:** Task 1 (GREEN phase) — `patch("gateway.channels.slack_media.boto3")` failed with `AttributeError: does not have the attribute 'boto3'`
- **Issue:** boto3 is imported inside the async function (`import boto3` as a local import), not at module level, so `gateway.channels.slack_media` has no `boto3` attribute for `patch()` to replace.
- **Fix:** Changed the test patch target to `patch("boto3.client")`, which replaces `client` on the real `boto3` module. The local `import boto3` returns that same module object, so the function picks up the mock when it calls `boto3.client(...)`.
- **Files modified:** `tests/unit/test_slack_media.py`
- **Commit:** 9dd7c48
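
The patch-target fix can be reproduced without boto3. In this sketch, `json` stands in for `boto3`, and `fake_slack_media` is a hypothetical module object playing the role of `gateway.channels.slack_media`. Patching an attribute on the importing module fails with the same "does not have the attribute" error. Patching the attribute on the real module works, because the local import returns that same module object.

```python
import types
from unittest.mock import patch


def encode_payload(payload: dict) -> str:
    # Local import, mirroring the `import boto3` inside slack_media's async function.
    import json

    return json.dumps(payload)


# Hypothetical stand-in for gateway.channels.slack_media: it exposes the function,
# but it has no module-level attribute for the locally imported library.
fake_slack_media = types.ModuleType("fake_slack_media")
fake_slack_media.encode_payload = encode_payload


def patch_module_attribute() -> str:
    # Analogue of patch("gateway.channels.slack_media.boto3"): fails as the patch starts.
    try:
        with patch.object(fake_slack_media, "json"):
            return "patched"
    except AttributeError as exc:
        return f"AttributeError: {exc}"


def patch_import_site() -> str:
    # Analogue of patch("boto3.client"): replaces the attribute on the real module object,
    # which is what the local import hands back when encode_payload runs.
    with patch("json.dumps", return_value="<mocked>"):
        return fake_slack_media.encode_payload({"a": 1})


print(patch_module_attribute())
print(patch_import_site())
```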

## Self-Check: PASSED

Files created/verified:
- FOUND: packages/gateway/gateway/channels/slack_media.py
- FOUND: tests/unit/test_slack_media.py
- FOUND: tests/unit/test_multimodal_messages.py

Commits verified:
- FOUND: 9dd7c48 feat(02-05): Slack file_share extraction and channel-aware outbound routing
- FOUND: 669c0b5 feat(02-05): multimodal LLM interpretation with image_url content blocks

Test suite: 308/308 passed