konstruct/.planning/phases/02-agent-features/02-05-SUMMARY.md
Adolfo Delorenzo d921ed776a docs(02-05): complete multimodal media support plan summary
- Add 02-05-SUMMARY.md with full task documentation and deviations
- Update STATE.md: advance to plan 5 of 5 in phase 02, add decisions
- Update ROADMAP.md: phase 2 now 5/5 plans complete (Complete status)
2026-03-23 15:21:38 -06:00

---
phase: 02-agent-features
plan: "05"
subsystem: orchestrator-media
tags: [multimodal, media, slack, whatsapp, minio, vision, llm]
dependency_graph:
  requires: ["02-02", "02-03"]
  provides: ["multimodal-llm-messages", "slack-media-extraction", "whatsapp-outbound-routing"]
  affects: ["orchestrator/tasks.py", "orchestrator/agents/builder.py", "gateway/channels/slack_media.py"]
tech_stack:
  added: ["boto3>=1.35.0 (gateway dependency)"]
  patterns:
    - "OpenAI multipart content format: {role: user, content: [{type: text}, {type: image_url}]}"
    - "Vision model detection via regex pattern matching on model name"
    - "Channel-aware outbound routing via _send_response() dispatcher"
    - "TDD: RED (failing tests) -> GREEN (passing implementation)"
key_files:
  created:
    - packages/gateway/gateway/channels/slack_media.py
    - tests/unit/test_slack_media.py
    - tests/unit/test_multimodal_messages.py
  modified:
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/gateway/pyproject.toml
key_decisions:
- "boto3 added to gateway pyproject.toml — not previously installed despite whatsapp.py using it via local import; uv add --directory installs at workspace root level"
- "boto3.client patched at its source in tests (patch('boto3.client')), not as an attribute of the module under test: boto3 is a local import inside async functions, so the module has no boto3 attribute to patch"
- "build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, not a replacement, preserving all memory context"
- "AUDIO/VIDEO attachments text-referenced only in v1 — image_url blocks are image-only in OpenAI/LiteLLM format"
- "Non-vision model graceful degradation: IMAGE -> '[Image attached: filename]' appended to text content"
metrics:
  duration: "~25 minutes"
  completed_date: "2026-03-23"
  tasks_completed: 2
  files_changed: 6
---
# Phase 2 Plan 05: Cross-Channel Media Support and Multimodal LLM Interpretation Summary
Multimodal media pipeline wired end-to-end: Slack file_share events download to MinIO, WhatsApp responses routed via send_whatsapp_message, and LLM prompts enriched with image_url content blocks for vision-capable models.
## What Was Built
### Task 1: Slack file_share extraction and channel-aware outbound routing (commit: 9dd7c48)
**New: `packages/gateway/gateway/channels/slack_media.py`**
Provides the complete Slack file_share pipeline:
- `is_file_share_event(event)` — detects `subtype == "file_share"`
- `media_type_from_mime(mime_type)` — maps MIME types to MediaType enum (IMAGE/DOCUMENT/AUDIO/VIDEO)
- `build_slack_storage_key(tenant_id, agent_id, message_id, filename)` — generates `{tenant_id}/{agent_id}/{message_id}/{filename}` storage key
- `build_attachment_from_slack_file(file_info, storage_key)` — creates MediaAttachment from Slack file metadata
- `download_and_store_slack_file(...)` — async: downloads via bot token, uploads to MinIO, returns (storage_key, presigned_url)
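The pure helpers in this list can be sketched roughly as follows. This is a minimal illustration, not the code in `slack_media.py`: the `MediaType` member values and the fallback of unknown MIME types to `DOCUMENT` are assumptions, and only the storage key layout is taken directly from the description above.

```python
from enum import Enum


class MediaType(str, Enum):
    # Stand-in for the project's MediaType enum; the member values are assumed.
    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


def is_file_share_event(event: dict) -> bool:
    """Return True when a Slack message event carries uploaded files."""
    return event.get("subtype") == "file_share"


def media_type_from_mime(mime_type: str) -> MediaType:
    """Map a MIME type to a coarse media category using its top-level type."""
    top_level = mime_type.split("/", 1)[0].strip().lower()
    by_top_level = {
        "image": MediaType.IMAGE,
        "audio": MediaType.AUDIO,
        "video": MediaType.VIDEO,
    }
    # Assumption: everything else (application/pdf, text/plain, ...) is a document.
    return by_top_level.get(top_level, MediaType.DOCUMENT)


def build_slack_storage_key(
    tenant_id: str, agent_id: str, message_id: str, filename: str
) -> str:
    """Build the {tenant_id}/{agent_id}/{message_id}/{filename} object key."""
    return f"{tenant_id}/{agent_id}/{message_id}/{filename}"
```

Leading the key with the tenant and agent keeps each tenant's objects under a single prefix, which makes per-tenant listing or cleanup in MinIO straightforward.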
**Updated: `packages/orchestrator/orchestrator/tasks.py`**
Added `_send_response(channel, text, extras)` helper function for channel-aware outbound routing:
- `channel == "slack"` → calls `_update_slack_placeholder()` with `bot_token`, `channel_id`, `placeholder_ts` from extras
- `channel == "whatsapp"` → calls `send_whatsapp_message()` with `phone_number_id`, `bot_token` (access_token), `wa_id` from extras
- Unknown channels → logs warning and returns (no crash)

Added `send_whatsapp_message` import from `gateway.channels.whatsapp` at module top.
**Updated: `packages/gateway/pyproject.toml`**
Added `boto3>=1.35.0` as an explicit gateway dependency (was used via local import in whatsapp.py but never declared).
### Task 2: Multimodal LLM interpretation with image_url content blocks (commit: 669c0b5)
**Updated: `packages/orchestrator/orchestrator/agents/builder.py`**
Added three new functions:
1. `supports_vision(model_name: str) -> bool`
   - Strips provider prefixes (`anthropic/`, `openai/`) before matching
   - Returns True for: claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*
   - Returns False for: gpt-3.5-turbo, gpt-4, ollama/llama3, etc.
2. `generate_presigned_url(storage_key: str, expiry: int = 3600) -> str`
   - Uses boto3 S3 client with MinIO endpoint from shared settings
   - Default 1-hour expiry
   - Returns presigned GET URL string
3. `build_messages_with_media(agent, current_message, media_attachments, recent_messages, relevant_context) -> list[dict]`
   - Extends `build_messages_with_memory()` with media injection
   - IMAGE + vision model → multipart content list with `image_url` blocks (`detail: "auto"`)
   - IMAGE + non-vision model → appends `[Image attached: filename]` text
   - DOCUMENT → appends `[Document attached: filename - presigned_url]` text (any model)
   - AUDIO/VIDEO → appends `[Audio/Video attached: filename]` text
   - Missing storage_key → skipped gracefully (logged at DEBUG)
   - Memory context (recent + relevant) fully preserved alongside media
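A condensed sketch of the vision check and the media injection rules listed above. The regex is reconstructed from the model list and may differ from the one in `builder.py`; `build_user_content` and the attachment dict shape (`type`, `filename`, `url`) are hypothetical simplifications, with `url` standing in for a presigned URL derived from the storage key. Memory handling is left out.

```python
import re
from typing import Union

# Prefixes taken from the list above; the project's actual pattern is an assumption.
_VISION_PATTERN = re.compile(
    r"^(claude-3|gpt-4o|gpt-4-vision|gemini-pro-vision|gemini-1\.5|gemini-2)"
)


def supports_vision(model_name: str) -> bool:
    """Return True when the model name matches a known vision-capable family."""
    # Drop a provider prefix such as "anthropic/" or "openai/".
    bare = model_name.rsplit("/", 1)[-1].lower()
    return bool(_VISION_PATTERN.match(bare))


def build_user_content(
    user_text: str, attachments: list, model_name: str
) -> Union[str, list]:
    """Return plain text, or an OpenAI-style multipart list when images are inlined."""
    text = user_text
    image_blocks = []
    for att in attachments:
        url = att.get("url")
        if not url:
            # Mirrors "missing storage_key is skipped": nothing to reference.
            continue
        kind, name = att["type"], att["filename"]
        if kind == "image" and supports_vision(model_name):
            image_blocks.append(
                {"type": "image_url", "image_url": {"url": url, "detail": "auto"}}
            )
        elif kind == "image":
            text += f"\n[Image attached: {name}]"
        elif kind == "document":
            text += f"\n[Document attached: {name} - {url}]"
        else:
            # Audio and video are referenced by name only.
            text += f"\n[Audio/Video attached: {name}]"
    if not image_blocks:
        return text
    return [{"type": "text", "text": text}, *image_blocks]
```

Returning a plain string when no image block is produced keeps the message shape unchanged for models that never see multipart content.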
## Tests
| File | Tests | Coverage |
|------|-------|----------|
| `tests/unit/test_slack_media.py` | 23 | file_share detection, MIME mapping, storage key format, MinIO upload (mocked), bot token auth, channel-aware routing |
| `tests/unit/test_multimodal_messages.py` | 27 | supports_vision (14 models), presigned URL (expiry, key), build_messages_with_media (8 scenarios) |
| **Total new tests** | **50** | |

Full suite: 308 tests, all passing.
## Deviations from Plan
### Auto-fixed Issues
**1. [Rule 3 - Blocking] boto3 not installed despite plan assumption**
- **Found during:** Task 1 (GREEN phase) — test patching `boto3.client` failed with `ModuleNotFoundError`
- **Issue:** Plan stated "boto3 may already be installed from Plan 02-03" — it was used via local import in whatsapp.py but never declared as a dependency, so it wasn't installed in the venv
- **Fix:** Added `boto3>=1.35.0` to `packages/gateway/pyproject.toml` dependencies and ran `uv add boto3` to install at workspace root level
- **Files modified:** `packages/gateway/pyproject.toml`, `pyproject.toml` (workspace root), `uv.lock` (lockfile update)
- **Commit:** 9dd7c48

**2. [Rule 1 - Bug] boto3 test patch target mismatch**
- **Found during:** Task 1 (GREEN phase) — `patch("gateway.channels.slack_media.boto3")` failed with `AttributeError: does not have the attribute 'boto3'`
- **Issue:** boto3 is imported inside the async function (`import boto3` as a local import), not at module level. Patching the module attribute doesn't work for local imports.
- **Fix:** Changed test patch target to `patch("boto3.client")` which patches the actual boto3 module's client factory — works regardless of where the import happens
- **Files modified:** `tests/unit/test_slack_media.py`
- **Commit:** 9dd7c48
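The patch-target issue can be reproduced without boto3. The sketch below uses the standard-library `platform` module in place of boto3 and a synthetic module object in place of `gateway.channels.slack_media`; all names in it are illustrative.

```python
import types
from unittest.mock import patch


def detect_os() -> str:
    # Local import, like `import boto3` inside the async upload function:
    # the name is bound only inside this call, never on the enclosing module.
    import platform

    return platform.system()


# Stand-in for the module under test. It exposes detect_os but, because the
# import is local to the function, it has no `platform` attribute.
module_under_test = types.ModuleType("module_under_test")
module_under_test.detect_os = detect_os


def patch_module_attribute_fails() -> bool:
    """Patching the attribute on the module raises AttributeError."""
    try:
        with patch.object(module_under_test, "platform"):
            return False
    except AttributeError:
        return True


def patch_at_source_works() -> str:
    """Patching the function on the library itself reaches the local import."""
    with patch("platform.system", return_value="PatchedOS"):
        return detect_os()
```

The second approach works because the local `import` statement looks the library up in `sys.modules` at call time, so it sees whatever attribute is currently set on the library module, including the mock.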
## Self-Check: PASSED
Files created/verified:
- FOUND: packages/gateway/gateway/channels/slack_media.py
- FOUND: tests/unit/test_slack_media.py
- FOUND: tests/unit/test_multimodal_messages.py

Commits verified:
- FOUND: 9dd7c48 feat(02-05): Slack file_share extraction and channel-aware outbound routing
- FOUND: 669c0b5 feat(02-05): multimodal LLM interpretation with image_url content blocks

Test suite: 308/308 passed