| phase | plan | subsystem | tags | dependency_graph | tech_stack | key_files | key_decisions | metrics |
|---|---|---|---|---|---|---|---|---|
| 02-agent-features | 05 | orchestrator-media | | | | | | |
Phase 2 Plan 05: Cross-Channel Media Support and Multimodal LLM Interpretation Summary
Multimodal media pipeline wired end-to-end: Slack file_share events download to MinIO, WhatsApp responses routed via send_whatsapp_message, and LLM prompts enriched with image_url content blocks for vision-capable models.
What Was Built
Task 1: Slack file_share extraction and channel-aware outbound routing (commit: 9dd7c48)
New: packages/gateway/gateway/channels/slack_media.py
Provides the complete Slack file_share pipeline:
- `is_file_share_event(event)`: detects `subtype == "file_share"`
- `media_type_from_mime(mime_type)`: maps MIME types to the MediaType enum (IMAGE/DOCUMENT/AUDIO/VIDEO)
- `build_slack_storage_key(tenant_id, agent_id, message_id, filename)`: generates the `{tenant_id}/{agent_id}/{message_id}/{filename}` storage key
- `build_attachment_from_slack_file(file_info, storage_key)`: creates a MediaAttachment from Slack file metadata
- `download_and_store_slack_file(...)`: async; downloads via bot token, uploads to MinIO, returns (storage_key, presigned_url)
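The three pure helpers above can be sketched roughly as follows. This is a minimal illustration, not the code from `slack_media.py`: the `MediaType` values and the fallback to DOCUMENT for unrecognised MIME types are assumptions, since the summary only names the four enum members.

```python
from enum import Enum


class MediaType(str, Enum):
    # Assumed shape: the summary names the members but not their values.
    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


def is_file_share_event(event: dict) -> bool:
    """Return True when a Slack message event carries the file_share subtype."""
    return event.get("subtype") == "file_share"


def media_type_from_mime(mime_type: str) -> MediaType:
    """Map a MIME type to a MediaType using its top-level type."""
    major = (mime_type or "").split("/", 1)[0].lower()
    by_major = {
        "image": MediaType.IMAGE,
        "audio": MediaType.AUDIO,
        "video": MediaType.VIDEO,
    }
    # Assumption: anything else (application/pdf, text/plain, ...) is a document.
    return by_major.get(major, MediaType.DOCUMENT)


def build_slack_storage_key(
    tenant_id: str, agent_id: str, message_id: str, filename: str
) -> str:
    """Build the {tenant_id}/{agent_id}/{message_id}/{filename} object key."""
    return f"{tenant_id}/{agent_id}/{message_id}/{filename}"
```

Scoping the key by tenant, agent, and message keeps uploads from different tenants in separate prefixes and makes every object traceable to the message that produced it.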
Updated: packages/orchestrator/orchestrator/tasks.py
Added _send_response(channel, text, extras) helper function for channel-aware outbound routing:
- `channel == "slack"` → calls `_update_slack_placeholder()` with `bot_token`, `channel_id`, `placeholder_ts` from extras
- `channel == "whatsapp"` → calls `send_whatsapp_message()` with `phone_number_id`, `bot_token` (access_token), `wa_id` from extras
- Unknown channels → logs a warning and returns (no crash)
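The routing rule above amounts to a small dispatch on the channel name. The sketch below is hypothetical: it is synchronous, takes the two senders as injected callables so it runs without Slack or WhatsApp credentials, and the keyword names passed to the senders are assumptions rather than the real signatures of `_update_slack_placeholder()` or `send_whatsapp_message()`.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def send_response(
    channel: str,
    text: str,
    extras: dict,
    *,
    slack_sender: Callable[..., Any],
    whatsapp_sender: Callable[..., Any],
) -> bool:
    """Route an outbound reply to the channel it came from.

    Returns True if a sender was invoked, False for an unknown channel.
    """
    if channel == "slack":
        # Slack replies replace the placeholder message posted earlier.
        slack_sender(
            bot_token=extras["bot_token"],
            channel_id=extras["channel_id"],
            placeholder_ts=extras["placeholder_ts"],
            text=text,
        )
        return True
    if channel == "whatsapp":
        # For WhatsApp the bot_token slot in extras carries the access token.
        whatsapp_sender(
            phone_number_id=extras["phone_number_id"],
            access_token=extras["bot_token"],
            wa_id=extras["wa_id"],
            text=text,
        )
        return True
    # Unknown channel: log and return instead of raising.
    logger.warning("Unknown channel %r; response not sent", channel)
    return False
```

Returning instead of raising on an unknown channel means a misrouted message costs one log line rather than a failed task.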
Added send_whatsapp_message import from gateway.channels.whatsapp at module top.
Updated: packages/gateway/pyproject.toml
Added boto3>=1.35.0 as an explicit gateway dependency (was used via local import in whatsapp.py but never declared).
Task 2: Multimodal LLM interpretation with image_url content blocks (commit: 669c0b5)
Updated: packages/orchestrator/orchestrator/agents/builder.py
Added three new functions:
- `supports_vision(model_name: str) -> bool`
  - Strips provider prefixes (`anthropic/`, `openai/`) before matching
  - Returns True for: `claude-3*`, `gpt-4o*`, `gpt-4-vision*`, `gemini-pro-vision*`, `gemini-1.5*`, `gemini-2*`
  - Returns False for: `gpt-3.5-turbo`, `gpt-4`, `ollama/llama3`, etc.
- `generate_presigned_url(storage_key: str, expiry: int = 3600) -> str`
  - Uses a boto3 S3 client with the MinIO endpoint from shared settings
  - Default 1-hour expiry
  - Returns a presigned GET URL string
- `build_messages_with_media(agent, current_message, media_attachments, recent_messages, relevant_context) -> list[dict]`
  - Extends `build_messages_with_memory()` with media injection
  - IMAGE + vision model → multipart content list with `image_url` blocks (detail: "auto")
  - IMAGE + non-vision model → appends `[Image attached: filename]` text
  - DOCUMENT → appends `[Document attached: filename - presigned_url]` text (any model)
  - AUDIO/VIDEO → appends `[Audio/Video attached: filename]` text
  - Missing storage_key → skipped gracefully (logged at DEBUG)
  - Memory context (recent + relevant) fully preserved alongside media
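The vision check and the per-attachment rules above can be sketched as below. This is an illustration under assumptions, not the code in `builder.py`: `build_user_content` is a hypothetical helper, attachments are modelled as plain dicts with `media_type`, `filename`, `storage_key`, and a pre-generated `url`, and prefix matching is one plausible reading of the wildcard patterns in the list.

```python
from typing import Union

# Prefixes taken from the "Returns True for" list above.
VISION_PREFIXES = (
    "claude-3", "gpt-4o", "gpt-4-vision",
    "gemini-pro-vision", "gemini-1.5", "gemini-2",
)
PROVIDER_PREFIXES = ("anthropic/", "openai/")


def supports_vision(model_name: str) -> bool:
    """Return True if the model name matches a known vision-capable family."""
    name = model_name.lower()
    for prefix in PROVIDER_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name.startswith(VISION_PREFIXES)


def build_user_content(
    text: str, attachments: list, model_name: str
) -> Union[str, list]:
    """Build the user message content: plain text, or multipart with images."""
    image_blocks = []
    notes = []
    for att in attachments:
        if not att.get("storage_key"):
            continue  # nothing stored for this attachment, so skip it
        kind, filename, url = att["media_type"], att["filename"], att.get("url")
        if kind == "image" and supports_vision(model_name):
            image_blocks.append(
                {"type": "image_url", "image_url": {"url": url, "detail": "auto"}}
            )
        elif kind == "image":
            notes.append(f"[Image attached: {filename}]")
        elif kind == "document":
            notes.append(f"[Document attached: {filename} - {url}]")
        else:  # audio or video
            notes.append(f"[Audio/Video attached: {filename}]")
    full_text = "\n".join([text, *notes])
    if image_blocks:
        # Multipart content: one text block followed by the image blocks.
        return [{"type": "text", "text": full_text}, *image_blocks]
    return full_text
```

Falling back to a bracketed text note keeps non-vision models aware that a file was sent, even though they cannot inspect it.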
Tests
| File | Tests | Coverage |
|---|---|---|
| tests/unit/test_slack_media.py | 23 | file_share detection, MIME mapping, storage key format, MinIO upload (mocked), bot token auth, channel-aware routing |
| tests/unit/test_multimodal_messages.py | 27 | supports_vision (14 models), presigned URL (expiry, key), build_messages_with_media (8 scenarios) |
| Total new tests | 50 | |
Full suite: 308 tests, all passing.
Deviations from Plan
Auto-fixed Issues
1. [Rule 3 - Blocking] boto3 not installed despite plan assumption
- Found during: Task 1 (GREEN phase); test patching `boto3.client` failed with `ModuleNotFoundError`
- Issue: Plan stated "boto3 may already be installed from Plan 02-03". It was used via a local import in whatsapp.py but never declared as a dependency, so it wasn't installed in the venv
- Fix: Added `boto3>=1.35.0` to `packages/gateway/pyproject.toml` dependencies and ran `uv add boto3` to install at workspace root level
- Files modified: `packages/gateway/pyproject.toml`, `pyproject.toml` (lockfile update), `uv.lock`
- Commit: `9dd7c48`
2. [Rule 1 - Bug] boto3 test patch target mismatch
- Found during: Task 1 (GREEN phase); `patch("gateway.channels.slack_media.boto3")` failed with `AttributeError: does not have the attribute 'boto3'`
- Issue: boto3 is imported inside the async function (`import boto3` as a local import), not at module level. Patching the module attribute doesn't work for local imports.
- Fix: Changed the test patch target to `patch("boto3.client")`, which patches the actual boto3 module's client factory and works regardless of where the import happens
- Files modified: `tests/unit/test_slack_media.py`
- Commit: `9dd7c48`
Self-Check: PASSED
Files created/verified:
- FOUND: packages/gateway/gateway/channels/slack_media.py
- FOUND: tests/unit/test_slack_media.py
- FOUND: tests/unit/test_multimodal_messages.py
Commits verified:
- FOUND: `9dd7c48` feat(02-05): Slack file_share extraction and channel-aware outbound routing
- FOUND: `669c0b5` feat(02-05): multimodal LLM interpretation with image_url content blocks
Test suite: 308/308 passed