Files
Adolfo Delorenzo d921ed776a docs(02-05): complete multimodal media support plan summary
- Add 02-05-SUMMARY.md with full task documentation and deviations
- Update STATE.md: advance to plan 5 of 5 in phase 02, add decisions
- Update ROADMAP.md: phase 2 now 5/5 plans complete (Complete status)
2026-03-23 15:21:38 -06:00

7.1 KiB

phase, plan, subsystem, tags, dependency_graph, tech_stack, key_files, key_decisions, metrics
phase plan subsystem tags dependency_graph tech_stack key_files key_decisions metrics
02-agent-features 05 orchestrator-media
multimodal
media
slack
whatsapp
minio
vision
llm
requires provides affects
02-02
02-03
multimodal-llm-messages
slack-media-extraction
whatsapp-outbound-routing
orchestrator/tasks.py
orchestrator/agents/builder.py
gateway/channels/slack_media.py
added patterns
boto3>=1.35.0 (gateway dependency)
OpenAI multipart content format: {role: user, content: [{type: text}, {type: image_url}]}
Vision model detection via regex pattern matching on model name
Channel-aware outbound routing via _send_response() dispatcher
TDD: RED (failing tests) -> GREEN (passing implementation)
created modified
packages/gateway/gateway/channels/slack_media.py
tests/unit/test_slack_media.py
tests/unit/test_multimodal_messages.py
packages/orchestrator/orchestrator/tasks.py
packages/orchestrator/orchestrator/agents/builder.py
packages/gateway/pyproject.toml
boto3 added to gateway pyproject.toml — not previously installed despite whatsapp.py using it via local import; uv add --directory installs at workspace root level
boto3.client patched at import site in tests (patch('boto3.client')) — boto3 is a local import inside async functions, not a module-level attribute
build_messages_with_media() wraps build_messages_with_memory() — media enrichment is additive, not a replacement, preserving all memory context
AUDIO/VIDEO attachments text-referenced only in v1 — image_url blocks are image-only in OpenAI/LiteLLM format
Non-vision model graceful degradation: IMAGE -> '[Image attached: filename]' appended to text content
duration completed_date tasks_completed files_changed
~25 minutes 2026-03-23 2 6

Phase 2 Plan 05: Cross-Channel Media Support and Multimodal LLM Interpretation Summary

Multimodal media pipeline wired end-to-end: Slack file_share events download to MinIO, WhatsApp responses routed via send_whatsapp_message, and LLM prompts enriched with image_url content blocks for vision-capable models.

What Was Built

Task 1: Slack file_share extraction and channel-aware outbound routing (commit: 9dd7c48)

New: packages/gateway/gateway/channels/slack_media.py

Provides the complete Slack file_share pipeline:

  • is_file_share_event(event) — detects subtype == "file_share"
  • media_type_from_mime(mime_type) — maps MIME types to MediaType enum (IMAGE/DOCUMENT/AUDIO/VIDEO)
  • build_slack_storage_key(tenant_id, agent_id, message_id, filename) — generates {tenant_id}/{agent_id}/{message_id}/{filename} storage key
  • build_attachment_from_slack_file(file_info, storage_key) — creates MediaAttachment from Slack file metadata
  • download_and_store_slack_file(...) — async: downloads via bot token, uploads to MinIO, returns (storage_key, presigned_url)

Updated: packages/orchestrator/orchestrator/tasks.py

Added _send_response(channel, text, extras) helper function for channel-aware outbound routing:

  • channel == "slack" → calls _update_slack_placeholder() with bot_token, channel_id, placeholder_ts from extras
  • channel == "whatsapp" → calls send_whatsapp_message() with phone_number_id, bot_token (access_token), wa_id from extras
  • Unknown channels → logs warning and returns (no crash)

Added send_whatsapp_message import from gateway.channels.whatsapp at module top.

Updated: packages/gateway/pyproject.toml

Added boto3>=1.35.0 as an explicit gateway dependency (was used via local import in whatsapp.py but never declared).

Task 2: Multimodal LLM interpretation with image_url content blocks (commit: 669c0b5)

Updated: packages/orchestrator/orchestrator/agents/builder.py

Added three new functions:

  1. supports_vision(model_name: str) -> bool

    • Strips provider prefixes (anthropic/, openai/) before matching
    • Returns True for: claude-3*, gpt-4o*, gpt-4-vision*, gemini-pro-vision*, gemini-1.5*, gemini-2*
    • Returns False for: gpt-3.5-turbo, gpt-4, ollama/llama3, etc.
  2. generate_presigned_url(storage_key: str, expiry: int = 3600) -> str

    • Uses boto3 S3 client with MinIO endpoint from shared settings
    • Default 1-hour expiry
    • Returns presigned GET URL string
  3. build_messages_with_media(agent, current_message, media_attachments, recent_messages, relevant_context) -> list[dict]

    • Extends build_messages_with_memory() with media injection
    • IMAGE + vision model → multipart content list with image_url blocks (detail: "auto")
    • IMAGE + non-vision model → appends [Image attached: filename] text
    • DOCUMENT → appends [Document attached: filename - presigned_url] text (any model)
    • AUDIO/VIDEO → appends [Audio/Video attached: filename] text
    • Missing storage_key → skipped gracefully (logged at DEBUG)
    • Memory context (recent + relevant) fully preserved alongside media

Tests

File Tests Coverage
tests/unit/test_slack_media.py 23 file_share detection, MIME mapping, storage key format, MinIO upload (mocked), bot token auth, channel-aware routing
tests/unit/test_multimodal_messages.py 27 supports_vision (14 models), presigned URL (expiry, key), build_messages_with_media (8 scenarios)
Total new tests 50

Full suite: 308 tests, all passing.

Deviations from Plan

Auto-fixed Issues

1. [Rule 3 - Blocking] boto3 not installed despite plan assumption

  • Found during: Task 1 (GREEN phase) — test patching boto3.client failed with ModuleNotFoundError
  • Issue: Plan stated "boto3 may already be installed from Plan 02-03" — it was used via local import in whatsapp.py but never declared as a dependency, so it wasn't installed in the venv
  • Fix: Added boto3>=1.35.0 to packages/gateway/pyproject.toml dependencies and ran uv add boto3 to install at workspace root level
  • Files modified: packages/gateway/pyproject.toml, pyproject.toml (lockfile update), uv.lock
  • Commit: 9dd7c48

2. [Rule 1 - Bug] boto3 test patch target mismatch

  • Found during: Task 1 (GREEN phase) — patch("gateway.channels.slack_media.boto3") failed with AttributeError: does not have the attribute 'boto3'
  • Issue: boto3 is imported inside the async function (import boto3 as a local import), not at module level. Patching the module attribute doesn't work for local imports.
  • Fix: Changed test patch target to patch("boto3.client") which patches the actual boto3 module's client factory — works regardless of where the import happens
  • Files modified: tests/unit/test_slack_media.py
  • Commit: 9dd7c48

Self-Check: PASSED

Files created/verified:

  • FOUND: packages/gateway/gateway/channels/slack_media.py
  • FOUND: tests/unit/test_slack_media.py
  • FOUND: tests/unit/test_multimodal_messages.py

Commits verified:

  • FOUND: 9dd7c48 feat(02-05): Slack file_share extraction and channel-aware outbound routing
  • FOUND: 669c0b5 feat(02-05): multimodal LLM interpretation with image_url content blocks

Test suite: 308/308 passed