---
phase: 01-foundation
plan: 02
subsystem: llm
tags: [litellm, celery, redis, ollama, anthropic, openai, fastapi, httpx, pytest]
requires:
  - phase: 01-foundation
    plan: 01
    provides: Shared models (KonstructMessage, Agent), shared config (settings), shared db (get_session, engine), shared rls (configure_rls_hook, current_tenant_id)
provides:
  - LLM Backend Pool FastAPI service (port 8004) with /complete and /health endpoints
  - LiteLLM Router with fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups
  - "Automatic fallback chain: quality providers -> fast group"
  - Celery app with Redis broker/backend (orchestrator.main)
  - handle_message Celery task (sync def, asyncio.run pattern)
  - "System prompt builder: assembles system_prompt + identity + persona + AI transparency clause"
  - "Agent runner: httpx POST to llm-pool /complete with polite fallback on error"
  - "19 integration tests: 7 fallback routing tests (LLM-01), 12 provider config tests (LLM-02)"
affects:
  - 01-foundation plan 03 (Channel Gateway — dispatches handle_message tasks to Celery)
  - All future orchestrator plans (must maintain sync-def Celery task pattern)
  - Phase 2 memory and tool plans (extend _process_message pipeline)
tech-stack:
  added:
    - litellm==1.82.5 (pinned — September 2025 OOM regression in later versions)
    - celery[redis]>=5.4.0
    - fastapi[standard] (added to llm-pool package)
  patterns:
    - Celery sync-def + asyncio.run() pattern for async work in tasks
    - LiteLLM Router model groups (fast/quality) as abstraction over provider selection
    - httpx.AsyncClient for service-to-service calls (orchestrator -> llm-pool)
    - ContextVar (current_tenant_id) for RLS scope — set/reset around DB block
key-files:
  created:
    - packages/llm-pool/llm_pool/router.py
    - packages/llm-pool/llm_pool/main.py
    - packages/llm-pool/llm_pool/__init__.py
    - packages/llm-pool/llm_pool/providers/__init__.py
    - packages/orchestrator/orchestrator/main.py
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/orchestrator/orchestrator/agents/runner.py
    - packages/orchestrator/orchestrator/__init__.py
    - packages/orchestrator/orchestrator/agents/__init__.py
    - tests/integration/test_llm_fallback.py
    - tests/integration/test_llm_providers.py
  modified:
    - packages/llm-pool/pyproject.toml
    - packages/orchestrator/pyproject.toml
    - docker-compose.yml
key-decisions:
  - "LiteLLM pinned to ==1.82.5, not latest — September 2025 OOM regression in later versions; do not upgrade without testing"
  - "llm-pool runs on port 8004, consistent with shared/config.py llm_pool_url default (plan originally stated 8002 but shared config established 8004 in Plan 01)"
  - "Celery tasks are always sync def with asyncio.run() — this is a hard architectural constraint, never async def"
  - "AI transparency clause is unconditional in system prompt — agents must always disclose AI identity when directly asked"
  - "LiteLLM Router fallback: quality -> fast (not quality -> 503) gives graceful degradation to local inference"
patterns-established:
  - "Celery sync-def pattern: All @app.task functions must be def (not async def). Use asyncio.run() for async sub-calls."
  - "LLM pool abstraction: callers use model group names ('quality', 'fast'), not provider-specific model IDs"
  - "Runner fallback: non-200 from llm-pool returns polite fallback string, never raises to caller"
  - "RLS context: configure_rls_hook(engine) once, set current_tenant_id ContextVar around DB operations, always reset in finally block"
requirements-completed: [LLM-01, LLM-02]
duration: 6min
completed: 2026-03-23
---

Phase 1 Plan 2: LLM Backend Pool + Celery Orchestrator Summary

LiteLLM Router service (port 8004) with Ollama/Anthropic/OpenAI fallback chain and Celery handle_message task using sync-def + asyncio.run pattern, verified by 19 integration tests

Performance

  • Duration: 6 min
  • Started: 2026-03-23T16:01:17Z
  • Completed: 2026-03-23T16:07:10Z
  • Tasks: 2
  • Files changed: 15 (12 created, 3 modified)

Accomplishments

  • LLM Backend Pool FastAPI service with LiteLLM Router: fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups, automatic cross-group fallback, HTTP 503 on total exhaustion
  • Celery orchestrator skeleton: handle_message task (sync def), system prompt builder (name + role + persona + AI transparency clause), runner (httpx to llm-pool with polite fallback)
  • 19 green integration tests covering fallback routing (LLM-01) and provider configuration (LLM-02)
  • Docker Compose updated with llm-pool (port 8004, healthcheck) and celery-worker services

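The model-group layout and cross-group fallback described above can be sketched as plain configuration data. This is a sketch under assumptions: the exact model IDs, api_base, and Router wiring are illustrative, not the actual contents of router.py.

```python
# Sketch of the fast/quality model groups. Model IDs and the Ollama
# api_base are assumptions for illustration, not the real router.py.
MODEL_LIST = [
    {"model_name": "fast",
     "litellm_params": {"model": "ollama/qwen3:8b",
                        "api_base": "http://localhost:11434"}},
    {"model_name": "quality",
     "litellm_params": {"model": "anthropic/claude-sonnet-4"}},
    {"model_name": "quality",
     "litellm_params": {"model": "openai/gpt-4o"}},
]

# Cross-group fallback: when every "quality" deployment fails, degrade
# to the local "fast" group instead of surfacing an error immediately.
FALLBACKS = [{"quality": ["fast"]}]

# The Router itself would then be built roughly as:
#   from litellm import Router
#   llm_router = Router(model_list=MODEL_LIST, fallbacks=FALLBACKS,
#                       routing_strategy="latency-based-routing")
```

Callers only ever name a group ("fast" or "quality"); which concrete provider answers is the Router's decision.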
Task Commits

Each task was committed atomically:

  1. Task 1: LLM Backend Pool service with LiteLLM Router and fallback - ee2f88e (feat)
  2. Task 2: Celery orchestrator with system prompt builder and integration tests - 8257c55 (feat)

Plan metadata: (docs commit follows self-check)

Files Created/Modified

  • packages/llm-pool/llm_pool/router.py — LiteLLM Router: 3-entry model_list, fallbacks, latency routing, complete() async function
  • packages/llm-pool/llm_pool/main.py — FastAPI app on port 8004: POST /complete, GET /health, 503 error handling
  • packages/llm-pool/llm_pool/__init__.py — Package exports (complete, llm_router)
  • packages/llm-pool/llm_pool/providers/__init__.py — Empty placeholder for future provider customization
  • packages/llm-pool/pyproject.toml — Pinned litellm==1.82.5, added fastapi[standard]
  • packages/orchestrator/orchestrator/main.py — Celery app: Redis broker/backend, task discovery, task_acks_late=True
  • packages/orchestrator/orchestrator/tasks.py — handle_message (sync def!), _process_message (async), RLS context setup
  • packages/orchestrator/orchestrator/agents/builder.py — build_system_prompt + build_messages with AI transparency clause
  • packages/orchestrator/orchestrator/agents/runner.py — run_agent: httpx POST to llm-pool, 120s timeout, polite fallback on error
  • packages/orchestrator/pyproject.toml — Added celery[redis]>=5.4.0
  • docker-compose.yml — Added llm-pool and celery-worker services
  • tests/integration/test_llm_fallback.py — 7 tests: success paths, 503 on total failure, metadata forwarding (LLM-01)
  • tests/integration/test_llm_providers.py — 12 tests: model_list structure, provider routing, fallback config (LLM-02)

Decisions Made

  • LiteLLM pinned to ==1.82.5: Explicitly not latest — a September 2025 OOM regression exists in later releases. Warning comment added to pyproject.toml and router.py.
  • Port 8004, not 8002: The plan stated port 8002, but Plan 01's shared/config.py already defined llm_pool_url = "http://localhost:8004". Used 8004 to maintain config consistency.
  • AI transparency clause is unconditional: Added without configuration option — per product design, agents must never deny being AIs when directly asked.
  • Celery sync-def is a hard rule: Enforced with a prominent comment block in tasks.py; _process_message is a private async function called only via asyncio.run().
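The sync-def rule can be illustrated with a minimal sketch. The @app.task decorator and the real task body are elided so the block stays self-contained; only the wrapper shape is the point.

```python
import asyncio

async def _do_async_work(message: dict) -> dict:
    # Placeholder for the real async pipeline (LLM call, DB writes, ...).
    return {"message_id": message["message_id"], "response": "ok",
            "tenant_id": message["tenant_id"]}

# In tasks.py this carries @app.task; shown bare here.
# The hard rule: plain `def`, never `async def`.
def handle_message(message: dict) -> dict:
    # asyncio.run() gives each task invocation its own short-lived event
    # loop, so the prefork Celery worker never has to manage one itself.
    return asyncio.run(_do_async_work(message))
```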

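The unconditional transparency clause can be sketched in a minimal prompt builder. Field names and the clause wording here are assumptions, not the actual builder.py API.

```python
# Hypothetical wording; the real clause text lives in builder.py.
AI_TRANSPARENCY_CLAUSE = (
    "If asked directly whether you are an AI, you must truthfully "
    "disclose that you are."
)

def build_system_prompt(agent: dict) -> str:
    parts = [
        agent.get("system_prompt", ""),
        f"Your name is {agent['name']}. Your role: {agent['role']}.",
        agent.get("persona", ""),
        AI_TRANSPARENCY_CLAUSE,  # always appended; no config flag removes it
    ]
    # Drop empty sections but never the transparency clause.
    return "\n\n".join(p for p in parts if p)
```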
Deviations from Plan

Auto-fixed Issues

1. [Rule 1 - Bug] Fixed test for LiteLLM fallback behavior at Router boundary

  • Found during: Task 2 (integration test verification)
  • Issue: Initial test test_fallback_invoked_when_primary_raises mocked Router.acompletion to raise on first call then succeed on second, expecting a 200. But our code's exception handler catches the first raise and immediately returns 503 — LiteLLM's internal retry/fallback happens inside acompletion, not across multiple calls to it from our code.
  • Fix: Renamed test to test_fallback_succeeds_when_router_returns_response and updated to correctly test the boundary: if acompletion succeeds (router resolved fallback internally), endpoint returns 200; if it raises (all exhausted), returns 503.
  • Files modified: tests/integration/test_llm_fallback.py
  • Verification: All 19 tests pass
  • Committed in: 8257c55 (Task 2 commit)
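The corrected boundary can be sketched with a simplified stand-in for the /complete handler (names and the error type are illustrative, not the real main.py):

```python
import asyncio

async def complete_endpoint(acompletion):
    # Mirrors the real handler's shape: exactly one call into the Router.
    # LiteLLM's retries and fallbacks happen *inside* acompletion; if it
    # raises, every deployment (including the fast group) is exhausted.
    try:
        return 200, await acompletion()
    except Exception:
        return 503, None

async def _router_resolved_fallback():
    return {"choices": [{"message": {"content": "hi"}}]}

async def _all_exhausted():
    raise RuntimeError("all providers failed")

status_ok, _ = asyncio.run(complete_endpoint(_router_resolved_fallback))
status_fail, _ = asyncio.run(complete_endpoint(_all_exhausted))
```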

Total deviations: 1 auto-fixed (Rule 1 — incorrect test assumption). Impact on plan: test correction only — no production code changed; the 19 tests accurately verify the specified behavior.

Issues Encountered

  • python binary not in PATH for uv project — all test/import verification commands use uv run python and uv run pytest

User Setup Required

None - no external service configuration required. LLM API keys are read from .env at runtime; empty strings are the safe default for local development with Ollama only.

Next Phase Readiness

  • LLM pool and Celery orchestrator are ready for the Channel Gateway (Plan 03) to dispatch handle_message tasks
  • Docker Compose llm-pool and celery-worker services defined (not yet built/tested in container — deferred to integration phase)
  • handle_message task interface is stable: accepts KonstructMessage.model_dump(), returns {message_id, response, tenant_id}
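The payload/result contract above can be sketched with a plain dataclass standing in for KonstructMessage (only message_id and tenant_id are guaranteed by this summary; the text field is an assumption):

```python
from dataclasses import dataclass, asdict

# Minimal stand-in for shared.models.KonstructMessage from Plan 01.
@dataclass
class KonstructMessageStub:
    message_id: str
    tenant_id: str
    text: str

def handle_message(payload: dict) -> dict:
    # Contract: accepts a model_dump()-style dict and returns exactly
    # these three keys.
    return {"message_id": payload["message_id"],
            "response": "(agent reply)",
            "tenant_id": payload["tenant_id"]}

# The gateway (Plan 03) would dispatch the Celery task roughly as:
#   handle_message.delay(message.model_dump())
result = handle_message(asdict(KonstructMessageStub("m-1", "t-1", "hello")))
```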

Self-Check: PASSED

All key files verified present. Both task commits (ee2f88e, 8257c55) verified in git log. 19 integration tests pass. SUMMARY.md created. STATE.md, ROADMAP.md, REQUIREMENTS.md updated.


Phase: 01-foundation Completed: 2026-03-23