---
phase: 01-foundation
plan: 02
subsystem: llm
tags: [litellm, celery, redis, ollama, anthropic, openai, fastapi, httpx, pytest]

# Dependency graph
requires:
  - phase: 01-foundation plan 01
    provides: "Shared models (KonstructMessage, Agent), shared config (settings), shared db (get_session, engine), shared rls (configure_rls_hook, current_tenant_id)"
provides:
  - "LLM Backend Pool FastAPI service (port 8004) with /complete and /health endpoints"
  - "LiteLLM Router with fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups"
  - "Automatic fallback chain: quality providers -> fast group"
  - "Celery app with Redis broker/backend (orchestrator.main)"
  - "handle_message Celery task (sync def, asyncio.run pattern)"
  - "System prompt builder: assembles system_prompt + identity + persona + AI transparency clause"
  - "Agent runner: httpx POST to llm-pool /complete with polite fallback on error"
  - "19 integration tests: 7 fallback routing tests (LLM-01), 12 provider config tests (LLM-02)"
affects:
  - "01-foundation plan 03 (Channel Gateway — dispatches handle_message tasks to Celery)"
  - "All future orchestrator plans (must maintain sync-def Celery task pattern)"
  - "Phase 2 memory and tool plans (extend _process_message pipeline)"

# Tech tracking
tech-stack:
  added:
    - "litellm==1.82.5 (pinned — September 2025 OOM regression in later versions)"
    - "celery[redis]>=5.4.0"
    - "fastapi[standard] (added to llm-pool package)"
  patterns:
    - "Celery sync-def + asyncio.run() pattern for async work in tasks"
    - "LiteLLM Router model groups (fast/quality) as abstraction over provider selection"
    - "httpx.AsyncClient for service-to-service calls (orchestrator -> llm-pool)"
    - "ContextVar (current_tenant_id) for RLS scope — set/reset around DB block"
key-files:
  created:
    - packages/llm-pool/llm_pool/router.py
    - packages/llm-pool/llm_pool/main.py
    - packages/llm-pool/llm_pool/__init__.py
    - packages/llm-pool/llm_pool/providers/__init__.py
    - packages/orchestrator/orchestrator/main.py
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/orchestrator/orchestrator/agents/runner.py
    - packages/orchestrator/orchestrator/__init__.py
    - packages/orchestrator/orchestrator/agents/__init__.py
    - tests/integration/test_llm_fallback.py
    - tests/integration/test_llm_providers.py
  modified:
    - packages/llm-pool/pyproject.toml
    - packages/orchestrator/pyproject.toml
    - docker-compose.yml
key-decisions:
  - "LiteLLM pinned to ==1.82.5, not latest — September 2025 OOM regression in later versions; do not upgrade without testing"
  - "llm-pool runs on port 8004, consistent with shared/config.py llm_pool_url default (plan originally stated 8002 but shared config established 8004 in Plan 01)"
  - "Celery tasks are always sync def with asyncio.run() — this is a hard architectural constraint, never async def"
  - "AI transparency clause is unconditional in system prompt — agents must always disclose AI identity when directly asked"
  - "LiteLLM Router fallback: quality -> fast (not quality -> 503) gives graceful degradation to local inference"
patterns-established:
  - "Celery sync-def pattern: All @app.task functions must be def (not async def). Use asyncio.run() for async sub-calls."
  - "LLM pool abstraction: callers use model group names ('quality', 'fast') not provider-specific model IDs"
  - "Runner fallback: non-200 from llm-pool returns polite fallback string, never raises to caller"
  - "RLS context: configure_rls_hook(engine) once, set current_tenant_id ContextVar around DB operations, always reset in finally block"
requirements-completed: [LLM-01, LLM-02]

# Metrics
duration: 6min
completed: 2026-03-23
---

# Phase 1 Plan 2: LLM Backend Pool + Celery Orchestrator Summary

**LiteLLM Router service (port 8004) with Ollama/Anthropic/OpenAI fallback chain and Celery handle_message task using sync-def + asyncio.run pattern, verified by 19 integration tests**

## Performance

- **Duration:** 6 min
- **Started:** 2026-03-23T16:01:17Z
- **Completed:** 2026-03-23T16:07:10Z
- **Tasks:** 2
- **Files modified:** 15 (12 created, 3 modified)

## Accomplishments

- LLM Backend Pool FastAPI service with LiteLLM Router: fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups, automatic cross-group fallback, HTTP 503 on total exhaustion
- Celery orchestrator skeleton: handle_message task (sync def), system prompt builder (name + role + persona + AI transparency clause), runner (httpx to llm-pool with polite fallback)
- 19 green integration tests covering fallback routing (LLM-01) and provider configuration (LLM-02)
- Docker Compose updated with llm-pool (port 8004, healthcheck) and celery-worker services

## Task Commits

Each task was committed atomically:

1. **Task 1: LLM Backend Pool service with LiteLLM Router and fallback** - `ee2f88e` (feat)
2. **Task 2: Celery orchestrator with system prompt builder and integration tests** - `8257c55` (feat)

**Plan metadata:** _(docs commit follows self-check)_

## Files Created/Modified

- `packages/llm-pool/llm_pool/router.py` — LiteLLM Router: 3-entry model_list, fallbacks, latency routing, `complete()` async function
- `packages/llm-pool/llm_pool/main.py` — FastAPI app on port 8004: POST /complete, GET /health, 503 error handling
- `packages/llm-pool/llm_pool/__init__.py` — Package exports (complete, llm_router)
- `packages/llm-pool/llm_pool/providers/__init__.py` — Empty placeholder for future provider customization
- `packages/llm-pool/pyproject.toml` — Pinned litellm==1.82.5, added fastapi[standard]
- `packages/orchestrator/orchestrator/main.py` — Celery app: Redis broker/backend, task discovery, task_acks_late=True
- `packages/orchestrator/orchestrator/tasks.py` — handle_message (sync def!), _process_message (async), RLS context setup
- `packages/orchestrator/orchestrator/agents/builder.py` — build_system_prompt + build_messages with AI transparency clause
- `packages/orchestrator/orchestrator/agents/runner.py` — run_agent: httpx POST to llm-pool, 120s timeout, polite fallback on error
- `packages/orchestrator/pyproject.toml` — Added celery[redis]>=5.4.0
- `docker-compose.yml` — Added llm-pool and celery-worker services
- `tests/integration/test_llm_fallback.py` — 7 tests: success paths, 503 on total failure, metadata forwarding (LLM-01)
- `tests/integration/test_llm_providers.py` — 12 tests: model_list structure, provider routing, fallback config (LLM-02)

## Decisions Made

- **LiteLLM pinned to ==1.82.5**: Explicitly not latest — a September 2025 OOM regression exists in later releases. Warning comments added to pyproject.toml and router.py.
- **Port 8004, not 8002**: The plan stated port 8002, but Plan 01's shared/config.py already defined `llm_pool_url = "http://localhost:8004"`. Used 8004 to maintain config consistency.
- **AI transparency clause is unconditional**: Added without a configuration option — per product design, agents must never deny being AIs when directly asked.
- **Celery sync-def is a hard rule**: Enforced with a prominent comment block in tasks.py; `_process_message` is a private async function called only via `asyncio.run()`.

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed test for LiteLLM fallback behavior at Router boundary**

- **Found during:** Task 2 (integration test verification)
- **Issue:** The initial test `test_fallback_invoked_when_primary_raises` mocked `Router.acompletion` to raise on the first call and succeed on the second, expecting a 200. But our code's exception handler catches the first raise and immediately returns 503 — LiteLLM's internal retry/fallback happens *inside* `acompletion`, not across multiple calls to it from our code.
- **Fix:** Renamed the test to `test_fallback_succeeds_when_router_returns_response` and updated it to exercise the boundary correctly: if `acompletion` succeeds (the router resolved fallback internally), the endpoint returns 200; if it raises (all providers exhausted), it returns 503.
- **Files modified:** `tests/integration/test_llm_fallback.py`
- **Verification:** All 19 tests pass
- **Committed in:** `8257c55` (Task 2 commit)

---

**Total deviations:** 1 auto-fixed (Rule 1 — incorrect test assumption)

**Impact on plan:** Test correction only — no production code changed. 19 tests accurately verify the specified behavior.

## Issues Encountered

- `python` binary not in PATH for the uv project — all test/import verification commands use `uv run python` and `uv run pytest`

## User Setup Required

None — no external service configuration is required. LLM API keys are read from `.env` at runtime; empty strings are a safe default for local development with Ollama only.
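For reference, the fast/quality model-group setup described above can be expressed as a LiteLLM `model_list` plus `fallbacks` configuration. This is a minimal sketch, not the contents of `router.py`: the dict shapes follow LiteLLM's Router conventions, but the `api_base` and credential handling here are illustrative assumptions.

```python
# Sketch of the Router configuration shape (illustrative values, not router.py).
# Two entries sharing the model_name "quality" form one load-balanced group.
model_list = [
    {
        "model_name": "fast",  # local inference group
        "litellm_params": {
            "model": "ollama/qwen3:8b",
            "api_base": "http://localhost:11434",  # assumed default Ollama port
        },
    },
    {
        "model_name": "quality",
        "litellm_params": {"model": "anthropic/claude-sonnet-4"},
    },
    {
        "model_name": "quality",  # same group name -> second quality provider
        "litellm_params": {"model": "openai/gpt-4o"},
    },
]

# Fallback chain: when the quality group is exhausted, retry on the fast group.
fallbacks = [{"quality": ["fast"]}]

# Usage (inside llm-pool, sketched):
#   router = litellm.Router(model_list=model_list, fallbacks=fallbacks)
#   response = await router.acompletion(model="quality", messages=[...])
```

Callers only ever name a group ("quality" or "fast"); provider selection and the cross-group fallback stay inside the Router, which is why the `/complete` endpoint can treat any exception from `acompletion` as total exhaustion (503).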
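The sync-def Celery pattern and the RLS ContextVar discipline established above can be sketched together. This is a stdlib-only sketch under stated assumptions: `current_tenant_id` stands in for the shared ContextVar from Plan 01, the pipeline body is a placeholder, and the real `handle_message` in tasks.py is additionally decorated with `@app.task`.

```python
import asyncio
from contextvars import ContextVar

# Stand-in for shared.rls.current_tenant_id (Plan 01).
current_tenant_id = ContextVar("current_tenant_id", default=None)

async def _process_message(payload: dict) -> dict:
    # RLS scope: set the tenant ContextVar before any DB work,
    # and always reset it in a finally block.
    token = current_tenant_id.set(payload["tenant_id"])
    try:
        # ... await DB reads, build_messages(), POST to llm-pool ...
        response = f"echo: {payload['text']}"  # placeholder for the real pipeline
        return {
            "message_id": payload["message_id"],
            "response": response,
            "tenant_id": payload["tenant_id"],
        }
    finally:
        current_tenant_id.reset(token)

# In tasks.py this is decorated with @app.task. Celery tasks are always
# plain `def`, never `async def`; async work goes through asyncio.run().
def handle_message(payload: dict) -> dict:
    return asyncio.run(_process_message(payload))
```

The task accepts a `KonstructMessage.model_dump()` payload and returns the `{message_id, response, tenant_id}` dict, matching the interface the Channel Gateway (Plan 03) will dispatch against.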
## Next Phase Readiness

- LLM pool and Celery orchestrator are ready for the Channel Gateway (Plan 03) to dispatch `handle_message` tasks
- Docker Compose llm-pool and celery-worker services are defined (not yet built/tested in a container — deferred to the integration phase)
- `handle_message` task interface is stable: accepts `KonstructMessage.model_dump()`, returns `{message_id, response, tenant_id}`

---

*Phase: 01-foundation*
*Completed: 2026-03-23*