---
phase: 01-foundation
plan: 02
subsystem: llm
tags: [litellm, celery, redis, ollama, anthropic, openai, fastapi, httpx, pytest]
requires:
  - phase: 01-foundation
    plan: 01
    provides: Shared models (KonstructMessage, Agent), shared config (settings), shared db (get_session, engine), shared rls (configure_rls_hook, current_tenant_id)
provides:
  - LLM Backend Pool FastAPI service (port 8004) with /complete and /health endpoints
  - LiteLLM Router with fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups
  - "Automatic fallback chain: quality providers -> fast group"
  - Celery app with Redis broker/backend (orchestrator.main)
  - handle_message Celery task (sync def, asyncio.run pattern)
  - "System prompt builder: assembles system_prompt + identity + persona + AI transparency clause"
  - "Agent runner: httpx POST to llm-pool /complete with polite fallback on error"
  - "19 integration tests: 7 fallback routing tests (LLM-01), 12 provider config tests (LLM-02)"
affects:
  - 01-foundation plan 03 (Channel Gateway — dispatches handle_message tasks to Celery)
  - All future orchestrator plans (must maintain sync-def Celery task pattern)
  - Phase 2 memory and tool plans (extend _process_message pipeline)
tech-stack:
  added:
    - litellm==1.82.5 (pinned — September 2025 OOM regression in later versions)
    - celery[redis]>=5.4.0
    - fastapi[standard] (added to llm-pool package)
  patterns:
    - Celery sync-def + asyncio.run() pattern for async work in tasks
    - LiteLLM Router model groups (fast/quality) as abstraction over provider selection
    - httpx.AsyncClient for service-to-service calls (orchestrator -> llm-pool)
    - ContextVar (current_tenant_id) for RLS scope — set/reset around DB block
key-files:
  created:
    - packages/llm-pool/llm_pool/router.py
    - packages/llm-pool/llm_pool/main.py
    - packages/llm-pool/llm_pool/__init__.py
    - packages/llm-pool/llm_pool/providers/__init__.py
    - packages/orchestrator/orchestrator/main.py
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/orchestrator/orchestrator/agents/runner.py
    - packages/orchestrator/orchestrator/__init__.py
    - packages/orchestrator/orchestrator/agents/__init__.py
    - tests/integration/test_llm_fallback.py
    - tests/integration/test_llm_providers.py
  modified:
    - packages/llm-pool/pyproject.toml
    - packages/orchestrator/pyproject.toml
    - docker-compose.yml
key-decisions:
  - "LiteLLM pinned to ==1.82.5, not latest — September 2025 OOM regression in later versions; do not upgrade without testing"
  - "llm-pool runs on port 8004, consistent with shared/config.py llm_pool_url default (plan originally stated 8002 but shared config established 8004 in Plan 01)"
  - "Celery tasks are always sync def with asyncio.run() — this is a hard architectural constraint, never async def"
  - "AI transparency clause is unconditional in system prompt — agents must always disclose AI identity when directly asked"
  - "LiteLLM Router fallback: quality -> fast (not quality -> 503) gives graceful degradation to local inference"
patterns-established:
  - "Celery sync-def pattern: All @app.task functions must be def (not async def). Use asyncio.run() for async sub-calls."
  - "LLM pool abstraction: callers use model group names ('quality', 'fast'), not provider-specific model IDs"
  - "Runner fallback: non-200 from llm-pool returns polite fallback string, never raises to caller"
  - "RLS context: configure_rls_hook(engine) once, set current_tenant_id ContextVar around DB operations, always reset in finally block"
requirements-completed: [LLM-01, LLM-02]
duration: 6min
completed: 2026-03-23
---

Phase 1 Plan 2: LLM Backend Pool + Celery Orchestrator Summary

LiteLLM Router service (port 8004) with Ollama/Anthropic/OpenAI fallback chain and Celery handle_message task using sync-def + asyncio.run pattern, verified by 19 integration tests

Performance

  • Duration: 6 min
  • Started: 2026-03-23T16:01:17Z
  • Completed: 2026-03-23T16:07:10Z
  • Tasks: 2
  • Files changed: 15 (12 created, 3 modified)

Accomplishments

  • LLM Backend Pool FastAPI service with LiteLLM Router: fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups, automatic cross-group fallback, HTTP 503 on total exhaustion
  • Celery orchestrator skeleton: handle_message task (sync def), system prompt builder (name + role + persona + AI transparency clause), runner (httpx to llm-pool with polite fallback)
  • 19 green integration tests covering fallback routing (LLM-01) and provider configuration (LLM-02)
  • Docker Compose updated with llm-pool (port 8004, healthcheck) and celery-worker services

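The model-group layout and cross-group fallback described above can be sketched as plain configuration data. This is a sketch under assumptions: the exact model IDs, api_base, and Router wiring are illustrative, not the actual contents of router.py.

```python
# Sketch of the fast/quality model groups. Model IDs and the Ollama
# api_base are assumptions for illustration, not the real router.py.
MODEL_LIST = [
    {"model_name": "fast",
     "litellm_params": {"model": "ollama/qwen3:8b",
                        "api_base": "http://localhost:11434"}},
    {"model_name": "quality",
     "litellm_params": {"model": "anthropic/claude-sonnet-4"}},
    {"model_name": "quality",
     "litellm_params": {"model": "openai/gpt-4o"}},
]

# Cross-group fallback: when every "quality" deployment fails, degrade
# to the local "fast" group instead of surfacing an error immediately.
FALLBACKS = [{"quality": ["fast"]}]

# The Router itself would then be built roughly as:
#   from litellm import Router
#   llm_router = Router(model_list=MODEL_LIST, fallbacks=FALLBACKS,
#                       routing_strategy="latency-based-routing")
```

Callers only ever name a group ("fast" or "quality"); which concrete provider answers is the Router's decision.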
Task Commits

Each task was committed atomically:

  1. Task 1: LLM Backend Pool service with LiteLLM Router and fallback - ee2f88e (feat)
  2. Task 2: Celery orchestrator with system prompt builder and integration tests - 8257c55 (feat)

Plan metadata: (docs commit follows self-check)

Files Created/Modified

  • packages/llm-pool/llm_pool/router.py — LiteLLM Router: 3-entry model_list, fallbacks, latency routing, complete() async function
  • packages/llm-pool/llm_pool/main.py — FastAPI app on port 8004: POST /complete, GET /health, 503 error handling
  • packages/llm-pool/llm_pool/__init__.py — Package exports (complete, llm_router)
  • packages/llm-pool/llm_pool/providers/__init__.py — Empty placeholder for future provider customization
  • packages/llm-pool/pyproject.toml — Pinned litellm==1.82.5, added fastapi[standard]
  • packages/orchestrator/orchestrator/main.py — Celery app: Redis broker/backend, task discovery, task_acks_late=True
  • packages/orchestrator/orchestrator/tasks.py — handle_message (sync def!), _process_message (async), RLS context setup
  • packages/orchestrator/orchestrator/agents/builder.py — build_system_prompt + build_messages with AI transparency clause
  • packages/orchestrator/orchestrator/agents/runner.py — run_agent: httpx POST to llm-pool, 120s timeout, polite fallback on error
  • packages/orchestrator/pyproject.toml — Added celery[redis]>=5.4.0
  • docker-compose.yml — Added llm-pool and celery-worker services
  • tests/integration/test_llm_fallback.py — 7 tests: success paths, 503 on total failure, metadata forwarding (LLM-01)
  • tests/integration/test_llm_providers.py — 12 tests: model_list structure, provider routing, fallback config (LLM-02)

Decisions Made

  • LiteLLM pinned to ==1.82.5: Explicitly not latest — a September 2025 OOM regression exists in later releases. Warning comment added to pyproject.toml and router.py.
  • Port 8004, not 8002: The plan stated port 8002, but Plan 01's shared/config.py already defined llm_pool_url = "http://localhost:8004". Used 8004 to maintain config consistency.
  • AI transparency clause is unconditional: Added without configuration option — per product design, agents must never deny being AIs when directly asked.
  • Celery sync-def is a hard rule: Enforced with a prominent comment block in tasks.py; _process_message is a private async function called only via asyncio.run().
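The sync-def rule can be illustrated with a minimal sketch. The @app.task decorator and the real task body are elided so the block stays self-contained; only the wrapper shape is the point.

```python
import asyncio

async def _do_async_work(message: dict) -> dict:
    # Placeholder for the real async pipeline (LLM call, DB writes, ...).
    return {"message_id": message["message_id"], "response": "ok",
            "tenant_id": message["tenant_id"]}

# In tasks.py this carries @app.task; shown bare here.
# The hard rule: plain `def`, never `async def`.
def handle_message(message: dict) -> dict:
    # asyncio.run() gives each task invocation its own short-lived event
    # loop, so the prefork Celery worker never has to manage one itself.
    return asyncio.run(_do_async_work(message))
```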

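The unconditional transparency clause can be sketched in a minimal prompt builder. Field names and the clause wording here are assumptions, not the actual builder.py API.

```python
# Hypothetical wording; the real clause text lives in builder.py.
AI_TRANSPARENCY_CLAUSE = (
    "If asked directly whether you are an AI, you must truthfully "
    "disclose that you are."
)

def build_system_prompt(agent: dict) -> str:
    parts = [
        agent.get("system_prompt", ""),
        f"Your name is {agent['name']}. Your role: {agent['role']}.",
        agent.get("persona", ""),
        AI_TRANSPARENCY_CLAUSE,  # always appended; no config flag removes it
    ]
    # Drop empty sections but never the transparency clause.
    return "\n\n".join(p for p in parts if p)
```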
Deviations from Plan

Auto-fixed Issues

1. [Rule 1 - Bug] Fixed test for LiteLLM fallback behavior at Router boundary

  • Found during: Task 2 (integration test verification)
  • Issue: Initial test test_fallback_invoked_when_primary_raises mocked Router.acompletion to raise on first call then succeed on second, expecting a 200. But our code's exception handler catches the first raise and immediately returns 503 — LiteLLM's internal retry/fallback happens inside acompletion, not across multiple calls to it from our code.
  • Fix: Renamed test to test_fallback_succeeds_when_router_returns_response and updated to correctly test the boundary: if acompletion succeeds (router resolved fallback internally), endpoint returns 200; if it raises (all exhausted), returns 503.
  • Files modified: tests/integration/test_llm_fallback.py
  • Verification: All 19 tests pass
  • Committed in: 8257c55 (Task 2 commit)
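The corrected boundary can be sketched with a simplified stand-in for the /complete handler (names and the error type are illustrative, not the real main.py):

```python
import asyncio

async def complete_endpoint(acompletion):
    # Mirrors the real handler's shape: exactly one call into the Router.
    # LiteLLM's retries and fallbacks happen *inside* acompletion; if it
    # raises, every deployment (including the fast group) is exhausted.
    try:
        return 200, await acompletion()
    except Exception:
        return 503, None

async def _router_resolved_fallback():
    return {"choices": [{"message": {"content": "hi"}}]}

async def _all_exhausted():
    raise RuntimeError("all providers failed")

status_ok, _ = asyncio.run(complete_endpoint(_router_resolved_fallback))
status_fail, _ = asyncio.run(complete_endpoint(_all_exhausted))
```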

Total deviations: 1 auto-fixed (Rule 1 — incorrect test assumption). Impact on plan: test correction only — no production code changed; the 19 tests accurately verify the specified behavior.

Issues Encountered

  • python binary not in PATH for uv project — all test/import verification commands use uv run python and uv run pytest

User Setup Required

None - no external service configuration required. LLM API keys are read from .env at runtime; empty strings are the safe default for local development with Ollama only.

Next Phase Readiness

  • LLM pool and Celery orchestrator are ready for the Channel Gateway (Plan 03) to dispatch handle_message tasks
  • Docker Compose llm-pool and celery-worker services defined (not yet built/tested in container — deferred to integration phase)
  • handle_message task interface is stable: accepts KonstructMessage.model_dump(), returns {message_id, response, tenant_id}
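The payload/result contract above can be sketched with a plain dataclass standing in for KonstructMessage (only message_id and tenant_id are guaranteed by this summary; the text field is an assumption):

```python
from dataclasses import dataclass, asdict

# Minimal stand-in for shared.models.KonstructMessage from Plan 01.
@dataclass
class KonstructMessageStub:
    message_id: str
    tenant_id: str
    text: str

def handle_message(payload: dict) -> dict:
    # Contract: accepts a model_dump()-style dict and returns exactly
    # these three keys.
    return {"message_id": payload["message_id"],
            "response": "(agent reply)",
            "tenant_id": payload["tenant_id"]}

# The gateway (Plan 03) would dispatch the Celery task roughly as:
#   handle_message.delay(message.model_dump())
result = handle_message(asdict(KonstructMessageStub("m-1", "t-1", "hello")))
```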

Self-Check: PASSED

All key files verified present. Both task commits (ee2f88e, 8257c55) verified in git log. 19 integration tests pass. SUMMARY.md created. STATE.md, ROADMAP.md, REQUIREMENTS.md updated.


Phase: 01-foundation Completed: 2026-03-23