| Phase | Plan | Subsystem |
| --- | --- | --- |
| 01-foundation | 02 | llm |

Tags: litellm, celery, redis, ollama, anthropic, openai, fastapi, httpx, pytest
|
## Requires

| Phase | Provides |
| --- | --- |
| 01-foundation plan 01 | Shared models (KonstructMessage, Agent), shared config (settings), shared db (get_session, engine), shared rls (configure_rls_hook, current_tenant_id) |
|
|
## Provides

- LLM Backend Pool FastAPI service (port 8004) with /complete and /health endpoints
- LiteLLM Router with fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups
- Automatic fallback chain: quality providers -> fast group
- Celery app with Redis broker/backend (orchestrator.main)
- handle_message Celery task (sync def, asyncio.run pattern)
- System prompt builder: assembles system_prompt + identity + persona + AI transparency clause
- Agent runner: httpx POST to llm-pool /complete with polite fallback on error
- 19 integration tests: 7 fallback routing tests (LLM-01) and 12 provider config tests (LLM-02)
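The fast/quality model groups and the fallback chain above can be sketched as a LiteLLM Router configuration. This is a hedged sketch: the model IDs, api_base, and group layout are illustrative assumptions, not the project's exact values.

```python
# Sketch of the fast/quality model-group config; model IDs and api_base
# are illustrative assumptions, not the real router.py values.
model_list = [
    {   # local "fast" group served by Ollama
        "model_name": "fast",
        "litellm_params": {
            "model": "ollama/qwen3:8b",
            "api_base": "http://localhost:11434",
        },
    },
    {   # "quality" group: two deployments load-balanced by the Router
        "model_name": "quality",
        "litellm_params": {"model": "anthropic/claude-sonnet-4"},
    },
    {
        "model_name": "quality",
        "litellm_params": {"model": "openai/gpt-4o"},
    },
]

# If all "quality" deployments fail, degrade to the local "fast" group.
fallbacks = [{"quality": ["fast"]}]

# Wiring it up (requires litellm and provider credentials):
# from litellm import Router
# router = Router(model_list=model_list, fallbacks=fallbacks)
# resp = await router.acompletion(model="quality", messages=[...])
```

Callers address the group alias ("quality" or "fast"), so swapping providers never touches call sites.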
|
## Affects

- 01-foundation plan 03 (Channel Gateway — dispatches handle_message tasks to Celery)
- All future orchestrator plans (must maintain the sync-def Celery task pattern)
- Phase 2 memory and tool plans (extend the _process_message pipeline)
|
## Tech

### Added

- litellm==1.82.5 (pinned — September 2025 OOM regression in later versions)
- celery[redis]>=5.4.0
- fastapi[standard] (added to the llm-pool package)

### Patterns

- Celery sync-def + asyncio.run() pattern for async work in tasks
- LiteLLM Router model groups (fast/quality) as an abstraction over provider selection
- httpx.AsyncClient for service-to-service calls (orchestrator -> llm-pool)
- ContextVar (current_tenant_id) for RLS scope — set/reset around the DB block
|
|
## Key Files

### Created

- `packages/llm-pool/llm_pool/router.py`
- `packages/llm-pool/llm_pool/main.py`
- `packages/llm-pool/llm_pool/__init__.py`
- `packages/llm-pool/llm_pool/providers/__init__.py`
- `packages/orchestrator/orchestrator/main.py`
- `packages/orchestrator/orchestrator/tasks.py`
- `packages/orchestrator/orchestrator/agents/builder.py`
- `packages/orchestrator/orchestrator/agents/runner.py`
- `packages/orchestrator/orchestrator/__init__.py`
- `packages/orchestrator/orchestrator/agents/__init__.py`
- `tests/integration/test_llm_fallback.py`
- `tests/integration/test_llm_providers.py`

### Modified

- `packages/llm-pool/pyproject.toml`
- `packages/orchestrator/pyproject.toml`
- `docker-compose.yml`
|
|
## Decisions

- LiteLLM is pinned to ==1.82.5, not latest: later versions hit a September 2025 OOM regression; do not upgrade without testing.
- llm-pool runs on port 8004, consistent with the shared/config.py llm_pool_url default (the plan originally stated 8002, but shared config established 8004 in Plan 01).
- Celery tasks are always sync def with asyncio.run(); this is a hard architectural constraint, never async def.
- The AI transparency clause is unconditional in the system prompt: agents must always disclose their AI identity when asked directly.
- LiteLLM Router falls back quality -> fast (rather than quality -> 503), giving graceful degradation to local inference.
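A minimal sketch of the prompt assembly with its unconditional transparency clause; the clause wording and the function signature are assumptions, not the real agents/builder.py.

```python
# Illustrative clause text; the real wording lives in agents/builder.py.
TRANSPARENCY_CLAUSE = (
    "If you are asked directly whether you are an AI, "
    "answer truthfully that you are."
)

def build_system_prompt(system_prompt: str, identity: str, persona: str) -> str:
    # Empty sections are skipped, but the transparency clause is always
    # appended; it is unconditional by design.
    parts = [system_prompt, identity, persona, TRANSPARENCY_CLAUSE]
    return "\n\n".join(p for p in parts if p)

prompt = build_system_prompt("You are a helpful agent.", "Name: Kay", "")
assert prompt.endswith(TRANSPARENCY_CLAUSE)
```

Appending the clause last keeps it out of reach of persona text that might otherwise try to override it.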
|
## Conventions

- Celery sync-def pattern: all @app.task functions must be def (not async def); use asyncio.run() for async sub-calls.
- LLM pool abstraction: callers use model group names ('quality', 'fast'), never provider-specific model IDs.
- Runner fallback: a non-200 response from llm-pool returns a polite fallback string; errors never propagate to the caller.
- RLS context: call configure_rls_hook(engine) once, set the current_tenant_id ContextVar around DB operations, and always reset it in a finally block.
|
|
Duration: 6min
Completed: 2026-03-23