docs(01-02): complete LLM pool and orchestrator plan
@@ -27,8 +27,8 @@ Requirements for beta-ready release. Each maps to roadmap phases.
 
 ### LLM Backend
 
-- [ ] **LLM-01**: LiteLLM router abstracts LLM provider selection with fallback routing
-- [ ] **LLM-02**: Platform supports Ollama (local) and commercial APIs (Anthropic, OpenAI) as LLM providers
+- [x] **LLM-01**: LiteLLM router abstracts LLM provider selection with fallback routing
+- [x] **LLM-02**: Platform supports Ollama (local) and commercial APIs (Anthropic, OpenAI) as LLM providers
 - [ ] **LLM-03**: Tenant can provide their own API keys for supported LLM providers (BYO keys, encrypted at rest) ⚠️ CONFLICT: listed as v1 here but out-of-scope in PROJECT.md — resolve before Phase 3 planning
 
 ### Multi-Tenancy & Security
@@ -107,8 +107,8 @@ Which phases cover which requirements. Updated during roadmap creation.
 | AGNT-05 | Phase 2 | Pending |
 | AGNT-06 | Phase 2 | Pending |
 | AGNT-07 | Phase 3 | Pending |
-| LLM-01 | Phase 1 | Pending |
-| LLM-02 | Phase 1 | Pending |
+| LLM-01 | Phase 1 | Complete |
+| LLM-02 | Phase 1 | Complete |
 | LLM-03 | Phase 3 | Pending |
 | TNNT-01 | Phase 1 | Complete |
 | TNNT-02 | Phase 1 | Complete |
@@ -77,7 +77,7 @@ Phases execute in numeric order: 1 → 2 → 3
 
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
-| 1. Foundation | 1/4 | In Progress| |
+| 1. Foundation | 2/4 | In Progress | |
 | 2. Agent Features | 0/4 | Not started | - |
 | 3. Operator Experience | 0/2 | Not started | - |
 
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: planning
-stopped_at: Completed 01-foundation 01-01-PLAN.md
-last_updated: "2026-03-23T15:59:38.482Z"
+stopped_at: Completed 01-foundation 01-02-PLAN.md
+last_updated: "2026-03-23T16:08:44.982Z"
 last_activity: 2026-03-23 — Roadmap created, ready for Phase 1 planning
 progress:
   total_phases: 3
   completed_phases: 0
   total_plans: 4
-  completed_plans: 1
+  completed_plans: 2
   percent: 0
 ---
 
@@ -51,6 +51,7 @@ Progress: [░░░░░░░░░░] 0%
 
 *Updated after each plan completion*
 | Phase 01-foundation P01 | 12 | 2 tasks | 32 files |
+| Phase 01-foundation P02 | 6 | 2 tasks | 15 files |
 
 ## Accumulated Context
 
@@ -65,6 +66,10 @@ Recent decisions affecting current work:
 - [Phase 01-foundation]: PostgreSQL RLS with FORCE ROW LEVEL SECURITY chosen for tenant isolation; app connects as konstruct_app role (not superuser)
 - [Phase 01-foundation]: SET LOCAL app.current_tenant uses UUID-sanitized f-string (not parameterized) — asyncpg does not support prepared statement placeholders for SET LOCAL
 - [Phase 01-foundation]: channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL in Alembic migrations
+- [Phase 01-foundation]: LiteLLM pinned to ==1.82.5, not latest — September 2025 OOM regression in later versions
+- [Phase 01-foundation]: Celery tasks are always sync def with asyncio.run() — hard architectural constraint, never async def
+- [Phase 01-foundation]: AI transparency clause is unconditional in system prompt — agents must disclose AI identity when directly asked
+- [Phase 01-foundation]: llm-pool port 8004 (consistent with shared/config.py llm_pool_url default, not plan-stated 8002)
 
 ### Pending Todos
 
@@ -76,6 +81,6 @@ None yet.
 
 ## Session Continuity
 
-Last session: 2026-03-23T15:59:38.480Z
-Stopped at: Completed 01-foundation 01-01-PLAN.md
+Last session: 2026-03-23T16:08:44.980Z
+Stopped at: Completed 01-foundation 01-02-PLAN.md
 Resume file: None
.planning/phases/01-foundation/01-02-SUMMARY.md (new file, 162 lines)
@@ -0,0 +1,162 @@
---
phase: 01-foundation
plan: 02
subsystem: llm
tags: [litellm, celery, redis, ollama, anthropic, openai, fastapi, httpx, pytest]

# Dependency graph
requires:
  - phase: 01-foundation plan 01
    provides: "Shared models (KonstructMessage, Agent), shared config (settings), shared db (get_session, engine), shared rls (configure_rls_hook, current_tenant_id)"

provides:
  - "LLM Backend Pool FastAPI service (port 8004) with /complete and /health endpoints"
  - "LiteLLM Router with fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups"
  - "Automatic fallback chain: quality providers -> fast group"
  - "Celery app with Redis broker/backend (orchestrator.main)"
  - "handle_message Celery task (sync def, asyncio.run pattern)"
  - "System prompt builder: assembles system_prompt + identity + persona + AI transparency clause"
  - "Agent runner: httpx POST to llm-pool /complete with polite fallback on error"
  - "19 integration tests: 7 fallback routing tests (LLM-01), 12 provider config tests (LLM-02)"

affects:
  - "01-foundation plan 03 (Channel Gateway — dispatches handle_message tasks to Celery)"
  - "All future orchestrator plans (must maintain sync-def Celery task pattern)"
  - "Phase 2 memory and tool plans (extend _process_message pipeline)"

# Tech tracking
tech-stack:
  added:
    - "litellm==1.82.5 (pinned — September 2025 OOM regression in later versions)"
    - "celery[redis]>=5.4.0"
    - "fastapi[standard] (added to llm-pool package)"
  patterns:
    - "Celery sync-def + asyncio.run() pattern for async work in tasks"
    - "LiteLLM Router model groups (fast/quality) as abstraction over provider selection"
    - "httpx.AsyncClient for service-to-service calls (orchestrator -> llm-pool)"
    - "ContextVar (current_tenant_id) for RLS scope — set/reset around DB block"

key-files:
  created:
    - packages/llm-pool/llm_pool/router.py
    - packages/llm-pool/llm_pool/main.py
    - packages/llm-pool/llm_pool/__init__.py
    - packages/llm-pool/llm_pool/providers/__init__.py
    - packages/orchestrator/orchestrator/main.py
    - packages/orchestrator/orchestrator/tasks.py
    - packages/orchestrator/orchestrator/agents/builder.py
    - packages/orchestrator/orchestrator/agents/runner.py
    - packages/orchestrator/orchestrator/__init__.py
    - packages/orchestrator/orchestrator/agents/__init__.py
    - tests/integration/test_llm_fallback.py
    - tests/integration/test_llm_providers.py
  modified:
    - packages/llm-pool/pyproject.toml
    - packages/orchestrator/pyproject.toml
    - docker-compose.yml

key-decisions:
  - "LiteLLM pinned to ==1.82.5, not latest — September 2025 OOM regression in later versions; do not upgrade without testing"
  - "llm-pool runs on port 8004, consistent with shared/config.py llm_pool_url default (plan originally stated 8002 but shared config established 8004 in Plan 01)"
  - "Celery tasks are always sync def with asyncio.run() — this is a hard architectural constraint, never async def"
  - "AI transparency clause is unconditional in system prompt — agents must always disclose AI identity when directly asked"
  - "LiteLLM Router fallback: quality -> fast (not quality -> 503) gives graceful degradation to local inference"

patterns-established:
  - "Celery sync-def pattern: All @app.task functions must be def (not async def). Use asyncio.run() for async sub-calls."
  - "LLM pool abstraction: callers use model group names ('quality', 'fast') not provider-specific model IDs"
  - "Runner fallback: non-200 from llm-pool returns polite fallback string, never raises to caller"
  - "RLS context: configure_rls_hook(engine) once, set current_tenant_id ContextVar around DB operations, always reset in finally block"

requirements-completed: [LLM-01, LLM-02]

# Metrics
duration: 6min
completed: 2026-03-23
---

# Phase 1 Plan 2: LLM Backend Pool + Celery Orchestrator Summary

**LiteLLM Router service (port 8004) with Ollama/Anthropic/OpenAI fallback chain and Celery handle_message task using the sync-def + asyncio.run pattern, verified by 19 integration tests**

## Performance

- **Duration:** 6 min
- **Started:** 2026-03-23T16:01:17Z
- **Completed:** 2026-03-23T16:07:10Z
- **Tasks:** 2
- **Files modified:** 15 (12 created, 3 modified)

## Accomplishments

- LLM Backend Pool FastAPI service with LiteLLM Router: fast (Ollama qwen3:8b) and quality (Anthropic claude-sonnet-4 + OpenAI gpt-4o) model groups, automatic cross-group fallback, HTTP 503 on total exhaustion
- Celery orchestrator skeleton: handle_message task (sync def), system prompt builder (name + role + persona + AI transparency clause), runner (httpx to llm-pool with polite fallback)
- 19 green integration tests covering fallback routing (LLM-01) and provider configuration (LLM-02)
- Docker Compose updated with llm-pool (port 8004, healthcheck) and celery-worker services

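The fast/quality grouping above can be sketched as LiteLLM Router configuration data. This is a minimal illustration, not the actual contents of router.py: the `api_base` value and the exact router keyword arguments shown in comments are assumptions.

```python
# Sketch of the fast/quality model-group layout, expressed as LiteLLM Router
# configuration data. Entries sharing a model_name form one group; the router
# load-balances within a group and falls back across groups.
model_list = [
    {  # local group: Ollama
        "model_name": "fast",
        "litellm_params": {
            "model": "ollama/qwen3:8b",
            "api_base": "http://localhost:11434",  # illustrative default
        },
    },
    {  # quality group: Anthropic
        "model_name": "quality",
        "litellm_params": {"model": "anthropic/claude-sonnet-4"},
    },
    {  # quality group: OpenAI (same model_name = same group)
        "model_name": "quality",
        "litellm_params": {"model": "openai/gpt-4o"},
    },
]

# Cross-group fallback: when every "quality" deployment fails, retry on "fast".
fallbacks = [{"quality": ["fast"]}]

# The service would then build the router roughly as:
#   from litellm import Router
#   llm_router = Router(model_list=model_list, fallbacks=fallbacks)
# and callers use only group names:
#   await llm_router.acompletion(model="quality", messages=[...])
```

Callers never see provider-specific model IDs; this is the "LLM pool abstraction" pattern recorded in the frontmatter.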
## Task Commits

Each task was committed atomically:

1. **Task 1: LLM Backend Pool service with LiteLLM Router and fallback** - `ee2f88e` (feat)
2. **Task 2: Celery orchestrator with system prompt builder and integration tests** - `8257c55` (feat)

**Plan metadata:** _(docs commit follows self-check)_

## Files Created/Modified

- `packages/llm-pool/llm_pool/router.py` — LiteLLM Router: 3-entry model_list, fallbacks, latency routing, `complete()` async function
- `packages/llm-pool/llm_pool/main.py` — FastAPI app on port 8004: POST /complete, GET /health, 503 error handling
- `packages/llm-pool/llm_pool/__init__.py` — Package exports (complete, llm_router)
- `packages/llm-pool/llm_pool/providers/__init__.py` — Empty placeholder for future provider customization
- `packages/llm-pool/pyproject.toml` — Pinned litellm==1.82.5, added fastapi[standard]
- `packages/orchestrator/orchestrator/main.py` — Celery app: Redis broker/backend, task discovery, task_acks_late=True
- `packages/orchestrator/orchestrator/tasks.py` — handle_message (sync def!), _process_message (async), RLS context setup
- `packages/orchestrator/orchestrator/agents/builder.py` — build_system_prompt + build_messages with AI transparency clause
- `packages/orchestrator/orchestrator/agents/runner.py` — run_agent: httpx POST to llm-pool, 120s timeout, polite fallback on error
- `packages/orchestrator/pyproject.toml` — Added celery[redis]>=5.4.0
- `docker-compose.yml` — Added llm-pool and celery-worker services
- `tests/integration/test_llm_fallback.py` — 7 tests: success paths, 503 on total failure, metadata forwarding (LLM-01)
- `tests/integration/test_llm_providers.py` — 12 tests: model_list structure, provider routing, fallback config (LLM-02)

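The sync-def rule enforced in tasks.py can be sketched as follows. To keep the snippet self-contained, the Celery decorator appears only in a comment, and the body is a simplified stand-in for the real `_process_message` pipeline (RLS context setup, prompt build, llm-pool call).

```python
import asyncio


async def _process_message(message: dict) -> dict:
    """Private async pipeline. Simplified stand-in for the real
    _process_message (RLS ContextVar setup, prompt build, llm-pool call)."""
    response = f"echo: {message['body']}"  # placeholder for the LLM call
    return {
        "message_id": message["message_id"],
        "response": response,
        "tenant_id": message["tenant_id"],
    }


# In the real code this carries the Celery decorator, e.g.:
#   @app.task(name="orchestrator.tasks.handle_message")
def handle_message(message: dict) -> dict:
    # Hard rule: the task itself is sync `def`. All async work is bridged
    # through asyncio.run(); the task is never declared `async def`.
    return asyncio.run(_process_message(message))
```

The same shape applies to every future orchestrator task: the `@app.task` function stays synchronous, and `asyncio.run()` owns the event loop for the duration of one task invocation.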
## Decisions Made

- **LiteLLM pinned to ==1.82.5**: Explicitly not latest — a September 2025 OOM regression exists in later releases. Warning comment added to pyproject.toml and router.py.
- **Port 8004, not 8002**: The plan stated port 8002, but Plan 01's shared/config.py already defined `llm_pool_url = "http://localhost:8004"`. Used 8004 to maintain config consistency.
- **AI transparency clause is unconditional**: Added without configuration option — per product design, agents must never deny being AIs when directly asked.
- **Celery sync-def is a hard rule**: Enforced with prominent comment block in tasks.py; `_process_message` is a private async function called only via `asyncio.run()`.

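The unconditional transparency clause can be illustrated with a minimal builder sketch. The clause wording, function signature, and field names here are assumptions; the real `build_system_prompt` in builder.py takes an Agent model and may assemble more parts.

```python
# Illustrative clause text; the real wording in builder.py may differ.
AI_TRANSPARENCY_CLAUSE = (
    "If you are directly asked whether you are an AI, you must answer "
    "truthfully that you are."
)


def build_system_prompt(name: str, role: str, persona: str, system_prompt: str) -> str:
    """Assemble system_prompt + identity + persona + transparency clause."""
    parts = [
        system_prompt,
        f"Your name is {name}. Your role: {role}.",
        persona,
        AI_TRANSPARENCY_CLAUSE,  # unconditional: always appended, no config flag
    ]
    # Drop empty sections (e.g. a blank persona) but never the clause.
    return "\n\n".join(p for p in parts if p)
```

The key property is that no configuration path can remove the final part: the clause is appended in code, not gated on an Agent field.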
## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed test for LiteLLM fallback behavior at Router boundary**

- **Found during:** Task 2 (integration test verification)
- **Issue:** Initial test `test_fallback_invoked_when_primary_raises` mocked `Router.acompletion` to raise on first call then succeed on second, expecting a 200. But our code's exception handler catches the first raise and immediately returns 503 — LiteLLM's internal retry/fallback happens *inside* `acompletion`, not across multiple calls to it from our code.
- **Fix:** Renamed test to `test_fallback_succeeds_when_router_returns_response` and updated to correctly test the boundary: if `acompletion` succeeds (router resolved fallback internally), endpoint returns 200; if it raises (all exhausted), returns 503.
- **Files modified:** `tests/integration/test_llm_fallback.py`
- **Verification:** All 19 tests pass
- **Committed in:** `8257c55` (Task 2 commit)

---

**Total deviations:** 1 auto-fixed (Rule 1 — incorrect test assumption)

**Impact on plan:** Test correction only — no production code changed. 19 tests accurately verify the specified behavior.

## Issues Encountered

- `python` binary not in PATH for uv project — all test/import verification commands use `uv run python` and `uv run pytest`

## User Setup Required

None - no external service configuration required. LLM API keys are read from `.env` at runtime; empty strings are the safe default for local development with Ollama only.

## Next Phase Readiness

- LLM pool and Celery orchestrator are ready for the Channel Gateway (Plan 03) to dispatch `handle_message` tasks
- Docker Compose llm-pool and celery-worker services defined (not yet built/tested in container — deferred to integration phase)
- `handle_message` task interface is stable: accepts `KonstructMessage.model_dump()`, returns `{message_id, response, tenant_id}`

---

*Phase: 01-foundation*
*Completed: 2026-03-23*