---
phase: 01-foundation
plan: 02
type: execute
wave: 2
depends_on: ["01-01"]
files_modified:
  - packages/llm-pool/__init__.py
  - packages/llm-pool/main.py
  - packages/llm-pool/router.py
  - packages/llm-pool/providers/__init__.py
  - packages/orchestrator/__init__.py
  - packages/orchestrator/main.py
  - packages/orchestrator/tasks.py
  - packages/orchestrator/agents/__init__.py
  - packages/orchestrator/agents/builder.py
  - packages/orchestrator/agents/runner.py
  - docker-compose.yml
  - tests/integration/test_llm_fallback.py
  - tests/integration/test_llm_providers.py
autonomous: true
requirements:
  - LLM-01
  - LLM-02

must_haves:
  truths:
    - "A completion request to the LLM pool service returns an LLM-generated response from the configured provider"
    - "When the primary provider is unavailable, the LLM pool automatically falls back to the next provider in the chain"
    - "Both Ollama (local) and Anthropic/OpenAI (commercial) are configured as available providers"
    - "Celery worker dispatches handle_message tasks asynchronously without blocking the caller"
  artifacts:
    - path: "packages/llm-pool/main.py"
      provides: "FastAPI service exposing /complete endpoint"
      exports: ["app"]
    - path: "packages/llm-pool/router.py"
      provides: "LiteLLM Router with model groups and fallback chains"
      exports: ["llm_router", "complete"]
    - path: "packages/orchestrator/tasks.py"
      provides: "Celery task handle_message (sync def, uses asyncio.run)"
      exports: ["handle_message"]
    - path: "packages/orchestrator/agents/builder.py"
      provides: "System prompt assembly from agent persona fields"
      exports: ["build_system_prompt"]
    - path: "packages/orchestrator/agents/runner.py"
      provides: "LLM call via llm-pool HTTP endpoint, response parsing"
      exports: ["run_agent"]
  key_links:
    - from: "packages/orchestrator/agents/runner.py"
      to: "packages/llm-pool/main.py"
      via: "HTTP POST to /complete endpoint"
      pattern: "httpx.*llm.pool.*complete"
    - from: "packages/orchestrator/tasks.py"
      to: "packages/orchestrator/agents/runner.py"
      via: "Celery task calls run_agent"
      pattern: "run_agent"
    - from: "packages/llm-pool/router.py"
      to: "LiteLLM"
      via: "Router.acompletion() with model_list and fallbacks"
      pattern: "router\\.acompletion"
---

<objective>
Build the LLM Backend Pool service (LiteLLM Router with Ollama + Anthropic/OpenAI providers and fallback routing) and the Celery-based Agent Orchestrator skeleton (async task dispatch, system prompt assembly, LLM call via pool).

Purpose: Provide the LLM inference layer that the Channel Gateway (Plan 03) will dispatch work to. Establishes the critical Celery sync-def pattern and the LiteLLM Router configuration before any channel integration exists.

Output: Running LLM pool FastAPI service on port 8002, Celery worker processing handle_message tasks, system prompt builder, and green integration tests for fallback routing.
</objective>

<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-foundation/01-CONTEXT.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md

<interfaces>
<!-- From Plan 01 — shared models and DB layer the orchestrator depends on -->

From packages/shared/models/message.py:
```python
class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"

class KonstructMessage(BaseModel):
    id: str
    tenant_id: str | None = None
    channel: ChannelType
    channel_metadata: dict
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict = Field(default_factory=dict)
```

From packages/shared/models/tenant.py:
```python
class Agent(Base):
    id: Mapped[uuid.UUID]
    tenant_id: Mapped[uuid.UUID]
    name: Mapped[str]
    role: Mapped[str]
    persona: Mapped[str | None]
    system_prompt: Mapped[str | None]
    model_preference: Mapped[str]  # "quality" | "fast"
    tool_assignments: Mapped[list]  # JSON
    escalation_rules: Mapped[list]  # JSON
    is_active: Mapped[bool]
```

From packages/shared/db.py:
```python
async def get_session() -> AsyncGenerator[AsyncSession, None]: ...
```

From packages/shared/config.py:
```python
class Settings(BaseSettings):
    anthropic_api_key: str
    openai_api_key: str
    ollama_base_url: str = "http://ollama:11434"
    redis_url: str = "redis://redis:6379/0"
    # ...
```
</interfaces>
</context>

<tasks>

<task type="auto">
<name>Task 1: LLM Backend Pool service with LiteLLM Router and fallback</name>
<files>
packages/llm-pool/__init__.py,
packages/llm-pool/main.py,
packages/llm-pool/router.py,
packages/llm-pool/providers/__init__.py,
docker-compose.yml
</files>
<action>
1. Create `packages/llm-pool/router.py`:
   - Configure LiteLLM `Router` with `model_list` containing three model entries:
     - `"fast"` group: `ollama/qwen3:8b` pointing to `settings.ollama_base_url`
     - `"quality"` group: `anthropic/claude-sonnet-4-20250514` with `settings.anthropic_api_key`
     - `"quality"` group (fallback): `openai/gpt-4o` with `settings.openai_api_key`
   - Configure `fallbacks=[{"quality": ["fast"]}]` — if all quality providers fail, fall back to fast
   - Set `routing_strategy="latency-based-routing"`, `num_retries=2`, `set_verbose=False`
   - Pin LiteLLM to `1.82.5` in pyproject.toml (not latest — September 2025 OOM issue)
   - Export an async `complete(model_group: str, messages: list[dict], tenant_id: str)` function that calls `router.acompletion()` and returns the response content string. Include tenant_id in metadata for cost tracking.
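
   A minimal sketch of the shape this configuration might take, assuming the LiteLLM 1.x `Router` API and the shared `Settings` object from Plan 01 (the config import path is illustrative, not final):

   ```python
   # Sketch of packages/llm-pool/router.py; treat as a shape, not a final implementation.
   from litellm import Router

   from packages.shared.config import settings  # illustrative import path

   llm_router = Router(
       model_list=[
           {   # local model, "fast" group
               "model_name": "fast",
               "litellm_params": {
                   "model": "ollama/qwen3:8b",
                   "api_base": settings.ollama_base_url,
               },
           },
           {   # primary commercial model, "quality" group
               "model_name": "quality",
               "litellm_params": {
                   "model": "anthropic/claude-sonnet-4-20250514",
                   "api_key": settings.anthropic_api_key,
               },
           },
           {   # second "quality" entry: LiteLLM retries within a group before falling back
               "model_name": "quality",
               "litellm_params": {
                   "model": "openai/gpt-4o",
                   "api_key": settings.openai_api_key,
               },
           },
       ],
       fallbacks=[{"quality": ["fast"]}],  # if all quality providers fail, drop to fast
       routing_strategy="latency-based-routing",
       num_retries=2,
   )

   async def complete(model_group: str, messages: list[dict], tenant_id: str) -> str:
       """Run a completion against a model group and return the content string."""
       response = await llm_router.acompletion(
           model=model_group,
           messages=messages,
           metadata={"tenant_id": tenant_id},  # surfaced in LiteLLM logging for cost tracking
       )
       return response.choices[0].message.content
   ```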

2. Create `packages/llm-pool/main.py`:
   - FastAPI app on port 8002
   - `POST /complete` endpoint accepting `{ model: str, messages: list[dict], tenant_id: str }` — model is the group name ("quality" or "fast"), messages is the OpenAI-format message list
   - Returns `{ content: str, model: str, usage: { prompt_tokens: int, completion_tokens: int } }`
   - `GET /health` endpoint returning `{ status: "ok" }`
   - Error handling: If LiteLLM raises an exception (all providers down), return 503 with `{ error: "All providers unavailable" }`
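
   The endpoint contract above could take roughly this shape (FastAPI and the `complete()` helper from router.py are assumed; extracting usage token counts from the LiteLLM response is omitted for brevity):

   ```python
   # Sketch of packages/llm-pool/main.py; error mapping is illustrative.
   from fastapi import FastAPI, HTTPException
   from pydantic import BaseModel

   from .router import complete  # async helper sketched for router.py

   app = FastAPI(title="llm-pool")

   class CompleteRequest(BaseModel):
       model: str            # model group: "quality" or "fast"
       messages: list[dict]  # OpenAI-format message list
       tenant_id: str

   @app.post("/complete")
   async def complete_endpoint(req: CompleteRequest):
       try:
           content = await complete(req.model, req.messages, req.tenant_id)
       except Exception:
           # All providers, including the fast fallback, failed
           raise HTTPException(
               status_code=503, detail={"error": "All providers unavailable"}
           )
       # usage token counts from the LiteLLM response are omitted in this sketch
       return {"content": content, "model": req.model}

   @app.get("/health")
   async def health():
       return {"status": "ok"}
   ```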

3. Update `docker-compose.yml` to add the `llm-pool` service:
   - Build from packages/llm-pool or use uvicorn command
   - Port 8002
   - Depends on: ollama, redis
   - Environment: all LLM-related env vars from .env

4. Create `packages/llm-pool/providers/__init__.py` — empty for now, prepared for future per-provider customization.

5. Create `packages/llm-pool/__init__.py` with minimal exports.
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -c "from packages.llm_pool.router import complete; from packages.llm_pool.main import app; print('LLM pool imports OK')"</automated>
</verify>
<done>
- LiteLLM Router configured with fast (Ollama) and quality (Anthropic + OpenAI) model groups
- Fallback chain: quality providers -> fast
- /complete endpoint accepts model group, messages, tenant_id and returns LLM response
- LiteLLM pinned to 1.82.5
- Docker Compose includes llm-pool service
</done>
</task>

<task type="auto">
<name>Task 2: Celery orchestrator with system prompt builder and integration tests</name>
<files>
packages/orchestrator/__init__.py,
packages/orchestrator/main.py,
packages/orchestrator/tasks.py,
packages/orchestrator/agents/__init__.py,
packages/orchestrator/agents/builder.py,
packages/orchestrator/agents/runner.py,
tests/integration/test_llm_fallback.py,
tests/integration/test_llm_providers.py
</files>
<action>
1. Create `packages/orchestrator/main.py`:
   - Celery app configured with Redis broker (`settings.redis_url`)
   - Result backend: Redis
   - Include tasks from `packages.orchestrator.tasks`

2. Create `packages/orchestrator/tasks.py`:
   - CRITICAL PATTERN: All Celery tasks MUST be `def` (synchronous), NOT `async def`.
   - `@app.task def handle_message(message_data: dict) -> dict`: Deserializes message_data into KonstructMessage, calls `asyncio.run(_process_message(msg))`, returns result dict.
   - `async def _process_message(msg: KonstructMessage) -> dict`: Loads agent config from DB (using tenant_id + RLS), builds system prompt, calls LLM pool, returns response content.
   - Add a clear comment block at the top: "# CELERY TASKS MUST BE SYNC def — async def causes RuntimeError or silent hang. Use asyncio.run() for async work."
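
   The sync-def bridge is the one non-obvious piece here. A runnable sketch of the pattern, with the Celery decorator commented out and `_process_message` stubbed so only the bridging mechanic is shown:

   ```python
   # Sketch of the sync-def Celery task pattern from packages/orchestrator/tasks.py.
   # CELERY TASKS MUST BE SYNC def — async def causes RuntimeError or silent hang.
   # Use asyncio.run() for async work.
   import asyncio

   async def _process_message(message_data: dict) -> dict:
       # Real version: deserialize KonstructMessage, load agent config via
       # tenant_id + RLS, build the system prompt, call the LLM pool.
       return {"status": "ok", "message_id": message_data["id"]}

   # @app.task  # registered with the Celery app in the real tasks.py
   def handle_message(message_data: dict) -> dict:
       # Sync entry point: bridge into async code with a fresh event loop per task.
       return asyncio.run(_process_message(message_data))
   ```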

3. Create `packages/orchestrator/agents/builder.py`:
   - `build_system_prompt(agent: Agent) -> str`: Assembles the system prompt from agent fields:
     - Starts with agent.system_prompt if provided
     - Appends persona context: "Your name is {agent.name}. Your role is {agent.role}."
     - If agent.persona is set, appends: "Persona: {agent.persona}"
     - Appends AI transparency clause: "If asked directly whether you are an AI, always respond honestly that you are an AI assistant."
     - Per user decision: professional + warm tone is the default persona
   - `build_messages(system_prompt: str, user_message: str, history: list[dict] | None = None) -> list[dict]`: Returns OpenAI-format messages list with system prompt, optional history, and user message.
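
   A sketch of the assembly logic, with `Agent` stubbed as a dataclass so the logic runs standalone (the real code uses the ORM model from Plan 01):

   ```python
   # Sketch of packages/orchestrator/agents/builder.py with a stubbed Agent.
   from dataclasses import dataclass

   AI_TRANSPARENCY = (
       "If asked directly whether you are an AI, always respond honestly "
       "that you are an AI assistant."
   )

   @dataclass
   class AgentStub:  # stand-in for the ORM Agent model
       name: str
       role: str
       persona: str | None = None
       system_prompt: str | None = None

   def build_system_prompt(agent: AgentStub) -> str:
       parts: list[str] = []
       if agent.system_prompt:
           parts.append(agent.system_prompt)
       parts.append(f"Your name is {agent.name}. Your role is {agent.role}.")
       if agent.persona:
           parts.append(f"Persona: {agent.persona}")
       parts.append(AI_TRANSPARENCY)
       return "\n\n".join(parts)

   def build_messages(
       system_prompt: str,
       user_message: str,
       history: list[dict] | None = None,
   ) -> list[dict]:
       # OpenAI-format list: system first, then prior turns, then the new message
       return [
           {"role": "system", "content": system_prompt},
           *(history or []),
           {"role": "user", "content": user_message},
       ]
   ```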

4. Create `packages/orchestrator/agents/runner.py`:
   - `async def run_agent(msg: KonstructMessage, agent: Agent) -> str`: Builds system prompt, constructs messages, calls LLM pool via `httpx.AsyncClient` POST to `http://llm-pool:8002/complete` with `{ model: agent.model_preference, messages: messages, tenant_id: msg.tenant_id }`. Returns the content string from the response.
   - Handle errors: If LLM pool returns non-200, log error and return a polite fallback message ("I'm having trouble processing your request right now. Please try again in a moment.").
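
   A sketch of the runner, assuming `httpx` and the builder functions above; the `msg.content.text` field name is an assumption about `MessageContent`, and the timeout value is illustrative:

   ```python
   # Sketch of packages/orchestrator/agents/runner.py.
   import logging

   import httpx

   from .builder import build_messages, build_system_prompt

   logger = logging.getLogger(__name__)

   FALLBACK_REPLY = (
       "I'm having trouble processing your request right now. "
       "Please try again in a moment."
   )

   async def run_agent(msg, agent) -> str:  # msg: KonstructMessage, agent: Agent
       system_prompt = build_system_prompt(agent)
       messages = build_messages(system_prompt, msg.content.text)  # field name assumed
       async with httpx.AsyncClient(timeout=60.0) as client:
           resp = await client.post(
               "http://llm-pool:8002/complete",
               json={
                   "model": agent.model_preference,
                   "messages": messages,
                   "tenant_id": msg.tenant_id,
               },
           )
       if resp.status_code != 200:
           logger.error("llm-pool returned %s: %s", resp.status_code, resp.text)
           return FALLBACK_REPLY
       return resp.json()["content"]
   ```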

5. Create `tests/integration/test_llm_fallback.py` (LLM-01):
   - Mock LiteLLM Router to simulate primary provider failure
   - Verify that when "quality" primary (Anthropic) raises an exception, the request automatically retries with fallback (OpenAI), then falls back to "fast" (Ollama)
   - Test that a successful fallback still returns a valid response
   - Test that when ALL providers fail, a 503 is returned
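
   The core mocking mechanic these tests rely on can be shown standalone: an `AsyncMock` with a `side_effect` list makes the first (primary) call raise and the next (fallback) call succeed. The real test applies this to the router's `acompletion`; the simplified fallback loop below only stands in for LiteLLM's internal retry/fallback chain:

   ```python
   # Illustration of the side_effect mocking shape used by the fallback tests.
   import asyncio
   from unittest.mock import AsyncMock

   mock_acompletion = AsyncMock(
       side_effect=[
           RuntimeError("anthropic unavailable"),   # primary quality provider fails
           {"content": "hello from the fallback"},  # fallback provider answers
       ]
   )

   async def call_with_fallback() -> dict:
       # Simplified stand-in for the router's retry/fallback chain
       try:
           return await mock_acompletion(model="quality")
       except RuntimeError:
           return await mock_acompletion(model="fast")

   result = asyncio.run(call_with_fallback())
   ```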

6. Create `tests/integration/test_llm_providers.py` (LLM-02):
   - Mock LiteLLM Router to verify both Ollama and commercial API configurations are present
   - Test that a request with model="fast" routes to Ollama
   - Test that a request with model="quality" routes to Anthropic or OpenAI
   - Verify the model_list contains entries for all three providers

7. Create `__init__.py` files for orchestrator and orchestrator/agents packages.

8. Update docker-compose.yml to add `celery-worker` service:
   - Command: `celery -A packages.orchestrator.main worker --loglevel=info`
   - Depends on: redis, postgres, llm-pool
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && pytest tests/integration/test_llm_fallback.py tests/integration/test_llm_providers.py -x -q</automated>
</verify>
<done>
- Celery worker starts and accepts handle_message tasks
- All Celery tasks are sync def with asyncio.run() pattern (never async def)
- System prompt builder assembles persona, role, name, and AI transparency clause
- LLM pool fallback: quality -> fast verified by integration tests
- Both Ollama and commercial providers configured and routable
- handle_message pipeline: deserialize -> load agent -> build prompt -> call LLM pool -> return response
</done>
</task>

</tasks>

<verification>
- `pytest tests/integration/test_llm_fallback.py -x` proves fallback routing works
- `pytest tests/integration/test_llm_providers.py -x` proves both local and commercial providers are configured
- LLM pool /complete endpoint returns valid responses
- Celery worker processes handle_message tasks without RuntimeError
- No `async def` Celery tasks exist (grep confirms)
</verification>

<success_criteria>
- LLM Backend Pool routes requests through LiteLLM to configured providers with automatic fallback
- Celery orchestrator dispatches and completes handle_message tasks asynchronously
- System prompt reflects agent's name, role, persona, and AI transparency clause
- All tests green
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md`
</output>