konstruct/.planning/phases/01-foundation/01-02-PLAN.md at 6c1086046ff65be103ae546fba41db3bc76bee0e

adelorenzo/konstruct

Fork 0

Files

Adolfo Delorenzo d611a07cc2 docs(01-foundation): create phase plan

2026-03-23 09:32:44 -06:00

13 KiB

Raw Blame History

phase, plan, type, wave, depends_on, files_modified, autonomous, requirements, must_haves

phase

plan

type

wave

depends_on

files_modified

autonomous

requirements

must_haves

01-foundation

execute

01-01

packages/llm-pool/__init__.py

packages/llm-pool/main.py

packages/llm-pool/router.py

packages/llm-pool/providers/__init__.py

packages/orchestrator/__init__.py

packages/orchestrator/main.py

packages/orchestrator/tasks.py

packages/orchestrator/agents/__init__.py

packages/orchestrator/agents/builder.py

packages/orchestrator/agents/runner.py

docker-compose.yml

tests/integration/test_llm_fallback.py

tests/integration/test_llm_providers.py

true

LLM-01

LLM-02

truths

artifacts

key_links

A completion request to the LLM pool service returns an LLM-generated response from the configured provider

When the primary provider is unavailable, the LLM pool automatically falls back to the next provider in the chain

Both Ollama (local) and Anthropic/OpenAI (commercial) are configured as available providers

Celery worker dispatches handle_message tasks asynchronously without blocking the caller

path

provides

exports

packages/llm-pool/main.py

FastAPI service exposing /complete endpoint

app

path

provides

exports

packages/llm-pool/router.py

LiteLLM Router with model groups and fallback chains

llm_router

complete

path

provides

exports

packages/orchestrator/tasks.py

Celery task handle_message (sync def, uses asyncio.run)

handle_message

path

provides

exports

packages/orchestrator/agents/builder.py

System prompt assembly from agent persona fields

build_system_prompt

path

provides

exports

packages/orchestrator/agents/runner.py

LLM call via llm-pool HTTP endpoint, response parsing

run_agent

from	to	via	pattern
packages/orchestrator/agents/runner.py	packages/llm-pool/main.py	HTTP POST to /complete endpoint	httpx.llm.pool.complete

from	to	via	pattern
packages/orchestrator/tasks.py	packages/orchestrator/agents/runner.py	Celery task calls run_agent	run_agent

from	to	via	pattern
packages/llm-pool/router.py	LiteLLM	Router.acompletion() with model_list and fallbacks	router.acompletion

Build the LLM Backend Pool service (LiteLLM Router with Ollama + Anthropic/OpenAI providers and fallback routing) and the Celery-based Agent Orchestrator skeleton (async task dispatch, system prompt assembly, LLM call via pool).

Purpose: Provide the LLM inference layer that the Channel Gateway (Plan 03) will dispatch work to. Establishes the critical Celery sync-def pattern and the LiteLLM Router configuration before any channel integration exists.

Output: Running LLM pool FastAPI service on port 8002, Celery worker processing handle_message tasks, system prompt builder, and green integration tests for fallback routing.

<execution_context> @/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md @/home/adelorenzo/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/01-foundation/01-CONTEXT.md @.planning/phases/01-foundation/01-RESEARCH.md @.planning/phases/01-foundation/01-01-SUMMARY.md

From packages/shared/models/message.py:

class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"

class KonstructMessage(BaseModel):
    id: str
    tenant_id: str | None = None
    channel: ChannelType
    channel_metadata: dict
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict = Field(default_factory=dict)

From packages/shared/models/tenant.py:

class Agent(Base):
    id: Mapped[uuid.UUID]
    tenant_id: Mapped[uuid.UUID]
    name: Mapped[str]
    role: Mapped[str]
    persona: Mapped[str | None]
    system_prompt: Mapped[str | None]
    model_preference: Mapped[str]  # "quality" | "fast"
    tool_assignments: Mapped[list]  # JSON
    escalation_rules: Mapped[list]  # JSON
    is_active: Mapped[bool]

From packages/shared/db.py:

async def get_session() -> AsyncGenerator[AsyncSession, None]: ...

From packages/shared/config.py:

class Settings(BaseSettings):
    anthropic_api_key: str
    openai_api_key: str
    ollama_base_url: str = "http://ollama:11434"
    redis_url: str = "redis://redis:6379/0"
    # ...

Task 1: LLM Backend Pool service with LiteLLM Router and fallback packages/llm-pool/__init__.py, packages/llm-pool/main.py, packages/llm-pool/router.py, packages/llm-pool/providers/__init__.py, docker-compose.yml 1. Create `packages/llm-pool/router.py`: - Configure LiteLLM `Router` with `model_list` containing three model entries: - `"fast"` group: `ollama/qwen3:8b` pointing to `settings.ollama_base_url` - `"quality"` group: `anthropic/claude-sonnet-4-20250514` with `settings.anthropic_api_key` - `"quality"` group (fallback): `openai/gpt-4o` with `settings.openai_api_key` - Configure `fallbacks=[{"quality": ["fast"]}]` — if all quality providers fail, fall back to fast - Set `routing_strategy="latency-based-routing"`, `num_retries=2`, `set_verbose=False` - Pin LiteLLM to `1.82.5` in pyproject.toml (not latest — September 2025 OOM issue) - Export an async `complete(model_group: str, messages: list[dict], tenant_id: str)` function that calls `router.acompletion()` and returns the response content string. Include tenant_id in metadata for cost tracking.

2. Create `packages/llm-pool/main.py`:
   - FastAPI app on port 8002
   - `POST /complete` endpoint accepting `{ model: str, messages: list[dict], tenant_id: str }` — model is the group name ("quality" or "fast"), messages is the OpenAI-format message list
   - Returns `{ content: str, model: str, usage: { prompt_tokens: int, completion_tokens: int } }`
   - `GET /health` endpoint returning `{ status: "ok" }`
   - Error handling: If LiteLLM raises an exception (all providers down), return 503 with `{ error: "All providers unavailable" }`

3. Update `docker-compose.yml` to add the `llm-pool` service:
   - Build from packages/llm-pool or use uvicorn command
   - Port 8002
   - Depends on: ollama, redis
   - Environment: all LLM-related env vars from .env

4. Create `packages/llm-pool/providers/__init__.py` — empty for now, prepared for future per-provider customization.

5. Create `packages/llm-pool/__init__.py` with minimal exports.

cd /home/adelorenzo/repos/konstruct && python -c "from packages.llm_pool.router import complete; from packages.llm_pool.main import app; print('LLM pool imports OK')" - LiteLLM Router configured with fast (Ollama) and quality (Anthropic + OpenAI) model groups - Fallback chain: quality providers -> fast - /complete endpoint accepts model group, messages, tenant_id and returns LLM response - LiteLLM pinned to 1.82.5 - Docker Compose includes llm-pool service Task 2: Celery orchestrator with system prompt builder and integration tests packages/orchestrator/__init__.py, packages/orchestrator/main.py, packages/orchestrator/tasks.py, packages/orchestrator/agents/__init__.py, packages/orchestrator/agents/builder.py, packages/orchestrator/agents/runner.py, tests/integration/test_llm_fallback.py, tests/integration/test_llm_providers.py 1. Create `packages/orchestrator/main.py`: - Celery app configured with Redis broker (`settings.redis_url`) - Result backend: Redis - Include tasks from `packages.orchestrator.tasks`

2. Create `packages/orchestrator/tasks.py`:
   - CRITICAL PATTERN: All Celery tasks MUST be `def` (synchronous), NOT `async def`.
   - `@app.task def handle_message(message_data: dict) -> dict`: Deserializes message_data into KonstructMessage, calls `asyncio.run(_process_message(msg))`, returns result dict.
   - `async def _process_message(msg: KonstructMessage) -> dict`: Loads agent config from DB (using tenant_id + RLS), builds system prompt, calls LLM pool, returns response content.
   - Add a clear comment block at the top: "# CELERY TASKS MUST BE SYNC def — async def causes RuntimeError or silent hang. Use asyncio.run() for async work."

3. Create `packages/orchestrator/agents/builder.py`:
   - `build_system_prompt(agent: Agent) -> str`: Assembles the system prompt from agent fields:
     - Starts with agent.system_prompt if provided
     - Appends persona context: "Your name is {agent.name}. Your role is {agent.role}."
     - If agent.persona is set, appends: "Persona: {agent.persona}"
     - Appends AI transparency clause: "If asked directly whether you are an AI, always respond honestly that you are an AI assistant."
     - Per user decision: professional + warm tone is the default persona
   - `build_messages(system_prompt: str, user_message: str, history: list[dict] | None = None) -> list[dict]`: Returns OpenAI-format messages list with system prompt, optional history, and user message.

4. Create `packages/orchestrator/agents/runner.py`:
   - `async def run_agent(msg: KonstructMessage, agent: Agent) -> str`: Builds system prompt, constructs messages, calls LLM pool via `httpx.AsyncClient` POST to `http://llm-pool:8002/complete` with `{ model: agent.model_preference, messages: messages, tenant_id: msg.tenant_id }`. Returns the content string from the response.
   - Handle errors: If LLM pool returns non-200, log error and return a polite fallback message ("I'm having trouble processing your request right now. Please try again in a moment.").

5. Create `tests/integration/test_llm_fallback.py` (LLM-01):
   - Mock LiteLLM Router to simulate primary provider failure
   - Verify that when "quality" primary (Anthropic) raises an exception, the request automatically retries with fallback (OpenAI), then falls back to "fast" (Ollama)
   - Test that a successful fallback still returns a valid response
   - Test that when ALL providers fail, a 503 is returned

6. Create `tests/integration/test_llm_providers.py` (LLM-02):
   - Mock LiteLLM Router to verify both Ollama and commercial API configurations are present
   - Test that a request with model="fast" routes to Ollama
   - Test that a request with model="quality" routes to Anthropic or OpenAI
   - Verify the model_list contains entries for all three providers

7. Create `__init__.py` files for orchestrator and orchestrator/agents packages.

8. Update docker-compose.yml to add `celery-worker` service:
   - Command: `celery -A packages.orchestrator.main worker --loglevel=info`
   - Depends on: redis, postgres, llm-pool

cd /home/adelorenzo/repos/konstruct && pytest tests/integration/test_llm_fallback.py tests/integration/test_llm_providers.py -x -q - Celery worker starts and accepts handle_message tasks - All Celery tasks are sync def with asyncio.run() pattern (never async def) - System prompt builder assembles persona, role, name, and AI transparency clause - LLM pool fallback: quality -> fast verified by integration tests - Both Ollama and commercial providers configured and routable - handle_message pipeline: deserialize -> load agent -> build prompt -> call LLM pool -> return response - `pytest tests/integration/test_llm_fallback.py -x` proves fallback routing works - `pytest tests/integration/test_llm_providers.py -x` proves both local and commercial providers are configured - LLM pool /complete endpoint returns valid responses - Celery worker processes handle_message tasks without RuntimeError - No `async def` Celery tasks exist (grep confirms)

<success_criteria>

LLM Backend Pool routes requests through LiteLLM to configured providers with automatic fallback
Celery orchestrator dispatches and completes handle_message tasks asynchronously
System prompt reflects agent's name, role, persona, and AI transparency clause
All tests green </success_criteria>

After completion, create `.planning/phases/01-foundation/01-02-SUMMARY.md`

13 KiB Raw Blame History

13 KiB

Raw Blame History