feat(01-02): LLM Backend Pool — LiteLLM Router with Ollama + Anthropic + OpenAI fallback

- Create llm_pool/router.py: LiteLLM Router with fast (Ollama) and quality (Anthropic/OpenAI) model groups
- Configure fallback chain: if all quality providers fail, requests fall back to the fast group
- Pin LiteLLM to ==1.82.5 (avoid September 2025 OOM regression in later releases)
- Create llm_pool/main.py: FastAPI service on port 8004 with /complete and /health endpoints
- Add providers/__init__.py: reserved for future per-provider customization
- Update docker-compose.yml: add llm-pool and celery-worker service stubs
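The router wiring described above can be sketched as follows. This is a minimal illustration of LiteLLM's `Router` with two shared-name model groups and a cross-group fallback; the concrete model IDs and the Ollama base URL are placeholders, not values from this commit:

```python
# llm_pool/router.py (sketch): two model groups, "fast" and "quality".
# LiteLLM load-balances among deployments that share a model_name and
# applies fallbacks across groups. All model IDs below are illustrative.
MODEL_LIST = [
    {   # local Ollama model -> "fast" group
        "model_name": "fast",
        "litellm_params": {
            "model": "ollama/llama3.1",          # placeholder model ID
            "api_base": "http://ollama:11434",   # assumed Ollama endpoint
        },
    },
    {   # hosted Anthropic model -> "quality" group
        "model_name": "quality",
        "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"},
    },
    {   # hosted OpenAI model -> same "quality" group
        "model_name": "quality",
        "litellm_params": {"model": "openai/gpt-4o"},
    },
]

# Fallback chain per the commit message: if every "quality" deployment
# fails, retry the request against the "fast" group.
FALLBACKS = [{"quality": ["fast"]}]

def build_router():
    """Construct the LiteLLM Router. Imported lazily so the config
    above can be inspected without litellm installed."""
    from litellm import Router
    return Router(model_list=MODEL_LIST, fallbacks=FALLBACKS)
```

Keeping the two hosted providers under one `model_name` lets LiteLLM treat them as interchangeable deployments of the "quality" group, so the fallback entry only has to name the group, not each provider.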
2026-03-23 10:03:05 -06:00
parent 0054383be0
commit ee2f88e13b
7 changed files with 370 additions and 5 deletions
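The docker-compose service stubs mentioned in the commit message could take roughly this shape. Everything here beyond the service names (`llm-pool`, `celery-worker`) and port 8004 is an assumption: the build paths, the Ollama service wiring, and the environment variable name are placeholders.

```yaml
# docker-compose.yml (sketch): stubs matching the commit message.
services:
  llm-pool:
    build: ./llm_pool          # assumed build context
    ports:
      - "8004:8004"            # FastAPI service: /complete and /health
    environment:
      - OLLAMA_API_BASE=http://ollama:11434   # assumed variable name
  celery-worker:
    build: ./worker            # placeholder path for the worker stub
```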

pyproject.toml

@@ -9,7 +9,10 @@ description = "LLM Backend Pool — LiteLLM router for Ollama, vLLM, OpenAI, Ant
 requires-python = ">=3.12"
 dependencies = [
     "konstruct-shared",
-    "litellm>=1.54.0",
+    # Pinned: do NOT upgrade past 1.82.5 — a September 2025 OOM regression exists
+    # in later releases. Verify fix before bumping.
+    "litellm==1.82.5",
     "fastapi[standard]>=0.115.0",
     "httpx>=0.28.0",
 ]
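Since the pin above exists specifically to avoid a known regression, a startup guard can catch an accidental bump. A sketch using only the standard library's `importlib.metadata`; the function name and return convention are inventions for illustration:

```python
# Sketch of a startup check that the installed litellm matches the pin
# in pyproject.toml. Returns True when the pin holds or litellm is not
# installed in this environment, False on a mismatch.
from importlib.metadata import PackageNotFoundError, version

PINNED = "1.82.5"

def litellm_pin_ok() -> bool:
    try:
        installed = version("litellm")
    except PackageNotFoundError:
        return True  # nothing installed, nothing to check
    return installed == PINNED
```

A service could call this during startup and refuse to boot on a mismatch, turning a silent dependency drift into an immediate, explainable failure.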