- complete_stream() in router.py yields token strings via acompletion(stream=True)
- POST /complete/stream returns NDJSON: chunk lines then a done line
- Streaming path does not support tool calls (plain text only)
- Non-streaming POST /complete endpoint unchanged
- Add OLLAMA_MODEL setting to shared config (default: qwen3:32b)
- LLM router reads from settings instead of hardcoded model name
- Create .env file with all configurable settings documented
- docker-compose passes OLLAMA_MODEL to llm-pool container
To change the model: edit OLLAMA_MODEL in .env and restart llm-pool.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Added balanced/economy/local groups alongside fast/quality so all 5
agent model_preference values resolve to real provider groups.
All default to local Ollama qwen3:32b, commercial as fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create llm_pool/router.py: LiteLLM Router with fast (Ollama) and quality (Anthropic/OpenAI) model groups
- Configure fallback chain: quality providers fail -> fast group
- Pin LiteLLM to ==1.82.5 (avoid September 2025 OOM regression in later releases)
- Create llm_pool/main.py: FastAPI service on port 8004 with /complete and /health endpoints
- Add providers/__init__.py: reserved for future per-provider customization
- Update docker-compose.yml: add llm-pool and celery-worker service stubs