Adolfo Delorenzo
dd80e2b822
perf: bypass Celery for web chat — stream LLM directly from WebSocket
Eliminates 5-10s of overhead by calling the LLM pool's streaming
endpoint directly from the WebSocket handler instead of going through
Celery queue → worker → asyncio.run() → Redis pub-sub → WebSocket.
New flow: WebSocket → agent lookup → memory → LLM stream → WebSocket
Old flow: WebSocket → Celery → worker → DB → memory → LLM → Redis → WebSocket
Memory still saved (Redis sliding window + fire-and-forget embedding).
Slack/WhatsApp still use Celery (async webhook pattern).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>