Files
konstruct/packages/gateway
Adolfo Delorenzo dd80e2b822 perf: bypass Celery for web chat — stream LLM directly from WebSocket
Eliminates 5-10s of overhead by calling the LLM pool's streaming
endpoint directly from the WebSocket handler instead of going through
Celery queue → worker → asyncio.run() → Redis pub-sub → WebSocket.

New flow: WebSocket → agent lookup → memory → LLM stream → WebSocket
Old flow: WebSocket → Celery → worker → DB → memory → LLM → Redis → WebSocket

Memory still saved (Redis sliding window + fire-and-forget embedding).
Slack/WhatsApp still use Celery (async webhook pattern).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 18:32:16 -06:00
..