# Architecture Research

**Domain:** Channel-native AI workforce platform (multi-tenant, messaging-channel-first)
**Researched:** 2026-03-22
**Confidence:** HIGH (core patterns verified against official Slack docs, LiteLLM docs, pgvector community resources, and multiple production-pattern sources)

---

## Standard Architecture

### System Overview

```
External Channels (Slack, WhatsApp)
              │ HTTPS webhooks / Events API
              ▼
┌─────────────────────────────────────────────────────────────┐
│                        INGRESS LAYER                        │
│  ┌─────────────────────┐  ┌─────────────────────────────┐   │
│  │ Channel Gateway     │  │ Stripe Webhook Endpoint     │   │
│  │ (FastAPI service)   │  │ (billing events)            │   │
│  └──────────┬──────────┘  └─────────────────────────────┘   │
└─────────────│───────────────────────────────────────────────┘
              │ Normalized KonstructMessage
              ▼
┌─────────────────────────────────────────────────────────────┐
│                    MESSAGE ROUTING LAYER                    │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Message Router                                        │  │
│  │ - Tenant resolution (channel org → tenant_id)         │  │
│  │ - Per-tenant rate limiting (Redis token bucket)       │  │
│  │ - Context loading (tenant config, agent config)       │  │
│  │ - Idempotency check (Redis dedup key)                 │  │
│  └──────────────────────────┬────────────────────────────┘  │
└─────────────────────────────│───────────────────────────────┘
                              │ Enqueued task (Celery)
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                  AGENT ORCHESTRATION LAYER                  │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ Agent Orchestrator (per-tenant Celery worker)         │  │
│  │ - Agent context assembly (persona, tools, memory)     │  │
│  │ - Conversation history retrieval (Redis + pgvector)   │  │
│  │ - LLM call dispatch → LLM Backend Pool                │  │
│  │ - Tool execution (registry lookup + run)              │  │
│  │ - Response routing back to originating channel        │  │
│  └───────────────────────────────────────────────────────┘  │
└───────────────────┬────────────────────────┬────────────────┘
                    │                        │
                    ▼                        ▼
┌────────────────────────┐   ┌──────────────────────────────┐
│    LLM BACKEND POOL    │   │ TOOL EXECUTOR                │
│                        │   │ - Registry (tool → handler)  │
│  LiteLLM Router        │   │ - Execution (async/sync)     │
│  ├── Ollama (local)    │   │ - Result capture + logging   │
│  ├── Anthropic API     │   └──────────────────────────────┘
│  └── OpenAI API        │
└────────────┬───────────┘
             │
             ▼
┌─────────────────────────────────────────────────────────────┐
│                         DATA LAYER                          │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐  │
│  │ PostgreSQL   │  │ Redis        │  │ MinIO / S3        │  │
│  │ (+ pgvector  │  │ (sessions,   │  │ (file attach.,    │  │
│  │  + RLS)      │  │  rate limit, │  │  agent artifacts) │  │
│  │              │  │  task queue, │  │                   │  │
│  │ - tenants    │  │  pub/sub)    │  │                   │  │
│  │ - agents     │  └──────────────┘  └───────────────────┘  │
│  │ - messages   │                                           │
│  │ - tools      │                                           │
│  │ - billing    │                                           │
│  └──────────────┘                                           │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                        ADMIN PORTAL                         │
│  Next.js 14 App Router (separate deployment)                │
│  - Tenant management, agent config, billing, monitoring     │
│  - Reads/writes to FastAPI REST API (auth via JWT)          │
└─────────────────────────────────────────────────────────────┘
```

### Component Responsibilities

| Component | Responsibility | Communicates With |
|-----------|----------------|-------------------|
| **Channel Gateway** | Receive and verify inbound webhooks from Slack/WhatsApp; normalize to KonstructMessage; acknowledge within 3s | Message Router (HTTP or enqueue), Redis (idempotency) |
| **Message Router** | Resolve tenant from channel metadata; rate-limit per tenant; load tenant/agent context; enqueue to Celery | PostgreSQL (tenant lookup), Redis (rate limit + dedup), Celery (enqueue) |
| **Agent Orchestrator** | Assemble agent prompt from persona + memory + conversation history; call LLM; execute tools; emit response back to channel | LLM Backend Pool, Tool Executor, Memory Layer (Redis + pgvector), Channel Gateway (outbound) |
| **LLM Backend Pool** | Route LLM calls across Ollama/Anthropic/OpenAI with fallback, retry, and cost tracking | Ollama (local HTTP), Anthropic API, OpenAI API |
| **Tool Executor** | Maintain tool registry; execute tool calls from agent; return results; log every invocation for audit | External APIs (per-tool), PostgreSQL (audit log) |
| **Memory Layer** | Short-term: Redis sliding window for recent messages; Long-term: pgvector for semantic retrieval of past conversations | Redis, PostgreSQL (pgvector extension) |
| **Admin Portal** | UI for tenant CRUD, agent configuration, channel setup, billing, and usage monitoring | FastAPI REST API (authenticated) |
| **Billing Service** | Handle Stripe webhooks; update tenant subscription state; enforce feature limits based on plan | Stripe, PostgreSQL (subscription state) |

---

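
The KonstructMessage envelope that the gateway emits and the rest of the pipeline consumes can be sketched as follows. The project defines this as a Pydantic model in `shared/models/message.py`; the sketch below uses stdlib dataclasses so it runs without dependencies, and any field not named in the diagrams (`text`, `thread_id`) is an illustrative assumption.

```python
# Illustrative sketch of the KonstructMessage envelope (shared/models/message.py).
# The project uses Pydantic; stdlib dataclasses keep this sketch dependency-free.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class KonstructMessage:
    id: str                          # channel-native event id (also the dedup key)
    channel: str                     # "slack" | "whatsapp"
    text: str                        # normalized message body (assumed field name)
    thread_id: str                   # conversation identifier within the channel
    channel_metadata: dict = field(default_factory=dict)  # e.g. {"workspace_id": "T123ABC"}
    tenant_id: Optional[str] = None  # resolved later by the Message Router

    def model_dump(self) -> dict:
        """Mirror of Pydantic's model_dump() so call sites read the same."""
        return {
            "id": self.id,
            "channel": self.channel,
            "text": self.text,
            "thread_id": self.thread_id,
            "channel_metadata": self.channel_metadata,
            "tenant_id": self.tenant_id,
        }
```

Note that `tenant_id` defaults to `None`: the gateway deliberately does not resolve tenancy, matching the thin-gateway boundary described above.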
## Recommended Project Structure

```
konstruct/
├── packages/
│   ├── gateway/                 # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   ├── slack.py         # Slack Events API handler (HTTP mode)
│   │   │   └── whatsapp.py      # WhatsApp Cloud API webhook handler
│   │   ├── normalize.py         # → KonstructMessage
│   │   ├── verify.py            # Signature verification per channel
│   │   └── main.py              # FastAPI app, routes
│   │
│   ├── router/                  # Message Router service (FastAPI)
│   │   ├── tenant.py            # Channel org ID → tenant_id lookup
│   │   ├── ratelimit.py         # Redis token bucket per tenant
│   │   ├── idempotency.py       # Redis dedup (message_id key, TTL)
│   │   ├── context.py           # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/            # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py             # Celery task: handle_message
│   │   ├── agents/
│   │   │   ├── builder.py       # Assemble agent (persona + tools + memory)
│   │   │   └── runner.py        # LLM call loop (reason → tool → observe)
│   │   ├── memory/
│   │   │   ├── short_term.py    # Redis sliding window (last N messages)
│   │   │   └── long_term.py     # pgvector semantic search
│   │   ├── tools/
│   │   │   ├── registry.py      # Tool name → handler function mapping
│   │   │   ├── executor.py      # Async execution + audit logging
│   │   │   └── builtins/        # Built-in tools (web search, calendar, etc.)
│   │   └── main.py              # Worker entry point
│   │
│   ├── llm-pool/                # LLM Backend Pool (LiteLLM wrapper)
│   │   ├── router.py            # LiteLLM Router config (model groups)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py              # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/                  # Next.js 14 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/          # Login, signup routes
│   │   │   ├── dashboard/       # Post-auth layout
│   │   │   ├── tenants/         # Tenant management
│   │   │   ├── agents/          # Agent config
│   │   │   ├── billing/         # Stripe customer portal
│   │   │   └── api/             # Next.js API routes (thin proxy or auth only)
│   │   ├── components/          # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts           # TanStack Query hooks + API client
│   │       └── auth.ts          # NextAuth.js config
│   │
│   └── shared/                  # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py       # KonstructMessage Pydantic model
│       │   ├── tenant.py        # Tenant, Agent SQLAlchemy models
│       │   └── billing.py       # Subscription, Plan models
│       ├── db.py                # SQLAlchemy async engine + session factory
│       ├── rls.py               # SET app.current_tenant helper
│       └── config.py            # Pydantic Settings (env vars)
│
├── migrations/                  # Alembic (single migration history)
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docker-compose.yml           # All services + infra (Redis, PG, MinIO, Ollama)
└── pyproject.toml               # uv workspace config, shared deps
```

### Structure Rationale

- **packages/ per service:** Each directory is a standalone FastAPI app or Celery worker with its own `main.py`. The boundary maps to a Docker container. Services communicate over HTTP or Celery/Redis, not in-process imports.
- **shared/:** Common Pydantic models and SQLAlchemy models live here to prevent duplication and drift. No business logic — only types, the DB session factory, and config.
- **gateway/ channels/:** Each channel adapter is a separate file, so adding a new channel (e.g., Telegram in v2) is an isolated change with no blast radius.
- **orchestrator/ memory/:** Short-term and long-term memory are separate modules because they have different backends, eviction policies, and query semantics.
- **portal/ app/:** Next.js App Router route grouping, with `(auth)` for pre-auth pages and `dashboard/` for post-auth, so layout boundaries are explicit.

---

## Architectural Patterns

### Pattern 1: Immediate-Acknowledge, Async-Process

**What:** The Channel Gateway returns HTTP 200 to Slack/WhatsApp within 3 seconds, without performing any LLM work. The actual processing is dispatched to Celery.

**When to use:** Always. Slack will retry the event and flag your app as unhealthy if it doesn't receive a 2xx within 3 seconds. The WhatsApp Cloud API requires sub-20s acknowledgment.

**Trade-offs:** Adds a Celery + Redis infrastructure requirement. The response to the user is sent as a follow-up message, not as the HTTP response — this is intentional and matches how Slack/WhatsApp users expect bots to behave anyway (typing indicator → message appears).

**Example:**

```python
# gateway/channels/slack.py
from slack_bolt.async_app import AsyncApp

# normalize_slack, is_duplicate, and handle_message_task come from the
# gateway/ and orchestrator/ packages described in the project structure.
app = AsyncApp()  # reads SLACK_BOT_TOKEN / SLACK_SIGNING_SECRET from the environment

@app.event("message")
async def handle_message(event, say, client):
    # 1. Normalize immediately
    msg = normalize_slack(event)
    # 2. Verify idempotency (skip duplicate events)
    if await is_duplicate(msg.id):
        return
    # 3. Enqueue for async processing — DO NOT call LLM here
    handle_message_task.delay(msg.model_dump())
    # Gateway returns 200 implicitly — Slack is satisfied
```

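
The `is_duplicate` check in step 2 is, in production, a single atomic Redis `SET key 1 NX EX 86400`. A dependency-free sketch of the same keep-for-TTL logic (shown synchronously, with an in-memory dict standing in for Redis) makes the semantics concrete:

```python
# Sketch of the idempotency check. In production: Redis SET <id> 1 NX EX 86400,
# which records the id and reports whether it already existed, atomically.
import time

_seen: dict[str, float] = {}  # message_id -> expiry timestamp (stands in for Redis)

def is_duplicate(message_id: str, ttl: float = 86_400.0) -> bool:
    """Return True if message_id was seen within the TTL; otherwise record it."""
    now = time.monotonic()
    expires_at = _seen.get(message_id)
    if expires_at is not None and expires_at > now:
        return True
    _seen[message_id] = now + ttl  # equivalent to SET ... NX EX ttl
    return False
```

The first call for a given id records it and returns `False`; every retry of the same Slack event within the TTL returns `True` and is dropped.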
### Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook

**What:** Set `app.current_tenant` on the PostgreSQL connection immediately after acquiring it from the pool. RLS policies use this setting to filter every query automatically, so application code never manually adds `WHERE tenant_id = ...`.

**When to use:** Every DB interaction in the Message Router and Agent Orchestrator.

**Trade-offs:** Requires careful pool management — connections must be reset before returning to the pool. The `sqlalchemy-tenants` library or a custom `before_cursor_execute` event listener handles this.

**Example:**

```python
# shared/rls.py
from sqlalchemy import event

from shared.db import engine  # async engine; listeners attach to its sync facade

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = get_current_tenant_id()  # from contextvars
    if tenant_id:
        # Parameterized set_config() instead of a string-formatted SET:
        # avoids SQL injection through tenant_id
        cursor.execute(
            "SELECT set_config('app.current_tenant', %s, false)", (str(tenant_id),)
        )
```

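
The `get_current_tenant_id()` helper the listener relies on is a thin wrapper over a `contextvars.ContextVar`, so each async task or thread sees only the tenant the Message Router set for it. A minimal sketch (function names are the ones assumed above; the setter name is an assumption):

```python
# Tenant-context helper assumed by the RLS listener: the Message Router calls
# set_current_tenant() once per task; get_current_tenant_id() reads it back,
# isolated per async task / thread via contextvars.
from contextvars import ContextVar
from typing import Optional

_current_tenant: ContextVar[Optional[str]] = ContextVar("current_tenant", default=None)

def set_current_tenant(tenant_id: str) -> None:
    _current_tenant.set(tenant_id)

def get_current_tenant_id() -> Optional[str]:
    return _current_tenant.get()
```

Because contextvars propagate into `await`-ed coroutines but not across unrelated tasks, a tenant set in one Celery task's context can never leak into another's DB session.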
### Pattern 3: Two-Tier Agent Memory

**What:** Combine Redis (fast, ephemeral) for short-term context and pgvector (persistent, semantic) for long-term recall. The agent always has the last N messages in context (Redis sliding window). For deeper history, the orchestrator optionally queries pgvector for semantically similar past exchanges.

**When to use:** Every agent invocation. Short-term is mandatory; long-term retrieval is triggered when the conversation references past events or when context window pressure requires compressing history.

**Trade-offs:** Two backends to operate and keep in sync. A background Celery task flushes Redis conversation state to PostgreSQL/pgvector asynchronously — if it fails, recent messages may not be permanently indexed, but conversation continuity is preserved by Redis until the flush succeeds.

**Example flow:**

```
User message arrives
  → Load last 20 messages from Redis (short-term)
  → Optionally: similarity search pgvector for relevant past conversations
  → Build context window: [system prompt] + [retrieved history] + [recent messages]
  → LLM call
  → Append response to Redis sliding window
  → Background task: embed + store to pgvector
```

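
The short-term tier's keep-last-N behavior (in production a Redis `LPUSH` + `LTRIM` on a tenant-namespaced key, as in `orchestrator/memory/short_term.py`) can be sketched with a bounded deque; the class name is illustrative:

```python
# Sketch of the short-term sliding window. A deque with maxlen reproduces the
# Redis LPUSH + LTRIM keep-last-N semantics: appending past capacity silently
# evicts the oldest message.
from collections import deque

class SlidingWindowMemory:
    def __init__(self, max_messages: int = 20):
        self._window: deque[dict] = deque(maxlen=max_messages)

    def append(self, message: dict) -> None:
        self._window.append(message)   # LPUSH + LTRIM equivalent

    def recent(self) -> list[dict]:
        return list(self._window)      # oldest → newest, for prompt assembly
```

Eviction here is purely positional; anything older than the last N messages is only recoverable through the long-term pgvector tier.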
### Pattern 4: LiteLLM Router as Internal Singleton

**What:** The LLM Backend Pool exposes a single internal HTTP endpoint (`/complete`). All orchestrator workers call this endpoint. The LiteLLM Router behind it handles provider selection, fallback chains, and cost tracking without the orchestrator needing to know which model is used.

**When to use:** All LLM calls. Never call the Anthropic/OpenAI SDKs directly from the orchestrator — always go through the pool.

**Trade-offs:** Adds one network hop per LLM call. This is acceptable — LiteLLM's own benchmarks show 8ms P95 overhead at 1k RPS.

**Configuration example:**

```python
# llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "fast", "litellm_params": {"model": "ollama/qwen3:8b", "api_base": "http://ollama:11434"}},
        {"model_name": "quality", "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"}},
        # Same model_name → second deployment in the "quality" group (load-balanced)
        {"model_name": "quality", "litellm_params": {"model": "openai/gpt-4o"}},
    ],
    fallbacks=[{"quality": ["fast"]}],  # if the quality group fails, fall back to fast
    routing_strategy="latency-based-routing",
)
```

---

## Data Flow

### Inbound Message Flow (Happy Path)

```
User sends message in Slack
  │
  ▼ HTTPS POST (Events API, HTTP mode)
Channel Gateway
  │ verify Slack signature (X-Slack-Signature)
  │ normalize → KonstructMessage(id, tenant_id=None, channel=slack, ...)
  │ check Redis idempotency key (message_id) → not seen
  │ set Redis idempotency key (TTL 24h)
  │ enqueue Celery task: handle_message(message)
  │ return HTTP 200 immediately
  ▼
Celery Broker (Redis)
  │
  ▼
Agent Orchestrator (Celery worker)
  │ resolve tenant_id from channel_metadata.workspace_id → PostgreSQL
  │ load agent config (persona, model preference, tools) → PostgreSQL (RLS-scoped)
  │ load short-term memory → Redis (last 20 messages for this thread_id)
  │ optionally query pgvector for relevant past context
  │ assemble prompt: system_prompt + memory + current message
  │ POST /complete → LLM Backend Pool
  │ LiteLLM Router selects provider, executes, returns response
  │ parse response: text reply OR tool_call
  │ if tool_call: Tool Executor.run(tool_name, args) → external API → result
  │ append result to prompt, re-call LLM if needed
  │ write KonstructMessage + response to PostgreSQL (audit)
  │ update Redis sliding window with new messages
  │ background: embed messages → pgvector
  ▼
Channel Gateway (outbound)
  │ POST message back to Slack via slack-sdk client.chat_postMessage()
  ▼
User sees response in Slack
```

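
The "verify Slack signature" step above follows Slack's documented v0 scheme: HMAC-SHA256 over `v0:{timestamp}:{raw body}` with the app's signing secret, compared against the `X-Slack-Signature` header. A stdlib sketch of what `gateway/verify.py` would do (function name is illustrative):

```python
# Slack request signature verification (X-Slack-Signature, v0 scheme):
# HMAC-SHA256 over "v0:<X-Slack-Request-Timestamp>:<raw body>" with the
# signing secret, hex-encoded and prefixed with "v0=".
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str, body: bytes,
                           signature: str, max_age: int = 300) -> bool:
    # Reject stale timestamps to block replay attacks
    if abs(time.time() - int(timestamp)) > max_age:
        return False
    basestring = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison prevents timing attacks
    return hmac.compare_digest(expected, signature)
```

The raw request body must be used before any JSON parsing; re-serializing the parsed payload will not round-trip byte-for-byte and the digest will not match.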
### Tenant Resolution Flow

```
KonstructMessage.channel_metadata = {"workspace_id": "T123ABC"}
  │
  ▼ Router: SELECT tenant_id FROM channel_connections
           WHERE channel_type = 'slack' AND external_org_id = 'T123ABC'
  │
  ▼ tenant_id resolved → stored in Python contextvar for RLS
  │
All subsequent DB queries automatically scoped by RLS policy:
  CREATE POLICY tenant_isolation ON agents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);
```

### Admin Portal Data Flow

```
Browser → Next.js App Router (RSC or client component)
  │ TanStack Query useQuery / useMutation
  ▼
FastAPI REST API (authenticated endpoint)
  │ JWT verification (NextAuth.js token)
  │ Tenant scope enforced (user.tenant_id from token)
  ▼
PostgreSQL (RLS active: queries scoped to token's tenant)
```

### Billing Event Flow

```
Stripe subscription event (checkout.session.completed, etc.)
  │ HTTPS POST to /webhooks/stripe
  ▼
Billing endpoint (FastAPI)
  │ verify Stripe webhook signature (stripe-signature header)
  │ parse event type
  │ update tenants.subscription_status, plan_tier in PostgreSQL
  │ if downgrade: update agent count limit, feature flags
  ▼
Next message processed by Router picks up new plan limits
```

---

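
The Stripe signature check in the flow above uses Stripe's v1 scheme: the `stripe-signature` header carries `t=<timestamp>,v1=<hexdigest>`, where the digest is HMAC-SHA256 over `{t}.{raw payload}` with the endpoint secret. In production you would call `stripe.Webhook.construct_event` from the official SDK; the stdlib sketch below (which omits the SDK's timestamp-tolerance check) shows what it verifies:

```python
# Stripe webhook signature verification, v1 scheme: HMAC-SHA256 over
# "<t>.<raw payload>" with the endpoint secret, compared to the v1= value
# in the stripe-signature header. Timestamp-tolerance check omitted here.
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, header: str, secret: str) -> bool:
    parts = dict(kv.split("=", 1) for kv in header.split(","))
    signed_payload = parts["t"].encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parts["v1"])
```

As with Slack, the raw request body must be verified before JSON parsing, and handlers should stay idempotent since Stripe redelivers events.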
## Integration Points

### External Services

| Service | Integration Pattern | Key Requirements | Notes |
|---------|---------------------|------------------|-------|
| **Slack** | HTTP Events API (webhook) — NOT Socket Mode | Public HTTPS URL with valid TLS; respond 200 within 3s | Socket Mode only for local dev. HTTP required for production and any future Marketplace distribution. Slack explicitly recommends HTTP for production reliability. |
| **WhatsApp Cloud API** | Meta webhook (HTTPS POST) | TLS required (no self-signed); verify token for subscription; 200 response within 20s | Meta has fully deprecated the on-premise option. Cloud API is now the only supported path. |
| **LiteLLM** | In-process Python SDK OR sidecar HTTP proxy | Ollama running as Docker service; Anthropic/OpenAI API keys | Run as a separate service for isolation, or as an embedded router in the orchestrator. Separate service recommended for cost tracking and rate limiting. |
| **Stripe** | Webhook (HTTPS POST) | Signature verification via `stripe.WebhookSignature`; idempotent event handlers | Use Stripe's hosted billing portal for self-service plan changes — avoids building custom subscription UI. |
| **Ollama** | HTTP (Docker network) | GPU passthrough optional; accessible on internal Docker network | `http://ollama:11434` on the compose network. No auth required on the internal network. |

### Internal Service Boundaries

| Boundary | Communication | Protocol | Notes |
|----------|---------------|----------|-------|
| Gateway → Router | Direct HTTP POST (on same Docker network) or shared Celery queue | HTTP or Celery | For v1 simplicity, Gateway can enqueue directly to Celery, bypassing a separate Router HTTP call |
| Router → Orchestrator | Celery task via Redis broker | Celery/Redis | Decouples ingress from processing; enables retries, dead-letter queue, and horizontal scaling of workers |
| Orchestrator → LLM Pool | Internal HTTP POST | HTTP (FastAPI) | Keeps LLM routing concerns isolated; allows pool to be scaled independently |
| Orchestrator → Channel Gateway (outbound) | Direct Slack/WhatsApp SDK calls | HTTPS (external) | Orchestrator holds channel credentials and calls the appropriate SDK directly for responses |
| Portal → API | REST over HTTPS | HTTP (FastAPI) | Portal never accesses DB directly — all reads/writes through authenticated API |
| Any service → PostgreSQL | SQLAlchemy async (asyncpg driver) | TCP | RLS enforced; tenant context set before every query |
| Any service → Redis | aioredis / redis-py async | TCP | Namespaced by tenant_id to prevent accidental cross-tenant access |

---

## Build Order (Dependency Graph)

Building the wrong component first creates integration debt. The correct order:

```
Phase 1 — Foundation (build this first)
│
├── 1. Shared models + DB schema (Pydantic models, SQLAlchemy models, Alembic migrations)
│       └── Required by: every other service
│
├── 2. PostgreSQL + Redis + Docker Compose dev environment
│       └── Required by: everything
│
├── 3. Channel Gateway — Slack adapter only
│       └── Unblocks: end-to-end message flow testing
│
├── 4. Message Router — tenant resolution + rate limiting
│       └── Unblocks: scoped agent invocation
│
├── 5. LLM Backend Pool — LiteLLM with Ollama + Anthropic
│       └── Unblocks: agent can actually generate responses
│
└── 6. Agent Orchestrator — single agent, no tools, no memory
        └── First working end-to-end: Slack message → LLM response → Slack reply

Phase 2 — Feature Completeness
│
├── 7. Memory Layer (Redis short-term + pgvector long-term)
│       └── Depends on: working orchestrator
│
├── 8. Tool Framework (registry + executor + first built-in tools)
│       └── Depends on: working orchestrator
│
├── 9. WhatsApp channel adapter in Gateway
│       └── Mostly isolated: same normalize.py, new channel handler
│
├── 10. Admin Portal (Next.js) — tenant CRUD + agent config
│       └── Depends on: stable DB schema (stabilizes after step 8)
│
└── 11. Billing integration (Stripe webhooks + subscription enforcement)
        └── Depends on: tenant model, admin portal
```

**Key dependency insight:** Steps 1-6 must be strictly sequential. Steps 7-11 can overlap after step 6 is working, but the portal (10) and billing (11) should not be started until the DB schema is stable, which happens after memory and tools are defined (steps 7-8).

---

## Scaling Considerations

| Scale | Architecture Adjustments |
|-------|--------------------------|
| 0-100 tenants (beta) | Single Docker Compose host. One Celery worker process. All services on same machine. PostgreSQL RLS sufficient. |
| 100-1k tenants | Scale Celery workers horizontally (multiple replicas). Separate Redis for broker vs. cache. Add connection pooling (PgBouncer). Consider moving Ollama to a dedicated GPU host. |
| 1k-10k tenants | Kubernetes (k3s). Multiple Gateway replicas behind a load balancer. Celery worker auto-scaling. PostgreSQL read replica for analytics/portal queries. Qdrant for vector search at scale (pgvector starts to slow above ~1M embeddings). |
| 10k+ tenants | Schema-per-tenant for Enterprise tier. Dedicated inference cluster. Multi-region PostgreSQL (Citus or regional replicas). |

### Scaling Priorities

1. **First bottleneck:** Celery workers during LLM call bursts. LLM calls are slow (2-30s), so workers pile up. Fix: increase worker count, implement per-tenant concurrency limits, add request coalescing for burst traffic.
2. **Second bottleneck:** PostgreSQL connection exhaustion under concurrent tenant load. Fix: PgBouncer transaction-mode pooling. This is critical early because each Celery worker opens its own SQLAlchemy async session.
3. **Third bottleneck:** pgvector query latency as embedding count grows. Fix: HNSW index tuning, then migrate to Qdrant for the vector tier while keeping PostgreSQL for structured data.

---

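
The per-tenant token bucket referenced above (in `router/ratelimit.py` and in the first-bottleneck fix) is usually a small atomic script against Redis; a pure-Python sketch shows the refill-and-spend algorithm itself (class and method names are illustrative):

```python
# Token bucket rate limiting: tokens refill continuously at `rate` per second
# up to `capacity` (the burst size); each request spends one token. In
# production the state lives in Redis, one bucket per tenant_id.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # caller rejects or defers the tenant's request
```

With Redis, the read-refill-spend sequence must be atomic (typically a short Lua script) so that concurrent workers serving the same tenant cannot double-spend tokens.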
## Anti-Patterns

### Anti-Pattern 1: Doing LLM Work Inside the Webhook Handler

**What people do:** Call the LLM synchronously inside the Slack event handler or WhatsApp webhook endpoint and return the AI response as the HTTP reply.

**Why it's wrong:** Slack requires HTTP 200 within 3 seconds. OpenAI/Anthropic calls routinely take 5-30 seconds. The webhook times out, Slack retries the event (causing duplicate processing), and the app gets flagged as unreliable.

**Do this instead:** Acknowledge immediately (HTTP 200), enqueue to Celery, and send the AI response as a follow-up message via the channel API.

### Anti-Pattern 2: Shared Redis Namespace Across Tenants

**What people do:** Store conversation history as `redis.set("history:{thread_id}", ...)` without scoping by tenant.

**Why it's wrong:** `thread_id` values can collide between tenants (e.g., two tenants both have Slack thread `C123/T456`). Tenant A reads Tenant B's conversation history.

**Do this instead:** Always namespace Redis keys as `{tenant_id}:{key_type}:{resource_id}`. Example: `tenant_abc123:history:slack_C123_T456`.

### Anti-Pattern 3: Calling LLM Providers Directly from Orchestrator

**What people do:** Import the `anthropic` SDK directly in the orchestrator and call `anthropic.messages.create(...)`.

**Why it's wrong:** Bypasses the LiteLLM router, losing fallback behavior, cost tracking, rate limit enforcement, and the ability to switch providers without touching orchestrator code.

**Do this instead:** All LLM calls go through the LLM Backend Pool service (or an embedded LiteLLM Router). The orchestrator sends a generic `complete(messages, model_group="quality")` call and the pool handles provider selection.

### Anti-Pattern 4: Fat Channel Gateway

**What people do:** Add tenant resolution, rate limiting, and business logic to the gateway service to "simplify" the architecture.

**Why it's wrong:** The gateway must respond in under 3 seconds and must stay stateless to handle channel-specific webhook verification. Mixing in business logic couples the gateway to your domain model and makes it impossible to scale independently.

**Do this instead:** The gateway does exactly three things: verify signature, normalize message, enqueue. All business logic lives downstream.

### Anti-Pattern 5: Embedding Agent Memory in PostgreSQL Without an Index

**What people do:** Store conversation embeddings in a `vector` column in PostgreSQL and run similarity queries without an HNSW or IVFFlat index.

**Why it's wrong:** pgvector without an index performs sequential scans. With more than ~50k embeddings per tenant, queries slow to seconds.

**Do this instead:** Create HNSW indexes on vector columns from the start. `CREATE INDEX ON conversation_embeddings USING hnsw (embedding vector_cosine_ops);`

---

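
The namespacing rule from Anti-Pattern 2 is worth centralizing in one helper so no call site hand-builds keys; a minimal sketch (the function name is an assumption):

```python
# Single place that builds tenant-scoped Redis keys, so colliding thread ids
# across tenants can never alias each other's data.
def redis_key(tenant_id: str, key_type: str, resource_id: str) -> str:
    if not tenant_id:
        raise ValueError("refusing to build an un-scoped Redis key")
    return f"{tenant_id}:{key_type}:{resource_id}"
```

Routing every Redis access through a helper like this also gives one choke point for auditing key patterns during a tenant-isolation review.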
## Sources

- [Slack: Comparing HTTP and Socket Mode](https://docs.slack.dev/apis/events-api/comparing-http-socket-mode/) — MEDIUM confidence (official Slack docs, accessed 2026-03-22)
- [Slack: Using Socket Mode](https://docs.slack.dev/apis/events-api/using-socket-mode/) — HIGH confidence (official)
- [LiteLLM: Router Architecture](https://docs.litellm.ai/docs/router_architecture) — HIGH confidence (official LiteLLM docs)
- [LiteLLM: Routing and Load Balancing](https://docs.litellm.ai/docs/routing) — HIGH confidence (official)
- [Redis: AI Agent Memory Architecture](https://redis.io/blog/ai-agent-memory-stateful-systems/) — MEDIUM confidence (official Redis blog)
- [Redis: AI Agent Architecture 2026](https://redis.io/blog/ai-agent-architecture/) — MEDIUM confidence (official Redis blog)
- [Crunchy Data: Row Level Security for Tenants](https://www.crunchydata.com/blog/row-level-security-for-tenants-in-postgres) — HIGH confidence (authoritative PostgreSQL resource)
- [AWS: Multi-Tenant Data Isolation with PostgreSQL RLS](https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/) — MEDIUM confidence
- [DEV Community: Building WhatsApp Business Bots](https://dev.to/achiya-automation/building-whatsapp-business-bots-with-the-official-api-architecture-webhooks-and-automation-1ce4) — LOW confidence (community post)
- [ChatArchitect: Scalable Webhook Architecture for WhatsApp](https://www.chatarchitect.com/news/building-a-scalable-webhook-architecture-for-custom-whatsapp-solutions) — LOW confidence (community)
- [PyWa documentation](https://pywa.readthedocs.io/en/1.6.0/) — MEDIUM confidence (library docs)
- [fast.io: Multi-Tenant AI Agent Architecture](https://fast.io/resources/ai-agent-multi-tenant-architecture/) — LOW confidence (vendor blog)
- [Microsoft Learn: AI Agent Orchestration Patterns](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns) — MEDIUM confidence
- [DEV Community: Webhooks at Scale — Idempotency](https://dev.to/art_light/webhooks-at-scale-designing-an-idempotent-replay-safe-and-observable-webhook-system-7lk) — LOW confidence (community post, pattern well-corroborated)

---

*Architecture research for: Konstruct — channel-native AI workforce platform*
*Researched: 2026-03-22*