# Architecture Research
**Domain:** Channel-native AI workforce platform (multi-tenant, messaging-channel-first)
**Researched:** 2026-03-22
**Confidence:** HIGH (core patterns verified against official Slack docs, LiteLLM docs, pgvector community resources, and multiple production-pattern sources)
---
## Standard Architecture
### System Overview
```
External Channels (Slack, WhatsApp)
             │ HTTPS webhooks / Events API
             ▼
┌─────────────────────────────────────────────────────────────┐
│ INGRESS LAYER                                               │
│ ┌─────────────────────┐  ┌─────────────────────────────┐    │
│ │ Channel Gateway     │  │ Stripe Webhook Endpoint     │    │
│ │ (FastAPI service)   │  │ (billing events)            │    │
│ └──────────┬──────────┘  └─────────────────────────────┘    │
└────────────│────────────────────────────────────────────────┘
             │ Normalized KonstructMessage
             ▼
┌─────────────────────────────────────────────────────────────┐
│ MESSAGE ROUTING LAYER                                       │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Message Router                                       │    │
│ │ - Tenant resolution (channel org → tenant_id)        │    │
│ │ - Per-tenant rate limiting (Redis token bucket)      │    │
│ │ - Context loading (tenant config, agent config)      │    │
│ │ - Idempotency check (Redis dedup key)                │    │
│ └────────────────────────┬─────────────────────────────┘    │
└──────────────────────────│──────────────────────────────────┘
                           │ Enqueued task (Celery)
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ AGENT ORCHESTRATION LAYER                                   │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Agent Orchestrator (per-tenant Celery worker)        │    │
│ │ - Agent context assembly (persona, tools, memory)    │    │
│ │ - Conversation history retrieval (Redis + pgvector)  │    │
│ │ - LLM call dispatch → LLM Backend Pool               │    │
│ │ - Tool execution (registry lookup + run)             │    │
│ │ - Response routing back to originating channel       │    │
│ └──────────────────────────────────────────────────────┘    │
└───────────────────┬────────────────────────┬────────────────┘
                    │                        │
                    ▼                        ▼
┌────────────────────────┐   ┌──────────────────────────────┐
│ LLM BACKEND POOL       │   │ TOOL EXECUTOR                │
│                        │   │ - Registry (tool → handler)  │
│ LiteLLM Router         │   │ - Execution (async/sync)     │
│ ├── Ollama (local)     │   │ - Result capture + logging   │
│ ├── Anthropic API      │   └──────────────────────────────┘
│ └── OpenAI API         │
└────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ DATA LAYER                                                  │
│ ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│ │ PostgreSQL   │  │ Redis        │  │ MinIO / S3        │   │
│ │ (+ pgvector  │  │ (sessions,   │  │ (file attach.,    │   │
│ │  + RLS)      │  │  rate limit, │  │  agent artifacts) │   │
│ │              │  │  task queue, │  │                   │   │
│ │ - tenants    │  │  pub/sub)    │  │                   │   │
│ │ - agents     │  └──────────────┘  └───────────────────┘   │
│ │ - messages   │                                            │
│ │ - tools      │                                            │
│ │ - billing    │                                            │
│ └──────────────┘                                            │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ ADMIN PORTAL                                                │
│ Next.js 14 App Router (separate deployment)                 │
│ - Tenant management, agent config, billing, monitoring      │
│ - Reads/writes to FastAPI REST API (auth via JWT)           │
└─────────────────────────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Responsibility | Communicates With |
|-----------|----------------|-------------------|
| **Channel Gateway** | Receive and verify inbound webhooks from Slack/WhatsApp; normalize to KonstructMessage; acknowledge within 3s | Message Router (HTTP or enqueue), Redis (idempotency) |
| **Message Router** | Resolve tenant from channel metadata; rate-limit per tenant; load tenant/agent context; enqueue to Celery | PostgreSQL (tenant lookup), Redis (rate limit + dedup), Celery (enqueue) |
| **Agent Orchestrator** | Assemble agent prompt from persona + memory + conversation history; call LLM; execute tools; emit response back to channel | LLM Backend Pool, Tool Executor, Memory Layer (Redis + pgvector), Channel Gateway (outbound) |
| **LLM Backend Pool** | Route LLM calls across Ollama/Anthropic/OpenAI with fallback, retry, and cost tracking | Ollama (local HTTP), Anthropic API, OpenAI API |
| **Tool Executor** | Maintain tool registry; execute tool calls from agent; return results; log every invocation for audit | External APIs (per-tool), PostgreSQL (audit log) |
| **Memory Layer** | Short-term: Redis sliding window for recent messages; Long-term: pgvector for semantic retrieval of past conversations | Redis, PostgreSQL (pgvector extension) |
| **Admin Portal** | UI for tenant CRUD, agent configuration, channel setup, billing, and usage monitoring | FastAPI REST API (authenticated) |
| **Billing Service** | Handle Stripe webhooks; update tenant subscription state; enforce feature limits based on plan | Stripe, PostgreSQL (subscription state) |
---
## Recommended Project Structure
```
konstruct/
├── packages/
│   ├── gateway/                 # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   ├── slack.py         # Slack Events API handler (HTTP mode)
│   │   │   └── whatsapp.py      # WhatsApp Cloud API webhook handler
│   │   ├── normalize.py         # → KonstructMessage
│   │   ├── verify.py            # Signature verification per channel
│   │   └── main.py              # FastAPI app, routes
│   │
│   ├── router/                  # Message Router service (FastAPI)
│   │   ├── tenant.py            # Channel org ID → tenant_id lookup
│   │   ├── ratelimit.py         # Redis token bucket per tenant
│   │   ├── idempotency.py       # Redis dedup (message_id key, TTL)
│   │   ├── context.py           # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/            # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py             # Celery task: handle_message
│   │   ├── agents/
│   │   │   ├── builder.py       # Assemble agent (persona + tools + memory)
│   │   │   └── runner.py        # LLM call loop (reason → tool → observe)
│   │   ├── memory/
│   │   │   ├── short_term.py    # Redis sliding window (last N messages)
│   │   │   └── long_term.py     # pgvector semantic search
│   │   ├── tools/
│   │   │   ├── registry.py      # Tool name → handler function mapping
│   │   │   ├── executor.py      # Async execution + audit logging
│   │   │   └── builtins/        # Built-in tools (web search, calendar, etc.)
│   │   └── main.py              # Worker entry point
│   │
│   ├── llm-pool/                # LLM Backend Pool (LiteLLM wrapper)
│   │   ├── router.py            # LiteLLM Router config (model groups)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py              # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/                  # Next.js 14 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/          # Login, signup routes
│   │   │   ├── dashboard/       # Post-auth layout
│   │   │   ├── tenants/         # Tenant management
│   │   │   ├── agents/          # Agent config
│   │   │   ├── billing/         # Stripe customer portal
│   │   │   └── api/             # Next.js API routes (thin proxy or auth only)
│   │   ├── components/          # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts           # TanStack Query hooks + API client
│   │       └── auth.ts          # NextAuth.js config
│   │
│   └── shared/                  # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py       # KonstructMessage Pydantic model
│       │   ├── tenant.py        # Tenant, Agent SQLAlchemy models
│       │   └── billing.py       # Subscription, Plan models
│       ├── db.py                # SQLAlchemy async engine + session factory
│       ├── rls.py               # SET app.current_tenant helper
│       └── config.py            # Pydantic Settings (env vars)
├── migrations/                  # Alembic (single migration history)
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docker-compose.yml           # All services + infra (Redis, PG, MinIO, Ollama)
└── pyproject.toml               # uv workspace config, shared deps
```
### Structure Rationale
- **packages/ per service:** Each directory is a standalone FastAPI app or Celery worker with its own `main.py`. The boundary maps to a Docker container. Services communicate over HTTP or Celery/Redis, not in-process imports.
- **shared/:** Common Pydantic models and SQLAlchemy models live here to prevent duplication and drift. No business logic — only types, DB session factory, and config.
- **gateway/ channels/:** Each channel adapter is a separate file so adding a new channel (e.g., Telegram in v2) is an isolated change with no blast radius.
- **orchestrator/ memory/:** Short-term and long-term memory are separate modules because they have different backends, eviction policies, and query semantics.
- **portal/ app/:** Next.js App Router route grouping with `(auth)` for pre-auth pages and `dashboard/` for post-auth so layout boundaries are explicit.
---
## Architectural Patterns
### Pattern 1: Immediate-Acknowledge, Async-Process
**What:** The Channel Gateway returns HTTP 200 to Slack/WhatsApp within 3 seconds, without performing any LLM work. The actual processing is dispatched to Celery.
**When to use:** Always. Slack will retry and flag your app as unhealthy if it doesn't receive a 2xx within 3 seconds. WhatsApp Cloud API requires sub-20s acknowledgment.
**Trade-offs:** Adds Celery + Redis infrastructure requirement. The response to the user is sent as a follow-up message, not as the HTTP response — this is intentional and matches how Slack/WhatsApp users expect bots to behave anyway (typing indicator → message appears).
**Example:**
```python
# gateway/channels/slack.py
from slack_bolt.async_app import AsyncApp

from gateway.normalize import normalize_slack
from router.idempotency import is_duplicate
from orchestrator.tasks import handle_message_task

app = AsyncApp()  # bot token + signing secret from env vars

@app.event("message")
async def handle_message(event, say, client):
    # 1. Normalize immediately
    msg = normalize_slack(event)
    # 2. Verify idempotency (skip duplicate events)
    if await is_duplicate(msg.id):
        return
    # 3. Enqueue for async processing — DO NOT call LLM here
    handle_message_task.delay(msg.model_dump())
    # Gateway returns 200 implicitly — Slack is satisfied
```
### Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook
**What:** Set `app.current_tenant` on the PostgreSQL connection immediately after acquiring it from the pool. RLS policies use this setting to filter every query automatically, so application code never manually adds `WHERE tenant_id = ...`.
**When to use:** Every DB interaction in the Message Router and Agent Orchestrator.
**Trade-offs:** Requires careful pool management — connections must be reset before returning to the pool. The `sqlalchemy-tenants` library or a custom `before_cursor_execute` event listener handles this.
**Example:**
```python
# shared/rls.py
from contextvars import ContextVar

from sqlalchemy import event

from shared.db import engine

# Set by the Message Router after tenant resolution
current_tenant_id: ContextVar[str | None] = ContextVar("current_tenant_id", default=None)

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = current_tenant_id.get()  # from contextvars
    if tenant_id:
        # SET accepts no bind parameters, so use set_config() instead of
        # interpolating tenant_id into the SQL string
        cursor.execute("SELECT set_config('app.current_tenant', %s, false)", (tenant_id,))
```
### Pattern 3: Two-Layer Agent Memory
**What:** Combine Redis (fast, ephemeral) for short-term context and pgvector (persistent, semantic) for long-term recall. The agent always has the last N messages in context (Redis sliding window). For deeper history, the orchestrator optionally queries pgvector for semantically similar past exchanges.
**When to use:** Every agent invocation. Short-term is mandatory; long-term retrieval is triggered when conversation references past events or when context window pressure requires compressing history.
**Trade-offs:** Two backends to operate and keep in sync. A background Celery task flushes Redis conversation state to PostgreSQL/pgvector asynchronously — if it fails, recent messages may not be permanently indexed, but conversation continuity is preserved by Redis until flush succeeds.
**Example flow:**
```
User message arrives
→ Load last 20 messages from Redis (short-term)
→ Optionally: similarity search pgvector for relevant past conversations
→ Build context window: [system prompt] + [retrieved history] + [recent messages]
→ LLM call
→ Append response to Redis sliding window
→ Background task: embed + store to pgvector
```
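The sliding-window side of this flow (`orchestrator/memory/short_term.py`) can be sketched as below. The class and key scheme are ours; `client` stands in for a Redis client exposing `lpush`/`ltrim`/`lrange`/`expire` (the async `redis.asyncio` variant would be awaited in production):

```python
import json

class ShortTermMemory:
    """Sliding window of the last `max_messages` messages per (tenant, thread).
    Keys are tenant-namespaced, per Anti-Pattern 2 below."""

    def __init__(self, client, max_messages: int = 20, ttl_s: int = 86400):
        self.client = client
        self.max = max_messages
        self.ttl = ttl_s

    def key(self, tenant_id: str, thread_id: str) -> str:
        return f"{tenant_id}:history:{thread_id}"

    def append(self, tenant_id: str, thread_id: str, role: str, text: str) -> None:
        k = self.key(tenant_id, thread_id)
        self.client.lpush(k, json.dumps({"role": role, "text": text}))
        self.client.ltrim(k, 0, self.max - 1)  # keep only the newest N entries
        self.client.expire(k, self.ttl)        # idle conversations decay

    def recent(self, tenant_id: str, thread_id: str) -> list[dict]:
        k = self.key(tenant_id, thread_id)
        items = self.client.lrange(k, 0, self.max - 1)
        return [json.loads(i) for i in reversed(items)]  # oldest → newest
```

`LPUSH` + `LTRIM` keeps the window bounded without a separate cleanup job; the background pgvector flush reads the same list before it is trimmed away.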
### Pattern 4: LiteLLM Router as Internal Singleton
**What:** The LLM Backend Pool exposes a single internal HTTP endpoint (`/complete`). All orchestrator workers call this endpoint. The LiteLLM Router behind it handles provider selection, fallback chains, and cost tracking without the orchestrator needing to know which model is used.
**When to use:** All LLM calls. Never call Anthropic/OpenAI SDKs directly from the orchestrator — always go through the pool.
**Trade-offs:** Adds one network hop per LLM call. This is acceptable — LiteLLM's own benchmarks show 8ms P95 overhead at 1k RPS.
**Configuration example:**
```python
# llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "fast", "litellm_params": {"model": "ollama/qwen3:8b", "api_base": "http://ollama:11434"}},
        {"model_name": "quality", "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"}},
        {"model_name": "quality", "litellm_params": {"model": "openai/gpt-4o"}},  # fallback
    ],
    fallbacks=[{"quality": ["fast"]}],  # cost-cap fallback
    routing_strategy="latency-based-routing",
)
```
---
## Data Flow
### Inbound Message Flow (Happy Path)
```
User sends message in Slack
    ▼  HTTPS POST (Events API, HTTP mode)
Channel Gateway
    │ verify Slack signature (X-Slack-Signature)
    │ normalize → KonstructMessage(id, tenant_id=None, channel=slack, ...)
    │ check Redis idempotency key (message_id) → not seen
    │ set Redis idempotency key (TTL 24h)
    │ enqueue Celery task: handle_message(message)
    │ return HTTP 200 immediately
    ▼
Celery Broker (Redis)
    ▼
Agent Orchestrator (Celery worker)
    │ resolve tenant_id from channel_metadata.workspace_id → PostgreSQL
    │ load agent config (persona, model preference, tools) → PostgreSQL (RLS-scoped)
    │ load short-term memory → Redis (last 20 messages for this thread_id)
    │ optionally query pgvector for relevant past context
    │ assemble prompt: system_prompt + memory + current message
    │ POST /complete → LLM Backend Pool
    │     LiteLLM Router selects provider, executes, returns response
    │ parse response: text reply OR tool_call
    │     if tool_call: Tool Executor.run(tool_name, args) → external API → result
    │     append result to prompt, re-call LLM if needed
    │ write KonstructMessage + response to PostgreSQL (audit)
    │ update Redis sliding window with new messages
    │ background: embed messages → pgvector
    ▼
Channel Gateway (outbound)
    │ POST message back to Slack via slack-sdk client.chat_postMessage()
    ▼
User sees response in Slack
```
### Tenant Resolution Flow
```
KonstructMessage.channel_metadata = {"workspace_id": "T123ABC"}
    ▼
Router: SELECT tenant_id FROM channel_connections
        WHERE channel_type = 'slack' AND external_org_id = 'T123ABC'
    ▼
tenant_id resolved → stored in Python contextvar for RLS

All subsequent DB queries automatically scoped by RLS policy:
    CREATE POLICY tenant_isolation ON agents
        USING (tenant_id = current_setting('app.current_tenant')::uuid);
```
### Admin Portal Data Flow
```
Browser → Next.js App Router (RSC or client component)
    │ TanStack Query useQuery / useMutation
    ▼
FastAPI REST API (authenticated endpoint)
    │ JWT verification (NextAuth.js token)
    │ tenant scope enforced (user.tenant_id from token)
    ▼
PostgreSQL (RLS active: queries scoped to token's tenant)
```
### Billing Event Flow
```
Stripe subscription event (checkout.session.completed, etc.)
    │ HTTPS POST to /webhooks/stripe
    ▼
Billing endpoint (FastAPI)
    │ verify Stripe webhook signature (Stripe-Signature header)
    │ parse event type
    │ update tenants.subscription_status, plan_tier in PostgreSQL
    │ if downgrade: update agent count limit, feature flags
    ▼
Next message processed by Router picks up new plan limits
```
---
## Integration Points
### External Services
| Service | Integration Pattern | Key Requirements | Notes |
|---------|---------------------|-----------------|-------|
| **Slack** | HTTP Events API (webhook) — NOT Socket Mode | Public HTTPS URL with valid TLS; respond 200 within 3s | Socket Mode only for local dev. HTTP required for production and any future Marketplace distribution. Slack explicitly recommends HTTP for production reliability. |
| **WhatsApp Cloud API** | Meta webhook (HTTPS POST) | TLS required (no self-signed); verify token for subscription; 200 response within 20s | Meta has fully deprecated on-premise option. Cloud API is now the only supported path. |
| **LiteLLM** | In-process Python SDK OR sidecar HTTP proxy | Ollama running as Docker service; Anthropic/OpenAI API keys | Run as a separate service for isolation, or as an embedded router in the orchestrator. Separate service recommended for cost tracking and rate limiting. |
| **Stripe** | Webhook (HTTPS POST) | Signature verification via `stripe.WebhookSignature`; idempotent event handlers | Use Stripe's hosted billing portal for self-service plan changes — avoids building custom subscription UI. |
| **Ollama** | HTTP (Docker network) | GPU passthrough optional; accessible on internal Docker network | `http://ollama:11434` on compose network. No auth required on internal network. |
### Internal Service Boundaries
| Boundary | Communication | Protocol | Notes |
|----------|---------------|----------|-------|
| Gateway → Router | Direct HTTP POST (on same Docker network) or shared Celery queue | HTTP or Celery | For v1 simplicity, Gateway can enqueue directly to Celery, bypassing a separate Router HTTP call |
| Router → Orchestrator | Celery task via Redis broker | Celery/Redis | Decouples ingress from processing; enables retries, dead-letter queue, and horizontal scaling of workers |
| Orchestrator → LLM Pool | Internal HTTP POST | HTTP (FastAPI) | Keeps LLM routing concerns isolated; allows pool to be scaled independently |
| Orchestrator → Channel Gateway (outbound) | Direct Slack/WhatsApp SDK calls | HTTPS (external) | Orchestrator holds channel credentials and calls the appropriate SDK directly for responses |
| Portal → API | REST over HTTPS | HTTP (FastAPI) | Portal never accesses DB directly — all reads/writes through authenticated API |
| Any service → PostgreSQL | SQLAlchemy async (asyncpg driver) | TCP | RLS enforced; tenant context set before every query |
| Any service → Redis | aioredis / redis-py async | TCP | Namespaced by tenant_id to prevent accidental cross-tenant access |
---
## Build Order (Dependency Graph)
Building the wrong component first creates integration debt. The correct order:
```
Phase 1 — Foundation (build this first)
├── 1. Shared models + DB schema (Pydantic models, SQLAlchemy models, Alembic migrations)
│       └── Required by: every other service
├── 2. PostgreSQL + Redis + Docker Compose dev environment
│       └── Required by: everything
├── 3. Channel Gateway — Slack adapter only
│       └── Unblocks: end-to-end message flow testing
├── 4. Message Router — tenant resolution + rate limiting
│       └── Unblocks: scoped agent invocation
├── 5. LLM Backend Pool — LiteLLM with Ollama + Anthropic
│       └── Unblocks: agent can actually generate responses
└── 6. Agent Orchestrator — single agent, no tools, no memory
        └── First working end-to-end: Slack message → LLM response → Slack reply

Phase 2 — Feature Completeness
├── 7. Memory Layer (Redis short-term + pgvector long-term)
│       └── Depends on: working orchestrator
├── 8. Tool Framework (registry + executor + first built-in tools)
│       └── Depends on: working orchestrator
├── 9. WhatsApp channel adapter in Gateway
│       └── Mostly isolated: same normalize.py, new channel handler
├── 10. Admin Portal (Next.js) — tenant CRUD + agent config
│       └── Depends on: stable DB schema (stabilizes after step 8)
└── 11. Billing integration (Stripe webhooks + subscription enforcement)
        └── Depends on: tenant model, admin portal
```
**Key dependency insight:** Steps 1-6 must be strictly sequential. Steps 7-11 can overlap after step 6 is working, but the portal (10) and billing (11) should not be started until the DB schema is stable, which happens after memory and tools are defined (steps 7-8).
---
## Scaling Considerations
| Scale | Architecture Adjustments |
|-------|--------------------------|
| 0-100 tenants (beta) | Single Docker Compose host. One Celery worker process. All services on same machine. PostgreSQL RLS sufficient. |
| 100-1k tenants | Scale Celery workers horizontally (multiple replicas). Separate Redis for broker vs. cache. Add connection pooling (PgBouncer). Consider moving Ollama to dedicated GPU host. |
| 1k-10k tenants | Kubernetes (k3s). Multiple Gateway replicas behind load balancer. Celery worker auto-scaling. PostgreSQL read replica for analytics/portal queries. Qdrant for vector search at scale (pgvector starts to slow above ~1M embeddings). |
| 10k+ tenants | Schema-per-tenant for Enterprise tier. Dedicated inference cluster. Multi-region PostgreSQL (Citus or regional replicas). |
### Scaling Priorities
1. **First bottleneck:** Celery workers during LLM call bursts. LLM calls are slow (2-30s). Workers pile up. Fix: increase worker count, implement per-tenant concurrency limits, add request coalescing for burst traffic.
2. **Second bottleneck:** PostgreSQL connection exhaustion under concurrent tenant load. Fix: PgBouncer transaction-mode pooling. This is critical early because each Celery worker opens its own SQLAlchemy async session.
3. **Third bottleneck:** pgvector query latency as embedding count grows. Fix: HNSW index tuning, then migrate to Qdrant for the vector tier while keeping PostgreSQL for structured data.
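The per-tenant concurrency limit mentioned under the first bottleneck can be sketched as a Redis counter gate. The class is hypothetical; `client` stands in for a Redis client with `incr`/`decr`/`expire`, and the atomicity of `INCR` is what makes the gate safe across worker replicas:

```python
class TenantConcurrencyGate:
    """Cap in-flight LLM tasks per tenant (sketch; limit comes from the plan)."""

    def __init__(self, client, limit: int):
        self.client = client
        self.limit = limit

    def try_acquire(self, tenant_id: str) -> bool:
        key = f"{tenant_id}:inflight"
        n = self.client.incr(key)          # atomic across all workers
        if n == 1:
            self.client.expire(key, 300)   # guard against leaked slots
        if n > self.limit:
            self.client.decr(key)          # over limit: give the slot back
            return False
        return True

    def release(self, tenant_id: str) -> None:
        self.client.decr(f"{tenant_id}:inflight")
```

A worker that fails `try_acquire` re-enqueues the task with a delay instead of blocking, so one noisy tenant cannot monopolize the worker pool.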
---
## Anti-Patterns
### Anti-Pattern 1: Doing LLM Work Inside the Webhook Handler
**What people do:** Call the LLM synchronously inside the Slack event handler or WhatsApp webhook endpoint and return the AI response as the HTTP reply.
**Why it's wrong:** Slack requires HTTP 200 within 3 seconds. OpenAI/Anthropic calls routinely take 5-30 seconds. The webhook times out, Slack retries the event (causing duplicate processing), and the app gets flagged as unreliable.
**Do this instead:** Acknowledge immediately (HTTP 200), enqueue to Celery, and send the AI response as a follow-up message via the channel API.
### Anti-Pattern 2: Shared Redis Namespace Across Tenants
**What people do:** Store conversation history as `redis.set("history:{thread_id}", ...)` without scoping by tenant.
**Why it's wrong:** thread_id values can collide between tenants (e.g., two tenants both have Slack thread `C123/T456`). Tenant A reads Tenant B's conversation history.
**Do this instead:** Always namespace Redis keys as `{tenant_id}:{key_type}:{resource_id}`. Example: `tenant_abc123:history:slack_C123_T456`.
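A small helper makes the convention hard to skip (sketch; the function name is ours):

```python
def tenant_key(tenant_id: str, key_type: str, resource_id: str) -> str:
    """Build a tenant-namespaced Redis key, so cross-tenant collisions
    become impossible by construction."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every Redis key")
    return f"{tenant_id}:{key_type}:{resource_id}"
```

Routing every Redis read and write through this helper also gives one chokepoint for auditing key usage later.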
### Anti-Pattern 3: Calling LLM Providers Directly from Orchestrator
**What people do:** Import the `anthropic` SDK directly in the orchestrator and call `anthropic.messages.create(...)`.
**Why it's wrong:** Bypasses the LiteLLM router, losing fallback behavior, cost tracking, rate limit enforcement, and the ability to switch providers without touching orchestrator code.
**Do this instead:** All LLM calls go through the LLM Backend Pool service (or an embedded LiteLLM Router). The orchestrator sends a generic `complete(messages, model_group="quality")` call and the pool handles provider selection.
### Anti-Pattern 4: Fat Channel Gateway
**What people do:** Add tenant resolution, rate limiting, and business logic to the gateway service to "simplify" the architecture.
**Why it's wrong:** The gateway must respond in under 3 seconds and must stay stateless to handle channel-specific webhook verification. Mixing business logic in couples the gateway to your domain model and makes it impossible to scale independently.
**Do this instead:** Gateway does exactly three things: verify signature, normalize message, enqueue. All business logic lives downstream.
### Anti-Pattern 5: Embedding Agent Memory in PostgreSQL Without an Index
**What people do:** Store conversation embeddings in a `vector` column in PostgreSQL and run similarity queries without an HNSW or IVFFlat index.
**Why it's wrong:** pgvector without an index performs sequential scans. With more than ~50k embeddings per tenant, queries slow to seconds.
**Do this instead:** Create HNSW indexes on vector columns from the start. `CREATE INDEX ON conversation_embeddings USING hnsw (embedding vector_cosine_ops);`
---
## Sources
- [Slack: Comparing HTTP and Socket Mode](https://docs.slack.dev/apis/events-api/comparing-http-socket-mode/) — MEDIUM confidence (official Slack docs, accessed 2026-03-22)
- [Slack: Using Socket Mode](https://docs.slack.dev/apis/events-api/using-socket-mode/) — HIGH confidence (official)
- [LiteLLM: Router Architecture](https://docs.litellm.ai/docs/router_architecture) — HIGH confidence (official LiteLLM docs)
- [LiteLLM: Routing and Load Balancing](https://docs.litellm.ai/docs/routing) — HIGH confidence (official)
- [Redis: AI Agent Memory Architecture](https://redis.io/blog/ai-agent-memory-stateful-systems/) — MEDIUM confidence (official Redis blog)
- [Redis: AI Agent Architecture 2026](https://redis.io/blog/ai-agent-architecture/) — MEDIUM confidence (official Redis blog)
- [Crunchy Data: Row Level Security for Tenants](https://www.crunchydata.com/blog/row-level-security-for-tenants-in-postgres) — HIGH confidence (authoritative PostgreSQL resource)
- [AWS: Multi-Tenant Data Isolation with PostgreSQL RLS](https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/) — MEDIUM confidence
- [DEV Community: Building WhatsApp Business Bots](https://dev.to/achiya-automation/building-whatsapp-business-bots-with-the-official-api-architecture-webhooks-and-automation-1ce4) — LOW confidence (community post)
- [ChatArchitect: Scalable Webhook Architecture for WhatsApp](https://www.chatarchitect.com/news/building-a-scalable-webhook-architecture-for-custom-whatsapp-solutions) — LOW confidence (community)
- [PyWa documentation](https://pywa.readthedocs.io/en/1.6.0/) — MEDIUM confidence (library docs)
- [fast.io: Multi-Tenant AI Agent Architecture](https://fast.io/resources/ai-agent-multi-tenant-architecture/) — LOW confidence (vendor blog)
- [Microsoft Learn: AI Agent Orchestration Patterns](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns) — MEDIUM confidence
- [DEV Community: Webhooks at Scale — Idempotency](https://dev.to/art_light/webhooks-at-scale-designing-an-idempotent-replay-safe-and-observable-webhook-system-7lk) — LOW confidence (community post, pattern well-corroborated)
---
*Architecture research for: Konstruct — channel-native AI workforce platform*
*Researched: 2026-03-22*