# Architecture Research
**Domain:** Channel-native AI workforce platform (multi-tenant, messaging-channel-first)
**Researched:** 2026-03-22
**Confidence:** HIGH (core patterns verified against official Slack docs, LiteLLM docs, pgvector community resources, and multiple production-pattern sources)
---
## Standard Architecture
### System Overview
```
External Channels (Slack, WhatsApp)
             │ HTTPS webhooks / Events API
             ▼
┌─────────────────────────────────────────────────────────────┐
│ INGRESS LAYER                                               │
│ ┌─────────────────────┐  ┌─────────────────────────────┐    │
│ │ Channel Gateway     │  │ Stripe Webhook Endpoint     │    │
│ │ (FastAPI service)   │  │ (billing events)            │    │
│ └──────────┬──────────┘  └─────────────────────────────┘    │
└────────────│────────────────────────────────────────────────┘
             │ Normalized KonstructMessage
             ▼
┌─────────────────────────────────────────────────────────────┐
│ MESSAGE ROUTING LAYER                                       │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Message Router                                       │    │
│ │ - Tenant resolution (channel org → tenant_id)        │    │
│ │ - Per-tenant rate limiting (Redis token bucket)      │    │
│ │ - Context loading (tenant config, agent config)      │    │
│ │ - Idempotency check (Redis dedup key)                │    │
│ └────────────────────────┬─────────────────────────────┘    │
└──────────────────────────│──────────────────────────────────┘
                           │ Enqueued task (Celery)
                           ▼
┌─────────────────────────────────────────────────────────────┐
│ AGENT ORCHESTRATION LAYER                                   │
│ ┌──────────────────────────────────────────────────────┐    │
│ │ Agent Orchestrator (per-tenant Celery worker)        │    │
│ │ - Agent context assembly (persona, tools, memory)    │    │
│ │ - Conversation history retrieval (Redis + pgvector)  │    │
│ │ - LLM call dispatch → LLM Backend Pool               │    │
│ │ - Tool execution (registry lookup + run)             │    │
│ │ - Response routing back to originating channel       │    │
│ └──────────────────────────────────────────────────────┘    │
└───────────────────┬────────────────────────┬────────────────┘
                    │                        │
                    ▼                        ▼
┌────────────────────────┐   ┌──────────────────────────────┐
│ LLM BACKEND POOL       │   │ TOOL EXECUTOR                │
│                        │   │ - Registry (tool → handler)  │
│ LiteLLM Router         │   │ - Execution (async/sync)     │
│ ├── Ollama (local)     │   │ - Result capture + logging   │
│ ├── Anthropic API      │   └──────────────────────────────┘
│ └── OpenAI API         │
└────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ DATA LAYER                                                  │
│ ┌──────────────┐  ┌──────────────┐  ┌───────────────────┐   │
│ │ PostgreSQL   │  │ Redis        │  │ MinIO / S3        │   │
│ │ (+ pgvector  │  │ (sessions,   │  │ (file attach.,    │   │
│ │  + RLS)      │  │  rate limit, │  │  agent artifacts) │   │
│ │              │  │  task queue, │  │                   │   │
│ │ - tenants    │  │  pub/sub)    │  │                   │   │
│ │ - agents     │  └──────────────┘  └───────────────────┘   │
│ │ - messages   │                                            │
│ │ - tools      │                                            │
│ │ - billing    │                                            │
│ └──────────────┘                                            │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ ADMIN PORTAL                                                │
│ Next.js 14 App Router (separate deployment)                 │
│ - Tenant management, agent config, billing, monitoring      │
│ - Reads/writes to FastAPI REST API (auth via JWT)           │
└─────────────────────────────────────────────────────────────┘
```
### Component Responsibilities
| Component | Responsibility | Communicates With |
|-----------|----------------|-------------------|
| **Channel Gateway** | Receive and verify inbound webhooks from Slack/WhatsApp; normalize to KonstructMessage; acknowledge within 3s | Message Router (HTTP or enqueue), Redis (idempotency) |
| **Message Router** | Resolve tenant from channel metadata; rate-limit per tenant; load tenant/agent context; enqueue to Celery | PostgreSQL (tenant lookup), Redis (rate limit + dedup), Celery (enqueue) |
| **Agent Orchestrator** | Assemble agent prompt from persona + memory + conversation history; call LLM; execute tools; emit response back to channel | LLM Backend Pool, Tool Executor, Memory Layer (Redis + pgvector), Channel Gateway (outbound) |
| **LLM Backend Pool** | Route LLM calls across Ollama/Anthropic/OpenAI with fallback, retry, and cost tracking | Ollama (local HTTP), Anthropic API, OpenAI API |
| **Tool Executor** | Maintain tool registry; execute tool calls from agent; return results; log every invocation for audit | External APIs (per-tool), PostgreSQL (audit log) |
| **Memory Layer** | Short-term: Redis sliding window for recent messages; Long-term: pgvector for semantic retrieval of past conversations | Redis, PostgreSQL (pgvector extension) |
| **Admin Portal** | UI for tenant CRUD, agent configuration, channel setup, billing, and usage monitoring | FastAPI REST API (authenticated) |
| **Billing Service** | Handle Stripe webhooks; update tenant subscription state; enforce feature limits based on plan | Stripe, PostgreSQL (subscription state) |
---
## Recommended Project Structure
```
konstruct/
├── packages/
│   ├── gateway/                 # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   ├── slack.py         # Slack Events API handler (HTTP mode)
│   │   │   └── whatsapp.py      # WhatsApp Cloud API webhook handler
│   │   ├── normalize.py         # → KonstructMessage
│   │   ├── verify.py            # Signature verification per channel
│   │   └── main.py              # FastAPI app, routes
│   │
│   ├── router/                  # Message Router service (FastAPI)
│   │   ├── tenant.py            # Channel org ID → tenant_id lookup
│   │   ├── ratelimit.py         # Redis token bucket per tenant
│   │   ├── idempotency.py       # Redis dedup (message_id key, TTL)
│   │   ├── context.py           # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/            # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py             # Celery task: handle_message
│   │   ├── agents/
│   │   │   ├── builder.py       # Assemble agent (persona + tools + memory)
│   │   │   └── runner.py        # LLM call loop (reason → tool → observe)
│   │   ├── memory/
│   │   │   ├── short_term.py    # Redis sliding window (last N messages)
│   │   │   └── long_term.py     # pgvector semantic search
│   │   ├── tools/
│   │   │   ├── registry.py      # Tool name → handler function mapping
│   │   │   ├── executor.py      # Async execution + audit logging
│   │   │   └── builtins/        # Built-in tools (web search, calendar, etc.)
│   │   └── main.py              # Worker entry point
│   │
│   ├── llm-pool/                # LLM Backend Pool (LiteLLM wrapper)
│   │   ├── router.py            # LiteLLM Router config (model groups)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py              # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/                  # Next.js 14 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/          # Login, signup routes
│   │   │   ├── dashboard/       # Post-auth layout
│   │   │   ├── tenants/         # Tenant management
│   │   │   ├── agents/          # Agent config
│   │   │   ├── billing/         # Stripe customer portal
│   │   │   └── api/             # Next.js API routes (thin proxy or auth only)
│   │   ├── components/          # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts           # TanStack Query hooks + API client
│   │       └── auth.ts          # NextAuth.js config
│   │
│   └── shared/                  # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py       # KonstructMessage Pydantic model
│       │   ├── tenant.py        # Tenant, Agent SQLAlchemy models
│       │   └── billing.py       # Subscription, Plan models
│       ├── db.py                # SQLAlchemy async engine + session factory
│       ├── rls.py               # SET app.current_tenant helper
│       └── config.py            # Pydantic Settings (env vars)
├── migrations/                  # Alembic (single migration history)
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docker-compose.yml           # All services + infra (Redis, PG, MinIO, Ollama)
└── pyproject.toml               # uv workspace config, shared deps
```
### Structure Rationale
- **packages/ per service:** Each directory is a standalone FastAPI app or Celery worker with its own `main.py`. The boundary maps to a Docker container. Services communicate over HTTP or Celery/Redis, not in-process imports.
- **shared/:** Common Pydantic models and SQLAlchemy models live here to prevent duplication and drift. No business logic — only types, DB session factory, and config.
- **gateway/ channels/:** Each channel adapter is a separate file so adding a new channel (e.g., Telegram in v2) is an isolated change with no blast radius.
- **orchestrator/ memory/:** Short-term and long-term memory are separate modules because they have different backends, eviction policies, and query semantics.
- **portal/ app/:** Next.js App Router route grouping with `(auth)` for pre-auth pages and `dashboard/` for post-auth so layout boundaries are explicit.
---
## Architectural Patterns
### Pattern 1: Immediate-Acknowledge, Async-Process
**What:** The Channel Gateway returns HTTP 200 to Slack/WhatsApp within 3 seconds, without performing any LLM work. The actual processing is dispatched to Celery.
**When to use:** Always. Slack will retry and flag your app as unhealthy if it doesn't receive a 2xx within 3 seconds. WhatsApp Cloud API requires sub-20s acknowledgment.
**Trade-offs:** Adds Celery + Redis infrastructure requirement. The response to the user is sent as a follow-up message, not as the HTTP response — this is intentional and matches how Slack/WhatsApp users expect bots to behave anyway (typing indicator → message appears).
**Example:**
```python
# gateway/channels/slack.py
from slack_bolt.async_app import AsyncApp

from gateway.normalize import normalize_slack
from router.idempotency import is_duplicate
from orchestrator.tasks import handle_message_task

app = AsyncApp()  # bot token + signing secret from env vars

@app.event("message")
async def handle_message(event, say, client):
    # 1. Normalize immediately
    msg = normalize_slack(event)
    # 2. Verify idempotency (skip duplicate events)
    if await is_duplicate(msg.id):
        return
    # 3. Enqueue for async processing — DO NOT call LLM here
    handle_message_task.delay(msg.model_dump())
    # Gateway returns 200 implicitly — Slack is satisfied
```
### Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook
**What:** Set `app.current_tenant` on the PostgreSQL connection immediately after acquiring it from the pool. RLS policies use this setting to filter every query automatically, so application code never manually adds `WHERE tenant_id = ...`.
**When to use:** Every DB interaction in the Message Router and Agent Orchestrator.
**Trade-offs:** Requires careful pool management — connections must be reset before returning to the pool. The `sqlalchemy-tenants` library or a custom `before_cursor_execute` event listener handles this.
**Example:**
```python
# shared/rls.py
from contextvars import ContextVar

from sqlalchemy import event

from shared.db import engine

# Set by the Message Router after tenant resolution
current_tenant_id: ContextVar[str | None] = ContextVar("current_tenant_id", default=None)

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = current_tenant_id.get()  # from contextvars
    if tenant_id:
        # SET accepts no bind parameters, so use set_config() instead of
        # interpolating tenant_id into the SQL string
        cursor.execute("SELECT set_config('app.current_tenant', %s, false)", (tenant_id,))
```
### Pattern 3: Two-Layer Agent Memory
**What:** Combine Redis (fast, ephemeral) for short-term context and pgvector (persistent, semantic) for long-term recall. The agent always has the last N messages in context (Redis sliding window). For deeper history, the orchestrator optionally queries pgvector for semantically similar past exchanges.
**When to use:** Every agent invocation. Short-term is mandatory; long-term retrieval is triggered when conversation references past events or when context window pressure requires compressing history.
**Trade-offs:** Two backends to operate and keep in sync. A background Celery task flushes Redis conversation state to PostgreSQL/pgvector asynchronously — if it fails, recent messages may not be permanently indexed, but conversation continuity is preserved by Redis until flush succeeds.
**Example flow:**
```
User message arrives
→ Load last 20 messages from Redis (short-term)
→ Optionally: similarity search pgvector for relevant past conversations
→ Build context window: [system prompt] + [retrieved history] + [recent messages]
→ LLM call
→ Append response to Redis sliding window
→ Background task: embed + store to pgvector
```
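The sliding-window side of this flow (`orchestrator/memory/short_term.py`) can be sketched as below. The class and key scheme are ours; `client` stands in for a Redis client exposing `lpush`/`ltrim`/`lrange`/`expire` (the async `redis.asyncio` variant would be awaited in production):

```python
import json

class ShortTermMemory:
    """Sliding window of the last `max_messages` messages per (tenant, thread).
    Keys are tenant-namespaced, per Anti-Pattern 2 below."""

    def __init__(self, client, max_messages: int = 20, ttl_s: int = 86400):
        self.client = client
        self.max = max_messages
        self.ttl = ttl_s

    def key(self, tenant_id: str, thread_id: str) -> str:
        return f"{tenant_id}:history:{thread_id}"

    def append(self, tenant_id: str, thread_id: str, role: str, text: str) -> None:
        k = self.key(tenant_id, thread_id)
        self.client.lpush(k, json.dumps({"role": role, "text": text}))
        self.client.ltrim(k, 0, self.max - 1)  # keep only the newest N entries
        self.client.expire(k, self.ttl)        # idle conversations decay

    def recent(self, tenant_id: str, thread_id: str) -> list[dict]:
        k = self.key(tenant_id, thread_id)
        items = self.client.lrange(k, 0, self.max - 1)
        return [json.loads(i) for i in reversed(items)]  # oldest → newest
```

`LPUSH` + `LTRIM` keeps the window bounded without a separate cleanup job; the background pgvector flush reads the same list before it is trimmed away.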
### Pattern 4: LiteLLM Router as Internal Singleton
**What:** The LLM Backend Pool exposes a single internal HTTP endpoint (`/complete`). All orchestrator workers call this endpoint. The LiteLLM Router behind it handles provider selection, fallback chains, and cost tracking without the orchestrator needing to know which model is used.
**When to use:** All LLM calls. Never call Anthropic/OpenAI SDKs directly from the orchestrator — always go through the pool.
**Trade-offs:** Adds one network hop per LLM call. This is acceptable — LiteLLM's own benchmarks show 8ms P95 overhead at 1k RPS.
**Configuration example:**
```python
# llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {"model_name": "fast", "litellm_params": {"model": "ollama/qwen3:8b", "api_base": "http://ollama:11434"}},
        {"model_name": "quality", "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"}},
        {"model_name": "quality", "litellm_params": {"model": "openai/gpt-4o"}},  # fallback
    ],
    fallbacks=[{"quality": ["fast"]}],  # cost-cap fallback
    routing_strategy="latency-based-routing",
)
```
---
## Data Flow
### Inbound Message Flow (Happy Path)
```
User sends message in Slack
    ▼  HTTPS POST (Events API, HTTP mode)
Channel Gateway
    │ verify Slack signature (X-Slack-Signature)
    │ normalize → KonstructMessage(id, tenant_id=None, channel=slack, ...)
    │ check Redis idempotency key (message_id) → not seen
    │ set Redis idempotency key (TTL 24h)
    │ enqueue Celery task: handle_message(message)
    │ return HTTP 200 immediately
    ▼
Celery Broker (Redis)
    ▼
Agent Orchestrator (Celery worker)
    │ resolve tenant_id from channel_metadata.workspace_id → PostgreSQL
    │ load agent config (persona, model preference, tools) → PostgreSQL (RLS-scoped)
    │ load short-term memory → Redis (last 20 messages for this thread_id)
    │ optionally query pgvector for relevant past context
    │ assemble prompt: system_prompt + memory + current message
    │ POST /complete → LLM Backend Pool
    │     LiteLLM Router selects provider, executes, returns response
    │ parse response: text reply OR tool_call
    │     if tool_call: Tool Executor.run(tool_name, args) → external API → result
    │     append result to prompt, re-call LLM if needed
    │ write KonstructMessage + response to PostgreSQL (audit)
    │ update Redis sliding window with new messages
    │ background: embed messages → pgvector
    ▼
Channel Gateway (outbound)
    │ POST message back to Slack via slack-sdk client.chat_postMessage()
    ▼
User sees response in Slack
```
### Tenant Resolution Flow
```
KonstructMessage.channel_metadata = {"workspace_id": "T123ABC"}
    ▼
Router: SELECT tenant_id FROM channel_connections
        WHERE channel_type = 'slack' AND external_org_id = 'T123ABC'
    ▼
tenant_id resolved → stored in Python contextvar for RLS

All subsequent DB queries automatically scoped by RLS policy:
    CREATE POLICY tenant_isolation ON agents
        USING (tenant_id = current_setting('app.current_tenant')::uuid);
```
### Admin Portal Data Flow
```
Browser → Next.js App Router (RSC or client component)
    │ TanStack Query useQuery / useMutation
    ▼
FastAPI REST API (authenticated endpoint)
    │ JWT verification (NextAuth.js token)
    │ tenant scope enforced (user.tenant_id from token)
    ▼
PostgreSQL (RLS active: queries scoped to token's tenant)
```
### Billing Event Flow
```
Stripe subscription event (checkout.session.completed, etc.)
    │ HTTPS POST to /webhooks/stripe
    ▼
Billing endpoint (FastAPI)
    │ verify Stripe webhook signature (Stripe-Signature header)
    │ parse event type
    │ update tenants.subscription_status, plan_tier in PostgreSQL
    │ if downgrade: update agent count limit, feature flags
    ▼
Next message processed by Router picks up new plan limits
```
---
## Integration Points
### External Services
| Service | Integration Pattern | Key Requirements | Notes |
|---------|---------------------|-----------------|-------|
| **Slack** | HTTP Events API (webhook) — NOT Socket Mode | Public HTTPS URL with valid TLS; respond 200 within 3s | Socket Mode only for local dev. HTTP required for production and any future Marketplace distribution. Slack explicitly recommends HTTP for production reliability. |
| **WhatsApp Cloud API** | Meta webhook (HTTPS POST) | TLS required (no self-signed); verify token for subscription; 200 response within 20s | Meta has fully deprecated on-premise option. Cloud API is now the only supported path. |
| **LiteLLM** | In-process Python SDK OR sidecar HTTP proxy | Ollama running as Docker service; Anthropic/OpenAI API keys | Run as a separate service for isolation, or as an embedded router in the orchestrator. Separate service recommended for cost tracking and rate limiting. |
| **Stripe** | Webhook (HTTPS POST) | Signature verification via `stripe.WebhookSignature`; idempotent event handlers | Use Stripe's hosted billing portal for self-service plan changes — avoids building custom subscription UI. |
| **Ollama** | HTTP (Docker network) | GPU passthrough optional; accessible on internal Docker network | `http://ollama:11434` on compose network. No auth required on internal network. |
### Internal Service Boundaries
| Boundary | Communication | Protocol | Notes |
|----------|---------------|----------|-------|
| Gateway → Router | Direct HTTP POST (on same Docker network) or shared Celery queue | HTTP or Celery | For v1 simplicity, Gateway can enqueue directly to Celery, bypassing a separate Router HTTP call |
| Router → Orchestrator | Celery task via Redis broker | Celery/Redis | Decouples ingress from processing; enables retries, dead-letter queue, and horizontal scaling of workers |
| Orchestrator → LLM Pool | Internal HTTP POST | HTTP (FastAPI) | Keeps LLM routing concerns isolated; allows pool to be scaled independently |
| Orchestrator → Channel Gateway (outbound) | Direct Slack/WhatsApp SDK calls | HTTPS (external) | Orchestrator holds channel credentials and calls the appropriate SDK directly for responses |
| Portal → API | REST over HTTPS | HTTP (FastAPI) | Portal never accesses DB directly — all reads/writes through authenticated API |
| Any service → PostgreSQL | SQLAlchemy async (asyncpg driver) | TCP | RLS enforced; tenant context set before every query |
| Any service → Redis | aioredis / redis-py async | TCP | Namespaced by tenant_id to prevent accidental cross-tenant access |
---
## Build Order (Dependency Graph)
Building the wrong component first creates integration debt. The correct order:
```
Phase 1 — Foundation (build this first)
├── 1. Shared models + DB schema (Pydantic models, SQLAlchemy models, Alembic migrations)
│       └── Required by: every other service
├── 2. PostgreSQL + Redis + Docker Compose dev environment
│       └── Required by: everything
├── 3. Channel Gateway — Slack adapter only
│       └── Unblocks: end-to-end message flow testing
├── 4. Message Router — tenant resolution + rate limiting
│       └── Unblocks: scoped agent invocation
├── 5. LLM Backend Pool — LiteLLM with Ollama + Anthropic
│       └── Unblocks: agent can actually generate responses
└── 6. Agent Orchestrator — single agent, no tools, no memory
        └── First working end-to-end: Slack message → LLM response → Slack reply

Phase 2 — Feature Completeness
├── 7. Memory Layer (Redis short-term + pgvector long-term)
│       └── Depends on: working orchestrator
├── 8. Tool Framework (registry + executor + first built-in tools)
│       └── Depends on: working orchestrator
├── 9. WhatsApp channel adapter in Gateway
│       └── Mostly isolated: same normalize.py, new channel handler
├── 10. Admin Portal (Next.js) — tenant CRUD + agent config
│       └── Depends on: stable DB schema (stabilizes after step 8)
└── 11. Billing integration (Stripe webhooks + subscription enforcement)
        └── Depends on: tenant model, admin portal
```
**Key dependency insight:** Steps 1-6 must be strictly sequential. Steps 7-11 can overlap after step 6 is working, but the portal (10) and billing (11) should not be started until the DB schema is stable, which happens after memory and tools are defined (steps 7-8).
---
## Scaling Considerations
| Scale | Architecture Adjustments |
|-------|--------------------------|
| 0-100 tenants (beta) | Single Docker Compose host. One Celery worker process. All services on same machine. PostgreSQL RLS sufficient. |
| 100-1k tenants | Scale Celery workers horizontally (multiple replicas). Separate Redis for broker vs. cache. Add connection pooling (PgBouncer). Consider moving Ollama to dedicated GPU host. |
| 1k-10k tenants | Kubernetes (k3s). Multiple Gateway replicas behind load balancer. Celery worker auto-scaling. PostgreSQL read replica for analytics/portal queries. Qdrant for vector search at scale (pgvector starts to slow above ~1M embeddings). |
| 10k+ tenants | Schema-per-tenant for Enterprise tier. Dedicated inference cluster. Multi-region PostgreSQL (Citus or regional replicas). |
### Scaling Priorities
1. **First bottleneck:** Celery workers during LLM call bursts. LLM calls are slow (2-30s). Workers pile up. Fix: increase worker count, implement per-tenant concurrency limits, add request coalescing for burst traffic.
2. **Second bottleneck:** PostgreSQL connection exhaustion under concurrent tenant load. Fix: PgBouncer transaction-mode pooling. This is critical early because each Celery worker opens its own SQLAlchemy async session.
3. **Third bottleneck:** pgvector query latency as embedding count grows. Fix: HNSW index tuning, then migrate to Qdrant for the vector tier while keeping PostgreSQL for structured data.
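The per-tenant concurrency limit mentioned under the first bottleneck can be sketched as a Redis counter gate. The class is hypothetical; `client` stands in for a Redis client with `incr`/`decr`/`expire`, and the atomicity of `INCR` is what makes the gate safe across worker replicas:

```python
class TenantConcurrencyGate:
    """Cap in-flight LLM tasks per tenant (sketch; limit comes from the plan)."""

    def __init__(self, client, limit: int):
        self.client = client
        self.limit = limit

    def try_acquire(self, tenant_id: str) -> bool:
        key = f"{tenant_id}:inflight"
        n = self.client.incr(key)          # atomic across all workers
        if n == 1:
            self.client.expire(key, 300)   # guard against leaked slots
        if n > self.limit:
            self.client.decr(key)          # over limit: give the slot back
            return False
        return True

    def release(self, tenant_id: str) -> None:
        self.client.decr(f"{tenant_id}:inflight")
```

A worker that fails `try_acquire` re-enqueues the task with a delay instead of blocking, so one noisy tenant cannot monopolize the worker pool.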
---
## Anti-Patterns
### Anti-Pattern 1: Doing LLM Work Inside the Webhook Handler
**What people do:** Call the LLM synchronously inside the Slack event handler or WhatsApp webhook endpoint and return the AI response as the HTTP reply.
**Why it's wrong:** Slack requires HTTP 200 within 3 seconds. OpenAI/Anthropic calls routinely take 5-30 seconds. The webhook times out, Slack retries the event (causing duplicate processing), and the app gets flagged as unreliable.
**Do this instead:** Acknowledge immediately (HTTP 200), enqueue to Celery, and send the AI response as a follow-up message via the channel API.
### Anti-Pattern 2: Shared Redis Namespace Across Tenants
**What people do:** Store conversation history as `redis.set("history:{thread_id}", ...)` without scoping by tenant.
**Why it's wrong:** thread_id values can collide between tenants (e.g., two tenants both have Slack thread `C123/T456`). Tenant A reads Tenant B's conversation history.
**Do this instead:** Always namespace Redis keys as `{tenant_id}:{key_type}:{resource_id}`. Example: `tenant_abc123:history:slack_C123_T456`.
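A small helper makes the convention hard to skip (sketch; the function name is ours):

```python
def tenant_key(tenant_id: str, key_type: str, resource_id: str) -> str:
    """Build a tenant-namespaced Redis key, so cross-tenant collisions
    become impossible by construction."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every Redis key")
    return f"{tenant_id}:{key_type}:{resource_id}"
```

Routing every Redis read and write through this helper also gives one chokepoint for auditing key usage later.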
### Anti-Pattern 3: Calling LLM Providers Directly from Orchestrator
**What people do:** Import the `anthropic` SDK directly in the orchestrator and call `anthropic.messages.create(...)`.
**Why it's wrong:** Bypasses the LiteLLM router, losing fallback behavior, cost tracking, rate limit enforcement, and the ability to switch providers without touching orchestrator code.
**Do this instead:** All LLM calls go through the LLM Backend Pool service (or an embedded LiteLLM Router). The orchestrator sends a generic `complete(messages, model_group="quality")` call and the pool handles provider selection.
### Anti-Pattern 4: Fat Channel Gateway
**What people do:** Add tenant resolution, rate limiting, and business logic to the gateway service to "simplify" the architecture.
**Why it's wrong:** The gateway must respond in under 3 seconds and must stay stateless to handle channel-specific webhook verification. Mixing business logic in couples the gateway to your domain model and makes it impossible to scale independently.
**Do this instead:** Gateway does exactly three things: verify signature, normalize message, enqueue. All business logic lives downstream.
### Anti-Pattern 5: Embedding Agent Memory in PostgreSQL Without an Index
**What people do:** Store conversation embeddings in a `vector` column in PostgreSQL and run similarity queries without an HNSW or IVFFlat index.
**Why it's wrong:** pgvector without an index performs sequential scans. With more than ~50k embeddings per tenant, queries slow to seconds.
**Do this instead:** Create HNSW indexes on vector columns from the start. `CREATE INDEX ON conversation_embeddings USING hnsw (embedding vector_cosine_ops);`
---
## Sources
- [Slack: Comparing HTTP and Socket Mode](https://docs.slack.dev/apis/events-api/comparing-http-socket-mode/) — MEDIUM confidence (official Slack docs, accessed 2026-03-22)
- [Slack: Using Socket Mode](https://docs.slack.dev/apis/events-api/using-socket-mode/) — HIGH confidence (official)
- [LiteLLM: Router Architecture](https://docs.litellm.ai/docs/router_architecture) — HIGH confidence (official LiteLLM docs)
- [LiteLLM: Routing and Load Balancing](https://docs.litellm.ai/docs/routing) — HIGH confidence (official)
- [Redis: AI Agent Memory Architecture](https://redis.io/blog/ai-agent-memory-stateful-systems/) — MEDIUM confidence (official Redis blog)
- [Redis: AI Agent Architecture 2026](https://redis.io/blog/ai-agent-architecture/) — MEDIUM confidence (official Redis blog)
- [Crunchy Data: Row Level Security for Tenants](https://www.crunchydata.com/blog/row-level-security-for-tenants-in-postgres) — HIGH confidence (authoritative PostgreSQL resource)
- [AWS: Multi-Tenant Data Isolation with PostgreSQL RLS](https://aws.amazon.com/blogs/database/multi-tenant-data-isolation-with-postgresql-row-level-security/) — MEDIUM confidence
- [DEV Community: Building WhatsApp Business Bots](https://dev.to/achiya-automation/building-whatsapp-business-bots-with-the-official-api-architecture-webhooks-and-automation-1ce4) — LOW confidence (community post)
- [ChatArchitect: Scalable Webhook Architecture for WhatsApp](https://www.chatarchitect.com/news/building-a-scalable-webhook-architecture-for-custom-whatsapp-solutions) — LOW confidence (community)
- [PyWa documentation](https://pywa.readthedocs.io/en/1.6.0/) — MEDIUM confidence (library docs)
- [fast.io: Multi-Tenant AI Agent Architecture](https://fast.io/resources/ai-agent-multi-tenant-architecture/) — LOW confidence (vendor blog)
- [Microsoft Learn: AI Agent Orchestration Patterns](https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/ai-agent-design-patterns) — MEDIUM confidence
- [DEV Community: Webhooks at Scale — Idempotency](https://dev.to/art_light/webhooks-at-scale-designing-an-idempotent-replay-safe-and-observable-webhook-system-7lk) — LOW confidence (community post, pattern well-corroborated)
---
*Architecture research for: Konstruct — channel-native AI workforce platform*
*Researched: 2026-03-22*