docs(06): research web chat phase — WebSocket, Redis pub-sub, channel adapter, portal UI

2026-03-25 10:01:45 -06:00
parent 1b086b8c82
commit 03e38f3692
1 changed files with 628 additions and 0 deletions
--- a/.planning/phases/06-web-chat/06-RESEARCH.md
+++ b/.planning/phases/06-web-chat/06-RESEARCH.md
@@ -0,0 +1,628 @@
+# Phase 6: Web Chat - Research
+
+**Researched:** 2026-03-25
+**Domain:** Real-time web chat (WebSocket + Redis pub-sub + new channel adapter + portal UI)
+**Confidence:** HIGH
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+- Dedicated `/chat` page (full-screen, not a floating widget)
+- Left sidebar: conversation list grouped by agent, with timestamps and last message preview
+- Right panel: active conversation with message bubbles (user right-aligned, agent left-aligned)
+- "New Conversation" button opens an agent picker (shows agents the user has access to)
+- Markdown rendering in agent messages
+- Image/document display inline (consistent with Phase 2 media support)
+- Typing indicator (animated dots) while waiting for agent response
+- All three roles can chat: platform admin, customer admin, customer operator
+- Users can only see/chat with agents belonging to tenants they have access to (RBAC)
+- Platform admins can chat with any agent across all tenants
+- Operators can chat (read-only restrictions do NOT apply to conversations)
+- One conversation thread per user-agent pair (matches per-user per-agent memory model)
+- Users can start new conversation (clears thread context) or continue existing one
+- Conversation list sorted by most recent, paginated for long histories
+- WebSocket connection for real-time, HTTP polling fallback if WebSocket unavailable
+- Gateway receives web chat message, normalizes to KonstructMessage (channel: "web"), dispatches through existing pipeline
+- Agent response pushed back via WebSocket
+- New "web" channel adapter in gateway alongside Slack and WhatsApp
+- channel_metadata includes: portal_user_id, tenant_id, conversation_id
+- Tenant resolution from the authenticated session (not from channel metadata like Slack workspace ID)
+- Outbound: push response via WebSocket connection keyed to conversation_id
+
+### Claude's Discretion
+- WebSocket library choice (native ws, Socket.IO, etc.)
+- Message bubble visual design
+- Conversation pagination strategy (infinite scroll vs load more)
+- Whether to show tool invocation indicators in chat (e.g., "Searching knowledge base...")
+- Agent avatar/icon in chat
+- Sound notification on new message
+- Mobile responsiveness approach
+
+### Deferred Ideas (OUT OF SCOPE)
+None raised.
+</user_constraints>
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| CHAT-01 | Users can open a chat window with any AI Employee and have a real-time conversation within the portal | WebSocket endpoint on FastAPI gateway + browser WebSocket client in portal chat page |
+| CHAT-02 | Web chat supports full agent pipeline — memory, tools, escalation, and media | "web" channel added to ChannelType enum; handle_message Celery task already handles all pipeline stages; _send_response needs "web" case via Redis pub-sub |
+| CHAT-03 | Conversation history persists and is visible when the user returns | New conversations DB table + pgvector already keyed per-user per-agent; history load on page visit |
+| CHAT-04 | Chat respects RBAC — users can only chat with agents belonging to tenants they have access to | require_tenant_member FastAPI dependency already exists; new chat API endpoints use same pattern; platform_admin bypasses tenant check |
+| CHAT-05 | Chat interface feels responsive — typing indicators, message streaming or fast response display | Typing indicator via WebSocket "typing" event immediately on message send; WebSocket pushes final response when Celery completes |
+</phase_requirements>
+
+---
+
+## Summary
+
+Phase 6 adds a web chat channel to the Konstruct portal — the first channel that originates inside the portal itself rather than from an external messaging platform. The architecture follows the same channel adapter pattern established in Phases 1 and 2: a new "web" adapter in the gateway normalizes portal messages into KonstructMessage format and dispatches them to the existing Celery pipeline. The key new infrastructure is a WebSocket endpoint on the gateway and a Redis pub-sub channel that bridges the Celery worker's response delivery back to the WebSocket connection.
+
+The frontend is a new `/chat` route in the Next.js portal. It uses the native browser WebSocket API (no additional library required) with a React hook managing connection lifecycle. The UI requires one new shadcn/ui component not yet in the project (ScrollArea) and markdown rendering (react-markdown is not yet installed). Both are straightforward additions.
+
+The most important constraint to keep in mind during planning: the Celery worker and the FastAPI gateway are separate processes. The Celery task cannot call back to the WebSocket connection directly. The correct pattern is Celery publishes the response to a Redis pub-sub channel; the gateway WebSocket handler subscribes to that channel and forwards to the browser. This Redis pub-sub bridge is the critical new piece that does not exist yet.
+
+**Primary recommendation:** Use FastAPI native WebSocket + Redis pub-sub bridge for cross-process response delivery. No additional Python WebSocket libraries needed. Use native browser WebSocket API in the portal. Add react-markdown for markdown rendering.
+
+---
+
+## Standard Stack
+
+### Core
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| FastAPI WebSocket | Built into fastapi[standard] 0.135.2 | WebSocket endpoint on gateway | Already installed, Starlette-native, zero new deps |
+| redis.asyncio pub-sub | redis 5.0.0+ (already installed) | Bridge Celery response → WebSocket | Cross-process response delivery; Redis is already used throughout this codebase |
+| Browser WebSocket API | Native (no library) | Portal WebSocket client | Works in all modern browsers, zero bundle cost |
+| react-markdown | 9.x | Render agent markdown responses | Standard React markdown renderer; GFM and syntax highlighting are added through plugins |
+| remark-gfm | 4.x | GitHub Flavored Markdown support | Tables, strikethrough, task lists in agent responses |
+
+### Supporting
+
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| @radix-ui/react-scroll-area (via shadcn) | already available via @base-ui/react | Scrollable message container | Message list that auto-scrolls to bottom |
+| lucide-react | already installed | Icons (typing dots, send button, agent avatar) | Already used throughout portal |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Redis pub-sub bridge | Socket.IO | Socket.IO adds significant bundle weight and complexity; Redis is already used in this codebase (rate limiting, session, escalation) |
+| Native browser WebSocket | socket.io-client | Same reason: an unnecessary dependency when the native WebSocket API is sufficient |
+| react-markdown | marked + dangerouslySetInnerHTML | react-markdown renders to React elements and ignores raw HTML by default; marked requires XSS sanitization as a separate step |
+
+**Installation:**
+```bash
+# Portal
+cd packages/portal && npm install react-markdown remark-gfm
+
+# Backend: no new dependencies needed
+# FastAPI WebSocket is in fastapi[standard] already installed
+# redis pub-sub is in redis 5.0.0 already installed
+```
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure
+
+New files added in this phase:
+
+```
+packages/
+├── gateway/gateway/channels/
+│   └── web.py                    # Web channel adapter + WebSocket endpoint + pub-sub subscriber
+├── shared/shared/
+│   ├── models/message.py         # Add ChannelType.WEB = "web"
+│   ├── redis_keys.py             # Add webchat_response_key(tenant_id, conversation_id)
+│   └── api/
+│       └── chat.py               # REST API: list conversations, get history, create/reset
+├── migrations/versions/
+│   └── 008_web_chat.py           # web_conversations + web_conversation_messages tables, channel_type CHECK update
+└── portal/
+    ├── app/(dashboard)/chat/
+    │   └── page.tsx              # Chat page (client component)
+    ├── components/
+    │   ├── chat-sidebar.tsx      # Conversation list sidebar
+    │   ├── chat-window.tsx       # Active conversation + message bubbles
+    │   ├── chat-message.tsx      # Single message bubble with markdown
+    │   └── typing-indicator.tsx  # Animated dots
+    └── lib/
+        ├── api.ts                # Add chat API types + functions
+        ├── queries.ts            # Add useConversations, useConversationHistory
+        └── use-chat-socket.ts    # WebSocket lifecycle hook
+```
+
+### Pattern 1: Redis Pub-Sub Response Bridge
+
+**What:** Celery task (separate process) completes LLM response and needs to push it to a WebSocket connection held by the gateway FastAPI process. Redis pub-sub is the standard cross-process channel.
+
+**When to use:** Any time a background worker needs to push a result back to a long-lived connection.
+
+**Flow:**
+1. Browser sends message via WebSocket to gateway
+2. Gateway subscribes to Redis channel `{tenant_id}:webchat:response:{conversation_id}` (Redis pub-sub does not buffer, so a reply published before the subscription exists is lost)
+3. Gateway dispatches `handle_message.delay(payload)` (identical to Slack/WhatsApp) and waits on the subscription
+4. Celery's `_send_response` for "web" channel publishes response to same Redis channel
+5. Gateway receives pub-sub message, pushes to browser WebSocket
+
+**Example — gateway side:**
+```python
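+# Source: redis.asyncio pub-sub docs + existing redis usage in this codebase
+import asyncio
+
+import redis.asyncio as aioredis
+from fastapi import WebSocket
+
+async def websocket_wait_for_response(
+    ws: WebSocket,
+    redis_url: str,
+    response_channel: str,
+    timeout: float = 60.0,
+) -> bool:
+    """Forward the first message on response_channel to the WebSocket.
+
+    Returns False if nothing arrives within `timeout` seconds.
+    """
+    # decode_responses=True so message["data"] is str, as ws.send_text() requires
+    r = aioredis.from_url(redis_url, decode_responses=True)
+    pubsub = r.pubsub()
+    try:
+        # The Celery dispatch must happen only after this subscribe has returned;
+        # pub-sub drops anything published while nobody is subscribed.
+        await pubsub.subscribe(response_channel)
+        loop = asyncio.get_running_loop()
+        deadline = loop.time() + timeout
+        # Poll in short slices so the overall timeout is actually enforced
+        while (remaining := deadline - loop.time()) > 0:
+            message = await pubsub.get_message(
+                ignore_subscribe_messages=True, timeout=min(remaining, 1.0)
+            )
+            if message is not None and message["type"] == "message":
+                await ws.send_text(message["data"])
+                return True
+        return False
+    finally:
+        await pubsub.unsubscribe(response_channel)
+        await pubsub.aclose()
+        await r.aclose()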
+```
+
+**Example — Celery task side (in `_send_response`):**
+```python
+# Add "web" case to _send_response in orchestrator/tasks.py
+elif channel_str == "web":
+    conversation_id: str = extras.get("conversation_id", "") or ""
+    tenant_id: str = extras.get("tenant_id", "") or ""
+    if not conversation_id or not tenant_id:
+        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
+        return
+    response_channel = webchat_response_key(tenant_id, conversation_id)
+    publish_redis = aioredis.from_url(settings.redis_url)
+    try:
+        await publish_redis.publish(response_channel, json.dumps({
+            "type": "response",
+            "text": text,
+            "conversation_id": conversation_id,
+        }))
+    finally:
+        await publish_redis.aclose()
+```
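The ordering constraint in this bridge (the subscription must exist before the worker publishes) can be illustrated without Redis. The following is a minimal sketch, assuming an in-memory stand-in called `InMemoryPubSub` that mimics only one property of Redis pub-sub: messages published to a channel with no subscriber are dropped. `InMemoryPubSub`, `fake_worker`, and `bridge` are hypothetical names for illustration, not part of the codebase.

```python
from __future__ import annotations

import asyncio
import json


class InMemoryPubSub:
    """Stand-in for Redis pub-sub: messages with no subscriber are dropped."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = {}

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.setdefault(channel, []).append(queue)
        return queue

    def publish(self, channel: str, payload: str) -> int:
        """Deliver to current subscribers and return how many received it."""
        queues = self._subscribers.get(channel, [])
        for queue in queues:
            queue.put_nowait(payload)
        return len(queues)


async def fake_worker(bus: InMemoryPubSub, channel: str, text: str) -> int:
    """Plays the role of the "web" branch of _send_response."""
    return bus.publish(channel, json.dumps({"type": "response", "text": text}))


async def bridge(subscribe_first: bool, timeout: float = 0.1) -> str | None:
    """Return the delivered text, or None if the reply was lost."""
    bus = InMemoryPubSub()
    channel = "tenant-1:webchat:response:conv-1"
    if subscribe_first:
        queue = bus.subscribe(channel)
        await fake_worker(bus, channel, "hello")
    else:
        await fake_worker(bus, channel, "hello")  # published to nobody
        queue = bus.subscribe(channel)
    try:
        raw = await asyncio.wait_for(queue.get(), timeout)
    except asyncio.TimeoutError:
        return None
    return json.loads(raw)["text"]


delivered = asyncio.run(bridge(subscribe_first=True))
dropped = asyncio.run(bridge(subscribe_first=False))
```

In practice the LLM call takes long enough that a late subscription would rarely lose a reply, but subscribing first removes the race entirely at no cost.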
+
+### Pattern 2: FastAPI WebSocket Endpoint
+
+**What:** Native FastAPI WebSocket with auth validated from a JSON auth payload sent by the client after the connection opens (not from handshake headers). Gateway already holds the Redis client at startup; WebSocket handler uses it.
+
+**When to use:** Every web chat message from the portal browser.
+
+```python
+# Source: FastAPI WebSocket docs (verified: WebSocket import is in fastapi package)
+from fastapi import APIRouter, WebSocket, WebSocketDisconnect
+
+# Router name matches the include_router() call in gateway main.py
+chat_websocket_router = APIRouter()
+
+@chat_websocket_router.websocket("/chat/ws/{conversation_id}")
+async def chat_websocket(
+    conversation_id: str,
+    websocket: WebSocket,
+) -> None:
+    await websocket.accept()
+    try:
+        while True:
+            data = await websocket.receive_json()
+            # Validate the auth payload in data["auth"] before doing anything else
+            # Normalize to KonstructMessage
+            # Subscribe to the Redis response channel, then dispatch to Celery
+            # Push response back to websocket
+    except WebSocketDisconnect:
+        pass
+```
+
+**Critical note:** Handshake headers are readable server-side via `websocket.headers`, but the browser `WebSocket` constructor cannot set custom headers, so they cannot carry auth from the portal. The established pattern in this project is to send RBAC context as `X-Portal-User-Id`, `X-Portal-User-Role`, `X-Portal-Tenant-Id` headers. For WebSocket, send the same values as a JSON "auth" message immediately after the connection opens, and have the handler refuse to process chat messages until that payload has been validated.
+
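A minimal sketch of validating that first auth frame, assuming the field names used by the portal hook (`userId`, `role`, `tenantId`). The role strings, the `PortalAuth` dataclass, and the rule that only platform admins may omit a tenant are assumptions for illustration. The function checks shape only; verifying the claims against the real portal session is outside this sketch.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import dataclass

# Assumed role identifiers; the real values live in the portal's RBAC code.
ALLOWED_ROLES = {"platform_admin", "customer_admin", "customer_operator"}


@dataclass(frozen=True)
class PortalAuth:
    user_id: uuid.UUID
    role: str
    tenant_id: uuid.UUID | None


def parse_auth_frame(raw: str) -> PortalAuth | None:
    """Return PortalAuth for a well-formed auth frame, else None."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict) or data.get("type") != "auth":
            return None
        role = data["role"]
        if role not in ALLOWED_ROLES:
            return None
        tenant_raw = data.get("tenantId")
        tenant_id = uuid.UUID(tenant_raw) if tenant_raw else None
        # Assumption: only platform admins may connect without a tenant.
        if tenant_id is None and role != "platform_admin":
            return None
        return PortalAuth(uuid.UUID(data["userId"]), role, tenant_id)
    except (KeyError, TypeError, ValueError, AttributeError):
        # Malformed JSON, missing keys, or non-UUID values all count as invalid.
        return None


operator_frame = json.dumps({
    "type": "auth",
    "userId": str(uuid.uuid4()),
    "role": "customer_operator",
    "tenantId": str(uuid.uuid4()),
})
operator_auth = parse_auth_frame(operator_frame)
```

A handler built on this would close the socket when `parse_auth_frame` returns `None` rather than continue into the chat loop.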
+### Pattern 3: Browser WebSocket Hook
+
+**What:** React hook that manages WebSocket connection lifecycle (connect on mount, close on unmount or conversation change, send/receive messages). Reconnect-on-close is not included in the sketch below.
+
+```typescript
+// packages/portal/lib/use-chat-socket.ts
+// Native browser WebSocket — no library needed
+"use client";
+
+import { useEffect, useRef, useCallback, useState } from "react";
+
+interface ChatSocketOptions {
+  conversationId: string;
+  onMessage: (text: string) => void;
+  onTyping: (isTyping: boolean) => void;
+  authHeaders: { userId: string; role: string; tenantId: string | null };
+}
+
+export function useChatSocket({
+  conversationId,
+  onMessage,
+  onTyping,
+  authHeaders,
+}: ChatSocketOptions) {
+  const wsRef = useRef<WebSocket | null>(null);
+  const [isConnected, setIsConnected] = useState(false);
+
+  const send = useCallback((text: string) => {
+    if (wsRef.current?.readyState === WebSocket.OPEN) {
+      wsRef.current.send(JSON.stringify({
+        type: "message",
+        text,
+        auth: authHeaders,
+      }));
+      onTyping(true);  // Show typing indicator immediately
+    }
+  }, [authHeaders, onTyping]);
+
+  useEffect(() => {
+    const wsUrl = `${process.env.NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`;
+    const ws = new WebSocket(wsUrl);
+    wsRef.current = ws;
+
+    ws.onopen = () => setIsConnected(true);
+    ws.onclose = () => setIsConnected(false);
+    ws.onmessage = (event) => {
+      const data = JSON.parse(event.data as string);
+      if (data.type === "response") {
+        onTyping(false);
+        onMessage(data.text as string);
+      }
+    };
+
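+    return () => ws.close();
+    // onMessage/onTyping must be stable (e.g. wrapped in useCallback), or this effect reconnects on every render
+  }, [conversationId, onMessage, onTyping]);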
+
+  return { send, isConnected };
+}
+```
+
+### Pattern 4: Conversation Persistence (New DB Table)
+
+**What:** Two tables, `web_conversations` (one row per thread) and `web_conversation_messages` (one row per turn), to persist chat history visible on return visits.
+
+**When to use:** Every web chat message — store each turn in the DB.
+
+```python
+# New ORM model — migration 008
+class WebConversation(Base):
+    """Persistent conversation thread for portal web chat."""
+    __tablename__ = "web_conversations"
+
+    id: Mapped[uuid.UUID] = ...
+    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
+    agent_id: Mapped[uuid.UUID] = ...
+    user_id: Mapped[uuid.UUID] = ...    # portal user UUID (from Auth.js session)
+    created_at: Mapped[datetime] = ...
+    updated_at: Mapped[datetime] = ...  # used for sort order
+
+    __table_args__ = (
+        UniqueConstraint("tenant_id", "agent_id", "user_id"),  # one thread per pair
+    )
+
+
+class WebConversationMessage(Base):
+    """Individual message within a web conversation."""
+    __tablename__ = "web_conversation_messages"
+
+    id: Mapped[uuid.UUID] = ...
+    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("web_conversations.id"))
+    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
+    role: Mapped[str] = ...             # "user" | "assistant"
+    content: Mapped[str] = ...
+    created_at: Mapped[datetime] = ...
+```
+
+**Note:** The `user_id` for web chat is the portal user's UUID from Auth.js — different from the Slack user ID string used in existing memory. The Redis memory key `memory:short:{agent_id}:{user_id}` will use the portal user's UUID string as `user_id`, keeping it compatible with the existing memory system.
+
+### Pattern 5: Conversation REST API
+
+**What:** REST endpoints for listing conversations, loading history, and resetting. This is separate from the WebSocket endpoint.
+
+```
+GET  /api/portal/chat/conversations?tenant_id={id}     — list all conversations for user
+GET  /api/portal/chat/conversations/{id}/messages      — load history (paginated)
+POST /api/portal/chat/conversations                    — create new or get-or-create
+DELETE /api/portal/chat/conversations/{id}             — reset (delete messages, keep thread)
+```
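The paginated history endpoint could use a timestamp cursor. The following is a minimal in-memory sketch, assuming a "load older messages" cursor keyed on `created_at`. The pagination strategy is left open in this document, so `StoredMessage` and `page_before` are hypothetical names and cursor pagination is one option, not the chosen design.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredMessage:
    role: str  # "user" | "assistant"
    content: str
    created_at: datetime


def page_before(
    messages: list[StoredMessage],
    before: datetime | None,
    limit: int = 50,
) -> tuple[list[StoredMessage], datetime | None]:
    """Return up to `limit` messages older than `before` (oldest first) and the next cursor."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    older = sorted(
        (m for m in messages if before is None or m.created_at < before),
        key=lambda m: m.created_at,
    )
    page = older[-limit:]  # the newest `limit` messages among the older ones
    has_more = len(older) > len(page)
    next_cursor = page[0].created_at if has_more else None
    return page, next_cursor


start = datetime(2026, 3, 25, tzinfo=timezone.utc)
history = [
    StoredMessage("user" if i % 2 == 0 else "assistant", f"m{i}", start + timedelta(seconds=i))
    for i in range(5)
]
first_page, cursor = page_before(history, None, limit=2)
second_page, cursor2 = page_before(history, cursor, limit=2)
last_page, cursor3 = page_before(history, cursor2, limit=2)
```

The client passes the returned cursor back as the `before` value for the next request and stops when the cursor is `None`.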
+
+### Anti-Patterns to Avoid
+
+- **Streaming token-by-token:** The requirements doc explicitly marks "Real-time token streaming in chat" as Out of Scope (consistent with Slack/WhatsApp — they don't support partial messages). The typing indicator shows while the full LLM call runs; the complete response arrives as one message.
+- **WebSocket auth via URL query params:** Never put tokens/user IDs in the WebSocket URL. Use JSON message after connection.
+- **Calling Celery result backend from WebSocket handler:** Celery result backends add latency and coupling. Use Redis pub-sub directly.
+- **One WebSocket connection per page load (not per conversation):** The connection should be scoped per conversation_id so reconnect on conversation switch is clean.
+- **Storing conversation history only in Redis:** Redis memory (sliding window) is the agent's working context. The DB `web_conversation_messages` table is what shows up when the user returns to the chat page. These are separate concerns.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Markdown rendering | Custom regex parser | react-markdown + remark-gfm | Handles edge cases, does not render raw HTML by default, supports GFM via the plugin |
+| WebSocket reconnection | Custom exponential backoff | Simple reconnect on close (sufficient for v1) | LLM calls are short; connections don't stay open for hours |
+| Auth for WebSocket | Custom token scheme | Send auth as first JSON message using existing RBAC headers | Consistent with existing `X-Portal-*` header pattern |
+| Cross-process response delivery | Shared memory / HTTP callback | Redis pub-sub | Already in use; correct pattern for Celery → FastAPI bridge |
+
+**Key insight:** The web channel adapter is the only genuinely new piece of infrastructure. Everything else — RBAC, memory, tool calling, escalation, audit — already works and processes messages tagged with any channel type. Adding `ChannelType.WEB = "web"` and a new `_send_response` branch is sufficient to wire the whole pipeline.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: WebSocket Auth — Browser API Limitation
+
+**What goes wrong:** The browser's native `WebSocket` constructor does not support custom headers. Code that tries `new WebSocket(url, { headers: {...} })` never sends those headers: the second argument is interpreted as a subprotocol list, so the call throws or the headers are silently dropped.
+
+**Why it happens:** The browser WebSocket API only accepts a URL and optional subprotocols; it has no option for setting request headers.
+
+**How to avoid:** Send auth information as a JSON "auth" message immediately after connection opens. The FastAPI WebSocket handler should require this first message before processing any chat messages. This is established practice for browser WebSocket auth.
+
+**Warning signs:** Tests that use a non-browser client (e.g., Starlette's `TestClient.websocket_connect`, which accepts headers) work fine, but the browser connection is rejected.
+
+### Pitfall 2: Celery Sync Context in Async `_send_response`
+
+**What goes wrong:** `_send_response` is an async function called from `asyncio.run()` inside the sync Celery task. Adding Redis pub-sub code there requires creating a new async Redis client per task, which is the existing pattern — but forgetting `await publish_redis.aclose()` leaks connections.
+
+**Why it happens:** The "Celery tasks MUST be sync def" constraint (STATE.md) means we're always bridging sync→async via `asyncio.run()`. Every async resource must be explicitly closed.
+
+**How to avoid:** Follow the existing pattern in `_process_message`: use `try/finally` around every `aioredis.from_url()` call to ensure `aclose()` always runs.
+
+**Warning signs:** Redis connection count grows over time; "too many connections" errors in production.
+
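The sync-to-async bridge with guaranteed cleanup can be sketched as follows. `FakeAsyncClient` is a hypothetical stand-in for the async Redis client so the example runs without a server; the only point illustrated is that `aclose()` sits in a `finally` block and therefore runs on both success and failure.

```python
import asyncio


class FakeAsyncClient:
    """Stand-in for an async Redis client; counts clients that are still open."""

    open_count = 0

    def __init__(self) -> None:
        FakeAsyncClient.open_count += 1

    async def publish(self, channel: str, payload: str) -> int:
        if not payload:
            raise ValueError("empty payload")
        return 1

    async def aclose(self) -> None:
        FakeAsyncClient.open_count -= 1


async def _publish(channel: str, payload: str) -> int:
    client = FakeAsyncClient()
    try:
        return await client.publish(channel, payload)
    finally:
        await client.aclose()  # runs whether publish succeeded or raised


def publish_from_sync_task(channel: str, payload: str) -> int:
    """Sync entry point, shaped like the body of a sync Celery task."""
    return asyncio.run(_publish(channel, payload))


ok_result = publish_from_sync_task("tenant-1:webchat:response:conv-1", "{}")
try:
    publish_from_sync_task("tenant-1:webchat:response:conv-1", "")
    failed = False
except ValueError:
    failed = True
```

After both calls `FakeAsyncClient.open_count` is back to zero, which is the property the real code needs to avoid leaking connections.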
+### Pitfall 3: Conversation ID vs Thread ID Confusion
+
+**What goes wrong:** The KonstructMessage `thread_id` field is used by the memory system to scope Redis sliding window. For web chat, `thread_id` should be the `conversation_id` (UUID) from the `web_conversations` table. If this is set incorrectly (e.g., to the portal user_id), all conversations for a user share one memory window.
+
+**Why it happens:** Slack sets `thread_id` to `thread_ts` (string). WhatsApp sets it to `wa_id`. Web chat must set it to `conversation_id` (UUID string) — one distinct value per conversation.
+
+**How to avoid:** The web channel normalizer should set `thread_id = conversation_id` in the KonstructMessage. The `user_id` for memory key construction comes from `sender.user_id` (portal user UUID string). The combination `tenant_id + agent_id + user_id` (Redis memory key) matches correctly.
+
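A minimal sketch of the web normalizer's field mapping. The real `KonstructMessage` model is not reproduced in this document, so `WebMessageSketch` is a hypothetical stand-in carrying only the fields discussed here; the real field names and nesting (for example `sender.user_id`) may differ.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class WebMessageSketch:
    """Hypothetical stand-in for KonstructMessage (the real model has more fields)."""

    tenant_id: str
    channel: str
    thread_id: str
    sender_user_id: str
    text: str
    channel_metadata: dict[str, str]


def normalize_web_event(
    *,
    tenant_id: uuid.UUID,
    portal_user_id: uuid.UUID,
    conversation_id: uuid.UUID,
    text: str,
) -> WebMessageSketch:
    """Build the normalized message for one inbound portal chat message."""
    return WebMessageSketch(
        tenant_id=str(tenant_id),
        channel="web",
        thread_id=str(conversation_id),      # one distinct value per conversation
        sender_user_id=str(portal_user_id),  # never reused as thread_id
        text=text,
        channel_metadata={
            "portal_user_id": str(portal_user_id),
            "tenant_id": str(tenant_id),
            "conversation_id": str(conversation_id),
        },
    )


tenant, user = uuid.uuid4(), uuid.uuid4()
conv_a, conv_b = uuid.uuid4(), uuid.uuid4()
msg_a = normalize_web_event(tenant_id=tenant, portal_user_id=user, conversation_id=conv_a, text="hi")
msg_b = normalize_web_event(tenant_id=tenant, portal_user_id=user, conversation_id=conv_b, text="hi")
```

Two conversations for the same user produce the same sender ID but different thread IDs, which is the separation this pitfall is about.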
+### Pitfall 4: New Conversation vs Continue — Race Condition
+
+**What goes wrong:** User clicks "New Conversation" while a response is still in flight for the old conversation. The old conversation's pub-sub response arrives and updates the new conversation's state.
+
+**Why it happens:** The WebSocket is keyed to `conversation_id`. When the user resets the thread, a new `conversation_id` is created. The old pub-sub subscription must be cleaned up before subscribing to the new one.
+
+**How to avoid:** When the user creates a new conversation: (1) close/unmount the old WebSocket connection, (2) obtain the conversation to use via the REST API (a new `web_conversations` row with a new UUID, or the existing row after a reset if the one-thread-per-pair unique constraint is kept), (3) connect a new WebSocket to that conversation_id. React's `useEffect` cleanup handles step 1 naturally when `conversationId` changes.
+
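As an extra guard, the gateway (or the client) can compare the `conversation_id` carried in the response payload with the conversation that is currently active and drop anything else. A minimal sketch, assuming the payload shape published by the "web" branch of `_send_response`; `should_deliver` is a hypothetical helper name.

```python
import json


def should_deliver(raw_payload: str, active_conversation_id: str) -> bool:
    """True only for a response payload addressed to the active conversation."""
    try:
        payload = json.loads(raw_payload)
    except ValueError:
        return False  # not JSON: never forward
    return (
        isinstance(payload, dict)
        and payload.get("type") == "response"
        and payload.get("conversation_id") == active_conversation_id
    )


stale = json.dumps({"type": "response", "text": "old answer", "conversation_id": "conv-old"})
fresh = json.dumps({"type": "response", "text": "new answer", "conversation_id": "conv-new"})
```

This check only helps when the new conversation has a different ID from the old one; if a reset keeps the same ID, a per-request token would be needed instead.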
+### Pitfall 5: `ChannelType.WEB` Missing from DB CHECK Constraint
+
+**What goes wrong:** Adding `WEB = "web"` to the Python `ChannelType` StrEnum does not automatically update the PostgreSQL CHECK constraint on the `channel_type` column. Existing data is fine, but inserting new records with `channel = "web"` fails at the DB level.
+
+**Why it happens:** STATE.md documents the decision: "channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL." The CHECK constraint lists allowed values and must be updated via migration.
+
+**How to avoid:** Migration 008 must ALTER the CHECK constraint on any affected tables to include `"web"`. Check which tables have `channel_type` constraints: `channel_connections` (stores active channel configs per tenant). The `conversation_embeddings` and audit tables use `TEXT` without CHECK, so only `channel_connections` needs the update.
+
+**Warning signs:** `CheckViolation` error from PostgreSQL when a row with `channel_type = 'web'` is inserted into a constrained table such as `channel_connections`.
+
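The constraint update amounts to dropping the old CHECK and adding one that includes `'web'`. The sketch below only builds the two SQL statements, which a migration could then run (for example through Alembic's `op.execute`). The constraint name `ck_channel_connections_channel_type` is an assumption; the real name must be read from the existing migration or the database.

```python
# Values mirror the ChannelType enum after adding WEB.
CHANNEL_TYPES = (
    "slack", "whatsapp", "mattermost", "rocketchat",
    "teams", "telegram", "signal", "web",
)


def check_constraint_sql(
    table: str, constraint: str, column: str, values: tuple[str, ...]
) -> list[str]:
    """Return the DROP and ADD statements for a TEXT + CHECK enum column."""
    for name in (table, constraint, column, *values):
        # Plain identifiers only, so nothing needs quoting or escaping.
        if not name.isidentifier():
            raise ValueError(f"unexpected value: {name!r}")
    allowed = ", ".join(f"'{value}'" for value in values)
    return [
        f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint}",
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} CHECK ({column} IN ({allowed}))",
    ]


statements = check_constraint_sql(
    "channel_connections",
    "ck_channel_connections_channel_type",  # hypothetical constraint name
    "channel_type",
    CHANNEL_TYPES,
)
```

The downgrade path would call the same helper with `"web"` removed from the tuple.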
+### Pitfall 6: React 19 + Next.js 16 `use()` for Async Data
+
+**What goes wrong:** Using `useState` + `useEffect` to fetch conversation history in a client component works but misses the React 19 preferred pattern.
+
+**Why it happens:** React 19 introduces `use()` for Promises directly in components (TanStack Query handles this abstraction). The existing codebase already uses TanStack Query uniformly — don't break this pattern.
+
+**How to avoid:** Add `useConversations` and `useConversationHistory` hooks in `queries.ts` following the existing pattern (e.g., `useAgents`, `useTenants`). Use `useQuery` from `@tanstack/react-query`.
+
+---
+
+## Code Examples
+
+Verified patterns from existing codebase:
+
+### Adding ChannelType.WEB to the enum
+```python
+# packages/shared/shared/models/message.py
+# Source: existing file — add one line
+class ChannelType(StrEnum):
+    SLACK = "slack"
+    WHATSAPP = "whatsapp"
+    MATTERMOST = "mattermost"
+    ROCKETCHAT = "rocketchat"
+    TEAMS = "teams"
+    TELEGRAM = "telegram"
+    SIGNAL = "signal"
+    WEB = "web"          # Add this line
+```
+
+### Adding webchat Redis key to redis_keys.py
+```python
+# packages/shared/shared/redis_keys.py
+# Source: existing file pattern
+def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
+    """
+    Redis pub-sub channel for web chat response delivery.
+
+    Published by Celery task after LLM response; subscribed by WebSocket handler.
+    """
+    return f"{tenant_id}:webchat:response:{conversation_id}"
+```
+
+### Web channel extras in handle_message
+```python
+# packages/orchestrator/orchestrator/tasks.py
+# Source: existing extras pattern (line 246-254)
+# Add to handle_message alongside existing Slack/WhatsApp extras:
+conversation_id: str = message_data.pop("conversation_id", "") or ""
+portal_user_id: str = message_data.pop("portal_user_id", "") or ""
+
+# Add to extras dict (line 269-274):
+extras: dict[str, Any] = {
+    "placeholder_ts": placeholder_ts,
+    "channel_id": channel_id,
+    "phone_number_id": phone_number_id,
+    "bot_token": bot_token,
+    "wa_id": wa_id,
+    "conversation_id": conversation_id,
+    "portal_user_id": portal_user_id,
+}
+```
+
+### TanStack Query hook pattern (follows existing)
+```typescript
+// packages/portal/lib/queries.ts
+// Source: existing useAgents pattern
+export function useConversations(tenantId: string) {
+  return useQuery({
+    queryKey: ["conversations", tenantId],
+    queryFn: () => api.get<ConversationsResponse>(`/api/portal/chat/conversations?tenant_id=${tenantId}`),
+    enabled: !!tenantId,
+  });
+}
+
+export function useConversationHistory(conversationId: string) {
+  return useQuery({
+    queryKey: ["conversation-history", conversationId],
+    queryFn: () => api.get<MessagesResponse>(`/api/portal/chat/conversations/${conversationId}/messages`),
+    enabled: !!conversationId,
+  });
+}
+```
+
+### FastAPI WebSocket endpoint in gateway main.py
+```python
+# packages/gateway/gateway/main.py — add alongside existing routers
+# Source: FastAPI WebSocket API (verified available in fastapi 0.135.2)
+from gateway.channels.web import chat_websocket_router
+app.include_router(chat_websocket_router)
+```
+
+### RBAC enforcement in chat REST API
+```python
+# packages/shared/shared/api/chat.py
+# Source: existing pattern from rbac.py + portal.py
+@router.get("/api/portal/chat/conversations")
+async def list_conversations(
+    tenant_id: UUID,
+    caller: PortalCaller = Depends(get_portal_caller),
+    session: AsyncSession = Depends(get_session),
+) -> ConversationsResponse:
+    await require_tenant_member(tenant_id, caller, session)
+    # ... query web_conversations WHERE tenant_id = tenant_id AND user_id = caller.user_id
+```
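The access rule that the chat endpoints rely on can be stated as a small pure function. This is a sketch of the rule as described in this document (platform admins bypass the tenant check; customer admins and operators need tenant membership), not the implementation of `require_tenant_member`. `CallerSketch` and `can_chat_with_agent` are hypothetical names.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class CallerSketch:
    """Hypothetical caller shape: role plus the tenants the user belongs to."""

    user_id: uuid.UUID
    role: str
    tenant_ids: frozenset[uuid.UUID]


def can_chat_with_agent(caller: CallerSketch, agent_tenant_id: uuid.UUID) -> bool:
    """Platform admins may chat with any agent; everyone else needs tenant membership."""
    if caller.role == "platform_admin":
        return True
    # Operators are treated like customer admins here: chatting is not a restricted action.
    return agent_tenant_id in caller.tenant_ids


tenant_a, tenant_b = uuid.uuid4(), uuid.uuid4()
operator = CallerSketch(uuid.uuid4(), "customer_operator", frozenset({tenant_a}))
platform_admin = CallerSketch(uuid.uuid4(), "platform_admin", frozenset())
```

In the real endpoint a failed check would translate into the 403 response exercised by the RBAC tests.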
+
+### proxy.ts: no change needed (`/chat` is not operator-restricted)
+```typescript
+// packages/portal/proxy.ts
+// Source: existing file — /chat must NOT be in CUSTOMER_OPERATOR_RESTRICTED
+// Operators can chat (chatting IS the product)
+// No change needed to proxy.ts — /chat is not in the restricted list
+// Just add /chat to nav.tsx
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| `middleware.ts` | `proxy.ts` (function named `proxy`) | Next.js 16 | Already migrated in this project — STATE.md confirms |
+| `searchParams` prop read synchronously | `use(searchParams)` to unwrap Promise | Next.js 15 | Already applied in this project per STATE.md |
+| `zodResolver` from hookform | `standardSchemaResolver` | hookform/resolvers v5 | Already applied — don't use zodResolver |
+| `stripe.api_key = ...` | `StripeClient(api_key=...)` | stripe v14+ | Already applied — use thread-safe constructor |
+| `Column()` SQLAlchemy | `mapped_column()` + `Mapped[]` | SQLAlchemy 2.0 | Already the pattern — use mapped_column |
+
+**Deprecated/outdated:**
+- `middleware.ts`: deprecated in Next.js 16, renamed to `proxy.ts`. Already done in this project.
+- SQLAlchemy `sa.Enum` for channel_type: causes duplicate DDL — use TEXT + CHECK constraint (STATE.md decision).
+
+---
+
+## Open Questions
+
+1. **HTTP Polling Fallback Scope**
+   - What we know: CONTEXT.md specifies "fallback to HTTP polling if WebSocket unavailable"
+   - What's unclear: Is this needed for v1 given all modern browsers support WebSocket? The remaining failure mode is a proxy or firewall that blocks the WebSocket upgrade while still allowing plain HTTP, which is the case a polling fallback would cover.
+   - Recommendation: Implement WebSocket only for v1 and show a simple error state ("Connection lost, please refresh") instead of a full polling fallback, which is significant complexity for an edge case. Because the fallback is listed under Locked Decisions, confirm this deferral with the user before planning.
+
+2. **Media Upload in Web Chat**
+   - What we know: CONTEXT.md says "image/document display inline (consistent with media support from Phase 2)." Phase 2 media goes through MinIO.
+   - What's unclear: Can users upload media directly in web chat (browser file picker), or does "inline display" mean only displaying agent responses that contain media?
+   - Recommendation: v1 — display media in agent responses (agent can return image URLs from MinIO/S3). User-to-agent file upload is a separate feature. The KonstructMessage already supports MediaAttachment; the web normalizer can include media from agent tool results.
+
+3. **Agent Selection Scope for Platform Admins**
+   - What we know: Platform admins can chat with "any agent across all tenants."
+   - What's unclear: The agent picker UI — does a platform admin see all agents grouped by tenant, or do they first pick a tenant then pick an agent?
+   - Recommendation: Use the existing tenant switcher pattern from the agents page: platform admin sees agents grouped by tenant in the sidebar. This reuses `useTenants()` + `useAgents(tenantId)` pattern already in the agents list page.
+
+---
+
+## Validation Architecture
+
+### Test Framework
+| Property | Value |
+|----------|-------|
+| Framework | pytest 8.3.0 + pytest-asyncio 0.25.0 |
+| Config file | `pyproject.toml` (root) — `asyncio_mode = "auto"`, `testpaths = ["tests"]` |
+| Quick run command | `pytest tests/unit/test_web_channel.py -x` |
+| Full suite command | `pytest tests/unit -x` |
+
+### Phase Requirements → Test Map
+
+| Req ID | Behavior | Test Type | Automated Command | File Exists? |
+|--------|----------|-----------|-------------------|-------------|
+| CHAT-01 | WebSocket endpoint accepts connection and dispatches to Celery | unit | `pytest tests/unit/test_web_channel.py::test_websocket_dispatches_to_celery -x` | ❌ Wave 0 |
+| CHAT-01 | Web channel normalizer produces valid KonstructMessage | unit | `pytest tests/unit/test_web_channel.py::test_normalize_web_event -x` | ❌ Wave 0 |
+| CHAT-02 | `_send_response` for "web" channel publishes to Redis pub-sub | unit | `pytest tests/unit/test_web_channel.py::test_send_response_web_publishes_to_redis -x` | ❌ Wave 0 |
+| CHAT-03 | Conversation history REST endpoint returns paginated messages | unit | `pytest tests/unit/test_chat_api.py::test_list_conversation_history -x` | ❌ Wave 0 |
+| CHAT-04 | Chat API returns 403 for user not member of tenant | unit | `pytest tests/unit/test_chat_api.py::test_chat_rbac_enforcement -x` | ❌ Wave 0 |
+| CHAT-04 | Platform admin can access agents across all tenants | unit | `pytest tests/unit/test_chat_api.py::test_platform_admin_cross_tenant -x` | ❌ Wave 0 |
+| CHAT-05 | Typing indicator message sent immediately on WebSocket receive | unit | `pytest tests/unit/test_web_channel.py::test_typing_indicator_sent -x` | ❌ Wave 0 |
+
+### Sampling Rate
+- **Per task commit:** `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
+- **Per wave merge:** `pytest tests/unit -x`
+- **Phase gate:** Full suite green before `/gsd:verify-work`
+
+### Wave 0 Gaps
+- [ ] `tests/unit/test_web_channel.py` — covers CHAT-01, CHAT-02, CHAT-05
+- [ ] `tests/unit/test_chat_api.py` — covers CHAT-03, CHAT-04
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- Existing codebase — `packages/gateway/gateway/channels/slack.py`, `whatsapp.py`, `normalize.py` — channel adapter pattern directly replicated
+- Existing codebase — `packages/orchestrator/orchestrator/tasks.py` — `_send_response` extension point verified by reading full source
+- Existing codebase — `packages/shared/shared/models/message.py` — ChannelType enum verified, "web" not yet present
+- Existing codebase — `packages/shared/shared/redis_keys.py` — key naming convention verified
+- Existing codebase — `packages/shared/shared/api/rbac.py` — `require_tenant_member`, `get_portal_caller` pattern verified
+- FastAPI source — `fastapi` 0.135.2 installed, `from fastapi import WebSocket` verified importable
+- redis.asyncio — version 5.0.0+ installed, pub-sub available (`r.pubsub()` verified importable)
+- Next.js 16 bundled docs — `packages/portal/node_modules/next/dist/docs/` — proxy.ts naming, `use(searchParams)` patterns confirmed
+- `packages/portal/package.json` — Next.js 16.2.1, React 19.2.4, confirmed packages
+
+### Secondary (MEDIUM confidence)
+- `.planning/STATE.md` — all architecture decisions (channel_type TEXT+CHECK, Celery sync-only, hookform resolver, proxy.ts naming) verified against actual files
+- react-markdown 9.x + remark-gfm 4.x — current stable versions for React 19 compatibility (not yet installed, based on known package state)
+
+### Tertiary (LOW confidence)
+- None — all claims verified against codebase or installed package docs
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all backend packages verified installed and importable; portal packages verified via package.json
+- Architecture: HIGH — channel adapter pattern, extras dict pattern, RBAC pattern all verified by reading actual source files
+- Pitfalls: HIGH — most pitfalls derive directly from STATE.md documented decisions (CHECK constraint, Celery sync, browser WebSocket header limitation)
+
+**Research date:** 2026-03-25
+**Valid until:** 2026-04-25 (stable stack; react-markdown version should be re-checked if planning is delayed)