# Phase 6: Web Chat - Research

**Researched:** 2026-03-25
**Domain:** Real-time web chat (WebSocket + Redis pub-sub + new channel adapter + portal UI)
**Confidence:** HIGH

<user_constraints>
## User Constraints (from CONTEXT.md)

### Locked Decisions
- Dedicated `/chat` page (full-screen, not a floating widget)
- Left sidebar: conversation list grouped by agent, with timestamps and last message preview
- Right panel: active conversation with message bubbles (user right-aligned, agent left-aligned)
- "New Conversation" button opens an agent picker (shows agents the user has access to)
- Markdown rendering in agent messages
- Image/document display inline (consistent with Phase 2 media support)
- Typing indicator (animated dots) while waiting for agent response
- All three roles can chat: platform admin, customer admin, customer operator
- Users can only see/chat with agents belonging to tenants they have access to (RBAC)
- Platform admins can chat with any agent across all tenants
- Operators can chat (read-only restrictions do NOT apply to conversations)
- One conversation thread per user-agent pair (matches per-user per-agent memory model)
- Users can start new conversation (clears thread context) or continue existing one
- Conversation list sorted by most recent, paginated for long histories
- WebSocket connection for real-time, HTTP polling fallback if WebSocket unavailable
- Gateway receives web chat message, normalizes to KonstructMessage (channel: "web"), dispatches through existing pipeline
- Agent response pushed back via WebSocket
- New "web" channel adapter in gateway alongside Slack and WhatsApp
- channel_metadata includes: portal_user_id, tenant_id, conversation_id
- Tenant resolution from the authenticated session (not from channel metadata like Slack workspace ID)
- Outbound: push response via WebSocket connection keyed to conversation_id

### Claude's Discretion
- WebSocket library choice (native ws, Socket.IO, etc.)
- Message bubble visual design
- Conversation pagination strategy (infinite scroll vs load more)
- Whether to show tool invocation indicators in chat (e.g., "Searching knowledge base...")
- Agent avatar/icon in chat
- Sound notification on new message
- Mobile responsiveness approach

### Deferred Ideas (OUT OF SCOPE)
None raised.
</user_constraints>

<phase_requirements>
## Phase Requirements

| ID | Description | Research Support |
|----|-------------|-----------------|
| CHAT-01 | Users can open a chat window with any AI Employee and have a real-time conversation within the portal | WebSocket endpoint on FastAPI gateway + browser WebSocket client in portal chat page |
| CHAT-02 | Web chat supports full agent pipeline — memory, tools, escalation, and media | "web" channel added to ChannelType enum; handle_message Celery task already handles all pipeline stages; _send_response needs "web" case via Redis pub-sub |
| CHAT-03 | Conversation history persists and is visible when the user returns | New `web_conversations` and `web_conversation_messages` DB tables + pgvector already keyed per-user per-agent; history load on page visit |
| CHAT-04 | Chat respects RBAC — users can only chat with agents belonging to tenants they have access to | require_tenant_member FastAPI dependency already exists; new chat API endpoints use same pattern; platform_admin bypasses tenant check |
| CHAT-05 | Chat interface feels responsive — typing indicators, message streaming or fast response display | Typing indicator via WebSocket "typing" event immediately on message send; WebSocket pushes final response when Celery completes |
</phase_requirements>

---

## Summary

Phase 6 adds a web chat channel to the Konstruct portal — the first channel that originates inside the portal itself rather than from an external messaging platform. The architecture follows the same channel adapter pattern established in Phases 1 and 2: a new "web" adapter in the gateway normalizes portal messages into KonstructMessage format and dispatches them to the existing Celery pipeline. The key new infrastructure is a WebSocket endpoint on the gateway and a Redis pub-sub channel that bridges the Celery worker's response delivery back to the WebSocket connection.

The frontend is a new `/chat` route in the Next.js portal. It uses the native browser WebSocket API (no additional library required) with a React hook managing connection lifecycle. The UI requires one new shadcn/ui component not yet in the project (ScrollArea) and markdown rendering (react-markdown is not yet installed). Both are straightforward additions.

The most important constraint to keep in mind during planning: the Celery worker and the FastAPI gateway are separate processes. The Celery task cannot call back to the WebSocket connection directly. The correct pattern is Celery publishes the response to a Redis pub-sub channel; the gateway WebSocket handler subscribes to that channel and forwards to the browser. This Redis pub-sub bridge is the critical new piece that does not exist yet.

**Primary recommendation:** Use FastAPI native WebSocket + Redis pub-sub bridge for cross-process response delivery. No additional Python WebSocket libraries needed. Use native browser WebSocket API in the portal. Add react-markdown for markdown rendering.

---

## Standard Stack

### Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| FastAPI WebSocket | Built into fastapi[standard] 0.135.2 | WebSocket endpoint on gateway | Already installed, Starlette-native, zero new deps |
| redis.asyncio pub-sub | redis 5.0.0+ (already installed) | Bridge Celery response → WebSocket | Cross-process response delivery; already used everywhere in this codebase |
| Browser WebSocket API | Native (no library) | Portal WebSocket client | Works in all modern browsers, zero bundle cost |
| react-markdown | 9.x | Render agent markdown responses | Standard React markdown renderer; GFM and syntax highlighting are available through plugins |
| remark-gfm | 4.x | GitHub Flavored Markdown support | Tables, strikethrough, task lists in agent responses |

### Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| @radix-ui/react-scroll-area (via shadcn) | already available via @base-ui/react | Scrollable message container | Message list that auto-scrolls to bottom |
| lucide-react | already installed | Icons (typing dots, send button, agent avatar) | Already used throughout portal |

### Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Redis pub-sub bridge | Socket.IO | Socket.IO adds significant bundle weight and complexity; Redis is already used in this codebase (rate limiting, session, escalation) |
| Native browser WebSocket | socket.io-client | Same reason: unnecessary dependency when native WebSocket is sufficient |
| react-markdown | marked + dangerouslySetInnerHTML | react-markdown renders to React elements and does not inject raw HTML by default; marked requires XSS sanitization as a separate step |

**Installation:**
```bash
# Portal
cd packages/portal && npm install react-markdown remark-gfm

# Backend: no new dependencies needed
# FastAPI WebSocket is in fastapi[standard] already installed
# redis pub-sub is in redis 5.0.0 already installed
```

---

## Architecture Patterns

### Recommended Project Structure

New files added in this phase:

```
packages/
├── gateway/gateway/channels/
│   └── web.py                    # Web channel adapter + WebSocket endpoint + pub-sub subscriber
├── shared/shared/
│   ├── models/message.py         # Add ChannelType.WEB = "web"
│   ├── redis_keys.py             # Add webchat_response_key(tenant_id, conversation_id)
│   └── api/
│       └── chat.py               # REST API: list conversations, get history, create/reset
├── migrations/versions/
│   └── 008_web_chat.py           # web_conversations + web_conversation_messages tables
└── portal/
    ├── app/(dashboard)/chat/
    │   └── page.tsx              # Chat page (client component)
    ├── components/
    │   ├── chat-sidebar.tsx      # Conversation list sidebar
    │   ├── chat-window.tsx       # Active conversation + message bubbles
    │   ├── chat-message.tsx      # Single message bubble with markdown
    │   └── typing-indicator.tsx  # Animated dots
    └── lib/
        ├── api.ts                # Add chat API types + functions
        ├── queries.ts            # Add useConversations, useConversationHistory
        └── use-chat-socket.ts    # WebSocket lifecycle hook
```

### Pattern 1: Redis Pub-Sub Response Bridge

**What:** Celery task (separate process) completes LLM response and needs to push it to a WebSocket connection held by the gateway FastAPI process. Redis pub-sub is the standard cross-process channel.

**When to use:** Any time a background worker needs to push a result back to a long-lived connection.

**Flow:**
1. Browser sends message via WebSocket to gateway
2. Gateway dispatches `handle_message.delay(payload)` (identical to Slack/WhatsApp)
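3. Gateway subscribes to Redis channel `{tenant_id}:webchat:response:{conversation_id}` and waits. Redis pub-sub does not buffer, so the subscription should be in place before the worker can publish (ideally before step 2's dispatch); otherwise a fast response is silently dropped.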
4. Celery's `_send_response` for "web" channel publishes response to same Redis channel
5. Gateway receives pub-sub message, pushes to browser WebSocket
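
One ordering detail in this flow is worth checking: pub-sub delivers only to subscribers that exist at publish time. The sketch below illustrates that with a tiny in-memory stand-in. `InMemoryPubSub` is a hypothetical class written for this example, not the redis API, and the channel name is only a sample value.

```python
import asyncio
from collections import defaultdict


class InMemoryPubSub:
    """Hypothetical stand-in that mimics fire-and-forget pub-sub delivery."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel: str, data: str) -> int:
        """Deliver to current subscribers only; return how many received it."""
        queues = self._subscribers[channel]
        for queue in queues:
            queue.put_nowait(data)
        return len(queues)


CHANNEL = "t1:webchat:response:c1"  # sample value following the key pattern above


async def subscribe_then_publish() -> str:
    bus = InMemoryPubSub()
    queue = bus.subscribe(CHANNEL)
    bus.publish(CHANNEL, "hello")
    return await asyncio.wait_for(queue.get(), timeout=0.1)


async def publish_then_subscribe() -> "str | None":
    bus = InMemoryPubSub()
    bus.publish(CHANNEL, "hello")  # nobody is listening yet
    queue = bus.subscribe(CHANNEL)
    try:
        return await asyncio.wait_for(queue.get(), timeout=0.1)
    except asyncio.TimeoutError:
        return None  # the earlier message was dropped, not buffered


received = asyncio.run(subscribe_then_publish())
lost = asyncio.run(publish_then_subscribe())
```

In the first ordering the subscriber receives the message; in the second it times out, because nothing was queued for a subscriber that did not exist yet.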

**Example — gateway side:**
```python
# Source: redis.asyncio pub-sub docs + existing redis usage in this codebase
import asyncio

import redis.asyncio as aioredis
from fastapi import WebSocket

async def websocket_wait_for_response(
    ws: WebSocket,
    redis_url: str,
    response_channel: str,
    timeout: float = 60.0,
) -> None:
    """Subscribe to the response channel and forward the first message to the WebSocket.

    Raises TimeoutError if nothing arrives within `timeout` seconds.
    """
    # decode_responses=True so message["data"] is str, as ws.send_text() expects
    r = aioredis.from_url(redis_url, decode_responses=True)
    pubsub = r.pubsub()
    try:
        await pubsub.subscribe(response_channel)
        # Bound the wait; asyncio.timeout requires Python 3.11+
        async with asyncio.timeout(timeout):
            async for message in pubsub.listen():
                # Skip the subscribe confirmation; forward only real messages
                if message["type"] == "message":
                    await ws.send_text(message["data"])
                    return
    finally:
        await pubsub.unsubscribe(response_channel)
        await pubsub.aclose()
        await r.aclose()
```

**Example — Celery task side (in `_send_response`):**
```python
# Add "web" case to _send_response in orchestrator/tasks.py
elif channel_str == "web":
    conversation_id: str = extras.get("conversation_id", "") or ""
    tenant_id: str = extras.get("tenant_id", "") or ""
    if not conversation_id or not tenant_id:
        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
        return
    response_channel = webchat_response_key(tenant_id, conversation_id)
    publish_redis = aioredis.from_url(settings.redis_url)
    try:
        await publish_redis.publish(response_channel, json.dumps({
            "type": "response",
            "text": text,
            "conversation_id": conversation_id,
        }))
    finally:
        await publish_redis.aclose()
```

### Pattern 2: FastAPI WebSocket Endpoint

**What:** Native FastAPI WebSocket endpoint. Auth is validated from a JSON message sent by the client rather than from handshake headers (see the critical note below). Gateway already holds the Redis client at startup; WebSocket handler uses it.

**When to use:** Every web chat message from the portal browser.

```python
# Source: FastAPI WebSocket docs (verified: WebSocket import is in the fastapi package)
from fastapi import APIRouter, WebSocket, WebSocketDisconnect

# Router name matches the include_router() call in gateway main.py
chat_websocket_router = APIRouter()

@chat_websocket_router.websocket("/chat/ws/{conversation_id}")
async def chat_websocket(
    conversation_id: str,
    websocket: WebSocket,
) -> None:
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_json()
            # Validate auth fields from data["auth"]
            # Normalize to KonstructMessage, dispatch to Celery
            # Subscribe to Redis response channel
            # Push response back to websocket
    except WebSocketDisconnect:
        # Client went away; nothing to clean up in this skeleton
        pass
```

**Critical note:** Handshake headers are available server-side via `websocket.headers`, but the browser `WebSocket` constructor cannot set custom headers, so they are not a usable auth channel for the portal. The established pattern in this project is to send RBAC headers as `X-Portal-User-Id`, `X-Portal-User-Role`, `X-Portal-Tenant-Id`. For WebSocket, send the same fields in the JSON payload: either as a dedicated "auth" message immediately after the connection opens, or as an `auth` object on each message, which is what the Pattern 3 hook does. The handler should not process chat messages until these fields have been validated.
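
A minimal sketch of validating such an auth payload, assuming the dedicated first-message variant with `type: "auth"` and the `userId` / `role` / `tenantId` field names used by the portal hook. `PortalAuth`, `AuthMessageError`, `parse_auth_message`, and the exact role strings are hypothetical and should be aligned with the existing RBAC code.

```python
import uuid
from dataclasses import dataclass
from typing import Optional

# Assumed role strings, inferred from the three portal roles named in this document.
ALLOWED_ROLES = {"platform_admin", "customer_admin", "customer_operator"}


class AuthMessageError(ValueError):
    """Raised when the first WebSocket frame is not a valid auth message."""


@dataclass(frozen=True)
class PortalAuth:
    user_id: uuid.UUID
    role: str
    tenant_id: Optional[uuid.UUID]  # assumption: None only for platform admins


def parse_auth_message(frame: dict) -> PortalAuth:
    """Validate the auth frame; fields mirror the X-Portal-* headers."""
    if frame.get("type") != "auth":
        raise AuthMessageError("first message must have type 'auth'")
    role = frame.get("role")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        raise AuthMessageError(f"unknown role: {role!r}")
    try:
        user_id = uuid.UUID(str(frame.get("userId")))
        raw_tenant = frame.get("tenantId")
        tenant_id = uuid.UUID(str(raw_tenant)) if raw_tenant else None
    except ValueError as exc:
        raise AuthMessageError("userId and tenantId must be UUIDs") from exc
    if tenant_id is None and role != "platform_admin":
        raise AuthMessageError("tenantId is required for tenant-scoped roles")
    return PortalAuth(user_id=user_id, role=role, tenant_id=tenant_id)
```

A handler could call `parse_auth_message(await websocket.receive_json())` right after `accept()` and close the socket on `AuthMessageError`. Tenant membership would still need to be checked with the existing RBAC helpers.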

### Pattern 3: Browser WebSocket Hook

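**What:** React hook that manages WebSocket connection lifecycle (connect on mount, close on unmount or conversation switch, send/receive messages). The sketch below does not yet reconnect after an unexpected close, and because `onMessage` and `onTyping` are effect dependencies, callers should pass stable callbacks (e.g. wrapped in `useCallback`) to avoid reconnecting on every render.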

```typescript
// packages/portal/lib/use-chat-socket.ts
// Native browser WebSocket — no library needed
"use client";

import { useEffect, useRef, useCallback, useState } from "react";

interface ChatSocketOptions {
  conversationId: string;
  onMessage: (text: string) => void;
  onTyping: (isTyping: boolean) => void;
  authHeaders: { userId: string; role: string; tenantId: string | null };
}

export function useChatSocket({
  conversationId,
  onMessage,
  onTyping,
  authHeaders,
}: ChatSocketOptions) {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  const send = useCallback((text: string) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(JSON.stringify({
        type: "message",
        text,
        auth: authHeaders,
      }));
      onTyping(true);  // Show typing indicator immediately
    }
  }, [authHeaders, onTyping]);

  useEffect(() => {
    const wsUrl = `${process.env.NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`;
    const ws = new WebSocket(wsUrl);
    wsRef.current = ws;

    ws.onopen = () => setIsConnected(true);
    ws.onclose = () => setIsConnected(false);
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data as string);
      if (data.type === "response") {
        onTyping(false);
        onMessage(data.text as string);
      }
    };

    return () => ws.close();
  }, [conversationId, onMessage, onTyping]);

  return { send, isConnected };
}
```

### Pattern 4: Conversation Persistence (New DB Table)

**What:** Two tables, `web_conversations` and `web_conversation_messages`, that persist chat history so it is visible on return visits.

**When to use:** Every web chat message — store each turn in the DB.

```python
# New ORM model — migration 008
class WebConversation(Base):
    """Persistent conversation thread for portal web chat."""
    __tablename__ = "web_conversations"

    id: Mapped[uuid.UUID] = ...
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    agent_id: Mapped[uuid.UUID] = ...
    user_id: Mapped[uuid.UUID] = ...    # portal user UUID (from Auth.js session)
    created_at: Mapped[datetime] = ...
    updated_at: Mapped[datetime] = ...  # used for sort order

    __table_args__ = (
        UniqueConstraint("tenant_id", "agent_id", "user_id"),  # one thread per pair
    )


class WebConversationMessage(Base):
    """Individual message within a web conversation."""
    __tablename__ = "web_conversation_messages"

    id: Mapped[uuid.UUID] = ...
    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("web_conversations.id"))
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    role: Mapped[str] = ...             # "user" | "assistant"
    content: Mapped[str] = ...
    created_at: Mapped[datetime] = ...
```

**Note:** The `user_id` for web chat is the portal user's UUID from Auth.js — different from the Slack user ID string used in existing memory. The Redis memory key `memory:short:{agent_id}:{user_id}` will use the portal user's UUID string as `user_id`, keeping it compatible with the existing memory system.

### Pattern 5: Conversation REST API

**What:** REST endpoints for listing conversations, loading history, and resetting. This is separate from the WebSocket endpoint.

```
GET  /api/portal/chat/conversations?tenant_id={id}     — list all conversations for user
GET  /api/portal/chat/conversations/{id}/messages      — load history (paginated)
POST /api/portal/chat/conversations                    — create new or get-or-create
DELETE /api/portal/chat/conversations/{id}             — reset (delete messages, keep thread)
```
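
The history endpoint is described as paginated, and the pagination strategy is left open. As one possible shape, the sketch below pages backwards through a conversation using a `before` timestamp cursor. `StoredMessage` and `page_before` are hypothetical names, and the in-memory list stands in for a query against `web_conversation_messages`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredMessage:
    """Hypothetical row shape: role is "user" or "assistant"."""
    role: str
    content: str
    created_at: datetime


def page_before(
    messages: list,
    before: Optional[datetime],
    limit: int,
) -> tuple:
    """Return up to `limit` messages older than `before` (oldest first) and the next cursor.

    The cursor is the timestamp of the oldest message in the page, or None
    when there is nothing older left to load.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    older = [m for m in messages if before is None or m.created_at < before]
    older.sort(key=lambda m: m.created_at)
    page = older[-limit:]  # the newest `limit` messages among the older ones
    has_more = len(older) > len(page)
    next_cursor = page[0].created_at if has_more else None
    return page, next_cursor


# Five sample messages, one minute apart.
start = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
history = [
    StoredMessage("user" if i % 2 == 0 else "assistant", f"m{i}", start + timedelta(minutes=i))
    for i in range(5)
]

first_page, cursor = page_before(history, before=None, limit=2)
second_page, cursor2 = page_before(history, before=cursor, limit=2)
last_page, cursor3 = page_before(history, before=cursor2, limit=2)
```

Either UI choice (infinite scroll or a "load more" button) can drive a cursor like this; the client simply passes back the cursor it last received. This assumes message timestamps within a conversation are distinct.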

### Anti-Patterns to Avoid

- **Streaming token-by-token:** The requirements doc explicitly marks "Real-time token streaming in chat" as Out of Scope (consistent with Slack/WhatsApp — they don't support partial messages). The typing indicator shows while the full LLM call runs; the complete response arrives as one message.
- **WebSocket auth via URL query params:** Never put tokens/user IDs in the WebSocket URL. Use JSON message after connection.
- **Calling Celery result backend from WebSocket handler:** Celery result backends add latency and coupling. Use Redis pub-sub directly.
- **One WebSocket connection per page load (not per conversation):** The connection should be scoped per conversation_id so reconnect on conversation switch is clean.
- **Storing conversation history only in Redis:** Redis memory (sliding window) is the agent's working context. The DB `web_conversation_messages` table is what shows up when the user returns to the chat page. These are separate concerns.

---

## Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Markdown rendering | Custom regex parser | react-markdown + remark-gfm | Handles edge cases, escapes XSS, supports all GFM |
| WebSocket reconnection | Custom exponential backoff | Simple reconnect on close (sufficient for v1) | LLM calls are short; connections don't stay open for hours |
| Auth for WebSocket | Custom token scheme | Send auth as first JSON message using existing RBAC headers | Consistent with existing `X-Portal-*` header pattern |
| Cross-process response delivery | Shared memory / HTTP callback | Redis pub-sub | Redis is already in use; pub-sub is the correct pattern for the Celery → FastAPI bridge |

**Key insight:** The web channel adapter is the only genuinely new piece of infrastructure. Everything else — RBAC, memory, tool calling, escalation, audit — already works and processes messages tagged with any channel type. Adding `ChannelType.WEB = "web"` and a new `_send_response` branch is sufficient to wire the whole pipeline.

---

## Common Pitfalls

### Pitfall 1: WebSocket Auth — Browser API Limitation

**What goes wrong:** The browser's native `WebSocket` constructor does not support custom headers. Code that tries `new WebSocket(url, { headers: {...} })` never sends those headers, because the second argument is treated as a subprotocol value, not as an options object.

**Why it happens:** The browser WebSocket API accepts only a URL and an optional list of subprotocols; it offers no way to set request headers.

**How to avoid:** Send auth information as a JSON "auth" message immediately after connection opens. The FastAPI WebSocket handler should require this first message before processing any chat messages. This is established practice for browser WebSocket auth.

**Warning signs:** Tests that use a non-browser client which can set handshake headers (for example Starlette's `TestClient.websocket_connect(..., headers=...)`) pass, but the browser connection is rejected.

### Pitfall 2: Celery Sync Context in Async `_send_response`

**What goes wrong:** `_send_response` is an async function called from `asyncio.run()` inside the sync Celery task. Adding Redis pub-sub code there requires creating a new async Redis client per task, which is the existing pattern — but forgetting `await publish_redis.aclose()` leaks connections.

**Why it happens:** The "Celery tasks MUST be sync def" constraint (STATE.md) means we're always bridging sync→async via `asyncio.run()`. Every async resource must be explicitly closed.

**How to avoid:** Follow the existing pattern in `_process_message`: use `try/finally` around every `aioredis.from_url()` call to ensure `aclose()` always runs.

**Warning signs:** Redis connection count grows over time; "too many connections" errors in production.
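
The sync-to-async bridge with guaranteed cleanup can be sketched as follows. `FakeAsyncRedis` is a hypothetical stand-in used so the example runs without a Redis server; it only imitates the `publish` and `aclose` methods, and `handle_message_sync` is an illustrative name, not the real task.

```python
import asyncio
import json


class FakeAsyncRedis:
    """Hypothetical stand-in for a redis.asyncio client (illustration only)."""

    def __init__(self, fail_on_publish: bool = False) -> None:
        self.fail_on_publish = fail_on_publish
        self.closed = False
        self.published: list = []

    async def publish(self, channel: str, data: str) -> int:
        if self.fail_on_publish:
            raise ConnectionError("simulated publish failure")
        self.published.append((channel, data))
        return 1

    async def aclose(self) -> None:
        self.closed = True


async def _publish_response(client: FakeAsyncRedis, channel: str, text: str) -> None:
    try:
        await client.publish(channel, json.dumps({"type": "response", "text": text}))
    finally:
        # Runs on success and on failure, so the connection is always released.
        await client.aclose()


def handle_message_sync(client: FakeAsyncRedis, channel: str, text: str) -> None:
    """Sync entry point (as a Celery task would be) that bridges into async code."""
    asyncio.run(_publish_response(client, channel, text))


ok_client = FakeAsyncRedis()
handle_message_sync(ok_client, "t1:webchat:response:c1", "hello")

failing_client = FakeAsyncRedis(fail_on_publish=True)
try:
    handle_message_sync(failing_client, "t1:webchat:response:c1", "hello")
except ConnectionError:
    pass  # the error still propagates, but cleanup has already run
```

Both clients end up closed, including the one whose publish failed, which is the property that prevents the connection leak described above.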

### Pitfall 3: Conversation ID vs Thread ID Confusion

**What goes wrong:** The KonstructMessage `thread_id` field is used by the memory system to scope Redis sliding window. For web chat, `thread_id` should be the `conversation_id` (UUID) from the `web_conversations` table. If this is set incorrectly (e.g., to the portal user_id), all conversations for a user share one memory window.

**Why it happens:** Slack sets `thread_id` to `thread_ts` (string). WhatsApp sets it to `wa_id`. Web chat must set it to `conversation_id` (UUID string) — one distinct value per conversation.

**How to avoid:** The web channel normalizer should set `thread_id = conversation_id` in the KonstructMessage. The `user_id` for memory key construction comes from `sender.user_id` (portal user UUID string), so the Redis memory key is built from the same `tenant_id`, `agent_id`, and `user_id` components as for the other channels.
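
A minimal sketch of that normalization rule. `WebChatMessageSketch` is a hypothetical, simplified stand-in for KonstructMessage (the real model has more fields and a nested sender), and the signature of `normalize_web_event` is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class WebChatMessageSketch:
    """Hypothetical, flattened stand-in for KonstructMessage."""
    tenant_id: str
    channel: str
    thread_id: str
    sender_user_id: str
    text: str
    channel_metadata: dict
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def normalize_web_event(
    *,
    text: str,
    tenant_id: str,
    portal_user_id: str,
    conversation_id: str,
) -> WebChatMessageSketch:
    """Build a web-channel message; tenant_id comes from the authenticated session."""
    return WebChatMessageSketch(
        tenant_id=tenant_id,
        channel="web",
        thread_id=conversation_id,      # one distinct value per conversation
        sender_user_id=portal_user_id,  # portal user UUID string, not a thread key
        text=text,
        channel_metadata={
            "portal_user_id": portal_user_id,
            "tenant_id": tenant_id,
            "conversation_id": conversation_id,
        },
    )


msg_a = normalize_web_event(text="hi", tenant_id="t1", portal_user_id="u1", conversation_id="c1")
msg_b = normalize_web_event(text="hi", tenant_id="t1", portal_user_id="u1", conversation_id="c2")
```

The same user in two conversations yields two different `thread_id` values while `sender_user_id` stays the same, which is the distinction this pitfall is about.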

### Pitfall 4: New Conversation vs Continue — Race Condition

**What goes wrong:** User clicks "New Conversation" while a response is still in flight for the old conversation. The old conversation's pub-sub response arrives and updates the new conversation's state.

**Why it happens:** The WebSocket is keyed to `conversation_id`. Switching to a conversation with a different agent means a different `conversation_id`, and the old pub-sub subscription must be cleaned up before subscribing to the new one.

**How to avoid:** When the user switches conversations: (1) close/unmount the old WebSocket connection, (2) get or create the `web_conversations` row via the REST API, (3) connect a new WebSocket to that conversation_id. React's `useEffect` cleanup handles this naturally when `conversationId` changes. Note that with Pattern 4's unique constraint, resetting a thread for the same agent (`DELETE`) keeps the same `conversation_id`, so a response still in flight at reset time will arrive on the same channel and the client should be prepared to discard it.

### Pitfall 5: `ChannelType.WEB` Missing from DB CHECK Constraint

**What goes wrong:** Adding `WEB = "web"` to the Python `ChannelType` StrEnum does not automatically update the PostgreSQL CHECK constraint on the `channel_type` column. Existing data is fine, but inserting new records with `channel = "web"` fails at the DB level.

**Why it happens:** STATE.md documents the decision: "channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL." The CHECK constraint lists allowed values and must be updated via migration.

**How to avoid:** Migration 008 must ALTER the CHECK constraint on any affected tables to include `"web"`. Check which tables have `channel_type` constraints: `channel_connections` (stores active channel configs per tenant). The `conversation_embeddings` and audit tables use `TEXT` without CHECK, so only `channel_connections` needs the update.

**Warning signs:** `CheckViolation` error from PostgreSQL when the gateway tries to normalize a web message.
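
As a rough sketch of what the migration needs to do, the helper below builds the two SQL statements that replace the CHECK constraint. The constraint name `ck_channel_connections_channel_type` is a hypothetical placeholder; the real name must be looked up in the migration that created it. In an Alembic migration each statement would typically be passed to `op.execute(...)`.

```python
# Values mirror the ChannelType enum once WEB has been added.
CHANNEL_TYPES = [
    "slack",
    "whatsapp",
    "mattermost",
    "rocketchat",
    "teams",
    "telegram",
    "signal",
    "web",
]

# Hypothetical constraint name; check the existing migration for the real one.
CONSTRAINT_NAME = "ck_channel_connections_channel_type"


def check_constraint_statements(
    table: str,
    constraint: str,
    column: str,
    values: list,
) -> list:
    """Return SQL that drops the old CHECK constraint and recreates it with `values`."""
    quoted = ", ".join(f"'{value}'" for value in values)
    return [
        f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint}",
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} CHECK ({column} IN ({quoted}))",
    ]


statements = check_constraint_statements(
    "channel_connections", CONSTRAINT_NAME, "channel_type", CHANNEL_TYPES
)
```

The downgrade path would call the same helper with `"web"` removed from the list.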

### Pitfall 6: React 19 + Next.js 16 `use()` for Async Data

**What goes wrong:** Using `useState` + `useEffect` to fetch conversation history in a client component works but misses the React 19 preferred pattern.

**Why it happens:** React 19 introduces `use()` for Promises directly in components (TanStack Query handles this abstraction). The existing codebase already uses TanStack Query uniformly — don't break this pattern.

**How to avoid:** Add `useConversations` and `useConversationHistory` hooks in `queries.ts` following the existing pattern (e.g., `useAgents`, `useTenants`). Use `useQuery` from `@tanstack/react-query`.

---

## Code Examples

Verified patterns from existing codebase:

### Adding ChannelType.WEB to the enum
```python
# packages/shared/shared/models/message.py
# Source: existing file — add one line
class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"
    ROCKETCHAT = "rocketchat"
    TEAMS = "teams"
    TELEGRAM = "telegram"
    SIGNAL = "signal"
    WEB = "web"          # Add this line
```

### Adding webchat Redis key to redis_keys.py
```python
# packages/shared/shared/redis_keys.py
# Source: existing file pattern
def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
    """
    Redis pub-sub channel for web chat response delivery.

    Published by Celery task after LLM response; subscribed by WebSocket handler.
    """
    return f"{tenant_id}:webchat:response:{conversation_id}"
```

### Web channel extras in handle_message
```python
# packages/orchestrator/orchestrator/tasks.py
# Source: existing extras pattern (line 246-254)
# Add to handle_message alongside existing Slack/WhatsApp extras:
conversation_id: str = message_data.pop("conversation_id", "") or ""
portal_user_id: str = message_data.pop("portal_user_id", "") or ""

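# Add to extras dict (line 269-274).
# NOTE: the "web" branch of _send_response also reads extras["tenant_id"];
# that key must be present here too (its source in handle_message is not shown).
extras: dict[str, Any] = {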
    "placeholder_ts": placeholder_ts,
    "channel_id": channel_id,
    "phone_number_id": phone_number_id,
    "bot_token": bot_token,
    "wa_id": wa_id,
    "conversation_id": conversation_id,
    "portal_user_id": portal_user_id,
}
```

### TanStack Query hook pattern (follows existing)
```typescript
// packages/portal/lib/queries.ts
// Source: existing useAgents pattern
export function useConversations(tenantId: string) {
  return useQuery({
    queryKey: ["conversations", tenantId],
    queryFn: () => api.get<ConversationsResponse>(`/api/portal/chat/conversations?tenant_id=${tenantId}`),
    enabled: !!tenantId,
  });
}

export function useConversationHistory(conversationId: string) {
  return useQuery({
    queryKey: ["conversation-history", conversationId],
    queryFn: () => api.get<MessagesResponse>(`/api/portal/chat/conversations/${conversationId}/messages`),
    enabled: !!conversationId,
  });
}
```

### FastAPI WebSocket endpoint in gateway main.py
```python
# packages/gateway/gateway/main.py — add alongside existing routers
# Source: FastAPI WebSocket API (verified available in fastapi 0.135.2)
from gateway.channels.web import chat_websocket_router
app.include_router(chat_websocket_router)
```

### RBAC enforcement in chat REST API
```python
# packages/shared/shared/api/chat.py
# Source: existing pattern from rbac.py + portal.py
@router.get("/api/portal/chat/conversations")
async def list_conversations(
    tenant_id: UUID,
    caller: PortalCaller = Depends(get_portal_caller),
    session: AsyncSession = Depends(get_session),
) -> ConversationsResponse:
    await require_tenant_member(tenant_id, caller, session)
    # ... query web_conversations WHERE tenant_id = tenant_id AND user_id = caller.user_id
```

### proxy.ts: no change needed for /chat
```typescript
// packages/portal/proxy.ts
// Source: existing file — /chat must NOT be in CUSTOMER_OPERATOR_RESTRICTED
// Operators can chat (chatting IS the product)
// No change needed to proxy.ts — /chat is not in the restricted list
// Just add /chat to nav.tsx
```

---

## State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `middleware.ts` | `proxy.ts` (function named `proxy`) | Next.js 16 | Already migrated in this project — STATE.md confirms |
| `searchParams` prop read synchronously | `use(searchParams)` to unwrap Promise | Next.js 15 | Already applied in this project per STATE.md |
| `zodResolver` from hookform | `standardSchemaResolver` | hookform/resolvers v5 | Already applied — don't use zodResolver |
| `stripe.api_key = ...` | `stripe.StripeClient(api_key=...)` | stripe v14+ | Already applied: use the thread-safe client constructor |
| `Column()` SQLAlchemy | `mapped_column()` + `Mapped[]` | SQLAlchemy 2.0 | Already the pattern — use mapped_column |

**Deprecated/outdated:**
- `middleware.ts`: deprecated in Next.js 16, renamed to `proxy.ts`. Already done in this project.
- SQLAlchemy `sa.Enum` for channel_type: causes duplicate DDL — use TEXT + CHECK constraint (STATE.md decision).

---

## Open Questions

1. **HTTP Polling Fallback Scope**
   - What we know: CONTEXT.md specifies "fallback to HTTP polling if WebSocket unavailable"
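   - What's unclear: Is this needed for v1 given all modern browsers support WebSocket? Many WebSocket failures are network issues that polling would also fail on, though some proxies block the WebSocket upgrade while still passing plain HTTP, which is the case a fallback would cover.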
   - Recommendation: Implement WebSocket only for v1. Add a simple error state ("Connection lost — please refresh") instead of full polling fallback. Real polling fallback is significant complexity for an edge case.

2. **Media Upload in Web Chat**
   - What we know: CONTEXT.md says "image/document display inline (consistent with media support from Phase 2)." Phase 2 media goes through MinIO.
   - What's unclear: Can users upload media directly in web chat (browser file picker), or does "inline display" mean only displaying agent responses that contain media?
   - Recommendation: v1 — display media in agent responses (agent can return image URLs from MinIO/S3). User-to-agent file upload is a separate feature. The KonstructMessage already supports MediaAttachment; the web normalizer can include media from agent tool results.

3. **Agent Selection Scope for Platform Admins**
   - What we know: Platform admins can chat with "any agent across all tenants."
   - What's unclear: The agent picker UI — does a platform admin see all agents grouped by tenant, or do they first pick a tenant then pick an agent?
   - Recommendation: Use the existing tenant switcher pattern from the agents page: platform admin sees agents grouped by tenant in the sidebar. This reuses `useTenants()` + `useAgents(tenantId)` pattern already in the agents list page.

---

## Validation Architecture

### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest 8.3.0 + pytest-asyncio 0.25.0 |
| Config file | `pyproject.toml` (root) — `asyncio_mode = "auto"`, `testpaths = ["tests"]` |
| Quick run command | `pytest tests/unit/test_web_channel.py -x` |
| Full suite command | `pytest tests/unit -x` |

### Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| CHAT-01 | WebSocket endpoint accepts connection and dispatches to Celery | unit | `pytest tests/unit/test_web_channel.py::test_websocket_dispatches_to_celery -x` | ❌ Wave 0 |
| CHAT-01 | Web channel normalizer produces valid KonstructMessage | unit | `pytest tests/unit/test_web_channel.py::test_normalize_web_event -x` | ❌ Wave 0 |
| CHAT-02 | `_send_response` for "web" channel publishes to Redis pub-sub | unit | `pytest tests/unit/test_web_channel.py::test_send_response_web_publishes_to_redis -x` | ❌ Wave 0 |
| CHAT-03 | Conversation history REST endpoint returns paginated messages | unit | `pytest tests/unit/test_chat_api.py::test_list_conversation_history -x` | ❌ Wave 0 |
| CHAT-04 | Chat API returns 403 for user not member of tenant | unit | `pytest tests/unit/test_chat_api.py::test_chat_rbac_enforcement -x` | ❌ Wave 0 |
| CHAT-04 | Platform admin can access agents across all tenants | unit | `pytest tests/unit/test_chat_api.py::test_platform_admin_cross_tenant -x` | ❌ Wave 0 |
| CHAT-05 | Typing indicator message sent immediately on WebSocket receive | unit | `pytest tests/unit/test_web_channel.py::test_typing_indicator_sent -x` | ❌ Wave 0 |

### Sampling Rate
- **Per task commit:** `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
- **Per wave merge:** `pytest tests/unit -x`
- **Phase gate:** Full suite green before `/gsd:verify-work`

### Wave 0 Gaps
- [ ] `tests/unit/test_web_channel.py` — covers CHAT-01, CHAT-02, CHAT-05
- [ ] `tests/unit/test_chat_api.py` — covers CHAT-03, CHAT-04

---

## Sources

### Primary (HIGH confidence)
- Existing codebase — `packages/gateway/gateway/channels/slack.py`, `whatsapp.py`, `normalize.py` — channel adapter pattern directly replicated
- Existing codebase — `packages/orchestrator/orchestrator/tasks.py` — `_send_response` extension point verified by reading full source
- Existing codebase — `packages/shared/shared/models/message.py` — ChannelType enum verified, "web" not yet present
- Existing codebase — `packages/shared/shared/redis_keys.py` — key naming convention verified
- Existing codebase — `packages/shared/shared/api/rbac.py` — `require_tenant_member`, `get_portal_caller` pattern verified
- FastAPI source — `fastapi` 0.135.2 installed, `from fastapi import WebSocket` verified importable
- redis.asyncio — version 5.0.0+ installed, pub-sub available (`r.pubsub()` verified importable)
- Next.js 16 bundled docs — `packages/portal/node_modules/next/dist/docs/` — proxy.ts naming, `use(searchParams)` patterns confirmed
- `packages/portal/package.json` — Next.js 16.2.1, React 19.2.4, confirmed packages

### Secondary (MEDIUM confidence)
- `.planning/STATE.md` — all architecture decisions (channel_type TEXT+CHECK, Celery sync-only, hookform resolver, proxy.ts naming) verified against actual files
- react-markdown 9.x + remark-gfm 4.x — current stable versions for React 19 compatibility (not yet installed, based on known package state)

### Tertiary (LOW confidence)
- None — all claims verified against codebase or installed package docs

---

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH — all backend packages verified installed and importable; portal packages verified via package.json
- Architecture: HIGH — channel adapter pattern, extras dict pattern, RBAC pattern all verified by reading actual source files
- Pitfalls: HIGH — most pitfalls derive directly from STATE.md documented decisions (CHECK constraint, Celery sync, browser WebSocket header limitation)

**Research date:** 2026-03-25
**Valid until:** 2026-04-25 (stable stack; react-markdown version should be re-checked if planning is delayed)