docs(06): research web chat phase — WebSocket, Redis pub-sub, channel adapter, portal UI
628
.planning/phases/06-web-chat/06-RESEARCH.md
Normal file
@@ -0,0 +1,628 @@
|
||||
# Phase 6: Web Chat - Research
|
||||
|
||||
**Researched:** 2026-03-25
|
||||
**Domain:** Real-time web chat (WebSocket + Redis pub-sub + new channel adapter + portal UI)
|
||||
**Confidence:** HIGH
|
||||
|
||||
<user_constraints>
|
||||
## User Constraints (from CONTEXT.md)
|
||||
|
||||
### Locked Decisions
|
||||
- Dedicated `/chat` page (full-screen, not a floating widget)
|
||||
- Left sidebar: conversation list grouped by agent, with timestamps and last message preview
|
||||
- Right panel: active conversation with message bubbles (user right-aligned, agent left-aligned)
|
||||
- "New Conversation" button opens an agent picker (shows agents the user has access to)
|
||||
- Markdown rendering in agent messages
|
||||
- Image/document display inline (consistent with Phase 2 media support)
|
||||
- Typing indicator (animated dots) while waiting for agent response
|
||||
- All three roles can chat: platform admin, customer admin, customer operator
|
||||
- Users can only see/chat with agents belonging to tenants they have access to (RBAC)
|
||||
- Platform admins can chat with any agent across all tenants
|
||||
- Operators can chat (read-only restrictions do NOT apply to conversations)
|
||||
- One conversation thread per user-agent pair (matches per-user per-agent memory model)
|
||||
- Users can start new conversation (clears thread context) or continue existing one
|
||||
- Conversation list sorted by most recent, paginated for long histories
|
||||
- WebSocket connection for real-time, HTTP polling fallback if WebSocket unavailable
|
||||
- Gateway receives web chat message, normalizes to KonstructMessage (channel: "web"), dispatches through existing pipeline
|
||||
- Agent response pushed back via WebSocket
|
||||
- New "web" channel adapter in gateway alongside Slack and WhatsApp
|
||||
- channel_metadata includes: portal_user_id, tenant_id, conversation_id
|
||||
- Tenant resolution from the authenticated session (not from channel metadata like Slack workspace ID)
|
||||
- Outbound: push response via WebSocket connection keyed to conversation_id
|
||||
|
||||
### Claude's Discretion
|
||||
- WebSocket library choice (native ws, Socket.IO, etc.)
|
||||
- Message bubble visual design
|
||||
- Conversation pagination strategy (infinite scroll vs load more)
|
||||
- Whether to show tool invocation indicators in chat (e.g., "Searching knowledge base...")
|
||||
- Agent avatar/icon in chat
|
||||
- Sound notification on new message
|
||||
- Mobile responsiveness approach
|
||||
|
||||
### Deferred Ideas (OUT OF SCOPE)
|
||||
None raised.
|
||||
</user_constraints>
|
||||
|
||||
<phase_requirements>
|
||||
## Phase Requirements
|
||||
|
||||
| ID | Description | Research Support |
|----|-------------|------------------|
| CHAT-01 | Users can open a chat window with any AI Employee and have a real-time conversation within the portal | WebSocket endpoint on FastAPI gateway + browser WebSocket client in portal chat page |
| CHAT-02 | Web chat supports full agent pipeline: memory, tools, escalation, and media | "web" channel added to ChannelType enum; handle_message Celery task already handles all pipeline stages; _send_response needs "web" case via Redis pub-sub |
| CHAT-03 | Conversation history persists and is visible when the user returns | New `web_conversations` and `web_conversation_messages` DB tables; pgvector memory already keyed per-user per-agent; history loaded on page visit |
| CHAT-04 | Chat respects RBAC: users can only chat with agents belonging to tenants they have access to | require_tenant_member FastAPI dependency already exists; new chat API endpoints use same pattern; platform_admin bypasses tenant check |
| CHAT-05 | Chat interface feels responsive: typing indicators, message streaming or fast response display | Typing indicator via WebSocket "typing" event immediately on message send; WebSocket pushes final response when Celery completes |
|
||||
</phase_requirements>
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
Phase 6 adds a web chat channel to the Konstruct portal — the first channel that originates inside the portal itself rather than from an external messaging platform. The architecture follows the same channel adapter pattern established in Phases 1 and 2: a new "web" adapter in the gateway normalizes portal messages into KonstructMessage format and dispatches them to the existing Celery pipeline. The key new infrastructure is a WebSocket endpoint on the gateway and a Redis pub-sub channel that bridges the Celery worker's response delivery back to the WebSocket connection.
|
||||
|
||||
The frontend is a new `/chat` route in the Next.js portal. It uses the native browser WebSocket API (no additional library required) with a React hook managing connection lifecycle. The UI requires one new shadcn/ui component not yet in the project (ScrollArea) and markdown rendering (react-markdown is not yet installed). Both are straightforward additions.
|
||||
|
||||
The most important constraint to keep in mind during planning: the Celery worker and the FastAPI gateway are separate processes. The Celery task cannot call back to the WebSocket connection directly. The correct pattern is Celery publishes the response to a Redis pub-sub channel; the gateway WebSocket handler subscribes to that channel and forwards to the browser. This Redis pub-sub bridge is the critical new piece that does not exist yet.
|
||||
|
||||
**Primary recommendation:** Use FastAPI native WebSocket + Redis pub-sub bridge for cross-process response delivery. No additional Python WebSocket libraries needed. Use native browser WebSocket API in the portal. Add react-markdown for markdown rendering.
|
||||
|
||||
---
|
||||
|
||||
## Standard Stack
|
||||
|
||||
### Core
|
||||
|
||||
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| FastAPI WebSocket | Built into fastapi[standard] 0.135.2 | WebSocket endpoint on gateway | Already installed, Starlette-native, zero new deps |
| redis.asyncio pub-sub | redis 5.0.0+ (already installed) | Bridge Celery response → WebSocket | Cross-process response delivery; the redis client is already used throughout this codebase, so pub-sub adds no new dependency |
| Browser WebSocket API | Native (no library) | Portal WebSocket client | Works in all modern browsers, zero bundle cost |
| react-markdown | 9.x | Render agent markdown responses | Standard React markdown renderer; GFM and syntax highlighting are available through plugins |
| remark-gfm | 4.x | GitHub Flavored Markdown support | Tables, strikethrough, task lists in agent responses |
|
||||
|
||||
### Supporting
|
||||
|
||||
| Library | Version | Purpose | When to Use |
|
||||
|---------|---------|---------|-------------|
|
||||
| @radix-ui/react-scroll-area (via shadcn) | already available via @base-ui/react | Scrollable message container | Message list that auto-scrolls to bottom |
|
||||
| lucide-react | already installed | Icons (typing dots, send button, agent avatar) | Already used throughout portal |
|
||||
|
||||
### Alternatives Considered
|
||||
|
||||
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Redis pub-sub bridge | Socket.IO | Socket.IO adds significant bundle weight and complexity; Redis is already used in this codebase (rate limiting, session, escalation) |
| Native browser WebSocket | socket.io-client | Same reason: unnecessary dependency when the native WebSocket API is sufficient |
| react-markdown | marked + dangerouslySetInnerHTML | react-markdown renders to React elements and does not inject raw HTML by default; marked requires XSS sanitization as a separate step |
|
||||
|
||||
**Installation:**
|
||||
```bash
# Portal
cd packages/portal && npm install react-markdown remark-gfm

# Backend: no new dependencies needed
# FastAPI WebSocket support ships with fastapi[standard] (already installed)
# redis.asyncio pub-sub ships with redis 5.0.0+ (already installed)
```
|
||||
|
||||
---
|
||||
|
||||
## Architecture Patterns
|
||||
|
||||
### Recommended Project Structure
|
||||
|
||||
New files added in this phase:
|
||||
|
||||
```
packages/
├── gateway/gateway/channels/
│   └── web.py                     # Web channel adapter + WebSocket endpoint + pub-sub subscriber
├── shared/shared/
│   ├── models/message.py          # Add ChannelType.WEB = "web"
│   ├── redis_keys.py              # Add webchat_response_key(tenant_id, conversation_id)
│   └── api/
│       └── chat.py                # REST API: list conversations, get history, create/reset
├── migrations/versions/
│   └── 008_web_chat.py            # web_conversations + web_conversation_messages tables
└── packages/portal/
    ├── app/(dashboard)/chat/
    │   └── page.tsx               # Chat page (client component)
    ├── components/
    │   ├── chat-sidebar.tsx       # Conversation list sidebar
    │   ├── chat-window.tsx        # Active conversation + message bubbles
    │   ├── chat-message.tsx       # Single message bubble with markdown
    │   └── typing-indicator.tsx   # Animated dots
    └── lib/
        ├── api.ts                 # Add chat API types + functions
        ├── queries.ts             # Add useConversations, useConversationHistory
        └── use-chat-socket.ts     # WebSocket lifecycle hook
```
|
||||
|
||||
### Pattern 1: Redis Pub-Sub Response Bridge
|
||||
|
||||
**What:** Celery task (separate process) completes LLM response and needs to push it to a WebSocket connection held by the gateway FastAPI process. Redis pub-sub is the standard cross-process channel.
|
||||
|
||||
**When to use:** Any time a background worker needs to push a result back to a long-lived connection.
|
||||
|
||||
**Flow:**
|
||||
1. Browser sends message via WebSocket to gateway
2. Gateway subscribes to Redis channel `{tenant_id}:webchat:response:{conversation_id}` (before dispatching, because pub-sub drops messages published while nobody is subscribed)
3. Gateway dispatches `handle_message.delay(payload)` (identical to Slack/WhatsApp) and waits on the subscription
4. Celery's `_send_response` for "web" channel publishes response to same Redis channel
5. Gateway receives pub-sub message, pushes to browser WebSocket
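
A minimal sketch of the bridge's ordering concern, using an in-memory stand-in rather than Redis. Pub-sub delivers only to subscribers that are already listening, so the sketch subscribes before it dispatches. `InMemoryPubSub`, `fake_worker`, and `send_and_wait` are hypothetical names for illustration and are not part of the codebase.

```python
from __future__ import annotations

import asyncio
import json
from collections import defaultdict


class InMemoryPubSub:
    """Toy stand-in for Redis pub-sub: only current subscribers receive a message."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue[str]]] = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel: str, payload: str) -> int:
        """Deliver to current subscribers and return how many received it."""
        queues = self._subscribers.get(channel, [])
        for queue in queues:
            queue.put_nowait(payload)
        return len(queues)


async def fake_worker(bus: InMemoryPubSub, channel: str, text: str) -> None:
    """Hypothetical stand-in for the Celery task publishing its result."""
    await asyncio.sleep(0.01)
    bus.publish(channel, json.dumps({"type": "response", "text": text}))


async def send_and_wait(
    bus: InMemoryPubSub, channel: str, text: str, timeout: float = 1.0
) -> dict:
    queue = bus.subscribe(channel)  # subscribe first
    worker = asyncio.create_task(fake_worker(bus, channel, text))  # then dispatch
    try:
        raw = await asyncio.wait_for(queue.get(), timeout=timeout)
    finally:
        await worker
    return json.loads(raw)
```

A message published to a channel with no subscriber is simply lost (`publish` returns 0), which is the failure the ordering avoids.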
|
||||
|
||||
**Example — gateway side:**
|
||||
```python
# Source: redis.asyncio pub-sub docs + existing redis usage in this codebase
import asyncio

import redis.asyncio as aioredis
from fastapi import WebSocket


async def websocket_wait_for_response(
    ws: WebSocket,
    redis_url: str,
    response_channel: str,
    timeout: float = 60.0,
) -> None:
    """Subscribe to response channel and forward the first message to the WebSocket.

    Raises asyncio.TimeoutError if nothing arrives within `timeout` seconds.
    """
    # decode_responses=True so message["data"] is str (send_text rejects bytes)
    r = aioredis.from_url(redis_url, decode_responses=True)
    pubsub = r.pubsub()

    async def _forward_first_message() -> None:
        async for message in pubsub.listen():
            if message["type"] == "message":
                await ws.send_text(message["data"])
                return

    try:
        await pubsub.subscribe(response_channel)
        # Bound the wait so a lost response cannot hang the handler
        await asyncio.wait_for(_forward_first_message(), timeout=timeout)
    finally:
        await pubsub.unsubscribe(response_channel)
        await pubsub.aclose()
        await r.aclose()
```
|
||||
|
||||
**Example — Celery task side (in `_send_response`):**
|
||||
```python
# Add "web" case to _send_response in orchestrator/tasks.py
# (fragment: continues the existing if/elif chain on channel_str)
elif channel_str == "web":
    conversation_id: str = extras.get("conversation_id", "") or ""
    tenant_id: str = extras.get("tenant_id", "") or ""
    if not conversation_id or not tenant_id:
        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
        return
    response_channel = webchat_response_key(tenant_id, conversation_id)
    publish_redis = aioredis.from_url(settings.redis_url)
    try:
        await publish_redis.publish(response_channel, json.dumps({
            "type": "response",
            "text": text,
            "conversation_id": conversation_id,
        }))
    finally:
        await publish_redis.aclose()
```
|
||||
|
||||
### Pattern 2: FastAPI WebSocket Endpoint
|
||||
|
||||
**What:** Native FastAPI WebSocket endpoint with auth validated from the message payload rather than from handshake headers. Gateway already holds the Redis client at startup; WebSocket handler uses it.
|
||||
|
||||
**When to use:** Every web chat message from the portal browser.
|
||||
|
||||
```python
# Source: FastAPI WebSocket docs (verified: WebSocket import is in fastapi package)
# `app` is the gateway's existing FastAPI instance
from fastapi import WebSocket, WebSocketDisconnect


@app.websocket("/chat/ws/{conversation_id}")
async def chat_websocket(
    conversation_id: str,
    websocket: WebSocket,
) -> None:
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_json()
            # Validate auth fields from data["auth"]
            # Normalize to KonstructMessage; subscribe to the Redis response
            # channel and dispatch to Celery
            # Push response back to websocket
    except WebSocketDisconnect:
        pass
```
|
||||
|
||||
**Critical note:** On the server, handshake headers are available via `websocket.headers`, but the browser `WebSocket` constructor cannot set custom headers, so they cannot carry auth from the portal. The established pattern in this project is to send RBAC headers as `X-Portal-User-Id`, `X-Portal-User-Role`, `X-Portal-Tenant-Id`. For WebSocket, send the same values as a JSON "auth" message immediately after connection instead.
|
||||
|
||||
### Pattern 3: Browser WebSocket Hook
|
||||
|
||||
**What:** React hook that manages WebSocket connection lifecycle (connect on mount, reconnect on disconnect, send/receive messages).
|
||||
|
||||
```typescript
// packages/portal/lib/use-chat-socket.ts
// Native browser WebSocket: no library needed
"use client";

import { useEffect, useRef, useCallback, useState } from "react";

interface ChatSocketOptions {
  conversationId: string;
  onMessage: (text: string) => void;
  onTyping: (isTyping: boolean) => void;
  authHeaders: { userId: string; role: string; tenantId: string | null };
}

export function useChatSocket({
  conversationId,
  onMessage,
  onTyping,
  authHeaders,
}: ChatSocketOptions) {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  const send = useCallback((text: string) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(JSON.stringify({
        type: "message",
        text,
        auth: authHeaders,
      }));
      onTyping(true); // Show typing indicator immediately
    }
  }, [authHeaders, onTyping]);

  // Callers should pass stable (memoized) onMessage/onTyping callbacks;
  // otherwise this effect tears down and reopens the socket on every render.
  useEffect(() => {
    const wsUrl = `${process.env.NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`;
    const ws = new WebSocket(wsUrl);
    wsRef.current = ws;

    ws.onopen = () => setIsConnected(true);
    ws.onclose = () => setIsConnected(false);
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data as string);
      if (data.type === "response") {
        onTyping(false);
        onMessage(data.text as string);
      }
    };

    return () => ws.close();
  }, [conversationId, onMessage, onTyping]);

  return { send, isConnected };
}
```
|
||||
|
||||
### Pattern 4: Conversation Persistence (New DB Table)
|
||||
|
||||
**What:** A `web_conversations` table plus a `web_conversation_messages` table to persist chat history visible on return visits.
|
||||
|
||||
**When to use:** Every web chat message — store each turn in the DB.
|
||||
|
||||
```python
# New ORM models: migration 008
# `Base` is the project's existing declarative base; `...` marks column
# definitions intentionally left open here.
import uuid
from datetime import datetime

from sqlalchemy import ForeignKey, UniqueConstraint
from sqlalchemy.orm import Mapped, mapped_column


class WebConversation(Base):
    """Persistent conversation thread for portal web chat."""

    __tablename__ = "web_conversations"

    id: Mapped[uuid.UUID] = ...
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    agent_id: Mapped[uuid.UUID] = ...
    user_id: Mapped[uuid.UUID] = ...  # portal user UUID (from Auth.js session)
    created_at: Mapped[datetime] = ...
    updated_at: Mapped[datetime] = ...  # used for sort order

    __table_args__ = (
        UniqueConstraint("tenant_id", "agent_id", "user_id"),  # one thread per user-agent pair
    )


class WebConversationMessage(Base):
    """Individual message within a web conversation."""

    __tablename__ = "web_conversation_messages"

    id: Mapped[uuid.UUID] = ...
    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("web_conversations.id"))
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    role: Mapped[str] = ...  # "user" | "assistant"
    content: Mapped[str] = ...
    created_at: Mapped[datetime] = ...
```
|
||||
|
||||
**Note:** The `user_id` for web chat is the portal user's UUID from Auth.js — different from the Slack user ID string used in existing memory. The Redis memory key `memory:short:{agent_id}:{user_id}` will use the portal user's UUID string as `user_id`, keeping it compatible with the existing memory system.
|
||||
|
||||
### Pattern 5: Conversation REST API
|
||||
|
||||
**What:** REST endpoints for listing conversations, loading history, and resetting. This is separate from the WebSocket endpoint.
|
||||
|
||||
```
GET    /api/portal/chat/conversations?tenant_id={id}    # list all conversations for user
GET    /api/portal/chat/conversations/{id}/messages     # load history (paginated)
POST   /api/portal/chat/conversations                   # create new or get-or-create
DELETE /api/portal/chat/conversations/{id}              # reset (delete messages, keep thread)
```
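
The "load history (paginated)" endpoint leaves the pagination strategy open. One possible shape is keyset pagination on `created_at`, sketched here over an in-memory list. `StoredMessage`, `page_of_history`, and the `limit`/`before` parameters are assumptions for illustration, not the project's API, and the sketch assumes timestamps are unique within a conversation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredMessage:
    """Hypothetical reduced row of web_conversation_messages."""

    id: int
    role: str
    content: str
    created_at: datetime


def page_of_history(
    messages: list[StoredMessage],
    *,
    limit: int = 50,
    before: datetime | None = None,
) -> tuple[list[StoredMessage], datetime | None]:
    """Return (page, next_cursor): a newest-first page of messages older than `before`.

    next_cursor is the created_at of the last item when more rows remain, else None.
    """
    older = [m for m in messages if before is None or m.created_at < before]
    older.sort(key=lambda m: m.created_at, reverse=True)
    page = older[:limit]
    next_cursor = page[-1].created_at if len(older) > limit else None
    return page, next_cursor
```

The client would pass the returned cursor back as `before` to fetch the next older page, which works for either "load more" or infinite scroll.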
|
||||
|
||||
### Anti-Patterns to Avoid
|
||||
|
||||
- **Streaming token-by-token:** The requirements doc explicitly marks "Real-time token streaming in chat" as Out of Scope (consistent with Slack/WhatsApp — they don't support partial messages). The typing indicator shows while the full LLM call runs; the complete response arrives as one message.
|
||||
- **WebSocket auth via URL query params:** Never put tokens/user IDs in the WebSocket URL. Use JSON message after connection.
|
||||
- **Calling Celery result backend from WebSocket handler:** Celery result backends add latency and coupling. Use Redis pub-sub directly.
|
||||
- **One WebSocket connection per page load (not per conversation):** The connection should be scoped per conversation_id so reconnect on conversation switch is clean.
|
||||
- **Storing conversation history only in Redis:** Redis memory (sliding window) is the agent's working context. The DB `web_conversation_messages` table is what shows up when the user returns to the chat page. These are separate concerns.
|
||||
|
||||
---
|
||||
|
||||
## Don't Hand-Roll
|
||||
|
||||
| Problem | Don't Build | Use Instead | Why |
|
||||
|---------|-------------|-------------|-----|
|
||||
| Markdown rendering | Custom regex parser | react-markdown + remark-gfm | Handles edge cases, escapes XSS, supports all GFM |
|
||||
| WebSocket reconnection | Custom exponential backoff | Simple reconnect on close (sufficient for v1) | LLM calls are short; connections don't stay open for hours |
|
||||
| Auth for WebSocket | Custom token scheme | Send auth as first JSON message using existing RBAC headers | Consistent with existing `X-Portal-*` header pattern |
|
||||
| Cross-process response delivery | Shared memory / HTTP callback | Redis pub-sub | Already in use; correct pattern for Celery → FastAPI bridge |
|
||||
|
||||
**Key insight:** The web channel adapter is the only genuinely new piece of infrastructure. Everything else — RBAC, memory, tool calling, escalation, audit — already works and processes messages tagged with any channel type. Adding `ChannelType.WEB = "web"` and a new `_send_response` branch is sufficient to wire the whole pipeline.
|
||||
|
||||
---
|
||||
|
||||
## Common Pitfalls
|
||||
|
||||
### Pitfall 1: WebSocket Auth — Browser API Limitation
|
||||
|
||||
**What goes wrong:** The browser's native `WebSocket` constructor does not support custom headers. Code that tries `new WebSocket(url, { headers: {...} })` fails silently or raises a TypeError.
|
||||
|
||||
**Why it happens:** The WebSocket spec only allows specifying subprotocols as the second argument, not headers. This is a deliberate browser security decision.
|
||||
|
||||
**How to avoid:** Send auth information as a JSON "auth" message immediately after connection opens. The FastAPI WebSocket handler should require this first message before processing any chat messages. This is established practice for browser WebSocket auth.
|
||||
|
||||
**Warning signs:** Tests that connect through a Python test client (for example Starlette's `TestClient.websocket_connect`, which can set handshake headers) pass, but the browser connection is rejected.
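
A minimal sketch of the "auth as first message" gate, written as a pure function so it can be tested without a live socket. `SocketSession`, `handle_frame`, the frame shapes, and the role strings in `KNOWN_ROLES` are assumptions for illustration; the real role identifiers live in the project's RBAC code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

# Assumed role identifiers, not confirmed against the project's RBAC module.
KNOWN_ROLES = {"platform_admin", "customer_admin", "customer_operator"}


@dataclass
class SocketSession:
    """Per-connection state kept by a hypothetical WebSocket handler."""

    authenticated: bool = False
    user_id: str | None = None
    role: str | None = None
    tenant_id: str | None = None


def handle_frame(session: SocketSession, frame: dict[str, Any]) -> dict[str, Any]:
    """Return the reply frame for one inbound JSON frame."""
    if not session.authenticated:
        # Nothing is processed until a valid auth frame has been seen.
        if frame.get("type") != "auth":
            return {"type": "error", "code": "auth_required"}
        user_id, role = frame.get("userId"), frame.get("role")
        if not user_id or role not in KNOWN_ROLES:
            return {"type": "error", "code": "invalid_auth"}
        session.authenticated = True
        session.user_id = user_id
        session.role = role
        session.tenant_id = frame.get("tenantId")
        return {"type": "auth_ok"}
    if frame.get("type") == "message":
        # A real handler would dispatch to Celery here.
        return {"type": "typing"}
    return {"type": "error", "code": "unknown_type"}
```

Keeping the gate separate from the socket loop means the rejection path can be unit-tested with plain dicts.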
|
||||
|
||||
### Pitfall 2: Celery Sync Context in Async `_send_response`
|
||||
|
||||
**What goes wrong:** `_send_response` is an async function called from `asyncio.run()` inside the sync Celery task. Adding Redis pub-sub code there requires creating a new async Redis client per task, which is the existing pattern — but forgetting `await publish_redis.aclose()` leaks connections.
|
||||
|
||||
**Why it happens:** The "Celery tasks MUST be sync def" constraint (STATE.md) means we're always bridging sync→async via `asyncio.run()`. Every async resource must be explicitly closed.
|
||||
|
||||
**How to avoid:** Follow the existing pattern in `_process_message`: use `try/finally` around every `aioredis.from_url()` call to ensure `aclose()` always runs.
|
||||
|
||||
**Warning signs:** Redis connection count grows over time; "too many connections" errors in production.
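
A small sketch of why the `try/finally` matters when a sync task bridges into async code with `asyncio.run()`. `FakeAsyncRedis` is a hypothetical stand-in that only records whether it was closed; it is not the redis client API beyond the `publish`/`aclose` names used in this document.

```python
import asyncio


class FakeAsyncRedis:
    """Hypothetical stand-in for an async Redis client; records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False
        self.published = []

    async def publish(self, channel: str, payload: str) -> None:
        if not channel:
            raise ValueError("channel must not be empty")
        self.published.append((channel, payload))

    async def aclose(self) -> None:
        self.closed = True


async def publish_response(client: FakeAsyncRedis, channel: str, payload: str) -> None:
    try:
        await client.publish(channel, payload)
    finally:
        # Runs on success and on failure, so the connection is never leaked.
        await client.aclose()


def sync_task(client: FakeAsyncRedis, channel: str, payload: str) -> bool:
    """Sync entry point, mirroring a sync Celery task that bridges via asyncio.run()."""
    try:
        asyncio.run(publish_response(client, channel, payload))
        return True
    except ValueError:
        return False
```

The client is closed on both the success path and the error path, which is the property to check for in `_send_response`.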
|
||||
|
||||
### Pitfall 3: Conversation ID vs Thread ID Confusion
|
||||
|
||||
**What goes wrong:** The KonstructMessage `thread_id` field is used by the memory system to scope Redis sliding window. For web chat, `thread_id` should be the `conversation_id` (UUID) from the `web_conversations` table. If this is set incorrectly (e.g., to the portal user_id), all conversations for a user share one memory window.
|
||||
|
||||
**Why it happens:** Slack sets `thread_id` to `thread_ts` (string). WhatsApp sets it to `wa_id`. Web chat must set it to `conversation_id` (UUID string) — one distinct value per conversation.
|
||||
|
||||
**How to avoid:** The web channel normalizer should set `thread_id = conversation_id` in the KonstructMessage. The `user_id` for memory key construction comes from `sender.user_id` (portal user UUID string). The combination `tenant_id + agent_id + user_id` (Redis memory key) matches correctly.
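
A hedged sketch of the normalizer's field mapping. `SketchMessage` is a reduced, hypothetical stand-in for KonstructMessage (whose real fields may differ), and `normalize_web_event` is an illustrative name; the point is only that `thread_id` carries the conversation UUID while the sender carries the portal user UUID.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class SketchMessage:
    """Hypothetical, reduced stand-in for KonstructMessage."""

    channel: str
    tenant_id: str
    thread_id: str
    sender_user_id: str
    text: str
    channel_metadata: dict[str, str] = field(default_factory=dict)


def normalize_web_event(
    text: str,
    *,
    portal_user_id: str,
    tenant_id: str,
    conversation_id: str,
) -> SketchMessage:
    """Map a portal chat event onto the message shape used by the pipeline."""
    # Parsing validates the id; uuid.UUID raises ValueError on malformed input.
    conversation = str(uuid.UUID(conversation_id))
    return SketchMessage(
        channel="web",
        tenant_id=tenant_id,
        thread_id=conversation,  # one distinct value per conversation
        sender_user_id=portal_user_id,  # used for the per-user memory key
        text=text,
        channel_metadata={
            "portal_user_id": portal_user_id,
            "tenant_id": tenant_id,
            "conversation_id": conversation,
        },
    )
```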
|
||||
|
||||
### Pitfall 4: New Conversation vs Continue — Race Condition
|
||||
|
||||
**What goes wrong:** User clicks "New Conversation" while a response is still in flight for the old conversation. The old conversation's pub-sub response arrives and updates the new conversation's state.
|
||||
|
||||
**Why it happens:** The WebSocket is keyed to `conversation_id`. When the user resets the thread, a new `conversation_id` is created. The old pub-sub subscription must be cleaned up before subscribing to the new one.
|
||||
|
||||
**How to avoid:** When the user creates a new conversation: (1) close/unmount the old WebSocket connection, (2) create a new `web_conversations` row via REST API (getting a new UUID), (3) connect new WebSocket to the new conversation_id. React's `useEffect` cleanup handles this naturally when `conversationId` changes.
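
As a second line of defence, the receiver can drop any response whose `conversation_id` does not match the conversation currently on screen. The sketch below shows that check in Python for consistency with the backend examples; `accept_response` is a hypothetical helper, and the same comparison applies in whichever layer holds the active conversation id.

```python
import json
from typing import Optional


def accept_response(active_conversation_id: str, raw_frame: str) -> Optional[str]:
    """Return the response text if the frame belongs to the active conversation, else None."""
    frame = json.loads(raw_frame)
    if frame.get("type") != "response":
        return None
    if frame.get("conversation_id") != active_conversation_id:
        # Stale response from a conversation the user has already left.
        return None
    return frame.get("text")
```

This relies on the response payload carrying `conversation_id`, as in the `_send_response` example earlier in this document.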
|
||||
|
||||
### Pitfall 5: `ChannelType.WEB` Missing from DB CHECK Constraint
|
||||
|
||||
**What goes wrong:** Adding `WEB = "web"` to the Python `ChannelType` StrEnum does not automatically update the PostgreSQL CHECK constraint on the `channel_type` column. Existing data is fine, but inserting new records with `channel = "web"` fails at the DB level.
|
||||
|
||||
**Why it happens:** STATE.md documents the decision: "channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL." The CHECK constraint lists allowed values and must be updated via migration.
|
||||
|
||||
**How to avoid:** Migration 008 must ALTER the CHECK constraint on every affected table to include `'web'`. Only `channel_connections` (which stores active channel configs per tenant) has a `channel_type` CHECK constraint; `conversation_embeddings` and the audit tables use `TEXT` without CHECK, so `channel_connections` is the only table that needs the update.
|
||||
|
||||
**Warning signs:** `CheckViolation` error from PostgreSQL when a row with `channel_type = 'web'` is inserted.
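
The failure mode can be reproduced in isolation. The sketch below uses SQLite as a stand-in for PostgreSQL (both enforce CHECK constraints on insert); the table definition and the list of allowed values are assumptions for illustration, not the project's actual schema.

```python
import sqlite3


def open_db(allowed: tuple) -> sqlite3.Connection:
    """Create an in-memory table whose CHECK constraint permits only `allowed` values."""
    values = ", ".join(f"'{value}'" for value in allowed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE channel_connections ("
        " id INTEGER PRIMARY KEY,"
        f" channel_type TEXT NOT NULL CHECK (channel_type IN ({values}))"
        ")"
    )
    return conn


def try_insert(conn: sqlite3.Connection, channel_type: str) -> bool:
    """Return True if the row is accepted, False if the CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO channel_connections (channel_type) VALUES (?)",
            (channel_type,),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Constraint as it stands before the migration (assumed value list).
rejected = try_insert(open_db(("slack", "whatsapp")), "web")
# Constraint after the migration adds 'web'.
accepted = try_insert(open_db(("slack", "whatsapp", "web")), "web")
```

Updating the Python enum alone changes nothing at the database level; only the constraint change makes the insert succeed.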
|
||||
|
||||
### Pitfall 6: React 19 + Next.js 16 `use()` for Async Data
|
||||
|
||||
**What goes wrong:** Using `useState` + `useEffect` to fetch conversation history in a client component works but misses the React 19 preferred pattern.
|
||||
|
||||
**Why it happens:** React 19 introduces `use()` for Promises directly in components (TanStack Query handles this abstraction). The existing codebase already uses TanStack Query uniformly — don't break this pattern.
|
||||
|
||||
**How to avoid:** Add `useConversations` and `useConversationHistory` hooks in `queries.ts` following the existing pattern (e.g., `useAgents`, `useTenants`). Use `useQuery` from `@tanstack/react-query`.
|
||||
|
||||
---
|
||||
|
||||
## Code Examples
|
||||
|
||||
Verified patterns from existing codebase:
|
||||
|
||||
### Adding ChannelType.WEB to the enum
|
||||
```python
# packages/shared/shared/models/message.py
# Source: existing file; add one line
class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"
    ROCKETCHAT = "rocketchat"
    TEAMS = "teams"
    TELEGRAM = "telegram"
    SIGNAL = "signal"
    WEB = "web"  # Add this line
```
|
||||
|
||||
### Adding webchat Redis key to redis_keys.py
|
||||
```python
# packages/shared/shared/redis_keys.py
# Source: existing file pattern
def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
    """
    Redis pub-sub channel for web chat response delivery.

    Published by Celery task after LLM response; subscribed by WebSocket handler.
    """
    return f"{tenant_id}:webchat:response:{conversation_id}"
```
|
||||
|
||||
### Web channel extras in handle_message
|
||||
```python
# packages/orchestrator/orchestrator/tasks.py
# Source: existing extras pattern (line 246-254)
# Add to handle_message alongside existing Slack/WhatsApp extras:
conversation_id: str = message_data.pop("conversation_id", "") or ""
portal_user_id: str = message_data.pop("portal_user_id", "") or ""

# Add to extras dict (line 269-274):
extras: dict[str, Any] = {
    "placeholder_ts": placeholder_ts,
    "channel_id": channel_id,
    "phone_number_id": phone_number_id,
    "bot_token": bot_token,
    "wa_id": wa_id,
    "conversation_id": conversation_id,
    "portal_user_id": portal_user_id,
}
```
|
||||
|
||||
### TanStack Query hook pattern (follows existing)
|
||||
```typescript
// packages/portal/lib/queries.ts
// Source: existing useAgents pattern
export function useConversations(tenantId: string) {
  return useQuery({
    queryKey: ["conversations", tenantId],
    queryFn: () => api.get<ConversationsResponse>(`/api/portal/chat/conversations?tenant_id=${tenantId}`),
    enabled: !!tenantId,
  });
}

export function useConversationHistory(conversationId: string) {
  return useQuery({
    queryKey: ["conversation-history", conversationId],
    queryFn: () => api.get<MessagesResponse>(`/api/portal/chat/conversations/${conversationId}/messages`),
    enabled: !!conversationId,
  });
}
```
|
||||
|
||||
### FastAPI WebSocket endpoint in gateway main.py
|
||||
```python
# packages/gateway/gateway/main.py: add alongside existing routers
# Source: FastAPI WebSocket API (verified available in fastapi 0.135.2)
from gateway.channels.web import chat_websocket_router

app.include_router(chat_websocket_router)
```
|
||||
|
||||
### RBAC enforcement in chat REST API
|
||||
```python
# packages/shared/shared/api/chat.py
# Source: existing pattern from rbac.py + portal.py
@router.get("/api/portal/chat/conversations")
async def list_conversations(
    tenant_id: UUID,
    caller: PortalCaller = Depends(get_portal_caller),
    session: AsyncSession = Depends(get_session),
) -> ConversationsResponse:
    await require_tenant_member(tenant_id, caller, session)
    # ... query web_conversations WHERE tenant_id = tenant_id AND user_id = caller.user_id
```
|
||||
|
||||
### proxy.ts: no change needed for /chat

```typescript
// packages/portal/proxy.ts
// Source: existing file. /chat must NOT be in CUSTOMER_OPERATOR_RESTRICTED
// Operators can chat (chatting IS the product)
// No change needed to proxy.ts: /chat is not in the restricted list
// Just add /chat to nav.tsx
```
|
||||
|
||||
---
|
||||
|
||||
## State of the Art
|
||||
|
||||
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `middleware.ts` | `proxy.ts` (function named `proxy`) | Next.js 16 | Already migrated in this project (STATE.md confirms) |
| Synchronous `searchParams` page prop | `use(searchParams)` to unwrap Promise | Next.js 15 | Already applied in this project per STATE.md |
| `zodResolver` from hookform | `standardSchemaResolver` | hookform/resolvers v5 | Already applied; don't use zodResolver |
| `stripe.api_key = ...` | `StripeClient(api_key=...)` | stripe v14+ | Already applied; use thread-safe constructor |
| `Column()` SQLAlchemy | `mapped_column()` + `Mapped[]` | SQLAlchemy 2.0 | Already the pattern; use mapped_column |
|
||||
|
||||
**Deprecated/outdated:**
|
||||
- `middleware.ts`: deprecated in Next.js 16, renamed to `proxy.ts`. Already done in this project.
|
||||
- SQLAlchemy `sa.Enum` for channel_type: causes duplicate DDL — use TEXT + CHECK constraint (STATE.md decision).
|
||||
|
||||
---

## Open Questions

1. **HTTP Polling Fallback Scope**
   - What we know: CONTEXT.md specifies "fallback to HTTP polling if WebSocket unavailable".
   - What's unclear: Is this needed for v1, given that all modern browsers support WebSocket? A WebSocket failure typically indicates a network/proxy issue on which polling would also fail.
   - Recommendation: Implement WebSocket only for v1. Add a simple error state ("Connection lost, please refresh") instead of a full polling fallback. A real polling fallback is significant complexity for an edge case.

2. **Media Upload in Web Chat**
   - What we know: CONTEXT.md says "image/document display inline (consistent with media support from Phase 2)." Phase 2 media goes through MinIO.
   - What's unclear: Can users upload media directly in web chat (browser file picker), or does "inline display" mean only displaying agent responses that contain media?
   - Recommendation: For v1, display media in agent responses (the agent can return image URLs from MinIO/S3). User-to-agent file upload is a separate feature. The KonstructMessage already supports MediaAttachment; the web normalizer can include media from agent tool results.

3. **Agent Selection Scope for Platform Admins**
   - What we know: Platform admins can chat with "any agent across all tenants."
   - What's unclear: In the agent picker UI, does a platform admin see all agents grouped by tenant, or do they first pick a tenant and then pick an agent?
   - Recommendation: Use the existing tenant switcher pattern from the agents page: the platform admin sees agents grouped by tenant in the sidebar. This reuses the `useTenants()` + `useAgents(tenantId)` pattern already in the agents list page.

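The visibility rule behind question 3 (platform admins see every agent, other roles see only agents in their own tenants, and the result is grouped by tenant) can be sketched as follows. This is a rough illustration, not the project's RBAC code: the `Agent` and `Caller` shapes, the role string `"platform_admin"`, and the function name are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    id: str
    name: str
    tenant_id: str


@dataclass(frozen=True)
class Caller:
    role: str              # hypothetical role identifiers, e.g. "platform_admin"
    tenant_ids: frozenset  # tenants the caller is a member of


def visible_agents_by_tenant(caller: Caller, agents: list) -> dict:
    """Group the agents a caller may chat with by tenant id."""
    grouped = defaultdict(list)
    for agent in agents:
        # Platform admins bypass the membership check (assumed rule from CONTEXT.md).
        if caller.role == "platform_admin" or agent.tenant_id in caller.tenant_ids:
            grouped[agent.tenant_id].append(agent)
    return dict(grouped)
```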
---

## Validation Architecture

### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest 8.3.0 + pytest-asyncio 0.25.0 |
| Config file | `pyproject.toml` (root): `asyncio_mode = "auto"`, `testpaths = ["tests"]` |
| Quick run command | `pytest tests/unit/test_web_channel.py -x` |
| Full suite command | `pytest tests/unit -x` |

### Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| CHAT-01 | WebSocket endpoint accepts connection and dispatches to Celery | unit | `pytest tests/unit/test_web_channel.py::test_websocket_dispatches_to_celery -x` | ❌ Wave 0 |
| CHAT-01 | Web channel normalizer produces valid KonstructMessage | unit | `pytest tests/unit/test_web_channel.py::test_normalize_web_event -x` | ❌ Wave 0 |
| CHAT-02 | `_send_response` for "web" channel publishes to Redis pub-sub | unit | `pytest tests/unit/test_web_channel.py::test_send_response_web_publishes_to_redis -x` | ❌ Wave 0 |
| CHAT-03 | Conversation history REST endpoint returns paginated messages | unit | `pytest tests/unit/test_chat_api.py::test_list_conversation_history -x` | ❌ Wave 0 |
| CHAT-04 | Chat API returns 403 for user not member of tenant | unit | `pytest tests/unit/test_chat_api.py::test_chat_rbac_enforcement -x` | ❌ Wave 0 |
| CHAT-04 | Platform admin can access agents across all tenants | unit | `pytest tests/unit/test_chat_api.py::test_platform_admin_cross_tenant -x` | ❌ Wave 0 |
| CHAT-05 | Typing indicator message sent immediately on WebSocket receive | unit | `pytest tests/unit/test_web_channel.py::test_typing_indicator_sent -x` | ❌ Wave 0 |

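The CHAT-02 and CHAT-05 rows can be exercised without a live socket or Redis server by substituting small fakes. The sketch below is one possible shape, under stated assumptions: `handle_incoming`, `send_response_web`, the frame payloads, and the `webchat:response:{conversation_id}` channel name are hypothetical stand-ins, not the project's real functions or key format. Only `send_json` (FastAPI `WebSocket`) and `publish` (redis.asyncio client) mirror real method names.

```python
import asyncio
import json


class FakeWebSocket:
    """Records outbound frames instead of sending them over a socket."""

    def __init__(self):
        self.sent = []

    async def send_json(self, payload):
        self.sent.append(payload)


class FakeRedis:
    """Records publish calls instead of talking to a Redis server."""

    def __init__(self):
        self.published = []

    async def publish(self, channel, message):
        self.published.append((channel, message))


async def handle_incoming(ws, text, dispatch):
    # Hypothetical handler: emit the typing indicator first, then dispatch.
    await ws.send_json({"type": "typing"})
    dispatch(text)


async def send_response_web(redis, conversation_id, text):
    # Hypothetical channel name; the real key lives in shared/redis_keys.py.
    channel = f"webchat:response:{conversation_id}"
    await redis.publish(channel, json.dumps({"type": "response", "text": text}))


def run_checks():
    ws, redis, events = FakeWebSocket(), FakeRedis(), []

    def dispatch(text):
        # Record how many frames had already been sent when dispatch ran.
        events.append((text, len(ws.sent)))

    asyncio.run(handle_incoming(ws, "hi", dispatch))
    asyncio.run(send_response_web(redis, "c1", "hello"))
    return ws.sent, events, redis.published
```

Recording the sent-frame count inside `dispatch` lets a test assert ordering (the typing frame precedes the dispatch) rather than mere presence.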
### Sampling Rate
- **Per task commit:** `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
- **Per wave merge:** `pytest tests/unit -x`
- **Phase gate:** Full suite green before `/gsd:verify-work`

### Wave 0 Gaps
- [ ] `tests/unit/test_web_channel.py`: covers CHAT-01, CHAT-02, CHAT-05
- [ ] `tests/unit/test_chat_api.py`: covers CHAT-03, CHAT-04

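For the CHAT-03 coverage in `test_chat_api.py`, the pagination contract itself can be pinned down as a pure function before any endpoint exists. The following is a sketch only: the offset/limit scheme, the `HistoryPage` shape, and the default page size of 50 are assumptions, since the source does not specify how the history endpoint paginates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HistoryPage:
    messages: list
    next_offset: Optional[int]  # None when there are no further pages


def paginate_history(
    messages: Sequence[str], offset: int = 0, limit: int = 50
) -> HistoryPage:
    """Return one page of messages plus the offset of the next page, if any."""
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    window = list(messages[offset:offset + limit])
    end = offset + len(window)
    return HistoryPage(window, end if end < len(messages) else None)
```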
---

## Sources

### Primary (HIGH confidence)
- Existing codebase: `packages/gateway/gateway/channels/slack.py`, `whatsapp.py`, `normalize.py` (channel adapter pattern directly replicated)
- Existing codebase: `packages/orchestrator/orchestrator/tasks.py` (`_send_response` extension point verified by reading full source)
- Existing codebase: `packages/shared/shared/models/message.py` (ChannelType enum verified, "web" not yet present)
- Existing codebase: `packages/shared/shared/redis_keys.py` (key naming convention verified)
- Existing codebase: `packages/shared/shared/api/rbac.py` (`require_tenant_member`, `get_portal_caller` pattern verified)
- FastAPI source: `fastapi` 0.135.2 installed, `from fastapi import WebSocket` verified importable
- redis.asyncio: version 5.0.0+ installed, pub-sub available (`r.pubsub()` verified importable)
- Next.js 16 bundled docs: `packages/portal/node_modules/next/dist/docs/` (proxy.ts naming, `use(searchParams)` patterns confirmed)
- `packages/portal/package.json`: Next.js 16.2.1, React 19.2.4, confirmed packages

### Secondary (MEDIUM confidence)
- `.planning/STATE.md`: all architecture decisions (channel_type TEXT+CHECK, Celery sync-only, hookform resolver, proxy.ts naming) verified against actual files
- react-markdown 9.x + remark-gfm 4.x: current stable versions for React 19 compatibility (not yet installed, based on known package state)

### Tertiary (LOW confidence)
- None: all claims verified against codebase or installed package docs

---

## Metadata

**Confidence breakdown:**
- Standard stack: HIGH. All backend packages verified installed and importable; portal packages verified via package.json
- Architecture: HIGH. Channel adapter pattern, extras dict pattern, RBAC pattern all verified by reading actual source files
- Pitfalls: HIGH. Most pitfalls derive directly from STATE.md documented decisions (CHECK constraint, Celery sync, browser WebSocket header limitation)

**Research date:** 2026-03-25
**Valid until:** 2026-04-25 (stable stack; react-markdown version should be re-checked if planning is delayed)