diff --git a/.planning/phases/06-web-chat/06-RESEARCH.md b/.planning/phases/06-web-chat/06-RESEARCH.md
new file mode 100644
index 0000000..d1a3c00
--- /dev/null
+++ b/.planning/phases/06-web-chat/06-RESEARCH.md
@@ -0,0 +1,628 @@
+# Phase 6: Web Chat - Research
+
+**Researched:** 2026-03-25
+**Domain:** Real-time web chat (WebSocket + Redis pub-sub + new channel adapter + portal UI)
+**Confidence:** HIGH
+
+
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+- Dedicated `/chat` page (full-screen, not a floating widget)
+- Left sidebar: conversation list grouped by agent, with timestamps and last message preview
+- Right panel: active conversation with message bubbles (user right-aligned, agent left-aligned)
+- "New Conversation" button opens an agent picker (shows agents the user has access to)
+- Markdown rendering in agent messages
+- Image/document display inline (consistent with Phase 2 media support)
+- Typing indicator (animated dots) while waiting for agent response
+- All three roles can chat: platform admin, customer admin, customer operator
+- Users can only see/chat with agents belonging to tenants they have access to (RBAC)
+- Platform admins can chat with any agent across all tenants
+- Operators can chat (read-only restrictions do NOT apply to conversations)
+- One conversation thread per user-agent pair (matches per-user per-agent memory model)
+- Users can start new conversation (clears thread context) or continue existing one
+- Conversation list sorted by most recent, paginated for long histories
+- WebSocket connection for real-time, HTTP polling fallback if WebSocket unavailable
+- Gateway receives web chat message, normalizes to KonstructMessage (channel: "web"), dispatches through existing pipeline
+- Agent response pushed back via WebSocket
+- New "web" channel adapter in gateway alongside Slack and WhatsApp
+- channel_metadata includes: portal_user_id, tenant_id, conversation_id
+- Tenant resolution from the authenticated session
(not from channel metadata like Slack workspace ID)
+- Outbound: push response via WebSocket connection keyed to conversation_id
+
+### Claude's Discretion
+- WebSocket library choice (native ws, Socket.IO, etc.)
+- Message bubble visual design
+- Conversation pagination strategy (infinite scroll vs load more)
+- Whether to show tool invocation indicators in chat (e.g., "Searching knowledge base...")
+- Agent avatar/icon in chat
+- Sound notification on new message
+- Mobile responsiveness approach
+
+### Deferred Ideas (OUT OF SCOPE)
+None raised.
+
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| CHAT-01 | Users can open a chat window with any AI Employee and have a real-time conversation within the portal | WebSocket endpoint on FastAPI gateway + browser WebSocket client in portal chat page |
+| CHAT-02 | Web chat supports full agent pipeline — memory, tools, escalation, and media | "web" channel added to ChannelType enum; handle_message Celery task already handles all pipeline stages; _send_response needs "web" case via Redis pub-sub |
+| CHAT-03 | Conversation history persists and is visible when the user returns | New conversations DB table + pgvector already keyed per-user per-agent; history load on page visit |
+| CHAT-04 | Chat respects RBAC — users can only chat with agents belonging to tenants they have access to | require_tenant_member FastAPI dependency already exists; new chat API endpoints use same pattern; platform_admin bypasses tenant check |
+| CHAT-05 | Chat interface feels responsive — typing indicators, message streaming or fast response display | Typing indicator via WebSocket "typing" event immediately on message send; WebSocket pushes final response when Celery completes |
+
+
+---
+
+## Summary
+
+Phase 6 adds a web chat channel to the Konstruct portal — the first channel that originates inside the portal itself rather than from an external messaging platform.
The architecture follows the same channel adapter pattern established in Phases 1 and 2: a new "web" adapter in the gateway normalizes portal messages into KonstructMessage format and dispatches them to the existing Celery pipeline. The key new infrastructure is a WebSocket endpoint on the gateway and a Redis pub-sub channel that bridges the Celery worker's response delivery back to the WebSocket connection.
+
+The frontend is a new `/chat` route in the Next.js portal. It uses the native browser WebSocket API (no additional library required) with a React hook managing connection lifecycle. The UI requires one new shadcn/ui component not yet in the project (ScrollArea) and markdown rendering (react-markdown is not yet installed). Both are straightforward additions.
+
+The most important constraint to keep in mind during planning: the Celery worker and the FastAPI gateway are separate processes, so the Celery task cannot call back to the WebSocket connection directly. The correct pattern is for the Celery task to publish the response to a Redis pub-sub channel; the gateway WebSocket handler subscribes to that channel and forwards the response to the browser. This Redis pub-sub bridge is the critical new piece that does not exist yet.
+
+**Primary recommendation:** Use FastAPI native WebSocket + Redis pub-sub bridge for cross-process response delivery. No additional Python WebSocket libraries needed. Use native browser WebSocket API in the portal. Add react-markdown for markdown rendering.
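
The bridge described in the summary can be sketched without a running Redis. The block below is a minimal simulation under stated assumptions, not the project's implementation: `InMemoryPubSub`, `fake_worker`, and `handle_chat_message` are hypothetical names, with the in-memory class standing in for Redis pub-sub and the worker coroutine standing in for the Celery task. Like Redis pub-sub, the stand-in does not retain messages for late subscribers, which is why the sketch subscribes before it dispatches and bounds the wait with a timeout.

```python
import asyncio
import json
from collections import defaultdict


class InMemoryPubSub:
    """Hypothetical stand-in for Redis pub-sub: nothing is buffered for late subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue[str]]] = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel: str, payload: str) -> int:
        """Deliver to current subscribers only; return how many received the payload."""
        for queue in self._subscribers[channel]:
            queue.put_nowait(payload)
        return len(self._subscribers[channel])


async def fake_worker(bus: InMemoryPubSub, channel: str, text: str) -> None:
    """Plays the role of the Celery task: compute a reply, then publish it."""
    await asyncio.sleep(0.01)  # pretend the LLM call takes a moment
    bus.publish(channel, json.dumps({"type": "response", "text": text.upper()}))


async def handle_chat_message(
    bus: InMemoryPubSub, channel: str, text: str, timeout: float = 1.0
) -> dict:
    """Plays the role of the gateway WebSocket handler for a single message."""
    queue = bus.subscribe(channel)  # 1. subscribe first
    task = asyncio.create_task(fake_worker(bus, channel, text))  # 2. then dispatch
    try:
        raw = await asyncio.wait_for(queue.get(), timeout=timeout)  # 3. bounded wait
    finally:
        await task
    return json.loads(raw)


result = asyncio.run(
    handle_chat_message(InMemoryPubSub(), "t1:webchat:response:c1", "hello")
)
print(result)
```

If the subscribe and dispatch steps were swapped and the worker finished first, `publish` would return 0 and the handler would wait until the timeout, which is the failure mode to watch for in the real bridge.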
+
+---
+
+## Standard Stack
+
+### Core
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| FastAPI WebSocket | Built into fastapi[standard] 0.135.2 | WebSocket endpoint on gateway | Already installed, Starlette-native, zero new deps |
+| redis.asyncio pub-sub | redis 5.0.0+ (already installed) | Bridge Celery response → WebSocket | Cross-process response delivery; already used everywhere in this codebase |
+| Browser WebSocket API | Native (no library) | Portal WebSocket client | Works in all modern browsers, zero bundle cost |
+| react-markdown | 9.x | Render agent markdown responses | Standard React markdown renderer; supports GFM and syntax highlighting via plugins |
+| remark-gfm | 4.x | GitHub Flavored Markdown support | Tables, strikethrough, task lists in agent responses |
+
+### Supporting
+
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| @radix-ui/react-scroll-area (via shadcn) | already available via @base-ui/react | Scrollable message container | Message list that auto-scrolls to bottom |
+| lucide-react | already installed | Icons (typing dots, send button, agent avatar) | Already used throughout portal |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| Redis pub-sub bridge | Socket.IO | Socket.IO adds significant bundle weight and complexity; Redis pub-sub is already used in this codebase (rate limiting, session, escalation) |
+| Native browser WebSocket | socket.io-client | Same reason — unnecessary dependency when native WebSocket is sufficient |
+| react-markdown | marked + dangerouslySetInnerHTML | react-markdown renders to React elements and is safe by default; marked requires XSS sanitization as a separate step |
+
+**Installation:**
+```bash
+# Portal
+cd packages/portal && npm install react-markdown remark-gfm
+
+# Backend: no new dependencies needed
+# FastAPI WebSocket is in fastapi[standard] already installed
+# redis
pub-sub is in redis 5.0.0 already installed
+```
+
+---
+
+## Architecture Patterns
+
+### Recommended Project Structure
+
+New and modified files in this phase:
+
+```
+packages/
+├── gateway/gateway/channels/
+│   └── web.py                    # Web channel adapter + WebSocket endpoint + pub-sub subscriber
+├── shared/shared/
+│   ├── models/message.py         # Add ChannelType.WEB = "web"
+│   ├── redis_keys.py             # Add webchat_response_key(tenant_id, conversation_id)
+│   └── api/
+│       └── chat.py               # REST API: list conversations, get history, create/reset
+├── migrations/versions/
+│   └── 008_web_chat.py           # web_conversations + web_conversation_messages tables
+└── packages/portal/
+    ├── app/(dashboard)/chat/
+    │   └── page.tsx              # Chat page (client component)
+    ├── components/
+    │   ├── chat-sidebar.tsx      # Conversation list sidebar
+    │   ├── chat-window.tsx       # Active conversation + message bubbles
+    │   ├── chat-message.tsx      # Single message bubble with markdown
+    │   └── typing-indicator.tsx  # Animated dots
+    └── lib/
+        ├── api.ts                # Add chat API types + functions
+        ├── queries.ts            # Add useConversations, useConversationHistory
+        └── use-chat-socket.ts    # WebSocket lifecycle hook
+```
+
+### Pattern 1: Redis Pub-Sub Response Bridge
+
+**What:** Celery task (separate process) completes LLM response and needs to push it to a WebSocket connection held by the gateway FastAPI process. Redis pub-sub is the standard cross-process channel.
+
+**When to use:** Any time a background worker needs to push a result back to a long-lived connection.
+
+**Flow:**
+1. Browser sends message via WebSocket to gateway
+2. Gateway dispatches `handle_message.delay(payload)` (identical to Slack/WhatsApp)
+3. Gateway subscribes to Redis channel `{tenant_id}:webchat:response:{conversation_id}` and waits
+4. Celery's `_send_response` for "web" channel publishes response to same Redis channel
+5.
Gateway receives pub-sub message, pushes to browser WebSocket
+
+**Example — gateway side:**
+```python
+# Source: redis.asyncio pub-sub docs + existing redis usage in this codebase
+import asyncio
+
+import redis.asyncio as aioredis
+from fastapi import WebSocket
+
+
+async def websocket_wait_for_response(
+    ws: WebSocket,
+    redis_url: str,
+    response_channel: str,
+    timeout: float = 60.0,
+) -> None:
+    """Subscribe to response channel and forward the first message to the WebSocket."""
+    # decode_responses=True so message["data"] is str (send_text expects str, not bytes)
+    r = aioredis.from_url(redis_url, decode_responses=True)
+    pubsub = r.pubsub()
+
+    async def _forward_first_message() -> None:
+        async for message in pubsub.listen():
+            if message["type"] == "message":
+                await ws.send_text(message["data"])
+                return
+
+    try:
+        await pubsub.subscribe(response_channel)
+        # Enforce the timeout; raises asyncio.TimeoutError if no response arrives in time
+        await asyncio.wait_for(_forward_first_message(), timeout=timeout)
+    finally:
+        await pubsub.unsubscribe(response_channel)
+        await pubsub.aclose()
+        await r.aclose()
+```
+
+**Example — Celery task side (in `_send_response`):**
+```python
+# Add "web" case to _send_response in orchestrator/tasks.py
+elif channel_str == "web":
+    conversation_id: str = extras.get("conversation_id", "") or ""
+    tenant_id: str = extras.get("tenant_id", "") or ""
+    if not conversation_id or not tenant_id:
+        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
+        return
+    response_channel = webchat_response_key(tenant_id, conversation_id)
+    publish_redis = aioredis.from_url(settings.redis_url)
+    try:
+        await publish_redis.publish(response_channel, json.dumps({
+            "type": "response",
+            "text": text,
+            "conversation_id": conversation_id,
+        }))
+    finally:
+        await publish_redis.aclose()
+```
+
+### Pattern 2: FastAPI WebSocket Endpoint
+
+**What:** Native FastAPI WebSocket, with auth validated from a JSON "auth" message sent by the browser after connecting. Gateway already holds the Redis client at startup; WebSocket handler uses it.
+
+**When to use:** Every web chat message from the portal browser.
+
+```python
+# Source: FastAPI WebSocket docs (verified — WebSocket import is in fastapi package)
+from fastapi import APIRouter, WebSocket, WebSocketDisconnect
+
+# Router name matches the import used in gateway/main.py
+chat_websocket_router = APIRouter()
+
+
+@chat_websocket_router.websocket("/chat/ws/{conversation_id}")
+async def chat_websocket(
+    conversation_id: str,
+    websocket: WebSocket,
+) -> None:
+    await websocket.accept()
+    try:
+        while True:
+            data = await websocket.receive_json()
+            # Validate auth fields from data["auth"]
+            # Normalize to KonstructMessage, dispatch to Celery
+            # Subscribe to Redis response channel
+            # Push response back to websocket
+    except WebSocketDisconnect:
+        pass
+```
+
+**Critical note:** Handshake headers are available server-side via `websocket.headers`, but the browser WebSocket constructor cannot set custom headers, so they are not a usable auth transport for the portal. The established pattern in this project is to send RBAC headers as `X-Portal-User-Id`, `X-Portal-User-Role`, `X-Portal-Tenant-Id`. For WebSocket, send the same values as a JSON "auth" message immediately after connection.
+
+### Pattern 3: Browser WebSocket Hook
+
+**What:** React hook that manages WebSocket connection lifecycle (connect on mount, reconnect on disconnect, send/receive messages).
+
+```typescript
+// packages/portal/lib/use-chat-socket.ts
+// Native browser WebSocket — no library needed
+"use client";
+
+import { useEffect, useRef, useCallback, useState } from "react";
+
+interface ChatSocketOptions {
+  conversationId: string;
+  onMessage: (text: string) => void;
+  onTyping: (isTyping: boolean) => void;
+  authHeaders: { userId: string; role: string; tenantId: string | null };
+}
+
+export function useChatSocket({
+  conversationId,
+  onMessage,
+  onTyping,
+  authHeaders,
+}: ChatSocketOptions) {
+  // Explicit type parameter so wsRef.current is typed as WebSocket | null
+  const wsRef = useRef<WebSocket | null>(null);
+  const [isConnected, setIsConnected] = useState(false);
+
+  const send = useCallback((text: string) => {
+    if (wsRef.current?.readyState === WebSocket.OPEN) {
+      wsRef.current.send(JSON.stringify({
+        type: "message",
+        text,
+        auth: authHeaders,
+      }));
+      onTyping(true); // Show typing indicator immediately
+    }
+  }, [authHeaders, onTyping]);
+
+  // onMessage/onTyping should be memoized by the caller (useCallback);
+  // otherwise this effect re-runs and the socket reconnects on every render.
+  useEffect(() => {
+    const wsUrl = `${process.env.NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`;
+    const ws = new WebSocket(wsUrl);
+    wsRef.current = ws;
+
+    ws.onopen = () => setIsConnected(true);
+    ws.onclose = () => setIsConnected(false);
+    ws.onmessage = (event) => {
+      const data = JSON.parse(event.data as string);
+      if (data.type === "response") {
+        onTyping(false);
+        onMessage(data.text as string);
+      }
+    };
+
+    return () => ws.close();
+  }, [conversationId, onMessage, onTyping]);
+
+  return { send, isConnected };
+}
+```
+
+### Pattern 4: Conversation Persistence (New DB Table)
+
+**What:** Two new tables, `web_conversations` and `web_conversation_messages`, to persist chat history visible on return visits.
+
+**When to use:** Every web chat message — store each turn in the DB.
+
+```python
+# New ORM model — migration 008
+class WebConversation(Base):
+    """Persistent conversation thread for portal web chat."""
+    __tablename__ = "web_conversations"
+
+    id: Mapped[uuid.UUID] = ...
+    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
+    agent_id: Mapped[uuid.UUID] = ...
+    user_id: Mapped[uuid.UUID] = ...  # portal user UUID (from Auth.js session)
+    created_at: Mapped[datetime] = ...
+    updated_at: Mapped[datetime] = ...  # used for sort order
+
+    __table_args__ = (
+        UniqueConstraint("tenant_id", "agent_id", "user_id"),  # one thread per pair
+    )
+
+
+class WebConversationMessage(Base):
+    """Individual message within a web conversation."""
+    __tablename__ = "web_conversation_messages"
+
+    id: Mapped[uuid.UUID] = ...
+    # ForeignKey must be wrapped in mapped_column(), not assigned directly
+    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("web_conversations.id"))
+    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
+    role: Mapped[str] = ...  # "user" | "assistant"
+    content: Mapped[str] = ...
+    created_at: Mapped[datetime] = ...
+```
+
+**Note:** The `user_id` for web chat is the portal user's UUID from Auth.js — different from the Slack user ID string used in existing memory. The Redis memory key `memory:short:{agent_id}:{user_id}` will use the portal user's UUID string as `user_id`, keeping it compatible with the existing memory system.
+
+### Pattern 5: Conversation REST API
+
+**What:** REST endpoints for listing conversations, loading history, and resetting. This is separate from the WebSocket endpoint.
+
+```
+GET    /api/portal/chat/conversations?tenant_id={id}   — list all conversations for user
+GET    /api/portal/chat/conversations/{id}/messages    — load history (paginated)
+POST   /api/portal/chat/conversations                  — create new or get-or-create
+DELETE /api/portal/chat/conversations/{id}             — reset (delete messages, keep thread)
+```
+
+### Anti-Patterns to Avoid
+
+- **Streaming token-by-token:** The requirements doc explicitly marks "Real-time token streaming in chat" as Out of Scope (consistent with Slack/WhatsApp — they don't support partial messages). The typing indicator shows while the full LLM call runs; the complete response arrives as one message.
+- **WebSocket auth via URL query params:** Never put tokens/user IDs in the WebSocket URL. Use a JSON message after connection.
+- **Calling Celery result backend from WebSocket handler:** Celery result backends add latency and coupling. Use Redis pub-sub directly.
+- **One WebSocket connection per page load (not per conversation):** The connection should be scoped per conversation_id so reconnect on conversation switch is clean.
+- **Storing conversation history only in Redis:** Redis memory (sliding window) is the agent's working context. The DB `web_conversation_messages` table is what shows up when the user returns to the chat page. These are separate concerns.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Markdown rendering | Custom regex parser | react-markdown + remark-gfm | Handles edge cases, escapes XSS, supports all GFM |
+| WebSocket reconnection | Custom exponential backoff | Simple reconnect on close (sufficient for v1) | LLM calls are short; connections don't stay open for hours |
+| Auth for WebSocket | Custom token scheme | Send auth as first JSON message using existing RBAC headers | Consistent with existing `X-Portal-*` header pattern |
+| Cross-process response delivery | Shared memory / HTTP callback | Redis pub-sub | Already in use; correct pattern for Celery → FastAPI bridge |
+
+**Key insight:** The web channel adapter is the only genuinely new piece of infrastructure. Everything else — RBAC, memory, tool calling, escalation, audit — already works and processes messages tagged with any channel type. Adding `ChannelType.WEB = "web"` and a new `_send_response` branch is sufficient to wire the whole pipeline.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: WebSocket Auth — Browser API Limitation
+
+**What goes wrong:** The browser's native `WebSocket` constructor does not support custom headers. Code that tries `new WebSocket(url, { headers: {...} })` never sends them: the second argument is interpreted as a subprotocol list, so the call either throws or connects without the headers.
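
Since handshake headers cannot carry the caller's identity, the server has to gate the connection on the first frame instead. A minimal sketch of that gate follows, under stated assumptions: `parse_auth_message` and `AuthContext` are hypothetical names, the `"type": "auth"` envelope is assumed, the field names mirror the `authHeaders` object in the hook above, and only `platform_admin` is a role identifier that appears in this document (the other two role strings are guesses). It checks the shape of the message only; verifying that the claimed identity is genuine is a separate step not shown here.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumed role identifiers; only "platform_admin" is named in this document.
ALLOWED_ROLES = {"platform_admin", "customer_admin", "customer_operator"}


@dataclass(frozen=True)
class AuthContext:
    user_id: str
    role: str
    tenant_id: Optional[str]


def parse_auth_message(raw: str) -> Optional[AuthContext]:
    """Return an AuthContext if `raw` is a well-formed auth message, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or data.get("type") != "auth":
        return None

    user_id = data.get("userId")
    role = data.get("role")
    tenant_id = data.get("tenantId")

    if not isinstance(user_id, str) or not user_id:
        return None
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        return None
    if tenant_id is not None and not isinstance(tenant_id, str):
        return None
    # Assumption for this sketch: only platform admins may connect without a tenant.
    if tenant_id is None and role != "platform_admin":
        return None
    return AuthContext(user_id=user_id, role=role, tenant_id=tenant_id)


ok = parse_auth_message(
    '{"type": "auth", "userId": "u1", "role": "platform_admin", "tenantId": null}'
)
print(ok)
```

A handler built on this would close the socket when the first frame yields `None`, and only then start processing chat messages.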
+
+**Why it happens:** The browser WebSocket API only allows specifying subprotocols as the second argument, not headers. This is a deliberate browser security decision.
+
+**How to avoid:** Send auth information as a JSON "auth" message immediately after connection opens. The FastAPI WebSocket handler should require this first message before processing any chat messages. This is established practice for browser WebSocket auth.
+
+**Warning signs:** Tests that use a Python WebSocket test client (which can set handshake headers) work fine, but the browser connection is rejected.
+
+### Pitfall 2: Celery Sync Context in Async `_send_response`
+
+**What goes wrong:** `_send_response` is an async function called from `asyncio.run()` inside the sync Celery task. Adding Redis pub-sub code there requires creating a new async Redis client per task, which is the existing pattern — but forgetting `await publish_redis.aclose()` leaks connections.
+
+**Why it happens:** The "Celery tasks MUST be sync def" constraint (STATE.md) means we're always bridging sync→async via `asyncio.run()`. Every async resource must be explicitly closed.
+
+**How to avoid:** Follow the existing pattern in `_process_message`: use `try/finally` around every `aioredis.from_url()` call to ensure `aclose()` always runs.
+
+**Warning signs:** Redis connection count grows over time; "too many connections" errors in production.
+
+### Pitfall 3: Conversation ID vs Thread ID Confusion
+
+**What goes wrong:** The KonstructMessage `thread_id` field is used by the memory system to scope Redis sliding window. For web chat, `thread_id` should be the `conversation_id` (UUID) from the `web_conversations` table. If this is set incorrectly (e.g., to the portal user_id), all conversations for a user share one memory window.
+
+**Why it happens:** Slack sets `thread_id` to `thread_ts` (string). WhatsApp sets it to `wa_id`. Web chat must set it to `conversation_id` (UUID string) — one distinct value per conversation.
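
The mapping can be made concrete with a small sketch. Everything here is an assumption for illustration: `NormalizedWebMessage` is a simplified, hypothetical stand-in for KonstructMessage (whose real fields may differ), and the `normalize_web_event` signature is invented. The point it shows is that identity comes from the authenticated session, and that `thread_id` takes the conversation ID rather than the user ID.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class NormalizedWebMessage:
    """Hypothetical, simplified stand-in for KonstructMessage."""

    channel: str
    tenant_id: str
    user_id: str
    thread_id: str
    text: str
    channel_metadata: dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def normalize_web_event(
    event: dict[str, Any],
    *,
    portal_user_id: str,
    tenant_id: str,
    conversation_id: str,
) -> NormalizedWebMessage:
    """Build a normalized message; identity comes from the session, not the payload."""
    return NormalizedWebMessage(
        channel="web",
        tenant_id=tenant_id,
        user_id=portal_user_id,
        thread_id=conversation_id,  # one value per conversation, NOT portal_user_id
        text=str(event.get("text", "")),
        channel_metadata={
            "portal_user_id": portal_user_id,
            "tenant_id": tenant_id,
            "conversation_id": conversation_id,
        },
    )


# Same user, two conversations: thread_id differs, user_id does not.
a = normalize_web_event({"text": "hi"}, portal_user_id="u1", tenant_id="t1", conversation_id="c1")
b = normalize_web_event({"text": "hi"}, portal_user_id="u1", tenant_id="t1", conversation_id="c2")
print(a.thread_id, b.thread_id)
```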
+
+**How to avoid:** The web channel normalizer should set `thread_id = conversation_id` in the KonstructMessage. The `user_id` for memory key construction comes from `sender.user_id` (portal user UUID string). The combination `tenant_id + agent_id + user_id` (Redis memory key) matches correctly.
+
+### Pitfall 4: New Conversation vs Continue — Race Condition
+
+**What goes wrong:** User clicks "New Conversation" while a response is still in flight for the old conversation. The old conversation's pub-sub response arrives and updates the new conversation's state.
+
+**Why it happens:** The WebSocket is keyed to `conversation_id`. When the user resets the thread, a new `conversation_id` is created. The old pub-sub subscription must be cleaned up before subscribing to the new one.
+
+**How to avoid:** When the user creates a new conversation: (1) close/unmount the old WebSocket connection, (2) create a new `web_conversations` row via REST API (getting a new UUID), (3) connect new WebSocket to the new conversation_id. React's `useEffect` cleanup handles this naturally when `conversationId` changes.
+
+### Pitfall 5: `ChannelType.WEB` Missing from DB CHECK Constraint
+
+**What goes wrong:** Adding `WEB = "web"` to the Python `ChannelType` StrEnum does not automatically update the PostgreSQL CHECK constraint on the `channel_type` column. Existing data is fine, but inserting new records with `channel = "web"` fails at the DB level.
+
+**Why it happens:** STATE.md documents the decision: "channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL." The CHECK constraint lists allowed values and must be updated via migration.
+
+**How to avoid:** Migration 008 must drop and recreate the CHECK constraint on any affected tables so that it includes `'web'` (PostgreSQL cannot change a CHECK expression in place). Check which tables have `channel_type` constraints: `channel_connections` (stores active channel configs per tenant).
The `conversation_embeddings` and audit tables use `TEXT` without CHECK, so only `channel_connections` needs the update.
+
+**Warning signs:** `CheckViolation` error from PostgreSQL when a row with `channel_type = 'web'` is inserted.
+
+### Pitfall 6: React 19 + Next.js 16 `use()` for Async Data
+
+**What goes wrong:** Using `useState` + `useEffect` to fetch conversation history in a client component works but misses the React 19 preferred pattern.
+
+**Why it happens:** React 19 introduces `use()` for Promises directly in components (TanStack Query handles this abstraction). The existing codebase already uses TanStack Query uniformly — don't break this pattern.
+
+**How to avoid:** Add `useConversations` and `useConversationHistory` hooks in `queries.ts` following the existing pattern (e.g., `useAgents`, `useTenants`). Use `useQuery` from `@tanstack/react-query`.
+
+---
+
+## Code Examples
+
+Verified patterns from existing codebase:
+
+### Adding ChannelType.WEB to the enum
+```python
+# packages/shared/shared/models/message.py
+# Source: existing file — add one line
+from enum import StrEnum
+
+
+class ChannelType(StrEnum):
+    SLACK = "slack"
+    WHATSAPP = "whatsapp"
+    MATTERMOST = "mattermost"
+    ROCKETCHAT = "rocketchat"
+    TEAMS = "teams"
+    TELEGRAM = "telegram"
+    SIGNAL = "signal"
+    WEB = "web"  # Add this line
+```
+
+### Adding webchat Redis key to redis_keys.py
+```python
+# packages/shared/shared/redis_keys.py
+# Source: existing file pattern
+def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
+    """
+    Redis pub-sub channel for web chat response delivery.
+
+    Published by Celery task after LLM response; subscribed by WebSocket handler.
+    """
+    return f"{tenant_id}:webchat:response:{conversation_id}"
+```
+
+### Web channel extras in handle_message
+```python
+# packages/orchestrator/orchestrator/tasks.py
+# Source: existing extras pattern (line 246-254)
+# Add to handle_message alongside existing Slack/WhatsApp extras:
+conversation_id: str = message_data.pop("conversation_id", "") or ""
+portal_user_id: str = message_data.pop("portal_user_id", "") or ""
+
+# Add to extras dict (line 269-274):
+# NOTE: the "web" branch of _send_response also reads extras["tenant_id"];
+# that key must be added here as well or the branch returns without publishing.
+extras: dict[str, Any] = {
+    "placeholder_ts": placeholder_ts,
+    "channel_id": channel_id,
+    "phone_number_id": phone_number_id,
+    "bot_token": bot_token,
+    "wa_id": wa_id,
+    "conversation_id": conversation_id,
+    "portal_user_id": portal_user_id,
+}
+```
+
+### TanStack Query hook pattern (follows existing)
+```typescript
+// packages/portal/lib/queries.ts
+// Source: existing useAgents pattern
+export function useConversations(tenantId: string) {
+  return useQuery({
+    queryKey: ["conversations", tenantId],
+    queryFn: () => api.get(`/api/portal/chat/conversations?tenant_id=${tenantId}`),
+    enabled: !!tenantId,
+  });
+}
+
+export function useConversationHistory(conversationId: string) {
+  return useQuery({
+    queryKey: ["conversation-history", conversationId],
+    queryFn: () => api.get(`/api/portal/chat/conversations/${conversationId}/messages`),
+    enabled: !!conversationId,
+  });
+}
+```
+
+### FastAPI WebSocket endpoint in gateway main.py
+```python
+# packages/gateway/gateway/main.py — add alongside existing routers
+# Source: FastAPI WebSocket API (verified available in fastapi 0.135.2)
+from gateway.channels.web import chat_websocket_router
+app.include_router(chat_websocket_router)
+```
+
+### RBAC enforcement in chat REST API
+```python
+# packages/shared/shared/api/chat.py
+# Source: existing pattern from rbac.py + portal.py
+@router.get("/api/portal/chat/conversations")
+async def list_conversations(
+    tenant_id: UUID,
+    caller: PortalCaller = Depends(get_portal_caller),
+    session: AsyncSession =
Depends(get_session),
+) -> ConversationsResponse:
+    await require_tenant_member(tenant_id, caller, session)
+    # ... query web_conversations WHERE tenant_id = tenant_id AND user_id = caller.user_id
+```
+
+### Proxy.ts update — add /chat to allowed operator paths
+```typescript
+// packages/portal/proxy.ts
+// Source: existing file — /chat must NOT be in CUSTOMER_OPERATOR_RESTRICTED
+// Operators can chat (chatting IS the product)
+// No change needed to proxy.ts — /chat is not in the restricted list
+// Just add /chat to nav.tsx
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| `middleware.ts` | `proxy.ts` (function named `proxy`) | Next.js 16 | Already migrated in this project — STATE.md confirms |
+| `searchParams` prop read synchronously | `use(searchParams)` to unwrap Promise | Next.js 15 | Already applied in this project per STATE.md |
+| `zodResolver` from hookform | `standardSchemaResolver` | hookform/resolvers v5 | Already applied — don't use zodResolver |
+| `stripe.api_key = ...` | `StripeClient(api_key=...)` | stripe v14+ | Already applied — use thread-safe constructor |
+| `Column()` SQLAlchemy | `mapped_column()` + `Mapped[]` | SQLAlchemy 2.0 | Already the pattern — use mapped_column |
+
+**Deprecated/outdated:**
+- `middleware.ts`: deprecated in Next.js 16, renamed to `proxy.ts`. Already done in this project.
+- SQLAlchemy `sa.Enum` for channel_type: causes duplicate DDL — use TEXT + CHECK constraint (STATE.md decision).
+
+---
+
+## Open Questions
+
+1. **HTTP Polling Fallback Scope**
+   - What we know: CONTEXT.md specifies "fallback to HTTP polling if WebSocket unavailable"
+   - What's unclear: Is this needed for v1 given all modern browsers support WebSocket? When WebSocket fails because the network is down, polling fails too; the fallback only helps behind proxies or firewalls that block the WebSocket upgrade but allow plain HTTP.
+   - Recommendation: Implement WebSocket only for v1.
Add a simple error state ("Connection lost — please refresh") instead of full polling fallback. Real polling fallback is significant complexity for an edge case. Because CONTEXT.md lists the polling fallback as a locked decision, deferring it needs explicit sign-off.
+
+2. **Media Upload in Web Chat**
+   - What we know: CONTEXT.md says "image/document display inline (consistent with media support from Phase 2)." Phase 2 media goes through MinIO.
+   - What's unclear: Can users upload media directly in web chat (browser file picker), or does "inline display" mean only displaying agent responses that contain media?
+   - Recommendation: v1 — display media in agent responses (agent can return image URLs from MinIO/S3). User-to-agent file upload is a separate feature. The KonstructMessage already supports MediaAttachment; the web normalizer can include media from agent tool results.
+
+3. **Agent Selection Scope for Platform Admins**
+   - What we know: Platform admins can chat with "any agent across all tenants."
+   - What's unclear: The agent picker UI — does a platform admin see all agents grouped by tenant, or do they first pick a tenant then pick an agent?
+   - Recommendation: Use the existing tenant switcher pattern from the agents page: platform admin sees agents grouped by tenant in the sidebar. This reuses `useTenants()` + `useAgents(tenantId)` pattern already in the agents list page.
+
+---
+
+## Validation Architecture
+
+### Test Framework
+| Property | Value |
+|----------|-------|
+| Framework | pytest 8.3.0 + pytest-asyncio 0.25.0 |
+| Config file | `pyproject.toml` (root) — `asyncio_mode = "auto"`, `testpaths = ["tests"]` |
+| Quick run command | `pytest tests/unit/test_web_channel.py -x` |
+| Full suite command | `pytest tests/unit -x` |
+
+### Phase Requirements → Test Map
+
+| Req ID | Behavior | Test Type | Automated Command | File Exists?
|
+|--------|----------|-----------|-------------------|-------------|
+| CHAT-01 | WebSocket endpoint accepts connection and dispatches to Celery | unit | `pytest tests/unit/test_web_channel.py::test_websocket_dispatches_to_celery -x` | ❌ Wave 0 |
+| CHAT-01 | Web channel normalizer produces valid KonstructMessage | unit | `pytest tests/unit/test_web_channel.py::test_normalize_web_event -x` | ❌ Wave 0 |
+| CHAT-02 | `_send_response` for "web" channel publishes to Redis pub-sub | unit | `pytest tests/unit/test_web_channel.py::test_send_response_web_publishes_to_redis -x` | ❌ Wave 0 |
+| CHAT-03 | Conversation history REST endpoint returns paginated messages | unit | `pytest tests/unit/test_chat_api.py::test_list_conversation_history -x` | ❌ Wave 0 |
+| CHAT-04 | Chat API returns 403 for user not member of tenant | unit | `pytest tests/unit/test_chat_api.py::test_chat_rbac_enforcement -x` | ❌ Wave 0 |
+| CHAT-04 | Platform admin can access agents across all tenants | unit | `pytest tests/unit/test_chat_api.py::test_platform_admin_cross_tenant -x` | ❌ Wave 0 |
+| CHAT-05 | Typing indicator message sent immediately on WebSocket receive | unit | `pytest tests/unit/test_web_channel.py::test_typing_indicator_sent -x` | ❌ Wave 0 |
+
+### Sampling Rate
+- **Per task commit:** `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
+- **Per wave merge:** `pytest tests/unit -x`
+- **Phase gate:** Full suite green before `/gsd:verify-work`
+
+### Wave 0 Gaps
+- [ ] `tests/unit/test_web_channel.py` — covers CHAT-01, CHAT-02, CHAT-05
+- [ ] `tests/unit/test_chat_api.py` — covers CHAT-03, CHAT-04
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+- Existing codebase — `packages/gateway/gateway/channels/slack.py`, `whatsapp.py`, `normalize.py` — channel adapter pattern directly replicated
+- Existing codebase — `packages/orchestrator/orchestrator/tasks.py` — `_send_response` extension point verified by reading full source
+- Existing codebase —
`packages/shared/shared/models/message.py` — ChannelType enum verified, "web" not yet present
+- Existing codebase — `packages/shared/shared/redis_keys.py` — key naming convention verified
+- Existing codebase — `packages/shared/shared/api/rbac.py` — `require_tenant_member`, `get_portal_caller` pattern verified
+- FastAPI source — `fastapi` 0.135.2 installed, `from fastapi import WebSocket` verified importable
+- redis.asyncio — version 5.0.0+ installed, pub-sub available (`r.pubsub()` verified importable)
+- Next.js 16 bundled docs — `packages/portal/node_modules/next/dist/docs/` — proxy.ts naming, `use(searchParams)` patterns confirmed
+- `packages/portal/package.json` — Next.js 16.2.1, React 19.2.4, confirmed packages
+
+### Secondary (MEDIUM confidence)
+- `.planning/STATE.md` — all architecture decisions (channel_type TEXT+CHECK, Celery sync-only, hookform resolver, proxy.ts naming) verified against actual files
+- react-markdown 9.x + remark-gfm 4.x — current stable versions for React 19 compatibility (not yet installed, based on known package state)
+
+### Tertiary (LOW confidence)
+- None — all claims verified against codebase or installed package docs
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all backend packages verified installed and importable; portal packages verified via package.json
+- Architecture: HIGH — channel adapter pattern, extras dict pattern, RBAC pattern all verified by reading actual source files
+- Pitfalls: HIGH — most pitfalls derive directly from STATE.md documented decisions (CHECK constraint, Celery sync, browser WebSocket header limitation)
+
+**Research date:** 2026-03-25
+**Valid until:** 2026-04-25 (stable stack; react-markdown version should be re-checked if planning is delayed)