Compare commits

3 Commits

6 changed files with 1500 additions and 17 deletions

@@ -111,6 +111,23 @@ Plans:
- [ ] 05-03-PLAN.md — Human verification: test all three creation paths, RBAC enforcement, system prompt auto-generation
- [ ] 05-04-PLAN.md — Gap closure: add /agents/new to proxy RBAC restrictions, hide New Employee button for operators, fix wizard deploy error handling
### Phase 6: Web Chat
**Goal**: Users can chat with AI Employees directly in the portal through a real-time web chat interface — no external messaging platform required
**Depends on**: Phase 5
**Requirements**: CHAT-01, CHAT-02, CHAT-03, CHAT-04, CHAT-05
**Success Criteria** (what must be TRUE):
1. A user can open a chat window with any AI Employee and have a real-time conversation within the portal
2. The chat interface supports the full agent pipeline — memory, tools, escalation, and media (same capabilities as Slack/WhatsApp)
3. Conversation history persists and is visible when the user returns to the chat
4. The chat respects RBAC — users can only chat with agents belonging to tenants they have access to
5. The chat interface feels responsive — typing indicators, message streaming or fast response display
**Plans**: 3 plans
Plans:
- [ ] 06-01-PLAN.md — Backend: DB migration (web_conversations + web_conversation_messages), ORM models, ChannelType.WEB, Redis pub-sub key, WebSocket endpoint, web channel adapter, chat REST API with RBAC, orchestrator _send_response wiring, unit tests
- [ ] 06-02-PLAN.md — Frontend: /chat page with conversation sidebar, message window with markdown rendering, typing indicators, WebSocket hook, agent picker dialog, nav link, react-markdown install
- [ ] 06-03-PLAN.md — Human verification: end-to-end chat flow, conversation persistence, RBAC enforcement, markdown rendering, all roles can chat
## Progress
**Execution Order:**
@@ -123,7 +140,7 @@ Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6
| 3. Operator Experience | 5/5 | Complete | 2026-03-24 |
| 4. RBAC | 3/3 | Complete | 2026-03-24 |
| 5. Employee Design | 4/4 | Complete | 2026-03-25 |
| 6. Web Chat | 0/0 | Not started | - |
| 6. Web Chat | 0/3 | Not started | - |
---
@@ -131,21 +148,6 @@ Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6
**LLM-03 conflict resolved:** BYO API keys confirmed in v1 scope per user decision during Phase 3 context gathering. Implemented via Fernet encryption in Phase 3.
### Phase 6: Web Chat
**Goal**: Users can chat with AI Employees directly in the portal through a real-time web chat interface — no external messaging platform required
**Depends on**: Phase 5
**Requirements**: CHAT-01, CHAT-02, CHAT-03, CHAT-04, CHAT-05
**Success Criteria** (what must be TRUE):
1. A user can open a chat window with any AI Employee and have a real-time conversation within the portal
2. The chat interface supports the full agent pipeline — memory, tools, escalation, and media (same capabilities as Slack/WhatsApp)
3. Conversation history persists and is visible when the user returns to the chat
4. The chat respects RBAC — users can only chat with agents belonging to tenants they have access to
5. The chat interface feels responsive — typing indicators, message streaming or fast response display
**Plans**: 0 plans
Plans:
- [ ] TBD (run /gsd:plan-phase 6 to break down)
---
*Roadmap created: 2026-03-23*
*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements mapped*
*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements + 5 Web Chat requirements mapped*

@@ -0,0 +1,329 @@
---
phase: 06-web-chat
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- packages/shared/shared/models/message.py
- packages/shared/shared/redis_keys.py
- packages/shared/shared/models/chat.py
- packages/shared/shared/api/chat.py
- packages/shared/shared/api/__init__.py
- packages/gateway/gateway/channels/web.py
- packages/gateway/gateway/main.py
- packages/orchestrator/orchestrator/tasks.py
- migrations/versions/008_web_chat.py
- tests/unit/test_web_channel.py
- tests/unit/test_chat_api.py
autonomous: true
requirements:
- CHAT-01
- CHAT-02
- CHAT-03
- CHAT-04
- CHAT-05
must_haves:
truths:
- "Web channel messages normalize into valid KonstructMessage with channel='web'"
- "Celery _send_response publishes web channel responses to Redis pub-sub"
- "WebSocket endpoint accepts connections and dispatches messages to Celery pipeline"
- "Typing indicator event is sent immediately after receiving a user message"
- "Chat REST API enforces RBAC — non-members get 403"
- "Platform admin can access conversations for any tenant"
- "Conversation history persists in DB and is loadable via REST"
artifacts:
- path: "packages/shared/shared/models/chat.py"
provides: "WebConversation and WebConversationMessage ORM models"
contains: "class WebConversation"
- path: "packages/gateway/gateway/channels/web.py"
provides: "WebSocket endpoint and web channel normalizer"
contains: "async def chat_websocket"
- path: "packages/shared/shared/api/chat.py"
provides: "REST API for conversation CRUD"
exports: ["chat_router"]
- path: "migrations/versions/008_web_chat.py"
provides: "DB migration for web_conversations and web_conversation_messages tables"
contains: "web_conversations"
- path: "tests/unit/test_web_channel.py"
provides: "Unit tests for web channel adapter"
contains: "test_normalize_web_event"
- path: "tests/unit/test_chat_api.py"
provides: "Unit tests for chat REST API with RBAC"
contains: "test_chat_rbac_enforcement"
key_links:
- from: "packages/gateway/gateway/channels/web.py"
to: "packages/orchestrator/orchestrator/tasks.py"
via: "handle_message.delay() Celery dispatch"
pattern: "handle_message\\.delay"
- from: "packages/orchestrator/orchestrator/tasks.py"
to: "packages/shared/shared/redis_keys.py"
via: "Redis pub-sub publish for web channel"
pattern: "webchat_response_key"
- from: "packages/gateway/gateway/channels/web.py"
to: "packages/shared/shared/redis_keys.py"
via: "Redis pub-sub subscribe for response delivery"
pattern: "webchat_response_key"
- from: "packages/shared/shared/api/chat.py"
to: "packages/shared/shared/api/rbac.py"
via: "require_tenant_member RBAC guard"
pattern: "require_tenant_member"
user_setup: []
---
<objective>
Build the complete backend infrastructure for web chat: DB schema, ORM models, web channel adapter with WebSocket endpoint, Redis pub-sub response bridge, chat REST API with RBAC, and orchestrator integration. After this plan, the portal can send messages via WebSocket and receive responses through the full agent pipeline.
Purpose: Enables the portal to use the same agent pipeline as Slack/WhatsApp via a new "web" channel — the foundational plumbing that the frontend chat UI (Plan 02) connects to.
Output: Working WebSocket endpoint, conversation persistence, RBAC-enforced REST API, and unit tests.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/06-web-chat/06-CONTEXT.md
@.planning/phases/06-web-chat/06-RESEARCH.md
<interfaces>
<!-- Key types and contracts the executor needs. Extracted from codebase. -->
From packages/shared/shared/models/message.py:
```python
class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"
    ROCKETCHAT = "rocketchat"
    TEAMS = "teams"
    TELEGRAM = "telegram"
    SIGNAL = "signal"
    # WEB = "web" <-- ADD THIS


class KonstructMessage(BaseModel):
    id: str
    tenant_id: str | None
    channel: ChannelType
    channel_metadata: dict[str, Any]
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None
    reply_to: str | None
    context: dict[str, Any]
```
From packages/shared/shared/redis_keys.py:
```python
# All keys follow: {tenant_id}:{key_type}:{discriminator}
def memory_short_key(tenant_id: str, agent_id: str, user_id: str) -> str
def escalation_status_key(tenant_id: str, thread_id: str) -> str
# ADD: webchat_response_key(tenant_id, conversation_id)
```
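A minimal sketch of the key helper to add, using the format string given in Task 1 of this plan. Only the function name and the key layout come from the plan; the docstring wording is illustrative.

```python
def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
    """Redis pub-sub channel name for responses in one web chat conversation.

    Follows the {tenant_id}:{key_type}:{discriminator} convention noted above.
    """
    return f"{tenant_id}:webchat:response:{conversation_id}"
```

Because the tenant ID leads the key, the orchestrator (publisher) and the gateway (subscriber) derive the same tenant-scoped channel name from the same two IDs.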
From packages/shared/shared/api/rbac.py:
```python
@dataclass
class PortalCaller:
    user_id: uuid.UUID
    role: str
    tenant_id: uuid.UUID | None = None

async def get_portal_caller(...) -> PortalCaller
async def require_tenant_member(tenant_id: UUID, caller: PortalCaller, session: AsyncSession) -> None
async def require_tenant_admin(tenant_id: UUID, caller: PortalCaller, session: AsyncSession) -> None
```
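The behavior the chat API relies on from this guard (403 for non-members, bypass for platform_admin) can be sketched roughly as follows. This is a simplified, synchronous stand-in and not the real `require_tenant_member`: the `Forbidden` exception, the `check_tenant_member` name, and the in-memory `memberships` set are all hypothetical, replacing the HTTP error and the `AsyncSession` lookup.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass


class Forbidden(Exception):
    """Hypothetical stand-in for an HTTP 403 response."""

    status_code = 403


@dataclass
class PortalCaller:
    user_id: uuid.UUID
    role: str
    tenant_id: uuid.UUID | None = None


def check_tenant_member(
    tenant_id: uuid.UUID,
    caller: PortalCaller,
    memberships: set[tuple[uuid.UUID, uuid.UUID]],
) -> None:
    """Raise Forbidden unless the caller may act within tenant_id.

    memberships holds (user_id, tenant_id) pairs; the real guard would
    query the database instead (assumption).
    """
    if caller.role == "platform_admin":
        return  # platform admins bypass the membership check
    if (caller.user_id, tenant_id) not in memberships:
        raise Forbidden(f"user {caller.user_id} is not a member of tenant {tenant_id}")
```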
From packages/orchestrator/orchestrator/tasks.py:
```python
# handle_message pops extras before model_validate:
# placeholder_ts, channel_id, phone_number_id, bot_token
# ADD: conversation_id, portal_user_id, tenant_id (for web)
# _send_response routes by channel_str:
# "slack" -> _update_slack_placeholder
# "whatsapp" -> send_whatsapp_message
# ADD: "web" -> Redis pub-sub publish
# _build_response_extras builds channel-specific extras dict
# ADD: "web" case returning conversation_id + tenant_id
```
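The "pop extras before model_validate" pattern noted above might look like the sketch below for the web keys. The helper name `split_extras` is hypothetical. One assumption to flag: `tenant_id` is copied into the extras rather than popped, because `KonstructMessage` itself declares a `tenant_id` field; the plan does not spell this out.

```python
from typing import Any, Dict, Tuple

# Transport-only keys that are not KonstructMessage fields.
# The first four are listed in the plan as existing; the last two are the web additions.
POPPED_KEYS = (
    "placeholder_ts",
    "channel_id",
    "phone_number_id",
    "bot_token",
    "conversation_id",
    "portal_user_id",
)


def split_extras(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate channel extras from the fields passed to model validation."""
    message_data = dict(payload)  # copy so the caller's dict is left untouched
    extras: Dict[str, Any] = {}
    for key in POPPED_KEYS:
        if key in message_data:
            extras[key] = message_data.pop(key)
    # Assumption: tenant_id stays on the message and is mirrored into extras.
    if message_data.get("tenant_id") is not None:
        extras["tenant_id"] = message_data["tenant_id"]
    return message_data, extras
```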
From packages/shared/shared/api/__init__.py:
```python
# Current routers mounted on gateway:
# portal_router, billing_router, channels_router, llm_keys_router,
# usage_router, webhook_router, invitations_router, templates_router
# ADD: chat_router
```
From packages/gateway/gateway/main.py:
```python
# CORS allows: localhost:3000, 127.0.0.1:3000, 100.64.0.10:3000
# WebSocket handshakes are not subject to CORS (browsers do not enforce it), so any origin restriction must be checked server-side against the Origin header
# Include chat_router and WebSocket router here
```
</interfaces>
</context>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Backend models, migration, channel type, Redis key, and unit tests</name>
<files>
packages/shared/shared/models/message.py,
packages/shared/shared/redis_keys.py,
packages/shared/shared/models/chat.py,
migrations/versions/008_web_chat.py,
tests/unit/test_web_channel.py,
tests/unit/test_chat_api.py
</files>
<behavior>
- test_normalize_web_event: normalize_web_event({text, tenant_id, agent_id, user_id, conversation_id}) -> KonstructMessage with channel=WEB, thread_id=conversation_id, sender.user_id=portal_user_id
- test_send_response_web_publishes_to_redis: _send_response("web", "hello", {conversation_id, tenant_id}) publishes JSON to Redis channel matching webchat_response_key(tenant_id, conversation_id)
- test_typing_indicator_sent: WebSocket handler sends {"type": "typing"} immediately after receiving user message, before Celery dispatch
- test_chat_rbac_enforcement: GET /api/portal/chat/conversations?tenant_id=X returns 403 when caller is not a member of tenant X
- test_platform_admin_cross_tenant: GET /api/portal/chat/conversations?tenant_id=X returns 200 when caller is platform_admin (bypasses membership)
- test_list_conversation_history: GET /api/portal/chat/conversations/{id}/messages returns paginated messages ordered by created_at
- test_create_conversation: POST /api/portal/chat/conversations with {tenant_id, agent_id} creates or returns existing conversation for user+agent pair
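As a rough illustration of what test_normalize_web_event expects, here is a standard-library sketch of the normalizer. The dataclasses are stand-ins for the real Pydantic models: the fields of `SenderInfo` and `MessageContent` are assumptions, and `ChannelType` is reduced to the one member needed. The plan does not say where `agent_id` lands on the message, so this sketch leaves it out.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any


class ChannelType(str, Enum):  # the codebase uses StrEnum; this compares the same way
    WEB = "web"


@dataclass
class SenderInfo:  # hypothetical fields
    user_id: str
    display_name: str


@dataclass
class MessageContent:  # hypothetical field
    text: str


@dataclass
class KonstructMessage:  # dataclass stand-in for the Pydantic model
    id: str
    tenant_id: str | None
    channel: ChannelType
    channel_metadata: dict[str, Any]
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict[str, Any] = field(default_factory=dict)


def normalize_web_event(event: dict[str, Any]) -> KonstructMessage:
    """Map a portal chat event onto the shared message shape."""
    conversation_id = event["conversation_id"]
    return KonstructMessage(
        id=str(uuid.uuid4()),
        tenant_id=event["tenant_id"],
        channel=ChannelType.WEB,
        channel_metadata={
            "portal_user_id": event["user_id"],
            "tenant_id": event["tenant_id"],
            "conversation_id": conversation_id,
        },
        sender=SenderInfo(
            user_id=event["user_id"],
            display_name=event.get("display_name", ""),
        ),
        content=MessageContent(text=event["text"]),
        timestamp=datetime.now(timezone.utc),
        thread_id=conversation_id,  # the conversation ID scopes the thread
    )
```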
</behavior>
<action>
1. Add WEB = "web" to ChannelType in packages/shared/shared/models/message.py
2. Add webchat_response_key(tenant_id, conversation_id) to packages/shared/shared/redis_keys.py following existing pattern: return f"{tenant_id}:webchat:response:{conversation_id}"
3. Create packages/shared/shared/models/chat.py with ORM models:
- WebConversation: id (UUID PK), tenant_id (UUID, FK tenants.id), agent_id (UUID, FK agents.id), user_id (UUID, FK portal_users.id), created_at, updated_at. UniqueConstraint on (tenant_id, agent_id, user_id). RLS via tenant_id.
- WebConversationMessage: id (UUID PK), conversation_id (UUID, FK web_conversations.id ON DELETE CASCADE), tenant_id (UUID), role (TEXT, CHECK "user"/"assistant"), content (TEXT), created_at. RLS via tenant_id.
Use mapped_column() + Mapped[] (SQLAlchemy 2.0 pattern, not Column()).
4. Create migration 008_web_chat.py:
- Create web_conversations table with columns matching ORM model
- Create web_conversation_messages table with FK to web_conversations
- Enable RLS on both tables (FORCE ROW LEVEL SECURITY)
- Create RLS policies matching existing pattern (current_setting('app.current_tenant')::uuid)
- ALTER CHECK constraint on channel_connections.channel_type to include 'web' (see Pitfall 5 in RESEARCH.md — the existing CHECK must be replaced, not just added to)
- Add index on web_conversation_messages(conversation_id, created_at)
5. Write test files FIRST (RED phase):
- tests/unit/test_web_channel.py: test normalize_web_event, test _send_response web publishes to Redis (mock aioredis), test typing indicator
- tests/unit/test_chat_api.py: test RBAC enforcement (403 for non-member), platform admin cross-tenant (200), list history (paginated), create conversation (get-or-create)
Use httpx AsyncClient with app fixture pattern from existing tests. Mock DB sessions and Redis.
IMPORTANT: Celery tasks MUST be sync def with asyncio.run() — never async def (hard architectural constraint).
IMPORTANT: Use TEXT+CHECK for role column (not sa.Enum) per Phase 1 convention.
IMPORTANT: _send_response "web" case must use try/finally around aioredis.from_url() to avoid connection leaks (Pitfall 2 from RESEARCH.md).
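The two constraints above (a sync task body that drives async code with asyncio.run(), and try/finally around the Redis client) can be combined as in this sketch. `FakeRedis` is a hypothetical test double standing in for the client returned by `aioredis.from_url()`, and the function names are illustrative; the Celery decorator is omitted so the example runs with the standard library alone.

```python
import asyncio
import json


class FakeRedis:
    """Hypothetical test double: records publishes and whether it was closed."""

    def __init__(self) -> None:
        self.published = []
        self.closed = False

    async def publish(self, channel: str, payload: str) -> int:
        self.published.append((channel, payload))
        return 1

    async def aclose(self) -> None:
        self.closed = True


async def _publish_web_response(client, channel: str, text: str, conversation_id: str) -> None:
    try:
        await client.publish(
            channel,
            json.dumps({"type": "response", "text": text, "conversation_id": conversation_id}),
        )
    finally:
        # Runs on success and on error, so the connection is never leaked.
        await client.aclose()


def send_web_response_task(client, channel: str, text: str, conversation_id: str) -> None:
    """Sync entry point, shaped like a Celery task body: plain def, not async def."""
    asyncio.run(_publish_web_response(client, channel, text, conversation_id))
```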
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x -v</automated>
</verify>
<done>
ChannelType.WEB exists. webchat_response_key function exists. ORM models define web_conversations and web_conversation_messages. Migration 008 creates both tables with RLS and updates channel_type CHECK constraint. All test assertions pass (RED then GREEN).
</done>
</task>
<task type="auto">
<name>Task 2: WebSocket endpoint, web channel adapter, REST API, orchestrator wiring</name>
<files>
packages/gateway/gateway/channels/web.py,
packages/shared/shared/api/chat.py,
packages/shared/shared/api/__init__.py,
packages/gateway/gateway/main.py,
packages/orchestrator/orchestrator/tasks.py
</files>
<action>
1. Create packages/gateway/gateway/channels/web.py with:
a. normalize_web_event() function: takes dict with {text, tenant_id, agent_id, user_id, display_name, conversation_id} and returns KonstructMessage with channel=ChannelType.WEB, thread_id=conversation_id, sender.user_id=user_id (portal user UUID string), channel_metadata={portal_user_id, tenant_id, conversation_id}
b. WebSocket endpoint at /chat/ws/{conversation_id}:
- Accept connection
- Wait for first JSON message with type="auth" containing {userId, role, tenantId} (browser cannot send custom headers — Pitfall 1 from RESEARCH.md)
- Validate auth: userId must be non-empty UUID string, role must be valid
- For each subsequent message (type="message"):
* Immediately send {"type": "typing"} back to client (CHAT-05)
* Normalize message to KonstructMessage via normalize_web_event
* Save user message to web_conversation_messages table
* Build extras dict: conversation_id, portal_user_id, tenant_id
* Dispatch handle_message.delay(msg.model_dump() | extras)
* Subscribe to Redis pub-sub channel webchat_response_key(tenant_id, conversation_id) with 60s timeout
* When response arrives: save assistant message to web_conversation_messages, send {"type": "response", "text": ..., "conversation_id": ...} to WebSocket
- On disconnect: unsubscribe and close Redis connections
c. Create an APIRouter with the WebSocket route for mounting
2. Create packages/shared/shared/api/chat.py with REST endpoints:
a. GET /api/portal/chat/conversations?tenant_id={id} — list conversations for the authenticated user within a tenant. For platform_admin: returns conversations across all tenants if no tenant_id. Uses require_tenant_member for RBAC. Returns [{id, agent_id, agent_name, updated_at, last_message_preview}] sorted by updated_at DESC.
b. GET /api/portal/chat/conversations/{id}/messages?limit=50&before={cursor} — paginated message history. Verify caller owns the conversation (same user_id) OR is platform_admin. Returns [{id, role, content, created_at}] ordered by created_at ASC.
c. POST /api/portal/chat/conversations — get-or-create a conversation: returns the existing row for the user+agent pair, otherwise creates one. Body: {tenant_id, agent_id}. Uses require_tenant_member. Returns conversation object with id.
d. DELETE /api/portal/chat/conversations/{id} — reset conversation (delete messages, keep row). Updates updated_at. Verify ownership.
All endpoints use Depends(get_portal_caller) and Depends(get_session). Set RLS context var (configure_rls_hook + current_tenant_id.set) before DB queries.
3. Update packages/shared/shared/api/__init__.py: add chat_router to imports and __all__
4. Update packages/gateway/gateway/main.py:
- Import chat_router from shared.api and web channel router from gateway.channels.web
- app.include_router(chat_router) for REST endpoints
- app.include_router(web_chat_router) for WebSocket endpoint
- Add comment block "Phase 6 Web Chat routers"
5. Update packages/orchestrator/orchestrator/tasks.py:
a. In handle_message: pop "conversation_id" and "portal_user_id" before model_validate (same pattern as placeholder_ts, channel_id). Add to extras dict.
b. In _build_response_extras: add "web" case returning {"conversation_id": extras.get("conversation_id"), "tenant_id": extras.get("tenant_id")}. Note: tenant_id for web comes from extras, not from channel_metadata like Slack.
c. In _send_response: add "web" case that publishes to Redis pub-sub:
```python
elif channel_str == "web":
    conversation_id = extras.get("conversation_id", "")
    tenant_id = extras.get("tenant_id", "")
    if not conversation_id or not tenant_id:
        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
        return
    response_channel = webchat_response_key(tenant_id, conversation_id)
    publish_redis = aioredis.from_url(settings.redis_url)
    try:
        await publish_redis.publish(response_channel, json.dumps({
            "type": "response", "text": text, "conversation_id": conversation_id,
        }))
    finally:
        await publish_redis.aclose()
```
d. Import webchat_response_key from shared.redis_keys at module level (matches existing import pattern for other keys)
IMPORTANT: WebSocket auth is sent as a JSON message after connection, not via URL params or custom headers (the browser WebSocket API cannot set custom headers).
IMPORTANT: Redis pub-sub subscribe in WebSocket handler must use try/finally for cleanup (Pitfall 2).
IMPORTANT: The web normalizer must set thread_id = conversation_id (Pitfall 3 — conversation ID scopes memory correctly).
IMPORTANT: For DB access in WebSocket handler, use configure_rls_hook + current_tenant_id context var per existing pattern.
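The per-message flow in step 1b (typing frame first, then dispatch, then a bounded wait for the response) can be sketched with standard-library fakes. `FakeSocket`, `handle_user_message`, and the `dispatch` callable are hypothetical stand-ins for the WebSocket, the handler body, and `handle_message.delay()`; persistence and RLS setup are omitted. The `responses` queue models a pub-sub subscription that already exists before dispatch. That ordering is a suggestion rather than something the plan states: Redis pub-sub does not buffer messages for late subscribers, so subscribing only after dispatch could miss a fast reply.

```python
import asyncio
import json

RESPONSE_TIMEOUT_S = 60  # timeout from the plan


class FakeSocket:
    """Hypothetical WebSocket stand-in that records frames sent to the client."""

    def __init__(self) -> None:
        self.sent = []

    async def send_json(self, frame: dict) -> None:
        self.sent.append(frame)


async def handle_user_message(
    socket,
    responses: asyncio.Queue,
    dispatch,
    text: str,
    conversation_id: str,
    timeout: float = RESPONSE_TIMEOUT_S,
) -> bool:
    """Process one user message; return True if a response was delivered."""
    # 1. Tell the client the agent is working, before any slow step.
    await socket.send_json({"type": "typing"})
    # 2. Hand the message to the pipeline (stand-in for handle_message.delay()).
    dispatch({"text": text, "conversation_id": conversation_id})
    # 3. Wait for the published response, but never longer than the timeout.
    try:
        raw = await asyncio.wait_for(responses.get(), timeout)
    except asyncio.TimeoutError:
        return False
    payload = json.loads(raw)
    await socket.send_json(
        {"type": "response", "text": payload["text"], "conversation_id": conversation_id}
    )
    return True
```

The plan does not define what the client should see on a timeout, so the sketch only reports it to the caller.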
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x -v</automated>
</verify>
<done>
WebSocket endpoint at /chat/ws/{conversation_id} accepts connections, authenticates via JSON message, dispatches to Celery, subscribes to Redis for response. REST API provides conversation CRUD with RBAC. Orchestrator _send_response handles "web" channel via Redis pub-sub publish. All unit tests pass. Gateway mounts both routers.
</done>
</task>
</tasks>
<verification>
1. All unit tests pass: `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
2. Migration 008 applies cleanly: `cd /home/adelorenzo/repos/konstruct && alembic upgrade head`
3. Gateway starts without errors: `cd /home/adelorenzo/repos/konstruct/packages/gateway && python -c "from gateway.main import app; print('OK')"`
4. Full test suite still green: `pytest tests/unit -x`
</verification>
<success_criteria>
- ChannelType includes WEB
- WebSocket endpoint exists at /chat/ws/{conversation_id}
- REST API at /api/portal/chat/* provides conversation CRUD with RBAC
- _send_response in tasks.py handles "web" channel via Redis pub-sub
- web_conversations and web_conversation_messages tables created with RLS
- All 7+ unit tests pass covering CHAT-01 through CHAT-05
</success_criteria>
<output>
After completion, create `.planning/phases/06-web-chat/06-01-SUMMARY.md`
</output>

@@ -0,0 +1,325 @@
---
phase: 06-web-chat
plan: 02
type: execute
wave: 2
depends_on: ["06-01"]
files_modified:
- packages/portal/app/(dashboard)/chat/page.tsx
- packages/portal/components/chat-sidebar.tsx
- packages/portal/components/chat-window.tsx
- packages/portal/components/chat-message.tsx
- packages/portal/components/typing-indicator.tsx
- packages/portal/lib/use-chat-socket.ts
- packages/portal/lib/queries.ts
- packages/portal/lib/api.ts
- packages/portal/components/nav.tsx
- packages/portal/package.json
autonomous: true
requirements:
- CHAT-01
- CHAT-03
- CHAT-04
- CHAT-05
must_haves:
truths:
- "User can navigate to /chat from the sidebar and see a conversation list"
- "User can select an agent and start a new conversation"
- "User can type a message and see it appear as a right-aligned bubble"
- "Agent response appears as a left-aligned bubble with markdown rendering"
- "Typing indicator (animated dots) shows while waiting for agent response"
- "Conversation history loads when user returns to a previous conversation"
- "Operator, customer admin, and platform admin can all access /chat"
artifacts:
- path: "packages/portal/app/(dashboard)/chat/page.tsx"
provides: "Main chat page with sidebar + active conversation"
min_lines: 50
- path: "packages/portal/components/chat-sidebar.tsx"
provides: "Conversation list with agent names and timestamps"
contains: "ChatSidebar"
- path: "packages/portal/components/chat-window.tsx"
provides: "Active conversation with message list, input, and send button"
contains: "ChatWindow"
- path: "packages/portal/components/chat-message.tsx"
provides: "Message bubble with markdown rendering and role-based alignment"
contains: "ChatMessage"
- path: "packages/portal/components/typing-indicator.tsx"
provides: "Animated typing dots component"
contains: "TypingIndicator"
- path: "packages/portal/lib/use-chat-socket.ts"
provides: "React hook managing WebSocket lifecycle"
contains: "useChatSocket"
key_links:
- from: "packages/portal/lib/use-chat-socket.ts"
to: "packages/gateway/gateway/channels/web.py"
via: "WebSocket connection to /chat/ws/{conversationId}"
pattern: "new WebSocket"
- from: "packages/portal/app/(dashboard)/chat/page.tsx"
to: "packages/portal/lib/queries.ts"
via: "useConversations + useConversationHistory hooks"
pattern: "useConversations|useConversationHistory"
- from: "packages/portal/components/nav.tsx"
to: "packages/portal/app/(dashboard)/chat/page.tsx"
via: "Nav link to /chat"
pattern: 'href.*"/chat"'
---
<objective>
Build the complete portal chat UI: a dedicated /chat page with conversation sidebar, message window with markdown rendering, typing indicators, and WebSocket integration. Users can start conversations with AI Employees, see real-time responses, and browse conversation history.
Purpose: Delivers the user-facing chat experience that connects to the backend infrastructure from Plan 01.
Output: Fully interactive chat page in the portal with all CHAT requirements addressed.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/06-web-chat/06-CONTEXT.md
@.planning/phases/06-web-chat/06-RESEARCH.md
@.planning/phases/06-web-chat/06-01-SUMMARY.md
<interfaces>
<!-- From Plan 01 — backend contracts the frontend connects to -->
WebSocket endpoint: ws://localhost:8001/chat/ws/{conversationId}
Protocol:
1. Client connects
2. Client sends: {"type": "auth", "userId": "uuid", "role": "role_string", "tenantId": "uuid|null"}
3. Client sends: {"type": "message", "text": "user message"}
4. Server sends: {"type": "typing"} (immediate)
5. Server sends: {"type": "response", "text": "agent reply", "conversation_id": "uuid"}
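For reference, the server-side check on the auth frame in step 2 might look like this Python sketch, matching the Plan 01 rule that userId must be a non-empty UUID string and role must be valid. The function name is hypothetical, and so is the role set: only "platform_admin" is named in these plans, and the other two strings are placeholders.

```python
from __future__ import annotations

import uuid
from typing import Any

# Assumption: only "platform_admin" appears in the plans; the rest are illustrative.
VALID_ROLES = frozenset({"platform_admin", "customer_admin", "customer_operator"})


def validate_auth_frame(frame: dict[str, Any]) -> dict[str, Any] | None:
    """Return the caller identity from the first frame, or None if it is invalid."""
    if frame.get("type") != "auth":
        return None
    user_id = frame.get("userId")
    if not isinstance(user_id, str) or not user_id:
        return None
    try:
        uuid.UUID(user_id)  # rejects anything that is not a UUID string
    except ValueError:
        return None
    role = frame.get("role")
    if not isinstance(role, str) or role not in VALID_ROLES:
        return None
    return {"user_id": user_id, "role": role, "tenant_id": frame.get("tenantId")}
```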
REST API:
GET /api/portal/chat/conversations?tenant_id={id}
-> [{id, agent_id, agent_name, updated_at, last_message_preview}]
GET /api/portal/chat/conversations/{id}/messages?limit=50&before={cursor}
-> [{id, role, content, created_at}]
POST /api/portal/chat/conversations
Body: {tenant_id, agent_id}
-> {id, tenant_id, agent_id, user_id, created_at, updated_at}
DELETE /api/portal/chat/conversations/{id}
-> 204
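The `limit` and `before` parameters on the messages endpoint suggest cursor pagination that walks backwards through history while each page is still returned oldest-first. The Python sketch below models that over an in-memory list. It assumes the cursor is the `created_at` value of the oldest message already loaded and that timestamps share one ISO 8601 format, so string comparison matches time order; neither detail is confirmed by the plans, and `page_messages` is a hypothetical name.

```python
from __future__ import annotations

from typing import Any


def page_messages(
    messages: list[dict[str, Any]],
    limit: int = 50,
    before: str | None = None,
) -> list[dict[str, Any]]:
    """Return up to `limit` of the newest messages older than `before`, oldest first."""
    if limit <= 0:
        return []
    ordered = sorted(messages, key=lambda m: m["created_at"])
    if before is not None:
        ordered = [m for m in ordered if m["created_at"] < before]
    # Keep the tail (most recent) of what remains, still in ascending order.
    return ordered[-limit:]
```

A client would request the first page without `before`, then pass the `created_at` of the first message it holds to load the next older page.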
From packages/portal/lib/api.ts:
```typescript
export function setPortalSession(session: {...}): void;
function getAuthHeaders(): Record<string, string>;
const api = { get<T>, post<T>, put<T>, delete };
```
From packages/portal/lib/queries.ts:
```typescript
export const queryKeys = { tenants, agents, ... };
export function useAgents(tenantId: string): UseQueryResult<Agent[]>;
export function useTenants(page?: number): UseQueryResult<TenantsListResponse>;
// ADD: useConversations, useConversationHistory, useCreateConversation, useDeleteConversation
```
From packages/portal/components/nav.tsx:
```typescript
const navItems: NavItem[] = [
  { href: "/dashboard", ... },
  { href: "/agents", label: "Employees", ... },
  // ADD: { href: "/chat", label: "Chat", icon: MessageSquare }
  // Visible to ALL roles (no allowedRoles restriction)
];
```
From packages/portal/proxy.ts:
```typescript
const CUSTOMER_OPERATOR_RESTRICTED = ["/billing", "/settings/api-keys", "/users", "/admin", "/agents/new"];
// /chat is NOT in this list — operators CAN access chat (per CONTEXT.md: "chatting IS the product")
```
</interfaces>
</context>
<tasks>
<task type="auto">
<name>Task 1: Install dependencies, add API types/hooks, create WebSocket hook</name>
<files>
packages/portal/package.json,
packages/portal/lib/api.ts,
packages/portal/lib/queries.ts,
packages/portal/lib/use-chat-socket.ts
</files>
<action>
1. Install react-markdown and remark-gfm:
`cd packages/portal && npm install react-markdown remark-gfm`
2. Add chat types to packages/portal/lib/api.ts (at the bottom, after existing types):
```typescript
// Chat types
export interface Conversation {
  id: string;
  agent_id: string;
  agent_name: string;
  updated_at: string;
  last_message_preview: string | null;
}

export interface ConversationMessage {
  id: string;
  role: "user" | "assistant";
  content: string;
  created_at: string;
}

export interface CreateConversationRequest {
  tenant_id: string;
  agent_id: string;
}

export interface ConversationDetail {
  id: string;
  tenant_id: string;
  agent_id: string;
  user_id: string;
  created_at: string;
  updated_at: string;
}
```
3. Add chat hooks to packages/portal/lib/queries.ts:
- Add to queryKeys: conversations(tenantId) and conversationHistory(conversationId)
- useConversations(tenantId: string) — GET /api/portal/chat/conversations?tenant_id={tenantId}, returns Conversation[], enabled: !!tenantId
- useConversationHistory(conversationId: string) — GET /api/portal/chat/conversations/{conversationId}/messages, returns ConversationMessage[], enabled: !!conversationId
- useCreateConversation() — POST mutation to /api/portal/chat/conversations, invalidates conversations query on success
- useDeleteConversation() — DELETE mutation, invalidates conversations + history queries
Follow the exact same pattern as useAgents, useCreateAgent, etc.
4. Create packages/portal/lib/use-chat-socket.ts:
- "use client" directive at top
- useChatSocket({ conversationId, onMessage, onTyping, authHeaders }) hook
- authHeaders: { userId: string; role: string; tenantId: string | null }
- On mount: create WebSocket to `${NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`
- On open: send auth JSON message immediately
- On message: parse JSON, if type="typing" call onTyping(true), if type="response" call onTyping(false) then onMessage(data.text)
- send(text: string) function: sends {"type": "message", "text": text} if connected
- Return { send, isConnected }
- On unmount/conversationId change: close WebSocket (useEffect cleanup)
- Simple reconnect: on close, attempt reconnect after 3s (limit to 3 retries, then show error)
- Use useRef for WebSocket instance, useState for isConnected
- Use useCallback for send to keep stable reference
IMPORTANT: Read packages/portal/node_modules/next/dist/docs/ for any relevant Next.js 16 patterns before writing code.
IMPORTANT: Use NEXT_PUBLIC_WS_URL env var (not NEXT_PUBLIC_API_URL) — WebSocket URL may differ from REST API URL.
IMPORTANT: Auth message sent as first JSON payload after connection (browser WebSocket cannot send custom headers).
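The reconnect rule in step 4 (retry after 3s, at most 3 times, then surface an error) is easy to get subtly wrong, so here is a small state model of it. It is written in Python to match the backend examples and only models the logic; the hook itself would be TypeScript. All names are hypothetical, and resetting the retry count after a successful open is an assumption the plan does not state.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_RETRIES = 3    # from the plan
RETRY_DELAY_S = 3  # from the plan


@dataclass
class SocketState:
    is_connected: bool = False
    retries: int = 0
    failed: bool = False  # True once retries are exhausted (show an error)


def on_open(state: SocketState) -> None:
    state.is_connected = True
    state.retries = 0  # assumption: a successful connection resets the budget


def on_close(state: SocketState) -> int | None:
    """Return the delay in seconds before the next attempt, or None to give up."""
    state.is_connected = False
    if state.retries >= MAX_RETRIES:
        state.failed = True
        return None
    state.retries += 1
    return RETRY_DELAY_S
```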
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx next build 2>&1 | tail -20</automated>
</verify>
<done>
react-markdown and remark-gfm installed. Chat types exported from api.ts. Four query hooks (useConversations, useConversationHistory, useCreateConversation, useDeleteConversation) added to queries.ts. useChatSocket hook manages WebSocket lifecycle with auth and reconnection. Portal builds without errors.
</done>
</task>
<task type="auto">
<name>Task 2: Chat page, components, nav link, and styling</name>
<files>
packages/portal/app/(dashboard)/chat/page.tsx,
packages/portal/components/chat-sidebar.tsx,
packages/portal/components/chat-window.tsx,
packages/portal/components/chat-message.tsx,
packages/portal/components/typing-indicator.tsx,
packages/portal/components/nav.tsx
</files>
<action>
1. Create packages/portal/components/typing-indicator.tsx:
- "use client" component
- Three animated dots with CSS animation (scale/opacity pulsing with staggered delays)
- Wrapped in a message-bubble-style container (left-aligned, muted background)
- Use Tailwind animate classes or inline keyframes
2. Create packages/portal/components/chat-message.tsx:
- "use client" component
- Props: { role: "user" | "assistant"; content: string; createdAt: string }
- User messages: right-aligned, primary color background, white text
- Assistant messages: left-aligned, muted background, with agent avatar icon (Bot from lucide-react)
- Render content with react-markdown + remark-gfm for assistant messages (code blocks, lists, bold, links)
- User messages: plain text (no markdown rendering needed)
- Show timestamp in relative format (e.g., "2m ago") on hover or below message
- Inline image display for any markdown image links in agent responses
3. Create packages/portal/components/chat-sidebar.tsx:
- "use client" component
- Props: { conversations: Conversation[]; activeId: string | null; onSelect: (id: string) => void; onNewChat: () => void }
- "New Conversation" button at top (Plus icon from lucide-react)
- Scrollable list of conversations showing: agent name (bold), last message preview (truncated, muted), relative timestamp
- Active conversation highlighted with accent background
- Empty state: "No conversations yet"
4. Create packages/portal/components/chat-window.tsx:
- "use client" component
- Props: { conversationId: string; authHeaders: { userId, role, tenantId } }
- Uses useConversationHistory(conversationId) for initial load
- Uses useChatSocket for real-time messaging
- State: messages array (merged from history + new), isTyping boolean, inputText string
- On history load: populate messages from query data
- On WebSocket message: append to messages array, scroll to bottom
- On typing indicator: show TypingIndicator below last message
- Input area at bottom: textarea (auto-growing, max 4 lines) + Send button (SendHorizontal icon from lucide-react)
- Send on Enter (Shift+Enter for newline), clear input after send
- Auto-scroll to bottom on new messages (use ref + scrollIntoView)
- Show "Connecting..." state when WebSocket not connected
- Empty state when no conversationId selected: "Select a conversation or start a new one"
5. Create packages/portal/app/(dashboard)/chat/page.tsx:
- "use client" component
- Layout: flex row, full height (h-[calc(100vh-4rem)] or similar to fill dashboard area)
- Left: ChatSidebar (w-80, border-right)
- Right: ChatWindow (flex-1)
- State: activeConversationId (string | null), showAgentPicker (boolean)
- On mount: load conversations via useConversations(activeTenantId)
- For platform admin: use tenant switcher pattern — show all tenants, load agents per tenant
- "New Conversation" flow: show agent picker dialog (Dialog from shadcn base-ui). List agents from useAgents(tenantId). On agent select: call useCreateConversation, set activeConversationId to result.id
- URL state: sync activeConversationId to URL search param ?id={conversationId} for bookmark/refresh support
- Get auth headers from session (useSession from next-auth/react) — userId, role, activeTenantId
6. Update packages/portal/components/nav.tsx:
- Import MessageSquare from lucide-react
- Add { href: "/chat", label: "Chat", icon: MessageSquare } to navItems array
- Position after "Employees" and before "Usage"
- No allowedRoles restriction (all roles can chat per CONTEXT.md)
The chat should feel like a modern messaging app (Slack DMs / iMessage style) — not a clinical chatbot widget. Clean spacing, smooth scrolling, readable typography.
IMPORTANT: Use standardSchemaResolver (not zodResolver) if any forms are needed (per STATE.md convention).
IMPORTANT: use(searchParams) pattern for reading URL params in client components (Next.js 15/16 convention).
IMPORTANT: base-ui DialogTrigger uses render prop not asChild (per Phase 4 STATE.md decision).
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx next build 2>&1 | tail -20</automated>
</verify>
<done>
Chat page renders at /chat with sidebar (conversation list) and main panel (active conversation). New Conversation button opens agent picker dialog. Messages display with role-based alignment and markdown rendering. Typing indicator animates during response wait. Nav sidebar includes Chat link visible to all roles. Portal builds without errors.
</done>
</task>
</tasks>
<verification>
1. Portal builds: `cd packages/portal && npx next build`
2. Chat page accessible at /chat after login
3. Nav shows "Chat" link for all roles
4. No TypeScript errors in new files
</verification>
<success_criteria>
- /chat page renders with left sidebar and right conversation panel
- New Conversation flow: agent picker -> create conversation -> WebSocket connect
- Messages render with markdown (assistant) and plain text (user)
- Typing indicator shows animated dots during response generation
- Conversation history loads from REST API on page visit
- WebSocket connects and authenticates via JSON auth message
- Nav includes Chat link visible to all three roles
- Portal builds successfully
</success_criteria>
<output>
After completion, create `.planning/phases/06-web-chat/06-02-SUMMARY.md`
</output>


@@ -0,0 +1,119 @@
---
phase: 06-web-chat
plan: 03
type: execute
wave: 2
depends_on: ["06-01", "06-02"]
files_modified: []
autonomous: false
requirements:
- CHAT-01
- CHAT-02
- CHAT-03
- CHAT-04
- CHAT-05
must_haves:
truths:
- "End-to-end chat works: user sends message via WebSocket, receives LLM response"
- "Conversation history persists and loads on page revisit"
- "Typing indicator appears during response generation"
- "Markdown renders correctly in agent responses"
- "RBAC enforced: operator can chat, but cannot see admin-only nav items"
- "Platform admin can chat with agents across tenants"
artifacts: []
key_links: []
---
<objective>
Human verification of the complete web chat feature. Test end-to-end flow, RBAC enforcement, conversation persistence, and UX quality.
Purpose: Confirm all CHAT requirements are met before marking Phase 6 complete.
Output: Verified working chat feature.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/phases/06-web-chat/06-01-SUMMARY.md
@.planning/phases/06-web-chat/06-02-SUMMARY.md
</context>
<tasks>
<task type="checkpoint:human-verify" gate="blocking">
<name>Task 1: Verify end-to-end web chat feature</name>
<files></files>
<action>
Present the following verification checklist to the user. This is a human verification checkpoint — no code changes needed.
What was built:
- WebSocket-based real-time chat in the portal at /chat
- Conversation sidebar with agent list, timestamps, message previews
- Message bubbles with markdown rendering and typing indicators
- Full agent pipeline integration (memory, tools, escalation, audit)
- Conversation history persistence in PostgreSQL
- RBAC enforcement (all roles can chat, scoped to accessible tenants)
Prerequisites:
- Docker Compose stack running (gateway, orchestrator, portal, postgres, redis)
- At least one active agent configured for a tenant
- Migration applied: `alembic upgrade head`
Test 1 — Basic Chat (CHAT-01, CHAT-05):
1. Log in to portal as customer_admin
2. Click "Chat" in the sidebar navigation
3. Click "New Conversation" and select an AI Employee
4. Type a message and press Enter
5. Verify: typing indicator (animated dots) appears immediately
6. Verify: agent response appears as a left-aligned message bubble
7. Verify: your message appears right-aligned
Test 2 — Markdown Rendering (CHAT-05):
1. Send a message that triggers a formatted response (e.g., "Give me a bulleted list of 3 tips")
2. Verify: response renders with proper markdown (bold, lists, code blocks)
Test 3 — Conversation History (CHAT-03):
1. After sending a few messages, navigate away from /chat (e.g., go to /dashboard)
2. Navigate back to /chat
3. Verify: previous conversation appears in sidebar with last message preview
4. Click the conversation
5. Verify: full message history loads (all previous messages visible)
Test 4 — RBAC (CHAT-04):
1. Log in as customer_operator
2. Verify: "Chat" link visible in sidebar
3. Navigate to /chat, start a conversation with an agent
4. Verify: chat works (operators can chat)
5. Verify: admin-only nav items (Billing, API Keys, Users) are still hidden
Test 5 — Full Pipeline (CHAT-02):
1. If the agent has tools configured, send a message that triggers tool use
2. Verify: agent invokes the tool and incorporates the result
3. (Optional) If escalation rules are configured, trigger one and verify handoff message
</action>
<verify>Human confirms all 5 test scenarios pass</verify>
<done>User types "approved" confirming end-to-end web chat works correctly across all CHAT requirements</done>
</task>
</tasks>
<verification>
All 5 test scenarios pass as described above.
</verification>
<success_criteria>
- Human confirms end-to-end chat works with real LLM responses
- Conversation history persists across page navigations
- Typing indicator visible during response generation
- Markdown renders correctly
- RBAC correctly scopes agent access
- All three roles (platform_admin, customer_admin, customer_operator) can chat
</success_criteria>
<output>
After completion, create `.planning/phases/06-web-chat/06-03-SUMMARY.md`
</output>


@@ -0,0 +1,628 @@
# Phase 6: Web Chat - Research
**Researched:** 2026-03-25
**Domain:** Real-time web chat (WebSocket + Redis pub-sub + new channel adapter + portal UI)
**Confidence:** HIGH
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- Dedicated `/chat` page (full-screen, not a floating widget)
- Left sidebar: conversation list grouped by agent, with timestamps and last message preview
- Right panel: active conversation with message bubbles (user right-aligned, agent left-aligned)
- "New Conversation" button opens an agent picker (shows agents the user has access to)
- Markdown rendering in agent messages
- Image/document display inline (consistent with Phase 2 media support)
- Typing indicator (animated dots) while waiting for agent response
- All three roles can chat: platform admin, customer admin, customer operator
- Users can only see/chat with agents belonging to tenants they have access to (RBAC)
- Platform admins can chat with any agent across all tenants
- Operators can chat (read-only restrictions do NOT apply to conversations)
- One conversation thread per user-agent pair (matches per-user per-agent memory model)
- Users can start new conversation (clears thread context) or continue existing one
- Conversation list sorted by most recent, paginated for long histories
- WebSocket connection for real-time, HTTP polling fallback if WebSocket unavailable
- Gateway receives web chat message, normalizes to KonstructMessage (channel: "web"), dispatches through existing pipeline
- Agent response pushed back via WebSocket
- New "web" channel adapter in gateway alongside Slack and WhatsApp
- channel_metadata includes: portal_user_id, tenant_id, conversation_id
- Tenant resolution from the authenticated session (not from channel metadata like Slack workspace ID)
- Outbound: push response via WebSocket connection keyed to conversation_id
### Claude's Discretion
- WebSocket library choice (native ws, Socket.IO, etc.)
- Message bubble visual design
- Conversation pagination strategy (infinite scroll vs load more)
- Whether to show tool invocation indicators in chat (e.g., "Searching knowledge base...")
- Agent avatar/icon in chat
- Sound notification on new message
- Mobile responsiveness approach
### Deferred Ideas (OUT OF SCOPE)
None raised.
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| CHAT-01 | Users can open a chat window with any AI Employee and have a real-time conversation within the portal | WebSocket endpoint on FastAPI gateway + browser WebSocket client in portal chat page |
| CHAT-02 | Web chat supports full agent pipeline — memory, tools, escalation, and media | "web" channel added to ChannelType enum; handle_message Celery task already handles all pipeline stages; _send_response needs "web" case via Redis pub-sub |
| CHAT-03 | Conversation history persists and is visible when the user returns | New conversations DB table + pgvector already keyed per-user per-agent; history load on page visit |
| CHAT-04 | Chat respects RBAC — users can only chat with agents belonging to tenants they have access to | require_tenant_member FastAPI dependency already exists; new chat API endpoints use same pattern; platform_admin bypasses tenant check |
| CHAT-05 | Chat interface feels responsive — typing indicators, message streaming or fast response display | Typing indicator via WebSocket "typing" event immediately on message send; WebSocket pushes final response when Celery completes |
</phase_requirements>
---
## Summary
Phase 6 adds a web chat channel to the Konstruct portal — the first channel that originates inside the portal itself rather than from an external messaging platform. The architecture follows the same channel adapter pattern established in Phases 1 and 2: a new "web" adapter in the gateway normalizes portal messages into KonstructMessage format and dispatches them to the existing Celery pipeline. The key new infrastructure is a WebSocket endpoint on the gateway and a Redis pub-sub channel that bridges the Celery worker's response delivery back to the WebSocket connection.
The frontend is a new `/chat` route in the Next.js portal. It uses the native browser WebSocket API (no additional library required) with a React hook managing connection lifecycle. The UI requires one new shadcn/ui component not yet in the project (ScrollArea) and markdown rendering (react-markdown is not yet installed). Both are straightforward additions.
The most important constraint to keep in mind during planning: the Celery worker and the FastAPI gateway are separate processes, so the Celery task cannot call back to the WebSocket connection directly. Instead, Celery publishes the response to a Redis pub-sub channel; the gateway WebSocket handler subscribes to that channel and forwards the response to the browser. This Redis pub-sub bridge is the critical new piece that does not exist yet.
**Primary recommendation:** Use FastAPI native WebSocket + Redis pub-sub bridge for cross-process response delivery. No additional Python WebSocket libraries needed. Use native browser WebSocket API in the portal. Add react-markdown for markdown rendering.
---
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| FastAPI WebSocket | Built into fastapi[standard] 0.135.2 | WebSocket endpoint on gateway | Already installed, Starlette-native, zero new deps |
| redis.asyncio pub-sub | redis 5.0.0+ (already installed) | Bridge Celery response → WebSocket | Cross-process response delivery; already used everywhere in this codebase |
| Browser WebSocket API | Native (no library) | Portal WebSocket client | Works in all modern browsers, zero bundle cost |
| react-markdown | 9.x | Render agent markdown responses | Standard React markdown renderer; GFM and syntax highlighting are available via plugins |
| remark-gfm | 4.x | GitHub Flavored Markdown support | Tables, strikethrough, task lists in agent responses |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| @radix-ui/react-scroll-area (via shadcn) | already available via @base-ui/react | Scrollable message container | Message list that auto-scrolls to bottom |
| lucide-react | already installed | Icons (typing dots, send button, agent avatar) | Already used throughout portal |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| Redis pub-sub bridge | Socket.IO | Socket.IO adds significant bundle weight and complexity; Redis pub-sub is already used in this codebase (rate limiting, session, escalation) |
| Native browser WebSocket | socket.io-client | Same reason: unnecessary dependency when native WebSocket is sufficient |
| react-markdown | marked + dangerouslySetInnerHTML | react-markdown renders to React elements and is safe by default; marked requires XSS sanitization as a separate step |
**Installation:**
```bash
# Portal
cd packages/portal && npm install react-markdown remark-gfm
# Backend: no new dependencies needed
# FastAPI WebSocket is in fastapi[standard] already installed
# redis pub-sub is in redis 5.0.0 already installed
```
---
## Architecture Patterns
### Recommended Project Structure
New files added in this phase:
```
packages/
├── gateway/gateway/channels/
│   └── web.py                    # Web channel adapter + WebSocket endpoint + pub-sub subscriber
├── shared/shared/
│   ├── models/message.py         # Add ChannelType.WEB = "web"
│   ├── redis_keys.py             # Add webchat_response_key(tenant_id, conversation_id)
│   └── api/
│       └── chat.py               # REST API: list conversations, get history, create/reset
├── migrations/versions/
│   └── 008_web_chat.py           # web_conversations + web_conversation_messages tables
└── packages/portal/
    ├── app/(dashboard)/chat/
    │   └── page.tsx              # Chat page (client component)
    ├── components/
    │   ├── chat-sidebar.tsx      # Conversation list sidebar
    │   ├── chat-window.tsx       # Active conversation + message bubbles
    │   ├── chat-message.tsx      # Single message bubble with markdown
    │   └── typing-indicator.tsx  # Animated dots
    └── lib/
        ├── api.ts                # Add chat API types + functions
        ├── queries.ts            # Add useConversations, useConversationHistory
        └── use-chat-socket.ts    # WebSocket lifecycle hook
```
### Pattern 1: Redis Pub-Sub Response Bridge
**What:** Celery task (separate process) completes LLM response and needs to push it to a WebSocket connection held by the gateway FastAPI process. Redis pub-sub is the standard cross-process channel.
**When to use:** Any time a background worker needs to push a result back to a long-lived connection.
**Flow:**
1. Browser sends message via WebSocket to gateway
2. Gateway dispatches `handle_message.delay(payload)` (identical to Slack/WhatsApp)
3. Gateway subscribes to Redis channel `{tenant_id}:webchat:response:{conversation_id}` and waits
4. Celery's `_send_response` for "web" channel publishes response to same Redis channel
5. Gateway receives pub-sub message, pushes to browser WebSocket
**Example — gateway side:**
```python
# Source: redis.asyncio pub-sub docs + existing redis usage in this codebase
import asyncio

import redis.asyncio as aioredis
from fastapi import WebSocket


async def websocket_wait_for_response(
    ws: WebSocket,
    redis_url: str,
    response_channel: str,
    timeout: float = 60.0,
) -> None:
    """Subscribe to the response channel and forward the first message to the WebSocket.

    Raises TimeoutError if no response arrives within `timeout` seconds.
    """
    # decode_responses=True so message["data"] is a str, as ws.send_text() expects
    r = aioredis.from_url(redis_url, decode_responses=True)
    pubsub = r.pubsub()
    try:
        await pubsub.subscribe(response_channel)
        # asyncio.timeout (Python 3.11+) enforces the timeout parameter
        async with asyncio.timeout(timeout):
            async for message in pubsub.listen():
                if message["type"] == "message":
                    await ws.send_text(message["data"])
                    return
    finally:
        await pubsub.unsubscribe(response_channel)
        await pubsub.aclose()
        await r.aclose()
```
**Example — Celery task side (in `_send_response`):**
```python
# Add "web" case to _send_response in orchestrator/tasks.py
elif channel_str == "web":
    conversation_id: str = extras.get("conversation_id", "") or ""
    tenant_id: str = extras.get("tenant_id", "") or ""
    if not conversation_id or not tenant_id:
        logger.warning("_send_response: web channel missing conversation_id or tenant_id")
        return
    response_channel = webchat_response_key(tenant_id, conversation_id)
    publish_redis = aioredis.from_url(settings.redis_url)
    try:
        await publish_redis.publish(response_channel, json.dumps({
            "type": "response",
            "text": text,
            "conversation_id": conversation_id,
        }))
    finally:
        await publish_redis.aclose()
```
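One detail the flow above leaves implicit is ordering. Redis pub-sub does not buffer messages for subscribers that arrive late, so a response published before the gateway has subscribed is lost. The sketch below illustrates this with a hypothetical in-memory stand-in (`InMemoryPubSub` is not part of the codebase or of redis-py); it only models the "deliver to current subscribers" behaviour.

```python
from __future__ import annotations

import asyncio


class InMemoryPubSub:
    """Hypothetical stand-in for Redis pub-sub: only current subscribers receive a message."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue[str]]] = {}

    def subscribe(self, channel: str) -> asyncio.Queue[str]:
        queue: asyncio.Queue[str] = asyncio.Queue()
        self._subscribers.setdefault(channel, []).append(queue)
        return queue

    def publish(self, channel: str, data: str) -> int:
        """Deliver to current subscribers and return how many received the message."""
        queues = self._subscribers.get(channel, [])
        for queue in queues:
            queue.put_nowait(data)
        return len(queues)


async def demo() -> tuple[int, int, str]:
    bus = InMemoryPubSub()
    channel = "tenant-1:webchat:response:conv-1"

    # Published before anyone subscribed: nobody receives it.
    early_receivers = bus.publish(channel, "early response")

    # Subscribe first, then publish: the subscriber receives it.
    queue = bus.subscribe(channel)
    late_receivers = bus.publish(channel, "late response")
    received = await asyncio.wait_for(queue.get(), timeout=1.0)
    return early_receivers, late_receivers, received


result = asyncio.run(demo())
```

In practice the LLM call takes far longer than a subscribe, so the window is small, but subscribing before calling `handle_message.delay(...)` removes it entirely.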
### Pattern 2: FastAPI WebSocket Endpoint
**What:** Native FastAPI WebSocket with auth validated from a JSON message sent after connection (see the critical note below). Gateway already holds the Redis client at startup; WebSocket handler uses it.
**When to use:** Every web chat message from the portal browser.
```python
# Source: FastAPI WebSocket docs (verified: WebSocket import is in fastapi package)
from fastapi import WebSocket, WebSocketDisconnect


@app.websocket("/chat/ws/{conversation_id}")
async def chat_websocket(
    conversation_id: str,
    websocket: WebSocket,
) -> None:
    await websocket.accept()
    try:
        while True:
            data = await websocket.receive_json()
            # Validate auth headers from data["auth"]
            # Normalize to KonstructMessage, dispatch to Celery
            # Subscribe to Redis response channel
            # Push response back to websocket
    except WebSocketDisconnect:
        pass
```
**Critical note:** On the server, handshake headers are available via `websocket.headers`, but the browser's native `WebSocket` constructor cannot set custom headers, so they cannot carry auth from the portal. The established pattern in this project is to send RBAC headers as `X-Portal-User-Id`, `X-Portal-User-Role`, `X-Portal-Tenant-Id`. For WebSocket, send the same values as a JSON "auth" message immediately after connection instead.
### Pattern 3: Browser WebSocket Hook
**What:** React hook that manages WebSocket connection lifecycle (connect on mount, reconnect on disconnect, send/receive messages).
```typescript
// packages/portal/lib/use-chat-socket.ts
// Native browser WebSocket: no library needed
"use client";

import { useEffect, useRef, useCallback, useState } from "react";

interface ChatSocketOptions {
  conversationId: string;
  onMessage: (text: string) => void;
  onTyping: (isTyping: boolean) => void;
  authHeaders: { userId: string; role: string; tenantId: string | null };
}

export function useChatSocket({
  conversationId,
  onMessage,
  onTyping,
  authHeaders,
}: ChatSocketOptions) {
  const wsRef = useRef<WebSocket | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  const send = useCallback((text: string) => {
    if (wsRef.current?.readyState === WebSocket.OPEN) {
      wsRef.current.send(JSON.stringify({
        type: "message",
        text,
        auth: authHeaders,
      }));
      onTyping(true); // Show typing indicator immediately
    }
  }, [authHeaders, onTyping]);

  useEffect(() => {
    const wsUrl = `${process.env.NEXT_PUBLIC_WS_URL ?? "ws://localhost:8001"}/chat/ws/${conversationId}`;
    const ws = new WebSocket(wsUrl);
    wsRef.current = ws;

    ws.onopen = () => setIsConnected(true);
    ws.onclose = () => setIsConnected(false);
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data as string);
      if (data.type === "response") {
        onTyping(false);
        onMessage(data.text as string);
      }
    };

    return () => ws.close();
  }, [conversationId, onMessage, onTyping]);

  return { send, isConnected };
}
```
### Pattern 4: Conversation Persistence (New DB Table)
**What:** Two tables, `web_conversations` and `web_conversation_messages`, that persist chat history so it is visible on return visits.
**When to use:** Every web chat message — store each turn in the DB.
```python
# New ORM model (migration 008)
class WebConversation(Base):
    """Persistent conversation thread for portal web chat."""

    __tablename__ = "web_conversations"

    id: Mapped[uuid.UUID] = ...
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    agent_id: Mapped[uuid.UUID] = ...
    user_id: Mapped[uuid.UUID] = ...  # portal user UUID (from Auth.js session)
    created_at: Mapped[datetime] = ...
    updated_at: Mapped[datetime] = ...  # used for sort order

    __table_args__ = (
        UniqueConstraint("tenant_id", "agent_id", "user_id"),  # one thread per pair
    )


class WebConversationMessage(Base):
    """Individual message within a web conversation."""

    __tablename__ = "web_conversation_messages"

    id: Mapped[uuid.UUID] = ...
    # ForeignKey must be wrapped in mapped_column() under SQLAlchemy 2.0 typing
    conversation_id: Mapped[uuid.UUID] = mapped_column(ForeignKey("web_conversations.id"))
    tenant_id: Mapped[uuid.UUID] = ...  # RLS enforced
    role: Mapped[str] = ...  # "user" | "assistant"
    content: Mapped[str] = ...
    created_at: Mapped[datetime] = ...
```
**Note:** The `user_id` for web chat is the portal user's UUID from Auth.js — different from the Slack user ID string used in existing memory. The Redis memory key `memory:short:{agent_id}:{user_id}` will use the portal user's UUID string as `user_id`, keeping it compatible with the existing memory system.
### Pattern 5: Conversation REST API
**What:** REST endpoints for listing conversations, loading history, and resetting. This is separate from the WebSocket endpoint.
```
GET /api/portal/chat/conversations?tenant_id={id} — list all conversations for user
GET /api/portal/chat/conversations/{id}/messages — load history (paginated)
POST /api/portal/chat/conversations — create new or get-or-create
DELETE /api/portal/chat/conversations/{id} — reset (delete messages, keep thread)
```
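The history endpoint is described as paginated, but the strategy is left open. One option is keyset (cursor) pagination on `created_at`, sketched here over an in-memory list. `StoredMessage` and `page_before` are hypothetical names for illustration, and the real endpoint would express the same filter as a SQL query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class StoredMessage:
    id: str
    role: str  # "user" | "assistant"
    content: str
    created_at: datetime


def page_before(
    messages: list[StoredMessage],
    before: Optional[datetime],
    limit: int,
) -> tuple[list[StoredMessage], Optional[datetime]]:
    """Return up to `limit` of the newest messages older than `before`, oldest first.

    The second element is the cursor for the next (older) page, or None when
    there is nothing older left.
    """
    older = sorted(
        (m for m in messages if before is None or m.created_at < before),
        key=lambda m: m.created_at,
    )
    page = older[-limit:] if limit > 0 else []
    has_more = bool(page) and len(older) > len(page)
    return page, (page[0].created_at if has_more else None)


base = datetime(2026, 3, 25, 12, 0, tzinfo=timezone.utc)
history = [
    StoredMessage(id=f"m{i}", role="user", content=f"message {i}", created_at=base + timedelta(minutes=i))
    for i in range(1, 6)
]
first_page, cursor = page_before(history, None, 2)
second_page, cursor2 = page_before(history, cursor, 2)
last_page, cursor3 = page_before(history, cursor2, 2)
```

A cursor based only on `created_at` can skip messages that share a timestamp; adding the message id as a tiebreaker would avoid that.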
### Anti-Patterns to Avoid
- **Streaming token-by-token:** The requirements doc explicitly marks "Real-time token streaming in chat" as Out of Scope (consistent with Slack/WhatsApp — they don't support partial messages). The typing indicator shows while the full LLM call runs; the complete response arrives as one message.
- **WebSocket auth via URL query params:** Never put tokens/user IDs in the WebSocket URL. Use JSON message after connection.
- **Calling Celery result backend from WebSocket handler:** Celery result backends add latency and coupling. Use Redis pub-sub directly.
- **One WebSocket connection per page load (not per conversation):** The connection should be scoped per conversation_id so reconnect on conversation switch is clean.
- **Storing conversation history only in Redis:** Redis memory (sliding window) is the agent's working context. The DB `web_conversation_messages` table is what shows up when the user returns to the chat page. These are separate concerns.
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Markdown rendering | Custom regex parser | react-markdown + remark-gfm | Handles edge cases, escapes XSS, supports all GFM |
| WebSocket reconnection | Custom exponential backoff | Simple reconnect on close (sufficient for v1) | LLM calls are short; connections don't stay open for hours |
| Auth for WebSocket | Custom token scheme | Send auth as first JSON message using existing RBAC headers | Consistent with existing `X-Portal-*` header pattern |
| Cross-process response delivery | Shared memory / HTTP callback | Redis pub-sub | Already in use; correct pattern for Celery → FastAPI bridge |
**Key insight:** The web channel adapter is the only genuinely new piece of infrastructure. Everything else — RBAC, memory, tool calling, escalation, audit — already works and processes messages tagged with any channel type. Adding `ChannelType.WEB = "web"` and a new `_send_response` branch is sufficient to wire the whole pipeline.
---
## Common Pitfalls
### Pitfall 1: WebSocket Auth — Browser API Limitation
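**What goes wrong:** The browser's native `WebSocket` constructor does not support custom headers. Code that tries `new WebSocket(url, { headers: {...} })` never sends them: the second argument is interpreted as a subprotocol list, so the call either throws or opens a connection without the headers.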
**Why it happens:** The WebSocket spec only allows specifying subprotocols as the second argument, not headers. This is a deliberate browser security decision.
**How to avoid:** Send auth information as a JSON "auth" message immediately after connection opens. The FastAPI WebSocket handler should require this first message before processing any chat messages. This is established practice for browser WebSocket auth.
**Warning signs:** Tests that use a non-browser WebSocket client (for example a Python test client, which can set handshake headers) pass, but the browser connection is rejected.
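A minimal sketch of validating that first auth frame on the gateway side, assuming the frame is `{"type": "auth", "userId": ..., "role": ..., "tenantId": ...}`. The frame shape, `WebSocketCaller`, and `parse_auth_frame` are hypothetical; only the three role names and the field names come from this plan. Letting platform admins connect without a tenant is also an assumption.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_ROLES = {"platform_admin", "customer_admin", "customer_operator"}


@dataclass(frozen=True)
class WebSocketCaller:
    user_id: str
    role: str
    tenant_id: Optional[str]


def parse_auth_frame(raw: str) -> WebSocketCaller:
    """Validate the first WebSocket frame; raise ValueError if it is not a usable auth message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("auth frame is not valid JSON") from exc
    if not isinstance(data, dict) or data.get("type") != "auth":
        raise ValueError("first frame must have type 'auth'")

    user_id = data.get("userId")
    role = data.get("role")
    tenant_id = data.get("tenantId")
    if not isinstance(user_id, str) or not user_id:
        raise ValueError("userId is required")
    if not isinstance(role, str) or role not in ALLOWED_ROLES:
        raise ValueError("unknown role")
    # Assumption: only platform admins may connect without an active tenant.
    if role != "platform_admin" and not (isinstance(tenant_id, str) and tenant_id):
        raise ValueError("tenantId is required for tenant-scoped roles")
    return WebSocketCaller(
        user_id=user_id,
        role=role,
        tenant_id=tenant_id if isinstance(tenant_id, str) else None,
    )
```

This only checks the shape of the frame. It does not prove the caller's identity, so the handler would still need to verify the claimed user and role against the session or database before trusting them.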
### Pitfall 2: Celery Sync Context in Async `_send_response`
**What goes wrong:** `_send_response` is an async function called from `asyncio.run()` inside the sync Celery task. Adding Redis pub-sub code there requires creating a new async Redis client per task, which is the existing pattern — but forgetting `await publish_redis.aclose()` leaks connections.
**Why it happens:** The "Celery tasks MUST be sync def" constraint (STATE.md) means we're always bridging sync→async via `asyncio.run()`. Every async resource must be explicitly closed.
**How to avoid:** Follow the existing pattern in `_process_message`: use `try/finally` around every `aioredis.from_url()` call to ensure `aclose()` always runs.
**Warning signs:** Redis connection count grows over time; "too many connections" errors in production.
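The sync-to-async bridge with guaranteed cleanup can be sketched without a real Redis server. `FakeAsyncRedis` below is a hypothetical stand-in exposing only `publish()` and `aclose()`; the point is that `finally` runs whether the publish succeeds or raises.

```python
import asyncio


class FakeAsyncRedis:
    """Hypothetical stand-in for an async Redis client (publish + aclose only)."""

    def __init__(self) -> None:
        self.closed = False

    async def publish(self, channel: str, data: str) -> int:
        if not channel:
            raise ValueError("channel must not be empty")
        return 1  # pretend one subscriber received the message

    async def aclose(self) -> None:
        self.closed = True


async def publish_once(client: FakeAsyncRedis, channel: str, data: str) -> int:
    try:
        return await client.publish(channel, data)
    finally:
        # Runs on success and on failure, so the connection is never leaked.
        await client.aclose()


def sync_task_body(client: FakeAsyncRedis, channel: str, data: str) -> int:
    """Mimics a sync Celery task bridging into async code via asyncio.run()."""
    return asyncio.run(publish_once(client, channel, data))


ok_client = FakeAsyncRedis()
delivered = sync_task_body(ok_client, "tenant-1:webchat:response:conv-1", "{}")
```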
### Pitfall 3: Conversation ID vs Thread ID Confusion
**What goes wrong:** The KonstructMessage `thread_id` field is used by the memory system to scope Redis sliding window. For web chat, `thread_id` should be the `conversation_id` (UUID) from the `web_conversations` table. If this is set incorrectly (e.g., to the portal user_id), all conversations for a user share one memory window.
**Why it happens:** Slack sets `thread_id` to `thread_ts` (string). WhatsApp sets it to `wa_id`. Web chat must set it to `conversation_id` (UUID string) — one distinct value per conversation.
**How to avoid:** The web channel normalizer should set `thread_id = conversation_id` in the KonstructMessage. The `user_id` for memory key construction comes from `sender.user_id` (portal user UUID string). The combination `tenant_id + agent_id + user_id` (Redis memory key) matches correctly.
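A sketch of the normalizer's field mapping, assuming a simplified message shape. `NormalizedWebMessage` is a hypothetical stand-in, not the real `KonstructMessage` model, whose exact fields are not shown in this document; the `channel_metadata` keys follow the locked decisions above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedWebMessage:
    """Hypothetical, simplified stand-in for KonstructMessage."""

    channel: str
    tenant_id: str
    thread_id: str
    sender_user_id: str
    text: str
    channel_metadata: dict[str, str]


def normalize_web_message(
    *,
    text: str,
    tenant_id: str,
    portal_user_id: str,
    conversation_id: str,
) -> NormalizedWebMessage:
    return NormalizedWebMessage(
        channel="web",
        tenant_id=tenant_id,
        # thread_id is the conversation, never the user, so each conversation
        # keeps its own scope.
        thread_id=conversation_id,
        sender_user_id=portal_user_id,
        text=text,
        channel_metadata={
            "portal_user_id": portal_user_id,
            "tenant_id": tenant_id,
            "conversation_id": conversation_id,
        },
    )


first = normalize_web_message(text="hi", tenant_id="t1", portal_user_id="u1", conversation_id="c1")
second = normalize_web_message(text="hi", tenant_id="t1", portal_user_id="u1", conversation_id="c2")
```

Two conversations from the same portal user share `sender_user_id` but get different `thread_id` values, which is the property this pitfall is about.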
### Pitfall 4: New Conversation vs Continue — Race Condition
**What goes wrong:** User clicks "New Conversation" while a response is still in flight for the old conversation. The old conversation's pub-sub response arrives and updates the new conversation's state.
**Why it happens:** The WebSocket is keyed to `conversation_id`. When the user resets the thread, a new `conversation_id` is created. The old pub-sub subscription must be cleaned up before subscribing to the new one.
**How to avoid:** When the user creates a new conversation: (1) close/unmount the old WebSocket connection, (2) create a new `web_conversations` row via REST API (getting a new UUID), (3) connect new WebSocket to the new conversation_id. React's `useEffect` cleanup handles this naturally when `conversationId` changes.
### Pitfall 5: `ChannelType.WEB` Missing from DB CHECK Constraint
**What goes wrong:** Adding `WEB = "web"` to the Python `ChannelType` StrEnum does not automatically update the PostgreSQL CHECK constraint on the `channel_type` column. Existing data is fine, but inserting new records with `channel = "web"` fails at the DB level.
**Why it happens:** STATE.md documents the decision: "channel_type stored as TEXT with CHECK constraint — native sa.Enum caused duplicate CREATE TYPE DDL." The CHECK constraint lists allowed values and must be updated via migration.
**How to avoid:** Migration 008 must ALTER the CHECK constraint on any affected tables to include `"web"`. Check which tables have `channel_type` constraints: `channel_connections` (stores active channel configs per tenant). The `conversation_embeddings` and audit tables use `TEXT` without CHECK, so only `channel_connections` needs the update.
**Warning signs:** `CheckViolation` error from PostgreSQL when the gateway tries to normalize a web message.
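The constraint update can be sketched as a helper that builds the two DDL statements a migration would run, for example by passing each string to Alembic's `op.execute`. The constraint name and the `channel_type` column name on `channel_connections` are assumptions; check the existing migration for the real names before using anything like this.

```python
CHANNEL_TYPES = [
    "slack", "whatsapp", "mattermost", "rocketchat",
    "teams", "telegram", "signal", "web",
]


def check_constraint_ddl(
    table: str,
    column: str,
    constraint_name: str,
    allowed_values: list[str],
) -> list[str]:
    """Build DROP + ADD statements that replace a CHECK constraint's allowed values."""
    for value in allowed_values:
        # Values are interpolated into SQL, so accept only plain identifiers.
        if not value.isalnum():
            raise ValueError(f"unexpected channel value: {value!r}")
    values_sql = ", ".join(f"'{value}'" for value in allowed_values)
    return [
        f"ALTER TABLE {table} DROP CONSTRAINT IF EXISTS {constraint_name}",
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint_name} "
        f"CHECK ({column} IN ({values_sql}))",
    ]


statements = check_constraint_ddl(
    "channel_connections",
    "channel_type",  # assumed column name
    "ck_channel_connections_channel_type",  # hypothetical constraint name
    CHANNEL_TYPES,
)
```

The downgrade would run the same pair with `"web"` removed from the list.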
### Pitfall 6: React 19 + Next.js 16 `use()` for Async Data
**What goes wrong:** Using `useState` + `useEffect` to fetch conversation history in a client component works but misses the React 19 preferred pattern.
**Why it happens:** React 19 introduces `use()` for Promises directly in components (TanStack Query handles this abstraction). The existing codebase already uses TanStack Query uniformly — don't break this pattern.
**How to avoid:** Add `useConversations` and `useConversationHistory` hooks in `queries.ts` following the existing pattern (e.g., `useAgents`, `useTenants`). Use `useQuery` from `@tanstack/react-query`.
---
## Code Examples
Verified patterns from existing codebase:
### Adding ChannelType.WEB to the enum
```python
# packages/shared/shared/models/message.py
# Source: existing file; add one line
class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"
    ROCKETCHAT = "rocketchat"
    TEAMS = "teams"
    TELEGRAM = "telegram"
    SIGNAL = "signal"
    WEB = "web"  # Add this line
```
### Adding webchat Redis key to redis_keys.py
```python
# packages/shared/shared/redis_keys.py
# Source: existing file pattern
def webchat_response_key(tenant_id: str, conversation_id: str) -> str:
    """
    Redis pub-sub channel for web chat response delivery.
    Published by Celery task after LLM response; subscribed by WebSocket handler.
    """
    return f"{tenant_id}:webchat:response:{conversation_id}"
```
### Web channel extras in handle_message
```python
# packages/orchestrator/orchestrator/tasks.py
# Source: existing extras pattern (line 246-254)
# Add to handle_message alongside existing Slack/WhatsApp extras:
conversation_id: str = message_data.pop("conversation_id", "") or ""
portal_user_id: str = message_data.pop("portal_user_id", "") or ""

# Add to extras dict (line 269-274):
extras: dict[str, Any] = {
    "placeholder_ts": placeholder_ts,
    "channel_id": channel_id,
    "phone_number_id": phone_number_id,
    "bot_token": bot_token,
    "wa_id": wa_id,
    "conversation_id": conversation_id,
    "portal_user_id": portal_user_id,
    # NOTE: the "web" branch of _send_response also reads extras["tenant_id"].
    # It must be added here as well (its source variable is not shown in this
    # excerpt), or that branch logs a warning and returns without publishing.
}
```
### TanStack Query hook pattern (follows existing)
```typescript
// packages/portal/lib/queries.ts
// Source: existing useAgents pattern
export function useConversations(tenantId: string) {
  return useQuery({
    queryKey: ["conversations", tenantId],
    queryFn: () => api.get<ConversationsResponse>(`/api/portal/chat/conversations?tenant_id=${tenantId}`),
    enabled: !!tenantId,
  });
}

export function useConversationHistory(conversationId: string) {
  return useQuery({
    queryKey: ["conversation-history", conversationId],
    queryFn: () => api.get<MessagesResponse>(`/api/portal/chat/conversations/${conversationId}/messages`),
    enabled: !!conversationId,
  });
}
```
### FastAPI WebSocket endpoint in gateway main.py
```python
# packages/gateway/gateway/main.py — add alongside existing routers
# Source: FastAPI WebSocket API (verified available in fastapi 0.135.2)
from gateway.channels.web import chat_websocket_router
app.include_router(chat_websocket_router)
```
### RBAC enforcement in chat REST API
```python
# packages/shared/shared/api/chat.py
# Source: existing pattern from rbac.py + portal.py
@router.get("/api/portal/chat/conversations")
async def list_conversations(
    tenant_id: UUID,
    caller: PortalCaller = Depends(get_portal_caller),
    session: AsyncSession = Depends(get_session),
) -> ConversationsResponse:
    await require_tenant_member(tenant_id, caller, session)
    # ... query web_conversations WHERE tenant_id = tenant_id AND user_id = caller.user_id
```
### proxy.ts: /chat stays unrestricted for operators (no change needed)
```typescript
// packages/portal/proxy.ts
// Source: existing file — /chat must NOT be in CUSTOMER_OPERATOR_RESTRICTED
// Operators can chat (chatting IS the product)
// No change needed to proxy.ts — /chat is not in the restricted list
// Just add /chat to nav.tsx
```
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| `middleware.ts` | `proxy.ts` (function named `proxy`) | Next.js 16 | Already migrated in this project — STATE.md confirms |
| Synchronous `searchParams` page prop | `use(searchParams)` to unwrap Promise | Next.js 15 | Already applied in this project per STATE.md |
| `zodResolver` from hookform | `standardSchemaResolver` | hookform/resolvers v5 | Already applied — don't use zodResolver |
| `stripe.api_key = ...` | `StripeClient(api_key=...)` | stripe v14+ | Already applied — use thread-safe constructor |
| `Column()` SQLAlchemy | `mapped_column()` + `Mapped[]` | SQLAlchemy 2.0 | Already the pattern — use mapped_column |
**Deprecated/outdated:**
- `middleware.ts`: deprecated in Next.js 16, renamed to `proxy.ts`. Already done in this project.
- SQLAlchemy `sa.Enum` for channel_type: causes duplicate DDL — use TEXT + CHECK constraint (STATE.md decision).
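
The TEXT + CHECK approach can be tried in isolation. The sketch below uses Python's built-in `sqlite3` purely for demonstration, not the project's SQLAlchemy models or database. The table name `demo_messages` and the set of allowed values are assumptions.

```python
import sqlite3

ALLOWED_CHANNELS = ("slack", "whatsapp", "web")  # assumed value set


def create_demo_table(conn: sqlite3.Connection) -> None:
    allowed = ", ".join(f"'{c}'" for c in ALLOWED_CHANNELS)
    # channel_type is plain TEXT; the CHECK constraint restricts its values.
    conn.execute(
        f"""
        CREATE TABLE demo_messages (
            id INTEGER PRIMARY KEY,
            channel_type TEXT NOT NULL CHECK (channel_type IN ({allowed}))
        )
        """
    )


def insert_message(conn: sqlite3.Connection, channel_type: str) -> bool:
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute(
            "INSERT INTO demo_messages (channel_type) VALUES (?)",
            (channel_type,),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, adding `web` as a channel means changing one constraint instead of altering a database enum type.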
---
## Open Questions
1. **HTTP Polling Fallback Scope**
- What we know: CONTEXT.md specifies "fallback to HTTP polling if WebSocket unavailable"
- What's unclear: Is this needed for v1 given all modern browsers support WebSocket? WebSocket failure typically indicates a network/proxy issue that polling would also fail on.
- Recommendation: Implement WebSocket only for v1. Add a simple error state ("Connection lost — please refresh") instead of full polling fallback. Real polling fallback is significant complexity for an edge case.
2. **Media Upload in Web Chat**
- What we know: CONTEXT.md says "image/document display inline (consistent with media support from Phase 2)." Phase 2 media goes through MinIO.
- What's unclear: Can users upload media directly in web chat (browser file picker), or does "inline display" mean only displaying agent responses that contain media?
- Recommendation: v1 — display media in agent responses (agent can return image URLs from MinIO/S3). User-to-agent file upload is a separate feature. The KonstructMessage already supports MediaAttachment; the web normalizer can include media from agent tool results.
3. **Agent Selection Scope for Platform Admins**
- What we know: Platform admins can chat with "any agent across all tenants."
- What's unclear: The agent picker UI — does a platform admin see all agents grouped by tenant, or do they first pick a tenant then pick an agent?
- Recommendation: Use the existing tenant switcher pattern from the agents page: platform admin sees agents grouped by tenant in the sidebar. This reuses `useTenants()` + `useAgents(tenantId)` pattern already in the agents list page.
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest 8.3.0 + pytest-asyncio 0.25.0 |
| Config file | `pyproject.toml` (root) — `asyncio_mode = "auto"`, `testpaths = ["tests"]` |
| Quick run command | `pytest tests/unit/test_web_channel.py -x` |
| Full suite command | `pytest tests/unit -x` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| CHAT-01 | WebSocket endpoint accepts connection and dispatches to Celery | unit | `pytest tests/unit/test_web_channel.py::test_websocket_dispatches_to_celery -x` | ❌ Wave 0 |
| CHAT-01 | Web channel normalizer produces valid KonstructMessage | unit | `pytest tests/unit/test_web_channel.py::test_normalize_web_event -x` | ❌ Wave 0 |
| CHAT-02 | `_send_response` for "web" channel publishes to Redis pub-sub | unit | `pytest tests/unit/test_web_channel.py::test_send_response_web_publishes_to_redis -x` | ❌ Wave 0 |
| CHAT-03 | Conversation history REST endpoint returns paginated messages | unit | `pytest tests/unit/test_chat_api.py::test_list_conversation_history -x` | ❌ Wave 0 |
| CHAT-04 | Chat API returns 403 for user not member of tenant | unit | `pytest tests/unit/test_chat_api.py::test_chat_rbac_enforcement -x` | ❌ Wave 0 |
| CHAT-04 | Platform admin can access agents across all tenants | unit | `pytest tests/unit/test_chat_api.py::test_platform_admin_cross_tenant -x` | ❌ Wave 0 |
| CHAT-05 | Typing indicator message sent immediately on WebSocket receive | unit | `pytest tests/unit/test_web_channel.py::test_typing_indicator_sent -x` | ❌ Wave 0 |
### Sampling Rate
- **Per task commit:** `pytest tests/unit/test_web_channel.py tests/unit/test_chat_api.py -x`
- **Per wave merge:** `pytest tests/unit -x`
- **Phase gate:** Full suite green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `tests/unit/test_web_channel.py` — covers CHAT-01, CHAT-02, CHAT-05
- [ ] `tests/unit/test_chat_api.py` — covers CHAT-03, CHAT-04
---
## Sources
### Primary (HIGH confidence)
- Existing codebase — `packages/gateway/gateway/channels/slack.py`, `whatsapp.py`, `normalize.py` — channel adapter pattern directly replicated
- Existing codebase — `packages/orchestrator/orchestrator/tasks.py`: `_send_response` extension point verified by reading full source
- Existing codebase — `packages/shared/shared/models/message.py` — ChannelType enum verified, "web" not yet present
- Existing codebase — `packages/shared/shared/redis_keys.py` — key naming convention verified
- Existing codebase — `packages/shared/shared/api/rbac.py`: `require_tenant_member`, `get_portal_caller` pattern verified
- FastAPI source — `fastapi` 0.135.2 installed, `from fastapi import WebSocket` verified importable
- redis.asyncio — version 5.0.0+ installed, pub-sub available (`r.pubsub()` verified callable)
- Next.js 16 bundled docs — `packages/portal/node_modules/next/dist/docs/` — proxy.ts naming, `use(searchParams)` patterns confirmed
- `packages/portal/package.json` — Next.js 16.2.1, React 19.2.4, confirmed packages
### Secondary (MEDIUM confidence)
- `.planning/STATE.md` — all architecture decisions (channel_type TEXT+CHECK, Celery sync-only, hookform resolver, proxy.ts naming) verified against actual files
- react-markdown 9.x + remark-gfm 4.x — current stable versions for React 19 compatibility (not yet installed, based on known package state)
### Tertiary (LOW confidence)
- None — all claims verified against codebase or installed package docs
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all backend packages verified installed and importable; portal packages verified via package.json
- Architecture: HIGH — channel adapter pattern, extras dict pattern, RBAC pattern all verified by reading actual source files
- Pitfalls: HIGH — most pitfalls derive directly from STATE.md documented decisions (CHECK constraint, Celery sync, browser WebSocket header limitation)
**Research date:** 2026-03-25
**Valid until:** 2026-04-25 (stable stack; react-markdown version should be re-checked if planning is delayed)

@@ -0,0 +1,80 @@
---
phase: 6
slug: web-chat
status: draft
nyquist_compliant: false
wave_0_complete: false
created: 2026-03-25
---
# Phase 6 — Validation Strategy
> Per-phase validation contract for feedback sampling during execution.
---
## Test Infrastructure
| Property | Value |
|----------|-------|
| **Framework** | pytest 8.x + pytest-asyncio (existing) |
| **Config file** | `pyproject.toml` (existing) |
| **Quick run command** | `pytest tests/unit -x -q` |
| **Full suite command** | `pytest tests/ -x` |
| **Estimated runtime** | ~30 seconds |
---
## Sampling Rate
- **After every task commit:** Run `pytest tests/unit -x -q`
- **After every plan wave:** Run `pytest tests/ -x`
- **Before `/gsd:verify-work`:** Full suite must be green
- **Max feedback latency:** 30 seconds
---
## Per-Task Verification Map
| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
|---------|------|------|-------------|-----------|-------------------|-------------|--------|
| 06-xx | 01 | 1 | CHAT-01,02 | unit | `pytest tests/unit/test_web_channel.py -x` | ❌ W0 | ⬜ pending |
| 06-xx | 01 | 1 | CHAT-03 | unit | `pytest tests/unit/test_web_conversations.py -x` | ❌ W0 | ⬜ pending |
| 06-xx | 01 | 1 | CHAT-04 | unit | `pytest tests/unit/test_web_rbac.py -x` | ❌ W0 | ⬜ pending |
| 06-xx | 02 | 2 | CHAT-01,05 | build | `cd packages/portal && npx next build` | ✅ | ⬜ pending |
| 06-xx | 02 | 2 | CHAT-03 | build | `cd packages/portal && npx next build` | ✅ | ⬜ pending |
*Status: ⬜ pending · ✅ green · ❌ red · ⚠️ flaky*
---
## Wave 0 Requirements
- [ ] `tests/unit/test_web_channel.py` — CHAT-01,02: web normalizer, WebSocket message handling
- [ ] `tests/unit/test_web_conversations.py` — CHAT-03: conversation CRUD API
- [ ] `tests/unit/test_web_rbac.py` — CHAT-04: RBAC enforcement on chat endpoints
---
## Manual-Only Verifications
| Behavior | Requirement | Why Manual | Test Instructions |
|----------|-------------|------------|-------------------|
| WebSocket chat sends message and receives real-time reply | CHAT-01,05 | Requires live WebSocket + LLM | Open /chat, select agent, send message, verify response appears |
| Conversation history loads on page visit | CHAT-03 | UI rendering | Navigate away and back to /chat, verify previous messages visible |
| Typing indicator displays during response generation | CHAT-05 | UI animation | Send message, observe animated dots before response |
| Agent markdown renders correctly | CHAT-05 | Visual rendering | Trigger a response with code blocks / lists / bold |
| Operator can chat but not see admin nav items | CHAT-04 | RBAC visual | Login as operator, verify /chat accessible but admin-only items hidden |
---
## Validation Sign-Off
- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
- [ ] Wave 0 covers all MISSING references
- [ ] No watch-mode flags
- [ ] Feedback latency < 30s
- [ ] `nyquist_compliant: true` set in frontmatter
**Approval:** pending