docs(01): research phase 1 foundation domain

2026-03-23 09:25:20 -06:00
parent fe3b36be16
commit 2ab18fde4f


# Phase 1: Foundation - Research
**Researched:** 2026-03-23
**Domain:** Multi-tenant Python monorepo scaffolding, PostgreSQL RLS, LiteLLM backend pool, Slack Events API, basic agent orchestrator, Next.js admin portal
**Confidence:** HIGH (synthesized from project research docs verified against PyPI, official Slack docs, LiteLLM docs, and pgvector sources — all conducted 2026-03-22)
---
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
- Next.js portal starts in Phase 1 — not deferred to Phase 3
- Portal includes tenant CRUD (create, list, view, edit, delete tenants)
- Portal includes Agent Designer module (job description, SOW, persona, system prompt, tool assignments, escalation rules)
- Auth.js v5 with email/password authentication from the start — no hardcoded credentials, no throwaway auth code
- Phase 3 scope narrows to: Stripe billing integration, onboarding wizard, cost tracking dashboard, channel connection wizard, and portal polish
- AI employees have human-like names by default (e.g., "Mara", "Alex") — matches the "hire an AI employee" branding
- Default persona tone: professional + warm — friendly but business-appropriate, like a good colleague
- Always transparent about being AI when asked directly — never pretends to be human
- Silent until spoken to — no auto-introduction message when added to a Slack channel
- Operator configures name, role, persona, and system prompt via the Agent Designer in the portal
- Agent responds to: @mentions in channels and direct messages
- Does NOT monitor entire channels or respond to all messages (no "designated support channel" mode in v1)
- Always replies in threads — keeps channels clean
- Shows typing indicator while LLM is generating a response
### Claude's Discretion
- Thread follow-up behavior (auto-follow after first engagement vs always require @mention)
- Portal UI layout and component choices (within shadcn/ui)
- Default AI employee name suggestions
- Agent Designer form layout and field ordering
- Error message copy and formatting
### Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope
</user_constraints>
---
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| CHAN-01 | Channel Gateway normalizes messages from all channels into unified KonstructMessage format | KonstructMessage Pydantic model defined in ARCHITECTURE.md; normalization pattern documented |
| CHAN-02 | User can interact with AI employee via Slack (Events API — @mentions, DMs, thread replies) | slack-bolt 1.27.0 AsyncApp pattern; Events API + async FastAPI integration documented in STACK.md |
| CHAN-05 | Platform rate-limits requests per tenant and per channel with configurable thresholds | slowapi with Redis token bucket pattern; ARCHITECTURE.md Message Router layer |
| AGNT-01 | Tenant can configure a single AI employee with custom name, role, and persona | Agent DB schema; system prompt assembly from persona fields; Agent Designer portal module |
| LLM-01 | LiteLLM router abstracts LLM provider selection with fallback routing | LiteLLM Router configuration pattern in ARCHITECTURE.md Pattern 4; fallback chain config |
| LLM-02 | Platform supports Ollama (local) and commercial APIs (Anthropic, OpenAI) as LLM providers | LiteLLM model_list config with ollama + anthropic + openai providers documented |
| TNNT-01 | All tenant data is isolated via PostgreSQL Row Level Security | RLS + FORCE ROW LEVEL SECURITY pattern; app role isolation; sqlalchemy-tenants integration |
| TNNT-02 | Inbound messages are resolved to the correct tenant via channel metadata | channel_connections table lookup; contextvar-based tenant propagation to RLS |
| TNNT-03 | Per-tenant Redis namespace isolation for cache and session state | `{tenant_id}:` key prefix pattern; shared utility enforcement described |
| TNNT-04 | All data encrypted at rest (PostgreSQL, object storage) and in transit (TLS 1.3) | PostgreSQL TDE, MinIO SSE, TLS config for all service-to-service; Docker Compose network isolation |
| PRTA-01 | Operator can create, view, update, and delete tenants | Next.js portal with TanStack Query + FastAPI CRUD endpoints; Auth.js v5 authentication |
| PRTA-02 | Operator can design agents via Agent Designer — name, role, persona, system prompt, tool assignments, escalation rules | Agent Designer as prominent portal module; form fields are text inputs; React Hook Form + Zod |
</phase_requirements>
---
## Summary
Phase 1 builds the entire vertical slice from Slack message to LLM response, with no tenant data leakage possible, rate limiting enforced, and an admin portal where operators can manage tenants and configure AI employees. It has four sequential plans: (1) monorepo scaffolding and shared data models with PostgreSQL RLS, (2) the LiteLLM backend pool with Celery async dispatch, (3) Channel Gateway (Slack) + Message Router + basic Agent Orchestrator, and (4) the Next.js admin portal with Auth.js v5, tenant CRUD, and Agent Designer. Plans 1 and 2 must complete before Plan 3 begins; Plan 4 can overlap with Plan 3 once the DB schema stabilizes.
The most dangerous failure mode for Phase 1 is **silent cross-tenant data leakage**. PostgreSQL RLS only protects the application if `FORCE ROW LEVEL SECURITY` is applied to every table AND the application connects as a non-superuser role. This must be verified explicitly — RLS can appear to work while providing zero isolation. Every integration test must exercise a two-tenant fixture from the first day of DB schema work.
The second dangerous failure mode is **async event loop conflicts in Celery**. All Celery task functions must be synchronous `def` (not `async def`). The pattern must be established in Plan 1 scaffolding so it becomes the convention before any LLM task work begins in Plan 2.
**Primary recommendation:** Build the DB schema + RLS + Redis namespacing in Plan 1 with their isolation tests green before touching any channel or LLM code. Tenant isolation retrofitted later costs significantly more than tenant isolation designed first.
---
## Standard Stack
### Core Backend
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Python | 3.12 | Runtime | CLAUDE.md specified; LTS sweet spot — 3.13 ecosystem support lags |
| FastAPI | 0.135.1 | API framework | Async-native, auto OpenAPI docs, DI system; de facto for async Python APIs |
| Pydantic v2 | 2.12.5 | Data validation | Mandatory for FastAPI; 20x faster than v1; strict mode for public interfaces |
| SQLAlchemy | 2.0.48 | ORM | True async `AsyncSession`; 1.x patterns are deprecated and must not be used |
| Alembic | 1.18.4 | DB migrations | Standard SQLAlchemy companion; requires async `env.py` modification |
| asyncpg | 0.31.0 | PostgreSQL async driver | Required for SQLAlchemy async; faster than psycopg2 for concurrent workloads |
| PostgreSQL | 16 | Primary database | CLAUDE.md specified; RLS is the v1 multi-tenancy mechanism |
| Redis | 7.x | Cache, pub/sub, rate limiting, Celery broker | One service for multiple purposes; session state, namespaced per-tenant |
| Celery | 5.6.2 | Background job processing | LLM calls dispatched async; prevents Slack webhook timeouts; mature ecosystem |
| uv | latest | Python package manager | Workspace support for monorepo; replaces pip + virtualenv |
### LLM Integration
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| LiteLLM | 1.82.5 | LLM gateway | Unified API across all providers; fallback routing; cost tracking; never call provider APIs directly |
| Ollama | latest | Local inference | Docker service for dev; OpenAI-compatible API on port 11434 |
### Channel Integration
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| slack-bolt | 1.27.0 | Slack Events API | Official Slack SDK; use `AsyncApp` in HTTP mode (not Socket Mode in production) |
### Admin Portal
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| Next.js | 16.x | Portal framework | CLAUDE.md specifies 14+; current stable is 16 (March 2026); App Router mature; use 16 to avoid building on a behind version |
| TypeScript | 5.x | Type safety | Strict mode required per CLAUDE.md |
| Tailwind CSS | 4.x | Styling | Required by shadcn/ui; v4 uses CSS-native variables |
| shadcn/ui | latest | Component library | Copy-to-project model; standard for Next.js admin portals 2025-2026 |
| TanStack Query | 5.x | Server state | Client-side fetching, caching, mutations against FastAPI |
| React Hook Form + Zod | latest | Form validation | Standard pairing for shadcn/ui forms; Zod schemas shared with backend type defs |
| Auth.js | v5 | Portal authentication | v5 rewritten for App Router compatibility; PostgreSQL session adapter; email/password from the start |
### Rate Limiting
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| slowapi | latest | FastAPI rate limiting | Redis-backed token bucket; integrates directly with FastAPI; per-tenant + per-channel limits |
### Dev Tools
| Tool | Purpose |
|------|---------|
| ruff | Linting + formatting (replaces flake8, isort, black) |
| mypy --strict | Type checking; no `Any` in public interfaces |
| pytest + pytest-asyncio | Async test support; use `httpx.AsyncClient` not sync TestClient |
| Docker Compose | All infra services (PostgreSQL, Redis, Ollama) |
### Installation
```bash
# Initialize Python monorepo
uv init konstruct && cd konstruct
# Declare workspace members: running `uv init` inside the project registers
# each package in [tool.uv.workspace] (there is no `uv workspace add` subcommand)
uv init --lib packages/shared
uv init packages/gateway
uv init packages/router
uv init packages/orchestrator
uv init packages/llm-pool
# Core backend dependencies (quote extras so the shell doesn't glob the brackets)
uv add "fastapi[standard]" "pydantic[email]" "sqlalchemy[asyncio]" asyncpg alembic
uv add litellm redis "celery[redis]" slack-bolt "python-jose[cryptography]" httpx slowapi
# Dev dependencies
uv add --dev ruff mypy pytest pytest-asyncio pytest-httpx
# Portal
cd packages/portal
npx create-next-app@latest . --typescript --tailwind --eslint --app
npx shadcn@latest init
npm install @tanstack/react-query react-hook-form zod next-auth
```
---
## Architecture Patterns
### Recommended Project Structure
```
konstruct/
├── packages/
│   ├── gateway/              # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   └── slack.py      # Slack Events API handler (HTTP mode, AsyncApp)
│   │   ├── normalize.py      # Slack event → KonstructMessage
│   │   ├── verify.py         # X-Slack-Signature verification
│   │   └── main.py           # FastAPI app, /slack/events route
│   │
│   ├── router/               # Message Router service (FastAPI)
│   │   ├── tenant.py         # workspace_id → tenant_id lookup
│   │   ├── ratelimit.py      # Redis token bucket per tenant/channel
│   │   ├── idempotency.py    # Redis dedup (message_id, TTL 24h)
│   │   ├── context.py        # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/         # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py          # Celery task: handle_message (sync def, NOT async def)
│   │   ├── agents/
│   │   │   ├── builder.py    # Assemble agent prompt from persona + history
│   │   │   └── runner.py     # LLM call → parse response → send reply
│   │   └── main.py           # Celery worker entry point
│   │
│   ├── llm-pool/             # LLM Backend Pool service (LiteLLM wrapper)
│   │   ├── router.py         # LiteLLM Router config (model groups + fallback)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py           # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/               # Next.js 16 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/       # /login route
│   │   │   ├── dashboard/    # Post-auth layout
│   │   │   ├── tenants/      # Tenant CRUD pages
│   │   │   ├── agents/       # Agent Designer module
│   │   │   └── api/auth/     # Auth.js route handler
│   │   ├── components/       # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts        # TanStack Query hooks + API client
│   │       └── auth.ts       # Auth.js config
│   │
│   └── shared/               # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py    # KonstructMessage Pydantic model
│       │   ├── tenant.py     # Tenant, Agent, ChannelConnection SQLAlchemy models
│       │   └── auth.py       # Portal user models
│       ├── db.py             # SQLAlchemy async engine + session factory
│       ├── rls.py            # SET app.current_tenant contextvar + hook
│       └── config.py         # Pydantic Settings (env vars)
├── migrations/               # Alembic (single migration history)
├── tests/
│   ├── unit/
│   └── integration/          # Two-tenant fixture tests (REQUIRED in Plan 1)
├── docker-compose.yml        # PostgreSQL 16, Redis 7, Ollama, all services
└── pyproject.toml            # uv workspace config
```
### Pattern 1: Immediate-Acknowledge, Async-Process
**What:** Channel Gateway returns HTTP 200 to Slack within 3 seconds, without LLM work. Processing is dispatched to Celery. The AI reply arrives as a follow-up Slack message.
**When to use:** Always. Slack retries and flags apps as unhealthy if no 2xx within 3 seconds. This is non-negotiable.
**Example:**
```python
# packages/gateway/channels/slack.py
@app.event("message")
async def handle_message(event, say, client):
    msg = normalize_slack(event)
    if await is_duplicate(msg.id):  # Redis idempotency key
        return
    handle_message_task.delay(msg.model_dump())
    # HTTP 200 returned implicitly — Slack is satisfied
```
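The `is_duplicate` check above can be a single atomic Redis `SET NX EX` call. A minimal sketch, shown synchronous for clarity (the gateway would use `redis.asyncio`); the client is injected so the same logic runs against `redis.Redis` or a test double, and the key shape and 24h TTL follow the `idempotency.py` description in the project structure:

```python
# Sketch of the dedup helper, not the project's actual idempotency.py.
# `client` is anything with a redis-py style set(name, value, nx=True, ex=...).
DEDUP_TTL_SECONDS = 24 * 60 * 60  # 24h, per the Message Router design

def is_duplicate(client, tenant_id: str, message_id: str) -> bool:
    key = f"{tenant_id}:dedup:{message_id}"
    # SET NX EX is atomic: only the first caller wins the key
    first_write = client.set(key, "1", nx=True, ex=DEDUP_TTL_SECONDS)
    return not first_write

class FakeRedis:
    """In-memory stand-in for redis.Redis, enough for this sketch."""
    def __init__(self):
        self.store = {}
    def set(self, name, value, nx=False, ex=None):
        if nx and name in self.store:
            return None  # redis-py returns None when NX fails
        self.store[name] = value
        return True

client = FakeRedis()
assert is_duplicate(client, "tenant-a", "msg-1") is False  # first delivery
assert is_duplicate(client, "tenant-a", "msg-1") is True   # Slack retry
assert is_duplicate(client, "tenant-b", "msg-1") is False  # other tenant, own namespace
```

Note the tenant-prefixed key also satisfies TNNT-03: two tenants receiving the same Slack message ID never collide.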
### Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook
**What:** Set `app.current_tenant` on the PostgreSQL connection before every query. RLS policies use this setting to filter every row automatically. Application code never adds `WHERE tenant_id = ...` manually.
**When to use:** Every DB interaction in the router and orchestrator. This is the primary tenant isolation mechanism.
**Example:**
```python
# packages/shared/rls.py
import uuid
from contextvars import ContextVar

from sqlalchemy import event

from shared.db import engine  # async engine from db.py

current_tenant_id: ContextVar[str | None] = ContextVar("current_tenant_id", default=None)

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = current_tenant_id.get()
    if tenant_id:
        # SET LOCAL cannot take bind parameters, so validate the value as a
        # UUID before interpolating to rule out injection
        cursor.execute(f"SET LOCAL app.current_tenant = '{uuid.UUID(tenant_id)}'")
```
**Critical RLS migration requirements:**
```sql
-- Every table must have both the policy AND FORCE applied
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
ALTER TABLE agents FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON agents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application MUST connect as this role (never postgres superuser)
CREATE ROLE konstruct_app WITH LOGIN PASSWORD '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO konstruct_app;
```
### Pattern 3: LiteLLM Router as Internal Singleton Service
**What:** LLM Backend Pool exposes a single internal HTTP `/complete` endpoint. Orchestrator workers call this endpoint. LiteLLM Router behind it handles provider selection, fallback, and cost tracking.
**When to use:** All LLM calls. Never call Anthropic/OpenAI SDKs directly from the orchestrator.
**Example:**
```python
# packages/llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434",
            },
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"},
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "openai/gpt-4o"},  # fallback
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    routing_strategy="latency-based-routing",
)
```
**Pin LiteLLM version in Docker — never use `latest`.** A September 2025 release caused OOM errors on Kubernetes.
### Pattern 4: Celery Task Pattern (SYNC, not async)
**What:** Celery tasks are synchronous `def` functions. Async code inside tasks is wrapped with `asyncio.run()`. This pattern must be established in Plan 1 scaffolding.
**Example:**
```python
# packages/orchestrator/tasks.py
import asyncio

from celery import Celery

app = Celery("orchestrator", broker="redis://redis:6379/0")

# CORRECT: sync def
@app.task
def handle_message(message_data: dict) -> None:
    asyncio.run(_process_message(message_data))

async def _process_message(message_data: dict) -> None:
    # async DB and LLM calls here
    ...

# WRONG — DO NOT DO THIS:
# @app.task
# async def handle_message(message_data: dict) -> None:  # ← RuntimeError
```
### Pattern 5: Redis Namespacing (Tenant Isolation)
**What:** All Redis keys include `{tenant_id}:` prefix. Enforce via a shared utility function — convention is insufficient.
**Example:**
```python
# packages/shared/redis_keys.py
def rate_limit_key(tenant_id: str, channel: str) -> str:
    return f"{tenant_id}:ratelimit:{channel}"

def idempotency_key(tenant_id: str, message_id: str) -> str:
    return f"{tenant_id}:dedup:{message_id}"

def session_key(tenant_id: str, thread_id: str) -> str:
    return f"{tenant_id}:session:{thread_id}"
```
### Pattern 6: Slack AsyncApp + FastAPI Integration
**What:** Mount `slack-bolt` AsyncApp inside FastAPI as an ASGI sub-application. HTTP mode only (not Socket Mode) for production.
**Example:**
```python
# packages/gateway/main.py
from fastapi import FastAPI, Request
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler

from shared.config import settings  # Pydantic Settings (env vars)

slack_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
handler = AsyncSlackRequestHandler(slack_app)

fastapi_app = FastAPI()

@fastapi_app.post("/slack/events")
async def slack_events(req: Request):
    return await handler.handle(req)
```
### Pattern 7: Auth.js v5 with Next.js App Router
**What:** Auth.js v5 (the rewrite formerly known as NextAuth.js) uses the `auth()` helper in server components and API routes. With a Credentials (email/password) provider, sessions use the JWT strategy; user records are stored in PostgreSQL via the Drizzle or Prisma adapter.
**Example:**
```typescript
// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { DrizzleAdapter } from "@auth/drizzle-adapter" // or PrismaAdapter
import { db } from "./db" // PostgreSQL connection

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        // validate against DB user table; return the user object or null
        return null
      },
    }),
  ],
  adapter: DrizzleAdapter(db),
})
```
### Anti-Patterns to Avoid
- **Async Celery tasks:** Never write `async def` Celery tasks. Use `asyncio.run()` inside sync `def` tasks.
- **Superuser PostgreSQL connections:** Application must never connect as `postgres` superuser. RLS is bypassed silently.
- **Missing FORCE ROW LEVEL SECURITY:** RLS policies without FORCE are bypassed by the table owner. Apply `ALTER TABLE ... FORCE ROW LEVEL SECURITY` to every table.
- **Unnamespaced Redis keys:** Any Redis key without `{tenant_id}:` prefix can collide across tenants.
- **LLM work inside webhook handler:** Slack requires HTTP 200 in 3 seconds. LLM calls take 5-30 seconds. Always dispatch to Celery.
- **Direct provider SDK calls from orchestrator:** Always go through LiteLLM pool. Never import `anthropic` or `openai` SDK directly in orchestrator.
- **Socket Mode in production:** Socket Mode breaks horizontal scaling. Use HTTP Events API for production.
- **Starting on Next.js 14:** Current stable is 16 (March 2026). Start new work on 16.
---
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Multi-provider LLM routing with fallback | Custom provider selector + retry logic | LiteLLM Router | LiteLLM handles fallback chains, cost tracking, load balancing, and provider abstraction. Custom solutions miss edge cases (rate limit windows, streaming failures, provider-specific error codes) |
| Per-tenant DB row filtering | Application-level `WHERE tenant_id = ?` on every query | PostgreSQL RLS | Application filters are forgotten. RLS is enforced at the DB layer even if code has bugs. FORCE RLS makes it apply to all connections |
| Redis token bucket rate limiting | Custom counter + expiry logic | slowapi | Edge cases in token bucket implementations (burst handling, clock skew, reset timing) are numerous. slowapi handles them correctly with Redis backend |
| Portal form validation | Custom form state machine | React Hook Form + Zod | Form validation edge cases (async validation, dependent fields, submission state) are handled by the library. Zod schemas provide shared type safety with the backend |
| Portal auth session management | Custom JWT storage + refresh logic | Auth.js v5 | Session security (CSRF, rotation, expiry, replay protection) is extremely easy to get wrong. Auth.js is the standard for Next.js |
| Slack signature verification | Custom HMAC implementation | slack-bolt (built-in) | slack-bolt verifies `X-Slack-Signature` automatically in `AsyncApp`. Hand-rolling misses timing attack prevention |
| Redis key namespacing convention | Documentation + code review | Shared utility function | Conventions are forgotten. Routing all key construction through `redis_keys.py` makes ad-hoc key patterns easy to spot and reject in review |
**Key insight:** In a multi-tenant platform, the most dangerous custom solutions are the ones that appear to work in testing (single tenant) but fail in production (multiple tenants) through data leakage.
---
## Common Pitfalls
### Pitfall 1: RLS Appears to Work But Provides Zero Isolation
**What goes wrong:** PostgreSQL RLS policies exist on the tables, tests pass, but the application connects as the `postgres` superuser, which bypasses all RLS policies silently. No error is raised. Tenant data is fully accessible across tenants.
**Why it happens:** Early dev uses `postgres` superuser. RLS is added. Nobody verifies it actually applies. `BYPASSRLS` is implicit for superusers and table owners unless explicitly overridden.
**How to avoid:**
1. Create a `konstruct_app` role before writing the first migration
2. Apply `FORCE ROW LEVEL SECURITY` to every table with RLS
3. All application connections use `konstruct_app` (never `postgres`)
4. Tenant isolation tests connect as `konstruct_app` in pytest fixtures
5. Verify with: `SELECT relforcerowsecurity FROM pg_class WHERE relname = 'agents'` — must be `true`
**Warning signs:** Application connecting as `postgres`; RLS tests using psql instead of application role.
### Pitfall 2: Silent Celery Task Hang from Async/Await
**What goes wrong:** Celery tasks written as `async def` cause `RuntimeError: This event loop is already running` or hang silently without completing. The task appears to be accepted by the broker but never produces a result or error.
**Why it happens:** FastAPI codebase is all `async def`. Developers naturally write Celery tasks the same way. The incompatibility only appears at runtime.
**How to avoid:** All Celery tasks are `def` (synchronous). Async code within tasks is called via `asyncio.run()`. Establish this pattern in the first Celery task stub (Plan 1) before any LLM work.
**Warning signs:** `RuntimeError: This event loop is already running` in Celery worker logs; tasks accepted but never completed.
### Pitfall 3: LiteLLM Request Log Table Degradation
**What goes wrong:** LiteLLM logs every request to PostgreSQL. After ~1M rows (~10 days at 100k req/day), the table causes measurable latency on every LLM call. There are also documented OOM issues with specific versions.
**How to avoid:**
- Implement a Celery Beat log rotation job from day one that deletes rows older than N days
- Set `LITELLM_LOG_LEVEL=ERROR` in production
- Pin LiteLLM to `1.82.5` in Docker — do not use `latest` (September 2025 release had OOM issues)
- Do not use LiteLLM's built-in caching layer (documented bug: cache hit adds 10+ seconds latency); implement caching above LiteLLM in the orchestrator using Redis directly
**Warning signs:** LiteLLM response times creeping up over 2-3 hours; `litellm_logs` table exceeding 500k rows.
### Pitfall 4: Cross-Tenant Redis Key Collision
**What goes wrong:** Conversation history or rate limit counters stored under bare keys (e.g., `history:{thread_id}`) collide when two tenants happen to have the same Slack thread ID pattern. Tenant A reads Tenant B's session data.
**How to avoid:** All Redis keys use `{tenant_id}:` prefix enforced via shared utility function in `packages/shared/redis_keys.py`. No key construction outside this module.
### Pitfall 5: Slack Webhook Acknowledgment Timeout
**What goes wrong:** LLM call is made synchronously inside the Slack event handler. Call takes 8 seconds. Slack receives no 200 within 3 seconds, retries the event, the agent processes it twice, and the app is flagged as unhealthy.
**How to avoid:** Dispatch to Celery immediately. Return HTTP 200. Send the AI reply as a follow-up message via `client.chat_postMessage()`.
### Pitfall 6: Thread Follow-Up Behavior Decision
**The decision:** This is marked as Claude's discretion. The two options are:
- **Auto-follow:** After the first @mention in a thread, subsequent messages in the same thread trigger responses without re-mentioning the agent. Better UX for sustained conversations.
- **Require @mention each time:** Safer, more explicit, never accidental. Simpler to implement.
**Recommendation:** Implement auto-follow for engaged threads in Phase 1. The "AI employee" metaphor implies sustained engagement; requiring an @mention on every follow-up breaks the mental model of talking to a colleague. Track the thread_id in Redis after first engagement and respond to any message in that thread until a configurable idle timeout (default: reset after 30 minutes of inactivity).
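The auto-follow recommendation above reduces to a small TTL-keyed lookup. A sketch with an in-memory dict standing in for Redis (in production this would be a `SETEX` on `{tenant_id}:engaged_thread:{thread_id}`; the class and names here are illustrative):

```python
import time

ENGAGED_TTL = 30 * 60  # 30-minute idle timeout from the recommendation

class EngagementTracker:
    """In-memory stand-in for the Redis engaged-thread keys."""
    def __init__(self, now=time.monotonic):
        self._expiry = {}
        self._now = now  # injectable clock for testing
    def engage(self, tenant_id: str, thread_id: str) -> None:
        # Called after the agent's first @mention in a thread; calling it again
        # on each later message refreshes the TTL, mirroring SETEX on every touch
        self._expiry[(tenant_id, thread_id)] = self._now() + ENGAGED_TTL
    def is_engaged(self, tenant_id: str, thread_id: str) -> bool:
        deadline = self._expiry.get((tenant_id, thread_id))
        return deadline is not None and self._now() < deadline

clock = [0.0]
tracker = EngagementTracker(now=lambda: clock[0])
tracker.engage("t1", "thread-9")
assert tracker.is_engaged("t1", "thread-9")      # follow-ups need no @mention
assert not tracker.is_engaged("t2", "thread-9")  # other tenant unaffected
clock[0] += ENGAGED_TTL + 1
assert not tracker.is_engaged("t1", "thread-9")  # idle timeout re-requires @mention
```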
---
## Code Examples
### KonstructMessage Pydantic Model
```python
# packages/shared/models/message.py
import uuid
from datetime import datetime
from enum import StrEnum

from pydantic import BaseModel, Field

class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"

class SenderInfo(BaseModel):
    user_id: str
    display_name: str
    is_bot: bool = False

class MessageContent(BaseModel):
    text: str
    attachments: list[dict] = Field(default_factory=list)

class KonstructMessage(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    tenant_id: str | None = None  # Populated by Router after tenant resolution
    channel: ChannelType
    channel_metadata: dict  # Workspace/org IDs for tenant resolution
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict = Field(default_factory=dict)
```
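### Slack Event Normalization (sketch)
A hedged sketch of the `normalize.py` step (CHAN-01) mapping a raw Slack `message` event onto these fields. It builds a plain dict for illustration; real code would construct `KonstructMessage` directly. The Slack keys used (`user`, `text`, `ts`, `thread_ts`, `channel`, `bot_id`) are standard Events API payload fields:

```python
from datetime import datetime, timezone

def normalize_slack_event(event: dict, team_id: str) -> dict:
    # Slack's `ts` is a stringified epoch timestamp with sub-second precision
    ts = datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc)
    return {
        "channel": "slack",
        "channel_metadata": {"team_id": team_id, "channel_id": event["channel"]},
        "sender": {
            "user_id": event["user"],
            "display_name": "",  # resolved later via users.info if needed
            "is_bot": bool(event.get("bot_id")),
        },
        "content": {"text": event.get("text", ""), "attachments": []},
        "timestamp": ts,
        "thread_id": event.get("thread_ts"),  # None for top-level messages
        "tenant_id": None,  # Router fills this in after tenant resolution
    }

msg = normalize_slack_event(
    {"user": "U123", "text": "hello", "ts": "1742700000.000200", "channel": "C42"},
    team_id="T99",
)
assert msg["channel_metadata"]["team_id"] == "T99"
assert msg["thread_id"] is None
```

The `team_id` comes from the outer event envelope, which is how TNNT-02 resolves the workspace to a tenant before the message goes any further.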
### PostgreSQL RLS Migration Pattern
```python
# migrations/versions/001_initial_schema.py
import os

from alembic import op

def upgrade():
    # Create application role first. CREATE ROLE is DDL and cannot take bind
    # parameters, so read the password from the environment instead.
    app_password = os.environ["KONSTRUCT_APP_DB_PASSWORD"]
    op.execute(f"CREATE ROLE konstruct_app WITH LOGIN PASSWORD '{app_password}'")
    op.execute("""
        CREATE TABLE tenants (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            name TEXT NOT NULL,
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)
    op.execute("""
        CREATE TABLE agents (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            tenant_id UUID NOT NULL REFERENCES tenants(id),
            name TEXT NOT NULL,
            role TEXT NOT NULL,
            persona TEXT,
            system_prompt TEXT,
            model_preference TEXT DEFAULT 'quality',
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)
    # RLS on agents table
    op.execute("ALTER TABLE agents ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE agents FORCE ROW LEVEL SECURITY")
    op.execute("""
        CREATE POLICY tenant_isolation ON agents
        USING (tenant_id = current_setting('app.current_tenant')::uuid)
    """)
    # Grant explicit CRUD to the application role (never superuser, never ALL)
    op.execute(
        "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO konstruct_app"
    )
```
### LiteLLM Fallback Configuration
```python
# packages/llm-pool/router.py
from litellm import Router

from shared.config import settings  # Pydantic Settings (env vars)

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434",
            },
        },
        {
            "model_name": "quality",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",
                "api_key": settings.anthropic_api_key,
            },
        },
        {
            "model_name": "quality",  # Same group = fallback
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": settings.openai_api_key,
            },
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    num_retries=2,
    routing_strategy="latency-based-routing",
    set_verbose=False,  # Reduce log volume
)
```
### Slack AsyncApp + FastAPI Mount
```python
# packages/gateway/main.py
from fastapi import FastAPI, Request
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler

from shared.config import settings  # Pydantic Settings (env vars)
from .channels.slack import register_slack_handlers

slack_bolt_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
register_slack_handlers(slack_bolt_app)
slack_handler = AsyncSlackRequestHandler(slack_bolt_app)

app = FastAPI()

@app.post("/slack/events")
async def slack_events(request: Request):
    return await slack_handler.handle(request)

@app.get("/health")
async def health():
    return {"status": "ok"}
```
### Auth.js v5 Portal Setup
```typescript
// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { z } from "zod"

const loginSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8),
})

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        const parsed = loginSchema.safeParse(credentials)
        if (!parsed.success) return null
        // Validate against DB via internal API
        const response = await fetch(`${process.env.API_URL}/auth/verify`, {
          method: "POST",
          body: JSON.stringify(parsed.data),
        })
        if (!response.ok) return null
        return response.json()
      },
    }),
  ],
  pages: { signIn: "/login" },
  session: { strategy: "jwt" },
})
```
---
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| NextAuth.js v4 | Auth.js v5 | 2024 | Complete rewrite; v5 is App Router native; v4 patterns don't apply |
| Next.js 14 | Next.js 16 | 2025-2026 | Turbopack default; improved App Router; use 16 not 14 |
| Tailwind CSS v3 | Tailwind CSS v4 | 2025 | CSS-native variables; JIT always on; JIT config removed |
| SQLAlchemy 1.x `session.query()` | SQLAlchemy 2.0 `AsyncSession` + `select()` | 2023 | 1.x patterns are deprecated and cause async bugs in FastAPI |
| psycopg2 | asyncpg | ongoing | psycopg2 blocks the event loop; asyncpg is required for async FastAPI |
| LangGraph/CrewAI for single agent | Custom orchestrator + direct LiteLLM | 2024-2025 | Frameworks add premature abstraction for single-agent v1; evaluate for v2 multi-agent |
| Flake8 + Black + isort | ruff | 2023-2024 | Single tool replaces three; 100x faster; CLAUDE.md already specifies ruff |
| Slack Socket Mode | Slack Events API (HTTP) | permanent | Socket Mode breaks horizontal scaling; HTTP is production-correct |
**Deprecated/outdated patterns to never use:**
- `session.query(Model).filter_by(...)` — SQLAlchemy 1.x style, deprecated
- `psycopg2` as PostgreSQL driver — synchronous, blocks event loop
- `async def` Celery tasks — runtime error or silent hang
- `CREATE POLICY ... USING (...)` without `FORCE ROW LEVEL SECURITY` — bypassed by superuser
- Unnamespaced Redis keys — tenant collision risk
- Socket Mode for Slack — not production-safe
---
## Open Questions
1. **Thread Follow-Up Behavior (Claude's discretion)**
- What we know: User confirmed this is Claude's discretion
- Recommendation: Auto-follow engaged threads with 30-minute idle timeout. Implement as Redis key `{tenant_id}:engaged_thread:{thread_id}` with TTL. See Pattern section above.
2. **Typing Indicator Implementation**
- What we know: User confirmed "shows typing indicator while LLM is generating"
- What's needed: Slack has no native typing-indicator API for bots in threads, and ephemeral messages (`chat.postEphemeral`) don't persist. The standard workaround is a placeholder message (e.g., a `:loading:` spinner or "Thinking...") that is replaced once the real response is ready.
- Recommendation: Post a placeholder message ("Thinking...") immediately upon receiving the Slack event (before dispatching to Celery), store the returned `ts` (message timestamp) in the Celery task payload, then use `chat.update` to replace it with the real response.
3. **Docker Compose Service Topology for Phase 1**
- What we know: All services must run locally including Ollama
- What's unclear: Whether Ollama requires GPU passthrough in the dev environment affects the compose file
   - Recommendation: Include Ollama with the GPU reservation optional — put `deploy.resources.reservations.devices` (with `count: all`) in a compose override file so CPU-only machines can still run the stack. Use a small model (qwen3:8b or llama3.2:3b) for dev to avoid requiring a GPU.
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest 8.x + pytest-asyncio 0.24+ |
| Config file | `pyproject.toml` — `[tool.pytest.ini_options]` section (Wave 0) |
| Quick run command | `pytest tests/unit -x -q` |
| Full suite command | `pytest tests/ -x` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| CHAN-01 | KonstructMessage normalization from Slack payload | unit | `pytest tests/unit/test_normalize.py -x` | Wave 0 |
| CHAN-02 | Slack @mention triggers agent response in-thread | integration | `pytest tests/integration/test_slack_flow.py -x` | Wave 0 |
| CHAN-05 | Rate limit rejects over-threshold requests with informative response | unit + integration | `pytest tests/unit/test_ratelimit.py tests/integration/test_ratelimit.py -x` | Wave 0 |
| AGNT-01 | Agent persona is reflected in LLM response | integration | `pytest tests/integration/test_agent_persona.py -x` | Wave 0 |
| LLM-01 | LiteLLM Router falls back from unavailable provider to next | integration | `pytest tests/integration/test_llm_fallback.py -x` | Wave 0 |
| LLM-02 | Requests route to Ollama and Anthropic/OpenAI | integration | `pytest tests/integration/test_llm_providers.py -x` | Wave 0 |
| TNNT-01 | Tenant A cannot access Tenant B's data via DB query | integration | `pytest tests/integration/test_tenant_isolation.py -x` | Wave 0 |
| TNNT-02 | Inbound message resolves to correct tenant from channel metadata | unit | `pytest tests/unit/test_tenant_resolution.py -x` | Wave 0 |
| TNNT-03 | Tenant A cannot read Tenant B's Redis keys | unit | `pytest tests/unit/test_redis_namespacing.py -x` | Wave 0 |
| TNNT-04 | TLS enforced on all inter-service communication | manual | Verify docker-compose TLS config — no automated test | manual-only |
| PRTA-01 | Operator can create/read/update/delete tenants via portal API | integration | `pytest tests/integration/test_portal_tenants.py -x` | Wave 0 |
| PRTA-02 | Agent Designer saves and loads all fields via portal API | integration | `pytest tests/integration/test_portal_agents.py -x` | Wave 0 |
### Sampling Rate
- **Per task commit:** `pytest tests/unit -x -q`
- **Per wave merge:** `pytest tests/ -x`
- **Phase gate:** Full suite green before `/gsd:verify-work`
### Wave 0 Gaps
All test files are new — this is a greenfield project. Required Wave 0 setup:
- [ ] `pyproject.toml` — add `[tool.pytest.ini_options]` with `asyncio_mode = "auto"` and `testpaths = ["tests"]`
- [ ] `tests/conftest.py` — shared fixtures: async DB session, two-tenant fixture (tenant_a, tenant_b), Redis mock, LiteLLM mock
- [ ] `tests/unit/test_normalize.py` — CHAN-01: Slack payload → KonstructMessage
- [ ] `tests/unit/test_tenant_resolution.py` — TNNT-02: workspace_id lookup
- [ ] `tests/unit/test_ratelimit.py` — CHAN-05: token bucket behavior
- [ ] `tests/unit/test_redis_namespacing.py` — TNNT-03: key prefix enforcement
- [ ] `tests/integration/test_tenant_isolation.py` — TNNT-01: two-tenant RLS fixture (most critical test in phase)
- [ ] `tests/integration/test_slack_flow.py` — CHAN-02: end-to-end Slack → LLM → reply (with mocked Slack client)
- [ ] `tests/integration/test_llm_fallback.py` — LLM-01: LiteLLM fallback behavior
- [ ] `tests/integration/test_llm_providers.py` — LLM-02: Ollama + Anthropic routing
- [ ] `tests/integration/test_agent_persona.py` — AGNT-01: persona reflected in LLM prompt
- [ ] `tests/integration/test_portal_tenants.py` — PRTA-01: tenant CRUD API
- [ ] `tests/integration/test_portal_agents.py` — PRTA-02: Agent Designer API
- [ ] Framework install: `uv add --dev pytest pytest-asyncio pytest-httpx` — add to `pyproject.toml`
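The token-bucket behavior that `tests/unit/test_ratelimit.py` covers (CHAN-05) can be driven against a small in-memory model before the Redis-backed limiter exists; the production limiter is assumed to share the same refill arithmetic, shared across workers via Redis. Class and parameter names here are illustrative:

```python
import time


class TokenBucket:
    """Minimal in-memory token bucket; `now` is injectable for deterministic tests."""

    def __init__(self, capacity: int, refill_per_sec: float, now=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.now = now
        self.last = now()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.refill_per_sec)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller sends the informative over-threshold response
```

Injecting `now` lets the unit test advance a fake clock instead of sleeping.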
---
## Sources
### Primary (HIGH confidence)
- PyPI (verified 2026-03-22): FastAPI 0.135.1, SQLAlchemy 2.0.48, Pydantic 2.12.5, Alembic 1.18.4, asyncpg 0.31.0, Celery 5.6.2, LiteLLM 1.82.5, slack-bolt 1.27.0
- `.planning/research/STACK.md` — all version numbers and library rationale
- `.planning/research/ARCHITECTURE.md` — service topology, data flow patterns, anti-patterns
- `.planning/research/PITFALLS.md` — all critical failure modes cross-verified against production post-mortems
- [Slack Bolt Python — async adapter docs](https://github.com/slackapi/bolt-python) — Events API vs Socket Mode, AsyncApp + Starlette adapter
- [LiteLLM Router docs](https://docs.litellm.ai/docs/routing) — model_list config, fallback chains, routing strategies
- [Crunchy Data: RLS for Tenants in PostgreSQL](https://www.crunchydata.com/blog/row-level-security-for-tenants-in-postgres) — FORCE ROW LEVEL SECURITY behavior
- [uv workspace docs](https://docs.astral.sh/uv/concepts/projects/workspaces/) — monorepo setup
### Secondary (MEDIUM confidence)
- [Auth.js v5 docs](https://authjs.dev/) — App Router compatibility, Credentials provider pattern
- [LiteLLM production issues](https://dev.to/debmckinney/youre-probably-going-to-hit-these-litellm-issues-in-production-59bg) — log table degradation, caching bug, version pinning
- [sqlalchemy-tenants GitHub](https://github.com/Telemaco019/sqlalchemy-tenants) — RLS + SQLAlchemy session hook pattern
### Tertiary (LOW confidence)
- Celery async event loop issue — multiple community sources agree on the pattern; official Celery docs confirm workers are synchronous
---
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — all versions verified against PyPI March 2026
- Architecture: HIGH — patterns verified against official Slack, LiteLLM, and PostgreSQL docs
- Pitfalls: HIGH — cross-verified against multiple production post-mortems and official docs
- Portal (Auth.js v5): MEDIUM — official docs exist but not directly fetched via Context7; pattern widely corroborated
**Research date:** 2026-03-23
**Valid until:** 2026-04-22 (30 days — stable libraries; re-verify LiteLLM version before Plan 2 begins given its active release cadence)