Phase 1: Foundation - Research

Researched: 2026-03-23
Domain: Multi-tenant Python monorepo scaffolding, PostgreSQL RLS, LiteLLM backend pool, Slack Events API, basic agent orchestrator, Next.js admin portal
Confidence: HIGH (synthesized from project research docs verified against PyPI, official Slack docs, LiteLLM docs, and pgvector sources — all verification conducted 2026-03-22)


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • Next.js portal starts in Phase 1 — not deferred to Phase 3
  • Portal includes tenant CRUD (create, list, view, edit, delete tenants)
  • Portal includes Agent Designer module (job description, SOW, persona, system prompt, tool assignments, escalation rules)
  • Auth.js v5 with email/password authentication from the start — no hardcoded credentials, no throwaway auth code
  • Phase 3 scope narrows to: Stripe billing integration, onboarding wizard, cost tracking dashboard, channel connection wizard, and portal polish
  • AI employees have human-like names by default (e.g., "Mara", "Alex") — matches the "hire an AI employee" branding
  • Default persona tone: professional + warm — friendly but business-appropriate, like a good colleague
  • Always transparent about being AI when asked directly — never pretends to be human
  • Silent until spoken to — no auto-introduction message when added to a Slack channel
  • Operator configures name, role, persona, and system prompt via the Agent Designer in the portal
  • Agent responds to: @mentions in channels and direct messages
  • Does NOT monitor entire channels or respond to all messages (no "designated support channel" mode in v1)
  • Always replies in threads — keeps channels clean
  • Shows typing indicator while LLM is generating a response

Claude's Discretion

  • Thread follow-up behavior (auto-follow after first engagement vs always require @mention)
  • Portal UI layout and component choices (within shadcn/ui)
  • Default AI employee name suggestions
  • Agent Designer form layout and field ordering
  • Error message copy and formatting

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope </user_constraints>


<phase_requirements>

Phase Requirements

ID Description Research Support
CHAN-01 Channel Gateway normalizes messages from all channels into unified KonstructMessage format KonstructMessage Pydantic model defined in ARCHITECTURE.md; normalization pattern documented
CHAN-02 User can interact with AI employee via Slack (Events API — @mentions, DMs, thread replies) slack-bolt 1.27.0 AsyncApp pattern; Events API + async FastAPI integration documented in STACK.md
CHAN-05 Platform rate-limits requests per tenant and per channel with configurable thresholds slowapi with Redis token bucket pattern; ARCHITECTURE.md Message Router layer
AGNT-01 Tenant can configure a single AI employee with custom name, role, and persona Agent DB schema; system prompt assembly from persona fields; Agent Designer portal module
LLM-01 LiteLLM router abstracts LLM provider selection with fallback routing LiteLLM Router configuration pattern in ARCHITECTURE.md Pattern 4; fallback chain config
LLM-02 Platform supports Ollama (local) and commercial APIs (Anthropic, OpenAI) as LLM providers LiteLLM model_list config with ollama + anthropic + openai providers documented
TNNT-01 All tenant data is isolated via PostgreSQL Row Level Security RLS + FORCE ROW LEVEL SECURITY pattern; app role isolation; sqlalchemy-tenants integration
TNNT-02 Inbound messages are resolved to the correct tenant via channel metadata channel_connections table lookup; contextvar-based tenant propagation to RLS
TNNT-03 Per-tenant Redis namespace isolation for cache and session state {tenant_id}: key prefix pattern; shared utility enforcement described
TNNT-04 All data encrypted at rest (PostgreSQL, object storage) and in transit (TLS 1.3) PostgreSQL TDE, MinIO SSE, TLS config for all service-to-service; Docker Compose network isolation
PRTA-01 Operator can create, view, update, and delete tenants Next.js portal with TanStack Query + FastAPI CRUD endpoints; Auth.js v5 authentication
PRTA-02 Operator can design agents via Agent Designer — name, role, persona, system prompt, tool assignments, escalation rules Agent Designer as prominent portal module; form fields are text inputs; React Hook Form + Zod
</phase_requirements>

Summary

Phase 1 builds the entire vertical slice from Slack message to LLM response, with no tenant data leakage possible, rate limiting enforced, and an admin portal where operators can manage tenants and configure AI employees. It comprises four plans: (1) monorepo scaffolding and shared data models with PostgreSQL RLS, (2) the LiteLLM backend pool with Celery async dispatch, (3) Channel Gateway (Slack) + Message Router + basic Agent Orchestrator, and (4) the Next.js admin portal with Auth.js v5, tenant CRUD, and Agent Designer. Plans 1 and 2 must complete before Plan 3 begins; Plan 4 can overlap with Plan 3 once the DB schema stabilizes.

The most dangerous failure mode for Phase 1 is silent cross-tenant data leakage. PostgreSQL RLS only protects the application if FORCE ROW LEVEL SECURITY is applied to every table AND the application connects as a non-superuser role. This must be verified explicitly — RLS can appear to work while providing zero isolation. Every integration test must exercise a two-tenant fixture from the first day of DB schema work.

The second dangerous failure mode is async event loop conflicts in Celery. All Celery task functions must be synchronous def (not async def). The pattern must be established in Plan 1 scaffolding so it becomes the convention before any LLM task work begins in Plan 2.

Primary recommendation: Build the DB schema + RLS + Redis namespacing in Plan 1 with their isolation tests green before touching any channel or LLM code. Tenant isolation retrofitted later costs significantly more than tenant isolation designed first.


Standard Stack

Core Backend

Library Version Purpose Why Standard
Python 3.12 Runtime CLAUDE.md specified; LTS sweet spot — 3.13 ecosystem support lags
FastAPI 0.135.1 API framework Async-native, auto OpenAPI docs, DI system; de facto for async Python APIs
Pydantic v2 2.12.5 Data validation Mandatory for FastAPI; 20x faster than v1; strict mode for public interfaces
SQLAlchemy 2.0.48 ORM True async AsyncSession; 1.x patterns are deprecated and must not be used
Alembic 1.18.4 DB migrations Standard SQLAlchemy companion; requires async env.py modification
asyncpg 0.31.0 PostgreSQL async driver Required for SQLAlchemy async; faster than psycopg2 for concurrent workloads
PostgreSQL 16 Primary database CLAUDE.md specified; RLS is the v1 multi-tenancy mechanism
Redis 7.x Cache, pub/sub, rate limiting, Celery broker One service for multiple purposes; session state, namespaced per-tenant
Celery 5.6.2 Background job processing LLM calls dispatched async; prevents Slack webhook timeouts; mature ecosystem
uv latest Python package manager Workspace support for monorepo; replaces pip + virtualenv

LLM Integration

Library Version Purpose Why Standard
LiteLLM 1.82.5 LLM gateway Unified API across all providers; fallback routing; cost tracking; never call provider APIs directly
Ollama latest Local inference Docker service for dev; OpenAI-compatible API on port 11434

Channel Integration

Library Version Purpose Why Standard
slack-bolt 1.27.0 Slack Events API Official Slack SDK; use AsyncApp in HTTP mode (not Socket Mode in production)

Admin Portal

Library Version Purpose Why Standard
Next.js 16.x Portal framework CLAUDE.md specifies 14+; current stable is 16 (March 2026); App Router mature; start on 16 rather than an already-superseded release
TypeScript 5.x Type safety Strict mode required per CLAUDE.md
Tailwind CSS 4.x Styling Required by shadcn/ui; v4 uses CSS-native variables
shadcn/ui latest Component library Copy-to-project model; standard for Next.js admin portals 2025-2026
TanStack Query 5.x Server state Client-side fetching, caching, mutations against FastAPI
React Hook Form + Zod latest Form validation Standard pairing for shadcn/ui forms; Zod schemas shared with backend type defs
Auth.js v5 Portal authentication v5 rewritten for App Router compatibility; PostgreSQL session adapter; email/password from the start

Rate Limiting

Library Version Purpose Why Standard
slowapi latest FastAPI rate limiting Redis-backed token bucket; integrates directly with FastAPI; per-tenant + per-channel limits

Dev Tools

Tool Purpose
ruff Linting + formatting (replaces flake8, isort, black)
mypy --strict Type checking; no Any in public interfaces
pytest + pytest-asyncio Async test support; use httpx.AsyncClient not sync TestClient
Docker Compose All infra services (PostgreSQL, Redis, Ollama)

Installation

# Initialize Python monorepo
uv init konstruct && cd konstruct
# Running `uv init` inside the project registers each package as a workspace member
uv init --lib packages/gateway
uv init --lib packages/router
uv init --lib packages/orchestrator
uv init --lib packages/llm-pool
uv init --lib packages/shared

# Core backend dependencies
uv add fastapi[standard] pydantic[email] sqlalchemy[asyncio] asyncpg alembic
uv add litellm redis celery[redis] slack-bolt python-jose[cryptography] httpx slowapi

# Dev dependencies
uv add --dev ruff mypy pytest pytest-asyncio pytest-httpx

# Portal
mkdir -p packages/portal && cd packages/portal
npx create-next-app@latest . --typescript --tailwind --eslint --app
npx shadcn@latest init
npm install @tanstack/react-query react-hook-form zod next-auth@beta  # Auth.js v5 is published as next-auth@beta

Architecture Patterns

konstruct/
├── packages/
│   ├── gateway/                     # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   └── slack.py             # Slack Events API handler (HTTP mode, AsyncApp)
│   │   ├── normalize.py             # Slack event → KonstructMessage
│   │   ├── verify.py                # X-Slack-Signature verification
│   │   └── main.py                  # FastAPI app, /slack/events route
│   │
│   ├── router/                      # Message Router service (FastAPI)
│   │   ├── tenant.py                # workspace_id → tenant_id lookup
│   │   ├── ratelimit.py             # Redis token bucket per tenant/channel
│   │   ├── idempotency.py           # Redis dedup (message_id, TTL 24h)
│   │   ├── context.py               # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/                # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py                 # Celery task: handle_message (sync def, NOT async def)
│   │   ├── agents/
│   │   │   ├── builder.py           # Assemble agent prompt from persona + history
│   │   │   └── runner.py            # LLM call → parse response → send reply
│   │   └── main.py                  # Celery worker entry point
│   │
│   ├── llm-pool/                    # LLM Backend Pool service (LiteLLM wrapper)
│   │   ├── router.py                # LiteLLM Router config (model groups + fallback)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py                  # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/                      # Next.js 16 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/              # /login route
│   │   │   ├── dashboard/           # Post-auth layout
│   │   │   ├── tenants/             # Tenant CRUD pages
│   │   │   ├── agents/              # Agent Designer module
│   │   │   └── api/auth/            # Auth.js route handler
│   │   ├── components/              # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts               # TanStack Query hooks + API client
│   │       └── auth.ts              # Auth.js config
│   │
│   └── shared/                      # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py           # KonstructMessage Pydantic model
│       │   ├── tenant.py            # Tenant, Agent, ChannelConnection SQLAlchemy models
│       │   └── auth.py              # Portal user models
│       ├── db.py                    # SQLAlchemy async engine + session factory
│       ├── rls.py                   # SET app.current_tenant contextvar + hook
│       └── config.py                # Pydantic Settings (env vars)
│
├── migrations/                      # Alembic (single migration history)
├── tests/
│   ├── unit/
│   └── integration/                 # Two-tenant fixture tests (REQUIRED in Plan 1)
├── docker-compose.yml               # PostgreSQL 16, Redis 7, Ollama, all services
└── pyproject.toml                   # uv workspace config

Pattern 1: Immediate-Acknowledge, Async-Process

What: Channel Gateway returns HTTP 200 to Slack within 3 seconds, without LLM work. Processing is dispatched to Celery. The AI reply arrives as a follow-up Slack message.

When to use: Always. Slack retries and flags apps as unhealthy if no 2xx within 3 seconds. This is non-negotiable.

Example:

# packages/gateway/channels/slack.py
@app.event("message")
async def handle_message(event, say, client):
    msg = normalize_slack(event)
    if await is_duplicate(msg.id):  # Redis idempotency key
        return
    handle_message_task.delay(msg.model_dump())
    # HTTP 200 returned implicitly — Slack is satisfied
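The `is_duplicate` check above can be a single atomic Redis `SET NX` with a TTL. A sketch, with a small in-memory stand-in for `redis.asyncio.Redis` (only `set` with `nx=`/`ex=` is modeled; the real client behaves the same way):

```python
import asyncio

DEDUP_TTL_SECONDS = 24 * 60 * 60  # matches the 24h idempotency window

async def is_duplicate(r, tenant_id: str, message_id: str) -> bool:
    # SET ... NX succeeds only when the key did not exist,
    # so "could not set" means "already seen"
    key = f"{tenant_id}:dedup:{message_id}"
    created = await r.set(key, "1", nx=True, ex=DEDUP_TTL_SECONDS)
    return not created

class FakeRedis:
    """In-memory stand-in for redis.asyncio.Redis (TTL expiry not simulated)."""
    def __init__(self) -> None:
        self.store: dict[str, str] = {}
    async def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # redis returns nil when NX fails
        self.store[key] = value
        return True
```

First delivery of a message returns False (process it); a Slack retry of the same message returns True (drop it).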

Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook

What: Set app.current_tenant on the PostgreSQL connection before every query. RLS policies use this setting to filter every row automatically. Application code never adds WHERE tenant_id = ... manually.

When to use: Every DB interaction in the router and orchestrator. This is the primary tenant isolation mechanism.

Example:

# packages/shared/rls.py
import uuid
from contextvars import ContextVar

from sqlalchemy import event

current_tenant_id: ContextVar[str | None] = ContextVar("current_tenant_id", default=None)

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = current_tenant_id.get()
    if tenant_id:
        # SET LOCAL cannot take bind parameters, so validate as a UUID
        # before interpolating — this guard is what prevents SQL injection
        safe_id = str(uuid.UUID(tenant_id))
        cursor.execute(f"SET LOCAL app.current_tenant = '{safe_id}'")

Critical RLS migration requirements:

-- Every table must have both the policy AND FORCE applied
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
ALTER TABLE agents FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON agents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application MUST connect as this role (never postgres superuser)
CREATE ROLE konstruct_app WITH LOGIN PASSWORD '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO konstruct_app;

Pattern 3: LiteLLM Router as Internal Singleton Service

What: LLM Backend Pool exposes a single internal HTTP /complete endpoint. Orchestrator workers call this endpoint. LiteLLM Router behind it handles provider selection, fallback, and cost tracking.

When to use: All LLM calls. Never call Anthropic/OpenAI SDKs directly from the orchestrator.

Example:

# packages/llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434"
            }
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"}
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "openai/gpt-4o"}  # fallback
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    routing_strategy="latency-based-routing",
)

Pin LiteLLM version in Docker — never use latest. A September 2025 release caused OOM errors on Kubernetes.
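The /complete endpoint itself stays thin: it maps a request body onto `router.acompletion(...)` kwargs. A stdlib sketch of that mapping (the request field names here are assumptions for illustration, not a documented LiteLLM schema):

```python
def completion_kwargs(body: dict) -> dict:
    """Translate a /complete request body into LiteLLM Router call kwargs.

    `model_preference` comes from the agent row ('fast' or 'quality') and
    selects the Router model group; unknown values fall back to 'quality'.
    """
    group = body.get("model_preference", "quality")
    if group not in {"fast", "quality"}:
        group = "quality"
    return {
        "model": group,  # the Router resolves the group to a concrete provider
        "messages": [
            {"role": "system", "content": body["system_prompt"]},
            *body.get("history", []),
            {"role": "user", "content": body["text"]},
        ],
    }
```

The FastAPI handler would then `await router.acompletion(**completion_kwargs(body))` and return the completion text.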

Pattern 4: Celery Task Pattern (SYNC, not async)

What: Celery tasks are synchronous def functions. Async code inside tasks is wrapped with asyncio.run(). This pattern must be established in Plan 1 scaffolding.

Example:

# packages/orchestrator/tasks.py
from celery import Celery
import asyncio

app = Celery("orchestrator", broker="redis://redis:6379/0")

# CORRECT: sync def
@app.task
def handle_message(message_data: dict) -> None:
    asyncio.run(_process_message(message_data))

async def _process_message(message_data: dict) -> None:
    # async DB and LLM calls here
    ...

# WRONG — DO NOT DO THIS:
# @app.task
# async def handle_message(message_data: dict) -> None:  ← RuntimeError

Pattern 5: Redis Namespacing (Tenant Isolation)

What: All Redis keys include {tenant_id}: prefix. Enforce via a shared utility function — convention is insufficient.

Example:

# packages/shared/redis_keys.py
def rate_limit_key(tenant_id: str, channel: str) -> str:
    return f"{tenant_id}:ratelimit:{channel}"

def idempotency_key(tenant_id: str, message_id: str) -> str:
    return f"{tenant_id}:dedup:{message_id}"

def session_key(tenant_id: str, thread_id: str) -> str:
    return f"{tenant_id}:session:{thread_id}"

Pattern 6: Slack AsyncApp + FastAPI Integration

What: Mount slack-bolt AsyncApp inside FastAPI as an ASGI sub-application. HTTP mode only (not Socket Mode) for production.

Example:

# packages/gateway/main.py
from fastapi import FastAPI
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler

slack_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
handler = AsyncSlackRequestHandler(slack_app)

fastapi_app = FastAPI()

@fastapi_app.post("/slack/events")
async def slack_events(req: Request):
    return await handler.handle(req)

Pattern 7: Auth.js v5 with Next.js App Router

What: Auth.js v5 (the rewrite formerly known as NextAuth.js) exposes the auth() helper in server components and API routes. With the Credentials provider, sessions use the JWT strategy; a PostgreSQL adapter (Drizzle or Prisma) stores user records, not sessions.

Example:

// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { DrizzleAdapter } from "@auth/drizzle-adapter"
import { db } from "./db"  // Drizzle PostgreSQL client

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        // validate against DB user table
      },
    }),
  ],
  adapter: DrizzleAdapter(db),  // user persistence only — Credentials sessions stay JWT
})

Anti-Patterns to Avoid

  • Async Celery tasks: Never write async def Celery tasks. Use asyncio.run() inside sync def tasks.
  • Superuser PostgreSQL connections: Application must never connect as postgres superuser. RLS is bypassed silently.
  • Missing FORCE ROW LEVEL SECURITY: RLS policies without FORCE are bypassed by the table owner. Apply ALTER TABLE ... FORCE ROW LEVEL SECURITY to every table.
  • Unnamespaced Redis keys: Any Redis key without {tenant_id}: prefix can collide across tenants.
  • LLM work inside webhook handler: Slack requires HTTP 200 in 3 seconds. LLM calls take 5-30 seconds. Always dispatch to Celery.
  • Direct provider SDK calls from orchestrator: Always go through LiteLLM pool. Never import anthropic or openai SDK directly in orchestrator.
  • Socket Mode in production: Socket Mode breaks horizontal scaling. Use HTTP Events API for production.
  • Next.js 14 specifically: Current stable is 16 (March 2026). Start on 16.

Don't Hand-Roll

Problem Don't Build Use Instead Why
Multi-provider LLM routing with fallback Custom provider selector + retry logic LiteLLM Router LiteLLM handles fallback chains, cost tracking, load balancing, and provider abstraction. Custom solutions miss edge cases (rate limit windows, streaming failures, provider-specific error codes)
Per-tenant DB row filtering Application-level WHERE tenant_id = ? on every query PostgreSQL RLS Application filters are forgotten. RLS is enforced at the DB layer even if code has bugs. FORCE RLS makes it apply to all connections
Redis token bucket rate limiting Custom counter + expiry logic slowapi Edge cases in token bucket implementations (burst handling, clock skew, reset timing) are numerous. slowapi handles them correctly with Redis backend
Portal form validation Custom form state machine React Hook Form + Zod Form validation edge cases (async validation, dependent fields, submission state) are handled by the library. Zod schemas provide shared type safety with the backend
Portal auth session management Custom JWT storage + refresh logic Auth.js v5 Session security (CSRF, rotation, expiry, replay protection) is extremely easy to get wrong. Auth.js is the standard for Next.js
Slack signature verification Custom HMAC implementation slack-bolt (built-in) slack-bolt verifies X-Slack-Signature automatically in AsyncApp. Hand-rolling misses timing attack prevention
Redis key namespacing convention Documentation + code review Shared utility function Conventions are forgotten. Centralizing construction in redis_keys.py means unnamespaced keys can only appear in one reviewable module

Key insight: In a multi-tenant platform, the most dangerous custom solutions are the ones that appear to work in testing (single tenant) but fail in production (multiple tenants) through data leakage.


Common Pitfalls

Pitfall 1: RLS Appears to Work But Provides Zero Isolation

What goes wrong: PostgreSQL RLS policies exist on the tables, tests pass, but the application connects as the postgres superuser, which bypasses all RLS policies silently. No error is raised. Tenant data is fully accessible across tenants.

Why it happens: Early dev uses postgres superuser. RLS is added. Nobody verifies it actually applies. BYPASSRLS is implicit for superusers and table owners unless explicitly overridden.

How to avoid:

  1. Create a konstruct_app role before writing the first migration
  2. Apply FORCE ROW LEVEL SECURITY to every table with RLS
  3. All application connections use konstruct_app (never postgres)
  4. Tenant isolation tests connect as konstruct_app in pytest fixtures
  5. Verify with: SELECT relforcerowsecurity FROM pg_class WHERE relname = 'agents' — must be true

Warning signs: Application connecting as postgres; RLS tests using psql instead of application role.
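Step 5 can be automated: query pg_class for every tenant-scoped table and fail the suite if any lacks FORCE. A sketch of the query plus a pure checker for its result rows (the table list is an example; the integration suite would run the SQL and pass the rows through the checker):

```python
# Run against the DB as any role; relforcerowsecurity must be true everywhere
FORCED_RLS_SQL = """
    SELECT relname, relforcerowsecurity
    FROM pg_class
    WHERE relname = ANY(:tables)
"""

TENANT_TABLES = ["agents", "channel_connections", "messages"]  # example list

def unforced_tables(rows: list[tuple[str, bool]]) -> list[str]:
    """Return tables missing FORCE ROW LEVEL SECURITY — must be empty."""
    return sorted(name for name, forced in rows if not forced)

# In the integration suite: assert unforced_tables(result) == []
```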

Pitfall 2: Silent Celery Task Hang from Async/Await

What goes wrong: Celery tasks written as async def cause RuntimeError: This event loop is already running or hang silently without completing. The task appears to be accepted by the broker but never produces a result or error.

Why it happens: FastAPI codebase is all async def. Developers naturally write Celery tasks the same way. The incompatibility only appears at runtime.

How to avoid: All Celery tasks are def (synchronous). Async code within tasks is called via asyncio.run(). Establish this pattern in the first Celery task stub (Plan 1) before any LLM work.

Warning signs: RuntimeError: This event loop is already running in Celery worker logs; tasks accepted but never completed.

Pitfall 3: LiteLLM Request Log Table Degradation

What goes wrong: LiteLLM logs every request to PostgreSQL. After ~1M rows (~10 days at 100k req/day), the table causes measurable latency on every LLM call. There are also documented OOM issues with specific versions.

How to avoid:

  • Implement a Celery Beat log rotation job from day one that deletes rows older than N days
  • Set LITELLM_LOG_LEVEL=ERROR in production
  • Pin LiteLLM to 1.82.5 in Docker — do not use latest (September 2025 release had OOM issues)
  • Do not use LiteLLM's built-in caching layer (documented bug: cache hit adds 10+ seconds latency); implement caching above LiteLLM in the orchestrator using Redis directly

Warning signs: LiteLLM response times creeping up over 2-3 hours; litellm_logs table exceeding 500k rows.
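The rotation job reduces to a periodic DELETE. A sketch of the SQL builder plus a Celery Beat schedule entry expressed as plain data — the table and column names loosely follow LiteLLM's defaults and must be verified against the deployed schema:

```python
def rotation_sql(table: str = "litellm_spendlogs", keep_days: int = 7) -> str:
    """DELETE rows older than keep_days; run from a sync Celery task."""
    if keep_days < 1:
        raise ValueError("keep_days must be >= 1")
    return (
        f"DELETE FROM {table} "
        f"WHERE created_at < NOW() - INTERVAL '{keep_days} days'"
    )

# Celery Beat entry — attach via app.conf.beat_schedule in the worker
BEAT_SCHEDULE = {
    "rotate-litellm-logs": {
        "task": "orchestrator.tasks.rotate_litellm_logs",  # hypothetical task name
        "schedule": 24 * 60 * 60,  # once a day, in seconds
    },
}
```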

Pitfall 4: Cross-Tenant Redis Key Collision

What goes wrong: Conversation history or rate limit counters stored under bare keys (e.g., history:{thread_id}) collide when two tenants happen to have the same Slack thread ID pattern. Tenant A reads Tenant B's session data.

How to avoid: All Redis keys use {tenant_id}: prefix enforced via shared utility function in packages/shared/redis_keys.py. No key construction outside this module.

Pitfall 5: Slack Webhook Acknowledgment Timeout

What goes wrong: LLM call is made synchronously inside the Slack event handler. Call takes 8 seconds. Slack receives no 200 within 3 seconds, retries the event, the agent processes it twice, and the app is flagged as unhealthy.

How to avoid: Dispatch to Celery immediately. Return HTTP 200. Send the AI reply as a follow-up message via client.chat_postMessage().

Pitfall 6: Thread Follow-Up Behavior Decision

The decision: This is marked as Claude's discretion. The two options are:

  • Auto-follow: After the first @mention in a thread, subsequent messages in the same thread trigger responses without re-mentioning the agent. Better UX for sustained conversations.
  • Require @mention each time: Safer, more explicit, never accidental. Simpler to implement.

Recommendation: Implement auto-follow for engaged threads in Phase 1. The "AI employee" metaphor implies sustained engagement — requiring an @mention on every follow-up reply breaks the mental model of talking to a colleague. Track the thread_id in Redis after first engagement and respond to any message in that thread until a configurable idle timeout expires (default: 30 minutes of inactivity).
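The auto-follow recommendation reduces to one Redis key with a sliding TTL plus a small response gate. A sketch — the `should_respond` helper is hypothetical; the real check would read and refresh the key through the shared redis_keys.py style:

```python
ENGAGED_TTL_SECONDS = 30 * 60  # 30-minute idle timeout

def engaged_thread_key(tenant_id: str, thread_id: str) -> str:
    # Same {tenant_id}: prefix rule as every other Redis key
    return f"{tenant_id}:engaged_thread:{thread_id}"

def should_respond(is_mention: bool, is_dm: bool, thread_engaged: bool) -> bool:
    """Gate for the Message Router: reply to @mentions, DMs, and any message
    in a thread the agent has already engaged (until the TTL expires)."""
    return is_mention or is_dm or thread_engaged

# On each reply the agent sends, refresh the engagement window:
#   await r.set(engaged_thread_key(tid, thread), "1", ex=ENGAGED_TTL_SECONDS)
```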


Code Examples

KonstructMessage Pydantic Model

# packages/shared/models/message.py
from enum import StrEnum
from pydantic import BaseModel, Field
import uuid
from datetime import datetime

class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"

class SenderInfo(BaseModel):
    user_id: str
    display_name: str
    is_bot: bool = False

class MessageContent(BaseModel):
    text: str
    attachments: list[dict] = Field(default_factory=list)

class KonstructMessage(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    tenant_id: str | None = None  # Populated by Router after tenant resolution
    channel: ChannelType
    channel_metadata: dict  # Workspace/org IDs for tenant resolution
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict = Field(default_factory=dict)

PostgreSQL RLS Migration Pattern

# migrations/versions/001_initial_schema.py
import os

from alembic import op

def upgrade():
    # CREATE ROLE cannot take bind parameters — read the password from the
    # environment instead of hardcoding it in the migration
    app_password = os.environ["KONSTRUCT_APP_PASSWORD"]
    op.execute(f"CREATE ROLE konstruct_app WITH LOGIN PASSWORD '{app_password}'")

    op.execute("""
        CREATE TABLE tenants (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            name TEXT NOT NULL,
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    op.execute("""
        CREATE TABLE agents (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            tenant_id UUID NOT NULL REFERENCES tenants(id),
            name TEXT NOT NULL,
            role TEXT NOT NULL,
            persona TEXT,
            system_prompt TEXT,
            model_preference TEXT DEFAULT 'quality',
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    # RLS on agents table
    op.execute("ALTER TABLE agents ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE agents FORCE ROW LEVEL SECURITY")
    op.execute("""
        CREATE POLICY tenant_isolation ON agents
            USING (tenant_id = current_setting('app.current_tenant')::uuid)
    """)

    # Grant to application role (not superuser)
    op.execute("GRANT ALL ON ALL TABLES IN SCHEMA public TO konstruct_app")

LiteLLM Fallback Configuration

# packages/llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434",
            },
        },
        {
            "model_name": "quality",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",
                "api_key": settings.anthropic_api_key,
            },
        },
        {
            "model_name": "quality",  # Same group = fallback
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": settings.openai_api_key,
            },
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    num_retries=2,
    routing_strategy="latency-based-routing",
    set_verbose=False,  # Reduce log volume
)

Slack AsyncApp + FastAPI Mount

# packages/gateway/main.py
from fastapi import FastAPI, Request
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler
from .channels.slack import register_slack_handlers

slack_bolt_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
register_slack_handlers(slack_bolt_app)
slack_handler = AsyncSlackRequestHandler(slack_bolt_app)

app = FastAPI()

@app.post("/slack/events")
async def slack_events(request: Request):
    return await slack_handler.handle(request)

@app.get("/health")
async def health():
    return {"status": "ok"}

Auth.js v5 Portal Setup

// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { z } from "zod"

const loginSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8),
})

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        const parsed = loginSchema.safeParse(credentials)
        if (!parsed.success) return null
        // Validate against DB via internal API
        const response = await fetch(`${process.env.API_URL}/auth/verify`, {
          method: "POST",
          body: JSON.stringify(parsed.data),
        })
        if (!response.ok) return null
        return response.json()
      },
    }),
  ],
  pages: { signIn: "/login" },
  session: { strategy: "jwt" },
})

State of the Art

Old Approach Current Approach When Changed Impact
NextAuth.js v4 Auth.js v5 2024 Complete rewrite; v5 is App Router native; v4 patterns don't apply
Next.js 14 Next.js 16 2025-2026 Turbopack default; improved App Router; use 16 not 14
Tailwind CSS v3 Tailwind CSS v4 2025 CSS-native variables; JIT always on; JIT config removed
SQLAlchemy 1.x session.query() SQLAlchemy 2.0 AsyncSession + select() 2023 1.x patterns are deprecated and cause async bugs in FastAPI
psycopg2 asyncpg ongoing psycopg2 blocks the event loop; asyncpg is required for async FastAPI
LangGraph/CrewAI for single agent Custom orchestrator + direct LiteLLM 2024-2025 Frameworks add premature abstraction for single-agent v1; evaluate for v2 multi-agent
Flake8 + Black + isort ruff 2023-2024 Single tool replaces three; 100x faster; CLAUDE.md already specifies ruff
Slack Socket Mode Slack Events API (HTTP) permanent Socket Mode breaks horizontal scaling; HTTP is production-correct

Deprecated/outdated patterns to never use:

  • session.query(Model).filter_by(...) — SQLAlchemy 1.x style, deprecated
  • psycopg2 as PostgreSQL driver — synchronous, blocks event loop
  • async def Celery tasks — runtime error or silent hang
  • CREATE POLICY ... USING (...) without FORCE ROW LEVEL SECURITY — bypassed by superuser
  • Unnamespaced Redis keys — tenant collision risk
  • Socket Mode for Slack — not production-safe

Open Questions

  1. Thread Follow-Up Behavior (Claude's discretion)

    • What we know: User confirmed this is Claude's discretion
    • Recommendation: Auto-follow engaged threads with 30-minute idle timeout. Implement as Redis key {tenant_id}:engaged_thread:{thread_id} with TTL. See Pattern section above.
  2. Typing Indicator Implementation

    • What we know: User confirmed "shows typing indicator while LLM is generating"
    • What's needed: Slack has no native typing-indicator API for bots on the Events API, and chat.postEphemeral messages don't persist, so neither produces a true "typing" effect in threads. The standard workaround is a placeholder: post a brief "Thinking..." message (often with a :loading: spinner emoji) via chat.postMessage, then replace it with the real response via chat.update.
    • Recommendation: Post a placeholder message ("Thinking...") immediately upon receiving the Slack event (before dispatching to Celery), store the ts (message timestamp) in the Celery task payload, then use chat.update to replace it with the real response.
  3. Docker Compose Service Topology for Phase 1

    • What we know: All services must run locally including Ollama
    • What's unclear: Whether Ollama requires GPU passthrough in the dev environment affects the compose file
    • Recommendation: Include Ollama with GPU optional (a deploy.resources.reservations.devices block with count: all, falling back to CPU when no GPU is present). Use a small model (qwen3:8b or llama3.2:3b) for dev to avoid requiring a GPU.
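
The placeholder-then-update recommendation in question 2 can be sketched as below. `FakeSlackClient` stands in for slack_sdk's AsyncWebClient (whose `chat_postMessage` and `chat_update` methods take the same keyword arguments), and `reply_with_placeholder` is a hypothetical helper, not project code; in production the placeholder `ts` would ride along in the Celery task payload rather than being awaited inline.

```python
# Sketch of "post placeholder, then chat.update" for the typing indicator.
# FakeSlackClient is a stand-in for slack_sdk's AsyncWebClient.
import asyncio

class FakeSlackClient:
    def __init__(self):
        self.calls = []

    async def chat_postMessage(self, *, channel, thread_ts, text):
        self.calls.append(("chat.postMessage", text))
        return {"ts": "1700000000.000100"}  # Slack returns the message timestamp

    async def chat_update(self, *, channel, ts, text):
        self.calls.append(("chat.update", text))
        return {"ok": True}

async def reply_with_placeholder(client, channel, thread_ts, generate):
    # 1. Post "Thinking..." immediately so the user sees activity.
    placeholder = await client.chat_postMessage(
        channel=channel, thread_ts=thread_ts, text=":hourglass: Thinking..."
    )
    # 2. Generate the real response (the slow LLM call).
    answer = await generate()
    # 3. Replace the placeholder with the real response.
    await client.chat_update(channel=channel, ts=placeholder["ts"], text=answer)
    return answer

client = FakeSlackClient()

async def fake_llm():
    return "Here is the summary you asked for."

result = asyncio.run(reply_with_placeholder(client, "C123", "1699.1", fake_llm))
print([name for name, _ in client.calls])  # → ['chat.postMessage', 'chat.update']
```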

Validation Architecture

Test Framework

| Property | Value |
|---|---|
| Framework | pytest 8.x + pytest-asyncio 0.24+ |
| Config file | pyproject.toml [tool.pytest.ini_options] section (Wave 0) |
| Quick run command | pytest tests/unit -x -q |
| Full suite command | pytest tests/ -x |

Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|---|---|---|---|---|
| CHAN-01 | KonstructMessage normalization from Slack payload | unit | pytest tests/unit/test_normalize.py -x | Wave 0 |
| CHAN-02 | Slack @mention triggers agent response in-thread | integration | pytest tests/integration/test_slack_flow.py -x | Wave 0 |
| CHAN-05 | Rate limit rejects over-threshold requests with informative response | unit + integration | pytest tests/unit/test_ratelimit.py tests/integration/test_ratelimit.py -x | Wave 0 |
| AGNT-01 | Agent persona is reflected in LLM response | integration | pytest tests/integration/test_agent_persona.py -x | Wave 0 |
| LLM-01 | LiteLLM Router falls back from unavailable provider to next | integration | pytest tests/integration/test_llm_fallback.py -x | Wave 0 |
| LLM-02 | Requests route to Ollama and Anthropic/OpenAI | integration | pytest tests/integration/test_llm_providers.py -x | Wave 0 |
| TNNT-01 | Tenant A cannot access Tenant B's data via DB query | integration | pytest tests/integration/test_tenant_isolation.py -x | Wave 0 |
| TNNT-02 | Inbound message resolves to correct tenant from channel metadata | unit | pytest tests/unit/test_tenant_resolution.py -x | Wave 0 |
| TNNT-03 | Tenant A cannot read Tenant B's Redis keys | unit | pytest tests/unit/test_redis_namespacing.py -x | Wave 0 |
| TNNT-04 | TLS enforced on all inter-service communication | manual | Verify docker-compose TLS config (no automated test) | manual-only |
| PRTA-01 | Operator can create/read/update/delete tenants via portal API | integration | pytest tests/integration/test_portal_tenants.py -x | Wave 0 |
| PRTA-02 | Agent Designer saves and loads all fields via portal API | integration | pytest tests/integration/test_portal_agents.py -x | Wave 0 |
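
CHAN-05's token-bucket behavior can be unit-tested without real time by injecting the clock. The class below is a minimal in-process sketch, not the project's implementation; the real limiter would be keyed per tenant and likely backed by Redis.

```python
# Minimal token bucket for CHAN-05-style tests. Class name and
# parameters are illustrative only.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
        )
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should return the informative rate-limit response

bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
print(bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0))  # → True True False
print(bucket.allow(1.0))  # one token refilled after 1s → True
```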

Sampling Rate

  • Per task commit: pytest tests/unit -x -q
  • Per wave merge: pytest tests/ -x
  • Phase gate: Full suite green before /gsd:verify-work
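
The LLM-01 and LLM-02 rows above hinge on the Router's model_list and fallbacks settings. The dict below sketches that shape as described in the LiteLLM Router docs (listed under Sources); the model names and env-var references are placeholders, and in application code the dict would be passed to litellm.Router(...).

```python
# Sketch of a LiteLLM Router config covering LLM-01 (fallback chain)
# and LLM-02 (local Ollama + hosted provider). Model ids are placeholders.
import os

router_config = {
    "model_list": [
        {
            # Local model served by Ollama for dev traffic.
            "model_name": "local-default",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
            },
        },
        {
            # Hosted model used when the local provider is unavailable.
            "model_name": "hosted-fallback",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",  # placeholder id
                "api_key": os.environ.get("ANTHROPIC_API_KEY", ""),
            },
        },
    ],
    # LLM-01: if "local-default" fails, retry on "hosted-fallback".
    "fallbacks": [{"local-default": ["hosted-fallback"]}],
}

# In application code this would become roughly:
#   router = litellm.Router(**router_config)
#   await router.acompletion(model="local-default", messages=[...])
```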

Wave 0 Gaps

All test files are new — this is a greenfield project. Required Wave 0 setup:

  • pyproject.toml — add [tool.pytest.ini_options] with asyncio_mode = "auto" and testpaths = ["tests"]
  • tests/conftest.py — shared fixtures: async DB session, two-tenant fixture (tenant_a, tenant_b), Redis mock, LiteLLM mock
  • tests/unit/test_normalize.py — CHAN-01: Slack payload → KonstructMessage
  • tests/unit/test_tenant_resolution.py — TNNT-02: workspace_id lookup
  • tests/unit/test_ratelimit.py — CHAN-05: token bucket behavior
  • tests/unit/test_redis_namespacing.py — TNNT-03: key prefix enforcement
  • tests/integration/test_tenant_isolation.py — TNNT-01: two-tenant RLS fixture (most critical test in phase)
  • tests/integration/test_slack_flow.py — CHAN-02: end-to-end Slack → LLM → reply (with mocked Slack client)
  • tests/integration/test_llm_fallback.py — LLM-01: LiteLLM fallback behavior
  • tests/integration/test_llm_providers.py — LLM-02: Ollama + Anthropic routing
  • tests/integration/test_agent_persona.py — AGNT-01: persona reflected in LLM prompt
  • tests/integration/test_portal_tenants.py — PRTA-01: tenant CRUD API
  • tests/integration/test_portal_agents.py — PRTA-02: Agent Designer API
  • Framework install: uv add --dev pytest pytest-asyncio pytest-httpx — add to pyproject.toml
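
Taken together, the pyproject additions above might look like the following sketch; the exact dependency-group layout depends on the uv version (recent uv writes dev dependencies to the PEP 735 [dependency-groups] table).

```toml
# pyproject.toml — Wave 0 test configuration (sketch)
[tool.pytest.ini_options]
asyncio_mode = "auto"    # pytest-asyncio: collect async tests without markers
testpaths = ["tests"]

[dependency-groups]
dev = [
    "pytest",
    "pytest-asyncio",
    "pytest-httpx",
]
```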

Sources

Primary (HIGH confidence)

  • PyPI (verified 2026-03-22): FastAPI 0.135.1, SQLAlchemy 2.0.48, Pydantic 2.12.5, Alembic 1.18.4, asyncpg 0.31.0, Celery 5.6.2, LiteLLM 1.82.5, slack-bolt 1.27.0
  • .planning/research/STACK.md — all version numbers and library rationale
  • .planning/research/ARCHITECTURE.md — service topology, data flow patterns, anti-patterns
  • .planning/research/PITFALLS.md — all critical failure modes cross-verified against production post-mortems
  • Slack Bolt Python — async adapter docs — Events API vs Socket Mode, AsyncApp + Starlette adapter
  • LiteLLM Router docs — model_list config, fallback chains, routing strategies
  • Crunchy Data: RLS for Tenants in PostgreSQL — FORCE ROW LEVEL SECURITY behavior
  • uv workspace docs — monorepo setup

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Celery async event loop issue — multiple community sources agree on the pattern; official Celery docs confirm workers are synchronous

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all versions verified against PyPI March 2026
  • Architecture: HIGH — patterns verified against official Slack, LiteLLM, and PostgreSQL docs
  • Pitfalls: HIGH — cross-verified against multiple production post-mortems and official docs
  • Portal (Auth.js v5): MEDIUM — official docs exist but not directly fetched via Context7; pattern widely corroborated

Research date: 2026-03-23
Valid until: 2026-04-22 (30 days — stable libraries; re-verify LiteLLM version before Plan 2 begins given its active release cadence)