Phase 1: Foundation - Research

Researched: 2026-03-23
Domain: Multi-tenant Python monorepo scaffolding, PostgreSQL RLS, LiteLLM backend pool, Slack Events API, basic agent orchestrator, Next.js admin portal
Confidence: HIGH (synthesized from project research docs verified against PyPI, official Slack docs, LiteLLM docs, and pgvector sources — all verification conducted 2026-03-22)


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • Next.js portal starts in Phase 1 — not deferred to Phase 3
  • Portal includes tenant CRUD (create, list, view, edit, delete tenants)
  • Portal includes Agent Designer module (job description, SOW, persona, system prompt, tool assignments, escalation rules)
  • Auth.js v5 with email/password authentication from the start — no hardcoded credentials, no throwaway auth code
  • Phase 3 scope narrows to: Stripe billing integration, onboarding wizard, cost tracking dashboard, channel connection wizard, and portal polish
  • AI employees have human-like names by default (e.g., "Mara", "Alex") — matches the "hire an AI employee" branding
  • Default persona tone: professional + warm — friendly but business-appropriate, like a good colleague
  • Always transparent about being AI when asked directly — never pretends to be human
  • Silent until spoken to — no auto-introduction message when added to a Slack channel
  • Operator configures name, role, persona, and system prompt via the Agent Designer in the portal
  • Agent responds to: @mentions in channels and direct messages
  • Does NOT monitor entire channels or respond to all messages (no "designated support channel" mode in v1)
  • Always replies in threads — keeps channels clean
  • Shows typing indicator while LLM is generating a response

Claude's Discretion

  • Thread follow-up behavior (auto-follow after first engagement vs always require @mention)
  • Portal UI layout and component choices (within shadcn/ui)
  • Default AI employee name suggestions
  • Agent Designer form layout and field ordering
  • Error message copy and formatting

Deferred Ideas (OUT OF SCOPE)

None — discussion stayed within phase scope </user_constraints>


<phase_requirements>

Phase Requirements

ID Description Research Support
CHAN-01 Channel Gateway normalizes messages from all channels into unified KonstructMessage format KonstructMessage Pydantic model defined in ARCHITECTURE.md; normalization pattern documented
CHAN-02 User can interact with AI employee via Slack (Events API — @mentions, DMs, thread replies) slack-bolt 1.27.0 AsyncApp pattern; Events API + async FastAPI integration documented in STACK.md
CHAN-05 Platform rate-limits requests per tenant and per channel with configurable thresholds slowapi with Redis token bucket pattern; ARCHITECTURE.md Message Router layer
AGNT-01 Tenant can configure a single AI employee with custom name, role, and persona Agent DB schema; system prompt assembly from persona fields; Agent Designer portal module
LLM-01 LiteLLM router abstracts LLM provider selection with fallback routing LiteLLM Router configuration pattern in ARCHITECTURE.md Pattern 4; fallback chain config
LLM-02 Platform supports Ollama (local) and commercial APIs (Anthropic, OpenAI) as LLM providers LiteLLM model_list config with ollama + anthropic + openai providers documented
TNNT-01 All tenant data is isolated via PostgreSQL Row Level Security RLS + FORCE ROW LEVEL SECURITY pattern; app role isolation; sqlalchemy-tenants integration
TNNT-02 Inbound messages are resolved to the correct tenant via channel metadata channel_connections table lookup; contextvar-based tenant propagation to RLS
TNNT-03 Per-tenant Redis namespace isolation for cache and session state {tenant_id}: key prefix pattern; shared utility enforcement described
TNNT-04 All data encrypted at rest (PostgreSQL, object storage) and in transit (TLS 1.3) PostgreSQL TDE, MinIO SSE, TLS config for all service-to-service; Docker Compose network isolation
PRTA-01 Operator can create, view, update, and delete tenants Next.js portal with TanStack Query + FastAPI CRUD endpoints; Auth.js v5 authentication
PRTA-02 Operator can design agents via Agent Designer — name, role, persona, system prompt, tool assignments, escalation rules Agent Designer as prominent portal module; form fields are text inputs; React Hook Form + Zod
</phase_requirements>

Summary

Phase 1 builds the entire vertical slice from Slack message to LLM response, with no tenant data leakage possible, rate limiting enforced, and an admin portal where operators can manage tenants and configure AI employees. It comprises four plans: (1) monorepo scaffolding and shared data models with PostgreSQL RLS, (2) the LiteLLM backend pool with Celery async dispatch, (3) Channel Gateway (Slack) + Message Router + basic Agent Orchestrator, and (4) the Next.js admin portal with Auth.js v5, tenant CRUD, and Agent Designer. Plans 1 and 2 must complete before Plan 3 begins; Plan 4 can overlap with Plan 3 once the DB schema stabilizes.

The most dangerous failure mode for Phase 1 is silent cross-tenant data leakage. PostgreSQL RLS only protects the application if FORCE ROW LEVEL SECURITY is applied to every table AND the application connects as a non-superuser role. This must be verified explicitly — RLS can appear to work while providing zero isolation. Every integration test must exercise a two-tenant fixture from the first day of DB schema work.

The second dangerous failure mode is async event loop conflicts in Celery. All Celery task functions must be synchronous def (not async def). The pattern must be established in Plan 1 scaffolding so it becomes the convention before any LLM task work begins in Plan 2.

Primary recommendation: Build the DB schema + RLS + Redis namespacing in Plan 1 with their isolation tests green before touching any channel or LLM code. Tenant isolation retrofitted later costs significantly more than tenant isolation designed first.


Standard Stack

Core Backend

Library Version Purpose Why Standard
Python 3.12 Runtime CLAUDE.md specified; LTS sweet spot — 3.13 ecosystem support lags
FastAPI 0.135.1 API framework Async-native, auto OpenAPI docs, DI system; de facto for async Python APIs
Pydantic v2 2.12.5 Data validation Mandatory for FastAPI; 20x faster than v1; strict mode for public interfaces
SQLAlchemy 2.0.48 ORM True async AsyncSession; 1.x patterns are deprecated and must not be used
Alembic 1.18.4 DB migrations Standard SQLAlchemy companion; requires async env.py modification
asyncpg 0.31.0 PostgreSQL async driver Required for SQLAlchemy async; faster than psycopg2 for concurrent workloads
PostgreSQL 16 Primary database CLAUDE.md specified; RLS is the v1 multi-tenancy mechanism
Redis 7.x Cache, pub/sub, rate limiting, Celery broker One service for multiple purposes; session state, namespaced per-tenant
Celery 5.6.2 Background job processing LLM calls dispatched async; prevents Slack webhook timeouts; mature ecosystem
uv latest Python package manager Workspace support for monorepo; replaces pip + virtualenv

LLM Integration

Library Version Purpose Why Standard
LiteLLM 1.82.5 LLM gateway Unified API across all providers; fallback routing; cost tracking; never call provider APIs directly
Ollama latest Local inference Docker service for dev; OpenAI-compatible API on port 11434

Channel Integration

Library Version Purpose Why Standard
slack-bolt 1.27.0 Slack Events API Official Slack SDK; use AsyncApp in HTTP mode (not Socket Mode in production)

Admin Portal

Library Version Purpose Why Standard
Next.js 16.x Portal framework CLAUDE.md specifies 14+; current stable is 16 (March 2026); App Router mature; start on 16 rather than an already-superseded release
TypeScript 5.x Type safety Strict mode required per CLAUDE.md
Tailwind CSS 4.x Styling Required by shadcn/ui; v4 uses CSS-native variables
shadcn/ui latest Component library Copy-to-project model; standard for Next.js admin portals 2025-2026
TanStack Query 5.x Server state Client-side fetching, caching, mutations against FastAPI
React Hook Form + Zod latest Form validation Standard pairing for shadcn/ui forms; Zod schemas shared with backend type defs
Auth.js v5 Portal authentication v5 rewritten for App Router compatibility; PostgreSQL session adapter; email/password from the start

Rate Limiting

Library Version Purpose Why Standard
slowapi latest FastAPI rate limiting Redis-backed token bucket; integrates directly with FastAPI; per-tenant + per-channel limits

Dev Tools

Tool Purpose
ruff Linting + formatting (replaces flake8, isort, black)
mypy --strict Type checking; no Any in public interfaces
pytest + pytest-asyncio Async test support; use httpx.AsyncClient not sync TestClient
Docker Compose All infra services (PostgreSQL, Redis, Ollama)

Installation

# Initialize Python monorepo
uv init konstruct && cd konstruct
# Running `uv init` inside the project registers each package as a workspace member
uv init --lib packages/gateway
uv init --lib packages/router
uv init --lib packages/orchestrator
uv init --lib packages/llm-pool
uv init --lib packages/shared

# Core backend dependencies
uv add fastapi[standard] pydantic[email] sqlalchemy[asyncio] asyncpg alembic
uv add litellm redis celery[redis] slack-bolt python-jose[cryptography] httpx slowapi

# Dev dependencies
uv add --dev ruff mypy pytest pytest-asyncio pytest-httpx

# Portal
mkdir -p packages/portal && cd packages/portal
npx create-next-app@latest . --typescript --tailwind --eslint --app
npx shadcn@latest init
npm install @tanstack/react-query react-hook-form zod next-auth@beta  # Auth.js v5 is published as next-auth@beta

Architecture Patterns

konstruct/
├── packages/
│   ├── gateway/                     # Channel Gateway service (FastAPI)
│   │   ├── channels/
│   │   │   └── slack.py             # Slack Events API handler (HTTP mode, AsyncApp)
│   │   ├── normalize.py             # Slack event → KonstructMessage
│   │   ├── verify.py                # X-Slack-Signature verification
│   │   └── main.py                  # FastAPI app, /slack/events route
│   │
│   ├── router/                      # Message Router service (FastAPI)
│   │   ├── tenant.py                # workspace_id → tenant_id lookup
│   │   ├── ratelimit.py             # Redis token bucket per tenant/channel
│   │   ├── idempotency.py           # Redis dedup (message_id, TTL 24h)
│   │   ├── context.py               # Load agent config from DB
│   │   └── main.py
│   │
│   ├── orchestrator/                # Agent Orchestrator (Celery workers)
│   │   ├── tasks.py                 # Celery task: handle_message (sync def, NOT async def)
│   │   ├── agents/
│   │   │   ├── builder.py           # Assemble agent prompt from persona + history
│   │   │   └── runner.py            # LLM call → parse response → send reply
│   │   └── main.py                  # Celery worker entry point
│   │
│   ├── llm-pool/                    # LLM Backend Pool service (LiteLLM wrapper)
│   │   ├── router.py                # LiteLLM Router config (model groups + fallback)
│   │   ├── providers/
│   │   │   ├── ollama.py
│   │   │   ├── anthropic.py
│   │   │   └── openai.py
│   │   └── main.py                  # FastAPI app exposing /complete endpoint
│   │
│   ├── portal/                      # Next.js 16 Admin Dashboard
│   │   ├── app/
│   │   │   ├── (auth)/              # /login route
│   │   │   ├── dashboard/           # Post-auth layout
│   │   │   ├── tenants/             # Tenant CRUD pages
│   │   │   ├── agents/              # Agent Designer module
│   │   │   └── api/auth/            # Auth.js route handler
│   │   ├── components/              # shadcn/ui components
│   │   └── lib/
│   │       ├── api.ts               # TanStack Query hooks + API client
│   │       └── auth.ts              # Auth.js config
│   │
│   └── shared/                      # Shared Python library (no service)
│       ├── models/
│       │   ├── message.py           # KonstructMessage Pydantic model
│       │   ├── tenant.py            # Tenant, Agent, ChannelConnection SQLAlchemy models
│       │   └── auth.py              # Portal user models
│       ├── db.py                    # SQLAlchemy async engine + session factory
│       ├── rls.py                   # SET app.current_tenant contextvar + hook
│       └── config.py                # Pydantic Settings (env vars)
│
├── migrations/                      # Alembic (single migration history)
├── tests/
│   ├── unit/
│   └── integration/                 # Two-tenant fixture tests (REQUIRED in Plan 1)
├── docker-compose.yml               # PostgreSQL 16, Redis 7, Ollama, all services
└── pyproject.toml                   # uv workspace config

Pattern 1: Immediate-Acknowledge, Async-Process

What: Channel Gateway returns HTTP 200 to Slack within 3 seconds, without LLM work. Processing is dispatched to Celery. The AI reply arrives as a follow-up Slack message.

When to use: Always. Slack retries and flags apps as unhealthy if no 2xx within 3 seconds. This is non-negotiable.

Example:

# packages/gateway/channels/slack.py
@app.event("message")
async def handle_message(event, say, client):
    msg = normalize_slack(event)
    if await is_duplicate(msg.id):  # Redis idempotency key
        return
    handle_message_task.delay(msg.model_dump())
    # HTTP 200 returned implicitly — Slack is satisfied
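The `is_duplicate` check above can be a single atomic Redis `SET NX` with a TTL. A sketch, with a small in-memory stand-in for `redis.asyncio.Redis` (only `set` with `nx=`/`ex=` is modeled; the real client behaves the same way):

```python
import asyncio

DEDUP_TTL_SECONDS = 24 * 60 * 60  # matches the 24h idempotency window

async def is_duplicate(r, tenant_id: str, message_id: str) -> bool:
    # SET ... NX succeeds only when the key did not exist,
    # so "could not set" means "already seen"
    key = f"{tenant_id}:dedup:{message_id}"
    created = await r.set(key, "1", nx=True, ex=DEDUP_TTL_SECONDS)
    return not created

class FakeRedis:
    """In-memory stand-in for redis.asyncio.Redis (TTL expiry not simulated)."""
    def __init__(self) -> None:
        self.store: dict[str, str] = {}
    async def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # redis returns nil when NX fails
        self.store[key] = value
        return True
```

First delivery of a message returns False (process it); a Slack retry of the same message returns True (drop it).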

Pattern 2: Tenant-Scoped RLS via SQLAlchemy Event Hook

What: Set app.current_tenant on the PostgreSQL connection before every query. RLS policies use this setting to filter every row automatically. Application code never adds WHERE tenant_id = ... manually.

When to use: Every DB interaction in the router and orchestrator. This is the primary tenant isolation mechanism.

Example:

# packages/shared/rls.py
import uuid
from contextvars import ContextVar

from sqlalchemy import event

current_tenant_id: ContextVar[str | None] = ContextVar("current_tenant_id", default=None)

@event.listens_for(engine.sync_engine, "before_cursor_execute")
def set_tenant_context(conn, cursor, statement, parameters, context, executemany):
    tenant_id = current_tenant_id.get()
    if tenant_id:
        # SET LOCAL cannot take bind parameters, so validate as a UUID
        # before interpolating — this guard is what prevents SQL injection
        safe_id = str(uuid.UUID(tenant_id))
        cursor.execute(f"SET LOCAL app.current_tenant = '{safe_id}'")

Critical RLS migration requirements:

-- Every table must have both the policy AND FORCE applied
ALTER TABLE agents ENABLE ROW LEVEL SECURITY;
ALTER TABLE agents FORCE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON agents
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

-- Application MUST connect as this role (never postgres superuser)
CREATE ROLE konstruct_app WITH LOGIN PASSWORD '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO konstruct_app;

Pattern 3: LiteLLM Router as Internal Singleton Service

What: LLM Backend Pool exposes a single internal HTTP /complete endpoint. Orchestrator workers call this endpoint. LiteLLM Router behind it handles provider selection, fallback, and cost tracking.

When to use: All LLM calls. Never call Anthropic/OpenAI SDKs directly from the orchestrator.

Example:

# packages/llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434"
            }
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514"}
        },
        {
            "model_name": "quality",
            "litellm_params": {"model": "openai/gpt-4o"}  # fallback
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    routing_strategy="latency-based-routing",
)

Pin LiteLLM version in Docker — never use latest. A September 2025 release caused OOM errors on Kubernetes.
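The /complete endpoint itself stays thin: it maps a request body onto `router.acompletion(...)` kwargs. A stdlib sketch of that mapping (the request field names here are assumptions for illustration, not a documented LiteLLM schema):

```python
def completion_kwargs(body: dict) -> dict:
    """Translate a /complete request body into LiteLLM Router call kwargs.

    `model_preference` comes from the agent row ('fast' or 'quality') and
    selects the Router model group; unknown values fall back to 'quality'.
    """
    group = body.get("model_preference", "quality")
    if group not in {"fast", "quality"}:
        group = "quality"
    return {
        "model": group,  # the Router resolves the group to a concrete provider
        "messages": [
            {"role": "system", "content": body["system_prompt"]},
            *body.get("history", []),
            {"role": "user", "content": body["text"]},
        ],
    }
```

The FastAPI handler would then `await router.acompletion(**completion_kwargs(body))` and return the completion text.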

Pattern 4: Celery Task Pattern (SYNC, not async)

What: Celery tasks are synchronous def functions. Async code inside tasks is wrapped with asyncio.run(). This pattern must be established in Plan 1 scaffolding.

Example:

# packages/orchestrator/tasks.py
from celery import Celery
import asyncio

app = Celery("orchestrator", broker="redis://redis:6379/0")

# CORRECT: sync def
@app.task
def handle_message(message_data: dict) -> None:
    asyncio.run(_process_message(message_data))

async def _process_message(message_data: dict) -> None:
    # async DB and LLM calls here
    ...

# WRONG — DO NOT DO THIS:
# @app.task
# async def handle_message(message_data: dict) -> None:  ← RuntimeError

Pattern 5: Redis Namespacing (Tenant Isolation)

What: All Redis keys include {tenant_id}: prefix. Enforce via a shared utility function — convention is insufficient.

Example:

# packages/shared/redis_keys.py
def rate_limit_key(tenant_id: str, channel: str) -> str:
    return f"{tenant_id}:ratelimit:{channel}"

def idempotency_key(tenant_id: str, message_id: str) -> str:
    return f"{tenant_id}:dedup:{message_id}"

def session_key(tenant_id: str, thread_id: str) -> str:
    return f"{tenant_id}:session:{thread_id}"

Pattern 6: Slack AsyncApp + FastAPI Integration

What: Mount slack-bolt AsyncApp inside FastAPI as an ASGI sub-application. HTTP mode only (not Socket Mode) for production.

Example:

# packages/gateway/main.py
from fastapi import FastAPI
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler

slack_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
handler = AsyncSlackRequestHandler(slack_app)

fastapi_app = FastAPI()

@fastapi_app.post("/slack/events")
async def slack_events(req: Request):
    return await handler.handle(req)

Pattern 7: Auth.js v5 with Next.js App Router

What: Auth.js v5 (the rewrite formerly known as NextAuth.js) exposes the auth() helper in server components and API routes. With the Credentials provider, sessions use the JWT strategy; a PostgreSQL adapter (Drizzle or Prisma) stores user records, not sessions.

Example:

// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { DrizzleAdapter } from "@auth/drizzle-adapter"
import { db } from "./db"  // Drizzle PostgreSQL client

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        // validate against DB user table
      },
    }),
  ],
  adapter: DrizzleAdapter(db),  // user persistence only — Credentials sessions stay JWT
})

Anti-Patterns to Avoid

  • Async Celery tasks: Never write async def Celery tasks. Use asyncio.run() inside sync def tasks.
  • Superuser PostgreSQL connections: Application must never connect as postgres superuser. RLS is bypassed silently.
  • Missing FORCE ROW LEVEL SECURITY: RLS policies without FORCE are bypassed by the table owner. Apply ALTER TABLE ... FORCE ROW LEVEL SECURITY to every table.
  • Unnamespaced Redis keys: Any Redis key without {tenant_id}: prefix can collide across tenants.
  • LLM work inside webhook handler: Slack requires HTTP 200 in 3 seconds. LLM calls take 5-30 seconds. Always dispatch to Celery.
  • Direct provider SDK calls from orchestrator: Always go through LiteLLM pool. Never import anthropic or openai SDK directly in orchestrator.
  • Socket Mode in production: Socket Mode breaks horizontal scaling. Use HTTP Events API for production.
  • Next.js 14 specifically: Current stable is 16 (March 2026). Start on 16.

Don't Hand-Roll

Problem Don't Build Use Instead Why
Multi-provider LLM routing with fallback Custom provider selector + retry logic LiteLLM Router LiteLLM handles fallback chains, cost tracking, load balancing, and provider abstraction. Custom solutions miss edge cases (rate limit windows, streaming failures, provider-specific error codes)
Per-tenant DB row filtering Application-level WHERE tenant_id = ? on every query PostgreSQL RLS Application filters are forgotten. RLS is enforced at the DB layer even if code has bugs. FORCE RLS makes it apply to all connections
Redis token bucket rate limiting Custom counter + expiry logic slowapi Edge cases in token bucket implementations (burst handling, clock skew, reset timing) are numerous. slowapi handles them correctly with Redis backend
Portal form validation Custom form state machine React Hook Form + Zod Form validation edge cases (async validation, dependent fields, submission state) are handled by the library. Zod schemas provide shared type safety with the backend
Portal auth session management Custom JWT storage + refresh logic Auth.js v5 Session security (CSRF, rotation, expiry, replay protection) is extremely easy to get wrong. Auth.js is the standard for Next.js
Slack signature verification Custom HMAC implementation slack-bolt (built-in) slack-bolt verifies X-Slack-Signature automatically in AsyncApp. Hand-rolling misses timing attack prevention
Redis key namespacing convention Documentation + code review Shared utility function Conventions are forgotten. Centralizing construction in redis_keys.py means unnamespaced keys can only appear in one reviewable module

Key insight: In a multi-tenant platform, the most dangerous custom solutions are the ones that appear to work in testing (single tenant) but fail in production (multiple tenants) through data leakage.


Common Pitfalls

Pitfall 1: RLS Appears to Work But Provides Zero Isolation

What goes wrong: PostgreSQL RLS policies exist on the tables, tests pass, but the application connects as the postgres superuser, which bypasses all RLS policies silently. No error is raised. Tenant data is fully accessible across tenants.

Why it happens: Early dev uses postgres superuser. RLS is added. Nobody verifies it actually applies. BYPASSRLS is implicit for superusers and table owners unless explicitly overridden.

How to avoid:

  1. Create a konstruct_app role before writing the first migration
  2. Apply FORCE ROW LEVEL SECURITY to every table with RLS
  3. All application connections use konstruct_app (never postgres)
  4. Tenant isolation tests connect as konstruct_app in pytest fixtures
  5. Verify with: SELECT relforcerowsecurity FROM pg_class WHERE relname = 'agents' — must be true

Warning signs: Application connecting as postgres; RLS tests using psql instead of application role.
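Step 5 can be automated: query pg_class for every tenant-scoped table and fail the suite if any lacks FORCE. A sketch of the query plus a pure checker for its result rows (the table list is an example; the integration suite would run the SQL and pass the rows through the checker):

```python
# Run against the DB as any role; relforcerowsecurity must be true everywhere
FORCED_RLS_SQL = """
    SELECT relname, relforcerowsecurity
    FROM pg_class
    WHERE relname = ANY(:tables)
"""

TENANT_TABLES = ["agents", "channel_connections", "messages"]  # example list

def unforced_tables(rows: list[tuple[str, bool]]) -> list[str]:
    """Return tables missing FORCE ROW LEVEL SECURITY — must be empty."""
    return sorted(name for name, forced in rows if not forced)

# In the integration suite: assert unforced_tables(result) == []
```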

Pitfall 2: Silent Celery Task Hang from Async/Await

What goes wrong: Celery tasks written as async def cause RuntimeError: This event loop is already running or hang silently without completing. The task appears to be accepted by the broker but never produces a result or error.

Why it happens: FastAPI codebase is all async def. Developers naturally write Celery tasks the same way. The incompatibility only appears at runtime.

How to avoid: All Celery tasks are def (synchronous). Async code within tasks is called via asyncio.run(). Establish this pattern in the first Celery task stub (Plan 1) before any LLM work.

Warning signs: RuntimeError: This event loop is already running in Celery worker logs; tasks accepted but never completed.

Pitfall 3: LiteLLM Request Log Table Degradation

What goes wrong: LiteLLM logs every request to PostgreSQL. After ~1M rows (~10 days at 100k req/day), the table causes measurable latency on every LLM call. There are also documented OOM issues with specific versions.

How to avoid:

  • Implement a Celery Beat log rotation job from day one that deletes rows older than N days
  • Set LITELLM_LOG_LEVEL=ERROR in production
  • Pin LiteLLM to 1.82.5 in Docker — do not use latest (September 2025 release had OOM issues)
  • Do not use LiteLLM's built-in caching layer (documented bug: cache hit adds 10+ seconds latency); implement caching above LiteLLM in the orchestrator using Redis directly

Warning signs: LiteLLM response times creeping up over 2-3 hours; litellm_logs table exceeding 500k rows.
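The rotation job reduces to a periodic DELETE. A sketch of the SQL builder plus a Celery Beat schedule entry expressed as plain data — the table and column names loosely follow LiteLLM's defaults and must be verified against the deployed schema:

```python
def rotation_sql(table: str = "litellm_spendlogs", keep_days: int = 7) -> str:
    """DELETE rows older than keep_days; run from a sync Celery task."""
    if keep_days < 1:
        raise ValueError("keep_days must be >= 1")
    return (
        f"DELETE FROM {table} "
        f"WHERE created_at < NOW() - INTERVAL '{keep_days} days'"
    )

# Celery Beat entry — attach via app.conf.beat_schedule in the worker
BEAT_SCHEDULE = {
    "rotate-litellm-logs": {
        "task": "orchestrator.tasks.rotate_litellm_logs",  # hypothetical task name
        "schedule": 24 * 60 * 60,  # once a day, in seconds
    },
}
```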

Pitfall 4: Cross-Tenant Redis Key Collision

What goes wrong: Conversation history or rate limit counters stored under bare keys (e.g., history:{thread_id}) collide when two tenants happen to have the same Slack thread ID pattern. Tenant A reads Tenant B's session data.

How to avoid: All Redis keys use {tenant_id}: prefix enforced via shared utility function in packages/shared/redis_keys.py. No key construction outside this module.

Pitfall 5: Slack Webhook Acknowledgment Timeout

What goes wrong: LLM call is made synchronously inside the Slack event handler. Call takes 8 seconds. Slack receives no 200 within 3 seconds, retries the event, the agent processes it twice, and the app is flagged as unhealthy.

How to avoid: Dispatch to Celery immediately. Return HTTP 200. Send the AI reply as a follow-up message via client.chat_postMessage().

Pitfall 6: Thread Follow-Up Behavior Decision

The decision: This is marked as Claude's discretion. The two options are:

  • Auto-follow: After the first @mention in a thread, subsequent messages in the same thread trigger responses without re-mentioning the agent. Better UX for sustained conversations.
  • Require @mention each time: Safer, more explicit, never accidental. Simpler to implement.

Recommendation: Implement auto-follow for engaged threads in Phase 1. The "AI employee" metaphor implies sustained engagement — requiring an @mention on every follow-up reply breaks the mental model of talking to a colleague. Track the thread_id in Redis after first engagement and respond to any message in that thread until a configurable idle timeout expires (default: 30 minutes of inactivity).
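The auto-follow recommendation reduces to one Redis key with a sliding TTL plus a small response gate. A sketch — the `should_respond` helper is hypothetical; the real check would read and refresh the key through the shared redis_keys.py style:

```python
ENGAGED_TTL_SECONDS = 30 * 60  # 30-minute idle timeout

def engaged_thread_key(tenant_id: str, thread_id: str) -> str:
    # Same {tenant_id}: prefix rule as every other Redis key
    return f"{tenant_id}:engaged_thread:{thread_id}"

def should_respond(is_mention: bool, is_dm: bool, thread_engaged: bool) -> bool:
    """Gate for the Message Router: reply to @mentions, DMs, and any message
    in a thread the agent has already engaged (until the TTL expires)."""
    return is_mention or is_dm or thread_engaged

# On each reply the agent sends, refresh the engagement window:
#   await r.set(engaged_thread_key(tid, thread), "1", ex=ENGAGED_TTL_SECONDS)
```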


Code Examples

KonstructMessage Pydantic Model

# packages/shared/models/message.py
from enum import StrEnum
from pydantic import BaseModel, Field
import uuid
from datetime import datetime

class ChannelType(StrEnum):
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    MATTERMOST = "mattermost"

class SenderInfo(BaseModel):
    user_id: str
    display_name: str
    is_bot: bool = False

class MessageContent(BaseModel):
    text: str
    attachments: list[dict] = Field(default_factory=list)

class KonstructMessage(BaseModel):
    id: str = Field(default_factory=lambda: str(uuid.uuid4()))
    tenant_id: str | None = None  # Populated by Router after tenant resolution
    channel: ChannelType
    channel_metadata: dict  # Workspace/org IDs for tenant resolution
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
    reply_to: str | None = None
    context: dict = Field(default_factory=dict)

PostgreSQL RLS Migration Pattern

# migrations/versions/001_initial_schema.py
import os

from alembic import op

def upgrade():
    # CREATE ROLE cannot take bind parameters — read the password from the
    # environment instead of hardcoding it in the migration
    app_password = os.environ["KONSTRUCT_APP_PASSWORD"]
    op.execute(f"CREATE ROLE konstruct_app WITH LOGIN PASSWORD '{app_password}'")

    op.execute("""
        CREATE TABLE tenants (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            name TEXT NOT NULL,
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    op.execute("""
        CREATE TABLE agents (
            id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
            tenant_id UUID NOT NULL REFERENCES tenants(id),
            name TEXT NOT NULL,
            role TEXT NOT NULL,
            persona TEXT,
            system_prompt TEXT,
            model_preference TEXT DEFAULT 'quality',
            created_at TIMESTAMPTZ DEFAULT NOW()
        )
    """)

    # RLS on agents table
    op.execute("ALTER TABLE agents ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE agents FORCE ROW LEVEL SECURITY")
    op.execute("""
        CREATE POLICY tenant_isolation ON agents
            USING (tenant_id = current_setting('app.current_tenant')::uuid)
    """)

    # Grant to application role (not superuser)
    op.execute("GRANT ALL ON ALL TABLES IN SCHEMA public TO konstruct_app")

LiteLLM Fallback Configuration

# packages/llm-pool/router.py
from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "fast",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": "http://ollama:11434",
            },
        },
        {
            "model_name": "quality",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",
                "api_key": settings.anthropic_api_key,
            },
        },
        {
            "model_name": "quality",  # Same group = fallback
            "litellm_params": {
                "model": "openai/gpt-4o",
                "api_key": settings.openai_api_key,
            },
        },
    ],
    fallbacks=[{"quality": ["fast"]}],
    num_retries=2,
    routing_strategy="latency-based-routing",
    set_verbose=False,  # Reduce log volume
)

Slack AsyncApp + FastAPI Mount

# packages/gateway/main.py
from fastapi import FastAPI, Request
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.starlette.async_handler import AsyncSlackRequestHandler
from .channels.slack import register_slack_handlers

slack_bolt_app = AsyncApp(
    token=settings.slack_bot_token,
    signing_secret=settings.slack_signing_secret,
)
register_slack_handlers(slack_bolt_app)
slack_handler = AsyncSlackRequestHandler(slack_bolt_app)

app = FastAPI()

@app.post("/slack/events")
async def slack_events(request: Request):
    return await slack_handler.handle(request)

@app.get("/health")
async def health():
    return {"status": "ok"}

Auth.js v5 Portal Setup

// packages/portal/lib/auth.ts
import NextAuth from "next-auth"
import Credentials from "next-auth/providers/credentials"
import { z } from "zod"

const loginSchema = z.object({
  email: z.string().email(),
  password: z.string().min(8),
})

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: {
        email: { label: "Email", type: "email" },
        password: { label: "Password", type: "password" },
      },
      async authorize(credentials) {
        const parsed = loginSchema.safeParse(credentials)
        if (!parsed.success) return null
        // Validate against DB via internal API
        const response = await fetch(`${process.env.API_URL}/auth/verify`, {
          method: "POST",
          body: JSON.stringify(parsed.data),
        })
        if (!response.ok) return null
        return response.json()
      },
    }),
  ],
  pages: { signIn: "/login" },
  session: { strategy: "jwt" },
})

State of the Art

Old Approach Current Approach When Changed Impact
NextAuth.js v4 Auth.js v5 2024 Complete rewrite; v5 is App Router native; v4 patterns don't apply
Next.js 14 Next.js 16 2025-2026 Turbopack default; improved App Router; use 16 not 14
Tailwind CSS v3 Tailwind CSS v4 2025 CSS-native variables; JIT always on; JIT config removed
SQLAlchemy 1.x session.query() SQLAlchemy 2.0 AsyncSession + select() 2023 1.x patterns are deprecated and cause async bugs in FastAPI
psycopg2 asyncpg ongoing psycopg2 blocks the event loop; asyncpg is required for async FastAPI
LangGraph/CrewAI for single agent Custom orchestrator + direct LiteLLM 2024-2025 Frameworks add premature abstraction for single-agent v1; evaluate for v2 multi-agent
Flake8 + Black + isort ruff 2023-2024 Single tool replaces three; 100x faster; CLAUDE.md already specifies ruff
Slack Socket Mode Slack Events API (HTTP) permanent Socket Mode breaks horizontal scaling; HTTP is production-correct

Deprecated/outdated patterns to never use:

  • session.query(Model).filter_by(...) — SQLAlchemy 1.x style, deprecated
  • psycopg2 as PostgreSQL driver — synchronous, blocks event loop
  • async def Celery tasks — runtime error or silent hang
  • CREATE POLICY ... USING (...) without FORCE ROW LEVEL SECURITY — bypassed by superuser
  • Unnamespaced Redis keys — tenant collision risk
  • Socket Mode for Slack — not production-safe

Open Questions

  1. Thread Follow-Up Behavior (Claude's discretion)

    • What we know: User confirmed this is Claude's discretion
    • Recommendation: Auto-follow engaged threads with 30-minute idle timeout. Implement as Redis key {tenant_id}:engaged_thread:{thread_id} with TTL. See Pattern section above.
  2. Typing Indicator Implementation

    • What we know: User confirmed "shows typing indicator while LLM is generating"
    • What's needed: Slack has no native typing-indicator API for bots on the Events API, and chat.postEphemeral messages don't persist, so neither produces a true "typing" effect in threads. The standard workaround is a placeholder: post a brief "Thinking..." message (often with a :loading: spinner emoji) via chat.postMessage, then replace it with the real response via chat.update.
    • Recommendation: Post a placeholder message ("Thinking...") immediately upon receiving the Slack event (before dispatching to Celery), store the ts (message timestamp) in the Celery task payload, then use chat.update to replace it with the real response.
  3. Docker Compose Service Topology for Phase 1

    • What we know: All services must run locally including Ollama
    • What's unclear: Whether Ollama requires GPU passthrough in the dev environment affects the compose file
    • Recommendation: Include Ollama with GPU optional (a deploy.resources.reservations.devices block with count: all, falling back to CPU when no GPU is present). Use a small model (qwen3:8b or llama3.2:3b) for dev to avoid requiring a GPU.
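
The placeholder-then-update recommendation in question 2 can be sketched as below. `FakeSlackClient` stands in for slack_sdk's AsyncWebClient (whose `chat_postMessage` and `chat_update` methods take the same keyword arguments), and `reply_with_placeholder` is a hypothetical helper, not project code; in production the placeholder `ts` would ride along in the Celery task payload rather than being awaited inline.

```python
# Sketch of "post placeholder, then chat.update" for the typing indicator.
# FakeSlackClient is a stand-in for slack_sdk's AsyncWebClient.
import asyncio

class FakeSlackClient:
    def __init__(self):
        self.calls = []

    async def chat_postMessage(self, *, channel, thread_ts, text):
        self.calls.append(("chat.postMessage", text))
        return {"ts": "1700000000.000100"}  # Slack returns the message timestamp

    async def chat_update(self, *, channel, ts, text):
        self.calls.append(("chat.update", text))
        return {"ok": True}

async def reply_with_placeholder(client, channel, thread_ts, generate):
    # 1. Post "Thinking..." immediately so the user sees activity.
    placeholder = await client.chat_postMessage(
        channel=channel, thread_ts=thread_ts, text=":hourglass: Thinking..."
    )
    # 2. Generate the real response (the slow LLM call).
    answer = await generate()
    # 3. Replace the placeholder with the real response.
    await client.chat_update(channel=channel, ts=placeholder["ts"], text=answer)
    return answer

client = FakeSlackClient()

async def fake_llm():
    return "Here is the summary you asked for."

result = asyncio.run(reply_with_placeholder(client, "C123", "1699.1", fake_llm))
print([name for name, _ in client.calls])  # → ['chat.postMessage', 'chat.update']
```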

Validation Architecture

Test Framework

| Property | Value |
|---|---|
| Framework | pytest 8.x + pytest-asyncio 0.24+ |
| Config file | pyproject.toml [tool.pytest.ini_options] section (Wave 0) |
| Quick run command | pytest tests/unit -x -q |
| Full suite command | pytest tests/ -x |

Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|---|---|---|---|---|
| CHAN-01 | KonstructMessage normalization from Slack payload | unit | pytest tests/unit/test_normalize.py -x | Wave 0 |
| CHAN-02 | Slack @mention triggers agent response in-thread | integration | pytest tests/integration/test_slack_flow.py -x | Wave 0 |
| CHAN-05 | Rate limit rejects over-threshold requests with informative response | unit + integration | pytest tests/unit/test_ratelimit.py tests/integration/test_ratelimit.py -x | Wave 0 |
| AGNT-01 | Agent persona is reflected in LLM response | integration | pytest tests/integration/test_agent_persona.py -x | Wave 0 |
| LLM-01 | LiteLLM Router falls back from unavailable provider to next | integration | pytest tests/integration/test_llm_fallback.py -x | Wave 0 |
| LLM-02 | Requests route to Ollama and Anthropic/OpenAI | integration | pytest tests/integration/test_llm_providers.py -x | Wave 0 |
| TNNT-01 | Tenant A cannot access Tenant B's data via DB query | integration | pytest tests/integration/test_tenant_isolation.py -x | Wave 0 |
| TNNT-02 | Inbound message resolves to correct tenant from channel metadata | unit | pytest tests/unit/test_tenant_resolution.py -x | Wave 0 |
| TNNT-03 | Tenant A cannot read Tenant B's Redis keys | unit | pytest tests/unit/test_redis_namespacing.py -x | Wave 0 |
| TNNT-04 | TLS enforced on all inter-service communication | manual | Verify docker-compose TLS config (no automated test) | manual-only |
| PRTA-01 | Operator can create/read/update/delete tenants via portal API | integration | pytest tests/integration/test_portal_tenants.py -x | Wave 0 |
| PRTA-02 | Agent Designer saves and loads all fields via portal API | integration | pytest tests/integration/test_portal_agents.py -x | Wave 0 |
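
CHAN-05's token-bucket behavior can be unit-tested without real time by injecting the clock. The class below is a minimal in-process sketch, not the project's implementation; the real limiter would be keyed per tenant and likely backed by Redis.

```python
# Minimal token bucket for CHAN-05-style tests. Class name and
# parameters are illustrative only.
class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity, self.tokens + (now - self.last) * self.refill_per_sec
        )
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should return the informative rate-limit response

bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
print(bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0))  # → True True False
print(bucket.allow(1.0))  # one token refilled after 1s → True
```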

Sampling Rate

  • Per task commit: pytest tests/unit -x -q
  • Per wave merge: pytest tests/ -x
  • Phase gate: Full suite green before /gsd:verify-work
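
The LLM-01 and LLM-02 rows above hinge on the Router's model_list and fallbacks settings. The dict below sketches that shape as described in the LiteLLM Router docs (listed under Sources); the model names and env-var references are placeholders, and in application code the dict would be passed to litellm.Router(...).

```python
# Sketch of a LiteLLM Router config covering LLM-01 (fallback chain)
# and LLM-02 (local Ollama + hosted provider). Model ids are placeholders.
import os

router_config = {
    "model_list": [
        {
            # Local model served by Ollama for dev traffic.
            "model_name": "local-default",
            "litellm_params": {
                "model": "ollama/qwen3:8b",
                "api_base": os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434"),
            },
        },
        {
            # Hosted model used when the local provider is unavailable.
            "model_name": "hosted-fallback",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-20250514",  # placeholder id
                "api_key": os.environ.get("ANTHROPIC_API_KEY", ""),
            },
        },
    ],
    # LLM-01: if "local-default" fails, retry on "hosted-fallback".
    "fallbacks": [{"local-default": ["hosted-fallback"]}],
}

# In application code this would become roughly:
#   router = litellm.Router(**router_config)
#   await router.acompletion(model="local-default", messages=[...])
```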

Wave 0 Gaps

All test files are new — this is a greenfield project. Required Wave 0 setup:

  • pyproject.toml — add [tool.pytest.ini_options] with asyncio_mode = "auto" and testpaths = ["tests"]
  • tests/conftest.py — shared fixtures: async DB session, two-tenant fixture (tenant_a, tenant_b), Redis mock, LiteLLM mock
  • tests/unit/test_normalize.py — CHAN-01: Slack payload → KonstructMessage
  • tests/unit/test_tenant_resolution.py — TNNT-02: workspace_id lookup
  • tests/unit/test_ratelimit.py — CHAN-05: token bucket behavior
  • tests/unit/test_redis_namespacing.py — TNNT-03: key prefix enforcement
  • tests/integration/test_tenant_isolation.py — TNNT-01: two-tenant RLS fixture (most critical test in phase)
  • tests/integration/test_slack_flow.py — CHAN-02: end-to-end Slack → LLM → reply (with mocked Slack client)
  • tests/integration/test_llm_fallback.py — LLM-01: LiteLLM fallback behavior
  • tests/integration/test_llm_providers.py — LLM-02: Ollama + Anthropic routing
  • tests/integration/test_agent_persona.py — AGNT-01: persona reflected in LLM prompt
  • tests/integration/test_portal_tenants.py — PRTA-01: tenant CRUD API
  • tests/integration/test_portal_agents.py — PRTA-02: Agent Designer API
  • Framework install: uv add --dev pytest pytest-asyncio pytest-httpx — add to pyproject.toml
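
Taken together, the pyproject additions above might look like the following sketch; the exact dependency-group layout depends on the uv version (recent uv writes dev dependencies to the PEP 735 [dependency-groups] table).

```toml
# pyproject.toml — Wave 0 test configuration (sketch)
[tool.pytest.ini_options]
asyncio_mode = "auto"    # pytest-asyncio: collect async tests without markers
testpaths = ["tests"]

[dependency-groups]
dev = [
    "pytest",
    "pytest-asyncio",
    "pytest-httpx",
]
```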

Sources

Primary (HIGH confidence)

  • PyPI (verified 2026-03-22): FastAPI 0.135.1, SQLAlchemy 2.0.48, Pydantic 2.12.5, Alembic 1.18.4, asyncpg 0.31.0, Celery 5.6.2, LiteLLM 1.82.5, slack-bolt 1.27.0
  • .planning/research/STACK.md — all version numbers and library rationale
  • .planning/research/ARCHITECTURE.md — service topology, data flow patterns, anti-patterns
  • .planning/research/PITFALLS.md — all critical failure modes cross-verified against production post-mortems
  • Slack Bolt Python — async adapter docs — Events API vs Socket Mode, AsyncApp + Starlette adapter
  • LiteLLM Router docs — model_list config, fallback chains, routing strategies
  • Crunchy Data: RLS for Tenants in PostgreSQL — FORCE ROW LEVEL SECURITY behavior
  • uv workspace docs — monorepo setup

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • Celery async event loop issue — multiple community sources agree on the pattern; official Celery docs confirm workers are synchronous

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all versions verified against PyPI March 2026
  • Architecture: HIGH — patterns verified against official Slack, LiteLLM, and PostgreSQL docs
  • Pitfalls: HIGH — cross-verified against multiple production post-mortems and official docs
  • Portal (Auth.js v5): MEDIUM — official docs exist but not directly fetched via Context7; pattern widely corroborated

Research date: 2026-03-23
Valid until: 2026-04-22 (30 days — stable libraries; re-verify LiteLLM version before Plan 2 begins given its active release cadence)