CLAUDE.md — Konstruct
What is Konstruct?
Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.
Think of it as "Hire an AI department" — not another chatbot SaaS.
Project Identity
- Codename: Konstruct
- Domain: TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- Tagline ideas: "Build your AI workforce" / "AI teams that just work"
- Inspired by: paperclip.ing
- Differentiation: Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support
Architecture Overview
Core Mental Model
Client (Slack/Teams/etc.)
│
▼
┌─────────────────────┐
│ Channel Gateway │ ← Unified ingress for all messaging platforms
│ (webhook/WS) │
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Message Router │ ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Agent Orchestrator │ ← Agent selection, tool dispatch, memory, handoffs
│ (per-tenant) │
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ LLM Backend Pool │ ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
Key Architectural Principles
- Channel-agnostic core — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
- Tenant-isolated agent state — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
- LLM backend as a pluggable resource — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
- Agents are composable — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.
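The channel-agnostic principle can be sketched as an adapter interface — a minimal, illustrative sketch (class and field names here are hypothetical, not the actual gateway API; `NormalizedMessage` stands in for the full unified format defined later):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class NormalizedMessage:
    """Simplified stand-in for the unified internal message format."""
    tenant_hint: str  # channel-native org identifier (e.g. Slack workspace ID)
    sender_id: str
    text: str


class ChannelAdapter(ABC):
    """Each platform implements normalize(); nothing downstream of the
    gateway ever touches platform-specific payload shapes."""

    @abstractmethod
    def normalize(self, raw_event: dict) -> NormalizedMessage: ...


class SlackAdapter(ChannelAdapter):
    def normalize(self, raw_event: dict) -> NormalizedMessage:
        event = raw_event["event"]  # Slack Events API nests the message here
        return NormalizedMessage(raw_event["team_id"], event["user"], event["text"])


class TelegramAdapter(ChannelAdapter):
    def normalize(self, raw_event: dict) -> NormalizedMessage:
        msg = raw_event["message"]  # Telegram Bot API update shape
        return NormalizedMessage(str(msg["chat"]["id"]), str(msg["from"]["id"]), msg["text"])
```

Adding a channel then means adding one adapter class; the router, orchestrator, and LLM pool never change.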
Tech Stack
Backend (Primary: Python)
| Layer | Technology | Rationale |
|---|---|---|
| API Framework | FastAPI | Async-native, OpenAPI docs, dependency injection |
| Task Queue | Celery + Redis or Dramatiq | Background jobs: LLM calls, tool execution, webhooks |
| Database | PostgreSQL 16 | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | Redis / Valkey | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | pgvector (start) → Qdrant (scale) | Agent memory, RAG, conversation search |
| Object Storage | MinIO (self-hosted) / S3 (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | LiteLLM | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | Custom (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |
Messaging Channel SDKs
| Channel | Library / Integration |
|---|---|
| Slack | slack-bolt (Events API + Socket Mode) |
| Microsoft Teams | botbuilder-python (Bot Framework SDK) |
| Mattermost | mattermostdriver + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | python-telegram-bot (Bot API) |
| Signal | signal-cli or signald (bridge) |
Frontend (Admin Dashboard / Client Portal)
| Layer | Technology |
|---|---|
| Framework | Next.js 14+ (App Router) |
| UI | Tailwind CSS + shadcn/ui |
| State | TanStack Query |
| Auth | NextAuth.js → consider Keycloak for enterprise |
Infrastructure
| Layer | Technology |
|---|---|
| Dev Orchestration | Docker Compose + Portainer |
| Prod Orchestration | Kubernetes (k3s or Talos Linux) |
| Core Hosting | Hetzner Dedicated Servers |
| Cloud Burst | AWS / GCP (auto-scale inference, overflow) |
| Reverse Proxy | NPM Plus (dev) / Traefik (prod K8s ingress) |
| DNS | Technitium (internal) / Cloudflare (external) |
| VPN Mesh | Headscale (self-hosted) + Tailscale clients |
| CI/CD | Gitea Actions → GitHub Actions (if public) |
| Monitoring | Prometheus + Grafana + Loki |
| Security | Wazuh (SIEM), Trivy (container scanning) |
Repo Structure
Monorepo to start, split later when service boundaries stabilize.
konstruct/
├── CLAUDE.md # This file
├── docker-compose.yml # Local dev environment
├── docker-compose.prod.yml # Production-like local stack
├── k8s/ # Kubernetes manifests / Helm charts
│ ├── base/
│ └── overlays/
│ ├── staging/
│ └── production/
├── packages/
│ ├── gateway/ # Channel Gateway service
│ │ ├── channels/ # Per-channel adapters (slack, teams, etc.)
│ │ ├── normalize.py # Unified message format
│ │ └── main.py
│ ├── router/ # Message Router service
│ │ ├── tenant.py # Tenant resolution
│ │ ├── ratelimit.py
│ │ └── main.py
│ ├── orchestrator/ # Agent Orchestrator service
│ │ ├── agents/ # Agent definitions and behaviors
│ │ ├── teams/ # Multi-agent team logic
│ │ ├── tools/ # Tool registry and execution
│ │ ├── memory/ # Conversation and long-term memory
│ │ └── main.py
│ ├── llm-pool/ # LLM Backend Pool service
│ │ ├── providers/ # Provider configs (litellm router)
│ │ ├── byo/ # BYO key / endpoint management
│ │ └── main.py
│ ├── portal/ # Next.js admin dashboard
│ │ ├── app/
│ │ ├── components/
│ │ └── lib/
│ └── shared/ # Shared Python libs
│ ├── models/ # Pydantic models, DB schemas
│ ├── auth/ # Auth utilities
│ ├── messaging/ # Internal message format
│ └── config/ # Shared config / env management
├── migrations/ # Alembic DB migrations
├── scripts/ # Dev scripts, seed data, utilities
├── tests/
│ ├── unit/
│ ├── integration/
│ └── e2e/
├── docs/ # Architecture docs, ADRs, runbooks
├── pyproject.toml # Python monorepo config (uv / hatch)
└── .env.example
Multi-Tenancy Model
Tiered isolation — the level increases with the subscription plan:
| Tier | Isolation | Target |
|---|---|---|
| Starter | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| Team | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| Enterprise | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| Self-Hosted | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |
Tenant Resolution Flow
- Inbound message hits Channel Gateway
- Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
- Router maps channel org → Konstruct tenant via lookup table
- All subsequent processing scoped to that tenant's context, models, tools, and memory
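The steps above boil down to a keyed lookup — a minimal sketch, assuming an in-memory table (in practice this would be a DB table fronted by a cache; the IDs and tenant names below are made up):

```python
# Maps (channel, channel-native org ID) -> Konstruct tenant ID.
TENANT_LOOKUP: dict[tuple[str, str], str] = {
    ("slack", "T0123ABCD"): "tenant-acme",
    ("teams", "f8cdef31-a31e-4b4a-93e4-5f571e91255a"): "tenant-globex",
}


def resolve_tenant(channel: str, org_id: str) -> str:
    """Resolve a channel-native org identifier to a tenant, or refuse
    the message entirely -- unmapped orgs must never be processed."""
    try:
        return TENANT_LOOKUP[(channel, org_id)]
    except KeyError:
        raise PermissionError(f"No tenant registered for {channel}:{org_id}")
```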
AI Employee Model
Hierarchy
Company (AI-run)
└── Team
└── Employee (Agent)
├── Role definition (system prompt + persona)
├── Skills (tool bindings)
├── Memory (vector store + conversation history)
├── Channels (which messaging platforms it's active on)
└── Escalation rules (when to hand off to human or another agent)
Employee Configuration (example)
```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```
Team Orchestration
Teams use a coordinator pattern:
- Coordinator agent receives the inbound message
- Coordinator decides which team member(s) should handle it (routing)
- Specialist agent(s) execute their part
- Coordinator assembles the final response or delegates follow-up
- All inter-agent communication logged for audit
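Stripped to its skeleton, the coordinator pattern looks like this — a toy sketch where routing is a keyword check (a real coordinator would use an LLM classification step, and specialists would be full agents, not functions):

```python
audit_log: list[tuple[str, str]] = []  # every inter-agent hop is recorded

# Illustrative specialist registry; real team members are configured agents.
SPECIALISTS = {
    "billing": lambda text: f"[billing agent] handling: {text}",
    "support": lambda text: f"[support agent] handling: {text}",
}


def coordinate(text: str) -> str:
    """Coordinator: pick a specialist, delegate, log the hop, return the reply."""
    topic = "billing" if "invoice" in text.lower() else "support"
    reply = SPECIALISTS[topic](text)
    audit_log.append((topic, text))
    return reply
```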
LLM Backend Strategy
Provider Hierarchy
┌─────────────────────────────────────────┐
│ LiteLLM Router │
│ (load balancing, fallback, cost caps) │
└────┬──────────┬──────────┬─────────┬────┘
│ │ │ │
Ollama vLLM Anthropic OpenAI
(local) (local) (API) (API)
│
BYO Endpoint
(customer-provided)
Routing Logic
- Tenant config specifies preferred provider(s) and fallback chain
- Cost caps per tenant (daily/monthly spend limits)
- Model routing by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
- BYO keys stored encrypted (AES-256), never logged, never used for other tenants
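The cost-cap and fallback rules can be sketched in a few lines — illustrative only (LiteLLM's router handles real fallbacks and cost tracking; the tenant names, caps, and spend figures here are invented):

```python
DAILY_CAP_USD = {"tenant-acme": 5.00}
spend_today: dict[str, float] = {"tenant-acme": 4.99}


def pick_provider(tenant_id: str, chain: list[str], est_cost: float) -> str:
    """Walk the tenant's fallback chain in preference order; skip paid
    providers once the estimated call would exceed the daily cap.
    Local models (ollama/*) are treated as free."""
    for provider in chain:
        if provider.startswith("ollama/"):
            return provider
        cap = DAILY_CAP_USD.get(tenant_id, float("inf"))
        if spend_today.get(tenant_id, 0.0) + est_cost <= cap:
            return provider
    raise RuntimeError(f"No provider available within cost cap for {tenant_id}")
```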
Messaging Format (Internal)
All channel adapters normalize messages into this format:
```python
from datetime import datetime

from pydantic import BaseModel


class KonstructMessage(BaseModel):
    id: str                    # UUID
    tenant_id: str             # Konstruct tenant
    channel: ChannelType       # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict     # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo         # User ID, display name, role
    content: MessageContent    # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None      # For threaded conversations
    reply_to: str | None       # Parent message ID
    context: dict              # Extracted intent, entities, sentiment (populated downstream)
```
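As a concrete field-mapping example, here is how a Slack event could populate that shape — a sketch using plain dicts in place of the Pydantic models, with the Slack Events API payload fields (`team_id`, `event.user`, `event.ts`, `event.thread_ts`):

```python
import uuid
from datetime import datetime, timezone


def normalize_slack_event(tenant_id: str, payload: dict) -> dict:
    """Map a Slack Events API payload onto the unified message shape."""
    event = payload["event"]
    return {
        "id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "channel": "slack",
        "channel_metadata": {
            "workspace": payload["team_id"],
            "channel": event["channel"],
        },
        "sender": {"user_id": event["user"]},
        "content": {"text": event["text"]},
        # Slack timestamps are stringified epoch seconds with sub-second precision
        "timestamp": datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc),
        # thread_ts is only present on threaded replies
        "thread_id": event.get("thread_ts"),
        "reply_to": None,
        "context": {},
    }
```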
Security & Compliance
Non-Negotiables
- Encryption at rest (PostgreSQL TDE, MinIO server-side encryption)
- Encryption in transit (TLS 1.3 everywhere, mTLS between services)
- Tenant isolation enforced at every layer (DB, cache, object storage, agent memory)
- BYO API keys encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- Audit log for every agent action, tool invocation, and LLM call
- RBAC per tenant (admin, manager, member, viewer)
- Rate limiting per tenant, per channel, per agent
- PII handling — configurable PII detection and redaction per tenant
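The per-tenant rate limiting above (see also ADR-007) is typically a token bucket — a single-process sketch; production would back the bucket state with Redis so limits hold across gateway replicas, and the rates here are arbitrary:

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def check_rate(tenant_id: str) -> bool:
    """One bucket per tenant, created lazily on first message."""
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate_per_sec=5.0, burst=10))
    return bucket.allow()
```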
Future Compliance Targets
- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)
Development Workflow
Local Dev
```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests
pytest tests/unit -x
pytest tests/integration -x
```
Branch Strategy
- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off `develop`
- `fix/*` — bugfix branches
- `release/*` — release candidates
CI Pipeline
- Lint (`ruff check`, `ruff format --check`)
- Type check (`mypy --strict`)
- Unit tests (`pytest tests/unit`)
- Integration tests (`pytest tests/integration` — spins up Docker Compose)
- Container build + scan (`trivy image`)
- Deploy to staging (auto on `develop` merge)
- Deploy to production (manual approval on `release/*` merge)
Milestones
Phase 1: Foundation (Weeks 1–6)
- Repo scaffolding, CI/CD, Docker Compose dev environment
- PostgreSQL schema with RLS multi-tenancy
- Unified message format and Channel Gateway (start with Slack + Telegram)
- Basic agent orchestrator (single agent per tenant, no teams yet)
- LiteLLM integration with Ollama + one commercial API
- Basic admin portal (tenant CRUD, agent config)
Phase 2: Channel Expansion + Teams (Weeks 7–12)
- Add channels: Mattermost, WhatsApp, Teams
- Multi-agent teams with coordinator pattern
- Conversational memory (vector store + sliding window)
- Tool framework (registry, execution, sandboxing)
- BYO API key support
- Tenant onboarding flow in portal
Phase 3: Polish + Launch (Weeks 13–18)
- Add channels: Rocket.Chat, Signal
- AI company hierarchy (teams of teams)
- Cost tracking and billing integration (Stripe)
- Agent performance analytics dashboard
- Self-hosted deployment option (Helm chart + docs)
- Public launch (Product Hunt, Hacker News, Reddit)
Phase 4: Scale (Post-Launch)
- Kubernetes migration for production workloads
- Cloud burst infrastructure (AWS auto-scaling inference)
- Marketplace for pre-built AI employee templates
- Enterprise tier with dedicated isolation
- SOC 2 preparation
- API for programmatic agent management
Coding Standards
Python
- Version: 3.12+
- Package manager: `uv`
- Linting: `ruff` (replaces flake8, isort, black)
- Type checking: `mypy --strict` — no `Any` types in public interfaces
- Testing: `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- Models: Pydantic v2 for all data validation and serialization
- Async: prefer `async def` for all I/O-bound operations
- DB: SQLAlchemy 2.0 async with Alembic migrations
TypeScript (Portal)
- Runtime: Node 20+ LTS
- Framework: Next.js 14+ (App Router)
- Linting: `eslint` + `prettier`
- Type checking: `strict: true` in tsconfig
General
- Every PR requires at least one approval
- No secrets in code — use `.env` + a secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)
Key Design Decisions (ADR Stubs)
These need full ADRs written before implementation:
- ADR-001: Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
- ADR-002: Agent memory — pgvector vs. dedicated vector DB vs. hybrid
- ADR-003: Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
- ADR-004: Agent framework — build custom vs. adopt LangGraph/CrewAI
- ADR-005: BYO key encryption — envelope encryption strategy and key rotation
- ADR-006: Inter-agent communication — direct function calls vs. message bus vs. shared context
- ADR-007: Rate limiting — per-tenant token bucket implementation
- ADR-008: Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
Open Questions
- Pricing model: per-agent, per-message, per-seat, or hybrid?
- Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- Agent-to-agent communication across tenants (marketplace scenario)?
- White-labeling for agencies reselling Konstruct?
References
- paperclip.ing — Inspiration
- LiteLLM docs — LLM gateway
- Slack Bolt Python — Slack SDK
- Bot Framework Python — Teams SDK
- FastAPI docs — API framework