CLAUDE.md — Konstruct
What is Konstruct?
Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.
Think of it as "Hire an AI department" — not another chatbot SaaS.
Project Identity
- Codename: Konstruct
- Domain: TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- Tagline ideas: "Build your AI workforce" / "AI teams that just work"
- Inspired by: paperclip.ing
- Differentiation: Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support
Architecture Overview
Core Mental Model
Client (Slack/Teams/etc.)
│
▼
┌─────────────────────┐
│ Channel Gateway │ ← Unified ingress for all messaging platforms
│ (webhook/WS) │
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Message Router │ ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Agent Orchestrator │ ← Agent selection, tool dispatch, memory, handoffs
│ (per-tenant) │
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ LLM Backend Pool │ ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
Key Architectural Principles
- Channel-agnostic core — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
- Tenant-isolated agent state — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
- LLM backend as a pluggable resource — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
- Agents are composable — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.
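The channel-agnostic principle can be sketched as an adapter interface — a minimal, illustrative sketch (class and field names here are hypothetical, not the actual gateway API; `NormalizedMessage` stands in for the full unified format defined later):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class NormalizedMessage:
    """Simplified stand-in for the unified internal message format."""
    tenant_hint: str  # channel-native org identifier (e.g. Slack workspace ID)
    sender_id: str
    text: str


class ChannelAdapter(ABC):
    """Each platform implements normalize(); nothing downstream of the
    gateway ever touches platform-specific payload shapes."""

    @abstractmethod
    def normalize(self, raw_event: dict) -> NormalizedMessage: ...


class SlackAdapter(ChannelAdapter):
    def normalize(self, raw_event: dict) -> NormalizedMessage:
        event = raw_event["event"]  # Slack Events API nests the message here
        return NormalizedMessage(raw_event["team_id"], event["user"], event["text"])


class TelegramAdapter(ChannelAdapter):
    def normalize(self, raw_event: dict) -> NormalizedMessage:
        msg = raw_event["message"]  # Telegram Bot API update shape
        return NormalizedMessage(str(msg["chat"]["id"]), str(msg["from"]["id"]), msg["text"])
```

Adding a channel then means adding one adapter class; the router, orchestrator, and LLM pool never change.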
Tech Stack
Backend (Primary: Python)
| Layer | Technology | Rationale |
|---|---|---|
| API Framework | FastAPI | Async-native, OpenAPI docs, dependency injection |
| Task Queue | Celery + Redis or Dramatiq | Background jobs: LLM calls, tool execution, webhooks |
| Database | PostgreSQL 16 | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | Redis / Valkey | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | pgvector (start) → Qdrant (scale) | Agent memory, RAG, conversation search |
| Object Storage | MinIO (self-hosted) / S3 (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | LiteLLM | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | Custom (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |
Messaging Channel SDKs
| Channel | Library / Integration |
|---|---|
| Slack | slack-bolt (Events API + Socket Mode) |
| Microsoft Teams | botbuilder-python (Bot Framework SDK) |
| Mattermost | mattermostdriver + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | python-telegram-bot (Bot API) |
| Signal | signal-cli or signald (bridge) |
Frontend (Admin Dashboard / Client Portal)
| Layer | Technology |
|---|---|
| Framework | Next.js 14+ (App Router) |
| UI | Tailwind CSS + shadcn/ui |
| State | TanStack Query |
| Auth | NextAuth.js → consider Keycloak for enterprise |
Infrastructure
| Layer | Technology |
|---|---|
| Dev Orchestration | Docker Compose + Portainer |
| Prod Orchestration | Kubernetes (k3s or Talos Linux) |
| Core Hosting | Hetzner Dedicated Servers |
| Cloud Burst | AWS / GCP (auto-scale inference, overflow) |
| Reverse Proxy | NPM Plus (dev) / Traefik (prod K8s ingress) |
| DNS | Technitium (internal) / Cloudflare (external) |
| VPN Mesh | Headscale (self-hosted) + Tailscale clients |
| CI/CD | Gitea Actions → GitHub Actions (if public) |
| Monitoring | Prometheus + Grafana + Loki |
| Security | Wazuh (SIEM), Trivy (container scanning) |
Repo Structure
Monorepo to start, split later when service boundaries stabilize.
konstruct/
├── CLAUDE.md # This file
├── docker-compose.yml # Local dev environment
├── docker-compose.prod.yml # Production-like local stack
├── k8s/ # Kubernetes manifests / Helm charts
│ ├── base/
│ └── overlays/
│ ├── staging/
│ └── production/
├── packages/
│ ├── gateway/ # Channel Gateway service
│ │ ├── channels/ # Per-channel adapters (slack, teams, etc.)
│ │ ├── normalize.py # Unified message format
│ │ └── main.py
│ ├── router/ # Message Router service
│ │ ├── tenant.py # Tenant resolution
│ │ ├── ratelimit.py
│ │ └── main.py
│ ├── orchestrator/ # Agent Orchestrator service
│ │ ├── agents/ # Agent definitions and behaviors
│ │ ├── teams/ # Multi-agent team logic
│ │ ├── tools/ # Tool registry and execution
│ │ ├── memory/ # Conversation and long-term memory
│ │ └── main.py
│ ├── llm-pool/ # LLM Backend Pool service
│ │ ├── providers/ # Provider configs (litellm router)
│ │ ├── byo/ # BYO key / endpoint management
│ │ └── main.py
│ ├── portal/ # Next.js admin dashboard
│ │ ├── app/
│ │ ├── components/
│ │ └── lib/
│ └── shared/ # Shared Python libs
│ ├── models/ # Pydantic models, DB schemas
│ ├── auth/ # Auth utilities
│ ├── messaging/ # Internal message format
│ └── config/ # Shared config / env management
├── migrations/ # Alembic DB migrations
├── scripts/ # Dev scripts, seed data, utilities
├── tests/
│ ├── unit/
│ ├── integration/
│ └── e2e/
├── docs/ # Architecture docs, ADRs, runbooks
├── pyproject.toml # Python monorepo config (uv / hatch)
└── .env.example
Multi-Tenancy Model
Tiered isolation — the level increases with the subscription plan:
| Tier | Isolation | Target |
|---|---|---|
| Starter | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| Team | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| Enterprise | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| Self-Hosted | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |
Tenant Resolution Flow
- Inbound message hits Channel Gateway
- Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
- Router maps channel org → Konstruct tenant via lookup table
- All subsequent processing scoped to that tenant's context, models, tools, and memory
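The steps above boil down to a keyed lookup — a minimal sketch, assuming an in-memory table (in practice this would be a DB table fronted by a cache; the IDs and tenant names below are made up):

```python
# Maps (channel, channel-native org ID) -> Konstruct tenant ID.
TENANT_LOOKUP: dict[tuple[str, str], str] = {
    ("slack", "T0123ABCD"): "tenant-acme",
    ("teams", "f8cdef31-a31e-4b4a-93e4-5f571e91255a"): "tenant-globex",
}


def resolve_tenant(channel: str, org_id: str) -> str:
    """Resolve a channel-native org identifier to a tenant, or refuse
    the message entirely -- unmapped orgs must never be processed."""
    try:
        return TENANT_LOOKUP[(channel, org_id)]
    except KeyError:
        raise PermissionError(f"No tenant registered for {channel}:{org_id}")
```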
AI Employee Model
Hierarchy
Company (AI-run)
└── Team
└── Employee (Agent)
├── Role definition (system prompt + persona)
├── Skills (tool bindings)
├── Memory (vector store + conversation history)
├── Channels (which messaging platforms it's active on)
└── Escalation rules (when to hand off to human or another agent)
Employee Configuration (example)
```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```
Team Orchestration
Teams use a coordinator pattern:
- Coordinator agent receives the inbound message
- Coordinator decides which team member(s) should handle it (routing)
- Specialist agent(s) execute their part
- Coordinator assembles the final response or delegates follow-up
- All inter-agent communication logged for audit
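Stripped to its skeleton, the coordinator pattern looks like this — a toy sketch where routing is a keyword check (a real coordinator would use an LLM classification step, and specialists would be full agents, not functions):

```python
audit_log: list[tuple[str, str]] = []  # every inter-agent hop is recorded

# Illustrative specialist registry; real team members are configured agents.
SPECIALISTS = {
    "billing": lambda text: f"[billing agent] handling: {text}",
    "support": lambda text: f"[support agent] handling: {text}",
}


def coordinate(text: str) -> str:
    """Coordinator: pick a specialist, delegate, log the hop, return the reply."""
    topic = "billing" if "invoice" in text.lower() else "support"
    reply = SPECIALISTS[topic](text)
    audit_log.append((topic, text))
    return reply
```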
LLM Backend Strategy
Provider Hierarchy
┌─────────────────────────────────────────┐
│ LiteLLM Router │
│ (load balancing, fallback, cost caps) │
└────┬──────────┬──────────┬─────────┬────┘
│ │ │ │
Ollama vLLM Anthropic OpenAI
(local) (local) (API) (API)
│
BYO Endpoint
(customer-provided)
Routing Logic
- Tenant config specifies preferred provider(s) and fallback chain
- Cost caps per tenant (daily/monthly spend limits)
- Model routing by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
- BYO keys stored encrypted (AES-256), never logged, never used for other tenants
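The cost-cap and fallback rules can be sketched in a few lines — illustrative only (LiteLLM's router handles real fallbacks and cost tracking; the tenant names, caps, and spend figures here are invented):

```python
DAILY_CAP_USD = {"tenant-acme": 5.00}
spend_today: dict[str, float] = {"tenant-acme": 4.99}


def pick_provider(tenant_id: str, chain: list[str], est_cost: float) -> str:
    """Walk the tenant's fallback chain in preference order; skip paid
    providers once the estimated call would exceed the daily cap.
    Local models (ollama/*) are treated as free."""
    for provider in chain:
        if provider.startswith("ollama/"):
            return provider
        cap = DAILY_CAP_USD.get(tenant_id, float("inf"))
        if spend_today.get(tenant_id, 0.0) + est_cost <= cap:
            return provider
    raise RuntimeError(f"No provider available within cost cap for {tenant_id}")
```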
Messaging Format (Internal)
All channel adapters normalize messages into this format:
```python
from datetime import datetime

from pydantic import BaseModel


class KonstructMessage(BaseModel):
    id: str                    # UUID
    tenant_id: str             # Konstruct tenant
    channel: ChannelType       # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict     # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo         # User ID, display name, role
    content: MessageContent    # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None      # For threaded conversations
    reply_to: str | None       # Parent message ID
    context: dict              # Extracted intent, entities, sentiment (populated downstream)
```
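As a concrete field-mapping example, here is how a Slack event could populate that shape — a sketch using plain dicts in place of the Pydantic models, with the Slack Events API payload fields (`team_id`, `event.user`, `event.ts`, `event.thread_ts`):

```python
import uuid
from datetime import datetime, timezone


def normalize_slack_event(tenant_id: str, payload: dict) -> dict:
    """Map a Slack Events API payload onto the unified message shape."""
    event = payload["event"]
    return {
        "id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "channel": "slack",
        "channel_metadata": {
            "workspace": payload["team_id"],
            "channel": event["channel"],
        },
        "sender": {"user_id": event["user"]},
        "content": {"text": event["text"]},
        # Slack timestamps are stringified epoch seconds with sub-second precision
        "timestamp": datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc),
        # thread_ts is only present on threaded replies
        "thread_id": event.get("thread_ts"),
        "reply_to": None,
        "context": {},
    }
```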
Security & Compliance
Non-Negotiables
- Encryption at rest (PostgreSQL TDE, MinIO server-side encryption)
- Encryption in transit (TLS 1.3 everywhere, mTLS between services)
- Tenant isolation enforced at every layer (DB, cache, object storage, agent memory)
- BYO API keys encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- Audit log for every agent action, tool invocation, and LLM call
- RBAC per tenant (admin, manager, member, viewer)
- Rate limiting per tenant, per channel, per agent
- PII handling — configurable PII detection and redaction per tenant
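The per-tenant rate limiting above (see also ADR-007) is typically a token bucket — a single-process sketch; production would back the bucket state with Redis so limits hold across gateway replicas, and the rates here are arbitrary:

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def check_rate(tenant_id: str) -> bool:
    """One bucket per tenant, created lazily on first message."""
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate_per_sec=5.0, burst=10))
    return bucket.allow()
```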
Future Compliance Targets
- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)
Development Workflow
Local Dev
```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests
pytest tests/unit -x
pytest tests/integration -x
```
Branch Strategy
- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off `develop`
- `fix/*` — bugfix branches
- `release/*` — release candidates
CI Pipeline
- Lint (`ruff check`, `ruff format --check`)
- Type check (`mypy --strict`)
- Unit tests (`pytest tests/unit`)
- Integration tests (`pytest tests/integration` — spins up Docker Compose)
- Container build + scan (`trivy image`)
- Deploy to staging (auto on `develop` merge)
- Deploy to production (manual approval on `release/*` merge)
Milestones
Phase 1: Foundation (Weeks 1–6)
- Repo scaffolding, CI/CD, Docker Compose dev environment
- PostgreSQL schema with RLS multi-tenancy
- Unified message format and Channel Gateway (start with Slack + Telegram)
- Basic agent orchestrator (single agent per tenant, no teams yet)
- LiteLLM integration with Ollama + one commercial API
- Basic admin portal (tenant CRUD, agent config)
Phase 2: Channel Expansion + Teams (Weeks 7–12)
- Add channels: Mattermost, WhatsApp, Teams
- Multi-agent teams with coordinator pattern
- Conversational memory (vector store + sliding window)
- Tool framework (registry, execution, sandboxing)
- BYO API key support
- Tenant onboarding flow in portal
Phase 3: Polish + Launch (Weeks 13–18)
- Add channels: Rocket.Chat, Signal
- AI company hierarchy (teams of teams)
- Cost tracking and billing integration (Stripe)
- Agent performance analytics dashboard
- Self-hosted deployment option (Helm chart + docs)
- Public launch (Product Hunt, Hacker News, Reddit)
Phase 4: Scale (Post-Launch)
- Kubernetes migration for production workloads
- Cloud burst infrastructure (AWS auto-scaling inference)
- Marketplace for pre-built AI employee templates
- Enterprise tier with dedicated isolation
- SOC 2 preparation
- API for programmatic agent management
Coding Standards
Python
- Version: 3.12+
- Package manager: `uv`
- Linting: `ruff` (replaces flake8, isort, black)
- Type checking: `mypy --strict` — no `Any` types in public interfaces
- Testing: `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- Models: Pydantic v2 for all data validation and serialization
- Async: prefer `async def` for all I/O-bound operations
- DB: SQLAlchemy 2.0 async with Alembic migrations
TypeScript (Portal)
- Runtime: Node 20+ LTS
- Framework: Next.js 14+ (App Router)
- Linting: `eslint` + `prettier`
- Type checking: `strict: true` in tsconfig
General
- Every PR requires at least one approval
- No secrets in code — use `.env` + a secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)
Key Design Decisions (ADR Stubs)
These need full ADRs written before implementation:
- ADR-001: Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
- ADR-002: Agent memory — pgvector vs. dedicated vector DB vs. hybrid
- ADR-003: Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
- ADR-004: Agent framework — build custom vs. adopt LangGraph/CrewAI
- ADR-005: BYO key encryption — envelope encryption strategy and key rotation
- ADR-006: Inter-agent communication — direct function calls vs. message bus vs. shared context
- ADR-007: Rate limiting — per-tenant token bucket implementation
- ADR-008: Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
Open Questions
- Pricing model: per-agent, per-message, per-seat, or hybrid?
- Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- Agent-to-agent communication across tenants (marketplace scenario)?
- White-labeling for agencies reselling Konstruct?
References
- paperclip.ing — Inspiration
- LiteLLM docs — LLM gateway
- Slack Bolt Python — Slack SDK
- Bot Framework Python — Teams SDK
- FastAPI docs — API framework