# CLAUDE.md — Konstruct

## What is Konstruct?

Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.

Think of it as "Hire an AI department" — not another chatbot SaaS.

---

## Project Identity

- **Codename:** Konstruct
- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
- **Inspired by:** [paperclip.ing](https://paperclip.ing)
- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support

---

## Architecture Overview

### Core Mental Model

```
Client (Slack/Teams/etc.)
         │
         ▼
┌─────────────────────┐
│  Channel Gateway    │  ← Unified ingress for all messaging platforms
│  (webhook/WS)       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Message Router     │  ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Agent Orchestrator │  ← Agent selection, tool dispatch, memory, handoffs
│  (per-tenant)       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  LLM Backend Pool   │  ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
```

### Key Architectural Principles

1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
4. **Agents are composable** — A single AI employee is an agent.
A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.

---

## Tech Stack

### Backend (Primary: Python)

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |

### Messaging Channel SDKs

| Channel | Library / Integration |
|---------|----------------------|
| Slack | `slack-bolt` (Events API + Socket Mode) |
| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
| Mattermost | `mattermostdriver` + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | `python-telegram-bot` (Bot API) |
| Signal | `signal-cli` or `signald` (bridge) |

### Frontend (Admin Dashboard / Client Portal)

| Layer | Technology |
|-------|-----------|
| Framework | **Next.js 14+** (App Router) |
| UI | **Tailwind CSS + shadcn/ui** |
| State | **TanStack Query** |
| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |

### Infrastructure

| Layer | Technology |
|-------|-----------|
| Dev Orchestration | **Docker Compose + Portainer** |
| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
| Core Hosting | **Hetzner Dedicated Servers** |
| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
| DNS | **Technitium** (internal) / **Cloudflare** (external) |
| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
| Monitoring | **Prometheus + Grafana + Loki** |
| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |

---

## Repo Structure

Monorepo to start, split later when service boundaries stabilize.

```
konstruct/
├── CLAUDE.md                    # This file
├── docker-compose.yml           # Local dev environment
├── docker-compose.prod.yml      # Production-like local stack
├── k8s/                         # Kubernetes manifests / Helm charts
│   ├── base/
│   └── overlays/
│       ├── staging/
│       └── production/
├── packages/
│   ├── gateway/                 # Channel Gateway service
│   │   ├── channels/            # Per-channel adapters (slack, teams, etc.)
│   │   ├── normalize.py         # Unified message format
│   │   └── main.py
│   ├── router/                  # Message Router service
│   │   ├── tenant.py            # Tenant resolution
│   │   ├── ratelimit.py
│   │   └── main.py
│   ├── orchestrator/            # Agent Orchestrator service
│   │   ├── agents/              # Agent definitions and behaviors
│   │   ├── teams/               # Multi-agent team logic
│   │   ├── tools/               # Tool registry and execution
│   │   ├── memory/              # Conversation and long-term memory
│   │   └── main.py
│   ├── llm-pool/                # LLM Backend Pool service
│   │   ├── providers/           # Provider configs (litellm router)
│   │   ├── byo/                 # BYO key / endpoint management
│   │   └── main.py
│   ├── portal/                  # Next.js admin dashboard
│   │   ├── app/
│   │   ├── components/
│   │   └── lib/
│   └── shared/                  # Shared Python libs
│       ├── models/              # Pydantic models, DB schemas
│       ├── auth/                # Auth utilities
│       ├── messaging/           # Internal message format
│       └── config/              # Shared config / env management
├── migrations/                  # Alembic DB migrations
├── scripts/                     # Dev scripts, seed data, utilities
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docs/                        # Architecture docs, ADRs, runbooks
├── pyproject.toml               # Python monorepo config (uv / hatch)
└── .env.example
```

---

## Multi-Tenancy Model

Tiered isolation — the level increases with the subscription plan:

| Tier | Isolation | Target |
|------|-----------|--------|
| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |

### Tenant Resolution Flow

1. Inbound message hits Channel Gateway
2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
3. Router maps channel org → Konstruct tenant via lookup table
4. All subsequent processing scoped to that tenant's context, models, tools, and memory

---

## AI Employee Model

### Hierarchy

```
Company (AI-run)
└── Team
    └── Employee (Agent)
        ├── Role definition (system prompt + persona)
        ├── Skills (tool bindings)
        ├── Memory (vector store + conversation history)
        ├── Channels (which messaging platforms it's active on)
        └── Escalation rules (when to hand off to human or another agent)
```

### Employee Configuration (example)

```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```

### Team Orchestration

Teams use a coordinator pattern:

1. **Coordinator agent** receives the inbound message
2. Coordinator decides which team member(s) should handle it (routing)
3. Specialist agent(s) execute their part
4. Coordinator assembles the final response or delegates follow-up
5. All inter-agent communication logged for audit

---

## LLM Backend Strategy

### Provider Hierarchy

```
┌─────────────────────────────────────────┐
│             LiteLLM Router              │
│  (load balancing, fallback, cost caps)  │
└────┬──────────┬──────────┬─────────┬────┘
     │          │          │         │
  Ollama      vLLM     Anthropic  OpenAI
  (local)    (local)     (API)     (API)
     │
  BYO Endpoint
  (customer-provided)
```

### Routing Logic

1. **Tenant config** specifies preferred provider(s) and fallback chain
2. **Cost caps** per tenant (daily/monthly spend limits)
3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants

---

## Messaging Format (Internal)

All channel adapters normalize messages into this format:

```python
from datetime import datetime

from pydantic import BaseModel

# ChannelType, SenderInfo, and MessageContent are assumed to be defined
# elsewhere in the shared messaging package.


class KonstructMessage(BaseModel):
    id: str                    # UUID
    tenant_id: str             # Konstruct tenant
    channel: ChannelType       # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict     # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo         # User ID, display name, role
    content: MessageContent    # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None      # For threaded conversations
    reply_to: str | None       # Parent message ID
    context: dict              # Extracted intent, entities, sentiment (populated downstream)
```

---

## Security & Compliance

### Non-Negotiables

- **Encryption at rest** (PostgreSQL TDE, MinIO server-side encryption)
- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- **Audit log** for every agent action, tool invocation, and LLM call
- **RBAC** per tenant (admin, manager, member, viewer)
- **Rate limiting** per tenant, per channel, per agent
- **PII handling** — configurable PII detection and redaction per tenant

### Future Compliance Targets

- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)

---

## Development Workflow

### Local Dev

```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests (from the repo root, in a separate terminal)
pytest tests/unit -x
pytest tests/integration -x
```

### Branch Strategy

- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches
off develop
- `fix/*` — bugfix branches
- `release/*` — release candidates

### CI Pipeline

1. Lint (`ruff check`, `ruff format --check`)
2. Type check (`mypy --strict`)
3. Unit tests (`pytest tests/unit`)
4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
5. Container build + scan (`trivy image`)
6. Deploy to staging (auto on `develop` merge)
7. Deploy to production (manual approval on `release/*` merge)

---

## Milestones

### Phase 1: Foundation (Weeks 1–6)

- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
- [ ] PostgreSQL schema with RLS multi-tenancy
- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
- [ ] LiteLLM integration with Ollama + one commercial API
- [ ] Basic admin portal (tenant CRUD, agent config)

### Phase 2: Channel Expansion + Teams (Weeks 7–12)

- [ ] Add channels: Mattermost, WhatsApp, Teams
- [ ] Multi-agent teams with coordinator pattern
- [ ] Conversational memory (vector store + sliding window)
- [ ] Tool framework (registry, execution, sandboxing)
- [ ] BYO API key support
- [ ] Tenant onboarding flow in portal

### Phase 3: Polish + Launch (Weeks 13–18)

- [ ] Add channels: Rocket.Chat, Signal
- [ ] AI company hierarchy (teams of teams)
- [ ] Cost tracking and billing integration (Stripe)
- [ ] Agent performance analytics dashboard
- [ ] Self-hosted deployment option (Helm chart + docs)
- [ ] Public launch (Product Hunt, Hacker News, Reddit)

### Phase 4: Scale (Post-Launch)

- [ ] Kubernetes migration for production workloads
- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
- [ ] Marketplace for pre-built AI employee templates
- [ ] Enterprise tier with dedicated isolation
- [ ] SOC 2 preparation
- [ ] API for programmatic agent management

---

## Coding Standards

### Python

- **Version:** 3.12+
- **Package manager:** `uv`
- **Linting:** `ruff` (replaces flake8, isort, black)
- **Type checking:**
`mypy --strict` — no `Any` types in public interfaces
- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- **Models:** `Pydantic v2` for all data validation and serialization
- **Async:** Prefer `async def` for all I/O-bound operations
- **DB:** `SQLAlchemy 2.0` async with Alembic migrations

### TypeScript (Portal)

- **Runtime:** Node 20+ LTS
- **Framework:** Next.js 14+ (App Router)
- **Linting:** `eslint` + `prettier`
- **Type checking:** `strict: true` in tsconfig

### General

- Every PR requires at least one approval
- No secrets in code — use `.env` + secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)

---

## Key Design Decisions (ADR Stubs)

These need full ADRs written before implementation:

1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus

---

## Open Questions

- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
- [ ] White-labeling for agencies reselling Konstruct?
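
ADR-007 above names a per-tenant token bucket for rate limiting. As a starting point for that ADR, here is a minimal in-memory sketch, not the project's implementation. The class names `TokenBucket` and `TenantRateLimiter`, the `capacity` and `refill_per_sec` parameters, and the injectable `clock` are all hypothetical. A production version would presumably keep bucket state in Redis (listed in the tech stack for rate limiting) so limits hold across processes; the injectable clock is only there to make the behavior testable without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Single bucket: holds up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so a new tenant can burst
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Hypothetical per-tenant limiter: one independent bucket per tenant ID."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill_per_sec = refill_per_sec
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # Lazily create the bucket on the tenant's first request.
            bucket = TokenBucket(self._capacity, self._refill_per_sec, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now, cost)


if __name__ == "__main__":
    limiter = TenantRateLimiter(capacity=2, refill_per_sec=1.0)
    # Three back-to-back requests from one tenant: the third exceeds the burst.
    print([limiter.allow("tenant-a") for _ in range(3)])
```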
---

## References

- [paperclip.ing](https://paperclip.ing) — Inspiration
- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework