Adolfo Delorenzo 0e0ea5fb66 fix: runtime deployment fixes for Docker Compose stack
- Add .gitignore for __pycache__, node_modules, .playwright-mcp
- Add CLAUDE.md project instructions
- docker-compose: remove host port exposure for internal services,
  remove Ollama container (use host), add CORS origin, bake
  NEXT_PUBLIC_API_URL at build time, run alembic migrations on
  gateway startup, add CPU-only torch pre-install
- gateway: add CORS middleware, graceful Slack degradation without
  bot token, fix None guard on slack_handler
- gateway pyproject: add aiohttp dependency for slack-bolt async
- llm-pool pyproject: install litellm from GitHub (removed from PyPI),
  enable hatch direct references
- portal: enable standalone output in next.config.ts
- Remove orphaned migration 003_phase2_audit_kb.py (renamed to 004)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:26:34 -06:00

# CLAUDE.md — Konstruct
## What is Konstruct?
Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.
Think of it as "Hire an AI department" — not another chatbot SaaS.
---
## Project Identity
- **Codename:** Konstruct
- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
- **Inspired by:** [paperclip.ing](https://paperclip.ing)
- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support
---
## Architecture Overview
### Core Mental Model
```
Client (Slack/Teams/etc.)
         │
         ▼
┌─────────────────────┐
│  Channel Gateway    │ ← Unified ingress for all messaging platforms
│  (webhook/WS)       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Message Router     │ ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Agent Orchestrator │ ← Agent selection, tool dispatch, memory, handoffs
│  (per-tenant)       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  LLM Backend Pool   │ ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
```
### Key Architectural Principles
1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
4. **Agents are composable** — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.
---
## Tech Stack
### Backend (Primary: Python)
| Layer | Technology | Rationale |
|-------|-----------|-----------|
| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |
### Messaging Channel SDKs
| Channel | Library / Integration |
|---------|----------------------|
| Slack | `slack-bolt` (Events API + Socket Mode) |
| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
| Mattermost | `mattermostdriver` + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | `python-telegram-bot` (Bot API) |
| Signal | `signal-cli` or `signald` (bridge) |
### Frontend (Admin Dashboard / Client Portal)
| Layer | Technology |
|-------|-----------|
| Framework | **Next.js 14+** (App Router) |
| UI | **Tailwind CSS + shadcn/ui** |
| State | **TanStack Query** |
| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |
### Infrastructure
| Layer | Technology |
|-------|-----------|
| Dev Orchestration | **Docker Compose + Portainer** |
| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
| Core Hosting | **Hetzner Dedicated Servers** |
| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
| DNS | **Technitium** (internal) / **Cloudflare** (external) |
| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
| Monitoring | **Prometheus + Grafana + Loki** |
| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |
---
## Repo Structure
Monorepo to start, split later when service boundaries stabilize.
```
konstruct/
├── CLAUDE.md # This file
├── docker-compose.yml # Local dev environment
├── docker-compose.prod.yml # Production-like local stack
├── k8s/ # Kubernetes manifests / Helm charts
│ ├── base/
│ └── overlays/
│ ├── staging/
│ └── production/
├── packages/
│ ├── gateway/ # Channel Gateway service
│ │ ├── channels/ # Per-channel adapters (slack, teams, etc.)
│ │ ├── normalize.py # Unified message format
│ │ └── main.py
│ ├── router/ # Message Router service
│ │ ├── tenant.py # Tenant resolution
│ │ ├── ratelimit.py
│ │ └── main.py
│ ├── orchestrator/ # Agent Orchestrator service
│ │ ├── agents/ # Agent definitions and behaviors
│ │ ├── teams/ # Multi-agent team logic
│ │ ├── tools/ # Tool registry and execution
│ │ ├── memory/ # Conversation and long-term memory
│ │ └── main.py
│ ├── llm-pool/ # LLM Backend Pool service
│ │ ├── providers/ # Provider configs (litellm router)
│ │ ├── byo/ # BYO key / endpoint management
│ │ └── main.py
│ ├── portal/ # Next.js admin dashboard
│ │ ├── app/
│ │ ├── components/
│ │ └── lib/
│ └── shared/ # Shared Python libs
│ ├── models/ # Pydantic models, DB schemas
│ ├── auth/ # Auth utilities
│ ├── messaging/ # Internal message format
│ └── config/ # Shared config / env management
├── migrations/ # Alembic DB migrations
├── scripts/ # Dev scripts, seed data, utilities
├── tests/
│ ├── unit/
│ ├── integration/
│ └── e2e/
├── docs/ # Architecture docs, ADRs, runbooks
├── pyproject.toml # Python monorepo config (uv / hatch)
└── .env.example
```
---
## Multi-Tenancy Model
Tiered isolation — the level increases with the subscription plan:
| Tier | Isolation | Target |
|------|-----------|--------|
| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |
### Tenant Resolution Flow
1. Inbound message hits Channel Gateway
2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
3. Router maps channel org → Konstruct tenant via lookup table
4. All subsequent processing scoped to that tenant's context, models, tools, and memory
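The resolution flow above can be sketched as a lookup keyed on channel and workspace identifier. This is a minimal in-memory illustration: `TenantDirectory`, `UnknownWorkspaceError`, and the sample IDs are hypothetical names, and a real router would presumably back this with the PostgreSQL lookup table instead of a dict.

```python
from dataclasses import dataclass, field


class UnknownWorkspaceError(LookupError):
    """Raised when no tenant is mapped to the inbound workspace."""


@dataclass
class TenantDirectory:
    # Maps (channel, channel org/workspace ID) -> Konstruct tenant ID.
    _mapping: dict[tuple[str, str], str] = field(default_factory=dict)

    def register(self, channel: str, org_id: str, tenant_id: str) -> None:
        self._mapping[(channel, org_id)] = tenant_id

    def resolve(self, channel: str, org_id: str) -> str:
        try:
            return self._mapping[(channel, org_id)]
        except KeyError:
            raise UnknownWorkspaceError(f"no tenant for {channel}:{org_id}") from None


directory = TenantDirectory()
directory.register("slack", "T0123ABCD", "tenant-acme")    # Slack workspace ID (sample)
directory.register("teams", "9f1c-example", "tenant-acme")  # Teams tenant ID (sample)

tenant_id = directory.resolve("slack", "T0123ABCD")
```

Keying on the `(channel, org_id)` pair rather than the org ID alone avoids collisions between identifiers issued by different platforms.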
---
## AI Employee Model
### Hierarchy
```
Company (AI-run)
└── Team
    └── Employee (Agent)
        ├── Role definition (system prompt + persona)
        ├── Skills (tool bindings)
        ├── Memory (vector store + conversation history)
        ├── Channels (which messaging platforms it's active on)
        └── Escalation rules (when to hand off to human or another agent)
```
### Employee Configuration (example)
```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```
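One possible way to evaluate the `escalation` rules is to turn each condition into a predicate over per-conversation state. The sketch below hard-codes the two example conditions as Python callables instead of parsing the string syntax, which this document does not specify. `ConversationState`, `EscalationRule`, and `first_action` are hypothetical names, and the sentiment range is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ConversationState:
    topic: str        # e.g. "billing_dispute"
    attempts: int     # failed resolution attempts so far
    sentiment: float  # assumed range: -1.0 (negative) to 1.0 (positive)


@dataclass(frozen=True)
class EscalationRule:
    condition: Callable[[ConversationState], bool]
    action: str


# Hand-written equivalents of the two conditions in the YAML example.
RULES = [
    EscalationRule(lambda s: s.topic == "billing_dispute" and s.attempts > 2, "handoff_human"),
    EscalationRule(lambda s: s.sentiment < -0.7, "handoff_human"),
]


def first_action(state: ConversationState, rules: list[EscalationRule] = RULES) -> Optional[str]:
    """Return the action of the first matching rule, or None if no rule fires."""
    for rule in rules:
        if rule.condition(state):
            return rule.action
    return None
```

Rules are checked in order, so more specific conditions would go first if actions ever differ.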
### Team Orchestration
Teams use a coordinator pattern:
1. **Coordinator agent** receives the inbound message
2. Coordinator decides which team member(s) should handle it (routing)
3. Specialist agent(s) execute their part
4. Coordinator assembles the final response or delegates follow-up
5. All inter-agent communication logged for audit
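The coordinator steps above might look roughly like this in code. It is a synchronous sketch with stubbed specialists: `Coordinator`, `keyword_route`, and the canned replies are hypothetical, and the keyword match stands in for whatever routing decision (likely LLM-based) the real coordinator makes.

```python
from dataclasses import dataclass, field
from typing import Callable

Specialist = Callable[[str], str]


@dataclass
class Coordinator:
    specialists: dict[str, Specialist]
    route: Callable[[str], list[str]]  # message -> specialist names, in order
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def handle(self, message: str) -> str:
        parts = []
        for name in self.route(message):
            reply = self.specialists[name](message)
            # Every coordinator-to-specialist exchange is recorded for audit.
            self.audit_log.append((name, message, reply))
            parts.append(reply)
        # Assemble the final response from the specialists' parts.
        return "\n".join(parts)


def keyword_route(message: str) -> list[str]:
    # Stand-in for the coordinator's routing decision.
    text = message.lower()
    chosen = [name for name, kw in (("billing", "invoice"), ("support", "error")) if kw in text]
    return chosen or ["support"]


team = Coordinator(
    specialists={
        "billing": lambda m: "Billing: invoice resent.",
        "support": lambda m: "Support: investigating the error.",
    },
    route=keyword_route,
)
reply = team.handle("My invoice page shows an error")
```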
---
## LLM Backend Strategy
### Provider Hierarchy
```
┌─────────────────────────────────────────┐
│             LiteLLM Router              │
│  (load balancing, fallback, cost caps)  │
└────┬──────────┬──────────┬─────────┬────┘
     │          │          │         │
   Ollama     vLLM     Anthropic  OpenAI
  (local)    (local)     (API)     (API)

              BYO Endpoint
           (customer-provided)
```
### Routing Logic
1. **Tenant config** specifies preferred provider(s) and fallback chain
2. **Cost caps** per tenant (daily/monthly spend limits)
3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants
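A rough sketch of how the first three routing rules could combine into an ordered list of models to try. It deliberately does not use the LiteLLM API. `TenantLLMPolicy` and `candidate_models` are hypothetical, and the choice to fall back to the local model once the cap is reached assumes local inference is not billed against the tenant's cap.

```python
from dataclasses import dataclass


@dataclass
class TenantLLMPolicy:
    fallback_chain: list[str]  # commercial providers, preferred first
    local_model: str           # smaller/local model for simple queries
    daily_cap_usd: float
    spent_today_usd: float = 0.0


def candidate_models(policy: TenantLLMPolicy, task: str, est_cost_usd: float) -> list[str]:
    """Return the models to try, in order, for one request."""
    over_cap = policy.spent_today_usd + est_cost_usd > policy.daily_cap_usd
    if over_cap:
        # Assumption: local inference does not count against the spend cap.
        return [policy.local_model]
    if task == "simple":
        return [policy.local_model, *policy.fallback_chain]
    # Complex reasoning goes to commercial APIs first, local model last.
    return [*policy.fallback_chain, policy.local_model]


policy = TenantLLMPolicy(
    fallback_chain=["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
    local_model="ollama/qwen3:32b",
    daily_cap_usd=10.0,
)
order = candidate_models(policy, task="complex", est_cost_usd=0.05)
```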
---
## Messaging Format (Internal)
All channel adapters normalize messages into this format:
```python
from datetime import datetime

from pydantic import BaseModel

# ChannelType, SenderInfo, and MessageContent are assumed to be defined
# alongside this model (e.g. in packages/shared/models).


class KonstructMessage(BaseModel):
    id: str                  # UUID
    tenant_id: str           # Konstruct tenant
    channel: ChannelType     # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict   # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo       # User ID, display name, role
    content: MessageContent  # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None    # For threaded conversations
    reply_to: str | None     # Parent message ID
    context: dict            # Extracted intent, entities, sentiment (populated downstream)
```
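As an illustration of what a channel adapter does, here is a sketch that normalizes a Slack Events API message payload. To stay dependency-free it uses a stdlib dataclass, `NormalizedMessage`, as a reduced stand-in for `KonstructMessage`. That class, `normalize_slack_event`, and the sample payload values are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class NormalizedMessage:
    # Stdlib stand-in for KonstructMessage, reduced to the fields used here.
    id: str
    tenant_id: str
    channel: str
    channel_metadata: dict[str, Any]
    sender_id: str
    text: str
    timestamp: datetime
    thread_id: Optional[str] = None
    context: dict[str, Any] = field(default_factory=dict)


def normalize_slack_event(payload: dict[str, Any], tenant_id: str) -> NormalizedMessage:
    event = payload["event"]
    return NormalizedMessage(
        id=str(uuid.uuid4()),
        tenant_id=tenant_id,
        channel="slack",
        channel_metadata={"workspace": payload["team_id"], "channel": event["channel"]},
        sender_id=event["user"],
        text=event.get("text", ""),
        # Slack sends "ts" as a string of epoch seconds.
        timestamp=datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc),
        thread_id=event.get("thread_ts"),  # absent for non-threaded messages
    )


sample = {
    "team_id": "T0123ABCD",
    "event": {"type": "message", "user": "U042", "text": "hi", "channel": "C99", "ts": "1711300000.000100"},
}
msg = normalize_slack_event(sample, tenant_id="tenant-acme")
```

Everything downstream of the gateway only sees the normalized shape, which is what keeps the core channel-agnostic.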
---
## Security & Compliance
### Non-Negotiables
- **Encryption at rest** (PostgreSQL TDE, MinIO server-side encryption)
- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- **Audit log** for every agent action, tool invocation, and LLM call
- **RBAC** per tenant (admin, manager, member, viewer)
- **Rate limiting** per tenant, per channel, per agent
- **PII handling** — configurable PII detection and redaction per tenant
### Future Compliance Targets
- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)
---
## Development Workflow
### Local Dev
```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests (from the repo root, in a second terminal)
pytest tests/unit -x
pytest tests/integration -x
```
### Branch Strategy
- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off develop
- `fix/*` — bugfix branches
- `release/*` — release candidates
### CI Pipeline
1. Lint (`ruff check`, `ruff format --check`)
2. Type check (`mypy --strict`)
3. Unit tests (`pytest tests/unit`)
4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
5. Container build + scan (`trivy image`)
6. Deploy to staging (auto on `develop` merge)
7. Deploy to production (manual approval on `release/*` merge)
---
## Milestones
### Phase 1: Foundation (Weeks 1-6)
- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
- [ ] PostgreSQL schema with RLS multi-tenancy
- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
- [ ] LiteLLM integration with Ollama + one commercial API
- [ ] Basic admin portal (tenant CRUD, agent config)
### Phase 2: Channel Expansion + Teams (Weeks 7-12)
- [ ] Add channels: Mattermost, WhatsApp, Teams
- [ ] Multi-agent teams with coordinator pattern
- [ ] Conversational memory (vector store + sliding window)
- [ ] Tool framework (registry, execution, sandboxing)
- [ ] BYO API key support
- [ ] Tenant onboarding flow in portal
### Phase 3: Polish + Launch (Weeks 13-18)
- [ ] Add channels: Rocket.Chat, Signal
- [ ] AI company hierarchy (teams of teams)
- [ ] Cost tracking and billing integration (Stripe)
- [ ] Agent performance analytics dashboard
- [ ] Self-hosted deployment option (Helm chart + docs)
- [ ] Public launch (Product Hunt, Hacker News, Reddit)
### Phase 4: Scale (Post-Launch)
- [ ] Kubernetes migration for production workloads
- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
- [ ] Marketplace for pre-built AI employee templates
- [ ] Enterprise tier with dedicated isolation
- [ ] SOC 2 preparation
- [ ] API for programmatic agent management
---
## Coding Standards
### Python
- **Version:** 3.12+
- **Package manager:** `uv`
- **Linting:** `ruff` (replaces flake8, isort, black)
- **Type checking:** `mypy --strict` — no `Any` types in public interfaces
- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- **Models:** `Pydantic v2` for all data validation and serialization
- **Async:** Prefer `async def` for all I/O-bound operations
- **DB:** `SQLAlchemy 2.0` async with Alembic migrations
### TypeScript (Portal)
- **Runtime:** Node 20+ LTS
- **Framework:** Next.js 14+ (App Router)
- **Linting:** `eslint` + `prettier`
- **Type checking:** `strict: true` in tsconfig
### General
- Every PR requires at least one approval
- No secrets in code — use `.env` + secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)
---
## Key Design Decisions (ADR Stubs)
These need full ADRs written before implementation:
1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
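To ground the ADR-007 discussion, here is a minimal per-tenant token bucket. It is an in-process sketch only: `TokenBucket` and `TenantRateLimiter` are hypothetical names, and a multi-process deployment would presumably keep bucket state in Redis instead of a local dict. The injectable `clock` exists so the behavior can be tested without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained request rate
    clock: Callable[[], float] = time.monotonic
    tokens: float = field(init=False, default=0.0)
    updated_at: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full
        self.updated_at = self.clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            # Buckets are created lazily, one per tenant.
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()
```

The same structure extends to per-channel or per-agent limits by widening the dictionary key.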
---
## Open Questions
- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
- [ ] White-labeling for agencies reselling Konstruct?
---
## References
- [paperclip.ing](https://paperclip.ing) — Inspiration
- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework