# CLAUDE.md — Konstruct

## What is Konstruct?

Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.

Think of it as "Hire an AI department" — not another chatbot SaaS.

---

## Project Identity

- **Codename:** Konstruct
- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
- **Inspired by:** [paperclip.ing](https://paperclip.ing)
- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support

---

## Architecture Overview

### Core Mental Model

```
Client (Slack/Teams/etc.)
         │
         ▼
┌──────────────────────┐
│   Channel Gateway    │  ← Unified ingress for all messaging platforms
│   (webhook/WS)       │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│   Message Router     │  ← Tenant resolution, rate limiting, context loading
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│   Agent Orchestrator │  ← Agent selection, tool dispatch, memory, handoffs
│   (per-tenant)       │
└────────┬─────────────┘
         │
         ▼
┌──────────────────────┐
│   LLM Backend Pool   │  ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└──────────────────────┘
```

### Key Architectural Principles

1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
4. **Agents are composable** — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.

---

## Tech Stack

### Backend (Primary: Python)

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |

### Messaging Channel SDKs

| Channel | Library / Integration |
|---------|----------------------|
| Slack | `slack-bolt` (Events API + Socket Mode) |
| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
| Mattermost | `mattermostdriver` + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | `python-telegram-bot` (Bot API) |
| Signal | `signal-cli` or `signald` (bridge) |

### Frontend (Admin Dashboard / Client Portal)

| Layer | Technology |
|-------|-----------|
| Framework | **Next.js 14+** (App Router) |
| UI | **Tailwind CSS + shadcn/ui** |
| State | **TanStack Query** |
| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |

### Infrastructure

| Layer | Technology |
|-------|-----------|
| Dev Orchestration | **Docker Compose + Portainer** |
| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
| Core Hosting | **Hetzner Dedicated Servers** |
| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
| DNS | **Technitium** (internal) / **Cloudflare** (external) |
| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
| Monitoring | **Prometheus + Grafana + Loki** |
| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |

---

## Repo Structure

Monorepo to start, split later when service boundaries stabilize.

```
konstruct/
├── CLAUDE.md                    # This file
├── docker-compose.yml           # Local dev environment
├── docker-compose.prod.yml      # Production-like local stack
├── k8s/                         # Kubernetes manifests / Helm charts
│   ├── base/
│   └── overlays/
│       ├── staging/
│       └── production/
├── packages/
│   ├── gateway/                 # Channel Gateway service
│   │   ├── channels/            # Per-channel adapters (slack, teams, etc.)
│   │   ├── normalize.py         # Unified message format
│   │   └── main.py
│   ├── router/                  # Message Router service
│   │   ├── tenant.py            # Tenant resolution
│   │   ├── ratelimit.py
│   │   └── main.py
│   ├── orchestrator/            # Agent Orchestrator service
│   │   ├── agents/              # Agent definitions and behaviors
│   │   ├── teams/               # Multi-agent team logic
│   │   ├── tools/               # Tool registry and execution
│   │   ├── memory/              # Conversation and long-term memory
│   │   └── main.py
│   ├── llm-pool/                # LLM Backend Pool service
│   │   ├── providers/           # Provider configs (litellm router)
│   │   ├── byo/                 # BYO key / endpoint management
│   │   └── main.py
│   ├── portal/                  # Next.js admin dashboard
│   │   ├── app/
│   │   ├── components/
│   │   └── lib/
│   └── shared/                  # Shared Python libs
│       ├── models/              # Pydantic models, DB schemas
│       ├── auth/                # Auth utilities
│       ├── messaging/           # Internal message format
│       └── config/              # Shared config / env management
├── migrations/                  # Alembic DB migrations
├── scripts/                     # Dev scripts, seed data, utilities
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docs/                        # Architecture docs, ADRs, runbooks
├── pyproject.toml               # Python monorepo config (uv / hatch)
└── .env.example
```

---

## Multi-Tenancy Model

Tiered isolation — the level increases with the subscription plan:

| Tier | Isolation | Target |
|------|-----------|--------|
| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |

### Tenant Resolution Flow

1. Inbound message hits Channel Gateway
2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
3. Router maps channel org → Konstruct tenant via lookup table
4. All subsequent processing is scoped to that tenant's context, models, tools, and memory
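
The flow above can be sketched as a keyed lookup. This is a minimal illustration, not the planned implementation: `TenantContext`, `UnknownWorkspaceError`, `WORKSPACE_TO_TENANT`, and the sample IDs are all hypothetical, and the in-memory dict stands in for whatever lookup table the Router ends up using.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical per-tenant scope handed to downstream services."""
    tenant_id: str
    tier: str


class UnknownWorkspaceError(LookupError):
    """Raised when a channel org has no Konstruct tenant mapped to it."""


# Stand-in for the lookup table: (channel, channel org ID) -> tenant.
# Assumption: in practice this would live in the database, likely cached.
WORKSPACE_TO_TENANT: dict[tuple[str, str], TenantContext] = {
    ("slack", "T0123ABCD"): TenantContext(tenant_id="tenant-acme", tier="team"),
    ("teams", "9f1c-example"): TenantContext(tenant_id="tenant-acme", tier="team"),
}


def resolve_tenant(channel: str, org_id: str) -> TenantContext:
    """Map a channel org identifier to the Konstruct tenant that owns it."""
    try:
        return WORKSPACE_TO_TENANT[(channel, org_id)]
    except KeyError:
        # Fail closed: an unmapped workspace never falls back to a default tenant.
        raise UnknownWorkspaceError(f"no tenant for {channel}:{org_id}") from None
```

Keying on the `(channel, org_id)` pair rather than the org ID alone avoids collisions between identifiers issued by different platforms, and raising on a miss keeps an unknown workspace from being processed under another tenant's scope.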

---

## AI Employee Model

### Hierarchy

```
Company (AI-run)
  └── Team
       └── Employee (Agent)
            ├── Role definition (system prompt + persona)
            ├── Skills (tool bindings)
            ├── Memory (vector store + conversation history)
            ├── Channels (which messaging platforms it's active on)
            └── Escalation rules (when to hand off to human or another agent)
```

### Employee Configuration (example)

```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```

### Team Orchestration

Teams use a coordinator pattern:

1. **Coordinator agent** receives the inbound message
2. Coordinator decides which team member(s) should handle it (routing)
3. Specialist agent(s) execute their part
4. Coordinator assembles the final response or delegates follow-up
5. All inter-agent communication is logged for audit
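
A rough sketch of these steps, under heavy simplification: specialists are plain functions, routing is keyword matching standing in for an LLM routing decision, and the audit log is an in-memory list. Every name here (`Coordinator`, `Specialist`, `team`) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable

# A specialist is modeled as a plain function from request text to reply text.
Specialist = Callable[[str], str]


@dataclass
class Coordinator:
    """Routes a message to specialists, then assembles one reply."""
    specialists: dict[str, Specialist]
    keywords: dict[str, str]  # keyword -> specialist name (stand-in for LLM routing)
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def route(self, text: str) -> list[str]:
        """Step 2: pick the team members whose keywords appear in the text."""
        lowered = text.lower()
        chosen = [name for kw, name in self.keywords.items() if kw in lowered]
        # Remove duplicates while preserving first-seen order.
        return list(dict.fromkeys(chosen))

    def handle(self, text: str) -> str:
        """Steps 1, 3, 4, 5: receive, dispatch, assemble, and log."""
        parts = []
        for name in self.route(text):
            self.audit_log.append(("coordinator", name, text))  # step 5
            reply = self.specialists[name](text)                # step 3
            self.audit_log.append((name, "coordinator", reply))
            parts.append(reply)
        # Step 4: assemble the final response.
        return "\n".join(parts) if parts else "No specialist matched this request."


team = Coordinator(
    specialists={
        "billing": lambda text: "Billing: refund request noted.",
        "support": lambda text: "Support: try restarting the app.",
    },
    keywords={"refund": "billing", "invoice": "billing", "crash": "support"},
)
```

Calling `team.handle("The app keeps crashing and I want a refund")` dispatches to both specialists and records four audit entries, one per direction of each exchange.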

---

## LLM Backend Strategy

### Provider Hierarchy

```
┌─────────────────────────────────────────┐
│              LiteLLM Router             │
│  (load balancing, fallback, cost caps)  │
└────┬──────────┬──────────┬─────────┬────┘
     │          │          │         │
  Ollama     vLLM     Anthropic   OpenAI
  (local)   (local)    (API)      (API)
                                    │
                              BYO Endpoint
                            (customer-provided)
```

### Routing Logic

1. **Tenant config** specifies preferred provider(s) and fallback chain
2. **Cost caps** per tenant (daily/monthly spend limits)
3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants
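
To make the first three rules concrete, here is a small sketch written in plain Python rather than against LiteLLM's API. `TenantLLMPolicy`, `candidate_models`, `complete`, and the `call_model` callback are hypothetical; the model identifiers are borrowed from the employee example earlier in this file, and treating `ConnectionError` as the only retryable failure is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TenantLLMPolicy:
    """Hypothetical per-tenant routing policy; field names are illustrative."""
    simple_chain: list[str]   # tried in order for simple queries
    complex_chain: list[str]  # tried in order for complex reasoning
    daily_cap_usd: float
    spent_today_usd: float = 0.0


class CostCapExceeded(RuntimeError):
    """Raised when a request would push the tenant past its spend limit."""


def candidate_models(policy: TenantLLMPolicy, task: str, est_cost_usd: float) -> list[str]:
    """Return the ordered fallback chain for a task, enforcing the spend cap."""
    if policy.spent_today_usd + est_cost_usd > policy.daily_cap_usd:
        raise CostCapExceeded("daily spend limit reached")
    return policy.complex_chain if task == "complex" else policy.simple_chain


def complete(
    policy: TenantLLMPolicy,
    task: str,
    est_cost_usd: float,
    call_model: Callable[[str], str],
) -> str:
    """Try each model in the chain until one succeeds, then record the spend."""
    last_error: Exception | None = None
    for model in candidate_models(policy, task, est_cost_usd):
        try:
            result = call_model(model)
        except ConnectionError as exc:
            last_error = exc  # provider unavailable: fall through to the next one
            continue
        policy.spent_today_usd += est_cost_usd
        return result
    raise RuntimeError("all providers in the chain failed") from last_error
```

The cap is checked before any provider is called, so a tenant at its limit fails fast instead of spending on a request that would exceed it.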

---

## Messaging Format (Internal)

All channel adapters normalize messages into this format:

```python
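from datetime import datetime

from pydantic import BaseModel

# ChannelType, SenderInfo, and MessageContent are project types not shown here.


class KonstructMessage(BaseModel):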
    id: str                          # UUID
    tenant_id: str                   # Konstruct tenant
    channel: ChannelType             # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict           # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo               # User ID, display name, role
    content: MessageContent          # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None            # For threaded conversations
    reply_to: str | None             # Parent message ID
    context: dict                    # Extracted intent, entities, sentiment (populated downstream)
```

---

## Security & Compliance

### Non-Negotiables

- **Encryption at rest** (encrypted volumes or a TDE-capable PostgreSQL distribution, since community PostgreSQL 16 has no built-in TDE; MinIO server-side encryption)
- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- **Audit log** for every agent action, tool invocation, and LLM call
- **RBAC** per tenant (admin, manager, member, viewer)
- **Rate limiting** per tenant, per channel, per agent
- **PII handling** — configurable PII detection and redaction per tenant
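
For the rate-limiting item, one possible shape is a token bucket (the approach named in the ADR-007 stub) keyed by tenant, channel, and agent. The sketch keeps state in process memory for clarity; a shared store such as Redis would be needed across multiple workers. `RateLimiter` and its parameters are hypothetical.

```python
import time
from typing import Callable


class RateLimiter:
    """Token bucket per (tenant, channel, agent) key, held in memory."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock  # injectable so tests can control time
        # key -> (tokens remaining, time of last update)
        self._buckets: dict[tuple[str, str, str], tuple[float, float]] = {}

    def allow(self, tenant: str, channel: str, agent: str, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return whether the call may proceed."""
        key = (tenant, channel, agent)
        now = self._clock()
        # A key seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(key, (self._capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self._capacity, tokens + (now - last) * self._refill)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[key] = (tokens, now)
        return allowed
```

With `capacity=2` and `refill_per_sec=1.0`, a key can burst two requests immediately and then sustain one per second; other keys are unaffected because each has its own bucket.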

### Future Compliance Targets

- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)

---

## Development Workflow

### Local Dev

```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests
pytest tests/unit -x
pytest tests/integration -x
```

### Branch Strategy

- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off develop
- `fix/*` — bugfix branches
- `release/*` — release candidates

### CI Pipeline

1. Lint (`ruff check`, `ruff format --check`)
2. Type check (`mypy --strict`)
3. Unit tests (`pytest tests/unit`)
4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
5. Container build + scan (`trivy image`)
6. Deploy to staging (auto on `develop` merge)
7. Deploy to production (manual approval on `release/*` merge)

---

## Milestones

### Phase 1: Foundation (Weeks 1–6)

- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
- [ ] PostgreSQL schema with RLS multi-tenancy
- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
- [ ] LiteLLM integration with Ollama + one commercial API
- [ ] Basic admin portal (tenant CRUD, agent config)

### Phase 2: Channel Expansion + Teams (Weeks 7–12)

- [ ] Add channels: Mattermost, WhatsApp, Teams
- [ ] Multi-agent teams with coordinator pattern
- [ ] Conversational memory (vector store + sliding window)
- [ ] Tool framework (registry, execution, sandboxing)
- [ ] BYO API key support
- [ ] Tenant onboarding flow in portal

### Phase 3: Polish + Launch (Weeks 13–18)

- [ ] Add channels: Rocket.Chat, Signal
- [ ] AI company hierarchy (teams of teams)
- [ ] Cost tracking and billing integration (Stripe)
- [ ] Agent performance analytics dashboard
- [ ] Self-hosted deployment option (Helm chart + docs)
- [ ] Public launch (Product Hunt, Hacker News, Reddit)

### Phase 4: Scale (Post-Launch)

- [ ] Kubernetes migration for production workloads
- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
- [ ] Marketplace for pre-built AI employee templates
- [ ] Enterprise tier with dedicated isolation
- [ ] SOC 2 preparation
- [ ] API for programmatic agent management

---

## Coding Standards

### Python

- **Version:** 3.12+
- **Package manager:** `uv`
- **Linting:** `ruff` (replaces flake8, isort, black)
- **Type checking:** `mypy --strict` — no `Any` types in public interfaces
- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- **Models:** `Pydantic v2` for all data validation and serialization
- **Async:** Prefer `async def` for all I/O-bound operations
- **DB:** `SQLAlchemy 2.0` async with Alembic migrations

### TypeScript (Portal)

- **Runtime:** Node 20+ LTS
- **Framework:** Next.js 14+ (App Router)
- **Linting:** `eslint` + `prettier`
- **Type checking:** `strict: true` in tsconfig

### General

- Every PR requires at least one approval
- No secrets in code — use `.env` + secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)

---

## Key Design Decisions (ADR Stubs)

These need full ADRs written before implementation:

1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus

---

## Open Questions

- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
- [ ] White-labeling for agencies reselling Konstruct?

---

## References

- [paperclip.ing](https://paperclip.ing) — Inspiration
- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework