fix: runtime deployment fixes for Docker Compose stack

- Add .gitignore for __pycache__, node_modules, .playwright-mcp
- Add CLAUDE.md project instructions
- docker-compose: remove host port exposure for internal services, remove Ollama container (use host), add CORS origin, bake NEXT_PUBLIC_API_URL at build time, run alembic migrations on gateway startup, add CPU-only torch pre-install
- gateway: add CORS middleware, graceful Slack degradation without bot token, fix None guard on slack_handler
- gateway pyproject: add aiohttp dependency for slack-bolt async
- llm-pool pyproject: install litellm from GitHub (removed from PyPI), enable hatch direct references
- portal: enable standalone output in next.config.ts
- Remove orphaned migration 003_phase2_audit_kb.py (renamed to 004)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:26:34 -06:00
parent d936bcf361
commit 0e0ea5fb66
9 changed files with 694 additions and 293 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,455 @@
+# CLAUDE.md — Konstruct
+
+## What is Konstruct?
+
+Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.
+
+Think of it as "Hire an AI department" — not another chatbot SaaS.
+
+---
+
+## Project Identity
+
+- **Codename:** Konstruct
+- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
+- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
+- **Inspired by:** [paperclip.ing](https://paperclip.ing)
+- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support
+
+---
+
+## Architecture Overview
+
+### Core Mental Model
+
+```
+Client (Slack/Teams/etc.)
+        │
+        ▼
+┌─────────────────────┐
+│   Channel Gateway    │  ← Unified ingress for all messaging platforms
+│   (webhook/WS)      │
+└────────┬────────────┘
+         │
+         ▼
+┌─────────────────────┐
+│   Message Router     │  ← Tenant resolution, rate limiting, context loading
+└────────┬────────────┘
+         │
+         ▼
+┌─────────────────────┐
+│   Agent Orchestrator │  ← Agent selection, tool dispatch, memory, handoffs
+│   (per-tenant)       │
+└────────┬────────────┘
+         │
+         ▼
+┌─────────────────────┐
+│   LLM Backend Pool   │  ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
+└─────────────────────┘
+```
+
+### Key Architectural Principles
+
+1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
+2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
+3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
+4. **Agents are composable** — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.
+
+---
+
+## Tech Stack
+
+### Backend (Primary: Python)
+
+| Layer | Technology | Rationale |
+|-------|-----------|-----------|
+| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
+| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
+| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
+| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
+| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
+| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
+| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
+| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |
+
+### Messaging Channel SDKs
+
+| Channel | Library / Integration |
+|---------|----------------------|
+| Slack | `slack-bolt` (Events API + Socket Mode) |
+| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
+| Mattermost | `mattermostdriver` + webhooks |
+| Rocket.Chat | REST API + Realtime API (WebSocket) |
+| WhatsApp | WhatsApp Business API (Cloud API) |
+| Telegram | `python-telegram-bot` (Bot API) |
+| Signal | `signal-cli` or `signald` (bridge) |
+
+### Frontend (Admin Dashboard / Client Portal)
+
+| Layer | Technology |
+|-------|-----------|
+| Framework | **Next.js 14+** (App Router) |
+| UI | **Tailwind CSS + shadcn/ui** |
+| State | **TanStack Query** |
+| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |
+
+### Infrastructure
+
+| Layer | Technology |
+|-------|-----------|
+| Dev Orchestration | **Docker Compose + Portainer** |
+| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
+| Core Hosting | **Hetzner Dedicated Servers** |
+| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
+| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
+| DNS | **Technitium** (internal) / **Cloudflare** (external) |
+| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
+| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
+| Monitoring | **Prometheus + Grafana + Loki** |
+| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |
+
+---
+
+## Repo Structure
+
+Monorepo to start, split later when service boundaries stabilize.
+
+```
+konstruct/
+├── CLAUDE.md                    # This file
+├── docker-compose.yml           # Local dev environment
+├── docker-compose.prod.yml      # Production-like local stack
+├── k8s/                         # Kubernetes manifests / Helm charts
+│   ├── base/
+│   └── overlays/
+│       ├── staging/
+│       └── production/
+├── packages/
+│   ├── gateway/                 # Channel Gateway service
+│   │   ├── channels/            # Per-channel adapters (slack, teams, etc.)
+│   │   ├── normalize.py         # Unified message format
+│   │   └── main.py
+│   ├── router/                  # Message Router service
+│   │   ├── tenant.py            # Tenant resolution
+│   │   ├── ratelimit.py
+│   │   └── main.py
+│   ├── orchestrator/            # Agent Orchestrator service
+│   │   ├── agents/              # Agent definitions and behaviors
+│   │   ├── teams/               # Multi-agent team logic
+│   │   ├── tools/               # Tool registry and execution
+│   │   ├── memory/              # Conversation and long-term memory
+│   │   └── main.py
+│   ├── llm-pool/                # LLM Backend Pool service
+│   │   ├── providers/           # Provider configs (litellm router)
+│   │   ├── byo/                 # BYO key / endpoint management
+│   │   └── main.py
+│   ├── portal/                  # Next.js admin dashboard
+│   │   ├── app/
+│   │   ├── components/
+│   │   └── lib/
+│   └── shared/                  # Shared Python libs
+│       ├── models/              # Pydantic models, DB schemas
+│       ├── auth/                # Auth utilities
+│       ├── messaging/           # Internal message format
+│       └── config/              # Shared config / env management
+├── migrations/                  # Alembic DB migrations
+├── scripts/                     # Dev scripts, seed data, utilities
+├── tests/
+│   ├── unit/
+│   ├── integration/
+│   └── e2e/
+├── docs/                        # Architecture docs, ADRs, runbooks
+├── pyproject.toml               # Python monorepo config (uv / hatch)
+└── .env.example
+```
+
+---
+
+## Multi-Tenancy Model
+
+Tiered isolation — the level increases with the subscription plan:
+
+| Tier | Isolation | Target |
+|------|-----------|--------|
+| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
+| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
+| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
+| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |
+
+### Tenant Resolution Flow
+
+1. Inbound message hits Channel Gateway
+2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
+3. Router maps channel org → Konstruct tenant via lookup table
+4. All subsequent processing is scoped to that tenant's context, models, tools, and memory
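+
+The resolution flow above can be sketched roughly as follows. This is an illustration only, not the actual router: the in-memory `ORG_TO_TENANT` table, the `resolve_tenant` helper, the `UnknownTenantError` type, and the sample IDs are all hypothetical, and a real lookup would presumably query PostgreSQL.
+
+```python
+from dataclasses import dataclass
+
+
+class UnknownTenantError(LookupError):
+    """Raised when a channel org has no Konstruct tenant mapping."""
+
+
+@dataclass(frozen=True)
+class ChannelOrg:
+    channel: str  # e.g. "slack", "teams"
+    org_id: str   # Slack workspace ID, Teams tenant ID, etc.
+
+
+# Hypothetical lookup table: (channel, org ID) -> Konstruct tenant ID.
+ORG_TO_TENANT: dict[ChannelOrg, str] = {
+    ChannelOrg("slack", "T0123ABCD"): "tenant-acme",
+    ChannelOrg("teams", "9f1c-example"): "tenant-acme",
+}
+
+
+def resolve_tenant(channel: str, org_id: str) -> str:
+    """Map a channel-level org identifier to a Konstruct tenant ID."""
+    try:
+        return ORG_TO_TENANT[ChannelOrg(channel, org_id)]
+    except KeyError:
+        raise UnknownTenantError(f"no tenant for {channel}:{org_id}") from None
+```
+
+Failing loudly on an unknown org, rather than falling back to a default tenant, is one way to keep step 4 from ever running in the wrong tenant's context.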
+
+---
+
+## AI Employee Model
+
+### Hierarchy
+
+```
+Company (AI-run)
+  └── Team
+       └── Employee (Agent)
+            ├── Role definition (system prompt + persona)
+            ├── Skills (tool bindings)
+            ├── Memory (vector store + conversation history)
+            ├── Channels (which messaging platforms it's active on)
+            └── Escalation rules (when to hand off to human or another agent)
+```
+
+### Employee Configuration (example)
+
+```yaml
+employee:
+  name: "Mara"
+  role: "Customer Support Lead"
+  persona: |
+    Professional, empathetic, solution-oriented.
+    Fluent in English, Spanish, Portuguese.
+    Escalates billing disputes to human after 2 failed resolutions.
+  model:
+    primary: "anthropic/claude-sonnet-4-20250514"
+    fallback: "openai/gpt-4o"
+    local: "ollama/qwen3:32b"
+  tools:
+    - zendesk_ticket_create
+    - zendesk_ticket_lookup
+    - knowledge_base_search
+    - calendar_book
+  channels:
+    - slack
+    - whatsapp
+  memory:
+    type: "conversational + rag"
+    retention_days: 90
+  escalation:
+    - condition: "billing_dispute AND attempts > 2"
+      action: "handoff_human"
+    - condition: "sentiment < -0.7"
+      action: "handoff_human"
+```
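+
+One possible way to evaluate the `escalation` rules from the example config is sketched below. It assumes the condition strings have already been turned into Python predicates; `ConversationState`, `EscalationRule`, and `first_action` are hypothetical names, and how the real orchestrator parses conditions is not decided here.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+@dataclass(frozen=True)
+class ConversationState:
+    topic: str        # e.g. "billing_dispute"
+    attempts: int     # resolution attempts so far
+    sentiment: float  # -1.0 (negative) to 1.0 (positive)
+
+
+@dataclass(frozen=True)
+class EscalationRule:
+    condition: str  # original condition text, kept for audit logs
+    predicate: Callable[[ConversationState], bool]
+    action: str
+
+
+# Hand-written equivalents of the two rules in the YAML example.
+RULES = [
+    EscalationRule(
+        "billing_dispute AND attempts > 2",
+        lambda s: s.topic == "billing_dispute" and s.attempts > 2,
+        "handoff_human",
+    ),
+    EscalationRule("sentiment < -0.7", lambda s: s.sentiment < -0.7, "handoff_human"),
+]
+
+
+def first_action(state: ConversationState, rules: list[EscalationRule] = RULES) -> str | None:
+    """Return the action of the first matching rule, or None if nothing matches."""
+    for rule in rules:
+        if rule.predicate(state):
+            return rule.action
+    return None
+```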
+
+### Team Orchestration
+
+Teams use a coordinator pattern:
+
+1. **Coordinator agent** receives the inbound message
+2. Coordinator decides which team member(s) should handle it (routing)
+3. Specialist agent(s) execute their part
+4. Coordinator assembles the final response or delegates follow-up
+5. All inter-agent communication is logged for audit
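+
+A toy version of the coordinator steps above might look like this. It is a sketch under stated assumptions: routing is a naive keyword match, the specialists are stub coroutines instead of LLM-backed agents, and `Coordinator`, `billing`, and `shipping` are hypothetical names.
+
+```python
+import asyncio
+from dataclasses import dataclass, field
+from typing import Awaitable, Callable
+
+Specialist = Callable[[str], Awaitable[str]]
+
+
+@dataclass
+class Coordinator:
+    specialists: dict[str, Specialist]
+    audit_log: list[tuple[str, str, str]] = field(default_factory=list)
+
+    def route(self, message: str) -> list[str]:
+        """Pick specialists whose name appears in the message (toy routing rule)."""
+        text = message.lower()
+        return [name for name in self.specialists if name in text]
+
+    async def handle(self, message: str) -> str:
+        chosen = self.route(message)
+        # Run the selected specialists concurrently.
+        replies = await asyncio.gather(*(self.specialists[n](message) for n in chosen))
+        for name, reply in zip(chosen, replies):
+            self.audit_log.append((name, message, reply))  # audit trail
+        return " | ".join(replies) if replies else "No specialist available."
+
+
+async def billing(message: str) -> str:
+    return "billing: invoice re-sent"
+
+
+async def shipping(message: str) -> str:
+    return "shipping: parcel is in transit"
+```
+
+With `Coordinator({"billing": billing, "shipping": shipping})`, calling `asyncio.run(coordinator.handle("Question about billing and shipping"))` runs both stubs and joins their replies in registration order.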
+
+---
+
+## LLM Backend Strategy
+
+### Provider Hierarchy
+
+```
+┌─────────────────────────────────────────┐
+│              LiteLLM Router             │
+│  (load balancing, fallback, cost caps)  │
+└────┬──────────┬──────────┬─────────┬────┘
+     │          │          │         │
+  Ollama     vLLM     Anthropic   OpenAI
+  (local)   (local)    (API)      (API)
+                                    │
+                              BYO Endpoint
+                            (customer-provided)
+```
+
+### Routing Logic
+
+1. **Tenant config** specifies preferred provider(s) and fallback chain
+2. **Cost caps** per tenant (daily/monthly spend limits)
+3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
+4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants
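+
+The fallback chain and cost cap can be illustrated without any provider SDK. The sketch below deliberately does not use LiteLLM's API: `call_model` is a caller-supplied function, and `TenantRouting`, `complete_with_fallback`, and the flat per-call cost are assumptions made for illustration.
+
+```python
+from dataclasses import dataclass
+from typing import Callable
+
+
+class BudgetExceededError(RuntimeError):
+    """Raised when a tenant's daily spend cap would be exceeded."""
+
+
+@dataclass
+class TenantRouting:
+    fallback_chain: list[str]  # preferred model first
+    daily_cap_usd: float
+    spent_today_usd: float = 0.0
+
+
+def complete_with_fallback(
+    routing: TenantRouting,
+    prompt: str,
+    call_model: Callable[[str, str], str],
+    cost_per_call_usd: float = 0.01,  # flat placeholder cost, an assumption
+) -> tuple[str, str]:
+    """Try each model in order; return (model, reply) from the first that succeeds."""
+    if routing.spent_today_usd + cost_per_call_usd > routing.daily_cap_usd:
+        raise BudgetExceededError("daily spend cap reached")
+    errors: list[str] = []
+    for model in routing.fallback_chain:
+        try:
+            reply = call_model(model, prompt)
+        except ConnectionError as exc:
+            errors.append(f"{model}: {exc}")
+            continue  # fall through to the next provider in the chain
+        routing.spent_today_usd += cost_per_call_usd
+        return model, reply
+    raise RuntimeError("all providers failed: " + "; ".join(errors))
+```
+
+Checking the cap before any provider is called means a tenant over budget never triggers a billable request, at the cost of rejecting calls that a cheaper local model could have served.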
+
+---
+
+## Messaging Format (Internal)
+
+All channel adapters normalize messages into this format:
+
+```python
+from datetime import datetime
+
+from pydantic import BaseModel
+
+# ChannelType, SenderInfo, and MessageContent are assumed to be defined
+# alongside this model (for example in packages/shared/models).
+class KonstructMessage(BaseModel):
+    id: str                          # UUID
+    tenant_id: str                   # Konstruct tenant
+    channel: ChannelType             # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
+    channel_metadata: dict           # Channel-specific IDs (workspace, channel, thread)
+    sender: SenderInfo               # User ID, display name, role
+    content: MessageContent          # Text, attachments, structured data
+    timestamp: datetime
+    thread_id: str | None            # For threaded conversations
+    reply_to: str | None             # Parent message ID
+    context: dict                    # Extracted intent, entities, sentiment (populated downstream)
+```
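+
+As a rough illustration of what a channel adapter does, the sketch below normalizes a Slack Events API message callback. It uses a simplified, dependency-free dataclass rather than the Pydantic model; `NormalizedMessage` and `normalize_slack_event` are hypothetical names, and only a few of the fields are covered.
+
+```python
+import uuid
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Any
+
+
+@dataclass(frozen=True)
+class NormalizedMessage:
+    """Simplified stand-in for KonstructMessage (dataclass instead of Pydantic)."""
+
+    id: str
+    tenant_id: str
+    channel: str
+    channel_metadata: dict[str, Any]
+    sender_id: str
+    text: str
+    timestamp: datetime
+    thread_id: str | None = None
+
+
+def normalize_slack_event(payload: dict[str, Any], tenant_id: str) -> NormalizedMessage:
+    """Convert a Slack event callback payload into the internal format."""
+    event = payload["event"]
+    return NormalizedMessage(
+        id=str(uuid.uuid4()),
+        tenant_id=tenant_id,
+        channel="slack",
+        channel_metadata={"workspace": payload["team_id"], "channel": event["channel"]},
+        sender_id=event["user"],
+        text=event.get("text", ""),
+        # Slack sends timestamps as strings of epoch seconds, e.g. "1711304794.000200".
+        timestamp=datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc),
+        thread_id=event.get("thread_ts"),  # absent for non-threaded messages
+    )
+```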
+
+---
+
+## Security & Compliance
+
+### Non-Negotiables
+
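+- **Encryption at rest** (PostgreSQL via disk-level encryption or a TDE-capable distribution, since community PostgreSQL has no built-in TDE; MinIO server-side encryption)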
+- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
+- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
+- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
+- **Audit log** for every agent action, tool invocation, and LLM call
+- **RBAC** per tenant (admin, manager, member, viewer)
+- **Rate limiting** per tenant, per channel, per agent
+- **PII handling** — configurable PII detection and redaction per tenant
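+
+The per-tenant, per-channel, per-agent rate limit could be implemented as a token bucket keyed on all three values. The sketch below is in-memory and single-process only; `TokenBucket` and `RateLimiter` are hypothetical names, and a production version would more likely keep its counters in Redis.
+
+```python
+import time
+from dataclasses import dataclass, field
+
+
+@dataclass
+class TokenBucket:
+    capacity: float           # maximum burst size
+    refill_per_second: float  # sustained request rate
+    tokens: float = field(init=False)
+    updated_at: float = field(init=False)
+
+    def __post_init__(self) -> None:
+        self.tokens = self.capacity
+        self.updated_at = time.monotonic()
+
+    def allow(self, now: float | None = None) -> bool:
+        """Refill based on elapsed time, then spend one token if available."""
+        now = time.monotonic() if now is None else now
+        elapsed = max(0.0, now - self.updated_at)
+        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
+        self.updated_at = now
+        if self.tokens >= 1.0:
+            self.tokens -= 1.0
+            return True
+        return False
+
+
+class RateLimiter:
+    """One bucket per (tenant, channel, agent) key."""
+
+    def __init__(self, capacity: float, refill_per_second: float) -> None:
+        self.capacity = capacity
+        self.refill_per_second = refill_per_second
+        self._buckets: dict[tuple[str, str, str], TokenBucket] = {}
+
+    def allow(self, tenant_id: str, channel: str, agent_id: str, now: float | None = None) -> bool:
+        key = (tenant_id, channel, agent_id)
+        if key not in self._buckets:
+            self._buckets[key] = TokenBucket(self.capacity, self.refill_per_second)
+        return self._buckets[key].allow(now)
+```
+
+The optional `now` argument exists so the limiter can be tested with a fake clock; callers in normal operation would omit it.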
+
+### Future Compliance Targets
+
+- SOC 2 Type II (when revenue supports it)
+- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
+- HIPAA (Enterprise self-hosted tier only, with BAA)
+
+---
+
+## Development Workflow
+
+### Local Dev
+
+```bash
+# Clone and setup
+git clone <repo-url> && cd konstruct
+cp .env.example .env
+
+# Start all services
+docker compose up -d
+
+# Run gateway in dev mode (hot reload)
+cd packages/gateway
+uvicorn main:app --reload --port 8001
+
+# Run tests
+pytest tests/unit -x
+pytest tests/integration -x
+```
+
+### Branch Strategy
+
+- `main` — production-ready, protected
+- `develop` — integration branch
+- `feat/*` — feature branches off develop
+- `fix/*` — bugfix branches
+- `release/*` — release candidates
+
+### CI Pipeline
+
+1. Lint (`ruff check`, `ruff format --check`)
+2. Type check (`mypy --strict`)
+3. Unit tests (`pytest tests/unit`)
+4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
+5. Container build + scan (`trivy image`)
+6. Deploy to staging (auto on `develop` merge)
+7. Deploy to production (manual approval on `release/*` merge)
+
+---
+
+## Milestones
+
+### Phase 1: Foundation (Weeks 1–6)
+
+- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
+- [ ] PostgreSQL schema with RLS multi-tenancy
+- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
+- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
+- [ ] LiteLLM integration with Ollama + one commercial API
+- [ ] Basic admin portal (tenant CRUD, agent config)
+
+### Phase 2: Channel Expansion + Teams (Weeks 7–12)
+
+- [ ] Add channels: Mattermost, WhatsApp, Teams
+- [ ] Multi-agent teams with coordinator pattern
+- [ ] Conversational memory (vector store + sliding window)
+- [ ] Tool framework (registry, execution, sandboxing)
+- [ ] BYO API key support
+- [ ] Tenant onboarding flow in portal
+
+### Phase 3: Polish + Launch (Weeks 13–18)
+
+- [ ] Add channels: Rocket.Chat, Signal
+- [ ] AI company hierarchy (teams of teams)
+- [ ] Cost tracking and billing integration (Stripe)
+- [ ] Agent performance analytics dashboard
+- [ ] Self-hosted deployment option (Helm chart + docs)
+- [ ] Public launch (Product Hunt, Hacker News, Reddit)
+
+### Phase 4: Scale (Post-Launch)
+
+- [ ] Kubernetes migration for production workloads
+- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
+- [ ] Marketplace for pre-built AI employee templates
+- [ ] Enterprise tier with dedicated isolation
+- [ ] SOC 2 preparation
+- [ ] API for programmatic agent management
+
+---
+
+## Coding Standards
+
+### Python
+
+- **Version:** 3.12+
+- **Package manager:** `uv`
+- **Linting:** `ruff` (replaces flake8, isort, black)
+- **Type checking:** `mypy --strict` — no `Any` types in public interfaces
+- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
+- **Models:** `Pydantic v2` for all data validation and serialization
+- **Async:** Prefer `async def` for all I/O-bound operations
+- **DB:** `SQLAlchemy 2.0` async with Alembic migrations
+
+### TypeScript (Portal)
+
+- **Runtime:** Node 20+ LTS
+- **Framework:** Next.js 14+ (App Router)
+- **Linting:** `eslint` + `prettier`
+- **Type checking:** `strict: true` in tsconfig
+
+### General
+
+- Every PR requires at least one approval
+- No secrets in code — use `.env` + secrets manager
+- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
+- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)
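+
+A commit-subject check for the prefixes listed above could be as small as the sketch below. The regex and the `is_conventional` helper are illustrative assumptions, not an existing hook in this repo; the optional scope and `!` marker follow the Conventional Commits format.
+
+```python
+import re
+
+# type, optional (scope), optional "!" for breaking changes, then ": description"
+CONVENTIONAL_RE = re.compile(r"^(feat|fix|chore|docs|refactor)(\([\w.-]+\))?!?: \S.*")
+
+
+def is_conventional(subject: str) -> bool:
+    """Return True if the commit subject line uses one of the allowed prefixes."""
+    return CONVENTIONAL_RE.match(subject) is not None
+```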
+
+---
+
+## Key Design Decisions (ADR Stubs)
+
+These need full ADRs written before implementation:
+
+1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
+2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
+3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
+4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
+5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
+6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
+7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
+8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
+
+---
+
+## Open Questions
+
+- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
+- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
+- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
+- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
+- [ ] White-labeling for agencies reselling Konstruct?
+
+---
+
+## References
+
+- [paperclip.ing](https://paperclip.ing) — Inspiration
+- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
+- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
+- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
+- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework