fix: runtime deployment fixes for Docker Compose stack

- Add .gitignore for __pycache__, node_modules, .playwright-mcp
- Add CLAUDE.md project instructions
- docker-compose: remove host port exposure for internal services,
  remove Ollama container (use host), add CORS origin, bake
  NEXT_PUBLIC_API_URL at build time, run alembic migrations on
  gateway startup, add CPU-only torch pre-install
- gateway: add CORS middleware, graceful Slack degradation without
  bot token, fix None guard on slack_handler
- gateway pyproject: add aiohttp dependency for slack-bolt async
- llm-pool pyproject: install litellm from GitHub (removed from PyPI),
  enable hatch direct references
- portal: enable standalone output in next.config.ts
- Remove orphaned migration 003_phase2_audit_kb.py (renamed to 004)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:26:34 -06:00
parent d936bcf361
commit 0e0ea5fb66
9 changed files with 694 additions and 293 deletions

CLAUDE.md Normal file

@@ -0,0 +1,455 @@
# CLAUDE.md — Konstruct
## What is Konstruct?
Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.
Think of it as "Hire an AI department" — not another chatbot SaaS.
---
## Project Identity
- **Codename:** Konstruct
- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
- **Inspired by:** [paperclip.ing](https://paperclip.ing)
- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support
---
## Architecture Overview
### Core Mental Model
```
Client (Slack/Teams/etc.)
         │
         ▼
┌─────────────────────┐
│ Channel Gateway     │ ← Unified ingress for all messaging platforms
│ (webhook/WS)        │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Message Router      │ ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ Agent Orchestrator  │ ← Agent selection, tool dispatch, memory, handoffs
│ (per-tenant)        │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│ LLM Backend Pool    │ ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
```
### Key Architectural Principles
1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
4. **Agents are composable** — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.
---
## Tech Stack
### Backend (Primary: Python)
| Layer | Technology | Rationale |
|-------|-----------|-----------|
| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |
### Messaging Channel SDKs
| Channel | Library / Integration |
|---------|----------------------|
| Slack | `slack-bolt` (Events API + Socket Mode) |
| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
| Mattermost | `mattermostdriver` + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | `python-telegram-bot` (Bot API) |
| Signal | `signal-cli` or `signald` (bridge) |
### Frontend (Admin Dashboard / Client Portal)
| Layer | Technology |
|-------|-----------|
| Framework | **Next.js 14+** (App Router) |
| UI | **Tailwind CSS + shadcn/ui** |
| State | **TanStack Query** |
| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |
### Infrastructure
| Layer | Technology |
|-------|-----------|
| Dev Orchestration | **Docker Compose + Portainer** |
| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
| Core Hosting | **Hetzner Dedicated Servers** |
| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
| DNS | **Technitium** (internal) / **Cloudflare** (external) |
| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
| Monitoring | **Prometheus + Grafana + Loki** |
| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |
---
## Repo Structure
Monorepo to start, split later when service boundaries stabilize.
```
konstruct/
├── CLAUDE.md  # This file
├── docker-compose.yml  # Local dev environment
├── docker-compose.prod.yml  # Production-like local stack
├── k8s/  # Kubernetes manifests / Helm charts
│   ├── base/
│   └── overlays/
│       ├── staging/
│       └── production/
├── packages/
│   ├── gateway/  # Channel Gateway service
│   │   ├── channels/  # Per-channel adapters (slack, teams, etc.)
│   │   ├── normalize.py  # Unified message format
│   │   └── main.py
│   ├── router/  # Message Router service
│   │   ├── tenant.py  # Tenant resolution
│   │   ├── ratelimit.py
│   │   └── main.py
│   ├── orchestrator/  # Agent Orchestrator service
│   │   ├── agents/  # Agent definitions and behaviors
│   │   ├── teams/  # Multi-agent team logic
│   │   ├── tools/  # Tool registry and execution
│   │   ├── memory/  # Conversation and long-term memory
│   │   └── main.py
│   ├── llm-pool/  # LLM Backend Pool service
│   │   ├── providers/  # Provider configs (litellm router)
│   │   ├── byo/  # BYO key / endpoint management
│   │   └── main.py
│   ├── portal/  # Next.js admin dashboard
│   │   ├── app/
│   │   ├── components/
│   │   └── lib/
│   └── shared/  # Shared Python libs
│       ├── models/  # Pydantic models, DB schemas
│       ├── auth/  # Auth utilities
│       ├── messaging/  # Internal message format
│       └── config/  # Shared config / env management
├── migrations/  # Alembic DB migrations
├── scripts/  # Dev scripts, seed data, utilities
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docs/  # Architecture docs, ADRs, runbooks
├── pyproject.toml  # Python monorepo config (uv / hatch)
└── .env.example
```
---
## Multi-Tenancy Model
Tiered isolation — the level increases with the subscription plan:
| Tier | Isolation | Target |
|------|-----------|--------|
| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |
### Tenant Resolution Flow
1. Inbound message hits Channel Gateway
2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
3. Router maps channel org → Konstruct tenant via lookup table
4. All subsequent processing scoped to that tenant's context, models, tools, and memory
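
The resolution steps above can be sketched as a lookup keyed on the channel type plus the channel's own org identifier. This is a minimal illustration under stated assumptions, not the Router's actual code: `TenantResolver`, `register`, `resolve`, and `UnknownTenantError` are hypothetical names, and the in-memory dict stands in for whatever lookup table the Router ends up using.

```python
from dataclasses import dataclass, field


class UnknownTenantError(LookupError):
    """Raised when no Konstruct tenant is mapped to the channel org (hypothetical)."""


@dataclass
class TenantResolver:
    # (channel, channel org ID) -> Konstruct tenant ID; in-memory stand-in for the lookup table
    _table: dict[tuple[str, str], str] = field(default_factory=dict)

    def register(self, channel: str, org_id: str, tenant_id: str) -> None:
        self._table[(channel, org_id)] = tenant_id

    def resolve(self, channel: str, org_id: str) -> str:
        try:
            return self._table[(channel, org_id)]
        except KeyError:
            # Fail closed: an unmapped workspace must never fall through to another tenant
            raise UnknownTenantError(f"no tenant for {channel}:{org_id}") from None


resolver = TenantResolver()
resolver.register("slack", "T0123ABCD", "tenant-acme")  # example IDs, not real ones
tenant_id = resolver.resolve("slack", "T0123ABCD")
```

Keying on the pair rather than the org ID alone avoids collisions between identifiers issued by different platforms.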
---
## AI Employee Model
### Hierarchy
```
Company (AI-run)
└── Team
    └── Employee (Agent)
        ├── Role definition (system prompt + persona)
        ├── Skills (tool bindings)
        ├── Memory (vector store + conversation history)
        ├── Channels (which messaging platforms it's active on)
        └── Escalation rules (when to hand off to human or another agent)
```
### Employee Configuration (example)
```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```
### Team Orchestration
Teams use a coordinator pattern:
1. **Coordinator agent** receives the inbound message
2. Coordinator decides which team member(s) should handle it (routing)
3. Specialist agent(s) execute their part
4. Coordinator assembles the final response or delegates follow-up
5. All inter-agent communication logged for audit
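
A minimal sketch of the coordinator pattern described above, assuming specialists are plain async callables. The keyword-based `route` method is a placeholder for whatever routing the coordinator agent performs (most likely an LLM call), and `Coordinator`, `billing_agent`, and `support_agent` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

Specialist = Callable[[str], Awaitable[str]]


async def billing_agent(message: str) -> str:
    return f"[billing] handled: {message}"


async def support_agent(message: str) -> str:
    return f"[support] handled: {message}"


class Coordinator:
    def __init__(self, specialists: dict[str, Specialist]) -> None:
        self.specialists = specialists
        # (from, to, payload) entries so every inter-agent hop is auditable
        self.audit_log: list[tuple[str, str, str]] = []

    def route(self, message: str) -> list[str]:
        # Placeholder routing: pick specialists whose name appears in the message
        chosen = [name for name in self.specialists if name in message.lower()]
        return chosen or ["support"]

    async def handle(self, message: str) -> str:
        names = self.route(message)
        for name in names:
            self.audit_log.append(("coordinator", name, message))
        # Specialists run concurrently; gather preserves the order of `names`
        replies = await asyncio.gather(*(self.specialists[n](message) for n in names))
        for name, reply in zip(names, replies):
            self.audit_log.append((name, "coordinator", reply))
        return "\n".join(replies)


coordinator = Coordinator({"billing": billing_agent, "support": support_agent})
reply = asyncio.run(coordinator.handle("Question about my billing statement"))
```

Logging both the dispatch and the reply keeps the audit trail complete even when a specialist's output is later rewritten by the coordinator.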
---
## LLM Backend Strategy
### Provider Hierarchy
```
┌─────────────────────────────────────────┐
│             LiteLLM Router              │
│  (load balancing, fallback, cost caps)  │
└────┬──────────┬──────────┬─────────┬────┘
     │          │          │         │
  Ollama      vLLM     Anthropic  OpenAI
  (local)    (local)     (API)     (API)
               BYO Endpoint
            (customer-provided)
```
### Routing Logic
1. **Tenant config** specifies preferred provider(s) and fallback chain
2. **Cost caps** per tenant (daily/monthly spend limits)
3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants
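
The fallback chain and cost cap can be illustrated without any provider SDK. The sketch below is not LiteLLM's API; it only shows the control flow, with provider calls modelled as functions returning `(text, cost_usd)`. `TenantLLMConfig`, `complete`, `ProviderError`, and `CostCapExceeded` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(RuntimeError):
    """A single provider call failed (hypothetical)."""


class CostCapExceeded(RuntimeError):
    """The tenant has reached its spend limit (hypothetical)."""


@dataclass
class TenantLLMConfig:
    chain: list[str]          # preferred provider first, fallbacks after
    daily_cap_usd: float
    spent_today_usd: float = 0.0


# A provider call takes a prompt and returns (completion text, cost in USD)
ProviderCall = Callable[[str], tuple[str, float]]


def complete(
    prompt: str, config: TenantLLMConfig, providers: dict[str, ProviderCall]
) -> tuple[str, str]:
    """Try each provider in the tenant's chain; return (provider name, text)."""
    if config.spent_today_usd >= config.daily_cap_usd:
        raise CostCapExceeded("daily spend limit reached")
    for name in config.chain:
        try:
            text, cost = providers[name](prompt)
        except ProviderError:
            continue  # fall through to the next provider in the chain
        config.spent_today_usd += cost
        return name, text
    raise ProviderError("all providers in the chain failed")


def flaky_primary(prompt: str) -> tuple[str, float]:
    raise ProviderError("upstream timeout")


def local_fallback(prompt: str) -> tuple[str, float]:
    return f"echo: {prompt}", 0.0


config = TenantLLMConfig(chain=["primary", "local"], daily_cap_usd=5.0)
used, text = complete("hello", config, {"primary": flaky_primary, "local": local_fallback})
```

Checking the cap before the first call means a tenant at its limit never reaches a paid provider; a production version would presumably also check the estimated cost of the call itself.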
---
## Messaging Format (Internal)
All channel adapters normalize messages into this format:
```python
class KonstructMessage(BaseModel):
    id: str  # UUID
    tenant_id: str  # Konstruct tenant
    channel: ChannelType  # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict  # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo  # User ID, display name, role
    content: MessageContent  # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None  # For threaded conversations
    reply_to: str | None  # Parent message ID
    context: dict  # Extracted intent, entities, sentiment (populated downstream)
```
---
## Security & Compliance
### Non-Negotiables
- **Encryption at rest** (PostgreSQL TDE, MinIO server-side encryption)
- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- **Audit log** for every agent action, tool invocation, and LLM call
- **RBAC** per tenant (admin, manager, member, viewer)
- **Rate limiting** per tenant, per channel, per agent
- **PII handling** — configurable PII detection and redaction per tenant
### Future Compliance Targets
- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)
---
## Development Workflow
### Local Dev
```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env
# Start all services
docker compose up -d
# Run gateway in dev mode (hot reload)
cd packages/gateway
uvicorn main:app --reload --port 8001
# Run tests
pytest tests/unit -x
pytest tests/integration -x
```
### Branch Strategy
- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off develop
- `fix/*` — bugfix branches
- `release/*` — release candidates
### CI Pipeline
1. Lint (`ruff check`, `ruff format --check`)
2. Type check (`mypy --strict`)
3. Unit tests (`pytest tests/unit`)
4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
5. Container build + scan (`trivy image`)
6. Deploy to staging (auto on `develop` merge)
7. Deploy to production (manual approval on `release/*` merge)
---
## Milestones
### Phase 1: Foundation (Weeks 1–6)
- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
- [ ] PostgreSQL schema with RLS multi-tenancy
- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
- [ ] LiteLLM integration with Ollama + one commercial API
- [ ] Basic admin portal (tenant CRUD, agent config)
### Phase 2: Channel Expansion + Teams (Weeks 7–12)
- [ ] Add channels: Mattermost, WhatsApp, Teams
- [ ] Multi-agent teams with coordinator pattern
- [ ] Conversational memory (vector store + sliding window)
- [ ] Tool framework (registry, execution, sandboxing)
- [ ] BYO API key support
- [ ] Tenant onboarding flow in portal
### Phase 3: Polish + Launch (Weeks 13–18)
- [ ] Add channels: Rocket.Chat, Signal
- [ ] AI company hierarchy (teams of teams)
- [ ] Cost tracking and billing integration (Stripe)
- [ ] Agent performance analytics dashboard
- [ ] Self-hosted deployment option (Helm chart + docs)
- [ ] Public launch (Product Hunt, Hacker News, Reddit)
### Phase 4: Scale (Post-Launch)
- [ ] Kubernetes migration for production workloads
- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
- [ ] Marketplace for pre-built AI employee templates
- [ ] Enterprise tier with dedicated isolation
- [ ] SOC 2 preparation
- [ ] API for programmatic agent management
---
## Coding Standards
### Python
- **Version:** 3.12+
- **Package manager:** `uv`
- **Linting:** `ruff` (replaces flake8, isort, black)
- **Type checking:** `mypy --strict` — no `Any` types in public interfaces
- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- **Models:** `Pydantic v2` for all data validation and serialization
- **Async:** Prefer `async def` for all I/O-bound operations
- **DB:** `SQLAlchemy 2.0` async with Alembic migrations
### TypeScript (Portal)
- **Runtime:** Node 20+ LTS
- **Framework:** Next.js 14+ (App Router)
- **Linting:** `eslint` + `prettier`
- **Type checking:** `strict: true` in tsconfig
### General
- Every PR requires at least one approval
- No secrets in code — use `.env` + secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)
---
## Key Design Decisions (ADR Stubs)
These need full ADRs written before implementation:
1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
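
For ADR-007, a per-tenant token bucket might look like the sketch below. It is an in-process illustration only; a multi-instance deployment would need shared state (the stack lists Redis / Valkey for rate limiting), and that part is not shown. `TokenBucket`, `TenantRateLimiter`, and the injectable `clock` parameter are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity          # start full
        self.updated_at = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """One bucket per tenant, created lazily on first use."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, self._clock)
            self._buckets[tenant_id] = bucket
        return bucket.allow()


# Deterministic demo with a fake clock: burst of 2, refilling 1 token per second
fake_now = [0.0]
limiter = TenantRateLimiter(capacity=2, refill_per_sec=1.0, clock=lambda: fake_now[0])
burst = [limiter.allow("tenant-acme") for _ in range(3)]
```

Injecting the clock keeps the limiter testable without sleeping; the same shape extends to per-channel or per-agent keys by widening the dictionary key.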
---
## Open Questions
- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
- [ ] White-labeling for agencies reselling Konstruct?
---
## References
- [paperclip.ing](https://paperclip.ing) — Inspiration
- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework