fix: runtime deployment fixes for Docker Compose stack
- Add .gitignore for __pycache__, node_modules, .playwright-mcp
- Add CLAUDE.md project instructions
- docker-compose: remove host port exposure for internal services, remove Ollama container (use host), add CORS origin, bake NEXT_PUBLIC_API_URL at build time, run alembic migrations on gateway startup, add CPU-only torch pre-install
- gateway: add CORS middleware, graceful Slack degradation without bot token, fix None guard on slack_handler
- gateway pyproject: add aiohttp dependency for slack-bolt async
- llm-pool pyproject: install litellm from GitHub (removed from PyPI), enable hatch direct references
- portal: enable standalone output in next.config.ts
- Remove orphaned migration 003_phase2_audit_kb.py (renamed to 004)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
455
CLAUDE.md
Normal file
@@ -0,0 +1,455 @@
# CLAUDE.md — Konstruct

## What is Konstruct?

Konstruct is an AI workforce platform where clients subscribe to AI employees, teams, or entire AI-run companies. AI workers communicate through familiar channels — Slack, Microsoft Teams, Mattermost, Rocket.Chat, WhatsApp, Telegram, and Signal — so adoption requires zero behavior change from the customer.

Think of it as "Hire an AI department" — not another chatbot SaaS.

---

## Project Identity

- **Codename:** Konstruct
- **Domain:** TBD (check konstruct.ai, konstruct.io, konstruct.dev)
- **Tagline ideas:** "Build your AI workforce" / "AI teams that just work"
- **Inspired by:** [paperclip.ing](https://paperclip.ing)
- **Differentiation:** Channel-native AI workers (not a dashboard), tiered multi-tenancy, BYO-model support

---

## Architecture Overview

### Core Mental Model

```
Client (Slack/Teams/etc.)
         │
         ▼
┌─────────────────────┐
│   Channel Gateway   │  ← Unified ingress for all messaging platforms
│   (webhook/WS)      │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│   Message Router    │  ← Tenant resolution, rate limiting, context loading
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Agent Orchestrator │  ← Agent selection, tool dispatch, memory, handoffs
│  (per-tenant)       │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│   LLM Backend Pool  │  ← LiteLLM router → Ollama / vLLM / OpenAI / Anthropic / BYO
└─────────────────────┘
```

### Key Architectural Principles

1. **Channel-agnostic core** — Business logic never depends on which messaging platform the message came from. The Channel Gateway normalizes everything into a unified internal message format.
2. **Tenant-isolated agent state** — Each tenant's agents have isolated memory, tools, and configuration. No cross-tenant data leakage, ever.
3. **LLM backend as a pluggable resource** — Clients can use platform-provided models, bring their own API keys, or point to their own self-hosted inference endpoints.
4. **Agents are composable** — A single AI employee is an agent. A team is an orchestrated group of agents. A company is a hierarchy of teams with shared context and delegation.

---

## Tech Stack

### Backend (Primary: Python)

| Layer | Technology | Rationale |
|-------|-----------|-----------|
| API Framework | **FastAPI** | Async-native, OpenAPI docs, dependency injection |
| Task Queue | **Celery + Redis** or **Dramatiq** | Background jobs: LLM calls, tool execution, webhooks |
| Database | **PostgreSQL 16** | Primary data store, tenant isolation via schemas or RLS |
| Cache / Pub-Sub | **Redis / Valkey** | Session state, rate limiting, pub/sub for real-time events |
| Vector Store | **pgvector** (start) → **Qdrant** (scale) | Agent memory, RAG, conversation search |
| Object Storage | **MinIO** (self-hosted) / **S3** (cloud burst) | File attachments, documents, agent artifacts |
| LLM Gateway | **LiteLLM** | Unified API across all LLM providers, cost tracking, fallback routing |
| Agent Framework | **Custom** (evaluate LangGraph, CrewAI, or raw) | Agent orchestration, tool use, multi-agent handoffs |

### Messaging Channel SDKs

| Channel | Library / Integration |
|---------|----------------------|
| Slack | `slack-bolt` (Events API + Socket Mode) |
| Microsoft Teams | `botbuilder-python` (Bot Framework SDK) |
| Mattermost | `mattermostdriver` + webhooks |
| Rocket.Chat | REST API + Realtime API (WebSocket) |
| WhatsApp | WhatsApp Business API (Cloud API) |
| Telegram | `python-telegram-bot` (Bot API) |
| Signal | `signal-cli` or `signald` (bridge) |

### Frontend (Admin Dashboard / Client Portal)

| Layer | Technology |
|-------|-----------|
| Framework | **Next.js 14+** (App Router) |
| UI | **Tailwind CSS + shadcn/ui** |
| State | **TanStack Query** |
| Auth | **NextAuth.js** → consider **Keycloak** for enterprise |

### Infrastructure

| Layer | Technology |
|-------|-----------|
| Dev Orchestration | **Docker Compose + Portainer** |
| Prod Orchestration | **Kubernetes (k3s or Talos Linux)** |
| Core Hosting | **Hetzner Dedicated Servers** |
| Cloud Burst | **AWS / GCP** (auto-scale inference, overflow) |
| Reverse Proxy | **NPM Plus** (dev) / **Traefik** (prod K8s ingress) |
| DNS | **Technitium** (internal) / **Cloudflare** (external) |
| VPN Mesh | **Headscale** (self-hosted) + Tailscale clients |
| CI/CD | **Gitea Actions** → **GitHub Actions** (if public) |
| Monitoring | **Prometheus + Grafana + Loki** |
| Security | **Wazuh** (SIEM), **Trivy** (container scanning) |

---

## Repo Structure

Monorepo to start, split later when service boundaries stabilize.

```
konstruct/
├── CLAUDE.md                    # This file
├── docker-compose.yml           # Local dev environment
├── docker-compose.prod.yml      # Production-like local stack
├── k8s/                         # Kubernetes manifests / Helm charts
│   ├── base/
│   └── overlays/
│       ├── staging/
│       └── production/
├── packages/
│   ├── gateway/                 # Channel Gateway service
│   │   ├── channels/            # Per-channel adapters (slack, teams, etc.)
│   │   ├── normalize.py         # Unified message format
│   │   └── main.py
│   ├── router/                  # Message Router service
│   │   ├── tenant.py            # Tenant resolution
│   │   ├── ratelimit.py
│   │   └── main.py
│   ├── orchestrator/            # Agent Orchestrator service
│   │   ├── agents/              # Agent definitions and behaviors
│   │   ├── teams/               # Multi-agent team logic
│   │   ├── tools/               # Tool registry and execution
│   │   ├── memory/              # Conversation and long-term memory
│   │   └── main.py
│   ├── llm-pool/                # LLM Backend Pool service
│   │   ├── providers/           # Provider configs (litellm router)
│   │   ├── byo/                 # BYO key / endpoint management
│   │   └── main.py
│   ├── portal/                  # Next.js admin dashboard
│   │   ├── app/
│   │   ├── components/
│   │   └── lib/
│   └── shared/                  # Shared Python libs
│       ├── models/              # Pydantic models, DB schemas
│       ├── auth/                # Auth utilities
│       ├── messaging/           # Internal message format
│       └── config/              # Shared config / env management
├── migrations/                  # Alembic DB migrations
├── scripts/                     # Dev scripts, seed data, utilities
├── tests/
│   ├── unit/
│   ├── integration/
│   └── e2e/
├── docs/                        # Architecture docs, ADRs, runbooks
├── pyproject.toml               # Python monorepo config (uv / hatch)
└── .env.example
```

---

## Multi-Tenancy Model

Tiered isolation — the level increases with the subscription plan:

| Tier | Isolation | Target |
|------|-----------|--------|
| **Starter** | Shared infra, PostgreSQL RLS, logical separation | Solo founders, micro-businesses |
| **Team** | Dedicated DB schema, isolated Redis namespace, dedicated agent processes | SMBs, small teams |
| **Enterprise** | Dedicated namespace (K8s), dedicated DB, optional dedicated LLM inference | Larger orgs, compliance needs |
| **Self-Hosted** | Customer deploys their own Konstruct instance (Helm chart / Docker Compose) | On-prem requirements, data sovereignty |

### Tenant Resolution Flow

1. Inbound message hits Channel Gateway
2. Gateway extracts workspace/org identifier from the channel metadata (Slack workspace ID, Teams tenant ID, etc.)
3. Router maps channel org → Konstruct tenant via lookup table
4. All subsequent processing scoped to that tenant's context, models, tools, and memory
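
The lookup in step 3 could be sketched as follows. This is a minimal in-memory illustration, not the router's actual implementation: `ChannelOrg`, `TenantResolver`, `UnknownTenantError`, and the sample IDs are all hypothetical names, and a real lookup table would presumably live in PostgreSQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelOrg:
    """Identifies a customer workspace on one messaging platform (hypothetical type)."""
    channel: str  # e.g. "slack", "teams"
    org_id: str   # e.g. Slack workspace ID, Teams tenant ID


class UnknownTenantError(LookupError):
    """Raised when no Konstruct tenant is mapped to the channel org."""


class TenantResolver:
    """In-memory stand-in for the router's lookup table."""

    def __init__(self, mapping: dict) -> None:
        self._mapping = dict(mapping)

    def resolve(self, channel: str, org_id: str) -> str:
        key = ChannelOrg(channel=channel, org_id=org_id)
        try:
            return self._mapping[key]
        except KeyError:
            # Fail closed: an unmapped workspace never falls through to a default tenant.
            raise UnknownTenantError(f"no tenant for {channel}:{org_id}") from None


# Sample data: two channel orgs that belong to the same tenant.
resolver = TenantResolver({
    ChannelOrg("slack", "T012345"): "tenant-acme",
    ChannelOrg("teams", "aad-tenant-789"): "tenant-acme",
})
```

Raising on an unknown workspace, rather than returning a default, keeps the "no cross-tenant data leakage" principle enforceable at the first hop.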

---

## AI Employee Model

### Hierarchy

```
Company (AI-run)
  └── Team
        └── Employee (Agent)
              ├── Role definition (system prompt + persona)
              ├── Skills (tool bindings)
              ├── Memory (vector store + conversation history)
              ├── Channels (which messaging platforms it's active on)
              └── Escalation rules (when to hand off to human or another agent)
```

### Employee Configuration (example)

```yaml
employee:
  name: "Mara"
  role: "Customer Support Lead"
  persona: |
    Professional, empathetic, solution-oriented.
    Fluent in English, Spanish, Portuguese.
    Escalates billing disputes to human after 2 failed resolutions.
  model:
    primary: "anthropic/claude-sonnet-4-20250514"
    fallback: "openai/gpt-4o"
    local: "ollama/qwen3:32b"
  tools:
    - zendesk_ticket_create
    - zendesk_ticket_lookup
    - knowledge_base_search
    - calendar_book
  channels:
    - slack
    - whatsapp
  memory:
    type: "conversational + rag"
    retention_days: 90
  escalation:
    - condition: "billing_dispute AND attempts > 2"
      action: "handoff_human"
    - condition: "sentiment < -0.7"
      action: "handoff_human"
```
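
One way the `escalation` rules above might be evaluated is sketched below. It sidesteps parsing the condition strings by expressing each rule as a Python predicate; `ConversationState`, `EscalationRule`, and `first_escalation` are hypothetical names, and the sentiment range of -1.0 to 1.0 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConversationState:
    topic: str        # hypothetical label, e.g. "billing_dispute"
    attempts: int     # failed resolution attempts so far
    sentiment: float  # assumed range: -1.0 (negative) to 1.0 (positive)


@dataclass
class EscalationRule:
    name: str
    condition: Callable[[ConversationState], bool]
    action: str


# The two rules from the YAML example, written as predicates.
RULES = [
    EscalationRule(
        name="billing_dispute AND attempts > 2",
        condition=lambda s: s.topic == "billing_dispute" and s.attempts > 2,
        action="handoff_human",
    ),
    EscalationRule(
        name="sentiment < -0.7",
        condition=lambda s: s.sentiment < -0.7,
        action="handoff_human",
    ),
]


def first_escalation(
    state: ConversationState, rules: List[EscalationRule] = RULES
) -> Optional[str]:
    """Return the action of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.condition(state):
            return rule.action
    return None
```

With these rules, a billing dispute escalates on the third failed attempt, not the second, because the condition is `attempts > 2`.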

### Team Orchestration

Teams use a coordinator pattern:

1. **Coordinator agent** receives the inbound message
2. Coordinator decides which team member(s) should handle it (routing)
3. Specialist agent(s) execute their part
4. Coordinator assembles the final response or delegates follow-up
5. All inter-agent communication logged for audit
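
The five steps above can be sketched with plain functions standing in for agents. Everything here is illustrative: `Coordinator`, `keyword_route`, and the two specialists are hypothetical, and a real coordinator would likely ask an LLM to pick team members rather than match keywords.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Specialist = Callable[[str], str]


@dataclass
class Coordinator:
    specialists: Dict[str, Specialist]
    route: Callable[[str], List[str]]  # step 2: choose team member(s)
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)

    def handle(self, message: str) -> str:  # step 1: receive the message
        parts = []
        for name in self.route(message):
            # Step 5: record every hop as (sender, recipient, payload).
            self.audit_log.append(("coordinator", name, message))
            reply = self.specialists[name](message)  # step 3: specialist runs
            self.audit_log.append((name, "coordinator", reply))
            parts.append(reply)
        return "\n".join(parts)  # step 4: assemble the final response


def keyword_route(message: str) -> List[str]:
    """Toy router based on keywords; falls back to the support specialist."""
    text = message.lower()
    chosen = [n for n, kw in (("billing", "invoice"), ("support", "error")) if kw in text]
    return chosen or ["support"]


team = Coordinator(
    specialists={
        "billing": lambda m: "billing: invoice looked up",
        "support": lambda m: "support: troubleshooting steps sent",
    },
    route=keyword_route,
)
```

Logging both directions of each exchange gives the audit trail a complete record even when a specialist fails to produce a useful reply.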

---

## LLM Backend Strategy

### Provider Hierarchy

```
┌─────────────────────────────────────────┐
│              LiteLLM Router             │
│  (load balancing, fallback, cost caps)  │
└────┬──────────┬──────────┬─────────┬────┘
     │          │          │         │
  Ollama      vLLM     Anthropic   OpenAI
  (local)    (local)     (API)     (API)
     │
  BYO Endpoint
  (customer-provided)
```

### Routing Logic

1. **Tenant config** specifies preferred provider(s) and fallback chain
2. **Cost caps** per tenant (daily/monthly spend limits)
3. **Model routing** by task type: simple queries → smaller/local models, complex reasoning → commercial APIs
4. **BYO keys** stored encrypted (AES-256), never logged, never used for other tenants
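
The interaction between a fallback chain and a daily cost cap (points 1 and 2) might look like the sketch below. It deliberately does not use LiteLLM's own API: `call_model` and `estimate_cost` are injected placeholder callables, and `TenantLLMConfig`, `complete`, and `AllProvidersFailed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TenantLLMConfig:
    fallback_chain: List[str]  # ordered model names, preferred first
    daily_cap_usd: float
    spent_today_usd: float = 0.0


class AllProvidersFailed(RuntimeError):
    """Raised when every model was over budget or returned an error."""


def complete(
    config: TenantLLMConfig,
    prompt: str,
    call_model: Callable[[str, str], str],        # placeholder for the provider call
    estimate_cost: Callable[[str, str], float],   # placeholder cost estimator (USD)
) -> Tuple[str, str]:
    """Return (model_used, response_text), walking the tenant's fallback chain."""
    failures = []
    for model in config.fallback_chain:
        cost = estimate_cost(model, prompt)
        if config.spent_today_usd + cost > config.daily_cap_usd:
            # Over budget for this model; a cheaper one later in the chain may still fit.
            failures.append(f"{model}: over daily cap")
            continue
        try:
            text = call_model(model, prompt)
        except Exception as exc:  # provider outage, timeout, and similar
            failures.append(f"{model}: {exc}")
            continue
        config.spent_today_usd += cost  # only successful calls are charged
        return model, text
    raise AllProvidersFailed("; ".join(failures))
```

Skipping an over-budget model instead of aborting lets a tenant that has exhausted its commercial budget degrade to a free local model, if one is in its chain.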

---

## Messaging Format (Internal)

All channel adapters normalize messages into this format:

```python
from datetime import datetime

from pydantic import BaseModel

# ChannelType, SenderInfo, and MessageContent are project types not shown here.


class KonstructMessage(BaseModel):
    id: str                     # UUID
    tenant_id: str              # Konstruct tenant
    channel: ChannelType        # slack | teams | mattermost | rocketchat | whatsapp | telegram | signal
    channel_metadata: dict      # Channel-specific IDs (workspace, channel, thread)
    sender: SenderInfo          # User ID, display name, role
    content: MessageContent     # Text, attachments, structured data
    timestamp: datetime
    thread_id: str | None       # For threaded conversations
    reply_to: str | None        # Parent message ID
    context: dict               # Extracted intent, entities, sentiment (populated downstream)
```
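
As an illustration of what a channel adapter does, here is a sketch that normalizes a Slack Events API `message` envelope. To stay dependency-free it uses a stdlib dataclass, `NormalizedMessage`, as a stand-in for a subset of `KonstructMessage` rather than Pydantic; that class and `normalize_slack_event` are hypothetical, and the tenant ID is assumed to have been resolved already.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class NormalizedMessage:
    """Stdlib stand-in for a subset of KonstructMessage fields."""
    id: str
    tenant_id: str
    channel: str
    channel_metadata: dict
    sender_id: str
    text: str
    timestamp: datetime
    thread_id: Optional[str] = None
    context: dict = field(default_factory=dict)  # populated downstream


def normalize_slack_event(envelope: dict, tenant_id: str) -> NormalizedMessage:
    """Map a Slack event_callback envelope onto the unified format."""
    event = envelope["event"]
    return NormalizedMessage(
        id=str(uuid.uuid4()),
        tenant_id=tenant_id,
        channel="slack",
        channel_metadata={
            "workspace": envelope["team_id"],
            "channel": event["channel"],
        },
        sender_id=event["user"],
        text=event.get("text", ""),
        # Slack sends "ts" as a string of epoch seconds, e.g. "1355517523.000005".
        timestamp=datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc),
        # "thread_ts" is only present on threaded messages.
        thread_id=event.get("thread_ts"),
    )
```

Every other adapter would produce the same shape, which is what lets the router and orchestrator stay channel-agnostic.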

---

## Security & Compliance

### Non-Negotiables

- **Encryption at rest** (PostgreSQL TDE, MinIO server-side encryption)
- **Encryption in transit** (TLS 1.3 everywhere, mTLS between services)
- **Tenant isolation** enforced at every layer (DB, cache, object storage, agent memory)
- **BYO API keys** encrypted with per-tenant KEK, HSM-backed in Enterprise tier
- **Audit log** for every agent action, tool invocation, and LLM call
- **RBAC** per tenant (admin, manager, member, viewer)
- **Rate limiting** per tenant, per channel, per agent
- **PII handling** — configurable PII detection and redaction per tenant

### Future Compliance Targets

- SOC 2 Type II (when revenue supports it)
- GDPR data residency (leverage Hetzner EU + customer self-hosted option)
- HIPAA (Enterprise self-hosted tier only, with BAA)

---

## Development Workflow

### Local Dev

```bash
# Clone and setup
git clone <repo-url> && cd konstruct
cp .env.example .env

# Start all services
docker compose up -d

# Run gateway in dev mode (hot reload); this blocks, so use a separate terminal
cd packages/gateway
uvicorn main:app --reload --port 8001

# Run tests (from the repo root)
pytest tests/unit -x
pytest tests/integration -x
```

### Branch Strategy

- `main` — production-ready, protected
- `develop` — integration branch
- `feat/*` — feature branches off develop
- `fix/*` — bugfix branches
- `release/*` — release candidates

### CI Pipeline

1. Lint (`ruff check`, `ruff format --check`)
2. Type check (`mypy --strict`)
3. Unit tests (`pytest tests/unit`)
4. Integration tests (`pytest tests/integration` — spins up Docker Compose)
5. Container build + scan (`trivy image`)
6. Deploy to staging (auto on `develop` merge)
7. Deploy to production (manual approval on `release/*` merge)

---

## Milestones

### Phase 1: Foundation (Weeks 1–6)

- [ ] Repo scaffolding, CI/CD, Docker Compose dev environment
- [ ] PostgreSQL schema with RLS multi-tenancy
- [ ] Unified message format and Channel Gateway (start with Slack + Telegram)
- [ ] Basic agent orchestrator (single agent per tenant, no teams yet)
- [ ] LiteLLM integration with Ollama + one commercial API
- [ ] Basic admin portal (tenant CRUD, agent config)

### Phase 2: Channel Expansion + Teams (Weeks 7–12)

- [ ] Add channels: Mattermost, WhatsApp, Teams
- [ ] Multi-agent teams with coordinator pattern
- [ ] Conversational memory (vector store + sliding window)
- [ ] Tool framework (registry, execution, sandboxing)
- [ ] BYO API key support
- [ ] Tenant onboarding flow in portal

### Phase 3: Polish + Launch (Weeks 13–18)

- [ ] Add channels: Rocket.Chat, Signal
- [ ] AI company hierarchy (teams of teams)
- [ ] Cost tracking and billing integration (Stripe)
- [ ] Agent performance analytics dashboard
- [ ] Self-hosted deployment option (Helm chart + docs)
- [ ] Public launch (Product Hunt, Hacker News, Reddit)

### Phase 4: Scale (Post-Launch)

- [ ] Kubernetes migration for production workloads
- [ ] Cloud burst infrastructure (AWS auto-scaling inference)
- [ ] Marketplace for pre-built AI employee templates
- [ ] Enterprise tier with dedicated isolation
- [ ] SOC 2 preparation
- [ ] API for programmatic agent management

---

## Coding Standards

### Python

- **Version:** 3.12+
- **Package manager:** `uv`
- **Linting:** `ruff` (replaces flake8, isort, black)
- **Type checking:** `mypy --strict` — no `Any` types in public interfaces
- **Testing:** `pytest` + `pytest-asyncio` + `httpx` (for FastAPI test client)
- **Models:** `Pydantic v2` for all data validation and serialization
- **Async:** Prefer `async def` for all I/O-bound operations
- **DB:** `SQLAlchemy 2.0` async with Alembic migrations

### TypeScript (Portal)

- **Runtime:** Node 20+ LTS
- **Framework:** Next.js 14+ (App Router)
- **Linting:** `eslint` + `prettier`
- **Type checking:** `strict: true` in tsconfig

### General

- Every PR requires at least one approval
- No secrets in code — use `.env` + secrets manager
- Write ADRs (Architecture Decision Records) in `docs/adr/` for significant decisions
- Conventional commits (`feat:`, `fix:`, `chore:`, `docs:`, `refactor:`)

---

## Key Design Decisions (ADR Stubs)

These need full ADRs written before implementation:

1. **ADR-001:** Channel Gateway — webhook-based vs. persistent WebSocket connections per channel
2. **ADR-002:** Agent memory — pgvector vs. dedicated vector DB vs. hybrid
3. **ADR-003:** Multi-tenancy — RLS vs. schema-per-tenant vs. DB-per-tenant
4. **ADR-004:** Agent framework — build custom vs. adopt LangGraph/CrewAI
5. **ADR-005:** BYO key encryption — envelope encryption strategy and key rotation
6. **ADR-006:** Inter-agent communication — direct function calls vs. message bus vs. shared context
7. **ADR-007:** Rate limiting — per-tenant token bucket implementation
8. **ADR-008:** Self-hosted distribution — Helm chart vs. Docker Compose vs. Omnibus
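
As input for ADR-007, a per-tenant token bucket could take roughly the shape below. This is a single-process sketch under stated assumptions: `TokenBucket` and `TenantRateLimiter` are hypothetical names, state lives in memory rather than in Redis, and the clock is injectable so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so a new tenant can burst
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Credit tokens for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Keeps one bucket per tenant, created lazily on first use."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

A multi-process deployment would need the bucket state in a shared store with atomic updates, which is one of the trade-offs the ADR would have to settle.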

---

## Open Questions

- [ ] Pricing model: per-agent, per-message, per-seat, or hybrid?
- [ ] Should agents maintain persistent identity across channels (same "Mara" on Slack and WhatsApp)?
- [ ] Voice channel support? (Telephony via Twilio/Vonage — Phase 4+?)
- [ ] Agent-to-agent communication across tenants (marketplace scenario)?
- [ ] White-labeling for agencies reselling Konstruct?

---

## References

- [paperclip.ing](https://paperclip.ing) — Inspiration
- [LiteLLM docs](https://docs.litellm.ai/) — LLM gateway
- [Slack Bolt Python](https://slack.dev/bolt-python/) — Slack SDK
- [Bot Framework Python](https://github.com/microsoft/botbuilder-python) — Teams SDK
- [FastAPI docs](https://fastapi.tiangolo.com/) — API framework