konstruct/.planning/phases/01-foundation/01-03-PLAN.md

---
phase: 01-foundation
plan: 03
type: execute
wave: 3
depends_on: ["01-01", "01-02"]
files_modified:
  - packages/gateway/__init__.py
  - packages/gateway/main.py
  - packages/gateway/channels/__init__.py
  - packages/gateway/channels/slack.py
  - packages/gateway/normalize.py
  - packages/gateway/verify.py
  - packages/router/__init__.py
  - packages/router/main.py
  - packages/router/tenant.py
  - packages/router/ratelimit.py
  - packages/router/idempotency.py
  - packages/router/context.py
  - docker-compose.yml
  - tests/unit/test_ratelimit.py
  - tests/integration/test_slack_flow.py
  - tests/integration/test_agent_persona.py
  - tests/integration/test_ratelimit.py
autonomous: true
requirements:
  - CHAN-02
  - CHAN-05
  - AGNT-01

must_haves:
  truths:
    - "A Slack @mention or DM to the AI employee triggers an LLM-generated response posted back in the same Slack thread"
    - "The Slack event handler returns HTTP 200 within 3 seconds — LLM work is dispatched to Celery, not done inline"
    - "A request exceeding the per-tenant or per-channel rate limit is rejected with an informative Slack message rather than silently dropped"
    - "The agent's response reflects the configured name, role, and persona from the Agent table"
    - "A typing indicator (placeholder message) appears while the LLM is generating"
  artifacts:
    - path: "packages/gateway/main.py"
      provides: "FastAPI app with /slack/events endpoint mounting slack-bolt AsyncApp"
      exports: ["app"]
    - path: "packages/gateway/channels/slack.py"
      provides: "Slack event handlers for @mentions and DMs, dispatches to Celery"
      exports: ["register_slack_handlers"]
    - path: "packages/gateway/normalize.py"
      provides: "Slack event -> KonstructMessage normalization"
      exports: ["normalize_slack_event"]
    - path: "packages/router/tenant.py"
      provides: "Workspace ID -> tenant_id resolution from DB"
      exports: ["resolve_tenant"]
    - path: "packages/router/ratelimit.py"
      provides: "Redis token bucket rate limiter per tenant per channel"
      exports: ["check_rate_limit", "RateLimitExceeded"]
    - path: "packages/router/idempotency.py"
      provides: "Redis-based message deduplication"
      exports: ["is_duplicate", "mark_processed"]
  key_links:
    - from: "packages/gateway/channels/slack.py"
      to: "packages/orchestrator/tasks.py"
      via: "handle_message_task.delay(msg.model_dump())"
      pattern: "handle_message.*delay"
    - from: "packages/gateway/channels/slack.py"
      to: "packages/router/tenant.py"
      via: "resolve_tenant(workspace_id, channel_type)"
      pattern: "resolve_tenant"
    - from: "packages/gateway/channels/slack.py"
      to: "packages/router/ratelimit.py"
      via: "check_rate_limit(tenant_id, channel) before dispatch"
      pattern: "check_rate_limit"
    - from: "packages/orchestrator/agents/runner.py"
      to: "Slack API"
      via: "chat.update to replace placeholder with real response"
      pattern: "chat_update|chat\\.update"
---

<objective>
Build the Channel Gateway (Slack adapter with slack-bolt AsyncApp), Message Router (tenant resolution, rate limiting, idempotency), and wire them to the Celery orchestrator from Plan 02 to complete the end-to-end Slack message -> LLM response flow.

Purpose: Close the vertical loop — a Slack user @mentions the AI employee, a response appears in-thread. This is the core value demonstration of the entire platform.

Output: Working Slack integration where @mentions and DMs trigger LLM responses in-thread, with rate limiting, tenant resolution, deduplication, and typing indicator.
</objective>

<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-foundation/01-CONTEXT.md
@.planning/phases/01-foundation/01-RESEARCH.md
@.planning/phases/01-foundation/01-01-SUMMARY.md
@.planning/phases/01-foundation/01-02-SUMMARY.md

<interfaces>
<!-- From Plan 01 — shared models -->

From packages/shared/models/message.py:
```python
class KonstructMessage(BaseModel):
    id: str
    tenant_id: str | None = None
    channel: ChannelType
    channel_metadata: dict
    sender: SenderInfo
    content: MessageContent
    timestamp: datetime
    thread_id: str | None = None
```

From packages/shared/redis_keys.py:
```python
def rate_limit_key(tenant_id: str, channel: str) -> str: ...
def idempotency_key(tenant_id: str, message_id: str) -> str: ...
def engaged_thread_key(tenant_id: str, thread_id: str) -> str: ...
```

From packages/shared/rls.py:
```python
current_tenant_id: ContextVar[str | None]
```

<!-- From Plan 02 — orchestrator tasks and agent runner -->

From packages/orchestrator/tasks.py:
```python
@app.task
def handle_message(message_data: dict) -> dict: ...
```

From packages/orchestrator/agents/builder.py:
```python
def build_system_prompt(agent: Agent) -> str: ...
def build_messages(system_prompt: str, user_message: str, history: list[dict] | None = None) -> list[dict]: ...
```
</interfaces>
</context>

<tasks>

<task type="auto">
  <name>Task 1: Channel Gateway (Slack adapter) and Message Router</name>
  <files>
    packages/gateway/__init__.py,
    packages/gateway/main.py,
    packages/gateway/channels/__init__.py,
    packages/gateway/channels/slack.py,
    packages/gateway/normalize.py,
    packages/gateway/verify.py,
    packages/router/__init__.py,
    packages/router/main.py,
    packages/router/tenant.py,
    packages/router/ratelimit.py,
    packages/router/idempotency.py,
    packages/router/context.py,
    docker-compose.yml
  </files>
  <action>
    1. Create `packages/gateway/normalize.py`:
       - `normalize_slack_event(event: dict, workspace_id: str) -> KonstructMessage`: Converts a Slack Events API payload into a KonstructMessage. Extracts: user ID, text, thread_ts -> thread_id, channel ID, workspace_id into channel_metadata. Sets channel=ChannelType.SLACK.
       - Handle both @mention events (strip the `<@BOT_ID>` prefix from text) and DM events.

    2. Create `packages/gateway/channels/slack.py`:
       - `register_slack_handlers(slack_app: AsyncApp)`: Registers event handlers on the slack-bolt AsyncApp.
       - Handle `app_mention` event: Normalize message, resolve tenant, check rate limit, check idempotency, post placeholder "Thinking..." message in thread (typing indicator per user decision), dispatch to Celery with `handle_message_task.delay(msg.model_dump() | {"placeholder_ts": placeholder_msg["ts"], "channel_id": event["channel"]})`.
       - Handle `message` event (DMs only — filter `channel_type == "im"`): Same flow as app_mention.
       - Thread follow-up behavior (Claude's discretion): Implement auto-follow for engaged threads. After the first @mention in a thread, store `engaged_thread_key(tenant_id, thread_id)` in Redis with 30-minute TTL. Subsequent messages in that thread (even without @mention) trigger a response. Per research recommendation.
       - If rate limit exceeded: Post an ephemeral message to the user: "I'm receiving too many requests right now. Please try again in a moment." Do NOT dispatch to Celery.
       - If tenant resolution fails (unknown workspace): Log warning and ignore the event silently.
       - CRITICAL: Return HTTP 200 immediately. NO LLM work inside the handler. Slack retries after 3 seconds.

    3. Create `packages/gateway/main.py`:
       - FastAPI app mounting slack-bolt AsyncApp via `AsyncSlackRequestHandler`
       - `POST /slack/events` endpoint handled by the slack handler
       - `GET /health` endpoint
       - Port 8001

    4. Create `packages/router/tenant.py`:
       - `async def resolve_tenant(workspace_id: str, channel_type: ChannelType, session: AsyncSession) -> str | None`: Queries `channel_connections` table for matching workspace_id + channel_type, returns tenant_id or None. Uses RLS-free query (tenant resolution must work across all tenants — this is the one pre-RLS operation).

    5. Create `packages/router/ratelimit.py`:
       - `async def check_rate_limit(tenant_id: str, channel: str, redis: Redis) -> bool`: Implements token bucket using Redis. Uses `rate_limit_key(tenant_id, channel)` from shared redis_keys. Default: 30 requests per minute per tenant per channel (configurable). Returns True if allowed, raises `RateLimitExceeded` if not.
       - `class RateLimitExceeded(Exception)`: Custom exception with remaining_seconds attribute.

    6. Create `packages/router/idempotency.py`:
       - `async def is_duplicate(tenant_id: str, message_id: str, redis: Redis) -> bool`: Checks Redis for `idempotency_key(tenant_id, message_id)`. If exists, return True (duplicate). Otherwise, set with 24-hour TTL and return False.

    7. Create `packages/router/context.py`:
       - `async def load_agent_for_tenant(tenant_id: str, session: AsyncSession) -> Agent | None`: Loads the active agent for the tenant (Phase 1 = single agent per tenant). Sets `current_tenant_id` ContextVar before querying.

    8. Update the Celery `handle_message` task in `packages/orchestrator/tasks.py` (or instruct Plan 02 SUMMARY reader to expect this):
       - After generating LLM response, use `slack_bolt` or `httpx` to call `chat.update` on the placeholder message (replace "Thinking..." with the real response). Use `placeholder_ts` and `channel_id` from the task payload.
       - Per user decision: Always reply in threads.

    9. Update `docker-compose.yml` to add `gateway` service on port 8001, depending on redis, postgres, celery-worker.

    10. Create `__init__.py` files for gateway, gateway/channels, router packages.
  </action>
  <verify>
    <automated>cd /home/adelorenzo/repos/konstruct && python -c "from packages.gateway.normalize import normalize_slack_event; from packages.router.tenant import resolve_tenant; from packages.router.ratelimit import check_rate_limit; print('Gateway + Router imports OK')"</automated>
  </verify>
  <done>
    - Slack @mentions and DMs are handled by slack-bolt AsyncApp in HTTP mode
    - Messages are normalized to KonstructMessage format
    - Tenant resolution maps workspace_id to tenant_id
    - Rate limiting enforces per-tenant per-channel limits with Redis token bucket
    - Idempotency deduplication prevents double-processing of Slack retries
    - Placeholder "Thinking..." message posted immediately, replaced with LLM response
    - Auto-follow engaged threads with 30-minute idle timeout
    - HTTP 200 returned within 3 seconds, all LLM work dispatched to Celery
  </done>
</task>

<task type="auto">
  <name>Task 2: End-to-end integration tests for Slack flow, rate limiting, and agent persona</name>
  <files>
    tests/unit/test_ratelimit.py,
    tests/integration/test_slack_flow.py,
    tests/integration/test_agent_persona.py,
    tests/integration/test_ratelimit.py
  </files>
  <action>
    1. Create `tests/unit/test_ratelimit.py` (CHAN-05 unit):
       - Test token bucket allows requests under the limit (29 of 30)
       - Test token bucket rejects the 31st request in a 1-minute window
       - Test that rate limit keys are namespaced per tenant (tenant_a limit independent of tenant_b)
       - Test that rate limit resets after the window expires
       - Use fakeredis or real Redis from Docker Compose

    2. Create `tests/integration/test_ratelimit.py` (CHAN-05 integration):
       - Test the full flow: send a Slack event that exceeds rate limit, verify that an ephemeral "too many requests" message is sent back via Slack API (mock Slack client)
       - Verify the event is NOT dispatched to Celery when rate-limited

    3. Create `tests/integration/test_slack_flow.py` (CHAN-02):
       - Mock Slack client (no real Slack workspace needed)
       - Mock LLM pool response (no real LLM call needed)
       - Test full flow: Slack app_mention event -> normalize -> resolve tenant -> dispatch Celery -> LLM call -> chat.update with response
       - Verify the response is posted in-thread (thread_ts is set)
       - Verify placeholder "Thinking..." message is posted before Celery dispatch
       - Verify the placeholder is replaced with the real response
       - Test DM flow: message event with channel_type="im" triggers the same pipeline
       - Test that bot messages are ignored (no infinite loop)
       - Test that unknown workspace_id events are silently ignored

    4. Create `tests/integration/test_agent_persona.py` (AGNT-01):
       - Create a tenant with an agent configured with name="Mara", role="Customer Support", persona="Professional and empathetic"
       - Mock the LLM pool to capture the messages array sent to /complete
       - Trigger a message through the pipeline
       - Verify the system prompt contains: "Your name is Mara", "Your role is Customer Support", "Professional and empathetic", and the AI transparency clause
       - Verify model_preference from the agent config is passed to the LLM pool
  </action>
  <verify>
    <automated>cd /home/adelorenzo/repos/konstruct && pytest tests/unit/test_ratelimit.py tests/integration/test_slack_flow.py tests/integration/test_agent_persona.py tests/integration/test_ratelimit.py -x -q</automated>
  </verify>
  <done>
    - Rate limiting unit tests verify token bucket behavior per tenant per channel
    - Slack flow integration test proves end-to-end: event -> normalize -> tenant resolve -> Celery -> LLM -> thread reply
    - Agent persona test proves system prompt reflects name, role, persona, and AI transparency clause
    - Rate limit integration test proves over-limit requests get informative rejection
    - All tests pass with mocked Slack client and mocked LLM pool
  </done>
</task>

</tasks>

<verification>
- `pytest tests/unit/test_ratelimit.py -x` verifies token bucket logic
- `pytest tests/integration/test_slack_flow.py -x` proves end-to-end Slack -> LLM -> reply
- `pytest tests/integration/test_agent_persona.py -x` proves persona reflected in system prompt
- `pytest tests/integration/test_ratelimit.py -x` proves rate limit produces informative rejection
- `grep -r "async def.*@app.task" packages/orchestrator/` returns NO matches (no async Celery tasks)
</verification>

<success_criteria>
- A Slack @mention or DM triggers an LLM response in the same thread (mocked end-to-end)
- Rate-limited requests are rejected with informative message, not silently dropped
- Agent persona (name, role, persona) is reflected in the LLM system prompt
- Typing indicator (placeholder message) appears before LLM response
- All tests green, no async Celery tasks
</success_criteria>

<output>
After completion, create `.planning/phases/01-foundation/01-03-SUMMARY.md`
</output>