Phase 2: Agent Features - Context

Gathered: 2026-03-23 Status: Ready for planning

## Phase Boundary

The AI employee maintains conversation memory, can execute tools, handles WhatsApp messages, and escalates to humans when rules trigger. Includes conversational memory (sliding window + pgvector long-term), tool framework with 4 built-in tools, WhatsApp Business Cloud API integration with Meta 2026 policy compliance, human escalation/handoff, audit logging, and bidirectional media support across all channels.

## Implementation Decisions

Conversational Memory

Full conversation history stored in pgvector — no messages are dropped
Vector retrieval surfaces relevant past context (not full history dump) when assembling LLM prompt
Cross-conversation memory — agent remembers user preferences and context across separate conversations
Memory keyed per-user per-agent — memory follows the user across channels (same agent remembers you in Slack and WhatsApp)
Indefinite retention — memory never expires. Operator can purge manually if needed.
Sliding window for immediate context (last N messages verbatim in prompt), vector search for older/cross-conversation context

Tool Framework

4 built-in tools for v1: web search, knowledge base search, HTTP request, calendar lookup
Knowledge base content populated via both file upload (PDFs, docs) and URL ingestion (crawl, chunk, embed)
Seamless tool usage — agent incorporates tool results naturally into responses, does NOT announce "let me look that up"
Always confirm before consequential actions — agent asks user before booking calendar slots, sending HTTP requests, or any action with side effects. Read-only tools (search, KB lookup) execute without confirmation.
Tool invocations are schema-validated before execution — prevents prompt injection into tool arguments
Every tool invocation logged in audit trail

Human Escalation

Handoff destination: DM to an assigned human (configured per tenant in Agent Designer)
Full conversation transcript included in the DM — human sees complete context
Agent stays in the thread as assistant after escalation — can provide context to the human if asked, but defers to the human for responses to the end user
Natural language escalation ("can I talk to a human?") is configurable per tenant — operator enables/disables in Agent Designer
Configurable rule-based escalation triggers (from Agent Designer: e.g., failed resolution attempts, billing disputes)

WhatsApp Interaction Model

Same persona across Slack and WhatsApp — consistent employee identity, same name, tone, and behavior
Business-function scoping: layered enforcement — explicit allowlist as first gate (canned rejection for clearly off-topic, no LLM call), then role-based LLM handling for edge cases
Operator defines allowed business functions in Agent Designer (e.g., "order tracking", "returns", "support")
Off-topic messages on WhatsApp get a polite redirect: "[Name] is here to help with [allowed topics]. How can I assist you with one of those?"

Media Support (All Channels)

Bidirectional media support across Slack and WhatsApp
Agent can RECEIVE images and documents (invoices, receipts, screenshots) and interpret them via multimodal LLM
Agent can SEND images and documents back to users (templates, reports, generated files)
Use case examples: user sends an invoice photo, agent extracts line items; agent sends a company expense report template
KonstructMessage format must be extended to handle media attachments (image URLs, file references)
Media stored in MinIO (self-hosted) / S3 with per-tenant isolation

Audit Logging

Every LLM call, tool invocation, and handoff event recorded in immutable audit trail
Audit entries include: timestamp, tenant_id, agent_id, user_id, action_type, input/output summary, latency
Queryable by tenant — operators can review agent actions
Audit data feeds into Phase 3 cost tracking dashboard

Claude's Discretion

Sliding window size (how many recent messages kept verbatim)
Vector similarity threshold for memory retrieval
KB chunking strategy and embedding model choice
Calendar lookup integration approach (Google Calendar API vs generic iCal)
Web search provider (Brave Search API, SerpAPI, etc.)
HTTP request tool: timeout limits, allowed methods, response size caps
WhatsApp message template format for outbound media
Audit log storage strategy (PostgreSQL table vs append-only log)

## Specific Ideas

Agents should feel like they genuinely remember you — not "according to my records" but natural recall like a real colleague would have
Invoice/document interpretation is a key use case for SMBs — the agent reading a photo of a receipt and extracting amounts is a powerful demo
The "employee" metaphor extends to escalation — it should feel like "let me get my manager" not "transferring to support"
Tool confirmation should feel conversational: "I found a slot at 2pm Thursday — shall I book it?" not "Confirm action: calendar.book()"

<code_context>

Existing Code Insights

Reusable Assets

packages/shared/shared/models/tenant.py:Agent — Agent model with name, role, persona, system_prompt, escalation_rules, model_preference fields
packages/shared/shared/models/message.py:KonstructMessage — Unified message format (needs extension for media attachments)
packages/orchestrator/orchestrator/tasks.py:handle_message — Celery task entry point (sync def with asyncio.run)
packages/orchestrator/orchestrator/agents/builder.py:build_system_prompt — System prompt assembly with AI transparency clause
packages/orchestrator/orchestrator/agents/runner.py:run_agent — LLM call via httpx to llm-pool
packages/gateway/gateway/normalize.py:normalize_slack_event — Slack normalization (pattern for WhatsApp normalizer)
packages/gateway/gateway/channels/slack.py — Slack adapter (pattern for WhatsApp adapter)
packages/router/router/ratelimit.py — Token bucket rate limiter (reuse for WhatsApp)
packages/router/router/idempotency.py — Idempotency dedup (reuse for WhatsApp webhooks)
packages/shared/shared/redis_keys.py — Tenant-namespaced Redis key constructors
packages/shared/shared/rls.py — RLS context hook (all new tables must use this)

Established Patterns

Celery tasks MUST be sync def with asyncio.run() — never async def
RLS via SET LOCAL app.current_tenant in before_cursor_execute hook
Redis keys always namespaced by tenant_id
Channel adapters normalize to KonstructMessage before any business logic
Placeholder message → async process → chat.update pattern for typing indicator

Integration Points

Orchestrator handle_message task needs: memory retrieval before LLM call, tool dispatch loop, escalation check, audit logging
Gateway needs new WhatsApp adapter alongside existing Slack adapter
Agent Designer in portal needs: allowed business functions field, escalation assignee field, tool configuration
LLM pool needs multimodal support (image URLs in messages) for document interpretation

</code_context>

## Deferred Ideas

None — discussion stayed within phase scope

Phase: 02-agent-features Context gathered: 2026-03-23

7.3 KiB Raw Blame History