docs(02-agent-features): create phase plan

2026-03-23 14:23:11 -06:00
parent ac54d819f8
commit 7da5ffb92a
4 changed files with 1029 additions and 0 deletions
--- a/.planning/phases/02-agent-features/02-01-PLAN.md
+++ b/.planning/phases/02-agent-features/02-01-PLAN.md
@@ -0,0 +1,257 @@
+---
+phase: 02-agent-features
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - packages/shared/shared/models/memory.py
+  - packages/shared/shared/redis_keys.py
+  - packages/orchestrator/orchestrator/memory/__init__.py
+  - packages/orchestrator/orchestrator/memory/short_term.py
+  - packages/orchestrator/orchestrator/memory/long_term.py
+  - packages/orchestrator/orchestrator/agents/builder.py
+  - packages/orchestrator/orchestrator/tasks.py
+  - migrations/versions/002_phase2_memory.py
+  - tests/unit/test_memory_short_term.py
+  - tests/integration/test_memory_long_term.py
+  - tests/conftest.py
+autonomous: true
+requirements:
+  - AGNT-02
+  - AGNT-03
+
+must_haves:
+  truths:
+    - "Agent includes the last 20 messages verbatim in the LLM prompt for immediate context"
+    - "Agent retrieves up to 3 semantically relevant past exchanges from pgvector when assembling the prompt"
+    - "Memory is keyed per-user per-agent — different users talking to the same agent have isolated memory"
+    - "Cross-tenant vector search is impossible — tenant_id pre-filter enforced on every pgvector query"
+    - "Embedding backfill runs asynchronously via Celery task — never blocks the LLM response"
+  artifacts:
+    - path: "packages/orchestrator/orchestrator/memory/short_term.py"
+      provides: "Redis sliding window (RPUSH/LTRIM/LRANGE)"
+      exports: ["get_recent_messages", "append_message"]
+    - path: "packages/orchestrator/orchestrator/memory/long_term.py"
+      provides: "pgvector embedding store + HNSW similarity search"
+      exports: ["retrieve_relevant", "store_embedding"]
+    - path: "packages/shared/shared/models/memory.py"
+      provides: "ConversationMessage and ConversationEmbedding ORM models"
+      contains: "class ConversationEmbedding"
+    - path: "migrations/versions/002_phase2_memory.py"
+      provides: "Alembic migration for conversation_embeddings table with HNSW index"
+  key_links:
+    - from: "packages/orchestrator/orchestrator/tasks.py"
+      to: "orchestrator/memory/short_term.py"
+      via: "get_recent_messages + append_message called in handle_message"
+      pattern: "get_recent_messages|append_message"
+    - from: "packages/orchestrator/orchestrator/agents/builder.py"
+      to: "orchestrator/memory/long_term.py"
+      via: "retrieve_relevant called during prompt assembly"
+      pattern: "retrieve_relevant"
+    - from: "packages/orchestrator/orchestrator/tasks.py"
+      to: "embed_and_store Celery task"
+      via: "fire-and-forget delay() after response"
+      pattern: "embed_and_store\\.delay"
+---
+
+<objective>
+Build the two-layer conversational memory system: Redis sliding window for immediate context (last 20 messages) and pgvector long-term storage with HNSW similarity search for cross-conversation recall.
+
+Purpose: Transforms the stateless single-turn agent from Phase 1 into one that remembers conversations and user preferences across sessions and channels.
+Output: Memory modules, DB migration, updated orchestrator pipeline, passing tests.
+</objective>
+
+<execution_context>
+@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
+@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/02-agent-features/02-CONTEXT.md
+@.planning/phases/02-agent-features/02-RESEARCH.md
+
+@packages/shared/shared/redis_keys.py
+@packages/shared/shared/models/tenant.py
+@packages/shared/shared/models/message.py
+@packages/shared/shared/rls.py
+@packages/shared/shared/db.py
+@packages/orchestrator/orchestrator/tasks.py
+@packages/orchestrator/orchestrator/agents/builder.py
+@packages/orchestrator/orchestrator/agents/runner.py
+@migrations/versions/001_initial_schema.py
+@tests/conftest.py
+
+<interfaces>
+<!-- Key types and contracts from Phase 1 codebase that this plan depends on -->
+
+From packages/shared/shared/redis_keys.py:
+- Existing key constructors: rate_limit_key(), idempotency_key(), session_key(), engaged_thread_key()
+- Pattern: def key_name(tenant_id: str, ...) -> str returning "{tenant_id}:namespace:..."
+
+From packages/orchestrator/orchestrator/tasks.py:
+- handle_message Celery task (sync def with asyncio.run())
+- Receives msg dict from Celery, reconstructs KonstructMessage via model_validate
+- Loads agent via load_agent_for_tenant
+- Calls run_agent to get LLM response
+- Posts response via Slack chat.update
+
+From packages/orchestrator/orchestrator/agents/builder.py:
+- build_system_prompt(agent: Agent) -> str
+- Assembles system_prompt + identity + persona + AI transparency clause
+
+From packages/orchestrator/orchestrator/agents/runner.py:
+- run_agent(msg: KonstructMessage, agent: Agent) -> str
+- httpx POST to llm-pool /complete with messages array
+</interfaces>
+</context>
+
+<tasks>
+
+<task type="auto" tdd="true">
+  <name>Task 1: DB models, migration, and memory modules with tests</name>
+  <files>
+    packages/shared/shared/models/memory.py,
+    packages/shared/shared/redis_keys.py,
+    packages/orchestrator/orchestrator/memory/__init__.py,
+    packages/orchestrator/orchestrator/memory/short_term.py,
+    packages/orchestrator/orchestrator/memory/long_term.py,
+    migrations/versions/002_phase2_memory.py,
+    tests/unit/test_memory_short_term.py,
+    tests/integration/test_memory_long_term.py,
+    tests/conftest.py
+  </files>
+  <behavior>
+    - get_recent_messages returns last N messages from Redis as list[dict] with role/content keys
+    - append_message adds a message and trims the list to window size (20)
+    - append_message with window=5 keeps only last 5 messages (parameterized trim)
+    - get_recent_messages on empty key returns empty list
+    - Memory keys are namespaced: {tenant_id}:memory:short:{agent_id}:{user_id}
+    - retrieve_relevant returns top-K content strings above cosine similarity threshold
+    - retrieve_relevant with tenant_id=A never returns tenant_id=B embeddings (cross-tenant isolation)
+    - retrieve_relevant with threshold=0.99 and dissimilar query returns empty list
+    - store_embedding inserts a row into conversation_embeddings with correct tenant/agent/user scoping
+    - ConversationEmbedding model has: id, tenant_id, agent_id, user_id, content, role, embedding (vector 384), created_at
+    - Migration creates conversation_embeddings table with HNSW index and RLS policy
+  </behavior>
+  <action>
+    1. Create `packages/shared/shared/models/memory.py` with SQLAlchemy 2.0 `Mapped[]` style:
+       - ConversationEmbedding: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID NOT NULL), user_id (TEXT NOT NULL), content (TEXT NOT NULL), role (TEXT NOT NULL), embedding (Vector(384) NOT NULL), created_at (TIMESTAMPTZ, server_default=now())
+       - Use `from pgvector.sqlalchemy import Vector` for the embedding column
+       - RLS policy follows existing pattern from tenant.py
+
+    2. Extend `packages/shared/shared/redis_keys.py` with:
+       - memory_short_key(tenant_id, agent_id, user_id) -> "{tenant_id}:memory:short:{agent_id}:{user_id}"
+       - escalation_status_key(tenant_id, thread_id) -> "{tenant_id}:escalation:{thread_id}"
+       - pending_tool_confirm_key(tenant_id, thread_id) -> "{tenant_id}:tool_confirm:{thread_id}"
+
+    3. Create `packages/orchestrator/orchestrator/memory/short_term.py`:
+       - async get_recent_messages(redis, tenant_id, agent_id, user_id, n=20) -> list[dict]
+       - async append_message(redis, tenant_id, agent_id, user_id, role, content, window=20) -> None
+       - Uses RPUSH + LTRIM pattern. No TTL (indefinite retention per user decision).
+
+    4. Create `packages/orchestrator/orchestrator/memory/long_term.py`:
+       - async retrieve_relevant(session, tenant_id, agent_id, user_id, query_embedding, top_k=3, threshold=0.75) -> list[str]
+       - async store_embedding(session, tenant_id, agent_id, user_id, content, role, embedding) -> None
+       - CRITICAL: All queries MUST include WHERE tenant_id = :tenant_id AND agent_id = :agent_id AND user_id = :user_id BEFORE the ANN operator
+       - Uses raw SQL text() for pgvector operations (cosine distance operator <=>)
+
+    5. Create Alembic migration `002_phase2_memory.py`:
+       - conversation_embeddings table with all columns from the model
+       - HNSW index: CREATE INDEX ... USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)
+       - Covering index on (tenant_id, agent_id, user_id, created_at DESC)
+       - RLS: ENABLE ROW LEVEL SECURITY, FORCE ROW LEVEL SECURITY
+       - RLS policy: tenant_id = current_setting('app.current_tenant')::uuid
+       - GRANT SELECT, INSERT on conversation_embeddings TO konstruct_app (no UPDATE/DELETE — embeddings are immutable like audit)
+
+    6. Extend tests/conftest.py with pgvector fixtures (ensure pgvector extension created in test DB).
+
+    7. Write unit tests (test_memory_short_term.py) using fakeredis for sliding window operations.
+
+    8. Write integration tests (test_memory_long_term.py) using real PostgreSQL with pgvector for embedding storage and retrieval, including a two-tenant cross-contamination test.
+  </action>
+  <verify>
+    <automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_memory_short_term.py tests/integration/test_memory_long_term.py -x -v</automated>
+  </verify>
+  <done>
+    - ConversationEmbedding ORM model exists with Vector(384) column
+    - Redis sliding window stores/retrieves messages correctly with tenant+agent+user namespacing
+    - pgvector similarity search returns relevant content above threshold
+    - Cross-tenant isolation verified: tenant A's embeddings never returned for tenant B queries
+    - Alembic migration runs cleanly and creates HNSW index
+  </done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Wire memory into orchestrator pipeline</name>
+  <files>
+    packages/orchestrator/orchestrator/agents/builder.py,
+    packages/orchestrator/orchestrator/agents/runner.py,
+    packages/orchestrator/orchestrator/tasks.py
+  </files>
+  <action>
+    1. Update `builder.py` — add `build_messages_with_memory()` function:
+       - Takes: agent, current_message, recent_messages (from Redis), relevant_context (from pgvector)
+       - Returns: list[dict] formatted as LLM messages array
+       - Structure: [system_prompt] + [pgvector context as system message: "Relevant context from past conversations: ..."] + [sliding window messages as user/assistant alternation] + [current user message]
+       - pgvector context injected as a system message BEFORE the sliding window — gives the LLM background without polluting the conversation flow
+       - If no relevant context found, omit the context system message entirely (don't inject empty context)
+
+    2. Update `runner.py` — modify `run_agent()` to accept pre-built messages array:
+       - Current: builds simple [system, user] messages internally
+       - New: accept optional `messages` parameter. If provided, use it directly. If not, fall back to existing behavior (backward compat for tests).
+       - This lets the pipeline pass the memory-enriched messages array
+
+    3. Update `tasks.py` — modify `handle_message` Celery task:
+       - BEFORE LLM call: load recent messages from Redis via get_recent_messages()
+       - BEFORE LLM call: embed current message text, call retrieve_relevant() for long-term context
+       - For embedding the query: use sentence-transformers `SentenceTransformer('all-MiniLM-L6-v2').encode()` — load model once at module level (lazy singleton)
+       - Build messages array via build_messages_with_memory()
+       - Pass messages to run_agent()
+       - AFTER LLM response: append both user message and assistant response to Redis sliding window via append_message()
+       - AFTER LLM response: dispatch embed_and_store.delay() Celery task for async pgvector backfill (fire-and-forget)
+       - Create embed_and_store Celery task (sync def with asyncio.run()): takes tenant_id, agent_id, user_id, messages list, embeds each, stores via store_embedding()
+       - The embed_and_store task must use sentence-transformers for embedding (same model as query embedding)
+
+    Note: sentence-transformers must be installed. Run `uv add sentence-transformers` in the orchestrator package. If sentence-transformers is too heavy, use the Ollama embedding endpoint via httpx POST to llm-pool (add an /embed endpoint to llm-pool). Use Claude's discretion on which approach is simpler — but the embedding model MUST be all-MiniLM-L6-v2 (384 dimensions) to match the pgvector column width.
+
+    CRITICAL constraints:
+    - All Celery tasks MUST be sync def with asyncio.run() — never async def
+    - Redis operations use the existing redis.asyncio.Redis client pattern
+    - DB operations use the existing async SQLAlchemy session pattern with RLS context
+  </action>
+  <verify>
+    <automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_memory_short_term.py tests/integration/test_memory_long_term.py -x -v</automated>
+  </verify>
+  <done>
+    - handle_message loads sliding window + pgvector context before every LLM call
+    - LLM prompt includes recent conversation history and relevant past context
+    - User and assistant messages are appended to Redis after each turn
+    - Embedding backfill dispatched asynchronously via embed_and_store.delay()
+    - Existing Slack flow still works end-to-end (backward compatible)
+  </done>
+</task>
+
+</tasks>
+
+<verification>
+- All existing Phase 1 tests still pass: `pytest tests/ -x`
+- Memory unit tests pass: `pytest tests/unit/test_memory_short_term.py -x`
+- Memory integration tests pass: `pytest tests/integration/test_memory_long_term.py -x`
+- Cross-tenant isolation verified in integration tests
+- Migration applies cleanly: `alembic upgrade head`
+</verification>
+
+<success_criteria>
+- Agent maintains conversational context within a session via Redis sliding window
+- Agent recalls relevant past context across conversations via pgvector retrieval
+- Memory is isolated per-user per-agent per-tenant
+- Embedding backfill is asynchronous and never blocks the response pipeline
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/02-agent-features/02-01-SUMMARY.md`
+</output>