| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 02-agent-features | 01 | execute | 1 | | | true | | |
Purpose: Transforms the stateless single-turn agent from Phase 1 into one that remembers conversations and user preferences across sessions and channels. Output: Memory modules, DB migration, updated orchestrator pipeline, passing tests.
<execution_context> @/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md @/home/adelorenzo/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/02-agent-features/02-CONTEXT.md @.planning/phases/02-agent-features/02-RESEARCH.md @packages/shared/shared/redis_keys.py @packages/shared/shared/models/tenant.py @packages/shared/shared/models/message.py @packages/shared/shared/rls.py @packages/shared/shared/db.py @packages/orchestrator/orchestrator/tasks.py @packages/orchestrator/orchestrator/agents/builder.py @packages/orchestrator/orchestrator/agents/runner.py @migrations/versions/001_initial_schema.py @tests/conftest.py
From packages/shared/shared/redis_keys.py:
- Existing key constructors: rate_limit_key(), idempotency_key(), session_key(), engaged_thread_key()
- Pattern: def key_name(tenant_id: str, ...) -> str returning "{tenant_id}:namespace:..."
From packages/orchestrator/orchestrator/tasks.py:
- handle_message Celery task (sync def with asyncio.run())
- Receives msg dict from Celery, reconstructs KonstructMessage via model_validate
- Loads agent via load_agent_for_tenant
- Calls run_agent to get LLM response
- Posts response via Slack chat.update
From packages/orchestrator/orchestrator/agents/builder.py:
- build_system_prompt(agent: Agent) -> str
- Assembles system_prompt + identity + persona + AI transparency clause
From packages/orchestrator/orchestrator/agents/runner.py:
- run_agent(msg: KonstructMessage, agent: Agent) -> str
- httpx POST to llm-pool /complete with messages array
Task 1: Build the memory storage layer (Redis sliding window + pgvector store)
1. Create the `ConversationEmbedding` ORM model with a `Vector(384)` embedding column (per the must-haves below).
2. Extend `packages/shared/shared/redis_keys.py` with (sketch after this sub-list):
- memory_short_key(tenant_id, agent_id, user_id) -> "{tenant_id}:memory:short:{agent_id}:{user_id}"
- escalation_status_key(tenant_id, thread_id) -> "{tenant_id}:escalation:{thread_id}"
- pending_tool_confirm_key(tenant_id, thread_id) -> "{tenant_id}:tool_confirm:{thread_id}"
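A minimal sketch of step 2, following the existing `"{tenant_id}:namespace:..."` constructor pattern noted above:

```python
# New key constructors for packages/shared/shared/redis_keys.py, following
# the existing "{tenant_id}:namespace:..." convention.

def memory_short_key(tenant_id: str, agent_id: str, user_id: str) -> str:
    return f"{tenant_id}:memory:short:{agent_id}:{user_id}"


def escalation_status_key(tenant_id: str, thread_id: str) -> str:
    return f"{tenant_id}:escalation:{thread_id}"


def pending_tool_confirm_key(tenant_id: str, thread_id: str) -> str:
    return f"{tenant_id}:tool_confirm:{thread_id}"
```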
3. Create `packages/orchestrator/orchestrator/memory/short_term.py`:
- async get_recent_messages(redis, tenant_id, agent_id, user_id, n=20) -> list[dict]
- async append_message(redis, tenant_id, agent_id, user_id, role, content, window=20) -> None
- Uses RPUSH + LTRIM pattern. No TTL (indefinite retention per user decision).
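A minimal sketch of step 3, assuming JSON-serialized `{role, content}` entries and the `redis.asyncio.Redis` client pattern named in the constraints below:

```python
# packages/orchestrator/orchestrator/memory/short_term.py (sketch)
import json

from redis.asyncio import Redis

from shared.redis_keys import memory_short_key  # added in step 2


async def append_message(
    redis: Redis, tenant_id: str, agent_id: str, user_id: str,
    role: str, content: str, window: int = 20,
) -> None:
    key = memory_short_key(tenant_id, agent_id, user_id)
    await redis.rpush(key, json.dumps({"role": role, "content": content}))
    await redis.ltrim(key, -window, -1)  # keep only the newest `window` entries


async def get_recent_messages(
    redis: Redis, tenant_id: str, agent_id: str, user_id: str, n: int = 20,
) -> list[dict]:
    key = memory_short_key(tenant_id, agent_id, user_id)
    raw = await redis.lrange(key, -n, -1)
    return [json.loads(item) for item in raw]
```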
4. Create `packages/orchestrator/orchestrator/memory/long_term.py`:
- async retrieve_relevant(session, tenant_id, agent_id, user_id, query_embedding, top_k=3, threshold=0.75) -> list[str]
- async store_embedding(session, tenant_id, agent_id, user_id, content, role, embedding) -> None
- CRITICAL: All queries MUST include WHERE tenant_id = :tenant_id AND agent_id = :agent_id AND user_id = :user_id BEFORE the ANN operator
- Uses raw SQL text() for pgvector operations (cosine distance operator <=>)
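A sketch of the retrieval side of step 4, assuming the `conversation_embeddings` table from step 5. `<=>` is pgvector's cosine distance operator, so similarity = 1 − distance and "above threshold" means `1 - distance >= threshold`:

```python
# packages/orchestrator/orchestrator/memory/long_term.py (retrieval sketch)
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession


async def retrieve_relevant(
    session: AsyncSession, tenant_id: str, agent_id: str, user_id: str,
    query_embedding: list[float], top_k: int = 3, threshold: float = 0.75,
) -> list[str]:
    # Tenant/agent/user filters come BEFORE the ANN operator, so rows from
    # other tenants can never enter the candidate set
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    result = await session.execute(
        text("""
            SELECT content
            FROM conversation_embeddings
            WHERE tenant_id = :tenant_id
              AND agent_id = :agent_id
              AND user_id = :user_id
              AND 1 - (embedding <=> CAST(:vec AS vector)) >= :threshold
            ORDER BY embedding <=> CAST(:vec AS vector)
            LIMIT :top_k
        """),
        {"tenant_id": tenant_id, "agent_id": agent_id, "user_id": user_id,
         "vec": vec, "threshold": threshold, "top_k": top_k},
    )
    return [row[0] for row in result]
```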
5. Create Alembic migration `002_phase2_memory.py`:
- conversation_embeddings table with all columns from the model
- HNSW index: CREATE INDEX ... USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)
- Covering index on (tenant_id, agent_id, user_id, created_at DESC)
- RLS: ENABLE ROW LEVEL SECURITY, FORCE ROW LEVEL SECURITY
- RLS policy: tenant_id = current_setting('app.current_tenant')::uuid
- GRANT SELECT, INSERT on conversation_embeddings TO konstruct_app (no UPDATE/DELETE — embeddings are immutable like audit)
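A sketch of the index/RLS portion of step 5, assuming Alembic's `op` API; the index and policy names are illustrative, and the column DDL (omitted) comes from the `ConversationEmbedding` model:

```python
# migrations/versions/002_phase2_memory.py (sketch, upgrade path only)
from alembic import op


def upgrade() -> None:
    # ...op.create_table("conversation_embeddings", ...) per the ORM model...
    op.execute("""
        CREATE INDEX ix_conv_emb_hnsw ON conversation_embeddings
        USING hnsw (embedding vector_cosine_ops)
        WITH (m = 16, ef_construction = 64)
    """)
    op.execute("""
        CREATE INDEX ix_conv_emb_lookup ON conversation_embeddings
        (tenant_id, agent_id, user_id, created_at DESC)
    """)
    op.execute("ALTER TABLE conversation_embeddings ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE conversation_embeddings FORCE ROW LEVEL SECURITY")
    op.execute("""
        CREATE POLICY tenant_isolation ON conversation_embeddings
        USING (tenant_id = current_setting('app.current_tenant')::uuid)
    """)
    # Deliberately no UPDATE/DELETE: embeddings are immutable, like audit rows
    op.execute("GRANT SELECT, INSERT ON conversation_embeddings TO konstruct_app")
```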
6. Extend tests/conftest.py with pgvector fixtures (ensure pgvector extension created in test DB).
7. Write unit tests (test_memory_short_term.py) using fakeredis for sliding window operations.
8. Write integration tests (test_memory_long_term.py) using real PostgreSQL with pgvector for embedding storage and retrieval, including a two-tenant cross-contamination test.
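A sketch of the two-tenant cross-contamination test from step 8; `pg_session_factory` is a hypothetical conftest fixture that yields an RLS-scoped async session for a given tenant:

```python
# tests/integration/test_memory_long_term.py (isolation test sketch)
import uuid

import pytest

from orchestrator.memory.long_term import retrieve_relevant, store_embedding


@pytest.mark.asyncio
async def test_tenant_a_embeddings_invisible_to_tenant_b(pg_session_factory):
    tenant_a, tenant_b = str(uuid.uuid4()), str(uuid.uuid4())
    agent_id, user_id = str(uuid.uuid4()), "U123"
    emb = [0.1] * 384

    async with pg_session_factory(tenant_a) as session:
        await store_embedding(session, tenant_a, agent_id, user_id,
                              "tenant A secret", "user", emb)

    async with pg_session_factory(tenant_b) as session:
        # threshold=0.0 so a leak cannot hide behind similarity filtering
        results = await retrieve_relevant(session, tenant_b, agent_id, user_id,
                                          emb, threshold=0.0)

    assert results == []
```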
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_memory_short_term.py tests/integration/test_memory_long_term.py -x -v
- ConversationEmbedding ORM model exists with Vector(384) column
- Redis sliding window stores/retrieves messages correctly with tenant+agent+user namespacing
- pgvector similarity search returns relevant content above threshold
- Cross-tenant isolation verified: tenant A's embeddings never returned for tenant B queries
- Alembic migration runs cleanly and creates HNSW index
Task 2: Wire memory into orchestrator pipeline
packages/orchestrator/orchestrator/agents/builder.py,
packages/orchestrator/orchestrator/agents/runner.py,
packages/orchestrator/orchestrator/tasks.py
1. Update `builder.py` — add `build_messages_with_memory()` function:
- Takes: agent, current_message, recent_messages (from Redis), relevant_context (from pgvector)
- Returns: list[dict] formatted as LLM messages array
- Structure: [system_prompt] + [pgvector context as system message: "Relevant context from past conversations: ..."] + [sliding window messages as user/assistant alternation] + [current user message]
- pgvector context injected as a system message BEFORE the sliding window — gives the LLM background without polluting the conversation flow
- If no relevant context found, omit the context system message entirely (don't inject empty context)
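A minimal sketch of step 1, reusing `build_system_prompt()` already defined in `builder.py`:

```python
# packages/orchestrator/orchestrator/agents/builder.py (new function, sketch)
def build_messages_with_memory(
    agent, current_message: str,
    recent_messages: list[dict], relevant_context: list[str],
) -> list[dict]:
    messages = [{"role": "system", "content": build_system_prompt(agent)}]
    if relevant_context:  # omit the context message entirely when empty
        messages.append({
            "role": "system",
            "content": "Relevant context from past conversations:\n"
                       + "\n".join(f"- {c}" for c in relevant_context),
        })
    messages.extend(recent_messages)  # sliding window, already role-tagged dicts
    messages.append({"role": "user", "content": current_message})
    return messages
```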
2. Update `runner.py` — modify `run_agent()` to accept pre-built messages array:
- Current: builds simple [system, user] messages internally
- New: accept optional `messages` parameter. If provided, use it directly. If not, fall back to existing behavior (backward compat for tests).
- This lets the pipeline pass the memory-enriched messages array
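A backward-compatible signature sketch for step 2; `msg.text`, `LLM_POOL_URL`, and the `/complete` response shape are assumptions standing in for the existing Phase 1 code:

```python
# packages/orchestrator/orchestrator/agents/runner.py (sketch)
import httpx


async def run_agent(msg, agent, messages: list[dict] | None = None) -> str:
    if messages is None:
        # Phase 1 fallback: simple [system, user] pair keeps old tests green
        messages = [
            {"role": "system", "content": build_system_prompt(agent)},
            {"role": "user", "content": msg.text},  # assumed field name
        ]
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{LLM_POOL_URL}/complete",
                                 json={"messages": messages})
        resp.raise_for_status()
        return resp.json()["text"]  # assumed response shape
```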
3. Update `tasks.py` — modify `handle_message` Celery task:
- BEFORE LLM call: load recent messages from Redis via get_recent_messages()
- BEFORE LLM call: embed current message text, call retrieve_relevant() for long-term context
- For embedding the query: use sentence-transformers `SentenceTransformer('all-MiniLM-L6-v2').encode()` — load model once at module level (lazy singleton)
- Build messages array via build_messages_with_memory()
- Pass messages to run_agent()
- AFTER LLM response: append both user message and assistant response to Redis sliding window via append_message()
- AFTER LLM response: dispatch embed_and_store.delay() Celery task for async pgvector backfill (fire-and-forget)
- Create embed_and_store Celery task (sync def with asyncio.run()): takes tenant_id, agent_id, user_id, messages list, embeds each, stores via store_embedding()
- The embed_and_store task must use sentence-transformers for embedding (same model as query embedding)
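A simplified sketch of the step 3 flow. `celery_app`, `get_redis()`, `get_session()`, and `msg.text`/`msg.user_id` are assumptions standing in for the existing Phase 1 plumbing; `get_embedder()` is the lazy singleton sketched after the note below:

```python
# packages/orchestrator/orchestrator/tasks.py (handle_message sketch)
@celery_app.task
def handle_message(msg_dict: dict) -> None:
    asyncio.run(_handle_message(msg_dict))  # sync def wrapper, per constraints


async def _handle_message(msg_dict: dict) -> None:
    msg = KonstructMessage.model_validate(msg_dict)
    agent = await load_agent_for_tenant(msg.tenant_id)
    redis = get_redis()

    # BEFORE the LLM call: gather short-term and long-term memory
    recent = await get_recent_messages(redis, msg.tenant_id, agent.id, msg.user_id)
    query_emb = get_embedder().encode(msg.text).tolist()
    async with get_session(msg.tenant_id) as session:
        context = await retrieve_relevant(session, msg.tenant_id, agent.id,
                                          msg.user_id, query_emb)

    messages = build_messages_with_memory(agent, msg.text, recent, context)
    reply = await run_agent(msg, agent, messages=messages)

    # AFTER the LLM response: update the sliding window, then fire-and-forget
    # the pgvector backfill
    await append_message(redis, msg.tenant_id, agent.id, msg.user_id, "user", msg.text)
    await append_message(redis, msg.tenant_id, agent.id, msg.user_id, "assistant", reply)
    embed_and_store.delay(msg.tenant_id, agent.id, msg.user_id, [
        {"role": "user", "content": msg.text},
        {"role": "assistant", "content": reply},
    ])
    # ...post reply via Slack chat.update, as in Phase 1
```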
Note: sentence-transformers must be installed. Run `uv add sentence-transformers` in the orchestrator package. If sentence-transformers is too heavy, use the Ollama embedding endpoint via httpx POST to llm-pool (add an /embed endpoint to llm-pool). Use Claude's discretion on which approach is simpler — but the embedding model MUST be all-MiniLM-L6-v2 (384 dimensions) to match the pgvector column width.
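A sketch of the lazy model singleton and the backfill task, assuming the sentence-transformers route from the note above; `get_session()` is the same hypothetical RLS-scoped session helper as in the previous sketch:

```python
# packages/orchestrator/orchestrator/tasks.py (embedding backfill sketch)
import asyncio
from functools import lru_cache


@lru_cache(maxsize=1)
def get_embedder():
    # Deferred import: the model load is heavy, so do it once on first use,
    # not at worker boot
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("all-MiniLM-L6-v2")  # 384 dims, matches Vector(384)


@celery_app.task
def embed_and_store(tenant_id: str, agent_id: str, user_id: str,
                    messages: list[dict]) -> None:
    asyncio.run(_embed_and_store(tenant_id, agent_id, user_id, messages))


async def _embed_and_store(tenant_id: str, agent_id: str, user_id: str,
                           messages: list[dict]) -> None:
    embedder = get_embedder()
    async with get_session(tenant_id) as session:
        for m in messages:
            emb = embedder.encode(m["content"]).tolist()
            await store_embedding(session, tenant_id, agent_id, user_id,
                                  m["content"], m["role"], emb)
```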
CRITICAL constraints:
- All Celery tasks MUST be sync def with asyncio.run() — never async def
- Redis operations use the existing redis.asyncio.Redis client pattern
- DB operations use the existing async SQLAlchemy session pattern with RLS context
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_memory_short_term.py tests/integration/test_memory_long_term.py -x -v
- handle_message loads sliding window + pgvector context before every LLM call
- LLM prompt includes recent conversation history and relevant past context
- User and assistant messages are appended to Redis after each turn
- Embedding backfill dispatched asynchronously via embed_and_store.delay()
- Existing Slack flow still works end-to-end (backward compatible)
- All existing Phase 1 tests still pass: `pytest tests/ -x`
- Memory unit tests pass: `pytest tests/unit/test_memory_short_term.py -x`
- Memory integration tests pass: `pytest tests/integration/test_memory_long_term.py -x`
- Cross-tenant isolation verified in integration tests
- Migration applies cleanly: `alembic upgrade head`
<success_criteria>
- Agent maintains conversational context within a session via Redis sliding window
- Agent recalls relevant past context across conversations via pgvector retrieval
- Memory is isolated per-user per-agent per-tenant
- Embedding backfill is asynchronous and never blocks the response pipeline
</success_criteria>