| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 02-agent-features | 02 | execute | 2 | | | true | | |
Purpose: Gives the AI employee the ability to take actions (search, look up info, make requests) and creates the compliance-ready audit trail for all agent activity.
Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, updated runner with tool loop, passing tests.
<execution_context> @/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md @/home/adelorenzo/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/02-agent-features/02-CONTEXT.md @.planning/phases/02-agent-features/02-RESEARCH.md @.planning/phases/02-agent-features/02-01-SUMMARY.md @packages/orchestrator/orchestrator/agents/runner.py @packages/orchestrator/orchestrator/tasks.py @packages/shared/shared/models/tenant.py @packages/shared/shared/rls.py @packages/shared/shared/db.py @migrations/versions/002_phase2_memory.py
Task 1: Audit model, KB model, migration, and audit logger with tests
packages/shared/shared/models/audit.py,
packages/shared/shared/models/kb.py,
packages/orchestrator/orchestrator/audit/__init__.py,
packages/orchestrator/orchestrator/audit/logger.py,
migrations/versions/003_phase2_audit_kb.py,
tests/integration/test_audit.py
- AuditEvent has id, tenant_id, agent_id, user_id, action_type, input_summary, output_summary, latency_ms, metadata (JSONB), created_at
- AuditLogger.log_tool_call writes a row to audit_events with action_type='tool_invocation'
- AuditLogger.log_llm_call writes a row with action_type='llm_call' including latency_ms
- AuditLogger.log_escalation writes a row with action_type='escalation'
- audit_events table rejects UPDATE and DELETE from konstruct_app role
- audit_events are tenant-scoped via RLS
- KBChunk model has id, tenant_id, document_id, content, embedding (Vector(384)), chunk_index, created_at
- Migration creates both audit_events and kb tables with appropriate indexes and RLS
1. Create `packages/shared/shared/models/audit.py`:
- AuditEvent: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID), user_id (TEXT), action_type (TEXT NOT NULL -- 'llm_call' | 'tool_invocation' | 'escalation'), input_summary (TEXT), output_summary (TEXT), latency_ms (INTEGER), metadata (JSONB, default={}), created_at (TIMESTAMPTZ, server_default=now())
- RLS enabled + forced, same pattern as other tenant-scoped tables
2. Create `packages/shared/shared/models/kb.py`:
- KnowledgeBaseDocument: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID NOT NULL), filename (TEXT), source_url (TEXT), content_type (TEXT), created_at
- KBChunk: id (UUID PK), tenant_id (UUID NOT NULL), document_id (UUID FK), content (TEXT NOT NULL), embedding (Vector(384) NOT NULL), chunk_index (INTEGER), created_at
- RLS on both tables
3. Create Alembic migration `003_phase2_audit_kb.py`:
- audit_events table with all columns, index on (tenant_id, created_at DESC), RLS
- REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app -- immutability enforced at DB level
- kb_documents and kb_chunks tables, HNSW index on kb_chunks embedding, RLS
- GRANT SELECT, INSERT on audit_events TO konstruct_app
- GRANT SELECT, INSERT, UPDATE, DELETE on kb_documents and kb_chunks TO konstruct_app
4. Create `packages/orchestrator/orchestrator/audit/logger.py`:
- AuditLogger class initialized with async session factory
- async log_llm_call(tenant_id, agent_id, user_id, input_summary, output_summary, latency_ms, metadata={})
- async log_tool_call(tool_name, args, result, tenant_id, agent_id, latency_ms, error=None)
- async log_escalation(tenant_id, agent_id, user_id, trigger_reason, metadata={})
- All methods write to audit_events table with RLS context set
5. Write integration tests (test_audit.py):
- Test that audit events are written to DB with correct fields
- Test that UPDATE/DELETE is rejected (expect error)
- Test RLS isolation between tenants
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/integration/test_audit.py -x -v
- AuditEvent and KB ORM models exist with correct schema
- Audit events written to DB for LLM calls, tool invocations, and escalations
- audit_events immutability enforced (UPDATE/DELETE rejected at DB level)
- RLS isolates audit data per tenant
- Migration applies cleanly with both audit and KB tables
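The DB-level immutability and RLS pieces of the migration might look like the fragment below. The GUC name in the policy (`app.current_tenant`) is an assumption; reuse whatever the existing RLS policies from `002_phase2_memory.py` use:

```python
# Fragment of migrations/versions/003_phase2_audit_kb.py -- audit_events only.
# The 'app.current_tenant' setting name is an assumption; match the project's
# existing RLS policies.
from alembic import op


def upgrade() -> None:
    op.execute("ALTER TABLE audit_events ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE audit_events FORCE ROW LEVEL SECURITY")
    op.execute(
        """
        CREATE POLICY audit_events_tenant_isolation ON audit_events
        USING (tenant_id = current_setting('app.current_tenant')::uuid)
        """
    )
    # Immutability: the app role may append and read, never rewrite history.
    op.execute("REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app")
    op.execute("GRANT SELECT, INSERT ON audit_events TO konstruct_app")
```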
Task 2: Tool registry, executor, and 4 built-in tools with tests
packages/orchestrator/orchestrator/tools/__init__.py,
packages/orchestrator/orchestrator/tools/registry.py,
packages/orchestrator/orchestrator/tools/executor.py,
packages/orchestrator/orchestrator/tools/builtins/__init__.py,
packages/orchestrator/orchestrator/tools/builtins/web_search.py,
packages/orchestrator/orchestrator/tools/builtins/kb_search.py,
packages/orchestrator/orchestrator/tools/builtins/http_request.py,
packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py,
tests/unit/test_tool_registry.py,
tests/unit/test_tool_executor.py
- ToolDefinition has name, description, parameters (JSON Schema), requires_confirmation, handler
- BUILTIN_TOOLS contains 4 tools: web_search, kb_search, http_request, calendar_lookup
- get_tools_for_agent filters BUILTIN_TOOLS by agent's configured tool list
- execute_tool validates args against tool's JSON schema before calling handler
- execute_tool with invalid args returns error string and logs the failure
- execute_tool with unknown tool name raises ValueError
- execute_tool with requires_confirmation=True returns a confirmation request instead of executing
- web_search tool calls Brave Search API and returns structured results
- kb_search tool queries pgvector knowledge base (kb_chunks table)
- http_request tool makes outbound HTTP with timeout (30s), size cap (1MB), allowed methods (GET/POST/PUT/DELETE)
- calendar_lookup tool queries Google Calendar events.list for availability
1. Create `packages/orchestrator/orchestrator/tools/registry.py`:
- ToolDefinition Pydantic model: name, description, parameters (dict -- JSON Schema), requires_confirmation (bool, default False), handler (Any, excluded from serialization)
- BUILTIN_TOOLS: dict[str, ToolDefinition] with 4 tools
- get_tools_for_agent(agent: Agent) -> dict[str, ToolDefinition]: filters by agent.tools list
- to_litellm_format(tools: dict) -> list[dict]: converts to OpenAI function-calling schema for LiteLLM
2. Create `packages/orchestrator/orchestrator/tools/executor.py`:
- async execute_tool(tool_call: dict, registry: dict, tenant_id, agent_id, audit_logger) -> str
- Validates args via jsonschema.validate() BEFORE calling handler (LLM output is untrusted)
- If requires_confirmation is True, return a confirmation message string instead of executing
- Logs every invocation (success or failure) to audit trail
- Install jsonschema: `uv add jsonschema` in orchestrator package
3. Create 4 built-in tool handlers in `tools/builtins/`:
- web_search.py: async web_search(query: str) -> str. Uses Brave Search API via httpx. Env var: BRAVE_API_KEY. Returns top 3 results formatted as text.
- kb_search.py: async kb_search(query: str, tenant_id: str, agent_id: str) -> str. Embeds query, searches kb_chunks via pgvector. Returns top 3 matching chunks as text.
- http_request.py: async http_request(url: str, method: str = "GET", body: str | None = None) -> str. Timeout 30s, response size cap 1MB, allowed methods GET/POST/PUT/DELETE. requires_confirmation=True.
- calendar_lookup.py: async calendar_lookup(date: str, calendar_id: str = "primary") -> str. Uses google-api-python-client events.list(). Requires GOOGLE_SERVICE_ACCOUNT_KEY env var or per-tenant OAuth. Returns formatted availability. requires_confirmation=False (read-only).
4. Write unit tests:
- test_tool_registry.py: test tool lookup, filtering by agent, LiteLLM format conversion
- test_tool_executor.py: test schema validation (valid args pass, invalid rejected), confirmation flow, unknown tool error, audit logging called (mock audit_logger)
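The registry pieces from step 1 can be sketched as follows; only one tool is shown, and the example schema contents are illustrative:

```python
# Sketch of tools/registry.py: ToolDefinition plus the OpenAI
# function-calling conversion that LiteLLM accepts.
from typing import Any

from pydantic import BaseModel, Field


class ToolDefinition(BaseModel):
    name: str
    description: str
    parameters: dict  # JSON Schema for the tool's arguments
    requires_confirmation: bool = False
    handler: Any = Field(default=None, exclude=True)  # never serialized


BUILTIN_TOOLS: dict[str, ToolDefinition] = {
    "web_search": ToolDefinition(
        name="web_search",
        description="Search the web and return the top results.",
        parameters={
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    ),
    # kb_search, http_request, calendar_lookup registered the same way
}


def to_litellm_format(tools: dict[str, ToolDefinition]) -> list[dict]:
    """Convert registry entries to the OpenAI tools schema."""
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description,
                "parameters": t.parameters,
            },
        }
        for t in tools.values()
    ]
```

Excluding `handler` from serialization matters: the registry gets converted to JSON for the LLM request, and a Python callable must never leak into that payload.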
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py -x -v
- 4 built-in tools registered with JSON Schema definitions
- Tool executor validates args and rejects invalid input
- Confirmation-required tools return confirmation message instead of executing
- Tool registry converts to LiteLLM function-calling format
- All unit tests pass
Task 3: Wire tool-call loop into agent runner and orchestrator pipeline
packages/orchestrator/orchestrator/agents/runner.py,
packages/orchestrator/orchestrator/tasks.py
1. Update `runner.py` -- implement tool-call loop:
- After LLM response, check if response contains `tool_calls` array (LiteLLM returns this in OpenAI format)
- If tool_calls present: for each tool call, dispatch to execute_tool()
- If tool requires confirmation: stop the loop, return the confirmation message to the user, store pending action in Redis (pending_tool_confirm_key)
- If tool executed: append tool result as a `tool` role message, re-call LLM with updated messages
- Loop until LLM returns plain text (no tool_calls) or max iterations reached (default: 5)
- Max iteration guard prevents runaway tool chains
- Pass AuditLogger instance through the loop for logging each LLM call and tool call
2. Update `tasks.py`:
- Initialize AuditLogger at task start with session factory
- Pass audit_logger and tool registry to run_agent
- Log initial LLM call and final response via audit_logger.log_llm_call()
- Handle pending tool confirmation: check pending_tool_confirm_key in Redis at start of handle_message. If pending, check if current message is a confirmation (yes/no). If yes, execute the pending tool and continue. If no, cancel and respond.
- Tool definitions are passed to LiteLLM via the `tools` parameter in the /complete request to llm-pool (the endpoint change itself is step 3 below)
3. Update llm-pool /complete endpoint:
- Accept optional `tools` parameter in request body
- Forward to litellm.acompletion(tools=tools) when present
- Return tool_calls in response when LLM produces them
CRITICAL: The tool loop happens inside the Celery task (sync context with asyncio.run). Each iteration of the loop is an async function call within the same asyncio.run() block. Do NOT dispatch separate Celery tasks for tool execution -- it all happens in one task invocation.
Seamless tool usage per user decision: The agent's system prompt should NOT include instructions like "announce when using tools." The tool results are injected as context and the LLM naturally incorporates them. The confirmation flow is the only user-visible tool interaction.
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py tests/integration/test_audit.py -x -v
- Agent runner supports multi-turn tool-call loop (reason -> tool -> observe -> respond)
- Tool calls are bounded at 5 iterations maximum
- Confirmation-required tools pause and await user response
- Every LLM call and tool invocation logged to audit trail
- llm-pool forwards tools parameter to LiteLLM
- Existing memory pipeline from Plan 01 still works (no regression)
- All Phase 1 + Plan 01 tests still pass: `pytest tests/ -x`
- Tool tests pass: `pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py -x`
- Audit integration tests pass: `pytest tests/integration/test_audit.py -x`
- Migration applies cleanly: `alembic upgrade head`
<success_criteria>
- Agent can invoke tools during conversation and incorporate results naturally
- Tool arguments are validated against JSON Schema before execution
- Confirmation-required tools pause for user approval
- Every agent action is recorded in immutable, tenant-scoped audit trail
- Audit entries cannot be modified or deleted at the database level </success_criteria>