| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 02-agent-features | 02 | execute | 2 | | | true | | |
Purpose: Gives the AI employee the ability to take actions (search, look up info, make requests) and creates the compliance-ready audit trail for all agent activity.
Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, updated runner with tool loop, passing tests.
<execution_context> @/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md @/home/adelorenzo/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/02-agent-features/02-CONTEXT.md @.planning/phases/02-agent-features/02-RESEARCH.md @.planning/phases/02-agent-features/02-01-SUMMARY.md @packages/orchestrator/orchestrator/agents/runner.py @packages/orchestrator/orchestrator/tasks.py @packages/shared/shared/models/tenant.py @packages/shared/shared/rls.py @packages/shared/shared/db.py @migrations/versions/002_phase2_memory.py
Task 1: Audit model, KB model, migration, and audit logger with tests
packages/shared/shared/models/audit.py,
packages/shared/shared/models/kb.py,
packages/orchestrator/orchestrator/audit/__init__.py,
packages/orchestrator/orchestrator/audit/logger.py,
migrations/versions/003_phase2_audit_kb.py,
tests/integration/test_audit.py
- AuditEvent has id, tenant_id, agent_id, user_id, action_type, input_summary, output_summary, latency_ms, metadata (JSONB), created_at
- AuditLogger.log_tool_call writes a row to audit_events with action_type='tool_invocation'
- AuditLogger.log_llm_call writes a row with action_type='llm_call' including latency_ms
- AuditLogger.log_escalation writes a row with action_type='escalation'
- audit_events table rejects UPDATE and DELETE from konstruct_app role
- audit_events are tenant-scoped via RLS
- KBChunk model has id, tenant_id, document_id, content, embedding (Vector(384)), chunk_index, created_at
- Migration creates both audit_events and kb tables with appropriate indexes and RLS
1. Create `packages/shared/shared/models/audit.py`:
- AuditEvent: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID), user_id (TEXT), action_type (TEXT NOT NULL -- 'llm_call' | 'tool_invocation' | 'escalation'), input_summary (TEXT), output_summary (TEXT), latency_ms (INTEGER), metadata (JSONB, default={}), created_at (TIMESTAMPTZ, server_default=now())
- RLS enabled + forced, same pattern as other tenant-scoped tables
2. Create `packages/shared/shared/models/kb.py`:
- KnowledgeBaseDocument: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID NOT NULL), filename (TEXT), source_url (TEXT), content_type (TEXT), created_at
- KBChunk: id (UUID PK), tenant_id (UUID NOT NULL), document_id (UUID FK), content (TEXT NOT NULL), embedding (Vector(384) NOT NULL), chunk_index (INTEGER), created_at
- RLS on both tables
3. Create Alembic migration `003_phase2_audit_kb.py`:
- audit_events table with all columns, index on (tenant_id, created_at DESC), RLS
- REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app -- immutability enforced at DB level
- kb_documents and kb_chunks tables, HNSW index on kb_chunks embedding, RLS
- GRANT SELECT, INSERT on audit_events TO konstruct_app
- GRANT SELECT, INSERT, UPDATE, DELETE on kb_documents and kb_chunks TO konstruct_app
4. Create `packages/orchestrator/orchestrator/audit/logger.py`:
- AuditLogger class initialized with async session factory
- async log_llm_call(tenant_id, agent_id, user_id, input_summary, output_summary, latency_ms, metadata={})
- async log_tool_call(tool_name, args, result, tenant_id, agent_id, latency_ms, error=None)
- async log_escalation(tenant_id, agent_id, user_id, trigger_reason, metadata={})
- All methods write to audit_events table with RLS context set
5. Write integration tests (test_audit.py):
- Test that audit events are written to DB with correct fields
- Test that UPDATE/DELETE is rejected (expect error)
- Test RLS isolation between tenants
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/integration/test_audit.py -x -v
- AuditEvent and KB ORM models exist with correct schema
- Audit events written to DB for LLM calls, tool invocations, and escalations
- audit_events immutability enforced (UPDATE/DELETE rejected at DB level)
- RLS isolates audit data per tenant
- Migration applies cleanly with both audit and KB tables
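The DB-level immutability and RLS pieces of the migration might look like the fragment below. The GUC name in the policy (`app.current_tenant`) is an assumption; reuse whatever the existing RLS policies from `002_phase2_memory.py` use:

```python
# Fragment of migrations/versions/003_phase2_audit_kb.py -- audit_events only.
# The 'app.current_tenant' setting name is an assumption; match the project's
# existing RLS policies.
from alembic import op


def upgrade() -> None:
    op.execute("ALTER TABLE audit_events ENABLE ROW LEVEL SECURITY")
    op.execute("ALTER TABLE audit_events FORCE ROW LEVEL SECURITY")
    op.execute(
        """
        CREATE POLICY audit_events_tenant_isolation ON audit_events
        USING (tenant_id = current_setting('app.current_tenant')::uuid)
        """
    )
    # Immutability: the app role may append and read, never rewrite history.
    op.execute("REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app")
    op.execute("GRANT SELECT, INSERT ON audit_events TO konstruct_app")
```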
Task 2: Tool registry, executor, and 4 built-in tools with tests
packages/orchestrator/orchestrator/tools/__init__.py,
packages/orchestrator/orchestrator/tools/registry.py,
packages/orchestrator/orchestrator/tools/executor.py,
packages/orchestrator/orchestrator/tools/builtins/__init__.py,
packages/orchestrator/orchestrator/tools/builtins/web_search.py,
packages/orchestrator/orchestrator/tools/builtins/kb_search.py,
packages/orchestrator/orchestrator/tools/builtins/http_request.py,
packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py,
tests/unit/test_tool_registry.py,
tests/unit/test_tool_executor.py
- ToolDefinition has name, description, parameters (JSON Schema), requires_confirmation, handler
- BUILTIN_TOOLS contains 4 tools: web_search, kb_search, http_request, calendar_lookup
- get_tools_for_agent filters BUILTIN_TOOLS by agent's configured tool list
- execute_tool validates args against tool's JSON schema before calling handler
- execute_tool with invalid args returns error string and logs the failure
- execute_tool with unknown tool name raises ValueError
- execute_tool with requires_confirmation=True returns a confirmation request instead of executing
- web_search tool calls Brave Search API and returns structured results
- kb_search tool queries pgvector knowledge base (kb_chunks table)
- http_request tool makes outbound HTTP with timeout (30s), size cap (1MB), allowed methods (GET/POST/PUT/DELETE)
- calendar_lookup tool queries Google Calendar events.list for availability
1. Create `packages/orchestrator/orchestrator/tools/registry.py`:
- ToolDefinition Pydantic model: name, description, parameters (dict -- JSON Schema), requires_confirmation (bool, default False), handler (Any, excluded from serialization)
- BUILTIN_TOOLS: dict[str, ToolDefinition] with 4 tools
- get_tools_for_agent(agent: Agent) -> dict[str, ToolDefinition]: filters by agent.tools list
- to_litellm_format(tools: dict) -> list[dict]: converts to OpenAI function-calling schema for LiteLLM
2. Create `packages/orchestrator/orchestrator/tools/executor.py`:
- async execute_tool(tool_call: dict, registry: dict, tenant_id, agent_id, audit_logger) -> str
- Validates args via jsonschema.validate() BEFORE calling handler (LLM output is untrusted)
- If requires_confirmation is True, return a confirmation message string instead of executing
- Logs every invocation (success or failure) to audit trail
- Install jsonschema: `uv add jsonschema` in orchestrator package
3. Create 4 built-in tool handlers in `tools/builtins/`:
- web_search.py: async web_search(query: str) -> str. Uses Brave Search API via httpx. Env var: BRAVE_API_KEY. Returns top 3 results formatted as text.
- kb_search.py: async kb_search(query: str, tenant_id: str, agent_id: str) -> str. Embeds query, searches kb_chunks via pgvector. Returns top 3 matching chunks as text.
- http_request.py: async http_request(url: str, method: str = "GET", body: str | None = None) -> str. Timeout 30s, response size cap 1MB, allowed methods GET/POST/PUT/DELETE. requires_confirmation=True.
- calendar_lookup.py: async calendar_lookup(date: str, calendar_id: str = "primary") -> str. Uses google-api-python-client events.list(). Requires GOOGLE_SERVICE_ACCOUNT_KEY env var or per-tenant OAuth. Returns formatted availability. requires_confirmation=False (read-only).
4. Write unit tests:
- test_tool_registry.py: test tool lookup, filtering by agent, LiteLLM format conversion
- test_tool_executor.py: test schema validation (valid args pass, invalid rejected), confirmation flow, unknown tool error, audit logging called (mock audit_logger)
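The registry pieces from step 1 can be sketched as follows; only one tool is shown, and the example schema contents are illustrative:

```python
# Sketch of tools/registry.py: ToolDefinition plus the OpenAI
# function-calling conversion that LiteLLM accepts.
from typing import Any

from pydantic import BaseModel, Field


class ToolDefinition(BaseModel):
    name: str
    description: str
    parameters: dict  # JSON Schema for the tool's arguments
    requires_confirmation: bool = False
    handler: Any = Field(default=None, exclude=True)  # never serialized


BUILTIN_TOOLS: dict[str, ToolDefinition] = {
    "web_search": ToolDefinition(
        name="web_search",
        description="Search the web and return the top results.",
        parameters={
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    ),
    # kb_search, http_request, calendar_lookup registered the same way
}


def to_litellm_format(tools: dict[str, ToolDefinition]) -> list[dict]:
    """Convert registry entries to the OpenAI tools schema."""
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description,
                "parameters": t.parameters,
            },
        }
        for t in tools.values()
    ]
```

Excluding `handler` from serialization matters: the registry gets converted to JSON for the LLM request, and a Python callable must never leak into that payload.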
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py -x -v
- 4 built-in tools registered with JSON Schema definitions
- Tool executor validates args and rejects invalid input
- Confirmation-required tools return confirmation message instead of executing
- Tool registry converts to LiteLLM function-calling format
- All unit tests pass
Task 3: Wire tool-call loop into agent runner and orchestrator pipeline
packages/orchestrator/orchestrator/agents/runner.py,
packages/orchestrator/orchestrator/tasks.py
1. Update `runner.py` -- implement tool-call loop:
- After LLM response, check if response contains `tool_calls` array (LiteLLM returns this in OpenAI format)
- If tool_calls present: for each tool call, dispatch to execute_tool()
- If tool requires confirmation: stop the loop, return the confirmation message to the user, store pending action in Redis (pending_tool_confirm_key)
- If tool executed: append tool result as a `tool` role message, re-call LLM with updated messages
- Loop until LLM returns plain text (no tool_calls) or max iterations reached (default: 5)
- Max iteration guard prevents runaway tool chains
- Pass AuditLogger instance through the loop for logging each LLM call and tool call
2. Update `tasks.py`:
- Initialize AuditLogger at task start with session factory
- Pass audit_logger and tool registry to run_agent
- Log initial LLM call and final response via audit_logger.log_llm_call()
- Handle pending tool confirmation: check pending_tool_confirm_key in Redis at start of handle_message. If pending, check if current message is a confirmation (yes/no). If yes, execute the pending tool and continue. If no, cancel and respond.
- Tool definitions are passed to LiteLLM via the `tools` parameter in the /complete request to llm-pool (the endpoint change itself is step 3 below)
3. Update llm-pool /complete endpoint:
- Accept optional `tools` parameter in request body
- Forward to litellm.acompletion(tools=tools) when present
- Return tool_calls in response when LLM produces them
CRITICAL: The tool loop happens inside the Celery task (sync context with asyncio.run). Each iteration of the loop is an async function call within the same asyncio.run() block. Do NOT dispatch separate Celery tasks for tool execution -- it all happens in one task invocation.
Seamless tool usage per user decision: The agent's system prompt should NOT include instructions like "announce when using tools." The tool results are injected as context and the LLM naturally incorporates them. The confirmation flow is the only user-visible tool interaction.
cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py tests/integration/test_audit.py -x -v
- Agent runner supports multi-turn tool-call loop (reason -> tool -> observe -> respond)
- Tool calls are bounded at 5 iterations maximum
- Confirmation-required tools pause and await user response
- Every LLM call and tool invocation logged to audit trail
- llm-pool forwards tools parameter to LiteLLM
- Existing memory pipeline from Plan 01 still works (no regression)
- All Phase 1 + Plan 01 tests still pass: `pytest tests/ -x`
- Tool tests pass: `pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py -x`
- Audit integration tests pass: `pytest tests/integration/test_audit.py -x`
- Migration applies cleanly: `alembic upgrade head`
<success_criteria>
- Agent can invoke tools during conversation and incorporate results naturally
- Tool arguments are validated against JSON Schema before execution
- Confirmation-required tools pause for user approval
- Every agent action is recorded in immutable, tenant-scoped audit trail
- Audit entries cannot be modified or deleted at the database level </success_criteria>