fix(02-agent-features): revise plans based on checker feedback

This commit is contained in:
2026-03-23 14:32:20 -06:00
parent 7da5ffb92a
commit b2e86f1046
4 changed files with 319 additions and 68 deletions


@@ -97,45 +97,28 @@ Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, u
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Audit model, KB model, migration, and audit logger with tests</name>
<files>
packages/shared/shared/models/audit.py,
packages/shared/shared/models/kb.py,
packages/orchestrator/orchestrator/audit/__init__.py,
packages/orchestrator/orchestrator/audit/logger.py,
migrations/versions/003_phase2_audit_kb.py,
tests/integration/test_audit.py
</files>
<behavior>
- AuditEvent has id, tenant_id, agent_id, user_id, action_type, input_summary, output_summary, latency_ms, metadata (JSONB), created_at
- AuditLogger.log_tool_call writes a row to audit_events with action_type='tool_invocation'
- AuditLogger.log_llm_call writes a row with action_type='llm_call' including latency_ms
- AuditLogger.log_escalation writes a row with action_type='escalation'
- audit_events table rejects UPDATE and DELETE from konstruct_app role
- audit_events are tenant-scoped via RLS
- KBChunk model has id, tenant_id, document_id, content, embedding (Vector(384)), chunk_index, created_at
- Migration creates both audit_events and kb tables with appropriate indexes and RLS
</behavior>
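The logger surface described above can be sketched as follows. This is a minimal sketch only: the injected async `write_event` callable stands in for the real RLS-scoped INSERT into `audit_events`, and the helper name `_log` is illustrative; the three public `log_*` methods mirror the behavior list.

```python
import uuid
from typing import Any, Awaitable, Callable

# Sketch only: write_event stands in for an RLS-scoped INSERT into audit_events.
WriteEvent = Callable[[dict[str, Any]], Awaitable[None]]

class AuditLogger:
    def __init__(self, write_event: WriteEvent) -> None:
        self._write_event = write_event

    async def _log(self, action_type: str, **fields: Any) -> None:
        # Every audit row gets a fresh UUID and a fixed action_type.
        await self._write_event({"id": str(uuid.uuid4()),
                                 "action_type": action_type, **fields})

    async def log_tool_call(self, tenant_id, agent_id, user_id, tool_name,
                            args_summary, result_summary, latency_ms, metadata=None):
        await self._log("tool_invocation", tenant_id=tenant_id, agent_id=agent_id,
                        user_id=user_id, input_summary=f"{tool_name}({args_summary})",
                        output_summary=result_summary, latency_ms=latency_ms,
                        metadata=metadata or {})

    async def log_llm_call(self, tenant_id, agent_id, user_id, input_summary,
                           output_summary, latency_ms, metadata=None):
        await self._log("llm_call", tenant_id=tenant_id, agent_id=agent_id,
                        user_id=user_id, input_summary=input_summary,
                        output_summary=output_summary, latency_ms=latency_ms,
                        metadata=metadata or {})

    async def log_escalation(self, tenant_id, agent_id, user_id,
                             trigger_reason, metadata=None):
        await self._log("escalation", tenant_id=tenant_id, agent_id=agent_id,
                        user_id=user_id, input_summary=trigger_reason,
                        output_summary=None, latency_ms=None,
                        metadata=metadata or {})
```

Injecting the writer keeps the logger trivially unit-testable without a database.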
<action>
1. Create `packages/shared/shared/models/audit.py`:
- AuditEvent: id (UUID PK), tenant_id (UUID NOT NULL), agent_id (UUID), user_id (TEXT), action_type (TEXT NOT NULL -- 'llm_call' | 'tool_invocation' | 'escalation'), input_summary (TEXT), output_summary (TEXT), latency_ms (INTEGER), metadata (JSONB, default={}), created_at (TIMESTAMPTZ, server_default=now())
- RLS enabled + forced, same pattern as other tenant-scoped tables
2. Create `packages/shared/shared/models/kb.py`:
@@ -145,7 +128,7 @@ Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, u
3. Create Alembic migration `003_phase2_audit_kb.py`:
- audit_events table with all columns, index on (tenant_id, created_at DESC), RLS
- REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app -- immutability enforced at DB level
- kb_documents and kb_chunks tables, HNSW index on kb_chunks embedding, RLS
- GRANT SELECT, INSERT on audit_events TO konstruct_app
- GRANT SELECT, INSERT, UPDATE, DELETE on kb_documents and kb_chunks TO konstruct_app
@@ -157,53 +140,94 @@ Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, u
- async log_escalation(tenant_id, agent_id, user_id, trigger_reason, metadata={})
- All methods write to audit_events table with RLS context set
5. Write integration tests (test_audit.py):
- Test that audit events are written to DB with correct fields
- Test that UPDATE/DELETE is rejected (expect error)
- Test RLS isolation between tenants
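The immutability step in item 3 can be sketched as a small helper the migration's `upgrade()` would call. This assumes Alembic's `op` object (any object with an `execute(sql)` method works); the SQL text mirrors the GRANT/REVOKE bullets above, and the helper name is illustrative.

```python
# Sketch of the audit_events grant/revoke step for migration 003.
AUDIT_EVENTS_GRANTS = (
    "GRANT SELECT, INSERT ON audit_events TO konstruct_app",
    # Immutability enforced at the DB level: the app role can never rewrite history.
    "REVOKE UPDATE, DELETE ON audit_events FROM konstruct_app",
)

def apply_audit_grants(op) -> None:
    # `op` is Alembic's migration op in the real migration.
    for stmt in AUDIT_EVENTS_GRANTS:
        op.execute(stmt)
```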
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/integration/test_audit.py -x -v</automated>
</verify>
<done>
- AuditEvent and KB ORM models exist with correct schema
- Audit events written to DB for LLM calls, tool invocations, and escalations
- audit_events immutability enforced (UPDATE/DELETE rejected at DB level)
- RLS isolates audit data per tenant
- Migration applies cleanly with both audit and KB tables
</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Tool registry, executor, and 4 built-in tools with tests</name>
<files>
packages/orchestrator/orchestrator/tools/__init__.py,
packages/orchestrator/orchestrator/tools/registry.py,
packages/orchestrator/orchestrator/tools/executor.py,
packages/orchestrator/orchestrator/tools/builtins/__init__.py,
packages/orchestrator/orchestrator/tools/builtins/web_search.py,
packages/orchestrator/orchestrator/tools/builtins/kb_search.py,
packages/orchestrator/orchestrator/tools/builtins/http_request.py,
packages/orchestrator/orchestrator/tools/builtins/calendar_lookup.py,
tests/unit/test_tool_registry.py,
tests/unit/test_tool_executor.py
</files>
<behavior>
- ToolDefinition has name, description, parameters (JSON Schema), requires_confirmation, handler
- BUILTIN_TOOLS contains 4 tools: web_search, kb_search, http_request, calendar_lookup
- get_tools_for_agent filters BUILTIN_TOOLS by agent's configured tool list
- execute_tool validates args against tool's JSON schema before calling handler
- execute_tool with invalid args returns error string and logs the failure
- execute_tool with unknown tool name raises ValueError
- execute_tool with requires_confirmation=True returns a confirmation request instead of executing
- web_search tool calls Brave Search API and returns structured results
- kb_search tool queries pgvector knowledge base (kb_chunks table)
- http_request tool makes outbound HTTP with timeout (30s), size cap (1MB), allowed methods (GET/POST/PUT/DELETE)
- calendar_lookup tool queries Google Calendar events.list for availability
</behavior>
<action>
1. Create `packages/orchestrator/orchestrator/tools/registry.py`:
- ToolDefinition Pydantic model: name, description, parameters (dict -- JSON Schema), requires_confirmation (bool, default False), handler (Any, excluded from serialization)
- BUILTIN_TOOLS: dict[str, ToolDefinition] with 4 tools
- get_tools_for_agent(agent: Agent) -> dict[str, ToolDefinition]: filters by agent.tools list
- to_litellm_format(tools: dict) -> list[dict]: converts to OpenAI function-calling schema for LiteLLM
2. Create `packages/orchestrator/orchestrator/tools/executor.py`:
- async execute_tool(tool_call: dict, registry: dict, tenant_id, agent_id, audit_logger) -> str
- Validates args via jsonschema.validate() BEFORE calling handler (LLM output is untrusted)
- If requires_confirmation is True, return a confirmation message string instead of executing
- Logs every invocation (success or failure) to audit trail
- Install jsonschema: `uv add jsonschema` in orchestrator package
3. Create 4 built-in tool handlers in `tools/builtins/`:
- web_search.py: async web_search(query: str) -> str. Uses Brave Search API via httpx. Env var: BRAVE_API_KEY. Returns top 3 results formatted as text.
- kb_search.py: async kb_search(query: str, tenant_id: str, agent_id: str) -> str. Embeds query, searches kb_chunks via pgvector. Returns top 3 matching chunks as text.
- http_request.py: async http_request(url: str, method: str = "GET", body: str | None = None) -> str. Timeout 30s, response size cap 1MB, allowed methods GET/POST/PUT/DELETE. requires_confirmation=True.
- calendar_lookup.py: async calendar_lookup(date: str, calendar_id: str = "primary") -> str. Uses google-api-python-client events.list(). Requires GOOGLE_SERVICE_ACCOUNT_KEY env var or per-tenant OAuth. Returns formatted availability. requires_confirmation=False (read-only).
4. Write unit tests:
- test_tool_registry.py: test tool lookup, filtering by agent, LiteLLM format conversion
- test_tool_executor.py: test schema validation (valid args pass, invalid rejected), confirmation flow, unknown tool error, audit logging called (mock audit_logger)
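The executor's control flow (item 2) can be sketched as follows. Tools here are plain dicts and only `required` keys are checked, so the sketch stays self-contained; the real executor also takes tenant_id/agent_id/audit_logger and validates the full JSON Schema with `jsonschema.validate()` before dispatch, as the plan specifies.

```python
import json
from typing import Any

async def execute_tool(tool_call: dict[str, Any],
                       registry: dict[str, dict[str, Any]]) -> str:
    name = tool_call["function"]["name"]
    if name not in registry:
        raise ValueError(f"Unknown tool: {name}")  # unknown tool is a hard error
    tool = registry[name]
    try:
        args = json.loads(tool_call["function"]["arguments"])
    except json.JSONDecodeError:
        return f"Error: arguments for {name} are not valid JSON"
    # Minimal stand-in for jsonschema.validate(args, tool["parameters"]).
    missing = [k for k in tool["parameters"].get("required", []) if k not in args]
    if missing:
        return f"Error: invalid arguments for {name}: missing {missing}"
    if tool.get("requires_confirmation"):
        # Do NOT execute; surface a confirmation request to the user instead.
        return f"CONFIRMATION_REQUIRED: {name}({args})"
    return await tool["handler"](**args)
```

Returning error strings (rather than raising) for bad arguments lets the LLM see the failure and retry with corrected arguments.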
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct && python -m pytest tests/unit/test_tool_registry.py tests/unit/test_tool_executor.py -x -v</automated>
</verify>
<done>
- 4 built-in tools registered with JSON Schema definitions
- Tool executor validates args and rejects invalid input
- Confirmation-required tools return confirmation message instead of executing
- Tool registry converts to LiteLLM function-calling format
- All unit tests pass
</done>
</task>
<task type="auto">
<name>Task 3: Wire tool-call loop into agent runner and orchestrator pipeline</name>
<files>
packages/orchestrator/orchestrator/agents/runner.py,
packages/orchestrator/orchestrator/tasks.py
</files>
<action>
1. Update `runner.py` -- implement tool-call loop:
- After LLM response, check if response contains `tool_calls` array (LiteLLM returns this in OpenAI format)
- If tool_calls present: for each tool call, dispatch to execute_tool()
- If tool requires confirmation: stop the loop, return the confirmation message to the user, store pending action in Redis (pending_tool_confirm_key)
@@ -224,7 +248,7 @@ Output: Tool registry + executor, 4 builtin tools, audit logger, DB migration, u
- Forward to litellm.acompletion(tools=tools) when present
- Return tool_calls in response when LLM produces them
CRITICAL: The tool loop happens inside the Celery task (sync context with asyncio.run). Each iteration of the loop is an async function call within the same asyncio.run() block. Do NOT dispatch separate Celery tasks for tool execution -- it all happens in one task invocation.
Seamless tool usage per user decision: The agent's system prompt should NOT include instructions like "announce when using tools." The tool results are injected as context and the LLM naturally incorporates them. The confirmation flow is the only user-visible tool interaction.
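The loop above can be sketched as a single async function, which is what runs inside the one `asyncio.run()` block in the Celery task. `call_llm` and `run_tool` are illustrative stand-ins for `litellm.acompletion(tools=...)` and the tool executor, and the confirmation-string convention is an assumption of this sketch.

```python
from typing import Any, Awaitable, Callable

async def run_agent_turn(messages: list[dict[str, Any]],
                         call_llm: Callable[[list], Awaitable[dict]],
                         run_tool: Callable[[dict], Awaitable[str]],
                         max_iterations: int = 5) -> str:
    for _ in range(max_iterations):
        response = await call_llm(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response["content"]      # plain answer: loop ends
        messages.append(response)           # assistant turn carrying tool_calls
        for tc in tool_calls:
            result = await run_tool(tc)
            if result.startswith("CONFIRMATION_REQUIRED"):
                # Stop the loop and surface the confirmation to the user;
                # the pending action is stored in Redis by the caller.
                return result
            # Tool output is injected as context; the LLM incorporates it
            # on the next iteration without announcing tool usage.
            messages.append({"role": "tool",
                             "tool_call_id": tc["id"],
                             "content": result})
    return "Tool loop exceeded max iterations"
```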
</action>