docs(phase-2): complete phase execution
This commit is contained in:
@@ -80,7 +80,7 @@ Phases execute in numeric order: 1 → 2 → 3
|
|||||||
| Phase | Plans Complete | Status | Completed |
|
| Phase | Plans Complete | Status | Completed |
|
||||||
|-------|----------------|--------|-----------|
|
|-------|----------------|--------|-----------|
|
||||||
| 1. Foundation | 4/4 | Complete | 2026-03-23 |
|
| 1. Foundation | 4/4 | Complete | 2026-03-23 |
|
||||||
| 2. Agent Features | 6/6 | Complete | 2026-03-24 |
|
| 2. Agent Features | 6/6 | Complete | 2026-03-24 |
|
||||||
| 3. Operator Experience | 0/2 | Not started | - |
|
| 3. Operator Experience | 0/2 | Not started | - |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ milestone: v1.0
|
|||||||
milestone_name: milestone
|
milestone_name: milestone
|
||||||
status: executing
|
status: executing
|
||||||
stopped_at: Completed 02-agent-features/02-06-PLAN.md
|
stopped_at: Completed 02-agent-features/02-06-PLAN.md
|
||||||
last_updated: "2026-03-24T01:16:40.964Z"
|
last_updated: "2026-03-24T01:19:59.932Z"
|
||||||
last_activity: 2026-03-23 — Completed 02-05 multimodal media support and WhatsApp outbound routing
|
last_activity: 2026-03-23 — Completed 02-05 multimodal media support and WhatsApp outbound routing
|
||||||
progress:
|
progress:
|
||||||
total_phases: 3
|
total_phases: 3
|
||||||
|
|||||||
151
.planning/phases/02-agent-features/02-VERIFICATION.md
Normal file
151
.planning/phases/02-agent-features/02-VERIFICATION.md
Normal file
@@ -0,0 +1,151 @@
|
|||||||
|
---
|
||||||
|
phase: 02-agent-features
|
||||||
|
verified: 2026-03-24T01:18:24Z
|
||||||
|
status: human_needed
|
||||||
|
score: 5/5 success criteria verified
|
||||||
|
re_verification: true
|
||||||
|
previous_status: gaps_found
|
||||||
|
previous_score: 3/5
|
||||||
|
gaps_closed:
|
||||||
|
- "Escalation wiring: check_escalation_rules + escalate_to_human now imported at module level (line 71) and called in _process_message (lines 504, 514)"
|
||||||
|
- "WhatsApp outbound routing: _send_response now called at all four response delivery points (lines 355, 395, 438, 556); no direct _update_slack_placeholder calls remain in _process_message"
|
||||||
|
- "Tier-2 WhatsApp system prompt scoping: build_system_prompt appends 'You only handle: {topics}' when channel == 'whatsapp' and tool_assignments non-empty (builder.py line 187)"
|
||||||
|
gaps_remaining: []
|
||||||
|
regressions: []
|
||||||
|
human_verification:
|
||||||
|
- test: "Send a WhatsApp message to a configured tenant and check for reply delivery"
|
||||||
|
expected: "AI employee response appears in the WhatsApp conversation"
|
||||||
|
why_human: "Requires real Meta Cloud API webhook, phone_number_id, and WhatsApp Business account"
|
||||||
|
- test: "Trigger an escalation rule in Slack (send billing-related messages repeatedly) and check for DM"
|
||||||
|
expected: "Assigned human receives a Slack DM with conversation transcript; agent enters assistant mode"
|
||||||
|
why_human: "Requires live Slack workspace, bot token, configured escalation_assignee, and rule trigger"
|
||||||
|
- test: "Configure an agent with allowed_functions, send a borderline off-topic message via WhatsApp"
|
||||||
|
expected: "LLM system prompt contains 'You only handle: {topics}' constraint"
|
||||||
|
why_human: "Can verify injection statically but cannot verify LLM behavioural compliance without live inference"
|
||||||
|
---
|
||||||
|
|
||||||
|
# Phase 2: Agent Features Verification Report
|
||||||
|
|
||||||
|
**Phase Goal:** The AI employee maintains conversation memory, can execute tools, handles WhatsApp messages, and escalates to humans when rules trigger — making it a capable product rather than a demo
|
||||||
|
**Verified:** 2026-03-24T01:18:24Z
|
||||||
|
**Status:** human_needed
|
||||||
|
**Re-verification:** Yes — after gap closure (Plan 02-06)
|
||||||
|
|
||||||
|
## Goal Achievement
|
||||||
|
|
||||||
|
### Observable Truths (from ROADMAP.md Success Criteria)
|
||||||
|
|
||||||
|
| # | Truth | Status | Evidence |
|
||||||
|
|---|-------|--------|----------|
|
||||||
|
| 1 | Agent remembers context from earlier in the same conversation (30+ turns without degradation) | VERIFIED | Redis sliding window (RPUSH/LTRIM, 20-msg default) in `short_term.py`; pgvector HNSW retrieval in `long_term.py`; both wired in `_process_message` via `get_recent_messages` + `retrieve_relevant` + `build_messages_with_memory`. No regression. |
|
||||||
|
| 2 | A user can send a WhatsApp message to the AI employee and receive a reply (per-tenant phone isolation + Meta 2026 scoping) | VERIFIED | Inbound pipeline complete. Outbound routing now wired: `_send_response` called at lines 355, 395, 438, 556 in `_process_message`; `_update_slack_placeholder` only called inside `_send_response` (line 722). `handle_message` pops `phone_number_id` and `bot_token` before `model_validate` (lines 223-224); `wa_id` extracted from `msg.sender.user_id` and injected into extras dict (lines 234-244). |
|
||||||
|
| 3 | Agent can invoke a registered tool and incorporate the result into its response | VERIFIED | Tool registry with 4 built-ins, `execute_tool` with JSON Schema validation, multi-turn loop (max 5 iterations) all wired in `runner.py`; `_process_message` builds `tool_registry` and passes to `run_agent`. No regression. |
|
||||||
|
| 4 | When escalation rule triggers, conversation and full context are handed off to human with no information lost | VERIFIED | `check_escalation_rules` and `escalate_to_human` imported at module level (line 71); pre-check at lines 386-396 (Redis `escalation_status_key` check before LLM call); post-check at lines 502-528 (`check_escalation_rules` called after `run_agent`, `escalate_to_human` called when rule matches and `escalation_assignee` is set). `_build_conversation_metadata` helper provides billing-keyword metadata. |
|
||||||
|
| 5 | Every LLM call, tool invocation, and handoff event is recorded in an immutable audit trail queryable by tenant | VERIFIED | `AuditLogger` initialized at line 375; `log_llm_call` per LLM iteration in `runner.py`; `log_tool_call` in `execute_tool`; `log_escalation` called inside `escalate_to_human`. `audit_events` table has `REVOKE UPDATE, DELETE`. No regression. |
|
||||||
|
|
||||||
|
**Score: 5/5 truths verified**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Required Artifacts
|
||||||
|
|
||||||
|
| Artifact | Expected | Status | Details |
|
||||||
|
|----------|----------|--------|---------|
|
||||||
|
| `packages/orchestrator/orchestrator/memory/short_term.py` | Redis sliding window | VERIFIED | No change from initial verification |
|
||||||
|
| `packages/orchestrator/orchestrator/memory/long_term.py` | pgvector HNSW retrieval | VERIFIED | No change from initial verification |
|
||||||
|
| `packages/orchestrator/orchestrator/tools/registry.py` | `ToolDefinition` + `BUILTIN_TOOLS` | VERIFIED | No change from initial verification |
|
||||||
|
| `packages/orchestrator/orchestrator/tools/executor.py` | Schema-validated tool execution | VERIFIED | No change from initial verification |
|
||||||
|
| `packages/orchestrator/orchestrator/audit/logger.py` | Immutable audit event writer | VERIFIED | No change from initial verification |
|
||||||
|
| `packages/orchestrator/orchestrator/escalation/handler.py` | Escalation rule evaluation + DM delivery | VERIFIED (was ORPHANED) | Now called from `_process_message` pre-check and post-check |
|
||||||
|
| `packages/orchestrator/orchestrator/agents/builder.py` | `build_system_prompt` with WhatsApp tier-2 scoping | VERIFIED (was MISSING) | `channel` parameter added to `build_system_prompt`, `build_messages_with_memory`, `build_messages_with_media`; scoping appended at line 187 |
|
||||||
|
| `packages/orchestrator/orchestrator/tasks.py` | Escalation wiring + channel-aware outbound routing | VERIFIED (was BROKEN) | `check_escalation_rules` + `escalate_to_human` at module-level import and called in pipeline; `_send_response` used at all delivery points; `handle_message` pops WhatsApp extras |
|
||||||
|
| `packages/gateway/gateway/channels/whatsapp.py` | WhatsApp webhook handler + outbound | VERIFIED | No change from initial verification |
|
||||||
|
| `tests/unit/test_pipeline_wiring.py` | 26 tests covering all three gap fixes | VERIFIED | File exists (773 lines), 26 test functions confirmed |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Key Link Verification
|
||||||
|
|
||||||
|
| From | To | Via | Status | Details |
|
||||||
|
|------|----|-----|--------|---------|
|
||||||
|
| `tasks.py` | `memory/short_term.py` | `get_recent_messages` + `append_message` in `_process_message` | WIRED | Lines 451, 566, 567. No regression. |
|
||||||
|
| `agents/builder.py` | `memory/long_term.py` | `retrieve_relevant` + `build_messages_with_memory` | WIRED | Lines 462, 477. No regression. |
|
||||||
|
| `tasks.py` | `embed_and_store` Celery task | `embed_and_store.delay()` after response | WIRED | Line 576. No regression. |
|
||||||
|
| `agents/runner.py` | `tools/executor.py` | Tool-call loop | WIRED | No change. |
|
||||||
|
| `tasks.py` | `audit/logger.py` | `AuditLogger` passed to `run_agent` | WIRED | Line 375. No regression. |
|
||||||
|
| `tasks.py` | `escalation/handler.py` | `check_escalation_rules` + `escalate_to_human` in `_process_message` | WIRED (was NOT WIRED) | Module-level import line 71; pre-check lines 386-396; post-check lines 504-528 |
|
||||||
|
| `tasks.py` | `_send_response` | Called at all response delivery points in `_process_message` | WIRED (was NOT WIRED) | Lines 355, 395, 438, 556. `_update_slack_placeholder` only inside `_send_response` (line 722). |
|
||||||
|
| `agents/builder.py` | `Agent.tool_assignments` | `build_system_prompt(agent, channel="whatsapp")` appends scoping | WIRED (was MISSING) | Line 183-190: iterates `tool_assignments`, appends "You only handle" clause |
|
||||||
|
| `tasks.py` | `build_messages_with_memory` | Passes `str(msg.channel)` as `channel` parameter | WIRED (was MISSING) | Line 482 |
|
||||||
|
| `whatsapp.py` | `normalize.py` | `normalize_whatsapp_event` called after HMAC verification | WIRED | No change. |
|
||||||
|
| `whatsapp.py` | `handle_message.delay` | Dispatched after normalization with extras | WIRED | No change. |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Requirements Coverage
|
||||||
|
|
||||||
|
| Requirement | Source Plan | Description | Status | Evidence |
|
||||||
|
|-------------|------------|-------------|--------|----------|
|
||||||
|
| AGNT-02 | 02-01 | Agent maintains conversational memory within sessions (sliding window) | SATISFIED | Redis sliding window fully wired; no regression |
|
||||||
|
| AGNT-03 | 02-01 | Agent retrieves relevant past context via vector search | SATISFIED | pgvector retrieval wired; no regression |
|
||||||
|
| AGNT-04 | 02-02 | Agent can invoke registered tools | SATISFIED | 4 built-in tools, multi-turn loop wired; no regression |
|
||||||
|
| AGNT-05 | 02-04, 02-06 | Agent escalates to human when configured rules trigger | SATISFIED | Pre-check + post-check now wired; `escalate_to_human` called when rule matches and assignee configured |
|
||||||
|
| AGNT-06 | 02-02, 02-06 | Every agent action logged in audit trail | SATISFIED | LLM calls, tool calls, and escalation events all logged; `audit_events` immutable |
|
||||||
|
| CHAN-03 | 02-03, 02-05, 02-06 | User can interact via WhatsApp Business Cloud API | SATISFIED | Inbound fully wired; outbound now routes via `_send_response` → `send_whatsapp_message`; `handle_message` pops WhatsApp extras correctly |
|
||||||
|
| CHAN-04 | 02-03, 02-06 | WhatsApp adapter enforces business-function scoping per Meta 2026 policy | SATISFIED | Tier-1 (keyword gate + canned reply in gateway) verified previously; Tier-2 (system prompt scoping) now implemented in `builder.py` line 182-190 |
|
||||||
|
|
||||||
|
All 7 required requirements satisfied. No orphaned requirements.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Anti-Patterns Found
|
||||||
|
|
||||||
|
| File | Line | Pattern | Severity | Impact |
|
||||||
|
|------|------|---------|----------|--------|
|
||||||
|
| `packages/orchestrator/orchestrator/tasks.py` | 677 | `_execute_pending_tool` returns stub: "Full tool execution will be implemented in Phase 3 with per-tenant OAuth." | Warning | Confirmed tool execution after user approval is deferred to Phase 3 — this is an acknowledged deviation, not a regression |
|
||||||
|
|
||||||
|
No new anti-patterns introduced by Plan 02-06.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Human Verification Required
|
||||||
|
|
||||||
|
#### 1. WhatsApp End-to-End Delivery
|
||||||
|
|
||||||
|
**Test:** Configure a WhatsApp-connected tenant, send a WhatsApp message, wait for LLM response.
|
||||||
|
**Expected:** The AI employee's reply appears in the WhatsApp conversation thread.
|
||||||
|
**Why human:** Requires real Meta Cloud API credentials, a registered phone_number_id, and live webhook traffic. Static analysis confirms the outbound path is now wired (`_send_response` calls `send_whatsapp_message` with correct parameters), but delivery cannot be verified without live infrastructure.
|
||||||
|
|
||||||
|
#### 2. Escalation DM Delivery
|
||||||
|
|
||||||
|
**Test:** Configure an agent with `escalation_assignee` (Slack user ID) and a billing escalation rule. Send multiple messages containing billing keywords (e.g., "billing", "invoice", "refund") to trigger the rule.
|
||||||
|
**Expected:** The configured Slack user receives a DM with the full conversation transcript. Subsequent messages receive the assistant-mode reply without LLM processing.
|
||||||
|
**Why human:** Requires a live Slack workspace, valid bot token, valid `escalation_assignee` user ID, and triggering the keyword threshold. The pre-check and post-check wiring is verified in code and unit tests, but end-to-end delivery requires the Slack API.
|
||||||
|
|
||||||
|
#### 3. WhatsApp Business-Function Scoping (Tier 2) — Behavioural Compliance
|
||||||
|
|
||||||
|
**Test:** Configure an agent with `tool_assignments = ["customer support", "billing inquiries"]`. Send a borderline off-topic message via WhatsApp (e.g., "Can you help me write a poem?").
|
||||||
|
**Expected:** The LLM system prompt contains "You only handle: customer support, billing inquiries" and the response redirects the user to allowed topics.
|
||||||
|
**Why human:** The system prompt injection is statically verified (`build_system_prompt` appends the clause at line 187 when `channel == "whatsapp"` and `tool_assignments` is non-empty). LLM behavioural compliance with that constraint requires a live inference call.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Gaps Summary
|
||||||
|
|
||||||
|
All three gaps from the initial verification are confirmed closed in the actual codebase:
|
||||||
|
|
||||||
|
1. **Escalation wiring (AGNT-05):** `check_escalation_rules` and `escalate_to_human` are imported at module level (line 71) and called from `_process_message`. Pre-check gates already-escalated conversations at lines 386-396. Post-check evaluates rules after `run_agent` at lines 504-528. `_build_conversation_metadata` provides billing-keyword metadata.
|
||||||
|
|
||||||
|
2. **WhatsApp outbound routing (CHAN-03):** `_send_response` is called at all four response delivery points (lines 355, 395, 438, 556). No direct `_update_slack_placeholder` calls remain in `_process_message` — the only call is inside `_send_response` itself (line 722). `handle_message` pops `phone_number_id` and `bot_token` before `model_validate` and injects `wa_id` into the extras dict.
|
||||||
|
|
||||||
|
3. **Tier-2 WhatsApp system prompt scoping (CHAN-04):** `build_system_prompt` accepts a `channel` parameter and appends the business-function constraint at line 182-190. `build_messages_with_memory` and `build_messages_with_media` pass `channel` through. `_process_message` passes `str(msg.channel)` at line 482.
|
||||||
|
|
||||||
|
No regressions detected in previously-verified truths (memory pipeline, tool execution, audit logging).
|
||||||
|
|
||||||
|
Remaining open items are behavioural and require human verification with live infrastructure.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
_Verified: 2026-03-24T01:18:24Z_
|
||||||
|
_Verifier: Claude (gsd-verifier)_
|
||||||
|
_Re-verification: Yes — after Plan 02-06 gap closure_
|
||||||
Reference in New Issue
Block a user