docs(10): add research and validation strategy
82
.planning/phases/10-agent-capabilities/10-VALIDATION.md
Normal file
@@ -0,0 +1,82 @@
---
phase: 10
slug: agent-capabilities
status: draft
nyquist_compliant: false
wave_0_complete: false
created: 2026-03-26
---

# Phase 10 — Validation Strategy

> Per-phase validation contract for feedback sampling during execution.

---

## Test Infrastructure

| Property | Value |
|----------|-------|
| **Framework** | pytest 8.x + pytest-asyncio (existing) |
| **Config file** | `pyproject.toml` (existing) |
| **Quick run command** | `pytest tests/unit -x -q` |
| **Full suite command** | `pytest tests/ -x` |
| **Estimated runtime** | ~45 seconds |

---

## Sampling Rate

- **After every task commit:** Run `pytest tests/unit -x -q`
- **After every plan wave:** Run `pytest tests/ -x`
- **Before `/gsd:verify-work`:** Full suite must be green
- **Max feedback latency:** 45 seconds

---
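The sampling rules in the Sampling Rate section can be sketched as a small wrapper that runs the command for a given stage and reports whether it stayed inside the latency budget. This is a minimal sketch, not part of the project: the stage labels `task` and `wave`, the function name `run_sample`, and the choice to treat 45 seconds as an inclusive upper bound are all assumptions made for illustration. Only the two pytest commands and the 45-second figure come from this document.

```python
import subprocess
import time

# Budget taken from "Max feedback latency" above; treating it as an
# inclusive bound is an assumption.
MAX_FEEDBACK_SECONDS = 45.0

# Commands copied from the sampling rules; the keys are hypothetical labels.
SAMPLING_COMMANDS = {
    "task": ["pytest", "tests/unit", "-x", "-q"],
    "wave": ["pytest", "tests/", "-x"],
}


def run_sample(stage, runner=subprocess.run, clock=time.monotonic):
    """Run the command for a stage.

    Returns (passed, elapsed_seconds, within_budget). The runner and clock
    parameters exist only so the function can be exercised without pytest.
    """
    command = SAMPLING_COMMANDS[stage]
    start = clock()
    result = runner(command)
    elapsed = clock() - start
    passed = result.returncode == 0
    return passed, elapsed, elapsed <= MAX_FEEDBACK_SECONDS
```

Calling `run_sample("task")` after a task commit and `run_sample("wave")` after a plan wave would mirror the first two rules. A run that passes but exceeds the budget is reported separately from a failing run, so a slow suite can be noticed before it fails anything.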

## Per-Task Verification Map

| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
|---------|------|------|-------------|-----------|-------------------|-------------|--------|
| 10-xx | 01 | 1 | CAP-01 | unit | `pytest tests/unit/test_web_search.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 01 | 1 | CAP-02,03 | unit | `pytest tests/unit/test_kb_ingestion.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 01 | 1 | CAP-04 | unit | `pytest tests/unit/test_http_request.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 02 | 2 | CAP-05 | unit | `pytest tests/unit/test_calendar.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 02 | 2 | CAP-06 | unit | `pytest tests/unit/test_tool_output.py -x` | ❌ W0 | ⬜ pending |
| 10-xx | 03 | 2 | CAP-03 | build | `cd packages/portal && npx next build` | ✅ | ⬜ pending |
| 10-xx | 03 | 2 | CAP-07 | integration | `pytest tests/integration/test_audit.py -x` | ✅ extend | ⬜ pending |

---

## Wave 0 Requirements

- [ ] `tests/unit/test_web_search.py` — CAP-01: Brave Search API integration
- [ ] `tests/unit/test_kb_ingestion.py` — CAP-02,03: document chunking, embedding, search
- [ ] `tests/unit/test_http_request.py` — CAP-04: HTTP request tool validation
- [ ] `tests/unit/test_calendar.py` — CAP-05: Google Calendar OAuth + CRUD
- [ ] `tests/unit/test_tool_output.py` — CAP-06: natural language tool result formatting
- [ ] Install: `uv add pypdf python-docx python-pptx openpyxl pandas firecrawl-py youtube-transcript-api google-auth google-auth-oauthlib google-api-python-client`

---
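As an illustration of the shape a Wave 0 stub such as `tests/unit/test_http_request.py` might take, the sketch below pairs a few pytest-style test functions with a stand-in validator. Everything in it is hypothetical: the real HTTP request tool and its validation rules are not described in this document, so `validate_request_url`, `ALLOWED_SCHEMES`, and the specific rules (http/https only, host required) are assumptions chosen to keep the example self-contained.

```python
from urllib.parse import urlparse

# Assumption: the tool only permits plain web schemes.
ALLOWED_SCHEMES = {"http", "https"}


def validate_request_url(url: str) -> bool:
    """Hypothetical stand-in for the real tool's URL validation."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.hostname)


# pytest collects functions named test_*; plain asserts are enough.
def test_accepts_https_url():
    assert validate_request_url("https://example.com/api")


def test_rejects_non_http_scheme():
    assert not validate_request_url("file:///etc/passwd")


def test_rejects_missing_host():
    assert not validate_request_url("https:///path-only")
```

In the real stub the validator would be imported from the tool's module rather than defined in the test file. The stub only needs to exist and be collectable for the matching row in the verification map to move from ❌ W0 to ✅.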

## Manual-Only Verifications

| Behavior | Requirement | Why Manual | Test Instructions |
|----------|-------------|------------|-------------------|
| Web search returns real results | CAP-01 | Requires live Brave API key | Send message requiring web search, verify results |
| Document upload + search works end-to-end | CAP-02,03 | Requires file upload + LLM | Upload PDF, ask agent about its content |
| Calendar books a meeting | CAP-05 | Requires live Google Calendar OAuth | Connect calendar, ask agent to book a meeting |
| Agent response reads naturally with tool data | CAP-06 | Qualitative assessment | Chat with agent using tools, verify natural language |

---

## Validation Sign-Off

- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
- [ ] Wave 0 covers all MISSING references
- [ ] No watch-mode flags
- [ ] Feedback latency < 45s
- [ ] `nyquist_compliant: true` set in frontmatter

**Approval:** pending
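
The sampling-continuity item in the sign-off checklist can be checked mechanically. The sketch below assumes each task is reduced to a boolean (whether it has an automated verify command) in execution order. The function names and that input shape are illustrative assumptions, not part of the planning tooling.

```python
def max_unverified_run(tasks):
    """Length of the longest run of consecutive tasks lacking automated verify.

    `tasks` is an ordered iterable of booleans: True means the task has an
    automated verify command.
    """
    longest = current = 0
    for has_automated in tasks:
        current = 0 if has_automated else current + 1
        longest = max(longest, current)
    return longest


def sampling_is_continuous(tasks, limit=3):
    """True when no `limit` consecutive tasks lack automated verification."""
    return max_unverified_run(tasks) < limit
```

With the default limit, two unverified tasks in a row are tolerated and a third breaks continuity, which matches the wording "no 3 consecutive tasks without automated verify".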