From 972ef9b1f7cbc01690c022b907b132d6375c72f6 Mon Sep 17 00:00:00 2001
From: Adolfo Delorenzo
Date: Wed, 25 Mar 2026 22:11:53 -0600
Subject: [PATCH] docs(09): capture phase context

---
 .planning/phases/09-testing-qa/09-CONTEXT.md | 115 +++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 .planning/phases/09-testing-qa/09-CONTEXT.md

diff --git a/.planning/phases/09-testing-qa/09-CONTEXT.md b/.planning/phases/09-testing-qa/09-CONTEXT.md
new file mode 100644
index 0000000..8f1c40e
--- /dev/null
+++ b/.planning/phases/09-testing-qa/09-CONTEXT.md
@@ -0,0 +1,115 @@
+# Phase 9: Testing & QA - Context
+
+**Gathered:** 2026-03-26
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+Automated testing infrastructure and quality audits. Playwright E2E tests for critical user flows, Lighthouse performance/accessibility audits, visual regression snapshots at 3 viewports, axe-core accessibility validation, cross-browser testing (Chrome/Firefox/Safari), and a CI-ready pipeline. Goal: beta-ready confidence that the platform works.
+
+
+
+
+## Implementation Decisions
+
+All decisions at Claude's discretion — user trusts judgment.
+
+### E2E Test Scope & Priority
+- Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
+- Critical flows to test (priority order):
+  1. Login → dashboard loads → session persists
+  2. Create tenant → tenant appears in list
+  3. Deploy template agent → agent appears in employees list
+  4. Chat: open conversation → send message → receive streaming response (mock LLM)
+  5. RBAC: operator cannot access /agents/new, /billing, /users
+  6. Language switcher → UI updates to selected language
+  7. Mobile viewport: bottom tab bar renders, sidebar hidden
+- LLM responses mocked in E2E tests (no real Ollama/API calls) — deterministic, fast, CI-safe
+- Test data: seed a test tenant + test user via API calls in test setup, clean up after
+
+### Lighthouse & Performance
+- Target scores: >= 90 for Performance, Accessibility, Best Practices, SEO
+- Run Lighthouse CI on: login page, dashboard, chat page, agents/new page
+- Fail CI if any score drops below 80 (warning at 85, target 90)
+
+### Visual Regression
+- Playwright screenshot comparison at 3 viewports: desktop (1280x800), tablet (768x1024), mobile (375x812)
+- Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
+- Baseline snapshots committed to repo — CI fails on unexpected visual diff
+- Update snapshots intentionally via `npx playwright test --update-snapshots`
+
+### Accessibility
+- axe-core integrated via @axe-core/playwright
+- Run on every page during E2E flows — zero critical violations required
+- Violations at "serious" level logged as warnings, not blockers (for beta)
+- Keyboard navigation test: Tab through login form, chat input, nav items
+
+### Cross-Browser
+- Playwright projects: chromium, firefox, webkit (Safari)
+- All E2E tests run on all 3 browsers
+- Visual regression only on chromium (browser rendering diffs are expected)
+
+### CI Pipeline
+- Gitea Actions (matches existing infrastructure at git.oe74.net)
+- Workflow triggers: push to main, pull request to main
+- Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
+- Docker Compose for CI (postgres + redis + gateway + portal) — same containers as dev
+- Test results: JUnit XML for test reports, HTML for Playwright trace viewer
+- Fail-fast: lint/type errors block everything; unit test failures block E2E
+
+### Claude's Discretion
+- Playwright config details (timeouts, retries, parallelism)
+- Test file organization (by feature vs by page)
+- Fixture/helper patterns for auth, tenant setup, API mocking
+- Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
+- Whether to include a smoke test for the WebSocket chat connection
+- Visual regression threshold (pixel diff tolerance)
+
+
+
+
+## Specific Ideas
+
+- E2E tests should be the "would I trust this with a real customer?" gate
+- Mock the LLM but test the full WebSocket flow — the streaming UX was the hardest part to get right
+- The CI pipeline should be fast enough to not block development — target < 5 minutes total
+- Visual regression catches the kind of CSS regressions that unit tests miss entirely
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- `packages/portal/` — Next.js 16 standalone output (Playwright can test against it)
+- `docker-compose.yml` — Full stack definition (reuse for CI with test DB)
+- `tests/` directory — Backend pytest suite (316+ tests) — already CI-compatible
+- `.env.example` — Template for CI environment variables
+- Playwright MCP plugin already installed (used for manual testing during development)
+
+### Established Patterns
+- Backend tests use pytest + pytest-asyncio with integration test fixtures
+- Portal builds via `npm run build` (already verified in every phase)
+- Auth: email/password via Auth.js v5 JWT (Playwright can automate login)
+- API: FastAPI with RBAC headers (E2E tests need to set session cookies)
+
+### Integration Points
+- CI needs: PostgreSQL, Redis, gateway, llm-pool (or mock), portal containers
+- Playwright tests run against the built portal (localhost:3000)
+- Backend tests run against test DB (separate from dev DB)
+- Gitea Actions runner on git.oe74.net (needs Docker-in-Docker or host Docker access)
+
+
+
+
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+
+
+---
+
+*Phase: 09-testing-qa*
+*Context gathered: 2026-03-26*
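
Reviewer note (not part of the patch): the Lighthouse gating rule in the context above (fail below 80, warning at 85, target 90) leaves the band boundaries open. A minimal sketch of one possible reading follows. It assumes scores on a 0-100 scale and assumes "warning at 85" means that scores from 80 up to, but not including, 85 produce a warning. The names `classifyScore`, `shouldFailCi`, and `Verdict` are hypothetical and do not come from the repository or from any Lighthouse tooling.

```typescript
// Hypothetical helper names; nothing here is an existing project or Lighthouse API.
type Verdict = "fail" | "warn" | "pass" | "target";

// Thresholds taken from the "Lighthouse & Performance" decisions (0-100 scale assumed).
const FAIL_BELOW = 80;
const WARN_BELOW = 85; // assumed reading of "warning at 85"
const TARGET = 90;

// Map a single category score to a verdict.
function classifyScore(score: number): Verdict {
  if (score < FAIL_BELOW) return "fail";
  if (score < WARN_BELOW) return "warn";
  if (score < TARGET) return "pass"; // acceptable, but short of the target
  return "target";
}

// CI fails when any audited category falls below the hard floor.
function shouldFailCi(scores: Record<string, number>): boolean {
  return Object.keys(scores).some((key) => classifyScore(scores[key]) === "fail");
}

// Example scores are made up for illustration.
console.log(shouldFailCi({ performance: 92, accessibility: 84, seo: 79 })); // prints true
```

Keeping the three thresholds as named constants makes it simple to tighten the floor toward the target of 90 after beta without touching the gating logic.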