Compare commits


3 Commits

6 changed files with 1449 additions and 5 deletions

.planning/ROADMAP.md

@@ -131,7 +131,7 @@ Plans:
 ## Progress
 **Execution Order:**
-Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
+Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8 -> 9
 | Phase | Plans Complete | Status | Completed |
 |-------|----------------|--------|-----------|
@@ -143,7 +143,7 @@ Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
 | 6. Web Chat | 3/3 | Complete | 2026-03-25 |
 | 7. Multilanguage | 4/4 | Complete | 2026-03-25 |
 | 8. Mobile + PWA | 4/4 | Complete | 2026-03-26 |
-| 9. Testing & QA | 0/0 | Not started | - |
+| 9. Testing & QA | 0/3 | In progress | - |
 ---
@@ -201,11 +201,13 @@ Plans:
 5. All E2E tests pass on Chrome, Firefox, and Safari (WebKit)
 6. Empty states, error states, and loading states are tested and render correctly
 7. CI-ready test suite that can run in a GitHub Actions / Gitea Actions pipeline
-**Plans**: 0 plans
+**Plans**: 3 plans
 Plans:
-- [ ] TBD (run /gsd:plan-phase 9 to break down)
+- [ ] 09-01-PLAN.md — Playwright infrastructure (config, auth fixtures, seed helpers) + all 7 critical flow E2E tests (login, tenant CRUD, agent deploy, chat, RBAC, i18n, mobile)
+- [ ] 09-02-PLAN.md — Visual regression snapshots at 3 viewports, axe-core accessibility scans, Lighthouse CI score gating
+- [ ] 09-03-PLAN.md — Gitea Actions CI pipeline (backend lint+pytest, portal build+E2E+Lighthouse) + human verification
 ---
 *Roadmap created: 2026-03-23*
-*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements + 5 Web Chat requirements + 6 Multilanguage requirements + 6 Mobile+PWA requirements mapped*
+*Coverage: 25/25 v1 requirements + 6 RBAC requirements + 5 Employee Design requirements + 5 Web Chat requirements + 6 Multilanguage requirements + 6 Mobile+PWA requirements + 7 Testing & QA requirements mapped*

.planning/phases/09-testing-qa/09-01-PLAN.md

@@ -0,0 +1,239 @@
---
phase: 09-testing-qa
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- packages/portal/playwright.config.ts
- packages/portal/e2e/auth.setup.ts
- packages/portal/e2e/fixtures.ts
- packages/portal/e2e/helpers/seed.ts
- packages/portal/e2e/flows/login.spec.ts
- packages/portal/e2e/flows/tenant-crud.spec.ts
- packages/portal/e2e/flows/agent-deploy.spec.ts
- packages/portal/e2e/flows/chat.spec.ts
- packages/portal/e2e/flows/rbac.spec.ts
- packages/portal/e2e/flows/i18n.spec.ts
- packages/portal/e2e/flows/mobile.spec.ts
- packages/portal/package.json
autonomous: true
requirements:
- QA-01
- QA-05
- QA-06
must_haves:
truths:
- "Playwright E2E tests cover all 7 critical user flows and pass on chromium"
- "Tests pass on all 3 browsers (chromium, firefox, webkit)"
- "Empty states, error states, and loading states are tested within flow specs"
- "Auth setup saves storageState for 3 roles (platform_admin, customer_admin, customer_operator)"
artifacts:
- path: "packages/portal/playwright.config.ts"
provides: "Playwright configuration with 3 browser projects + setup project"
contains: "defineConfig"
- path: "packages/portal/e2e/auth.setup.ts"
provides: "Auth state generation for 3 roles"
contains: "storageState"
- path: "packages/portal/e2e/fixtures.ts"
provides: "Shared test fixtures with axe builder and role-based auth"
exports: ["test", "expect"]
- path: "packages/portal/e2e/helpers/seed.ts"
provides: "Test data seeding via FastAPI admin API"
exports: ["seedTestTenant"]
- path: "packages/portal/e2e/flows/login.spec.ts"
provides: "Login flow E2E test"
- path: "packages/portal/e2e/flows/chat.spec.ts"
provides: "Chat flow E2E test with WebSocket mock"
- path: "packages/portal/e2e/flows/rbac.spec.ts"
provides: "RBAC enforcement E2E test"
key_links:
- from: "packages/portal/e2e/auth.setup.ts"
to: "playwright/.auth/*.json"
via: "storageState save"
pattern: "storageState.*path"
- from: "packages/portal/e2e/flows/*.spec.ts"
to: "packages/portal/e2e/fixtures.ts"
via: "import { test } from fixtures"
pattern: "from.*fixtures"
- from: "packages/portal/playwright.config.ts"
to: ".next/standalone/server.js"
via: "webServer command"
pattern: "node .next/standalone/server.js"
---
<objective>
Set up Playwright E2E testing infrastructure and implement all 7 critical user flow tests covering login, tenant CRUD, agent deployment, chat with mocked WebSocket, RBAC enforcement, i18n language switching, and mobile viewport behavior.
Purpose: Establishes the automated E2E test suite that validates all critical user paths work end-to-end across Chrome, Firefox, and Safari -- the primary quality gate for beta readiness.
Output: Playwright config, auth fixtures for 3 roles, seed helpers, and 7 flow spec files that pass on all 3 browsers.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/09-testing-qa/09-CONTEXT.md
@.planning/phases/09-testing-qa/09-RESEARCH.md
Key codebase references:
@packages/portal/package.json
@packages/portal/next.config.ts
@packages/portal/app/layout.tsx
@packages/portal/app/login/page.tsx
@packages/portal/lib/use-chat-socket.ts
@packages/portal/app/(app)/chat/page.tsx
</context>
<tasks>
<task type="auto">
<name>Task 1: Install Playwright and create test infrastructure</name>
<files>
packages/portal/package.json
packages/portal/playwright.config.ts
packages/portal/e2e/auth.setup.ts
packages/portal/e2e/fixtures.ts
packages/portal/e2e/helpers/seed.ts
packages/portal/playwright/.auth/.gitkeep
packages/portal/.gitignore
</files>
<action>
1. Install test dependencies:
```
cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit
```
2. Create `packages/portal/playwright.config.ts` following the RESEARCH Pattern 6 exactly:
- testDir: "./e2e"
- fullyParallel: false (CI stability with shared DB state)
- forbidOnly: !!process.env.CI
- retries: process.env.CI ? 1 : 0
- workers: process.env.CI ? 1 : undefined
- timeout: 30_000
- reporter: html + junit + list
- use.baseURL from PLAYWRIGHT_BASE_URL env or localhost:3000
- use.trace: "on-first-retry"
- use.screenshot: "only-on-failure"
- use.serviceWorkers: "block" (CRITICAL: prevents Serwist from intercepting test requests)
- expect.toHaveScreenshot: maxDiffPixelRatio 0.02, threshold 0.2
- Projects: setup (testMatch /.*\.setup\.ts/), then chromium, firefox, webkit (each with `dependencies: ["setup"]`, testMatch "e2e/flows/**")
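- Visual projects: visual-desktop (1280x800), visual-tablet (768x1024), visual-mobile (375x812) -- all chromium only, testMatch "e2e/visual/**". Set the mobile viewport explicitly (e.g. `viewport: { width: 375, height: 812 }, isMobile: true, hasTouch: true`) instead of spreading `devices['iPhone 12']`, because that descriptor is 390 px wide and defaults to WebKit
- A11y project: named `a11y` (Plan 02 runs `--project=a11y`), chromium, testMatch "e2e/accessibility/**"
- Visual and a11y projects also set `dependencies: ["setup"]` and storageState "playwright/.auth/platform-admin.json", since their specs visit authenticated pages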
- webServer: command "node .next/standalone/server.js", url localhost:3000, reuseExistingServer: !process.env.CI
- webServer env: PORT 3000, API_URL from env or localhost:8001, AUTH_SECRET test-secret, AUTH_URL localhost:3000
- Default storageState for chromium/firefox/webkit: "playwright/.auth/platform-admin.json"
3. Create `packages/portal/e2e/auth.setup.ts`:
- 3 setup blocks: platform admin, customer admin, customer operator
- Each: goto /login, fill Email + Password from env vars (E2E_ADMIN_EMAIL/E2E_ADMIN_PASSWORD, E2E_CADMIN_EMAIL/E2E_CADMIN_PASSWORD, E2E_OPERATOR_EMAIL/E2E_OPERATOR_PASSWORD), click Sign In button, waitForURL /dashboard, save storageState to playwright/.auth/{role}.json
- Use path.resolve(__dirname, ...) for auth file paths
4. Create `packages/portal/e2e/fixtures.ts`:
- Extend base test with: axe fixture (returns () => AxeBuilder with wcag2a, wcag2aa, wcag21aa tags), auth state paths as constants
- Export `test` and `expect` from the extended fixture
- Export AUTH_PATHS object: { platformAdmin, customerAdmin, operator } with resolved paths
5. Create `packages/portal/e2e/helpers/seed.ts`:
- seedTestTenant(request: APIRequestContext) -- POST to /api/portal/tenants with X-User-Id, X-User-Role headers, returns { tenantId, tenantSlug }
- cleanupTenant(request: APIRequestContext, tenantId: string) -- DELETE /api/portal/tenants/{id}
- Use random suffix for tenant names to avoid collisions
6. Create `packages/portal/playwright/.auth/.gitkeep` (empty file)
7. Add to packages/portal/.gitignore: `playwright/.auth/*.json`, `playwright-report/`, `playwright-results.xml`, `.lighthouseci/`
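Step 5 can be sketched as follows, assuming the tenant endpoint accepts `{ name, slug }` and returns `{ id, slug }`. Those payload fields, the header values, and the `E2E_ADMIN_USER_ID` variable are hypothetical and should be checked against the FastAPI route. `RequestLike` is a local structural subset of Playwright's `APIRequestContext`, declared here so the sketch stands alone. The real `seed.ts` would type the parameter as `APIRequestContext` and receive Playwright's `request` fixture.

```typescript
import { randomBytes } from "node:crypto";

// Structural subset of Playwright's APIRequestContext/APIResponse, so this sketch
// compiles without @playwright/test. In seed.ts, use APIRequestContext instead.
interface ResponseLike {
  ok(): boolean;
  status(): number;
  json(): Promise<unknown>;
}
interface RequestLike {
  post(
    url: string,
    options: { headers: Record<string, string>; data: unknown },
  ): Promise<ResponseLike>;
  delete(url: string, options: { headers: Record<string, string> }): Promise<ResponseLike>;
}

// Hypothetical header values and env var: the plan names the headers, not their contents.
const ADMIN_HEADERS: Record<string, string> = {
  "X-User-Id": process.env.E2E_ADMIN_USER_ID ?? "e2e-admin",
  "X-User-Role": "platform_admin",
};

export async function seedTestTenant(
  request: RequestLike,
): Promise<{ tenantId: string; tenantSlug: string }> {
  // A random suffix keeps repeated or concurrent runs from colliding on name/slug.
  const suffix = randomBytes(4).toString("hex");
  const res = await request.post("/api/portal/tenants", {
    headers: ADMIN_HEADERS,
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-${suffix}` }, // assumed payload shape
  });
  if (!res.ok()) throw new Error(`seedTestTenant failed: HTTP ${res.status()}`);
  const body = (await res.json()) as { id: string; slug: string }; // assumed response shape
  return { tenantId: body.id, tenantSlug: body.slug };
}

export async function cleanupTenant(request: RequestLike, tenantId: string): Promise<void> {
  const res = await request.delete(`/api/portal/tenants/${tenantId}`, {
    headers: ADMIN_HEADERS,
  });
  // Treat 404 as success so cleanup stays idempotent if a test already deleted the tenant.
  if (!res.ok() && res.status() !== 404) {
    throw new Error(`cleanupTenant failed: HTTP ${res.status()}`);
  }
}
```

Returning both `tenantId` and `tenantSlug` lets a spec assert on the slug in the UI and still clean up by id in `afterEach`.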
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx playwright --version && test -f playwright.config.ts && test -f e2e/auth.setup.ts && test -f e2e/fixtures.ts && test -f e2e/helpers/seed.ts && echo "PASS"</automated>
</verify>
<done>Playwright installed, config created with 3 browser + 3 visual + 1 a11y projects, auth setup saves storageState for 3 roles, fixtures export axe builder and auth paths, seed helper creates/deletes test tenants via API</done>
</task>
<task type="auto">
<name>Task 2: Implement all 7 critical flow E2E tests</name>
<files>
packages/portal/e2e/flows/login.spec.ts
packages/portal/e2e/flows/tenant-crud.spec.ts
packages/portal/e2e/flows/agent-deploy.spec.ts
packages/portal/e2e/flows/chat.spec.ts
packages/portal/e2e/flows/rbac.spec.ts
packages/portal/e2e/flows/i18n.spec.ts
packages/portal/e2e/flows/mobile.spec.ts
</files>
<action>
All specs import `{ test, expect }` from `../fixtures`. Use semantic selectors (getByRole, getByLabel, getByText) -- never CSS IDs or data-testid unless no semantic selector exists.
1. `login.spec.ts` (Flow 1):
- Start unauthenticated for the whole file with `test.use({ storageState: { cookies: [], origins: [] } })`. The project default is the platform_admin session, and /login may redirect an authenticated user
- Test "login -> dashboard loads -> session persists": goto /login, fill credentials, click Sign In, waitForURL /dashboard, assert dashboard heading visible. Reload page, assert still on /dashboard (session persists).
- Test "invalid credentials show error": fill wrong password, submit, assert error message visible.
- Test "empty state: no session redirects to login": with the empty storageState, goto /dashboard, assert redirected to /login.
2. `tenant-crud.spec.ts` (Flow 2):
- Uses platform_admin storageState
- Test "create tenant -> appears in list": navigate to tenants page, click create button, fill tenant name + slug (random suffix), submit, assert new tenant appears in list.
- Test "delete tenant": create tenant, then delete it, assert it disappears from list.
- Use seed helper for setup where possible.
3. `agent-deploy.spec.ts` (Flow 3):
- Uses customer_admin storageState (or platform_admin with tenant context)
- Test "deploy template agent -> appears in employees": navigate to /agents/new, select template option, pick first available template, click deploy, assert agent appears in agents list.
- Test "loading state: template gallery shows loading skeleton": mock API to delay, assert skeleton/loading indicator visible.
4. `chat.spec.ts` (Flow 4):
- Uses routeWebSocket per RESEARCH Pattern 2
- Test "send message -> receive streaming response": routeWebSocket matching /\/chat\/ws\//, mock auth acknowledgment and message response with simulated streaming tokens. Open chat page, select an agent/conversation, type message, press Enter, assert response text appears.
- Test "typing indicator shows during response": assert typing indicator visible between message send and response arrival.
- Test "empty state: no conversations shows prompt": navigate to /chat without selecting agent, assert empty state message visible.
- IMPORTANT: Use a regex pattern for routeWebSocket: `/\/chat\/ws\//` (not a string). A string path would be resolved against baseURL (localhost:3000), but the portal builds the WS URL from NEXT_PUBLIC_API_URL, which is an absolute URL on the API host.
5. `rbac.spec.ts` (Flow 5):
- Uses customer_operator storageState
- Test "operator cannot access restricted paths": for each of ["/agents/new", "/billing", "/users"], goto path, assert NOT on that URL (proxy.ts redirects to /dashboard).
- Test "operator can view dashboard and chat": goto /dashboard, assert visible. Goto /chat, assert visible.
- Uses customer_admin storageState for contrast test: "admin can access /agents/new".
6. `i18n.spec.ts` (Flow 6):
- Test "language switcher changes UI to Spanish": find the language switcher, select Espanol, assert key UI elements render in Spanish. Take the expected strings from messages/es.json (e.g. the Spanish label for "Dashboard") rather than guessing them.
- Test "language persists across page navigation": switch to Portuguese, navigate to another page, assert Portuguese labels still showing.
7. `mobile.spec.ts` (Flow 7):
- Test "mobile viewport: bottom tab bar renders, sidebar hidden": setViewportSize 375x812, goto /dashboard, assert mobile bottom navigation visible, assert desktop sidebar not visible.
- Test "mobile chat: full screen message view": setViewportSize 375x812, navigate to chat, assert chat interface fills viewport.
- Test "error state: offline banner" (only if the PWA has offline detection): call `context.setOffline(true)`, assert the offline banner appears, then restore with `context.setOffline(false)`.
For QA-06 coverage, embed empty/error/loading state tests within the relevant flow specs (noted above with specific test cases).
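The chat mock in item 4 depends on two pieces that can be checked without a browser: the URL pattern and the sequence of frames the fake server streams back. Below is a minimal sketch that assumes hypothetical JSON frame shapes (`auth_ok`, `token`, `done`); the real shapes must be copied from `lib/use-chat-socket.ts`. Inside the spec, a `page.routeWebSocket(CHAT_WS_PATTERN, (ws) => ...)` handler would register `ws.onMessage(...)` and answer each client message by sending these frames with `ws.send(JSON.stringify(frame))`. Spacing the frames with a short delay gives the typing indicator time to appear.

```typescript
// Matches the absolute API-origin socket URL, regardless of host and port.
export const CHAT_WS_PATTERN = /\/chat\/ws\//;

// Hypothetical frame shapes; replace with the protocol in lib/use-chat-socket.ts.
export type MockFrame =
  | { type: "auth_ok" }
  | { type: "token"; content: string }
  | { type: "done" };

// Split a canned reply into word-sized tokens (keeping trailing whitespace) so the
// client receives a stream resembling an LLM response, with no real LLM call.
export function buildStreamingFrames(reply: string): MockFrame[] {
  const tokens = reply.match(/\S+\s*/g) ?? [];
  return [
    { type: "auth_ok" },
    ...tokens.map((content): MockFrame => ({ type: "token", content })),
    { type: "done" },
  ];
}

// Reassemble the streamed text the way a client would, to sanity-check the mock.
export function joinTokens(frames: MockFrame[]): string {
  return frames
    .filter((f): f is { type: "token"; content: string } => f.type === "token")
    .map((f) => f.content)
    .join("");
}
```

The spec's final assertion can then look for the same canned reply text, which keeps the expected string in one place.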
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx playwright test e2e/flows/ --project=chromium --reporter=list 2>&1 | tail -20</automated>
</verify>
<done>All 7 flow spec files exist with tests for critical paths plus empty/error/loading states. Tests pass on chromium. Cross-browser pass (firefox, webkit) confirmed by running full project suite.</done>
</task>
</tasks>
<verification>
1. `cd packages/portal && npx playwright test e2e/flows/ --project=chromium` -- all flow tests pass on chromium
2. `cd packages/portal && npx playwright test e2e/flows/` -- all flow tests pass on chromium + firefox + webkit
3. Each flow spec covers at least one happy path and one edge/error case
4. Auth setup generates 3 storageState files in playwright/.auth/
</verification>
<success_criteria>
- 7 flow spec files exist and pass on chromium
- Cross-browser (chromium + firefox + webkit) all green
- Empty/error/loading states tested within flow specs
- Auth storageState generated for 3 roles without manual intervention
- No real LLM calls in any test (WebSocket mocked)
</success_criteria>
<output>
After completion, create `.planning/phases/09-testing-qa/09-01-SUMMARY.md`
</output>

.planning/phases/09-testing-qa/09-02-PLAN.md

@@ -0,0 +1,178 @@
---
phase: 09-testing-qa
plan: 02
type: execute
wave: 2
depends_on: ["09-01"]
files_modified:
- packages/portal/e2e/visual/snapshots.spec.ts
- packages/portal/e2e/accessibility/a11y.spec.ts
- packages/portal/e2e/lighthouse/lighthouserc.json
autonomous: true
requirements:
- QA-02
- QA-03
- QA-04
must_haves:
truths:
- "Visual regression snapshots exist for all key pages at 3 viewports (desktop, tablet, mobile)"
- "axe-core accessibility scan passes with zero critical violations on all key pages"
- "Lighthouse scores meet >= 80 hard floor on login page (90 target)"
- "Serious a11y violations are logged as warnings, not blockers"
artifacts:
- path: "packages/portal/e2e/visual/snapshots.spec.ts"
provides: "Visual regression tests at 3 viewports"
contains: "toHaveScreenshot"
- path: "packages/portal/e2e/accessibility/a11y.spec.ts"
provides: "axe-core accessibility scans on key pages"
contains: "AxeBuilder"
- path: "packages/portal/e2e/lighthouse/lighthouserc.json"
provides: "Lighthouse CI config with score thresholds"
contains: "minScore"
key_links:
- from: "packages/portal/e2e/accessibility/a11y.spec.ts"
to: "packages/portal/e2e/fixtures.ts"
via: "import axe fixture"
pattern: "from.*fixtures"
- from: "packages/portal/e2e/visual/snapshots.spec.ts"
to: "packages/portal/playwright.config.ts"
via: "visual-desktop/tablet/mobile projects"
pattern: "toHaveScreenshot"
---
<objective>
Add visual regression testing at 3 viewports, axe-core accessibility scanning on all key pages, and Lighthouse CI performance/accessibility score gating.
Purpose: Catches CSS regressions that unit tests miss, ensures WCAG 2.1 AA compliance, and validates performance baselines before beta launch.
Output: Visual snapshot specs, accessibility scan specs, Lighthouse CI config, and baseline screenshots.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/09-testing-qa/09-CONTEXT.md
@.planning/phases/09-testing-qa/09-RESEARCH.md
@.planning/phases/09-testing-qa/09-01-SUMMARY.md
Depends on Plan 01 for: playwright.config.ts (visual projects, a11y project), e2e/fixtures.ts (axe fixture), auth.setup.ts (storageState)
</context>
<tasks>
<task type="auto">
<name>Task 1: Visual regression snapshots and axe-core accessibility tests</name>
<files>
packages/portal/e2e/visual/snapshots.spec.ts
packages/portal/e2e/accessibility/a11y.spec.ts
</files>
<action>
1. Create `packages/portal/e2e/visual/snapshots.spec.ts`:
- Import `{ test, expect }` from `../fixtures`
- Use platform_admin storageState for authenticated pages
- Key pages to snapshot (each as a separate test):
a. Login page (unauthenticated -- use an empty storageState)
b. Dashboard
c. Agents list (/agents or /employees)
d. Agents/new (3-card entry screen)
e. Chat (empty state -- no conversation selected)
f. Templates gallery (/agents/new then select templates option, or /templates)
- Each test: goto page, wait for network idle or key element visible, call `await expect(page).toHaveScreenshot('page-name.png')`
- The 3 viewport sizes are handled by the playwright.config.ts visual-desktop/visual-tablet/visual-mobile projects -- the spec runs once, projects provide viewport variation
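- For login page: clear the session with `test.use({ storageState: { cookies: [], origins: [] } })` (e.g. in its own `test.describe`), then navigate to /login
- For authenticated pages: use the platform_admin storageState. The visual projects must set it and depend on the setup project (check playwright.config.ts from Plan 01)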
2. Create `packages/portal/e2e/accessibility/a11y.spec.ts`:
- Import `{ test, expect }` from `../fixtures` (gets axe fixture)
- Use platform_admin storageState
- Pages to scan: login, dashboard, agents list, agents/new, chat, templates, billing, users
- For each page, create a test:
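```
import { test, expect } from '../fixtures';

// Routes from the list above; add the agents-list and templates routes once confirmed.
const PAGES = ['/login', '/dashboard', '/agents/new', '/chat', '/billing', '/users'];

for (const path of PAGES) {
  test(`${path} has no critical a11y violations`, async ({ page, axe }) => {
    await page.goto(path);
    await page.waitForLoadState('networkidle');
    const results = await axe().analyze();
    const critical = results.violations.filter((v) => v.impact === 'critical');
    const serious = results.violations.filter((v) => v.impact === 'serious');
    if (serious.length > 0) {
      // Serious violations are warnings for beta, not blockers
      console.warn(`Serious a11y violations on ${path}:`, serious.map((v) => v.id));
    }
    expect(critical, `Critical a11y violations on ${path}`).toHaveLength(0);
  });
}
```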
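- Add keyboard navigation test: "Tab through login form fields": goto /login, press Tab repeatedly, and after each press assert focus moved to the next control (Email -> Password -> Sign In button), e.g. `await expect(page.getByLabel('Email')).toBeFocused()`, which retries until focus settles.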
- Add keyboard nav for chat: Tab to message input, type message, Enter to send.
3. Generate initial visual regression baselines:
- Build the portal: `cd packages/portal && npm run build`
- Copy static assets for standalone: `cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`
- Run with --update-snapshots: `npx playwright test e2e/visual/ --update-snapshots`
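- This writes baselines next to the spec, in `e2e/visual/snapshots.spec.ts-snapshots/`, with the project name and OS platform in each file name. As a result, baselines generated on macOS are not picked up by Linux CI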
- NOTE: If the full stack (gateway + DB) is not running, authenticated page snapshots may fail. In that case, generate baselines only for login page and document that full baselines require the running stack. The executor should start the stack via docker compose if possible.
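The baseline location follows Playwright's default `snapshotPathTemplate`, `{snapshotDir}/{testFileDir}/{testFileName}-snapshots/{arg}{-projectName}{-snapshotSuffix}{ext}`. Here `snapshotDir` defaults to `testDir` and the suffix defaults to the OS platform. The helper below is an illustrative re-implementation of that default for this repo's layout, not a Playwright API. It can help predict which files to commit. It ignores custom templates and project-name sanitization.

```typescript
// Illustrative re-implementation of Playwright's default snapshot path; not a Playwright API.
export function defaultSnapshotPath(opts: {
  testDir: string; // e.g. "e2e" (testDir "./e2e" in playwright.config.ts)
  testFilePath: string; // spec path relative to testDir, e.g. "visual/snapshots.spec.ts"
  name: string; // argument passed to toHaveScreenshot, e.g. "dashboard.png"
  projectName: string; // e.g. "visual-desktop"
  platform: string; // process.platform of the machine that ran the test
}): string {
  const slash = opts.testFilePath.lastIndexOf("/");
  const testFileDir = slash === -1 ? "" : opts.testFilePath.slice(0, slash);
  const testFileName = opts.testFilePath.slice(slash + 1);
  const dot = opts.name.lastIndexOf(".");
  const arg = dot === -1 ? opts.name : opts.name.slice(0, dot);
  const ext = dot === -1 ? "" : opts.name.slice(dot);
  // Empty segments (spec directly in testDir) are dropped from the directory path.
  const dir = [opts.testDir, testFileDir, `${testFileName}-snapshots`].filter(Boolean).join("/");
  return `${dir}/${arg}-${opts.projectName}-${opts.platform}${ext}`;
}
```

For example, the dashboard snapshot in the `visual-desktop` project on Linux maps to `e2e/visual/snapshots.spec.ts-snapshots/dashboard-visual-desktop-linux.png`. The `platform` argument is why one spec yields separate `-darwin` and `-linux` files for the same screenshot.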
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && npx playwright test e2e/accessibility/ --project=a11y --reporter=list 2>&1 | tail -20</automated>
</verify>
<done>Visual regression spec covers 6 key pages (runs at 3 viewports via projects), baseline screenshots generated. Accessibility spec scans 8+ pages with zero critical violations, serious violations logged as warnings. Keyboard navigation tested on login and chat.</done>
</task>
<task type="auto">
<name>Task 2: Lighthouse CI configuration and score gating</name>
<files>
packages/portal/e2e/lighthouse/lighthouserc.json
</files>
<action>
1. Create `packages/portal/e2e/lighthouse/lighthouserc.json`:
- Based on RESEARCH Pattern 5
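- collect.url: only the /login page, as an absolute URL ("http://localhost:3000/login"); authenticated pages redirect to login when Lighthouse runs unauthenticated -- see RESEARCH Pitfall 5
- collect.startServerCommand: "node .next/standalone/server.js", because the server started by Playwright's webServer is no longer running when `lhci autorun` executes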
- collect.numberOfRuns: 1 (speed for CI)
- collect.settings.preset: "desktop"
- collect.settings.chromeFlags: "--no-sandbox --disable-dev-shm-usage"
- assert.assertions:
- categories:performance: ["error", {"minScore": 0.80}] (hard floor)
- categories:accessibility: ["error", {"minScore": 0.80}]
- categories:best-practices: ["error", {"minScore": 0.80}]
- categories:seo: ["error", {"minScore": 0.80}]
- upload.target: "filesystem"
- upload.outputDir: ".lighthouseci"
2. Verify Lighthouse runs successfully:
- Ensure portal is built and standalone server can start
- Run: `cd packages/portal && npx lhci autorun --config=e2e/lighthouse/lighthouserc.json`
- Verify scores are printed and assertions pass
- If score is below 80 on any category, investigate and document (do NOT lower thresholds)
NOTE: Per RESEARCH Pitfall 5, only /login is tested with Lighthouse because authenticated pages redirect. The 90 target is aspirational -- the 80 hard floor is what CI enforces. Dashboard/chat performance should be validated manually or via Web Vitals in production.
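Items 1 and 2 can be sketched as a typed object that prints the JSON file's contents. This is a minimal sketch, not the committed config, which stays `lighthouserc.json`. It assumes two details worth checking against the LHCI docs. First, `collect.url` is an absolute URL. Second, `collect.startServerCommand` launches the standalone server, because nothing else is serving port 3000 once Playwright's `webServer` has shut down. Only the 80 floor is asserted; the 90 target is not encoded.

```typescript
type Level = "error" | "warn" | "off";
type Assertion = [Level, { minScore: number }];

const CATEGORIES = ["performance", "accessibility", "best-practices", "seo"] as const;
const HARD_FLOOR = 0.8; // CI gate; 0.9 remains an aspirational target, not enforced

export const lighthouserc = {
  ci: {
    collect: {
      url: ["http://localhost:3000/login"],
      // Assumed: run from packages/portal after the standalone build and asset copy.
      startServerCommand: "node .next/standalone/server.js",
      numberOfRuns: 1,
      settings: {
        preset: "desktop",
        chromeFlags: "--no-sandbox --disable-dev-shm-usage",
      },
    },
    assert: {
      // One "error" assertion per category, all at the same hard floor.
      assertions: Object.fromEntries(
        CATEGORIES.map((c): [string, Assertion] => [
          `categories:${c}`,
          ["error", { minScore: HARD_FLOOR }],
        ]),
      ) as Record<string, Assertion>,
    },
    upload: { target: "filesystem", outputDir: ".lighthouseci" },
  },
};

// Print the JSON form; redirecting stdout to e2e/lighthouse/lighthouserc.json materializes it.
console.log(JSON.stringify(lighthouserc, null, 2));
```

Generating the four category assertions from one list keeps the thresholds from drifting apart if the floor is ever raised.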
</action>
<verify>
<automated>cd /home/adelorenzo/repos/konstruct/packages/portal && test -f e2e/lighthouse/lighthouserc.json && cat e2e/lighthouse/lighthouserc.json | grep -q "minScore" && echo "PASS"</automated>
</verify>
<done>lighthouserc.json exists with score thresholds (80 hard floor, 90 aspirational). Lighthouse CI runs against /login and produces scores. All 4 categories (performance, accessibility, best practices, SEO) pass the 80 floor.</done>
</task>
</tasks>
<verification>
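1. `cd packages/portal && npx playwright test e2e/visual/ --project=visual-desktop` -- visual regression passes against the committed baselines. On a first run without baselines, Playwright fails the test and writes the actual screenshot; rerun with `--update-snapshots` to accept it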
2. `cd packages/portal && npx playwright test e2e/accessibility/ --project=a11y` -- zero critical violations
3. `cd packages/portal && npx lhci autorun --config=e2e/lighthouse/lighthouserc.json` -- all scores >= 80
4. Baseline screenshots committed to repo
</verification>
<success_criteria>
- Visual regression snapshots exist for 6 key pages at 3 viewports
- axe-core scans all key pages with zero critical a11y violations
- Serious a11y violations logged but not blocking
- Lighthouse CI passes with >= 80 on all 4 categories for /login
- Keyboard navigation tests pass for login form and chat input
</success_criteria>
<output>
After completion, create `.planning/phases/09-testing-qa/09-02-SUMMARY.md`
</output>

.planning/phases/09-testing-qa/09-03-PLAN.md

@@ -0,0 +1,183 @@
---
phase: 09-testing-qa
plan: 03
type: execute
wave: 2
depends_on: ["09-01"]
files_modified:
- .gitea/workflows/ci.yml
autonomous: false
requirements:
- QA-07
must_haves:
truths:
- "CI pipeline YAML exists and is syntactically valid for Gitea Actions"
- "Pipeline stages enforce fail-fast: lint/format checks block unit tests, unit tests block E2E"
- "Pipeline includes backend tests (ruff lint, ruff format check, pytest; no mypy step) and portal tests (build, E2E, Lighthouse)"
- "Test reports (JUnit XML, HTML) are uploaded as artifacts"
artifacts:
- path: ".gitea/workflows/ci.yml"
provides: "Complete CI pipeline for Gitea Actions"
contains: "playwright test"
key_links:
- from: ".gitea/workflows/ci.yml"
to: "packages/portal/playwright.config.ts"
via: "npx playwright test command"
pattern: "playwright test"
- from: ".gitea/workflows/ci.yml"
to: "packages/portal/e2e/lighthouse/lighthouserc.json"
via: "npx lhci autorun --config"
pattern: "lhci autorun"
---
<objective>
Create the Gitea Actions CI pipeline that runs the full test suite (backend lint + type-check + pytest, portal build + E2E + Lighthouse) on every push and PR to main.
Purpose: Makes the test suite CI-ready so quality gates are enforced automatically, not just locally. Completes the beta-readiness quality infrastructure.
Output: .gitea/workflows/ci.yml with fail-fast stages and artifact uploads.
</objective>
<execution_context>
@/home/adelorenzo/.claude/get-shit-done/workflows/execute-plan.md
@/home/adelorenzo/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/09-testing-qa/09-CONTEXT.md
@.planning/phases/09-testing-qa/09-RESEARCH.md
@.planning/phases/09-testing-qa/09-01-SUMMARY.md
Depends on Plan 01 for: Playwright config and test files that CI will execute
</context>
<tasks>
<task type="auto">
<name>Task 1: Create Gitea Actions CI workflow</name>
<files>
.gitea/workflows/ci.yml
</files>
<action>
Create `.gitea/workflows/ci.yml` based on RESEARCH Pattern 7 with these specifics:
1. Triggers: push to main, pull_request to main
2. Job 1: `backend` (Backend Tests)
- runs-on: ubuntu-latest
- Service containers:
- postgres: pgvector/pgvector:pg16, env POSTGRES_DB/USER/PASSWORD, health-cmd pg_isready
- redis: redis:7-alpine, health-cmd "redis-cli ping"
- Env vars: DATABASE_URL (asyncpg to konstruct_app), DATABASE_ADMIN_URL (asyncpg to postgres), REDIS_URL
- Steps:
- actions/checkout@v4
- actions/setup-python@v5 python-version 3.12
- pip install uv
- uv sync
- uv run ruff check packages/ tests/
- uv run ruff format --check packages/ tests/
- uv run pytest tests/ -x --tb=short --junitxml=test-results.xml
- Upload test-results.xml as artifact (if: always())
3. Job 2: `portal` (Portal E2E) -- needs: backend
- runs-on: ubuntu-latest
- Service containers: same postgres + redis
- Steps:
- actions/checkout@v4
- actions/setup-node@v4 node-version 22
- actions/setup-python@v5 python-version 3.12 (for gateway)
- Install portal deps: `cd packages/portal && npm ci`
- Build portal: `cd packages/portal && npm run build` with NEXT_PUBLIC_API_URL env
- Copy standalone assets: `cd packages/portal && cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`
- Install Playwright browsers: `cd packages/portal && npx playwright install --with-deps chromium firefox webkit`
- Start gateway (background):
```
pip install uv && uv sync
uv run alembic upgrade head
uv run python -c "from shared.db import seed_admin; import asyncio; asyncio.run(seed_admin())" || true
uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
```
env: DATABASE_URL, DATABASE_ADMIN_URL, REDIS_URL, LLM_POOL_URL (http://localhost:8004)
- Wait for gateway: `timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'`
- Run E2E tests: `cd packages/portal && npx playwright test e2e/flows/ e2e/accessibility/`
env: CI=true, PLAYWRIGHT_BASE_URL, API_URL, AUTH_SECRET, E2E_ADMIN_EMAIL, E2E_ADMIN_PASSWORD, E2E_CADMIN_EMAIL, E2E_CADMIN_PASSWORD, E2E_OPERATOR_EMAIL, E2E_OPERATOR_PASSWORD
(Use secrets for credentials: ${{ secrets.E2E_ADMIN_EMAIL }} etc.)
- Run Lighthouse CI: `cd packages/portal && npx lhci autorun --config=e2e/lighthouse/lighthouserc.json`
env: LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
- Upload Playwright report (if: always()): actions/upload-artifact@v4, path packages/portal/playwright-report/
- Upload Playwright JUnit (if: always()): actions/upload-artifact@v4, path packages/portal/playwright-results.xml
- Upload Lighthouse report (if: always()): actions/upload-artifact@v4, path packages/portal/.lighthouseci/
IMPORTANT: Do NOT include mypy --strict step (existing codebase may not be fully strict-typed). Only include ruff check and ruff format --check for linting.
NOTE: The seed_admin call may not exist, so include `|| true` to keep it from blocking the job. The E2E auth setup only logs in through the login form and does not create users. All three E2E users (platform admin, customer admin, operator) must therefore already exist in the database. If there's a migration seed, it will handle this.
Pipeline target: < 5 minutes total.
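An undefined secret expands to an empty string rather than failing the workflow, as in GitHub Actions expressions. A missing credential would therefore surface later as an opaque login timeout in the auth setup. A small guard like the hypothetical `missingE2EEnv` below could run at the top of `auth.setup.ts` and fail fast with the names of the unset variables. It is not part of the plan's file list. The variable names are the ones listed in the E2E step above.

```typescript
// Credentials the auth setup needs (names from the CI plan); all must be non-empty.
export const REQUIRED_E2E_ENV = [
  "E2E_ADMIN_EMAIL",
  "E2E_ADMIN_PASSWORD",
  "E2E_CADMIN_EMAIL",
  "E2E_CADMIN_PASSWORD",
  "E2E_OPERATOR_EMAIL",
  "E2E_OPERATOR_PASSWORD",
] as const;

// Returns the names that are unset or blank. An unset secret expands to "",
// so a blank value is treated the same as a missing one.
export function missingE2EEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_E2E_ENV.filter((name) => !env[name]?.trim());
}

// Throws one readable error instead of letting each role's login step time out.
export function assertE2EEnv(env: Record<string, string | undefined> = process.env): void {
  const missing = missingE2EEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing E2E credentials: ${missing.join(", ")}`);
  }
}
```

Calling `assertE2EEnv()` once before the three login blocks reports every missing credential in a single failure, rather than one per role.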
</action>
<verify>
<automated>test -f /home/adelorenzo/repos/konstruct/.gitea/workflows/ci.yml && python3 -c "import yaml; yaml.safe_load(open('/home/adelorenzo/repos/konstruct/.gitea/workflows/ci.yml'))" && echo "VALID YAML"</automated>
</verify>
<done>CI pipeline YAML exists at .gitea/workflows/ci.yml, is valid YAML, has 2 jobs (backend + portal), portal depends on backend (fail-fast), includes lint/format/pytest/E2E/Lighthouse/artifact-upload steps</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<name>Task 2: Verify test suite and CI pipeline</name>
<what-built>
Complete E2E test suite (7 flow specs + accessibility + visual regression + Lighthouse CI) and Gitea Actions CI pipeline. Tests cover login, tenant CRUD, agent deployment, chat with mocked WebSocket, RBAC enforcement, i18n language switching, mobile viewport behavior, accessibility (axe-core), and visual regression at 3 viewports.
</what-built>
<how-to-verify>
1. Run the full E2E test suite locally:
```
cd packages/portal
npx playwright test --project=chromium --project=a11y --reporter=list
```
Expected: All flow tests + accessibility tests pass
2. Run cross-browser:
```
npx playwright test e2e/flows/ --reporter=list
```
Expected: All tests pass on chromium, firefox, webkit
3. Check the Playwright HTML report:
```
npx playwright show-report
```
Expected: Opens browser with detailed test results
4. Review the CI pipeline (from the repository root, not packages/portal):
```
cat .gitea/workflows/ci.yml
```
Expected: Valid YAML with backend job (lint + pytest) and portal job (build + E2E + Lighthouse), portal depends on backend
5. (Optional) Push a branch to trigger CI on git.oe74.net and verify pipeline runs
</how-to-verify>
<resume-signal>Type "approved" if tests pass and CI pipeline looks correct, or describe issues</resume-signal>
</task>
</tasks>
<verification>
1. `.gitea/workflows/ci.yml` exists and is valid YAML
2. Pipeline has 2 jobs: backend (lint + pytest) and portal (build + E2E + Lighthouse)
3. Portal job depends on backend job (fail-fast enforced)
4. Secrets referenced for credentials (not hardcoded)
5. Artifacts uploaded for test reports
</verification>
<success_criteria>
- CI pipeline YAML is syntactically valid
- Pipeline stages enforce fail-fast ordering
- Backend job: ruff check + ruff format --check + pytest
- Portal job: npm build + Playwright E2E + Lighthouse CI
- Test reports uploaded as artifacts (JUnit XML, HTML, Lighthouse)
- Human approves test suite and pipeline structure
</success_criteria>
<output>
After completion, create `.planning/phases/09-testing-qa/09-03-SUMMARY.md`
</output>

.planning/phases/09-testing-qa/09-RESEARCH.md

@@ -0,0 +1,764 @@
# Phase 9: Testing & QA - Research
**Researched:** 2026-03-25
**Domain:** Playwright E2E, Lighthouse CI, visual regression, axe-core accessibility, Gitea Actions CI
**Confidence:** HIGH
## Summary
Phase 9 is a greenfield testing layer added on top of a fully-built portal (Next.js 16 standalone, FastAPI gateway, Celery worker). No Playwright config exists yet — the Playwright MCP plugin is installed for manual use but there is no `playwright.config.ts`, no `tests/e2e/` content, and no `.gitea/workflows/` CI file. Everything must be created from scratch.
The core challenges are: (1) Auth.js v5 JWT sessions that Playwright must obtain and reuse across multiple role fixtures (platform_admin, customer_admin, customer_operator); (2) the WebSocket chat flow at `/chat/ws/{conversation_id}` that needs mocking via `page.routeWebSocket()`; (3) Lighthouse CI that requires a running Next.js server (standalone output complicates `startServerCommand`); and (4) a sub-5-minute pipeline on Gitea Actions that is nearly syntax-identical to GitHub Actions.
**Primary recommendation:** Place Playwright config and tests inside `packages/portal/` (Next.js co-location pattern), use `storageState` with three saved auth fixtures for roles, mock the WebSocket endpoint with `page.routeWebSocket()` for the chat flow, and run `@lhci/cli` in a separate post-build CI stage.
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
All decisions at Claude's discretion — user trusts judgment.
- Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
- Critical flows to test (priority order):
1. Login → dashboard loads → session persists
2. Create tenant → tenant appears in list
3. Deploy template agent → agent appears in employees list
4. Chat: open conversation → send message → receive streaming response (mock LLM)
5. RBAC: operator cannot access /agents/new, /billing, /users
6. Language switcher → UI updates to selected language
7. Mobile viewport: bottom tab bar renders, sidebar hidden
- LLM responses mocked in E2E tests (no real Ollama/API calls)
- Test data: seed a test tenant + test user via API calls in test setup, clean up after
- Lighthouse targets: >= 90 (fail at 80, warn at 85)
- Pages: login, dashboard, chat, agents/new
- Visual regression at 3 viewports: desktop 1280x800, tablet 768x1024, mobile 375x812
- Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
- Baseline snapshots committed to repo
- axe-core via @axe-core/playwright, zero critical violations required
- "serious" violations logged as warnings (not blockers for beta)
- Keyboard navigation test: Tab through login form, chat input, nav items
- Cross-browser: chromium, firefox, webkit
- Visual regression: chromium only
- Gitea Actions, triggers: push to main, PR to main
- Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
- Docker Compose for CI infra
- JUnit XML + HTML trace viewer reports
- Fail-fast: lint/type errors block everything; unit test failures block E2E
- Target: < 5 min pipeline
### Claude's Discretion
- Playwright config details (timeouts, retries, parallelism)
- Test file organization (by feature vs by page)
- Fixture/helper patterns for auth, tenant setup, API mocking
- Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
- Whether to include a smoke test for the WebSocket chat connection
- Visual regression threshold (pixel diff tolerance)
### Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| QA-01 | Playwright E2E tests cover all critical user flows (login, tenant CRUD, agent deploy, chat, billing, RBAC) | Playwright storageState auth fixtures + routeWebSocket for chat mock |
| QA-02 | Lighthouse scores >= 90 for performance, accessibility, best practices, SEO on key pages | @lhci/cli with minScore assertions per category |
| QA-03 | Visual regression snapshots at desktop/tablet/mobile for all key pages | toHaveScreenshot with maxDiffPixelRatio, viewports per project |
| QA-04 | axe-core accessibility audit passes with zero critical violations across all pages | @axe-core/playwright AxeBuilder with impact filter |
| QA-05 | E2E tests pass on Chrome, Firefox, Safari (WebKit) | Playwright projects array with three browser engines |
| QA-06 | Empty states, error states, loading states tested and rendered correctly | Dedicated test cases + API mocking for empty/error responses |
| QA-07 | CI-ready test suite runnable in Gitea Actions pipeline | .gitea/workflows/ci.yml with Docker Compose service containers |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| @playwright/test | ^1.51 | E2E + visual regression + accessibility runner | Official Next.js recommendation, cross-browser built-in, no extra dependencies |
| @axe-core/playwright | ^4.10 | Accessibility scanning within Playwright tests | Official Deque package, integrates directly with Playwright page objects |
| @lhci/cli | ^0.15 | Lighthouse CI score assertions | Google-maintained, headless Lighthouse, assertion config via lighthouserc |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| axe-html-reporter | ^2.2 | HTML accessibility reports | When you want human-readable a11y reports attached to CI artifacts |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| @lhci/cli | lighthouse npm module directly | @lhci/cli handles multi-run averaging, assertions, and CI upload; raw lighthouse requires custom scripting |
| @axe-core/playwright | axe-playwright (third-party) | @axe-core/playwright is the official Deque package; axe-playwright is a community wrapper with its own `injectAxe`/`checkA11y` API and an extra dependency |
**Installation (portal):**
```bash
cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit
```
## Architecture Patterns
### Recommended Project Structure
```
packages/portal/
├── playwright.config.ts # Main config: projects, webServer, globalSetup
├── e2e/
│ ├── auth.setup.ts # Global setup: save storageState per role
│ ├── fixtures.ts # Extended test: auth fixtures, axe builder, API helpers
│ ├── helpers/
│ │ ├── seed.ts # Seed test tenant + user via API, return IDs
│ │ └── cleanup.ts # Delete seeded data after test suite
│ ├── flows/
│ │ ├── login.spec.ts # Flow 1: login → dashboard loads → session persists
│ │ ├── tenant-crud.spec.ts # Flow 2: create tenant → appears in list
│ │ ├── agent-deploy.spec.ts # Flow 3: deploy template → appears in employees
│ │ ├── chat.spec.ts # Flow 4: open chat → send msg → streaming response (mocked WS)
│ │ ├── rbac.spec.ts # Flow 5: operator access denied to restricted pages
│ │ ├── i18n.spec.ts # Flow 6: language switcher → UI updates
│ │ └── mobile.spec.ts # Flow 7: mobile viewport → bottom tab bar, sidebar hidden
│ ├── accessibility/
│ │ └── a11y.spec.ts # axe-core scan on every key page, keyboard nav test
│ ├── visual/
│ │ └── snapshots.spec.ts # Visual regression at 3 viewports (chromium only)
│ └── lighthouse/
│ └── lighthouserc.json # @lhci/cli config: URLs, score thresholds
├── playwright/.auth/ # gitignored — saved storageState files
│ ├── platform-admin.json
│ ├── customer-admin.json
│ └── customer-operator.json
└── __snapshots__/ # Committed baseline screenshots
.gitea/
└── workflows/
└── ci.yml # Pipeline: lint → typecheck → pytest → build → E2E → lhci
```
### Pattern 1: Auth.js v5 storageState with Multiple Roles
**What:** Authenticate each role once in a global setup project, save to JSON. All E2E tests consume the saved state — no repeated login UI interactions.
**When to use:** Any test that requires a logged-in user. Each spec declares which role it needs via `test.use({ storageState })`.
**Key insight for Auth.js v5:** The credentials provider calls the FastAPI `/api/portal/auth/verify` endpoint, but the session cookie is an `HttpOnly` JWT that Auth.js sets from the Next.js server. Calling FastAPI directly never produces that cookie, so Playwright must fill and submit the login form; `storageState` then captures the cookie Next.js set.
```typescript
// Source: https://playwright.dev/docs/auth
// e2e/auth.setup.ts
import { test as setup, type Page } from "@playwright/test";
import path from "path";

const PLATFORM_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/platform-admin.json");
const CUSTOMER_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-admin.json");
const OPERATOR_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-operator.json");

// Submits the Auth.js credentials form, waits for the post-login redirect,
// then saves the cookies Next.js set into a storageState file.
async function loginAndSave(page: Page, email: string, password: string, file: string) {
  await page.goto("/login");
  await page.getByLabel("Email").fill(email);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: file });
}

setup("authenticate as platform admin", async ({ page }) => {
  await loginAndSave(page, process.env.E2E_ADMIN_EMAIL!, process.env.E2E_ADMIN_PASSWORD!, PLATFORM_ADMIN_AUTH);
});

setup("authenticate as customer admin", async ({ page }) => {
  // Credentials come from env and must belong to a seeded customer_admin user
  await loginAndSave(page, process.env.E2E_CADMIN_EMAIL!, process.env.E2E_CADMIN_PASSWORD!, CUSTOMER_ADMIN_AUTH);
});

setup("authenticate as customer operator", async ({ page }) => {
  // E2E_OPERATOR_* names are assumed; match them to the CI secrets
  await loginAndSave(page, process.env.E2E_OPERATOR_EMAIL!, process.env.E2E_OPERATOR_PASSWORD!, OPERATOR_AUTH);
});
```
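The same three storageState paths appear as string literals in `auth.setup.ts`, `playwright.config.ts`, and `rbac.spec.ts`. A small helper could keep them in one place. This is a hypothetical sketch: `authFile`, `Role`, and the file location are assumptions, and the naming rule (role with underscores turned into hyphens) only mirrors the paths used above.

```typescript
// e2e/helpers/auth-files.ts (hypothetical helper; name and location are assumptions)
import * as path from "node:path";

// Roles that get a saved storageState file in auth.setup.ts.
export type Role = "platform_admin" | "customer_admin" | "customer_operator";

// Gitignored directory for the saved files, relative to packages/portal.
const AUTH_DIR = path.join("playwright", ".auth");

// "customer_operator" -> "playwright/.auth/customer-operator.json" (on POSIX)
export function authFile(role: Role): string {
  return path.join(AUTH_DIR, `${role.replace(/_/g, "-")}.json`);
}

// Usage in a spec: test.use({ storageState: authFile("customer_operator") });
```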
### Pattern 2: WebSocket Mocking for Chat Flow
**What:** Intercept the `/chat/ws/{conversationId}` WebSocket before the gateway is contacted. Accept the auth message, then simulate streaming tokens in reply to a user message.
**When to use:** Flow 4 (chat E2E test). The browser connects straight to the gateway at `ws://localhost:8001/chat/ws/{id}` (derived from `NEXT_PUBLIC_API_URL`, see Pitfall 6), so intercept at the browser level with a regex that matches the absolute URL.
```typescript
// Source: https://playwright.dev/docs/api/class-websocketroute
// e2e/flows/chat.spec.ts
import { test, expect } from "@playwright/test";

test("chat: send message → receive streaming response", async ({ page }) => {
  // No ws.connectToServer() call, so the socket is fully mocked and the gateway is never contacted
  await page.routeWebSocket(/\/chat\/ws\//, (ws) => {
    ws.onMessage((msg) => {
      // msg is string | Buffer; toString() handles both
      const data = JSON.parse(msg.toString());
      if (data.type === "auth") {
        // The gateway sends no reply to auth; it just proceeds
        return;
      }
      if (data.type === "message") {
        // Simulate the typing indicator
        ws.send(JSON.stringify({ type: "typing" }));
        // Simulate streaming tokens, 50 ms apart
        const tokens = ["Hello", " from", " your", " AI", " assistant!"];
        tokens.forEach((token, i) => {
          setTimeout(() => {
            ws.send(JSON.stringify({ type: "chunk", token }));
          }, i * 50);
        });
        // After the last chunk: the full response, then done
        setTimeout(() => {
          ws.send(JSON.stringify({
            type: "response",
            text: tokens.join(""),
            conversation_id: data.conversation_id,
          }));
          ws.send(JSON.stringify({ type: "done", text: tokens.join("") }));
        }, tokens.length * 50 + 100);
      }
    });
  });
  await page.goto("/chat?agentId=test-agent");
  await page.getByPlaceholder(/type a message/i).fill("Hello!");
  await page.keyboard.press("Enter");
  await expect(page.getByText("Hello from your AI assistant!")).toBeVisible({ timeout: 5000 });
});
```
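The mock sends four event types: `typing`, `chunk`, `response`, and `done`. The sketch below shows one way a client could fold them into what the user sees, which clarifies what the final `toBeVisible` assertion is waiting for. It is hypothetical: `ChatView`, `applyEvent`, and the rule that the server's full text replaces the concatenated chunks are assumptions, and the real `use-chat-socket.ts` may structure this differently.

```typescript
// Hypothetical client-side model of the chat stream; not the portal's actual hook.
type ServerEvent =
  | { type: "typing" }
  | { type: "chunk"; token: string }
  | { type: "response"; text: string; conversation_id?: string }
  | { type: "done"; text: string };

interface ChatView {
  typing: boolean; // typing indicator visible
  draft: string; // text assembled from chunks so far
  final: string | null; // full text once the server finishes
}

const initialView: ChatView = { typing: false, draft: "", final: null };

// Folds one server event into the view state without mutating it.
function applyEvent(view: ChatView, ev: ServerEvent): ChatView {
  switch (ev.type) {
    case "typing":
      return { ...view, typing: true };
    case "chunk":
      return { ...view, draft: view.draft + ev.token };
    case "response":
    case "done":
      // The server's full text replaces the concatenated chunks.
      return { typing: false, draft: ev.text, final: ev.text };
  }
}

// Replaying the same sequence the routeWebSocket mock sends:
const tokens = ["Hello", " from", " your", " AI", " assistant!"];
const events: ServerEvent[] = [
  { type: "typing" },
  ...tokens.map((token) => ({ type: "chunk" as const, token })),
  { type: "response", text: tokens.join("") },
  { type: "done", text: tokens.join("") },
];
const finalView = events.reduce(applyEvent, initialView);
```

Keeping the fold pure makes it easy to check mid-stream states (for example, the draft after two chunks) without a browser.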
### Pattern 3: Visual Regression at Multiple Viewports
**What:** Configure separate Playwright projects for each viewport, run snapshots only on chromium to avoid cross-browser rendering diffs.
**When to use:** QA-03. Visual regression baseline committed to repo; CI fails on diff.
```typescript
// Source: https://playwright.dev/docs/test-snapshots
// playwright.config.ts (visual projects section)
{
  name: "visual-desktop",
  use: {
    ...devices["Desktop Chrome"],
    viewport: { width: 1280, height: 800 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-tablet",
  use: {
    browserName: "chromium",
    viewport: { width: 768, height: 1024 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-mobile",
  use: {
    ...devices["iPhone 12"], // mobile UA + touch emulation; this descriptor defaults to WebKit
    browserName: "chromium", // keep visual regression on chromium only
    viewport: { width: 375, height: 812 },
  },
  testMatch: "e2e/visual/**",
},
```
Global threshold:
```typescript
// playwright.config.ts
expect: {
  toHaveScreenshot: {
    maxDiffPixelRatio: 0.02, // up to 2% of pixels may differ (absorbs antialiasing noise)
    threshold: 0.2, // per-pixel color-difference tolerance, from 0 (strict) to 1 (lax)
  },
},
```
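The two settings work at different levels. `threshold` decides whether a single pixel counts as changed, and `maxDiffPixelRatio` caps the share of changed pixels across the image. The sketch below illustrates that two-step check on raw RGBA buffers. It is a simplification, not Playwright's comparator: Playwright measures perceived color difference in YIQ space, while this uses the largest per-channel RGB difference, so exact counts will differ. `diffPixelRatio` and `screenshotMatches` are illustrative names.

```typescript
// Simplified illustration of threshold vs maxDiffPixelRatio (not Playwright's algorithm).
type RGBA = Uint8ClampedArray; // 4 bytes per pixel: R, G, B, A

// Fraction of pixels whose largest RGB channel difference (scaled to 0..1) exceeds threshold.
function diffPixelRatio(a: RGBA, b: RGBA, threshold: number): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error("images must have identical dimensions");
  }
  const pixels = a.length / 4;
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 3; c++) {
      // Alpha (c = 3) is ignored in this sketch.
      maxDelta = Math.max(maxDelta, Math.abs(a[i + c] - b[i + c]) / 255);
    }
    if (maxDelta > threshold) differing++;
  }
  return differing / pixels;
}

// Passes when the share of changed pixels stays within maxDiffPixelRatio.
function screenshotMatches(a: RGBA, b: RGBA, threshold = 0.2, maxDiffPixelRatio = 0.02): boolean {
  return diffPixelRatio(a, b, threshold) <= maxDiffPixelRatio;
}
```

With these defaults, a uniform faint shift across the whole image passes because no single pixel crosses `threshold`, while a few strongly changed pixels fail once they exceed 2% of the image.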
### Pattern 4: axe-core Fixture
**What:** Shared fixture that creates an AxeBuilder for each page, scoped to WCAG 2.1 AA, filtering results by impact level.
```typescript
// Source: https://playwright.dev/docs/accessibility-testing
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

// `axe` returns a fresh builder per call, scoped to WCAG 2.0/2.1 A and AA rules
export const test = base.extend<{ axe: () => AxeBuilder }>({
  axe: async ({ page }, use) => {
    const makeBuilder = () =>
      new AxeBuilder({ page }).withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"]);
    await use(makeBuilder);
  },
});
export { expect };

// e2e/accessibility/a11y.spec.ts
import { test, expect } from "../fixtures";

test("dashboard has no critical a11y violations", async ({ page, axe }) => {
  await page.goto("/dashboard");
  const results = await axe().analyze();
  const criticalViolations = results.violations.filter((v) => v.impact === "critical");
  const seriousViolations = results.violations.filter((v) => v.impact === "serious");
  expect(criticalViolations, "Critical a11y violations found").toHaveLength(0);
  if (seriousViolations.length > 0) {
    // "serious" is logged but non-blocking for beta
    console.warn("Serious a11y violations (non-blocking):", seriousViolations.map((v) => v.id));
  }
});
```
### Pattern 5: Lighthouse CI Config
**What:** `lighthouserc.json` drives `@lhci/cli autorun` in CI. Pages run headlessly against the built portal.
Source: https://googlechrome.github.io/lighthouse-ci/docs/configuration.html. File: `e2e/lighthouse/lighthouserc.json`.
`startServerCommand` has LHCI start the standalone server itself, since Playwright's `webServer` is stopped once the E2E step finishes. The URL list covers `/login` only, because authenticated routes redirect there (see Pitfall 5).
```json
{
  "ci": {
    "collect": {
      "startServerCommand": "node .next/standalone/server.js",
      "url": ["http://localhost:3000/login"],
      "numberOfRuns": 1,
      "settings": {
        "preset": "desktop",
        "chromeFlags": "--no-sandbox --disable-dev-shm-usage"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", {"minScore": 0.80}],
        "categories:accessibility": ["error", {"minScore": 0.80}],
        "categories:best-practices": ["error", {"minScore": 0.80}],
        "categories:seo": ["error", {"minScore": 0.80}]
      }
    },
    "upload": {
      "target": "filesystem",
      "outputDir": ".lighthouseci"
    }
  }
}
```
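Note: `error` at 0.80 means CI fails below 80; the 90 target is aspirational. LHCI takes one level per assertion key, so a 0.85 soft warning cannot be declared next to the 0.80 error for the same category. It needs a second `lhci assert` pass with a warn-level config, or a separate reporting step.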
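If the soft warning lives in a reporting step instead, the three thresholds from the locked decisions (fail below 0.80, warn below 0.85, target 0.90) map onto a small classifier. This is a hypothetical sketch: `classifyScore`, `summarize`, and the `"below-target"` verdict are illustrative names, and reading scores out of a saved Lighthouse report is left to the caller. Lighthouse category scores are on a 0 to 1 scale.

```typescript
// Hypothetical post-processing helper; Lighthouse category scores are 0..1.
type Verdict = "fail" | "warn" | "below-target" | "pass";

const FAIL_BELOW = 0.8; // the "error" minScore in lighthouserc.json
const WARN_BELOW = 0.85; // soft alert from the locked decisions
const TARGET = 0.9; // aspirational goal

export function classifyScore(score: number): Verdict {
  if (score < FAIL_BELOW) return "fail";
  if (score < WARN_BELOW) return "warn";
  if (score < TARGET) return "below-target";
  return "pass";
}

// Classifies every category in a { category: score } map.
export function summarize(scores: Record<string, number>): Record<string, Verdict> {
  const out: Record<string, Verdict> = {};
  for (const [category, score] of Object.entries(scores)) {
    out[category] = classifyScore(score);
  }
  return out;
}

// Example: summarize({ performance: 0.83, seo: 0.95 }) -> { performance: "warn", seo: "pass" }
```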
### Pattern 6: Playwright Config (Full)
```typescript
// packages/portal/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

const ADMIN_STATE = "playwright/.auth/platform-admin.json";

export default defineConfig({
  testDir: "./e2e",
  fullyParallel: false, // Stability in CI with shared DB state
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 1 : undefined,
  timeout: 30_000,
  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["junit", { outputFile: "playwright-results.xml" }],
    ["list"],
  ],
  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    serviceWorkers: "block", // Keeps Serwist from serving requests that page.route() must see
  },
  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.02,
      threshold: 0.2,
    },
  },
  projects: [
    // Auth setup runs first; every other project depends on it
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    // E2E flows: all 3 browsers, platform admin by default (specs may override storageState)
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    // Visual regression: chromium only, 3 viewports. devices["iPhone 12"] defaults to WebKit,
    // so the mobile project pins browserName and the 375x812 viewport explicitly.
    // Login-page snapshots can opt out of auth with test.use({ storageState: { cookies: [], origins: [] } }).
    {
      name: "visual-desktop",
      use: { browserName: "chromium", viewport: { width: 1280, height: 800 }, storageState: ADMIN_STATE },
      testMatch: "e2e/visual/**",
      dependencies: ["setup"],
    },
    {
      name: "visual-tablet",
      use: { browserName: "chromium", viewport: { width: 768, height: 1024 }, storageState: ADMIN_STATE },
      testMatch: "e2e/visual/**",
      dependencies: ["setup"],
    },
    {
      name: "visual-mobile",
      use: {
        ...devices["iPhone 12"],
        browserName: "chromium",
        viewport: { width: 375, height: 812 },
        storageState: ADMIN_STATE,
      },
      testMatch: "e2e/visual/**",
      dependencies: ["setup"],
    },
    // Accessibility: chromium only
    {
      name: "a11y",
      use: { ...devices["Desktop Chrome"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/accessibility/**",
    },
  ],
  webServer: {
    command: "node .next/standalone/server.js",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    env: {
      PORT: "3000",
      API_URL: process.env.API_URL ?? "http://localhost:8001",
      AUTH_SECRET: process.env.AUTH_SECRET ?? "test-secret-32-chars-minimum-len",
      AUTH_URL: "http://localhost:3000",
    },
  },
});
```
**Critical:** `serviceWorkers: "block"` is required because Serwist (the PWA service worker) can answer requests itself, and Playwright's `page.route()` does not intercept requests that a service worker handles.
### Pattern 7: Gitea Actions CI Pipeline
```yaml
# .gitea/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        ports:
          - 5432:5432 # publish to the host so steps can reach localhost:5432
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
    env:
      DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
      DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
      REDIS_URL: redis://localhost:6379/0
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv sync
      - run: uv run ruff check packages/ tests/
      - run: uv run ruff format --check packages/ tests/
      - run: uv run mypy --strict packages/
      - run: uv run pytest tests/ -x --tb=short
  portal:
    name: Portal E2E
    runs-on: ubuntu-latest
    needs: backend # E2E blocked until backend passes
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        ports:
          - 5432:5432
        options: --health-cmd pg_isready --health-interval 5s --health-retries 10
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - name: Install portal deps
        working-directory: packages/portal
        run: npm ci
      - name: Build portal
        working-directory: packages/portal
        run: |
          npm run build
          # Standalone output omits static assets; copy them next to server.js
          cp -r .next/static .next/standalone/.next/static
          cp -r public .next/standalone/public
        env:
          NEXT_PUBLIC_API_URL: http://localhost:8001
      - name: Install Playwright browsers
        working-directory: packages/portal
        run: npx playwright install --with-deps chromium firefox webkit
      - name: Start gateway (background)
        run: |
          pip install uv && uv sync
          uv run alembic upgrade head
          uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
        env:
          DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
          DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
          REDIS_URL: redis://localhost:6379/0
          LLM_POOL_URL: http://localhost:8004 # not running; LLM traffic is mocked in E2E
      - name: Wait for gateway
        run: timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
      - name: Run E2E tests
        working-directory: packages/portal
        run: npx playwright test e2e/flows/ e2e/accessibility/
        env:
          CI: "true"
          PLAYWRIGHT_BASE_URL: http://localhost:3000
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          E2E_ADMIN_EMAIL: ${{ secrets.E2E_ADMIN_EMAIL }}
          E2E_ADMIN_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }}
          E2E_CADMIN_EMAIL: ${{ secrets.E2E_CADMIN_EMAIL }}
          E2E_CADMIN_PASSWORD: ${{ secrets.E2E_CADMIN_PASSWORD }}
          E2E_OPERATOR_EMAIL: ${{ secrets.E2E_OPERATOR_EMAIL }}
          E2E_OPERATOR_PASSWORD: ${{ secrets.E2E_OPERATOR_PASSWORD }}
      - name: Run Lighthouse CI
        working-directory: packages/portal
        run: npx lhci autorun --config=e2e/lighthouse/lighthouserc.json
        env:
          LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
          # lhci starts the standalone server itself, so it needs the server's env too
          PORT: "3000"
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          AUTH_URL: http://localhost:3000
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: |
            packages/portal/playwright-report/
            packages/portal/playwright-results.xml
      - name: Upload Lighthouse report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-report
          path: packages/portal/.lighthouseci/
```
### Anti-Patterns to Avoid
- **Hardcoded IDs in selectors:** Use `getByRole`, `getByLabel`, `getByText` — never CSS `#id` or `[data-testid]` unless semantic selectors are unavailable. Semantic selectors are more resilient and double as accessibility checks.
- **Real LLM calls in E2E:** Never let E2E tests reach Ollama/OpenAI. Mock the WebSocket and gateway LLM calls. Real calls introduce flakiness and cost.
- **Superuser DB connections in test seeds:** The existing conftest uses `konstruct_app` role to preserve RLS. E2E seeds should call the FastAPI admin API endpoints, not connect directly to the DB.
- **Enabling service workers in tests:** Serwist intercepts all requests. Always set `serviceWorkers: "block"` in Playwright config.
- **Parallel workers with shared DB state:** Set `workers: 1` in CI. Tenant/agent mutations are not thread-safe across workers without per-worker isolation.
- **Running visual regression on all browsers:** Browser rendering engines produce expected pixel diffs. Visual regression on chromium only; cross-browser covered by functional E2E.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Screenshot diffs | Custom pixel comparator | `toHaveScreenshot()` built into Playwright | Handles baseline storage, update workflow, CI reporting |
| Accessibility scanning | Custom ARIA traversal | `@axe-core/playwright` | Maintained WCAG rule set; Deque reports axe-core finds about 57% of WCAG issues automatically |
| Performance score gating | Parsing Lighthouse JSON manually | `@lhci/cli assert` | Handles multi-run averaging, threshold config, exit codes |
| Auth state reuse | Logging in before every test | Playwright `storageState` | One login per role instead of one per test, which removes most login time from the suite |
| WS mock server | Running a real mock websocket server | `page.routeWebSocket()` | In-process: no extra server, no port conflicts, fewer moving parts |
## Common Pitfalls
### Pitfall 1: Auth.js HttpOnly Cookies
**What goes wrong:** Trying to authenticate by calling `/api/portal/auth/verify` directly with Playwright `request` — this bypasses Auth.js cookie-setting, so the browser session never exists.
**Why it happens:** Auth.js v5 JWT is set as `HttpOnly` secure cookie by the Next.js server, not by the FastAPI backend.
**How to avoid:** Always use Playwright's UI login flow (fill form → submit → wait for redirect) to let Next.js set the cookie. Then save with `storageState`.
**Warning signs:** Tests pass the login assertion but fail immediately after on authenticated pages.
### Pitfall 2: Serwist Service Worker Intercepting Test Traffic
**What goes wrong:** `page.route()` handlers never fire because the PWA service worker handles requests first.
**Why it happens:** Serwist registers a service worker that intercepts fetches within its scope, and `page.route()` does not see requests a service worker serves. Blocking service workers sends those requests through the network layer, where Playwright's routing applies.
**How to avoid:** Set `serviceWorkers: "block"` in `playwright.config.ts` under `use`.
**Warning signs:** Mock routes never called; tests see real responses or network errors.
### Pitfall 3: Next.js Standalone Output Path for webServer
**What goes wrong:** `command: "npm run start"` is unreliable here: Next.js warns that `next start` does not work with `output: "standalone"` and points to `node .next/standalone/server.js` instead.
**Why it happens:** The portal uses `output: "standalone"` in `next.config.ts`. The build produces `.next/standalone/server.js`, not the standard Next.js CLI server.
**How to avoid:** Use `command: "node .next/standalone/server.js"` in Playwright's `webServer` config. Copy static files if needed: the build step must run `cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`.
**Warning signs:** `webServer` process exits immediately; Playwright reports "server did not start".
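The two `cp -r` commands can also be expressed as a Node script, which avoids depending on a POSIX shell on every developer machine. This is a hypothetical sketch: `prepareStandalone` and `scripts/prepare-standalone.ts` are illustrative names, not something Next.js generates. It relies on `fs.cpSync`, which is available in Node 22 (the version the CI job installs).

```typescript
// scripts/prepare-standalone.ts (hypothetical script; Next.js does not ship it)
import * as fs from "node:fs";
import * as path from "node:path";

// Copies what `output: "standalone"` leaves out: .next/static and public/.
export function prepareStandalone(portalDir: string): void {
  const standalone = path.join(portalDir, ".next", "standalone");
  if (!fs.existsSync(standalone)) {
    throw new Error(`no standalone build at ${standalone}; run "next build" first`);
  }
  const copies: Array<[from: string, to: string]> = [
    [path.join(portalDir, ".next", "static"), path.join(standalone, ".next", "static")],
    [path.join(portalDir, "public"), path.join(standalone, "public")],
  ];
  for (const [from, to] of copies) {
    if (!fs.existsSync(from)) continue; // e.g. a portal without public/
    fs.mkdirSync(path.dirname(to), { recursive: true });
    fs.cpSync(from, to, { recursive: true });
  }
}

// Usage, from packages/portal after `npm run build`:
// prepareStandalone(process.cwd());
```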
### Pitfall 4: Visual Regression Baseline Committed Without CI Environment Lock
**What goes wrong:** Baselines created on a developer's Mac differ from Linux CI renderings (font rendering, subpixel AA, etc.).
**Why it happens:** Screenshot comparisons are pixel-exact. OS-level rendering differences cause 1-5% false failures.
**How to avoid:** Generate baselines inside the same Docker/Linux environment as CI. Run `npx playwright test --update-snapshots` on Linux (or in the Playwright Docker image) to commit initial baselines. Use `maxDiffPixelRatio: 0.02` to absorb minor remaining differences.
**Warning signs:** Visual tests pass locally but always fail in CI.
### Pitfall 5: Lighthouse Pages Behind Auth
**What goes wrong:** Lighthouse visits `/dashboard` and gets redirected to `/login` — scores an empty page.
**Why it happens:** Lighthouse runs in a fresh, unauthenticated browser session. LHCI has no built-in Auth.js integration; its documented route to authenticated pages is a `puppeteerScript` that logs in before collection.
**How to avoid:** For authenticated pages, either (a) audit only public pages (login, landing), (b) give LHCI a `puppeteerScript` that signs in through the login form, or (c) create a special unauthenticated preview mode. **For this project:** Run Lighthouse on `/login` only, plus any publicly accessible marketing pages. Skip `/dashboard` and `/chat` for Lighthouse.
**Warning signs:** Lighthouse scores `/dashboard` suspiciously well (e.g. 100 for accessibility) because it actually measured the `/login` page it was redirected to.
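A redirect can also be caught mechanically by comparing the requested URL with the URL Lighthouse ended up auditing. The sketch assumes the Lighthouse result JSON exposes `requestedUrl` plus `finalDisplayedUrl` (Lighthouse 10 and later) or `finalUrl` (older versions); `wasRedirected` is a hypothetical helper, and loading the saved JSON reports is left out.

```typescript
// Minimal slice of a Lighthouse result (LHR) JSON; only the URL fields are modelled.
interface LhrUrls {
  requestedUrl?: string;
  finalDisplayedUrl?: string; // Lighthouse 10+
  finalUrl?: string; // older Lighthouse versions
}

// True when the audited page is not the page that was requested, e.g. an
// auth-protected route that redirected to /login. Pathnames are compared so
// that query-string-only differences are not treated as redirects.
function wasRedirected(lhr: LhrUrls): boolean {
  const requested = lhr.requestedUrl;
  const landed = lhr.finalDisplayedUrl ?? lhr.finalUrl;
  if (!requested || !landed) return false;
  return new URL(requested).pathname !== new URL(landed).pathname;
}
```

A check like this could fail the Lighthouse step outright instead of letting a redirect produce a misleadingly high score.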
### Pitfall 6: WebSocket URL Resolution in Tests
**What goes wrong:** `page.routeWebSocket("/chat/ws/")` doesn't match because the portal derives the WS URL from `NEXT_PUBLIC_API_URL` (baked at build time), which points to `ws://localhost:8001`, not a relative path.
**Why it happens:** `use-chat-socket.ts` computes `WS_BASE` from `process.env.NEXT_PUBLIC_API_URL` and builds `ws://localhost:8001/chat/ws/{id}`.
**How to avoid:** Use a regex pattern: `page.routeWebSocket(/\/chat\/ws\//, handler)` — this matches the full absolute URL.
**Warning signs:** Chat mock never fires; test times out waiting for WS message.
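The sketch below reconstructs how a base URL like `NEXT_PUBLIC_API_URL` can turn into an absolute WebSocket URL, and why the regex still matches. It is hypothetical: `wsBase` and `chatSocketUrl` are illustrative names, and the real `use-chat-socket.ts` may build the URL differently.

```typescript
// Hypothetical reconstruction; the real use-chat-socket.ts may differ.
// Turns an http(s) API base into a ws(s) WebSocket base.
function wsBase(apiUrl: string): string {
  const url = new URL(apiUrl);
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";
  return url.toString().replace(/\/$/, ""); // drop the trailing slash URL adds
}

function chatSocketUrl(apiUrl: string, conversationId: string): string {
  return `${wsBase(apiUrl)}/chat/ws/${encodeURIComponent(conversationId)}`;
}

// Pattern passed to page.routeWebSocket(); a RegExp is tested against the full URL.
const CHAT_WS_PATTERN = /\/chat\/ws\//;

// chatSocketUrl("http://localhost:8001", "abc") === "ws://localhost:8001/chat/ws/abc"
```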
### Pitfall 7: Gitea Actions Runner Needs Docker
**What goes wrong:** Service containers fail to start because the Gitea runner is not configured with Docker access.
**Why it happens:** Gitea Actions service containers require Docker socket access on the runner.
**How to avoid:** Ensure the user that runs `act_runner` is in the host's `docker` group. Alternative: use `docker compose` in a setup step instead of service containers.
**Warning signs:** Job fails immediately with "Cannot connect to Docker daemon".
## Code Examples
### Seed Helper via API
```typescript
// e2e/helpers/seed.ts
// Uses Playwright's APIRequestContext to create test data via FastAPI endpoints.
// Must run BEFORE the storageState setup; it needs the platform admin's user id via E2E_ADMIN_ID.
import type { APIRequestContext } from "@playwright/test";

const API_URL = process.env.API_URL ?? "http://localhost:8001";

export async function seedTestTenant(
  request: APIRequestContext,
): Promise<{ tenantId: string; tenantSlug: string }> {
  const suffix = Math.random().toString(36).slice(2, 8);
  const res = await request.post(`${API_URL}/api/portal/tenants`, {
    headers: {
      "X-User-Id": process.env.E2E_ADMIN_ID!,
      "X-User-Role": "platform_admin",
      "X-Active-Tenant": "",
    },
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-tenant-${suffix}` },
  });
  // Fail here rather than with an undefined id in a later test
  if (!res.ok()) {
    throw new Error(`seed tenant failed: ${res.status()} ${await res.text()}`);
  }
  const body = (await res.json()) as { id: string; slug: string };
  return { tenantId: body.id, tenantSlug: body.slug };
}
```
### RBAC Test Pattern
```typescript
// e2e/flows/rbac.spec.ts
// Tests that the operator role is silently redirected, not shown a 403 page
import { test, expect } from "@playwright/test";

test.describe("RBAC enforcement", () => {
  test.use({ storageState: "playwright/.auth/customer-operator.json" });
  const restrictedPaths = ["/agents/new", "/billing", "/users"];
  for (const path of restrictedPaths) {
    test(`operator cannot access ${path}`, async ({ page }) => {
      await page.goto(path);
      // proxy.ts does a silent redirect, so the operator ends up on /dashboard
      await expect(page).toHaveURL(/\/dashboard/);
    });
  }
});
```
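When choosing RBAC test cases, it helps to state the redirect rule the spec assumes. The sketch below is hypothetical: the role names match the auth fixtures, but `rbacRedirect`, the prefix-matching rule, and the `/dashboard` target are assumptions about `proxy.ts`, which may use different path lists or matching. It shows why sub-paths such as `/billing/invoices` and look-alikes such as `/billingx` are worth covering.

```typescript
// Hypothetical model of the redirect the RBAC spec exercises; not the real proxy.ts.
type PortalRole = "platform_admin" | "customer_admin" | "customer_operator";

const OPERATOR_BLOCKED = ["/agents/new", "/billing", "/users"];

// Returns the path to redirect to, or null when the request may proceed.
function rbacRedirect(role: PortalRole, pathname: string): string | null {
  if (role !== "customer_operator") return null;
  // Match the exact path or anything beneath it, but not look-alike prefixes.
  const blocked = OPERATOR_BLOCKED.some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
  return blocked ? "/dashboard" : null;
}
```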
### Mobile Viewport Behavioral Test
```typescript
// e2e/flows/mobile.spec.ts
import { test, expect } from "@playwright/test";

test("mobile: bottom tab bar renders, sidebar hidden", async ({ page }) => {
  await page.setViewportSize({ width: 375, height: 812 });
  await page.goto("/dashboard");
  // Bottom tab bar visible (assumes an accessible name containing "mobile")
  await expect(page.getByRole("navigation", { name: /mobile/i })).toBeVisible();
  // Desktop sidebar hidden (assumes an accessible name containing "sidebar")
  await expect(page.getByRole("navigation", { name: /sidebar/i })).not.toBeVisible();
});
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Cypress for Next.js E2E | Playwright (official Next.js recommendation) | 2023-2024 | Cross-browser, better WS support, no iframe limitations |
| `lighthouse` npm module with custom scripts | `@lhci/cli autorun` | 2020+ | Automated multi-run averaging, assertions, CI reporting |
| `axe-playwright` (community) | `@axe-core/playwright` (official Deque) | 2022+ | Official package maintained by the axe-core team; no third-party wrapper |
| `next start` for E2E server | `node .next/standalone/server.js` | Next.js 12+ standalone | Required when `output: "standalone"` is set |
| middleware.ts | proxy.ts | Next.js 16 | Next.js 16 renamed middleware file |
**Deprecated/outdated:**
- `cypress/integration/` directory: Cypress renamed it to `cypress/e2e/` in v10 (not relevant here, since we are not using Cypress)
- `@playwright/test` `globalSetup` string path: still valid, but project dependencies with a `setup` project (available since Playwright 1.31) are the preferred pattern
- `installSerwist()`: Replaced by `new Serwist() + addEventListeners()` in serwist v9 (already applied in Phase 8)
## Open Questions
1. **Lighthouse on authenticated pages**
- What we know: Lighthouse runs as unauthenticated — authenticated pages redirect to `/login`
- What's unclear: Whether an LHCI `puppeteerScript` that signs in through the Auth.js form (LHCI's documented approach for authenticated pages) is reliable enough for CI
- Recommendation: Scope Lighthouse to `/login` only for QA-02. Dashboard/chat performance validated manually or via Web Vitals tracking in production.
2. **Visual regression baseline generation environment**
- What we know: OS-level rendering differences cause false failures
- What's unclear: Whether the Gitea runner is Linux or Mac
- Recommendation: Wave 0 task generates baselines inside the CI Docker container (Linux), commits them. Dev machines use `--update-snapshots` only deliberately.
3. **Celery worker in E2E**
- What we know: The chat WebSocket flow uses Redis pub-sub to deliver responses from the Celery worker
- What's unclear: Whether E2E should run the Celery worker (real pipeline, slow) or mock the WS entirely (fast but less realistic)
- Recommendation: Mock the WebSocket entirely via `page.routeWebSocket()`. This tests the frontend streaming UX without depending on Celery. Add a separate smoke test that hits the gateway `/health` endpoint to verify service health in CI.
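The CI pipeline already polls `/health` with a `curl` loop. The same check can live in TypeScript, for example in a smoke test or setup step. This is a hypothetical sketch: `waitForHealthy` and its options are illustrative, `fetchFn` is injectable so the helper can be tested offline, and the default uses the global `fetch` available in Node 18 and later.

```typescript
// Hypothetical helper for a smoke test or setup step; not part of Playwright.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

interface WaitOptions {
  timeoutMs?: number;
  intervalMs?: number;
  fetchFn?: FetchLike;
}

// Polls `url` until it answers with a 2xx status, like the CI curl loop.
async function waitForHealthy(url: string, opts: WaitOptions = {}): Promise<void> {
  const { timeoutMs = 30_000, intervalMs = 1_000, fetchFn = (u: string) => fetch(u) } = opts;
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    try {
      if ((await fetchFn(url)).ok) return;
    } catch {
      // Connection refused while the gateway boots: keep polling.
    }
    if (Date.now() >= deadline) {
      throw new Error(`${url} not healthy after ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage: await waitForHealthy("http://localhost:8001/health");
```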
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework (backend) | pytest 8.3+ / pytest-asyncio (existing, all tests pass) |
| Framework (E2E) | @playwright/test ^1.51 (to be installed) |
| Config file (E2E) | `packages/portal/playwright.config.ts` — Wave 0 |
| Quick run (backend) | `uv run pytest tests/unit -x --tb=short` |
| Full suite (backend) | `uv run pytest tests/ -x --tb=short` |
| E2E run | `cd packages/portal && npx playwright test` |
| Visual update | `cd packages/portal && npx playwright test --update-snapshots` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| QA-01 | 7 critical user flows pass | E2E Playwright | `npx playwright test e2e/flows/ --project=chromium` | Wave 0 |
| QA-02 | Lighthouse >= 90 on key pages | Lighthouse CI | `npx lhci autorun --config=e2e/lighthouse/lighthouserc.json` | Wave 0 |
| QA-03 | Visual snapshots pass at 3 viewports | Visual regression | `npx playwright test e2e/visual/` | Wave 0 |
| QA-04 | Zero critical a11y violations | Accessibility scan | `npx playwright test e2e/accessibility/` | Wave 0 |
| QA-05 | All E2E flows pass on 3 browsers | Cross-browser E2E | `npx playwright test e2e/flows/` (all projects) | Wave 0 |
| QA-06 | Empty/error/loading states correct | E2E Playwright | Covered within flow specs via API mocking | Wave 0 |
| QA-07 | CI pipeline runs in Gitea Actions | CI workflow | `.gitea/workflows/ci.yml` | Wave 0 |
### Sampling Rate
- **Per task commit:** `cd packages/portal && npx playwright test e2e/flows/login.spec.ts --project=chromium`
- **Per wave merge:** `cd packages/portal && npx playwright test e2e/flows/ --project=chromium`
- **Phase gate:** Full suite (all projects + accessibility + visual) green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `packages/portal/playwright.config.ts` — E2E framework config
- [ ] `packages/portal/e2e/auth.setup.ts` — Auth state generation for 3 roles
- [ ] `packages/portal/e2e/fixtures.ts` — Shared test fixtures (axe, auth, API helpers)
- [ ] `packages/portal/e2e/helpers/seed.ts` — Test data seeding via API
- [ ] `packages/portal/e2e/flows/*.spec.ts` — 7 flow spec files
- [ ] `packages/portal/e2e/accessibility/a11y.spec.ts` — axe-core scans
- [ ] `packages/portal/e2e/visual/snapshots.spec.ts` — visual regression specs
- [ ] `packages/portal/e2e/lighthouse/lighthouserc.json` — Lighthouse CI config
- [ ] `.gitea/workflows/ci.yml` — CI pipeline
- [ ] `packages/portal/playwright/.auth/.gitkeep` — Directory for saved auth state (gitignored content)
- [ ] Framework install: `cd packages/portal && npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli && npx playwright install --with-deps`
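- [ ] Baseline snapshots: generate them by running `npx playwright test e2e/visual/ --update-snapshots` on Linux. Playwright suffixes screenshot baselines with the OS platform, so baselines generated on macOS or Windows will not be reused by a Linux CI run.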
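
The Wave 0 config has to connect the `auth.setup.ts` project to the three browser projects that QA-05 requires. Below is a sketch of `packages/portal/playwright.config.ts` that follows the setup-project pattern from the Playwright auth docs. The port, the `npm run start` script, and the `playwright/.auth/admin.json` file name are assumptions, not decisions recorded in this plan.

```typescript
import { defineConfig, devices } from '@playwright/test';

// Assumption: the portal is served on port 3000 (the Next.js default).
const BASE_URL = process.env.PLAYWRIGHT_BASE_URL ?? 'http://localhost:3000';
// Hypothetical storageState file written by e2e/auth.setup.ts for the default role.
const DEFAULT_AUTH = 'playwright/.auth/admin.json';
const isCI = !!process.env.CI;

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  forbidOnly: isCI,          // fail CI if a stray test.only is committed
  retries: isCI ? 2 : 0,
  workers: isCI ? 1 : undefined,
  reporter: isCI ? [['list'], ['html', { open: 'never' }]] : 'html',
  use: {
    baseURL: BASE_URL,
    trace: 'on-first-retry', // record a trace when a failing test is retried the first time
  },
  projects: [
    // Logs in each role once and saves storageState under playwright/.auth/.
    { name: 'setup', testMatch: /auth\.setup\.ts/ },
    {
      name: 'chromium',
      use: { ...devices['Desktop Chrome'], storageState: DEFAULT_AUTH },
      dependencies: ['setup'],
    },
    {
      name: 'firefox',
      use: { ...devices['Desktop Firefox'], storageState: DEFAULT_AUTH },
      dependencies: ['setup'],
    },
    {
      name: 'webkit',
      use: { ...devices['Desktop Safari'], storageState: DEFAULT_AUTH },
      dependencies: ['setup'],
    },
  ],
  webServer: {
    command: 'npm run start', // assumption: serves an already-built portal
    url: BASE_URL,
    reuseExistingServer: !isCI,
  },
});
```

Each browser project declares `dependencies: ['setup']`, so the quick command `npx playwright test --project=chromium` still runs the auth setup first. Specs for the other two roles can switch state with `test.use({ storageState: ... })`.
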
## Sources
### Primary (HIGH confidence)
- https://playwright.dev/docs/auth — storageState, setup projects, multiple roles
- https://playwright.dev/docs/api/class-websocketroute — WebSocket mocking API
- https://playwright.dev/docs/test-snapshots — toHaveScreenshot, maxDiffPixelRatio
- https://playwright.dev/docs/accessibility-testing — @axe-core/playwright integration
- https://playwright.dev/docs/ci — CI configuration, Docker image, workers
- https://googlechrome.github.io/lighthouse-ci/docs/configuration.html — minScore assertions format
### Secondary (MEDIUM confidence)
- https://googlechrome.github.io/lighthouse-ci/docs/getting-started.html — lhci autorun setup
- https://playwright.dev/docs/mock — page.route() and page.routeWebSocket() overview
- Gitea Actions docs (forum.gitea.com) — confirmed GitHub Actions YAML compatibility, Docker socket requirements
### Tertiary (LOW confidence)
- WebSearch result: Gitea runner Docker group requirement — mentioned across multiple community posts, not in official docs
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — verified against official Playwright, @axe-core, and LHCI docs
- Architecture: HIGH — patterns derived directly from official Playwright documentation
- Pitfalls: HIGH (pitfalls 1-6 from direct codebase inspection + official docs); MEDIUM (pitfall 7 from community sources)
**Research date:** 2026-03-25
**Valid until:** 2026-06-25 (~90 days; Playwright and Next.js are fast-moving, but breaking changes are rare)


@@ -0,0 +1,78 @@
---
phase: 9
slug: testing-qa
status: draft
nyquist_compliant: false
wave_0_complete: false
created: 2026-03-26
---
# Phase 9 — Validation Strategy
> Per-phase validation contract for feedback sampling during execution.
---
## Test Infrastructure
| Property | Value |
|----------|-------|
| **Framework** | Playwright + @axe-core/playwright + @lhci/cli |
| **Config file** | `packages/portal/playwright.config.ts` |
| **Quick run command** | `cd packages/portal && npx playwright test --project=chromium` |
| **Full suite command** | `cd packages/portal && npx playwright test` |
| **Estimated runtime** | ~3 minutes |
---
## Sampling Rate
- **After every task commit:** Run `npx playwright test --project=chromium` (quick)
- **After every plan wave:** Run full suite (all browsers)
- **Before `/gsd:verify-work`:** Full suite green + Lighthouse scores passing
- **Max feedback latency:** 60 seconds (single browser)
---
## Per-Task Verification Map
| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
|---------|------|------|-------------|-----------|-------------------|-------------|--------|
| 09-xx | 01 | 1 | QA-01 | e2e | `npx playwright test` | ❌ W0 | ⬜ pending |
| 09-xx | 02 | 1 | QA-02 | lighthouse | `npx lhci autorun --config=e2e/lighthouse/lighthouserc.json` | ❌ W0 | ⬜ pending |
| 09-xx | 02 | 1 | QA-03 | visual | `npx playwright test e2e/visual/` (compare against committed baselines) | ❌ W0 | ⬜ pending |
| 09-xx | 02 | 1 | QA-04 | a11y | `npx playwright test` (axe checks) | ❌ W0 | ⬜ pending |
| 09-xx | 01 | 1 | QA-05 | cross-browser | `npx playwright test` (3 projects) | ❌ W0 | ⬜ pending |
| 09-xx | 01 | 1 | QA-06 | e2e | `npx playwright test` (state tests) | ❌ W0 | ⬜ pending |
| 09-xx | 03 | 2 | QA-07 | ci | Gitea Actions pipeline | ❌ W0 | ⬜ pending |
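
For QA-02, LHCI expresses the `>= 90` target as one `minScore` assertion per Lighthouse category, using Lighthouse's 0-1 score scale. The sketch below is a hypothetical TypeScript helper that builds that structure and prints it as JSON in the shape `e2e/lighthouse/lighthouserc.json` would take. The page URLs, the category list, and `numberOfRuns: 3` are illustrative assumptions.

```typescript
// Assertion levels accepted by LHCI assertions.
type AssertionLevel = 'off' | 'warn' | 'error';
type CategoryAssertion = [AssertionLevel, { minScore: number }];

// Lighthouse category scores range from 0 to 1, so a ">= 90" target becomes minScore 0.9.
function categoryAssertions(
  categories: readonly string[],
  targetPercent: number,
  level: AssertionLevel = 'error',
): Record<string, CategoryAssertion> {
  const minScore = targetPercent / 100;
  const assertions: Record<string, CategoryAssertion> = {};
  for (const category of categories) {
    assertions[`categories:${category}`] = [level, { minScore }];
  }
  return assertions;
}

// Hypothetical key pages; the real list belongs in lighthouserc.json.
const KEY_PAGES: string[] = [
  'http://localhost:3000/login',
  'http://localhost:3000/dashboard',
];

const lighthouserc = {
  ci: {
    collect: { url: KEY_PAGES, numberOfRuns: 3 },
    assert: {
      assertions: categoryAssertions(
        ['performance', 'accessibility', 'best-practices', 'seo'],
        90,
      ),
    },
  },
};

// Print JSON that could be saved as e2e/lighthouse/lighthouserc.json.
console.log(JSON.stringify(lighthouserc, null, 2));
```

The `error` level matters here. A failing `error` assertion makes `lhci autorun` exit non-zero, which is what lets the CI job gate on the score. A failing `warn` assertion is only reported.
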
---
## Wave 0 Requirements
- [ ] `cd packages/portal && npm install -D @playwright/test @axe-core/playwright @lhci/cli`
- [ ] `npx playwright install --with-deps` (browser binaries plus their OS dependencies)
- [ ] `packages/portal/playwright.config.ts` — Playwright configuration
- [ ] `packages/portal/e2e/` — test directory structure
---
## Manual-Only Verifications
| Behavior | Requirement | Why Manual | Test Instructions |
|----------|-------------|------------|-------------------|
| CI pipeline runs on push to main | QA-07 | Requires Gitea runner | Push a commit, verify pipeline starts and completes |
| Visual regression diffs reviewed | QA-03 | Human judgment on acceptable diffs | Review Playwright HTML report after baseline update |
---
## Validation Sign-Off
- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
- [ ] Wave 0 covers all MISSING references
- [ ] No watch-mode flags
- [ ] Feedback latency < 60s
- [ ] `nyquist_compliant: true` set in frontmatter
**Approval:** pending