# Phase 9: Testing & QA - Research
**Researched:** 2026-03-25
**Domain:** Playwright E2E, Lighthouse CI, visual regression, axe-core accessibility, Gitea Actions CI
**Confidence:** HIGH
## Summary
Phase 9 is a greenfield testing layer added on top of a fully-built portal (Next.js 16 standalone, FastAPI gateway, Celery worker). No Playwright config exists yet — the Playwright MCP plugin is installed for manual use but there is no `playwright.config.ts`, no `tests/e2e/` content, and no `.gitea/workflows/` CI file. Everything must be created from scratch.
The core challenges are: (1) Auth.js v5 JWT sessions that Playwright must obtain and reuse across multiple role fixtures (platform_admin, customer_admin, customer_operator); (2) the WebSocket chat flow at `/chat/ws/{conversation_id}` that needs mocking via `page.routeWebSocket()`; (3) Lighthouse CI that requires a running Next.js server (standalone output complicates `startServerCommand`); and (4) a sub-5-minute pipeline on Gitea Actions that is nearly syntax-identical to GitHub Actions.
**Primary recommendation:** Place Playwright config and tests inside `packages/portal/` (Next.js co-location pattern), use `storageState` with three saved auth fixtures for roles, mock the WebSocket endpoint with `page.routeWebSocket()` for the chat flow, and run `@lhci/cli` in a separate post-build CI stage.
<user_constraints>
## User Constraints (from CONTEXT.md)
### Locked Decisions
All decisions at Claude's discretion — user trusts judgment.
- Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
- Critical flows to test (priority order):
  1. Login → dashboard loads → session persists
  2. Create tenant → tenant appears in list
  3. Deploy template agent → agent appears in employees list
  4. Chat: open conversation → send message → receive streaming response (mock LLM)
  5. RBAC: operator cannot access /agents/new, /billing, /users
  6. Language switcher → UI updates to selected language
  7. Mobile viewport: bottom tab bar renders, sidebar hidden
- LLM responses mocked in E2E tests (no real Ollama/API calls)
- Test data: seed a test tenant + test user via API calls in test setup, clean up after
- Lighthouse targets: >= 90 (fail at 80, warn at 85)
- Pages: login, dashboard, chat, agents/new
- Visual regression at 3 viewports: desktop 1280x800, tablet 768x1024, mobile 375x812
- Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
- Baseline snapshots committed to repo
- axe-core via @axe-core/playwright, zero critical violations required
- "serious" violations logged as warnings (not blockers for beta)
- Keyboard navigation test: Tab through login form, chat input, nav items
- Cross-browser: chromium, firefox, webkit
- Visual regression: chromium only
- Gitea Actions, triggers: push to main, PR to main
- Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
- Docker Compose for CI infra
- JUnit XML + HTML trace viewer reports
- Fail-fast: lint/type errors block everything; unit test failures block E2E
- Target: < 5 min pipeline
### Claude's Discretion
- Playwright config details (timeouts, retries, parallelism)
- Test file organization (by feature vs by page)
- Fixture/helper patterns for auth, tenant setup, API mocking
- Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
- Whether to include a smoke test for the WebSocket chat connection
- Visual regression threshold (pixel diff tolerance)
### Deferred Ideas (OUT OF SCOPE)
None — discussion stayed within phase scope
</user_constraints>
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| QA-01 | Playwright E2E tests cover all critical user flows (login, tenant CRUD, agent deploy, chat, billing, RBAC) | Playwright storageState auth fixtures + routeWebSocket for chat mock |
| QA-02 | Lighthouse scores >= 90 for performance, accessibility, best practices, SEO on key pages | @lhci/cli with minScore assertions per category |
| QA-03 | Visual regression snapshots at desktop/tablet/mobile for all key pages | toHaveScreenshot with maxDiffPixelRatio, viewports per project |
| QA-04 | axe-core accessibility audit passes with zero critical violations across all pages | @axe-core/playwright AxeBuilder with impact filter |
| QA-05 | E2E tests pass on Chrome, Firefox, Safari (WebKit) | Playwright projects array with three browser engines |
| QA-06 | Empty states, error states, loading states tested and rendered correctly | Dedicated test cases + API mocking for empty/error responses |
| QA-07 | CI-ready test suite runnable in Gitea Actions pipeline | .gitea/workflows/ci.yml with Docker Compose service containers |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| @playwright/test | ^1.51 | E2E + visual regression + accessibility runner | Official Next.js recommendation, cross-browser built-in, no extra dependencies |
| @axe-core/playwright | ^4.10 | Accessibility scanning within Playwright tests | Official Deque package, integrates directly with Playwright page objects |
| @lhci/cli | ^0.15 | Lighthouse CI score assertions | Google-maintained, headless Lighthouse, assertion config via lighthouserc |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| axe-html-reporter | ^2.2 | HTML accessibility reports | When you want human-readable a11y reports attached to CI artifacts |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| @lhci/cli | lighthouse npm module directly | @lhci/cli handles multi-run averaging, assertions, and CI upload; raw lighthouse requires custom scripting |
| @axe-core/playwright | axe-playwright (third-party) | @axe-core/playwright is the official Deque package; axe-playwright is a community wrapper with same API but extra dep |
**Installation (portal):**
```bash
cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit
```
## Architecture Patterns
### Recommended Project Structure
```
packages/portal/
├── playwright.config.ts          # Main config: projects, webServer, globalSetup
├── e2e/
│   ├── auth.setup.ts             # Global setup: save storageState per role
│   ├── fixtures.ts               # Extended test: auth fixtures, axe builder, API helpers
│   ├── helpers/
│   │   ├── seed.ts               # Seed test tenant + user via API, return IDs
│   │   └── cleanup.ts            # Delete seeded data after test suite
│   ├── flows/
│   │   ├── login.spec.ts         # Flow 1: login → dashboard loads → session persists
│   │   ├── tenant-crud.spec.ts   # Flow 2: create tenant → appears in list
│   │   ├── agent-deploy.spec.ts  # Flow 3: deploy template → appears in employees
│   │   ├── chat.spec.ts          # Flow 4: open chat → send msg → streaming response (mocked WS)
│   │   ├── rbac.spec.ts          # Flow 5: operator access denied to restricted pages
│   │   ├── i18n.spec.ts          # Flow 6: language switcher → UI updates
│   │   └── mobile.spec.ts        # Flow 7: mobile viewport → bottom tab bar, sidebar hidden
│   ├── accessibility/
│   │   └── a11y.spec.ts          # axe-core scan on every key page, keyboard nav test
│   ├── visual/
│   │   └── snapshots.spec.ts     # Visual regression at 3 viewports (chromium only)
│   └── lighthouse/
│       └── lighthouserc.json     # @lhci/cli config: URLs, score thresholds
├── playwright/.auth/             # gitignored: saved storageState files
│   ├── platform-admin.json
│   ├── customer-admin.json
│   └── customer-operator.json
└── __snapshots__/                # Committed baseline screenshots
.gitea/
└── workflows/
    └── ci.yml                    # Pipeline: lint → typecheck → pytest → build → E2E → lhci
```
### Pattern 1: Auth.js v5 storageState with Multiple Roles
**What:** Authenticate each role once in a global setup project, save to JSON. All E2E tests consume the saved state — no repeated login UI interactions.
**When to use:** Any test that requires a logged-in user. Each spec declares which role it needs via `test.use({ storageState })`.
**Key insight for Auth.js v5:** The credentials provider calls the FastAPI `/api/portal/auth/verify` endpoint. Playwright must fill the login form (not call the API directly) because `next-auth` sets `HttpOnly` session cookies that only the browser can hold. The storageState captures those cookies.
```typescript
// Source: https://playwright.dev/docs/auth
// e2e/auth.setup.ts
import { test as setup } from "@playwright/test";
import path from "path";

const PLATFORM_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/platform-admin.json");
const CUSTOMER_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-admin.json");
const OPERATOR_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-operator.json");

setup("authenticate as platform admin", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_ADMIN_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_ADMIN_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  // Persist cookies + localStorage so specs can reuse the session
  await page.context().storageState({ path: PLATFORM_ADMIN_AUTH });
});

setup("authenticate as customer admin", async ({ page }) => {
  // Credentials of the seeded customer_admin user are passed in via env
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_CADMIN_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_CADMIN_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: CUSTOMER_ADMIN_AUTH });
});

// Assumption: E2E_OPERATOR_EMAIL / E2E_OPERATOR_PASSWORD are placeholder env var
// names for the seeded customer_operator user; rename them to match the seed step.
setup("authenticate as customer operator", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_OPERATOR_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_OPERATOR_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: OPERATOR_AUTH });
});
```
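The storageState paths appear as string literals in several places (setup file, project config, individual specs). One way to keep them in sync is a small lookup keyed by role. This is a minimal sketch: the role names come from the summary above, while `storageStateFor` and `AUTH_DIR` are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper (e.g. e2e/helpers/auth-paths.ts); not part of Playwright.
type Role = "platform_admin" | "customer_admin" | "customer_operator";

// Directory is relative to packages/portal/, matching the paths used in the config.
const AUTH_DIR = "playwright/.auth";

// Maps a role name to its saved storageState file,
// e.g. customer_operator -> playwright/.auth/customer-operator.json
export function storageStateFor(role: Role): string {
  return `${AUTH_DIR}/${role.replace(/_/g, "-")}.json`;
}
```

A spec could then declare its role with `test.use({ storageState: storageStateFor("customer_operator") })` instead of repeating the path.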
### Pattern 2: WebSocket Mocking for Chat Flow
**What:** Intercept the `/chat/ws/{conversationId}` WebSocket before the gateway is contacted. Respond to the auth message, then simulate streaming tokens on a user message.
**When to use:** Flow 4 (chat E2E test). The gateway WebSocket endpoint at `ws://localhost:8001/chat/ws/{id}` is routed via the Next.js API proxy — intercept at the browser level.
```typescript
// Source: https://playwright.dev/docs/api/class-websocketroute
// e2e/flows/chat.spec.ts
import { test, expect } from "@playwright/test";

test("chat: send message → receive streaming response", async ({ page }) => {
  // The handler never calls connectToServer(), so the socket is fully mocked
  await page.routeWebSocket(/\/chat\/ws\//, (ws) => {
    ws.onMessage((msg) => {
      const data = JSON.parse(msg as string);
      if (data.type === "auth") {
        // Auth frame: no response needed, the gateway just proceeds
        return;
      }
      if (data.type === "message") {
        // Simulate typing indicator
        ws.send(JSON.stringify({ type: "typing" }));
        // Simulate streaming tokens, one every 50 ms
        const tokens = ["Hello", " from", " your", " AI", " assistant!"];
        tokens.forEach((token, i) => {
          setTimeout(() => {
            ws.send(JSON.stringify({ type: "chunk", token }));
          }, i * 50);
        });
        // After the last chunk, send the final response and the done marker
        setTimeout(() => {
          ws.send(JSON.stringify({
            type: "response",
            text: tokens.join(""),
            conversation_id: data.conversation_id,
          }));
          ws.send(JSON.stringify({ type: "done", text: tokens.join("") }));
        }, tokens.length * 50 + 100);
      }
    });
  });

  await page.goto("/chat?agentId=test-agent");
  await page.getByPlaceholder(/type a message/i).fill("Hello!");
  await page.keyboard.press("Enter");
  await expect(page.getByText("Hello from your AI assistant!")).toBeVisible({ timeout: 5000 });
});
```
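If several chat specs need the same mocked stream, the frame sequence can be built as data and replayed, which keeps the route handler short. The sketch below assumes the message shapes shown above (`typing`, `chunk`, `response`, `done`); `buildStreamFrames` and `playFrames` are hypothetical helper names.

```typescript
// Hypothetical helpers; the frame shapes mirror the mock in chat.spec.ts.
type MockFrame = { delayMs: number; payload: Record<string, unknown> };

// Builds the ordered frames for one streamed reply.
export function buildStreamFrames(
  tokens: string[],
  conversationId: string,
  stepMs = 50,
): MockFrame[] {
  const text = tokens.join("");
  const endMs = tokens.length * stepMs + 100; // small gap after the last chunk
  return [
    { delayMs: 0, payload: { type: "typing" } },
    ...tokens.map((token, i) => ({
      delayMs: i * stepMs,
      payload: { type: "chunk", token },
    })),
    { delayMs: endMs, payload: { type: "response", text, conversation_id: conversationId } },
    { delayMs: endMs, payload: { type: "done", text } },
  ];
}

// Schedules each frame; `send` receives the serialized JSON string.
export function playFrames(frames: MockFrame[], send: (json: string) => void): void {
  for (const frame of frames) {
    setTimeout(() => send(JSON.stringify(frame.payload)), frame.delayMs);
  }
}
```

Inside the `routeWebSocket` handler this would be called as `playFrames(buildStreamFrames(tokens, data.conversation_id), (json) => ws.send(json))`.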
### Pattern 3: Visual Regression at Multiple Viewports
**What:** Configure separate Playwright projects for each viewport, run snapshots only on chromium to avoid cross-browser rendering diffs.
**When to use:** QA-03. Visual regression baseline committed to repo; CI fails on diff.
```typescript
// Source: https://playwright.dev/docs/test-snapshots
// playwright.config.ts (visual projects section)
{
  name: "visual-desktop",
  use: {
    ...devices["Desktop Chrome"],
    viewport: { width: 1280, height: 800 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-tablet",
  use: {
    browserName: "chromium",
    viewport: { width: 768, height: 1024 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-mobile",
  use: {
    ...devices["iPhone 12"],
    // The iPhone 12 descriptor defaults to WebKit; force chromium so all
    // visual baselines come from the same engine
    browserName: "chromium",
    viewport: { width: 375, height: 812 },
  },
  testMatch: "e2e/visual/**",
},
```
Global threshold:
```typescript
// playwright.config.ts
expect: {
  toHaveScreenshot: {
    maxDiffPixelRatio: 0.02, // 2% tolerance, accounts for antialiasing
    threshold: 0.2,          // pixel color threshold (0–1)
  },
},
```
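To get a feel for what a 2% ratio permits, the tolerance can be converted into an approximate pixel count per viewport. This is a back-of-the-envelope sketch that assumes viewport-sized screenshots; full-page captures cover a larger area and therefore allow proportionally more differing pixels. `pixelBudget` is a hypothetical helper, not a Playwright API.

```typescript
type Viewport = { name: string; width: number; height: number };

// The three viewports from the locked decisions
const VIEWPORTS: Viewport[] = [
  { name: "desktop", width: 1280, height: 800 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 375, height: 812 },
];

// Approximate number of pixels that may differ before the comparison fails
export function pixelBudget(v: Viewport, maxDiffPixelRatio: number): number {
  return Math.floor(v.width * v.height * maxDiffPixelRatio);
}

for (const v of VIEWPORTS) {
  console.log(`${v.name}: up to ${pixelBudget(v, 0.02)} differing pixels`);
}
```

On the desktop viewport this works out to roughly 20,000 pixels, which is enough to hide a small missing icon, so a tighter ratio may be worth considering for pages with little visual noise.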
### Pattern 4: axe-core Fixture
**What:** Shared fixture that creates an AxeBuilder for each page, scoped to WCAG 2.1 AA, filtering results by impact level.
```typescript
// Source: https://playwright.dev/docs/accessibility-testing
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

export const test = base.extend<{ axe: () => AxeBuilder }>({
  axe: async ({ page }, use) => {
    // Factory so each call scans the page in its current state
    const makeBuilder = () =>
      new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21aa"]);
    await use(makeBuilder);
  },
});

// In a test (for example e2e/accessibility/a11y.spec.ts):
test("login page has no critical a11y violations", async ({ page, axe }) => {
  await page.goto("/login");
  const results = await axe().analyze();
  const criticalViolations = results.violations.filter(v => v.impact === "critical");
  const seriousViolations = results.violations.filter(v => v.impact === "serious");
  expect(criticalViolations, "Critical a11y violations found").toHaveLength(0);
  if (seriousViolations.length > 0) {
    console.warn("Serious a11y violations (non-blocking):", seriousViolations);
  }
});
```
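Because the same critical/serious split is needed on every scanned page, it may help to factor it into a pure function that also produces a readable summary for the assertion message. The sketch below uses a minimal local `Violation` shape (real axe-core results carry more fields); `triageViolations` is a hypothetical name.

```typescript
// Minimal local shape; axe-core's result objects include more fields.
type Violation = { id: string; impact?: string | null; nodes: unknown[] };

export function triageViolations(violations: Violation[]) {
  // "critical" blocks the build, "serious" is logged as a warning
  const blocking = violations.filter((v) => v.impact === "critical");
  const warnings = violations.filter((v) => v.impact === "serious");
  const describe = (v: Violation) => `${v.id} (${v.impact}): ${v.nodes.length} node(s)`;
  return {
    blocking,
    warnings,
    blockingSummary: blocking.map(describe).join("\n"),
    warningSummary: warnings.map(describe).join("\n"),
  };
}
```

In a spec, `expect(triage.blocking, triage.blockingSummary).toHaveLength(0)` would then print the offending rule IDs directly in the failure output.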
### Pattern 5: Lighthouse CI Config
**What:** `lighthouserc.json` drives `@lhci/cli autorun` in CI. Pages run headlessly against the built portal.
**Source:** https://googlechrome.github.io/lighthouse-ci/docs/configuration.html
**File:** `e2e/lighthouse/lighthouserc.json`
```json
{
  "ci": {
    "collect": {
      "url": [
        "http://localhost:3000/login",
        "http://localhost:3000/dashboard",
        "http://localhost:3000/chat",
        "http://localhost:3000/agents/new"
      ],
      "numberOfRuns": 1,
      "settings": {
        "preset": "desktop",
        "chromeFlags": "--no-sandbox --disable-dev-shm-usage"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", {"minScore": 0.80}],
        "categories:accessibility": ["error", {"minScore": 0.80}],
        "categories:best-practices": ["error", {"minScore": 0.80}],
        "categories:seo": ["error", {"minScore": 0.80}]
      }
    },
    "upload": {
      "target": "filesystem",
      "outputDir": ".lighthouseci"
    }
  }
}
```
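Note: `error` at 0.80 means CI fails below 80; the 90 target is aspirational. Set warn at 0.85 for soft alerts. Until Lighthouse can authenticate, the `/dashboard`, `/chat`, and `/agents/new` URLs will be redirected to `/login` and score that page instead (see Pitfall 5).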
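The three thresholds can be read as a simple tiering of each category score. The sketch below is only an illustration of that reading (fail below 0.80, warn below 0.85, target at 0.90), which is an assumption about how the locked decision is meant; it is not how LHCI evaluates assertions internally, and `gateScore` is a hypothetical name.

```typescript
type Verdict = "fail" | "warn" | "pass";

const FAIL_BELOW = 0.8;  // CI fails under this score
const WARN_BELOW = 0.85; // soft alert under this score
const TARGET = 0.9;      // aspirational target

// Classifies a Lighthouse category score in the 0..1 range
export function gateScore(score: number): { verdict: Verdict; meetsTarget: boolean } {
  const meetsTarget = score >= TARGET;
  if (score < FAIL_BELOW) return { verdict: "fail", meetsTarget };
  if (score < WARN_BELOW) return { verdict: "warn", meetsTarget };
  return { verdict: "pass", meetsTarget };
}
```

A score of 0.87 therefore passes the gate without meeting the target, which is the band where follow-up work would be tracked rather than blocked.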
### Pattern 6: Playwright Config (Full)
```typescript
// packages/portal/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  fullyParallel: false, // Stability in CI with shared DB state
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 1 : undefined,
  timeout: 30_000,
  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["junit", { outputFile: "playwright-results.xml" }],
    ["list"],
  ],
  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    serviceWorkers: "block", // Prevents Serwist from intercepting test requests
  },
  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.02,
      threshold: 0.2,
    },
  },
  projects: [
    // Auth setup runs first for all browser projects
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    // E2E flows: all 3 browsers
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    // Visual regression: chromium only, 3 viewports
    { name: "visual-desktop", use: { browserName: "chromium", viewport: { width: 1280, height: 800 } }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    { name: "visual-tablet", use: { browserName: "chromium", viewport: { width: 768, height: 1024 } }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    // iPhone 12 defaults to WebKit and its own viewport, so both are overridden here
    { name: "visual-mobile", use: { ...devices["iPhone 12"], browserName: "chromium", viewport: { width: 375, height: 812 } }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    // Accessibility: chromium only
    {
      name: "a11y",
      use: { ...devices["Desktop Chrome"] },
      dependencies: ["setup"],
      testMatch: "e2e/accessibility/**",
    },
  ],
  webServer: {
    command: "node .next/standalone/server.js",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    env: {
      PORT: "3000",
      API_URL: process.env.API_URL ?? "http://localhost:8001",
      AUTH_SECRET: process.env.AUTH_SECRET ?? "test-secret-32-chars-minimum-len",
      AUTH_URL: "http://localhost:3000",
    },
  },
});
```
**Critical:** `serviceWorkers: "block"` is required because Serwist (PWA service worker) intercepts network requests and makes them invisible to `page.route()` / `page.routeWebSocket()`.
### Pattern 7: Gitea Actions CI Pipeline
```yaml
# .gitea/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
    env:
      DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
      DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
      REDIS_URL: redis://localhost:6379/0
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv sync
      - run: uv run ruff check packages/ tests/
      - run: uv run mypy --strict packages/
      - run: uv run pytest tests/ -x --tb=short

  portal:
    name: Portal E2E
    runs-on: ubuntu-latest
    needs: backend # E2E blocked until backend passes
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: --health-cmd pg_isready --health-interval 5s --health-retries 10
      redis:
        image: redis:7-alpine
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - name: Install portal deps
        working-directory: packages/portal
        run: npm ci
      - name: Build portal
        working-directory: packages/portal
        run: |
          npm run build
          # Standalone output omits static assets; copy them in (see Pitfall 3)
          cp -r .next/static .next/standalone/.next/static
          cp -r public .next/standalone/public
        env:
          NEXT_PUBLIC_API_URL: http://localhost:8001
      - name: Install Playwright browsers
        working-directory: packages/portal
        run: npx playwright install --with-deps chromium firefox webkit
      - name: Start gateway (background)
        run: |
          pip install uv && uv sync
          uv run alembic upgrade head
          uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
        env:
          DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
          DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
          REDIS_URL: redis://localhost:6379/0
          LLM_POOL_URL: http://localhost:8004 # not running, mocked in E2E
      - name: Wait for gateway
        run: timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
      - name: Run E2E tests
        working-directory: packages/portal
        run: npx playwright test e2e/flows/ e2e/accessibility/
        env:
          CI: "true"
          PLAYWRIGHT_BASE_URL: http://localhost:3000
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          E2E_ADMIN_EMAIL: ${{ secrets.E2E_ADMIN_EMAIL }}
          E2E_ADMIN_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }}
      - name: Run Lighthouse CI
        working-directory: packages/portal
        run: |
          npx lhci autorun --config=e2e/lighthouse/lighthouserc.json
        env:
          LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: packages/portal/playwright-report/
      - name: Upload Lighthouse report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-report
          path: packages/portal/.lighthouseci/
```
### Anti-Patterns to Avoid
- **Hardcoded IDs in selectors:** Use `getByRole`, `getByLabel`, `getByText` — never CSS `#id` or `[data-testid]` unless semantic selectors are unavailable. Semantic selectors are more resilient and double as accessibility checks.
- **Real LLM calls in E2E:** Never let E2E tests reach Ollama/OpenAI. Mock the WebSocket and gateway LLM calls. Real calls introduce flakiness and cost.
- **Superuser DB connections in test seeds:** The existing conftest uses `konstruct_app` role to preserve RLS. E2E seeds should call the FastAPI admin API endpoints, not connect directly to the DB.
- **Enabling service workers in tests:** Serwist intercepts all requests. Always set `serviceWorkers: "block"` in Playwright config.
- **Parallel workers with shared DB state:** Set `workers: 1` in CI. Tenant/agent mutations are not thread-safe across workers without per-worker isolation.
- **Running visual regression on all browsers:** Browser rendering engines produce expected pixel diffs. Visual regression on chromium only; cross-browser covered by functional E2E.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Screenshot diffs | Custom pixel comparator | `toHaveScreenshot()` built into Playwright | Handles baseline storage, update workflow, CI reporting |
| Accessibility scanning | Custom ARIA traversal | `@axe-core/playwright` | Covers 57 WCAG rules including ones humans miss |
| Performance score gating | Parsing Lighthouse JSON manually | `@lhci/cli assert` | Handles multi-run averaging, threshold config, exit codes |
| Auth state reuse | Logging in before every test | Playwright `storageState` | Logs in once per role and reuses the session, so no test pays for a UI login |
| WS mock server | Running a real mock websocket server | `page.routeWebSocket()` | In-process, no port conflicts, no flakiness |
## Common Pitfalls
### Pitfall 1: Auth.js HttpOnly Cookies
**What goes wrong:** Trying to authenticate by calling `/api/portal/auth/verify` directly with Playwright `request` — this bypasses Auth.js cookie-setting, so the browser session never exists.
**Why it happens:** The Auth.js v5 JWT is set as an `HttpOnly` secure cookie by the Next.js server, not by the FastAPI backend.
**How to avoid:** Always use Playwright's UI login flow (fill form → submit → wait for redirect) to let Next.js set the cookie. Then save with `storageState`.
**Warning signs:** Tests pass the login assertion but fail immediately after on authenticated pages.
### Pitfall 2: Serwist Service Worker Intercepting Test Traffic
**What goes wrong:** `page.route()` and `page.routeWebSocket()` handlers never fire because the PWA service worker handles requests first.
**Why it happens:** Serwist registers a service worker that intercepts every request within its scope. Requests answered by the service worker bypass Playwright's routing layer, so route handlers only see the traffic once service workers are blocked.
**How to avoid:** Set `serviceWorkers: "block"` in `playwright.config.ts` under `use`.
**Warning signs:** Mock routes never called; tests see real responses or network errors.
### Pitfall 3: Next.js Standalone Output Path for webServer
**What goes wrong:** `command: "npm run start"` fails in CI because `next start` does not serve a build produced with `output: "standalone"`.
**Why it happens:** The portal uses `output: "standalone"` in `next.config.ts`. The build produces `.next/standalone/server.js`, not the standard Next.js CLI server.
**How to avoid:** Use `command: "node .next/standalone/server.js"` in Playwright's `webServer` config. The standalone folder does not include static assets, so the build step must also run `cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`.
**Warning signs:** `webServer` process exits immediately; Playwright reports "server did not start".
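For environments where shelling out to `cp` is inconvenient, the same copy can be done from a small Node script. This is a sketch under the assumption that it runs with the portal directory as its argument; `standaloneCopyPlan` and `copyStandaloneAssets` are hypothetical names, and `cpSync` needs Node 16.7 or newer.

```typescript
import { cpSync, existsSync } from "node:fs";
import { join } from "node:path";

type CopyStep = { from: string; to: string };

// Lists the directories that the standalone build expects next to server.js
export function standaloneCopyPlan(portalDir: string): CopyStep[] {
  const standalone = join(portalDir, ".next", "standalone");
  return [
    { from: join(portalDir, ".next", "static"), to: join(standalone, ".next", "static") },
    { from: join(portalDir, "public"), to: join(standalone, "public") },
  ];
}

// Copies each directory recursively, skipping sources that do not exist
export function copyStandaloneAssets(portalDir: string): void {
  for (const step of standaloneCopyPlan(portalDir)) {
    if (!existsSync(step.from)) continue; // e.g. a project without public/
    cpSync(step.from, step.to, { recursive: true });
  }
}
```

Calling `copyStandaloneAssets(process.cwd())` right after `npm run build` would leave the standalone folder ready for `node .next/standalone/server.js`.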
### Pitfall 4: Visual Regression Baseline Committed Without CI Environment Lock
**What goes wrong:** Baselines created on a developer's Mac differ from Linux CI renderings (font rendering, subpixel AA, etc.).
**Why it happens:** Screenshot comparisons are pixel-exact. OS-level rendering differences cause 1–5% false failures.
**How to avoid:** Generate baselines inside the same Docker/Linux environment as CI. Run `npx playwright test --update-snapshots` on Linux (or in the Playwright Docker image) to commit initial baselines. Use `maxDiffPixelRatio: 0.02` to absorb minor remaining differences.
**Warning signs:** Visual tests pass locally but always fail in CI.
### Pitfall 5: Lighthouse Pages Behind Auth
**What goes wrong:** Lighthouse visits `/dashboard` and gets redirected to `/login` — scores an empty page.
**Why it happens:** Lighthouse runs as an unauthenticated browser session, and the LHCI config in this phase does nothing to inject an Auth.js session cookie.
**How to avoid:** For authenticated pages, either (a) test only public pages with Lighthouse (login, landing), or (b) supply a session through LHCI's `puppeteerScript` or `extraHeaders` collect options (not yet verified against Auth.js v5 in this project), or (c) create a special unauthenticated preview mode. **For this project:** Run Lighthouse on `/login` only, plus any publicly accessible marketing pages. Skip `/dashboard` and `/chat` for Lighthouse.
**Warning signs:** Lighthouse scores 100 for accessibility on dashboard — suspiciously perfect because it's measuring an empty redirect.
### Pitfall 6: WebSocket URL Resolution in Tests
**What goes wrong:** `page.routeWebSocket("/chat/ws/")` doesn't match because the portal derives the WS URL from `NEXT_PUBLIC_API_URL` (baked at build time), which points to `ws://localhost:8001`, not a relative path.
**Why it happens:** `use-chat-socket.ts` computes `WS_BASE` from `process.env.NEXT_PUBLIC_API_URL` and builds `ws://localhost:8001/chat/ws/{id}`.
**How to avoid:** Use a regex pattern: `page.routeWebSocket(/\/chat\/ws\//, handler)` — this matches the full absolute URL.
**Warning signs:** Chat mock never fires; test times out waiting for WS message.
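The difference is easy to check in isolation: an unanchored regex matches anywhere in the absolute URL, regardless of host or port. A minimal sketch, where `isChatSocket` is a hypothetical helper and the URLs are examples based on the gateway address mentioned above:

```typescript
// Same pattern that is passed to page.routeWebSocket() in the chat spec
const CHAT_WS_PATTERN = /\/chat\/ws\//;

// True for any absolute or relative URL that contains the chat socket path
export function isChatSocket(url: string): boolean {
  return CHAT_WS_PATTERN.test(url);
}

console.log(isChatSocket("ws://localhost:8001/chat/ws/abc123")); // true
console.log(isChatSocket("ws://localhost:8001/chat/history"));   // false
```

Keeping the pattern in one exported constant also means the spec and any shared fixture cannot drift apart.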
### Pitfall 7: Gitea Actions Runner Needs Docker
**What goes wrong:** Service containers fail to start because the Gitea runner is not configured with Docker access.
**Why it happens:** Gitea Actions service containers require Docker socket access on the runner.
**How to avoid:** Ensure the `act_runner` is added to the `docker` group on the host. Alternative: use `docker compose` in a setup step instead of service containers.
**Warning signs:** Job fails immediately with "Cannot connect to Docker daemon".
## Code Examples
### Seed Helper via API
```typescript
// e2e/helpers/seed.ts
// Uses Playwright APIRequestContext to create test data via FastAPI endpoints.
// Must run BEFORE storageState setup (needs platform_admin creds via env).
import type { APIRequestContext } from "@playwright/test";

export async function seedTestTenant(
  request: APIRequestContext,
): Promise<{ tenantId: string; tenantSlug: string }> {
  // Random suffix keeps slugs unique across repeated runs
  const suffix = Math.random().toString(36).slice(2, 8);
  const res = await request.post("http://localhost:8001/api/portal/tenants", {
    headers: {
      "X-User-Id": process.env.E2E_ADMIN_ID!,
      "X-User-Role": "platform_admin",
      "X-Active-Tenant": "",
    },
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-tenant-${suffix}` },
  });
  // Fail fast with the status code instead of a confusing JSON error later
  if (!res.ok()) {
    throw new Error(`Seeding tenant failed: HTTP ${res.status()}`);
  }
  const body = (await res.json()) as { id: string; slug: string };
  return { tenantId: body.id, tenantSlug: body.slug };
}
```
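The matching cleanup helper could follow the same shape. The sketch below is heavily assumption-based: the `DELETE /api/portal/tenants/{id}` route is hypothetical (the source does not show the gateway's delete endpoint), and `DeleteClient` is a local structural stand-in for the single method used from Playwright's `APIRequestContext`.

```typescript
// Structural stand-in for APIRequestContext.delete(); a real Playwright
// request context satisfies this shape.
type DeleteClient = {
  delete(
    url: string,
    options?: { headers?: Record<string, string> },
  ): Promise<{ ok(): boolean; status(): number }>;
};

// Assumption: tenants are deleted at /api/portal/tenants/{id}
export function tenantUrl(apiBase: string, tenantId: string): string {
  return `${apiBase.replace(/\/$/, "")}/api/portal/tenants/${encodeURIComponent(tenantId)}`;
}

export async function deleteTestTenant(
  request: DeleteClient,
  apiBase: string,
  tenantId: string,
  adminId: string,
): Promise<void> {
  const res = await request.delete(tenantUrl(apiBase, tenantId), {
    headers: { "X-User-Id": adminId, "X-User-Role": "platform_admin", "X-Active-Tenant": "" },
  });
  // Treat 404 as already cleaned up so repeated runs stay idempotent
  if (!res.ok() && res.status() !== 404) {
    throw new Error(`Tenant cleanup failed: HTTP ${res.status()}`);
  }
}
```

Running this from a teardown step with the IDs returned by `seedTestTenant` keeps the shared database from accumulating `e2e-tenant-*` rows between runs.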
### RBAC Test Pattern
```typescript
// e2e/flows/rbac.spec.ts
// Tests that operator role is silently redirected, not 403-paged
import { test, expect } from "@playwright/test";

test.describe("RBAC enforcement", () => {
  test.use({ storageState: "playwright/.auth/customer-operator.json" });

  const restrictedPaths = ["/agents/new", "/billing", "/users"];
  for (const path of restrictedPaths) {
    test(`operator cannot access ${path}`, async ({ page }) => {
      await page.goto(path);
      // proxy.ts does a silent redirect; the operator ends up on /dashboard
      await expect(page).not.toHaveURL(path);
    });
  }
});
```
### Mobile Viewport Behavioral Test
```typescript
// e2e/flows/mobile.spec.ts
import { test, expect } from "@playwright/test";

test("mobile: bottom tab bar renders, sidebar hidden", async ({ page }) => {
  await page.setViewportSize({ width: 375, height: 812 });
  await page.goto("/dashboard");
  // Bottom tab bar visible
  await expect(page.getByRole("navigation", { name: /mobile/i })).toBeVisible();
  // Desktop sidebar hidden
  await expect(page.getByRole("navigation", { name: /sidebar/i })).not.toBeVisible();
});
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Cypress for Next.js E2E | Playwright (official Next.js recommendation) | 2023–2024 | Cross-browser, better WS support, no iframe limitations |
| `lighthouse` npm module with custom scripts | `@lhci/cli autorun` | 2020+ | Automated multi-run averaging, assertions, CI reporting |
| `axe-playwright` (community) | `@axe-core/playwright` (official Deque) | 2022+ | Official package, same API, no extra wrapper |
| `next start` for E2E server | `node .next/standalone/server.js` | Next.js 12+ standalone | Required when `output: "standalone"` is set |
| middleware.ts | proxy.ts | Next.js 16 | Next.js 16 renamed middleware file |
**Deprecated/outdated:**
- `cypress/integration/` directory: Cypress split this into `cypress/e2e/` in v10 — but we're not using Cypress
- `@playwright/test` `globalSetup` string path: Still valid but the project-based `setup` dependency is preferred in Playwright 1.40+
- `installSerwist()`: Replaced by `new Serwist() + addEventListeners()` in serwist v9 (already applied in Phase 8)
## Open Questions
1. **Lighthouse on authenticated pages**
   - What we know: Lighthouse runs unauthenticated by default, so authenticated pages redirect to `/login`
   - What's unclear: Whether LHCI's `puppeteerScript` or `extraHeaders` options can carry an Auth.js v5 session cookie (not tested in this project)
   - Recommendation: Scope Lighthouse to `/login` only for QA-02. Dashboard/chat performance validated manually or via Web Vitals tracking in production.
2. **Visual regression baseline generation environment**
   - What we know: OS-level rendering differences cause false failures
   - What's unclear: Whether the Gitea runner is Linux or Mac
   - Recommendation: A Wave 0 task generates baselines inside the CI Docker container (Linux) and commits them. Dev machines use `--update-snapshots` only deliberately.
3. **Celery worker in E2E**
   - What we know: The chat WebSocket flow uses Redis pub-sub to deliver responses from the Celery worker
   - What's unclear: Whether E2E should run the Celery worker (real pipeline, slow) or mock the WS entirely (fast but less realistic)
   - Recommendation: Mock the WebSocket entirely via `page.routeWebSocket()`. This tests the frontend streaming UX without depending on Celery. Add a separate smoke test that hits the gateway `/health` endpoint to verify service health in CI.
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework (backend) | pytest 8.3+ / pytest-asyncio (existing, all tests pass) |
| Framework (E2E) | @playwright/test ^1.51 (to be installed) |
| Config file (E2E) | `packages/portal/playwright.config.ts` — Wave 0 |
| Quick run (backend) | `uv run pytest tests/unit -x --tb=short` |
| Full suite (backend) | `uv run pytest tests/ -x --tb=short` |
| E2E run | `cd packages/portal && npx playwright test` |
| Visual update | `cd packages/portal && npx playwright test --update-snapshots` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| QA-01 | 7 critical user flows pass | E2E Playwright | `npx playwright test e2e/flows/ --project=chromium` | Wave 0 |
| QA-02 | Lighthouse >= 90 on key pages | Lighthouse CI | `npx lhci autorun --config=e2e/lighthouse/lighthouserc.json` | Wave 0 |
| QA-03 | Visual snapshots pass at 3 viewports | Visual regression | `npx playwright test e2e/visual/` | Wave 0 |
| QA-04 | Zero critical a11y violations | Accessibility scan | `npx playwright test e2e/accessibility/` | Wave 0 |
| QA-05 | All E2E flows pass on 3 browsers | Cross-browser E2E | `npx playwright test e2e/flows/` (all projects) | Wave 0 |
| QA-06 | Empty/error/loading states correct | E2E Playwright | Covered within flow specs via API mocking | Wave 0 |
| QA-07 | CI pipeline runs in Gitea Actions | CI workflow | `.gitea/workflows/ci.yml` | Wave 0 |
### Sampling Rate
- **Per task commit:** `cd packages/portal && npx playwright test e2e/flows/login.spec.ts --project=chromium`
- **Per wave merge:** `cd packages/portal && npx playwright test e2e/flows/ --project=chromium`
- **Phase gate:** Full suite (all projects + accessibility + visual) green before `/gsd:verify-work`
### Wave 0 Gaps
- [ ] `packages/portal/playwright.config.ts` — E2E framework config
- [ ] `packages/portal/e2e/auth.setup.ts` — Auth state generation for 3 roles
- [ ] `packages/portal/e2e/fixtures.ts` — Shared test fixtures (axe, auth, API helpers)
- [ ] `packages/portal/e2e/helpers/seed.ts` — Test data seeding via API
- [ ] `packages/portal/e2e/flows/*.spec.ts` — 7 flow spec files
- [ ] `packages/portal/e2e/accessibility/a11y.spec.ts` — axe-core scans
- [ ] `packages/portal/e2e/visual/snapshots.spec.ts` — visual regression specs
- [ ] `packages/portal/e2e/lighthouse/lighthouserc.json` — Lighthouse CI config
- [ ] `.gitea/workflows/ci.yml` — CI pipeline
- [ ] `packages/portal/playwright/.auth/.gitkeep` — Directory for saved auth state (gitignored content)
- [ ] Framework install: `cd packages/portal && npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli && npx playwright install --with-deps`
- [ ] Baseline snapshots: run `npx playwright test e2e/visual/ --update-snapshots` on Linux to generate
## Sources
### Primary (HIGH confidence)
- https://playwright.dev/docs/auth — storageState, setup projects, multiple roles
- https://playwright.dev/docs/api/class-websocketroute — WebSocket mocking API
- https://playwright.dev/docs/test-snapshots — toHaveScreenshot, maxDiffPixelRatio
- https://playwright.dev/docs/accessibility-testing — @axe-core/playwright integration
- https://playwright.dev/docs/ci — CI configuration, Docker image, workers
- https://googlechrome.github.io/lighthouse-ci/docs/configuration.html — minScore assertions format
### Secondary (MEDIUM confidence)
- https://googlechrome.github.io/lighthouse-ci/docs/getting-started.html — lhci autorun setup
- https://playwright.dev/docs/mock — page.route() and page.routeWebSocket() overview
- Gitea Actions docs (forum.gitea.com) — confirmed GitHub Actions YAML compatibility, Docker socket requirements
### Tertiary (LOW confidence)
- WebSearch result: Gitea runner Docker group requirement — mentioned across multiple community posts, not in official docs
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH — verified against official Playwright, @axe-core, and LHCI docs
- Architecture: HIGH — patterns derived directly from official Playwright documentation
- Pitfalls: HIGH (pitfalls 1–6 from direct codebase inspection + official docs); MEDIUM (pitfall 7 from community sources)
**Research date:** 2026-03-25
**Valid until:** 2026-06-25 (90 days — Playwright and Next.js are fast-moving but breaking changes are rare)