Phase 9: Testing & QA - Research
Researched: 2026-03-25
Domain: Playwright E2E, Lighthouse CI, visual regression, axe-core accessibility, Gitea Actions CI
Confidence: HIGH
Summary
Phase 9 is a greenfield testing layer added on top of a fully-built portal (Next.js 16 standalone, FastAPI gateway, Celery worker). No Playwright config exists yet — the Playwright MCP plugin is installed for manual use but there is no playwright.config.ts, no tests/e2e/ content, and no .gitea/workflows/ CI file. Everything must be created from scratch.
The core challenges are: (1) Auth.js v5 JWT sessions that Playwright must obtain and reuse across multiple role fixtures (platform_admin, customer_admin, customer_operator); (2) the WebSocket chat flow at /chat/ws/{conversation_id} that needs mocking via page.routeWebSocket(); (3) Lighthouse CI that requires a running Next.js server (standalone output complicates startServerCommand); and (4) a sub-5-minute pipeline on Gitea Actions that is nearly syntax-identical to GitHub Actions.
Primary recommendation: Place Playwright config and tests inside packages/portal/ (Next.js co-location pattern), use storageState with three saved auth fixtures for roles, mock the WebSocket endpoint with page.routeWebSocket() for the chat flow, and run @lhci/cli in a separate post-build CI stage.
<user_constraints>
User Constraints (from CONTEXT.md)
Locked Decisions
All decisions at Claude's discretion — user trusts judgment.
- Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
- Critical flows to test (priority order):
- Login → dashboard loads → session persists
- Create tenant → tenant appears in list
- Deploy template agent → agent appears in employees list
- Chat: open conversation → send message → receive streaming response (mock LLM)
- RBAC: operator cannot access /agents/new, /billing, /users
- Language switcher → UI updates to selected language
- Mobile viewport: bottom tab bar renders, sidebar hidden
- LLM responses mocked in E2E tests (no real Ollama/API calls)
- Test data: seed a test tenant + test user via API calls in test setup, clean up after
- Lighthouse targets: >= 90 (fail at 80, warn at 85)
- Pages: login, dashboard, chat, agents/new
- Visual regression at 3 viewports: desktop 1280x800, tablet 768x1024, mobile 375x812
- Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
- Baseline snapshots committed to repo
- axe-core via @axe-core/playwright, zero critical violations required
- "serious" violations logged as warnings (not blockers for beta)
- Keyboard navigation test: Tab through login form, chat input, nav items
- Cross-browser: chromium, firefox, webkit
- Visual regression: chromium only
- Gitea Actions, triggers: push to main, PR to main
- Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
- Docker Compose for CI infra
- JUnit XML + HTML trace viewer reports
- Fail-fast: lint/type errors block everything; unit test failures block E2E
- Target: < 5 min pipeline
Claude's Discretion
- Playwright config details (timeouts, retries, parallelism)
- Test file organization (by feature vs by page)
- Fixture/helper patterns for auth, tenant setup, API mocking
- Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
- Whether to include a smoke test for the WebSocket chat connection
- Visual regression threshold (pixel diff tolerance)
Deferred Ideas (OUT OF SCOPE)
None. Discussion stayed within phase scope.
</user_constraints>
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| QA-01 | Playwright E2E tests cover all critical user flows (login, tenant CRUD, agent deploy, chat, billing, RBAC) | Playwright storageState auth fixtures + routeWebSocket for chat mock |
| QA-02 | Lighthouse scores >= 90 for performance, accessibility, best practices, SEO on key pages | @lhci/cli with minScore assertions per category |
| QA-03 | Visual regression snapshots at desktop/tablet/mobile for all key pages | toHaveScreenshot with maxDiffPixelRatio, viewports per project |
| QA-04 | axe-core accessibility audit passes with zero critical violations across all pages | @axe-core/playwright AxeBuilder with impact filter |
| QA-05 | E2E tests pass on Chrome, Firefox, Safari (WebKit) | Playwright projects array with three browser engines |
| QA-06 | Empty states, error states, loading states tested and rendered correctly | Dedicated test cases + API mocking for empty/error responses |
| QA-07 | CI-ready test suite runnable in Gitea Actions pipeline | .gitea/workflows/ci.yml with Docker Compose service containers |
</phase_requirements>
Standard Stack
Core
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| @playwright/test | ^1.51 | E2E + visual regression + accessibility runner | Official Next.js recommendation, cross-browser built-in, no extra dependencies |
| @axe-core/playwright | ^4.10 | Accessibility scanning within Playwright tests | Official Deque package, integrates directly with Playwright page objects |
| @lhci/cli | ^0.15 | Lighthouse CI score assertions | Google-maintained, headless Lighthouse, assertion config via lighthouserc |
Supporting
| Library | Version | Purpose | When to Use |
|---|---|---|---|
| axe-html-reporter | ^2.2 | HTML accessibility reports | When you want human-readable a11y reports attached to CI artifacts |
Alternatives Considered
| Instead of | Could Use | Tradeoff |
|---|---|---|
| @lhci/cli | lighthouse npm module directly | @lhci/cli handles multi-run averaging, assertions, and CI upload; raw lighthouse requires custom scripting |
| @axe-core/playwright | axe-playwright (third-party) | @axe-core/playwright is the official Deque package; axe-playwright is a community wrapper with same API but extra dep |
Installation (portal):
cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit
Architecture Patterns
Recommended Project Structure
packages/portal/
├── playwright.config.ts # Main config: projects, webServer, globalSetup
├── e2e/
│ ├── auth.setup.ts # Global setup: save storageState per role
│ ├── fixtures.ts # Extended test: auth fixtures, axe builder, API helpers
│ ├── helpers/
│ │ ├── seed.ts # Seed test tenant + user via API, return IDs
│ │ └── cleanup.ts # Delete seeded data after test suite
│ ├── flows/
│ │ ├── login.spec.ts # Flow 1: login → dashboard loads → session persists
│ │ ├── tenant-crud.spec.ts # Flow 2: create tenant → appears in list
│ │ ├── agent-deploy.spec.ts # Flow 3: deploy template → appears in employees
│ │ ├── chat.spec.ts # Flow 4: open chat → send msg → streaming response (mocked WS)
│ │ ├── rbac.spec.ts # Flow 5: operator access denied to restricted pages
│ │ ├── i18n.spec.ts # Flow 6: language switcher → UI updates
│ │ └── mobile.spec.ts # Flow 7: mobile viewport → bottom tab bar, sidebar hidden
│ ├── accessibility/
│ │ └── a11y.spec.ts # axe-core scan on every key page, keyboard nav test
│ ├── visual/
│ │ └── snapshots.spec.ts # Visual regression at 3 viewports (chromium only)
│ └── lighthouse/
│ └── lighthouserc.json # @lhci/cli config: URLs, score thresholds
├── playwright/.auth/ # gitignored — saved storageState files
│ ├── platform-admin.json
│ ├── customer-admin.json
│ └── customer-operator.json
└── __snapshots__/ # Committed baseline screenshots
.gitea/
└── workflows/
└── ci.yml # Pipeline: lint → typecheck → pytest → build → E2E → lhci
Pattern 1: Auth.js v5 storageState with Multiple Roles
What: Authenticate each role once in a global setup project, save to JSON. All E2E tests consume the saved state — no repeated login UI interactions.
When to use: Any test that requires a logged-in user. Each spec declares which role it needs via test.use({ storageState }).
Key insight for Auth.js v5: The credentials provider calls the FastAPI /api/portal/auth/verify endpoint. Playwright must fill the login form (not call the API directly) because next-auth sets HttpOnly session cookies that only the browser can hold. The storageState captures those cookies.
// Source: https://playwright.dev/docs/auth
// e2e/auth.setup.ts
import { test as setup } from "@playwright/test";
import type { Page } from "@playwright/test";
import path from "path";

const PLATFORM_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/platform-admin.json");
const CUSTOMER_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-admin.json");
const OPERATOR_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-operator.json");

// Log in through the UI so Next.js sets the HttpOnly session cookie, then save it to disk.
async function loginAndSave(page: Page, email: string, password: string, authFile: string) {
  await page.goto("/login");
  await page.getByLabel("Email").fill(email);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: authFile });
}

setup("authenticate as platform admin", async ({ page }) => {
  await loginAndSave(page, process.env.E2E_ADMIN_EMAIL!, process.env.E2E_ADMIN_PASSWORD!, PLATFORM_ADMIN_AUTH);
});

setup("authenticate as customer admin", async ({ page }) => {
  // Credentials of a seeded customer_admin user, supplied via env
  await loginAndSave(page, process.env.E2E_CADMIN_EMAIL!, process.env.E2E_CADMIN_PASSWORD!, CUSTOMER_ADMIN_AUTH);
});

setup("authenticate as customer operator", async ({ page }) => {
  // E2E_OPERATOR_* are assumed env var names that follow the pattern above
  await loginAndSave(page, process.env.E2E_OPERATOR_EMAIL!, process.env.E2E_OPERATOR_PASSWORD!, OPERATOR_AUTH);
});
Pattern 2: WebSocket Mocking for Chat Flow
What: Intercept the /chat/ws/{conversationId} WebSocket before the gateway is contacted. Respond to the auth message, then simulate streaming tokens on a user message.
When to use: Flow 4 (chat E2E test). The gateway WebSocket endpoint at ws://localhost:8001/chat/ws/{id} is routed via the Next.js API proxy — intercept at the browser level.
// Source: https://playwright.dev/docs/api/class-websocketroute
// e2e/flows/chat.spec.ts
import { test, expect } from "@playwright/test";

test("chat: send message → receive streaming response", async ({ page }) => {
  // Register the route before navigating so the first connection is intercepted
  await page.routeWebSocket(/\/chat\/ws\//, (ws) => {
    ws.onMessage((msg) => {
      const data = JSON.parse(msg as string);
      if (data.type === "auth") {
        // Acknowledge auth: no response needed, gateway just proceeds
        return;
      }
      if (data.type === "message") {
        // Simulate typing indicator
        ws.send(JSON.stringify({ type: "typing" }));
        // Simulate streaming tokens, one every 50 ms
        const tokens = ["Hello", " from", " your", " AI", " assistant!"];
        tokens.forEach((token, i) => {
          setTimeout(() => {
            ws.send(JSON.stringify({ type: "chunk", token }));
          }, i * 50);
        });
        // Send the final frames after the last chunk has gone out
        setTimeout(() => {
          ws.send(JSON.stringify({
            type: "response",
            text: tokens.join(""),
            conversation_id: data.conversation_id,
          }));
          ws.send(JSON.stringify({ type: "done", text: tokens.join("") }));
        }, tokens.length * 50 + 100);
      }
    });
  });

  await page.goto("/chat?agentId=test-agent");
  await page.getByPlaceholder(/type a message/i).fill("Hello!");
  await page.keyboard.press("Enter");
  await expect(page.getByText("Hello from your AI assistant!")).toBeVisible({ timeout: 5000 });
});
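The frame sequence the mock emits can be factored into a pure helper, which keeps the ordering checkable without a browser. This is a minimal sketch: buildStreamFrames is a hypothetical name, and the frame shapes simply mirror the mock above, so they are assumptions about the gateway protocol rather than a confirmed schema.

```typescript
// Frame shapes copied from the mock handler above (assumed, not a verified schema).
type ChatFrame =
  | { type: "typing" }
  | { type: "chunk"; token: string }
  | { type: "response"; text: string; conversation_id: string }
  | { type: "done"; text: string };

// Hypothetical helper: returns the frames in the order the mock sends them.
function buildStreamFrames(tokens: string[], conversationId: string): ChatFrame[] {
  const text = tokens.join("");
  return [
    { type: "typing" },
    ...tokens.map((token): ChatFrame => ({ type: "chunk", token })),
    { type: "response", text, conversation_id: conversationId },
    { type: "done", text },
  ];
}
```

Inside the routeWebSocket handler, each frame would then be passed through JSON.stringify and ws.send, with whatever delay between frames the test needs.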
Pattern 3: Visual Regression at Multiple Viewports
What: Configure separate Playwright projects for each viewport, run snapshots only on chromium to avoid cross-browser rendering diffs.
When to use: QA-03. Visual regression baseline committed to repo; CI fails on diff.
// Source: https://playwright.dev/docs/test-snapshots
// playwright.config.ts (visual projects section)
{
  name: "visual-desktop",
  use: {
    ...devices["Desktop Chrome"],
    viewport: { width: 1280, height: 800 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-tablet",
  use: {
    browserName: "chromium",
    viewport: { width: 768, height: 1024 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-mobile",
  use: {
    // The "iPhone 12" descriptor defaults to WebKit; pin chromium so that
    // visual regression stays on a single engine
    ...devices["iPhone 12"],
    browserName: "chromium",
    viewport: { width: 375, height: 812 },
  },
  testMatch: "e2e/visual/**",
},
Global threshold:
// playwright.config.ts
expect: {
toHaveScreenshot: {
maxDiffPixelRatio: 0.02, // 2% tolerance — accounts for antialiasing
threshold: 0.2, // pixel color threshold (0–1)
},
},
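To make the 2% tolerance concrete, the sketch below converts maxDiffPixelRatio into an approximate pixel budget for each of the three viewports. It assumes viewport-sized (not full-page) screenshots at a device scale factor of 1; maxDifferingPixels is a hypothetical helper, not a Playwright API.

```typescript
interface Viewport {
  name: string;
  width: number;
  height: number;
}

// Viewports from the locked decisions: desktop, tablet, mobile.
const viewports: Viewport[] = [
  { name: "desktop", width: 1280, height: 800 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 375, height: 812 },
];

// Approximate number of pixels that may differ before a snapshot fails.
function maxDifferingPixels(v: Viewport, ratio: number): number {
  return Math.floor(v.width * v.height * ratio);
}

for (const v of viewports) {
  console.log(`${v.name}: ${maxDifferingPixels(v, 0.02)} px`);
}
```

At 0.02 this works out to 20480 pixels on desktop, 15728 on tablet, and 6090 on mobile, which is enough to hide a small missing icon; a tighter ratio may be worth considering for pages with little visual content.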
Pattern 4: axe-core Fixture
What: Shared fixture that creates an AxeBuilder for each page, scoped to WCAG 2.1 AA, filtering results by impact level.
// Source: https://playwright.dev/docs/accessibility-testing
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

export const test = base.extend<{ axe: () => AxeBuilder }>({
  axe: async ({ page }, use) => {
    // Factory rather than instance, so each call gets a fresh builder
    const makeBuilder = () =>
      new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21aa"]);
    await use(makeBuilder);
  },
});
export { expect };

// In a test (e.g. e2e/accessibility/a11y.spec.ts), with test and expect imported from ../fixtures:
test("login page has no critical a11y violations", async ({ page, axe }) => {
  await page.goto("/login");
  const results = await axe().analyze();
  const criticalViolations = results.violations.filter(v => v.impact === "critical");
  const seriousViolations = results.violations.filter(v => v.impact === "serious");
  // Critical violations block; serious ones are only logged during beta
  expect(criticalViolations, "Critical a11y violations found").toHaveLength(0);
  if (seriousViolations.length > 0) {
    console.warn("Serious a11y violations (non-blocking):", seriousViolations);
  }
});
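The impact filtering can also live in a small pure function so that every page scan applies the same blocking rule. The sketch below uses a minimal stand-in for the axe result shape (only id and impact); partitionByImpact is a hypothetical helper name, and the sample data is illustrative.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Minimal stand-in for an axe violation; the real objects carry more fields.
interface Violation {
  id: string;
  impact?: Impact | null;
}

// Critical violations block the build; serious ones are reported as warnings.
function partitionByImpact(violations: Violation[]): { blocking: Violation[]; warnings: Violation[] } {
  return {
    blocking: violations.filter((v) => v.impact === "critical"),
    warnings: violations.filter((v) => v.impact === "serious"),
  };
}

// Illustrative sample, not real scan output.
const sample: Violation[] = [
  { id: "image-alt", impact: "critical" },
  { id: "color-contrast", impact: "serious" },
  { id: "region", impact: "moderate" },
];
const { blocking, warnings } = partitionByImpact(sample);
console.log(blocking.map((v) => v.id), warnings.map((v) => v.id));
```

A spec would pass results.violations to the helper and assert that blocking is empty.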
Pattern 5: Lighthouse CI Config
What: lighthouserc.json drives @lhci/cli autorun in CI. Pages run headlessly against the built portal.
// Source: https://googlechrome.github.io/lighthouse-ci/docs/configuration.html
// e2e/lighthouse/lighthouserc.json
{
"ci": {
"collect": {
"url": [
"http://localhost:3000/login",
"http://localhost:3000/dashboard",
"http://localhost:3000/chat",
"http://localhost:3000/agents/new"
],
"numberOfRuns": 1,
"settings": {
"preset": "desktop",
"chromeFlags": "--no-sandbox --disable-dev-shm-usage"
}
},
"assert": {
"assertions": {
"categories:performance": ["error", {"minScore": 0.80}],
"categories:accessibility": ["error", {"minScore": 0.80}],
"categories:best-practices": ["error", {"minScore": 0.80}],
"categories:seo": ["error", {"minScore": 0.80}]
}
},
"upload": {
"target": "filesystem",
"outputDir": ".lighthouseci"
}
}
}
Note: error at 0.80 means CI fails below 80; the 90 target is aspirational. Set warn at 0.85 for soft alerts.
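The three thresholds can be read as bands on the 0 to 1 score scale used in lighthouserc.json. The sketch below is one interpretation of "fail at 80, warn at 85, target 90"; classifyScore and the Verdict labels are hypothetical and not part of @lhci/cli.

```typescript
type Verdict = "fail" | "warn" | "below-target" | "pass";

// Bands (assumed reading of the thresholds):
//   score < 0.80          -> fail (CI blocks)
//   0.80 <= score < 0.85  -> warn (soft alert)
//   0.85 <= score < 0.90  -> below-target (passes the gates, misses the goal)
//   score >= 0.90         -> pass
function classifyScore(score: number): Verdict {
  if (score < 0.8) return "fail";
  if (score < 0.85) return "warn";
  if (score < 0.9) return "below-target";
  return "pass";
}

// Example category scores (illustrative values only).
const scores: Record<string, number> = {
  performance: 0.83,
  accessibility: 0.95,
  "best-practices": 0.88,
  seo: 0.78,
};
for (const [category, score] of Object.entries(scores)) {
  console.log(`${category}: ${classifyScore(score)}`);
}
```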
Pattern 6: Playwright Config (Full)
// packages/portal/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

// Default session for projects that do not override storageState
const ADMIN_STATE = "playwright/.auth/platform-admin.json";

export default defineConfig({
  testDir: "./e2e",
  fullyParallel: false, // Stability in CI with shared DB state
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 1 : undefined,
  timeout: 30_000,
  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["junit", { outputFile: "playwright-results.xml" }],
    ["list"],
  ],
  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    serviceWorkers: "block", // Prevents Serwist from intercepting test requests
  },
  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.02,
      threshold: 0.2,
    },
  },
  projects: [
    // Auth setup runs first for all browser projects
    { name: "setup", testMatch: /auth\.setup\.ts/ },
    // E2E flows: all 3 browsers
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    // Visual regression: chromium only, 3 viewports. These need a session too,
    // otherwise authenticated pages are captured as the login redirect.
    { name: "visual-desktop", use: { browserName: "chromium", viewport: { width: 1280, height: 800 }, storageState: ADMIN_STATE }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    { name: "visual-tablet", use: { browserName: "chromium", viewport: { width: 768, height: 1024 }, storageState: ADMIN_STATE }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    // "iPhone 12" defaults to WebKit and a different viewport; pin chromium and 375x812
    { name: "visual-mobile", use: { ...devices["iPhone 12"], browserName: "chromium", viewport: { width: 375, height: 812 }, storageState: ADMIN_STATE }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    // Accessibility: chromium only
    {
      name: "a11y",
      use: { ...devices["Desktop Chrome"], storageState: ADMIN_STATE },
      dependencies: ["setup"],
      testMatch: "e2e/accessibility/**",
    },
  ],
  webServer: {
    command: "node .next/standalone/server.js",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    env: {
      PORT: "3000",
      API_URL: process.env.API_URL ?? "http://localhost:8001",
      AUTH_SECRET: process.env.AUTH_SECRET ?? "test-secret-32-chars-minimum-len",
      AUTH_URL: "http://localhost:3000",
    },
  },
});
Critical: serviceWorkers: "block" is required because Serwist (PWA service worker) intercepts network requests and makes them invisible to page.route() / page.routeWebSocket().
Pattern 7: Gitea Actions CI Pipeline
# .gitea/workflows/ci.yml
name: CI
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
    env:
      DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
      DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
      REDIS_URL: redis://localhost:6379/0
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv sync
      - run: uv run ruff check packages/ tests/
      - run: uv run mypy --strict packages/
      - run: uv run pytest tests/ -x --tb=short
  portal:
    name: Portal E2E
    runs-on: ubuntu-latest
    needs: backend # E2E blocked until backend passes
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: --health-cmd pg_isready --health-interval 5s --health-retries 10
      redis:
        image: redis:7-alpine
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - name: Install portal deps
        working-directory: packages/portal
        run: npm ci
      - name: Build portal
        working-directory: packages/portal
        # Standalone output does not include static assets; copy them (see Pitfall 3)
        run: |
          npm run build
          cp -r .next/static .next/standalone/.next/static
          cp -r public .next/standalone/public
        env:
          NEXT_PUBLIC_API_URL: http://localhost:8001
      - name: Install Playwright browsers
        working-directory: packages/portal
        run: npx playwright install --with-deps chromium firefox webkit
      - name: Start gateway (background)
        run: |
          pip install uv && uv sync
          uv run alembic upgrade head
          uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
        env:
          DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
          DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
          REDIS_URL: redis://localhost:6379/0
          LLM_POOL_URL: http://localhost:8004 # not running; mocked in E2E
      - name: Wait for gateway
        run: timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
      - name: Run E2E tests
        working-directory: packages/portal
        run: npx playwright test e2e/flows/ e2e/accessibility/
        env:
          CI: "true"
          PLAYWRIGHT_BASE_URL: http://localhost:3000
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          E2E_ADMIN_EMAIL: ${{ secrets.E2E_ADMIN_EMAIL }}
          E2E_ADMIN_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }}
          # Read by auth.setup.ts for the customer admin login
          E2E_CADMIN_EMAIL: ${{ secrets.E2E_CADMIN_EMAIL }}
          E2E_CADMIN_PASSWORD: ${{ secrets.E2E_CADMIN_PASSWORD }}
      - name: Run Lighthouse CI
        working-directory: packages/portal
        run: |
          npx lhci autorun --config=e2e/lighthouse/lighthouserc.json
        env:
          LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: packages/portal/playwright-report/
      - name: Upload Lighthouse report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-report
          path: packages/portal/.lighthouseci/
Anti-Patterns to Avoid
- Hardcoded IDs in selectors: Use getByRole, getByLabel, getByText; never CSS #id or [data-testid] unless semantic selectors are unavailable. Semantic selectors are more resilient and double as accessibility checks.
- Real LLM calls in E2E: Never let E2E tests reach Ollama/OpenAI. Mock the WebSocket and gateway LLM calls. Real calls introduce flakiness and cost.
- Superuser DB connections in test seeds: The existing conftest uses the konstruct_app role to preserve RLS. E2E seeds should call the FastAPI admin API endpoints, not connect directly to the DB.
- Enabling service workers in tests: Serwist intercepts all requests. Always set serviceWorkers: "block" in Playwright config.
- Parallel workers with shared DB state: Set workers: 1 in CI. Tenant/agent mutations are not thread-safe across workers without per-worker isolation.
- Running visual regression on all browsers: Browser rendering engines produce expected pixel diffs. Visual regression on chromium only; cross-browser covered by functional E2E.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Screenshot diffs | Custom pixel comparator | toHaveScreenshot() built into Playwright | Handles baseline storage, update workflow, CI reporting |
| Accessibility scanning | Custom ARIA traversal | @axe-core/playwright | Covers 57 WCAG rules including ones humans miss |
| Performance score gating | Parsing Lighthouse JSON manually | @lhci/cli assert | Handles multi-run averaging, threshold config, exit codes |
| Auth state reuse | Logging in before every test | Playwright storageState | Session reuse makes the suite 10x faster |
| WS mock server | Running a real mock websocket server | page.routeWebSocket() | In-process, no port conflicts, no flakiness |
Common Pitfalls
Pitfall 1: Auth.js HttpOnly Cookies
What goes wrong: Trying to authenticate by calling /api/portal/auth/verify directly with Playwright request — this bypasses Auth.js cookie-setting, so the browser session never exists.
Why it happens: Auth.js v5 JWT is set as HttpOnly secure cookie by the Next.js server, not by the FastAPI backend.
How to avoid: Always use Playwright's UI login flow (fill form → submit → wait for redirect) to let Next.js set the cookie. Then save with storageState.
Warning signs: Tests pass the login assertion but fail immediately after on authenticated pages.
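A quick sanity check on the saved storageState file can catch this pitfall before any spec runs. The sketch below inspects the parsed JSON for a live session cookie. It assumes the Auth.js v5 default cookie name authjs.session-token (prefixed with __Secure- over HTTPS); findSessionCookie is a hypothetical helper, and the interfaces cover only the fields it reads.

```typescript
// Subset of the cookie entries found in a Playwright storageState file.
interface StoredCookie {
  name: string;
  value: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
  httpOnly: boolean;
}

interface StorageState {
  cookies: StoredCookie[];
}

// Assumed default Auth.js v5 cookie name; adjust if the portal overrides it.
// No end anchor, so chunked cookies (name plus ".0", ".1") also match.
const SESSION_COOKIE = /^(__Secure-)?authjs\.session-token/;

// Returns the first non-empty, unexpired session cookie, or undefined.
function findSessionCookie(state: StorageState, nowSeconds: number): StoredCookie | undefined {
  return state.cookies.find(
    (c) =>
      SESSION_COOKIE.test(c.name) &&
      c.value !== "" &&
      (c.expires === -1 || c.expires > nowSeconds),
  );
}
```

If the setup project saved state without going through the login form, this returns undefined and the failure is reported at setup time instead of surfacing as redirects in unrelated specs.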
Pitfall 2: Serwist Service Worker Intercepting Test Traffic
What goes wrong: page.route() and page.routeWebSocket() handlers never fire because the PWA service worker handles requests first.
Why it happens: Serwist registers a service worker that intercepts all requests matching its scope. Requests served by a service worker do not reach Playwright's routing layer, so the mock handlers only see the traffic once service workers are blocked.
How to avoid: Set serviceWorkers: "block" in playwright.config.ts under use.
Warning signs: Mock routes never called; tests see real responses or network errors.
Pitfall 3: Next.js Standalone Output Path for webServer
What goes wrong: command: "npm run start" is unreliable in CI because next start does not support a build made with output: "standalone"; Next.js points to the generated server.js instead.
Why it happens: The portal uses output: "standalone" in next.config.ts. The build produces .next/standalone/server.js, not the standard Next.js CLI server.
How to avoid: Use command: "node .next/standalone/server.js" in Playwright's webServer config. Copy static files if needed: the build step must run cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public.
Warning signs: webServer process exits immediately; Playwright reports "server did not start".
Pitfall 4: Visual Regression Baseline Committed Without CI Environment Lock
What goes wrong: Baselines created on a developer's Mac differ from Linux CI renderings (font rendering, subpixel AA, etc.).
Why it happens: Screenshot comparisons are pixel-exact. OS-level rendering differences cause 1–5% false failures.
How to avoid: Generate baselines inside the same Docker/Linux environment as CI. Run npx playwright test --update-snapshots on Linux (or in the Playwright Docker image) to commit initial baselines. Use maxDiffPixelRatio: 0.02 to absorb minor remaining differences.
Warning signs: Visual tests pass locally but always fail in CI.
Pitfall 5: Lighthouse Pages Behind Auth
What goes wrong: Lighthouse visits /dashboard and gets redirected to /login — scores an empty page.
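Why it happens: Lighthouse runs as an unauthenticated browser session, and the lighthouserc.json above does nothing to log in. LHCI documents a collect.puppeteerScript option for logging in before collection, but it is not configured here and has not been verified against Auth.js v5.
How to avoid: For authenticated pages, either (a) test only public pages with Lighthouse (login, landing), or (b) use LHCI's basicAuth option for pages behind HTTP auth (not applicable here), or (c) log in through the form with a collect.puppeteerScript, which adds a Puppeteer dependency, or (d) create a special unauthenticated preview mode. For this project: Run Lighthouse on /login only, plus any public-accessible marketing pages. Skip /dashboard and /chat for Lighthouse.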
Warning signs: Lighthouse scores 100 for accessibility on dashboard — suspiciously perfect because it's measuring an empty redirect.
Pitfall 6: WebSocket URL Resolution in Tests
What goes wrong: page.routeWebSocket("/chat/ws/") doesn't match because the portal derives the WS URL from NEXT_PUBLIC_API_URL (baked at build time), which points to ws://localhost:8001, not a relative path.
Why it happens: use-chat-socket.ts computes WS_BASE from process.env.NEXT_PUBLIC_API_URL and builds ws://localhost:8001/chat/ws/{id}.
How to avoid: Use a regex pattern: page.routeWebSocket(/\/chat\/ws\//, handler) — this matches the full absolute URL.
Warning signs: Chat mock never fires; test times out waiting for WS message.
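The matching behaviour can be checked in isolation. The sketch below derives an absolute WebSocket URL from an API base URL and tests it against the regex used in the chat spec. wsUrlFor is a hypothetical stand-in for the logic described in use-chat-socket.ts, whose actual code is not shown here.

```typescript
// Assumed derivation: swap the http(s) scheme for ws(s) and append the chat path.
function wsUrlFor(apiUrl: string, conversationId: string): string {
  const base = apiUrl.replace(/^http/, "ws"); // http -> ws, https -> wss
  return `${base}/chat/ws/${conversationId}`;
}

// Same pattern as passed to page.routeWebSocket() in the chat spec.
const CHAT_WS_PATTERN = /\/chat\/ws\//;

const url = wsUrlFor("http://localhost:8001", "abc123");
console.log(url, CHAT_WS_PATTERN.test(url));
```

Because the regex is unanchored, it matches regardless of host, port, or scheme, so the same spec keeps working if NEXT_PUBLIC_API_URL changes between local and CI builds.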
Pitfall 7: Gitea Actions Runner Needs Docker
What goes wrong: Service containers fail to start because the Gitea runner is not configured with Docker access.
Why it happens: Gitea Actions service containers require Docker socket access on the runner.
How to avoid: Ensure the act_runner is added to the docker group on the host. Alternative: use docker compose in a setup step instead of service containers.
Warning signs: Job fails immediately with "Cannot connect to Docker daemon".
Code Examples
Seed Helper via API
// e2e/helpers/seed.ts
// Uses Playwright APIRequestContext to create test data via FastAPI endpoints.
// Must run BEFORE storageState setup (needs platform_admin creds via env).
import type { APIRequestContext } from "@playwright/test";

export async function seedTestTenant(
  request: APIRequestContext,
): Promise<{ tenantId: string; tenantSlug: string }> {
  // Random suffix keeps repeated runs from colliding on the slug
  const suffix = Math.random().toString(36).slice(2, 8);
  const apiUrl = process.env.API_URL ?? "http://localhost:8001";
  const res = await request.post(`${apiUrl}/api/portal/tenants`, {
    headers: {
      "X-User-Id": process.env.E2E_ADMIN_ID!,
      "X-User-Role": "platform_admin",
      "X-Active-Tenant": "",
    },
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-tenant-${suffix}` },
  });
  // Fail here with the status rather than later with a confusing JSON error
  if (!res.ok()) {
    throw new Error(`Seeding tenant failed: ${res.status()} ${await res.text()}`);
  }
  const body = (await res.json()) as { id: string; slug: string };
  return { tenantId: body.id, tenantSlug: body.slug };
}
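The matching cleanup step (e2e/helpers/cleanup.ts in the project structure) could look like the sketch below. The DELETE /api/portal/tenants/{id} endpoint is an assumption, as is the cleanupTenants name; the structural DeleteCapable type exists only so the sketch runs without Playwright installed, and Playwright's APIRequestContext should satisfy it.

```typescript
// Minimal structural types covering the calls this helper makes.
interface DeleteResponse {
  ok(): boolean;
  status(): number;
}

interface DeleteCapable {
  delete(url: string, options?: { headers?: Record<string, string> }): Promise<DeleteResponse>;
}

// Deletes each seeded tenant and returns the IDs that could not be removed.
// Endpoint path is hypothetical; adjust it to the real admin API.
async function cleanupTenants(
  request: DeleteCapable,
  apiUrl: string,
  tenantIds: string[],
  headers: Record<string, string>,
): Promise<string[]> {
  const failed: string[] = [];
  for (const id of tenantIds) {
    const res = await request.delete(`${apiUrl}/api/portal/tenants/${id}`, { headers });
    // A 404 means the tenant is already gone, which is fine for cleanup
    if (!res.ok() && res.status() !== 404) {
      failed.push(id);
    }
  }
  return failed;
}
```

Returning the failed IDs instead of throwing lets a teardown log leftovers without masking the real test result.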
RBAC Test Pattern
// e2e/flows/rbac.spec.ts
// Tests that operator role is silently redirected, not 403-paged
test.describe("RBAC enforcement", () => {
test.use({ storageState: "playwright/.auth/customer-operator.json" });
const restrictedPaths = ["/agents/new", "/billing", "/users"];
for (const path of restrictedPaths) {
test(`operator cannot access ${path}`, async ({ page }) => {
await page.goto(path);
// proxy.ts does silent redirect — operator ends up on /dashboard
await expect(page).not.toHaveURL(path);
});
}
});
Mobile Viewport Behavioral Test
// e2e/flows/mobile.spec.ts
test("mobile: bottom tab bar renders, sidebar hidden", async ({ page }) => {
await page.setViewportSize({ width: 375, height: 812 });
await page.goto("/dashboard");
// Bottom tab bar visible
await expect(page.getByRole("navigation", { name: /mobile/i })).toBeVisible();
// Desktop sidebar hidden
await expect(page.getByRole("navigation", { name: /sidebar/i })).not.toBeVisible();
});
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
| Cypress for Next.js E2E | Playwright (official Next.js recommendation) | 2023–2024 | Cross-browser, better WS support, no iframe limitations |
| lighthouse npm module with custom scripts | @lhci/cli autorun | 2020+ | Automated multi-run averaging, assertions, CI reporting |
| axe-playwright (community) | @axe-core/playwright (official Deque) | 2022+ | Official package, same API, no extra wrapper |
| next start for E2E server | node .next/standalone/server.js | Next.js 12+ standalone | Required when output: "standalone" is set |
| middleware.ts | proxy.ts | Next.js 16 | Next.js 16 renamed middleware file |
Deprecated/outdated:
- cypress/integration/ directory: Cypress split this into cypress/e2e/ in v10 (not relevant here; we're not using Cypress)
- @playwright/test globalSetup string path: Still valid but the project-based setup dependency is preferred in Playwright 1.40+
- installSerwist(): Replaced by new Serwist() + addEventListeners() in serwist v9 (already applied in Phase 8)
Open Questions
- Lighthouse on authenticated pages
  - What we know: Lighthouse runs as unauthenticated; authenticated pages redirect to /login
  - What's unclear: Whether LHCI's documented collect.puppeteerScript login hook works with Auth.js v5 HttpOnly session cookies (not verified here)
  - Recommendation: Scope Lighthouse to /login only for QA-02. Dashboard/chat performance validated manually or via Web Vitals tracking in production.
- Visual regression baseline generation environment
  - What we know: OS-level rendering differences cause false failures
  - What's unclear: Whether the Gitea runner is Linux or Mac
  - Recommendation: Wave 0 task generates baselines inside the CI Docker container (Linux), commits them. Dev machines use --update-snapshots only deliberately.
- Celery worker in E2E
  - What we know: The chat WebSocket flow uses Redis pub-sub to deliver responses from the Celery worker
  - What's unclear: Whether E2E should run the Celery worker (real pipeline, slow) or mock the WS entirely (fast but less realistic)
  - Recommendation: Mock the WebSocket entirely via page.routeWebSocket(). This tests the frontend streaming UX without depending on Celery. Add a separate smoke test that hits the gateway /health endpoint to verify service health in CI.
Validation Architecture
Test Framework
| Property | Value |
|---|---|
| Framework (backend) | pytest 8.3+ / pytest-asyncio (existing, all tests pass) |
| Framework (E2E) | @playwright/test ^1.51 (to be installed) |
| Config file (E2E) | packages/portal/playwright.config.ts — Wave 0 |
| Quick run (backend) | uv run pytest tests/unit -x --tb=short |
| Full suite (backend) | uv run pytest tests/ -x --tb=short |
| E2E run | cd packages/portal && npx playwright test |
| Visual update | cd packages/portal && npx playwright test --update-snapshots |
Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|---|---|---|---|---|
| QA-01 | 7 critical user flows pass | E2E Playwright | npx playwright test e2e/flows/ --project=chromium | Wave 0 |
| QA-02 | Lighthouse >= 90 on key pages | Lighthouse CI | npx lhci autorun --config=e2e/lighthouse/lighthouserc.json | Wave 0 |
| QA-03 | Visual snapshots pass at 3 viewports | Visual regression | npx playwright test e2e/visual/ | Wave 0 |
| QA-04 | Zero critical a11y violations | Accessibility scan | npx playwright test e2e/accessibility/ | Wave 0 |
| QA-05 | All E2E flows pass on 3 browsers | Cross-browser E2E | npx playwright test e2e/flows/ (all projects) | Wave 0 |
| QA-06 | Empty/error/loading states correct | E2E Playwright | Covered within flow specs via API mocking | Wave 0 |
| QA-07 | CI pipeline runs in Gitea Actions | CI workflow | .gitea/workflows/ci.yml | Wave 0 |
Sampling Rate
- Per task commit: cd packages/portal && npx playwright test e2e/flows/login.spec.ts --project=chromium
- Per wave merge: cd packages/portal && npx playwright test e2e/flows/ --project=chromium
- Phase gate: Full suite (all projects + accessibility + visual) green before /gsd:verify-work
Wave 0 Gaps
- packages/portal/playwright.config.ts: E2E framework config
- packages/portal/e2e/auth.setup.ts: Auth state generation for 3 roles
- packages/portal/e2e/fixtures.ts: Shared test fixtures (axe, auth, API helpers)
- packages/portal/e2e/helpers/seed.ts: Test data seeding via API
- packages/portal/e2e/flows/*.spec.ts: 7 flow spec files
- packages/portal/e2e/accessibility/a11y.spec.ts: axe-core scans
- packages/portal/e2e/visual/snapshots.spec.ts: visual regression specs
- packages/portal/e2e/lighthouse/lighthouserc.json: Lighthouse CI config
- .gitea/workflows/ci.yml: CI pipeline
- packages/portal/playwright/.auth/.gitkeep: Directory for saved auth state (gitignored content)
- Framework install: cd packages/portal && npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli && npx playwright install --with-deps
- Baseline snapshots: run npx playwright test e2e/visual/ --update-snapshots on Linux to generate
Sources
Primary (HIGH confidence)
- https://playwright.dev/docs/auth — storageState, setup projects, multiple roles
- https://playwright.dev/docs/api/class-websocketroute — WebSocket mocking API
- https://playwright.dev/docs/test-snapshots — toHaveScreenshot, maxDiffPixelRatio
- https://playwright.dev/docs/accessibility-testing — @axe-core/playwright integration
- https://playwright.dev/docs/ci — CI configuration, Docker image, workers
- https://googlechrome.github.io/lighthouse-ci/docs/configuration.html — minScore assertions format
Secondary (MEDIUM confidence)
- https://googlechrome.github.io/lighthouse-ci/docs/getting-started.html — lhci autorun setup
- https://playwright.dev/docs/mock — page.route() and page.routeWebSocket() overview
- Gitea Actions docs (forum.gitea.com) — confirmed GitHub Actions YAML compatibility, Docker socket requirements
Tertiary (LOW confidence)
- WebSearch result: Gitea runner Docker group requirement — mentioned across multiple community posts, not in official docs
Metadata
Confidence breakdown:
- Standard stack: HIGH — verified against official Playwright, @axe-core, and LHCI docs
- Architecture: HIGH — patterns derived directly from official Playwright documentation
- Pitfalls: HIGH (pitfalls 1–6 from direct codebase inspection + official docs); MEDIUM (pitfall 7 from community sources)
Research date: 2026-03-25
Valid until: 2026-06-25 (90 days; Playwright and Next.js are fast-moving but breaking changes are rare)