Phase 9: Testing & QA - Research

Researched: 2026-03-25
Domain: Playwright E2E, Lighthouse CI, visual regression, axe-core accessibility, Gitea Actions CI
Confidence: HIGH

Summary

Phase 9 is a greenfield testing layer added on top of a fully-built portal (Next.js 16 standalone, FastAPI gateway, Celery worker). No Playwright config exists yet — the Playwright MCP plugin is installed for manual use but there is no playwright.config.ts, no tests/e2e/ content, and no .gitea/workflows/ CI file. Everything must be created from scratch.

The core challenges are: (1) Auth.js v5 JWT sessions that Playwright must obtain and reuse across multiple role fixtures (platform_admin, customer_admin, customer_operator); (2) the WebSocket chat flow at /chat/ws/{conversation_id} that needs mocking via page.routeWebSocket(); (3) Lighthouse CI that requires a running Next.js server (standalone output complicates startServerCommand); and (4) a sub-5-minute pipeline on Gitea Actions that is nearly syntax-identical to GitHub Actions.

Primary recommendation: Place Playwright config and tests inside packages/portal/ (Next.js co-location pattern), use storageState with three saved auth fixtures for roles, mock the WebSocket endpoint with page.routeWebSocket() for the chat flow, and run @lhci/cli in a separate post-build CI stage.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

All decisions at Claude's discretion — user trusts judgment.

  • Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
  • Critical flows to test (priority order):
    1. Login → dashboard loads → session persists
    2. Create tenant → tenant appears in list
    3. Deploy template agent → agent appears in employees list
    4. Chat: open conversation → send message → receive streaming response (mock LLM)
    5. RBAC: operator cannot access /agents/new, /billing, /users
    6. Language switcher → UI updates to selected language
    7. Mobile viewport: bottom tab bar renders, sidebar hidden
  • LLM responses mocked in E2E tests (no real Ollama/API calls)
  • Test data: seed a test tenant + test user via API calls in test setup, clean up after
  • Lighthouse targets: >= 90 (fail at 80, warn at 85)
  • Pages: login, dashboard, chat, agents/new
  • Visual regression at 3 viewports: desktop 1280x800, tablet 768x1024, mobile 375x812
  • Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
  • Baseline snapshots committed to repo
  • axe-core via @axe-core/playwright, zero critical violations required
  • "serious" violations logged as warnings (not blockers for beta)
  • Keyboard navigation test: Tab through login form, chat input, nav items
  • Cross-browser: chromium, firefox, webkit
  • Visual regression: chromium only
  • Gitea Actions, triggers: push to main, PR to main
  • Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
  • Docker Compose for CI infra
  • JUnit XML + HTML trace viewer reports
  • Fail-fast: lint/type errors block everything; unit test failures block E2E
  • Target: < 5 min pipeline

Claude's Discretion

  • Playwright config details (timeouts, retries, parallelism)
  • Test file organization (by feature vs by page)
  • Fixture/helper patterns for auth, tenant setup, API mocking
  • Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
  • Whether to include a smoke test for the WebSocket chat connection
  • Visual regression threshold (pixel diff tolerance)

Deferred Ideas (OUT OF SCOPE)

None: discussion stayed within phase scope
</user_constraints>

<phase_requirements>

Phase Requirements

| ID | Description | Research Support |
|----|-------------|------------------|
| QA-01 | Playwright E2E tests cover all critical user flows (login, tenant CRUD, agent deploy, chat, billing, RBAC) | Playwright storageState auth fixtures + routeWebSocket for chat mock |
| QA-02 | Lighthouse scores >= 90 for performance, accessibility, best practices, SEO on key pages | @lhci/cli with minScore assertions per category |
| QA-03 | Visual regression snapshots at desktop/tablet/mobile for all key pages | toHaveScreenshot with maxDiffPixelRatio, viewports per project |
| QA-04 | axe-core accessibility audit passes with zero critical violations across all pages | @axe-core/playwright AxeBuilder with impact filter |
| QA-05 | E2E tests pass on Chrome, Firefox, Safari (WebKit) | Playwright projects array with three browser engines |
| QA-06 | Empty states, error states, loading states tested and rendered correctly | Dedicated test cases + API mocking for empty/error responses |
| QA-07 | CI-ready test suite runnable in Gitea Actions pipeline | .gitea/workflows/ci.yml with Docker Compose service containers |
</phase_requirements>
</phase_requirements>

Standard Stack

Core

| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| @playwright/test | ^1.51 | E2E + visual regression + accessibility runner | Official Next.js recommendation, cross-browser built-in, no extra dependencies |
| @axe-core/playwright | ^4.10 | Accessibility scanning within Playwright tests | Official Deque package, integrates directly with Playwright page objects |
| @lhci/cli | ^0.15 | Lighthouse CI score assertions | Google-maintained, headless Lighthouse, assertion config via lighthouserc |

Supporting

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| axe-html-reporter | ^2.2 | HTML accessibility reports | When you want human-readable a11y reports attached to CI artifacts |

Alternatives Considered

| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| @lhci/cli | lighthouse npm module directly | @lhci/cli handles multi-run averaging, assertions, and CI upload; raw lighthouse requires custom scripting |
| @axe-core/playwright | axe-playwright (third-party) | @axe-core/playwright is the official Deque package; axe-playwright is a community wrapper with its own API (injectAxe/checkA11y) and an extra dependency |

Installation (portal):

cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit

Architecture Patterns

packages/portal/
├── playwright.config.ts          # Main config: projects (incl. auth setup project), webServer
├── e2e/
│   ├── auth.setup.ts             # Setup project: save storageState per role
│   ├── fixtures.ts               # Extended test: auth fixtures, axe builder, API helpers
│   ├── helpers/
│   │   ├── seed.ts               # Seed test tenant + user via API, return IDs
│   │   └── cleanup.ts            # Delete seeded data after test suite
│   ├── flows/
│   │   ├── login.spec.ts         # Flow 1: login → dashboard loads → session persists
│   │   ├── tenant-crud.spec.ts   # Flow 2: create tenant → appears in list
│   │   ├── agent-deploy.spec.ts  # Flow 3: deploy template → appears in employees
│   │   ├── chat.spec.ts          # Flow 4: open chat → send msg → streaming response (mocked WS)
│   │   ├── rbac.spec.ts          # Flow 5: operator access denied to restricted pages
│   │   ├── i18n.spec.ts          # Flow 6: language switcher → UI updates
│   │   └── mobile.spec.ts        # Flow 7: mobile viewport → bottom tab bar, sidebar hidden
│   ├── accessibility/
│   │   └── a11y.spec.ts          # axe-core scan on every key page, keyboard nav test
│   ├── visual/
│   │   └── snapshots.spec.ts     # Visual regression at 3 viewports (chromium only)
│   └── lighthouse/
│       └── lighthouserc.json     # @lhci/cli config: URLs, score thresholds
├── playwright/.auth/             # gitignored — saved storageState files
│   ├── platform-admin.json
│   ├── customer-admin.json
│   └── customer-operator.json
└── __snapshots__/                # Committed baseline screenshots (needs snapshotPathTemplate; Playwright's default is <spec>-snapshots/ next to each spec)
.gitea/
└── workflows/
    └── ci.yml                    # Pipeline: lint → typecheck → pytest → build → E2E → lhci

Pattern 1: Auth.js v5 storageState with Multiple Roles

What: Authenticate each role once in a global setup project, save to JSON. All E2E tests consume the saved state — no repeated login UI interactions.

When to use: Any test that requires a logged-in user. Each spec declares which role it needs via test.use({ storageState }).

Key insight for Auth.js v5: The credentials provider calls the FastAPI /api/portal/auth/verify endpoint, but the session cookie is minted by the Auth.js handler inside Next.js. Calling the FastAPI endpoint directly never produces that HttpOnly cookie. Playwright must therefore fill the login form and let the browser receive the cookie. storageState then captures it.

// Source: https://playwright.dev/docs/auth
// e2e/auth.setup.ts
import { test as setup, type Page } from "@playwright/test";
import path from "path";

const PLATFORM_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/platform-admin.json");
const CUSTOMER_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-admin.json");
const OPERATOR_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-operator.json");

// Log in through the real form so Auth.js (in Next.js) sets its session cookie,
// then persist the context's cookies and localStorage for reuse by test projects.
async function loginAndSave(page: Page, email: string, password: string, file: string) {
  await page.goto("/login");
  await page.getByLabel("Email").fill(email);
  await page.getByLabel("Password").fill(password);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("**/dashboard");
  await page.context().storageState({ path: file });
}

setup("authenticate as platform admin", async ({ page }) => {
  await loginAndSave(page, process.env.E2E_ADMIN_EMAIL!, process.env.E2E_ADMIN_PASSWORD!, PLATFORM_ADMIN_AUTH);
});

setup("authenticate as customer admin", async ({ page }) => {
  // Credentials for a customer_admin user (seeded beforehand or provided via env)
  await loginAndSave(page, process.env.E2E_CADMIN_EMAIL!, process.env.E2E_CADMIN_PASSWORD!, CUSTOMER_ADMIN_AUTH);
});

setup("authenticate as customer operator", async ({ page }) => {
  // E2E_OPERATOR_* variable names are assumed; the config and RBAC spec need this file
  await loginAndSave(page, process.env.E2E_OPERATOR_EMAIL!, process.env.E2E_OPERATOR_PASSWORD!, OPERATOR_AUTH);
});
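The setup above reads credentials through non-null assertions. A missing CI secret therefore only shows up later as a failed login. Below is a minimal sketch of a typed role table that fails fast with one clear error. The role keys and storageState paths come from this document. The helper names (`ROLES`, `credentialsFor`) and the E2E_OPERATOR_* variable names are assumptions, not existing project code.

```typescript
// Hypothetical helper (e.g. e2e/helpers/roles.ts); not existing project code.
export type Role = "platform_admin" | "customer_admin" | "customer_operator";

interface RoleConfig {
  storageState: string; // path relative to packages/portal
  emailVar: string; // env var holding the login email
  passwordVar: string; // env var holding the login password
}

export const ROLES: Record<Role, RoleConfig> = {
  platform_admin: {
    storageState: "playwright/.auth/platform-admin.json",
    emailVar: "E2E_ADMIN_EMAIL",
    passwordVar: "E2E_ADMIN_PASSWORD",
  },
  customer_admin: {
    storageState: "playwright/.auth/customer-admin.json",
    emailVar: "E2E_CADMIN_EMAIL",
    passwordVar: "E2E_CADMIN_PASSWORD",
  },
  customer_operator: {
    storageState: "playwright/.auth/customer-operator.json",
    // Assumed variable names; align them with the CI secrets actually configured.
    emailVar: "E2E_OPERATOR_EMAIL",
    passwordVar: "E2E_OPERATOR_PASSWORD",
  },
};

// Resolve credentials for a role, throwing one error that lists every missing variable.
export function credentialsFor(
  role: Role,
  env: Record<string, string | undefined>,
): { email: string; password: string } {
  const { emailVar, passwordVar } = ROLES[role];
  const missing = [emailVar, passwordVar].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing E2E credentials for ${role}: ${missing.join(", ")}`);
  }
  return { email: env[emailVar]!, password: env[passwordVar]! };
}
```

In auth.setup.ts this would replace the `!` assertions, for example `const { email, password } = credentialsFor("customer_operator", process.env);`. Specs could also use `test.use({ storageState: ROLES.customer_operator.storageState })` instead of repeating the path.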

Pattern 2: WebSocket Mocking for Chat Flow

What: Intercept the /chat/ws/{conversationId} WebSocket before the gateway is contacted. Respond to the auth message, then simulate streaming tokens on a user message.

When to use: Flow 4 (chat E2E test). The browser opens the socket directly against the gateway at ws://localhost:8001/chat/ws/{id}, a URL derived from NEXT_PUBLIC_API_URL at build time. Intercept it at the browser level.

// Source: https://playwright.dev/docs/api/class-websocketroute
// e2e/flows/chat.spec.ts
import { test, expect } from "@playwright/test";

test("chat: send message → receive streaming response", async ({ page }) => {
  // Register before page.goto so the socket opened on load is intercepted.
  // No ws.connectToServer() call, so the gateway is never contacted.
  await page.routeWebSocket(/\/chat\/ws\//, (ws) => {
    ws.onMessage((msg) => {
      const data = JSON.parse(typeof msg === "string" ? msg : msg.toString());

      if (data.type === "auth") {
        // Accept the auth frame silently; the mock sends no acknowledgement
        return;
      }

      if (data.type === "message") {
        // Simulate typing indicator
        ws.send(JSON.stringify({ type: "typing" }));
        // Simulate streaming tokens, 50 ms apart
        const tokens = ["Hello", " from", " your", " AI", " assistant!"];
        tokens.forEach((token, i) => {
          setTimeout(() => {
            ws.send(JSON.stringify({ type: "chunk", token }));
          }, i * 50);
        });
        // Final full response, then the end-of-stream marker
        setTimeout(() => {
          ws.send(JSON.stringify({
            type: "response",
            text: tokens.join(""),
            conversation_id: data.conversation_id,
          }));
          ws.send(JSON.stringify({ type: "done", text: tokens.join("") }));
        }, tokens.length * 50 + 100);
      }
    });
  });

  await page.goto("/chat?agentId=test-agent");
  await page.getByPlaceholder(/type a message/i).fill("Hello!");
  await page.keyboard.press("Enter");
  await expect(page.getByText("Hello from your AI assistant!")).toBeVisible({ timeout: 5000 });
});
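The timing logic in the mock can be pulled out into a pure function. That makes the frame order reviewable and testable without a browser. Below is a minimal sketch. It assumes the frame types used in the mock above (typing, chunk, response, done) match what the portal's chat hook expects. `buildStreamSchedule` is a hypothetical name.

```typescript
// Hypothetical helper: precomputes every frame the chat mock sends, with its delay,
// so the routeWebSocket handler only has to replay them.
// Frame shapes mirror the mock above; they are not verified against use-chat-socket.ts.
export type ChatFrame =
  | { type: "typing" }
  | { type: "chunk"; token: string }
  | { type: "response"; text: string; conversation_id: string }
  | { type: "done"; text: string };

export interface ScheduledFrame {
  delayMs: number; // delay after the user's message arrives
  frame: ChatFrame;
}

export function buildStreamSchedule(
  tokens: string[],
  conversationId: string,
  intervalMs = 50,
  settleMs = 100,
): ScheduledFrame[] {
  const text = tokens.join("");
  // Typing indicator goes out immediately
  const schedule: ScheduledFrame[] = [{ delayMs: 0, frame: { type: "typing" } }];
  // One chunk per token, evenly spaced
  tokens.forEach((token, i) => {
    schedule.push({ delayMs: i * intervalMs, frame: { type: "chunk", token } });
  });
  // Full response and end marker after the last chunk plus a settle period
  const finalDelay = tokens.length * intervalMs + settleMs;
  schedule.push({ delayMs: finalDelay, frame: { type: "response", text, conversation_id: conversationId } });
  schedule.push({ delayMs: finalDelay, frame: { type: "done", text } });
  return schedule;
}
```

Inside the routeWebSocket handler, the schedule replays as `for (const { delayMs, frame } of buildStreamSchedule(tokens, data.conversation_id)) setTimeout(() => ws.send(JSON.stringify(frame)), delayMs);`. Timers with equal delays fire in registration order, so the typing frame still precedes the first chunk.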

Pattern 3: Visual Regression at Multiple Viewports

What: Configure separate Playwright projects for each viewport, run snapshots only on chromium to avoid cross-browser rendering diffs.

When to use: QA-03. Visual regression baseline committed to repo; CI fails on diff.

// Source: https://playwright.dev/docs/test-snapshots
// playwright.config.ts (visual projects section)
{
  name: "visual-desktop",
  use: {
    ...devices["Desktop Chrome"],
    viewport: { width: 1280, height: 800 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-tablet",
  use: {
    browserName: "chromium",
    viewport: { width: 768, height: 1024 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-mobile",
  use: {
    ...devices["iPhone 12"],
    browserName: "chromium",               // the iPhone 12 descriptor defaults to WebKit
    viewport: { width: 375, height: 812 },
  },
  testMatch: "e2e/visual/**",
},
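Building the visual projects from one viewport table keeps the locked 1280x800, 768x1024 and 375x812 sizes in a single place. It also makes the Chromium-only rule explicit. Below is a minimal sketch with a hypothetical `visualProjects` helper. It returns plain objects shaped like Playwright project entries, so it runs without importing @playwright/test.

```typescript
// Hypothetical helper: derives the three visual-regression projects from the
// viewport table in the locked decisions, pinning every one to Chromium.
interface VisualViewport {
  name: "desktop" | "tablet" | "mobile";
  width: number;
  height: number;
  isMobile: boolean;
}

const VIEWPORTS: VisualViewport[] = [
  { name: "desktop", width: 1280, height: 800, isMobile: false },
  { name: "tablet", width: 768, height: 1024, isMobile: false },
  { name: "mobile", width: 375, height: 812, isMobile: true },
];

export function visualProjects(storageState: string) {
  return VIEWPORTS.map((vp) => ({
    name: `visual-${vp.name}`,
    testMatch: "e2e/visual/**",
    dependencies: ["setup"],
    use: {
      browserName: "chromium" as const, // never inherit WebKit from an iPhone descriptor
      viewport: { width: vp.width, height: vp.height },
      isMobile: vp.isMobile, // supported in Chromium (not in Firefox)
      hasTouch: vp.isMobile,
      storageState, // authenticated pages such as /dashboard need a session
    },
  }));
}
```

Usage in playwright.config.ts could look like `projects: [...otherProjects, ...visualProjects("playwright/.auth/platform-admin.json")]`. Setting isMobile and hasTouch directly, instead of spreading an iPhone descriptor, avoids inheriting both WebKit and the descriptor's own 390x664 viewport.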

Global threshold:

// playwright.config.ts
expect: {
  toHaveScreenshot: {
    maxDiffPixelRatio: 0.02, // 2% tolerance — accounts for antialiasing
    threshold: 0.2,          // per-pixel color distance threshold (0 to 1)
  },
},
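The two knobs do different jobs:

- threshold (0 to 1) decides whether an individual pixel's color change counts as different.
- maxDiffPixelRatio caps the share of pixels that may differ.

The arithmetic below shows roughly how many pixels 2% allows per viewport. It assumes a viewport-sized (not fullPage) screenshot at toHaveScreenshot's default CSS scale. `diffPixelBudget` is a hypothetical helper.

```typescript
// Rough size of the maxDiffPixelRatio budget for a viewport-sized screenshot,
// assuming one image pixel per CSS pixel. Full-page shots are taller, so their budget grows.
export function diffPixelBudget(width: number, height: number, maxDiffPixelRatio: number): number {
  return Math.round(width * height * maxDiffPixelRatio);
}
```

At desktop size the budget is about 20,480 pixels, enough to hide a small change such as a shifted icon. If such regressions matter, a lower ratio or an absolute maxDiffPixels limit is the stricter choice.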

Pattern 4: axe-core Fixture

What: Shared fixture that creates an AxeBuilder for each page, scoped to WCAG 2.1 AA, filtering results by impact level.

// Source: https://playwright.dev/docs/accessibility-testing
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

export const test = base.extend<{ axe: () => AxeBuilder }>({
  axe: async ({ page }, use) => {
    // Factory: each call returns a fresh builder bound to the current page
    const makeBuilder = () =>
      new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"]);
    await use(makeBuilder);
  },
});
export { expect };

// e2e/accessibility/a11y.spec.ts
import { test, expect } from "../fixtures";

test("login page: no critical a11y violations", async ({ page, axe }) => {
  await page.goto("/login");
  const results = await axe().analyze();
  const criticalViolations = results.violations.filter(v => v.impact === "critical");
  const seriousViolations = results.violations.filter(v => v.impact === "serious");

  expect(criticalViolations, "Critical a11y violations found").toHaveLength(0);
  if (seriousViolations.length > 0) {
    console.warn("Serious a11y violations (non-blocking):", seriousViolations);
  }
});
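Logging whole violation objects with console.warn dumps every affected DOM node into the CI log. Below is a minimal sketch of a triage helper. It keeps the critical list for the assertion and reduces serious findings to one line per rule. It uses a structural type covering only the fields it reads (id, impact, nodes), which axe-core's result objects provide. `triageViolations` is a hypothetical name.

```typescript
// Minimal structural type: only the fields this helper reads from axe results.
interface ViolationLike {
  id: string;
  impact?: "minor" | "moderate" | "serious" | "critical" | null;
  nodes: unknown[];
}

export function triageViolations(violations: ViolationLike[]) {
  const critical = violations.filter((v) => v.impact === "critical");
  const serious = violations.filter((v) => v.impact === "serious");
  // One compact line per serious rule keeps CI logs readable.
  const seriousSummary = serious.map((v) => `${v.id} (${v.nodes.length} node(s))`);
  return { critical, serious, seriousSummary };
}
```

In the spec, `expect(critical).toHaveLength(0)` stays the blocking check, while `console.warn(seriousSummary.join("\n"))` replaces the full dump.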

Pattern 5: Lighthouse CI Config

What: lighthouserc.json drives @lhci/cli autorun in CI. Pages run headlessly against the built portal.

// Source: https://googlechrome.github.io/lighthouse-ci/docs/configuration.html
// e2e/lighthouse/lighthouserc.json
{
  "ci": {
    "collect": {
      "startServerCommand": "node .next/standalone/server.js",
      "url": [
        "http://localhost:3000/login"
      ],
      "numberOfRuns": 1,
      "settings": {
        "preset": "desktop",
        "chromeFlags": "--no-sandbox --disable-dev-shm-usage"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance":     ["error", {"minScore": 0.80}],
        "categories:accessibility":   ["error", {"minScore": 0.80}],
        "categories:best-practices":  ["error", {"minScore": 0.80}],
        "categories:seo":             ["error", {"minScore": 0.80}]
      }
    },
    "upload": {
      "target": "filesystem",
      "outputDir": ".lighthouseci"
    }
  }
}

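Note: with "error" at 0.80, CI fails below 80; the 90 target is aspirational. LHCI takes one level per assertion key, so the 85 soft-alert tier cannot sit next to the error tier in this file. One option is a second pass against the same collected results: lhci assert --config=<separate warn config>, with category assertions set to "warn" at minScore 0.85.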
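Three thresholds are in play: fail below 80, warn below 85, target 90. They map onto LHCI's 0 to 1 scores as four bands. Below is a minimal sketch of that mapping, for example to summarize results in a PR comment. `classifyScore` and the band names are assumptions. LHCI itself only knows the error and warn levels set in its assertion configs.

```typescript
type Verdict = "fail" | "warn" | "below-target" | "pass";

// LHCI category scores are 0..1, so the 80/85/90 targets become 0.8/0.85/0.9.
interface Tiers {
  error: number;
  warn: number;
  target: number;
}

export const TIERS: Tiers = { error: 0.8, warn: 0.85, target: 0.9 };

export function classifyScore(score: number, tiers: Tiers = TIERS): Verdict {
  if (score < tiers.error) return "fail"; // below minScore of the "error" assertion
  if (score < tiers.warn) return "warn"; // would be flagged by a warn-level pass
  if (score < tiers.target) return "below-target"; // passes CI but misses the 90 goal
  return "pass";
}
```

For example, `classifyScore(0.87)` returns "below-target": CI passes and no warning fires, but the page still misses the 90 goal.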

Pattern 6: Playwright Config (Full)

// packages/portal/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  fullyParallel: false,        // Stability in CI with shared DB state
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 1 : undefined,
  timeout: 30_000,

  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["junit", { outputFile: "playwright-results.xml" }],
    ["list"],
  ],

  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    serviceWorkers: "block",   // Prevents Serwist from intercepting test requests
  },

  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.02,
      threshold: 0.2,
    },
  },

  projects: [
    // Auth setup runs first for all browser projects
    { name: "setup", testMatch: /auth\.setup\.ts/ },

    // E2E flows — all 3 browsers
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },

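    // Visual regression: chromium only, 3 viewports. browserName is forced on mobile because
    // the iPhone 12 descriptor defaults to WebKit with a 390x664 viewport.
    // Specs that snapshot /login should override with an empty storageState.
    { name: "visual-desktop",  use: { browserName: "chromium", viewport: { width: 1280, height: 800 }, storageState: "playwright/.auth/platform-admin.json" }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    { name: "visual-tablet",   use: { browserName: "chromium", viewport: { width: 768, height: 1024 }, storageState: "playwright/.auth/platform-admin.json" }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    { name: "visual-mobile",   use: { ...devices["iPhone 12"], browserName: "chromium", viewport: { width: 375, height: 812 }, storageState: "playwright/.auth/platform-admin.json" }, testMatch: "e2e/visual/**", dependencies: ["setup"] },

    // Accessibility: chromium only
    {
      name: "a11y",
      use: { ...devices["Desktop Chrome"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/accessibility/**",
    },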
  ],

  webServer: {
    command: "node .next/standalone/server.js",   // static assets must be copied first (see Pitfall 3)
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    env: {
      PORT: "3000",
      HOSTNAME: "0.0.0.0",   // standalone server.js binds to $HOSTNAME, which Docker sets to the container ID
      API_URL: process.env.API_URL ?? "http://localhost:8001",
      AUTH_SECRET: process.env.AUTH_SECRET ?? "test-secret-32-chars-minimum-len",
      AUTH_URL: "http://localhost:3000",
    },
  },
});

Critical: serviceWorkers: "block" is required because Serwist (PWA service worker) intercepts network requests and makes them invisible to page.route() / page.routeWebSocket().

Pattern 7: Gitea Actions CI Pipeline

# .gitea/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
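    # The localhost hosts below assume the job runs on the host with service ports published;
    # in act_runner's default container mode, reach services by name (postgres, redis) instead.
    services: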
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
    env:
      DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
      DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
      REDIS_URL: redis://localhost:6379/0
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv sync
      - run: uv run ruff check packages/ tests/
      - run: uv run mypy --strict packages/
      - run: uv run pytest tests/ -x --tb=short

  portal:
    name: Portal E2E
    runs-on: ubuntu-latest
    needs: backend          # E2E blocked until backend passes
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: --health-cmd pg_isready --health-interval 5s --health-retries 10
      redis:
        image: redis:7-alpine
    steps:
      - uses: actions/checkout@v4
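      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - uses: actions/setup-python@v5   # the gateway step below needs Python for uv
        with: { python-version: "3.12" }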
      - name: Install portal deps
        working-directory: packages/portal
        run: npm ci
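      - name: Build portal
        working-directory: packages/portal
        run: |
          npm run build
          # output: "standalone" omits static assets; copy them next to server.js
          cp -r .next/static .next/standalone/.next/static
          if [ -d public ]; then cp -r public .next/standalone/public; fi
        env:
          NEXT_PUBLIC_API_URL: http://localhost:8001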
      - name: Install Playwright browsers
        working-directory: packages/portal
        run: npx playwright install --with-deps chromium firefox webkit
      - name: Start gateway (background)
        run: |
          pip install uv && uv sync
          uv run alembic upgrade head
          uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
        env:
          DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
          DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
          REDIS_URL: redis://localhost:6379/0
          LLM_POOL_URL: http://localhost:8004  # not running — mocked in E2E
      - name: Wait for gateway
        run: timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
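      - name: Run E2E tests
        working-directory: packages/portal
        run: npx playwright test e2e/flows/ e2e/accessibility/
        env:
          CI: "true"
          PLAYWRIGHT_BASE_URL: http://localhost:3000
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          E2E_ADMIN_EMAIL: ${{ secrets.E2E_ADMIN_EMAIL }}
          E2E_ADMIN_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }}
          # Customer admin and operator sessions are needed by auth.setup.ts and the RBAC spec
          E2E_CADMIN_EMAIL: ${{ secrets.E2E_CADMIN_EMAIL }}
          E2E_CADMIN_PASSWORD: ${{ secrets.E2E_CADMIN_PASSWORD }}
          E2E_OPERATOR_EMAIL: ${{ secrets.E2E_OPERATOR_EMAIL }}        # assumed secret name
          E2E_OPERATOR_PASSWORD: ${{ secrets.E2E_OPERATOR_PASSWORD }}  # assumed secret name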
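      - name: Run Lighthouse CI
        working-directory: packages/portal
        run: |
          npx lhci autorun --config=e2e/lighthouse/lighthouserc.json
        env:
          LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
          # Playwright's webServer has exited by now; if lighthouserc starts the
          # standalone server itself, that server needs the same runtime env
          PORT: "3000"
          HOSTNAME: "0.0.0.0"
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          AUTH_URL: http://localhost:3000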
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: packages/portal/playwright-report/
      - name: Upload Lighthouse report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-report
          path: packages/portal/.lighthouseci/

Anti-Patterns to Avoid

  • Hardcoded IDs in selectors: Use getByRole, getByLabel, getByText — never CSS #id or [data-testid] unless semantic selectors are unavailable. Semantic selectors are more resilient and double as accessibility checks.
  • Real LLM calls in E2E: Never let E2E tests reach Ollama/OpenAI. Mock the WebSocket and gateway LLM calls. Real calls introduce flakiness and cost.
  • Superuser DB connections in test seeds: The existing conftest uses konstruct_app role to preserve RLS. E2E seeds should call the FastAPI admin API endpoints, not connect directly to the DB.
  • Enabling service workers in tests: Serwist intercepts all requests. Always set serviceWorkers: "block" in Playwright config.
  • Parallel workers with shared DB state: Set workers: 1 in CI. Tenant/agent mutations are not thread-safe across workers without per-worker isolation.
  • Running visual regression on all browsers: Each rendering engine produces its own legitimate pixel differences. Run visual regression on chromium only; cross-browser behavior is covered by the functional E2E suite.

Don't Hand-Roll

| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Screenshot diffs | Custom pixel comparator | toHaveScreenshot() built into Playwright | Handles baseline storage, update workflow, CI reporting |
| Accessibility scanning | Custom ARIA traversal | @axe-core/playwright | Maintained WCAG rule set that catches issues manual review often misses |
| Performance score gating | Parsing Lighthouse JSON manually | @lhci/cli assert | Handles multi-run averaging, threshold config, exit codes |
| Auth state reuse | Logging in before every test | Playwright storageState | Removes a UI login from every test, one of the slowest steps in an E2E run |
| WS mock server | Running a real mock websocket server | page.routeWebSocket() | In-process, no port conflicts, no extra service to keep alive |

Common Pitfalls

Pitfall 1: Auth.js HttpOnly Cookies

What goes wrong: Trying to authenticate by calling /api/portal/auth/verify directly with Playwright's request API bypasses Auth.js cookie-setting, so the browser session never exists.
Why it happens: The Auth.js v5 JWT is set as an HttpOnly secure cookie by the Next.js server, not by the FastAPI backend.
How to avoid: Always use Playwright's UI login flow (fill form → submit → wait for redirect) to let Next.js set the cookie. Then save it with storageState.
Warning signs: Tests pass the login assertion but fail immediately after on authenticated pages.

Pitfall 2: Serwist Service Worker Intercepting Test Traffic

What goes wrong: page.route() and page.routeWebSocket() handlers never fire because the PWA service worker handles requests first.
Why it happens: Serwist registers a service worker that intercepts all requests within its scope. Requests the service worker answers never reach Playwright's page-level routing. Blocking service workers sends them back through the network path that Playwright intercepts.
How to avoid: Set serviceWorkers: "block" under use in playwright.config.ts.
Warning signs: Mock routes are never called; tests see real responses or network errors.

Pitfall 3: Next.js Standalone Output Path for webServer

What goes wrong: command: "npm run start" (next start) is not a supported way to serve an output: "standalone" build. Next.js warns that next start does not work with standalone output and points to the standalone server instead.
Why it happens: The portal uses output: "standalone" in next.config.ts. The build produces .next/standalone/server.js, a self-contained server that replaces the standard next start CLI.
How to avoid: Use command: "node .next/standalone/server.js" in Playwright's webServer config. The standalone output does not include static assets, so the build step must also run cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public.
Warning signs: The webServer process exits immediately and Playwright reports that the web server could not start. If only the copy step is missing, pages load without their JS/CSS (404s under /_next/static).
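The copy step can also live in a small Node script, which is easier to share between CI and local runs than shell one-liners. Below is a minimal sketch. It assumes the default Next.js layout (.next/static plus an optional public/). `prepareStandalone` and its file location are hypothetical.

```typescript
// Hypothetical script (e.g. scripts/prepare-standalone.ts): copies the assets that
// output: "standalone" leaves out, mirroring the cp commands above.
import { cpSync, existsSync } from "node:fs";
import { join } from "node:path";

export function prepareStandalone(portalDir: string): string[] {
  const copied: string[] = [];
  const standalone = join(portalDir, ".next", "standalone");
  if (!existsSync(standalone)) {
    throw new Error(`No standalone build at ${standalone}; run next build first`);
  }
  // _next/static chunks: without them pages render without JS/CSS.
  const staticSrc = join(portalDir, ".next", "static");
  if (existsSync(staticSrc)) {
    cpSync(staticSrc, join(standalone, ".next", "static"), { recursive: true });
    copied.push(".next/static");
  }
  // public/ is optional in a Next.js app.
  const publicSrc = join(portalDir, "public");
  if (existsSync(publicSrc)) {
    cpSync(publicSrc, join(standalone, "public"), { recursive: true });
    copied.push("public");
  }
  return copied;
}
```

Returning the list of copied directories lets a CI log show at a glance whether public/ was present.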

Pitfall 4: Visual Regression Baseline Committed Without CI Environment Lock

What goes wrong: Baselines created on a developer's Mac differ from Linux CI renderings (font rendering, subpixel antialiasing, etc.).
Why it happens: Screenshot comparisons are close to pixel-exact, and OS-level rendering differences cause 1-5% false failures. Playwright also appends the platform to snapshot file names by default (for example -darwin vs -linux), so Mac baselines are not even picked up on a Linux runner.
How to avoid: Generate baselines inside the same Docker/Linux environment as CI. Run npx playwright test --update-snapshots on Linux (or in the Playwright Docker image) to commit initial baselines. Use maxDiffPixelRatio: 0.02 to absorb minor remaining differences.
Warning signs: Visual tests pass locally but always fail in CI.

Pitfall 5: Lighthouse Pages Behind Auth

What goes wrong: Lighthouse visits /dashboard, is redirected to /login, and scores the login page instead.
Why it happens: Lighthouse runs in a fresh, unauthenticated browser session, and LHCI has no built-in Auth.js integration.
How to avoid: For authenticated pages, there are three options:
  (a) run Lighthouse only on public pages (login, landing);
  (b) supply an LHCI collect.puppeteerScript that signs in before the run, with settings.disableStorageReset enabled so the session cookie survives;
  (c) create a special unauthenticated preview mode.
For this project: run Lighthouse on /login only, plus any publicly accessible marketing pages, and skip /dashboard and /chat.
Warning signs: Dashboard or chat scores match the login page's scores, because Lighthouse is measuring the redirect target.

Pitfall 6: WebSocket URL Resolution in Tests

What goes wrong: page.routeWebSocket("/chat/ws/") doesn't match. The portal derives the WS URL from NEXT_PUBLIC_API_URL (baked in at build time), which points at ws://localhost:8001. A relative string pattern is instead resolved against baseURL (port 3000).
Why it happens: use-chat-socket.ts computes WS_BASE from process.env.NEXT_PUBLIC_API_URL and builds ws://localhost:8001/chat/ws/{id}.
How to avoid: Use a regex pattern, page.routeWebSocket(/\/chat\/ws\//, handler), which matches anywhere in the full absolute URL.
Warning signs: The chat mock never fires; the test times out waiting for a WS message.
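Reconstructing the URL the hook builds shows why only the regex form matches. Below is a minimal sketch. It assumes the hook swaps http(s) for ws(s) on NEXT_PUBLIC_API_URL and appends /chat/ws/{id}. `chatSocketUrl` is a hypothetical stand-in for logic in use-chat-socket.ts, not a copy of it.

```typescript
// Hypothetical reconstruction of the chat socket URL (WS_BASE from NEXT_PUBLIC_API_URL);
// the real hook may differ in detail.
export function chatSocketUrl(apiUrl: string, conversationId: string): string {
  const url = new URL(apiUrl);
  // http -> ws, https -> wss; the explicit port (e.g. 8001) is preserved
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";
  url.pathname = `/chat/ws/${encodeURIComponent(conversationId)}`;
  return url.toString();
}

// The pattern passed to page.routeWebSocket(): matches anywhere in the absolute URL.
export const CHAT_WS_PATTERN = /\/chat\/ws\//;
```

By contrast, `new URL("/chat/ws/", "http://localhost:3000")` lands on port 3000, which is why a relative string pattern never sees the gateway socket on 8001.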

Pitfall 7: Gitea Actions Runner Needs Docker

What goes wrong: Service containers fail to start because the Gitea runner is not configured with Docker access.
Why it happens: Gitea Actions service containers require Docker socket access on the runner.
How to avoid: Ensure the act_runner user is added to the docker group on the host. Alternative: use docker compose in a setup step instead of service containers.
Warning signs: The job fails immediately with "Cannot connect to Docker daemon".

Code Examples

Seed Helper via API

// e2e/helpers/seed.ts
// Uses Playwright APIRequestContext to create test data via FastAPI endpoints.
// Must run BEFORE storageState setup (needs platform_admin creds via env).
import type { APIRequestContext } from "@playwright/test";

const API_URL = process.env.API_URL ?? "http://localhost:8001";

export async function seedTestTenant(request: APIRequestContext): Promise<{ tenantId: string; tenantSlug: string }> {
  // Random suffix keeps names and slugs unique across runs and retries
  const suffix = Math.random().toString(36).slice(2, 8);
  const res = await request.post(`${API_URL}/api/portal/tenants`, {
    headers: {
      "X-User-Id": process.env.E2E_ADMIN_ID!,
      "X-User-Role": "platform_admin",
      "X-Active-Tenant": "",
    },
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-tenant-${suffix}` },
  });
  // Fail loudly: a silent 4xx here would surface later as confusing UI assertions
  if (!res.ok()) {
    throw new Error(`Tenant seed failed: ${res.status()} ${await res.text()}`);
  }
  const body = (await res.json()) as { id: string; slug: string };
  return { tenantId: body.id, tenantSlug: body.slug };
}
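The locked decisions call for seeded data to be cleaned up after the run, and the file tree reserves helpers/cleanup.ts for it. Below is a minimal sketch of the bookkeeping half. Seed helpers record what they create, and teardown drains the list newest-first so dependents (agents, users) are removed before their tenant. `SeedRegistry` is hypothetical. The actual delete endpoints are not shown, because this document does not name them.

```typescript
// Hypothetical helper: records seeded entities so teardown can delete them
// in reverse creation order. The delete calls themselves are left to the caller.
export type SeededKind = "tenant" | "user" | "agent";

export class SeedRegistry {
  private readonly created: { kind: SeededKind; id: string }[] = [];

  record(kind: SeededKind, id: string): void {
    this.created.push({ kind, id });
  }

  // Returns items newest-first and empties the registry, so teardown runs once.
  drainForCleanup(): { kind: SeededKind; id: string }[] {
    return this.created.splice(0).reverse();
  }
}
```

A suite-level afterAll would iterate over `registry.drainForCleanup()` and issue one delete per item. Emptying the registry on drain keeps a retried teardown from deleting the same IDs twice.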

RBAC Test Pattern

// e2e/flows/rbac.spec.ts
// Tests that operator role is silently redirected, not 403-paged
import { test, expect } from "@playwright/test";

test.describe("RBAC enforcement", () => {
  test.use({ storageState: "playwright/.auth/customer-operator.json" });

  const restrictedPaths = ["/agents/new", "/billing", "/users"];

  for (const path of restrictedPaths) {
    test(`operator cannot access ${path}`, async ({ page }) => {
      await page.goto(path);
      // proxy.ts does a silent redirect: the operator should land on /dashboard
      await expect(page).toHaveURL(/\/dashboard/);
    });
  }
});
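Keeping the restricted paths in one table lets the same data drive both the negative operator tests above and positive checks for other roles. Below is a minimal sketch that mirrors only the operator restrictions listed in the locked decisions. Leaving the admin roles unrestricted is an assumption. `isBlocked` is hypothetical, and proxy.ts stays the source of truth for how paths are actually matched.

```typescript
// Hypothetical mirror of the operator restrictions from the locked decisions.
// Admin roles are left unrestricted here as an assumption; proxy.ts is authoritative.
export type PortalRole = "platform_admin" | "customer_admin" | "customer_operator";

const RESTRICTED: Record<PortalRole, readonly string[]> = {
  platform_admin: [],
  customer_admin: [],
  customer_operator: ["/agents/new", "/billing", "/users"],
};

// Prefix match on whole path segments: "/billing" blocks "/billing/invoices"
// but not "/billing-help".
export function isBlocked(role: PortalRole, pathname: string): boolean {
  return RESTRICTED[role].some(
    (prefix) => pathname === prefix || pathname.startsWith(`${prefix}/`),
  );
}
```

The spec's `restrictedPaths` array could then be derived from the same table. That keeps the tests and any future role changes in one place.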

Mobile Viewport Behavioral Test

// e2e/flows/mobile.spec.ts
import { test, expect } from "@playwright/test";

test("mobile: bottom tab bar renders, sidebar hidden", async ({ page }) => {
  await page.setViewportSize({ width: 375, height: 812 });
  await page.goto("/dashboard");
  // Accessible names assume nav landmarks with aria-labels containing "mobile" / "sidebar"
  // Bottom tab bar visible
  await expect(page.getByRole("navigation", { name: /mobile/i })).toBeVisible();
  // Desktop sidebar hidden
  await expect(page.getByRole("navigation", { name: /sidebar/i })).not.toBeVisible();
});

State of the Art

| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Cypress for Next.js E2E | Playwright (official Next.js recommendation) | 2023-2024 | Cross-browser, better WS support, no iframe limitations |
| lighthouse npm module with custom scripts | @lhci/cli autorun | 2020+ | Automated multi-run averaging, assertions, CI reporting |
| axe-playwright (community) | @axe-core/playwright (official Deque) | 2022+ | Official package with the chainable AxeBuilder API, no community wrapper |
| next start for E2E server | node .next/standalone/server.js | Next.js 12+ standalone | Required when output: "standalone" is set |
| middleware.ts | proxy.ts | Next.js 16 | Next.js 16 renamed the middleware file |

Deprecated/outdated:

  • cypress/integration/ directory: Cypress renamed this to cypress/e2e/ in v10 (not relevant here, since we're not using Cypress)
  • @playwright/test globalSetup string path: Still valid, but project dependencies (available since Playwright 1.31) are the preferred way to run setup
  • installSerwist(): Replaced by new Serwist() + addEventListeners() in serwist v9 (already applied in Phase 8)

Open Questions

  1. Lighthouse on authenticated pages

    • What we know: Lighthouse runs as unauthenticated — authenticated pages redirect to /login
    • What's unclear: Whether signing in through LHCI's collect.puppeteerScript hook (which adds puppeteer as a dev dependency) is worth the setup cost in this phase
    • Recommendation: Scope Lighthouse to /login only for QA-02. Dashboard/chat performance validated manually or via Web Vitals tracking in production.
  2. Visual regression baseline generation environment

    • What we know: OS-level rendering differences cause false failures
    • What's unclear: Whether the Gitea runner is Linux or Mac
    • Recommendation: Wave 0 task generates baselines inside the CI Docker container (Linux), commits them. Dev machines use --update-snapshots only deliberately.
  3. Celery worker in E2E

    • What we know: The chat WebSocket flow uses Redis pub-sub to deliver responses from the Celery worker
    • What's unclear: Whether E2E should run the Celery worker (real pipeline, slow) or mock the WS entirely (fast but less realistic)
    • Recommendation: Mock the WebSocket entirely via page.routeWebSocket(). This tests the frontend streaming UX without depending on Celery. Add a separate smoke test that hits the gateway /health endpoint to verify service health in CI.

Validation Architecture

Test Framework

| Property | Value |
|----------|-------|
| Framework (backend) | pytest 8.3+ / pytest-asyncio (existing, all tests pass) |
| Framework (E2E) | @playwright/test ^1.51 (to be installed) |
| Config file (E2E) | packages/portal/playwright.config.ts (Wave 0) |
| Quick run (backend) | uv run pytest tests/unit -x --tb=short |
| Full suite (backend) | uv run pytest tests/ -x --tb=short |
| E2E run | cd packages/portal && npx playwright test |
| Visual update | cd packages/portal && npx playwright test --update-snapshots |

Phase Requirements → Test Map

| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|--------------|
| QA-01 | 7 critical user flows pass | E2E Playwright | npx playwright test e2e/flows/ --project=chromium | Wave 0 |
| QA-02 | Lighthouse >= 90 on key pages | Lighthouse CI | npx lhci autorun --config=e2e/lighthouse/lighthouserc.json | Wave 0 |
| QA-03 | Visual snapshots pass at 3 viewports | Visual regression | npx playwright test e2e/visual/ | Wave 0 |
| QA-04 | Zero critical a11y violations | Accessibility scan | npx playwright test e2e/accessibility/ | Wave 0 |
| QA-05 | All E2E flows pass on 3 browsers | Cross-browser E2E | npx playwright test e2e/flows/ (all projects) | Wave 0 |
| QA-06 | Empty/error/loading states correct | E2E Playwright | Covered within flow specs via API mocking | Wave 0 |
| QA-07 | CI pipeline runs in Gitea Actions | CI workflow | .gitea/workflows/ci.yml | Wave 0 |

Sampling Rate

  • Per task commit: cd packages/portal && npx playwright test e2e/flows/login.spec.ts --project=chromium
  • Per wave merge: cd packages/portal && npx playwright test e2e/flows/ --project=chromium
  • Phase gate: Full suite (all projects + accessibility + visual) green before /gsd:verify-work

Wave 0 Gaps

  • packages/portal/playwright.config.ts — E2E framework config
  • packages/portal/e2e/auth.setup.ts — Auth state generation for 3 roles
  • packages/portal/e2e/fixtures.ts — Shared test fixtures (axe, auth, API helpers)
  • packages/portal/e2e/helpers/seed.ts — Test data seeding via API
  • packages/portal/e2e/flows/*.spec.ts — 7 flow spec files
  • packages/portal/e2e/accessibility/a11y.spec.ts — axe-core scans
  • packages/portal/e2e/visual/snapshots.spec.ts — visual regression specs
  • packages/portal/e2e/lighthouse/lighthouserc.json — Lighthouse CI config
  • .gitea/workflows/ci.yml — CI pipeline
  • packages/portal/playwright/.auth/.gitkeep — Directory for saved auth state (gitignored content)
  • Framework install: cd packages/portal && npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli && npx playwright install --with-deps
  • Baseline snapshots: run npx playwright test e2e/visual/ --update-snapshots on Linux to generate

Sources

Primary (HIGH confidence)

Secondary (MEDIUM confidence)

Tertiary (LOW confidence)

  • WebSearch result: Gitea runner Docker group requirement — mentioned across multiple community posts, not in official docs

Metadata

Confidence breakdown:

  • Standard stack: HIGH — verified against official Playwright, @axe-core, and LHCI docs
  • Architecture: HIGH — patterns derived directly from official Playwright documentation
  • Pitfalls: HIGH (pitfalls 1-6 from direct codebase inspection + official docs); MEDIUM (pitfall 7 from community sources)

Research date: 2026-03-25
Valid until: 2026-06-25 (90 days; Playwright and Next.js are fast-moving, but breaking changes are rare)