Adolfo Delorenzo 30c82a1754 docs(09): research phase Testing & QA

2026-03-25 22:19:32 -06:00


Phase 9: Testing & QA - Research

Researched: 2026-03-25
Domain: Playwright E2E, Lighthouse CI, visual regression, axe-core accessibility, Gitea Actions CI
Confidence: HIGH

Summary

Phase 9 is a greenfield testing layer added on top of a fully-built portal (Next.js 16 standalone, FastAPI gateway, Celery worker). No Playwright config exists yet — the Playwright MCP plugin is installed for manual use but there is no playwright.config.ts, no tests/e2e/ content, and no .gitea/workflows/ CI file. Everything must be created from scratch.

The core challenges are: (1) Auth.js v5 JWT sessions that Playwright must obtain and reuse across multiple role fixtures (platform_admin, customer_admin, customer_operator); (2) the WebSocket chat flow at /chat/ws/{conversation_id}, which needs mocking via page.routeWebSocket(); (3) Lighthouse CI, which requires a running Next.js server (standalone output complicates startServerCommand); and (4) keeping the pipeline under 5 minutes on Gitea Actions, whose workflow syntax is nearly identical to GitHub Actions.

Primary recommendation: Place Playwright config and tests inside packages/portal/ (Next.js co-location pattern), use storageState with three saved auth fixtures for roles, mock the WebSocket endpoint with page.routeWebSocket() for the chat flow, and run @lhci/cli in a separate post-build CI stage.

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

All decisions at Claude's discretion — user trusts judgment.

Playwright for all E2E tests (cross-browser built-in, official Next.js recommendation)
Critical flows to test (priority order):
1. Login → dashboard loads → session persists
2. Create tenant → tenant appears in list
3. Deploy template agent → agent appears in employees list
4. Chat: open conversation → send message → receive streaming response (mock LLM)
5. RBAC: operator cannot access /agents/new, /billing, /users
6. Language switcher → UI updates to selected language
7. Mobile viewport: bottom tab bar renders, sidebar hidden
LLM responses mocked in E2E tests (no real Ollama/API calls)
Test data: seed a test tenant + test user via API calls in test setup, clean up after
Lighthouse targets: >= 90 (fail at 80, warn at 85)
Pages: login, dashboard, chat, agents/new
Visual regression at 3 viewports: desktop 1280x800, tablet 768x1024, mobile 375x812
Key pages: login, dashboard, agents list, agents/new (3-card entry), chat (empty state), templates gallery
Baseline snapshots committed to repo
axe-core via @axe-core/playwright, zero critical violations required
"serious" violations logged as warnings (not blockers for beta)
Keyboard navigation test: Tab through login form, chat input, nav items
Cross-browser: chromium, firefox, webkit
Visual regression: chromium only
Gitea Actions, triggers: push to main, PR to main
Pipeline stages: lint → type-check → unit tests (pytest) → build portal → E2E tests → Lighthouse
Docker Compose for CI infra
JUnit XML + HTML trace viewer reports
Fail-fast: lint/type errors block everything; unit test failures block E2E
Target: < 5 min pipeline

Claude's Discretion

Playwright config details (timeouts, retries, parallelism)
Test file organization (by feature vs by page)
Fixture/helper patterns for auth, tenant setup, API mocking
Lighthouse CI tool (lighthouse-ci vs @lhci/cli)
Whether to include a smoke test for the WebSocket chat connection
Visual regression threshold (pixel diff tolerance)

Deferred Ideas (OUT OF SCOPE)

None. Discussion stayed within phase scope.

</user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
QA-01	Playwright E2E tests cover all critical user flows (login, tenant CRUD, agent deploy, chat, billing, RBAC)	Playwright storageState auth fixtures + routeWebSocket for chat mock
QA-02	Lighthouse scores >= 90 for performance, accessibility, best practices, SEO on key pages	@lhci/cli with minScore assertions per category
QA-03	Visual regression snapshots at desktop/tablet/mobile for all key pages	toHaveScreenshot with maxDiffPixelRatio, viewports per project
QA-04	axe-core accessibility audit passes with zero critical violations across all pages	@axe-core/playwright AxeBuilder with impact filter
QA-05	E2E tests pass on Chrome, Firefox, Safari (WebKit)	Playwright projects array with three browser engines
QA-06	Empty states, error states, loading states tested and rendered correctly	Dedicated test cases + API mocking for empty/error responses
QA-07	CI-ready test suite runnable in Gitea Actions pipeline	.gitea/workflows/ci.yml with Docker Compose service containers
</phase_requirements>

Standard Stack

Core

Library	Version	Purpose	Why Standard
@playwright/test	^1.51	E2E + visual regression + accessibility runner	Official Next.js recommendation, cross-browser built-in, no extra dependencies
@axe-core/playwright	^4.10	Accessibility scanning within Playwright tests	Official Deque package, integrates directly with Playwright page objects
@lhci/cli	^0.15	Lighthouse CI score assertions	Google-maintained, headless Lighthouse, assertion config via lighthouserc

Supporting

Library	Version	Purpose	When to Use
axe-html-reporter	^2.2	HTML accessibility reports	When you want human-readable a11y reports attached to CI artifacts

Alternatives Considered

Instead of	Could Use	Tradeoff
@lhci/cli	lighthouse npm module directly	@lhci/cli handles multi-run averaging, assertions, and CI upload; raw lighthouse requires custom scripting
@axe-core/playwright	axe-playwright (third-party)	@axe-core/playwright is the official Deque package; axe-playwright is a community wrapper with same API but extra dep

Installation (portal):

cd packages/portal
npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli
npx playwright install --with-deps chromium firefox webkit

Architecture Patterns

Recommended Project Structure

packages/portal/
├── playwright.config.ts          # Main config: projects, webServer, globalSetup
├── e2e/
│   ├── auth.setup.ts             # Global setup: save storageState per role
│   ├── fixtures.ts               # Extended test: auth fixtures, axe builder, API helpers
│   ├── helpers/
│   │   ├── seed.ts               # Seed test tenant + user via API, return IDs
│   │   └── cleanup.ts            # Delete seeded data after test suite
│   ├── flows/
│   │   ├── login.spec.ts         # Flow 1: login → dashboard loads → session persists
│   │   ├── tenant-crud.spec.ts   # Flow 2: create tenant → appears in list
│   │   ├── agent-deploy.spec.ts  # Flow 3: deploy template → appears in employees
│   │   ├── chat.spec.ts          # Flow 4: open chat → send msg → streaming response (mocked WS)
│   │   ├── rbac.spec.ts          # Flow 5: operator access denied to restricted pages
│   │   ├── i18n.spec.ts          # Flow 6: language switcher → UI updates
│   │   └── mobile.spec.ts        # Flow 7: mobile viewport → bottom tab bar, sidebar hidden
│   ├── accessibility/
│   │   └── a11y.spec.ts          # axe-core scan on every key page, keyboard nav test
│   ├── visual/
│   │   └── snapshots.spec.ts     # Visual regression at 3 viewports (chromium only)
│   └── lighthouse/
│       └── lighthouserc.json     # @lhci/cli config: URLs, score thresholds
├── playwright/.auth/             # gitignored — saved storageState files
│   ├── platform-admin.json
│   ├── customer-admin.json
│   └── customer-operator.json
└── __snapshots__/                # Committed baseline screenshots (needs snapshotPathTemplate; Playwright's default is <spec file>-snapshots/ next to each spec)
.gitea/
└── workflows/
    └── ci.yml                    # Pipeline: lint → typecheck → pytest → build → E2E → lhci

Pattern 1: Auth.js v5 storageState with Multiple Roles

What: Authenticate each role once in a global setup project, save to JSON. All E2E tests consume the saved state — no repeated login UI interactions.

When to use: Any test that requires a logged-in user. Each spec declares which role it needs via test.use({ storageState }).

Key insight for Auth.js v5: The credentials provider calls the FastAPI /api/portal/auth/verify endpoint. Playwright must fill the login form (not call the API directly) because the HttpOnly session cookie is issued by the Next.js Auth.js handler during sign-in, not by FastAPI. The storageState captures those cookies.

// Source: https://playwright.dev/docs/auth
// e2e/auth.setup.ts
import { test as setup, expect } from "@playwright/test";
import path from "path";

const PLATFORM_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/platform-admin.json");
const CUSTOMER_ADMIN_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-admin.json");
const OPERATOR_AUTH = path.resolve(__dirname, "../playwright/.auth/customer-operator.json");

setup("authenticate as platform admin", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_ADMIN_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_ADMIN_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: PLATFORM_ADMIN_AUTH });
});

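setup("authenticate as customer admin", async ({ page }) => {
  // Credentials come from env; the customer_admin user must already be seeded
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_CADMIN_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_CADMIN_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: CUSTOMER_ADMIN_AUTH });
});

// Assumption: operator credentials arrive as E2E_OPERATOR_EMAIL / E2E_OPERATOR_PASSWORD.
// These variable names are illustrative and not defined elsewhere in this document.
setup("authenticate as customer operator", async ({ page }) => {
  await page.goto("/login");
  await page.getByLabel("Email").fill(process.env.E2E_OPERATOR_EMAIL!);
  await page.getByLabel("Password").fill(process.env.E2E_OPERATOR_PASSWORD!);
  await page.getByRole("button", { name: /sign in/i }).click();
  await page.waitForURL("/dashboard");
  await page.context().storageState({ path: OPERATOR_AUTH });
});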
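
The setup file and every spec need to agree on which storageState file belongs to which role. A minimal sketch of keeping that mapping in one place follows. `Role`, `ROLE_AUTH_FILES`, and `authFileFor` are hypothetical names introduced for illustration; only the three file paths come from the project structure above.

```typescript
// Hypothetical helper: one source of truth for role -> storageState file.
// The paths mirror the playwright/.auth/ layout in the project structure.
type Role = "platform_admin" | "customer_admin" | "customer_operator";

const ROLE_AUTH_FILES: Record<Role, string> = {
  platform_admin: "playwright/.auth/platform-admin.json",
  customer_admin: "playwright/.auth/customer-admin.json",
  customer_operator: "playwright/.auth/customer-operator.json",
};

// Returns the saved-session path a spec should pass to test.use({ storageState }).
function authFileFor(role: Role): string {
  return ROLE_AUTH_FILES[role];
}
```

A spec could then declare its role with `test.use({ storageState: authFileFor("customer_operator") })`, so a renamed file only has to change in one place.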

Pattern 2: WebSocket Mocking for Chat Flow

What: Intercept the /chat/ws/{conversationId} WebSocket before the gateway is contacted. Respond to the auth message, then simulate streaming tokens on a user message.

When to use: Flow 4 (chat E2E test). The gateway WebSocket endpoint at ws://localhost:8001/chat/ws/{id} is routed via the Next.js API proxy — intercept at the browser level.

// Source: https://playwright.dev/docs/api/class-websocketroute
// e2e/flows/chat.spec.ts
import { test, expect } from "@playwright/test";

test("chat: send message → receive streaming response", async ({ page }) => {
  await page.routeWebSocket(/\/chat\/ws\//, (ws) => {
    ws.onMessage((msg) => {
      const data = JSON.parse(msg as string);

      if (data.type === "auth") {
        // Acknowledge auth — no response needed, gateway just proceeds
        return;
      }

      if (data.type === "message") {
        // Simulate typing indicator
        ws.send(JSON.stringify({ type: "typing" }));
        // Simulate streaming tokens
        const tokens = ["Hello", " from", " your", " AI", " assistant!"];
        tokens.forEach((token, i) => {
          setTimeout(() => {
            ws.send(JSON.stringify({ type: "chunk", token }));
          }, i * 50);
        });
        setTimeout(() => {
          ws.send(JSON.stringify({
            type: "response",
            text: tokens.join(""),
            conversation_id: data.conversation_id,
          }));
          ws.send(JSON.stringify({ type: "done", text: tokens.join("") }));
        }, tokens.length * 50 + 100);
      }
    });
  });

  await page.goto("/chat?agentId=test-agent");
  await page.getByPlaceholder(/type a message/i).fill("Hello!");
  await page.keyboard.press("Enter");
  await expect(page.getByText("Hello from your AI assistant!")).toBeVisible({ timeout: 5000 });
});
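
The frame sequence used in the mock above (typing indicator, one chunk per token, then response and done) can be factored into a pure function so the timing is easy to inspect and reuse. This is a sketch only: `buildStreamFrames`, `Frame`, and `ScheduledFrame` are hypothetical names, and the message shapes are copied from the mock, not from the gateway's actual protocol definition.

```typescript
// Frame shapes copied from the mock above; the real gateway schema may differ.
type Frame =
  | { type: "typing" }
  | { type: "chunk"; token: string }
  | { type: "response"; text: string; conversation_id: string }
  | { type: "done"; text: string };

interface ScheduledFrame {
  delayMs: number; // delay from receipt of the user message
  frame: Frame;
}

// Builds the same schedule as the inline mock: chunks every stepMs,
// then "response" and "done" 100 ms after the last chunk slot.
function buildStreamFrames(
  tokens: string[],
  conversationId: string,
  stepMs = 50,
): ScheduledFrame[] {
  const text = tokens.join("");
  const frames: ScheduledFrame[] = [{ delayMs: 0, frame: { type: "typing" } }];
  tokens.forEach((token, i) => {
    frames.push({ delayMs: i * stepMs, frame: { type: "chunk", token } });
  });
  const endMs = tokens.length * stepMs + 100;
  frames.push({ delayMs: endMs, frame: { type: "response", text, conversation_id: conversationId } });
  frames.push({ delayMs: endMs, frame: { type: "done", text } });
  return frames;
}
```

Inside the route handler, each entry could be replayed with `setTimeout(() => ws.send(JSON.stringify(f.frame)), f.delayMs)`, which keeps the handler short and lets several specs share one schedule.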

Pattern 3: Visual Regression at Multiple Viewports

What: Configure separate Playwright projects for each viewport, run snapshots only on chromium to avoid cross-browser rendering diffs.

When to use: QA-03. Visual regression baseline committed to repo; CI fails on diff.

// Source: https://playwright.dev/docs/test-snapshots
// playwright.config.ts (visual projects section)
{
  name: "visual-desktop",
  use: {
    ...devices["Desktop Chrome"],
    viewport: { width: 1280, height: 800 },
  },
  testMatch: "e2e/visual/**",
},
{
  name: "visual-tablet",
  use: {
    browserName: "chromium",
    viewport: { width: 768, height: 1024 },
  },
  testMatch: "e2e/visual/**",
},
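{
  name: "visual-mobile",
  use: {
    // devices["iPhone 12"] defaults to WebKit, which conflicts with
    // "visual regression: chromium only". Set Chromium and the viewport explicitly.
    browserName: "chromium",
    viewport: { width: 375, height: 812 },
    isMobile: true,
    hasTouch: true,
  },
  testMatch: "e2e/visual/**",
},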

Global threshold:

// playwright.config.ts
expect: {
  toHaveScreenshot: {
    maxDiffPixelRatio: 0.02, // 2% tolerance — accounts for antialiasing
    threshold: 0.2,          // pixel color threshold (0–1)
  },
},
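
To make the 2% tolerance concrete, the sketch below converts it into an absolute pixel budget for each of the three target viewports. `diffPixelBudget` and `VIEWPORTS` are hypothetical names. The calculation assumes a viewport-sized screenshot at a device scale factor of 1; full-page captures or higher scale factors produce larger images and therefore larger budgets.

```typescript
interface Viewport {
  name: string;
  width: number;
  height: number;
}

// The three viewports from the locked decisions.
const VIEWPORTS: Viewport[] = [
  { name: "desktop", width: 1280, height: 800 },
  { name: "tablet", width: 768, height: 1024 },
  { name: "mobile", width: 375, height: 812 },
];

// Number of pixels allowed to differ for a given tolerance in percent.
// Integer arithmetic avoids floating-point surprises around the floor.
function diffPixelBudget(v: Viewport, tolerancePercent: number): number {
  return Math.floor((v.width * v.height * tolerancePercent) / 100);
}

for (const v of VIEWPORTS) {
  console.log(`${v.name}: up to ${diffPixelBudget(v, 2)} differing pixels`);
}
// desktop: up to 20480 differing pixels
// tablet: up to 15728 differing pixels
// mobile: up to 6090 differing pixels
```

A budget of roughly 20,000 pixels on desktop is enough to hide a small missing icon, so a tighter ratio may be worth trying once baselines are stable.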

Pattern 4: axe-core Fixture

What: Shared fixture that creates an AxeBuilder for each page, scoped to WCAG 2.1 AA, filtering results by impact level.

// Source: https://playwright.dev/docs/accessibility-testing
// e2e/fixtures.ts
import { test as base, expect } from "@playwright/test";
import AxeBuilder from "@axe-core/playwright";

export const test = base.extend<{ axe: () => AxeBuilder }>({
  axe: async ({ page }, use) => {
    const makeBuilder = () =>
      new AxeBuilder({ page })
        .withTags(["wcag2a", "wcag2aa", "wcag21a", "wcag21aa"]);
    await use(makeBuilder);
  },
});

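// In a test (for example e2e/accessibility/a11y.spec.ts, importing test and expect from ../fixtures):
test("login page has no critical a11y violations", async ({ page, axe }) => {
  await page.goto("/login");
  const results = await axe().analyze();
  const criticalViolations = results.violations.filter(v => v.impact === "critical");
  const seriousViolations = results.violations.filter(v => v.impact === "serious");

  // Critical violations block the build; serious ones are logged only (beta policy)
  expect(criticalViolations, "Critical a11y violations found").toHaveLength(0);
  if (seriousViolations.length > 0) {
    console.warn("Serious a11y violations (non-blocking):", seriousViolations);
  }
});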
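
The blocking and non-blocking split can also live in a small pure function so every page scan applies the same policy. This is a sketch under assumptions: `triageViolations` and the trimmed `Violation` shape are hypothetical, modelling only the `id` and `impact` fields that the filter above relies on.

```typescript
type Impact = "minor" | "moderate" | "serious" | "critical";

// Trimmed stand-in for an axe result entry; only the fields used here.
interface Violation {
  id: string;
  impact?: Impact | null;
}

interface Triage {
  blocking: Violation[]; // critical: fail the test
  warnings: Violation[]; // serious: log, do not fail (beta policy)
  pass: boolean;
}

function triageViolations(violations: Violation[]): Triage {
  const blocking = violations.filter((v) => v.impact === "critical");
  const warnings = violations.filter((v) => v.impact === "serious");
  return { blocking, warnings, pass: blocking.length === 0 };
}

// Illustrative data, not real scan output.
const sample: Violation[] = [
  { id: "color-contrast", impact: "serious" },
  { id: "image-alt", impact: "critical" },
  { id: "region", impact: "moderate" },
];
const triage = triageViolations(sample);
console.log(triage.pass, triage.blocking.length, triage.warnings.length);
```

In a spec, `expect(triageViolations(results.violations).blocking).toHaveLength(0)` would replace the two inline filters.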

Pattern 5: Lighthouse CI Config

What: lighthouserc.json drives @lhci/cli autorun in CI. Pages run headlessly against the built portal.

// Source: https://googlechrome.github.io/lighthouse-ci/docs/configuration.html
// e2e/lighthouse/lighthouserc.json
{
  "ci": {
    "collect": {
      "startServerCommand": "node .next/standalone/server.js",
      "url": [
        "http://localhost:3000/login",
        "http://localhost:3000/dashboard",
        "http://localhost:3000/chat",
        "http://localhost:3000/agents/new"
      ],
      "numberOfRuns": 1,
      "settings": {
        "preset": "desktop",
        "chromeFlags": "--no-sandbox --disable-dev-shm-usage"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance":     ["error", {"minScore": 0.80}],
        "categories:accessibility":   ["error", {"minScore": 0.80}],
        "categories:best-practices":  ["error", {"minScore": 0.80}],
        "categories:seo":             ["error", {"minScore": 0.80}]
      }
    },
    "upload": {
      "target": "filesystem",
      "outputDir": ".lighthouseci"
    }
  }
}

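Note: error at 0.80 means CI fails below 80; the 90 target is aspirational. Each assertion key holds a single level, so the 0.85 warn threshold cannot sit beside the 0.80 error in this block and would need a second assertion pass or a separate check. URLs behind auth redirect to /login unless a session is supplied (see Pitfall 5).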
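
The three thresholds from the locked decisions (fail at 80, warn at 85, target 90) can be read as a small classifier. The sketch below is not part of @lhci/cli; `checkScore` is a hypothetical helper that a post-processing script could apply to category scores on the 0 to 1 scale Lighthouse reports.

```typescript
type Verdict = "fail" | "warn" | "pass";

interface ScoreCheck {
  verdict: Verdict;
  meetsTarget: boolean; // true when the aspirational 0.90 target is reached
}

const FAIL_BELOW = 0.8;
const WARN_BELOW = 0.85;
const TARGET = 0.9;

// Classifies a Lighthouse category score (0..1) against the project thresholds.
function checkScore(score: number): ScoreCheck {
  const verdict: Verdict =
    score < FAIL_BELOW ? "fail" : score < WARN_BELOW ? "warn" : "pass";
  return { verdict, meetsTarget: score >= TARGET };
}

console.log(checkScore(0.79).verdict); // fail
console.log(checkScore(0.82).verdict); // warn
console.log(checkScore(0.87)); // { verdict: "pass", meetsTarget: false }
```

A score of 0.87 passes CI but still misses the target, which is the gap the warn level is meant to surface.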

Pattern 6: Playwright Config (Full)

// packages/portal/playwright.config.ts
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  fullyParallel: false,        // Stability in CI with shared DB state
  forbidOnly: !!process.env.CI,
  retries: process.env.CI ? 1 : 0,
  workers: process.env.CI ? 1 : undefined,
  timeout: 30_000,

  reporter: [
    ["html", { outputFolder: "playwright-report" }],
    ["junit", { outputFile: "playwright-results.xml" }],
    ["list"],
  ],

  use: {
    baseURL: process.env.PLAYWRIGHT_BASE_URL ?? "http://localhost:3000",
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    serviceWorkers: "block",   // Prevents Serwist from intercepting test requests
  },

  expect: {
    toHaveScreenshot: {
      maxDiffPixelRatio: 0.02,
      threshold: 0.2,
    },
  },

  projects: [
    // Auth setup runs first for all browser projects
    { name: "setup", testMatch: /auth\.setup\.ts/ },

    // E2E flows — all 3 browsers
    {
      name: "chromium",
      use: { ...devices["Desktop Chrome"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "firefox",
      use: { ...devices["Desktop Firefox"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },
    {
      name: "webkit",
      use: { ...devices["Desktop Safari"], storageState: "playwright/.auth/platform-admin.json" },
      dependencies: ["setup"],
      testMatch: "e2e/flows/**",
    },

    // Visual regression — chromium only, 3 viewports
    { name: "visual-desktop",  use: { browserName: "chromium", viewport: { width: 1280, height: 800 } }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
    { name: "visual-tablet",   use: { browserName: "chromium", viewport: { width: 768, height: 1024 } }, testMatch: "e2e/visual/**", dependencies: ["setup"] },
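    // Not devices["iPhone 12"]: that descriptor defaults to WebKit and its own viewport
    { name: "visual-mobile",   use: { browserName: "chromium", viewport: { width: 375, height: 812 }, isMobile: true, hasTouch: true }, testMatch: "e2e/visual/**", dependencies: ["setup"] },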

    // Accessibility — chromium only
    {
      name: "a11y",
      use: { ...devices["Desktop Chrome"] },
      dependencies: ["setup"],
      testMatch: "e2e/accessibility/**",
    },
  ],

  webServer: {
    command: "node .next/standalone/server.js",
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    env: {
      PORT: "3000",
      API_URL: process.env.API_URL ?? "http://localhost:8001",
      AUTH_SECRET: process.env.AUTH_SECRET ?? "test-secret-32-chars-minimum-len",
      AUTH_URL: "http://localhost:3000",
    },
  },
});

Critical: serviceWorkers: "block" is required because Serwist (PWA service worker) intercepts network requests and makes them invisible to page.route() / page.routeWebSocket().

Pattern 7: Gitea Actions CI Pipeline

# .gitea/workflows/ci.yml
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  backend:
    name: Backend Tests
    runs-on: ubuntu-latest
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: >-
          --health-cmd pg_isready
          --health-interval 5s
          --health-timeout 5s
          --health-retries 10
      redis:
        image: redis:7-alpine
        options: >-
          --health-cmd "redis-cli ping"
          --health-interval 5s
    env:
      DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
      DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
      REDIS_URL: redis://localhost:6379/0
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - run: pip install uv
      - run: uv sync
      - run: uv run ruff check packages/ tests/
      - run: uv run mypy --strict packages/
      - run: uv run pytest tests/ -x --tb=short

  portal:
    name: Portal E2E
    runs-on: ubuntu-latest
    needs: backend          # E2E blocked until backend passes
    services:
      postgres:
        image: pgvector/pgvector:pg16
        env:
          POSTGRES_DB: konstruct
          POSTGRES_USER: postgres
          POSTGRES_PASSWORD: postgres_dev
        options: --health-cmd pg_isready --health-interval 5s --health-retries 10
      redis:
        image: redis:7-alpine
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "22" }
      - name: Install portal deps
        working-directory: packages/portal
        run: npm ci
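      - name: Build portal
        working-directory: packages/portal
        run: |
          npm run build
          # Standalone output omits static assets; copy them in (see Pitfall 3)
          cp -r .next/static .next/standalone/.next/static
          cp -r public .next/standalone/public
        env:
          NEXT_PUBLIC_API_URL: http://localhost:8001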
      - name: Install Playwright browsers
        working-directory: packages/portal
        run: npx playwright install --with-deps chromium firefox webkit
      - name: Start gateway (background)
        run: |
          pip install uv && uv sync
          uv run alembic upgrade head
          uv run uvicorn gateway.main:app --host 0.0.0.0 --port 8001 &
        env:
          DATABASE_URL: postgresql+asyncpg://konstruct_app:konstruct_dev@localhost:5432/konstruct
          DATABASE_ADMIN_URL: postgresql+asyncpg://postgres:postgres_dev@localhost:5432/konstruct
          REDIS_URL: redis://localhost:6379/0
          LLM_POOL_URL: http://localhost:8004  # not running — mocked in E2E
      - name: Wait for gateway
        run: timeout 30 bash -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
      - name: Run E2E tests
        working-directory: packages/portal
        run: npx playwright test e2e/flows/ e2e/accessibility/
        env:
          CI: "true"
          PLAYWRIGHT_BASE_URL: http://localhost:3000
          API_URL: http://localhost:8001
          AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
          E2E_ADMIN_EMAIL: ${{ secrets.E2E_ADMIN_EMAIL }}
          E2E_ADMIN_PASSWORD: ${{ secrets.E2E_ADMIN_PASSWORD }}
      - name: Run Lighthouse CI
        working-directory: packages/portal
        run: |
          npx lhci autorun --config=e2e/lighthouse/lighthouserc.json
        env:
          LHCI_BUILD_CONTEXT__CURRENT_HASH: ${{ github.sha }}
      - name: Upload Playwright report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report
          path: packages/portal/playwright-report/
      - name: Upload Lighthouse report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-report
          path: packages/portal/.lighthouseci/

Anti-Patterns to Avoid

Hardcoded IDs in selectors: Use getByRole, getByLabel, getByText — never CSS #id or [data-testid] unless semantic selectors are unavailable. Semantic selectors are more resilient and double as accessibility checks.
Real LLM calls in E2E: Never let E2E tests reach Ollama/OpenAI. Mock the WebSocket and gateway LLM calls. Real calls introduce flakiness and cost.
Superuser DB connections in test seeds: The existing conftest uses konstruct_app role to preserve RLS. E2E seeds should call the FastAPI admin API endpoints, not connect directly to the DB.
Enabling service workers in tests: Serwist intercepts all requests. Always set serviceWorkers: "block" in Playwright config.
Parallel workers with shared DB state: Set workers: 1 in CI. Tenant/agent mutations are not thread-safe across workers without per-worker isolation.
Running visual regression on all browsers: Browser rendering engines produce expected pixel diffs. Visual regression on chromium only; cross-browser covered by functional E2E.

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Screenshot diffs	Custom pixel comparator	`toHaveScreenshot()` built into Playwright	Handles baseline storage, update workflow, CI reporting
Accessibility scanning	Custom ARIA traversal	`@axe-core/playwright`	Maintained WCAG rule set that catches issues manual review misses
Performance score gating	Parsing Lighthouse JSON manually	`@lhci/cli assert`	Handles multi-run averaging, threshold config, exit codes
Auth state reuse	Logging in before every test	Playwright `storageState`	Session reuse removes a UI login from every test
WS mock server	Running a real mock websocket server	`page.routeWebSocket()`	In-process, no port conflicts, no flakiness

Common Pitfalls

Pitfall 1: Auth.js HttpOnly Cookies

What goes wrong: Trying to authenticate by calling /api/portal/auth/verify directly with Playwright request. This bypasses Auth.js cookie-setting, so the browser session never exists.

Why it happens: Auth.js v5 JWT is set as an HttpOnly secure cookie by the Next.js server, not by the FastAPI backend.

How to avoid: Always use Playwright's UI login flow (fill form → submit → wait for redirect) to let Next.js set the cookie. Then save with storageState.

Warning signs: Tests pass the login assertion but fail immediately after on authenticated pages.

Pitfall 2: Serwist Service Worker Intercepting Test Traffic

What goes wrong: page.route() and page.routeWebSocket() handlers never fire because the PWA service worker handles requests first.

Why it happens: Serwist registers a service worker that intercepts all requests within its scope. Requests answered by a service worker do not reach Playwright's page-level routing, so mocks only apply reliably when service workers are blocked.

How to avoid: Set serviceWorkers: "block" in playwright.config.ts under use.

Warning signs: Mock routes never called; tests see real responses or network errors.

Pitfall 3: Next.js Standalone Output Path for webServer

What goes wrong: command: "npm run start" fails in CI because next start is not meant to serve a build made with output: "standalone".

Why it happens: The portal uses output: "standalone" in next.config.ts. The build produces .next/standalone/server.js, which is run directly with node instead of through the Next.js CLI.

How to avoid: Use command: "node .next/standalone/server.js" in Playwright's webServer config. The standalone folder does not include static assets, so the build step must also run cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public.

Warning signs: webServer process exits immediately; Playwright reports "server did not start".

Pitfall 4: Visual Regression Baseline Committed Without CI Environment Lock

What goes wrong: Baselines created on a developer's Mac differ from Linux CI renderings (font rendering, subpixel AA, etc.).

Why it happens: Screenshot comparisons are pixel-exact. OS-level rendering differences cause 1–5% false failures.

How to avoid: Generate baselines inside the same Docker/Linux environment as CI. Run npx playwright test --update-snapshots on Linux (or in the Playwright Docker image) to commit initial baselines. Use maxDiffPixelRatio: 0.02 to absorb minor remaining differences.

Warning signs: Visual tests pass locally but always fail in CI.

Pitfall 5: Lighthouse Pages Behind Auth

What goes wrong: Lighthouse visits /dashboard, gets redirected to /login, and scores an empty page.

Why it happens: Lighthouse runs as an unauthenticated browser session, and LHCI has no built-in Auth.js integration.

How to avoid: For authenticated pages, either (a) test only public pages with Lighthouse (login, landing), (b) log in before collection with LHCI's puppeteerScript option, which adds a Puppeteer dependency and a login script to maintain, or (c) create a special unauthenticated preview mode.

For this project: Run Lighthouse on /login only, plus any public-accessible marketing pages. Skip /dashboard and /chat for Lighthouse.

Warning signs: Lighthouse scores 100 for accessibility on dashboard, which is suspiciously perfect because it is measuring an empty redirect.

Pitfall 6: WebSocket URL Resolution in Tests

What goes wrong: page.routeWebSocket("/chat/ws/") doesn't match because the portal derives the WS URL from NEXT_PUBLIC_API_URL (baked at build time), which points to ws://localhost:8001, not a relative path.

Why it happens: use-chat-socket.ts computes WS_BASE from process.env.NEXT_PUBLIC_API_URL and builds ws://localhost:8001/chat/ws/{id}.

How to avoid: Use a regex pattern: page.routeWebSocket(/\/chat\/ws\//, handler), which matches the full absolute URL.

Warning signs: Chat mock never fires; test times out waiting for WS message.
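
A quick way to see why the regex matches while the bare path string is a poor fit is to reproduce the URL derivation in isolation. The sketch below is an assumption about how such a derivation typically looks; `wsUrlFor` is a hypothetical stand-in and not the actual code in use-chat-socket.ts.

```typescript
// Hypothetical stand-in for the WS_BASE derivation described above:
// swap the http(s) scheme for ws(s) and append the chat path.
function wsUrlFor(apiUrl: string, conversationId: string): string {
  const base = apiUrl.replace(/^http/, "ws").replace(/\/$/, "");
  return `${base}/chat/ws/${conversationId}`;
}

// The pattern recommended for page.routeWebSocket().
const CHAT_WS_PATTERN = /\/chat\/ws\//;

const url = wsUrlFor("http://localhost:8001", "abc123");
console.log(url); // ws://localhost:8001/chat/ws/abc123
console.log(CHAT_WS_PATTERN.test(url)); // true
console.log(url === "/chat/ws/"); // false: the browser sees an absolute URL
```

Because the regex is unanchored, it keeps matching if NEXT_PUBLIC_API_URL later points at another host or switches to https (and therefore wss).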

Pitfall 7: Gitea Actions Runner Needs Docker

What goes wrong: Service containers fail to start because the Gitea runner is not configured with Docker access.

Why it happens: Gitea Actions service containers require Docker socket access on the runner.

How to avoid: Ensure the act_runner is added to the docker group on the host. Alternative: use docker compose in a setup step instead of service containers.

Warning signs: Job fails immediately with "Cannot connect to Docker daemon".

Code Examples

Seed Helper via API

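// e2e/helpers/seed.ts
// Uses Playwright APIRequestContext to create test data via FastAPI endpoints.
// Must run BEFORE storageState setup (needs platform_admin creds via env).
import type { APIRequestContext } from "@playwright/test";

// Same API_URL variable the Playwright webServer config and CI job already use
const API_URL = process.env.API_URL ?? "http://localhost:8001";

export async function seedTestTenant(request: APIRequestContext): Promise<{ tenantId: string; tenantSlug: string }> {
  // Random suffix keeps repeated runs from colliding on the tenant slug
  const suffix = Math.random().toString(36).slice(2, 8);
  const res = await request.post(`${API_URL}/api/portal/tenants`, {
    headers: {
      "X-User-Id": process.env.E2E_ADMIN_ID!,
      "X-User-Role": "platform_admin",
      "X-Active-Tenant": "",
    },
    data: { name: `E2E Tenant ${suffix}`, slug: `e2e-tenant-${suffix}` },
  });
  // Fail fast with a readable message instead of a confusing JSON error later
  if (!res.ok()) {
    throw new Error(`Seeding tenant failed: ${res.status()} ${await res.text()}`);
  }
  const body = (await res.json()) as { id: string; slug: string };
  return { tenantId: body.id, tenantSlug: body.slug };
}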

RBAC Test Pattern

// e2e/flows/rbac.spec.ts
// Tests that operator role is silently redirected, not 403-paged
import { test, expect } from "@playwright/test";

test.describe("RBAC enforcement", () => {
  test.use({ storageState: "playwright/.auth/customer-operator.json" });

  const restrictedPaths = ["/agents/new", "/billing", "/users"];

  for (const path of restrictedPaths) {
    test(`operator cannot access ${path}`, async ({ page }) => {
      await page.goto(path);
      // proxy.ts does silent redirect — operator ends up on /dashboard
      await expect(page).not.toHaveURL(path);
    });
  }
});

Mobile Viewport Behavioral Test

// e2e/flows/mobile.spec.ts
import { test, expect } from "@playwright/test";

test("mobile: bottom tab bar renders, sidebar hidden", async ({ page }) => {
  await page.setViewportSize({ width: 375, height: 812 });
  await page.goto("/dashboard");
  // Bottom tab bar visible
  await expect(page.getByRole("navigation", { name: /mobile/i })).toBeVisible();
  // Desktop sidebar hidden
  await expect(page.getByRole("navigation", { name: /sidebar/i })).not.toBeVisible();
});

State of the Art

Old Approach	Current Approach	When Changed	Impact
Cypress for Next.js E2E	Playwright (official Next.js recommendation)	2023–2024	Cross-browser, better WS support, no iframe limitations
`lighthouse` npm module with custom scripts	`@lhci/cli autorun`	2020+	Automated multi-run averaging, assertions, CI reporting
`axe-playwright` (community)	`@axe-core/playwright` (official Deque)	2022+	Official package, same API, no extra wrapper
`next start` for E2E server	`node .next/standalone/server.js`	Next.js 12+ standalone	Required when `output: "standalone"` is set
middleware.ts	proxy.ts	Next.js 16	Next.js 16 renamed middleware file

Deprecated/outdated:

cypress/integration/ directory: Cypress split this into cypress/e2e/ in v10 — but we're not using Cypress
@playwright/test globalSetup string path: Still valid but the project-based setup dependency is preferred in Playwright 1.40+
installSerwist(): Replaced by new Serwist() + addEventListeners() in serwist v9 (already applied in Phase 8)

Open Questions

Lighthouse on authenticated pages
- What we know: Lighthouse runs as unauthenticated — authenticated pages redirect to /login
- What's unclear: Whether logging in through LHCI's puppeteerScript option is worth the extra Puppeteer dependency and login script for this project
- Recommendation: Scope Lighthouse to /login only for QA-02. Dashboard/chat performance validated manually or via Web Vitals tracking in production.
Visual regression baseline generation environment
- What we know: OS-level rendering differences cause false failures
- What's unclear: Whether the Gitea runner is Linux or Mac
- Recommendation: Wave 0 task generates baselines inside the CI Docker container (Linux), commits them. Dev machines use --update-snapshots only deliberately.
Celery worker in E2E
- What we know: The chat WebSocket flow uses Redis pub-sub to deliver responses from the Celery worker
- What's unclear: Whether E2E should run the Celery worker (real pipeline, slow) or mock the WS entirely (fast but less realistic)
- Recommendation: Mock the WebSocket entirely via page.routeWebSocket(). This tests the frontend streaming UX without depending on Celery. Add a separate smoke test that hits the gateway /health endpoint to verify service health in CI.

Validation Architecture

Test Framework

Property	Value
Framework (backend)	pytest 8.3+ / pytest-asyncio (existing, all tests pass)
Framework (E2E)	@playwright/test ^1.51 (to be installed)
Config file (E2E)	`packages/portal/playwright.config.ts` — Wave 0
Quick run (backend)	`uv run pytest tests/unit -x --tb=short`
Full suite (backend)	`uv run pytest tests/ -x --tb=short`
E2E run	`cd packages/portal && npx playwright test`
Visual update	`cd packages/portal && npx playwright test --update-snapshots`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
QA-01	7 critical user flows pass	E2E Playwright	`npx playwright test e2e/flows/ --project=chromium`	Wave 0
QA-02	Lighthouse >= 90 on key pages	Lighthouse CI	`npx lhci autorun --config=e2e/lighthouse/lighthouserc.json`	Wave 0
QA-03	Visual snapshots pass at 3 viewports	Visual regression	`npx playwright test e2e/visual/`	Wave 0
QA-04	Zero critical a11y violations	Accessibility scan	`npx playwright test e2e/accessibility/`	Wave 0
QA-05	All E2E flows pass on 3 browsers	Cross-browser E2E	`npx playwright test e2e/flows/` (all projects)	Wave 0
QA-06	Empty/error/loading states correct	E2E Playwright	Covered within flow specs via API mocking	Wave 0
QA-07	CI pipeline runs in Gitea Actions	CI workflow	`.gitea/workflows/ci.yml`	Wave 0

Sampling Rate

Per task commit: cd packages/portal && npx playwright test e2e/flows/login.spec.ts --project=chromium
Per wave merge: cd packages/portal && npx playwright test e2e/flows/ --project=chromium
Phase gate: Full suite (all projects + accessibility + visual) green before /gsd:verify-work

Wave 0 Gaps

packages/portal/playwright.config.ts — E2E framework config
packages/portal/e2e/auth.setup.ts — Auth state generation for 3 roles
packages/portal/e2e/fixtures.ts — Shared test fixtures (axe, auth, API helpers)
packages/portal/e2e/helpers/seed.ts — Test data seeding via API
packages/portal/e2e/flows/*.spec.ts — 7 flow spec files
packages/portal/e2e/accessibility/a11y.spec.ts — axe-core scans
packages/portal/e2e/visual/snapshots.spec.ts — visual regression specs
packages/portal/e2e/lighthouse/lighthouserc.json — Lighthouse CI config
.gitea/workflows/ci.yml — CI pipeline
packages/portal/playwright/.auth/.gitkeep — Directory for saved auth state (gitignored content)
Framework install: cd packages/portal && npm install --save-dev @playwright/test @axe-core/playwright @lhci/cli && npx playwright install --with-deps
Baseline snapshots: run npx playwright test e2e/visual/ --update-snapshots on Linux to generate

Sources

Primary (HIGH confidence)

https://playwright.dev/docs/auth — storageState, setup projects, multiple roles
https://playwright.dev/docs/api/class-websocketroute — WebSocket mocking API
https://playwright.dev/docs/test-snapshots — toHaveScreenshot, maxDiffPixelRatio
https://playwright.dev/docs/accessibility-testing — @axe-core/playwright integration
https://playwright.dev/docs/ci — CI configuration, Docker image, workers
https://googlechrome.github.io/lighthouse-ci/docs/configuration.html — minScore assertions format

Secondary (MEDIUM confidence)

https://googlechrome.github.io/lighthouse-ci/docs/getting-started.html — lhci autorun setup
https://playwright.dev/docs/mock — page.route() and page.routeWebSocket() overview
Gitea Actions docs (forum.gitea.com) — confirmed GitHub Actions YAML compatibility, Docker socket requirements

Tertiary (LOW confidence)

WebSearch result: Gitea runner Docker group requirement — mentioned across multiple community posts, not in official docs

Metadata

Confidence breakdown:

Standard stack: HIGH — verified against official Playwright, @axe-core, and LHCI docs
Architecture: HIGH — patterns derived directly from official Playwright documentation
Pitfalls: HIGH (pitfalls 1–6 from direct codebase inspection + official docs); MEDIUM (pitfall 7 from community sources)

Research date: 2026-03-25
Valid until: 2026-06-25 (90 days; Playwright and Next.js are fast-moving but breaking changes are rare)
