chihlasm/resolutionflow

fix(ci): set DATABASE_TEST_URL + pin upload-artifact to v3 for Gitea Actions #150

Merged

chihlasm merged 15 commits from fix/ci-workflow-config into main

2026-04-25 21:57:27 +00:00

Author	SHA1	Message	Date
Michael Chihlas	1e3a6cfa01	fix(e2e): harden card selectors for session resume All checks were successful Mirror to GitHub / mirror (push) Successful in 12s Details CI / frontend (pull_request) Successful in 5m43s Details CI / backend (pull_request) Successful in 10m21s Details CI / e2e (pull_request) Successful in 11m23s Details Co-Authored-By: Codex <noreply@openai.com>	2026-04-25 16:42:33 -04:00
Michael Chihlas	ede6eebf9a	docs(ai): note e2e decoupling commit (`261814a`) in HANDOFF Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / frontend (pull_request) Successful in 5m43s Details CI / e2e (pull_request) Failing after 9m30s Details CI / backend (pull_request) Successful in 10m18s Details Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:12:19 -04:00
Michael Chihlas	261814ae65	perf(ci): decouple e2e from frontend, build frontend inline in e2e job Some checks failed Mirror to GitHub / mirror (push) Successful in 14s Details CI / frontend (pull_request) Successful in 5m44s Details CI / e2e (pull_request) Failing after 7m42s Details CI / backend (pull_request) Successful in 10m28s Details Before: e2e `needs: [frontend]` waited for the frontend job to upload a build artifact, then downloaded it. With multiple runners this means the third runner sat idle for ~6 min while frontend ran, then started e2e, for a total wall-clock of max(backend, frontend+e2e) ≈ 11 min. After: e2e builds its own frontend (npm ci + npm run build are already in the job; just dropped the artifact download step and added the build). e2e starts immediately on a free runner. Adds ~1-2 min to the e2e job duration but removes ~5 min of waiting and eliminates the cross-job artifact mechanism entirely. Side benefit: no more `actions/upload-artifact` v3/v4 GHES headaches on the cross-job handoff. The `if: always()` upload of the playwright-report at the end of e2e is kept (failure report retrieval is still useful), but it's a leaf-output, not a dependency. Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the 3-runner setup, down from ~11 min. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:59:00 -04:00
Michael Chihlas	6656ebdead	docs(ai): reflect PR consolidation — #151/#152 merged into #150 Some checks failed Mirror to GitHub / mirror (push) Successful in 12s Details CI / e2e (pull_request) Has been cancelled Details CI / backend (pull_request) Has been cancelled Details CI / frontend (pull_request) Has been cancelled Details Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:55:08 -04:00
Michael Chihlas	69f2a37591	fix(e2e): update 5 selectors that drifted with FlowPilot/PSA UI changes Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / frontend (pull_request) Successful in 5m52s Details CI / e2e (pull_request) Has been cancelled Details CI / backend (pull_request) Has been cancelled Details Mechanical drift between the e2e selectors and the current UI surfaced on the first CI run after PR #149 unblocked the artifact upload step. Five tests, three categories of drift: 1. Page heading renames (navigation.spec.ts) - `Sessions` → `Session History` on /sessions - `Account Settings` → `Account Management` on /account 2. Route rename (command-palette.spec.ts:74) - The "Troubleshoot with FlowPilot" command palette option now lands on /pilot (Phase 1 of the FlowPilot migration renamed /assistant). /assistant still 301-redirects, so the assertion accepts either. 3. Feature moved to /sessions (history.spec.ts, resume.spec.ts) - Default tab on /sessions is "AI Sessions"; flow-session filtering and the Resume button moved behind the "Flow Sessions" tab. Both tests now click that tab before asserting. - resume.spec.ts no longer starts at /trees (Resume buttons aren't rendered there anymore — the flow lives on /sessions). Destination URL (/trees/:id/navigate) is unchanged. No product-code changes — these are pure test updates against the shipped UI. Run the suite locally with `cd frontend && npm run test:e2e` once a fresh build is available. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:53:57 -04:00
Michael Chihlas	7f714363dd	perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m Backend suite is the slow gate (1076 passed locally in 22m27s on fix/ci-workflow-config). Adding pytest-xdist with per-worker DB isolation drops it to ~4m20s on the 8-core homelab runner. Verified locally: `pytest -n auto --no-cov` finished in 4m28s real time (15m19s user — confirms ~5× parallelism). How it works: - conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist — 'gw0', 'gw1', …). When set, derives a per-worker DB URL like `…/resolutionflow_test_gw0`. The base DB stays for serial / master runs. - `_ensure_worker_db_exists` runs synchronously at conftest import, connects to the postgres maintenance DB, and `CREATE DATABASE`s the worker-suffixed DB if it doesn't exist. Idempotent across runs. - The "test" safety guard still applies — every worker DB name contains "test" so the assertion holds. - The per-test `DROP SCHEMA public CASCADE` now operates on the worker's isolated DB, no cross-worker race. CI workflow: backend job switches to `pytest -n auto`. Coverage still collected (pytest-cov has built-in xdist support). Adds `pytest-xdist==3.6.1` to requirements-dev.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:53:47 -04:00
Michael Chihlas	1bd43abb8f	fix(ci): drop postgres host port mapping (multi-runner port collision) Some checks failed Mirror to GitHub / mirror (push) Successful in 12s Details CI / frontend (pull_request) Successful in 6m44s Details CI / e2e (pull_request) Failing after 8m43s Details CI / backend (pull_request) Has been cancelled Details With 3 Gitea Actions runners on the same homelab box, two simultaneous backend (or backend + e2e) jobs both try to bind 0.0.0.0:5432 for their postgres service containers. The second fails with: `failed to set up container networking: ... Bind for 0.0.0.0:5432 failed: port is already allocated`. The host-port mapping isn't actually needed: the workflow uses `DATABASE_URL: postgresql+asyncpg://...@postgres:5432/...` (hostname `postgres` is the service container's docker-network DNS name). The tests run inside the act container which is on the same docker network, so they reach postgres without going through the host. Removing `ports: 5432:5432` from both backend and e2e job service definitions lets multiple postgres services run in parallel on different docker networks without colliding on the host. Surfaced when PR #150 ran in parallel with another job after the multi-runner setup. Backend instant-failed in 2s on the docker run. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:28:17 -04:00
Michael Chihlas	c203b70ef9	docs(ai): queue data-testid hardening + reflect PR #152 + 3-runner setup Some checks failed CI / backend (pull_request) Failing after 2s Details Mirror to GitHub / mirror (push) Successful in 15s Details CI / e2e (pull_request) Has been cancelled Details CI / frontend (pull_request) Has been cancelled Details TODO.md: Promote pytest-xdist to ✅ (PR #151 carries it). Adds three new backlog items: - data-testid hardening for e2e-critical interactive elements (sparked by PR #152's selector drift work) - per-test transactional rollback (next big speedup if needed) - pytest-testmon for PR-time test selection HANDOFF.md: Three open PRs now (#150, #151, #152), all independent. Three Gitea runner agents now registered, so jobs run in parallel. Combined with #151's xdist, the prior 1h 14m wall-clock should drop to ~6-10 min. Updated merge order: #152 first (smallest), #150 next, #151 last. After all three land, enable CI / backend then CI / e2e as required status checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:26:21 -04:00
Michael Chihlas	f27e3b44b0	docs(ai): SESSION_LOG entry for the parallelization session Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / backend (pull_request) Successful in 32m33s Details CI / frontend (pull_request) Successful in 5m42s Details CI / e2e (pull_request) Failing after 4m58s Details (Was meant to land in fe632c9; the multi-line edit failed silently because Codex's earlier entry shifted the surrounding context.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:15:41 -04:00
Michael Chihlas	fe632c9194	docs(ai): handoff after CI parallelization + final test fix Some checks failed Mirror to GitHub / mirror (push) Has been cancelled Details CI / backend (pull_request) Successful in 30m26s Details CI / frontend (pull_request) Successful in 5m46s Details CI / e2e (pull_request) Failing after 5m3s Details Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect: - PR #150 now contains the AI-provider test mock + caching + maxfail. Backend CI should be fully green for the first time in months. - PR #151 stacked on #150: pytest-xdist with per-worker DBs. Local verification: 22m 27s → 4m 28s (5× speedup), 1076 passed both runs. - DoD is now: merge #150, then #151, then add CI / backend (pull_request) to required status checks on main. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:15:07 -04:00
Michael Chihlas	e976fb4e87	fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing Some checks failed Mirror to GitHub / mirror (push) Successful in 12s Details CI / backend (pull_request) Successful in 31m8s Details CI / frontend (pull_request) Successful in 5m42s Details CI / e2e (pull_request) Failing after 4m57s Details Three changes that get PR #150 to a green CI gate: 1. test_record_decision_persists_and_bumps_state_version: the `decision: draft_template` path calls `_extract_template_parameters` (TemplateExtractionService → AI provider). CI doesn't set ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised `RuntimeError: No AI provider configured` and returned 500. The test isn't exercising the AI integration, so the extractor is patched with an AsyncMock returning a minimal valid `{templated_body, parameters}` dict. Verified locally: the test now passes. 2. pip + npm caches in backend, frontend, and e2e jobs. Keyed on the hash of requirements.txt / package-lock.json with a runner-os restore-key fallback. Saves ~30-60s per run on cache hit. 3. Pytest invocation tightened: - Dropped `--cov-report=term-missing`; the custom "Display coverage summary" step below parses coverage.json and prints the same module list more concisely. Term-missing dumps every uncovered line which adds ~5-10s of stdout. - Added `--maxfail=10` so a structural breakage (fixture explosion, DB unreachable) bails after 10 errors instead of running the full 25-min suite. Tunable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:01:05 -04:00
Michael Chihlas	0aefaa78eb	docs(ai): queue pytest-xdist parallelization in TODO.md Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / frontend (pull_request) Has been cancelled Details CI / e2e (pull_request) Has been cancelled Details CI / backend (pull_request) Has been cancelled Details Capture the backend pytest parallelization work so it survives session end. Backend suite is currently ~22 min wall-clock for 1076 tests; xdist with one-DB-per-worker should land in the 3-6 min range on the homelab Gitea Actions runner. Also queues two backlog items: - Frontend lint warnings (23 react-hooks/exhaustive-deps after PR #149) - Periodic audit of the ResourceWarning filterwarnings added by Codex Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 11:35:38 -04:00
Michael Chihlas	49f88569da	wip(handoff): restore backend suite to green Some checks failed Mirror to GitHub / mirror (push) Successful in 12s Details CI / backend (pull_request) Failing after 27m35s Details CI / frontend (pull_request) Successful in 2m46s Details CI / e2e (pull_request) Failing after 4m9s Details Co-Authored-By: Codex <noreply@openai.com>	2026-04-25 06:13:23 -04:00
Michael Chihlas	208ec996d5	docs(ai): handoff for Codex — CI recovery + 54 real backend failures Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / backend (pull_request) Failing after 28m15s Details CI / frontend (pull_request) Successful in 2m55s Details CI / e2e (pull_request) Failing after 4m23s Details Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md so the next session has accurate resume state. Summary of where things are: - PR #141 (PSA tickets), PR #147 (FlowPilot Phase 1-9), PR #148 (CI fixes part 1), PR #149 (CI fixes part 2) all merged to main in this session. - Branch protection enabled on main: PR-only, CI / frontend required. - PR #150 (this branch) is the last CI-config PR — adds DATABASE_TEST_URL to the workflow and pins upload-artifact to v3. - Next session: watch #150's CI, merge if green, add CI / backend to required checks, then start on the 54 real backend test failures. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 03:36:54 -04:00
Michael Chihlas	8f7df2c0ef	fix(ci): set DATABASE_TEST_URL + downgrade upload-artifact to v3 (Gitea Actions) Some checks failed Mirror to GitHub / mirror (push) Successful in 11s Details CI / backend (pull_request) Failing after 28m29s Details CI / frontend (pull_request) Successful in 3m11s Details CI / e2e (pull_request) Failing after 4m56s Details Two CI-config issues blocking the gate from going green: 1. Backend tests connect to localhost instead of postgres service. conftest.py reads DATABASE_TEST_URL only — DATABASE_URL is intentionally not consulted (per dab740d's test-DB-isolation hardening — running pytest with DATABASE_URL set previously dropped the dev DB schema). The CI workflow only sets DATABASE_URL, so conftest falls back to its localhost default and every fixture-setup fails with `OSError: Connect call failed ('127.0.0.1', 5432)` — observed as 638 errors on the latest main run. Add DATABASE_TEST_URL pointing at the postgres service container. Same connection string as DATABASE_URL — the test DB and the app DB are the same physical postgres in CI; conftest's safety assertion is satisfied by the URL containing "test". 2. Frontend artifact upload fails on Gitea Actions runner. actions/upload-artifact@v4 (and v5) are not supported on Gitea Actions / GHES — the runner returns `GHESNotSupportedError: ... not currently supported on GHES`. Lint itself is now passing (0 errors after PR #149); the job exits 1 only because the upload step then fails. Pin upload-artifact + download-artifact to v3, the latest version compatible with Gitea Actions until they ship v4 support. After this lands, both backend and frontend CI gates should turn green — at which point we can also add backend to the required status checks on main. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 03:28:54 -04:00
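
The test-database resolution described in commits 8f7df2c0ef and 7f714363dd (read only `DATABASE_TEST_URL`, add a per-worker suffix from `PYTEST_XDIST_WORKER`, refuse any database whose name lacks "test") can be sketched roughly as below. This is an illustration under stated assumptions, not the repository's conftest.py: the function name `resolve_test_db_url`, the `DEFAULT_TEST_URL` value and its credentials, and the choice to check the database name rather than the whole URL are all hypothetical.

```python
import os
from urllib.parse import urlsplit, urlunsplit

# Hypothetical localhost fallback; the real default in conftest.py is not shown in the PR.
DEFAULT_TEST_URL = (
    "postgresql+asyncpg://postgres:postgres@localhost:5432/resolutionflow_test"
)


def resolve_test_db_url(env=None):
    """Return the database URL a test run (or one xdist worker) should use."""
    env = os.environ if env is None else env

    # Only DATABASE_TEST_URL is consulted. DATABASE_URL is deliberately ignored
    # so that a dev or production database is never picked up by accident.
    base_url = env.get("DATABASE_TEST_URL", DEFAULT_TEST_URL)

    parts = urlsplit(base_url)
    db_name = parts.path.lstrip("/")

    # pytest-xdist sets PYTEST_XDIST_WORKER to 'gw0', 'gw1', ... in each worker.
    # It is unset for serial runs, which keep using the base database.
    worker = env.get("PYTEST_XDIST_WORKER")
    if worker:
        db_name = f"{db_name}_{worker}"

    # Safety guard (assumed form): the tests drop the schema, so refuse any
    # database whose name does not contain "test".
    if "test" not in db_name:
        raise RuntimeError(f"refusing to run tests against database {db_name!r}")

    return urlunsplit(parts._replace(path="/" + db_name))


if __name__ == "__main__":
    print(resolve_test_db_url())
```

With `DATABASE_TEST_URL` ending in `/resolutionflow_test` and `PYTEST_XDIST_WORKER=gw0`, the sketch yields a URL ending in `/resolutionflow_test_gw0`, matching the naming pattern quoted in the xdist commit. Creating the worker database when it is missing (the `_ensure_worker_db_exists` step) is left out here because it needs a live postgres connection.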