fix(ci): set DATABASE_TEST_URL + pin upload-artifact to v3 for Gitea Actions #150

Merged
chihlasm merged 15 commits from fix/ci-workflow-config into main 2026-04-25 21:57:27 +00:00
Owner

Summary

Two CI-config issues unmasked by PR #149's lint/test fixes. Both are workflow-file changes only — no code touched.

1. Backend: Connect call failed ('127.0.0.1', 5432) × 638

backend/tests/conftest.py (after dab740d's test-DB-isolation hardening) reads DATABASE_TEST_URL only. DATABASE_URL is intentionally not consulted — the safety guard is there because running pytest with DATABASE_URL pointing at the dev/prod DB previously dropped the dev schema (real outage, see commit dab740d).

The CI workflow only set DATABASE_URL. Result: conftest fell back to its localhost:5432 default, no postgres there from the runner's perspective, and every fixture-setup failed with the same connection error — 638 cascading errors visible in the latest main run on f27f671.

Fix: explicitly set DATABASE_TEST_URL to the postgres service container URL. Same connection string as DATABASE_URL (test DB is the app DB in CI) — conftest's "test" in db_name safety assertion is satisfied by resolutionflow_test.

2. Frontend: upload-artifact@v4 not supported on Gitea Actions

The frontend lint job actually passes (0 errors / 23 warnings after PR #149). The job exits 1 because the next step uploads the build artifact via actions/upload-artifact@v4, which the Gitea Actions runner rejects with:

GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES.

Pin both upload-artifact and download-artifact to v3 (the latest version Gitea Actions supports today). Same artifact format, same downstream e2e job.

Expected CI state after this PR

  • CI / frontend (push) and CI / frontend (pull_request) — green (lint already passes; artifact step now works)
  • CI / backend (push) and CI / backend (pull_request) — should match the local 1018 passed / 54 failed / 4 errors from PR #149's verification (down from 438/1/638 today)
  • After this lands, we can add CI / backend (pull_request) to the required status checks on main.

Test plan

  • Watch this PR's CI run after push.
  • Confirm backend connects to the postgres service (no more "127.0.0.1" errors).
  • Confirm frontend job exits 0.

🤖 Generated with Claude Code

## Summary Two CI-config issues unmasked by PR #149's lint/test fixes. Both are workflow-file changes only — no code touched. ### 1. Backend: `Connect call failed ('127.0.0.1', 5432)` × 638 `backend/tests/conftest.py` (after `dab740d`'s test-DB-isolation hardening) reads `DATABASE_TEST_URL` only. `DATABASE_URL` is intentionally **not consulted** — the safety guard is there because running `pytest` with `DATABASE_URL` pointing at the dev/prod DB previously dropped the dev schema (real outage, see commit `dab740d`). The CI workflow only set `DATABASE_URL`. Result: conftest fell back to its `localhost:5432` default, no postgres there from the runner's perspective, and every fixture-setup failed with the same connection error — 638 cascading errors visible in the latest main run on `f27f671`. Fix: explicitly set `DATABASE_TEST_URL` to the postgres service container URL. Same connection string as `DATABASE_URL` (test DB *is* the app DB in CI) — conftest's `"test" in db_name` safety assertion is satisfied by `resolutionflow_test`. ### 2. Frontend: `upload-artifact@v4` not supported on Gitea Actions The frontend lint job actually passes (`0 errors / 23 warnings` after PR #149). The job exits 1 because the *next* step uploads the build artifact via `actions/upload-artifact@v4`, which the Gitea Actions runner rejects with: ``` GHESNotSupportedError: @actions/artifact v2.0.0+, upload-artifact@v4+ and download-artifact@v4+ are not currently supported on GHES. ``` Pin both `upload-artifact` and `download-artifact` to `v3` (the latest version Gitea Actions supports today). Same artifact format, same downstream e2e job. ## Expected CI state after this PR - `CI / frontend (push)` and `CI / frontend (pull_request)` — green (lint already passes; artifact step now works) - `CI / backend (push)` and `CI / backend (pull_request)` — should match the local `1018 passed / 54 failed / 4 errors` from PR #149's verification (down from 438/1/638 today) - After this lands, we can add `CI / backend (pull_request)` to the required status checks on main. ## Test plan - [ ] Watch this PR's CI run after push. - [ ] Confirm backend connects to the postgres service (no more "127.0.0.1" errors). - [ ] Confirm frontend job exits 0. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
chihlasm added 1 commit 2026-04-25 07:29:21 +00:00
fix(ci): set DATABASE_TEST_URL + downgrade upload-artifact to v3 (Gitea Actions)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m29s
CI / frontend (pull_request) Successful in 3m11s
CI / e2e (pull_request) Failing after 4m56s
8f7df2c0ef
Two CI-config issues blocking the gate from going green:

1. **Backend tests connect to localhost instead of postgres service.**
   conftest.py reads DATABASE_TEST_URL only — DATABASE_URL is intentionally
   not consulted (per dab740d's test-DB-isolation hardening — running
   pytest with DATABASE_URL set previously dropped the dev DB schema).
   The CI workflow only sets DATABASE_URL, so conftest falls back to its
   localhost default and every fixture-setup fails with
   `OSError: Connect call failed ('127.0.0.1', 5432)` — observed as 638
   errors on the latest main run.

   Add DATABASE_TEST_URL pointing at the postgres service container.
   Same connection string as DATABASE_URL — the test DB and the app DB
   are the same physical postgres in CI; conftest's safety assertion is
   satisfied by the URL containing "test".

2. **Frontend artifact upload fails on Gitea Actions runner.**
   actions/upload-artifact@v4 (and v5) are not supported on Gitea
   Actions / GHES — the runner returns
   `GHESNotSupportedError: ... not currently supported on GHES`. Lint
   itself is now passing (0 errors after PR #149); the job exits 1 only
   because the upload step then fails.

   Pin upload-artifact + download-artifact to v3, the latest version
   compatible with Gitea Actions until they ship v4 support.

After this lands, both backend and frontend CI gates should turn
green — at which point we can also add backend to the required status
checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 07:36:57 +00:00
docs(ai): handoff for Codex — CI recovery + 54 real backend failures
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m15s
CI / frontend (pull_request) Successful in 2m55s
CI / e2e (pull_request) Failing after 4m23s
208ec996d5
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md so the next
session has accurate resume state. Summary of where things are:

- PR #141 (PSA tickets), PR #147 (FlowPilot Phase 1-9), PR #148 (CI
  fixes part 1), PR #149 (CI fixes part 2) all merged to main in this
  session.
- Branch protection enabled on main: PR-only, CI / frontend required.
- PR #150 (this branch) is the last CI-config PR — adds
  DATABASE_TEST_URL to the workflow and pins upload-artifact to v3.
- Next session: watch #150's CI, merge if green, add CI / backend to
  required checks, then start on the 54 real backend test failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 15:10:25 +00:00
wip(handoff): restore backend suite to green
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Failing after 27m35s
CI / frontend (pull_request) Successful in 2m46s
CI / e2e (pull_request) Failing after 4m9s
49f88569da
Co-Authored-By: Codex <noreply@openai.com>
chihlasm added 1 commit 2026-04-25 15:35:41 +00:00
docs(ai): queue pytest-xdist parallelization in TODO.md
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Has been cancelled
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
0aefaa78eb
Capture the backend pytest parallelization work so it survives session
end. Backend suite is currently ~22 min wall-clock for 1076 tests;
xdist with one-DB-per-worker should land in the 3-6 min range on the
homelab Gitea Actions runner.

Also queues two backlog items:
- Frontend lint warnings (23 react-hooks/exhaustive-deps after PR #149)
- Periodic audit of the ResourceWarning filterwarnings added by Codex

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 16:01:09 +00:00
fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Successful in 31m8s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m57s
e976fb4e87
Three changes that get PR #150 to a green CI gate:

1. **test_record_decision_persists_and_bumps_state_version** — the
   `decision: draft_template` path calls `_extract_template_parameters`
   (TemplateExtractionService → AI provider). CI doesn't set
   ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
   `RuntimeError: No AI provider configured` and returned 500. The test
   isn't exercising the AI integration — patched the extractor with an
   AsyncMock returning a minimal valid `{templated_body, parameters}`
   dict. Verified locally: the test now passes.

2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
   the hash of requirements*.txt / package-lock.json with a runner-os
   restore-key fallback. Saves ~30-60s per run on cache hit.

3. **Pytest invocation tightened**:
   - Dropped `--cov-report=term-missing` — the custom "Display coverage
     summary" step below parses coverage.json and prints the same
     module list more concisely. Term-missing dumps every uncovered
     line which adds ~5-10s of stdout.
   - Added `--maxfail=10` so a structural breakage (fixture explosion,
     DB unreachable) bails after 10 errors instead of running the full
     25-min suite. Tunable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 16:15:09 +00:00
docs(ai): handoff after CI parallelization + final test fix
Some checks failed
Mirror to GitHub / mirror (push) Has been cancelled
CI / backend (pull_request) Successful in 30m26s
CI / frontend (pull_request) Successful in 5m46s
CI / e2e (pull_request) Failing after 5m3s
fe632c9194
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect:
- PR #150 now contains the AI-provider test mock + caching + maxfail.
  Backend CI should be fully green for the first time in months.
- PR #151 stacked on #150: pytest-xdist with per-worker DBs. Local
  verification: 22m 27s → 4m 28s (5× speedup), 1076 passed both runs.
- DoD is now: merge #150, then #151, then add CI / backend
  (pull_request) to required status checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 16:15:44 +00:00
docs(ai): SESSION_LOG entry for the parallelization session
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Successful in 32m33s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m58s
f27e3b44b0
(Was meant to land in fe632c9; the multi-line edit failed silently
because Codex's earlier entry shifted the surrounding context.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 19:26:23 +00:00
docs(ai): queue data-testid hardening + reflect PR #152 + 3-runner setup
Some checks failed
CI / backend (pull_request) Failing after 2s
Mirror to GitHub / mirror (push) Successful in 15s
CI / e2e (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
c203b70ef9
TODO.md: Promote pytest-xdist to  (PR #151 carries it). Adds three new
backlog items:
- data-testid hardening for e2e-critical interactive elements (sparked
  by PR #152's selector drift work)
- per-test transactional rollback (next big speedup if needed)
- pytest-testmon for PR-time test selection

HANDOFF.md: Three open PRs now (#150, #151, #152), all independent.
Three Gitea runner agents now registered, so jobs run in parallel.
Combined with #151's xdist, the prior 1h 14m wall-clock should drop
to ~6-10 min. Updated merge order: #152 first (smallest), #150 next,
#151 last. After all three land, enable CI / backend then CI / e2e
as required status checks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 19:28:20 +00:00
fix(ci): drop postgres host port mapping (multi-runner port collision)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 6m44s
CI / e2e (pull_request) Failing after 8m43s
CI / backend (pull_request) Has been cancelled
1bd43abb8f
With 3 Gitea Actions runners on the same homelab box, two simultaneous
backend (or backend + e2e) jobs both try to bind 0.0.0.0:5432 for their
postgres service containers. The second fails with:

  failed to set up container networking: ... Bind for 0.0.0.0:5432
  failed: port is already allocated

The host-port mapping isn't actually needed — the workflow uses
\`DATABASE_URL: postgresql+asyncpg://...@postgres:5432/...\` (hostname
\`postgres\` is the service container's docker-network DNS name).
The tests run inside the act container which is on the same docker
network, so they reach postgres without going through the host.

Removing \`ports: 5432:5432\` from both backend and e2e job service
definitions lets multiple postgres services run in parallel on
different docker networks without colliding on the host.

Surfaced when PR #150 ran in parallel with another job after the
multi-runner setup. Backend instant-failed in 2s on the docker run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 2 commits 2026-04-25 19:54:09 +00:00
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).

How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
  'gw0', 'gw1', …). When set, derives a per-worker DB URL like
  `…/resolutionflow_test_gw0`. The base DB stays for serial / master
  runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
  connects to the postgres maintenance DB, and `CREATE DATABASE`s the
  worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
  contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
  worker's isolated DB, no cross-worker race.

CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).

Adds `pytest-xdist==3.6.1` to requirements-dev.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(e2e): update 5 selectors that drifted with FlowPilot/PSA UI changes
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m52s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
69f2a37591
Mechanical drift between the e2e selectors and the current UI surfaced
on the first CI run after PR #149 unblocked the artifact upload step.
Five tests, three categories of drift:

1. **Page heading renames** (navigation.spec.ts)
   - `Sessions` → `Session History` on /sessions
   - `Account Settings` → `Account Management` on /account

2. **Route rename** (command-palette.spec.ts:74)
   - The "Troubleshoot with FlowPilot" command palette option now lands
     on /pilot (Phase 1 of the FlowPilot migration renamed /assistant).
     /assistant still 301-redirects, so the assertion accepts either.

3. **Feature moved to /sessions** (history.spec.ts, resume.spec.ts)
   - Default tab on /sessions is "AI Sessions"; flow-session filtering
     and the Resume button moved behind the "Flow Sessions" tab. Both
     tests now click that tab before asserting.
   - resume.spec.ts no longer starts at /trees (Resume buttons aren't
     rendered there anymore — the flow lives on /sessions). Destination
     URL (/trees/:id/navigate) is unchanged.

No product-code changes — these are pure test updates against the
shipped UI. Run the suite locally with
`cd frontend && npm run test:e2e` once a fresh build is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 19:55:11 +00:00
docs(ai): reflect PR consolidation — #151/#152 merged into #150
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
6656ebdead
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 19:59:02 +00:00
perf(ci): decouple e2e from frontend — build frontend inline in e2e job
Some checks failed
Mirror to GitHub / mirror (push) Successful in 14s
CI / frontend (pull_request) Successful in 5m44s
CI / e2e (pull_request) Failing after 7m42s
CI / backend (pull_request) Successful in 10m28s
261814ae65
Before: e2e \`needs: [frontend]\` waited for the frontend job to upload
a build artifact, then downloaded it. With multiple runners this means
the third runner sat idle for ~6 min while frontend ran, then started
e2e — total wall-clock max(backend, frontend+e2e) ≈ 11 min.

After: e2e builds its own frontend (npm ci + npm run build are already
in the job; just dropped the artifact download step and added the
build). e2e starts immediately on a free runner. Adds ~1-2 min to the
e2e job duration but removes ~5 min of waiting and eliminates the
cross-job artifact mechanism entirely.

Side benefit: no more \`actions/upload-artifact\` v3/v4 GHES headaches
on the cross-job handoff. The \`if: always()\` upload of the
playwright-report at the end of e2e is kept (failure report retrieval
is still useful), but it's a leaf-output, not a dependency.

Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the
3-runner setup, down from ~11 min.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 20:12:21 +00:00
docs(ai): note e2e decoupling commit (261814a) in HANDOFF
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m43s
CI / e2e (pull_request) Failing after 9m30s
CI / backend (pull_request) Successful in 10m18s
ede6eebf9a
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chihlasm added 1 commit 2026-04-25 21:42:48 +00:00
fix(e2e): harden card selectors for session resume
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m43s
CI / backend (pull_request) Successful in 10m21s
CI / e2e (pull_request) Successful in 11m23s
1e3a6cfa01
Co-Authored-By: Codex <noreply@openai.com>
chihlasm merged commit 87bb20b8f0 into main 2026-04-25 21:57:27 +00:00
Sign in to join this conversation.