ca45bc9bb3 perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Successful in 9m37s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 20m54s
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).

How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
  'gw0', 'gw1', …). When set, derives a per-worker DB URL like
  `…/resolutionflow_test_gw0`. The base DB stays for serial / master
  runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
  connects to the postgres maintenance DB, and `CREATE DATABASE`s the
  worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
  contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
  worker's isolated DB, no cross-worker race.
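
The URL derivation described above can be sketched roughly as follows. This is a minimal illustration, not the actual conftest.py code: the function name `worker_db_url`, the host, and the credentials in the example URL are placeholders, and the `CREATE DATABASE` step is left out because it needs a live Postgres connection.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def worker_db_url(base_url: str, worker_id: Optional[str]) -> str:
    """Append the xdist worker id (e.g. 'gw0') to the database name in base_url.

    Hypothetical helper: the real conftest.py may structure this differently.
    """
    if not worker_id:
        # Serial run or the xdist controller process: keep the base DB.
        return base_url
    parts = urlsplit(base_url)
    worker_db = f"{parts.path.lstrip('/')}_{worker_id}"
    # Mirror the "test" safety guard: refuse names that do not look like test DBs.
    if "test" not in worker_db:
        raise ValueError(f"refusing to use non-test database {worker_db!r}")
    return urlunsplit(parts._replace(path=f"/{worker_db}"))


# Placeholder URL; only the database name `resolutionflow_test` comes from the CI config.
base = "postgresql://user:password@localhost:5432/resolutionflow_test"
url = worker_db_url(base, os.environ.get("PYTEST_XDIST_WORKER"))
```

Keeping the derivation in a pure function makes the naming rule checkable without a database; database creation can then be a separate step that runs only when the worker id is set.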

CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).

Adds `pytest-xdist==3.6.1` to requirements-dev.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:07:57 -04:00
27 changed files with 110 additions and 544 deletions


@@ -1,20 +1,22 @@
# CURRENT_TASK.md # CURRENT_TASK.md
**Task:** No active task — pick from [`TODO.md`](TODO.md). **Task:** Restore a fully green CI gate on `main` and lock it via branch protection so future merges can't introduce silent rot.
**Status:** ready for next pickup. **Status:** in-progress
## Recommended next moves **Definition of Done:**
- [ ] PR #150 (`fix/ci-workflow-config`) merged. Both `CI / backend (pull_request)` and `CI / frontend (pull_request)` show success on the merge commit.
- [ ] `CI / backend (pull_request)` added to required status checks on `main` in Gitea branch protection (frontend is already required).
- [ ] The 54 real backend test failures (left after #149's infra cleanup) categorized and fixed in a follow-up PR. Target: 0 failures, 0 errors on a `pytest` run inside `resolutionflow_backend`.
- [ ] `npm run lint` stays at 0 errors after the cleanup PR (already at 0 on main).
- [ ] Append a SESSION_LOG.md entry summarizing what shipped.
1. **Promote `CI / e2e (pull_request)` to required on `main`.** Two consecutive PR runs (#150 and #153) have now finished green on the e2e job. That was the threshold the prior CI-recovery task set for promoting it. Branch protection update only — no code change. **Assumptions:**
2. **Pick a backlog item.** Top of `TODO.md` "Up next" is the `data-testid` e2e-stability work (PR #152 spent five one-line selector updates chasing UI churn — adding stable test IDs to a small set of high-value elements would make those tests immune to copy/route renames). The new `currentChatRef` silent-return audit added in #153's session is in Backlog and is a natural pairing with the bug fix that was just shipped. - The 54 failures fall into a small number of root-cause categories (likely 3-5: fixture-scoping leaks, DB cleanup ordering, account_id propagation in test seed paths). Verify before assuming.
- The pytest-asyncio 0.24 + pytest 8.4 toolchain bumped in #149 is the right baseline; do not revert.
- `DATABASE_TEST_URL` is the only DB URL conftest will honor; do not weaken the safety guard added in `dab740d`.
## Previous task — closed out **Out of scope:**
- New feature work on FlowPilot (Phase 10+) or PSA — keep this branch focused on CI debt.
**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane. - Frontend lint warnings (23 remain after #149; they're missing-deps in useEffect, opt-in cleanup later).
- RLS test suite (`test_rls_isolation.py`) — gated behind `RUN_RLS_TESTS=1` and not in the default CI run.
**Status:** complete (2026-04-26).
- PR #153 merged as commit `68fcdc6` on `main`. Backend, frontend, and e2e all green on the merged SHA after the env-var fix.
- E2e CI needed a stub `ANTHROPIC_API_KEY` in the workflow so the AI-gated `POST /api/v1/ai-sessions` endpoint stops returning 503; the Playwright `page.route` stub still intercepts the actual `/chat` call in the browser, so no real Anthropic traffic occurs.
- Regression test `frontend/e2e/assistant-chat-prefill.spec.ts` is part of the e2e suite going forward.


@@ -2,27 +2,62 @@
# HANDOFF.md # HANDOFF.md
**Last updated:** 2026-04-26 04:55 EDT **Last updated:** 2026-04-25 06:12 EDT
**Active task:** None — pick from [`TODO.md`](TODO.md). See [`CURRENT_TASK.md`](CURRENT_TASK.md) for recommended next moves. **Active task:** Restore green CI gate on `main` and lock it via branch protection. See [CURRENT_TASK.md](CURRENT_TASK.md).
**Branch:** `main` is the home position. Recent merges: PR #150 (CI recovery, `87bb20b`), PR #153 (prefill `currentChatRef` fix, `68fcdc6`). **Branch:** `fix/ci-workflow-config`
## Where things stand ## Current state
- CI is healthy on `main`: backend, frontend, and e2e all green on the latest commits. Previous session fixed the 54 real backend failures left after #149. The default backend suite is now green locally:
- Branch protection on `main`: PR-only merges, force-push blocked, **`CI / frontend (pull_request)` required**, **`CI / backend (pull_request)` required**, `CI / e2e (pull_request)` not yet required.
- Two consecutive PR runs (#150, #153) finished green on e2e. The "promote e2e to required" gate from the prior task is now satisfiable. ```bash
- Backend AI-gated endpoints (`POST /ai-sessions`, `/chat`, `/respond`, etc.) call `_require_ai_enabled()` and return 503 if no provider key is set. The e2e CI job now sets a stub `ANTHROPIC_API_KEY` so any future test that exercises those flows can rely on it; tests should still stub the actual AI calls in the browser via `page.route` so no real Anthropic traffic occurs. docker exec resolutionflow_backend bash -lc 'pytest --override-ini="addopts=" -q > /tmp/full-backend.log 2>&1; code=$?; tail -n 160 /tmp/full-backend.log; exit $code'
# 1076 passed, 35 deselected in 1347.41s (0:22:27)
```
Targeted validation also passed:
- `tests/test_session_resolutions_api.py tests/test_session_sharing.py tests/test_session_suggested_fixes_api.py tests/test_survey.py tests/test_tenant_isolation_p0.py tests/test_tree_sharing.py tests/test_trees.py::TestTrees::test_delete_tree_cleans_up_folder_and_tag_assignments tests/test_uploads.py::test_delete_upload_forbidden_for_non_owner` → `73 passed`
- PDF export tests → `3 passed`
- Prompt/PSA/resolution/script-builder subset → `14 passed`
- Admin/AI/branch subsets → `11 passed`
## What changed
Production fixes:
- CI/backend dev image now installs WeasyPrint system libraries.
- Public share-token and survey routes are mounted outside tenant auth; protected share management remains tenant-protected.
- Folder creation now persists `UserFolder.account_id`.
- Script Builder save-to-library now persists `ScriptTemplate.account_id`.
- Resolution output generation eager-loads `AISession.steps` to avoid async lazy-load `MissingGreenlet`.
- AI session model now declares the generated `search_vector` column already present in Alembic, so `create_all` test schemas match runtime migrations.
- Direct account-role update now rejects `"owner"`; ownership changes must use the transfer path.
- Assistant prompt marker examples no longer include a literal executable `create_spin_off_ticket` payload.
Test/harness fixes:
- Test seeds updated for tenant-scoped `account_id` columns on sessions, branches, resolution outputs, script templates, PSA connections, folders, schedules, and categories.
- Tests aligned with 404-not-403 resource-hiding policy.
- Disabled-AI tests now restore both Anthropic and Google key settings.
- Pytest harness closes pytest-asyncio's leftover clean loop and ignores known unclosed asyncio/asyncpg teardown ResourceWarnings that otherwise appear at arbitrary later setup points under `filterwarnings = error`.
## Immediate next steps ## Immediate next steps
1. (Optional, ops-only) Promote `CI / e2e (pull_request)` to required on `main` in Gitea branch protection. 1. Commit current working tree if not already committed with trailer:
2. Pick the next backlog item from `TODO.md`. Top of "Up next" is the `data-testid` e2e-stability audit; the new `currentChatRef` silent-return audit (added to backlog in this session) is a natural pairing with the bug fix that just shipped. `Co-Authored-By: Codex <noreply@openai.com>`.
2. Check PR #150 status on Gitea. If both `CI / backend (pull_request)` and `CI / frontend (pull_request)` are green, merge it.
3. After #150 merges, add `CI / backend (pull_request)` to required status checks on main:
```bash
PATCH /repos/chihlasm/resolutionflow/branch_protections/main
{ "status_check_contexts": ["CI / frontend (pull_request)", "CI / backend (pull_request)"] }
```
`$GITEA_TOKEN` is in `.claude/settings.local.json`.
4. Run/confirm frontend lint if needed for the final DoD item (`npm run lint` was already green after #149, but this session did not rerun it).
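
The branch-protection update in step 3 could be issued from Python along these lines. This is a sketch under assumptions: the host `https://gitea.example.com` and the helper name are placeholders, the `/api/v1` prefix and the `enable_status_check` field are assumed from Gitea's API conventions and are not shown in the handoff notes, and the request is only built, not sent.

```python
import json
import urllib.request


def build_branch_protection_request(base_url, token, owner, repo, branch, contexts):
    """Build (but do not send) a PATCH that sets required status checks.

    Hypothetical helper; endpoint prefix and `enable_status_check` are assumptions.
    """
    url = f"{base_url}/api/v1/repos/{owner}/{repo}/branch_protections/{branch}"
    body = {
        "enable_status_check": True,
        "status_check_contexts": list(contexts),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


req = build_branch_protection_request(
    "https://gitea.example.com",  # placeholder host
    "REDACTED",                   # read the real token from local settings
    "chihlasm",
    "resolutionflow",
    "main",
    ["CI / frontend (pull_request)", "CI / backend (pull_request)"],
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

The full list of contexts is sent each time, so the frontend check has to be repeated alongside the new backend check.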
## Useful breadcrumbs ## Open questions
- The fix that just landed: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) — `currentChatRef.current = session.session_id` after `setActiveChatId` in the dashboard prefill effect. - PR #150 was not rechecked or merged in this session.
- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts). - Branch protection was not updated in this session.
- E2e env convention: [`.gitea/workflows/ci.yml`](../.gitea/workflows/ci.yml) — `ANTHROPIC_API_KEY` is stubbed in the e2e job env. Tests that exercise AI-gated endpoints should stub the actual AI calls in the browser, not rely on a real key.
- Silent-return follow-up entry: [`.ai/TODO.md`](TODO.md), Backlog section.


@@ -12,40 +12,6 @@
--- ---
## 2026-04-26 03:50 EDT — Claude Code — Ship AssistantChatPage prefill `currentChatRef` fix; close out PR #150
- User reported a troubleshooting-session bug: after answering a subset of task-lane questions and clicking *Send N of M Responses*, no AI response appeared. Traced to `AssistantChatPage`: the dashboard prefill effect set `activeChatId` after creating a new chat session but never updated `currentChatRef.current`. The `currentChatRef.current !== sentForChatId` guard in `handleSend` and `handleTaskSubmit` then bailed silently on every later request and discarded the AI's reply. The user message was already pushed to the chat before the await, so the user saw their answers but nothing else.
- Fix: one-line addition mirroring `handleNewChat` and `handleResumeNew` — assign `currentChatRef.current = session.session_id` immediately after `setActiveChatId(session.session_id)` in the prefill effect. Branched off `origin/main` as `fix/tasklane-prefill-ref`; PR #153 opened on Gitea.
- Authored a Playwright regression test `frontend/e2e/assistant-chat-prefill.spec.ts` that drives the real dashboard prefill flow against the real backend, stubs `/ai-sessions/*/chat` with `page.route` for deterministic turn-1/turn-2 responses, and asserts the second AI message renders. Confirmed the test fails on unfixed code at the exact assertion (`Got it — based on your answer…` never appears) and passes once the fix is restored.
- Verified locally inside `mcr.microsoft.com/playwright:v1.58.2-noble` against the running dev stack: new spec passes, adjacent `flowpilot-chat` spec still passes, `tsc -b` clean. `resume.spec` and `history.spec` failures observed are pre-existing real-backend fixture collisions, unrelated to this change.
- First CI run on PR #153 failed on infrastructure issues already addressed by PR #150: backend hit `Bind for 0.0.0.0:5432 failed: port is already allocated`, frontend hit `actions/upload-artifact@v4 not supported on GHES`. PR #150 was already merged (commit `87bb20b` on `main`). Rebased `fix/tasklane-prefill-ref` onto new `main` (force-push `1a8cb06` → `1559feb`), resolved a `.ai/TODO.md` conflict by keeping both backlog item sets, kicked off CI on the rebased SHA.
- Confirmed `CI / backend (pull_request)` is now in branch protection's required-status-checks list (added during PR #150 close-out). `CI / e2e (pull_request)` left as not-required pending one more clean PR run as the threshold.
- Recorded the broader silent-return concern in TODO backlog: the `currentChatRef.current !== sentForChatId` guard is applied across `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview`. PR #153 fixes one symptom but the same pattern can mask other drift. Either log a Sentry breadcrumb on the mismatch path or distinguish "expected stale" (chat switch) from "unexpected stale" (ref never updated) so the latter alerts.
- First CI run on the rebased SHA passed backend and frontend but failed e2e: the new prefill regression test couldn't render the task-lane question text. Diagnosed via the job log: `POST /api/v1/ai-sessions` calls `_require_ai_enabled()` and returns 503 when no provider key is set. The e2e CI job had neither `ANTHROPIC_API_KEY` nor `GOOGLE_AI_API_KEY` in env. Locally the dev backend has a real key, hence the local pass. The Playwright `page.route` stub on `/chat` was correct but never had a chance to fire because the upstream session-creation call was 503-ing.
- Fix: added a stub `ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests` to the e2e job env in `.gitea/workflows/ci.yml`. The Playwright stub still intercepts the actual `/chat` call in the browser, so the backend never contacts Anthropic — the gate just needs to clear. Documented the convention in a workflow comment so future AI-touching e2e tests know what to expect. Pushed `11fe32f`; CI went all-green.
- Merged PR #153 as `68fcdc6` on `main`. Local feature branch and remote both deleted via Gitea's `delete_branch_after_merge`.
- Opened a small follow-up `chore/post-153-handoff` PR to refresh the now-stale `.ai/` files (this entry, plus `CURRENT_TASK.md` rolling forward to "no active task — pick from `TODO.md`" and `HANDOFF.md` updating to the post-merge home position). The `data-testid` audit at the top of `TODO.md` "Up next" or the `currentChatRef` silent-return audit added in this session's backlog are the natural next pickups.
- Files touched: `frontend/src/pages/AssistantChatPage.tsx` (the one-line fix + comment), `frontend/e2e/assistant-chat-prefill.spec.ts` (new regression test), `.gitea/workflows/ci.yml` (stub `ANTHROPIC_API_KEY` for e2e), `.ai/TODO.md` (silent-return follow-up entry, plus conflict resolution preserving PR #150's backlog additions), `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md` (this entry).
## 2026-04-25 16:41 EDT — Codex — Stabilize PR #150 e2e selectors
- Investigated the remaining PR #150 failure after backend and frontend CI were green. The e2e resume smoke test was not failing because of product behavior; it used `.bg-card` plus text filtering and matched the tree filter `<select>` before the intended session card.
- Added stable test IDs to flow session, tree, and share cards, then updated affected e2e tests to target those cards instead of Tailwind class names.
- Hardened the CI workflow by making Postgres healthchecks authenticate as `postgres` and baking `VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}"` into the e2e frontend build.
- Verified with `git diff --check`, frontend build in Docker, no remaining `.bg-card` e2e selectors, and focused Playwright runs in an Actions-like Ubuntu container: resume spec passed, then history/library/library-start/resume/shares passed (`6 passed`).
- Left for next session: push this WIP commit to PR #150, watch CI, merge when all three jobs are green, then enable backend branch protection and consider the e2e gate after a reliable green run.
- Files touched: `.gitea/workflows/ci.yml`, `frontend/e2e/history.spec.ts`, `frontend/e2e/library-start.spec.ts`, `frontend/e2e/library.spec.ts`, `frontend/e2e/resume.spec.ts`, `frontend/e2e/shares.spec.ts`, `frontend/src/components/library/TreeGridView.tsx`, `frontend/src/components/library/TreeListView.tsx`, `frontend/src/pages/MySharesPage.tsx`, `frontend/src/pages/SessionHistoryPage.tsx`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/SESSION_LOG.md`.
## 2026-04-25 12:00 America/New_York — Claude Code — Mock final AI-provider test, cache CI deps, parallelize backend with pytest-xdist
- Diagnosed why CI was still red despite Codex's local 1076 passed: a single test (`test_record_decision_persists_and_bumps_state_version`) needed `ANTHROPIC_API_KEY` because the `decision: draft_template` path calls `TemplateExtractionService` → AI provider. Patched `_extract_template_parameters` with an `AsyncMock` so the test no longer depends on AI availability. Verified.
- Pushed Codex's WIP commit `49f8856` to PR #150 (had been local-only per handoff protocol).
- PR #150 (`fix/ci-workflow-config`) extended with cheap CI wins: `actions/cache@v3` for pip + npm in all three jobs; dropped `--cov-report=term-missing` (the custom display step parses JSON); added `--maxfail=10` so structural breakage exits fast.
- PR #151 (`fix/ci-pytest-xdist`) opened, stacked on #150: pytest-xdist with per-worker DB isolation. `conftest.py` reads `PYTEST_XDIST_WORKER`, computes a per-worker DB URL like `…_gw0`, and synchronously CREATEs the DB on first import. The per-test `DROP SCHEMA public CASCADE` then operates on the worker's isolated DB. Verified locally: backend suite went from 22m 27s serial → 4m 28s parallel (8 workers), 1076 passed in both cases. ~5× speedup.
- Decided NOT to do per-test transactional rollback (bigger refactor); captured for future TODO consideration.
- Left for next session: watch CI on both PRs, merge in order (#150 first, #151 second), then enable `CI / backend (pull_request)` as a required status check on main.
- Files touched: `backend/tests/test_session_suggested_fixes_api.py`, `backend/tests/conftest.py`, `backend/requirements-dev.txt`, `.gitea/workflows/ci.yml`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/TODO.md`.
## 2026-04-25 06:12 EDT — Codex — Fix backend suite to green ## 2026-04-25 06:12 EDT — Codex — Fix backend suite to green
- Fixed the real backend failures left after the CI-infra cleanup: tenant-scoped seed drift, missing production `account_id` writes, public route mounting for survey/share links, Script Builder library saves, resolution output async loading, AI search schema metadata, disabled-AI fixture leakage, and prompt marker guardrails. - Fixed the real backend failures left after the CI-infra cleanup: tenant-scoped seed drift, missing production `account_id` writes, public route mounting for survey/share links, Script Builder library saves, resolution output async loading, AI search schema metadata, disabled-AI fixture leakage, and prompt marker guardrails.


@@ -5,13 +5,9 @@
## Up next ## Up next
- [ ] **Parallelize backend pytest with pytest-xdist.** ✅ landing as PR #151. Verified locally: backend suite 22 min → 4m 28s with `-n auto` on the 8-core homelab runner. Per-worker DB isolation via `PYTEST_XDIST_WORKER` in conftest.py. - [ ] **Parallelize backend pytest with pytest-xdist.** Currently the backend suite takes ~22 min wall-clock for `1076 passed, 35 deselected` (verified locally 2026-04-25). With `-n auto` on the homelab Gitea Actions runner, this should land in the 3-6 min range depending on core count. Blocker: `test_db` fixture in `backend/tests/conftest.py` does `DROP SCHEMA public CASCADE` per test, which two workers would race on. Standard fix: one database per worker, derived from `PYTEST_XDIST_WORKER` env var inside conftest. The runner has spare CPU, so prioritize once main is green and the 54-failure cleanup has landed.
## Backlog ## Backlog
- [ ] **Frontend lint warnings cleanup.** 23 `react-hooks/exhaustive-deps` warnings remain after PR #149 (mostly missing-deps in useEffect). Either fix them or audit them for known-safe ones and add eslint-disable comments. Not blocking CI today. - [ ] **Frontend lint warnings cleanup.** 23 `react-hooks/exhaustive-deps` warnings remain after PR #149 (mostly missing-deps in useEffect). Either fix them or audit them for known-safe ones and add eslint-disable comments. Not blocking CI today.
- [ ] **Audit `filterwarnings` ignores added in `wip(handoff): restore backend suite to green`.** Codex added narrow `ResourceWarning` filters for unclosed socket/transport/event-loop noise from pytest-asyncio teardown. Worth periodically reviewing whether those are still needed (e.g. when bumping pytest-asyncio) — if a real warning appears in those forms it would be silenced. - [ ] **Audit `filterwarnings` ignores added in `wip(handoff): restore backend suite to green`.** Codex added narrow `ResourceWarning` filters for unclosed socket/transport/event-loop noise from pytest-asyncio teardown. Worth periodically reviewing whether those are still needed (e.g. when bumping pytest-asyncio) — if a real warning appears in those forms it would be silenced.
- [ ] **Add `data-testid` attributes to e2e-critical interactive elements.** PR #152 fixed five Playwright tests by chasing UI-text changes (`Sessions` → `Session History`, `Account Settings` → `Account Management`, `/assistant` → `/pilot`, "Flow Sessions" tab, Resume button on session cards). Each was a one-line selector update, but every UI churn re-breaks them. Adding stable `data-testid` attributes on the targeted elements (page heading wrappers, tab nav, primary action buttons) and switching tests to `getByTestId` would make these immune to copy/route renames. Scope it small — start with `SessionHistoryPage` heading, the AI/Flow Sessions tab buttons, the per-session `Resume` button, and the command-palette FlowPilot option.
- [ ] **Per-test transactional rollback in `test_db` fixture.** Bigger engineering than xdist (which we already shipped). Instead of `DROP SCHEMA public CASCADE` per test, wrap each test in a savepoint and rollback at teardown. ~30-40% additional speedup on top of xdist for test-DB-heavy tests. Real refactor; only worth it if the suite gets significantly larger or runs more frequently.
- [ ] **Consider `pytest-testmon` for PR-time test selection.** Tracks which tests touched which source files and only re-runs affected ones. Best for small PRs touching ~few files. Adds cache-invalidation complexity; only worth it if the suite stays painfully long even after xdist.
- [ ] **AssistantChatPage `currentChatRef` guard is a silent return.** `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview` all bail with `if (currentChatRef.current !== sentForChatId) return` when stale. This is by design for chat switching, but it also silently masked the prefill-ref bug fixed in PR #153 — the user just saw "no AI response" with no log, no toast, no Sentry event. Either (a) log a `console.warn`/Sentry breadcrumb on the mismatch path so future drift is visible, or (b) split "expected stale" (chat switch) from "unexpected stale" (ref never updated) so only the latter alerts. Pair with an audit of every `currentChatRef.current = ...` assignment vs every `setActiveChatId(...)` call to make sure they're paired everywhere.
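
The per-test transactional rollback item in this backlog can be illustrated with a small standalone sketch. It uses the standard-library `sqlite3` module purely so that it runs anywhere; the real fixture would use the project's async Postgres session, and the table name `sessions` and helper `run_in_savepoint` are hypothetical.

```python
import sqlite3


def run_in_savepoint(conn, test_body):
    """Run test_body inside a savepoint and undo its writes afterwards."""
    conn.execute("SAVEPOINT per_test")
    try:
        test_body(conn)
    finally:
        # Undo everything the test wrote, then pop the savepoint off the stack.
        conn.execute("ROLLBACK TO SAVEPOINT per_test")
        conn.execute("RELEASE SAVEPOINT per_test")


# isolation_level=None puts sqlite3 in autocommit mode, so transaction
# boundaries are controlled only by the explicit SAVEPOINT statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, title TEXT)")


def fake_test(c):
    c.execute("INSERT INTO sessions (title) VALUES ('scratch row')")
    assert c.execute("SELECT COUNT(*) FROM sessions").fetchone()[0] == 1


run_in_savepoint(conn, fake_test)
remaining = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
```

The schema is created once and survives between tests; only the rows written inside the savepoint are discarded, which is where the saving over `DROP SCHEMA public CASCADE` per test would come from.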


@@ -17,13 +17,10 @@ jobs:
POSTGRES_USER: postgres POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres POSTGRES_PASSWORD: postgres
POSTGRES_DB: resolutionflow_test POSTGRES_DB: resolutionflow_test
# No host port mapping. Tests connect to `postgres:5432` (the service ports:
# container's docker-network DNS name), not `localhost:5432`. With - 5432:5432
# multiple Gitea runners on the same homelab box, host-port mapping
# would race — two backend/e2e jobs both binding 0.0.0.0:5432 → the
# second fails with "port is already allocated".
options: >- options: >-
--health-cmd "pg_isready -U postgres" --health-cmd pg_isready
--health-interval 10s --health-interval 10s
--health-timeout 5s --health-timeout 5s
--health-retries 5 --health-retries 5
@@ -125,14 +122,15 @@ jobs:
- name: Build - name: Build
run: cd frontend && NODE_OPTIONS="--max-old-space-size=4096" npm run build run: cd frontend && NODE_OPTIONS="--max-old-space-size=4096" npm run build
# Build artifact intentionally NOT uploaded. The e2e job below builds - name: Upload build artifact
# its own frontend rather than downloading one from this job, so there uses: actions/upload-artifact@v3
# is no need for the cross-job artifact handoff (which previously broke with:
# on actions/upload-artifact@v4 GHES support and forced a v3 pin). name: frontend-dist
# Decoupling also lets e2e start immediately rather than waiting for path: frontend/dist
# this job to finish — important on a multi-runner setup. retention-days: 1
e2e: e2e:
needs: [frontend]
runs-on: ubuntu-latest runs-on: ubuntu-latest
services: services:
@@ -142,13 +140,10 @@ jobs:
POSTGRES_USER: postgres POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres POSTGRES_PASSWORD: postgres
POSTGRES_DB: resolutionflow_test POSTGRES_DB: resolutionflow_test
# No host port mapping. Tests connect to `postgres:5432` (the service ports:
# container's docker-network DNS name), not `localhost:5432`. With - 5432:5432
# multiple Gitea runners on the same homelab box, host-port mapping
# would race — two backend/e2e jobs both binding 0.0.0.0:5432 → the
# second fails with "port is already allocated".
options: >- options: >-
--health-cmd "pg_isready -U postgres" --health-cmd pg_isready
--health-interval 10s --health-interval 10s
--health-timeout 5s --health-timeout 5s
--health-retries 5 --health-retries 5
@@ -161,12 +156,6 @@ jobs:
PLAYWRIGHT_SECRET_KEY: ci-playwright-secret-key PLAYWRIGHT_SECRET_KEY: ci-playwright-secret-key
PLAYWRIGHT_TEST_EMAIL: teamadmin@resolutionflow.example.com PLAYWRIGHT_TEST_EMAIL: teamadmin@resolutionflow.example.com
PLAYWRIGHT_TEST_PASSWORD: TestPass123! PLAYWRIGHT_TEST_PASSWORD: TestPass123!
# AI-touching endpoints (POST /ai-sessions, /chat, /respond, etc.) are
# gated by `_require_ai_enabled()`, which returns 503 when no provider
# key is set. Tests that exercise those flows stub the AI calls in the
# browser via `page.route`, so the backend never actually contacts
# Anthropic — but the gate still has to pass. A stub value is enough.
ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v4
@@ -193,13 +182,11 @@ jobs:
- name: Install frontend dependencies - name: Install frontend dependencies
run: cd frontend && npm ci run: cd frontend && npm ci
- name: Build frontend - name: Download frontend build
# Building inline (instead of downloading an artifact from the uses: actions/download-artifact@v3
# frontend job) drops the cross-job dependency, so e2e can start with:
# immediately on a free runner. Adds ~1-2 min of build time, but name: frontend-dist
# eliminates the artifact-upload mechanism entirely (no more path: frontend/dist
# v3/v4 GHES headaches) and saves ~5 min of waiting.
run: cd frontend && NODE_OPTIONS="--max-old-space-size=4096" VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}" npm run build
- name: Install Playwright browser - name: Install Playwright browser
run: cd frontend && npx playwright install --with-deps chromium run: cd frontend && npx playwright install --with-deps chromium


@@ -1,60 +0,0 @@
"""add applied_pending status + pending_reason to session_suggested_fixes
Adds the `applied_pending` non-terminal status (engineer ran the fix but
verification is deferred — waiting on client, async sync, etc) alongside
the existing `applied_partial` status. Mirrors partial_notes with a new
pending_reason column for the "what are you waiting on?" prose.
Revision ID: c0f3a4b7e91d
Revises: 71efd2102f49
Create Date: 2026-04-30
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
revision: str = "c0f3a4b7e91d"
down_revision: Union[str, None] = "71efd2102f49"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column(
"session_suggested_fixes",
sa.Column("pending_reason", sa.Text(), nullable=True),
)
op.drop_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
type_="check",
)
op.create_check_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
"status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'applied_pending', 'dismissed')",
)
def downgrade() -> None:
op.execute(
"UPDATE session_suggested_fixes "
"SET status = 'applied_partial', "
" partial_notes = COALESCE(partial_notes, pending_reason) "
"WHERE status = 'applied_pending'"
)
op.drop_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
type_="check",
)
op.create_check_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
"status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'dismissed')",
)
op.drop_column("session_suggested_fixes", "pending_reason")


@@ -318,11 +318,6 @@ async def patch_suggested_fix_outcome(
status_code=status.HTTP_400_BAD_REQUEST, status_code=status.HTTP_400_BAD_REQUEST,
detail="notes are required when outcome is applied_partial", detail="notes are required when outcome is applied_partial",
) )
if body.outcome == "applied_pending" and not (body.notes and body.notes.strip()):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="notes are required when outcome is applied_pending",
)
TERMINAL = {"applied_success", "applied_failed", "dismissed"} TERMINAL = {"applied_success", "applied_failed", "dismissed"}
if fix.status in TERMINAL: if fix.status in TERMINAL:
@@ -334,10 +329,6 @@ async def patch_suggested_fix_outcome(
fix.status = body.outcome fix.status = body.outcome
if body.outcome == "applied_partial": if body.outcome == "applied_partial":
fix.partial_notes = (body.notes or "").strip() or None fix.partial_notes = (body.notes or "").strip() or None
elif body.outcome == "applied_pending":
# Pending is parked, not terminal — keep applied_at, do NOT stamp
# verified_at. Reason explains what the engineer is waiting on.
fix.pending_reason = (body.notes or "").strip() or None
elif body.outcome == "applied_failed": elif body.outcome == "applied_failed":
fix.failure_reason = (body.notes or "").strip() or None fix.failure_reason = (body.notes or "").strip() or None
fix.verified_at = now fix.verified_at = now


@@ -37,7 +37,7 @@ class SessionSuggestedFix(Base):
), ),
CheckConstraint( CheckConstraint(
"status IN ('proposed', 'applied_success', 'applied_failed', " "status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'applied_pending', 'dismissed')", "'applied_partial', 'dismissed')",
name="ck_session_suggested_fixes_status", name="ck_session_suggested_fixes_status",
), ),
) )
@@ -81,7 +81,6 @@ class SessionSuggestedFix(Base):
DateTime(timezone=True), nullable=True DateTime(timezone=True), nullable=True
) )
partial_notes: Mapped[str | None] = mapped_column(Text, nullable=True) partial_notes: Mapped[str | None] = mapped_column(Text, nullable=True)
pending_reason: Mapped[str | None] = mapped_column(Text, nullable=True)
failure_reason: Mapped[str | None] = mapped_column(Text, nullable=True) failure_reason: Mapped[str | None] = mapped_column(Text, nullable=True)
ai_outcome_proposal: Mapped[dict[str, Any] | None] = mapped_column( ai_outcome_proposal: Mapped[dict[str, Any] | None] = mapped_column(
JSONB, nullable=True JSONB, nullable=True


@@ -20,7 +20,6 @@ FixStatus = Literal[
"applied_success", "applied_success",
"applied_failed", "applied_failed",
"applied_partial", "applied_partial",
"applied_pending",
"dismissed", "dismissed",
] ]
@@ -41,7 +40,6 @@ class SessionSuggestedFixResponse(BaseModel):
applied_at: datetime | None applied_at: datetime | None
verified_at: datetime | None verified_at: datetime | None
partial_notes: str | None partial_notes: str | None
pending_reason: str | None
failure_reason: str | None failure_reason: str | None
ai_outcome_proposal: dict[str, Any] | None ai_outcome_proposal: dict[str, Any] | None
@@ -93,11 +91,7 @@ class SessionSuggestedFixDecisionResponse(BaseModel):
# Subset of FixStatus that the engineer can set via the outcome endpoint — # Subset of FixStatus that the engineer can set via the outcome endpoint —
# `proposed` is excluded because you can't un-decide a fix back to "proposed". # `proposed` is excluded because you can't un-decide a fix back to "proposed".
FixOutcome = Literal[ FixOutcome = Literal[
"applied_success", "applied_success", "applied_failed", "applied_partial", "dismissed"
"applied_failed",
"applied_partial",
"applied_pending",
"dismissed",
] ]
@@ -109,18 +103,14 @@ class SessionSuggestedFixOutcomeRequest(BaseModel):
engineer took); outcome captures whether the fix actually worked. engineer took); outcome captures whether the fix actually worked.
Allowed transitions: Allowed transitions:
- from `proposed`, `applied_partial`, or `applied_pending`: any outcome - from `proposed` or `applied_partial`: any outcome is valid
is valid. Partial means "did some of it"; pending means "did all of (partial is parked, not terminal — the engineer may update notes,
it but verification is deferred (waiting on client, async sync, etc)". abandon via dismiss, or advance to success/failed)
Both are parked, not terminal — the engineer may advance them to
success/failed/dismiss.
- from any terminal outcome (`applied_success`, `applied_failed`, - from any terminal outcome (`applied_success`, `applied_failed`,
`dismissed`): server returns 409 `dismissed`): server returns 409
""" """
outcome: FixOutcome outcome: FixOutcome
# Required for applied_partial AND applied_pending; optional for # Required for applied_partial, optional for applied_failed, ignored otherwise.
# applied_failed; ignored otherwise. For pending, this is the
# "what are you waiting on?" reason (e.g. "client power-cycling router").
notes: str | None = Field(None, max_length=500) notes: str | None = Field(None, max_length=500)
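
The transition rules in the docstring above can be sketched as a small pure function. This is an illustration only: `transition_status_code` is a hypothetical name, and the 400 for missing notes on `applied_partial` is an assumption carried over from how the removed `applied_pending` test behaved.

```python
from typing import Optional

# Terminal statuses: once reached, the outcome endpoint answers 409.
TERMINAL = frozenset({"applied_success", "applied_failed", "dismissed"})


def transition_status_code(
    current: str, outcome: str, notes: Optional[str] = None
) -> int:
    """Return the HTTP status the outcome endpoint would be expected to give."""
    if current in TERMINAL:
        return 409  # cannot change a terminal outcome
    if outcome == "applied_partial" and not (notes or "").strip():
        return 400  # assumed: partial requires non-empty notes
    return 200  # from proposed or applied_partial, any outcome is valid
```

Because `applied_partial` is parked rather than terminal, a fix can move from partial to partial (updating its notes) as well as on to success, failed, or dismissed.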


@@ -63,9 +63,6 @@ the active suggested fix, as given in the input bundle under "Outcome status":>
provided. State that it did not resolve the issue. provided. State that it did not resolve the issue.
- applied_partial: Include the fix as a partially tried path. Include partial \ - applied_partial: Include the fix as a partially tried path. Include partial \
notes if provided. Indicate it was not fully completed or not verified. notes if provided. Indicate it was not fully completed or not verified.
- applied_pending: List the fix as applied but awaiting verification. Include \
the pending reason if provided (e.g. "client power-cycling router"). Make it \
clear the next engineer should follow up to confirm it worked.
- applied_success: Note that the fix was applied and verified but escalation \ - applied_success: Note that the fix was applied and verified but escalation \
is still needed for another reason (unusual — reflect this accurately). is still needed for another reason (unusual — reflect this accurately).
- dismissed: Do not mention the fix as a tried path; it was only considered. - dismissed: Do not mention the fix as a tried path; it was only considered.
@@ -83,8 +80,6 @@ symptoms are still being narrowed."
- applied_failed or dismissed: Say the proposed fix did not hold or was set \ - applied_failed or dismissed: Say the proposed fix did not hold or was set \
aside. State any remaining uncertainty. aside. State any remaining uncertainty.
- applied_partial: Note the partial application and what remains open. - applied_partial: Note the partial application and what remains open.
- applied_pending: Note that the fix is in place but unverified. Reference the \
pending reason. Frame this as the leading hypothesis pending confirmation.
- applied_success: Unusual in an escalate path — state the fix resolved the \ - applied_success: Unusual in an escalate path — state the fix resolved the \
original symptom but a new or related issue requires escalation. original symptom but a new or related issue requires escalation.
@@ -97,8 +92,6 @@ accordingly — e.g. suggest alternatives or deeper investigation paths, \
drawing on the failure reason if provided. \ drawing on the failure reason if provided. \
If the fix is partially applied (applied_partial), the first step is typically \ If the fix is partially applied (applied_partial), the first step is typically \
to complete or verify it. \ to complete or verify it. \
If the fix is pending verification (applied_pending), the first step is \
typically to confirm whether the fix held — reference what was being waited on. \
If the fix is still proposed (no outcome), the first step is to try it if \ If the fix is still proposed (no outcome), the first step is to try it if \
confidence is high (>80%).> confidence is high (>80%).>
@@ -306,8 +299,6 @@ class EscalationPackageGeneratorService:
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}") lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
if active_fix.partial_notes: if active_fix.partial_notes:
lines.append(f"Partial notes: {active_fix.partial_notes}") lines.append(f"Partial notes: {active_fix.partial_notes}")
if active_fix.pending_reason:
lines.append(f"Pending reason: {active_fix.pending_reason}")
if active_fix.failure_reason: if active_fix.failure_reason:
lines.append(f"Failure reason: {active_fix.failure_reason}") lines.append(f"Failure reason: {active_fix.failure_reason}")
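
The same run of `if active_fix.<field>: lines.append(...)` checks appears in both generator services, and removing a status means deleting one branch in each. A table-driven variant is sketched below as one possible alternative. The function and table names are hypothetical, and only the two label strings are taken from the diff.

```python
from types import SimpleNamespace
from typing import List

# (label, attribute) pairs for the optional outcome fields that remain.
OPTIONAL_OUTCOME_FIELDS = (
    ("Partial notes", "partial_notes"),
    ("Failure reason", "failure_reason"),
)


def optional_outcome_lines(active_fix) -> List[str]:
    """Emit one 'Label: value' line per optional field that is set."""
    lines = []
    for label, attr in OPTIONAL_OUTCOME_FIELDS:
        value = getattr(active_fix, attr, None)
        if value:
            lines.append(f"{label}: {value}")
    return lines


# Usage with a stand-in object; the real argument would be the ORM row.
example = SimpleNamespace(partial_notes="cleared cache only", failure_reason=None)
example_lines = optional_outcome_lines(example)
```

Dropping a field then becomes a one-line change to the table instead of an edit in each service.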


@@ -83,10 +83,6 @@ state means the engineer resolved the issue another way; the note should cover \
that actual resolution, not just the failed attempt. that actual resolution, not just the failed attempt.
- applied_partial: Note that the fix was partially applied. If partial_notes \ - applied_partial: Note that the fix was partially applied. If partial_notes \
are provided, include them. Then describe the final resolution path taken. are provided, include them. Then describe the final resolution path taken.
- applied_pending: Note that the fix was applied and verification is pending. \
If pending_reason is provided, include it (e.g. "awaiting client power-cycle"). \
Frame the resolution as provisional — the fix is in place but not yet \
confirmed. Do not write closure language.
- dismissed: Treat the fix as considered and set aside. Do not center the note \ - dismissed: Treat the fix as considered and set aside. Do not center the note \
on it. Describe the resolution based on what was actually confirmed and done. on it. Describe the resolution based on what was actually confirmed and done.
- proposed (no outcome yet): Write "Resolution not yet applied — fix proposed: \ - proposed (no outcome yet): Write "Resolution not yet applied — fix proposed: \
@@ -326,8 +322,6 @@ class ResolutionNoteGeneratorService:
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}") lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
if active_fix.partial_notes: if active_fix.partial_notes:
lines.append(f"Partial notes: {active_fix.partial_notes}") lines.append(f"Partial notes: {active_fix.partial_notes}")
if active_fix.pending_reason:
lines.append(f"Pending reason: {active_fix.pending_reason}")
if active_fix.failure_reason: if active_fix.failure_reason:
lines.append(f"Failure reason: {active_fix.failure_reason}") lines.append(f"Failure reason: {active_fix.failure_reason}")


@@ -193,95 +193,6 @@ async def test_applied_at_auto_stamped_on_first_outcome(
assert body["verified_at"] is not None assert body["verified_at"] is not None
@pytest.mark.asyncio
async def test_pending_requires_notes(
client: AsyncClient, test_user, auth_headers, test_db
):
"""applied_pending requires notes (the "what are you waiting on?" reason)."""
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
r = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
headers=auth_headers,
json={"outcome": "applied_pending"},
)
assert r.status_code == 400
assert "notes" in r.text.lower()
@pytest.mark.asyncio
async def test_pending_stores_reason_and_stamps_applied_at(
client: AsyncClient, test_user, auth_headers, test_db
):
"""applied_pending stores notes under pending_reason and stamps applied_at
but NOT verified_at — the fix is parked, not verified."""
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
r = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
headers=auth_headers,
json={"outcome": "applied_pending", "notes": "client power-cycling router"},
)
assert r.status_code == 200, r.text
body = r.json()
assert body["status"] == "applied_pending"
assert body["pending_reason"] == "client power-cycling router"
assert body["applied_at"] is not None
assert body["verified_at"] is None
assert body["partial_notes"] is None
assert body["failure_reason"] is None
@pytest.mark.asyncio
async def test_pending_to_success_allowed(
client: AsyncClient, test_user, auth_headers, test_db
):
"""pending is non-terminal — engineer can advance to success once verified."""
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
r1 = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
headers=auth_headers,
json={"outcome": "applied_pending", "notes": "waiting on AD replication"},
)
assert r1.status_code == 200
r2 = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
headers=auth_headers,
json={"outcome": "applied_success"},
)
assert r2.status_code == 200
body = r2.json()
assert body["status"] == "applied_success"
assert body["verified_at"] is not None
# pending_reason is preserved as audit trail
assert body["pending_reason"] == "waiting on AD replication"
@pytest.mark.asyncio
async def test_pending_reason_can_be_updated(
client: AsyncClient, test_user, auth_headers, test_db
):
"""pending→pending with new notes updates the stored pending_reason."""
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
r1 = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
json={"outcome": "applied_pending", "notes": "waiting on AD replication"},
headers=auth_headers,
)
assert r1.status_code == 200
assert r1.json()["pending_reason"] == "waiting on AD replication"
r2 = await client.patch(
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
json={"outcome": "applied_pending", "notes": "now waiting on client to confirm login"},
headers=auth_headers,
)
assert r2.status_code == 200
assert r2.json()["pending_reason"] == "now waiting on client to confirm login"
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_failed_outcome_stores_notes_as_failure_reason( async def test_failed_outcome_stores_notes_as_failure_reason(
client: AsyncClient, test_user, auth_headers, test_db client: AsyncClient, test_user, auth_headers, test_db


@@ -1,111 +0,0 @@
import { expect, test } from '@playwright/test'
/**
* Regression test for the prefill-handoff `currentChatRef` bug.
*
* Symptom: a chat session created via the dashboard prefill flow
* looked fine on the first AI turn, but submitting partial answers
* from the task lane silently dropped the AI's follow-up response.
* The user saw their answers in the chat, no assistant reply, no
* toast.
*
* Root cause: the prefill effect in `AssistantChatPage` set
* `activeChatId` without also updating `currentChatRef.current`, so
* the `currentChatRef.current !== sentForChatId` guard in
* `handleTaskSubmit` (and `handleSend`) tripped on every subsequent
* request and discarded the AI response.
*
* Strategy: drive the real prefill flow against the real backend, but
* intercept the `/chat` endpoint with `page.route` so we get
* deterministic question payloads on turn 1 and a deterministic
* follow-up on turn 2. The fix is what makes turn 2 visible.
*/
test.describe('AssistantChatPage — prefill handoff regression', () => {
test('AI follow-up renders after submitting partial task lane answers', async ({ page }) => {
let chatCallCount = 0
// Clear any persisted active-chat-id so the page does not auto-resume a
// stale session left behind by a sibling spec.
await page.addInitScript(() => {
try {
sessionStorage.removeItem('rf-active-chat-id')
sessionStorage.removeItem('rf-tasklane-meta')
} catch { /* ignore */ }
})
// Intercept only the chat endpoint. Session creation, listSessions,
// facts, suggested-fixes, etc. all hit the real backend so the page
// renders normally — only the LLM call is deterministic. The pattern
// matches `/ai-sessions/<uuid>/chat` and nothing nested beneath it.
await page.route(/\/api\/v1\/ai-sessions\/[^/]+\/chat$/, async (route) => {
if (route.request().method() !== 'POST') {
await route.fallback()
return
}
chatCallCount += 1
if (chatCallCount === 1) {
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
content: 'Initial diagnostic plan. Please answer the questions in the task lane.',
suggested_flows: [],
fork: null,
actions: [],
questions: [
{ text: 'Has the user recently changed their password?' },
{ text: 'Is the lockout happening at a consistent time of day?' },
],
}),
})
return
}
await route.fulfill({
status: 200,
contentType: 'application/json',
body: JSON.stringify({
content: 'Got it — based on your answer, here is what to check next.',
suggested_flows: [],
fork: null,
actions: [],
questions: [],
}),
})
})
// Drive the prefill flow exactly the way the dashboard does. The textarea
// is keyed by its placeholder copy on QuickStartPage.
await page.goto('/')
const prefillBox = page.getByPlaceholder(/Describe the issue/i)
await expect(prefillBox).toBeVisible({ timeout: 10_000 })
await prefillBox.fill('User locked out of AD weekly')
await prefillBox.press('Enter')
// After the prefill submits we land on /pilot and the first stubbed AI
// turn surfaces the task-lane question text.
await expect(page).toHaveURL(/\/pilot/)
await expect(
page.getByText('Has the user recently changed their password?'),
).toBeVisible({ timeout: 15_000 })
// Answer the first question. UI flow: click "Answer" to open the
// textarea, type, click the inline "Answer" button to mark done.
await page.getByRole('button', { name: /^Answer$/ }).first().click()
await page.getByPlaceholder('Type your answer...').fill('No, password is months old')
await page.getByRole('button', { name: /^Answer$/ }).first().click()
// Submit the partial response. Pre-fix: the response was silently dropped
// here because `currentChatRef.current` still held the mount-time value.
await page.getByRole('button', { name: /Send 1 of 2 Responses/ }).click()
// Bug repro: the assistant message must render. Pre-fix this assertion
// fails because `handleTaskSubmit` early-returns at the
// `currentChatRef.current !== sentForChatId` guard.
await expect(
page.getByText('Got it — based on your answer, here is what to check next.'),
).toBeVisible({ timeout: 15_000 })
// Both chat calls must have actually happened.
expect(chatCallCount).toBe(2)
})
})


@@ -88,8 +88,6 @@ test.describe('command palette smoke tests', () => {
await flowpilotOption.click() await flowpilotOption.click()
// Phase 1 of the FlowPilot migration renamed /assistant to /pilot. await expect(page).toHaveURL(/\/assistant/)
// /assistant still 301-redirects to /pilot, so accept either landing URL.
await expect(page).toHaveURL(/\/(pilot|assistant)/)
}) })
}) })


@@ -24,21 +24,13 @@ test.describe('session history smoke tests', () => {
await page.goto('/sessions') await page.goto('/sessions')
await expect( await expect(
page.getByRole('heading', { name: 'Session History', exact: true }), page.getByRole('heading', { name: 'Sessions', exact: true }),
).toBeVisible() ).toBeVisible()
// Default tab on /sessions is "AI Sessions"; flow sessions live behind
// the "Flow Sessions" tab and only that tab exposes ticket/client filters.
await page.getByRole('button', { name: 'Flow Sessions' }).click()
await page.getByPlaceholder('Search by ticket number...').fill(ticketNumber) await page.getByPlaceholder('Search by ticket number...').fill(ticketNumber)
await page.getByPlaceholder('Search by client name...').fill(clientName) await page.getByPlaceholder('Search by client name...').fill(clientName)
const sessionCard = page const sessionCard = page.locator('.bg-card').filter({ hasText: ticketNumber }).filter({ hasText: clientName }).first()
.getByTestId('flow-session-card')
.filter({ hasText: ticketNumber })
.filter({ hasText: clientName })
.first()
await expect(sessionCard).toBeVisible() await expect(sessionCard).toBeVisible()
await expect(sessionCard.getByText(tree.name)).toBeVisible() await expect(sessionCard.getByText(tree.name)).toBeVisible()


@@ -24,7 +24,7 @@ test.describe('flow library start-session smoke tests', () => {
await page.getByPlaceholder('Search flows...').fill(tree.name) await page.getByPlaceholder('Search flows...').fill(tree.name)
await page.getByRole('button', { name: 'Search', exact: true }).click() await page.getByRole('button', { name: 'Search', exact: true }).click()
const treeCard = page.getByTestId('tree-card').filter({ hasText: tree.name }).first() const treeCard = page.locator('.bg-card').filter({ hasText: tree.name }).first()
await expect(treeCard).toBeVisible() await expect(treeCard).toBeVisible()
await treeCard.getByRole('button', { name: /^Start(?: Session)?$/ }).click() await treeCard.getByRole('button', { name: /^Start(?: Session)?$/ }).click()


@@ -20,7 +20,7 @@ test.describe('flow library smoke tests', () => {
await page.getByPlaceholder('Search flows...').fill(tree.name) await page.getByPlaceholder('Search flows...').fill(tree.name)
await page.getByRole('button', { name: 'Search', exact: true }).click() await page.getByRole('button', { name: 'Search', exact: true }).click()
await expect(page.getByTestId('tree-card').filter({ hasText: tree.name }).first()).toBeVisible() await expect(page.getByText(tree.name)).toBeVisible()
} finally { } finally {
await disposeApiContext(api) await disposeApiContext(api)
} }


@@ -14,7 +14,7 @@ test.describe('authenticated navigation smoke tests', () => {
await page.goto('/sessions') await page.goto('/sessions')
await expect( await expect(
page.getByRole('heading', { name: 'Session History', exact: true }), page.getByRole('heading', { name: 'Sessions', exact: true }),
).toBeVisible() ).toBeVisible()
}) })
@@ -30,7 +30,7 @@ test.describe('authenticated navigation smoke tests', () => {
await page.goto('/account') await page.goto('/account')
await expect( await expect(
page.getByRole('heading', { name: 'Account Management' }), page.getByRole('heading', { name: 'Account Settings' }),
).toBeVisible() ).toBeVisible()
}) })
}) })


@@ -18,17 +18,9 @@ test.describe('session resume smoke tests', () => {
}) })
try { try {
// Resume flow moved off /trees onto the Flow Sessions tab of /sessions await page.goto('/trees')
// during the FlowPilot migration. The destination (/trees/:id/navigate)
// is unchanged — only the entry point shifted.
await page.goto('/sessions')
await expect(
page.getByRole('heading', { name: 'Session History', exact: true }),
).toBeVisible()
await page.getByRole('button', { name: 'Flow Sessions' }).click()
// Active sub-tab is the default and surfaces in-progress sessions.
const resumeCard = page.getByTestId('flow-session-card').filter({ hasText: tree.name }).first() const resumeCard = page.locator('.bg-card').filter({ hasText: tree.name }).filter({ hasText: 'Resume' }).first()
await expect(resumeCard).toBeVisible() await expect(resumeCard).toBeVisible()
await resumeCard.getByRole('button', { name: 'Resume' }).first().click() await resumeCard.getByRole('button', { name: 'Resume' }).first().click()


@@ -31,7 +31,7 @@ test.describe('shared session management smoke tests', () => {
).toBeVisible() ).toBeVisible()
await expect(page.getByText(share.share_name || '')).toBeVisible() await expect(page.getByText(share.share_name || '')).toBeVisible()
const shareCard = page.getByTestId('share-card').filter({ hasText: share.share_name || '' }).first() const shareCard = page.locator('.bg-card').filter({ hasText: share.share_name || '' }).first()
await shareCard.getByRole('button', { name: 'Revoke' }).click() await shareCard.getByRole('button', { name: 'Revoke' }).click()
const confirmDialog = page.getByRole('dialog', { name: 'Revoke Share Link' }) const confirmDialog = page.getByRole('dialog', { name: 'Revoke Share Link' })


@@ -13,14 +13,12 @@ export type FixStatus =
| 'applied_success' | 'applied_success'
| 'applied_failed' | 'applied_failed'
| 'applied_partial' | 'applied_partial'
| 'applied_pending'
| 'dismissed' | 'dismissed'
export type FixOutcome = export type FixOutcome =
| 'applied_success' | 'applied_success'
| 'applied_failed' | 'applied_failed'
| 'applied_partial' | 'applied_partial'
| 'applied_pending'
| 'dismissed' | 'dismissed'
export interface AIOutcomeProposal { export interface AIOutcomeProposal {
@@ -43,7 +41,6 @@ export interface SessionSuggestedFix {
applied_at: string | null applied_at: string | null
verified_at: string | null verified_at: string | null
partial_notes: string | null partial_notes: string | null
pending_reason: string | null
failure_reason: string | null failure_reason: string | null
ai_outcome_proposal: AIOutcomeProposal | null ai_outcome_proposal: AIOutcomeProposal | null
superseded_at: string | null superseded_at: string | null
@@ -129,12 +126,11 @@ export const sessionSuggestedFixesApi = {
/** /**
* Record the outcome of applying a suggested fix. Transition rules: * Record the outcome of applying a suggested fix. Transition rules:
* - from `proposed`, `applied_partial`, or `applied_pending`: any outcome * - from `proposed` or `applied_partial`: any outcome is valid (partial is
* is valid. Partial = "did some of it"; pending = "did all of it but * parked, not terminal — engineer may update notes, abandon via dismiss,
* verification is deferred". Both are parked, not terminal. * or advance to success/failed).
* - from a terminal status (`applied_success`, `applied_failed`, `dismissed`): * - from a terminal status (`applied_success`, `applied_failed`, `dismissed`):
* server returns 409. * server returns 409.
* - `applied_pending` requires `notes` (the "what are you waiting on?" reason).
*/ */
async patchOutcome( async patchOutcome(
sessionId: string, sessionId: string,


@@ -34,8 +34,6 @@ export function TreeGridView({
{trees.map((tree) => ( {trees.map((tree) => (
<div <div
key={tree.id} key={tree.id}
data-testid="tree-card"
data-tree-id={tree.id}
className="relative bg-card border border-border rounded-2xl p-4 transition-all hover:-translate-y-0.5 hover:border-primary/30 hover:shadow-md sm:p-6" className="relative bg-card border border-border rounded-2xl p-4 transition-all hover:-translate-y-0.5 hover:border-primary/30 hover:shadow-md sm:p-6"
> >
<div className="mb-2 flex items-start justify-between gap-2"> <div className="mb-2 flex items-start justify-between gap-2">


@@ -33,8 +33,6 @@ export function TreeListView({
{trees.map((tree) => ( {trees.map((tree) => (
<div <div
key={tree.id} key={tree.id}
data-testid="tree-card"
data-tree-id={tree.id}
className="flex items-center gap-4 bg-card border border-border rounded-2xl p-4 transition-all hover:border-primary/30 hover:shadow-xs" className="flex items-center gap-4 bg-card border border-border rounded-2xl p-4 transition-all hover:border-primary/30 hover:shadow-xs"
> >
{/* Left: Name and Description */} {/* Left: Name and Description */}


@@ -10,7 +10,7 @@
* + 07-verify-states.html. * + 07-verify-states.html.
*/ */
import { useState } from 'react' import { useState } from 'react'
import { Sparkles, Check, ChevronDown, X, MoreHorizontal, Info, Clock3 } from 'lucide-react' import { Sparkles, Check, ChevronDown, X, MoreHorizontal, Info } from 'lucide-react'
import { cn } from '@/lib/utils' import { cn } from '@/lib/utils'
import type { import type {
SessionSuggestedFix, SessionSuggestedFix,
@@ -21,7 +21,6 @@ export type BannerMode =
| 'proposed' // AI just proposed; engineer hasn't applied yet | 'proposed' // AI just proposed; engineer hasn't applied yet
| 'verifying' // Engineer clicked Apply; awaiting outcome | 'verifying' // Engineer clicked Apply; awaiting outcome
| 'partial' // Applied partially; awaiting finish or terminal outcome | 'partial' // Applied partially; awaiting finish or terminal outcome
| 'pending' // Applied fully; verification deferred (waiting on client, etc)
| 'ai_confirming' // AI emitted [FIX_OUTCOME]; engineer confirms | 'ai_confirming' // AI emitted [FIX_OUTCOME]; engineer confirms
| 'nudge' // Compact nudge shown after N post-apply messages | 'nudge' // Compact nudge shown after N post-apply messages
@@ -46,7 +45,6 @@ export function ProposalBanner(props: ProposalBannerProps) {
case 'proposed': return <ProposedBanner {...props} /> case 'proposed': return <ProposedBanner {...props} />
case 'verifying': return <VerifyingBanner {...props} /> case 'verifying': return <VerifyingBanner {...props} />
case 'partial': return <PartialBanner {...props} /> case 'partial': return <PartialBanner {...props} />
case 'pending': return <PendingBanner {...props} />
case 'ai_confirming': return <AIConfirmingBanner {...props} /> case 'ai_confirming': return <AIConfirmingBanner {...props} />
case 'nudge': return <NudgeBanner {...props} /> case 'nudge': return <NudgeBanner {...props} />
} }
@@ -150,7 +148,7 @@ function VerifyingBanner({ fix, onOutcome }: ProposalBannerProps) {
</button> </button>
{showOverflow && ( {showOverflow && (
<div className={cn( <div className={cn(
'absolute top-full right-0 mt-1 w-56 rounded-lg', 'absolute top-full right-0 mt-1 w-48 rounded-lg',
'border border-white/10 bg-card shadow-xl py-1 z-10', 'border border-white/10 bg-card shadow-xl py-1 z-10',
)}> )}>
<button <button
@@ -163,17 +161,6 @@ function VerifyingBanner({ fix, onOutcome }: ProposalBannerProps) {
> >
Mark partial Mark partial
</button> </button>
<button
onClick={() => {
setShowOverflow(false)
const reason = window.prompt('What are you waiting on? (e.g. "client power-cycling router")')
if (reason && reason.trim()) onOutcome('applied_pending', reason.trim())
}}
className="w-full text-left px-3 py-2 text-[12.5px] hover:bg-elevated text-primary inline-flex items-center gap-2"
>
<Clock3 size={12} className="text-info" />
Waiting to verify
</button>
</div> </div>
)} )}
<button <button
@@ -260,66 +247,6 @@ function PartialBanner({ fix, onOutcome, onApply }: ProposalBannerProps) {
) )
} }
function PendingBanner({ fix, onOutcome }: ProposalBannerProps) {
return (
<div className="relative border-t border-info/30 bg-gradient-to-b from-info-dim/40 to-info-dim/20 px-5 py-3 animate-slide-up">
<div className="absolute left-0 top-0 bottom-0 w-[3px] bg-info" />
<div className="flex items-start gap-3">
<div className="shrink-0 mt-0.5 w-7 h-7 rounded-md border border-info/30 bg-info-dim flex items-center justify-center text-info">
<Clock3 size={15} />
</div>
<div className="flex-1 min-w-0">
<div className="flex items-center gap-2 font-heading text-[10px] font-semibold uppercase tracking-[1.2px] text-info">
<span>Awaiting verification</span>
<span className="px-2 py-[2px] rounded-full bg-info/20 text-info text-[10.5px] font-bold normal-case tracking-normal">
Parked
</span>
</div>
<div className="mt-0.5 text-[14px] font-semibold text-heading leading-snug">
{fix.title}
</div>
{fix.pending_reason && (
<div className="mt-1.5 flex items-center gap-2 px-2.5 py-1.5 rounded-md bg-info/[0.08] border border-info/30 text-[12px] italic text-primary">
<span className="not-italic font-bold text-info text-[10.5px] uppercase tracking-[0.6px]">Waiting on</span>
<span>{fix.pending_reason}</span>
</div>
)}
</div>
<div className="flex items-center gap-2 shrink-0 pt-0.5">
<button
onClick={() => {
const reason = window.prompt(
'Update what you\'re waiting on:',
fix.pending_reason ?? '',
)
if (reason && reason.trim()) onOutcome('applied_pending', reason.trim())
}}
className="px-3 py-[9px] rounded-lg text-muted-foreground text-[12.5px] hover:bg-white/[0.08] hover:text-primary"
>
Update reason
</button>
<button
onClick={() => {
const reason = window.prompt("Why didn't it work? (optional)")
onOutcome('applied_failed', reason?.trim() || undefined)
}}
className="px-3 py-[9px] rounded-lg border border-danger/30 text-danger text-[12.5px] font-medium hover:bg-danger-dim hover:border-danger"
>
Didn't work
</button>
<button
onClick={() => onOutcome('applied_success')}
className="px-3 py-[9px] rounded-lg bg-success text-[#0a1a12] font-semibold text-[12.5px] hover:brightness-110 inline-flex items-center gap-1.5"
>
<Check size={12} strokeWidth={2.5} />
It worked
</button>
</div>
</div>
</div>
)
}
function AIConfirmingBanner({ fix, onAcceptAIProposal, onRejectAIProposal }: ProposalBannerProps) { function AIConfirmingBanner({ fix, onAcceptAIProposal, onRejectAIProposal }: ProposalBannerProps) {
const proposal = fix.ai_outcome_proposal const proposal = fix.ai_outcome_proposal
if (!proposal) return null if (!proposal) return null
@@ -391,19 +318,9 @@ function NudgeBanner({ fix, onOutcome, onSilenceNudge }: ProposalBannerProps) {
Did <strong className="text-heading">"{fix.title}"</strong> work? Did <strong className="text-heading">"{fix.title}"</strong> work?
</span> </span>
<button <button
onClick={() => { onClick={onSilenceNudge}
const reason = window.prompt( className="px-2.5 py-1 rounded text-[12px] text-muted-foreground hover:bg-white/[0.08] hover:text-primary"
'What are you waiting on? (e.g. "client power-cycling router")',
)
if (reason && reason.trim()) {
onOutcome('applied_pending', reason.trim())
} else {
onSilenceNudge()
}
}}
className="px-2.5 py-1 rounded text-[12px] text-muted-foreground hover:bg-white/[0.08] hover:text-primary inline-flex items-center gap-1"
> >
<Clock3 size={11} />
Still checking Still checking
</button> </button>
<button <button


@@ -159,7 +159,6 @@ export default function AssistantChatPage() {
if (activeFix.status === 'dismissed') return null if (activeFix.status === 'dismissed') return null
if (activeFix.ai_outcome_proposal) return 'ai_confirming' if (activeFix.ai_outcome_proposal) return 'ai_confirming'
if (activeFix.status === 'applied_partial') return 'partial' if (activeFix.status === 'applied_partial') return 'partial'
if (activeFix.status === 'applied_pending') return 'pending'
if (activeFix.status === 'applied_success' || activeFix.status === 'applied_failed') return null if (activeFix.status === 'applied_success' || activeFix.status === 'applied_failed') return null
if (activeFix.applied_at) { if (activeFix.applied_at) {
if (postApplyMsgCount >= 3 && !nudgeSilenced) return 'nudge' if (postApplyMsgCount >= 3 && !nudgeSilenced) return 'nudge'
@@ -256,12 +255,6 @@ export default function AssistantChatPage() {
} }
setChats(prev => [chatItem, ...prev]) setChats(prev => [chatItem, ...prev])
setActiveChatId(session.session_id) setActiveChatId(session.session_id)
// Keep the in-flight guard ref in sync. Without this, currentChatRef
// stays at its mount-time value (often a stale id from sessionStorage
// or null), so subsequent handleSend / handleTaskSubmit calls bail at
// their `currentChatRef.current !== sentForChatId` check and the AI
// response is silently dropped.
currentChatRef.current = session.session_id
setMessages([{ role: 'user', content: prefill }]) setMessages([{ role: 'user', content: prefill }])
setLoading(true) setLoading(true)


@@ -161,12 +161,7 @@ export default function MySharesPage() {
const isCopied = copiedId === share.id const isCopied = copiedId === share.id
return ( return (
<div <div key={share.id} className="bg-card border border-border rounded-xl p-5">
key={share.id}
data-testid="share-card"
data-share-id={share.id}
className="bg-card border border-border rounded-xl p-5"
>
{/* Top row: badge + name */} {/* Top row: badge + name */}
<div className="flex items-center gap-3 mb-3"> <div className="flex items-center gap-3 mb-3">
<span className="inline-flex items-center gap-1.5 text-xs rounded-full px-2 py-0.5 bg-accent text-muted-foreground"> <span className="inline-flex items-center gap-1.5 text-xs rounded-full px-2 py-0.5 bg-accent text-muted-foreground">


@@ -533,11 +533,7 @@ export default function SessionHistoryPage() {
)} )}
style={{ '--stagger-index': i } as React.CSSProperties} style={{ '--stagger-index': i } as React.CSSProperties}
> >
<div <div className="bg-card border border-border rounded-xl p-4 transition-all hover:border-[var(--color-border-hover)]">
data-testid="flow-session-card"
data-session-id={session.id}
className="bg-card border border-border rounded-xl p-4 transition-all hover:border-[var(--color-border-hover)]"
>
<div className="flex flex-col gap-3 sm:flex-row sm:items-start sm:justify-between"> <div className="flex flex-col gap-3 sm:flex-row sm:items-start sm:justify-between">
<div className="flex-1"> <div className="flex-1">
<div className="flex flex-wrap items-center gap-2"> <div className="flex flex-wrap items-center gap-2">