Merge PR #153: fix(chat): sync currentChatRef when prefill creates a new chat session

Fixes a silent-drop bug where the dashboard prefill flow created a new chat session but didn't update the in-flight guard ref, so subsequent task-lane submissions had their AI follow-up responses discarded. Includes a Playwright regression test that drives the prefill flow and stubs /ai-sessions/*/chat to verify the second AI turn renders. Also adds a stub ANTHROPIC_API_KEY to the e2e CI job so AI-gated endpoints clear their _require_ai_enabled() check (the chat call itself is intercepted in the browser, so no real Anthropic traffic).
2026-04-26 05:05:54 +00:00
parent 87bb20b8f0 11fe32f4c6
commit 68fcdc6122
7 changed files with 185 additions and 51 deletions
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -1,21 +1,28 @@
 # CURRENT_TASK.md

+**Task:** Land PR #153 — fix the AssistantChatPage prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane.
+
+**Status:** in-progress (CI running on rebased branch)
+
+**Definition of Done:**
+- [ ] PR #153 (`fix/tasklane-prefill-ref`) CI green on the rebased SHA `1559feb`. Backend, frontend, and e2e all pass.
+- [ ] PR #153 merged into `main`.
+- [ ] User-visible verification: from the dashboard, type a prefill, answer a subset of task-lane questions, click *Send N of M Responses* — AI follow-up renders.
+
+**Assumptions:**
+- Rebasing onto post-#150 main pulls in the workflow fixes (no host-port mapping for postgres, no upload-artifact step), so the prior CI failures on this PR are resolved structurally.
+- The new e2e regression spec (`frontend/e2e/assistant-chat-prefill.spec.ts`) doesn't depend on Anthropic — it uses `page.route` to stub `/ai-sessions/*/chat` deterministically, so it should be CI-stable.
+
+**Out of scope (deferred to TODO):**
+- Hardening the `currentChatRef.current !== sentForChatId` silent-return pattern across `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, `refreshPreview`. PR #153 fixes the specific symptom; the broader audit is its own task in `TODO.md`.
+- Promoting `CI / e2e (pull_request)` to required on `main`. Holding off until two consecutive green PR runs (PR #150 was one; PR #153's run will be the second if it passes).
+
+## Previous task — closed out
+
 **Task:** Land consolidated CI-recovery PR #150 and lock reliable CI gates on `main`.

-**Status:** in-progress
+**Status:** complete (2026-04-26).

-**Definition of Done:**
- [ ] PR #150 (`fix/ci-workflow-config`) merged. `CI / backend (pull_request)`, `CI / frontend (pull_request)`, and `CI / e2e (pull_request)` show success before merge.
- [ ] `CI / backend (pull_request)` added to required status checks on `main` in Gitea branch protection (frontend is already required).
- [ ] Optional: `CI / e2e (pull_request)` confirmed clean across at least one PR run and added to required checks.
-
-**Assumptions:**
- The 8-core homelab Gitea Actions runner can support `-n auto` (8 xdist workers). If memory pressure shows up in CI, drop to `-n 4`.
- pytest-cov's xdist support continues to handle the coverage merge and `--cov-fail-under=50` check correctly.
- The per-worker DB creation in `conftest.py` is idempotent and racing workers on first import won't all try to CREATE DATABASE simultaneously — postgres serializes that, but if it surfaces issues, wrap with an advisory lock.
-
-**Out of scope:**
- Frontend lint warnings (23 remain after #149).
- The 23 react-hooks/exhaustive-deps warnings.
- RLS test suite (gated behind `RUN_RLS_TESTS=1`; not in default CI).
- Per-test transactional rollback (would shave another 30-40% off backend time but is a much bigger refactor — capture in TODO if interested).
+- PR #150 merged as commit `87bb20b` on `main`. Backend, frontend, and e2e all green on the merge SHA.
+- `CI / backend (pull_request)` added to required status checks on `main` (alongside the pre-existing `CI / frontend (pull_request)` requirement).
+- `CI / e2e (pull_request)` left as not-required pending one more green PR run.
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,63 +2,54 @@

 # HANDOFF.md

-**Last updated:** 2026-04-25 16:41 EDT
+**Last updated:** 2026-04-26 03:50 EDT

-**Active task:** Land PR #150 (the consolidated CI-recovery PR), then enable backend and eventually e2e gates on `main`. See [CURRENT_TASK.md](CURRENT_TASK.md).
+**Active task:** Ship PR #153 — the AssistantChatPage prefill `currentChatRef` bug fix. See [CURRENT_TASK.md](CURRENT_TASK.md).

-**Branch:** `fix/ci-workflow-config` -> PR #150. PRs #151 and #152 were closed and consolidated into this branch.
+**Branch:** `fix/tasklane-prefill-ref` → PR #153.

 ## Current resume point

-Latest PR #150 CI had backend and frontend green, but `CI / e2e (pull_request)` failed on the resume smoke test.
+PR #150 is merged. Branch `fix/tasklane-prefill-ref` has been rebased onto the new `main` (HEAD `1559feb`) and force-pushed; that pulled in the workflow fixes from #150, so the prior backend port-conflict and frontend upload-artifact failures are gone structurally.

-The failure was not product behavior. Playwright was using:
+CI was kicked off automatically on the rebased SHA. Watch:

-```ts
-page.locator('.bg-card').filter({ hasText: tree.name }).first()
-```
+- `CI / backend (pull_request)` — should pass now that postgres uses the docker-network DNS name `postgres:5432` (no host port race).
+- `CI / frontend (pull_request)` — should pass now that the `actions/upload-artifact@v4` step is removed.
+- `CI / e2e (pull_request)` — runs on its own postgres + builds frontend inline. Includes the new `e2e/assistant-chat-prefill.spec.ts` regression test, which stubs `/ai-sessions/*/chat` with `page.route` and is independent of Anthropic.

-On the session history page this matched the tree filter `<select>` first because the select options contain the same flow name, then the test waited forever for a `Resume` button inside the select.
+If all three go green, merge #153 into `main`.

-This session fixed that properly by adding stable test IDs to repeated cards and moving e2e tests off `.bg-card` selectors:
+## What this PR fixes

- `flow-session-card` in `SessionHistoryPage.tsx`
- `tree-card` in `TreeGridView.tsx` and `TreeListView.tsx`
- `share-card` in `MySharesPage.tsx`
+Reported symptom: in a troubleshooting (chat) session, after answering a subset of the task-lane questions and clicking *Send N of M Responses*, no AI response appeared.

-The workflow was also hardened:
+Root cause: the dashboard prefill effect in `AssistantChatPage` set `activeChatId` after creating a new session but never updated `currentChatRef.current`. The `currentChatRef.current !== sentForChatId` guard inside `handleSend` and `handleTaskSubmit` then bailed silently on every later request and discarded the AI's reply.

- Postgres service healthchecks now run `pg_isready -U postgres` instead of checking as `root`.
- The e2e frontend build now bakes `VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}"`, matching the Playwright backend origin.
+Fix is a single line: assign `currentChatRef.current = session.session_id` immediately after `setActiveChatId(session.session_id)` in the prefill effect, mirroring `handleNewChat` and `handleResumeNew`.

 ## Verification completed

- `git diff --check`
- Confirmed no remaining `.bg-card` selectors in `frontend/e2e/*.ts`.
- `docker exec -w /app resolutionflow_frontend npm run build`
- Ran migrations and test-user seed in the dev backend container.
- Focused Playwright verification in an Actions-like Ubuntu container:
-  - First `e2e/resume.spec.ts` passed.
-  - Then `e2e/history.spec.ts e2e/library.spec.ts e2e/library-start.spec.ts e2e/resume.spec.ts e2e/shares.spec.ts --project=chromium --workers=1` passed: `6 passed (1.3m)`.
-
-## Immediate next steps
-
-1. Push the WIP commit from this session to PR #150.
-2. Watch PR #150 CI on the new SHA. Expected result: backend, frontend, and e2e all green.
-3. Merge PR #150 when green.
-4. Enable `CI / backend (pull_request)` as a required status check on `main`.
-5. After at least one reliable green PR run, consider adding `CI / e2e (pull_request)` as required too.
+- New regression test `frontend/e2e/assistant-chat-prefill.spec.ts` drives the real dashboard prefill flow against the real backend, stubs `/ai-sessions/*/chat` for deterministic turn-1/turn-2 responses, asserts the second AI message renders. Confirmed it fails on unfixed code at the exact assertion (`Got it — based on your answer…` never appears) and passes once the fix is restored.
+- `tsc -b` clean. No new lint errors. Adjacent specs (`flowpilot-chat`) still pass.

 ## Branch protection on main (current)

 - PR-only merges
 - `CI / frontend (pull_request)` required
+- `CI / backend (pull_request)` required (added during PR #150 close-out)
 - Force-push blocked
 - No review required (solo)

+## Immediate next steps
+
+1. Watch PR #153 CI on `1559feb`.
+2. Merge PR #153 when all three checks are green.
+3. After merge, decide whether to promote `CI / e2e (pull_request)` to required (would make this two consecutive green PR e2e runs in a row, the threshold from the prior CURRENT_TASK).
+4. Pick next item from `.ai/TODO.md` — top "Up next" is the `data-testid` audit; the `currentChatRef` silent-return follow-up is in Backlog and is a natural pairing with this fix.
+
 ## Useful breadcrumbs

- `.gitea/workflows/ci.yml` contains the parallel backend/frontend/e2e workflow.
- `backend/scripts/seed_phase9_qa_fixtures.py` pre-bakes Phase 9 QA fixtures.
- `.gstack/qa-reports/phase9-20260424-232700/REPORT.md` has the FlowPilot QA report.
- Per-worker test DBs accumulate on the Postgres service. Cheap to leave around; cleanup if needed.
+- Bug fix: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) around line 258 — the `currentChatRef.current = session.session_id` line in the prefill `sendPrefill` effect.
+- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts).
+- TODO entry tracking the broader silent-return audit: [`.ai/TODO.md`](TODO.md).
--- a/.ai/SESSION_LOG.md
+++ b/.ai/SESSION_LOG.md
@@ -12,6 +12,18 @@

 ---

+## 2026-04-26 03:50 EDT — Claude Code — Ship AssistantChatPage prefill `currentChatRef` fix; close out PR #150
+
+- User reported a troubleshooting-session bug: after answering a subset of task-lane questions and clicking *Send N of M Responses*, no AI response appeared. Traced to `AssistantChatPage`: the dashboard prefill effect set `activeChatId` after creating a new chat session but never updated `currentChatRef.current`. The `currentChatRef.current !== sentForChatId` guard in `handleSend` and `handleTaskSubmit` then bailed silently on every later request and discarded the AI's reply. The user message was already pushed to the chat before the await, so the user saw their answers but nothing else.
+- Fix: one-line addition mirroring `handleNewChat` and `handleResumeNew` — assign `currentChatRef.current = session.session_id` immediately after `setActiveChatId(session.session_id)` in the prefill effect. Branched off `origin/main` as `fix/tasklane-prefill-ref`; PR #153 opened on Gitea.
+- Authored a Playwright regression test `frontend/e2e/assistant-chat-prefill.spec.ts` that drives the real dashboard prefill flow against the real backend, stubs `/ai-sessions/*/chat` with `page.route` for deterministic turn-1/turn-2 responses, and asserts the second AI message renders. Confirmed the test fails on unfixed code at the exact assertion (`Got it — based on your answer…` never appears) and passes once the fix is restored.
+- Verified locally inside `mcr.microsoft.com/playwright:v1.58.2-noble` against the running dev stack: new spec passes, adjacent `flowpilot-chat` spec still passes, `tsc -b` clean. `resume.spec` and `history.spec` failures observed are pre-existing real-backend fixture collisions, unrelated to this change.
+- First CI run on PR #153 failed on infrastructure issues already addressed by PR #150: backend hit `Bind for 0.0.0.0:5432 failed: port is already allocated`, frontend hit `actions/upload-artifact@v4 not supported on GHES`. PR #150 was already merged (commit `87bb20b` on `main`). Rebased `fix/tasklane-prefill-ref` onto new `main` (force-push `1a8cb06` → `1559feb`), resolved a `.ai/TODO.md` conflict by keeping both backlog item sets, kicked off CI on the rebased SHA.
+- Confirmed `CI / backend (pull_request)` is now in branch protection's required-status-checks list (added during PR #150 close-out). `CI / e2e (pull_request)` left as not-required pending one more clean PR run as the threshold.
+- Recorded the broader silent-return concern in TODO backlog: the `currentChatRef.current !== sentForChatId` guard is applied across `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview`. PR #153 fixes one symptom but the same pattern can mask other drift. Either log a Sentry breadcrumb on the mismatch path or distinguish "expected stale" (chat switch) from "unexpected stale" (ref never updated) so the latter alerts.
+- Left for next session: watch PR #153 CI on `1559feb`, merge when green, then decide whether to promote `CI / e2e (pull_request)` to required and pick the next item from `TODO.md` — natural pairing is the `currentChatRef` silent-return audit added to backlog this session.
+- Files touched: `frontend/src/pages/AssistantChatPage.tsx` (the one-line fix + comment), `frontend/e2e/assistant-chat-prefill.spec.ts` (new regression test), `.ai/TODO.md` (silent-return follow-up entry, plus conflict resolution preserving PR #150's backlog additions), `.ai/CURRENT_TASK.md` (rolled forward to PR #153 active task; previous task closed out), `.ai/HANDOFF.md` (new resume point), `.ai/SESSION_LOG.md` (this entry).
+
 ## 2026-04-25 16:41 EDT — Codex — Stabilize PR #150 e2e selectors

 - Investigated the remaining PR #150 failure after backend and frontend CI were green. The e2e resume smoke test was not failing because of product behavior; it used `.bg-card` plus text filtering and matched the tree filter `<select>` before the intended session card.
--- a/.ai/TODO.md
+++ b/.ai/TODO.md
@@ -14,3 +14,4 @@
 - [ ] **Add `data-testid` attributes to e2e-critical interactive elements.** PR #152 fixed five Playwright tests by chasing UI-text changes (`Sessions` → `Session History`, `Account Settings` → `Account Management`, `/assistant` → `/pilot`, "Flow Sessions" tab, Resume button on session cards). Each was a one-line selector update, but every UI churn re-breaks them. Adding stable `data-testid` attributes on the targeted elements (page heading wrappers, tab nav, primary action buttons) and switching tests to `getByTestId` would make these immune to copy/route renames. Scope it small — start with `SessionHistoryPage` heading, the AI/Flow Sessions tab buttons, the per-session `Resume` button, and the command-palette FlowPilot option.
 - [ ] **Per-test transactional rollback in `test_db` fixture.** Bigger engineering than xdist (which we already shipped). Instead of `DROP SCHEMA public CASCADE` per test, wrap each test in a savepoint and rollback at teardown. ~30-40% additional speedup on top of xdist for test-DB-heavy tests. Real refactor; only worth it if the suite gets significantly larger or runs more frequently.
 - [ ] **Consider `pytest-testmon` for PR-time test selection.** Tracks which tests touched which source files and only re-runs affected ones. Best for small PRs touching ~few files. Adds cache-invalidation complexity; only worth it if the suite stays painfully long even after xdist.
+- [ ] **AssistantChatPage `currentChatRef` guard is a silent return** — `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview` all bail with `if (currentChatRef.current !== sentForChatId) return` when stale. This is by design for chat switching, but it also silently masked the prefill-ref bug fixed in PR #153 — the user just saw "no AI response" with no log, no toast, no Sentry event. Either (a) log a `console.warn`/Sentry breadcrumb on the mismatch path so future drift is visible, or (b) split "expected stale" (chat switch) from "unexpected stale" (ref never updated) so only the latter alerts. Pair with an audit of every `currentChatRef.current = ...` assignment vs every `setActiveChatId(...)` call to make sure they're paired everywhere.
--- a/.gitea/workflows/ci.yml
+++ b/.gitea/workflows/ci.yml
@@ -161,6 +161,12 @@ jobs:
      PLAYWRIGHT_SECRET_KEY: ci-playwright-secret-key
      PLAYWRIGHT_TEST_EMAIL: teamadmin@resolutionflow.example.com
      PLAYWRIGHT_TEST_PASSWORD: TestPass123!
+      # AI-touching endpoints (POST /ai-sessions, /chat, /respond, etc.) are
+      # gated by `_require_ai_enabled()`, which returns 503 when no provider
+      # key is set. Tests that exercise those flows stub the AI calls in the
+      # browser via `page.route`, so the backend never actually contacts
+      # Anthropic — but the gate still has to pass. A stub value is enough.
+      ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests

    steps:
      - uses: actions/checkout@v4
--- a/frontend/e2e/assistant-chat-prefill.spec.ts
+++ b/frontend/e2e/assistant-chat-prefill.spec.ts
@@ -0,0 +1,111 @@
+import { expect, test } from '@playwright/test'
+
+/**
+ * Regression test for the prefill-handoff `currentChatRef` bug.
+ *
+ * Symptom: a chat session created via the dashboard prefill flow
+ * looked fine on the first AI turn, but submitting partial answers
+ * from the task lane silently dropped the AI's follow-up response.
+ * The user saw their answers in the chat, no assistant reply, no
+ * toast.
+ *
+ * Root cause: the prefill effect in `AssistantChatPage` set
+ * `activeChatId` without also updating `currentChatRef.current`, so
+ * the `currentChatRef.current !== sentForChatId` guard in
+ * `handleTaskSubmit` (and `handleSend`) tripped on every subsequent
+ * request and discarded the AI response.
+ *
+ * Strategy: drive the real prefill flow against the real backend, but
+ * intercept the `/chat` endpoint with `page.route` so we get
+ * deterministic question payloads on turn 1 and a deterministic
+ * follow-up on turn 2. The fix is what makes turn 2 visible.
+ */
+test.describe('AssistantChatPage — prefill handoff regression', () => {
+  test('AI follow-up renders after submitting partial task lane answers', async ({ page }) => {
+    let chatCallCount = 0
+
+    // Clear any persisted active-chat-id so the page does not auto-resume a
+    // stale session left behind by a sibling spec.
+    await page.addInitScript(() => {
+      try {
+        sessionStorage.removeItem('rf-active-chat-id')
+        sessionStorage.removeItem('rf-tasklane-meta')
+      } catch { /* ignore */ }
+    })
+
+    // Intercept only the chat endpoint. Session creation, listSessions,
+    // facts, suggested-fixes, etc. all hit the real backend so the page
+    // renders normally — only the LLM call is deterministic. The pattern
+    // matches `/ai-sessions/<uuid>/chat` and nothing nested beneath it.
+    await page.route(/\/api\/v1\/ai-sessions\/[^/]+\/chat$/, async (route) => {
+      if (route.request().method() !== 'POST') {
+        await route.fallback()
+        return
+      }
+      chatCallCount += 1
+      if (chatCallCount === 1) {
+        await route.fulfill({
+          status: 200,
+          contentType: 'application/json',
+          body: JSON.stringify({
+            content: 'Initial diagnostic plan. Please answer the questions in the task lane.',
+            suggested_flows: [],
+            fork: null,
+            actions: [],
+            questions: [
+              { text: 'Has the user recently changed their password?' },
+              { text: 'Is the lockout happening at a consistent time of day?' },
+            ],
+          }),
+        })
+        return
+      }
+      await route.fulfill({
+        status: 200,
+        contentType: 'application/json',
+        body: JSON.stringify({
+          content: 'Got it — based on your answer, here is what to check next.',
+          suggested_flows: [],
+          fork: null,
+          actions: [],
+          questions: [],
+        }),
+      })
+    })
+
+    // Drive the prefill flow exactly the way the dashboard does. The textarea
+    // is keyed by its placeholder copy on QuickStartPage.
+    await page.goto('/')
+    const prefillBox = page.getByPlaceholder(/Describe the issue/i)
+    await expect(prefillBox).toBeVisible({ timeout: 10_000 })
+    await prefillBox.fill('User locked out of AD weekly')
+    await prefillBox.press('Enter')
+
+    // After the prefill submits we land on /pilot and the first stubbed AI
+    // turn surfaces the task-lane question text.
+    await expect(page).toHaveURL(/\/pilot/)
+    await expect(
+      page.getByText('Has the user recently changed their password?'),
+    ).toBeVisible({ timeout: 15_000 })
+
+    // Answer the first question. UI flow: click "Answer" to open the
+    // textarea, type, click the inline "Answer" button to mark done.
+    await page.getByRole('button', { name: /^Answer$/ }).first().click()
+    await page.getByPlaceholder('Type your answer...').fill('No, password is months old')
+    await page.getByRole('button', { name: /^Answer$/ }).first().click()
+
+    // Submit the partial response. Pre-fix: the response was silently dropped
+    // here because `currentChatRef.current` still held the mount-time value.
+    await page.getByRole('button', { name: /Send 1 of 2 Responses/ }).click()
+
+    // Bug repro: the assistant message must render. Pre-fix this assertion
+    // fails because `handleTaskSubmit` early-returns at the
+    // `currentChatRef.current !== sentForChatId` guard.
+    await expect(
+      page.getByText('Got it — based on your answer, here is what to check next.'),
+    ).toBeVisible({ timeout: 15_000 })
+
+    // Both chat calls must have actually happened.
+    expect(chatCallCount).toBe(2)
+  })
+})
--- a/frontend/src/pages/AssistantChatPage.tsx
+++ b/frontend/src/pages/AssistantChatPage.tsx
@@ -255,6 +255,12 @@ export default function AssistantChatPage() {
        }
        setChats(prev => [chatItem, ...prev])
        setActiveChatId(session.session_id)
+        // Keep the in-flight guard ref in sync. Without this, currentChatRef
+        // stays at its mount-time value (often a stale id from sessionStorage
+        // or null), so subsequent handleSend / handleTaskSubmit calls bail at
+        // their `currentChatRef.current !== sentForChatId` check and the AI
+        // response is silently dropped.
+        currentChatRef.current = session.session_id
        setMessages([{ role: 'user', content: prefill }])
        setLoading(true)