
Michael Chihlas bc15952857 fix(tests): stabilize escalation SSE backend tests

Co-Authored-By: Codex <noreply@openai.com>

2026-04-27 19:47:43 -04:00


SESSION_LOG.md

Append-only chronological record. Newest entries at the top. Skim when broader context is needed. Entry format:
## YYYY-MM-DD HH:MM <timezone> — <agent> — <one-line summary>
- What was accomplished
- What was left for next session
- Files touched

2026-04-27 19:50 EDT — Codex — Stabilize Escalation Mode SSE backend tests

Diagnosed slow backend tests on feat/escalation-metric-endpoint. Multiple stale pytest processes were still alive inside resolutionflow_backend and held resolutionflow_test transactions open, blocking later per-test schema resets on DROP SCHEMA public CASCADE.
Reproduced a deterministic hang in test_escalations_stream_returns_sse_content_type: HTTPX ASGITransport buffers the full response body before returning, so an infinite SSE response never yielded the initial chunk and kept the auth DB dependency transaction open.
Fixed stream_escalations by declaring its auth dependencies with Depends(..., scope="function"), so they are released before the long-lived stream body starts.
Reworked the SSE handshake test to call stream_escalations() directly and consume one generator yield, then close it; kept viewer role-gate coverage through the API client.
Stubbed _generate_ai_assessment() in handoff manager/API tests so escalation handoff tests no longer wait on the real AI path.
Normalized account IDs inside EscalationBus so string UUIDs and UUID objects hit the same subscriber bucket; added a regression test.
Verified focused backend subset: serial 31 passed in 46.95s; xdist 31 passed in 17.80s. Confirmed no lingering pytest processes or test DB sessions afterward.
Left for next session: continue frontend SSE subscription in EscalationQueue.tsx, then the magic-moment handoff-context screen.
Files touched: backend/app/api/endpoints/session_handoffs.py, backend/app/core/escalation_bus.py, backend/tests/test_escalation_bus.py, backend/tests/test_handoff_manager.py, backend/tests/test_session_handoffs_api.py, .ai/HANDOFF.md, .ai/SESSION_LOG.md.
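
The reworked handshake test described in this entry consumes a single yield from the stream and then closes it. A minimal sketch of that pattern using only the standard library follows; `fake_escalation_stream` and `read_first_event` are hypothetical stand-ins, not the project's `stream_escalations()` or its actual test, and the shape of the first frame is an assumption.

```python
import asyncio
from typing import AsyncIterator


async def fake_escalation_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for an endless SSE body generator."""
    yield ": connected\n\n"  # initial SSE comment frame (assumed shape)
    while True:
        await asyncio.sleep(3600)  # a real stream would wait for events here
        yield "event: ping\ndata: {}\n\n"


async def read_first_event() -> str:
    """Pull exactly one chunk, then close the generator so nothing stays open."""
    stream = fake_escalation_stream()
    try:
        # Bound the wait so a broken stream fails fast instead of hanging.
        return await asyncio.wait_for(stream.__anext__(), timeout=1.0)
    finally:
        await stream.aclose()


first_chunk = asyncio.run(read_first_event())
```

Closing the generator explicitly means any cleanup in the stream's own `finally` blocks runs before the test returns, rather than whenever the generator is garbage-collected.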

2026-04-26 03:50 EDT — Claude Code — Ship AssistantChatPage prefill `currentChatRef` fix; close out PR #150

User reported a troubleshooting-session bug: after answering a subset of task-lane questions and clicking Send N of M Responses, no AI response appeared. Traced to AssistantChatPage: the dashboard prefill effect set activeChatId after creating a new chat session but never updated currentChatRef.current. The currentChatRef.current !== sentForChatId guard in handleSend and handleTaskSubmit then bailed silently on every later request and discarded the AI's reply. The user message was already pushed to the chat before the await, so the user saw their answers but nothing else.
Fix: one-line addition mirroring handleNewChat and handleResumeNew — assign currentChatRef.current = session.session_id immediately after setActiveChatId(session.session_id) in the prefill effect. Branched off origin/main as fix/tasklane-prefill-ref; PR #153 opened on Gitea.
Authored a Playwright regression test frontend/e2e/assistant-chat-prefill.spec.ts that drives the real dashboard prefill flow against the real backend, stubs /ai-sessions/*/chat with page.route for deterministic turn-1/turn-2 responses, and asserts the second AI message renders. Confirmed the test fails on unfixed code at the exact assertion (Got it — based on your answer… never appears) and passes once the fix is restored.
Verified locally inside mcr.microsoft.com/playwright:v1.58.2-noble against the running dev stack: new spec passes, adjacent flowpilot-chat spec still passes, tsc -b clean. resume.spec and history.spec failures observed are pre-existing real-backend fixture collisions, unrelated to this change.
First CI run on PR #153 failed on infrastructure issues already addressed by PR #150: backend hit Bind for 0.0.0.0:5432 failed: port is already allocated, frontend hit actions/upload-artifact@v4 not supported on GHES. PR #150 was already merged (commit 87bb20b on main). Rebased fix/tasklane-prefill-ref onto new main (force-push 1a8cb06 → 1559feb), resolved a .ai/TODO.md conflict by keeping both backlog item sets, kicked off CI on the rebased SHA.
Confirmed CI / backend (pull_request) is now in branch protection's required-status-checks list (added during PR #150 close-out). CI / e2e (pull_request) stays non-required until one more PR run passes cleanly.
Recorded the broader silent-return concern in TODO backlog: the currentChatRef.current !== sentForChatId guard is applied across handleSend, handleTaskSubmit, selectChat, refreshFacts, refreshActiveFix, and refreshPreview. PR #153 fixes one symptom but the same pattern can mask other drift. Either log a Sentry breadcrumb on the mismatch path or distinguish "expected stale" (chat switch) from "unexpected stale" (ref never updated) so the latter alerts.
First CI run on the rebased SHA passed backend and frontend but failed e2e: the new prefill regression test couldn't render the task-lane question text. Diagnosed via the job log: POST /api/v1/ai-sessions calls _require_ai_enabled() and returns 503 when no provider key is set. The e2e CI job had neither ANTHROPIC_API_KEY nor GOOGLE_AI_API_KEY in env. Locally the dev backend has a real key, hence the local pass. The Playwright page.route stub on /chat was correct but never had a chance to fire because the upstream session-creation call was 503-ing.
Fix: added a stub ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests to the e2e job env in .gitea/workflows/ci.yml. The Playwright stub still intercepts the actual /chat call in the browser, so the backend never contacts Anthropic — the gate just needs to clear. Documented the convention in a workflow comment so future AI-touching e2e tests know what to expect. Pushed 11fe32f; CI went all-green.
Merged PR #153 as 68fcdc6 on main. Local feature branch and remote both deleted via Gitea's delete_branch_after_merge.
Opened a small follow-up chore/post-153-handoff PR to refresh the now-stale .ai/ files (this entry, plus CURRENT_TASK.md rolling forward to "no active task — pick from TODO.md" and HANDOFF.md updating to the post-merge home position). The natural next pickup is either the data-testid audit at the top of TODO.md "Up next" or the currentChatRef silent-return audit added to the backlog this session.
Files touched: frontend/src/pages/AssistantChatPage.tsx (the one-line fix + comment), frontend/e2e/assistant-chat-prefill.spec.ts (new regression test), .gitea/workflows/ci.yml (stub ANTHROPIC_API_KEY for e2e), .ai/TODO.md (silent-return follow-up entry, plus conflict resolution preserving PR #150's backlog additions), .ai/CURRENT_TASK.md, .ai/HANDOFF.md, .ai/SESSION_LOG.md (this entry).
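
The e2e failure in this entry came down to a provider-key gate that runs before any chat traffic. A rough sketch of such a gate follows, assuming it only checks that one of the two keys is non-empty; `ai_gate_status` is a hypothetical helper, not the project's `_require_ai_enabled()`, and the 200 is just a placeholder for "the request proceeds".

```python
from typing import Mapping

# Key names taken from the entry above; the check itself is an assumption.
PROVIDER_KEYS = ("ANTHROPIC_API_KEY", "GOOGLE_AI_API_KEY")


def ai_gate_status(env: Mapping[str, str]) -> int:
    """Return the HTTP status a presence-only gate would produce."""
    if any(env.get(name, "").strip() for name in PROVIDER_KEYS):
        return 200  # gate clears; the key value is never validated here
    return 503  # no provider configured


ci_env_before: dict = {}
ci_env_after = {"ANTHROPIC_API_KEY": "ci-stub-key-not-used-by-tests"}
status_before = ai_gate_status(ci_env_before)
status_after = ai_gate_status(ci_env_after)
```

Under that assumption a stub value is enough for CI, since the browser-side stub intercepts the chat call before any provider would be contacted.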

2026-04-25 16:41 EDT — Codex — Stabilize PR #150 e2e selectors

Investigated the remaining PR #150 failure after backend and frontend CI were green. The e2e resume smoke test was not failing because of product behavior; it used .bg-card plus text filtering and matched the tree filter <select> before the intended session card.
Added stable test IDs to flow session, tree, and share cards, then updated affected e2e tests to target those cards instead of Tailwind class names.
Hardened the CI workflow by making Postgres healthchecks authenticate as postgres and baking VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}" into the e2e frontend build.
Verified with git diff --check, frontend build in Docker, no remaining .bg-card e2e selectors, and focused Playwright runs in an Actions-like Ubuntu container: resume spec passed, then history/library/library-start/resume/shares passed (6 passed).
Left for next session: push this WIP commit to PR #150, watch CI, merge when all three jobs are green, then enable backend branch protection and consider the e2e gate after a reliable green run.
Files touched: .gitea/workflows/ci.yml, frontend/e2e/history.spec.ts, frontend/e2e/library-start.spec.ts, frontend/e2e/library.spec.ts, frontend/e2e/resume.spec.ts, frontend/e2e/shares.spec.ts, frontend/src/components/library/TreeGridView.tsx, frontend/src/components/library/TreeListView.tsx, frontend/src/pages/MySharesPage.tsx, frontend/src/pages/SessionHistoryPage.tsx, .ai/HANDOFF.md, .ai/CURRENT_TASK.md, .ai/SESSION_LOG.md.

2026-04-25 12:00 America/New_York — Claude Code — Mock final AI-provider test, cache CI deps, parallelize backend with pytest-xdist

Diagnosed why CI was still red despite Codex's local 1076 passed: a single test (test_record_decision_persists_and_bumps_state_version) needed ANTHROPIC_API_KEY because the decision: draft_template path calls TemplateExtractionService → AI provider. Patched _extract_template_parameters with an AsyncMock so the test no longer depends on AI availability. Verified.
Pushed Codex's WIP commit 49f8856 to PR #150 (had been local-only per handoff protocol).
PR #150 (fix/ci-workflow-config) extended with cheap CI wins: actions/cache@v3 for pip + npm in all three jobs; dropped --cov-report=term-missing (the custom display step parses JSON); added --maxfail=10 so structural breakage exits fast.
PR #151 (fix/ci-pytest-xdist) opened, stacked on #150: pytest-xdist with per-worker DB isolation. conftest.py reads PYTEST_XDIST_WORKER, computes a per-worker DB URL like …_gw0, and synchronously CREATEs the DB on first import. The per-test DROP SCHEMA public CASCADE then operates on the worker's isolated DB. Verified locally: backend suite went from 22m 27s serial → 4m 28s parallel (8 workers), 1076 passed in both cases. ~5× speedup.
Decided NOT to do per-test transactional rollback (bigger refactor); captured for future TODO consideration.
Left for next session: watch CI on both PRs, merge in order (#150 first, #151 second), then enable CI / backend (pull_request) as a required status check on main.
Files touched: backend/tests/test_session_suggested_fixes_api.py, backend/tests/conftest.py, backend/requirements-dev.txt, .gitea/workflows/ci.yml, .ai/HANDOFF.md, .ai/CURRENT_TASK.md, .ai/TODO.md.
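
The per-worker isolation described in this entry hinges on deriving a distinct database name from `PYTEST_XDIST_WORKER`. A minimal sketch of that derivation follows, assuming the worker id is appended as a suffix to the database name; `per_worker_db_url` and the example URL are hypothetical, and the real `conftest.py` may differ.

```python
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit


def per_worker_db_url(base_url: str, env: Mapping[str, str]) -> str:
    """Append the xdist worker id (gw0, gw1, ...) to the database name."""
    worker = env.get("PYTEST_XDIST_WORKER")  # set by pytest-xdist in each worker
    if not worker:
        return base_url  # serial run: use the base test database unchanged
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(path=f"{parts.path}_{worker}"))


# Illustrative URL only; credentials, host, and database name are placeholders.
BASE = "postgresql://user:pw@localhost:5432/app_test"
serial_url = per_worker_db_url(BASE, {})
worker_url = per_worker_db_url(BASE, {"PYTEST_XDIST_WORKER": "gw0"})
```

Because each worker resolves to its own database, a per-test `DROP SCHEMA public CASCADE` in one worker cannot block on transactions held by another.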

2026-04-25 06:12 EDT — Codex — Fix backend suite to green

Fixed the real backend failures left after the CI-infra cleanup: tenant-scoped seed drift, missing production account_id writes, public route mounting for survey/share links, Script Builder library saves, resolution output async loading, AI search schema metadata, disabled-AI fixture leakage, and prompt marker guardrails.
Added backend CI/dev system packages required by WeasyPrint PDF export.
Stabilized the pytest harness for pytest-asyncio/asyncpg teardown ResourceWarnings under filterwarnings = error.
Verified pytest --override-ini="addopts=" -q inside resolutionflow_backend: 1076 passed, 35 deselected in 1347.41s.
Left for next session: commit/push if needed, check and merge PR #150 when Gitea CI is green, add backend CI as a required branch-protection check, and rerun frontend lint if final DoD requires it.
Files touched: .gitea/workflows/ci.yml, backend/Dockerfile.dev, backend/app/api/endpoints/folders.py, backend/app/api/endpoints/script_builder.py, backend/app/api/endpoints/shares.py, backend/app/api/router.py, backend/app/models/ai_session.py, backend/app/schemas/user.py, backend/app/services/assistant_chat_service.py, backend/app/services/resolution_output_generator.py, backend/app/services/script_builder_service.py, backend/pytest.ini, backend/tests/conftest.py, and focused backend tests.

2026-04-25 02:00 America/New_York — Claude Code — Land FlowPilot + PSA, recover CI from 488 errors to ~4

Started session by completing pending FlowPilot Phase 9 QA: ran /qa against the seeded fixtures, found and fixed four latent layout/state bugs (ResolutionNotePreview off-screen, TemplateMatchPanel deadlock when TaskLane closed, EscalateInterceptDialog clipped above viewport, seed_test_users.py cancel_at_period_end NOT NULL crash). Added a new fixture seeder backend/scripts/seed_phase9_qa_fixtures.py that pre-bakes the four backend states the AI orchestrator needs to emit, so future QA can exercise all 7 conditional Phase 9 components without depending on stochastic AI behavior.
Discovered PR #141 (PSA ticket management) and feat/flowpilot-migration had 5 overlapping files but only 2 real conflicts (CLAUDE.md, AssistantChatPage.tsx). Both conflicts were additive, so the two sides were concatenated rather than picking one.
Merged PSA first (PR #141), then merged FlowPilot (PR #147), each through Gitea API. tsc -b clean and visual smoke-test confirmed PSA's Tickets sidebar coexists with Phase 9 ProposalBanner.
Discovered main had been merging through a broken CI gate for several merges. Initially recommended "stop the line, fix CI before shipping." After scoping the actual rot (~50% of tests red, ~600 errors on a clean run), reversed the recommendation: ship the queue first because FlowPilot itself carried significant test-infra repairs that would be duplicated work on a fresh recovery branch.
PR #148: two surgical fixes to main (network_diagrams JSONB server_default triple-quote bug, deprecated session-scoped event_loop fixture in conftest). +78 passing / -114 errors.
PR #149: frontend lint 20 errors → 0, requirements-dev.txt pytest pin bumped to satisfy pytest-asyncio==0.24.0's pytest>=8.2, and a one-line from app import models as _models in conftest that registers all ~60 models with Base.metadata before create_all. The conftest fix collapsed 484 of the remaining 488 backend errors. 1018 passed / 4 errors / 54 failed after.
Enabled Gitea branch protection on main: PR-only merges, CI / frontend (pull_request) required, force-push blocked, no review required.
Discovered CI on the merge commit STILL showed red despite local pytest being mostly green. Root cause: workflow only set DATABASE_URL, but conftest reads only DATABASE_TEST_URL (per dab740d's safety hardening). 638 connection-refused errors on every fixture setup. Plus actions/upload-artifact@v4 not supported by Gitea Actions. PR #150 fixes both.
Left for next session: merge PR #150 once CI confirms green, add CI / backend (pull_request) to required status checks, then root-cause and fix the 54 real backend test failures (one sample seen — test_user fixture leaking across calls causing duplicate-email violations).
Files touched (committed): backend/scripts/seed_test_users.py, backend/scripts/seed_phase9_qa_fixtures.py (new), backend/app/models/network_diagram.py, backend/tests/conftest.py, backend/requirements-dev.txt, frontend/src/components/pilot/ResolutionNotePreview.tsx, frontend/src/components/pilot/EscalateInterceptDialog.tsx, frontend/src/components/pilot/ScriptBuilderTab.tsx, frontend/src/pages/AssistantChatPage.tsx, frontend/src/pages/FlowPilotSessionPage.tsx, frontend/src/pages/TicketsPage.tsx, frontend/src/hooks/useFlowPilotSession.ts, frontend/src/hooks/useMediaQuery.ts, frontend/src/components/dashboard/TicketQueue.tsx, frontend/src/components/network/nodes/DeviceNode.tsx, frontend/src/components/network/nodes/GroupNode.tsx, frontend/src/components/routing/AssistantSessionRedirect.tsx (new), frontend/src/router.tsx, .gitea/workflows/ci.yml, .claude/settings.json (new), .claude/hooks/check-gstack.sh (new), .gitignore, CLAUDE.md, .gstack/qa-reports/phase9-*/ (QA artifacts).
Net merges to main: PR #141 (PSA), PR #147 (FlowPilot), PR #148 (CI fixes part 1), PR #149 (CI fixes part 2). PR #150 still open at session end.
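
The conftest fix in this entry works because SQLAlchemy declarative models are added to `Base.metadata` only when their class definitions execute, which is why the models package has to be imported before `create_all`. A dependency-free analogy of that import side effect follows; `Registry`, `Model`, and the class names are hypothetical, and this is not SQLAlchemy code.

```python
from typing import Dict, List


class Registry:
    """Toy stand-in for a metadata object that tracks known tables."""

    def __init__(self) -> None:
        self.tables: Dict[str, type] = {}

    def create_all(self) -> List[str]:
        # Only tables registered so far can be created.
        return sorted(self.tables)


metadata = Registry()


class Model:
    """Base class: subclasses register themselves when they are defined."""

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        metadata.tables[cls.__name__.lower()] = cls


created_before = metadata.create_all()  # nothing has been defined yet


# In a real project these class bodies would run when the models module is imported.
class Ticket(Model):
    pass


class Account(Model):
    pass


created_after = metadata.create_all()
```

Calling `create_all` before the definitions run yields an empty schema, which matches the failure mode where nearly every fixture errors on missing tables.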

2026-04-24 — Claude Code — Migrate to dual-agent handoff system

Split CLAUDE.md into .ai/PROJECT_CONTEXT.md + shared-protocol root files (CLAUDE.md, AGENTS.md).
Seeded CURRENT_TASK.md, HANDOFF.md, TODO.md, DECISIONS.md, SESSION_LOG.md, README.md.
Deleted legacy SESSION-HANDOFF.md (superseded).
Left for next session: first real feature task should replace the seed CURRENT_TASK.md and update HANDOFF.md with real resume state.
Files touched: .ai/*.md (created), CLAUDE.md (rewritten), AGENTS.md (created), SESSION-HANDOFF.md (deleted).
Follow-up (same day): Codex review pass flagged stale SaaS-role claim and incomplete file-listings carried over from the pre-migration CLAUDE.md. Verified against backend/app/core/permissions.py, frontend/src/hooks/usePermissions.ts, backend/app/api/deps.py, backend/app/api/router.py, and backend/app/services/psa/. Corrected PROJECT_CONTEXT.md role hierarchy (super_admin > owner > engineer > viewer, not team_admin), added require_account_owner / require_team_admin to deps list, replaced stale endpoint comment with a summary pointing at api/router.py, added exceptions.py + ticket_context.py to the PSA file list. Also replaced seed-example content in CURRENT_TASK.md and TODO.md with clearer empty-state sentinels.
Branch cleanup (same day): committed pending test-isolation work as b14a16a chore(tests): gate RLS tests behind RUN_RLS_TESTS flag, new Phase 9 review doc as b3506b5 docs(pilot): phase 9 review issues, and .remember/ gitignore entry as b3be1e0 chore: ignore .remember/ skill runtime state. Deleted docs/landing-handoff/ (prepared for external design work, not meant to live in the repo). Working tree clean; 3 cleanup commits unpushed.
