
Michael Chihlas bc15952857 fix(tests): stabilize escalation SSE backend tests

Co-Authored-By: Codex <noreply@openai.com>

2026-04-27 19:47:43 -04:00

4.4 KiB


HANDOFF.md

Last updated: 2026-04-27 EDT

Active task: Escalation Mode wedge build. See CURRENT_TASK.md for the full status; this file holds the resume point only.

Branch: feat/escalation-metric-endpoint — SSE backend WIP is now test-stabilized locally. Working tree should be clean after the handoff commit.

Status

Previous session diagnosed the slow-test issue and fixed the backend test loop.

Root causes:

Multiple stale pytest processes were still alive inside resolutionflow_backend, despite the prior handoff saying they were dead. They held resolutionflow_test transactions open and caused later tests to block on DROP SCHEMA public CASCADE.
test_escalations_stream_returns_sse_content_type used HTTPX ASGITransport against an infinite SSE stream. That transport buffers the entire response body before returning, so the test waited forever and held the auth DB dependency transaction open.
Escalation handoff tests created intent="escalate" handoffs without stubbing _generate_ai_assessment(), so they waited on the real AI path instead of testing handoff behavior.
The bus keyed subscribers by raw account_id; string UUIDs and UUID objects for the same account did not match.
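
The account-ID mismatch in the last root cause is easy to reproduce in isolation. The sketch below is illustrative only: `subscribers` and `normalize_account_id` are hypothetical names, not the actual EscalationBus internals.

```python
import uuid

account = uuid.UUID("12345678-1234-5678-1234-567812345678")

# A subscriber registry keyed by the UUID object (hypothetical shape).
subscribers = {account: ["queue-a"]}

# The same account expressed as a string is a different dict key:
# UUID and str never compare equal, so the lookup silently misses.
assert account != str(account)
assert str(account) not in subscribers


def normalize_account_id(account_id):
    """Hypothetical helper: coerce a str or UUID account ID to a UUID."""
    if isinstance(account_id, uuid.UUID):
        return account_id
    return uuid.UUID(str(account_id))


# After normalization, both spellings resolve to the same key.
assert normalize_account_id(str(account)) in subscribers
assert normalize_account_id(account) in subscribers
```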

Fixes made:

stream_escalations now uses Depends(require_engineer_or_admin, scope="function") so auth DB dependencies are released before the long-lived stream body.
The SSE handshake test now calls stream_escalations() directly and consumes only the first generator yield, avoiding HTTPX's infinite-stream buffering behavior.
Handoff manager/API tests stub _generate_ai_assessment() with an AsyncMock.
EscalationBus normalizes string/UUID account IDs at subscribe/publish/unsubscribe/subscriber_count boundaries, with a regression test.
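
The "consume only the first yield" approach can be sketched without any HTTP client. This is a minimal illustration under assumptions: `fake_sse_stream` is a hypothetical stand-in for the real stream body, and `first_chunk` is an illustrative helper, not code from the repository.

```python
import asyncio


async def fake_sse_stream():
    """Hypothetical stand-in for an infinite SSE body generator."""
    yield ": connected\n\n"          # handshake comment frame
    while True:                      # never terminates on its own
        await asyncio.sleep(3600)
        yield ": keepalive\n\n"


async def first_chunk(gen, timeout=1.0):
    """Pull exactly one item from an async generator, then close it.

    The timeout guards against a generator that never yields; aclose()
    runs the generator's cleanup so nothing is left pending.
    """
    try:
        return await asyncio.wait_for(gen.__anext__(), timeout)
    finally:
        await gen.aclose()


chunk = asyncio.run(first_chunk(fake_sse_stream()))
assert chunk.startswith(":")
```

Because the test never asks a transport to collect the full body, the infinite loop in the generator is never reached.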

Verified:

pytest tests/test_escalation_bus.py tests/test_handoff_manager.py tests/test_session_handoffs_api.py tests/test_flowpilot_analytics_escalations.py --override-ini=addopts= -q --durations=20 → 31 passed in 46.95s
Same subset with -n auto → 31 passed in 17.80s
No remaining pytest processes or resolutionflow%test% Postgres sessions after the run.

Resume point

Continue the frontend SSE subscription in EscalationQueue.tsx: fetch-based reader, prepend new cards with the locked 200ms slide-in, reconnect with backoff, flash the tab title when backgrounded, and respect prefers-reduced-motion.
Then ship the magic-moment handoff-context screen: 4 sections (problem summary / what's been tried / AI assessment / Start here CTA), loads on Pick Up, then dissolves into regular FlowPilot session view.
Push the branch and open a draft PR when the frontend/live-arrival slice is ready.

Useful breadcrumbs

SSE endpoint: backend/app/api/endpoints/session_handoffs.py — stream_escalations.
Pub/sub bus: backend/app/core/escalation_bus.py. In-memory, account-scoped, non-durable, 64-event per-subscriber queue, drop-on-full.
Notification dispatch: backend/app/services/handoff_manager.py — dispatch_escalation_notifications, called after db.commit() in the handoff endpoint.
Frontend streaming reference: frontend/src/api/aiSessions.ts — streamDocumentation uses fetch + ReadableStream, which remains the right pattern because native EventSource cannot send auth headers.
Metric endpoint: backend/app/api/endpoints/flowpilot_analytics.py — get_escalation_metrics.

Watch-outs

Do not reintroduce client.stream()/ASGITransport tests for infinite SSE responses; test the generator directly or use a real server-level test.
DROP SCHEMA public CASCADE per test is still the dominant cost: DB-backed tests spend ~1.7-2.8s in setup. Use -n auto for focused backend loops.
The bus is acceptable for v1 pilot scale only because Railway is single-replica. Redis pub/sub is the obvious swap when horizontal scaling appears.
Synchronous _generate_ai_assessment() during escalation creation remains a product-latency risk; tests are now isolated from it, but the UX path should be watched as the magic-moment screen is built.
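
The test isolation mentioned in the last watch-out follows the standard `unittest.mock.AsyncMock` pattern. The sketch below assumes `_generate_ai_assessment` is an async method on a manager class; `HandoffManager` and `create_handoff` here are hypothetical stand-ins, and the real patch target in handoff_manager.py may differ.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class HandoffManager:
    """Hypothetical stand-in for the real handoff manager."""

    async def _generate_ai_assessment(self, summary):
        await asyncio.sleep(30)  # stands in for the slow real AI call
        return "real assessment"

    async def create_handoff(self, summary, intent):
        assessment = None
        if intent == "escalate":
            # The slow path that un-stubbed tests were waiting on.
            assessment = await self._generate_ai_assessment(summary)
        return {"intent": intent, "assessment": assessment}


async def run_stubbed():
    stub = AsyncMock(return_value="stubbed assessment")
    # Patch on the class so every instance created in the test is covered.
    with patch.object(HandoffManager, "_generate_ai_assessment", stub):
        result = await HandoffManager().create_handoff("vpn down", "escalate")
    return result, stub


result, stub = asyncio.run(run_stubbed())
stub.assert_awaited_once_with("vpn down")
assert result["assessment"] == "stubbed assessment"
```

The stub keeps the test fast and deterministic, but it also means the test suite says nothing about real assessment latency, which is the risk this watch-out describes.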
