fix(tests): stabilize escalation SSE backend tests
Co-Authored-By: Codex <noreply@openai.com>
This commit is contained in:
@@ -2,64 +2,50 @@
|
||||
|
||||
# HANDOFF.md
|
||||
|
||||
**Last updated:** 2026-04-27 EDT (paused mid-build for Codex review)
|
||||
**Last updated:** 2026-04-27 EDT
|
||||
|
||||
**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only.
|
||||
|
||||
**Branch:** `feat/escalation-metric-endpoint` — six commits stacked on `main` (`c0ed6d9`). Working tree has UNCOMMITTED WIP for the SSE push.
|
||||
**Branch:** `feat/escalation-metric-endpoint` — SSE backend WIP is now test-stabilized locally. Working tree should be clean after the handoff commit.
|
||||
|
||||
## Status — paused for Codex review
|
||||
## Status
|
||||
|
||||
Build is paused mid-flight on the SSE push. Hand the branch (and the WIP) to Codex for an outside-voice pass before stacking more commits, fixing tests, or pushing. Reasons: local backend test loop got tangled (multiple stale pytest processes contended on the same Postgres test schema; the suite design rebuilds the schema per test which doesn't tolerate concurrent runs well), and the SSE work is the kind of cross-layer surface a second pair of eyes is most valuable on.
|
||||
Previous session diagnosed the slow-test issue and fixed the backend test loop.
|
||||
|
||||
What Codex should look at:
|
||||
1. The new SSE endpoint at [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations` — and the in-memory pub/sub bus at [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py).
|
||||
2. Whether the bus's single-process / non-durable design is acceptable for the v1 pilot (Railway single-replica) and what the swap-to-Redis story should look like.
|
||||
3. The dispatch wiring in [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications` now publishes to the bus before the email fan-out. Race / ordering / failure-mode review.
|
||||
4. Auth on the SSE stream — same `require_engineer_or_admin` dep as `/queue` and `/claim`. Browsers can't send custom headers via the native `EventSource` API; the planned frontend uses a fetch-based `ReadableStream` reader (matching the existing `streamDocumentation` pattern in [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts)). Verify that's the right call vs. a query-token scheme.
|
||||
5. Whether the bus's "drop-on-full-queue" semantic is acceptable, given a stuck subscriber would silently miss live-arrival cards (they'd still see them on next page load via REST `/queue`).
|
||||
Root causes:
|
||||
- Multiple stale pytest processes were still alive inside `resolutionflow_backend`, despite the prior handoff saying they were dead. They held `resolutionflow_test` transactions open and caused later tests to block on `DROP SCHEMA public CASCADE`.
|
||||
- `test_escalations_stream_returns_sse_content_type` used HTTPX `ASGITransport` against an infinite SSE stream. That transport buffers the entire response body before returning, so the test waited forever and held the auth DB dependency transaction open.
|
||||
- Escalation handoff tests created `intent="escalate"` handoffs without stubbing `_generate_ai_assessment()`, so they waited on the real AI path instead of testing handoff behavior.
|
||||
- The bus keyed subscribers by raw `account_id`; string UUIDs and `UUID` objects for the same account did not match.
|
||||
|
||||
## Resume point (after Codex review)
|
||||
Fixes made:
|
||||
- `stream_escalations` now uses `Depends(require_engineer_or_admin, scope="function")` so auth DB dependencies are released before the long-lived stream body.
|
||||
- The SSE handshake test now calls `stream_escalations()` directly and consumes only the first generator yield, avoiding HTTPX's infinite-stream buffering behavior.
|
||||
- Handoff manager/API tests stub `_generate_ai_assessment()` with an `AsyncMock`.
|
||||
- `EscalationBus` normalizes string/UUID account IDs at subscribe/publish/unsubscribe/subscriber_count boundaries, with a regression test.
|
||||
|
||||
1. **Get the test suite back to green.** Stale pytest zombies in the container were cleared (PIDs 1790034, 1844996, 1883167, 1916565, 1935830, 2009437, 2009449 — all dead, parent uvicorn-reload didn't reap them; PID slots remain but no live processes). Re-run with `pytest -n auto` to keep wall-clock manageable. Files: `tests/test_escalation_bus.py` (7 tests), the 4 new dispatch + SSE tests in `tests/test_handoff_manager.py` and `tests/test_session_handoffs_api.py`.
|
||||
2. **Frontend SSE subscription** in `EscalationQueue.tsx` — fetch-based reader, prepend new cards with the locked 200ms slide-in, reconnect with backoff, tab-title flash when backgrounded, respect `prefers-reduced-motion`. Then ship the magic-moment handoff-context screen (4 sections, dissolves into FlowPilot session view).
|
||||
3. Push the branch + open a draft PR.
|
||||
Verified:
|
||||
- `pytest tests/test_escalation_bus.py tests/test_handoff_manager.py tests/test_session_handoffs_api.py tests/test_flowpilot_analytics_escalations.py --override-ini=addopts= -q --durations=20` → `31 passed in 46.95s`
|
||||
- Same subset with `-n auto` → `31 passed in 17.80s`
|
||||
- No remaining pytest processes or `resolutionflow%test%` Postgres sessions after the run.
|
||||
|
||||
## Stack
|
||||
## Resume point
|
||||
|
||||
```
|
||||
WIP (uncommitted): SSE bus + endpoint + dispatcher publish + 7 bus tests + 1 dispatcher test + 2 SSE endpoint tests
|
||||
a283d0d docs(ai): refresh handoff state mid-flight on Escalation Mode build
|
||||
9f0bfd4 feat(escalations): mount time-to-first-action stat-card on /escalations
|
||||
07d0db9 feat(handoff): email engineer-or-admin teammates on escalation
|
||||
7a5b853 feat(api): role-gate handoff claim to engineer-or-admin
|
||||
52f6d03 feat(analytics): add escalation time-to-first-action metric endpoint
|
||||
d51e95c docs(plans): add escalation-mode wedge design + test plan
|
||||
```
|
||||
|
||||
## Where things stand
|
||||
|
||||
- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required.
|
||||
- The 20 tests passing as of `9f0bfd4` are still passing (last green run logged before the SSE work). The newly added SSE tests (7 bus + 1 dispatcher integration + 2 endpoint) HAVE NOT been verified end-to-end this session — they ran clean on the bus suite alone (7/7 in 0.14s) but the DB-backed integration tests were aborted before completing.
|
||||
- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO from the design-stage pass.
|
||||
1. Continue the **Frontend SSE subscription** in `EscalationQueue.tsx`: fetch-based reader, prepend new cards with the locked 200ms slide-in, reconnect with backoff, tab-title flash when backgrounded, respect `prefers-reduced-motion`.
|
||||
2. Then ship the **magic-moment handoff-context screen**: 4 sections (problem summary / what's been tried / AI assessment / Start here CTA), loads on Pick Up, then dissolves into regular FlowPilot session view.
|
||||
3. Push the branch and open a draft PR when the frontend/live-arrival slice is ready.
|
||||
|
||||
## Useful breadcrumbs
|
||||
|
||||
- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom.
|
||||
- Notification dispatch (email + bus publish): [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails or fans out.
|
||||
- SSE endpoint (WIP): [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`. Heartbeat every 25s, account-scoped subscribe, role-gated to engineer-or-admin.
|
||||
- Pub/sub bus (WIP): [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py). Module-level singleton, in-memory, `asyncio.Queue` per subscriber with 64-event maxsize and drop-on-full semantics.
|
||||
- Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer.
|
||||
- Two-metric framing — required reading before quoting any number to a pilot. In-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc).
|
||||
- The `notification_sent` boolean is intentionally NOT being written. Per Codex's design-stage correction it should be replaced by per-channel delivery records; v1.x story. For now application logs are the audit trail.
|
||||
- SSE endpoint: [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`.
|
||||
- Pub/sub bus: [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py). In-memory, account-scoped, non-durable, 64-event per-subscriber queue, drop-on-full.
|
||||
- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`, called after `db.commit()` in the handoff endpoint.
|
||||
- Frontend streaming reference: [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts) — `streamDocumentation` uses fetch + `ReadableStream`, which remains the right pattern because native `EventSource` cannot send auth headers.
|
||||
- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics`.
|
||||
|
||||
## Watch-outs
|
||||
|
||||
- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`. Fine for v1 because session activity post-claim IS the senior's activity (session reactivates under `escalated_to_id`).
|
||||
- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). Use it directly; don't join through `ai_sessions`.
|
||||
- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is a v2 TODO.
|
||||
- The test suite uses `DROP SCHEMA public CASCADE` + `CREATE SCHEMA public` per test (see [`backend/tests/conftest.py:144`](../backend/tests/conftest.py#L144)). Concurrent pytest runs against the same test DB collide. Always run one suite at a time, or via `-n auto` xdist with the per-worker-DB isolation already in conftest.
|
||||
|
||||
## Kill-switch (week 8)
|
||||
|
||||
If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.
|
||||
- Do not reintroduce `client.stream()`/ASGITransport tests for infinite SSE responses; test the generator directly or use a real server-level test.
|
||||
- `DROP SCHEMA public CASCADE` per test is still the dominant cost: DB-backed tests spend ~1.7-2.8s in setup. Use `-n auto` for focused backend loops.
|
||||
- The bus is acceptable for v1 pilot scale only because Railway is single-replica. Redis pub/sub is the obvious swap when horizontal scaling appears.
|
||||
- Synchronous `_generate_ai_assessment()` during escalation creation remains product-latency risk; tests are now isolated from it, but the UX path should be watched as the magic-moment screen is built.
|
||||
|
||||
Reference in New Issue
Block a user