Update HANDOFF to reflect:
- Build paused after the WIP SSE commit (87bd0b7)
- What Codex should look at on the SSE bus + endpoint + dispatch wiring
- Resume point post-review: re-run tests with -n auto, then frontend
  SSE subscription, then magic-moment screen
- Test-suite watch-out: per-test DROP SCHEMA fixture means concurrent
  pytest runs on the same DB collide; always run one suite at a time, or
  use -n auto with conftest's per-worker DB isolation
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HANDOFF.md
Last updated: 2026-04-27 EDT (paused mid-build for Codex review)
Active task: Escalation Mode wedge build. See CURRENT_TASK.md for the full status; this file holds the resume point only.
Branch: feat/escalation-metric-endpoint — six commits stacked on main (c0ed6d9). Working tree has UNCOMMITTED WIP for the SSE push.
Status — paused for Codex review
Build is paused mid-flight on the SSE push. Hand the branch (and the WIP) to Codex for an outside-voice pass before stacking more commits, fixing tests, or pushing. Reasons: local backend test loop got tangled (multiple stale pytest processes contended on the same Postgres test schema; the suite design rebuilds the schema per test which doesn't tolerate concurrent runs well), and the SSE work is the kind of cross-layer surface a second pair of eyes is most valuable on.
What Codex should look at:
- The new SSE endpoint `stream_escalations` in `backend/app/api/endpoints/session_handoffs.py`, and the in-memory pub/sub bus at `backend/app/core/escalation_bus.py`.
- Whether the bus's single-process / non-durable design is acceptable for the v1 pilot (Railway single-replica) and what the swap-to-Redis story should look like.
- The dispatch wiring in `backend/app/services/handoff_manager.py`: `dispatch_escalation_notifications` now publishes to the bus before the email fan-out. Race / ordering / failure-mode review.
- Auth on the SSE stream: same `require_engineer_or_admin` dep as `/queue` and `/claim`. Browsers can't send custom headers via the native `EventSource` API; the planned frontend uses a fetch-based `ReadableStream` reader (matching the existing `streamDocumentation` pattern in `frontend/src/api/aiSessions.ts`). Verify that's the right call vs. a query-token scheme.
- Whether the bus's "drop-on-full-queue" semantic is acceptable, given a stuck subscriber would silently miss live-arrival cards (they'd still see them on next page load via REST `/queue`).
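For reviewers weighing the drop-on-full question, here is a minimal sketch of that semantic. It is not the code in `escalation_bus.py`: the class name `EscalationBus` and the `subscribe` / `unsubscribe` / `publish` methods are hypothetical, and only the per-subscriber `asyncio.Queue`, the 64-event bound, and the account scoping come from this document.

```python
import asyncio
from collections import defaultdict


class EscalationBus:
    """Hypothetical in-memory pub/sub: one bounded asyncio.Queue per subscriber."""

    def __init__(self, maxsize: int = 64) -> None:
        self._maxsize = maxsize
        # account_id -> subscriber queues, so fan-out stays account-scoped.
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, account_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers[account_id].add(queue)
        return queue

    def unsubscribe(self, account_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[account_id].discard(queue)

    def publish(self, account_id: str, event: dict) -> int:
        """Offer the event to every subscriber of the account.

        Returns the number of subscribers that accepted it.
        """
        delivered = 0
        for queue in list(self._subscribers[account_id]):
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Drop-on-full: a stuck subscriber misses this event
                # instead of blocking the publisher.
                pass
        return delivered
```

Because `publish` never awaits, a slow subscriber cannot stall the dispatcher. The cost is the silent miss described above, which this sketch only surfaces through the return count.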
Resume point (after Codex review)
- Get the test suite back to green. Stale pytest zombies in the container were cleared (PIDs 1790034, 1844996, 1883167, 1916565, 1935830, 2009437, 2009449: all dead, parent uvicorn-reload didn't reap them; PID slots remain but no live processes). Re-run with `pytest -n auto` to keep wall-clock manageable. Files: `tests/test_escalation_bus.py` (7 tests), the 4 new dispatch + SSE tests in `tests/test_handoff_manager.py` and `tests/test_session_handoffs_api.py`.
- Frontend SSE subscription in `EscalationQueue.tsx`: fetch-based reader, prepend new cards with the locked 200ms slide-in, reconnect with backoff, tab-title flash when backgrounded, respect `prefers-reduced-motion`. Then ship the magic-moment handoff-context screen (4 sections, dissolves into FlowPilot session view).
- Push the branch + open a draft PR.
Stack
WIP (uncommitted): SSE bus + endpoint + dispatcher publish + 7 bus tests + 1 dispatcher test + 2 SSE endpoint tests
a283d0d docs(ai): refresh handoff state mid-flight on Escalation Mode build
9f0bfd4 feat(escalations): mount time-to-first-action stat-card on /escalations
07d0db9 feat(handoff): email engineer-or-admin teammates on escalation
7a5b853 feat(api): role-gate handoff claim to engineer-or-admin
52f6d03 feat(analytics): add escalation time-to-first-action metric endpoint
d51e95c docs(plans): add escalation-mode wedge design + test plan
Where things stand
- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required.
- The 20 tests passing as of `9f0bfd4` are still passing (last green run logged before the SSE work). The newly added SSE tests (7 bus + 1 dispatcher integration + 2 endpoint) HAVE NOT been verified end-to-end this session: they ran clean on the bus suite alone (7/7 in 0.14s) but the DB-backed integration tests were aborted before completing.
- The plan doc at `docs/plans/2026-04-27-escalation-mode-wedge-design.md` is the source of truth for every UI / metric / scope decision. The embedded GSTACK REVIEW REPORT at the bottom shows Eng + Design CLEARED and Codex INFO from the design-stage pass.
Useful breadcrumbs
- Metric endpoint: `backend/app/api/endpoints/flowpilot_analytics.py`, `get_escalation_metrics` at the bottom.
- Notification dispatch (email + bus publish): `backend/app/services/handoff_manager.py`, `dispatch_escalation_notifications`. Wired in `backend/app/api/endpoints/session_handoffs.py` after `db.commit()` so a rolled-back handoff never emails or fans out.
- SSE endpoint (WIP): `backend/app/api/endpoints/session_handoffs.py`, `stream_escalations`. Heartbeat every 25s, account-scoped subscribe, role-gated to engineer-or-admin.
- Pub/sub bus (WIP): `backend/app/core/escalation_bus.py`. Module-level singleton, in-memory, `asyncio.Queue` per subscriber with 64-event maxsize and drop-on-full semantics.
- Frontend stat-card: `frontend/src/components/flowpilot/EscalationMetricCard.tsx`. Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer.
- Two-metric framing: required reading before quoting any number to a pilot. In-product endpoint measures post-claim time-to-first-action; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc).
- The `notification_sent` boolean is intentionally NOT being written. Per Codex's design-stage correction it should be replaced by per-channel delivery records; v1.x story. For now application logs are the audit trail.
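The heartbeat noted on the SSE endpoint breadcrumb can be pictured with the sketch below. It is an illustration under assumptions, not the body of `stream_escalations`: the generator name `sse_events`, the `escalation` event name, and the payload shape are all hypothetical. Only the 25s interval and the per-subscriber queue come from this document.

```python
import asyncio
import json
from typing import AsyncIterator


async def sse_events(
    queue: asyncio.Queue, heartbeat_seconds: float = 25.0
) -> AsyncIterator[str]:
    """Yield SSE frames from a subscriber queue, with heartbeats when idle."""
    while True:
        try:
            event = await asyncio.wait_for(queue.get(), timeout=heartbeat_seconds)
        except asyncio.TimeoutError:
            # A line starting with ":" is an SSE comment. Clients ignore it,
            # but it keeps idle connections from being closed by proxies.
            yield ": heartbeat\n\n"
            continue
        # "escalation" is a hypothetical event name chosen for this sketch.
        yield f"event: escalation\ndata: {json.dumps(event)}\n\n"
```

A framework streaming response can wrap a generator like this. The endpoint would still need to unsubscribe the queue from the bus when the client disconnects, which the sketch leaves out.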
Watch-outs
- `ai_session_step` has NO `user_id` column, so the metric query keys "first action by senior" off `session_id + created_at > claimed_at`. Fine for v1 because session activity post-claim IS the senior's activity (session reactivates under `escalated_to_id`).
- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). Use it directly; don't join through `ai_sessions`.
- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is a v2 TODO.
- The test suite uses `DROP SCHEMA public CASCADE` + `CREATE SCHEMA public` per test (see `backend/tests/conftest.py:144`). Concurrent pytest runs against the same test DB collide. Always run one suite at a time, or via `-n auto` xdist with the per-worker-DB isolation already in conftest.
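As a reminder of why `-n auto` is safe where two separate pytest invocations are not, the sketch below shows one common way to derive a per-worker database name. It is not copied from `backend/tests/conftest.py`, whose actual mechanism is not shown here. The helper names and the `app_test` base name are hypothetical; the `gw0`-style worker ids and the `PYTEST_XDIST_WORKER` environment variable come from pytest-xdist.

```python
import os


def worker_database_name(base_name: str, worker_id: str) -> str:
    """Return a test DB name unique to one pytest-xdist worker."""
    # pytest-xdist names workers "gw0", "gw1", ...; without -n the
    # worker_id fixture reports "master", so the base name is reused.
    if worker_id == "master":
        return base_name
    return f"{base_name}_{worker_id}"


def current_worker_id() -> str:
    """Read the worker id that pytest-xdist exports in each worker process."""
    return os.environ.get("PYTEST_XDIST_WORKER", "master")
```

Two plain `pytest` runs started side by side both resolve to the base name, so each run's per-test `DROP SCHEMA` destroys the other's tables. Workers under a single `-n auto` run each get their own database.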
Kill-switch (week 8)
If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.