diff --git a/.ai/HANDOFF.md b/.ai/HANDOFF.md index 4c96a7ab..fbf41e8a 100644 --- a/.ai/HANDOFF.md +++ b/.ai/HANDOFF.md @@ -2,13 +2,34 @@ # HANDOFF.md -**Last updated:** 2026-04-27 EDT +**Last updated:** 2026-04-27 EDT (paused mid-build for Codex review) **Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only. -**Branch:** `feat/escalation-metric-endpoint` — five commits stacked on top of `main` (`c0ed6d9`). Nothing pushed yet. +**Branch:** `feat/escalation-metric-endpoint` — six commits stacked on `main` (`c0ed6d9`). Working tree has UNCOMMITTED WIP for the SSE push. + +## Status — paused for Codex review + +Build is paused mid-flight on the SSE push. Hand the branch (and the WIP) to Codex for an outside-voice pass before stacking more commits, fixing tests, or pushing. Reasons: local backend test loop got tangled (multiple stale pytest processes contended on the same Postgres test schema; the suite design rebuilds the schema per test which doesn't tolerate concurrent runs well), and the SSE work is the kind of cross-layer surface a second pair of eyes is most valuable on. + +What Codex should look at: +1. The new SSE endpoint at [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations` — and the in-memory pub/sub bus at [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py). +2. Whether the bus's single-process / non-durable design is acceptable for the v1 pilot (Railway single-replica) and what the swap-to-Redis story should look like. +3. The dispatch wiring in [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications` now publishes to the bus before the email fan-out. Race / ordering / failure-mode review. +4. Auth on the SSE stream — same `require_engineer_or_admin` dep as `/queue` and `/claim`. Browsers can't send custom headers via the native `EventSource` API; the planned frontend uses a fetch-based `ReadableStream` reader (matching the existing `streamDocumentation` pattern in [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts)). Verify that's the right call vs. a query-token scheme. +5. Whether the bus's "drop-on-full-queue" semantic is acceptable, given a stuck subscriber would silently miss live-arrival cards (they'd still see them on next page load via REST `/queue`). + +## Resume point (after Codex review) + +1. **Get the test suite back to green.** Stale pytest zombies in the container were cleared (PIDs 1790034, 1844996, 1883167, 1916565, 1935830, 2009437, 2009449 — all dead, parent uvicorn-reload didn't reap them; PID slots remain but no live processes). Re-run with `pytest -n auto` to keep wall-clock manageable. Files: `tests/test_escalation_bus.py` (7 tests), the 4 new dispatch + SSE tests in `tests/test_handoff_manager.py` and `tests/test_session_handoffs_api.py`. +2. **Frontend SSE subscription** in `EscalationQueue.tsx` — fetch-based reader, prepend new cards with the locked 200ms slide-in, reconnect with backoff, tab-title flash when backgrounded, respect `prefers-reduced-motion`. Then ship the magic-moment handoff-context screen (4 sections, dissolves into FlowPilot session view). +3. Push the branch + open a draft PR. + +## Stack ``` +WIP (uncommitted): SSE bus + endpoint + dispatcher publish + 7 bus tests + 1 dispatcher test + 2 SSE endpoint tests +a283d0d docs(ai): refresh handoff state mid-flight on Escalation Mode build 9f0bfd4 feat(escalations): mount time-to-first-action stat-card on /escalations 07d0db9 feat(handoff): email engineer-or-admin teammates on escalation 7a5b853 feat(api): role-gate handoff claim to engineer-or-admin @@ -16,34 +37,28 @@ d51e95c docs(plans): add escalation-mode wedge design + test plan ``` -## Resume point - -Pick up the **WebSocket/SSE push** — the live-arrival half of the notification dual-path. Email is already wired (commit `07d0db9`); push is the second channel that makes the demo's "30-second magic moment" undeniable when the receiving senior is online and on the queue page. - -Suggested first slice: a thin server-side SSE endpoint scoped to `current_user.account_id`, fan out from `HandoffManager.dispatch_escalation_notifications` (alongside email), and hook the frontend `EscalationQueue` to subscribe and prepend new cards with the locked 200ms slide-in. Reconnect logic, tab-title flash, and `prefers-reduced-motion` respect are part of this slice per the locked UI spec in the design doc. - -After the dual-path is feature-complete, the **magic-moment handoff-context screen** is next (4 sections, dissolves into the FlowPilot session view on first action). - ## Where things stand -- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required (ops-only follow-up — two consecutive green runs cleared the threshold). -- 20 backend tests green on this branch (handoff_manager, session_handoffs_api, flowpilot_analytics_escalations). Frontend `tsc -b` clean. Branch has not been pushed; no CI runs yet. -- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO with the disposition of all 12 of its findings. +- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required. +- The 20 tests passing as of `9f0bfd4` are still passing (last green run logged before the SSE work). The newly added SSE tests (7 bus + 1 dispatcher integration + 2 endpoint) HAVE NOT been verified end-to-end this session — they ran clean on the bus suite alone (7/7 in 0.14s) but the DB-backed integration tests were aborted before completing. +- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO from the design-stage pass. ## Useful breadcrumbs -- New endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom of the file. -- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails. +- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom. +- Notification dispatch (email + bus publish): [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails or fans out. +- SSE endpoint (WIP): [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`. Heartbeat every 25s, account-scoped subscribe, role-gated to engineer-or-admin. +- Pub/sub bus (WIP): [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py). Module-level singleton, in-memory, `asyncio.Queue` per subscriber with 64-event maxsize and drop-on-full semantics. - Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer. -- Two-metric framing — required reading before quoting any number to a pilot. The in-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). -- The `notification_sent` boolean is intentionally NOT being written. Per Codex's correction it should be replaced by per-channel delivery records; v1.x story. For now, application logs are the audit trail. -- Two TODOs added during this session: peer-tech escalation (deferred to v2) and the (already moved-in-scope) claim role gate. See [`TODO.md`](TODO.md). +- Two-metric framing — required reading before quoting any number to a pilot. In-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). +- The `notification_sent` boolean is intentionally NOT being written. Per Codex's design-stage correction it should be replaced by per-channel delivery records; v1.x story. For now application logs are the audit trail. ## Watch-outs -- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`, which is fine because session activity post-claim IS the senior's activity (the session is reactivated under `escalated_to_id`). If a future change adds `user_id` to `ai_session_step`, the metric query can become more precise. -- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). The metric query and any new SSE subscription scoping must use it directly, not join through `ai_sessions`. -- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is captured as a v2 TODO. Don't widen this without a UX decision. +- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`. Fine for v1 because session activity post-claim IS the senior's activity (session reactivates under `escalated_to_id`). +- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). Use it directly; don't join through `ai_sessions`. +- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is a v2 TODO. +- The test suite uses `DROP SCHEMA public CASCADE` + `CREATE SCHEMA public` per test (see [`backend/tests/conftest.py:144`](../backend/tests/conftest.py#L144)). Concurrent pytest runs against the same test DB collide. Always run one suite at a time, or via `-n auto` xdist with the per-worker-DB isolation already in conftest. ## Kill-switch (week 8)