docs(ai): refresh handoff state mid-flight on Escalation Mode build

Capture the in-flight state of the Escalation Mode wedge build so the next session (or Codex resume) picks up cleanly without re-deriving context:

- CURRENT_TASK now describes the wedge, what's done across the 5 commits on this branch, what remains (WebSocket push, magic-moment screen, analytics page, e2e), and the two-metric framing readers MUST internalize before quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the notification dual-path); includes suggested first slice + watch-outs (no user_id on ai_session_step, denormalized account_id, peer-escalation still gated to session owner)
- Both files reference the design doc and the kill-switch criterion

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 16:38:14 -04:00
parent 9f0bfd44f9
commit a283d0d3fd
2 changed files with 71 additions and 27 deletions
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -1,20 +1,42 @@
 # CURRENT_TASK.md

-**Task:** No active task — pick from [`TODO.md`](TODO.md).
+**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.

-**Status:** ready for next pickup.
+**Status:** in-flight on `feat/escalation-mode` (currently `feat/escalation-metric-endpoint`). Backend metric + role gate + email notification shipped. Frontend stat-card mounted. **Next:** WebSocket/SSE push (live-arrival half of the dual-path) and the magic-moment handoff-context screen.

-## Recommended next moves
+**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim-role-gate + per-channel notification model all applied to the plan and the code.

-1. **Promote `CI / e2e (pull_request)` to required on `main`.** Two consecutive PR runs (#150 and #153) have now finished green on the e2e job. That was the threshold the prior CI-recovery task set for promoting it. Branch protection update only — no code change.
-2. **Pick a backlog item.** Top of `TODO.md` "Up next" is the `data-testid` e2e-stability work (PR #152 spent five one-line selector updates chasing UI churn — adding stable test IDs to a small set of high-value elements would make those tests immune to copy/route renames). The new `currentChatRef` silent-return audit added in #153's session is in Backlog and is a natural pairing with the bug fix that was just shipped.
+**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once the build is feature-complete.
+
+## Done so far on `feat/escalation-metric-endpoint`
+
+| Commit | What it ships |
+|---|---|
+| `d51e95c` | Plan + test-plan artifacts checked in |
+| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated; 9 tests including multi-tenant isolation |
+| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin (was viewer-claimable); 2 tests |
+| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression test; 4 tests |
+| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list; consumes the new endpoint; matches DESIGN-SYSTEM tokens |
+
+20 backend tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. Frontend `tsc -b` clean. Nothing pushed yet.
+
+## Remaining work on this branch
+
+1. **WebSocket/SSE push** for live escalation arrival in the queue — the second half of the notification dual-path. Senior already on the queue page sees a new card slide in within ~1s of the junior hitting Escalate. ~3-4 days of work split across multiple commits (connection manager, auth-scoped fan-out, frontend EventSource handling, reconnect, slide-in animation, tab-title flash).
+2. **Magic-moment handoff-context screen** — 4-section view (problem summary / what's been tried / AI assessment / Start here CTA) that loads on Pick Up before dissolving into the regular FlowPilot session view. ~1.5-2 days.
+3. **Owner-facing analytics page** at `/analytics/escalations`: period selector, conversion rate, trend chart. ~0.5 days.
+4. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
+
+## Two-metric framing — read this before quoting numbers to anyone
+
+The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
+
+## Kill-switch
+
+Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative (deterministic-ops territory) for context, but don't pivot before the data lands.

 ## Previous task — closed out

-**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane.
+**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`. E2e regression test now in the suite.

-**Status:** complete (2026-04-26).
-
- PR #153 merged as commit `68fcdc6` on `main`. Backend, frontend, and e2e all green on the merged SHA after the env-var fix.
- E2e CI needed a stub `ANTHROPIC_API_KEY` in the workflow so the AI-gated `POST /api/v1/ai-sessions` endpoint stops returning 503; the Playwright `page.route` stub still intercepts the actual `/chat` call in the browser, so no real Anthropic traffic occurs.
- Regression test `frontend/e2e/assistant-chat-prefill.spec.ts` is part of the e2e suite going forward.
+**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green PR runs (#150 and #153) cleared the threshold. Ops-only.
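
The two-metric framing and the kill-switch criterion in `CURRENT_TASK.md` above both reduce to small arithmetic, sketched here in Python as a reading aid only. This is not code from the branch: the function names are hypothetical, averaging the stopwatch samples with a mean is an assumption (the notes do not say how the 5 samples are aggregated), and the numbers in the usage lines are made up.

```python
from statistics import mean
from typing import Optional, Sequence


def minutes_recovered(manual_baseline_min: Sequence[float],
                      in_product_min: float) -> float:
    """Savings claim = manual_baseline - in_product_metric, in minutes.

    manual_baseline_min: stopwatch timings of manual escalations.
    in_product_min: post-claim time-to-first-action from the endpoint.
    Assumption: the baseline samples are aggregated with a plain mean.
    """
    if not manual_baseline_min:
        # Without a baseline, the in-product number must not be quoted
        # as "minutes recovered" on its own.
        raise ValueError("manual baseline is required")
    return mean(manual_baseline_min) - in_product_min


def kill_switch_tripped(hours_saved_per_week: Sequence[Optional[float]],
                        threshold: float = 1.0) -> bool:
    """True when no pilot has a verifiable number above the threshold.

    None marks a pilot with no verifiable number yet.
    """
    return not any(h is not None and h > threshold
                   for h in hours_saved_per_week)


# Hypothetical usage with invented numbers:
recovered = minutes_recovered([5.0, 6.0, 4.0, 5.0, 5.0], in_product_min=0.5)
revisit_wedge = kill_switch_tripped([None, 0.4, 1.0])
```

Raising on an empty baseline is a deliberate choice in this sketch: it makes the miscount described above (quoting the in-product number alone) fail loudly instead of returning a plausible-looking value.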
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,27 +2,49 @@

 # HANDOFF.md

-**Last updated:** 2026-04-26 04:55 EDT
+**Last updated:** 2026-04-27 EDT

-**Active task:** None — pick from [`TODO.md`](TODO.md). See [`CURRENT_TASK.md`](CURRENT_TASK.md) for recommended next moves.
+**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only.

-**Branch:** `main` is the home position. Recent merges: PR #150 (CI recovery, `87bb20b`), PR #153 (prefill `currentChatRef` fix, `68fcdc6`).
+**Branch:** `feat/escalation-metric-endpoint` — five commits stacked on top of `main` (`c0ed6d9`). Nothing pushed yet.
+
+```
+9f0bfd4  feat(escalations): mount time-to-first-action stat-card on /escalations
+07d0db9  feat(handoff): email engineer-or-admin teammates on escalation
+7a5b853  feat(api): role-gate handoff claim to engineer-or-admin
+52f6d03  feat(analytics): add escalation time-to-first-action metric endpoint
+d51e95c  docs(plans): add escalation-mode wedge design + test plan
+```
+
+## Resume point
+
+Pick up the **WebSocket/SSE push** — the live-arrival half of the notification dual-path. Email is already wired (commit `07d0db9`); push is the second channel that makes the demo's "30-second magic moment" undeniable when the receiving senior is online and on the queue page.
+
+Suggested first slice: a thin server-side SSE endpoint scoped to `current_user.account_id`, fan out from `HandoffManager.dispatch_escalation_notifications` (alongside email), and hook the frontend `EscalationQueue` to subscribe and prepend new cards with the locked 200ms slide-in. Reconnect logic, tab-title flash, and `prefers-reduced-motion` respect are part of this slice per the locked UI spec in the design doc.
+
+After the dual-path is feature-complete, the **magic-moment handoff-context screen** is next (4 sections, dissolves into the FlowPilot session view on first action).

 ## Where things stand

- CI is healthy on `main`: backend, frontend, and e2e all green on the latest commits.
- Branch protection on `main`: PR-only merges, force-push blocked, **`CI / frontend (pull_request)` required**, **`CI / backend (pull_request)` required**, `CI / e2e (pull_request)` not yet required.
- Two consecutive PR runs (#150, #153) finished green on e2e. The "promote e2e to required" gate from the prior task is now satisfiable.
- Backend AI-gated endpoints (`POST /ai-sessions`, `/chat`, `/respond`, etc.) call `_require_ai_enabled()` and return 503 if no provider key is set. The e2e CI job now sets a stub `ANTHROPIC_API_KEY` so any future test that exercises those flows can rely on it; tests should still stub the actual AI calls in the browser via `page.route` so no real Anthropic traffic occurs.
-
-## Immediate next steps
-
-1. (Optional, ops-only) Promote `CI / e2e (pull_request)` to required on `main` in Gitea branch protection.
-2. Pick the next backlog item from `TODO.md`. Top of "Up next" is the `data-testid` e2e-stability audit; the new `currentChatRef` silent-return audit (added to backlog in this session) is a natural pairing with the bug fix that just shipped.
+- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required (ops-only follow-up — two consecutive green runs cleared the threshold).
+- 20 backend tests green on this branch (handoff_manager, session_handoffs_api, flowpilot_analytics_escalations). Frontend `tsc -b` clean. Branch has not been pushed; no CI runs yet.
+- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO with the disposition of all 12 of its findings.

 ## Useful breadcrumbs

- The fix that just landed: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) — `currentChatRef.current = session.session_id` after `setActiveChatId` in the dashboard prefill effect.
- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts).
- E2e env convention: [`.gitea/workflows/ci.yml`](../.gitea/workflows/ci.yml) — `ANTHROPIC_API_KEY` is stubbed in the e2e job env. Tests that exercise AI-gated endpoints should stub the actual AI calls in the browser, not rely on a real key.
- Silent-return follow-up entry: [`.ai/TODO.md`](TODO.md), Backlog section.
+- New endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom of the file.
+- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails.
+- Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer.
+- Two-metric framing: required reading before quoting any number to a pilot. The in-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc).
+- The `notification_sent` boolean is intentionally NOT being written. Per Codex's correction it should be replaced by per-channel delivery records; v1.x story. For now, application logs are the audit trail.
+- Two TODOs added during this session: peer-tech escalation (deferred to v2) and the claim role gate (already moved into scope). See [`TODO.md`](TODO.md).
+
+## Watch-outs
+
+- `ai_session_step` has NO `user_id` column. The metric query identifies the "first action by senior" as the earliest step with a matching `session_id` and `created_at > claimed_at`. That is sound because post-claim session activity IS the senior's activity (the session is reactivated under `escalated_to_id`). If a future change adds `user_id` to `ai_session_step`, the metric query can become more precise.
+- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). The metric query and any new SSE subscription scoping must use it directly, not join through `ai_sessions`.
+- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is captured as a v2 TODO. Don't widen this without a UX decision.
+
+## Kill-switch (week 8)
+
+If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.
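
The first two watch-outs in `HANDOFF.md` above describe how the metric finds the senior's first action. The sketch below restates that rule over in-memory rows so the logic is easy to check. It is an illustration under assumptions, not the query in `flowpilot_analytics.py`: the `Step` class and `time_to_first_action` function are hypothetical, and only the column names (`session_id`, `account_id`, `created_at`, `claimed_at`) come from the notes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Step:
    """Hypothetical stand-in for one ai_session_step row."""
    session_id: str
    account_id: str  # denormalized on the step row, so no join is needed
    created_at: datetime


def time_to_first_action(steps: Iterable[Step], session_id: str,
                         account_id: str,
                         claimed_at: datetime) -> Optional[timedelta]:
    """Delay between the claim and the earliest later step in the session.

    There is no user_id on the step, so any step strictly after
    claimed_at is treated as the senior's activity.
    Returns None when the handoff was claimed but no step followed.
    """
    after_claim = [
        s.created_at
        for s in steps
        if s.account_id == account_id      # tenant scoping on the row itself
        and s.session_id == session_id
        and s.created_at > claimed_at      # strictly post-claim
    ]
    if not after_claim:
        return None
    return min(after_claim) - claimed_at


# Hypothetical usage with invented rows:
claimed = datetime(2026, 4, 27, 12, 0, 0)
rows = [
    Step("s1", "acct-a", datetime(2026, 4, 27, 11, 55, 0)),  # junior, pre-claim
    Step("s1", "acct-a", datetime(2026, 4, 27, 12, 0, 40)),  # first post-claim step
    Step("s1", "acct-a", datetime(2026, 4, 27, 12, 3, 0)),
    Step("s1", "acct-b", datetime(2026, 4, 27, 12, 0, 5)),   # other tenant, ignored
]
delay = time_to_first_action(rows, "s1", "acct-a", claimed)
```

A `None` result corresponds to a handoff that was claimed but never acted on, which is presumably why the stat-card reports `n_with_action / n_claimed` rather than a single count.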