From a283d0d3fdc45949a0d24e9348b5b3de006b43f4 Mon Sep 17 00:00:00 2001
From: Michael Chihlas <michael@resolutionflow.com>
Date: Mon, 27 Apr 2026 16:38:14 -0400
Subject: [PATCH] docs(ai): refresh handoff state mid-flight on Escalation Mode
 build

Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:

- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
  this branch, what remains (WebSocket push, magic-moment screen, analytics
  page, e2e), and the two-metric framing readers MUST internalize before
  quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
  notification dual-path); includes suggested first slice + watch-outs
  (no user_id on ai_session_step, denormalized account_id, peer-escalation
  still gated to session owner)
- Both files reference the design doc and the kill-switch criterion

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
 .ai/CURRENT_TASK.md | 44 +++++++++++++++++++++++++++---------
 .ai/HANDOFF.md      | 54 +++++++++++++++++++++++++++++++--------------
 2 files changed, 71 insertions(+), 27 deletions(-)

diff --git a/.ai/CURRENT_TASK.md b/.ai/CURRENT_TASK.md
index 6879f473..5e8d9314 100644
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -1,20 +1,42 @@
 # CURRENT_TASK.md
 
-**Task:** No active task — pick from [`TODO.md`](TODO.md).
+**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
 
-**Status:** ready for next pickup.
+**Status:** in-flight on `feat/escalation-mode` (currently `feat/escalation-metric-endpoint`). Backend metric + role gate + email notification shipped. Frontend stat-card mounted. **Next:** WebSocket/SSE push (live-arrival half of the dual-path) and the magic-moment handoff-context screen.
 
-## Recommended next moves
+**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim-role-gate + per-channel notification model all applied to the plan and the code.
 
-1. **Promote `CI / e2e (pull_request)` to required on `main`.** Two consecutive PR runs (#150 and #153) have now finished green on the e2e job. That was the threshold the prior CI-recovery task set for promoting it. Branch protection update only — no code change.
-2. **Pick a backlog item.** Top of `TODO.md` "Up next" is the `data-testid` e2e-stability work (PR #152 spent five one-line selector updates chasing UI churn — adding stable test IDs to a small set of high-value elements would make those tests immune to copy/route renames). The new `currentChatRef` silent-return audit added in #153's session is in Backlog and is a natural pairing with the bug fix that was just shipped.
+**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once the build is feature-complete.
+
+## Done so far on `feat/escalation-metric-endpoint`
+
+| Commit | What it ships |
+|---|---|
+| `d51e95c` | Plan + test-plan artifacts checked in |
+| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated; 9 tests including multi-tenant isolation |
+| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin (was viewer-claimable); 2 tests |
+| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression test; 4 tests |
+| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list; consumes the new endpoint; matches DESIGN-SYSTEM tokens |
+
+20 backend tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. Frontend `tsc -b` clean. Nothing pushed yet.
+
+## Remaining work on this branch
+
+1. **WebSocket/SSE push** for live escalation arrival in the queue — the second half of the notification dual-path. Senior already on the queue page sees a new card slide in within ~1s of the junior hitting Escalate. ~3-4 days of work split across multiple commits (connection manager, auth-scoped fan-out, frontend EventSource handling, reconnect, slide-in animation, tab-title flash).
+2. **Magic-moment handoff-context screen** — 4-section view (problem summary / what's been tried / AI assessment / Start here CTA) that loads on Pick Up before dissolving into the regular FlowPilot session view. ~1.5-2 days.
+3. **Owner-facing analytics page** at `/analytics/escalations` — period selector, conversion-rate, trend chart. ~0.5d.
+4. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
+
+## Two-metric framing — read this before quoting numbers to anyone
+
+The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
+
+## Kill-switch
+
+Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative (deterministic-ops territory) for context, but don't pivot before the data lands.
 
 ## Previous task — closed out
 
-**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane.
+**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`. E2e regression test now in the suite.
 
-**Status:** complete (2026-04-26).
-
-- PR #153 merged as commit `68fcdc6` on `main`. Backend, frontend, and e2e all green on the merged SHA after the env-var fix.
-- E2e CI needed a stub `ANTHROPIC_API_KEY` in the workflow so the AI-gated `POST /api/v1/ai-sessions` endpoint stops returning 503; the Playwright `page.route` stub still intercepts the actual `/chat` call in the browser, so no real Anthropic traffic occurs.
-- Regression test `frontend/e2e/assistant-chat-prefill.spec.ts` is part of the e2e suite going forward.
+**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green PR runs (#150 and #153) cleared the threshold. Ops-only.
diff --git a/.ai/HANDOFF.md b/.ai/HANDOFF.md
index 54190f44..4c96a7ab 100644
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,27 +2,49 @@
 
 # HANDOFF.md
 
-**Last updated:** 2026-04-26 04:55 EDT
+**Last updated:** 2026-04-27 EDT
 
-**Active task:** None — pick from [`TODO.md`](TODO.md). See [`CURRENT_TASK.md`](CURRENT_TASK.md) for recommended next moves.
+**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only.
 
-**Branch:** `main` is the home position. Recent merges: PR #150 (CI recovery, `87bb20b`), PR #153 (prefill `currentChatRef` fix, `68fcdc6`).
+**Branch:** `feat/escalation-metric-endpoint` — five commits stacked on top of `main` (`c0ed6d9`). Nothing pushed yet.
+
+```
+9f0bfd4  feat(escalations): mount time-to-first-action stat-card on /escalations
+07d0db9  feat(handoff): email engineer-or-admin teammates on escalation
+7a5b853  feat(api): role-gate handoff claim to engineer-or-admin
+52f6d03  feat(analytics): add escalation time-to-first-action metric endpoint
+d51e95c  docs(plans): add escalation-mode wedge design + test plan
+```
+
+## Resume point
+
+Pick up the **WebSocket/SSE push** — the live-arrival half of the notification dual-path. Email is already wired (commit `07d0db9`); push is the second channel that makes the demo's "30-second magic moment" undeniable when the receiving senior is online and on the queue page.
+
+Suggested first slice: a thin server-side SSE endpoint scoped to `current_user.account_id`, fan out from `HandoffManager.dispatch_escalation_notifications` (alongside email), and hook the frontend `EscalationQueue` to subscribe and prepend new cards with the locked 200ms slide-in. Reconnect logic, tab-title flash, and `prefers-reduced-motion` respect are part of this slice per the locked UI spec in the design doc.
+
+After the dual-path is feature-complete, the **magic-moment handoff-context screen** is next (4 sections, dissolves into the FlowPilot session view on first action).
 
 ## Where things stand
 
-- CI is healthy on `main`: backend, frontend, and e2e all green on the latest commits.
-- Branch protection on `main`: PR-only merges, force-push blocked, **`CI / frontend (pull_request)` required**, **`CI / backend (pull_request)` required**, `CI / e2e (pull_request)` not yet required.
-- Two consecutive PR runs (#150, #153) finished green on e2e. The "promote e2e to required" gate from the prior task is now satisfiable.
-- Backend AI-gated endpoints (`POST /ai-sessions`, `/chat`, `/respond`, etc.) call `_require_ai_enabled()` and return 503 if no provider key is set. The e2e CI job now sets a stub `ANTHROPIC_API_KEY` so any future test that exercises those flows can rely on it; tests should still stub the actual AI calls in the browser via `page.route` so no real Anthropic traffic occurs.
-
-## Immediate next steps
-
-1. (Optional, ops-only) Promote `CI / e2e (pull_request)` to required on `main` in Gitea branch protection.
-2. Pick the next backlog item from `TODO.md`. Top of "Up next" is the `data-testid` e2e-stability audit; the new `currentChatRef` silent-return audit (added to backlog in this session) is a natural pairing with the bug fix that just shipped.
+- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required (ops-only follow-up — two consecutive green runs cleared the threshold).
+- 20 backend tests green on this branch (handoff_manager, session_handoffs_api, flowpilot_analytics_escalations). Frontend `tsc -b` clean. Branch has not been pushed; no CI runs yet.
+- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO with the disposition of all 12 of its findings.
 
 ## Useful breadcrumbs
 
-- The fix that just landed: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) — `currentChatRef.current = session.session_id` after `setActiveChatId` in the dashboard prefill effect.
-- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts).
-- E2e env convention: [`.gitea/workflows/ci.yml`](../.gitea/workflows/ci.yml) — `ANTHROPIC_API_KEY` is stubbed in the e2e job env. Tests that exercise AI-gated endpoints should stub the actual AI calls in the browser, not rely on a real key.
-- Silent-return follow-up entry: [`.ai/TODO.md`](TODO.md), Backlog section.
+- New endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom of the file.
+- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails.
+- Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer.
+- Two-metric framing — required reading before quoting any number to a pilot. The in-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc).
+- The `notification_sent` boolean is intentionally NOT being written. Per Codex's correction it should be replaced by per-channel delivery records; v1.x story. For now, application logs are the audit trail.
+- Two TODOs added during this session: peer-tech escalation (deferred to v2) and the (already moved-in-scope) claim role gate. See [`TODO.md`](TODO.md).
+
+## Watch-outs
+
+- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`, which is fine because session activity post-claim IS the senior's activity (the session is reactivated under `escalated_to_id`). If a future change adds `user_id` to `ai_session_step`, the metric query can become more precise.
+- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). The metric query and any new SSE subscription scoping must use it directly, not join through `ai_sessions`.
+- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is captured as a v2 TODO. Don't widen this without a UX decision.
+
+## Kill-switch (week 8)
+
+If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.