docs(ai): refresh handoff state mid-flight on Escalation Mode build

Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:

- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
  this branch, what remains (WebSocket push, magic-moment screen, analytics
  page, e2e), and the two-metric framing readers MUST internalize before
  quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
  notification dual-path); includes suggested first slice + watch-outs
  (no user_id on ai_session_step, denormalized account_id, peer-escalation
  still gated to session owner)
- Both files reference the design doc and the kill-switch criterion

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-04-27 16:38:14 -04:00
parent 9f0bfd44f9
commit a283d0d3fd
2 changed files with 71 additions and 27 deletions

View File

@@ -1,20 +1,42 @@
# CURRENT_TASK.md
**Task:** No active task — pick from [`TODO.md`](TODO.md).
**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
**Status:** ready for next pickup.
**Status:** in-flight on `feat/escalation-mode` (currently `feat/escalation-metric-endpoint`). Backend metric + role gate + email notification shipped. Frontend stat-card mounted. **Next:** WebSocket/SSE push (live-arrival half of the dual-path) and the magic-moment handoff-context screen.
## Recommended next moves
**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim-role-gate + per-channel notification model all applied to the plan and the code.
1. **Promote `CI / e2e (pull_request)` to required on `main`.** Two consecutive PR runs (#150 and #153) have now finished green on the e2e job. That was the threshold the prior CI-recovery task set for promoting it. Branch protection update only — no code change.
2. **Pick a backlog item.** Top of `TODO.md` "Up next" is the `data-testid` e2e-stability work (PR #152 spent five one-line selector updates chasing UI churn — adding stable test IDs to a small set of high-value elements would make those tests immune to copy/route renames). The new `currentChatRef` silent-return audit added in #153's session is in Backlog and is a natural pairing with the bug fix that was just shipped.
**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once the build is feature-complete.
## Done so far on `feat/escalation-metric-endpoint`
| Commit | What it ships |
|---|---|
| `d51e95c` | Plan + test-plan artifacts checked in |
| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated; 9 tests including multi-tenant isolation |
| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin (was viewer-claimable); 2 tests |
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression test; 4 tests |
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list; consumes the new endpoint; matches DESIGN-SYSTEM tokens |
20 backend tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. Frontend `tsc -b` clean. Nothing pushed yet.
## Remaining work on this branch
1. **WebSocket/SSE push** for live escalation arrival in the queue — the second half of the notification dual-path. Senior already on the queue page sees a new card slide in within ~1s of the junior hitting Escalate. ~3-4 days of work split across multiple commits (connection manager, auth-scoped fan-out, frontend EventSource handling, reconnect, slide-in animation, tab-title flash).
2. **Magic-moment handoff-context screen** — 4-section view (problem summary / what's been tried / AI assessment / Start here CTA) that loads on Pick Up before dissolving into the regular FlowPilot session view. ~1.5-2 days.
3. **Owner-facing analytics page** at `/analytics/escalations` — period selector, conversion-rate, trend chart. ~0.5d.
4. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
## Two-metric framing — read this before quoting numbers to anyone
The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
## Kill-switch
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative (deterministic-ops territory) for context, but don't pivot before the data lands.
## Previous task — closed out
**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane.
**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`. E2e regression test now in the suite.
**Status:** complete (2026-04-26).
- PR #153 merged as commit `68fcdc6` on `main`. Backend, frontend, and e2e all green on the merged SHA after the env-var fix.
- E2e CI needed a stub `ANTHROPIC_API_KEY` in the workflow so the AI-gated `POST /api/v1/ai-sessions` endpoint stops returning 503; the Playwright `page.route` stub still intercepts the actual `/chat` call in the browser, so no real Anthropic traffic occurs.
- Regression test `frontend/e2e/assistant-chat-prefill.spec.ts` is part of the e2e suite going forward.
**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green PR runs (#150 and #153) cleared the threshold. Ops-only.

View File

@@ -2,27 +2,49 @@
# HANDOFF.md
**Last updated:** 2026-04-26 04:55 EDT
**Last updated:** 2026-04-27 EDT
**Active task:** None — pick from [`TODO.md`](TODO.md). See [`CURRENT_TASK.md`](CURRENT_TASK.md) for recommended next moves.
**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only.
**Branch:** `main` is the home position. Recent merges: PR #150 (CI recovery, `87bb20b`), PR #153 (prefill `currentChatRef` fix, `68fcdc6`).
**Branch:** `feat/escalation-metric-endpoint` — five commits stacked on top of `main` (`c0ed6d9`). Nothing pushed yet.
```
9f0bfd4 feat(escalations): mount time-to-first-action stat-card on /escalations
07d0db9 feat(handoff): email engineer-or-admin teammates on escalation
7a5b853 feat(api): role-gate handoff claim to engineer-or-admin
52f6d03 feat(analytics): add escalation time-to-first-action metric endpoint
d51e95c docs(plans): add escalation-mode wedge design + test plan
```
## Resume point
Pick up the **WebSocket/SSE push** — the live-arrival half of the notification dual-path. Email is already wired (commit `07d0db9`); push is the second channel that makes the demo's "30-second magic moment" undeniable when the receiving senior is online and on the queue page.
Suggested first slice: a thin server-side SSE endpoint scoped to `current_user.account_id`, fan out from `HandoffManager.dispatch_escalation_notifications` (alongside email), and hook the frontend `EscalationQueue` to subscribe and prepend new cards with the locked 200ms slide-in. Reconnect logic, tab-title flash, and `prefers-reduced-motion` respect are part of this slice per the locked UI spec in the design doc.
After the dual-path is feature-complete, the **magic-moment handoff-context screen** is next (4 sections, dissolves into the FlowPilot session view on first action).
## Where things stand
- CI is healthy on `main`: backend, frontend, and e2e all green on the latest commits.
- Branch protection on `main`: PR-only merges, force-push blocked, **`CI / frontend (pull_request)` required**, **`CI / backend (pull_request)` required**, `CI / e2e (pull_request)` not yet required.
- Two consecutive PR runs (#150, #153) finished green on e2e. The "promote e2e to required" gate from the prior task is now satisfiable.
- Backend AI-gated endpoints (`POST /ai-sessions`, `/chat`, `/respond`, etc.) call `_require_ai_enabled()` and return 503 if no provider key is set. The e2e CI job now sets a stub `ANTHROPIC_API_KEY` so any future test that exercises those flows can rely on it; tests should still stub the actual AI calls in the browser via `page.route` so no real Anthropic traffic occurs.
## Immediate next steps
1. (Optional, ops-only) Promote `CI / e2e (pull_request)` to required on `main` in Gitea branch protection.
2. Pick the next backlog item from `TODO.md`. Top of "Up next" is the `data-testid` e2e-stability audit; the new `currentChatRef` silent-return audit (added to backlog in this session) is a natural pairing with the bug fix that just shipped.
- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required (ops-only follow-up — two consecutive green runs cleared the threshold).
- 20 backend tests green on this branch (handoff_manager, session_handoffs_api, flowpilot_analytics_escalations). Frontend `tsc -b` clean. Branch has not been pushed; no CI runs yet.
- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO with the disposition of all 12 of its findings.
## Useful breadcrumbs
- The fix that just landed: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) — `currentChatRef.current = session.session_id` after `setActiveChatId` in the dashboard prefill effect.
- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts).
- E2e env convention: [`.gitea/workflows/ci.yml`](../.gitea/workflows/ci.yml) — `ANTHROPIC_API_KEY` is stubbed in the e2e job env. Tests that exercise AI-gated endpoints should stub the actual AI calls in the browser, not rely on a real key.
- Silent-return follow-up entry: [`.ai/TODO.md`](TODO.md), Backlog section.
- New endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom of the file.
- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails.
- Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer.
- Two-metric framing — required reading before quoting any number to a pilot. The in-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc).
- The `notification_sent` boolean is intentionally NOT being written. Per Codex's correction it should be replaced by per-channel delivery records; v1.x story. For now, application logs are the audit trail.
- Two TODOs added during this session: peer-tech escalation (deferred to v2) and the (already moved-in-scope) claim role gate. See [`TODO.md`](TODO.md).
## Watch-outs
- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`, which is fine because session activity post-claim IS the senior's activity (the session is reactivated under `escalated_to_id`). If a future change adds `user_id` to `ai_session_step`, the metric query can become more precise.
- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). The metric query and any new SSE subscription scoping must use it directly, not join through `ai_sessions`.
- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is captured as a v2 TODO. Don't widen this without a UX decision.
## Kill-switch (week 8)
If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.