From a283d0d3fdc45949a0d24e9348b5b3de006b43f4 Mon Sep 17 00:00:00 2001 From: Michael Chihlas Date: Mon, 27 Apr 2026 16:38:14 -0400 Subject: [PATCH] docs(ai): refresh handoff state mid-flight on Escalation Mode build Capture the in-flight state of the Escalation Mode wedge build so the next session (or Codex resume) picks up cleanly without re-deriving context: - CURRENT_TASK now describes the wedge, what's done across the 5 commits on this branch, what remains (WebSocket push, magic-moment screen, analytics page, e2e), and the two-metric framing readers MUST internalize before quoting numbers - HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the notification dual-path); includes suggested first slice + watch-outs (no user_id on ai_session_step, denormalized account_id, peer-escalation still gated to session owner) - Both files reference the design doc and the kill-switch criterion No code change. Co-Authored-By: Claude Opus 4.7 --- .ai/CURRENT_TASK.md | 44 +++++++++++++++++++++++++++--------- .ai/HANDOFF.md | 54 +++++++++++++++++++++++++++++++-------------- 2 files changed, 71 insertions(+), 27 deletions(-) diff --git a/.ai/CURRENT_TASK.md b/.ai/CURRENT_TASK.md index 6879f473..5e8d9314 100644 --- a/.ai/CURRENT_TASK.md +++ b/.ai/CURRENT_TASK.md @@ -1,20 +1,42 @@ # CURRENT_TASK.md -**Task:** No active task — pick from [`TODO.md`](TODO.md). +**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call. -**Status:** ready for next pickup. +**Status:** in-flight on `feat/escalation-mode` (currently `feat/escalation-metric-endpoint`). Backend metric + role gate + email notification shipped. Frontend stat-card mounted. **Next:** WebSocket/SSE push (live-arrival half of the dual-path) and the magic-moment handoff-context screen. -## Recommended next moves +**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim-role-gate + per-channel notification model all applied to the plan and the code. -1. **Promote `CI / e2e (pull_request)` to required on `main`.** Two consecutive PR runs (#150 and #153) have now finished green on the e2e job. That was the threshold the prior CI-recovery task set for promoting it. Branch protection update only — no code change. -2. **Pick a backlog item.** Top of `TODO.md` "Up next" is the `data-testid` e2e-stability work (PR #152 spent five one-line selector updates chasing UI churn — adding stable test IDs to a small set of high-value elements would make those tests immune to copy/route renames). The new `currentChatRef` silent-return audit added in #153's session is in Backlog and is a natural pairing with the bug fix that was just shipped. +**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once the build is feature-complete. + +## Done so far on `feat/escalation-metric-endpoint` + +| Commit | What it ships | +|---|---| +| `d51e95c` | Plan + test-plan artifacts checked in | +| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated; 9 tests including multi-tenant isolation | +| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin (was viewer-claimable); 2 tests | +| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression test; 4 tests | +| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list; consumes the new endpoint; matches DESIGN-SYSTEM tokens | + +20 backend tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. Frontend `tsc -b` clean. Nothing pushed yet. + +## Remaining work on this branch + +1. **WebSocket/SSE push** for live escalation arrival in the queue — the second half of the notification dual-path. Senior already on the queue page sees a new card slide in within ~1s of the junior hitting Escalate. ~3-4 days of work split across multiple commits (connection manager, auth-scoped fan-out, frontend EventSource handling, reconnect, slide-in animation, tab-title flash). +2. **Magic-moment handoff-context screen** — 4-section view (problem summary / what's been tried / AI assessment / Start here CTA) that loads on Pick Up before dissolving into the regular FlowPilot session view. ~1.5-2 days. +3. **Owner-facing analytics page** at `/analytics/escalations` — period selector, conversion-rate, trend chart. ~0.5d. +4. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording. + +## Two-metric framing — read this before quoting numbers to anyone + +The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught. + +## Kill-switch + +Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative (deterministic-ops territory) for context, but don't pivot before the data lands. ## Previous task — closed out -**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug that silently dropped AI follow-up responses in the task lane. +**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`. E2e regression test now in the suite. -**Status:** complete (2026-04-26). - -- PR #153 merged as commit `68fcdc6` on `main`. Backend, frontend, and e2e all green on the merged SHA after the env-var fix. -- E2e CI needed a stub `ANTHROPIC_API_KEY` in the workflow so the AI-gated `POST /api/v1/ai-sessions` endpoint stops returning 503; the Playwright `page.route` stub still intercepts the actual `/chat` call in the browser, so no real Anthropic traffic occurs. -- Regression test `frontend/e2e/assistant-chat-prefill.spec.ts` is part of the e2e suite going forward. +**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green PR runs (#150 and #153) cleared the threshold. Ops-only. diff --git a/.ai/HANDOFF.md b/.ai/HANDOFF.md index 54190f44..4c96a7ab 100644 --- a/.ai/HANDOFF.md +++ b/.ai/HANDOFF.md @@ -2,27 +2,49 @@ # HANDOFF.md -**Last updated:** 2026-04-26 04:55 EDT +**Last updated:** 2026-04-27 EDT -**Active task:** None — pick from [`TODO.md`](TODO.md). See [`CURRENT_TASK.md`](CURRENT_TASK.md) for recommended next moves. +**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only. -**Branch:** `main` is the home position. Recent merges: PR #150 (CI recovery, `87bb20b`), PR #153 (prefill `currentChatRef` fix, `68fcdc6`). +**Branch:** `feat/escalation-metric-endpoint` — five commits stacked on top of `main` (`c0ed6d9`). Nothing pushed yet. + +``` +9f0bfd4 feat(escalations): mount time-to-first-action stat-card on /escalations +07d0db9 feat(handoff): email engineer-or-admin teammates on escalation +7a5b853 feat(api): role-gate handoff claim to engineer-or-admin +52f6d03 feat(analytics): add escalation time-to-first-action metric endpoint +d51e95c docs(plans): add escalation-mode wedge design + test plan +``` + +## Resume point + +Pick up the **WebSocket/SSE push** — the live-arrival half of the notification dual-path. Email is already wired (commit `07d0db9`); push is the second channel that makes the demo's "30-second magic moment" undeniable when the receiving senior is online and on the queue page. + +Suggested first slice: a thin server-side SSE endpoint scoped to `current_user.account_id`, fan out from `HandoffManager.dispatch_escalation_notifications` (alongside email), and hook the frontend `EscalationQueue` to subscribe and prepend new cards with the locked 200ms slide-in. Reconnect logic, tab-title flash, and `prefers-reduced-motion` respect are part of this slice per the locked UI spec in the design doc. + +After the dual-path is feature-complete, the **magic-moment handoff-context screen** is next (4 sections, dissolves into the FlowPilot session view on first action). ## Where things stand -- CI is healthy on `main`: backend, frontend, and e2e all green on the latest commits. -- Branch protection on `main`: PR-only merges, force-push blocked, **`CI / frontend (pull_request)` required**, **`CI / backend (pull_request)` required**, `CI / e2e (pull_request)` not yet required. -- Two consecutive PR runs (#150, #153) finished green on e2e. The "promote e2e to required" gate from the prior task is now satisfiable. -- Backend AI-gated endpoints (`POST /ai-sessions`, `/chat`, `/respond`, etc.) call `_require_ai_enabled()` and return 503 if no provider key is set. The e2e CI job now sets a stub `ANTHROPIC_API_KEY` so any future test that exercises those flows can rely on it; tests should still stub the actual AI calls in the browser via `page.route` so no real Anthropic traffic occurs. - -## Immediate next steps - -1. (Optional, ops-only) Promote `CI / e2e (pull_request)` to required on `main` in Gitea branch protection. -2. Pick the next backlog item from `TODO.md`. Top of "Up next" is the `data-testid` e2e-stability audit; the new `currentChatRef` silent-return audit (added to backlog in this session) is a natural pairing with the bug fix that just shipped. +- CI on `main` still healthy. Branch protection: `CI / frontend (pull_request)` required, `CI / backend (pull_request)` required, `CI / e2e (pull_request)` not yet required (ops-only follow-up — two consecutive green runs cleared the threshold). +- 20 backend tests green on this branch (handoff_manager, session_handoffs_api, flowpilot_analytics_escalations). Frontend `tsc -b` clean. Branch has not been pushed; no CI runs yet. +- The plan doc at [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md) is the source of truth for every UI / metric / scope decision. The embedded **GSTACK REVIEW REPORT** at the bottom shows Eng + Design CLEARED and Codex INFO with the disposition of all 12 of its findings. ## Useful breadcrumbs -- The fix that just landed: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx) — `currentChatRef.current = session.session_id` after `setActiveChatId` in the dashboard prefill effect. -- Regression test: [`frontend/e2e/assistant-chat-prefill.spec.ts`](../frontend/e2e/assistant-chat-prefill.spec.ts). -- E2e env convention: [`.gitea/workflows/ci.yml`](../.gitea/workflows/ci.yml) — `ANTHROPIC_API_KEY` is stubbed in the e2e job env. Tests that exercise AI-gated endpoints should stub the actual AI calls in the browser, not rely on a real key. -- Silent-return follow-up entry: [`.ai/TODO.md`](TODO.md), Backlog section. +- New endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py) — `get_escalation_metrics` at the bottom of the file. +- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`. Wired in [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) **after** `db.commit()` so a rolled-back handoff never emails. +- Frontend stat-card: [`frontend/src/components/flowpilot/EscalationMetricCard.tsx`](../frontend/src/components/flowpilot/EscalationMetricCard.tsx). Renders `n_with_action / n_claimed`, avg + median, and the metric_definition disclaimer. +- Two-metric framing — required reading before quoting any number to a pilot. The in-product endpoint measures *post-claim time-to-first-action*; the savings claim is `manual_baseline − in_product`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). +- The `notification_sent` boolean is intentionally NOT being written. Per Codex's correction it should be replaced by per-channel delivery records; v1.x story. For now, application logs are the audit trail. +- Two TODOs added during this session: peer-tech escalation (deferred to v2) and the (already moved-in-scope) claim role gate. See [`TODO.md`](TODO.md). + +## Watch-outs + +- `ai_session_step` has NO `user_id` column — the metric query keys "first action by senior" off `session_id + created_at > claimed_at`, which is fine because session activity post-claim IS the senior's activity (the session is reactivated under `escalated_to_id`). If a future change adds `user_id` to `ai_session_step`, the metric query can become more precise. +- `account_id` is denormalized on `ai_session_step` (Phase 4 RLS pattern). The metric query and any new SSE subscription scoping must use it directly, not join through `ai_sessions`. +- POST `/handoff` still requires the session owner to be the escalator (`AISession.user_id == current_user.id`). Peer-tech escalation is captured as a v2 TODO. Don't widen this without a UX decision. + +## Kill-switch (week 8) + +If 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) but data lands first.