docs(ai): handoff for fresh session — AI consolidation plan locked

- HANDOFF: rewritten resume point. AI summary blocker is the active task; consolidation plan is the path. 5-step implementation order with watch-outs and breadcrumbs. - CURRENT_TASK: updated commit table through 0d1b305. Documents the live-test results (what works, the AI summary blocker), full consolidation design with proposed payload shape. - SESSION_LOG: chronological entry covering live QA bash, two pickup bugs found + fixed, the three Enter/dashboard/timeout fixes, and the architectural smell that surfaced. - DECISIONS: new entry "Consolidate the three per-escalation AI calls into one structured generation" — rejected alternatives (bump timeout further, copy status-update content the wrong way, switch to Haiku) and consequences (5s magic-moment, ~60% token reduction, instant Ticket Notes button, schema enforcement required, migration concerns documented). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:21:30 -04:00
parent 0d1b305619
commit fb2dc222fd
4 changed files with 188 additions and 81 deletions
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -2,68 +2,102 @@

 **Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.

-**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch is pushed; **draft PR #155** is open against `main` ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Backend is **feature-complete and test-stabilized**. **Frontend live-arrival SSE subscription**, **magic-moment handoff-context screen**, and **bell-icon notification fix** all shipped. **`/escalate` and `/handoff` are now unified** through `HandoffManager` — every escalation creates a SessionHandoff, persists an AppNotification, fans out on the SSE bus, dispatches Slack/Teams via `notify()`, and emails per-user, regardless of which URL it entered through. **Next:** visual QA via `/qa`, then optional follow-ups (suggested-step chips, snapshot expansion, analytics page, Playwright e2e).
+**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch pushed; **draft PR #155** open ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Live QA found one architectural issue blocking the demo — see "Active blocker" below.

-**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim role gate + per-channel notification model + SSE bus diagnostics all applied.
+**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED.

-**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once feature-complete.
+**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md).
+
+## Active blocker — AI assessment still empty after pickup
+
+**The bug** (live-test confirmed 2026-04-29): senior picks up an escalation, magic-moment screen renders with the "AI assessment is still generating" placeholder, and **the placeholder never clears**. Bus event fires with `has_assessment: false` because `_generate_ai_assessment` is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` from 15 → 45 (commit `0d1b305`) didn't fix it in the field.
+
+**Why patching is the wrong move:** the real architectural issue is that we make **three** AI calls per escalation, all summarizing the same source material:
+
+1. `_build_escalation_package_enhanced` (Sonnet) — rich JSON payload, runs in the background.
+2. `_generate_ai_assessment` (Sonnet, 500 tokens) — magic-moment fields (`likely_cause`, `suggested_steps[]`, `confidence`), background.
+3. `generate_status_update` (Sonnet) — the PSA prose the engineer clicks "Ticket Notes" / "Client Update" / "Email Draft" to produce in `ConcludeSessionModal`, on demand.
+
+User's correct observation (2026-04-29): the engineer is *typically* generating a status update during the escalate flow anyway. There's no reason to do that work three times.
+
+**Next active task: consolidate the three calls into one.** See `## Active task — AI generation consolidation` below.
+
+## Active task — AI generation consolidation
+
+**Goal:** ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.
+
+**Proposed shape** (decide during implementation):
+
+```python
+# Persist on SessionHandoff:
+{
+  "summary_prose": "<PSA-flavored ticket-notes paragraph>",
+  "what_we_know": ["<one-liner>", ...],
+  "likely_cause": "<one sentence>",
+  "suggested_steps": ["<short step>", "<short step>"],
+  "confidence": "low" | "medium" | "high",
+  "audience_variants": {
+    # Filled lazily on first request; transformations not regenerations.
+    "client_update": null,
+    "email_draft": null,
+  }
+}
+```
+
+**Implementation order (suggested):**
+
+1. **Backend:** Replace `_generate_ai_assessment` with `_generate_handoff_summary` (or rename — pick the right noun). One Sonnet call, structured JSON response, persisted to `handoff.ai_assessment_data` + a new `handoff.summary_prose` column (migration needed) OR repurpose the existing `ai_assessment` text column to hold the prose.
+2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For `client_update` / `email_draft`, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
+3. **Backend:** Drop `_build_escalation_package_enhanced` from the background path — its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the `ai_session.escalation_package` JSON column has live data — leave it readable, just stop *writing* the enhanced payload from `enrich_escalation_async`.
+4. **Frontend:** `HandoffContextScreen` reads from the new structured fields. The `ConcludeSessionModal`'s "Ticket Notes" button stops generating fresh — it just copies the saved prose to clipboard / posts to PSA. "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
+5. **Test plan:** Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
+
+**Watch-outs:**
+
+- The schema for the structured response needs to be enforced — past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
+- Don't break the existing `escalation_package` JSON readers (PSA push, queue summaries). Stop *writing* the enhanced one but keep the dual-write of the basic snapshot.
+- `_generate_ai_assessment` is referenced in tests (`test_handoff_manager.py` stubs it via `AsyncMock`). Update test fixtures when renaming.

 ## Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)

 | Commit | What it ships |
 |---|---|
 | `d51e95c` | Plan + test-plan artifacts |
-| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated |
+| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action |
 | `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin |
-| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression |
+| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates |
 | `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
-| `a283d0d` | `.ai/` mid-flight refresh |
-| `87bd0b7` | **WIP** marker for the SSE backend slice (paused for Codex pass) |
-| `bc15952` | Codex: stabilize SSE backend tests — `Depends(..., scope="function")` releases auth DB deps before the long-lived stream body; SSE handshake test calls the generator directly; AI-assessment stub fixture; bus normalizes string vs UUID account_id |
-| `fff8338` | Doc-only: track escalation assessment latency follow-up |
-| `9bdd995` | Bound escalation assessment latency to `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s); handoff still creates if assessment times out |
-| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx` — fetch-based `ReadableStream` reader; `handoff_created` triggers refetch + prepend with locked 200ms slide-in; exponential-backoff reconnect; tab-title flash when backgrounded; `prefers-reduced-motion` honored; ARIA live-region |
-| `f65b657` | Handoff state docs after frontend SSE slice lands |
-| `8e9d22e` | Magic-moment handoff-context screen on pickup — `HandoffContextScreen.tsx` (4 sections, graceful null AI assessment, focus management, prefers-reduced-motion); `FlowPilotSessionPage.tsx` integration (pre-claim handoff fetch, claim on Start here, toolbar re-open overlay) |
-| `c194ba4` | Handoff state docs after magic-moment screen lands |
-| `641853a` | Bell-icon notification opens the pickup flow — notification link template adds `?pickup=true`; GET `/ai-sessions/{id}` allows account-scoped read for `requesting_escalation` / `escalated` states |
-| `2a2329a` | Handoff state docs after bell-icon fix; record draft PR #155 |
-| `029680a` | Unify `/escalate` through `HandoffManager` — single canonical path for every escalation. `HandoffCreateRequest.target_user_id`, `create_handoff` does the legacy enriched-package work + sets `escalation_reason`, `finalize_escalation` runs documentation + PSA push + `notify()` pre-commit, `dispatch_escalation_notifications` keeps only fire-and-forget IO post-commit. `pickup_session` accepts either status for in-flight migration. `flowpilot_engine.escalate_session` no longer called from any endpoint |
-| `8914391` | First task-lane race fix — initializer-time guards (`incomingPrefill || isPickup`) + eager `sessionStorage.removeItem` in `resetSessionDerivedState`. Insufficient (only covered mount-time entry paths) |
-| `0f00ee5` | Four plan-locked wedge polish items in one commit — see "Just shipped" section below |
-| `665530f` | **Structural fix for the task-lane stale-flash bug.** `taskLaneOwnerChatId` state tags the chatId the in-memory questions/actions belong to. Set at every populate site (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix); cleared in `resetSessionDerivedState`. Persistence effect now writes `chatId: ownerChatId` (was `activeChatId` — that was the original write-side bug). Render gate `taskLaneIsForActiveChat = ownerChatId === activeChatId` ANDed into all three render conditions. Stale data is now structurally unable to display. See DECISIONS entry for full rationale |
+| `bc15952` | Codex: stabilize SSE backend tests |
+| `9bdd995` | Bound escalation assessment latency (ORIGINAL: 5s) |
+| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx` — live-arrival animations |
+| `8e9d22e` | Magic-moment handoff-context screen on pickup |
+| `641853a` | Bell-icon notification opens the pickup flow |
+| `029680a` | Unify `/escalate` through `HandoffManager` |
+| `8914391` | First task-lane race fix (insufficient — see `665530f`) |
+| `0f00ee5` | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
+| `665530f` | Structural task-lane fix — `taskLaneOwnerChatId` tagging |
+| `b7d7ff0` | docs(ai): refresh handoff for compute swap |
+| `0d1b305` | **Live-test fixes**: selectChat-gating bug (loadedChatIdsRef), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |

-**Test status:** full backend suite → `1103 passed in 259.63s` with `-n auto` after the unification. Frontend `tsc -b` clean. End-to-end smoke test against the running dev stack confirmed: SSE handshake delivers `ready` + `handoff_created` frames; `listHandoffs` returns the unclaimed handoff for a senior pre-claim; `claimHandoff` flips session status `escalated` → `active`; senior (non-owner, non-target) can `GET` an in-transit session detail; **a single legacy `/escalate` call now produces status='escalated', SessionDocumentation, SessionHandoff row, AppNotification with link `/pilot/{id}?pickup=true` for the team admin, and a PSA push attempt** — all from one funneled HandoffManager call. Branch pushed; draft PR #155 open.
+## Live-test results (2026-04-29 morning)

-## Remaining work on this branch
+After the structural task-lane fix and the four polish items, end-to-end test confirmed:

-1. **Visual QA + bug bash** in a real browser — full pickup demo path with the four new pieces below; this is the next active step.
-2. **Snapshot expansion in `HandoffManager._generate_snapshot`** — include the recent diagnostic steps / conversation tail so the magic-moment screen's "What's been tried" section can render the actual timeline pre-claim instead of "full timeline available after pickup".
-3. **Toolbar Context button on legacy-arrival sessions** — currently the button only appears when the senior arrived via the magic-moment flow this session. Lazy-fetching the handoff list on session-load (when status was-escalated) would make it work on revisits.
-4. **Owner-facing analytics page** at `/analytics/escalations` — period selector, conversion-rate, trend chart. ~0.5d. Optional for v1 demo.
-5. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives via SSE → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
+- ✅ Junior escalates → senior gets bell-icon notification.
+- ✅ Magic-moment screen renders with handoff data on Pick Up.
+- ✅ Senior's chat surface loads with conversation history (after `0d1b305`'s selectChat fix — was completely broken before).
+- ✅ Sidebar shows the picked-up session with the "Escalated" pill (after `0d1b305`'s `loadChats()` call).
+- ✅ Suggested-step chips render below the composer.
+- ✅ Unread 6px dot on queue cards.
+- ✅ Task-lane regression is gone — no stale flash on new sessions.
+- ❌ **AI assessment placeholder never clears.** Drives the consolidation work above.

-## Just shipped (this session — 2 commits)
-
-**Commit `0f00ee5`** — four plan-locked wedge polish items:
-
- **Live AI assessment refresh on the magic-moment screen.** New `HandoffAssessmentReadyEvent` type + `onAssessmentReady` handler on `streamEscalations`. `AssistantChatPage` opens a scoped SSE subscription whenever it has a tracked handoff with no AI assessment yet; on a matching event it refetches and replaces both `magicHandoff` and `overlayHandoff` in place. Closes the loop on the async-assessment commit `e8ba74e`.
- **Suggested-step chips below the chat input.** New `chipsHidden` state in `AssistantChatPage` defaulting to false; a chip strip renders above the composer when `magicHandoff?.ai_assessment_data?.suggested_steps[]` is non-empty and the magic-moment has dissolved. Click prefills input + focus; first send hides the strip; explicit X also hides. Per-session lifetime (Codex correction locked design).
- **Unread 6px dot on `EscalationQueue` cards.** localStorage-persisted seen set (`rf-escalation-seen`, capped 200). Dot renders top-right of any card not yet seen. Cleared on **open (card click) or claim (Pick Up)** — NOT on hover (Codex correction). Pick Up onClick now stops propagation so the wrapper's open handler isn't double-fired.
- **Race-condition toast on claim conflict.** New `HandoffAlreadyClaimedError` exception class in `handoff_manager.py`. `claim_session` now eager-loads `claimed_by_user`, rejects different-user re-claims (idempotent for same-user), and raises with the winner's id/name/timestamp. Endpoint translates to 409 with structured detail. `AssistantChatPage.handleStartHere` extracts the detail, formats `"Already claimed by {name} {time_ago}."` via `timeAgo()`, drops `?pickup=true`, and dismisses the magic-moment so the loser flows back to the queue. Backed by 2 new unit tests in `test_handoff_manager.py`.
-
-**Commit `665530f`** — structural fix for the recurring stale-task-lane bug. Owner-tagging pattern applied to `activeQuestions` / `activeActions` / `showTaskLane`. See [`DECISIONS.md`](DECISIONS.md) for the architecture write-up. **User-reported on next session: needs visual verification.**
+Untested live (low priority, can verify post-consolidation): race-condition toast (needs second user in same account).

 ## Two-metric framing — read this before quoting numbers to anyone

-The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
+The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.

 ## Kill-switch

-Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) for context, but data lands first.
-
-## Previous task — closed out
-
-**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`.
-
-**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green runs cleared the threshold. Ops-only.
+Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.