docs(ai): handoff for fresh session — AI consolidation plan locked

- HANDOFF: rewritten resume point. AI summary blocker is the active task; consolidation plan is the path. 5-step implementation order with watch-outs and breadcrumbs. - CURRENT_TASK: updated commit table through 0d1b305. Documents the live-test results (what works, the AI summary blocker), full consolidation design with proposed payload shape. - SESSION_LOG: chronological entry covering live QA bash, two pickup bugs found + fixed, the three Enter/dashboard/timeout fixes, and the architectural smell that surfaced. - DECISIONS: new entry "Consolidate the three per-escalation AI calls into one structured generation" — rejected alternatives (bump timeout further, copy status-update content the wrong way, switch to Haiku) and consequences (5s magic-moment, ~60% token reduction, instant Ticket Notes button, schema enforcement required, migration concerns documented). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:21:30 -04:00
parent 0d1b305619
commit fb2dc222fd
4 changed files with 188 additions and 81 deletions
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,54 +2,64 @@

 # HANDOFF.md

-**Last updated:** 2026-04-28 02:00 EDT
+**Last updated:** 2026-04-29 04:30 EDT

-**Active task:** **Escalation Mode** wedge build. Full status in [`CURRENT_TASK.md`](CURRENT_TASK.md); this file is the resume point.
+**Active task:** **Escalation Mode** wedge — AI generation consolidation. Full status + design in [`CURRENT_TASK.md`](CURRENT_TASK.md). The wedge demo is **demo-blocked** by an empty AI assessment that didn't fix with a timeout bump. Architectural cause: 3 redundant AI calls per escalation; the right fix is to consolidate.

-**Branch:** `feat/escalation-metric-endpoint`. Local tip is `665530f`. **Remote (origin) is at `8914391`** — the last two commits (`0f00ee5`, `665530f`) are local-only because the user is swapping computers and asked for the docs/handoff first. **Push needed on next session before continuing work.** Draft PR #155 is open against `main`.
+**Branch:** `feat/escalation-metric-endpoint` at `0d1b305`. Pushed to origin. Draft PR #155 open.

-## What this session did
+## Where the previous session ended

-Two commits, both untested in a real browser:
+Live QA bash on the wedge demo. Branch state: 4 commits added this session (`0f00ee5`, `665530f`, `b7d7ff0`, `0d1b305`).

-1. **`0f00ee5` feat(escalations): close out plan-locked wedge polish.** Four items from the design-plan audit ([`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md)):
-   - **Live AI assessment refresh** — frontend listener for the `handoff_assessment_ready` SSE event, refetches the handoff and updates `magicHandoff` / `overlayHandoff` in place. Closes the async-assessment loop from `e8ba74e`.
-   - **Suggested-step chips** below the composer in `AssistantChatPage` — surfaces `ai_assessment_data.suggested_steps[]` post-claim, click prefills the input, hides on first send or explicit X.
-   - **Unread 6px dot** on `EscalationQueue` cards — localStorage-persisted seen set (`rf-escalation-seen`), clears on open OR claim (NOT hover; Codex correction).
-   - **Race-condition toast on claim conflict** — new `HandoffAlreadyClaimedError` exception, endpoint returns 409 with structured `{claimed_by_id, claimed_by_name, claimed_at}`, frontend shows `"Already claimed by {name} {time_ago}."` and bounces the loser back to the queue. Backed by 2 new tests; full handoff/escalation suite (34 tests) green.
+**Confirmed working in browser:**

-2. **`665530f` fix(assistant-chat): tag task-lane state with owner chatId.** Structural fix for the recurring "new session shows previous session's task lane" bug. The earlier fix `8914391` only covered the mount-time entry path; this change makes stale data structurally unable to display by adding `taskLaneOwnerChatId` state and a render gate `taskLaneOwnerChatId === activeChatId` ANDed into all three render conditions. Persistence effect now writes ownership chatId, not active chatId — that was the original write-side bug. See [`DECISIONS.md`](DECISIONS.md) for the architecture write-up.
+- Junior escalates → senior bell-icon notification
+- Senior Pick Up → magic-moment screen with handoff data
+- Senior Start Here → chat surface loads with conversation history (`0d1b305` fixed the selectChat-gating bug — was rendering blank before)
+- Sidebar shows picked-up session with "Escalated" pill (`0d1b305`'s `loadChats()` after claim)
+- Suggested-step chips render below the composer
+- Unread 6px dot on queue cards persists across refresh
+- Task-lane regression killed — no stale flash on new sessions
+- Enter-to-submit (Shift+Enter for newline) on `EscalateModal` and `ConcludeSessionModal`
+- `PendingEscalations` rows on dashboard expand to show escalation reason + step count + ticket #

-Verified: `tsc -b` clean after both. Backend handoff/escalation suite (34 tests) green. **Not verified:** anything in a real browser. The user explicitly asked for a debugging session after implementation — that's the next thing.
+**Active blocker:**

-## Resume point
+- **AI assessment never populates** on the magic-moment screen. Bumping the timeout 15s → 45s in `0d1b305` did not fix it in the field. Backend logs from earlier in session showed Sonnet timing out at 15s; the assumption was the call would complete with more headroom, but live test still empty. May be a different failure mode (assessment generating but the bus event firing with `has_assessment: false`, or the frontend subscription not refetching, or the call genuinely failing past 45s).

-1. **First action: `git push` the two local commits.** `0f00ee5` and `665530f` are local-only.
-2. **Visual QA + bug bash.** End-to-end demo flow:
-   - Junior escalates → senior gets bell-icon notification → click → magic-moment screen with **placeholder AI assessment** (because it's now async/background) → assessment populates **in place** within ~5–15s without manual reopen → Start here → chat surface loads with **suggested-step chips** above the composer → click a chip prefills input.
-   - On `/escalations`: backgrounded tab gets `(N)` title prefix when an arrival fires; new card has **6px accent dot** top-right; clicking the card body OR Pick Up clears the dot (verify it persists across refresh, doesn't clear on hover).
-   - Race condition: claim the same handoff from two browsers; loser sees toast `"Already claimed by {name} {time_ago}."` and bounces.
-   - **Task-lane regression check:** create a new session via dashboard prefill / pickup / "New Chat" — the lane must NOT flash the previous session's questions/actions. The user previously reported this happening repeatedly; the fix in `665530f` should kill it. If it still happens, that's the next debug target.
-3. **Deferred follow-ups in `CURRENT_TASK.md`:** snapshot expansion, owner-facing `/analytics/escalations` page, Playwright e2e for the GTM Loom demo path, eventual cleanup of `flowpilot_engine.escalate_session` and the dead `FlowPilotSessionPage.tsx` magic-moment branch.
+## Resume point — DO THIS NEXT
+
+**Replace the three redundant AI calls with a single structured generation.** Full implementation plan in [`CURRENT_TASK.md`](CURRENT_TASK.md) under "Active task — AI generation consolidation." Summary:
+
+1. **Backend:** Replace `_generate_ai_assessment` with one Sonnet call returning structured JSON: `summary_prose` (PSA-flavored) + `what_we_know[]` + `likely_cause` + `suggested_steps[]` + `confidence`. Persist to `SessionHandoff`. Use Anthropic structured output / tool-use to enforce the schema.
+2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read the saved payload (instant). For `client_update` and `email_draft`, run a cheaper Haiku transformation over the saved prose, not a full re-summarization.
+3. **Backend:** Stop calling `_build_escalation_package_enhanced` from the background path — overlapping content. Verify nothing downstream depends on the *enhanced* enriched payload before removing.
+4. **Frontend:** `HandoffContextScreen` reads from the consolidated structured fields. `ConcludeSessionModal`'s "Ticket Notes" button stops generating, just copies the saved prose. "Client Update" / "Email Draft" trigger the cheap transformation.
+5. **Test plan:** magic-moment populates in ~5s. Token spend down ~60%. AI summary blocker resolved.
+
+**Implementation order (suggested):** 1 → 4 (so the magic moment shows the new fields) → 2 → 3 (cleanup) → tests.
+
+**Watch-outs:**
+
+- Schema enforcement matters. Past calls returned freeform prose that doesn't parse into chips. Anthropic structured output / tool-use is the right tool.
+- `escalation_package` JSON column has live data on existing sessions — keep it READABLE, just stop *writing* the enhanced payload from `enrich_escalation_async`. Dual-write the basic snapshot if downstream queue summaries need it.
+- `_generate_ai_assessment` is stubbed in `test_handoff_manager.py` and `test_session_handoffs_api.py` via `AsyncMock`. Update test fixtures when renaming.
+- The frontend assessment-ready SSE subscription (added in `0f00ee5`) is fine as-is — it'll dispatch on the new event payload. No client changes for the live-refresh path.

 ## Useful breadcrumbs

- SSE endpoint: [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`.
- Pub/sub bus: [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py).
- Frontend SSE consumer: [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts) → `streamEscalations` (now dispatches `handoff_created` AND `handoff_assessment_ready`).
- Live-arrival queue UI: [`frontend/src/components/flowpilot/EscalationQueue.tsx`](../frontend/src/components/flowpilot/EscalationQueue.tsx).
+- AI assessment current impl: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `_generate_ai_assessment`, `_generate_ai_assessment_with_timeout`, `enrich_escalation_async`.
+- Status update current impl: [`backend/app/services/flowpilot_engine.py`](../backend/app/services/flowpilot_engine.py) — `generate_status_update`, `_build_status_update_prompt`, `_build_status_update_context`.
+- Enhanced package builder: [`backend/app/services/flowpilot_engine.py`](../backend/app/services/flowpilot_engine.py) — `_build_escalation_package_enhanced` (line ~1694).
 - Magic-moment screen: [`frontend/src/components/flowpilot/HandoffContextScreen.tsx`](../frontend/src/components/flowpilot/HandoffContextScreen.tsx).
- Pickup integration + magic state machine + suggested-step chips + assessment-ready subscription + claim 409 handling + task-lane owner tagging: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx).
- Claim conflict exception: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `HandoffAlreadyClaimedError`, `claim_session`, `enrich_escalation_async`.
- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py).
+- Conclude modal: [`frontend/src/components/assistant/ConcludeSessionModal.tsx`](../frontend/src/components/assistant/ConcludeSessionModal.tsx) — see `handleGenerateStatusUpdate`.
+- Magic-moment integration + suggested-step chips: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx).
+- Test fixtures stubbing the assessment: `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`.

-## Watch-outs
+## Watch-outs (general)

- The two new commits are **local-only** until pushed. Run `git push` before any other work.
- The assessment-ready subscription opens a fresh SSE connection scoped by `assessmentMissing && trackedHandoffId`. If you change the magic-moment lifecycle, double-check the cleanup deps don't churn the subscription.
- The claim conflict path is currently only wired into `AssistantChatPage.handleStartHere`. `useHandoff` (used by `SessionQueuePage`) and `FlowPilotSessionPage.tsx` (dead) were not updated. If `SessionQueuePage` claims start mattering, mirror the same `axios.isAxiosError(e) && e.response?.status === 409` extraction.
- The handoff snapshot is still sparse (`problem_summary, problem_domain, status, step_count, confidence_tier`). Magic-moment "What's been tried" still only shows engineer notes + step count pre-claim.
- `HandoffResponse.ai_assessment_data.confidence` is typed `number` on the frontend but the backend currently emits `'low' | 'medium' | 'high'`. Runtime handles both; type definition is stale.
- Toolbar "Context" button is hidden on revisited active sessions where the senior didn't arrive via magic-moment this session — known scope cut.
- Do not reintroduce `client.stream()`/ASGITransport tests for infinite SSE responses; test the generator directly.
- Bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the obvious swap when horizontal scaling appears.
+- Dev stack on this machine: backend `:8000`, frontend `:5173`, postgres `:5433`. All running via docker-compose. HMR works.
+- Test users (Acme MSP shared account, password `TestPass123!`): `engineer@resolutionflow.example.com` (junior), `teamadmin@resolutionflow.example.com` (senior).
+- The bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the swap when horizontal scaling appears.
+- `streamEscalations` doesn't drive token refresh on a mid-stream 401. Acceptable for v1.