docs(ai): handoff for fresh session — AI consolidation plan locked
- HANDOFF: rewritten resume point. AI summary blocker is the active
task; consolidation plan is the path. 5-step implementation order
with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
live-test results (what works, the AI summary blocker), full
consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
pickup bugs found + fixed, the three Enter/dashboard/timeout
fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
calls into one structured generation" — rejected alternatives
(bump timeout further, copy status-update content the wrong way,
switch to Haiku) and consequences (5s magic-moment, ~60% token
reduction, instant Ticket Notes button, schema enforcement
required, migration concerns documented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -2,68 +2,102 @@
|
||||
|
||||
**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
|
||||
|
||||
**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch is pushed; **draft PR #155** is open against `main` ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Backend is **feature-complete and test-stabilized**. **Frontend live-arrival SSE subscription**, **magic-moment handoff-context screen**, and **bell-icon notification fix** all shipped. **`/escalate` and `/handoff` are now unified** through `HandoffManager` — every escalation creates a SessionHandoff, persists an AppNotification, fans out on the SSE bus, dispatches Slack/Teams via `notify()`, and emails per-user, regardless of which URL it entered through. **Next:** visual QA via `/qa`, then optional follow-ups (suggested-step chips, snapshot expansion, analytics page, Playwright e2e).
|
||||
**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch pushed; **draft PR #155** open ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Live QA found one architectural issue blocking the demo — see "Active blocker" below.
|
||||
|
||||
**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED. Codex's two-metric correction + claim role gate + per-channel notification model + SSE bus diagnostics all applied.
|
||||
**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED.
|
||||
|
||||
**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md) — primary input for `/qa` once feature-complete.
|
||||
**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md).
|
||||
|
||||
## Active blocker — AI assessment still empty after pickup
|
||||
|
||||
**The bug** (live-test confirmed 2026-04-29): senior picks up an escalation, magic-moment screen renders with the "AI assessment is still generating" placeholder, and **the placeholder never clears**. Bus event fires with `has_assessment: false` because `_generate_ai_assessment` is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` from 15 → 45 (commit `0d1b305`) didn't fix it in the field.
|
||||
|
||||
**Why patching is the wrong move:** the real architectural issue is that we make **three** AI calls per escalation, all summarizing the same source material:
|
||||
|
||||
1. `_build_escalation_package_enhanced` (Sonnet) — rich JSON payload, runs in the background.
|
||||
2. `_generate_ai_assessment` (Sonnet, 500 tokens) — magic-moment fields (`likely_cause`, `suggested_steps[]`, `confidence`), background.
|
||||
3. `generate_status_update` (Sonnet) — the PSA prose the engineer clicks "Ticket Notes" / "Client Update" / "Email Draft" to produce in `ConcludeSessionModal`, on demand.
|
||||
|
||||
User's correct observation (2026-04-29): the engineer is *typically* generating a status update during the escalate flow anyway. There's no reason to do that work three times.
|
||||
|
||||
**Next active task: consolidate the three calls into one.** See `## Active task — AI generation consolidation` below.
|
||||
|
||||
## Active task — AI generation consolidation
|
||||
|
||||
**Goal:** ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.
|
||||
|
||||
**Proposed shape** (decide during implementation):
|
||||
|
||||
```python
|
||||
# Persist on SessionHandoff:
|
||||
{
|
||||
"summary_prose": "<PSA-flavored ticket-notes paragraph>",
|
||||
"what_we_know": ["<one-liner>", ...],
|
||||
"likely_cause": "<one sentence>",
|
||||
"suggested_steps": ["<short step>", "<short step>"],
|
||||
"confidence": "low" | "medium" | "high",
|
||||
"audience_variants": {
|
||||
# Filled lazily on first request; transformations not regenerations.
|
||||
"client_update": null,
|
||||
"email_draft": null,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Implementation order (suggested):**
|
||||
|
||||
1. **Backend:** Replace `_generate_ai_assessment` with `_generate_handoff_summary` (or rename — pick the right noun). One Sonnet call, structured JSON response, persisted to `handoff.ai_assessment_data` + a new `handoff.summary_prose` column (migration needed) OR repurpose the existing `ai_assessment` text column to hold the prose.
|
||||
2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For `client_update` / `email_draft`, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
|
||||
3. **Backend:** Drop `_build_escalation_package_enhanced` from the background path — its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the `ai_session.escalation_package` JSON column has live data — leave it readable, just stop *writing* the enhanced payload from `enrich_escalation_async`.
|
||||
4. **Frontend:** `HandoffContextScreen` reads from the new structured fields. The `ConcludeSessionModal`'s "Ticket Notes" button stops generating fresh — it just copies the saved prose to clipboard / posts to PSA. "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
|
||||
5. **Test plan:** Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
|
||||
|
||||
**Watch-outs:**
|
||||
|
||||
- The schema for the structured response needs to be enforced — past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
|
||||
- Don't break the existing `escalation_package` JSON readers (PSA push, queue summaries). Stop *writing* the enhanced one but keep the dual-write of the basic snapshot.
|
||||
- `_generate_ai_assessment` is referenced in tests (`test_handoff_manager.py` stubs it via `AsyncMock`). Update test fixtures when renaming.
|
||||
|
||||
## Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)
|
||||
|
||||
| Commit | What it ships |
|
||||
|---|---|
|
||||
| `d51e95c` | Plan + test-plan artifacts |
|
||||
| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action; account-scoped, engineer-or-admin gated |
|
||||
| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action |
|
||||
| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin |
|
||||
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates on intent=escalate; graceful-degradation regression |
|
||||
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates |
|
||||
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
|
||||
| `a283d0d` | `.ai/` mid-flight refresh |
|
||||
| `87bd0b7` | **WIP** marker for the SSE backend slice (paused for Codex pass) |
|
||||
| `bc15952` | Codex: stabilize SSE backend tests — `Depends(..., scope="function")` releases auth DB deps before the long-lived stream body; SSE handshake test calls the generator directly; AI-assessment stub fixture; bus normalizes string vs UUID account_id |
|
||||
| `fff8338` | Doc-only: track escalation assessment latency follow-up |
|
||||
| `9bdd995` | Bound escalation assessment latency to `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s); handoff still creates if assessment times out |
|
||||
| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx` — fetch-based `ReadableStream` reader; `handoff_created` triggers refetch + prepend with locked 200ms slide-in; exponential-backoff reconnect; tab-title flash when backgrounded; `prefers-reduced-motion` honored; ARIA live-region |
|
||||
| `f65b657` | Handoff state docs after frontend SSE slice lands |
|
||||
| `8e9d22e` | Magic-moment handoff-context screen on pickup — `HandoffContextScreen.tsx` (4 sections, graceful null AI assessment, focus management, prefers-reduced-motion); `FlowPilotSessionPage.tsx` integration (pre-claim handoff fetch, claim on Start here, toolbar re-open overlay) |
|
||||
| `c194ba4` | Handoff state docs after magic-moment screen lands |
|
||||
| `641853a` | Bell-icon notification opens the pickup flow — notification link template adds `?pickup=true`; GET `/ai-sessions/{id}` allows account-scoped read for `requesting_escalation` / `escalated` states |
|
||||
| `2a2329a` | Handoff state docs after bell-icon fix; record draft PR #155 |
|
||||
| `029680a` | Unify `/escalate` through `HandoffManager` — single canonical path for every escalation. `HandoffCreateRequest.target_user_id`, `create_handoff` does the legacy enriched-package work + sets `escalation_reason`, `finalize_escalation` runs documentation + PSA push + `notify()` pre-commit, `dispatch_escalation_notifications` keeps only fire-and-forget IO post-commit. `pickup_session` accepts either status for in-flight migration. `flowpilot_engine.escalate_session` no longer called from any endpoint |
|
||||
| `8914391` | First task-lane race fix — initializer-time guards (`incomingPrefill || isPickup`) + eager `sessionStorage.removeItem` in `resetSessionDerivedState`. Insufficient (only covered mount-time entry paths) |
|
||||
| `0f00ee5` | Four plan-locked wedge polish items in one commit — see "Just shipped" section below |
|
||||
| `665530f` | **Structural fix for the task-lane stale-flash bug.** `taskLaneOwnerChatId` state tags the chatId the in-memory questions/actions belong to. Set at every populate site (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix); cleared in `resetSessionDerivedState`. Persistence effect now writes `chatId: ownerChatId` (was `activeChatId` — that was the original write-side bug). Render gate `taskLaneIsForActiveChat = ownerChatId === activeChatId` ANDed into all three render conditions. Stale data is now structurally unable to display. See DECISIONS entry for full rationale |
|
||||
| `bc15952` | Codex: stabilize SSE backend tests |
|
||||
| `9bdd995` | Bound escalation assessment latency (ORIGINAL: 5s) |
|
||||
| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx` — live-arrival animations |
|
||||
| `8e9d22e` | Magic-moment handoff-context screen on pickup |
|
||||
| `641853a` | Bell-icon notification opens the pickup flow |
|
||||
| `029680a` | Unify `/escalate` through `HandoffManager` |
|
||||
| `8914391` | First task-lane race fix (insufficient — see `665530f`) |
|
||||
| `0f00ee5` | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
|
||||
| `665530f` | Structural task-lane fix — `taskLaneOwnerChatId` tagging |
|
||||
| `b7d7ff0` | docs(ai): refresh handoff for compute swap |
|
||||
| `0d1b305` | **Live-test fixes**: selectChat-gating bug (loadedChatIdsRef), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |
|
||||
|
||||
**Test status:** full backend suite → `1103 passed in 259.63s` with `-n auto` after the unification. Frontend `tsc -b` clean. End-to-end smoke test against the running dev stack confirmed: SSE handshake delivers `ready` + `handoff_created` frames; `listHandoffs` returns the unclaimed handoff for a senior pre-claim; `claimHandoff` flips session status `escalated` → `active`; senior (non-owner, non-target) can `GET` an in-transit session detail; **a single legacy `/escalate` call now produces status='escalated', SessionDocumentation, SessionHandoff row, AppNotification with link `/pilot/{id}?pickup=true` for the team admin, and a PSA push attempt** — all from one funneled HandoffManager call. Branch pushed; draft PR #155 open.
|
||||
## Live-test results (2026-04-29 morning)
|
||||
|
||||
## Remaining work on this branch
|
||||
After the structural task-lane fix and the four polish items, end-to-end test confirmed:
|
||||
|
||||
1. **Visual QA + bug bash** in a real browser — full pickup demo path with the four new pieces below; this is the next active step.
|
||||
2. **Snapshot expansion in `HandoffManager._generate_snapshot`** — include the recent diagnostic steps / conversation tail so the magic-moment screen's "What's been tried" section can render the actual timeline pre-claim instead of "full timeline available after pickup".
|
||||
3. **Toolbar Context button on legacy-arrival sessions** — currently the button only appears when the senior arrived via the magic-moment flow this session. Lazy-fetching the handoff list on session-load (when status was-escalated) would make it work on revisits.
|
||||
4. **Owner-facing analytics page** at `/analytics/escalations` — period selector, conversion-rate, trend chart. ~0.5d. Optional for v1 demo.
|
||||
5. **Playwright e2e** for the magic-moment demo flow (junior escalates → senior receives via SSE → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
|
||||
- ✅ Junior escalates → senior gets bell-icon notification.
|
||||
- ✅ Magic-moment screen renders with handoff data on Pick Up.
|
||||
- ✅ Senior's chat surface loads with conversation history (after `0d1b305`'s selectChat fix — was completely broken before).
|
||||
- ✅ Sidebar shows the picked-up session with the "Escalated" pill (after `0d1b305`'s `loadChats()` call).
|
||||
- ✅ Suggested-step chips render below the composer.
|
||||
- ✅ Unread 6px dot on queue cards.
|
||||
- ✅ Task-lane regression is gone — no stale flash on new sessions.
|
||||
- ❌ **AI assessment placeholder never clears.** Drives the consolidation work above.
|
||||
|
||||
## Just shipped (this session — 2 commits)
|
||||
|
||||
**Commit `0f00ee5`** — four plan-locked wedge polish items:
|
||||
|
||||
- **Live AI assessment refresh on the magic-moment screen.** New `HandoffAssessmentReadyEvent` type + `onAssessmentReady` handler on `streamEscalations`. `AssistantChatPage` opens a scoped SSE subscription whenever it has a tracked handoff with no AI assessment yet; on a matching event it refetches and replaces both `magicHandoff` and `overlayHandoff` in place. Closes the loop on the async-assessment commit `e8ba74e`.
|
||||
- **Suggested-step chips below the chat input.** New `chipsHidden` state in `AssistantChatPage` defaulting to false; a chip strip renders above the composer when `magicHandoff?.ai_assessment_data?.suggested_steps[]` is non-empty and the magic-moment has dissolved. Click prefills input + focus; first send hides the strip; explicit X also hides. Per-session lifetime (Codex correction locked design).
|
||||
- **Unread 6px dot on `EscalationQueue` cards.** localStorage-persisted seen set (`rf-escalation-seen`, capped 200). Dot renders top-right of any card not yet seen. Cleared on **open (card click) or claim (Pick Up)** — NOT on hover (Codex correction). Pick Up onClick now stops propagation so the wrapper's open handler isn't double-fired.
|
||||
- **Race-condition toast on claim conflict.** New `HandoffAlreadyClaimedError` exception class in `handoff_manager.py`. `claim_session` now eager-loads `claimed_by_user`, rejects different-user re-claims (idempotent for same-user), and raises with the winner's id/name/timestamp. Endpoint translates to 409 with structured detail. `AssistantChatPage.handleStartHere` extracts the detail, formats `"Already claimed by {name} {time_ago}."` via `timeAgo()`, drops `?pickup=true`, and dismisses the magic-moment so the loser flows back to the queue. Backed by 2 new unit tests in `test_handoff_manager.py`.
|
||||
|
||||
**Commit `665530f`** — structural fix for the recurring stale-task-lane bug. Owner-tagging pattern applied to `activeQuestions` / `activeActions` / `showTaskLane`. See [`DECISIONS.md`](DECISIONS.md) for the architecture write-up. **User-reported on next session: needs visual verification.**
|
||||
Untested live (low priority, can verify post-consolidation): race-condition toast (needs second user in same account).
|
||||
|
||||
## Two-metric framing — read this before quoting numbers to anyone
|
||||
|
||||
The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
|
||||
The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
|
||||
|
||||
## Kill-switch
|
||||
|
||||
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) for context, but data lands first.
|
||||
|
||||
## Previous task — closed out
|
||||
|
||||
**Task:** Land PR #153 — fix the `AssistantChatPage` prefill `currentChatRef` bug. **Status:** complete (2026-04-26). Merged as `68fcdc6` on `main`.
|
||||
|
||||
**Background CI item, not blocking:** promoting `CI / e2e (pull_request)` to required on `main`. Two consecutive green runs cleared the threshold. Ops-only.
|
||||
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.
|
||||
|
||||
@@ -13,6 +13,51 @@
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-29 — Consolidate the three per-escalation AI calls into one structured generation
|
||||
|
||||
**Context:** A single user-initiated escalation currently triggers three separate Sonnet calls, all summarizing the same source material (session state, steps taken, "what we know") from slightly different angles:
|
||||
|
||||
1. `_build_escalation_package_enhanced` — runs in the background `enrich_escalation_async` task, builds a rich JSON payload that's saved to `ai_session.escalation_package`.
|
||||
2. `_generate_ai_assessment` — also background, returns the magic-moment screen fields (`likely_cause`, `suggested_steps[]`, `confidence`).
|
||||
3. `generate_status_update` — engineer-triggered when they click "Ticket Notes" / "Client Update" / "Email Draft" in the conclude modal, generates audience-specific PSA prose.
|
||||
|
||||
The user surfaced the smell: the engineer is *typically* generating a status update during the escalate flow, so the AI assessment work is being done twice with overlapping context and the engineer's PSA prose is being thrown away. Live test on 2026-04-29 also showed that bumping the assessment timeout 15s → 45s did NOT fix the empty-placeholder bug — meaning the architectural smell is also a demo blocker.
|
||||
|
||||
**Decision:** ONE structured AI call per escalation that produces a single payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Persist to `SessionHandoff`. The conclude modal's "Ticket Notes" button reads from the saved prose instead of calling the model. "Client Update" and "Email Draft" buttons trigger a cheap Haiku transformation over the saved prose (tone shift only, not a re-summarization).
|
||||
|
||||
Proposed payload shape (final form decided during implementation):
|
||||
|
||||
```json
|
||||
{
|
||||
"summary_prose": "<PSA-flavored ticket-notes paragraph>",
|
||||
"what_we_know": ["<one-liner>"],
|
||||
"likely_cause": "<one sentence>",
|
||||
"suggested_steps": ["<short step>"],
|
||||
"confidence": "low | medium | high",
|
||||
"audience_variants": {"client_update": null, "email_draft": null}
|
||||
}
|
||||
```
|
||||
|
||||
`audience_variants` filled lazily on first user request, cached.
|
||||
|
||||
**Rejected:**
|
||||
|
||||
- **Just bumping the timeout further.** Already tried 5s → 15s → 45s. The architectural redundancy is the real cost — even if Sonnet completed reliably, three calls per escalation is wasteful and creates three places where state can diverge.
|
||||
- **Reusing the engineer's status update content as the AI assessment.** User's first instinct, but: status updates aren't always generated (engineer has to click), they're audience-specific (so you'd pick which one to copy), and they're prose without the structured fields the magic-moment screen needs. The right consolidation is the OTHER direction — generate ONE structured payload that the status-update buttons consume.
|
||||
- **Switching the assessment to Haiku for speed.** Faster but solves only the latency symptom, not the redundancy. Doesn't help the conclude modal's status-update buttons.
|
||||
|
||||
**Consequences:**
|
||||
|
||||
- Magic-moment screen populates in ~5s instead of 25s+ (work happens in the foreground escalate path, not in a background task that races with the senior's pickup).
|
||||
- Token spend per escalation drops by ~60% — one Sonnet call replaces two; the third (audience variants) becomes Haiku.
|
||||
- Engineer's "Ticket Notes" button is instant — no model round-trip.
|
||||
- Schema enforcement matters. The current `_generate_ai_assessment` returns freeform prose that the frontend stuffs into `assessment_text` because the structured fields aren't reliably parseable. The new call must use Anthropic's structured output / tool-use to enforce the schema.
|
||||
- Migration concern: `ai_session.escalation_package` JSON column has live data on existing sessions. Keep it READABLE for backward compatibility; just stop *writing* the enhanced payload from `enrich_escalation_async`. If downstream queue summaries depend on it, dual-write the basic snapshot.
|
||||
- Test fixtures (`test_handoff_manager.py`, `test_session_handoffs_api.py`) currently stub `_generate_ai_assessment` via `AsyncMock`. Updating the stubs is part of the rename.
|
||||
- The frontend SSE assessment-ready subscription (added in `0f00ee5`) stays as-is — it just listens for the new event payload.
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-28 — Tag the task-lane state with an owner chatId
|
||||
|
||||
**Context:** A recurring bug — every time the user returned to test escalation work, creating a new session would flash the previous session's task-lane data (questions, actions, "Tasks" pill counts) before the new session's AI response landed. The first attempt to fix it (`8914391`) added initializer-time guards (`incomingPrefill || isPickup`) that skipped the sessionStorage restore on mount. That covered exactly two entry paths and missed every other case: in-place URL navigation, mid-flight pickup, HMR re-runs, and the gap between `setActiveChatId(B)` and the AI response that finally populates B's questions/actions. The persistence effect made it worse by writing `{chatId: activeChatId, questions: activeQuestions}` — at any moment where activeChatId had flipped before the questions were updated, sessionStorage was stamped with `{chatId: B, questions: [A's data]}` and a subsequent restore would happily render A's data for B.
|
||||
|
||||
@@ -2,54 +2,64 @@
|
||||
|
||||
# HANDOFF.md
|
||||
|
||||
**Last updated:** 2026-04-28 02:00 EDT
|
||||
**Last updated:** 2026-04-29 04:30 EDT
|
||||
|
||||
**Active task:** **Escalation Mode** wedge build. Full status in [`CURRENT_TASK.md`](CURRENT_TASK.md); this file is the resume point.
|
||||
**Active task:** **Escalation Mode** wedge — AI generation consolidation. Full status + design in [`CURRENT_TASK.md`](CURRENT_TASK.md). The wedge demo is **demo-blocked** by an empty AI assessment that didn't fix with a timeout bump. Architectural cause: 3 redundant AI calls per escalation; the right fix is to consolidate.
|
||||
|
||||
**Branch:** `feat/escalation-metric-endpoint`. Local tip is `665530f`. **Remote (origin) is at `8914391`** — the last two commits (`0f00ee5`, `665530f`) are local-only because the user is swapping computers and asked for the docs/handoff first. **Push needed on next session before continuing work.** Draft PR #155 is open against `main`.
|
||||
**Branch:** `feat/escalation-metric-endpoint` at `0d1b305`. Pushed to origin. Draft PR #155 open.
|
||||
|
||||
## What this session did
|
||||
## Where the previous session ended
|
||||
|
||||
Two commits, both untested in a real browser:
|
||||
Live QA bash on the wedge demo. Branch state: 4 commits added this session (`0f00ee5`, `665530f`, `b7d7ff0`, `0d1b305`).
|
||||
|
||||
1. **`0f00ee5` feat(escalations): close out plan-locked wedge polish.** Four items from the design-plan audit ([`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md)):
|
||||
- **Live AI assessment refresh** — frontend listener for the `handoff_assessment_ready` SSE event, refetches the handoff and updates `magicHandoff` / `overlayHandoff` in place. Closes the async-assessment loop from `e8ba74e`.
|
||||
- **Suggested-step chips** below the composer in `AssistantChatPage` — surfaces `ai_assessment_data.suggested_steps[]` post-claim, click prefills the input, hides on first send or explicit X.
|
||||
- **Unread 6px dot** on `EscalationQueue` cards — localStorage-persisted seen set (`rf-escalation-seen`), clears on open OR claim (NOT hover; Codex correction).
|
||||
- **Race-condition toast on claim conflict** — new `HandoffAlreadyClaimedError` exception, endpoint returns 409 with structured `{claimed_by_id, claimed_by_name, claimed_at}`, frontend shows `"Already claimed by {name} {time_ago}."` and bounces the loser back to the queue. Backed by 2 new tests; full handoff/escalation suite (34 tests) green.
|
||||
**Confirmed working in browser:**
|
||||
|
||||
2. **`665530f` fix(assistant-chat): tag task-lane state with owner chatId.** Structural fix for the recurring "new session shows previous session's task lane" bug. The earlier fix `8914391` only covered the mount-time entry path; this change makes stale data structurally unable to display by adding `taskLaneOwnerChatId` state and a render gate `taskLaneOwnerChatId === activeChatId` ANDed into all three render conditions. Persistence effect now writes ownership chatId, not active chatId — that was the original write-side bug. See [`DECISIONS.md`](DECISIONS.md) for the architecture write-up.
|
||||
- Junior escalates → senior bell-icon notification
|
||||
- Senior Pick Up → magic-moment screen with handoff data
|
||||
- Senior Start Here → chat surface loads with conversation history (`0d1b305` fixed the selectChat-gating bug — was rendering blank before)
|
||||
- Sidebar shows picked-up session with "Escalated" pill (`0d1b305`'s `loadChats()` after claim)
|
||||
- Suggested-step chips render below the composer
|
||||
- Unread 6px dot on queue cards persists across refresh
|
||||
- Task-lane regression killed — no stale flash on new sessions
|
||||
- Enter-to-submit (Shift+Enter for newline) on `EscalateModal` and `ConcludeSessionModal`
|
||||
- `PendingEscalations` rows on dashboard expand to show escalation reason + step count + ticket #
|
||||
|
||||
Verified: `tsc -b` clean after both. Backend handoff/escalation suite (34 tests) green. **Not verified:** anything in a real browser. The user explicitly asked for a debugging session after implementation — that's the next thing.
|
||||
**Active blocker:**
|
||||
|
||||
## Resume point
|
||||
- **AI assessment never populates** on the magic-moment screen. Bumping the timeout 15s → 45s in `0d1b305` did not fix it in the field. Backend logs from earlier in session showed Sonnet timing out at 15s; the assumption was the call would complete with more headroom, but live test still empty. May be a different failure mode (assessment generating but the bus event firing with `has_assessment: false`, or the frontend subscription not refetching, or the call genuinely failing past 45s).
|
||||
|
||||
1. **First action: `git push` the two local commits.** `0f00ee5` and `665530f` are local-only.
|
||||
2. **Visual QA + bug bash.** End-to-end demo flow:
|
||||
- Junior escalates → senior gets bell-icon notification → click → magic-moment screen with **placeholder AI assessment** (because it's now async/background) → assessment populates **in place** within ~5–15s without manual reopen → Start here → chat surface loads with **suggested-step chips** above the composer → click a chip prefills input.
|
||||
- On `/escalations`: backgrounded tab gets `(N)` title prefix when an arrival fires; new card has **6px accent dot** top-right; clicking the card body OR Pick Up clears the dot (verify it persists across refresh, doesn't clear on hover).
|
||||
- Race condition: claim the same handoff from two browsers; loser sees toast `"Already claimed by {name} {time_ago}."` and bounces.
|
||||
- **Task-lane regression check:** create a new session via dashboard prefill / pickup / "New Chat" — the lane must NOT flash the previous session's questions/actions. The user previously reported this happening repeatedly; the fix in `665530f` should kill it. If it still happens, that's the next debug target.
|
||||
3. **Deferred follow-ups in `CURRENT_TASK.md`:** snapshot expansion, owner-facing `/analytics/escalations` page, Playwright e2e for the GTM Loom demo path, eventual cleanup of `flowpilot_engine.escalate_session` and the dead `FlowPilotSessionPage.tsx` magic-moment branch.
|
||||
## Resume point — DO THIS NEXT
|
||||
|
||||
**Replace the three redundant AI calls with a single structured generation.** Full implementation plan in [`CURRENT_TASK.md`](CURRENT_TASK.md) under "Active task — AI generation consolidation." Summary:
|
||||
|
||||
1. **Backend:** Replace `_generate_ai_assessment` with one Sonnet call returning structured JSON: `summary_prose` (PSA-flavored) + `what_we_know[]` + `likely_cause` + `suggested_steps[]` + `confidence`. Persist to `SessionHandoff`. Use Anthropic structured output / tool-use to enforce the schema.
|
||||
2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read the saved payload (instant). For `client_update` and `email_draft`, run a cheaper Haiku transformation over the saved prose, not a full re-summarization.
|
||||
3. **Backend:** Stop calling `_build_escalation_package_enhanced` from the background path — overlapping content. Verify nothing downstream depends on the *enhanced* enriched payload before removing.
|
||||
4. **Frontend:** `HandoffContextScreen` reads from the consolidated structured fields. `ConcludeSessionModal`'s "Ticket Notes" button stops generating, just copies the saved prose. "Client Update" / "Email Draft" trigger the cheap transformation.
|
||||
5. **Test plan:** magic-moment populates in ~5s. Token spend down ~60%. AI summary blocker resolved.
|
||||
|
||||
**Implementation order (suggested):** 1 → 4 (so the magic moment shows the new fields) → 2 → 3 (cleanup) → tests.
|
||||
|
||||
**Watch-outs:**
|
||||
|
||||
- Schema enforcement matters. Past calls returned freeform prose that doesn't parse into chips. Anthropic structured output / tool-use is the right tool.
|
||||
- `escalation_package` JSON column has live data on existing sessions — keep it READABLE, just stop *writing* the enhanced payload from `enrich_escalation_async`. Dual-write the basic snapshot if downstream queue summaries need it.
|
||||
- `_generate_ai_assessment` is stubbed in `test_handoff_manager.py` and `test_session_handoffs_api.py` via `AsyncMock`. Update test fixtures when renaming.
|
||||
- The frontend assessment-ready SSE subscription (added in `0f00ee5`) is fine as-is — it'll dispatch on the new event payload. No client changes for the live-refresh path.
|
||||
|
||||
## Useful breadcrumbs
|
||||
|
||||
- SSE endpoint: [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`.
|
||||
- Pub/sub bus: [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py).
|
||||
- Frontend SSE consumer: [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts) → `streamEscalations` (now dispatches `handoff_created` AND `handoff_assessment_ready`).
|
||||
- Live-arrival queue UI: [`frontend/src/components/flowpilot/EscalationQueue.tsx`](../frontend/src/components/flowpilot/EscalationQueue.tsx).
|
||||
- AI assessment current impl: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `_generate_ai_assessment`, `_generate_ai_assessment_with_timeout`, `enrich_escalation_async`.
|
||||
- Status update current impl: [`backend/app/services/flowpilot_engine.py`](../backend/app/services/flowpilot_engine.py) — `generate_status_update`, `_build_status_update_prompt`, `_build_status_update_context`.
|
||||
- Enhanced package builder: [`backend/app/services/flowpilot_engine.py`](../backend/app/services/flowpilot_engine.py) — `_build_escalation_package_enhanced` (line ~1694).
|
||||
- Magic-moment screen: [`frontend/src/components/flowpilot/HandoffContextScreen.tsx`](../frontend/src/components/flowpilot/HandoffContextScreen.tsx).
|
||||
- Pickup integration + magic state machine + suggested-step chips + assessment-ready subscription + claim 409 handling + task-lane owner tagging: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx).
|
||||
- Claim conflict exception: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `HandoffAlreadyClaimedError`, `claim_session`, `enrich_escalation_async`.
|
||||
- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py).
|
||||
- Conclude modal: [`frontend/src/components/assistant/ConcludeSessionModal.tsx`](../frontend/src/components/assistant/ConcludeSessionModal.tsx) — see `handleGenerateStatusUpdate`.
|
||||
- Magic-moment integration + suggested-step chips: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx).
|
||||
- Test fixtures stubbing the assessment: `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`.
|
||||
|
||||
## Watch-outs
|
||||
## Watch-outs (general)
|
||||
|
||||
- The two new commits are **local-only** until pushed. Run `git push` before any other work.
|
||||
- The assessment-ready subscription opens a fresh SSE connection scoped by `assessmentMissing && trackedHandoffId`. If you change the magic-moment lifecycle, double-check the cleanup deps don't churn the subscription.
|
||||
- The claim conflict path is currently only wired into `AssistantChatPage.handleStartHere`. `useHandoff` (used by `SessionQueuePage`) and `FlowPilotSessionPage.tsx` (dead) were not updated. If `SessionQueuePage` claims start mattering, mirror the same `axios.isAxiosError(e) && e.response?.status === 409` extraction.
|
||||
- The handoff snapshot is still sparse (`problem_summary, problem_domain, status, step_count, confidence_tier`). Magic-moment "What's been tried" still only shows engineer notes + step count pre-claim.
|
||||
- `HandoffResponse.ai_assessment_data.confidence` is typed `number` on the frontend but the backend currently emits `'low' | 'medium' | 'high'`. Runtime handles both; type definition is stale.
|
||||
- Toolbar "Context" button is hidden on revisited active sessions where the senior didn't arrive via magic-moment this session — known scope cut.
|
||||
- Do not reintroduce `client.stream()`/ASGITransport tests for infinite SSE responses; test the generator directly.
|
||||
- Bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the obvious swap when horizontal scaling appears.
|
||||
- Dev stack on this machine: backend `:8000`, frontend `:5173`, postgres `:5433`. All running via docker-compose. HMR works.
|
||||
- Test users (Acme MSP shared account, password `TestPass123!`): `engineer@resolutionflow.example.com` (junior), `teamadmin@resolutionflow.example.com` (senior).
|
||||
- The bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the swap when horizontal scaling appears.
|
||||
- `streamEscalations` doesn't drive token refresh on a mid-stream 401. Acceptable for v1.
|
||||
|
||||
@@ -12,6 +12,24 @@
|
||||
|
||||
---
|
||||
|
||||
## 2026-04-29 04:30 EDT — Claude Code — Live QA bash, pickup bug fixes, AI summary consolidation surfaced
|
||||
|
||||
- User on a freshly swapped computer ran the live QA flow. Identified two bugs missed by static analysis from the previous session:
|
||||
- **Pickup landed on a blank chat surface.** Root cause: commit `8914391` had made `activeChatId` initialize from `urlSessionId`, which broke the selectChat-gating effect in `AssistantChatPage` (`urlSessionId === activeChatId` short-circuited fresh mounts). Symptom was `selectChat` never firing post-claim; messages, conversation history, and pickup-flow correctness all silently broken.
|
||||
- **Picked-up session missing from sidebar.** Root cause: `loadChats` runs once at mount; pre-claim the session's `escalated_to_id` is null (the junior didn't specify a target), so `listSessions` doesn't return it. Post-claim `claim_session` sets `escalated_to_id` to teamadmin, but the sidebar list never refreshes.
|
||||
- Fixes (commit `0d1b305`):
|
||||
- Replaced the `urlSessionId === activeChatId` gate with a `loadedChatIdsRef` set so selectChat fires once per URL session per page lifecycle, regardless of whether activeChatId already matches.
|
||||
- Added `loadChats()` call in `handleStartHere` after the claim succeeds so the sidebar reflects ownership.
|
||||
- Three additional pieces folded into `0d1b305` from the same QA bash:
|
||||
- **Enter-to-submit on the escalate forms.** Chat-input convention: plain Enter submits, Shift+Enter inserts a newline. Added optional `onSubmit` prop to `RichTextInput` (used by `EscalateModal`) and inline `onKeyDown` on the plain textarea in `ConcludeSessionModal`. The user explicitly asked for this — they want to type the reason and hit Enter without reaching for the mouse.
|
||||
- **Dashboard `PendingEscalations` rows expand to preview.** Click a row to reveal escalation reason + step count + confidence tier + PSA ticket number. Pick Up button click-stops to still go directly to magic moment. Single expansion at a time.
|
||||
- **`ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` bumped 15 → 45.** Backend logs showed Sonnet hitting the 15s timeout in field testing. Background-task architecture (e8ba74e) means this no longer blocks the user — only bounds before publishing `has_assessment: false`. **Did NOT fix the live demo.** Assessment placeholder still permanent in user's test.
|
||||
- Surfaced an architectural smell: the escalation flow makes **three** Sonnet calls — `_build_escalation_package_enhanced`, `_generate_ai_assessment`, and `generate_status_update` (engineer-triggered) — all summarizing the same source material from slightly different angles. User correctly observed: status update is typically generated during the escalate flow anyway; reusing that content would consolidate.
|
||||
- Decided the right consolidation: ONE structured AI call per escalation that returns both the magic-moment diagnostic fields (`likely_cause`, `suggested_steps[]`, `confidence`) AND PSA-ready prose. Magic moment populates immediately. Status update buttons become tone-shift transformations (Haiku) of the saved prose, not fresh summarizations. Drops to 1 call (~60% token reduction), eliminates the AI-summary placeholder bug because the work happens in the foreground escalate path. Full implementation plan written into CURRENT_TASK.md and DECISIONS.md.
|
||||
- Session ended pre-consolidation: user is updating Claude Code CLI and starting a fresh session for clean context window. All work pushed to origin (`0d1b305`). PR #155 still draft.
|
||||
- Test users for the next session (Acme MSP shared account, password `TestPass123!`): `engineer@` (junior) and `teamadmin@` (senior).
|
||||
- Files touched: `frontend/src/pages/AssistantChatPage.tsx`, `frontend/src/components/common/RichTextInput.tsx`, `frontend/src/components/flowpilot/EscalateModal.tsx`, `frontend/src/components/assistant/ConcludeSessionModal.tsx`, `frontend/src/components/dashboard/PendingEscalations.tsx`, `backend/app/core/config.py`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/DECISIONS.md`.
|
||||
|
||||
## 2026-04-28 02:00 EDT — Claude Code — Plan-locked wedge polish + structural task-lane fix
|
||||
|
||||
- Audited `docs/plans/2026-04-27-escalation-mode-wedge-design.md` against the branch and identified four locked-design / Codex-correction items not yet shipped: live AI assessment refresh, suggested-step chips, unread 6px dot on queue cards, and race-condition toast on claim conflict.
|
||||
|
||||
Reference in New Issue
Block a user