feat(escalations): Escalation Mode wedge — live arrival + magic-moment pickup #155
Summary
Escalation Mode wedge — the GTM wedge for ResolutionFlow's first paying-customer push. When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
Plan: docs/plans/2026-04-27-escalation-mode-wedge-design.md. Test plan: docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md. Both reviewed by /office-hours, /plan-eng-review, /plan-design-review, and /codex review.
What ships
Backend
- GET /analytics/flowpilot/escalations: in-product time-to-first-action metric (account-scoped, engineer-or-admin gated).
- POST /handoffs/{id}/claim is now role-gated to engineer-or-admin (Codex correction, in-scope).
- HandoffManager.dispatch_escalation_notifications: emails engineer/admin teammates on intent=escalate, with graceful-degradation regression coverage so handoff creation never fails on a notification path error.
- GET /ai-sessions/escalations/stream: SSE endpoint backed by an in-memory account-scoped pub/sub bus (app/core/escalation_bus.py). Heartbeat every 25s; per-subscriber bounded queue with drop-on-full. Acceptable for v1 pilot scale (single Railway replica); Redis pub/sub is the obvious swap when horizontal scaling appears.
- ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS (default 5s). Handoff creation proceeds with no assessment if the model times out.
- GET /ai-sessions/{id} access policy: any account member can read sessions in requesting_escalation / escalated status. Tenant boundary enforced by RLS; the owner-only guard was overly restrictive for explicitly-shared in-transit states. Unblocks the senior-pickup flow.
- session.escalated now includes ?pickup=true so bell-icon clicks route through the pickup flow instead of dead-ending in a 404.
Frontend: live arrival
- aiSessionsApi.streamEscalations(handlers, signal): fetch-based ReadableStream SSE parser (native EventSource cannot send auth headers).
- EscalationQueue.tsx: AbortController-managed subscription with exponential-backoff reconnect (1s → 30s cap, resets on ready). New arrivals prepended (newest-first) above established cards (oldest-first preserved). Locked 200ms slide-in. Tab-title (N) flash while document.hidden. prefers-reduced-motion swap. ARIA region with aria-live="polite".
Frontend: magic-moment screen
- HandoffContextScreen.tsx: 4 sections (problem header / what's been tried / AI assessment / Start here CTA). Renders gracefully when ai_assessment is null (the 5s timeout fired). Confidence badge accepts numeric or string shape. Focus on primary CTA on mount. Esc dismisses when used as a re-openable overlay.
- FlowPilotSessionPage.tsx integration: on ?pickup=true, fetch the handoff list first, find the latest unclaimed escalate handoff, render the screen and skip loadSession (the senior would 404 pre-claim under the legacy access policy). Start here calls claimHandoff, dismisses, and loadSession fires. Toolbar "Context" button on active sessions re-opens the screen as a dismissible overlay.
Frontend: owner-facing metric card
- EscalationMetricCard mounted above the queue list on /escalations.
Two-metric framing: read this before quoting numbers
The in-product endpoint measures post-claim time-to-first-action. The "minutes recovered" sales claim is manual_baseline − in_product_metric. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered"; that's the apples-to-oranges miscount Codex caught.
Kill-switch
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.
Known follow-ups (deferred, captured in .ai/CURRENT_TASK.md)
- [...] FlowPilotMessageBar).
- HandoffManager._generate_snapshot expansion to include the recent diagnostic timeline pre-claim.
- /analytics/escalations page (period selector + conversion rate + trend chart).
- HandoffResponse.ai_assessment_data.confidence frontend type is stale (number); backend emits string tiers. Runtime handles both.
Test plan
- [...] test_escalation_bus, test_handoff_manager, test_session_handoffs_api, test_flowpilot_analytics_escalations) → 32 passed in 18.91s with -n auto.
- [...] test_sessions, test_session_sharing added after access-policy change) → 94 passed in 43.26s.
- tsc -b clean.
- text/event-stream; ready frame on connect; handoff_created frame with full payload after posting a handoff. Wire format matches the parser exactly.
- listHandoffs returns the unclaimed handoff for a senior pre-claim; claimHandoff flips session status escalated → active; subsequent GET succeeds.
- [...] GET an escalated session detail.
- /qa against the staging environment after merge.

GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}

Computes the in-product wedge metric for Escalation Mode: average / median / p95 seconds between SessionHandoff.claimed_at and the first ai_session_step created on the same session after that timestamp. Account-scoped, role-gated to engineer-or-admin.

The metric is intentionally NOT called "minutes recovered". That's the two-metric framing locked by /codex review: this in-product number must be paired with manual baseline (the verbal-handoff stopwatch from The Assignment) to produce the savings claim. Schema's `metric_definition` field surfaces the disclaimer in every response so callers don't oversell it.

Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff, aggregates avg/median/p95 in Python (~1k rows/account/month is well within budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in n_handoffs_with_action, which surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases

Tests cover happy path, zero-data, claimed-but-no-action accounting, period window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4 RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests, all green.
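
For reviewers who want the aggregation rules from the implementation notes in one place, here is a minimal sketch. It is not the endpoint's code: the function name, the input shape (pairs of claimed_at and first-step timestamps) and the nearest-rank p95 method are assumptions; only the field names n_handoffs_claimed / n_handoffs_with_action and the floor-at-0 rule come from the notes above.

```python
import math
import statistics
from datetime import datetime, timedelta
from typing import Optional


def summarize_time_to_first_action(
    pairs: list[tuple[datetime, Optional[datetime]]],
) -> dict:
    """Aggregate (claimed_at, first_step_at) pairs into avg / median / p95 seconds.

    first_step_at is None for a claimed handoff with no action yet.
    """
    deltas = [
        max(0.0, (first_step - claimed).total_seconds())  # floor clock drift at 0
        for claimed, first_step in pairs
        if first_step is not None
    ]
    summary = {
        "n_handoffs_claimed": len(pairs),
        "n_handoffs_with_action": len(deltas),
        "avg_seconds": None,
        "median_seconds": None,
        "p95_seconds": None,
    }
    if deltas:
        ordered = sorted(deltas)
        # Nearest-rank percentile: an assumption, the notes do not name a method.
        rank = max(1, math.ceil(0.95 * len(ordered)))
        summary["avg_seconds"] = statistics.fmean(ordered)
        summary["median_seconds"] = statistics.median(ordered)
        summary["p95_seconds"] = ordered[rank - 1]
    return summary


# Hypothetical data: three handoffs with a first action, one without.
t0 = datetime(2026, 4, 27, 12, 0, 0)
example = summarize_time_to_first_action([
    (t0, t0 + timedelta(seconds=30)),
    (t0, t0 + timedelta(seconds=90)),
    (t0, t0 - timedelta(seconds=2)),  # clock drift, floored to 0
    (t0, None),                       # claimed, no action yet
])
```

Keeping the no-action handoffs in the claimed count but out of the deltas is what lets a caller derive the conversion rate from the two counters.
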
No regressions in existing handoff_manager / session_handoffs suites.

First piece of the Approach A wedge build per docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue stat-card and the analytics page.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an authenticated user, so a viewer-role account user could claim escalations. Codex review flagged this as wedge-relevant: the Escalation Mode race-condition story (two seniors clicking Pick Up simultaneously) depends on auth gating for audit integrity. Originally captured as a deferred TODO during /plan-eng-review, then moved in-scope by /codex review.

Swap the dep to require_engineer_or_admin. One-line change.

Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate

Existing handoff create + queue tests unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

First half of the Escalation Mode notification dual-path. WebSocket/SSE push is the second half (next commit): email handles offline seniors, push handles online ones for the magic-moment demo.

HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id (excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff creation is authoritative; notification is advisory. This is the graceful-degradation regression both eng + codex reviews flagged as critical (handoff must succeed even if SMTP is down).

Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit(): never email about a rolled-back handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate.
Park is private to the escalator.

Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST

Per-channel delivery records (replacing the dead `notification_sent` boolean per Codex correction) is a v1.x story; for now application logs are the audit trail. See docs/plans/2026-04-27-escalation-mode-wedge-design.md.

20 tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. No regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Surfaces the new GET /analytics/flowpilot/escalations endpoint as a card above the EscalationQueue list. Closes the loop from yesterday's metric endpoint commit: seniors and owners see the wedge stat the moment they open the queue, which is the daily-reps version of the GTM ROI story.

Pieces:
- EscalationMetrics TS interface mirroring the backend Pydantic model (incl. metric_definition disclaimer field)
- flowpilotAnalyticsApi.getEscalationMetrics(period) client method
- EscalationMetricCard component:
  * loading skeleton, error state, zero-data empty state
  * avg + median + n_with_action/n_claimed conversion rate
  * humanized seconds → "Ns" / "N.N min" formatting
  * inline disclaimer reminding callers this is in-product time-to-first-action only, NOT the savings claim; pair with manual baseline (per /codex review's two-metric correction)
- Wired into EscalationQueuePage above EscalationQueue

DS-aligned: card-flat, accent-dim usage held to interactive elements, text-muted-foreground for secondary copy, font-heading on the headline number, explicit transition properties (no `transition: all`).
Respects prefers-reduced-motion implicitly (only animation is the loading pulse, which Tailwind's animate-pulse already gates).

tsc -b clean. No new tests in this commit: component is a thin state-machine over an axios call; integration coverage comes from the existing backend tests + the e2e Playwright work in the plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

First half of the WebSocket/SSE push slice. Paused mid-flight to hand the branch to Codex for outside-voice review before stacking more commits on top. See .ai/HANDOFF.md for the full pause context + what to look at.

What's here:
- backend/app/core/escalation_bus.py: module-level singleton in-memory pub/sub keyed by account_id. asyncio.Queue per subscriber with 64-event maxsize and drop-on-full semantics. Designed to be swappable for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py: GET /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe bound to current_user.account_id.
- backend/app/services/handoff_manager.py: dispatch_escalation_notifications now publishes a `handoff_created` event to the bus BEFORE the email fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py: 7 unit tests, all green standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py: +1 dispatcher integration test (publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py: +2 endpoint tests (viewer blocked, ready event handshake).
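
For readers without the branch checked out, the bus semantics described above (one asyncio.Queue per subscriber, keyed by account_id, drop-on-full) can be sketched roughly as follows. This is an illustration under assumptions, not the contents of escalation_bus.py: the class name EscalationBus, its method names and the return value of publish are hypothetical; only the 64-event default and the drop-on-full behaviour come from the commit message.

```python
import asyncio
from collections import defaultdict


class EscalationBus:
    """In-memory pub/sub keyed by account_id (single process only)."""

    def __init__(self, max_queue_size: int = 64) -> None:
        self._max_queue_size = max_queue_size
        self._subscribers: dict[str, set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, account_id: str) -> asyncio.Queue:
        """Register a bounded queue that receives this account's events."""
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._max_queue_size)
        self._subscribers[account_id].add(queue)
        return queue

    def unsubscribe(self, account_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[account_id].discard(queue)
        if not self._subscribers[account_id]:
            del self._subscribers[account_id]

    def publish(self, account_id: str, event: dict) -> int:
        """Deliver to every subscriber of this account; return the delivered count."""
        delivered = 0
        for queue in list(self._subscribers.get(account_id, ())):
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Drop-on-full: a slow subscriber loses the event,
                # the publisher never blocks.
                pass
        return delivered


async def demo() -> dict:
    bus = EscalationBus(max_queue_size=2)  # small bound to show the drop
    acme = bus.subscribe("acme")
    other = bus.subscribe("other")         # different tenant, must see nothing
    delivered = [
        bus.publish("acme", {"type": "handoff_created", "n": i}) for i in range(3)
    ]
    return {
        "delivered": delivered,
        "acme_size": acme.qsize(),
        "other_size": other.qsize(),
    }


result = asyncio.run(demo())
```

Dropping on a full queue trades completeness for publisher latency, which fits the stated decision not to back-pressure publishers; a reconnecting client would need to refetch the queue list to recover anything it missed.
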
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the existing streamDocumentation pattern, not native EventSource (custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in, reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE + dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test DROP SCHEMA + recreate fixture made wall-clock prohibitive when multiple stale runs collided on the same Postgres test schema. Resolution: -n auto next time.
[/gstack-context]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds the dedicated 4-section handoff-context view that renders BEFORE the FlowPilot session for senior techs picking up an escalated session, then dissolves on "Start here". This is the wedge's demonstrable magic moment: what the GTM Loom records.

- HandoffContextScreen.tsx: pure presentational, takes a HandoffResponse plus onStartHere / onDismiss callbacks. Sections: header (problem summary, domain, step count, escalated-time, priority badge), "What's been tried" (engineer notes + step-count affordance), "AI assessment" (likely_cause / suggested_steps / confidence badge), Start here CTA. Confidence badge accepts both numeric (0..1) and string ("low"/"medium"/"high") shapes; backend currently emits the latter. Renders an explicit "assessment unavailable" branch when ai_assessment_data is null (the 5s timeout from 9bdd995 fired).
Honors prefers-reduced-motion (animate-fade-in vs animate-slide-up). ARIA dialog + focus on the primary CTA. Esc dismisses when used as a re-openable overlay; pre-claim, Start here is the only exit.
- FlowPilotSessionPage.tsx: on /pilot/:id?pickup=true, fetch the handoff list via handoffsApi.listHandoffs (account-scoped via RLS, no claim required) and find the latest unclaimed escalate handoff. If found, render the magic-moment screen and skip the regular loadSession (the senior isn't yet escalated_to_id, so GET would 404). Start here calls claimHandoff, drops the pickup query param, dismisses the screen; the existing loadSession effect then fires because the senior is now escalated_to_id. A "Context" toolbar button on active sessions re-opens the screen as a dismissible overlay (visible only when the senior arrived via the magic-moment flow this session; handoff lookup on demand).

Verified end-to-end against the running dev stack: listHandoffs returns the unclaimed handoff with full payload; claim flips session status from escalated → active; subsequent GET succeeds. tsc -b clean.

Defers (TODO followups): suggested-step chips below the chat input that prefill on click (requires threading through to FlowPilotMessageBar); snapshot expansion to include the recent diagnostic steps pre-claim; toolbar Context button on sessions where the senior didn't arrive via magic-moment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Title changed from "Escalation Mode wedge: live arrival + magic-moment pickup" to "WIP: Escalation Mode wedge: live arrival + magic-moment pickup"

Replaces the legacy flowpilot_engine.escalate_session orchestration with a single canonical path through HandoffManager. Every escalation now creates a SessionHandoff row, fans out via the SSE bus, persists AppNotification rows for the bell icon, dispatches to external channels (Slack/Teams) via notify(), and emails per-user, regardless of whether the call entered through /escalate (legacy URL) or /handoff (new URL).
The senior-pickup magic-moment screen now works end-to-end from the EscalateModal bell-icon path the user just tested.

Backend
- HandoffCreateRequest gains optional target_user_id (the equivalent of the legacy escalated_to_id field). Self-targeting rejected.
- HandoffManager.create_handoff handles intent='escalate' end-to-end: sets escalation_reason + escalated_to_id, builds the legacy enhanced AI escalation_package (Sonnet, lazy-imported from flowpilot_engine, graceful fallback on failure), and merges handoff metadata into it. Eager-loads session.steps and session.user via selectinload, which both the enhanced-package builder and notify() require to avoid MissingGreenlet on async lazy access.
- HandoffManager.finalize_escalation generates SessionDocumentation, pushes documentation to PSA, and runs notify(), all pre-commit so the AppNotification rows persist atomically with the handoff.
- HandoffManager.dispatch_escalation_notifications keeps only the fire-and-forget IO (bus publish, per-user emails) and runs post-commit. Pulls engineer name via a separate User query rather than relying on session.user lazy access.
- /handoff endpoint passes target_user_id through and calls finalize_escalation pre-commit.
- /escalate endpoint is now a thin shim: owner-only session lookup, HandoffManager.create_handoff(intent='escalate'), finalize_escalation, commit, dispatch_escalation_notifications, return SessionCloseResponse built from documentation + psa_result. flowpilot_engine.escalate_session is no longer called by any endpoint.
- pickup_session accepts both 'requesting_escalation' (legacy in-flight sessions) and 'escalated' (new canonical) so the migration is seamless for sessions already in the queue.
- Escalation queue list and sidebar count now match either status.

Frontend
- useFlowPilotSession optimistic update flips status to 'escalated' instead of 'requesting_escalation' so the page state matches the unified backend response.
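
The ordering this commit settles on (persist pre-commit, commit, then fire-and-forget IO that can never fail the handoff) can be illustrated with a small stand-alone sketch. Everything named here is hypothetical: send_email stands in for the real email service, a plain dict stands in for the database session, and the real HandoffManager methods have their own signatures.

```python
import asyncio
import logging

logger = logging.getLogger("handoff_sketch")


async def send_email(recipient: str, outbox: list[str]) -> None:
    """Hypothetical email service; fails for one recipient to show degradation."""
    if recipient == "broken@example.com":
        raise RuntimeError("SMTP down")
    outbox.append(recipient)


async def dispatch_notifications(recipients: list[str], outbox: list[str]) -> None:
    """Advisory fan-out: log failures, never raise to the caller."""
    try:
        results = await asyncio.gather(
            *(send_email(r, outbox) for r in recipients),
            return_exceptions=True,  # one failed message does not cancel the rest
        )
        for recipient, outcome in zip(recipients, results):
            if isinstance(outcome, Exception):
                logger.warning("notification to %s failed: %s", recipient, outcome)
    except Exception:
        logger.exception("notification dispatch failed; handoff is unaffected")


async def escalate(db: dict, recipients: list[str], outbox: list[str]) -> dict:
    handoff = {"id": 1, "intent": "escalate"}
    db["pending"] = handoff              # pre-commit work (the finalize step)
    db["committed"] = db.pop("pending")  # stand-in for db.commit()
    # Post-commit only: never notify about a handoff that could still roll back.
    await dispatch_notifications(recipients, outbox)
    return handoff


db: dict = {}
outbox: list[str] = []
handoff = asyncio.run(
    escalate(db, ["a@example.com", "broken@example.com", "c@example.com"], outbox)
)
```

In this run the failing recipient is logged and skipped while the other two messages go out and the handoff is still returned, which is the behaviour the graceful-degradation tests are described as protecting.
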
Verified end-to-end live: a fresh /escalate call from the junior produces status='escalated', a SessionHandoff row, a SessionDocumentation, PSA push attempted (no_psa for this test session), AND a bell-icon AppNotification for the team admin with link /pilot/{session_id}?pickup=true. Backend test suite: 1103 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three improvements driven by live wedge testing.

1) Notification title now includes a problem snippet and PSA ticket suffix when present: "Escalation from Jane · #12345: Outlook is failing to sync email…" Replaces the prior "Session escalated by Jane" copy that made every escalation from the same junior look identical in the bell panel. Snippet is trimmed to 70 chars with ellipsis. handoff_manager now passes psa_ticket_id through in the notify() payload so this works for both /escalate and /handoff entry points.

2) AI enrichment (assessment + enhanced escalation_package) moved to a FastAPI BackgroundTask. The escalating engineer no longer waits on 15-25s of Sonnet latency: handoff creation returns as soon as snapshot, status flip, dual-write, documentation, PSA push, and notify() are committed. enrich_escalation_async opens its own DB session, runs both AI calls, updates handoff.ai_assessment + session.escalation_package, commits, and publishes a new `handoff_assessment_ready` event on the escalation bus. Frontend doesn't yet listen for that event; the magic-moment screen still shows a placeholder ("AI assessment is still generating. Reopen this view in a few seconds…") which is honest about the state. Live polling / auto-refresh on the bus event is the natural next step.

3) ChatSidebar entries now surface the problem summary as a secondary line and tag PSA-linked sessions with a monospace #ticket badge plus an "Escalated" pill on in-transit sessions. ChatListItem grew problem_summary, psa_ticket_id, and status fields; loadChats populates them from listSessions.
The user couldn't tell their own sessions apart in the sidebar because they all rendered as "New Chat" with no distinguishing detail. This fixes that for any session, escalated or not.

Test plan
- Backend full suite: 1103 passed in 255.85s with -n auto.
- Frontend tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two compounding bugs caused the previous session's questions/actions to render briefly when entering a new chat, visible as "the new session instantly pops with old session task-lane data" the user reported.

The race
- AssistantChatPage's activeQuestions / activeActions / showTaskLane useState initializers synchronously read sessionStorage's rf-tasklane-meta. They restore the persisted task-lane state if its saved chatId matches the freshly-resolved activeChatId.
- On dashboard prefill flow, the page mounts on /pilot with location.state.prefill set; activeChatId initializes from sessionStorage's rf-active-chat-id (the previous session). The previous session's task-lane meta matches that chatId, so the initializer restores it. First paint shows old questions/actions. sendPrefill's resetSessionDerivedState fires later from a useEffect, but only after the flash.
- Same pattern hits the senior-pickup flow: ?pickup=true means we're about to render the magic-moment screen and discard whatever chat the senior was previously on, but the underlying chat surface still initializes with their old task-lane meta.

The amplifier
- resetSessionDerivedState wiped the in-memory state but never removed sessionStorage's rf-tasklane-meta. Any remount or reload before the next persistence-effect write could re-hydrate the cleared state from the still-stale sessionStorage entry.

Fixes
- Initializer guard: when location.state.prefill is set OR ?pickup=true is in the URL, skip the sessionStorage restore entirely. Kills the first-paint flash for both entry paths.
- Eager wipe: resetSessionDerivedState now also calls sessionStorage.removeItem('rf-tasklane-meta'). The persistence effect re-saves on the next state change anyway, so the only window where sessionStorage is empty is the exact window where stale-tag leakage was happening.

tsc -b clean. No backend changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- HandoffContextScreen: 3-option layout (Continue/AI analysis/Own thing) with hasTaskLane, activeOptionKey, spinner/disabled states
- AssistantChatPage: wire up handleContinue, handleAIAnalysis, handleOwnThing handlers; chip detail expansion inline with copy-button fix; post-escalation redirect to dashboard on ConcludeSessionModal close
- TaskLane: fix async copy button (await + execCommand fallback + copiedKey visual feedback); whitespace-pre-wrap on command blocks
- Fix 500 on claim: Pydantic v2 model_validate() + model_copy(update={}) (was passing update= kwarg directly which v2 rejects)
- HandoffResponse schema: handed_off_by_name field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Title changed from "WIP: Escalation Mode wedge: live arrival + magic-moment pickup" to "feat(escalations): Escalation Mode wedge — live arrival + magic-moment pickup"