Codex review pass on the escalation wedge. Reworks claim_session from
read-then-write to a conditional UPDATE so two seniors racing can't both
win, blocks the original engineer from claiming their own handoff, and
filters self-escalated sessions out of the dashboard escalation queue.
Also preassigns the handoff UUID before flush so the compatibility
escalation_package payload carries it. Removes legacy frontend pickup
state (claiming, handleStartHere) that broke tsc --noEmit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four items from the design-plan audit, all flagged as locked-design or
Codex corrections, shipped together so the GTM demo path covers them
end-to-end before bug bash.
1. Live AI assessment refresh on the magic-moment screen. Backend already
publishes handoff_assessment_ready when enrich_escalation_async commits;
wire the frontend listener so the senior sees the assessment populate
without a manual reopen. New event type + onAssessmentReady handler on
streamEscalations; AssistantChatPage opens a scoped SSE subscription
whenever it tracks a handoff missing its assessment, refetches on match,
and replaces magicHandoff / overlayHandoff in place. Closes the loop on
the async-assessment commit e8ba74e.
2. Suggested-step chips below the chat input. Locked design from the plan
(Codex correction). Chip strip renders above the composer post-claim
when ai_assessment_data.suggested_steps[] is non-empty. Click prefills
the input and focuses; first send or explicit X hides for the session.
3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen
set (rf-escalation-seen, capped 200). Dot top-right when not seen.
Cleared on open (card click) or claim (Pick Up) — NOT on hover, per
Codex correction. Pick Up stops propagation so it doesn't double-fire.
4. Race-condition toast on claim conflict. The /claim endpoint previously
silently overwrote claimed_by — both seniors thought they owned the
session. New HandoffAlreadyClaimedError carries the winner's id/name/
timestamp; claim_session rejects different-user re-claims (same-user is
idempotent for double-click safety); endpoint returns 409 with
structured detail. AssistantChatPage.handleStartHere extracts and
surfaces "Already claimed by {name} {time_ago}." via toast, drops
?pickup=true, dismisses magic-moment so the loser flows back to queue.
Tests: 2 new unit tests in test_handoff_manager.py (conflict raises,
same-user idempotent). Full handoff + escalation suite (34 tests) green.
Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three improvements driven by live wedge testing.
1) Notification title now includes a problem snippet and PSA ticket
suffix when present:
"Escalation from Jane · #12345: Outlook is failing to sync email…"
Replaces the prior "Session escalated by Jane" copy that made every
escalation from the same junior look identical in the bell panel.
Snippet is trimmed to 70 chars with ellipsis. handoff_manager now
passes psa_ticket_id through in the notify() payload so this works
for both /escalate and /handoff entry points.
2) AI enrichment (assessment + enhanced escalation_package) moved to
a FastAPI BackgroundTask. The escalating engineer no longer waits
on 15-25s of Sonnet latency — handoff creation returns as soon as
snapshot, status flip, dual-write, documentation, PSA push, and
notify() are committed. enrich_escalation_async opens its own DB
session, runs both AI calls, updates handoff.ai_assessment +
session.escalation_package, commits, and publishes a new
`handoff_assessment_ready` event on the escalation bus. Frontend
doesn't yet listen for that event — the magic-moment screen still
shows a placeholder ("AI assessment is still generating. Reopen
this view in a few seconds…") which is honest about the state.
Live polling / auto-refresh on the bus event is the natural next
step.
3) ChatSidebar entries now surface the problem summary as a secondary
line and tag PSA-linked sessions with a monospace #ticket badge plus
an "Escalated" pill on in-transit sessions. ChatListItem grew
problem_summary, psa_ticket_id, and status fields; loadChats
populates them from listSessions. The user couldn't tell their own
sessions apart in the sidebar because they all rendered as "New
Chat" with no distinguishing detail — this fixes that for any
session, escalated or not.
Test plan
- Backend full suite: 1103 passed in 255.85s with -n auto.
- Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the legacy flowpilot_engine.escalate_session orchestration with
a single canonical path through HandoffManager. Every escalation now
creates a SessionHandoff row, fans out via the SSE bus, persists
AppNotification rows for the bell icon, dispatches to external channels
(Slack/Teams) via notify(), and emails per-user — regardless of whether
the call entered through /escalate (legacy URL) or /handoff (new URL).
The senior-pickup magic-moment screen now works end-to-end from the
EscalateModal bell-icon path the user just tested.
Backend
- HandoffCreateRequest gains optional target_user_id (the equivalent of
the legacy escalated_to_id field). Self-targeting rejected.
- HandoffManager.create_handoff handles intent='escalate' end-to-end:
sets escalation_reason + escalated_to_id, builds the legacy enhanced
AI escalation_package (Sonnet, lazy-imported from flowpilot_engine,
graceful fallback on failure), and merges handoff metadata into it.
Eager-loads session.steps and session.user via selectinload — required
by both the enhanced-package builder and notify() to avoid
MissingGreenlet on async lazy access.
- HandoffManager.finalize_escalation generates SessionDocumentation,
pushes documentation to PSA, and runs notify() — pre-commit so the
AppNotification rows persist atomically with the handoff.
- HandoffManager.dispatch_escalation_notifications keeps only the
fire-and-forget IO (bus publish, per-user emails) — runs post-commit.
Pulls engineer name via a separate User query rather than relying on
session.user lazy access.
- /handoff endpoint passes target_user_id through and calls
finalize_escalation pre-commit.
- /escalate endpoint is now a thin shim: owner-only session lookup,
HandoffManager.create_handoff(intent='escalate'), finalize_escalation,
commit, dispatch_escalation_notifications, return SessionCloseResponse
built from documentation + psa_result. flowpilot_engine.escalate_session
is no longer called by any endpoint.
- pickup_session accepts both 'requesting_escalation' (legacy in-flight
sessions) and 'escalated' (new canonical) so the migration is seamless
for sessions already in the queue.
- Escalation queue list and sidebar count now match either status.
Frontend
- useFlowPilotSession optimistic update flips status to 'escalated'
instead of 'requesting_escalation' so the page state matches the
unified backend response.
Verified end-to-end live: a fresh /escalate call from the junior produces
status='escalated', a SessionHandoff row, a SessionDocumentation, PSA
push attempted (no_psa for this test session), AND a bell-icon
AppNotification for the team admin with link
/pilot/{session_id}?pickup=true. Backend test suite: 1103 passed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.
What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
pub/sub keyed by account_id. asyncio.Queue per subscriber with
64-event maxsize and drop-on-full semantics. Designed to be swappable
for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
/api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
now publishes a `handoff_created` event to the bus BEFORE the email
fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
(publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
blocked, ready event handshake).
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer
moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so
neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the
existing streamDocumentation pattern, not native EventSource
(custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in,
reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE +
dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test
DROP SCHEMA + recreate fixture made wall-clock prohibitive when
multiple stale runs collided on the same Postgres test schema.
Resolution: -n auto next time.
[/gstack-context]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.
HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
(excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
creation is authoritative; notification is advisory. This is the
graceful-degradation regression both eng + codex reviews flagged as
critical (handoff must succeed even if SMTP is down).
Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.
Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST
Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.
20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.
Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate
Existing handoff create + queue tests unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four endpoints: create handoff (park/escalate), list handoff history,
claim session, and team queue. Two routers: session-scoped router with
prefix /ai-sessions/{session_id} and queue_router with prefix /ai-sessions.
queue_router registered before ai_sessions.router to avoid /{session_id}
path conflict on GET /ai-sessions/queue.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>