The code-server LXC has bun and docker but no python/node/npm on PATH,
which left Codex unable to reproduce build/test commands. Adds a 6-line
block to PROJECT_CONTEXT.md showing the docker exec resolutionflow_{backend,frontend}
form, and updates the AGENTS.md "Tooling you do NOT have" line to point
Codex at it instead of suggesting toolchain installs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Page-level Resolve patches applied_pending → applied_success before
opening the resolution flow, so resolved sessions don't carry a
provisional pending fix.
- Page-level Escalate intercept now catches applied_pending in addition
to verifying/partial; intercept copy generalized from "Verifying state"
to "still needs an outcome."
- PendingBanner gains a Dismiss action, matching the PR body and the
backend's allowed pending → dismissed transition.
- resolution_note_generator and escalation_package_generator system
prompts no longer include real-looking pending examples (anti-parrot
guardrail compliance).
Verified via Docker: prompt anti-parrot 2/2, suggested-fix outcome suite
21/21, frontend tsc -b clean, npm run build clean.
Co-Authored-By: Codex <noreply@openai.com>
Closes out Escalation Mode (PR #155 merged) and pivots active task to
the new applied_pending suggested-fix outcome on PR #156.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Engineer applies a fix but can't verify yet (waiting on client power-cycle,
AD replication, async sync). Today the verifying banner forces a synchronous
verdict (worked / didn't / partial) — anything else means leaving the banner
stale or guessing wrong. This adds a fourth outcome that parks the fix in a
non-terminal "Awaiting verification" state with a reason ("waiting on what?")
and exposes it on the chat-anchored banner so the engineer doesn't lose track.
Backend
- New non-terminal status `applied_pending` parallel to `applied_partial`.
- New `pending_reason` column (nullable Text) — the "what are you waiting on?"
prose, mirrors `partial_notes`. Required when outcome=applied_pending.
- Outcome endpoint allows pending in/out transitions; pending stamps
applied_at but NOT verified_at (it's parked, not verified).
- Resolution-note + escalation-package prompts handle the new status:
resolution note frames the fix as provisional; escalation package surfaces
pending verification as the leading hypothesis with reference to what's
being waited on.
- Migration: add column + extend status CHECK constraint.
Frontend
- New `BannerMode = 'pending'` + `PendingBanner` component (info-tone,
parallel to PartialBanner) with worked / didn't / update-reason actions.
- VerifyingBanner overflow menu adds "Waiting to verify…".
- Nudge banner's "Still checking" button now actually records pending with
a reason, instead of just silencing for the session.
- AssistantChatPage banner-mode derivation maps applied_pending → 'pending'.
Tests: 4 new integration tests covering pending notes requirement, reason
storage + applied_at/verified_at semantics, pending→success transition,
and pending_reason update on re-PATCH.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Magic-moment handoff-context screen on senior pickup, live SSE escalation arrivals, time-to-first-action metric, role-gated claim with atomic conflict resolution, and chat ownership extension for claimed sessions.
Codex review pass on the escalation wedge. Reworks claim_session from
read-then-write to a conditional UPDATE so two seniors racing can't both
win, blocks the original engineer from claiming their own handoff, and
filters self-escalated sessions out of the dashboard escalation queue.
Also preassigns the handoff UUID before flush so the compatibility
escalation_package payload carries it. Removes legacy frontend pickup
state (claiming, handleStartHere) that broke tsc --noEmit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All paths pass. One critical fix: chat endpoint now allows escalated_to_id
as a valid sender so the senior can run AI analysis on claimed sessions.
PR #155 ready for review.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
unified_chat_service.send_chat_message checked AISession.user_id == user_id,
blocking the senior who claimed an escalation from sending the AI briefing.
Now also allows AISession.escalated_to_id == user_id (the claimer).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- HANDOFF: rewritten resume point. AI summary blocker is the active
task; consolidation plan is the path. 5-step implementation order
with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
live-test results (what works, the AI summary blocker), full
consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
pickup bugs found + fixed, the three Enter/dashboard/timeout
fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
calls into one structured generation" — rejected alternatives
(bump timeout further, copy status-update content the wrong way,
switch to Haiku) and consequences (5s magic-moment, ~60% token
reduction, instant Ticket Notes button, schema enforcement
required, migration concerns documented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bundles four fixes from the live debugging session:
1. AssistantChatPage: replace urlSessionId === activeChatId gate with a
loadedChatIdsRef. After 8914391 made activeChatId initialize from
urlSessionId, the gate short-circuited fresh mounts and selectChat
never fired. Symptom: senior picks up an escalation, lands on a blank
chat surface with no conversation history and no sidebar entry. Fix
also adds loadChats() in handleStartHere so the picked-up session
appears in the sidebar (its escalated_to_id is null pre-claim, so
listSessions doesn't return it until claim_session sets it).
2. config: bump ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS 15s → 45s.
Sonnet was hitting tail latency at 15s in the field, leaving the
magic-moment placeholder permanent. Background-task architecture
(e8ba74e) means this no longer blocks the user; it's just the budget
before publishing has_assessment=false. NOTE: live test still shows
assessment not populating — see HANDOFF for the consolidation plan
that supersedes this.
3. Enter-to-submit: chat-input convention (Enter submits, Shift+Enter
inserts newline) on the escalate-flow forms. RichTextInput gains an
optional onSubmit prop; EscalateModal wires it to handleSubmit;
ConcludeSessionModal gets the same handler on its plain textarea.
4. PendingEscalations: each row is now expandable. Click row body to
reveal the engineer's escalation reason, step count on record,
confidence tier, and PSA ticket number. Pick Up still clicks through
directly. Single-expand-at-a-time keeps the dashboard compact.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- HANDOFF: rewritten resume point. First action on resume is `git push`
(commits 0f00ee5 and 665530f are local-only). Visual QA + bug bash is
the active work; 4 plan-locked items + the structural task-lane fix
all need real-browser verification.
- CURRENT_TASK: add 0f00ee5 and 665530f to the commit table; reframe
"Just shipped" as a per-commit summary; flag the task-lane fix as
needing visual confirmation.
- SESSION_LOG: chronological entry for this session with full detail
(audit, four polish items, race-condition wiring, structural
task-lane fix, test status, files touched).
- DECISIONS: new entry "Tag the task-lane state with an owner chatId"
documenting the structural pattern, what was rejected, and the
forward implication that future task-lane state slices follow the
same owner-tagging pattern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous fix (8914391) only blocked the mount-time sessionStorage
restore when the page entered with prefill or ?pickup=true. It didn't
cover any path where the page was already mounted and activeChatId
flipped without the in-memory task-lane state going through reset+
repopulate cleanly — in-place URL navigation, mid-flight pickup,
HMR re-runs, the gap between setActiveChatId(B) and the AI response
that finally populates B's questions/actions.
Root cause: activeQuestions / activeActions / showTaskLane were never
intrinsically tied to a chatId. They were treated as "the active chat's
data" by convention, with no structural enforcement. Any window where
they survived past their owning chat leaked previous-session data into
the new view. The persistence effect made it worse: it stamped the
sessionStorage chatId field with activeChatId at write time, so a
mid-transition snapshot {chatId: B, questions: [A's]} would happily
restore A's data for B on the next mount.
Fix: introduce taskLaneOwnerChatId state that records the chatId those
in-memory questions/actions/show values BELONG to. Set at every site
that populates them (sendPrefill, selectChat, handleSend, handleTaskSubmit,
handleResumeNew, refreshFacts, handleApplyFix). Cleared in
resetSessionDerivedState. The persistence effect now writes ownerChatId
as the chatId tag, not activeChatId — so the snapshot is always
self-consistent.
Render gate: taskLaneIsForActiveChat = ownerChatId === activeChatId.
ANDed into all three render conditions (toolbar Tasks button, narrow-
viewport floating drawer, main side panel). The lane is structurally
unable to display data tagged with a different chat.
The mount-time skipTaskLaneRestore guard stays — it kills the flash
between component mount and the first sendPrefill effect run, which
the owner-gate alone doesn't cover.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four items from the design-plan audit, all flagged as locked-design or
Codex corrections, shipped together so the GTM demo path covers them
end-to-end before bug bash.
1. Live AI assessment refresh on the magic-moment screen. Backend already
publishes handoff_assessment_ready when enrich_escalation_async commits;
wire the frontend listener so the senior sees the assessment populate
without a manual reopen. New event type + onAssessmentReady handler on
streamEscalations; AssistantChatPage opens a scoped SSE subscription
whenever it tracks a handoff missing its assessment, refetches on match,
and replaces magicHandoff / overlayHandoff in place. Closes the loop on
the async-assessment commit e8ba74e.
2. Suggested-step chips below the chat input. Locked design from the plan
(Codex correction). Chip strip renders above the composer post-claim
when ai_assessment_data.suggested_steps[] is non-empty. Click prefills
the input and focuses; first send or explicit X hides for the session.
3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen
set (rf-escalation-seen, capped 200). Dot top-right when not seen.
Cleared on open (card click) or claim (Pick Up) — NOT on hover, per
Codex correction. Pick Up stops propagation so it doesn't double-fire.
4. Race-condition toast on claim conflict. The /claim endpoint previously
silently overwrote claimed_by — both seniors thought they owned the
session. New HandoffAlreadyClaimedError carries the winner's id/name/
timestamp; claim_session rejects different-user re-claims (same-user is
idempotent for double-click safety); endpoint returns 409 with
structured detail. AssistantChatPage.handleStartHere extracts and
surfaces "Already claimed by {name} {time_ago}." via toast, drops
?pickup=true, dismisses magic-moment so the loser flows back to queue.
Tests: 2 new unit tests in test_handoff_manager.py (conflict raises,
same-user idempotent). Full handoff + escalation suite (34 tests) green.
Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two compounding bugs caused the previous session's questions/actions
to render briefly when entering a new chat — visible as "the new
session instantly pops with old session task-lane data" the user
reported.
The race
- AssistantChatPage's activeQuestions / activeActions / showTaskLane
useState initializers synchronously read sessionStorage's
rf-tasklane-meta. They restore the persisted task-lane state if its
saved chatId matches the freshly-resolved activeChatId.
- On dashboard prefill flow, the page mounts on /pilot with
location.state.prefill set; activeChatId initializes from
sessionStorage's rf-active-chat-id (the previous session). The
previous session's task-lane meta matches that chatId — so the
initializer restores it. First paint shows old questions/actions.
sendPrefill's resetSessionDerivedState fires later from a useEffect,
but only after the flash.
- Same pattern hits the senior-pickup flow: ?pickup=true means we're
about to render the magic-moment screen and discard whatever chat
the senior was previously on, but the underlying chat surface still
initializes with their old task-lane meta.
The amplifier
- resetSessionDerivedState wiped the in-memory state but never
removed sessionStorage's rf-tasklane-meta. Any remount or reload
before the next persistence-effect write could re-hydrate the
cleared state from the still-stale sessionStorage entry.
Fixes
- Initializer guard: when location.state.prefill is set OR
?pickup=true is in the URL, skip the sessionStorage restore
entirely. Kills the first-paint flash for both entry paths.
- Eager wipe: resetSessionDerivedState now also calls
sessionStorage.removeItem('rf-tasklane-meta'). The persistence
effect re-saves on the next state change anyway, so the only
window where sessionStorage is empty is the exact window where
stale-tag leakage was happening.
tsc -b clean. No backend changes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three improvements driven by live wedge testing.
1) Notification title now includes a problem snippet and PSA ticket
suffix when present:
"Escalation from Jane · #12345: Outlook is failing to sync email…"
Replaces the prior "Session escalated by Jane" copy that made every
escalation from the same junior look identical in the bell panel.
Snippet is trimmed to 70 chars with ellipsis. handoff_manager now
passes psa_ticket_id through in the notify() payload so this works
for both /escalate and /handoff entry points.
2) AI enrichment (assessment + enhanced escalation_package) moved to
a FastAPI BackgroundTask. The escalating engineer no longer waits
on 15-25s of Sonnet latency — handoff creation returns as soon as
snapshot, status flip, dual-write, documentation, PSA push, and
notify() are committed. enrich_escalation_async opens its own DB
session, runs both AI calls, updates handoff.ai_assessment +
session.escalation_package, commits, and publishes a new
`handoff_assessment_ready` event on the escalation bus. Frontend
doesn't yet listen for that event — the magic-moment screen still
shows a placeholder ("AI assessment is still generating. Reopen
this view in a few seconds…") which is honest about the state.
Live polling / auto-refresh on the bus event is the natural next
step.
3) ChatSidebar entries now surface the problem summary as a secondary
line and tag PSA-linked sessions with a monospace #ticket badge plus
an "Escalated" pill on in-transit sessions. ChatListItem grew
problem_summary, psa_ticket_id, and status fields; loadChats
populates them from listSessions. The user couldn't tell their own
sessions apart in the sidebar because they all rendered as "New
Chat" with no distinguishing detail — this fixes that for any
session, escalated or not.
Test plan
- Backend full suite: 1103 passed in 255.85s with -n auto.
- Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two field-reported issues from live wedge testing.
ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS bumped 5s → 15s. The 5s bound
fired too aggressively against the Sonnet diagnostic assessment prompt;
~4-8s is typical but tail latency hits 12-14s. The fallback "Assessment
unavailable — model didn't respond in time" placeholder was showing on
the magic-moment screen for two consecutive escalations, which kills
the demo. 15s keeps the click-path bounded but lets the typical case
return real content. Real fix is async generation (kick off, persist
when done, surface "still computing" with refresh) — captured as a
follow-up; bumping the bound is the right call for the wedge demo.
list_sessions now matches escalated_to_id == current_user.id alongside
the existing user_id and escalation_package.picked_up_by clauses. The
unified HandoffManager.claim_session sets escalated_to_id but doesn't
write the legacy picked_up_by JSONB key, so picked-up sessions never
showed in the senior's chat list — the senior would land on the
session detail (active chat) but the sidebar showed only their other
unrelated sessions. User reported this as "4 different versions of the
session in the chat history section" — they were actually 4 unrelated
empty sessions the senior owned, plus the picked-up session was just
invisible. Backend tests still 94/94.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The /pilot/:id route renders AssistantChatPage, not FlowPilotSessionPage
(the latter is dead code with no active route). The earlier magic-moment
integration sat in the wrong file, so clicking Pick Up from the
dashboard navigated to /pilot/:id?pickup=true and AssistantChatPage
just loaded the chat surface with no claim — the senior never saw the
magic-moment screen and the handoff stayed unclaimed (status escalated,
permanently in the queue).
Adds full pickup awareness to AssistantChatPage:
- ?pickup=true on entry triggers a handoff fetch via
handoffsApi.listHandoffs (account-scoped, no claim required).
magicState transitions loading → visible (handoff found) or
loading → dismissed (no handoff or fetch failed). The dismiss path
also strips ?pickup=true from the URL so a refresh doesn't re-enter
loading state.
- The existing selectChat-from-URL effect is gated on magicState — it
skips while we're loading or showing the magic-moment so the chat
surface doesn't race the claim flow. After claim it re-fires and
populates messages from conversation_messages because the senior is
now escalated_to_id and GET succeeds.
- Magic-moment renders as full-page take-over (sidebar hidden) until
Start here. handleStartHere calls handoffsApi.claimHandoff, drops
?pickup=true, and dismisses — the regular chat then loads.
- Toolbar Context button (visible when magicHandoff is in memory)
re-opens the screen as a dismissible overlay. Lazy-fetches the
handoff when needed.
Verified tsc -b clean and Vite HMR picked the file up without errors.
The wire-level integration was already verified in earlier commits:
listHandoffs returns the unclaimed handoff for a senior pre-claim,
claimHandoff flips status escalated → active and sets escalated_to_id.
Note: the prior FlowPilotSessionPage magic-moment integration is now
in dead code (file is unreferenced from router). Left in place for
this commit; will come out in a follow-up cleanup once we're confident
the AssistantChatPage path is solid in production.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Records 029680a — every escalation now funnels through HandoffManager
regardless of which URL it entered through, so /escalate from
EscalateModal produces the full set of artifacts (handoff row,
AppNotification, SSE event, Slack/Teams via notify, per-user emails,
documentation, PSA push) and the bell-icon notification opens the
magic-moment screen end-to-end. Notes the legacy SessionBriefing branch
+ flowpilot_engine.escalate_session as orphaned, scheduled for removal
after pilots have run a couple of weeks on the unified path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the legacy flowpilot_engine.escalate_session orchestration with
a single canonical path through HandoffManager. Every escalation now
creates a SessionHandoff row, fans out via the SSE bus, persists
AppNotification rows for the bell icon, dispatches to external channels
(Slack/Teams) via notify(), and emails per-user — regardless of whether
the call entered through /escalate (legacy URL) or /handoff (new URL).
The senior-pickup magic-moment screen now works end-to-end from the
EscalateModal bell-icon path the user just tested.
Backend
- HandoffCreateRequest gains optional target_user_id (the equivalent of
the legacy escalated_to_id field). Self-targeting rejected.
- HandoffManager.create_handoff handles intent='escalate' end-to-end:
sets escalation_reason + escalated_to_id, builds the legacy enhanced
AI escalation_package (Sonnet, lazy-imported from flowpilot_engine,
graceful fallback on failure), and merges handoff metadata into it.
Eager-loads session.steps and session.user via selectinload — required
by both the enhanced-package builder and notify() to avoid
MissingGreenlet on async lazy access.
- HandoffManager.finalize_escalation generates SessionDocumentation,
pushes documentation to PSA, and runs notify() — pre-commit so the
AppNotification rows persist atomically with the handoff.
- HandoffManager.dispatch_escalation_notifications keeps only the
fire-and-forget IO (bus publish, per-user emails) — runs post-commit.
Pulls engineer name via a separate User query rather than relying on
session.user lazy access.
- /handoff endpoint passes target_user_id through and calls
finalize_escalation pre-commit.
- /escalate endpoint is now a thin shim: owner-only session lookup,
HandoffManager.create_handoff(intent='escalate'), finalize_escalation,
commit, dispatch_escalation_notifications, return SessionCloseResponse
built from documentation + psa_result. flowpilot_engine.escalate_session
is no longer called by any endpoint.
- pickup_session accepts both 'requesting_escalation' (legacy in-flight
sessions) and 'escalated' (new canonical) so the migration is seamless
for sessions already in the queue.
- Escalation queue list and sidebar count now match either status.
Frontend
- useFlowPilotSession optimistic update flips status to 'escalated'
instead of 'requesting_escalation' so the page state matches the
unified backend response.
Verified end-to-end live: a fresh /escalate call from the junior produces
status='escalated', a SessionHandoff row, a SessionDocumentation, PSA
push attempted (no_psa for this test session), AND a bell-icon
AppNotification for the team admin with link
/pilot/{session_id}?pickup=true. Backend test suite: 1103 passed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates the handoff trio after the legacy notification flow fix and
the branch push. PR #155 is open against main as draft. Resume point
is now visual QA via /qa, then deferred follow-ups (chat-input
suggested-step chips, snapshot expansion). Logs the open question
about whether EscalateModal should switch to /handoff.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two backend changes that unbreak the senior-pickup path from the
notification panel:
1. notification_service: session.escalated link template now ends with
?pickup=true so the senior lands in the handoff/pickup flow on
click. Without it, navigation hit /pilot/:id directly, which then
404'd on the GET because the senior isn't yet escalated_to_id —
the user perceives this as the bell-icon "just clearing the
notification".
2. ai_sessions GET access: any account member can now read an escalated
session's detail when status is requesting_escalation or escalated.
The owner-only guard was overly restrictive for explicitly-shared
in-transit states. Tenant boundary is enforced by RLS on the
underlying query, so account-scope is the right ceiling here. After
pickup, the existing handler/escalated_to_id checks still apply.
Verified live: re-login as the senior engineer and GET the active
escalated session — now returns 200 with full detail. Focused test
subset plus tests/test_sessions.py and tests/test_session_sharing.py
→ 94 passed in 43.26s, no regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marks the magic-moment handoff-context screen as shipped, points the
next session at visual QA + push + draft PR, and captures the deferred
follow-ups (suggested-step chips, snapshot expansion, toolbar button
on revisits, owner analytics, Playwright e2e).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the dedicated 4-section handoff-context view that renders BEFORE
the FlowPilot session for senior techs picking up an escalated
session, then dissolves on "Start here". This is the wedge's
demonstrable magic moment — what the GTM Loom records.
- HandoffContextScreen.tsx: pure presentational, takes a HandoffResponse
plus onStartHere / onDismiss callbacks. Sections: header
(problem summary, domain, step count, escalated-time, priority badge),
"What's been tried" (engineer notes + step-count affordance), "AI
assessment" (likely_cause / suggested_steps / confidence badge), Start
here CTA. Confidence badge accepts both numeric (0..1) and string
("low"/"medium"/"high") shapes — backend currently emits the latter.
Renders an explicit "assessment unavailable" branch when
ai_assessment_data is null (the 5s timeout from 9bdd995 fired).
Honors prefers-reduced-motion (animate-fade-in vs animate-slide-up).
ARIA dialog + focus on the primary CTA. Esc dismisses when used as a
re-openable overlay; pre-claim, Start here is the only exit.
- FlowPilotSessionPage.tsx: on /pilot/:id?pickup=true, fetch the
handoff list via handoffsApi.listHandoffs (account-scoped via RLS,
no claim required) and find the latest unclaimed escalate handoff.
If found, render the magic-moment screen and skip the regular
loadSession (the senior isn't yet escalated_to_id, so GET would
404). Start here calls claimHandoff, drops the pickup query param,
dismisses the screen — the existing loadSession effect then fires
because the senior is now escalated_to_id. A "Context" toolbar
button on active sessions re-opens the screen as a dismissible
overlay (visible only when the senior arrived via the magic-moment
flow this session — handoff lookup on demand).
Verified end-to-end against the running dev stack: listHandoffs
returns the unclaimed handoff with full payload; claim flips session
status from escalated → active; subsequent GET succeeds. tsc -b clean.
Defers (TODO followups): suggested-step chips below the chat input
that prefill on click (requires threading through to
FlowPilotMessageBar); snapshot expansion to include the recent
diagnostic steps pre-claim; toolbar Context button on sessions where
the senior didn't arrive via magic-moment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marks the SSE subscription as shipped, points the next-session resume
target at the magic-moment handoff-context screen, and logs the live
end-to-end verification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the frontend live-arrival slice on top of the test-stabilized SSE
backend. Senior techs now see a junior's escalation slide into the
queue without refresh.
- streamEscalations(handlers, signal) in aiSessions.ts: fetch-based
ReadableStream parser (native EventSource cannot send auth headers).
Handles SSE frames, partial frames across chunks, : keepalive
heartbeats. Dispatches ready and handoff_created.
- HandoffCreatedEvent + EscalationStreamHandlers types mirror the bus
payload published by HandoffManager.dispatch_escalation_notifications.
- EscalationQueue.tsx: AbortController-managed subscription with
exponential-backoff reconnect (1s → 30s cap, attempt counter resets
on ready). On handoff_created, refetch and diff against previous IDs
via sessionsRef; new arrivals prepended (newest-first) above
established cards (oldest-first preserved). Slide-in tag held for
800ms so the locked 200ms animation completes. Tab-title flash
prefixes (N) while document.hidden, restores on focus / unmount.
prefers-reduced-motion swaps slide-in for fade-in. ARIA region +
aria-live=polite + aria-label on heading. Pick Up bumped to py-2.5
to clear the 44px touch floor.
Verified end-to-end against the running dev stack: subscriber received
the ready frame on connect; after posting a handoff via the API, the
subscriber received the handoff_created frame with the expected
payload — wire format matches the parser. Backend regression: focused
subset still 32 passed in 18.91s. Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Default Claude Code model is being switched from Opus 4.7 1M-context to
Opus 4.7 (200k). Tighten the per-session pickup docs so they're
self-sufficient under the smaller window:
- CURRENT_TASK now reflects the post-Codex state: 8 commits on the
branch (5 feat + WIP SSE + 2 Codex test/latency fixes + 1 doc
refresh), 32/32 backend tests with -n auto, frontend tsc -b clean.
Remaining work re-scoped: the SSE backend half is feature-complete
and tested, so what's left is the FRONTEND SSE subscription in
EscalationQueue.tsx, then the magic-moment handoff-context screen,
then push + draft PR.
- Session log gets a Claude Code entry covering today's planning →
build → pause-for-Codex arc, the design decisions locked into the
doc and code, the two TODOs added (peer-tech escalation, mobile
responsive), and the model-switch context for the next session.
- HANDOFF.md needs no change — Codex's update in 9bdd995 already
describes the resume point and watch-outs cleanly.
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Update HANDOFF to reflect:
- Build paused after the WIP SSE commit (87bd0b7)
- What Codex should look at on the SSE bus + endpoint + dispatch wiring
- Resume point post-review: re-run tests with -n auto, then frontend
SSE subscription, then magic-moment screen
- Test-suite watch-out: per-test DROP SCHEMA fixture means concurrent
pytest runs on the same DB collide; always one-suite-at-a-time or
-n auto with conftest's per-worker DB isolation
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.
What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
pub/sub keyed by account_id. asyncio.Queue per subscriber with
64-event maxsize and drop-on-full semantics. Designed to be swappable
for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
/api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
now publishes a `handoff_created` event to the bus BEFORE the email
fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
(publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
blocked, ready event handshake).
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer
moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so
neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the
existing streamDocumentation pattern, not native EventSource
(custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in,
reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE +
dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test
DROP SCHEMA + recreate fixture made wall-clock prohibitive when
multiple stale runs collided on the same Postgres test schema.
Resolution: -n auto next time.
[/gstack-context]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:
- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
this branch, what remains (WebSocket push, magic-moment screen, analytics
page, e2e), and the two-metric framing readers MUST internalize before
quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
notification dual-path); includes suggested first slice + watch-outs
(no user_id on ai_session_step, denormalized account_id, peer-escalation
still gated to session owner)
- Both files reference the design doc and the kill-switch criterion
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surfaces the new GET /analytics/flowpilot/escalations endpoint as a card
above the EscalationQueue list. Closes the loop from yesterday's metric
endpoint commit — seniors and owners see the wedge stat the moment they
open the queue, which is the daily-reps version of the GTM ROI story.
Pieces:
- EscalationMetrics TS interface mirroring the backend Pydantic model
(incl. metric_definition disclaimer field)
- flowpilotAnalyticsApi.getEscalationMetrics(period) client method
- EscalationMetricCard component:
* loading skeleton, error state, zero-data empty state
* avg + median + n_with_action/n_claimed conversion rate
* humanized seconds → "Ns" / "N.N min" formatting
* inline disclaimer reminding callers this is in-product time-to-
first-action only, NOT the savings claim — pair with manual
baseline (per /codex review's two-metric correction)
- Wired into EscalationQueuePage above EscalationQueue
DS-aligned: card-flat, accent-dim usage held to interactive elements,
text-muted-foreground for secondary copy, font-heading on the headline
number, explicit transition properties (no `transition: all`). Respects
prefers-reduced-motion implicitly (only animation is the loading pulse,
which Tailwind's animate-pulse already gates).
tsc -b clean. No new tests in this commit — component is a thin
state-machine over an axios call; integration coverage comes from the
existing backend tests + the e2e Playwright work in the plan.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.
HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
(excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
creation is authoritative; notification is advisory. This is the
graceful-degradation regression both eng + codex reviews flagged as
critical (handoff must succeed even if SMTP is down).
Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.
Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST
Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.
20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.
Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate
Existing handoff create + queue tests unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}
Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.
The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.
Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
aggregates avg/median/p95 in Python (~1k rows/account/month is well within
budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases
Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.
First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI
specs, and embedded review report for the Escalation Mode wedge — output of
/office-hours, /plan-eng-review, /plan-design-review, and /codex review.
Codex review surfaced two corrections we applied:
- two-metric framing (manual baseline vs in-product time-to-first-action)
- claim role gate moved in-scope (was deferred TODO)
TODO updates: peer-tech escalation + claim role gate captured (the latter then
moved in-scope by the codex pass).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CURRENT_TASK rolls forward — PR #153 closed out, no active task,
with recommended next moves (promote e2e gate to required, pick
from TODO).
- HANDOFF rewritten — new home position is `main`; documents the
e2e job's stub ANTHROPIC_API_KEY convention so future
AI-touching e2e tests know what to expect.
- SESSION_LOG entry extended with the CI env-var diagnosis, the
fix, the merge, and pointers to the natural next pickups.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes a silent-drop bug where the dashboard prefill flow created a new chat session but didn't update the in-flight guard ref, so subsequent task-lane submissions had their AI follow-up responses discarded.
Includes a Playwright regression test that drives the prefill flow and stubs /ai-sessions/*/chat to verify the second AI turn renders. Also adds a stub ANTHROPIC_API_KEY to the e2e CI job so AI-gated endpoints clear their _require_ai_enabled() check (the chat call itself is intercepted in the browser, so no real Anthropic traffic).
POST /api/v1/ai-sessions and friends call _require_ai_enabled(), which
returns 503 when no provider key is set. The new prefill-handoff
regression test (e2e/assistant-chat-prefill.spec.ts) drives the
dashboard prefill flow, which has to create a chat session before its
page.route stub on /chat can fire — so without a key, session
creation 503s and the test never sees the task lane.
The Playwright stub intercepts /chat in the browser, so the backend
never actually contacts Anthropic — but the AI-enabled gate still
needs to pass. A stub value is enough.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CURRENT_TASK.md rolled forward — the CI-recovery task is complete
(PR #150 merged as 87bb20b; backend gate is in required checks).
Active task is now landing PR #153.
- HANDOFF.md rewritten — new resume point is watching CI on the
rebased SHA 1559feb and merging when all three checks are green.
- SESSION_LOG.md gains a 2026-04-26 entry covering the prefill bug
diagnosis, fix, regression test, and the rebase off post-#150 main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The guard pattern that masked the prefill-ref bug fixed in PR #153 is
applied across handleSend, handleTaskSubmit, selectChat, refreshFacts,
refreshActiveFix, and refreshPreview. Worth either logging the
mismatch path or distinguishing expected-stale from unexpected-stale
so the next instance of this class of bug surfaces instead of hiding.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The dashboard prefill flow in AssistantChatPage set activeChatId after
creating a new session but never updated currentChatRef.current. Every
later handleSend / handleTaskSubmit then tripped the
`currentChatRef.current !== sentForChatId` guard that was supposed to
discard responses for stale chats — and silently dropped the AI's
follow-up. The user saw their submitted message but no assistant
reply, no toast, no task-lane update.
Mirrors what handleNewChat and handleResumeNew already do. Adds an
e2e regression test that drives the dashboard prefill, submits a
partial task-lane response, and asserts the second AI turn renders.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Before: e2e \`needs: [frontend]\` waited for the frontend job to upload
a build artifact, then downloaded it. With multiple runners this means
the third runner sat idle for ~6 min while frontend ran, then started
e2e — total wall-clock max(backend, frontend+e2e) ≈ 11 min.
After: e2e builds its own frontend (npm ci + npm run build are already
in the job; just dropped the artifact download step and added the
build). e2e starts immediately on a free runner. Adds ~1-2 min to the
e2e job duration but removes ~5 min of waiting and eliminates the
cross-job artifact mechanism entirely.
Side benefit: no more \`actions/upload-artifact\` v3/v4 GHES headaches
on the cross-job handoff. The \`if: always()\` upload of the
playwright-report at the end of e2e is kept (failure report retrieval
is still useful), but it's a leaf-output, not a dependency.
Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the
3-runner setup, down from ~11 min.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>