docs(ai): refresh handoff for compute swap
- HANDOFF: rewritten resume point. First action on resume is `git push` (commits0f00ee5and665530fare local-only). Visual QA + bug bash is the active work; 4 plan-locked items + the structural task-lane fix all need real-browser verification. - CURRENT_TASK: add0f00ee5and665530fto the commit table; reframe "Just shipped" as a per-commit summary; flag the task-lane fix as needing visual confirmation. - SESSION_LOG: chronological entry for this session with full detail (audit, four polish items, race-condition wiring, structural task-lane fix, test status, files touched). - DECISIONS: new entry "Tag the task-lane state with an owner chatId" documenting the structural pattern, what was rejected, and the forward implication that future task-lane state slices follow the same owner-tagging pattern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -2,62 +2,54 @@
|
||||
|
||||
# HANDOFF.md
|
||||
|
||||
**Last updated:** 2026-04-27 22:30 EDT
|
||||
**Last updated:** 2026-04-28 02:00 EDT
|
||||
|
||||
**Active task:** **Escalation Mode** wedge build. See [`CURRENT_TASK.md`](CURRENT_TASK.md) for the full status; this file holds the resume point only.
|
||||
**Active task:** **Escalation Mode** wedge build. Full status in [`CURRENT_TASK.md`](CURRENT_TASK.md); this file is the resume point.
|
||||
|
||||
**Branch:** `feat/escalation-metric-endpoint` — pushed (latest: `029680a`). **Draft PR #155** open against `main` ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Wedge is feature-complete pending visual QA + the deferred follow-ups in `CURRENT_TASK.md`. **/escalate and /handoff are unified** — every escalation goes through `HandoffManager` and produces the full set of artifacts (handoff row, AppNotification, SSE bus event, Slack/Teams via `notify()`, per-user emails, documentation, PSA push) regardless of which URL it entered through.
|
||||
**Branch:** `feat/escalation-metric-endpoint`. Local tip is `665530f`. **Remote (origin) is at `8914391`** — the last two commits (`0f00ee5`, `665530f`) are local-only because the user is swapping computers and asked for the docs/handoff first. **Push needed on next session before continuing work.** Draft PR #155 is open against `main`.
|
||||
|
||||
## Status
|
||||
## What this session did
|
||||
|
||||
Previous session shipped the two remaining frontend slices: live-arrival SSE subscription in `EscalationQueue.tsx`, and the magic-moment `HandoffContextScreen` for senior pickup.
|
||||
Two commits, both untested in a real browser:
|
||||
|
||||
What landed (commits added to the branch):
|
||||
1. **`0f00ee5` feat(escalations): close out plan-locked wedge polish.** Four items from the design-plan audit ([`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md)):
|
||||
- **Live AI assessment refresh** — frontend listener for the `handoff_assessment_ready` SSE event, refetches the handoff and updates `magicHandoff` / `overlayHandoff` in place. Closes the async-assessment loop from `e8ba74e`.
|
||||
- **Suggested-step chips** below the composer in `AssistantChatPage` — surfaces `ai_assessment_data.suggested_steps[]` post-claim, click prefills the input, hides on first send or explicit X.
|
||||
- **Unread 6px dot** on `EscalationQueue` cards — localStorage-persisted seen set (`rf-escalation-seen`), clears on open OR claim (NOT hover; Codex correction).
|
||||
- **Race-condition toast on claim conflict** — new `HandoffAlreadyClaimedError` exception, endpoint returns 409 with structured `{claimed_by_id, claimed_by_name, claimed_at}`, frontend shows `"Already claimed by {name} {time_ago}."` and bounces the loser back to the queue. Backed by 2 new tests; full handoff/escalation suite (34 tests) green.
|
||||
|
||||
- `b8627f4` feat(escalations): subscribe EscalationQueue to live SSE arrivals — `streamEscalations` in `aiSessions.ts` (fetch-based `ReadableStream` parser; native `EventSource` can't send auth headers); `HandoffCreatedEvent` + `EscalationStreamHandlers` types; `EscalationQueue.tsx` rewrite with `AbortController`-managed subscription, exponential-backoff reconnect (1s → 30s cap, resets on `ready`), prepend-on-arrival with locked 200ms slide-in, tab-title `(N)` prefix while `document.hidden`, `prefers-reduced-motion` swap, ARIA live region.
|
||||
- `f65b657` docs(ai): handoff state after frontend SSE slice lands.
|
||||
- `8e9d22e` feat(escalations): magic-moment handoff-context screen on pickup — new `HandoffContextScreen.tsx` (4 sections; renders gracefully when `ai_assessment` is null per the 5s timeout from `9bdd995`; ARIA dialog + focus on primary CTA + Esc dismiss for re-open overlay; `prefers-reduced-motion` honored). `FlowPilotSessionPage.tsx` integration: on `?pickup=true`, fetch the handoff list first (account-scoped via RLS, no claim required), find the latest unclaimed escalate handoff, render the screen and skip `loadSession` (senior would 404 pre-claim). "Start here" calls `claimHandoff`, drops the pickup query, and dismisses — `loadSession` then fires because senior is now `escalated_to_id`. Toolbar "Context" button on active sessions re-opens the screen as a dismissible overlay (visible only when senior arrived via the magic-moment flow this session).
|
||||
- `c194ba4` docs(ai): handoff state after magic-moment screen lands.
|
||||
- `641853a` fix(escalations): bell-icon notification opens the pickup flow — `_build_notification_link` for `session.escalated` now ends with `?pickup=true` so notification clicks route through the senior-pickup flow. `GET /ai-sessions/{id}` now allows account-scoped read for `requesting_escalation` / `escalated` status (RLS already enforces tenant boundary; the owner-only guard was overly restrictive for explicitly-shared in-transit states). Without these two fixes the user observed bell-icon clicks "just clearing the notification" — the navigation was happening but landing on a 404 the senior couldn't escape from.
|
||||
- `2a2329a` docs(ai): handoff state after bell-icon fix; record draft PR #155.
|
||||
- `029680a` feat(escalations): unify `/escalate` through `HandoffManager` — single canonical path for every escalation. `HandoffCreateRequest.target_user_id` added (rejects self-targeting). `HandoffManager.create_handoff` for intent='escalate' now sets `session.escalation_reason` + `escalated_to_id`, builds the legacy AI-enhanced escalation_package via Sonnet (lazy-import from flowpilot_engine, graceful fallback), and merges handoff metadata into it; eager-loads `session.steps` + `session.user` to dodge async lazy-load greenlet errors. New `HandoffManager.finalize_escalation` runs `_generate_documentation` + `_push_to_psa` + `notify()` pre-commit so the AppNotification rows and PSA writes land atomically with the handoff. `dispatch_escalation_notifications` keeps only fire-and-forget IO (bus publish + per-user emails) post-commit. The `/escalate` endpoint is a thin shim: owner-only session lookup → `create_handoff(intent='escalate')` → `finalize_escalation` → commit → `dispatch_escalation_notifications` → return `SessionCloseResponse`. `flowpilot_engine.escalate_session` is no longer called by any endpoint. `pickup_session` accepts both `requesting_escalation` and `escalated` for in-flight migration. Escalation queue list + sidebar count match either status.
|
||||
2. **`665530f` fix(assistant-chat): tag task-lane state with owner chatId.** Structural fix for the recurring "new session shows previous session's task lane" bug. The earlier fix `8914391` only covered the mount-time entry path; this change makes stale data structurally unable to display by adding `taskLaneOwnerChatId` state and a render gate `taskLaneOwnerChatId === activeChatId` ANDed into all three render conditions. Persistence effect now writes ownership chatId, not active chatId — that was the original write-side bug. See [`DECISIONS.md`](DECISIONS.md) for the architecture write-up.
|
||||
|
||||
Verified:
|
||||
|
||||
- `tsc -b` exit 0 after each frontend commit.
|
||||
- Full backend test suite after unification: `1103 passed in 259.63s` with `-n auto`.
|
||||
- Live SSE handshake against the running dev stack: 200 + `text/event-stream`; `ready` frame on connect; `handoff_created` frame with full payload arrived after posting a handoff via the API. Wire format matches the parser exactly.
|
||||
- Live claim flow against the running dev stack: `listHandoffs` returns the unclaimed handoff for a senior pre-claim; `claimHandoff` flips session status from `escalated` → `active` and sets `escalated_to_id`; subsequent `GET /ai-sessions/{id}` succeeds.
|
||||
- Live access-policy verification: senior (non-owner, non-target) can now `GET` an in-transit escalated session detail.
|
||||
- Live unification verification: a single legacy `/escalate` call from a junior produced status='escalated', a `SessionDocumentation`, a `SessionHandoff` row, an attempted PSA push (`no_psa` since no ticket linked), AND an `AppNotification` row for the team admin with title "Session escalated by Jordan Tech" and link `/pilot/{session_id}?pickup=true`. The bell-icon click now lands the senior in the magic-moment flow with the actual handoff data.
|
||||
|
||||
Not yet verified (would need a real browser session): the slide-in animation visually plays, tab title actually updates, reduced-motion media-query path renders, AbortController cleanup on unmount, exponential backoff after a real network blip, the magic-moment screen layout/typography looks right, dissolve transition feels right. Wire contract + integration semantics are confirmed; visuals are next.
|
||||
|
||||
Smoke-test artifact: a single test handoff (`0f6149db…` on session `50ea20d4…`) was claimed during verification and is now an `active` session owned by the engineer test user. Harmless; useful as visual demo data.
|
||||
Verified: `tsc -b` clean after both. Backend handoff/escalation suite (34 tests) green. **Not verified:** anything in a real browser. The user explicitly asked for a debugging session after implementation — that's the next thing.
|
||||
|
||||
## Resume point
|
||||
|
||||
1. **Visual QA via `/qa` against the dev stack.** End-to-end demo flow: junior escalates via EscalateModal → senior gets bell-icon notification → senior clicks the notification (now routes through `?pickup=true`) → magic-moment screen renders with the rich handoff data → Start here → FlowPilot session view loads. Also: open `/escalations` as senior with a second session escalating in the background, watch the slide-in + tab-title flash. The PR description has a checklist mirroring this.
|
||||
2. **Pick up the deferred follow-ups** in `CURRENT_TASK.md`. Highest-leverage: suggested-step chips below the chat input (Codex correction, locked design — needs threading through `FlowPilotSession` → `FlowPilotMessageBar`). Next: `HandoffManager._generate_snapshot` expansion to include the recent diagnostic timeline pre-claim — though this is lower-priority now that the unified path already merges the legacy enriched escalation_package into the dual-write, so the magic-moment screen has access to `steps_tried` / `remaining_hypotheses` / `suggested_next_steps` once it's wired to read them.
|
||||
3. Optional v1: owner-facing `/analytics/escalations` page; Playwright e2e for the GTM Loom demo path.
|
||||
4. Eventual cleanup: `flowpilot_engine.escalate_session` is no longer called by any endpoint and could be deleted; the legacy `SessionBriefing` render branch in `FlowPilotSessionPage.tsx` is effectively dead code for any new escalation (magic-moment takes over) but still useful for in-flight legacy `requesting_escalation` sessions during the transition window. Both can come out after pilots have run a couple of weeks on the unified path.
|
||||
1. **First action: `git push` the two local commits.** `0f00ee5` and `665530f` are local-only.
|
||||
2. **Visual QA + bug bash.** End-to-end demo flow:
|
||||
- Junior escalates → senior gets bell-icon notification → click → magic-moment screen with **placeholder AI assessment** (because it's now async/background) → assessment populates **in place** within ~5–15s without manual reopen → Start here → chat surface loads with **suggested-step chips** above the composer → click a chip prefills input.
|
||||
- On `/escalations`: backgrounded tab gets `(N)` title prefix when an arrival fires; new card has **6px accent dot** top-right; clicking the card body OR Pick Up clears the dot (verify it persists across refresh, doesn't clear on hover).
|
||||
- Race condition: claim the same handoff from two browsers; loser sees toast `"Already claimed by {name} {time_ago}."` and bounces.
|
||||
- **Task-lane regression check:** create a new session via dashboard prefill / pickup / "New Chat" — the lane must NOT flash the previous session's questions/actions. The user previously reported this happening repeatedly; the fix in `665530f` should kill it. If it still happens, that's the next debug target.
|
||||
3. **Deferred follow-ups in `CURRENT_TASK.md`:** snapshot expansion, owner-facing `/analytics/escalations` page, Playwright e2e for the GTM Loom demo path, eventual cleanup of `flowpilot_engine.escalate_session` and the dead `FlowPilotSessionPage.tsx` magic-moment branch.
|
||||
|
||||
## Useful breadcrumbs
|
||||
|
||||
- SSE endpoint: [`backend/app/api/endpoints/session_handoffs.py`](../backend/app/api/endpoints/session_handoffs.py) — `stream_escalations`.
|
||||
- Pub/sub bus: [`backend/app/core/escalation_bus.py`](../backend/app/core/escalation_bus.py).
|
||||
- Frontend SSE consumer: [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts) → `streamEscalations`.
|
||||
- Frontend SSE consumer: [`frontend/src/api/aiSessions.ts`](../frontend/src/api/aiSessions.ts) → `streamEscalations` (now dispatches `handoff_created` AND `handoff_assessment_ready`).
|
||||
- Live-arrival queue UI: [`frontend/src/components/flowpilot/EscalationQueue.tsx`](../frontend/src/components/flowpilot/EscalationQueue.tsx).
|
||||
- Magic-moment screen: [`frontend/src/components/flowpilot/HandoffContextScreen.tsx`](../frontend/src/components/flowpilot/HandoffContextScreen.tsx).
|
||||
- Pickup integration: [`frontend/src/pages/FlowPilotSessionPage.tsx`](../frontend/src/pages/FlowPilotSessionPage.tsx) — `magicState`, `handleStartHere`, `openHandoffContextOverlay`.
|
||||
- Notification dispatch: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `dispatch_escalation_notifications`.
|
||||
- Pickup integration + magic state machine + suggested-step chips + assessment-ready subscription + claim 409 handling + task-lane owner tagging: [`frontend/src/pages/AssistantChatPage.tsx`](../frontend/src/pages/AssistantChatPage.tsx).
|
||||
- Claim conflict exception: [`backend/app/services/handoff_manager.py`](../backend/app/services/handoff_manager.py) — `HandoffAlreadyClaimedError`, `claim_session`, `enrich_escalation_async`.
|
||||
- Metric endpoint: [`backend/app/api/endpoints/flowpilot_analytics.py`](../backend/app/api/endpoints/flowpilot_analytics.py).
|
||||
|
||||
## Watch-outs
|
||||
|
||||
- The two new commits are **local-only** until pushed. Run `git push` before any other work.
|
||||
- The assessment-ready subscription opens a fresh SSE connection scoped by `assessmentMissing && trackedHandoffId`. If you change the magic-moment lifecycle, double-check the cleanup deps don't churn the subscription.
|
||||
- The claim conflict path is currently only wired into `AssistantChatPage.handleStartHere`. `useHandoff` (used by `SessionQueuePage`) and `FlowPilotSessionPage.tsx` (dead) were not updated. If `SessionQueuePage` claims start mattering, mirror the same `axios.isAxiosError(e) && e.response?.status === 409` extraction.
|
||||
- The handoff snapshot is still sparse (`problem_summary, problem_domain, status, step_count, confidence_tier`). Magic-moment "What's been tried" still only shows engineer notes + step count pre-claim.
|
||||
- `HandoffResponse.ai_assessment_data.confidence` is typed `number` on the frontend but the backend currently emits `'low' | 'medium' | 'high'`. Runtime handles both; type definition is stale.
|
||||
- Toolbar "Context" button is hidden on revisited active sessions where the senior didn't arrive via magic-moment this session — known scope cut.
|
||||
- Do not reintroduce `client.stream()`/ASGITransport tests for infinite SSE responses; test the generator directly.
|
||||
- The bus is acceptable for v1 pilot scale only because Railway is single-replica. Redis pub/sub is the obvious swap when horizontal scaling appears.
|
||||
- `streamEscalations` doesn't drive token refresh on a mid-stream 401 — the Axios interceptor only covers axios calls. Acceptable for v1.
|
||||
- The handoff snapshot today is sparse (`problem_summary, problem_domain, status, step_count, confidence_tier` plus optional branch info). The magic-moment screen's "What's been tried" section currently shows engineer notes + step-count affordance, not the actual step timeline. Snapshot expansion is the right fix.
|
||||
- `HandoffResponse.ai_assessment_data.confidence` is typed `number` on the frontend but the backend currently emits `'low' | 'medium' | 'high'` strings. The `ConfidenceBadge` component handles both shapes at runtime; the type definition is stale and should be widened to `number | 'low' | 'medium' | 'high'`.
|
||||
- The toolbar "Context" button is hidden on revisited active sessions where the senior didn't arrive via magic-moment this session — known scope cut. Lazy-fetching handoff list on session-load (when status was previously `escalated`) is the cleanup.
|
||||
- Bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the obvious swap when horizontal scaling appears.
|
||||
|
||||
Reference in New Issue
Block a user