fix(escalations): atomic claim + self-claim rejection + queue exclusion

Codex review pass on the escalation wedge. Reworks claim_session from read-then-write to a conditional UPDATE so two seniors racing can't both win, blocks the original engineer from claiming their own handoff, and filters self-escalated sessions out of the dashboard escalation queue. Also preassigns the handoff UUID before flush so the compatibility escalation_package payload carries it. Removes legacy frontend pickup state (claiming, handleStartHere) that broke tsc --noEmit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 16:21:20 -04:00
parent ab5e0deaf7
commit f10649abc2
10 changed files with 248 additions and 134 deletions
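
The three backend changes named in the commit message (atomic claim, self-claim rejection, queue self-exclusion) can be illustrated with a minimal sketch. It does not reproduce the project's `HandoffManager`: it uses the standard-library `sqlite3` module, and the `handoffs` table, its column names, the exception classes, and the helpers `make_db`, `claim_handoff`, and `escalation_queue` are all hypothetical names chosen for illustration. The point is only the shape of the conditional `UPDATE ... WHERE claimed_by IS NULL` and how a zero-row result is turned into either an idempotent success or a conflict.

```python
import sqlite3


class ClaimConflict(Exception):
    """A different user already holds the claim (maps to a 409 in an API)."""


class SelfClaimError(Exception):
    """The original escalator tried to claim their own handoff (maps to a 403)."""


def make_db():
    # Hypothetical schema: one row per handoff, claimed_by is NULL until pickup.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE handoffs ("
        "id TEXT PRIMARY KEY, handed_off_by TEXT NOT NULL, claimed_by TEXT)"
    )
    return conn


def claim_handoff(conn, handoff_id, user_id):
    row = conn.execute(
        "SELECT handed_off_by FROM handoffs WHERE id = ?", (handoff_id,)
    ).fetchone()
    if row is None:
        raise LookupError(handoff_id)
    if row[0] == user_id:
        raise SelfClaimError(user_id)

    # One conditional write instead of read-then-write: the database lets
    # at most one racer flip claimed_by from NULL to a user id.
    cur = conn.execute(
        "UPDATE handoffs SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (user_id, handoff_id),
    )
    conn.commit()
    if cur.rowcount == 1:
        return "claimed"

    # Zero rows updated means the claim is already held; re-read to see by whom.
    holder = conn.execute(
        "SELECT claimed_by FROM handoffs WHERE id = ?", (handoff_id,)
    ).fetchone()[0]
    if holder == user_id:
        return "already_claimed_by_you"  # idempotent same-user retry
    raise ClaimConflict(holder)


def escalation_queue(conn, current_user):
    # Queue self-exclusion: never show a user the handoffs they escalated.
    rows = conn.execute(
        "SELECT id FROM handoffs WHERE handed_off_by != ? ORDER BY id",
        (current_user,),
    ).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO handoffs (id, handed_off_by) VALUES ('h1', 'junior')")
    conn.commit()
    print(claim_handoff(conn, "h1", "senior_a"))
    print(claim_handoff(conn, "h1", "senior_a"))
    print(escalation_queue(conn, "junior"))
```

Checking `rowcount` after the conditional write is what removes the race window: with read-then-write, two seniors could both observe `claimed_by IS NULL` and the later write would silently overwrite the earlier one.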
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -2,61 +2,41 @@

 **Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.

-**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch pushed; **draft PR #155** open ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Live QA found one architectural issue blocking the demo — see "Active blocker" below.
+**Status:** ✅ **Engineering complete.** Browser QA passed (2026-04-30). Branch `feat/escalation-metric-endpoint`; PR #155 ready to mark ready-for-review.

 **Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED.

 **Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md).

-## Active blocker — AI assessment still empty after pickup
+## What's done (all sessions combined)

-**The bug** (live-test confirmed 2026-04-29): senior picks up an escalation, magic-moment screen renders with the "AI assessment is still generating" placeholder, and **the placeholder never clears**. Bus event fires with `has_assessment: false` because `_generate_ai_assessment` is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` from 15 → 45 (commit `0d1b305`) didn't fix it in the field.
+All plan items complete. Key commits on `feat/escalation-metric-endpoint`:

-**Why patching is the wrong move:** the real architectural issue is that we make **three** AI calls per escalation, all summarizing the same source material:
+| Commit | What it ships |
+|---|---|
+| `d51e95c` | Plan + test-plan artifacts |
+| `52f6d03` | `GET /analytics/flowpilot/escalations` — time-to-first-action metric |
+| `7a5b853` | Role-gate claim to engineer-or-admin |
+| `07d0db9` | Email notifications on escalation |
+| `9f0bfd4` | `EscalationMetricCard` on `/escalations` |
+| `b8627f4` | SSE live-arrival animations in `EscalationQueue` |
+| `8e9d22e` | Magic-moment handoff-context screen |
+| `641853a` | Bell-icon opens pickup flow |
+| `029680a` | Unify `/escalate` through `HandoffManager` |
+| `0f00ee5` | Plan-locked polish: chips, unread dot, race toast, AI refresh |
+| `665530f` | Structural task-lane race fix |
+| `db717b0` | 3-option CTA, copy button fix, post-escalation redirect, claim 500 fix |
+| `dc69c9d` | Allow `escalated_to_id` to send chat (GET AI analysis fix) |

-1. `_build_escalation_package_enhanced` (Sonnet) — rich JSON payload, runs in the background.
-2. `_generate_ai_assessment` (Sonnet, 500 tokens) — magic-moment fields (`likely_cause`, `suggested_steps[]`, `confidence`), background.
-3. `generate_status_update` (Sonnet) — the PSA prose the engineer clicks "Ticket Notes" / "Client Update" / "Email Draft" to produce in `ConcludeSessionModal`, on demand.
+**Browser QA results (2026-04-30):**

-User's correct observation (2026-04-29): the engineer is *typically* generating a status update during the escalate flow anyway. There's no reason to do that work three times.
-
-**Next active task: consolidate the three calls into one.** See `## Active task — AI generation consolidation` below.
-
-## Active task — AI generation consolidation
-
-**Goal:** ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.
-
-**Proposed shape** (decide during implementation):
-
-```python
-# Persist on SessionHandoff:
-{
-  "summary_prose": "<PSA-flavored ticket-notes paragraph>",
-  "what_we_know": ["<one-liner>", ...],
-  "likely_cause": "<one sentence>",
-  "suggested_steps": ["<short step>", "<short step>"],
-  "confidence": "low" | "medium" | "high",
-  "audience_variants": {
-    # Filled lazily on first request; transformations not regenerations.
-    "client_update": null,
-    "email_draft": null,
-  }
-}
-```
-
-**Implementation order (suggested):**
-
-1. **Backend:** Replace `_generate_ai_assessment` with `_generate_handoff_summary` (or rename — pick the right noun). One Sonnet call, structured JSON response, persisted to `handoff.ai_assessment_data` + a new `handoff.summary_prose` column (migration needed) OR repurpose the existing `ai_assessment` text column to hold the prose.
-2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For `client_update` / `email_draft`, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
-3. **Backend:** Drop `_build_escalation_package_enhanced` from the background path — its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the `ai_session.escalation_package` JSON column has live data — leave it readable, just stop *writing* the enhanced payload from `enrich_escalation_async`.
-4. **Frontend:** `HandoffContextScreen` reads from the new structured fields. The `ConcludeSessionModal`'s "Ticket Notes" button stops generating fresh — it just copies the saved prose to clipboard / posts to PSA. "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
-5. **Test plan:** Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
-
-**Watch-outs:**
-
-- The schema for the structured response needs to be enforced — past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
-- Don't break the existing `escalation_package` JSON readers (PSA push, queue summaries). Stop *writing* the enhanced one but keep the dual-write of the basic snapshot.
-- `_generate_ai_assessment` is referenced in tests (`test_handoff_manager.py` stubs it via `AsyncMock`). Update test fixtures when renaming.
+- ✅ Post-escalation redirect (dashboard + toast)
+- ✅ Magic-moment screen: header, AI assessment, 2-option CTA
+- ✅ "I'll take it from here": claim → dismiss → composer focused
+- ✅ "Get AI analysis": claim → briefing → AI responds → task lane populates
+- ✅ Task lane copy button: toast + checkmark
+- ✅ Chip expansion: inline detail + "Open in Tasks panel"
+- ✅ Post-claim overlay: dismissible mode, Close only

 ## Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)

--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,34 +2,33 @@

 # HANDOFF.md

-**Last updated:** 2026-04-30 (session 3 — QA pass)
+**Last updated:** 2026-04-30 (Codex review-fix pass)

-**Active task:** **Escalation Mode** wedge — BROWSER QA COMPLETE. Branch: `feat/escalation-metric-endpoint`. PR #155 ready to mark ready-for-review.
+**Active task:** **Escalation Mode** wedge — BROWSER QA COMPLETE + review fixes applied. Branch: `feat/escalation-metric-endpoint`. PR #155 ready to mark ready-for-review after committing this fix pass.

-## Where the previous session ended
+## Where this session ended

-Browser QA pass completed. One critical bug found and fixed during QA.
+Code-review fixes were applied after browser QA:

-**Bug found + fixed (commit dc69c9d):**
-- `POST /ai-sessions/{id}/chat → 400` when senior clicks "Get AI analysis" — `send_chat_message` checked `session.user_id == user_id` but the senior is `escalated_to_id`, not `user_id`. Fixed by adding `OR escalated_to_id == user_id` in the WHERE clause.
+- `claim_session` now uses atomic conditional `UPDATE ... WHERE claimed_by IS NULL` instead of read-then-write, so simultaneous senior pickup cannot silently overwrite `claimed_by`.
+- Original escalators cannot claim their own handoff. The escalation queue also excludes the current user's own escalated sessions, preventing the post-escalation dashboard from showing the junior their own handoff.
+- `session.escalation_package["handoff_id"]` is now populated from a preassigned UUID instead of `None` before flush.
+- Frontend build blockers removed: deleted unused legacy `claiming` / `handleStartHere` path in `AssistantChatPage` and unused `onStartHere` destructuring in `HandoffContextScreen`.

-**All QA checks passed (17/17 backend tests pass):**
+**Validation:**

-- Post-escalation redirect: junior gets "Session escalated. Heading back to your dashboard." toast + navigates to `/`
-- Magic-moment screen: header, metadata, two-column AI assessment, 2-option CTA (no task lane) all render correctly
-- "I'll take it from here": claim → dismiss → chat surface → composer focused ✅
-- "Get AI analysis": claim → briefing sent → AI responds → task lane populates ✅ (fixed)
-- Task lane copy button: toast + checkmark visual feedback ✅
-- Chip expansion: inline detail card + "Open in Tasks panel" scroll ✅
-- Post-claim overlay: "Context" toolbar button → dismissible mode → only Close button ✅
+- `git diff --check` ✅
+- `cd backend && pytest --override-ini='addopts=' tests/test_handoff_manager.py tests/test_session_handoffs_api.py tests/test_escalation_bus.py` ✅ `28 passed in 42.23s`
+- `cd frontend && /config/.bun/bin/bunx tsc -p tsconfig.app.json --noEmit --pretty false && /config/.bun/bin/bunx tsc -p tsconfig.node.json --noEmit --pretty false` ✅
+- Full frontend build could not complete because generated dirs are root-owned in this workspace: `frontend/node_modules/.tmp`, `frontend/node_modules/.vite-temp`, and likely `frontend/dist` produce EACCES. Type errors from review are fixed.

 **Not testable in dev (known limitations):**
 - "Continue where X left off": requires senior to have existing task lane for session (won't occur on first pickup)
-- 409 race condition: requires two distinct senior accounts; backend logic reviewed and correct
+- Browser-level 409 race toast still requires two distinct senior accounts. Backend claim write is now atomic and covered by service/API tests for conflict, self-claim, and idempotent same-user retry.

 ## Resume point — DO THIS NEXT

-**Ship:** Mark PR #155 ready-for-review and demo to stakeholder. No engineering work remaining.
+**Ship:** Commit this review-fix pass, then mark PR #155 ready-for-review and demo to stakeholder.

 Optional before shipping:
 - Record Loom demo walking through the escalation flow end-to-end
@@ -37,6 +36,8 @@ Optional before shipping:
 ## Key files changed this session

 - `backend/app/services/handoff_manager.py` — `_generate_handoff_summary` replaces old assessment pair; `enrich_escalation_async` unified; `claim_session` eager-loads `handed_off_by_user`
+- `backend/app/api/endpoints/ai_sessions.py` — escalation queue excludes the current user's own escalations
+- `backend/app/api/endpoints/session_handoffs.py` — self-claim returns 403
 - `backend/app/services/flowpilot_engine.py` — `generate_status_update` early-returns saved prose for `context='escalation'`
 - `backend/app/schemas/session_handoff.py` — `handed_off_by_name: str | None = None` added
 - `backend/app/api/endpoints/session_handoffs.py` — both create + claim endpoints pass `handed_off_by_name`
@@ -44,11 +45,12 @@ Optional before shipping:
 - `frontend/src/components/flowpilot/HandoffContextScreen.tsx` — 3-option CTA; `hasTaskLane`, `activeOptionKey`, `onContinue/onAIAnalysis/onOwnThing` props
 - `frontend/src/components/assistant/TaskLane.tsx` — `id="task-lane-card-{idx}"` on all card variants
 - `frontend/src/pages/AssistantChatPage.tsx` — `handleContinue`, `handleAIAnalysis`, `handleOwnThing` handlers; chip → card navigation; `activeOptionKey` state
+- `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py` — regression coverage for atomic/idempotent claim, self-claim rejection, queue self-exclusion, and pre-flush handoff ID

 ## Watch-outs

 - Dev stack: backend `:8000`, frontend `:5173`, postgres `:5433` (docker-compose). HMR works.
 - Test users (Acme MSP, password `TestPass123!`): `engineer@resolutionflow.example.com` (junior), `teamadmin@resolutionflow.example.com` (senior).
 - `handleAIAnalysis` pre-adds `urlSessionId` to `loadedChatIdsRef` before dismissing so the normal selectChat effect doesn't double-fire. It then calls `selectChat` manually before sending the briefing.
-- `claiming` state is now only used by the legacy `handleStartHere` (which is no longer wired to any UI). `activeOptionKey !== null` is the new `isProcessing` signal.
+- Legacy `claiming` / `handleStartHere` on `AssistantChatPage` was removed; `activeOptionKey !== null` is the active pre-claim processing signal.
 - The bus is acceptable for v1 pilot scale only (Railway single-replica). Redis pub/sub is the swap when horizontal scaling appears.
--- a/.ai/SESSION_LOG.md
+++ b/.ai/SESSION_LOG.md
@@ -12,6 +12,17 @@

 ---

+## 2026-04-30 06:25 UTC — Codex — Apply Escalation Mode review fixes
+
+- Reviewed the recent Escalation Mode wedge work and fixed the actionable findings before PR #155 is marked ready.
+- Reworked `HandoffManager.claim_session` from read-then-write to an atomic conditional update, preserving idempotent same-user retries and returning a typed conflict for a different claimant.
+- Blocked original engineers from claiming their own handoffs and filtered their own escalated sessions out of `/ai-sessions/escalation-queue`, preventing the post-escalation dashboard from showing a junior their own handoff.
+- Fixed the compatibility payload so `session.escalation_package["handoff_id"]` is populated from a preassigned UUID before flush.
+- Removed unused legacy frontend pickup state (`claiming`, `handleStartHere`, unused `onStartHere` destructuring) that made `tsc -b` fail under `noUnusedLocals`.
+- Added regression coverage for pre-flush handoff IDs, conflict handling, self-claim rejection, successful non-owner claim, and own-escalation queue exclusion.
+- Verified `git diff --check`; focused backend tests passed (`28 passed in 42.23s`); frontend `tsc --noEmit` checks passed for app and node configs. Full Vite/build script remains blocked by root-owned generated directories under `frontend/node_modules` / `frontend/dist` in this workspace, not by TypeScript errors.
+- Files touched: `backend/app/services/handoff_manager.py`, `backend/app/api/endpoints/ai_sessions.py`, `backend/app/api/endpoints/session_handoffs.py`, `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `frontend/src/components/flowpilot/HandoffContextScreen.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
+
 ## 2026-04-30 — Claude Code — Browser QA pass complete; chat ownership bug found and fixed; PR #155 ready

 - Ran full browser QA pass on the escalation mode feature using gstack `/qa` skill.