resolutionflow

Author	SHA1	Message	Date
Michael Chihlas	f10649abc2	fix(escalations): atomic claim + self-claim rejection + queue exclusion All checks were successful Mirror to GitHub / mirror (push) Successful in 5s Details CI / frontend (pull_request) Successful in 4m59s Details CI / backend (pull_request) Successful in 10m22s Details CI / e2e (pull_request) Successful in 10m46s Details Codex review pass on the escalation wedge. Reworks claim_session from read-then-write to a conditional UPDATE so two seniors racing can't both win, blocks the original engineer from claiming their own handoff, and filters self-escalated sessions out of the dashboard escalation queue. Also preassigns the handoff UUID before flush so the compatibility escalation_package payload carries it. Removes legacy frontend pickup state (claiming, handleStartHere) that broke tsc --noEmit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 16:21:20 -04:00
Michael Chihlas	db717b0b3f	feat(escalations): magic-moment 3-option CTA + claim 500 fix - HandoffContextScreen: 3-option layout (Continue/AI analysis/Own thing) with hasTaskLane, activeOptionKey, spinner/disabled states - AssistantChatPage: wire up handleContinue, handleAIAnalysis, handleOwnThing handlers; chip detail expansion inline with copy-button fix; post-escalation redirect to dashboard on ConcludeSessionModal close - TaskLane: fix async copy button (await + execCommand fallback + copiedKey visual feedback); whitespace-pre-wrap on command blocks - Fix 500 on claim: Pydantic v2 model_validate() + model_copy(update={}) (was passing update= kwarg directly which v2 rejects) - HandoffResponse schema: handed_off_by_name field Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 00:05:02 -04:00
Michael Chihlas	0f00ee5e01	feat(escalations): close out plan-locked wedge polish Four items from the design-plan audit, all flagged as locked-design or Codex corrections, shipped together so the GTM demo path covers them end-to-end before bug bash. 1. Live AI assessment refresh on the magic-moment screen. Backend already publishes handoff_assessment_ready when enrich_escalation_async commits; wire the frontend listener so the senior sees the assessment populate without a manual reopen. New event type + onAssessmentReady handler on streamEscalations; AssistantChatPage opens a scoped SSE subscription whenever it tracks a handoff missing its assessment, refetches on match, and replaces magicHandoff / overlayHandoff in place. Closes the loop on the async-assessment commit `e8ba74e`. 2. Suggested-step chips below the chat input. Locked design from the plan (Codex correction). Chip strip renders above the composer post-claim when ai_assessment_data.suggested_steps[] is non-empty. Click prefills the input and focuses; first send or explicit X hides for the session. 3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen set (rf-escalation-seen, capped 200). Dot top-right when not seen. Cleared on open (card click) or claim (Pick Up) — NOT on hover, per Codex correction. Pick Up stops propagation so it doesn't double-fire. 4. Race-condition toast on claim conflict. The /claim endpoint previously silently overwrote claimed_by — both seniors thought they owned the session. New HandoffAlreadyClaimedError carries the winner's id/name/ timestamp; claim_session rejects different-user re-claims (same-user is idempotent for double-click safety); endpoint returns 409 with structured detail. AssistantChatPage.handleStartHere extracts and surfaces "Already claimed by {name} {time_ago}." via toast, drops ?pickup=true, dismisses magic-moment so the loser flows back to queue. Tests: 2 new unit tests in test_handoff_manager.py (conflict raises, same-user idempotent). Full handoff + escalation suite (34 tests) green. Frontend tsc -b clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 01:59:28 -04:00
Michael Chihlas	9bdd9959a8	fix(handoff): bound escalation assessment latency Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 20:03:14 -04:00
Michael Chihlas	bc15952857	fix(tests): stabilize escalation SSE backend tests Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 19:47:43 -04:00
Michael Chihlas	87bd0b7c56	WIP: SSE pub/sub for live escalation arrivals (paused for Codex review) First half of the WebSocket/SSE push slice. Paused mid-flight to hand the branch to Codex for outside-voice review before stacking more commits on top. See .ai/HANDOFF.md for the full pause context + what to look at. What's here: - backend/app/core/escalation_bus.py — module-level singleton in-memory pub/sub keyed by account_id. asyncio.Queue per subscriber with 64-event maxsize and drop-on-full semantics. Designed to be swappable for Redis pub/sub when Railway scales past single-replica. - backend/app/api/endpoints/session_handoffs.py — GET /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe bound to current_user.account_id. - backend/app/services/handoff_manager.py — dispatch_escalation_notifications now publishes a `handoff_created` event to the bus BEFORE the email fan-out, in a try/except so a bus failure can't block email delivery. - backend/tests/test_escalation_bus.py — 7 unit tests, all green standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers. - backend/tests/test_handoff_manager.py — +1 dispatcher integration test (publishes to bus, payload shape). - backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer blocked, ready event handshake). [gstack-context] Decisions: - SSE over WebSocket (one-way, browser EventSource semantics, fewer moving parts behind Railway proxy) - In-memory bus over Redis for v1 pilot (3 MSPs, single replica) - Drop-on-full subscriber queue rather than back-pressure publishers - Bus publish ahead of email send, both wrapped in try/except so neither can break handoff creation - Frontend will be a fetch-based ReadableStream reader matching the existing streamDocumentation pattern, not native EventSource (custom-header auth) Remaining (post-Codex): - Frontend SSE subscription in EscalationQueue.tsx (slide-in, reconnect, tab-title flash, prefers-reduced-motion) - Magic-moment handoff-context screen - Re-run the full backend test suite to verify the SSE + dispatcher integration tests (bus units already green standalone) Tried: - Running the full test suite repeatedly without xdist; the per-test DROP SCHEMA + recreate fixture made wall-clock prohibitive when multiple stale runs collided on the same Postgres test schema. Resolution: -n auto next time. [/gstack-context] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 19:29:07 -04:00
Michael Chihlas	07d0db9579	feat(handoff): email engineer-or-admin teammates on escalation First half of the Escalation Mode notification dual-path. WebSocket/SSE push is the second half (next commit) — email handles offline seniors, push handles online ones for the magic-moment demo. HandoffManager.dispatch_escalation_notifications: - Pulls active engineer/admin/owner-role users in the same account_id (excludes the escalator + viewers + soft-deleted) - Sends via existing EmailService.send_notification_email, concurrent via asyncio.gather; per-message failures don't block the rest - Wrapped in try/except: any exception is logged + swallowed. Handoff creation is authoritative; notification is advisory. This is the graceful-degradation regression both eng + codex reviews flagged as critical (handoff must succeed even if SMTP is down). Endpoint wiring (POST /ai-sessions/{id}/handoff): - Dispatch fires AFTER db.commit() — never email about a rolled-back handoff. Trust-erosion bug if we got that wrong. - Only fires for intent=escalate. Park is private to the escalator. Tests (4 new): - emails-engineer-recipients-in-account: viewer excluded, escalator excluded, only the engineer/admin teammates get the message - skipped-for-park-intent: park doesn't fan out - graceful-degradation-when-email-raises: RuntimeError from the email service does NOT bubble out of dispatch - endpoint-dispatches-on-escalate: end-to-end wiring through POST Per-channel delivery records (replacing the dead `notification_sent` boolean per Codex correction) is a v1.x story — for now application logs are the audit trail. See docs/plans/2026-04-27-escalation-mode-wedge-design.md. 20 tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. No regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:58:05 -04:00
chihlasm	f84b868d13	feat: add HandoffManager service with dual-write and integration tests Unified park/escalate handoff management with snapshot generation, AI diagnostic assessment for escalations (via _call_ai), claim workflow that reactivates sessions, PSA push via existing psa_documentation_service, and team queue query. Dual-writes to ai_sessions.escalation_package for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 08:42:21 +00:00

8 Commits