Default Claude Code model is being switched from Opus 4.7 1M-context to
Opus 4.7 (200k). Tighten the per-session pickup docs so they're
self-sufficient under the smaller window:
- CURRENT_TASK now reflects the post-Codex state: 8 commits on the
branch (5 feat + WIP SSE + 2 Codex test/latency fixes + 1 doc
refresh), 32/32 backend tests with -n auto, frontend tsc -b clean.
Remaining work re-scoped: the SSE backend half is feature-complete
and tested, so what's left is the FRONTEND SSE subscription in
EscalationQueue.tsx, then the magic-moment handoff-context screen,
then push + draft PR.
- Session log gets a Claude Code entry covering today's planning →
build → pause-for-Codex arc, the design decisions locked into the
doc and code, the two TODOs added (peer-tech escalation, mobile
responsive), and the model-switch context for the next session.
- HANDOFF.md needs no change — Codex's update in 9bdd995 already
describes the resume point and watch-outs cleanly.
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CURRENT_TASK.md
Task: Build Escalation Mode — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
Status: in-flight on feat/escalation-metric-endpoint. Backend is feature-complete and test-stabilized. Next: frontend SSE subscription in EscalationQueue.tsx, then the magic-moment handoff-context screen, then push + draft PR.
Plan: docs/plans/2026-04-27-escalation-mode-wedge-design.md. Reviewed by /office-hours, /plan-eng-review, /plan-design-review, /codex review. Eng + Design CLEARED. Codex's two-metric correction + claim role gate + per-channel notification model + SSE bus diagnostics all applied.
Test plan artifact: docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md — primary input for /qa once feature-complete.
Done on feat/escalation-metric-endpoint (8 commits, branched from main @ c0ed6d9)
| Commit | What it ships |
|---|---|
| `d51e95c` | Plan + test-plan artifacts |
| `52f6d03` | `GET /analytics/flowpilot/escalations`: in-product time-to-first-action; account-scoped, engineer-or-admin gated |
| `7a5b853` | Role-gate `POST /handoffs/{id}/claim` to engineer-or-admin |
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications`: emails engineer/admin teammates on `intent=escalate`; graceful-degradation regression |
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
| `a283d0d` | `.ai/` mid-flight refresh |
| `87bd0b7` | WIP marker for the SSE backend slice (paused for Codex pass) |
| `bc15952` | Codex: stabilize SSE backend tests. `Depends(..., scope="function")` releases auth DB deps before the long-lived stream body; SSE handshake test calls the generator directly; AI-assessment stub fixture; bus normalizes string vs UUID `account_id` |
| `fff8338` | Doc-only: track escalation assessment latency follow-up |
| `9bdd995` | Bound escalation assessment latency to `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s); handoff still creates if assessment times out |
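The latency bound described for `9bdd995` can be illustrated with a minimal sketch. This is not the branch's code: `assess_with_timeout`, `create_handoff`, and the returned record shape are hypothetical names chosen for illustration. Only the 5s default and the "handoff still creates on timeout" behaviour come from the table above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Assumption: stands in for ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS (default 5s).
DEFAULT_TIMEOUT_SECONDS = 5.0


async def assess_with_timeout(
    run_assessment: Callable[[], Awaitable[str]],
    timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
) -> Optional[str]:
    """Return the assessment text, or None if it does not finish in time."""
    try:
        # wait_for cancels the inner coroutine when the timeout elapses.
        return await asyncio.wait_for(run_assessment(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


async def create_handoff(
    run_assessment: Callable[[], Awaitable[str]],
    timeout_seconds: float = DEFAULT_TIMEOUT_SECONDS,
) -> dict:
    """Hypothetical handoff creation: a record is always returned,
    and ai_assessment is None when the assessment timed out."""
    assessment = await assess_with_timeout(run_assessment, timeout_seconds)
    return {"intent": "escalate", "ai_assessment": assessment}
```

Any consumer of such a record has to treat a `None` assessment as a normal state rather than an error.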
Test status: focused subset (`test_escalation_bus`, `test_handoff_manager`, `test_session_handoffs_api`, `test_flowpilot_analytics_escalations`) → 32 passed in 17.77s with `-n auto`. Frontend `tsc -b` clean. Branch not pushed.
Remaining work on this branch
- Frontend SSE subscription in `EscalationQueue.tsx`. Use a fetch-based `ReadableStream` reader (matching `streamDocumentation` in `frontend/src/api/aiSessions.ts`; native `EventSource` can't send auth headers). Prepend new cards with the locked 200ms slide-in. Reconnect with backoff. Tab-title flash when backgrounded. Respect `prefers-reduced-motion`.
- Magic-moment handoff-context screen: 4-section view (problem summary / what's been tried / AI assessment / Start here CTA) that loads on Pick Up before dissolving into the regular FlowPilot session view. ~1.5-2 days. Must render gracefully when `ai_assessment` is `None` (assessment timed out; see `9bdd995`).
- Owner-facing analytics page at `/analytics/escalations`: period selector, conversion-rate, trend chart. ~0.5d. Optional for v1 demo.
- Playwright e2e for the magic-moment demo flow (junior escalates → senior receives → senior claims → opens session). Critical for the GTM Loom not to crash mid-recording.
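For the "reconnect with backoff" item, one common choice is a capped exponential schedule. The sketch below is written in Python only to keep the arithmetic explicit; the real client would live in `EscalationQueue.tsx`. The function name, the 500ms base, and the 30s cap are assumptions, not values from the plan.

```python
def reconnect_delay_ms(attempt: int, base_ms: int = 500, cap_ms: int = 30_000) -> int:
    """Delay before reconnect number `attempt` (0-based).

    Doubles from base_ms on each failed attempt and never exceeds cap_ms.
    base_ms and cap_ms are illustrative defaults (assumptions).
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    return min(cap_ms, base_ms * 2 ** attempt)


# First five delays with the assumed defaults.
schedule = [reconnect_delay_ms(n) for n in range(5)]
```

Resetting `attempt` to 0 after a successful handshake keeps a brief network blip from leaving the queue on a long delay. Adding random jitter is a frequent refinement when many clients reconnect at once.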
Two-metric framing — read this before quoting numbers to anyone
The in-product endpoint measures post-claim time-to-first-action. The "minutes recovered" sales claim is manual_baseline − in_product_metric. Manual baseline comes from the founder's stopwatch on the next 5 escalations (The Assignment in the design doc). Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
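The subtraction above can be made hard to misuse with a small helper. This is a sketch under assumptions: `minutes_recovered` is a hypothetical function, and both inputs are taken to be in minutes. It refuses to produce a number when no manual baseline exists, which is the miscount the paragraph warns about.

```python
from typing import Optional


def minutes_recovered(
    manual_baseline_minutes: Optional[float],
    in_product_minutes: float,
) -> float:
    """minutes recovered = manual_baseline - in_product_metric.

    Raises if the stopwatch baseline has not been collected, so the
    in-product number can never be reported alone as 'minutes recovered'.
    """
    if manual_baseline_minutes is None:
        raise ValueError("manual baseline missing; in-product metric alone is not 'minutes recovered'")
    return manual_baseline_minutes - in_product_minutes
```

For example, with a hypothetical 5.0-minute manual baseline and a 1.5-minute in-product time-to-first-action, the helper returns 3.5.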
Kill-switch
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge. The design doc names the alternative direction (deterministic-ops territory) for context, but data lands first.
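The kill-switch condition reduces to a simple check. In this sketch, `should_revisit_wedge` is a hypothetical helper, and `None` is an assumed encoding for a pilot that has no verifiable hours-saved-per-week number.

```python
from typing import Optional, Sequence


def should_revisit_wedge(
    hours_saved_per_week: Sequence[Optional[float]],
    threshold: float = 1.0,
) -> bool:
    """True when no pilot reports a verifiable number strictly above threshold.

    Each entry is one pilot; None means no verifiable number (assumption).
    """
    qualifying = [h for h in hours_saved_per_week if h is not None and h > threshold]
    return len(qualifying) == 0
```

Note that a pilot at exactly 1.0 does not clear the bar, matching the "above 1.0" wording.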
Previous task — closed out
Task: Land PR #153 — fix the AssistantChatPage prefill currentChatRef bug. Status: complete (2026-04-26). Merged as 68fcdc6 on main.
Background CI item, not blocking: promoting CI / e2e (pull_request) to required on main. Two consecutive green runs cleared the threshold. Ops-only.