- HANDOFF: rewritten resume point. AI summary blocker is the active
task; consolidation plan is the path. 5-step implementation order
with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
live-test results (what works, the AI summary blocker), full
consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
pickup bugs found + fixed, the three Enter/dashboard/timeout
fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
calls into one structured generation" — rejected alternatives
(bump timeout further, copy status-update content the wrong way,
switch to Haiku) and consequences (5s magic-moment, ~60% token
reduction, instant Ticket Notes button, schema enforcement
required, migration concerns documented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CURRENT_TASK.md
Task: Build Escalation Mode — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
Status: in-flight on feat/escalation-metric-endpoint. Branch pushed; draft PR #155 open (gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155). Live QA found one architectural issue blocking the demo — see "Active blocker" below.
Plan: docs/plans/2026-04-27-escalation-mode-wedge-design.md. Reviewed by /office-hours, /plan-eng-review, /plan-design-review, /codex review. Eng + Design CLEARED.
Test plan artifact: docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md.
Active blocker — AI assessment still empty after pickup
The bug (live-test confirmed 2026-04-29): senior picks up an escalation, magic-moment screen renders with the "AI assessment is still generating" placeholder, and the placeholder never clears. Bus event fires with has_assessment: false because _generate_ai_assessment is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS from 15 → 45 (commit 0d1b305) didn't fix it in the field.
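A minimal sketch of why a longer timeout alone may not clear the placeholder. It assumes the bound is applied with `asyncio.wait_for`, which the notes above do not confirm; `bounded_assessment` and its `generate` argument are hypothetical names, not the real `_generate_ai_assessment` wiring. The point is that both a timeout and an empty generation result end in the same `has_assessment: false` outcome.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def bounded_assessment(
    generate: Callable[[], Awaitable[dict]],
    timeout_seconds: float,
) -> tuple[Optional[dict], bool]:
    """Run a generation under a deadline; return (assessment, has_assessment)."""
    try:
        assessment = await asyncio.wait_for(generate(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Deadline hit: the event can still fire, just without an assessment.
        return None, False
    # An empty result also reports has_assessment=False, whatever the timeout.
    return assessment, bool(assessment)
```

If the generation returns an empty result rather than running slowly, raising the timeout from 15 to 45 seconds would not change the second element of the tuple, which is consistent with what the live test showed.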
Why patching is the wrong move: the real architectural issue is that we make three AI calls per escalation, all summarizing the same source material:
- _build_escalation_package_enhanced (Sonnet): rich JSON payload, runs in the background.
- _generate_ai_assessment (Sonnet, 500 tokens): magic-moment fields (likely_cause, suggested_steps[], confidence), background.
- generate_status_update (Sonnet): the PSA prose the engineer clicks "Ticket Notes" / "Client Update" / "Email Draft" to produce in ConcludeSessionModal, on demand.
User's correct observation (2026-04-29): the engineer is typically generating a status update during the escalate flow anyway. There's no reason to do that work three times.
Next active task: consolidate the three calls into one. See ## Active task — AI generation consolidation below.
Active task — AI generation consolidation
Goal: ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.
Proposed shape (decide during implementation):
# Persist on SessionHandoff:
{
  "summary_prose": "<PSA-flavored ticket-notes paragraph>",
  "what_we_know": ["<one-liner>", ...],
  "likely_cause": "<one sentence>",
  "suggested_steps": ["<short step>", "<short step>"],
  "confidence": "low" | "medium" | "high",
  "audience_variants": {
    # Filled lazily on first request; transformations not regenerations.
    "client_update": null,
    "email_draft": null,
  }
}
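One way to make the proposed shape checkable before it is persisted, as a sketch only. `HandoffSummary` and `parse_handoff_summary` are hypothetical names, and the rules (non-empty strings, lists of strings, three allowed confidence values) are read off the shape above rather than taken from existing code.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

ALLOWED_CONFIDENCE = ("low", "medium", "high")


@dataclass
class HandoffSummary:
    """Hypothetical in-memory form of the proposed payload."""

    summary_prose: str
    what_we_know: list[str]
    likely_cause: str
    suggested_steps: list[str]
    confidence: str
    # Filled lazily on first request; None means "not generated yet".
    audience_variants: dict[str, Optional[str]] = field(
        default_factory=lambda: {"client_update": None, "email_draft": None}
    )


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def parse_handoff_summary(raw: str) -> HandoffSummary:
    """Parse model output, raising ValueError on any shape mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    for key in ("summary_prose", "likely_cause"):
        if not isinstance(data.get(key), str) or not data[key].strip():
            raise ValueError(f"{key} must be a non-empty string")
    for key in ("what_we_know", "suggested_steps"):
        if not _is_str_list(data.get(key)):
            raise ValueError(f"{key} must be a list of strings")
    if data.get("confidence") not in ALLOWED_CONFIDENCE:
        raise ValueError("confidence must be low, medium, or high")
    return HandoffSummary(
        summary_prose=data["summary_prose"],
        what_we_know=data["what_we_know"],
        likely_cause=data["likely_cause"],
        suggested_steps=data["suggested_steps"],
        confidence=data["confidence"],
    )
```

A freeform-prose response fails at the `json.loads` step, so the caller can retry or fall back instead of handing the frontend something it cannot turn into chips.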
Implementation order (suggested):
- Backend: Replace _generate_ai_assessment with _generate_handoff_summary (or rename; pick the right noun). One Sonnet call, structured JSON response, persisted to handoff.ai_assessment_data + a new handoff.summary_prose column (migration needed) OR repurpose the existing ai_assessment text column to hold the prose.
- Backend: Make generate_status_update for audience='ticket_notes' / context='escalation' read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For client_update / email_draft, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
- Backend: Drop _build_escalation_package_enhanced from the background path. Its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the ai_session.escalation_package JSON column has live data. Leave it readable; just stop writing the enhanced payload from enrich_escalation_async.
- Frontend: HandoffContextScreen reads from the new structured fields. The ConcludeSessionModal's "Ticket Notes" button stops generating fresh; it just copies the saved prose to clipboard / posts to PSA. "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
- Test plan: Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
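The read-from-saved-payload-first behavior in the second and fourth steps could look roughly like this. It is a sketch under assumptions: `status_update_for`, `regenerate`, and `transform` are hypothetical names, the payload is treated as a plain dict in the proposed shape, and the two callables stand in for the full Sonnet generation and the cheaper tone-shift pass.

```python
from typing import Callable, Optional

# Audiences served by transforming the saved prose (assumed names).
TRANSFORMED_AUDIENCES = ("client_update", "email_draft")


def status_update_for(
    payload: Optional[dict],
    audience: str,
    regenerate: Callable[[str], str],
    transform: Callable[[str, str], str],
) -> str:
    """Return status-update text, preferring the saved handoff payload."""
    if not payload or not payload.get("summary_prose"):
        # Legacy session with no saved payload: fall back to a full generation.
        return regenerate(audience)
    prose = payload["summary_prose"]
    if audience == "ticket_notes":
        return prose  # saved prose is returned as-is, no model call
    if audience not in TRANSFORMED_AUDIENCES:
        raise ValueError(f"unknown audience: {audience}")
    variants = payload.setdefault("audience_variants", {})
    if variants.get(audience) is None:
        # First request: one tone-shift pass, then cache the result.
        variants[audience] = transform(prose, audience)
    return variants[audience]
```

With this shape, "Ticket Notes" never reaches a model when a payload exists, and each audience variant costs at most one transformation call per escalation. The caller would still need to persist the mutated `audience_variants` back to the handoff row.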
Watch-outs:
- The schema for the structured response needs to be enforced — past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
- Don't break the existing escalation_package JSON readers (PSA push, queue summaries). Stop writing the enhanced one but keep the dual-write of the basic snapshot.
- _generate_ai_assessment is referenced in tests (test_handoff_manager.py stubs it via AsyncMock). Update test fixtures when renaming.
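For the last watch-out, a sketch of stubbing the renamed method so that a stale fixture fails loudly. The `HandoffManager` below is a stand-in written for this example, with a made-up `enrich` method; it is not the real class, and the real tests in test_handoff_manager.py may stub differently. `patch.object` raises `AttributeError` when the named attribute does not exist, which is what surfaces a missed rename.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class HandoffManager:
    """Stand-in for the real class; only the method names matter here."""

    async def _generate_handoff_summary(self, session_id: str) -> dict:
        raise RuntimeError("would call the model")

    async def enrich(self, session_id: str) -> bool:
        summary = await self._generate_handoff_summary(session_id)
        return bool(summary.get("likely_cause"))


def test_enrich_uses_stubbed_summary() -> None:
    with patch.object(
        HandoffManager, "_generate_handoff_summary", new_callable=AsyncMock
    ) as stub:
        stub.return_value = {"likely_cause": "Expired certificate"}
        has_assessment = asyncio.run(HandoffManager().enrich("session-1"))
    assert has_assessment is True
    stub.assert_awaited_once_with("session-1")


def stale_stub_is_rejected() -> bool:
    """A fixture still patching the old name fails instead of passing silently."""
    try:
        with patch.object(
            HandoffManager, "_generate_ai_assessment", new_callable=AsyncMock
        ):
            pass
    except AttributeError:
        return True
    return False
```

Assigning an `AsyncMock` directly onto an instance would accept any attribute name, so the old stub would keep "working" after the rename while the real method ran unmocked.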
Done on feat/escalation-metric-endpoint (branched from main @ c0ed6d9)
| Commit | What it ships |
|---|---|
| d51e95c | Plan + test-plan artifacts |
| 52f6d03 | GET /analytics/flowpilot/escalations: in-product time-to-first-action |
| 7a5b853 | Role-gate POST /handoffs/{id}/claim to engineer-or-admin |
| 07d0db9 | HandoffManager.dispatch_escalation_notifications: emails engineer/admin teammates |
| 9f0bfd4 | EscalationMetricCard mounted above the queue list |
| bc15952 | Codex: stabilize SSE backend tests |
| 9bdd995 | Bound escalation assessment latency (ORIGINAL: 5s) |
| b8627f4 | Frontend SSE subscription in EscalationQueue.tsx: live-arrival animations |
| 8e9d22e | Magic-moment handoff-context screen on pickup |
| 641853a | Bell-icon notification opens the pickup flow |
| 029680a | Unify /escalate through HandoffManager |
| 8914391 | First task-lane race fix (insufficient; see 665530f) |
| 0f00ee5 | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
| 665530f | Structural task-lane fix: taskLaneOwnerChatId tagging |
| b7d7ff0 | docs(ai): refresh handoff for compute swap |
| 0d1b305 | Live-test fixes: selectChat-gating bug (loadedChatIdsRef), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |
Live-test results (2026-04-29 morning)
After the structural task-lane fix and the four polish items, end-to-end test confirmed:
- ✅ Junior escalates → senior gets bell-icon notification.
- ✅ Magic-moment screen renders with handoff data on Pick Up.
- ✅ Senior's chat surface loads with conversation history (after 0d1b305's selectChat fix; was completely broken before).
- ✅ Sidebar shows the picked-up session with the "Escalated" pill (after 0d1b305's loadChats() call).
- ✅ Suggested-step chips render below the composer.
- ✅ Unread 6px dot on queue cards.
- ✅ Task-lane regression is gone — no stale flash on new sessions.
- ❌ AI assessment placeholder never clears. Drives the consolidation work above.
Untested live (low priority, can verify post-consolidation): race-condition toast (needs second user in same account).
Two-metric framing — read this before quoting numbers to anyone
The in-product endpoint measures post-claim time-to-first-action. The "minutes recovered" sales claim is manual_baseline − in_product_metric. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
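The subtraction can be written down so neither number gets quoted alone. This is a sketch: `minutes_recovered` is a hypothetical helper, and averaging each sample with the mean is an assumption, since the notes do not say how the stopwatch readings or the endpoint values are aggregated.

```python
from statistics import mean


def minutes_recovered(
    manual_baseline_min: list[float],
    in_product_min: list[float],
) -> float:
    """Mean manual baseline minus mean in-product time-to-first-action."""
    if not manual_baseline_min or not in_product_min:
        # Refuse to produce a figure from one metric alone.
        raise ValueError("both the manual baseline and the in-product sample are required")
    return mean(manual_baseline_min) - mean(in_product_min)
```

With made-up inputs for illustration, five stopwatch readings averaging 5.0 minutes and in-product readings averaging 1.0 minute give 4.0 minutes recovered. The in-product 1.0 on its own is not that claim.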
Kill-switch
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.