- HANDOFF: rewritten resume point. AI summary blocker is the active
task; consolidation plan is the path. 5-step implementation order
with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
live-test results (what works, the AI summary blocker), full
consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
pickup bugs found + fixed, the three Enter/dashboard/timeout
fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
calls into one structured generation" — rejected alternatives
(bump timeout further, copy status-update content the wrong way,
switch to Haiku) and consequences (5s magic-moment, ~60% token
reduction, instant Ticket Notes button, schema enforcement
required, migration concerns documented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# CURRENT_TASK.md

**Task:** Build **Escalation Mode**, the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.

**Status:** In flight on `feat/escalation-metric-endpoint`. Branch pushed; **draft PR #155** open ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Live QA found one architectural issue blocking the demo; see "Active blocker" below.

**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED.

**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md).

## Active blocker — AI assessment still empty after pickup

**The bug** (live-test confirmed 2026-04-29): the senior picks up an escalation, the magic-moment screen renders with the "AI assessment is still generating" placeholder, and **the placeholder never clears**. The bus event fires with `has_assessment: false`, presumably because `_generate_ai_assessment` is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` from 15 to 45 seconds (commit `0d1b305`) didn't fix it in the field.

**Why patching is the wrong move:** the real architectural issue is that we make **three** AI calls per escalation, all summarizing the same source material:

1. `_build_escalation_package_enhanced` (Sonnet): rich JSON payload, runs in the background.
2. `_generate_ai_assessment` (Sonnet, 500 tokens): magic-moment fields (`likely_cause`, `suggested_steps[]`, `confidence`), runs in the background.
3. `generate_status_update` (Sonnet): the PSA prose the engineer produces by clicking "Ticket Notes" / "Client Update" / "Email Draft" in `ConcludeSessionModal`, runs on demand.

The user's observation (2026-04-29): the engineer is *typically* generating a status update during the escalate flow anyway, so there's no reason to do that work three times.

**Next active task: consolidate the three calls into one.** See the "Active task" section below.

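
For reference, the bounded-latency pattern behind the timeout in the bug description can be sketched as follows. This is a minimal illustration, not the project's code: only `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` and `has_assessment` come from this document, while `slow_model_call` and `generate_bounded` are hypothetical stand-ins. It shows why a larger timeout only delays the `has_assessment: false` event when the underlying call is slow or failing.

```python
import asyncio

# Name taken from this document; the value is shortened so the sketch runs quickly.
ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS = 0.05


async def slow_model_call(delay: float) -> dict:
    """Hypothetical stand-in for the model call; it sleeps instead of calling an API."""
    await asyncio.sleep(delay)
    return {"likely_cause": "<one sentence>", "confidence": "medium"}


async def generate_bounded(delay: float) -> dict:
    """Return a bus-event-like dict; has_assessment is False when the call times out."""
    try:
        assessment = await asyncio.wait_for(
            slow_model_call(delay),
            timeout=ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS,
        )
    except asyncio.TimeoutError:
        return {"has_assessment": False, "assessment": None}
    return {"has_assessment": True, "assessment": assessment}


fast_event = asyncio.run(generate_bounded(0.0))  # finishes inside the bound
slow_event = asyncio.run(generate_bounded(0.5))  # exceeds the bound
```
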
## Active task — AI generation consolidation

**Goal:** ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. The magic-moment screen populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.

**Proposed shape** (decide during implementation):

```python
# Persist on SessionHandoff. `handoff_summary` is only an illustrative name.
handoff_summary = {
    "summary_prose": "<PSA-flavored ticket-notes paragraph>",
    "what_we_know": ["<one-liner>"],  # one entry per known fact
    "likely_cause": "<one sentence>",
    "suggested_steps": ["<short step>", "<short step>"],
    "confidence": "medium",  # one of "low", "medium", "high"
    "audience_variants": {
        # Filled lazily on first request; transformations, not regenerations.
        "client_update": None,
        "email_draft": None,
    },
}
```
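
One way to pin this shape down before persisting it is a small validator. The sketch below uses only the standard library and the field names from the proposed shape; `validate_handoff_summary` and its error strings are hypothetical, and the real implementation may prefer a schema library or the model provider's structured-output support.

```python
from typing import Any

ALLOWED_CONFIDENCE = {"low", "medium", "high"}
REQUIRED_FIELDS = {
    "summary_prose": str,
    "what_we_know": list,
    "likely_cause": str,
    "suggested_steps": list,
    "confidence": str,
    "audience_variants": dict,
}


def validate_handoff_summary(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the payload matches the shape."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    confidence = payload.get("confidence")
    if isinstance(confidence, str) and confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"unknown confidence: {confidence}")
    for key in ("what_we_know", "suggested_steps"):
        items = payload.get(key)
        if isinstance(items, list) and not all(isinstance(i, str) for i in items):
            problems.append(f"non-string entry in {key}")
    return problems


good_payload = {
    "summary_prose": "<PSA-flavored ticket-notes paragraph>",
    "what_we_know": ["<one-liner>"],
    "likely_cause": "<one sentence>",
    "suggested_steps": ["<short step>"],
    "confidence": "high",
    "audience_variants": {"client_update": None, "email_draft": None},
}
# A freeform-prose response, the failure mode the frontend cannot parse into chips.
freeform_payload = {"summary_prose": "The user cannot print."}
```
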

**Implementation order (suggested):**

1. **Backend:** Replace `_generate_ai_assessment` with `_generate_handoff_summary` (name not final; pick the right noun). One Sonnet call, structured JSON response, persisted to `handoff.ai_assessment_data` plus either a new `handoff.summary_prose` column (migration needed) or the existing `ai_assessment` text column repurposed to hold the prose.
2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For `client_update` / `email_draft`, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
3. **Backend:** Drop `_build_escalation_package_enhanced` from the background path. Its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the `ai_session.escalation_package` JSON column has live data, so leave it readable and just stop *writing* the enhanced payload from `enrich_escalation_async`.
4. **Frontend:** `HandoffContextScreen` reads from the new structured fields. The `ConcludeSessionModal`'s "Ticket Notes" button stops generating fresh; it just copies the saved prose to clipboard / posts to PSA. The "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
5. **Test plan:** Verify that the magic-moment screen populates within ~5s instead of ~25s, that the engineer's "Ticket Notes" button is instant, and that token spend per escalation drops by ~60%.

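
The read-saved-payload-first behavior from step 2, together with the lazily filled `audience_variants`, might look roughly like the sketch below. It assumes the proposed payload shape; `get_status_update`, `generate_fresh`, and `transform_tone` are hypothetical names, and the two callables stand in for the real model calls.

```python
from typing import Callable, Optional


def get_status_update(
    saved_payload: Optional[dict],
    audience: str,
    generate_fresh: Callable[[str], str],
    transform_tone: Callable[[str, str], str],
) -> str:
    """Prefer the saved handoff payload; call a model only when it is required."""
    if saved_payload is None:
        # Legacy session with no consolidated payload: fall back to a fresh generation.
        return generate_fresh(audience)
    if audience == "ticket_notes":
        # Ticket notes are the saved prose itself, so no model call is needed.
        return saved_payload["summary_prose"]
    variants = saved_payload.setdefault("audience_variants", {})
    if variants.get(audience) is None:
        # First request for this audience: transform the saved prose once and cache it.
        variants[audience] = transform_tone(saved_payload["summary_prose"], audience)
    return variants[audience]


calls = []  # records which stand-in "model" calls were made, in order


def fake_generate(audience: str) -> str:
    calls.append(("generate", audience))
    return f"fresh {audience}"


def fake_transform(prose: str, audience: str) -> str:
    calls.append(("transform", audience))
    return f"[{audience}] {prose}"


payload = {
    "summary_prose": "Printer offline after driver update.",
    "audience_variants": {"client_update": None, "email_draft": None},
}
notes = get_status_update(payload, "ticket_notes", fake_generate, fake_transform)
first = get_status_update(payload, "client_update", fake_generate, fake_transform)
second = get_status_update(payload, "client_update", fake_generate, fake_transform)
legacy = get_status_update(None, "ticket_notes", fake_generate, fake_transform)
```
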
**Watch-outs:**

- The schema for the structured response needs to be enforced: past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool use / structured output if needed.
- Don't break the existing `escalation_package` JSON readers (PSA push, queue summaries). Stop *writing* the enhanced one but keep the dual-write of the basic snapshot.
- `_generate_ai_assessment` is referenced in tests (`test_handoff_manager.py` stubs it via `AsyncMock`). Update test fixtures when renaming.

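
As a reminder of what the renamed stub could look like, here is a minimal sketch using `unittest.mock.AsyncMock`. `FakeHandoffManager` and its `enrich` method are hypothetical stand-ins, not the contents of `test_handoff_manager.py`; only the stubbing pattern is the point.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class FakeHandoffManager:
    """Hypothetical stand-in for the real manager; only the method name matters here."""

    async def _generate_handoff_summary(self, session_id: str) -> dict:
        raise RuntimeError("would call the model in production")

    async def enrich(self, session_id: str) -> dict:
        summary = await self._generate_handoff_summary(session_id)
        return {"has_assessment": summary is not None, "summary": summary}


async def run_stubbed() -> dict:
    stub_payload = {"likely_cause": "<one sentence>", "confidence": "low"}
    # Replace the model-calling method with an AsyncMock for the duration of the block.
    with patch.object(
        FakeHandoffManager,
        "_generate_handoff_summary",
        new_callable=AsyncMock,
        return_value=stub_payload,
    ) as stub:
        result = await FakeHandoffManager().enrich("session-1")
        stub.assert_awaited_once_with("session-1")
    return result


stubbed_result = asyncio.run(run_stubbed())
```
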
## Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)

| Commit | What it ships |
|---|---|
| `d51e95c` | Plan + test-plan artifacts |
| `52f6d03` | `GET /analytics/flowpilot/escalations`: in-product time-to-first-action |
| `7a5b853` | Role-gate `POST /handoffs/{id}/claim` to engineer-or-admin |
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications`: emails engineer/admin teammates |
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
| `bc15952` | Codex: stabilize SSE backend tests |
| `9bdd995` | Bound escalation assessment latency (original: 5s) |
| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx`: live-arrival animations |
| `8e9d22e` | Magic-moment handoff-context screen on pickup |
| `641853a` | Bell-icon notification opens the pickup flow |
| `029680a` | Unify `/escalate` through `HandoffManager` |
| `8914391` | First task-lane race fix (insufficient; see `665530f`) |
| `0f00ee5` | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
| `665530f` | Structural task-lane fix: `taskLaneOwnerChatId` tagging |
| `b7d7ff0` | docs(ai): refresh handoff for compute swap |
| `0d1b305` | **Live-test fixes**: selectChat-gating bug (`loadedChatIdsRef`), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |

## Live-test results (2026-04-29 morning)

After the structural task-lane fix and the four polish items, the end-to-end test confirmed:

- ✅ Junior escalates → senior gets bell-icon notification.
- ✅ Magic-moment screen renders with handoff data on Pick Up.
- ✅ Senior's chat surface loads with conversation history (after `0d1b305`'s selectChat fix; it was completely broken before).
- ✅ Sidebar shows the picked-up session with the "Escalated" pill (after `0d1b305`'s `loadChats()` call).
- ✅ Suggested-step chips render below the composer.
- ✅ Unread 6px dot on queue cards.
- ✅ Task-lane regression is gone: no stale flash on new sessions.
- ❌ **AI assessment placeholder never clears.** This drives the consolidation work above.

Untested live (low priority, can verify post-consolidation): race-condition toast (needs a second user in the same account).

## Two-metric framing — read this before quoting numbers to anyone

The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline - in_product_metric`. The manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered": that's the apples-to-oranges miscount Codex caught.

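
The arithmetic can be made explicit with a short sketch. The function name, the use of a simple mean, and the sample values are all assumptions for illustration; the sample numbers are placeholders, not measurements.

```python
from statistics import mean


def minutes_recovered(
    manual_baseline_minutes: list[float],
    in_product_minutes: list[float],
) -> float:
    """Compute manual_baseline - in_product_metric using the mean of each sample."""
    if not manual_baseline_minutes or not in_product_minutes:
        # The in-product number alone is not "minutes recovered".
        raise ValueError("both the manual baseline and the in-product metric are required")
    return mean(manual_baseline_minutes) - mean(in_product_minutes)


# Placeholder values only: five stopwatch timings and three in-product timings.
example_baseline = [5.0, 6.0, 4.0, 5.5, 4.5]
example_in_product = [1.0, 0.5, 1.5]
example_recovered = minutes_recovered(example_baseline, example_in_product)
```
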
## Kill-switch

Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.