Files
resolutionflow/.ai/CURRENT_TASK.md
Michael Chihlas fb2dc222fd
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m9s
CI / backend (pull_request) Successful in 9m43s
CI / e2e (pull_request) Successful in 10m13s
docs(ai): handoff for fresh session — AI consolidation plan locked
- HANDOFF: rewritten resume point. AI summary blocker is the active
  task; consolidation plan is the path. 5-step implementation order
  with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
  live-test results (what works, the AI summary blocker), full
  consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
  pickup bugs found + fixed, the three Enter/dashboard/timeout
  fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
  calls into one structured generation" — rejected alternatives
  (bump timeout further, copy status-update content the wrong way,
  switch to Haiku) and consequences (5s magic-moment, ~60% token
  reduction, instant Ticket Notes button, schema enforcement
  required, migration concerns documented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:21:30 -04:00

104 lines
8.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# CURRENT_TASK.md
**Task:** Build **Escalation Mode** — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.
**Status:** in-flight on `feat/escalation-metric-endpoint`. Branch pushed; **draft PR #155** open ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Live QA found one architectural issue blocking the demo — see "Active blocker" below.
**Plan:** [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md). Reviewed by `/office-hours`, `/plan-eng-review`, `/plan-design-review`, `/codex review`. Eng + Design CLEARED.
**Test plan artifact:** [`docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`](../docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md).
## Active blocker — AI assessment still empty after pickup
**The bug** (live-test confirmed 2026-04-29): senior picks up an escalation, magic-moment screen renders with the "AI assessment is still generating" placeholder, and **the placeholder never clears**. Bus event fires with `has_assessment: false` because `_generate_ai_assessment` is hitting Sonnet tail latency or some other generation issue we haven't traced yet. Bumping `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` from 15 → 45 (commit `0d1b305`) didn't fix it in the field.
**Why patching is the wrong move:** the real architectural issue is that we make **three** AI calls per escalation, all summarizing the same source material:
1. `_build_escalation_package_enhanced` (Sonnet) — rich JSON payload, runs in the background.
2. `_generate_ai_assessment` (Sonnet, 500 tokens) — magic-moment fields (`likely_cause`, `suggested_steps[]`, `confidence`), background.
3. `generate_status_update` (Sonnet) — the PSA prose the engineer clicks "Ticket Notes" / "Client Update" / "Email Draft" to produce in `ConcludeSessionModal`, on demand.
User's correct observation (2026-04-29): the engineer is *typically* generating a status update during the escalate flow anyway. There's no reason to do that work three times.
**Next active task: consolidate the three calls into one.** See `## Active task — AI generation consolidation` below.
## Active task — AI generation consolidation
**Goal:** ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.
**Proposed shape** (decide during implementation):
```python
# Persist on SessionHandoff:
{
"summary_prose": "<PSA-flavored ticket-notes paragraph>",
"what_we_know": ["<one-liner>", ...],
"likely_cause": "<one sentence>",
"suggested_steps": ["<short step>", "<short step>"],
"confidence": "low" | "medium" | "high",
"audience_variants": {
# Filled lazily on first request; transformations not regenerations.
"client_update": null,
"email_draft": null,
}
}
```
**Implementation order (suggested):**
1. **Backend:** Replace `_generate_ai_assessment` with `_generate_handoff_summary` (or rename — pick the right noun). One Sonnet call, structured JSON response, persisted to `handoff.ai_assessment_data` + a new `handoff.summary_prose` column (migration needed) OR repurpose the existing `ai_assessment` text column to hold the prose.
2. **Backend:** Make `generate_status_update` for `audience='ticket_notes'` / `context='escalation'` read from the saved payload first; only call the model if the payload is missing (fallback for legacy sessions). For `client_update` / `email_draft`, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
3. **Backend:** Drop `_build_escalation_package_enhanced` from the background path — its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the `ai_session.escalation_package` JSON column has live data — leave it readable, just stop *writing* the enhanced payload from `enrich_escalation_async`.
4. **Frontend:** `HandoffContextScreen` reads from the new structured fields. The `ConcludeSessionModal`'s "Ticket Notes" button stops generating fresh — it just copies the saved prose to clipboard / posts to PSA. "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
5. **Test plan:** Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
**Watch-outs:**
- The schema for the structured response needs to be enforced — past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
- Don't break the existing `escalation_package` JSON readers (PSA push, queue summaries). Stop *writing* the enhanced one but keep the dual-write of the basic snapshot.
- `_generate_ai_assessment` is referenced in tests (`test_handoff_manager.py` stubs it via `AsyncMock`). Update test fixtures when renaming.
## Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)
| Commit | What it ships |
|---|---|
| `d51e95c` | Plan + test-plan artifacts |
| `52f6d03` | `GET /analytics/flowpilot/escalations` — in-product time-to-first-action |
| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin |
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications` — emails engineer/admin teammates |
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
| `bc15952` | Codex: stabilize SSE backend tests |
| `9bdd995` | Bound escalation assessment latency (ORIGINAL: 5s) |
| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx` — live-arrival animations |
| `8e9d22e` | Magic-moment handoff-context screen on pickup |
| `641853a` | Bell-icon notification opens the pickup flow |
| `029680a` | Unify `/escalate` through `HandoffManager` |
| `8914391` | First task-lane race fix (insufficient — see `665530f`) |
| `0f00ee5` | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
| `665530f` | Structural task-lane fix — `taskLaneOwnerChatId` tagging |
| `b7d7ff0` | docs(ai): refresh handoff for compute swap |
| `0d1b305` | **Live-test fixes**: selectChat-gating bug (loadedChatIdsRef), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |
## Live-test results (2026-04-29 morning)
After the structural task-lane fix and the four polish items, end-to-end test confirmed:
- ✅ Junior escalates → senior gets bell-icon notification.
- ✅ Magic-moment screen renders with handoff data on Pick Up.
- ✅ Senior's chat surface loads with conversation history (after `0d1b305`'s selectChat fix — was completely broken before).
- ✅ Sidebar shows the picked-up session with the "Escalated" pill (after `0d1b305`'s `loadChats()` call).
- ✅ Suggested-step chips render below the composer.
- ✅ Unread 6px dot on queue cards.
- ✅ Task-lane regression is gone — no stale flash on new sessions.
-**AI assessment placeholder never clears.** Drives the consolidation work above.
Untested live (low priority, can verify post-consolidation): race-condition toast (needs second user in same account).
## Two-metric framing — read this before quoting numbers to anyone
The in-product endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
## Kill-switch
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.