chihlasm/resolutionflow

Michael Chihlas fb2dc222fd

docs(ai): handoff for fresh session — AI consolidation plan locked

- HANDOFF: rewritten resume point. AI summary blocker is the active
  task; consolidation plan is the path. 5-step implementation order
  with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
  live-test results (what works, the AI summary blocker), full
  consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
  pickup bugs found + fixed, the three Enter/dashboard/timeout
  fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
  calls into one structured generation" — rejected alternatives
  (bump timeout further, copy status-update content the wrong way,
  switch to Haiku) and consequences (5s magic-moment, ~60% token
  reduction, instant Ticket Notes button, schema enforcement
  required, migration concerns documented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-04-29 00:21:30 -04:00


CURRENT_TASK.md

Task: Build Escalation Mode — the wedge for ResolutionFlow's GTM (first paying-customer push). When a junior tech escalates a FlowPilot session, the senior tech sees structured handoff context in seconds instead of running a 5-minute verbal "tell me what you tried" call.

Status: in-flight on feat/escalation-metric-endpoint. Branch pushed; draft PR #155 open (gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155). Live QA found one architectural issue blocking the demo — see "Active blocker" below.

Plan: docs/plans/2026-04-27-escalation-mode-wedge-design.md. Reviewed by /office-hours, /plan-eng-review, /plan-design-review, /codex review. Eng + Design CLEARED.

Test plan artifact: docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md.

Active blocker — AI assessment still empty after pickup

The bug (live-test confirmed 2026-04-29): senior picks up an escalation, the magic-moment screen renders with the "AI assessment is still generating" placeholder, and the placeholder never clears. The bus event fires with has_assessment: false, meaning no assessment was available when the event fired. The suspected cause is Sonnet tail latency in _generate_ai_assessment, but the failure has not been traced yet and could be a different generation issue. Bumping ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS from 15 → 45 (commit 0d1b305) did not fix it in the field.
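To make the failure mode above concrete, here is a minimal sketch of a deadline-bounded generation step that always emits an event. It is not the project's actual code: `bounded_assessment`, the `reason` key, and the shape of the event dict are assumptions for illustration. Only the constant name and the `has_assessment` flag come from this document. Recording a reason is one possible way to tell a timeout apart from the untraced "other generation issue".

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Mirrors the constant named above; 45.0 is the post-bump value from commit 0d1b305.
ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS = 45.0

Assessment = dict[str, Any]


async def bounded_assessment(
    generate: Callable[[], Awaitable[Assessment]],
    timeout: float = ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS,
) -> tuple[Optional[Assessment], dict[str, Any]]:
    """Await generate() under a deadline and build a bus-event payload.

    The "reason" key is a hypothetical addition: it separates a timeout
    from any other failure so the cause is visible in the event itself.
    """
    reason: Optional[str] = None
    assessment: Optional[Assessment] = None
    try:
        assessment = await asyncio.wait_for(generate(), timeout=timeout)
    except asyncio.TimeoutError:
        reason = "timeout"
    except Exception as exc:  # any non-timeout generation failure
        reason = f"error: {type(exc).__name__}"
    event = {"has_assessment": assessment is not None, "reason": reason}
    return assessment, event
```

With an event like this, a placeholder that never clears can be attributed to either a deadline miss or a raised exception without reading backend logs.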

Why patching is the wrong move: the real architectural issue is that we make three AI calls per escalation, all summarizing the same source material:

1. _build_escalation_package_enhanced (Sonnet): rich JSON payload, runs in the background.
2. _generate_ai_assessment (Sonnet, 500 tokens): magic-moment fields (likely_cause, suggested_steps[], confidence), runs in the background.
3. generate_status_update (Sonnet): the PSA prose the engineer produces on demand by clicking "Ticket Notes" / "Client Update" / "Email Draft" in ConcludeSessionModal.

User's correct observation (2026-04-29): the engineer is typically generating a status update during the escalate flow anyway. There's no reason to do that work three times.

Next active task: consolidate the three calls into one. See ## Active task — AI generation consolidation below.

Active task — AI generation consolidation

Goal: ONE AI call per escalation that produces a single structured payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Magic-moment populates immediately. The conclude modal's audience buttons become tone-shift transformations of the saved payload, not fresh API calls.

Proposed shape (decide during implementation):

# Persist on SessionHandoff:
{
  "summary_prose": "<PSA-flavored ticket-notes paragraph>",
  "what_we_know": ["<one-liner>", ...],
  "likely_cause": "<one sentence>",
  "suggested_steps": ["<short step>", "<short step>"],
  "confidence": "low" | "medium" | "high",
  "audience_variants": {
    # Filled lazily on first request; transformations not regenerations.
    "client_update": null,
    "email_draft": null,
  }
}
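The proposed shape could be pinned down on the backend roughly as follows. This is a sketch under assumptions, not the chosen design: `HandoffSummary`, `parse_handoff_summary`, and the helper names are hypothetical, and the non-empty-string rule is an illustrative choice. The field names and the three confidence levels are taken from the shape above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

CONFIDENCE_LEVELS = ("low", "medium", "high")


def _default_variants() -> dict[str, Optional[str]]:
    # Variants start empty and are filled lazily on first request.
    return {"client_update": None, "email_draft": None}


@dataclass
class HandoffSummary:
    summary_prose: str
    what_we_know: list[str]
    likely_cause: str
    suggested_steps: list[str]
    confidence: str
    audience_variants: dict[str, Optional[str]] = field(default_factory=_default_variants)


def _require_str(raw: dict[str, Any], key: str) -> str:
    value = raw.get(key)
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{key} must be a non-empty string")
    return value


def _require_str_list(raw: dict[str, Any], key: str) -> list[str]:
    value = raw.get(key)
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError(f"{key} must be a list of strings")
    return value


def parse_handoff_summary(raw: dict[str, Any]) -> HandoffSummary:
    """Validate a model response dict before it is persisted or rendered."""
    confidence = raw.get("confidence")
    if confidence not in CONFIDENCE_LEVELS:
        raise ValueError(f"confidence must be one of {CONFIDENCE_LEVELS}")
    variants = _default_variants()
    variants.update(raw.get("audience_variants") or {})
    return HandoffSummary(
        summary_prose=_require_str(raw, "summary_prose"),
        what_we_know=_require_str_list(raw, "what_we_know"),
        likely_cause=_require_str(raw, "likely_cause"),
        suggested_steps=_require_str_list(raw, "suggested_steps"),
        confidence=confidence,
        audience_variants=variants,
    )
```

Rejecting a malformed response at this boundary keeps freeform prose from reaching the frontend, where it cannot be parsed into chips.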

Implementation order (suggested):

1. Backend: Replace _generate_ai_assessment with _generate_handoff_summary (or another name; pick the right noun). One Sonnet call returning a structured JSON response, persisted to handoff.ai_assessment_data. Store the prose either in a new handoff.summary_prose column (migration needed) or by repurposing the existing ai_assessment text column.
2. Backend: Make generate_status_update for audience='ticket_notes' / context='escalation' read from the saved payload first and call the model only if the payload is missing (fallback for legacy sessions). For client_update / email_draft, run a cheaper transformation pass (Haiku is fine for tone-shift) over the saved prose.
3. Backend: Drop _build_escalation_package_enhanced from the background path. Its content overlaps heavily with the new summary, and the magic-moment screen already gets what it needs from the structured fields. Keep it only if downstream PSA push depends on it (verify by grep). Migration concern: the ai_session.escalation_package JSON column has live data, so leave it readable and just stop writing the enhanced payload from enrich_escalation_async.
4. Frontend: HandoffContextScreen reads from the new structured fields. The ConcludeSessionModal's "Ticket Notes" button stops generating fresh text and just copies the saved prose to clipboard / posts to PSA. The "Client Update" and "Email Draft" buttons trigger the transformation endpoint.
5. Test plan: Magic-moment screen populates within ~5s instead of ~25s. Engineer's "Ticket Notes" button is instant. Token spend per escalation drops by ~60%.
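The second backend step above (read the saved payload first, fall back to the model only for legacy sessions, and treat the other audiences as cached transformations) might look like the sketch below. The function name `status_update_for` and the two injected callables are hypothetical stand-ins: `regenerate` represents a fresh model call and `transform` represents the cheaper tone-shift pass. Only the payload keys come from the proposed shape.

```python
from typing import Any, Awaitable, Callable, Optional

Regenerate = Callable[[str], Awaitable[str]]      # audience -> fresh prose
Transform = Callable[[str, str], Awaitable[str]]  # (saved prose, audience) -> variant


async def status_update_for(
    payload: Optional[dict[str, Any]],
    audience: str,
    regenerate: Regenerate,
    transform: Transform,
) -> str:
    """Serve status-update prose from the saved payload whenever possible."""
    if payload is None:
        # Legacy session with no saved summary: fall back to a full generation.
        return await regenerate(audience)
    if audience == "ticket_notes":
        # Ticket notes are the saved prose itself, so no model call is needed.
        return payload["summary_prose"]
    variants = payload.setdefault("audience_variants", {})
    cached = variants.get(audience)
    if cached is None:
        # First request for this audience: transform once, then remember it.
        cached = await transform(payload["summary_prose"], audience)
        variants[audience] = cached
    return cached
```

In this sketch the cache lives in the payload dict itself, so the caller would still need to persist the updated payload after the first transformation.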

Watch-outs:

- The schema for the structured response needs to be enforced: past calls returned freeform prose that the frontend can't parse into chips. Use Anthropic's tool-use / structured output if needed.
- Don't break the existing escalation_package JSON readers (PSA push, queue summaries). Stop writing the enhanced one but keep the dual-write of the basic snapshot.
- _generate_ai_assessment is referenced in tests (test_handoff_manager.py stubs it via AsyncMock). Update test fixtures when renaming.
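For the schema-enforcement watch-out, one option is to describe the payload as a tool definition and read the model's answer from the tool-use block instead of parsing prose. The sketch below only builds the definition and extracts the input from a response expressed as plain dicts; it makes no network call. The tool name `record_handoff_summary` and the helper `extract_tool_input` are hypothetical, and whether the real code sees dicts or SDK objects depends on how the client is called.

```python
from typing import Any

# Hypothetical tool definition; the properties mirror the proposed payload shape.
HANDOFF_SUMMARY_TOOL: dict[str, Any] = {
    "name": "record_handoff_summary",
    "description": "Record the structured escalation handoff summary.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary_prose": {"type": "string"},
            "what_we_know": {"type": "array", "items": {"type": "string"}},
            "likely_cause": {"type": "string"},
            "suggested_steps": {"type": "array", "items": {"type": "string"}},
            "confidence": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": [
            "summary_prose",
            "what_we_know",
            "likely_cause",
            "suggested_steps",
            "confidence",
        ],
    },
}


def extract_tool_input(content_blocks: list[dict[str, Any]], tool_name: str) -> dict[str, Any]:
    """Return the input of the first matching tool_use block, or raise."""
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            tool_input = block.get("input")
            if not isinstance(tool_input, dict):
                raise ValueError("tool_use block has no object input")
            missing = [
                key
                for key in HANDOFF_SUMMARY_TOOL["input_schema"]["required"]
                if key not in tool_input
            ]
            if missing:
                raise ValueError(f"missing fields: {missing}")
            return tool_input
    # Freeform prose with no tool call is the failure this watch-out describes.
    raise ValueError(f"model did not call {tool_name}")
```

Treating a text-only response as an error, rather than trying to salvage it, keeps the magic-moment screen from rendering fields it cannot trust.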

Done on `feat/escalation-metric-endpoint` (branched from `main` @ `c0ed6d9`)

| Commit | What it ships |
| --- | --- |
| `d51e95c` | Plan + test-plan artifacts |
| `52f6d03` | `GET /analytics/flowpilot/escalations`: in-product time-to-first-action |
| `7a5b853` | Role-gate POST `/handoffs/{id}/claim` to engineer-or-admin |
| `07d0db9` | `HandoffManager.dispatch_escalation_notifications`: emails engineer/admin teammates |
| `9f0bfd4` | `EscalationMetricCard` mounted above the queue list |
| `bc15952` | Codex: stabilize SSE backend tests |
| `9bdd995` | Bound escalation assessment latency (original bound: 5s) |
| `b8627f4` | Frontend SSE subscription in `EscalationQueue.tsx`: live-arrival animations |
| `8e9d22e` | Magic-moment handoff-context screen on pickup |
| `641853a` | Bell-icon notification opens the pickup flow |
| `029680a` | Unify `/escalate` through `HandoffManager` |
| `8914391` | First task-lane race fix (insufficient; see `665530f`) |
| `0f00ee5` | Four plan-locked items: live AI refresh, suggested-step chips, unread dot, race-condition toast |
| `665530f` | Structural task-lane fix: `taskLaneOwnerChatId` tagging |
| `b7d7ff0` | docs(ai): refresh handoff for compute swap |
| `0d1b305` | Live-test fixes: selectChat-gating bug (loadedChatIdsRef), 45s timeout bump, Enter-to-submit on escalate forms, dashboard expand-to-preview |

Live-test results (2026-04-29 morning)

After the structural task-lane fix and the four polish items, end-to-end test confirmed:

✅ Junior escalates → senior gets bell-icon notification.
✅ Magic-moment screen renders with handoff data on Pick Up.
✅ Senior's chat surface loads with conversation history (after 0d1b305's selectChat fix — was completely broken before).
✅ Sidebar shows the picked-up session with the "Escalated" pill (after 0d1b305's loadChats() call).
✅ Suggested-step chips render below the composer.
✅ Unread 6px dot on queue cards.
✅ Task-lane regression is gone — no stale flash on new sessions.
❌ AI assessment placeholder never clears. Drives the consolidation work above.

Untested live (low priority, can verify post-consolidation): race-condition toast (needs second user in same account).

Two-metric framing — read this before quoting numbers to anyone

The in-product endpoint measures post-claim time-to-first-action. The "minutes recovered" sales claim is manual_baseline − in_product_metric. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
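The subtraction above can be written out as a small helper. This is a sketch only: the function name is hypothetical, and averaging the stopwatch samples is an assumption, since the document does not say how the five manual timings are aggregated.

```python
from statistics import mean


def minutes_recovered(stopwatch_minutes: list[float], in_product_minutes: float) -> float:
    """Return manual_baseline - in_product_metric, both in minutes.

    stopwatch_minutes: manually timed handoffs (the founder's stopwatch samples).
    in_product_minutes: post-claim time-to-first-action from the endpoint.
    Assumption: the manual baseline is the mean of the stopwatch samples.
    """
    if not stopwatch_minutes:
        raise ValueError("a manual baseline needs at least one stopwatch sample")
    manual_baseline = mean(stopwatch_minutes)
    return manual_baseline - in_product_minutes
```

The helper refuses to run without stopwatch samples, which makes it harder to quote the in-product number on its own as "minutes recovered".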

Kill-switch

Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.
