resolutionflow/.ai/CURRENT_TASK.md

# CURRENT_TASK.md

**Active task:** None — pick next from `.ai/TODO.md` or `03-DEVELOPMENT-ROADMAP.md`.

## Recently shipped

- **2026-05-01 — PR #156** Suggested-fix `applied_pending` non-terminal outcome. Merged into `main` as `3ba4532`. Adds:
  - Schema/API: `FixStatus="applied_pending"`, `pending_reason` Text column, migration `c0f3a4b7e91d`. `PATCH /suggested-fixes/{id}/outcome` accepts pending, requires notes, stamps `applied_at` only.
  - UI: `PendingBanner` (info-tone, worked / didn't / update reason / dismiss). "Waiting to verify…" overflow option in `VerifyingBanner`. Nudge "Still checking" records pending with a reason. Page-level Resolve auto-patches pending → success before resolution flow; page-level Escalate intercepts pending the same way verifying/partial does.
  - Generators: `resolution_note_generator` and `escalation_package_generator` system prompts handle the new status without real-looking examples.
  - Tests: 4 new in `test_fix_outcome_endpoint.py` (21/21 suite green); prompt anti-parrot guardrail green; tsc + Vite build clean.
  - QA report: `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md` (5/7 scripted checks PASS with concrete evidence; 2 entry-path checks deferred — same handlers verified via tested transitions).
- **2026-04-30 — PR #155** Escalation Mode wedge merged as `ac42f97`. Senior-tech magic-moment screen. Plan: [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md).

## Two-metric framing (Escalation Mode — read before quoting numbers)

The in-product `GET /analytics/flowpilot/escalations` endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline − in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.

## Kill-switch (Escalation Mode)

Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.

## Notes for next session

- Drive checks 1 (VerifyingBanner overflow → "Waiting to verify…") and 5 (nudge "Still checking" with 3+ post-apply messages) in real pilot usage to close the QA gap left by `/qa` (the tested handlers cover the same mutations, but the entry-path UI rendering wasn't exercised end-to-end).
- Consider monitoring how often pending fixes get parked vs resolved — if engineers report losing track across sessions, revisit the cross-session "Follow-ups" dashboard rollup that was scoped out.