docs(plans): add escalation-mode wedge design + test plan

Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI
specs, and embedded review report for the Escalation Mode wedge — output of
/office-hours, /plan-eng-review, /plan-design-review, and /codex review.

Codex review surfaced two corrections we applied:
- two-metric framing (manual baseline vs in-product time-to-first-action)
- claim role gate moved in-scope (was deferred TODO)

TODO updates: peer-tech escalation + claim role gate captured (the latter then
moved in-scope by the codex pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:18:46 -04:00
parent c0ed6d9840
commit d51e95cdfa
3 changed files with 533 additions and 0 deletions


@@ -0,0 +1,494 @@
# Design: ResolutionFlow GTM — Escalation-Mode-First Wedge
Generated by /office-hours on 2026-04-26
Branch: main
Repo: chihlasm/resolutionflow
Status: APPROVED
Mode: Startup
## Problem Statement
ResolutionFlow is a multi-tenant SaaS troubleshooting platform for MSPs, currently
in Go-to-Market Validation (pre-PMF). The backend is feature-complete (55+ endpoints,
100+ tests, FlowPilot telemetry baseline accruing). The product has users but no
paying customers.
The blocker is not engineering completeness. The blocker is the absence of a sharp
GTM story tied to a number a buyer can verify. The session reframed the wedge twice
before landing on the real one.
**What ResolutionFlow actually is:** the structuring layer between conversational AI
and the way MSP techs work tickets. AI is great at producing answers; it is bad at
producing workflow-shaped output. ResolutionFlow gives the tech the AI they already
trust (Claude/GPT) but organizes the output into actionable structured steps,
records the session, captures customer-specific context, and turns the result into
PSA-formatted ticket notes — and optionally a runbook — without the tech writing
anything.
**Positioning line:** "the senior engineer looking over your shoulder."
## Demand Evidence
The founder is the first user. Senior Systems Engineer at an MSP, losing ~20
hours/week to cross-domain interruptions (systems engineer pulled into networking
problems and vice versa). At least 4 interruptions per day, with the time cost
concentrated in the gap between AI-conversation output and MSP-ticket workflow.
This is solving-your-own-problem demand evidence — strongest possible signal at
this stage. The 20 hrs/week figure is the founder's own time, not a hypothetical.
Every MSP shop with a senior tech and a junior tech has a version of this problem.
Telemetry signal (Phase 0.5 baseline accruing): captured flows pile up but are not
being re-used. This says capture works, retrieval doesn't — which means the
"hours-saved-via-re-use" number isn't yet generatable from existing data. The
GTM-grade ROI story needs a different metric until re-use lands: minutes recovered
per escalation, generated by Approach A below.
## Status Quo
MSP techs today resolve tickets via three workarounds:
1. **AI in a tab.** Junior tech opens Claude or ChatGPT, pastes the problem, gets a
wall of prose, parses it into action items in their head, executes, repeats. AI
does the diagnostic work. The tech does all the structure-extraction and
ticket-note-writing afterward.
2. **Tribal knowledge.** Junior tech pings senior in Slack. Senior tech is
interrupted (4+ times/day per the founder's own data). Context handoff is verbal
and lossy.
3. **Stale runbooks.** Half-maintained Notion / IT Glue / SharePoint pages that
nobody trusts because they're 18 months out of date and don't match the current
customer environment.
The cost of these workarounds for the founder personally: ~20 hours per week of
senior-tech time lost. For a 5-tech MSP, the equivalent is 1 full FTE worth of
senior-engineer hours leaking into context-switching and tab-hopping.
## Target User & Narrowest Wedge
**Target user:** Senior Systems Engineer at a small-to-mid MSP (5-20 techs). The
founder is exemplar #1. Buying authority is shared between senior tech (champion)
and MSP owner (signs the check).
**Narrowest paid wedge:** Escalation Mode. Single sharp feature. When a junior tech
escalates a ticket they were working in FlowPilot, the senior tech opens the ticket
and sees the entire structured session state — every step the junior tried, every
dead end, every command output — instead of starting with "tell me what you tried"
for five minutes.
Why this is the wedge:
- **Two metrics, not one** (revised after /codex review 2026-04-27):
  - **Manual baseline** (the Assignment, weeks 0-2): senior tech stopwatches the
    next 5 escalations. T1 (first diagnostic action) minus T0 (open ticket) under
    today's verbal-handoff workflow. This is the "what you currently lose" number.
  - **In-product metric** (telemetry, week 3+): time-to-first-action after claim,
    derived from `ai_session_step` rows where `created_at > SessionHandoff.claimed_at`
    AND `user_id = SessionHandoff.claimed_by`. This is the "what it is now with
    structured handoff" number.
  - **The savings claim** = manual baseline minus in-product metric. Quote both
    explicitly in pilot conversations. Do NOT roll the in-product number alone
    into "minutes recovered"; that's an apples-to-oranges miscount Codex caught
    in the cross-model review.
- **Single-feature demo:** a 2-minute Loom shows the magic moment — junior hits
escalate, senior window opens with full structured context. No theory required.
- **Cross-buyer story:** sells to senior tech (less interruption) AND owner (junior
techs resolve faster, take more accounts).
- **Hours-saved math is simple:** 4-5 minutes per escalation × 15-30 escalations
per week per senior tech = 1-2.5 hours/week recovered per senior. At $80-150/hr
fully-loaded senior tech cost, the tool pays for itself with one customer.
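
The hours-saved arithmetic in the last bullet can be checked with a short sketch. The helper name and its parameters are illustrative only and are not part of the ResolutionFlow codebase; the inputs are the ranges quoted in the bullet.

```python
def weekly_hours_recovered(minutes_per_escalation: float, escalations_per_week: int) -> float:
    """Hours per week one senior tech gets back (illustrative helper, not product code)."""
    return minutes_per_escalation * escalations_per_week / 60


# Ranges quoted in the text: 4-5 minutes per escalation, 15-30 escalations per week.
low = weekly_hours_recovered(4, 15)
high = weekly_hours_recovered(5, 30)

# Dollar value at the quoted $80-150/hr fully-loaded senior tech cost.
low_value = low * 80
high_value = high * 150

print(f"{low:.1f}-{high:.1f} hours/week, ${low_value:.0f}-${high_value:.0f}/week")
# prints: 1.0-2.5 hours/week, $80-$375/week
```

With the quoted ranges, the bounds work out to 1.0 and 2.5 hours per week per senior tech.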
## Constraints
- **One-founder shop.** Cannot run three concurrent product narratives. Sequence
matters more than scope.
- **Pre-PMF runway implied.** 4-8 week build cycles before talking to a buyer are
expensive. Approach A's 1-2 week timeline is the binding constraint.
- **Existing architecture is mostly aligned.** FlowPilot, unified_chat_service,
FlowProposal, ConnectWise PSA integration — most of the pieces exist. Risk is
positioning and UX, not capability.
- **PSA copilot competition is real.** ConnectWise / Autotask / Halo are racing to
ship AI features. The wedge has to be sharp because we lose on distribution.
## Premises
The five load-bearing claims this design rests on, all confirmed in session:
1. **Diagnostic AI is commoditized.** ResolutionFlow does not compete on
"AI solves the ticket faster." That race is over. ChatGPT/Claude already won.
2. **The structuring layer is the wedge.** AI conversational output is too dense
and unstructured for active troubleshooting. ResolutionFlow's value is
organizing that output into actionable, separable, recorded steps.
3. **Escalation context is the killer feature.** "Junior hits escalate, senior gets
full structured context in 30 seconds instead of 5 minutes" is the sharpest
demoable moment in the entire product surface.
4. **First paying customer is bottom-up, prosumer-flavored.** Senior tech at a
small MSP, $20-50/seat/month, monthly billing. Owner-targeted enterprise
pricing waits until 5+ paying shops establish baseline ROI numbers.
5. **Distribution is MSP communities, not paid SaaS ads.** r/msp, MSPGeek, RocketMSP,
PSA marketplace listings. The channel matches the buyer.
## Approaches Considered
### Approach A: Escalation Mode first (REDUCED SCOPE per /plan-eng-review)
Lead the GTM with the killer feature. Polish the escalate-with-context handoff:
junior tech mid-session hits escalate, senior tech window opens with full
structured session state. 2-min demo Loom. Pilot with **3 MSPs** in the founder's
network (capped at 3 to preserve build capacity for B). Metric: minutes recovered
per escalation.
**SCOPE REDUCTION (2026-04-27 eng review):** ~80% of Approach A is already built.
The original 2-3 week estimate assumed greenfield. Codebase audit confirms:
| What the doc said "build" | What actually exists |
|---|---|
| Session-state serialization | `ai_session.escalation_package` (JSONB), `SessionHandoff.snapshot` |
| Senior-tech inbox | [EscalationQueuePage.tsx](frontend/src/pages/EscalationQueuePage.tsx) + [EscalationQueue.tsx](frontend/src/components/flowpilot/EscalationQueue.tsx) |
| Claim workflow | [handoff_manager.py:123 claim_session()](backend/app/services/handoff_manager.py#L123) |
| API surface | [session_handoffs.py](backend/app/api/endpoints/session_handoffs.py) — POST /handoff, /claim, GET queue |
| AI assessment for senior | `_generate_ai_assessment()` in handoff_manager |
| PSA round-trip | `escalation_package_markdown`, `escalation_package_external_id` |
**Real engineering scope (~6-9 days):**
1. **Notification dual-path** (4-5 days). `notification_sent` flag is a dead column —
never written. Wire two channels in `handoff_manager.create_handoff`:
- **Email** (existing `EmailService.send_notification_email`) — handles offline seniors.
- **WebSocket / SSE push** to the EscalationQueue for live demo magic moment.
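- Record delivery after dispatch confirmation. The original plan set `notification_sent=true` here; the Codex correction under UI Specifications drops that flag in favor of per-channel `notification_log` rows.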
- Graceful degradation: handoff still created if notification raises (regression test required).
2. **Hero metric endpoint** (~2 hours). New `GET /api/v1/analytics/escalation-metrics`,
account-scoped, role-gated to `require_engineer_or_admin`. Computes
*minutes recovered per escalation* by querying:
```
ai_session_step.created_at (first row where user_id = SessionHandoff.claimed_by and created_at > SessionHandoff.claimed_at)
minus
SessionHandoff.claimed_at
```
Returns a rolling-30-day average per account. No schema change.
3. **UX polish on EscalationQueue + receiving-engineer view** (2-3 days). Confirm the
magic-moment screen lands when senior clicks claim. Add an unread indicator on
the queue. Wire optimistic insert when SSE event arrives.
4. **Loom + landing page copy** (1-2 days). Non-engineering. Outside this plan's scope
but required for the GTM in week 3.
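
The time-to-first-action query from step 2 can be sketched in plain Python over in-memory rows. This is a minimal sketch, not the endpoint implementation: the `Handoff` and `Step` classes are stand-ins for `SessionHandoff` and `ai_session_step`, the `session_id` join key is an assumption, and the 30-day windowing is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Handoff:  # stand-in for SessionHandoff
    session_id: int  # assumed join key, not confirmed by the plan
    claimed_by: int
    claimed_at: datetime


@dataclass
class Step:  # stand-in for an ai_session_step row
    session_id: int
    user_id: int
    created_at: datetime


def time_to_first_action(handoff: Handoff, steps: list[Step]) -> Optional[timedelta]:
    """Delta between the claim and the claimer's first later step; None if there is none yet."""
    later = [
        s.created_at
        for s in steps
        if s.session_id == handoff.session_id
        and s.user_id == handoff.claimed_by
        and s.created_at > handoff.claimed_at
    ]
    return min(later) - handoff.claimed_at if later else None


def average_minutes(deltas: list[Optional[timedelta]]) -> Optional[float]:
    """Mean in minutes over handoffs that have a first action; None when none do."""
    known = [d.total_seconds() / 60 for d in deltas if d is not None]
    return sum(known) / len(known) if known else None


claim = datetime(2026, 4, 27, 10, 0, 0)
handoff = Handoff(session_id=1, claimed_by=7, claimed_at=claim)
steps = [
    Step(1, 3, claim + timedelta(seconds=10)),  # junior's step: ignored
    Step(1, 7, claim - timedelta(minutes=5)),   # before the claim: ignored
    Step(1, 7, claim + timedelta(seconds=90)),  # senior's first action
    Step(1, 7, claim + timedelta(minutes=4)),
]
first_action = time_to_first_action(handoff, steps)
```

Returning `None` rather than zero for handoffs with no post-claim step keeps unclaimed or untouched escalations out of the average.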
**Test plan:** 100% coverage of new paths — 13 tests including 4 e2e and 1 regression
(graceful-degradation when notification dispatch raises). Test plan artifact at
`~/.gstack/projects/chihlasm-resolutionflow/abc-main-eng-review-test-plan-20260427-000000.md`.
**Risk:** Low. Single feature, single metric, architecture-aligned. The dual-path
notification is the only mildly novel surface; both halves use existing infra.
**Reuses:** `services/handoff_manager.py`, `services/escalation_package_generator.py`,
`models/session_handoff.py`, `models/ai_session.py`, `services/notification_service.py`,
`models/notification_log.py`, EmailService, EscalationQueuePage + EscalationQueue.
### UI Specifications (locked by /plan-design-review 2026-04-27)
**Magic-moment screen** (new, after Pick Up click): dedicated handoff-context view that
loads BEFORE the regular FlowPilot session view, then dissolves on first senior action.
Four sections, single frame:
1. **Problem summary** (top, 2-3 lines): junior's framing. Bricolage Grotesque h2.
2. **What's been tried** (left or middle column): structured list of `dead_ends_flagged[]`
and `steps_attempted[]` from `escalation_package` JSONB. Card-flat surface, IBM Plex.
3. **AI assessment** (right column): `ai_assessment_data` rendered as 3 fields —
`likely_cause`, `suggested_steps[]`, `confidence`. accent-dim badge for confidence.
4. **Start here** (primary CTA, electric-blue, ≥44px touch target): opens the FlowPilot
session. Originally specified as opening at the most-likely-next-step; the Codex
correction below changes this to the latest known state. Senior typing or clicking
anywhere triggers 200ms fade-out and FlowPilot view fades in. Re-openable via
"Show handoff context" ghost button in FlowPilot toolbar.
**Hero metric ("minutes recovered per escalation"):** lives in TWO places:
- **Queue stat-card** (above EscalationQueue list on /escalations): compact, "X.X hrs
saved this month" + "click for details" affordance. Refreshes on queue load.
- **Dedicated `/analytics/escalations` page** (owner-facing): trend chart (4-week
rolling), per-tech breakdown, per-problem-domain segmentation. Engineer-or-admin
role-gated.
**Real-time arrival visual** (when WebSocket pushes a new escalation):
- New card slides in from above the list, 200ms ease-out CSS transition.
- Browser tab title prefixes with " (1) " / " (N) " when tab is backgrounded; clears
on focus.
- No sound. MUST respect `prefers-reduced-motion: reduce` (slide-in collapses to
instant fade-in).
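**Unread state:** subtle 6px dot in top-right corner of card for escalations the
current senior has never opened. Originally specified to fade on first hover or click;
the Codex correction below replaces that with clear-on-open, claim, or explicit dismiss.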
**Race-condition (two seniors click Pick Up simultaneously):** loser sees a toast
"Already claimed by [name] 2s ago" via existing `@/lib/toast`; the card flashes the
winner's name in the meta row for 1s, then dissolves from the loser's view via
optimistic update + WebSocket reconciliation.
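
The server-side half of the race can be sketched as a check-and-set that lets exactly one claimer win. This is an in-memory illustration under stated assumptions: `HandoffRecord` and `AlreadyClaimed` are hypothetical names, and a real implementation would more likely rely on a conditional database update than on a process-local lock.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


class AlreadyClaimed(Exception):
    """Raised for the loser of a claim race (hypothetical exception name)."""

    def __init__(self, winner_id: int):
        super().__init__(f"Already claimed by user {winner_id}")
        self.winner_id = winner_id


@dataclass
class HandoffRecord:
    claimed_by: Optional[int] = None
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def claim(self, user_id: int) -> None:
        # Check and set under one lock so two concurrent callers cannot both win.
        with self._lock:
            if self.claimed_by is not None:
                raise AlreadyClaimed(self.claimed_by)
            self.claimed_by = user_id


def race(handoff: HandoffRecord, user_ids: list[int]) -> tuple[list[int], list[int]]:
    """Fire concurrent claims and return (winners, losers)."""
    winners: list[int] = []
    losers: list[int] = []

    def attempt(uid: int) -> None:
        try:
            handoff.claim(uid)
            winners.append(uid)
        except AlreadyClaimed:
            losers.append(uid)

    threads = [threading.Thread(target=attempt, args=(uid,)) for uid in user_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners, losers


record = HandoffRecord()
winners, losers = race(record, [101, 202])
```

The loser's exception carries the winner's id, which is what the "Already claimed by [name]" toast needs.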
**Unread state (Codex correction 2026-04-27):** dot indicator clears on **open,
claim, or explicit dismiss** — NOT on hover. Hover-to-clear is a bad proxy for
acknowledgment because incidental mouse movement creates false clears.
**Notification routing (Codex finding 2026-04-27):** v1 fans out the email + push
to **all engineer-or-admin role users in the same account_id as the SessionHandoff**.
No on-call/round-robin logic in v1. If pilots ask for routing, capture as v2 TODO.
The first senior to claim wins; everyone else's notification self-resolves on
WebSocket reconciliation.
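
The v1 fan-out rule amounts to a filter on account and role. A minimal sketch, assuming the role values are the strings `"engineer"` and `"admin"` (the plan names the gate but not the stored values) and using a hypothetical `User` stand-in:

```python
from dataclasses import dataclass

# Assumed role values; the plan only names the engineer-or-admin gate.
NOTIFY_ROLES = frozenset({"engineer", "admin"})


@dataclass(frozen=True)
class User:  # hypothetical stand-in for the account user model
    id: int
    account_id: int
    role: str


def recipients_for(handoff_account_id: int, users: list[User]) -> list[User]:
    """Everyone in the handoff's account with an eligible role; no on-call logic in v1."""
    return [
        u for u in users
        if u.account_id == handoff_account_id and u.role in NOTIFY_ROLES
    ]


users = [
    User(1, account_id=10, role="engineer"),
    User(2, account_id=10, role="admin"),
    User(3, account_id=10, role="tech"),      # same account, role not eligible
    User(4, account_id=20, role="engineer"),  # different account
]
notified_ids = [u.id for u in recipients_for(10, users)]
```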
**Notification delivery model (Codex correction 2026-04-27):** drop the
`notification_sent: bool` flag from v1. Replace with per-channel delivery rows
in the existing `notification_log` table (reuse it; don't add a new model)
keyed by `(handoff_id, channel, recipient_user_id, status)` where status ∈
{queued, sent, failed, suppressed}. This makes partial-success and per-channel
retry visible. If the existing `notification_log` schema doesn't match, defer
the per-channel persistence to a v2 TODO and v1 logs delivery attempts to the
existing telemetry stream instead. Do NOT keep the dead boolean.
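
The per-channel delivery model can be sketched as one row per (handoff, channel, recipient) with an explicit status. This is an illustration only: `DeliveryRow` and `dispatch` are hypothetical names, and the actual `notification_log` schema may differ, as the paragraph above notes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class DeliveryStatus(str, Enum):
    QUEUED = "queued"
    SENT = "sent"
    FAILED = "failed"
    SUPPRESSED = "suppressed"


@dataclass
class DeliveryRow:  # hypothetical shape keyed as described in the text
    handoff_id: int
    channel: str
    recipient_user_id: int
    status: DeliveryStatus = DeliveryStatus.QUEUED


def dispatch(
    handoff_id: int,
    recipient_ids: list[int],
    channels: dict[str, Callable[[int], None]],
) -> list[DeliveryRow]:
    """Attempt every channel for every recipient and record each outcome separately."""
    rows: list[DeliveryRow] = []
    for channel, send in channels.items():
        for uid in recipient_ids:
            row = DeliveryRow(handoff_id, channel, uid)
            try:
                send(uid)
                row.status = DeliveryStatus.SENT
            except Exception:
                # One failing channel must not abort the remaining sends.
                row.status = DeliveryStatus.FAILED
            rows.append(row)
    return rows


def send_email(user_id: int) -> None:
    pass  # placeholder sender that always succeeds


def send_push(user_id: int) -> None:
    raise ConnectionError("push fanout unavailable")  # simulated outage


rows = dispatch(9, [1, 2], {"email": send_email, "push": send_push})
```

Here the email rows end up `sent` and the push rows `failed`, which is the partial-success case a single boolean cannot represent.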
**"Start here" CTA (Codex correction 2026-04-27):** opens the FlowPilot session
at the **latest known state** (the AI's most recent agent_message + the current
pending_task_lane). Surface `ai_assessment_data.suggested_steps[]` as a list of
chips below the chat input — clicking a chip prefills the input. Do NOT invent a
"jump to most-likely-next-step" capability that doesn't exist in the session model.
**`/claim` role gate (Codex correction 2026-04-27, IN-SCOPE for v1):** add
`require_engineer_or_admin` dep on POST `/handoffs/{id}/claim`. Originally
deferred to TODO during eng review; Codex correctly flagged it as wedge-relevant
because the race-condition story depends on auth gating. ~30 min change. Removed
from TODO.md.
**A11y requirements (mandatory before pilot ship):**
- Keyboard: Tab order through queue cards; Enter on focused card opens it; Pick Up
button is a reachable target; Esc closes the handoff-context overlay.
- ARIA: `role="region"` + `aria-live="polite"` on the queue list (announces arrivals);
`aria-label="N escalations awaiting pickup"` on the heading; the slide-in animation
must not announce twice (debounce live-region updates).
- Pick Up button: bump from `py-2` to `py-2.5` to clear the 44px touch-target floor.
- Color contrast: confidence-badge text on accent-dim background must be ≥4.5:1
(verify against DESIGN-SYSTEM.md tokens).
**DS token discipline:** every new piece must use `card-flat`, `accent-dim`/`accent-text`,
`text-muted-foreground`, `bg-card`/`bg-elevated`, IBM Plex / Bricolage / JetBrains,
explicit `transition` property lists (never `transition: all`). No glass, no blur,
no gradient surfaces. Electric-blue accent reserved for interactive elements only.
**Mobile responsive:** deferred to post-pilot TODO. Pre-PMF wedge target is desktop;
MSP techs work on laptops/desktops in shop environments.
**Deferred to TODO.md (out of scope for v1 wedge):**
- Peer-tech escalates colleague's session (currently session-owner-only)
- Role gate on POST /claim: no longer deferred. The Codex pass moved it in-scope for v1 (see the `/claim` role gate spec above).
### Approach B: Full Structured Resolution loop (split B1 + B2)
End-to-end demo: tech opens FlowPilot, structure appears in side panel as AI
responds, ticket notes auto-populate at end, optional runbook capture for reusable
patterns. Tells the full "senior engineer over your shoulder" story.
**B1 — Side panel + PSA-formatted ticket notes** (ships first):
- Structured side panel that surfaces parsed AI markers as live actionable steps
while the conversation runs.
- PSA-formatted ticket-notes exporter (ConnectWise first; Autotask/Halo later).
- Effort: M (~3 weeks).
**B2 — Runbook offer-and-save** (gated on pilot demand):
- "Save this resolution as a flow?" prompt at session end, with auto-drafted
runbook from the structured session state.
- Effort: S (~1 week). Don't build until at least 2 pilot customers explicitly
ask for it.
- **Risk:** Medium. The structured-output panel quality is the whole demo. If it
looks dumb, the demo dies.
- **Reuses:** FlowPilot, unified_chat_service, FlowProposal, ConnectWise PSA
integration.
### Approach C: Senior-Tech Time-Saved Counter
Continuous measurement layer underneath A and B. Every session contributes an
estimated minutes-saved number. Owner-facing dashboard quotes "this month your
shop saved N hours of senior-tech time." Sells to MSP owner with verifiable ROI.
- **Effort:** S (~1 week + ongoing measurement methodology refinement).
- **Risk:** Medium-low. Methodology has to be defensible. If numbers look
made-up, trust dies fast.
- **Reuses:** FlowPilot telemetry, session metadata, account-scoped analytics.
## Recommended Approach
**A first (1-2 weeks), then B (3-4 weeks after A ships), with C running underneath
both as a continuous backdrop.**
Sequence rationale:
- **A is the sharpest possible 2-minute demo.** Single feature, single metric,
buyer-verifiable in their own data. Get it in front of 5 MSPs in week 3.
- **B is the depth play.** Once Approach A has produced first-pilot signal,
Approach B's full structured-resolution loop becomes the "what we ship next" that
retains pilots and converts them to paid.
- **C compounds across both.** Every session under A or B contributes to the
time-saved counter. By week 6 there are real numbers to put in front of an MSP
owner — turning a senior-tech-led pilot into an owner-signed contract.
This sequence is non-negotiable. Building B before A is the classic pre-PMF trap of
perfecting product before validating GTM. Building C alone is measurement without a
demo to anchor it.
## Pricing
**Pilot pricing (first 3-5 customers): $39/seat/month, monthly billing,
month-to-month.** Anchored against IT Glue (~$29/tech), Hudu (~$25/tech),
Liongard (~$3/endpoint). The premium over IT Glue/Hudu reflects the active-session
value (vs. their static-runbook value): roughly 35-55% above the runbook-only category.
Customer #6+ pricing is an Open Question (revisit after 3 pilots produce real
hours-saved data; price up if the per-seat ROI is over $200/seat/mo).
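
The premium over each per-tech anchor follows directly from the approximate prices quoted above. A short sketch (variable names are illustrative; Liongard is left out because it is priced per endpoint, not per seat):

```python
PILOT_PRICE = 39.0  # $/seat/month, from the pilot pricing above

# Approximate per-tech anchor prices quoted in the text.
anchors = {"IT Glue": 29.0, "Hudu": 25.0}

# Percentage premium of the pilot price over each anchor, rounded to whole percent.
premiums = {
    name: round((PILOT_PRICE / price - 1) * 100)
    for name, price in anchors.items()
}

for name, pct in premiums.items():
    print(f"{name}: +{pct}%")
```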
## Open Questions
1. **Free-tier shape.** Should the time-saved counter be free forever as a
distribution lever, with paid for the structuring + escalation? Land-and-expand
pattern. Decide after 3 pilot conversions.
2. **PSA-marketplace timing.** ConnectWise Marketplace listing requires partnership
onboarding (~6-week cycle). Submit application week 5; expect listing live by
week 11. Don't gate launch on it.
3. **Customer #6+ pricing.** Revisit after 3 pilot customers produce verifiable
hours-saved numbers.
## Deferred (YAGNI until 10 paying customers)
- HIPAA / SOC2 audit positioning. Pre-PMF is too early; revisit when a regulated-
vertical MSP asks for it explicitly.
- Multi-PSA depth (Autotask, Halo). ConnectWise alone covers ~40% of the SMB MSP
market and is sufficient for first 5-10 customers.
- Cross-tenant pattern detection. The data-flywheel-across-shops play is at least
6 months out; building it before single-shop ROI is proven is premature.
## Success Criteria (revised for realism)
- **Week 3:** Approach A shipped. 3 MSPs in active free pilot (cap at 3 to
preserve B1 build capacity).
- **Weeks 3-6:** Pilot management dominates. B1 build is paused; founder runs
pilot calls, captures bug reports, iterates UX. Stripe seat-based billing is
set up in week 5.
- **Week 6:** First verbal commit from a pilot customer. Verified
minutes-recovered-per-escalation number from at least 2 pilots.
- **Week 8:** First paid customer (procurement cycles run 4-6 weeks even at small
MSPs; 2 weeks from verbal commit to signed contract is realistic). Time-saved
counter (Approach C) producing dashboard-quality data.
- **Week 11:** B1 (side panel + PSA notes) shipped. 3-5 paying customers. First
MSP-owner-led conversation. ConnectWise Marketplace listing live.
- **Quarter end:** $5K MRR or 10 paying customers, whichever comes first. Loom
demos posted publicly to r/msp and MSPGeek.
## Distribution Plan (week-by-week cadence)
- **Week 3:** Escalation Mode demo Loom posted. r/msp launch post.
- **Week 4:** MSPGeek Discord AMA scheduled. RocketMSP newsletter pitch sent.
- **Week 5:** ConnectWise Marketplace listing application submitted. Stripe
billing live for paid conversion.
- **Week 6:** First "guest on Inside MSP podcast" outreach. Second r/msp post
(case study from a pilot, anonymized).
- **Week 7-8:** Pilot conversion calls. First paying customer.
- **Week 9-11:** B1 ships. Owner-targeted demo Loom. Second podcast outreach.
**Founder-led pilot:** The first 3-5 customers come from the founder's existing
MSP network. Treat them as design partners; expect to ship feature requests
weekly during pilot. Cap at 3 active pilots until B1 ships.
**Tech audience channels:** r/msp, r/sysadmin, MSPGeek Discord, RocketMSP
newsletter, Inside MSP podcast.
**Owner audience channels:** ConnectWise Marketplace, MSP-focused Substacks,
RIA Vendor Roundup.
CI/CD: existing Railway auto-deploy via GitHub mirror. No new pipeline needed.
## Dependencies
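- **Session-state serialization (originally the Approach A blocker).** Budgeted at
3-5 days for schema design + migration, but the 2026-04-27 eng review found it
already exists (`ai_session.escalation_package`, `SessionHandoff.snapshot`), so no
new schema work is needed for v1.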
- **Stripe seat-based billing (week 5 task).** No billing infrastructure exists
today. ~3-5 days of work for monthly subscriptions + invoice flow. Block on
this before week-8 first-paid milestone.
- **ConnectWise PSA integration depth.** Sufficient for ticket-notes auto-export
(Approach B1). Autotask and Halo wait until first 5 paying ConnectWise
customers.
- **Authentication.** Existing JWT + role hierarchy is sufficient for senior-tech
inbox view; no new auth work needed.
## Risks and Kill-Switch
- **Risk: Session-state serialization design churn.** If the schema needs to
change after pilot feedback, every saved session has to migrate. Mitigation:
keep schema versioned and forward-compatible from day 1.
- **Risk: Pilot-to-paid conversion slower than 4-6 weeks.** MSP procurement is
notoriously slow. Mitigation: get verbal commits in writing; price as
month-to-month with no annual contract to lower the buying friction.
- **Risk: ConnectWise ships an equivalent feature in their 2026.x release.**
Mitigation: lead the marketing on "we're independent of your PSA" — works with
any PSA, not just ConnectWise. The founder's PSA-agnostic FlowPilot is an
asset here.
- **Kill-switch criterion:** if 0 of 3 pilots produce a verifiable
hours-saved-per-week number above 1.0 by week 8, **revisit the wedge**. The
product may need to pivot to deterministic-ops territory (Read 1 from the
session) or be repositioned. Don't sink another quarter into the current GTM
story without this number.
## The Assignment
**This week, before any code:**
Time-track the next 5 escalations in your shop manually. For each, capture:
1. Time the senior tech opens the ticket
2. Time the senior tech takes their first diagnostic action (not counting the
verbal "tell me what you tried" warm-up)
3. The delta — that's the wasted time per escalation today
Average those 5 numbers. **That's the hero stat in your first sales conversation:**
"Senior techs at our shop wasted N minutes per escalation just getting up to
speed. We built the thing that takes that to zero."
Don't try to pull this from telemetry — the doc itself notes that retrieval/re-use
data isn't queryable yet. Manual stopwatch on the next 5 escalations is the
fastest path to a defensible number.
This is the assignment because it forces the GTM story into the same time-zone as
the build, and it's a one-day effort that compounds for every conversation
afterward.
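
The stopwatch average, and the savings claim it later feeds, can be worked through as follows. The timestamps and the in-product figure are made-up placeholders for illustration, not measurements; only the method (average of T1 minus T0, then baseline minus in-product) comes from the plan.

```python
from datetime import datetime
from statistics import mean


def minutes_between(t0: datetime, t1: datetime) -> float:
    """Elapsed minutes from ticket open (T0) to first diagnostic action (T1)."""
    return (t1 - t0).total_seconds() / 60


# Placeholder stopwatch log for 5 escalations: (T0 open ticket, T1 first diagnostic action).
samples = [
    (datetime(2026, 4, 27, 9, 0), datetime(2026, 4, 27, 9, 4)),
    (datetime(2026, 4, 27, 10, 30), datetime(2026, 4, 27, 10, 36)),
    (datetime(2026, 4, 27, 13, 15), datetime(2026, 4, 27, 13, 20)),
    (datetime(2026, 4, 27, 14, 0), datetime(2026, 4, 27, 14, 5)),
    (datetime(2026, 4, 27, 16, 45), datetime(2026, 4, 27, 16, 50)),
]

manual_baseline = mean(minutes_between(t0, t1) for t0, t1 in samples)

# Placeholder in-product time-to-first-action (minutes), available from telemetry in week 3+.
in_product = 1.5

# Savings claim = manual baseline minus in-product metric; quote both numbers, not just the delta.
savings = manual_baseline - in_product

print(f"baseline {manual_baseline:.1f} min, in-product {in_product:.1f} min, saved {savings:.1f} min")
```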
## What I noticed about how you think
- You contradicted my framing twice in the same session and the second
contradiction was sharper than the first. Most founders agree with the
diagnostic and walk out with a polished version of what they came in with. You
said "I'm just questioning if flows are even the way to go" — and that
sentence reset the entire wedge. That's craft.
- "The senior engineer looking over your shoulder" came out of you spontaneously,
not as a prepared pitch. That's the line. Use it. It survives because it's
emotional truth (every junior tech has had this, every senior tech has been
this), not constructed marketing copy.
- You're solving your own problem with your own time. 20 hrs/week isn't a
hypothetical user pain — it's your Tuesday. Founders who solve their own pain
ship sharper products because the feedback loop is instant.
- The escalation feature emerged from your description, not mine. I was busy
cataloging documentation pains. You said "junior to senior escalation? no
worries there either" almost as an afterthought. That afterthought is the wedge.
Pay attention to which features you describe casually versus which you push hard
on — the casual ones are sometimes where the truth lives.
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 0 | — | not run |
| Codex Review | `/codex review` | Independent 2nd opinion | 1 | INFO | 12 findings, 6 applied, 1 partial, 5 rejected |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR (PLAN) | 2 issues, 0 critical gaps, scope reduced |
| Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR (FULL) | score 6/10 → 9/10, 8 decisions |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | not run |
- **CODEX:** 12 findings reviewed. Applied: 2-metric framing (#2), notification routing spec (#3), per-channel delivery model (#4), unread-state fix (#11), Start-here CTA reframe (#9), claim role gate moved in-scope (#8). Rejected: full scope reduction to PSA-brief-only (#6/7/12 — user kept queue UI as demo hero). Partial: scope concern (#5) acknowledged in eng review's email-first/polling-fallback. Misread: #1, #10.
- **CROSS-MODEL:** Claude (eng + design reviews) and Codex agree on 6/12 findings. The major disagreement was scope — Codex argued for cutting the queue UI, user rejected. Both agree on metric definition, notification routing, claim auth gating.
- **UNRESOLVED:** 0
- **VERDICT:** ENG + DESIGN CLEARED, CODEX REVIEWED — ready to implement.


@@ -0,0 +1,33 @@
# Test Plan
Generated by /plan-eng-review on 2026-04-27
Branch: main
Repo: chihlasm/resolutionflow
## Affected Pages/Routes
- `/escalations` ([EscalationQueuePage.tsx](frontend/src/pages/EscalationQueuePage.tsx)) — senior-tech inbox view; verify queue list, real-time arrival, click-through
- `/pilot/:session_id` (FlowPilotSessionPage) — verify post-claim load shows full escalation context (snapshot, ai_assessment, escalation_package)
- `GET /api/v1/analytics/escalation-metrics` (NEW) — verify hero metric calculation, account-scoping, role gate
## Key Interactions to Verify
- Junior tech clicks **Escalate** in active FlowPilot session → handoff is created → notification fires → senior sees escalation in queue within 30 seconds
- Senior tech clicks **Claim** in queue → session reactivates → senior is redirected into FlowPilot session view → ai_assessment + snapshot are visible
- Senior types first message in chat after claim → metric query starts attributing time-to-first-action
- MSP owner opens analytics page → "minutes recovered per escalation" widget shows the rolling-30-day average
## Edge Cases
- **Two seniors race to claim** the same handoff → one wins, the other gets an "Already claimed by [name]" message
- **Senior is offline** when escalation fires → email arrives via existing `EmailService.send_notification_email`
- **WebSocket disconnects mid-session** → frontend reconnects; missed events backfilled by re-fetching the queue
- **Notification dispatch raises** (SMTP down, WebSocket fanout fails) → handoff is still created (graceful degradation)
- **Senior takes non-chat action first** (e.g., posts directly to PSA) → metric falls back to PSA writeback timestamp or remains null; document the chosen behavior
- **Account-scoped multi-tenancy** → senior at MSP A cannot see escalations from MSP B (Phase 4 RLS)
- **Role gate on metric endpoint** → only `engineer_or_admin` can hit `/escalation-metrics`
## Critical Paths
1. **Magic-moment demo flow** (the entire Loom): junior escalate → senior notification → senior claim → session view → first action recorded → metric updates
2. **Email fallback** when senior is offline — must not silently drop
3. **Regression: handoff creation succeeds even if notification dispatch raises** — graceful degradation is mandatory
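
The regression in critical path 3 can be sketched as a unit test. This is a shape for the test, not the real one: `create_handoff` below is a hypothetical stand-in that takes the notifier as an argument, whereas the actual `handoff_manager.create_handoff` signature is not shown in this plan.

```python
import unittest


def create_handoff(session_id: int, notify) -> dict:
    """Hypothetical stand-in: build the handoff first, then notify on a best-effort basis."""
    handoff = {"session_id": session_id, "status": "pending"}
    try:
        notify(handoff)
    except Exception as exc:
        # Record the failure instead of propagating it; the handoff must survive.
        handoff["notification_error"] = repr(exc)
    return handoff


class GracefulDegradationTest(unittest.TestCase):
    def test_handoff_created_when_notification_raises(self):
        def failing_notify(_handoff):
            raise ConnectionError("SMTP down")

        handoff = create_handoff(42, failing_notify)

        self.assertEqual(handoff["session_id"], 42)
        self.assertEqual(handoff["status"], "pending")
        self.assertIn("notification_error", handoff)

    def test_no_error_recorded_when_notification_succeeds(self):
        handoff = create_handoff(42, lambda _handoff: None)
        self.assertNotIn("notification_error", handoff)
```

Injecting the notifier keeps the test independent of SMTP or WebSocket infrastructure.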