docs(plans): add escalation-mode wedge design + test plan

Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI specs, and embedded review report for the Escalation Mode wedge — output of /office-hours, /plan-eng-review, /plan-design-review, and /codex review. Codex review surfaced two corrections we applied: - two-metric framing (manual baseline vs in-product time-to-first-action) - claim role gate moved in-scope (was deferred TODO) TODO updates: peer-tech escalation + claim role gate captured (the latter then moved in-scope by the codex pass). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:18:46 -04:00
parent c0ed6d9840
commit d51e95cdfa
3 changed files with 533 additions and 0 deletions
--- a/.ai/TODO.md
+++ b/.ai/TODO.md
@@ -15,3 +15,9 @@
 - [ ] **Per-test transactional rollback in `test_db` fixture.** Bigger engineering than xdist (which we already shipped). Instead of `DROP SCHEMA public CASCADE` per test, wrap each test in a savepoint and rollback at teardown. ~30-40% additional speedup on top of xdist for test-DB-heavy tests. Real refactor; only worth it if the suite gets significantly larger or runs more frequently.
 - [ ] **Consider `pytest-testmon` for PR-time test selection.** Tracks which tests touched which source files and only re-runs affected ones. Best for small PRs touching ~few files. Adds cache-invalidation complexity; only worth it if the suite stays painfully long even after xdist.
 - [ ] **AssistantChatPage `currentChatRef` guard is a silent return** — `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview` all bail with `if (currentChatRef.current !== sentForChatId) return` when stale. This is by design for chat switching, but it also silently masked the prefill-ref bug fixed in PR #153 — the user just saw "no AI response" with no log, no toast, no Sentry event. Either (a) log a `console.warn`/Sentry breadcrumb on the mismatch path so future drift is visible, or (b) split "expected stale" (chat switch) from "unexpected stale" (ref never updated) so only the latter alerts. Pair with an audit of every `currentChatRef.current = ...` assignment vs every `setActiveChatId(...)` call to make sure they're paired everywhere.
+
+- [ ] **Allow peer-tech to escalate a colleague's session.** Today `POST /ai-sessions/{session_id}/handoff` in [endpoints/session_handoffs.py:48](backend/app/api/endpoints/session_handoffs.py#L48) filters by `AISession.user_id == current_user.id`, so only the session owner can escalate. Real MSP shops have peer hand-offs: Junior A is on lunch, Junior B sees the session is stuck and should be able to escalate it. Auth tweak: switch from session-owner check to `require_engineer_or_admin` + same-account scope. Add a `handed_off_by` audit column (already exists on `SessionHandoff`) so the original-owner-vs-actual-escalator distinction is preserved. Surfaced from /plan-eng-review on the Escalation-Mode wedge plan; v1 wedge demo doesn't need this (solo-founder pilot), but capture for v2 once 3+ pilots are live and a peer-claim need surfaces.
+
+- [ ] **Mobile/responsive design for EscalationQueue + handoff-context screen.** Pre-PMF wedge demo targets desktop only — MSP techs work on laptops/desktops in shop environments. Once 3+ paying customers exist and a tech requests mobile (likely on-call use case), spec the responsive behavior: stacked card layout below `sm:` breakpoint, full-bleed handoff-context overlay on mobile, swipe-to-claim gesture instead of Pick Up button. Surfaced from /plan-design-review on the Escalation-Mode wedge plan.
+
+- [ ] **(MOVED IN-SCOPE for Escalation Mode v1, 2026-04-27)** ~~Add role gate to handoff claim endpoint.~~ Codex review correctly flagged this as wedge-relevant (the race-condition story depends on auth gating). Now part of the Escalation Mode v1 build, not a deferred TODO.
--- a/docs/plans/2026-04-27-escalation-mode-wedge-design.md
+++ b/docs/plans/2026-04-27-escalation-mode-wedge-design.md
@@ -0,0 +1,494 @@
+# Design: ResolutionFlow GTM — Escalation-Mode-First Wedge
+
+Generated by /office-hours on 2026-04-26
+Branch: main
+Repo: chihlasm/resolutionflow
+Status: APPROVED
+Mode: Startup
+
+## Problem Statement
+
+ResolutionFlow is a multi-tenant SaaS troubleshooting platform for MSPs, currently
+in Go-to-Market Validation (pre-PMF). The backend is feature-complete (55+ endpoints,
+100+ tests, FlowPilot telemetry baseline accruing). The product has users but no
+paying customers.
+
+The blocker is not engineering completeness. The blocker is the absence of a sharp
+GTM story tied to a number a buyer can verify. The session reframed the wedge twice
+before landing on the real one.
+
+**What ResolutionFlow actually is:** the structuring layer between conversational AI
+and the way MSP techs work tickets. AI is great at producing answers; it is bad at
+producing workflow-shaped output. ResolutionFlow gives the tech the AI they already
+trust (Claude/GPT) but organizes the output into actionable structured steps,
+records the session, captures customer-specific context, and turns the result into
+PSA-formatted ticket notes — and optionally a runbook — without the tech writing
+anything.
+
+**Positioning line:** "the senior engineer looking over your shoulder."
+
+## Demand Evidence
+
+The founder is the first user. Senior Systems Engineer at an MSP, losing ~20
+hours/week to cross-domain interruptions (systems engineer pulled into networking
+problems and vice versa). At least 4 interruptions per day, with the time cost
+concentrated in the gap between AI-conversation output and MSP-ticket workflow.
+
+This is solving-your-own-problem demand evidence — strongest possible signal at
+this stage. The 20 hrs/week figure is the founder's own time, not a hypothetical.
+Every MSP shop with a senior tech and a junior tech has a version of this problem.
+
+Telemetry signal (Phase 0.5 baseline accruing): captured flows pile up but are not
+being re-used. This says capture works, retrieval doesn't — which means the
+"hours-saved-via-re-use" number isn't yet generatable from existing data. The
+GTM-grade ROI story needs a different metric until re-use lands: minutes recovered
+per escalation, generated by Approach A below.
+
+## Status Quo
+
+MSP techs today resolve tickets via three workarounds:
+
+1. **AI in a tab.** Junior tech opens Claude or ChatGPT, pastes the problem, gets a
+   wall of prose, parses it into action items in their head, executes, repeats. AI
+   does the diagnostic work. The tech does all the structure-extraction and
+   ticket-note-writing afterward.
+
+2. **Tribal knowledge.** Junior tech pings senior in Slack. Senior tech is
+   interrupted (4+ times/day per the founder's own data). Context handoff is verbal
+   and lossy.
+
+3. **Stale runbooks.** Half-maintained Notion / IT Glue / SharePoint pages that
+   nobody trusts because they're 18 months out of date and don't match the current
+   customer environment.
+
+The cost of these workarounds for the founder personally: ~20 hours per week of
+senior-tech time lost. For a 5-tech MSP, the equivalent is 1 full FTE worth of
+senior-engineer hours leaking into context-switching and tab-hopping.
+
+## Target User & Narrowest Wedge
+
+**Target user:** Senior Systems Engineer at a small-to-mid MSP (5-20 techs). The
+founder is exemplar #1. Buying authority is shared between senior tech (champion)
+and MSP owner (signs the check).
+
+**Narrowest paid wedge:** Escalation Mode. Single sharp feature. When a junior tech
+escalates a ticket they were working in FlowPilot, the senior tech opens the ticket
+and sees the entire structured session state — every step the junior tried, every
+dead end, every command output — instead of starting with "tell me what you tried"
+for five minutes.
+
+Why this is the wedge:
+
+- **Two metrics, not one** (revised after /codex review 2026-04-27):
+  - **Manual baseline** (the Assignment, weeks 0-2): senior tech stopwatches the
+    next 5 escalations. T1 (first diagnostic action) − T0 (open ticket) under
+    today's verbal-handoff workflow. This is the "what you currently lose" number.
+  - **In-product metric** (telemetry, week 3+): time-to-first-action after claim,
+    derived from `ai_session_step` rows where `created_at > SessionHandoff.claimed_at`
+    AND `user_id = SessionHandoff.claimed_by`. This is the "what it is now with
+    structured handoff" number.
+  - **The savings claim** = manual baseline − in-product metric. Quote both
+    explicitly in pilot conversations. Do NOT roll the in-product number alone
+    into "minutes recovered" — that's an apples-to-oranges miscount Codex caught
+    in the cross-model review.
+- **Single-feature demo:** a 2-minute Loom shows the magic moment — junior hits
+  escalate, senior window opens with full structured context. No theory required.
+- **Cross-buyer story:** sells to senior tech (less interruption) AND owner (junior
+  techs resolve faster, take more accounts).
+- **Hours-saved math is simple:** 4-5 minutes per escalation × 15-30 escalations
+  per week per senior tech = 1-2 hours/week recovered per senior. At $80-150/hr
+  fully-loaded senior tech cost, the tool pays for itself with one customer.
+
+## Constraints
+
+- **One-founder shop.** Cannot run three concurrent product narratives. Sequence
+  matters more than scope.
+- **Pre-PMF runway implied.** 4-8 week build cycles before talking to a buyer are
+  expensive. Approach A's 1-2 week timeline is the binding constraint.
+- **Existing architecture is mostly aligned.** FlowPilot, unified_chat_service,
+  FlowProposal, ConnectWise PSA integration — most of the pieces exist. Risk is
+  positioning and UX, not capability.
+- **PSA copilot competition is real.** ConnectWise / Autotask / Halo are racing to
+  ship AI features. The wedge has to be sharp because we lose on distribution.
+
+## Premises
+
+The five load-bearing claims this design rests on, all confirmed in session:
+
+1. **Diagnostic AI is commoditized.** ResolutionFlow does not compete on
+   "AI solves the ticket faster." That race is over. ChatGPT/Claude already won.
+2. **The structuring layer is the wedge.** AI conversational output is too dense
+   and unstructured for active troubleshooting. ResolutionFlow's value is
+   organizing that output into actionable, separable, recorded steps.
+3. **Escalation context is the killer feature.** "Junior hits escalate, senior gets
+   full structured context in 30 seconds instead of 5 minutes" is the sharpest
+   demoable moment in the entire product surface.
+4. **First paying customer is bottom-up, prosumer-flavored.** Senior tech at a
+   small MSP, $20-50/seat/month, monthly billing. Owner-targeted enterprise
+   pricing waits until 5+ paying shops establish baseline ROI numbers.
+5. **Distribution is MSP communities, not paid SaaS ads.** r/msp, MSPGeek, RocketMSP,
+   PSA marketplace listings. The channel matches the buyer.
+
+## Approaches Considered
+
+### Approach A: Escalation Mode first (REDUCED SCOPE per /plan-eng-review)
+
+Lead the GTM with the killer feature. Polish the escalate-with-context handoff:
+junior tech mid-session hits escalate, senior tech window opens with full
+structured session state. 2-min demo Loom. Pilot with **3 MSPs** in the founder's
+network (capped at 3 to preserve build capacity for B). Metric: minutes recovered
+per escalation.
+
+**SCOPE REDUCTION (2026-04-27 eng review):** ~80% of Approach A is already built.
+The original 2-3 week estimate assumed greenfield. Codebase audit confirms:
+
+| What the doc said "build" | What actually exists |
+|---|---|
+| Session-state serialization | `ai_session.escalation_package` (JSONB), `SessionHandoff.snapshot` |
+| Senior-tech inbox | [EscalationQueuePage.tsx](frontend/src/pages/EscalationQueuePage.tsx) + [EscalationQueue.tsx](frontend/src/components/flowpilot/EscalationQueue.tsx) |
+| Claim workflow | [handoff_manager.py:123 claim_session()](backend/app/services/handoff_manager.py#L123) |
+| API surface | [session_handoffs.py](backend/app/api/endpoints/session_handoffs.py) — POST /handoff, /claim, GET queue |
+| AI assessment for senior | `_generate_ai_assessment()` in handoff_manager |
+| PSA round-trip | `escalation_package_markdown`, `escalation_package_external_id` |
+
+**Real engineering scope (~6-9 days):**
+
+1. **Notification dual-path** (4-5 days). `notification_sent` flag is a dead column —
+   never written. Wire two channels in `handoff_manager.create_handoff`:
+   - **Email** (existing `EmailService.send_notification_email`) — handles offline seniors.
+   - **WebSocket / SSE push** to the EscalationQueue for live demo magic moment.
+   - Set `notification_sent=true` after dispatch confirmation.
+   - Graceful degradation: handoff still created if notification raises (regression test required).
+
+2. **Hero metric endpoint** (~2 hours). New `GET /api/v1/analytics/escalation-metrics`,
+   account-scoped, role-gated to `require_engineer_or_admin`. Computes
+   *minutes recovered per escalation* by querying:
+   ```
+   ai_session_step.created_at (first row by senior_tech_user_id where created_at > SessionHandoff.claimed_at)
+   minus
+   SessionHandoff.claimed_at
+   ```
+   Returns a rolling-30-day average per account. No schema change.
+
+3. **UX polish on EscalationQueue + receiving-engineer view** (2-3 days). Confirm the
+   magic-moment screen lands when senior clicks claim. Add an unread indicator on
+   the queue. Wire optimistic insert when SSE event arrives.
+
+4. **Loom + landing page copy** (1-2 days). Non-engineering. Outside this plan's scope
+   but required for the GTM in week 3.
+
+**Test plan:** 100% coverage of new paths — 13 tests including 4 e2e and 1 regression
+(graceful-degradation when notification dispatch raises). Test plan artifact at
+`~/.gstack/projects/chihlasm-resolutionflow/abc-main-eng-review-test-plan-20260427-000000.md`.
+
+**Risk:** Low. Single feature, single metric, architecture-aligned. The dual-path
+notification is the only mildly novel surface; both halves use existing infra.
+
+**Reuses:** `services/handoff_manager.py`, `services/escalation_package_generator.py`,
+`models/session_handoff.py`, `models/ai_session.py`, `services/notification_service.py`,
+`models/notification_log.py`, EmailService, EscalationQueuePage + EscalationQueue.
+
+### UI Specifications (locked by /plan-design-review 2026-04-27)
+
+**Magic-moment screen** (new, after Pick Up click): dedicated handoff-context view that
+loads BEFORE the regular FlowPilot session view, then dissolves on first senior action.
+Four sections, single frame:
+
+1. **Problem summary** (top, 2-3 lines): junior's framing. Bricolage Grotesque h2.
+2. **What's been tried** (left or middle column): structured list of `dead_ends_flagged[]`
+   and `steps_attempted[]` from `escalation_package` JSONB. Card-flat surface, IBM Plex.
+3. **AI assessment** (right column): `ai_assessment_data` rendered as 3 fields —
+   `likely_cause`, `suggested_steps[]`, `confidence`. accent-dim badge for confidence.
+4. **Start here** (primary CTA, electric-blue, ≥44px touch target): opens FlowPilot
+   session at the most-likely-next-step. Senior typing or clicking anywhere triggers
+   200ms fade-out and FlowPilot view fades in. Re-openable via "Show handoff context"
+   ghost button in FlowPilot toolbar.
+
+**Hero metric ("minutes recovered per escalation"):** lives in TWO places:
+- **Queue stat-card** (above EscalationQueue list on /escalations): compact, "X.X hrs
+  saved this month" + "click for details" affordance. Refreshes on queue load.
+- **Dedicated `/analytics/escalations` page** (owner-facing): trend chart (4-week
+  rolling), per-tech breakdown, per-problem-domain segmentation. Engineer-or-admin
+  role-gated.
+
+**Real-time arrival visual** (when WebSocket pushes a new escalation):
+- New card slides in from above the list, 200ms ease-out CSS transition.
+- Browser tab title prefixes with " (1) " / " (N) " when tab is backgrounded; clears
+  on focus.
+- No sound. MUST respect `prefers-reduced-motion: reduce` (slide-in collapses to
+  instant fade-in).
+
+**Unread state:** subtle 6px dot in top-right corner of card for escalations the
+current senior has never opened. Dot fades on first hover or click.
+
+**Race-condition (two seniors click Pick Up simultaneously):** loser sees a toast
+"Already claimed by [name] 2s ago" via existing `@/lib/toast`; the card flashes the
+winner's name in the meta row for 1s, then dissolves from the loser's view via
+optimistic update + WebSocket reconciliation.
+
+**Unread state (Codex correction 2026-04-27):** dot indicator clears on **open,
+claim, or explicit dismiss** — NOT on hover. Hover-to-clear is a bad proxy for
+acknowledgment because incidental mouse movement creates false clears.
+
+**Notification routing (Codex finding 2026-04-27):** v1 fans out the email + push
+to **all engineer-or-admin role users in the same account_id as the SessionHandoff**.
+No on-call/round-robin logic in v1. If pilots ask for routing, capture as v2 TODO.
+The first senior to claim wins; everyone else's notification self-resolves on
+WebSocket reconciliation.
+
+**Notification delivery model (Codex correction 2026-04-27):** drop the
+`notification_sent: bool` flag from v1. Replace with per-channel delivery rows
+in a new `notification_log` table (already exists — reuse, don't add a new model)
+keyed by `(handoff_id, channel, recipient_user_id, status)` where status ∈
+{queued, sent, failed, suppressed}. This makes partial-success and per-channel
+retry visible. If the existing `notification_log` schema doesn't match, defer
+the per-channel persistence to a v2 TODO and v1 logs delivery attempts to the
+existing telemetry stream instead. Do NOT keep the dead boolean.
+
+**"Start here" CTA (Codex correction 2026-04-27):** opens the FlowPilot session
+at the **latest known state** (the AI's most recent agent_message + the current
+pending_task_lane). Surface `ai_assessment_data.suggested_steps[]` as a list of
+chips below the chat input — clicking a chip prefills the input. Do NOT invent a
+"jump to most-likely-next-step" capability that doesn't exist in the session model.
+
+**`/claim` role gate (Codex correction 2026-04-27, IN-SCOPE for v1):** add
+`require_engineer_or_admin` dep on POST `/handoffs/{id}/claim`. Originally
+deferred to TODO during eng review; Codex correctly flagged it as wedge-relevant
+because the race-condition story depends on auth gating. ~30 min change. Removed
+from TODO.md.
+
+**A11y requirements (mandatory before pilot ship):**
+- Keyboard: Tab order through queue cards; Enter on focused card opens it; Pick Up
+  button is a reachable target; Esc closes the handoff-context overlay.
+- ARIA: `role="region"` + `aria-live="polite"` on the queue list (announces arrivals);
+  `aria-label="N escalations awaiting pickup"` on the heading; the slide-in animation
+  must not announce twice (debounce live-region updates).
+- Pick Up button: bump from `py-2` to `py-2.5` to clear the 44px touch-target floor.
+- Color contrast: confidence-badge text on accent-dim background must be ≥4.5:1
+  (verify against DESIGN-SYSTEM.md tokens).
+
+**DS token discipline:** every new piece must use `card-flat`, `accent-dim`/`accent-text`,
+`text-muted-foreground`, `bg-card`/`bg-elevated`, IBM Plex / Bricolage / JetBrains,
+explicit `transition` property lists (never `transition: all`). No glass, no blur,
+no gradient surfaces. Electric-blue accent reserved for interactive elements only.
+
+**Mobile responsive:** deferred to post-pilot TODO. Pre-PMF wedge target is desktop;
+MSP techs work on laptops/desktops in shop environments.
+
+**Deferred to TODO.md (out of scope for v1 wedge):**
+- Peer-tech escalates colleague's session (currently session-owner-only)
+- Role gate on POST /claim (currently any authenticated user in account)
+
+### Approach B: Full Structured Resolution loop (split B1 + B2)
+
+End-to-end demo: tech opens FlowPilot, structure appears in side panel as AI
+responds, ticket notes auto-populate at end, optional runbook capture for reusable
+patterns. Tells the full "senior engineer over your shoulder" story.
+
+**B1 — Side panel + PSA-formatted ticket notes** (ships first):
+- Structured side panel that surfaces parsed AI markers as live actionable steps
+  while the conversation runs.
+- PSA-formatted ticket-notes exporter (ConnectWise first; Autotask/Halo later).
+- Effort: M (~3 weeks).
+
+**B2 — Runbook offer-and-save** (gated on pilot demand):
+- "Save this resolution as a flow?" prompt at session end, with auto-drafted
+  runbook from the structured session state.
+- Effort: S (~1 week). Don't build until at least 2 pilot customers explicitly
+  ask for it.
+
+- **Risk:** Medium. The structured-output panel quality is the whole demo. If it
+  looks dumb, the demo dies.
+- **Reuses:** FlowPilot, unified_chat_service, FlowProposal, ConnectWise PSA
+  integration.
+
+### Approach C: Senior-Tech Time-Saved Counter
+
+Continuous measurement layer underneath A and B. Every session contributes an
+estimated minutes-saved number. Owner-facing dashboard quotes "this month your
+shop saved N hours of senior-tech time." Sells to MSP owner with verifiable ROI.
+
+- **Effort:** S (~1 week + ongoing measurement methodology refinement).
+- **Risk:** Medium-low. Methodology has to be defensible. If numbers look
+  made-up, trust dies fast.
+- **Reuses:** FlowPilot telemetry, session metadata, account-scoped analytics.
+
+## Recommended Approach
+
+**A first (1-2 weeks), then B (3-4 weeks after A ships), with C running underneath
+both as a continuous backdrop.**
+
+Sequence rationale:
+
+- **A is the sharpest possible 2-minute demo.** Single feature, single metric,
+  buyer-verifiable in their own data. Get it in front of 5 MSPs in week 3.
+- **B is the depth play.** Once Approach A has produced first-pilot signal,
+  Approach B's full structured-resolution loop becomes the "what we ship next" that
+  retains pilots and converts them to paid.
+- **C compounds across both.** Every session under A or B contributes to the
+  time-saved counter. By week 6 there are real numbers to put in front of an MSP
+  owner — turning a senior-tech-led pilot into an owner-signed contract.
+
+This sequence is non-negotiable. Building B before A is the classic pre-PMF trap of
+perfecting product before validating GTM. Building C alone is measurement without a
+demo to anchor it.
+
+## Pricing
+
+**Pilot pricing (first 3-5 customers): $39/seat/month, monthly billing,
+month-to-month.** Anchored against IT Glue (~$29/tech), Hudu (~$25/tech),
+Liongard (~$3/endpoint). The premium over IT Glue/Hudu reflects the active-session
+value (vs. their static-runbook value) — 30% above the runbook-only category.
+
+Customer #6+ pricing is an Open Question (revisit after 3 pilots produce real
+hours-saved data; price up if the per-seat ROI is over $200/seat/mo).
+
+## Open Questions
+
+1. **Free-tier shape.** Should the time-saved counter be free forever as a
+   distribution lever, with paid for the structuring + escalation? Land-and-expand
+   pattern. Decide after 3 pilot conversions.
+2. **PSA-marketplace timing.** ConnectWise Marketplace listing requires partnership
+   onboarding (~6-week cycle). Submit application week 5; expect listing live by
+   week 11. Don't gate launch on it.
+3. **Customer #6+ pricing.** Revisit after 3 pilot customers produce verifiable
+   hours-saved numbers.
+
+## Deferred (YAGNI until 10 paying customers)
+
+- HIPAA / SOC2 audit positioning. Pre-PMF is too early; revisit when a regulated-
+  vertical MSP asks for it explicitly.
+- Multi-PSA depth (Autotask, Halo). ConnectWise alone covers ~40% of the SMB MSP
+  market and is sufficient for first 5-10 customers.
+- Cross-tenant pattern detection. The data-flywheel-across-shops play is at least
+  6 months out; building it before single-shop ROI is proven is premature.
+
+## Success Criteria (revised for realism)
+
+- **Week 3:** Approach A shipped. 3 MSPs in active free pilot (cap at 3 to
+  preserve B1 build capacity).
+- **Weeks 3-6:** Pilot management dominates. B1 build is paused; founder runs
+  pilot calls, captures bug reports, iterates UX. Stripe seat-based billing is
+  set up in week 5.
+- **Week 6:** First verbal commit from a pilot customer. Verified
+  minutes-recovered-per-escalation number from at least 2 pilots.
+- **Week 8:** First paid customer (procurement cycles run 4-6 weeks even at small
+  MSPs; 2 weeks from verbal commit to signed contract is realistic). Time-saved
+  counter (Approach C) producing dashboard-quality data.
+- **Week 11:** B1 (side panel + PSA notes) shipped. 3-5 paying customers. First
+  MSP-owner-led conversation. ConnectWise Marketplace listing live.
+- **Quarter end:** $5K MRR or 10 paying customers, whichever comes first. Loom
+  demos posted publicly to r/msp and MSPGeek.
+
+## Distribution Plan (week-by-week cadence)
+
+- **Week 3:** Escalation Mode demo Loom posted. r/msp launch post.
+- **Week 4:** MSPGeek Discord AMA scheduled. RocketMSP newsletter pitch sent.
+- **Week 5:** ConnectWise Marketplace listing application submitted. Stripe
+  billing live for paid conversion.
+- **Week 6:** First "guest on Inside MSP podcast" outreach. Second r/msp post
+  (case study from a pilot, anonymized).
+- **Week 7-8:** Pilot conversion calls. First paying customer.
+- **Week 9-11:** B1 ships. Owner-targeted demo Loom. Second podcast outreach.
+
+**Founder-led pilot:** The first 3-5 customers come from the founder's existing
+MSP network. Treat them as design partners; expect to ship feature requests
+weekly during pilot. Cap at 3 active pilots until B1 ships.
+
+**Tech audience channels:** r/msp, r/sysadmin, MSPGeek Discord, RocketMSP
+newsletter, Inside MSP podcast.
+**Owner audience channels:** ConnectWise Marketplace, MSP-focused Substacks,
+RIA Vendor Roundup.
+
+CI/CD: existing Railway auto-deploy via GitHub mirror. No new pipeline needed.
+
+## Dependencies
+
+- **Session-state serialization (Approach A blocker).** Schema design + migration
+  is the longest-lead engineering task. 3-5 days budget. Do this first.
+- **Stripe seat-based billing (week 5 task).** No billing infrastructure exists
+  today. ~3-5 days of work for monthly subscriptions + invoice flow. Block on
+  this before week-8 first-paid milestone.
+- **ConnectWise PSA integration depth.** Sufficient for ticket-notes auto-export
+  (Approach B1). Autotask and Halo wait until first 5 paying ConnectWise
+  customers.
+- **Authentication.** Existing JWT + role hierarchy is sufficient for senior-tech
+  inbox view; no new auth work needed.
+
+## Risks and Kill-Switch
+
+- **Risk: Session-state serialization design churn.** If the schema needs to
+  change after pilot feedback, every saved session has to migrate. Mitigation:
+  keep schema versioned and forward-compatible from day 1.
+- **Risk: Pilot-to-paid conversion slower than 4-6 weeks.** MSP procurement is
+  notoriously slow. Mitigation: get verbal commits in writing; price as
+  month-to-month with no annual contract to lower the buying friction.
+- **Risk: ConnectWise ships an equivalent feature in their 2026.x release.**
+  Mitigation: lead the marketing on "we're independent of your PSA" — works with
+  any PSA, not just ConnectWise. The founder's PSA-agnostic FlowPilot is an
+  asset here.
+- **Kill-switch criterion:** if 0 of 3 pilots produce a verifiable
+  hours-saved-per-week number above 1.0 by week 8, **revisit the wedge**. The
+  product may need to pivot to deterministic-ops territory (Read 1 from the
+  session) or be repositioned. Don't sink another quarter into the current GTM
+  story without this number.
+
+## The Assignment
+
+**This week, before any code:**
+
+Time-track the next 5 escalations in your shop manually. For each, capture:
+1. Time the senior tech opens the ticket
+2. Time the senior tech takes their first diagnostic action (not counting the
+   verbal "tell me what you tried" warm-up)
+3. The delta — that's the wasted time per escalation today
+
+Average those 5 numbers. **That's the hero stat in your first sales conversation:**
+"Senior techs at our shop wasted N minutes per escalation just getting up to
+speed. We built the thing that takes that to zero."
+
+Don't try to pull this from telemetry — the doc itself notes that retrieval/re-use
+data isn't queryable yet. Manual stopwatch on the next 5 escalations is the
+fastest path to a defensible number.
+
+This is the assignment because it forces the GTM story into the same time-zone as
+the build, and it's a one-day effort that compounds for every conversation
+afterward.
+
+## What I noticed about how you think
+
+- You contradicted my framing twice in the same session and the second
+  contradiction was sharper than the first. Most founders agree with the
+  diagnostic and walk out with a polished version of what they came in with. You
+  said "I'm just questioning if flows are even the way to go" — and that
+  sentence reset the entire wedge. That's craft.
+
+- "The senior engineer looking over your shoulder" came out of you spontaneously,
+  not as a prepared pitch. That's the line. Use it. It survives because it's
+  emotional truth (every junior tech has had this, every senior tech has been
+  this), not constructed marketing copy.
+
+- You're solving your own problem with your own time. 20 hrs/week isn't a
+  hypothetical user pain — it's your Tuesday. Founders who solve their own pain
+  ship sharper products because the feedback loop is instant.
+
+- The escalation feature emerged from your description, not mine. I was busy
+  cataloging documentation pains. You said "junior to senior escalation? no
+  worries there either" almost as an afterthought. That afterthought is the wedge.
+  Pay attention to which features you describe casually versus which you push hard
+  on — the casual ones are sometimes where the truth lives.
+
+## GSTACK REVIEW REPORT
+
+| Review | Trigger | Why | Runs | Status | Findings |
+|--------|---------|-----|------|--------|----------|
+| CEO Review | `/plan-ceo-review` | Scope & strategy | 0 | — | not run |
+| Codex Review | `/codex review` | Independent 2nd opinion | 1 | INFO | 12 findings, 6 applied, 1 partial, 5 rejected |
+| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 1 | CLEAR (PLAN) | 2 issues, 0 critical gaps, scope reduced |
+| Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR (FULL) | score 6/10 → 9/10, 8 decisions |
+| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | not run |
+
+- **CODEX:** 12 findings reviewed. Applied: 2-metric framing (#2), notification routing spec (#3), per-channel delivery model (#4), unread-state fix (#11), Start-here CTA reframe (#9), claim role gate moved in-scope (#8). Rejected: full scope reduction to PSA-brief-only (#6/7/12 — user kept queue UI as demo hero). Partial: scope concern (#5) acknowledged in eng review's email-first/polling-fallback. Misread: #1, #10.
+- **CROSS-MODEL:** Claude (eng + design reviews) and Codex agree on 6/12 findings. The major disagreement was scope — Codex argued for cutting the queue UI, user rejected. Both agree on metric definition, notification routing, claim auth gating.
+- **UNRESOLVED:** 0
+- **VERDICT:** ENG + DESIGN CLEARED, CODEX REVIEWED — ready to implement.
--- a/docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md
+++ b/docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md
@@ -0,0 +1,33 @@
+# Test Plan
+Generated by /plan-eng-review on 2026-04-27
+Branch: main
+Repo: chihlasm/resolutionflow
+
+## Affected Pages/Routes
+
+- `/escalations` ([EscalationQueuePage.tsx](frontend/src/pages/EscalationQueuePage.tsx)) — senior-tech inbox view; verify queue list, real-time arrival, click-through
+- `/pilot/:session_id` (FlowPilotSessionPage) — verify post-claim load shows full escalation context (snapshot, ai_assessment, escalation_package)
+- `GET /api/v1/analytics/escalation-metrics` (NEW) — verify hero metric calculation, account-scoping, role gate
+
+## Key Interactions to Verify
+
+- Junior tech clicks **Escalate** in active FlowPilot session → handoff is created → notification fires → senior sees escalation in queue within 30 seconds
+- Senior tech clicks **Claim** in queue → session reactivates → senior is redirected into FlowPilot session view → ai_assessment + snapshot are visible
+- Senior types first message in chat after claim → metric query starts attributing time-to-first-action
+- MSP owner opens analytics page → "minutes recovered per escalation" widget shows current month's rolling average
+
+## Edge Cases
+
+- **Two seniors race to claim** the same handoff → one wins, the other gets a "Already claimed by [name]" message
+- **Senior is offline** when escalation fires → email arrives via existing `EmailService.send_notification_email`
+- **WebSocket disconnects mid-session** → frontend reconnects; missed events backfilled by re-fetching the queue
+- **Notification dispatch raises** (SMTP down, WebSocket fanout fails) → handoff is still created (graceful degradation)
+- **Senior takes non-chat action first** (e.g., posts directly to PSA) → metric falls back to PSA writeback timestamp or remains null; doc the chosen behavior
+- **Account-scoped multi-tenancy** → senior at MSP A cannot see escalations from MSP B (Phase 4 RLS)
+- **Role gate on metric endpoint** → only `engineer_or_admin` can hit `/escalation-metrics`
+
+## Critical Paths
+
+1. **Magic-moment demo flow** (the entire Loom): junior escalate → senior notification → senior claim → session view → first action recorded → metric updates
+2. **Email fallback** when senior is offline — must not silently drop
+3. **Regression: handoff creation succeeds even if notification dispatch raises** — graceful degradation is mandatory