# MSP Assistant Harness — Super Plan

**Date:** 2026-04-01
**Status:** Approved — ready to execute
**Sources:** `MSP_Assistant_Harness_Implementation_Plan.docx` (v2.0) + `2026-04-01-msp-assistant-harness-design.md` (brainstorming session)

---

## Goal

Reframe `/assistant` from a generic AI chat surface into a **live MSP triage cockpit**. An engineer arrives with an open ticket; the page immediately reads as their operational tool — not an AI chatbot that's been adapted for IT work.

The change is a UI and data layer reframe. The existing session, branching, PSA, and conclude architecture is preserved and extended, not rebuilt.

### Key Architectural Choices

This plan explicitly chooses:

- **`FlowPilot`** as the primary page/product label (not "Assistant Harness")
- **Backend triage + handoff contracts required in v1** — not deferred to a later phase
- **Desktop-first cockpit layout** with clean mobile degradation
- **Explicit persisted triage fields** on the session model, not purely derived/computed header state
- **Prompt-embedded structured extraction** (`[TRIAGE_UPDATE]` marker) as the primary AI triage path, with post-response model pass only as fallback
- **Sidebar visual demotion** — existing sidebar stays but is visually de-emphasized so the cockpit reads as an operations surface, not a chat app

---

## What Phase 0 Resolved

The brainstorming session (2026-04-01) locked these decisions. They are not open questions.

| Question | Decision |
|----------|----------|
| Layout structure | Stacked zones: incident header → work zone → (drag handle) → conversation log → compose |
| Incident header style | Single row, explicit micro-labels above each field, per-field `✏` edit |
| Work zone left panel | Ordered step checklist (✓ / → / ○) |
| Work zone right panel | Two stacked mini-panels: FlowPilot Asks (top) + What We Know (bottom) |
| Chat zone treatment | Drag-resizable split, compact `you:` / `fp:` prefix style, darker background |
| Chat collapsibility | Not collapsible — drag handle gives control |
| Scope | Includes all required backend changes, not UI-layer only |
| Conclude modal | Fully redesigned as structured handoff artifact |
| Page label | "FlowPilot" (not "AI Assistant") |
| "New Chat" label | "New Case" |
| "Conclude" label | "Close Case" |
| Hypothesis language | "Hypothesis" (direct, not softened to "working theory") |
| What We Know editability | Engineer-editable + AI-appended |
| Header field population | Intake form + AI-inferred mid-session + manual engineer override |

---

## Cockpit Layout

```
┌─────────────────────────────────────────────────────────────┐
│  [Left sidebar — Case History, unchanged]                   │
│  ┌───────────────────────────────────────────────────────┐  │
│  │ INCIDENT HEADER (single row, labelled fields)         │  │
│  │ CLIENT       DEVICE        CATEGORY     HYPOTHESIS    │  │
│  │ Contoso ✏    jsmith-04 ✏   DNS/Net ✏    Cache fail ✏  │  │
│  │                                  [CW #48291][Resolve⋯]│  │
│  ├───────────────────────┬───────────────────────────────┤  │
│  │                       │ ▸ FLOWPILOT ASKS (amber)      │  │
│  │ STEPS (~55%)          │ Did nslookup time out?        │  │
│  │ ✓ Ping 8.8.8.8        │ [Time out] [Wrong IP] [Both]  │  │
│  │ → nslookup ←active    ├───────────────────────────────┤  │
│  │ ○ Flush DNS           │ WHAT WE KNOW                  │  │
│  │ ○ Check NIC           │ ✓ Gateway reachable           │  │
│  │                       │ ✗ DNS 1.1.1.1 — timeout       │  │
│  │ [⚡ Generate Script]   │ ? DNS 8.8.8.8 — pending       │  │
│  ├───────────────────────┴───── ≡ drag handle ───────────┤  │
│  │ CONVERSATION LOG (compact, darker bg)                 │  │
│  │ you: Can't resolve external DNS, internal fine        │  │
│  │ fp:  Ping test passed. Run nslookup google.com.       │  │
│  │ you: Timed out on 1.1.1.1 too.                        │  │
│  ├───────────────────────────────────────────────────────┤  │
│  │ Describe next finding or ask FlowPilot...      [Send] │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
```

---

## Non-Goals

- No redesign of `/pilot` (FlowPilot session page) — separate page, untouched
- No rebuild of session, branching, or PSA architecture
- No new data model for conversations — `conversation_messages` JSONB unchanged
- No mobile-first redesign — mobile degrades cleanly, desktop is primary
- No generic "assistant polish" that does not tighten the harness

---

## Backend Changes

### B1 — Alembic migration `071`

File: `backend/alembic/versions/071_add_triage_fields_to_ai_sessions.py`

Add to `ai_sessions`:

| Column | Type | Notes |
|--------|------|-------|
| `client_name` | `VARCHAR(255)` | MSP client for incident header |
| `asset_name` | `VARCHAR(255)` | Device / user being worked on |
| `issue_category` | `VARCHAR(100)` | Human-readable category ("DNS / Networking") |
| `triage_hypothesis` | `TEXT` | Working hypothesis — AI-updated + editable |
| `evidence_items` | `JSONB` | What We Know list — persisted for resume |

`evidence_items` schema: `[{ "text": str, "status": "confirmed" | "ruled_out" | "pending" }]`

Note: existing `problem_domain` is an internal classifier slug and is unchanged. `issue_category` is the human-readable display label. Both coexist.

### B2 — Updated schemas (`backend/app/schemas/ai_session.py`)

**New `TriageUpdate`:**

```python
class TriageUpdate(BaseModel):
    client_name: str | None = None
    asset_name: str | None = None
    issue_category: str | None = None
    triage_hypothesis: str | None = None
    evidence_items: list[dict] | None = None  # appends to existing list
```

**Updated `ChatMessageResponse`:**

```python
class ChatMessageResponse(BaseModel):
    # ... existing fields unchanged ...
    triage_update: TriageUpdate | None = None
```

**Updated `QuestionItem`** — add quick-reply options:

```python
class QuestionItem(BaseModel):
    text: str
    context: str = ""
    options: list[str] | None = None  # quick-reply labels; null → free-text input
```

**Updated `ResolveSessionRequest` / `EscalateSessionRequest`:**

```python
root_cause: str | None = None
steps_taken: list[str] | None = None
recommendations: str | None = None
```

### B3 — New `PATCH /ai-sessions/{id}/triage` endpoint

```
PATCH /ai-sessions/{session_id}/triage
Auth: require_engineer_or_admin
Body: { client_name?, asset_name?, issue_category?, triage_hypothesis?, evidence_items? }
Response: { id, client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
```

Called on every manual header field edit. Partial update — only supplied fields are written.

### B4 — New `POST /ai-sessions/{id}/handoff-draft` endpoint

```
POST /ai-sessions/{session_id}/handoff-draft
Auth: require_engineer_or_admin
Response: StreamingResponse (text/event-stream)
```

Streams structured handoff JSON built from session context:

```json
{
  "root_cause": "...",
  "resolution": "...",
  "steps_taken": ["..."],
  "recommendations": "..."
}
```

Uses: `problem_summary`, `triage_hypothesis`, `evidence_items`, last 20 `conversation_messages`, saved task lane state.

Called immediately on conclude modal open — engineer can edit while stream fills in.

### B5 — `unified_chat_service.py` — triage extraction

After each AI response, extract triage signals and return as `triage_update`.

**Recommended approach:** Add a `[TRIAGE_UPDATE]` structured marker to the system prompt, following the existing `[QUESTIONS]` / `[ACTIONS]` / `[FORK]` marker pattern. The AI emits the block only when it has new signal:

```
[TRIAGE_UPDATE]
client_name: Contoso Ltd
issue_category: DNS / Networking
triage_hypothesis: Corrupted DNS cache on NIC
evidence_items:
  - confirmed: Gateway 192.168.1.1 reachable
  - ruled_out: DNS 1.1.1.1 — timeout
[/TRIAGE_UPDATE]
```

Service parses this, strips it from `display_content`, auto-PATCHes the session record, and returns `triage_update` in the response.

### B6 — `resolution_output_generator.py` — accept structured fields

Update `_build_session_context()` to incorporate `root_cause`, `steps_taken`, and `recommendations` when supplied, producing richer `psa_ticket_notes` and `client_summary` outputs.

### B7 — Session detail response — expose new triage fields

`GET /ai-sessions/{id}` (and the session list item) must return the 5 new fields so the frontend can restore header state on session load and resume.

---

## Frontend Changes

### F1 — `AssistantChatPage.tsx` — cockpit layout refactor

Replace current layout (sidebar + chat column + TaskLane right rail) with the stacked cockpit structure.

**New state:**

- `triageMeta: TriageMeta` — `{ client_name, asset_name, issue_category, triage_hypothesis, evidence_items }`
- `workZoneHeight: number` — persisted to `localStorage('rf-assistant-work-zone-height')`

**On session load / resume:** populate `triageMeta` from session response new fields.

**On AI response:** if `response.triage_update` is non-null, merge into `triageMeta` (partial — preserve existing non-null values unless AI explicitly overwrites).

**Work zone layout:** left `StepsPanel` + right column with `FlowPilotAsks` stacked above `WhatWeKnow`.

**Chat zone layout:** compact `ConversationLog` below drag handle, independent scroll.
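
The "On AI response" merge rule in F1 can be sketched as a pure function. This is a minimal sketch under stated assumptions, not the page's actual implementation: `mergeTriageUpdate` is a hypothetical helper name, the types mirror the F12 definitions, a `null` or omitted field is assumed to mean "no new signal", and `evidence_items` is assumed to append (per the B2 schema comment) with a text-based de-duplication that this plan does not specify.

```typescript
// Types mirror the F12 definitions in this plan.
type EvidenceStatus = 'confirmed' | 'ruled_out' | 'pending'

interface EvidenceItem {
  text: string
  status: EvidenceStatus
}

interface TriageMeta {
  client_name: string | null
  asset_name: string | null
  issue_category: string | null
  triage_hypothesis: string | null
  evidence_items: EvidenceItem[]
}

type TriageUpdate = Partial<TriageMeta>

// Hypothetical helper: returns a new TriageMeta and never mutates `current`.
function mergeTriageUpdate(current: TriageMeta, update: TriageUpdate | null): TriageMeta {
  if (!update) return current

  const next: TriageMeta = { ...current, evidence_items: [...current.evidence_items] }

  // Scalar fields: null/undefined means "no new signal", so the existing value is kept.
  const scalarKeys = ['client_name', 'asset_name', 'issue_category', 'triage_hypothesis'] as const
  for (const key of scalarKeys) {
    const value = update[key]
    if (value !== undefined && value !== null) next[key] = value
  }

  // Evidence: append new items; an item whose text already exists replaces
  // the old entry so its status can change (assumption, not specified in the plan).
  for (const item of update.evidence_items ?? []) {
    const index = next.evidence_items.findIndex((e) => e.text === item.text)
    if (index >= 0) next.evidence_items[index] = item
    else next.evidence_items.push(item)
  }
  return next
}
```

Keeping the merge pure makes it straightforward to call from a state setter, for example `setTriageMeta((prev) => mergeTriageUpdate(prev, response.triage_update))`, where `setTriageMeta` is likewise an assumed name.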

### F2 — New `IncidentHeader.tsx`

```
frontend/src/components/assistant/IncidentHeader.tsx
```

Props: `triageMeta: TriageMeta`, `psaTicketId: string | null`, `sessionId: string`, `onFieldSave(field, value)`, `onResolve()`, `onOverflow()`

- Single-row bar with micro-labels (CLIENT / DEVICE / CATEGORY / HYPOTHESIS)
- Each field: `✏` icon visible on hover → opens inline `EditPopover` (text input + Save/Cancel)
- On Save: calls `aiSessionsApi.updateTriage(sessionId, { [field]: value })`
- Empty fields: muted placeholder ("Unknown client", "No device specified", etc.)
- Right side: PSA ticket badge (if linked) + Resolve button + `⋯` overflow menu

### F3 — Refactored `StepsPanel.tsx` (from `TaskLane`)

```
frontend/src/components/assistant/StepsPanel.tsx
```

Preserves all `TaskLane` data logic and persistence. Changes rendering only:

| State | Icon | Style |
|-------|------|-------|
| Completed | `✓` | Strikethrough, muted, green icon |
| Active | `→` | Blue left border, white text, full opacity |
| Pending | `○` | Muted text |

Script generation CTA: shown at bottom when active step `command` references "script" or AI has flagged it.

`TaskLane.tsx` can remain for now (no renames required in this phase) — `StepsPanel` is a new component that consumes the same `activeActions` prop.
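
The F3 rendering table and script CTA rule could be expressed as a small lookup plus a predicate. This is a sketch under assumptions: the `Step` shape, the `StepState` names, the `ai_flagged_script` field, and the class-name strings are all hypothetical and may not match the real `TaskLane` action type. Only the icons and the "command references script" rule come from this plan.

```typescript
type StepState = 'completed' | 'active' | 'pending'

// Hypothetical step shape; the real TaskLane action type may differ.
interface Step {
  label: string
  state: StepState
  command?: string
  ai_flagged_script?: boolean // assumed flag standing in for "AI has flagged it"
}

// Icon and style per state, following the F3 table. Class names are placeholders.
const STEP_PRESENTATION: Record<StepState, { icon: string; className: string }> = {
  completed: { icon: '✓', className: 'line-through text-muted icon-green' },
  active: { icon: '→', className: 'border-l-blue text-white opacity-100' },
  pending: { icon: '○', className: 'text-muted' },
}

// Show the Generate Script CTA when the active step's command mentions "script"
// or the step carries the (assumed) AI flag.
function shouldShowScriptCta(steps: Step[]): boolean {
  const active = steps.find((s) => s.state === 'active')
  if (!active) return false
  return Boolean(active.ai_flagged_script) || /script/i.test(active.command ?? '')
}

// Plain-text rendering of one checklist line, e.g. for the compact panel.
function renderStepLine(step: Step): string {
  return `${STEP_PRESENTATION[step.state].icon} ${step.label}`
}
```

A `Record<StepState, ...>` lookup means adding a new state later fails type-checking until its presentation is defined.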

### F4 — New `FlowPilotAsks.tsx`

```
frontend/src/components/assistant/FlowPilotAsks.tsx
```

Props: `questions: QuestionItem[]`, `onAnswer(answer: string)`

- Renders first unanswered question
- `question.options` non-null → button row; clicking calls `onAnswer(option)`
- `question.options` null → compact text input + Send
- `onAnswer` calls parent's `handleSend` with the answer string
- Hidden entirely when `questions` is empty

### F5 — New `WhatWeKnow.tsx`

```
frontend/src/components/assistant/WhatWeKnow.tsx
```

Props: `items: EvidenceItem[]`, `onAdd(text, status)`, `onEdit(index, text, status)`

- Evidence list: `✓` confirmed (green) / `✗` ruled out (red) / `?` pending (muted)
- "+ Add finding" inline entry at bottom
- Click any item to edit inline
- State lives in `AssistantChatPage` (`triageMeta.evidence_items`), synced to backend via `PATCH /triage`

### F6 — Drag-resizable split

Thin handle bar between work zone and conversation log. On drag: update `workZoneHeight` in state, persist to `localStorage`. On mount: restore, default `55%`.

### F7 — Compact `ConversationLog` rendering

Replace current full `<ChatMessage>` bubbles in the log zone with a compact list: `you: ...` / `fp: ...` prefix style, tighter line height, no avatars. `ChatMessage` can still be used for rich content (forks, suggested flows) in a compact variant.

Individual messages should support click-to-expand for full rendering when the engineer needs to re-read a longer response or review a suggested flow.

### F8 — Redesigned `ConcludeSessionModal.tsx`

On open:

1. Call `aiSessionsApi.getHandoffDraft(sessionId)` (streaming) — fields fill in as stream arrives
2. Render: outcome selector (Resolved / Escalated / Parked)
3. Render 4 structured editable fields: Root Cause, Resolution, Steps Taken, Recommendations
4. Render output destination checkboxes: Post to CW note / Save to KB / Send client summary
5. Confirm → call resolve/escalate/pause with enriched request body including structured fields

### F9 — Sidebar visual demotion

The existing `ChatSidebar` stays functionally unchanged but should be visually softened so the cockpit — not the session list — reads as the primary surface.

Specific changes:

- Reduce sidebar background contrast (use `bg-sidebar` or one step darker)
- Reduce sidebar header prominence (smaller label, no bold "Chat History" heading)
- Rename "Chat History" → "Case History" (part of language pass)
- Default sidebar to collapsed state on first cockpit load (existing collapse toggle + `localStorage`)

### F10 — MSP-native language pass

| Old | New |
|-----|-----|
| "AI Assistant" (page title, meta) | "FlowPilot" |
| "New Chat" | "New Case" |
| "Messages" | "Conversation Log" |
| "Task Lane" (panel label) | "Steps" |
| "Conclude" | "Close Case" |
| "Chat history" (sidebar label) | "Case History" |
| Compose placeholder | "Describe finding, paste log output, or ask FlowPilot..." |

### F11 — New API methods (`aiSessions.ts`)

```typescript
updateTriage(sessionId: string, fields: Partial<TriageMeta>): Promise
getHandoffDraft(sessionId: string): AsyncGenerator
```

### F12 — New types (`types/ai-session.ts`)

```typescript
interface TriageMeta {
  client_name: string | null
  asset_name: string | null
  issue_category: string | null
  triage_hypothesis: string | null
  evidence_items: EvidenceItem[]
}

interface EvidenceItem {
  text: string
  status: 'confirmed' | 'ruled_out' | 'pending'
}

interface TriageUpdate extends Partial<TriageMeta> {}

// Extend existing:
interface QuestionItem {
  text: string
  context: string
  options?: string[] // new
}
```

---

## Phased Execution Order

### Phase 1 — Backend Foundation

> Lock backend schema and API changes first so the cockpit can be built against stable session contracts.

1. Write migration `071` — add 5 columns to `ai_sessions`
2. Run `alembic upgrade head`, verify columns
3. Update `AISession` model with new mapped columns
4. Add `TriageUpdate` schema, extend `QuestionItem`, extend `ChatMessageResponse`
5. Extend `ResolveSessionRequest` / `EscalateSessionRequest` with structured fields
6. Add `PATCH /{id}/triage` endpoint
7. Add `POST /{id}/handoff-draft` streaming endpoint
8. Update `GET /ai-sessions/{id}` response to include new triage fields
9. Update `resolution_output_generator._build_session_context()` to use structured fields
10. Run backend tests — `pytest --override-ini="addopts="`

### Phase 2 — Triage Extraction (AI layer)

11. Add `[TRIAGE_UPDATE]` marker to `unified_chat_service.py` system prompt
12. Implement `_parse_triage_update_marker()` in the service
13. Auto-PATCH session on non-null `triage_update`
14. Add `options` generation instructions to `[QUESTIONS]` system prompt section
15. Verify extraction in a live session

### Phase 3 — New Frontend Types + API

16. Add `TriageMeta`, `EvidenceItem`, `TriageUpdate` to `types/ai-session.ts`
17. Extend `QuestionItem` type
18. Add `updateTriage()` and `getHandoffDraft()` to `aiSessions.ts`

### Phase 4 — New Work Zone Components

19. Build `IncidentHeader.tsx` with `EditPopover`
20. Build `StepsPanel.tsx`
21. Build `FlowPilotAsks.tsx`
22. Build `WhatWeKnow.tsx`

### Phase 5 — Page Layout Refactor

23. Refactor `AssistantChatPage.tsx` — implement stacked cockpit layout
24. Wire `triageMeta` state, session load population, `triage_update` merge
25. Implement drag-resizable split with `localStorage` persistence
26. Compact `ConversationLog` rendering

### Phase 6 — Handoff Modal + Language Pass + Sidebar

27. Redesign `ConcludeSessionModal.tsx` — structured handoff form
28. Sidebar visual demotion — background, label prominence, default-collapsed
29. MSP-native language pass across all assistant components
30. Update the page title to "FlowPilot"

### Phase 7 — QA + Hardening

31. `npx tsc -b` — fix any TypeScript errors
32. `npm run build` — production build clean
33. Functional regression: all chat flows, session switching, conclude/resume
34. Harness feel test: does the page read as a triage cockpit within 3 seconds?
35. Mobile viewport check
36. Stress test: 50+ messages, 10+ steps, long outputs

---

## Risks and Mitigations

| Risk | Mitigation |
|------|-----------|
| `[TRIAGE_UPDATE]` marker extraction is unreliable — AI doesn't emit it consistently | Gate Phase 2 on a pass/fail test with 5 real sessions before wiring it to the header. Fall back to Option B (post-response Haiku pass) if needed. |
| Header fields feel fabricated — AI guesses wrong client or hypothesis | Show confidence-aware placeholder copy ("FlowPilot is building context…") until a field has real data. Never invent. |
| Task lane visual promotion breaks established chat patterns | Keep all send/respond behavior intact. Change hierarchy only. Verify every task-lane state transition manually. |
| Handoff modal exposes weak underlying summaries | Reuse existing `ResolutionOutputGenerator` output where possible. Add guardrail copy for empty fields. |
| Mobile loses compose or step access | Test responsive layout as a first-class deliverable in Phase 7, not a final sweep. Enforce scroll isolation between all zones. |
| `tsc -b` errors after component refactor | Run `npx tsc -b` after every phase. Trace unused imports/props immediately — don't batch (lesson #92). |

---

## Test Plan

### Harness Feel (primary, subjective)

- Does the page read as an MSP triage cockpit within 3 seconds on first load?
- Is the active step obvious without reading chat?
- Do FlowPilot Asks quick-reply buttons work and update the step list?
- Does the incident header update mid-session as AI learns context?
- After dragging the handle and refreshing, does the split restore?
- Does the conclude modal look like a case handoff or a chat closure?

### Functional Regression

- New session (no PSA) — header degrades gracefully
- New session (with CW ticket) — header populates from ticket data
- Send message → `triage_update` updates header
- Click quick-reply button → answer submitted, step advances
- Add finding to What We Know → persisted via PATCH
- Edit header field via `✏` → saved and survives refresh
- Conclude as Resolved → handoff draft fills modal → post to CW note
- Conclude as Escalated → same
- Pause and resume → triage header restores from saved session fields
- Session switching (`currentChatRef` guard) — no stale state
- Image paste, forks, suggested flows — all still work

### MSP Scenarios (from docx)

1. Single-user endpoint issue (basic triage flow, script generation)
2. M365 / tenant-wide issue (multi-user context, issue category)
3. Network / VPN outage (asset targeting, hypothesis tracking)
4. Escalation and resume (session persistence, structured handoff)

### Edge Cases

- 50+ messages — layout hierarchy stays intact
- 10+ steps — step panel scrolls, compose remains accessible
- Long issue titles / hypothesis text — header truncates gracefully
- Missing PSA context — placeholder copy, not blank fields
- Narrow mobile viewport — all zones reachable

### Backend Checks

```bash
# Migration
alembic upgrade head
psql -U postgres -d resolutionflow -c "\d ai_sessions" | grep -E "client_name|asset_name|issue_category|triage_hypothesis|evidence_items"

# Triage PATCH (SESSION_ID and TOKEN are placeholders to set in your shell;
# a literal {id} in the URL would be treated as a glob by curl)
curl -X PATCH "http://localhost:8000/ai-sessions/$SESSION_ID/triage" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"client_name":"Test Client","triage_hypothesis":"Cache corruption"}'

# Handoff draft stream (-N disables output buffering so events print as they arrive)
curl -N -X POST "http://localhost:8000/ai-sessions/$SESSION_ID/handoff-draft" \
  -H "Authorization: Bearer $TOKEN"
```

---

## Assumptions

- Desktop is the primary target; mobile must remain usable but does not drive the layout.
- `/assistant` remains the chat-session cockpit; `/pilot` is out of scope.
- New triage fields are **additive** — they do not replace `problem_summary`, `problem_domain`, `ticket_data`, or `conversation_messages`.
- `issue_category` is the operator-facing display field; `problem_domain` remains the internal classifier. Both coexist.
- `evidence_items` is editable by both AI and engineer; engineer edits persist through the triage PATCH endpoint.
- PSA context is optional — every triage header field must degrade gracefully when PSA is absent or session is free-text-only.
- The existing `TaskLane.tsx` component remains in the codebase — `StepsPanel` is a new component that consumes the same props with different rendering. No risky renames during this work.

---

## Critical Files

| File | Change |
|------|--------|
| `backend/alembic/versions/071_add_triage_fields_to_ai_sessions.py` | New migration |
| `backend/app/models/ai_session.py` | Add 5 new mapped columns |
| `backend/app/schemas/ai_session.py` | `TriageUpdate`, `QuestionItem.options`, extended request/response schemas |
| `backend/app/api/endpoints/ai_sessions.py` | `PATCH /triage`, `POST /handoff-draft` |
| `backend/app/services/unified_chat_service.py` | `[TRIAGE_UPDATE]` marker extraction, auto-PATCH |
| `backend/app/services/resolution_output_generator.py` | Structured fields in context builder |
| `frontend/src/types/ai-session.ts` | `TriageMeta`, `EvidenceItem`, `TriageUpdate`; extend `QuestionItem` |
| `frontend/src/api/aiSessions.ts` | `updateTriage()`, `getHandoffDraft()` |
| `frontend/src/pages/AssistantChatPage.tsx` | Full cockpit layout refactor |
| `frontend/src/components/assistant/IncidentHeader.tsx` | New |
| `frontend/src/components/assistant/StepsPanel.tsx` | New (from TaskLane logic) |
| `frontend/src/components/assistant/FlowPilotAsks.tsx` | New |
| `frontend/src/components/assistant/WhatWeKnow.tsx` | New |
| `frontend/src/components/assistant/ConcludeSessionModal.tsx` | Redesigned |