Merges MSP_Assistant_Harness_Implementation_Plan.docx with the brainstorming design spec into a single executable plan. Resolves all open questions from the original docx, expands scope to include backend changes, and adds a 35-step phased execution order. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MSP Assistant Harness — Super Plan
Date: 2026-04-01
Status: Approved — ready to execute
Sources: MSP_Assistant_Harness_Implementation_Plan.docx (v2.0) + 2026-04-01-msp-assistant-harness-design.md (brainstorming session)
Goal
Reframe /assistant from a generic AI chat surface into a live MSP triage cockpit. An engineer arrives with an open ticket; the page immediately reads as their operational tool — not an AI chatbot that's been adapted for IT work.
The change is a UI and data layer reframe. The existing session, branching, PSA, and conclude architecture is preserved and extended, not rebuilt.
What Phase 0 Resolved
The brainstorming session (2026-04-01) locked these decisions. They are not open questions.
| Question | Decision |
|---|---|
| Layout structure | Stacked zones: incident header → work zone → (drag handle) → conversation log → compose |
| Incident header style | Single row, explicit micro-labels above each field, per-field ✏ edit |
| Work zone left panel | Ordered step checklist (✓ / → / ○) |
| Work zone right panel | Two stacked mini-panels: FlowPilot Asks (top) + What We Know (bottom) |
| Chat zone treatment | Drag-resizable split, compact you: / fp: prefix style, darker background |
| Chat collapsibility | Not collapsible — drag handle gives control |
| Scope | Includes all required backend changes, not UI-layer only |
| Conclude modal | Fully redesigned as structured handoff artifact |
| Page label | "FlowPilot" (not "AI Assistant") |
| "New Chat" label | "New Case" |
| "Conclude" label | "Close Case" |
| Hypothesis language | "Hypothesis" (direct, not softened to "working theory") |
| What We Know editability | Engineer-editable + AI-appended |
| Header field population | Intake form + AI-inferred mid-session + manual engineer override |
Cockpit Layout
┌─────────────────────────────────────────────────────────────┐
│ [Left sidebar — Case History, unchanged] │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ INCIDENT HEADER (single row, labelled fields) │ │
│ │ CLIENT DEVICE CATEGORY HYPOTHESIS │ │
│ │ Contoso ✏ jsmith-04 ✏ DNS/Net ✏ Cache fail ✏ │ │
│ │ [CW #48291][Resolve⋯]│ │
│ ├───────────────────────┬───────────────────────────────┤ │
│ │ │ ▸ FLOWPILOT ASKS (amber) │ │
│ │ STEPS (~55%) │ Did nslookup time out? │ │
│ │ ✓ Ping 8.8.8.8 │ [Time out] [Wrong IP] [Both] │ │
│ │ → nslookup ←active ├───────────────────────────────┤ │
│ │ ○ Flush DNS │ WHAT WE KNOW │ │
│ │ ○ Check NIC │ ✓ Gateway reachable │ │
│ │ │ ✗ DNS 1.1.1.1 — timeout │ │
│ │ [⚡ Generate Script] │ ? DNS 8.8.8.8 — pending │ │
│ ├───────────────────────┴───── ≡ drag handle ───────────┤ │
│ │ CONVERSATION LOG (compact, darker bg) │ │
│ │ you: Can't resolve external DNS, internal fine │ │
│ │ fp: Ping test passed. Run nslookup google.com. │ │
│ │ you: Timed out on 1.1.1.1 too. │ │
│ ├───────────────────────────────────────────────────────┤ │
│ │ Describe next finding or ask FlowPilot... [Send] │ │
│ └───────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Non-Goals
- No redesign of /pilot (FlowPilot session page): separate page, untouched
- No rebuild of session, branching, or PSA architecture
- No new data model for conversations: conversation_messages JSONB unchanged
- No mobile-first redesign: mobile degrades cleanly, desktop is primary
- No generic "assistant polish" that does not tighten the harness
Backend Changes
B1 — Alembic migration 071
File: backend/alembic/versions/071_add_triage_fields_to_ai_sessions.py
Add to ai_sessions:
| Column | Type | Notes |
|---|---|---|
| client_name | VARCHAR(255) | MSP client for incident header |
| asset_name | VARCHAR(255) | Device / user being worked on |
| issue_category | VARCHAR(100) | Human-readable category ("DNS / Networking") |
| triage_hypothesis | TEXT | Working hypothesis, AI-updated + editable |
| evidence_items | JSONB | What We Know list, persisted for resume |
evidence_items schema: [{ "text": str, "status": "confirmed" | "ruled_out" | "pending" }]
Note: existing problem_domain is an internal classifier slug and is unchanged. issue_category is the human-readable display label. Both coexist.
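Because evidence_items is stored as free-form JSONB, the shape above is only enforced if the application checks it. A minimal sketch of such a check, using only the standard library; the names EvidenceItem and validate_evidence_items are hypothetical and not part of the plan:

```python
from typing import Literal, TypedDict, get_args

# The three statuses come straight from the evidence_items schema in B1.
EvidenceStatus = Literal["confirmed", "ruled_out", "pending"]


class EvidenceItem(TypedDict):
    text: str
    status: EvidenceStatus


def validate_evidence_items(raw: object) -> list[EvidenceItem]:
    """Return a cleaned copy of raw, or raise ValueError on a malformed entry."""
    if not isinstance(raw, list):
        raise ValueError("evidence_items must be a list")
    allowed = get_args(EvidenceStatus)
    cleaned: list[EvidenceItem] = []
    for index, item in enumerate(raw):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        text, status = item.get("text"), item.get("status")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"item {index} has no text")
        if status not in allowed:
            raise ValueError(f"item {index} has unknown status {status!r}")
        cleaned.append({"text": text.strip(), "status": status})
    return cleaned
```

In the real schema layer the same constraint would more likely be expressed as a typed model in place of list[dict]; this sketch only illustrates which inputs should be rejected.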
B2 — Updated schemas (backend/app/schemas/ai_session.py)
New TriageUpdate:
from pydantic import BaseModel

class TriageUpdate(BaseModel):
    client_name: str | None = None
    asset_name: str | None = None
    issue_category: str | None = None
    triage_hypothesis: str | None = None
    evidence_items: list[dict] | None = None  # appends to existing list
Updated ChatMessageResponse:
class ChatMessageResponse(BaseModel):
    # ... existing fields unchanged ...
    triage_update: TriageUpdate | None = None
Updated QuestionItem — add quick-reply options:
class QuestionItem(BaseModel):
    text: str
    context: str = ""
    options: list[str] | None = None  # quick-reply labels; null → free-text input
Updated ResolveSessionRequest / EscalateSessionRequest:
root_cause: str | None = None
steps_taken: list[str] | None = None
recommendations: str | None = None
B3 — New PATCH /ai-sessions/{id}/triage endpoint
PATCH /ai-sessions/{session_id}/triage
Auth: require_engineer_or_admin
Body: { client_name?, asset_name?, issue_category?, triage_hypothesis?, evidence_items? }
Response: { id, client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
Called on every manual header field edit. Partial update — only supplied fields are written.
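The partial-update rule can be sketched as a pure function over plain dicts. This is an illustration under two assumptions: that an omitted key means "leave unchanged", and that evidence_items appends to the stored list as the TriageUpdate comment in B2 states. The name apply_triage_patch is hypothetical.

```python
from typing import Any

TRIAGE_FIELDS = (
    "client_name",
    "asset_name",
    "issue_category",
    "triage_hypothesis",
    "evidence_items",
)


def apply_triage_patch(session: dict[str, Any], patch: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of session with only the keys present in patch written."""
    updated = dict(session)
    for field in TRIAGE_FIELDS:
        if field not in patch:
            continue  # key omitted from the request body: keep the stored value
        if field == "evidence_items":
            # Assumption taken from the B2 comment: new items append to the stored list.
            updated[field] = list(session.get(field) or []) + list(patch[field] or [])
        else:
            updated[field] = patch[field]
    return updated
```

With a Pydantic v2 request model, the set of supplied keys would typically come from model_dump(exclude_unset=True), which separates "not sent" from "sent as null". Note that an append-only evidence_items cannot express the inline edits described for WhatWeKnow, so the endpoint may need a replace mode as well.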
B4 — New POST /ai-sessions/{id}/handoff-draft endpoint
POST /ai-sessions/{session_id}/handoff-draft
Auth: require_engineer_or_admin
Response: StreamingResponse (text/event-stream)
Streams structured handoff JSON built from session context:
{ "root_cause": "...", "resolution": "...", "steps_taken": ["..."], "recommendations": "..." }
Uses: problem_summary, triage_hypothesis, evidence_items, last 20 conversation_messages, saved task lane state.
Called immediately on conclude modal open — engineer can edit while stream fills in.
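For the modal to fill in field by field, the stream has to be split into self-contained events. A minimal sketch of text/event-stream framing, assuming one event per handoff field; the { "field", "value" } chunk shape and the closing "done" event are assumptions, since the plan does not define HandoffDraftChunk:

```python
import json
from typing import Iterable, Iterator


def sse_frames(fields: Iterable[tuple[str, object]]) -> Iterator[str]:
    """Yield one server-sent event per (field name, value) pair, then a done event."""
    for name, value in fields:
        # json.dumps emits a single line, so the payload fits in one data: line.
        payload = json.dumps({"field": name, "value": value})
        yield f"data: {payload}\n\n"  # a blank line terminates each event
    yield "event: done\ndata: {}\n\n"


if __name__ == "__main__":
    draft = [
        ("root_cause", "Corrupted DNS cache on NIC"),
        ("steps_taken", ["Ping 8.8.8.8", "nslookup google.com"]),
    ]
    for frame in sse_frames(draft):
        print(frame, end="")
```

A generator like this is the kind of iterable a streaming response body can wrap; the frontend reader would then apply each event to the matching form field as it arrives.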
B5 — unified_chat_service.py — triage extraction
After each AI response, extract triage signals and return as triage_update.
Recommended approach: Add a [TRIAGE_UPDATE] structured marker to the system prompt, following the existing [QUESTIONS] / [ACTIONS] / [FORK] marker pattern. The AI emits the block only when it has new signal:
[TRIAGE_UPDATE]
client_name: Contoso Ltd
issue_category: DNS / Networking
triage_hypothesis: Corrupted DNS cache on NIC
evidence_items:
- confirmed: Gateway 192.168.1.1 reachable
- ruled_out: DNS 1.1.1.1 — timeout
[/TRIAGE_UPDATE]
Service parses this, strips it from display_content, auto-PATCHes the session record, and returns triage_update in the response.
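The parse-and-strip step can be sketched as below. This is one possible reading of the marker format shown above, not the service's actual parser; parse_triage_update is a hypothetical stand-in for the planned helper, and malformed lines are silently skipped rather than raised.

```python
import re
from typing import Any, Optional

_BLOCK = re.compile(r"\[TRIAGE_UPDATE\]\s*\n(.*?)\n?\s*\[/TRIAGE_UPDATE\]\s*", re.DOTALL)
_SCALARS = {"client_name", "asset_name", "issue_category", "triage_hypothesis"}
_STATUSES = {"confirmed", "ruled_out", "pending"}


def parse_triage_update(content: str) -> tuple[str, Optional[dict[str, Any]]]:
    """Return (display_content, triage_update); triage_update is None without new signal."""
    match = _BLOCK.search(content)
    if match is None:
        return content, None
    update: dict[str, Any] = {}
    in_evidence = False
    for raw in match.group(1).splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_evidence and line.startswith("- "):
            # Evidence lines look like "- <status>: <text>".
            status, _, text = line[2:].partition(":")
            if status.strip() in _STATUSES and text.strip():
                update.setdefault("evidence_items", []).append(
                    {"text": text.strip(), "status": status.strip()}
                )
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        in_evidence = key == "evidence_items"
        if key in _SCALARS and value:
            update[key] = value
    # Remove the whole block so the engineer never sees the marker.
    display = (content[: match.start()] + content[match.end():]).strip()
    return display, (update or None)
```

Skipping unknown keys and unknown statuses keeps a slightly off-format block from corrupting the header, which is the failure mode the Risks table worries about.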
B6 — resolution_output_generator.py — accept structured fields
Update _build_session_context() to incorporate root_cause, steps_taken, and recommendations when supplied, producing richer psa_ticket_notes and client_summary outputs.
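The "when supplied" condition could look like the sketch below. The internals of _build_session_context() are not shown in this plan, so structured_context_lines is a hypothetical helper and the line format is an assumption; the point is only that absent or empty fields contribute nothing.

```python
from typing import Optional


def structured_context_lines(
    root_cause: Optional[str] = None,
    steps_taken: Optional[list[str]] = None,
    recommendations: Optional[str] = None,
) -> list[str]:
    """Return context lines for the structured handoff fields that were supplied."""
    lines: list[str] = []
    if root_cause:
        lines.append(f"Root cause: {root_cause}")
    if steps_taken:
        lines.append("Steps taken:")
        lines.extend(f"{n}. {step}" for n, step in enumerate(steps_taken, start=1))
    if recommendations:
        lines.append(f"Recommendations: {recommendations}")
    return lines
```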
B7 — Session detail response — expose new triage fields
GET /ai-sessions/{id} (and the session list item) must return the 5 new fields so the frontend can restore header state on session load and resume.
Frontend Changes
F1 — AssistantChatPage.tsx — cockpit layout refactor
Replace current layout (sidebar + chat column + TaskLane right rail) with the stacked cockpit structure.
New state:
- triageMeta: TriageMeta, shaped as { client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
- workZoneHeight: number, persisted to localStorage('rf-assistant-work-zone-height')
On session load / resume: populate triageMeta from session response new fields.
On AI response: if response.triage_update is non-null, merge into triageMeta (partial — preserve existing non-null values unless AI explicitly overwrites).
Work zone layout: left StepsPanel + right column with FlowPilotAsks stacked above WhatWeKnow.
Chat zone layout: compact ConversationLog below drag handle, independent scroll.
F2 — New IncidentHeader.tsx
frontend/src/components/assistant/IncidentHeader.tsx
Props: triageMeta: TriageMeta, psaTicketId: string | null, sessionId: string, onFieldSave(field, value), onResolve(), onOverflow()
- Single-row bar with micro-labels (CLIENT / DEVICE / CATEGORY / HYPOTHESIS)
- Each field: ✏ icon visible on hover → opens inline EditPopover (text input + Save/Cancel)
- On Save: calls aiSessionsApi.updateTriage(sessionId, { [field]: value })
- Empty fields: muted placeholder ("Unknown client", "No device specified", etc.)
- Right side: PSA ticket badge (if linked) + Resolve button + ⋯ overflow menu
F3 — Refactored StepsPanel.tsx (from TaskLane)
frontend/src/components/assistant/StepsPanel.tsx
Preserves all TaskLane data logic and persistence. Changes rendering only:
| State | Icon | Style |
|---|---|---|
| Completed | ✓ | Strikethrough, muted, green icon |
| Active | → | Blue left border, white text, full opacity |
| Pending | ○ | Muted text |
Script generation CTA: shown at bottom when active step command references "script" or AI has flagged it.
TaskLane.tsx can remain for now (no renames required in this phase) — StepsPanel is a new component that consumes the same activeActions prop.
F4 — New FlowPilotAsks.tsx
frontend/src/components/assistant/FlowPilotAsks.tsx
Props: questions: QuestionItem[], onAnswer(answer: string)
- Renders first unanswered question
- question.options non-null → button row; clicking calls onAnswer(option)
- question.options null → compact text input + Send
- onAnswer calls parent's handleSend with the answer string
- Hidden entirely when questions is empty
F5 — New WhatWeKnow.tsx
frontend/src/components/assistant/WhatWeKnow.tsx
Props: items: EvidenceItem[], onAdd(text, status), onEdit(index, text, status)
- Evidence list: ✓ confirmed (green) / ✗ ruled out (red) / ? pending (muted)
- "+ Add finding" inline entry at bottom
- Click any item to edit inline
- State lives in AssistantChatPage (triageMeta.evidence_items), synced to backend via PATCH /triage
F6 — Drag-resizable split
Thin handle bar between work zone and conversation log. On drag: update workZoneHeight in state and persist it to localStorage. On mount: restore the saved value, defaulting to 55% of the available height when nothing is stored.
F7 — Compact ConversationLog rendering
Replace current full <ChatMessage> bubbles in the log zone with a compact list: you: ... / fp: ... prefix style, tighter line height, no avatars. ChatMessage can still be used for rich content (forks, suggested flows) in a compact variant.
F8 — Redesigned ConcludeSessionModal.tsx
On open:
- Call aiSessionsApi.getHandoffDraft(sessionId) (streaming); fields fill in as the stream arrives
- Render: outcome selector (Resolved / Escalated / Parked)
- Render 4 structured editable fields: Root Cause, Resolution, Steps Taken, Recommendations
- Render output destination checkboxes: Post to CW note / Save to KB / Send client summary
- Confirm → call resolve/escalate/pause with enriched request body including structured fields
F9 — MSP-native language pass
| Old | New |
|---|---|
| "AI Assistant" (page title, meta) | "FlowPilot" |
| "New Chat" | "New Case" |
| "Messages" | "Conversation Log" |
| "Task Lane" (panel label) | "Steps" |
| "Conclude" | "Close Case" |
| "Chat history" (sidebar label) | "Case History" |
| Compose placeholder | "Describe finding, paste log output, or ask FlowPilot..." |
F10 — New API methods (aiSessions.ts)
updateTriage(sessionId: string, fields: Partial<TriageMeta>): Promise<TriageMeta>
getHandoffDraft(sessionId: string): AsyncGenerator<HandoffDraftChunk>
F11 — New types (types/ai-session.ts)
interface TriageMeta {
  client_name: string | null
  asset_name: string | null
  issue_category: string | null
  triage_hypothesis: string | null
  evidence_items: EvidenceItem[]
}
interface EvidenceItem {
  text: string
  status: 'confirmed' | 'ruled_out' | 'pending'
}
interface TriageUpdate extends Partial<TriageMeta> {}
// Extend existing:
interface QuestionItem {
  text: string
  context: string
  options?: string[] | null // new; the backend schema defaults this to null
}
Phased Execution Order
Phase 1 — Backend Foundation
- Write migration 071: add 5 columns to ai_sessions
- Run alembic upgrade head, verify columns
- Update AISession model with new mapped columns
- Add TriageUpdate schema, extend QuestionItem, extend ChatMessageResponse
- Extend ResolveSessionRequest / EscalateSessionRequest with structured fields
- Add PATCH /{id}/triage endpoint
- Add POST /{id}/handoff-draft streaming endpoint
- Update GET /ai-sessions/{id} response to include new triage fields
- Update resolution_output_generator._build_session_context() to use structured fields
- Run backend tests: pytest --override-ini="addopts="
Phase 2 — Triage Extraction (AI layer)
- Add [TRIAGE_UPDATE] marker to unified_chat_service.py system prompt
- Implement _parse_triage_update_marker() in the service
- Auto-PATCH session on non-null triage_update
- Add options generation instructions to [QUESTIONS] system prompt section
- Verify extraction in a live session
Phase 3 — New Frontend Types + API
- Add TriageMeta, EvidenceItem, TriageUpdate to types/ai-session.ts
- Extend QuestionItem type
- Add updateTriage() and getHandoffDraft() to aiSessions.ts
Phase 4 — New Work Zone Components
- Build IncidentHeader.tsx with EditPopover
- Build StepsPanel.tsx
- Build FlowPilotAsks.tsx
- Build WhatWeKnow.tsx
Phase 5 — Page Layout Refactor
- Refactor AssistantChatPage.tsx: implement stacked cockpit layout
- Wire triageMeta state, session load population, triage_update merge
- Implement drag-resizable split with localStorage persistence
- Compact ConversationLog rendering
Phase 6 — Handoff Modal + Language Pass
- Redesign ConcludeSessionModal.tsx: structured handoff form
- MSP-native language pass across all assistant components
- Update <PageMeta> title
Phase 7 — QA + Hardening
- npx tsc -b: fix any TypeScript errors
- npm run build: production build clean
- Functional regression: all chat flows, session switching, conclude/resume
- Harness feel test: does the page read as a triage cockpit within 3 seconds of first load?
- Mobile viewport check
- Stress test: 50+ messages, 10+ steps, long outputs
Risks and Mitigations
| Risk | Mitigation |
|---|---|
| [TRIAGE_UPDATE] marker extraction is unreliable: AI doesn't emit it consistently | Gate Phase 2 on a pass/fail test with 5 real sessions before wiring it to the header. Fall back to Option B (post-response Haiku pass) if needed. |
| Header fields feel fabricated — AI guesses wrong client or hypothesis | Show confidence-aware placeholder copy ("FlowPilot is building context…") until a field has real data. Never invent. |
| Task lane visual promotion breaks established chat patterns | Keep all send/respond behavior intact. Change hierarchy only. Verify every task-lane state transition manually. |
| Handoff modal exposes weak underlying summaries | Reuse existing ResolutionOutputGenerator output where possible. Add guardrail copy for empty fields. |
| Mobile loses compose or step access | Test responsive layout as a first-class deliverable in Phase 7, not a final sweep. Enforce scroll isolation between all zones. |
| tsc -b errors after component refactor | Run npx tsc -b after every phase. Trace unused imports/props immediately; don't batch (lesson #92). |
Test Plan
Harness Feel (primary, subjective)
- Does the page read as an MSP triage cockpit within 3 seconds on first load?
- Is the active step obvious without reading chat?
- Do FlowPilot Asks quick-reply buttons work and update the step list?
- Does the incident header update mid-session as AI learns context?
- After dragging the handle and refreshing the page, does the split position restore?
- Does the conclude modal look like a case handoff or a chat closure?
Functional Regression
- New session (no PSA) — header degrades gracefully
- New session (with CW ticket) — header populates from ticket data
- Send message → triage_update updates header
- Click quick-reply button → answer submitted, step advances
- Add finding to What We Know → persisted via PATCH
- Edit header field via ✏ → saved and survives refresh
- Conclude as Resolved → handoff draft fills modal → post to CW note
- Conclude as Escalated → same
- Pause and resume → triage header restores from saved session fields
- Session switching (currentChatRef guard) — no stale state
- Image paste, forks, suggested flows — all still work
MSP Scenarios (from docx)
- Single-user endpoint issue (basic triage flow, script generation)
- M365 / tenant-wide issue (multi-user context, issue category)
- Network / VPN outage (asset targeting, hypothesis tracking)
- Escalation and resume (session persistence, structured handoff)
Edge Cases
- 50+ messages — layout hierarchy stays intact
- 10+ steps — step panel scrolls, compose remains accessible
- Long issue titles / hypothesis text — header truncates gracefully
- Missing PSA context — placeholder copy, not blank fields
- Narrow mobile viewport — all zones reachable
Backend Checks
# Migration
alembic upgrade head
psql -U postgres -d resolutionflow -c "\d ai_sessions" | grep -E "client_name|asset_name|issue_category|triage_hypothesis|evidence_items"
# Triage PATCH (SESSION_ID is a placeholder variable for an existing session id; a JSON body needs an explicit Content-Type)
curl -X PATCH "http://localhost:8000/ai-sessions/$SESSION_ID/triage" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"client_name":"Test Client","triage_hypothesis":"Cache corruption"}'
# Handoff draft stream (-N turns off curl's output buffering so events print as they arrive)
curl -N -X POST "http://localhost:8000/ai-sessions/$SESSION_ID/handoff-draft" \
  -H "Authorization: Bearer $TOKEN"
Critical Files
| File | Change |
|---|---|
| backend/alembic/versions/071_add_triage_fields_to_ai_sessions.py | New migration |
| backend/app/models/ai_session.py | Add 5 new mapped columns |
| backend/app/schemas/ai_session.py | TriageUpdate, QuestionItem.options, extended request/response schemas |
| backend/app/api/endpoints/ai_sessions.py | PATCH /triage, POST /handoff-draft |
| backend/app/services/unified_chat_service.py | [TRIAGE_UPDATE] marker extraction, auto-PATCH |
| backend/app/services/resolution_output_generator.py | Structured fields in context builder |
| frontend/src/types/ai-session.ts | TriageMeta, EvidenceItem, TriageUpdate; extend QuestionItem |
| frontend/src/api/aiSessions.ts | updateTriage(), getHandoffDraft() |
| frontend/src/pages/AssistantChatPage.tsx | Full cockpit layout refactor |
| frontend/src/components/assistant/IncidentHeader.tsx | New |
| frontend/src/components/assistant/StepsPanel.tsx | New (from TaskLane logic) |
| frontend/src/components/assistant/FlowPilotAsks.tsx | New |
| frontend/src/components/assistant/WhatWeKnow.tsx | New |
| frontend/src/components/assistant/ConcludeSessionModal.tsx | Redesigned |