docs: add MSP assistant harness cockpit design spec
Design spec for evolving /assistant into a live triage cockpit. Covers layout decisions (stacked zones, drag-resizable split), incident header (labelled fields, AI-inferred + editable), work zone (steps checklist + FlowPilot Asks + What We Know), conclude modal redesign, and all required backend changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
363
docs/cockpit/2026-04-01-msp-assistant-harness-design.md
Normal file
363
docs/cockpit/2026-04-01-msp-assistant-harness-design.md
Normal file
@@ -0,0 +1,363 @@
|
|||||||
|
# MSP Assistant Harness — Design Spec
|
||||||
|
**Date:** 2026-04-01
|
||||||
|
**Status:** Draft — pending user review
|
||||||
|
**Source:** MSP_Assistant_Harness_Implementation_Plan.docx (v2.0, March 2026) + brainstorming session
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
The `/assistant` page currently works as a generic AI chat surface with a task lane side panel and a chat sidebar for session history. It functions well but reads as "AI chat with extras" rather than an MSP engineer's operational tool.
|
||||||
|
|
||||||
|
The goal is to reframe the page as a **live triage cockpit** — the place where an engineer opens a ticket, works through it from intake to resolution, and closes with a structured handoff artifact. The underlying session, branching, and chat architecture is preserved. What changes is layout hierarchy, information density, field labelling, and the conclude output.
|
||||||
|
|
||||||
|
Scope is broader than the original docx: includes all required backend changes to support the frontend properly.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Design Decisions
|
||||||
|
|
||||||
|
### 1. Overall Layout — Stacked Zones
|
||||||
|
|
||||||
|
```
|
||||||
|
┌─────────────────────────────────────────────┐
|
||||||
|
│ Incident Header (labelled fields, 1 row) │
|
||||||
|
├────────────────────────┬────────────────────┤
|
||||||
|
│ │ FlowPilot Asks │
|
||||||
|
│ Steps Checklist │ (quick replies) │
|
||||||
|
│ (left, ~55%) ├────────────────────┤
|
||||||
|
│ │ What We Know │
|
||||||
|
│ │ (evidence list) │
|
||||||
|
├────────────────────────┴────────────────────┤ ← drag handle
|
||||||
|
│ Conversation Log (muted, darker bg) │
|
||||||
|
├─────────────────────────────────────────────┤
|
||||||
|
│ Compose area │
|
||||||
|
└─────────────────────────────────────────────┘
|
||||||
|
```
|
||||||
|
|
||||||
|
- Work zone (top) and conversation log (bottom) are **drag-resizable** via a handle
|
||||||
|
- Default split: ~55% work zone, ~45% chat
|
||||||
|
- Existing left sidebar (session history) unchanged
|
||||||
|
- Compose area is always pinned to bottom, spans full width
|
||||||
|
- `workZoneHeight` persisted to `localStorage` so split survives refresh
|
||||||
|
|
||||||
|
### 2. Incident Header
|
||||||
|
|
||||||
|
Single row with explicit micro-labels above each field:
|
||||||
|
|
||||||
|
```
|
||||||
|
CLIENT DEVICE CATEGORY HYPOTHESIS
|
||||||
|
Contoso Ltd ✏ jsmith-desktop ✏ DNS / Network ✏ Corrupted DNS cache on NIC ✏
|
||||||
|
[CW #48291] [Resolve ▾] [⋯]
|
||||||
|
```
|
||||||
|
|
||||||
|
- Each field has its own `✏` icon (visible on hover) that opens an inline edit popover
|
||||||
|
- Fields populate from: (a) intake form on session create, (b) AI-inferred updates mid-session via `triage_update`, (c) manual engineer edits via `PATCH /ai-sessions/{id}/triage`
|
||||||
|
- PSA ticket number shown if linked; action buttons (Resolve, overflow menu) on the right
|
||||||
|
- Empty fields show muted placeholder text — never blank
|
||||||
|
|
||||||
|
### 3. Work Zone — Steps + FlowPilot Asks + What We Know
|
||||||
|
|
||||||
|
**Left panel (~55%): ordered step checklist**
|
||||||
|
- Steps displayed as a vertical list: `✓` completed, `→` active (blue border, white text), `○` pending
|
||||||
|
- Active step is visually distinct
|
||||||
|
- "Generate Script" CTA appears at the bottom when a script-generation step is active
|
||||||
|
|
||||||
|
**Right panel (~45%): two stacked mini-panels**
|
||||||
|
- **FlowPilot Asks** (top, amber label): current question from AI. When `options` are provided, renders as quick-reply buttons — clicking a button submits that answer as a chat message. When no `options`, renders a compact free-text input. Panel is empty/hidden when no pending question.
|
||||||
|
- **What We Know** (bottom, muted label): running evidence list. Each entry: `✓ confirmed` / `✗ ruled out` / `? pending`. AI appends via `triage_update.evidence_items`; engineer can manually add or edit entries.
|
||||||
|
|
||||||
|
### 4. Conversation Log Zone
|
||||||
|
|
||||||
|
- Lives below the work zone, separated by a **drag handle**
|
||||||
|
- Background: `#13151c` (one step darker than page) — visually recedes
|
||||||
|
- Label: "CONVERSATION LOG" in muted colour (`text-muted`)
|
||||||
|
- Messages are compact: `you:` / `fp:` prefixes instead of full name/avatar bubbles
|
||||||
|
- Scrolls independently
|
||||||
|
- Not collapsible — drag handle gives control
|
||||||
|
|
||||||
|
### 5. Conclude / Handoff Modal (redesigned)
|
||||||
|
|
||||||
|
On opening "Close Case":
|
||||||
|
|
||||||
|
1. **Header**: "Close Case — [Client Name]" + outcome selector (Resolved / Escalated / Parked)
|
||||||
|
2. **Structured fields** — pre-filled by streaming `/handoff-draft`, all editable:
|
||||||
|
- **Root Cause** (short text input)
|
||||||
|
- **Resolution** (what fixed it)
|
||||||
|
- **Steps Taken** (list, auto-populated from step checklist)
|
||||||
|
- **Recommendations** (next steps / preventive actions)
|
||||||
|
3. **Output destinations** (checkboxes): Post to CW ticket note / Save to Knowledge Base / Send client summary
|
||||||
|
4. **Confirm** button — triggers resolve/escalate/pause and passes structured fields into `ResolutionOutputGenerator`
|
||||||
|
|
||||||
|
The existing `SessionResolutionOutput` model and `ResolutionOutputGenerator` service are reused. The `/handoff-draft` stream starts immediately on modal open — the engineer can begin editing while fields fill in.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Backend Changes Required
|
||||||
|
|
||||||
|
### 1. New AISession columns (Alembic migration)
|
||||||
|
|
||||||
|
Add to `ai_sessions` table:
|
||||||
|
|
||||||
|
| Column | Type | Purpose |
|
||||||
|
|--------|------|---------|
|
||||||
|
| `client_name` | `VARCHAR(255)` | MSP client name for incident header |
|
||||||
|
| `asset_name` | `VARCHAR(255)` | Device / asset / user being worked on |
|
||||||
|
| `issue_category` | `VARCHAR(100)` | Human-readable category (e.g. "DNS / Networking") |
|
||||||
|
| `triage_hypothesis` | `TEXT` | Current working hypothesis — AI-updated + engineer-editable |
|
||||||
|
| `evidence_items` | `JSONB` | "What We Know" list — persisted for session resume |
|
||||||
|
|
||||||
|
`evidence_items` format: `[{ "text": str, "status": "confirmed" | "ruled_out" | "pending" }]`
|
||||||
|
|
||||||
|
Note: `problem_domain` (existing) is an internal classifier slug. `issue_category` is the human-readable display label for the header. Both coexist.
|
||||||
|
|
||||||
|
### 2. New PATCH endpoint — triage metadata
|
||||||
|
|
||||||
|
```
|
||||||
|
PATCH /ai-sessions/{session_id}/triage
|
||||||
|
Auth: require_engineer_or_admin
|
||||||
|
Body: { client_name?, asset_name?, issue_category?, triage_hypothesis?, evidence_items? }
|
||||||
|
Response: { id, client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
|
||||||
|
```
|
||||||
|
|
||||||
|
Used when the engineer edits any header field or evidence list manually.
|
||||||
|
|
||||||
|
### 3. Updated schemas — TriageUpdate and QuestionItem.options
|
||||||
|
|
||||||
|
**New `TriageUpdate` model** (returned in chat response when AI infers session context):
|
||||||
|
|
||||||
|
```python
|
||||||
|
class TriageUpdate(BaseModel):
|
||||||
|
client_name: str | None = None
|
||||||
|
asset_name: str | None = None
|
||||||
|
issue_category: str | None = None
|
||||||
|
triage_hypothesis: str | None = None
|
||||||
|
evidence_items: list[dict] | None = None # appends to existing list
|
||||||
|
```
|
||||||
|
|
||||||
|
**Updated `ChatMessageResponse`:**
|
||||||
|
```python
|
||||||
|
class ChatMessageResponse(BaseModel):
|
||||||
|
# existing fields unchanged...
|
||||||
|
triage_update: TriageUpdate | None = None
|
||||||
|
```
|
||||||
|
|
||||||
|
**Updated `QuestionItem`** — add `options` for quick-reply buttons:
|
||||||
|
```python
|
||||||
|
class QuestionItem(BaseModel):
|
||||||
|
text: str
|
||||||
|
context: str = ""
|
||||||
|
options: list[str] | None = None # quick-reply labels; null = free-text fallback
|
||||||
|
```
|
||||||
|
|
||||||
|
### 4. unified_chat_service.py — triage extraction
|
||||||
|
|
||||||
|
After generating each AI response, run a lightweight extraction to populate `triage_update`. Implementation options (pick one during implementation):
|
||||||
|
|
||||||
|
- **Option A (recommended):** Embed structured extraction in the system prompt using an `[TRIAGE_UPDATE]` marker, similar to existing `[QUESTIONS]` / `[ACTIONS]` markers. AI emits the block if it has new triage signals; service parses it.
|
||||||
|
- **Option B:** Post-response extraction pass using a fast model (`claude-haiku-4-5`) with the last 3 messages as context.
|
||||||
|
|
||||||
|
When `triage_update` contains non-null fields, the service auto-PATCHes the session record (so fields are persisted) AND returns `triage_update` in the response for the frontend to update the header immediately.
|
||||||
|
|
||||||
|
### 5. New streaming endpoint — handoff draft
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /ai-sessions/{session_id}/handoff-draft
|
||||||
|
Auth: require_engineer_or_admin
|
||||||
|
Response: StreamingResponse (text/event-stream)
|
||||||
|
```
|
||||||
|
|
||||||
|
Streams a structured handoff JSON object:
|
||||||
|
```json
|
||||||
|
{ "root_cause": "...", "resolution": "...", "steps_taken": ["..."], "recommendations": "..." }
|
||||||
|
```
|
||||||
|
|
||||||
|
Built from session context: `problem_summary`, `triage_hypothesis`, `evidence_items`, `conversation_messages` (last 20), step checklist from saved task lane state.
|
||||||
|
|
||||||
|
### 6. Updated conclude schemas
|
||||||
|
|
||||||
|
Add optional structured fields to `ResolveSessionRequest` and `EscalateSessionRequest`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
root_cause: str | None = None
|
||||||
|
steps_taken: list[str] | None = None
|
||||||
|
recommendations: str | None = None
|
||||||
|
```
|
||||||
|
|
||||||
|
Pass these into `ResolutionOutputGenerator._build_session_context()` to enrich `psa_ticket_notes` and `client_summary` outputs.
|
||||||
|
|
||||||
|
### 7. Session read endpoint — include new triage fields
|
||||||
|
|
||||||
|
Ensure the session detail response (`GET /ai-sessions/{id}`) returns the new fields so the frontend can restore header state on session resume.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Frontend Changes Required
|
||||||
|
|
||||||
|
### 1. AssistantChatPage layout refactor
|
||||||
|
|
||||||
|
Replace current layout (sidebar + chat column + TaskLane side panel) with the stacked cockpit layout described above.
|
||||||
|
|
||||||
|
**New state:**
|
||||||
|
- `triageMeta: TriageMeta` — `{ client_name, asset_name, issue_category, triage_hypothesis, evidence_items }`
|
||||||
|
- `workZoneHeight: number` — persisted to `localStorage('rf-assistant-work-zone-height')`
|
||||||
|
|
||||||
|
**On session load / resume:** populate `triageMeta` from the session response (new fields).
|
||||||
|
|
||||||
|
**On AI response:** if `response.triage_update` is non-null, merge into `triageMeta` (partial update, preserve existing non-null values unless AI overwrites).
|
||||||
|
|
||||||
|
### 2. New component: `IncidentHeader`
|
||||||
|
|
||||||
|
```
|
||||||
|
frontend/src/components/assistant/IncidentHeader.tsx
|
||||||
|
```
|
||||||
|
|
||||||
|
Props: `triageMeta`, `psaTicketId`, `sessionId`, `onFieldSave(field, value)`, `onResolve`, `onOverflow`
|
||||||
|
|
||||||
|
- Renders labelled single-row header
|
||||||
|
- Each field: micro-label + value + `✏` icon (visible on hover)
|
||||||
|
- `✏` opens an `EditPopover` (small popover with text input + Save/Cancel)
|
||||||
|
- On Save: calls `aiSessionsApi.updateTriage(sessionId, { [field]: value })`
|
||||||
|
- Empty field shows muted placeholder (e.g. "Unknown client")
|
||||||
|
|
||||||
|
### 3. Refactored component: `StepsPanel` (from TaskLane)
|
||||||
|
|
||||||
|
```
|
||||||
|
frontend/src/components/assistant/StepsPanel.tsx
|
||||||
|
```
|
||||||
|
|
||||||
|
Same `activeActions` data source. Renders as ordered checklist:
|
||||||
|
- Completed: `✓` + strikethrough label, muted
|
||||||
|
- Active: `→` + blue left border, white text, full opacity
|
||||||
|
- Pending: `○` + muted text
|
||||||
|
|
||||||
|
Script generation CTA: shown at bottom when the active step has `command` containing "script" or when AI has flagged it.
|
||||||
|
|
||||||
|
### 4. New component: `FlowPilotAsks`
|
||||||
|
|
||||||
|
```
|
||||||
|
frontend/src/components/assistant/FlowPilotAsks.tsx
|
||||||
|
```
|
||||||
|
|
||||||
|
Props: `questions: QuestionItem[]`, `onAnswer(answer: string)`
|
||||||
|
|
||||||
|
- Shows first unanswered question (or empty/hidden state if none)
|
||||||
|
- When `question.options` is non-null: renders as button row, clicking calls `onAnswer(option)`
|
||||||
|
- When `question.options` is null: renders compact text input with Send button
|
||||||
|
- `onAnswer` calls `handleSend` in the parent page with the answer text
|
||||||
|
|
||||||
|
### 5. New component: `WhatWeKnow`
|
||||||
|
|
||||||
|
```
|
||||||
|
frontend/src/components/assistant/WhatWeKnow.tsx
|
||||||
|
```
|
||||||
|
|
||||||
|
Props: `items: EvidenceItem[]`, `onAdd(text, status)`, `onEdit(index, text, status)`
|
||||||
|
|
||||||
|
- Renders evidence list with status icons: `✓` (confirmed, green), `✗` (ruled out, red), `?` (pending, muted)
|
||||||
|
- "+ Add finding" link at bottom opens an inline input row
|
||||||
|
- Items are editable inline (click to edit)
|
||||||
|
- State lives in `AssistantChatPage` as part of `triageMeta.evidence_items`, synced to backend via `PATCH /triage`
|
||||||
|
|
||||||
|
### 6. Drag handle — resizable split
|
||||||
|
|
||||||
|
Implement as a thin handle bar between work zone and conversation log. On drag:
|
||||||
|
- Update `workZoneHeight` in state
|
||||||
|
- Persist to `localStorage`
|
||||||
|
|
||||||
|
On mount: restore from `localStorage`, default to `55%` of available height.
|
||||||
|
|
||||||
|
### 7. Compact conversation log
|
||||||
|
|
||||||
|
Replace current `<ChatMessage>` bubble rendering in the log zone with a compact list:
|
||||||
|
|
||||||
|
```
|
||||||
|
you: Can't resolve external DNS, internal fine
|
||||||
|
fp: Ping passed — layer 3 OK. Run nslookup google.com.
|
||||||
|
you: Timed out on 1.1.1.1 too.
|
||||||
|
```
|
||||||
|
|
||||||
|
`ChatMessage` component still used for rich rendering (suggested flows, forks) but in a more compact variant. Full bubble rendering available on hover/expand if needed.
|
||||||
|
|
||||||
|
### 8. Redesigned `ConcludeSessionModal`
|
||||||
|
|
||||||
|
Replaces current simple textarea with the structured handoff form. On open:
|
||||||
|
1. Call `aiSessionsApi.getHandoffDraft(sessionId)` — streaming — populate fields as stream arrives
|
||||||
|
2. Render outcome selector + 4 structured fields (all `<textarea>` with labels)
|
||||||
|
3. Render output destination checkboxes
|
||||||
|
4. On Confirm: call resolve/escalate/pause with enriched request body
|
||||||
|
|
||||||
|
### 9. MSP-native language pass
|
||||||
|
|
||||||
|
| Old | New |
|
||||||
|
|-----|-----|
|
||||||
|
| "AI Assistant" | "FlowPilot" |
|
||||||
|
| "New Chat" | "New Case" |
|
||||||
|
| "Messages" | "Conversation Log" |
|
||||||
|
| "Task Lane" (panel header) | "Steps" |
|
||||||
|
| "Conclude" | "Close Case" |
|
||||||
|
| "Chat history" (sidebar label) | "Case History" |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What This Is NOT
|
||||||
|
|
||||||
|
- Not a redesign of the FlowPilot session page (`/pilot`) — separate page, untouched
|
||||||
|
- Not a rebuild of session, branching, or PSA architecture
|
||||||
|
- Not a new data model for conversations — `conversation_messages` JSONB is unchanged
|
||||||
|
- Not a mobile-first redesign — mobile degrades cleanly but desktop is primary
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
### Harness Feel Test (primary — subjective)
|
||||||
|
- Open `/assistant`, start a new case: does the page read as an MSP triage cockpit within 3 seconds without reading labels?
|
||||||
|
- Is the current active step obvious without scrolling through chat?
|
||||||
|
- Do FlowPilot Asks quick-reply buttons submit answers and update the steps list?
|
||||||
|
- Does the incident header update mid-session as the AI infers context?
|
||||||
|
- Drag the handle, refresh: does the split restore correctly?
|
||||||
|
|
||||||
|
### Functional Regression
|
||||||
|
- Free-text chat, image paste, suggested flows, forks, branching: all work
|
||||||
|
- Session pause, resume, and handoff end-to-end: works
|
||||||
|
- ConcludeSessionModal resolves / escalates / parks correctly
|
||||||
|
- Handoff draft streams and pre-fills the modal fields
|
||||||
|
- Manual header edit saves and persists across reload
|
||||||
|
|
||||||
|
### MSP Scenario Coverage (from docx)
|
||||||
|
Run end-to-end: single-user endpoint issue · M365/tenant-wide issue · network/VPN outage · escalation and resume after handoff.
|
||||||
|
|
||||||
|
### Backend Checks
|
||||||
|
```bash
|
||||||
|
# Migration
|
||||||
|
alembic upgrade head
|
||||||
|
|
||||||
|
# Verify new columns
|
||||||
|
psql -U postgres -d resolutionflow -c "\d ai_sessions" | grep -E "client_name|asset_name|issue_category|triage_hypothesis|evidence_items"
|
||||||
|
|
||||||
|
# Smoke test endpoints (with valid token)
|
||||||
|
curl -X PATCH .../ai-sessions/{id}/triage -d '{"client_name":"Test"}'
|
||||||
|
curl -X POST .../ai-sessions/{id}/handoff-draft # should stream JSON
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Critical Files
|
||||||
|
|
||||||
|
| File | Change |
|
||||||
|
|------|--------|
|
||||||
|
| `backend/app/models/ai_session.py` | Add 5 new columns |
|
||||||
|
| `backend/app/schemas/ai_session.py` | Add `TriageUpdate`, extend `QuestionItem`, update request/response schemas |
|
||||||
|
| `backend/app/api/endpoints/ai_sessions.py` | Add `PATCH /{id}/triage`, `POST /{id}/handoff-draft` |
|
||||||
|
| `backend/app/services/unified_chat_service.py` | Extract and return `triage_update` per AI response |
|
||||||
|
| `backend/app/services/resolution_output_generator.py` | Accept structured handoff fields in context builder |
|
||||||
|
| `backend/alembic/versions/NNN_add_triage_fields_to_ai_sessions.py` | Sequential migration (check `ls backend/alembic/versions/ \| sort \| tail -1` for NNN) |
|
||||||
|
| `frontend/src/pages/AssistantChatPage.tsx` | Full layout refactor — cockpit structure |
|
||||||
|
| `frontend/src/components/assistant/IncidentHeader.tsx` | New component |
|
||||||
|
| `frontend/src/components/assistant/StepsPanel.tsx` | Refactored from `TaskLane` |
|
||||||
|
| `frontend/src/components/assistant/FlowPilotAsks.tsx` | New component |
|
||||||
|
| `frontend/src/components/assistant/WhatWeKnow.tsx` | New component |
|
||||||
|
| `frontend/src/components/assistant/ConcludeSessionModal.tsx` | Redesigned |
|
||||||
|
| `frontend/src/api/aiSessions.ts` | Add `updateTriage()`, `getHandoffDraft()` |
|
||||||
|
| `frontend/src/types/ai-session.ts` | Add `TriageUpdate`, `TriageMeta`, `EvidenceItem`; extend `QuestionItem` |
|
||||||
Reference in New Issue
Block a user