docs: add MSP assistant harness cockpit design spec

Design spec for evolving /assistant into a live triage cockpit.
Covers layout decisions (stacked zones, drag-resizable split),
incident header (labelled fields, AI-inferred + editable),
work zone (steps checklist + FlowPilot Asks + What We Know),
conclude modal redesign, and all required backend changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
chihlasm
2026-04-01 20:59:15 +00:00
parent f4143e52a1
commit 7998dd237d


@@ -0,0 +1,363 @@
# MSP Assistant Harness — Design Spec
**Date:** 2026-04-01
**Status:** Draft — pending user review
**Source:** MSP_Assistant_Harness_Implementation_Plan.docx (v2.0, March 2026) + brainstorming session
---
## Context
The `/assistant` page currently works as a generic AI chat surface with a task lane side panel and a chat sidebar for session history. It functions well but reads as "AI chat with extras" rather than an MSP engineer's operational tool.
The goal is to reframe the page as a **live triage cockpit** — the place where an engineer opens a ticket, works through it from intake to resolution, and closes with a structured handoff artifact. The underlying session, branching, and chat architecture is preserved. What changes is layout hierarchy, information density, field labelling, and the conclude output.
Scope is broader than the original docx: it also covers every backend change needed to support the frontend properly.
---
## Design Decisions
### 1. Overall Layout — Stacked Zones
```
┌─────────────────────────────────────────────┐
│ Incident Header (labelled fields, 1 row) │
├────────────────────────┬────────────────────┤
│ │ FlowPilot Asks │
│ Steps Checklist │ (quick replies) │
│ (left, ~55%) ├────────────────────┤
│ │ What We Know │
│ │ (evidence list) │
├────────────────────────┴────────────────────┤ ← drag handle
│ Conversation Log (muted, darker bg) │
├─────────────────────────────────────────────┤
│ Compose area │
└─────────────────────────────────────────────┘
```
- Work zone (top) and conversation log (bottom) are **drag-resizable** via a handle
- Default split: ~55% work zone, ~45% chat
- Existing left sidebar (session history) unchanged
- Compose area is always pinned to bottom, spans full width
- `workZoneHeight` persisted to `localStorage` so split survives refresh
### 2. Incident Header
Single row with explicit micro-labels above each field:
```
CLIENT DEVICE CATEGORY HYPOTHESIS
Contoso Ltd ✏ jsmith-desktop ✏ DNS / Network ✏ Corrupted DNS cache on NIC ✏
[CW #48291] [Resolve ▾] [⋯]
```
- Each field has its own `✏` icon (visible on hover) that opens an inline edit popover
- Fields populate from: (a) intake form on session create, (b) AI-inferred updates mid-session via `triage_update`, (c) manual engineer edits via `PATCH /ai-sessions/{id}/triage`
- PSA ticket number shown if linked; action buttons (Resolve, overflow menu) on the right
- Empty fields show muted placeholder text — never blank
### 3. Work Zone — Steps + FlowPilot Asks + What We Know
**Left panel (~55%): ordered step checklist**
- Steps displayed as a vertical list: `✓` completed, `→` active (blue border, white text), `○` pending
- Active step is visually distinct
- "Generate Script" CTA appears at the bottom when a script-generation step is active
**Right panel (~45%): two stacked mini-panels**
- **FlowPilot Asks** (top, amber label): current question from AI. When `options` are provided, renders as quick-reply buttons — clicking a button submits that answer as a chat message. When no `options`, renders a compact free-text input. Panel is empty/hidden when no pending question.
- **What We Know** (bottom, muted label): running evidence list. Each entry: `✓ confirmed` / `✗ ruled out` / `? pending`. AI appends via `triage_update.evidence_items`; engineer can manually add or edit entries.
### 4. Conversation Log Zone
- Lives below the work zone, separated by a **drag handle**
- Background: `#13151c` (one step darker than page) — visually recedes
- Label: "CONVERSATION LOG" in muted colour (`text-muted`)
- Messages are compact: `you:` / `fp:` prefixes instead of full name/avatar bubbles
- Scrolls independently
- Not collapsible — drag handle gives control
### 5. Conclude / Handoff Modal (redesigned)
On opening "Close Case":
1. **Header**: "Close Case — [Client Name]" + outcome selector (Resolved / Escalated / Parked)
2. **Structured fields** — pre-filled by streaming `/handoff-draft`, all editable:
- **Root Cause** (short text input)
- **Resolution** (what fixed it)
- **Steps Taken** (list, auto-populated from step checklist)
- **Recommendations** (next steps / preventive actions)
3. **Output destinations** (checkboxes): Post to CW ticket note / Save to Knowledge Base / Send client summary
4. **Confirm** button — triggers resolve/escalate/pause and passes structured fields into `ResolutionOutputGenerator`
The existing `SessionResolutionOutput` model and `ResolutionOutputGenerator` service are reused. The `/handoff-draft` stream starts immediately on modal open — the engineer can begin editing while fields fill in.
---
## Backend Changes Required
### 1. New AISession columns (Alembic migration)
Add to `ai_sessions` table:
| Column | Type | Purpose |
|--------|------|---------|
| `client_name` | `VARCHAR(255)` | MSP client name for incident header |
| `asset_name` | `VARCHAR(255)` | Device / asset / user being worked on |
| `issue_category` | `VARCHAR(100)` | Human-readable category (e.g. "DNS / Network") |
| `triage_hypothesis` | `TEXT` | Current working hypothesis — AI-updated + engineer-editable |
| `evidence_items` | `JSONB` | "What We Know" list — persisted for session resume |
`evidence_items` format: `[{ "text": str, "status": "confirmed" | "ruled_out" | "pending" }]`
Note: `problem_domain` (existing) is an internal classifier slug. `issue_category` is the human-readable display label for the header. Both coexist.
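The `evidence_items` format above can be checked before it is written to the JSONB column. The following is a minimal sketch using only the standard library; the function name `validate_evidence_items` and the choice to trim whitespace are assumptions made for illustration, and the real schema layer may well express the same rule as a Pydantic model instead.

```python
from typing import Any

# Status values taken from the evidence_items format described above.
ALLOWED_STATUSES = {"confirmed", "ruled_out", "pending"}


def validate_evidence_items(items: Any) -> list[dict[str, str]]:
    """Return a cleaned copy of `items`, or raise ValueError on a bad entry.

    Hypothetical helper: not part of the existing codebase.
    """
    if not isinstance(items, list):
        raise ValueError("evidence_items must be a list")
    cleaned = []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} must be an object")
        text = item.get("text")
        status = item.get("status")
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"item {index} needs non-empty text")
        if not isinstance(status, str) or status not in ALLOWED_STATUSES:
            raise ValueError(f"item {index} has unknown status {status!r}")
        # Keep only the two documented keys; extra keys are dropped.
        cleaned.append({"text": text.strip(), "status": status})
    return cleaned
```

Rejecting unknown statuses at write time keeps the "What We Know" panel from having to render a state it has no icon for.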
### 2. New PATCH endpoint — triage metadata
```
PATCH /ai-sessions/{session_id}/triage
Auth: require_engineer_or_admin
Body: { client_name?, asset_name?, issue_category?, triage_hypothesis?, evidence_items? }
Response: { id, client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
```
Used when the engineer edits any header field or evidence list manually.
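Because every body field is optional, the handler has to distinguish "key omitted" from "key sent". A minimal sketch of that partial-update rule follows, with the session modelled as a plain dict. The helper name `apply_triage_patch` is hypothetical, and two behaviours are assumptions rather than decisions from this spec: an explicit `null` clears the field, and a manually sent `evidence_items` list replaces the stored list.

```python
from typing import Any

# Fields the PATCH body may carry, per the endpoint description above.
TRIAGE_FIELDS = (
    "client_name",
    "asset_name",
    "issue_category",
    "triage_hypothesis",
    "evidence_items",
)


def apply_triage_patch(session: dict[str, Any], body: dict[str, Any]) -> dict[str, Any]:
    """Copy only the keys present in `body` onto `session`.

    Omitted keys leave the stored value untouched. Returns a dict shaped
    like the documented response. Hypothetical helper for illustration.
    """
    unknown = set(body) - set(TRIAGE_FIELDS)
    if unknown:
        raise ValueError(f"unknown triage fields: {sorted(unknown)}")
    for field in TRIAGE_FIELDS:
        if field in body:
            session[field] = body[field]
    return {"id": session["id"], **{f: session.get(f) for f in TRIAGE_FIELDS}}
```

If the request body is a Pydantic v2 model, `model_dump(exclude_unset=True)` yields the same "only the keys the client sent" dictionary that this sketch takes as `body`.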
### 3. Updated schemas — TriageUpdate and QuestionItem.options
**New `TriageUpdate` model** (returned in chat response when AI infers session context):
```python
from pydantic import BaseModel


class TriageUpdate(BaseModel):
    client_name: str | None = None
    asset_name: str | None = None
    issue_category: str | None = None
    triage_hypothesis: str | None = None
    evidence_items: list[dict] | None = None  # appended to the session's existing list
```
**Updated `ChatMessageResponse`:**
```python
class ChatMessageResponse(BaseModel):
    # existing fields unchanged...
    triage_update: TriageUpdate | None = None
```
**Updated `QuestionItem`** — add `options` for quick-reply buttons:
```python
class QuestionItem(BaseModel):
    text: str
    context: str = ""
    options: list[str] | None = None  # quick-reply labels; null = free-text fallback
```
### 4. unified_chat_service.py — triage extraction
After generating each AI response, run a lightweight extraction to populate `triage_update`. Implementation options (pick one during implementation):
- **Option A (recommended):** Embed structured extraction in the system prompt using a `[TRIAGE_UPDATE]` marker, similar to the existing `[QUESTIONS]` / `[ACTIONS]` markers. The AI emits the block when it has new triage signals, and the service parses it.
- **Option B:** Post-response extraction pass using a fast model (`claude-haiku-4-5`) with the last 3 messages as context.
When `triage_update` contains non-null fields, the service auto-PATCHes the session record (so fields are persisted) AND returns `triage_update` in the response for the frontend to update the header immediately.
### 5. New streaming endpoint — handoff draft
```
POST /ai-sessions/{session_id}/handoff-draft
Auth: require_engineer_or_admin
Response: StreamingResponse (text/event-stream)
```
Streams a structured handoff JSON object:
```json
{ "root_cause": "...", "resolution": "...", "steps_taken": ["..."], "recommendations": "..." }
```
Built from session context: `problem_summary`, `triage_hypothesis`, `evidence_items`, `conversation_messages` (last 20), step checklist from saved task lane state.
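How the JSON object is split across events is not specified here. One possible shape, offered only as a sketch, is one Server-Sent Events frame per field followed by a final `done` event. The event names `field` and `done`, the payload keys, and the generator name `handoff_draft_events` are all assumptions for illustration.

```python
import json
from collections.abc import Iterator

# Field order matches the handoff JSON object shown above.
HANDOFF_FIELDS = ("root_cause", "resolution", "steps_taken", "recommendations")


def handoff_draft_events(draft: dict) -> Iterator[str]:
    """Yield one SSE frame per available handoff field, then a 'done' frame.

    Each frame ends with a blank line, which is how SSE delimits events.
    json.dumps escapes newlines, so every payload stays on one data line.
    """
    for field in HANDOFF_FIELDS:
        if field in draft:
            payload = json.dumps({"field": field, "value": draft[field]})
            yield f"event: field\ndata: {payload}\n\n"
    yield "event: done\ndata: {}\n\n"
```

A generator like this can be handed to a `StreamingResponse` with `media_type="text/event-stream"`. Emitting per-field frames would let the modal fill each input as soon as its value is ready instead of waiting for the whole object.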
### 6. Updated conclude schemas
Add optional structured fields to `ResolveSessionRequest` and `EscalateSessionRequest`:
```python
root_cause: str | None = None
steps_taken: list[str] | None = None
recommendations: str | None = None
```
Pass these into `ResolutionOutputGenerator._build_session_context()` to enrich `psa_ticket_notes` and `client_summary` outputs.
### 7. Session read endpoint — include new triage fields
Ensure the session detail response (`GET /ai-sessions/{id}`) returns the new fields so the frontend can restore header state on session resume.
---
## Frontend Changes Required
### 1. AssistantChatPage layout refactor
Replace current layout (sidebar + chat column + TaskLane side panel) with the stacked cockpit layout described above.
**New state:**
- `triageMeta: TriageMeta`, shaped as `{ client_name, asset_name, issue_category, triage_hypothesis, evidence_items }`
- `workZoneHeight: number` — persisted to `localStorage('rf-assistant-work-zone-height')`
**On session load / resume:** populate `triageMeta` from the session response (new fields).
**On AI response:** if `response.triage_update` is non-null, merge into `triageMeta` (partial update, preserve existing non-null values unless AI overwrites).
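The merge rule can be stated precisely in a few lines. The sketch below is written in Python to match the backend snippets in this spec; the page itself would carry the same logic in TypeScript. The function name `merge_triage_update` is hypothetical, and treating `evidence_items` as an append follows the `TriageUpdate` comment in the backend section.

```python
from typing import Any


def merge_triage_update(meta: dict[str, Any], update: dict[str, Any]) -> dict[str, Any]:
    """Return a new triage dict without mutating `meta`.

    Non-null scalar fields in `update` overwrite the current value, null
    fields are ignored, and evidence items are appended.
    """
    merged = dict(meta)
    for key, value in update.items():
        if value is None:
            continue  # null means "no new signal", not "clear the field"
        if key == "evidence_items":
            merged[key] = list(meta.get(key) or []) + list(value)
        else:
            merged[key] = value
    return merged
```

Returning a fresh object rather than mutating in place suits React-style state updates, where a new reference is what triggers a re-render of the header.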
### 2. New component: `IncidentHeader`
```
frontend/src/components/assistant/IncidentHeader.tsx
```
Props: `triageMeta`, `psaTicketId`, `sessionId`, `onFieldSave(field, value)`, `onResolve`, `onOverflow`
- Renders labelled single-row header
- Each field: micro-label + value + `✏` icon (visible on hover)
- `✏` opens an `EditPopover` (small popover with text input + Save/Cancel)
- On Save: calls `aiSessionsApi.updateTriage(sessionId, { [field]: value })`
- Empty field shows muted placeholder (e.g. "Unknown client")
### 3. Refactored component: `StepsPanel` (from TaskLane)
```
frontend/src/components/assistant/StepsPanel.tsx
```
Same `activeActions` data source. Renders as ordered checklist:
- Completed: `✓` + strikethrough label, muted
- Active: `→` + blue left border, white text, full opacity
- Pending: `○` + muted text
Script generation CTA: shown at bottom when the active step has `command` containing "script" or when AI has flagged it.
### 4. New component: `FlowPilotAsks`
```
frontend/src/components/assistant/FlowPilotAsks.tsx
```
Props: `questions: QuestionItem[]`, `onAnswer(answer: string)`
- Shows first unanswered question (or empty/hidden state if none)
- When `question.options` is non-null: renders as button row, clicking calls `onAnswer(option)`
- When `question.options` is null: renders compact text input with Send button
- `onAnswer` calls `handleSend` in the parent page with the answer text
### 5. New component: `WhatWeKnow`
```
frontend/src/components/assistant/WhatWeKnow.tsx
```
Props: `items: EvidenceItem[]`, `onAdd(text, status)`, `onEdit(index, text, status)`
- Renders evidence list with status icons: `✓` (confirmed, green), `✗` (ruled out, red), `?` (pending, muted)
- "+ Add finding" link at bottom opens an inline input row
- Items are editable inline (click to edit)
- State lives in `AssistantChatPage` as part of `triageMeta.evidence_items`, synced to backend via `PATCH /triage`
### 6. Drag handle — resizable split
Implement as a thin handle bar between work zone and conversation log. On drag:
- Update `workZoneHeight` in state
- Persist to `localStorage`
On mount: restore from `localStorage`, default to `55%` of available height.
### 7. Compact conversation log
Replace current `<ChatMessage>` bubble rendering in the log zone with a compact list:
```
you: Can't resolve external DNS, internal fine
fp: Ping passed — layer 3 OK. Run nslookup google.com.
you: Timed out on 1.1.1.1 too.
```
`ChatMessage` component still used for rich rendering (suggested flows, forks) but in a more compact variant. Full bubble rendering available on hover/expand if needed.
### 8. Redesigned `ConcludeSessionModal`
Replaces current simple textarea with the structured handoff form. On open:
1. Call `aiSessionsApi.getHandoffDraft(sessionId)` — streaming — populate fields as stream arrives
2. Render outcome selector + 4 structured fields (all `<textarea>` with labels)
3. Render output destination checkboxes
4. On Confirm: call resolve/escalate/pause with enriched request body
### 9. MSP-native language pass
| Old | New |
|-----|-----|
| "AI Assistant" | "FlowPilot" |
| "New Chat" | "New Case" |
| "Messages" | "Conversation Log" |
| "Task Lane" (panel header) | "Steps" |
| "Conclude" | "Close Case" |
| "Chat history" (sidebar label) | "Case History" |
---
## What This Is NOT
- Not a redesign of the FlowPilot session page (`/pilot`) — separate page, untouched
- Not a rebuild of session, branching, or PSA architecture
- Not a new data model for conversations — `conversation_messages` JSONB is unchanged
- Not a mobile-first redesign — mobile degrades cleanly but desktop is primary
---
## Verification
### Harness Feel Test (primary — subjective)
- Open `/assistant`, start a new case: does the page read as an MSP triage cockpit within 3 seconds without reading labels?
- Is the current active step obvious without scrolling through chat?
- Do FlowPilot Asks quick-reply buttons submit answers and update the steps list?
- Does the incident header update mid-session as the AI infers context?
- Drag the handle, refresh: does the split restore correctly?
### Functional Regression
- Free-text chat, image paste, suggested flows, forks, branching: all work
- Session pause, resume, and handoff end-to-end: works
- ConcludeSessionModal resolves / escalates / parks correctly
- Handoff draft streams and pre-fills the modal fields
- Manual header edit saves and persists across reload
### MSP Scenario Coverage (from docx)
Run end-to-end: single-user endpoint issue · M365/tenant-wide issue · network/VPN outage · escalation and resume after handoff.
### Backend Checks
```bash
# Migration
alembic upgrade head
# Verify new columns
psql -U postgres -d resolutionflow -c "\d ai_sessions" | grep -E "client_name|asset_name|issue_category|triage_hypothesis|evidence_items"
# Smoke test endpoints (with valid token)
curl -X PATCH .../ai-sessions/{id}/triage -H "Content-Type: application/json" -d '{"client_name":"Test"}'
curl -N -X POST .../ai-sessions/{id}/handoff-draft # should stream JSON; -N disables curl's output buffering
```
---
## Critical Files
| File | Change |
|------|--------|
| `backend/app/models/ai_session.py` | Add 5 new columns |
| `backend/app/schemas/ai_session.py` | Add `TriageUpdate`, extend `QuestionItem`, update request/response schemas |
| `backend/app/api/endpoints/ai_sessions.py` | Add `PATCH /{id}/triage`, `POST /{id}/handoff-draft` |
| `backend/app/services/unified_chat_service.py` | Extract and return `triage_update` per AI response |
| `backend/app/services/resolution_output_generator.py` | Accept structured handoff fields in context builder |
| `backend/alembic/versions/NNN_add_triage_fields_to_ai_sessions.py` | Sequential migration (check `ls backend/alembic/versions/ \| sort \| tail -1` for NNN) |
| `frontend/src/pages/AssistantChatPage.tsx` | Full layout refactor — cockpit structure |
| `frontend/src/components/assistant/IncidentHeader.tsx` | New component |
| `frontend/src/components/assistant/StepsPanel.tsx` | Refactored from `TaskLane` |
| `frontend/src/components/assistant/FlowPilotAsks.tsx` | New component |
| `frontend/src/components/assistant/WhatWeKnow.tsx` | New component |
| `frontend/src/components/assistant/ConcludeSessionModal.tsx` | Redesigned |
| `frontend/src/api/aiSessions.ts` | Add `updateTriage()`, `getHandoffDraft()` |
| `frontend/src/types/ai-session.ts` | Add `TriageUpdate`, `TriageMeta`, `EvidenceItem`; extend `QuestionItem` |