docs: add MSP assistant harness cockpit design spec

Design spec for evolving /assistant into a live triage cockpit. Covers layout decisions (stacked zones, drag-resizable split), incident header (labelled fields, AI-inferred + editable), work zone (steps checklist + FlowPilot Asks + What We Know), conclude modal redesign, and all required backend changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 20:59:15 +00:00
parent f4143e52a1
commit 7998dd237d
1 changed files with 363 additions and 0 deletions
--- a/docs/cockpit/2026-04-01-msp-assistant-harness-design.md
+++ b/docs/cockpit/2026-04-01-msp-assistant-harness-design.md
@@ -0,0 +1,363 @@
+# MSP Assistant Harness — Design Spec
+**Date:** 2026-04-01
+**Status:** Draft — pending user review
+**Source:** MSP_Assistant_Harness_Implementation_Plan.docx (v2.0, March 2026) + brainstorming session
+
+---
+
+## Context
+
+The `/assistant` page currently works as a generic AI chat surface with a task lane side panel and a chat sidebar for session history. It functions well but reads as "AI chat with extras" rather than an MSP engineer's operational tool.
+
+The goal is to reframe the page as a **live triage cockpit** — the place where an engineer opens a ticket, works through it from intake to resolution, and closes with a structured handoff artifact. The underlying session, branching, and chat architecture is preserved. What changes is layout hierarchy, information density, field labelling, and the conclude output.
+
+Scope is broader than the original docx: includes all required backend changes to support the frontend properly.
+
+---
+
+## Design Decisions
+
+### 1. Overall Layout — Stacked Zones
+
+```
+┌─────────────────────────────────────────────┐
+│  Incident Header (labelled fields, 1 row)   │
+├────────────────────────┬────────────────────┤
+│                        │  FlowPilot Asks    │
+│  Steps Checklist       │  (quick replies)   │
+│  (left, ~55%)          ├────────────────────┤
+│                        │  What We Know      │
+│                        │  (evidence list)   │
+├────────────────────────┴────────────────────┤  ← drag handle
+│  Conversation Log (muted, darker bg)        │
+├─────────────────────────────────────────────┤
+│  Compose area                               │
+└─────────────────────────────────────────────┘
+```
+
+- Work zone (top) and conversation log (bottom) are **drag-resizable** via a handle
+- Default split: ~55% work zone, ~45% chat
+- Existing left sidebar (session history) unchanged
+- Compose area is always pinned to bottom, spans full width
+- `workZoneHeight` persisted to `localStorage` so split survives refresh
+
+### 2. Incident Header
+
+Single row with explicit micro-labels above each field:
+
+```
+CLIENT        DEVICE            CATEGORY          HYPOTHESIS
+Contoso Ltd ✏  jsmith-desktop ✏  DNS / Network ✏   Corrupted DNS cache on NIC ✏
+                                                              [CW #48291] [Resolve ▾] [⋯]
+```
+
+- Each field has its own `✏` icon (visible on hover) that opens an inline edit popover
+- Fields populate from: (a) intake form on session create, (b) AI-inferred updates mid-session via `triage_update`, (c) manual engineer edits via `PATCH /ai-sessions/{id}/triage`
+- PSA ticket number shown if linked; action buttons (Resolve, overflow menu) on the right
+- Empty fields show muted placeholder text — never blank
+
+### 3. Work Zone — Steps + FlowPilot Asks + What We Know
+
+**Left panel (~55%): ordered step checklist**
+- Steps displayed as a vertical list: `✓` completed, `→` active (blue border, white text), `○` pending
+- Active step is visually distinct
+- "Generate Script" CTA appears at the bottom when a script-generation step is active
+
+**Right panel (~45%): two stacked mini-panels**
+- **FlowPilot Asks** (top, amber label): current question from AI. When `options` are provided, renders as quick-reply buttons — clicking a button submits that answer as a chat message. When no `options`, renders a compact free-text input. Panel is empty/hidden when no pending question.
+- **What We Know** (bottom, muted label): running evidence list. Each entry: `✓ confirmed` / `✗ ruled out` / `? pending`. AI appends via `triage_update.evidence_items`; engineer can manually add or edit entries.
+
+### 4. Conversation Log Zone
+
+- Lives below the work zone, separated by a **drag handle**
+- Background: `#13151c` (one step darker than page) — visually recedes
+- Label: "CONVERSATION LOG" in muted colour (`text-muted`)
+- Messages are compact: `you:` / `fp:` prefixes instead of full name/avatar bubbles
+- Scrolls independently
+- Not collapsible — drag handle gives control
+
+### 5. Conclude / Handoff Modal (redesigned)
+
+On opening "Close Case":
+
+1. **Header**: "Close Case — [Client Name]" + outcome selector (Resolved / Escalated / Parked)
+2. **Structured fields** — pre-filled by streaming `/handoff-draft`, all editable:
+   - **Root Cause** (short text input)
+   - **Resolution** (what fixed it)
+   - **Steps Taken** (list, auto-populated from step checklist)
+   - **Recommendations** (next steps / preventive actions)
+3. **Output destinations** (checkboxes): Post to CW ticket note / Save to Knowledge Base / Send client summary
+4. **Confirm** button — triggers resolve/escalate/pause and passes structured fields into `ResolutionOutputGenerator`
+
+The existing `SessionResolutionOutput` model and `ResolutionOutputGenerator` service are reused. The `/handoff-draft` stream starts immediately on modal open — the engineer can begin editing while fields fill in.
+
+---
+
+## Backend Changes Required
+
+### 1. New AISession columns (Alembic migration)
+
+Add to `ai_sessions` table:
+
+| Column | Type | Purpose |
+|--------|------|---------|
+| `client_name` | `VARCHAR(255)` | MSP client name for incident header |
+| `asset_name` | `VARCHAR(255)` | Device / asset / user being worked on |
+| `issue_category` | `VARCHAR(100)` | Human-readable category (e.g. "DNS / Networking") |
+| `triage_hypothesis` | `TEXT` | Current working hypothesis — AI-updated + engineer-editable |
+| `evidence_items` | `JSONB` | "What We Know" list — persisted for session resume |
+
+`evidence_items` format: `[{ "text": str, "status": "confirmed" | "ruled_out" | "pending" }]`
+
+Note: `problem_domain` (existing) is an internal classifier slug. `issue_category` is the human-readable display label for the header. Both coexist.
+
+### 2. New PATCH endpoint — triage metadata
+
+```
+PATCH /ai-sessions/{session_id}/triage
+Auth: require_engineer_or_admin
+Body: { client_name?, asset_name?, issue_category?, triage_hypothesis?, evidence_items? }
+Response: { id, client_name, asset_name, issue_category, triage_hypothesis, evidence_items }
+```
+
+Used when the engineer edits any header field or evidence list manually.
+
+### 3. Updated schemas — TriageUpdate and QuestionItem.options
+
+**New `TriageUpdate` model** (returned in chat response when AI infers session context):
+
+```python
+class TriageUpdate(BaseModel):
+    client_name: str | None = None
+    asset_name: str | None = None
+    issue_category: str | None = None
+    triage_hypothesis: str | None = None
+    evidence_items: list[dict] | None = None  # appends to existing list
+```
+
+**Updated `ChatMessageResponse`:**
+```python
+class ChatMessageResponse(BaseModel):
+    # existing fields unchanged...
+    triage_update: TriageUpdate | None = None
+```
+
+**Updated `QuestionItem`** — add `options` for quick-reply buttons:
+```python
+class QuestionItem(BaseModel):
+    text: str
+    context: str = ""
+    options: list[str] | None = None  # quick-reply labels; null = free-text fallback
+```
+
+### 4. unified_chat_service.py — triage extraction
+
+After generating each AI response, run a lightweight extraction to populate `triage_update`. Implementation options (pick one during implementation):
+
+- **Option A (recommended):** Embed structured extraction in the system prompt using an `[TRIAGE_UPDATE]` marker, similar to existing `[QUESTIONS]` / `[ACTIONS]` markers. AI emits the block if it has new triage signals; service parses it.
+- **Option B:** Post-response extraction pass using a fast model (`claude-haiku-4-5`) with the last 3 messages as context.
+
+When `triage_update` contains non-null fields, the service auto-PATCHes the session record (so fields are persisted) AND returns `triage_update` in the response for the frontend to update the header immediately.
+
+### 5. New streaming endpoint — handoff draft
+
+```
+POST /ai-sessions/{session_id}/handoff-draft
+Auth: require_engineer_or_admin
+Response: StreamingResponse (text/event-stream)
+```
+
+Streams a structured handoff JSON object:
+```json
+{ "root_cause": "...", "resolution": "...", "steps_taken": ["..."], "recommendations": "..." }
+```
+
+Built from session context: `problem_summary`, `triage_hypothesis`, `evidence_items`, `conversation_messages` (last 20), step checklist from saved task lane state.
+
+### 6. Updated conclude schemas
+
+Add optional structured fields to `ResolveSessionRequest` and `EscalateSessionRequest`:
+
+```python
+root_cause: str | None = None
+steps_taken: list[str] | None = None
+recommendations: str | None = None
+```
+
+Pass these into `ResolutionOutputGenerator._build_session_context()` to enrich `psa_ticket_notes` and `client_summary` outputs.
+
+### 7. Session read endpoint — include new triage fields
+
+Ensure the session detail response (`GET /ai-sessions/{id}`) returns the new fields so the frontend can restore header state on session resume.
+
+---
+
+## Frontend Changes Required
+
+### 1. AssistantChatPage layout refactor
+
+Replace current layout (sidebar + chat column + TaskLane side panel) with the stacked cockpit layout described above.
+
+**New state:**
+- `triageMeta: TriageMeta` — `{ client_name, asset_name, issue_category, triage_hypothesis, evidence_items }`
+- `workZoneHeight: number` — persisted to `localStorage('rf-assistant-work-zone-height')`
+
+**On session load / resume:** populate `triageMeta` from the session response (new fields).
+
+**On AI response:** if `response.triage_update` is non-null, merge into `triageMeta` (partial update, preserve existing non-null values unless AI overwrites).
+
+### 2. New component: `IncidentHeader`
+
+```
+frontend/src/components/assistant/IncidentHeader.tsx
+```
+
+Props: `triageMeta`, `psaTicketId`, `sessionId`, `onFieldSave(field, value)`, `onResolve`, `onOverflow`
+
+- Renders labelled single-row header
+- Each field: micro-label + value + `✏` icon (visible on hover)
+- `✏` opens an `EditPopover` (small popover with text input + Save/Cancel)
+- On Save: calls `aiSessionsApi.updateTriage(sessionId, { [field]: value })`
+- Empty field shows muted placeholder (e.g. "Unknown client")
+
+### 3. Refactored component: `StepsPanel` (from TaskLane)
+
+```
+frontend/src/components/assistant/StepsPanel.tsx
+```
+
+Same `activeActions` data source. Renders as ordered checklist:
+- Completed: `✓` + strikethrough label, muted
+- Active: `→` + blue left border, white text, full opacity
+- Pending: `○` + muted text
+
+Script generation CTA: shown at bottom when the active step has `command` containing "script" or when AI has flagged it.
+
+### 4. New component: `FlowPilotAsks`
+
+```
+frontend/src/components/assistant/FlowPilotAsks.tsx
+```
+
+Props: `questions: QuestionItem[]`, `onAnswer(answer: string)`
+
+- Shows first unanswered question (or empty/hidden state if none)
+- When `question.options` is non-null: renders as button row, clicking calls `onAnswer(option)`
+- When `question.options` is null: renders compact text input with Send button
+- `onAnswer` calls `handleSend` in the parent page with the answer text
+
+### 5. New component: `WhatWeKnow`
+
+```
+frontend/src/components/assistant/WhatWeKnow.tsx
+```
+
+Props: `items: EvidenceItem[]`, `onAdd(text, status)`, `onEdit(index, text, status)`
+
+- Renders evidence list with status icons: `✓` (confirmed, green), `✗` (ruled out, red), `?` (pending, muted)
+- "+ Add finding" link at bottom opens an inline input row
+- Items are editable inline (click to edit)
+- State lives in `AssistantChatPage` as part of `triageMeta.evidence_items`, synced to backend via `PATCH /triage`
+
+### 6. Drag handle — resizable split
+
+Implement as a thin handle bar between work zone and conversation log. On drag:
+- Update `workZoneHeight` in state
+- Persist to `localStorage`
+
+On mount: restore from `localStorage`, default to `55%` of available height.
+
+### 7. Compact conversation log
+
+Replace current `<ChatMessage>` bubble rendering in the log zone with a compact list:
+
+```
+you:  Can't resolve external DNS, internal fine
+fp:   Ping passed — layer 3 OK. Run nslookup google.com.
+you:  Timed out on 1.1.1.1 too.
+```
+
+`ChatMessage` component still used for rich rendering (suggested flows, forks) but in a more compact variant. Full bubble rendering available on hover/expand if needed.
+
+### 8. Redesigned `ConcludeSessionModal`
+
+Replaces current simple textarea with the structured handoff form. On open:
+1. Call `aiSessionsApi.getHandoffDraft(sessionId)` — streaming — populate fields as stream arrives
+2. Render outcome selector + 4 structured fields (all `<textarea>` with labels)
+3. Render output destination checkboxes
+4. On Confirm: call resolve/escalate/pause with enriched request body
+
+### 9. MSP-native language pass
+
+| Old | New |
+|-----|-----|
+| "AI Assistant" | "FlowPilot" |
+| "New Chat" | "New Case" |
+| "Messages" | "Conversation Log" |
+| "Task Lane" (panel header) | "Steps" |
+| "Conclude" | "Close Case" |
+| "Chat history" (sidebar label) | "Case History" |
+
+---
+
+## What This Is NOT
+
+- Not a redesign of the FlowPilot session page (`/pilot`) — separate page, untouched
+- Not a rebuild of session, branching, or PSA architecture
+- Not a new data model for conversations — `conversation_messages` JSONB is unchanged
+- Not a mobile-first redesign — mobile degrades cleanly but desktop is primary
+
+---
+
+## Verification
+
+### Harness Feel Test (primary — subjective)
+- Open `/assistant`, start a new case: does the page read as an MSP triage cockpit within 3 seconds without reading labels?
+- Is the current active step obvious without scrolling through chat?
+- Do FlowPilot Asks quick-reply buttons submit answers and update the steps list?
+- Does the incident header update mid-session as the AI infers context?
+- Drag the handle, refresh: does the split restore correctly?
+
+### Functional Regression
+- Free-text chat, image paste, suggested flows, forks, branching: all work
+- Session pause, resume, and handoff end-to-end: works
+- ConcludeSessionModal resolves / escalates / parks correctly
+- Handoff draft streams and pre-fills the modal fields
+- Manual header edit saves and persists across reload
+
+### MSP Scenario Coverage (from docx)
+Run end-to-end: single-user endpoint issue · M365/tenant-wide issue · network/VPN outage · escalation and resume after handoff.
+
+### Backend Checks
+```bash
+# Migration
+alembic upgrade head
+
+# Verify new columns
+psql -U postgres -d resolutionflow -c "\d ai_sessions" | grep -E "client_name|asset_name|issue_category|triage_hypothesis|evidence_items"
+
+# Smoke test endpoints (with valid token)
+curl -X PATCH .../ai-sessions/{id}/triage -d '{"client_name":"Test"}'
+curl -X POST .../ai-sessions/{id}/handoff-draft  # should stream JSON
+```
+
+---
+
+## Critical Files
+
+| File | Change |
+|------|--------|
+| `backend/app/models/ai_session.py` | Add 5 new columns |
+| `backend/app/schemas/ai_session.py` | Add `TriageUpdate`, extend `QuestionItem`, update request/response schemas |
+| `backend/app/api/endpoints/ai_sessions.py` | Add `PATCH /{id}/triage`, `POST /{id}/handoff-draft` |
+| `backend/app/services/unified_chat_service.py` | Extract and return `triage_update` per AI response |
+| `backend/app/services/resolution_output_generator.py` | Accept structured handoff fields in context builder |
+| `backend/alembic/versions/NNN_add_triage_fields_to_ai_sessions.py` | Sequential migration (check `ls backend/alembic/versions/ \| sort \| tail -1` for NNN) |
+| `frontend/src/pages/AssistantChatPage.tsx` | Full layout refactor — cockpit structure |
+| `frontend/src/components/assistant/IncidentHeader.tsx` | New component |
+| `frontend/src/components/assistant/StepsPanel.tsx` | Refactored from `TaskLane` |
+| `frontend/src/components/assistant/FlowPilotAsks.tsx` | New component |
+| `frontend/src/components/assistant/WhatWeKnow.tsx` | New component |
+| `frontend/src/components/assistant/ConcludeSessionModal.tsx` | Redesigned |
+| `frontend/src/api/aiSessions.ts` | Add `updateTriage()`, `getHandoffDraft()` |
+| `frontend/src/types/ai-session.ts` | Add `TriageUpdate`, `TriageMeta`, `EvidenceItem`; extend `QuestionItem` |