diff --git a/docs/cockpit/2026-04-01-msp-assistant-harness-plan-claude.md b/docs/cockpit/2026-04-01-msp-assistant-harness-plan-claude.md
index 101191ed..e00ca878 100644
--- a/docs/cockpit/2026-04-01-msp-assistant-harness-plan-claude.md
+++ b/docs/cockpit/2026-04-01-msp-assistant-harness-plan-claude.md
@@ -79,6 +79,108 @@ The brainstorming session (2026-04-01) locked these decisions. They are not open
 
 ---
 
+## Contract Decisions (Codex Readiness Review)
+
+The following decisions were flagged as ambiguous by the Codex readiness review. Each is now resolved.
+
+### RED — Canonical handoff artifact
+
+**Decision:** `ResolutionOutputGenerator` is the single canonical generator. Everything else is transport or UI.
+
+- `POST /handoff-draft` is a **preview** endpoint — streams a draft for the conclude modal UI. Does not persist. Does not generate final artifacts.
+- On confirm (resolve/escalate), the page calls the existing resolve/escalate endpoints, which trigger `ResolutionOutputGenerator.generate_all()` as today. The structured fields from the modal (`root_cause`, `steps_taken`, `recommendations`) are passed into `_build_session_context()` to enrich the final outputs.
+- `/documentation/stream` and `/status-update` remain untouched — they are separate transport channels for the same canonical outputs.
+- `/handoff-draft` is **assistant-only** in v1 (not shared with guided FlowPilot sessions on `/pilot`).
+
+### RED — AI vs manual field authority
+
+**Decision:** Manual edits win. AI does not overwrite manual edits.
+
+| Rule | Behavior |
+|------|----------|
+| AI auto-fill | Only fills fields that are currently `null` or empty. Never overwrites a non-null value. |
+| Manual edit | Persists immediately via `PATCH /triage`. Sets the field as "manually set." |
+| AI after manual edit | AI may **suggest** an update (shown as a subtle inline prompt: "FlowPilot suggests: Contoso Corp → Contoso Ltd"), but does not auto-write. |
+| Evidence items — AI | Appends new items only. Does not modify or remove existing items. |
+| Evidence items — engineer | Full authority: add, edit status, edit text, remove. |
+
+Implementation: add a `triage_manual_fields` set (stored in frontend `localStorage` per session) tracking which fields the engineer has manually edited. AI `triage_update` skips those fields unless the engineer explicitly accepts the suggestion.
+
+### RED — `evidence_items` write model
+
+**Decision:** Full-list replacement for all writes. Keep it simple.
+
+- `PATCH /triage` sends the complete `evidence_items` array. Backend replaces the stored array.
+- AI appends: frontend receives `triage_update.evidence_items`, appends to the current local list, then PATCHes the full merged list.
+- Engineer edits: frontend modifies the local list, PATCHes the full list.
+- No partial-update or append-only semantics on the backend. The frontend is the merge authority.
+
+### YELLOW — TaskLane persistence in StepsPanel
+
+**Decision:** `StepsPanel` is presentation only. All persistence behavior stays in `AssistantChatPage`.
+
+`TaskLane` currently owns sessionStorage drafts, debounced backend saves, and restoration. In the cockpit refactor:
+- `AssistantChatPage` lifts all persistence logic out of `TaskLane` into the page (or a custom hook like `useTaskPersistence`)
+- `StepsPanel` receives `activeActions` as a prop and renders them — no persistence responsibility
+- `TaskLane.tsx` remains in the codebase untouched (other pages may still use it)
+
+### YELLOW — Quick-reply submission semantics
+
+**Decision:** Quick replies are **immediate-send** controls.
+
+- Clicking a quick-reply button calls `handleSend(option)` — the answer goes directly to the AI as a chat message
+- No local-only "select then send" workflow
+- The answer appears in the conversation log as a regular `you:` message
+- This is a full-stack change: prompt instructions must tell the AI to include `options` on constrained questions, parser must extract them, schema must carry them, frontend must render and submit them
+
+### YELLOW — `issue_category` format
+
+**Decision:** Free text in v1. No controlled taxonomy.
+
+- AI infers a human-readable category string (e.g., "DNS / Networking", "Microsoft 365", "Active Directory")
+- Engineer can edit to any value via the header `✏` popover
+- Future: may introduce a taxonomy dropdown populated from session history — but not in v1
+
+### YELLOW — `asset_name` when user and device differ
+
+**Decision:** Free text. The engineer enters whatever is most operationally relevant.
+
+- Could be a device name ("jsmith-desktop-04"), a user ("John Smith"), or both ("jsmith-desktop-04 / John Smith")
+- AI infers from conversation context — typically the entity being troubleshot
+- No enforced format in v1
+
+### YELLOW — Structured conclude fields persistence
+
+**Decision:** Structured conclude fields (`root_cause`, `steps_taken`, `recommendations`) are **passed through to `ResolutionOutputGenerator`** but are NOT stored as separate session columns.
+
+- They arrive in the resolve/escalate request body
+- `_build_session_context()` uses them to generate richer PSA notes and client summaries
+- The generated outputs (stored in `session_resolution_outputs`) are the persisted artifacts
+- If we later need the raw structured fields, add columns then — not speculatively now
+
+### Fallback — `[TRIAGE_UPDATE]` unreliability
+
+**Decision:** If prompt-embedded extraction proves unreliable after testing against 5 real sessions:
+
+1. **First fallback:** Post-response extraction using `claude-haiku-4-5` with last 3 messages as context. Cheap, fast, decoupled from the main prompt.
+2. **Second fallback:** Fully manual header — engineer fills in fields, AI never auto-updates. Cockpit still works; it just requires more manual input.
+
+Gate: Phase 2 step 15 ("verify extraction in a live session") must pass before wiring `triage_update` into the visible header.
+
+---
+
+## Implementation Guardrails
+
+These are hard rules during implementation, not suggestions.
+
+1. **Do not let AI write speculative values into the header.** Every AI-inferred field must trace to ticket data or explicit conversation evidence. If the AI can't ground it, the field stays empty.
+2. **Do not redesign conclude UX until the canonical handoff source-of-truth is wired.** Phase 6 (conclude modal) depends on Phase 1 (backend) being stable.
+3. **Do not treat `TaskLane` as presentation-only until its persistence behavior has been lifted.** Extract persistence into a hook or the page before building `StepsPanel`.
+4. **Do not wire header auto-updates from `[TRIAGE_UPDATE]` until real-session reliability is tested.** Phase 2 step 15 is a gate.
+5. **Run `npx tsc -b` after every phase.** Do not batch TypeScript error fixes (lesson #92).
+
+---
+
 ## Non-Goals
 
 - No redesign of `/pilot` (FlowPilot session page) — separate page, untouched
@@ -369,10 +471,10 @@ interface QuestionItem {
 
 ### Phase 2 — Triage Extraction (AI layer)
 11. Add `[TRIAGE_UPDATE]` marker to `unified_chat_service.py` system prompt
-12. Implement `_parse_triage_update_marker()` in the service
-13. Auto-PATCH session on non-null `triage_update`
+12. Implement `_parse_triage_update_marker()` in the service (follow existing `_parse_questions_marker` / `_parse_actions_marker` pattern)
+13. Auto-PATCH session on non-null `triage_update` (respect manual-edit authority: skip fields in `triage_manual_fields`)
 14. Add `options` generation instructions to `[QUESTIONS]` system prompt section
-15. Verify extraction in a live session
+15. **GATE:** Verify extraction in 5 real sessions. If `[TRIAGE_UPDATE]` is emitted reliably (≥4/5), proceed. Otherwise switch to Haiku post-response fallback before wiring into the header.
 
 ### Phase 3 — New Frontend Types + API
 16. Add `TriageMeta`, `EvidenceItem`, `TriageUpdate` to `types/ai-session.ts`
@@ -380,30 +482,31 @@ interface QuestionItem {
 18. Add `updateTriage()` and `getHandoffDraft()` to `aiSessions.ts`
 
 ### Phase 4 — New Work Zone Components
-19. Build `IncidentHeader.tsx` with `EditPopover`
-20. Build `StepsPanel.tsx`
-21. Build `FlowPilotAsks.tsx`
-22. Build `WhatWeKnow.tsx`
+19. Extract `TaskLane` persistence logic into `useTaskPersistence` hook (sessionStorage drafts, debounced saves, restoration) — prerequisite for StepsPanel
+20. Build `IncidentHeader.tsx` with `EditPopover`
+21. Build `StepsPanel.tsx` (presentation only — receives props from hook)
+22. Build `FlowPilotAsks.tsx`
+23. Build `WhatWeKnow.tsx`
 
 ### Phase 5 — Page Layout Refactor
-23. Refactor `AssistantChatPage.tsx` — implement stacked cockpit layout
-24. Wire `triageMeta` state, session load population, `triage_update` merge
-25. Implement drag-resizable split with `localStorage` persistence
-26. Compact `ConversationLog` rendering
+24. Refactor `AssistantChatPage.tsx` — implement stacked cockpit layout
+25. Wire `triageMeta` state, session load population, `triage_update` merge (with `triage_manual_fields` guard)
+26. Implement drag-resizable split with `localStorage` persistence
+27. Compact `ConversationLog` rendering (with click-to-expand for long messages)
 
 ### Phase 6 — Handoff Modal + Language Pass + Sidebar
-27. Redesign `ConcludeSessionModal.tsx` — structured handoff form
-28. Sidebar visual demotion — background, label prominence, default-collapsed
-29. MSP-native language pass across all assistant components
-30. Update `` title
+28. Redesign `ConcludeSessionModal.tsx` — structured handoff form (calls `/handoff-draft` for preview, confirms via existing resolve/escalate endpoints which trigger `ResolutionOutputGenerator`)
+29. Sidebar visual demotion — background, label prominence, default-collapsed
+30. MSP-native language pass across all assistant components
+31. Update `` title
 
 ### Phase 7 — QA + Hardening
-31. `npx tsc -b` — fix any TypeScript errors
-32. `npm run build` — production build clean
-33. Functional regression: all chat flows, session switching, conclude/resume
-34. Harness feel test: cockpit within 3 seconds?
-35. Mobile viewport check
-36. Stress test: 50+ messages, 10+ steps, long outputs
+32. `npx tsc -b` — fix any TypeScript errors
+33. `npm run build` — production build clean
+34. Functional regression: all chat flows, session switching, conclude/resume
+35. Harness feel test: cockpit within 3 seconds?
+36. Mobile viewport check
+37. Stress test: 50+ messages, 10+ steps, long outputs
 
 ---
 
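
Review note: the field-authority table and the `evidence_items` write model added in this diff together describe a frontend merge step. The sketch below is one possible reading of those rules, not the plan's actual implementation. The type shapes, the `client_name` field, the `status` values, and the `mergeTriageUpdate` helper are all hypothetical; the real `TriageMeta`, `EvidenceItem`, and `TriageUpdate` types in `types/ai-session.ts` may differ. It also assumes that an AI proposal for a field that already holds a value is surfaced as a suggestion, which the plan only states explicitly for manually edited fields.

```typescript
// Hypothetical shapes for illustration only; the real types live in
// types/ai-session.ts and may differ.
interface EvidenceItem {
  text: string;
  status: string; // assumed free-form status label
}

interface TriageMeta {
  client_name: string | null; // hypothetical field name
  asset_name: string | null;
  issue_category: string | null;
  evidence_items: EvidenceItem[];
}

type ScalarField = "client_name" | "asset_name" | "issue_category";

interface TriageUpdate {
  client_name?: string | null;
  asset_name?: string | null;
  issue_category?: string | null;
  evidence_items?: EvidenceItem[];
}

interface MergeResult {
  // Full state to send via PATCH /triage (full-list replacement).
  next: TriageMeta;
  // Proposals that were NOT written; shown inline for the engineer to accept.
  suggestions: Partial<Record<ScalarField, string>>;
}

const SCALAR_FIELDS: ScalarField[] = ["client_name", "asset_name", "issue_category"];

// Hypothetical helper: applies an AI triage_update without overriding the engineer.
function mergeTriageUpdate(
  current: TriageMeta,
  update: TriageUpdate,
  manualFields: ReadonlySet<string>, // mirrors the triage_manual_fields set
): MergeResult {
  // Copy so the caller's state is never mutated.
  const next: TriageMeta = { ...current, evidence_items: [...current.evidence_items] };
  const suggestions: Partial<Record<ScalarField, string>> = {};

  for (const field of SCALAR_FIELDS) {
    const proposed = update[field];
    if (proposed == null || proposed === "") continue; // AI proposed nothing

    const existing = current[field];
    const isEmpty = existing === null || existing === "";

    if (isEmpty && !manualFields.has(field)) {
      next[field] = proposed; // auto-fill: field is empty and untouched
    } else if (proposed !== existing) {
      suggestions[field] = proposed; // existing or manual value wins
    }
  }

  // AI may only append evidence; existing items are left as they are.
  if (update.evidence_items) {
    next.evidence_items.push(...update.evidence_items);
  }
  return { next, suggestions };
}
```

Under these assumptions, the page would call `mergeTriageUpdate` on each incoming `triage_update`, PATCH `next` in full, and render `suggestions` as the inline "FlowPilot suggests" prompt. Keeping the function pure makes the manual-edit guard easy to unit test before Phase 5 wires it into `AssistantChatPage`.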