Files
resolutionflow/backend/app/services/assistant_chat_service.py
Michael Chihlas 66e592096c feat(pilot): Phase 3 — Suggested fix tracking + Resolve preview with state_version cache
Adds the AI-proposed resolution path and the inline preview of the
markdown that will be posted to the customer ticket on Resolve. The
preview is keyed on (session_id, ai_sessions.state_version) so back-to-
back fetches against unchanged state hit an in-process cache instead
of paying for a Sonnet call.

Backend:
- preview_cache: in-process LRU keyed on (kind, session_id, state_version).
  No TTL — state_version is the source of truth. Soft-cap 5000 entries.
- unified_chat_service: [SUGGEST_FIX] parser (last-block-wins, JSON
  payload, confidence clamped 0-100), supersession persistence (sets
  superseded_at on prior active row), atomic state_version bump.
- ResolutionNoteGeneratorService: pulls session, facts, active fix, and
  redacted script_generations into a structured input bundle for Sonnet;
  produces the four-section markdown (Problem / What we confirmed /
  Root cause / Resolution). Sensitive script parameters redacted via
  ScriptTemplateEngine.redact_sensitive driven by the template's
  parameters_schema.
- /api/v1/ai-sessions/{id}/suggested-fixes/active — 200 with the active
  fix or 404.
- /api/v1/ai-sessions/{id}/suggested-fixes/{fix_id}/decision — records
  one_off / draft_template / build_template / dismissed; dismiss
  supersedes; bumps state_version. 409 on dismissing an already-
  superseded fix.
- /api/v1/ai-sessions/{id}/resolution-note/preview — generates or returns
  cached markdown; from_cache flag in payload signals cache hit.
- scripts.py POST /generate now bumps state_version on the linked
  ai_session_id when present (third source of preview-cache invalidation
  per Section 5.5).
- ASSISTANT_SYSTEM_PROMPT documents [SUGGEST_FIX] (when to/not to emit,
  format, supersession semantics).
- 12 tests covering the parser (well-formed, last-wins, malformed,
  confidence clamping), supersession + state_version invariant, all
  decision branches, preview cache hit-on-no-change + miss-after-write.

Frontend:
- src/components/pilot/sections/SuggestedFix.tsx — amber-accented card
  with confidence badge; dismiss action wired to the decision endpoint.
- src/components/pilot/ResolutionNotePreview.tsx — popover with refresh,
  loading state, cached/fresh indicator, ticket-ref display.
- src/api/sessionSuggestedFixes.ts — typed client; getActive normalizes
  404 to null so callers don't have to special-case.
- TaskLane gains suggestedFixSlot + bottomSlot props (rendered after
  Diagnostic Checks; bottomSlot anchors the Resolve action).
- AssistantChatPage: refreshSessionDerived helper batches fact + fix
  refresh; fact mutations and chat sends both schedule a 500ms-debounced
  preview refresh per the Section 5.5 spec.

Verified end-to-end against the dev stack with a real Sonnet call:
- /active 404 → fact create → preview generates four-section markdown
  grounded only in provided facts → second preview call hits cache
  (from_cache=true, no LLM call) → fact write 2 → cache miss, regenerates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:45:52 -04:00

501 lines
22 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
"""Shared AI chat infrastructure — system prompt, prompt caching, and AI calling.
Used by unified_chat_service (the active chat backend). The assistant_chat
CRUD endpoints were removed — only retention settings remain on that router.
Uses Anthropic prompt caching to reduce cost on multi-turn conversations:
- The static system prompt is cached (ephemeral, 5-min TTL)
- The conversation history prefix is cached via a breakpoint on the
last existing message before the new user input
Optionally connects to Microsoft Learn via Anthropic's MCP connector
for real-time documentation lookups (controlled by ENABLE_MCP_MICROSOFT_LEARN).
## Architectural note — this module is the one MCP/beta chat caller
`chat_call_cached` below is the ONLY caller in the codebase that uses
Anthropic's `client.beta.messages.create` endpoint, MCP servers, multimodal
user messages, and the retry-without-MCP fallback. It is deliberately NOT
routed through `AnthropicProvider` — MCP/beta/images are features of exactly
one optional Anthropic beta endpoint and do not belong in a provider-agnostic
abstraction that also serves Gemini.
If a new caller needs the same (MCP, beta, images, history caching) bundle,
call `chat_call_cached` directly rather than pushing those concerns into
`AnthropicProvider`. Cached-system-block plumbing is shared with the provider
via `_normalize_system_for_anthropic` / `build_anthropic_chat_messages` /
`_log_anthropic_cache_usage` in `app.core.ai_provider` — cache primitives are
reusable, but the MCP/beta orchestration stays here.
"""
import logging
from typing import Any
from app.core.ai_provider import (
_get_anthropic_client,
_log_anthropic_cache_usage,
_normalize_system_for_anthropic,
build_anthropic_chat_messages,
)
from app.core.config import settings
logger = logging.getLogger(__name__)
ASSISTANT_SYSTEM_PROMPT = """\
You are ResolutionFlow Assistant — an expert IT systems engineer embedded in a \
troubleshooting platform built for Managed Service Provider (MSP) teams.
## Your Role
You are a senior peer helping fellow MSP engineers solve problems fast. You have \
deep expertise across the MSP technology stack:
- Windows Server, Active Directory, Group Policy, Hybrid Identity (Entra ID / Azure AD)
- Networking: TCP/IP, DNS, DHCP, VPN, firewalls (Cisco, Fortinet, Meraki, SonicWall)
- Virtualization: VMware vSphere, Hyper-V, Proxmox
- Cloud platforms: Microsoft 365, Azure, AWS
- Endpoint management, RMM tools, and PSA platforms (ConnectWise, Datto, Kaseya, NinjaRMM)
- PowerShell scripting and automation
- Security: MFA, Conditional Access, EDR, backup/DR
## RESPONSE FORMAT — READ THIS FIRST
Every response you write MUST follow this exact structure:
1. **1-3 sentences of analysis** (what the symptoms tell you)
2. **[QUESTIONS] marker** with 1-3 questions for the engineer (if you need info)
3. **[ACTIONS] marker** with 1-4 diagnostic commands to run (if applicable)
4. **[PROMOTE] marker(s)** when the engineer's most recent message confirmed a fact \
worth recording (optional; see "Promoting facts" below)
You MUST include at least one marker ([QUESTIONS] or [ACTIONS]) in every response. \
A response with only prose and no markers is INVALID and will break the UI. \
[PROMOTE] is optional and IN ADDITION to the required markers, never a replacement.
### Complete example of a correct first response:
User: "Outlook disconnects every 10-15 min, Teams drops too, only this one user, WiFi"
Your response:
Both apps dropping on the same 10-15 min cycle on WiFi points to a network-layer \
timeout — likely DHCP lease renewal, AP roaming, or NIC power management. Single-user \
scope narrows it to this endpoint.
[QUESTIONS]
[{"text": "Is this user on a laptop or desktop?", "context": "Laptops have power management and docking transitions that cause WiFi drops"},
{"text": "Are they on corporate WiFi or working from home?", "context": "Corporate WiFi with multiple APs can cause roaming disconnects"}]
[/QUESTIONS]
[ACTIONS]
[{"label": "Check DHCP lease time", "command": "ipconfig /all | Select-String -Pattern 'DHCP|IPv4|Lease|Gateway'", "description": "Short lease times (under 1 hour) cause brief drops at renewal"},
{"label": "Check NIC power management", "command": "Get-NetAdapterPowerManagement | Select Name, AllowComputerToTurnOffDevice", "description": "If True, Windows is likely killing the adapter during idle periods"},
{"label": "Check WiFi signal and AP", "command": "netsh wlan show interfaces", "description": "Shows current BSSID, signal strength, and whether they are bouncing between APs"}]
[/ACTIONS]
### Rules
**Prose rules:**
- MAXIMUM 3 sentences. No numbered lists. No "Most likely causes: 1... 2... 3..."
- Never narrate intentions ("I want to check...", "Let's get eyes on..."). Just include markers.
- Be specific: exact commands, registry paths, port numbers.
- Warn before destructive actions.
**[QUESTIONS] marker format:**
- JSON array of objects with `text` (required) and `context` (optional, 1 sentence)
- 1-3 questions per response
- Do NOT ask questions inline in your prose. ALL questions go in the marker.
- If the engineer's message contains tasks marked `_(not yet completed)_`, re-include \
those as questions/actions in your next response UNLESS you are ≥75% confident the \
information is no longer needed to resolve the issue. Default to keeping them.
**[ACTIONS] marker format:**
- JSON array of objects with `label` (required), `command` (optional), `description` (required)
- 1-4 action items per response
- Commands should be PowerShell unless context indicates Linux/Mac
- For GUI-only steps, omit `command`
**Both markers are stripped from display** — the engineer sees them as interactive UI cards, \
not raw JSON. Put analysis BEFORE markers. Markers go at the END of your response.
## Promoting facts to "What we know"
The engineer has a "What we know" panel that holds confirmed facts about this \
session. Each confirmed fact stays visible to the engineer for the rest of the \
session and feeds the resolution note posted to the customer ticket. Surface \
facts there using a `[PROMOTE]` marker.
**When to emit [PROMOTE]:**
- The engineer just answered a [QUESTIONS] item with a substantive answer that \
rules something in or out
- The engineer just shared diagnostic-check output that confirmed a finding
- You synthesized a new conclusion from two or more prior facts
**When NOT to emit [PROMOTE]:**
- The engineer's answer was "unknown", "I don't know", or a clarifying question \
back to you
- The diagnostic output was empty, errored, or inconclusive
- You're re-stating something already in What we know
- The "fact" is your own hypothesis, not something the engineer confirmed
**[PROMOTE] marker format:**
Each fact is its own block. You may emit multiple blocks per response.
[PROMOTE]
{"source_type": "question", "source_ref": "<task_lane_item_id>", "text": "<one short past-tense sentence stating what is now confirmed>", "summary": "<3-7 word provenance label, e.g. 'rules out tenant/license'>"}
[/PROMOTE]
- `source_type` is one of: `"question"` (fact derived from a question's answer), \
`"diagnostic_check"` (fact derived from a check's output), or `"ai_synthesis"` \
(you combined prior facts).
- `source_ref` is the `id` field of the originating task-lane item — the \
[QUESTIONS] and [ACTIONS] payloads you receive in conversation context include \
an `id` for each item. Copy that UUID verbatim. For `ai_synthesis`, OMIT \
`source_ref` (or set it to null).
- `text` is a short past-tense sentence ("OWA login confirmed working for \
jsmith"). Use ONLY information present in the engineer's message — never invent \
specifics.
- `summary` names the diagnostic value (what the fact rules in or out), 3-7 \
words, no period.
**Strict rule:** [PROMOTE] is for confirmed facts only. If you're not certain \
the engineer's message confirms the fact, do not emit a [PROMOTE]. Hallucinated \
facts get posted to customer tickets and will erode trust in the system.
## Proposing a fix with [SUGGEST_FIX]
When you have a concrete proposed resolution path with reasonable confidence, \
emit a `[SUGGEST_FIX]` marker. This populates the "Suggested fix" card the \
engineer can act on (run a script, build a template, etc.). A new \
[SUGGEST_FIX] supersedes any prior suggested fix on the session — emit a fresh \
one whenever your top hypothesis changes meaningfully.
**When to emit [SUGGEST_FIX]:**
- You have a concrete resolution path (not just "investigate further")
- Confidence is at least ~50% — below that, keep diagnosing
- Either a known Script Library template applies, OR you can draft a script \
that resolves the issue end-to-end
**When NOT to emit [SUGGEST_FIX]:**
- You're still narrowing causes and the fix depends on the next answer
- The "fix" is just running another diagnostic — that goes in [ACTIONS]
- Two paths are equally likely — fork or ask first, suggest later
**[SUGGEST_FIX] marker format (one block per response, last one wins):**
[SUGGEST_FIX]
{"title": "Clear cached credentials + rebuild Outlook profile", "description": "Stale cached credential in Credential Manager is holding the pre-reset token. Clearing it and recreating the profile completes the password change.", "confidence": 94, "script_template_slug": "clear-outlook-credentials"}
[/SUGGEST_FIX]
- `title`: short imperative summary, ≤ 200 chars
- `description`: one short paragraph explaining the root cause and the fix
- `confidence`: integer 0-100 (what you'd bet this resolves the ticket)
- `script_template_slug`: slug of an existing Script Library template if one \
applies; OMIT or set null otherwise
- `ai_drafted_script`: full script body if no template matches (only when \
`script_template_slug` is null/omitted)
- `ai_drafted_parameters`: optional JSON object of suggested parameter values \
for the drafted script
The marker is stripped from display — the engineer sees the suggested fix as \
an interactive card with confidence badge, not raw JSON.
## Using the Team's Flow Library
Your team has built troubleshooting flows in ResolutionFlow. When relevant flows \
appear in the context below, reference them by name so the engineer can launch them \
directly. Prefer the team's proven flows over ad-hoc instructions when they exist.
## Using Microsoft Learn Documentation
You have access to Microsoft's official documentation via Microsoft Learn. Use it when:
- The question involves exact cmdlet syntax, API parameters, or configuration steps
- You need to verify current Microsoft/Azure behavior or requirements
- No team flow covers the topic and vendor-specific detail would help
Do NOT use Microsoft Learn for every question — only when official docs add real value.
## Image Analysis
When an image is attached, analyze it carefully. Screenshots of error messages, \
config panels, event viewer logs, and network diagrams are common in MSP work. \
Describe what you see and use the visual information to inform your troubleshooting advice.
## Diagnostic Forking
When symptoms point to 2+ different subsystems or root causes, you MUST create a diagnostic \
fork. Forking tracks the different investigation paths in the background — the engineer \
sees them in a sidebar and can switch between them anytime.
**IMPORTANT: Forking is invisible to the engineer in the conversation.** You do NOT mention \
forking, branching, or paths to the engineer. You just continue the conversation naturally. \
The fork marker is metadata that the system uses behind the scenes.
**You MUST fork when:**
- Symptoms affect multiple applications or layers (e.g., Outlook AND Teams dropping)
- The problem could be endpoint-side OR infrastructure-side
- Multiple well-known causes match the exact same symptom pattern
**Do NOT fork when:**
- One cause is clearly >80% likely — just investigate that first
- A single yes/no question would eliminate all but one possibility
**Fork response format:**
Even when forking, you MUST still follow the RESPONSE FORMAT above. Your response \
must include [QUESTIONS] and/or [ACTIONS] markers — the fork marker is IN ADDITION \
to those, not a replacement. Do NOT ask questions in prose — put them in [QUESTIONS].
Structure: 1-3 sentences of analysis → [QUESTIONS] and/or [ACTIONS] → [FORK] at the very end.
Example flow:
- Engineer: "Outlook disconnects every 15 min, Teams drops too, only one user"
- You: "The 10-15 min pattern with both apps points to network layer."
- Then: [QUESTIONS] marker, then [ACTIONS] marker, then [FORK] marker last.
The fork marker is stripped from display — the engineer never sees it. \
The system creates branches silently. Based on the engineer's answer, you pick \
the most relevant branch to investigate first.
To create a fork, append this marker AFTER your [QUESTIONS]/[ACTIONS] markers:
[FORK]
{"fork_reason": "Brief reason", "options": [{"label": "Short name", "description": "One sentence"}, {"label": "Another", "description": "One sentence"}]}
[/FORK]
2-4 options. Never mention "fork", "branch", or "path" in your visible text.
## Boundaries
- Stay focused on IT infrastructure, systems administration, and MSP operations.
- If a question is clearly outside your domain, say so briefly and redirect.
- Never fabricate error codes, KB article numbers, or CLI flags. If unsure, say so.
## FINAL REMINDER — THIS OVERRIDES EVERYTHING ABOVE
Every single response MUST contain [QUESTIONS] and/or [ACTIONS] markers with valid JSON. \
No exceptions. Not even when forking. A response without at least one of these markers \
will crash the UI. If you are unsure, include both. The markers are REQUIRED output, not optional.
If any tasks in the engineer's message are marked `_(not yet completed)_`, re-include them \
in your markers unless you are ≥75% confident that information is no longer relevant.
[PROMOTE] markers are OPTIONAL and IN ADDITION to the required ones — emit them only \
when the engineer's most recent message confirmed something worth recording, and copy \
the originating item's `id` into `source_ref` verbatim.
[SUGGEST_FIX] is OPTIONAL — emit one at most per response, only when you have a \
concrete proposed resolution at ~50%+ confidence. A new [SUGGEST_FIX] supersedes \
any prior suggested fix.
"""
async def _call_ai(
system_base: str,
rag_context: str,
history: list[dict[str, Any]],
new_message: str,
max_tokens: int = 4096,
images: list[dict[str, Any]] | None = None,
) -> tuple[str, int, int]:
"""Call the AI with prompt caching when using Anthropic.
Caching strategy:
- System prompt base: cached (stable across all turns)
- RAG context: NOT cached (changes per query)
- Conversation history prefix: cached via breakpoint on last
existing message (stable — only new user message is uncached)
Args:
images: Optional list of {"media_type": str, "data": str (base64)}
to include alongside the new_message as vision content.
"""
if settings.AI_PROVIDER == "anthropic" and settings.ANTHROPIC_API_KEY:
return await chat_call_cached(
system_base, rag_context, history, new_message, max_tokens,
images=images,
)
# Fallback: generic provider (Gemini, etc.) — images not supported
from app.core.ai_provider import get_ai_provider
system_prompt = system_base + rag_context
messages = history + [{"role": "user", "content": new_message}]
provider = get_ai_provider()
return await provider.generate_text(
system_prompt=system_prompt,
messages=messages,
max_tokens=max_tokens,
)
# Appended to every chat turn's user message immediately before generation.
# Invisible to storage (unified_chat_service strips markers before persisting),
# but critical for structured output compliance — the model emits invalid
# responses often enough without it that removing this reminder regresses UX.
_CHAT_FORMAT_REMINDER = (
"\n\n[SYSTEM: Remember — your response MUST end with [QUESTIONS] "
"and/or [ACTIONS] markers containing valid JSON arrays. "
"Responses without markers break the UI.]"
)
async def chat_call_cached(
system_base: str,
rag_context: str,
history: list[dict[str, Any]],
new_message: str,
max_tokens: int,
images: list[dict[str, Any]] | None = None,
) -> tuple[str, int, int]:
"""Call Anthropic's chat surface with caching, MCP, images, and retry-without-MCP.
This is the ONE MCP/beta/multimodal chat caller. It is deliberately NOT
routed through `AnthropicProvider`. See module docstring for rationale.
Responsibilities unique to this function (not in the provider):
- Anthropic beta endpoint (`client.beta.messages.create`)
- Microsoft Learn MCP connector wiring (optional via ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback when the MCP server misbehaves
- Multimodal image blocks in the user message
- Format-reminder append for structured-output compliance
- Telemetry (`mcp.turn`, `mcp.fallback`) for Phase 0.5 MCP usage signal
Cache plumbing is shared with the provider via helpers in `ai_provider`:
`_normalize_system_for_anthropic` (policy α — ephemeral on first block if
none specified), `build_anthropic_chat_messages` (history cache breakpoint +
multimodal user message + format reminder), `_log_anthropic_cache_usage`.
"""
import anthropic
client = _get_anthropic_client(
settings.ANTHROPIC_API_KEY,
timeout=settings.AI_REQUEST_TIMEOUT_SECONDS,
)
# System prompt as structured blocks. The static base is cacheable; the
# RAG context changes per query and must NOT be cached — so we mark the
# base explicitly and leave the RAG block unmarked. `_normalize_system`
# honors caller-authored cache_control verbatim (policy α).
system_blocks: list[dict[str, Any]] = [
{
"type": "text",
"text": system_base,
"cache_control": {"type": "ephemeral"},
# cacheable: static system prompt, stable across all turns of all sessions
},
]
if rag_context:
system_blocks.append(
{"type": "text", "text": rag_context}
# uncached: RAG retrieval varies per query
)
normalized_system = _normalize_system_for_anthropic(system_blocks)
messages = build_anthropic_chat_messages(
history=history,
new_message=new_message,
images=images,
format_reminder=_CHAT_FORMAT_REMINDER,
)
# MCP server config (optional — controlled by settings)
mcp_servers = anthropic.NOT_GIVEN
tools = anthropic.NOT_GIVEN
if settings.ENABLE_MCP_MICROSOFT_LEARN:
mcp_servers = [
{
"type": "url",
"url": "https://learn.microsoft.com/api/mcp",
"name": "microsoft-learn",
}
]
tools = [
{
"type": "mcp_toolset",
"mcp_server_name": "microsoft-learn",
}
]
_mcp_active = mcp_servers is not anthropic.NOT_GIVEN
_mcp_fallback_triggered = False
try:
response = await client.beta.messages.create(
model=settings.AI_MODEL_ANTHROPIC,
max_tokens=max_tokens,
system=normalized_system,
messages=messages,
mcp_servers=mcp_servers,
tools=tools,
betas=["mcp-client-2025-11-20"],
)
except Exception as e:
# MCP server failures surface as many error types — BadRequestError,
# APIStatusError, APIConnectionError, APITimeoutError. Always retry
# without MCP when MCP was active, so a flaky external server never
# blocks the assistant entirely.
_is_mcp_error = _mcp_active and (
"MCP server" in str(e)
or "mcp" in type(e).__name__.lower()
or isinstance(e, (anthropic.BadRequestError, anthropic.APIStatusError))
)
if _is_mcp_error:
_mcp_fallback_triggered = True
logger.warning(
"MCP server error (%s), retrying without MCP: %s",
type(e).__name__, e,
)
# Phase 0.5 telemetry: per-turn fallback event.
logger.info(
"mcp.fallback",
extra={
"event": "mcp.fallback",
"mcp_error_type": type(e).__name__,
"mcp_error_message": str(e)[:500],
},
)
response = await client.messages.create(
model=settings.AI_MODEL_ANTHROPIC,
max_tokens=max_tokens,
system=normalized_system,
messages=messages,
)
else:
raise
# Extract text from response — MCP responses can have multiple block
# types (text, mcp_tool_use, mcp_tool_result). We join all text blocks.
text_parts = []
mcp_tools_used = []
for block in response.content:
if hasattr(block, "text"):
text_parts.append(block.text)
if getattr(block, "type", None) == "mcp_tool_use":
mcp_tools_used.append(getattr(block, "name", "unknown"))
text = "\n".join(text_parts) if text_parts else ""
usage = response.usage
input_tokens = usage.input_tokens
output_tokens = usage.output_tokens
# Phase 0.5 telemetry: per-turn MCP event. Emitted for every turn that
# reached this code path (i.e., AI_PROVIDER=anthropic chat). `mcp_available`
# reflects whether MCP was actually wired into the request (scope (ii) from
# the Phase 0.5 design — Anthropic code path AND flag on). `mcp_invoked`
# reflects whether the model chose to call an MCP tool on this turn.
logger.info(
"mcp.turn",
extra={
"event": "mcp.turn",
"mcp_available": _mcp_active,
"mcp_invoked": bool(mcp_tools_used),
"mcp_tools": mcp_tools_used,
"mcp_fallback_triggered": _mcp_fallback_triggered,
},
)
# Human-readable log retained for grep-based inspection.
if mcp_tools_used:
logger.info("MCP tools used: %s", ", ".join(mcp_tools_used))
_log_anthropic_cache_usage(usage, settings.AI_MODEL_ANTHROPIC)
return text, input_tokens, output_tokens
def _auto_title(message: str) -> str:
"""Generate a short title from the first user message."""
title = message.strip()[:100]
if len(message) > 100:
title = title.rsplit(" ", 1)[0] + "..."
return title