Renames the chat caller to a name that signals its actual purpose, and factors
the reusable cached-system-block + cached-history + cache-usage-log primitives
out to app.core.ai_provider so they can be shared with the provider-generic path
without pulling MCP/beta/images into the abstract interface.

Helpers added to ai_provider.py:
- `build_anthropic_chat_messages(history, new_message, images, format_reminder)`.
  Owns: copy history, apply cache_control to last history message, append
  format reminder to new message, render images as multimodal blocks.
  Anthropic-shaped by design; do not call from Gemini paths.

chat_call_cached keeps exactly the concerns that are unique to the one
MCP/beta/multimodal chat caller:
- Anthropic beta endpoint invocation
- Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback
- Format-reminder content string (declared as module constant)
- Phase 0.5 telemetry (mcp.turn, mcp.fallback)

Documents in the module docstring AND at the function site that this is the ONE
MCP/beta chat caller and should not become the general provider path.
MCP/beta/images are features of exactly one optional Anthropic beta endpoint;
routing them through AnthropicProvider would leak a provider-specific concern
into the abstract interface that also serves Gemini.

Behavior change: chat_call_cached now reuses the singleton AnthropicProvider
HTTP client via `_get_anthropic_client(...)` instead of instantiating a new
`anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern and
avoids burning connections per-turn. No user-visible difference.

No runtime verification from code-server. TODO(phase0-verify) in ai_provider.py
tracks the cache-hit verification owed on the new dev env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
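As a rough illustration of the four responsibilities the commit message lists for `build_anthropic_chat_messages`, here is a minimal sketch. The name `sketch_build_chat_messages` is hypothetical, and the body is an assumption reconstructed from the description above, not the code in `ai_provider.py`. The block shapes follow the content-block format of Anthropic's Messages API.

```python
from __future__ import annotations

from copy import deepcopy
from typing import Any


def sketch_build_chat_messages(
    history: list[dict[str, Any]],
    new_message: str,
    images: list[dict[str, Any]] | None = None,
    format_reminder: str = "",
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for build_anthropic_chat_messages (assumed behavior)."""
    # 1. Copy the history so the caller's list is never mutated.
    messages = deepcopy(history)

    # 2. Put a cache breakpoint on the last existing message, so everything
    #    up to and including it can be served from the prompt cache.
    if messages:
        last = messages[-1]
        content = last["content"]
        if isinstance(content, str):
            # cache_control lives on a content block, so promote plain text.
            content = [{"type": "text", "text": content}]
        if content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
        last["content"] = content

    # 3. Append the format reminder to the new user text.
    text = new_message + format_reminder

    # 4. Render images as multimodal blocks ahead of the text block.
    if images:
        blocks: list[dict[str, Any]] = [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": img["media_type"],
                    "data": img["data"],
                },
            }
            for img in images
        ]
        blocks.append({"type": "text", "text": text})
        messages.append({"role": "user", "content": blocks})
    else:
        messages.append({"role": "user", "content": text})
    return messages
```

Only the new user message sits after the breakpoint, which matches the caching strategy the module describes: the history prefix is stable between turns, while the new input is not.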
410 lines
17 KiB
Python
"""Shared AI chat infrastructure — system prompt, prompt caching, and AI calling.

Used by unified_chat_service (the active chat backend). The assistant_chat
CRUD endpoints were removed — only retention settings remain on that router.

Uses Anthropic prompt caching to reduce cost on multi-turn conversations:
- The static system prompt is cached (ephemeral, 5-min TTL)
- The conversation history prefix is cached via a breakpoint on the
  last existing message before the new user input

Optionally connects to Microsoft Learn via Anthropic's MCP connector
for real-time documentation lookups (controlled by ENABLE_MCP_MICROSOFT_LEARN).

## Architectural note — this module is the one MCP/beta chat caller

`chat_call_cached` below is the ONLY caller in the codebase that uses
Anthropic's `client.beta.messages.create` endpoint, MCP servers, multimodal
user messages, and the retry-without-MCP fallback. It is deliberately NOT
routed through `AnthropicProvider` — MCP/beta/images are features of exactly
one optional Anthropic beta endpoint and do not belong in a provider-agnostic
abstraction that also serves Gemini.

If a new caller needs the same (MCP, beta, images, history caching) bundle,
call `chat_call_cached` directly rather than pushing those concerns into
`AnthropicProvider`. Cached-system-block plumbing is shared with the provider
via `_normalize_system_for_anthropic` / `build_anthropic_chat_messages` /
`_log_anthropic_cache_usage` in `app.core.ai_provider` — cache primitives are
reusable, but the MCP/beta orchestration stays here.
"""
import logging
from typing import Any

from app.core.ai_provider import (
    _get_anthropic_client,
    _log_anthropic_cache_usage,
    _normalize_system_for_anthropic,
    build_anthropic_chat_messages,
)
from app.core.config import settings

logger = logging.getLogger(__name__)

ASSISTANT_SYSTEM_PROMPT = """\
You are ResolutionFlow Assistant — an expert IT systems engineer embedded in a \
troubleshooting platform built for Managed Service Provider (MSP) teams.

## Your Role
You are a senior peer helping fellow MSP engineers solve problems fast. You have \
deep expertise across the MSP technology stack:
- Windows Server, Active Directory, Group Policy, Hybrid Identity (Entra ID / Azure AD)
- Networking: TCP/IP, DNS, DHCP, VPN, firewalls (Cisco, Fortinet, Meraki, SonicWall)
- Virtualization: VMware vSphere, Hyper-V, Proxmox
- Cloud platforms: Microsoft 365, Azure, AWS
- Endpoint management, RMM tools, and PSA platforms (ConnectWise, Datto, Kaseya, NinjaRMM)
- PowerShell scripting and automation
- Security: MFA, Conditional Access, EDR, backup/DR

## RESPONSE FORMAT — READ THIS FIRST

Every response you write MUST follow this exact structure:

1. **1-3 sentences of analysis** (what the symptoms tell you)
2. **[QUESTIONS] marker** with 1-3 questions for the engineer (if you need info)
3. **[ACTIONS] marker** with 1-4 diagnostic commands to run (if applicable)

You MUST include at least one marker ([QUESTIONS] or [ACTIONS]) in every response. \
A response with only prose and no markers is INVALID and will break the UI.

### Complete example of a correct first response:

User: "Outlook disconnects every 10-15 min, Teams drops too, only this one user, WiFi"

Your response:

Both apps dropping on the same 10-15 min cycle on WiFi points to a network-layer \
timeout — likely DHCP lease renewal, AP roaming, or NIC power management. Single-user \
scope narrows it to this endpoint.

[QUESTIONS]
[{"text": "Is this user on a laptop or desktop?", "context": "Laptops have power management and docking transitions that cause WiFi drops"},
{"text": "Are they on corporate WiFi or working from home?", "context": "Corporate WiFi with multiple APs can cause roaming disconnects"}]
[/QUESTIONS]

[ACTIONS]
[{"label": "Check DHCP lease time", "command": "ipconfig /all | Select-String -Pattern 'DHCP|IPv4|Lease|Gateway'", "description": "Short lease times (under 1 hour) cause brief drops at renewal"},
{"label": "Check NIC power management", "command": "Get-NetAdapterPowerManagement | Select Name, AllowComputerToTurnOffDevice", "description": "If True, Windows is likely killing the adapter during idle periods"},
{"label": "Check WiFi signal and AP", "command": "netsh wlan show interfaces", "description": "Shows current BSSID, signal strength, and whether they are bouncing between APs"}]
[/ACTIONS]

### Rules

**Prose rules:**
- MAXIMUM 3 sentences. No numbered lists. No "Most likely causes: 1... 2... 3..."
- Never narrate intentions ("I want to check...", "Let's get eyes on..."). Just include markers.
- Be specific: exact commands, registry paths, port numbers.
- Warn before destructive actions.

**[QUESTIONS] marker format:**
- JSON array of objects with `text` (required) and `context` (optional, 1 sentence)
- 1-3 questions per response
- Do NOT ask questions inline in your prose. ALL questions go in the marker.
- If the engineer's message contains tasks marked `_(not yet completed)_`, re-include \
those as questions/actions in your next response UNLESS you are ≥75% confident the \
information is no longer needed to resolve the issue. Default to keeping them.

**[ACTIONS] marker format:**
- JSON array of objects with `label` (required), `command` (optional), `description` (required)
- 1-4 action items per response
- Commands should be PowerShell unless context indicates Linux/Mac
- For GUI-only steps, omit `command`

**Both markers are stripped from display** — the engineer sees them as interactive UI cards, \
not raw JSON. Put analysis BEFORE markers. Markers go at the END of your response.

## Using the Team's Flow Library
Your team has built troubleshooting flows in ResolutionFlow. When relevant flows \
appear in the context below, reference them by name so the engineer can launch them \
directly. Prefer the team's proven flows over ad-hoc instructions when they exist.

## Using Microsoft Learn Documentation
You have access to Microsoft's official documentation via Microsoft Learn. Use it when:
- The question involves exact cmdlet syntax, API parameters, or configuration steps
- You need to verify current Microsoft/Azure behavior or requirements
- No team flow covers the topic and vendor-specific detail would help
Do NOT use Microsoft Learn for every question — only when official docs add real value.

## Image Analysis
When an image is attached, analyze it carefully. Screenshots of error messages, \
config panels, event viewer logs, and network diagrams are common in MSP work. \
Describe what you see and use the visual information to inform your troubleshooting advice.

## Diagnostic Forking
When symptoms point to 2+ different subsystems or root causes, you MUST create a diagnostic \
fork. Forking tracks the different investigation paths in the background — the engineer \
sees them in a sidebar and can switch between them anytime.

**IMPORTANT: Forking is invisible to the engineer in the conversation.** You do NOT mention \
forking, branching, or paths to the engineer. You just continue the conversation naturally. \
The fork marker is metadata that the system uses behind the scenes.

**You MUST fork when:**
- Symptoms affect multiple applications or layers (e.g., Outlook AND Teams dropping)
- The problem could be endpoint-side OR infrastructure-side
- Multiple well-known causes match the exact same symptom pattern

**Do NOT fork when:**
- One cause is clearly >80% likely — just investigate that first
- A single yes/no question would eliminate all but one possibility

**Fork response format:**
Even when forking, you MUST still follow the RESPONSE FORMAT above. Your response \
must include [QUESTIONS] and/or [ACTIONS] markers — the fork marker is IN ADDITION \
to those, not a replacement. Do NOT ask questions in prose — put them in [QUESTIONS].

Structure: 1-3 sentences of analysis → [QUESTIONS] and/or [ACTIONS] → [FORK] at the very end.

Example flow:
- Engineer: "Outlook disconnects every 15 min, Teams drops too, only one user"
- You: "The 10-15 min pattern with both apps points to network layer."
- Then: [QUESTIONS] marker, then [ACTIONS] marker, then [FORK] marker last.

The fork marker is stripped from display — the engineer never sees it. \
The system creates branches silently. Based on the engineer's answer, you pick \
the most relevant branch to investigate first.

To create a fork, append this marker AFTER your [QUESTIONS]/[ACTIONS] markers:

[FORK]
{"fork_reason": "Brief reason", "options": [{"label": "Short name", "description": "One sentence"}, {"label": "Another", "description": "One sentence"}]}
[/FORK]

2-4 options. Never mention "fork", "branch", or "path" in your visible text.

## Boundaries
- Stay focused on IT infrastructure, systems administration, and MSP operations.
- If a question is clearly outside your domain, say so briefly and redirect.
- Never fabricate error codes, KB article numbers, or CLI flags. If unsure, say so.

## FINAL REMINDER — THIS OVERRIDES EVERYTHING ABOVE
Every single response MUST contain [QUESTIONS] and/or [ACTIONS] markers with valid JSON. \
No exceptions. Not even when forking. A response without at least one of these markers \
will crash the UI. If you are unsure, include both. The markers are REQUIRED output, not optional.
If any tasks in the engineer's message are marked `_(not yet completed)_`, re-include them \
in your markers unless you are ≥75% confident that information is no longer relevant.
"""


async def _call_ai(
    system_base: str,
    rag_context: str,
    history: list[dict[str, Any]],
    new_message: str,
    max_tokens: int = 4096,
    images: list[dict[str, Any]] | None = None,
) -> tuple[str, int, int]:
    """Call the AI with prompt caching when using Anthropic.

    Caching strategy:
    - System prompt base: cached (stable across all turns)
    - RAG context: NOT cached (changes per query)
    - Conversation history prefix: cached via breakpoint on last
      existing message (stable — only new user message is uncached)

    Args:
        images: Optional list of {"media_type": str, "data": str (base64)}
            to include alongside the new_message as vision content.
    """
    if settings.AI_PROVIDER == "anthropic" and settings.ANTHROPIC_API_KEY:
        return await chat_call_cached(
            system_base, rag_context, history, new_message, max_tokens,
            images=images,
        )

    # Fallback: generic provider (Gemini, etc.) — images not supported
    from app.core.ai_provider import get_ai_provider

    system_prompt = system_base + rag_context
    messages = history + [{"role": "user", "content": new_message}]
    provider = get_ai_provider()
    return await provider.generate_text(
        system_prompt=system_prompt,
        messages=messages,
        max_tokens=max_tokens,
    )


# Appended to every chat turn's user message immediately before generation.
# Invisible to storage (unified_chat_service strips markers before persisting),
# but critical for structured output compliance — the model emits invalid
# responses often enough without it that removing this reminder regresses UX.
_CHAT_FORMAT_REMINDER = (
    "\n\n[SYSTEM: Remember — your response MUST end with [QUESTIONS] "
    "and/or [ACTIONS] markers containing valid JSON arrays. "
    "Responses without markers break the UI.]"
)


async def chat_call_cached(
    system_base: str,
    rag_context: str,
    history: list[dict[str, Any]],
    new_message: str,
    max_tokens: int,
    images: list[dict[str, Any]] | None = None,
) -> tuple[str, int, int]:
    """Call Anthropic's chat surface with caching, MCP, images, and retry-without-MCP.

    This is the ONE MCP/beta/multimodal chat caller. It is deliberately NOT
    routed through `AnthropicProvider`. See module docstring for rationale.

    Responsibilities unique to this function (not in the provider):
    - Anthropic beta endpoint (`client.beta.messages.create`)
    - Microsoft Learn MCP connector wiring (optional via ENABLE_MCP_MICROSOFT_LEARN)
    - Retry-without-MCP fallback when the MCP server misbehaves
    - Multimodal image blocks in the user message
    - Format-reminder append for structured-output compliance
    - Telemetry (`mcp.turn`, `mcp.fallback`) for Phase 0.5 MCP usage signal

    Cache plumbing is shared with the provider via helpers in `ai_provider`:
    `_normalize_system_for_anthropic` (policy α — ephemeral on first block if
    none specified), `build_anthropic_chat_messages` (history cache breakpoint +
    multimodal user message + format reminder), `_log_anthropic_cache_usage`.
    """
    import anthropic

    client = _get_anthropic_client(
        settings.ANTHROPIC_API_KEY,
        timeout=settings.AI_REQUEST_TIMEOUT_SECONDS,
    )

    # System prompt as structured blocks. The static base is cacheable; the
    # RAG context changes per query and must NOT be cached — so we mark the
    # base explicitly and leave the RAG block unmarked. `_normalize_system`
    # honors caller-authored cache_control verbatim (policy α).
    system_blocks: list[dict[str, Any]] = [
        {
            "type": "text",
            "text": system_base,
            "cache_control": {"type": "ephemeral"},
            # cacheable: static system prompt, stable across all turns of all sessions
        },
    ]
    if rag_context:
        system_blocks.append(
            {"type": "text", "text": rag_context}
            # uncached: RAG retrieval varies per query
        )
    normalized_system = _normalize_system_for_anthropic(system_blocks)

    messages = build_anthropic_chat_messages(
        history=history,
        new_message=new_message,
        images=images,
        format_reminder=_CHAT_FORMAT_REMINDER,
    )

    # MCP server config (optional — controlled by settings)
    mcp_servers = anthropic.NOT_GIVEN
    tools = anthropic.NOT_GIVEN

    if settings.ENABLE_MCP_MICROSOFT_LEARN:
        mcp_servers = [
            {
                "type": "url",
                "url": "https://learn.microsoft.com/api/mcp",
                "name": "microsoft-learn",
            }
        ]
        tools = [
            {
                "type": "mcp_toolset",
                "mcp_server_name": "microsoft-learn",
            }
        ]

    _mcp_active = mcp_servers is not anthropic.NOT_GIVEN
    _mcp_fallback_triggered = False

    try:
        response = await client.beta.messages.create(
            model=settings.AI_MODEL_ANTHROPIC,
            max_tokens=max_tokens,
            system=normalized_system,
            messages=messages,
            mcp_servers=mcp_servers,
            tools=tools,
            betas=["mcp-client-2025-11-20"],
        )
    except Exception as e:
        # MCP server failures surface as many error types — BadRequestError,
        # APIStatusError, APIConnectionError, APITimeoutError. Always retry
        # without MCP when MCP was active, so a flaky external server never
        # blocks the assistant entirely.
        _is_mcp_error = _mcp_active and (
            "MCP server" in str(e)
            or "mcp" in type(e).__name__.lower()
            or isinstance(e, (anthropic.BadRequestError, anthropic.APIStatusError))
        )
        if _is_mcp_error:
            _mcp_fallback_triggered = True
            logger.warning(
                "MCP server error (%s), retrying without MCP: %s",
                type(e).__name__, e,
            )
            # Phase 0.5 telemetry: per-turn fallback event.
            logger.info(
                "mcp.fallback",
                extra={
                    "event": "mcp.fallback",
                    "mcp_error_type": type(e).__name__,
                    "mcp_error_message": str(e)[:500],
                },
            )
            response = await client.messages.create(
                model=settings.AI_MODEL_ANTHROPIC,
                max_tokens=max_tokens,
                system=normalized_system,
                messages=messages,
            )
        else:
            raise

    # Extract text from response — MCP responses can have multiple block
    # types (text, mcp_tool_use, mcp_tool_result). We join all text blocks.
    text_parts = []
    mcp_tools_used = []
    for block in response.content:
        if hasattr(block, "text"):
            text_parts.append(block.text)
        if getattr(block, "type", None) == "mcp_tool_use":
            mcp_tools_used.append(getattr(block, "name", "unknown"))

    text = "\n".join(text_parts) if text_parts else ""

    usage = response.usage
    input_tokens = usage.input_tokens
    output_tokens = usage.output_tokens

    # Phase 0.5 telemetry: per-turn MCP event. Emitted for every turn that
    # reached this code path (i.e., AI_PROVIDER=anthropic chat). `mcp_available`
    # reflects whether MCP was actually wired into the request (scope (ii) from
    # the Phase 0.5 design — Anthropic code path AND flag on). `mcp_invoked`
    # reflects whether the model chose to call an MCP tool on this turn.
    logger.info(
        "mcp.turn",
        extra={
            "event": "mcp.turn",
            "mcp_available": _mcp_active,
            "mcp_invoked": bool(mcp_tools_used),
            "mcp_tools": mcp_tools_used,
            "mcp_fallback_triggered": _mcp_fallback_triggered,
        },
    )

    # Human-readable log retained for grep-based inspection.
    if mcp_tools_used:
        logger.info("MCP tools used: %s", ", ".join(mcp_tools_used))

    _log_anthropic_cache_usage(usage, settings.AI_MODEL_ANTHROPIC)

    return text, input_tokens, output_tokens


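def _auto_title(message: str) -> str:
    """Generate a short title from the first user message."""
    stripped = message.strip()
    title = stripped[:100]
    # Compare against the stripped length so surrounding whitespace alone
    # never triggers truncation.
    if len(stripped) > 100:
        # Cut back to the last word boundary before adding the ellipsis.
        title = title.rsplit(" ", 1)[0] + "..."
    return title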
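
The system prompt in this module says the `[QUESTIONS]` and `[ACTIONS]` markers are stripped from display and rendered as UI cards. The sketch below shows one way a consumer could split a response into prose and marker payloads. It is an assumption for illustration only: `split_markers` and `_MARKER_RE` are hypothetical names, and the real stripping logic lives in `unified_chat_service`, which is not shown here.

```python
from __future__ import annotations

import json
import re
from typing import Any

# Matches [QUESTIONS]...[/QUESTIONS] and [ACTIONS]...[/ACTIONS] blocks.
_MARKER_RE = re.compile(r"\[(QUESTIONS|ACTIONS)\]\s*(.*?)\s*\[/\1\]", re.DOTALL)


def split_markers(response_text: str) -> tuple[str, dict[str, list[dict[str, Any]]]]:
    """Return (prose, markers) where markers maps marker name to parsed items."""
    found: dict[str, list[dict[str, Any]]] = {"QUESTIONS": [], "ACTIONS": []}

    def _collect(match: re.Match[str]) -> str:
        try:
            payload = json.loads(match.group(2))
        except json.JSONDecodeError:
            # Invalid JSON inside a marker is dropped rather than raised.
            payload = []
        if isinstance(payload, list):
            found[match.group(1)].extend(p for p in payload if isinstance(p, dict))
        # Replace the whole marker block with nothing so only prose remains.
        return ""

    prose = _MARKER_RE.sub(_collect, response_text).strip()
    return prose, found
```

Dropping malformed payloads instead of raising is a design choice made for this sketch; a production consumer might prefer to log the failure or re-prompt the model.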