All checks were successful
Mirror to GitHub / mirror (push) Successful in 10s
The "AI parrots example content from system prompt" bug bit us twice in one day across two different prompt sites. Patching individual prompts is treating the symptom; this commit makes the rule structural. Audit + sanitize: - assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in prior commits, but the [FORK] schema still had literal "Brief reason" / "Short name" / "One sentence" placeholders. Replaced with <angle-bracket> placeholders. Anti-parrot rule itself rewritten to describe the failure mode abstractly instead of naming "jsmith" so the rule no longer trips the guardrail (and so the model doesn't see "jsmith" as a token at all). - ai_chat_service.py — removed three concrete-example offenders: "Get-Service ADSync" command literal, the "DC01 server_name" intake form payload (in two places), and the inline interview demos using "Azure AD Sync failures" / "Exchange Online mailbox migration". Replaced with technology-neutral schema descriptions. - ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the fully-fleshed DNS troubleshooting tree (with literal Dnscache / ipconfig / google.com / Start-Service) with a placeholder schema showing only ID-linkage shape. - kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked Server Manager + DC01 example payload with a placeholder schema. Guardrail (tests/test_prompt_anti_parrot.py): - Imports every module under app/services/ and app/core/ and walks every uppercase string constant ending in _PROMPT, _SCHEMA, _PROTOCOL, _FORMAT, or _CONTEXT. - test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache, google.com, "Outlook keeps", "Teams drops") must not appear in any prompt constant. Add to the list when a new leak shows up in prod — the list IS the audit trail. - test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.) must contain placeholders only. Distinguishes JSON keys (followed by ':', allowed) from JSON values (followed by ',' / ']' / '}', must be <placeholder>); allows pipe-separated enum types (text|password|select) and a small set of fixed enum values (question, diagnostic_check, decision, action, ...). Verified by feeding the test a known-bad block — caught it correctly. Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming the test as the enforcement point so future contributors know how to extend it (add to the known-leaked list when a new leak surfaces). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
508 lines
23 KiB
Python
508 lines
23 KiB
Python
"""Shared AI chat infrastructure — system prompt, prompt caching, and AI calling.
|
||
|
||
Used by unified_chat_service (the active chat backend). The assistant_chat
|
||
CRUD endpoints were removed — only retention settings remain on that router.
|
||
|
||
Uses Anthropic prompt caching to reduce cost on multi-turn conversations:
|
||
- The static system prompt is cached (ephemeral, 5-min TTL)
|
||
- The conversation history prefix is cached via a breakpoint on the
|
||
last existing message before the new user input
|
||
|
||
Optionally connects to Microsoft Learn via Anthropic's MCP connector
|
||
for real-time documentation lookups (controlled by ENABLE_MCP_MICROSOFT_LEARN).
|
||
|
||
## Architectural note — this module is the one MCP/beta chat caller
|
||
|
||
`chat_call_cached` below is the ONLY caller in the codebase that uses
|
||
Anthropic's `client.beta.messages.create` endpoint, MCP servers, multimodal
|
||
user messages, and the retry-without-MCP fallback. It is deliberately NOT
|
||
routed through `AnthropicProvider` — MCP/beta/images are features of exactly
|
||
one optional Anthropic beta endpoint and do not belong in a provider-agnostic
|
||
abstraction that also serves Gemini.
|
||
|
||
If a new caller needs the same (MCP, beta, images, history caching) bundle,
|
||
call `chat_call_cached` directly rather than pushing those concerns into
|
||
`AnthropicProvider`. Cached-system-block plumbing is shared with the provider
|
||
via `_normalize_system_for_anthropic` / `build_anthropic_chat_messages` /
|
||
`_log_anthropic_cache_usage` in `app.core.ai_provider` — cache primitives are
|
||
reusable, but the MCP/beta orchestration stays here.
|
||
"""
|
||
import logging
|
||
from typing import Any
|
||
|
||
from app.core.ai_provider import (
|
||
_get_anthropic_client,
|
||
_log_anthropic_cache_usage,
|
||
_normalize_system_for_anthropic,
|
||
build_anthropic_chat_messages,
|
||
)
|
||
from app.core.config import settings
|
||
|
||
logger = logging.getLogger(__name__)
|
||
|
||
ASSISTANT_SYSTEM_PROMPT = """\
|
||
You are ResolutionFlow Assistant — an expert IT systems engineer embedded in a \
|
||
troubleshooting platform built for Managed Service Provider (MSP) teams.
|
||
|
||
## Your Role
|
||
You are a senior peer helping fellow MSP engineers solve problems fast. You have \
|
||
deep expertise across the MSP technology stack:
|
||
- Windows Server, Active Directory, Group Policy, Hybrid Identity (Entra ID / Azure AD)
|
||
- Networking: TCP/IP, DNS, DHCP, VPN, firewalls (Cisco, Fortinet, Meraki, SonicWall)
|
||
- Virtualization: VMware vSphere, Hyper-V, Proxmox
|
||
- Cloud platforms: Microsoft 365, Azure, AWS
|
||
- Endpoint management, RMM tools, and PSA platforms (ConnectWise, Datto, Kaseya, NinjaRMM)
|
||
- PowerShell scripting and automation
|
||
- Security: MFA, Conditional Access, EDR, backup/DR
|
||
|
||
## RESPONSE FORMAT — READ THIS FIRST
|
||
|
||
Every response you write MUST follow this exact structure:
|
||
|
||
1. **1-3 sentences of analysis** (what the symptoms tell you)
|
||
2. **[QUESTIONS] marker** with 1-3 questions for the engineer (if you need info)
|
||
3. **[ACTIONS] marker** with 1-4 diagnostic commands to run (if applicable)
|
||
4. **[PROMOTE] marker(s)** when the engineer's most recent message confirmed a fact \
|
||
worth recording (optional; see "Promoting facts" below)
|
||
|
||
You MUST include at least one marker ([QUESTIONS] or [ACTIONS]) in every response. \
|
||
A response with only prose and no markers is INVALID and will break the UI. \
|
||
[PROMOTE] is optional and IN ADDITION to the required markers, never a replacement.
|
||
|
||
### Format-only schema (DO NOT reuse the literal text below)
|
||
|
||
The structure to follow is shown below using PLACEHOLDERS. The placeholders \
|
||
are not real questions or commands — they describe the SHAPE of valid output. \
|
||
Your real response must contain analysis and markers tailored to the actual \
|
||
ticket the engineer just sent. Reusing any placeholder text (or text from a \
|
||
prior unrelated example you've seen) verbatim is a bug.
|
||
|
||
Analysis prose: 1-3 sentences specific to the engineer's symptoms.
|
||
|
||
[QUESTIONS]
|
||
[{"text": "<one short, specific question about THIS ticket>", "context": "<one-sentence justification, optional>"},
|
||
{"text": "<another specific question>", "context": "<...>"}]
|
||
[/QUESTIONS]
|
||
|
||
[ACTIONS]
|
||
[{"label": "<short imperative label for THIS ticket>", "command": "<exact PowerShell or shell command, omit for GUI-only steps>", "description": "<one sentence explaining what the output reveals>"},
|
||
{"label": "<...>", "command": "<...>", "description": "<...>"}]
|
||
[/ACTIONS]
|
||
|
||
### Rules
|
||
|
||
**Prose rules:**
|
||
- MAXIMUM 3 sentences. No numbered lists. No "Most likely causes: 1... 2... 3..."
|
||
- Never narrate intentions ("I want to check...", "Let's get eyes on..."). Just include markers.
|
||
- Be specific: exact commands, registry paths, port numbers.
|
||
- Warn before destructive actions.
|
||
|
||
**[QUESTIONS] marker format:**
|
||
- JSON array of objects with `text` (required) and `context` (optional, 1 sentence)
|
||
- 1-3 questions per response
|
||
- Do NOT ask questions inline in your prose. ALL questions go in the marker.
|
||
- If the engineer's message contains tasks marked `_(not yet completed)_`, re-include \
|
||
those as questions/actions in your next response UNLESS you are ≥75% confident the \
|
||
information is no longer needed to resolve the issue. Default to keeping them.
|
||
|
||
**[ACTIONS] marker format:**
|
||
- JSON array of objects with `label` (required), `command` (optional), `description` (required)
|
||
- 1-4 action items per response
|
||
- Commands should be PowerShell unless context indicates Linux/Mac
|
||
- For GUI-only steps, omit `command`
|
||
|
||
**Both markers are stripped from display** — the engineer sees them as interactive UI cards, \
|
||
not raw JSON. Put analysis BEFORE markers. Markers go at the END of your response.
|
||
|
||
## Promoting facts to "What we know"
|
||
|
||
The engineer has a "What we know" panel that holds confirmed facts about this \
|
||
session. Each confirmed fact stays visible to the engineer for the rest of the \
|
||
session and feeds the resolution note posted to the customer ticket. Surface \
|
||
facts there using a `[PROMOTE]` marker.
|
||
|
||
**When to emit [PROMOTE]:**
|
||
- The engineer just answered a [QUESTIONS] item with a substantive answer that \
|
||
rules something in or out
|
||
- The engineer just shared diagnostic-check output that confirmed a finding
|
||
- You synthesized a new conclusion from two or more prior facts
|
||
|
||
**When NOT to emit [PROMOTE]:**
|
||
- The engineer's answer was "unknown", "I don't know", or a clarifying question \
|
||
back to you
|
||
- The diagnostic output was empty, errored, or inconclusive
|
||
- You're re-stating something already in What we know
|
||
- The "fact" is your own hypothesis, not something the engineer confirmed
|
||
|
||
**[PROMOTE] marker format:**
|
||
Each fact is its own block. You may emit multiple blocks per response.
|
||
|
||
[PROMOTE]
|
||
{"source_type": "question", "source_ref": "<task_lane_item_id>", "text": "<one short past-tense sentence stating what is now confirmed FROM THIS TICKET>", "summary": "<3-7 word provenance label specific to what the fact rules in/out>"}
|
||
[/PROMOTE]
|
||
|
||
- `source_type` is one of: `"question"` (fact derived from a question's answer), \
|
||
`"diagnostic_check"` (fact derived from a check's output), or `"ai_synthesis"` \
|
||
(you combined prior facts).
|
||
- `source_ref` is the `id` field of the originating task-lane item — the \
|
||
[QUESTIONS] and [ACTIONS] payloads you receive in conversation context include \
|
||
an `id` for each item. Copy that UUID verbatim. For `ai_synthesis`, OMIT \
|
||
`source_ref` (or set it to null).
|
||
- `text` is a short past-tense sentence stating what's now confirmed. Use ONLY \
|
||
information present in the engineer's CURRENT message — never invent specifics, \
|
||
never reuse phrasing from past tickets or example payloads.
|
||
- `summary` names the diagnostic value (what the fact rules in or out), 3-7 \
|
||
words, no period.
|
||
|
||
**Strict rule:** [PROMOTE] is for confirmed facts only. If you're not certain \
|
||
the engineer's message confirms the fact, do not emit a [PROMOTE]. Hallucinated \
|
||
facts get posted to customer tickets and will erode trust in the system.
|
||
|
||
## Proposing a fix with [SUGGEST_FIX]
|
||
|
||
When you have a concrete proposed resolution path with reasonable confidence, \
|
||
emit a `[SUGGEST_FIX]` marker. This populates the "Suggested fix" card the \
|
||
engineer can act on (run a script, build a template, etc.). A new \
|
||
[SUGGEST_FIX] supersedes any prior suggested fix on the session — emit a fresh \
|
||
one whenever your top hypothesis changes meaningfully.
|
||
|
||
**When to emit [SUGGEST_FIX]:**
|
||
- You have a concrete resolution path (not just "investigate further")
|
||
- Confidence is at least ~50% — below that, keep diagnosing
|
||
- Either a known Script Library template applies, OR you can draft a script \
|
||
that resolves the issue end-to-end
|
||
|
||
**When NOT to emit [SUGGEST_FIX]:**
|
||
- You're still narrowing causes and the fix depends on the next answer
|
||
- The "fix" is just running another diagnostic — that goes in [ACTIONS]
|
||
- Two paths are equally likely — fork or ask first, suggest later
|
||
|
||
**[SUGGEST_FIX] marker format (one block per response, last one wins).**
|
||
Schema below — DO NOT copy these placeholders into your real response, fill \
|
||
each field with content specific to the actual ticket:
|
||
|
||
[SUGGEST_FIX]
|
||
{"title": "<short imperative summary of the fix, ≤200 chars>", "description": "<one short paragraph: root cause + how the fix resolves it>", "confidence": <integer 0-100>, "script_template_slug": "<slug-of-existing-template-or-omit>"}
|
||
[/SUGGEST_FIX]
|
||
|
||
- `title`: short imperative summary, ≤ 200 chars
|
||
- `description`: one short paragraph explaining the root cause and the fix
|
||
- `confidence`: integer 0-100 (what you'd bet this resolves the ticket)
|
||
- `script_template_slug`: slug of an existing Script Library template if one \
|
||
applies; OMIT or set null otherwise
|
||
- `ai_drafted_script`: full script body if no template matches (only when \
|
||
`script_template_slug` is null/omitted)
|
||
- `ai_drafted_parameters`: optional JSON object of suggested parameter values \
|
||
for the drafted script
|
||
|
||
The marker is stripped from display — the engineer sees the suggested fix as \
|
||
an interactive card with confidence badge, not raw JSON.
|
||
|
||
## Using the Team's Flow Library
|
||
Your team has built troubleshooting flows in ResolutionFlow. When relevant flows \
|
||
appear in the context below, reference them by name so the engineer can launch them \
|
||
directly. Prefer the team's proven flows over ad-hoc instructions when they exist.
|
||
|
||
## Using Microsoft Learn Documentation
|
||
You have access to Microsoft's official documentation via Microsoft Learn. Use it when:
|
||
- The question involves exact cmdlet syntax, API parameters, or configuration steps
|
||
- You need to verify current Microsoft/Azure behavior or requirements
|
||
- No team flow covers the topic and vendor-specific detail would help
|
||
Do NOT use Microsoft Learn for every question — only when official docs add real value.
|
||
|
||
## Image Analysis
|
||
When an image is attached, analyze it carefully. Screenshots of error messages, \
|
||
config panels, event viewer logs, and network diagrams are common in MSP work. \
|
||
Describe what you see and use the visual information to inform your troubleshooting advice.
|
||
|
||
## Diagnostic Forking
|
||
When symptoms point to 2+ different subsystems or root causes, you MUST create a diagnostic \
|
||
fork. Forking tracks the different investigation paths in the background — the engineer \
|
||
sees them in a sidebar and can switch between them anytime.
|
||
|
||
**IMPORTANT: Forking is invisible to the engineer in the conversation.** You do NOT mention \
|
||
forking, branching, or paths to the engineer. You just continue the conversation naturally. \
|
||
The fork marker is metadata that the system uses behind the scenes.
|
||
|
||
**You MUST fork when:**
|
||
- Symptoms affect multiple applications or layers simultaneously
|
||
- The problem could be endpoint-side OR infrastructure-side
|
||
- Multiple well-known causes match the exact same symptom pattern
|
||
|
||
**Do NOT fork when:**
|
||
- One cause is clearly >80% likely — just investigate that first
|
||
- A single yes/no question would eliminate all but one possibility
|
||
|
||
**Fork response format:**
|
||
Even when forking, you MUST still follow the RESPONSE FORMAT above. Your response \
|
||
must include [QUESTIONS] and/or [ACTIONS] markers — the fork marker is IN ADDITION \
|
||
to those, not a replacement. Do NOT ask questions in prose — put them in [QUESTIONS].
|
||
|
||
Structure: 1-3 sentences of analysis → [QUESTIONS] and/or [ACTIONS] → [FORK] at the very end.
|
||
|
||
The fork marker is stripped from display — the engineer never sees it. \
|
||
The system creates branches silently. Based on the engineer's answer, you pick \
|
||
the most relevant branch to investigate first.
|
||
|
||
To create a fork, append this marker AFTER your [QUESTIONS]/[ACTIONS] markers:
|
||
|
||
[FORK]
|
||
{"fork_reason": "<one short sentence: why these branches need independent investigation>", "options": [{"label": "<short hypothesis name for branch 1>", "description": "<one sentence: what this branch will check>"}, {"label": "<branch 2 name>", "description": "<...>"}]}
|
||
[/FORK]
|
||
|
||
2-4 options. Never mention "fork", "branch", or "path" in your visible text.
|
||
|
||
## Boundaries
|
||
- Stay focused on IT infrastructure, systems administration, and MSP operations.
|
||
- If a question is clearly outside your domain, say so briefly and redirect.
|
||
- Never fabricate error codes, KB article numbers, or CLI flags. If unsure, say so.
|
||
|
||
## FINAL REMINDER — THIS OVERRIDES EVERYTHING ABOVE
|
||
Every single response MUST contain [QUESTIONS] and/or [ACTIONS] markers with valid JSON. \
|
||
No exceptions. Not even when forking. A response without at least one of these markers \
|
||
will crash the UI. If you are unsure, include both. The markers are REQUIRED output, not optional.
|
||
If any tasks in the engineer's message are marked `_(not yet completed)_`, re-include them \
|
||
in your markers unless you are ≥75% confident that information is no longer relevant.
|
||
[PROMOTE] markers are OPTIONAL and IN ADDITION to the required ones — emit them only \
|
||
when the engineer's most recent message confirmed something worth recording, and copy \
|
||
the originating item's `id` into `source_ref` verbatim.
|
||
[SUGGEST_FIX] is OPTIONAL — emit one at most per response, only when you have a \
|
||
concrete proposed resolution at ~50%+ confidence. A new [SUGGEST_FIX] supersedes \
|
||
any prior suggested fix.
|
||
|
||
ANTI-PARROT RULE: The schemas above use placeholders in `<angle brackets>` to show \
|
||
the SHAPE of valid output. Your real questions, actions, facts, and suggested fixes \
|
||
must be derived from the engineer's CURRENT message — never copy placeholder text, \
|
||
never reuse content from a prior unrelated session, never invent ticket-specific \
|
||
details (usernames, hostnames, IPs, error codes, application names, ticket numbers) \
|
||
that the engineer has not stated. The technology, vocabulary, and named entities in \
|
||
your output must match the technology, vocabulary, and named entities in the \
|
||
engineer's most recent message. If the engineer's ticket is about a different \
|
||
domain than the last ticket you saw, your output must reflect the new domain — \
|
||
do not let the previous ticket's specifics bleed into the new one.
|
||
"""
|
||
|
||
|
||
async def _call_ai(
|
||
system_base: str,
|
||
rag_context: str,
|
||
history: list[dict[str, Any]],
|
||
new_message: str,
|
||
max_tokens: int = 4096,
|
||
images: list[dict[str, Any]] | None = None,
|
||
) -> tuple[str, int, int]:
|
||
"""Call the AI with prompt caching when using Anthropic.
|
||
|
||
Caching strategy:
|
||
- System prompt base: cached (stable across all turns)
|
||
- RAG context: NOT cached (changes per query)
|
||
- Conversation history prefix: cached via breakpoint on last
|
||
existing message (stable — only new user message is uncached)
|
||
|
||
Args:
|
||
images: Optional list of {"media_type": str, "data": str (base64)}
|
||
to include alongside the new_message as vision content.
|
||
"""
|
||
if settings.AI_PROVIDER == "anthropic" and settings.ANTHROPIC_API_KEY:
|
||
return await chat_call_cached(
|
||
system_base, rag_context, history, new_message, max_tokens,
|
||
images=images,
|
||
)
|
||
|
||
# Fallback: generic provider (Gemini, etc.) — images not supported
|
||
from app.core.ai_provider import get_ai_provider
|
||
|
||
system_prompt = system_base + rag_context
|
||
messages = history + [{"role": "user", "content": new_message}]
|
||
provider = get_ai_provider()
|
||
return await provider.generate_text(
|
||
system_prompt=system_prompt,
|
||
messages=messages,
|
||
max_tokens=max_tokens,
|
||
)
|
||
|
||
|
||
# Appended to every chat turn's user message immediately before generation.
|
||
# Invisible to storage (unified_chat_service strips markers before persisting),
|
||
# but critical for structured output compliance — the model emits invalid
|
||
# responses often enough without it that removing this reminder regresses UX.
|
||
_CHAT_FORMAT_REMINDER = (
|
||
"\n\n[SYSTEM: Remember — your response MUST end with [QUESTIONS] "
|
||
"and/or [ACTIONS] markers containing valid JSON arrays. "
|
||
"Responses without markers break the UI.]"
|
||
)
|
||
|
||
|
||
async def chat_call_cached(
|
||
system_base: str,
|
||
rag_context: str,
|
||
history: list[dict[str, Any]],
|
||
new_message: str,
|
||
max_tokens: int,
|
||
images: list[dict[str, Any]] | None = None,
|
||
) -> tuple[str, int, int]:
|
||
"""Call Anthropic's chat surface with caching, MCP, images, and retry-without-MCP.
|
||
|
||
This is the ONE MCP/beta/multimodal chat caller. It is deliberately NOT
|
||
routed through `AnthropicProvider`. See module docstring for rationale.
|
||
|
||
Responsibilities unique to this function (not in the provider):
|
||
- Anthropic beta endpoint (`client.beta.messages.create`)
|
||
- Microsoft Learn MCP connector wiring (optional via ENABLE_MCP_MICROSOFT_LEARN)
|
||
- Retry-without-MCP fallback when the MCP server misbehaves
|
||
- Multimodal image blocks in the user message
|
||
- Format-reminder append for structured-output compliance
|
||
- Telemetry (`mcp.turn`, `mcp.fallback`) for Phase 0.5 MCP usage signal
|
||
|
||
Cache plumbing is shared with the provider via helpers in `ai_provider`:
|
||
`_normalize_system_for_anthropic` (policy α — ephemeral on first block if
|
||
none specified), `build_anthropic_chat_messages` (history cache breakpoint +
|
||
multimodal user message + format reminder), `_log_anthropic_cache_usage`.
|
||
"""
|
||
import anthropic
|
||
|
||
client = _get_anthropic_client(
|
||
settings.ANTHROPIC_API_KEY,
|
||
timeout=settings.AI_REQUEST_TIMEOUT_SECONDS,
|
||
)
|
||
|
||
# System prompt as structured blocks. The static base is cacheable; the
|
||
# RAG context changes per query and must NOT be cached — so we mark the
|
||
# base explicitly and leave the RAG block unmarked. `_normalize_system`
|
||
# honors caller-authored cache_control verbatim (policy α).
|
||
system_blocks: list[dict[str, Any]] = [
|
||
{
|
||
"type": "text",
|
||
"text": system_base,
|
||
"cache_control": {"type": "ephemeral"},
|
||
# cacheable: static system prompt, stable across all turns of all sessions
|
||
},
|
||
]
|
||
if rag_context:
|
||
system_blocks.append(
|
||
{"type": "text", "text": rag_context}
|
||
# uncached: RAG retrieval varies per query
|
||
)
|
||
normalized_system = _normalize_system_for_anthropic(system_blocks)
|
||
|
||
messages = build_anthropic_chat_messages(
|
||
history=history,
|
||
new_message=new_message,
|
||
images=images,
|
||
format_reminder=_CHAT_FORMAT_REMINDER,
|
||
)
|
||
|
||
# MCP server config (optional — controlled by settings)
|
||
mcp_servers = anthropic.NOT_GIVEN
|
||
tools = anthropic.NOT_GIVEN
|
||
|
||
if settings.ENABLE_MCP_MICROSOFT_LEARN:
|
||
mcp_servers = [
|
||
{
|
||
"type": "url",
|
||
"url": "https://learn.microsoft.com/api/mcp",
|
||
"name": "microsoft-learn",
|
||
}
|
||
]
|
||
tools = [
|
||
{
|
||
"type": "mcp_toolset",
|
||
"mcp_server_name": "microsoft-learn",
|
||
}
|
||
]
|
||
|
||
_mcp_active = mcp_servers is not anthropic.NOT_GIVEN
|
||
_mcp_fallback_triggered = False
|
||
|
||
try:
|
||
response = await client.beta.messages.create(
|
||
model=settings.AI_MODEL_ANTHROPIC,
|
||
max_tokens=max_tokens,
|
||
system=normalized_system,
|
||
messages=messages,
|
||
mcp_servers=mcp_servers,
|
||
tools=tools,
|
||
betas=["mcp-client-2025-11-20"],
|
||
)
|
||
except Exception as e:
|
||
# MCP server failures surface as many error types — BadRequestError,
|
||
# APIStatusError, APIConnectionError, APITimeoutError. Always retry
|
||
# without MCP when MCP was active, so a flaky external server never
|
||
# blocks the assistant entirely.
|
||
_is_mcp_error = _mcp_active and (
|
||
"MCP server" in str(e)
|
||
or "mcp" in type(e).__name__.lower()
|
||
or isinstance(e, (anthropic.BadRequestError, anthropic.APIStatusError))
|
||
)
|
||
if _is_mcp_error:
|
||
_mcp_fallback_triggered = True
|
||
logger.warning(
|
||
"MCP server error (%s), retrying without MCP: %s",
|
||
type(e).__name__, e,
|
||
)
|
||
# Phase 0.5 telemetry: per-turn fallback event.
|
||
logger.info(
|
||
"mcp.fallback",
|
||
extra={
|
||
"event": "mcp.fallback",
|
||
"mcp_error_type": type(e).__name__,
|
||
"mcp_error_message": str(e)[:500],
|
||
},
|
||
)
|
||
response = await client.messages.create(
|
||
model=settings.AI_MODEL_ANTHROPIC,
|
||
max_tokens=max_tokens,
|
||
system=normalized_system,
|
||
messages=messages,
|
||
)
|
||
else:
|
||
raise
|
||
|
||
# Extract text from response — MCP responses can have multiple block
|
||
# types (text, mcp_tool_use, mcp_tool_result). We join all text blocks.
|
||
text_parts = []
|
||
mcp_tools_used = []
|
||
for block in response.content:
|
||
if hasattr(block, "text"):
|
||
text_parts.append(block.text)
|
||
if getattr(block, "type", None) == "mcp_tool_use":
|
||
mcp_tools_used.append(getattr(block, "name", "unknown"))
|
||
|
||
text = "\n".join(text_parts) if text_parts else ""
|
||
|
||
usage = response.usage
|
||
input_tokens = usage.input_tokens
|
||
output_tokens = usage.output_tokens
|
||
|
||
# Phase 0.5 telemetry: per-turn MCP event. Emitted for every turn that
|
||
# reached this code path (i.e., AI_PROVIDER=anthropic chat). `mcp_available`
|
||
# reflects whether MCP was actually wired into the request (scope (ii) from
|
||
# the Phase 0.5 design — Anthropic code path AND flag on). `mcp_invoked`
|
||
# reflects whether the model chose to call an MCP tool on this turn.
|
||
logger.info(
|
||
"mcp.turn",
|
||
extra={
|
||
"event": "mcp.turn",
|
||
"mcp_available": _mcp_active,
|
||
"mcp_invoked": bool(mcp_tools_used),
|
||
"mcp_tools": mcp_tools_used,
|
||
"mcp_fallback_triggered": _mcp_fallback_triggered,
|
||
},
|
||
)
|
||
|
||
# Human-readable log retained for grep-based inspection.
|
||
if mcp_tools_used:
|
||
logger.info("MCP tools used: %s", ", ".join(mcp_tools_used))
|
||
|
||
_log_anthropic_cache_usage(usage, settings.AI_MODEL_ANTHROPIC)
|
||
|
||
return text, input_tokens, output_tokens
|
||
|
||
|
||
def _auto_title(message: str) -> str:
|
||
"""Generate a short title from the first user message."""
|
||
title = message.strip()[:100]
|
||
if len(message) > 100:
|
||
title = title.rsplit(" ", 1)[0] + "..."
|
||
return title
|