refactor(ai): rename _call_anthropic_cached → chat_call_cached; extract cache plumbing (Phase 0.4)

Renames the chat caller to a name that signals its actual purpose, and factors the reusable cached-system-block + cached-history + cache-usage-log primitives out to app.core.ai_provider so they can be shared with the provider-generic path without pulling MCP/beta/images into the abstract interface.

Helpers added to ai_provider.py:
- `build_anthropic_chat_messages(history, new_message, images, format_reminder)`: owns copying history, applying cache_control to the last history message, appending the format reminder to the new message, and rendering images as multimodal blocks. Anthropic-shaped by design; do not call from Gemini paths.

chat_call_cached keeps exactly the concerns that are unique to the one MCP/beta/multimodal chat caller:
- Anthropic beta endpoint invocation
- Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback
- Format-reminder content string (declared as module constant)
- Phase 0.5 telemetry (mcp.turn, mcp.fallback)

Documents in the module docstring AND at the function site that this is the ONE MCP/beta chat caller and should not become the general provider path. MCP/beta/images are features of exactly one optional Anthropic beta endpoint; routing them through AnthropicProvider would leak a provider-specific concern into the abstract interface that also serves Gemini.

Behavior change: chat_call_cached now reuses the singleton AnthropicProvider HTTP client via `_get_anthropic_client(...)` instead of instantiating a new `anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern and avoids burning connections per-turn. No user-visible difference.

No runtime verification from code-server. TODO(phase0-verify) in ai_provider.py tracks the cache-hit verification owed on the new dev env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
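
For orientation, here is a small sketch of the payload shape that `build_anthropic_chat_messages` produces for a two-message history plus a format reminder. The function body is a condensed restatement of the helper added in this commit; the sample history, the new message, and the reminder string are hypothetical values chosen only for illustration, and nothing here calls the Anthropic API.

```python
from __future__ import annotations

from typing import Any


def build_anthropic_chat_messages(
    history: list[dict[str, Any]],
    new_message: str,
    images: list[dict[str, Any]] | None = None,
    format_reminder: str | None = None,
) -> list[dict[str, Any]]:
    # Condensed restatement of the helper from this commit's diff.
    messages = [{"role": m["role"], "content": m["content"]} for m in history]

    # Mark the last history message as the cache breakpoint (if any history).
    if messages:
        last = messages[-1]
        messages[-1] = {
            "role": last["role"],
            "content": [
                {
                    "type": "text",
                    "text": last["content"],
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }

    text = new_message + (format_reminder or "")
    if images:
        # Images first, then the text block.
        blocks: list[dict[str, Any]] = [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": img["media_type"],
                    "data": img["data"],
                },
            }
            for img in images
        ]
        blocks.append({"type": "text", "text": text})
        messages.append({"role": "user", "content": blocks})
    else:
        messages.append({"role": "user", "content": text})
    return messages


# Hypothetical sample inputs, not taken from the codebase.
history = [
    {"role": "user", "content": "What is Azure Bicep?"},
    {"role": "assistant", "content": "Bicep is a DSL for Azure deployments."},
]
payload = build_anthropic_chat_messages(
    history, "Show an example.", format_reminder="\n\nReply in JSON."
)
```

In this sketch the first history message passes through unchanged, the second is rewritten as a single text block carrying `cache_control`, and the new user message stays a plain string with the reminder appended.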
2026-04-17 17:03:09 +00:00
parent da93ae55c3
commit 3f0a132058
2 changed files with 138 additions and 68 deletions
--- a/backend/app/core/ai_provider.py
+++ b/backend/app/core/ai_provider.py
@@ -80,6 +80,73 @@ def _flatten_system_for_gemini(
    return "\n\n".join(b.get("text", "") for b in system_prompt)


+def build_anthropic_chat_messages(
+    history: list[dict[str, Any]],
+    new_message: str,
+    images: list[dict[str, Any]] | None = None,
+    format_reminder: str | None = None,
+) -> list[dict[str, Any]]:
+    """Construct the Anthropic `messages` payload for a cached multi-turn chat.
+
+    Responsibilities:
+    - Copy the history messages in order (role and content only).
+    - Apply `cache_control: ephemeral` to the LAST history message so the entire
+      conversation prefix is cached across turns. The new user message stays
+      uncached (it changes each turn).
+    - Append `format_reminder` to the new user message if provided. The reminder
+      is invisible to storage (caller's concern) but helps enforce structured
+      output compliance at generation time.
+    - If `images` are provided, render the new user message as a multimodal
+      content block list (images first, then text). Otherwise, render it as
+      a plain string.
+
+    This helper is Anthropic-specific: the cache-breakpoint pattern, ephemeral
+    cache_control, and multimodal block shape are all Anthropic conventions.
+    Do not call it from Gemini code paths.
+    """
+    messages: list[dict[str, Any]] = []
+    for msg in history:
+        messages.append({"role": msg["role"], "content": msg["content"]})
+
+    # Cache breakpoint on the last existing history message so the entire
+    # conversation prefix is cached across turns. Assumes that message's
+    # content is a plain string; with no history, nothing is marked.
+    if messages:
+        last = messages[-1]
+        messages[-1] = {
+            "role": last["role"],
+            "content": [
+                {
+                    "type": "text",
+                    "text": last["content"],
+                    "cache_control": {"type": "ephemeral"},
+                }
+            ],
+        }
+
+    effective_text = new_message + (format_reminder or "")
+
+    if images:
+        content_blocks: list[dict[str, Any]] = []
+        for img in images:
+            content_blocks.append(
+                {
+                    "type": "image",
+                    "source": {
+                        "type": "base64",
+                        "media_type": img["media_type"],
+                        "data": img["data"],
+                    },
+                }
+            )
+        content_blocks.append({"type": "text", "text": effective_text})
+        messages.append({"role": "user", "content": content_blocks})
+    else:
+        messages.append({"role": "user", "content": effective_text})
+
+    return messages
+
+
 def _log_anthropic_cache_usage(usage: Any, model: str) -> None:
    """Emit a structured log line capturing cache_read / cache_creation tokens."""
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0