refactor(ai): rename _call_anthropic_cached → chat_call_cached; extract cache plumbing (Phase 0.4)
Renames the chat caller to a name that signals its actual purpose, and
factors the reusable cached-system-block + cached-history + cache-usage-log
primitives out to app.core.ai_provider so they can be shared with the
provider-generic path without pulling MCP/beta/images into the abstract
interface.

Helpers added to ai_provider.py:
- `build_anthropic_chat_messages(history, new_message, images, format_reminder)`
  owns: copy history, apply cache_control to last history message, append
  format reminder to new message, render images as multimodal blocks.
  Anthropic-shaped by design; do not call from Gemini paths.

chat_call_cached keeps exactly the concerns that are unique to the one
MCP/beta/multimodal chat caller:
- Anthropic beta endpoint invocation
- Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback
- Format-reminder content string (declared as module constant)
- Phase 0.5 telemetry (mcp.turn, mcp.fallback)

Documents in the module docstring AND at the function site that this is the
ONE MCP/beta chat caller and should not become the general provider path.
MCP/beta/images are features of exactly one optional Anthropic beta endpoint;
routing them through AnthropicProvider would leak a provider-specific concern
into the abstract interface that also serves Gemini.

Behavior change: chat_call_cached now reuses the singleton AnthropicProvider
HTTP client via `_get_anthropic_client(...)` instead of instantiating a new
`anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern
and avoids burning connections per-turn. No user-visible difference.

No runtime verification from code-server. TODO(phase0-verify) in
ai_provider.py tracks the cache-hit verification owed on the new dev env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
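The behavior change described in the message (reusing one client instead of constructing one per call) can be illustrated with a minimal sketch. Everything here is hypothetical: `FakeAsyncClient` stands in for an SDK client that owns a connection pool, and `get_shared_client` is an illustrative name keyed by API key. The real `_get_anthropic_client(...)` signature is not shown in this commit, so this sketches the general pattern and not the project's implementation.

```python
from __future__ import annotations


class FakeAsyncClient:
    """Hypothetical stand-in for an SDK client that owns a connection pool."""

    instances_created = 0

    def __init__(self, api_key: str) -> None:
        # Count constructions so the reuse is observable.
        FakeAsyncClient.instances_created += 1
        self.api_key = api_key


# One cached client per API key, created lazily on first use (assumed keying).
_clients: dict[str, FakeAsyncClient] = {}


def get_shared_client(api_key: str) -> FakeAsyncClient:
    """Return the cached client for `api_key`, creating it once if needed."""
    client = _clients.get(api_key)
    if client is None:
        client = FakeAsyncClient(api_key)
        _clients[api_key] = client
    return client


# Two "turns" with the same key share one client object.
first = get_shared_client("key-a")
second = get_shared_client("key-a")
```

With this shape, each chat turn looks the client up instead of paying for a fresh connection pool, which is the saving the message describes.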
@@ -80,6 +80,73 @@ def _flatten_system_for_gemini(
    return "\n\n".join(b.get("text", "") for b in system_prompt)


def build_anthropic_chat_messages(
    history: list[dict[str, Any]],
    new_message: str,
    images: list[dict[str, Any]] | None = None,
    format_reminder: str | None = None,
) -> list[dict[str, Any]]:
    """Construct the Anthropic `messages` payload for a cached multi-turn chat.

    Responsibilities:
    - Copy the valid history messages in order.
    - Apply `cache_control: ephemeral` to the LAST history message so the entire
      conversation prefix is cached across turns. The new user message stays
      uncached (it changes each turn).
    - Append `format_reminder` to the new user message if provided. The reminder
      is invisible to storage (caller's concern) but helps enforce structured
      output compliance at generation time.
    - If `images` are provided, render the new user message as a multimodal
      content block list (images first, then text). Otherwise, render it as
      a plain string.

    This helper is Anthropic-specific: the cache-breakpoint pattern, ephemeral
    cache_control, and multimodal block shape are all Anthropic conventions.
    Do not call it from Gemini code paths.
    """
    messages: list[dict[str, Any]] = []
    for msg in history:
        messages.append({"role": msg["role"], "content": msg["content"]})

    # Cache breakpoint on the last existing history message so the entire
    # conversation prefix is cached across turns. Safe only when there IS a
    # history message; otherwise the new message is the only message.
    if messages:
        last = messages[-1]
        messages[-1] = {
            "role": last["role"],
            "content": [
                {
                    "type": "text",
                    "text": last["content"],
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }

    effective_text = new_message + (format_reminder or "")

    if images:
        content_blocks: list[dict[str, Any]] = []
        for img in images:
            content_blocks.append(
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": img["media_type"],
                        "data": img["data"],
                    },
                }
            )
        content_blocks.append({"type": "text", "text": effective_text})
        messages.append({"role": "user", "content": content_blocks})
    else:
        messages.append({"role": "user", "content": effective_text})

    return messages


def _log_anthropic_cache_usage(usage: Any, model: str) -> None:
    """Emit a structured log line capturing cache_read / cache_creation tokens."""
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
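To make the payload shape of `build_anthropic_chat_messages` concrete, here is a small worked example. So that it runs without importing `app.core.ai_provider`, it uses a condensed restatement of the helper under the hypothetical name `sketch_chat_messages`; the sample history and reminder string are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def sketch_chat_messages(
    history: list[dict[str, Any]],
    new_message: str,
    images: list[dict[str, Any]] | None = None,
    format_reminder: str | None = None,
) -> list[dict[str, Any]]:
    """Condensed restatement of the helper in this diff (illustrative only)."""
    messages = [{"role": m["role"], "content": m["content"]} for m in history]
    if messages:
        last = messages[-1]
        # Cache breakpoint: wrap the last history message in a text block.
        messages[-1] = {
            "role": last["role"],
            "content": [
                {
                    "type": "text",
                    "text": last["content"],
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        }
    text = new_message + (format_reminder or "")
    if not images:
        messages.append({"role": "user", "content": text})
        return messages
    # Images first, then the text block, matching the docstring above.
    blocks: list[dict[str, Any]] = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": img["media_type"],
                "data": img["data"],
            },
        }
        for img in images
    ]
    blocks.append({"type": "text", "text": text})
    messages.append({"role": "user", "content": blocks})
    return messages


history = [
    {"role": "user", "content": "What is a cache breakpoint?"},
    {"role": "assistant", "content": "A marker that ends a cacheable prefix."},
]
payload = sketch_chat_messages(
    history, "Show an example.", format_reminder="\n\nReply in JSON."
)
```

Only the last history message is rewritten into a block list carrying `cache_control`; earlier messages and the new user message stay plain strings. The reminder is concatenated directly onto the new message, so any separator (here `"\n\n"`) has to be part of the reminder string itself.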