diff --git a/docs/FlowAssist_Migration/FLOWPILOT-MIGRATION.md b/docs/FlowAssist_Migration/FLOWPILOT-MIGRATION.md index 6009b7cb..a6cd9e5f 100644 --- a/docs/FlowAssist_Migration/FLOWPILOT-MIGRATION.md +++ b/docs/FlowAssist_Migration/FLOWPILOT-MIGRATION.md @@ -557,6 +557,21 @@ A codebase audit revealed that prompt caching is only implemented in `assistant_ - **0.1** Promote `AnthropicProvider.generate_json()` and `generate_text_stream()` in `ai_provider.py` to the cached pattern currently implemented in `assistant_chat_service.py:_call_anthropic_cached()`. Convert the `system` string parameter to a structured system block list with `cache_control: {"type": "ephemeral"}` on the static portion. Add a second breakpoint on the last history message. For the streaming variant, capture the final usage object via `get_final_message()`. Log `cache_read_input_tokens` and `cache_creation_input_tokens` on every response. - **0.2** Update `integrations.py:557` (`/tickets/ai-parse`) to move the members list and team-stable boards data into a cached system block. + + > **Phase 0.2 — pending target endpoint.** The `/tickets/ai-parse` endpoint described in the original migration doc does not exist in the codebase as of this commit. When this endpoint is built, apply the cached-system-block pattern: + > + > ```python + > system_blocks = [ + > {"type": "text", "text": members_json, "cache_control": {"type": "ephemeral"}}, + > # cacheable: team-stable + > {"type": "text", "text": boards_json, "cache_control": {"type": "ephemeral"}}, + > # cacheable: team-stable + > {"type": "text", "text": engineer_description}, + > # uncached: per-request + > ] + > ``` + > + > Remove this note when the endpoint is implemented and the pattern applied. - **0.3** Add `cache_control` to one-shot generators: `ai_tree_generator`, `kb_conversion`, `ai_fix`, `script_builder`. Same pattern as 0.1. - **0.4** Extract the caching logic from `assistant_chat_service.py:_call_anthropic_cached()` into `AnthropicProvider` and delete `_call_anthropic_cached`. `assistant_chat_service` should call the provider like every other service. This prevents two canonical implementations of the same pattern.