docs: add AI auto-fix and Gemini Flash provider design

Design for two combined features: Gemini 2.5 Flash as primary AI provider with Claude fallback, and AI-powered auto-fix for validation errors in the tree editor. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:28:33 -05:00
parent 1002f0c177
commit 6527b33d05
1 changed files with 209 additions and 0 deletions
--- a/docs/plans/2026-02-26-ai-autofix-gemini-design.md
+++ b/docs/plans/2026-02-26-ai-autofix-gemini-design.md
@@ -0,0 +1,209 @@
+# AI Auto-Fix & Gemini Flash Provider Design
+
+> **Date:** 2026-02-26
+> **Status:** Approved
+
+---
+
+## Overview
+
+Two combined features:
+
+1. **AI Provider Abstraction** — Add Gemini 2.5 Flash as the default AI provider with Claude as fallback, behind a unified interface.
+2. **AI Auto-Fix for Validation Errors** — When a flow fails validation, offer an AI-powered "Fix with AI" button that generates structural fixes for review.
+
+---
+
+## Section 1: AI Provider Abstraction
+
+### Design
+
+New `backend/app/core/ai_provider.py` with a unified interface:
+
+```python
+class AIProvider(ABC):
+    async def generate_json(
+        self,
+        system_prompt: str,
+        messages: list[dict],
+        max_tokens: int = 4096,
+    ) -> tuple[str, int, int]:
+        """Returns (text, input_tokens, output_tokens)"""
+```
+
+Two implementations:
+
+| Provider | Model | SDK | Role |
+|----------|-------|-----|------|
+| `GeminiProvider` | `gemini-2.5-flash` | `google-genai` | Default |
+| `AnthropicProvider` | `claude-haiku-4-5-20251001` | `anthropic` | Fallback |
+
+### Provider Selection
+
+- `get_ai_provider()` factory reads `AI_PROVIDER` env var (default: `"gemini"`)
+- Falls back to Anthropic if Gemini key is missing
+- Existing `ai_tree_generator_service.py` swaps direct Anthropic calls for `get_ai_provider()`
+
+### New Environment Variables
+
+| Variable | Default | Purpose |
+|----------|---------|---------|
+| `AI_PROVIDER` | `"gemini"` | Which provider to use (`gemini` or `anthropic`) |
+| `GOOGLE_AI_API_KEY` | — | Gemini API key |
+
+Existing `ANTHROPIC_API_KEY` remains for fallback.
+
+### Config Changes (`core/config.py`)
+
+```python
+AI_PROVIDER: str = "gemini"
+GOOGLE_AI_API_KEY: str | None = None
+AI_MODEL_GEMINI: str = "gemini-2.5-flash"
+AI_MODEL_ANTHROPIC: str = "claude-haiku-4-5-20251001"
+```
+
+---
+
+## Section 2: AI Auto-Fix Feature
+
+### Backend Endpoint
+
+**`POST /api/v1/ai/fix-tree`**
+
+Request:
+```json
+{
+  "tree_structure": { /* full tree */ },
+  "tree_name": "Router Troubleshooting",
+  "tree_type": "troubleshooting",
+  "validation_errors": [
+    {
+      "node_id": "node_abc",
+      "message": "Decision node must have at least 2 children (branches)"
+    }
+  ]
+}
+```
+
+Response:
+```json
+{
+  "fixes": [
+    {
+      "target_node_id": "node_abc",
+      "error_message": "Decision node must have at least 2 children (branches)",
+      "description": "Added second branch 'Check firmware version' with solution node",
+      "original_node": { /* snapshot before fix */ },
+      "fixed_node": { /* replacement node with corrected subtree */ }
+    }
+  ],
+  "tokens_used": { "input": 1200, "output": 800 }
+}
+```
+
+### How It Works
+
+1. For each validation error tied to a `node_id`, extract that node + its parent + siblings from the tree.
+2. Build a prompt with:
+   - The **full tree structure** serialized as a simplified outline (node titles + types + structure) for context
+   - The **specific failing node** highlighted with full JSON detail
+   - The **validation error message**
+   - Instructions: "Fix ONLY this node's structural issue. Keep all existing content. Generate domain-relevant additions that fit the flow's topic."
+3. AI returns a corrected version of that node (with children/options adjusted).
+4. Backend re-validates the fixed node before returning it.
+5. If re-validation fails, retry once with the error fed back (corrective prompt pattern).
+
+### Prompt Strategy
+
+The prompt gives the AI the full tree as a compact outline, then zooms into the failing node:
+
+```
+You are fixing a validation error in a troubleshooting flow called "Router Troubleshooting".
+
+FULL FLOW OUTLINE:
+- [decision] Is the router powered on?
+  - [action] Check power cable → [solution] Power restored
+  - [decision] Are lights blinking? ← ERROR HERE
+    - [solution] Contact ISP
+
+ERROR: Decision node "Are lights blinking?" must have at least 2 children (branches).
+
+FAILING NODE (full detail):
+{...json...}
+
+Fix this node by adding the minimum structure needed to resolve the error.
+Return ONLY the fixed node as JSON.
+```
+
+### Frontend UX
+
+1. **Trigger**: "Fix with AI" button in `ValidationSummary` — appears when there are fixable errors (structural errors with a `node_id`).
+2. **Loading state**: Button shows spinner + "Generating fixes..." — disabled during request.
+3. **Review modal** (`AIFixReviewModal`): Shows each proposed fix as a card:
+   - Error message at top
+   - Before/after view of the node change
+   - "Apply" / "Skip" buttons per fix
+   - "Apply All" button in footer
+4. **Apply**: Each accepted fix calls `updateNode(targetNodeId, fixedNode)` in the tree editor store.
+5. **Re-validate**: After applying fixes, auto-run `validate()` to confirm resolution.
+
+---
+
+## Section 3: Scope & Constraints
+
+### Fixable Errors (Auto-Fix Scope)
+
+Only structural validation errors with a `node_id`:
+- Decision node missing children/branches
+- Decision node missing options
+- Action node missing `next_node_id`
+- Dead-end decision nodes (no children)
+
+### NOT Fixable
+
+- Global checks (tree too small/large, not enough solutions) — require rethinking the whole tree
+- Content quality issues — out of scope
+- Errors without a `node_id` (root-level issues)
+
+Non-fixable errors still show in ValidationSummary but without the "Fix with AI" option.
+
+### Token Budget
+
+- Tree outline: ~50-100 tokens for a typical 15-node tree
+- Failing node detail: ~100-200 tokens
+- System prompt + instructions: ~300 tokens
+- **Total input per fix: ~500-600 tokens**
+- One API call per failing node (not batched)
+
+### Error Handling
+
+- Provider failure (rate limit, network): toast error, user can retry
+- Fix fails re-validation: "AI couldn't generate a valid fix" with retry option
+- Max 1 retry with corrective prompt per attempt
+- Both provider and fallback fail: surface error to user
+
+### Auth
+
+- Requires `engineer` role or above (`require_engineer_or_admin`)
+
+---
+
+## New Files
+
+| File | Purpose |
+|------|---------|
+| `backend/app/core/ai_provider.py` | Provider abstraction + Gemini/Anthropic implementations |
+| `backend/app/core/ai_fix_service.py` | Fix generation logic + prompt building |
+| `backend/app/api/endpoints/ai.py` | `POST /ai/fix-tree` endpoint |
+| `backend/app/schemas/ai.py` | Request/response schemas for AI endpoints |
+| `frontend/src/components/tree-editor/AIFixReviewModal.tsx` | Review modal for proposed fixes |
+
+## Modified Files
+
+| File | Change |
+|------|--------|
+| `backend/app/core/config.py` | Add Gemini config vars |
+| `backend/app/core/ai_tree_generator_service.py` | Swap Anthropic calls for provider abstraction |
+| `backend/app/api/router.py` | Register `/ai` routes |
+| `frontend/src/api/trees.ts` | Add `fixTree()` API call |
+| `frontend/src/components/tree-editor/ValidationSummary.tsx` | Add "Fix with AI" button |