feat(pilot): Phase 2 — What we know (facts) with stable task-lane IDs

Adds the load-bearing structural feature of the FlowPilot migration: a "What we know" panel that holds confirmed facts for a session, fed by AI [PROMOTE] markers and engineer-added notes. Facts feed the resolution note preview (Phase 3) and survive across turns via stable UUIDs assigned to pending_task_lane items.

Backend:
- FactSynthesisService: create/update/soft-delete facts with atomic state_version bumps; LLM-backed synthesize_from_question/check on the fact_synthesis (Haiku) action tier per Section 6.6.
- /api/v1/ai-sessions/{id}/facts CRUD + /facts/promote (proposed_text or via synthesis). PATCH returns 403 for question/diagnostic_check facts (edit the source item instead, Section 7.3).
- unified_chat_service: [PROMOTE] marker parser (JSON-block per Section 8.1 spec drift note), stable-UUID assignment for pending_task_lane questions/actions preserved by exact text/label match across turns.
- ASSISTANT_SYSTEM_PROMPT: documents [PROMOTE] format, when to/not to emit, hallucination guardrails, source_ref handling.
- 17 tests covering parser, stable IDs, service validation, CRUD, editability rule, both promote modes, 422 null-synthesis path, state_version invariant.

Frontend:
- src/components/pilot/sections/{WhatWeKnow,WhatWeKnowItem,AddNoteButton}: green-gradient section above Questions, dashed-circle check, inline edit/delete gated by the server's editable flag.
- TaskLane gains a whatWeKnowSlot prop (existing assistant/ folder kept per the doc's "rename is opportunistic" guidance).
- AssistantChatPage fetches facts on selectChat and refetches after each chat send (so [PROMOTE]-synthesized facts appear immediately); auto-opens the lane when facts exist.

Verification: an end-to-end smoke test against the local docker stack confirms all five endpoints (list/create/patch/delete/promote) plus the 403 editability rule. The pytest suite verifies the same with a mocked LLM. The live [PROMOTE] flow remains untested until used in the UI; the marker shape is covered by parser tests.
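The stable-ID rule in the unified_chat_service bullet (an item keeps its UUID when its exact text or label reappears on the next turn) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's code: the dict shapes (`text` for questions, `label` for actions, `id` for the UUID) and the name `assign_stable_ids` are hypothetical.

```python
# Minimal sketch (not the service's code): keep a task-lane item's UUID
# stable across turns when its exact text (questions) or label (actions)
# reappears. The field names "text", "label", and "id" are assumptions.
from __future__ import annotations

import uuid
from typing import Any


def _match_key(item: dict[str, Any]) -> tuple[str, str] | None:
    """Key an item by its kind plus its exact, unnormalized string."""
    if isinstance(item.get("text"), str):
        return ("text", item["text"])
    if isinstance(item.get("label"), str):
        return ("label", item["label"])
    return None


def assign_stable_ids(
    previous: list[dict[str, Any]], current: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return copies of the `current` items, each carrying an "id".

    An item reuses the ID of a previous-turn item with the same key;
    anything new (or a duplicate within this turn) gets a fresh uuid4.
    """
    prior_ids: dict[tuple[str, str], str] = {}
    for item in previous:
        key = _match_key(item)
        if key is not None and "id" in item:
            prior_ids.setdefault(key, item["id"])

    used: set[str] = set()
    result: list[dict[str, Any]] = []
    for item in current:
        key = _match_key(item)
        reused = prior_ids.get(key) if key is not None else None
        # Never hand the same old ID to two items in one turn.
        if reused is not None and reused not in used:
            item_id = reused
        else:
            item_id = str(uuid.uuid4())
        used.add(item_id)
        result.append({**item, "id": item_id})  # copy; input is not mutated
    return result


turn1 = assign_stable_ids([], [{"text": "Is MFA enabled?"}, {"label": "Check license"}])
turn2 = assign_stable_ids(turn1, [{"text": "Is MFA enabled?"}, {"text": "Which tenant?"}])
# The repeated question keeps its ID; the new question gets a fresh UUID.
print(turn2[0]["id"] == turn1[0]["id"])  # True
```

Keying on the kind as well as the string keeps a question and an action with identical wording from sharing an ID, and the `used` set stops two same-text items in one turn from collapsing onto one UUID. Whether the real parser handles those edge cases the same way is not stated in this commit.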
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:13:44 -04:00
parent 19cfd71995
commit 625dba7548
15 changed files with 1922 additions and 21 deletions
--- /dev/null
+++ b/backend/app/services/fact_synthesis_service.py
@@ -0,0 +1,285 @@
+"""FactSynthesisService — converts engineer answers and check output into facts.
+
+Two paths feed this service:
+
+1. **AI marker path (the common case).** When the model emits a `[PROMOTE]`
+   marker in the chat stream, `unified_chat_service` parses the marker (which
+   already contains the engineer-readable `text` and short provenance `summary`)
+   and calls `create_fact` directly. No LLM call is needed — the model already
+   wrote the fact.
+
+2. **Engineer-driven synthesize path.** The "+ Promote to What we know" affordance
+   in the UI sends a raw answer or check output and asks the server to draft
+   `text` + `summary` for review. `synthesize_from_question` /
+   `synthesize_from_check` make a small Haiku call for that draft. The engineer
+   confirms (or edits) before persistence, so the LLM output is never
+   silently posted to a customer ticket.
+
+Either way, persistence funnels through `create_fact`, which ALSO bumps
+`ai_sessions.state_version` so the resolution-note preview cache invalidates
+(see FLOWPILOT-MIGRATION.md Section 5.5).
+
+Model tier is `fact_synthesis` in `settings.ACTION_MODEL_MAP` (Haiku per
+Section 6.6). MCP is intentionally disabled for synthesis — these are
+pure transformations of input, not research calls.
+"""
+from __future__ import annotations
+
+import json
+import logging
+import re
+from typing import Any
+from uuid import UUID
+
+from sqlalchemy import select, update
+from sqlalchemy.ext.asyncio import AsyncSession
+
+from app.core.ai_provider import get_ai_provider
+from app.core.config import settings
+from app.models.ai_session import AISession
+from app.models.session_fact import SessionFact
+
+logger = logging.getLogger(__name__)
+
+
+# Conservative synthesis prompt. Hallucinated specifics are a trust-killer
+# because facts feed the resolution note posted to customer tickets — the
+# prompt makes "no fact" an explicit, valid output.
+_SYNTHESIS_SYSTEM_PROMPT = """\
+You convert one engineer answer or one diagnostic-check output into a single \
+candidate fact for a troubleshooting session's "What we know" log.
+
+Return strict JSON with this shape:
+{
+  "text": "<one short sentence stating what is now known, in past tense>",
+  "summary": "<3-7 word provenance label, e.g. 'rules out tenant/license'>"
+}
+
+If the answer/output does NOT contain a substantive fact (e.g. the engineer \
+typed 'unknown', the command failed, the output is empty), return:
+{
+  "text": null,
+  "summary": null
+}
+
+Strict rules:
+- Use ONLY information present in the input. Never add details that were not stated.
+- Do not speculate, infer causes, or extrapolate. State only what the input proves.
+- The text is a fact a colleague could verify by looking at the original answer/output.
+- The summary names the diagnostic value (what this fact rules in or out), not the topic.
+- Output ONLY the JSON object, no prose, no markdown fences.
+"""
+
+
+class FactSynthesisService:
+    """Persists session facts and (optionally) drafts them via an LLM call.
+
+    Methods use the `AsyncSession` passed to the constructor and assume the
+    caller commits. Each write flushes, so a new fact has its primary key.
+    """
+
+    def __init__(self, db: AsyncSession) -> None:
+        self.db = db
+
+    # ── Persistence ────────────────────────────────────────────────────────
+
+    async def create_fact(
+        self,
+        *,
+        session_id: UUID,
+        account_id: UUID,
+        user_id: UUID,
+        source_type: str,
+        text: str,
+        summary: str | None = None,
+        source_ref: UUID | None = None,
+    ) -> SessionFact:
+        """Persist a fact and bump the session's preview-cache version.
+
+        `source_ref` MUST be None for `user_note` and `ai_synthesis` sources;
+        for `question` and `diagnostic_check` it should point at the stable
+        UUID of the originating task-lane item. The DB has no FK constraint
+        on `source_ref` (the target lives inside JSONB), so this method enforces
+        the None rule above; it does not check that the referenced item exists.
+        """
+        if source_type not in ("question", "diagnostic_check", "user_note", "ai_synthesis"):
+            raise ValueError(f"Invalid source_type: {source_type}")
+
+        if source_type in ("user_note", "ai_synthesis") and source_ref is not None:
+            # `source_ref` is a back-pointer to a question/check; user notes
+            # and AI-synthesized facts have no source item to point at.
+            raise ValueError(
+                f"source_ref must be None for source_type={source_type}"
+            )
+
+        text = (text or "").strip()
+        if not text:
+            raise ValueError("Fact text cannot be empty")
+
+        fact = SessionFact(
+            session_id=session_id,
+            account_id=account_id,
+            text=text,
+            source_type=source_type,
+            source_ref=source_ref,
+            source_summary=(summary or "").strip() or None,
+            created_by=user_id,
+        )
+        self.db.add(fact)
+
+        # Invalidate any preview cached against the prior state_version.
+        # Single UPDATE so the bump is atomic relative to the fact insert
+        # within this transaction; concurrent writers serialize on the row.
+        await self.db.execute(
+            update(AISession)
+            .where(AISession.id == session_id)
+            .values(state_version=AISession.state_version + 1)
+        )
+        await self.db.flush()
+        return fact
+
+    async def soft_delete_fact(self, fact: SessionFact) -> None:
+        """Mark a fact deleted and bump state_version."""
+        from datetime import datetime, timezone
+
+        fact.deleted_at = datetime.now(timezone.utc)
+        await self.db.execute(
+            update(AISession)
+            .where(AISession.id == fact.session_id)
+            .values(state_version=AISession.state_version + 1)
+        )
+        await self.db.flush()
+
+    async def update_fact(
+        self,
+        fact: SessionFact,
+        *,
+        text: str | None = None,
+        summary: str | None = None,
+    ) -> SessionFact:
+        """Update an editable fact and bump state_version.
+
+        Caller is responsible for the editability check — only `user_note`
+        and `ai_synthesis` facts may be edited at the card level. The
+        endpoint enforces this and returns 403 for the read-only types.
+        """
+        if text is not None:
+            stripped = text.strip()
+            if not stripped:
+                raise ValueError("Fact text cannot be empty")
+            fact.text = stripped
+        if summary is not None:
+            fact.source_summary = summary.strip() or None
+
+        await self.db.execute(
+            update(AISession)
+            .where(AISession.id == fact.session_id)
+            .values(state_version=AISession.state_version + 1)
+        )
+        await self.db.flush()
+        return fact
+
+    # ── LLM-backed drafting ────────────────────────────────────────────────
+
+    async def synthesize_from_question(
+        self, *, question_text: str, raw_answer: str
+    ) -> dict[str, str | None]:
+        """Draft `{text, summary}` from a question + engineer's free-text answer.
+
+        Returns `{"text": None, "summary": None}` when the answer doesn't
+        contain a substantive fact — caller should not persist in that case.
+        """
+        return await self._synthesize(
+            user_input=(
+                f"Question asked: {question_text.strip()}\n"
+                f"Engineer's answer: {raw_answer.strip()}"
+            ),
+        )
+
+    async def synthesize_from_check(
+        self, *, check_label: str, check_output: str
+    ) -> dict[str, str | None]:
+        """Draft `{text, summary}` from a diagnostic check label + its output."""
+        return await self._synthesize(
+            user_input=(
+                f"Diagnostic check: {check_label.strip()}\n"
+                f"Output:\n{check_output.strip()}"
+            ),
+        )
+
+    async def _synthesize(self, *, user_input: str) -> dict[str, str | None]:
+        """Single Haiku call with the conservative synthesis prompt."""
+        model = settings.get_model_for_action("fact_synthesis")
+        provider = get_ai_provider(model=model)
+
+        # Cache the system prompt — it's identical across every synthesis call.
+        system_blocks: list[dict[str, Any]] = [
+            {
+                "type": "text",
+                "text": _SYNTHESIS_SYSTEM_PROMPT,
+                "cache_control": {"type": "ephemeral"},
+                # cacheable: identical across all fact-synthesis calls
+            },
+        ]
+
+        try:
+            text, _in, _out = await provider.generate_json(
+                system_prompt=system_blocks,
+                messages=[{"role": "user", "content": user_input}],
+                max_tokens=200,
+            )
+        except Exception:
+            logger.exception("Fact synthesis LLM call failed")
+            return {"text": None, "summary": None}
+
+        return self._parse_synthesis_response(text)
+
+    @staticmethod
+    def _parse_synthesis_response(raw: str) -> dict[str, str | None]:
+        """Tolerant parse: strip fences, accept null fields, ignore extras."""
+        cleaned = raw.strip()
+        if cleaned.startswith("```"):
+            cleaned = re.sub(r"^```(?:json)?\s*", "", cleaned)
+            cleaned = re.sub(r"\s*```$", "", cleaned)
+
+        try:
+            data = json.loads(cleaned)
+        except (json.JSONDecodeError, ValueError):
+            logger.warning("Fact synthesis returned non-JSON: %r", raw[:200])
+            return {"text": None, "summary": None}
+
+        if not isinstance(data, dict):
+            return {"text": None, "summary": None}
+
+        text = data.get("text")
+        summary = data.get("summary")
+        if text is not None and not isinstance(text, str):
+            text = None
+        if summary is not None and not isinstance(summary, str):
+            summary = None
+
+        # Treat empty strings the same as null — "no substantive fact".
+        if isinstance(text, str) and not text.strip():
+            text = None
+        if isinstance(summary, str) and not summary.strip():
+            summary = None
+
+        return {"text": text, "summary": summary}
+
+
+async def list_facts_for_session(
+    db: AsyncSession, session_id: UUID
+) -> list[SessionFact]:
+    """List non-deleted facts for a session, oldest first.
+
+    RLS filters by tenant; the explicit account_id check is unnecessary here.
+    """
+    result = await db.execute(
+        select(SessionFact)
+        .where(
+            SessionFact.session_id == session_id,
+            SessionFact.deleted_at.is_(None),
+        )
+        .order_by(SessionFact.created_at.asc())
+    )
+    return list(result.scalars().all())
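The preview-cache invalidation that `create_fact`, `update_fact`, and `soft_delete_fact` rely on can be illustrated with the standard-library `sqlite3` module standing in for Postgres. This is a sketch under assumptions: the table names echo the models above, but `render_preview` and its dict cache are hypothetical stand-ins for the Phase 3 preview, which is not part of this diff. It shows the in-database `state_version + 1` increment committed in the same transaction as the insert, plus a cache keyed on `(session_id, state_version)`; it does not demonstrate concurrent writers.

```python
# Sketch with sqlite3 standing in for Postgres. Table names are assumptions;
# render_preview() and its dict cache are hypothetical (not in this diff).
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ai_sessions (
        id TEXT PRIMARY KEY,
        state_version INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE session_facts (
        session_id TEXT NOT NULL REFERENCES ai_sessions(id),
        text TEXT NOT NULL
    );
    INSERT INTO ai_sessions (id) VALUES ('s1');
    """
)


def add_fact(session_id: str, text: str) -> None:
    """Insert a fact and bump state_version in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        conn.execute(
            "INSERT INTO session_facts (session_id, text) VALUES (?, ?)",
            (session_id, text),
        )
        # The database computes the increment; no read-modify-write in Python.
        conn.execute(
            "UPDATE ai_sessions SET state_version = state_version + 1 WHERE id = ?",
            (session_id,),
        )


def current_version(session_id: str) -> int:
    row = conn.execute(
        "SELECT state_version FROM ai_sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0]


preview_cache: dict[tuple[str, int], str] = {}
stats = {"renders": 0}


def render_preview(session_id: str) -> str:
    """Return a cached preview, re-rendering only when state_version moved."""
    key = (session_id, current_version(session_id))
    if key not in preview_cache:
        stats["renders"] += 1
        rows = conn.execute(
            "SELECT text FROM session_facts WHERE session_id = ? ORDER BY rowid",
            (session_id,),
        )
        preview_cache[key] = "\n".join(r[0] for r in rows)
    return preview_cache[key]


add_fact("s1", "MFA was enabled for the user.")
render_preview("s1")
render_preview("s1")          # same version: served from the cache
add_fact("s1", "A license was assigned.")
print(render_preview("s1"))   # new version: re-rendered with both facts
print(stats["renders"])       # 2
```

Because the key includes the version, a bump makes every older entry unreachable without an explicit delete; a production cache would still need eviction for those stale entries.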