fix(pilot): outcome-aware Resolve/Escalate previews

Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't enough — previews were also omitting outcome fields from the LLM bundle, so a fresh regenerate still couldn't distinguish proposed / failed / partial / success. - PATCH /outcome now bumps ai_sessions.state_version (matches record_decision's existing pattern). - Resolution-note + escalation-package bundles now include status, applied_at, verified_at, partial_notes, failure_reason on the active fix. - Generator prompts prescribe outcome-aware phrasing (closure language for success; what-we've-tried + next-steps for failed/partial). - New end-to-end test asserts the regenerated preview reflects the recorded outcome, not just that the cache key changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:04:56 -04:00
parent ec104dc8de
commit 362c7b1d79
4 changed files with 208 additions and 15 deletions
--- a/backend/app/services/escalation_package_generator.py
+++ b/backend/app/services/escalation_package_generator.py
@@ -55,22 +55,45 @@ header.>
 If there are no facts, write "Nothing confirmed yet." and continue.>

 ## What we've tried
-<bulleted list of diagnostic checks run (from the [diagnostic_check] facts) \
-and scripts generated during the session. State what each revealed or did, \
-not what was attempted without an outcome. If nothing has been tried, write \
-"No diagnostic actions run yet." and continue.>
+<Bulleted list of diagnostic checks run and scripts generated during the \
+session. The content of this section also depends on the outcome recorded for \
+the active suggested fix, as given in the input bundle under "Outcome status":>
+
+- applied_failed: List the fix as a tried path. Include the failure reason if \
+provided. State that it did not resolve the issue.
+- applied_partial: Include the fix as a partially tried path. Include partial \
+notes if provided. Indicate it was not fully completed or not verified.
+- applied_success: Note that the fix was applied and verified but escalation \
+is still needed for another reason (unusual — reflect this accurately).
+- dismissed: Do not mention the fix as a tried path; it was only considered.
+- proposed (no outcome yet): Do not list it here; it goes in Current hypothesis.
+
+If nothing has been tried at all (no checks, no scripts, no applied/partial \
+fix), write "No diagnostic actions run yet." and continue.

 ## Current hypothesis
-<one short paragraph naming the active suggested fix and its confidence. If \
-confidence is below 60% or there is no active fix, say so plainly: "No leading \
-hypothesis yet — symptoms are still being narrowed.">
+<The content depends on the outcome recorded for the active suggested fix:>
+
+- proposed (no outcome yet): State the fix title and confidence. If confidence \
+is below 60% or there is no active fix, say "No leading hypothesis yet — \
+symptoms are still being narrowed."
+- applied_failed or dismissed: Say the proposed fix did not hold or was set \
+aside. State any remaining uncertainty.
+- applied_partial: Note the partial application and what remains open.
+- applied_success: Unusual in an escalate path — state the fix resolved the \
+original symptom but a new or related issue requires escalation.

 ## Suggested next steps
 <bulleted list of 2-4 concrete next actions the receiving engineer should \
 take. Prefer specifics: commands to run, tickets to check, people to contact. \
-Derive from the gap between confirmed facts and a complete resolution. If the \
-active suggested fix is high confidence (>80%), the first bullet is "Try the \
-suggested fix: <title>.">
+Derive from the gap between confirmed facts and a complete resolution. \
+If the active suggested fix failed (applied_failed), inform the next steps \
+accordingly — e.g. suggest alternatives or deeper investigation paths, \
+drawing on the failure reason if provided. \
+If the fix is partially applied (applied_partial), the first step is typically \
+to complete or verify it. \
+If the fix is still proposed (no outcome), the first step is to try it if \
+confidence is high (>80%).>

 Strict rules:
 - Use ONLY the input I provide. Never invent command names, KB articles, or \
@@ -269,6 +292,15 @@ class EscalationPackageGeneratorService:
            lines.append(f"Title: {active_fix.title}")
            lines.append(f"Confidence: {active_fix.confidence_pct}%")
            lines.append(f"Description: {active_fix.description}")
+            lines.append(f"Outcome status: {active_fix.status}")
+            if active_fix.applied_at:
+                lines.append(f"Applied at: {active_fix.applied_at.isoformat()}")
+            if active_fix.verified_at:
+                lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
+            if active_fix.partial_notes:
+                lines.append(f"Partial notes: {active_fix.partial_notes}")
+            if active_fix.failure_reason:
+                lines.append(f"Failure reason: {active_fix.failure_reason}")

        lines.append("")
        lines.append(
--- a/backend/app/services/resolution_note_generator.py
+++ b/backend/app/services/resolution_note_generator.py
@@ -69,11 +69,24 @@ say "Root cause not definitively isolated." and explain what is suspected based
 on facts.>

 ## Resolution
-<one short paragraph describing the resolution applied. If a script ran during \
-the session, mention it (e.g. "Cleared cached credentials via the \
-clear-outlook-credentials script."). If no resolution has been performed yet, \
-write "Resolution not yet applied — fix proposed: <fix title>." Pull verbatim \
-script names and template references when available.>
+<The content of this section depends on the outcome recorded for the active \
+suggested fix, as given in the input bundle under "fix.status":>
+
+- applied_success: Write in past tense using closure language. State that the \
+fix was applied and verified as working. If verified_at is provided, you may \
+reference it as the time resolution was confirmed. Example phrasing: \
+"Applied <fix title>; confirmed working."
+- applied_failed: Acknowledge that the proposed fix did not resolve the issue \
+and was discarded. If failure_reason is provided, include it. Then describe \
+the actual resolution path taken (derived from facts and scripts run). This \
+state means the engineer resolved the issue another way; the note should cover \
+that actual resolution, not just the failed attempt.
+- applied_partial: Note that the fix was partially applied. If partial_notes \
+are provided, include them. Then describe the final resolution path taken.
+- dismissed: Treat the fix as considered and set aside. Do not center the note \
+on it. Describe the resolution based on what was actually confirmed and done.
+- proposed (no outcome yet): Write "Resolution not yet applied — fix proposed: \
+<fix title>." Pull verbatim script names and template references when available.

 Strict rules:
 - Use ONLY the facts and state I provide. Never invent specifics that are not \
@@ -302,6 +315,15 @@ class ResolutionNoteGeneratorService:
            lines.append(f"Description: {active_fix.description}")
            if active_fix.user_decision:
                lines.append(f"Engineer decision: {active_fix.user_decision}")
+            lines.append(f"Outcome status: {active_fix.status}")
+            if active_fix.applied_at:
+                lines.append(f"Applied at: {active_fix.applied_at.isoformat()}")
+            if active_fix.verified_at:
+                lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
+            if active_fix.partial_notes:
+                lines.append(f"Partial notes: {active_fix.partial_notes}")
+            if active_fix.failure_reason:
+                lines.append(f"Failure reason: {active_fix.failure_reason}")

        lines.append("")
        lines.append("# Scripts run during the session (passwords redacted)")