fix(pilot): outcome-aware Resolve/Escalate previews
Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't enough — previews were also omitting outcome fields from the LLM bundle, so a fresh regenerate still couldn't distinguish proposed / failed / partial / success. - PATCH /outcome now bumps ai_sessions.state_version (matches record_decision's existing pattern). - Resolution-note + escalation-package bundles now include status, applied_at, verified_at, partial_notes, failure_reason on the active fix. - Generator prompts prescribe outcome-aware phrasing (closure language for success; what-we've-tried + next-steps for failed/partial). - New end-to-end test asserts the regenerated preview reflects the recorded outcome, not just that the cache key changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -276,6 +276,15 @@ async def patch_suggested_fix_outcome(
|
||||
if fix.applied_at is None and body.outcome != "dismissed":
|
||||
fix.applied_at = now
|
||||
|
||||
# Outcome changes the bundle that resolution-note/escalation-package
|
||||
# previews see, so bump state_version inside the same transaction —
|
||||
# mirrors the pattern in record_decision above.
|
||||
await db.execute(
|
||||
update(AISession)
|
||||
.where(AISession.id == session_id)
|
||||
.values(state_version=AISession.state_version + 1)
|
||||
)
|
||||
|
||||
await db.commit()
|
||||
await db.refresh(fix)
|
||||
return SessionSuggestedFixResponse.model_validate(fix)
|
||||
|
||||
@@ -55,22 +55,45 @@ header.>
|
||||
If there are no facts, write "Nothing confirmed yet." and continue.>
|
||||
|
||||
## What we've tried
|
||||
<bulleted list of diagnostic checks run (from the [diagnostic_check] facts) \
|
||||
and scripts generated during the session. State what each revealed or did, \
|
||||
not what was attempted without an outcome. If nothing has been tried, write \
|
||||
"No diagnostic actions run yet." and continue.>
|
||||
<Bulleted list of diagnostic checks run and scripts generated during the \
|
||||
session. The content of this section also depends on the outcome recorded for \
|
||||
the active suggested fix, as given in the input bundle under "Outcome status":>
|
||||
|
||||
- applied_failed: List the fix as a tried path. Include the failure reason if \
|
||||
provided. State that it did not resolve the issue.
|
||||
- applied_partial: Include the fix as a partially tried path. Include partial \
|
||||
notes if provided. Indicate it was not fully completed or not verified.
|
||||
- applied_success: Note that the fix was applied and verified but escalation \
|
||||
is still needed for another reason (unusual — reflect this accurately).
|
||||
- dismissed: Do not mention the fix as a tried path; it was only considered.
|
||||
- proposed (no outcome yet): Do not list it here; it goes in Current hypothesis.
|
||||
|
||||
If nothing has been tried at all (no checks, no scripts, no applied/partial \
|
||||
fix), write "No diagnostic actions run yet." and continue.
|
||||
|
||||
## Current hypothesis
|
||||
<one short paragraph naming the active suggested fix and its confidence. If \
|
||||
confidence is below 60% or there is no active fix, say so plainly: "No leading \
|
||||
hypothesis yet — symptoms are still being narrowed.">
|
||||
<The content depends on the outcome recorded for the active suggested fix:>
|
||||
|
||||
- proposed (no outcome yet): State the fix title and confidence. If confidence \
|
||||
is below 60% or there is no active fix, say "No leading hypothesis yet — \
|
||||
symptoms are still being narrowed."
|
||||
- applied_failed or dismissed: Say the proposed fix did not hold or was set \
|
||||
aside. State any remaining uncertainty.
|
||||
- applied_partial: Note the partial application and what remains open.
|
||||
- applied_success: Unusual in an escalate path — state the fix resolved the \
|
||||
original symptom but a new or related issue requires escalation.
|
||||
|
||||
## Suggested next steps
|
||||
<bulleted list of 2-4 concrete next actions the receiving engineer should \
|
||||
take. Prefer specifics: commands to run, tickets to check, people to contact. \
|
||||
Derive from the gap between confirmed facts and a complete resolution. If the \
|
||||
active suggested fix is high confidence (>80%), the first bullet is "Try the \
|
||||
suggested fix: <title>.">
|
||||
Derive from the gap between confirmed facts and a complete resolution. \
|
||||
If the active suggested fix failed (applied_failed), inform the next steps \
|
||||
accordingly — e.g. suggest alternatives or deeper investigation paths, \
|
||||
drawing on the failure reason if provided. \
|
||||
If the fix is partially applied (applied_partial), the first step is typically \
|
||||
to complete or verify it. \
|
||||
If the fix is still proposed (no outcome), the first step is to try it if \
|
||||
confidence is high (>80%).>
|
||||
|
||||
Strict rules:
|
||||
- Use ONLY the input I provide. Never invent command names, KB articles, or \
|
||||
@@ -269,6 +292,15 @@ class EscalationPackageGeneratorService:
|
||||
lines.append(f"Title: {active_fix.title}")
|
||||
lines.append(f"Confidence: {active_fix.confidence_pct}%")
|
||||
lines.append(f"Description: {active_fix.description}")
|
||||
lines.append(f"Outcome status: {active_fix.status}")
|
||||
if active_fix.applied_at:
|
||||
lines.append(f"Applied at: {active_fix.applied_at.isoformat()}")
|
||||
if active_fix.verified_at:
|
||||
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
|
||||
if active_fix.partial_notes:
|
||||
lines.append(f"Partial notes: {active_fix.partial_notes}")
|
||||
if active_fix.failure_reason:
|
||||
lines.append(f"Failure reason: {active_fix.failure_reason}")
|
||||
|
||||
lines.append("")
|
||||
lines.append(
|
||||
|
||||
@@ -69,11 +69,24 @@ say "Root cause not definitively isolated." and explain what is suspected based
|
||||
on facts.>
|
||||
|
||||
## Resolution
|
||||
<one short paragraph describing the resolution applied. If a script ran during \
|
||||
the session, mention it (e.g. "Cleared cached credentials via the \
|
||||
clear-outlook-credentials script."). If no resolution has been performed yet, \
|
||||
write "Resolution not yet applied — fix proposed: <fix title>." Pull verbatim \
|
||||
script names and template references when available.>
|
||||
<The content of this section depends on the outcome recorded for the active \
|
||||
suggested fix, as given in the input bundle under "fix.status":>
|
||||
|
||||
- applied_success: Write in past tense using closure language. State that the \
|
||||
fix was applied and verified as working. If verified_at is provided, you may \
|
||||
reference it as the time resolution was confirmed. Example phrasing: \
|
||||
"Applied <fix title>; confirmed working."
|
||||
- applied_failed: Acknowledge that the proposed fix did not resolve the issue \
|
||||
and was discarded. If failure_reason is provided, include it. Then describe \
|
||||
the actual resolution path taken (derived from facts and scripts run). This \
|
||||
state means the engineer resolved the issue another way; the note should cover \
|
||||
that actual resolution, not just the failed attempt.
|
||||
- applied_partial: Note that the fix was partially applied. If partial_notes \
|
||||
are provided, include them. Then describe the final resolution path taken.
|
||||
- dismissed: Treat the fix as considered and set aside. Do not center the note \
|
||||
on it. Describe the resolution based on what was actually confirmed and done.
|
||||
- proposed (no outcome yet): Write "Resolution not yet applied — fix proposed: \
|
||||
<fix title>." Pull verbatim script names and template references when available.
|
||||
|
||||
Strict rules:
|
||||
- Use ONLY the facts and state I provide. Never invent specifics that are not \
|
||||
@@ -302,6 +315,15 @@ class ResolutionNoteGeneratorService:
|
||||
lines.append(f"Description: {active_fix.description}")
|
||||
if active_fix.user_decision:
|
||||
lines.append(f"Engineer decision: {active_fix.user_decision}")
|
||||
lines.append(f"Outcome status: {active_fix.status}")
|
||||
if active_fix.applied_at:
|
||||
lines.append(f"Applied at: {active_fix.applied_at.isoformat()}")
|
||||
if active_fix.verified_at:
|
||||
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
|
||||
if active_fix.partial_notes:
|
||||
lines.append(f"Partial notes: {active_fix.partial_notes}")
|
||||
if active_fix.failure_reason:
|
||||
lines.append(f"Failure reason: {active_fix.failure_reason}")
|
||||
|
||||
lines.append("")
|
||||
lines.append("# Scripts run during the session (passwords redacted)")
|
||||
|
||||
@@ -5,13 +5,24 @@ Fixture style follows test_session_suggested_fixes_api.py:
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from unittest.mock import AsyncMock, call, patch
|
||||
|
||||
import pytest
|
||||
from httpx import AsyncClient
|
||||
from sqlalchemy import select
|
||||
|
||||
from app.api.endpoints.session_suggested_fixes import _clear_preview_cache_for_tests
|
||||
from app.models.ai_session import AISession
|
||||
from app.models.session_suggested_fix import SessionSuggestedFix
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def _isolate_preview_cache():
|
||||
_clear_preview_cache_for_tests()
|
||||
yield
|
||||
_clear_preview_cache_for_tests()
|
||||
|
||||
|
||||
# ── shared helper ────────────────────────────────────────────────────────────
|
||||
|
||||
async def _make_session_with_fix(test_db, user) -> tuple[str, str]:
|
||||
@@ -197,3 +208,122 @@ async def test_failed_outcome_stores_notes_as_failure_reason(
|
||||
body = r.json()
|
||||
assert body["failure_reason"] == "user reports no change"
|
||||
assert body["partial_notes"] is None
|
||||
|
||||
|
||||
# ── state_version bump ────────────────────────────────────────────────────────
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_outcome_patch_bumps_state_version(
|
||||
client: AsyncClient, test_user, auth_headers, test_db
|
||||
):
|
||||
"""PATCH /outcome must increment ai_sessions.state_version (like record_decision)."""
|
||||
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
|
||||
|
||||
# Capture the initial state_version from DB.
|
||||
from uuid import UUID
|
||||
result = await test_db.execute(
|
||||
select(AISession).where(AISession.id == UUID(session_id))
|
||||
)
|
||||
session_obj = result.scalar_one()
|
||||
initial_version = session_obj.state_version
|
||||
|
||||
r = await client.patch(
|
||||
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
|
||||
json={"outcome": "applied_success"},
|
||||
headers=auth_headers,
|
||||
)
|
||||
assert r.status_code == 200
|
||||
|
||||
await test_db.refresh(session_obj)
|
||||
assert session_obj.state_version == initial_version + 1, (
|
||||
"Outcome patch must bump state_version so preview cache is invalidated"
|
||||
)
|
||||
|
||||
|
||||
# ── outcome propagation into preview bundle ───────────────────────────────────
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_resolution_note_preview_reflects_outcome_after_patch(
|
||||
client: AsyncClient, test_user, auth_headers, test_db
|
||||
):
|
||||
"""End-to-end: preview before outcome != preview after outcome; new preview
|
||||
bundle includes failure_reason; state_version was bumped between the two.
|
||||
|
||||
The LLM is stubbed so the test is deterministic. The stub returns whatever
|
||||
the user-message content is, which means the captured call args reflect
|
||||
what the bundle actually contained.
|
||||
"""
|
||||
session_id, fix_id = await _make_session_with_fix(test_db, test_user)
|
||||
|
||||
distinct_failure_reason = "DISTINCT-FAILURE-REASON-XYZZY-42"
|
||||
|
||||
calls_made: list[str] = []
|
||||
|
||||
async def fake_generate_text(system_prompt, messages, max_tokens):
|
||||
user_content = messages[0]["content"]
|
||||
calls_made.append(user_content)
|
||||
# Return markdown that includes the user-message bundle verbatim so we
|
||||
# can assert the bundle shape without inspecting mock internals.
|
||||
return (
|
||||
f"## Problem\ntest\n\n## What we confirmed\n(none)\n\n"
|
||||
f"## Root cause\ntest\n\n## Resolution\nBUNDLE_CONTENT={user_content}",
|
||||
100,
|
||||
50,
|
||||
)
|
||||
|
||||
fake_provider = AsyncMock()
|
||||
fake_provider.generate_text = AsyncMock(side_effect=fake_generate_text)
|
||||
|
||||
with patch(
|
||||
"app.services.resolution_note_generator.get_ai_provider",
|
||||
return_value=fake_provider,
|
||||
):
|
||||
# Preview A — before any outcome recorded (status = "proposed").
|
||||
r_a = await client.post(
|
||||
f"/api/v1/ai-sessions/{session_id}/resolution-note/preview",
|
||||
headers=auth_headers,
|
||||
)
|
||||
assert r_a.status_code == 200
|
||||
markdown_a = r_a.json()["markdown"]
|
||||
version_a = r_a.json()["state_version"]
|
||||
assert r_a.json()["from_cache"] is False
|
||||
|
||||
# Record an applied_failed outcome with a distinctive reason.
|
||||
r_patch = await client.patch(
|
||||
f"/api/v1/ai-sessions/{session_id}/suggested-fixes/{fix_id}/outcome",
|
||||
json={"outcome": "applied_failed", "notes": distinct_failure_reason},
|
||||
headers=auth_headers,
|
||||
)
|
||||
assert r_patch.status_code == 200
|
||||
|
||||
# Preview B — must be a cache miss because state_version changed.
|
||||
r_b = await client.post(
|
||||
f"/api/v1/ai-sessions/{session_id}/resolution-note/preview",
|
||||
headers=auth_headers,
|
||||
)
|
||||
assert r_b.status_code == 200
|
||||
markdown_b = r_b.json()["markdown"]
|
||||
version_b = r_b.json()["state_version"]
|
||||
assert r_b.json()["from_cache"] is False, (
|
||||
"Preview after outcome patch must be a cache miss (state_version changed)"
|
||||
)
|
||||
|
||||
# State version increased between the two previews.
|
||||
assert version_b > version_a, (
|
||||
f"state_version should have increased; got {version_a} → {version_b}"
|
||||
)
|
||||
|
||||
# Markdown differs between the two previews.
|
||||
assert markdown_a != markdown_b, (
|
||||
"Regenerated preview after outcome patch should differ from pre-outcome preview"
|
||||
)
|
||||
|
||||
# The bundle passed to the LLM for preview B includes the outcome fields.
|
||||
assert len(calls_made) == 2, f"Expected 2 LLM calls (one per preview); got {len(calls_made)}"
|
||||
bundle_b = calls_made[1]
|
||||
assert "applied_failed" in bundle_b, (
|
||||
"Bundle for second preview should include 'Outcome status: applied_failed'"
|
||||
)
|
||||
assert distinct_failure_reason in bundle_b, (
|
||||
"Bundle for second preview should include the failure_reason text"
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user