Tracks the three code-review issues that were fixed on this branch (#1 outcome-aware previews, #2 persist Apply, #3 persist proposal rejection) plus a newly-documented pre-existing test failure (#4 — decision-endpoint test written in Phase 3 never updated when Phase 5 added the drafted-script validation guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6.8 KiB
Phase 8 Review Issues
Date: 2026-04-23
Scope reviewed:
backend/app/api/endpoints/session_suggested_fixes.pybackend/app/services/unified_chat_service.pyfrontend/src/pages/AssistantChatPage.tsxfrontend/src/components/pilot/ProposalBanner.tsxfrontend/src/components/pilot/EscalateInterceptDialog.tsx
1. Outcome writes do not invalidate Resolve/Escalate preview cache
Severity: High
PATCH /suggested-fixes/{fix_id}/outcome updates the fix row but does not bump
ai_sessions.state_version. Even after adding that bump, the preview input
bundle also needs to include the fix outcome fields; otherwise a regenerated
preview still cannot distinguish proposed, partially applied, failed, or
successful fixes.
Relevant files:
backend/app/api/endpoints/session_suggested_fixes.py:226backend/app/api/endpoints/session_suggested_fixes.py:146backend/app/services/resolution_note_generator.py:13backend/app/services/escalation_package_generator.py:14
Why this matters:
- Resolve and Escalate previews are cached by
(session_id, state_version). - The decision endpoint already bumps
state_version. - The new outcome endpoint does not.
- A user can record
applied_success/applied_failed/applied_partialand still see markdown generated from the pre-outcome session state. - The preview generators currently pass only the active fix title,
confidence, description, and user decision into the LLM bundle. They do not
pass
status,applied_at,verified_at,partial_notes, orfailure_reason. - Therefore a cache miss alone is not enough: the generated markdown may still describe the fix as merely proposed because the outcome is absent from the prompt input.
Recommended fix:
- Bump
AISession.state_versioninside the outcome endpoint transaction. - Include suggested-fix outcome state in both preview bundles:
statusapplied_atverified_atpartial_notesfailure_reason
- Update the resolution-note prompt expectations so
applied_successproduces closure language,applied_failedstates that the proposed fix did not resolve the issue, andapplied_partialincludes the engineer's partial notes. - Update the escalation-package prompt expectations so failed/partial outcomes appear under "What we've tried" and inform "Suggested next steps."
- Add a test proving a preview generated before an outcome change is invalidated after the outcome patch and that the regenerated preview input includes the recorded outcome.
2. "Apply" is not persisted, so Verifying state is lost on reload/reselect
Severity: High
Phase 8 introduces a Verifying lifecycle in the UI, but clicking Apply only sets local React state.
Relevant files:
frontend/src/pages/AssistantChatPage.tsx:142frontend/src/pages/AssistantChatPage.tsx:516backend/app/api/endpoints/session_suggested_fixes.py:276
Why this matters:
bannerAppliedis a client-side-only flag.handleApplyFix()opens the script panel and flips local state, but does not persist anything.applied_atis only stamped later when an outcome is patched.- After refresh, chat reselect, or multi-tab use, a fix that had entered
Verifying falls back to
proposed. - Nudge timing, resolve auto-success, and escalate interception therefore do not survive normal session resume.
Recommended fix:
- Persist "apply started" as part of the fix lifecycle.
- Either add an explicit backend transition for apply/start-verifying, or
persist
applied_atwhen Apply is clicked. - Add a test or browser regression check covering refresh/reselect continuity.
3. Rejecting an AI outcome proposal is only local and will reappear
Severity: Medium
Rejecting the AI-confirming banner clears ai_outcome_proposal only in local
component state.
Relevant files:
frontend/src/pages/AssistantChatPage.tsx:571frontend/src/pages/AssistantChatPage.tsx:431
Why this matters:
handleRejectAIProposal()only updates localactiveFix.- The server-side
ai_outcome_proposalremains unchanged. - The proposal comes back on the next
refreshSessionDerived()call, which happens after sends, task submissions, and chat selection. - "Not yet" is therefore a temporary hide, not a real rejection/correction.
Recommended fix:
- Add a backend way to clear or reject
ai_outcome_proposal. - Make the reject action persist so the banner does not immediately re-arm on the next refetch.
4. Pre-existing failing decision test
Severity: Low (test gap, no runtime regression)
tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version
was authored in Phase 3 (66e5920) when the decision endpoint had no
validation on ai_drafted_script. Phase 5 (fa61376) added a 400 guard:
when the decision is one_off, draft_template, or build_template and the
fix has no ai_drafted_script (and the caller provides no edited_script in
the request body), the endpoint returns 400 with the message "Suggested fix has
no ai_drafted_script — use /api/v1/scripts/generate for template-matched
fixes."
The test creates a fix without an ai_drafted_script and posts
{"decision": "draft_template"} naked, so the guard fires and returns 400. The
test still asserts 200. This was already broken before Phase 8 began — commit
cdd8bb0 (first Phase 8 commit) is 8 commits after fa61376.
Root cause: test was never updated to match the Phase 5 contract change.
Recommended fix for the next branch:
- Option A (minimal): supply
ai_drafted_script="echo hello"when creating the fix fixture, or addedited_scriptto the POST body. Validates the happy path fordraft_templatewith a real drafted body. - Option B (comprehensive): add a separate test case asserting the 400 when
ai_drafted_scriptis null and noedited_scriptis provided, then fix the existing test as in Option A. The 400-guard already has coverage in the Phase 5 test file; the main gap is just the missing fixture update here.
No Phase 8 code change required — this is a test-fixture gap from Phase 3/5 drift, not a regression introduced in this branch.
Test Context
Relevant backend suites were run serially from backend/:
pytest tests/test_fix_outcome_endpoint.py tests/test_fix_outcome_marker.py tests/test_session_suggested_fixes_api.py -q
Observed result:
28 passed1 failed
Remaining failure:
tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version
Notes:
- That failing test is in the older decision-path suite and expects
draft_templateto succeed without a drafted script. - The new outcome endpoint tests and marker parser tests passed in the serial run.
- The three issues above are based on code inspection and remain valid regardless of that separate failing test.
- Full root cause analysis documented in section 4 above.