Files

Michael Chihlas f2b9476edb docs(pilot): log Issues #1-4 findings for Phase 8 review

Tracks the three code-review issues that were fixed on this branch
(#1 outcome-aware previews, #2 persist Apply, #3 persist proposal
rejection) plus a newly-documented pre-existing test failure (#4 —
decision-endpoint test written in Phase 3 never updated when Phase 5
added the drafted-script validation guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 22:18:13 -04:00

6.8 KiB

Raw Blame History

Phase 8 Review Issues

Date: 2026-04-23

Scope reviewed:

backend/app/api/endpoints/session_suggested_fixes.py
backend/app/services/unified_chat_service.py
frontend/src/pages/AssistantChatPage.tsx
frontend/src/components/pilot/ProposalBanner.tsx
frontend/src/components/pilot/EscalateInterceptDialog.tsx

1. Outcome writes do not invalidate Resolve/Escalate preview cache

Severity: High

PATCH /suggested-fixes/{fix_id}/outcome updates the fix row but does not bump ai_sessions.state_version. Even after adding that bump, the preview input bundle also needs to include the fix outcome fields; otherwise a regenerated preview still cannot distinguish proposed, partially applied, failed, or successful fixes.

Relevant files:

backend/app/api/endpoints/session_suggested_fixes.py:226
backend/app/api/endpoints/session_suggested_fixes.py:146
backend/app/services/resolution_note_generator.py:13
backend/app/services/escalation_package_generator.py:14

Why this matters:

Resolve and Escalate previews are cached by (session_id, state_version).
The decision endpoint already bumps state_version.
The new outcome endpoint does not.
A user can record applied_success / applied_failed / applied_partial and still see markdown generated from the pre-outcome session state.
The preview generators currently pass only the active fix title, confidence, description, and user decision into the LLM bundle. They do not pass status, applied_at, verified_at, partial_notes, or failure_reason.
Therefore a cache miss alone is not enough: the generated markdown may still describe the fix as merely proposed because the outcome is absent from the prompt input.

Recommended fix:

Bump AISession.state_version inside the outcome endpoint transaction.
Include suggested-fix outcome state in both preview bundles:
- status
- applied_at
- verified_at
- partial_notes
- failure_reason
Update the resolution-note prompt expectations so applied_success produces closure language, applied_failed states that the proposed fix did not resolve the issue, and applied_partial includes the engineer's partial notes.
Update the escalation-package prompt expectations so failed/partial outcomes appear under "What we've tried" and inform "Suggested next steps."
Add a test proving a preview generated before an outcome change is invalidated after the outcome patch and that the regenerated preview input includes the recorded outcome.

2. "Apply" is not persisted, so Verifying state is lost on reload/reselect

Severity: High

Phase 8 introduces a Verifying lifecycle in the UI, but clicking Apply only sets local React state.

Relevant files:

frontend/src/pages/AssistantChatPage.tsx:142
frontend/src/pages/AssistantChatPage.tsx:516
backend/app/api/endpoints/session_suggested_fixes.py:276

Why this matters:

bannerApplied is a client-side-only flag.
handleApplyFix() opens the script panel and flips local state, but does not persist anything.
applied_at is only stamped later when an outcome is patched.
After refresh, chat reselect, or multi-tab use, a fix that had entered Verifying falls back to proposed.
Nudge timing, resolve auto-success, and escalate interception therefore do not survive normal session resume.

Recommended fix:

Persist "apply started" as part of the fix lifecycle.
Either add an explicit backend transition for apply/start-verifying, or persist applied_at when Apply is clicked.
Add a test or browser regression check covering refresh/reselect continuity.

3. Rejecting an AI outcome proposal is only local and will reappear

Severity: Medium

Rejecting the AI-confirming banner clears ai_outcome_proposal only in local component state.

Relevant files:

frontend/src/pages/AssistantChatPage.tsx:571
frontend/src/pages/AssistantChatPage.tsx:431

Why this matters:

handleRejectAIProposal() only updates local activeFix.
The server-side ai_outcome_proposal remains unchanged.
The proposal comes back on the next refreshSessionDerived() call, which happens after sends, task submissions, and chat selection.
"Not yet" is therefore a temporary hide, not a real rejection/correction.

Recommended fix:

Add a backend way to clear or reject ai_outcome_proposal.
Make the reject action persist so the banner does not immediately re-arm on the next refetch.

4. Pre-existing failing decision test

Severity: Low (test gap, no runtime regression)

tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version was authored in Phase 3 (66e5920) when the decision endpoint had no validation on ai_drafted_script. Phase 5 (fa61376) added a 400 guard: when the decision is one_off, draft_template, or build_template and the fix has no ai_drafted_script (and the caller provides no edited_script in the request body), the endpoint returns 400 with the message "Suggested fix has no ai_drafted_script — use /api/v1/scripts/generate for template-matched fixes."

The test creates a fix without an ai_drafted_script and posts {"decision": "draft_template"} naked, so the guard fires and returns 400. The test still asserts 200. This was already broken before Phase 8 began — commit cdd8bb0 (first Phase 8 commit) is 8 commits after fa61376.

Root cause: test was never updated to match the Phase 5 contract change.

Recommended fix for the next branch:

Option A (minimal): supply ai_drafted_script="echo hello" when creating the fix fixture, or add edited_script to the POST body. Validates the happy path for draft_template with a real drafted body.
Option B (comprehensive): add a separate test case asserting the 400 when ai_drafted_script is null and no edited_script is provided, then fix the existing test as in Option A. The 400-guard already has coverage in the Phase 5 test file; the main gap is just the missing fixture update here.

No Phase 8 code change required — this is a test-fixture gap from Phase 3/5 drift, not a regression introduced in this branch.

Test Context

Relevant backend suites were run serially from backend/:

pytest tests/test_fix_outcome_endpoint.py tests/test_fix_outcome_marker.py tests/test_session_suggested_fixes_api.py -q

Observed result:

28 passed
1 failed

Remaining failure:

tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version

Notes:

That failing test is in the older decision-path suite and expects draft_template to succeed without a drafted script.
The new outcome endpoint tests and marker parser tests passed in the serial run.
The three issues above are based on code inspection and remain valid regardless of that separate failing test.
Full root cause analysis documented in section 4 above.

6.8 KiB Raw Blame History

Phase 8 Review Issues

1. Outcome writes do not invalidate Resolve/Escalate preview cache

2. "Apply" is not persisted, so Verifying state is lost on reload/reselect

3. Rejecting an AI outcome proposal is only local and will reappear

4. Pre-existing failing decision test

Test Context

6.8 KiB

Raw Blame History