diff --git a/docs/FlowAssist_Migration/Issues/phase-8-review-issues.md b/docs/FlowAssist_Migration/Issues/phase-8-review-issues.md new file mode 100644 index 00000000..00ad9cfa --- /dev/null +++ b/docs/FlowAssist_Migration/Issues/phase-8-review-issues.md @@ -0,0 +1,165 @@ +# Phase 8 Review Issues + +Date: 2026-04-23 + +Scope reviewed: +- `backend/app/api/endpoints/session_suggested_fixes.py` +- `backend/app/services/unified_chat_service.py` +- `frontend/src/pages/AssistantChatPage.tsx` +- `frontend/src/components/pilot/ProposalBanner.tsx` +- `frontend/src/components/pilot/EscalateInterceptDialog.tsx` + +## 1. Outcome writes do not invalidate Resolve/Escalate preview cache + +Severity: High + +`PATCH /suggested-fixes/{fix_id}/outcome` updates the fix row but does not bump +`ai_sessions.state_version`. Even after adding that bump, the preview input +bundle also needs to include the fix outcome fields; otherwise a regenerated +preview still cannot distinguish proposed, partially applied, failed, or +successful fixes. + +Relevant files: +- `backend/app/api/endpoints/session_suggested_fixes.py:226` +- `backend/app/api/endpoints/session_suggested_fixes.py:146` +- `backend/app/services/resolution_note_generator.py:13` +- `backend/app/services/escalation_package_generator.py:14` + +Why this matters: +- Resolve and Escalate previews are cached by `(session_id, state_version)`. +- The decision endpoint already bumps `state_version`. +- The new outcome endpoint does not. +- A user can record `applied_success` / `applied_failed` / `applied_partial` + and still see markdown generated from the pre-outcome session state. +- The preview generators currently pass only the active fix title, + confidence, description, and user decision into the LLM bundle. They do not + pass `status`, `applied_at`, `verified_at`, `partial_notes`, or + `failure_reason`. +- Therefore a cache miss alone is not enough: the generated markdown may still + describe the fix as merely proposed because the outcome is absent from the + prompt input. + +Recommended fix: +- Bump `AISession.state_version` inside the outcome endpoint transaction. +- Include suggested-fix outcome state in both preview bundles: + - `status` + - `applied_at` + - `verified_at` + - `partial_notes` + - `failure_reason` +- Update the resolution-note prompt expectations so `applied_success` produces + closure language, `applied_failed` states that the proposed fix did not + resolve the issue, and `applied_partial` includes the engineer's partial + notes. +- Update the escalation-package prompt expectations so failed/partial outcomes + appear under "What we've tried" and inform "Suggested next steps." +- Add a test proving a preview generated before an outcome change is + invalidated after the outcome patch and that the regenerated preview input + includes the recorded outcome. + +## 2. "Apply" is not persisted, so Verifying state is lost on reload/reselect + +Severity: High + +Phase 8 introduces a Verifying lifecycle in the UI, but clicking Apply only +sets local React state. + +Relevant files: +- `frontend/src/pages/AssistantChatPage.tsx:142` +- `frontend/src/pages/AssistantChatPage.tsx:516` +- `backend/app/api/endpoints/session_suggested_fixes.py:276` + +Why this matters: +- `bannerApplied` is a client-side-only flag. +- `handleApplyFix()` opens the script panel and flips local state, but does not + persist anything. +- `applied_at` is only stamped later when an outcome is patched. +- After refresh, chat reselect, or multi-tab use, a fix that had entered + Verifying falls back to `proposed`. +- Nudge timing, resolve auto-success, and escalate interception therefore do + not survive normal session resume. + +Recommended fix: +- Persist "apply started" as part of the fix lifecycle. +- Either add an explicit backend transition for apply/start-verifying, or + persist `applied_at` when Apply is clicked. +- Add a test or browser regression check covering refresh/reselect continuity. + +## 3. Rejecting an AI outcome proposal is only local and will reappear + +Severity: Medium + +Rejecting the AI-confirming banner clears `ai_outcome_proposal` only in local +component state. + +Relevant files: +- `frontend/src/pages/AssistantChatPage.tsx:571` +- `frontend/src/pages/AssistantChatPage.tsx:431` + +Why this matters: +- `handleRejectAIProposal()` only updates local `activeFix`. +- The server-side `ai_outcome_proposal` remains unchanged. +- The proposal comes back on the next `refreshSessionDerived()` call, which + happens after sends, task submissions, and chat selection. +- "Not yet" is therefore a temporary hide, not a real rejection/correction. + +Recommended fix: +- Add a backend way to clear or reject `ai_outcome_proposal`. +- Make the reject action persist so the banner does not immediately re-arm on + the next refetch. + +## 4. Pre-existing failing decision test + +Severity: Low (test gap, no runtime regression) + +`tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version` +was authored in Phase 3 (`66e5920`) when the `decision` endpoint had no +validation on `ai_drafted_script`. Phase 5 (`fa61376`) added a 400 guard: +when the decision is `one_off`, `draft_template`, or `build_template` and the +fix has no `ai_drafted_script` (and the caller provides no `edited_script` in +the request body), the endpoint returns 400 with the message "Suggested fix has +no ai_drafted_script — use /api/v1/scripts/generate for template-matched +fixes." + +The test creates a fix without an `ai_drafted_script` and posts +`{"decision": "draft_template"}` naked, so the guard fires and returns 400. The +test still asserts 200. This was already broken before Phase 8 began — commit +`cdd8bb0` (first Phase 8 commit) is 8 commits after `fa61376`. + +Root cause: test was never updated to match the Phase 5 contract change. + +Recommended fix for the next branch: +- Option A (minimal): supply `ai_drafted_script="echo hello"` when creating the + fix fixture, or add `edited_script` to the POST body. Validates the happy path + for `draft_template` with a real drafted body. +- Option B (comprehensive): add a separate test case asserting the 400 when + `ai_drafted_script` is null and no `edited_script` is provided, then fix the + existing test as in Option A. The 400-guard already has coverage in the + Phase 5 test file; the main gap is just the missing fixture update here. + +No Phase 8 code change required — this is a test-fixture gap from Phase 3/5 +drift, not a regression introduced in this branch. + +## Test Context + +Relevant backend suites were run serially from `backend/`: + +```bash +pytest tests/test_fix_outcome_endpoint.py tests/test_fix_outcome_marker.py tests/test_session_suggested_fixes_api.py -q +``` + +Observed result: +- `28 passed` +- `1 failed` + +Remaining failure: +- `tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version` + +Notes: +- That failing test is in the older decision-path suite and expects + `draft_template` to succeed without a drafted script. +- The new outcome endpoint tests and marker parser tests passed in the serial + run. +- The three issues above are based on code inspection and remain valid + regardless of that separate failing test. +- Full root cause analysis documented in section 4 above.