docs(pilot): log Issues #1-4 findings for Phase 8 review
Tracks the three code-review issues that were fixed on this branch (#1 outcome-aware previews, #2 persist Apply, #3 persist proposal rejection) plus a newly-documented pre-existing test failure (#4 — decision-endpoint test written in Phase 3 never updated when Phase 5 added the drafted-script validation guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
165
docs/FlowAssist_Migration/Issues/phase-8-review-issues.md
Normal file
165
docs/FlowAssist_Migration/Issues/phase-8-review-issues.md
Normal file
@@ -0,0 +1,165 @@
|
||||
# Phase 8 Review Issues
|
||||
|
||||
Date: 2026-04-23
|
||||
|
||||
Scope reviewed:
|
||||
- `backend/app/api/endpoints/session_suggested_fixes.py`
|
||||
- `backend/app/services/unified_chat_service.py`
|
||||
- `frontend/src/pages/AssistantChatPage.tsx`
|
||||
- `frontend/src/components/pilot/ProposalBanner.tsx`
|
||||
- `frontend/src/components/pilot/EscalateInterceptDialog.tsx`
|
||||
|
||||
## 1. Outcome writes do not invalidate Resolve/Escalate preview cache
|
||||
|
||||
Severity: High
|
||||
|
||||
`PATCH /suggested-fixes/{fix_id}/outcome` updates the fix row but does not bump
|
||||
`ai_sessions.state_version`. Even after adding that bump, the preview input
|
||||
bundle also needs to include the fix outcome fields; otherwise a regenerated
|
||||
preview still cannot distinguish proposed, partially applied, failed, or
|
||||
successful fixes.
|
||||
|
||||
Relevant files:
|
||||
- `backend/app/api/endpoints/session_suggested_fixes.py:226`
|
||||
- `backend/app/api/endpoints/session_suggested_fixes.py:146`
|
||||
- `backend/app/services/resolution_note_generator.py:13`
|
||||
- `backend/app/services/escalation_package_generator.py:14`
|
||||
|
||||
Why this matters:
|
||||
- Resolve and Escalate previews are cached by `(session_id, state_version)`.
|
||||
- The decision endpoint already bumps `state_version`.
|
||||
- The new outcome endpoint does not.
|
||||
- A user can record `applied_success` / `applied_failed` / `applied_partial`
|
||||
and still see markdown generated from the pre-outcome session state.
|
||||
- The preview generators currently pass only the active fix title,
|
||||
confidence, description, and user decision into the LLM bundle. They do not
|
||||
pass `status`, `applied_at`, `verified_at`, `partial_notes`, or
|
||||
`failure_reason`.
|
||||
- Therefore a cache miss alone is not enough: the generated markdown may still
|
||||
describe the fix as merely proposed because the outcome is absent from the
|
||||
prompt input.
|
||||
|
||||
Recommended fix:
|
||||
- Bump `AISession.state_version` inside the outcome endpoint transaction.
|
||||
- Include suggested-fix outcome state in both preview bundles:
|
||||
- `status`
|
||||
- `applied_at`
|
||||
- `verified_at`
|
||||
- `partial_notes`
|
||||
- `failure_reason`
|
||||
- Update the resolution-note prompt expectations so `applied_success` produces
|
||||
closure language, `applied_failed` states that the proposed fix did not
|
||||
resolve the issue, and `applied_partial` includes the engineer's partial
|
||||
notes.
|
||||
- Update the escalation-package prompt expectations so failed/partial outcomes
|
||||
appear under "What we've tried" and inform "Suggested next steps."
|
||||
- Add a test proving a preview generated before an outcome change is
|
||||
invalidated after the outcome patch and that the regenerated preview input
|
||||
includes the recorded outcome.
|
||||
|
||||
## 2. "Apply" is not persisted, so Verifying state is lost on reload/reselect
|
||||
|
||||
Severity: High
|
||||
|
||||
Phase 8 introduces a Verifying lifecycle in the UI, but clicking Apply only
|
||||
sets local React state.
|
||||
|
||||
Relevant files:
|
||||
- `frontend/src/pages/AssistantChatPage.tsx:142`
|
||||
- `frontend/src/pages/AssistantChatPage.tsx:516`
|
||||
- `backend/app/api/endpoints/session_suggested_fixes.py:276`
|
||||
|
||||
Why this matters:
|
||||
- `bannerApplied` is a client-side-only flag.
|
||||
- `handleApplyFix()` opens the script panel and flips local state, but does not
|
||||
persist anything.
|
||||
- `applied_at` is only stamped later when an outcome is patched.
|
||||
- After refresh, chat reselect, or multi-tab use, a fix that had entered
|
||||
Verifying falls back to `proposed`.
|
||||
- Nudge timing, resolve auto-success, and escalate interception therefore do
|
||||
not survive normal session resume.
|
||||
|
||||
Recommended fix:
|
||||
- Persist "apply started" as part of the fix lifecycle.
|
||||
- Either add an explicit backend transition for apply/start-verifying, or
|
||||
persist `applied_at` when Apply is clicked.
|
||||
- Add a test or browser regression check covering refresh/reselect continuity.
|
||||
|
||||
## 3. Rejecting an AI outcome proposal is only local and will reappear
|
||||
|
||||
Severity: Medium
|
||||
|
||||
Rejecting the AI-confirming banner clears `ai_outcome_proposal` only in local
|
||||
component state.
|
||||
|
||||
Relevant files:
|
||||
- `frontend/src/pages/AssistantChatPage.tsx:571`
|
||||
- `frontend/src/pages/AssistantChatPage.tsx:431`
|
||||
|
||||
Why this matters:
|
||||
- `handleRejectAIProposal()` only updates local `activeFix`.
|
||||
- The server-side `ai_outcome_proposal` remains unchanged.
|
||||
- The proposal comes back on the next `refreshSessionDerived()` call, which
|
||||
happens after sends, task submissions, and chat selection.
|
||||
- "Not yet" is therefore a temporary hide, not a real rejection/correction.
|
||||
|
||||
Recommended fix:
|
||||
- Add a backend way to clear or reject `ai_outcome_proposal`.
|
||||
- Make the reject action persist so the banner does not immediately re-arm on
|
||||
the next refetch.
|
||||
|
||||
## 4. Pre-existing failing decision test
|
||||
|
||||
Severity: Low (test gap, no runtime regression)
|
||||
|
||||
`tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version`
|
||||
was authored in Phase 3 (`66e5920`) when the `decision` endpoint had no
|
||||
validation on `ai_drafted_script`. Phase 5 (`fa61376`) added a 400 guard:
|
||||
when the decision is `one_off`, `draft_template`, or `build_template` and the
|
||||
fix has no `ai_drafted_script` (and the caller provides no `edited_script` in
|
||||
the request body), the endpoint returns 400 with the message "Suggested fix has
|
||||
no ai_drafted_script — use /api/v1/scripts/generate for template-matched
|
||||
fixes."
|
||||
|
||||
The test creates a fix without an `ai_drafted_script` and posts
|
||||
`{"decision": "draft_template"}` naked, so the guard fires and returns 400. The
|
||||
test still asserts 200. This was already broken before Phase 8 began — commit
|
||||
`cdd8bb0` (first Phase 8 commit) is 8 commits after `fa61376`.
|
||||
|
||||
Root cause: test was never updated to match the Phase 5 contract change.
|
||||
|
||||
Recommended fix for the next branch:
|
||||
- Option A (minimal): supply `ai_drafted_script="echo hello"` when creating the
|
||||
fix fixture, or add `edited_script` to the POST body. Validates the happy path
|
||||
for `draft_template` with a real drafted body.
|
||||
- Option B (comprehensive): add a separate test case asserting the 400 when
|
||||
`ai_drafted_script` is null and no `edited_script` is provided, then fix the
|
||||
existing test as in Option A. The 400-guard already has coverage in the
|
||||
Phase 5 test file; the main gap is just the missing fixture update here.
|
||||
|
||||
No Phase 8 code change required — this is a test-fixture gap from Phase 3/5
|
||||
drift, not a regression introduced in this branch.
|
||||
|
||||
## Test Context
|
||||
|
||||
Relevant backend suites were run serially from `backend/`:
|
||||
|
||||
```bash
|
||||
pytest tests/test_fix_outcome_endpoint.py tests/test_fix_outcome_marker.py tests/test_session_suggested_fixes_api.py -q
|
||||
```
|
||||
|
||||
Observed result:
|
||||
- `28 passed`
|
||||
- `1 failed`
|
||||
|
||||
Remaining failure:
|
||||
- `tests/test_session_suggested_fixes_api.py::test_record_decision_persists_and_bumps_state_version`
|
||||
|
||||
Notes:
|
||||
- That failing test is in the older decision-path suite and expects
|
||||
`draft_template` to succeed without a drafted script.
|
||||
- The new outcome endpoint tests and marker parser tests passed in the serial
|
||||
run.
|
||||
- The three issues above are based on code inspection and remain valid
|
||||
regardless of that separate failing test.
|
||||
- Full root cause analysis documented in section 4 above.
|
||||
Reference in New Issue
Block a user