docs(pilot): revise Phase 9 spec per review findings

Four findings addressed:

- High: drop proposed parent_pilot_session_id column; reuse the
  existing ai_session_id FK on script_builder_sessions. Add an
  origin + ai_session_id coherence invariant.
- High: don't add a 'mode' prop to ScriptBuilderChat (it's
  presentational). Introduce a ScriptBuilderTab controller that owns
  session lifecycle + submit, renders ScriptBuilderChat unchanged.
- Medium: filter list_sessions / count_user_sessions to origin='standalone'
  so pilot_inline scratch sessions don't pollute the /script-builder
  dashboard or count against the 5-session cap.
- Medium: applied_at is stamped only when the engineer commits to a
  run-action (one_off, TemplateMatchPanel Run), not on banner Apply
  click. Corrects a Phase 8 over-eager stamp that would otherwise
  multiply across three surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-23 23:28:53 -04:00
parent 196c003876
commit fcd224429c

View File

@@ -13,9 +13,10 @@ Close the two remaining open items from the FlowPilot migration handoff:
1. **NoTemplateDialog narrow-lane bug** — today the dialog renders in the task lane (~340px) and its `grid-cols-1 sm:grid-cols-3` layout crushes the three option cards. When the AI proposes a fix with no drafted script, all three cards render disabled, producing a dead end.
2. **Tabbed Script Builder inside the chat** — give the engineer a way to draft the missing script without leaving the session (either by chatting with the AI or hand-writing in a code editor), then feed the draft back into the existing fix lifecycle.
Plus one Phase 8 cleanup item flagged during code review:
Plus two Phase 8 cleanup items flagged during code review:
3. **`EscalateInterceptDialog` missing the Partial choice** — if a fix is in `applied_partial` when the engineer escalates, the intercept dialog's current three choices (worked / didn't work / never applied) don't match. Add a fourth choice for partial.
4. **`applied_at` semantics correction** — today Phase 8's `handleApplyFix` stamps `applied_at` on every banner Apply click, starting the Verifying timer even when the engineer is only opening a drafting/evaluation surface. Move the stamp to the actual run-action handlers (see §5).
This phase depends on Phase 8's `ProposalBanner` already being in the chat region — it reuses the same "chat-region-owns-Apply-flow" philosophy.
@@ -26,11 +27,12 @@ This phase depends on Phase 8's `ProposalBanner` already being in the chat regio
| # | Decision | Rationale |
|---|---|---|
| 1 | When a fix has no `ai_drafted_script`, the banner's Apply button routes **directly** to the Script Builder tab (bypassing `NoTemplateDialog` entirely). | Banner is the single entry point for Apply. `NoTemplateDialog` stays narrowly scoped to evaluating a draft that actually exists. |
| 2 | Inside the Script Builder tab, the default experience is **`ScriptBuilderChat` (AI-driven)**. A "✎ Write it myself" button in the tab's header toolbar swaps into a Monaco editor. | AI is the common path. Manual editing is an escape hatch for engineers who already know what they want. |
| 2 | Inside the Script Builder tab, the default experience is AI-driven — a new `ScriptBuilderTab` controller owns session lifecycle + submit, and *renders* `ScriptBuilderChat` (which stays purely presentational). A "✎ Write it myself" button in the tab's header toolbar swaps the controller's render into a Monaco editor. | AI is the common path. Persistence semantics belong on the controller, not the chat display component (`ScriptBuilderChat` already exposes `onSaveScript` as its seam — the controller wires that callback). |
| 3 | The manual editor uses **Monaco**, reusing the pattern from `frontend/src/components/tree-editor/code-mode/CodeModeEditor.tsx`. | Monaco is already a dependency (`@monaco-editor/react` + `monaco-editor`). No bundle cost, proven pattern. |
| 4 | The Script Builder tab is **always present while the fix is non-terminal** (no close affordance). An **indicator dot** on the tab signals in-progress draft state. | Matches Phase 8's `display: none` philosophy — engineers move freely between chat and draft without tracking a separate open/close state. |
| 5 | `NoTemplateDialog` (draft-exists case) moves from `TaskLane.bottomSlot` to the **chat region** (sibling of `ProposalBanner`, slides up above composer). | Script evaluation is an action surface, not a context surface — belongs with the other action surfaces. Chat region is wide enough for the three cards to actually fit side-by-side. |
| 6 | `EscalateInterceptDialog` gains a **fourth "Partial" choice** that writes `applied_partial` with a notes prompt. | Closes the gap flagged in Phase 8 final review. Minimal incremental cost since the dialog is already getting touched. |
| 7 | `applied_at` is stamped only when the engineer commits to an action that **runs or triggers** a script — not on banner Apply click. Opening a drafting/evaluation surface no longer starts the Verifying timer. | Prevents false "applied" state when the engineer is still authoring. Corrects a Phase 8 over-eager stamp that this phase would otherwise multiply across three surfaces. |
---
@@ -58,7 +60,14 @@ A two-tab strip at the top of the chat region:
### 2. Script Builder tab content
Default mode is `ScriptBuilderChat` embedded inside the tab. A header toolbar above the chat hosts the mode toggle:
A new controller component `ScriptBuilderTab` owns the inline lifecycle:
- Creates / resumes a `script_builder_sessions` row with `origin='pilot_inline'` + `ai_session_id = <pilot session id>`.
- Manages AI-chat message state (via the existing script-builder message endpoints) and the Monaco editor buffer.
- On submit, fires `PATCH /ai-sessions/{sid}/suggested-fixes/{fid}/script`.
`ScriptBuilderChat` itself is **unchanged** — it stays a pure display component taking `messages`, `language`, `onViewScript`, `onSaveScript`, `isLoading`. The controller wires `onSaveScript` to its submit path instead of the template-creation path the standalone `/script-builder` page uses.
A header toolbar above the controller's render area hosts the mode toggle:
```
┌──────────────────────────────────────┐
@@ -70,10 +79,11 @@ Default mode is `ScriptBuilderChat` embedded inside the tab. A header toolbar ab
└──────────────────────────────────────┘
```
- Clicking **✎ Write myself** flips `scriptBuilderMode` to `'editor'`Monaco renders in place of `ScriptBuilderChat`, pre-loaded with a scaffold (fix description as a language-appropriate comment header + an empty body).
- Clicking **✎ Write myself** flips `scriptBuilderMode` to `'editor'`the controller renders Monaco in place of `ScriptBuilderChat`, pre-loaded with a scaffold (fix description as a language-appropriate comment header + an empty body).
- A reciprocal **✨ Back to AI** button in editor mode returns to the chat.
- Switching modes **does not discard** work. The Monaco buffer and the `ScriptBuilderChat` session both persist across toggles. This matters when an engineer drafts with AI, switches to editor to tweak a line, then considers going back.
- Both modes share a single terminal action: **Submit → `PATCH /ai-sessions/{sid}/suggested-fixes/{fid}/script`**. On success the fix now has `ai_drafted_script`; the tab strip disappears (since the fix no longer needs a script) and the banner's Apply button now routes to `NoTemplateDialog` in the chat region.
- Switching modes **does not discard** work. The Monaco buffer and the script-builder session both persist across toggles. This matters when an engineer drafts with AI, switches to editor to tweak a line, then considers going back.
- Both modes share a single terminal action: the controller's **Submit → `PATCH /ai-sessions/{sid}/suggested-fixes/{fid}/script`**. On success the fix gains `ai_drafted_script`; the tab strip disappears (since the fix no longer needs a script) and the banner's Apply button now routes to `NoTemplateDialog` in the chat region.
- **Submit does NOT stamp `applied_at`.** A draft is not an application — see §5 Apply lifecycle below.
### 3. NoTemplateDialog relocation to chat region
@@ -100,7 +110,26 @@ The NoTemplateDialog-in-chat-region path lives on the **Chat tab** (slides up ab
`TemplateMatchPanel` stays in the task lane for this phase — it's a different surface with different interactions and it's not broken. Moving it is possible future work.
### 5. EscalateInterceptDialog partial choice
### 5. Apply lifecycle — `applied_at` semantics correction
**Problem.** Today (Phase 8) `handleApplyFix` calls `POST /apply` the moment the banner's Apply button is clicked, stamping `applied_at` regardless of what happens next. This starts the Verifying timer (nudge countdown, Resolve auto-success, Escalate intercept) even if the engineer is only opening a drafting surface and hasn't actually run anything yet. For the no-draft path introduced in this phase, that's clearly wrong — opening the Script Builder tab is the start of *authoring*, not the start of *verifying*.
**Rule.** `applied_at` is stamped **only when the engineer commits to an action that produces or triggers a run**, not when they open a surface:
| Banner Apply click → routes to... | Stamps `applied_at`? |
|---|---|
| `TemplateMatchPanel` (existing flow) | Only when the engineer clicks the "Run" action inside the panel |
| `NoTemplateDialog` (existing, now in chat region) → `one_off` card | Yes — one_off runs the script |
| `NoTemplateDialog``draft_template` card | No — this is template authoring, not fix application |
| `NoTemplateDialog``build_template` card | No — navigates to the builder; no run |
| Script Builder tab → Submit | No — just produces a draft. Engineer then clicks Apply again, gets `NoTemplateDialog`, picks `one_off` to actually run |
**Implementation.**
- Remove `sessionSuggestedFixesApi.applyFix(...)` call from `handleApplyFix`. Move the call to the run-path handlers: `TemplateMatchPanel`'s run button, `handleScriptDecision`'s `one_off` branch only. The `applyFix` endpoint itself (from Phase 8 Issue #2) stays unchanged — only its call sites move.
- Until `applied_at` is stamped, the fix remains in `proposed`. `bannerMode` computation already returns `'proposed'` when `applied_at` is null, so the banner naturally stays on Proposed state through the entire drafting phase.
- **Phase 8 consequence.** This is a semantic revision of Phase 8, not just Phase 9 behavior. Add a test asserting that opening `TemplateMatchPanel` does NOT stamp `applied_at`; only the Run action does.
### 6. EscalateInterceptDialog partial choice
Adds a fourth button to the existing popover:
@@ -121,22 +150,38 @@ Adds a fourth button to the existing popover:
### New migration
`script_builder_sessions` **already has** `ai_session_id` (FK → `ai_sessions.id`, nullable, `ON DELETE SET NULL`) with the comment "Link to FlowPilot session if launched from there." The existing column is the link we need — no new FK is added. The migration introduces only the `origin` discriminator:
```sql
ALTER TABLE script_builder_sessions
ADD COLUMN origin VARCHAR(20) NOT NULL DEFAULT 'standalone',
ADD COLUMN parent_pilot_session_id UUID NULL REFERENCES ai_sessions(id) ON DELETE SET NULL;
ADD COLUMN origin VARCHAR(20) NOT NULL DEFAULT 'standalone';
ALTER TABLE script_builder_sessions
ADD CONSTRAINT ck_script_builder_sessions_origin
CHECK (origin IN ('standalone', 'pilot_inline'));
-- Invariant: pilot_inline rows must be linked to a pilot session.
-- Standalone rows may or may not be linked (legacy back-channel).
ALTER TABLE script_builder_sessions
ADD CONSTRAINT ck_script_builder_sessions_origin_ai_session
CHECK (origin <> 'pilot_inline' OR ai_session_id IS NOT NULL);
```
Both columns nullable-defaulted so existing rows stay valid.
`origin = 'standalone'` → existing `/script-builder` page usage (existing rows backfill to this default). `origin = 'pilot_inline'` → new Script Builder tab; `ai_session_id` is populated at row creation.
- `origin = 'standalone'` → existing `/script-builder` page usage.
- `origin = 'pilot_inline'` → new Script Builder tab. `parent_pilot_session_id` is set to the pilot session's `ai_sessions.id`.
`origin` earns its keep as an explicit discriminator for:
- Filtering (`list_sessions` / `count_user_sessions` exclude `pilot_inline` by default — see §Data model filter changes below).
- Future split-quota billing (decided to count as one billable session for now, but tagged for analytics).
`origin` enables future split-quota billing (decided to count as one billable session for now, but tagged for analytics). `parent_pilot_session_id` enables cross-reference when investigating pilot sessions.
### Data-model filter changes — `script_builder_sessions` list + count
Inline sessions would otherwise pollute the standalone `/script-builder` dashboard and count against the per-user 5-session cap enforced by the `POST /script-builder/sessions` endpoint. Required changes:
- `script_builder_service.list_sessions(user_id)` → default scope `origin = 'standalone'`. Callers that genuinely want all rows (e.g., an admin dashboard in a future phase) can pass an explicit `include_inline=True` flag, but no current caller needs it.
- `script_builder_service.count_user_sessions(user_id)` → same scope.
- Both changes covered by tests:
- 5 `pilot_inline` sessions should still leave the engineer free to create 5 standalone sessions (no cap interaction).
- `list_sessions` returns only `standalone` rows.
### New backend endpoint
@@ -157,20 +202,23 @@ Behavior:
- 404 if fix not found on that session.
- 409 if fix is in a terminal status (`applied_success`, `applied_failed`, `dismissed`) — a drafted script can't be attached after the fix is done.
- Sets `fix.ai_drafted_script` + `fix.ai_drafted_parameters`.
- **Does NOT stamp `fix.applied_at`.** A draft is not an application — see §5 above.
- **Bumps `ai_sessions.state_version`** — the fix just transitioned from "needs drafting" to "has draft", which affects Resolve/Escalate preview regeneration.
- Returns `SessionSuggestedFixResponse`.
### Lift of `ScriptBuilderChat`
### ScriptBuilderTab controller (frontend) — no changes to `ScriptBuilderChat`
Add a new prop:
```ts
mode: 'standalone' | 'ephemeral'
```
`ScriptBuilderChat` (`frontend/src/components/script-builder/ScriptBuilderChat.tsx`) is a presentational component taking `messages`, `language`, `onViewScript`, `onSaveScript`, `isLoading`. **It does not need a `mode` prop** — adding persistence semantics to a display component would be wrong.
- `'standalone'`: current behavior — on submit, creates a `script_templates` row.
- `'ephemeral'`: on submit, invokes a caller-provided callback `onScriptDrafted(body, parameters)` instead of creating a template row. The caller (AssistantChatPage) uses this callback to fire the new PATCH /script endpoint.
Instead, introduce a new controller component `frontend/src/components/pilot/ScriptBuilderTab.tsx` that owns the inline lifecycle:
Implementation: look for the submission path in `ScriptBuilderChat`, split the `script_templates` INSERT from the script-assembly logic, gate the INSERT on `mode === 'standalone'`.
- On mount: locate-or-create a `script_builder_sessions` row via the existing `POST /script-builder/sessions` endpoint, passing a new body field `origin: 'pilot_inline'` and the current pilot session id for `ai_session_id`. (The endpoint gains the `origin` parameter; legacy callers continue defaulting to `'standalone'`.)
- Holds local state for the AI message list, the Monaco buffer, and `scriptBuilderMode`.
- Renders `ScriptBuilderChat` in AI mode with `onSaveScript` wired to the inline submit path (PATCH /script), NOT the standalone template-creation path.
- Renders Monaco (via existing `CodeModeEditor` pattern) in `'editor'` mode with its own Save button that triggers the same submit.
- Emits an `onScriptDrafted` event to `AssistantChatPage` on success so the page can `setActiveFix(updated)`, hide the tab strip, and return the engineer to Chat tab.
The standalone `/script-builder` page retains its current behavior unchanged — it continues to create `script_templates` rows on submit. The split happens cleanly at the controller layer, not inside `ScriptBuilderChat`.
---
@@ -225,13 +273,19 @@ The existing `currentChatRef` pattern (Async-select-load-apply guard) applies: w
### Backend
- **Migration:** forward + downgrade reversibility.
- **Migration:** forward + downgrade reversibility; existing rows backfill to `origin='standalone'`; the `origin='pilot_inline' ⇒ ai_session_id IS NOT NULL` invariant is enforced by the check constraint.
- **PATCH /script endpoint** (new test file `test_fix_script_endpoint.py`):
- happy path — 200, `ai_drafted_script` set, `state_version` bumped
- 404 on wrong session
- 409 on terminal status
- 400 on empty body
- **ScriptBuilderChat ephemeral mode** — existing test suite extended; submitting does NOT create a `script_templates` row when `mode='ephemeral'`.
- happy path — 200, `ai_drafted_script` set, `state_version` bumped, `applied_at` untouched.
- 404 on wrong session.
- 409 on terminal status.
- 400 on empty body.
- **list/count filter changes** (extend `test_script_builder.py` or nearby):
- 5 `pilot_inline` sessions + subsequent `standalone` session creation succeeds (does not hit the 5-cap).
- `list_sessions` returns only `standalone` rows by default.
- **Apply lifecycle correction** (extend `test_fix_outcome_endpoint.py`):
- Banner Apply click that routes to a drafting/evaluation surface does NOT stamp `applied_at`.
- `one_off` decision from `NoTemplateDialog` DOES stamp `applied_at`.
- `TemplateMatchPanel` Run action DOES stamp `applied_at`.
### Frontend
@@ -259,23 +313,27 @@ Manual verification (no component test harness in this codebase per CLAUDE.md):
- `backend/tests/test_fix_script_endpoint.py`
**Backend — modified:**
- `backend/app/models/script_builder_session.py` — add columns
- `backend/app/schemas/session_suggested_fix.py` — add `SessionSuggestedFixScriptRequest`
- `backend/app/api/endpoints/session_suggested_fixes.py` — add PATCH /script endpoint
- `backend/app/services/script_builder_service.py` — stamp `origin='pilot_inline'` + `parent_pilot_session_id` when a session is created from the Script Builder tab; gate the `script_templates` INSERT on `origin='standalone'`
- `backend/app/models/session_suggested_fix.py` — unchanged (schema already has `ai_drafted_script`)
- `backend/app/models/script_builder_session.py` — add `origin` column only (`ai_session_id` already exists).
- `backend/app/schemas/session_suggested_fix.py` — add `SessionSuggestedFixScriptRequest`.
- `backend/app/schemas/script_builder.py` (or equivalent) — add optional `origin` field to the `POST /script-builder/sessions` request schema; default `'standalone'`.
- `backend/app/api/endpoints/session_suggested_fixes.py` — add PATCH /script endpoint. Move the existing `applied_at` stamp out of the apply path and into `handleScriptDecision('one_off')` only (server side: no change to `/apply`; callers shift instead).
- `backend/app/api/endpoints/script_builder.py` — accept `origin` on session creation; enforce the `pilot_inline ⇒ ai_session_id` invariant at the handler level.
- `backend/app/services/script_builder_service.py` — persist `origin`; `list_sessions` + `count_user_sessions` filter to `origin='standalone'` by default.
- `backend/app/models/session_suggested_fix.py` — unchanged (schema already has `ai_drafted_script`).
**Frontend — new:**
- `frontend/src/components/pilot/ChatTabStrip.tsx` — renders the `[Chat] [Script Builder ●]` strip
- `frontend/src/components/pilot/ScriptBuilderTab.tsx`houses the AI/editor toggle + Monaco wrapper
- `frontend/src/components/pilot/NoTemplateDialogInline.tsx` (or reuse existing `NoTemplateDialog` with a new wrapper for chat-region styling)
- `frontend/src/components/pilot/ChatTabStrip.tsx` — renders the `[Chat] [Script Builder ●]` strip.
- `frontend/src/components/pilot/ScriptBuilderTab.tsx`controller that owns session lifecycle, AI message state, Monaco buffer, mode toggle, and submit. Renders `ScriptBuilderChat` in AI mode and Monaco in editor mode.
- `frontend/src/components/pilot/NoTemplateDialogInline.tsx` (or reuse existing `NoTemplateDialog` with a new wrapper for chat-region styling).
**Frontend — modified:**
- `frontend/src/api/sessionSuggestedFixes.ts` — add `patchScript(sessionId, fixId, body, parameters)` method
- `frontend/src/components/script-builder/ScriptBuilderChat.tsx` — add `mode` prop
- `frontend/src/pages/AssistantChatPage.tsx` — wire tab strip, tab content, banner Apply routing, NoTemplateDialog chat-region render
- `frontend/src/components/pilot/EscalateInterceptDialog.tsx` — add fourth choice
- `frontend/src/components/pilot/TaskLane.tsx` — remove `bottomSlot` usage of NoTemplateDialog (leave prop API stable)
- `frontend/src/api/sessionSuggestedFixes.ts` — add `patchScript(sessionId, fixId, body, parameters)` method.
- `frontend/src/api/scriptBuilder.ts` (or equivalent) — `createSession` accepts an optional `origin` argument.
- `frontend/src/components/script-builder/ScriptBuilderChat.tsx`**unchanged**. Stays a pure display component.
- `frontend/src/pages/ScriptBuilderPage.tsx`**unchanged on the session-creation path** (defaults to `origin='standalone'`).
- `frontend/src/pages/AssistantChatPage.tsx` — wire tab strip, mount `ScriptBuilderTab`, banner Apply routing (no `applied_at` stamp on click), NoTemplateDialog chat-region render. Move the `sessionSuggestedFixesApi.applyFix(...)` call from `handleApplyFix` to `handleScriptDecision('one_off')` and the TemplateMatchPanel run-path handler.
- `frontend/src/components/pilot/EscalateInterceptDialog.tsx` — add fourth choice.
- `frontend/src/components/pilot/TaskLane.tsx` — remove `bottomSlot` usage of NoTemplateDialog (leave prop API stable).
**Frontend — deleted:**
- None (existing components get refactored, not deleted).