docs(pilot): Phase 9 spec — fix Apply semantics + session idempotency

Four review findings addressed:

- High: draft_template 'Run now, templatize after' DOES run the
  script; applied_at table now stamps for both one_off and
  draft_template. Only build_template (no run) skips the stamp.
- Medium: TemplateMatchPanel needs an explicit '✓ I ran this' button.
  Generate/Copy don't commit to running. The new button is the stamp
  moment for template-match fixes.
- Medium: get-or-create for inline script_builder_sessions —
  POST /script-builder/sessions is now idempotent for
  origin='pilot_inline' (returns the existing row for a
  (user, ai_session_id) pair). Backed by a partial unique index:
    UNIQUE (user_id, ai_session_id) WHERE origin = 'pilot_inline'
  so remount doesn't create duplicates and draft continuity is
  preserved.
- Medium: authorization — the create endpoint validates that any
  provided ai_session_id is owned by the current user (same guard
  other pilot endpoints use). Prevents cross-user attachment of
  scratch sessions to arbitrary pilot sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-23 23:34:53 -04:00
parent fcd224429c
commit 75b59123e6

View File

@@ -118,16 +118,18 @@ The NoTemplateDialog-in-chat-region path lives on the **Chat tab** (slides up ab
| Banner Apply click → routes to... | Stamps `applied_at`? |
|---|---|
| `TemplateMatchPanel` (existing flow) | Only when the engineer clicks the "Run" action inside the panel |
| `NoTemplateDialog` (existing, now in chat region) → `one_off` card | Yes — one_off runs the script |
| `NoTemplateDialog``draft_template` card | No — this is template authoring, not fix application |
| `NoTemplateDialog``build_template` card | No — navigates to the builder; no run |
| Script Builder tab → Submit | No — just produces a draft. Engineer then clicks Apply again, gets `NoTemplateDialog`, picks `one_off` to actually run |
| `TemplateMatchPanel` (existing flow) | Only when the engineer clicks a new explicit **"✓ I ran this"** action inside the panel (see below) |
| `NoTemplateDialog` `one_off` card ("Run now, no template") | **Yes** — the card click declares "I'm running this now" |
| `NoTemplateDialog``draft_template` card ("Run now, templatize after") | **Yes** — same declaration, the template proposal is a side effect |
| `NoTemplateDialog``build_template` card ("Just open the builder") | No — no run is declared; the engineer is going off to author a proper template |
| Script Builder tab → Submit | No — just produces a draft. Engineer then clicks Apply again, gets `NoTemplateDialog`, picks `one_off` or `draft_template` to declare the run |
**New explicit "I ran this" action in `TemplateMatchPanel`.** Today the panel has Generate, Copy, and Edit Parameters — none of which commit to running. Copying doesn't imply running; the engineer can walk away. This phase adds a distinct primary button (accent-colored, below Copy) labeled **"✓ I ran this"** or **"Mark as applied"**. Click → calls `applyFix` → fix transitions to Verifying. Until clicked, the fix stays in `proposed`.
**Implementation.**
- Remove `sessionSuggestedFixesApi.applyFix(...)` call from `handleApplyFix`. Move the call to the run-path handlers: `TemplateMatchPanel`'s run button, `handleScriptDecision`'s `one_off` branch only. The `applyFix` endpoint itself (from Phase 8 Issue #2) stays unchanged — only its call sites move.
- Remove `sessionSuggestedFixesApi.applyFix(...)` call from `handleApplyFix`. Move it to the three run-declaring call sites: `NoTemplateDialog`'s `handleScriptDecision('one_off' | 'draft_template')` paths AND the new `TemplateMatchPanel` "I ran this" button. The `applyFix` endpoint itself (from Phase 8 Issue #2) stays unchanged — only its call sites move.
- Until `applied_at` is stamped, the fix remains in `proposed`. `bannerMode` computation already returns `'proposed'` when `applied_at` is null, so the banner naturally stays on Proposed state through the entire drafting phase.
- **Phase 8 consequence.** This is a semantic revision of Phase 8, not just Phase 9 behavior. Add a test asserting that opening `TemplateMatchPanel` does NOT stamp `applied_at`; only the Run action does.
- **Phase 8 consequence.** This is a semantic revision of Phase 8, not just Phase 9 behavior. Tests must assert: opening `TemplateMatchPanel` does NOT stamp `applied_at`; clicking "I ran this" DOES; `NoTemplateDialog` `one_off` AND `draft_template` both DO; `build_template` does NOT.
### 6. EscalateInterceptDialog partial choice
@@ -150,7 +152,7 @@ Adds a fourth button to the existing popover:
### New migration
`script_builder_sessions` **already has** `ai_session_id` (FK → `ai_sessions.id`, nullable, `ON DELETE SET NULL`) with the comment "Link to FlowPilot session if launched from there." The existing column is the link we need — no new FK is added. The migration introduces only the `origin` discriminator:
`script_builder_sessions` **already has** `ai_session_id` (FK → `ai_sessions.id`, nullable, `ON DELETE SET NULL`) with the comment "Link to FlowPilot session if launched from there." The existing column is the link we need — no new FK is added. The migration introduces only the `origin` discriminator plus a uniqueness guard for inline sessions:
```sql
ALTER TABLE script_builder_sessions
@@ -165,6 +167,14 @@ ALTER TABLE script_builder_sessions
ALTER TABLE script_builder_sessions
ADD CONSTRAINT ck_script_builder_sessions_origin_ai_session
CHECK (origin <> 'pilot_inline' OR ai_session_id IS NOT NULL);
-- Uniqueness: at most one pilot_inline session per (user, pilot session).
-- Required to back the get-or-create semantics on the endpoint and prevent
-- duplicate scratch rows on remount. Partial index scoped to pilot_inline
-- so standalone rows are unaffected.
CREATE UNIQUE INDEX ux_script_builder_sessions_pilot_inline
ON script_builder_sessions (user_id, ai_session_id)
WHERE origin = 'pilot_inline';
```
`origin = 'standalone'` → existing `/script-builder` page usage (existing rows backfill to this default). `origin = 'pilot_inline'` → new Script Builder tab; `ai_session_id` is populated at row creation.
@@ -212,7 +222,7 @@ Behavior:
Instead, introduce a new controller component `frontend/src/components/pilot/ScriptBuilderTab.tsx` that owns the inline lifecycle:
- On mount: locate-or-create a `script_builder_sessions` row via the existing `POST /script-builder/sessions` endpoint, passing a new body field `origin: 'pilot_inline'` and the current pilot session id for `ai_session_id`. (The endpoint gains the `origin` parameter; legacy callers continue defaulting to `'standalone'`.)
- On mount: **get-or-create** the single inline `script_builder_sessions` row for `(current user, current pilot session)` via the existing `POST /script-builder/sessions` endpoint, passing `origin: 'pilot_inline'` and the current pilot session id for `ai_session_id`. The endpoint becomes idempotent for `origin='pilot_inline'` — if a row exists for that `(user_id, ai_session_id)` pair, it's returned; otherwise created. The partial unique index on the DB backs the invariant independent of endpoint code. Remounting (tab hide/show, page refresh) resumes the same session — no duplicates, no lost draft continuity.
- Holds local state for the AI message list, the Monaco buffer, and `scriptBuilderMode`.
- Renders `ScriptBuilderChat` in AI mode with `onSaveScript` wired to the inline submit path (PATCH /script), NOT the standalone template-creation path.
- Renders Monaco (via existing `CodeModeEditor` pattern) in `'editor'` mode with its own Save button that triggers the same submit.
@@ -220,6 +230,18 @@ Instead, introduce a new controller component `frontend/src/components/pilot/Scr
The standalone `/script-builder` page retains its current behavior unchanged — it continues to create `script_templates` rows on submit. The split happens cleanly at the controller layer, not inside `ScriptBuilderChat`.
### `POST /script-builder/sessions` — changes for inline origin
The existing endpoint is extended in three ways:
1. **Accepts `origin`** in the request body (`Literal['standalone', 'pilot_inline']`, default `'standalone'`). Legacy callers unchanged.
2. **Authorization on `ai_session_id`.** When `origin='pilot_inline'` is passed AND `ai_session_id` is provided, the handler MUST verify the referenced `ai_sessions` row is owned by the current user (or within their account — whichever guard `_load_session_or_404(db, ai_session_id, current_user)` already enforces for the pilot endpoints). Without this check, a caller could attach an inline scratch session to an arbitrary pilot session. The check fires before any row lookup or creation.
3. **Idempotent for `origin='pilot_inline'`.** If a row with `(user_id = current, ai_session_id = provided, origin = 'pilot_inline')` already exists, the handler returns that row (200) instead of creating a new one (201). The unique partial index enforces at-most-one at the DB layer; a race between two concurrent POSTs surfaces as an integrity error that the handler catches and re-reads.
For `origin='standalone'`, behavior is unchanged — always creates, still subject to the 5-session cap.
The 5-session cap applies only to `standalone` rows (see §Data-model filter changes). Inline sessions are out of that accounting entirely.
---
## State
@@ -285,7 +307,14 @@ The existing `currentChatRef` pattern (Async-select-load-apply guard) applies: w
- **Apply lifecycle correction** (extend `test_fix_outcome_endpoint.py`):
- Banner Apply click that routes to a drafting/evaluation surface does NOT stamp `applied_at`.
- `one_off` decision from `NoTemplateDialog` DOES stamp `applied_at`.
- `TemplateMatchPanel` Run action DOES stamp `applied_at`.
- `draft_template` decision from `NoTemplateDialog` DOES stamp `applied_at` (it still runs the script).
- `build_template` decision from `NoTemplateDialog` does NOT stamp (no run).
- `TemplateMatchPanel` "I ran this" action DOES stamp `applied_at`; Generate / Copy alone do NOT.
- **Script Builder session create — inline semantics** (extend `test_script_builder.py` or equivalent):
- First `POST /script-builder/sessions` with `origin='pilot_inline', ai_session_id=X` creates and returns a row.
- Second `POST` with the same `(ai_session_id, user)` returns the SAME row (no duplicate created); DB row count confirms.
- `POST` with `origin='pilot_inline'` and `ai_session_id` pointing at another user's pilot session is rejected (403/404).
- Race: two concurrent `POST`s for the same `(user, ai_session_id)` resolve to the same row id (one winner, one returns the existing).
### Frontend