The kb_setup fixture asserts free-plan quota numbers (lifetime_conversions_limit=3),
but Phase 1 conftest seeds test_user on Pro. Downgrade explicitly inside kb_setup
to preserve the original test intent without affecting other suites.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mounts on Pro routers (trees, sessions, scripts, FlowPilot, etc.) and
returns 402 with structured detail when an account's subscription is
missing or locked. Allowlist bypasses billing/account/auth flows so
users can recover from a lapsed subscription.
Conftest now seeds a default Pro/active Subscription on test_user and
test_admin (delete-then-insert because the register endpoint already
creates a free/active sub by default). Two existing tests adapted to
the new seeded plan; tenant-isolation tests seed Subscription rows for
the accounts they create directly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Engineer applies a fix but can't verify yet (waiting on client power-cycle,
AD replication, async sync). Today the verifying banner forces a synchronous
verdict (worked / didn't / partial) — anything else means leaving the banner
stale or guessing wrong. This adds a fourth outcome that parks the fix in a
non-terminal "Awaiting verification" state with a reason ("waiting on what?")
and exposes it on the chat-anchored banner so the engineer doesn't lose track.
Backend
- New non-terminal status `applied_pending` parallel to `applied_partial`.
- New `pending_reason` column (nullable Text) — the "what are you waiting on?"
prose, mirrors `partial_notes`. Required when outcome=applied_pending.
- Outcome endpoint allows pending in/out transitions; pending stamps
applied_at but NOT verified_at (it's parked, not verified).
- Resolution-note + escalation-package prompts handle the new status:
resolution note frames the fix as provisional; escalation package surfaces
pending verification as the leading hypothesis with reference to what's
being waited on.
- Migration: add column + extend status CHECK constraint.
Frontend
- New `BannerMode = 'pending'` + `PendingBanner` component (info-tone,
parallel to PartialBanner) with worked / didn't / update-reason actions.
- VerifyingBanner overflow menu adds "Waiting to verify…".
- Nudge banner's "Still checking" button now actually records pending with
a reason, instead of just silencing for the session.
- AssistantChatPage banner-mode derivation maps applied_pending → 'pending'.
Tests: 4 new integration tests covering pending notes requirement, reason
storage + applied_at/verified_at semantics, pending→success transition,
and pending_reason update on re-PATCH.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codex review pass on the escalation wedge. Reworks claim_session from
read-then-write to a conditional UPDATE so two seniors racing can't both
win, blocks the original engineer from claiming their own handoff, and
filters self-escalated sessions out of the dashboard escalation queue.
Also preassigns the handoff UUID before flush so the compatibility
escalation_package payload carries it. Removes legacy frontend pickup
state (claiming, handleStartHere) that broke tsc --noEmit.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Four items from the design-plan audit, all flagged as locked-design or
Codex corrections, shipped together so the GTM demo path covers them
end-to-end before bug bash.
1. Live AI assessment refresh on the magic-moment screen. Backend already
publishes handoff_assessment_ready when enrich_escalation_async commits;
wire the frontend listener so the senior sees the assessment populate
without a manual reopen. New event type + onAssessmentReady handler on
streamEscalations; AssistantChatPage opens a scoped SSE subscription
whenever it tracks a handoff missing its assessment, refetches on match,
and replaces magicHandoff / overlayHandoff in place. Closes the loop on
the async-assessment commit e8ba74e.
2. Suggested-step chips below the chat input. Locked design from the plan
(Codex correction). Chip strip renders above the composer post-claim
when ai_assessment_data.suggested_steps[] is non-empty. Click prefills
the input and focuses; first send or explicit X hides for the session.
3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen
set (rf-escalation-seen, capped 200). Dot top-right when not seen.
Cleared on open (card click) or claim (Pick Up) — NOT on hover, per
Codex correction. Pick Up stops propagation so it doesn't double-fire.
4. Race-condition toast on claim conflict. The /claim endpoint previously
silently overwrote claimed_by — both seniors thought they owned the
session. New HandoffAlreadyClaimedError carries the winner's id/name/
timestamp; claim_session rejects different-user re-claims (same-user is
idempotent for double-click safety); endpoint returns 409 with
structured detail. AssistantChatPage.handleStartHere extracts and
surfaces "Already claimed by {name} {time_ago}." via toast, drops
?pickup=true, dismisses magic-moment so the loser flows back to queue.
Tests: 2 new unit tests in test_handoff_manager.py (conflict raises,
same-user idempotent). Full handoff + escalation suite (34 tests) green.
Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.
What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
pub/sub keyed by account_id. asyncio.Queue per subscriber with
64-event maxsize and drop-on-full semantics. Designed to be swappable
for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
/api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
now publishes a `handoff_created` event to the bus BEFORE the email
fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
(publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
blocked, ready event handshake).
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer
moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so
neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the
existing streamDocumentation pattern, not native EventSource
(custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in,
reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE +
dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test
DROP SCHEMA + recreate fixture made wall-clock prohibitive when
multiple stale runs collided on the same Postgres test schema.
Resolution: -n auto next time.
[/gstack-context]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.
HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
(excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
creation is authoritative; notification is advisory. This is the
graceful-degradation regression both eng + codex reviews flagged as
critical (handoff must succeed even if SMTP is down).
Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.
Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST
Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.
20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.
Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate
Existing handoff create + queue tests unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}
Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.
The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.
Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
aggregates avg/median/p95 in Python (~1k rows/account/month is well within
budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases
Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.
First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).
How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
'gw0', 'gw1', …). When set, derives a per-worker DB URL like
`…/resolutionflow_test_gw0`. The base DB stays for serial / master
runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
connects to the postgres maintenance DB, and `CREATE DATABASE`s the
worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
worker's isolated DB, no cross-worker race.
CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).
Adds `pytest-xdist==3.6.1` to requirements-dev.txt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that get PR #150 to a green CI gate:
1. **test_record_decision_persists_and_bumps_state_version** — the
`decision: draft_template` path calls `_extract_template_parameters`
(TemplateExtractionService → AI provider). CI doesn't set
ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
`RuntimeError: No AI provider configured` and returned 500. The test
isn't exercising the AI integration — patched the extractor with an
AsyncMock returning a minimal valid `{templated_body, parameters}`
dict. Verified locally: the test now passes.
2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
the hash of requirements*.txt / package-lock.json with a runner-os
restore-key fallback. Saves ~30-60s per run on cache hit.
3. **Pytest invocation tightened**:
- Dropped `--cov-report=term-missing` — the custom "Display coverage
summary" step below parses coverage.json and prints the same
module list more concisely. Term-missing dumps every uncovered
line which adds ~5-10s of stdout.
- Added `--maxfail=10` so a structural breakage (fixture explosion,
DB unreachable) bails after 10 errors instead of running the full
25-min suite. Tunable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test_db fixture calls Base.metadata.create_all on a fresh test DB.
That only creates tables for models that have been imported (and thus
registered with Base.metadata) by the time the fixture runs.
app.main imports app.core.database (which gives us Base) but does NOT
eagerly import the model modules — most are pulled in lazily inside
scheduler functions (archive_stale_ai_sessions etc.) and route
modules. At fixture-setup time, only the handful of models touched by
those eager imports are on the metadata, so any test that exercises
PSA, network diagrams, ratings, escalations, etc. fails with
\`UndefinedTableError: relation "X" does not exist\` and a cascade of
500s on every endpoint that queries the missing table.
Adding \`from app import models as _models\` (rather than the bare
\`import app.models\` which would shadow the \`app\` FastAPI instance
imported just above) pulls in app/models/__init__.py, which itself
imports every model module — registering all ~60 tables with
Base.metadata before create_all runs.
Verified locally: tests/test_psa_writeback_phase4.py went from
1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were
masking the actual passes).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top
of a unified main. Two manual conflict resolutions:
1. CLAUDE.md — kept the FlowPilot ai-handoff rewrite (`.ai/`-driven
protocol). The pre-rewrite reference content (CW integration notes,
lessons archive, env vars table) lives in `docs/connectwise/`,
`docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design.
2. frontend/src/pages/AssistantChatPage.tsx — both conflict regions
were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks
(facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with
PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint).
Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay,
NewTicketModal) kept. All setters wired by either branch are intact.
Verification:
- `tsc -b` clean across the merged tree.
- Browser smoke-test (Session B fixture): Phase 9 ProposalBanner
("Run AI-drafted PowerShell to recover SSL VPN") renders alongside
PSA's new Tickets sidebar icon. Console clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Continues the test-isolation work from dab740d. RLS migration tests run
against a policy-installed database and fail in the default create_all
suite, so they need to be opt-in:
- pytest.ini: register `rls` marker.
- conftest.py: auto-deselect test_rls_isolation.py unless
RUN_RLS_TESTS=1. Drops the deprecated session-scoped event_loop
fixture (not needed since pytest-asyncio 0.23+).
- test_rls_isolation.py: tag module with `rls` marker. Replace
hardcoded `patherly_test` DB reference with parsed DATABASE_TEST_URL
(matches conftest.py default `resolutionflow_test`). Updated docstring
command to show RUN_RLS_TESTS=1.
- requirements-dev.txt: bump pytest-asyncio 0.23.0 → 0.24.0 (loop-scope
marker behavior required by the RLS module fixture).
Run the RLS suite with:
RUN_RLS_TESTS=1 DB_APP_ROLE_PASSWORD=... pytest tests/test_rls_isolation.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Root cause of the 06:32 AM outage: running 'pytest tests/' inside the
resolutionflow_backend container silently dropped the public schema on
the DEV database. Two layered bugs made this possible; both are fixed.
Bug 1 — env-var lookup in conftest.TEST_DATABASE_URL put DATABASE_URL
(which normally points at the dev/prod DB) ahead of DATABASE_TEST_URL.
When DATABASE_URL is set, pytest used the dev DB as the 'test' DB and
the test_db fixture's DROP SCHEMA public CASCADE wiped it. Fixed:
- Honor only DATABASE_TEST_URL (or the localhost fallback).
- Assert at module load that the DB name contains 'test' — refuses
to run otherwise. Makes future misconfiguration impossible.
Bug 2 — conftest overrode app.dependency_overrides[get_db] but not
get_admin_db. Endpoints using get_admin_db (register, admin routes)
bypassed the test session and hit the real admin DB. Before Bug 1 was
fixed this was hidden because both engines pointed at the same dev DB.
With isolation in place, register started failing 'Email already
registered' because of stale users in the dev DB. Fixed:
- Also override get_admin_db to yield the same test session. RLS is
not enabled in the create_all-managed test schema, so sharing is
safe.
Also adds DATABASE_TEST_URL=resolutionflow_test to docker-compose.dev.yml
so pytest in the container works out of the box.
Verified: 49/50 Phase 8 + 9 tests pass against resolutionflow_test; the
1 failure is the pre-existing Phase 8 Issue #4
(test_record_decision_persists_and_bumps_state_version).
Refs gitea #145 (will update that issue with this as the primary fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Called by the inline Script Builder tab on Submit. Writes
ai_drafted_script + ai_drafted_parameters to the fix without stamping
applied_at (a draft is not an application — that's §5 of the Phase 9
spec). Bumps state_version so Resolve/Escalate preview bundles
regenerate.
409 on terminal fix status. 404 on wrong session. 422 on empty script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /script-builder/sessions now supports origin='pilot_inline':
- Requires ai_session_id; validates it against current user ownership.
- Get-or-create: returns existing row for (user, ai_session_id) pair.
- Partial unique index on the DB backs the invariant; races resolve to
the single winner row.
list_sessions + count_user_sessions default-scope to origin='standalone'
so inline scratch sessions don't pollute the /script-builder dashboard
or count against the 5-session cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #3 from phase-8-review-issues.md. 'Not yet' on the AI-confirming
banner was a local-state hide; the proposal re-surfaced on the next
refreshSessionDerived call.
Two-part fix:
- PATCH /outcome now clears ai_outcome_proposal on any terminal action
(engineer has taken a decision; stale AI proposal is moot).
- New DELETE /ai-sessions/:sid/suggested-fixes/:fid/ai-outcome-proposal
endpoint for explicit 'Not yet' rejection. Does not touch status
or state_version — pure UI state.
Frontend handleRejectAIProposal now calls the DELETE and setActiveFix
with the server response.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #2 from phase-8-review-issues.md. Apply was client-side-only via
a bannerApplied flag. Refresh / chat reselect / multi-tab would drop
Verifying state back to Proposed.
- New POST /ai-sessions/{sid}/suggested-fixes/{fid}/apply stamps
applied_at without changing status (still 'proposed'). Idempotent
if already stamped; 409 if fix is past proposed (a terminal outcome
was already recorded).
- Bumps state_version so resolve/escalate preview bundles reflect that
the fix has entered verifying.
- Frontend handleApplyFix calls the endpoint and uses the returned
applied_at directly. bannerApplied client flag is removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't
enough — previews were also omitting outcome fields from the LLM bundle,
so a fresh regenerate still couldn't distinguish proposed / failed /
partial / success.
- PATCH /outcome now bumps ai_sessions.state_version (matches
record_decision's existing pattern).
- Resolution-note + escalation-package bundles now include status,
applied_at, verified_at, partial_notes, failure_reason on the active fix.
- Generator prompts prescribe outcome-aware phrasing (closure language
for success; what-we've-tried + next-steps for failed/partial).
- New end-to-end test asserts the regenerated preview reflects the
recorded outcome, not just that the cache key changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tells the AI when + how to emit the [FIX_OUTCOME] marker that Task 4's
parser consumes. Placeholder-only per the anti-parrot pattern — no
literal UUIDs, outcomes, or reasons that could leak into unrelated
sessions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AI emits [FIX_OUTCOME] when the engineer indicates in chat that a
prior suggested fix worked, didn't work, or was partially applied. The
marker writes to session_suggested_fixes.ai_outcome_proposal (JSONB),
which the frontend surfaces as a "confirm outcome?" banner. The status
column is only updated when the engineer clicks confirm (via PATCH
/outcome endpoint from Task 3).
Placeholder-only system prompt wiring comes in Task 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records engineer-reported outcome (applied_success|applied_failed|
applied_partial|dismissed). Enforces transition rules (partial → success/
failed allowed; terminal outcomes return 409) and notes requirements
(applied_partial requires notes).
Sets verified_at on success/failure, stamps applied_at if not already
set (handles the case where the AI [FIX_OUTCOME] marker fires before
the engineer clicks Apply).
Also fixes pre-existing test-infrastructure bug: network_diagram.py used
bare string server_default="'[]'" for JSONB columns, which asyncpg
rejects during test schema creation. Changed to text("'[]'::jsonb") to
match the pattern used by script_template.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on the Phase 5 "Run now, templatize after resolve" path.
After a session resolves, drafts queued by the three-option dialog surface
as a modal that lets the engineer review the AI-proposed parameterization
and either save as a reusable team template or skip. A "don't ask again"
toggle writes to account_settings.preferences so the next resolve won't
pop the modal.
Backend:
- /api/v1/draft-templates:
* GET — list account drafts (pending_only default true; pass false for
audit view including accepted/rejected)
* GET /{id} — single draft
* POST /{id}/accept — promotes to a new script_templates row with
source_session_id / source_user_id / source_ticket_ref populated
(drives the Script Library "generated from CW #X · resolved by Y"
provenance chip). Draft flips to status=accepted,
promoted_template_id set, resolved_at stamped. 409 on re-accept /
already-rejected. 400 on unknown category_id.
* POST /{id}/reject — flips to status=rejected. 409 on re-reject.
- /api/v1/accounts/me/preferences (GET/PATCH) — thin wrapper over
AccountSettings.get_setting/set_setting. PATCH merges keys into the
JSONB column, preserving existing keys the client didn't touch.
Used by the "Don't ask again for this team" checkbox
(templatize_prompt_enabled=false) and, forward-looking, by
cw_resolved_status_id / cw_escalated_status_id from Phase 4.
- 13 tests: list filter, accept with/without edited_body, provenance
copy-through, reject, 409 on re-accept / re-reject, 400 on unknown
category, prefs round-trip with merge semantics.
Frontend:
- src/components/pilot/script/TemplatizePrompt.tsx — modal showing the
drafted script with proposed parameters in the Phase 5
ParameterizationPreview, editable name/category/description, an
individual-parameter remove button, and the "don't ask again" opt-out.
Accept posts to /draft-templates/{id}/accept + optionally PATCHes
preferences. Skip posts /reject.
- src/api/draftTemplates.ts — typed client plus accountPreferencesApi.
- AssistantChatPage: after a successful Resolve (external OR local),
fetches preferences + pending drafts for the session and queues the
modal one draft at a time. Escalate does not trigger this flow.
- Sidebar: Scripts nav shows the pending-draft count as a badge. Fetched
independently of the main sidebar stats so endpoint flakes don't
break the rest of the sidebar.
Verified live 2026-04-22: seed two drafts → GET sees both pending →
accept draft A (template created, provenance CW #99123 populated) →
reject draft B → pending count drops → PATCH opt-out → GET confirms
persistence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "AI parrots example content from system prompt" bug bit us twice in
one day across two different prompt sites. Patching individual prompts
is treating the symptom; this commit makes the rule structural.
Audit + sanitize:
- assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in
prior commits, but the [FORK] schema still had literal "Brief reason"
/ "Short name" / "One sentence" placeholders. Replaced with
<angle-bracket> placeholders. Anti-parrot rule itself rewritten to
describe the failure mode abstractly instead of naming "jsmith" so
the rule no longer trips the guardrail (and so the model doesn't
see "jsmith" as a token at all).
- ai_chat_service.py — removed three concrete-example offenders:
"Get-Service ADSync" command literal, the "DC01 server_name" intake
form payload (in two places), and the inline interview demos using
"Azure AD Sync failures" / "Exchange Online mailbox migration".
Replaced with technology-neutral schema descriptions.
- ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the
fully-fleshed DNS troubleshooting tree (with literal Dnscache /
ipconfig / google.com / Start-Service) with a placeholder schema
showing only ID-linkage shape.
- kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked
Server Manager + DC01 example payload with a placeholder schema.
Guardrail (tests/test_prompt_anti_parrot.py):
- Imports every module under app/services/ and app/core/ and walks
every uppercase string constant ending in _PROMPT, _SCHEMA,
_PROTOCOL, _FORMAT, or _CONTEXT.
- test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache,
google.com, "Outlook keeps", "Teams drops") must not appear in any
prompt constant. Add to the list when a new leak shows up in prod —
the list IS the audit trail.
- test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.)
must contain placeholders only. Distinguishes JSON keys (followed
by ':', allowed) from JSON values (followed by ',' / ']' / '}',
must be <placeholder>); allows pipe-separated enum types
(text|password|select) and a small set of fixed enum values
(question, diagnostic_check, decision, action, ...). Verified by
feeding the test a known-bad block — caught it correctly.
Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming
the test as the enforcement point so future contributors know how to
extend it (add to the known-leaked list when a new leak surfaces).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the SuggestedFix card to an inline panel that handles both cases:
template-matched fixes open the Script Library generator with parameters
pre-filled from session context; un-matched fixes open the three-option
dialog (one_off / draft_template / build_template). The decision endpoint
records the path choice with side effects: draft_template persists a
draft_templates row via a Sonnet-driven TemplateExtractionService;
build_template returns a redirect to the Script Builder; one_off just
records the choice.
Backend:
- TemplateExtractionService: drafts a parameter schema from a concrete
rendered script. Conservative by default ("prefer fewer parameters").
Round-trip-validates that templated_body only references declared
parameters; missing-key mismatch falls back to the original script
with no params. LLM/parse failures fall back identically — the
engineer can still create a draft and refine in the post-resolve
prompt (Phase 6).
- /suggested-fixes/{fix_id}/decision side effects:
* one_off → returns rendered_script (engineer's edited version or the
fix's ai_drafted_script verbatim)
* draft_template → same + creates draft_templates row with extracted
params, returns draft_template_id
* build_template → returns redirect_path=/scripts/builder?from_session=
&fix= so the frontend can navigate to the builder pre-loaded
- 400 when a non-template fix has no ai_drafted_script (template-matched
fixes take the dedicated /scripts/generate path, not this endpoint).
- 12 tests: TemplateExtractionService parse + fallback paths, all four
decision branches, edited_script override, missing-script 400.
Frontend:
- src/components/pilot/script/{TemplateMatchPanel, NoTemplateDialog,
ParameterizationPreview}.tsx — inline panels rendered in the task
lane's bottom slot when the engineer clicks a SuggestedFix card.
- TemplateMatchPanel: loads template via /scripts/templates/{id},
pre-fills params from fix.ai_drafted_parameters with cyan "from
session" tags, generates via existing /scripts/generate (already
bumps state_version on ai_session_id from Phase 3). 404 falls back
with a clear message instead of erroring.
- NoTemplateDialog: shows the AI-drafted script with proposed parameter
values highlighted in amber via ParameterizationPreview; three option
cards with the middle (draft_template) flagged Recommended; inline
edit on the script body before deciding.
- SuggestedFix card now clickable: onActivate toggles the inline panel.
- AssistantChatPage: scriptPanelOpen state + handleScriptDecision that
navigates on build_template and toasts on the other paths. Active fix
changes auto-close the panel so engineers don't act on stale state.
- Cmd+K → "Open inline Script Generator" palette entry surfaces only on
/pilot/:id routes; fires a window event the chat page subscribes to.
No Resolve shortcut added per Section 14 decision (browser ⌘R conflict).
Verified 2026-04-22 against the dev stack:
- one_off / draft_template / build_template all return the right shape
with real Sonnet TemplateExtractionService for the draft path.
- Conservative extraction confirmed: cmdkey + Restart-Process script
yielded zero proposed parameters as intended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>