Replaces the legacy flowpilot_engine.escalate_session orchestration with
a single canonical path through HandoffManager. Every escalation now
creates a SessionHandoff row, fans out via the SSE bus, persists
AppNotification rows for the bell icon, dispatches to external channels
(Slack/Teams) via notify(), and emails per-user — regardless of whether
the call entered through /escalate (legacy URL) or /handoff (new URL).
The senior-pickup magic-moment screen now works end-to-end from the
EscalateModal bell-icon path the user just tested.
Backend
- HandoffCreateRequest gains optional target_user_id (the equivalent of
the legacy escalated_to_id field). Self-targeting rejected.
- HandoffManager.create_handoff handles intent='escalate' end-to-end:
sets escalation_reason + escalated_to_id, builds the legacy enhanced
AI escalation_package (Sonnet, lazy-imported from flowpilot_engine,
graceful fallback on failure), and merges handoff metadata into it.
Eager-loads session.steps and session.user via selectinload — required
by both the enhanced-package builder and notify() to avoid
MissingGreenlet on async lazy access.
- HandoffManager.finalize_escalation generates SessionDocumentation,
pushes documentation to PSA, and runs notify() — pre-commit so the
AppNotification rows persist atomically with the handoff.
- HandoffManager.dispatch_escalation_notifications keeps only the
fire-and-forget IO (bus publish, per-user emails) — runs post-commit.
Pulls engineer name via a separate User query rather than relying on
session.user lazy access.
- /handoff endpoint passes target_user_id through and calls
finalize_escalation pre-commit.
- /escalate endpoint is now a thin shim: owner-only session lookup,
HandoffManager.create_handoff(intent='escalate'), finalize_escalation,
commit, dispatch_escalation_notifications, return SessionCloseResponse
built from documentation + psa_result. flowpilot_engine.escalate_session
is no longer called by any endpoint.
- pickup_session accepts both 'requesting_escalation' (legacy in-flight
sessions) and 'escalated' (new canonical) so the migration is seamless
for sessions already in the queue.
- Escalation queue list and sidebar count now match either status.
Frontend
- useFlowPilotSession optimistic update flips status to 'escalated'
instead of 'requesting_escalation' so the page state matches the
unified backend response.
Verified end-to-end live: a fresh /escalate call from the junior produces
status='escalated', a SessionHandoff row, a SessionDocumentation, PSA
push attempted (no_psa for this test session), AND a bell-icon
AppNotification for the team admin with link
/pilot/{session_id}?pickup=true. Backend test suite: 1103 passed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two backend changes that unbreak the senior-pickup path from the
notification panel:
1. notification_service: session.escalated link template now ends with
?pickup=true so the senior lands in the handoff/pickup flow on
click. Without it, navigation hit /pilot/:id directly, which then
404'd on the GET because the senior isn't yet escalated_to_id —
the user perceives this as the bell-icon "just clearing the
notification".
2. ai_sessions GET access: any account member can now read an escalated
session's detail when status is requesting_escalation or escalated.
The owner-only guard was overly restrictive for explicitly-shared
in-transit states. Tenant boundary is enforced by RLS on the
underlying query, so account-scope is the right ceiling here. After
pickup, the existing handler/escalated_to_id checks still apply.
Verified live: re-login as the senior engineer and GET the active
escalated session — now returns 200 with full detail. Focused test
subset plus tests/test_sessions.py and tests/test_session_sharing.py
→ 94 passed in 43.26s, no regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.
What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
pub/sub keyed by account_id. asyncio.Queue per subscriber with
64-event maxsize and drop-on-full semantics. Designed to be swappable
for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
/api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
now publishes a `handoff_created` event to the bus BEFORE the email
fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
(publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
blocked, ready event handshake).
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer
moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so
neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the
existing streamDocumentation pattern, not native EventSource
(custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in,
reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE +
dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test
DROP SCHEMA + recreate fixture made wall-clock prohibitive when
multiple stale runs collided on the same Postgres test schema.
Resolution: -n auto next time.
[/gstack-context]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.
HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
(excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
creation is authoritative; notification is advisory. This is the
graceful-degradation regression both eng + codex reviews flagged as
critical (handoff must succeed even if SMTP is down).
Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.
Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST
Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.
20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.
Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate
Existing handoff create + queue tests unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}
Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.
The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.
Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
aggregates avg/median/p95 in Python (~1k rows/account/month is well within
budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases
Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.
First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top
of a unified main. Two manual conflict resolutions:
1. CLAUDE.md — kept the FlowPilot ai-handoff rewrite (`.ai/`-driven
protocol). The pre-rewrite reference content (CW integration notes,
lessons archive, env vars table) lives in `docs/connectwise/`,
`docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design.
2. frontend/src/pages/AssistantChatPage.tsx — both conflict regions
were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks
(facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with
PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint).
Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay,
NewTicketModal) kept. All setters wired by either branch are intact.
Verification:
- `tsc -b` clean across the merged tree.
- Browser smoke-test (Session B fixture): Phase 9 ProposalBanner
("Run AI-drafted PowerShell to recover SSL VPN") renders alongside
PSA's new Tickets sidebar icon. Console clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Called by the inline Script Builder tab on Submit. Writes
ai_drafted_script + ai_drafted_parameters to the fix without stamping
applied_at (a draft is not an application — that's §5 of the Phase 9
spec). Bumps state_version so Resolve/Escalate preview bundles
regenerate.
409 on terminal fix status. 404 on wrong session. 422 on empty script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /script-builder/sessions now supports origin='pilot_inline':
- Requires ai_session_id; validates it against current user ownership.
- Get-or-create: returns existing row for (user, ai_session_id) pair.
- Partial unique index on the DB backs the invariant; races resolve to
the single winner row.
list_sessions + count_user_sessions default-scope to origin='standalone'
so inline scratch sessions don't pollute the /script-builder dashboard
or count against the 5-session cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the DB column added in the prior migration. App-level default
is 'standalone' so existing callers of ScriptBuilderSession(...) work
without code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #3 from phase-8-review-issues.md. 'Not yet' on the AI-confirming
banner was a local-state hide; the proposal re-surfaced on the next
refreshSessionDerived call.
Two-part fix:
- PATCH /outcome now clears ai_outcome_proposal on any terminal action
(engineer has taken a decision; stale AI proposal is moot).
- New DELETE /ai-sessions/:sid/suggested-fixes/:fid/ai-outcome-proposal
endpoint for explicit 'Not yet' rejection. Does not touch status
or state_version — pure UI state.
Frontend handleRejectAIProposal now calls the DELETE and setActiveFix
with the server response.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #2 from phase-8-review-issues.md. Apply was client-side-only via
a bannerApplied flag. Refresh / chat reselect / multi-tab would drop
Verifying state back to Proposed.
- New POST /ai-sessions/{sid}/suggested-fixes/{fid}/apply stamps
applied_at without changing status (still 'proposed'). Idempotent
if already stamped; 409 if fix is past proposed (a terminal outcome
was already recorded).
- Bumps state_version so resolve/escalate preview bundles reflect that
the fix has entered verifying.
- Frontend handleApplyFix calls the endpoint and uses the returned
applied_at directly. bannerApplied client flag is removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't
enough — previews were also omitting outcome fields from the LLM bundle,
so a fresh regenerate still couldn't distinguish proposed / failed /
partial / success.
- PATCH /outcome now bumps ai_sessions.state_version (matches
record_decision's existing pattern).
- Resolution-note + escalation-package bundles now include status,
applied_at, verified_at, partial_notes, failure_reason on the active fix.
- Generator prompts prescribe outcome-aware phrasing (closure language
for success; what-we've-tried + next-steps for failed/partial).
- New end-to-end test asserts the regenerated preview reflects the
recorded outcome, not just that the cache key changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tells the AI when + how to emit the [FIX_OUTCOME] marker that Task 4's
parser consumes. Placeholder-only per the anti-parrot pattern — no
literal UUIDs, outcomes, or reasons that could leak into unrelated
sessions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AI emits [FIX_OUTCOME] when the engineer indicates in chat that a
prior suggested fix worked, didn't work, or was partially applied. The
marker writes to session_suggested_fixes.ai_outcome_proposal (JSONB),
which the frontend surfaces as a "confirm outcome?" banner. The status
column is only updated when the engineer clicks confirm (via PATCH
/outcome endpoint from Task 3).
Placeholder-only system prompt wiring comes in Task 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records engineer-reported outcome (applied_success|applied_failed|
applied_partial|dismissed). Enforces transition rules (partial → success/
failed allowed; terminal outcomes return 409) and notes requirements
(applied_partial requires notes).
Sets verified_at on success/failure, stamps applied_at if not already
set (handles the case where the AI [FIX_OUTCOME] marker fires before
the engineer clicks Apply).
Also fixes pre-existing test-infrastructure bug: network_diagram.py used
bare string server_default="'[]'" for JSONB columns, which asyncpg
rejects during test schema creation. Changed to text("'[]'::jsonb") to
match the pattern used by script_template.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds FixStatus literal (5 values matching the DB check constraint),
extends SessionSuggestedFixResponse with outcome fields, and introduces
SessionSuggestedFixOutcomeRequest for the PATCH /outcome endpoint coming
in Task 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 8 prep for the fix outcome banner. Adds:
- status (proposed|applied_success|applied_failed|applied_partial|dismissed)
- applied_at, verified_at (timestamps)
- partial_notes, failure_reason (engineer-provided context)
- ai_outcome_proposal (JSONB for AI [FIX_OUTCOME] marker payloads)
Backfills status='dismissed' from user_decision='dismissed'. status is
orthogonal to user_decision — outcome (did the fix work?) vs script-path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on the Phase 5 "Run now, templatize after resolve" path.
After a session resolves, drafts queued by the three-option dialog surface
as a modal that lets the engineer review the AI-proposed parameterization
and either save as a reusable team template or skip. A "don't ask again"
toggle writes to account_settings.preferences so the next resolve won't
pop the modal.
Backend:
- /api/v1/draft-templates:
* GET — list account drafts (pending_only default true; pass false for
audit view including accepted/rejected)
* GET /{id} — single draft
* POST /{id}/accept — promotes to a new script_templates row with
source_session_id / source_user_id / source_ticket_ref populated
(drives the Script Library "generated from CW #X · resolved by Y"
provenance chip). Draft flips to status=accepted,
promoted_template_id set, resolved_at stamped. 409 on re-accept /
already-rejected. 400 on unknown category_id.
* POST /{id}/reject — flips to status=rejected. 409 on re-reject.
- /api/v1/accounts/me/preferences (GET/PATCH) — thin wrapper over
AccountSettings.get_setting/set_setting. PATCH merges keys into the
JSONB column, preserving existing keys the client didn't touch.
Used by the "Don't ask again for this team" checkbox
(templatize_prompt_enabled=false) and, forward-looking, by
cw_resolved_status_id / cw_escalated_status_id from Phase 4.
- 13 tests: list filter, accept with/without edited_body, provenance
copy-through, reject, 409 on re-accept / re-reject, 400 on unknown
category, prefs round-trip with merge semantics.
Frontend:
- src/components/pilot/script/TemplatizePrompt.tsx — modal showing the
drafted script with proposed parameters in the Phase 5
ParameterizationPreview, editable name/category/description, an
individual-parameter remove button, and the "don't ask again" opt-out.
Accept posts to /draft-templates/{id}/accept + optionally PATCHes
preferences. Skip posts /reject.
- src/api/draftTemplates.ts — typed client plus accountPreferencesApi.
- AssistantChatPage: after a successful Resolve (external OR local),
fetches preferences + pending drafts for the session and queues the
modal one draft at a time. Escalate does not trigger this flow.
- Sidebar: Scripts nav shows the pending-draft count as a badge. Fetched
independently of the main sidebar stats so endpoint flakes don't
break the rest of the sidebar.
Verified live 2026-04-22: seed two drafts → GET sees both pending →
accept draft A (template created, provenance CW #99123 populated) →
reject draft B → pending count drops → PATCH opt-out → GET confirms
persistence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "AI parrots example content from system prompt" bug bit us twice in
one day across two different prompt sites. Patching individual prompts
is treating the symptom; this commit makes the rule structural.
Audit + sanitize:
- assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in
prior commits, but the [FORK] schema still had literal "Brief reason"
/ "Short name" / "One sentence" placeholders. Replaced with
<angle-bracket> placeholders. Anti-parrot rule itself rewritten to
describe the failure mode abstractly instead of naming "jsmith" so
the rule no longer trips the guardrail (and so the model doesn't
see "jsmith" as a token at all).
- ai_chat_service.py — removed three concrete-example offenders:
"Get-Service ADSync" command literal, the "DC01 server_name" intake
form payload (in two places), and the inline interview demos using
"Azure AD Sync failures" / "Exchange Online mailbox migration".
Replaced with technology-neutral schema descriptions.
- ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the
fully-fleshed DNS troubleshooting tree (with literal Dnscache /
ipconfig / google.com / Start-Service) with a placeholder schema
showing only ID-linkage shape.
- kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked
Server Manager + DC01 example payload with a placeholder schema.
Guardrail (tests/test_prompt_anti_parrot.py):
- Imports every module under app/services/ and app/core/ and walks
every uppercase string constant ending in _PROMPT, _SCHEMA,
_PROTOCOL, _FORMAT, or _CONTEXT.
- test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache,
google.com, "Outlook keeps", "Teams drops") must not appear in any
prompt constant. Add to the list when a new leak shows up in prod —
the list IS the audit trail.
- test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.)
must contain placeholders only. Distinguishes JSON keys (followed
by ':', allowed) from JSON values (followed by ',' / ']' / '}',
must be <placeholder>); allows pipe-separated enum types
(text|password|select) and a small set of fixed enum values
(question, diagnostic_check, decision, action, ...). Verified by
feeding the test a known-bad block — caught it correctly.
Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming
the test as the enforcement point so future contributors know how to
extend it (add to the known-leaked list when a new leak surfaces).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The system prompt had a "Complete example of a correct first response"
section with a specific Outlook/WiFi/jsmith scenario plus literal JSON
payloads in [QUESTIONS], [ACTIONS], [SUGGEST_FIX], and [PROMOTE]
markers. The model was emitting those literal strings (the same
WiFi/laptop questions, the same "Clear cached credentials" suggested
fix, the same "OWA login confirmed for jsmith" promote) on EVERY
unrelated chat — making the task lane look like it was leaking previous-
session data when in fact the AI was just reciting the prompt examples.
Replaced literal example content with `<placeholder>` schemas. Added an
explicit ANTI-PARROT RULE in the FINAL REMINDER section calling out
that the angle-bracket placeholders show SHAPE, not CONTENT, with
concrete examples of the failure mode (printer ticket → don't ask
about Outlook; user not named jsmith → don't name jsmith).
Same scrub applied to the FORK section's "Outlook AND Teams dropping"
and the worked fork-flow example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the SuggestedFix card to an inline panel that handles both cases:
template-matched fixes open the Script Library generator with parameters
pre-filled from session context; un-matched fixes open the three-option
dialog (one_off / draft_template / build_template). The decision endpoint
records the path choice with side effects: draft_template persists a
draft_templates row via a Sonnet-driven TemplateExtractionService;
build_template returns a redirect to the Script Builder; one_off just
records the choice.
Backend:
- TemplateExtractionService: drafts a parameter schema from a concrete
rendered script. Conservative by default ("prefer fewer parameters").
Round-trip-validates that templated_body only references declared
parameters; missing-key mismatch falls back to the original script
with no params. LLM/parse failures fall back identically — the
engineer can still create a draft and refine in the post-resolve
prompt (Phase 6).
- /suggested-fixes/{fix_id}/decision side effects:
* one_off → returns rendered_script (engineer's edited version or the
fix's ai_drafted_script verbatim)
* draft_template → same + creates draft_templates row with extracted
params, returns draft_template_id
* build_template → returns redirect_path=/scripts/builder?from_session=
&fix= so the frontend can navigate to the builder pre-loaded
- 400 when a non-template fix has no ai_drafted_script (template-matched
fixes take the dedicated /scripts/generate path, not this endpoint).
- 12 tests: TemplateExtractionService parse + fallback paths, all four
decision branches, edited_script override, missing-script 400.
Frontend:
- src/components/pilot/script/{TemplateMatchPanel, NoTemplateDialog,
ParameterizationPreview}.tsx — inline panels rendered in the task
lane's bottom slot when the engineer clicks a SuggestedFix card.
- TemplateMatchPanel: loads template via /scripts/templates/{id},
pre-fills params from fix.ai_drafted_parameters with cyan "from
session" tags, generates via existing /scripts/generate (already
bumps state_version on ai_session_id from Phase 3). 404 falls back
with a clear message instead of erroring.
- NoTemplateDialog: shows the AI-drafted script with proposed parameter
values highlighted in amber via ParameterizationPreview; three option
cards with the middle (draft_template) flagged Recommended; inline
edit on the script body before deciding.
- SuggestedFix card now clickable: onActivate toggles the inline panel.
- AssistantChatPage: scriptPanelOpen state + handleScriptDecision that
navigates on build_template and toasts on the other paths. Active fix
changes auto-close the panel so engineers don't act on stale state.
- Cmd+K → "Open inline Script Generator" palette entry surfaces only on
/pilot/:id routes; fires a window event the chat page subscribes to.
No Resolve shortcut added per Section 14 decision (browser ⌘R conflict).
Verified 2026-04-22 against the dev stack:
- one_off / draft_template / build_template all return the right shape
with real Sonnet TemplateExtractionService for the draft path.
- Conservative extraction confirmed: cmdkey + Restart-Process script
yielded zero proposed parameters as intended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the preview popover's Confirm & post action to ConnectWise (and,
via the provider pattern, any future PSA). Adds the parallel Escalate
flow with the handoff-oriented five-section markdown. Sessions without a
linked PSA ticket resolve/escalate locally — markdown stored, status
flipped, nothing posted externally.
Backend:
- EscalationPackageGeneratorService: Sonnet, five sections (Problem /
What we've confirmed / What we've tried / Current hypothesis /
Suggested next steps). Shares the preview_cache with a separate KIND
so Resolve and Escalate previews for the same state coexist.
- PSAWritebackService: post_resolution_note (RESOLUTION note type,
customer-visible), post_escalation_package (INTERNAL_ANALYSIS,
handoff for the next engineer only), transition_ticket_status with
mandatory re-fetch verification. PSAStatusVerificationError surfaces
loudly when CW silently rejects a status change — the
ConnectWise anti-pattern CLAUDE.md flags.
- Endpoints:
* POST /ai-sessions/{id}/escalation-package/preview
* POST /ai-sessions/{id}/resolution-note/post
* POST /ai-sessions/{id}/escalation-package/post
Outcomes: "resolved" / "escalated" with external_id + verified status,
"resolved_local" / "escalated_local" when no PSA linked.
- Target CW status IDs live in account_settings.preferences
(cw_resolved_status_id, cw_escalated_status_id). When unset, the post
proceeds without a status transition — response includes a
status_transition_skipped_reason rather than silently erroring.
- 7 tests: local-only path, PSA happy path with verified transition,
status verification failure → 502, skipped transition when
unconfigured, 409 on already-resolved re-post, escalate parallel path,
internal-analysis note type enforced.
Frontend:
- ResolutionNotePreview now kind-parameterized ('resolve' | 'escalate')
with inline edit + Confirm & post. Preview loads from the matching
backend endpoint; posting calls the matching endpoint; outcome toast
surfaces the verified CW status or the local-only result.
- AssistantChatPage: previewKind state replaces previewOpen; two toggle
buttons (Preview Resolve note / Escalate instead) in the lane's bottom
slot. handleConfirmPost dispatches by kind.
Verified 2026-04-22:
- Local-only Resolve + Escalate round-trip against the dev stack.
- Live Sonnet escalation-package preview; cache hit on repeat call
with no state change (separate cache kind from resolution-note).
- PSA post + status-verification paths covered by mocked-provider pytest
cases. Live CW round-trip pending a test CW instance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the AI-proposed resolution path and the inline preview of the
markdown that will be posted to the customer ticket on Resolve. The
preview is keyed on (session_id, ai_sessions.state_version) so back-to-
back fetches against unchanged state hit an in-process cache instead
of paying for a Sonnet call.
Backend:
- preview_cache: in-process LRU keyed on (kind, session_id, state_version).
No TTL — state_version is the source of truth. Soft-cap 5000 entries.
- unified_chat_service: [SUGGEST_FIX] parser (last-block-wins, JSON
payload, confidence clamped 0-100), supersession persistence (sets
superseded_at on prior active row), atomic state_version bump.
- ResolutionNoteGeneratorService: pulls session, facts, active fix, and
redacted script_generations into a structured input bundle for Sonnet;
produces the four-section markdown (Problem / What we confirmed /
Root cause / Resolution). Sensitive script parameters redacted via
ScriptTemplateEngine.redact_sensitive driven by the template's
parameters_schema.
- /api/v1/ai-sessions/{id}/suggested-fixes/active — 200 with the active
fix or 404.
- /api/v1/ai-sessions/{id}/suggested-fixes/{fix_id}/decision — records
one_off / draft_template / build_template / dismissed; dismiss
supersedes; bumps state_version. 409 on dismissing an already-
superseded fix.
- /api/v1/ai-sessions/{id}/resolution-note/preview — generates or returns
cached markdown; from_cache flag in payload signals cache hit.
- scripts.py POST /generate now bumps state_version on the linked
ai_session_id when present (third source of preview-cache invalidation
per Section 5.5).
- ASSISTANT_SYSTEM_PROMPT documents [SUGGEST_FIX] (when to/not to emit,
format, supersession semantics).
- 12 tests covering the parser (well-formed, last-wins, malformed,
confidence clamping), supersession + state_version invariant, all
decision branches, preview cache hit-on-no-change + miss-after-write.
Frontend:
- src/components/pilot/sections/SuggestedFix.tsx — amber-accented card
with confidence badge; dismiss action wired to the decision endpoint.
- src/components/pilot/ResolutionNotePreview.tsx — popover with refresh,
loading state, cached/fresh indicator, ticket-ref display.
- src/api/sessionSuggestedFixes.ts — typed client; getActive normalizes
404 to null so callers don't have to special-case.
- TaskLane gains suggestedFixSlot + bottomSlot props (rendered after
Diagnostic Checks; bottomSlot anchors the Resolve action).
- AssistantChatPage: refreshSessionDerived helper batches fact + fix
refresh; fact mutations and chat sends both schedule a 500ms-debounced
preview refresh per the Section 5.5 spec.
Verified end-to-end against the dev stack with a real Sonnet call:
- /active 404 → fact create → preview generates four-section markdown
grounded only in provided facts → second preview call hits cache
(from_cache=true, no LLM call) → fact write 2 → cache miss, regenerates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the load-bearing structural feature of the FlowPilot migration: a
"What we know" panel that holds confirmed facts for a session, fed by AI
[PROMOTE] markers and engineer-added notes. Facts feed the resolution
note preview (Phase 3) and survive across turns via stable UUIDs assigned
to pending_task_lane items.
Backend:
- FactSynthesisService: create/update/soft-delete facts with atomic
state_version bumps; LLM-backed synthesize_from_question/check on the
fact_synthesis (Haiku) action tier per Section 6.6.
- /api/v1/ai-sessions/{id}/facts CRUD + /facts/promote (proposed_text or
via synthesis). PATCH returns 403 for question/diagnostic_check facts
(edit the source item instead, Section 7.3).
- unified_chat_service: [PROMOTE] marker parser (JSON-block per Section
8.1 spec drift note), stable-UUID assignment for pending_task_lane
questions/actions preserved by exact text/label match across turns.
- ASSISTANT_SYSTEM_PROMPT: documents [PROMOTE] format, when to/not to
emit, hallucination guardrails, source_ref handling.
- 17 tests covering parser, stable IDs, service validation, CRUD,
editability rule, both promote modes, 422 null-synthesis path,
state_version invariant.
Frontend:
- src/components/pilot/sections/{WhatWeKnow,WhatWeKnowItem,AddNoteButton}
— green-gradient section above Questions, dashed-circle check, inline
edit/delete gated by the server's editable flag.
- TaskLane gains a whatWeKnowSlot prop (existing assistant/ folder kept
per the doc's "rename is opportunistic" guidance).
- AssistantChatPage fetches facts on selectChat and refetches after each
chat send (so [PROMOTE]-synthesized facts appear immediately); auto-
opens the lane when facts exist.
Verification: end-to-end smoke against the local docker stack confirms
all five endpoints (list/create/patch/delete/promote) plus the 403
editability rule. pytest suite verifies the same with mocked LLM. Live
[PROMOTE] flow remains untested until used in the UI — the marker shape
is covered by parser tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backs the schema added in 210d310 with SQLAlchemy 2.0 models.
- SessionFact: "What we know" facts with polymorphic source_ref pointing
at task-lane item UUIDs inside ai_sessions.pending_task_lane (not a FK
per Section 4.2).
- SessionSuggestedFix: AI-proposed resolutions with supersession tracking
and the full user_decision state machine.
- DraftTemplate: post-resolve templatization queue with promotion to
script_templates.
- AccountSettings: per-account JSONB preferences grab-bag with async
classmethod helpers — get_setting(db, account_id, key, default) reads
without creating, set_setting(db, account_id, key, value) upserts via
Postgres ON CONFLICT + jsonb `||` merge so existing keys are preserved.
Lazy row creation matches the Phase 1 design.
Column additions on existing models to mirror the migration:
- AISession: resolution_note_* / escalation_package_* / state_version
(the preview-cache-invalidation counter consumed by Phase 3).
- ScriptTemplate: source_session_id / source_user_id / source_ticket_ref
(provenance for templates promoted from DraftTemplate).
All four new models registered in app.models.__init__ and __all__.
TYPE_CHECKING-guarded relationship imports throughout, matching the
repo's existing model style.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames the chat caller to a name that signals its actual purpose, and
factors the reusable cached-system-block + cached-history + cache-usage-log
primitives out to app.core.ai_provider so they can be shared with the
provider-generic path without pulling MCP/beta/images into the abstract
interface.
Helpers added to ai_provider.py:
- `build_anthropic_chat_messages(history, new_message, images, format_reminder)`
— owns: copy history, apply cache_control to last history message,
append format reminder to new message, render images as multimodal blocks.
Anthropic-shaped by design; do not call from Gemini paths.
chat_call_cached keeps exactly the concerns that are unique to the one
MCP/beta/multimodal chat caller:
- Anthropic beta endpoint invocation
- Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback
- Format-reminder content string (declared as module constant)
- Phase 0.5 telemetry (mcp.turn, mcp.fallback)
Documents in the module docstring AND at the function site that this is
the ONE MCP/beta chat caller and should not become the general provider
path. MCP/beta/images are features of exactly one optional Anthropic beta
endpoint; routing them through AnthropicProvider would leak a provider-
specific concern into the abstract interface that also serves Gemini.
Behavior change: chat_call_cached now reuses the singleton AnthropicProvider
HTTP client via `_get_anthropic_client(...)` instead of instantiating a new
`anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern
and avoids burning connections per-turn. No user-visible difference.
No runtime verification from code-server. TODO(phase0-verify) in
ai_provider.py tracks the cache-hit verification owed on the new dev env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps each static system prompt in a single-block list so Phase 0.1's
AnthropicProvider applies cache_control: ephemeral automatically (policy α,
first block gets marked when no caller-authored cache_control is present).
Call sites:
- ai_tree_generator.scaffold_branches: SCAFFOLD_SYSTEM_PROMPT (~1k tokens)
- ai_tree_generator.generate_branch_detail: BRANCH_DETAIL_SYSTEM_PROMPT
(~2.5k tokens with few-shot example); retries inside the same function
re-read the cached block instead of paying full input cost on each attempt
- kb_conversion.convert_document: TROUBLESHOOTING or PROCEDURAL prompt
(each caches independently by text content)
- ai_fix.generate_fixes: FIX_SYSTEM_PROMPT on first attempt + corrective retry
- script_builder.send_message: SYSTEM_PROMPT_TEMPLATE (per-session language
substitution — same-language sessions share cache entries)
Each edit includes an inline comment explaining why the block is cacheable
(stable-constant, retry-reuse, per-language variant) so a future dev can
see the intent at the cache_control marker site.
script_builder history caching deliberately deferred — per Phase 0.1
decision (option i), AnthropicProvider does not automatically cache the
message list. If script_builder's growing 20-message history turns out
to be a visible cost driver via the anthropic.cache telemetry, route
that caller through the 0.4 chat wrapper which handles history caching.
No runtime verification from code-server; cache-hit behavior will be
confirmed against the new dev environment when it's up, per the inline
TODO(phase0-verify) in ai_provider.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Widens AIProvider.generate_json / generate_text / generate_text_stream
signatures to accept `system_prompt: str | list[SystemBlock]`:
- `str` (the existing call shape): passes through uncached, unchanged
behavior. Every existing caller stays on the uncached path — no silent
behavior change.
- `list[SystemBlock]`: enables Anthropic prompt caching via structured
system blocks. Caller-authored `cache_control` is honored verbatim
(policy α); if no block carries it, the provider applies
`cache_control: {"type": "ephemeral"}` to the first block only.
Gemini ignores cache_control and concatenates list entries into one
system string — the widened signature is strictly additive on that path.
Adds `anthropic.cache` structured-log telemetry: on every Anthropic
response (streaming included, via `stream.get_final_message()`), logs
`cache_read_input_tokens` and `cache_creation_input_tokens`. Telemetry
failure in streaming is swallowed so the user-facing stream never breaks.
Verification deferred: cannot run from code-server (no Python, no DB,
no dev env). TODO(phase0-verify) left inline in the module docstring.
First verification task on the new dev environment is to hit any
FlowPilot endpoint twice within 5 minutes and confirm the second call
shows cache_read_input_tokens > 0 in the `anthropic.cache` log event.
If verification fails, that's a debug task on the new env — not a
blocker for continuing Phase 0.2/0.3/0.4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Emits structured `mcp.turn` log events on every Anthropic-path chat turn,
capturing whether MCP was wired in (mcp_available), whether the model
actually invoked an MCP tool (mcp_invoked), which tool names fired,
and whether the silent retry-without-MCP fallback was triggered.
Adds a separate `mcp.fallback` event with error type/message for
fallback occurrences.
Establishes baseline data for deciding whether MCP investment is earning
its keep before Phase 2+ expands the product footprint. Scope: the one
MCP-using code path (`_call_anthropic_cached`) — not a general
instrumentation layer.
No new dependencies, no schema changes, no behavior change. Standard
library `logging` is the sink; PostHog is not wired on the backend.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous implementation PATCHed the `resources` string directly, which CW
silently ignores because `resources` is a server-derived read-only field (it's
populated from schedule entries of type/id=4, not freely writable).
Per CW docs (openapi line 70949): "Please use the
/schedule/entries?conditions=type/id=4 AND objectId={id} endpoint".
Behavior per spec:
- No owner + assign user → set owner (existing behavior kept)
- Has owner + assign different user → POST /schedule/entries with type/id=4,
member, objectId; owner untouched
- User already assigned (owner or schedule entry) → idempotent no-op
- Remove owner → clear owner (existing behavior kept)
- Remove co-assignee → DELETE /schedule/entries/{entry_id}
- list_resources now merges owner + schedule-entry members, deduped by id
Required CW security role permission on the API member:
- Service > Resource Scheduling > Add/Inquire/Delete
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous `resources`-string PATCH was silently ignored by CW — the
`resources` field is server-derived from the ticket's owner + schedule
entries, not freely writable. Status PATCH could also silently no-op
when a cross-board status id was sent.
- add_resource: when the ticket is unassigned, set the `owner`
MemberReference (the canonical writable primary-assignee field).
If already owned by someone else, append the identifier to the
`resources` co-assignee string best-effort.
- remove_resource: clear `owner` (with remove→replace:null fallback) if
the target is the current owner, otherwise strip from `resources`.
- list_resources: merge owner + resources string, deduped by member id,
so the UI reflects both single-owner and multi-resource assignments.
- update_ticket_status: verify CW applied the status by comparing the
response body's status.id — raises PSAError with a clear message when
CW silently rejects the change (e.g., status invalid for ticket's
board), instead of reporting spurious success.
- Frontend: surface the backend error detail in the toast so users see
the real reason instead of a generic "Failed to update" message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status update was returning only new_status (string) and the parent list's
onStatusUpdated only set status_name. The <select> was bound to status_id,
which never changed — so it visually reverted to the old status even though
the PATCH succeeded.
- Backend: include new_status_id in the status-update response.
- Panel: own currentStatusId/currentStatusName state so the select reflects
the change immediately and survives stale parent snapshots.
- Parent list: update status_id on both the row and selectedTicket so the
list row stays in sync when the panel stays open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Status filter: aggregate statuses across all boards (deduped by name)
when no board is selected. Backend accepts status_name and filters by
status/name so the same status matches across boards.
- Resource assignment: CW has no /service/tickets/{id}/members endpoint —
assignees live in the ticket's comma-separated `resources` string field.
Rewrote list/add/remove to read/PATCH that field via member identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Apply company_id filter in CW search_tickets conditions (was silently ignored)
- Sanitize query string to strip single quotes before CW condition interpolation
- Add psaError state to TicketsPage for permissions error surfacing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET /boards/{board_id}/statuses endpoint — direct board-to-statuses lookup
without ticket roundabout; used by filter bar and new ticket form
- Fix TicketsPage and NewTicketModal to call getBoardStatuses(board_id) instead
of misusing getTicketStatuses(ticket_id) with a board_id value
- Fix list_members auth: was require_account_owner (owner/super_admin only) —
changed to require_engineer_or_admin so engineers can see member list for
ticket assignment
- list_members: return [] on PSAError instead of 502 (Lesson 111 pattern)
- get_ticket_statuses: return [] on PSAError instead of 502
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- list_resources: return [] on PSAError instead of 502 — stops global interceptor
toast when CW API key lacks ticket members permission (Lesson 111)
- list_boards/list_priorities: add warning logging so Railway logs reveal the
root cause when CW permissions are missing
- TicketsPage: derive board options from ticket search results when listBoards
returns empty (CW permissions fallback)
- TicketFilterBar: replace assignment <select> with searchable member picker —
fixed options (All/Mine/Unassigned) + text-filtered member dropdown
- TicketQueue: remove Load More / infinite scroll; page now exists at /tickets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>