Three changes that get PR #150 to a green CI gate:
1. **test_record_decision_persists_and_bumps_state_version** — the
`decision: draft_template` path calls `_extract_template_parameters`
(TemplateExtractionService → AI provider). CI doesn't set
ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
`RuntimeError: No AI provider configured` and returned 500. The test
isn't exercising the AI integration — patched the extractor with an
AsyncMock returning a minimal valid `{templated_body, parameters}`
dict. Verified locally: the test now passes.
2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
the hash of requirements*.txt / package-lock.json with a runner-os
restore-key fallback. Saves ~30-60s per run on cache hit.
3. **Pytest invocation tightened**:
- Dropped `--cov-report=term-missing` — the custom "Display coverage
summary" step below parses coverage.json and prints the same
module list more concisely. Term-missing dumps every uncovered
line which adds ~5-10s of stdout.
- Added `--maxfail=10` so a structural breakage (fixture explosion,
DB unreachable) bails after 10 errors instead of running the full
25-min suite. Tunable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test_db fixture calls Base.metadata.create_all on a fresh test DB.
That only creates tables for models that have been imported (and thus
registered with Base.metadata) by the time the fixture runs.
app.main imports app.core.database (which gives us Base) but does NOT
eagerly import the model modules — most are pulled in lazily inside
scheduler functions (archive_stale_ai_sessions etc.) and route
modules. At fixture-setup time, only the handful of models touched by
those eager imports are on the metadata, so any test that exercises
PSA, network diagrams, ratings, escalations, etc. fails with
\`UndefinedTableError: relation "X" does not exist\` and a cascade of
500s on every endpoint that queries the missing table.
Adding \`from app import models as _models\` (rather than the bare
\`import app.models\` which would shadow the \`app\` FastAPI instance
imported just above) pulls in app/models/__init__.py, which itself
imports every model module — registering all ~60 tables with
Base.metadata before create_all runs.
Verified locally: tests/test_psa_writeback_phase4.py went from
1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were
masking the actual passes).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pytest-asyncio==0.24.0 (added on the FlowPilot branch as part of the
RLS test infra refactor) declares pytest>=8.2 — but requirements-dev.txt
still pinned pytest==7.4.3, so a clean pip install fails with
ResolutionImpossible. CI runners that started from a fresh image would
have refused to install dev deps; the FlowPilot tests passed locally
only because the dev container had a pre-installed pytest 8.x lying
around.
pytest-cov 4.1.0 also needs >= 5.0 to play nicely with pytest 8.
No code changes — pytest 8 is API-compatible with the existing test
suite once the install resolves.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top
of a unified main. Two manual conflict resolutions:
1. CLAUDE.md — kept the FlowPilot ai-handoff rewrite (`.ai/`-driven
protocol). The pre-rewrite reference content (CW integration notes,
lessons archive, env vars table) lives in `docs/connectwise/`,
`docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design.
2. frontend/src/pages/AssistantChatPage.tsx — both conflict regions
were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks
(facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with
PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint).
Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay,
NewTicketModal) kept. All setters wired by either branch are intact.
Verification:
- `tsc -b` clean across the merged tree.
- Browser smoke-test (Session B fixture): Phase 9 ProposalBanner
("Run AI-drafted PowerShell to recover SSL VPN") renders alongside
PSA's new Tickets sidebar icon. Console clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds backend/scripts/seed_phase9_qa_fixtures.py — creates 4 ai_sessions
plus matching session_suggested_fixes that pre-bake the four backend
states the AI orchestrator must produce to mount the five conditional
Phase 9 components:
A. no template, no draft → ChatTabStrip + ScriptBuilderTab
B. ai_drafted_script set → InlineNoTemplateDialog
C. script_template_id set → TemplateMatchPanel
D. applied_at + status=proposed → EscalateInterceptDialog (verify state)
Background: a Phase 9 QA pass against a regular session left these
five components unreached because the AI didn't emit SUGGEST_FIX in
time/at all. Seeding directly bypasses the AI and lets QA exercise
each surface deterministically.
UUIDs are deterministic (uuid5 over a fixed namespace) so re-runs
upsert. Pass --reset to wipe and recreate. Each session gets two
synthetic conversation messages so the chat header's canAct gate
(messages.length >= 2) opens up Resolve/Escalate.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Discovered during Phase 9 QA: seed_test_users.py was missing the
cancel_at_period_end column in its subscriptions INSERT, but the
column is NOT NULL (added in 016_add_subscription_tables.py).
Result: seed crashed with NotNullViolationError before any users
were created, blocking auth in fresh dev environments.
Pre-existing on main; not introduced by the FlowPilot migration
branch. Default value: false.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Continues the test-isolation work from dab740d. RLS migration tests run
against a policy-installed database and fail in the default create_all
suite, so they need to be opt-in:
- pytest.ini: register `rls` marker.
- conftest.py: auto-deselect test_rls_isolation.py unless
RUN_RLS_TESTS=1. Drops the deprecated session-scoped event_loop
fixture (not needed since pytest-asyncio 0.23+).
- test_rls_isolation.py: tag module with `rls` marker. Replace
hardcoded `patherly_test` DB reference with parsed DATABASE_TEST_URL
(matches conftest.py default `resolutionflow_test`). Updated docstring
command to show RUN_RLS_TESTS=1.
- requirements-dev.txt: bump pytest-asyncio 0.23.0 → 0.24.0 (loop-scope
marker behavior required by the RLS module fixture).
Run the RLS suite with:
RUN_RLS_TESTS=1 DB_APP_ROLE_PASSWORD=... pytest tests/test_rls_isolation.py
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Root cause of the 06:32 AM outage: running 'pytest tests/' inside the
resolutionflow_backend container silently dropped the public schema on
the DEV database. Two layered bugs made this possible; both are fixed.
Bug 1 — env-var lookup in conftest.TEST_DATABASE_URL put DATABASE_URL
(which normally points at the dev/prod DB) ahead of DATABASE_TEST_URL.
When DATABASE_URL is set, pytest used the dev DB as the 'test' DB and
the test_db fixture's DROP SCHEMA public CASCADE wiped it. Fixed:
- Honor only DATABASE_TEST_URL (or the localhost fallback).
- Assert at module load that the DB name contains 'test' — refuses
to run otherwise. Makes future misconfiguration impossible.
Bug 2 — conftest overrode app.dependency_overrides[get_db] but not
get_admin_db. Endpoints using get_admin_db (register, admin routes)
bypassed the test session and hit the real admin DB. Before Bug 1 was
fixed this was hidden because both engines pointed at the same dev DB.
With isolation in place, register started failing 'Email already
registered' because of stale users in the dev DB. Fixed:
- Also override get_admin_db to yield the same test session. RLS is
not enabled in the create_all-managed test schema, so sharing is
safe.
Also adds DATABASE_TEST_URL=resolutionflow_test to docker-compose.dev.yml
so pytest in the container works out of the box.
Verified: 49/50 Phase 8 + 9 tests pass against resolutionflow_test; the
1 failure is the pre-existing Phase 8 Issue #4
(test_record_decision_persists_and_bumps_state_version).
Refs gitea #145 (will update that issue with this as the primary fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Called by the inline Script Builder tab on Submit. Writes
ai_drafted_script + ai_drafted_parameters to the fix without stamping
applied_at (a draft is not an application — that's §5 of the Phase 9
spec). Bumps state_version so Resolve/Escalate preview bundles
regenerate.
409 on terminal fix status. 404 on wrong session. 422 on empty script.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST /script-builder/sessions now supports origin='pilot_inline':
- Requires ai_session_id; validates it against current user ownership.
- Get-or-create: returns existing row for (user, ai_session_id) pair.
- Partial unique index on the DB backs the invariant; races resolve to
the single winner row.
list_sessions + count_user_sessions default-scope to origin='standalone'
so inline scratch sessions don't pollute the /script-builder dashboard
or count against the 5-session cap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the DB column added in the prior migration. App-level default
is 'standalone' so existing callers of ScriptBuilderSession(...) work
without code changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 9 prep. Adds:
- origin VARCHAR(20) NOT NULL with CHECK ('standalone' | 'pilot_inline')
- invariant: pilot_inline rows must have ai_session_id
- partial unique index on (user_id, ai_session_id) WHERE origin='pilot_inline'
— backs get-or-create idempotency for the inline Script Builder tab.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #3 from phase-8-review-issues.md. 'Not yet' on the AI-confirming
banner was a local-state hide; the proposal re-surfaced on the next
refreshSessionDerived call.
Two-part fix:
- PATCH /outcome now clears ai_outcome_proposal on any terminal action
(engineer has taken a decision; stale AI proposal is moot).
- New DELETE /ai-sessions/:sid/suggested-fixes/:fid/ai-outcome-proposal
endpoint for explicit 'Not yet' rejection. Does not touch status
or state_version — pure UI state.
Frontend handleRejectAIProposal now calls the DELETE and setActiveFix
with the server response.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #2 from phase-8-review-issues.md. Apply was client-side-only via
a bannerApplied flag. Refresh / chat reselect / multi-tab would drop
Verifying state back to Proposed.
- New POST /ai-sessions/{sid}/suggested-fixes/{fid}/apply stamps
applied_at without changing status (still 'proposed'). Idempotent
if already stamped; 409 if fix is past proposed (a terminal outcome
was already recorded).
- Bumps state_version so resolve/escalate preview bundles reflect that
the fix has entered verifying.
- Frontend handleApplyFix calls the endpoint and uses the returned
applied_at directly. bannerApplied client flag is removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't
enough — previews were also omitting outcome fields from the LLM bundle,
so a fresh regenerate still couldn't distinguish proposed / failed /
partial / success.
- PATCH /outcome now bumps ai_sessions.state_version (matches
record_decision's existing pattern).
- Resolution-note + escalation-package bundles now include status,
applied_at, verified_at, partial_notes, failure_reason on the active fix.
- Generator prompts prescribe outcome-aware phrasing (closure language
for success; what-we've-tried + next-steps for failed/partial).
- New end-to-end test asserts the regenerated preview reflects the
recorded outcome, not just that the cache key changed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tells the AI when + how to emit the [FIX_OUTCOME] marker that Task 4's
parser consumes. Placeholder-only per the anti-parrot pattern — no
literal UUIDs, outcomes, or reasons that could leak into unrelated
sessions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The AI emits [FIX_OUTCOME] when the engineer indicates in chat that a
prior suggested fix worked, didn't work, or was partially applied. The
marker writes to session_suggested_fixes.ai_outcome_proposal (JSONB),
which the frontend surfaces as a "confirm outcome?" banner. The status
column is only updated when the engineer clicks confirm (via PATCH
/outcome endpoint from Task 3).
Placeholder-only system prompt wiring comes in Task 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records engineer-reported outcome (applied_success|applied_failed|
applied_partial|dismissed). Enforces transition rules (partial → success/
failed allowed; terminal outcomes return 409) and notes requirements
(applied_partial requires notes).
Sets verified_at on success/failure, stamps applied_at if not already
set (handles the case where the AI [FIX_OUTCOME] marker fires before
the engineer clicks Apply).
Also fixes pre-existing test-infrastructure bug: network_diagram.py used
bare string server_default="'[]'" for JSONB columns, which asyncpg
rejects during test schema creation. Changed to text("'[]'::jsonb") to
match the pattern used by script_template.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds FixStatus literal (5 values matching the DB check constraint),
extends SessionSuggestedFixResponse with outcome fields, and introduces
SessionSuggestedFixOutcomeRequest for the PATCH /outcome endpoint coming
in Task 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 8 prep for the fix outcome banner. Adds:
- status (proposed|applied_success|applied_failed|applied_partial|dismissed)
- applied_at, verified_at (timestamps)
- partial_notes, failure_reason (engineer-provided context)
- ai_outcome_proposal (JSONB for AI [FIX_OUTCOME] marker payloads)
Backfills status='dismissed' from user_decision='dismissed'. status is
orthogonal to user_decision — outcome (did the fix work?) vs script-path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on the Phase 5 "Run now, templatize after resolve" path.
After a session resolves, drafts queued by the three-option dialog surface
as a modal that lets the engineer review the AI-proposed parameterization
and either save as a reusable team template or skip. A "don't ask again"
toggle writes to account_settings.preferences so the next resolve won't
pop the modal.
Backend:
- /api/v1/draft-templates:
* GET — list account drafts (pending_only default true; pass false for
audit view including accepted/rejected)
* GET /{id} — single draft
* POST /{id}/accept — promotes to a new script_templates row with
source_session_id / source_user_id / source_ticket_ref populated
(drives the Script Library "generated from CW #X · resolved by Y"
provenance chip). Draft flips to status=accepted,
promoted_template_id set, resolved_at stamped. 409 on re-accept /
already-rejected. 400 on unknown category_id.
* POST /{id}/reject — flips to status=rejected. 409 on re-reject.
- /api/v1/accounts/me/preferences (GET/PATCH) — thin wrapper over
AccountSettings.get_setting/set_setting. PATCH merges keys into the
JSONB column, preserving existing keys the client didn't touch.
Used by the "Don't ask again for this team" checkbox
(templatize_prompt_enabled=false) and, forward-looking, by
cw_resolved_status_id / cw_escalated_status_id from Phase 4.
- 13 tests: list filter, accept with/without edited_body, provenance
copy-through, reject, 409 on re-accept / re-reject, 400 on unknown
category, prefs round-trip with merge semantics.
Frontend:
- src/components/pilot/script/TemplatizePrompt.tsx — modal showing the
drafted script with proposed parameters in the Phase 5
ParameterizationPreview, editable name/category/description, an
individual-parameter remove button, and the "don't ask again" opt-out.
Accept posts to /draft-templates/{id}/accept + optionally PATCHes
preferences. Skip posts /reject.
- src/api/draftTemplates.ts — typed client plus accountPreferencesApi.
- AssistantChatPage: after a successful Resolve (external OR local),
fetches preferences + pending drafts for the session and queues the
modal one draft at a time. Escalate does not trigger this flow.
- Sidebar: Scripts nav shows the pending-draft count as a badge. Fetched
independently of the main sidebar stats so endpoint flakes don't
break the rest of the sidebar.
Verified live 2026-04-22: seed two drafts → GET sees both pending →
accept draft A (template created, provenance CW #99123 populated) →
reject draft B → pending count drops → PATCH opt-out → GET confirms
persistence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "AI parrots example content from system prompt" bug bit us twice in
one day across two different prompt sites. Patching individual prompts
is treating the symptom; this commit makes the rule structural.
Audit + sanitize:
- assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in
prior commits, but the [FORK] schema still had literal "Brief reason"
/ "Short name" / "One sentence" placeholders. Replaced with
<angle-bracket> placeholders. Anti-parrot rule itself rewritten to
describe the failure mode abstractly instead of naming "jsmith" so
the rule no longer trips the guardrail (and so the model doesn't
see "jsmith" as a token at all).
- ai_chat_service.py — removed three concrete-example offenders:
"Get-Service ADSync" command literal, the "DC01 server_name" intake
form payload (in two places), and the inline interview demos using
"Azure AD Sync failures" / "Exchange Online mailbox migration".
Replaced with technology-neutral schema descriptions.
- ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the
fully-fleshed DNS troubleshooting tree (with literal Dnscache /
ipconfig / google.com / Start-Service) with a placeholder schema
showing only ID-linkage shape.
- kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked
Server Manager + DC01 example payload with a placeholder schema.
Guardrail (tests/test_prompt_anti_parrot.py):
- Imports every module under app/services/ and app/core/ and walks
every uppercase string constant ending in _PROMPT, _SCHEMA,
_PROTOCOL, _FORMAT, or _CONTEXT.
- test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache,
google.com, "Outlook keeps", "Teams drops") must not appear in any
prompt constant. Add to the list when a new leak shows up in prod —
the list IS the audit trail.
- test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.)
must contain placeholders only. Distinguishes JSON keys (followed
by ':', allowed) from JSON values (followed by ',' / ']' / '}',
must be <placeholder>); allows pipe-separated enum types
(text|password|select) and a small set of fixed enum values
(question, diagnostic_check, decision, action, ...). Verified by
feeding the test a known-bad block — caught it correctly.
Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming
the test as the enforcement point so future contributors know how to
extend it (add to the known-leaked list when a new leak surfaces).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The system prompt had a "Complete example of a correct first response"
section with a specific Outlook/WiFi/jsmith scenario plus literal JSON
payloads in [QUESTIONS], [ACTIONS], [SUGGEST_FIX], and [PROMOTE]
markers. The model was emitting those literal strings (the same
WiFi/laptop questions, the same "Clear cached credentials" suggested
fix, the same "OWA login confirmed for jsmith" promote) on EVERY
unrelated chat — making the task lane look like it was leaking previous-
session data when in fact the AI was just reciting the prompt examples.
Replaced literal example content with `<placeholder>` schemas. Added an
explicit ANTI-PARROT RULE in the FINAL REMINDER section calling out
that the angle-bracket placeholders show SHAPE, not CONTENT, with
concrete examples of the failure mode (printer ticket → don't ask
about Outlook; user not named jsmith → don't name jsmith).
Same scrub applied to the FORK section's "Outlook AND Teams dropping"
and the worked fork-flow example.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the SuggestedFix card to an inline panel that handles both cases:
template-matched fixes open the Script Library generator with parameters
pre-filled from session context; un-matched fixes open the three-option
dialog (one_off / draft_template / build_template). The decision endpoint
records the path choice with side effects: draft_template persists a
draft_templates row via a Sonnet-driven TemplateExtractionService;
build_template returns a redirect to the Script Builder; one_off just
records the choice.
Backend:
- TemplateExtractionService: drafts a parameter schema from a concrete
rendered script. Conservative by default ("prefer fewer parameters").
Round-trip-validates that templated_body only references declared
parameters; missing-key mismatch falls back to the original script
with no params. LLM/parse failures fall back identically — the
engineer can still create a draft and refine in the post-resolve
prompt (Phase 6).
- /suggested-fixes/{fix_id}/decision side effects:
* one_off → returns rendered_script (engineer's edited version or the
fix's ai_drafted_script verbatim)
* draft_template → same + creates draft_templates row with extracted
params, returns draft_template_id
* build_template → returns redirect_path=/scripts/builder?from_session=
&fix= so the frontend can navigate to the builder pre-loaded
- 400 when a non-template fix has no ai_drafted_script (template-matched
fixes take the dedicated /scripts/generate path, not this endpoint).
- 12 tests: TemplateExtractionService parse + fallback paths, all four
decision branches, edited_script override, missing-script 400.
Frontend:
- src/components/pilot/script/{TemplateMatchPanel, NoTemplateDialog,
ParameterizationPreview}.tsx — inline panels rendered in the task
lane's bottom slot when the engineer clicks a SuggestedFix card.
- TemplateMatchPanel: loads template via /scripts/templates/{id},
pre-fills params from fix.ai_drafted_parameters with cyan "from
session" tags, generates via existing /scripts/generate (already
bumps state_version on ai_session_id from Phase 3). 404 falls back
with a clear message instead of erroring.
- NoTemplateDialog: shows the AI-drafted script with proposed parameter
values highlighted in amber via ParameterizationPreview; three option
cards with the middle (draft_template) flagged Recommended; inline
edit on the script body before deciding.
- SuggestedFix card now clickable: onActivate toggles the inline panel.
- AssistantChatPage: scriptPanelOpen state + handleScriptDecision that
navigates on build_template and toasts on the other paths. Active fix
changes auto-close the panel so engineers don't act on stale state.
- Cmd+K → "Open inline Script Generator" palette entry surfaces only on
/pilot/:id routes; fires a window event the chat page subscribes to.
No Resolve shortcut added per Section 14 decision (browser ⌘R conflict).
Verified 2026-04-22 against the dev stack:
- one_off / draft_template / build_template all return the right shape
with real Sonnet TemplateExtractionService for the draft path.
- Conservative extraction confirmed: cmdkey + Restart-Process script
yielded zero proposed parameters as intended.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the preview popover's Confirm & post action to ConnectWise (and,
via the provider pattern, any future PSA). Adds the parallel Escalate
flow with the handoff-oriented five-section markdown. Sessions without a
linked PSA ticket resolve/escalate locally — markdown stored, status
flipped, nothing posted externally.
Backend:
- EscalationPackageGeneratorService: Sonnet, five sections (Problem /
What we've confirmed / What we've tried / Current hypothesis /
Suggested next steps). Shares the preview_cache with a separate KIND
so Resolve and Escalate previews for the same state coexist.
- PSAWritebackService: post_resolution_note (RESOLUTION note type,
customer-visible), post_escalation_package (INTERNAL_ANALYSIS,
handoff for the next engineer only), transition_ticket_status with
mandatory re-fetch verification. PSAStatusVerificationError surfaces
loudly when CW silently rejects a status change — the
ConnectWise anti-pattern CLAUDE.md flags.
- Endpoints:
* POST /ai-sessions/{id}/escalation-package/preview
* POST /ai-sessions/{id}/resolution-note/post
* POST /ai-sessions/{id}/escalation-package/post
Outcomes: "resolved" / "escalated" with external_id + verified status,
"resolved_local" / "escalated_local" when no PSA linked.
- Target CW status IDs live in account_settings.preferences
(cw_resolved_status_id, cw_escalated_status_id). When unset, the post
proceeds without a status transition — response includes a
status_transition_skipped_reason rather than silently erroring.
- 7 tests: local-only path, PSA happy path with verified transition,
status verification failure → 502, skipped transition when
unconfigured, 409 on already-resolved re-post, escalate parallel path,
internal-analysis note type enforced.
Frontend:
- ResolutionNotePreview now kind-parameterized ('resolve' | 'escalate')
with inline edit + Confirm & post. Preview loads from the matching
backend endpoint; posting calls the matching endpoint; outcome toast
surfaces the verified CW status or the local-only result.
- AssistantChatPage: previewKind state replaces previewOpen; two toggle
buttons (Preview Resolve note / Escalate instead) in the lane's bottom
slot. handleConfirmPost dispatches by kind.
Verified 2026-04-22:
- Local-only Resolve + Escalate round-trip against the dev stack.
- Live Sonnet escalation-package preview; cache hit on repeat call
with no state change (separate cache kind from resolution-note).
- PSA post + status-verification paths covered by mocked-provider pytest
cases. Live CW round-trip pending a test CW instance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the AI-proposed resolution path and the inline preview of the
markdown that will be posted to the customer ticket on Resolve. The
preview is keyed on (session_id, ai_sessions.state_version) so back-to-
back fetches against unchanged state hit an in-process cache instead
of paying for a Sonnet call.
Backend:
- preview_cache: in-process LRU keyed on (kind, session_id, state_version).
No TTL — state_version is the source of truth. Soft-cap 5000 entries.
- unified_chat_service: [SUGGEST_FIX] parser (last-block-wins, JSON
payload, confidence clamped 0-100), supersession persistence (sets
superseded_at on prior active row), atomic state_version bump.
- ResolutionNoteGeneratorService: pulls session, facts, active fix, and
redacted script_generations into a structured input bundle for Sonnet;
produces the four-section markdown (Problem / What we confirmed /
Root cause / Resolution). Sensitive script parameters redacted via
ScriptTemplateEngine.redact_sensitive driven by the template's
parameters_schema.
- /api/v1/ai-sessions/{id}/suggested-fixes/active — 200 with the active
fix or 404.
- /api/v1/ai-sessions/{id}/suggested-fixes/{fix_id}/decision — records
one_off / draft_template / build_template / dismissed; dismiss
supersedes; bumps state_version. 409 on dismissing an already-
superseded fix.
- /api/v1/ai-sessions/{id}/resolution-note/preview — generates or returns
cached markdown; from_cache flag in payload signals cache hit.
- scripts.py POST /generate now bumps state_version on the linked
ai_session_id when present (third source of preview-cache invalidation
per Section 5.5).
- ASSISTANT_SYSTEM_PROMPT documents [SUGGEST_FIX] (when to/not to emit,
format, supersession semantics).
- 12 tests covering the parser (well-formed, last-wins, malformed,
confidence clamping), supersession + state_version invariant, all
decision branches, preview cache hit-on-no-change + miss-after-write.
Frontend:
- src/components/pilot/sections/SuggestedFix.tsx — amber-accented card
with confidence badge; dismiss action wired to the decision endpoint.
- src/components/pilot/ResolutionNotePreview.tsx — popover with refresh,
loading state, cached/fresh indicator, ticket-ref display.
- src/api/sessionSuggestedFixes.ts — typed client; getActive normalizes
404 to null so callers don't have to special-case.
- TaskLane gains suggestedFixSlot + bottomSlot props (rendered after
Diagnostic Checks; bottomSlot anchors the Resolve action).
- AssistantChatPage: refreshSessionDerived helper batches fact + fix
refresh; fact mutations and chat sends both schedule a 500ms-debounced
preview refresh per the Section 5.5 spec.
Verified end-to-end against the dev stack with a real Sonnet call:
- /active 404 → fact create → preview generates four-section markdown
grounded only in provided facts → second preview call hits cache
(from_cache=true, no LLM call) → fact write 2 → cache miss, regenerates.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the load-bearing structural feature of the FlowPilot migration: a
"What we know" panel that holds confirmed facts for a session, fed by AI
[PROMOTE] markers and engineer-added notes. Facts feed the resolution
note preview (Phase 3) and survive across turns via stable UUIDs assigned
to pending_task_lane items.
Backend:
- FactSynthesisService: create/update/soft-delete facts with atomic
state_version bumps; LLM-backed synthesize_from_question/check on the
fact_synthesis (Haiku) action tier per Section 6.6.
- /api/v1/ai-sessions/{id}/facts CRUD + /facts/promote (proposed_text or
via synthesis). PATCH returns 403 for question/diagnostic_check facts
(edit the source item instead, Section 7.3).
- unified_chat_service: [PROMOTE] marker parser (JSON-block per Section
8.1 spec drift note), stable-UUID assignment for pending_task_lane
questions/actions preserved by exact text/label match across turns.
- ASSISTANT_SYSTEM_PROMPT: documents [PROMOTE] format, when to/not to
emit, hallucination guardrails, source_ref handling.
- 17 tests covering parser, stable IDs, service validation, CRUD,
editability rule, both promote modes, 422 null-synthesis path,
state_version invariant.
Frontend:
- src/components/pilot/sections/{WhatWeKnow,WhatWeKnowItem,AddNoteButton}
— green-gradient section above Questions, dashed-circle check, inline
edit/delete gated by the server's editable flag.
- TaskLane gains a whatWeKnowSlot prop (existing assistant/ folder kept
per the doc's "rename is opportunistic" guidance).
- AssistantChatPage fetches facts on selectChat and refetches after each
chat send (so [PROMOTE]-synthesized facts appear immediately); auto-
opens the lane when facts exist.
Verification: end-to-end smoke against the local docker stack confirms
all five endpoints (list/create/patch/delete/promote) plus the 403
editability rule. pytest suite verifies the same with mocked LLM. Live
[PROMOTE] flow remains untested until used in the UI — the marker shape
is covered by parser tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Backs the schema added in 210d310 with SQLAlchemy 2.0 models.
- SessionFact: "What we know" facts with polymorphic source_ref pointing
at task-lane item UUIDs inside ai_sessions.pending_task_lane (not a FK
per Section 4.2).
- SessionSuggestedFix: AI-proposed resolutions with supersession tracking
and the full user_decision state machine.
- DraftTemplate: post-resolve templatization queue with promotion to
script_templates.
- AccountSettings: per-account JSONB preferences grab-bag with async
classmethod helpers — get_setting(db, account_id, key, default) reads
without creating, set_setting(db, account_id, key, value) upserts via
Postgres ON CONFLICT + jsonb `||` merge so existing keys are preserved.
Lazy row creation matches the Phase 1 design.
Column additions on existing models to mirror the migration:
- AISession: resolution_note_* / escalation_package_* / state_version
(the preview-cache-invalidation counter consumed by Phase 3).
- ScriptTemplate: source_session_id / source_user_id / source_ticket_ref
(provenance for templates promoted from DraftTemplate).
All four new models registered in app.models.__init__ and __all__.
TYPE_CHECKING-guarded relationship imports throughout, matching the
repo's existing model style.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the backing store for the FlowPilot unified session surface, per
the FLOWPILOT-MIGRATION.md Phase 1 deliverable. Descends from production
head 074 (add_network_diagrams_table).
New tables (all tenant-scoped, all RLS-enabled + forced):
- session_facts — "What we know" facts. source_ref is a polymorphic
pointer to a task-lane item inside ai_sessions.pending_task_lane
(no DB-level FK; integrity enforced at service layer per Section 4.2
of the design doc). Soft-delete via deleted_at; active-facts partial
index excludes deleted rows.
- session_suggested_fixes — AI-proposed resolutions. One active per
session at a time (supersession tracked via superseded_at; partial
index on (session_id) WHERE superseded_at IS NULL powers the
"find active fix" query).
- draft_templates — scripts pending post-resolve templatization.
Partial index on (account_id) WHERE status='pending' supports the
"N scripts ready to review" Script Library badge.
- account_settings — new per-account table with JSONB preferences
grab-bag. Rows created lazily on first write; get_setting returns
default when no row exists.
Column additions on ai_sessions:
- resolution_note_markdown / posted_at / external_id
- escalation_package_markdown / posted_at / external_id
- state_version (INTEGER NOT NULL DEFAULT 0) — incremented atomically
by any write that invalidates the resolution note preview cache
per Section 5.5. Phase 3 consumes this.
Column additions on script_templates:
- source_session_id, source_user_id, source_ticket_ref — powers the
"generated from CW #X · resolved by Y · used N times" provenance
chip in the Script Library.
RLS pattern matches the repo convention (074 / network_diagrams is the
nearest template): ENABLE + FORCE, USING + WITH CHECK on
`account_id = app.current_account_id`. Downgrade is reversible —
drops in the inverse order of creation so FK dependencies unwind.
No runtime verification from code-server; migration apply + downgrade
will be verified on the new dev environment per the standing deferral.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Renames the chat caller to a name that signals its actual purpose, and
factors the reusable cached-system-block + cached-history + cache-usage-log
primitives out to app.core.ai_provider so they can be shared with the
provider-generic path without pulling MCP/beta/images into the abstract
interface.
Helpers added to ai_provider.py:
- `build_anthropic_chat_messages(history, new_message, images, format_reminder)`
— owns: copy history, apply cache_control to last history message,
append format reminder to new message, render images as multimodal blocks.
Anthropic-shaped by design; do not call from Gemini paths.
chat_call_cached keeps exactly the concerns that are unique to the one
MCP/beta/multimodal chat caller:
- Anthropic beta endpoint invocation
- Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN)
- Retry-without-MCP fallback
- Format-reminder content string (declared as module constant)
- Phase 0.5 telemetry (mcp.turn, mcp.fallback)
Documents in the module docstring AND at the function site that this is
the ONE MCP/beta chat caller and should not become the general provider
path. MCP/beta/images are features of exactly one optional Anthropic beta
endpoint; routing them through AnthropicProvider would leak a provider-
specific concern into the abstract interface that also serves Gemini.
Behavior change: chat_call_cached now reuses the singleton AnthropicProvider
HTTP client via `_get_anthropic_client(...)` instead of instantiating a new
`anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern
and avoids burning connections per-turn. No user-visible difference.
No runtime verification from code-server. TODO(phase0-verify) in
ai_provider.py tracks the cache-hit verification owed on the new dev env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps each static system prompt in a single-block list so Phase 0.1's
AnthropicProvider applies cache_control: ephemeral automatically (policy α,
first block gets marked when no caller-authored cache_control is present).
Call sites:
- ai_tree_generator.scaffold_branches: SCAFFOLD_SYSTEM_PROMPT (~1k tokens)
- ai_tree_generator.generate_branch_detail: BRANCH_DETAIL_SYSTEM_PROMPT
(~2.5k tokens with few-shot example); retries inside the same function
re-read the cached block instead of paying full input cost on each attempt
- kb_conversion.convert_document: TROUBLESHOOTING or PROCEDURAL prompt
(each caches independently by text content)
- ai_fix.generate_fixes: FIX_SYSTEM_PROMPT on first attempt + corrective retry
- script_builder.send_message: SYSTEM_PROMPT_TEMPLATE (per-session language
substitution — same-language sessions share cache entries)
Each edit includes an inline comment explaining why the block is cacheable
(stable-constant, retry-reuse, per-language variant) so a future dev can
see the intent at the cache_control marker site.
script_builder history caching deliberately deferred — per Phase 0.1
decision (option i), AnthropicProvider does not automatically cache the
message list. If script_builder's growing 20-message history turns out
to be a visible cost driver via the anthropic.cache telemetry, route
that caller through the 0.4 chat wrapper which handles history caching.
No runtime verification from code-server; cache-hit behavior will be
confirmed against the new dev environment when it's up, per the inline
TODO(phase0-verify) in ai_provider.py.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Widens AIProvider.generate_json / generate_text / generate_text_stream
signatures to accept `system_prompt: str | list[SystemBlock]`:
- `str` (the existing call shape): passes through uncached, unchanged
behavior. Every existing caller stays on the uncached path — no silent
behavior change.
- `list[SystemBlock]`: enables Anthropic prompt caching via structured
system blocks. Caller-authored `cache_control` is honored verbatim
(policy α); if no block carries it, the provider applies
`cache_control: {"type": "ephemeral"}` to the first block only.
Gemini ignores cache_control and concatenates list entries into one
system string — the widened signature is strictly additive on that path.
Adds `anthropic.cache` structured-log telemetry: on every Anthropic
response (streaming included, via `stream.get_final_message()`), logs
`cache_read_input_tokens` and `cache_creation_input_tokens`. Telemetry
failure in streaming is swallowed so the user-facing stream never breaks.
Verification deferred: cannot run from code-server (no Python, no DB,
no dev env). TODO(phase0-verify) left inline in the module docstring.
First verification task on the new dev environment is to hit any
FlowPilot endpoint twice within 5 minutes and confirm the second call
shows cache_read_input_tokens > 0 in the `anthropic.cache` log event.
If verification fails, that's a debug task on the new env — not a
blocker for continuing Phase 0.2/0.3/0.4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Emits structured `mcp.turn` log events on every Anthropic-path chat turn,
capturing whether MCP was wired in (mcp_available), whether the model
actually invoked an MCP tool (mcp_invoked), which tool names fired,
and whether the silent retry-without-MCP fallback was triggered.
Adds a separate `mcp.fallback` event with error type/message for
fallback occurrences.
Establishes baseline data for deciding whether MCP investment is earning
its keep before Phase 2+ expands the product footprint. Scope: the one
MCP-using code path (`_call_anthropic_cached`) — not a general
instrumentation layer.
No new dependencies, no schema changes, no behavior change. Standard
library `logging` is the sink; PostHog is not wired on the backend.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous implementation PATCHed the `resources` string directly, which CW
silently ignores because `resources` is a server-derived read-only field (it's
populated from schedule entries of type/id=4, not freely writable).
Per CW docs (openapi line 70949): "Please use the
/schedule/entries?conditions=type/id=4 AND objectId={id} endpoint".
Behavior per spec:
- No owner + assign user → set owner (existing behavior kept)
- Has owner + assign different user → POST /schedule/entries with type/id=4,
member, objectId; owner untouched
- User already assigned (owner or schedule entry) → idempotent no-op
- Remove owner → clear owner (existing behavior kept)
- Remove co-assignee → DELETE /schedule/entries/{entry_id}
- list_resources now merges owner + schedule-entry members, deduped by id
Required CW security role permission on the API member:
- Service > Resource Scheduling > Add/Inquire/Delete
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous `resources`-string PATCH was silently ignored by CW — the
`resources` field is server-derived from the ticket's owner + schedule
entries, not freely writable. Status PATCH could also silently no-op
when a cross-board status id was sent.
- add_resource: when the ticket is unassigned, set the `owner`
MemberReference (the canonical writable primary-assignee field).
If already owned by someone else, append the identifier to the
`resources` co-assignee string best-effort.
- remove_resource: clear `owner` (with remove→replace:null fallback) if
the target is the current owner, otherwise strip from `resources`.
- list_resources: merge owner + resources string, deduped by member id,
so the UI reflects both single-owner and multi-resource assignments.
- update_ticket_status: verify CW applied the status by comparing the
response body's status.id — raises PSAError with a clear message when
CW silently rejects the change (e.g., status invalid for ticket's
board), instead of reporting spurious success.
- Frontend: surface the backend error detail in the toast so users see
the real reason instead of a generic "Failed to update" message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status update was returning only new_status (string) and the parent list's
onStatusUpdated only set status_name. The <select> was bound to status_id,
which never changed — so it visually reverted to the old status even though
the PATCH succeeded.
- Backend: include new_status_id in the status-update response.
- Panel: own currentStatusId/currentStatusName state so the select reflects
the change immediately and survives stale parent snapshots.
- Parent list: update status_id on both the row and selectedTicket so the
list row stays in sync when the panel stays open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Status filter: aggregate statuses across all boards (deduped by name)
when no board is selected. Backend accepts status_name and filters by
status/name so the same status matches across boards.
- Resource assignment: CW has no /service/tickets/{id}/members endpoint —
assignees live in the ticket's comma-separated `resources` string field.
Rewrote list/add/remove to read/PATCH that field via member identifier.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Apply company_id filter in CW search_tickets conditions (was silently ignored)
- Sanitize query string to strip single quotes before CW condition interpolation
- Add psaError state to TicketsPage for permissions error surfacing
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add GET /boards/{board_id}/statuses endpoint — direct board-to-statuses lookup
without ticket roundabout; used by filter bar and new ticket form
- Fix TicketsPage and NewTicketModal to call getBoardStatuses(board_id) instead
of misusing getTicketStatuses(ticket_id) with a board_id value
- Fix list_members auth: was require_account_owner (owner/super_admin only) —
changed to require_engineer_or_admin so engineers can see member list for
ticket assignment
- list_members: return [] on PSAError instead of 502 (Lesson 111 pattern)
- get_ticket_statuses: return [] on PSAError instead of 502
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- list_resources: return [] on PSAError instead of 502 — stops global interceptor
toast when CW API key lacks ticket members permission (Lesson 111)
- list_boards/list_priorities: add warning logging so Railway logs reveal the
root cause when CW permissions are missing
- TicketsPage: derive board options from ticket search results when listBoards
returns empty (CW permissions fallback)
- TicketFilterBar: replace assignment <select> with searchable member picker —
fixed options (All/Mine/Unassigned) + text-filtered member dropdown
- TicketQueue: remove Load More / infinite scroll; page now exists at /tickets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>