resolutionflow

Author	SHA1	Message	Date
Michael Chihlas	0f00ee5e01	feat(escalations): close out plan-locked wedge polish Four items from the design-plan audit, all flagged as locked-design or Codex corrections, shipped together so the GTM demo path covers them end-to-end before bug bash. 1. Live AI assessment refresh on the magic-moment screen. Backend already publishes handoff_assessment_ready when enrich_escalation_async commits; wire the frontend listener so the senior sees the assessment populate without a manual reopen. New event type + onAssessmentReady handler on streamEscalations; AssistantChatPage opens a scoped SSE subscription whenever it tracks a handoff missing its assessment, refetches on match, and replaces magicHandoff / overlayHandoff in place. Closes the loop on the async-assessment commit `e8ba74e`. 2. Suggested-step chips below the chat input. Locked design from the plan (Codex correction). Chip strip renders above the composer post-claim when ai_assessment_data.suggested_steps[] is non-empty. Click prefills the input and focuses; first send or explicit X hides for the session. 3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen set (rf-escalation-seen, capped 200). Dot top-right when not seen. Cleared on open (card click) or claim (Pick Up), NOT on hover, per Codex correction. Pick Up stops propagation so it doesn't double-fire. 4. Race-condition toast on claim conflict. The /claim endpoint previously silently overwrote claimed_by, so both seniors thought they owned the session. New HandoffAlreadyClaimedError carries the winner's id/name/timestamp; claim_session rejects different-user re-claims (same-user is idempotent for double-click safety); endpoint returns 409 with structured detail. AssistantChatPage.handleStartHere extracts and surfaces "Already claimed by {name} {time_ago}." via toast, drops ?pickup=true, dismisses magic-moment so the loser flows back to queue. Tests: 2 new unit tests in test_handoff_manager.py (conflict raises, same-user idempotent). 
Full handoff + escalation suite (34 tests) green. Frontend tsc -b clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 01:59:28 -04:00
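The claim-conflict rule described in item 4 of 0f00ee5e01 can be sketched roughly as below. This is an in-memory illustration only, not the repository's code: the `Handoff` dataclass, the field names, and the `claim_session` signature are hypothetical stand-ins for the SessionHandoff row and the HandoffManager method, and a real implementation would presumably need the check and the write to be atomic at the database level.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class HandoffAlreadyClaimedError(Exception):
    """Raised when a different user already owns the handoff."""

    def __init__(self, claimed_by_id: str, claimed_by_name: str, claimed_at: datetime):
        super().__init__(f"Already claimed by {claimed_by_name}")
        self.claimed_by_id = claimed_by_id
        self.claimed_by_name = claimed_by_name
        self.claimed_at = claimed_at


@dataclass
class Handoff:
    # Hypothetical stand-in for the persisted SessionHandoff row.
    claimed_by_id: Optional[str] = None
    claimed_by_name: Optional[str] = None
    claimed_at: Optional[datetime] = None


def claim_session(handoff: Handoff, user_id: str, user_name: str) -> Handoff:
    if handoff.claimed_by_id is None:
        # First claim wins and records who claimed and when.
        handoff.claimed_by_id = user_id
        handoff.claimed_by_name = user_name
        handoff.claimed_at = datetime.now(timezone.utc)
        return handoff
    if handoff.claimed_by_id == user_id:
        # Same user re-claiming (e.g. a double-click) is a no-op.
        return handoff
    # A different user lost the race: surface the winner's details.
    raise HandoffAlreadyClaimedError(
        handoff.claimed_by_id, handoff.claimed_by_name or "", handoff.claimed_at
    )
```

An endpoint built on this would catch `HandoffAlreadyClaimedError` and translate it into the 409 response with structured detail that the commit describes.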
Michael Chihlas	e8ba74ed6d	feat(escalations): distinguishable notifications, async AI, richer sidebar Three improvements driven by live wedge testing. 1) Notification title now includes a problem snippet and PSA ticket suffix when present: "Escalation from Jane · #12345: Outlook is failing to sync email…" Replaces the prior "Session escalated by Jane" copy that made every escalation from the same junior look identical in the bell panel. Snippet is trimmed to 70 chars with ellipsis. handoff_manager now passes psa_ticket_id through in the notify() payload so this works for both /escalate and /handoff entry points. 2) AI enrichment (assessment + enhanced escalation_package) moved to a FastAPI BackgroundTask. The escalating engineer no longer waits on 15-25s of Sonnet latency; handoff creation returns as soon as snapshot, status flip, dual-write, documentation, PSA push, and notify() are committed. enrich_escalation_async opens its own DB session, runs both AI calls, updates handoff.ai_assessment + session.escalation_package, commits, and publishes a new `handoff_assessment_ready` event on the escalation bus. Frontend doesn't yet listen for that event; the magic-moment screen still shows a placeholder ("AI assessment is still generating. Reopen this view in a few seconds…") which is honest about the state. Live polling / auto-refresh on the bus event is the natural next step. 3) ChatSidebar entries now surface the problem summary as a secondary line and tag PSA-linked sessions with a monospace #ticket badge plus an "Escalated" pill on in-transit sessions. ChatListItem grew problem_summary, psa_ticket_id, and status fields; loadChats populates them from listSessions. 
The user couldn't tell their own sessions apart in the sidebar because they all rendered as "New Chat" with no distinguishing detail — this fixes that for any session, escalated or not. Test plan - Backend full suite: 1103 passed in 255.85s with -n auto. - Frontend tsc -b clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 00:34:32 -04:00
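The notification title format from item 1 of e8ba74ed6d could be built along these lines. The function name and parameters are hypothetical, and whether the 70-character limit counts the ellipsis is an assumption (here it does not).

```python
from typing import Optional

SNIPPET_MAX_CHARS = 70  # limit named in the commit message


def build_escalation_title(
    engineer_name: str,
    problem_summary: Optional[str],
    psa_ticket_id: Optional[str] = None,
) -> str:
    title = f"Escalation from {engineer_name}"
    if psa_ticket_id:
        # PSA ticket suffix only when the session is linked to a ticket.
        title += f" · #{psa_ticket_id}"
    # Collapse whitespace so multi-line summaries fit on one line.
    snippet = " ".join((problem_summary or "").split())
    if snippet:
        if len(snippet) > SNIPPET_MAX_CHARS:
            snippet = snippet[:SNIPPET_MAX_CHARS].rstrip() + "…"
        title += f": {snippet}"
    return title


print(build_escalation_title("Jane", "Outlook is failing to sync email", "12345"))
# prints: Escalation from Jane · #12345: Outlook is failing to sync email
```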
Michael Chihlas	aca915b047	fix(escalations): bump assessment timeout, surface picked-up sessions in sidebar Two field-reported issues from live wedge testing. ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS bumped 5s → 15s. The 5s bound fired too aggressively against the Sonnet diagnostic assessment prompt; ~4-8s is typical but tail latency hits 12-14s. The fallback "Assessment unavailable — model didn't respond in time" placeholder was showing on the magic-moment screen for two consecutive escalations, which kills the demo. 15s keeps the click-path bounded but lets the typical case return real content. Real fix is async generation (kick off, persist when done, surface "still computing" with refresh), captured as a follow-up; bumping the bound is the right call for the wedge demo. list_sessions now matches escalated_to_id == current_user.id alongside the existing user_id and escalation_package.picked_up_by clauses. The unified HandoffManager.claim_session sets escalated_to_id but doesn't write the legacy picked_up_by JSONB key, so picked-up sessions never showed in the senior's chat list; the senior would land on the session detail (active chat) but the sidebar showed only their other unrelated sessions. User reported this as "4 different versions of the session in the chat history section"; they were actually 4 unrelated empty sessions the senior owned, plus the picked-up session was just invisible. Backend tests still 94/94. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 00:04:08 -04:00
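A bounded assessment call with a placeholder fallback, as aca915b047 describes, might look like the sketch below. The constant name comes from the commit; the `assess_with_timeout` helper, the `generate` callable, and the fallback wording are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable

ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS = 15  # bumped from 5 in this commit
FALLBACK_ASSESSMENT = "Assessment unavailable (model timed out)"  # illustrative wording


async def assess_with_timeout(
    generate: Callable[[], Awaitable[str]],
    timeout: float = ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS,
) -> str:
    try:
        # wait_for cancels the inner call if it exceeds the bound.
        return await asyncio.wait_for(generate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Keep the click-path bounded: return a placeholder instead of failing.
        return FALLBACK_ASSESSMENT
```

The trade-off the commit names still applies to this sketch: a larger bound lets typical calls finish, but the caller waits up to the full timeout in the worst case.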
Michael Chihlas	029680ab2d	feat(escalations): unify /escalate through HandoffManager Replaces the legacy flowpilot_engine.escalate_session orchestration with a single canonical path through HandoffManager. Every escalation now creates a SessionHandoff row, fans out via the SSE bus, persists AppNotification rows for the bell icon, dispatches to external channels (Slack/Teams) via notify(), and emails per-user, regardless of whether the call entered through /escalate (legacy URL) or /handoff (new URL). The senior-pickup magic-moment screen now works end-to-end from the EscalateModal bell-icon path the user just tested. Backend - HandoffCreateRequest gains optional target_user_id (the equivalent of the legacy escalated_to_id field). Self-targeting rejected. - HandoffManager.create_handoff handles intent='escalate' end-to-end: sets escalation_reason + escalated_to_id, builds the legacy enhanced AI escalation_package (Sonnet, lazy-imported from flowpilot_engine, graceful fallback on failure), and merges handoff metadata into it. Eager-loads session.steps and session.user via selectinload, required by both the enhanced-package builder and notify() to avoid MissingGreenlet on async lazy access. - HandoffManager.finalize_escalation generates SessionDocumentation, pushes documentation to PSA, and runs notify(); this happens pre-commit so the AppNotification rows persist atomically with the handoff. - HandoffManager.dispatch_escalation_notifications keeps only the fire-and-forget IO (bus publish, per-user emails) and runs post-commit. Pulls engineer name via a separate User query rather than relying on session.user lazy access. - /handoff endpoint passes target_user_id through and calls finalize_escalation pre-commit. 
- /escalate endpoint is now a thin shim: owner-only session lookup, HandoffManager.create_handoff(intent='escalate'), finalize_escalation, commit, dispatch_escalation_notifications, return SessionCloseResponse built from documentation + psa_result. flowpilot_engine.escalate_session is no longer called by any endpoint. - pickup_session accepts both 'requesting_escalation' (legacy in-flight sessions) and 'escalated' (new canonical) so the migration is seamless for sessions already in the queue. - Escalation queue list and sidebar count now match either status. Frontend - useFlowPilotSession optimistic update flips status to 'escalated' instead of 'requesting_escalation' so the page state matches the unified backend response. Verified end-to-end live: a fresh /escalate call from the junior produces status='escalated', a SessionHandoff row, a SessionDocumentation, PSA push attempted (no_psa for this test session), AND a bell-icon AppNotification for the team admin with link /pilot/{session_id}?pickup=true. Backend test suite: 1103 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 22:27:26 -04:00
Michael Chihlas	641853a002	fix(escalations): bell-icon notification opens the pickup flow Two backend changes that unbreak the senior-pickup path from the notification panel: 1. notification_service: session.escalated link template now ends with ?pickup=true so the senior lands in the handoff/pickup flow on click. Without it, navigation hit /pilot/:id directly, which then 404'd on the GET because the senior isn't yet escalated_to_id; the user perceives this as the bell-icon "just clearing the notification". 2. ai_sessions GET access: any account member can now read an escalated session's detail when status is requesting_escalation or escalated. The owner-only guard was overly restrictive for explicitly-shared in-transit states. Tenant boundary is enforced by RLS on the underlying query, so account-scope is the right ceiling here. After pickup, the existing handler/escalated_to_id checks still apply. Verified live: re-login as the senior engineer and GET the active escalated session; it now returns 200 with full detail. Focused test subset plus tests/test_sessions.py and tests/test_session_sharing.py → 94 passed in 43.26s, no regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 21:29:47 -04:00
Michael Chihlas	9bdd9959a8	fix(handoff): bound escalation assessment latency Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 20:03:14 -04:00
Michael Chihlas	bc15952857	fix(tests): stabilize escalation SSE backend tests Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 19:47:43 -04:00
Michael Chihlas	87bd0b7c56	WIP: SSE pub/sub for live escalation arrivals (paused for Codex review) First half of the WebSocket/SSE push slice. Paused mid-flight to hand the branch to Codex for outside-voice review before stacking more commits on top. See .ai/HANDOFF.md for the full pause context + what to look at. What's here: - backend/app/core/escalation_bus.py — module-level singleton in-memory pub/sub keyed by account_id. asyncio.Queue per subscriber with 64-event maxsize and drop-on-full semantics. Designed to be swappable for Redis pub/sub when Railway scales past single-replica. - backend/app/api/endpoints/session_handoffs.py — GET /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe bound to current_user.account_id. - backend/app/services/handoff_manager.py — dispatch_escalation_notifications now publishes a `handoff_created` event to the bus BEFORE the email fan-out, in a try/except so a bus failure can't block email delivery. - backend/tests/test_escalation_bus.py — 7 unit tests, all green standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers. - backend/tests/test_handoff_manager.py — +1 dispatcher integration test (publishes to bus, payload shape). - backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer blocked, ready event handshake). 
[gstack-context] Decisions: - SSE over WebSocket (one-way, browser EventSource semantics, fewer moving parts behind Railway proxy) - In-memory bus over Redis for v1 pilot (3 MSPs, single replica) - Drop-on-full subscriber queue rather than back-pressure publishers - Bus publish ahead of email send, both wrapped in try/except so neither can break handoff creation - Frontend will be a fetch-based ReadableStream reader matching the existing streamDocumentation pattern, not native EventSource (custom-header auth) Remaining (post-Codex): - Frontend SSE subscription in EscalationQueue.tsx (slide-in, reconnect, tab-title flash, prefers-reduced-motion) - Magic-moment handoff-context screen - Re-run the full backend test suite to verify the SSE + dispatcher integration tests (bus units already green standalone) Tried: - Running the full test suite repeatedly without xdist; the per-test DROP SCHEMA + recreate fixture made wall-clock prohibitive when multiple stale runs collided on the same Postgres test schema. Resolution: -n auto next time. [/gstack-context] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 19:29:07 -04:00
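The bus design described in 87bd0b7c56 (in-memory pub/sub keyed by account_id, one bounded asyncio.Queue per subscriber, drop-on-full) can be sketched as follows. The class and method names are assumptions for illustration; only the 64-event maxsize, the per-account keying, and the drop-on-full behaviour come from the commit message.

```python
import asyncio
from collections import defaultdict
from typing import Any, Dict, Set


class EscalationBus:
    """In-memory pub/sub keyed by account_id (illustrative sketch)."""

    def __init__(self, maxsize: int = 64):
        self._maxsize = maxsize
        self._subscribers: Dict[str, Set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, account_id: str) -> asyncio.Queue:
        # Each subscriber gets its own bounded queue.
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers[account_id].add(queue)
        return queue

    def unsubscribe(self, account_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[account_id].discard(queue)

    def publish(self, account_id: str, event: Any) -> int:
        """Deliver to every subscriber of this account; return the delivered count."""
        delivered = 0
        for queue in list(self._subscribers.get(account_id, ())):
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Drop-on-full: a slow subscriber never blocks the publisher.
                pass
        return delivered
```

Because subscribers are looked up by account_id, an event published for one account never reaches another account's queues, which is the cross-tenant property the unit tests in the commit are said to cover.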
Michael Chihlas	07d0db9579	feat(handoff): email engineer-or-admin teammates on escalation First half of the Escalation Mode notification dual-path. WebSocket/SSE push is the second half (next commit) — email handles offline seniors, push handles online ones for the magic-moment demo. HandoffManager.dispatch_escalation_notifications: - Pulls active engineer/admin/owner-role users in the same account_id (excludes the escalator + viewers + soft-deleted) - Sends via existing EmailService.send_notification_email, concurrent via asyncio.gather; per-message failures don't block the rest - Wrapped in try/except: any exception is logged + swallowed. Handoff creation is authoritative; notification is advisory. This is the graceful-degradation regression both eng + codex reviews flagged as critical (handoff must succeed even if SMTP is down). Endpoint wiring (POST /ai-sessions/{id}/handoff): - Dispatch fires AFTER db.commit() — never email about a rolled-back handoff. Trust-erosion bug if we got that wrong. - Only fires for intent=escalate. Park is private to the escalator. Tests (4 new): - emails-engineer-recipients-in-account: viewer excluded, escalator excluded, only the engineer/admin teammates get the message - skipped-for-park-intent: park doesn't fan out - graceful-degradation-when-email-raises: RuntimeError from the email service does NOT bubble out of dispatch - endpoint-dispatches-on-escalate: end-to-end wiring through POST Per-channel delivery records (replacing the dead `notification_sent` boolean per Codex correction) is a v1.x story — for now application logs are the audit trail. See docs/plans/2026-04-27-escalation-mode-wedge-design.md. 20 tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. No regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:58:05 -04:00
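The graceful-degradation behaviour in 07d0db9579 (concurrent sends, per-message failures isolated, nothing bubbling out of dispatch) might be sketched like this. The `dispatch_emails` function and its `send` callable are hypothetical; the actual code is described as calling EmailService.send_notification_email.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable

logger = logging.getLogger(__name__)


async def dispatch_emails(
    recipients: Iterable[str],
    send: Callable[[str], Awaitable[None]],
) -> int:
    """Send to all recipients concurrently; return how many sends succeeded."""
    try:
        # return_exceptions=True keeps one failed send from cancelling the rest.
        results = await asyncio.gather(
            *(send(addr) for addr in recipients), return_exceptions=True
        )
    except Exception:
        # Notification is advisory: log and swallow so the handoff still succeeds.
        logger.exception("escalation email dispatch failed")
        return 0
    failures = [r for r in results if isinstance(r, BaseException)]
    for failure in failures:
        logger.warning("escalation email failed: %r", failure)
    return len(results) - len(failures)
```

Calling this only after the database commit, as the commit message specifies, is what prevents emails about a rolled-back handoff; the sketch itself does not enforce that ordering.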
Michael Chihlas	7a5b853b3b	feat(api): role-gate handoff claim to engineer-or-admin POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an authenticated user, so a viewer-role account user could claim escalations. Codex review flagged this as wedge-relevant: the Escalation Mode race-condition story (two seniors clicking Pick Up simultaneously) depends on auth gating for audit integrity. Originally captured as a deferred TODO during /plan-eng-review, then moved in-scope by /codex review. Swap the dep to require_engineer_or_admin. One-line change. Two new tests: - viewer_role gets 403 with "Engineer or admin access required" - engineer/owner role still succeeds and claimed_at + claimed_by populate Existing handoff create + queue tests unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:46:59 -04:00
Michael Chihlas	52f6d0308f	feat(analytics): add escalation time-to-first-action metric endpoint GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d} Computes the in-product wedge metric for Escalation Mode: average / median / p95 seconds between SessionHandoff.claimed_at and the first ai_session_step created on the same session after that timestamp. Account-scoped, role-gated to engineer-or-admin. The metric is intentionally NOT called "minutes recovered" — that's the two-metric framing locked by /codex review: this in-product number must be paired with manual baseline (the verbal-handoff stopwatch from The Assignment) to produce the savings claim. Schema's `metric_definition` field surfaces the disclaimer in every response so callers don't oversell it. Implementation notes: - Uses correlated scalar subquery for first-step-after-claim per handoff, aggregates avg/median/p95 in Python (~1k rows/account/month is well within budget; cleaner than percentile_cont gymnastics in SQL) - Excludes unclaimed handoffs (claimed_at IS NULL) - Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in n_handoffs_with_action — surfaces the conversion-rate signal - Floors negative deltas at 0 to handle clock-drift edge cases Tests cover happy path, zero-data, claimed-but-no-action accounting, period window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4 RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests, all green. No regressions in existing handoff_manager / session_handoffs suites. First piece of the Approach A wedge build per docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue stat-card and the analytics page. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:25:46 -04:00
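The Python-side aggregation mentioned in 52f6d0308f (avg / median / p95 with negative deltas floored at 0) could look roughly like this. The function name and return shape are hypothetical, and the nearest-rank definition of p95 is an assumption; the commit does not say which percentile method is used.

```python
import statistics
from typing import Dict, List, Optional


def summarize_time_to_first_action(deltas_seconds: List[float]) -> Dict[str, Optional[float]]:
    # Floor negative deltas at 0 to absorb clock drift, then sort for percentiles.
    values = sorted(max(0.0, float(d)) for d in deltas_seconds)
    n = len(values)
    if n == 0:
        return {"avg": None, "median": None, "p95": None, "n": 0}
    # Nearest-rank p95: rank = ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "avg": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": values[rank - 1],
        "n": n,
    }


print(summarize_time_to_first_action([10, 20, -5, 30]))
# prints: {'avg': 15.0, 'median': 15.0, 'p95': 30.0, 'n': 4}
```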
Michael Chihlas	7f714363dd	perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m Backend suite is the slow gate (1076 passed locally in 22m27s on fix/ci-workflow-config). Adding pytest-xdist with per-worker DB isolation drops it to ~4m20s on the 8-core homelab runner. Verified locally: `pytest -n auto --no-cov` finished in 4m28s real time (15m19s user — confirms ~5× parallelism). How it works: - conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist — 'gw0', 'gw1', …). When set, derives a per-worker DB URL like `…/resolutionflow_test_gw0`. The base DB stays for serial / master runs. - `_ensure_worker_db_exists` runs synchronously at conftest import, connects to the postgres maintenance DB, and `CREATE DATABASE`s the worker-suffixed DB if it doesn't exist. Idempotent across runs. - The "test" safety guard still applies — every worker DB name contains "test" so the assertion holds. - The per-test `DROP SCHEMA public CASCADE` now operates on the worker's isolated DB, no cross-worker race. CI workflow: backend job switches to `pytest -n auto`. Coverage still collected (pytest-cov has built-in xdist support). Adds `pytest-xdist==3.6.1` to requirements-dev.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:53:47 -04:00
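Deriving a per-worker database URL from `PYTEST_XDIST_WORKER`, as 7f714363dd describes, might be done as in this sketch. `PYTEST_XDIST_WORKER` is the environment variable set by pytest-xdist; the `worker_db_url` helper and the example credentials are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit, urlunsplit


def worker_db_url(base_url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Append the xdist worker id (gw0, gw1, ...) to the database name."""
    env = os.environ if env is None else env
    worker = env.get("PYTEST_XDIST_WORKER")
    if not worker:
        # Serial run or xdist master process: keep the base test database.
        return base_url
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(path=f"{parts.path}_{worker}"))


print(worker_db_url(
    "postgresql+asyncpg://user:pw@localhost:5432/resolutionflow_test",
    {"PYTEST_XDIST_WORKER": "gw0"},
))
# prints: postgresql+asyncpg://user:pw@localhost:5432/resolutionflow_test_gw0
```

Because the suffix is appended to a name that already contains "test", the derived name keeps satisfying a guard that requires "test" in the database name.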
Michael Chihlas	e976fb4e87	fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing Three changes that get PR #150 to a green CI gate: 1. test_record_decision_persists_and_bumps_state_version: the `decision: draft_template` path calls `_extract_template_parameters` (TemplateExtractionService → AI provider). CI doesn't set ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised `RuntimeError: No AI provider configured` and returned 500. The test isn't exercising the AI integration, so the extractor is patched with an AsyncMock returning a minimal valid `{templated_body, parameters}` dict. Verified locally: the test now passes. 2. pip + npm caches in backend, frontend, and e2e jobs. Keyed on the hash of requirements.txt / package-lock.json with a runner-os restore-key fallback. Saves ~30-60s per run on cache hit. 3. Pytest invocation tightened: - Dropped `--cov-report=term-missing`; the custom "Display coverage summary" step below parses coverage.json and prints the same module list more concisely. Term-missing dumps every uncovered line which adds ~5-10s of stdout. - Added `--maxfail=10` so a structural breakage (fixture explosion, DB unreachable) bails after 10 errors instead of running the full 25-min suite. Tunable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:01:05 -04:00
Michael Chihlas	49f88569da	wip(handoff): restore backend suite to green Co-Authored-By: Codex <noreply@openai.com>	2026-04-25 06:13:23 -04:00
Michael Chihlas	d6218f2e07	fix(tests): import all models in conftest so create_all sees the full schema The test_db fixture calls Base.metadata.create_all on a fresh test DB. That only creates tables for models that have been imported (and thus registered with Base.metadata) by the time the fixture runs. app.main imports app.core.database (which gives us Base) but does NOT eagerly import the model modules; most are pulled in lazily inside scheduler functions (archive_stale_ai_sessions etc.) and route modules. At fixture-setup time, only the handful of models touched by those eager imports are on the metadata, so any test that exercises PSA, network diagrams, ratings, escalations, etc. fails with `UndefinedTableError: relation "X" does not exist` and a cascade of 500s on every endpoint that queries the missing table. Adding `from app import models as _models` (rather than the bare `import app.models` which would shadow the `app` FastAPI instance imported just above) pulls in app/models/__init__.py, which itself imports every model module, registering all ~60 tables with Base.metadata before create_all runs. Verified locally: tests/test_psa_writeback_phase4.py went from 1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were masking the actual passes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 02:49:06 -04:00
Michael Chihlas	406ee0ef97	fix(deps): bump pytest 7.4 → 8.4, pytest-cov 4.1 → 5.0 to satisfy pytest-asyncio 0.24 pytest-asyncio==0.24.0 (added on the FlowPilot branch as part of the RLS test infra refactor) declares pytest>=8.2 — but requirements-dev.txt still pinned pytest==7.4.3, so a clean pip install fails with ResolutionImpossible. CI runners that started from a fresh image would have refused to install dev deps; the FlowPilot tests passed locally only because the dev container had a pre-installed pytest 8.x lying around. pytest-cov 4.1.0 also needs >= 5.0 to play nicely with pytest 8. No code changes — pytest 8 is API-compatible with the existing test suite once the install resolves. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 02:32:43 -04:00
Michael Chihlas	1c904373f8	Merge main into feat/flowpilot-migration Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top of a unified main. Two manual conflict resolutions: 1. CLAUDE.md: kept the FlowPilot ai-handoff rewrite (`.ai/`-driven protocol). The pre-rewrite reference content (CW integration notes, lessons archive, env vars table) lives in `docs/connectwise/`, `docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design. 2. frontend/src/pages/AssistantChatPage.tsx: both conflict regions were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks (facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint). Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay, NewTicketModal) kept. All setters wired by either branch are intact. Verification: - `tsc -b` clean across the merged tree. - Browser smoke-test (Session B fixture): Phase 9 ProposalBanner ("Run AI-drafted PowerShell to recover SSL VPN") renders alongside PSA's new Tickets sidebar icon. Console clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 01:03:33 -04:00
Michael Chihlas	d68131a865	feat(seed): Phase 9 QA fixture seeder Adds backend/scripts/seed_phase9_qa_fixtures.py — creates 4 ai_sessions plus matching session_suggested_fixes that pre-bake the four backend states the AI orchestrator must produce to mount the five conditional Phase 9 components: A. no template, no draft → ChatTabStrip + ScriptBuilderTab B. ai_drafted_script set → InlineNoTemplateDialog C. script_template_id set → TemplateMatchPanel D. applied_at + status=proposed → EscalateInterceptDialog (verify state) Background: a Phase 9 QA pass against a regular session left these five components unreached because the AI didn't emit SUGGEST_FIX in time/at all. Seeding directly bypasses the AI and lets QA exercise each surface deterministically. UUIDs are deterministic (uuid5 over a fixed namespace) so re-runs upsert. Pass --reset to wipe and recreate. Each session gets two synthetic conversation messages so the chat header's canAct gate (messages.length >= 2) opens up Resolve/Escalate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 00:08:38 -04:00
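The deterministic-UUID approach in d68131a865 (uuid5 over a fixed namespace so re-runs upsert the same rows) can be illustrated as below. The namespace seed and the fixture labels are made up for the example; the seeder's actual namespace is not given in the commit message.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
FIXTURE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "phase9-qa-fixtures.example")


def fixture_id(label: str) -> uuid.UUID:
    # uuid5 is a pure function of (namespace, name): same label, same UUID, every run.
    return uuid.uuid5(FIXTURE_NAMESPACE, label)


session_a = fixture_id("session-A")
session_b = fixture_id("session-B")
print(session_a == fixture_id("session-A"), session_a != session_b)
# prints: True True
```

Stable primary keys are what allow a seeder to use upsert semantics instead of creating duplicates on each run.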
Michael Chihlas	49c6c8fd00	fix(seed): include cancel_at_period_end in test-user subscription INSERT Discovered during Phase 9 QA: seed_test_users.py was missing the cancel_at_period_end column in its subscriptions INSERT, but the column is NOT NULL (added in 016_add_subscription_tables.py). Result: seed crashed with NotNullViolationError before any users were created, blocking auth in fresh dev environments. Pre-existing on main; not introduced by the FlowPilot migration branch. Default value: false. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 23:36:04 -04:00
Michael Chihlas	b14a16a1ab	chore(tests): gate RLS tests behind RUN_RLS_TESTS flag Continues the test-isolation work from `dab740d`. RLS migration tests run against a policy-installed database and fail in the default create_all suite, so they need to be opt-in: - pytest.ini: register `rls` marker. - conftest.py: auto-deselect test_rls_isolation.py unless RUN_RLS_TESTS=1. Drops the deprecated session-scoped event_loop fixture (not needed since pytest-asyncio 0.23+). - test_rls_isolation.py: tag module with `rls` marker. Replace hardcoded `patherly_test` DB reference with parsed DATABASE_TEST_URL (matches conftest.py default `resolutionflow_test`). Updated docstring command to show RUN_RLS_TESTS=1. - requirements-dev.txt: bump pytest-asyncio 0.23.0 → 0.24.0 (loop-scope marker behavior required by the RLS module fixture). Run the RLS suite with: RUN_RLS_TESTS=1 DB_APP_ROLE_PASSWORD=... pytest tests/test_rls_isolation.py Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 16:09:13 -04:00
Michael Chihlas	dab740ddf7	fix(tests): isolate test DB from dev DB and plug admin-db override gap Root cause of the 06:32 AM outage: running 'pytest tests/' inside the resolutionflow_backend container silently dropped the public schema on the DEV database. Two layered bugs made this possible; both are fixed. Bug 1: env-var lookup in conftest.TEST_DATABASE_URL put DATABASE_URL (which normally points at the dev/prod DB) ahead of DATABASE_TEST_URL. When DATABASE_URL is set, pytest used the dev DB as the 'test' DB and the test_db fixture's DROP SCHEMA public CASCADE wiped it. Fixed: - Honor only DATABASE_TEST_URL (or the localhost fallback). - Assert at module load that the DB name contains 'test' and refuse to run otherwise. Makes future misconfiguration impossible. Bug 2: conftest overrode app.dependency_overrides[get_db] but not get_admin_db. Endpoints using get_admin_db (register, admin routes) bypassed the test session and hit the real admin DB. Before Bug 1 was fixed this was hidden because both engines pointed at the same dev DB. With isolation in place, register started failing 'Email already registered' because of stale users in the dev DB. Fixed: - Also override get_admin_db to yield the same test session. RLS is not enabled in the create_all-managed test schema, so sharing is safe. Also adds DATABASE_TEST_URL=resolutionflow_test to docker-compose.dev.yml so pytest in the container works out of the box. Verified: 49/50 Phase 8 + 9 tests pass against resolutionflow_test; the 1 failure is the pre-existing Phase 8 Issue #4 (test_record_decision_persists_and_bumps_state_version). Refs gitea #145 (will update that issue with this as the primary fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:14:08 -04:00
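The Bug 1 fix in dab740ddf7 (honor only DATABASE_TEST_URL and refuse to run unless the database name contains "test") could be sketched as follows. The helper name, the default URL, and the use of an explicit `RuntimeError` rather than a bare `assert` are assumptions for illustration.

```python
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical localhost fallback; the real default lives in conftest.py.
DEFAULT_TEST_URL = "postgresql+asyncpg://postgres:postgres@localhost:5432/resolutionflow_test"


def resolve_test_database_url(env: Mapping[str, str]) -> str:
    # Deliberately ignore DATABASE_URL: it normally points at the dev/prod DB.
    url = env.get("DATABASE_TEST_URL") or DEFAULT_TEST_URL
    db_name = urlsplit(url).path.lstrip("/")
    if "test" not in db_name:
        # The fixture drops the public schema, so never run against a non-test DB.
        raise RuntimeError(f"Refusing to run tests against database {db_name!r}")
    return url
```

In a conftest this would be called once at import time with `os.environ`, so a misconfigured environment fails before any fixture can issue `DROP SCHEMA`.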
Michael Chihlas	1c855563ee	feat(pilot): PATCH /suggested-fixes/:id/script endpoint Called by the inline Script Builder tab on Submit. Writes ai_drafted_script + ai_drafted_parameters to the fix without stamping applied_at (a draft is not an application — that's §5 of the Phase 9 spec). Bumps state_version so Resolve/Escalate preview bundles regenerate. 409 on terminal fix status. 404 on wrong session. 422 on empty script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:34:06 -04:00
Michael Chihlas	d4fae87236	feat(pilot): inline Script Builder session — idempotent create + auth + filtered list POST /script-builder/sessions now supports origin='pilot_inline': - Requires ai_session_id; validates it against current user ownership. - Get-or-create: returns existing row for (user, ai_session_id) pair. - Partial unique index on the DB backs the invariant; races resolve to the single winner row. list_sessions + count_user_sessions default-scope to origin='standalone' so inline scratch sessions don't pollute the /script-builder dashboard or count against the 5-session cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:24:57 -04:00
Michael Chihlas	f2fce27f0d	feat(pilot): pydantic schemas for inline origin + script PATCH - ScriptBuilderCreateRequest gains origin ('standalone' | 'pilot_inline') and optional ai_session_id. Handler-side validation (next task) enforces pilot_inline ⇒ ai_session_id required + owned by caller. - SessionSuggestedFixScriptRequest added for the new PATCH /script endpoint (Phase 9 Task 6). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 01:53:28 -04:00
Michael Chihlas	93c974466a	feat(pilot): script_builder_sessions.origin on SQLAlchemy model Mirrors the DB column added in the prior migration. App-level default is 'standalone' so existing callers of ScriptBuilderSession(...) work without code changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 01:48:22 -04:00
Michael Chihlas	8012668975	feat(pilot): add origin + inline idempotency to script_builder_sessions Phase 9 prep. Adds: - origin VARCHAR(20) NOT NULL with CHECK ('standalone' | 'pilot_inline') - invariant: pilot_inline rows must have ai_session_id - partial unique index on (user_id, ai_session_id) WHERE origin='pilot_inline', which backs get-or-create idempotency for the inline Script Builder tab. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 00:22:53 -04:00
Michael Chihlas	70c5da0c75	fix(pilot): persist AI-proposal rejection + clear on outcome write Issue #3 from phase-8-review-issues.md. 'Not yet' on the AI-confirming banner was a local-state hide; the proposal re-surfaced on the next refreshSessionDerived call. Two-part fix: - PATCH /outcome now clears ai_outcome_proposal on any terminal action (engineer has taken a decision; stale AI proposal is moot). - New DELETE /ai-sessions/:sid/suggested-fixes/:fid/ai-outcome-proposal endpoint for explicit 'Not yet' rejection. Does not touch status or state_version — pure UI state. Frontend handleRejectAIProposal now calls the DELETE and setActiveFix with the server response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:15:48 -04:00
Michael Chihlas	de2bef3175	fix(pilot): persist Apply — stamp applied_at on click Issue #2 from phase-8-review-issues.md. Apply was client-side-only via a bannerApplied flag. Refresh / chat reselect / multi-tab would drop Verifying state back to Proposed. - New POST /ai-sessions/{sid}/suggested-fixes/{fid}/apply stamps applied_at without changing status (still 'proposed'). Idempotent if already stamped; 409 if fix is past proposed (a terminal outcome was already recorded). - Bumps state_version so resolve/escalate preview bundles reflect that the fix has entered verifying. - Frontend handleApplyFix calls the endpoint and uses the returned applied_at directly. bannerApplied client flag is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:10:52 -04:00
Michael Chihlas	362c7b1d79	fix(pilot): outcome-aware Resolve/Escalate previews Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't enough — previews were also omitting outcome fields from the LLM bundle, so a fresh regenerate still couldn't distinguish proposed / failed / partial / success. - PATCH /outcome now bumps ai_sessions.state_version (matches record_decision's existing pattern). - Resolution-note + escalation-package bundles now include status, applied_at, verified_at, partial_notes, failure_reason on the active fix. - Generator prompts prescribe outcome-aware phrasing (closure language for success; what-we've-tried + next-steps for failed/partial). - New end-to-end test asserts the regenerated preview reflects the recorded outcome, not just that the cache key changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:04:56 -04:00
Michael Chihlas	2cde6673b0	feat(pilot): [FIX_OUTCOME] system prompt instructions Tells the AI when + how to emit the [FIX_OUTCOME] marker that Task 4's parser consumes. Placeholder-only per the anti-parrot pattern — no literal UUIDs, outcomes, or reasons that could leak into unrelated sessions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:17:21 -04:00
Michael Chihlas	c0112f8bee	feat(pilot): [FIX_OUTCOME] marker parser + AI outcome proposal The AI emits [FIX_OUTCOME] when the engineer indicates in chat that a prior suggested fix worked, didn't work, or was partially applied. The marker writes to session_suggested_fixes.ai_outcome_proposal (JSONB), which the frontend surfaces as a "confirm outcome?" banner. The status column is only updated when the engineer clicks confirm (via PATCH /outcome endpoint from Task 3). Placeholder-only system prompt wiring comes in Task 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:08:43 -04:00
Michael Chihlas	8988dbc885	feat(pilot): PATCH /suggested-fixes/:id/outcome endpoint + tests Records engineer-reported outcome (applied_success\|applied_failed\| applied_partial\|dismissed). Enforces transition rules (partial → success/ failed allowed; terminal outcomes return 409) and notes requirements (applied_partial requires notes). Sets verified_at on success/failure, stamps applied_at if not already set (handles the case where the AI [FIX_OUTCOME] marker fires before the engineer clicks Apply). Also fixes pre-existing test-infrastructure bug: network_diagram.py used bare string server_default="'[]'" for JSONB columns, which asyncpg rejects during test schema creation. Changed to text("'[]'::jsonb") to match the pattern used by script_template.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:59:34 -04:00
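The transition and notes rules described for PATCH /outcome can be sketched as a small validator. This is an illustrative sketch only, not the repository's code: the function name `validate_outcome`, the two exception classes, and the choice to treat any disallowed transition as a conflict are assumptions. Only the five status values, the partial → success/failed rule, the 409 on terminal outcomes, and the notes requirement come from the commit message.

```python
from typing import Optional

# Terminal outcomes: the commit says re-recording on these returns 409.
TERMINAL = {"applied_success", "applied_failed", "dismissed"}

# Allowed transitions. The "proposed" row is an assumption; the commit only
# states that partial -> success/failed is allowed.
ALLOWED = {
    "proposed": {"applied_success", "applied_failed", "applied_partial", "dismissed"},
    "applied_partial": {"applied_success", "applied_failed"},
}


class OutcomeConflict(Exception):
    """Hypothetical error an endpoint could map to HTTP 409."""


class OutcomeInvalid(ValueError):
    """Hypothetical error an endpoint could map to a validation failure."""


def validate_outcome(current: str, requested: str, notes: Optional[str] = None) -> None:
    """Raise if `requested` cannot follow `current`; return None when it can."""
    if current in TERMINAL:
        raise OutcomeConflict(f"fix already has terminal outcome {current!r}")
    if requested not in ALLOWED.get(current, set()):
        # Assumption: disallowed non-terminal transitions are also conflicts.
        raise OutcomeConflict(f"cannot move from {current!r} to {requested!r}")
    if requested == "applied_partial" and not (notes and notes.strip()):
        raise OutcomeInvalid("applied_partial requires notes")
```

A handler would call `validate_outcome(fix.status, body.status, body.notes)` before writing and translate the two exceptions into its own HTTP responses.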
Michael Chihlas	4a8e3ae954	feat(pilot): pydantic schemas for fix outcome patch Adds FixStatus literal (5 values matching the DB check constraint), extends SessionSuggestedFixResponse with outcome fields, and introduces SessionSuggestedFixOutcomeRequest for the PATCH /outcome endpoint coming in Task 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:44:39 -04:00
Michael Chihlas	cdd8bb05cc	feat(pilot): add outcome tracking columns to session_suggested_fixes Phase 8 prep for the fix outcome banner. Adds: - status (proposed\|applied_success\|applied_failed\|applied_partial\|dismissed) - applied_at, verified_at (timestamps) - partial_notes, failure_reason (engineer-provided context) - ai_outcome_proposal (JSONB for AI [FIX_OUTCOME] marker payloads) Backfills status='dismissed' from user_decision='dismissed'. status is orthogonal to user_decision — outcome (did the fix work?) vs script-path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:40:17 -04:00
Michael Chihlas	4aaf57adb5	feat(pilot): Phase 6 — post-resolve templatize prompt + draft accept/reject Closes the loop on the Phase 5 "Run now, templatize after resolve" path. After a session resolves, drafts queued by the three-option dialog surface as a modal that lets the engineer review the AI-proposed parameterization and either save as a reusable team template or skip. A "don't ask again" toggle writes to account_settings.preferences so the next resolve won't pop the modal. Backend: - /api/v1/draft-templates: * GET — list account drafts (pending_only default true; pass false for audit view including accepted/rejected) * GET /{id} — single draft * POST /{id}/accept — promotes to a new script_templates row with source_session_id / source_user_id / source_ticket_ref populated (drives the Script Library "generated from CW #X · resolved by Y" provenance chip). Draft flips to status=accepted, promoted_template_id set, resolved_at stamped. 409 on re-accept / already-rejected. 400 on unknown category_id. * POST /{id}/reject — flips to status=rejected. 409 on re-reject. - /api/v1/accounts/me/preferences (GET/PATCH) — thin wrapper over AccountSettings.get_setting/set_setting. PATCH merges keys into the JSONB column, preserving existing keys the client didn't touch. Used by the "Don't ask again for this team" checkbox (templatize_prompt_enabled=false) and, forward-looking, by cw_resolved_status_id / cw_escalated_status_id from Phase 4. - 13 tests: list filter, accept with/without edited_body, provenance copy-through, reject, 409 on re-accept / re-reject, 400 on unknown category, prefs round-trip with merge semantics. Frontend: - src/components/pilot/script/TemplatizePrompt.tsx — modal showing the drafted script with proposed parameters in the Phase 5 ParameterizationPreview, editable name/category/description, an individual-parameter remove button, and the "don't ask again" opt-out. 
Accept posts to /draft-templates/{id}/accept + optionally PATCHes preferences. Skip posts /reject. - src/api/draftTemplates.ts — typed client plus accountPreferencesApi. - AssistantChatPage: after a successful Resolve (external OR local), fetches preferences + pending drafts for the session and queues the modal one draft at a time. Escalate does not trigger this flow. - Sidebar: Scripts nav shows the pending-draft count as a badge. Fetched independently of the main sidebar stats so endpoint flakes don't break the rest of the sidebar. Verified live 2026-04-22: seed two drafts → GET sees both pending → accept draft A (template created, provenance CW #99123 populated) → reject draft B → pending count drops → PATCH opt-out → GET confirms persistence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:37:49 -04:00
Michael Chihlas	d0ebdef9e8	fix(ai): full-sweep audit — placeholders only in system prompts + CI guardrail The "AI parrots example content from system prompt" bug bit us twice in one day across two different prompt sites. Patching individual prompts is treating the symptom; this commit makes the rule structural. Audit + sanitize: - assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in prior commits, but the [FORK] schema still had literal "Brief reason" / "Short name" / "One sentence" placeholders. Replaced with <angle-bracket> placeholders. Anti-parrot rule itself rewritten to describe the failure mode abstractly instead of naming "jsmith" so the rule no longer trips the guardrail (and so the model doesn't see "jsmith" as a token at all). - ai_chat_service.py — removed three concrete-example offenders: "Get-Service ADSync" command literal, the "DC01 server_name" intake form payload (in two places), and the inline interview demos using "Azure AD Sync failures" / "Exchange Online mailbox migration". Replaced with technology-neutral schema descriptions. - ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the fully-fleshed DNS troubleshooting tree (with literal Dnscache / ipconfig / google.com / Start-Service) with a placeholder schema showing only ID-linkage shape. - kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked Server Manager + DC01 example payload with a placeholder schema. Guardrail (tests/test_prompt_anti_parrot.py): - Imports every module under app/services/ and app/core/ and walks every uppercase string constant ending in _PROMPT, _SCHEMA, _PROTOCOL, _FORMAT, or _CONTEXT. - test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache, google.com, "Outlook keeps", "Teams drops") must not appear in any prompt constant. Add to the list when a new leak shows up in prod — the list IS the audit trail. 
- test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.) must contain placeholders only. Distinguishes JSON keys (followed by ':', allowed) from JSON values (followed by ',' / ']' / '}', must be <placeholder>); allows pipe-separated enum types (text\|password\|select) and a small set of fixed enum values (question, diagnostic_check, decision, action, ...). Verified by feeding the test a known-bad block — caught it correctly. Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming the test as the enforcement point so future contributors know how to extend it (add to the known-leaked list when a new leak surfaces). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:09:30 -04:00
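The first guardrail check (known-leaked tokens must not appear in any prompt constant) could look roughly like the sketch below. It is not the contents of tests/test_prompt_anti_parrot.py; the helper names `iter_prompt_constants` and `find_leaks` are hypothetical, and the case-sensitive substring match is an assumption. The token list and the constant-name suffixes are taken from the commit message.

```python
import types
from typing import Iterator, List, Tuple

# Token list and suffixes as quoted in the commit message.
KNOWN_LEAKED_TOKENS = (
    "jsmith", "DC01", "ADSync", "Dnscache", "google.com",
    "Outlook keeps", "Teams drops",
)
PROMPT_SUFFIXES = ("_PROMPT", "_SCHEMA", "_PROTOCOL", "_FORMAT", "_CONTEXT")


def iter_prompt_constants(module: types.ModuleType) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) for uppercase string constants with a prompt-like suffix."""
    for name, value in vars(module).items():
        if name.isupper() and name.endswith(PROMPT_SUFFIXES) and isinstance(value, str):
            yield name, value


def find_leaks(module: types.ModuleType) -> List[Tuple[str, str]]:
    """Return (constant_name, token) pairs for every leaked token found.

    Assumption: a plain case-sensitive substring match is sufficient.
    """
    return [
        (name, token)
        for name, value in iter_prompt_constants(module)
        for token in KNOWN_LEAKED_TOKENS
        if token in value
    ]
```

A pytest case would import each module under the scanned packages and assert that `find_leaks(module)` is empty, so adding a token to the tuple is all it takes to extend the guardrail.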
Michael Chihlas	50215b9110	fix(pilot): strip literal example content from system prompt — model was parroting The system prompt had a "Complete example of a correct first response" section with a specific Outlook/WiFi/jsmith scenario plus literal JSON payloads in [QUESTIONS], [ACTIONS], [SUGGEST_FIX], and [PROMOTE] markers. The model was emitting those literal strings (the same WiFi/laptop questions, the same "Clear cached credentials" suggested fix, the same "OWA login confirmed for jsmith" promote) on EVERY unrelated chat — making the task lane look like it was leaking previous- session data when in fact the AI was just reciting the prompt examples. Replaced literal example content with `<placeholder>` schemas. Added an explicit ANTI-PARROT RULE in the FINAL REMINDER section calling out that the angle-bracket placeholders show SHAPE, not CONTENT, with concrete examples of the failure mode (printer ticket → don't ask about Outlook; user not named jsmith → don't name jsmith). Same scrub applied to the FORK section's "Outlook AND Teams dropping" and the worked fork-flow example. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 01:36:29 -04:00
Michael Chihlas	fa61376303	feat(pilot): Phase 5 — inline Script Generator integration Wires the SuggestedFix card to an inline panel that handles both cases: template-matched fixes open the Script Library generator with parameters pre-filled from session context; un-matched fixes open the three-option dialog (one_off / draft_template / build_template). The decision endpoint records the path choice with side effects: draft_template persists a draft_templates row via a Sonnet-driven TemplateExtractionService; build_template returns a redirect to the Script Builder; one_off just records the choice. Backend: - TemplateExtractionService: drafts a parameter schema from a concrete rendered script. Conservative by default ("prefer fewer parameters"). Round-trip-validates that templated_body only references declared parameters; missing-key mismatch falls back to the original script with no params. LLM/parse failures fall back identically — the engineer can still create a draft and refine in the post-resolve prompt (Phase 6). - /suggested-fixes/{fix_id}/decision side effects: * one_off → returns rendered_script (engineer's edited version or the fix's ai_drafted_script verbatim) * draft_template → same + creates draft_templates row with extracted params, returns draft_template_id * build_template → returns redirect_path=/scripts/builder?from_session= &fix= so the frontend can navigate to the builder pre-loaded - 400 when a non-template fix has no ai_drafted_script (template-matched fixes take the dedicated /scripts/generate path, not this endpoint). - 12 tests: TemplateExtractionService parse + fallback paths, all four decision branches, edited_script override, missing-script 400. Frontend: - src/components/pilot/script/{TemplateMatchPanel, NoTemplateDialog, ParameterizationPreview}.tsx — inline panels rendered in the task lane's bottom slot when the engineer clicks a SuggestedFix card. 
- TemplateMatchPanel: loads template via /scripts/templates/{id}, pre-fills params from fix.ai_drafted_parameters with cyan "from session" tags, generates via existing /scripts/generate (already bumps state_version on ai_session_id from Phase 3). 404 falls back with a clear message instead of erroring. - NoTemplateDialog: shows the AI-drafted script with proposed parameter values highlighted in amber via ParameterizationPreview; three option cards with the middle (draft_template) flagged Recommended; inline edit on the script body before deciding. - SuggestedFix card now clickable: onActivate toggles the inline panel. - AssistantChatPage: scriptPanelOpen state + handleScriptDecision that navigates on build_template and toasts on the other paths. Active fix changes auto-close the panel so engineers don't act on stale state. - Cmd+K → "Open inline Script Generator" palette entry surfaces only on /pilot/:id routes; fires a window event the chat page subscribes to. No Resolve shortcut added per Section 14 decision (browser ⌘R conflict). Verified 2026-04-22 against the dev stack: - one_off / draft_template / build_template all return the right shape with real Sonnet TemplateExtractionService for the draft path. - Conservative extraction confirmed: cmdkey + Restart-Process script yielded zero proposed parameters as intended. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 00:15:29 -04:00
Michael Chihlas	8fd2c1bac6	feat(pilot): Phase 4 — Resolve + Escalate PSA writebacks with status verification Wires the preview popover's Confirm & post action to ConnectWise (and, via the provider pattern, any future PSA). Adds the parallel Escalate flow with the handoff-oriented five-section markdown. Sessions without a linked PSA ticket resolve/escalate locally — markdown stored, status flipped, nothing posted externally. Backend: - EscalationPackageGeneratorService: Sonnet, five sections (Problem / What we've confirmed / What we've tried / Current hypothesis / Suggested next steps). Shares the preview_cache with a separate KIND so Resolve and Escalate previews for the same state coexist. - PSAWritebackService: post_resolution_note (RESOLUTION note type, customer-visible), post_escalation_package (INTERNAL_ANALYSIS, handoff for the next engineer only), transition_ticket_status with mandatory re-fetch verification. PSAStatusVerificationError surfaces loudly when CW silently rejects a status change — the ConnectWise anti-pattern CLAUDE.md flags. - Endpoints: * POST /ai-sessions/{id}/escalation-package/preview * POST /ai-sessions/{id}/resolution-note/post * POST /ai-sessions/{id}/escalation-package/post Outcomes: "resolved" / "escalated" with external_id + verified status, "resolved_local" / "escalated_local" when no PSA linked. - Target CW status IDs live in account_settings.preferences (cw_resolved_status_id, cw_escalated_status_id). When unset, the post proceeds without a status transition — response includes a status_transition_skipped_reason rather than silently erroring. - 7 tests: local-only path, PSA happy path with verified transition, status verification failure → 502, skipped transition when unconfigured, 409 on already-resolved re-post, escalate parallel path, internal-analysis note type enforced. 
Frontend: - ResolutionNotePreview now kind-parameterized ('resolve' \| 'escalate') with inline edit + Confirm & post. Preview loads from the matching backend endpoint; posting calls the matching endpoint; outcome toast surfaces the verified CW status or the local-only result. - AssistantChatPage: previewKind state replaces previewOpen; two toggle buttons (Preview Resolve note / Escalate instead) in the lane's bottom slot. handleConfirmPost dispatches by kind. Verified 2026-04-22: - Local-only Resolve + Escalate round-trip against the dev stack. - Live Sonnet escalation-package preview; cache hit on repeat call with no state change (separate cache kind from resolution-note). - PSA post + status-verification paths covered by mocked-provider pytest cases. Live CW round-trip pending a test CW instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 23:54:54 -04:00
Michael Chihlas	66e592096c	feat(pilot): Phase 3 — Suggested fix tracking + Resolve preview with state_version cache Adds the AI-proposed resolution path and the inline preview of the markdown that will be posted to the customer ticket on Resolve. The preview is keyed on (session_id, ai_sessions.state_version) so back-to- back fetches against unchanged state hit an in-process cache instead of paying for a Sonnet call. Backend: - preview_cache: in-process LRU keyed on (kind, session_id, state_version). No TTL — state_version is the source of truth. Soft-cap 5000 entries. - unified_chat_service: [SUGGEST_FIX] parser (last-block-wins, JSON payload, confidence clamped 0-100), supersession persistence (sets superseded_at on prior active row), atomic state_version bump. - ResolutionNoteGeneratorService: pulls session, facts, active fix, and redacted script_generations into a structured input bundle for Sonnet; produces the four-section markdown (Problem / What we confirmed / Root cause / Resolution). Sensitive script parameters redacted via ScriptTemplateEngine.redact_sensitive driven by the template's parameters_schema. - /api/v1/ai-sessions/{id}/suggested-fixes/active — 200 with the active fix or 404. - /api/v1/ai-sessions/{id}/suggested-fixes/{fix_id}/decision — records one_off / draft_template / build_template / dismissed; dismiss supersedes; bumps state_version. 409 on dismissing an already- superseded fix. - /api/v1/ai-sessions/{id}/resolution-note/preview — generates or returns cached markdown; from_cache flag in payload signals cache hit. - scripts.py POST /generate now bumps state_version on the linked ai_session_id when present (third source of preview-cache invalidation per Section 5.5). - ASSISTANT_SYSTEM_PROMPT documents [SUGGEST_FIX] (when to/not to emit, format, supersession semantics). 
- 12 tests covering the parser (well-formed, last-wins, malformed, confidence clamping), supersession + state_version invariant, all decision branches, preview cache hit-on-no-change + miss-after-write. Frontend: - src/components/pilot/sections/SuggestedFix.tsx — amber-accented card with confidence badge; dismiss action wired to the decision endpoint. - src/components/pilot/ResolutionNotePreview.tsx — popover with refresh, loading state, cached/fresh indicator, ticket-ref display. - src/api/sessionSuggestedFixes.ts — typed client; getActive normalizes 404 to null so callers don't have to special-case. - TaskLane gains suggestedFixSlot + bottomSlot props (rendered after Diagnostic Checks; bottomSlot anchors the Resolve action). - AssistantChatPage: refreshSessionDerived helper batches fact + fix refresh; fact mutations and chat sends both schedule a 500ms-debounced preview refresh per the Section 5.5 spec. Verified end-to-end against the dev stack with a real Sonnet call: - /active 404 → fact create → preview generates four-section markdown grounded only in provided facts → second preview call hits cache (from_cache=true, no LLM call) → fact write 2 → cache miss, regenerates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 21:45:52 -04:00
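The preview cache described above (in-process LRU keyed on (kind, session_id, state_version), no TTL, soft cap of 5000 entries) can be illustrated with a minimal sketch. The class name `PreviewCache`, its method names, and the kind strings used in the usage note are assumptions for illustration, not the repository's API. The point it shows is that a state_version bump invalidates a preview simply by changing the key, so no expiry logic is needed.

```python
from collections import OrderedDict
from typing import Optional, Tuple

CacheKey = Tuple[str, str, int]  # (kind, session_id, state_version)


class PreviewCache:
    """In-process LRU with no TTL; stale entries age out via the size cap."""

    def __init__(self, max_entries: int = 5000) -> None:
        self._max_entries = max_entries
        self._entries: "OrderedDict[CacheKey, str]" = OrderedDict()

    def get(self, kind: str, session_id: str, state_version: int) -> Optional[str]:
        key = (kind, session_id, state_version)
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, kind: str, session_id: str, state_version: int, markdown: str) -> None:
        key = (kind, session_id, state_version)
        self._entries[key] = markdown
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

With this shape, a preview endpoint would look up `("resolution_note", session_id, state_version)` first and only call the generator on a miss; a different kind string for escalation previews keeps both previews for the same state side by side.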
Michael Chihlas	625dba7548	feat(pilot): Phase 2 — What we know (facts) with stable task-lane IDs Adds the load-bearing structural feature of the FlowPilot migration: a "What we know" panel that holds confirmed facts for a session, fed by AI [PROMOTE] markers and engineer-added notes. Facts feed the resolution note preview (Phase 3) and survive across turns via stable UUIDs assigned to pending_task_lane items. Backend: - FactSynthesisService: create/update/soft-delete facts with atomic state_version bumps; LLM-backed synthesize_from_question/check on the fact_synthesis (Haiku) action tier per Section 6.6. - /api/v1/ai-sessions/{id}/facts CRUD + /facts/promote (proposed_text or via synthesis). PATCH returns 403 for question/diagnostic_check facts (edit the source item instead, Section 7.3). - unified_chat_service: [PROMOTE] marker parser (JSON-block per Section 8.1 spec drift note), stable-UUID assignment for pending_task_lane questions/actions preserved by exact text/label match across turns. - ASSISTANT_SYSTEM_PROMPT: documents [PROMOTE] format, when to/not to emit, hallucination guardrails, source_ref handling. - 17 tests covering parser, stable IDs, service validation, CRUD, editability rule, both promote modes, 422 null-synthesis path, state_version invariant. Frontend: - src/components/pilot/sections/{WhatWeKnow,WhatWeKnowItem,AddNoteButton} — green-gradient section above Questions, dashed-circle check, inline edit/delete gated by the server's editable flag. - TaskLane gains a whatWeKnowSlot prop (existing assistant/ folder kept per the doc's "rename is opportunistic" guidance). - AssistantChatPage fetches facts on selectChat and refetches after each chat send (so [PROMOTE]-synthesized facts appear immediately); auto- opens the lane when facts exist. Verification: end-to-end smoke against the local docker stack confirms all five endpoints (list/create/patch/delete/promote) plus the 403 editability rule. pytest suite verifies the same with mocked LLM. 
Live [PROMOTE] flow remains untested until used in the UI — the marker shape is covered by parser tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-21 21:13:44 -04:00
Michael Chihlas	b49772f1a1	feat(models): Phase 1 SQLAlchemy models — SessionFact, SessionSuggestedFix, DraftTemplate, AccountSettings Backs the schema added in `210d310` with SQLAlchemy 2.0 models. - SessionFact: "What we know" facts with polymorphic source_ref pointing at task-lane item UUIDs inside ai_sessions.pending_task_lane (not a FK per Section 4.2). - SessionSuggestedFix: AI-proposed resolutions with supersession tracking and the full user_decision state machine. - DraftTemplate: post-resolve templatization queue with promotion to script_templates. - AccountSettings: per-account JSONB preferences grab-bag with async classmethod helpers — get_setting(db, account_id, key, default) reads without creating, set_setting(db, account_id, key, value) upserts via Postgres ON CONFLICT + jsonb `\|\|` merge so existing keys are preserved. Lazy row creation matches the Phase 1 design. Column additions on existing models to mirror the migration: - AISession: resolution_note_* / escalation_package_* / state_version (the preview-cache-invalidation counter consumed by Phase 3). - ScriptTemplate: source_session_id / source_user_id / source_ticket_ref (provenance for templates promoted from DraftTemplate). All four new models registered in app.models.__init__ and __all__. TYPE_CHECKING-guarded relationship imports throughout, matching the repo's existing model style. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 18:35:00 +00:00
Michael Chihlas	210d310fb2	feat(db): Phase 1 schema — session_facts, suggested_fixes, draft_templates, account_settings Adds the backing store for the FlowPilot unified session surface, per the FLOWPILOT-MIGRATION.md Phase 1 deliverable. Descends from production head 074 (add_network_diagrams_table). New tables (all tenant-scoped, all RLS-enabled + forced): - session_facts — "What we know" facts. source_ref is a polymorphic pointer to a task-lane item inside ai_sessions.pending_task_lane (no DB-level FK; integrity enforced at service layer per Section 4.2 of the design doc). Soft-delete via deleted_at; active-facts partial index excludes deleted rows. - session_suggested_fixes — AI-proposed resolutions. One active per session at a time (supersession tracked via superseded_at; partial index on (session_id) WHERE superseded_at IS NULL powers the "find active fix" query). - draft_templates — scripts pending post-resolve templatization. Partial index on (account_id) WHERE status='pending' supports the "N scripts ready to review" Script Library badge. - account_settings — new per-account table with JSONB preferences grab-bag. Rows created lazily on first write; get_setting returns default when no row exists. Column additions on ai_sessions: - resolution_note_markdown / posted_at / external_id - escalation_package_markdown / posted_at / external_id - state_version (INTEGER NOT NULL DEFAULT 0) — incremented atomically by any write that invalidates the resolution note preview cache per Section 5.5. Phase 3 consumes this. Column additions on script_templates: - source_session_id, source_user_id, source_ticket_ref — powers the "generated from CW #X · resolved by Y · used N times" provenance chip in the Script Library. RLS pattern matches the repo convention (074 / network_diagrams is the nearest template): ENABLE + FORCE, USING + WITH CHECK on `account_id = app.current_account_id`. Downgrade is reversible — drops in the inverse order of creation so FK dependencies unwind. 
No runtime verification from code-server; migration apply + downgrade will be verified on the new dev environment per the standing deferral. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 18:14:26 +00:00
Michael Chihlas	3f0a132058	refactor(ai): rename _call_anthropic_cached → chat_call_cached; extract cache plumbing (Phase 0.4) Renames the chat caller to a name that signals its actual purpose, and factors the reusable cached-system-block + cached-history + cache-usage-log primitives out to app.core.ai_provider so they can be shared with the provider-generic path without pulling MCP/beta/images into the abstract interface. Helpers added to ai_provider.py: - `build_anthropic_chat_messages(history, new_message, images, format_reminder)` — owns: copy history, apply cache_control to last history message, append format reminder to new message, render images as multimodal blocks. Anthropic-shaped by design; do not call from Gemini paths. chat_call_cached keeps exactly the concerns that are unique to the one MCP/beta/multimodal chat caller: - Anthropic beta endpoint invocation - Microsoft Learn MCP server wiring (ENABLE_MCP_MICROSOFT_LEARN) - Retry-without-MCP fallback - Format-reminder content string (declared as module constant) - Phase 0.5 telemetry (mcp.turn, mcp.fallback) Documents in the module docstring AND at the function site that this is the ONE MCP/beta chat caller and should not become the general provider path. MCP/beta/images are features of exactly one optional Anthropic beta endpoint; routing them through AnthropicProvider would leak a provider- specific concern into the abstract interface that also serves Gemini. Behavior change: chat_call_cached now reuses the singleton AnthropicProvider HTTP client via `_get_anthropic_client(...)` instead of instantiating a new `anthropic.AsyncAnthropic(...)` per call. Matches the provider's own pattern and avoids burning connections per-turn. No user-visible difference. No runtime verification from code-server. TODO(phase0-verify) in ai_provider.py tracks the cache-hit verification owed on the new dev env. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 17:03:09 +00:00
Michael Chihlas	da93ae55c3	feat(ai): opt-in structured-system-block caching for one-shot generators (Phase 0.3) Wraps each static system prompt in a single-block list so Phase 0.1's AnthropicProvider applies cache_control: ephemeral automatically (policy α, first block gets marked when no caller-authored cache_control is present). Call sites: - ai_tree_generator.scaffold_branches: SCAFFOLD_SYSTEM_PROMPT (~1k tokens) - ai_tree_generator.generate_branch_detail: BRANCH_DETAIL_SYSTEM_PROMPT (~2.5k tokens with few-shot example); retries inside the same function re-read the cached block instead of paying full input cost on each attempt - kb_conversion.convert_document: TROUBLESHOOTING or PROCEDURAL prompt (each caches independently by text content) - ai_fix.generate_fixes: FIX_SYSTEM_PROMPT on first attempt + corrective retry - script_builder.send_message: SYSTEM_PROMPT_TEMPLATE (per-session language substitution — same-language sessions share cache entries) Each edit includes an inline comment explaining why the block is cacheable (stable-constant, retry-reuse, per-language variant) so a future dev can see the intent at the cache_control marker site. script_builder history caching deliberately deferred — per Phase 0.1 decision (option i), AnthropicProvider does not automatically cache the message list. If script_builder's growing 20-message history turns out to be a visible cost driver via the anthropic.cache telemetry, route that caller through the 0.4 chat wrapper which handles history caching. No runtime verification from code-server; cache-hit behavior will be confirmed against the new dev environment when it's up, per the inline TODO(phase0-verify) in ai_provider.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 16:29:45 +00:00
Michael Chihlas	b3be66652e	feat(ai): structured-system-block caching in AnthropicProvider (Phase 0.1) Widens AIProvider.generate_json / generate_text / generate_text_stream signatures to accept `system_prompt: str \| list[SystemBlock]`: - `str` (the existing call shape): passes through uncached, unchanged behavior. Every existing caller stays on the uncached path — no silent behavior change. - `list[SystemBlock]`: enables Anthropic prompt caching via structured system blocks. Caller-authored `cache_control` is honored verbatim (policy α); if no block carries it, the provider applies `cache_control: {"type": "ephemeral"}` to the first block only. Gemini ignores cache_control and concatenates list entries into one system string — the widened signature is strictly additive on that path. Adds `anthropic.cache` structured-log telemetry: on every Anthropic response (streaming included, via `stream.get_final_message()`), logs `cache_read_input_tokens` and `cache_creation_input_tokens`. Telemetry failure in streaming is swallowed so the user-facing stream never breaks. Verification deferred: cannot run from code-server (no Python, no DB, no dev env). TODO(phase0-verify) left inline in the module docstring. First verification task on the new dev environment is to hit any FlowPilot endpoint twice within 5 minutes and confirm the second call shows cache_read_input_tokens > 0 in the `anthropic.cache` log event. If verification fails, that's a debug task on the new env — not a blocker for continuing Phase 0.2/0.3/0.4. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 16:17:12 +00:00
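The system-prompt handling described in this commit (strings pass through uncached, caller-authored cache_control is honored, otherwise only the first block is marked ephemeral, and the Gemini path concatenates entries) might be sketched as follows. The function names and the `SystemBlock` alias are hypothetical, and the blank-line separator on the Gemini path is an assumption; the block shape follows Anthropic's documented system-block format.

```python
import copy
from typing import Any, Dict, List, Union

SystemBlock = Dict[str, Any]  # e.g. {"type": "text", "text": "..."}
SystemPrompt = Union[str, List[SystemBlock]]


def prepare_anthropic_system(system_prompt: SystemPrompt) -> SystemPrompt:
    """Return the value to send as the Anthropic `system` parameter."""
    if isinstance(system_prompt, str):
        return system_prompt  # existing call shape: uncached, unchanged
    blocks = copy.deepcopy(system_prompt)  # never mutate the caller's list
    if blocks and not any("cache_control" in block for block in blocks):
        # No caller-authored marker: mark the first block only.
        blocks[0]["cache_control"] = {"type": "ephemeral"}
    return blocks


def flatten_for_gemini(system_prompt: SystemPrompt) -> str:
    """Collapse blocks into one system string, ignoring cache_control.

    Assumption: entries are joined with a blank line.
    """
    if isinstance(system_prompt, str):
        return system_prompt
    return "\n\n".join(block.get("text", "") for block in system_prompt)
```

A one-shot generator would opt in by passing `[{"type": "text", "text": SOME_STATIC_PROMPT}]` instead of the bare string, which is the single-block wrapping the later Phase 0.3 commit describes.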
Michael Chihlas	0fbc1e0a57	feat(telemetry): add MCP per-turn structured-log telemetry (Phase 0.5) Emits structured `mcp.turn` log events on every Anthropic-path chat turn, capturing whether MCP was wired in (mcp_available), whether the model actually invoked an MCP tool (mcp_invoked), which tool names fired, and whether the silent retry-without-MCP fallback was triggered. Adds a separate `mcp.fallback` event with error type/message for fallback occurrences. Establishes baseline data for deciding whether MCP investment is earning its keep before Phase 2+ expands the product footprint. Scope: the one MCP-using code path (`_call_anthropic_cached`) — not a general instrumentation layer. No new dependencies, no schema changes, no behavior change. Standard library `logging` is the sink; PostHog is not wired on the backend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:57:13 +00:00
Michael Chihlas	995a0c1d2e	fix(psa): use schedule entries for ticket co-assignees (CW canonical pattern) The previous implementation PATCHed the `resources` string directly, which CW silently ignores because `resources` is a server-derived read-only field (it's populated from schedule entries of type/id=4, not freely writable). Per CW docs (openapi line 70949): "Please use the /schedule/entries?conditions=type/id=4 AND objectId={id} endpoint". Behavior per spec: - No owner + assign user → set owner (existing behavior kept) - Has owner + assign different user → POST /schedule/entries with type/id=4, member, objectId; owner untouched - User already assigned (owner or schedule entry) → idempotent no-op - Remove owner → clear owner (existing behavior kept) - Remove co-assignee → DELETE /schedule/entries/{entry_id} - list_resources now merges owner + schedule-entry members, deduped by id Required CW security role permission on the API member: - Service > Resource Scheduling > Add/Inquire/Delete Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 00:34:18 +00:00
Michael Chihlas	f6a24ea4e1	fix(psa): resource assignment targets CW `owner`, status PATCH verifies apply Previous `resources`-string PATCH was silently ignored by CW — the `resources` field is server-derived from the ticket's owner + schedule entries, not freely writable. Status PATCH could also silently no-op when a cross-board status id was sent. - add_resource: when the ticket is unassigned, set the `owner` MemberReference (the canonical writable primary-assignee field). If already owned by someone else, append the identifier to the `resources` co-assignee string best-effort. - remove_resource: clear `owner` (with remove→replace:null fallback) if the target is the current owner, otherwise strip from `resources`. - list_resources: merge owner + resources string, deduped by member id, so the UI reflects both single-owner and multi-resource assignments. - update_ticket_status: verify CW applied the status by comparing the response body's status.id — raises PSAError with a clear message when CW silently rejects the change (e.g., status invalid for ticket's board), instead of reporting spurious success. - Frontend: surface the backend error detail in the toast so users see the real reason instead of a generic "Failed to update" message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 21:39:21 +00:00
Michael Chihlas	04ff2ea301	fix(tickets): refresh status and resources in detail panel after update Status update was returning only new_status (string) and the parent list's onStatusUpdated only set status_name. The <select> was bound to status_id, which never changed — so it visually reverted to the old status even though the PATCH succeeded. - Backend: include new_status_id in the status-update response. - Panel: own currentStatusId/currentStatusName state so the select reflects the change immediately and survives stale parent snapshots. - Parent list: update status_id on both the row and selectedTicket so the list row stays in sync when the panel stays open. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-16 21:28:48 +00:00
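
Note on commit b3be66652e: the "policy α" rule described in that message (a plain string passes through uncached; a block list keeps any caller-authored `cache_control`; otherwise only the first block is marked ephemeral) can be sketched as below. This is a minimal illustration, not the repository's code: the function names `apply_cache_policy` and `flatten_for_gemini`, the dict shape used for `SystemBlock`, and the blank-line separator on the Gemini path are all assumptions made for the example.

```python
from typing import Any, Dict, List, Union

# Hypothetical block shape: {"type": "text", "text": "...", "cache_control": {...}}
SystemBlock = Dict[str, Any]
SystemPrompt = Union[str, List[SystemBlock]]


def apply_cache_policy(system_prompt: SystemPrompt) -> SystemPrompt:
    """Sketch of policy α as described in the commit message."""
    # A plain string is the legacy call shape: pass it through uncached.
    if isinstance(system_prompt, str):
        return system_prompt

    # Copy each block so the caller's list is never mutated.
    blocks = [dict(block) for block in system_prompt]

    # Caller-authored cache_control anywhere in the list is honored verbatim.
    if any("cache_control" in block for block in blocks):
        return blocks

    # Otherwise mark the first block only.
    if blocks:
        blocks[0]["cache_control"] = {"type": "ephemeral"}
    return blocks


def flatten_for_gemini(system_prompt: SystemPrompt) -> str:
    """Sketch of the Gemini path: drop cache_control, join the block texts."""
    if isinstance(system_prompt, str):
        return system_prompt
    # The separator is an assumption; the commit only says entries are concatenated.
    return "\n\n".join(block.get("text", "") for block in system_prompt)
```

Wrapping a static prompt in a one-element list, as the Phase 0.3 commit does at its call sites, is then enough to opt in: `apply_cache_policy([{"type": "text", "text": PROMPT}])` returns the same block with an ephemeral marker attached.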

423 Commits