resolutionflow

Author	SHA1	Message	Date
Michael Chihlas	8494366ec6	feat(billing): add INTERNAL_TESTER_EMAILS allowlist for self-serve soft cutover Phase O Task 46 needs internal validation of the full self-serve flow against the prod backend before flipping SELF_SERVE_ENABLED public. This adds the per-email allowlist that bypasses the global flag for specific authenticated users. - INTERNAL_TESTER_EMAILS: comma-separated list, parsed by a Pydantic field_validator into a normalized lowercase list. Settings.is_internal_tester and Settings.is_self_serve_active_for centralize the allowlist + global-flag check; both endpoints below call the latter. - New get_current_user_optional dep: best-effort auth that returns None on missing/invalid token instead of 401. Used by /config/public so the same endpoint serves anonymous public callers and authenticated allowlist members. - /config/public now accepts optional auth and returns self_serve_enabled=True for authenticated allowlist members even when the global flag is off. Anonymous callers always see the global flag. - /auth/register replaces the SELF_SERVE_ENABLED check with the helper so a registering email on the allowlist can join without an invite code. Non-allowlist emails still 400 when self-serve is off. - docker-compose.dev.yml passes SELF_SERVE_ENABLED + INTERNAL_TESTER_EMAILS through; backend/.env.example documents both. Tests cover: allowlisted authenticated user sees true, non-allowlisted authenticated user sees the global flag, anonymous calls ignore the allowlist, allowlisted email registers without invite code, non-allowlisted email still blocked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 16:57:25 -04:00
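The allowlist check described in 8494366ec6 can be sketched roughly as follows. This is a stdlib-only illustration, not the project's code: the real change uses a Pydantic field_validator, while the `Settings` dataclass and `parse_tester_emails` helper here are hypothetical stand-ins that only borrow the method names from the commit message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def parse_tester_emails(raw: str) -> List[str]:
    """Split a comma-separated string into a normalized lowercase list."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


@dataclass
class Settings:
    # Hypothetical stand-in for the real settings class.
    self_serve_enabled: bool = False
    internal_tester_emails: List[str] = field(default_factory=list)

    def is_internal_tester(self, email: Optional[str]) -> bool:
        # Anonymous callers (email is None) are never on the allowlist.
        return email is not None and email.strip().lower() in self.internal_tester_emails

    def is_self_serve_active_for(self, email: Optional[str]) -> bool:
        # The global flag opens self-serve to everyone; otherwise only
        # allowlisted emails get through.
        return self.self_serve_enabled or self.is_internal_tester(email)


settings = Settings(
    internal_tester_emails=parse_tester_emails(" Alice@Example.com, bob@example.com ,")
)
```

With the global flag off, `settings.is_self_serve_active_for("ALICE@example.com")` is true, while an unlisted address or `None` is false.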
Michael Chihlas	ba36c47075	feat(billing): reconcile plan taxonomy and add Stripe sync script The marketing surface (PricingPage, Stripe products) was wired for "Starter / Pro / Enterprise" while the backend was on "free / pro / team", leaving plan_billing unseeded and BillingPlan accepting a literal that violated the FK to plan_limits. This change: - Migration 4ce3e594cb87: defensive UPDATE of any subscriptions on plan='team' to 'enterprise' (dev has zero), renames the plan_limits row team -> enterprise, inserts a starter row with caps interpolated between free and pro (max_trees=10, sessions=75, ai=15/mo). - Renames the plan tier across schemas (invite_code, billing, admin, subscription comment), is_paid/has_pro_entitlement checks in the Subscription model, admin/admin_dashboard plan validators, and the frontend useSubscription isPaidPlan check. Resource visibility uses the same string 'team' in a separate domain (Tree/StepLibrary visibility) and is intentionally untouched. - New backend/scripts/sync_stripe_plan_ids.py: idempotent upsert of plan_billing rows from Stripe products by exact name match. Picks the active monthly recurring price for tiers that have one; leaves annual fields NULL by design. Works against test or live keys. - Test fixture updates: conftest seeds the new taxonomy, the public plans helper is a true upsert so tests can override max_users, and team -> enterprise across test_admin_plan_limits and test_invite_plan. Verified: 86/86 passing across the subscription/billing/plan/invite/admin sweep; sync script run against test mode populates plan_billing correctly for all three tiers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 15:59:42 -04:00
Michael Chihlas	f1be3abcc5	feat: self-serve signup Phase 2 (frontend cutover) (#162) Co-authored-by: Michael Chihlas <michael@resolutionflow.com> Co-committed-by: Michael Chihlas <michael@resolutionflow.com>	2026-05-07 18:42:20 +00:00
Michael Chihlas	97d36dd400	test(kb-accelerator): downgrade kb_setup user to free plan The kb_setup fixture asserts free-plan quota numbers (lifetime_conversions_limit=3), but Phase 1 conftest seeds test_user on Pro. Downgrade explicitly inside kb_setup to preserve the original test intent without affecting other suites. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	f26f468878	feat(billing): pilot user backfill — set existing accounts to complimentary Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	79942c3fd3	feat(billing): add GET /billing/state aggregating subscription + plan + features Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	4768ae0648	feat(invites): add bulk-create and soft-revoke invite endpoints Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	e54d6c586a	feat(invites): wire EmailService.send_account_invite_email into create handler Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	86893562b9	feat(auth): auto-send verification email on register; enforce invite email match Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	b0708ed650	feat(auth): guard login/password paths against OAuth-only users Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	2ef2350de7	feat(auth): add Microsoft OAuth callback Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	f4606f073a	feat(auth): add Google OAuth callback with oauth_identities linking Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	9b709488d9	feat(billing): extend Stripe webhook stub with concrete event handlers Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	18180bc57f	feat(billing): apply_subscription_event with stripe_events idempotency Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	f683bb5720	feat(billing): add /billing/checkout-session via BillingService Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	9851d56633	feat(billing): add BillingService.start_trial; wire into /auth/register Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	519c7eb5ce	feat(deps): add require_verified_email_after_grace guard Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	9ec208f6e7	feat(deps): add require_active_subscription guard with allowlist Mounts on Pro routers (trees, sessions, scripts, FlowPilot, etc.) and returns 402 with structured detail when an account's subscription is missing or locked. Allowlist bypasses billing/account/auth flows so users can recover from a lapsed subscription. Conftest now seeds a default Pro/active Subscription on test_user and test_admin (delete-then-insert because the register endpoint already creates a free/active sub by default). Two existing tests adapted to the new seeded plan; tenant-isolation tests seed Subscription rows for the accounts they create directly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	cfe0e6cae6	refactor(deps): remove trial auto-downgrade; expiry now non-mutating per spec Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	e3f5ed4985	feat(billing): add complimentary status, fix is_paid, add has_pro_entitlement Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	a28b635b19	feat(invites): add revoked_at + email_sent_at to account_invites Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	453ba3fefc	feat(auth): make users.password_hash nullable for OAuth-only accounts Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	143c979975	feat(auth): add oauth_identities table for Google/Microsoft sign-in Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-06 19:14:30 -04:00
Michael Chihlas	00663a4734	feat(suggested-fix): add applied_pending status for deferred verification Engineer applies a fix but can't verify yet (waiting on client power-cycle, AD replication, async sync). Today the verifying banner forces a synchronous verdict (worked / didn't / partial); anything else means leaving the banner stale or guessing wrong. This adds a fourth outcome that parks the fix in a non-terminal "Awaiting verification" state with a reason ("waiting on what?") and exposes it on the chat-anchored banner so the engineer doesn't lose track. Backend - New non-terminal status `applied_pending` parallel to `applied_partial`. - New `pending_reason` column (nullable Text): the "what are you waiting on?" prose, mirrors `partial_notes`. Required when outcome=applied_pending. - Outcome endpoint allows pending in/out transitions; pending stamps applied_at but NOT verified_at (it's parked, not verified). - Resolution-note + escalation-package prompts handle the new status: resolution note frames the fix as provisional; escalation package surfaces pending verification as the leading hypothesis with reference to what's being waited on. - Migration: add column + extend status CHECK constraint. Frontend - New `BannerMode = 'pending'` + `PendingBanner` component (info-tone, parallel to PartialBanner) with worked / didn't / update-reason actions. - VerifyingBanner overflow menu adds "Waiting to verify…". - Nudge banner's "Still checking" button now actually records pending with a reason, instead of just silencing for the session. - AssistantChatPage banner-mode derivation maps applied_pending → 'pending'.
Tests: 4 new integration tests covering pending notes requirement, reason storage + applied_at/verified_at semantics, pending→success transition, and pending_reason update on re-PATCH. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 17:32:37 -04:00
Michael Chihlas	f10649abc2	fix(escalations): atomic claim + self-claim rejection + queue exclusion Codex review pass on the escalation wedge. Reworks claim_session from read-then-write to a conditional UPDATE so two seniors racing can't both win, blocks the original engineer from claiming their own handoff, and filters self-escalated sessions out of the dashboard escalation queue. Also preassigns the handoff UUID before flush so the compatibility escalation_package payload carries it. Removes legacy frontend pickup state (claiming, handleStartHere) that broke tsc --noEmit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 16:21:20 -04:00
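The move from read-then-write to a conditional UPDATE in f10649abc2 can be illustrated with a small sqlite3 sketch. The table layout, column names, and `claim` helper below are assumptions made for the example and are not the project's schema; the point is only that the WHERE clause decides the winner inside a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified handoff table for illustration only.
conn.execute(
    "CREATE TABLE handoffs ("
    "id INTEGER PRIMARY KEY, escalated_by TEXT NOT NULL, claimed_by TEXT)"
)
conn.execute("INSERT INTO handoffs (id, escalated_by) VALUES (1, 'junior')")
conn.commit()


def claim(conn: sqlite3.Connection, handoff_id: int, user: str) -> bool:
    """Claim a handoff atomically; True only for the single winning caller."""
    cur = conn.execute(
        "UPDATE handoffs SET claimed_by = ? "
        "WHERE id = ? AND claimed_by IS NULL AND escalated_by <> ?",
        (user, handoff_id, user),
    )
    conn.commit()
    # rowcount is 1 if this statement changed the row, 0 if someone else
    # already claimed it or the caller is the original escalator.
    return cur.rowcount == 1


self_claim = claim(conn, 1, "junior")      # original engineer is rejected
first_claim = claim(conn, 1, "senior_a")   # first senior wins
second_claim = claim(conn, 1, "senior_b")  # second senior loses the race
```

This sketch does not cover the same-user idempotent re-claim mentioned in 0f00ee5e01; that would need an extra branch after a zero rowcount.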
Michael Chihlas	db717b0b3f	feat(escalations): magic-moment 3-option CTA + claim 500 fix - HandoffContextScreen: 3-option layout (Continue/AI analysis/Own thing) with hasTaskLane, activeOptionKey, spinner/disabled states - AssistantChatPage: wire up handleContinue, handleAIAnalysis, handleOwnThing handlers; chip detail expansion inline with copy-button fix; post-escalation redirect to dashboard on ConcludeSessionModal close - TaskLane: fix async copy button (await + execCommand fallback + copiedKey visual feedback); whitespace-pre-wrap on command blocks - Fix 500 on claim: Pydantic v2 model_validate() + model_copy(update={}) (was passing update= kwarg directly which v2 rejects) - HandoffResponse schema: handed_off_by_name field Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-30 00:05:02 -04:00
Michael Chihlas	0f00ee5e01	feat(escalations): close out plan-locked wedge polish Four items from the design-plan audit, all flagged as locked-design or Codex corrections, shipped together so the GTM demo path covers them end-to-end before bug bash. 1. Live AI assessment refresh on the magic-moment screen. Backend already publishes handoff_assessment_ready when enrich_escalation_async commits; wire the frontend listener so the senior sees the assessment populate without a manual reopen. New event type + onAssessmentReady handler on streamEscalations; AssistantChatPage opens a scoped SSE subscription whenever it tracks a handoff missing its assessment, refetches on match, and replaces magicHandoff / overlayHandoff in place. Closes the loop on the async-assessment commit `e8ba74e`. 2. Suggested-step chips below the chat input. Locked design from the plan (Codex correction). Chip strip renders above the composer post-claim when ai_assessment_data.suggested_steps[] is non-empty. Click prefills the input and focuses; first send or explicit X hides for the session. 3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen set (rf-escalation-seen, capped 200). Dot top-right when not seen. Cleared on open (card click) or claim (Pick Up), NOT on hover, per Codex correction. Pick Up stops propagation so it doesn't double-fire. 4. Race-condition toast on claim conflict. The /claim endpoint previously silently overwrote claimed_by, so both seniors thought they owned the session. New HandoffAlreadyClaimedError carries the winner's id/name/timestamp; claim_session rejects different-user re-claims (same-user is idempotent for double-click safety); endpoint returns 409 with structured detail. AssistantChatPage.handleStartHere extracts and surfaces "Already claimed by {name} {time_ago}." via toast, drops ?pickup=true, dismisses magic-moment so the loser flows back to queue. Tests: 2 new unit tests in test_handoff_manager.py (conflict raises, same-user idempotent).
Full handoff + escalation suite (34 tests) green. Frontend tsc -b clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-28 01:59:28 -04:00
Michael Chihlas	9bdd9959a8	fix(handoff): bound escalation assessment latency Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 20:03:14 -04:00
Michael Chihlas	bc15952857	fix(tests): stabilize escalation SSE backend tests Co-Authored-By: Codex <noreply@openai.com>	2026-04-27 19:47:43 -04:00
Michael Chihlas	87bd0b7c56	WIP: SSE pub/sub for live escalation arrivals (paused for Codex review) First half of the WebSocket/SSE push slice. Paused mid-flight to hand the branch to Codex for outside-voice review before stacking more commits on top. See .ai/HANDOFF.md for the full pause context + what to look at. What's here: - backend/app/core/escalation_bus.py — module-level singleton in-memory pub/sub keyed by account_id. asyncio.Queue per subscriber with 64-event maxsize and drop-on-full semantics. Designed to be swappable for Redis pub/sub when Railway scales past single-replica. - backend/app/api/endpoints/session_handoffs.py — GET /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe bound to current_user.account_id. - backend/app/services/handoff_manager.py — dispatch_escalation_notifications now publishes a `handoff_created` event to the bus BEFORE the email fan-out, in a try/except so a bus failure can't block email delivery. - backend/tests/test_escalation_bus.py — 7 unit tests, all green standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers. - backend/tests/test_handoff_manager.py — +1 dispatcher integration test (publishes to bus, payload shape). - backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer blocked, ready event handshake). 
[gstack-context] Decisions: - SSE over WebSocket (one-way, browser EventSource semantics, fewer moving parts behind Railway proxy) - In-memory bus over Redis for v1 pilot (3 MSPs, single replica) - Drop-on-full subscriber queue rather than back-pressure publishers - Bus publish ahead of email send, both wrapped in try/except so neither can break handoff creation - Frontend will be a fetch-based ReadableStream reader matching the existing streamDocumentation pattern, not native EventSource (custom-header auth) Remaining (post-Codex): - Frontend SSE subscription in EscalationQueue.tsx (slide-in, reconnect, tab-title flash, prefers-reduced-motion) - Magic-moment handoff-context screen - Re-run the full backend test suite to verify the SSE + dispatcher integration tests (bus units already green standalone) Tried: - Running the full test suite repeatedly without xdist; the per-test DROP SCHEMA + recreate fixture made wall-clock prohibitive when multiple stale runs collided on the same Postgres test schema. Resolution: -n auto next time. [/gstack-context] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 19:29:07 -04:00
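The in-memory bus described in 87bd0b7c56 (per-account subscribers, bounded asyncio.Queue, drop-on-full) might look roughly like the sketch below. The class and method names are hypothetical and the real escalation_bus.py may differ; only the 64-event default and the drop-on-full behavior come from the commit message.

```python
import asyncio
from collections import defaultdict


class EscalationBus:
    """Minimal in-memory pub/sub keyed by account_id (illustrative only)."""

    def __init__(self, maxsize: int = 64) -> None:
        self._maxsize = maxsize
        self._subscribers = defaultdict(set)

    def subscribe(self, account_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._maxsize)
        self._subscribers[account_id].add(queue)
        return queue

    def unsubscribe(self, account_id: str, queue: asyncio.Queue) -> None:
        self._subscribers[account_id].discard(queue)

    def publish(self, account_id: str, event: dict) -> int:
        """Deliver to every subscriber of one account; return the delivered count."""
        delivered = 0
        for queue in list(self._subscribers.get(account_id, ())):
            try:
                queue.put_nowait(event)
                delivered += 1
            except asyncio.QueueFull:
                # Drop-on-full: a slow subscriber loses the event instead of
                # applying back-pressure to the publisher.
                pass
        return delivered


async def demo():
    bus = EscalationBus(maxsize=2)  # tiny queue so the drop is visible
    mine = bus.subscribe("acct-1")
    other = bus.subscribe("acct-2")
    counts = [bus.publish("acct-1", {"type": "handoff_created", "n": n}) for n in range(3)]
    return counts, mine.qsize(), other.qsize()


counts, mine_size, other_size = asyncio.run(demo())
```

In the demo the third publish is dropped because the acct-1 queue is full, and the acct-2 subscriber never sees acct-1 events.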
Michael Chihlas	07d0db9579	feat(handoff): email engineer-or-admin teammates on escalation First half of the Escalation Mode notification dual-path. WebSocket/SSE push is the second half (next commit) — email handles offline seniors, push handles online ones for the magic-moment demo. HandoffManager.dispatch_escalation_notifications: - Pulls active engineer/admin/owner-role users in the same account_id (excludes the escalator + viewers + soft-deleted) - Sends via existing EmailService.send_notification_email, concurrent via asyncio.gather; per-message failures don't block the rest - Wrapped in try/except: any exception is logged + swallowed. Handoff creation is authoritative; notification is advisory. This is the graceful-degradation regression both eng + codex reviews flagged as critical (handoff must succeed even if SMTP is down). Endpoint wiring (POST /ai-sessions/{id}/handoff): - Dispatch fires AFTER db.commit() — never email about a rolled-back handoff. Trust-erosion bug if we got that wrong. - Only fires for intent=escalate. Park is private to the escalator. Tests (4 new): - emails-engineer-recipients-in-account: viewer excluded, escalator excluded, only the engineer/admin teammates get the message - skipped-for-park-intent: park doesn't fan out - graceful-degradation-when-email-raises: RuntimeError from the email service does NOT bubble out of dispatch - endpoint-dispatches-on-escalate: end-to-end wiring through POST Per-channel delivery records (replacing the dead `notification_sent` boolean per Codex correction) is a v1.x story — for now application logs are the audit trail. See docs/plans/2026-04-27-escalation-mode-wedge-design.md. 20 tests green across handoff_manager + session_handoffs_api + flowpilot_analytics_escalations. No regressions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:58:05 -04:00
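The "notification is advisory" behavior in 07d0db9579 can be sketched as below. The commit confirms asyncio.gather and an outer try/except; using `return_exceptions=True` to isolate per-message failures is an assumption of this sketch, and `dispatch_notifications` and `flaky_send` are hypothetical names.

```python
import asyncio
import logging

logger = logging.getLogger("handoff.notifications")


async def dispatch_notifications(recipients, send) -> int:
    """Send to all recipients concurrently; never raise to the caller."""
    try:
        results = await asyncio.gather(
            *(send(address) for address in recipients),
            return_exceptions=True,  # assumption: one failure must not cancel the rest
        )
    except Exception:
        # Handoff creation is authoritative, so log and swallow.
        logger.exception("escalation notification dispatch failed")
        return 0
    failures = [r for r in results if isinstance(r, Exception)]
    for failure in failures:
        logger.warning("notification failed: %r", failure)
    return len(results) - len(failures)


async def flaky_send(address: str) -> None:
    # Hypothetical sender that fails for one recipient.
    if address == "down@example.com":
        raise RuntimeError("SMTP is down")


sent = asyncio.run(
    dispatch_notifications(
        ["a@example.com", "down@example.com", "b@example.com"], flaky_send
    )
)
```

Here two of the three messages go out, and the RuntimeError is logged rather than propagated.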
Michael Chihlas	7a5b853b3b	feat(api): role-gate handoff claim to engineer-or-admin POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an authenticated user, so a viewer-role account user could claim escalations. Codex review flagged this as wedge-relevant: the Escalation Mode race-condition story (two seniors clicking Pick Up simultaneously) depends on auth gating for audit integrity. Originally captured as a deferred TODO during /plan-eng-review, then moved in-scope by /codex review. Swap the dep to require_engineer_or_admin. One-line change. Two new tests: - viewer_role gets 403 with "Engineer or admin access required" - engineer/owner role still succeeds and claimed_at + claimed_by populate Existing handoff create + queue tests unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:46:59 -04:00
Michael Chihlas	52f6d0308f	feat(analytics): add escalation time-to-first-action metric endpoint GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d} Computes the in-product wedge metric for Escalation Mode: average / median / p95 seconds between SessionHandoff.claimed_at and the first ai_session_step created on the same session after that timestamp. Account-scoped, role-gated to engineer-or-admin. The metric is intentionally NOT called "minutes recovered" — that's the two-metric framing locked by /codex review: this in-product number must be paired with manual baseline (the verbal-handoff stopwatch from The Assignment) to produce the savings claim. Schema's `metric_definition` field surfaces the disclaimer in every response so callers don't oversell it. Implementation notes: - Uses correlated scalar subquery for first-step-after-claim per handoff, aggregates avg/median/p95 in Python (~1k rows/account/month is well within budget; cleaner than percentile_cont gymnastics in SQL) - Excludes unclaimed handoffs (claimed_at IS NULL) - Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in n_handoffs_with_action — surfaces the conversion-rate signal - Floors negative deltas at 0 to handle clock-drift edge cases Tests cover happy path, zero-data, claimed-but-no-action accounting, period window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4 RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests, all green. No regressions in existing handoff_manager / session_handoffs suites. First piece of the Approach A wedge build per docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue stat-card and the analytics page. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 15:25:46 -04:00
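The Python-side aggregation from 52f6d0308f (avg / median / p95, negative deltas floored at 0) could be sketched like this. The commit does not say which percentile method is used, so the nearest-rank p95 here is an assumption, as are the function and key names.

```python
import math
import statistics


def summarize_time_to_first_action(deltas_seconds):
    """Aggregate claim-to-first-step deltas; negatives are floored at 0."""
    values = sorted(max(0.0, d) for d in deltas_seconds)
    if not values:
        return {"n": 0, "avg": None, "median": None, "p95": None}
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * len(values)))
    return {
        "n": len(values),
        "avg": statistics.fmean(values),
        "median": statistics.median(values),
        "p95": values[rank - 1],
    }


summary = summarize_time_to_first_action([-5, 10, 20, 30])
empty = summarize_time_to_first_action([])
```

The -5 second delta (a clock-drift case) is counted as 0 rather than discarded, so it still contributes to the sample size.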
Michael Chihlas	7f714363dd	perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m Backend suite is the slow gate (1076 passed locally in 22m27s on fix/ci-workflow-config). Adding pytest-xdist with per-worker DB isolation drops it to ~4m20s on the 8-core homelab runner. Verified locally: `pytest -n auto --no-cov` finished in 4m28s real time (15m19s user — confirms ~5× parallelism). How it works: - conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist — 'gw0', 'gw1', …). When set, derives a per-worker DB URL like `…/resolutionflow_test_gw0`. The base DB stays for serial / master runs. - `_ensure_worker_db_exists` runs synchronously at conftest import, connects to the postgres maintenance DB, and `CREATE DATABASE`s the worker-suffixed DB if it doesn't exist. Idempotent across runs. - The "test" safety guard still applies — every worker DB name contains "test" so the assertion holds. - The per-test `DROP SCHEMA public CASCADE` now operates on the worker's isolated DB, no cross-worker race. CI workflow: backend job switches to `pytest -n auto`. Coverage still collected (pytest-cov has built-in xdist support). Adds `pytest-xdist==3.6.1` to requirements-dev.txt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:53:47 -04:00
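The per-worker URL derivation from 7f714363dd can be sketched as follows. `worker_db_url` is a hypothetical helper and the base URL is a made-up example; the real conftest.py may build the string differently. Only the `PYTEST_XDIST_WORKER` variable and the `_gw0` suffix pattern come from the commit message.

```python
import os
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def worker_db_url(base_url: str, worker: Optional[str]) -> str:
    """Append the xdist worker id to the database name, if there is one."""
    if not worker:
        # Serial or master run: keep the base test database.
        return base_url
    parts = urlsplit(base_url)
    return urlunsplit(parts._replace(path=f"{parts.path}_{worker}"))


BASE_URL = "postgresql+asyncpg://user:pw@localhost:5432/resolutionflow_test"  # example value

# xdist sets PYTEST_XDIST_WORKER to 'gw0', 'gw1', ... inside each worker process.
current_url = worker_db_url(BASE_URL, os.environ.get("PYTEST_XDIST_WORKER"))
gw0_url = worker_db_url(BASE_URL, "gw0")
serial_url = worker_db_url(BASE_URL, None)
```

Because the suffix is appended to a name that already contains "test", the derived name still satisfies the safety guard mentioned in the commit.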
Michael Chihlas	e976fb4e87	fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing Three changes that get PR #150 to a green CI gate: 1. test_record_decision_persists_and_bumps_state_version: the `decision: draft_template` path calls `_extract_template_parameters` (TemplateExtractionService → AI provider). CI doesn't set ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised `RuntimeError: No AI provider configured` and returned 500. The test isn't exercising the AI integration, so the extractor is patched with an AsyncMock returning a minimal valid `{templated_body, parameters}` dict. Verified locally: the test now passes. 2. pip + npm caches in backend, frontend, and e2e jobs. Keyed on the hash of requirements.txt / package-lock.json with a runner-os restore-key fallback. Saves ~30-60s per run on cache hit. 3. Pytest invocation tightened: - Dropped `--cov-report=term-missing`: the custom "Display coverage summary" step below parses coverage.json and prints the same module list more concisely. Term-missing dumps every uncovered line which adds ~5-10s of stdout. - Added `--maxfail=10` so a structural breakage (fixture explosion, DB unreachable) bails after 10 errors instead of running the full 25-min suite. Tunable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 12:01:05 -04:00
Michael Chihlas	49f88569da	wip(handoff): restore backend suite to green Co-Authored-By: Codex <noreply@openai.com>	2026-04-25 06:13:23 -04:00
Michael Chihlas	d6218f2e07	fix(tests): import all models in conftest so create_all sees the full schema The test_db fixture calls Base.metadata.create_all on a fresh test DB. That only creates tables for models that have been imported (and thus registered with Base.metadata) by the time the fixture runs. app.main imports app.core.database (which gives us Base) but does NOT eagerly import the model modules; most are pulled in lazily inside scheduler functions (archive_stale_ai_sessions etc.) and route modules. At fixture-setup time, only the handful of models touched by those eager imports are on the metadata, so any test that exercises PSA, network diagrams, ratings, escalations, etc. fails with `UndefinedTableError: relation "X" does not exist` and a cascade of 500s on every endpoint that queries the missing table. Adding `from app import models as _models` (rather than the bare `import app.models` which would shadow the `app` FastAPI instance imported just above) pulls in app/models/__init__.py, which itself imports every model module, registering all ~60 tables with Base.metadata before create_all runs. Verified locally: tests/test_psa_writeback_phase4.py went from 1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were masking the actual passes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 02:49:06 -04:00
Michael Chihlas	1c904373f8	Merge main into feat/flowpilot-migration Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top of a unified main. Two manual conflict resolutions: 1. CLAUDE.md: kept the FlowPilot ai-handoff rewrite (`.ai/`-driven protocol). The pre-rewrite reference content (CW integration notes, lessons archive, env vars table) lives in `docs/connectwise/`, `docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design. 2. frontend/src/pages/AssistantChatPage.tsx: both conflict regions were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks (facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint). Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay, NewTicketModal) kept. All setters wired by either branch are intact. Verification: - `tsc -b` clean across the merged tree. - Browser smoke-test (Session B fixture): Phase 9 ProposalBanner ("Run AI-drafted PowerShell to recover SSL VPN") renders alongside PSA's new Tickets sidebar icon. Console clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 01:03:33 -04:00
Michael Chihlas	b14a16a1ab	chore(tests): gate RLS tests behind RUN_RLS_TESTS flag Continues the test-isolation work from `dab740d`. RLS migration tests run against a policy-installed database and fail in the default create_all suite, so they need to be opt-in: - pytest.ini: register `rls` marker. - conftest.py: auto-deselect test_rls_isolation.py unless RUN_RLS_TESTS=1. Drops the deprecated session-scoped event_loop fixture (not needed since pytest-asyncio 0.23+). - test_rls_isolation.py: tag module with `rls` marker. Replace hardcoded `patherly_test` DB reference with parsed DATABASE_TEST_URL (matches conftest.py default `resolutionflow_test`). Updated docstring command to show RUN_RLS_TESTS=1. - requirements-dev.txt: bump pytest-asyncio 0.23.0 → 0.24.0 (loop-scope marker behavior required by the RLS module fixture). Run the RLS suite with: RUN_RLS_TESTS=1 DB_APP_ROLE_PASSWORD=... pytest tests/test_rls_isolation.py Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 16:09:13 -04:00
Michael Chihlas	dab740ddf7	fix(tests): isolate test DB from dev DB and plug admin-db override gap Root cause of the 06:32 AM outage: running 'pytest tests/' inside the resolutionflow_backend container silently dropped the public schema on the DEV database. Two layered bugs made this possible; both are fixed. Bug 1: env-var lookup in conftest.TEST_DATABASE_URL put DATABASE_URL (which normally points at the dev/prod DB) ahead of DATABASE_TEST_URL. When DATABASE_URL is set, pytest used the dev DB as the 'test' DB and the test_db fixture's DROP SCHEMA public CASCADE wiped it. Fixed: - Honor only DATABASE_TEST_URL (or the localhost fallback). - Assert at module load that the DB name contains 'test'; refuses to run otherwise. Makes future misconfiguration impossible. Bug 2: conftest overrode app.dependency_overrides[get_db] but not get_admin_db. Endpoints using get_admin_db (register, admin routes) bypassed the test session and hit the real admin DB. Before Bug 1 was fixed this was hidden because both engines pointed at the same dev DB. With isolation in place, register started failing 'Email already registered' because of stale users in the dev DB. Fixed: - Also override get_admin_db to yield the same test session. RLS is not enabled in the create_all-managed test schema, so sharing is safe. Also adds DATABASE_TEST_URL=resolutionflow_test to docker-compose.dev.yml so pytest in the container works out of the box. Verified: 49/50 Phase 8 + 9 tests pass against resolutionflow_test; the 1 failure is the pre-existing Phase 8 Issue #4 (test_record_decision_persists_and_bumps_state_version). Refs gitea #145 (will update that issue with this as the primary fix). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 13:14:08 -04:00
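The Bug 1 fix in dab740ddf7 (honor only DATABASE_TEST_URL, refuse any database whose name lacks "test") can be sketched as below. The helper name and the fallback URL are hypothetical, and the sketch raises RuntimeError where the commit describes a module-load assertion.

```python
from urllib.parse import urlsplit

# Hypothetical localhost fallback; the real default lives in conftest.py.
DEFAULT_TEST_URL = "postgresql://localhost:5432/resolutionflow_test"


def resolve_test_database_url(env) -> str:
    """Pick the test DB URL, deliberately ignoring DATABASE_URL."""
    url = env.get("DATABASE_TEST_URL") or DEFAULT_TEST_URL
    db_name = urlsplit(url).path.lstrip("/")
    if "test" not in db_name:
        # Refuse to run: the per-test DROP SCHEMA would hit a non-test DB.
        raise RuntimeError(f"refusing to run tests against database {db_name!r}")
    return url


# DATABASE_URL points at dev, but it is never consulted.
safe_url = resolve_test_database_url(
    {"DATABASE_URL": "postgresql://localhost:5432/resolutionflow"}
)

try:
    resolve_test_database_url(
        {"DATABASE_TEST_URL": "postgresql://localhost:5432/resolutionflow"}
    )
    guard_tripped = False
except RuntimeError:
    guard_tripped = True
```

Passing the environment as a mapping keeps the check easy to exercise without touching `os.environ`; a real conftest would likely pass `os.environ` directly.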
Michael Chihlas	1c855563ee	feat(pilot): PATCH /suggested-fixes/:id/script endpoint Called by the inline Script Builder tab on Submit. Writes ai_drafted_script + ai_drafted_parameters to the fix without stamping applied_at (a draft is not an application — that's §5 of the Phase 9 spec). Bumps state_version so Resolve/Escalate preview bundles regenerate. 409 on terminal fix status. 404 on wrong session. 422 on empty script. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:34:06 -04:00
Michael Chihlas	d4fae87236	feat(pilot): inline Script Builder session — idempotent create + auth + filtered list POST /script-builder/sessions now supports origin='pilot_inline': - Requires ai_session_id; validates it against current user ownership. - Get-or-create: returns existing row for (user, ai_session_id) pair. - Partial unique index on the DB backs the invariant; races resolve to the single winner row. list_sessions + count_user_sessions default-scope to origin='standalone' so inline scratch sessions don't pollute the /script-builder dashboard or count against the 5-session cap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 02:24:57 -04:00
Michael Chihlas	70c5da0c75	fix(pilot): persist AI-proposal rejection + clear on outcome write Issue #3 from phase-8-review-issues.md. 'Not yet' on the AI-confirming banner was a local-state hide; the proposal re-surfaced on the next refreshSessionDerived call. Two-part fix: - PATCH /outcome now clears ai_outcome_proposal on any terminal action (engineer has taken a decision; stale AI proposal is moot). - New DELETE /ai-sessions/:sid/suggested-fixes/:fid/ai-outcome-proposal endpoint for explicit 'Not yet' rejection. Does not touch status or state_version — pure UI state. Frontend handleRejectAIProposal now calls the DELETE and setActiveFix with the server response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:15:48 -04:00
Michael Chihlas	de2bef3175	fix(pilot): persist Apply — stamp applied_at on click Issue #2 from phase-8-review-issues.md. Apply was client-side-only via a bannerApplied flag. Refresh / chat reselect / multi-tab would drop Verifying state back to Proposed. - New POST /ai-sessions/{sid}/suggested-fixes/{fid}/apply stamps applied_at without changing status (still 'proposed'). Idempotent if already stamped; 409 if fix is past proposed (a terminal outcome was already recorded). - Bumps state_version so resolve/escalate preview bundles reflect that the fix has entered verifying. - Frontend handleApplyFix calls the endpoint and uses the returned applied_at directly. bannerApplied client flag is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:10:52 -04:00
Michael Chihlas	362c7b1d79	fix(pilot): outcome-aware Resolve/Escalate previews Issue #1 from phase-8-review-issues.md. Cache invalidation alone isn't enough — previews were also omitting outcome fields from the LLM bundle, so a fresh regenerate still couldn't distinguish proposed / failed / partial / success. - PATCH /outcome now bumps ai_sessions.state_version (matches record_decision's existing pattern). - Resolution-note + escalation-package bundles now include status, applied_at, verified_at, partial_notes, failure_reason on the active fix. - Generator prompts prescribe outcome-aware phrasing (closure language for success; what-we've-tried + next-steps for failed/partial). - New end-to-end test asserts the regenerated preview reflects the recorded outcome, not just that the cache key changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 22:04:56 -04:00
Michael Chihlas	2cde6673b0	feat(pilot): [FIX_OUTCOME] system prompt instructions Tells the AI when + how to emit the [FIX_OUTCOME] marker that Task 4's parser consumes. Placeholder-only per the anti-parrot pattern — no literal UUIDs, outcomes, or reasons that could leak into unrelated sessions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:17:21 -04:00
Michael Chihlas	c0112f8bee	feat(pilot): [FIX_OUTCOME] marker parser + AI outcome proposal The AI emits [FIX_OUTCOME] when the engineer indicates in chat that a prior suggested fix worked, didn't work, or was partially applied. The marker writes to session_suggested_fixes.ai_outcome_proposal (JSONB), which the frontend surfaces as a "confirm outcome?" banner. The status column is only updated when the engineer clicks confirm (via PATCH /outcome endpoint from Task 3). Placeholder-only system prompt wiring comes in Task 5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:08:43 -04:00
Michael Chihlas	8988dbc885	feat(pilot): PATCH /suggested-fixes/:id/outcome endpoint + tests Records engineer-reported outcome (applied_success|applied_failed|applied_partial|dismissed). Enforces transition rules (partial → success/failed allowed; terminal outcomes return 409) and notes requirements (applied_partial requires notes). Sets verified_at on success/failure, stamps applied_at if not already set (handles the case where the AI [FIX_OUTCOME] marker fires before the engineer clicks Apply). Also fixes pre-existing test-infrastructure bug: network_diagram.py used bare string server_default="'[]'" for JSONB columns, which asyncpg rejects during test schema creation. Changed to text("'[]'::jsonb") to match the pattern used by script_template.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:59:34 -04:00
Michael Chihlas	4aaf57adb5	feat(pilot): Phase 6 — post-resolve templatize prompt + draft accept/reject Closes the loop on the Phase 5 "Run now, templatize after resolve" path. After a session resolves, drafts queued by the three-option dialog surface as a modal that lets the engineer review the AI-proposed parameterization and either save as a reusable team template or skip. A "don't ask again" toggle writes to account_settings.preferences so the next resolve won't pop the modal. Backend: - /api/v1/draft-templates: * GET — list account drafts (pending_only default true; pass false for audit view including accepted/rejected) * GET /{id} — single draft * POST /{id}/accept — promotes to a new script_templates row with source_session_id / source_user_id / source_ticket_ref populated (drives the Script Library "generated from CW #X · resolved by Y" provenance chip). Draft flips to status=accepted, promoted_template_id set, resolved_at stamped. 409 on re-accept / already-rejected. 400 on unknown category_id. * POST /{id}/reject — flips to status=rejected. 409 on re-reject. - /api/v1/accounts/me/preferences (GET/PATCH) — thin wrapper over AccountSettings.get_setting/set_setting. PATCH merges keys into the JSONB column, preserving existing keys the client didn't touch. Used by the "Don't ask again for this team" checkbox (templatize_prompt_enabled=false) and, forward-looking, by cw_resolved_status_id / cw_escalated_status_id from Phase 4. - 13 tests: list filter, accept with/without edited_body, provenance copy-through, reject, 409 on re-accept / re-reject, 400 on unknown category, prefs round-trip with merge semantics. Frontend: - src/components/pilot/script/TemplatizePrompt.tsx — modal showing the drafted script with proposed parameters in the Phase 5 ParameterizationPreview, editable name/category/description, an individual-parameter remove button, and the "don't ask again" opt-out.
Accept posts to /draft-templates/{id}/accept + optionally PATCHes preferences. Skip posts /reject. - src/api/draftTemplates.ts — typed client plus accountPreferencesApi. - AssistantChatPage: after a successful Resolve (external OR local), fetches preferences + pending drafts for the session and queues the modal one draft at a time. Escalate does not trigger this flow. - Sidebar: Scripts nav shows the pending-draft count as a badge. Fetched independently of the main sidebar stats so endpoint flakes don't break the rest of the sidebar. Verified live 2026-04-22: seed two drafts → GET sees both pending → accept draft A (template created, provenance CW #99123 populated) → reject draft B → pending count drops → PATCH opt-out → GET confirms persistence. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:37:49 -04:00
Michael Chihlas	d0ebdef9e8	fix(ai): full-sweep audit — placeholders only in system prompts + CI guardrail The "AI parrots example content from system prompt" bug bit us twice in one day across two different prompt sites. Patching individual prompts is treating the symptom; this commit makes the rule structural. Audit + sanitize: - assistant_chat_service.ASSISTANT_SYSTEM_PROMPT — already cleaned in prior commits, but the [FORK] schema still had literal "Brief reason" / "Short name" / "One sentence" placeholders. Replaced with <angle-bracket> placeholders. Anti-parrot rule itself rewritten to describe the failure mode abstractly instead of naming "jsmith" so the rule no longer trips the guardrail (and so the model doesn't see "jsmith" as a token at all). - ai_chat_service.py — removed three concrete-example offenders: "Get-Service ADSync" command literal, the "DC01 server_name" intake form payload (in two places), and the inline interview demos using "Azure AD Sync failures" / "Exchange Online mailbox migration". Replaced with technology-neutral schema descriptions. - ai_tree_generator_service.BRANCH_DETAIL_SYSTEM_PROMPT — replaced the fully-fleshed DNS troubleshooting tree (with literal Dnscache / ipconfig / google.com / Start-Service) with a placeholder schema showing only ID-linkage shape. - kb_conversion_service.PROCEDURAL_SYSTEM_PROMPT — replaced the worked Server Manager + DC01 example payload with a placeholder schema. Guardrail (tests/test_prompt_anti_parrot.py): - Imports every module under app/services/ and app/core/ and walks every uppercase string constant ending in _PROMPT, _SCHEMA, _PROTOCOL, _FORMAT, or _CONTEXT. - test 1: known-leaked-token list (jsmith, DC01, ADSync, Dnscache, google.com, "Outlook keeps", "Teams drops") must not appear in any prompt constant. Add to the list when a new leak shows up in prod — the list IS the audit trail.
- test 2: marker blocks ([QUESTIONS], [ACTIONS], [SUGGEST_FIX], etc.) must contain placeholders only. Distinguishes JSON keys (followed by ':', allowed) from JSON values (followed by ',' / ']' / '}', must be <placeholder>); allows pipe-separated enum types (text|password|select) and a small set of fixed enum values (question, diagnostic_check, decision, action, ...). Verified by feeding the test a known-bad block — caught it correctly. Documented the rule in CLAUDE.md → AI / FlowPilot lessons, naming the test as the enforcement point so future contributors know how to extend it (add to the known-leaked list when a new leak surfaces). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 02:09:30 -04:00
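
Note on dab740ddf7: the test-DB guard described in that commit (honor only DATABASE_TEST_URL, refuse to run unless the database name contains 'test') could look roughly like the sketch below. This is an illustration under assumptions, not the repository's conftest: resolve_test_database_url and DEFAULT_TEST_URL are hypothetical names, and the fallback URL is invented because the log does not show the real one.

```python
import os
from urllib.parse import urlsplit

# Hypothetical localhost fallback; the real value is not shown in the commit log.
DEFAULT_TEST_URL = (
    "postgresql+asyncpg://postgres:postgres@localhost:5432/resolutionflow_test"
)


def resolve_test_database_url(env=None):
    """Return the URL pytest should use, ignoring DATABASE_URL entirely.

    Raises RuntimeError when the database name does not contain 'test',
    so a misconfigured environment fails before any fixture can drop a schema.
    """
    env = os.environ if env is None else env
    # Only DATABASE_TEST_URL is consulted; DATABASE_URL is deliberately not read.
    url = env.get("DATABASE_TEST_URL") or DEFAULT_TEST_URL
    # The database name is the URL path without its leading slash.
    db_name = urlsplit(url).path.lstrip("/")
    if "test" not in db_name.lower():
        raise RuntimeError(
            f"Refusing to run tests against database {db_name!r}: "
            "name must contain 'test'"
        )
    return url
```

In a conftest this would be called once at module load, for example TEST_DATABASE_URL = resolve_test_database_url(), so the check runs before any fixture connects.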

177 Commits