199 Commits

Author SHA1 Message Date
f85b90c95e fix(frontend): satisfy phase 2 lint checks
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 7m18s
CI / backend (pull_request) Successful in 10m23s
CI / e2e (pull_request) Successful in 9m31s
Co-Authored-By: Codex <noreply@openai.com>
2026-05-07 12:02:49 -04:00
5e6541ab92 fix(ci): set up node in gitea workflow
Some checks failed
Mirror to GitHub / mirror (push) Successful in 7s
CI / frontend (pull_request) Failing after 2m48s
CI / backend (pull_request) Successful in 15m5s
CI / e2e (pull_request) Successful in 8m47s
Co-Authored-By: Codex <noreply@openai.com>
2026-05-07 11:45:58 -04:00
4a37a47887 chore(env): standardize backend python on 3.12
Some checks failed
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Failing after 1m8s
CI / e2e (pull_request) Successful in 12m9s
CI / backend (pull_request) Successful in 15m24s
Co-Authored-By: Codex <noreply@openai.com>
2026-05-07 11:31:28 -04:00
f31b873459 wip(handoff): record native python status
Co-Authored-By: Codex <noreply@openai.com>
2026-05-07 11:14:59 -04:00
380fcf7bde docs(env): document Stripe env vars in backend/.env.example
Some checks failed
Mirror to GitHub / mirror (push) Successful in 6s
CI / frontend (pull_request) Failing after 1m10s
CI / backend (pull_request) Failing after 1m24s
CI / e2e (pull_request) Failing after 1m25s
STRIPE_SECRET_KEY / STRIPE_PUBLISHABLE_KEY / STRIPE_WEBHOOK_SECRET are
required for the self-serve signup flow's checkout, portal, and webhook
paths. When unset, settings.stripe_enabled returns False and Stripe
code paths short-circuit cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:58:15 -04:00
4b098deac5 docs(handoff): record four post-implementation fixes from external review
OAuth refresh-token storage, OAuth setTokens authenticated flag, Stripe
webhook idempotency atomicity, and the missing /account/billing pages.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:45:35 -04:00
502c0a44e8 feat(billing): add /account/billing and /account/billing/select-plan pages
Wires up the missing frontend billing surfaces that TrialPill, UpgradePrompt,
NextStepCard, and SetupChecklist all link to. Trial-expired, canceled,
past-due, and "Pick a plan" CTAs no longer 404.

- BillingPage: subscription summary, status-specific messaging
  (trialing / past_due / canceled / complimentary), Manage billing button
  routed through the Stripe Customer Portal, and a Pick/Change-plan link.
- SelectPlanPage: plan picker with monthly/annual toggle + seat count.
  Starter/Pro hit /billing/checkout-session; Enterprise links to
  /contact-sales. Active current plan is tagged "Current plan" with a
  disabled CTA.
- billingApi.getPortalSession + createCheckoutSession; getPortalSession
  surfaces a typed BillingPortalError (no_stripe_customer / stripe_not_
  configured) so the UI can show the right toast.
- AccountSettingsPage gets a Billing link card so the page is discoverable
  from the account hub.
- 10 new vitest cases covering subscription summary, trial/past-due/
  canceled/complimentary states, portal-session error fallback, plan-card
  rendering, checkout payload, and current-plan badge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:43:48 -04:00
06200fabb1 fix(billing): make Stripe webhook idempotency atomic so failed handlers can retry
Previously `apply_subscription_event` committed the StripeEvent idempotency
row before invoking the handler, then the handlers each committed their own
mutations. If a handler raised mid-flight (transient DB error, network blip,
race), the idempotency mark was already persisted — Stripe's retry would hit
the IntegrityError branch and silently return False, and the subscription
state would permanently desync from Stripe.

Switch to a single atomic transaction:
- Insert the StripeEvent + flush (catch IntegrityError on duplicate event_id).
- Run the handler.
- Commit on success; roll back the entire transaction on failure and re-raise.

Drop the four `db.commit()` calls inside `_handle_*` so the outer caller owns
commit. The webhook endpoint already lets exceptions propagate, so a 500
response now correctly tells Stripe to retry.

Tests: three new regression cases in test_stripe_webhook_handler.py covering
handler failure (no idempotency mark persisted), retry-after-failure success,
and duplicate-event-id skip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:36:13 -04:00
3630dd5a80 fix(auth): mark store authenticated after OAuth setTokens
setTokens() previously only set { token } without flipping
isAuthenticated, so after the OAuth callback exchange the store had
fresh tokens but ProtectedRoute still saw isAuthenticated === false and
bounced the user to /landing before fetchUser() could complete.

Storing tokens implies an active session, so set isAuthenticated: true
inside setTokens. The other caller (refresh interceptor in api/client.ts)
runs from an already-authenticated session, so the flag flip is a no-op
there.

Tests:
- new src/store/authStore.test.ts covers the setTokens contract
- src/pages/__tests__/OAuthCallbackPage.test.tsx adds a successful-
  callback case asserting setTokens + fetchUser are invoked with the
  exchanged tokens

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:32:53 -04:00
5e0c9d2de1 fix(auth): store OAuth refresh token JTI to fix /auth/refresh after OAuth signup
OAuth callbacks (POST /auth/google/callback, POST /auth/microsoft/callback)
issued refresh tokens via create_refresh_token() but never persisted the JTI
in the refresh_tokens table. The /auth/refresh rotation logic does a
conditional UPDATE that requires a matching unrevoked row; without it the
first refresh attempt 401s with "Refresh token has been revoked" and OAuth
users get effectively logged out after the ~5 minute access-token expiry.

- Promote _store_refresh_token to module-public store_refresh_token in
  app.api.endpoints.auth (existing callers in /login, /login/json, /refresh
  updated in-place — same module, just renamed).
- OAuth callbacks now call store_refresh_token(...) + db.commit() after
  _sign_in_or_register returns. _sign_in_or_register already commits the
  user/account/identity rows; the refresh-token row gets its own commit.
- Tests:
  - test_oauth_google_callback_stores_refresh_token_jti — asserts the JTI
    hash is in refresh_tokens after a Google callback.
  - test_oauth_refresh_works_after_oauth_signup — full e2e: callback -> use
    returned refresh token at /auth/refresh -> 200 with rotated tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 01:30:14 -04:00
fee4cb5b74 docs(handoff): capture Phase 2 (frontend cutover) code completion
Tasks 27–44 implemented across 18 commits on this branch. Phase O (Stripe
live setup, internal validation, flag flip) is the manual operational
follow-up and is the resume point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 23:46:15 -04:00
c75ce0c9a3 feat(sales): redirect beta-signup to /register; queue waitlist emails
Phase 2 retires the public beta-signup form in favor of the self-serve
register flow. The /api/v1/beta-signup POST endpoint stays mounted but
now responds with 307 to /register?from=beta so any external links keep
working and analytics can tag signup origin via the from query param.

Note: there is no beta_signup table in the schema — the original
endpoint only fired an email notification, so there is no waitlist to
read and no migration to run for the email-sent_at field. The one-off
admin script in the spec is therefore a no-op and is intentionally not
added here.

- Replace POST /beta-signup handler with RedirectResponse(307)
- Drop the EmailService.send_beta_signup_notification call (the user is
  now redirected into the register flow, which has its own email path)
- Add tests/test_beta_signup_redirect.py covering the 307 + Location

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 23:43:35 -04:00
db2478dd89 feat(sales): add /contact-sales form + landing page CTA
Public Talk-to-Sales surface and a "See pricing" hero CTA on the marketing
landing page. Phase 2 Task 43 of self-serve signup.

- frontend/src/api/sales.ts: salesApi.createLead -> POST /sales-leads.
- ContactSalesPage at /contact-sales (public, gated by self_serve_enabled
  with a 404-style fallback). Form fields: name, work email, company,
  team size (1-2 / 3-5 / 6-10 / 11-25 / 26+), and an optional
  "what brought you here?" textarea -> message. Submit button disabled
  while in flight to block duplicate submissions.
- Confirmation surface replaces the form on success. Calendly block is
  hidden when VITE_CALENDLY_URL is unset.
- detectSource(): 'pricing_page' if document.referrer contains '/pricing',
  else 'landing_page'. Server emits the canonical PostHog
  talk_to_sales_form_submitted event with this source.
- LandingPage: new "See pricing" hero CTA gated by useAppConfig().
  self_serve_enabled.
- frontend/.env.example + Dockerfile: VITE_CALENDLY_URL ARG/ENV.
- Tests: ContactSalesPage submit/confirmation, Calendly hide-when-unset,
  in-flight de-dup, 404 when self-serve off; LandingPage CTA on/off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:31:56 -04:00
67fae91087 feat(pricing): add /pricing page (B-style)
Phase 2 Task 42: public pricing page gated by SELF_SERVE_ENABLED.

Backend:
- New `GET /api/v1/plans/public` (no auth) returns plan_billing rows
  joined with plan_limits.max_users (as `max_seats`), filtered to
  is_public=true AND is_archived=false, ordered by sort_order ASC,
  plan ASC. Uses get_admin_db (cross-tenant catalog read, same pattern
  as /config/public).
- `PublicPlanResponse` schema in app/schemas/billing.py.
- Registered as PUBLIC in api router.

Frontend:
- `plansApi.getPublic()` client (frontend/src/api/plans.ts).
- `PricingPage` at /pricing with hero / 3 plan cards (Pro recommended,
  Enterprise hides price) / hardcoded v1 comparison table / testimonial
  placeholder / soft trust strip.
- Reads `useAppConfig().self_serve_enabled`; renders a 404 fallback
  when disabled, never calls the API in that path.
- Start free trial CTAs link to /register?plan=starter|pro; Talk to sales
  links to /contact-sales (page wired in Task 43).

Tests:
- Backend: only-public-rows + sort-order ordering.
- Frontend (Vitest): three plan cards with API prices, /register?plan=pro
  CTA, /contact-sales CTA, 404 when self_serve_enabled is false, soft
  trust language (no SOC2 claim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:26:27 -04:00
0c326d0616 feat(dashboard): replace checklist with next-step card + unified list
Phase 2 Task 41 — Dashboard redesign.

Backend:
- Extend GET /users/onboarding-status with email_verified and shop_setup_done.
- tried_ai_assistant kept in payload for backward-compat during deploy.

Frontend:
- New NextStepCard: surfaces the highest-priority incomplete onboarding item
  with a primary CTA. Priority order: verify email > set up shop > run first
  FlowPilot session > connect PSA > invite teammate > pick a plan (gated on
  trial stage warning/urgent/expired). Returns null when all done OR
  onboarding_dismissed.
- New SetupChecklist: unified single list (no SOLO/TEAM bifurcation), drops
  the stale tried_ai_assistant / Script Builder item, surfaces "Pick a plan"
  when trial stage is warning or later.
- Mounted on QuickStartPage below the hero with a "Show all setup steps"
  toggle. The whole onboarding section auto-hides when there's nothing left
  to nudge on, so the dashboard goes back to clean once setup is done.
- Removed the orphaned OnboardingChecklist component (was defined but never
  mounted).
- New useOnboardingStatus hook so page + components share one fetch contract.

Tests:
- Backend: test_onboarding_status_includes_email_verified_and_shop_setup_done.
- Frontend (Vitest): 13 new tests across NextStepCard, SetupChecklist, and
  QuickStartPage covering priority ordering, dismissal, the SOLO/TEAM
  removal, the toggle reveal, and the trial-stage gate on Pick a plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:19:58 -04:00
99343ab7a9 feat(dashboard): add TrialPill in AppLayout topbar
Mounts a billing-state pill in the topbar that reads useTrialBanner() and
renders the appropriate label / tone / CTA per spec:

- pristine / warning  → "Pro trial · Nd"   (info → warning amber as days drop)
- urgent              → "Pro trial · today" (warning amber, semibold)
- expired             → "Trial expired — pick a plan" → /account/billing/select-plan
- paid                → planBilling.display_name (quiet)
- complimentary       → "Complimentary Pro" (accent, no CTA)
- past_due            → "Payment failed — update card" → /account/billing
- canceled            → "Reactivate" → /account/billing/select-plan
- null                → hidden

Uses existing design-system tokens only (text-info/bg-info-dim,
text-warning/bg-warning-dim, text-danger/bg-danger-dim, text-accent/
bg-accent-dim, text-muted-foreground/bg-elevated). Clickable variants
render as react-router-dom <Link>s and are keyboard-focusable with an
accent focus-visible ring. Mobile collapses the label to a clock icon
with a title attribute carrying the full text.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:06:09 -04:00
53dd5f13e5 feat(onboarding): add wizard Steps 2 (PSA) and 3 (Invite team)
Step 2 (`/welcome/step-2`): four PSA tiles (ConnectWise / Autotask /
HaloPSA / No PSA yet). Selecting a real PSA reveals a quiet inline
"Connect now" link to `/account/integrations` — credential entry is
intentionally OUT of the wizard. Continue persists `primary_psa`,
Skip advances without writing.

Step 3 (`/welcome/step-3`): up to 10 email/role rows (default 3,
"+ Add another" extends, role defaults to Tech / engineer with
Viewer alt). "Send invites and continue" filters empty rows, POSTs
`/accounts/me/invites/bulk`, then PATCHes onboarding-step
`{step:3, action:"complete"}` and navigates to `/?welcome=true`.
Per-row `failed[]` errors render inline next to the email and the
wizard does NOT auto-advance — user can fix-and-retry or click
"Continue anyway" to mark step complete. Empty + Skip / empty + Send
both advance without sending.

Adds `accountsApi.bulkInvite` and registers `/welcome/step-{2,3}`
in the router. Vitest: 5 named tests (selecting PSA persists,
Skip advances without primary_psa, valid emails create invites,
partial-failure inline error, empty + Skip no-op) + 5 incidental
coverage tests. tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 23:02:00 -04:00
9b517d3320 feat(onboarding): add welcome wizard scaffold + Step 1 (Your shop)
Lays the groundwork for the post-signup welcome wizard (Phase 2,
Task 38). Authed users hitting /welcome are routed to the next
incomplete step based on users.onboarding_step_completed +
users.onboarding_dismissed; refresh resumes correctly because every
navigation persists state server-side first.

Backend:
- Expose onboarding_step_completed (Optional[int]) and
  onboarding_dismissed (bool) on UserResponse so /auth/me drives
  client-side routing without a separate fetch.

Frontend:
- WelcomeRouter handles the /welcome decision table (dismissed → /,
  completed >=3 → /, else next step).
- WelcomeStep1 renders the "Your shop" form (company name pre-filled
  from accounts.name, team size 1-2/3-5/6-10/11-25/26+, role
  Owner/Lead Tech/Tech/Other). Continue PATCHes /users/me/onboarding-step
  with action=complete; Skip-this-step PATCHes action=skip; Skip-the-rest
  POSTs /users/me/onboarding-dismiss-rest. Each action refreshes the
  auth store before navigating so the router resumes correctly on the
  next visit.
- onboardingApi.updateStep + dismissRest (typed against backend
  OnboardingStepRequest/Response schemas).
- Routes mounted inside AppLayout so EmailVerificationBanner persists
  above each step per spec.
- 11 vitest cases covering the routing decision table + Continue / Skip
  / Skip-the-rest / persist-failure paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:54:10 -04:00
7d939a4acf feat(auth): add email verification banner, wall, /verify-email page
Wires up the soft 7-day email-verification grace period UX.

- EmailVerificationBanner now uses the design-system warning tokens
  (bg-warning-dim / text-warning) and hides itself once the grace
  period expires, so the wall takes over without double-messaging.
- EmailVerificationWall picks up data-testids on the resend and
  sign-out CTAs.
- VerifyEmailPage gains a single-fire useRef guard (so React 19
  strict-mode double-invoke doesn't burn the token), an
  already-verified short-circuit that skips the API call, success
  state with auth-store refresh + redirect to /?verified=1, and
  an error state with a resend CTA.

Tests: banner hides past day-7, banner resend triggers API call,
verify success refreshes + redirects, verify short-circuits when
already verified, single-fire guard holds across remount.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 22:46:43 -04:00
39e85c9770 feat(auth): add /accept-invite page + lookup endpoint
Adds the invitee-side flow for self-serve signup Phase 2 (Task 36):

Backend
- Public GET /accounts/invites/{code}/lookup returns
  {account_name, inviter_name, invited_email, role} for a valid invite,
  404 invite_invalid_or_expired_or_revoked otherwise (collapses unknown /
  expired / revoked / used into one anti-enumeration response). Mounted
  in a new account_invite_lookup endpoints module on the public route
  list, uses get_admin_db (BYPASSRLS) since the caller has no tenant.
- OAuthCallbackPayload gains optional account_invite_code + invited_email.
  _sign_in_or_register honors them: a new OAuth user with a valid invite
  joins the invited account (no personal account, no Pro trial), the
  invite is marked used, and OAuth-profile-email vs invite-email mismatch
  raises invite_email_mismatch (matching the email+password register
  contract).

Frontend
- New public route /accept-invite -> AcceptInvitePage. Reads ?code=,
  calls inviteApi.lookupAccountInvite, renders "Join {account} on
  ResolutionFlow" with the invited email locked (rendered as a div, not
  an input), three sign-in options (set password, Google, Microsoft),
  and a clear "ask {inviter} to resend" + mailto: fallback for invalid
  codes.
- OAuth state for invitees is base64url(JSON({csrf, accountInviteCode,
  invitedEmail})). OAuthCallbackPage decodes both shapes, forwards the
  invite fields to the backend, and surfaces invite_email_mismatch /
  invite_invalid_or_expired_or_revoked errors with friendly text.
  Successful invite-OAuth lands on /?welcome=teammate (suppresses the
  welcome wizard for invitees per spec).
- UserCreate type + invite/auth API clients extended for the new fields.

Tests
- Backend: invite lookup happy path + four invalid-state collapse, OAuth
  callback links invite when supplied + rejects on email mismatch.
- Frontend Vitest: AcceptInvitePage renders account name + locked email
  + accept buttons; resend message + mailto on invalid code.

All 43 backend auth/account/invite/email-verification tests green;
frontend Vitest 120/120 green; tsc -b clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:34:22 -04:00
70ab1f34d4 feat(auth): redesign /register with OAuth buttons; hide invite-code under flag
Phase 2 Task 35. Adds OAuth Google/Microsoft sign-in to the register flow,
gated on the public SELF_SERVE_ENABLED flag, and hides the legacy invite-code
field when self-serve is on.

- New `useAppConfig` hook + `configApi`. One-shot module-cached fetch of
  `GET /api/v1/config/public`; falls back to `VITE_SELF_SERVE_ENABLED` env
  var (default false) if the endpoint is unreachable.
- New `OAuthCallbackPage` mounted at `/auth/google/callback` and
  `/auth/microsoft/callback` (public, NOT inside ProtectedRoute). Posts the
  authorization code to the backend, persists tokens, hydrates the auth
  store via fetchUser, and redirects to `/welcome` (new) or `/` (returning).
- `RegisterPage` now renders OAuth buttons + email/password divider when
  `self_serve_enabled` is true and only emits buttons for providers the
  backend reports as configured. Invite-code field hidden in that mode.
  Captures `?plan=pro` into `localStorage.rf-intended-plan` on mount.
- `authApi` gains `googleCallback(code)` / `microsoftCallback(code)`.
- `frontend/.env.example` + `frontend/Dockerfile` document and bake the
  three new VITE_* build-time variables (Lesson 60: Vite needs ARG+ENV).
- Vitest coverage for the three required cases plus the plan-param capture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 21:15:25 -04:00
ece82225f2 feat(billing): add FeatureGate, UpgradePrompt, EmailVerificationGate components
Three drop-in gating components for the self-serve signup flow.

- FeatureGate reads useFeature(flag) and renders children when enabled,
  else a fallback (default UpgradePrompt). UX-only — security boundary
  remains require_feature on the backend.
- UpgradePrompt resolves a feature key to display name + required plan
  via an inline catalog and links to /account/billing/select-plan.
- EmailVerificationGate gates protected content behind a 6-day grace
  period; renders a minimal EmailVerificationWall (resend + sign out)
  on Day 7+ unverified. Wall design will be refined in Task 37.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 21:01:53 -04:00
0b5ed9aa10 feat(billing): add useFeature, useFeatureLimit, useTrialBanner hooks
Phase 2 Task 33. Components can now ask "is this feature on?", "how many
sessions left?", and "what stage is the trial in?" without re-implementing
the read against useBillingStore.

- useFeature(flagKey): boolean — reads enabledFeatures from store
- useFeatureLimit(field): { used, limit, percentage, isAtLimit, isLoading }
  with non-blocking 60s module-level cache and graceful 404 degradation
- useTrialBanner(): derives stage from subscription status + trial countdown,
  returns null on initial load to prevent flicker
- usageApi.getCount(field) — calls /api/v1/usage/{field}; backend endpoint
  is not yet implemented (planned), so the hook degrades to used=0

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:55:58 -04:00
7a9cb4b03b feat(billing): add useBillingStore and /billing/state integration
T32: Single frontend source of truth for subscription / plan / feature
state. New Zustand `useBillingStore` fetches `/billing/state` (auto-fetch
on login via authStore, reset on logout), exposes `refetch` for
post-Checkout refresh, and is supported by a `useBillingPoll` hook
that re-fetches every 60s while authenticated. The new `billingApi`
client transforms the snake_case backend payload to camelCase at a
single boundary so the rest of the frontend never sees `plan_billing`
or `enabled_features`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:47:50 -04:00
80baf89b00 feat(config): add SELF_SERVE_ENABLED flag + GET /config/public
Phase 2 Task 31. Single flag now controls whether the public-facing
self-serve flow is exposed.

- New public endpoint GET /api/v1/config/public returns
  {self_serve_enabled, oauth_providers}. oauth_providers includes
  "google" if GOOGLE_CLIENT_ID is set and "microsoft" if MS_CLIENT_ID
  is set. No auth required; consumed once by the frontend at load.
- POST /auth/register: when SELF_SERVE_ENABLED=true the platform
  invite-code requirement is bypassed even with REQUIRE_INVITE_CODE=true.
  invite_code stays in the schema for backward compat and still applies
  when supplied. With the flag off, the gate behaves exactly as before.
- Adds backend/app/schemas/config.py with PublicConfigResponse and
  registers the new router in the public/unauthenticated section.
- Adds 3 integration tests in tests/test_config_public.py covering the
  flag round-trip, the regression case (flag off keeps the 400), and
  the new behavior (flag on bypasses the gate, creates user + Pro trial).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:38:50 -04:00
d05b475a41 feat(admin): extend /admin/plan-limits to manage plan_billing fields
Task 30 of self-serve signup Phase 2. Super-admins can now manage Stripe
IDs, display names, prices, and public/archived flags via the existing
admin plan-limits endpoints.

- GET /admin/plan-limits now outer-joins plan_billing and returns
  merged PlanLimitWithBillingResponse rows. Plans without a
  plan_billing row return None for the billing fields.
- PUT /admin/plan-limits accepts the new optional billing fields and
  upserts plan_billing in the same transaction. If no plan_billing
  row exists for the plan and the body includes any billing field, a
  row is created (display_name defaults to plan.capitalize() when
  omitted; display_name is never NULLed out on an existing row).
- After commit, the handler queries account_ids on the affected plan
  and calls BillingService.invalidate_billing_cache(account_ids).
  This is a no-op stub today (logs only) — there's no in-process
  billing cache yet. TODO comment marks the wire-up point.
- 3 new integration tests cover GET-with-billing-present, PUT creating
  a plan_billing row, and the invalidation hook being awaited with a
  list of account_ids.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:35:10 -04:00
694279f89e feat(sales): add POST /sales-leads public endpoint
Phase 2 Task 29 — public Talk-to-Sales submission endpoint.

- New POST /api/v1/sales-leads (public, no auth, rate-limited 5/hour per IP).
- Inserts a sales_leads row, fires best-effort notification email and
  PostHog server-side capture; failures are logged but never fail the
  request.
- New EmailService.send_sales_lead_notification static method.
- New SALES_LEAD_RECIPIENT_EMAIL setting (defaults to sales@resolutionflow.com).
- Schemas: SalesLeadCreate / SalesLeadCreateResponse with literal source enum.
- Tests: happy path (row + email), email-failure resilience, and rate-limit
  enforcement (re-enables the slowapi limiter for the rate-limit assertion
  since DEBUG=true disables it by default in tests).

PostHog server-side instrumentation point is wired in but no-ops gracefully
until app.core.analytics.posthog exists — turning it on is a one-line
change when the backend SDK is configured.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:12:03 -04:00
16f5e4ce05 feat(onboarding): add PATCH /users/me/onboarding-step + dismiss-rest
Persists welcome-wizard Step 1/2/3 progress for self-serve signup Phase 2.
PATCH validates step cannot decrease, ignores `data` on action="skip", and
is idempotent on re-PATCH of the same step. POST /users/me/onboarding-dismiss-rest
backs the wizard's "Skip the rest" button.

Both routes added to _EMAIL_VERIFICATION_ALLOWLIST and _SUBSCRIPTION_GUARD_ALLOWLIST
so the wizard runs before email verification and during the trial. 4 integration
tests cover field writes, skip semantics, decrease guard, and dismiss-rest.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:04:43 -04:00
2f8ec3775e feat(billing): add BillingService.open_customer_portal + GET endpoint
Authed users can now request a Stripe-hosted Customer Portal URL for card
updates and cancellation via GET /api/v1/billing/portal-session. The path is
already in both _SUBSCRIPTION_GUARD_ALLOWLIST and _EMAIL_VERIFICATION_ALLOWLIST
so canceled or unverified-past-grace users can still update billing.

- Returns 503 with {"error": "stripe_not_configured"} when STRIPE_SECRET_KEY unset.
- Returns 400 with {"error": "no_stripe_customer"} when account has no
  stripe_customer_id (must complete checkout first).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 20:00:08 -04:00
f918b766b0 feat: self-serve signup backend (Phase 1) (#161)
All checks were successful
CI / frontend (push) Successful in 5m16s
Mirror to GitHub / mirror (push) Successful in 6s
CI / e2e (push) Successful in 10m22s
CI / backend (push) Successful in 10m55s
2026-05-06 23:46:34 +00:00
fbb41e789c docs(handoff): capture Phase 1 backend completion + followups
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 6m0s
CI / backend (pull_request) Successful in 11m15s
CI / e2e (pull_request) Successful in 10m4s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
97d36dd400 test(kb-accelerator): downgrade kb_setup user to free plan
The kb_setup fixture asserts free-plan quota numbers (lifetime_conversions_limit=3),
but Phase 1 conftest seeds test_user on Pro. Downgrade explicitly inside kb_setup
to preserve the original test intent without affecting other suites.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
f26f468878 feat(billing): pilot user backfill — set existing accounts to complimentary
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
79942c3fd3 feat(billing): add GET /billing/state aggregating subscription + plan + features
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
4768ae0648 feat(invites): add bulk-create and soft-revoke invite endpoints
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
e54d6c586a feat(invites): wire EmailService.send_account_invite_email into create handler
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
86893562b9 feat(auth): auto-send verification email on register; enforce invite email match
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
b0708ed650 feat(auth): guard login/password paths against OAuth-only users
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
2ef2350de7 feat(auth): add Microsoft OAuth callback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
f4606f073a feat(auth): add Google OAuth callback with oauth_identities linking
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
9b709488d9 feat(billing): extend Stripe webhook stub with concrete event handlers
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
18180bc57f feat(billing): apply_subscription_event with stripe_events idempotency
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
f683bb5720 feat(billing): add /billing/checkout-session via BillingService
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
9851d56633 feat(billing): add BillingService.start_trial; wire into /auth/register
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
519c7eb5ce feat(deps): add require_verified_email_after_grace guard
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
9ec208f6e7 feat(deps): add require_active_subscription guard with allowlist
Mounts on Pro routers (trees, sessions, scripts, FlowPilot, etc.) and
returns 402 with structured detail when an account's subscription is
missing or locked. Allowlist bypasses billing/account/auth flows so
users can recover from a lapsed subscription.

Conftest now seeds a default Pro/active Subscription on test_user and
test_admin (delete-then-insert because the register endpoint already
creates a free/active sub by default). Two existing tests adapted to
the new seeded plan; tenant-isolation tests seed Subscription rows for
the accounts they create directly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
cfe0e6cae6 refactor(deps): remove trial auto-downgrade; expiry now non-mutating per spec
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
e3f5ed4985 feat(billing): add complimentary status, fix is_paid, add has_pro_entitlement
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
5105eaf529 feat(billing): add sales_leads and stripe_events tables
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
974b188c1e feat(billing): add plan_billing sibling table for Stripe + catalog metadata
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
a28b635b19 feat(invites): add revoked_at + email_sent_at to account_invites
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
50e7763380 feat(onboarding): add accounts.team_size_bucket and primary_psa for wizard
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
b3ed76c203 feat(onboarding): add users.role_at_signup and onboarding_step_completed
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
453ba3fefc feat(auth): make users.password_hash nullable for OAuth-only accounts
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
143c979975 feat(auth): add oauth_identities table for Google/Microsoft sign-in
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
ab0d40c1e2 docs(plan): self-serve signup & onboarding implementation plans
Adds two phase plans alongside the spec at
docs/superpowers/specs/2026-05-05-self-serve-signup-onboarding-design.md:

- Phase 1 (backend foundation, 26 tasks across 8 sub-phases A-H):
  schema migrations, subscription model + new guards, BillingService,
  Stripe webhook handler extension, OAuth callbacks, email verification
  auto-send + email-match enforcement, account-invite extensions,
  GET /billing/state, pilot user backfill. Step-by-step granularity
  with full code blocks per writing-plans skill.

- Phase 2 (frontend + cutover, 21 tasks across 7 sub-phases I-O):
  Phase-1-deferred endpoints, useBillingStore + hooks + gating
  components, register redesign + OAuth buttons + accept-invite,
  welcome wizard, dashboard redesign, pricing page + contact-sales,
  beta-signup deprecation, cutover. Higher-altitude — defines
  contracts, acceptance criteria, integration tests; leaves
  component-detail decisions to implementer.

Each phase ends in a mergeable PR. Cutover is gated behind
SELF_SERVE_ENABLED + VITE_SELF_SERVE_ENABLED. Execution deferred to
a future session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:30 -04:00
278b9342b4 docs(spec): self-serve signup & onboarding design
Adds docs/superpowers/specs/2026-05-05-self-serve-signup-onboarding-design.md.
Six-section design for opening ResolutionFlow to public self-serve registration
with a 14-day reverse trial on Pro, Stripe-backed billing, sales-assist
Enterprise lane, and a hybrid welcome wizard + dashboard onboarding.

Reuses existing infrastructure (subscriptions, plan_limits, feature_flags,
plan_feature_defaults, account_feature_overrides, account_invites,
email_verification_tokens, /admin/plan-limits, /admin/feature-flags,
/accounts/me/transfer-ownership, /webhooks/stripe stub). New schema is
intentionally small: oauth_identities, plan_billing (sibling to plan_limits),
sales_leads, stripe_events, plus column additions for OAuth identity model
nullability, wizard step state, and pilot-account complimentary status.

Replaces deps.py:109 trial auto-downgrade with a non-mutating computed
expiry check enforced by a new require_active_subscription dep. Adds a
sibling require_verified_email_after_grace dep to enforce the 7-day email
verification grace at the API layer (frontend wall is UX over the same rule).

Defers promo codes from v1. No new combined /admin/plans surface — existing
admin endpoints handle plan/feature configuration with extended response
shape.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 19:14:29 -04:00
a8b22cfa0b feat: post-PR-159 UI cleanup — sidebar IA + account redesign (#160)
All checks were successful
CI / frontend (push) Successful in 5m11s
Mirror to GitHub / mirror (push) Successful in 6s
CI / backend (push) Successful in 10m19s
CI / e2e (push) Successful in 10m31s
2026-05-06 23:14:16 +00:00
b544a7a462 test(e2e): update account page heading assertion to match redesign
All checks were successful
Mirror to GitHub / mirror (push) Successful in 7s
CI / frontend (pull_request) Successful in 5m14s
CI / backend (pull_request) Successful in 9m57s
CI / e2e (pull_request) Successful in 10m21s
8612042 dropped the static "Account Management" heading in favor of the
account name (rendered as a dynamic h1). Switch the smoke test to the
"Settings" SectionLabel — a stable h2 that survives the redesign.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 18:54:53 -04:00
07a3f01184 fix(qa): ISSUE-001 — fall back to members.length when usage.user_count is missing
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m30s
CI / e2e (pull_request) Failing after 11m2s
CI / backend (pull_request) Successful in 14m47s
The /subscription endpoint returns usage as {tree_count, session_count_this_month}
without user_count, so the Seats UsageRow rendered as " / ∞" (blank current value).
The TS type declared user_count: number, hiding this API/type drift; the old
card-stack design hid it visually because each stat had its own border. The new
flat layout surfaced the gap.

Owners get a fallback to members.length (already fetched). Non-owners can't
fetch members and don't need seat-count info, so the row hides entirely for
them. Verified live: owner now sees Seats 2 / ∞.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 01:02:44 -04:00
86120423da refactor(account): redesign settings index, drop card stack
The index page had ~12 distinct card surfaces with three places of
nested cards-inside-cards, against PRODUCT.md's "elevation = lighter
surface + border" + "nested cards are always wrong" rules. Branding
appeared twice, Display Code lived in Identity but does invite work,
and Preferences got a full card for one dropdown.

Single column, max-w-3xl, no card chrome. Sections separated by
border-t rules + mono-uppercase section labels (existing house style):

- Header: inline-editable name + plan/status/owner/member-count info
  line. No card.
- Plan & usage: renewal date right-aligned in section header, three
  thin progress rows replace the 4-card usage stat grid, upgrade
  CTAs right-aligned at bottom.
- People (owner-only): invite form, unified members + pending invites
  list, display code as a quiet "share to invite during signup" line.
  Non-owners see a one-line "managed by your admin" instead of a card.
- Settings: dense route list (icon + title + summary + status pill +
  chevron). Profile above a thin divider; team-admin rows below,
  owner-gated. Branding row carries the Included/Plan-gated pill.
  Support & Feedback as a dim link at the bottom.
- Account actions: plain rows. Owner: Transfer + Delete. Non-owner:
  Leave. Destructive labels colored, no red box-of-doom.

Drops: Access & Security card (filler), Preferences card,
Settings Areas link grid, billing-card branding-status duplicate,
SettingsLinkCard helper. Default export format moves to Profile
Settings where it belongs (personal preference, not account).

856 -> 710 lines on the index. tsc, eslint, vite build clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 23:57:29 -04:00
0f90c0e199 refactor(sidebar): collapse rail/sections to single-IA, log docs
- Sidebar: kill the drifting railGroups + sections dual definition.
  Single source of truth (workItems / libraryItems / footerItems)
  rendered in both pinned and rail modes; pin/unpin is a width and
  label affordance, not an IA switch. Hairline divider replaces
  section labels. Guides moves to the footer alongside Account.
  Renames: Home -> Dashboard, History -> Sessions, Insights -> Analytics.
- CURRENT-STATE.md: log PR #158 (session impeccable pass + tasklane
  keyboard flow) under "Recently shipped".
- PRODUCT.md: design-context source of truth (users, brand, aesthetic);
  sibling to DESIGN-SYSTEM.md.
- skills-lock.json: lock /impeccable + /documentation-writer skill
  versions so other sessions reproduce the same tooling state.
- Drop stale .impeccable.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 22:50:19 -04:00
93fa4eac5c Merge pull request 'feat(guides): rewrite in-product User Guides as Diátaxis how-tos' (#159) from feat/guides-diataxis-rewrite into main
All checks were successful
CI / frontend (push) Successful in 4m57s
Mirror to GitHub / mirror (push) Successful in 6s
CI / backend (push) Successful in 10m38s
CI / e2e (push) Successful in 12m31s
2026-05-02 02:19:53 +00:00
dc71d5873b docs(ai): mark guides rewrite as merged in handoff and current task
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m1s
CI / backend (pull_request) Successful in 13m8s
CI / e2e (pull_request) Successful in 18m32s
Update HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect
that PR #159 is being merged into main, replacing the in-flight
"uncommitted" language with the merged-state rollup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:25:44 -04:00
307a6285e6 feat(guides): rewrite in-product User Guides as Diátaxis how-tos
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 4m57s
CI / backend (pull_request) Successful in 10m21s
CI / e2e (pull_request) Successful in 12m0s
Replace 15 feature-dump guides with 43 problem-oriented how-tos grouped
under 10 categories. Drop Maintenance Flows / AI Assistant / Flow Assist
Sparkles — those surfaces no longer exist post-FlowPilot pivot. Rename
Step Library → Solutions Library throughout. Correct every "click X in
the sidebar" reference to match live labels (Home, History, Tickets,
Flows, Scripts, Data, Acct).

Schema: add `category: CategoryId` and optional `relatedSlugs` to Guide;
new Category type and `categories` const drive hub ordering. GuidesHubPage
renders category sections (auto-hides empty); GuideDetailPage renders a
related-guides footer when set; GuideCard drops the misleading "N sections"
subtitle.

Fix step.tip markdown rendering — `**bold**` rendered literally because
tip used plain text instead of the same regex replacement used on
instruction.

14 net-new how-tos for FlowPilot-era surfaces with no prior coverage:
tasklane keyboard flow, view-what-we-know, ask-AI mid-session,
pause-and-leave, resolve, record-fix-outcome, escalate (Escalation
Mode), post-docs-to-ticket, send-client-update, build-script-from-scratch,
open-suggested-flow, pin-a-flow, invite-teammate.

Browser-verified against engineer + owner test users (sidebar labels,
account sub-pages, pilot-screen header buttons, Tasks panel, integration
form). tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 21:16:51 -04:00
5e10005276 Merge pull request 'feat(session): impeccable pass + tasklane keyboard flow' (#158) from feat/session-distill-quieter into main
All checks were successful
CI / frontend (push) Successful in 5m8s
Mirror to GitHub / mirror (push) Successful in 6s
CI / backend (push) Successful in 10m20s
CI / e2e (push) Successful in 10m43s
Reviewed-on: #158
-Michael Chihlas
2026-05-01 21:53:13 +00:00
d3a9031e23 chore(session): bump keyboard hint contrast + drop redundant font-sans
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m33s
CI / backend (pull_request) Successful in 10m57s
CI / e2e (pull_request) Successful in 13m21s
Two small ergonomic fixes after the impeccable pass:

- TaskLane keyboard hints (⏎ submit · ⇧⏎ newline) under each open input
  were rendered at text-muted-foreground/70, just shy of legible at a
  glance. Drop the /70 opacity modifier so they read at full muted weight
  on first look without becoming visually loud.

- 12 sites across the session screen had explicit font-sans utilities,
  but the body default is already IBM Plex Sans (via --font-sans in
  index.css and Tailwind v4's default-sans binding). None of the call
  sites sit inside a font-heading or font-mono cascade, so every
  font-sans there was a no-op. Drop them. ConcludeSessionModal also had
  three "text-xs font-sans text-xs" triplets — drop both the redundant
  font-sans and the doubled text-xs in one pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 16:50:09 -04:00
708e8b977f chore(ai): log followup TODOs surfaced during impeccable pass
Two backlog entries surfaced while polishing the session screen:

- ConcludeSessionModal paused/escalated step forces a single-artifact
  choice (Ticket Notes / Client Update / Email Draft). Real escalations
  often need at least two of the three. Recommended shape: multi-select
  with smart pre-checks per outcome, parallel generation, per-result
  Copy / Post / Send actions. Feature work, deferred.

- bg-card-hover Tailwind class doesn't resolve in CommandPalette. The
  --color-bg-card-hover token generates bg-bg-card-hover (Tailwind v4
  takes the full token name minus --color-). Other call sites use the
  explicit hover:bg-[var(--color-bg-card-hover)] form that works; the
  CommandPalette classes silently produce nothing. Fix is two lines —
  swap to the explicit form, or add a --color-card-hover semantic
  mapping in index.css.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 16:23:15 -04:00
8b0358af3b fix(parameterization): word-boundary check prevents over-eager value match
ParameterizationPreview.tokenize() matched highlight values via raw
seg.text.startsWith(value, cursor) with no word-boundary check and no
minimum length. A param value like "D" (e.g. a drive letter) lit up every
capital D in the script body — Get-ADUser, Add-Type, Disable- all rendered
as proposed-parameter pills.

Add a word-boundary guard: a candidate match is only accepted if either
side of the match either falls at start/end of the segment, OR the
adjacent character is non-alphanumeric. The guard is conditional on
whether the value itself starts/ends with a word char, so values that
begin or end in punctuation (e.g. "D:\\Folder") still match cleanly when
they sit next to whitespace or punctuation.

Surfaced 2026-05-01 while testing the suggested-fix flow with a real
PowerShell script.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 16:23:05 -04:00
0156aae684 feat(session): impeccable session-screen pass + tasklane keyboard flow
Multi-step UX refactor of the assistant chat session screen, run via the
$impeccable skill. Heuristic score moved 24/40 → 33/40 (+9), with the biggest
gains on Aesthetic & Minimalist (1→3), Consistency & Standards (1→3), and
Recognition Rather Than Recall (2→4).

Distill — chat region:
- Remove the "Suggested checks" chip strip + selected-chip detail card; the
  TaskLane is the single canonical home for "what to do next"
- Add an inline Next steps · N pending cue above the latest action-bearing
  AI bubble (anchors attention without duplicating the lane's items)
- Link banner ↔ script-panel lifecycle: collapsing or dismissing the
  ProposalBanner now also hides the InlineNoTemplateDialog / TemplateMatchPanel
- Drop backdrop-blur on the handoff-context overlay (DESIGN-SYSTEM hard rule)

Quieter — drop decoration overshoot:
- Remove 3px side stripes on TaskLane done cards, all 6 ProposalBanner modes,
  WhatWeKnowItem fact rows
- Drop bg-gradient surfaces on WhatWeKnow + every ProposalBanner mode
- Drop 2px accent borderTop on the TaskLane header
- Replace bordered avatar boxes in banners with inline state-colored icons
- Each surface now uses a single decoration channel (top border + inline icon)

Layout:
- Header consolidates to Resolve + Escalate + ⋯ kebab; Context, New Ticket,
  Update Ticket, Pause now live behind the kebab on desktop, with feature
  parity in the existing mobile overflow menu
- Messages column anchors to max-w-3xl mx-auto to match the composer
- Chat bubbles drop from rounded-2xl to rounded-xl for vocabulary alignment

Typeset:
- Unify text sizing from 14 distinct sizes (with sub-pixel oddities and
  rem/px duplicates) to a 5-step scale: 10px / 11px / text-xs / 13px / text-sm

WhatWeKnow collapsible:
- Header is now a toggle; section body hides when collapsed
- Auto-collapses on first render when facts ≥ 5 so Questions / Diagnostic
  Checks stay above the fold
- Engineer's choice persists in sessionStorage per session and beats the
  auto-collapse heuristic on subsequent renders
- key=activeChatId on both render sites resets state cleanly across sessions

Polish:
- Split MessageCircleQuestion into Pencil (question Answer CTA, write
  affordance) + HelpCircle (per-check Explain toggle, universal help icon) —
  same icon for two different jobs was a discoverability bug
- Drop redundant text-xs from font-sans text-[0.625rem] / text-[0.6875rem]
  double-class definitions; the more-specific size always wins

TaskLane keyboard flow:
- Enter submits and auto-advances to the next pending task; Shift+Enter
  inserts a newline (consistent across question and action textareas — paste
  events don't fire keydown, so paste-then-Enter still works as expected)
- Esc cancels (same as the Cancel button)
- After the last pending task is submitted, focus moves to the Send Responses
  button so the engineer can fire the whole batch with one more keystroke
- Subtle hint row under each open input teaches the shortcut

Type-check, lint, and build all clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 16:22:50 -04:00
4d8b107121 wip(handoff): start issue cleanup plan sections 1 and 2
Co-Authored-By: Codex <noreply@openai.com>
2026-05-01 02:04:19 -04:00
a21fe93454 wip(handoff): clean stale TODOs and plan issue cleanup
Co-Authored-By: Codex <noreply@openai.com>
2026-05-01 01:47:41 -04:00
595844de0b wip(handoff): audit TODO and Gitea issue validity
Co-Authored-By: Codex <noreply@openai.com>
2026-05-01 01:41:37 -04:00
b74d3cf584 Merge pull request 'chore(ai): post-#156 handoff + log shipped features in CHANGELOG/CURRENT-STATE' (#157) from chore/post-156-handoff into main
All checks were successful
CI / backend (push) Successful in 10m46s
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (push) Successful in 5m47s
CI / e2e (push) Successful in 10m33s
Reviewed-on: #157
by Michael Chihlas
2026-05-01 04:38:22 +00:00
50ddacdb66 docs: log #155 + #156 in CHANGELOG/CURRENT-STATE
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 5m4s
CI / backend (pull_request) Successful in 10m25s
CI / e2e (pull_request) Successful in 10m41s
Adds Unreleased entries for the Escalation Mode wedge and the
suggested-fix Awaiting verification outcome — both user-visible
features merged this week. Refreshes CURRENT-STATE last-updated
date to 2026-05-01 and adds a "Recently shipped (post-0.1.0.0)"
quick-reference block at the top.

VERSION untouched (still 0.1.0.0; pre-PMF, no release scheduled).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 00:32:01 -04:00
a5e2dcf43f chore(ai): post-#156 handoff — feature shipped, QA report attached
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
Updates the .ai/ handoff trio after PR #156 merge:
- CURRENT_TASK.md: clear active task; record #156 in Recently shipped
  alongside #155 with one-line summary and QA-report pointer.
- HANDOFF.md: rewrite resume point as "pick next from TODO/roadmap";
  document carry-forward env quirks (CONTAINER=1 for Chromium,
  docker-01 hosts entry, multi-head alembic state).
- SESSION_LOG.md: append session entry for QA + merge.

Also includes the .gstack/qa-reports/ artifacts (report + 8 screenshots).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 23:45:10 -04:00
3ba4532675 Merge PR #156: pending-verification — applied_pending non-terminal outcome
All checks were successful
CI / frontend (push) Successful in 5m6s
Mirror to GitHub / mirror (push) Successful in 6s
CI / backend (push) Successful in 10m6s
CI / e2e (push) Successful in 10m33s
Adds applied_pending non-terminal status, pending_reason column, PendingBanner UI, and review fixes for page-level Resolve/Escalate intercepts.

QA: 5/7 scripted checks PASS with concrete evidence. 2 entry-path checks deferred — same handlers verified via tested transitions.
2026-05-01 03:42:10 +00:00
15042af6e2 docs(ai): document docker-exec pattern for hosts without native toolchains
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 4m57s
CI / e2e (pull_request) Successful in 10m10s
CI / backend (pull_request) Successful in 10m42s
The code-server LXC has bun and docker but no python/node/npm on PATH,
which left Codex unable to reproduce build/test commands. Adds a 6-line
block to PROJECT_CONTEXT.md showing the docker exec resolutionflow_{backend,frontend}
form, and updates the AGENTS.md "Tooling you do NOT have" line to point
Codex at it instead of suggesting toolchain installs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 23:02:53 -04:00
5bee264d70 fix(suggested-fix-pending): apply PR #156 review fixes
- Page-level Resolve patches applied_pending → applied_success before
  opening the resolution flow, so resolved sessions don't carry a
  provisional pending fix.
- Page-level Escalate intercept now catches applied_pending in addition
  to verifying/partial; intercept copy generalized from "Verifying state"
  to "still needs an outcome."
- PendingBanner gains a Dismiss action, matching the PR body and the
  backend's allowed pending → dismissed transition.
- resolution_note_generator and escalation_package_generator system
  prompts no longer include real-looking pending examples (anti-parrot
  guardrail compliance).

Verified via Docker: prompt anti-parrot 2/2, suggested-fix outcome suite
21/21, frontend tsc -b clean, npm run build clean.

Co-Authored-By: Codex <noreply@openai.com>
2026-04-30 23:02:46 -04:00
7cee7228dc docs(ai): refresh handoff for PR #156 — pending-verification feature
All checks were successful
Mirror to GitHub / mirror (push) Successful in 3s
CI / frontend (pull_request) Successful in 5m9s
CI / backend (pull_request) Successful in 9m51s
CI / e2e (pull_request) Successful in 9m22s
Closes out Escalation Mode (PR #155 merged) and pivots active task to
the new applied_pending suggested-fix outcome on PR #156.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:37:08 -04:00
00663a4734 feat(suggested-fix): add applied_pending status for deferred verification
Some checks failed
Mirror to GitHub / mirror (push) Has been cancelled
CI / backend (pull_request) Successful in 10m43s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Successful in 11m13s
Engineer applies a fix but can't verify yet (waiting on client power-cycle,
AD replication, async sync). Today the verifying banner forces a synchronous
verdict (worked / didn't / partial) — anything else means leaving the banner
stale or guessing wrong. This adds a fourth outcome that parks the fix in a
non-terminal "Awaiting verification" state with a reason ("waiting on what?")
and exposes it on the chat-anchored banner so the engineer doesn't lose track.

Backend
- New non-terminal status `applied_pending` parallel to `applied_partial`.
- New `pending_reason` column (nullable Text) — the "what are you waiting on?"
  prose, mirrors `partial_notes`. Required when outcome=applied_pending.
- Outcome endpoint allows pending in/out transitions; pending stamps
  applied_at but NOT verified_at (it's parked, not verified).
- Resolution-note + escalation-package prompts handle the new status:
  resolution note frames the fix as provisional; escalation package surfaces
  pending verification as the leading hypothesis with reference to what's
  being waited on.
- Migration: add column + extend status CHECK constraint.

Frontend
- New `BannerMode = 'pending'` + `PendingBanner` component (info-tone,
  parallel to PartialBanner) with worked / didn't / update-reason actions.
- VerifyingBanner overflow menu adds "Waiting to verify…".
- Nudge banner's "Still checking" button now actually records pending with
  a reason, instead of just silencing for the session.
- AssistantChatPage banner-mode derivation maps applied_pending → 'pending'.

Tests: 4 new integration tests covering pending notes requirement, reason
storage + applied_at/verified_at semantics, pending→success transition,
and pending_reason update on re-PATCH.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 17:32:37 -04:00
ac42f971fc Merge PR #155: Escalation Mode wedge — live arrival + magic-moment pickup
All checks were successful
CI / frontend (push) Successful in 5m7s
Mirror to GitHub / mirror (push) Successful in 6s
CI / e2e (push) Successful in 10m36s
CI / backend (push) Successful in 11m9s
Magic-moment handoff-context screen on senior pickup, live SSE escalation arrivals, time-to-first-action metric, role-gated claim with atomic conflict resolution, and chat ownership extension for claimed sessions.
2026-04-30 21:32:16 +00:00
f10649abc2 fix(escalations): atomic claim + self-claim rejection + queue exclusion
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 4m59s
CI / backend (pull_request) Successful in 10m22s
CI / e2e (pull_request) Successful in 10m46s
Codex review pass on the escalation wedge. Reworks claim_session from
read-then-write to a conditional UPDATE so two seniors racing can't both
win, blocks the original engineer from claiming their own handoff, and
filters self-escalated sessions out of the dashboard escalation queue.
Also preassigns the handoff UUID before flush so the compatibility
escalation_package payload carries it. Removes legacy frontend pickup
state (claiming, handleStartHere) that broke tsc --noEmit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 16:21:20 -04:00
ab5e0deaf7 docs(ai): session 3 handoff — QA complete, chat ownership decision logged
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 01:32:39 -04:00
f601a0db58 docs(ai): QA complete — escalation mode wedge browser-verified
All paths pass. One critical fix: chat endpoint now allows escalated_to_id
as a valid sender so the senior can run AI analysis on claimed sessions.
PR #155 ready for review.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 00:26:18 -04:00
dc69c9ddfb fix(escalations): allow claimed-by user to send chat messages to escalated session
unified_chat_service.send_chat_message checked AISession.user_id == user_id,
blocking the senior who claimed an escalation from sending the AI briefing.
Now also allows AISession.escalated_to_id == user_id (the claimer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 00:17:31 -04:00
db717b0b3f feat(escalations): magic-moment 3-option CTA + claim 500 fix
- HandoffContextScreen: 3-option layout (Continue/AI analysis/Own thing)
  with hasTaskLane, activeOptionKey, spinner/disabled states
- AssistantChatPage: wire up handleContinue, handleAIAnalysis, handleOwnThing
  handlers; chip detail expansion inline with copy-button fix; post-escalation
  redirect to dashboard on ConcludeSessionModal close
- TaskLane: fix async copy button (await + execCommand fallback + copiedKey
  visual feedback); whitespace-pre-wrap on command blocks
- Fix 500 on claim: Pydantic v2 model_validate() + model_copy(update={})
  (was passing update= kwarg directly which v2 rejects)
- HandoffResponse schema: handed_off_by_name field

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 00:05:02 -04:00
fb2dc222fd docs(ai): handoff for fresh session — AI consolidation plan locked
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m9s
CI / backend (pull_request) Successful in 9m43s
CI / e2e (pull_request) Successful in 10m13s
- HANDOFF: rewritten resume point. AI summary blocker is the active
  task; consolidation plan is the path. 5-step implementation order
  with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
  live-test results (what works, the AI summary blocker), full
  consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
  pickup bugs found + fixed, the three Enter/dashboard/timeout
  fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
  calls into one structured generation" — rejected alternatives
  (bump timeout further, copy status-update content the wrong way,
  switch to Haiku) and consequences (5s magic-moment, ~60% token
  reduction, instant Ticket Notes button, schema enforcement
  required, migration concerns documented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:21:30 -04:00
0d1b305619 fix(escalations): live-test fixes from QA bash
Bundles four fixes from the live debugging session:

1. AssistantChatPage: replace urlSessionId === activeChatId gate with a
   loadedChatIdsRef. After 8914391 made activeChatId initialize from
   urlSessionId, the gate short-circuited fresh mounts and selectChat
   never fired. Symptom: senior picks up an escalation, lands on a blank
   chat surface with no conversation history and no sidebar entry. Fix
   also adds loadChats() in handleStartHere so the picked-up session
   appears in the sidebar (its escalated_to_id is null pre-claim, so
   listSessions doesn't return it until claim_session sets it).

2. config: bump ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS 15s → 45s.
   Sonnet was hitting tail latency at 15s in the field, leaving the
   magic-moment placeholder permanent. Background-task architecture
   (e8ba74e) means this no longer blocks the user; it's just the budget
   before publishing has_assessment=false. NOTE: live test still shows
   assessment not populating — see HANDOFF for the consolidation plan
   that supersedes this.

3. Enter-to-submit: chat-input convention (Enter submits, Shift+Enter
   inserts newline) on the escalate-flow forms. RichTextInput gains an
   optional onSubmit prop; EscalateModal wires it to handleSubmit;
   ConcludeSessionModal gets the same handler on its plain textarea.

4. PendingEscalations: each row is now expandable. Click row body to
   reveal the engineer's escalation reason, step count on record,
   confidence tier, and PSA ticket number. Pick Up still clicks through
   directly. Single-expand-at-a-time keeps the dashboard compact.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:18:40 -04:00
b7d7ff06d2 docs(ai): refresh handoff for compute swap
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m8s
CI / backend (pull_request) Successful in 9m46s
CI / e2e (pull_request) Successful in 10m16s
- HANDOFF: rewritten resume point. First action on resume is `git push`
  (commits 0f00ee5 and 665530f are local-only). Visual QA + bug bash is
  the active work; 4 plan-locked items + the structural task-lane fix
  all need real-browser verification.
- CURRENT_TASK: add 0f00ee5 and 665530f to the commit table; reframe
  "Just shipped" as a per-commit summary; flag the task-lane fix as
  needing visual confirmation.
- SESSION_LOG: chronological entry for this session with full detail
  (audit, four polish items, race-condition wiring, structural
  task-lane fix, test status, files touched).
- DECISIONS: new entry "Tag the task-lane state with an owner chatId"
  documenting the structural pattern, what was rejected, and the
  forward implication that future task-lane state slices follow the
  same owner-tagging pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 08:21:23 -04:00
665530f812 fix(assistant-chat): tag task-lane state with owner chatId to kill stale flash
The previous fix (8914391) only blocked the mount-time sessionStorage
restore when the page entered with prefill or ?pickup=true. It didn't
cover any path where the page was already mounted and activeChatId
flipped without the in-memory task-lane state going through reset+
repopulate cleanly — in-place URL navigation, mid-flight pickup,
HMR re-runs, the gap between setActiveChatId(B) and the AI response
that finally populates B's questions/actions.

Root cause: activeQuestions / activeActions / showTaskLane were never
intrinsically tied to a chatId. They were treated as "the active chat's
data" by convention, with no structural enforcement. Any window where
they survived past their owning chat leaked previous-session data into
the new view. The persistence effect made it worse: it stamped the
sessionStorage chatId field with activeChatId at write time, so a
mid-transition snapshot {chatId: B, questions: [A's]} would happily
restore A's data for B on the next mount.

Fix: introduce taskLaneOwnerChatId state that records the chatId those
in-memory questions/actions/show values BELONG to. Set at every site
that populates them (sendPrefill, selectChat, handleSend, handleTaskSubmit,
handleResumeNew, refreshFacts, handleApplyFix). Cleared in
resetSessionDerivedState. The persistence effect now writes ownerChatId
as the chatId tag, not activeChatId — so the snapshot is always
self-consistent.

Render gate: taskLaneIsForActiveChat = ownerChatId === activeChatId.
ANDed into all three render conditions (toolbar Tasks button, narrow-
viewport floating drawer, main side panel). The lane is structurally
unable to display data tagged with a different chat.

The mount-time skipTaskLaneRestore guard stays — it kills the flash
between component mount and the first sendPrefill effect run, which
the owner-gate alone doesn't cover.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 02:42:31 -04:00
0f00ee5e01 feat(escalations): close out plan-locked wedge polish
Four items from the design-plan audit, all flagged as locked-design or
Codex corrections, shipped together so the GTM demo path covers them
end-to-end before bug bash.

1. Live AI assessment refresh on the magic-moment screen. Backend already
   publishes handoff_assessment_ready when enrich_escalation_async commits;
   wire the frontend listener so the senior sees the assessment populate
   without a manual reopen. New event type + onAssessmentReady handler on
   streamEscalations; AssistantChatPage opens a scoped SSE subscription
   whenever it tracks a handoff missing its assessment, refetches on match,
   and replaces magicHandoff / overlayHandoff in place. Closes the loop on
   the async-assessment commit e8ba74e.

2. Suggested-step chips below the chat input. Locked design from the plan
   (Codex correction). Chip strip renders above the composer post-claim
   when ai_assessment_data.suggested_steps[] is non-empty. Click prefills
   the input and focuses; first send or explicit X hides for the session.

3. Unread 6px dot on EscalationQueue cards. localStorage-persisted seen
   set (rf-escalation-seen, capped 200). Dot top-right when not seen.
   Cleared on open (card click) or claim (Pick Up) — NOT on hover, per
   Codex correction. Pick Up stops propagation so it doesn't double-fire.

4. Race-condition toast on claim conflict. The /claim endpoint previously
   silently overwrote claimed_by — both seniors thought they owned the
   session. New HandoffAlreadyClaimedError carries the winner's id/name/
   timestamp; claim_session rejects different-user re-claims (same-user is
   idempotent for double-click safety); endpoint returns 409 with
   structured detail. AssistantChatPage.handleStartHere extracts and
   surfaces "Already claimed by {name} {time_ago}." via toast, drops
   ?pickup=true, dismisses magic-moment so the loser flows back to queue.

Tests: 2 new unit tests in test_handoff_manager.py (conflict raises,
same-user idempotent). Full handoff + escalation suite (34 tests) green.
Frontend tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 01:59:28 -04:00
8914391336 fix(assistant-chat): kill stale task-lane flash on new-session entry
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m4s
CI / backend (pull_request) Successful in 10m9s
CI / e2e (pull_request) Successful in 10m8s
Two compounding bugs caused the previous session's questions/actions
to render briefly when entering a new chat — visible as "the new
session instantly pops with old session task-lane data" the user
reported.

The race
- AssistantChatPage's activeQuestions / activeActions / showTaskLane
  useState initializers synchronously read sessionStorage's
  rf-tasklane-meta. They restore the persisted task-lane state if its
  saved chatId matches the freshly-resolved activeChatId.
- On dashboard prefill flow, the page mounts on /pilot with
  location.state.prefill set; activeChatId initializes from
  sessionStorage's rf-active-chat-id (the previous session). The
  previous session's task-lane meta matches that chatId — so the
  initializer restores it. First paint shows old questions/actions.
  sendPrefill's resetSessionDerivedState fires later from a useEffect,
  but only after the flash.
- Same pattern hits the senior-pickup flow: ?pickup=true means we're
  about to render the magic-moment screen and discard whatever chat
  the senior was previously on, but the underlying chat surface still
  initializes with their old task-lane meta.

The amplifier
- resetSessionDerivedState wiped the in-memory state but never
  removed sessionStorage's rf-tasklane-meta. Any remount or reload
  before the next persistence-effect write could re-hydrate the
  cleared state from the still-stale sessionStorage entry.

Fixes
- Initializer guard: when location.state.prefill is set OR
  ?pickup=true is in the URL, skip the sessionStorage restore
  entirely. Kills the first-paint flash for both entry paths.
- Eager wipe: resetSessionDerivedState now also calls
  sessionStorage.removeItem('rf-tasklane-meta'). The persistence
  effect re-saves on the next state change anyway, so the only
  window where sessionStorage is empty is the exact window where
  stale-tag leakage was happening.

tsc -b clean. No backend changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 01:26:29 -04:00
e8ba74ed6d feat(escalations): distinguishable notifications, async AI, richer sidebar
All checks were successful
Mirror to GitHub / mirror (push) Successful in 6m5s
CI / frontend (pull_request) Successful in 11m59s
CI / e2e (pull_request) Successful in 10m7s
CI / backend (pull_request) Successful in 16m22s
Three improvements driven by live wedge testing.

1) Notification title now includes a problem snippet and PSA ticket
   suffix when present:
     "Escalation from Jane · #12345: Outlook is failing to sync email…"
   Replaces the prior "Session escalated by Jane" copy that made every
   escalation from the same junior look identical in the bell panel.
   Snippet is trimmed to 70 chars with ellipsis. handoff_manager now
   passes psa_ticket_id through in the notify() payload so this works
   for both /escalate and /handoff entry points.

2) AI enrichment (assessment + enhanced escalation_package) moved to
   a FastAPI BackgroundTask. The escalating engineer no longer waits
   on 15-25s of Sonnet latency — handoff creation returns as soon as
   snapshot, status flip, dual-write, documentation, PSA push, and
   notify() are committed. enrich_escalation_async opens its own DB
   session, runs both AI calls, updates handoff.ai_assessment +
   session.escalation_package, commits, and publishes a new
   `handoff_assessment_ready` event on the escalation bus. Frontend
   doesn't yet listen for that event — the magic-moment screen still
   shows a placeholder ("AI assessment is still generating. Reopen
   this view in a few seconds…") which is honest about the state.
   Live polling / auto-refresh on the bus event is the natural next
   step.

3) ChatSidebar entries now surface the problem summary as a secondary
   line and tag PSA-linked sessions with a monospace #ticket badge plus
   an "Escalated" pill on in-transit sessions. ChatListItem grew
   problem_summary, psa_ticket_id, and status fields; loadChats
   populates them from listSessions. The user couldn't tell their own
   sessions apart in the sidebar because they all rendered as "New
   Chat" with no distinguishing detail — this fixes that for any
   session, escalated or not.

Test plan
- Backend full suite: 1103 passed in 255.85s with -n auto.
- Frontend tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 00:34:32 -04:00
aca915b047 fix(escalations): bump assessment timeout, surface picked-up sessions in sidebar
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 5m6s
CI / backend (pull_request) Successful in 9m45s
CI / e2e (pull_request) Successful in 10m20s
Two field-reported issues from live wedge testing.

ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS bumped 5s → 15s. The 5s bound
fired too aggressively against the Sonnet diagnostic assessment prompt;
~4-8s is typical but tail latency hits 12-14s. The fallback "Assessment
unavailable — model didn't respond in time" placeholder was showing on
the magic-moment screen for two consecutive escalations, which kills
the demo. 15s keeps the click-path bounded but lets the typical case
return real content. Real fix is async generation (kick off, persist
when done, surface "still computing" with refresh) — captured as a
follow-up; bumping the bound is the right call for the wedge demo.

list_sessions now matches escalated_to_id == current_user.id alongside
the existing user_id and escalation_package.picked_up_by clauses. The
unified HandoffManager.claim_session sets escalated_to_id but doesn't
write the legacy picked_up_by JSONB key, so picked-up sessions never
showed in the senior's chat list — the senior would land on the
session detail (active chat) but the sidebar showed only their other
unrelated sessions. User reported this as "4 different versions of the
session in the chat history section" — they were actually 4 unrelated
empty sessions the senior owned, plus the picked-up session was just
invisible. Backend tests still 94/94.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 00:04:08 -04:00
e910bcc67d fix(escalations): wire magic-moment + claim into AssistantChatPage
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 5m0s
CI / backend (pull_request) Successful in 10m2s
CI / e2e (pull_request) Successful in 10m39s
The /pilot/:id route renders AssistantChatPage, not FlowPilotSessionPage
(the latter is dead code with no active route). The earlier magic-moment
integration sat in the wrong file, so clicking Pick Up from the
dashboard navigated to /pilot/:id?pickup=true and AssistantChatPage
just loaded the chat surface with no claim — the senior never saw the
magic-moment screen and the handoff stayed unclaimed (status escalated,
permanently in the queue).

Adds full pickup awareness to AssistantChatPage:

- ?pickup=true on entry triggers a handoff fetch via
  handoffsApi.listHandoffs (account-scoped, no claim required).
  magicState transitions loading → visible (handoff found) or
  loading → dismissed (no handoff or fetch failed). The dismiss path
  also strips ?pickup=true from the URL so a refresh doesn't re-enter
  loading state.
- The existing selectChat-from-URL effect is gated on magicState — it
  skips while we're loading or showing the magic-moment so the chat
  surface doesn't race the claim flow. After claim it re-fires and
  populates messages from conversation_messages because the senior is
  now escalated_to_id and GET succeeds.
- Magic-moment renders as full-page take-over (sidebar hidden) until
  Start here. handleStartHere calls handoffsApi.claimHandoff, drops
  ?pickup=true, and dismisses — the regular chat then loads.
- Toolbar Context button (visible when magicHandoff is in memory)
  re-opens the screen as a dismissible overlay. Lazy-fetches the
  handoff when needed.

Verified tsc -b clean and Vite HMR picked the file up without errors.
The wire-level integration was already verified in earlier commits:
listHandoffs returns the unclaimed handoff for a senior pre-claim,
claimHandoff flips status escalated → active and sets escalated_to_id.

Note: the prior FlowPilotSessionPage magic-moment integration is now
in dead code (file is unreferenced from router). Left in place for
this commit; will come out in a follow-up cleanup once we're confident
the AssistantChatPage path is solid in production.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 23:23:00 -04:00
5085bb47c2 docs(ai): handoff state after /escalate unification through HandoffManager
All checks were successful
Mirror to GitHub / mirror (push) Successful in 6s
CI / backend (pull_request) Successful in 10m3s
CI / frontend (pull_request) Successful in 5m34s
CI / e2e (pull_request) Successful in 9m26s
Records 029680a — every escalation now funnels through HandoffManager
regardless of which URL it entered through, so /escalate from
EscalateModal produces the full set of artifacts (handoff row,
AppNotification, SSE event, Slack/Teams via notify, per-user emails,
documentation, PSA push) and the bell-icon notification opens the
magic-moment screen end-to-end. Notes the legacy SessionBriefing branch
+ flowpilot_engine.escalate_session as orphaned, scheduled for removal
after pilots have run a couple of weeks on the unified path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 22:29:40 -04:00
029680ab2d feat(escalations): unify /escalate through HandoffManager
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 5m8s
CI / backend (pull_request) Successful in 10m13s
CI / e2e (pull_request) Successful in 10m47s
Replaces the legacy flowpilot_engine.escalate_session orchestration with
a single canonical path through HandoffManager. Every escalation now
creates a SessionHandoff row, fans out via the SSE bus, persists
AppNotification rows for the bell icon, dispatches to external channels
(Slack/Teams) via notify(), and emails per-user — regardless of whether
the call entered through /escalate (legacy URL) or /handoff (new URL).
The senior-pickup magic-moment screen now works end-to-end from the
EscalateModal bell-icon path the user just tested.

Backend
- HandoffCreateRequest gains optional target_user_id (the equivalent of
  the legacy escalated_to_id field). Self-targeting rejected.
- HandoffManager.create_handoff handles intent='escalate' end-to-end:
  sets escalation_reason + escalated_to_id, builds the legacy enhanced
  AI escalation_package (Sonnet, lazy-imported from flowpilot_engine,
  graceful fallback on failure), and merges handoff metadata into it.
  Eager-loads session.steps and session.user via selectinload — required
  by both the enhanced-package builder and notify() to avoid
  MissingGreenlet on async lazy access.
- HandoffManager.finalize_escalation generates SessionDocumentation,
  pushes documentation to PSA, and runs notify() — pre-commit so the
  AppNotification rows persist atomically with the handoff.
- HandoffManager.dispatch_escalation_notifications keeps only the
  fire-and-forget IO (bus publish, per-user emails) — runs post-commit.
  Pulls engineer name via a separate User query rather than relying on
  session.user lazy access.
- /handoff endpoint passes target_user_id through and calls
  finalize_escalation pre-commit.
- /escalate endpoint is now a thin shim: owner-only session lookup,
  HandoffManager.create_handoff(intent='escalate'), finalize_escalation,
  commit, dispatch_escalation_notifications, return SessionCloseResponse
  built from documentation + psa_result. flowpilot_engine.escalate_session
  is no longer called by any endpoint.
- pickup_session accepts both 'requesting_escalation' (legacy in-flight
  sessions) and 'escalated' (new canonical) so the migration is seamless
  for sessions already in the queue.
- Escalation queue list and sidebar count now match either status.

Frontend
- useFlowPilotSession optimistic update flips status to 'escalated'
  instead of 'requesting_escalation' so the page state matches the
  unified backend response.

Verified end-to-end live: a fresh /escalate call from the junior produces
status='escalated', a SessionHandoff row, a SessionDocumentation, PSA
push attempted (no_psa for this test session), AND a bell-icon
AppNotification for the team admin with link
/pilot/{session_id}?pickup=true. Backend test suite: 1103 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 22:27:26 -04:00
2a2329ad19 docs(ai): handoff state after bell-icon fix; record draft PR #155
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
CI / frontend (pull_request) Successful in 5m41s
CI / backend (pull_request) Successful in 9m55s
CI / e2e (pull_request) Successful in 9m13s
Updates the handoff trio after the legacy notification flow fix and
the branch push. PR #155 is open against main as draft. Resume point
is now visual QA via /qa, then deferred follow-ups (chat-input
suggested-step chips, snapshot expansion). Logs the open question
about whether EscalateModal should switch to /handoff.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 21:33:44 -04:00
641853a002 fix(escalations): bell-icon notification opens the pickup flow
Some checks failed
Mirror to GitHub / mirror (push) Successful in 4s
CI / backend (pull_request) Failing after 1m17s
CI / frontend (pull_request) Successful in 4m53s
CI / e2e (pull_request) Successful in 9m18s
Two backend changes that unbreak the senior-pickup path from the
notification panel:

1. notification_service: session.escalated link template now ends with
   ?pickup=true so the senior lands in the handoff/pickup flow on
   click. Without it, navigation hit /pilot/:id directly, which then
   404'd on the GET because the senior isn't yet escalated_to_id —
   the user perceives this as the bell-icon "just clearing the
   notification".

2. ai_sessions GET access: any account member can now read an escalated
   session's detail when status is requesting_escalation or escalated.
   The owner-only guard was overly restrictive for explicitly-shared
   in-transit states. Tenant boundary is enforced by RLS on the
   underlying query, so account-scope is the right ceiling here. After
   pickup, the existing handler/escalated_to_id checks still apply.

Verified live: re-login as the senior engineer and GET the active
escalated session — now returns 200 with full detail. Focused test
subset plus tests/test_sessions.py and tests/test_session_sharing.py
→ 94 passed in 43.26s, no regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 21:29:47 -04:00
c194ba4a43 docs(ai): handoff state after magic-moment screen lands
Marks the magic-moment handoff-context screen as shipped, points the
next session at visual QA + push + draft PR, and captures the deferred
follow-ups (suggested-step chips, snapshot expansion, toolbar button
on revisits, owner analytics, Playwright e2e).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 21:08:07 -04:00
8e9d22e0e0 feat(escalations): magic-moment handoff-context screen on pickup
Adds the dedicated 4-section handoff-context view that renders BEFORE
the FlowPilot session for senior techs picking up an escalated
session, then dissolves on "Start here". This is the wedge's
demonstrable magic moment — what the GTM Loom records.

- HandoffContextScreen.tsx: pure presentational, takes a HandoffResponse
  plus onStartHere / onDismiss callbacks. Sections: header
  (problem summary, domain, step count, escalated-time, priority badge),
  "What's been tried" (engineer notes + step-count affordance), "AI
  assessment" (likely_cause / suggested_steps / confidence badge), Start
  here CTA. Confidence badge accepts both numeric (0..1) and string
  ("low"/"medium"/"high") shapes — backend currently emits the latter.
  Renders an explicit "assessment unavailable" branch when
  ai_assessment_data is null (the 5s timeout from 9bdd995 fired).
  Honors prefers-reduced-motion (animate-fade-in vs animate-slide-up).
  ARIA dialog + focus on the primary CTA. Esc dismisses when used as a
  re-openable overlay; pre-claim, Start here is the only exit.

- FlowPilotSessionPage.tsx: on /pilot/:id?pickup=true, fetch the
  handoff list via handoffsApi.listHandoffs (account-scoped via RLS,
  no claim required) and find the latest unclaimed escalate handoff.
  If found, render the magic-moment screen and skip the regular
  loadSession (the senior isn't yet escalated_to_id, so GET would
  404). Start here calls claimHandoff, drops the pickup query param,
  dismisses the screen — the existing loadSession effect then fires
  because the senior is now escalated_to_id. A "Context" toolbar
  button on active sessions re-opens the screen as a dismissible
  overlay (visible only when the senior arrived via the magic-moment
  flow this session — handoff lookup on demand).

Verified end-to-end against the running dev stack: listHandoffs
returns the unclaimed handoff with full payload; claim flips session
status from escalated → active; subsequent GET succeeds. tsc -b clean.

Defers (TODO followups): suggested-step chips below the chat input
that prefill on click (requires threading through to
FlowPilotMessageBar); snapshot expansion to include the recent
diagnostic steps pre-claim; toolbar Context button on sessions where
the senior didn't arrive via magic-moment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 21:06:14 -04:00
f65b65790c docs(ai): handoff state after frontend SSE slice lands
Marks the SSE subscription as shipped, points the next-session resume
target at the magic-moment handoff-context screen, and logs the live
end-to-end verification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 20:57:20 -04:00
b8627f4180 feat(escalations): subscribe EscalationQueue to live SSE arrivals
Adds the frontend live-arrival slice on top of the test-stabilized SSE
backend. Senior techs now see a junior's escalation slide into the
queue without refresh.

- streamEscalations(handlers, signal) in aiSessions.ts: fetch-based
  ReadableStream parser (native EventSource cannot send auth headers).
  Handles SSE frames, partial frames across chunks, : keepalive
  heartbeats. Dispatches ready and handoff_created.
- HandoffCreatedEvent + EscalationStreamHandlers types mirror the bus
  payload published by HandoffManager.dispatch_escalation_notifications.
- EscalationQueue.tsx: AbortController-managed subscription with
  exponential-backoff reconnect (1s → 30s cap, attempt counter resets
  on ready). On handoff_created, refetch and diff against previous IDs
  via sessionsRef; new arrivals prepended (newest-first) above
  established cards (oldest-first preserved). Slide-in tag held for
  800ms so the locked 200ms animation completes. Tab-title flash
  prefixes (N) while document.hidden, restores on focus / unmount.
  prefers-reduced-motion swaps slide-in for fade-in. ARIA region +
  aria-live=polite + aria-label on heading. Pick Up bumped to py-2.5
  to clear the 44px touch floor.

Verified end-to-end against the running dev stack: subscriber received
the ready frame on connect; after posting a handoff via the API, the
subscriber received the handoff_created frame with the expected
payload — wire format matches the parser. Backend regression: focused
subset still 32 passed in 18.91s. Frontend tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 20:57:15 -04:00
02d5c6c08c docs(ai): refresh handoff state for next-session pickup under 200k context
Default Claude Code model is being switched from Opus 4.7 1M-context to
Opus 4.7 (200k). Tighten the per-session pickup docs so they're
self-sufficient under the smaller window:

- CURRENT_TASK now reflects the post-Codex state: 8 commits on the
  branch (5 feat + WIP SSE + 2 Codex test/latency fixes + 1 doc
  refresh), 32/32 backend tests with -n auto, frontend tsc -b clean.
  Remaining work re-scoped: the SSE backend half is feature-complete
  and tested, so what's left is the FRONTEND SSE subscription in
  EscalationQueue.tsx, then the magic-moment handoff-context screen,
  then push + draft PR.
- Session log gets a Claude Code entry covering today's planning →
  build → pause-for-Codex arc, the design decisions locked into the
  doc and code, the two TODOs added (peer-tech escalation, mobile
  responsive), and the model-switch context for the next session.
- HANDOFF.md needs no change — Codex's update in 9bdd995 already
  describes the resume point and watch-outs cleanly.

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 20:13:40 -04:00
9bdd9959a8 fix(handoff): bound escalation assessment latency
Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 20:03:14 -04:00
fff8338bf2 docs(ai): track escalation assessment latency follow-up
Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 19:55:31 -04:00
bc15952857 fix(tests): stabilize escalation SSE backend tests
Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 19:47:43 -04:00
ba46fc5644 docs(ai): pause Escalation Mode build mid-SSE for Codex review
Update HANDOFF to reflect:
- Build paused after the WIP SSE commit (87bd0b7)
- What Codex should look at on the SSE bus + endpoint + dispatch wiring
- Resume point post-review: re-run tests with -n auto, then frontend
  SSE subscription, then magic-moment screen
- Test-suite watch-out: per-test DROP SCHEMA fixture means concurrent
  pytest runs on the same DB collide; always one-suite-at-a-time or
  -n auto with conftest's per-worker DB isolation

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:29:16 -04:00
87bd0b7c56 WIP: SSE pub/sub for live escalation arrivals (paused for Codex review)
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.

What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
  pub/sub keyed by account_id. asyncio.Queue per subscriber with
  64-event maxsize and drop-on-full semantics. Designed to be swappable
  for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
  /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
  require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
  bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
  now publishes a `handoff_created` event to the bus BEFORE the email
  fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
  standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
  (publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
  blocked, ready event handshake).

[gstack-context]
Decisions:
  - SSE over WebSocket (one-way, browser EventSource semantics, fewer
    moving parts behind Railway proxy)
  - In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
  - Drop-on-full subscriber queue rather than back-pressure publishers
  - Bus publish ahead of email send, both wrapped in try/except so
    neither can break handoff creation
  - Frontend will be a fetch-based ReadableStream reader matching the
    existing streamDocumentation pattern, not native EventSource
    (custom-header auth)
Remaining (post-Codex):
  - Frontend SSE subscription in EscalationQueue.tsx (slide-in,
    reconnect, tab-title flash, prefers-reduced-motion)
  - Magic-moment handoff-context screen
  - Re-run the full backend test suite to verify the SSE +
    dispatcher integration tests (bus units already green standalone)
Tried:
  - Running the full test suite repeatedly without xdist; the per-test
    DROP SCHEMA + recreate fixture made wall-clock prohibitive when
    multiple stale runs collided on the same Postgres test schema.
    Resolution: -n auto next time.
[/gstack-context]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:29:07 -04:00
a283d0d3fd docs(ai): refresh handoff state mid-flight on Escalation Mode build
Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:

- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
  this branch, what remains (WebSocket push, magic-moment screen, analytics
  page, e2e), and the two-metric framing readers MUST internalize before
  quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
  notification dual-path); includes suggested first slice + watch-outs
  (no user_id on ai_session_step, denormalized account_id, peer-escalation
  still gated to session owner)
- Both files reference the design doc and the kill-switch criterion

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 16:38:14 -04:00
9f0bfd44f9 feat(escalations): mount time-to-first-action stat-card on /escalations
Surfaces the new GET /analytics/flowpilot/escalations endpoint as a card
above the EscalationQueue list. Closes the loop from yesterday's metric
endpoint commit — seniors and owners see the wedge stat the moment they
open the queue, which is the daily-reps version of the GTM ROI story.

Pieces:
- EscalationMetrics TS interface mirroring the backend Pydantic model
  (incl. metric_definition disclaimer field)
- flowpilotAnalyticsApi.getEscalationMetrics(period) client method
- EscalationMetricCard component:
    * loading skeleton, error state, zero-data empty state
    * avg + median + n_with_action/n_claimed conversion rate
    * humanized seconds → "Ns" / "N.N min" formatting
    * inline disclaimer reminding callers this is in-product time-to-
      first-action only, NOT the savings claim — pair with manual
      baseline (per /codex review's two-metric correction)
- Wired into EscalationQueuePage above EscalationQueue

DS-aligned: card-flat, accent-dim usage held to interactive elements,
text-muted-foreground for secondary copy, font-heading on the headline
number, explicit transition properties (no `transition: all`). Respects
prefers-reduced-motion implicitly (only animation is the loading pulse,
which Tailwind's animate-pulse already gates).

tsc -b clean. No new tests in this commit — component is a thin
state-machine over an axios call; integration coverage comes from the
existing backend tests + the e2e Playwright work in the plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 16:00:34 -04:00
07d0db9579 feat(handoff): email engineer-or-admin teammates on escalation
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.

HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
  (excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
  via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
  creation is authoritative; notification is advisory. This is the
  graceful-degradation regression both eng + codex reviews flagged as
  critical (handoff must succeed even if SMTP is down).

Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
  handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.

Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
  excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
  service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST

Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.

20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:58:05 -04:00
7a5b853b3b feat(api): role-gate handoff claim to engineer-or-admin
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.

Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate

Existing handoff create + queue tests unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:46:59 -04:00
52f6d0308f feat(analytics): add escalation time-to-first-action metric endpoint
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}

Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.

The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.

Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
  aggregates avg/median/p95 in Python (~1k rows/account/month is well within
  budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
  n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases

Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.

First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:25:46 -04:00
d51e95cdfa docs(plans): add escalation-mode wedge design + test plan
Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI
specs, and embedded review report for the Escalation Mode wedge — output of
/office-hours, /plan-eng-review, /plan-design-review, and /codex review.

Codex review surfaced two corrections we applied:
- two-metric framing (manual baseline vs in-product time-to-first-action)
- claim role gate moved in-scope (was deferred TODO)

TODO updates: peer-tech escalation + claim role gate captured (the latter then
moved in-scope by the codex pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:18:46 -04:00
c0ed6d9840 Merge pull request 'docs(ai): refresh handoff state after PR #153 merge' (#154) from chore/post-153-handoff into main
All checks were successful
CI / frontend (push) Successful in 5m37s
Mirror to GitHub / mirror (push) Successful in 14s
CI / backend (push) Successful in 10m48s
CI / e2e (push) Successful in 11m0s
Reviewed-on: #154
2026-04-26 05:33:31 +00:00
8f818a7c71 docs(ai): refresh handoff state after PR #153 merge
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m49s
CI / backend (pull_request) Successful in 11m5s
CI / e2e (pull_request) Successful in 11m36s
- CURRENT_TASK rolls forward — PR #153 closed out, no active task,
  with recommended next moves (promote e2e gate to required, pick
  from TODO).
- HANDOFF rewritten — new home position is `main`; documents the
  e2e job's stub ANTHROPIC_API_KEY convention so future
  AI-touching e2e tests know what to expect.
- SESSION_LOG entry extended with the CI env-var diagnosis, the
  fix, the merge, and pointers to the natural next pickups.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 01:14:49 -04:00
68fcdc6122 Merge PR #153: fix(chat): sync currentChatRef when prefill creates a new chat session
All checks were successful
CI / frontend (push) Successful in 5m57s
Mirror to GitHub / mirror (push) Successful in 13s
CI / backend (push) Successful in 10m28s
CI / e2e (push) Successful in 12m0s
Fixes a silent-drop bug where the dashboard prefill flow created a new chat session but didn't update the in-flight guard ref, so subsequent task-lane submissions had their AI follow-up responses discarded.

Includes a Playwright regression test that drives the prefill flow and stubs /ai-sessions/*/chat to verify the second AI turn renders. Also adds a stub ANTHROPIC_API_KEY to the e2e CI job so AI-gated endpoints clear their _require_ai_enabled() check (the chat call itself is intercepted in the browser, so no real Anthropic traffic).
2026-04-26 05:05:54 +00:00
11fe32f4c6 fix(ci): set stub ANTHROPIC_API_KEY for e2e job so AI-gated endpoints respond
All checks were successful
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m39s
CI / backend (pull_request) Successful in 10m24s
CI / e2e (pull_request) Successful in 12m14s
POST /api/v1/ai-sessions and friends call _require_ai_enabled(), which
returns 503 when no provider key is set. The new prefill-handoff
regression test (e2e/assistant-chat-prefill.spec.ts) drives the
dashboard prefill flow, which has to create a chat session before its
page.route stub on /chat can fire — so without a key, session
creation 503s and the test never sees the task lane.

The Playwright stub intercepts /chat in the browser, so the backend
never actually contacts Anthropic — but the AI-enabled gate still
needs to pass. A stub value is enough.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:51:39 -04:00
43eed720d9 docs(ai): close out PR #150, set PR #153 as active task
Some checks failed
Mirror to GitHub / mirror (push) Successful in 13s
CI / frontend (pull_request) Successful in 5m50s
CI / e2e (pull_request) Failing after 6m50s
CI / backend (pull_request) Successful in 10m40s
- CURRENT_TASK.md rolled forward — the CI-recovery task is complete
  (PR #150 merged as 87bb20b; backend gate is in required checks).
  Active task is now landing PR #153.
- HANDOFF.md rewritten — new resume point is watching CI on the
  rebased SHA 1559feb and merging when all three checks are green.
- SESSION_LOG.md gains a 2026-04-26 entry covering the prefill bug
  diagnosis, fix, regression test, and the rebase off post-#150 main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:30:50 -04:00
1559feb759 docs(ai): track currentChatRef silent-swallow follow-up in TODO
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m43s
CI / e2e (pull_request) Failing after 6m40s
CI / backend (pull_request) Has been cancelled
The guard pattern that masked the prefill-ref bug fixed in PR #153 is
applied across handleSend, handleTaskSubmit, selectChat, refreshFacts,
refreshActiveFix, and refreshPreview. Worth either logging the
mismatch path or distinguishing expected-stale from unexpected-stale
so the next instance of this class of bug surfaces instead of hiding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:24:25 -04:00
b56da2facd fix(chat): sync currentChatRef when prefill creates a new chat session
The dashboard prefill flow in AssistantChatPage set activeChatId after
creating a new session but never updated currentChatRef.current. Every
later handleSend / handleTaskSubmit then tripped the
`currentChatRef.current !== sentForChatId` guard that was supposed to
discard responses for stale chats — and silently dropped the AI's
follow-up. The user saw their submitted message but no assistant
reply, no toast, no task-lane update.

Mirrors what handleNewChat and handleResumeNew already do. Adds an
e2e regression test that drives the dashboard prefill, submits a
partial task-lane response, and asserts the second AI turn renders.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:24:02 -04:00
87bb20b8f0 Merge PR #150: fix(ci): consolidated CI recovery — backend green, xdist parallelization, e2e selector + decoupling
All checks were successful
CI / frontend (push) Successful in 5m42s
Mirror to GitHub / mirror (push) Successful in 13s
CI / backend (push) Successful in 10m21s
CI / e2e (push) Successful in 11m5s
2026-04-25 21:57:26 +00:00
1e3a6cfa01 fix(e2e): harden card selectors for session resume
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m43s
CI / backend (pull_request) Successful in 10m21s
CI / e2e (pull_request) Successful in 11m23s
Co-Authored-By: Codex <noreply@openai.com>
2026-04-25 16:42:33 -04:00
ede6eebf9a docs(ai): note e2e decoupling commit (261814a) in HANDOFF
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m43s
CI / e2e (pull_request) Failing after 9m30s
CI / backend (pull_request) Successful in 10m18s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 16:12:19 -04:00
261814ae65 perf(ci): decouple e2e from frontend — build frontend inline in e2e job
Some checks failed
Mirror to GitHub / mirror (push) Successful in 14s
CI / frontend (pull_request) Successful in 5m44s
CI / e2e (pull_request) Failing after 7m42s
CI / backend (pull_request) Successful in 10m28s
Before: e2e \`needs: [frontend]\` waited for the frontend job to upload
a build artifact, then downloaded it. With multiple runners this means
the third runner sat idle for ~6 min while frontend ran, then started
e2e — total wall-clock max(backend, frontend+e2e) ≈ 11 min.

After: e2e builds its own frontend (npm ci + npm run build are already
in the job; just dropped the artifact download step and added the
build). e2e starts immediately on a free runner. Adds ~1-2 min to the
e2e job duration but removes ~5 min of waiting and eliminates the
cross-job artifact mechanism entirely.

Side benefit: no more \`actions/upload-artifact\` v3/v4 GHES headaches
on the cross-job handoff. The \`if: always()\` upload of the
playwright-report at the end of e2e is kept (failure report retrieval
is still useful), but it's a leaf-output, not a dependency.

Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the
3-runner setup, down from ~11 min.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:59:00 -04:00
6656ebdead docs(ai): reflect PR consolidation — #151/#152 merged into #150
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:55:08 -04:00
69f2a37591 fix(e2e): update 5 selectors that drifted with FlowPilot/PSA UI changes
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m52s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
Mechanical drift between the e2e selectors and the current UI surfaced
on the first CI run after PR #149 unblocked the artifact upload step.
Five tests, three categories of drift:

1. **Page heading renames** (navigation.spec.ts)
   - `Sessions` → `Session History` on /sessions
   - `Account Settings` → `Account Management` on /account

2. **Route rename** (command-palette.spec.ts:74)
   - The "Troubleshoot with FlowPilot" command palette option now lands
     on /pilot (Phase 1 of the FlowPilot migration renamed /assistant).
     /assistant still 301-redirects, so the assertion accepts either.

3. **Feature moved to /sessions** (history.spec.ts, resume.spec.ts)
   - Default tab on /sessions is "AI Sessions"; flow-session filtering
     and the Resume button moved behind the "Flow Sessions" tab. Both
     tests now click that tab before asserting.
   - resume.spec.ts no longer starts at /trees (Resume buttons aren't
     rendered there anymore — the flow lives on /sessions). Destination
     URL (/trees/:id/navigate) is unchanged.

No product-code changes — these are pure test updates against the
shipped UI. Run the suite locally with
`cd frontend && npm run test:e2e` once a fresh build is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:53:57 -04:00
7f714363dd perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).

How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
  'gw0', 'gw1', …). When set, derives a per-worker DB URL like
  `…/resolutionflow_test_gw0`. The base DB stays for serial / master
  runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
  connects to the postgres maintenance DB, and `CREATE DATABASE`s the
  worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
  contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
  worker's isolated DB, no cross-worker race.

CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).

Adds `pytest-xdist==3.6.1` to requirements-dev.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:53:47 -04:00
1bd43abb8f fix(ci): drop postgres host port mapping (multi-runner port collision)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 6m44s
CI / e2e (pull_request) Failing after 8m43s
CI / backend (pull_request) Has been cancelled
With 3 Gitea Actions runners on the same homelab box, two simultaneous
backend (or backend + e2e) jobs both try to bind 0.0.0.0:5432 for their
postgres service containers. The second fails with:

  failed to set up container networking: ... Bind for 0.0.0.0:5432
  failed: port is already allocated

The host-port mapping isn't actually needed — the workflow uses
\`DATABASE_URL: postgresql+asyncpg://...@postgres:5432/...\` (hostname
\`postgres\` is the service container's docker-network DNS name).
The tests run inside the act container which is on the same docker
network, so they reach postgres without going through the host.

Removing \`ports: 5432:5432\` from both backend and e2e job service
definitions lets multiple postgres services run in parallel on
different docker networks without colliding on the host.

Surfaced when PR #150 ran in parallel with another job after the
multi-runner setup. Backend instant-failed in 2s on the docker run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:28:17 -04:00
c203b70ef9 docs(ai): queue data-testid hardening + reflect PR #152 + 3-runner setup
Some checks failed
CI / backend (pull_request) Failing after 2s
Mirror to GitHub / mirror (push) Successful in 15s
CI / e2e (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
TODO.md: Promote pytest-xdist to  (PR #151 carries it). Adds three new
backlog items:
- data-testid hardening for e2e-critical interactive elements (sparked
  by PR #152's selector drift work)
- per-test transactional rollback (next big speedup if needed)
- pytest-testmon for PR-time test selection

HANDOFF.md: Three open PRs now (#150, #151, #152), all independent.
Three Gitea runner agents now registered, so jobs run in parallel.
Combined with #151's xdist, the prior 1h 14m wall-clock should drop
to ~6-10 min. Updated merge order: #152 first (smallest), #150 next,
#151 last. After all three land, enable CI / backend then CI / e2e
as required status checks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:26:21 -04:00
f27e3b44b0 docs(ai): SESSION_LOG entry for the parallelization session
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Successful in 32m33s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m58s
(Was meant to land in fe632c9; the multi-line edit failed silently
because Codex's earlier entry shifted the surrounding context.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:15:41 -04:00
fe632c9194 docs(ai): handoff after CI parallelization + final test fix
Some checks failed
Mirror to GitHub / mirror (push) Has been cancelled
CI / backend (pull_request) Successful in 30m26s
CI / frontend (pull_request) Successful in 5m46s
CI / e2e (pull_request) Failing after 5m3s
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect:
- PR #150 now contains the AI-provider test mock + caching + maxfail.
  Backend CI should be fully green for the first time in months.
- PR #151 stacked on #150: pytest-xdist with per-worker DBs. Local
  verification: 22m 27s → 4m 28s (5× speedup), 1076 passed both runs.
- DoD is now: merge #150, then #151, then add CI / backend
  (pull_request) to required status checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:15:07 -04:00
e976fb4e87 fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Successful in 31m8s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m57s
Three changes that get PR #150 to a green CI gate:

1. **test_record_decision_persists_and_bumps_state_version** — the
   `decision: draft_template` path calls `_extract_template_parameters`
   (TemplateExtractionService → AI provider). CI doesn't set
   ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
   `RuntimeError: No AI provider configured` and returned 500. The test
   isn't exercising the AI integration — patched the extractor with an
   AsyncMock returning a minimal valid `{templated_body, parameters}`
   dict. Verified locally: the test now passes.

2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
   the hash of requirements*.txt / package-lock.json with a runner-os
   restore-key fallback. Saves ~30-60s per run on cache hit.

3. **Pytest invocation tightened**:
   - Dropped `--cov-report=term-missing` — the custom "Display coverage
     summary" step below parses coverage.json and prints the same
     module list more concisely. Term-missing dumps every uncovered
     line which adds ~5-10s of stdout.
   - Added `--maxfail=10` so a structural breakage (fixture explosion,
     DB unreachable) bails after 10 errors instead of running the full
     25-min suite. Tunable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:01:05 -04:00
0aefaa78eb docs(ai): queue pytest-xdist parallelization in TODO.md
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Has been cancelled
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
Capture the backend pytest parallelization work so it survives session
end. Backend suite is currently ~22 min wall-clock for 1076 tests;
xdist with one-DB-per-worker should land in the 3-6 min range on the
homelab Gitea Actions runner.

Also queues two backlog items:
- Frontend lint warnings (23 react-hooks/exhaustive-deps after PR #149)
- Periodic audit of the ResourceWarning filterwarnings added by Codex

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 11:35:38 -04:00
49f88569da wip(handoff): restore backend suite to green
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Failing after 27m35s
CI / frontend (pull_request) Successful in 2m46s
CI / e2e (pull_request) Failing after 4m9s
Co-Authored-By: Codex <noreply@openai.com>
2026-04-25 06:13:23 -04:00
208ec996d5 docs(ai): handoff for Codex — CI recovery + 54 real backend failures
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m15s
CI / frontend (pull_request) Successful in 2m55s
CI / e2e (pull_request) Failing after 4m23s
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md so the next
session has accurate resume state. Summary of where things are:

- PR #141 (PSA tickets), PR #147 (FlowPilot Phase 1-9), PR #148 (CI
  fixes part 1), PR #149 (CI fixes part 2) all merged to main in this
  session.
- Branch protection enabled on main: PR-only, CI / frontend required.
- PR #150 (this branch) is the last CI-config PR — adds
  DATABASE_TEST_URL to the workflow and pins upload-artifact to v3.
- Next session: watch #150's CI, merge if green, add CI / backend to
  required checks, then start on the 54 real backend test failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:36:54 -04:00
8f7df2c0ef fix(ci): set DATABASE_TEST_URL + downgrade upload-artifact to v3 (Gitea Actions)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m29s
CI / frontend (pull_request) Successful in 3m11s
CI / e2e (pull_request) Failing after 4m56s
Two CI-config issues blocking the gate from going green:

1. **Backend tests connect to localhost instead of postgres service.**
   conftest.py reads DATABASE_TEST_URL only — DATABASE_URL is intentionally
   not consulted (per dab740d's test-DB-isolation hardening — running
   pytest with DATABASE_URL set previously dropped the dev DB schema).
   The CI workflow only sets DATABASE_URL, so conftest falls back to its
   localhost default and every fixture-setup fails with
   `OSError: Connect call failed ('127.0.0.1', 5432)` — observed as 638
   errors on the latest main run.

   Add DATABASE_TEST_URL pointing at the postgres service container.
   Same connection string as DATABASE_URL — the test DB and the app DB
   are the same physical postgres in CI; conftest's safety assertion is
   satisfied by the URL containing "test".

2. **Frontend artifact upload fails on Gitea Actions runner.**
   actions/upload-artifact@v4 (and v5) are not supported on Gitea
   Actions / GHES — the runner returns
   `GHESNotSupportedError: ... not currently supported on GHES`. Lint
   itself is now passing (0 errors after PR #149); the job exits 1 only
   because the upload step then fails.

   Pin upload-artifact + download-artifact to v3, the latest version
   compatible with Gitea Actions until they ship v4 support.

After this lands, both backend and frontend CI gates should turn
green — at which point we can also add backend to the required status
checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:28:54 -04:00
f27f671fe6 Merge PR #149: fix(ci): frontend lint to zero errors + test-DB schema fix + dev-deps installable
Some checks failed
CI / backend (push) Failing after 10m26s
CI / frontend (push) Failing after 2m35s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 15s
2026-04-25 07:12:15 +00:00
d6218f2e07 fix(tests): import all models in conftest so create_all sees the full schema
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 11m23s
CI / frontend (pull_request) Failing after 2m41s
CI / e2e (pull_request) Has been skipped
The test_db fixture calls Base.metadata.create_all on a fresh test DB.
That only creates tables for models that have been imported (and thus
registered with Base.metadata) by the time the fixture runs.

app.main imports app.core.database (which gives us Base) but does NOT
eagerly import the model modules — most are pulled in lazily inside
scheduler functions (archive_stale_ai_sessions etc.) and route
modules. At fixture-setup time, only the handful of models touched by
those eager imports are on the metadata, so any test that exercises
PSA, network diagrams, ratings, escalations, etc. fails with
\`UndefinedTableError: relation "X" does not exist\` and a cascade of
500s on every endpoint that queries the missing table.

Adding \`from app import models as _models\` (rather than the bare
\`import app.models\` which would shadow the \`app\` FastAPI instance
imported just above) pulls in app/models/__init__.py, which itself
imports every model module — registering all ~60 tables with
Base.metadata before create_all runs.

Verified locally: tests/test_psa_writeback_phase4.py went from
1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were
masking the actual passes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:49:06 -04:00
920a246d77 fix(react): remove four setState-in-effect cascades flagged by react-hooks v5
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 11m23s
CI / frontend (pull_request) Failing after 2m42s
CI / e2e (pull_request) Has been skipped
The new react-hooks lint rule "Calling setState synchronously within an
effect can trigger cascading renders" flagged real anti-patterns in
four spots. Refactored each per the rule's intent (derive during render,
or use useSyncExternalStore for external subscriptions).

1. hooks/useMediaQuery.ts — replaced the useState + useEffect pair with
   useSyncExternalStore. That's the canonical React hook for
   subscribing to external stores (matchMedia in this case) without
   mirroring into local state via an effect. Snapshot/getServerSnapshot
   pair preserves the SSR-safe behaviour.

2. components/network/nodes/DeviceNode.tsx — the prop-sync useEffect
   that copied nodeData.label into labelValue was redundant.
   labelValue is the EDIT BUFFER; while not editing, the displayed
   span now reads nodeData.label directly. The buffer is initialized
   only when an edit session starts (onDoubleClick).

3. components/network/nodes/GroupNode.tsx — same pattern, same fix.

4. components/dashboard/TicketQueue.tsx — the
   setTickets([]) + setLoading(true) + fetchTickets() chain in the
   effect was the cascade. Pushed those writes inside fetchTickets
   (after the function boundary, so they batch with the eventual
   setTickets(result)). Added a request-id ref so a slow first
   response can't overwrite a fast second one.

Frontend lint: 20 errors → 0 errors. tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:33:13 -04:00
b7f8e70be2 fix(lint): replace explicit-any types + unused-expressions ternaries
Five files, all stylistic:

- useFlowPilotSession.ts: typed the axios error shape with a narrow
  inline type instead of \`as any\`.
- FlowPilotSessionPage.tsx: same — typed location.state once, then
  destructured.
- ScriptBuilderTab.tsx: handleViewScript was a placeholder no-op;
  declared the args properly with \`void script; void filename\` so the
  signature matches ScriptBuilderChatProps without no-unused-vars
  firing.
- TicketsPage.tsx: replaced 8 ternaries-as-statements (\`x ? f() : g()\`)
  with proper if/else blocks. Same control flow, satisfies
  no-unused-expressions, and reads better in the URL-param update paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:57 -04:00
857d73e3d0 fix(lint): move AssistantSessionRedirect out of router.tsx (react-refresh gate)
react-refresh/only-export-components fires when a file with the
\`router\` const export also defines a component (the redirect helper).
Moves the small helper to its own file under components/routing/ so
HMR can keep the route-component module hot-reload-eligible.

No behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:50 -04:00
406ee0ef97 fix(deps): bump pytest 7.4 → 8.4, pytest-cov 4.1 → 5.0 to satisfy pytest-asyncio 0.24
pytest-asyncio==0.24.0 (added on the FlowPilot branch as part of the
RLS test infra refactor) declares pytest>=8.2 — but requirements-dev.txt
still pinned pytest==7.4.3, so a clean pip install fails with
ResolutionImpossible. CI runners that started from a fresh image would
have refused to install dev deps; the FlowPilot tests passed locally
only because the dev container had a pre-installed pytest 8.x lying
around.

pytest-cov 4.1.0 also needs >= 5.0 to play nicely with pytest 8.

No code changes — pytest 8 is API-compatible with the existing test
suite once the install resolves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:43 -04:00
32fae2c693 Merge PR #147: feat: FlowPilot migration — Phase 1-9 + Phase 9 bug fixes + QA fixture harness
Some checks failed
CI / backend (push) Failing after 36s
CI / frontend (push) Failing after 1m11s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 11s
2026-04-25 06:02:14 +00:00
a45915fbbc Merge main into feat/flowpilot-migration (PR #148 backports)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 37s
CI / frontend (pull_request) Failing after 1m11s
CI / e2e (pull_request) Has been skipped
Brings PR #148 — two pre-existing CI fixes (network_diagrams JSONB
server_default, removed deprecated session-scoped event_loop fixture).

The conftest.py event_loop fix on main is already incorporated in
FlowPilot's b14a16a (RLS-gating commit, which dropped the same fixture
as part of its larger refactor). Kept HEAD's version of the RLS-gating
collection hook; the event_loop fixture removal is identical.

The network_diagram.py fix lands cleanly via auto-merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:01:46 -04:00
06593a40d9 Merge PR #148: fix(tests): repair two pre-existing bugs blocking backend CI
Some checks failed
CI / backend (push) Has been cancelled
CI / frontend (push) Has been cancelled
CI / e2e (push) Has been cancelled
Mirror to GitHub / mirror (push) Has been cancelled
2026-04-25 06:01:08 +00:00
9737d90f1b fix(tests): repair two pre-existing bugs blocking the backend CI gate
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 19m36s
CI / frontend (pull_request) Failing after 1m8s
CI / e2e (pull_request) Has been skipped
1. backend/app/models/network_diagram.py — `nodes` and `edges` columns
   used `server_default="'[]'"` (a Python string), which SQLAlchemy
   wraps in single quotes when generating DDL, producing
   `JSONB DEFAULT '''[]'''` — invalid JSON. Switch to
   `server_default=text("'[]'::jsonb")` so the literal is passed through
   and the table can actually be created. Surfaced on every CI run as
   `asyncpg.exceptions.InvalidTextRepresentationError: invalid input
   syntax for type json` at fixture setup time, cascading hundreds of
   test errors.

2. backend/tests/conftest.py — drop the deprecated session-scoped
   `event_loop` fixture. Since pytest-asyncio 0.23+, the plugin manages
   the loop itself; redefining it with a session scope but never
   `set_event_loop()`-ing it left the loop dangling, so any test that
   called `asyncio.run()` (e.g. `test_tasks_are_isolated`) closed the
   process loop and broke the next async test in the module —
   `test_require_tenant_context_raises_403_when_no_account` was the
   visible casualty in the CI logs.

Verified locally:
- `pytest tests/test_uploads.py::test_upload_success` — was setup-error
  on `network_diagrams` DDL; now passes.
- `pytest tests/test_tenant_context.py` — was 1 fail / 3 pass; now 4/4.

Both are real bugs, not test infrastructure churn. Pre-existing on
main; not introduced here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 01:49:50 -04:00
1c904373f8 Merge main into feat/flowpilot-migration
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 36s
CI / frontend (pull_request) Failing after 1m7s
CI / e2e (pull_request) Has been skipped
Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top
of a unified main. Two manual conflict resolutions:

1. CLAUDE.md — kept the FlowPilot ai-handoff rewrite (`.ai/`-driven
   protocol). The pre-rewrite reference content (CW integration notes,
   lessons archive, env vars table) lives in `docs/connectwise/`,
   `docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design.

2. frontend/src/pages/AssistantChatPage.tsx — both conflict regions
   were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks
   (facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with
   PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint).
   Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay,
   NewTicketModal) kept. All setters wired by either branch are intact.

Verification:
- `tsc -b` clean across the merged tree.
- Browser smoke-test (Session B fixture): Phase 9 ProposalBanner
  ("Run AI-drafted PowerShell to recover SSL VPN") renders alongside
  PSA's new Tickets sidebar icon. Console clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 01:03:33 -04:00
16060d2235 Merge PR #141: feat: PSA ticket management — /tickets page, detail panel, AI ticket creation
Some checks failed
CI / backend (push) Failing after 19m11s
CI / frontend (push) Failing after 1m19s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 11s
2026-04-25 04:59:02 +00:00
9330ce4782 fix(pilot): two Phase 9 layout/state bugs surfaced by QA fixtures
All checks were successful
Mirror to GitHub / mirror (push) Successful in 11s
1. EscalateInterceptDialog clipped off-screen.
   The dialog was positioned with `absolute bottom-full mb-2 left-0`
   under the assumption the Escalate button would have room above it.
   In practice the button lives in the chat-page action bar near
   y≈105, so the 302 px dialog overflows the top of the viewport
   and only the last option is visible.

   Switch to `top-full mt-2 right-0` — anchors the dialog below the
   button and aligns its right edge with the button (avoids overflow
   off the right when the button is in the right-side action cluster).

2. TemplateMatchPanel never renders on a fresh session.
   `handleApplyFix` for the script_template_id branch only sets
   `scriptPanelOpen=true`, but TemplateMatchPanel is mounted inside
   `TaskLane.bottomSlot`. On sessions with no questions/facts the
   lane defaults closed, so the panel exists in the React tree but
   inside an unrendered TaskLane — the user clicks Apply fix and
   nothing visibly changes.

   Fix: also `setShowTaskLane(true)` in that branch so the lane
   opens alongside the panel. The ai_drafted_script branch is fine
   (InlineNoTemplateDialog renders in the chat region, not in the
   lane), so it's left alone.

Both bugs were latent — they only surface on sessions that haven't
accumulated TaskLane state yet (questions/facts). Fresh sessions
created from the StartSessionInput hide them because the AI's first
turn populates questions and the lane auto-opens. Caught using the
new seed_phase9_qa_fixtures.py harness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:08:50 -04:00
d68131a865 feat(seed): Phase 9 QA fixture seeder
Adds backend/scripts/seed_phase9_qa_fixtures.py — creates 4 ai_sessions
plus matching session_suggested_fixes that pre-bake the four backend
states the AI orchestrator must produce to mount the five conditional
Phase 9 components:

  A. no template, no draft     → ChatTabStrip + ScriptBuilderTab
  B. ai_drafted_script set      → InlineNoTemplateDialog
  C. script_template_id set     → TemplateMatchPanel
  D. applied_at + status=proposed → EscalateInterceptDialog (verify state)

Background: a Phase 9 QA pass against a regular session left these
five components unreached because the AI didn't emit SUGGEST_FIX in
time/at all. Seeding directly bypasses the AI and lets QA exercise
each surface deterministically.

UUIDs are deterministic (uuid5 over a fixed namespace) so re-runs
upsert. Pass --reset to wipe and recreate. Each session gets two
synthetic conversation messages so the chat header's canAct gate
(messages.length >= 2) opens up Resolve/Escalate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:08:38 -04:00
875bd924a9 fix(pilot): auto-scroll Resolve preview into view when opened
The ResolutionNotePreview popover renders inside TaskLane's
overflow-y-auto region at the bottom of the lane. On a 720px
viewport with the default question/check list expanded, the
popover lands below the visible scroll position — the engineer
clicks "Preview Resolve note", sees the button label flip to
"Showing", but no preview appears on screen.

Add a useEffect that calls scrollIntoView({block: 'nearest'}) on
the popover's outer div whenever `open` flips to true. block:
'nearest' scrolls just enough to make it visible without yanking
the lane to the top.

Discovered during Phase 9 QA. Reproduced at 1280x720; fix verified
visually in the same QA run (screenshots in
.gstack/qa-reports/phase9-*/).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:45:52 -04:00
49c6c8fd00 fix(seed): include cancel_at_period_end in test-user subscription INSERT
Discovered during Phase 9 QA: seed_test_users.py was missing the
cancel_at_period_end column in its subscriptions INSERT, but the
column is NOT NULL (added in 016_add_subscription_tables.py).
Result: seed crashed with NotNullViolationError before any users
were created, blocking auth in fresh dev environments.

Pre-existing on main; not introduced by the FlowPilot migration
branch. Default value: false.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:36:04 -04:00
a77e8ea578 chore: bootstrap gstack team mode
Per gstack team-mode install: adds a PreToolUse hook that blocks
skill usage when gstack isn't installed globally, so contributors
are prompted to install it. Un-ignores the two required files
(.claude/settings.json, .claude/hooks/check-gstack.sh) while
keeping settings.local.json and other Claude state ignored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:17:06 -04:00
90252bc98f docs(claude-md): expand gstack section with full grouped command list
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:17:01 -04:00
036431aef8 chore(ai): update HANDOFF.md and SESSION_LOG.md for session end
All checks were successful
Mirror to GitHub / mirror (push) Successful in 3s
Reflect current state: dual-agent migration + Codex review round +
branch cleanup (RLS test gating, Phase 9 docs, .remember/ gitignore,
landing-handoff deletion). Working tree clean, no active task, 3
cleanup commits queued to push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 16:16:55 -04:00
b3be1e0749 chore: ignore .remember/ skill runtime state
Runtime hook logs and PIDs from the remember skill — local-only, not
repo content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 16:09:23 -04:00
b3506b5e73 docs(pilot): phase 9 review issues
Review findings companion to docs/FlowAssist_Migration/Issues/phase-8-review-issues.md.
Documents the issues addressed by commit 24972e8 (partial-outcome notes
+ per-fix script-builder remount).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 16:09:23 -04:00
b14a16a1ab chore(tests): gate RLS tests behind RUN_RLS_TESTS flag
Continues the test-isolation work from dab740d. RLS migration tests run
against a policy-installed database and fail in the default create_all
suite, so they need to be opt-in:

- pytest.ini: register `rls` marker.
- conftest.py: auto-deselect test_rls_isolation.py unless
  RUN_RLS_TESTS=1. Drops the deprecated session-scoped event_loop
  fixture (not needed since pytest-asyncio 0.23+).
- test_rls_isolation.py: tag module with `rls` marker. Replace
  hardcoded `patherly_test` DB reference with parsed DATABASE_TEST_URL
  (matches conftest.py default `resolutionflow_test`). Updated docstring
  command to show RUN_RLS_TESTS=1.
- requirements-dev.txt: bump pytest-asyncio 0.23.0 → 0.24.0 (loop-scope
  marker behavior required by the RLS module fixture).

Run the RLS suite with:
  RUN_RLS_TESTS=1 DB_APP_ROLE_PASSWORD=... pytest tests/test_rls_isolation.py

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 16:09:13 -04:00
9c8ba296a8 fix(ai): correct stale role-hierarchy and file-listing claims
All checks were successful
Mirror to GitHub / mirror (push) Successful in 3s
Codex review of the dual-agent handoff migration flagged factual errors
carried over verbatim from the pre-migration CLAUDE.md. All claims
verified against the live code before correction.

PROJECT_CONTEXT.md — SaaS shape:
- Role hierarchy was `super_admin > team_admin > engineer > viewer`,
  but `backend/app/core/permissions.py:4` and
  `frontend/src/hooks/usePermissions.ts:4` both define it as
  `super_admin > owner > engineer > viewer`. The `team_admin` concept
  exists separately as an orthogonal team-scoped gate
  (`require_team_admin`, `is_team_admin=True` + valid `team_id`), not
  a level in the primary hierarchy.
- Dep list was missing `require_account_owner` and `require_team_admin`,
  both present in `backend/app/api/deps.py`.

PROJECT_CONTEXT.md — directory tree:
- `api/endpoints/` comment listed 11 routers; `api/router.py` actually
  registers 50+. Replaced with a summary that points at `router.py` as
  the source of truth instead of trying to maintain a freezing list.
- `services/psa/` comment omitted `exceptions.py` and `ticket_context.py`,
  both present in the directory.

CURRENT_TASK.md + TODO.md:
- Replaced `<!-- EXAMPLE -->` placeholders with clearer empty-state
  sentinels so a resume agent sees "no real task yet" at a glance
  rather than placeholder acceptance criteria that look unresolved.

SESSION_LOG.md updated with a follow-up bullet documenting this pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 15:09:22 -04:00
bee8690056 chore(ai): migrate to dual-agent handoff system
Split the monolithic CLAUDE.md into a durable handoff system:

- .ai/PROJECT_CONTEXT.md  — stable architectural truth (stack, structure,
  SaaS shape, ConnectWise, coding standards, frontend patterns, critical
  lessons). Ported verbatim from the previous CLAUDE.md.
- .ai/CURRENT_TASK.md     — single active task with DoD + out-of-scope.
- .ai/HANDOFF.md          — resume point, kept under ~2K tokens.
- .ai/TODO.md             — backlog, read only when CURRENT_TASK complete.
- .ai/DECISIONS.md        — append-only architectural decision log.
- .ai/SESSION_LOG.md      — append-only chronological history.
- .ai/README.md           — human-facing explanation of the system.

Root agent files share a byte-identical protocol block (verified via diff):

- CLAUDE.md — primary agent, with GitNexus + gstack tooling and the
  Claude Opus 4.7 co-author trailer.
- AGENTS.md — OpenAI Codex resume agent, with grep/rg fallbacks and the
  Codex co-author trailer. Steps in when Claude hits session/weekly
  limits.

Legacy root-level SESSION-HANDOFF.md deleted — superseded by .ai/HANDOFF.md.
It was a self-describing one-off from the Design System v4 migration and
had no external references.

Supersedes previous CLAUDE.md. Old version recoverable via
`git show pre-ai-handoff:CLAUDE.md` (tag points at commit e110fed).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 14:50:41 -04:00
995a0c1d2e fix(psa): use schedule entries for ticket co-assignees (CW canonical pattern)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 33s
CI / backend (pull_request) Failing after 17m0s
CI / frontend (pull_request) Failing after 51s
CI / e2e (pull_request) Has been skipped
The previous implementation PATCHed the `resources` string directly, which CW
silently ignores because `resources` is a server-derived read-only field (it's
populated from schedule entries of type/id=4, not freely writable).

Per CW docs (openapi line 70949): "Please use the
/schedule/entries?conditions=type/id=4 AND objectId={id} endpoint".

Behavior per spec:
- No owner + assign user → set owner (existing behavior kept)
- Has owner + assign different user → POST /schedule/entries with type/id=4,
  member, objectId; owner untouched
- User already assigned (owner or schedule entry) → idempotent no-op
- Remove owner → clear owner (existing behavior kept)
- Remove co-assignee → DELETE /schedule/entries/{entry_id}
- list_resources now merges owner + schedule-entry members, deduped by id

Required CW security role permission on the API member:
- Service > Resource Scheduling > Add/Inquire/Delete

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 00:34:18 +00:00
f6a24ea4e1 fix(psa): resource assignment targets CW owner, status PATCH verifies apply
Some checks failed
Mirror to GitHub / mirror (push) Successful in 2s
CI / backend (pull_request) Failing after 15m32s
CI / frontend (pull_request) Failing after 45s
CI / e2e (pull_request) Has been skipped
Previous `resources`-string PATCH was silently ignored by CW — the
`resources` field is server-derived from the ticket's owner + schedule
entries, not freely writable. Status PATCH could also silently no-op
when a cross-board status id was sent.

- add_resource: when the ticket is unassigned, set the `owner`
  MemberReference (the canonical writable primary-assignee field).
  If already owned by someone else, append the identifier to the
  `resources` co-assignee string best-effort.
- remove_resource: clear `owner` (with remove→replace:null fallback) if
  the target is the current owner, otherwise strip from `resources`.
- list_resources: merge owner + resources string, deduped by member id,
  so the UI reflects both single-owner and multi-resource assignments.
- update_ticket_status: verify CW applied the status by comparing the
  response body's status.id — raises PSAError with a clear message when
  CW silently rejects the change (e.g., status invalid for ticket's
  board), instead of reporting spurious success.
- Frontend: surface the backend error detail in the toast so users see
  the real reason instead of a generic "Failed to update" message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 21:39:21 +00:00
04ff2ea301 fix(tickets): refresh status and resources in detail panel after update
Some checks failed
Mirror to GitHub / mirror (push) Successful in 3s
CI / backend (pull_request) Failing after 17m32s
CI / frontend (pull_request) Failing after 48s
CI / e2e (pull_request) Has been skipped
Status update was returning only new_status (string) and the parent list's
onStatusUpdated only set status_name. The <select> was bound to status_id,
which never changed — so it visually reverted to the old status even though
the PATCH succeeded.

- Backend: include new_status_id in the status-update response.
- Panel: own currentStatusId/currentStatusName state so the select reflects
  the change immediately and survives stale parent snapshots.
- Parent list: update status_id on both the row and selectedTicket so the
  list row stays in sync when the panel stays open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 21:28:48 +00:00
60851b400a fix(tickets): status filter dropdown and CW resource assignment
Some checks failed
Mirror to GitHub / mirror (push) Successful in 4s
CI / backend (pull_request) Failing after 17m51s
CI / frontend (pull_request) Failing after 52s
CI / e2e (pull_request) Has been skipped
- Status filter: aggregate statuses across all boards (deduped by name)
  when no board is selected. Backend accepts status_name and filters by
  status/name so the same status matches across boards.
- Resource assignment: CW has no /service/tickets/{id}/members endpoint —
  assignees live in the ticket's comma-separated `resources` string field.
  Rewrote list/add/remove to read/PATCH that field via member identifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-16 21:03:00 +00:00
bea34229d6 chore: bump version and changelog (v0.1.0.0)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 4s
CI / backend (pull_request) Failing after 18m54s
CI / frontend (pull_request) Failing after 47s
CI / e2e (pull_request) Has been skipped
Add CW security roles reference docs and PSA ticket management plan.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 14:44:03 +00:00
294b309faa fix: pre-landing review fixes — company_id filter and CW condition injection
- Apply company_id filter in CW search_tickets conditions (was silently ignored)
- Sanitize query string to strip single quotes before CW condition interpolation
- Add psaError state to TicketsPage for permissions error surfacing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 14:42:05 +00:00
fb7690485b fix(tickets): fix statuses endpoint, members auth gate, and graceful error handling
All checks were successful
Mirror to GitHub / mirror (push) Successful in 3s
- Add GET /boards/{board_id}/statuses endpoint — direct board-to-statuses lookup
  without ticket roundabout; used by filter bar and new ticket form
- Fix TicketsPage and NewTicketModal to call getBoardStatuses(board_id) instead
  of misusing getTicketStatuses(ticket_id) with a board_id value
- Fix list_members auth: was require_account_owner (owner/super_admin only) —
  changed to require_engineer_or_admin so engineers can see member list for
  ticket assignment
- list_members: return [] on PSAError instead of 502 (Lesson 111 pattern)
- get_ticket_statuses: return [] on PSAError instead of 502

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 05:33:23 +00:00
6044d5a88b fix(tickets): fix permissions toast, board fallback, assignment search, remove load more
All checks were successful
Mirror to GitHub / mirror (push) Successful in 2s
- list_resources: return [] on PSAError instead of 502 — stops global interceptor
  toast when CW API key lacks ticket members permission (Lesson 111)
- list_boards/list_priorities: add warning logging so Railway logs reveal the
  root cause when CW permissions are missing
- TicketsPage: derive board options from ticket search results when listBoards
  returns empty (CW permissions fallback)
- TicketFilterBar: replace assignment <select> with searchable member picker —
  fixed options (All/Mine/Unassigned) + text-filtered member dropdown
- TicketQueue: remove Load More / infinite scroll; page now exists at /tickets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 04:59:03 +00:00
00cd8b7c55 feat(tickets): update TicketQueue with mapping detection, 5-item cap, View All link
All checks were successful
Mirror to GitHub / mirror (push) Successful in 4s
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:42:25 +00:00
fded959b5e fix(tickets): guard linkedTicket fetch with currentChatRef to prevent race condition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:40:31 +00:00
5f5b9e5b23 feat(tickets): add spin-off ticket creation in ResolutionAssist — state, action handler, modal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:37:46 +00:00
b2ee1a2150 fix(tickets): improve accessibility and error logging in ticket creation components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:34:08 +00:00
08909aa884 feat(tickets): add AiTicketParseForm and NewTicketModal with two-tab creation flow
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:31:21 +00:00
070d2383bc fix: remove unused PSATicketSearchResult import
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:26:23 +00:00
d7b1fe6645 feat(tickets): add TicketResourceManager and full TicketDetailPanel with optimistic hydration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:24:18 +00:00
a3f8bb3427 feat(tickets): add ticket detail subcomponents
- TicketDetailHeader: Display ticket info with status dropdown
- TicketNotesFeed: Chronological list of ticket notes with internal flag
- TicketAddNote: Form to add notes (requires linked session)
- TicketConfigs: Display related configurations/devices
- TicketRelated: List of related tickets as clickable buttons

All components use type-safe imports from psaContext and integrations APIs.
Styling follows design system (flat dark theme, electric blue accent, Tailwind v4).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:19:18 +00:00
f050afc2f7 feat(tickets): add /tickets route and sidebar nav item
Add Tickets page route to router with lazy loading and code splitting.
Add Tickets navigation entry to sidebar in RESOLVE section for both
icon rail and pinned layouts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:15:35 +00:00
849e1c16e2 feat(tickets): add TicketsPage with URL-param filter state, stub detail panel and modal
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:12:26 +00:00
5310cd3fff fix(tickets): add company_id reset to filter clear button
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:09:51 +00:00
d2689afa53 feat(tickets): add TicketFilterBar and TicketListRow components
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:08:15 +00:00
9d88c8456c feat(tickets): add tickets API client, update integrations API for paginated search, fix callers
- Create frontend/src/api/tickets.ts with ticketsApi (resources, status, create, ai-parse, priorities, search)
- Update integrationsApi.searchTickets and searchTicketsQueue return types from PSATicketSearchResult[] to TicketListResponse
- Fix TicketQueue.tsx to use results.items (append/set) and results.items.length for pagination check
- Fix TicketPickerModal.tsx to use results.items when setting search results
- Export ticketsApi from api/index.ts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:05:13 +00:00
506aac609d feat(tickets): add tickets types, expand PSATicketSearchResult/PSATicketInfo with IDs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:02:53 +00:00
7fa81f69a6 feat(psa): add spin-off ticket system prompt rule, backend routing tests
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 03:01:21 +00:00
6e0188d0b4 feat(psa): add AI ticket parse endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:59:02 +00:00
24ab1908a6 fix(psa): add TicketListResponseSchema response_model to search_tickets endpoint
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:57:23 +00:00
e2cdfac1c3 feat(psa): update search endpoint for pagination, add create/status/resource/priority endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:55:49 +00:00
a5e9615666 feat(psa): add ticket_service.py with list/add/remove resource, update_status, create_ticket
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:52:32 +00:00
66cca70588 feat(psa): expand PSATicketSearchResult with IDs, add psa_tickets.py schemas
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:50:56 +00:00
e714088a2b feat(psa): implement list/add/remove resources, create_ticket, paginated search in CW provider
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:49:20 +00:00
ff0ec143e2 feat(psa): add PSAResource, TicketCreatePayload types and abstract provider methods
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:45:24 +00:00
8d964e64e4 fix(psa): update autotask/halopsa stub search_tickets return type annotation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:44:08 +00:00
44634b1145 feat(psa): add PaginatedTicketResult type, update provider search_tickets signature
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:41:48 +00:00
001438008b docs: fix PSA ticket management spec — prefill state, TicketQueue naming
- Replace false claim about linkedTicket state with explicit fetch step on modal open
- Remove MyQueueWidget references; TicketQueue is the existing component being updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 02:00:33 +00:00
c8b68ad26d docs: fix PSA ticket management spec — pagination source, widget, linked ticket IDs
- Define PaginatedTicketResult provider type + parallel count fetch via CW /count endpoint
- Fix dashboard widget: updates existing TicketQueue (not new), uses searchTicketsQueue
- Fix NewTicketModal prefill: expand PSATicketInfo with company_id/board_id fields
- Correct Dashboard section description: not collapsible, TicketQueue already exists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 01:49:39 +00:00
2b3d52ad77 docs: fix PSA ticket management spec — API contract, actions format, file routing
- Explicitly call out search_tickets breaking change and all existing callers
- Fix [ACTIONS] marker to use JSON array format matching existing parser
- Route system prompt change to assistant_chat_service.py, not flowpilot_engine
- Pivot detail panel hydration to existing getTicketContext + listResources

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 01:44:34 +00:00
52b369680b docs: add PSA ticket management design spec
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 01:36:27 +00:00
338 changed files with 38404 additions and 2881 deletions

37
.ai/CURRENT_TASK.md Normal file
View File

@@ -0,0 +1,37 @@
# CURRENT_TASK.md
**Active task:** Self-serve signup Phase 2 — PR #162 is open on `feat/self-serve-signup-phase-2`. Current focus is resolving its failing Gitea checks. Phase O manual ops (Stripe live setup, internal validation, flag flip) remain pending after review/merge. See `.ai/HANDOFF.md` for the resume point.
## Recently shipped
- **2026-05-06 — `feat/self-serve-signup-phase-2`** Phase 2 frontend cutover code (Tasks 2744 of the plan, 18 commits). Backend remainders + frontend billing foundation + auth surfaces (OAuth + accept-invite + verify-email) + welcome wizard + dashboard redesign (TrialPill, NextStepCard, unified checklist) + public surfaces (`/pricing`, `/contact-sales`) + beta-signup deprecation. Phase O (Stripe live setup, internal validation, flag flip) is operational and pending. Single alembic head `c6cbfc534fad` (no new migrations).
- **2026-05-02 — PR #159** In-product User Guides rewrite. Merged into `main`. Replaced 15 feature-dump guides with 43 problem-oriented Diátaxis how-tos grouped under 10 categories. Dropped Maintenance Flows / AI Assistant / Flow Assist Sparkles (UI no longer exists). Renamed Step Library → Solutions Library. Authored 14 net-new how-tos for FlowPilot-era surfaces (tasklane keyboard flow, what-we-know, resolve, escalate, record-fix-outcome, post-docs-to-ticket, share-update, pause-and-leave, build-script-from-scratch, open-suggested-flow, pin-a-flow, invite-teammate, etc.). Schema additions: `category`, optional `relatedSlugs`; hub renders category sections; detail page renders related-guides footer. Fixed rendering bug where `**bold**` in `step.tip` rendered literally. Killed misleading "N sections" subtitle on guide cards. Browser-verified against engineer + owner login (sidebar labels, account sub-pages, pilot-screen header buttons, Tasks panel, integration form). Two unverified items intentionally deferred: change-teammate-role (requires non-owner test member to inspect role-change control) and detailed Resolve / Escalate modal contents (Resolve gated by 6 pending tasks in test data). tsc and Vite build clean.
- **2026-05-01 — PR #158** Session-screen UX impeccable pass + tasklane keyboard flow. Merged into `main` as `5e10005`.
- **Impeccable pass** (5 sub-passes — distill / quieter / layout / typeset / polish): score 24/40 → 33/40. Removed the duplicate "Suggested checks" chip strip; added an inline `Next steps · N pending in Tasks` cue above the latest action-bearing AI bubble; consolidated the desktop session header to Resolve + Escalate + ⋯ kebab (Context / New Ticket / Update Ticket / Pause now under the kebab, mobile kebab gained Context + New Ticket parity); centered the messages column to `max-w-3xl` to match the composer; bubbles dropped to `rounded-xl`. Decoration sweep: dropped 3px side stripes (TaskLane done states, all 6 ProposalBanner modes, WhatWeKnowItem rows), gradient backgrounds (WhatWeKnow + every banner), accent borderTop on TaskLane header, backdrop-blur on handoff overlay, animate-pulse-amber ring in VerifyingBanner, bordered avatar boxes in banners. Type sweep: 14 distinct sizes → 5-step scale (10/11/12/13/14px). Icon disambiguation: `MessageCircleQuestion` split into `Pencil` (Answer CTA) + `HelpCircle` (per-check explainer). Dead `font-sans` audit (12 sites) and double `text-xs` cleanups.
- **TaskLane keyboard-first flow** (real feature): Enter submits + auto-advances to next pending task, Shift+Enter newline, Esc cancels, focus jumps to Send Responses after the last submission. Mouse path also auto-advances. Subtle hint row teaches the shortcut.
- **Banner ↔ script panel linked**: collapsing or dismissing the ProposalBanner now also hides the InlineNoTemplateDialog / TemplateMatchPanel; recording any outcome closes both surfaces.
- **WhatWeKnow collapsible**: per-session preference in `sessionStorage` (`rf-whatweknow-collapsed:{sessionId}`); auto-collapses on first render at ≥5 facts.
- **Side fix**: `ParameterizationPreview.tokenize()` word-boundary guard prevents over-eager highlighting of short values like `"D"` (no longer lights up every capital D in `Get-ADUser`).
- Validation: tsc clean, ESLint clean, Vite build clean. Type-check + lint passed at every commit boundary.
- **2026-05-01 — PR #156** Suggested-fix `applied_pending` non-terminal outcome. Merged into `main` as `3ba4532`. Adds:
- Schema/API: `FixStatus="applied_pending"`, `pending_reason` Text column, migration `c0f3a4b7e91d`. `PATCH /suggested-fixes/{id}/outcome` accepts pending, requires notes, stamps `applied_at` only.
- UI: `PendingBanner` (info-tone, worked / didn't / update reason / dismiss). "Waiting to verify…" overflow option in `VerifyingBanner`. Nudge "Still checking" records pending with a reason. Page-level Resolve auto-patches pending → success before resolution flow; page-level Escalate intercepts pending the same way verifying/partial does.
- Generators: `resolution_note_generator` and `escalation_package_generator` system prompts handle the new status without real-looking examples.
- Tests: 4 new in `test_fix_outcome_endpoint.py` (21/21 suite green); prompt anti-parrot guardrail green; tsc + Vite build clean.
- QA report: `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md` (5/7 scripted checks PASS with concrete evidence; 2 entry-path checks deferred — same handlers verified via tested transitions).
- **2026-04-30 — PR #155** Escalation Mode wedge merged as `ac42f97`. Senior-tech magic-moment screen. Plan: [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md).
## Two-metric framing (Escalation Mode — read before quoting numbers)
The in-product `GET /analytics/flowpilot/escalations` endpoint measures *post-claim time-to-first-action*. The "minutes recovered" sales claim is `manual_baseline in_product_metric`. Manual baseline comes from the founder's stopwatch on the next 5 escalations. Don't roll the in-product number alone into "minutes recovered" — that's the apples-to-oranges miscount Codex caught.
## Kill-switch (Escalation Mode)
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.
## Notes for next session
- Drive checks 1 (VerifyingBanner overflow → "Waiting to verify…") and 5 (nudge "Still checking" with 3+ post-apply messages) in real pilot usage to close the QA gap left by `/qa` (the tested handlers cover the same mutations, but the entry-path UI rendering wasn't exercised end-to-end).
- Consider monitoring how often pending fixes get parked vs resolved — if engineers report losing track across sessions, revisit the cross-session "Follow-ups" dashboard rollup that was scoped out.
- After PR #158 lands in real ticket flow, eyeball the keyboard-hint contrast and the WhatWeKnow auto-collapse-at-5 threshold — both were judgment calls (5 was a guess; the contrast bump from `/70` to full muted-foreground was based on my read, not real screen testing). Adjust if the 5-fact threshold feels too aggressive or too lenient mid-session.
- Two follow-ups logged in `.ai/TODO.md` from the impeccable pass: `ConcludeSessionModal` paused/escalated step should allow multi-select (Ticket Notes + Client Update + Email Draft simultaneously) — real feature work; `bg-card-hover` Tailwind class doesn't resolve in `CommandPalette` — two-line fix.

148
.ai/DECISIONS.md Normal file
View File

@@ -0,0 +1,148 @@
# DECISIONS.md
> Append-only architectural decision log. Newest entries at the top.
> Entry format:
>
> ```
> ## YYYY-MM-DD — <short title>
> **Context:** why this came up
> **Decision:** what we chose
> **Rejected:** what we didn't choose and why
> **Consequences:** what this means going forward
> ```
---
## 2026-05-07 — Standardize backend Python on 3.12
**Context:** Runtime facts had drifted from docs. The backend Dockerfiles and running dev container were already on Python 3.12, GitHub CI had just been updated to 3.12, but project docs still said Python 3.11 and Gitea CI relied on the runner's ambient Python.
**Decision:** Treat Python 3.12 as the backend standard. Pin local pyenv via `.python-version` to 3.12.13, matching the current `python:3.12-slim` container patch level. Add explicit Python 3.12 setup to Gitea CI and keep GitHub CI on Python 3.12.
**Rejected:** Moving Docker/runtime back to Python 3.11. The application was already building and running on 3.12, so reverting the runtime would add churn without a product or dependency reason.
**Consequences:** Native backend work should use `backend/venv` created from Python 3.12.13. Future docs/CI/runtime changes should preserve Python 3.12 unless a deliberate upgrade decision is recorded.
## 2026-04-30 — Add `applied_pending` non-terminal status to suggested fixes
**Context:** The verifying banner forces a synchronous verdict — worked / didn't / partial — but a lot of real MSP fixes are async. Engineer ran the script but is waiting on the client to power-cycle, AD replication, an O365 license sync. With only the existing outcomes, the engineer either leaves the banner stale (eroding the verifying signal) or guesses wrong (corrupting outcome data). User flagged the gap directly. Today's `NudgeBanner` "Still checking" button just silences the nudge — it doesn't tell the system anything.
**Decision:** Add a fourth, non-terminal outcome `applied_pending`, parallel to `applied_partial`. Required `pending_reason` Text column stores the "what are you waiting on?" reason. Outcome endpoint allows pending → {success, failed, partial, dismissed} transitions; pending stamps `applied_at` but NOT `verified_at` (it's parked, not verified). Resolution-note generator frames the fix as provisional (no closure language); escalation-package generator surfaces pending verification as the leading hypothesis with a reference to what's being waited on. Frontend exposes the state via a new `PendingBanner` component (info-tone, mirrors `PartialBanner`) plus a "Waiting to verify…" overflow option in the verifying banner. `NudgeBanner` "Still checking" now records pending with a reason instead of just silencing.
**Rejected:**
- **Reuse `applied_partial`.** Semantically wrong — partial means "I did some of it." Pending means "I did all of it, just can't tell if it worked." Generators write different prose for each, and conflating them would lose the distinction in the customer-facing resolution note and the next-engineer escalation handoff.
- **Add a `pending_reason` column without a new status.** The status field is what the dashboard, banner, and generators all branch on. Hiding pending state in a separate column would proliferate `IF pending_reason IS NOT NULL` checks across every consumer.
- **Cross-session "Follow-ups" dashboard rollup in v1.** Per-session `PendingBanner` is the chat-anchored reminder. Add the dashboard surface only if engineers report losing track across multiple pending sessions in pilot use.
- **Optional follow-up timer ("remind me in 30m").** Out of scope; nice-to-have but not the wedge.
**Consequences:**
- Engineers can park a fix honestly without losing the verifying signal. The state survives across sessions because it's persisted server-side.
- `pending_reason` is preserved as audit trail when the engineer advances pending → success/failed/dismissed; it is not auto-cleared. Intentional — it tells the next reader "we waited for X, then it worked."
- New consumers of `FixStatus` must handle the `applied_pending` case. Currently three: the banner derivation in `AssistantChatPage`, the resolution-note generator, and the escalation-package generator. All three updated in this change.
- Migration `c0f3a4b7e91d` is reversible — downgrade rewrites pending rows back to `applied_partial` and copies `pending_reason` into `partial_notes` if the partial slot was empty, then drops the column.
---
## 2026-04-30 — Allow `escalated_to_id` to send chat messages in claimed sessions
**Context:** During browser QA, clicking "Get AI analysis" on the magic-moment screen returned `POST /ai-sessions/{id}/chat → 400`. The senior tech who claimed the session is stored as `escalated_to_id` on `AISession`, not `user_id` (which remains the junior who created the session). `unified_chat_service.send_chat_message` queried `WHERE ai_sessions.user_id = :user_id`, so the senior's ID never matched and the endpoint rejected the request.
**Decision:** Extend the ownership check in `send_chat_message` to `OR ai_sessions.escalated_to_id = :user_id` using SQLAlchemy `or_()`. This is the minimal, correct fix: the session model already has a semantically valid "also owns" field for the claiming senior; extending the WHERE clause makes that ownership real.
**Rejected:**
- **Transfer `user_id` to the senior on claim.** Breaks the audit trail — `user_id` is the originating engineer throughout the session lifecycle. Any query scoped to "sessions this engineer worked on" would silently lose the junior's history.
- **A separate `can_send_message` service method.** Adds indirection with no benefit for v1. One `or_()` line in the existing query is sufficient.
- **Checking a role/permission flag instead.** Role gating (engineer/admin) already happens at the claim endpoint. The chat-send check is about session ownership, not role. Mixing the two concerns would be confusing.
**Consequences:**
- Seniors can send AI briefings and continue chat work in sessions they have claimed. Core escalation pickup flow unblocked.
- Any future caller of `send_chat_message` should be aware that "user_id or escalated_to_id" is the ownership rule. The service-level check is the single enforcement point.
- `user_id` remains the originating engineer for all audit, history, and analytics queries. No data migration needed.
---
## 2026-04-29 — Consolidate the three per-escalation AI calls into one structured generation
**Context:** A single user-initiated escalation currently triggers three separate Sonnet calls, all summarizing the same source material (session state, steps taken, "what we know") from slightly different angles:
1. `_build_escalation_package_enhanced` — runs in the background `enrich_escalation_async` task, builds a rich JSON payload that's saved to `ai_session.escalation_package`.
2. `_generate_ai_assessment` — also background, returns the magic-moment screen fields (`likely_cause`, `suggested_steps[]`, `confidence`).
3. `generate_status_update` — engineer-triggered when they click "Ticket Notes" / "Client Update" / "Email Draft" in the conclude modal, generates audience-specific PSA prose.
The user surfaced the smell: the engineer is *typically* generating a status update during the escalate flow, so the AI assessment work is being done twice with overlapping context and the engineer's PSA prose is being thrown away. Live test on 2026-04-29 also showed that bumping the assessment timeout 15s → 45s did NOT fix the empty-placeholder bug — meaning the architectural smell is also a demo blocker.
**Decision:** ONE structured AI call per escalation that produces a single payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Persist to `SessionHandoff`. The conclude modal's "Ticket Notes" button reads from the saved prose instead of calling the model. "Client Update" and "Email Draft" buttons trigger a cheap Haiku transformation over the saved prose (tone shift only, not a re-summarization).
Proposed payload shape (final form decided during implementation):
```json
{
"summary_prose": "<PSA-flavored ticket-notes paragraph>",
"what_we_know": ["<one-liner>"],
"likely_cause": "<one sentence>",
"suggested_steps": ["<short step>"],
"confidence": "low | medium | high",
"audience_variants": {"client_update": null, "email_draft": null}
}
```
`audience_variants` filled lazily on first user request, cached.
**Rejected:**
- **Just bumping the timeout further.** Already tried 5s → 15s → 45s. The architectural redundancy is the real cost — even if Sonnet completed reliably, three calls per escalation is wasteful and creates three places where state can diverge.
- **Reusing the engineer's status update content as the AI assessment.** User's first instinct, but: status updates aren't always generated (engineer has to click), they're audience-specific (so you'd pick which one to copy), and they're prose without the structured fields the magic-moment screen needs. The right consolidation is the OTHER direction — generate ONE structured payload that the status-update buttons consume.
- **Switching the assessment to Haiku for speed.** Faster but solves only the latency symptom, not the redundancy. Doesn't help the conclude modal's status-update buttons.
**Consequences:**
- Magic-moment screen populates in ~5s instead of 25s+ (work happens in the foreground escalate path, not in a background task that races with the senior's pickup).
- Token spend per escalation drops by ~60% — one Sonnet call replaces two; the third (audience variants) becomes Haiku.
- Engineer's "Ticket Notes" button is instant — no model round-trip.
- Schema enforcement matters. The current `_generate_ai_assessment` returns freeform prose that the frontend stuffs into `assessment_text` because the structured fields aren't reliably parseable. The new call must use Anthropic's structured output / tool-use to enforce the schema.
- Migration concern: `ai_session.escalation_package` JSON column has live data on existing sessions. Keep it READABLE for backward compatibility; just stop *writing* the enhanced payload from `enrich_escalation_async`. If downstream queue summaries depend on it, dual-write the basic snapshot.
- Test fixtures (`test_handoff_manager.py`, `test_session_handoffs_api.py`) currently stub `_generate_ai_assessment` via `AsyncMock`. Updating the stubs is part of the rename.
- The frontend SSE assessment-ready subscription (added in `0f00ee5`) stays as-is — it just listens for the new event payload.
---
## 2026-04-28 — Tag the task-lane state with an owner chatId
**Context:** A recurring bug — every time the user returned to test escalation work, creating a new session would flash the previous session's task-lane data (questions, actions, "Tasks" pill counts) before the new session's AI response landed. The first attempt to fix it (`8914391`) added initializer-time guards (`incomingPrefill || isPickup`) that skipped the sessionStorage restore on mount. That covered exactly two entry paths and missed every other case: in-place URL navigation, mid-flight pickup, HMR re-runs, and the gap between `setActiveChatId(B)` and the AI response that finally populates B's questions/actions. The persistence effect made it worse by writing `{chatId: activeChatId, questions: activeQuestions}` — at any moment where activeChatId had flipped before the questions were updated, sessionStorage was stamped with `{chatId: B, questions: [A's data]}` and a subsequent restore would happily render A's data for B.
The root cause was that `activeQuestions` / `activeActions` / `showTaskLane` were three independent state slices implicitly assumed to be in sync with `activeChatId`. The synchronization was by convention, not by structure. Every code path that mutated them had to remember to call `resetSessionDerivedState` first; missing one created stale UI.
**Decision:** Add a `taskLaneOwnerChatId` state that records *which chatId the in-memory questions/actions belong to*, set at every site that populates them (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix), cleared in `resetSessionDerivedState`. The persistence effect writes ownerChatId as the chatId tag. Render is gated on `taskLaneOwnerChatId === activeChatId` and ANDed into all three render conditions (toolbar Tasks button, narrow-viewport floating drawer, main side panel). The mount-time `skipTaskLaneRestore` guard stays as belt-and-braces for the prefill/pickup entry-flash window, which the owner-gate alone doesn't cover.
**Rejected:**
- **More entry-path guards.** That's whack-a-mole — the next path nobody anticipated will reproduce the bug. The owner-gate makes the bug structurally impossible regardless of which path triggers it.
- **Combining the four state slices into a single tagged object.** Cleaner long-term but a bigger refactor with more touch points. The owner-tracking approach gets the structural guarantee with a minimal diff and keeps the existing setState patterns.
- **Inlining the comparison at every render site.** Works but proliferates the comparison; one named derived value (`taskLaneIsForActiveChat`) reads better and groups the gate with the persistence-effect / state declarations as a named concept.
**Consequences:**
- Stale task-lane data is structurally unable to display. The lane is hidden during any window where `ownerChatId !== activeChatId`, no matter what mutation path got you there.
- Adding new sites that populate `activeQuestions` / `activeActions` requires also setting `taskLaneOwnerChatId`. The pattern is documented in the commit message and visible in every existing populate site as a paired call.
- The mount-time `skipTaskLaneRestore` guard is now redundant in steady-state but kept for the few-hundred-ms flash window between component mount and the first sendPrefill / selectChat effect. Deleting it would re-introduce a (smaller) flash without strong reason.
- Future task-lane state slices (e.g. `facts`, `activeFix`) follow the same pattern: gate their visibility on the owner check via the existing render conditions. Tagging more slices with their own `*OwnerChatId` is a future refactor if the slices diverge.
---
## 2026-04-24 — Adopt dual-agent handoff system (`.ai/` + `CLAUDE.md` + `AGENTS.md`)
**Context:** Claude Code hits session and weekly usage limits. Work stalls when the primary agent is locked out. Needed a structured way for OpenAI Codex to resume where Claude left off without losing architectural truth or drifting across sessions.
**Decision:** Split the old CLAUDE.md into `.ai/PROJECT_CONTEXT.md` (stable repo truth), agent-specific root files (`CLAUDE.md`, `AGENTS.md`) with a shared protocol block, and a small handoff toolkit (`CURRENT_TASK.md`, `HANDOFF.md`, `TODO.md`, `DECISIONS.md`, `SESSION_LOG.md`, `README.md`). Previous CLAUDE.md snapshotted in commit `e110fed` before the migration.
**Rejected:**
- Single symlinked CLAUDE.md/AGENTS.md — diverges silently, hides agent-specific tooling differences.
- Putting GitNexus/gstack content in AGENTS.md — Codex doesn't have those tools; would mislead the resume agent.
- Keeping the old CLAUDE.md as-is and adding AGENTS.md alongside it — duplicated truth, drift guaranteed.
**Consequences:**
- First read for either agent: `.ai/PROJECT_CONTEXT.md` + `.ai/CURRENT_TASK.md` + `.ai/HANDOFF.md`.
- Architectural changes in the repo require updating PROJECT_CONTEXT.md, not the root agent files.
- Git trailers differ per agent (`Claude Opus 4.7` vs `Codex`) — preserved in each root file.
- Legacy `SESSION-HANDOFF.md` deleted in the same commit; superseded by `.ai/HANDOFF.md`.

57
.ai/HANDOFF.md Normal file
View File

@@ -0,0 +1,57 @@
<!-- Keep under ~2K tokens. Old handoffs live in SESSION_LOG.md. Do not let this file accumulate history. -->
# HANDOFF.md
**Last updated:** 2026-05-07 (PR #162 CI investigation/fixes)
**Active task:** PR #162 (`feat/self-serve-signup-phase-2`) is open in Gitea. Current session is resolving its failing checks.
## Where this session ended
PR #162 originally failed quickly in Gitea CI. Public Gitea status metadata was available, but job logs redirected to login and no `GITEA_TOKEN` was present. The branch was pushed over SSH.
Fixed environment drift first:
- Standardized backend native/dev/CI Python on 3.12.13 to match Docker.
- Added `.python-version`.
- Rebuilt `backend/venv` from pyenv Python 3.12.13 and verified native `pytest --version` / `alembic --version` with explicit local env.
- Updated Gitea CI backend/e2e Python setup to 3.12.
Fixed Gitea runner assumptions next:
- Added `actions/setup-node@v4` with Node 20 to Gitea frontend and e2e jobs.
- Pushed `fix(ci): set up node in gitea workflow`.
Local frontend validation then exposed real lint failures in Phase 2 React code under the current lint stack. The current WIP fixes:
- `react-refresh/only-export-components` for exported pure helpers used by tests/shared invite OAuth code.
- `react-hooks/set-state-in-effect` warnings where local state intentionally mirrors route/config/cache state.
- `react-hooks/purity` warnings from `Date.now()` during render.
- Redundant loading-state write in pricing page.
Validation after those frontend changes:
- `docker exec -w /app resolutionflow_frontend npm run lint` passed.
- `docker exec -w /app resolutionflow_frontend npm run test:coverage` passed (`198` tests).
- `docker exec -w /app -e NODE_OPTIONS=--max-old-space-size=4096 resolutionflow_frontend npm run build` passed.
Known local noise:
- React `act(...)` warnings appeared in existing tests during coverage but did not fail the suite.
- Vite emitted large chunk warnings during build.
- Unrelated dirty/untracked files remain and should not be staged unless explicitly requested: `docker-compose.dev.yml`, `.env.example`, `abc-feat-self-serve-signup-phase-2-design-20260507-112020.md`, `core.*`, `docs/architecture/`, `docs/tutorials/`.
## Resume point
1. Commit the frontend lint fixes and `.ai/` handoff updates with the required Codex trailer.
2. Push `feat/self-serve-signup-phase-2`.
3. Poll Gitea PR #162 statuses for the new head SHA:
`curl -fsSL https://gitea.resolutionflow.com/api/v1/repos/chihlasm/resolutionflow/statuses/<sha> | python -m json.tool`
4. If statuses are still pending, report that local frontend CI is green and Gitea runner work is queued/running. If a check fails, public statuses may show only the context/description; logs require authenticated Gitea access.
## Carry-forward
- Phase O manual ops remain pending after PR review/merge: Stripe live setup, internal validation, feature-flag flip.
- Backend env: `SALES_LEAD_RECIPIENT_EMAIL`.
- Frontend env: `VITE_SELF_SERVE_ENABLED`, `VITE_GOOGLE_CLIENT_ID`, `VITE_MS_CLIENT_ID`, `VITE_OAUTH_REDIRECT_BASE`, `VITE_CALENDLY_URL`.
- Single alembic head remains `c6cbfc534fad`; Phase 2 added no migrations.

263
.ai/PROJECT_CONTEXT.md Normal file
View File

@@ -0,0 +1,263 @@
# PROJECT_CONTEXT.md — ResolutionFlow
> SaaS troubleshooting platform for MSPs. Stable architectural truth. Updated only when the repo's shape changes.
---
## Product & naming
Canonical product name is **ResolutionFlow**. `patherly` is the legacy internal name — still present in DB name (`patherly` on Railway, `resolutionflow` locally), some Railway service names, and historical paths. Treat as aliases, not canonical. Docker containers are `resolutionflow_*`.
**User terminology:** "Flows" (not Trees), "Projects" (not Procedures), "Solutions Library" (not Step Library). Maintenance flows hidden from pilot UI (backend retains them). DB column `tree_type` values unchanged.
---
## SaaS shape
Multi-tenant by account. Primary role hierarchy: `super_admin` > `owner` > `engineer` > `viewer` — driven by `is_super_admin` + `account_role`. Never `role=='admin'` — use `is_super_admin`. Separate team-scoped admin gate exists orthogonally to the role hierarchy: `is_team_admin=True` + valid `team_id`, enforced by `require_team_admin`. Backend deps in `app/api/deps.py`: `get_current_active_user`, `require_engineer_or_admin`, `require_admin`, `require_account_owner`, `require_team_admin`. Frontend: `usePermissions()` hook. Central logic in `backend/app/core/permissions.py` + `frontend/src/hooks/usePermissions.ts`.
---
## Status
Go-to-Market Validation (pre-PMF). Backend feature-complete (55+ endpoints, 100+ tests). Phase 0.5 FlowPilot telemetry baseline accruing. See [CURRENT-STATE.md](../CURRENT-STATE.md) for live status, [03-DEVELOPMENT-ROADMAP.md](../03-DEVELOPMENT-ROADMAP.md) for phases.
---
## Tech stack
- **Backend:** Python 3.12 + FastAPI, SQLAlchemy 2.0 async (asyncpg), Alembic, Pydantic v2, JWT (python-jose + bcrypt, JTI refresh rotation), APScheduler (in-process with FastAPI lifespan).
- **Frontend:** React 19 + Vite + TypeScript, Tailwind v4 (CSS-only config in `index.css`), Zustand (immer + zundo), React Router v7, Axios (token-refresh interceptor), Lucide.
- **DB:** PostgreSQL 16 (RLS enabled Phase 4, pgvector).
---
## Project structure
```
resolutionflow/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry
│ │ ├── api/endpoints/ # 50+ routers registered in api/router.py — auth/admin, trees/sessions, AI/chat, scripts, integrations, uploads, accounts, FlowPilot, etc.
│ │ ├── api/deps.py # auth deps (incl. require_team_admin)
│ │ ├── api/router.py # registration
│ │ ├── core/ # config, database, permissions, security, audit, rate_limit
│ │ ├── models/ # SQLAlchemy (incl. FlowProposal)
│ │ ├── schemas/ # Pydantic
│ │ ├── services/psa/ # PSA provider pattern (base, connectwise/, autotask/, halopsa/, cache, encryption, exceptions, registry, ticket_context, types)
│ │ ├── services/knowledge_flywheel.py + _scheduler.py
│ │ └── services/knowledge_gap_service.py
│ ├── alembic/versions/ # 001-070 sequential, then hex hash
│ ├── scripts/ # seed_data, seed_trees, seed_test_users
│ └── tests/ # pytest integration
├── frontend/
│ ├── src/
│ │ ├── api/ # Axios client + endpoint modules
│ │ ├── components/ # common, layout, dashboard, tree-editor, session, procedural, procedural-editor, library, step-library, ui, flowpilot
│ │ ├── hooks/ # usePermissions, useSessionTimer, useKeyboardShortcuts
│ │ ├── pages/
│ │ ├── store/ # Zustand (auth, treeEditor, proceduralEditor, userPreferences, scriptGeneratorStore)
│ │ └── types/
│ └── (Tailwind v4 CSS-only config in src/index.css)
├── docs/plans/archive/ # pre-March 2026 plans
├── docs/connectwise/ # CW API reference + best-practices guides
├── docs/LESSONS-ARCHIVE.md # archived lessons (fixes in code)
├── .ai/ # dual-agent handoff system (see .ai/README.md)
├── CLAUDE.md · AGENTS.md · CURRENT-STATE.md · DESIGN-SYSTEM.md · DEV-ENV.md
```
---
## Dev commands
Full setup in [DEV-ENV.md](../DEV-ENV.md) (host-agnostic, with homelab Proxmox reference topology). Day-to-day:
```bash
docker compose -f docker-compose.dev.yml up -d # start stack
cd backend && source venv/bin/activate && uvicorn app.main:app --reload
cd frontend && npm run dev
pytest --override-ini="addopts=" # tests (first time: CREATE DATABASE resolutionflow_test)
cd backend && alembic upgrade head # migrate
cd backend && alembic revision -m "desc" # manual migration (preferred per Lesson 77)
cd backend && alembic revision --autogenerate -m "desc" # picks up drift; review carefully
cd frontend && npm run build # stricter than tsc --noEmit — final check
cd frontend && npx tsc -b # TS-only check when dist/ has EACCES
docker exec -it resolutionflow_postgres psql -U postgres -d resolutionflow
python -m scripts.seed_trees # seed (from backend/)
```
**Never pass `--rev-id`** to alembic — let it generate the hex hash.
**On hosts without native `python`/`node`/`npm`** (e.g. the code-server LXC), run commands inside the already-running containers instead:
```bash
docker exec resolutionflow_backend pytest --override-ini="addopts="
docker exec resolutionflow_backend alembic upgrade head
docker exec -w /app resolutionflow_frontend npm run build
docker exec -w /app resolutionflow_frontend npx tsc -b
```
---
## URLs & test users
**URLs:** Frontend <http://localhost:5173>, backend <http://localhost:8000>, API docs <http://localhost:8000/api/docs>.
**Test users** (all password `TestPass123!`): `admin@resolutionflow.example.com` (super_admin), `teamadmin@resolutionflow.example.com`, `engineer@resolutionflow.example.com`, `pro@resolutionflow.example.com`.
---
## CI
Gitea (`gitea.resolutionflow.com/chihlasm/resolutionflow/actions`). `gh` CLI works for issues/PRs on the GitHub mirror, but not CI runs.
---
## Deployment (Railway)
- **Prod:** `resolutionflow.com` (frontend), `api.resolutionflow.com` (backend).
- Auto-deploy: Gitea push → GitHub mirror → Railway follows GitHub `main`.
- PR environments auto-created; need manual domain generation + `VITE_API_URL` with `https://` prefix.
- `ALLOW_RAILWAY_ORIGINS=true` for `*.up.railway.app` CORS.
- Shared Variables (Railway project-level) auto-propagate to PR envs — use for secrets like `ANTHROPIC_API_KEY`.
- Super admin utility: `backend/make_superadmin_simple.py list|<email>`.
---
## ConnectWise PSA
Reference: `docs/connectwise/` — start with `CONNECTWISE-API-REFERENCE.md`, then the `best-practices/` guides. Extracted OpenAPI spec in `connectwise-psa-resolutionflow-reference.json` (670 endpoints, v2025.16); full spec in `connectwise-psa-openapi-full.json`.
- **Auth:** API Key (Base64 `companyId+publicKey:privateKey`) + `clientId` header every request. `clientId` is server-side (`CW_CLIENT_ID` in `config.py`) — identifies ResolutionFlow, not per-tenant. Per-connection: `company_id`, `public_key`, `private_key`, `server_url`.
- **Architecture:** `services/psa/` provider pattern — `PSAProvider` base, `ConnectWiseProvider` impl, `PsaProviderRegistry` for multi-PSA dispatch. Credentials encrypted at rest via `services/psa/encryption.py` (Fernet). Per-team credentials, never per-user. Endpoints in `api/endpoints/integrations.py`. In-memory TTL cache in `services/psa/cache.py`.
- **Integration flows:** session docs → ticket notes (`POST /service/tickets/{id}/notes`, markdown supported); ticket context → FlowPilot; callbacks via `/system/callbacks` with HMAC verification.
- **API rules:** pin version via Accept header `application/vnd.connectwise.com+json; version=2025.16`. Paginate ≤1000/page. Dynamic base URL via `/login/companyinfo/{companyId}`. Request minimal permissions (MY, not ALL).
---
## Coding standards
- **Python:** type hints everywhere, async/await for DB, Pydantic v2, `DateTime(timezone=True)` always.
- **TypeScript:** interfaces for all data, `const` over `let`, functional components + hooks, shared logic in custom hooks.
- **Git:** feature branch before committing (`git checkout -b feat/feature-name`). Commit format: `type: description` (feat/fix/refactor/docs/test/chore). Large features: commit per phase with `npm run build` validation. Push to Gitea — auto-mirrors to GitHub (`.gitea/workflows/mirror-to-github.yml`); never push GitHub directly. (Agent-specific `Co-Authored-By` trailers live in CLAUDE.md / AGENTS.md.)
**After shipping:** update [CURRENT-STATE.md](../CURRENT-STATE.md) + [03-DEVELOPMENT-ROADMAP.md](../03-DEVELOPMENT-ROADMAP.md), `gh issue close #N` for resolved issues, add lessons only for non-obvious traps (otherwise let the code speak).
---
## Common tasks
- **New endpoint:** `endpoints/``router.py``schemas/` → tests → frontend API client.
- **New page:** `pages/` → route in `router.tsx` → nav in `AppLayout.tsx`.
- **New public route:** top-level in `router.tsx` alongside `/login`, not inside `ProtectedRoute`.
- **New frontend API module:** types in `types/` → export from `types/index.ts` → client in `api/` → export from `api/index.ts`.
- **Schema change:** update model → `alembic revision -m "desc"` → review → `alembic upgrade head`.
- **New `VITE_*` env var:** add as `ARG` + `ENV` in `frontend/Dockerfile` for Railway builds (Lesson 60 — Railway env vars are runtime-only, Vite bakes at build time).
- **Account sub-page:** add route in `router.tsx` under `account` children + add link card in `AccountSettingsPage.tsx``AccountLayout` has NO sidebar nav.
---
## Design system
**Source of truth: [DESIGN-SYSTEM.md](../DESIGN-SYSTEM.md).** Read before any visual change.
- Flat high-contrast dark theme, Sentry/PostHog-inspired. **No** glass, backdrop blur, ambient orbs, gradient surfaces.
- Accent **electric blue** (#60a5fa dark / #2563eb light) — ≤5% of UI, interactive elements only. Warning amber (#fbbf24), info cyan (#67e8f9), success green (#34d399), danger red (#f87171). Each with `-dim` at 10% opacity.
- Backgrounds: `bg-sidebar` (#0e1016) → `bg-page` (#16181f) → `bg-card` (#1e2028) → `bg-elevated` (#2a2d38). Borders `border-default` / `border-hover`.
- Text: `text-heading``text-primary``text-muted-foreground``text-muted`.
- Fonts: IBM Plex Sans (body), Bricolage Grotesque (heading, 700 weight for logo), JetBrains Mono (code).
- Logo: 30px gradient square (ember orange) + "ResolutionFlow" in Bricolage Grotesque. Assets in `brand-assets/`, `frontend/src/assets/brand/`, `frontend/public/icons/`.
- Mockups: `docs/mockups/` (HTML).
- **Deprecated — do not use:** glass-card, glass-stat, `bg-gradient-brand`, `backdrop-filter: blur()`, ambient orbs, purple gradients, ember orange as accent, cyan as accent (cyan is info only).
---
## Frontend patterns
- **Component basics:** `cn()` from `@/lib/utils`, Lucide icons, `Modal.tsx` for modals (mobile-responsive `items-end sm:items-center` + `max-w-full sm:max-w-lg`).
- **Types:** Create in `types/`, export from `types/index.ts`, `import type { T } from '@/types'`.
- **Routing:** `getTreeNavigatePath()` / `getTreeEditorPath()` from `@/lib/routing`. Tree editor is `/trees/new`. All dashboard session clicks → `/pilot/:id` regardless of `session_type`.
- **Lazy routes:** `lazyWithRetry` from `@/lib/lazyWithRetry.ts`, not `React.lazy` (auto-reload on stale chunks).
- **Public pages:** raw `fetch()` with full URL, NOT `apiClient` (which requires auth tokens).
- **Toast:** `toast.warning()` not `toast.warn()`. Import from `@/lib/toast` — methods: `success`, `error`, `warning`, `info`.
- **Assistant chat:** uses local React `useState`, not Zustand. All three send paths (`handleSend`, `sendPrefill`, `handleResumeNew`) must call `setShowTaskLane(true)` when response has actions/questions.
- **Chat backend wiring:** `aiSessionsApi.sendChatMessage``/ai-sessions/{id}/chat``unified_chat_service.py`. NOT `assistant_chat_service.py` (removed except retention settings).
- **FlowPilot:** Actions live in page header (Resolve/Escalate/Share Update + overflow). `useBlocker` for active-session nav guard. "Pause & Leave" auto-pauses.
- **AI markers:** `[QUESTIONS]`, `[ACTIONS]`, `[FORK]`, `[DELTA]...[/DELTA]` (editor), `[TREE_UPDATE]` (troubleshooting builder), `[STEPS_UPDATE]` (procedural builder), `[METADATA]`. Parsed in `unified_chat_service.py`; conversation history stores stripped `display_content`. If markers disappear: check system-prompt final reminder + per-user-message `[SYSTEM: ...]` injection in `_call_anthropic_cached()`.
- **Image uploads:** paste/attach → Railway S3 via `uploadsApi.upload()` → resized by `storage_service.resize_image_for_vision()` (Pillow, 1568px max, PNG→JPEG) → base64 → Claude multimodal blocks. Max 3/msg. Images NOT stored in history.
- **Async select-load-apply:** guard with a ref (pattern in `AssistantChatPage` `currentChatRef`). Update synchronously on every selection change; after every `await`, bail out if `ref.current !== thisId`.
- **Editor-Embedded Flow Assist:** `EditorAIPanel` (320px side panel) + `useEditorAI`. Ghost nodes via `_suggestion: true`. Route actions via `settings.get_model_for_action()`.
- **Script Builder:** `/script-builder`, chat-style. Backend `ScriptBuilderSession`, `script_builder_service.py`, endpoints `/scripts/builder/`. FlowPilot handoff via `action_type: "open_script_builder"` + `sessionStorage`.
- **Intake form field schema:** `variable_name` + `field_type` (NOT `name` / `type`).
- **Node field priority** (copilot, summaries): `title``question``description``content``label`.
- **Procedural sessions auto-start** on page load (no intake/Start screen). Troubleshooting flows DO have a start screen.
---
## Critical lessons
> Lessons 1-40 archived to [docs/LESSONS-ARCHIVE.md](../docs/LESSONS-ARCHIVE.md) — fixes baked into the codebase. **Grep the archive when an error message or symptom is unfamiliar, or after two failed attempts at resolving an issue.** Don't pre-load for routine work.
### Backend / data
- **APScheduler interval jobs always `max_instances=1`** — without it, overlapping runs reprocess records (TOCTOU).
- **`get_db` rolls back on exception** — never remove the `await session.rollback()`, or one failed request poisons the connection with `InFailedSQLTransaction` cascading.
- **Startup routines on tenant-isolated tables must use `_admin_session_factory()`, not `get_db()`.** Phase 4 RLS has no `app.current_account_id` set at startup. `get_service_account_id` is safe (reads cached `app.state`).
- **Backfill migrations adding `account_id`:** grep ALL `ModelClass(` sites in service code to verify `account_id=` is passed. SQLAlchemy accepts `None` silently — Phase 4 RLS WITH CHECK surfaces the problem at runtime as `InsufficientPrivilegeError: new row violates row-level security policy`.
- **`tree_shares.account_id = tree.account_id`**, never `current_user.account_id`. A super_admin sharing another tenant's tree must produce the share in the tree owner's tenant, or it becomes invisible post-RLS.
- **Global tables (no `account_id`, never in RLS migrations):** `script_categories`, `platform_steps`, `template_trees`, `plan_feature_defaults`, `accounts`. Scan at class level — one `.py` file can hold multiple classes with different columns (e.g. `ScriptCategory` vs `ScriptTemplate`).
- **`ai_sessions.status` is VARCHAR(30)** — fits `requesting_escalation` (23 chars). Migration `f0aad74ea51b` widened from 20.
- **PostgreSQL `func.sum(case(...))` returns `Decimal` via asyncpg** — cast to `int()` before Pydantic `dict[str, Any]`.
- **Enhancement / branch_addition proposals need `modified_flow_data` via "Edit & Publish"** — backend 400 on direct approve. Only `new_flow` supports direct approve.
- **Adding email types:** static async method on `EmailService` in `core/email.py`. Fire-and-forget from endpoints (log errors, don't fail the request).
### AI / FlowPilot
- **Anthropic SDK `max_retries=1`** — default of 2 can take 3× the timeout.
- **Model tier routing:** `settings.get_model_for_action(action_type)`. Always alias form (`claude-sonnet-4-6`).
- **FlowPilot must ask GUI-vs-script before suggesting either** when both are viable — see `FLOWPILOT_SYSTEM_PROMPT` in `flowpilot_engine.py`.
- **Telemetry events to grep:** `anthropic.cache` (prompt-cache hit/create), `mcp.turn` (per-turn MCP availability), `mcp.fallback` (MCP silent-retry fired).
- **Don't put literal payloads in system prompts.** Bit us twice in one day: a worked `[QUESTIONS]` example with literal "Outlook + jsmith" content, and a full DNS troubleshooting tree, both caused Claude to recite that content on unrelated tickets — the symptom looked like task-lane state leaking across chats. The fix is structural: every output example in a system prompt uses `<placeholder>` syntax (`{"text": "<one short, specific question>"}`), never literal field values. Real-looking format examples live in few-shot messages (separate file, separate code path), not system prompts. Guardrail: `tests/test_prompt_anti_parrot.py` scans every `*_PROMPT`/`*_SCHEMA`/`*_PROTOCOL`/`*_FORMAT` constant in `app/services/` and `app/core/`; CI fails when a marker block contains a literal JSON value or when a known leaked token (jsmith, DC01, ADSync, Dnscache, etc.) appears anywhere in a prompt.
### Frontend / UI
- **Flex height chain:** every ancestor from `app-shell` grid to React Flow canvas needs `flex` + `flex-1` + `min-h-0` or `h-full`. Missing `flex` collapses to 0. Same rule for FlowPilot action bar and any tall scroller.
- **React Flow CSS in Tailwind v4:** import in `index.css`, not component JS. Override dark theme via `--xy-*` CSS vars.
- **`text-secondary` renders invisible on dark** — Tailwind v4 maps it to `--color-secondary` (a surface color). Use `text-muted-foreground` for readable secondary text. Avoid `text-muted` for body — labels only.
- **`bg-accent` is electric blue — never for code/kbd.** Use `bg-white/[0.12] border border-white/[0.06]` for inline code, `bg-white/[0.08]` for kbd. Accent reserved for interactive elements.
- **`landing.css` uses self-contained `--lp-*` vars** — never `var(--color-*)` theme tokens (they resolve incorrectly outside the app shell).
- **Never `transition: all`** — list properties explicitly, or layout props animate and jank.
- **Date range filter end dates:** `setHours(23, 59, 59, 999)` before sending, or the day's items are excluded. For string-based date inputs, append `T23:59:59.999Z`.
- **TopBar search:** full bar `hidden sm:block`, icon button `sm:hidden` — both open CommandPalette.
- **Hover pop-out cards:** scrim `pointer-events-none`, expanded card has its own click handler at `z-50`, dismiss via `onMouseLeave` on wrapper. Never put handlers on the scrim.
- **`tsc -b` in Dockerfile is stricter than `tsc --noEmit`** — enforces `noUnusedLocals` / `noUnusedParameters` as hard errors. Check IDE yellow squiggles before pushing.
- **Dashboard prefill auto-submits** via `useEffect` + `prefillHandledRef` guard — no double-enter.
- **Global Axios 5xx interceptor fires before component `.catch()`** — fix optional-data endpoints at the source (return `[]` / `{}` on provider failure), not in the component.
- **Playwright strict mode:** scope selectors to avoid sidebar/main ambiguity. Use `getByRole('heading', { name })` or `.animate-scale-in` locators, not bare `getByText()`.
### Env / infra
- **Node 20.19+ required** (Vite 7). `nvm use 20` or `PATH="$HOME/.nvm/versions/node/v20.19.0/bin:$PATH"`.
- **Railway backend service is `patherly`, DB name `railway`.** Public Postgres proxy: `interchange.proxy.rlwy.net:45797`.
- **Railway Object Storage bucket `resolutionflow-uploads`.** Env vars `STORAGE_*`. boto3 in `storage_service.py`. Dockerfile needs Pillow + `libjpeg-dev` / `zlib1g-dev`.
- **PostHog:** `PostHogProvider` + `posthog.init()` in `main.tsx`. Helpers in `lib/analytics.ts`. Env: `VITE_PUBLIC_POSTHOG_KEY`, `VITE_PUBLIC_POSTHOG_HOST`. `identifyUser()` in `authStore.fetchUser()`, `resetAnalytics()` on logout.
- **bun PATH on devserver01:** `BUN_INSTALL="$HOME/.bun"`, `PATH="$BUN_INSTALL/bin:$PATH"`. Playwright Chromium needs `libatk1.0-0 libatk-bridge2.0-0 libcups2 libxkbcommon0 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2`.
- **Full-stack change:** trace schema → endpoint → API client → hook → store → UI. Don't assume one end proves the other.
- **Dev env** — see [DEV-ENV.md](../DEV-ENV.md) for current topology, `REPO_ROOT` requirement when compose runs inside a container, Vite `allowedHosts`, linuxserver.io `group_add` + custom-cont-init.d workaround, `docker compose up` no-op-on-unchanged-hash gotcha.
---
## Quick reference
| What | Where |
|---|---|
| Detailed status | [CURRENT-STATE.md](../CURRENT-STATE.md) |
| Roadmap | [03-DEVELOPMENT-ROADMAP.md](../03-DEVELOPMENT-ROADMAP.md) |
| Design system | [DESIGN-SYSTEM.md](../DESIGN-SYSTEM.md) |
| Dev env | [DEV-ENV.md](../DEV-ENV.md) |
| Archived lessons | [docs/LESSONS-ARCHIVE.md](../docs/LESSONS-ARCHIVE.md) |
| ConnectWise API | `docs/connectwise/` |
| GitHub issues | `gh issue list --state open` |
| Local API docs | <http://localhost:8000/api/docs> |
| Handoff system | [.ai/README.md](README.md) |

42
.ai/README.md Normal file
View File

@@ -0,0 +1,42 @@
# .ai/ — dual-agent handoff system
ResolutionFlow uses two coding agents: **Claude Code** (primary) and **OpenAI Codex** (resume when Claude hits session or weekly limits). This directory holds the shared state that lets either agent start a session with full context.
## Files
| File | Holds | Written when | Read when |
|---|---|---|---|
| [PROJECT_CONTEXT.md](PROJECT_CONTEXT.md) | Stable repo truth: stack, structure, SaaS shape, ConnectWise, coding standards, frontend patterns, critical lessons | Only when the repo's shape changes | Every session start |
| [CURRENT_TASK.md](CURRENT_TASK.md) | The single active task: goal, DoD, assumptions, out-of-scope | On task start; status updates during work | Every session start |
| [HANDOFF.md](HANDOFF.md) | Exact resume point: branch, where you left off, next steps, blockers | On session end / context-window limit | Every session start (most important) |
| [TODO.md](TODO.md) | Backlog of work NOT currently active | When deferring or queueing work | Only when `CURRENT_TASK.md` is `complete` |
| [DECISIONS.md](DECISIONS.md) | Append-only architectural decision log | When an architectural choice is made | Skim top entries each session |
| [SESSION_LOG.md](SESSION_LOG.md) | Append-only chronological history | On session end | Only when broader context is needed |
Agent-specific tooling lives at the repo root:
- [../CLAUDE.md](../CLAUDE.md) — Claude Code's tooling (GitNexus, gstack slash commands, Claude trailer)
- [../AGENTS.md](../AGENTS.md) — OpenAI Codex's tooling (grep/rg fallbacks, Codex trailer)
Both root files contain an **identical shared-protocol block**. If you edit one, edit the other.
## The handoff ritual
At session end (limit hit, task complete, or user stop): update `HANDOFF.md` to reflect the new resume point, update `CURRENT_TASK.md` status if it changed, append to `DECISIONS.md` if you made an architectural call, append a session entry to `SESSION_LOG.md`, and WIP-commit any dirty working tree with `wip(handoff): <one-line>` unless told otherwise. Don't push.
## How to invoke a resume
Tell the agent:
> Read CLAUDE.md (or AGENTS.md) and follow its instructions.
The agent will read its root file, which directs it to `.ai/PROJECT_CONTEXT.md`, `.ai/CURRENT_TASK.md`, and `.ai/HANDOFF.md` before doing anything else.
## Recovery
The previous monolithic CLAUDE.md is recoverable via:
```bash
git show pre-ai-handoff:CLAUDE.md
```
(Tag `pre-ai-handoff` on commit `e110fed` — the snapshot taken before this migration.)

348
.ai/SESSION_LOG.md Normal file
View File

@@ -0,0 +1,348 @@
# SESSION_LOG.md
> Append-only chronological record. Newest entries at the top. Skim when broader context is needed.
> Entry format:
>
> ```
> ## YYYY-MM-DD HH:MM <timezone> — <agent> — <one-line summary>
> - What was accomplished
> - What was left for next session
> - Files touched
> ```
---
## 2026-05-07 11:45 EDT — Codex — Push PR #162 CI runner setup fixes
- Inspected Gitea PR #162 via public API. PR head was `380fcf7` and all CI jobs failed quickly; pushed local commits through `4a37a47`, including Python 3.12 setup for Gitea backend/e2e jobs.
- New run on `4a37a47` showed frontend still failed quickly while backend/e2e remained pending. Root cause likely same class of runner drift: Gitea frontend/e2e jobs used `npm` without setting up Node.
- Added explicit `actions/setup-node@v4` with Node 20 to Gitea frontend and e2e jobs. This keeps CI from relying on runner ambient Node/npm.
- Files touched: `.gitea/workflows/ci.yml`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-05-07 11:30 EDT — Codex — Standardize backend Python on 3.12
- Standardized repo declarations around Python 3.12: added `.python-version` pinned to 3.12.13, updated stale Python 3.11 docs, and added explicit Python 3.12 setup steps to Gitea CI. GitHub CI was already updated to Python 3.12 by the user.
- Installed pyenv Python 3.12.13 and created `backend/venv` from that interpreter. Installed `backend/requirements-dev.txt` into the venv.
- Verified native `python --version` and venv `python --version` both report 3.12.13. Verified native `pytest 8.4.2` and `alembic 1.18.3` with explicit safe test env vars; plain pytest import still depends on local `.env` values being valid.
- Rebuilt and restarted the dev backend container with `docker compose -f docker-compose.dev.yml build backend` and `up -d backend`; confirmed `docker exec resolutionflow_backend python --version` reports 3.12.13.
- Files touched: `.python-version`, `.gitea/workflows/ci.yml`, `.github/workflows/ci.yml`, `README.md`, `DEV-ENV.md`, `.ai/PROJECT_CONTEXT.md`, `.ai/DECISIONS.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-05-07 11:14 EDT — Codex — Recheck native Python availability
- Re-ran the startup ritual and checked the host Python state after the user reported fixing the missing native Python issue.
- Verified `python` and `python3` resolve to `/config/.pyenv/shims/*` and run Python 3.12.10. `pip` and `pip3` are available as pip 25.0.1 under the same pyenv install.
- Confirmed there is no native `python3.11`, pyenv currently lists only `3.12.10`, no repo virtualenv exists under `backend/venv`, `backend/.venv`, or root `.venv`, and `python -m pytest --version` from `backend/` fails with `No module named pytest`.
- Conclusion: native Python is present, but it is not yet a ready backend dev/test environment for ResolutionFlow. Docker remains the reliable path for pytest/alembic until a Python 3.11 virtualenv with `backend/requirements*.txt` is installed.
- Files touched: `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-05-06 — Claude — Self-serve signup Phase 2 (frontend + cutover code) shipped on `feat/self-serve-signup-phase-2`
- Executed Tasks 2744 of `docs/superpowers/plans/2026-05-06-self-serve-signup-phase-2-frontend-cutover.md` via `superpowers:subagent-driven-development`. 18 commits on `feat/self-serve-signup-phase-2` (off `main` `f918b76`); HEAD `c75ce0c`. Each task: dispatched implementer subagent with full task text + curated context, then spec-compliance + code-quality review subagents; review issues either fixed in-flight via `git commit --amend` or noted as deferred scope.
- Backend (Phase I, Tasks 2731): `BillingService.open_customer_portal` + `GET /billing/portal-session`; `PATCH /users/me/onboarding-step` + dismiss-rest sibling; public `POST /sales-leads` (5/hr/IP); `/admin/plan-limits` GET/PUT round-trips `plan_billing` in one transaction with NOT-NULL guards on `display_name|is_public|is_archived|sort_order`; `BillingService.invalidate_billing_cache` no-op stub; `GET /config/public` (`{self_serve_enabled, oauth_providers}`); `auth/register` invite-code gate now `REQUIRE_INVITE_CODE and not SELF_SERVE_ENABLED and not invite_code`. Also (T36): `GET /accounts/invites/{code}/lookup` (public, joinedload account+inviter); OAuth callback honors `account_invite_code+invited_email`, rejects existing-email user with `email_already_registered_use_login`. Also (T42, T44): `GET /plans/public`; `POST /beta-signup` returns 307 to `${FRONTEND_URL}/register?from=beta`. `OnboardingStatus` extended with `email_verified`+`shop_setup_done`; `UserResponse` exposes `onboarding_step_completed`+`onboarding_dismissed`.
- Frontend (Phases JN, Tasks 3244): `useBillingStore` Zustand store + `useBillingPoll` mounted in `AppLayout`; `useFeature` / `useFeatureLimit` (60s module cache, lazy `/usage/{field}` fetch with silent fallback — endpoint deferred) / `useTrialBanner` (fractional-day boundary so 24h = warning); `FeatureGate` / `UpgradePrompt` (inline `FEATURE_CATALOG`) / `EmailVerificationGate` (mounted in AppLayout around `<ViewTransitionOutlet />`). `RegisterPage` redesign with OAuth buttons + invite-code conditional; `OAuthCallbackPage` with CSRF state validation + UTF-8-safe base64url state encoding (factored into `lib/oauthState.ts`); `useAppConfig` hook. `AcceptInvitePage` at `/accept-invite` with locked email; `EmailVerificationBanner` refactored to design-system tokens; `EmailVerificationWall` polished; `VerifyEmailPage` at `/verify-email` with single-fire ref guard; `WelcomeRouter` + `WelcomeStep1/2/3` at `/welcome*`; `TrialPill` in topbar (8 stages); `NextStepCard` + `SetupChecklist` (replace orphaned `OnboardingChecklist`); `PricingPage` at `/pricing`; `ContactSalesPage` at `/contact-sales`; `LandingPage` got "See pricing" CTA + replaced beta-signup form with `<Link>`.
- Final cross-cutting review caught one real bug — relative `/beta-signup` 307 target landing on API origin instead of frontend — fixed via amend (HEAD `c75ce0c`).
- Tests: ~165+ new tests across backend pytest + frontend vitest. Sweep at end-of-branch all-green; tsc -b clean.
- Phase O (Tasks 4547) is explicit manual operations: Stripe live-mode setup, internal validation via `INTERNAL_TESTER_EMAILS` per-email allowlist (backend support for that allowlist is NOT yet built), feature-flag flip + week-1 monitoring. Surfaced as the resume point in HANDOFF.md.
- Working tree was dirty before this session (`.ai/HANDOFF.md`, `.env.example`s, `core.*` core dumps, `docs/architecture/`, `docs/tutorials/`); intentionally not staged into Phase 2 commits. Files touched: see `git log --oneline f918b76..HEAD` on `feat/self-serve-signup-phase-2`.
---
## 2026-05-02 ~01:00 UTC — Claude — In-product User Guides Diátaxis rewrite shipped (PR #159)
- Audited the in-product `/guides` collection against live UI via `/browse` (engineer + owner test users). Existing 15 guides predated the FlowPilot pivot — every "click X in the sidebar" reference was wrong (Dashboard → Home, All Flows → Flows, Sessions → History, Exports gone, etc.). Three guides described surfaces that no longer exist: Maintenance Flows, AI Assistant page, Flow Assist Sparkles button. Findings written to `/tmp/guides-audit.md`.
- Rebuilt `frontend/src/data/guides.ts` from scratch as 43 problem-oriented Diátaxis how-tos under 10 categories. Single-outcome each, terse imperative steps, real UI labels (Create New, Sign in, Manage, Build New Script, Send Invite, Save Settings, Create Category, etc.). Added `category: CategoryId` and optional `relatedSlugs?: string[]` to the `Guide` interface; new `Category` type and `categories` const drive the hub layout. `GuidesHubPage` now renders category sections (auto-hides empty); `GuideDetailPage` renders a Related guides footer; `GuideCard` lost its misleading "N sections" subtitle.
- Fixed `GuideSection.tsx`: `step.tip` was rendered as plain text so `**bold**` markdown in tips rendered literally. Applied the same regex replacement used on `step.instruction`. Verified against `/guides/start-a-session` tip block.
- Authored 14 net-new how-tos for FlowPilot-era surfaces with no prior coverage: tasklane-keyboard-flow, view-what-we-know, ask-ai-mid-session, pause-and-leave-session, resolve-a-session, record-suggested-fix-outcome, escalate-a-session, post-docs-to-ticket, send-client-update, build-script-from-scratch, open-suggested-flow, pin-a-flow, invite-teammate. Dropped change-teammate-role from scope — couldn't verify the role-change UI control without a non-owner test member.
- Verified owner-only surfaces with `pro@resolutionflow.example.com`: Membership inline form on `/account` (not a separate `/team-members` route), `/account/categories` real button is **Create Category** (not Add), `/account/chat-retention` real fields are **Retention Period (days)** + **Max Conversations** + **Save Settings**, `/account/integrations` form fields confirmed. Three guides corrected post-audit.
- Smoke-tested all 43 detail pages — every slug renders, no "Guide Not Found" fallthroughs.
- Added `100.64.78.44 docker-01` entry to `/etc/hosts` (user ran `sudo tee` from a normal terminal because the LXC `!` shell prefix can't drive interactive sudo). Should now persist across `/browse` sessions on this LXC.
- `docker exec -w /app resolutionflow_frontend npx tsc -b` clean.
- Files touched: `frontend/src/data/guides.ts`, `frontend/src/pages/GuidesHubPage.tsx`, `frontend/src/pages/GuideDetailPage.tsx`, `frontend/src/components/guides/GuideCard.tsx`, `frontend/src/components/guides/GuideSection.tsx`, `CHANGELOG.md`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`. Working tree dirty — user not yet asked to commit.
---
## 2026-05-01 21:55 UTC — Claude — Session-screen impeccable pass + tasklane keyboard flow shipped (PR #158)
- Ran the `/impeccable` skill against the assistant chat session screen (chat history / chat bar / TaskLane). Initial design-health score: 24/40 with explicit DESIGN-SYSTEM violations (gradient surfaces in WhatWeKnow + ProposalBanner, side stripes in TaskLane done states + every banner mode, accent borderTop on lane header, backdrop blur on handoff overlay).
- Walked through all 5 impeccable sub-passes (distill, quieter, layout, typeset, polish). Score after pass: 33/40 (+9). Biggest gains in Aesthetic & Minimalist (1→3), Consistency & Standards (1→3), Recognition Rather Than Recall (2→4).
- Inline iterations on top of the impeccable steps: linked banner ↔ script-panel lifecycle (collapse hides both, dismiss closes both, any outcome closes both); collapsible WhatWeKnow with `sessionStorage` memory + auto-collapse-at-5-facts; full keyboard flow on TaskLane (Enter submits + auto-advances, Shift+Enter newline, Esc cancels, focus jumps to Send Responses after the last task).
- Side fix: `ParameterizationPreview` was over-highlighting short parameter values (a `"D"` lit up every capital D in `Get-ADUser`/`Add-Type`/etc.). Added a word-boundary guard, conditional on whether the value itself starts/ends with a word character so values with leading punctuation (`"D:\\Folder"`) still match cleanly.
- Followups logged in `.ai/TODO.md`: `ConcludeSessionModal` multi-select for paused/escalated outcomes (real feature work — engineers often need ≥2 of Ticket Notes / Client Update / Email Draft), and `bg-card-hover` Tailwind drift in `CommandPalette` (silently broken classes — two-line fix).
- Branched as `feat/session-distill-quieter`, 4 commits (impeccable pass, parameterize fix, TODO followups, hint contrast + font-sans audit). PR #158 created via Gitea API (`$GITEA_TOKEN` env, no `gh` on this LXC). Merged into `main` as `5e10005`. Local branch deleted.
- Validation at every commit boundary: `docker exec -w /app resolutionflow_frontend npx tsc -b`, `npm run lint`, and `npm run build` all clean.
- Files touched: 14 frontend files (TaskLane, AssistantChatPage, ChatMessage, ProposalBanner, WhatWeKnow, WhatWeKnowItem, SuggestedFlowCard, ChatSidebar, ConcludeSessionModal, ChatTabStrip, ActionCardGroup, AddNoteButton, ParameterizationPreview), `.ai/TODO.md`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `CHANGELOG.md`, `CURRENT-STATE.md`.
## 2026-05-01 07:20 UTC — Codex — Start issue cleanup plan sections 1 and 2
- Started `docs/plans/2026-05-01-issue-cleanup-plan.md` sections 1 and 2.
- Cleaned frontend lint to zero warnings by removing stale lint disables, tightening hook dependencies, and adding justified comments where effects are intentionally keyed to route or owner identity.
- Added e2e selectors for session history controls and the FlowPilot command-palette entry.
- Added `AssistantChatPage` observability for unexpected `currentChatRef` stale async discards.
- Added `TaskLane` diagnostic help affordances for common command categories and documented #128 as "keep the existing responsive side-panel/bottom-drawer behavior until pilot feedback says otherwise."
- Verified `npm run lint`, `npx tsc -b`, and `npm run build` in `resolutionflow_frontend`; build only reported the existing Vite large-chunk warning.
- Files touched: frontend lint-cleanup files, `frontend/src/components/assistant/TaskLane.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `frontend/src/pages/SessionHistoryPage.tsx`, `frontend/src/components/layout/CommandPalette.tsx`, `docs/plans/2026-05-01-issue-cleanup-plan.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-05-01 06:05 UTC — Codex — Clean stale TODOs and add issue cleanup plan
- Removed the resolved pytest-xdist item from `.ai/TODO.md` and reset "Up next" to no selected task.
- Removed the resolved "Add role gate to handoff claim endpoint" backlog item from `.ai/TODO.md`.
- Updated the frontend lint cleanup TODO from 23 warnings to the current `npm run lint` result: 24 warnings, 0 errors.
- Tried to close Gitea #127 through the API, but this environment has no Gitea token; API returned `401 token is required`.
- Added `docs/plans/2026-05-01-issue-cleanup-plan.md` with safe tracker actions and a recommended order for clearing remaining issues.
- Files touched: `.ai/TODO.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `docs/plans/2026-05-01-issue-cleanup-plan.md`.
## 2026-05-01 05:40 UTC — Codex — Audit TODO backlog and Gitea issue validity
- Compared `.ai/TODO.md`, inline code TODOs, and open Gitea issues against current `main`.
- Verified pytest-xdist is already shipped (`backend/requirements-dev.txt`, `backend/tests/conftest.py`, `.gitea/workflows/ci.yml`) so the `.ai/TODO.md` xdist item is stale. Ran frontend lint in Docker; current state is `0 errors, 24 warnings`, so the lint cleanup item remains valid but its count is stale.
- Verified Gitea issue status: #58, #60, #128, #129, #130 remain valid; #66 is partially resolved by current `.rfflow` import/export and should be narrowed to template packs/marketplace; #127 is mostly resolved by current UI copy and prompt boundaries unless an always-visible scope badge is still wanted. Open PR #124 is stale/unmergeable against current `main`.
- Verified inline TODOs still valid: post-session contextual feedback prompt, FlowPilot analytics domain/time-entry placeholders, prompt-cache verification note unless live telemetry has confirmed it, proposal `modify` flow editor wiring, and procedural ghost-step accept/dismiss buttons.
- Files touched: `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-05-01 03:45 UTC — Claude Opus 4.7 — QA, merge, and ship PR #156 pending-verification
- Committed two logical units of pending work on `feat/fix-pending-verification`: prior session's local review fixes as `5bee264` (Codex-attributed, 5 source files + 3 `.ai/` notes) and this session's docker-exec docs as `15042af` (Claude-attributed, `.ai/PROJECT_CONTEXT.md` + `AGENTS.md`). Cleaned up a 20MB `core.22120` Chromium dump left behind by an earlier sandbox crash.
- Resolved a tooling gap surfaced by Codex's prior session ("npm/python/python3 are not on the host path") by documenting that this code-server LXC uses bun + docker for the toolchain. The `docker exec resolutionflow_{backend,frontend}` form is now the canonical command pattern in `.ai/PROJECT_CONTEXT.md`.
- Got `$B`/Playwright Chromium running in the code-server LXC. After the user's restart cleared the AppArmor unprivileged-userns block, Chromium still aborted at the deeper `sandbox/linux/services/credentials.cc` layer because of the LXC namespace constraint. Workaround: launch browse with `CONTAINER=1` so it auto-adds `--no-sandbox`. Also added `100.64.78.44 docker-01` to code-server's `/etc/hosts` (via `docker exec -u 0`) so the headless browser could resolve the bake-in `VITE_API_URL`.
- Drove `/qa` against the dev stack at `http://100.64.78.44:5173`. No naturally-occurring `applied_pending` fix existed in the DB, so seeded session `4a558056-bcbd-4b51-925b-248d70eb318d` and fix `cd4ff2fd-751a-4bcb-8cfa-3c77b4864fb2` into the test state (un-resolved session, swapped supersession on the two fixes). Saved a restore script first; verified DB matches pre-test state after teardown.
- QA result: 5/7 scripted checks PASS with concrete DB + UI evidence. Banner renders correctly ("Awaiting verification" header, "Parked" tag, fix title + pending_reason, 4 actions). "Update reason" updates server-side. "It worked" → `applied_success` with `verified_at` stamped. "Dismiss" → `dismissed` with no terminal timestamp. Page-level Resolve auto-patches `applied_pending``applied_success` before the resolution flow opens. Page-level Escalate fires `EscalateInterceptDialog` with the generalized "still needs an outcome" copy. 2 entry-path checks (VerifyingBanner overflow, nudge "Still checking") deferred because they require live AI-generated chat state to drive; the mutating handlers behind those entry paths are verified via the tested transitions. Report at `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md`.
- Pushed `feat/fix-pending-verification`. Polled Gitea actions runs 161; required `CI / frontend` and `CI / backend` plus `CI / e2e` all green. Merged via Gitea API as a merge commit (`3ba4532`).
- Post-merge cleanup: fast-forwarded local `main`, deleted `feat/fix-pending-verification` locally and on the remote. Wrote handoff updates on `chore/post-156-handoff` matching the prior `chore/post-153-handoff` pattern.
- Files touched (this session): `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/PROJECT_CONTEXT.md`, `.ai/SESSION_LOG.md`, `AGENTS.md`, `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md`, `.gstack/qa-reports/screenshots/01-08*.png`. Plus the two prior-session-authored commits committed by this session (5 source + 3 `.ai/` notes).
## 2026-05-01 02:24 UTC — Codex — Review-fix PR #156 pending-verification flow
- Reviewed PR #156 for bugs and found three actionable gaps: pending fixes could be resolved from the page-level Resolve path without updating the fix outcome, the PendingBanner lacked the dismiss action described in the PR body, and new system-prompt examples used real-looking pending reasons contrary to the prompt anti-parrot lesson.
- Applied fixes locally on `feat/fix-pending-verification`: page-level Resolve now patches `applied_pending` to `applied_success`; page-level Escalate now intercepts `applied_pending` before handoff; PendingBanner now has Dismiss; escalation intercept copy no longer says only "Verifying state"; generator prompts no longer include real-looking pending examples.
- Verified via running containers: prompt anti-parrot guardrail `2 passed`, suggested-fix outcome suite `21 passed`, frontend `npx tsc -b` clean, frontend `npm run build` clean except the existing Vite large-chunk warning, and `git diff --check` clean.
- Left for next session: browser QA PR #156 using CURRENT_TASK.md checklist, then commit/push local review fixes and merge.
- Files touched: `backend/app/services/resolution_note_generator.py`, `backend/app/services/escalation_package_generator.py`, `frontend/src/components/pilot/ProposalBanner.tsx`, `frontend/src/components/pilot/EscalateInterceptDialog.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/SESSION_LOG.md`.
## 2026-04-30 — Claude Code — Land PR #155, ship pending-verification feature on PR #156
- Committed Codex's review-pass changes (atomic conditional `UPDATE` for `claim_session`, self-claim 403, queue self-exclusion, pre-flush handoff UUID, frontend dead-code removal) as `f10649a` on `feat/escalation-metric-endpoint`.
- Pushed `feat/escalation-metric-endpoint`, un-drafted PR #155, retitled it (stripped "WIP:"), and merged via Gitea API as a merge commit (`ac42f97`). 4/4 CI checks green at merge.
- Picked up follow-up work surfaced by the user: the suggested-fix verifying banner forces a synchronous verdict, but real fixes are often async (waiting on client power-cycle, AD replication, license sync). Added a fourth, non-terminal outcome.
- Designed the model: new `FixStatus="applied_pending"` parallel to `applied_partial`. Distinct semantics — partial = "did some of it"; pending = "did all of it, can't verify yet." Distinct prose in the resolution-note + escalation-package generators.
- Implemented on a fresh branch `feat/fix-pending-verification` off main:
- Backend: extended `FixStatus`/`FixOutcome` literals, added `pending_reason` Text column and CHECK constraint update via Alembic migration `c0f3a4b7e91d`. `patch_outcome` accepts pending, requires notes, stamps `applied_at` only (NOT `verified_at`); pending in/out transitions allowed.
- Frontend: new `BannerMode='pending'` + `PendingBanner` component (info-tone, mirrors `PartialBanner`). "Waiting to verify…" added to `VerifyingBanner` overflow menu. `NudgeBanner` "Still checking" button now records `applied_pending` with a reason instead of just silencing for the session — closes the loop semantically. `AssistantChatPage` banner-mode derivation maps the new status.
- Tests: 4 new integration tests in `test_fix_outcome_endpoint.py` covering notes-required, reason-storage with applied_at-not-verified_at semantics, pending→success transition, and pending_reason update on re-PATCH. 21/21 pass.
- Validation: `tsc --noEmit -p tsconfig.app.json` exit 0; `alembic upgrade heads` applied cleanly.
- Single-commit PR #156 opened: https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/156. Branch rebased onto post-merge main.
- Cleanup: removed 10 stray `core.*` dumps from the worktree; deleted merged `feat/escalation-metric-endpoint` locally and on the remote.
- Files touched: `backend/app/models/session_suggested_fix.py`, `backend/app/schemas/session_suggested_fix.py`, `backend/app/api/endpoints/session_suggested_fixes.py`, `backend/app/services/resolution_note_generator.py`, `backend/app/services/escalation_package_generator.py`, `backend/tests/test_fix_outcome_endpoint.py`, `backend/alembic/versions/71efd2102f49_add_pending_status_to_suggested_fixes.py`, `frontend/src/api/sessionSuggestedFixes.ts`, `frontend/src/components/pilot/ProposalBanner.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/DECISIONS.md`.
---
## 2026-04-30 06:25 UTC — Codex — Apply Escalation Mode review fixes
- Reviewed the recent Escalation Mode wedge work and fixed the actionable findings before PR #155 is marked ready.
- Reworked `HandoffManager.claim_session` from read-then-write to an atomic conditional update, preserving idempotent same-user retries and returning a typed conflict for a different claimant.
- Blocked original engineers from claiming their own handoffs and filtered their own escalated sessions out of `/ai-sessions/escalation-queue`, preventing the post-escalation dashboard from showing a junior their own handoff.
- Fixed the compatibility payload so `session.escalation_package["handoff_id"]` is populated from a preassigned UUID before flush.
- Removed unused legacy frontend pickup state (`claiming`, `handleStartHere`, unused `onStartHere` destructuring) that made `tsc -b` fail under `noUnusedLocals`.
- Added regression coverage for pre-flush handoff IDs, conflict handling, self-claim rejection, successful non-owner claim, and own-escalation queue exclusion.
- Verified `git diff --check`; focused backend tests passed (`28 passed in 42.23s`); frontend `tsc --noEmit` checks passed for app and node configs. Full Vite/build script remains blocked by root-owned generated directories under `frontend/node_modules` / `frontend/dist` in this workspace, not by TypeScript errors.
- Files touched: `backend/app/services/handoff_manager.py`, `backend/app/api/endpoints/ai_sessions.py`, `backend/app/api/endpoints/session_handoffs.py`, `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `frontend/src/components/flowpilot/HandoffContextScreen.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-04-30 — Claude Code — Browser QA pass complete; chat ownership bug found and fixed; PR #155 ready
- Ran full browser QA pass on the escalation mode feature using gstack `/qa` skill.
- **Critical bug found and fixed (commit `dc69c9d`):** `POST /ai-sessions/{id}/chat → 400` when senior clicked "Get AI analysis" on the magic-moment screen. Root cause: `unified_chat_service.send_chat_message` checked `AISession.user_id == user_id` only; senior is stored as `escalated_to_id`, not `user_id`. Fix: `or_(AISession.user_id == user_id, AISession.escalated_to_id == user_id)` in the WHERE clause.
- **All 7 QA scenarios passed:**
- Post-escalation redirect: junior routed to `/` with "Session escalated" toast.
- Magic-moment screen: header, metadata, two-column AI assessment, 2-option CTA rendered correctly.
- "I'll take it from here": claim → dismiss overlay → composer focused.
- "Get AI analysis": claim → briefing sent → AI responded → task lane populated (after `dc69c9d` fix).
- Task lane copy button: toast + checkmark visual feedback.
- Chip expansion: inline detail card + "Open in Tasks panel" scroll.
- Post-claim toolbar re-open: dismissible mode with Close-only CTA.
- **Known non-blockers:** "Continue where X left off" path untestable on first pickup (`hasTaskLane=false` is correct v1 behavior). 409 race condition untestable with one senior account; backend logic code-reviewed and correct.
- Backend tests: 17/17 pass.
- Updated `HANDOFF.md` to reflect QA complete; updated `CURRENT_TASK.md` status to engineering+QA complete; appended architectural decision to `DECISIONS.md`.
- Branch `feat/escalation-metric-endpoint` is ready for PR #155 to be marked ready-for-review.
- **Files touched this session:** `backend/app/services/unified_chat_service.py`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/DECISIONS.md`, `.ai/SESSION_LOG.md`.
---
## 2026-04-29 04:30 EDT — Claude Code — Live QA bash, pickup bug fixes, AI summary consolidation surfaced
- User on a freshly swapped computer ran the live QA flow. Identified two bugs missed by static analysis from the previous session:
- **Pickup landed on a blank chat surface.** Root cause: commit `8914391` had made `activeChatId` initialize from `urlSessionId`, which broke the selectChat-gating effect in `AssistantChatPage` (`urlSessionId === activeChatId` short-circuited fresh mounts). Symptom was `selectChat` never firing post-claim; messages, conversation history, and pickup-flow correctness all silently broken.
- **Picked-up session missing from sidebar.** Root cause: `loadChats` runs once at mount; pre-claim the session's `escalated_to_id` is null (the junior didn't specify a target), so `listSessions` doesn't return it. Post-claim `claim_session` sets `escalated_to_id` to teamadmin, but the sidebar list never refreshes.
- Fixes (commit `0d1b305`):
- Replaced the `urlSessionId === activeChatId` gate with a `loadedChatIdsRef` set so selectChat fires once per URL session per page lifecycle, regardless of whether activeChatId already matches.
- Added `loadChats()` call in `handleStartHere` after the claim succeeds so the sidebar reflects ownership.
- Three additional pieces folded into `0d1b305` from the same QA bash:
- **Enter-to-submit on the escalate forms.** Chat-input convention: plain Enter submits, Shift+Enter inserts a newline. Added optional `onSubmit` prop to `RichTextInput` (used by `EscalateModal`) and inline `onKeyDown` on the plain textarea in `ConcludeSessionModal`. The user explicitly asked for this — they want to type the reason and hit Enter without reaching for the mouse.
- **Dashboard `PendingEscalations` rows expand to preview.** Click a row to reveal escalation reason + step count + confidence tier + PSA ticket number. Pick Up button click-stops to still go directly to magic moment. Single expansion at a time.
- **`ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` bumped 15 → 45.** Backend logs showed Sonnet hitting the 15s timeout in field testing. Background-task architecture (e8ba74e) means this no longer blocks the user — only bounds before publishing `has_assessment: false`. **Did NOT fix the live demo.** Assessment placeholder still permanent in user's test.
- Surfaced an architectural smell: the escalation flow makes **three** Sonnet calls — `_build_escalation_package_enhanced`, `_generate_ai_assessment`, and `generate_status_update` (engineer-triggered) — all summarizing the same source material from slightly different angles. User correctly observed: status update is typically generated during the escalate flow anyway; reusing that content would consolidate.
- Decided the right consolidation: ONE structured AI call per escalation that returns both the magic-moment diagnostic fields (`likely_cause`, `suggested_steps[]`, `confidence`) AND PSA-ready prose. Magic moment populates immediately. Status update buttons become tone-shift transformations (Haiku) of the saved prose, not fresh summarizations. Drops to 1 call (~60% token reduction), eliminates the AI-summary placeholder bug because the work happens in the foreground escalate path. Full implementation plan written into CURRENT_TASK.md and DECISIONS.md.
- Session ended pre-consolidation: user is updating Claude Code CLI and starting a fresh session for clean context window. All work pushed to origin (`0d1b305`). PR #155 still draft.
- Test users for the next session (Acme MSP shared account, password `TestPass123!`): `engineer@` (junior) and `teamadmin@` (senior).
- Files touched: `frontend/src/pages/AssistantChatPage.tsx`, `frontend/src/components/common/RichTextInput.tsx`, `frontend/src/components/flowpilot/EscalateModal.tsx`, `frontend/src/components/assistant/ConcludeSessionModal.tsx`, `frontend/src/components/dashboard/PendingEscalations.tsx`, `backend/app/core/config.py`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/DECISIONS.md`.
## 2026-04-28 02:00 EDT — Claude Code — Plan-locked wedge polish + structural task-lane fix
- Audited `docs/plans/2026-04-27-escalation-mode-wedge-design.md` against the branch and identified four locked-design / Codex-correction items not yet shipped: live AI assessment refresh, suggested-step chips, unread 6px dot on queue cards, and race-condition toast on claim conflict.
- Shipped all four in commit `0f00ee5`:
- **Live AI assessment refresh.** New `HandoffAssessmentReadyEvent` type and `onAssessmentReady` handler on `streamEscalations`. `AssistantChatPage` opens a scoped SSE subscription whenever it tracks a handoff missing its AI assessment; on a matching event it calls `handoffsApi.listHandoffs(sessionId)`, finds the handoff by id, and replaces both `magicHandoff` and `overlayHandoff` in place. Closes the loop on the async-assessment commit `e8ba74e` — without this, the senior had to manually reopen the Context overlay to see the AI assessment when the background task finished.
- **Suggested-step chips.** New `chipsHidden` state in `AssistantChatPage`; chip strip renders above the composer when the magic-moment dissolves and `magicHandoff?.ai_assessment_data?.suggested_steps[]` is non-empty. Click prefills input and focuses; first send via `handleSend` flips `setChipsHidden(true)`; explicit X button also hides. Per-session lifetime by design (Codex correction locked).
- **Unread 6px dot.** localStorage-backed seen set (`rf-escalation-seen`, capped at 200 entries) hydrated in `EscalationQueue`. Card render adds a 6px `bg-accent` dot when not in the seen set. `markSeen` called on Pick Up click AND on card body click (the "open" affordance). Hover deliberately doesn't clear (Codex correction). Pick Up button's onClick now calls `e.stopPropagation()` so it doesn't double-fire the card-open path.
- **Race-condition toast on claim conflict.** New `HandoffAlreadyClaimedError` exception class in `handoff_manager.py`. `claim_session` now eager-loads `claimed_by_user` via `selectinload`, rejects different-user re-claims (idempotent for same-user double-clicks), and raises with `claimed_by_id` / `claimed_by_name` / `claimed_at`. The endpoint translates to HTTP 409 with structured `detail = {error: 'already_claimed', claimed_by_id, claimed_by_name, claimed_at}`. `AssistantChatPage.handleStartHere` extracts via `axios.isAxiosError`, formats `"Already claimed by {name} {time_ago}."` using the existing `timeAgo()` helper, drops `?pickup=true`, and dismisses the magic-moment so the loser flows back to the queue. Backed by 2 new unit tests (`test_claim_session_conflict_raises_already_claimed`, `test_claim_session_idempotent_for_same_user`).
- User then reported that the task-lane stale-flash bug was still happening despite the prior fix `8914391` — "every time we work on something that's related to this, when we go back to test we create a new session and then the task lane shows unrelated session data." The previous fix only covered mount-time entry paths (prefill + pickup); any in-place transition still flashed.
- Shipped structural fix in commit `665530f`. Introduced `taskLaneOwnerChatId` state that explicitly tags which chatId the in-memory `activeQuestions` / `activeActions` / `showTaskLane` values belong to. Set at every populate site (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix). Cleared in `resetSessionDerivedState`. Persistence effect now writes `chatId: taskLaneOwnerChatId` (was `activeChatId` — that was the original write-side bug). Render gate `taskLaneIsForActiveChat = ownerChatId === activeChatId` ANDed into all three render conditions. The lane is structurally unable to display data tagged with a different chat. See DECISIONS entry. **Not yet verified in a real browser** — user is swapping computers and asked for the handoff first.
- The two commits `0f00ee5` and `665530f` are **local-only** at session end. The user did not explicitly authorize a push, so per the handoff rule the branch was left unpushed. First action on resume is `git push`.
- Tests: full handoff + escalation suite (`test_handoff_manager.py`, `test_session_handoffs_api.py`, `test_escalation_bus.py`, `test_flowpilot_analytics_escalations.py`) → 34 passed in 68.89s. Frontend `tsc -b` exit 0 after each commit.
- Files touched: `frontend/src/api/aiSessions.ts`, `frontend/src/components/flowpilot/EscalationQueue.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `frontend/src/types/ai-session.ts`, `backend/app/api/endpoints/session_handoffs.py`, `backend/app/services/handoff_manager.py`, `backend/tests/test_handoff_manager.py`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/DECISIONS.md`.
## 2026-04-27 22:30 EDT — Claude Code — Escalation Mode: unify /escalate through HandoffManager
- User pushed back on the dual-path proposal: "why would we want two different escalation methods? Should the new one just be the way we escalate regardless if we're using a PSA or not using a PSA?" Right answer. Unified everything through `HandoffManager`.
- Backend changes (commit `029680a`):
- `HandoffCreateRequest` gains optional `target_user_id`; rejects self-targeting.
- `HandoffManager.create_handoff` for intent='escalate' now does what the legacy `flowpilot_engine.escalate_session` used to: sets `session.escalation_reason` and `escalated_to_id`, builds the legacy AI-enhanced `escalation_package` via Sonnet (`_build_escalation_package_enhanced` lazy-imported with graceful fallback), and merges handoff metadata (`intent`, `handoff_id`, `snapshot`, `engineer_notes`) into it. Eager-loads `session.steps` + `session.user` via `selectinload` to dodge async lazy-load `MissingGreenlet` errors.
- New `HandoffManager.finalize_escalation`: generates `SessionDocumentation`, pushes to PSA, and runs `notify()` (bell-icon AppNotification + Slack/Teams external channels) — all pre-commit so persistent state lands atomically with the handoff. Pulls engineer name via a separate User query rather than relying on `session.user` lazy access.
- `dispatch_escalation_notifications` keeps only the fire-and-forget IO (bus publish + per-user emails) post-commit. Found and fixed an in-flight bug: had originally put `notify()` inside dispatch (post-commit), which left `Notification` rows uncommitted — moved into `finalize_escalation` (pre-commit).
- `/handoff` endpoint passes `target_user_id` through and calls `finalize_escalation` pre-commit.
- `/escalate` is now a thin shim: owner-only session lookup → `create_handoff(intent='escalate')``finalize_escalation` → commit → `dispatch_escalation_notifications` → return `SessionCloseResponse`. `flowpilot_engine.escalate_session` is no longer called by any endpoint.
- `pickup_session` accepts both `requesting_escalation` (legacy in-flight) and `escalated` (new canonical) so existing queue items migrate seamlessly.
- Escalation queue list (`/escalation-queue`) and sidebar count match either status.
- Frontend: `useFlowPilotSession` optimistic update flips status to `escalated` instead of `requesting_escalation` so the page state matches the unified backend response.
- Verified end-to-end live against the running dev stack: a single legacy `/escalate` call from `engineer@` produced status=`escalated`, a `SessionHandoff` row (`ea9b375a…`, intent='escalate'), a `SessionDocumentation`, a PSA push attempt (`no_psa` since no ticket), AND an `AppNotification` for `teamadmin@` with title "Session escalated by Jordan Tech" and link `/pilot/{session_id}?pickup=true`. Backend test suite: `1103 passed in 259.63s` with `-n auto`. Frontend `tsc -b` clean.
- The legacy `SessionBriefing` render branch in `FlowPilotSessionPage.tsx` is now effectively dead for any new escalation (magic-moment takes over via the handoff record), but stays in place during the transition for legacy in-flight `requesting_escalation` sessions. Slated for cleanup after pilots run a couple of weeks on the unified path. `flowpilot_engine.escalate_session` is similarly orphaned and can be deleted at the same time.
- Files touched: `backend/app/api/endpoints/ai_sessions.py`, `backend/app/api/endpoints/session_handoffs.py`, `backend/app/api/endpoints/sidebar.py`, `backend/app/schemas/session_handoff.py`, `backend/app/services/flowpilot_engine.py`, `backend/app/services/handoff_manager.py`, `frontend/src/hooks/useFlowPilotSession.ts`.
## 2026-04-27 21:50 EDT — Claude Code — Escalation Mode: bell-icon notification fix; push + draft PR
- User ran a live escalation test via the EscalateModal (legacy `/escalate` path) and reported that clicking the bell-icon notification "just clears the notification instead of taking me to the session". Diagnosed: navigation IS happening, but the notification link template was `/pilot/{session_id}` without `?pickup=true`, so the senior landed on `FlowPilotSessionPage` with no pickup mode. `loadSession` then hit `GET /ai-sessions/{id}` which 404'd because the senior wasn't owner / `escalated_to_id` / picked-up handler. The user perceived the resulting error state as the action having done nothing.
- Two-part backend fix shipped in `641853a`. (1) `_build_notification_link` for `session.escalated` now ends with `?pickup=true` so notification clicks route through the senior-pickup flow (handoff-based or legacy SessionBriefing). (2) `GET /ai-sessions/{id}` access policy: any account member can now read a session's detail when status is `requesting_escalation` or `escalated`. Tenant boundary enforced by RLS — the owner-only guard was overly restrictive for explicitly-shared in-transit states. After-pickup access (handler / `escalated_to_id`) checks still apply for active/resolved sessions.
- Verified end-to-end live: re-login as senior engineer (non-owner, non-target) and `GET /ai-sessions/{escalated-session-id}` returns 200 with full detail. Backend regression with broader subset (`test_escalation_bus`, `test_handoff_manager`, `test_session_handoffs_api`, `test_flowpilot_analytics_escalations`, `test_sessions`, `test_session_sharing`) → 94 passed in 43.26s.
- Pushed `feat/escalation-metric-endpoint` to Gitea. Opened **draft PR #155** against `main` via Gitea API ([gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155](https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/155)). Title prefixed `WIP:` so Gitea marks it `draft: true`. PR body links the design + test-plan artifacts and mirrors the test plan as a checklist with visual QA + e2e demo flow as the unchecked items.
- Open question for next session: EscalateModal still calls the legacy `/escalate` endpoint, not the new `/handoff` path. The wedge demo flow (junior escalates → magic-moment renders) is cleaner if EscalateModal goes through `/handoff`. Legacy path does PSA documentation push that the handoff path doesn't, so a parallel path (legacy escalate also creates a handoff record) is probably the right call rather than full migration.
- Files touched: `backend/app/api/endpoints/ai_sessions.py`, `backend/app/services/notification_service.py`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-04-27 21:30 EDT — Claude Code — Escalation Mode: magic-moment handoff-context screen on pickup
- Continued the same session that shipped the live-arrival SSE subscription. Added the magic-moment screen on top.
- New `frontend/src/components/flowpilot/HandoffContextScreen.tsx`: presentational 4-section view (header with problem summary + domain + step count + escalated-time + priority badge; "What's been tried" with engineer notes + step-count affordance; "AI assessment" with likely_cause / suggested_steps / confidence badge; "Start here" CTA). Confidence badge accepts both numeric (0..1) and string ("low"/"medium"/"high") shapes — backend emits the latter, the frontend type says `number`, runtime handles both. Renders an explicit "assessment unavailable — model didn't respond in time" branch when `ai_assessment_data` is null (the 5s timeout from `9bdd995` fired). `prefers-reduced-motion` swaps `animate-slide-up` for `animate-fade-in`. ARIA `role=dialog` + `aria-modal=true` + focus on primary CTA on mount + Esc dismiss when used as a re-openable overlay.
- Integration in `frontend/src/pages/FlowPilotSessionPage.tsx`: on `/pilot/:id?pickup=true`, fetch the handoff list via `handoffsApi.listHandoffs` (account-scoped via RLS, no claim required) and find the latest unclaimed escalate handoff. If found, render the screen and skip `loadSession` (the senior would 404 pre-claim because they aren't yet `escalated_to_id`). "Start here" calls `handoffsApi.claimHandoff`, drops the `?pickup=true` query, and dismisses the screen — the existing `loadSession` effect then fires because the senior is now `escalated_to_id`. New "Context" toolbar button on active sessions (visible only when the senior arrived via the magic-moment flow this session — handoff lookup on demand) re-opens the screen as a dismissible overlay.
- Verified end-to-end against the running dev stack: `listHandoffs` returns the unclaimed handoff with full payload (engineer_notes, snapshot keys); `claimHandoff` flips session status from `escalated``active` and sets `escalated_to_id`; subsequent `GET /ai-sessions/{id}` succeeds. `tsc -b` exit 0. No backend changes; backend tests still `32 passed in 18.91s`.
- Deferred to TODOs in `CURRENT_TASK.md`: suggested-step chips below the chat input (Codex correction; threads through to `FlowPilotMessageBar`); `HandoffManager._generate_snapshot` expansion to include the recent diagnostic timeline pre-claim (today's snapshot is just `problem_summary, problem_domain, status, step_count, confidence_tier`); toolbar "Context" button visibility on revisited active sessions; owner-facing `/analytics/escalations` page; Playwright e2e for the GTM Loom demo path.
- Branch state: 3 new commits (`b8627f4` SSE subscription, `f65b657` handoff doc bump, `8e9d22e` magic-moment screen). Branch is unpushed — next session pushes + opens draft PR.
- Files touched this slice: `frontend/src/components/flowpilot/HandoffContextScreen.tsx` (new), `frontend/src/components/flowpilot/index.ts`, `frontend/src/pages/FlowPilotSessionPage.tsx`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-04-27 21:00 EDT — Claude Code — Escalation Mode: frontend SSE subscription in EscalationQueue
- Picked up `feat/escalation-metric-endpoint` after the Codex test-stabilization pass. Confirmed green starting state: focused backend subset `32 passed in 18.78s` with `-n auto`.
- Implemented the live-arrival frontend slice. Added `streamEscalations(handlers, signal)` to `frontend/src/api/aiSessions.ts` — fetch-based `ReadableStream` reader (native `EventSource` can't send auth headers) that parses SSE frames (event/data/comment lines), buffers partial frames across chunks, ignores `: keepalive` heartbeats, dispatches `ready` and `handoff_created` events. Added `HandoffCreatedEvent` and `EscalationStreamHandlers` types in `frontend/src/types/ai-session.ts` mirroring the backend bus payload.
- Rewrote `frontend/src/components/flowpilot/EscalationQueue.tsx`. SSE subscription with `AbortController` + exponential-backoff reconnect (1s → 30s cap, attempt counter resets on `ready`). On `handoff_created` the component refetches the queue, diffs against the previous IDs via a `sessionsRef`, prepends new arrivals (newest-first) above established cards (oldest-first preserved). New IDs are tagged for 800ms so the locked 200ms slide-in animation plays before cleanup. Tab-title flash: captures `document.title` at mount, prefixes `(N)` while `document.hidden`, clears on `focus` / `visibilitychange`, restores on unmount. `prefers-reduced-motion: reduce` swaps `animate-slide-in-bottom` for `animate-fade-in`. ARIA: `role="region"` + `aria-live="polite"` on the list, `aria-label="N escalations awaiting pickup"` on the heading; Pick Up button bumped to `py-2.5` to clear the 44px touch floor.
- Verified end-to-end against the running dev stack. `tsc -b` exit 0. Vite HMR'd the new component without errors. Raw SSE handshake against `/api/v1/ai-sessions/escalations/stream` returned 200 with `text/event-stream; charset=utf-8` plus the locked headers (`cache-control: no-cache`, `x-accel-buffering: no`). Subscriber received the `ready` frame on connect; after posting a handoff via the API, the subscriber received the `handoff_created` frame with the full payload — wire format matches the parser exactly. Backend regression: same focused subset still `32 passed in 18.91s`.
- Not yet verified (would need a real browser session): the slide-in animation visually plays, the tab title actually updates, the reduced-motion media-query path, AbortController cancellation on unmount, backoff after a real network blip. Wire contract is confirmed; these are visual/timing-dependent and follow from correct parser + state machine.
- Smoke-test artifact: a single test handoff (`0f6149db…` on session `50ea20d4…`) is sitting in the engineer's queue from the verification step. Harmless; useful as visual demo data.
- Left for next session: the magic-moment handoff-context screen — 4 sections (problem summary / what's been tried / AI assessment / Start here CTA), loads on Pick Up, dissolves into the regular FlowPilot session view. Must render gracefully when `ai_assessment` is `None` (per the 5s assessment timeout from Codex's earlier fix).
- Files touched: `frontend/src/api/aiSessions.ts`, `frontend/src/types/ai-session.ts`, `frontend/src/components/flowpilot/EscalationQueue.tsx`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
## 2026-04-27 EDT — Claude Code — Escalation Mode wedge: design through SSE backend (8 commits)
- One long session that produced the entire planning artifact stack and most of the backend for the Escalation Mode wedge. Output of `/office-hours` (8 founder-signal session, top-tier YC archetype indicators), `/plan-eng-review` (scope reduced from "2-3 weeks greenfield" to "~6-9 days integration + metric + polish" once the existing handoff_manager surface was inventoried), `/plan-design-review` (6/10 → 9/10 with magic-moment screen, hero metric placement, and real-time arrival visual locked), and `/codex review` (12 findings, 6 applied — two-metric framing, notification routing, claim auth gate moved in-scope, unread-state fix, "Start here" CTA reframe, per-channel delivery model; 5 rejected including the full-scope reduction Codex pushed for).
- Branched `feat/escalation-metric-endpoint` off `main` @ `c0ed6d9`. Stack at session end: `d51e95c` plan + test-plan artifacts; `52f6d03` `GET /analytics/flowpilot/escalations` endpoint with 9 tests including multi-tenant isolation; `7a5b853` claim-endpoint role gate; `07d0db9` email dispatch on escalate with graceful-degradation regression; `9f0bfd4` `EscalationMetricCard` mounted above the queue list; `a283d0d` mid-flight `.ai/` refresh; `87bd0b7` WIP commit for SSE pub/sub bus + endpoint + 7 bus unit tests + 1 dispatcher integration test + 2 endpoint tests; `ba46fc5` paused-for-Codex-review handoff. Codex picked up from `ba46fc5` and added `bc15952` / `fff8338` / `9bdd995` (test stabilization + assessment latency bound).
- Pause was forced by a runaway local test loop: multiple stale `pytest` processes were left inside `resolutionflow_backend` after several aborted runs and contended on the same Postgres test schema. Codex diagnosed and fixed (see entry above).
- Frontend: thin slice — added `getEscalationMetrics` to `flowpilotAnalyticsApi`, the `EscalationMetricCard` component (loading / error / zero-data states + avg + median + conversion-rate + the inline two-metric disclaimer), and mounted it above `EscalationQueue`. `tsc -b` clean.
- Plan-stage UI decisions locked into the design doc and the codebase: dedicated 4-section magic-moment screen on Pick Up that dissolves into FlowPilot; queue stat-card + dedicated owner analytics page for the hero metric (in two places, not one); 200ms slide-in + tab-title flash on real-time arrival, no sound, respects `prefers-reduced-motion`; unread dot clears on open/claim/dismiss, NOT on hover (Codex correction). Claim role gate moved in-scope per Codex (not deferred to TODO).
- Two TODOs added: peer-tech escalation (deferred to v2 once a pilot asks); mobile/responsive design (also v2; pre-PMF wedge demo targets desktop). Claim role gate's TODO entry was struck through in the same session because it shipped in `7a5b853`.
- Plan and test-plan artifacts copied into `docs/plans/` under the `YYYY-MM-DD-name-design.md` / `-test-plan.md` convention so they live alongside the existing project plans, not just in `~/.gstack/projects/`.
- Left for next session: frontend SSE subscription in `EscalationQueue.tsx` (fetch-based ReadableStream — native EventSource can't send auth headers; match `streamDocumentation` in `frontend/src/api/aiSessions.ts`), then the magic-moment handoff-context screen, then push + draft PR. Default Claude Code model is being switched from Opus 4.7 1M-context to Opus 4.7 (200k) for the next session — the resume docs are sized to be self-sufficient under the smaller window.
- Files touched (committed): `docs/plans/2026-04-27-escalation-mode-wedge-design.md`, `docs/plans/2026-04-27-escalation-mode-wedge-test-plan.md`, `backend/app/api/endpoints/flowpilot_analytics.py`, `backend/app/schemas/flowpilot_analytics.py`, `backend/app/api/endpoints/session_handoffs.py`, `backend/app/services/handoff_manager.py`, `backend/app/core/escalation_bus.py` (new), `backend/tests/test_flowpilot_analytics_escalations.py` (new), `backend/tests/test_escalation_bus.py` (new), `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `frontend/src/types/flowpilot-analytics.ts`, `frontend/src/api/flowpilotAnalytics.ts`, `frontend/src/components/flowpilot/EscalationMetricCard.tsx` (new), `frontend/src/components/flowpilot/index.ts`, `frontend/src/pages/EscalationQueuePage.tsx`, `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/TODO.md`.
## 2026-04-27 19:50 EDT — Codex — Stabilize Escalation Mode SSE backend tests
- Diagnosed slow backend tests on `feat/escalation-metric-endpoint`. Multiple stale pytest processes were still alive inside `resolutionflow_backend` and held `resolutionflow_test` transactions open, blocking later per-test schema resets on `DROP SCHEMA public CASCADE`.
- Reproduced a deterministic hang in `test_escalations_stream_returns_sse_content_type`: HTTPX `ASGITransport` buffers the full response body before returning, so an infinite SSE response never yielded the initial chunk and kept the auth DB dependency transaction open.
- Fixed `stream_escalations` to release auth dependencies before the long-lived stream body with `Depends(..., scope="function")`.
- Reworked the SSE handshake test to call `stream_escalations()` directly and consume one generator yield, then close it; kept viewer role-gate coverage through the API client.
- Stubbed `_generate_ai_assessment()` in handoff manager/API tests so escalation handoff tests no longer wait on the real AI path.
- Normalized account IDs inside `EscalationBus` so string UUIDs and `UUID` objects hit the same subscriber bucket; added a regression test.
- Verified focused backend subset: serial `31 passed in 46.95s`; xdist `31 passed in 17.80s`. Confirmed no lingering pytest processes or test DB sessions afterward.
- Follow-up in the same session: fixed the product latency risk by adding `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s) around escalation AI assessment generation. If the optional assessment times out, handoff creation continues with no assessment. Added regression coverage; focused xdist subset now `32 passed in 17.77s`.
- Left for next session: continue frontend SSE subscription in `EscalationQueue.tsx`, then the magic-moment handoff-context screen.
- Files touched: `backend/app/api/endpoints/session_handoffs.py`, `backend/app/core/config.py`, `backend/app/core/escalation_bus.py`, `backend/app/services/handoff_manager.py`, `backend/tests/test_escalation_bus.py`, `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/TODO.md`.
## 2026-04-26 03:50 EDT — Claude Code — Ship AssistantChatPage prefill `currentChatRef` fix; close out PR #150
- User reported a troubleshooting-session bug: after answering a subset of task-lane questions and clicking *Send N of M Responses*, no AI response appeared. Traced to `AssistantChatPage`: the dashboard prefill effect set `activeChatId` after creating a new chat session but never updated `currentChatRef.current`. The `currentChatRef.current !== sentForChatId` guard in `handleSend` and `handleTaskSubmit` then bailed silently on every later request and discarded the AI's reply. The user message was already pushed to the chat before the await, so the user saw their answers but nothing else.
- Fix: one-line addition mirroring `handleNewChat` and `handleResumeNew` — assign `currentChatRef.current = session.session_id` immediately after `setActiveChatId(session.session_id)` in the prefill effect. Branched off `origin/main` as `fix/tasklane-prefill-ref`; PR #153 opened on Gitea.
- Authored a Playwright regression test `frontend/e2e/assistant-chat-prefill.spec.ts` that drives the real dashboard prefill flow against the real backend, stubs `/ai-sessions/*/chat` with `page.route` for deterministic turn-1/turn-2 responses, and asserts the second AI message renders. Confirmed the test fails on unfixed code at the exact assertion (`Got it — based on your answer…` never appears) and passes once the fix is restored.
- Verified locally inside `mcr.microsoft.com/playwright:v1.58.2-noble` against the running dev stack: new spec passes, adjacent `flowpilot-chat` spec still passes, `tsc -b` clean. `resume.spec` and `history.spec` failures observed are pre-existing real-backend fixture collisions, unrelated to this change.
- First CI run on PR #153 failed on infrastructure issues already addressed by PR #150: backend hit `Bind for 0.0.0.0:5432 failed: port is already allocated`, frontend hit `actions/upload-artifact@v4 not supported on GHES`. PR #150 was already merged (commit `87bb20b` on `main`). Rebased `fix/tasklane-prefill-ref` onto new `main` (force-push `1a8cb06``1559feb`), resolved a `.ai/TODO.md` conflict by keeping both backlog item sets, kicked off CI on the rebased SHA.
- Confirmed `CI / backend (pull_request)` is now in branch protection's required-status-checks list (added during PR #150 close-out). `CI / e2e (pull_request)` left as not-required pending one more clean PR run as the threshold.
- Recorded the broader silent-return concern in TODO backlog: the `currentChatRef.current !== sentForChatId` guard is applied across `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview`. PR #153 fixes one symptom but the same pattern can mask other drift. Either log a Sentry breadcrumb on the mismatch path or distinguish "expected stale" (chat switch) from "unexpected stale" (ref never updated) so the latter alerts.
- First CI run on the rebased SHA passed backend and frontend but failed e2e: the new prefill regression test couldn't render the task-lane question text. Diagnosed via the job log: `POST /api/v1/ai-sessions` calls `_require_ai_enabled()` and returns 503 when no provider key is set. The e2e CI job had neither `ANTHROPIC_API_KEY` nor `GOOGLE_AI_API_KEY` in env. Locally the dev backend has a real key, hence the local pass. The Playwright `page.route` stub on `/chat` was correct but never had a chance to fire because the upstream session-creation call was 503-ing.
- Fix: added a stub `ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests` to the e2e job env in `.gitea/workflows/ci.yml`. The Playwright stub still intercepts the actual `/chat` call in the browser, so the backend never contacts Anthropic — the gate just needs to clear. Documented the convention in a workflow comment so future AI-touching e2e tests know what to expect. Pushed `11fe32f`; CI went all-green.
- Merged PR #153 as `68fcdc6` on `main`. Local feature branch and remote both deleted via Gitea's `delete_branch_after_merge`.
- Opened a small follow-up `chore/post-153-handoff` PR to refresh the now-stale `.ai/` files (this entry, plus `CURRENT_TASK.md` rolling forward to "no active task — pick from `TODO.md`" and `HANDOFF.md` updating to the post-merge home position). The `data-testid` audit at the top of `TODO.md` "Up next" or the `currentChatRef` silent-return audit added in this session's backlog are the natural next pickups.
- Files touched: `frontend/src/pages/AssistantChatPage.tsx` (the one-line fix + comment), `frontend/e2e/assistant-chat-prefill.spec.ts` (new regression test), `.gitea/workflows/ci.yml` (stub `ANTHROPIC_API_KEY` for e2e), `.ai/TODO.md` (silent-return follow-up entry, plus conflict resolution preserving PR #150's backlog additions), `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md` (this entry).
## 2026-04-25 16:41 EDT — Codex — Stabilize PR #150 e2e selectors
- Investigated the remaining PR #150 failure after backend and frontend CI were green. The e2e resume smoke test was not failing because of product behavior; it used `.bg-card` plus text filtering and matched the tree filter `<select>` before the intended session card.
- Added stable test IDs to flow session, tree, and share cards, then updated affected e2e tests to target those cards instead of Tailwind class names.
- Hardened the CI workflow by making Postgres healthchecks authenticate as `postgres` and baking `VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}"` into the e2e frontend build.
- Verified with `git diff --check`, frontend build in Docker, no remaining `.bg-card` e2e selectors, and focused Playwright runs in an Actions-like Ubuntu container: resume spec passed, then history/library/library-start/resume/shares passed (`6 passed`).
- Left for next session: push this WIP commit to PR #150, watch CI, merge when all three jobs are green, then enable backend branch protection and consider the e2e gate after a reliable green run.
- Files touched: `.gitea/workflows/ci.yml`, `frontend/e2e/history.spec.ts`, `frontend/e2e/library-start.spec.ts`, `frontend/e2e/library.spec.ts`, `frontend/e2e/resume.spec.ts`, `frontend/e2e/shares.spec.ts`, `frontend/src/components/library/TreeGridView.tsx`, `frontend/src/components/library/TreeListView.tsx`, `frontend/src/pages/MySharesPage.tsx`, `frontend/src/pages/SessionHistoryPage.tsx`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/SESSION_LOG.md`.
## 2026-04-25 12:00 America/New_York — Claude Code — Mock final AI-provider test, cache CI deps, parallelize backend with pytest-xdist
- Diagnosed why CI was still red despite Codex's local 1076 passed: a single test (`test_record_decision_persists_and_bumps_state_version`) needed `ANTHROPIC_API_KEY` because the `decision: draft_template` path calls `TemplateExtractionService` → AI provider. Patched `_extract_template_parameters` with an `AsyncMock` so the test no longer depends on AI availability. Verified.
- Pushed Codex's WIP commit `49f8856` to PR #150 (had been local-only per handoff protocol).
- PR #150 (`fix/ci-workflow-config`) extended with cheap CI wins: `actions/cache@v3` for pip + npm in all three jobs; dropped `--cov-report=term-missing` (the custom display step parses JSON); added `--maxfail=10` so structural breakage exits fast.
- PR #151 (`fix/ci-pytest-xdist`) opened, stacked on #150: pytest-xdist with per-worker DB isolation. `conftest.py` reads `PYTEST_XDIST_WORKER`, computes a per-worker DB URL like `…_gw0`, and synchronously CREATEs the DB on first import. The per-test `DROP SCHEMA public CASCADE` then operates on the worker's isolated DB. Verified locally: backend suite went from 22m 27s serial → 4m 28s parallel (8 workers), 1076 passed in both cases. ~5× speedup.
- Decided NOT to do per-test transactional rollback (bigger refactor); captured for future TODO consideration.
- Left for next session: watch CI on both PRs, merge in order (#150 first, #151 second), then enable `CI / backend (pull_request)` as a required status check on main.
- Files touched: `backend/tests/test_session_suggested_fixes_api.py`, `backend/tests/conftest.py`, `backend/requirements-dev.txt`, `.gitea/workflows/ci.yml`, `.ai/HANDOFF.md`, `.ai/CURRENT_TASK.md`, `.ai/TODO.md`.
## 2026-04-25 06:12 EDT — Codex — Fix backend suite to green
- Fixed the real backend failures left after the CI-infra cleanup: tenant-scoped seed drift, missing production `account_id` writes, public route mounting for survey/share links, Script Builder library saves, resolution output async loading, AI search schema metadata, disabled-AI fixture leakage, and prompt marker guardrails.
- Added backend CI/dev system packages required by WeasyPrint PDF export.
- Stabilized the pytest harness for pytest-asyncio/asyncpg teardown ResourceWarnings under `filterwarnings = error`.
- Verified `pytest --override-ini="addopts=" -q` inside `resolutionflow_backend`: `1076 passed, 35 deselected in 1347.41s`.
- Left for next session: commit/push if needed, check and merge PR #150 when Gitea CI is green, add backend CI as a required branch-protection check, and rerun frontend lint if final DoD requires it.
- Files touched: `.gitea/workflows/ci.yml`, `backend/Dockerfile.dev`, `backend/app/api/endpoints/folders.py`, `backend/app/api/endpoints/script_builder.py`, `backend/app/api/endpoints/shares.py`, `backend/app/api/router.py`, `backend/app/models/ai_session.py`, `backend/app/schemas/user.py`, `backend/app/services/assistant_chat_service.py`, `backend/app/services/resolution_output_generator.py`, `backend/app/services/script_builder_service.py`, `backend/pytest.ini`, `backend/tests/conftest.py`, and focused backend tests.
## 2026-04-25 02:00 America/New_York — Claude Code — Land FlowPilot + PSA, recover CI from 488 errors to ~4
- Started session by completing pending FlowPilot Phase 9 QA: ran `/qa` against the seeded fixtures, found and fixed four latent layout/state bugs (`ResolutionNotePreview` off-screen, `TemplateMatchPanel` deadlock when TaskLane closed, `EscalateInterceptDialog` clipped above viewport, `seed_test_users.py` `cancel_at_period_end` NOT NULL crash). Added a new fixture seeder `backend/scripts/seed_phase9_qa_fixtures.py` that pre-bakes the four backend states the AI orchestrator needs to emit, so future QA can exercise all 7 conditional Phase 9 components without depending on stochastic AI behavior.
- Discovered PR #141 (PSA ticket management) and `feat/flowpilot-migration` had 5 overlapping files but only 2 real conflicts (`CLAUDE.md`, `AssistantChatPage.tsx`). Conflicts were both additive — concatenated rather than chose-a-side.
- Merged PSA first (PR #141), then merged FlowPilot (PR #147), each through Gitea API. `tsc -b` clean and visual smoke-test confirmed PSA's Tickets sidebar coexists with Phase 9 ProposalBanner.
- Discovered main had been merging through a broken CI gate for several merges. Initially recommended "stop the line, fix CI before shipping." After scoping the actual rot (~50% of tests red, ~600 errors on a clean run), reversed the recommendation: ship the queue first because FlowPilot itself carried significant test-infra repairs that would be duplicated work on a fresh recovery branch.
- PR #148: two surgical fixes to main (network_diagrams JSONB `server_default` triple-quote bug, deprecated session-scoped `event_loop` fixture in conftest). +78 passing / -114 errors.
- PR #149: frontend lint `20 errors → 0`, `requirements-dev.txt` pytest pin bumped to satisfy `pytest-asyncio==0.24.0`'s `pytest>=8.2`, and a one-line `from app import models as _models` in conftest that registers all ~60 models with `Base.metadata` before `create_all`. The conftest fix collapsed 484 of the remaining 488 backend errors. `1018 passed / 4 errors / 54 failed` after.
- Enabled Gitea branch protection on `main`: PR-only merges, `CI / frontend (pull_request)` required, force-push blocked, no review required.
- Discovered CI on the merge commit STILL showed red despite local pytest being mostly green. Root cause: workflow only set `DATABASE_URL`, but conftest reads only `DATABASE_TEST_URL` (per `dab740d`'s safety hardening). 638 connection-refused errors on every fixture setup. Plus `actions/upload-artifact@v4` not supported by Gitea Actions. PR #150 fixes both.
- Left for next session: merge PR #150 once CI confirms green, add `CI / backend (pull_request)` to required status checks, then root-cause and fix the 54 real backend test failures (one sample seen — `test_user` fixture leaking across calls causing duplicate-email violations).
- Files touched (committed): `backend/scripts/seed_test_users.py`, `backend/scripts/seed_phase9_qa_fixtures.py` (new), `backend/app/models/network_diagram.py`, `backend/tests/conftest.py`, `backend/requirements-dev.txt`, `frontend/src/components/pilot/ResolutionNotePreview.tsx`, `frontend/src/components/pilot/EscalateInterceptDialog.tsx`, `frontend/src/components/pilot/ScriptBuilderTab.tsx`, `frontend/src/pages/AssistantChatPage.tsx`, `frontend/src/pages/FlowPilotSessionPage.tsx`, `frontend/src/pages/TicketsPage.tsx`, `frontend/src/hooks/useFlowPilotSession.ts`, `frontend/src/hooks/useMediaQuery.ts`, `frontend/src/components/dashboard/TicketQueue.tsx`, `frontend/src/components/network/nodes/DeviceNode.tsx`, `frontend/src/components/network/nodes/GroupNode.tsx`, `frontend/src/components/routing/AssistantSessionRedirect.tsx` (new), `frontend/src/router.tsx`, `.gitea/workflows/ci.yml`, `.claude/settings.json` (new), `.claude/hooks/check-gstack.sh` (new), `.gitignore`, `CLAUDE.md`, `.gstack/qa-reports/phase9-*/` (QA artifacts).
- Net merges to main: PR #141 (PSA), PR #147 (FlowPilot), PR #148 (CI fixes part 1), PR #149 (CI fixes part 2). PR #150 still open at session end.
## 2026-04-24 — Claude Code — Migrate to dual-agent handoff system
- Split CLAUDE.md into `.ai/PROJECT_CONTEXT.md` + shared-protocol root files (`CLAUDE.md`, `AGENTS.md`).
- Seeded `CURRENT_TASK.md`, `HANDOFF.md`, `TODO.md`, `DECISIONS.md`, `SESSION_LOG.md`, `README.md`.
- Deleted legacy `SESSION-HANDOFF.md` (superseded).
- Left for next session: first real feature task should replace the seed `CURRENT_TASK.md` and update `HANDOFF.md` with real resume state.
- Files touched: `.ai/*.md` (created), `CLAUDE.md` (rewritten), `AGENTS.md` (created), `SESSION-HANDOFF.md` (deleted).
- Follow-up (same day): Codex review pass flagged stale SaaS-role claim and incomplete file-listings carried over from the pre-migration CLAUDE.md. Verified against `backend/app/core/permissions.py`, `frontend/src/hooks/usePermissions.ts`, `backend/app/api/deps.py`, `backend/app/api/router.py`, and `backend/app/services/psa/`. Corrected PROJECT_CONTEXT.md role hierarchy (`super_admin > owner > engineer > viewer`, not `team_admin`), added `require_account_owner` / `require_team_admin` to deps list, replaced stale endpoint comment with a summary pointing at `api/router.py`, added `exceptions.py` + `ticket_context.py` to the PSA file list. Also replaced seed-example content in `CURRENT_TASK.md` and `TODO.md` with clearer empty-state sentinels.
- Branch cleanup (same day): committed pending test-isolation work as `b14a16a chore(tests): gate RLS tests behind RUN_RLS_TESTS flag`, new Phase 9 review doc as `b3506b5 docs(pilot): phase 9 review issues`, and `.remember/` gitignore entry as `b3be1e0 chore: ignore .remember/ skill runtime state`. Deleted `docs/landing-handoff/` (prepared for external design work, not meant to live in the repo). Working tree clean; 3 cleanup commits unpushed.
## 2026-05-07 UTC — Codex — Resolve PR #162 CI failures
- Investigated Gitea PR #162 failing checks for `feat/self-serve-signup-phase-2`. Public status metadata was available, but job logs required Gitea login and no token was present.
- Standardized backend development/CI Python on 3.12.13 to match the Docker image: added `.python-version`, updated Gitea CI Python setup, rebuilt the local backend virtualenv, and verified native `pytest` / `alembic` command availability with explicit local env.
- Added explicit Node 20 setup to Gitea frontend and e2e jobs so CI no longer depends on the runner's ambient Node installation.
- Reproduced the remaining frontend failure locally. Lint failed on Phase 2 React code because the current eslint stack flags exported pure helpers, render-time `Date.now()`, and effect-driven state synchronization.
- Patched the affected frontend surfaces narrowly: dashboard helper exports, app-config cache handling, feature-limit cache/fetch state, trial-banner time capture, invite/OAuth route error state, pricing loading state, and OAuth authorize URL helper export.
- Verified sequential frontend CI locally in Docker: `npm run lint` passed, `npm run test:coverage` passed (`198` tests), and `npm run build` passed with only Vite chunk-size warnings.
- Files touched: `.python-version`, `.gitea/workflows/ci.yml`, `.github/workflows/ci.yml`, `.ai/*`, `README.md`, `DEV-ENV.md`, and the frontend lint-fix files under `frontend/src/components/dashboard`, `frontend/src/hooks`, and `frontend/src/pages`.

25
.ai/TODO.md Normal file
View File

@@ -0,0 +1,25 @@
# TODO.md
> Backlog of work NOT currently active. Read only when `CURRENT_TASK.md` status is `complete`.
> Format: `- [ ] short description — optional link to issue/PR`
## Up next
None selected. Pick from the backlog below or `03-DEVELOPMENT-ROADMAP.md`.
## Backlog
- [ ] **Frontend lint warnings cleanup.** `npm run lint` currently reports 24 warnings (0 errors): mostly `react-hooks/exhaustive-deps` plus a few unused eslint-disable directives. Either fix them or audit known-safe ones and add/remove eslint-disable comments intentionally. Not blocking CI today.
- [ ] **Audit `filterwarnings` ignores added in `wip(handoff): restore backend suite to green`.** Codex added narrow `ResourceWarning` filters for unclosed socket/transport/event-loop noise from pytest-asyncio teardown. Worth periodically reviewing whether those are still needed (e.g. when bumping pytest-asyncio) — if a real warning appears in those forms it would be silenced.
- [ ] **Add `data-testid` attributes to e2e-critical interactive elements.** PR #152 fixed five Playwright tests by chasing UI-text changes (`Sessions``Session History`, `Account Settings``Account Management`, `/assistant``/pilot`, "Flow Sessions" tab, Resume button on session cards). Each was a one-line selector update, but every UI churn re-breaks them. Adding stable `data-testid` attributes on the targeted elements (page heading wrappers, tab nav, primary action buttons) and switching tests to `getByTestId` would make these immune to copy/route renames. Scope it small — start with `SessionHistoryPage` heading, the AI/Flow Sessions tab buttons, the per-session `Resume` button, and the command-palette FlowPilot option.
- [ ] **Per-test transactional rollback in `test_db` fixture.** Bigger engineering than xdist (which we already shipped). Instead of `DROP SCHEMA public CASCADE` per test, wrap each test in a savepoint and rollback at teardown. ~30-40% additional speedup on top of xdist for test-DB-heavy tests. Real refactor; only worth it if the suite gets significantly larger or runs more frequently.
- [ ] **Consider `pytest-testmon` for PR-time test selection.** Tracks which tests touched which source files and only re-runs affected ones. Best for small PRs touching ~few files. Adds cache-invalidation complexity; only worth it if the suite stays painfully long even after xdist.
- [ ] **AssistantChatPage `currentChatRef` guard is a silent return**`handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview` all bail with `if (currentChatRef.current !== sentForChatId) return` when stale. This is by design for chat switching, but it also silently masked the prefill-ref bug fixed in PR #153 — the user just saw "no AI response" with no log, no toast, no Sentry event. Either (a) log a `console.warn`/Sentry breadcrumb on the mismatch path so future drift is visible, or (b) split "expected stale" (chat switch) from "unexpected stale" (ref never updated) so only the latter alerts. Pair with an audit of every `currentChatRef.current = ...` assignment vs every `setActiveChatId(...)` call to make sure they're paired everywhere.
- [ ] **Allow peer-tech to escalate a colleague's session.** Today `POST /ai-sessions/{session_id}/handoff` in [endpoints/session_handoffs.py:48](backend/app/api/endpoints/session_handoffs.py#L48) filters by `AISession.user_id == current_user.id`, so only the session owner can escalate. Real MSP shops have peer hand-offs: Junior A is on lunch, Junior B sees the session is stuck and should be able to escalate it. Auth tweak: switch from session-owner check to `require_engineer_or_admin` + same-account scope. Add a `handed_off_by` audit column (already exists on `SessionHandoff`) so the original-owner-vs-actual-escalator distinction is preserved. Surfaced from /plan-eng-review on the Escalation-Mode wedge plan; v1 wedge demo doesn't need this (solo-founder pilot), but capture for v2 once 3+ pilots are live and a peer-claim need surfaces.
- [ ] **Mobile/responsive design for EscalationQueue + handoff-context screen.** Pre-PMF wedge demo targets desktop only — MSP techs work on laptops/desktops in shop environments. Once 3+ paying customers exist and a tech requests mobile (likely on-call use case), spec the responsive behavior: stacked card layout below `sm:` breakpoint, full-bleed handoff-context overlay on mobile, swipe-to-claim gesture instead of Pick Up button. Surfaced from /plan-design-review on the Escalation-Mode wedge plan.
- [ ] **`bg-card-hover` Tailwind class doesn't resolve.** [`frontend/src/components/layout/CommandPalette.tsx:450-451`](../frontend/src/components/layout/CommandPalette.tsx) uses `bg-card-hover` as a Tailwind utility, but Tailwind v4 generates `bg-{token}` from `--color-{token}` — and the token in [`frontend/src/index.css:15`](../frontend/src/index.css) is `--color-bg-card-hover`, which generates `bg-bg-card-hover`, not `bg-card-hover`. So those classes silently produce nothing. Other call sites (KnowledgeBaseCards, TeamSummary, ProposalBanner) use the explicit `hover:bg-[var(--color-bg-card-hover)]` form which works. Fix: change the CommandPalette classes to the explicit-var form, OR add a `--color-card-hover` semantic mapping in index.css alongside `--color-card`. Surfaced 2026-05-01 during impeccable polish sweep.
- [ ] **`ConcludeSessionModal` paused/escalated step forces single-artifact choice — should allow multi-select.** [`frontend/src/components/assistant/ConcludeSessionModal.tsx`](../frontend/src/components/assistant/ConcludeSessionModal.tsx) ~lines 430-474 ("Paused/Escalated: status update options"). Today the engineer clicks ONE of Ticket Notes / Client Update / Email Draft, the buttons disappear, and the result replaces them. Real MSP escalations almost always need at least two: technical notes for the next engineer's PSA AND a non-technical client update. Same for pause (client update + ticket notes for context when resuming). Recommended shape: multi-select with smart defaults — three checkboxes (`☑ Ticket Notes ☑ Client Update ☐ Email Draft`); for `escalated` pre-check Ticket Notes + Client Update; for `paused` pre-check Client Update only. One "Generate" button fires all selected in parallel via existing `aiSessionsApi.generateStatusUpdate(...)` (already supports the three `audience` values: `ticket_notes`, `client_update`, `email_draft`). Each result renders in its own card with its own Copy / Post-to-PSA / Send-Email action. Surfaced 2026-05-01. Feature work, not polish — touches streaming wiring for parallel calls.

20
.claude/hooks/check-gstack.sh Executable file
View File

@@ -0,0 +1,20 @@
#!/bin/bash
# Block skill usage when gstack is not installed globally.
if [ ! -d "$HOME/.claude/skills/gstack/bin" ]; then
cat >&2 <<'MSG'
BLOCKED: gstack is not installed globally.
gstack is required for AI-assisted work in this repo.
Install it:
git clone --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup --team
Then restart your AI coding tool.
MSG
echo '{"permissionDecision":"deny","message":"gstack is required but not installed. See stderr for install instructions."}'
exit 0
fi
echo '{}'

15
.claude/settings.json Normal file
View File

@@ -0,0 +1,15 @@
{
"hooks": {
"PreToolUse": [
{
"matcher": "Skill",
"hooks": [
{
"type": "command",
"command": "\"$CLAUDE_PROJECT_DIR/.claude/hooks/check-gstack.sh\""
}
]
}
]
}
}

View File

@@ -17,10 +17,13 @@ jobs:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: resolutionflow_test
ports:
- 5432:5432
# No host port mapping. Tests connect to `postgres:5432` (the service
# container's docker-network DNS name), not `localhost:5432`. With
# multiple Gitea runners on the same homelab box, host-port mapping
# would race — two backend/e2e jobs both binding 0.0.0.0:5432 → the
# second fails with "port is already allocated".
options: >-
--health-cmd pg_isready
--health-cmd "pg_isready -U postgres"
--health-interval 10s
--health-timeout 5s
--health-retries 5
@@ -28,6 +31,12 @@ jobs:
env:
DATABASE_URL: postgresql+asyncpg://postgres:postgres@postgres:5432/resolutionflow_test
DATABASE_URL_SYNC: postgresql://postgres:postgres@postgres:5432/resolutionflow_test
# conftest.py reads DATABASE_TEST_URL only (DATABASE_URL is intentionally
# not consulted after the dab740d test-isolation hardening). The CI test
# DB is the same postgres service, so point DATABASE_TEST_URL at it
# explicitly — without this, conftest falls back to localhost:5432 and
# all tests fail at fixture setup with "connection refused".
DATABASE_TEST_URL: postgresql+asyncpg://postgres:postgres@postgres:5432/resolutionflow_test
SECRET_KEY: ci-test-secret-key-not-for-production
DEBUG: "true"
APP_NAME: ResolutionFlow
@@ -37,6 +46,24 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: pip-${{ runner.os }}-${{ hashFiles('backend/requirements.txt', 'backend/requirements-dev.txt') }}
restore-keys: |
pip-${{ runner.os }}-
- name: Install system dependencies
run: |
apt-get update
apt-get install -y libpango1.0-dev libcairo2-dev libgdk-pixbuf-2.0-dev libffi-dev libjpeg-dev zlib1g-dev
- name: Install dependencies
run: pip install --break-system-packages -r backend/requirements.txt -r backend/requirements-dev.txt
@@ -47,7 +74,15 @@ jobs:
run: cd backend && python scripts/check_tenant_filters.py
- name: Run tests with coverage
run: cd backend && python -m pytest --override-ini="addopts=" --cov=app --cov-report=term-missing --cov-report=json:coverage.json --cov-fail-under=50
# `-n auto` parallelizes across all runner cores via pytest-xdist.
# conftest.py creates a per-worker DB (resolutionflow_test_gw0,
# resolutionflow_test_gw1, …) so the per-test DROP SCHEMA doesn't
# race across workers. Master/serial runs keep the base DB.
# term-missing dropped — the custom "Display coverage summary" step
# below parses coverage.json and prints the same info more concisely.
# --maxfail=10 short-circuits on structural breakage so we don't burn
# 25 minutes when a fixture explodes.
run: cd backend && python -m pytest --override-ini="addopts=" -n auto --maxfail=10 --cov=app --cov-report=json:coverage.json --cov-fail-under=50
- name: Display coverage summary
if: always()
@@ -75,6 +110,19 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Set up Node.js 20
uses: actions/setup-node@v4
with:
node-version: "20"
- name: Cache npm
uses: actions/cache@v3
with:
path: ~/.npm
key: npm-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
restore-keys: |
npm-${{ runner.os }}-
- name: Install dependencies
run: cd frontend && npm ci
@@ -87,15 +135,14 @@ jobs:
- name: Build
run: cd frontend && NODE_OPTIONS="--max-old-space-size=4096" npm run build
- name: Upload build artifact
uses: actions/upload-artifact@v4
with:
name: frontend-dist
path: frontend/dist
retention-days: 1
# Build artifact intentionally NOT uploaded. The e2e job below builds
# its own frontend rather than downloading one from this job, so there
# is no need for the cross-job artifact handoff (which previously broke
# on actions/upload-artifact@v4 GHES support and forced a v3 pin).
# Decoupling also lets e2e start immediately rather than waiting for
# this job to finish — important on a multi-runner setup.
e2e:
needs: [frontend]
runs-on: ubuntu-latest
services:
@@ -105,10 +152,13 @@ jobs:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: resolutionflow_test
ports:
- 5432:5432
# No host port mapping. Tests connect to `postgres:5432` (the service
# container's docker-network DNS name), not `localhost:5432`. With
# multiple Gitea runners on the same homelab box, host-port mapping
# would race — two backend/e2e jobs both binding 0.0.0.0:5432 → the
# second fails with "port is already allocated".
options: >-
--health-cmd pg_isready
--health-cmd "pg_isready -U postgres"
--health-interval 10s
--health-timeout 5s
--health-retries 5
@@ -121,21 +171,55 @@ jobs:
PLAYWRIGHT_SECRET_KEY: ci-playwright-secret-key
PLAYWRIGHT_TEST_EMAIL: teamadmin@resolutionflow.example.com
PLAYWRIGHT_TEST_PASSWORD: TestPass123!
# AI-touching endpoints (POST /ai-sessions, /chat, /respond, etc.) are
# gated by `_require_ai_enabled()`, which returns 503 when no provider
# key is set. Tests that exercise those flows stub the AI calls in the
# browser via `page.route`, so the backend never actually contacts
# Anthropic — but the gate still has to pass. A stub value is enough.
ANTHROPIC_API_KEY: ci-stub-key-not-used-by-tests
steps:
- uses: actions/checkout@v4
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Set up Node.js 20
uses: actions/setup-node@v4
with:
node-version: "20"
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: pip-${{ runner.os }}-${{ hashFiles('backend/requirements.txt', 'backend/requirements-dev.txt') }}
restore-keys: |
pip-${{ runner.os }}-
- name: Cache npm
uses: actions/cache@v3
with:
path: ~/.npm
key: npm-${{ runner.os }}-${{ hashFiles('frontend/package-lock.json') }}
restore-keys: |
npm-${{ runner.os }}-
- name: Install backend dependencies
run: pip install --break-system-packages -r backend/requirements.txt -r backend/requirements-dev.txt
- name: Install frontend dependencies
run: cd frontend && npm ci
- name: Download frontend build
uses: actions/download-artifact@v4
with:
name: frontend-dist
path: frontend/dist
- name: Build frontend
# Building inline (instead of downloading an artifact from the
# frontend job) drops the cross-job dependency, so e2e can start
# immediately on a free runner. Adds ~1-2 min of build time, but
# eliminates the artifact-upload mechanism entirely (no more
# v3/v4 GHES headaches) and saves ~5 min of waiting.
run: cd frontend && NODE_OPTIONS="--max-old-space-size=4096" VITE_API_URL="${PLAYWRIGHT_API_ORIGIN}" npm run build
- name: Install Playwright browser
run: cd frontend && npx playwright install --with-deps chromium
@@ -145,7 +229,7 @@ jobs:
- name: Upload Playwright report
if: always()
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v3
with:
name: playwright-report
path: |

View File

@@ -37,10 +37,10 @@ jobs:
steps:
- uses: actions/checkout@v5
- name: Set up Python 3.11
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"
cache: pip
cache-dependency-path: |
backend/requirements.txt
@@ -143,10 +143,10 @@ jobs:
steps:
- uses: actions/checkout@v5
- name: Set up Python 3.11
- name: Set up Python 3.12
uses: actions/setup-python@v5
with:
python-version: "3.11"
python-version: "3.12"
cache: pip
cache-dependency-path: |
backend/requirements.txt

9
.gitignore vendored
View File

@@ -207,7 +207,11 @@ marimo/_lsp/
__marimo__/
# Claude Code (local config, agents, settings)
.claude/
.claude/*
!.claude/settings.json
!.claude/hooks/
.claude/hooks/*
!.claude/hooks/check-gstack.sh
.agents/
# Database dumps
@@ -238,3 +242,6 @@ package-lock.json
# graphify knowledge graph outputs
graphify-out/
.graphify_python
# remember skill runtime state (hook logs, PIDs)
.remember/

1
.python-version Normal file
View File

@@ -0,0 +1 @@
3.12.13

61
AGENTS.md Normal file
View File

@@ -0,0 +1,61 @@
# AGENTS.md — ResolutionFlow
You are OpenAI Codex, the resume agent for ResolutionFlow. Claude Code is the primary coding agent; you step in when Claude hits session or weekly limits.
The first thing to do every session: read [`.ai/PROJECT_CONTEXT.md`](.ai/PROJECT_CONTEXT.md), [`.ai/CURRENT_TASK.md`](.ai/CURRENT_TASK.md), and [`.ai/HANDOFF.md`](.ai/HANDOFF.md). The ritual is spelled out below.
> The protocol section below is byte-identical to the shared block in CLAUDE.md. If you edit one, edit the other.
## Shared protocol
### Startup ritual (every session)
1. Read `.ai/PROJECT_CONTEXT.md` — architectural truth for this repo.
2. Read `.ai/CURRENT_TASK.md` — what we're actively working on.
3. Read `.ai/HANDOFF.md` — exact resume point.
4. Skim `.ai/DECISIONS.md` for recent entries relevant to the current task.
5. Run `git log --oneline -15` and `git status`.
6. Before taking action, state back in two sentences: the current goal and your proposed next action.
### Handoff ritual (session end — limit hit, task complete, or user stop)
1. Update `.ai/HANDOFF.md` to reflect new state. Keep it under ~2K tokens.
2. If `CURRENT_TASK.md` status changed, update it.
3. If you made an architectural decision, append to `.ai/DECISIONS.md`.
4. Append a session entry to `.ai/SESSION_LOG.md`.
5. If working tree is dirty, commit WIP with `wip(handoff): <one-line summary>`. Do not push unless explicitly asked.
### Writing rules for .ai/ files
- Use model-neutral voice in `HANDOFF.md`, `SESSION_LOG.md`, `DECISIONS.md` ("previous session did X", NOT "Claude did X" or "Codex did X"). Exception: `SESSION_LOG.md` entries include an `<agent>` field in the header.
- Do not duplicate content between files. `CURRENT_TASK.md` holds the goal, `HANDOFF.md` holds the resume point, `TODO.md` holds the backlog. If unsure where something goes, check `.ai/README.md`.
- Don't invent facts about the repo. If you're uncertain, write `TODO: confirm` and flag it.
### Project principle
Prefer correct architecture over minimal diff. Flag "simpler approach" tradeoffs for review before taking them.
## Codex-specific notes
### Tooling you do NOT have
- **No GitNexus tools.** Use `grep -r`, `rg`, `git grep`, or `find` for code search. For blast-radius reasoning, grep call sites manually and read the files.
- **No gstack slash commands** (`/review`, `/ship`, `/qa`, `/browse`, `/investigate`, `/design-review`, `/plan-*`). Run the equivalent work directly: `pytest` for tests, `npm run build` for frontend validation, manual PR description for review flow. If `python`/`npm` aren't on PATH, the host runs services in Docker — use the `docker exec resolutionflow_{backend,frontend} …` form documented in `.ai/PROJECT_CONTEXT.md` rather than installing toolchains.
- **No `/codex` second-opinion command.** You are Codex.
### Git trailer
Every commit: `Co-Authored-By: Codex <noreply@openai.com>`
### Model selection
Handled on OpenAI's side. Do not attempt to set Anthropic model aliases for your own runtime. (The repo's application code still uses Anthropic aliases like `claude-sonnet-4-6` via `settings.get_model_for_action()` — that's runtime config for the product, not your agent.)
### Reviewing Claude's work
When you resume from a Claude session, assume some decisions may have been informed by GitNexus queries or gstack commands whose output isn't in the handoff. If a decision looks unverified from the `.ai/` files alone, either:
- re-verify with `grep`/`rg`/file reads, or
- flag it in `HANDOFF.md` under "Open questions" so Michael or Claude can confirm on the next handoff.
Do not assume tooling output that isn't written down.

View File

@@ -2,9 +2,40 @@
All notable changes to ResolutionFlow are documented here.
## [Unreleased]
## [0.1.0.0] - 2026-04-16
### Added
- **PSA Ticket Management** — dedicated `/tickets` page with URL-param filter state (board, status, priority, company, assignment, closed), paginated ticket list, and slide-in detail panel
- **TicketDetailPanel** — full ticket view with notes feed, configurations, related tickets, and resource manager; optimistic status updates via dropdown
- **NewTicketModal** — two-tab ticket creation: "Quick Create (AI)" parses natural language into a pre-filled form via Claude, "Full Form" for manual entry; validates required fields before submitting to CW
- **AiTicketParseForm** — natural language → structured ticket data using Claude; resolves board and assignee automatically, flags fields needing manual selection
- **TicketResourceManager** — add/remove CW members as ticket resources with member search autocomplete
- **Spin-off ticket creation from ResolutionAssist** — AI can detect when a new ticket should be created mid-session and surface the NewTicketModal pre-filled with session context
- **TicketQueue improvements** — dashboard widget now detects member mapping, caps at 5 items, shows "View All" link to `/tickets`
- **Board statuses endpoint** — `GET /integrations/boards/{board_id}/statuses` for direct status lookup without a ticket context
- **Paginated ticket search** — `search_tickets` returns `{items, total, page, page_size}`; parallel CW count fetch for accurate totals
- **Ticket service layer** — `ticket_service.py` wraps all PSA mutations (create, update status, list/add/remove resources)
- **Priority lookup endpoint** — `GET /integrations/tickets/priorities` for form dropdowns
- **PSA error surfacing** — `/tickets` page shows inline error banner with specific guidance when CW returns a permissions error (replaces silent empty state)
### Fixed
- CW query injection: sanitize search `query` string to strip single quotes before interpolation into CW conditions
- `company_id` filter now correctly applied to CW ticket search conditions (was silently ignored)
- `linkedTicket` fetch in ResolutionAssist guarded with `currentChatRef` to prevent race condition on session switch
- Members endpoint auth gate no longer rejects engineers without a PSA mapping
- Board fallback: ticket list derives available boards from ticket data when the boards API returns empty (permissions)
- Assignment search and "Load More" removed from resource manager in favor of direct member list
## [Unreleased]
### Changed
- **In-product User Guides rewrite** — replaced 15 feature-dump guides with 43 problem-oriented Diátaxis how-tos grouped under 10 categories (Getting started, Working a pilot session, Closing out a session, Documentation & sharing, Authoring flows, Reusable assets, AI assistance, PSA integrations, Account & team admin, Analytics). Dropped three deprecated guides (Maintenance Flows, AI Assistant page, Flow Assist sparkle button — UI no longer exists). Renamed Step Library → Solutions Library to match canonical product terminology. Corrected sidebar entry-path references throughout (Dashboard → Home, All Flows → Flows, Sessions → History, Analytics → Data, etc.). Added `category` and optional `relatedSlugs` to the Guide schema; `GuidesHubPage` now renders category sections; `GuideDetailPage` shows a "Related guides" footer when set. Authored 14 net-new how-tos covering FlowPilot-era surfaces with no prior coverage: tasklane keyboard flow, what-we-know panel, ask-the-AI mid-session, pause-and-leave, resolve a session, record a suggested-fix outcome, escalate (Escalation Mode), post docs to a ConnectWise ticket, share a client update mid-session, build a script with Script Builder, open an AI-suggested flow, pin a flow, and invite a teammate. Fixed a long-standing rendering bug where `**bold**` markdown in `step.tip` rendered literally instead of bolded — the same regex replacement now runs on tips as on instructions. Killed the misleading "N sections" subtitle on guide cards (single-section how-tos make the count noise).
### Added
- **TaskLane keyboard-first answer flow** (#158) — Enter submits and auto-advances to the next pending task; Shift+Enter inserts a newline; Esc cancels; after the last task, focus jumps to the Send Responses button so the engineer can fire the whole batch with one more keystroke. Mouse path also auto-advances. Subtle hint row (`⏎ submit · ⇧⏎ newline`) under each open input teaches the shortcut.
- **Collapsible "What we know" section** (#158) — TaskLane's facts list is now a collapsible section with per-session memory in `sessionStorage`. Auto-collapses on first render at ≥5 facts so Questions and Diagnostic Checks stay above the fold; engineer's explicit toggle always wins.
- **Escalation Mode wedge** (#155) — when an engineer escalates, the senior tech who claims the session lands on a magic-moment handoff-context screen with the structured briefing visible in seconds (no scrolling, no chat re-read). Live SSE pushes new arrivals to anyone watching the queue, atomic claim resolves race conditions, the queue auto-excludes the claimed session, the claiming user retains chat ownership for AI briefings, and a new analytics endpoint tracks post-claim time-to-first-action so you can see real minutes recovered (paired with a manual baseline — see CURRENT_TASK.md two-metric framing).
- **Suggested-fix "Awaiting verification" outcome** (#156) — when a fix needs external confirmation (client power-cycle, AD replication, license sync) you can park it in `applied_pending` instead of forcing a worked / didn't / partial verdict. The new PendingBanner shows the parked status with worked / didn't / update reason / dismiss actions. The "Still checking" nudge records pending with a reason instead of just silencing. Page-level Resolve auto-patches pending → success before the resolution flow opens; page-level Escalate intercepts pending the same way it intercepts verifying/partial. Resolution notes and escalation packages frame the pending state honestly (provisional fix; leading hypothesis with what's being waited on).
- Tree Templates + Import/Export marketplace (#66)
- Recurring Issue Detection — client-specific pattern alerts (#60)
- Step Feedback Flag — "This Step is Wrong" reporting (#58)
@@ -18,6 +49,8 @@ All notable changes to ResolutionFlow are documented here.
- **Image support in Assistant Chat** — paste/attach images in chat input, uploaded to S3, resized for vision model, displayed in conversation history
### Changed
- **Assistant Chat session screen — UX overhaul** (#158, "impeccable" pass) — removed the duplicate "Suggested checks" chip strip in favor of the TaskLane as the single source of truth; added an inline `Next steps · N pending` cue above the latest action-bearing AI bubble; consolidated the session header to two visible primary actions (Resolve + Escalate) plus a kebab for Context / New Ticket / Update Ticket / Pause; centered the messages column to `max-w-3xl` to match the composer; unified chat-bubble radii to `rounded-xl`; dropped every banned decoration (3px side stripes, gradient surfaces, accent borderTop, backdrop blur, pulse rings, bordered avatar boxes) for a single decoration channel per surface; unified 14 distinct text sizes into a 5-step scale (10/11/12/13/14px); split the ambiguous `MessageCircleQuestion` icon into `Pencil` (write affordance for question Answer CTA) and `HelpCircle` (universal help icon for the per-check explainer); audited and dropped redundant `font-sans` classes across the screen.
- **Suggested-fix banner ↔ script panel are now linked** (#158) — collapsing the ProposalBanner now also hides the InlineNoTemplateDialog / TemplateMatchPanel; dismissing the banner closes both surfaces. Recording any outcome on a fix (Dismiss, It worked, Didn't work, Mark partial, Waiting to verify) closes the script panel alongside the banner state transition.
- **Edit Procedure page** — layout overhaul and color system refinements for better visual hierarchy
- **Flows sidebar navigation** — collapsed to reduce visual noise; session recovery removed from library view
- **Account settings page** — audit fixes for improved consistency and usability
@@ -28,6 +61,7 @@ All notable changes to ResolutionFlow are documented here.
- **Tenant data boundaries** — all session and tree endpoints now return 404 (not 403) for cross-tenant access attempts to avoid confirming resource existence
### Fixed
- **`ParameterizationPreview` over-highlight on short parameter values** (#158) — the tokenizer matched highlight values via raw substring with no word-boundary check, so a single-char value like `"D"` (a drive letter) lit up every capital D in identifiers like `Get-ADUser`, `Add-Type`, `Disable-`. Added a word-boundary guard that's conditional on whether the value itself starts/ends with a word character, so values with leading/trailing punctuation (e.g. `"D:\\Folder"`) still match cleanly when adjacent to whitespace.
- **CRITICAL: Copilot tree query isolation** (#131) — user could access any tree UUID if known, exposing full tree structure to AI. Now scoped to current account with 404 for inaccessible trees.
- **AI session search isolation** — search endpoint leaked other users' sessions via OR(user_id, account_id). Now restricted to current user only.
- **Analytics endpoint isolation** — GET `/analytics/flows/{tree_id}` exposed session counts for any tree UUID. Now returns 404 if tree doesn't belong to requesting account.

267
CLAUDE.md
View File

@@ -1,215 +1,43 @@
# CLAUDE.md — ResolutionFlow
> SaaS troubleshooting platform for MSPs. Last reviewed 2026-04-19.
You are Claude Code, the primary coding agent for ResolutionFlow. OpenAI Codex is the resume agent when you hit session or weekly limits.
**Naming:** Canonical product name is **ResolutionFlow**. `patherly` is the legacy internal name — still present in DB name (`patherly` on Railway, `resolutionflow` locally), some Railway service names, and historical paths. Treat as aliases, not canonical. Docker containers are `resolutionflow_*`.
The first thing to do every session: read [`.ai/PROJECT_CONTEXT.md`](.ai/PROJECT_CONTEXT.md), [`.ai/CURRENT_TASK.md`](.ai/CURRENT_TASK.md), and [`.ai/HANDOFF.md`](.ai/HANDOFF.md). The ritual is spelled out below.
**User terminology:** "Flows" (not Trees), "Projects" (not Procedures), "Solutions Library" (not Step Library). Maintenance flows hidden from pilot UI (backend retains them). DB column `tree_type` values unchanged.
> The protocol section below is byte-identical to the shared block in AGENTS.md. If you edit one, edit the other.
**SaaS shape:** Multi-tenant by account. Roles: `super_admin` > `team_admin` > `engineer` > `viewer`. Team admin = `role='engineer'` + `is_team_admin=True` + valid `team_id`. Never `role=='admin'` — use `is_super_admin`. Backend deps in `app/api/deps.py`: `get_current_active_user`, `require_engineer_or_admin`, `require_admin`. Frontend: `usePermissions()` hook. Central logic in `backend/app/core/permissions.py` + `frontend/src/hooks/usePermissions.ts`.
## Shared protocol
**Status:** Go-to-Market Validation (pre-PMF). Backend feature-complete (55+ endpoints, 100+ tests). Phase 0.5 FlowPilot telemetry baseline accruing. See `CURRENT-STATE.md` for live status, `03-DEVELOPMENT-ROADMAP.md` for phases.
### Startup ritual (every session)
**Principle:** Prefer correct architecture over minimal diff. Flag "simpler approach" tradeoffs for review before taking them.
1. Read `.ai/PROJECT_CONTEXT.md` — architectural truth for this repo.
2. Read `.ai/CURRENT_TASK.md` — what we're actively working on.
3. Read `.ai/HANDOFF.md` — exact resume point.
4. Skim `.ai/DECISIONS.md` for recent entries relevant to the current task.
5. Run `git log --oneline -15` and `git status`.
6. Before taking action, state back in two sentences: the current goal and your proposed next action.
---
### Handoff ritual (session end — limit hit, task complete, or user stop)
## Tech stack
1. Update `.ai/HANDOFF.md` to reflect new state. Keep it under ~2K tokens.
2. If `CURRENT_TASK.md` status changed, update it.
3. If you made an architectural decision, append to `.ai/DECISIONS.md`.
4. Append a session entry to `.ai/SESSION_LOG.md`.
5. If working tree is dirty, commit WIP with `wip(handoff): <one-line summary>`. Do not push unless explicitly asked.
- **Backend:** Python 3.11 + FastAPI, SQLAlchemy 2.0 async (asyncpg), Alembic, Pydantic v2, JWT (python-jose + bcrypt, JTI refresh rotation), APScheduler (in-process with FastAPI lifespan).
- **Frontend:** React 19 + Vite + TypeScript, Tailwind v4 (CSS-only config in `index.css`), Zustand (immer + zundo), React Router v7, Axios (token-refresh interceptor), Lucide.
- **DB:** PostgreSQL 16 (RLS enabled Phase 4, pgvector).
### Writing rules for .ai/ files
---
- Use model-neutral voice in `HANDOFF.md`, `SESSION_LOG.md`, `DECISIONS.md` ("previous session did X", NOT "Claude did X" or "Codex did X"). Exception: `SESSION_LOG.md` entries include an `<agent>` field in the header.
- Do not duplicate content between files. `CURRENT_TASK.md` holds the goal, `HANDOFF.md` holds the resume point, `TODO.md` holds the backlog. If unsure where something goes, check `.ai/README.md`.
- Don't invent facts about the repo. If you're uncertain, write `TODO: confirm` and flag it.
## Project structure
### Project principle
```
resolutionflow/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry
│ │ ├── api/endpoints/ # auth, trees, sessions, admin, steps, survey, copilot, assistant_chat, integrations, flow_proposals, flowpilot_analytics
│ │ ├── api/deps.py # auth deps (incl. require_team_admin)
│ │ ├── api/router.py # registration
│ │ ├── core/ # config, database, permissions, security, audit, rate_limit
│ │ ├── models/ # SQLAlchemy (incl. FlowProposal)
│ │ ├── schemas/ # Pydantic
│ │ ├── services/psa/ # PSA provider pattern (base, connectwise/, autotask/, halopsa/, cache, encryption, registry, types)
│ │ ├── services/knowledge_flywheel.py + _scheduler.py
│ │ └── services/knowledge_gap_service.py
│ ├── alembic/versions/ # 001-070 sequential, then hex hash
│ ├── scripts/ # seed_data, seed_trees, seed_test_users
│ └── tests/ # pytest integration
├── frontend/
│ ├── src/
│ │ ├── api/ # Axios client + endpoint modules
│ │ ├── components/ # common, layout, dashboard, tree-editor, session, procedural, procedural-editor, library, step-library, ui, flowpilot
│ │ ├── hooks/ # usePermissions, useSessionTimer, useKeyboardShortcuts
│ │ ├── pages/
│ │ ├── store/ # Zustand (auth, treeEditor, proceduralEditor, userPreferences, scriptGeneratorStore)
│ │ └── types/
│ └── (Tailwind v4 CSS-only config in src/index.css)
├── docs/plans/archive/ # pre-March 2026 plans
├── docs/connectwise/ # CW API reference + best-practices guides
├── docs/LESSONS-ARCHIVE.md # archived lessons (fixes in code)
├── CLAUDE.md · CURRENT-STATE.md · DESIGN-SYSTEM.md · DEV-ENV.md
```
Prefer correct architecture over minimal diff. Flag "simpler approach" tradeoffs for review before taking them.
---
## Claude-specific tooling
## Design system
**Source of truth: [DESIGN-SYSTEM.md](DESIGN-SYSTEM.md).** Read before any visual change.
- Flat high-contrast dark theme, Sentry/PostHog-inspired. **No** glass, backdrop blur, ambient orbs, gradient surfaces.
- Accent **electric blue** (#60a5fa dark / #2563eb light) — ≤5% of UI, interactive elements only. Warning amber (#fbbf24), info cyan (#67e8f9), success green (#34d399), danger red (#f87171). Each with `-dim` at 10% opacity.
- Backgrounds: `bg-sidebar` (#0e1016) → `bg-page` (#16181f) → `bg-card` (#1e2028) → `bg-elevated` (#2a2d38). Borders `border-default` / `border-hover`.
- Text: `text-heading``text-primary``text-muted-foreground``text-muted`.
- Fonts: IBM Plex Sans (body), Bricolage Grotesque (heading, 700 weight for logo), JetBrains Mono (code).
- Logo: 30px gradient square (ember orange) + "ResolutionFlow" in Bricolage Grotesque. Assets in `brand-assets/`, `frontend/src/assets/brand/`, `frontend/public/icons/`.
- Mockups: `docs/mockups/` (HTML).
- **Deprecated — do not use:** glass-card, glass-stat, `bg-gradient-brand`, `backdrop-filter: blur()`, ambient orbs, purple gradients, ember orange as accent, cyan as accent (cyan is info only).
---
## ConnectWise PSA
Reference: `docs/connectwise/` — start with `CONNECTWISE-API-REFERENCE.md`, then the `best-practices/` guides. Extracted OpenAPI spec in `connectwise-psa-resolutionflow-reference.json` (670 endpoints, v2025.16); full spec in `connectwise-psa-openapi-full.json`.
- **Auth:** API Key (Base64 `companyId+publicKey:privateKey`) + `clientId` header every request. `clientId` is server-side (`CW_CLIENT_ID` in `config.py`) — identifies ResolutionFlow, not per-tenant. Per-connection: `company_id`, `public_key`, `private_key`, `server_url`.
- **Architecture:** `services/psa/` provider pattern — `PSAProvider` base, `ConnectWiseProvider` impl, `PsaProviderRegistry` for multi-PSA dispatch. Credentials encrypted at rest via `services/psa/encryption.py` (Fernet). Per-team credentials, never per-user. Endpoints in `api/endpoints/integrations.py`. In-memory TTL cache in `services/psa/cache.py`.
- **Integration flows:** session docs → ticket notes (`POST /service/tickets/{id}/notes`, markdown supported); ticket context → FlowPilot; callbacks via `/system/callbacks` with HMAC verification.
- **API rules:** pin version via Accept header `application/vnd.connectwise.com+json; version=2025.16`. Paginate ≤1000/page. Dynamic base URL via `/login/companyinfo/{companyId}`. Request minimal permissions (MY, not ALL).
---
## Dev commands
Full setup in [DEV-ENV.md](DEV-ENV.md) (host-agnostic, with homelab Proxmox reference topology). Day-to-day:
```bash
docker compose -f docker-compose.dev.yml up -d # start stack
cd backend && source venv/bin/activate && uvicorn app.main:app --reload
cd frontend && npm run dev
pytest --override-ini="addopts=" # tests (first time: CREATE DATABASE resolutionflow_test)
cd backend && alembic upgrade head # migrate
cd backend && alembic revision -m "desc" # manual migration (preferred per Lesson 77)
cd backend && alembic revision --autogenerate -m "desc" # picks up drift; review carefully
cd frontend && npm run build # stricter than tsc --noEmit — final check
cd frontend && npx tsc -b # TS-only check when dist/ has EACCES
docker exec -it resolutionflow_postgres psql -U postgres -d resolutionflow
python -m scripts.seed_trees # seed (from backend/)
```
**URLs:** Frontend <http://localhost:5173>, backend <http://localhost:8000>, API docs <http://localhost:8000/api/docs>.
**Test users** (all password `TestPass123!`): `admin@resolutionflow.example.com` (super_admin), `teamadmin@resolutionflow.example.com`, `engineer@resolutionflow.example.com`, `pro@resolutionflow.example.com`.
**CI:** Gitea (`gitea.resolutionflow.com/chihlasm/resolutionflow/actions`). `gh` CLI works for issues/PRs on the GitHub mirror, but not CI runs.
**Never pass `--rev-id`** to alembic — let it generate the hex hash.
---
## Common tasks
- **New endpoint:** `endpoints/``router.py``schemas/` → tests → frontend API client.
- **New page:** `pages/` → route in `router.tsx` → nav in `AppLayout.tsx`.
- **New public route:** top-level in `router.tsx` alongside `/login`, not inside `ProtectedRoute`.
- **New frontend API module:** types in `types/` → export from `types/index.ts` → client in `api/` → export from `api/index.ts`.
- **Schema change:** update model → `alembic revision -m "desc"` → review → `alembic upgrade head`.
- **New `VITE_*` env var:** add as `ARG` + `ENV` in `frontend/Dockerfile` for Railway builds (Lesson 60 — Railway env vars are runtime-only, Vite bakes at build time).
- **Account sub-page:** add route in `router.tsx` under `account` children + add link card in `AccountSettingsPage.tsx``AccountLayout` has NO sidebar nav.
---
## Coding standards
- **Python:** type hints everywhere, async/await for DB, Pydantic v2, `DateTime(timezone=True)` always.
- **TypeScript:** interfaces for all data, `const` over `let`, functional components + hooks, shared logic in custom hooks.
- **Git:** feature branch before committing (`git checkout -b feat/feature-name`). Format: `type: description` (feat/fix/refactor/docs/test/chore). Always `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>`. Large features: commit per phase with `npm run build` validation. Push to Gitea — auto-mirrors to GitHub (`.gitea/workflows/mirror-to-github.yml`); never push GitHub directly.
**After shipping:** update `CURRENT-STATE.md` + `03-DEVELOPMENT-ROADMAP.md`, `gh issue close #N` for resolved issues, add lessons here only for non-obvious traps (otherwise let the code speak).
---
## Frontend patterns
- **Component basics:** `cn()` from `@/lib/utils`, Lucide icons, `Modal.tsx` for modals (mobile-responsive `items-end sm:items-center` + `max-w-full sm:max-w-lg`).
- **Types:** Create in `types/`, export from `types/index.ts`, `import type { T } from '@/types'`.
- **Routing:** `getTreeNavigatePath()` / `getTreeEditorPath()` from `@/lib/routing`. Tree editor is `/trees/new`. All dashboard session clicks → `/pilot/:id` regardless of `session_type`.
- **Lazy routes:** `lazyWithRetry` from `@/lib/lazyWithRetry.ts`, not `React.lazy` (auto-reload on stale chunks).
- **Public pages:** raw `fetch()` with full URL, NOT `apiClient` (which requires auth tokens).
- **Toast:** `toast.warning()` not `toast.warn()`. Import from `@/lib/toast` — methods: `success`, `error`, `warning`, `info`.
- **Assistant chat:** uses local React `useState`, not Zustand. All three send paths (`handleSend`, `sendPrefill`, `handleResumeNew`) must call `setShowTaskLane(true)` when response has actions/questions.
- **Chat backend wiring:** `aiSessionsApi.sendChatMessage``/ai-sessions/{id}/chat``unified_chat_service.py`. NOT `assistant_chat_service.py` (removed except retention settings).
- **FlowPilot:** Actions live in page header (Resolve/Escalate/Share Update + overflow). `useBlocker` for active-session nav guard. "Pause & Leave" auto-pauses.
- **AI markers:** `[QUESTIONS]`, `[ACTIONS]`, `[FORK]`, `[DELTA]...[/DELTA]` (editor), `[TREE_UPDATE]` (troubleshooting builder), `[STEPS_UPDATE]` (procedural builder), `[METADATA]`. Parsed in `unified_chat_service.py`; conversation history stores stripped `display_content`. If markers disappear: check system-prompt final reminder + per-user-message `[SYSTEM: ...]` injection in `_call_anthropic_cached()`.
- **Image uploads:** paste/attach → Railway S3 via `uploadsApi.upload()` → resized by `storage_service.resize_image_for_vision()` (Pillow, 1568px max, PNG→JPEG) → base64 → Claude multimodal blocks. Max 3/msg. Images NOT stored in history.
- **Async select-load-apply:** guard with a ref (pattern in `AssistantChatPage` `currentChatRef`). Update synchronously on every selection change; after every `await`, bail out if `ref.current !== thisId`.
- **Editor-Embedded Flow Assist:** `EditorAIPanel` (320px side panel) + `useEditorAI`. Ghost nodes via `_suggestion: true`. Route actions via `settings.get_model_for_action()`.
- **Script Builder:** `/script-builder`, chat-style. Backend `ScriptBuilderSession`, `script_builder_service.py`, endpoints `/scripts/builder/`. FlowPilot handoff via `action_type: "open_script_builder"` + `sessionStorage`.
- **Intake form field schema:** `variable_name` + `field_type` (NOT `name` / `type`).
- **Node field priority** (copilot, summaries): `title``question``description``content``label`.
- **Procedural sessions auto-start** on page load (no intake/Start screen). Troubleshooting flows DO have a start screen.
---
## Critical lessons
> Lessons 1-40 archived to `docs/LESSONS-ARCHIVE.md` — fixes baked into the codebase. **Grep the archive when an error message or symptom is unfamiliar, or after two failed attempts at resolving an issue.** Don't pre-load for routine work.
### Backend / data
- **APScheduler interval jobs always `max_instances=1`** — without it, overlapping runs reprocess records (TOCTOU).
- **`get_db` rolls back on exception** — never remove the `await session.rollback()`, or one failed request poisons the connection with `InFailedSQLTransaction` cascading.
- **Startup routines on tenant-isolated tables must use `_admin_session_factory()`, not `get_db()`.** Phase 4 RLS has no `app.current_account_id` set at startup. `get_service_account_id` is safe (reads cached `app.state`).
- **Backfill migrations adding `account_id`:** grep ALL `ModelClass(` sites in service code to verify `account_id=` is passed. SQLAlchemy accepts `None` silently — Phase 4 RLS WITH CHECK surfaces the problem at runtime as `InsufficientPrivilegeError: new row violates row-level security policy`.
- **`tree_shares.account_id = tree.account_id`**, never `current_user.account_id`. A super_admin sharing another tenant's tree must produce the share in the tree owner's tenant, or it becomes invisible post-RLS.
- **Global tables (no `account_id`, never in RLS migrations):** `script_categories`, `platform_steps`, `template_trees`, `plan_feature_defaults`, `accounts`. Scan at class level — one `.py` file can hold multiple classes with different columns (e.g. `ScriptCategory` vs `ScriptTemplate`).
- **`ai_sessions.status` is VARCHAR(30)** — fits `requesting_escalation` (23 chars). Migration `f0aad74ea51b` widened from 20.
- **PostgreSQL `func.sum(case(...))` returns `Decimal` via asyncpg** — cast to `int()` before Pydantic `dict[str, Any]`.
- **Enhancement / branch_addition proposals need `modified_flow_data` via "Edit & Publish"** — backend 400 on direct approve. Only `new_flow` supports direct approve.
- **Adding email types:** static async method on `EmailService` in `core/email.py`. Fire-and-forget from endpoints (log errors, don't fail the request).
### AI / FlowPilot
- **Anthropic SDK `max_retries=1`** — default of 2 can take 3× the timeout.
- **Model tier routing:** `settings.get_model_for_action(action_type)`. Always alias form (`claude-sonnet-4-6`).
- **FlowPilot must ask GUI-vs-script before suggesting either** when both are viable — see `FLOWPILOT_SYSTEM_PROMPT` in `flowpilot_engine.py`.
- **Telemetry events to grep:** `anthropic.cache` (prompt-cache hit/create), `mcp.turn` (per-turn MCP availability), `mcp.fallback` (MCP silent-retry fired).
- **Don't put literal payloads in system prompts.** Bit us twice in one day: a worked `[QUESTIONS]` example with literal "Outlook + jsmith" content, and a full DNS troubleshooting tree, both caused Claude to recite that content on unrelated tickets — the symptom looked like task-lane state leaking across chats. The fix is structural: every output example in a system prompt uses `<placeholder>` syntax (`{"text": "<one short, specific question>"}`), never literal field values. Real-looking format examples live in few-shot messages (separate file, separate code path), not system prompts. Guardrail: `tests/test_prompt_anti_parrot.py` scans every `*_PROMPT`/`*_SCHEMA`/`*_PROTOCOL`/`*_FORMAT` constant in `app/services/` and `app/core/`; CI fails when a marker block contains a literal JSON value or when a known leaked token (jsmith, DC01, ADSync, Dnscache, etc.) appears anywhere in a prompt.
### Frontend / UI
- **Flex height chain:** every ancestor from `app-shell` grid to React Flow canvas needs `flex` + `flex-1` + `min-h-0` or `h-full`. Missing `flex` collapses to 0. Same rule for FlowPilot action bar and any tall scroller.
- **React Flow CSS in Tailwind v4:** import in `index.css`, not component JS. Override dark theme via `--xy-*` CSS vars.
- **`text-secondary` renders invisible on dark** — Tailwind v4 maps it to `--color-secondary` (a surface color). Use `text-muted-foreground` for readable secondary text. Avoid `text-muted` for body — labels only.
- **`bg-accent` is electric blue — never for code/kbd.** Use `bg-white/[0.12] border border-white/[0.06]` for inline code, `bg-white/[0.08]` for kbd. Accent reserved for interactive elements.
- **`landing.css` uses self-contained `--lp-*` vars** — never `var(--color-*)` theme tokens (they resolve incorrectly outside the app shell).
- **Never `transition: all`** — list properties explicitly, or layout props animate and jank.
- **Date range filter end dates:** `setHours(23, 59, 59, 999)` before sending, or the day's items are excluded. For string-based date inputs, append `T23:59:59.999Z`.
- **TopBar search:** full bar `hidden sm:block`, icon button `sm:hidden` — both open CommandPalette.
- **Hover pop-out cards:** scrim `pointer-events-none`, expanded card has its own click handler at `z-50`, dismiss via `onMouseLeave` on wrapper. Never put handlers on the scrim.
- **`tsc -b` in Dockerfile is stricter than `tsc --noEmit`** — enforces `noUnusedLocals` / `noUnusedParameters` as hard errors. Check IDE yellow squiggles before pushing.
- **Dashboard prefill auto-submits** via `useEffect` + `prefillHandledRef` guard — no double-enter.
- **Global Axios 5xx interceptor fires before component `.catch()`** — fix optional-data endpoints at the source (return `[]` / `{}` on provider failure), not in the component.
- **Playwright strict mode:** scope selectors to avoid sidebar/main ambiguity. Use `getByRole('heading', { name })` or `.animate-scale-in` locators, not bare `getByText()`.
### Env / infra
- **Node 20.19+ required** (Vite 7). `nvm use 20` or `PATH="$HOME/.nvm/versions/node/v20.19.0/bin:$PATH"`.
- **Railway backend service is `patherly`, DB name `railway`.** Public Postgres proxy: `interchange.proxy.rlwy.net:45797`.
- **Railway Object Storage bucket `resolutionflow-uploads`.** Env vars `STORAGE_*`. boto3 in `storage_service.py`. Dockerfile needs Pillow + `libjpeg-dev` / `zlib1g-dev`.
- **PostHog:** `PostHogProvider` + `posthog.init()` in `main.tsx`. Helpers in `lib/analytics.ts`. Env: `VITE_PUBLIC_POSTHOG_KEY`, `VITE_PUBLIC_POSTHOG_HOST`. `identifyUser()` in `authStore.fetchUser()`, `resetAnalytics()` on logout.
- **bun PATH on devserver01:** `BUN_INSTALL="$HOME/.bun"`, `PATH="$BUN_INSTALL/bin:$PATH"`. Playwright Chromium needs `libatk1.0-0 libatk-bridge2.0-0 libcups2 libxkbcommon0 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2`.
- **Full-stack change:** trace schema → endpoint → API client → hook → store → UI. Don't assume one end proves the other.
- **Dev env** — see DEV-ENV.md for current topology, `REPO_ROOT` requirement when compose runs inside a container, Vite `allowedHosts`, linuxserver.io `group_add` + custom-cont-init.d workaround, `docker compose up` no-op-on-unchanged-hash gotcha.
---
## GitNexus code intelligence
### GitNexus code intelligence
Indexed as `resolutionflow`. Earns its cost on cross-cutting work only.
@@ -224,42 +52,23 @@ Indexed as `resolutionflow`. Earns its cost on cross-cutting work only.
Re-indexes automatically on commit (PostToolUse hook). Manual refresh if stale: `npx gitnexus analyze`.
---
### gstack skills
## gstack skills
Always use `/browse` for web, never `mcp__claude-in-chrome__*`.
Always use `/browse` for web, never `mcp__claude-in-chrome__*`. Most-used:
Available commands:
- `/review` — pre-land PR review
- `/ship` — tests + review + PR creation
- `/browse` + `/qa` / `/qa-only` — headless browser testing (setup: Lesson 82)
- `/design-review` — visual QA
- `/investigate` — systematic debug with root cause
- `/codex` OpenAI Codex second opinion
- `/plan-eng-review` / `/plan-design-review` / `/plan-ceo-review` — plan critiques
- **Planning & review:** `/autoplan`, `/plan-eng-review`, `/plan-design-review`, `/plan-ceo-review`, `/plan-devex-review`, `/devex-review`, `/review`, `/cso`, `/office-hours`
- **Design:** `/design-consultation`, `/design-shotgun`, `/design-html`, `/design-review`
- **Browser & QA:** `/browse`, `/connect-chrome`, `/qa`, `/qa-only`, `/setup-browser-cookies`
- **Ship & deploy:** `/ship`, `/land-and-deploy`, `/canary`, `/benchmark`, `/setup-deploy`, `/document-release`
- **Debug & investigate:** `/investigate`, `/careful`, `/freeze`, `/guard`, `/unfreeze`
- **Other:** `/codex` (OpenAI second opinion), `/setup-gbrain`, `/retro`, `/learn`, `/gstack-upgrade`
---
### Git trailer
## Deployment (Railway)
Every commit: `Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>`
- **Prod:** `resolutionflow.com` (frontend), `api.resolutionflow.com` (backend).
- Auto-deploy: Gitea push → GitHub mirror → Railway follows GitHub `main`.
- PR environments auto-created; need manual domain generation + `VITE_API_URL` with `https://` prefix.
- `ALLOW_RAILWAY_ORIGINS=true` for `*.up.railway.app` CORS.
- Shared Variables (Railway project-level) auto-propagate to PR envs — use for secrets like `ANTHROPIC_API_KEY`.
- Super admin utility: `backend/make_superadmin_simple.py list|<email>`.
### Model aliases
---
## Quick reference
| What | Where |
|---|---|
| Detailed status | [CURRENT-STATE.md](CURRENT-STATE.md) |
| Roadmap | [03-DEVELOPMENT-ROADMAP.md](03-DEVELOPMENT-ROADMAP.md) |
| Design system | [DESIGN-SYSTEM.md](DESIGN-SYSTEM.md) |
| Dev env | [DEV-ENV.md](DEV-ENV.md) |
| Archived lessons | [docs/LESSONS-ARCHIVE.md](docs/LESSONS-ARCHIVE.md) |
| ConnectWise API | `docs/connectwise/` |
| GitHub issues | `gh issue list --state open` |
| Local API docs | <http://localhost:8000/api/docs> |
Always use alias form (`claude-sonnet-4-6`, `claude-opus-4-6`, etc.) via `settings.get_model_for_action()`. Never hardcode a dated model ID.

View File

@@ -2,7 +2,7 @@
> **Purpose:** Quick-reference file showing exactly where the project stands.
> **For Claude Code:** Read this first to understand what's done and what's next.
> **Last Updated:** April 12, 2026
> **Last Updated:** May 1, 2026
---
@@ -10,6 +10,14 @@
---
## Recently shipped (post-0.1.0.0)
- **2026-05-01 — PR #158** Session-screen UX impeccable pass + tasklane keyboard flow. Heuristic score 24/40 → 33/40 across five sub-passes (distill, quieter, layout, typeset, polish). Removed duplicate "Suggested checks" chip strip → TaskLane is the single source of truth; added inline `Next steps · N pending` cue on the latest action-bearing AI bubble; consolidated session header to Resolve + Escalate + ⋯ kebab; centered messages column to match composer; dropped all banned decorations (side stripes, gradient surfaces, backdrop blur, accent borderTop) for a single decoration channel per surface; unified 14 text sizes into a 5-step scale. TaskLane keyboard flow: Enter submits + auto-advances, Shift+Enter newline, Esc cancel, focus jumps to Send after the last task. Banner ↔ script-panel are now linked (collapse hides both, any outcome closes both). WhatWeKnow section is collapsible with `sessionStorage` memory + auto-collapse-at-5-facts. Side fix: ParameterizationPreview no longer over-highlights short parameter values (word-boundary check). Two backlog entries logged in `.ai/TODO.md`: ConcludeSessionModal multi-select and `bg-card-hover` Tailwind drift in CommandPalette.
- **2026-05-01 — PR #156** Suggested-fix "Awaiting verification" outcome. Engineers can now park a fix in `applied_pending` (waiting on client power-cycle, AD replication, license sync, etc.) instead of forcing a synchronous worked/didn't/partial verdict. PendingBanner with worked / didn't / update reason / dismiss; nudge "Still checking" records pending with a reason; page-level Resolve auto-patches pending → success before the resolution flow opens; page-level Escalate intercepts pending. Migration `c0f3a4b7e91d` (`pending_reason` column + status CHECK constraint).
- **2026-04-30 — PR #155** Escalation Mode wedge. Magic-moment handoff-context screen for senior pickup, live SSE escalation arrivals, post-claim time-to-first-action metric (`GET /analytics/flowpilot/escalations`), atomic role-gated claim with conflict resolution, queue self-exclusion, chat ownership extended to claimed sessions. The wedge for the first paying-customer push.
---
## What's Complete
### Core Platform

View File

@@ -108,7 +108,7 @@ Run these in order. Stop at the first failure and investigate.
# Ubuntu / Debian
sudo apt update && sudo apt install -y \
git curl build-essential \
python3.11 python3.11-venv python3-pip \
python3.12 python3.12-venv python3-pip \
postgresql-client # not the server — only if running Postgres natively
# Node 20 via nvm (survives container rebuilds if stored in a volume)
@@ -236,7 +236,7 @@ REPO_ROOT=/absolute/path/to/resolutionflow
```bash
cd backend
python3.11 -m venv venv
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

View File

@@ -11,7 +11,7 @@
## Quick Start
```bash
# Prerequisites: Docker, Python 3.11+, Node.js 20+
# Prerequisites: Docker, Python 3.12, Node.js 20+
# Start PostgreSQL
docker start patherly_postgres

View File

@@ -1,70 +0,0 @@
# Session Handoff — Design System v4 Migration
> **For the next Claude session:** Read this file completely, internalize the context, then delete it (`rm SESSION-HANDOFF.md`). This is a one-time context transfer.
---
## What Was Done This Session
### 1. FlowPilot Message Bar + AI Script Builder (MERGED to main)
- PR #118 merged. Always-visible message bar in FlowPilot sessions, AI Script Builder at `/script-builder`, library reorg (My/Team Scripts tabs), FlowPilot-to-Script-Builder handoff, session abandon/close, unified session history.
- Eng review completed: normalized `script_builder_messages` table, typed content helpers, 6 edge case tests.
### 2. Design System v4 Migration (PR #119, open, branch: `refactor/design-system-v4`)
- Complete frontend redesign from glassmorphism to flat dark theme (Sentry/PostHog-inspired)
- **CSS Foundation:** New color tokens in `index.css`, all via CSS custom properties. Light mode ready (just needs `.light` class values).
- **Icon Rail Sidebar:** 72px rail with 5 grouped icons (Home, Work, Knowledge, Insights, Help). Full-height resizable drawer on hover. Pin-to-expand to 260px. Mobile hamburger overlay.
- **Component Sweep:** ~200 files migrated. All hardcoded hex replaced with semantic Tailwind tokens (bg-card, text-foreground, border-border, etc.).
- **Landing Page:** Flat surfaces, no glow, solid buttons.
- **Interactive Shadows:** Dark-mode-aware — elevated surfaces + faint cyan accent glow (black shadows invisible on dark bg).
- **Stat Cards:** 3px colored left borders.
- **Tab Toggles:** Active state uses `tab-active-shadow` (elevated bg + faint glow).
### 3. GTM Strategy (from /office-hours)
- Shadow & Ship approach: Michael uses ResolutionFlow on real tickets for 2 weeks, then hands logins to 5 MSP colleagues. Key metric: unprompted return.
- Design doc at `~/.gstack/projects/patherly-patherly/`
---
## What Needs To Be Done Next
### Immediate (Design System v4 polish)
1. **Home icon color fix:** The Home icon in the sidebar shouldn't have a cyan background when not active. Instead, the Home icon itself should always be cyan (brand accent), and only show the `bg-accent-dim` background when the route is actually `/`. Michael specifically requested this.
2. **Visual QA pass:** Michael hasn't done a full page-by-page walkthrough yet. Expect feedback on individual pages once he does.
3. **`font-label` cleanup:** ~10 files still reference `font-label` (deprecated alias for `font-mono`). Each needs inspection — some should be `font-mono`, others `font-sans text-xs`.
4. **Inline `style` attributes:** ~29 instances still use hardcoded hex in inline styles (sidebar, drawer, badges). Should be converted to CSS variable references or Tailwind classes where possible.
### Before Merging PR #119
- Run migrations: `docker exec resolutionflow_backend alembic upgrade head` (new tables from the Script Builder PR are on main now)
- Full visual QA with backend running
- Test mobile responsive (hamburger menu)
- Test FlowPilot session with new message bar + action bar positioning
### Future
- **Light mode toggle:** CSS variables are ready. Need to add `.light` class values in `index.css` + toggle in user settings/account page.
- **Script Builder testing:** The AI Script Builder hasn't been tested end-to-end with the backend running yet.
---
## Key Files to Know
| File | What it does |
|------|-------------|
| `DESIGN-SYSTEM.md` | Single source of truth for all design decisions |
| `frontend/src/index.css` | CSS tokens, component utilities, shadow patterns |
| `frontend/src/components/layout/Sidebar.tsx` | Icon rail + drawer + pinned sidebar |
| `frontend/src/components/layout/AppLayout.tsx` | CSS Grid shell |
| `frontend/src/components/dashboard/StartSessionInput.tsx` | The Guided/Chat toggle |
| `frontend/src/components/dashboard/PerformanceCards.tsx` | Stat cards with colored borders |
## Key Lessons From This Session
- The component sweep agents missed `editor-ai/`, `guides/`, `maintenance/`, `scripts/`, `settings/` directories and `text-brand-dark` references. Always do a final grep audit after sweeps.
- `bg-[#hex]` hardcoding defeats the purpose of CSS variables. We had to do a second pass to replace 3,200+ hardcoded values with semantic tokens.
- Black shadows (`rgba(0,0,0,...)`) are invisible on dark backgrounds. Use elevated surfaces + faint accent glow instead.
- The sidebar flyout needed `position: fixed` to escape the CSS Grid cell clipping — `absolute` positioning was hidden behind the main content area.
- Flyout hover timing: individual item `onMouseLeave` was killing the flyout before the mouse reached the drawer. Only the outer wrapper should handle `onMouseLeave`.
---
> **After reading this file:** Save relevant context to your session memory, then run `rm SESSION-HANDOFF.md` and `git add -A && git commit -m "chore: remove session handoff file"`.

1
VERSION Normal file
View File

@@ -0,0 +1 @@
0.1.0.0

View File

@@ -21,4 +21,12 @@ ANTHROPIC_API_KEY=
VOYAGE_API_KEY=
# ConnectWise PSA Integration
CW_CLIENT_ID=<CONNECTWISE CLIENT ID>
CW_CLIENT_ID=<CONNECTWISE CLIENT ID>
# Stripe
# Test keys from Stripe Dashboard → Developers → API keys (with Test mode toggled on).
# Webhook secret for local dev: from `stripe listen --forward-to localhost:8000/api/v1/webhooks/stripe`.
# When unset, app/core/config.py:stripe_enabled returns False and Stripe code paths short-circuit.
STRIPE_SECRET_KEY=sk_test_
STRIPE_PUBLISHABLE_KEY=pk_test_
STRIPE_WEBHOOK_SECRET=whsec_

View File

@@ -5,6 +5,12 @@ WORKDIR /app
RUN apt-get update && apt-get install -y \
gcc \
libpq-dev \
libpango1.0-dev \
libcairo2-dev \
libgdk-pixbuf-2.0-dev \
libffi-dev \
libjpeg-dev \
zlib1g-dev \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt requirements-dev.txt ./
@@ -12,4 +18,4 @@ RUN pip install --no-cache-dir -r requirements-dev.txt
EXPOSE 8000
CMD [ "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload" ]
CMD [ "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--reload" ]

View File

@@ -0,0 +1,30 @@
"""account_invites add revoked_at and email_sent_at
Revision ID: 2aa73d3231c2
Revises: e1af7ab57ceb
Create Date: 2026-05-06 07:28:28.514384
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '2aa73d3231c2'
down_revision: Union[str, None] = 'e1af7ab57ceb'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column("account_invites", sa.Column("revoked_at", sa.DateTime(timezone=True), nullable=True))
op.add_column("account_invites", sa.Column("email_sent_at", sa.DateTime(timezone=True), nullable=True))
op.create_index("ix_account_invites_revoked_at", "account_invites", ["revoked_at"])
def downgrade() -> None:
op.drop_index("ix_account_invites_revoked_at", table_name="account_invites")
op.drop_column("account_invites", "email_sent_at")
op.drop_column("account_invites", "revoked_at")

View File

@@ -0,0 +1,28 @@
"""users add role_at_signup and onboarding_step_completed
Revision ID: 58e3caaa6269
Revises: 5bb055a1593e
Create Date: 2026-05-06 07:25:16.780761
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '58e3caaa6269'
down_revision: Union[str, None] = '5bb055a1593e'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column("users", sa.Column("role_at_signup", sa.String(50), nullable=True))
op.add_column("users", sa.Column("onboarding_step_completed", sa.Integer(), nullable=True))
def downgrade() -> None:
op.drop_column("users", "onboarding_step_completed")
op.drop_column("users", "role_at_signup")

View File

@@ -0,0 +1,47 @@
"""users password_hash nullable
Revision ID: 5bb055a1593e
Revises: b1fad5ddf357
Create Date: 2026-05-06 07:23:21.480252
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = '5bb055a1593e'
down_revision: Union[str, None] = 'b1fad5ddf357'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.alter_column(
"users",
"password_hash",
existing_type=sa.String(255),
nullable=True,
)
def downgrade() -> None:
# NOTE: downgrade is non-trivial if any OAuth-only users exist.
# This downgrade fails fast in that case rather than corrupting data.
conn = op.get_bind()
null_count = conn.execute(
sa.text("SELECT COUNT(*) FROM users WHERE password_hash IS NULL")
).scalar()
if null_count and null_count > 0:
raise RuntimeError(
f"Cannot downgrade: {null_count} OAuth-only users have NULL password_hash. "
"Set passwords or delete those rows before downgrading."
)
op.alter_column(
"users",
"password_hash",
existing_type=sa.String(255),
nullable=False,
)

View File

@@ -0,0 +1,60 @@
"""add applied_pending status + pending_reason to session_suggested_fixes
Adds the `applied_pending` non-terminal status (engineer ran the fix but
verification is deferred — waiting on client, async sync, etc) alongside
the existing `applied_partial` status. Mirrors partial_notes with a new
pending_reason column for the "what are you waiting on?" prose.
Revision ID: c0f3a4b7e91d
Revises: 71efd2102f49
Create Date: 2026-04-30
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
revision: str = "c0f3a4b7e91d"
down_revision: Union[str, None] = "71efd2102f49"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column(
"session_suggested_fixes",
sa.Column("pending_reason", sa.Text(), nullable=True),
)
op.drop_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
type_="check",
)
op.create_check_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
"status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'applied_pending', 'dismissed')",
)
def downgrade() -> None:
op.execute(
"UPDATE session_suggested_fixes "
"SET status = 'applied_partial', "
" partial_notes = COALESCE(partial_notes, pending_reason) "
"WHERE status = 'applied_pending'"
)
op.drop_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
type_="check",
)
op.create_check_constraint(
"ck_session_suggested_fixes_status",
"session_suggested_fixes",
"status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'dismissed')",
)
op.drop_column("session_suggested_fixes", "pending_reason")

View File

@@ -0,0 +1,39 @@
"""add oauth_identities
Revision ID: b1fad5ddf357
Revises: c0f3a4b7e91d
Create Date: 2026-05-06 07:17:11.374555
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import UUID
# revision identifiers, used by Alembic.
revision: str = 'b1fad5ddf357'
down_revision: Union[str, None] = 'c0f3a4b7e91d'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"oauth_identities",
sa.Column("id", UUID(as_uuid=True), primary_key=True),
sa.Column("user_id", UUID(as_uuid=True), sa.ForeignKey("users.id", ondelete="CASCADE"), nullable=False),
sa.Column("provider", sa.String(20), nullable=False),
sa.Column("provider_subject", sa.String(255), nullable=False),
sa.Column("provider_email_at_link", sa.String(255), nullable=False),
sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
sa.UniqueConstraint("provider", "provider_subject", name="uq_oauth_identities_provider_subject"),
)
op.create_index("ix_oauth_identities_user_id", "oauth_identities", ["user_id"])
def downgrade() -> None:
op.drop_index("ix_oauth_identities_user_id", table_name="oauth_identities")
op.drop_table("oauth_identities")

View File

@@ -0,0 +1,47 @@
"""subscriptions pilot complimentary backfill
This migration converts existing pilot/dev accounts to permanent complimentary
Pro per the self-serve signup spec section 5. Forward-only; downgrade is
prohibited because original status is not preserved.
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
revision: str = "c6cbfc534fad"
down_revision: Union[str, None] = "c982a3fc4bf1"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
"""Set status='complimentary' and plan='pro' for all existing accounts that
don't have a canceled or past_due subscription. Pilot users transition to
permanent complimentary Pro per spec section 5.
Forward-only — does not preserve original status values."""
conn = op.get_bind()
# Update existing rows
conn.execute(sa.text("""
UPDATE subscriptions
SET status = 'complimentary', plan = 'pro',
current_period_end = NULL, current_period_start = NULL,
updated_at = now()
WHERE status NOT IN ('canceled', 'past_due')
"""))
# Backfill: any account without a Subscription row gets one
conn.execute(sa.text("""
INSERT INTO subscriptions (id, account_id, plan, status, cancel_at_period_end, created_at, updated_at)
SELECT gen_random_uuid(), a.id, 'pro', 'complimentary', false, now(), now()
FROM accounts a
WHERE NOT EXISTS (SELECT 1 FROM subscriptions s WHERE s.account_id = a.id)
"""))
def downgrade() -> None:
raise RuntimeError(
"Cannot downgrade: original subscription state is not preserved. "
"Restore from backup if needed."
)

View File

@@ -0,0 +1,45 @@
"""add stripe_events
Revision ID: c982a3fc4bf1
Revises: f7da3f93b519
Create Date: 2026-05-06 07:32:08.027633
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import JSONB
# revision identifiers, used by Alembic.
revision: str = 'c982a3fc4bf1'
down_revision: Union[str, None] = 'f7da3f93b519'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"stripe_events",
sa.Column("id", sa.String(length=255), primary_key=True, nullable=False),
sa.Column("event_type", sa.String(length=100), nullable=False),
sa.Column(
"processed_at",
sa.DateTime(timezone=True),
nullable=False,
server_default=sa.func.now(),
),
sa.Column(
"payload_excerpt",
JSONB,
nullable=False,
server_default=sa.text("'{}'::jsonb"),
),
)
op.create_index("ix_stripe_events_event_type", "stripe_events", ["event_type"])
def downgrade() -> None:
op.drop_index("ix_stripe_events_event_type", table_name="stripe_events")
op.drop_table("stripe_events")

View File

@@ -0,0 +1,28 @@
"""accounts add wizard columns
Revision ID: e1af7ab57ceb
Revises: 58e3caaa6269
Create Date: 2026-05-06 07:27:15.755518
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = 'e1af7ab57ceb'
down_revision: Union[str, None] = '58e3caaa6269'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column("accounts", sa.Column("team_size_bucket", sa.String(20), nullable=True))
op.add_column("accounts", sa.Column("primary_psa", sa.String(20), nullable=True))
def downgrade() -> None:
op.drop_column("accounts", "primary_psa")
op.drop_column("accounts", "team_size_bucket")

View File

@@ -0,0 +1,41 @@
"""add plan_billing
Revision ID: f236a91224d0
Revises: 2aa73d3231c2
Create Date: 2026-05-06 07:30:06.807887
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = 'f236a91224d0'
down_revision: Union[str, None] = '2aa73d3231c2'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"plan_billing",
sa.Column("plan", sa.String(50), sa.ForeignKey("plan_limits.plan"), primary_key=True),
sa.Column("display_name", sa.String(255), nullable=False),
sa.Column("description", sa.Text(), nullable=True),
sa.Column("monthly_price_cents", sa.Integer(), nullable=True),
sa.Column("annual_price_cents", sa.Integer(), nullable=True),
sa.Column("stripe_product_id", sa.String(255), nullable=True),
sa.Column("stripe_monthly_price_id", sa.String(255), nullable=True),
sa.Column("stripe_annual_price_id", sa.String(255), nullable=True),
sa.Column("is_public", sa.Boolean(), nullable=False, server_default=sa.text("true")),
sa.Column("is_archived", sa.Boolean(), nullable=False, server_default=sa.text("false")),
sa.Column("sort_order", sa.Integer(), nullable=False, server_default=sa.text("0")),
sa.Column("created_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
sa.Column("updated_at", sa.DateTime(timezone=True), nullable=False, server_default=sa.func.now()),
)
def downgrade() -> None:
op.drop_table("plan_billing")

View File

@@ -0,0 +1,57 @@
"""add sales_leads
Revision ID: f7da3f93b519
Revises: f236a91224d0
Create Date: 2026-05-06 07:31:39.533305
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import UUID
# revision identifiers, used by Alembic.
revision: str = 'f7da3f93b519'
down_revision: Union[str, None] = 'f236a91224d0'
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"sales_leads",
sa.Column("id", UUID(as_uuid=True), primary_key=True, nullable=False),
sa.Column("email", sa.String(length=255), nullable=False),
sa.Column("name", sa.String(length=255), nullable=False),
sa.Column("company", sa.String(length=255), nullable=False),
sa.Column("team_size", sa.String(length=20), nullable=True),
sa.Column("message", sa.Text(), nullable=True),
sa.Column("source", sa.String(length=50), nullable=False),
sa.Column("posthog_distinct_id", sa.String(length=255), nullable=True),
sa.Column(
"status",
sa.String(length=20),
nullable=False,
server_default=sa.text("'new'"),
),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
nullable=False,
server_default=sa.func.now(),
),
sa.Column(
"updated_at",
sa.DateTime(timezone=True),
nullable=False,
server_default=sa.func.now(),
),
)
op.create_index("ix_sales_leads_email", "sales_leads", ["email"])
def downgrade() -> None:
op.drop_index("ix_sales_leads_email", table_name="sales_leads")
op.drop_table("sales_leads")

View File

@@ -83,11 +83,12 @@ async def get_current_active_user(
current_user: Annotated[User, Depends(get_current_user)],
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> User:
"""Ensure user is active (not disabled). Auto-downgrades expired trials.
Enforces must_change_password — blocks all routes except allowlist.
"""Ensure user is active (not disabled). Enforces must_change_password —
blocks all routes except allowlist.
Uses get_admin_db: runs before require_tenant_context sets the ContextVar,
so tenant-scoped tables (subscriptions) would return 0 rows via app role.
Trial expiry enforcement now happens via require_active_subscription in
individual routers, NOT here. This dep no longer mutates Subscription
state.
"""
if not current_user.is_active:
raise HTTPException(
@@ -106,26 +107,6 @@ async def get_current_active_user(
# Set Sentry user context for error attribution
sentry_sdk.set_user({"id": str(current_user.id), "email": current_user.email})
# Lightweight trial expiry check
if current_user.account_id:
from app.models.subscription import Subscription
from datetime import datetime, timezone
result = await db.execute(
select(Subscription).where(Subscription.account_id == current_user.account_id)
)
subscription = result.scalar_one_or_none()
if (
subscription
and subscription.status == "trialing"
and subscription.current_period_end
and subscription.current_period_end < datetime.now(timezone.utc)
):
subscription.plan = "free"
subscription.status = "active"
subscription.current_period_end = None
subscription.current_period_start = None
await db.commit()
return current_user
@@ -241,3 +222,117 @@ async def require_admin_db(
the user object is needed in the handler.
"""
return db
_SUBSCRIPTION_GUARD_ALLOWLIST = {
"/api/v1/auth/me",
"/api/v1/auth/logout",
"/api/v1/auth/password/change",
"/api/v1/auth/email/send-verification",
"/api/v1/auth/email/verify",
"/api/v1/billing/state",
"/api/v1/billing/checkout-session",
"/api/v1/billing/portal-session",
"/api/v1/users/me",
"/api/v1/users/me/onboarding-step",
"/api/v1/users/me/onboarding-dismiss-rest",
}
async def require_active_subscription(
request: Request,
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_admin_db)],
):
"""Returns the Subscription row when the account has access; raises 402
when locked. Mounted on routers requiring Pro entitlement.
'Locked' = (trialing AND current_period_end < now()) OR
(canceled OR incomplete OR no subscription).
Active states: active, complimentary, trialing-with-time-remaining, past_due.
"""
if request.url.path in _SUBSCRIPTION_GUARD_ALLOWLIST:
return None
from app.models.subscription import Subscription
from datetime import datetime, timezone
result = await db.execute(
select(Subscription).where(Subscription.account_id == current_user.account_id)
)
sub = result.scalar_one_or_none()
if sub is None:
raise HTTPException(
status_code=402,
detail={"error": "no_subscription", "upgrade_url": "/account/billing/select-plan"},
)
now = datetime.now(timezone.utc)
is_live = (
sub.status in ("active", "complimentary", "past_due")
or (
sub.status == "trialing"
and sub.current_period_end is not None
and sub.current_period_end > now
)
)
if not is_live:
raise HTTPException(
status_code=402,
detail={
"error": "subscription_inactive",
"status": sub.status,
"plan": sub.plan,
"current_period_end": sub.current_period_end.isoformat() if sub.current_period_end else None,
"upgrade_url": "/account/billing/select-plan",
},
)
return sub
_EMAIL_VERIFICATION_ALLOWLIST = {
"/api/v1/auth/me",
"/api/v1/auth/logout",
"/api/v1/auth/email/send-verification",
"/api/v1/auth/email/verify",
"/api/v1/auth/password/change",
"/api/v1/users/me",
"/api/v1/users/me/onboarding-step",
"/api/v1/users/me/onboarding-dismiss-rest",
"/api/v1/billing/state",
"/api/v1/billing/checkout-session",
"/api/v1/billing/portal-session",
}
VERIFICATION_GRACE_DAYS = 7
async def require_verified_email_after_grace(
request: Request,
current_user: Annotated[User, Depends(get_current_active_user)],
):
"""Enforces 'this user has verified email OR is still in 7-day grace.'
OAuth signups bypass cleanly because /auth/{google,microsoft}/callback
sets users.email_verified_at = now() (provider-attested)."""
from datetime import datetime, timezone, timedelta
if request.url.path in _EMAIL_VERIFICATION_ALLOWLIST:
return
if current_user.email_verified_at is not None:
return
grace_ends = current_user.created_at + timedelta(days=VERIFICATION_GRACE_DAYS)
if datetime.now(timezone.utc) < grace_ends:
return
raise HTTPException(
status_code=403,
detail={
"error": "email_not_verified",
"grace_ended_at": grace_ends.isoformat(),
"resend_url": "/api/v1/auth/email/send-verification",
},
)

View File

@@ -0,0 +1,54 @@
"""Public endpoint for resolving an account invite code into display info.
Mounted as a public route (no tenant context, no auth) — used by the
/accept-invite page on the frontend so an invitee can see what account they
are about to join before they sign up. Uses the BYPASSRLS admin session
factory because account_invites is account-scoped under Phase 4 RLS but the
caller has no tenant identity yet.
"""
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import joinedload
from app.core.admin_database import get_admin_db
from app.models.account_invite import AccountInvite
from app.schemas.oauth import InviteLookupResponse
router = APIRouter(prefix="/accounts", tags=["account-invite-lookup"])
@router.get("/invites/{code}/lookup", response_model=InviteLookupResponse)
async def lookup_invite(
code: str,
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> InviteLookupResponse:
"""Return minimal display data for a valid (unused, unexpired, not revoked)
invite. Returns 404 with `invite_invalid_or_expired_or_revoked` for any
invalid state — the AcceptInvitePage shows a single "ask the inviter to
resend" message regardless of which condition failed (anti-enumeration)."""
result = await db.execute(
select(AccountInvite)
.where(AccountInvite.code == code)
.options(
joinedload(AccountInvite.account),
joinedload(AccountInvite.invited_by),
)
)
invite = result.scalar_one_or_none()
if invite is None or not invite.is_valid:
raise HTTPException(
status_code=404,
detail={"error": "invite_invalid_or_expired_or_revoked"},
)
return InviteLookupResponse(
account_name=invite.account.name,
inviter_name=invite.invited_by.name,
invited_email=invite.email,
role=invite.role,
)

View File

@@ -19,7 +19,7 @@ from app.models.account_invite import AccountInvite
from app.models.account_settings import AccountSettings
from app.models.subscription import Subscription
from app.models.user import User
from app.schemas.account import AccountResponse, AccountUpdate, AccountInviteCreate, AccountInviteResponse, TransferOwnershipRequest
from app.schemas.account import AccountResponse, AccountUpdate, AccountInviteCreate, AccountInviteResponse, AccountInviteBulkCreate, AccountInviteBulkResponse, TransferOwnershipRequest
from app.schemas.subscription import SubscriptionResponse, PlanLimitsResponse, UsageResponse, SubscriptionDetails
from app.schemas.user import UserResponse, AccountRoleUpdate
from app.core.security import verify_password
@@ -260,7 +260,7 @@ async def create_invite(
db: Annotated[AsyncSession, Depends(get_db)],
current_user: Annotated[User, Depends(require_account_owner)]
):
"""Create an invite to join this account (owner only)."""
"""Create an invite to join this account (owner only). Sends invite email."""
code = secrets.token_urlsafe(16)
expires_at = None
@@ -276,11 +276,109 @@ async def create_invite(
expires_at=expires_at,
)
db.add(invite)
await db.flush()
# Lookup account name for email
account_result = await db.execute(
select(Account).where(Account.id == current_user.account_id)
)
account = account_result.scalar_one()
# Send invite email — non-blocking on failure (function returns False on error)
email_sent = await EmailService.send_account_invite_email(
to_email=invite.email,
code=code,
account_name=account.name,
role=invite.role,
)
if email_sent:
invite.email_sent_at = datetime.now(timezone.utc)
await db.commit()
await db.refresh(invite)
return invite
@router.post("/me/invites/bulk", response_model=AccountInviteBulkResponse, status_code=status.HTTP_201_CREATED)
async def create_invites_bulk(
payload: AccountInviteBulkCreate,
db: Annotated[AsyncSession, Depends(get_db)],
current_user: Annotated[User, Depends(require_account_owner)]
):
"""Create multiple invites in one call (wizard step 3 supports up to N).
Per-row failures are returned in `failed`; successes in `created`."""
# Lookup account once for email rendering
account_result = await db.execute(
select(Account).where(Account.id == current_user.account_id)
)
account = account_result.scalar_one()
created: list[AccountInvite] = []
failed: list[dict] = []
for invite_data in payload.invites:
try:
code = secrets.token_urlsafe(16)
expires_at = None
if invite_data.expires_in_days:
expires_at = datetime.now(timezone.utc) + timedelta(days=invite_data.expires_in_days)
invite = AccountInvite(
account_id=current_user.account_id,
invited_by_id=current_user.id,
email=invite_data.email,
code=code,
role=invite_data.role,
expires_at=expires_at,
)
db.add(invite)
await db.flush()
email_sent = await EmailService.send_account_invite_email(
to_email=invite.email,
code=code,
account_name=account.name,
role=invite.role,
)
if email_sent:
invite.email_sent_at = datetime.now(timezone.utc)
created.append(invite)
except Exception as e:
failed.append({"email": invite_data.email, "error": str(e)})
await db.commit()
for inv in created:
await db.refresh(inv)
return AccountInviteBulkResponse(created=created, failed=failed)
@router.delete("/me/invites/{invite_id}", status_code=status.HTTP_204_NO_CONTENT)
async def revoke_invite(
invite_id: UUID,
db: Annotated[AsyncSession, Depends(get_db)],
current_user: Annotated[User, Depends(require_account_owner)]
):
"""Soft-revoke an invitation by setting revoked_at. Idempotent on already-
revoked invites; rejects already-accepted invites."""
result = await db.execute(
select(AccountInvite).where(
AccountInvite.id == invite_id,
AccountInvite.account_id == current_user.account_id,
)
)
invite = result.scalar_one_or_none()
if not invite:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Invite not found")
if invite.is_revoked:
return None # idempotent
if invite.is_used:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Cannot revoke an accepted invite")
invite.revoked_at = datetime.now(timezone.utc)
await db.commit()
return None
@router.post("/me/invites/{invite_id}/resend", response_model=AccountInviteResponse)
async def resend_invite(
invite_id: UUID,

View File

@@ -8,34 +8,101 @@ from app.core.database import get_db
from app.core.audit import log_audit
from app.models.user import User
from app.models.plan_limits import PlanLimits
from app.models.plan_billing import PlanBilling
from app.models.account import Account
from app.models.account_limit_override import AccountLimitOverride
from app.models.subscription import Subscription
from app.schemas.admin import (
PlanLimitResponse, PlanLimitUpdate,
PlanLimitResponse, PlanLimitUpdate, PlanLimitWithBillingResponse,
AccountOverrideCreate, AccountOverrideUpdate, AccountOverrideResponse,
)
from app.api.deps import require_admin
from app.services.billing import BillingService
router = APIRouter(prefix="/admin", tags=["admin-plan-limits"])
@router.get("/plan-limits", response_model=list[PlanLimitResponse])
# Fields on PlanLimitUpdate that map to plan_billing (not plan_limits).
_PLAN_BILLING_FIELDS = (
"display_name",
"description",
"monthly_price_cents",
"annual_price_cents",
"stripe_product_id",
"stripe_monthly_price_id",
"stripe_annual_price_id",
"is_public",
"is_archived",
"sort_order",
)
# Subset of _PLAN_BILLING_FIELDS that are NOT NULL on the PlanBilling model.
# These are Optional[...] on PlanLimitUpdate, so a caller sending an explicit
# null for any of them would otherwise trigger a NOT NULL violation at commit.
_PLAN_BILLING_NOT_NULL_FIELDS = frozenset({
"display_name",
"is_public",
"is_archived",
"sort_order",
})
def _merge_plan_with_billing(
plan: PlanLimits, billing: PlanBilling | None
) -> PlanLimitWithBillingResponse:
"""Build a merged response. Billing fields are None when no plan_billing row
exists for the plan."""
payload = {
"plan": plan.plan,
"max_trees": plan.max_trees,
"max_sessions_per_month": plan.max_sessions_per_month,
"max_users": plan.max_users,
"custom_branding": plan.custom_branding,
"priority_support": plan.priority_support,
"export_formats": plan.export_formats or [],
}
if billing is not None:
payload.update({
"display_name": billing.display_name,
"description": billing.description,
"monthly_price_cents": billing.monthly_price_cents,
"annual_price_cents": billing.annual_price_cents,
"stripe_product_id": billing.stripe_product_id,
"stripe_monthly_price_id": billing.stripe_monthly_price_id,
"stripe_annual_price_id": billing.stripe_annual_price_id,
"is_public": billing.is_public,
"is_archived": billing.is_archived,
"sort_order": billing.sort_order,
})
return PlanLimitWithBillingResponse(**payload)
@router.get("/plan-limits", response_model=list[PlanLimitWithBillingResponse])
async def list_plan_limits(
db: Annotated[AsyncSession, Depends(get_db)],
current_user: Annotated[User, Depends(require_admin)],
):
"""List all plan limit configurations."""
result = await db.execute(select(PlanLimits))
return result.scalars().all()
"""List all plan limit configurations, merged with plan_billing fields
where present. Plans without a plan_billing row return None for the
billing fields."""
rows = (await db.execute(
select(PlanLimits, PlanBilling)
.outerjoin(PlanBilling, PlanLimits.plan == PlanBilling.plan)
)).all()
return [_merge_plan_with_billing(pl, pb) for pl, pb in rows]
@router.put("/plan-limits", response_model=PlanLimitResponse)
@router.put("/plan-limits", response_model=PlanLimitWithBillingResponse)
async def update_plan_limits(
data: PlanLimitUpdate,
db: Annotated[AsyncSession, Depends(get_db)],
current_user: Annotated[User, Depends(require_admin)],
):
"""Update a plan's limits."""
"""Update a plan's limits and (if any plan_billing field is included)
upsert the matching plan_billing row in the same transaction. After
commit, invalidates the in-process billing cache for accounts on this
plan (currently a no-op — see BillingService.invalidate_billing_cache).
"""
result = await db.execute(select(PlanLimits).where(PlanLimits.plan == data.plan))
plan = result.scalar_one_or_none()
if not plan:
@@ -48,10 +115,50 @@ async def update_plan_limits(
plan.priority_support = data.priority_support
plan.export_formats = data.export_formats
await log_audit(db, current_user.id, "plan_limits.update", "plan_limits", details={"plan": data.plan})
# Did the request include any plan_billing field? (Pydantic gives us
# `model_fields_set` to distinguish "user passed null" from "field omitted".)
billing_fields_set = data.model_fields_set & set(_PLAN_BILLING_FIELDS)
billing: PlanBilling | None = None
if billing_fields_set:
billing = (await db.execute(
select(PlanBilling).where(PlanBilling.plan == data.plan)
)).scalar_one_or_none()
if billing is None:
# Create. display_name is required on the model — derive from the
# plan name when the caller didn't supply one (e.g. "pro" → "Pro").
display_name = data.display_name or data.plan.capitalize()
billing = PlanBilling(plan=data.plan, display_name=display_name)
db.add(billing)
# Apply only the fields the caller actually included. Allows partial
# updates without clobbering existing values.
for field in billing_fields_set:
value = getattr(data, field)
if value is None and field in _PLAN_BILLING_NOT_NULL_FIELDS:
# Don't NULL out a NOT NULL column on update.
continue
setattr(billing, field, value)
await log_audit(
db, current_user.id, "plan_limits.update", "plan_limits",
details={"plan": data.plan, "updated_billing": bool(billing_fields_set)},
)
await db.commit()
await db.refresh(plan)
return plan
if billing is not None:
await db.refresh(billing)
# Invalidate any in-process billing cache for accounts on this plan.
# TODO: invalidate app.state.billing_cache when added.
account_ids = [
row[0] for row in (await db.execute(
select(Subscription.account_id).where(Subscription.plan == data.plan)
)).all()
]
await BillingService.invalidate_billing_cache(account_ids)
return _merge_plan_with_billing(plan, billing)
@router.get("/account-overrides", response_model=list[AccountOverrideResponse])

View File

@@ -15,7 +15,7 @@ from datetime import datetime
from typing import Annotated, Optional
from uuid import UUID
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Query, Request, status
from sqlalchemy import or_, select, func, text
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
@@ -452,6 +452,13 @@ async def resolve_session(
# ── Escalate ──
#
# Thin shim over HandoffManager. The legacy `flowpilot_engine.escalate_session`
# is no longer the source of truth — every escalation now creates a
# SessionHandoff row, fans out via the SSE bus, dispatches AppNotification +
# external channels via notify(), and emails per-user. EscalateModal and the
# /handoff endpoint both funnel through here / through HandoffManager so the
# senior-pickup magic-moment screen works regardless of entry point.
@router.post("/{session_id}/escalate", response_model=SessionCloseResponse)
@limiter.limit("15/minute")
@@ -459,25 +466,62 @@ async def escalate_session(
request: Request,
session_id: UUID,
data: EscalateSessionRequest,
background_tasks: BackgroundTasks,
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_db)],
_: None = Depends(require_engineer_or_admin),
):
"""Escalate a FlowPilot session to another engineer."""
"""Escalate a FlowPilot session — unified through HandoffManager."""
from app.services.handoff_manager import HandoffManager, enrich_escalation_async
# Owner-only — matches the original constraint on flowpilot_engine.escalate_session.
session_result = await db.execute(
select(AISession).where(
AISession.id == session_id,
AISession.user_id == current_user.id,
)
)
session = session_result.scalar_one_or_none()
if not session:
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND, detail="Session not found"
)
manager = HandoffManager(db)
try:
result = await flowpilot_engine.escalate_session(
handoff = await manager.create_handoff(
session_id=session_id,
request=data,
intent="escalate",
engineer_notes=data.escalation_reason,
user_id=current_user.id,
db=db,
priority="normal",
target_user_id=data.escalated_to_id,
)
except ValueError as e:
raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
except PermissionError as e:
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
documentation, psa_result = await manager.finalize_escalation(
handoff, session, current_user.id
)
await db.commit()
return result
await manager.dispatch_escalation_notifications(handoff)
# AI enrichment (Sonnet assessment + enhanced escalation_package) runs
# in the background so the escalating engineer doesn't wait on
# 15-25s of model latency. Result lands on the handoff row when ready;
# the senior's magic-moment screen reads it at pickup time.
background_tasks.add_task(
enrich_escalation_async, handoff.id, current_user.id
)
return SessionCloseResponse(
session_id=session.id,
status=session.status,
documentation=documentation,
**psa_result,
)
# ── Pause ──
@@ -644,7 +688,8 @@ async def get_escalation_queue(
select(AISession)
.where(
scope_filter,
AISession.status == "requesting_escalation",
AISession.status.in_(("requesting_escalation", "escalated")),
AISession.user_id != current_user.id,
)
.order_by(AISession.created_at.desc())
)
@@ -838,13 +883,25 @@ async def list_sessions(
date_to: Optional[datetime] = Query(None),
q: Optional[str] = Query(None, min_length=2, max_length=200),
):
"""List the current user's AI sessions (owned or picked up)."""
"""List the current user's AI sessions (owned or picked up).
"Picked up" includes both the legacy escalation_package.picked_up_by
marker (set by flowpilot_engine.pickup_session) AND the new
escalated_to_id field (set by HandoffManager.claim_session for the
unified handoff/escalate path). Without the escalated_to_id branch
the senior tech wouldn't see a session they just claimed in their
chat sidebar — the picked-up session lands as the active chat with
no entry in the list, which is what the user reported as "4 versions
of the session" (their unrelated owned sessions show up while the
claimed one is invisible).
"""
user_id_str = str(current_user.id)
query = (
select(AISession)
.where(
or_(
AISession.user_id == current_user.id,
AISession.escalated_to_id == current_user.id,
AISession.escalation_package["picked_up_by"].as_string() == user_id_str,
)
)
@@ -901,10 +958,21 @@ async def get_session(
if not session:
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")
# Allow access if user is owner, escalation target, or picked-up handler
# Allow access if user is owner, escalation target, or picked-up handler.
# Sessions in transit (requesting_escalation / escalated) are also
# readable by any account member — the whole point of escalation is that
# other techs can see the context before claiming. Tenant boundary is
# enforced by RLS on the underlying query, so account-scope is the right
# ceiling for in-transit reads.
pkg = session.escalation_package or {}
is_handler = pkg.get("picked_up_by") == str(current_user.id)
if session.user_id != current_user.id and session.escalated_to_id != current_user.id and not is_handler:
is_in_transit = session.status in ("requesting_escalation", "escalated")
if (
session.user_id != current_user.id
and session.escalated_to_id != current_user.id
and not is_handler
and not is_in_transit
):
raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")
return _build_session_detail(session)

View File

@@ -1,3 +1,4 @@
import logging
import secrets
import string
from datetime import datetime, timezone, timedelta
@@ -41,11 +42,21 @@ from app.core.email import EmailService
from app.api.deps import get_current_active_user, get_refresh_token_payload
from app.core.audit import log_audit
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/auth", tags=["authentication"])
async def _store_refresh_token(db: AsyncSession, refresh_token_str: str, user_id) -> None:
"""Decode a refresh token JWT and store its hash in the database."""
async def store_refresh_token(db: AsyncSession, refresh_token_str: str, user_id) -> None:
"""Decode a refresh token JWT and store its hash in the database.
Module-public so OAuth callback endpoints (and any future token-issuing
surface) can register the JTI in the ``refresh_tokens`` table the same
way ``/auth/login`` does. Without this the first ``/auth/refresh`` call
will reject the token as "revoked" because no row exists.
Caller is responsible for committing the session.
"""
payload = decode_token(refresh_token_str)
if payload and payload.get("jti"):
token_record = RefreshToken(
@@ -62,6 +73,22 @@ def _generate_display_code() -> str:
return ''.join(secrets.choice(chars) for _ in range(8))
async def _reject_if_oauth_only(db: AsyncSession, user) -> None:
"""If the user has no password_hash, raise 400 with a list of linked
providers so the client can redirect them to the right OAuth flow."""
if user is None or user.password_hash is not None:
return
from app.models.oauth_identity import OAuthIdentity
result = await db.execute(
select(OAuthIdentity.provider).where(OAuthIdentity.user_id == user.id)
)
providers = [row for row in result.scalars().all()]
raise HTTPException(
status_code=400,
detail={"error": "use_oauth_provider", "providers": providers},
)
@router.post("/register", response_model=UserResponse, status_code=status.HTTP_201_CREATED)
@limiter.limit("3/minute")
async def register(
@@ -108,10 +135,24 @@ async def register(
detail="Account invite code has expired"
)
if account_invite_record.email.lower() != user_data.email.lower():
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail={"error": "invite_email_mismatch"},
)
# Validate platform invite code (skip if account invite was provided)
invite_code_record = None
if not account_invite_record:
if settings.REQUIRE_INVITE_CODE and not user_data.invite_code:
# When SELF_SERVE_ENABLED is on, the platform invite gate is bypassed
# entirely — public self-serve signup is the whole point. The
# invite_code field stays in the schema for backward compatibility
# and so paid/trial-bearing codes still apply when supplied.
if (
settings.REQUIRE_INVITE_CODE
and not settings.SELF_SERVE_ENABLED
and not user_data.invite_code
):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Invite code is required"
@@ -195,26 +236,30 @@ async def register(
# Now set account owner and create subscription
new_account.owner_id = new_user.id
# Apply plan/trial from invite code if present
sub_plan = "free"
sub_status = "active"
period_start = None
period_end = None
if invite_code_record and invite_code_record.assigned_plan:
# Plan/trial driven by platform invite code (existing pilot flow)
sub_plan = invite_code_record.assigned_plan
sub_status = "active"
period_start = None
period_end = None
if invite_code_record.trial_duration_days:
sub_status = "trialing"
period_start = datetime.now(timezone.utc)
period_end = period_start + timedelta(days=invite_code_record.trial_duration_days)
new_subscription = Subscription(
account_id=new_account.id,
plan=sub_plan,
status=sub_status,
current_period_start=period_start,
current_period_end=period_end,
)
db.add(new_subscription)
db.add(Subscription(
account_id=new_account.id,
plan=sub_plan,
status=sub_status,
current_period_start=period_start,
current_period_end=period_end,
))
else:
# New self-serve shop — start the standard Pro trial.
# start_trial commits internally; flush our pending User/Account changes
# first so the FK is satisfied.
await db.flush()
from app.services.billing import BillingService
await BillingService.start_trial(db, new_account.id)
# Mark platform invite code as used
if invite_code_record:
@@ -224,6 +269,34 @@ async def register(
await db.commit()
await db.refresh(new_user)
# Auto-send verification email for newly-registered users.
# Skip silently if verification already done (shouldn't happen for fresh
# users, but defensive).
if new_user.email_verified_at is None:
verification_enabled = await SettingsManager.get(
"email_verification_enabled", db, default=True
)
if verification_enabled:
try:
raw_token = create_email_verification_token(str(new_user.id))
payload = decode_token(raw_token)
if payload and payload.get("jti"):
token_record = EmailVerificationToken(
token_hash=hash_token(payload["jti"]),
user_id=new_user.id,
expires_at=datetime.fromtimestamp(payload["exp"], tz=timezone.utc),
)
db.add(token_record)
await db.commit()
verification_url = f"{settings.FRONTEND_URL}/verify-email?token={raw_token}"
await EmailService.send_email_verification_email(
to_email=new_user.email,
verification_url=verification_url,
)
except Exception as e:
logger.warning("verification email send failed for %s: %s", new_user.email, e)
return new_user
@@ -239,6 +312,7 @@ async def login(
result = await db.execute(select(User).where(User.email == form_data.username))
user = result.scalar_one_or_none()
await _reject_if_oauth_only(db, user)
if not user or not verify_password(form_data.password, user.password_hash):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
@@ -254,7 +328,7 @@ async def login(
refresh_token_str = create_refresh_token(data={"sub": str(user.id)})
# Store refresh token hash in DB
await _store_refresh_token(db, refresh_token_str, user.id)
await store_refresh_token(db, refresh_token_str, user.id)
await db.commit()
return Token(
@@ -276,6 +350,7 @@ async def login_json(
result = await db.execute(select(User).where(User.email == credentials.email))
user = result.scalar_one_or_none()
await _reject_if_oauth_only(db, user)
if not user or not verify_password(credentials.password, user.password_hash):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
@@ -288,7 +363,7 @@ async def login_json(
refresh_token_str = create_refresh_token(data={"sub": str(user.id)})
# Store refresh token hash in DB
await _store_refresh_token(db, refresh_token_str, user.id)
await store_refresh_token(db, refresh_token_str, user.id)
await db.commit()
return Token(
@@ -346,7 +421,7 @@ async def refresh_token(
new_refresh_token_str = create_refresh_token(data={"sub": str(user.id)})
# Store new refresh token
await _store_refresh_token(db, new_refresh_token_str, user.id)
await store_refresh_token(db, new_refresh_token_str, user.id)
await db.commit()
return Token(
@@ -441,6 +516,7 @@ async def change_password(
db: Annotated[AsyncSession, Depends(get_admin_db)]
):
"""Change the current user's password."""
await _reject_if_oauth_only(db, current_user)
if not verify_password(data.current_password, current_user.password_hash):
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
@@ -484,7 +560,7 @@ async def forgot_password(
result = await db.execute(select(User).where(User.email == data.email))
user = result.scalar_one_or_none()
if user:
if user and user.password_hash is not None:
# Create reset token JWT
raw_token = create_password_reset_token(str(user.id))
payload = decode_token(raw_token)

View File

@@ -1,31 +1,44 @@
"""Public beta signup endpoint — no auth required."""
"""Legacy beta signup endpoint — redirects to /register?from=beta.
Phase 2 (self-serve signup) makes the public register flow the canonical
front door. The old `/api/v1/beta-signup` POST endpoint is kept mounted to
preserve any external links that still hit it, but now responds with a
307 Temporary Redirect to `/register?from=beta` so the user lands in the
real signup flow. The `?from=beta` marker lets the frontend tag the
signup origin for analytics.
Note: there is no `beta_signup` database table — the original endpoint
only fired a notification email. There is therefore no waitlist to email
and no migration to run when retiring the endpoint.
"""
import logging
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel, EmailStr
from app.core.email import EmailService
from fastapi import APIRouter
from fastapi.responses import RedirectResponse
from app.core.config import settings
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/beta-signup", tags=["beta"])
class BetaSignupRequest(BaseModel):
email: EmailStr
# Local-dev fallback when FRONTEND_URL isn't configured. The redirect must
# be absolute — a relative URL would resolve against the API origin
# (api.resolutionflow.com), which has no /register page.
_DEFAULT_FRONTEND_URL = "http://localhost:5173"
class BetaSignupResponse(BaseModel):
success: bool
message: str
@router.post("", include_in_schema=False)
async def beta_signup_redirect() -> RedirectResponse:
"""Redirect legacy beta-signup POST to the public register page.
@router.post("", response_model=BetaSignupResponse)
async def beta_signup(data: BetaSignupRequest):
"""Collect beta interest — sends notification to beta@resolutionflow.com."""
sent = await EmailService.send_beta_signup_notification(data.email)
if not sent:
logger.warning("Beta signup recorded (email delivery skipped): %s", data.email)
return BetaSignupResponse(
success=True,
message="Thanks! We'll be in touch with beta access details.",
Returns 307 so any client following the redirect preserves the HTTP
method; the frontend treats `/register?from=beta` as the canonical
entry point and reads the `from` query param for analytics.
"""
frontend_url = settings.FRONTEND_URL or _DEFAULT_FRONTEND_URL
return RedirectResponse(
url=f"{frontend_url}/register?from=beta",
status_code=307,
)

View File

@@ -0,0 +1,76 @@
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.api.deps import get_current_active_user
from app.core.admin_database import get_admin_db
from app.core.config import settings
from app.models.account import Account
from app.models.user import User
from app.schemas.billing import (
BillingPortalSessionResponse,
BillingStateResponse,
CheckoutSessionCreate,
CheckoutSessionResponse,
)
from app.services.billing import BillingService
router = APIRouter(prefix="/billing", tags=["billing"])
@router.post("/checkout-session", response_model=CheckoutSessionResponse)
async def create_checkout_session(
payload: CheckoutSessionCreate,
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> CheckoutSessionResponse:
account = (await db.execute(
select(Account).where(Account.id == current_user.account_id)
)).scalar_one()
url = await BillingService.create_checkout_session(
db=db,
account=account,
plan=payload.plan,
seats=payload.seats,
billing_interval=payload.billing_interval,
success_url=f"{settings.FRONTEND_URL}/account/billing?success=1",
cancel_url=f"{settings.FRONTEND_URL}/account/billing/select-plan",
)
return CheckoutSessionResponse(url=url)
@router.get("/state", response_model=BillingStateResponse)
async def get_billing_state(
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> BillingStateResponse:
account = (await db.execute(
select(Account).where(Account.id == current_user.account_id)
)).scalar_one()
state = await BillingService.get_billing_state(db, account)
return BillingStateResponse(**state)
@router.get("/portal-session", response_model=BillingPortalSessionResponse)
async def get_billing_portal_session(
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> BillingPortalSessionResponse:
"""Return a Stripe-hosted Customer Portal URL for the account so the user
can update card / cancel. Allowlisted from the subscription + email-verify
guards (a canceled or unverified-past-grace user must still be able to
update billing)."""
if not settings.stripe_enabled:
raise HTTPException(status_code=503, detail={"error": "stripe_not_configured"})
account = (await db.execute(
select(Account).where(Account.id == current_user.account_id)
)).scalar_one()
try:
url = await BillingService.open_customer_portal(account)
except ValueError:
raise HTTPException(status_code=400, detail={"error": "no_stripe_customer"})
return BillingPortalSessionResponse(url=url)

View File

@@ -0,0 +1,40 @@
"""Public runtime configuration endpoint.
GET /api/v1/config/public
Returns the small set of runtime flags the frontend needs at app load
to decide whether to render the self-serve signup flow and which OAuth
buttons to show. No authentication required.
The response model lives in `app.schemas.config` so it can be reused by
frontend codegen and other call sites if needed.
"""
from __future__ import annotations
from fastapi import APIRouter
from app.core.config import settings
from app.schemas.config import PublicConfigResponse
router = APIRouter(prefix="/config", tags=["config"])
@router.get("/public", response_model=PublicConfigResponse)
async def get_public_config() -> PublicConfigResponse:
"""Return public-safe runtime config.
`oauth_providers` reflects which OAuth client IDs are configured server
side; the frontend uses it to render only buttons that will actually
succeed. `self_serve_enabled` is the master switch for the new public
self-serve signup flow.
"""
providers: list[str] = []
if settings.GOOGLE_CLIENT_ID:
providers.append("google")
if settings.MS_CLIENT_ID:
providers.append("microsoft")
return PublicConfigResponse(
self_serve_enabled=settings.SELF_SERVE_ENABLED,
oauth_providers=providers,
)

View File

@@ -3,8 +3,10 @@
Endpoints:
GET /analytics/flowpilot?period=30d — Main dashboard data
GET /analytics/flowpilot/knowledge-gaps — Knowledge gap report
GET /analytics/flowpilot/escalations?period=30d — Escalation handoff metrics
"""
import logging
import statistics
from datetime import datetime, timezone, timedelta
from typing import Annotated, Optional
@@ -13,10 +15,17 @@ from sqlalchemy import select, func, case, cast, Date, extract
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.rate_limit import limiter
from app.api.deps import get_current_active_user, get_db, require_team_admin
from app.api.deps import (
get_current_active_user,
get_db,
require_engineer_or_admin,
require_team_admin,
)
from app.models.user import User
from app.models.tree import Tree
from app.models.ai_session import AISession
from app.models.ai_session_step import AISessionStep
from app.models.session_handoff import SessionHandoff
from app.models.flow_proposal import FlowProposal
from app.models.psa_activity_log import PsaActivityLog
from app.models.psa_post_log import PsaPostLog
@@ -36,6 +45,7 @@ from app.schemas.flowpilot_analytics import (
EnhancedPsaMetrics,
PsaFunnel,
PsaDailyTrend,
EscalationMetrics,
)
from app.services.knowledge_gap_service import get_knowledge_gaps, KnowledgeGapReport
@@ -727,3 +737,104 @@ async def get_enhanced_psa_metrics(
push_funnel=push_funnel,
daily_trend=daily_trend,
)
# ─── Escalation Mode metrics (wedge stat for /escalations queue + analytics page)
#
# Pulls all (handoff.claimed_at, first_step_after_claim.created_at) pairs in the
# window and aggregates avg/median/p95 of the delta in Python. Pilot scale
# (~1k rows max per account per month) makes this cheaper and clearer than
# Postgres percentile_cont gymnastics.
#
# IMPORTANT: this is the in-product metric only. The "minutes recovered"
# sales claim requires manual baseline measurement (see The Assignment in
# docs/plans/2026-04-27-escalation-mode-wedge-design.md).
@router.get("/escalations", response_model=EscalationMetrics)
@limiter.limit("30/minute")
async def get_escalation_metrics(
request: Request,
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_db)],
_: None = Depends(require_engineer_or_admin),
period: str = Query("30d", pattern="^(7d|30d|90d)$"),
) -> EscalationMetrics:
"""Time-to-first-action after escalation claim, account-scoped.
Returns:
n_handoffs_claimed: handoffs in window that were claimed by a senior.
n_handoffs_with_action: subset where the senior took at least one
action (an ai_session_step row created after claimed_at).
avg/median/p95_seconds_to_first_action: aggregates of
(first_step.created_at - claimed_at) in seconds.
Excludes handoffs where claimed_at IS NULL (never claimed) and handoffs
where no ai_session_step was created after the claim. Both are
counted — n_handoffs_claimed includes "no action yet" handoffs so the
conversion rate is visible.
"""
if not current_user.account_id:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN, detail="No account"
)
account_id = current_user.account_id
period_start = _get_period_start(period)
# First-action timestamp per handoff via correlated scalar subquery.
first_action_subq = (
select(func.min(AISessionStep.created_at))
.where(
AISessionStep.session_id == SessionHandoff.session_id,
AISessionStep.created_at > SessionHandoff.claimed_at,
)
.correlate(SessionHandoff)
.scalar_subquery()
)
rows = (
await db.execute(
select(
SessionHandoff.claimed_at,
first_action_subq.label("first_action_at"),
).where(
SessionHandoff.account_id == account_id,
SessionHandoff.claimed_at.isnot(None),
SessionHandoff.claimed_at >= period_start,
)
)
).all()
n_handoffs_claimed = len(rows)
deltas: list[float] = []
for claimed_at, first_action_at in rows:
if first_action_at is None:
continue
delta_s = (first_action_at - claimed_at).total_seconds()
# Floor at zero — clock drift between rows could in theory yield a
# tiny negative if a step's created_at races claimed_at. Surface as
# 0s rather than absurd negative deltas.
if delta_s < 0:
delta_s = 0.0
deltas.append(delta_s)
n_handoffs_with_action = len(deltas)
if n_handoffs_with_action == 0:
return EscalationMetrics(
period=period,
n_handoffs_claimed=n_handoffs_claimed,
n_handoffs_with_action=0,
)
sorted_deltas = sorted(deltas)
p95_idx = max(0, int(round(0.95 * (n_handoffs_with_action - 1))))
return EscalationMetrics(
period=period,
n_handoffs_claimed=n_handoffs_claimed,
n_handoffs_with_action=n_handoffs_with_action,
avg_seconds_to_first_action=round(statistics.fmean(deltas), 2),
median_seconds_to_first_action=round(statistics.median(deltas), 2),
p95_seconds_to_first_action=round(sorted_deltas[p95_idx], 2),
)

View File

@@ -194,6 +194,7 @@ async def create_folder(
new_folder = UserFolder(
user_id=current_user.id,
account_id=current_user.account_id,
name=folder_data.name,
color=folder_data.color,
icon=folder_data.icon,

View File

@@ -1,6 +1,7 @@
"""PSA integration endpoints — connection CRUD and test."""
from __future__ import annotations
import logging
from datetime import datetime, timezone
from typing import Annotated
from uuid import UUID
@@ -11,6 +12,8 @@ from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import delete
logger = logging.getLogger(__name__)
from app.api.deps import get_current_active_user, require_account_owner, require_engineer_or_admin
from app.core.database import get_db
from app.models.psa_connection import PsaConnection
@@ -30,6 +33,17 @@ from app.schemas.psa_connection import (
PSABoardResponse,
)
from app.core.config import settings
from app.schemas.psa_tickets import (
PSAResourceSchema,
PSATicketCreatedSchema,
PSATicketStatusUpdateSchema,
TicketCreatePayloadSchema,
PSAPrioritySchema,
TicketListResponseSchema,
AiParseRequestSchema,
AiParseResponseSchema,
)
import app.services.ticket_service as ticket_svc
from app.services.psa.encryption import (
decrypt_credentials,
encrypt_credentials,
@@ -362,33 +376,36 @@ async def list_boards(
provider = await get_provider_for_account(current_user.account_id, db)
boards = await provider.list_boards()
return [PSABoardResponse(id=b.id, name=b.name) for b in boards]
except PSAError:
except PSAError as e:
# Boards are optional UI chrome — degrade gracefully rather than surfacing a toast
logger.warning("list_boards failed: %s", e)
return []
@router.get("/tickets/search", response_model=list[PSATicketSearchResult])
@router.get("/tickets/search", response_model=TicketListResponseSchema)
async def search_tickets(
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
query: str = "",
board_id: int | None = None,
status_id: int | None = None,
status_name: str | None = None,
include_closed: bool = False,
assigned_to_me: bool = False,
unassigned: bool = False,
board_ids: str = "",
priority: str | None = None,
company_id: int | None = None,
page: int = 1,
page_size: int = 10,
page_size: int = 25,
):
"""Search ConnectWise tickets."""
"""Search ConnectWise tickets — returns paginated TicketListResponse."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.registry import get_provider_for_account
from app.services.psa.exceptions import PSAError
# Resolve assigned_to_me → member_identifier (CW login name for resources contains filter)
member_identifier: str | None = None
if assigned_to_me:
conn_result = await db.execute(
@@ -407,23 +424,18 @@ async def search_tickets(
)
mapping = mapping_result.scalar_one_or_none()
if not mapping:
# No mapping for this user — return empty list
return []
from app.services.psa.registry import get_provider_for_account as _get_provider
from app.services.psa.exceptions import PSAError as _PSAError
return {"items": [], "total": 0, "page": page, "page_size": page_size}
try:
_provider = await _get_provider(current_user.account_id, db)
_provider = await get_provider_for_account(current_user.account_id, db)
cw_members = await _provider.list_members()
matched = next((m for m in cw_members if m.id == mapping.external_member_id), None)
if matched:
member_identifier = matched.identifier
else:
return []
except _PSAError:
return []
return {"items": [], "total": 0, "page": page, "page_size": page_size}
except PSAError:
return {"items": [], "total": 0, "page": page, "page_size": page_size}
# Parse comma-separated board_ids
parsed_board_ids: list[int] = []
if board_ids:
try:
@@ -433,33 +445,250 @@ async def search_tickets(
try:
provider = await get_provider_for_account(current_user.account_id, db)
tickets = await provider.search_tickets(
result = await provider.search_tickets(
query,
board_id=board_id,
status_id=status_id,
status_name=status_name,
include_closed=include_closed,
member_identifier=member_identifier,
unassigned=unassigned,
board_ids=parsed_board_ids,
company_id=company_id,
page=page,
page_size=page_size,
)
return [
items = [
PSATicketSearchResult(
id=t.id,
summary=t.summary,
company_name=t.company_name,
company_id=t.company_id,
board_name=t.board_name,
board_id=t.board_id,
status_name=t.status_name,
status_id=t.status_id,
priority_name=t.priority_name,
priority_id=t.priority_id,
closed=t.closed,
)
for t in tickets
for t in result.items
]
return {"items": items, "total": result.total, "page": result.page, "page_size": result.page_size}
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
@router.post("/tickets", response_model=PSATicketCreatedSchema, status_code=201)
async def create_ticket(
data: TicketCreatePayloadSchema,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""Create a new PSA ticket."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.exceptions import PSAError
from app.services.psa.types import TicketCreatePayload
try:
return await ticket_svc.create_ticket(
current_user.account_id,
TicketCreatePayload(**data.model_dump()),
db,
)
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
@router.post("/tickets/ai-parse", response_model=AiParseResponseSchema)
async def ai_parse_ticket(
data: AiParseRequestSchema,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""Parse natural language into a ticket pre-fill payload using Claude."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.registry import get_provider_for_account
from app.services.psa.exceptions import PSAError
import anthropic
import json
# Fetch boards + members for context (both cached)
boards = []
members = []
try:
provider = await get_provider_for_account(current_user.account_id, db)
boards = await provider.list_boards()
members = await provider.list_members()
except PSAError:
pass
boards_list = [{"id": b.id, "name": b.name} for b in boards]
members_list = [{"id": m.id, "name": m.name, "identifier": m.identifier} for m in members]
system_prompt = """You are a ticket triage assistant for an MSP help desk.
Extract structured ticket information from the engineer's natural language description.
Return ONLY valid JSON matching this exact schema — no other text:
{
"summary": "short one-line ticket title or null",
"board_id": "integer matching one of the provided boards or null",
"priority_name": "one of: Critical, High, Medium, Low, or null",
"description": "expanded description or null",
"assignee_identifier": "member identifier string from the provided members list or null",
"warnings": ["list of strings explaining what could not be resolved"]
}"""
user_msg = f"""Available boards: {json.dumps(boards_list)}
Available members: {json.dumps(members_list[:50])}
Engineer's description: {data.prompt}"""
missing_fields: list[str] = []
warnings: list[str] = []
response_data = AiParseResponseSchema()
try:
client = anthropic.AsyncAnthropic(
api_key=settings.ANTHROPIC_API_KEY,
max_retries=1,
)
msg = await client.messages.create(
model=settings.get_model_for_action("default"),
max_tokens=512,
system=system_prompt,
messages=[{"role": "user", "content": user_msg}],
)
raw = msg.content[0].text.strip()
# Strip markdown fences if present
if raw.startswith("```"):
import re
raw = re.sub(r'^```(?:json)?\s*', '', raw)
raw = re.sub(r'\s*```$', '', raw.strip())
parsed = json.loads(raw)
response_data.summary = parsed.get("summary")
response_data.description = parsed.get("description")
warnings = parsed.get("warnings", [])
# Resolve board_id
if parsed.get("board_id"):
board_match = next((b for b in boards if b.id == int(parsed["board_id"])), None)
if board_match:
response_data.board_id = board_match.id
else:
missing_fields.append("board_id")
warnings.append(f"Board ID {parsed['board_id']} not found")
else:
missing_fields.append("board_id")
# Resolve assignee
if parsed.get("assignee_identifier"):
member = next((m for m in members if m.identifier == parsed["assignee_identifier"]), None)
if member:
response_data.assigned_member_id = int(member.id)
else:
warnings.append(f"Member '{parsed['assignee_identifier']}' not found")
# Priority/status/company always need manual selection
missing_fields.extend(["status_id", "priority_id", "company_id"])
except Exception as e:
logger.warning("AI parse failed: %s", e)
missing_fields = ["summary", "board_id", "status_id", "priority_id", "company_id"]
warnings = ["AI parsing failed — please fill in manually"]
response_data.missing_fields = missing_fields
response_data.warnings = warnings
return response_data
@router.patch("/tickets/{ticket_id}/status", response_model=PSATicketStatusUpdateSchema)
async def update_ticket_status_endpoint(
ticket_id: int,
status_id: int,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""Update a ticket's status."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.exceptions import PSAError
try:
return await ticket_svc.update_status(current_user.account_id, ticket_id, status_id, db)
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
@router.get("/tickets/{ticket_id}/resources", response_model=list[PSAResourceSchema])
async def list_ticket_resources(
ticket_id: int,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.exceptions import PSAError
try:
return await ticket_svc.list_resources(current_user.account_id, ticket_id, db)
except PSAError as e:
# Resources are optional display data — degrade gracefully rather than surfacing a toast
logger.warning("list_resources(%s) failed: %s", ticket_id, e)
return []
@router.post("/tickets/{ticket_id}/resources", response_model=PSAResourceSchema, status_code=201)
async def add_ticket_resource(
ticket_id: int,
member_id: int,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.exceptions import PSAError
try:
return await ticket_svc.add_resource(current_user.account_id, ticket_id, member_id, db)
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
@router.delete("/tickets/{ticket_id}/resources/{member_id}", status_code=204)
async def remove_ticket_resource(
ticket_id: int,
member_id: int,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.exceptions import PSAError
try:
await ticket_svc.remove_resource(current_user.account_id, ticket_id, member_id, db)
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
@router.get("/priorities", response_model=list[PSAPrioritySchema])
async def list_priorities(
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""List PSA priority levels for ticket creation form."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.registry import get_provider_for_account
from app.services.psa.exceptions import PSAError
try:
provider = await get_provider_for_account(current_user.account_id, db)
raw = await provider.list_priorities()
return [PSAPrioritySchema(id=p["id"], name=p["name"]) for p in raw if p.get("id")]
except PSAError as e:
logger.warning("list_priorities failed: %s", e)
return []
@router.get("/tickets/{ticket_id}/context")
async def get_ticket_context(
ticket_id: int,
@@ -561,7 +790,30 @@ async def get_ticket_statuses(
except PSANotFoundError:
raise HTTPException(status_code=404, detail="Ticket not found")
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
logger.warning("get_ticket_statuses(%s) failed: %s", ticket_id, e)
return []
@router.get("/boards/{board_id}/statuses", response_model=list[PSATicketStatusItem])
async def get_board_statuses(
board_id: int,
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""Get available statuses for a service board directly (no ticket lookup required)."""
if not current_user.account_id:
raise HTTPException(status_code=400, detail="User has no account")
from app.services.psa.registry import get_provider_for_account
from app.services.psa.exceptions import PSAError
try:
provider = await get_provider_for_account(current_user.account_id, db)
statuses = await provider.get_ticket_statuses(board_id)
return [PSATicketStatusItem(id=s.id, name=s.name, is_closed=s.is_closed) for s in statuses]
except PSAError as e:
logger.warning("get_board_statuses(%s) failed: %s", board_id, e)
return []
# ── member mapping endpoints ─────────────────────────────────────────
@@ -569,7 +821,7 @@ async def get_ticket_statuses(
@router.get("/members", response_model=list[PsaMemberResponse])
async def list_members(
current_user: Annotated[User, Depends(require_account_owner)],
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
):
"""List CW members (from CW API)."""
@@ -587,7 +839,9 @@ async def list_members(
for m in members
]
except PSAError as e:
raise HTTPException(status_code=502, detail=str(e))
# Members are optional display data — degrade gracefully
logger.warning("list_members failed: %s", e)
return []
@router.get("/member-mappings", response_model=list[PsaMemberMappingResponse])

View File

@@ -0,0 +1,231 @@
import secrets
import string
from datetime import datetime, timezone
from typing import Annotated
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.api.endpoints.auth import store_refresh_token
from app.core.admin_database import get_admin_db
from app.core.config import settings
from app.core.security import create_access_token, create_refresh_token
from app.models.account import Account
from app.models.account_invite import AccountInvite
from app.models.oauth_identity import OAuthIdentity
from app.models.user import User
from app.schemas.oauth import OAuthCallbackPayload, OAuthCallbackResponse
from app.services.billing import BillingService
from app.services.oauth_providers import (
google_exchange_code,
microsoft_exchange_code,
OAuthProfile,
)
router = APIRouter(prefix="/auth", tags=["auth-oauth"])
def _generate_display_code(length: int = 8) -> str:
"""Match the helper used by /auth/register — A-Z + 0-9, length 8."""
alphabet = string.ascii_uppercase + string.digits
return "".join(secrets.choice(alphabet) for _ in range(length))
async def _sign_in_or_register(
db: AsyncSession,
provider: str,
profile: OAuthProfile,
*,
account_invite_code: str | None = None,
invited_email: str | None = None,
) -> tuple[User, bool]:
"""Returns (user, is_new_user). Idempotent on (provider, provider_subject).
When ``account_invite_code`` is supplied (from the /accept-invite flow),
a brand-new user is created inside the invited account instead of getting
a personal account + Pro trial. Mismatch between the OAuth profile email
and ``invited_email`` raises ``invite_email_mismatch`` per the spec
contract that mirrors the email+password register path.
"""
identity = (
await db.execute(
select(OAuthIdentity).where(
OAuthIdentity.provider == provider,
OAuthIdentity.provider_subject == profile.provider_subject,
)
)
).scalar_one_or_none()
if identity:
user = (
await db.execute(select(User).where(User.id == identity.user_id))
).scalar_one()
return user, False
user = (
await db.execute(select(User).where(User.email == profile.email))
).scalar_one_or_none()
is_new_user = user is None
# If the user arrived via an invite link but already has a ResolutionFlow
# account (e.g., previously signed up with email+password), silently
# linking the OAuth identity to that existing account would bypass the
# invite — they'd stay in their personal account and the invite would
# never be consumed. Fail loud instead so they can sign in and accept the
# invite from the dashboard. The "invited user wants to transfer accounts"
# case is a v2 concern.
if account_invite_code and not is_new_user:
raise HTTPException(
status_code=400,
detail={
"error": "email_already_registered_use_login",
"message": (
"An account already exists for this email. Please sign in "
"instead, then accept the invite from your dashboard."
),
},
)
invite_record: AccountInvite | None = None
if is_new_user and account_invite_code:
# SELECT FOR UPDATE so two concurrent OAuth callbacks can't both
# consume the same invite code.
invite_record = (
await db.execute(
select(AccountInvite)
.where(AccountInvite.code == account_invite_code)
.with_for_update()
)
).scalar_one_or_none()
if invite_record is None or not invite_record.is_valid:
raise HTTPException(
status_code=400,
detail={"error": "invite_invalid_or_expired_or_revoked"},
)
# Verify the OAuth profile email matches what was invited. We compare
# against the invite row directly (source of truth), but also accept
# the client-supplied invited_email as a defensive equality check.
if invite_record.email.lower() != profile.email.lower():
raise HTTPException(
status_code=400,
detail={"error": "invite_email_mismatch"},
)
if invited_email and invited_email.lower() != invite_record.email.lower():
raise HTTPException(
status_code=400,
detail={"error": "invite_email_mismatch"},
)
if is_new_user:
if invite_record is not None:
# Join the invited account directly — no personal account, no
# trial creation.
user = User(
email=profile.email,
name=profile.name,
password_hash=None,
account_id=invite_record.account_id,
account_role=invite_record.role,
role="engineer",
email_verified_at=datetime.now(timezone.utc),
)
db.add(user)
await db.flush()
invite_record.accepted_by_id = user.id
invite_record.used_at = datetime.now(timezone.utc)
await db.flush()
else:
account = Account(
name=f"{profile.name}'s Account",
display_code=_generate_display_code(),
)
db.add(account)
await db.flush()
user = User(
email=profile.email,
name=profile.name,
password_hash=None,
account_id=account.id,
account_role="owner",
role="engineer",
email_verified_at=datetime.now(timezone.utc),
)
db.add(user)
await db.flush()
account.owner_id = user.id
await db.flush()
# start_trial commits internally; flushed account/user above.
await BillingService.start_trial(db, account.id)
db.add(
OAuthIdentity(
user_id=user.id,
provider=provider,
provider_subject=profile.provider_subject,
provider_email_at_link=profile.email,
)
)
await db.commit()
await db.refresh(user)
return user, is_new_user
@router.post("/google/callback", response_model=OAuthCallbackResponse)
async def google_callback(
payload: OAuthCallbackPayload,
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> OAuthCallbackResponse:
if not settings.GOOGLE_CLIENT_ID:
raise HTTPException(status_code=503, detail="Google sign-in not configured")
redirect_uri = f"{settings.OAUTH_REDIRECT_BASE}/auth/google/callback"
profile = await google_exchange_code(payload.code, redirect_uri)
user, is_new = await _sign_in_or_register(
db,
"google",
profile,
account_invite_code=payload.account_invite_code,
invited_email=payload.invited_email,
)
refresh_token_str = create_refresh_token({"sub": str(user.id)})
# Persist the refresh-token JTI so the first /auth/refresh call doesn't
# reject this token as "revoked" (the rotation logic requires a row to
# mark as used). _sign_in_or_register already committed; this needs a
# second commit.
await store_refresh_token(db, refresh_token_str, user.id)
await db.commit()
return OAuthCallbackResponse(
access_token=create_access_token({"sub": str(user.id)}),
refresh_token=refresh_token_str,
is_new_user=is_new,
)
@router.post("/microsoft/callback", response_model=OAuthCallbackResponse)
async def microsoft_callback(
payload: OAuthCallbackPayload,
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> OAuthCallbackResponse:
if not settings.MS_CLIENT_ID:
raise HTTPException(status_code=503, detail="Microsoft sign-in not configured")
redirect_uri = f"{settings.OAUTH_REDIRECT_BASE}/auth/microsoft/callback"
profile = await microsoft_exchange_code(payload.code, redirect_uri)
user, is_new = await _sign_in_or_register(
db,
"microsoft",
profile,
account_invite_code=payload.account_invite_code,
invited_email=payload.invited_email,
)
refresh_token_str = create_refresh_token({"sub": str(user.id)})
# Persist the refresh-token JTI so the first /auth/refresh call doesn't
# reject this token as "revoked" (the rotation logic requires a row to
# mark as used). _sign_in_or_register already committed; this needs a
# second commit.
await store_refresh_token(db, refresh_token_str, user.id)
await db.commit()
return OAuthCallbackResponse(
access_token=create_access_token({"sub": str(user.id)}),
refresh_token=refresh_token_str,
is_new_user=is_new,
)

View File

@@ -2,19 +2,24 @@
from typing import Annotated
from fastapi import APIRouter, Depends
from fastapi import APIRouter, Depends, HTTPException, status
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession
from app.api.deps import get_current_active_user
from app.core.database import get_db
from app.core.admin_database import get_admin_db
from app.models.account import Account
from app.models.assistant_chat import AssistantChat
from app.models.psa_connection import PsaConnection
from app.models.session import Session
from app.models.tree import Tree
from app.models.user import User
from app.schemas.onboarding import OnboardingStatus
from app.schemas.onboarding import (
OnboardingStatus,
OnboardingStepRequest,
OnboardingStepResponse,
)
router = APIRouter(prefix="/users", tags=["onboarding"])
@@ -85,6 +90,10 @@ async def get_onboarding_status(
)
connected_psa = (psa_q.scalar() or 0) > 0
# New (Phase 2 — Task 41)
email_verified = current_user.email_verified_at is not None
shop_setup_done = (current_user.onboarding_step_completed or 0) >= 1
return OnboardingStatus(
created_flow=created_flow,
ran_session=ran_session,
@@ -94,6 +103,8 @@ async def get_onboarding_status(
connected_psa=connected_psa,
is_team_user=is_team_user,
dismissed=current_user.onboarding_dismissed,
email_verified=email_verified,
shop_setup_done=shop_setup_done,
)
@@ -109,3 +120,98 @@ async def dismiss_onboarding(
# Return updated status (reuse the GET logic)
return await get_onboarding_status(db=db, current_user=current_user)
# ---------------------------------------------------------------------------
# Welcome wizard endpoints (Phase 2)
#
# These persist Step 1/2/3 progress for the post-signup welcome wizard.
# Mounted on /users/me/* (the parent router prefix is /users) so the wizard
# can run before email verification and during trial.
# ---------------------------------------------------------------------------
@router.patch("/me/onboarding-step", response_model=OnboardingStepResponse)
async def patch_onboarding_step(
body: OnboardingStepRequest,
db: Annotated[AsyncSession, Depends(get_admin_db)],
current_user: Annotated[User, Depends(get_current_active_user)],
) -> OnboardingStepResponse:
"""Persist welcome-wizard progress for the current user.
Contract:
- step=1 + complete writes accounts.name, accounts.team_size_bucket,
users.role_at_signup, then sets users.onboarding_step_completed=1.
- step=2 + complete writes accounts.primary_psa, then sets
users.onboarding_step_completed=2.
- step=3 + complete just sets users.onboarding_step_completed=3
(invites are POSTed separately).
- action="skip" ignores `data` entirely and only advances the step.
- The new step must be >= current onboarding_step_completed (None=>0);
otherwise 400. Idempotent re-PATCH of the same step succeeds.
"""
current_step = current_user.onboarding_step_completed or 0
if body.step < current_step:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail={
"error": "step_cannot_decrease",
"current_step": current_step,
"requested_step": body.step,
},
)
if body.action == "complete" and body.data is not None and body.step in (1, 2):
# Load the user's account for field writes. Step 3 has no data writes.
account_result = await db.execute(
select(Account).where(Account.id == current_user.account_id)
)
account = account_result.scalar_one_or_none()
if account is None:
# Should never happen — user is required to have an account_id.
raise HTTPException(
status_code=status.HTTP_404_NOT_FOUND,
detail="account_not_found",
)
if body.step == 1:
data = body.data
if data.company_name is not None:
account.name = data.company_name
if data.team_size_bucket is not None:
account.team_size_bucket = data.team_size_bucket
if data.role_at_signup is not None:
current_user.role_at_signup = data.role_at_signup
elif body.step == 2:
data = body.data
if data.primary_psa is not None:
account.primary_psa = data.primary_psa
current_user.onboarding_step_completed = body.step
await db.commit()
await db.refresh(current_user)
return OnboardingStepResponse(
onboarding_step_completed=current_user.onboarding_step_completed,
onboarding_dismissed=current_user.onboarding_dismissed,
)
@router.post("/me/onboarding-dismiss-rest", response_model=OnboardingStepResponse)
async def dismiss_onboarding_rest(
db: Annotated[AsyncSession, Depends(get_admin_db)],
current_user: Annotated[User, Depends(get_current_active_user)],
) -> OnboardingStepResponse:
"""Set users.onboarding_dismissed=TRUE — backs the wizard's "Skip the rest" button.
Returns the same shape as the step PATCH so the frontend can update its
local store from a single response.
"""
current_user.onboarding_dismissed = True
await db.commit()
await db.refresh(current_user)
return OnboardingStepResponse(
onboarding_step_completed=current_user.onboarding_step_completed,
onboarding_dismissed=current_user.onboarding_dismissed,
)

View File

@@ -0,0 +1,58 @@
"""Public plans endpoint — no auth required.
GET /api/v1/plans/public
Returns the public-safe view of `plan_billing` joined with
`plan_limits.max_users` (exposed as `max_seats`), filtered to
`is_public=True AND is_archived=False`, ordered by sort_order ASC, plan ASC.
Distinct from `/admin/plan-limits` (admin-only, returns ALL plans including
archived/internal). This endpoint exists to power the marketing /pricing page
without exposing the rest of the admin-only billing surface.
"""
from __future__ import annotations
from typing import Annotated
from fastapi import APIRouter, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.admin_database import get_admin_db
from app.models.plan_billing import PlanBilling
from app.models.plan_limits import PlanLimits
from app.schemas.billing import PublicPlanResponse
router = APIRouter(prefix="/plans", tags=["plans"])
@router.get("/public", response_model=list[PublicPlanResponse])
async def list_public_plans(
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> list[PublicPlanResponse]:
"""List public, non-archived plans for the marketing /pricing page.
Public — no auth. Uses `get_admin_db` because this is a cross-tenant read
of the global plan catalog (same pattern as `/config/public`).
"""
stmt = (
select(PlanBilling, PlanLimits.max_users)
.outerjoin(PlanLimits, PlanBilling.plan == PlanLimits.plan)
.where(PlanBilling.is_public.is_(True))
.where(PlanBilling.is_archived.is_(False))
.order_by(PlanBilling.sort_order.asc(), PlanBilling.plan.asc())
)
rows = (await db.execute(stmt)).all()
return [
PublicPlanResponse(
plan=billing.plan,
display_name=billing.display_name,
description=billing.description,
monthly_price_cents=billing.monthly_price_cents,
annual_price_cents=billing.annual_price_cents,
max_seats=max_users,
sort_order=billing.sort_order,
is_public=billing.is_public,
)
for billing, max_users in rows
]

View File

@@ -0,0 +1,114 @@
"""Public Talk-to-Sales endpoint — no auth required.
POST /api/v1/sales-leads
- Inserts a sales_leads row.
- Fires (best-effort) a notification email to settings.SALES_LEAD_RECIPIENT_EMAIL.
- Emits a server-side PostHog event (best-effort).
- Rate-limited per IP (5/hour).
"""
from __future__ import annotations
import asyncio
import logging
from typing import Annotated
from fastapi import APIRouter, Depends, Request
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.admin_database import get_admin_db
from app.core.config import settings
from app.core.email import EmailService
from app.core.rate_limit import limiter
from app.models.sales_lead import SalesLead
from app.schemas.sales_lead import SalesLeadCreate, SalesLeadCreateResponse
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/sales-leads", tags=["sales"])
async def _send_notification_email(lead: SalesLead) -> None:
"""Fire-and-forget wrapper. EmailService methods never raise, but we
still wrap in a try/except to defend against future regressions."""
try:
await EmailService.send_sales_lead_notification(
to_email=settings.SALES_LEAD_RECIPIENT_EMAIL,
lead=lead,
)
except Exception:
logger.warning(
"Sales lead notification email failed for lead %s",
lead.id,
exc_info=True,
)
def _capture_posthog_event(lead: SalesLead) -> None:
"""Emit `talk_to_sales_form_submitted` server-side. Best-effort.
Backend PostHog SDK isn't initialized in the project today; this function
is the single instrumentation point so wiring it up later is a one-line
change. The call is wrapped so any future failure can never fail the
request.
"""
try:
# Lazy import — keeps the dependency optional. When the backend
# PostHog client is wired in (likely as `app.core.analytics.posthog`),
# swap the import path here and the event will fire automatically.
try:
from app.core.analytics import posthog # type: ignore[attr-defined]
except ImportError:
logger.debug(
"PostHog server-side capture skipped — client not configured"
)
return
distinct_id = lead.posthog_distinct_id or f"sales_lead:{lead.id}"
posthog.capture(
distinct_id=distinct_id,
event="talk_to_sales_form_submitted",
properties={
"source": lead.source,
"company": lead.company,
"team_size": lead.team_size,
},
)
except Exception:
logger.warning(
"PostHog capture failed for sales lead %s",
lead.id,
exc_info=True,
)
@router.post("", response_model=SalesLeadCreateResponse, status_code=201)
@limiter.limit("5/hour")
async def create_sales_lead(
request: Request,
data: SalesLeadCreate,
db: Annotated[AsyncSession, Depends(get_admin_db)],
) -> SalesLeadCreateResponse:
"""Public Talk-to-Sales submission.
Creates a sales_leads row, fires (best-effort) a notification email and a
server-side PostHog event. Rate-limited per IP at 5/hour.
"""
lead = SalesLead(
email=str(data.email).lower(),
name=data.name,
company=data.company,
team_size=data.team_size,
message=data.message,
source=data.source,
posthog_distinct_id=data.posthog_distinct_id,
)
db.add(lead)
await db.commit()
await db.refresh(lead)
# Fire-and-forget: email + analytics. Failures must not fail the request.
asyncio.create_task(_send_notification_email(lead))
_capture_posthog_event(lead)
return SalesLeadCreateResponse(id=lead.id, status="received")

View File

@@ -260,6 +260,7 @@ async def save_to_library(
category_id=data.category_id,
share_with_team=data.share_with_team,
user_id=current_user.id,
account_id=current_user.account_id,
team_id=current_user.team_id,
script_body=data.script_body,
parameters_schema=data.parameters_schema,

View File

@@ -1,23 +1,28 @@
"""Handoff endpoints — unified park/escalate.
POST /ai-sessions/{id}/handoff — Create handoff
POST /ai-sessions/{id}/handoff — Create handoff
GET /ai-sessions/{id}/handoffs — Handoff history
POST /ai-sessions/{id}/handoffs/{hid}/claim — Claim session
GET /ai-sessions/queue — Team queue
GET /ai-sessions/queue — Team queue
GET /ai-sessions/escalations/stream — SSE: live escalation arrivals
"""
import asyncio
import json
import logging
from typing import Annotated
from typing import Annotated, AsyncGenerator
from uuid import UUID
from fastapi import APIRouter, Depends, HTTPException, status
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, Request, status
from fastapi.responses import StreamingResponse
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.api.deps import get_current_active_user, get_db
from app.api.deps import get_current_active_user, get_db, require_engineer_or_admin
from app.core.escalation_bus import bus as escalation_bus
from app.models.user import User
from app.models.ai_session import AISession
from app.models.session_handoff import SessionHandoff
from app.services.handoff_manager import HandoffManager
from app.services.handoff_manager import HandoffAlreadyClaimedError, HandoffManager
from app.schemas.session_handoff import (
HandoffCreateRequest,
HandoffResponse,
@@ -36,6 +41,7 @@ router = APIRouter(prefix="/ai-sessions/{session_id}", tags=["session-handoffs"]
async def create_handoff(
session_id: UUID,
body: HandoffCreateRequest,
background_tasks: BackgroundTasks,
current_user: Annotated[User, Depends(get_current_active_user)],
db: Annotated[AsyncSession, Depends(get_db)],
) -> HandoffResponse:
@@ -58,12 +64,35 @@ async def create_handoff(
engineer_notes=body.engineer_notes,
user_id=current_user.id,
priority=body.priority,
target_user_id=body.target_user_id,
)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
# For escalate: generate documentation + push to PSA before commit so
# the handoff and the PSA-state changes land atomically.
if handoff.intent == "escalate":
await manager.finalize_escalation(handoff, session, current_user.id)
await db.commit()
return HandoffResponse.model_validate(handoff)
# Best-effort notification dispatch AFTER commit so we never email about
# a rolled-back handoff. Failures are swallowed inside the manager —
# handoff creation is authoritative; notifications are advisory.
if handoff.intent == "escalate":
from app.services.handoff_manager import enrich_escalation_async
await manager.dispatch_escalation_notifications(handoff)
# AI enrichment (Sonnet assessment + enhanced escalation_package)
# runs in the background after the response is sent so the
# escalating engineer doesn't wait on 15-25s of model latency.
background_tasks.add_task(
enrich_escalation_async, handoff.id, current_user.id
)
return HandoffResponse.model_validate(handoff).model_copy(
update={"handed_off_by_name": current_user.name}
)
@router.get("/handoffs", response_model=list[HandoffResponse])
@@ -86,21 +115,49 @@ async def list_handoffs(
async def claim_handoff(
session_id: UUID,
handoff_id: UUID,
current_user: Annotated[User, Depends(get_current_active_user)],
current_user: Annotated[User, Depends(require_engineer_or_admin)],
db: Annotated[AsyncSession, Depends(get_db)],
) -> HandoffResponse:
"""Claim a handed-off session."""
"""Claim a handed-off session.
Role-gated to engineer/admin/owner — viewers cannot claim. The race-condition
story (two seniors clicking Pick Up simultaneously) depends on auth gating
for audit integrity. Codex review flagged this as wedge-relevant; locked
in-scope for Escalation Mode v1.
"""
manager = HandoffManager(db)
try:
handoff = await manager.claim_session(
handoff_id=handoff_id,
claiming_user_id=current_user.id,
)
except HandoffAlreadyClaimedError as e:
# Loser of the race — the API surfaces structured detail so the
# client can render "Already claimed by {name} {time_ago}" without
# a follow-up fetch.
raise HTTPException(
status_code=status.HTTP_409_CONFLICT,
detail={
"error": "already_claimed",
"claimed_by_id": str(e.claimed_by_id),
"claimed_by_name": e.claimed_by_name,
"claimed_at": e.claimed_at.isoformat(),
},
)
except PermissionError as e:
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
except ValueError as e:
raise HTTPException(status_code=404, detail=str(e))
await db.commit()
return HandoffResponse.model_validate(handoff)
handed_off_by_name = (
handoff.handed_off_by_user.name
if handoff.handed_off_by_user
else None
)
return HandoffResponse.model_validate(handoff).model_copy(
update={"handed_off_by_name": handed_off_by_name}
)
@queue_router.get("/queue")
@@ -114,3 +171,83 @@ async def get_queue(
team_id=current_user.team_id,
account_id=current_user.account_id,
)
# ─── Live escalation arrivals (SSE) ──────────────────────────────────────────
#
# Streams `handoff_created` events to subscribers in the same account_id as
# the new handoff. Connected EscalationQueue instances prepend the new card
# with the locked 200ms slide-in. Account-scoped: cross-tenant leakage is
# prevented at the bus.publish boundary (only handoff.account_id subscribers
# are notified) and re-enforced here by binding the subscription to
# current_user.account_id.
#
# Heartbeat: a `: keepalive\n\n` SSE comment every 25s keeps the connection
# alive through Railway / nginx default 60s idle timeouts. Reconnect policy
# is on the client (browser EventSource auto-reconnects; our fetch-based
# reader retries with backoff).
_HEARTBEAT_INTERVAL_S = 25
_QUEUE_GET_TIMEOUT_S = 25 # < heartbeat so heartbeat fires reliably
@queue_router.get("/escalations/stream")
async def stream_escalations(
request: Request,
current_user: Annotated[
User,
Depends(require_engineer_or_admin, scope="function"),
],
):
"""SSE stream of new escalation arrivals for the current user's account.
Role-gated to engineer/admin/owner so viewers can't subscribe (matches
the queue + claim role surface). One open connection per browser tab is
expected; the bus handles fan-out.
"""
if not current_user.account_id:
raise HTTPException(
status_code=status.HTTP_403_FORBIDDEN, detail="No account"
)
account_id = current_user.account_id
async def event_generator() -> AsyncGenerator[str, None]:
queue = await escalation_bus.subscribe(account_id)
try:
# Initial hello so the client knows the stream is live.
yield (
"event: ready\n"
f"data: {json.dumps({'account_id': str(account_id)})}\n\n"
)
while True:
if await request.is_disconnected():
break
try:
event = await asyncio.wait_for(
queue.get(), timeout=_QUEUE_GET_TIMEOUT_S
)
except asyncio.TimeoutError:
# Heartbeat keeps the connection alive through proxies.
yield ": keepalive\n\n"
continue
event_type = event.get("type", "message")
yield (
f"event: {event_type}\n"
f"data: {json.dumps(event)}\n\n"
)
finally:
await escalation_bus.unsubscribe(account_id, queue)
return StreamingResponse(
event_generator(),
media_type="text/event-stream",
headers={
"Cache-Control": "no-cache",
"Connection": "keep-alive",
"X-Accel-Buffering": "no",
},
)

View File

@@ -318,6 +318,11 @@ async def patch_suggested_fix_outcome(
status_code=status.HTTP_400_BAD_REQUEST,
detail="notes are required when outcome is applied_partial",
)
if body.outcome == "applied_pending" and not (body.notes and body.notes.strip()):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="notes are required when outcome is applied_pending",
)
TERMINAL = {"applied_success", "applied_failed", "dismissed"}
if fix.status in TERMINAL:
@@ -329,6 +334,10 @@ async def patch_suggested_fix_outcome(
fix.status = body.outcome
if body.outcome == "applied_partial":
fix.partial_notes = (body.notes or "").strip() or None
elif body.outcome == "applied_pending":
# Pending is parked, not terminal — keep applied_at, do NOT stamp
# verified_at. Reason explains what the engineer is waiting on.
fix.pending_reason = (body.notes or "").strip() or None
elif body.outcome == "applied_failed":
fix.failure_reason = (body.notes or "").strip() or None
fix.verified_at = now

View File

@@ -20,6 +20,7 @@ from app.core.audit import log_audit
from app.core.rate_limit import limiter
router = APIRouter(tags=["shares"])
public_router = APIRouter(tags=["shares"])
def build_share_response(share: SessionShare) -> ShareResponse:
@@ -206,7 +207,7 @@ async def _get_optional_user(request: Request, db: AsyncSession) -> Optional[Use
return None
@router.get("/share/{share_token}", response_model=SharePublicView)
@public_router.get("/share/{share_token}", response_model=SharePublicView)
@limiter.limit("30/minute")
async def access_share(
share_token: str,

View File

@@ -161,7 +161,7 @@ async def get_sidebar_stats(
select(func.count()).where(
and_(
esc_scope,
AISession.status == "requesting_escalation",
AISession.status.in_(("requesting_escalation", "escalated")),
)
)
)

View File

@@ -1,10 +1,10 @@
import logging
from fastapi import APIRouter, Request, HTTPException, status, Depends
from fastapi import APIRouter, Request, HTTPException, Depends
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.database import get_db
from app.core.admin_database import get_admin_db
from app.core.config import settings
from app.core.stripe_handlers import WEBHOOK_HANDLERS
from app.services.billing import BillingService
logger = logging.getLogger(__name__)
@@ -14,49 +14,36 @@ router = APIRouter(prefix="/webhooks", tags=["webhooks"])
@router.post("/stripe")
async def stripe_webhook(
request: Request,
db: AsyncSession = Depends(get_db),
db: AsyncSession = Depends(get_admin_db),
):
"""Handle Stripe webhook events.
"""Stripe webhook handler. Public endpoint; signature verification is the
only gate. Idempotency via stripe_events table.
Returns 200 for all events to prevent Stripe retries.
Actual processing happens only when Stripe is configured.
Returns 200 even when Stripe is not configured — keeps the receiver
permissive for local dev.
"""
if not settings.stripe_enabled:
if not settings.stripe_enabled or not settings.STRIPE_WEBHOOK_SECRET:
return {"status": "ok", "message": "Stripe not configured, event ignored"}
payload = await request.body()
sig_header = request.headers.get("stripe-signature")
if not sig_header:
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Missing stripe-signature header"
)
raise HTTPException(status_code=400, detail="Missing stripe-signature header")
# Verify webhook signature
try:
import stripe
stripe.api_key = settings.STRIPE_SECRET_KEY
event = stripe.Webhook.construct_event(
payload, sig_header, settings.STRIPE_WEBHOOK_SECRET
)
except ImportError:
logger.warning("stripe package not installed, cannot verify webhook")
return {"status": "ok", "message": "stripe package not installed"}
except Exception as e:
logger.error("Stripe webhook signature verification failed: %s", e)
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Invalid signature"
)
logger.warning("stripe webhook bad signature: %s", e)
raise HTTPException(status_code=400, detail="Invalid signature")
event_type = event.get("type", "")
handler = WEBHOOK_HANDLERS.get(event_type)
if handler:
try:
await handler(event, db)
except Exception:
logger.exception("Error handling Stripe event %s", event_type)
return {"status": "ok"}
applied = await BillingService.apply_subscription_event(
db,
event_id=event["id"],
event_type=event["type"],
payload={"data": event["data"]},
)
return {"status": "ok", "applied": applied}

View File

@@ -1,6 +1,10 @@
from fastapi import APIRouter, Depends
from app.api.deps import require_tenant_context
from app.api.deps import (
require_tenant_context,
require_active_subscription,
require_verified_email_after_grace,
)
from app.api.endpoints import (
admin,
admin_audit,
@@ -19,10 +23,13 @@ from app.api.endpoints import (
analytics,
assistant_chat,
auth,
billing,
beta_feedback,
beta_signup,
sales_leads,
branding,
categories,
config as config_endpoints,
copilot,
device_types,
draft_templates,
@@ -36,7 +43,9 @@ from app.api.endpoints import (
maintenance_schedules,
network_diagrams,
notifications,
oauth as oauth_endpoints,
onboarding,
plans_public,
public_templates,
ratings,
scripts,
@@ -62,6 +71,7 @@ from app.api.endpoints import (
uploads,
webhooks,
accounts,
account_invite_lookup,
)
api_router = APIRouter()
@@ -77,10 +87,18 @@ api_router = APIRouter()
# in Phase 1. This will need revisiting in Phase 2 when `users` gets RLS.
# ---------------------------------------------------------------------------
api_router.include_router(auth.router)
api_router.include_router(oauth_endpoints.router)
api_router.include_router(billing.router) # Reachable when subscription locked
api_router.include_router(shared.router) # Public share links (no auth)
api_router.include_router(shares.public_router) # Public session share links (optional auth)
api_router.include_router(beta_signup.router)
api_router.include_router(sales_leads.router) # Talk-to-Sales (no auth, rate-limited)
api_router.include_router(webhooks.router) # Stripe webhook receiver
api_router.include_router(public_templates.router) # Public gallery (no auth, rate-limited)
api_router.include_router(survey.router) # Public survey flow (no auth, rate-limited)
api_router.include_router(config_endpoints.router) # Public runtime feature flags
api_router.include_router(account_invite_lookup.router) # Public invite-code lookup for /accept-invite
api_router.include_router(plans_public.router) # Public plan catalog for /pricing page
# ---------------------------------------------------------------------------
# Admin endpoints — super_admin only
@@ -100,23 +118,36 @@ api_router.include_router(admin_survey.router)
api_router.include_router(admin_gallery.router)
# ---------------------------------------------------------------------------
# User-facing endpoints — tenant context required
#
# _tenant_deps: routers that only require an authenticated user inside a
# tenant (auth/account/admin/non-Pro feature surfaces).
# _pro_deps: routers gated behind an active Pro subscription. Adds
# require_active_subscription which raises 402 unless the
# account's Subscription is active/complimentary/past_due or
# trialing-with-time-remaining. Allowlisted paths in deps.py
# bypass the gate for billing/account admin/auth flows.
# ---------------------------------------------------------------------------
_tenant_deps = [Depends(require_tenant_context)]
_pro_deps = [
Depends(require_tenant_context),
Depends(require_active_subscription),
Depends(require_verified_email_after_grace),
]
api_router.include_router(trees.router, dependencies=_tenant_deps)
api_router.include_router(trees.router, dependencies=_pro_deps)
api_router.include_router(sidebar.router, dependencies=_tenant_deps)
api_router.include_router(sessions.router, dependencies=_tenant_deps)
api_router.include_router(sessions.router, dependencies=_pro_deps)
api_router.include_router(invite.router, dependencies=_tenant_deps)
api_router.include_router(categories.router, dependencies=_tenant_deps)
api_router.include_router(tags.router, dependencies=_tenant_deps)
api_router.include_router(folders.router, dependencies=_tenant_deps)
api_router.include_router(step_categories.router, dependencies=_tenant_deps)
api_router.include_router(steps.router, dependencies=_tenant_deps)
api_router.include_router(step_categories.router, dependencies=_pro_deps)
api_router.include_router(steps.router, dependencies=_pro_deps)
api_router.include_router(accounts.router, dependencies=_tenant_deps)
api_router.include_router(shares.router, dependencies=_tenant_deps)
api_router.include_router(tree_markdown.router, dependencies=_tenant_deps)
api_router.include_router(ratings.router, dependencies=_tenant_deps)
api_router.include_router(analytics.router, dependencies=_tenant_deps)
api_router.include_router(analytics.router, dependencies=_pro_deps)
api_router.include_router(target_lists.router, dependencies=_tenant_deps)
api_router.include_router(maintenance_schedules.router, dependencies=_tenant_deps)
api_router.include_router(feedback.router, dependencies=_tenant_deps)
@@ -124,32 +155,31 @@ api_router.include_router(ai_builder.router, dependencies=_tenant_deps)
api_router.include_router(ai_fix.router, dependencies=_tenant_deps)
api_router.include_router(ai_chat.router, dependencies=_tenant_deps)
api_router.include_router(copilot.router, dependencies=_tenant_deps)
api_router.include_router(assistant_chat.router, dependencies=_tenant_deps)
api_router.include_router(survey.router, dependencies=_tenant_deps)
api_router.include_router(assistant_chat.router, dependencies=_pro_deps)
api_router.include_router(tree_transfer.router, dependencies=_tenant_deps)
api_router.include_router(ai_suggestions.router, dependencies=_tenant_deps)
api_router.include_router(kb_accelerator.router, dependencies=_tenant_deps)
api_router.include_router(scripts.router, dependencies=_tenant_deps)
api_router.include_router(integrations.router, dependencies=_tenant_deps)
api_router.include_router(scripts.router, dependencies=_pro_deps)
api_router.include_router(integrations.router, dependencies=_pro_deps)
api_router.include_router(onboarding.router, dependencies=_tenant_deps)
api_router.include_router(branding.router, dependencies=_tenant_deps)
api_router.include_router(supporting_data.router, dependencies=_tenant_deps)
api_router.include_router(network_diagrams.router, dependencies=_tenant_deps)
# session_handoffs queue router must come before ai_sessions to avoid conflict
api_router.include_router(session_handoffs.queue_router, dependencies=_tenant_deps)
api_router.include_router(session_resolutions.router, dependencies=_tenant_deps)
api_router.include_router(session_handoffs.queue_router, dependencies=_pro_deps)
api_router.include_router(session_resolutions.router, dependencies=_pro_deps)
# session_facts mounts under /ai-sessions/{id}/facts — register before ai_sessions
# so the {session_id}/facts subpaths take precedence over any future generic catchalls.
api_router.include_router(session_facts.router, dependencies=_tenant_deps)
api_router.include_router(session_suggested_fixes.router, dependencies=_tenant_deps)
api_router.include_router(session_facts.router, dependencies=_pro_deps)
api_router.include_router(session_suggested_fixes.router, dependencies=_pro_deps)
api_router.include_router(draft_templates.router, dependencies=_tenant_deps)
api_router.include_router(ai_sessions.router, dependencies=_tenant_deps)
api_router.include_router(flow_proposals.router, dependencies=_tenant_deps)
api_router.include_router(flowpilot_analytics.router, dependencies=_tenant_deps)
api_router.include_router(ai_sessions.router, dependencies=_pro_deps)
api_router.include_router(flow_proposals.router, dependencies=_pro_deps)
api_router.include_router(flowpilot_analytics.router, dependencies=_pro_deps)
api_router.include_router(notifications.router, dependencies=_tenant_deps)
api_router.include_router(uploads.router, dependencies=_tenant_deps)
api_router.include_router(script_builder.router, dependencies=_tenant_deps)
api_router.include_router(script_builder.router, dependencies=_pro_deps)
api_router.include_router(beta_feedback.router, dependencies=_tenant_deps)
api_router.include_router(session_branches.router, dependencies=_tenant_deps)
api_router.include_router(session_handoffs.router, dependencies=_tenant_deps)
api_router.include_router(session_branches.router, dependencies=_pro_deps)
api_router.include_router(session_handoffs.router, dependencies=_pro_deps)
api_router.include_router(device_types.router, dependencies=_tenant_deps)

View File

@@ -84,6 +84,7 @@ class Settings(BaseSettings):
RESEND_API_KEY: Optional[str] = None
FROM_EMAIL: str = "ResolutionFlow <invites@resolutionflow.com>"
FEEDBACK_EMAIL: Optional[str] = None
SALES_LEAD_RECIPIENT_EMAIL: str = "sales@resolutionflow.com"
@property
def email_enabled(self) -> bool:
@@ -94,11 +95,12 @@ class Settings(BaseSettings):
STRIPE_SECRET_KEY: Optional[str] = None
STRIPE_PUBLISHABLE_KEY: Optional[str] = None
STRIPE_WEBHOOK_SECRET: Optional[str] = None
SELF_SERVE_ENABLED: bool = False
@property
def stripe_enabled(self) -> bool:
"""Check if Stripe is configured."""
return self.STRIPE_SECRET_KEY is not None and self.STRIPE_WEBHOOK_SECRET is not None
return bool(self.STRIPE_SECRET_KEY)
# AI Flow Builder
ANTHROPIC_API_KEY: Optional[str] = None
@@ -111,6 +113,16 @@ class Settings(BaseSettings):
GOOGLE_AI_API_KEY: Optional[str] = None
AI_MODEL_GEMINI: str = "gemini-2.5-flash"
AI_MODEL_ANTHROPIC: str = "claude-sonnet-4-6"
# Bound for the diagnostic assessment Sonnet call. Generation runs in a
# FastAPI BackgroundTask (commit e8ba74e), so this no longer blocks the
# senior's click — only how long we wait before publishing
# `handoff_assessment_ready` with has_assessment=false. 15s was hitting
# tail latency on Sonnet (timeout 03:57:35 in field testing 2026-04-29),
# leaving the magic-moment placeholder permanent. 45s is the right
# ceiling: well above Sonnet p99 for a 500-token output, far enough
# below "the senior gives up watching" that we still surface SOMETHING
# on persistent slowness.
ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS: int = 45
# Model tier routing — maps action types to model tiers
AI_MODEL_TIERS: dict[str, str] = {
@@ -183,6 +195,13 @@ class Settings(BaseSettings):
"""Check if ConnectWise integration is configured."""
return self.CW_CLIENT_ID is not None
# OAuth providers (self-serve signup)
GOOGLE_CLIENT_ID: Optional[str] = None
GOOGLE_CLIENT_SECRET: Optional[str] = None
MS_CLIENT_ID: Optional[str] = None
MS_CLIENT_SECRET: Optional[str] = None
OAUTH_REDIRECT_BASE: str = "http://localhost:5173"
# Monitoring
SENTRY_DSN: Optional[str] = None

View File

@@ -1,6 +1,11 @@
import logging
from typing import TYPE_CHECKING
from app.core.config import settings
if TYPE_CHECKING:
from app.models.sales_lead import SalesLead
logger = logging.getLogger(__name__)
@@ -484,6 +489,99 @@ class EmailService:
logger.exception("Failed to send beta signup notification for %s", signup_email)
return False
@staticmethod
async def send_sales_lead_notification(
to_email: str,
lead: "SalesLead",
) -> bool:
"""Notify the sales recipient about a new Talk-to-Sales submission.
Fire-and-forget. Returns False (and logs) on any failure; never raises.
"""
if not settings.email_enabled:
logger.warning(
"Sales lead email not sent — RESEND_API_KEY not configured (lead %s)",
lead.id,
)
return False
try:
import resend
import html as html_mod
from datetime import datetime, timezone
resend.api_key = settings.RESEND_API_KEY
date_str = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
safe_email = html_mod.escape(lead.email)
safe_name = html_mod.escape(lead.name)
safe_company = html_mod.escape(lead.company)
safe_team_size = html_mod.escape(lead.team_size or "")
safe_source = html_mod.escape(lead.source)
safe_message = html_mod.escape(lead.message or "(no message)")
subject = f"[ResolutionFlow Sales] New lead — {safe_company} ({safe_email})"
email_html = f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><meta name="viewport" content="width=device-width"></head>
<body style="margin:0;padding:0;background:#101114;font-family:'Inter',Helvetica,Arial,sans-serif;">
<table width="100%" cellpadding="0" cellspacing="0" style="background:#101114;padding:40px 0;">
<tr><td align="center">
<table width="560" cellpadding="0" cellspacing="0" style="background:#14161a;border:1px solid rgba(255,255,255,0.06);border-radius:16px;">
<tr><td style="padding:40px 40px 24px;text-align:center;">
<h1 style="margin:0;color:#f8fafc;font-size:24px;font-weight:600;">Resolution<span style="color:#06b6d4;">Flow</span></h1>
<p style="margin:8px 0 0;color:#5a6170;font-size:14px;">New Sales Lead</p>
</td></tr>
<tr><td style="padding:0 40px 16px;">
<p style="margin:0;color:#8891a0;font-size:16px;line-height:1.6;">
Source: <strong style="color:#f8fafc;">{safe_source}</strong>
</p>
</td></tr>
<tr><td style="padding:0 40px 16px;">
<table width="100%" cellpadding="0" cellspacing="0" style="background:rgba(0,0,0,0.3);border:1px solid rgba(255,255,255,0.06);border-radius:12px;">
<tr><td style="padding:16px;">
<p style="margin:0 0 4px;color:#5a6170;font-size:12px;text-transform:uppercase;letter-spacing:1px;">Name</p>
<p style="margin:0 0 12px;color:#f8fafc;font-size:16px;font-weight:600;">{safe_name}</p>
<p style="margin:0 0 4px;color:#5a6170;font-size:12px;text-transform:uppercase;letter-spacing:1px;">Email</p>
<p style="margin:0 0 12px;color:#22d3ee;font-size:16px;font-weight:600;">{safe_email}</p>
<p style="margin:0 0 4px;color:#5a6170;font-size:12px;text-transform:uppercase;letter-spacing:1px;">Company</p>
<p style="margin:0 0 12px;color:#f8fafc;font-size:16px;font-weight:600;">{safe_company}</p>
<p style="margin:0 0 4px;color:#5a6170;font-size:12px;text-transform:uppercase;letter-spacing:1px;">Team Size</p>
<p style="margin:0;color:#f8fafc;font-size:16px;font-weight:600;">{safe_team_size}</p>
</td></tr>
</table>
</td></tr>
<tr><td style="padding:0 40px 16px;">
<p style="margin:0 0 4px;color:#5a6170;font-size:12px;text-transform:uppercase;letter-spacing:1px;">Message</p>
<p style="margin:0;color:#8891a0;font-size:14px;line-height:1.6;white-space:pre-wrap;">{safe_message}</p>
</td></tr>
<tr><td style="padding:0 40px 32px;">
<p style="margin:0;color:#5a6170;font-size:12px;text-align:center;">
Submitted at {date_str} · Lead ID: {lead.id}
</p>
</td></tr>
</table>
</td></tr>
</table>
</body></html>"""
resend.Emails.send({
"from": settings.FROM_EMAIL,
"to": [to_email],
"reply_to": lead.email,
"subject": subject,
"html": email_html,
})
logger.info("Sales lead notification sent for %s (lead %s)", lead.email, lead.id)
return True
except Exception:
logger.exception(
"Failed to send sales lead notification for %s (lead %s)",
lead.email,
lead.id,
)
return False
@staticmethod
async def send_notification_email(
to_email: str,

View File

@@ -0,0 +1,105 @@
"""In-memory pub/sub bus for live escalation events.
Single-process, non-durable. When a handoff fires, every connected SSE
subscriber for the same `account_id` receives the event. Subscribers come
and go as senior techs open and close the EscalationQueue page.
Pre-PMF scale (3 pilots × 5-20 techs/MSP = ~15-60 concurrent subscribers
total, single Railway replica) makes in-memory the right call. When the
deployment scales horizontally, swap this for Redis pub/sub or similar —
the public surface (`publish` / `subscribe`) is intentionally narrow so
the swap is local.
Events are JSON-serializable dicts. `publish()` is non-blocking (drops the
event if a subscriber's queue is full rather than back-pressuring the
caller). `subscribe()` MUST be paired with `unsubscribe()` in a finally
block, or you leak queues.
"""
from __future__ import annotations
import asyncio
import logging
from typing import Any
from uuid import UUID
logger = logging.getLogger(__name__)
# Bound how many unconsumed events can sit in a subscriber's queue before
# we start dropping. 64 is generous for the queue-page use case; if a
# subscriber is that far behind, they're probably gone or stuck.
_QUEUE_MAXSIZE = 64
class EscalationBus:
"""Account-scoped pub/sub for escalation arrival events."""
def __init__(self) -> None:
self._subscribers: dict[UUID, set[asyncio.Queue[dict[str, Any]]]] = {}
self._lock = asyncio.Lock()
@staticmethod
def _normalize_account_id(account_id: UUID | str) -> UUID:
return account_id if isinstance(account_id, UUID) else UUID(str(account_id))
async def subscribe(self, account_id: UUID | str) -> asyncio.Queue[dict[str, Any]]:
"""Register a new subscriber queue for an account.
Caller must invoke `unsubscribe(account_id, queue)` when the
consumer disconnects.
"""
normalized_account_id = self._normalize_account_id(account_id)
queue: asyncio.Queue[dict[str, Any]] = asyncio.Queue(
maxsize=_QUEUE_MAXSIZE
)
async with self._lock:
self._subscribers.setdefault(normalized_account_id, set()).add(queue)
return queue
async def unsubscribe(
self, account_id: UUID | str, queue: asyncio.Queue[dict[str, Any]]
) -> None:
normalized_account_id = self._normalize_account_id(account_id)
async with self._lock:
subs = self._subscribers.get(normalized_account_id)
if subs is None:
return
subs.discard(queue)
if not subs:
self._subscribers.pop(normalized_account_id, None)
async def publish(self, account_id: UUID | str, event: dict[str, Any]) -> int:
"""Fan event out to every subscriber for `account_id`.
Returns the number of subscribers that successfully received the
event. Drops the event for any subscriber whose queue is full
(logs at warning level).
"""
normalized_account_id = self._normalize_account_id(account_id)
async with self._lock:
subs = list(self._subscribers.get(normalized_account_id, ()))
if not subs:
return 0
delivered = 0
for queue in subs:
try:
queue.put_nowait(event)
delivered += 1
except asyncio.QueueFull:
logger.warning(
"EscalationBus: dropped event for full subscriber queue "
"(account_id=%s, event=%s)",
normalized_account_id,
event.get("type", "?"),
)
return delivered
def subscriber_count(self, account_id: UUID | str) -> int:
"""Diagnostic — number of active subscribers for an account."""
normalized_account_id = self._normalize_account_id(account_id)
return len(self._subscribers.get(normalized_account_id, ()))
# Module-level singleton. FastAPI imports this; `subscribe()` and `publish()`
# are coroutine-safe via the internal Lock.
bus = EscalationBus()

View File

@@ -62,6 +62,10 @@ from .session_fact import SessionFact
from .session_suggested_fix import SessionSuggestedFix
from .draft_template import DraftTemplate
from .account_settings import AccountSettings
from .oauth_identity import OAuthIdentity # noqa: F401
from .plan_billing import PlanBilling # noqa: F401
from .sales_lead import SalesLead # noqa: F401
from .stripe_event import StripeEvent # noqa: F401
__all__ = [
"User",
@@ -138,4 +142,8 @@ __all__ = [
"SessionSuggestedFix",
"DraftTemplate",
"AccountSettings",
"OAuthIdentity",
"PlanBilling",
"SalesLead",
"StripeEvent",
]

View File

@@ -48,6 +48,8 @@ class Account(Base):
branding_logo_url: Mapped[Optional[str]] = mapped_column(String(500), nullable=True)
branding_primary_color: Mapped[Optional[str]] = mapped_column(String(7), nullable=True) # hex like #06b6d4
branding_company_name: Mapped[Optional[str]] = mapped_column(String(200), nullable=True)
team_size_bucket: Mapped[Optional[str]] = mapped_column(String(20), nullable=True)
primary_psa: Mapped[Optional[str]] = mapped_column(String(20), nullable=True)
# SSO / SAML groundwork (Task 11)
sso_enabled: Mapped[bool] = mapped_column(Boolean, default=False, server_default="false")

View File

@@ -27,6 +27,8 @@ class AccountInvite(Base):
expires_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
used_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
revoked_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
email_sent_at: Mapped[Optional[datetime]] = mapped_column(DateTime(timezone=True), nullable=True)
# Relationships
account: Mapped["Account"] = relationship("Account")
@@ -37,6 +39,10 @@ class AccountInvite(Base):
def is_used(self) -> bool:
return self.accepted_by_id is not None
@property
def is_revoked(self) -> bool:
return self.revoked_at is not None
@property
def is_expired(self) -> bool:
if self.expires_at is None:
@@ -45,4 +51,4 @@ class AccountInvite(Base):
@property
def is_valid(self) -> bool:
return not self.is_used and not self.is_expired
return not self.is_used and not self.is_expired and not self.is_revoked

View File

@@ -10,7 +10,7 @@ from typing import Optional, Any, TYPE_CHECKING
from sqlalchemy import String, Text, DateTime, ForeignKey, Boolean, Integer, Float, CheckConstraint
import sqlalchemy as sa
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.dialects.postgresql import UUID, JSONB
from sqlalchemy.dialects.postgresql import UUID, JSONB, TSVECTOR
from app.core.database import Base
@@ -46,6 +46,7 @@ class AISession(Base):
"confidence_tier IN ('guided', 'exploring', 'discovery')",
name="ck_ai_sessions_confidence_tier",
),
sa.Index("idx_ai_sessions_search", "search_vector", postgresql_using="gin"),
)
id: Mapped[uuid.UUID] = mapped_column(
@@ -150,6 +151,18 @@ class AISession(Base):
Text, nullable=True,
comment="Why escalated (set on escalation)",
)
search_vector: Mapped[Optional[str]] = mapped_column(
TSVECTOR,
sa.Computed(
"to_tsvector('english', "
"coalesce(problem_summary, '') || ' ' || "
"coalesce(resolution_summary, '') || ' ' || "
"coalesce(escalation_reason, '') || ' ' || "
"coalesce(problem_domain, ''))",
persisted=True,
),
nullable=True,
)
escalation_package: Mapped[Optional[dict[str, Any]]] = mapped_column(
JSONB, nullable=True,
comment="Context package for receiving engineer: steps_tried, hypotheses, suggestions",

View File

@@ -0,0 +1,36 @@
import uuid
from datetime import datetime, timezone
from typing import TYPE_CHECKING
from sqlalchemy import String, DateTime, ForeignKey, UniqueConstraint, Index
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.dialects.postgresql import UUID
from app.core.database import Base
if TYPE_CHECKING:
from app.models.user import User
class OAuthIdentity(Base):
__tablename__ = "oauth_identities"
__table_args__ = (
UniqueConstraint("provider", "provider_subject", name="uq_oauth_identities_provider_subject"),
Index("ix_oauth_identities_user_id", "user_id"),
)
id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
user_id: Mapped[uuid.UUID] = mapped_column(
UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False
)
provider: Mapped[str] = mapped_column(String(20), nullable=False)
provider_subject: Mapped[str] = mapped_column(String(255), nullable=False)
provider_email_at_link: Mapped[str] = mapped_column(String(255), nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True),
default=lambda: datetime.now(timezone.utc),
onupdate=lambda: datetime.now(timezone.utc),
)
user: Mapped["User"] = relationship("User", backref="oauth_identities")

View File

@@ -0,0 +1,31 @@
from datetime import datetime, timezone
from typing import Optional
from sqlalchemy import String, Integer, Boolean, DateTime, ForeignKey, Text
from sqlalchemy.orm import Mapped, mapped_column
from app.core.database import Base
class PlanBilling(Base):
__tablename__ = "plan_billing"
plan: Mapped[str] = mapped_column(
String(50), ForeignKey("plan_limits.plan"), primary_key=True
)
display_name: Mapped[str] = mapped_column(String(255), nullable=False)
description: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
monthly_price_cents: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
annual_price_cents: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
stripe_product_id: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
stripe_monthly_price_id: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
stripe_annual_price_id: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
is_public: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
is_archived: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
sort_order: Mapped[int] = mapped_column(Integer, nullable=False, default=0)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True),
default=lambda: datetime.now(timezone.utc),
onupdate=lambda: datetime.now(timezone.utc),
)

View File

@@ -0,0 +1,28 @@
import uuid
from datetime import datetime, timezone
from typing import Optional
from sqlalchemy import String, DateTime, Text, Index
from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy.dialects.postgresql import UUID
from app.core.database import Base
class SalesLead(Base):
__tablename__ = "sales_leads"
__table_args__ = (Index("ix_sales_leads_email", "email"),)
id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
email: Mapped[str] = mapped_column(String(255), nullable=False)
name: Mapped[str] = mapped_column(String(255), nullable=False)
company: Mapped[str] = mapped_column(String(255), nullable=False)
team_size: Mapped[Optional[str]] = mapped_column(String(20), nullable=True)
message: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
source: Mapped[str] = mapped_column(String(50), nullable=False)
posthog_distinct_id: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
status: Mapped[str] = mapped_column(String(20), nullable=False, default="new")
created_at: Mapped[datetime] = mapped_column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True),
default=lambda: datetime.now(timezone.utc),
onupdate=lambda: datetime.now(timezone.utc),
)

View File

@@ -37,7 +37,7 @@ class SessionSuggestedFix(Base):
),
CheckConstraint(
"status IN ('proposed', 'applied_success', 'applied_failed', "
"'applied_partial', 'dismissed')",
"'applied_partial', 'applied_pending', 'dismissed')",
name="ck_session_suggested_fixes_status",
),
)
@@ -81,6 +81,7 @@ class SessionSuggestedFix(Base):
DateTime(timezone=True), nullable=True
)
partial_notes: Mapped[str | None] = mapped_column(Text, nullable=True)
pending_reason: Mapped[str | None] = mapped_column(Text, nullable=True)
failure_reason: Mapped[str | None] = mapped_column(Text, nullable=True)
ai_outcome_proposal: Mapped[dict[str, Any] | None] = mapped_column(
JSONB, nullable=True

View File

@@ -0,0 +1,17 @@
from datetime import datetime, timezone
from sqlalchemy import String, DateTime, Index
from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy.dialects.postgresql import JSONB
from app.core.database import Base
class StripeEvent(Base):
__tablename__ = "stripe_events"
__table_args__ = (Index("ix_stripe_events_event_type", "event_type"),)
id: Mapped[str] = mapped_column(String(255), primary_key=True) # Stripe event id
event_type: Mapped[str] = mapped_column(String(100), nullable=False)
processed_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
)
payload_excerpt: Mapped[dict] = mapped_column(JSONB, nullable=False, default=dict)

View File

@@ -32,8 +32,20 @@ class Subscription(Base):
@property
def is_active(self) -> bool:
return self.status in ("active", "trialing")
return self.status in ("active", "trialing", "complimentary")
@property
def is_paid(self) -> bool:
return self.plan in ("pro", "team")
# Excludes complimentary and trialing so MRR/paid-customer metrics aren't inflated.
return self.plan in ("pro", "team") and self.status not in ("complimentary", "trialing")
@property
def has_pro_entitlement(self) -> bool:
"""True if the account can access Pro features right now."""
if self.plan in ("pro", "team"):
if self.status in ("active", "complimentary"):
return True
if self.status == "trialing" and self.current_period_end is not None:
from datetime import datetime, timezone
return self.current_period_end > datetime.now(timezone.utc)
return False

View File

@@ -1,7 +1,7 @@
import uuid
from datetime import datetime, timezone
from typing import Optional, TYPE_CHECKING
from sqlalchemy import String, DateTime, ForeignKey, Boolean, CheckConstraint, Text
from sqlalchemy import String, DateTime, ForeignKey, Boolean, CheckConstraint, Text, Integer
from sqlalchemy.orm import Mapped, mapped_column, relationship
from sqlalchemy.dialects.postgresql import UUID
from app.core.database import Base
@@ -33,7 +33,7 @@ class User(Base):
default=uuid.uuid4
)
email: Mapped[str] = mapped_column(String(255), unique=True, nullable=False, index=True)
password_hash: Mapped[str] = mapped_column(String(255), nullable=False)
password_hash: Mapped[Optional[str]] = mapped_column(String(255), nullable=True)
name: Mapped[str] = mapped_column(String(255), nullable=False)
role: Mapped[str] = mapped_column(String(50), nullable=False, default="engineer")
is_super_admin: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
@@ -76,6 +76,8 @@ class User(Base):
# Onboarding
onboarding_dismissed: Mapped[bool] = mapped_column(Boolean, default=False, nullable=False, server_default="false")
role_at_signup: Mapped[Optional[str]] = mapped_column(String(50), nullable=True)
onboarding_step_completed: Mapped[Optional[int]] = mapped_column(Integer, nullable=True)
# Branding (solo pros without a team)
logo_data: Mapped[Optional[str]] = mapped_column(Text, nullable=True)

View File

@@ -42,3 +42,12 @@ class AccountInviteResponse(BaseModel):
used_at: Optional[datetime] = None
model_config = {"from_attributes": True}
class AccountInviteBulkCreate(BaseModel):
invites: list[AccountInviteCreate]
class AccountInviteBulkResponse(BaseModel):
created: list[AccountInviteResponse]
failed: list[dict] # entries shaped {"email": str, "error": str}

View File

@@ -172,6 +172,21 @@ class PlanLimitResponse(BaseModel):
from_attributes = True
class PlanLimitWithBillingResponse(PlanLimitResponse):
"""PlanLimits + plan_billing fields merged. Billing fields are None when no
plan_billing row exists for the plan yet."""
display_name: Optional[str] = None
description: Optional[str] = None
monthly_price_cents: Optional[int] = None
annual_price_cents: Optional[int] = None
stripe_product_id: Optional[str] = None
stripe_monthly_price_id: Optional[str] = None
stripe_annual_price_id: Optional[str] = None
is_public: Optional[bool] = None
is_archived: Optional[bool] = None
sort_order: Optional[int] = None
class PlanLimitUpdate(BaseModel):
plan: str
max_trees: Optional[int] = None
@@ -180,6 +195,19 @@ class PlanLimitUpdate(BaseModel):
custom_branding: bool = False
priority_support: bool = False
export_formats: list = Field(default_factory=lambda: ["markdown", "text"])
# plan_billing fields — all optional, partial-update semantics. If any are
# set in the body, the admin endpoint upserts the plan_billing row in the
# same transaction.
display_name: Optional[str] = None
description: Optional[str] = None
monthly_price_cents: Optional[int] = None
annual_price_cents: Optional[int] = None
stripe_product_id: Optional[str] = None
stripe_monthly_price_id: Optional[str] = None
stripe_annual_price_id: Optional[str] = None
is_public: Optional[bool] = None
is_archived: Optional[bool] = None
sort_order: Optional[int] = None
class AccountOverrideCreate(BaseModel):

View File

@@ -0,0 +1,64 @@
from typing import Literal, Optional, Dict, Any
from datetime import datetime
from pydantic import BaseModel
class CheckoutSessionCreate(BaseModel):
plan: Literal["pro", "starter", "team", "enterprise"]
seats: int
billing_interval: Literal["monthly", "annual"] = "monthly"
class CheckoutSessionResponse(BaseModel):
url: str
class BillingPortalSessionResponse(BaseModel):
url: str
class SubscriptionState(BaseModel):
status: str
plan: str
current_period_start: Optional[datetime]
current_period_end: Optional[datetime]
cancel_at_period_end: bool
seat_limit: Optional[int]
has_pro_entitlement: bool
is_paid: bool
class PlanBillingState(BaseModel):
display_name: str
description: Optional[str] = None
monthly_price_cents: Optional[int] = None
annual_price_cents: Optional[int] = None
model_config = {"from_attributes": True}
class BillingStateResponse(BaseModel):
subscription: SubscriptionState
plan_billing: Optional[PlanBillingState]
plan_limits: Dict[str, Any]
enabled_features: Dict[str, bool]
class PublicPlanResponse(BaseModel):
"""Public-safe view of a billable plan, used by the marketing /pricing page.
Sourced from `plan_billing` joined with `plan_limits.max_users` (exposed
here as `max_seats`). Always filtered server-side to is_public=True and
is_archived=False, so `is_public` is a constant True for any row returned
here — included for clarity and forward compatibility.
"""
plan: str
display_name: str
description: Optional[str] = None
monthly_price_cents: Optional[int] = None
annual_price_cents: Optional[int] = None
max_seats: Optional[int] = None
sort_order: int
is_public: bool = True
model_config = {"from_attributes": True}

View File

@@ -0,0 +1,18 @@
"""Pydantic schemas for public runtime configuration."""
from __future__ import annotations
from typing import List
from pydantic import BaseModel
class PublicConfigResponse(BaseModel):
"""Runtime feature flags + OAuth provider list exposed to anonymous clients.
Read once by the frontend at app load to decide whether to render the
self-serve signup flow and which OAuth buttons to show.
"""
self_serve_enabled: bool
oauth_providers: List[str]

View File

@@ -124,3 +124,26 @@ class FlowPilotDashboard(BaseModel):
confidence_breakdown: ConfidenceBreakdown
knowledge_coverage: KnowledgeCoverage
psa_metrics: PsaMetrics | None = None
class EscalationMetrics(BaseModel):
"""In-product time-to-first-action metric for the Escalation Mode wedge.
NOTE: this is the *in-product* metric (post-claim time-to-first-action). The
"minutes recovered" sales claim requires a manual baseline measurement of the
pre-Escalation-Mode verbal-handoff time. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md for the two-metric
framing — do not roll this number alone into "minutes recovered."
"""
period: str
n_handoffs_claimed: int
n_handoffs_with_action: int
avg_seconds_to_first_action: float | None = None
median_seconds_to_first_action: float | None = None
p95_seconds_to_first_action: float | None = None
metric_definition: str = (
"elapsed_seconds(first ai_session_step in session where "
"created_at > SessionHandoff.claimed_at) — measures post-claim activity "
"lag, NOT verbal-handoff savings. Pair with manual baseline."
)

View File

@@ -0,0 +1,32 @@
from pydantic import BaseModel
class OAuthCallbackPayload(BaseModel):
code: str
state: str | None = None
# When the OAuth flow originated from /accept-invite, the frontend round-trips
# the invite code + invited email so the backend can link the new user to the
# invited account instead of creating a personal one.
account_invite_code: str | None = None
invited_email: str | None = None
class OAuthCallbackResponse(BaseModel):
access_token: str
refresh_token: str
token_type: str = "bearer"
is_new_user: bool
class InviteLookupResponse(BaseModel):
"""Public response surface for GET /accounts/invites/{code}/lookup.
Returns the minimum context needed for the AcceptInvitePage:
account name (so we can title the card), inviter name (for the resend
fallback message), invited email (locked into the form), and role.
"""
account_name: str
inviter_name: str
invited_email: str
role: str

View File

@@ -1,12 +1,55 @@
from pydantic import BaseModel
from typing import Literal, Optional
from pydantic import BaseModel, Field
class OnboardingStatus(BaseModel):
created_flow: bool
ran_session: bool
exported_session: bool
# Kept for backward-compat during deploy; new code paths should not branch on this.
tried_ai_assistant: bool
invited_teammate: bool
connected_psa: bool
is_team_user: bool
dismissed: bool
# New (Phase 2 — Task 41) — drive the unified next-step card + checklist.
email_verified: bool
shop_setup_done: bool
# --- Welcome wizard (Phase 2) ----------------------------------------------
TeamSizeBucket = Literal["1-2", "3-5", "6-10", "11-25", "26+"]
RoleAtSignup = Literal["owner", "lead_tech", "tech", "other"]
PrimaryPsa = Literal["connectwise", "autotask", "halopsa", "none"]
WizardStep = Literal[1, 2, 3]
WizardAction = Literal["complete", "skip"]
class OnboardingStepData(BaseModel):
"""Optional payload carried with `action="complete"` for steps 1 and 2.
Step 1 fields: company_name, team_size_bucket, role_at_signup
Step 2 fields: primary_psa
Step 3 has no data (invitations posted separately).
"""
# Step 1
company_name: Optional[str] = Field(default=None, max_length=255)
team_size_bucket: Optional[TeamSizeBucket] = None
role_at_signup: Optional[RoleAtSignup] = None
# Step 2
primary_psa: Optional[PrimaryPsa] = None
class OnboardingStepRequest(BaseModel):
step: WizardStep
action: WizardAction
data: Optional[OnboardingStepData] = None
class OnboardingStepResponse(BaseModel):
onboarding_step_completed: Optional[int]
onboarding_dismissed: bool

View File

@@ -53,9 +53,13 @@ class PSATicketSearchResult(BaseModel):
id: str
summary: str
company_name: str | None = None
company_id: str | None = None
board_name: str | None = None
board_id: int | None = None
status_name: str | None = None
status_id: int | None = None
priority_name: str | None = None
priority_id: int | None = None
closed: bool = False

View File

@@ -0,0 +1,65 @@
"""Normalized DTOs for ticket management endpoints."""
from __future__ import annotations
from pydantic import BaseModel
class PSAResourceSchema(BaseModel):
member_id: int
member_name: str
member_identifier: str
is_rf_user: bool = False
class PSATicketCreatedSchema(BaseModel):
id: int
summary: str
board_name: str
status_name: str
priority_name: str
company_name: str
resources: list[PSAResourceSchema] = []
class PSATicketStatusUpdateSchema(BaseModel):
ticket_id: int
previous_status: str
new_status: str
new_status_id: int
class TicketCreatePayloadSchema(BaseModel):
summary: str
company_id: int
board_id: int
status_id: int
priority_id: int
description: str | None = None
assigned_member_id: int | None = None
class TicketListResponseSchema(BaseModel):
items: list = []
total: int = 0
page: int = 1
page_size: int = 25
class AiParseRequestSchema(BaseModel):
prompt: str
class AiParseResponseSchema(BaseModel):
summary: str | None = None
company_id: int | None = None
board_id: int | None = None
priority_id: int | None = None
status_id: int | None = None
assigned_member_id: int | None = None
description: str | None = None
missing_fields: list[str] = []
warnings: list[str] = []
class PSAPrioritySchema(BaseModel):
id: int
name: str

View File

@@ -0,0 +1,27 @@
"""Pydantic schemas for Talk-to-Sales submissions."""
from typing import Literal, Optional
from uuid import UUID
from pydantic import BaseModel, ConfigDict, EmailStr, Field
SalesLeadSource = Literal["pricing_page", "register_footer", "landing_page"]
class SalesLeadCreate(BaseModel):
"""Public Talk-to-Sales form submission."""
model_config = ConfigDict(str_strip_whitespace=True)
email: EmailStr
name: str = Field(..., min_length=1, max_length=255)
company: str = Field(..., min_length=1, max_length=255)
team_size: Optional[str] = Field(default=None, max_length=20)
message: Optional[str] = Field(default=None, max_length=5000)
source: SalesLeadSource
posthog_distinct_id: Optional[str] = Field(default=None, max_length=255)
class SalesLeadCreateResponse(BaseModel):
id: UUID
status: Literal["received"] = "received"

View File

@@ -10,12 +10,18 @@ class HandoffCreateRequest(BaseModel):
intent: str = Field(..., pattern="^(park|escalate)$")
engineer_notes: str | None = None
priority: str = Field("normal", pattern="^(normal|elevated)$")
# Optional escalation target — if set, only this user is the named
# recipient. Notification dispatch fans out to all engineer/admin/owner
# users in the account either way; this just records the original
# engineer's preferred recipient on the session for audit/UX.
target_user_id: UUID | None = None
class HandoffResponse(BaseModel):
id: UUID
session_id: UUID
handed_off_by: UUID
handed_off_by_name: str | None = None
intent: str
source_branch_id: UUID | None
snapshot: dict[str, Any]

View File

@@ -20,6 +20,7 @@ FixStatus = Literal[
"applied_success",
"applied_failed",
"applied_partial",
"applied_pending",
"dismissed",
]
@@ -40,6 +41,7 @@ class SessionSuggestedFixResponse(BaseModel):
applied_at: datetime | None
verified_at: datetime | None
partial_notes: str | None
pending_reason: str | None
failure_reason: str | None
ai_outcome_proposal: dict[str, Any] | None
@@ -91,7 +93,11 @@ class SessionSuggestedFixDecisionResponse(BaseModel):
# Subset of FixStatus that the engineer can set via the outcome endpoint —
# `proposed` is excluded because you can't un-decide a fix back to "proposed".
FixOutcome = Literal[
"applied_success", "applied_failed", "applied_partial", "dismissed"
"applied_success",
"applied_failed",
"applied_partial",
"applied_pending",
"dismissed",
]
@@ -103,14 +109,18 @@ class SessionSuggestedFixOutcomeRequest(BaseModel):
engineer took); outcome captures whether the fix actually worked.
Allowed transitions:
- from `proposed` or `applied_partial`: any outcome is valid
(partial is parked, not terminal — the engineer may update notes,
abandon via dismiss, or advance to success/failed)
- from `proposed`, `applied_partial`, or `applied_pending`: any outcome
is valid. Partial means "did some of it"; pending means "did all of
it but verification is deferred (waiting on client, async sync, etc)".
Both are parked, not terminal — the engineer may advance them to
success/failed/dismiss.
- from any terminal outcome (`applied_success`, `applied_failed`,
`dismissed`): server returns 409
"""
outcome: FixOutcome
# Required for applied_partial, optional for applied_failed, ignored otherwise.
# Required for applied_partial AND applied_pending; optional for
# applied_failed; ignored otherwise. For pending, this is the
# "what are you waiting on?" reason (e.g. "client power-cycling router").
notes: str | None = Field(None, max_length=500)

View File

@@ -58,6 +58,8 @@ class UserResponse(UserBase):
timezone: str = "UTC"
avatar_url: Optional[str] = None
email_verified_at: Optional[datetime] = None
onboarding_step_completed: Optional[int] = None
onboarding_dismissed: bool = False
class Config:
from_attributes = True
@@ -68,4 +70,6 @@ class RoleUpdate(BaseModel):
class AccountRoleUpdate(BaseModel):
account_role: str = Field(..., pattern="^(owner|admin|engineer|viewer)$")
# Ownership changes must go through the explicit transfer-ownership flow so
# account.owner_id stays consistent with user.account_role.
account_role: str = Field(..., pattern="^(admin|engineer|viewer)$")

View File

@@ -295,6 +295,24 @@ To create a fork, append this marker AFTER your [QUESTIONS]/[ACTIONS] markers:
- If a question is clearly outside your domain, say so briefly and redirect.
- Never fabricate error codes, KB article numbers, or CLI flags. If unsure, say so.
## SPIN-OFF TICKET CREATION
When you identify a second distinct issue that is clearly separate from the primary topic \
of this session, suggest creating a spin-off ticket using the [ACTIONS] marker below. \
Use this sparingly — only when the issue is genuinely independent, not for every tangential mention.
Use `create_spin_off_ticket` as the command value for this action.
Format:
[ACTIONS]
[
{
"label": "Create ticket: <brief issue title>",
"command": "<spin-off ticket action command>",
"description": "<one sentence description of the separate issue>"
}
]
[/ACTIONS]
## FINAL REMINDER — THIS OVERRIDES EVERYTHING ABOVE
Every single response MUST contain [QUESTIONS] and/or [ACTIONS] markers with valid JSON. \
No exceptions. Not even when forking. A response without at least one of these markers \

View File

@@ -0,0 +1,356 @@
"""Single billing service module. Stripe is the only impl — no provider
abstraction. Account row is canonical local state; Stripe is canonical
remote state; the webhook handler bridges the two."""
import logging
from datetime import datetime, timezone, timedelta
import stripe
from sqlalchemy import select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.ext.asyncio import AsyncSession
from app.core.config import settings
from app.models.account import Account
from app.models.plan_billing import PlanBilling
from app.models.stripe_event import StripeEvent
from app.models.subscription import Subscription
TRIAL_DAYS = 14
logger = logging.getLogger(__name__)
class BillingService:
@staticmethod
async def invalidate_billing_cache(account_ids) -> None:
"""No-op stub for future in-process billing cache invalidation.
Today there is no `app.state.billing_cache` — `BillingService.get_billing_state`
always reads fresh from the DB. Call sites that mutate plan/feature data
invoke this hook so that wiring is in place when an in-process cache is
added later. Until then, this just logs.
TODO: when an in-process billing cache (e.g. `app.state.billing_cache`)
is introduced, evict entries for the given account_ids here.
"""
try:
count = len(list(account_ids))
except TypeError:
count = -1
logger.debug(
"BillingService.invalidate_billing_cache called for %d account(s) "
"(no-op stub — wire to app.state.billing_cache when added)",
count,
)
@staticmethod
async def start_trial(db: AsyncSession, account_id) -> Subscription:
"""Idempotent. Creates a trialing Subscription on Pro for the account if
one doesn't exist; otherwise returns the existing row."""
result = await db.execute(
select(Subscription).where(Subscription.account_id == account_id)
)
existing = result.scalar_one_or_none()
if existing is not None:
return existing
sub = Subscription(
account_id=account_id,
plan="pro",
status="trialing",
current_period_start=datetime.now(timezone.utc),
current_period_end=datetime.now(timezone.utc) + timedelta(days=TRIAL_DAYS),
)
db.add(sub)
await db.commit()
await db.refresh(sub)
return sub
@staticmethod
async def create_checkout_session(
db: AsyncSession,
account: Account,
plan: str,
seats: int,
billing_interval: str,
success_url: str,
cancel_url: str,
) -> str:
"""Create a Stripe Checkout Session for subscription purchase. If the
account currently has a trialing subscription with time remaining, that
trial end is preserved on the new Stripe subscription so the user
isn't charged early."""
if not settings.stripe_enabled:
raise RuntimeError("Stripe not configured")
stripe.api_key = settings.STRIPE_SECRET_KEY
plan_billing = (await db.execute(
select(PlanBilling).where(PlanBilling.plan == plan)
)).scalar_one_or_none()
if plan_billing is None:
raise ValueError(f"Unknown plan: {plan}")
price_id = (
plan_billing.stripe_monthly_price_id if billing_interval == "monthly"
else plan_billing.stripe_annual_price_id
)
if price_id is None:
raise RuntimeError(
f"Plan '{plan}' has no Stripe price for {billing_interval}"
)
if account.stripe_customer_id is None:
customer = stripe.Customer.create(
email=None,
metadata={"account_id": str(account.id)},
)
account.stripe_customer_id = customer.id
await db.commit()
sub = (await db.execute(
select(Subscription).where(Subscription.account_id == account.id)
)).scalar_one_or_none()
subscription_data = {}
if (
sub
and sub.status == "trialing"
and sub.current_period_end
and sub.current_period_end > datetime.now(timezone.utc)
):
subscription_data["trial_end"] = int(sub.current_period_end.timestamp())
session = stripe.checkout.Session.create(
customer=account.stripe_customer_id,
line_items=[{"price": price_id, "quantity": seats}],
mode="subscription",
subscription_data=subscription_data or None,
success_url=success_url,
cancel_url=cancel_url,
allow_promotion_codes=False,
)
return session.url
@staticmethod
async def open_customer_portal(account: Account) -> str:
"""Create a Stripe-hosted Customer Portal session and return the URL.
Raises RuntimeError if Stripe isn't configured (endpoint maps to 503).
Raises ValueError if the account has no stripe_customer_id yet — the
user must complete a checkout first (endpoint maps to 400).
"""
if not settings.stripe_enabled:
raise RuntimeError("Stripe not configured")
if account.stripe_customer_id is None:
raise ValueError("no_stripe_customer")
stripe.api_key = settings.STRIPE_SECRET_KEY
session = stripe.billing_portal.Session.create(
customer=account.stripe_customer_id,
return_url=f"{settings.FRONTEND_URL}/account/billing",
)
return session.url
@staticmethod
async def get_billing_state(db: AsyncSession, account):
"""Aggregate Subscription + PlanLimits + PlanBilling + resolved feature
flags for the account."""
from app.models.plan_limits import PlanLimits
from app.models.plan_billing import PlanBilling
from app.models.feature_flag import (
FeatureFlag, PlanFeatureDefault, AccountFeatureOverride,
)
sub = (await db.execute(
select(Subscription).where(Subscription.account_id == account.id)
)).scalar_one_or_none()
if sub is None:
from fastapi import HTTPException
raise HTTPException(status_code=404, detail="No subscription for account")
pl = (await db.execute(
select(PlanLimits).where(PlanLimits.plan == sub.plan)
)).scalar_one_or_none()
pb = (await db.execute(
select(PlanBilling).where(PlanBilling.plan == sub.plan)
)).scalar_one_or_none()
# Resolved feature flags: plan defaults overridden by account overrides
defaults = (await db.execute(
select(PlanFeatureDefault, FeatureFlag)
.join(FeatureFlag, PlanFeatureDefault.flag_id == FeatureFlag.id)
.where(PlanFeatureDefault.plan == sub.plan)
)).all()
resolved = {flag.flag_key: pfd.enabled for pfd, flag in defaults}
overrides = (await db.execute(
select(AccountFeatureOverride, FeatureFlag)
.join(FeatureFlag, AccountFeatureOverride.flag_id == FeatureFlag.id)
.where(AccountFeatureOverride.account_id == account.id)
)).all()
for ovr, flag in overrides:
resolved[flag.flag_key] = ovr.enabled
return {
"subscription": {
"status": sub.status,
"plan": sub.plan,
"current_period_start": sub.current_period_start,
"current_period_end": sub.current_period_end,
"cancel_at_period_end": sub.cancel_at_period_end,
"seat_limit": sub.seat_limit,
"has_pro_entitlement": sub.has_pro_entitlement,
"is_paid": sub.is_paid,
},
"plan_billing": pb,
"plan_limits": _plan_limits_to_dict(pl) if pl else {},
"enabled_features": resolved,
}
@staticmethod
async def apply_subscription_event(
db: AsyncSession, event_id: str, event_type: str, payload: dict
) -> bool:
"""Idempotent. Returns True if the event was applied; False if it had
already been processed (idempotent ack). The webhook handler returns 200
either way.
Atomic: the StripeEvent idempotency mark and the handler's state
mutations are committed in a single transaction. If the handler raises
the entire transaction (idempotency mark + partial mutations) is rolled
back, so a Stripe retry will re-run the handler. Without this, a
handler that fails mid-flight would leave the StripeEvent row persisted
and silently desync subscription state from Stripe.
"""
db.add(StripeEvent(
id=event_id,
event_type=event_type,
payload_excerpt=_excerpt(payload),
))
try:
await db.flush()
except IntegrityError:
# Duplicate event_id — already processed (or in flight). Ack with False.
await db.rollback()
return False
try:
if event_type == "checkout.session.completed":
await _handle_checkout_completed(db, payload)
elif event_type == "customer.subscription.updated":
await _handle_subscription_updated(db, payload)
elif event_type == "customer.subscription.deleted":
await _handle_subscription_deleted(db, payload)
elif event_type == "invoice.payment_failed":
await _handle_payment_failed(db, payload)
elif event_type == "invoice.payment_succeeded":
await _handle_payment_succeeded(db, payload)
await db.commit()
except Exception:
# Roll back the StripeEvent insert + any partial handler mutations
# so Stripe's retry can re-run cleanly.
await db.rollback()
raise
return True
def _plan_limits_to_dict(pl) -> dict:
return {c.name: getattr(pl, c.name) for c in pl.__table__.columns}
def _excerpt(payload: dict) -> dict:
obj = payload.get("data", {}).get("object", {})
return {
"object_id": obj.get("id"),
"customer": obj.get("customer"),
"subscription": obj.get("subscription"),
"status": obj.get("status"),
}
async def _handle_checkout_completed(db: AsyncSession, payload: dict):
obj = payload["data"]["object"]
customer_id = obj["customer"]
subscription_id = obj["subscription"]
account = (await db.execute(
select(Account).where(Account.stripe_customer_id == customer_id)
)).scalar_one_or_none()
if account is None:
return
sub = (await db.execute(
select(Subscription).where(Subscription.account_id == account.id)
)).scalar_one_or_none()
if sub is None:
return
stripe.api_key = settings.STRIPE_SECRET_KEY
stripe_sub = stripe.Subscription.retrieve(subscription_id)
sub.stripe_subscription_id = subscription_id
sub.stripe_price_id = stripe_sub["items"]["data"][0]["price"]["id"]
sub.status = "active"
sub.current_period_start = datetime.fromtimestamp(stripe_sub["current_period_start"], tz=timezone.utc)
sub.current_period_end = datetime.fromtimestamp(stripe_sub["current_period_end"], tz=timezone.utc)
sub.seat_limit = stripe_sub["items"]["data"][0]["quantity"]
pb = (await db.execute(
select(PlanBilling).where(
(PlanBilling.stripe_monthly_price_id == sub.stripe_price_id) |
(PlanBilling.stripe_annual_price_id == sub.stripe_price_id)
)
)).scalar_one_or_none()
if pb is not None:
sub.plan = pb.plan
# No commit — apply_subscription_event commits once for the full event.
async def _handle_subscription_updated(db: AsyncSession, payload: dict):
obj = payload["data"]["object"]
sub = (await db.execute(
select(Subscription).where(Subscription.stripe_subscription_id == obj["id"])
)).scalar_one_or_none()
if sub is None:
return
sub.status = obj["status"]
sub.current_period_start = datetime.fromtimestamp(obj["current_period_start"], tz=timezone.utc)
sub.current_period_end = datetime.fromtimestamp(obj["current_period_end"], tz=timezone.utc)
sub.cancel_at_period_end = obj.get("cancel_at_period_end", False)
sub.seat_limit = obj["items"]["data"][0]["quantity"]
# No commit — apply_subscription_event commits once for the full event.
async def _handle_subscription_deleted(db: AsyncSession, payload: dict):
obj = payload["data"]["object"]
sub = (await db.execute(
select(Subscription).where(Subscription.stripe_subscription_id == obj["id"])
)).scalar_one_or_none()
if sub is None:
return
sub.status = "canceled"
# No commit — apply_subscription_event commits once for the full event.
async def _handle_payment_failed(db: AsyncSession, payload: dict):
obj = payload["data"]["object"]
subscription_id = obj.get("subscription")
if not subscription_id:
return
sub = (await db.execute(
select(Subscription).where(Subscription.stripe_subscription_id == subscription_id)
)).scalar_one_or_none()
if sub is None:
return
sub.status = "past_due"
# No commit — apply_subscription_event commits once for the full event.
async def _handle_payment_succeeded(db: AsyncSession, payload: dict):
obj = payload["data"]["object"]
subscription_id = obj.get("subscription")
if not subscription_id:
return
sub = (await db.execute(
select(Subscription).where(Subscription.stripe_subscription_id == subscription_id)
)).scalar_one_or_none()
if sub is None:
return
if sub.status == "past_due":
sub.status = "active"
# No commit — apply_subscription_event commits once for the full event.

View File

@@ -63,6 +63,9 @@ the active suggested fix, as given in the input bundle under "Outcome status":>
provided. State that it did not resolve the issue.
- applied_partial: Include the fix as a partially tried path. Include partial \
notes if provided. Indicate it was not fully completed or not verified.
- applied_pending: List the fix as applied but awaiting verification. Include \
the pending reason if provided. Make it clear the next engineer should follow \
up to confirm it worked.
- applied_success: Note that the fix was applied and verified but escalation \
is still needed for another reason (unusual — reflect this accurately).
- dismissed: Do not mention the fix as a tried path; it was only considered.
@@ -80,6 +83,8 @@ symptoms are still being narrowed."
- applied_failed or dismissed: Say the proposed fix did not hold or was set \
aside. State any remaining uncertainty.
- applied_partial: Note the partial application and what remains open.
- applied_pending: Note that the fix is in place but unverified. Reference the \
pending reason. Frame this as the leading hypothesis pending confirmation.
- applied_success: Unusual in an escalate path — state the fix resolved the \
original symptom but a new or related issue requires escalation.
@@ -92,6 +97,8 @@ accordingly — e.g. suggest alternatives or deeper investigation paths, \
drawing on the failure reason if provided. \
If the fix is partially applied (applied_partial), the first step is typically \
to complete or verify it. \
If the fix is pending verification (applied_pending), the first step is \
typically to confirm whether the fix held — reference what was being waited on. \
If the fix is still proposed (no outcome), the first step is to try it if \
confidence is high (>80%).>
@@ -299,6 +306,8 @@ class EscalationPackageGeneratorService:
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
if active_fix.partial_notes:
lines.append(f"Partial notes: {active_fix.partial_notes}")
if active_fix.pending_reason:
lines.append(f"Pending reason: {active_fix.pending_reason}")
if active_fix.failure_reason:
lines.append(f"Failure reason: {active_fix.failure_reason}")

View File

@@ -632,8 +632,10 @@ async def pickup_session(
allow_team_access=True, team_id=team_id,
)
if session.status != "requesting_escalation":
raise ValueError(f"Session is {session.status}, not requesting_escalation")
if session.status not in ("requesting_escalation", "escalated"):
raise ValueError(
f"Session is {session.status}, not in an escalated state"
)
# Can't pick up your own session
if session.user_id == user_id:
@@ -911,6 +913,41 @@ async def generate_status_update(
"""Generate a status update for ticket notes, client communication, or email draft."""
session = await _load_session(session_id, user_id, db)
# For escalation/ticket_notes, return the pre-generated handoff prose immediately
# if enrich_escalation_async has already populated it. This eliminates the
# redundant Sonnet re-summarization on every "Ticket Notes" click.
if request.context == "escalation" and request.audience == "ticket_notes":
from app.models.session_handoff import SessionHandoff
handoff_q = await db.execute(
select(SessionHandoff)
.where(
SessionHandoff.session_id == session_id,
SessionHandoff.intent == "escalate",
)
.order_by(SessionHandoff.created_at.desc())
.limit(1)
)
escalation_handoff = handoff_q.scalar_one_or_none()
saved_data = (
escalation_handoff.ai_assessment_data or {}
) if escalation_handoff else {}
prose = saved_data.get("summary_prose") or (
escalation_handoff.ai_assessment if escalation_handoff else None
)
if prose:
return StatusUpdateResponse(
content=prose,
audience=request.audience,
length=request.length,
context=request.context,
session_status=session.status,
steps_completed=session.step_count or 0,
time_spent_display=None,
client_name=None,
generated_at=datetime.now(timezone.utc),
)
# Build conversation summary from session steps
steps_summary = []
for step in sorted(session.steps, key=lambda s: s.step_order):

View File

@@ -3,22 +3,65 @@
Creates handoff snapshots, AI assessments (for escalations), claim workflow,
and queue queries. Dual-writes to ai_sessions.escalation_package for
backward compatibility with the existing escalation queue.
For intent='escalate', `create_handoff` also runs the legacy enrichment
that the deprecated `/escalate` endpoint used to do directly: setting
`escalated_to_id`, building the AI-enhanced escalation_package (Sonnet),
and recording escalation_reason. `finalize_escalation` then generates the
SessionDocumentation and pushes to PSA. `dispatch_escalation_notifications`
fans out the bell-icon AppNotification + external channels (Slack/Teams)
on top of per-user emails. The `/escalate` endpoint is now a thin shim
calling these in sequence.
"""
import asyncio
import json
import logging
from datetime import datetime, timezone
from typing import Any
from uuid import UUID
from uuid import UUID, uuid4
from sqlalchemy import select
from sqlalchemy import select, update
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from app.core.ai_provider import get_ai_provider
from app.core.config import settings
from app.core.email import EmailService
from app.core.escalation_bus import bus as escalation_bus
from app.models.ai_session import AISession
from app.models.session_branch import SessionBranch
from app.models.session_handoff import SessionHandoff
from app.models.user import User
from app.schemas.ai_session import SessionDocumentation
from app.services.notification_service import notify
logger = logging.getLogger(__name__)
class HandoffAlreadyClaimedError(Exception):
"""Raised when a senior tries to claim a handoff another senior already won.
Carries the winning claimer's id, display name, and claim timestamp so the
API layer can surface a "Already claimed by {name} {time_ago}" toast on
the losing client. The race story is the locked design — without this
exception the endpoint would silently overwrite `claimed_by` and both
seniors would think they own the session.
"""
def __init__(
self,
claimed_by_id: UUID,
claimed_by_name: str,
claimed_at: datetime,
) -> None:
super().__init__(
f"Handoff already claimed by {claimed_by_name} at {claimed_at.isoformat()}"
)
self.claimed_by_id = claimed_by_id
self.claimed_by_name = claimed_by_name
self.claimed_at = claimed_at
class HandoffManager:
"""Unified park/escalate handoff management."""
@@ -32,37 +75,71 @@ class HandoffManager:
engineer_notes: str | None,
user_id: UUID,
priority: str = "normal",
target_user_id: UUID | None = None,
) -> SessionHandoff:
"""Create a handoff (park or escalate).
Generates snapshot, updates session status, dual-writes to
escalation_package for backward compat.
For intent='escalate' also: sets `session.escalation_reason` and
optionally `session.escalated_to_id`, builds the AI-enhanced
escalation package (the rich one the legacy `/escalate` path used
to produce), and merges the handoff metadata into it. Self-targeting
is rejected with ValueError, matching legacy behavior.
"""
user_id = UUID(str(user_id))
if target_user_id:
target_user_id = UUID(str(target_user_id))
# Eager-load steps + user — _build_escalation_package_enhanced and
# finalize_escalation iterate over session.steps to compose the
# legacy enriched package and the SessionDocumentation, and the
# notify() dispatcher reads session.user.name. Without selectinload
# the async session raises MissingGreenlet on attribute access.
result = await self.db.execute(
select(AISession).where(AISession.id == session_id)
select(AISession)
.options(
selectinload(AISession.steps),
selectinload(AISession.user),
)
.where(AISession.id == session_id)
)
session = result.scalar_one_or_none()
if not session:
raise ValueError(f"Session {session_id} not found")
# Generate snapshot
if intent == "escalate":
if target_user_id and target_user_id == user_id:
raise ValueError(
"Cannot escalate a session to yourself. Use pause instead."
)
if session.status not in ("active", "paused"):
raise ValueError(
f"Cannot escalate session in status: {session.status}"
)
# Generate snapshot — fast, no AI calls.
snapshot = await self._generate_snapshot(session)
# Generate AI assessment for escalations
ai_assessment = None
ai_assessment_data = None
if intent == "escalate":
ai_assessment, ai_assessment_data = await self._generate_ai_assessment(session)
# AI enrichment (assessment + enhanced escalation_package) is now
# deferred to a background task scheduled by the endpoint after
# commit — both calls hit Sonnet and together can take 15-25s,
# which is too long to block the click path. The handoff row lands
# immediately with `ai_assessment=None`; the magic-moment screen
# shows "Assessment still computing" until enrich_async finishes
# and the senior refreshes (or, eventually, polls).
handoff_id = uuid4()
handoff = SessionHandoff(
id=handoff_id,
session_id=session_id,
account_id=session.account_id,
handed_off_by=user_id,
intent=intent,
source_branch_id=session.active_branch_id,
snapshot=snapshot,
ai_assessment=ai_assessment,
ai_assessment_data=ai_assessment_data,
ai_assessment=None,
ai_assessment_data=None,
engineer_notes=engineer_notes,
priority=priority,
)
@@ -73,20 +150,248 @@ class HandoffManager:
session.status = "paused"
elif intent == "escalate":
session.status = "escalated"
session.escalation_reason = engineer_notes
if target_user_id:
session.escalated_to_id = target_user_id
session.handoff_count = (session.handoff_count or 0) + 1
# Dual-write for backward compat
# Dual-write the minimal escalation_package shape now. The async
# enrichment task overwrites this with the AI-enhanced shape
# (`steps_tried`, `remaining_hypotheses`, etc.) when it completes —
# consumers that read these fields (PSA writeback, legacy
# SessionBriefing) tolerate either shape.
session.escalation_package = {
"snapshot": snapshot,
"intent": intent,
"engineer_notes": engineer_notes,
"handoff_id": str(handoff.id),
"handoff_id": str(handoff_id),
}
await self.db.flush()
return handoff
async def finalize_escalation(
self,
handoff: SessionHandoff,
session: AISession,
user_id: UUID,
) -> tuple[SessionDocumentation | None, dict[str, Any]]:
"""Post-create enrichment for intent='escalate' handoffs.
Generates the SessionDocumentation + pushes documentation to PSA if
a ticket is linked. Returns (documentation, psa_result) so the
legacy `/escalate` shim can map back to SessionCloseResponse. Safe
to call only when handoff.intent == 'escalate' — for park, returns
a no-op no-PSA dict.
"""
if handoff.intent != "escalate":
return None, {
"psa_push_status": "no_psa",
"psa_push_error": None,
"member_mapping_warning": None,
}
# Lazy import to avoid circular dependency: flowpilot_engine imports
# plenty of services at module load time and we don't want
# handoff_manager pulled into that graph at import.
from app.services.flowpilot_engine import (
_generate_documentation,
_push_to_psa,
)
documentation = _generate_documentation(session)
psa_result = await _push_to_psa(session, user_id, self.db)
# Bell-icon AppNotification rows + external account-level channels
# (Slack/Teams webhooks, shared escalations inboxes). This is the
# `notify()` call the legacy /escalate path used to make directly,
# and it has to happen BEFORE the endpoint commits so the
# AppNotification rows land atomically with the handoff. Per-user
# emails come after commit in dispatch_escalation_notifications —
# those are pure IO with no persistent state.
try:
engineer_user = (
await self.db.execute(
select(User).where(User.id == user_id)
)
).scalar_one_or_none()
engineer_name = (
engineer_user.name
if engineer_user and engineer_user.name
else "Unknown"
)
target_user_ids = (
[session.escalated_to_id] if session.escalated_to_id else None
)
await notify(
"session.escalated",
handoff.account_id,
{
"session_id": str(handoff.session_id),
"engineer_name": engineer_name,
"escalation_reason": handoff.engineer_notes or "",
"problem_summary": session.problem_summary or "N/A",
# Surface the PSA ticket id in the bell-icon title so two
# similarly-worded escalations are still distinguishable
# at a glance.
"psa_ticket_id": session.psa_ticket_id,
},
self.db,
target_user_ids=target_user_ids,
)
except Exception:
logger.exception(
"notify() dispatch failed for handoff %s", handoff.id
)
return documentation, psa_result
async def _build_enhanced_escalation_package(
self,
session: AISession,
user_id: UUID,
) -> dict[str, Any]:
"""Lazy wrapper around the legacy enhanced-package builder.
The builder lives in flowpilot_engine; we only need it for the
escalate path. Failures are caught here so handoff creation never
depends on the optional Sonnet enrichment — return the minimal
shape on failure.
"""
try:
from app.services.flowpilot_engine import (
_build_escalation_package_enhanced,
)
return await _build_escalation_package_enhanced(session, user_id)
except Exception:
logger.exception(
"Enhanced escalation package build failed for session %s; "
"falling back to minimal package",
session.id,
)
return {}
async def dispatch_escalation_notifications(
self, handoff: SessionHandoff
) -> int:
"""Email engineer-or-admin users in the account about a new escalation.
Call this AFTER `db.commit()` has succeeded — sending email for a
rolled-back handoff is the kind of trust-erosion bug that makes pilot
customers stop trusting the tool. Returns the number of recipients
successfully emailed (best-effort, not authoritative).
Failures are logged but never raise: the wedge demo's reliability
story is "handoff creation always succeeds; notification is best-effort,"
not "handoff creation depends on the email service being up." This is
the graceful-degradation regression the eng + codex reviews both
flagged as critical.
Per-channel delivery records (Codex correction on the dead
`notification_sent` boolean) are a v1.x story — for now the
application logs are the audit trail.
"""
if handoff.intent != "escalate":
return 0
# Publish to the in-memory bus first so connected senior-tech inboxes
# see the new card slide in within ~1s of escalate. This path is
# fire-and-forget (no IO, just memory) so it can sit ahead of the
# email fan-out.
try:
await escalation_bus.publish(
handoff.account_id,
{
"type": "handoff_created",
"handoff_id": str(handoff.id),
"session_id": str(handoff.session_id),
"priority": handoff.priority,
"engineer_notes": handoff.engineer_notes or "",
"created_at": handoff.created_at.isoformat()
if handoff.created_at
else None,
},
)
except Exception:
logger.exception(
"EscalationBus publish failed for handoff %s", handoff.id
)
try:
recipients = (
await self.db.execute(
select(User).where(
User.account_id == handoff.account_id,
User.id != handoff.handed_off_by,
User.account_role.in_(("owner", "admin", "engineer")),
User.is_active.is_(True),
User.deleted_at.is_(None),
)
)
).scalars().all()
if not recipients:
logger.info(
"No notification recipients for handoff %s in account %s",
handoff.id,
handoff.account_id,
)
return 0
# Pull session for the email subject. Fall back to a generic title
# if the session is gone (e.g. cascade delete mid-dispatch).
session_result = await self.db.execute(
select(AISession).where(AISession.id == handoff.session_id)
)
session = session_result.scalar_one_or_none()
problem = (
session.problem_summary if session and session.problem_summary
else "an active session"
)
title = f"New escalation: {problem}"
notes = (handoff.engineer_notes or "").strip()
body = (
"A teammate has escalated a session and is asking for help.\n\n"
f"Reason: {notes if notes else 'No reason provided.'}\n"
f"Priority: {handoff.priority}"
)
link_url = (
f"{settings.FRONTEND_URL.rstrip('/')}/escalations"
if settings.FRONTEND_URL
else None
)
results = await asyncio.gather(
*[
EmailService.send_notification_email(
to_email=r.email,
title=title,
body=body,
link_url=link_url,
)
for r in recipients
],
return_exceptions=True,
)
sent = sum(1 for r in results if r is True)
logger.info(
"Escalation notifications dispatched for handoff %s: %d/%d recipients",
handoff.id,
sent,
len(recipients),
)
return sent
except Exception:
logger.exception(
"Escalation notification dispatch failed for handoff %s",
handoff.id,
)
return 0
async def _generate_snapshot(self, session: AISession) -> dict[str, Any]:
"""Generate a snapshot of the session state at handoff time."""
snapshot: dict[str, Any] = {
@@ -125,16 +430,56 @@ class HandoffManager:
handoff_id: UUID,
claiming_user_id: UUID,
) -> SessionHandoff:
"""Claim a handed-off session."""
"""Claim a handed-off session.
If the handoff was already claimed by a *different* user (the race
story: two seniors clicking Pick Up simultaneously), raise
`HandoffAlreadyClaimedError` with the winning claimer's details so
the API can return 409 with the data the loser's toast needs. A
re-claim by the same user is idempotent.
"""
claiming_user_id = UUID(str(claiming_user_id))
claimed_at = datetime.now(timezone.utc)
update_result = await self.db.execute(
update(SessionHandoff)
.where(
SessionHandoff.id == handoff_id,
SessionHandoff.claimed_by.is_(None),
SessionHandoff.handed_off_by != claiming_user_id,
)
.values(claimed_by=claiming_user_id, claimed_at=claimed_at)
.returning(SessionHandoff.id)
)
claimed_now = update_result.scalar_one_or_none() is not None
result = await self.db.execute(
select(SessionHandoff).where(SessionHandoff.id == handoff_id)
select(SessionHandoff)
.options(
selectinload(SessionHandoff.claimed_by_user),
selectinload(SessionHandoff.handed_off_by_user),
)
.where(SessionHandoff.id == handoff_id)
)
handoff = result.scalar_one_or_none()
if not handoff:
raise ValueError(f"Handoff {handoff_id} not found")
handoff.claimed_by = claiming_user_id
handoff.claimed_at = datetime.now(timezone.utc)
handed_off_by = UUID(str(handoff.handed_off_by))
claimed_by = (
UUID(str(handoff.claimed_by)) if handoff.claimed_by is not None else None
)
if handed_off_by == claiming_user_id:
raise PermissionError("Cannot claim your own handoff")
if not claimed_now and claimed_by != claiming_user_id:
claimer = handoff.claimed_by_user
raise HandoffAlreadyClaimedError(
claimed_by_id=claimed_by,
claimed_by_name=claimer.name if claimer else "another engineer",
claimed_at=handoff.claimed_at or datetime.now(timezone.utc),
)
# Reactivate session
session_result = await self.db.execute(
@@ -149,43 +494,111 @@ class HandoffManager:
await self.db.flush()
return handoff
async def _generate_ai_assessment(
async def _generate_handoff_summary(
self, session: AISession
) -> tuple[str | None, dict[str, Any] | None]:
"""Generate AI diagnostic assessment for escalation handoffs."""
) -> dict[str, Any] | None:
"""Single structured AI call for the escalation magic-moment screen.
Returns a dict with summary_prose, what_we_know, likely_cause,
suggested_steps, and confidence. Returns None on timeout or error.
Replaces the old _generate_ai_assessment + _generate_ai_assessment_with_timeout
pair, which returned freeform prose with no usable structured fields.
"""
timeout = settings.ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS
try:
from app.services.assistant_chat_service import _call_ai
context = f"Problem: {session.problem_summary or 'Unknown'}\nDomain: {session.problem_domain or 'Unknown'}"
msgs = session.conversation_messages or []
# Include last 10 messages for context
recent = "\n".join(
f"[{m.get('role', '?')}]: {m.get('content', '')[:200]}"
for m in msgs[-10:]
return await asyncio.wait_for(
self._generate_handoff_summary_inner(session),
timeout=timeout,
)
assessment_text, _, _ = await _call_ai(
system_base="You are a diagnostic assessment generator for MSP escalations.",
rag_context="",
history=[],
new_message=(
f"Generate a brief diagnostic assessment for this escalation.\n"
f"{context}\n\nRecent conversation:\n{recent}\n\n"
f"Return: 1) Most likely cause, 2) Suggested next steps, 3) Confidence (low/medium/high)"
),
max_tokens=500,
except asyncio.TimeoutError:
logger.warning(
"Handoff summary timed out after %ss for session %s",
timeout,
session.id,
)
assessment_data = {
"likely_cause": "See assessment text",
"suggested_steps": [],
"confidence": "medium",
}
return assessment_text, assessment_data
return None
except Exception:
logger.exception("Failed to generate AI assessment")
return None, None
logger.exception(
"Handoff summary failed for session %s", session.id
)
return None
async def _generate_handoff_summary_inner(
self, session: AISession
) -> dict[str, Any]:
steps = session.steps or []
steps_tried = []
for step in sorted(steps, key=lambda s: s.step_order):
content = step.content or {}
text = content.get("text", "").strip()
if not text:
continue
entry = text
if step.selected_option:
entry += f"{step.selected_option}"
elif step.free_text_input:
entry += f"{step.free_text_input[:100]}"
elif step.was_skipped:
entry += " (skipped)"
steps_tried.append(entry)
steps_text = (
"\n".join(f"- {s}" for s in steps_tried[:15])
or "No diagnostic steps recorded."
)
msgs = session.conversation_messages or []
recent_msgs = "\n".join(
f"[{m.get('role', '?')}]: {m.get('content', '')[:200]}"
for m in msgs[-10:]
)
prompt = (
"Generate a structured escalation handoff summary.\n\n"
f"Problem: {session.problem_summary or 'Unknown'}\n"
f"Domain: {session.problem_domain or 'Unknown'}\n"
f"Escalation reason: {session.escalation_reason or 'Not provided'}\n\n"
f"Diagnostic steps taken:\n{steps_text}\n\n"
f"Recent conversation:\n{recent_msgs}\n\n"
"Respond with ONLY a valid JSON object matching this schema exactly:\n"
'{"summary_prose": "<2-3 sentences suitable for PSA ticket notes>",\n'
' "what_we_know": ["<confirmed fact 1>", "<confirmed fact 2>"],\n'
' "likely_cause": "<one sentence root cause hypothesis>",\n'
' "suggested_steps": ["<next step 1>", "<next step 2>"],\n'
' "confidence": "<low or medium or high>"}'
)
provider = get_ai_provider(settings.get_model_for_action("escalation_package"))
raw, _, _ = await provider.generate_json(
system_prompt=(
"You are a diagnostic assessment generator for MSP tech support escalations. "
"Always respond with valid JSON and nothing else. "
"Be concise and factual."
),
messages=[{"role": "user", "content": prompt}],
max_tokens=700,
)
cleaned = raw.strip()
if cleaned.startswith("```"):
lines = cleaned.split("\n", 1)
cleaned = lines[1] if len(lines) > 1 else cleaned
if cleaned.endswith("```"):
cleaned = cleaned[:-3].rstrip()
result = json.loads(cleaned)
if not isinstance(result.get("suggested_steps"), list):
result["suggested_steps"] = []
if not isinstance(result.get("what_we_know"), list):
result["what_we_know"] = []
if result.get("confidence") not in ("low", "medium", "high"):
result["confidence"] = "medium"
if not isinstance(result.get("summary_prose"), str) or not result.get("summary_prose"):
result["summary_prose"] = result.get("likely_cause", "Assessment generated.")
if not isinstance(result.get("likely_cause"), str):
result["likely_cause"] = ""
return result
async def generate_briefing(
self, handoff_id: UUID, claiming_user_id: UUID
@@ -288,3 +701,105 @@ class HandoffManager:
})
return queue_items
async def enrich_escalation_async(handoff_id: UUID, user_id: UUID) -> None:
"""Run the AI enrichment for an escalation handoff in the background.
Scheduled by `/escalate` and `/handoff` (intent=escalate) endpoints via
FastAPI BackgroundTasks. Opens its own DB session because the request
session is closed by the time this runs. Generates:
1. The legacy AI-enhanced escalation_package (Sonnet, ~5-10s) — saved
to `session.escalation_package`, preserving the `intent` /
`engineer_notes` / `handoff_id` keys the dual-write set so legacy
consumers keep working.
2. The diagnostic AI assessment (Sonnet, ~4-15s) — saved to
`handoff.ai_assessment` and `handoff.ai_assessment_data`.
On completion publishes a `handoff_assessment_ready` event on the
escalation bus so any connected magic-moment screen can refresh
without a manual reload. Failures are logged but never propagated —
the click-path-side handoff creation already committed, so worst case
the senior sees the "Assessment still computing" placeholder until
they refresh manually.
"""
from app.core.database import async_session_maker
from app.core.escalation_bus import bus as escalation_bus
async with async_session_maker() as db:
try:
result = await db.execute(
select(SessionHandoff).where(SessionHandoff.id == handoff_id)
)
handoff = result.scalar_one_or_none()
if not handoff or handoff.intent != "escalate":
return
session_result = await db.execute(
select(AISession)
.options(selectinload(AISession.steps), selectinload(AISession.user))
.where(AISession.id == handoff.session_id)
)
session = session_result.scalar_one_or_none()
if not session:
logger.warning(
"enrich_escalation_async: session %s gone for handoff %s",
handoff.session_id,
handoff_id,
)
return
manager = HandoffManager(db)
# Single consolidated AI call — replaces the old
# _generate_ai_assessment + _build_enhanced_escalation_package pair.
try:
summary = await manager._generate_handoff_summary(session)
if summary:
# ai_assessment (text) holds the PSA prose for backward compat
# (push_to_psa reads it; generate_status_update falls back to it).
handoff.ai_assessment = summary.get("summary_prose")
handoff.ai_assessment_data = summary
# Keep suggested_next_steps in escalation_package so
# psa_documentation_service can read it without a handoff join.
existing_pkg = (
session.escalation_package
if isinstance(session.escalation_package, dict)
else {}
)
session.escalation_package = {
**existing_pkg,
"suggested_next_steps": summary.get("suggested_steps", []),
}
except Exception:
logger.exception(
"enrich_escalation_async: summary generation failed for handoff %s",
handoff_id,
)
await db.commit()
try:
await escalation_bus.publish(
handoff.account_id,
{
"type": "handoff_assessment_ready",
"handoff_id": str(handoff.id),
"session_id": str(handoff.session_id),
"has_assessment": handoff.ai_assessment_data is not None,
},
)
except Exception:
logger.exception(
"enrich_escalation_async: bus publish failed for handoff %s",
handoff_id,
)
except Exception:
logger.exception(
"enrich_escalation_async failed for handoff %s", handoff_id
)
try:
await db.rollback()
except Exception:
pass

View File

@@ -371,13 +371,35 @@ async def _send_teams_message(
def _build_notification_title(event: str, payload: dict[str, Any]) -> str:
"""Human-readable title per event type."""
titles = {
"session.escalated": "Session escalated by {engineer_name}",
# Distinguishability matters in the bell panel: with a generic title
# ("Session escalated by Jane") two different escalations from the
# same junior look like a duplicate notification. Including a short
# problem snippet (and ticket number if present) lets the senior
# tell them apart at a glance.
"session.escalated": "Escalation from {engineer_name}{ticket_suffix}: {problem_snippet}",
"session.high_priority": "High-priority session started: {ticket_number}",
"proposal.pending": "New flow proposal: {title}",
"proposal.approved": "Flow proposal approved: {title}",
"knowledge_gap.detected": "Knowledge gap detected: {gap_type}",
"test": "Test Notification from ResolutionFlow",
}
# Build the escalation-specific derived fields. Done here rather than at
# the call site so every dispatch path (legacy /escalate shim, /handoff,
# any future entry point) gets consistent formatting without each one
# having to repeat the snippet logic.
if event == "session.escalated":
problem = (payload.get("problem_summary") or "").strip()
if not problem or problem.upper() == "N/A":
problem_snippet = "(no summary provided)"
elif len(problem) > 70:
problem_snippet = problem[:67].rstrip() + ""
else:
problem_snippet = problem
ticket = payload.get("psa_ticket_id") or payload.get("ticket_number")
ticket_suffix = f" · #{ticket}" if ticket else ""
payload = {**payload, "problem_snippet": problem_snippet, "ticket_suffix": ticket_suffix}
template = titles.get(event, f"Notification: {event}")
try:
return template.format(**payload)
@@ -405,7 +427,12 @@ def _build_notification_body(event: str, payload: dict[str, Any]) -> str:
def _build_notification_link(event: str, payload: dict[str, Any]) -> Optional[str]:
"""In-app link per event type. Returns path (no host)."""
links: dict[str, str] = {
"session.escalated": "/pilot/{session_id}",
# ?pickup=true triggers the senior-tech handoff/pickup flow on the
# session page (magic-moment screen for handoff-based escalations,
# legacy SessionBriefing for `requesting_escalation` sessions).
# Without it the senior lands on a session-detail GET they can't
# access pre-pickup, which the user perceives as a dead notification.
"session.escalated": "/pilot/{session_id}?pickup=true",
"session.high_priority": "/pilot/{session_id}",
"proposal.pending": "/review-queue",
"proposal.approved": "/review-queue",

View File

@@ -0,0 +1,71 @@
"""OAuth provider helpers. Each provider exposes:
- exchange_code(code, redirect_uri) -> OAuthProfile
"""
from dataclasses import dataclass
import httpx
from app.core.config import settings
@dataclass
class OAuthProfile:
provider_subject: str
email: str
name: str
async def google_exchange_code(code: str, redirect_uri: str) -> OAuthProfile:
async with httpx.AsyncClient(timeout=10) as cli:
token_response = await cli.post(
"https://oauth2.googleapis.com/token",
data={
"code": code,
"client_id": settings.GOOGLE_CLIENT_ID,
"client_secret": settings.GOOGLE_CLIENT_SECRET,
"redirect_uri": redirect_uri,
"grant_type": "authorization_code",
},
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]
userinfo = await cli.get(
"https://openidconnect.googleapis.com/v1/userinfo",
headers={"Authorization": f"Bearer {access_token}"},
)
userinfo.raise_for_status()
data = userinfo.json()
return OAuthProfile(
provider_subject=data["sub"],
email=data["email"],
name=data.get("name") or data["email"].split("@")[0],
)
async def microsoft_exchange_code(code: str, redirect_uri: str) -> OAuthProfile:
async with httpx.AsyncClient(timeout=10) as cli:
token_response = await cli.post(
"https://login.microsoftonline.com/common/oauth2/v2.0/token",
data={
"code": code,
"client_id": settings.MS_CLIENT_ID,
"client_secret": settings.MS_CLIENT_SECRET,
"redirect_uri": redirect_uri,
"grant_type": "authorization_code",
"scope": "openid email profile",
},
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]
userinfo = await cli.get(
"https://graph.microsoft.com/v1.0/me",
headers={"Authorization": f"Bearer {access_token}"},
)
userinfo.raise_for_status()
data = userinfo.json()
return OAuthProfile(
provider_subject=data["id"],
email=data.get("mail") or data["userPrincipalName"],
name=data.get("displayName") or data["userPrincipalName"].split("@")[0],
)

View File

@@ -12,6 +12,10 @@ from app.services.psa.types import (
PSAConfiguration,
PSATimeEntry,
PSABoard,
PaginatedTicketResult,
PSAResource,
PSACreatedTicket,
TicketCreatePayload,
)
@@ -28,7 +32,7 @@ class AutotaskProvider(PSAProvider):
async def get_ticket(self, ticket_id: str) -> PSATicket:
raise NotImplementedError("Autotask integration coming soon")
async def search_tickets(self, query: str, **filters) -> list[PSATicket]:
async def search_tickets(self, query: str, **filters) -> PaginatedTicketResult:
raise NotImplementedError("Autotask integration coming soon")
async def post_note(
@@ -74,3 +78,18 @@ class AutotaskProvider(PSAProvider):
work_type: str | None = None,
) -> PSATimeEntry:
raise NotImplementedError("Autotask integration coming soon")
async def list_resources(self, ticket_id: int) -> list[PSAResource]:
raise NotImplementedError("Autotask integration coming soon")
async def add_resource(self, ticket_id: int, member_id: int) -> PSAResource:
raise NotImplementedError("Autotask integration coming soon")
async def remove_resource(self, ticket_id: int, member_id: int) -> None:
raise NotImplementedError("Autotask integration coming soon")
async def create_ticket(self, payload: TicketCreatePayload) -> PSACreatedTicket:
raise NotImplementedError("Autotask integration coming soon")
async def list_priorities(self) -> list[dict]:
raise NotImplementedError("Autotask integration coming soon")

View File

@@ -13,6 +13,10 @@ from .types import (
PSAConfiguration,
PSATimeEntry,
PSABoard,
PaginatedTicketResult,
PSAResource,
PSACreatedTicket,
TicketCreatePayload,
)
@@ -28,7 +32,7 @@ class PSAProvider(ABC):
...
@abstractmethod
async def search_tickets(self, query: str, **filters) -> list[PSATicket]:
async def search_tickets(self, query: str, **filters) -> PaginatedTicketResult:
...
@abstractmethod
@@ -83,3 +87,23 @@ class PSAProvider(ABC):
work_type: str | None = None,
) -> PSATimeEntry:
...
@abstractmethod
async def list_resources(self, ticket_id: int) -> list[PSAResource]:
...
@abstractmethod
async def add_resource(self, ticket_id: int, member_id: int) -> PSAResource:
...
@abstractmethod
async def remove_resource(self, ticket_id: int, member_id: int) -> None:
...
@abstractmethod
async def create_ticket(self, payload: TicketCreatePayload) -> PSACreatedTicket:
...
@abstractmethod
async def list_priorities(self) -> list[dict]:
...

View File

@@ -7,6 +7,7 @@ from datetime import datetime, timezone
from app.services.psa.base import PSAProvider
from app.services.psa.cache import psa_cache
from app.services.psa.exceptions import PSAError
from app.services.psa.types import (
ConnectionTestResult,
PSATicket,
@@ -17,6 +18,10 @@ from app.services.psa.types import (
PSAConfiguration,
PSATimeEntry,
PSABoard,
PaginatedTicketResult,
PSAResource,
PSACreatedTicket,
TicketCreatePayload,
)
from .client import ConnectWiseClient
@@ -55,27 +60,31 @@ class ConnectWiseProvider(PSAProvider):
)
return self._map_ticket(data)
async def search_tickets(self, query: str, **filters) -> list[PSATicket]:
"""Search CW tickets by summary. Supports board_id, status_id, member_id,
unassigned, board_ids, page, and page_size filters."""
async def search_tickets(self, query: str, **filters) -> PaginatedTicketResult:
"""Search CW tickets by summary. Supports board_id, status_id, member_identifier,
unassigned, board_ids, page, and page_size filters. Returns paginated result."""
page_size = filters.get("page_size", 10)
page = filters.get("page", 1)
params: dict = {
"fields": "id,summary,company,board,status,priority,closedFlag",
"orderBy": "id desc",
"orderBy": "priority/sort asc,dateEntered desc",
"pageSize": page_size,
"page": page,
}
# Build CW condition query
conditions: list[str] = []
if query:
conditions.append(f"summary contains '{query}'")
# Sanitize: strip single quotes to prevent CW condition injection
safe_query = query.replace("'", "")
conditions.append(f"summary contains '{safe_query}'")
if filters.get("board_id"):
conditions.append(f"board/id = {filters['board_id']}")
if filters.get("status_id"):
conditions.append(f"status/id = {filters['status_id']}")
elif filters.get("status_name"):
safe_status = str(filters["status_name"]).replace("'", "")
conditions.append(f"status/name = '{safe_status}'")
if not filters.get("include_closed", False):
conditions.append("closedFlag = false")
if filters.get("member_identifier") is not None:
@@ -86,16 +95,27 @@ class ConnectWiseProvider(PSAProvider):
if board_ids:
board_list = ", ".join(str(bid) for bid in board_ids)
conditions.append(f"board/id in ({board_list})")
if filters.get("company_id"):
conditions.append(f"company/id = {int(filters['company_id'])}")
if conditions:
params["conditions"] = " and ".join(conditions)
condition_str = " and ".join(conditions) if conditions else ""
if condition_str:
params["conditions"] = condition_str
data = await self.client.get("/service/tickets", params=params)
count_params: dict = {}
if condition_str:
count_params["conditions"] = condition_str
return [
self._map_ticket(t)
for t in (data if isinstance(data, list) else [])
]
# Fire page fetch + count in parallel
data, count_data = await asyncio.gather(
self.client.get("/service/tickets", params=params),
self.client.get("/service/tickets/count", params=count_params),
)
items = [self._map_ticket(t) for t in (data if isinstance(data, list) else [])]
total = count_data.get("count", len(items)) if isinstance(count_data, dict) else len(items)
return PaginatedTicketResult(items=items, total=total, page=page, page_size=page_size)
async def get_ticket_configurations(
self, ticket_id: str
@@ -246,13 +266,30 @@ class ConnectWiseProvider(PSAProvider):
async def update_ticket_status(
self, ticket_id: str, status_id: int
) -> PSATicket:
"""Update a CW ticket's status using JSON Patch format."""
"""Update a CW ticket's status using JSON Patch format.
Verifies CW actually applied the change — CW silently returns 200 when
a status id is invalid for the ticket's board. We check the response
body's status.id matches what we sent, and raise PSAError if not.
"""
patch_body = [
{"op": "replace", "path": "status", "value": {"id": status_id}}
]
data = await self.client.patch(
f"/service/tickets/{ticket_id}", json_body=patch_body
)
applied = (data.get("status") or {}) if isinstance(data, dict) else {}
applied_id = applied.get("id")
if applied_id != status_id:
logger.warning(
"CW status PATCH for ticket %s returned status id=%s instead of %s",
ticket_id, applied_id, status_id,
)
raise PSAError(
f"ConnectWise did not apply status {status_id} "
f"(still {applied.get('name') or applied_id}). "
"The status may not be valid for this ticket's board."
)
return self._map_ticket(data)
async def list_members(self) -> list[PSAMember]:
@@ -591,16 +628,247 @@ class ConnectWiseProvider(PSAProvider):
@staticmethod
def _map_ticket(data: dict) -> PSATicket:
"""Map a CW ticket JSON dict to a PSATicket."""
company = data.get("company") or {}
board = data.get("board") or {}
status = data.get("status") or {}
priority = data.get("priority") or {}
return PSATicket(
id=str(data["id"]),
id=str(data.get("id", "")),
summary=data.get("summary", ""),
company_name=data.get("company", {}).get("name"),
company_id=str(data["company"]["id"]) if data.get("company") else None,
board_name=data.get("board", {}).get("name"),
board_id=data.get("board", {}).get("id"),
status_name=data.get("status", {}).get("name"),
status_id=data.get("status", {}).get("id"),
priority_name=data.get("priority", {}).get("name"),
priority_id=data.get("priority", {}).get("id"),
company_name=company.get("name"),
company_id=str(company.get("id")) if company.get("id") else None,
board_name=board.get("name"),
board_id=board.get("id"),
status_name=status.get("name"),
status_id=status.get("id"),
priority_name=priority.get("name"),
priority_id=priority.get("id"),
closed=data.get("closedFlag", False),
)
# ── Resource management ───────────────────────────────────────────
# Schedule type id for "Service Ticket" resources — CW's canonical type for ticket co-assignees
_SCHEDULE_TYPE_SERVICE_TICKET = 4
async def _get_ticket_owner(self, ticket_id: int) -> dict | None:
"""Fetch the ticket's current owner (MemberReference) or None if unassigned."""
data = await self.client.get(
f"/service/tickets/{ticket_id}",
params={"fields": "id,owner"},
)
if not isinstance(data, dict):
return None
owner_raw = data.get("owner")
return owner_raw if isinstance(owner_raw, dict) and owner_raw.get("id") else None
async def _list_ticket_schedule_entries(self, ticket_id: int) -> list[dict]:
"""List schedule entries for a ticket's co-assignees.
Returns raw CW schedule entry dicts with at least id and member info.
"""
data = await self.client.get(
"/schedule/entries",
params={
"conditions": (
f"type/id={self._SCHEDULE_TYPE_SERVICE_TICKET} AND objectId={ticket_id}"
),
"fields": "id,member,name",
"pageSize": 100,
},
)
return data if isinstance(data, list) else []
async def list_resources(self, ticket_id: int) -> list[PSAResource]:
"""List members assigned to a CW ticket.
Merges the `owner` MemberReference (primary assignee) with schedule entries
of type 4 (Service Ticket resources — co-assignees). Deduped by member id.
"""
owner = await self._get_ticket_owner(ticket_id)
entries = await self._list_ticket_schedule_entries(ticket_id)
members = await self.list_members()
by_id = {str(m.id): m for m in members}
seen_ids: set[str] = set()
results: list[PSAResource] = []
if owner is not None:
owner_id = str(owner.get("id"))
m = by_id.get(owner_id)
if m:
results.append(PSAResource(
member_id=int(m.id),
member_name=m.name,
member_identifier=m.identifier,
))
else:
results.append(PSAResource(
member_id=int(owner.get("id") or 0),
member_name=str(owner.get("name") or ""),
member_identifier=str(owner.get("identifier") or ""),
))
seen_ids.add(owner_id)
for entry in entries:
entry_member = entry.get("member") if isinstance(entry, dict) else None
if not isinstance(entry_member, dict):
continue
mid = str(entry_member.get("id") or "")
if not mid or mid in seen_ids:
continue
m = by_id.get(mid)
if m:
results.append(PSAResource(
member_id=int(m.id),
member_name=m.name,
member_identifier=m.identifier,
))
else:
results.append(PSAResource(
member_id=int(entry_member.get("id") or 0),
member_name=str(entry_member.get("name") or ""),
member_identifier=str(entry_member.get("identifier") or ""),
))
seen_ids.add(mid)
return results
async def add_resource(self, ticket_id: int, member_id: int) -> PSAResource:
"""Assign a member to a CW ticket.
- If the ticket has no owner, set the target as `owner` (CW's canonical
primary assignee field). CW typically mirrors this into the derived
`resources` string automatically.
- If the ticket is already owned by someone else, add the target as a
co-assignee via a schedule entry of type 4 (Service Ticket). The
existing owner is not changed.
- Idempotent when target is already owner or already has a schedule entry.
"""
members = await self.list_members()
target = next((m for m in members if str(m.id) == str(member_id)), None)
if target is None:
raise PSAError(f"Member {member_id} not found")
current_owner = await self._get_ticket_owner(ticket_id)
if current_owner is None:
# Primary assign — set owner
await self.client.patch(
f"/service/tickets/{ticket_id}",
json_body=[{"op": "replace", "path": "owner", "value": {"id": int(target.id)}}],
)
elif str(current_owner.get("id")) != str(target.id):
# Ticket owned by someone else — add as co-assignee via schedule entry.
# Idempotent: skip if a schedule entry already exists for this member.
existing = await self._list_ticket_schedule_entries(ticket_id)
already_assigned = any(
str((e.get("member") or {}).get("id") or "") == str(target.id)
for e in existing
)
if not already_assigned:
await self.client.post(
"/schedule/entries",
json_body={
"member": {"id": int(target.id)},
"objectId": int(ticket_id),
"type": {"id": self._SCHEDULE_TYPE_SERVICE_TICKET},
"name": target.name or target.identifier or f"Member {target.id}",
},
)
# else: already the owner — idempotent no-op
return PSAResource(
member_id=int(target.id),
member_name=target.name,
member_identifier=target.identifier,
)
async def remove_resource(self, ticket_id: int, member_id: int) -> None:
"""Remove a member from a CW ticket (idempotent).
- If the target is the current owner, clear the owner field.
- Otherwise, delete their schedule entry (Service Ticket type).
"""
members = await self.list_members()
target = next((m for m in members if str(m.id) == str(member_id)), None)
if target is None:
return
current_owner = await self._get_ticket_owner(ticket_id)
if current_owner is not None and str(current_owner.get("id")) == str(target.id):
# Unassign the owner. Try RFC 6902 "remove" first; fall back to
# "replace" with null if CW rejects it.
try:
await self.client.patch(
f"/service/tickets/{ticket_id}",
json_body=[{"op": "remove", "path": "owner"}],
)
except PSAError:
await self.client.patch(
f"/service/tickets/{ticket_id}",
json_body=[{"op": "replace", "path": "owner", "value": None}],
)
return
# Not the owner — find and delete the schedule entry for this member.
entries = await self._list_ticket_schedule_entries(ticket_id)
for entry in entries:
entry_member = entry.get("member") if isinstance(entry, dict) else None
if isinstance(entry_member, dict) and str(entry_member.get("id") or "") == str(target.id):
entry_id = entry.get("id")
if entry_id:
await self.client.delete(f"/schedule/entries/{entry_id}")
break
# ── Ticket creation ───────────────────────────────────────────────
async def create_ticket(self, payload: TicketCreatePayload) -> PSACreatedTicket:
"""Create a new CW service ticket."""
body: dict = {
"summary": payload.summary,
"board": {"id": payload.board_id},
"company": {"id": payload.company_id},
"status": {"id": payload.status_id},
"priority": {"id": payload.priority_id},
}
if payload.description:
body["initialDescription"] = payload.description
if payload.assigned_member_id:
body["owner"] = {"id": payload.assigned_member_id}
data = await self.client.post("/service/tickets", json_body=body)
ticket_id = data.get("id") if isinstance(data, dict) else None
resources: list[PSAResource] = []
if ticket_id and payload.assigned_member_id:
try:
resources = await self.list_resources(ticket_id)
except Exception:
pass
company = (data.get("company") or {}) if isinstance(data, dict) else {}
board = (data.get("board") or {}) if isinstance(data, dict) else {}
status = (data.get("status") or {}) if isinstance(data, dict) else {}
priority = (data.get("priority") or {}) if isinstance(data, dict) else {}
return PSACreatedTicket(
id=ticket_id or 0,
summary=data.get("summary", payload.summary) if isinstance(data, dict) else payload.summary,
board_name=board.get("name", ""),
status_name=status.get("name", ""),
priority_name=priority.get("name", ""),
company_name=company.get("name", ""),
resources=resources,
)
# ── Priorities ────────────────────────────────────────────────────
async def list_priorities(self) -> list[dict]:
"""List CW service priorities."""
data = await self.client.get("/service/priorities", params={"pageSize": 50})
return [
{"id": p.get("id"), "name": p.get("name")}
for p in (data if isinstance(data, list) else [])
]

View File

@@ -12,6 +12,10 @@ from app.services.psa.types import (
PSAConfiguration,
PSATimeEntry,
PSABoard,
PaginatedTicketResult,
PSAResource,
PSACreatedTicket,
TicketCreatePayload,
)
@@ -28,7 +32,7 @@ class HaloPSAProvider(PSAProvider):
async def get_ticket(self, ticket_id: str) -> PSATicket:
raise NotImplementedError("Halo PSA integration coming soon")
async def search_tickets(self, query: str, **filters) -> list[PSATicket]:
async def search_tickets(self, query: str, **filters) -> PaginatedTicketResult:
raise NotImplementedError("Halo PSA integration coming soon")
async def post_note(
@@ -74,3 +78,18 @@ class HaloPSAProvider(PSAProvider):
work_type: str | None = None,
) -> PSATimeEntry:
raise NotImplementedError("Halo PSA integration coming soon")
async def list_resources(self, ticket_id: int) -> list[PSAResource]:
raise NotImplementedError("Halo PSA integration coming soon")
async def add_resource(self, ticket_id: int, member_id: int) -> PSAResource:
raise NotImplementedError("Halo PSA integration coming soon")
async def remove_resource(self, ticket_id: int, member_id: int) -> None:
raise NotImplementedError("Halo PSA integration coming soon")
async def create_ticket(self, payload: TicketCreatePayload) -> PSACreatedTicket:
raise NotImplementedError("Halo PSA integration coming soon")
async def list_priorities(self) -> list[dict]:
raise NotImplementedError("Halo PSA integration coming soon")

View File

@@ -73,6 +73,40 @@ class PSABoard(BaseModel):
inactive: bool = False
class PaginatedTicketResult(BaseModel):
items: list[PSATicket]
total: int
page: int
page_size: int
class PSAResource(BaseModel):
member_id: int
member_name: str
member_identifier: str
is_rf_user: bool = False
class PSACreatedTicket(BaseModel):
id: int
summary: str
board_name: str
status_name: str
priority_name: str
company_name: str
resources: list[PSAResource] = []
class TicketCreatePayload(BaseModel):
summary: str
company_id: int
board_id: int
status_id: int
priority_id: int
description: str | None = None
assigned_member_id: int | None = None
class NoteType:
INTERNAL_ANALYSIS = "internal_analysis"
RESOLUTION = "resolution"

View File

@@ -83,6 +83,10 @@ state means the engineer resolved the issue another way; the note should cover \
that actual resolution, not just the failed attempt.
- applied_partial: Note that the fix was partially applied. If partial_notes \
are provided, include them. Then describe the final resolution path taken.
- applied_pending: Note that the fix was applied and verification is pending. \
If pending_reason is provided, include it as the provided waiting reason. \
Frame the resolution as provisional — the fix is in place but not yet \
confirmed. Do not write closure language.
- dismissed: Treat the fix as considered and set aside. Do not center the note \
on it. Describe the resolution based on what was actually confirmed and done.
- proposed (no outcome yet): Write "Resolution not yet applied — fix proposed: \
@@ -322,6 +326,8 @@ class ResolutionNoteGeneratorService:
lines.append(f"Verified at: {active_fix.verified_at.isoformat()}")
if active_fix.partial_notes:
lines.append(f"Partial notes: {active_fix.partial_notes}")
if active_fix.pending_reason:
lines.append(f"Pending reason: {active_fix.pending_reason}")
if active_fix.failure_reason:
lines.append(f"Failure reason: {active_fix.failure_reason}")

View File

@@ -5,6 +5,7 @@ from uuid import UUID
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from app.models.ai_session import AISession
from app.models.session_resolution_output import SessionResolutionOutput
@@ -21,7 +22,9 @@ class ResolutionOutputGenerator:
async def generate_all(self, session_id: UUID) -> list[SessionResolutionOutput]:
result = await self.db.execute(
select(AISession).where(AISession.id == session_id)
select(AISession)
.options(selectinload(AISession.steps))
.where(AISession.id == session_id)
)
session = result.scalar_one_or_none()
if not session:

View File

@@ -360,6 +360,7 @@ async def save_to_library(
category_id: UUID | None,
share_with_team: bool,
user_id: UUID,
account_id: UUID,
team_id: UUID | None,
script_body: str | None = None,
parameters_schema: dict | None = None,
@@ -401,6 +402,7 @@ async def save_to_library(
id=uuid_mod.uuid4(),
category_id=resolved_category_id,
created_by=user_id,
account_id=account_id,
team_id=team_id if share_with_team else None,
name=name,
slug=slug,

View File

@@ -0,0 +1,116 @@
"""Ticket mutation service — wraps PSA provider, resolves is_rf_user flag."""
from __future__ import annotations
import logging
from uuid import UUID
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.models.psa_connection import PsaConnection
from app.models.psa_member_mapping import PsaMemberMapping
from app.schemas.psa_tickets import (
PSAResourceSchema,
PSATicketCreatedSchema,
PSATicketStatusUpdateSchema,
)
from app.services.psa.registry import get_provider_for_account
from app.services.psa.types import TicketCreatePayload
logger = logging.getLogger(__name__)
async def _get_mapped_member_ids(account_id: UUID, db: AsyncSession) -> set[int]:
"""Return set of external_member_id ints that are mapped to RF users."""
conn_result = await db.execute(
select(PsaConnection).where(PsaConnection.account_id == account_id)
)
conn = conn_result.scalar_one_or_none()
if not conn:
return set()
mappings = await db.execute(
select(PsaMemberMapping).where(PsaMemberMapping.psa_connection_id == conn.id)
)
return {int(m.external_member_id) for m in mappings.scalars().all() if m.external_member_id}
async def list_resources(
account_id: UUID, ticket_id: int, db: AsyncSession
) -> list[PSAResourceSchema]:
provider = await get_provider_for_account(account_id, db)
mapped_ids = await _get_mapped_member_ids(account_id, db)
resources = await provider.list_resources(ticket_id)
return [
PSAResourceSchema(
member_id=r.member_id,
member_name=r.member_name,
member_identifier=r.member_identifier,
is_rf_user=r.member_id in mapped_ids,
)
for r in resources
]
async def add_resource(
account_id: UUID, ticket_id: int, member_id: int, db: AsyncSession
) -> PSAResourceSchema:
provider = await get_provider_for_account(account_id, db)
mapped_ids = await _get_mapped_member_ids(account_id, db)
resource = await provider.add_resource(ticket_id, member_id)
return PSAResourceSchema(
member_id=resource.member_id,
member_name=resource.member_name,
member_identifier=resource.member_identifier,
is_rf_user=resource.member_id in mapped_ids,
)
async def remove_resource(
account_id: UUID, ticket_id: int, member_id: int, db: AsyncSession
) -> None:
provider = await get_provider_for_account(account_id, db)
await provider.remove_resource(ticket_id, member_id)
async def update_status(
account_id: UUID, ticket_id: int, status_id: int, db: AsyncSession
) -> PSATicketStatusUpdateSchema:
provider = await get_provider_for_account(account_id, db)
# get current status before updating
ticket = await provider.get_ticket(str(ticket_id))
previous_status = ticket.status_name or ""
await provider.update_ticket_status(str(ticket_id), status_id)
# get new status name from statuses list
statuses = await provider.get_ticket_statuses(ticket.board_id or 0)
new_status = next((s.name for s in statuses if s.id == status_id), str(status_id))
return PSATicketStatusUpdateSchema(
ticket_id=ticket_id,
previous_status=previous_status,
new_status=new_status,
new_status_id=status_id,
)
async def create_ticket(
account_id: UUID, payload: TicketCreatePayload, db: AsyncSession
) -> PSATicketCreatedSchema:
provider = await get_provider_for_account(account_id, db)
mapped_ids = await _get_mapped_member_ids(account_id, db)
result = await provider.create_ticket(payload)
return PSATicketCreatedSchema(
id=result.id,
summary=result.summary,
board_name=result.board_name,
status_name=result.status_name,
priority_name=result.priority_name,
company_name=result.company_name,
resources=[
PSAResourceSchema(
member_id=r.member_id,
member_name=r.member_name,
member_identifier=r.member_identifier,
is_rf_user=r.member_id in mapped_ids,
)
for r in result.resources
],
)

Some files were not shown because too many files have changed in this diff Show More