Commit Graph

1040 Commits

Author SHA1 Message Date
fff8338bf2 docs(ai): track escalation assessment latency follow-up
Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 19:55:31 -04:00
bc15952857 fix(tests): stabilize escalation SSE backend tests
Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 19:47:43 -04:00
ba46fc5644 docs(ai): pause Escalation Mode build mid-SSE for Codex review
Update HANDOFF to reflect:
- Build paused after the WIP SSE commit (87bd0b7)
- What Codex should look at on the SSE bus + endpoint + dispatch wiring
- Resume point post-review: re-run tests with -n auto, then frontend
  SSE subscription, then magic-moment screen
- Test-suite watch-out: per-test DROP SCHEMA fixture means concurrent
  pytest runs on the same DB collide; always one-suite-at-a-time or
  -n auto with conftest's per-worker DB isolation

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:29:16 -04:00
87bd0b7c56 WIP: SSE pub/sub for live escalation arrivals (paused for Codex review)
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.

What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
  pub/sub keyed by account_id. asyncio.Queue per subscriber with
  64-event maxsize and drop-on-full semantics. Designed to be swappable
  for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
  /api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
  require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
  bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
  now publishes a `handoff_created` event to the bus BEFORE the email
  fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
  standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
  (publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
  blocked, ready event handshake).

[gstack-context]
Decisions:
  - SSE over WebSocket (one-way, browser EventSource semantics, fewer
    moving parts behind Railway proxy)
  - In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
  - Drop-on-full subscriber queue rather than back-pressure publishers
  - Bus publish ahead of email send, both wrapped in try/except so
    neither can break handoff creation
  - Frontend will be a fetch-based ReadableStream reader matching the
    existing streamDocumentation pattern, not native EventSource
    (custom-header auth)
Remaining (post-Codex):
  - Frontend SSE subscription in EscalationQueue.tsx (slide-in,
    reconnect, tab-title flash, prefers-reduced-motion)
  - Magic-moment handoff-context screen
  - Re-run the full backend test suite to verify the SSE +
    dispatcher integration tests (bus units already green standalone)
Tried:
  - Running the full test suite repeatedly without xdist; the per-test
    DROP SCHEMA + recreate fixture made wall-clock prohibitive when
    multiple stale runs collided on the same Postgres test schema.
    Resolution: -n auto next time.
[/gstack-context]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:29:07 -04:00
a283d0d3fd docs(ai): refresh handoff state mid-flight on Escalation Mode build
Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:

- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
  this branch, what remains (WebSocket push, magic-moment screen, analytics
  page, e2e), and the two-metric framing readers MUST internalize before
  quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
  notification dual-path); includes suggested first slice + watch-outs
  (no user_id on ai_session_step, denormalized account_id, peer-escalation
  still gated to session owner)
- Both files reference the design doc and the kill-switch criterion

No code change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 16:38:14 -04:00
9f0bfd44f9 feat(escalations): mount time-to-first-action stat-card on /escalations
Surfaces the new GET /analytics/flowpilot/escalations endpoint as a card
above the EscalationQueue list. Closes the loop from yesterday's metric
endpoint commit — seniors and owners see the wedge stat the moment they
open the queue, which is the daily-reps version of the GTM ROI story.

Pieces:
- EscalationMetrics TS interface mirroring the backend Pydantic model
  (incl. metric_definition disclaimer field)
- flowpilotAnalyticsApi.getEscalationMetrics(period) client method
- EscalationMetricCard component:
    * loading skeleton, error state, zero-data empty state
    * avg + median + n_with_action/n_claimed conversion rate
    * humanized seconds → "Ns" / "N.N min" formatting
    * inline disclaimer reminding callers this is in-product time-to-
      first-action only, NOT the savings claim — pair with manual
      baseline (per /codex review's two-metric correction)
- Wired into EscalationQueuePage above EscalationQueue

DS-aligned: card-flat, accent-dim usage held to interactive elements,
text-muted-foreground for secondary copy, font-heading on the headline
number, explicit transition properties (no `transition: all`). Respects
prefers-reduced-motion implicitly (only animation is the loading pulse,
which Tailwind's animate-pulse already gates).

tsc -b clean. No new tests in this commit — component is a thin
state-machine over an axios call; integration coverage comes from the
existing backend tests + the e2e Playwright work in the plan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 16:00:34 -04:00
07d0db9579 feat(handoff): email engineer-or-admin teammates on escalation
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.

HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
  (excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
  via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
  creation is authoritative; notification is advisory. This is the
  graceful-degradation regression both eng + codex reviews flagged as
  critical (handoff must succeed even if SMTP is down).

Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
  handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.

Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
  excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
  service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST

Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.

20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:58:05 -04:00
7a5b853b3b feat(api): role-gate handoff claim to engineer-or-admin
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.

Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate

Existing handoff create + queue tests unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:46:59 -04:00
52f6d0308f feat(analytics): add escalation time-to-first-action metric endpoint
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}

Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.

The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.

Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
  aggregates avg/median/p95 in Python (~1k rows/account/month is well within
  budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
  n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases

Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.

First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:25:46 -04:00
d51e95cdfa docs(plans): add escalation-mode wedge design + test plan
Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI
specs, and embedded review report for the Escalation Mode wedge — output of
/office-hours, /plan-eng-review, /plan-design-review, and /codex review.

Codex review surfaced two corrections we applied:
- two-metric framing (manual baseline vs in-product time-to-first-action)
- claim role gate moved in-scope (was deferred TODO)

TODO updates: peer-tech escalation + claim role gate captured (the latter then
moved in-scope by the codex pass).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 15:18:46 -04:00
c0ed6d9840 Merge pull request 'docs(ai): refresh handoff state after PR #153 merge' (#154) from chore/post-153-handoff into main
All checks were successful
CI / frontend (push) Successful in 5m37s
Mirror to GitHub / mirror (push) Successful in 14s
CI / backend (push) Successful in 10m48s
CI / e2e (push) Successful in 11m0s
Reviewed-on: #154
2026-04-26 05:33:31 +00:00
8f818a7c71 docs(ai): refresh handoff state after PR #153 merge
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m49s
CI / backend (pull_request) Successful in 11m5s
CI / e2e (pull_request) Successful in 11m36s
- CURRENT_TASK rolls forward — PR #153 closed out, no active task,
  with recommended next moves (promote e2e gate to required, pick
  from TODO).
- HANDOFF rewritten — new home position is `main`; documents the
  e2e job's stub ANTHROPIC_API_KEY convention so future
  AI-touching e2e tests know what to expect.
- SESSION_LOG entry extended with the CI env-var diagnosis, the
  fix, the merge, and pointers to the natural next pickups.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 01:14:49 -04:00
68fcdc6122 Merge PR #153: fix(chat): sync currentChatRef when prefill creates a new chat session
All checks were successful
CI / frontend (push) Successful in 5m57s
Mirror to GitHub / mirror (push) Successful in 13s
CI / backend (push) Successful in 10m28s
CI / e2e (push) Successful in 12m0s
Fixes a silent-drop bug where the dashboard prefill flow created a new chat session but didn't update the in-flight guard ref, so subsequent task-lane submissions had their AI follow-up responses discarded.

Includes a Playwright regression test that drives the prefill flow and stubs /ai-sessions/*/chat to verify the second AI turn renders. Also adds a stub ANTHROPIC_API_KEY to the e2e CI job so AI-gated endpoints clear their _require_ai_enabled() check (the chat call itself is intercepted in the browser, so no real Anthropic traffic).
2026-04-26 05:05:54 +00:00
11fe32f4c6 fix(ci): set stub ANTHROPIC_API_KEY for e2e job so AI-gated endpoints respond
All checks were successful
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m39s
CI / backend (pull_request) Successful in 10m24s
CI / e2e (pull_request) Successful in 12m14s
POST /api/v1/ai-sessions and friends call _require_ai_enabled(), which
returns 503 when no provider key is set. The new prefill-handoff
regression test (e2e/assistant-chat-prefill.spec.ts) drives the
dashboard prefill flow, which has to create a chat session before its
page.route stub on /chat can fire — so without a key, session
creation 503s and the test never sees the task lane.

The Playwright stub intercepts /chat in the browser, so the backend
never actually contacts Anthropic — but the AI-enabled gate still
needs to pass. A stub value is enough.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:51:39 -04:00
43eed720d9 docs(ai): close out PR #150, set PR #153 as active task
Some checks failed
Mirror to GitHub / mirror (push) Successful in 13s
CI / frontend (pull_request) Successful in 5m50s
CI / e2e (pull_request) Failing after 6m50s
CI / backend (pull_request) Successful in 10m40s
- CURRENT_TASK.md rolled forward — the CI-recovery task is complete
  (PR #150 merged as 87bb20b; backend gate is in required checks).
  Active task is now landing PR #153.
- HANDOFF.md rewritten — new resume point is watching CI on the
  rebased SHA 1559feb and merging when all three checks are green.
- SESSION_LOG.md gains a 2026-04-26 entry covering the prefill bug
  diagnosis, fix, regression test, and the rebase off post-#150 main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:30:50 -04:00
1559feb759 docs(ai): track currentChatRef silent-swallow follow-up in TODO
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m43s
CI / e2e (pull_request) Failing after 6m40s
CI / backend (pull_request) Has been cancelled
The guard pattern that masked the prefill-ref bug fixed in PR #153 is
applied across handleSend, handleTaskSubmit, selectChat, refreshFacts,
refreshActiveFix, and refreshPreview. Worth either logging the
mismatch path or distinguishing expected-stale from unexpected-stale
so the next instance of this class of bug surfaces instead of hiding.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:24:25 -04:00
b56da2facd fix(chat): sync currentChatRef when prefill creates a new chat session
The dashboard prefill flow in AssistantChatPage set activeChatId after
creating a new session but never updated currentChatRef.current. Every
later handleSend / handleTaskSubmit then tripped the
`currentChatRef.current !== sentForChatId` guard that was supposed to
discard responses for stale chats — and silently dropped the AI's
follow-up. The user saw their submitted message but no assistant
reply, no toast, no task-lane update.

Mirrors what handleNewChat and handleResumeNew already do. Adds an
e2e regression test that drives the dashboard prefill, submits a
partial task-lane response, and asserts the second AI turn renders.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 00:24:02 -04:00
87bb20b8f0 Merge PR #150: fix(ci): consolidated CI recovery — backend green, xdist parallelization, e2e selector + decoupling
All checks were successful
CI / frontend (push) Successful in 5m42s
Mirror to GitHub / mirror (push) Successful in 13s
CI / backend (push) Successful in 10m21s
CI / e2e (push) Successful in 11m5s
2026-04-25 21:57:26 +00:00
1e3a6cfa01 fix(e2e): harden card selectors for session resume
All checks were successful
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 5m43s
CI / backend (pull_request) Successful in 10m21s
CI / e2e (pull_request) Successful in 11m23s
Co-Authored-By: Codex <noreply@openai.com>
2026-04-25 16:42:33 -04:00
ede6eebf9a docs(ai): note e2e decoupling commit (261814a) in HANDOFF
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m43s
CI / e2e (pull_request) Failing after 9m30s
CI / backend (pull_request) Successful in 10m18s
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 16:12:19 -04:00
261814ae65 perf(ci): decouple e2e from frontend — build frontend inline in e2e job
Some checks failed
Mirror to GitHub / mirror (push) Successful in 14s
CI / frontend (pull_request) Successful in 5m44s
CI / e2e (pull_request) Failing after 7m42s
CI / backend (pull_request) Successful in 10m28s
Before: e2e \`needs: [frontend]\` waited for the frontend job to upload
a build artifact, then downloaded it. With multiple runners this means
the third runner sat idle for ~6 min while frontend ran, then started
e2e — total wall-clock max(backend, frontend+e2e) ≈ 11 min.

After: e2e builds its own frontend (npm ci + npm run build are already
in the job; just dropped the artifact download step and added the
build). e2e starts immediately on a free runner. Adds ~1-2 min to the
e2e job duration but removes ~5 min of waiting and eliminates the
cross-job artifact mechanism entirely.

Side benefit: no more \`actions/upload-artifact\` v3/v4 GHES headaches
on the cross-job handoff. The \`if: always()\` upload of the
playwright-report at the end of e2e is kept (failure report retrieval
is still useful), but it's a leaf-output, not a dependency.

Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the
3-runner setup, down from ~11 min.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:59:00 -04:00
6656ebdead docs(ai): reflect PR consolidation — #151/#152 merged into #150
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:55:08 -04:00
69f2a37591 fix(e2e): update 5 selectors that drifted with FlowPilot/PSA UI changes
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Successful in 5m52s
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
Mechanical drift between the e2e selectors and the current UI surfaced
on the first CI run after PR #149 unblocked the artifact upload step.
Five tests, three categories of drift:

1. **Page heading renames** (navigation.spec.ts)
   - `Sessions` → `Session History` on /sessions
   - `Account Settings` → `Account Management` on /account

2. **Route rename** (command-palette.spec.ts:74)
   - The "Troubleshoot with FlowPilot" command palette option now lands
     on /pilot (Phase 1 of the FlowPilot migration renamed /assistant).
     /assistant still 301-redirects, so the assertion accepts either.

3. **Feature moved to /sessions** (history.spec.ts, resume.spec.ts)
   - Default tab on /sessions is "AI Sessions"; flow-session filtering
     and the Resume button moved behind the "Flow Sessions" tab. Both
     tests now click that tab before asserting.
   - resume.spec.ts no longer starts at /trees (Resume buttons aren't
     rendered there anymore — the flow lives on /sessions). Destination
     URL (/trees/:id/navigate) is unchanged.

No product-code changes — these are pure test updates against the
shipped UI. Run the suite locally with
`cd frontend && npm run test:e2e` once a fresh build is available.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:53:57 -04:00
7f714363dd perf(ci): pytest-xdist with per-worker DBs — 22m → ~4m
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).

How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
  'gw0', 'gw1', …). When set, derives a per-worker DB URL like
  `…/resolutionflow_test_gw0`. The base DB stays for serial / master
  runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
  connects to the postgres maintenance DB, and `CREATE DATABASE`s the
  worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
  contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
  worker's isolated DB, no cross-worker race.

CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).

Adds `pytest-xdist==3.6.1` to requirements-dev.txt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:53:47 -04:00
1bd43abb8f fix(ci): drop postgres host port mapping (multi-runner port collision)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / frontend (pull_request) Successful in 6m44s
CI / e2e (pull_request) Failing after 8m43s
CI / backend (pull_request) Has been cancelled
With 3 Gitea Actions runners on the same homelab box, two simultaneous
backend (or backend + e2e) jobs both try to bind 0.0.0.0:5432 for their
postgres service containers. The second fails with:

  failed to set up container networking: ... Bind for 0.0.0.0:5432
  failed: port is already allocated

The host-port mapping isn't actually needed — the workflow uses
\`DATABASE_URL: postgresql+asyncpg://...@postgres:5432/...\` (hostname
\`postgres\` is the service container's docker-network DNS name).
The tests run inside the act container which is on the same docker
network, so they reach postgres without going through the host.

Removing \`ports: 5432:5432\` from both backend and e2e job service
definitions lets multiple postgres services run in parallel on
different docker networks without colliding on the host.

Surfaced when PR #150 ran in parallel with another job after the
multi-runner setup. Backend instant-failed in 2s on the docker run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:28:17 -04:00
c203b70ef9 docs(ai): queue data-testid hardening + reflect PR #152 + 3-runner setup
Some checks failed
CI / backend (pull_request) Failing after 2s
Mirror to GitHub / mirror (push) Successful in 15s
CI / e2e (pull_request) Has been cancelled
CI / frontend (pull_request) Has been cancelled
TODO.md: Promote pytest-xdist to  (PR #151 carries it). Adds three new
backlog items:
- data-testid hardening for e2e-critical interactive elements (sparked
  by PR #152's selector drift work)
- per-test transactional rollback (next big speedup if needed)
- pytest-testmon for PR-time test selection

HANDOFF.md: Three open PRs now (#150, #151, #152), all independent.
Three Gitea runner agents now registered, so jobs run in parallel.
Combined with #151's xdist, the prior 1h 14m wall-clock should drop
to ~6-10 min. Updated merge order: #152 first (smallest), #150 next,
#151 last. After all three land, enable CI / backend then CI / e2e
as required status checks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:26:21 -04:00
f27e3b44b0 docs(ai): SESSION_LOG entry for the parallelization session
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Successful in 32m33s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m58s
(Was meant to land in fe632c9; the multi-line edit failed silently
because Codex's earlier entry shifted the surrounding context.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:15:41 -04:00
fe632c9194 docs(ai): handoff after CI parallelization + final test fix
Some checks failed
Mirror to GitHub / mirror (push) Has been cancelled
CI / backend (pull_request) Successful in 30m26s
CI / frontend (pull_request) Successful in 5m46s
CI / e2e (pull_request) Failing after 5m3s
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect:
- PR #150 now contains the AI-provider test mock + caching + maxfail.
  Backend CI should be fully green for the first time in months.
- PR #151 stacked on #150: pytest-xdist with per-worker DBs. Local
  verification: 22m 27s → 4m 28s (5× speedup), 1076 passed both runs.
- DoD is now: merge #150, then #151, then add CI / backend
  (pull_request) to required status checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:15:07 -04:00
e976fb4e87 fix(ci): mock AI provider in record_decision test + cache pip/npm + drop term-missing
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Successful in 31m8s
CI / frontend (pull_request) Successful in 5m42s
CI / e2e (pull_request) Failing after 4m57s
Three changes that get PR #150 to a green CI gate:

1. **test_record_decision_persists_and_bumps_state_version** — the
   `decision: draft_template` path calls `_extract_template_parameters`
   (TemplateExtractionService → AI provider). CI doesn't set
   ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
   `RuntimeError: No AI provider configured` and returned 500. The test
   isn't exercising the AI integration — patched the extractor with an
   AsyncMock returning a minimal valid `{templated_body, parameters}`
   dict. Verified locally: the test now passes.

2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
   the hash of requirements*.txt / package-lock.json with a runner-os
   restore-key fallback. Saves ~30-60s per run on cache hit.

3. **Pytest invocation tightened**:
   - Dropped `--cov-report=term-missing` — the custom "Display coverage
     summary" step below parses coverage.json and prints the same
     module list more concisely. Term-missing dumps every uncovered
     line which adds ~5-10s of stdout.
   - Added `--maxfail=10` so a structural breakage (fixture explosion,
     DB unreachable) bails after 10 errors instead of running the full
     25-min suite. Tunable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 12:01:05 -04:00
0aefaa78eb docs(ai): queue pytest-xdist parallelization in TODO.md
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / frontend (pull_request) Has been cancelled
CI / e2e (pull_request) Has been cancelled
CI / backend (pull_request) Has been cancelled
Capture the backend pytest parallelization work so it survives session
end. Backend suite is currently ~22 min wall-clock for 1076 tests;
xdist with one-DB-per-worker should land in the 3-6 min range on the
homelab Gitea Actions runner.

Also queues two backlog items:
- Frontend lint warnings (23 react-hooks/exhaustive-deps after PR #149)
- Periodic audit of the ResourceWarning filterwarnings added by Codex

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 11:35:38 -04:00
49f88569da wip(handoff): restore backend suite to green
Some checks failed
Mirror to GitHub / mirror (push) Successful in 12s
CI / backend (pull_request) Failing after 27m35s
CI / frontend (pull_request) Successful in 2m46s
CI / e2e (pull_request) Failing after 4m9s
Co-Authored-By: Codex <noreply@openai.com>
2026-04-25 06:13:23 -04:00
208ec996d5 docs(ai): handoff for Codex — CI recovery + 54 real backend failures
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m15s
CI / frontend (pull_request) Successful in 2m55s
CI / e2e (pull_request) Failing after 4m23s
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md so the next
session has accurate resume state. Summary of where things are:

- PR #141 (PSA tickets), PR #147 (FlowPilot Phase 1-9), PR #148 (CI
  fixes part 1), PR #149 (CI fixes part 2) all merged to main in this
  session.
- Branch protection enabled on main: PR-only, CI / frontend required.
- PR #150 (this branch) is the last CI-config PR — adds
  DATABASE_TEST_URL to the workflow and pins upload-artifact to v3.
- Next session: watch #150's CI, merge if green, add CI / backend to
  required checks, then start on the 54 real backend test failures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:36:54 -04:00
8f7df2c0ef fix(ci): set DATABASE_TEST_URL + downgrade upload-artifact to v3 (Gitea Actions)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 28m29s
CI / frontend (pull_request) Successful in 3m11s
CI / e2e (pull_request) Failing after 4m56s
Two CI-config issues blocking the gate from going green:

1. **Backend tests connect to localhost instead of postgres service.**
   conftest.py reads DATABASE_TEST_URL only — DATABASE_URL is intentionally
   not consulted (per dab740d's test-DB-isolation hardening — running
   pytest with DATABASE_URL set previously dropped the dev DB schema).
   The CI workflow only sets DATABASE_URL, so conftest falls back to its
   localhost default and every fixture-setup fails with
   `OSError: Connect call failed ('127.0.0.1', 5432)` — observed as 638
   errors on the latest main run.

   Add DATABASE_TEST_URL pointing at the postgres service container.
   Same connection string as DATABASE_URL — the test DB and the app DB
   are the same physical postgres in CI; conftest's safety assertion is
   satisfied by the URL containing "test".

2. **Frontend artifact upload fails on Gitea Actions runner.**
   actions/upload-artifact@v4 (and v5) are not supported on Gitea
   Actions / GHES — the runner returns
   `GHESNotSupportedError: ... not currently supported on GHES`. Lint
   itself is now passing (0 errors after PR #149); the job exits 1 only
   because the upload step then fails.

   Pin upload-artifact + download-artifact to v3, the latest version
   compatible with Gitea Actions until they ship v4 support.

After this lands, both backend and frontend CI gates should turn
green — at which point we can also add backend to the required status
checks on main.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 03:28:54 -04:00
f27f671fe6 Merge PR #149: fix(ci): frontend lint to zero errors + test-DB schema fix + dev-deps installable
Some checks failed
CI / backend (push) Failing after 10m26s
CI / frontend (push) Failing after 2m35s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 15s
2026-04-25 07:12:15 +00:00
d6218f2e07 fix(tests): import all models in conftest so create_all sees the full schema
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 11m23s
CI / frontend (pull_request) Failing after 2m41s
CI / e2e (pull_request) Has been skipped
The test_db fixture calls Base.metadata.create_all on a fresh test DB.
That only creates tables for models that have been imported (and thus
registered with Base.metadata) by the time the fixture runs.

app.main imports app.core.database (which gives us Base) but does NOT
eagerly import the model modules — most are pulled in lazily inside
scheduler functions (archive_stale_ai_sessions etc.) and route
modules. At fixture-setup time, only the handful of models touched by
those eager imports are on the metadata, so any test that exercises
PSA, network diagrams, ratings, escalations, etc. fails with
\`UndefinedTableError: relation "X" does not exist\` and a cascade of
500s on every endpoint that queries the missing table.

Adding \`from app import models as _models\` (rather than the bare
\`import app.models\` which would shadow the \`app\` FastAPI instance
imported just above) pulls in app/models/__init__.py, which itself
imports every model module — registering all ~60 tables with
Base.metadata before create_all runs.

Verified locally: tests/test_psa_writeback_phase4.py went from
1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were
masking the actual passes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:49:06 -04:00
920a246d77 fix(react): remove four setState-in-effect cascades flagged by react-hooks v5
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 11m23s
CI / frontend (pull_request) Failing after 2m42s
CI / e2e (pull_request) Has been skipped
The new react-hooks lint rule "Calling setState synchronously within an
effect can trigger cascading renders" flagged real anti-patterns in
four spots. Refactored each per the rule's intent (derive during render,
or use useSyncExternalStore for external subscriptions).

1. hooks/useMediaQuery.ts — replaced the useState + useEffect pair with
   useSyncExternalStore. That's the canonical React hook for
   subscribing to external stores (matchMedia in this case) without
   mirroring into local state via an effect. Snapshot/getServerSnapshot
   pair preserves the SSR-safe behaviour.

2. components/network/nodes/DeviceNode.tsx — the prop-sync useEffect
   that copied nodeData.label into labelValue was redundant.
   labelValue is the EDIT BUFFER; while not editing, the displayed
   span now reads nodeData.label directly. The buffer is initialized
   only when an edit session starts (onDoubleClick).

3. components/network/nodes/GroupNode.tsx — same pattern, same fix.

4. components/dashboard/TicketQueue.tsx — the
   setTickets([]) + setLoading(true) + fetchTickets() chain in the
   effect was the cascade. Pushed those writes inside fetchTickets
   (after the function boundary, so they batch with the eventual
   setTickets(result)). Added a request-id ref so a slow first
   response can't overwrite a fast second one.

Frontend lint: 20 errors → 0 errors. tsc -b clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:33:13 -04:00
b7f8e70be2 fix(lint): replace explicit-any types + unused-expressions ternaries
Five files, all stylistic:

- useFlowPilotSession.ts: typed the axios error shape with a narrow
  inline type instead of \`as any\`.
- FlowPilotSessionPage.tsx: same — typed location.state once, then
  destructured.
- ScriptBuilderTab.tsx: handleViewScript was a placeholder no-op;
  declared the args properly with \`void script; void filename\` so the
  signature matches ScriptBuilderChatProps without no-unused-vars
  firing.
- TicketsPage.tsx: replaced 8 ternaries-as-statements (\`x ? f() : g()\`)
  with proper if/else blocks. Same control flow, satisfies
  no-unused-expressions, and reads better in the URL-param update paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:57 -04:00
857d73e3d0 fix(lint): move AssistantSessionRedirect out of router.tsx (react-refresh gate)
react-refresh/only-export-components fires when a file with the
\`router\` const export also defines a component (the redirect helper).
Moves the small helper to its own file under components/routing/ so
HMR can keep the route-component module hot-reload-eligible.

No behavior change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:50 -04:00
406ee0ef97 fix(deps): bump pytest 7.4 → 8.4, pytest-cov 4.1 → 5.0 to satisfy pytest-asyncio 0.24
pytest-asyncio==0.24.0 (added on the FlowPilot branch as part of the
RLS test infra refactor) declares pytest>=8.2 — but requirements-dev.txt
still pinned pytest==7.4.3, so a clean pip install fails with
ResolutionImpossible. CI runners that started from a fresh image would
have refused to install dev deps; the FlowPilot tests passed locally
only because the dev container had a pre-installed pytest 8.x lying
around.

pytest-cov 4.1.0 also needs >= 5.0 to play nicely with pytest 8.

No code changes — pytest 8 is API-compatible with the existing test
suite once the install resolves.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:32:43 -04:00
32fae2c693 Merge PR #147: feat: FlowPilot migration — Phase 1-9 + Phase 9 bug fixes + QA fixture harness
Some checks failed
CI / backend (push) Failing after 36s
CI / frontend (push) Failing after 1m11s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 11s
2026-04-25 06:02:14 +00:00
a45915fbbc Merge main into feat/flowpilot-migration (PR #148 backports)
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 37s
CI / frontend (pull_request) Failing after 1m11s
CI / e2e (pull_request) Has been skipped
Brings PR #148 — two pre-existing CI fixes (network_diagrams JSONB
server_default, removed deprecated session-scoped event_loop fixture).

The conftest.py event_loop fix on main is already incorporated in
FlowPilot's b14a16a (RLS-gating commit, which dropped the same fixture
as part of its larger refactor). Kept HEAD's version of the RLS-gating
collection hook; the event_loop fixture removal is identical.

The network_diagram.py fix lands cleanly via auto-merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 02:01:46 -04:00
06593a40d9 Merge PR #148: fix(tests): repair two pre-existing bugs blocking backend CI
Some checks failed
CI / backend (push) Has been cancelled
CI / frontend (push) Has been cancelled
CI / e2e (push) Has been cancelled
Mirror to GitHub / mirror (push) Has been cancelled
2026-04-25 06:01:08 +00:00
9737d90f1b fix(tests): repair two pre-existing bugs blocking the backend CI gate
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 19m36s
CI / frontend (pull_request) Failing after 1m8s
CI / e2e (pull_request) Has been skipped
1. backend/app/models/network_diagram.py — `nodes` and `edges` columns
   used `server_default="'[]'"` (a Python string), which SQLAlchemy
   wraps in single quotes when generating DDL, producing
   `JSONB DEFAULT '''[]'''` — invalid JSON. Switch to
   `server_default=text("'[]'::jsonb")` so the literal is passed through
   and the table can actually be created. Surfaced on every CI run as
   `asyncpg.exceptions.InvalidTextRepresentationError: invalid input
   syntax for type json` at fixture setup time, cascading hundreds of
   test errors.

2. backend/tests/conftest.py — drop the deprecated session-scoped
   `event_loop` fixture. Since pytest-asyncio 0.23+, the plugin manages
   the loop itself; redefining it with a session scope but never
   `set_event_loop()`-ing it left the loop dangling, so any test that
   called `asyncio.run()` (e.g. `test_tasks_are_isolated`) closed the
   process loop and broke the next async test in the module —
   `test_require_tenant_context_raises_403_when_no_account` was the
   visible casualty in the CI logs.

Verified locally:
- `pytest tests/test_uploads.py::test_upload_success` — was setup-error
  on `network_diagrams` DDL; now passes.
- `pytest tests/test_tenant_context.py` — was 1 fail / 3 pass; now 4/4.

Both are real bugs, not test infrastructure churn. Pre-existing on
main; not introduced here.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 01:49:50 -04:00
1c904373f8 Merge main into feat/flowpilot-migration
Some checks failed
Mirror to GitHub / mirror (push) Successful in 11s
CI / backend (pull_request) Failing after 36s
CI / frontend (pull_request) Failing after 1m7s
CI / e2e (pull_request) Has been skipped
Brings in PR #141 (PSA ticket management) so FlowPilot can ship on top
of a unified main. Two manual conflict resolutions:

1. CLAUDE.md — kept the FlowPilot ai-handoff rewrite (`.ai/`-driven
   protocol). The pre-rewrite reference content (CW integration notes,
   lessons archive, env vars table) lives in `docs/connectwise/`,
   `docs/LESSONS-ARCHIVE.md`, and DEV-ENV.md by design.

2. frontend/src/pages/AssistantChatPage.tsx — both conflict regions
   were purely additive. Concatenated FlowPilot's Phase 2-9 state hooks
   (facts, activeFix, preview*, scriptPanelOpen, templatizeQueue) with
   PSA's spin-off ticket state (linkedTicket, showNewTicket, spinOffHint).
   Both modal mounts (TemplatizePrompt, ShortcutsHelpOverlay,
   NewTicketModal) kept. All setters wired by either branch are intact.

Verification:
- `tsc -b` clean across the merged tree.
- Browser smoke-test (Session B fixture): Phase 9 ProposalBanner
  ("Run AI-drafted PowerShell to recover SSL VPN") renders alongside
  PSA's new Tickets sidebar icon. Console clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 01:03:33 -04:00
16060d2235 Merge PR #141: feat: PSA ticket management — /tickets page, detail panel, AI ticket creation
Some checks failed
CI / backend (push) Failing after 19m11s
CI / frontend (push) Failing after 1m19s
CI / e2e (push) Has been skipped
Mirror to GitHub / mirror (push) Successful in 11s
2026-04-25 04:59:02 +00:00
9330ce4782 fix(pilot): two Phase 9 layout/state bugs surfaced by QA fixtures
All checks were successful
Mirror to GitHub / mirror (push) Successful in 11s
1. EscalateInterceptDialog clipped off-screen.
   The dialog was positioned with `absolute bottom-full mb-2 left-0`
   under the assumption the Escalate button would have room above it.
   In practice the button lives in the chat-page action bar near
   y≈105, so the 302 px dialog overflows the top of the viewport
   and only the last option is visible.

   Switch to `top-full mt-2 right-0` — anchors the dialog below the
   button and aligns its right edge with the button (avoids overflow
   off the right when the button is in the right-side action cluster).

2. TemplateMatchPanel never renders on a fresh session.
   `handleApplyFix` for the script_template_id branch only sets
   `scriptPanelOpen=true`, but TemplateMatchPanel is mounted inside
   `TaskLane.bottomSlot`. On sessions with no questions/facts the
   lane defaults closed, so the panel exists in the React tree but
   inside an unrendered TaskLane — the user clicks Apply fix and
   nothing visibly changes.

   Fix: also `setShowTaskLane(true)` in that branch so the lane
   opens alongside the panel. The ai_drafted_script branch is fine
   (InlineNoTemplateDialog renders in the chat region, not in the
   lane), so it's left alone.

Both bugs were latent — they only surface on sessions that haven't
accumulated TaskLane state yet (questions/facts). Fresh sessions
created from the StartSessionInput hide them because the AI's first
turn populates questions and the lane auto-opens. Caught using the
new seed_phase9_qa_fixtures.py harness.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:08:50 -04:00
d68131a865 feat(seed): Phase 9 QA fixture seeder
Adds backend/scripts/seed_phase9_qa_fixtures.py — creates 4 ai_sessions
plus matching session_suggested_fixes that pre-bake the four backend
states the AI orchestrator must produce to mount the five conditional
Phase 9 components:

  A. no template, no draft     → ChatTabStrip + ScriptBuilderTab
  B. ai_drafted_script set      → InlineNoTemplateDialog
  C. script_template_id set     → TemplateMatchPanel
  D. applied_at + status=proposed → EscalateInterceptDialog (verify state)

Background: a Phase 9 QA pass against a regular session left these
five components unreached because the AI didn't emit SUGGEST_FIX in
time/at all. Seeding directly bypasses the AI and lets QA exercise
each surface deterministically.

UUIDs are deterministic (uuid5 over a fixed namespace) so re-runs
upsert. Pass --reset to wipe and recreate. Each session gets two
synthetic conversation messages so the chat header's canAct gate
(messages.length >= 2) opens up Resolve/Escalate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 00:08:38 -04:00
875bd924a9 fix(pilot): auto-scroll Resolve preview into view when opened
The ResolutionNotePreview popover renders inside TaskLane's
overflow-y-auto region at the bottom of the lane. On a 720px
viewport with the default question/check list expanded, the
popover lands below the visible scroll position — the engineer
clicks "Preview Resolve note", sees the button label flip to
"Showing", but no preview appears on screen.

Add a useEffect that calls scrollIntoView({block: 'nearest'}) on
the popover's outer div whenever `open` flips to true. block:
'nearest' scrolls just enough to make it visible without yanking
the lane to the top.

Discovered during Phase 9 QA. Reproduced at 1280x720; fix verified
visually in the same QA run (screenshots in
.gstack/qa-reports/phase9-*/).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:45:52 -04:00
49c6c8fd00 fix(seed): include cancel_at_period_end in test-user subscription INSERT
Discovered during Phase 9 QA: seed_test_users.py was missing the
cancel_at_period_end column in its subscriptions INSERT, but the
column is NOT NULL (added in 016_add_subscription_tables.py).
Result: seed crashed with NotNullViolationError before any users
were created, blocking auth in fresh dev environments.

Pre-existing on main; not introduced by the FlowPilot migration
branch. Default value: false.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:36:04 -04:00
a77e8ea578 chore: bootstrap gstack team mode
Per gstack team-mode install: adds a PreToolUse hook that blocks
skill usage when gstack isn't installed globally, so contributors
are prompted to install it. Un-ignores the two required files
(.claude/settings.json, .claude/hooks/check-gstack.sh) while
keeping settings.local.json and other Claude state ignored.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 23:17:06 -04:00