Updates the handoff trio after the legacy notification flow fix and
the branch push. PR #155 is open against main as draft. Resume point
is now visual QA via /qa, then deferred follow-ups (chat-input
suggested-step chips, snapshot expansion). Logs the open question
about whether EscalateModal should switch to /handoff.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two backend changes that unbreak the senior-pickup path from the
notification panel:
1. notification_service: session.escalated link template now ends with
?pickup=true so the senior lands in the handoff/pickup flow on
click. Without it, navigation hit /pilot/:id directly, which then
404'd on the GET because the senior isn't yet escalated_to_id —
the user perceives this as the bell-icon "just clearing the
notification".
2. ai_sessions GET access: any account member can now read an escalated
session's detail when status is requesting_escalation or escalated.
The owner-only guard was overly restrictive for explicitly-shared
in-transit states. Tenant boundary is enforced by RLS on the
underlying query, so account-scope is the right ceiling here. After
pickup, the existing handler/escalated_to_id checks still apply.
Verified live: re-login as the senior engineer and GET the active
escalated session — now returns 200 with full detail. Focused test
subset plus tests/test_sessions.py and tests/test_session_sharing.py
→ 94 passed in 43.26s, no regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marks the magic-moment handoff-context screen as shipped, points the
next session at visual QA + push + draft PR, and captures the deferred
follow-ups (suggested-step chips, snapshot expansion, toolbar button
on revisits, owner analytics, Playwright e2e).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the dedicated 4-section handoff-context view that renders BEFORE
the FlowPilot session for senior techs picking up an escalated
session, then dissolves on "Start here". This is the wedge's
demonstrable magic moment — what the GTM Loom records.
- HandoffContextScreen.tsx: pure presentational, takes a HandoffResponse
plus onStartHere / onDismiss callbacks. Sections: header
(problem summary, domain, step count, escalated-time, priority badge),
"What's been tried" (engineer notes + step-count affordance), "AI
assessment" (likely_cause / suggested_steps / confidence badge), Start
here CTA. Confidence badge accepts both numeric (0..1) and string
("low"/"medium"/"high") shapes — backend currently emits the latter.
Renders an explicit "assessment unavailable" branch when
ai_assessment_data is null (the 5s timeout from 9bdd995 fired).
Honors prefers-reduced-motion (animate-fade-in vs animate-slide-up).
ARIA dialog + focus on the primary CTA. Esc dismisses when used as a
re-openable overlay; pre-claim, Start here is the only exit.
- FlowPilotSessionPage.tsx: on /pilot/:id?pickup=true, fetch the
handoff list via handoffsApi.listHandoffs (account-scoped via RLS,
no claim required) and find the latest unclaimed escalate handoff.
If found, render the magic-moment screen and skip the regular
loadSession (the senior isn't yet escalated_to_id, so GET would
404). Start here calls claimHandoff, drops the pickup query param,
dismisses the screen — the existing loadSession effect then fires
because the senior is now escalated_to_id. A "Context" toolbar
button on active sessions re-opens the screen as a dismissible
overlay (visible only when the senior arrived via the magic-moment
flow this session — handoff lookup on demand).
Verified end-to-end against the running dev stack: listHandoffs
returns the unclaimed handoff with full payload; claim flips session
status from escalated → active; subsequent GET succeeds. tsc -b clean.
Defers (TODO followups): suggested-step chips below the chat input
that prefill on click (requires threading through to
FlowPilotMessageBar); snapshot expansion to include the recent
diagnostic steps pre-claim; toolbar Context button on sessions where
the senior didn't arrive via magic-moment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Marks the SSE subscription as shipped, points the next-session resume
target at the magic-moment handoff-context screen, and logs the live
end-to-end verification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the frontend live-arrival slice on top of the test-stabilized SSE
backend. Senior techs now see a junior's escalation slide into the
queue without refresh.
- streamEscalations(handlers, signal) in aiSessions.ts: fetch-based
ReadableStream parser (native EventSource cannot send auth headers).
Handles SSE frames, partial frames across chunks, : keepalive
heartbeats. Dispatches ready and handoff_created.
- HandoffCreatedEvent + EscalationStreamHandlers types mirror the bus
payload published by HandoffManager.dispatch_escalation_notifications.
- EscalationQueue.tsx: AbortController-managed subscription with
exponential-backoff reconnect (1s → 30s cap, attempt counter resets
on ready). On handoff_created, refetch and diff against previous IDs
via sessionsRef; new arrivals prepended (newest-first) above
established cards (oldest-first preserved). Slide-in tag held for
800ms so the locked 200ms animation completes. Tab-title flash
prefixes (N) while document.hidden, restores on focus / unmount.
prefers-reduced-motion swaps slide-in for fade-in. ARIA region +
aria-live=polite + aria-label on heading. Pick Up bumped to py-2.5
to clear the 44px touch floor.
Verified end-to-end against the running dev stack: subscriber received
the ready frame on connect; after posting a handoff via the API, the
subscriber received the handoff_created frame with the expected
payload — wire format matches the parser. Backend regression: focused
subset still 32 passed in 18.91s. Frontend tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Default Claude Code model is being switched from Opus 4.7 1M-context to
Opus 4.7 (200k). Tighten the per-session pickup docs so they're
self-sufficient under the smaller window:
- CURRENT_TASK now reflects the post-Codex state: 8 commits on the
branch (5 feat + WIP SSE + 2 Codex test/latency fixes + 1 doc
refresh), 32/32 backend tests with -n auto, frontend tsc -b clean.
Remaining work re-scoped: the SSE backend half is feature-complete
and tested, so what's left is the FRONTEND SSE subscription in
EscalationQueue.tsx, then the magic-moment handoff-context screen,
then push + draft PR.
- Session log gets a Claude Code entry covering today's planning →
build → pause-for-Codex arc, the design decisions locked into the
doc and code, the two TODOs added (peer-tech escalation, mobile
responsive), and the model-switch context for the next session.
- HANDOFF.md needs no change — Codex's update in 9bdd995 already
describes the resume point and watch-outs cleanly.
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Update HANDOFF to reflect:
- Build paused after the WIP SSE commit (87bd0b7)
- What Codex should look at on the SSE bus + endpoint + dispatch wiring
- Resume point post-review: re-run tests with -n auto, then frontend
SSE subscription, then magic-moment screen
- Test-suite watch-out: per-test DROP SCHEMA fixture means concurrent
pytest runs on the same DB collide; always one-suite-at-a-time or
-n auto with conftest's per-worker DB isolation
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the WebSocket/SSE push slice. Paused mid-flight to hand
the branch to Codex for outside-voice review before stacking more
commits on top. See .ai/HANDOFF.md for the full pause context + what
to look at.
What's here:
- backend/app/core/escalation_bus.py — module-level singleton in-memory
pub/sub keyed by account_id. asyncio.Queue per subscriber with
64-event maxsize and drop-on-full semantics. Designed to be swappable
for Redis pub/sub when Railway scales past single-replica.
- backend/app/api/endpoints/session_handoffs.py — GET
/api/v1/ai-sessions/escalations/stream SSE endpoint. Auth via
require_engineer_or_admin. 25s heartbeat. Account-scoped subscribe
bound to current_user.account_id.
- backend/app/services/handoff_manager.py — dispatch_escalation_notifications
now publishes a `handoff_created` event to the bus BEFORE the email
fan-out, in a try/except so a bus failure can't block email delivery.
- backend/tests/test_escalation_bus.py — 7 unit tests, all green
standalone (0.14s). Cross-tenant isolation, drop-on-full, no-subscribers.
- backend/tests/test_handoff_manager.py — +1 dispatcher integration test
(publishes to bus, payload shape).
- backend/tests/test_session_handoffs_api.py — +2 endpoint tests (viewer
blocked, ready event handshake).
[gstack-context]
Decisions:
- SSE over WebSocket (one-way, browser EventSource semantics, fewer
moving parts behind Railway proxy)
- In-memory bus over Redis for v1 pilot (3 MSPs, single replica)
- Drop-on-full subscriber queue rather than back-pressure publishers
- Bus publish ahead of email send, both wrapped in try/except so
neither can break handoff creation
- Frontend will be a fetch-based ReadableStream reader matching the
existing streamDocumentation pattern, not native EventSource
(custom-header auth)
Remaining (post-Codex):
- Frontend SSE subscription in EscalationQueue.tsx (slide-in,
reconnect, tab-title flash, prefers-reduced-motion)
- Magic-moment handoff-context screen
- Re-run the full backend test suite to verify the SSE +
dispatcher integration tests (bus units already green standalone)
Tried:
- Running the full test suite repeatedly without xdist; the per-test
DROP SCHEMA + recreate fixture made wall-clock prohibitive when
multiple stale runs collided on the same Postgres test schema.
Resolution: -n auto next time.
[/gstack-context]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Capture the in-flight state of the Escalation Mode wedge build so the next
session (or Codex resume) picks up cleanly without re-deriving context:
- CURRENT_TASK now describes the wedge, what's done across the 5 commits on
this branch, what remains (WebSocket push, magic-moment screen, analytics
page, e2e), and the two-metric framing readers MUST internalize before
quoting numbers
- HANDOFF resume point is the WebSocket/SSE push (live-arrival half of the
notification dual-path); includes suggested first slice + watch-outs
(no user_id on ai_session_step, denormalized account_id, peer-escalation
still gated to session owner)
- Both files reference the design doc and the kill-switch criterion
No code change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Surfaces the new GET /analytics/flowpilot/escalations endpoint as a card
above the EscalationQueue list. Closes the loop from yesterday's metric
endpoint commit — seniors and owners see the wedge stat the moment they
open the queue, which is the daily-reps version of the GTM ROI story.
Pieces:
- EscalationMetrics TS interface mirroring the backend Pydantic model
(incl. metric_definition disclaimer field)
- flowpilotAnalyticsApi.getEscalationMetrics(period) client method
- EscalationMetricCard component:
* loading skeleton, error state, zero-data empty state
* avg + median + n_with_action/n_claimed conversion rate
* humanized seconds → "Ns" / "N.N min" formatting
* inline disclaimer reminding callers this is in-product time-to-
first-action only, NOT the savings claim — pair with manual
baseline (per /codex review's two-metric correction)
- Wired into EscalationQueuePage above EscalationQueue
DS-aligned: card-flat, accent-dim usage held to interactive elements,
text-muted-foreground for secondary copy, font-heading on the headline
number, explicit transition properties (no `transition: all`). Respects
prefers-reduced-motion implicitly (only animation is the loading pulse,
which Tailwind's animate-pulse already gates).
tsc -b clean. No new tests in this commit — component is a thin
state-machine over an axios call; integration coverage comes from the
existing backend tests + the e2e Playwright work in the plan.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
First half of the Escalation Mode notification dual-path. WebSocket/SSE
push is the second half (next commit) — email handles offline seniors,
push handles online ones for the magic-moment demo.
HandoffManager.dispatch_escalation_notifications:
- Pulls active engineer/admin/owner-role users in the same account_id
(excludes the escalator + viewers + soft-deleted)
- Sends via existing EmailService.send_notification_email, concurrent
via asyncio.gather; per-message failures don't block the rest
- Wrapped in try/except: any exception is logged + swallowed. Handoff
creation is authoritative; notification is advisory. This is the
graceful-degradation regression both eng + codex reviews flagged as
critical (handoff must succeed even if SMTP is down).
Endpoint wiring (POST /ai-sessions/{id}/handoff):
- Dispatch fires AFTER db.commit() — never email about a rolled-back
handoff. Trust-erosion bug if we got that wrong.
- Only fires for intent=escalate. Park is private to the escalator.
Tests (4 new):
- emails-engineer-recipients-in-account: viewer excluded, escalator
excluded, only the engineer/admin teammates get the message
- skipped-for-park-intent: park doesn't fan out
- graceful-degradation-when-email-raises: RuntimeError from the email
service does NOT bubble out of dispatch
- endpoint-dispatches-on-escalate: end-to-end wiring through POST
Per-channel delivery records (replacing the dead `notification_sent`
boolean per Codex correction) is a v1.x story — for now application
logs are the audit trail. See
docs/plans/2026-04-27-escalation-mode-wedge-design.md.
20 tests green across handoff_manager + session_handoffs_api +
flowpilot_analytics_escalations. No regressions.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
POST /ai-sessions/{id}/handoffs/{hid}/claim previously required only an
authenticated user, so a viewer-role account user could claim escalations.
Codex review flagged this as wedge-relevant: the Escalation Mode race-
condition story (two seniors clicking Pick Up simultaneously) depends on
auth gating for audit integrity. Originally captured as a deferred TODO
during /plan-eng-review, then moved in-scope by /codex review.
Swap the dep to require_engineer_or_admin. One-line change. Two new tests:
- viewer_role gets 403 with "Engineer or admin access required"
- engineer/owner role still succeeds and claimed_at + claimed_by populate
Existing handoff create + queue tests unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
GET /api/v1/analytics/flowpilot/escalations?period={7d,30d,90d}
Computes the in-product wedge metric for Escalation Mode: average / median /
p95 seconds between SessionHandoff.claimed_at and the first ai_session_step
created on the same session after that timestamp. Account-scoped, role-gated
to engineer-or-admin.
The metric is intentionally NOT called "minutes recovered" — that's the
two-metric framing locked by /codex review: this in-product number must be
paired with manual baseline (the verbal-handoff stopwatch from The Assignment)
to produce the savings claim. Schema's `metric_definition` field surfaces the
disclaimer in every response so callers don't oversell it.
Implementation notes:
- Uses correlated scalar subquery for first-step-after-claim per handoff,
aggregates avg/median/p95 in Python (~1k rows/account/month is well within
budget; cleaner than percentile_cont gymnastics in SQL)
- Excludes unclaimed handoffs (claimed_at IS NULL)
- Counts claimed-but-no-action handoffs in n_handoffs_claimed but not in
n_handoffs_with_action — surfaces the conversion-rate signal
- Floors negative deltas at 0 to handle clock-drift edge cases
Tests cover happy path, zero-data, claimed-but-no-action accounting, period
window filtering, multi-handoff aggregation, multi-tenant isolation (Phase 4
RLS landmine pattern), viewer-role 403 gate, and period validation. 9 tests,
all green. No regressions in existing handoff_manager / session_handoffs
suites.
First piece of the Approach A wedge build per
docs/plans/2026-04-27-escalation-mode-wedge-design.md. Unblocks the queue
stat-card and the analytics page.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the GTM thesis, premises, reduced-scope engineering plan, locked UI
specs, and embedded review report for the Escalation Mode wedge — output of
/office-hours, /plan-eng-review, /plan-design-review, and /codex review.
Codex review surfaced two corrections we applied:
- two-metric framing (manual baseline vs in-product time-to-first-action)
- claim role gate moved in-scope (was deferred TODO)
TODO updates: peer-tech escalation + claim role gate captured (the latter then
moved in-scope by the codex pass).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CURRENT_TASK rolls forward — PR #153 closed out, no active task,
with recommended next moves (promote e2e gate to required, pick
from TODO).
- HANDOFF rewritten — new home position is `main`; documents the
e2e job's stub ANTHROPIC_API_KEY convention so future
AI-touching e2e tests know what to expect.
- SESSION_LOG entry extended with the CI env-var diagnosis, the
fix, the merge, and pointers to the natural next pickups.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes a silent-drop bug where the dashboard prefill flow created a new chat session but didn't update the in-flight guard ref, so subsequent task-lane submissions had their AI follow-up responses discarded.
Includes a Playwright regression test that drives the prefill flow and stubs /ai-sessions/*/chat to verify the second AI turn renders. Also adds a stub ANTHROPIC_API_KEY to the e2e CI job so AI-gated endpoints clear their _require_ai_enabled() check (the chat call itself is intercepted in the browser, so no real Anthropic traffic).
POST /api/v1/ai-sessions and friends call _require_ai_enabled(), which
returns 503 when no provider key is set. The new prefill-handoff
regression test (e2e/assistant-chat-prefill.spec.ts) drives the
dashboard prefill flow, which has to create a chat session before its
page.route stub on /chat can fire — so without a key, session
creation 503s and the test never sees the task lane.
The Playwright stub intercepts /chat in the browser, so the backend
never actually contacts Anthropic — but the AI-enabled gate still
needs to pass. A stub value is enough.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CURRENT_TASK.md rolled forward — the CI-recovery task is complete
(PR #150 merged as 87bb20b; backend gate is in required checks).
Active task is now landing PR #153.
- HANDOFF.md rewritten — new resume point is watching CI on the
rebased SHA 1559feb and merging when all three checks are green.
- SESSION_LOG.md gains a 2026-04-26 entry covering the prefill bug
diagnosis, fix, regression test, and the rebase off post-#150 main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The guard pattern that masked the prefill-ref bug fixed in PR #153 is
applied across handleSend, handleTaskSubmit, selectChat, refreshFacts,
refreshActiveFix, and refreshPreview. Worth either logging the
mismatch path or distinguishing expected-stale from unexpected-stale
so the next instance of this class of bug surfaces instead of hiding.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The dashboard prefill flow in AssistantChatPage set activeChatId after
creating a new session but never updated currentChatRef.current. Every
later handleSend / handleTaskSubmit then tripped the
`currentChatRef.current !== sentForChatId` guard that was supposed to
discard responses for stale chats — and silently dropped the AI's
follow-up. The user saw their submitted message but no assistant
reply, no toast, no task-lane update.
Mirrors what handleNewChat and handleResumeNew already do. Adds an
e2e regression test that drives the dashboard prefill, submits a
partial task-lane response, and asserts the second AI turn renders.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Before: e2e \`needs: [frontend]\` waited for the frontend job to upload
a build artifact, then downloaded it. With multiple runners this means
the third runner sat idle for ~6 min while frontend ran, then started
e2e — total wall-clock max(backend, frontend+e2e) ≈ 11 min.
After: e2e builds its own frontend (npm ci + npm run build are already
in the job; just dropped the artifact download step and added the
build). e2e starts immediately on a free runner. Adds ~1-2 min to the
e2e job duration but removes ~5 min of waiting and eliminates the
cross-job artifact mechanism entirely.
Side benefit: no more \`actions/upload-artifact\` v3/v4 GHES headaches
on the cross-job handoff. The \`if: always()\` upload of the
playwright-report at the end of e2e is kept (failure report retrieval
is still useful), but it's a leaf-output, not a dependency.
Net wall-clock: max(backend=9m, frontend=6m, e2e=7m) ≈ 9 min on the
3-runner setup, down from ~11 min.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mechanical drift between the e2e selectors and the current UI surfaced
on the first CI run after PR #149 unblocked the artifact upload step.
Five tests, three categories of drift:
1. **Page heading renames** (navigation.spec.ts)
- `Sessions` → `Session History` on /sessions
- `Account Settings` → `Account Management` on /account
2. **Route rename** (command-palette.spec.ts:74)
- The "Troubleshoot with FlowPilot" command palette option now lands
on /pilot (Phase 1 of the FlowPilot migration renamed /assistant).
/assistant still 301-redirects, so the assertion accepts either.
3. **Feature moved to /sessions** (history.spec.ts, resume.spec.ts)
- Default tab on /sessions is "AI Sessions"; flow-session filtering
and the Resume button moved behind the "Flow Sessions" tab. Both
tests now click that tab before asserting.
- resume.spec.ts no longer starts at /trees (Resume buttons aren't
rendered there anymore — the flow lives on /sessions). Destination
URL (/trees/:id/navigate) is unchanged.
No product-code changes — these are pure test updates against the
shipped UI. Run the suite locally with
`cd frontend && npm run test:e2e` once a fresh build is available.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend suite is the slow gate (1076 passed locally in 22m27s on
fix/ci-workflow-config). Adding pytest-xdist with per-worker DB
isolation drops it to ~4m20s on the 8-core homelab runner. Verified
locally: `pytest -n auto --no-cov` finished in 4m28s real time
(15m19s user — confirms ~5× parallelism).
How it works:
- conftest.py reads `PYTEST_XDIST_WORKER` (set per worker by xdist —
'gw0', 'gw1', …). When set, derives a per-worker DB URL like
`…/resolutionflow_test_gw0`. The base DB stays for serial / master
runs.
- `_ensure_worker_db_exists` runs synchronously at conftest import,
connects to the postgres maintenance DB, and `CREATE DATABASE`s the
worker-suffixed DB if it doesn't exist. Idempotent across runs.
- The "test" safety guard still applies — every worker DB name
contains "test" so the assertion holds.
- The per-test `DROP SCHEMA public CASCADE` now operates on the
worker's isolated DB, no cross-worker race.
CI workflow: backend job switches to `pytest -n auto`. Coverage still
collected (pytest-cov has built-in xdist support).
Adds `pytest-xdist==3.6.1` to requirements-dev.txt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
With 3 Gitea Actions runners on the same homelab box, two simultaneous
backend (or backend + e2e) jobs both try to bind 0.0.0.0:5432 for their
postgres service containers. The second fails with:
failed to set up container networking: ... Bind for 0.0.0.0:5432
failed: port is already allocated
The host-port mapping isn't actually needed — the workflow uses
\`DATABASE_URL: postgresql+asyncpg://...@postgres:5432/...\` (hostname
\`postgres\` is the service container's docker-network DNS name).
The tests run inside the act container which is on the same docker
network, so they reach postgres without going through the host.
Removing \`ports: 5432:5432\` from both backend and e2e job service
definitions lets multiple postgres services run in parallel on
different docker networks without colliding on the host.
Surfaced when PR #150 ran in parallel with another job after the
multi-runner setup. Backend instant-failed in 2s on the docker run.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
TODO.md: Promote pytest-xdist to ✅ (PR #151 carries it). Adds three new
backlog items:
- data-testid hardening for e2e-critical interactive elements (sparked
by PR #152's selector drift work)
- per-test transactional rollback (next big speedup if needed)
- pytest-testmon for PR-time test selection
HANDOFF.md: Three open PRs now (#150, #151, #152), all independent.
Three Gitea runner agents now registered, so jobs run in parallel.
Combined with #151's xdist, the prior 1h 14m wall-clock should drop
to ~6-10 min. Updated merge order: #152 first (smallest), #150 next,
#151 last. After all three land, enable CI / backend then CI / e2e
as required status checks.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(Was meant to land in fe632c9; the multi-line edit failed silently
because Codex's earlier entry shifted the surrounding context.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md to reflect:
- PR #150 now contains the AI-provider test mock + caching + maxfail.
Backend CI should be fully green for the first time in months.
- PR #151 stacked on #150: pytest-xdist with per-worker DBs. Local
verification: 22m 27s → 4m 28s (5× speedup), 1076 passed both runs.
- DoD is now: merge #150, then #151, then add CI / backend
(pull_request) to required status checks on main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three changes that get PR #150 to a green CI gate:
1. **test_record_decision_persists_and_bumps_state_version** — the
`decision: draft_template` path calls `_extract_template_parameters`
(TemplateExtractionService → AI provider). CI doesn't set
ANTHROPIC_API_KEY/GOOGLE_AI_API_KEY, so the endpoint raised
`RuntimeError: No AI provider configured` and returned 500. The test
isn't exercising the AI integration — patched the extractor with an
AsyncMock returning a minimal valid `{templated_body, parameters}`
dict. Verified locally: the test now passes.
2. **pip + npm caches** in backend, frontend, and e2e jobs. Keyed on
the hash of requirements*.txt / package-lock.json with a runner-os
restore-key fallback. Saves ~30-60s per run on cache hit.
3. **Pytest invocation tightened**:
- Dropped `--cov-report=term-missing` — the custom "Display coverage
summary" step below parses coverage.json and prints the same
module list more concisely. Term-missing dumps every uncovered
line which adds ~5-10s of stdout.
- Added `--maxfail=10` so a structural breakage (fixture explosion,
DB unreachable) bails after 10 errors instead of running the full
25-min suite. Tunable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Capture the backend pytest parallelization work so it survives session
end. Backend suite is currently ~22 min wall-clock for 1076 tests;
xdist with one-DB-per-worker should land in the 3-6 min range on the
homelab Gitea Actions runner.
Also queues two backlog items:
- Frontend lint warnings (23 react-hooks/exhaustive-deps after PR #149)
- Periodic audit of the ResourceWarning filterwarnings added by Codex
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Updates HANDOFF.md, CURRENT_TASK.md, and SESSION_LOG.md so the next
session has accurate resume state. Summary of where things are:
- PR #141 (PSA tickets), PR #147 (FlowPilot Phase 1-9), PR #148 (CI
fixes part 1), PR #149 (CI fixes part 2) all merged to main in this
session.
- Branch protection enabled on main: PR-only, CI / frontend required.
- PR #150 (this branch) is the last CI-config PR — adds
DATABASE_TEST_URL to the workflow and pins upload-artifact to v3.
- Next session: watch #150's CI, merge if green, add CI / backend to
required checks, then start on the 54 real backend test failures.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two CI-config issues blocking the gate from going green:
1. **Backend tests connect to localhost instead of postgres service.**
conftest.py reads DATABASE_TEST_URL only — DATABASE_URL is intentionally
not consulted (per dab740d's test-DB-isolation hardening — running
pytest with DATABASE_URL set previously dropped the dev DB schema).
The CI workflow only sets DATABASE_URL, so conftest falls back to its
localhost default and every fixture-setup fails with
`OSError: Connect call failed ('127.0.0.1', 5432)` — observed as 638
errors on the latest main run.
Add DATABASE_TEST_URL pointing at the postgres service container.
Same connection string as DATABASE_URL — the test DB and the app DB
are the same physical postgres in CI; conftest's safety assertion is
satisfied by the URL containing "test".
2. **Frontend artifact upload fails on Gitea Actions runner.**
actions/upload-artifact@v4 (and v5) are not supported on Gitea
Actions / GHES — the runner returns
`GHESNotSupportedError: ... not currently supported on GHES`. Lint
itself is now passing (0 errors after PR #149); the job exits 1 only
because the upload step then fails.
Pin upload-artifact + download-artifact to v3, the latest version
compatible with Gitea Actions until they ship v4 support.
After this lands, both backend and frontend CI gates should turn
green — at which point we can also add backend to the required status
checks on main.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test_db fixture calls Base.metadata.create_all on a fresh test DB.
That only creates tables for models that have been imported (and thus
registered with Base.metadata) by the time the fixture runs.
app.main imports app.core.database (which gives us Base) but does NOT
eagerly import the model modules — most are pulled in lazily inside
scheduler functions (archive_stale_ai_sessions etc.) and route
modules. At fixture-setup time, only the handful of models touched by
those eager imports are on the metadata, so any test that exercises
PSA, network diagrams, ratings, escalations, etc. fails with
\`UndefinedTableError: relation "X" does not exist\` and a cascade of
500s on every endpoint that queries the missing table.
Adding \`from app import models as _models\` (rather than the bare
\`import app.models\` which would shadow the \`app\` FastAPI instance
imported just above) pulls in app/models/__init__.py, which itself
imports every model module — registering all ~60 tables with
Base.metadata before create_all runs.
Verified locally: tests/test_psa_writeback_phase4.py went from
1 failed / 6 errors → 4 failed / 3 passed (the cascading errors were
masking the actual passes).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The new react-hooks lint rule "Calling setState synchronously within an
effect can trigger cascading renders" flagged real anti-patterns in
four spots. Refactored each per the rule's intent (derive during render,
or use useSyncExternalStore for external subscriptions).
1. hooks/useMediaQuery.ts — replaced the useState + useEffect pair with
useSyncExternalStore. That's the canonical React hook for
subscribing to external stores (matchMedia in this case) without
mirroring into local state via an effect. Snapshot/getServerSnapshot
pair preserves the SSR-safe behaviour.
2. components/network/nodes/DeviceNode.tsx — the prop-sync useEffect
that copied nodeData.label into labelValue was redundant.
labelValue is the EDIT BUFFER; while not editing, the displayed
span now reads nodeData.label directly. The buffer is initialized
only when an edit session starts (onDoubleClick).
3. components/network/nodes/GroupNode.tsx — same pattern, same fix.
4. components/dashboard/TicketQueue.tsx — the
setTickets([]) + setLoading(true) + fetchTickets() chain in the
effect was the cascade. Pushed those writes inside fetchTickets
(after the function boundary, so they batch with the eventual
setTickets(result)). Added a request-id ref so a slow first
response can't overwrite a fast second one.
Frontend lint: 20 errors → 0 errors. tsc -b clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five files, all stylistic:
- useFlowPilotSession.ts: typed the axios error shape with a narrow
inline type instead of \`as any\`.
- FlowPilotSessionPage.tsx: same — typed location.state once, then
destructured.
- ScriptBuilderTab.tsx: handleViewScript was a placeholder no-op;
declared the args properly with \`void script; void filename\` so the
signature matches ScriptBuilderChatProps without no-unused-vars
firing.
- TicketsPage.tsx: replaced 8 ternaries-as-statements (\`x ? f() : g()\`)
with proper if/else blocks. Same control flow, satisfies
no-unused-expressions, and reads better in the URL-param update paths.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
react-refresh/only-export-components fires when a file with the
\`router\` const export also defines a component (the redirect helper).
Moves the small helper to its own file under components/routing/ so
HMR can keep the route-component module hot-reload-eligible.
No behavior change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pytest-asyncio==0.24.0 (added on the FlowPilot branch as part of the
RLS test infra refactor) declares pytest>=8.2 — but requirements-dev.txt
still pinned pytest==7.4.3, so a clean pip install fails with
ResolutionImpossible. CI runners that started from a fresh image would
have refused to install dev deps; the FlowPilot tests passed locally
only because the dev container had a pre-installed pytest 8.x lying
around.
pytest-cov 4.1.0 also needs >= 5.0 to play nicely with pytest 8.
No code changes — pytest 8 is API-compatible with the existing test
suite once the install resolves.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Brings PR #148 — two pre-existing CI fixes (network_diagrams JSONB
server_default, removed deprecated session-scoped event_loop fixture).
The conftest.py event_loop fix on main is already incorporated in
FlowPilot's b14a16a (RLS-gating commit, which dropped the same fixture
as part of its larger refactor). Kept HEAD's version of the RLS-gating
collection hook; the event_loop fixture removal is identical.
The network_diagram.py fix lands cleanly via auto-merge.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>