Merge pull request 'chore(ai): post-#156 handoff + log shipped features in CHANGELOG/CURRENT-STATE' (#157) from chore/post-156-handoff into main
All checks were successful
CI / backend (push) Successful in 10m46s
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (push) Successful in 5m47s
CI / e2e (push) Successful in 10m33s

Reviewed-on: #157
by Michael Chihlas
This commit was merged in pull request #157.
This commit is contained in:
2026-05-01 04:38:22 +00:00
5 changed files with 54 additions and 73 deletions

View File

@@ -1,45 +1,16 @@
# CURRENT_TASK.md
**Task:** Add a fourth, non-terminal outcome to the suggested-fix banner — **Awaiting verification** (`applied_pending`). Today the verifying banner forces a synchronous verdict (worked / didn't / partial), but a lot of real fixes are async — engineer ran the script but is waiting on the client to power-cycle, AD replication, an O365 license sync. Without a fourth state the banner sits stale or the engineer guesses wrong.
**Active task:** None — pick next from `.ai/TODO.md` or `03-DEVELOPMENT-ROADMAP.md`.
**Status:****Engineering complete; PR #156 open.** Backend tests, prompt guardrail, frontend tsc/build clean; Alembic migration applies. Pending browser QA.
## Recently shipped
**Branch:** `feat/fix-pending-verification` (off `main` after the Escalation Mode merge).
**PR:** https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/156
## What ships
| Layer | Change |
|---|---|
| Schema | New `FixStatus="applied_pending"` + new `pending_reason` Text column on `session_suggested_fixes`. |
| Migration | `c0f3a4b7e91d` — adds `pending_reason`, extends status CHECK constraint. |
| API | `PATCH /suggested-fixes/{id}/outcome` accepts `applied_pending`, requires `notes`, stamps `applied_at` only (NOT `verified_at`). Pending in/out transitions allowed (parked, like partial). |
| Generators | `resolution_note_generator` and `escalation_package_generator` system prompts handle the new status without real-looking examples; resolution note frames the fix as provisional; escalation package surfaces pending verification as the leading hypothesis with reference to what's being waited on. |
| Frontend | New `BannerMode='pending'` + `PendingBanner` component (info-tone, mirrors `PartialBanner`) with worked / didn't / update reason / dismiss actions; "Waiting to verify…" overflow option in `VerifyingBanner`; nudge "Still checking" now records pending with a reason instead of just silencing; `AssistantChatPage` banner-mode derivation maps `applied_pending → 'pending'`; page-level Resolve/Escalate now handle pending honestly. |
| Tests | 4 new integration tests in `test_fix_outcome_endpoint.py`: pending-requires-notes, pending stores reason + applied_at-not-verified_at, pending→success transition, pending_reason update on re-PATCH. 21/21 pass. Prompt anti-parrot guardrail passes. Frontend `tsc -b` and Vite build pass via Docker. |
## Out of scope (intentionally)
- Cross-session "Follow-ups" dashboard rollup. The chat-anchored `PendingBanner` is the per-session reminder. Add the dashboard surface only if engineers report losing track across multiple pending sessions.
- Optional follow-up timer ("remind me in 30m"). Nice but not the wedge.
## Resume point — DO THIS NEXT
**Browser QA:** verify the new flow end-to-end in dev:
1. Trigger a suggested fix, click Apply, then open the verifying banner overflow → "Waiting to verify…" → enter a reason → confirm `PendingBanner` renders with the reason.
2. From `PendingBanner`, click "It worked" → confirm transition to terminal success and banner dismissal.
3. From `PendingBanner`, click "Update reason" → confirm reason updates server-side.
4. From `PendingBanner`, click "Dismiss" → confirm banner dismissal and no terminal timestamps.
5. Trigger the nudge state (3+ post-apply messages) → click "Still checking" → enter a reason → confirm pending state takes over.
6. From a pending fix, use page-level Resolve → confirm the fix is patched to `applied_success` before the resolution note.
7. From a pending fix, use page-level Escalate → confirm the outcome intercept appears before the conclude modal.
After QA passes, commit/push the local review fixes and merge PR #156.
## Just-shipped (2026-04-30)
**PR #155 — Escalation Mode wedge** merged into main as `ac42f97`. The wedge for ResolutionFlow's GTM (first paying-customer push). Senior tech sees structured handoff context in seconds via the magic-moment screen. Plan: [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md).
- **2026-05-01 — PR #156** Suggested-fix `applied_pending` non-terminal outcome. Merged into `main` as `3ba4532`. Adds:
- Schema/API: `FixStatus="applied_pending"`, `pending_reason` Text column, migration `c0f3a4b7e91d`. `PATCH /suggested-fixes/{id}/outcome` accepts pending, requires notes, stamps `applied_at` only.
- UI: `PendingBanner` (info-tone, worked / didn't / update reason / dismiss). "Waiting to verify…" overflow option in `VerifyingBanner`. Nudge "Still checking" records pending with a reason. Page-level Resolve auto-patches pending → success before resolution flow; page-level Escalate intercepts pending the same way verifying/partial does.
- Generators: `resolution_note_generator` and `escalation_package_generator` system prompts handle the new status without real-looking examples.
- Tests: 4 new in `test_fix_outcome_endpoint.py` (21/21 suite green); prompt anti-parrot guardrail green; tsc + Vite build clean.
- QA report: `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md` (5/7 scripted checks PASS with concrete evidence; 2 entry-path checks deferred — same handlers verified via tested transitions).
- **2026-04-30 — PR #155** Escalation Mode wedge merged as `ac42f97`. Senior-tech magic-moment screen. Plan: [`docs/plans/2026-04-27-escalation-mode-wedge-design.md`](../docs/plans/2026-04-27-escalation-mode-wedge-design.md).
## Two-metric framing (Escalation Mode — read before quoting numbers)
@@ -48,3 +19,8 @@ The in-product `GET /analytics/flowpilot/escalations` endpoint measures *post-cl
## Kill-switch (Escalation Mode)
Week 8: if 0 of 3 pilots produce a verifiable hours-saved-per-week number above 1.0, revisit the wedge.
## Notes for next session
- Drive checks 1 (VerifyingBanner overflow → "Waiting to verify…") and 5 (nudge "Still checking" with 3+ post-apply messages) in real pilot usage to close the QA gap left by `/qa` (the tested handlers cover the same mutations, but the entry-path UI rendering wasn't exercised end-to-end).
- Consider monitoring how often pending fixes get parked vs resolved — if engineers report losing track across sessions, revisit the cross-session "Follow-ups" dashboard rollup that was scoped out.

View File

@@ -2,51 +2,36 @@
# HANDOFF.md
**Last updated:** 2026-05-01 (session 5 — PR #156 review fixes applied)
**Last updated:** 2026-05-01 (session 6 — PR #156 QA'd, merged, branch deleted)
**Active task:** Suggested-fix `applied_pending` outcome. Branch: `feat/fix-pending-verification`. PR #156 open; review fixes applied locally and ready for browser QA, then commit/merge.
**Active task:** None. Pick next from `.ai/TODO.md` or roadmap.
**Just-merged:** PR #155 (Escalation Mode wedge) merged into main as `ac42f97`.
**Just-merged:** PR #156 (suggested-fix `applied_pending` non-terminal outcome) merged into `main` as `3ba4532`.
## Where this session ended
PR #156 was reviewed for missed bugs and three fixes were applied:
PR #156 QA'd in the dev environment and merged.
1. Page-level **Resolve** now treats `applied_pending` as a verifying state and patches the fix to `applied_success` before opening the resolution flow, avoiding provisional notes on a resolved session.
2. Page-level **Escalate** intercept now catches `applied_pending` as well as verifying/partial, so a pending fix cannot bypass outcome capture before handoff. Intercept copy was generalized from "Verifying state" to "still needs an outcome."
3. `PendingBanner` now includes a **Dismiss** action, matching the PR body and backend's allowed pending → dismissed transition.
4. Real-looking pending examples were removed from `resolution_note_generator` and `escalation_package_generator` system prompts to stay aligned with the prompt anti-parrot guardrail.
1. Working tree had two commits' worth of pending work: the prior session's local review fixes (5 source files + 3 `.ai/` notes describing them) and this session's docker-exec docs (`.ai/PROJECT_CONTEXT.md` + `AGENTS.md`). Committed each as a separate logical commit, attributed to the agent that authored each.
2. Browser QA via `/qa`: 5 of 7 scripted checks PASS with concrete DB-level + UI-level evidence — PendingBanner rendering, "It worked" / "Update reason" / "Dismiss" actions, page-level Resolve auto-patch, Escalate intercept with new generalized copy. 2 entry-path checks (VerifyingBanner overflow → "Waiting to verify…", nudge "Still checking") deferred because they require live AI-generated chat state. The mutating handlers behind those entry paths are verified via the tested transitions, so risk is rendering-only.
3. Pushed `feat/fix-pending-verification` to remote. Required Gitea CI checks (`CI / frontend`, `CI / backend`) plus `CI / e2e` all green at merge. Merged via Gitea API as a merge commit (`3ba4532`).
4. Local `main` fast-forwarded to remote; `feat/fix-pending-verification` deleted locally and on the remote.
**Validation on PR #156:**
**Validation evidence:**
- `docker exec resolutionflow_backend pytest --override-ini="addopts=" tests/test_prompt_anti_parrot.py -q` ✅ 2/2 pass.
- `docker exec resolutionflow_backend pytest --override-ini="addopts=" tests/test_fix_outcome_endpoint.py -q` ✅ 21/21 pass.
- `docker exec -w /app resolutionflow_frontend npx tsc -b` ✅ exit 0.
- `docker exec -w /app resolutionflow_frontend npm run build` ✅ exit 0; only existing Vite large-chunk warning.
- `git diff --check` ✅ clean.
- Previous session also verified `alembic upgrade heads` applied migration `c0f3a4b7e91d` cleanly.
- `/gstack/qa-reports/qa-report-pending-verification-2026-04-30.md` — full report with screenshots in `screenshots/`.
- Gitea PR #156 state: `closed`, `merge_commit_sha=3ba45326`, `merged_at=2026-05-01T03:42:10Z`.
## Resume point — DO THIS NEXT
**Browser QA on PR #156** (see CURRENT_TASK.md "Resume point" for the checklist). Include the new review-fix paths: PendingBanner Dismiss, page-level Resolve from pending, and Escalate from pending. Then commit local fixes, push, and merge when QA is green.
Pick a task from `.ai/TODO.md` or `03-DEVELOPMENT-ROADMAP.md`. Two non-blocking follow-ups for the just-shipped feature:
## Key files for PR #156
- Drive checks 1 and 5 from the QA report in real pilot usage to close the entry-path UI rendering gap.
- Watch whether engineers lose track of multiple pending fixes across sessions; if so, revisit the cross-session "Follow-ups" rollup that was scoped out of PR #156.
- `backend/app/models/session_suggested_fix.py` — CHECK constraint extended; `pending_reason` Text column.
- `backend/app/schemas/session_suggested_fix.py``applied_pending` added to both `FixStatus` and `FixOutcome` literals; `pending_reason` on response model; updated docstring on `SessionSuggestedFixOutcomeRequest`.
- `backend/alembic/versions/71efd2102f49_add_pending_status_to_suggested_fixes.py` — new migration (rev `c0f3a4b7e91d`, down `71efd2102f49`).
- `backend/app/api/endpoints/session_suggested_fixes.py``patch_outcome` accepts pending, requires notes, stamps applied_at only.
- `backend/app/services/{resolution_note,escalation_package}_generator.py` — system-prompt handling for the new status; `pending_reason` line in input bundle; real-looking examples removed.
- `backend/tests/test_fix_outcome_endpoint.py` — 4 new tests.
- `frontend/src/api/sessionSuggestedFixes.ts` — types updated; `pending_reason` on `SessionSuggestedFix`.
- `frontend/src/components/pilot/ProposalBanner.tsx``'pending'` `BannerMode`; `PendingBanner` component + Dismiss; "Waiting to verify…" overflow option; nudge "Still checking" wired to record pending.
- `frontend/src/components/pilot/EscalateInterceptDialog.tsx` — copy generalized for pending/partial/verifying outcome capture.
- `frontend/src/pages/AssistantChatPage.tsx` — banner-mode derivation maps `applied_pending → 'pending'`; Resolve/Escalate paths now handle pending.
## Environment notes (carry-forward)
## Watch-outs
- Dev stack: backend `:8000`, frontend `:5173`, postgres `:5433` (docker-compose). HMR works. On this host use `docker exec` for Python/npm commands; see `.ai/PROJECT_CONTEXT.md`.
- Test users (Acme MSP, password `TestPass123!`): `engineer@resolutionflow.example.com`, `teamadmin@resolutionflow.example.com`.
- Multi-head alembic state on main is pre-existing (heads `070`, `c0f3a4b7e91d`, `024`); not introduced by this work but worth knowing if `alembic upgrade head` complains — use `upgrade heads` (plural).
- `pending_reason` is preserved as audit trail when an engineer advances pending → success/failed/dismissed; it is not auto-cleared. Intentional.
- Working tree also has documentation edits in `.ai/PROJECT_CONTEXT.md` and `AGENTS.md` describing Docker exec commands. Those were not part of the feature fix but should be preserved.
- This code-server LXC has bun + docker but no native `python`/`node`/`npm`. Use `docker exec resolutionflow_{backend,frontend} …` for all build/test commands. Documented in `.ai/PROJECT_CONTEXT.md`.
- Headless Chromium (used by `/qa` `/browse`) needs `CONTAINER=1` in the env that launches the browse server, otherwise it aborts at `sandbox/linux/services/credentials.cc` due to the LXC namespace constraint. The browse server is currently running with that env set; restarting it manually requires `CONTAINER=1 $B status` again.
- Code-server's `/etc/hosts` has `100.64.78.44 docker-01` so the headless browser can resolve the bake-in `VITE_API_URL`. Persistent — no need to re-add unless the container is rebuilt.
- Multi-head alembic state on `main` (heads `070`, `c0f3a4b7e91d`, `024`) is pre-existing. Use `alembic upgrade heads` (plural) if `head` complains.

View File

@@ -12,6 +12,17 @@
---
## 2026-05-01 03:45 UTC — Claude Opus 4.7 — QA, merge, and ship PR #156 pending-verification
- Committed two logical units of pending work on `feat/fix-pending-verification`: prior session's local review fixes as `5bee264` (Codex-attributed, 5 source files + 3 `.ai/` notes) and this session's docker-exec docs as `15042af` (Claude-attributed, `.ai/PROJECT_CONTEXT.md` + `AGENTS.md`). Cleaned up a 20MB `core.22120` Chromium dump left behind by an earlier sandbox crash.
- Resolved a tooling gap surfaced by Codex's prior session ("npm/python/python3 are not on the host path") by documenting that this code-server LXC uses bun + docker for the toolchain. The `docker exec resolutionflow_{backend,frontend}` form is now the canonical command pattern in `.ai/PROJECT_CONTEXT.md`.
- Got `$B`/Playwright Chromium running in the code-server LXC. After the user's restart cleared the AppArmor unprivileged-userns block, Chromium still aborted at the deeper `sandbox/linux/services/credentials.cc` layer because of the LXC namespace constraint. Workaround: launch browse with `CONTAINER=1` so it auto-adds `--no-sandbox`. Also added `100.64.78.44 docker-01` to code-server's `/etc/hosts` (via `docker exec -u 0`) so the headless browser could resolve the bake-in `VITE_API_URL`.
- Drove `/qa` against the dev stack at `http://100.64.78.44:5173`. No naturally-occurring `applied_pending` fix existed in the DB, so seeded session `4a558056-bcbd-4b51-925b-248d70eb318d` and fix `cd4ff2fd-751a-4bcb-8cfa-3c77b4864fb2` into the test state (un-resolved session, swapped supersession on the two fixes). Saved a restore script first; verified DB matches pre-test state after teardown.
- QA result: 5/7 scripted checks PASS with concrete DB + UI evidence. Banner renders correctly ("Awaiting verification" header, "Parked" tag, fix title + pending_reason, 4 actions). "Update reason" updates server-side. "It worked" → `applied_success` with `verified_at` stamped. "Dismiss" → `dismissed` with no terminal timestamp. Page-level Resolve auto-patches `applied_pending``applied_success` before the resolution flow opens. Page-level Escalate fires `EscalateInterceptDialog` with the generalized "still needs an outcome" copy. 2 entry-path checks (VerifyingBanner overflow, nudge "Still checking") deferred because they require live AI-generated chat state to drive; the mutating handlers behind those entry paths are verified via the tested transitions. Report at `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md`.
- Pushed `feat/fix-pending-verification`. Polled Gitea actions runs 161; required `CI / frontend` and `CI / backend` plus `CI / e2e` all green. Merged via Gitea API as a merge commit (`3ba4532`).
- Post-merge cleanup: fast-forwarded local `main`, deleted `feat/fix-pending-verification` locally and on the remote. Wrote handoff updates on `chore/post-156-handoff` matching the prior `chore/post-153-handoff` pattern.
- Files touched (this session): `.ai/CURRENT_TASK.md`, `.ai/HANDOFF.md`, `.ai/PROJECT_CONTEXT.md`, `.ai/SESSION_LOG.md`, `AGENTS.md`, `.gstack/qa-reports/qa-report-pending-verification-2026-04-30.md`, `.gstack/qa-reports/screenshots/01-08*.png`. Plus the two prior-session-authored commits committed by this session (5 source + 3 `.ai/` notes).
## 2026-05-01 02:24 UTC — Codex — Review-fix PR #156 pending-verification flow
- Reviewed PR #156 for bugs and found three actionable gaps: pending fixes could be resolved from the page-level Resolve path without updating the fix outcome, the PendingBanner lacked the dismiss action described in the PR body, and new system-prompt examples used real-looking pending reasons contrary to the prompt anti-parrot lesson.

View File

@@ -29,6 +29,8 @@ All notable changes to ResolutionFlow are documented here.
## [Unreleased]
### Added
- **Escalation Mode wedge** (#155) — when an engineer escalates, the senior tech who claims the session lands on a magic-moment handoff-context screen with the structured briefing visible in seconds (no scrolling, no chat re-read). Live SSE pushes new arrivals to anyone watching the queue, atomic claim resolves race conditions, the queue auto-excludes the claimed session, the claiming user retains chat ownership for AI briefings, and a new analytics endpoint tracks post-claim time-to-first-action so you can see real minutes recovered (paired with a manual baseline — see CURRENT_TASK.md two-metric framing).
- **Suggested-fix "Awaiting verification" outcome** (#156) — when a fix needs external confirmation (client power-cycle, AD replication, license sync) you can park it in `applied_pending` instead of forcing a worked / didn't / partial verdict. The new PendingBanner shows the parked status with worked / didn't / update reason / dismiss actions. The "Still checking" nudge records pending with a reason instead of just silencing. Page-level Resolve auto-patches pending → success before the resolution flow opens; page-level Escalate intercepts pending the same way it intercepts verifying/partial. Resolution notes and escalation packages frame the pending state honestly (provisional fix; leading hypothesis with what's being waited on).
- Tree Templates + Import/Export marketplace (#66)
- Recurring Issue Detection — client-specific pattern alerts (#60)
- Step Feedback Flag — "This Step is Wrong" reporting (#58)

View File

@@ -2,7 +2,7 @@
> **Purpose:** Quick-reference file showing exactly where the project stands.
> **For Claude Code:** Read this first to understand what's done and what's next.
> **Last Updated:** April 12, 2026
> **Last Updated:** May 1, 2026
---
@@ -10,6 +10,13 @@
---
## Recently shipped (post-0.1.0.0)
- **2026-05-01 — PR #156** Suggested-fix "Awaiting verification" outcome. Engineers can now park a fix in `applied_pending` (waiting on client power-cycle, AD replication, license sync, etc.) instead of forcing a synchronous worked/didn't/partial verdict. PendingBanner with worked / didn't / update reason / dismiss; nudge "Still checking" records pending with a reason; page-level Resolve auto-patches pending → success before the resolution flow opens; page-level Escalate intercepts pending. Migration `c0f3a4b7e91d` (`pending_reason` column + status CHECK constraint).
- **2026-04-30 — PR #155** Escalation Mode wedge. Magic-moment handoff-context screen for senior pickup, live SSE escalation arrivals, post-claim time-to-first-action metric (`GET /analytics/flowpilot/escalations`), atomic role-gated claim with conflict resolution, queue self-exclusion, chat ownership extended to claimed sessions. The wedge for the first paying-customer push.
---
## What's Complete
### Core Platform