From db446e1fd63cdbaf072c975b9dd0ebf44c9d15f1 Mon Sep 17 00:00:00 2001 From: Michael Chihlas Date: Tue, 9 Jun 2026 15:56:03 -0400 Subject: [PATCH] docs(handoff): PR #193 all 10 review findings resolved + 2 decisions Findings doc gets a per-finding RESOLUTION section; HANDOFF resume point moves to "re-push + merge" and corrects the false Task 16/17 "done" record; CURRENT_TASK updated; two architectural decisions logged (real ai_build columns replacing the meta convention; ad-hoc walk restored); SESSION_LOG entry added. Co-Authored-By: Claude Opus 4.8 (1M context) --- .ai/CURRENT_TASK.md | 2 +- .ai/DECISIONS.md | 52 ++++++++++++++++++ .ai/HANDOFF.md | 53 +++++++++++-------- .ai/SESSION_LOG.md | 9 ++++ ...026-06-09-pr193-phase2a-review-findings.md | 37 +++++++++++++ 5 files changed, 129 insertions(+), 24 deletions(-) diff --git a/.ai/CURRENT_TASK.md b/.ai/CURRENT_TASK.md index 9a94b301..d4613495 100644 --- a/.ai/CURRENT_TASK.md +++ b/.ai/CURRENT_TASK.md @@ -1,6 +1,6 @@ # CURRENT_TASK.md -**Active task:** L1 AI Tree Builder **Phase 2A — implementation complete, in review as PR #193** (`feat/l1-ai-tree-builder-phase-2a` → `main`). All 19 plan tasks done; full backend suite 1387 passed/0 failed; frontend tsc+lint+build clean; migrations roundtrip clean. Resume point = check Gitea CI on #193, review, merge; then prod `alembic upgrade head` + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md`. +**Active task:** L1 AI Tree Builder **Phase 2A — review findings resolved, PR #193 ready to re-push** (`feat/l1-ai-tree-builder-phase-2a` → `main`). The 2026-06-09 multi-agent review found 10 confirmed defects (incl. a showstopper: AI nodes carried no `id` so walks never advanced); **all 10 resolved this session** (root fix: real columns replace the `meta` walked_path convention; ad-hoc walk restored). Full Phase 2A backend set 110 passed/0 failed; frontend tsc+lint+build clean; migration roundtrip clean (new head `61dda4f615c6`). Resume point = commit + push branch, re-run Gitea CI, merge; then prod `alembic upgrade head` (4 migrations) + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md` + `docs/plans/2026-06-09-pr193-phase2a-review-findings.md`. **Parallel (user-side, blocked):** Phase O cutover for self-serve signup — all code blockers closed on `main`; only user-side manual ops remain (apex DNS at Namecheap, Stripe Dashboard live-mode config with the `/contact` + `/policies` URLs, Railway prod env vars, internal validation, public flag flip), gated on the EIN. diff --git a/.ai/DECISIONS.md b/.ai/DECISIONS.md index 884deed5..fab8169a 100644 --- a/.ai/DECISIONS.md +++ b/.ai/DECISIONS.md @@ -13,6 +13,58 @@ --- +## 2026-06-09 — L1 ai_build context lives in columns, not a hidden `meta` walked_path entry + +**Context:** PR #193 review found that the intake category was smuggled into the +ai_build session's `walked_path` as a fake `{"node_type":"meta","category":...}` +entry that every consumer had to remember to skip. Most didn't: it made an +otherwise-empty walk truthy (junk `pending` proposals reached the review queue), +pushed the depth cap off by one (counted as a real step), and rendered as a blank +row in the escalations UI. Compounding it, AI-generated nodes carried no `id`, but +the advance protocol keys on `node_id` — so the walk could never advance past the +first question (the headline feature was non-functional end-to-end). + +**Decision:** Add real `category`, `problem_text`, and `pending_node` columns to +`l1_walk_sessions` (migration `61dda4f615c6`) and **delete the meta-entry convention +entirely**. Intake stores `category`/`problem_text` on the session; `/next-node` +reads them off the row (no ticket re-fetch, no walked_path scan). The server assigns +every node a `uuid4().hex[:8]` id (`ai_tree_builder._assign_id`) — never the model. +`pending_node` persists the served-but-unanswered node so a refresh / StrictMode +double-mount replays it instead of firing a fresh paid LLM call. + +**Rejected:** Symptom-level strip-meta fixes (filter the meta entry at each consumer). +Smaller diff, but leaves the landmine convention in place for the next consumer to +trip over — contrary to the project principle (correct architecture over minimal diff). +Asking the LLM to invent node ids: not stable, not trustworthy. + +**Consequences:** `walked_path` now holds only real steps. Adding a new consumer no +longer requires knowing about a hidden entry. `WalkSessionResponse` exposes +`category`/`problem_text` (escalations UI shows the real problem). The `meta` +node_type and `_strip_meta` are gone. + +--- + +## 2026-06-09 — Keep the L1 ad-hoc walk fallback (don't drop it) + +**Context:** The Phase 2A intake rewrite dropped the `else: start_adhoc_session(...)` +branch, leaving `start_adhoc_session` with zero callers and the out_of_scope prompt +offering only Escalate/Cancel — while `L1CategoriesPage` copy still promised "Disabled +categories fall back to an ad-hoc walk or escalation." A capability silently regressed. + +**Decision:** Restore it (review Finding 5 option a). Intake honors `adhoc=True` +(a new `IntakeRequest` field → `"adhoc"` outcome) and the out_of_scope prompt gained a +"Walk it ad-hoc" button. This preserves the pre-existing free-form-walk capability and +keeps the settings copy honest. + +**Rejected:** Dropping ad-hoc and fixing the copy. It removes a capability techs had, +for a problem class (out-of-scope) where a free-form walk is the natural fallback before +escalation. Cheaper, but a product regression dressed as cleanup. + +**Consequences:** `start_adhoc_session` has a caller again. The walker renders adhoc +sessions via its existing non-ai_build branch (free-form notes, no AI tree). + +--- + ## 2026-05-29 — Single source of truth for plan-tier taxonomy (derive admin UI + validation from `plan_limits`) **Context:** A prod report ("AI sessions aren't working") traced to the owner account having no paid plan (AI is plan-gated), compounded by a real bug: the admin "Change Plan" dropdown ([`AccountDetailPage.tsx:443-445`](../frontend/src/pages/admin/AccountDetailPage.tsx)) still offered the dead `team` slug (renamed to `enterprise` in migration `4ce3e594cb87`, 2026-05-07) and omitted `starter`/`enterprise`. Selecting "Team" 400s against the hardcoded allow-list in [`admin.py:994`](../backend/app/api/endpoints/admin.py#L994). The dropdown was missed during the 2026-05-07 taxonomy reconciliation because the allowed-plan list is hand-duplicated across ≥6 backend + frontend sites. Second taxonomy-drift incident. diff --git a/.ai/HANDOFF.md b/.ai/HANDOFF.md index fe528b98..86f90d52 100644 --- a/.ai/HANDOFF.md +++ b/.ai/HANDOFF.md @@ -2,27 +2,30 @@ # HANDOFF.md -**Last updated:** 2026-05-30 +**Last updated:** 2026-06-09 -**Active task:** L1 AI Tree Builder **Phase 2A — COMPLETE**. All 19 plan tasks done on -branch `feat/l1-ai-tree-builder-phase-2a` (branched from `main` @ `87236b5`), pushed to -Gitea, **PR #193 open** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable): +**Active task:** L1 AI Tree Builder **Phase 2A — review findings RESOLVED, ready to re-push**. +Branch `feat/l1-ai-tree-builder-phase-2a` (off `main` @ `87236b5`), **PR #193**: . -## Resume point — FIX REVIEW FINDINGS on PR #193, do NOT merge yet +## Resume point — re-push the fixes, re-run CI, then merge -A 2026-06-09 multi-agent review (7 finder angles, every finding code-verified) found -**10 confirmed defects** — including a showstopper (AI-generated nodes carry no `id`, -so ai_build walks can never advance past the first question) and proof that Tasks 16–17 -(ProposalDetail L1-source block, L1EscalationsSection mount) were recorded as done here -but were **never committed**. Full findings, evidence (file:line), fixes, and execution -order: [`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md). +All **10 review findings are resolved** (this session, uncommitted on the branch — commit + +push are the next action). Findings doc has a per-finding RESOLUTION section: +[`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md). +Two architecture decisions logged in `.ai/DECISIONS.md` (2026-06-09): real +`category`/`problem_text`/`pending_node` columns replacing the `meta` walked_path +convention; ad-hoc walk restored. -Next session: work that doc top-to-bottom (findings 1–7 are merge blockers), re-run the -Phase 2A test gate + tsc/lint/build + migration roundtrip, then resume the old plan: -merge PR #193, prod `alembic upgrade head` (3 migrations, head `1fd88a68b145`), and the -live AI-quality smoke test before wide enablement (spec §5.3 — all model calls are -mocked in tests). +Next: commit + push the branch, let Gitea CI run, then merge PR #193. After merge: +prod `alembic upgrade head` — now **4 migrations**, new head **`61dda4f615c6`** (adds the +three l1_walk_sessions columns + flips `flow_proposals.l1_session_id` FK to CASCADE + an +escalations partial index). Then the live AI-quality smoke test before wide enablement +(spec §5.3 — all model calls are mocked in tests). + +**Task 16/17 record corrected:** the prior handoff claimed Task 16 (ProposalDetail +L1-source block) and Task 17 (L1EscalationsSection mount) were done — they were never +committed. Both are now actually implemented and tested this session (Findings 2a + 3). ## What shipped (all verified this session) @@ -32,9 +35,10 @@ mocked in tests). `normalize_walked_path`, skips `meta`), `match_or_build` (match-first, gate-on-build, flow_id→str), `l1_session_service` (start/advance ai_build storing `node_text`, flywheel capture on resolve, escalate notify). `l1.session.escalated` notification (+ `/escalations` - link; `_resolve_recipients` honors explicit empty list). API: intake dispatch (build seeds - a hidden `{"node_type":"meta","category":...}` walked_path entry), `/next-node`, + link; `_resolve_recipients` honors explicit empty list). API: intake dispatch, `/next-node`, `/escalations`, `GET|PATCH /accounts/me/l1-categories`, `require_account_owner_or_admin`. + (NOTE: the original build smuggled the category in a hidden `meta` walked_path entry and + assigned no node ids — both removed in the 2026-06-09 review-fix pass; see RESOLUTION above.) - **Frontend (Tasks 13–17):** l1 types/api (intake outcome, TreeNode, categories; nextNode carries `node_text`); L1Dashboard outcome dispatch; L1WalkTreeVariant AI-node rendering + disclaimer banner; owner-gated L1CategoriesPage + route + settings card; ProposalDetail @@ -42,11 +46,14 @@ mocked in tests). - **Tests (Task 18 + throughout):** ~114 Phase 2A backend tests incl. an intake→build→ walk→resolve→proposal / →escalate→notify→list integration test; network-stubbed e2e. -**Verification (Task 19) — numbers below were read from complete run summaries:** -- The 11 Phase 2A backend test files run together = **86 passed / 0 errors / 0 failed** - (`/tmp/p2a.txt`). This is the authoritative Phase-2A gate. -- Frontend `tsc -b` + `npm run lint` + `npm run build` clean; migration `downgrade -3` - → `upgrade head` roundtrips cleanly. +**Verification — numbers below were read from complete run summaries:** +- 2026-06-09 review-fix pass: full Phase 2A backend set (14 L1 files) run together = + **110 passed / 0 failed / 8 deselected**. Frontend `tsc -b` + `eslint` + `vite build` + clean. Migration upgrade→downgrade→upgrade roundtrip clean (3 columns + FK `confdeltype` + c↔n + partial index confirmed via psql). Anti-parrot guardrail green. +- (Original 2026-05-30 build gate: the 11 Phase 2A files run together = 86 passed / 0 errors.) +- Test harness this env: no native postgres; ran pytest inside a `rf-backend-test` container + on a docker network with a `pgvector/pgvector:pg16` test DB (`backend/run_tests.sh` helper). - **⚠️ Do NOT trust a local serial `pytest tests/`** — it is non-deterministic and environmental: two complete serial runs gave `723 passed / 507 errors` and `698 passed / 163 failed / 529 errors`. The thousands of errors are asyncpg diff --git a/.ai/SESSION_LOG.md b/.ai/SESSION_LOG.md index c974af92..8157196d 100644 --- a/.ai/SESSION_LOG.md +++ b/.ai/SESSION_LOG.md @@ -474,3 +474,12 @@ - Outcome: the 11 Phase 2A backend test files run together = **124 passed / 0 errors**; frontend tsc+lint+build clean; migrations downgrade-3→upgrade-head roundtrip clean. Pushed to Gitea, opened **PR #193** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable). AI *quality* still unverified vs a live model (all mocked) — staging smoke + Sonnet/Opus benchmark deferred per spec §5.3. - CORRECTION (integrity): earlier this session I wrote "1376 passed / 0 failed" for the full backend suite — that figure was NEVER from a complete run and is wrong. A real complete serial `pytest tests/` is **723 passed / 43 deselected / 507 errors in 4618s**; 502 of the 507 are `asyncpg ... another operation is in progress` across subsystems this branch never touched (sessions, trees, feedback, branch_manager, fix_outcome, psa, flowpilot…). Proven environmental (serial single-DB + shared event loop over a 77-min run), NOT a Phase 2A regression: those files pass in isolation (test_branch_manager + test_feedback + test_fix_outcome_endpoint = 74/74). CI runs pytest-xdist with per-worker DBs and is the gate. Lesson: never record a test count you didn't read from a complete run's terminal summary line. - Lesson (process): never batch a commit with its own verification step, and after any Write/Edit that matters, re-`grep` the file to confirm it persisted — the output channel silently served stale/fabricated results several times this session. + +## 2026-06-09 — Claude — PR #193 Phase 2A: resolve all 10 review findings +Claude + +- Context: the 2026-06-09 multi-agent review (`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`) found 10 confirmed defects on `feat/l1-ai-tree-builder-phase-2a`, including a showstopper (AI nodes carried no `id`, so ai_build walks never advanced past question 1) and proof that Tasks 16–17 were recorded done but never committed. Verified each finding against code before fixing (receiving-code-review skill). +- Two decisions taken with the user up front (`.ai/DECISIONS.md`): **root fix** for Findings 8/9 — real `category`/`problem_text`/`pending_node` columns on `l1_walk_sessions`, deleting the `{"node_type":"meta"}` walked_path convention (migration `61dda4f615c6`, new head); **restore the ad-hoc walk** (Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button). +- Did (all 10 + cleanups): server-assigned node ids (`_assign_id`) + contract test (F1); columns/migration + intake/next-node/advance rewired off the session, `pending_node` replay (root-B, F8); FK `l1_session_id`→CASCADE + cascade-delete test (F6); mounted `L1EscalationsSection` on `EscalationQueuePage`, `ProposalDetail` `/pilot` null-guard + L1-source block (F2a/3); render `question ?? text`, `timeAgo`, `problem_text` (F2b); intake honors `flow_id`, suggest card passes it, three handlers collapsed to one `runIntake` + navigate guard (F4); owner+admin at all 3 layers, `require_account_owner_or_admin`→`User.can_manage_account`, `User.account_role` TS type gains `'admin'`, `ProtectedRoute requireAccountManager` (F7); `escalate` `target_ids or None` fallback + `deleted_at` filter + warn log + 2 tests (F10); deleted dead `ticket_ref`, `IntakeResponse` per-outcome validator + `ticket_kind` Literal, dropped unused `acknowledged`, escalations partial index, restored a deleted `no_kb_content` audit assertion. +- Outcome: full Phase 2A backend set (14 L1 files) = **110 passed / 0 failed / 8 deselected**; frontend `tsc -b` + `eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean (columns + FK `confdeltype` c↔n + partial index confirmed via psql); anti-parrot guardrail green. Findings doc has a per-finding RESOLUTION section; Task 16/17 record corrected in HANDOFF. Branch uncommitted — commit + push are the next action. +- Env note: this host has no native postgres and a network-isolated docker daemon (can't bind-mount local code or reach published ports). Ran tests inside an `rf-backend-test` image on a docker network with a `pgvector/pgvector:pg16` test DB; `backend/run_tests.sh` docker-cp's changed code into a long-lived runner before pytest. `Dockerfile.test` + `run_tests.sh` are local scaffolding, not committed. diff --git a/docs/plans/2026-06-09-pr193-phase2a-review-findings.md b/docs/plans/2026-06-09-pr193-phase2a-review-findings.md index 18f9eb46..d2aee405 100644 --- a/docs/plans/2026-06-09-pr193-phase2a-review-findings.md +++ b/docs/plans/2026-06-09-pr193-phase2a-review-findings.md @@ -5,6 +5,43 @@ **Process:** 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation). **Verdict: DO NOT MERGE as-is.** The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in `.ai/HANDOFF.md` were never actually committed, and one DB constraint is a deletion time bomb. +--- + +## ✅ RESOLUTION (2026-06-09, same day) + +**All 10 findings resolved.** Two architectural decisions taken (see `.ai/DECISIONS.md`): +the **root fix** for Findings 8/9 (real `category` / `problem_text` / `pending_node` +columns on `l1_walk_sessions`; the `{"node_type":"meta"}` walked_path convention +deleted entirely — migration `61dda4f615c6`), and **restoring the ad-hoc walk** +(Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button). + +- **Finding 1** — `ai_tree_builder._assign_id` stamps `uuid4().hex[:8]` on every node + (generated, depth-cap, generation-failed); `current_node_id` now real. Contract test + added (`test_ai_build_first_node_carries_id_and_advance_grows_walk`). +- **Finding 2a/3** — `L1EscalationsSection` mounted on `EscalationQueuePage`; + `ProposalDetail` `/pilot` link gated on `source_session_id`, L1-source block added. +- **Finding 2b** — renders `step.question ?? step.text`, `timeAgo`, shows `problem_text`. +- **Finding 4** — intake honors explicit `flow_id` (matcher bypassed); suggest card passes + `near_miss.flow_id`; the three intake handlers collapsed into one `runIntake`. +- **Finding 5** — ad-hoc walk restored (option a). +- **Finding 6** — `l1_session_id` FK → `ondelete=CASCADE` (model + migration); cascade-delete test. +- **Finding 7** — owner+admin at all three layers (GET dep, route guard, `usePermissions`); + `require_account_owner_or_admin` delegates to `User.can_manage_account`; `User.account_role` + TS type gains `'admin'`. +- **Finding 8** — `pending_node` column; `/next-node` replays the served node on re-mount + (no duplicate paid generation); reads context off the session (no ticket re-fetch). +- **Finding 9** — meta entry gone → empty walk is falsy (no junk proposal) and the depth + cap counts only real steps. +- **Finding 10** — `escalate` passes `target_ids or None` (default fallback), filters + `deleted_at IS NULL`, warns when empty; two tests. +- **Cleanups** — dead `ticket_ref` deleted, `IntakeResponse` per-outcome validator + `ticket_kind` + Literal restored, unused `acknowledged` dropped, escalations partial index added, restored the + deleted `no_kb_content` audit assertion. + +**Verification:** full Phase 2A backend set **110 passed / 0 failed**; frontend `tsc -b` + +`eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean +(columns + FK `confdeltype` + partial index confirmed); anti-parrot guardrail green. + How to use this file: work the findings in order. Findings 1–7 are merge blockers; 8–10 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms: - **Root cause A:** AI-generated nodes have no `id`, but the advance protocol keys on `node_id`. (Finding 1; touches 8.)