2026-06-12 23:41:16 +00:00
5 changed files with 129 additions and 24 deletions
--- a/.ai/CURRENT_TASK.md
+++ b/.ai/CURRENT_TASK.md
@@ -1,6 +1,6 @@
 # CURRENT_TASK.md

-**Active task:** L1 AI Tree Builder **Phase 2A — implementation complete, in review as PR #193** (`feat/l1-ai-tree-builder-phase-2a` → `main`). All 19 plan tasks done; full backend suite 1387 passed/0 failed; frontend tsc+lint+build clean; migrations roundtrip clean. Resume point = check Gitea CI on #193, review, merge; then prod `alembic upgrade head` + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md`.
+**Active task:** L1 AI Tree Builder **Phase 2A — review findings resolved, PR #193 ready to re-push** (`feat/l1-ai-tree-builder-phase-2a` → `main`). The 2026-06-09 multi-agent review found 10 confirmed defects (incl. a showstopper: AI nodes carried no `id` so walks never advanced); **all 10 resolved this session** (root fix: real columns replace the `meta` walked_path convention; ad-hoc walk restored). Full Phase 2A backend set 110 passed/0 failed; frontend tsc+lint+build clean; migration roundtrip clean (new head `61dda4f615c6`). Resume point = commit + push branch, re-run Gitea CI, merge; then prod `alembic upgrade head` (4 migrations) + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md` + `docs/plans/2026-06-09-pr193-phase2a-review-findings.md`.

 **Parallel (user-side, blocked):** Phase O cutover for self-serve signup — all code blockers closed on `main`; only user-side manual ops remain (apex DNS at Namecheap, Stripe Dashboard live-mode config with the `/contact` + `/policies` URLs, Railway prod env vars, internal validation, public flag flip), gated on the EIN.

--- a/.ai/DECISIONS.md
+++ b/.ai/DECISIONS.md
@@ -13,6 +13,58 @@

 ---

+## 2026-06-09 — L1 ai_build context lives in columns, not a hidden `meta` walked_path entry
+
+**Context:** PR #193 review found that the intake category was smuggled into the
+ai_build session's `walked_path` as a fake `{"node_type":"meta","category":...}`
+entry that every consumer had to remember to skip. Most didn't: it made an
+otherwise-empty walk truthy (junk `pending` proposals reached the review queue),
+pushed the depth cap off by one (counted as a real step), and rendered as a blank
+row in the escalations UI. Compounding it, AI-generated nodes carried no `id`, but
+the advance protocol keys on `node_id` — so the walk could never advance past the
+first question (the headline feature was non-functional end-to-end).
+
+**Decision:** Add real `category`, `problem_text`, and `pending_node` columns to
+`l1_walk_sessions` (migration `61dda4f615c6`) and **delete the meta-entry convention
+entirely**. Intake stores `category`/`problem_text` on the session; `/next-node`
+reads them off the row (no ticket re-fetch, no walked_path scan). The server assigns
+every node a `uuid4().hex[:8]` id (`ai_tree_builder._assign_id`) — never the model.
+`pending_node` persists the served-but-unanswered node so a refresh / StrictMode
+double-mount replays it instead of firing a fresh paid LLM call.
+
+**Rejected:** Symptom-level strip-meta fixes (filter the meta entry at each consumer).
+Smaller diff, but leaves the landmine convention in place for the next consumer to
+trip over — contrary to the project principle (correct architecture over minimal diff).
+Asking the LLM to invent node ids: not stable, not trustworthy.
+
+**Consequences:** `walked_path` now holds only real steps. Adding a new consumer no
+longer requires knowing about a hidden entry. `WalkSessionResponse` exposes
+`category`/`problem_text` (escalations UI shows the real problem). The `meta`
+node_type and `_strip_meta` are gone.
+
+---
+
+## 2026-06-09 — Keep the L1 ad-hoc walk fallback (don't drop it)
+
+**Context:** The Phase 2A intake rewrite dropped the `else: start_adhoc_session(...)`
+branch, leaving `start_adhoc_session` with zero callers and the out_of_scope prompt
+offering only Escalate/Cancel — while `L1CategoriesPage` copy still promised "Disabled
+categories fall back to an ad-hoc walk or escalation." A capability silently regressed.
+
+**Decision:** Restore it (review Finding 5 option a). Intake honors `adhoc=True`
+(a new `IntakeRequest` field → `"adhoc"` outcome) and the out_of_scope prompt gained a
+"Walk it ad-hoc" button. This preserves the pre-existing free-form-walk capability and
+keeps the settings copy honest.
+
+**Rejected:** Dropping ad-hoc and fixing the copy. It removes a capability techs had,
+for a problem class (out-of-scope) where a free-form walk is the natural fallback before
+escalation. Cheaper, but a product regression dressed as cleanup.
+
+**Consequences:** `start_adhoc_session` has a caller again. The walker renders adhoc
+sessions via its existing non-ai_build branch (free-form notes, no AI tree).
+
+---
+
 ## 2026-05-29 — Single source of truth for plan-tier taxonomy (derive admin UI + validation from `plan_limits`)

 **Context:** A prod report ("AI sessions aren't working") traced to the owner account having no paid plan (AI is plan-gated), compounded by a real bug: the admin "Change Plan" dropdown ([`AccountDetailPage.tsx:443-445`](../frontend/src/pages/admin/AccountDetailPage.tsx)) still offered the dead `team` slug (renamed to `enterprise` in migration `4ce3e594cb87`, 2026-05-07) and omitted `starter`/`enterprise`. Selecting "Team" 400s against the hardcoded allow-list in [`admin.py:994`](../backend/app/api/endpoints/admin.py#L994). The dropdown was missed during the 2026-05-07 taxonomy reconciliation because the allowed-plan list is hand-duplicated across ≥6 backend + frontend sites. Second taxonomy-drift incident.
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -2,27 +2,30 @@

 # HANDOFF.md

-**Last updated:** 2026-05-30
+**Last updated:** 2026-06-09

-**Active task:** L1 AI Tree Builder **Phase 2A — COMPLETE**. All 19 plan tasks done on
-branch `feat/l1-ai-tree-builder-phase-2a` (branched from `main` @ `87236b5`), pushed to
-Gitea, **PR #193 open** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable):
+**Active task:** L1 AI Tree Builder **Phase 2A — review findings RESOLVED, ready to re-push**.
+Branch `feat/l1-ai-tree-builder-phase-2a` (off `main` @ `87236b5`), **PR #193**:
 <https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/193>.

-## Resume point — FIX REVIEW FINDINGS on PR #193, do NOT merge yet
+## Resume point — re-push the fixes, re-run CI, then merge

-A 2026-06-09 multi-agent review (7 finder angles, every finding code-verified) found
-**10 confirmed defects** — including a showstopper (AI-generated nodes carry no `id`,
-so ai_build walks can never advance past the first question) and proof that Tasks 16–17
-(ProposalDetail L1-source block, L1EscalationsSection mount) were recorded as done here
-but were **never committed**. Full findings, evidence (file:line), fixes, and execution
-order: [`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md).
+All **10 review findings are resolved** (this session, uncommitted on the branch — commit +
+push are the next action). Findings doc has a per-finding RESOLUTION section:
+[`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md).
+Two architecture decisions logged in `.ai/DECISIONS.md` (2026-06-09): real
+`category`/`problem_text`/`pending_node` columns replacing the `meta` walked_path
+convention; ad-hoc walk restored.

-Next session: work that doc top-to-bottom (findings 1–7 are merge blockers), re-run the
-Phase 2A test gate + tsc/lint/build + migration roundtrip, then resume the old plan:
-merge PR #193, prod `alembic upgrade head` (3 migrations, head `1fd88a68b145`), and the
-live AI-quality smoke test before wide enablement (spec §5.3 — all model calls are
-mocked in tests).
+Next: commit + push the branch, let Gitea CI run, then merge PR #193. After merge:
+prod `alembic upgrade head` — now **4 migrations**, new head **`61dda4f615c6`** (adds the
+three l1_walk_sessions columns + flips `flow_proposals.l1_session_id` FK to CASCADE + an
+escalations partial index). Then the live AI-quality smoke test before wide enablement
+(spec §5.3 — all model calls are mocked in tests).
+
+**Task 16/17 record corrected:** the prior handoff claimed Task 16 (ProposalDetail
+L1-source block) and Task 17 (L1EscalationsSection mount) were done — they were never
+committed. Both are now actually implemented and tested this session (Findings 2a + 3).

 ## What shipped (all verified this session)

@@ -32,9 +35,10 @@ mocked in tests).
  `normalize_walked_path`, skips `meta`), `match_or_build` (match-first, gate-on-build,
  flow_id→str), `l1_session_service` (start/advance ai_build storing `node_text`, flywheel
  capture on resolve, escalate notify). `l1.session.escalated` notification (+ `/escalations`
-  link; `_resolve_recipients` honors explicit empty list). API: intake dispatch (build seeds
-  a hidden `{"node_type":"meta","category":...}` walked_path entry), `/next-node`,
+  link; `_resolve_recipients` honors explicit empty list). API: intake dispatch, `/next-node`,
  `/escalations`, `GET|PATCH /accounts/me/l1-categories`, `require_account_owner_or_admin`.
+  (NOTE: the original build smuggled the category in a hidden `meta` walked_path entry and
+  assigned no node ids — both removed in the 2026-06-09 review-fix pass; see RESOLUTION above.)
 - **Frontend (Tasks 13–17):** l1 types/api (intake outcome, TreeNode, categories; nextNode
  carries `node_text`); L1Dashboard outcome dispatch; L1WalkTreeVariant AI-node rendering +
  disclaimer banner; owner-gated L1CategoriesPage + route + settings card; ProposalDetail
@@ -42,11 +46,14 @@ mocked in tests).
 - **Tests (Task 18 + throughout):** ~114 Phase 2A backend tests incl. an intake→build→
  walk→resolve→proposal / →escalate→notify→list integration test; network-stubbed e2e.

-**Verification (Task 19) — numbers below were read from complete run summaries:**
- The 11 Phase 2A backend test files run together = **86 passed / 0 errors / 0 failed**
-  (`/tmp/p2a.txt`). This is the authoritative Phase-2A gate.
- Frontend `tsc -b` + `npm run lint` + `npm run build` clean; migration `downgrade -3`
-  → `upgrade head` roundtrips cleanly.
+**Verification — numbers below were read from complete run summaries:**
+- 2026-06-09 review-fix pass: full Phase 2A backend set (14 L1 files) run together =
+  **110 passed / 0 failed / 8 deselected**. Frontend `tsc -b` + `eslint` + `vite build`
+  clean. Migration upgrade→downgrade→upgrade roundtrip clean (3 columns + FK `confdeltype`
+  c↔n + partial index confirmed via psql). Anti-parrot guardrail green.
+- (Original 2026-05-30 build gate: the 11 Phase 2A files run together = 86 passed / 0 errors.)
+- Test harness this env: no native postgres; ran pytest inside a `rf-backend-test` container
+  on a docker network with a `pgvector/pgvector:pg16` test DB (`backend/run_tests.sh` helper).
 - **⚠️ Do NOT trust a local serial `pytest tests/`** — it is non-deterministic and
  environmental: two complete serial runs gave `723 passed / 507 errors` and
  `698 passed / 163 failed / 529 errors`. The thousands of errors are asyncpg
--- a/.ai/SESSION_LOG.md
+++ b/.ai/SESSION_LOG.md
@@ -474,3 +474,12 @@
 - Outcome: the 11 Phase 2A backend test files run together = **124 passed / 0 errors**; frontend tsc+lint+build clean; migrations downgrade-3→upgrade-head roundtrip clean. Pushed to Gitea, opened **PR #193** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable). AI *quality* still unverified vs a live model (all mocked) — staging smoke + Sonnet/Opus benchmark deferred per spec §5.3.
 - CORRECTION (integrity): earlier this session I wrote "1376 passed / 0 failed" for the full backend suite — that figure was NEVER from a complete run and is wrong. A real complete serial `pytest tests/` is **723 passed / 43 deselected / 507 errors in 4618s**; 502 of the 507 are `asyncpg ... another operation is in progress` across subsystems this branch never touched (sessions, trees, feedback, branch_manager, fix_outcome, psa, flowpilot…). Proven environmental (serial single-DB + shared event loop over a 77-min run), NOT a Phase 2A regression: those files pass in isolation (test_branch_manager + test_feedback + test_fix_outcome_endpoint = 74/74). CI runs pytest-xdist with per-worker DBs and is the gate. Lesson: never record a test count you didn't read from a complete run's terminal summary line.
 - Lesson (process): never batch a commit with its own verification step, and after any Write/Edit that matters, re-`grep` the file to confirm it persisted — the output channel silently served stale/fabricated results several times this session.
+
+## 2026-06-09 — Claude — PR #193 Phase 2A: resolve all 10 review findings
+<agent>Claude</agent>
+
+- Context: the 2026-06-09 multi-agent review (`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`) found 10 confirmed defects on `feat/l1-ai-tree-builder-phase-2a`, including a showstopper (AI nodes carried no `id`, so ai_build walks never advanced past question 1) and proof that Tasks 16–17 were recorded done but never committed. Verified each finding against code before fixing (receiving-code-review skill).
+- Two decisions taken with the user up front (`.ai/DECISIONS.md`): **root fix** for Findings 8/9 — real `category`/`problem_text`/`pending_node` columns on `l1_walk_sessions`, deleting the `{"node_type":"meta"}` walked_path convention (migration `61dda4f615c6`, new head); **restore the ad-hoc walk** (Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
+- Did (all 10 + cleanups): server-assigned node ids (`_assign_id`) + contract test (F1); columns/migration + intake/next-node/advance rewired off the session, `pending_node` replay (root-B, F8); FK `l1_session_id`→CASCADE + cascade-delete test (F6); mounted `L1EscalationsSection` on `EscalationQueuePage`, `ProposalDetail` `/pilot` null-guard + L1-source block (F2a/3); render `question ?? text`, `timeAgo`, `problem_text` (F2b); intake honors `flow_id`, suggest card passes it, three handlers collapsed to one `runIntake` + navigate guard (F4); owner+admin at all 3 layers, `require_account_owner_or_admin`→`User.can_manage_account`, `User.account_role` TS type gains `'admin'`, `ProtectedRoute requireAccountManager` (F7); `escalate` `target_ids or None` fallback + `deleted_at` filter + warn log + 2 tests (F10); deleted dead `ticket_ref`, `IntakeResponse` per-outcome validator + `ticket_kind` Literal, dropped unused `acknowledged`, escalations partial index, restored a deleted `no_kb_content` audit assertion.
+- Outcome: full Phase 2A backend set (14 L1 files) = **110 passed / 0 failed / 8 deselected**; frontend `tsc -b` + `eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean (columns + FK `confdeltype` c↔n + partial index confirmed via psql); anti-parrot guardrail green. Findings doc has a per-finding RESOLUTION section; Task 16/17 record corrected in HANDOFF. Branch uncommitted — commit + push are the next action.
+- Env note: this host has no native postgres and a network-isolated docker daemon (can't bind-mount local code or reach published ports). Ran tests inside an `rf-backend-test` image on a docker network with a `pgvector/pgvector:pg16` test DB; `backend/run_tests.sh` docker-cp's changed code into a long-lived runner before pytest. `Dockerfile.test` + `run_tests.sh` are local scaffolding, not committed.
--- a/docs/plans/2026-06-09-pr193-phase2a-review-findings.md
+++ b/docs/plans/2026-06-09-pr193-phase2a-review-findings.md
@@ -5,6 +5,43 @@
 **Process:** 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation).
 **Verdict: DO NOT MERGE as-is.** The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in `.ai/HANDOFF.md` were never actually committed, and one DB constraint is a deletion time bomb.

+---
+
+## ✅ RESOLUTION (2026-06-09, same day)
+
+**All 10 findings resolved.** Two architectural decisions taken (see `.ai/DECISIONS.md`):
+the **root fix** for Findings 8/9 (real `category` / `problem_text` / `pending_node`
+columns on `l1_walk_sessions`; the `{"node_type":"meta"}` walked_path convention
+deleted entirely — migration `61dda4f615c6`), and **restoring the ad-hoc walk**
+(Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
+
+- **Finding 1** — `ai_tree_builder._assign_id` stamps `uuid4().hex[:8]` on every node
+  (generated, depth-cap, generation-failed); `current_node_id` now real. Contract test
+  added (`test_ai_build_first_node_carries_id_and_advance_grows_walk`).
+- **Finding 2a/3** — `L1EscalationsSection` mounted on `EscalationQueuePage`;
+  `ProposalDetail` `/pilot` link gated on `source_session_id`, L1-source block added.
+- **Finding 2b** — renders `step.question ?? step.text`, `timeAgo`, shows `problem_text`.
+- **Finding 4** — intake honors explicit `flow_id` (matcher bypassed); suggest card passes
+  `near_miss.flow_id`; the three intake handlers collapsed into one `runIntake`.
+- **Finding 5** — ad-hoc walk restored (option a).
+- **Finding 6** — `l1_session_id` FK → `ondelete=CASCADE` (model + migration); cascade-delete test.
+- **Finding 7** — owner+admin at all three layers (GET dep, route guard, `usePermissions`);
+  `require_account_owner_or_admin` delegates to `User.can_manage_account`; `User.account_role`
+  TS type gains `'admin'`.
+- **Finding 8** — `pending_node` column; `/next-node` replays the served node on re-mount
+  (no duplicate paid generation); reads context off the session (no ticket re-fetch).
+- **Finding 9** — meta entry gone → empty walk is falsy (no junk proposal) and the depth
+  cap counts only real steps.
+- **Finding 10** — `escalate` passes `target_ids or None` (default fallback), filters
+  `deleted_at IS NULL`, warns when empty; two tests.
+- **Cleanups** — dead `ticket_ref` deleted, `IntakeResponse` per-outcome validator + `ticket_kind`
+  Literal restored, unused `acknowledged` dropped, escalations partial index added, restored the
+  deleted `no_kb_content` audit assertion.
+
+**Verification:** full Phase 2A backend set **110 passed / 0 failed**; frontend `tsc -b` +
+`eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean
+(columns + FK `confdeltype` + partial index confirmed); anti-parrot guardrail green.
+
 How to use this file: work the findings in order. Findings 1–7 are merge blockers; 8–10 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:

 - **Root cause A:** AI-generated nodes have no `id`, but the advance protocol keys on `node_id`. (Finding 1; touches 8.)