feat(l1): AI decision-tree builder — Phase 2A #193
@@ -1,6 +1,6 @@
|
||||
# CURRENT_TASK.md
|
||||
|
||||
**Active task:** L1 AI Tree Builder **Phase 2A — implementation complete, in review as PR #193** (`feat/l1-ai-tree-builder-phase-2a` → `main`). All 19 plan tasks done; full backend suite 1387 passed/0 failed; frontend tsc+lint+build clean; migrations roundtrip clean. Resume point = check Gitea CI on #193, review, merge; then prod `alembic upgrade head` + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md`.
|
||||
**Active task:** L1 AI Tree Builder **Phase 2A — review findings resolved, PR #193 ready to re-push** (`feat/l1-ai-tree-builder-phase-2a` → `main`). The 2026-06-09 multi-agent review found 10 confirmed defects (incl. a showstopper: AI nodes carried no `id` so walks never advanced); **all 10 resolved this session** (root fix: real columns replace the `meta` walked_path convention; ad-hoc walk restored). Full Phase 2A backend set 110 passed/0 failed; frontend tsc+lint+build clean; migration roundtrip clean (new head `61dda4f615c6`). Resume point = commit + push branch, re-run Gitea CI, merge; then prod `alembic upgrade head` (4 migrations) + a live AI-quality smoke/benchmark before wide enablement (spec §5.3). See `.ai/HANDOFF.md` + `docs/plans/2026-06-09-pr193-phase2a-review-findings.md`.
|
||||
|
||||
**Parallel (user-side, blocked):** Phase O cutover for self-serve signup — all code blockers closed on `main`; only user-side manual ops remain (apex DNS at Namecheap, Stripe Dashboard live-mode config with the `/contact` + `/policies` URLs, Railway prod env vars, internal validation, public flag flip), gated on the EIN.
|
||||
|
||||
|
||||
@@ -13,6 +13,58 @@
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-09 — L1 ai_build context lives in columns, not a hidden `meta` walked_path entry
|
||||
|
||||
**Context:** PR #193 review found that the intake category was smuggled into the
|
||||
ai_build session's `walked_path` as a fake `{"node_type":"meta","category":...}`
|
||||
entry that every consumer had to remember to skip. Most didn't: it made an
|
||||
otherwise-empty walk truthy (junk `pending` proposals reached the review queue),
|
||||
pushed the depth cap off by one (counted as a real step), and rendered as a blank
|
||||
row in the escalations UI. Compounding it, AI-generated nodes carried no `id`, but
|
||||
the advance protocol keys on `node_id` — so the walk could never advance past the
|
||||
first question (the headline feature was non-functional end-to-end).
|
||||
|
||||
**Decision:** Add real `category`, `problem_text`, and `pending_node` columns to
|
||||
`l1_walk_sessions` (migration `61dda4f615c6`) and **delete the meta-entry convention
|
||||
entirely**. Intake stores `category`/`problem_text` on the session; `/next-node`
|
||||
reads them off the row (no ticket re-fetch, no walked_path scan). The server assigns
|
||||
every node a `uuid4().hex[:8]` id (`ai_tree_builder._assign_id`) — never the model.
|
||||
`pending_node` persists the served-but-unanswered node so a refresh / StrictMode
|
||||
double-mount replays it instead of firing a fresh paid LLM call.
|
||||
|
||||
**Rejected:** Symptom-level strip-meta fixes (filter the meta entry at each consumer).
|
||||
Smaller diff, but leaves the landmine convention in place for the next consumer to
|
||||
trip over — contrary to the project principle (correct architecture over minimal diff).
|
||||
Asking the LLM to invent node ids: not stable, not trustworthy.
|
||||
|
||||
**Consequences:** `walked_path` now holds only real steps. Adding a new consumer no
|
||||
longer requires knowing about a hidden entry. `WalkSessionResponse` exposes
|
||||
`category`/`problem_text` (escalations UI shows the real problem). The `meta`
|
||||
node_type and `_strip_meta` are gone.
|
||||
|
||||
---
|
||||
|
||||
## 2026-06-09 — Keep the L1 ad-hoc walk fallback (don't drop it)
|
||||
|
||||
**Context:** The Phase 2A intake rewrite dropped the `else: start_adhoc_session(...)`
|
||||
branch, leaving `start_adhoc_session` with zero callers and the out_of_scope prompt
|
||||
offering only Escalate/Cancel — while `L1CategoriesPage` copy still promised "Disabled
|
||||
categories fall back to an ad-hoc walk or escalation." A capability silently regressed.
|
||||
|
||||
**Decision:** Restore it (review Finding 5 option a). Intake honors `adhoc=True`
|
||||
(a new `IntakeRequest` field → `"adhoc"` outcome) and the out_of_scope prompt gained a
|
||||
"Walk it ad-hoc" button. This preserves the pre-existing free-form-walk capability and
|
||||
keeps the settings copy honest.
|
||||
|
||||
**Rejected:** Dropping ad-hoc and fixing the copy. It removes a capability techs had,
|
||||
for a problem class (out-of-scope) where a free-form walk is the natural fallback before
|
||||
escalation. Cheaper, but a product regression dressed as cleanup.
|
||||
|
||||
**Consequences:** `start_adhoc_session` has a caller again. The walker renders adhoc
|
||||
sessions via its existing non-ai_build branch (free-form notes, no AI tree).
|
||||
|
||||
---
|
||||
|
||||
## 2026-05-29 — Single source of truth for plan-tier taxonomy (derive admin UI + validation from `plan_limits`)
|
||||
|
||||
**Context:** A prod report ("AI sessions aren't working") traced to the owner account having no paid plan (AI is plan-gated), compounded by a real bug: the admin "Change Plan" dropdown ([`AccountDetailPage.tsx:443-445`](../frontend/src/pages/admin/AccountDetailPage.tsx)) still offered the dead `team` slug (renamed to `enterprise` in migration `4ce3e594cb87`, 2026-05-07) and omitted `starter`/`enterprise`. Selecting "Team" 400s against the hardcoded allow-list in [`admin.py:994`](../backend/app/api/endpoints/admin.py#L994). The dropdown was missed during the 2026-05-07 taxonomy reconciliation because the allowed-plan list is hand-duplicated across ≥6 backend + frontend sites. Second taxonomy-drift incident.
|
||||
|
||||
@@ -2,27 +2,30 @@
|
||||
|
||||
# HANDOFF.md
|
||||
|
||||
**Last updated:** 2026-05-30
|
||||
**Last updated:** 2026-06-09
|
||||
|
||||
**Active task:** L1 AI Tree Builder **Phase 2A — COMPLETE**. All 19 plan tasks done on
|
||||
branch `feat/l1-ai-tree-builder-phase-2a` (branched from `main` @ `87236b5`), pushed to
|
||||
Gitea, **PR #193 open** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable):
|
||||
**Active task:** L1 AI Tree Builder **Phase 2A — review findings RESOLVED, ready to re-push**.
|
||||
Branch `feat/l1-ai-tree-builder-phase-2a` (off `main` @ `87236b5`), **PR #193**:
|
||||
<https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/193>.
|
||||
|
||||
## Resume point — FIX REVIEW FINDINGS on PR #193, do NOT merge yet
|
||||
## Resume point — re-push the fixes, re-run CI, then merge
|
||||
|
||||
A 2026-06-09 multi-agent review (7 finder angles, every finding code-verified) found
|
||||
**10 confirmed defects** — including a showstopper (AI-generated nodes carry no `id`,
|
||||
so ai_build walks can never advance past the first question) and proof that Tasks 16–17
|
||||
(ProposalDetail L1-source block, L1EscalationsSection mount) were recorded as done here
|
||||
but were **never committed**. Full findings, evidence (file:line), fixes, and execution
|
||||
order: [`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md).
|
||||
All **10 review findings are resolved** (this session, uncommitted on the branch — commit +
|
||||
push are the next action). Findings doc has a per-finding RESOLUTION section:
|
||||
[`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md).
|
||||
Two architecture decisions logged in `.ai/DECISIONS.md` (2026-06-09): real
|
||||
`category`/`problem_text`/`pending_node` columns replacing the `meta` walked_path
|
||||
convention; ad-hoc walk restored.
|
||||
|
||||
Next session: work that doc top-to-bottom (findings 1–7 are merge blockers), re-run the
|
||||
Phase 2A test gate + tsc/lint/build + migration roundtrip, then resume the old plan:
|
||||
merge PR #193, prod `alembic upgrade head` (3 migrations, head `1fd88a68b145`), and the
|
||||
live AI-quality smoke test before wide enablement (spec §5.3 — all model calls are
|
||||
mocked in tests).
|
||||
Next: commit + push the branch, let Gitea CI run, then merge PR #193. After merge:
|
||||
prod `alembic upgrade head` — now **4 migrations**, new head **`61dda4f615c6`** (adds the
|
||||
three l1_walk_sessions columns + flips `flow_proposals.l1_session_id` FK to CASCADE + an
|
||||
escalations partial index). Then the live AI-quality smoke test before wide enablement
|
||||
(spec §5.3 — all model calls are mocked in tests).
|
||||
|
||||
**Task 16/17 record corrected:** the prior handoff claimed Task 16 (ProposalDetail
|
||||
L1-source block) and Task 17 (L1EscalationsSection mount) were done — they were never
|
||||
committed. Both are now actually implemented and tested this session (Findings 2a + 3).
|
||||
|
||||
## What shipped (all verified this session)
|
||||
|
||||
@@ -32,9 +35,10 @@ mocked in tests).
|
||||
`normalize_walked_path`, skips `meta`), `match_or_build` (match-first, gate-on-build,
|
||||
flow_id→str), `l1_session_service` (start/advance ai_build storing `node_text`, flywheel
|
||||
capture on resolve, escalate notify). `l1.session.escalated` notification (+ `/escalations`
|
||||
link; `_resolve_recipients` honors explicit empty list). API: intake dispatch (build seeds
|
||||
a hidden `{"node_type":"meta","category":...}` walked_path entry), `/next-node`,
|
||||
link; `_resolve_recipients` honors explicit empty list). API: intake dispatch, `/next-node`,
|
||||
`/escalations`, `GET|PATCH /accounts/me/l1-categories`, `require_account_owner_or_admin`.
|
||||
(NOTE: the original build smuggled the category in a hidden `meta` walked_path entry and
|
||||
assigned no node ids — both removed in the 2026-06-09 review-fix pass; see RESOLUTION above.)
|
||||
- **Frontend (Tasks 13–17):** l1 types/api (intake outcome, TreeNode, categories; nextNode
|
||||
carries `node_text`); L1Dashboard outcome dispatch; L1WalkTreeVariant AI-node rendering +
|
||||
disclaimer banner; owner-gated L1CategoriesPage + route + settings card; ProposalDetail
|
||||
@@ -42,11 +46,14 @@ mocked in tests).
|
||||
- **Tests (Task 18 + throughout):** ~114 Phase 2A backend tests incl. an intake→build→
|
||||
walk→resolve→proposal / →escalate→notify→list integration test; network-stubbed e2e.
|
||||
|
||||
**Verification (Task 19) — numbers below were read from complete run summaries:**
|
||||
- The 11 Phase 2A backend test files run together = **86 passed / 0 errors / 0 failed**
|
||||
(`/tmp/p2a.txt`). This is the authoritative Phase-2A gate.
|
||||
- Frontend `tsc -b` + `npm run lint` + `npm run build` clean; migration `downgrade -3`
|
||||
→ `upgrade head` roundtrips cleanly.
|
||||
**Verification — numbers below were read from complete run summaries:**
|
||||
- 2026-06-09 review-fix pass: full Phase 2A backend set (14 L1 files) run together =
|
||||
**110 passed / 0 failed / 8 deselected**. Frontend `tsc -b` + `eslint` + `vite build`
|
||||
clean. Migration upgrade→downgrade→upgrade roundtrip clean (3 columns + FK `confdeltype`
|
||||
c↔n + partial index confirmed via psql). Anti-parrot guardrail green.
|
||||
- (Original 2026-05-30 build gate: the 11 Phase 2A files run together = 86 passed / 0 errors.)
|
||||
- Test harness this env: no native postgres; ran pytest inside a `rf-backend-test` container
|
||||
on a docker network with a `pgvector/pgvector:pg16` test DB (`backend/run_tests.sh` helper).
|
||||
- **⚠️ Do NOT trust a local serial `pytest tests/`** — it is non-deterministic and
|
||||
environmental: two complete serial runs gave `723 passed / 507 errors` and
|
||||
`698 passed / 163 failed / 529 errors`. The thousands of errors are asyncpg
|
||||
|
||||
@@ -474,3 +474,12 @@
|
||||
- Outcome: the 11 Phase 2A backend test files run together = **124 passed / 0 errors**; frontend tsc+lint+build clean; migrations downgrade-3→upgrade-head roundtrip clean. Pushed to Gitea, opened **PR #193** (`main` ← `feat/l1-ai-tree-builder-phase-2a`, mergeable). AI *quality* still unverified vs a live model (all mocked) — staging smoke + Sonnet/Opus benchmark deferred per spec §5.3.
|
||||
- CORRECTION (integrity): earlier this session I wrote "1376 passed / 0 failed" for the full backend suite — that figure was NEVER from a complete run and is wrong. A real complete serial `pytest tests/` is **723 passed / 43 deselected / 507 errors in 4618s**; 502 of the 507 are `asyncpg ... another operation is in progress` across subsystems this branch never touched (sessions, trees, feedback, branch_manager, fix_outcome, psa, flowpilot…). Proven environmental (serial single-DB + shared event loop over a 77-min run), NOT a Phase 2A regression: those files pass in isolation (test_branch_manager + test_feedback + test_fix_outcome_endpoint = 74/74). CI runs pytest-xdist with per-worker DBs and is the gate. Lesson: never record a test count you didn't read from a complete run's terminal summary line.
|
||||
- Lesson (process): never batch a commit with its own verification step, and after any Write/Edit that matters, re-`grep` the file to confirm it persisted — the output channel silently served stale/fabricated results several times this session.
|
||||
|
||||
## 2026-06-09 — Claude — PR #193 Phase 2A: resolve all 10 review findings
|
||||
<agent>Claude</agent>
|
||||
|
||||
- Context: the 2026-06-09 multi-agent review (`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`) found 10 confirmed defects on `feat/l1-ai-tree-builder-phase-2a`, including a showstopper (AI nodes carried no `id`, so ai_build walks never advanced past question 1) and proof that Tasks 16–17 were recorded done but never committed. Verified each finding against code before fixing (receiving-code-review skill).
|
||||
- Two decisions taken with the user up front (`.ai/DECISIONS.md`): **root fix** for Findings 8/9 — real `category`/`problem_text`/`pending_node` columns on `l1_walk_sessions`, deleting the `{"node_type":"meta"}` walked_path convention (migration `61dda4f615c6`, new head); **restore the ad-hoc walk** (Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
|
||||
- Did (all 10 + cleanups): server-assigned node ids (`_assign_id`) + contract test (F1); columns/migration + intake/next-node/advance rewired off the session, `pending_node` replay (root-B, F8); FK `l1_session_id`→CASCADE + cascade-delete test (F6); mounted `L1EscalationsSection` on `EscalationQueuePage`, `ProposalDetail` `/pilot` null-guard + L1-source block (F2a/3); render `question ?? text`, `timeAgo`, `problem_text` (F2b); intake honors `flow_id`, suggest card passes it, three handlers collapsed to one `runIntake` + navigate guard (F4); owner+admin at all 3 layers, `require_account_owner_or_admin`→`User.can_manage_account`, `User.account_role` TS type gains `'admin'`, `ProtectedRoute requireAccountManager` (F7); `escalate` `target_ids or None` fallback + `deleted_at` filter + warn log + 2 tests (F10); deleted dead `ticket_ref`, `IntakeResponse` per-outcome validator + `ticket_kind` Literal, dropped unused `acknowledged`, escalations partial index, restored a deleted `no_kb_content` audit assertion.
|
||||
- Outcome: full Phase 2A backend set (14 L1 files) = **110 passed / 0 failed / 8 deselected**; frontend `tsc -b` + `eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean (columns + FK `confdeltype` c↔n + partial index confirmed via psql); anti-parrot guardrail green. Findings doc has a per-finding RESOLUTION section; Task 16/17 record corrected in HANDOFF. Branch uncommitted — commit + push are the next action.
|
||||
- Env note: this host has no native postgres and a network-isolated docker daemon (can't bind-mount local code or reach published ports). Ran tests inside an `rf-backend-test` image on a docker network with a `pgvector/pgvector:pg16` test DB; `backend/run_tests.sh` docker-cp's changed code into a long-lived runner before pytest. `Dockerfile.test` + `run_tests.sh` are local scaffolding, not committed.
|
||||
|
||||
@@ -5,6 +5,43 @@
|
||||
**Process:** 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation).
|
||||
**Verdict: DO NOT MERGE as-is.** The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in `.ai/HANDOFF.md` were never actually committed, and one DB constraint is a deletion time bomb.
|
||||
|
||||
---
|
||||
|
||||
## ✅ RESOLUTION (2026-06-09, same day)
|
||||
|
||||
**All 10 findings resolved.** Two architectural decisions taken (see `.ai/DECISIONS.md`):
|
||||
the **root fix** for Findings 8/9 (real `category` / `problem_text` / `pending_node`
|
||||
columns on `l1_walk_sessions`; the `{"node_type":"meta"}` walked_path convention
|
||||
deleted entirely — migration `61dda4f615c6`), and **restoring the ad-hoc walk**
|
||||
(Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
|
||||
|
||||
- **Finding 1** — `ai_tree_builder._assign_id` stamps `uuid4().hex[:8]` on every node
|
||||
(generated, depth-cap, generation-failed); `current_node_id` now real. Contract test
|
||||
added (`test_ai_build_first_node_carries_id_and_advance_grows_walk`).
|
||||
- **Finding 2a/3** — `L1EscalationsSection` mounted on `EscalationQueuePage`;
|
||||
`ProposalDetail` `/pilot` link gated on `source_session_id`, L1-source block added.
|
||||
- **Finding 2b** — renders `step.question ?? step.text`, `timeAgo`, shows `problem_text`.
|
||||
- **Finding 4** — intake honors explicit `flow_id` (matcher bypassed); suggest card passes
|
||||
`near_miss.flow_id`; the three intake handlers collapsed into one `runIntake`.
|
||||
- **Finding 5** — ad-hoc walk restored (option a).
|
||||
- **Finding 6** — `l1_session_id` FK → `ondelete=CASCADE` (model + migration); cascade-delete test.
|
||||
- **Finding 7** — owner+admin at all three layers (GET dep, route guard, `usePermissions`);
|
||||
`require_account_owner_or_admin` delegates to `User.can_manage_account`; `User.account_role`
|
||||
TS type gains `'admin'`.
|
||||
- **Finding 8** — `pending_node` column; `/next-node` replays the served node on re-mount
|
||||
(no duplicate paid generation); reads context off the session (no ticket re-fetch).
|
||||
- **Finding 9** — meta entry gone → empty walk is falsy (no junk proposal) and the depth
|
||||
cap counts only real steps.
|
||||
- **Finding 10** — `escalate` passes `target_ids or None` (default fallback), filters
|
||||
`deleted_at IS NULL`, warns when empty; two tests.
|
||||
- **Cleanups** — dead `ticket_ref` deleted, `IntakeResponse` per-outcome validator + `ticket_kind`
|
||||
Literal restored, unused `acknowledged` dropped, escalations partial index added, restored the
|
||||
deleted `no_kb_content` audit assertion.
|
||||
|
||||
**Verification:** full Phase 2A backend set **110 passed / 0 failed**; frontend `tsc -b` +
|
||||
`eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean
|
||||
(columns + FK `confdeltype` + partial index confirmed); anti-parrot guardrail green.
|
||||
|
||||
How to use this file: work the findings in order. Findings 1–7 are merge blockers; 8–10 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:
|
||||
|
||||
- **Root cause A:** AI-generated nodes have no `id`, but the advance protocol keys on `node_id`. (Finding 1; touches 8.)
|
||||
|
||||
Reference in New Issue
Block a user