Findings doc gets a per-finding RESOLUTION section; HANDOFF resume point moves to "re-push + merge" and corrects the false Task 16/17 "done" record; CURRENT_TASK updated; two architectural decisions logged (real ai_build columns replacing the meta convention; ad-hoc walk restored); SESSION_LOG entry added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
181 lines
20 KiB
Markdown
181 lines
20 KiB
Markdown
# PR #193 (Phase 2A — L1 AI Tree Builder) Review Findings
|
||
|
||
**Date:** 2026-06-09
|
||
**Reviewed:** `feat/l1-ai-tree-builder-phase-2a` vs `main` (42 files, +2,326/−154)
|
||
**Process:** 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation).
|
||
**Verdict: DO NOT MERGE as-is.** The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in `.ai/HANDOFF.md` were never actually committed, and one DB constraint is a deletion time bomb.
|
||
|
||
---
|
||
|
||
## ✅ RESOLUTION (2026-06-09, same day)
|
||
|
||
**All 10 findings resolved.** Two architectural decisions taken (see `.ai/DECISIONS.md`):
|
||
the **root fix** for Findings 8/9 (real `category` / `problem_text` / `pending_node`
|
||
columns on `l1_walk_sessions`; the `{"node_type":"meta"}` walked_path convention
|
||
deleted entirely — migration `61dda4f615c6`), and **restoring the ad-hoc walk**
|
||
(Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
|
||
|
||
- **Finding 1** — `ai_tree_builder._assign_id` stamps `uuid4().hex[:8]` on every node
|
||
(generated, depth-cap, generation-failed); `current_node_id` now real. Contract test
|
||
added (`test_ai_build_first_node_carries_id_and_advance_grows_walk`).
|
||
- **Finding 2a/3** — `L1EscalationsSection` mounted on `EscalationQueuePage`;
|
||
`ProposalDetail` `/pilot` link gated on `source_session_id`, L1-source block added.
|
||
- **Finding 2b** — renders `step.question ?? step.text`, `timeAgo`, shows `problem_text`.
|
||
- **Finding 4** — intake honors explicit `flow_id` (matcher bypassed); suggest card passes
|
||
`near_miss.flow_id`; the three intake handlers collapsed into one `runIntake`.
|
||
- **Finding 5** — ad-hoc walk restored (option a).
|
||
- **Finding 6** — `l1_session_id` FK → `ondelete=CASCADE` (model + migration); cascade-delete test.
|
||
- **Finding 7** — owner+admin at all three layers (GET dep, route guard, `usePermissions`);
|
||
`require_account_owner_or_admin` delegates to `User.can_manage_account`; `User.account_role`
|
||
TS type gains `'admin'`.
|
||
- **Finding 8** — `pending_node` column; `/next-node` replays the served node on re-mount
|
||
(no duplicate paid generation); reads context off the session (no ticket re-fetch).
|
||
- **Finding 9** — meta entry gone → empty walk is falsy (no junk proposal) and the depth
|
||
cap counts only real steps.
|
||
- **Finding 10** — `escalate` passes `target_ids or None` (default fallback), filters
|
||
`deleted_at IS NULL`, warns when empty; two tests.
|
||
- **Cleanups** — dead `ticket_ref` deleted, `IntakeResponse` per-outcome validator + `ticket_kind`
|
||
Literal restored, unused `acknowledged` dropped, escalations partial index added, restored the
|
||
deleted `no_kb_content` audit assertion.
|
||
|
||
**Verification:** full Phase 2A backend set **110 passed / 0 failed**; frontend `tsc -b` +
|
||
`eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean
|
||
(columns + FK `confdeltype` + partial index confirmed); anti-parrot guardrail green.
|
||
|
||
How to use this file: work the findings in order. Findings 1–7 are merge blockers; 8–10 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:
|
||
|
||
- **Root cause A:** AI-generated nodes have no `id`, but the advance protocol keys on `node_id`. (Finding 1; touches 8.)
|
||
- **Root cause B:** The intake category is smuggled into `walked_path` as a fake `{"node_type":"meta"}` entry that every consumer must know to skip — and most don't. (Findings 2b, 9; the deeper fix is a real `category` column on `l1_walk_sessions`, plus `problem_text` while you're there — see Finding 8's note.)
|
||
|
||
---
|
||
|
||
## MERGE BLOCKERS
|
||
|
||
### 1. AI walkthrough can never advance past the first question (showstopper)
|
||
|
||
**Evidence:**
|
||
- `backend/app/services/ai_tree_builder.py:37-41` — SYSTEM_PROMPT's JSON output shapes (`{"node_type":"question","text":...}` etc.) define **no `id` field**; `validate_node` (lines 71-79) returns the node unchanged; nothing anywhere assigns an id.
|
||
- `backend/app/services/l1_session_service.py:156` — `advance_ai_build` only appends to `walked_path` `if node_id is not None`; docstring (line 139) says "On the first call (node_id is None) nothing is appended."
|
||
- `frontend/src/components/l1/L1WalkTreeVariant.tsx:52-54` — sends `node_id: node.id`, which is `undefined` at runtime (server never sends an id; `JSON.stringify` drops undefined keys) → backend always receives `node_id=None`.
|
||
- `l1_session_service.py:174` — `session.current_node_id = next_node.get("id")` is always `None`.
|
||
|
||
**User impact:** Tech answers the first question → answer is discarded, the same (or a re-rolled) first question regenerates forever. `walked_path` never grows past the meta entry, the depth cap never fires, and resolve captures an empty tree.
|
||
|
||
**Why tests missed it:** `test_l1_api_ai_build` and friends mock `advance_ai_build` / hand-craft nodes **with** `id` keys — a shape the real model is never instructed to produce.
|
||
|
||
**Fix:** Assign a server-side id to every generated node before returning it (e.g., `uuid4().hex[:8]` in `generate_next_node` after `validate_node`), persist it as `session.current_node_id`, and add a test that runs the real (unmocked-shape) prompt contract: generate → assert node has id → advance with that id → assert walked_path grew. Do NOT ask the LLM to invent ids.
|
||
|
||
### 2. Escalations from AI sessions go nowhere (two linked defects)
|
||
|
||
**2a — Component never mounted.**
|
||
- `grep -rn "L1EscalationsSection" frontend/src` → exactly one hit: its own definition (`frontend/src/components/l1/L1EscalationsSection.tsx:10`). It is imported nowhere.
|
||
- `frontend/src/pages/EscalationQueuePage.tsx:3` imports only `EscalationQueue, EscalationMetricCard` from `@/components/flowpilot`.
|
||
- `backend/app/services/notification_service.py:449` — `"l1.session.escalated": "/escalations"` deep-link → `frontend/src/router.tsx:299` renders `EscalationQueuePage` → engineer sees only FlowPilot escalations; the L1 handoff is invisible. `GET /l1/escalations` (`backend/app/api/endpoints/l1.py:330`) has no UI surface.
|
||
- **Note:** `.ai/HANDOFF.md:38` claims "L1EscalationsSection on EscalationQueuePage" — that claim is false (documentation drift; see Finding 3).
|
||
|
||
**2b — Component renders wrong fields once mounted.**
|
||
- `backend/app/services/l1_session_service.py:162-168` — ai_build entries are `{"node_type", "id", "text", "answer", "l1_note"}` (key is `text`); legacy `record_step` (lines 199-204) uses `question`/`node_id`.
|
||
- `L1EscalationsSection.tsx:61` renders `{step.question}` → blank for every ai_build entry.
|
||
- `L1EscalationsSection.tsx:46` — `{s.walked_path.length} steps walked` counts the hidden meta entry → "N+1 steps walked".
|
||
- `backend/app/api/endpoints/l1.py:41` — `_to_response` returns `walked_path` raw (meta entry included).
|
||
|
||
**Fix:** Mount `L1EscalationsSection` on `EscalationQueuePage` (or fold L1 rows into the existing queue). Render `step.question ?? step.text`. Filter `node_type === 'meta'` entries — ideally server-side in `_to_response` (or eliminate the meta entry entirely per Root cause B). Also use the shared `timeAgo` util (`frontend/src/lib/timeAgo.ts`) instead of `new Date(...).toLocaleString()` at line 50, to match every sibling queue.
|
||
|
||
### 3. Two tasks recorded as complete were never committed
|
||
|
||
- Task 16 ("ProposalDetail L1-source block") and Task 17 (mounting the escalations section) appear in `.ai/HANDOFF.md` / `SESSION_LOG.md` as done, but **no hunk for `ProposalDetail.tsx` exists in the diff**, and Finding 2a proves the mount never happened.
|
||
- Concrete user impact today: `frontend/src/components/flowpilot/ProposalDetail.tsx:91-101` renders the "Source Session" card unconditionally; line 95 is `` to={`/pilot/${proposal.source_session_id}`} `` with no null guard. L1-sourced proposals (created with `source_session_id=None`, `l1_session_id=<session>`) reach the review queue as `pending` → engineers get a broken **`/pilot/null`** link.
|
||
|
||
**Fix:** Implement the missing work: in `ProposalDetail.tsx`, gate the `/pilot/` link on `source_session_id != null` and render an L1-source block (problem statement, category, link to the L1 session / escalations view) when `l1_session_id` is set. The backend already serves `l1_session_id` via `backend/app/schemas/flow_proposal.py`. Then correct `.ai/HANDOFF.md`.
|
||
|
||
### 4. "Use this flow" button silently does nothing
|
||
|
||
- `frontend/src/pages/l1/L1Dashboard.tsx:77-86` — `useSuggestedFlow` re-POSTs `/l1/intake` with the same text, no `flow_id`. The in-code comment ("it matches again and returns a `matched` outcome") is factually wrong: the same text scores in the same 0.60–0.75 suggest band (`backend/app/services/match_or_build.py:66-72`, `MATCH_THRESHOLD = 0.75` line 21) → `suggest` again, no `session_id` → handler falls to `resetPrompts()` and the card vanishes. The suggested flow can never be started.
|
||
- `backend/app/api/endpoints/l1.py` — the rewritten intake **never reads `payload.flow_id`** (old branch deleted, diff confirms); `IntakeRequest.flow_id` (`backend/app/schemas/l1.py:13`) is now dead.
|
||
|
||
**Fix:** Make intake honor an explicit `flow_id` (bypass the matcher, call `start_flow_session` directly — restores the deleted behavior), and have the suggest card pass `near_miss.flow_id`. This also kills the wasteful re-run of the embedding + pgvector + keyword pipeline just to rediscover a flow_id the client already holds.
|
||
|
||
### 5. Out-of-scope problems lost the ad-hoc walk fallback
|
||
|
||
- Old intake had `else: start_adhoc_session(...)`; the rewrite (`backend/app/api/endpoints/l1.py:88-102`) dispatches only matched/build/suggest/out_of_scope. `start_adhoc_session` (`l1_session_service.py:82`) now has **zero callers** — ad-hoc sessions are unreachable product-wide (the only remaining `session_kind="adhoc"` creation is `escalate_without_walk`, an audit record, not walkable).
|
||
- `L1Dashboard.tsx:269-292` — out_of_scope prompt offers only "Escalate to engineering" / "Cancel".
|
||
- Stale copy: `frontend/src/pages/account/L1CategoriesPage.tsx:57-58` still promises "Disabled categories fall back to an ad-hoc walk or escalation." A diff test comment also claims adhoc "is offered from the out_of_scope prompt" — it is not.
|
||
|
||
**Fix (decide deliberately, don't drift):** Either (a) add a "Walk it ad-hoc" option to the out_of_scope prompt that hits a path creating an adhoc session (restore the capability), or (b) if dropping ad-hoc is intentional, fix the L1CategoriesPage copy and the test comment, and note the decision in `.ai/DECISIONS.md`. Option (a) preserves pre-existing user capability; recommend (a).
|
||
|
||
### 6. DB constraint makes L1-session deletion always fail (time bomb)
|
||
|
||
- `backend/app/models/flow_proposal.py:60-63` — `CheckConstraint("(source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)")` (XOR).
|
||
- Line 87-92 — `l1_session_id` FK is `ondelete="SET NULL"`; `source_session_id` (line 83) is `ondelete="CASCADE"`.
|
||
- Migration `backend/alembic/versions/1fd88a68b145_flow_proposal_l1_source_linkage.py:32-45` ships the same DDL.
|
||
- Postgres CHECK constraints are non-deferrable and ARE evaluated on the UPDATE produced by `ON DELETE SET NULL`. So deleting any `l1_walk_sessions` row referenced by a proposal (whose `source_session_id` is NULL by construction) → both columns NULL → CHECK violation → the DELETE fails. The `SET NULL` action can literally never fire successfully.
|
||
- Reachable today via `backend/app/api/endpoints/admin.py:1336` (`hard_delete_user` → `db.delete(account)`, DB-side cascades with unspecified ordering), and via any future GDPR/retention purge.
|
||
|
||
**Fix:** Change `l1_session_id` to `ondelete="CASCADE"` (matching `source_session_id`'s behavior — proposal dies with its source), in both the model and a new migration. Keep the XOR check. Alternative (`num_nonnulls(...) >= 1` style relaxation) is weaker; prefer CASCADE.
|
||
|
||
### 7. Account admins locked out of L1 category settings (3-layer inconsistency)
|
||
|
||
- Frontend route: `frontend/src/router.tsx:368-372` — `requiredRole="owner"`; `frontend/src/hooks/usePermissions.ts:21-28` (`getEffectiveRole`) has **no admin branch** → `account_role='admin'` maps to `viewer` → bounced to /trees.
|
||
- Backend GET: `backend/app/api/endpoints/accounts.py:175-178` uses `require_l1_or_above` (`deps.py:235-242`: `l1_tech/engineer/owner` only) → admin gets **403 on read**.
|
||
- Backend PATCH: `accounts.py:193-197` uses `require_account_owner_or_admin` (`deps.py:279-289`) → admin **can write**.
|
||
- `admin` is a real role: `backend/app/models/user.py:25` CHECK constraint; `user.py:132` treats admin as account-manager.
|
||
|
||
**Fix:** Pick one rule — owner+admin manage L1 categories — and apply it at all three layers: GET should use `require_account_owner_or_admin` too (or a combined dep), and the route guard needs admins to pass (either add an admin branch to `getEffectiveRole` — check blast radius on other `requiredRole` uses first — or a dedicated `canManageAccount`-style guard for this route). Also note `require_account_owner_or_admin` duplicates `User.can_manage_account` (`user.py:130-132`); delegate to it.
|
||
|
||
---
|
||
|
||
## FAST-FOLLOWS (real bugs, lower urgency)
|
||
|
||
### 8. Every walk-view mount fires a fresh paid LLM call (and may swap the question)
|
||
|
||
- The served-but-unanswered node is never persisted: `l1_session_service.py:156-174` — `node_id is None` path goes straight to `generate_next_node`; only `current_node_id` (always None today, see Finding 1) is stored. No replay branch.
|
||
- `L1WalkTreeVariant.tsx:26-44` — mount effect unconditionally POSTs `/next-node {}`; `frontend/src/main.tsx:4,34` — StrictMode is on, so dev double-mounts double-generate.
|
||
|
||
**Impact:** Refresh/back-forward = duplicate Sonnet spend, multi-second stall, and possibly a *different* question than the one the tech was answering.
|
||
|
||
**Fix:** Persist the pending node (e.g., a `pending_node` JSONB column on `l1_walk_sessions`, or reuse `current_node_id` + stored payload) and replay it when `node_id is None` and a pending node exists. Note: if adding columns, this is the moment to also add `category` and `problem_text` columns and delete the meta-entry convention (Root cause B) — `/next-node` currently re-fetches the internal ticket and re-scans walked_path on every step (`l1.py:302-310`) just to recover these immutable values.
|
||
|
||
### 9. Hidden meta entry: junk proposals + depth cap off-by-one
|
||
|
||
- Junk proposal: `l1_session_service.py:270` — `if helpful and session.session_kind == "ai_build" and session.walked_path:` — a meta-only walked_path (seeded at intake, `l1.py:132-134`) is truthy. `normalize_walked_path` (`ai_tree_builder.py:131-137`) strips meta → empty → returns the `"Empty walk — needs authoring."` stub, which "passes the proposal approval guard" per its own docstring → a `status="pending"`, `validated_by_outcome=True` junk proposal reaches the review queue when a tech resolves immediately after intake.
|
||
- Depth cap: `l1_session_service.py:172-173` passes the **raw** walked_path; `ai_tree_builder.py:82-83,96-98` — `len(walked_path) >= MAX_DEPTH` (12) counts the meta entry → cap fires after 11 real steps. `_strip_meta` is applied only downstream.
|
||
|
||
**Fix (symptom-level):** strip meta before both the truthiness guard and `escalate_if_depth_exceeded`. **Fix (root):** real `category` column, delete the meta convention (see Finding 8 note). Root fix preferred per project principle (correct architecture over minimal diff).
|
||
|
||
### 10. Escalation notification silently dropped when recipient query is empty
|
||
|
||
- `notification_service.py:180` changed `if target_user_ids:` → `if target_user_ids is not None:` (intentional, documented at lines 176-178).
|
||
- `l1_session_service.py:371-381` — `escalate()` passes its computed `target_ids` unconditionally; if all owners/admins/engineers are inactive, `[]` → zero in-app notifications, no log, no fallback. (Existing callers are safe — they all use `[x] if x else None` patterns.)
|
||
- Bonus divergence: escalate's hand-rolled query filters only `is_active`, while `handoff_manager.py:323-333` also filters `deleted_at IS NULL` — soft-deleted engineers would be notified.
|
||
|
||
**Fix:** In `escalate()`: `target_user_ids=target_ids or None` (falls back to default recipients) plus a warning log when empty; add the `deleted_at` filter. Longer-term: give `_resolve_recipients` a roles parameter so callers stop hand-rolling recipient queries.
|
||
|
||
---
|
||
|
||
## Cleanups (optional, do alongside adjacent fixes)
|
||
|
||
- `L1Dashboard.tsx:47-110` — `handleStart` / `useSuggestedFlow` / `buildNew` are three near-identical intake calls; collapse to one `runIntake(opts)` switching on `response.outcome` (this also prevents Finding-4-class drift).
|
||
- `backend/app/schemas/l1.py` — `IntakeRequest.flow_id` is dead unless Finding 4 revives it; `NextNodeRequest.acknowledged` is sent by the frontend but never read by the backend (advance infers ack from `answer is None`) — wire it or drop it. `IntakeResponse` lost its per-outcome guarantees (all fields Optional, `ticket_kind` no longer `Literal["psa","internal"]`); add a `model_validator(mode="after")` requiring `session_id`/`ticket_id` when outcome is matched/build, and add a `session_id` null-guard before `navigate()` in `handleStart` (`L1Dashboard.tsx:58-59` — currently navigates to `/l1/walk/undefined` on a regression).
|
||
- `backend/app/services/match_or_build.py:55` — unused positional `ticket_ref` param (only caller passes `""`); delete it. Also note `classify()` is a second bespoke LLM intake classifier alongside `flowpilot_engine._classify_intake`; `l1.py` passes `problem_domain=None` to matching, losing the domain signal the existing classifier provides — consider unifying in Phase 2B.
|
||
- `backend/app/services/l1_category_service.py:17` + `models/account.py:75-81` — DEFAULT_L1_CATEGORIES duplicated as a hand-escaped JSON `server_default`; derive one from the other (migration copy stays frozen).
|
||
- `frontend/src/pages/account/L1CategoriesPage.tsx` — local `prettify()` duplicates `humanizeFeatureKey` (`UpgradePrompt.tsx:62`); page skips shared `PageHeader`/`Spinner` used by sibling settings pages.
|
||
- Missing index for `GET /l1/escalations` (`l1.py:338`): consider `CREATE INDEX ... ON l1_walk_sessions (account_id, last_step_at DESC) WHERE status = 'escalated'`.
|
||
- `backend/tests/test_l1_session_service.py` — the `escalation_reason_category == "no_kb_content"` assertion was deleted from `test_escalate_without_walk_writes_audit_log`, weakening audit coverage; restore it.
|
||
- Per-step `walked_path` rewrite is O(n²) cumulative bytes (`session.walked_path = [*session.walked_path, entry]`); bounded by MAX_DEPTH=12 so fine today — note for Phase 2B if depth grows.
|
||
|
||
---
|
||
|
||
## Suggested execution order
|
||
|
||
1. Finding 1 (node ids) — unblocks everything; add the contract test.
|
||
2. Finding 6 (FK/constraint) — new migration; do early so it ships in the same release.
|
||
3. Findings 2 + 3 together (mount section, fix field names/meta filter, ProposalDetail L1 block + null-guard the /pilot link).
|
||
4. Finding 4 (intake honors flow_id; suggest card passes it).
|
||
5. Finding 5 (decide adhoc: restore option (a) recommended, or fix copy + DECISIONS.md).
|
||
6. Finding 7 (align all three permission layers).
|
||
7. Findings 8 + 9 via the root fix: add `category`/`problem_text` (+ optionally `pending_node`) columns, delete the meta-entry convention, strip-meta fixes become moot.
|
||
8. Finding 10 (one-line guard + deleted_at filter).
|
||
9. Cleanups opportunistically alongside the file they touch.
|
||
|
||
After fixes: run the 11 Phase 2A backend test files together (authoritative gate per HANDOFF — do NOT trust a full local serial `pytest tests/`; use `--override-ini="addopts="`), frontend `tsc -b` + lint + build, and migration downgrade/upgrade roundtrip. Update `.ai/HANDOFF.md` to correct the Task 16/17 record.
|