Findings doc gets a per-finding RESOLUTION section; HANDOFF resume point moves to "re-push + merge" and corrects the false Task 16/17 "done" record; CURRENT_TASK updated; two architectural decisions logged (real ai_build columns replacing the meta convention; ad-hoc walk restored); SESSION_LOG entry added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
20 KiB
PR #193 (Phase 2A — L1 AI Tree Builder) Review Findings
Date: 2026-06-09
Reviewed: feat/l1-ai-tree-builder-phase-2a vs main (42 files, +2,326/−154)
Process: 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation).
Verdict: DO NOT MERGE as-is. The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in .ai/HANDOFF.md were never actually committed, and one DB constraint is a deletion time bomb.
✅ RESOLUTION (2026-06-09, same day)
All 10 findings resolved. Two architectural decisions taken (see .ai/DECISIONS.md):
the root fix for Findings 8/9 (real category / problem_text / pending_node
columns on l1_walk_sessions; the {"node_type":"meta"} walked_path convention
deleted entirely — migration 61dda4f615c6), and restoring the ad-hoc walk
(Finding 5 option a — adhoc=True intake + "Walk it ad-hoc" out_of_scope button).
- Finding 1 —
ai_tree_builder._assign_idstampsuuid4().hex[:8]on every node (generated, depth-cap, generation-failed);current_node_idnow real. Contract test added (test_ai_build_first_node_carries_id_and_advance_grows_walk). - Finding 2a/3 —
L1EscalationsSectionmounted onEscalationQueuePage;ProposalDetail/pilotlink gated onsource_session_id, L1-source block added. - Finding 2b — renders
step.question ?? step.text,timeAgo, showsproblem_text. - Finding 4 — intake honors explicit
flow_id(matcher bypassed); suggest card passesnear_miss.flow_id; the three intake handlers collapsed into onerunIntake. - Finding 5 — ad-hoc walk restored (option a).
- Finding 6 —
l1_session_idFK →ondelete=CASCADE(model + migration); cascade-delete test. - Finding 7 — owner+admin at all three layers (GET dep, route guard,
usePermissions);require_account_owner_or_admindelegates toUser.can_manage_account;User.account_roleTS type gains'admin'. - Finding 8 —
pending_nodecolumn;/next-nodereplays the served node on re-mount (no duplicate paid generation); reads context off the session (no ticket re-fetch). - Finding 9 — meta entry gone → empty walk is falsy (no junk proposal) and the depth cap counts only real steps.
- Finding 10 —
escalatepassestarget_ids or None(default fallback), filtersdeleted_at IS NULL, warns when empty; two tests. - Cleanups — dead
ticket_refdeleted,IntakeResponseper-outcome validator +ticket_kindLiteral restored, unusedacknowledgeddropped, escalations partial index added, restored the deletedno_kb_contentaudit assertion.
Verification: full Phase 2A backend set 110 passed / 0 failed; frontend tsc -b +
eslint + vite build clean; migration upgrade→downgrade→upgrade roundtrip clean
(columns + FK confdeltype + partial index confirmed); anti-parrot guardrail green.
How to use this file: work the findings in order. Findings 1–7 are merge blockers; 8–10 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:
- Root cause A: AI-generated nodes have no
id, but the advance protocol keys onnode_id. (Finding 1; touches 8.) - Root cause B: The intake category is smuggled into
walked_pathas a fake{"node_type":"meta"}entry that every consumer must know to skip — and most don't. (Findings 2b, 9; the deeper fix is a realcategorycolumn onl1_walk_sessions, plusproblem_textwhile you're there — see Finding 8's note.)
MERGE BLOCKERS
1. AI walkthrough can never advance past the first question (showstopper)
Evidence:
backend/app/services/ai_tree_builder.py:37-41— SYSTEM_PROMPT's JSON output shapes ({"node_type":"question","text":...}etc.) define noidfield;validate_node(lines 71-79) returns the node unchanged; nothing anywhere assigns an id.backend/app/services/l1_session_service.py:156—advance_ai_buildonly appends towalked_pathif node_id is not None; docstring (line 139) says "On the first call (node_id is None) nothing is appended."frontend/src/components/l1/L1WalkTreeVariant.tsx:52-54— sendsnode_id: node.id, which isundefinedat runtime (server never sends an id;JSON.stringifydrops undefined keys) → backend always receivesnode_id=None.l1_session_service.py:174—session.current_node_id = next_node.get("id")is alwaysNone.
User impact: Tech answers the first question → answer is discarded, the same (or a re-rolled) first question regenerates forever. walked_path never grows past the meta entry, the depth cap never fires, and resolve captures an empty tree.
Why tests missed it: test_l1_api_ai_build and friends mock advance_ai_build / hand-craft nodes with id keys — a shape the real model is never instructed to produce.
Fix: Assign a server-side id to every generated node before returning it (e.g., uuid4().hex[:8] in generate_next_node after validate_node), persist it as session.current_node_id, and add a test that runs the real (unmocked-shape) prompt contract: generate → assert node has id → advance with that id → assert walked_path grew. Do NOT ask the LLM to invent ids.
2. Escalations from AI sessions go nowhere (two linked defects)
2a — Component never mounted.
grep -rn "L1EscalationsSection" frontend/src→ exactly one hit: its own definition (frontend/src/components/l1/L1EscalationsSection.tsx:10). It is imported nowhere.frontend/src/pages/EscalationQueuePage.tsx:3imports onlyEscalationQueue, EscalationMetricCardfrom@/components/flowpilot.backend/app/services/notification_service.py:449—"l1.session.escalated": "/escalations"deep-link →frontend/src/router.tsx:299rendersEscalationQueuePage→ engineer sees only FlowPilot escalations; the L1 handoff is invisible.GET /l1/escalations(backend/app/api/endpoints/l1.py:330) has no UI surface.- Note:
.ai/HANDOFF.md:38claims "L1EscalationsSection on EscalationQueuePage" — that claim is false (documentation drift; see Finding 3).
2b — Component renders wrong fields once mounted.
backend/app/services/l1_session_service.py:162-168— ai_build entries are{"node_type", "id", "text", "answer", "l1_note"}(key istext); legacyrecord_step(lines 199-204) usesquestion/node_id.L1EscalationsSection.tsx:61renders{step.question}→ blank for every ai_build entry.L1EscalationsSection.tsx:46—{s.walked_path.length} steps walkedcounts the hidden meta entry → "N+1 steps walked".backend/app/api/endpoints/l1.py:41—_to_responsereturnswalked_pathraw (meta entry included).
Fix: Mount L1EscalationsSection on EscalationQueuePage (or fold L1 rows into the existing queue). Render step.question ?? step.text. Filter node_type === 'meta' entries — ideally server-side in _to_response (or eliminate the meta entry entirely per Root cause B). Also use the shared timeAgo util (frontend/src/lib/timeAgo.ts) instead of new Date(...).toLocaleString() at line 50, to match every sibling queue.
3. Two tasks recorded as complete were never committed
- Task 16 ("ProposalDetail L1-source block") and Task 17 (mounting the escalations section) appear in
.ai/HANDOFF.md/SESSION_LOG.mdas done, but no hunk forProposalDetail.tsxexists in the diff, and Finding 2a proves the mount never happened. - Concrete user impact today:
frontend/src/components/flowpilot/ProposalDetail.tsx:91-101renders the "Source Session" card unconditionally; line 95 isto={`/pilot/${proposal.source_session_id}`}with no null guard. L1-sourced proposals (created withsource_session_id=None,l1_session_id=<session>) reach the review queue aspending→ engineers get a broken/pilot/nulllink.
Fix: Implement the missing work: in ProposalDetail.tsx, gate the /pilot/ link on source_session_id != null and render an L1-source block (problem statement, category, link to the L1 session / escalations view) when l1_session_id is set. The backend already serves l1_session_id via backend/app/schemas/flow_proposal.py. Then correct .ai/HANDOFF.md.
4. "Use this flow" button silently does nothing
frontend/src/pages/l1/L1Dashboard.tsx:77-86—useSuggestedFlowre-POSTs/l1/intakewith the same text, noflow_id. The in-code comment ("it matches again and returns amatchedoutcome") is factually wrong: the same text scores in the same 0.60–0.75 suggest band (backend/app/services/match_or_build.py:66-72,MATCH_THRESHOLD = 0.75line 21) →suggestagain, nosession_id→ handler falls toresetPrompts()and the card vanishes. The suggested flow can never be started.backend/app/api/endpoints/l1.py— the rewritten intake never readspayload.flow_id(old branch deleted, diff confirms);IntakeRequest.flow_id(backend/app/schemas/l1.py:13) is now dead.
Fix: Make intake honor an explicit flow_id (bypass the matcher, call start_flow_session directly — restores the deleted behavior), and have the suggest card pass near_miss.flow_id. This also kills the wasteful re-run of the embedding + pgvector + keyword pipeline just to rediscover a flow_id the client already holds.
5. Out-of-scope problems lost the ad-hoc walk fallback
- Old intake had
else: start_adhoc_session(...); the rewrite (backend/app/api/endpoints/l1.py:88-102) dispatches only matched/build/suggest/out_of_scope.start_adhoc_session(l1_session_service.py:82) now has zero callers — ad-hoc sessions are unreachable product-wide (the only remainingsession_kind="adhoc"creation isescalate_without_walk, an audit record, not walkable). L1Dashboard.tsx:269-292— out_of_scope prompt offers only "Escalate to engineering" / "Cancel".- Stale copy:
frontend/src/pages/account/L1CategoriesPage.tsx:57-58still promises "Disabled categories fall back to an ad-hoc walk or escalation." A diff test comment also claims adhoc "is offered from the out_of_scope prompt" — it is not.
Fix (decide deliberately, don't drift): Either (a) add a "Walk it ad-hoc" option to the out_of_scope prompt that hits a path creating an adhoc session (restore the capability), or (b) if dropping ad-hoc is intentional, fix the L1CategoriesPage copy and the test comment, and note the decision in .ai/DECISIONS.md. Option (a) preserves pre-existing user capability; recommend (a).
6. DB constraint makes L1-session deletion always fail (time bomb)
backend/app/models/flow_proposal.py:60-63—CheckConstraint("(source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)")(XOR).- Line 87-92 —
l1_session_idFK isondelete="SET NULL";source_session_id(line 83) isondelete="CASCADE". - Migration
backend/alembic/versions/1fd88a68b145_flow_proposal_l1_source_linkage.py:32-45ships the same DDL. - Postgres CHECK constraints are non-deferrable and ARE evaluated on the UPDATE produced by
ON DELETE SET NULL. So deleting anyl1_walk_sessionsrow referenced by a proposal (whosesource_session_idis NULL by construction) → both columns NULL → CHECK violation → the DELETE fails. TheSET NULLaction can literally never fire successfully. - Reachable today via
backend/app/api/endpoints/admin.py:1336(hard_delete_user→db.delete(account), DB-side cascades with unspecified ordering), and via any future GDPR/retention purge.
Fix: Change l1_session_id to ondelete="CASCADE" (matching source_session_id's behavior — proposal dies with its source), in both the model and a new migration. Keep the XOR check. Alternative (num_nonnulls(...) >= 1 style relaxation) is weaker; prefer CASCADE.
7. Account admins locked out of L1 category settings (3-layer inconsistency)
- Frontend route:
frontend/src/router.tsx:368-372—requiredRole="owner";frontend/src/hooks/usePermissions.ts:21-28(getEffectiveRole) has no admin branch →account_role='admin'maps toviewer→ bounced to /trees. - Backend GET:
backend/app/api/endpoints/accounts.py:175-178usesrequire_l1_or_above(deps.py:235-242:l1_tech/engineer/owneronly) → admin gets 403 on read. - Backend PATCH:
accounts.py:193-197usesrequire_account_owner_or_admin(deps.py:279-289) → admin can write. adminis a real role:backend/app/models/user.py:25CHECK constraint;user.py:132treats admin as account-manager.
Fix: Pick one rule — owner+admin manage L1 categories — and apply it at all three layers: GET should use require_account_owner_or_admin too (or a combined dep), and the route guard needs admins to pass (either add an admin branch to getEffectiveRole — check blast radius on other requiredRole uses first — or a dedicated canManageAccount-style guard for this route). Also note require_account_owner_or_admin duplicates User.can_manage_account (user.py:130-132); delegate to it.
FAST-FOLLOWS (real bugs, lower urgency)
8. Every walk-view mount fires a fresh paid LLM call (and may swap the question)
- The served-but-unanswered node is never persisted:
l1_session_service.py:156-174—node_id is Nonepath goes straight togenerate_next_node; onlycurrent_node_id(always None today, see Finding 1) is stored. No replay branch. L1WalkTreeVariant.tsx:26-44— mount effect unconditionally POSTs/next-node {};frontend/src/main.tsx:4,34— StrictMode is on, so dev double-mounts double-generate.
Impact: Refresh/back-forward = duplicate Sonnet spend, multi-second stall, and possibly a different question than the one the tech was answering.
Fix: Persist the pending node (e.g., a pending_node JSONB column on l1_walk_sessions, or reuse current_node_id + stored payload) and replay it when node_id is None and a pending node exists. Note: if adding columns, this is the moment to also add category and problem_text columns and delete the meta-entry convention (Root cause B) — /next-node currently re-fetches the internal ticket and re-scans walked_path on every step (l1.py:302-310) just to recover these immutable values.
9. Hidden meta entry: junk proposals + depth cap off-by-one
- Junk proposal:
l1_session_service.py:270—if helpful and session.session_kind == "ai_build" and session.walked_path:— a meta-only walked_path (seeded at intake,l1.py:132-134) is truthy.normalize_walked_path(ai_tree_builder.py:131-137) strips meta → empty → returns the"Empty walk — needs authoring."stub, which "passes the proposal approval guard" per its own docstring → astatus="pending",validated_by_outcome=Truejunk proposal reaches the review queue when a tech resolves immediately after intake. - Depth cap:
l1_session_service.py:172-173passes the raw walked_path;ai_tree_builder.py:82-83,96-98—len(walked_path) >= MAX_DEPTH(12) counts the meta entry → cap fires after 11 real steps._strip_metais applied only downstream.
Fix (symptom-level): strip meta before both the truthiness guard and escalate_if_depth_exceeded. Fix (root): real category column, delete the meta convention (see Finding 8 note). Root fix preferred per project principle (correct architecture over minimal diff).
10. Escalation notification silently dropped when recipient query is empty
notification_service.py:180changedif target_user_ids:→if target_user_ids is not None:(intentional, documented at lines 176-178).l1_session_service.py:371-381—escalate()passes its computedtarget_idsunconditionally; if all owners/admins/engineers are inactive,[]→ zero in-app notifications, no log, no fallback. (Existing callers are safe — they all use[x] if x else Nonepatterns.)- Bonus divergence: escalate's hand-rolled query filters only
is_active, whilehandoff_manager.py:323-333also filtersdeleted_at IS NULL— soft-deleted engineers would be notified.
Fix: In escalate(): target_user_ids=target_ids or None (falls back to default recipients) plus a warning log when empty; add the deleted_at filter. Longer-term: give _resolve_recipients a roles parameter so callers stop hand-rolling recipient queries.
Cleanups (optional, do alongside adjacent fixes)
L1Dashboard.tsx:47-110—handleStart/useSuggestedFlow/buildNeware three near-identical intake calls; collapse to onerunIntake(opts)switching onresponse.outcome(this also prevents Finding-4-class drift).backend/app/schemas/l1.py—IntakeRequest.flow_idis dead unless Finding 4 revives it;NextNodeRequest.acknowledgedis sent by the frontend but never read by the backend (advance infers ack fromanswer is None) — wire it or drop it.IntakeResponselost its per-outcome guarantees (all fields Optional,ticket_kindno longerLiteral["psa","internal"]); add amodel_validator(mode="after")requiringsession_id/ticket_idwhen outcome is matched/build, and add asession_idnull-guard beforenavigate()inhandleStart(L1Dashboard.tsx:58-59— currently navigates to/l1/walk/undefinedon a regression).backend/app/services/match_or_build.py:55— unused positionalticket_refparam (only caller passes""); delete it. Also noteclassify()is a second bespoke LLM intake classifier alongsideflowpilot_engine._classify_intake;l1.pypassesproblem_domain=Noneto matching, losing the domain signal the existing classifier provides — consider unifying in Phase 2B.backend/app/services/l1_category_service.py:17+models/account.py:75-81— DEFAULT_L1_CATEGORIES duplicated as a hand-escaped JSONserver_default; derive one from the other (migration copy stays frozen).frontend/src/pages/account/L1CategoriesPage.tsx— localprettify()duplicateshumanizeFeatureKey(UpgradePrompt.tsx:62); page skips sharedPageHeader/Spinnerused by sibling settings pages.- Missing index for
GET /l1/escalations(l1.py:338): considerCREATE INDEX ... ON l1_walk_sessions (account_id, last_step_at DESC) WHERE status = 'escalated'. backend/tests/test_l1_session_service.py— theescalation_reason_category == "no_kb_content"assertion was deleted fromtest_escalate_without_walk_writes_audit_log, weakening audit coverage; restore it.- Per-step
walked_pathrewrite is O(n²) cumulative bytes (session.walked_path = [*session.walked_path, entry]); bounded by MAX_DEPTH=12 so fine today — note for Phase 2B if depth grows.
Suggested execution order
- Finding 1 (node ids) — unblocks everything; add the contract test.
- Finding 6 (FK/constraint) — new migration; do early so it ships in the same release.
- Findings 2 + 3 together (mount section, fix field names/meta filter, ProposalDetail L1 block + null-guard the /pilot link).
- Finding 4 (intake honors flow_id; suggest card passes it).
- Finding 5 (decide adhoc: restore option (a) recommended, or fix copy + DECISIONS.md).
- Finding 7 (align all three permission layers).
- Findings 8 + 9 via the root fix: add
category/problem_text(+ optionallypending_node) columns, delete the meta-entry convention, strip-meta fixes become moot. - Finding 10 (one-line guard + deleted_at filter).
- Cleanups opportunistically alongside the file they touch.
After fixes: run the 11 Phase 2A backend test files together (authoritative gate per HANDOFF — do NOT trust a full local serial pytest tests/; use --override-ini="addopts="), frontend tsc -b + lint + build, and migration downgrade/upgrade roundtrip. Update .ai/HANDOFF.md to correct the Task 16/17 record.