Files
resolutionflow/docs/plans/2026-06-09-pr193-phase2a-review-findings.md
Michael Chihlas db446e1fd6 docs(handoff): PR #193 all 10 review findings resolved + 2 decisions
Findings doc gets a per-finding RESOLUTION section; HANDOFF resume point moves to
"re-push + merge" and corrects the false Task 16/17 "done" record; CURRENT_TASK
updated; two architectural decisions logged (real ai_build columns replacing the
meta convention; ad-hoc walk restored); SESSION_LOG entry added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 15:56:03 -04:00

20 KiB
Raw Blame History

PR #193 (Phase 2A — L1 AI Tree Builder) Review Findings

Date: 2026-06-09 Reviewed: feat/l1-ai-tree-builder-phase-2a vs main (42 files, +2,326/154) Process: 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation). Verdict: DO NOT MERGE as-is. The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in .ai/HANDOFF.md were never actually committed, and one DB constraint is a deletion time bomb.


RESOLUTION (2026-06-09, same day)

All 10 findings resolved. Two architectural decisions taken (see .ai/DECISIONS.md): the root fix for Findings 8/9 (real category / problem_text / pending_node columns on l1_walk_sessions; the {"node_type":"meta"} walked_path convention deleted entirely — migration 61dda4f615c6), and restoring the ad-hoc walk (Finding 5 option a — adhoc=True intake + "Walk it ad-hoc" out_of_scope button).

  • Finding 1ai_tree_builder._assign_id stamps uuid4().hex[:8] on every node (generated, depth-cap, generation-failed); current_node_id now real. Contract test added (test_ai_build_first_node_carries_id_and_advance_grows_walk).
  • Finding 2a/3L1EscalationsSection mounted on EscalationQueuePage; ProposalDetail /pilot link gated on source_session_id, L1-source block added.
  • Finding 2b — renders step.question ?? step.text, timeAgo, shows problem_text.
  • Finding 4 — intake honors explicit flow_id (matcher bypassed); suggest card passes near_miss.flow_id; the three intake handlers collapsed into one runIntake.
  • Finding 5 — ad-hoc walk restored (option a).
  • Finding 6l1_session_id FK → ondelete=CASCADE (model + migration); cascade-delete test.
  • Finding 7 — owner+admin at all three layers (GET dep, route guard, usePermissions); require_account_owner_or_admin delegates to User.can_manage_account; User.account_role TS type gains 'admin'.
  • Finding 8pending_node column; /next-node replays the served node on re-mount (no duplicate paid generation); reads context off the session (no ticket re-fetch).
  • Finding 9 — meta entry gone → empty walk is falsy (no junk proposal) and the depth cap counts only real steps.
  • Finding 10escalate passes target_ids or None (default fallback), filters deleted_at IS NULL, warns when empty; two tests.
  • Cleanups — dead ticket_ref deleted, IntakeResponse per-outcome validator + ticket_kind Literal restored, unused acknowledged dropped, escalations partial index added, restored the deleted no_kb_content audit assertion.

Verification: full Phase 2A backend set 110 passed / 0 failed; frontend tsc -b + eslint + vite build clean; migration upgrade→downgrade→upgrade roundtrip clean (columns + FK confdeltype + partial index confirmed); anti-parrot guardrail green.

How to use this file: work the findings in order. Findings 17 are merge blockers; 810 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:

  • Root cause A: AI-generated nodes have no id, but the advance protocol keys on node_id. (Finding 1; touches 8.)
  • Root cause B: The intake category is smuggled into walked_path as a fake {"node_type":"meta"} entry that every consumer must know to skip — and most don't. (Findings 2b, 9; the deeper fix is a real category column on l1_walk_sessions, plus problem_text while you're there — see Finding 8's note.)

MERGE BLOCKERS

1. AI walkthrough can never advance past the first question (showstopper)

Evidence:

  • backend/app/services/ai_tree_builder.py:37-41 — SYSTEM_PROMPT's JSON output shapes ({"node_type":"question","text":...} etc.) define no id field; validate_node (lines 71-79) returns the node unchanged; nothing anywhere assigns an id.
  • backend/app/services/l1_session_service.py:156advance_ai_build only appends to walked_path if node_id is not None; docstring (line 139) says "On the first call (node_id is None) nothing is appended."
  • frontend/src/components/l1/L1WalkTreeVariant.tsx:52-54 — sends node_id: node.id, which is undefined at runtime (server never sends an id; JSON.stringify drops undefined keys) → backend always receives node_id=None.
  • l1_session_service.py:174session.current_node_id = next_node.get("id") is always None.

User impact: Tech answers the first question → answer is discarded, the same (or a re-rolled) first question regenerates forever. walked_path never grows past the meta entry, the depth cap never fires, and resolve captures an empty tree.

Why tests missed it: test_l1_api_ai_build and friends mock advance_ai_build / hand-craft nodes with id keys — a shape the real model is never instructed to produce.

Fix: Assign a server-side id to every generated node before returning it (e.g., uuid4().hex[:8] in generate_next_node after validate_node), persist it as session.current_node_id, and add a test that runs the real (unmocked-shape) prompt contract: generate → assert node has id → advance with that id → assert walked_path grew. Do NOT ask the LLM to invent ids.

2. Escalations from AI sessions go nowhere (two linked defects)

2a — Component never mounted.

  • grep -rn "L1EscalationsSection" frontend/src → exactly one hit: its own definition (frontend/src/components/l1/L1EscalationsSection.tsx:10). It is imported nowhere.
  • frontend/src/pages/EscalationQueuePage.tsx:3 imports only EscalationQueue, EscalationMetricCard from @/components/flowpilot.
  • backend/app/services/notification_service.py:449"l1.session.escalated": "/escalations" deep-link → frontend/src/router.tsx:299 renders EscalationQueuePage → engineer sees only FlowPilot escalations; the L1 handoff is invisible. GET /l1/escalations (backend/app/api/endpoints/l1.py:330) has no UI surface.
  • Note: .ai/HANDOFF.md:38 claims "L1EscalationsSection on EscalationQueuePage" — that claim is false (documentation drift; see Finding 3).

2b — Component renders wrong fields once mounted.

  • backend/app/services/l1_session_service.py:162-168 — ai_build entries are {"node_type", "id", "text", "answer", "l1_note"} (key is text); legacy record_step (lines 199-204) uses question/node_id.
  • L1EscalationsSection.tsx:61 renders {step.question} → blank for every ai_build entry.
  • L1EscalationsSection.tsx:46{s.walked_path.length} steps walked counts the hidden meta entry → "N+1 steps walked".
  • backend/app/api/endpoints/l1.py:41_to_response returns walked_path raw (meta entry included).

Fix: Mount L1EscalationsSection on EscalationQueuePage (or fold L1 rows into the existing queue). Render step.question ?? step.text. Filter node_type === 'meta' entries — ideally server-side in _to_response (or eliminate the meta entry entirely per Root cause B). Also use the shared timeAgo util (frontend/src/lib/timeAgo.ts) instead of new Date(...).toLocaleString() at line 50, to match every sibling queue.

3. Two tasks recorded as complete were never committed

  • Task 16 ("ProposalDetail L1-source block") and Task 17 (mounting the escalations section) appear in .ai/HANDOFF.md / SESSION_LOG.md as done, but no hunk for ProposalDetail.tsx exists in the diff, and Finding 2a proves the mount never happened.
  • Concrete user impact today: frontend/src/components/flowpilot/ProposalDetail.tsx:91-101 renders the "Source Session" card unconditionally; line 95 is to={`/pilot/${proposal.source_session_id}`} with no null guard. L1-sourced proposals (created with source_session_id=None, l1_session_id=<session>) reach the review queue as pending → engineers get a broken /pilot/null link.

Fix: Implement the missing work: in ProposalDetail.tsx, gate the /pilot/ link on source_session_id != null and render an L1-source block (problem statement, category, link to the L1 session / escalations view) when l1_session_id is set. The backend already serves l1_session_id via backend/app/schemas/flow_proposal.py. Then correct .ai/HANDOFF.md.

4. "Use this flow" button silently does nothing

  • frontend/src/pages/l1/L1Dashboard.tsx:77-86useSuggestedFlow re-POSTs /l1/intake with the same text, no flow_id. The in-code comment ("it matches again and returns a matched outcome") is factually wrong: the same text scores in the same 0.600.75 suggest band (backend/app/services/match_or_build.py:66-72, MATCH_THRESHOLD = 0.75 line 21) → suggest again, no session_id → handler falls to resetPrompts() and the card vanishes. The suggested flow can never be started.
  • backend/app/api/endpoints/l1.py — the rewritten intake never reads payload.flow_id (old branch deleted, diff confirms); IntakeRequest.flow_id (backend/app/schemas/l1.py:13) is now dead.

Fix: Make intake honor an explicit flow_id (bypass the matcher, call start_flow_session directly — restores the deleted behavior), and have the suggest card pass near_miss.flow_id. This also kills the wasteful re-run of the embedding + pgvector + keyword pipeline just to rediscover a flow_id the client already holds.

5. Out-of-scope problems lost the ad-hoc walk fallback

  • Old intake had else: start_adhoc_session(...); the rewrite (backend/app/api/endpoints/l1.py:88-102) dispatches only matched/build/suggest/out_of_scope. start_adhoc_session (l1_session_service.py:82) now has zero callers — ad-hoc sessions are unreachable product-wide (the only remaining session_kind="adhoc" creation is escalate_without_walk, an audit record, not walkable).
  • L1Dashboard.tsx:269-292 — out_of_scope prompt offers only "Escalate to engineering" / "Cancel".
  • Stale copy: frontend/src/pages/account/L1CategoriesPage.tsx:57-58 still promises "Disabled categories fall back to an ad-hoc walk or escalation." A diff test comment also claims adhoc "is offered from the out_of_scope prompt" — it is not.

Fix (decide deliberately, don't drift): Either (a) add a "Walk it ad-hoc" option to the out_of_scope prompt that hits a path creating an adhoc session (restore the capability), or (b) if dropping ad-hoc is intentional, fix the L1CategoriesPage copy and the test comment, and note the decision in .ai/DECISIONS.md. Option (a) preserves pre-existing user capability; recommend (a).

6. DB constraint makes L1-session deletion always fail (time bomb)

  • backend/app/models/flow_proposal.py:60-63CheckConstraint("(source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)") (XOR).
  • Line 87-92 — l1_session_id FK is ondelete="SET NULL"; source_session_id (line 83) is ondelete="CASCADE".
  • Migration backend/alembic/versions/1fd88a68b145_flow_proposal_l1_source_linkage.py:32-45 ships the same DDL.
  • Postgres CHECK constraints are non-deferrable and ARE evaluated on the UPDATE produced by ON DELETE SET NULL. So deleting any l1_walk_sessions row referenced by a proposal (whose source_session_id is NULL by construction) → both columns NULL → CHECK violation → the DELETE fails. The SET NULL action can literally never fire successfully.
  • Reachable today via backend/app/api/endpoints/admin.py:1336 (hard_delete_userdb.delete(account), DB-side cascades with unspecified ordering), and via any future GDPR/retention purge.

Fix: Change l1_session_id to ondelete="CASCADE" (matching source_session_id's behavior — proposal dies with its source), in both the model and a new migration. Keep the XOR check. Alternative (num_nonnulls(...) >= 1 style relaxation) is weaker; prefer CASCADE.

7. Account admins locked out of L1 category settings (3-layer inconsistency)

  • Frontend route: frontend/src/router.tsx:368-372requiredRole="owner"; frontend/src/hooks/usePermissions.ts:21-28 (getEffectiveRole) has no admin branchaccount_role='admin' maps to viewer → bounced to /trees.
  • Backend GET: backend/app/api/endpoints/accounts.py:175-178 uses require_l1_or_above (deps.py:235-242: l1_tech/engineer/owner only) → admin gets 403 on read.
  • Backend PATCH: accounts.py:193-197 uses require_account_owner_or_admin (deps.py:279-289) → admin can write.
  • admin is a real role: backend/app/models/user.py:25 CHECK constraint; user.py:132 treats admin as account-manager.

Fix: Pick one rule — owner+admin manage L1 categories — and apply it at all three layers: GET should use require_account_owner_or_admin too (or a combined dep), and the route guard needs admins to pass (either add an admin branch to getEffectiveRole — check blast radius on other requiredRole uses first — or a dedicated canManageAccount-style guard for this route). Also note require_account_owner_or_admin duplicates User.can_manage_account (user.py:130-132); delegate to it.


FAST-FOLLOWS (real bugs, lower urgency)

8. Every walk-view mount fires a fresh paid LLM call (and may swap the question)

  • The served-but-unanswered node is never persisted: l1_session_service.py:156-174node_id is None path goes straight to generate_next_node; only current_node_id (always None today, see Finding 1) is stored. No replay branch.
  • L1WalkTreeVariant.tsx:26-44 — mount effect unconditionally POSTs /next-node {}; frontend/src/main.tsx:4,34 — StrictMode is on, so dev double-mounts double-generate.

Impact: Refresh/back-forward = duplicate Sonnet spend, multi-second stall, and possibly a different question than the one the tech was answering.

Fix: Persist the pending node (e.g., a pending_node JSONB column on l1_walk_sessions, or reuse current_node_id + stored payload) and replay it when node_id is None and a pending node exists. Note: if adding columns, this is the moment to also add category and problem_text columns and delete the meta-entry convention (Root cause B) — /next-node currently re-fetches the internal ticket and re-scans walked_path on every step (l1.py:302-310) just to recover these immutable values.

9. Hidden meta entry: junk proposals + depth cap off-by-one

  • Junk proposal: l1_session_service.py:270if helpful and session.session_kind == "ai_build" and session.walked_path: — a meta-only walked_path (seeded at intake, l1.py:132-134) is truthy. normalize_walked_path (ai_tree_builder.py:131-137) strips meta → empty → returns the "Empty walk — needs authoring." stub, which "passes the proposal approval guard" per its own docstring → a status="pending", validated_by_outcome=True junk proposal reaches the review queue when a tech resolves immediately after intake.
  • Depth cap: l1_session_service.py:172-173 passes the raw walked_path; ai_tree_builder.py:82-83,96-98len(walked_path) >= MAX_DEPTH (12) counts the meta entry → cap fires after 11 real steps. _strip_meta is applied only downstream.

Fix (symptom-level): strip meta before both the truthiness guard and escalate_if_depth_exceeded. Fix (root): real category column, delete the meta convention (see Finding 8 note). Root fix preferred per project principle (correct architecture over minimal diff).

10. Escalation notification silently dropped when recipient query is empty

  • notification_service.py:180 changed if target_user_ids:if target_user_ids is not None: (intentional, documented at lines 176-178).
  • l1_session_service.py:371-381escalate() passes its computed target_ids unconditionally; if all owners/admins/engineers are inactive, [] → zero in-app notifications, no log, no fallback. (Existing callers are safe — they all use [x] if x else None patterns.)
  • Bonus divergence: escalate's hand-rolled query filters only is_active, while handoff_manager.py:323-333 also filters deleted_at IS NULL — soft-deleted engineers would be notified.

Fix: In escalate(): target_user_ids=target_ids or None (falls back to default recipients) plus a warning log when empty; add the deleted_at filter. Longer-term: give _resolve_recipients a roles parameter so callers stop hand-rolling recipient queries.


Cleanups (optional, do alongside adjacent fixes)

  • L1Dashboard.tsx:47-110handleStart / useSuggestedFlow / buildNew are three near-identical intake calls; collapse to one runIntake(opts) switching on response.outcome (this also prevents Finding-4-class drift).
  • backend/app/schemas/l1.pyIntakeRequest.flow_id is dead unless Finding 4 revives it; NextNodeRequest.acknowledged is sent by the frontend but never read by the backend (advance infers ack from answer is None) — wire it or drop it. IntakeResponse lost its per-outcome guarantees (all fields Optional, ticket_kind no longer Literal["psa","internal"]); add a model_validator(mode="after") requiring session_id/ticket_id when outcome is matched/build, and add a session_id null-guard before navigate() in handleStart (L1Dashboard.tsx:58-59 — currently navigates to /l1/walk/undefined on a regression).
  • backend/app/services/match_or_build.py:55 — unused positional ticket_ref param (only caller passes ""); delete it. Also note classify() is a second bespoke LLM intake classifier alongside flowpilot_engine._classify_intake; l1.py passes problem_domain=None to matching, losing the domain signal the existing classifier provides — consider unifying in Phase 2B.
  • backend/app/services/l1_category_service.py:17 + models/account.py:75-81 — DEFAULT_L1_CATEGORIES duplicated as a hand-escaped JSON server_default; derive one from the other (migration copy stays frozen).
  • frontend/src/pages/account/L1CategoriesPage.tsx — local prettify() duplicates humanizeFeatureKey (UpgradePrompt.tsx:62); page skips shared PageHeader/Spinner used by sibling settings pages.
  • Missing index for GET /l1/escalations (l1.py:338): consider CREATE INDEX ... ON l1_walk_sessions (account_id, last_step_at DESC) WHERE status = 'escalated'.
  • backend/tests/test_l1_session_service.py — the escalation_reason_category == "no_kb_content" assertion was deleted from test_escalate_without_walk_writes_audit_log, weakening audit coverage; restore it.
  • Per-step walked_path rewrite is O(n²) cumulative bytes (session.walked_path = [*session.walked_path, entry]); bounded by MAX_DEPTH=12 so fine today — note for Phase 2B if depth grows.

Suggested execution order

  1. Finding 1 (node ids) — unblocks everything; add the contract test.
  2. Finding 6 (FK/constraint) — new migration; do early so it ships in the same release.
  3. Findings 2 + 3 together (mount section, fix field names/meta filter, ProposalDetail L1 block + null-guard the /pilot link).
  4. Finding 4 (intake honors flow_id; suggest card passes it).
  5. Finding 5 (decide adhoc: restore option (a) recommended, or fix copy + DECISIONS.md).
  6. Finding 7 (align all three permission layers).
  7. Findings 8 + 9 via the root fix: add category/problem_text (+ optionally pending_node) columns, delete the meta-entry convention, strip-meta fixes become moot.
  8. Finding 10 (one-line guard + deleted_at filter).
  9. Cleanups opportunistically alongside the file they touch.

After fixes: run the 11 Phase 2A backend test files together (authoritative gate per HANDOFF — do NOT trust a full local serial pytest tests/; use --override-ini="addopts="), frontend tsc -b + lint + build, and migration downgrade/upgrade roundtrip. Update .ai/HANDOFF.md to correct the Task 16/17 record.