Files
resolutionflow/docs/plans/2026-06-09-pr193-phase2a-review-findings.md
Michael Chihlas db446e1fd6 docs(handoff): PR #193 all 10 review findings resolved + 2 decisions
Findings doc gets a per-finding RESOLUTION section; HANDOFF resume point moves to
"re-push + merge" and corrects the false Task 16/17 "done" record; CURRENT_TASK
updated; two architectural decisions logged (real ai_build columns replacing the
meta convention; ad-hoc walk restored); SESSION_LOG entry added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 15:56:03 -04:00

181 lines
20 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# PR #193 (Phase 2A — L1 AI Tree Builder) Review Findings
**Date:** 2026-06-09
**Reviewed:** `feat/l1-ai-tree-builder-phase-2a` vs `main` (42 files, +2,326/154)
**Process:** 7 independent finder angles, every candidate independently verified against actual code (quoted lines confirmed, not speculation).
**Verdict: DO NOT MERGE as-is.** The headline feature (AI-guided walkthrough) is non-functional end-to-end, two tasks recorded as complete in `.ai/HANDOFF.md` were never actually committed, and one DB constraint is a deletion time bomb.
---
## ✅ RESOLUTION (2026-06-09, same day)
**All 10 findings resolved.** Two architectural decisions taken (see `.ai/DECISIONS.md`):
the **root fix** for Findings 8/9 (real `category` / `problem_text` / `pending_node`
columns on `l1_walk_sessions`; the `{"node_type":"meta"}` walked_path convention
deleted entirely — migration `61dda4f615c6`), and **restoring the ad-hoc walk**
(Finding 5 option a — `adhoc=True` intake + "Walk it ad-hoc" out_of_scope button).
- **Finding 1** — `ai_tree_builder._assign_id` stamps `uuid4().hex[:8]` on every node
(generated, depth-cap, generation-failed); `current_node_id` now real. Contract test
added (`test_ai_build_first_node_carries_id_and_advance_grows_walk`).
- **Finding 2a/3** — `L1EscalationsSection` mounted on `EscalationQueuePage`;
`ProposalDetail` `/pilot` link gated on `source_session_id`, L1-source block added.
- **Finding 2b** — renders `step.question ?? step.text`, `timeAgo`, shows `problem_text`.
- **Finding 4** — intake honors explicit `flow_id` (matcher bypassed); suggest card passes
`near_miss.flow_id`; the three intake handlers collapsed into one `runIntake`.
- **Finding 5** — ad-hoc walk restored (option a).
- **Finding 6** — `l1_session_id` FK → `ondelete=CASCADE` (model + migration); cascade-delete test.
- **Finding 7** — owner+admin at all three layers (GET dep, route guard, `usePermissions`);
`require_account_owner_or_admin` delegates to `User.can_manage_account`; `User.account_role`
TS type gains `'admin'`.
- **Finding 8** — `pending_node` column; `/next-node` replays the served node on re-mount
(no duplicate paid generation); reads context off the session (no ticket re-fetch).
- **Finding 9** — meta entry gone → empty walk is falsy (no junk proposal) and the depth
cap counts only real steps.
- **Finding 10** — `escalate` passes `target_ids or None` (default fallback), filters
`deleted_at IS NULL`, warns when empty; two tests.
- **Cleanups** — dead `ticket_ref` deleted, `IntakeResponse` per-outcome validator + `ticket_kind`
Literal restored, unused `acknowledged` dropped, escalations partial index added, restored the
deleted `no_kb_content` audit assertion.
**Verification:** full Phase 2A backend set **110 passed / 0 failed**; frontend `tsc -b` +
`eslint` + `vite build` clean; migration upgrade→downgrade→upgrade roundtrip clean
(columns + FK `confdeltype` + partial index confirmed); anti-parrot guardrail green.
How to use this file: work the findings in order. Findings 17 are merge blockers; 810 can be fast-follows. Each finding lists the verified evidence (file:line) and a suggested fix. Several findings share two root causes — fix those at the root rather than patching symptoms:
- **Root cause A:** AI-generated nodes have no `id`, but the advance protocol keys on `node_id`. (Finding 1; touches 8.)
- **Root cause B:** The intake category is smuggled into `walked_path` as a fake `{"node_type":"meta"}` entry that every consumer must know to skip — and most don't. (Findings 2b, 9; the deeper fix is a real `category` column on `l1_walk_sessions`, plus `problem_text` while you're there — see Finding 8's note.)
---
## MERGE BLOCKERS
### 1. AI walkthrough can never advance past the first question (showstopper)
**Evidence:**
- `backend/app/services/ai_tree_builder.py:37-41` — SYSTEM_PROMPT's JSON output shapes (`{"node_type":"question","text":...}` etc.) define **no `id` field**; `validate_node` (lines 71-79) returns the node unchanged; nothing anywhere assigns an id.
- `backend/app/services/l1_session_service.py:156``advance_ai_build` only appends to `walked_path` `if node_id is not None`; docstring (line 139) says "On the first call (node_id is None) nothing is appended."
- `frontend/src/components/l1/L1WalkTreeVariant.tsx:52-54` — sends `node_id: node.id`, which is `undefined` at runtime (server never sends an id; `JSON.stringify` drops undefined keys) → backend always receives `node_id=None`.
- `l1_session_service.py:174``session.current_node_id = next_node.get("id")` is always `None`.
**User impact:** Tech answers the first question → answer is discarded, the same (or a re-rolled) first question regenerates forever. `walked_path` never grows past the meta entry, the depth cap never fires, and resolve captures an empty tree.
**Why tests missed it:** `test_l1_api_ai_build` and friends mock `advance_ai_build` / hand-craft nodes **with** `id` keys — a shape the real model is never instructed to produce.
**Fix:** Assign a server-side id to every generated node before returning it (e.g., `uuid4().hex[:8]` in `generate_next_node` after `validate_node`), persist it as `session.current_node_id`, and add a test that runs the real (unmocked-shape) prompt contract: generate → assert node has id → advance with that id → assert walked_path grew. Do NOT ask the LLM to invent ids.
### 2. Escalations from AI sessions go nowhere (two linked defects)
**2a — Component never mounted.**
- `grep -rn "L1EscalationsSection" frontend/src` → exactly one hit: its own definition (`frontend/src/components/l1/L1EscalationsSection.tsx:10`). It is imported nowhere.
- `frontend/src/pages/EscalationQueuePage.tsx:3` imports only `EscalationQueue, EscalationMetricCard` from `@/components/flowpilot`.
- `backend/app/services/notification_service.py:449``"l1.session.escalated": "/escalations"` deep-link → `frontend/src/router.tsx:299` renders `EscalationQueuePage` → engineer sees only FlowPilot escalations; the L1 handoff is invisible. `GET /l1/escalations` (`backend/app/api/endpoints/l1.py:330`) has no UI surface.
- **Note:** `.ai/HANDOFF.md:38` claims "L1EscalationsSection on EscalationQueuePage" — that claim is false (documentation drift; see Finding 3).
**2b — Component renders wrong fields once mounted.**
- `backend/app/services/l1_session_service.py:162-168` — ai_build entries are `{"node_type", "id", "text", "answer", "l1_note"}` (key is `text`); legacy `record_step` (lines 199-204) uses `question`/`node_id`.
- `L1EscalationsSection.tsx:61` renders `{step.question}` → blank for every ai_build entry.
- `L1EscalationsSection.tsx:46``{s.walked_path.length} steps walked` counts the hidden meta entry → "N+1 steps walked".
- `backend/app/api/endpoints/l1.py:41``_to_response` returns `walked_path` raw (meta entry included).
**Fix:** Mount `L1EscalationsSection` on `EscalationQueuePage` (or fold L1 rows into the existing queue). Render `step.question ?? step.text`. Filter `node_type === 'meta'` entries — ideally server-side in `_to_response` (or eliminate the meta entry entirely per Root cause B). Also use the shared `timeAgo` util (`frontend/src/lib/timeAgo.ts`) instead of `new Date(...).toLocaleString()` at line 50, to match every sibling queue.
### 3. Two tasks recorded as complete were never committed
- Task 16 ("ProposalDetail L1-source block") and Task 17 (mounting the escalations section) appear in `.ai/HANDOFF.md` / `SESSION_LOG.md` as done, but **no hunk for `ProposalDetail.tsx` exists in the diff**, and Finding 2a proves the mount never happened.
- Concrete user impact today: `frontend/src/components/flowpilot/ProposalDetail.tsx:91-101` renders the "Source Session" card unconditionally; line 95 is `` to={`/pilot/${proposal.source_session_id}`} `` with no null guard. L1-sourced proposals (created with `source_session_id=None`, `l1_session_id=<session>`) reach the review queue as `pending` → engineers get a broken **`/pilot/null`** link.
**Fix:** Implement the missing work: in `ProposalDetail.tsx`, gate the `/pilot/` link on `source_session_id != null` and render an L1-source block (problem statement, category, link to the L1 session / escalations view) when `l1_session_id` is set. The backend already serves `l1_session_id` via `backend/app/schemas/flow_proposal.py`. Then correct `.ai/HANDOFF.md`.
### 4. "Use this flow" button silently does nothing
- `frontend/src/pages/l1/L1Dashboard.tsx:77-86``useSuggestedFlow` re-POSTs `/l1/intake` with the same text, no `flow_id`. The in-code comment ("it matches again and returns a `matched` outcome") is factually wrong: the same text scores in the same 0.600.75 suggest band (`backend/app/services/match_or_build.py:66-72`, `MATCH_THRESHOLD = 0.75` line 21) → `suggest` again, no `session_id` → handler falls to `resetPrompts()` and the card vanishes. The suggested flow can never be started.
- `backend/app/api/endpoints/l1.py` — the rewritten intake **never reads `payload.flow_id`** (old branch deleted, diff confirms); `IntakeRequest.flow_id` (`backend/app/schemas/l1.py:13`) is now dead.
**Fix:** Make intake honor an explicit `flow_id` (bypass the matcher, call `start_flow_session` directly — restores the deleted behavior), and have the suggest card pass `near_miss.flow_id`. This also kills the wasteful re-run of the embedding + pgvector + keyword pipeline just to rediscover a flow_id the client already holds.
### 5. Out-of-scope problems lost the ad-hoc walk fallback
- Old intake had `else: start_adhoc_session(...)`; the rewrite (`backend/app/api/endpoints/l1.py:88-102`) dispatches only matched/build/suggest/out_of_scope. `start_adhoc_session` (`l1_session_service.py:82`) now has **zero callers** — ad-hoc sessions are unreachable product-wide (the only remaining `session_kind="adhoc"` creation is `escalate_without_walk`, an audit record, not walkable).
- `L1Dashboard.tsx:269-292` — out_of_scope prompt offers only "Escalate to engineering" / "Cancel".
- Stale copy: `frontend/src/pages/account/L1CategoriesPage.tsx:57-58` still promises "Disabled categories fall back to an ad-hoc walk or escalation." A diff test comment also claims adhoc "is offered from the out_of_scope prompt" — it is not.
**Fix (decide deliberately, don't drift):** Either (a) add a "Walk it ad-hoc" option to the out_of_scope prompt that hits a path creating an adhoc session (restore the capability), or (b) if dropping ad-hoc is intentional, fix the L1CategoriesPage copy and the test comment, and note the decision in `.ai/DECISIONS.md`. Option (a) preserves pre-existing user capability; recommend (a).
### 6. DB constraint makes L1-session deletion always fail (time bomb)
- `backend/app/models/flow_proposal.py:60-63``CheckConstraint("(source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)")` (XOR).
- Line 87-92 — `l1_session_id` FK is `ondelete="SET NULL"`; `source_session_id` (line 83) is `ondelete="CASCADE"`.
- Migration `backend/alembic/versions/1fd88a68b145_flow_proposal_l1_source_linkage.py:32-45` ships the same DDL.
- Postgres CHECK constraints are non-deferrable and ARE evaluated on the UPDATE produced by `ON DELETE SET NULL`. So deleting any `l1_walk_sessions` row referenced by a proposal (whose `source_session_id` is NULL by construction) → both columns NULL → CHECK violation → the DELETE fails. The `SET NULL` action can literally never fire successfully.
- Reachable today via `backend/app/api/endpoints/admin.py:1336` (`hard_delete_user``db.delete(account)`, DB-side cascades with unspecified ordering), and via any future GDPR/retention purge.
**Fix:** Change `l1_session_id` to `ondelete="CASCADE"` (matching `source_session_id`'s behavior — proposal dies with its source), in both the model and a new migration. Keep the XOR check. Alternative (`num_nonnulls(...) >= 1` style relaxation) is weaker; prefer CASCADE.
### 7. Account admins locked out of L1 category settings (3-layer inconsistency)
- Frontend route: `frontend/src/router.tsx:368-372``requiredRole="owner"`; `frontend/src/hooks/usePermissions.ts:21-28` (`getEffectiveRole`) has **no admin branch**`account_role='admin'` maps to `viewer` → bounced to /trees.
- Backend GET: `backend/app/api/endpoints/accounts.py:175-178` uses `require_l1_or_above` (`deps.py:235-242`: `l1_tech/engineer/owner` only) → admin gets **403 on read**.
- Backend PATCH: `accounts.py:193-197` uses `require_account_owner_or_admin` (`deps.py:279-289`) → admin **can write**.
- `admin` is a real role: `backend/app/models/user.py:25` CHECK constraint; `user.py:132` treats admin as account-manager.
**Fix:** Pick one rule — owner+admin manage L1 categories — and apply it at all three layers: GET should use `require_account_owner_or_admin` too (or a combined dep), and the route guard needs admins to pass (either add an admin branch to `getEffectiveRole` — check blast radius on other `requiredRole` uses first — or a dedicated `canManageAccount`-style guard for this route). Also note `require_account_owner_or_admin` duplicates `User.can_manage_account` (`user.py:130-132`); delegate to it.
---
## FAST-FOLLOWS (real bugs, lower urgency)
### 8. Every walk-view mount fires a fresh paid LLM call (and may swap the question)
- The served-but-unanswered node is never persisted: `l1_session_service.py:156-174``node_id is None` path goes straight to `generate_next_node`; only `current_node_id` (always None today, see Finding 1) is stored. No replay branch.
- `L1WalkTreeVariant.tsx:26-44` — mount effect unconditionally POSTs `/next-node {}`; `frontend/src/main.tsx:4,34` — StrictMode is on, so dev double-mounts double-generate.
**Impact:** Refresh/back-forward = duplicate Sonnet spend, multi-second stall, and possibly a *different* question than the one the tech was answering.
**Fix:** Persist the pending node (e.g., a `pending_node` JSONB column on `l1_walk_sessions`, or reuse `current_node_id` + stored payload) and replay it when `node_id is None` and a pending node exists. Note: if adding columns, this is the moment to also add `category` and `problem_text` columns and delete the meta-entry convention (Root cause B) — `/next-node` currently re-fetches the internal ticket and re-scans walked_path on every step (`l1.py:302-310`) just to recover these immutable values.
### 9. Hidden meta entry: junk proposals + depth cap off-by-one
- Junk proposal: `l1_session_service.py:270``if helpful and session.session_kind == "ai_build" and session.walked_path:` — a meta-only walked_path (seeded at intake, `l1.py:132-134`) is truthy. `normalize_walked_path` (`ai_tree_builder.py:131-137`) strips meta → empty → returns the `"Empty walk — needs authoring."` stub, which "passes the proposal approval guard" per its own docstring → a `status="pending"`, `validated_by_outcome=True` junk proposal reaches the review queue when a tech resolves immediately after intake.
- Depth cap: `l1_session_service.py:172-173` passes the **raw** walked_path; `ai_tree_builder.py:82-83,96-98``len(walked_path) >= MAX_DEPTH` (12) counts the meta entry → cap fires after 11 real steps. `_strip_meta` is applied only downstream.
**Fix (symptom-level):** strip meta before both the truthiness guard and `escalate_if_depth_exceeded`. **Fix (root):** real `category` column, delete the meta convention (see Finding 8 note). Root fix preferred per project principle (correct architecture over minimal diff).
### 10. Escalation notification silently dropped when recipient query is empty
- `notification_service.py:180` changed `if target_user_ids:``if target_user_ids is not None:` (intentional, documented at lines 176-178).
- `l1_session_service.py:371-381``escalate()` passes its computed `target_ids` unconditionally; if all owners/admins/engineers are inactive, `[]` → zero in-app notifications, no log, no fallback. (Existing callers are safe — they all use `[x] if x else None` patterns.)
- Bonus divergence: escalate's hand-rolled query filters only `is_active`, while `handoff_manager.py:323-333` also filters `deleted_at IS NULL` — soft-deleted engineers would be notified.
**Fix:** In `escalate()`: `target_user_ids=target_ids or None` (falls back to default recipients) plus a warning log when empty; add the `deleted_at` filter. Longer-term: give `_resolve_recipients` a roles parameter so callers stop hand-rolling recipient queries.
---
## Cleanups (optional, do alongside adjacent fixes)
- `L1Dashboard.tsx:47-110``handleStart` / `useSuggestedFlow` / `buildNew` are three near-identical intake calls; collapse to one `runIntake(opts)` switching on `response.outcome` (this also prevents Finding-4-class drift).
- `backend/app/schemas/l1.py``IntakeRequest.flow_id` is dead unless Finding 4 revives it; `NextNodeRequest.acknowledged` is sent by the frontend but never read by the backend (advance infers ack from `answer is None`) — wire it or drop it. `IntakeResponse` lost its per-outcome guarantees (all fields Optional, `ticket_kind` no longer `Literal["psa","internal"]`); add a `model_validator(mode="after")` requiring `session_id`/`ticket_id` when outcome is matched/build, and add a `session_id` null-guard before `navigate()` in `handleStart` (`L1Dashboard.tsx:58-59` — currently navigates to `/l1/walk/undefined` on a regression).
- `backend/app/services/match_or_build.py:55` — unused positional `ticket_ref` param (only caller passes `""`); delete it. Also note `classify()` is a second bespoke LLM intake classifier alongside `flowpilot_engine._classify_intake`; `l1.py` passes `problem_domain=None` to matching, losing the domain signal the existing classifier provides — consider unifying in Phase 2B.
- `backend/app/services/l1_category_service.py:17` + `models/account.py:75-81` — DEFAULT_L1_CATEGORIES duplicated as a hand-escaped JSON `server_default`; derive one from the other (migration copy stays frozen).
- `frontend/src/pages/account/L1CategoriesPage.tsx` — local `prettify()` duplicates `humanizeFeatureKey` (`UpgradePrompt.tsx:62`); page skips shared `PageHeader`/`Spinner` used by sibling settings pages.
- Missing index for `GET /l1/escalations` (`l1.py:338`): consider `CREATE INDEX ... ON l1_walk_sessions (account_id, last_step_at DESC) WHERE status = 'escalated'`.
- `backend/tests/test_l1_session_service.py` — the `escalation_reason_category == "no_kb_content"` assertion was deleted from `test_escalate_without_walk_writes_audit_log`, weakening audit coverage; restore it.
- Per-step `walked_path` rewrite is O(n²) cumulative bytes (`session.walked_path = [*session.walked_path, entry]`); bounded by MAX_DEPTH=12 so fine today — note for Phase 2B if depth grows.
---
## Suggested execution order
1. Finding 1 (node ids) — unblocks everything; add the contract test.
2. Finding 6 (FK/constraint) — new migration; do early so it ships in the same release.
3. Findings 2 + 3 together (mount section, fix field names/meta filter, ProposalDetail L1 block + null-guard the /pilot link).
4. Finding 4 (intake honors flow_id; suggest card passes it).
5. Finding 5 (decide adhoc: restore option (a) recommended, or fix copy + DECISIONS.md).
6. Finding 7 (align all three permission layers).
7. Findings 8 + 9 via the root fix: add `category`/`problem_text` (+ optionally `pending_node`) columns, delete the meta-entry convention, strip-meta fixes become moot.
8. Finding 10 (one-line guard + deleted_at filter).
9. Cleanups opportunistically alongside the file they touch.
After fixes: run the 11 Phase 2A backend test files together (authoritative gate per HANDOFF — do NOT trust a full local serial `pytest tests/`; use `--override-ini="addopts="`), frontend `tsc -b` + lint + build, and migration downgrade/upgrade roundtrip. Update `.ai/HANDOFF.md` to correct the Task 16/17 record.