docs(spec): resolve 6 Codex review findings on L1 AI tree builder spec

- Blocker: FlowProposal can't link an l1_walk_session (source_session_id is NOT NULL FK→ai_sessions, UI links /pilot). Add nullable l1_session_id + exactly-one CHECK + read-only walked-path link for L1-sourced proposals.
- High: flow_matching_engine matches published flows only; scope match pass to flows, defer proposal-matching.
- High: notification system is FlowPilot-shaped; enumerate the 3 changes for l1.session.escalated (VALID_EVENTS, link+body builder, explicit engineer recipients). Engineer-visible surface is the primary handoff.
- Medium: match before category gate so authored flows aren't blocked.
- Medium: define normalize_walked_path → valid tree with root id, unexplored branches as needs_review stubs.
- Medium: category write auth needs owner/admin, not engineer; add require_account_owner_or_admin dep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 03:04:49 -04:00
parent 5b58702b20
commit f62712d11c
1 changed files with 69 additions and 23 deletions
--- a/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
+++ b/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
@@ -49,24 +49,30 @@ Dedicated builder for the constrained node generation; reuse existing rails for
 - Phase 1 `L1WalkTreeVariant` walker — its stubbed synthetic-step UI is replaced by real AI node rendering.

 **Intake decision flow:**
+
+Order matters: **match first, gate only the build path.** The category allowlist exists to bound *generic AI building* for safety — it must not block a human-authored flow that already exists for that problem. So matching against published flows runs before any category check; the category gate applies only when we fall through to building.
+
 ```
 POST /l1/intake (problem_statement, customer_*, force_build?)
-  → match_or_build(account_id, problem_text, ticket_ref, force_build):
-      1. category = classify(problem_text)                       # new
-      2. if category not in account.enabled_l1_categories:
-             return {outcome: 'out_of_scope', category}
-      3. if not force_build:
-             hits = flow_matching_engine.find_matches(problem_text)
-             best = max(hits, default=None)
-             if best.score >= MATCH_THRESHOLD:
-                 return {outcome: 'matched', target_id, session_kind}   # flow|proposal
-             if best.score >= SUGGEST_THRESHOLD:
+  → match_or_build(account_id, problem_text, problem_domain, ticket_ref, force_build):
+      1. if not force_build:
+             hits = flow_matching_engine.find_matches(problem_text, problem_domain, account_id)
+             best = max(hits, default=None)                       # published flows (Trees) only
+             if best and best.score >= MATCH_THRESHOLD:
+                 return {outcome: 'matched', flow_id, session_kind: 'flow'}
+             if best and best.score >= SUGGEST_THRESHOLD:
                 return {outcome: 'suggest', near_miss, can_build: true}
+      2. category = classify(problem_text)                        # new — only on build path
+      3. if category not in account.enabled_l1_categories:
+             return {outcome: 'out_of_scope', category}
      4. return {outcome: 'build', session_kind: 'ai_build', category}
 ```
+
+**Match scope (Finding 2):** `flow_matching_engine.find_matches()` matches **published flows (`trees`) only** — it returns `{tree_id, tree_name, score, ...}` and has no notion of `FlowProposal`s. Phase 2A therefore matches against published flows only; the `matched` outcome is always `session_kind: 'flow'`. This is sufficient because the flywheel promotes good AI drafts to published flows (§6), which then become matchable on future intakes. Matching against not-yet-promoted proposals is a deferred enhancement (would require extending the engine), noted in §13.
+
 Frontend dispatches on `outcome`:
- `matched` → start a `flow`/`proposal` walk (Phase 1 paths).
- `suggest` → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with `force_build=true`.
+- `matched` → start a `flow` walk (Phase 1 path).
+- `suggest` → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with `force_build=true` (which skips the match pass and runs the category gate before building).
 - `out_of_scope` → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
 - `build` → create an `ai_build` session, navigate to the walker, fetch the first node.
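
The revised ordering (match first, gate only the build path) can be sketched in Python as follows. This is an illustrative sketch, not the spec's implementation: the threshold values, the `Match` dataclass, and passing `find_matches` and `classify` in as callables are all assumptions made so the example is self-contained. Only the outcome names and the ordering come from the spec.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical values: the spec names these constants but does not give numbers.
MATCH_THRESHOLD = 0.85
SUGGEST_THRESHOLD = 0.60


@dataclass(frozen=True)
class Match:
    """Assumed shape of one published-flow hit (tree_id, tree_name, score)."""
    tree_id: str
    tree_name: str
    score: float


def match_or_build(
    problem_text: str,
    enabled_categories: set[str],
    find_matches: Callable[[str], Sequence[Match]],
    classify: Callable[[str], str],
    force_build: bool = False,
) -> dict:
    # Step 1: match pass against published flows; skipped when force_build is set.
    if not force_build:
        best: Optional[Match] = max(
            find_matches(problem_text), key=lambda m: m.score, default=None
        )
        if best is not None and best.score >= MATCH_THRESHOLD:
            return {"outcome": "matched", "flow_id": best.tree_id, "session_kind": "flow"}
        if best is not None and best.score >= SUGGEST_THRESHOLD:
            return {"outcome": "suggest", "near_miss": best.tree_id, "can_build": True}
    # Steps 2-3: the category gate is reached only on the build path.
    category = classify(problem_text)
    if category not in enabled_categories:
        return {"outcome": "out_of_scope", "category": category}
    # Step 4: fall through to an AI build.
    return {"outcome": "build", "session_kind": "ai_build", "category": category}
```

With this ordering, a strong hit on a published flow returns `matched` even when no categories are enabled, while `force_build=True` on the same input skips the match pass and lands on the category gate.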

@@ -98,7 +104,7 @@ Frontend dispatches on `outcome`:

 ## 5. Safety model (layered)

-**Layer 1 — classification gate.** `classify(problem_text)` maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or `unknown`); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is `unknown`), intake returns `out_of_scope`; no build happens.
+**Layer 1 — classification gate (build path only).** Runs only after the match pass misses (§3) — a human-authored flow is never blocked by category settings. `classify(problem_text)` maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or `unknown`); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is `unknown`), intake returns `out_of_scope` (offer adhoc/escalate); no build happens.

 **Layer 2 — constrained generation.** The `ai_tree_builder` system prompt restricts output to:
 - Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
@@ -143,18 +149,37 @@ Ships enabled by default; Account Owners/Admins toggle per account:
 ## 6. Flywheel capture

 On `resolve` of an `ai_build` session (`l1_session_service.resolve` extension):
-1. Build `proposed_flow_data` from the `walked_path` (the nodes that were actually traversed, normalized into a tree structure).
-2. Create a `FlowProposal`: `source='ai_realtime_l1'`, `validated_by_outcome=true`, `proposed_flow_data=<tree>`, `linked_ticket_id/kind=<session ticket>`, `problem_domain=<category>`, `status='pending'`.
+1. **Normalize** the `walked_path` into a complete, valid `tree_structure` (§6.1) — approval requires a dict with a real `id` (see Finding 5 / `_create_tree_from_proposal`).
+2. Create a `FlowProposal`: `source='ai_realtime_l1'`, `validated_by_outcome=true`, `proposed_flow_data={tree_structure, match_keywords}`, `l1_session_id=<this session>` (NOT `source_session_id` — see §6.2 / Finding 1), `linked_ticket_id/kind=<session ticket>`, `problem_domain=<category>`, `status='pending'`.
 3. Run the existing `_find_similar_pending_proposal` dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
 4. Emit the existing `proposal.pending` notification to the review queue.

-Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by `flow_matching_engine` on future intakes → the KB compounds. No new review UI needed; `source='ai_realtime_l1'` rows surface in the existing queue (optionally badge them "AI · outcome-validated").
+Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by `flow_matching_engine` on future intakes → the KB compounds. `source='ai_realtime_l1'` rows surface in the existing queue (badge them "AI · outcome-validated").
+
+### 6.1 Tree normalization (Finding 5)
+The live `walked_path` holds only traversed nodes, and `"generate"` is a runtime sentinel, not a real edge — that is not a valid tree and would fail the `_create_tree_from_proposal` guard (`tree_structure` must be a dict with an `id`). At resolve time, `ai_tree_builder.normalize_walked_path(walked_path) -> tree_structure` produces a complete object:
+- Assign stable string `id`s to every node; the first node becomes the root and `tree_structure.id` = root id.
+- `question` nodes: the **traversed** branch (`yes`/`no` the tech actually chose) points to the next traversed node; the **untraversed** branch points to a terminal `{node_type: 'needs_review', text: 'Branch not explored during the originating call'}` stub.
+- `instruction` nodes point to the next traversed node.
+- The traversal ends at the real terminal node (`resolved` or `escalate`).
+This yields a structurally valid, reviewable tree: engineers fill in the `needs_review` branches when promoting. (Trees are `tree_type='troubleshooting'`.)
+
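
A minimal sketch of the normalization rules in §6.1, under stated assumptions: each `walked_path` entry is taken to be a dict with `node_type`, `text`, and (for questions) an `answer` of `yes` or `no`, and the output nests each successor inside its parent under `yes`/`no`/`next` keys. The real `walked_path` and `tree_structure` schemas are not shown in the spec, so these shapes and the `n{index}` id scheme are hypothetical.

```python
from typing import Any, Optional

NEEDS_REVIEW_TEXT = "Branch not explored during the originating call"


def normalize_walked_path(walked_path: list[dict[str, Any]]) -> dict[str, Any]:
    if not walked_path:
        raise ValueError("walked_path is empty")
    if walked_path[-1]["node_type"] not in ("resolved", "escalate"):
        raise ValueError("walked_path must end at a terminal node")

    # Walk backwards from the terminal node so each node can embed its successor.
    child: Optional[dict[str, Any]] = None
    for index in range(len(walked_path) - 1, -1, -1):
        step = walked_path[index]
        node: dict[str, Any] = {
            "id": f"n{index}",  # stable string id; n0 is the root
            "node_type": step["node_type"],
            "text": step["text"],
        }
        if step["node_type"] == "question":
            chosen = step["answer"]
            if chosen not in ("yes", "no"):
                raise ValueError(f"unexpected answer: {chosen!r}")
            other = "no" if chosen == "yes" else "yes"
            node[chosen] = child  # traversed branch points to the next traversed node
            node[other] = {      # untraversed branch becomes a needs_review stub
                "id": f"n{index}-{other}-stub",
                "node_type": "needs_review",
                "text": NEEDS_REVIEW_TEXT,
            }
        elif step["node_type"] == "instruction":
            node["next"] = child
        child = node
    return child  # the root node; its "id" doubles as tree_structure.id
```

Building from the terminal node backwards avoids a second pass to wire up successors, at the cost of assuming a strictly linear traversal.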
+### 6.2 FlowProposal L1 source linkage (Finding 1 — Blocker)
+`FlowProposal.source_session_id` is currently `nullable=False` FK → `ai_sessions`, and the review UI (`ProposalDetail.tsx`) links the "Source Session" to `/pilot/{source_session_id}` (a FlowPilot chat surface). An L1 `ai_build` session is an `l1_walk_session`, not an `ai_session`, so it cannot populate `source_session_id`. Changes:
+- **Model/migration:** add `FlowProposal.l1_session_id` (nullable FK → `l1_walk_sessions.id`, `ondelete=SET NULL`, indexed). Make `source_session_id` **nullable**. Add CHECK `((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))` — exactly one source set.
+- **Review UI:** when `l1_session_id` is set (source `ai_realtime_l1`), render the "Source" block as a read-only walked-path summary (problem statement + the resolved path) instead of a `/pilot/...` link. Existing ai_session-sourced proposals are unchanged.
+- **Tree promotion:** `_create_tree_from_proposal` sets `Tree.source_session_id` from the proposal — for L1-sourced proposals leave it NULL (confirm `Tree.source_session_id` is nullable; if not, include in the migration).

 ## 7. Minimum escalation handoff

-On `escalate` (terminal node reached, or the L1 hits the Escalate modal during an `ai_build` walk) — extends `l1_session_service.escalate`:
-1. **Notify engineers** — `notification_service` bell-badge event `l1.session.escalated` to the account's engineers (and `is_team_admin`/owner). Payload: ticket ref, problem summary, escalation reason category, link.
-2. **Engineer-visible surface** — escalated L1 sessions appear in an engineer-facing list. Reuse/extend the existing `/escalations` queue (`EscalationQueuePage`) with an "L1 escalations" section, or a dedicated `GET /l1/escalations` consumed there. Each row shows problem, the walked path summary, who escalated, when.
+On `escalate` (terminal node reached, or the L1 hits the Escalate modal during an `ai_build` walk) — extends `l1_session_service.escalate`. **The engineer-visible surface is the primary, dependency-free handoff; the bell-badge notification is a thin addition that requires three specific extensions to the FlowPilot-shaped notification system (Finding 3).**
+
+1. **Engineer-visible surface (primary).** Escalated L1 sessions appear in an engineer-facing list — extend the existing `/escalations` queue (`EscalationQueuePage`) with an "L1 escalations" section, backed by a new `GET /l1/escalations`. Each row: problem statement, walked-path summary, who escalated, when, reason category. Pollable; no dependency on the notification subsystem.
+
+2. **Bell-badge notification (Finding 3 — three explicit changes).** The notification system is currently FlowPilot-specific:
+   - `VALID_EVENTS` (`backend/app/schemas/notification.py`) has no `l1.session.escalated`. **Add it** to the set (and to the default `events_enabled` map).
+   - `_build_notification_link` (`notification_service.py`) only knows `session.escalated → /pilot/{session_id}?pickup=true`. **Add** `l1.session.escalated → /escalations` and **add** a body template for the new event. The existing `session.escalated` event must NOT be reused — an L1 escalation has no ai_session and no `/pilot` pickup flow.
+   - Default recipients (`_resolve_recipients`, ~line 184) are owner/admin/team_admin only — ordinary **engineers are excluded**. Since L1 escalations must reach engineers who can pick them up, the call **must pass explicit `target_user_ids`** = the account's active `engineer`-role users (plus owner/admin), not rely on the default set.
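
The link and recipient changes above might look roughly like this. It is a sketch only: the two event names and link targets are quoted from this section, but the `User` dataclass, its field names, and both function names are hypothetical stand-ins for whatever `notification_service.py` actually uses.

```python
from dataclasses import dataclass

# Event names and link templates as quoted in the spec.
EVENT_LINKS = {
    "session.escalated": "/pilot/{session_id}?pickup=true",
    "l1.session.escalated": "/escalations",
}

# Roles that should receive L1 escalations (engineers plus owner/admin).
L1_ESCALATION_ROLES = {"owner", "admin", "engineer"}


@dataclass(frozen=True)
class User:
    """Assumed minimal user shape; real field names may differ."""
    id: str
    account_role: str
    is_active: bool = True


def build_notification_link(event: str, session_id: str) -> str:
    # Unknown events fail loudly instead of silently reusing the /pilot link.
    if event not in EVENT_LINKS:
        raise ValueError(f"unknown event: {event}")
    return EVENT_LINKS[event].format(session_id=session_id)


def l1_escalation_recipients(users: list[User]) -> list[str]:
    # Explicit target_user_ids: active engineers, owners and admins of the account.
    return [u.id for u in users if u.is_active and u.account_role in L1_ESCALATION_ROLES]
```

The caller would pass the result of `l1_escalation_recipients` as `target_user_ids` rather than relying on the default recipient set.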

 **Still deferred** (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.

@@ -167,6 +192,12 @@ On `escalate` (terminal node reached, or the L1 hits the Escalate modal during a
 **Migration 2 — account L1 category settings.**
 - Add `accounts.enabled_l1_categories` `JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb` (list of category keys). RLS already covers `accounts`.

+**Migration 3 — FlowProposal L1 source linkage (Finding 1).**
+- Add `flow_proposals.l1_session_id` nullable FK → `l1_walk_sessions.id` (`ondelete=SET NULL`, indexed).
+- Make `flow_proposals.source_session_id` **nullable** (was `NOT NULL`).
+- Add CHECK `((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))` — exactly one source.
+- Confirm `trees.source_session_id` is nullable (L1-promoted trees leave it NULL); if not, drop its NOT NULL here.
+
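
The exactly-one-source CHECK in Migration 3 can be tried out with an in-memory SQLite table. This is only a stand-in for the real migration: the production database appears to be PostgreSQL (the spec uses `JSONB`), the column types here are simplified to `TEXT`, and the `try_insert` helper is hypothetical. The CHECK expression itself is copied from the spec.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flow_proposals (
        id INTEGER PRIMARY KEY,
        source_session_id TEXT,
        l1_session_id TEXT,
        -- Exactly one of the two sources must be set (boolean XOR via <>).
        CHECK ((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))
    )
    """
)


def try_insert(source_session_id, l1_session_id) -> bool:
    """Return True if the row satisfies the CHECK, False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO flow_proposals (source_session_id, l1_session_id) VALUES (?, ?)",
                (source_session_id, l1_session_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with only `source_session_id` or only `l1_session_id` are accepted; rows with both set or both NULL are rejected, which is the behaviour the flywheel-capture test in §11 relies on.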
 No new tables — live build state rides on the existing `l1_walk_sessions.walked_path`; persisted trees ride on `FlowProposal.proposed_flow_data`.

 ## 9. API surface
@@ -176,9 +207,11 @@ No new tables — live build state rides on the existing `l1_walk_sessions.walke
 | POST | `/l1/intake` | **Extended**: now runs `match_or_build`; response carries `outcome` (`matched`/`suggest`/`out_of_scope`/`build`). | `require_l1_or_coverage` |
 | POST | `/l1/sessions/{id}/next-node` | **New**: record answer/ack on current node, generate + return next node (or terminal). | `require_l1_or_coverage` |
 | GET | `/accounts/me/l1-categories` | **New**: list enabled + available categories + hard-floor (read-only) list. | `require_l1_or_above` (read) |
-| PATCH | `/accounts/me/l1-categories` | **New**: set enabled categories. | `require_engineer_or_admin` (owner/admin) |
+| PATCH | `/accounts/me/l1-categories` | **New**: set enabled categories. | `require_account_owner_or_admin` (Finding 6) |
 | GET | `/l1/escalations` | **New** (or extend `/escalations`): engineer-visible escalated-from-L1 list. | `require_engineer_or_admin` |

+**Finding 6 — new auth dep.** The category control is an owner/admin setting, but `require_engineer_or_admin` also admits `engineer`. No existing dep matches "owner or account-admin" (`require_account_owner` is owner-only; `require_admin` is super-admin-only). Add `require_account_owner_or_admin` to `deps.py`: allow `super_admin` bypass, then `account_role in ('owner', 'admin')`, else 403. Use it for the PATCH.
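
The rule described for the new dependency (super-admin bypass, then owner or admin, else 403) reduces to a small check. The sketch below is framework-free so it runs on its own: the `CurrentUser` dataclass, its `is_super_admin` field, and the `Forbidden` exception are assumptions, and the real dep in `deps.py` would presumably raise the web framework's own 403 error instead.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Stand-in for the framework's HTTP 403 error."""
    status_code = 403


@dataclass(frozen=True)
class CurrentUser:
    """Assumed minimal shape of the authenticated user."""
    account_role: str
    is_super_admin: bool = False


def require_account_owner_or_admin(user: CurrentUser) -> CurrentUser:
    # Super-admins bypass the account-role check.
    if user.is_super_admin:
        return user
    # Owners and account admins may change category settings.
    if user.account_role in ("owner", "admin"):
        return user
    # Everyone else, including engineers, is rejected.
    raise Forbidden("owner or admin role required")
```

Returning the user on success mirrors the usual dependency pattern, where the route handler receives the already-authorized user object.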
+
 ## 10. Frontend

 - `L1WalkTreeVariant` — replace synthetic stepping with real node rendering driven by `/next-node`; render `question` (yes/no), `instruction` (acknowledge), `resolved`/`escalate` (terminal). Per-node loading affordance. Disclaimer banner mounted for `ai_build` sessions.
@@ -191,10 +224,11 @@ No new tables — live build state rides on the existing `l1_walk_sessions.walke
 **Backend unit:**
 - `ai_tree_builder.generate_next_node` — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
 - Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
- `match_or_build` — all four outcomes at threshold boundaries (`score == MATCH_THRESHOLD`, `== SUGGEST_THRESHOLD`), `force_build` bypasses match, `out_of_scope` when category disabled.
+- `match_or_build` — all four outcomes at threshold boundaries (`score == MATCH_THRESHOLD`, `== SUGGEST_THRESHOLD`); **match runs before the category gate** (a matched published flow is returned even when its category is disabled — Finding 4); `force_build` skips match but still applies the category gate; `out_of_scope` only on the build path when category disabled/unknown.
 - `classify` — known categories map correctly; unknown → out_of_scope.
- Flywheel capture — resolve creates `ai_realtime_l1` proposal; dedupe merges near-duplicate.
- Escalation handoff — notification fired; escalated session appears in engineer query.
+- `normalize_walked_path` (Finding 5) — produces a dict with a root `id`; untraversed `question` branches become `needs_review` stubs; output passes the `_create_tree_from_proposal` validity guard.
+- Flywheel capture — resolve creates `ai_realtime_l1` proposal with `l1_session_id` set and `source_session_id` NULL (Finding 1); CHECK accepts exactly-one-source; dedupe merges near-duplicate.
+- Escalation handoff — `l1.session.escalated` accepted by the notification schema (Finding 3); link resolves to `/escalations`; explicit engineer `target_user_ids` receive it; escalated session appears in `GET /l1/escalations`.

 **Backend integration:**
 - Full intake→build→resolve creates an outcome-validated proposal.
@@ -218,3 +252,15 @@ No new tables — live build state rides on the existing `l1_walk_sessions.walke

 ## 13. Out of scope (restated)
 KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.
+
+**Also deferred (surfaced in review):**
+- **Matching against unpromoted `FlowProposal`s** (Finding 2). `flow_matching_engine` matches published flows only. Extending it to also surface outcome-validated drafts before promotion is a later enhancement; Phase 2A relies on engineer promotion (draft → published flow → matchable).
+
+## 14. Review revisions (2026-05-29 Codex review)
+All six findings verified against code and resolved in this spec:
+1. **Blocker — FlowProposal source linkage:** §6.2 + §8 Migration 3 (new nullable `l1_session_id`, `source_session_id` made nullable, exactly-one CHECK, review-UI link change).
+2. **High — match scope:** §3 (match published flows only; proposal-matching deferred §13).
+3. **High — escalation notification:** §7 (engineer surface is primary; three explicit notification-system changes enumerated).
+4. **Medium — gate ordering:** §3 + §5 Layer 1 (match first; category gate only on the build path).
+5. **Medium — flywheel tree shape:** §6.1 (`normalize_walked_path` produces a valid tree with root `id`; unexplored branches → `needs_review` stubs).
+6. **Medium — category write auth:** §9 (new `require_account_owner_or_admin` dep; `require_engineer_or_admin` was too broad).