4 Commits

Author SHA1 Message Date
5c38fb8904 docs(decisions): record plan-tier taxonomy centralization decision (Option B)
Captures the 2026-05-29 decision to derive admin plan dropdown + validation
from the plan_limits table rather than hand-duplicating the allow-list across
6+ sites. Triggered by the prod "AI sessions down" report that traced to the
admin dropdown still offering the dead 'team' slug. Adds the matching backlog
entry to TODO.md with duplication sites enumerated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 11:25:28 -04:00
23dbcec86e docs(plan): L1 AI decision-tree builder — Phase 2A implementation plan
19 TDD tasks from the approved spec: 3 migrations (ai_build kind, account
categories, FlowProposal l1_session_id), ai_tree_builder (constrained node
gen + validation + normalize), match_or_build orchestrator (match-first,
gate-on-build), session-service ai_build start/advance, flywheel capture on
resolve, engineer escalation notification, category settings API, and the
frontend walker/dispatch/settings/escalations surfaces + e2e.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 03:16:10 -04:00
f62712d11c docs(spec): resolve 6 Codex review findings on L1 AI tree builder spec
- Blocker: FlowProposal can't link an l1_walk_session (source_session_id is
  NOT NULL FK→ai_sessions, UI links /pilot). Add nullable l1_session_id +
  exactly-one CHECK + read-only walked-path link for L1-sourced proposals.
- High: flow_matching_engine matches published flows only; scope match pass
  to flows, defer proposal-matching.
- High: notification system is FlowPilot-shaped; enumerate the 3 changes for
  l1.session.escalated (VALID_EVENTS, link+body builder, explicit engineer
  recipients). Engineer-visible surface is the primary handoff.
- Medium: match before category gate so authored flows aren't blocked.
- Medium: define normalize_walked_path → valid tree with root id, unexplored
  branches as needs_review stubs.
- Medium: category write auth needs owner/admin, not engineer; add
  require_account_owner_or_admin dep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 03:04:49 -04:00
5b58702b20 docs(spec): L1 AI decision-tree builder — Phase 2A design
Brainstormed design for real-time AI tree building when no KB/flow matches.
Overrides the original "no empty-KB build" rule: build from generic L1
knowledge under a layered safety model (classification gate, constrained
generation, per-node validation with a hard floor, standing disclaimer).
Approach C — dedicated ai_tree_builder + match_or_build orchestrator,
reusing flow_matching_engine and the knowledge_flywheel proposal pipeline.

Scope: streaming node-by-node builder, admin-configurable categories,
flywheel capture of resolved trees, minimum escalation handoff (notify +
engineer surface). KB ingestion/connectors, PSA reassign, escalation
package, and AI chat handoff deferred to later phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 01:22:37 -04:00
4 changed files with 2246 additions and 0 deletions

@@ -13,6 +13,18 @@
---
## 2026-05-29 — Single source of truth for plan-tier taxonomy (derive admin UI + validation from `plan_limits`)
**Context:** A prod report ("AI sessions aren't working") traced to the owner account having no paid plan (AI is plan-gated), compounded by a real bug: the admin "Change Plan" dropdown ([`AccountDetailPage.tsx:443-445`](../frontend/src/pages/admin/AccountDetailPage.tsx)) still offered the dead `team` slug (renamed to `enterprise` in migration `4ce3e594cb87`, 2026-05-07) and omitted `starter`/`enterprise`. Selecting "Team" 400s against the hardcoded allow-list in [`admin.py:994`](../backend/app/api/endpoints/admin.py#L994). The dropdown was missed during the 2026-05-07 taxonomy reconciliation because the allowed-plan list is hand-duplicated across ≥6 backend + frontend sites. Second taxonomy-drift incident.
**Decision:** Option B — make `plan_limits` the single source of truth: admin dropdown + pricing/checkout derive plan options from a plans endpoint (filter `is_public`, order by `sort_order`, label from `display_name`), and backend validation checks against actual `plan_limits` rows rather than a hardcoded tuple. Implementation deferred (active work is on another branch); fully specced in [TODO.md](TODO.md). A trivial dropdown-options fix may land first to unblock the admin tool.
**Rejected:** Option A (patch only the `AccountDetailPage` dropdown). Fixes the symptom but leaves the duplication that has now caused two drift incidents — and there is no outage forcing a minimal diff (bug is admin-only and was already worked around via direct Pro assignment). Conflicts with the repo principle "prefer correct architecture over minimal diff."
**Consequences:** New plan tiers become a data change (a `plan_limits` row) instead of a multi-file code edit; UI and validation can no longer drift from the catalog. Requires a public-plans read endpoint (or extending billing state) consumed by the admin UI + pricing page. The `'team'` visibility string (`Tree.visibility` / `StepLibrary.visibility`) is a separate domain and is explicitly out of scope.
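A minimal sketch of what Option B implies, assuming a hypothetical `PlanLimitRow` stand-in for a `plan_limits` row that carries only the fields named above. The function names, sample rows and `sort_order` values are illustrative assumptions, not the project's actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimitRow:
    """Hypothetical stand-in for a plan_limits row (only the fields the decision names)."""
    slug: str
    display_name: str
    is_public: bool
    sort_order: int


def public_plan_options(rows: list[PlanLimitRow]) -> list[dict[str, str]]:
    """Dropdown options: public plans only, ordered by sort_order, labelled by display_name."""
    visible = sorted((r for r in rows if r.is_public), key=lambda r: r.sort_order)
    return [{"value": r.slug, "label": r.display_name} for r in visible]


def is_valid_plan(slug: str, rows: list[PlanLimitRow]) -> bool:
    """Validate against the rows that actually exist instead of a hardcoded tuple."""
    return any(r.slug == slug for r in rows)


# Illustrative data only; the sort_order values are made up for the example.
rows = [
    PlanLimitRow("pro", "Pro", True, 30),
    PlanLimitRow("free", "Free", True, 10),
    PlanLimitRow("enterprise", "Enterprise", True, 40),
    PlanLimitRow("starter", "Starter", True, 20),
]
```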
---
## 2026-05-28 — Scope Anthropic structured outputs to flat-array JSON only
**Context:** Optimizing the existing Claude API usage (no model change). The Anthropic path in `generate_json` (`ai_provider.py`) had no equivalent to the Gemini path's `response_mime_type="application/json"` — it prompted for JSON and relied on downstream defenses: `_strip_markdown_fences` (ai_fix), `parse_llm_json` (knowledge_flywheel), and `_try_repair_json` (kb_conversion, which balances unclosed braces on truncated output). Anthropic structured outputs (`output_config.format` with a JSON schema) guarantee valid, parseable JSON and would eliminate those band-aids. The question was which of the four `generate_json` call sites can adopt it.

@@ -23,3 +23,5 @@ None selected. Pick from the backlog below or `03-DEVELOPMENT-ROADMAP.md`.
- [ ] **`bg-card-hover` Tailwind class doesn't resolve.** [`frontend/src/components/layout/CommandPalette.tsx:450-451`](../frontend/src/components/layout/CommandPalette.tsx) uses `bg-card-hover` as a Tailwind utility, but Tailwind v4 generates `bg-{token}` from `--color-{token}` — and the token in [`frontend/src/index.css:15`](../frontend/src/index.css) is `--color-bg-card-hover`, which generates `bg-bg-card-hover`, not `bg-card-hover`. So those classes silently produce nothing. Other call sites (KnowledgeBaseCards, TeamSummary, ProposalBanner) use the explicit `hover:bg-[var(--color-bg-card-hover)]` form which works. Fix: change the CommandPalette classes to the explicit-var form, OR add a `--color-card-hover` semantic mapping in index.css alongside `--color-card`. Surfaced 2026-05-01 during impeccable polish sweep.
- [ ] **`ConcludeSessionModal` paused/escalated step forces single-artifact choice — should allow multi-select.** [`frontend/src/components/assistant/ConcludeSessionModal.tsx`](../frontend/src/components/assistant/ConcludeSessionModal.tsx) ~lines 430-474 ("Paused/Escalated: status update options"). Today the engineer clicks ONE of Ticket Notes / Client Update / Email Draft, the buttons disappear, and the result replaces them. Real MSP escalations almost always need at least two: technical notes for the next engineer's PSA AND a non-technical client update. Same for pause (client update + ticket notes for context when resuming). Recommended shape: multi-select with smart defaults — three checkboxes (`☑ Ticket Notes ☑ Client Update ☐ Email Draft`); for `escalated` pre-check Ticket Notes + Client Update; for `paused` pre-check Client Update only. One "Generate" button fires all selected in parallel via existing `aiSessionsApi.generateStatusUpdate(...)` (already supports the three `audience` values: `ticket_notes`, `client_update`, `email_draft`). Each result renders in its own card with its own Copy / Post-to-PSA / Send-Email action. Surfaced 2026-05-01. Feature work, not polish — touches streaming wiring for parallel calls.
- [ ] **Centralize plan-tier taxonomy — derive admin plan dropdown (and validation) from `plan_limits`, not hardcoded lists.** Chose **Option B** over a one-line patch (see [DECISIONS.md](DECISIONS.md) 2026-05-29). *Surfaced by a prod bug (2026-05-28):* the admin "Change Plan" dropdown at [`AccountDetailPage.tsx:443-445`](../frontend/src/pages/admin/AccountDetailPage.tsx) still offered `free / pro / team` — the dead `team` slug (renamed to `enterprise` in migration `4ce3e594cb87`, 2026-05-07) and missing `starter`/`enterprise`. Selecting "Team" sends `{plan:"team"}` to `PUT /admin/accounts/{id}/subscription/plan`, which 400s on `if data.plan not in ("free","pro","starter","enterprise")` ([admin.py:994](../backend/app/api/endpoints/admin.py#L994), duplicated at [:975](../backend/app/api/endpoints/admin.py#L975)). The 400 detail was swallowed by a generic `toast.error('Failed to update plan')` ([AccountDetailPage.tsx:196](../frontend/src/pages/admin/AccountDetailPage.tsx)), so it presented as "AI sessions are down" (real cause: owner account had no paid plan; AI is plan-gated). **Root cause of the root cause:** the allowed-plan list is hand-duplicated across ≥6 sites and drifted (2nd such incident). **Duplication sites to consolidate:** backend [`admin.py:975`](../backend/app/api/endpoints/admin.py#L975) + [`:994`](../backend/app/api/endpoints/admin.py#L994) (tuple, twice), [`schemas/admin.py:128`](../backend/app/schemas/admin.py) (`AdminAccountCreate.plan` Literal), frontend `AccountDetailPage.tsx` dropdown, `AccountsPage.tsx` create-account dropdown, `types/admin.ts` + `types/account.ts` + `types/billing.ts`, `hooks/useSubscription.ts` (`isPaidPlan`), `components/subscription/CheckoutButton.tsx` (`planLabels`). **Source of truth:** the `plan_limits` table (rows: free/starter/pro/enterprise) — `PlanLimitWithBillingResponse` already exposes `is_public` + `sort_order` + `display_name` for ordering/labels. 
**End state (B):** admin dropdown + pricing/checkout derive options from a plans endpoint backed by `plan_limits` (filter `is_public`, order by `sort_order`, label from `display_name`); backend validation checks against actual `plan_limits` rows instead of a hardcoded tuple. **Trivial first commit (land anytime to unblock the admin tool):** fix the `AccountDetailPage` dropdown to `Free / Starter / Pro / Enterprise` and surface the backend error detail in the toast. ⚠️ The `'team'` string in `Tree.visibility` / `StepLibrary.visibility` is a *separate domain* (shared-with-account) — do NOT touch it.

File diff suppressed because it is too large.

@@ -0,0 +1,266 @@
# L1 AI Decision-Tree Builder — Phase 2A Design
**Status:** Draft for review
**Date:** 2026-05-29
**Author:** previous session (brainstorming)
**Predecessor:** [`2026-05-28-l1-workspace-design.md`](2026-05-28-l1-workspace-design.md) (full L1 vision), [`2026-05-28-l1-workspace-phase-1-acceptance.md`](2026-05-28-l1-workspace-phase-1-acceptance.md) (what shipped in Phase 1)
---
## 1. Goal
When an L1 tech describes a problem and there is **no matching authored flow or AI draft**, the platform builds a yes/no decision tree **in real time from the model's general L1 knowledge** and walks the tech through it node by node. Scoped to L1-appropriate troubleshooting: simple yes/no questions and reversible step-by-step instructions. Successful trees are captured as outcome-validated drafts for engineer review, compounding the account's knowledge base from real resolutions.
This **overrides** the original spec's "no empty-KB build" rule (§8.1 of the predecessor), which aborted to a degradation screen when no KB existed. Instead of aborting, we build from generic knowledge under a layered safety model.
KB grounding (RAG over ingested documents) is **explicitly deferred to Phase 2B** — Phase 2A builds from generic knowledge only, plus matching against already-authored flows.
## 2. Scope
**In scope (Phase 2A):**
- `match_or_build` orchestrator inserted at L1 intake (match-first, build-on-miss).
- `ai_tree_builder` service: node-by-node ("streaming") tree generation, constrained + escalate-early.
- Admin-configurable L1 category allowlist (Account Owner/Admin control panel).
- Standing AI-disclaimer banner on AI-built walks.
- Flywheel capture: resolved AI trees become outcome-validated `FlowProposal`s.
- Minimum escalation handoff: engineer bell-badge notification + an engineer-visible "escalated from L1" surface.
**Deferred:**
- KB document ingestion + connectors (IT Glue, Hudu, SharePoint/OneDrive) — Phase 2B.
- RAG grounding of the builder on ingested KB — Phase 2B.
- PSA ticket reassign on escalation, escalation-package generation, AI chat handoff — later phase.
- `BuildAbortedNoKB` screen from the original spec — **dropped** (superseded by build-from-generic).
## 3. Architecture (Approach C)
Dedicated builder for the constrained node generation; reuse existing rails for matching and capture.
**New services:**
| File | Responsibility |
|---|---|
| `backend/app/services/match_or_build.py` | Orchestrator. `match_or_build(account_id, problem_text, ticket_ref, *, force_build=False) -> MatchOrBuildResult`. Classify → category gate → match pass → build/suggest/out-of-scope decision. |
| `backend/app/services/ai_tree_builder.py` | Node-by-node generation. `generate_next_node(problem_text, category, walked_path) -> TreeNode`. Reuses `get_ai_provider` + `generate_json` + `parse_llm_json`. Owns the constrained system prompt and per-node validation. |
| `backend/app/services/l1_category_service.py` | Read/write an account's enabled L1 categories; expose the default allowlist and the always-forbidden hard floor. |
**Reused as-is:**
- `flow_matching_engine.find_matches()` — semantic + keyword + recency match pass.
- `knowledge_flywheel` proposal-creation + dedupe (`_find_similar_pending_proposal`) — outcome-validated capture.
- `notification_service` — engineer escalation notification.
- Phase 1 `L1WalkTreeVariant` walker — its stubbed synthetic-step UI is replaced by real AI node rendering.
**Intake decision flow:**
Order matters: **match first, gate only the build path.** The category allowlist exists to bound *generic AI building* for safety — it must not block a human-authored flow that already exists for that problem. So matching against published flows runs before any category check; the category gate applies only when we fall through to building.
```
POST /l1/intake (problem_statement, customer_*, force_build?)
  → match_or_build(account_id, problem_text, problem_domain, ticket_ref, force_build):
      1. if not force_build:
             hits = flow_matching_engine.find_matches(problem_text, problem_domain, account_id)
             best = max(hits, key=lambda h: h.score, default=None)   # published flows (Trees) only
             if best and best.score >= MATCH_THRESHOLD:
                 return {outcome: 'matched', flow_id, session_kind: 'flow'}
             if best and best.score >= SUGGEST_THRESHOLD:
                 return {outcome: 'suggest', near_miss, can_build: true}
      2. category = classify(problem_text)   # new; only on build path
      3. if category not in account.enabled_l1_categories:
             return {outcome: 'out_of_scope', category}
      4. return {outcome: 'build', session_kind: 'ai_build', category}
```
**Match scope (Finding 2):** `flow_matching_engine.find_matches()` matches **published flows (`trees`) only** — it returns `{tree_id, tree_name, score, ...}` and has no notion of `FlowProposal`s. Phase 2A therefore matches against published flows only; the `matched` outcome is always `session_kind: 'flow'`. This is sufficient because the flywheel promotes good AI drafts to published flows (§6), which then become matchable on future intakes. Matching against not-yet-promoted proposals is a deferred enhancement (would require extending the engine), noted in §13.
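The intake flow can be sketched as runnable Python under some simplifying assumptions: the matcher and classifier are injected as plain callables, `Match` is a hypothetical stand-in for a `find_matches()` hit, and the account, domain and ticket arguments are omitted. It is an illustration of the ordering rule (match first, gate only the build path), not the planned implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MATCH_THRESHOLD = 0.75    # defaults from the tunables table in §5.3
SUGGEST_THRESHOLD = 0.60


@dataclass
class Match:
    """Hypothetical stand-in for one find_matches() hit."""
    tree_id: str
    score: float


def match_or_build(
    problem_text: str,
    enabled_categories: set[str],
    find_matches: Callable[[str], list[Match]],
    classify: Callable[[str], str],
    *,
    force_build: bool = False,
) -> dict:
    """Match first; apply the category gate only when falling through to a build."""
    if not force_build:
        hits = find_matches(problem_text)
        best: Optional[Match] = max(hits, key=lambda m: m.score, default=None)
        if best and best.score >= MATCH_THRESHOLD:
            return {"outcome": "matched", "flow_id": best.tree_id, "session_kind": "flow"}
        if best and best.score >= SUGGEST_THRESHOLD:
            return {"outcome": "suggest", "near_miss": best.tree_id, "can_build": True}
    # Build path: classification and the category gate run only here.
    category = classify(problem_text)
    if category not in enabled_categories:
        return {"outcome": "out_of_scope", "category": category}
    return {"outcome": "build", "session_kind": "ai_build", "category": category}
```

Because both threshold comparisons are inclusive, a score exactly equal to `MATCH_THRESHOLD` yields `matched`, and a published flow is returned even when the classifier would have mapped the problem to a disabled category.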
Frontend dispatches on `outcome`:
- `matched` → start a `flow` walk (Phase 1 path).
- `suggest` → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with `force_build=true` (which skips the match pass and runs the category gate before building).
- `out_of_scope` → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
- `build` → create an `ai_build` session, navigate to the walker, fetch the first node.
## 4. The streaming build & node schema
`ai_tree_builder.generate_next_node()` is called with the problem statement, the resolved category, and the **full walked path so far**. It returns exactly one node. Passing the whole path every call is what keeps independently-generated nodes coherent and lets the model decide when it has exhausted safe steps.
**Node shape (`proposed_flow_data` node, also the live `walked_path` entry):**
```json
// question — yes/no branch; both branches regenerate
{ "node_type": "question", "id": "n3", "text": "Is the printer showing a 'ready' status light?",
"yes_next": "generate", "no_next": "generate" }
// instruction — a single safe, reversible action; advances on acknowledgement
{ "node_type": "instruction", "id": "n4", "text": "Unplug the printer for 30 seconds, then power it back on.",
"next": "generate" }
// resolved — terminal success
{ "node_type": "resolved", "id": "n7", "text": "Printer is back online and printing test pages." }
// escalate — terminal handoff (escalate-early safety valve)
{ "node_type": "escalate", "id": "n7", "reason_category": "exhausted_safe_steps",
"text": "This looks like a driver-level fault beyond L1 scope — escalating to engineering." }
```
`"generate"` is a sentinel meaning "call `generate_next_node` again with the new answer appended." The first node is fetched synchronously on `ai_build` session creation (intake). Each subsequent node is fetched when the tech answers or acknowledges; target latency is ~2–4s per node, so show a per-node "Thinking through the next step…" affordance.
**Endpoint:** `POST /l1/sessions/{id}/next-node` body `{node_id, answer?: 'yes'|'no', acknowledged?: true, note?}`. Appends the answered node to `walked_path`, then generates and returns the next node (or a terminal node). Replaces the Phase 1 synthetic stepping in `L1WalkTreeVariant`.
## 5. Safety model (layered)
**Layer 1 — classification gate (build path only).** Runs only after the match pass misses (§3) — a human-authored flow is never blocked by category settings. `classify(problem_text)` maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or `unknown`); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is `unknown`), intake returns `out_of_scope` (offer adhoc/escalate); no build happens.
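A rough sketch of the classify-with-fallback behaviour described in Layer 1. The model call is injected as a callable, and `CATEGORY_ALIASES` is an invented, illustrative alias table (the spec does not list the real aliases).

```python
from typing import Callable

# Illustrative aliases only; the real alias lists are not given in the spec.
CATEGORY_ALIASES: dict[str, tuple[str, ...]] = {
    "printer": ("printer", "print queue", "toner"),
    "vpn_connect": ("vpn",),
    "password_reset": ("password",),
}


def keyword_classify(problem_text: str, enabled: set[str]) -> str:
    """Fallback: first enabled category whose alias appears in the text, else 'unknown'."""
    text = problem_text.lower()
    for category, aliases in CATEGORY_ALIASES.items():
        if category in enabled and any(alias in text for alias in aliases):
            return category
    return "unknown"


def classify(problem_text: str, enabled: set[str], model_call: Callable[[str], str]) -> str:
    """Ask the model for one category key; fall back to keywords if the call fails."""
    try:
        result = model_call(problem_text)
    except Exception:
        return keyword_classify(problem_text, enabled)
    # Anything outside the enabled set is treated as 'unknown', which intake maps to out_of_scope.
    return result if result in enabled else "unknown"
```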
**Layer 2 — constrained generation.** The `ai_tree_builder` system prompt restricts output to:
- Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
- A **hard floor of always-forbidden actions** (see §5.1) that NO category may unlock.
- An explicit instruction to emit an `escalate` node — never guess — once it runs out of in-scope safe steps.
**Layer 3 — per-node validation.** Server-side, every generated node is checked before being returned:
- Reject (and regenerate once, then escalate) nodes whose text matches forbidden-action patterns (§5.1).
- Enforce a **depth cap** (default `L1_BUILD_MAX_DEPTH = 12`): once the walked path hits the cap, force an `escalate` node.
- Validate node JSON shape (Pydantic); malformed → regenerate once, then escalate.
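The three Layer 3 checks could be combined roughly as follows. This is a sketch using plain dictionaries rather than Pydantic so that it stays self-contained; the regex patterns, the `depth_cap` and `validation_failed` reason categories, and the helper names are assumptions made for illustration.

```python
import re
from typing import Callable

L1_BUILD_MAX_DEPTH = 12

# Illustrative patterns only; the real forbidden-action patterns are a product decision (§5.1).
FORBIDDEN_PATTERNS = [
    re.compile(r"\bregedit\b|\bregistry\b", re.IGNORECASE),
    re.compile(r"\bformat\b.*\b(disk|drive)\b", re.IGNORECASE),
    re.compile(r"\bdisable\b.*\b(firewall|antivirus|mfa)\b", re.IGNORECASE),
]

# Required keys per node type, taken from the node shapes in §4.
REQUIRED_KEYS = {
    "question": {"id", "text", "yes_next", "no_next"},
    "instruction": {"id", "text", "next"},
    "resolved": {"id", "text"},
    "escalate": {"id", "text", "reason_category"},
}


def escalate_node(node_id: str, reason: str) -> dict:
    return {"node_type": "escalate", "id": node_id, "reason_category": reason,
            "text": "Escalating to engineering."}


def is_acceptable(node: object) -> bool:
    """Shape check plus forbidden-action text check."""
    if not isinstance(node, dict):
        return False
    node_type = node.get("node_type")
    if not isinstance(node_type, str) or node_type not in REQUIRED_KEYS:
        return False
    if not REQUIRED_KEYS[node_type].issubset(node.keys()):
        return False
    return not any(p.search(str(node["text"])) for p in FORBIDDEN_PATTERNS)


def next_validated_node(walked_path: list, generate: Callable[[], object]) -> dict:
    """Depth cap first, then at most two generation attempts, then a forced escalate."""
    node_id = f"n{len(walked_path) + 1}"
    if len(walked_path) >= L1_BUILD_MAX_DEPTH:
        return escalate_node(node_id, "depth_cap")
    for _ in range(2):  # first attempt plus one regeneration
        node = generate()
        if is_acceptable(node):
            return node
    return escalate_node(node_id, "validation_failed")
```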
**Layer 4 — standing disclaimer.** Persistent banner on every `ai_build` walk:
> *"These are high-confidence troubleshooting steps, but they come from outside your organization's knowledge base — review them before acting. When in doubt, escalate early."*
### 5.1 Hard floor — always forbidden (admins cannot enable)
Regardless of enabled categories, the builder must never produce steps that:
- Modify the Windows registry, system files, or boot configuration.
- Delete, format, or repartition data/disks; remove user profiles or mailboxes.
- Change credentials, MFA, security/firewall/AV settings, or disable protections.
- Run scripts/commands with elevated/admin privileges.
- Touch domain controllers, DNS, DHCP, or production server config.
- Make purchases, license changes, or anything with billing impact.
*(This list is a product decision — review and edit during spec review.)*
### 5.2 Default enabled category allowlist (admin-editable)
Ships enabled by default; Account Owners/Admins toggle per account:
`password_reset`, `account_lockout`, `printer`, `email_outlook_client`, `wifi_network_basics`, `vpn_connect`, `teams_zoom_av`, `browser_cache_cookies`, `peripheral_reconnect`, `os_restart_update`.
*(This list is a product decision — review and edit during spec review.)*
### 5.3 Tunables
| Setting | Default | Notes |
|---|---|---|
| `MATCH_THRESHOLD` | 0.75 | Carried from predecessor spec §8.1. |
| `SUGGEST_THRESHOLD` | 0.60 | Carried from predecessor spec §8.1. |
| `L1_BUILD_MAX_DEPTH` | 12 | Force escalate beyond this many nodes. |
| `get_model_for_action('l1_realtime_build')` | Sonnet | Latency-sensitive; benchmark Sonnet vs Opus during plan. |
| Per-node max_tokens | 1024 | One node is small. |
## 6. Flywheel capture
On `resolve` of an `ai_build` session (`l1_session_service.resolve` extension):
1. **Normalize** the `walked_path` into a complete, valid `tree_structure` (§6.1) — approval requires a dict with a real `id` (see Finding 5 / `_create_tree_from_proposal`).
2. Create a `FlowProposal`: `source='ai_realtime_l1'`, `validated_by_outcome=true`, `proposed_flow_data={tree_structure, match_keywords}`, `l1_session_id=<this session>` (NOT `source_session_id` — see §6.2 / Finding 1), `linked_ticket_id/kind=<session ticket>`, `problem_domain=<category>`, `status='pending'`.
3. Run the existing `_find_similar_pending_proposal` dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
4. Emit the existing `proposal.pending` notification to the review queue.
Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by `flow_matching_engine` on future intakes → the KB compounds. `source='ai_realtime_l1'` rows surface in the existing queue (badge them "AI · outcome-validated").
### 6.1 Tree normalization (Finding 5)
The live `walked_path` holds only traversed nodes, and `"generate"` is a runtime sentinel, not a real edge — that is not a valid tree and would fail the `_create_tree_from_proposal` guard (`tree_structure` must be a dict with an `id`). At resolve time, `ai_tree_builder.normalize_walked_path(walked_path) -> tree_structure` produces a complete object:
- Assign stable string `id`s to every node; the first node becomes the root and `tree_structure.id` = root id.
- `question` nodes: the **traversed** branch (`yes`/`no` the tech actually chose) points to the next traversed node; the **untraversed** branch points to a terminal `{node_type: 'needs_review', text: 'Branch not explored during the originating call'}` stub.
- `instruction` nodes point to the next traversed node.
- The traversal ends at the real terminal node (`resolved` or `escalate`).
This yields a structurally valid, reviewable tree: engineers fill in the `needs_review` branches when promoting. (Trees are `tree_type='troubleshooting'`.)
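One way the normalization could look, as a sketch under stated assumptions: each `walked_path` entry is a node dict, question entries carry the tech's choice in a hypothetical `answer` key, the last entry is the terminal node, and the output is a flat `{"id", "nodes"}` dict. The real `tree_structure` schema is not shown in this spec, so the output layout here is only illustrative.

```python
import copy

STUB_TEXT = "Branch not explored during the originating call"


def normalize_walked_path(walked_path: list[dict]) -> dict:
    """Turn a linear walked path into a tree dict with a root id and no 'generate' sentinels."""
    if not walked_path:
        raise ValueError("walked_path is empty")
    ids = [f"n{i + 1}" for i in range(len(walked_path))]  # stable, unique ids
    nodes: dict[str, dict] = {}
    for i, entry in enumerate(walked_path):
        node = copy.deepcopy(entry)          # never mutate the live session state
        answer = node.pop("answer", None)
        node["id"] = ids[i]
        next_id = ids[i + 1] if i + 1 < len(ids) else None
        if node["node_type"] == "question":
            for branch in ("yes", "no"):
                if branch == answer and next_id:
                    node[f"{branch}_next"] = next_id      # traversed branch
                else:
                    stub_id = f"{ids[i]}_{branch}_stub"   # untraversed branch
                    nodes[stub_id] = {"node_type": "needs_review", "id": stub_id,
                                      "text": STUB_TEXT}
                    node[f"{branch}_next"] = stub_id
        elif node["node_type"] == "instruction":
            node["next"] = next_id
        nodes[ids[i]] = node
    return {"id": ids[0], "nodes": nodes}
```

Reassigning ids from path position keeps them unique even if the model happened to reuse an id across generated nodes.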
### 6.2 FlowProposal L1 source linkage (Finding 1 — Blocker)
`FlowProposal.source_session_id` is currently `nullable=False` FK → `ai_sessions`, and the review UI (`ProposalDetail.tsx`) links the "Source Session" to `/pilot/{source_session_id}` (a FlowPilot chat surface). An L1 `ai_build` session is an `l1_walk_session`, not an `ai_session`, so it cannot populate `source_session_id`. Changes:
- **Model/migration:** add `FlowProposal.l1_session_id` (nullable FK → `l1_walk_sessions.id`, `ondelete=SET NULL`, indexed). Make `source_session_id` **nullable**. Add CHECK `((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))` — exactly one source set.
- **Review UI:** when `l1_session_id` is set (source `ai_realtime_l1`), render the "Source" block as a read-only walked-path summary (problem statement + the resolved path) instead of a `/pilot/...` link. Existing ai_session-sourced proposals are unchanged.
- **Tree promotion:** `_create_tree_from_proposal` sets `Tree.source_session_id` from the proposal — for L1-sourced proposals leave it NULL (confirm `Tree.source_session_id` is nullable; if not, include in the migration).
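The exactly-one-source CHECK is an XOR over two nullability tests. The sketch below demonstrates its behaviour with an in-memory SQLite table, used here only as a self-contained stand-in (an assumption; the `JSONB` column in §8 suggests the real database is PostgreSQL, where `<>` between two booleans behaves the same way). The table is reduced to the two relevant columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flow_proposals (
        id INTEGER PRIMARY KEY,
        source_session_id TEXT,
        l1_session_id TEXT,
        CHECK ((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))
    )
    """
)


def try_insert(source_session_id, l1_session_id) -> bool:
    """Return True if the row satisfies the exactly-one-source CHECK."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO flow_proposals (source_session_id, l1_session_id) VALUES (?, ?)",
                (source_session_id, l1_session_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with only `source_session_id` or only `l1_session_id` are accepted; rows with both or neither are rejected.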
## 7. Minimum escalation handoff
On `escalate` (terminal node reached, or the L1 hits the Escalate modal during an `ai_build` walk) — extends `l1_session_service.escalate`. **The engineer-visible surface is the primary, dependency-free handoff; the bell-badge notification is a thin addition that requires three specific extensions to the FlowPilot-shaped notification system (Finding 3).**
1. **Engineer-visible surface (primary).** Escalated L1 sessions appear in an engineer-facing list — extend the existing `/escalations` queue (`EscalationQueuePage`) with an "L1 escalations" section, backed by a new `GET /l1/escalations`. Each row: problem statement, walked-path summary, who escalated, when, reason category. Pollable; no dependency on the notification subsystem.
2. **Bell-badge notification (Finding 3 — three explicit changes).** The notification system is currently FlowPilot-specific:
- `VALID_EVENTS` (`backend/app/schemas/notification.py`) has no `l1.session.escalated`. **Add it** to the set (and to the default `events_enabled` map).
- `_build_notification_link` (`notification_service.py`) only knows `session.escalated → /pilot/{session_id}?pickup=true`. **Add** `l1.session.escalated → /escalations` and **add** a body template for the new event. The existing `session.escalated` event must NOT be reused — an L1 escalation has no ai_session and no `/pilot` pickup flow.
- Default recipients (`_resolve_recipients`, ~line 184) are owner/admin/team_admin only — ordinary **engineers are excluded**. Since L1 escalations must reach engineers who can pick them up, the call **must pass explicit `target_user_ids`** = the account's active `engineer`-role users (plus owner/admin), not rely on the default set.
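A small sketch of the link and recipient changes listed above. The functions are hypothetical stand-ins written for illustration, not the existing `_build_notification_link` or `_resolve_recipients`; the `User` shape and the `is_active` flag are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical minimal user record."""
    id: str
    account_role: str
    is_active: bool = True


def build_notification_link(event: str, session_id: str) -> str:
    """Map an event to its target link; the L1 event never points at /pilot."""
    if event == "session.escalated":
        return f"/pilot/{session_id}?pickup=true"
    if event == "l1.session.escalated":
        return "/escalations"
    raise ValueError(f"unknown event: {event}")


def l1_escalation_recipients(users: list[User]) -> list[str]:
    """Explicit target_user_ids: active engineers plus owner/admin."""
    wanted = {"engineer", "owner", "admin"}
    return [u.id for u in users if u.is_active and u.account_role in wanted]
```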
**Still deferred** (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.
## 8. Data model & migrations
**Migration 1 — `ai_build` session kind.**
- Extend `l1_walk_sessions` `ck_l1_walk_sessions_session_kind` CHECK to include `'ai_build'`.
- Extend `ck_l1_walk_sessions_target_consistency`: for `ai_build`, both `flow_id` and `flow_proposal_id` are NULL (same as `adhoc`).
**Migration 2 — account L1 category settings.**
- Add `accounts.enabled_l1_categories` `JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb` (list of category keys). RLS already covers `accounts`.
**Migration 3 — FlowProposal L1 source linkage (Finding 1).**
- Add `flow_proposals.l1_session_id` nullable FK → `l1_walk_sessions.id` (`ondelete=SET NULL`, indexed).
- Make `flow_proposals.source_session_id` **nullable** (was `NOT NULL`).
- Add CHECK `((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))` — exactly one source.
- Confirm `trees.source_session_id` is nullable (L1-promoted trees leave it NULL); if not, drop its NOT NULL here.
No new tables — live build state rides on the existing `l1_walk_sessions.walked_path`; persisted trees ride on `FlowProposal.proposed_flow_data`.
## 9. API surface
| Method | Path | Notes | Auth |
|---|---|---|---|
| POST | `/l1/intake` | **Extended**: now runs `match_or_build`; response carries `outcome` (`matched`/`suggest`/`out_of_scope`/`build`). | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/next-node` | **New**: record answer/ack on current node, generate + return next node (or terminal). | `require_l1_or_coverage` |
| GET | `/accounts/me/l1-categories` | **New**: list enabled + available categories + hard-floor (read-only) list. | `require_l1_or_above` (read) |
| PATCH | `/accounts/me/l1-categories` | **New**: set enabled categories. | `require_account_owner_or_admin` (Finding 6) |
| GET | `/l1/escalations` | **New** (or extend `/escalations`): engineer-visible escalated-from-L1 list. | `require_engineer_or_admin` |
**Finding 6 — new auth dep.** The category control is an owner/admin setting, but `require_engineer_or_admin` also admits `engineer`. No existing dep matches "owner or account-admin" (`require_account_owner` is owner-only; `require_admin` is super-admin-only). Add `require_account_owner_or_admin` to `deps.py`: allow `super_admin` bypass, then `account_role in ('owner', 'admin')`, else 403. Use it for the PATCH.
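The rule for the new dependency can be sketched independently of any web framework. `CurrentUser`, its `is_super_admin` flag and the `Forbidden` exception are assumptions introduced for the example; the actual `deps.py` wiring is not shown in this spec.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Stand-in for the framework's 403 response."""
    status_code = 403


@dataclass(frozen=True)
class CurrentUser:
    """Hypothetical minimal view of the authenticated user."""
    account_role: str
    is_super_admin: bool = False


def require_account_owner_or_admin(user: CurrentUser) -> CurrentUser:
    """Allow super_admin bypass, then owner/admin account roles; otherwise 403."""
    if user.is_super_admin:
        return user
    if user.account_role in ("owner", "admin"):
        return user
    raise Forbidden("Account owner or admin required")
```

An `engineer` is rejected here, which is the difference from `require_engineer_or_admin`.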
## 10. Frontend
- `L1WalkTreeVariant` — replace synthetic stepping with real node rendering driven by `/next-node`; render `question` (yes/no), `instruction` (acknowledge), `resolved`/`escalate` (terminal). Per-node loading affordance. Disclaimer banner mounted for `ai_build` sessions.
- `L1Dashboard` intake handler — dispatch on `match_or_build` `outcome` (suggest prompt, out-of-scope prompt, build → walker).
- New admin settings panel (under `/account`) — toggle enabled L1 categories; show hard-floor list as read-only "always excluded."
- Engineer escalations surface — "L1 escalations" section/list.
## 11. Testing strategy
**Backend unit:**
- `ai_tree_builder.generate_next_node` — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
- Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
- `match_or_build` — all four outcomes at threshold boundaries (`score == MATCH_THRESHOLD`, `== SUGGEST_THRESHOLD`); **match runs before the category gate** (a matched published flow is returned even when its category is disabled — Finding 4); `force_build` skips match but still applies the category gate; `out_of_scope` only on the build path when category disabled/unknown.
- `classify` — known categories map correctly; unknown → out_of_scope.
- `normalize_walked_path` (Finding 5) — produces a dict with a root `id`; untraversed `question` branches become `needs_review` stubs; output passes the `_create_tree_from_proposal` validity guard.
- Flywheel capture — resolve creates `ai_realtime_l1` proposal with `l1_session_id` set and `source_session_id` NULL (Finding 1); CHECK accepts exactly-one-source; dedupe merges near-duplicate.
- Escalation handoff — `l1.session.escalated` accepted by the notification schema (Finding 3); link resolves to `/escalations`; explicit engineer `target_user_ids` receive it; escalated session appears in `GET /l1/escalations`.
**Backend integration:**
- Full intake→build→resolve creates an outcome-validated proposal.
- Intake→build→escalate notifies engineers and surfaces in the escalations list.
- Migrations roundtrip; `ai_build` CHECK + target-consistency hold.
**Frontend e2e (extend `l1-workspace.spec.ts`):**
- L1 intake with no match → AI build → answer nodes → resolve → proposal created.
- L1 build → escalate node → escalate handoff.
- Admin toggles a category off → that problem class returns out-of-scope.
**AI quality (plan-time):** small eval set of common L1 problems; assert trees stay in-scope, reach resolution or escalate cleanly, never emit hard-floor actions. Benchmark Sonnet vs Opus for the model-tier decision.
## 12. Risks & open questions
- **Hallucinated-but-plausible steps** for niche/company-specific apps. Mitigation: classification gate + constrained prompt + escalate-early + disclaimer. Residual risk accepted for v1; eval set bounds it.
- **Latency on a live call.** Node-by-node generation means ~2–4s per branch. Mitigation: Sonnet, small per-node token budget, clear loading affordance. Benchmark at plan time.
- **Coherence across independently-generated nodes.** Mitigation: full walked-path context every call.
- **Classification accuracy.** A misclassification could wrongly gate a valid problem out, or let a borderline one through. Mitigation: the hard floor is category-independent; out-of-scope still offers adhoc/escalate (no dead end).
- **Open (product, for spec review):** the default category allowlist (§5.2) and the hard-floor list (§5.1) — confirm/edit. Model tier — confirm Sonnet pending benchmark.
## 13. Out of scope (restated)
KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.
**Also deferred (surfaced in review):**
- **Matching against unpromoted `FlowProposal`s** (Finding 2). `flow_matching_engine` matches published flows only. Extending it to also surface outcome-validated drafts before promotion is a later enhancement; Phase 2A relies on engineer promotion (draft → published flow → matchable).
## 14. Review revisions (2026-05-29 Codex review)
All six findings verified against code and resolved in this spec:
1. **Blocker — FlowProposal source linkage:** §6.2 + §8 Migration 3 (new nullable `l1_session_id`, `source_session_id` made nullable, exactly-one CHECK, review-UI link change).
2. **High — match scope:** §3 (match published flows only; proposal-matching deferred §13).
3. **High — escalation notification:** §7 (engineer surface is primary; three explicit notification-system changes enumerated).
4. **Medium — gate ordering:** §3 + §5 Layer 1 (match first; category gate only on the build path).
5. **Medium — flywheel tree shape:** §6.1 (`normalize_walked_path` produces a valid tree with root `id`; unexplored branches → `needs_review` stubs).
6. **Medium — category write auth:** §9 (new `require_account_owner_or_admin` dep; `require_engineer_or_admin` was too broad).