docs(spec): L1 AI decision-tree builder — Phase 2A design

Brainstormed design for real-time AI tree building when no KB/flow matches. Overrides the original "no empty-KB build" rule: build from generic L1 knowledge under a layered safety model (classification gate, constrained generation, per-node validation with a hard floor, standing disclaimer).

Approach C: dedicated ai_tree_builder + match_or_build orchestrator, reusing flow_matching_engine and the knowledge_flywheel proposal pipeline.

Scope: streaming node-by-node builder, admin-configurable categories, flywheel capture of resolved trees, minimum escalation handoff (notify + engineer surface). KB ingestion/connectors, PSA reassign, escalation package, and AI chat handoff deferred to later phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 01:22:37 -04:00
parent 57d28ac08e
commit 5b58702b20
1 changed files with 220 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
+++ b/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
@@ -0,0 +1,220 @@
+# L1 AI Decision-Tree Builder — Phase 2A Design
+
+**Status:** Draft for review
+**Date:** 2026-05-29
+**Author:** previous session (brainstorming)
+**Predecessor:** [`2026-05-28-l1-workspace-design.md`](2026-05-28-l1-workspace-design.md) (full L1 vision), [`2026-05-28-l1-workspace-phase-1-acceptance.md`](2026-05-28-l1-workspace-phase-1-acceptance.md) (what shipped in Phase 1)
+
+---
+
+## 1. Goal
+
+When an L1 tech describes a problem and there is **no matching authored flow or AI draft**, the platform builds a yes/no decision tree **in real time from the model's general L1 knowledge** and walks the tech through it node by node. Scoped to L1-appropriate troubleshooting: simple yes/no questions and reversible step-by-step instructions. Successful trees are captured as outcome-validated drafts for engineer review, compounding the account's knowledge base from real resolutions.
+
+This **overrides** the original spec's "no empty-KB build" rule (§8.1 of the predecessor), which aborted to a degradation screen when no KB existed. Instead of aborting, we build from generic knowledge under a layered safety model.
+
+KB grounding (RAG over ingested documents) is **explicitly deferred to Phase 2B** — Phase 2A builds from generic knowledge only, plus matching against already-authored flows.
+
+## 2. Scope
+
+**In scope (Phase 2A):**
+- `match_or_build` orchestrator inserted at L1 intake (match-first, build-on-miss).
+- `ai_tree_builder` service: node-by-node ("streaming") tree generation, constrained + escalate-early.
+- Admin-configurable L1 category allowlist (Account Owner/Admin control panel).
+- Standing AI-disclaimer banner on AI-built walks.
+- Flywheel capture: resolved AI trees become outcome-validated `FlowProposal`s.
+- Minimum escalation handoff: engineer bell-badge notification + an engineer-visible "escalated from L1" surface.
+
+**Deferred:**
+- KB document ingestion + connectors (IT Glue, Hudu, SharePoint/OneDrive) — Phase 2B.
+- RAG grounding of the builder on ingested KB — Phase 2B.
+- PSA ticket reassign on escalation, escalation-package generation, AI chat handoff — later phase.
+- `BuildAbortedNoKB` screen from the original spec — **dropped** (superseded by build-from-generic).
+
+## 3. Architecture (Approach C)
+
+Dedicated builder for the constrained node generation; reuse existing rails for matching and capture.
+
+**New services:**
+| File | Responsibility |
+|---|---|
+| `backend/app/services/match_or_build.py` | Orchestrator. `match_or_build(account_id, problem_text, ticket_ref, *, force_build=False) -> MatchOrBuildResult`. Classify → category gate → match pass → build/suggest/out-of-scope decision. |
+| `backend/app/services/ai_tree_builder.py` | Node-by-node generation. `generate_next_node(problem_text, category, walked_path) -> TreeNode`. Reuses `get_ai_provider` + `generate_json` + `parse_llm_json`. Owns the constrained system prompt and per-node validation. |
+| `backend/app/services/l1_category_service.py` | Read/write an account's enabled L1 categories; expose the default allowlist and the always-forbidden hard floor. |
+
+**Reused as-is:**
+- `flow_matching_engine.find_matches()` — semantic + keyword + recency match pass.
+- `knowledge_flywheel` proposal-creation + dedupe (`_find_similar_pending_proposal`) — outcome-validated capture.
+- `notification_service` — engineer escalation notification.
+- Phase 1 `L1WalkTreeVariant` walker — its stubbed synthetic-step UI is replaced by real AI node rendering.
+
+**Intake decision flow:**
+```
+POST /l1/intake (problem_statement, customer_*, force_build?)
+  → match_or_build(account_id, problem_text, ticket_ref, force_build):
+      1. category = classify(problem_text)                       # new
+      2. if category not in account.enabled_l1_categories:
+             return {outcome: 'out_of_scope', category}
+      3. if not force_build:
+             hits = flow_matching_engine.find_matches(problem_text)
+             best = max(hits, key=score, default=None)   # None when there are no hits
+             if best and best.score >= MATCH_THRESHOLD:
+                 return {outcome: 'matched', target_id, session_kind}   # flow|proposal
+             if best and best.score >= SUGGEST_THRESHOLD:
+                 return {outcome: 'suggest', near_miss, can_build: true}
+      4. return {outcome: 'build', session_kind: 'ai_build', category}
+```
+Frontend dispatches on `outcome`:
+- `matched` → start a `flow`/`proposal` walk (Phase 1 paths).
+- `suggest` → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with `force_build=true`.
+- `out_of_scope` → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
+- `build` → create an `ai_build` session, navigate to the walker, fetch the first node.
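
The decision flow above can be sketched in Python as a pure function over an already-classified problem. This is a minimal sketch, not the planned implementation: the `Match` dataclass, the `decide` name, and the exact keys of the returned dicts are assumptions for illustration. Only the thresholds and the four outcome names come from this spec.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MATCH_THRESHOLD = 0.75    # defaults from section 5.3
SUGGEST_THRESHOLD = 0.60


@dataclass
class Match:
    """Hypothetical shape of one find_matches() hit."""
    target_id: str
    session_kind: str  # 'flow' or 'proposal'
    score: float


def decide(
    category: str,
    enabled_categories: Sequence[str],
    hits: Sequence[Match],
    *,
    force_build: bool = False,
) -> dict:
    """Return the intake outcome for an already-classified problem."""
    # Category gate: 'unknown' is never in the enabled set, so it lands here too.
    if category not in enabled_categories:
        return {"outcome": "out_of_scope", "category": category}

    if not force_build:
        # An empty hit list yields None, which falls through to 'build'.
        best: Optional[Match] = max(hits, key=lambda m: m.score, default=None)
        if best is not None and best.score >= MATCH_THRESHOLD:
            return {
                "outcome": "matched",
                "target_id": best.target_id,
                "session_kind": best.session_kind,
            }
        if best is not None and best.score >= SUGGEST_THRESHOLD:
            return {"outcome": "suggest", "near_miss": best.target_id, "can_build": True}

    return {"outcome": "build", "session_kind": "ai_build", "category": category}
```

Keeping the decision free of I/O makes the threshold boundaries (`score == MATCH_THRESHOLD`, `score == SUGGEST_THRESHOLD`) easy to unit-test without a database or model call.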
+
+## 4. The streaming build & node schema
+
+`ai_tree_builder.generate_next_node()` is called with the problem statement, the resolved category, and the **full walked path so far**. It returns exactly one node. Passing the whole path every call is what keeps independently-generated nodes coherent and lets the model decide when it has exhausted safe steps.
+
+**Node shape (`proposed_flow_data` node, also the live `walked_path` entry):**
+```json
+// question — yes/no branch; both branches regenerate
+{ "node_type": "question", "id": "n3", "text": "Is the printer showing a 'ready' status light?",
+  "yes_next": "generate", "no_next": "generate" }
+
+// instruction — a single safe, reversible action; advances on acknowledgement
+{ "node_type": "instruction", "id": "n4", "text": "Unplug the printer for 30 seconds, then power it back on.",
+  "next": "generate" }
+
+// resolved — terminal success
+{ "node_type": "resolved", "id": "n7", "text": "Printer is back online and printing test pages." }
+
+// escalate — terminal handoff (escalate-early safety valve)
+{ "node_type": "escalate", "id": "n7", "reason_category": "exhausted_safe_steps",
+  "text": "This looks like a driver-level fault beyond L1 scope — escalating to engineering." }
+```
+
+`"generate"` is a sentinel meaning "call `generate_next_node` again with the new answer appended." The first node is fetched synchronously on `ai_build` session creation (intake). Each subsequent node is fetched when the tech answers/acknowledges — target latency ~2–4s per node; show a per-node "Thinking through the next step…" affordance.
+
+**Endpoint:** `POST /l1/sessions/{id}/next-node` body `{node_id, answer?: 'yes'|'no', acknowledged?: true, note?}`. Appends the answered node to `walked_path`, then generates and returns the next node (or a terminal node). Replaces the Phase 1 synthetic stepping in `L1WalkTreeVariant`.
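
A rough sketch of what the `/next-node` handler does with the `"generate"` sentinel: record the tech's answer on the current node, then ask the builder for the next one. The function name `record_and_advance`, the `answer`/`acknowledged` keys stored on path entries, and the single-argument generator callable are assumptions for illustration. The real handler would also pass the problem text and category.

```python
from typing import Callable, Optional

TERMINAL_TYPES = {"resolved", "escalate"}


def record_and_advance(
    walked_path: list[dict],
    current_node: dict,
    generate_next_node: Callable[[list[dict]], dict],
    *,
    answer: Optional[str] = None,
    acknowledged: bool = False,
) -> dict:
    """Append the answered node to walked_path, then return the next node."""
    node_type = current_node["node_type"]
    if node_type in TERMINAL_TYPES:
        raise ValueError("terminal nodes have no next node")

    if node_type == "question":
        if answer not in ("yes", "no"):
            raise ValueError("question nodes need answer 'yes' or 'no'")
        entry = {**current_node, "answer": answer}
    elif node_type == "instruction":
        if not acknowledged:
            raise ValueError("instruction nodes must be acknowledged")
        entry = {**current_node, "acknowledged": True}
    else:
        raise ValueError(f"unknown node_type: {node_type!r}")

    # The generator sees the full path, including the entry just recorded.
    walked_path.append(entry)
    return generate_next_node(walked_path)
```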
+
+## 5. Safety model (layered)
+
+**Layer 1 — classification gate.** `classify(problem_text)` maps the problem to a category via a lightweight model call (low token budget) that returns one category key from the enabled set, or `unknown`. On model failure it falls back to keyword matching against category aliases. If the result is `unknown` or is not in the account's enabled set, intake returns `out_of_scope` and no build happens.
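
The keyword fallback could look roughly like the sketch below. The alias table is entirely hypothetical (this spec does not define category aliases), and counting whole-word alias hits is just one plausible scoring rule.

```python
import re
from typing import Mapping, Sequence

# Hypothetical alias table; the real aliases are not defined in this spec.
CATEGORY_ALIASES: Mapping[str, Sequence[str]] = {
    "printer": ("printer", "print queue", "toner"),
    "vpn_connect": ("vpn",),
    "password_reset": ("password", "forgot my password"),
}


def classify_by_keywords(
    problem_text: str,
    enabled: Sequence[str],
    aliases: Mapping[str, Sequence[str]] = CATEGORY_ALIASES,
) -> str:
    """Return the enabled category with the most alias hits, or 'unknown'."""
    text = problem_text.lower()
    best_key, best_hits = "unknown", 0
    for key in enabled:
        hits = sum(
            1
            for alias in aliases.get(key, ())
            if re.search(rf"\b{re.escape(alias.lower())}\b", text)
        )
        # Strict '>' keeps the earlier category on ties and 'unknown' on zero hits.
        if hits > best_hits:
            best_key, best_hits = key, hits
    return best_key
```

Because only enabled categories are scanned, a disabled category can never be returned, which matches the gate's "enabled set or `unknown`" contract.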
+
+**Layer 2 — constrained generation.** The `ai_tree_builder` system prompt restricts output to:
+- Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
+- A **hard floor of always-forbidden actions** (see §5.1) that NO category may unlock.
+- An explicit instruction to emit an `escalate` node — never guess — once it runs out of in-scope safe steps.
+
+**Layer 3 — per-node validation.** Server-side, every generated node is checked before being returned:
+- Reject (and regenerate once, then escalate) nodes whose text matches forbidden-action patterns (§5.1).
+- Enforce a **depth cap** (default `L1_BUILD_MAX_DEPTH = 12`): once the walked path hits the cap, force an `escalate` node.
+- Validate node JSON shape (Pydantic); malformed → regenerate once, then escalate.
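
The "regenerate once, then escalate" rule and the depth cap can be sketched as a small wrapper around the generator. This is an illustration under assumptions: `next_validated_node`, the `is_valid` callable, the node id scheme, and the `depth_cap` / `validation_failed` reason categories are hypothetical (the spec only names `exhausted_safe_steps`).

```python
from typing import Callable

L1_BUILD_MAX_DEPTH = 12  # default from section 5.3


def escalate_node(node_id: str, reason: str, text: str) -> dict:
    """Build a terminal escalate node in the section 4 shape."""
    return {"node_type": "escalate", "id": node_id, "reason_category": reason, "text": text}


def next_validated_node(
    walked_path: list[dict],
    generate: Callable[[list[dict]], dict],
    is_valid: Callable[[dict], bool],
    *,
    max_depth: int = L1_BUILD_MAX_DEPTH,
    max_attempts: int = 2,  # one initial try plus one regeneration
) -> dict:
    """Return a validated node, or a forced escalate node."""
    node_id = f"n{len(walked_path) + 1}"

    # Depth cap: do not call the model at all once the path is at the cap.
    if len(walked_path) >= max_depth:
        return escalate_node(node_id, "depth_cap", "Reached the maximum number of L1 steps.")

    for _ in range(max_attempts):
        node = generate(walked_path)
        if is_valid(node):
            return node

    # Both attempts failed shape or forbidden-action checks.
    return escalate_node(node_id, "validation_failed", "Could not produce a safe next step.")
```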
+
+**Layer 4 — standing disclaimer.** Persistent banner on every `ai_build` walk:
+
+> *"These are high-confidence troubleshooting steps, but they come from outside your organization's knowledge base — review them before acting. When in doubt, escalate early."*
+
+### 5.1 Hard floor — always forbidden (admins cannot enable)
+Regardless of enabled categories, the builder must never produce steps that:
+- Modify the Windows registry, system files, or boot configuration.
+- Delete, format, or repartition data/disks; remove user profiles or mailboxes.
+- Change credentials, MFA, security/firewall/AV settings, or disable protections.
+- Run scripts/commands with elevated/admin privileges.
+- Touch domain controllers, DNS, DHCP, or production server config.
+- Make purchases, license changes, or anything with billing impact.
+
+*(This list is a product decision — review and edit during spec review.)*
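
One way the Layer 3 forbidden-action check might be expressed is a list of case-insensitive regular expressions, one per hard-floor bullet. The patterns below are illustrative guesses, not a vetted list: keyword matching is coarse (for example, `format` would also flag a harmless "paper format" step), so a real list would need tuning against the eval set.

```python
import re

# Illustrative patterns only; each loosely mirrors one hard-floor bullet above.
FORBIDDEN_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"\bregedit\b|\bregistry\b",
        r"\b(format|repartition|diskpart)\b",
        r"\bdelete\b.*\b(profile|mailbox)\b",
        r"\b(disable|turn off)\b.*\b(firewall|antivirus|defender|mfa)\b",
        r"\brun as administrator\b|\bsudo\b",
        r"\b(domain controller|dns|dhcp)\b",
    )
]


def violates_hard_floor(text: str) -> bool:
    """True if the node text matches any forbidden-action pattern."""
    return any(pattern.search(text) for pattern in FORBIDDEN_PATTERNS)
```

The check takes no category argument, which reflects the rule that no enabled category can unlock a hard-floor action.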
+
+### 5.2 Default enabled category allowlist (admin-editable)
+Ships enabled by default; Account Owners/Admins toggle per account:
+`password_reset`, `account_lockout`, `printer`, `email_outlook_client`, `wifi_network_basics`, `vpn_connect`, `teams_zoom_av`, `browser_cache_cookies`, `peripheral_reconnect`, `os_restart_update`.
+
+*(This list is a product decision — review and edit during spec review.)*
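
As a sketch of what `l1_category_service` might do when an admin saves the allowlist, the helper below rejects unknown keys and stores the rest in a stable order. The function name and the choice to raise `ValueError` are assumptions. The category keys are the defaults listed above.

```python
from typing import Iterable, Sequence

DEFAULT_L1_CATEGORIES: Sequence[str] = (
    "password_reset", "account_lockout", "printer", "email_outlook_client",
    "wifi_network_basics", "vpn_connect", "teams_zoom_av",
    "browser_cache_cookies", "peripheral_reconnect", "os_restart_update",
)


def normalize_enabled_categories(
    requested: Iterable[str],
    available: Sequence[str] = DEFAULT_L1_CATEGORIES,
) -> list[str]:
    """Validate a requested allowlist and return it deduplicated, in catalog order."""
    wanted = set(requested)
    unknown = sorted(wanted - set(available))
    if unknown:
        raise ValueError(f"unknown L1 categories: {unknown}")
    return [key for key in available if key in wanted]
```

Returning catalog order means the stored JSONB list is stable regardless of the order in which the admin toggled categories.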
+
+### 5.3 Tunables
+| Setting | Default | Notes |
+|---|---|---|
+| `MATCH_THRESHOLD` | 0.75 | Carried from predecessor spec §8.1. |
+| `SUGGEST_THRESHOLD` | 0.60 | Carried from predecessor spec §8.1. |
+| `L1_BUILD_MAX_DEPTH` | 12 | Force an `escalate` node once the walked path reaches this many nodes. |
+| `get_model_for_action('l1_realtime_build')` | Sonnet | Latency-sensitive; benchmark Sonnet vs Opus during plan. |
+| Per-node max_tokens | 1024 | One node is small. |
+
+## 6. Flywheel capture
+
+On `resolve` of an `ai_build` session (`l1_session_service.resolve` extension):
+1. Build `proposed_flow_data` from the `walked_path` (the nodes that were actually traversed, normalized into a tree structure).
+2. Create a `FlowProposal`: `source='ai_realtime_l1'`, `validated_by_outcome=true`, `proposed_flow_data=<tree>`, `linked_ticket_id/kind=<session ticket>`, `problem_domain=<category>`, `status='pending'`.
+3. Run the existing `_find_similar_pending_proposal` dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
+4. Emit the existing `proposal.pending` notification to the review queue.
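
Step 1 above could be sketched as follows. The actual `proposed_flow_data` schema is not given in this spec, so the `root`/`nodes` layout is hypothetical, as is the assumption that each `walked_path` entry carries the tech's `answer` and that the terminal node is included in the path. Branches the tech never took are left as `None` rather than invented.

```python
def build_proposed_flow_data(walked_path: list[dict]) -> dict:
    """Normalize a linear walked path into an id-keyed tree (hypothetical schema)."""
    nodes: dict[str, dict] = {}
    for index, entry in enumerate(walked_path):
        has_next = index + 1 < len(walked_path)
        next_id = walked_path[index + 1]["id"] if has_next else None

        node = {"node_type": entry["node_type"], "id": entry["id"], "text": entry["text"]}
        if entry["node_type"] == "question":
            taken = entry["answer"]  # 'yes' or 'no', recorded during the walk
            # Only the branch that was actually walked gets a target.
            node["yes_next"] = next_id if taken == "yes" else None
            node["no_next"] = next_id if taken == "no" else None
        elif entry["node_type"] == "instruction":
            node["next"] = next_id
        nodes[entry["id"]] = node

    root = walked_path[0]["id"] if walked_path else None
    return {"root": root, "nodes": nodes}
```

Leaving untraversed branches empty keeps the proposal honest about what was outcome-validated: only the walked branch led to a resolution.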
+
+Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by `flow_matching_engine` on future intakes → the KB compounds. No new review UI needed; `source='ai_realtime_l1'` rows surface in the existing queue (optionally badge them "AI · outcome-validated").
+
+## 7. Minimum escalation handoff
+
+On `escalate` (a terminal `escalate` node is reached, or the L1 tech uses the Escalate modal during an `ai_build` walk), `l1_session_service.escalate` is extended to:
+1. **Notify engineers** — `notification_service` bell-badge event `l1.session.escalated` to the account's engineers (and `is_team_admin`/owner). Payload: ticket ref, problem summary, escalation reason category, link.
+2. **Engineer-visible surface** — escalated L1 sessions appear in an engineer-facing list. Reuse/extend the existing `/escalations` queue (`EscalationQueuePage`) with an "L1 escalations" section, or a dedicated `GET /l1/escalations` consumed there. Each row shows problem, the walked path summary, who escalated, when.
+
+**Still deferred** (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.
+
+## 8. Data model & migrations
+
+**Migration 1 — `ai_build` session kind.**
+- Extend `l1_walk_sessions` `ck_l1_walk_sessions_session_kind` CHECK to include `'ai_build'`.
+- Extend `ck_l1_walk_sessions_target_consistency`: for `ai_build`, both `flow_id` and `flow_proposal_id` are NULL (same as `adhoc`).
+
+**Migration 2 — account L1 category settings.**
+- Add `accounts.enabled_l1_categories` `JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb` (list of category keys). RLS already covers `accounts`.
+
+No new tables — live build state rides on the existing `l1_walk_sessions.walked_path`; persisted trees ride on `FlowProposal.proposed_flow_data`.
+
+## 9. API surface
+
+| Method | Path | Notes | Auth |
+|---|---|---|---|
+| POST | `/l1/intake` | **Extended**: now runs `match_or_build`; response carries `outcome` (`matched`/`suggest`/`out_of_scope`/`build`). | `require_l1_or_coverage` |
+| POST | `/l1/sessions/{id}/next-node` | **New**: record answer/ack on current node, generate + return next node (or terminal). | `require_l1_or_coverage` |
+| GET | `/accounts/me/l1-categories` | **New**: list enabled + available categories + hard-floor (read-only) list. | `require_l1_or_above` (read) |
+| PATCH | `/accounts/me/l1-categories` | **New**: set enabled categories. | `require_engineer_or_admin` (owner/admin) |
+| GET | `/l1/escalations` | **New** (or extend `/escalations`): engineer-visible escalated-from-L1 list. | `require_engineer_or_admin` |
+
+## 10. Frontend
+
+- `L1WalkTreeVariant` — replace synthetic stepping with real node rendering driven by `/next-node`; render `question` (yes/no), `instruction` (acknowledge), `resolved`/`escalate` (terminal). Per-node loading affordance. Disclaimer banner mounted for `ai_build` sessions.
+- `L1Dashboard` intake handler — dispatch on `match_or_build` `outcome` (suggest prompt, out-of-scope prompt, build → walker).
+- New admin settings panel (under `/account`) — toggle enabled L1 categories; show hard-floor list as read-only "always excluded."
+- Engineer escalations surface — "L1 escalations" section/list.
+
+## 11. Testing strategy
+
+**Backend unit:**
+- `ai_tree_builder.generate_next_node` — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
+- Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
+- `match_or_build` — all four outcomes at threshold boundaries (`score == MATCH_THRESHOLD`, `== SUGGEST_THRESHOLD`), `force_build` bypasses match, `out_of_scope` when category disabled.
+- `classify` — known categories map correctly; unknown → out_of_scope.
+- Flywheel capture — resolve creates `ai_realtime_l1` proposal; dedupe merges near-duplicate.
+- Escalation handoff — notification fired; escalated session appears in engineer query.
+
+**Backend integration:**
+- Full intake→build→resolve creates an outcome-validated proposal.
+- Intake→build→escalate notifies engineers and surfaces in the escalations list.
+- Migrations roundtrip; `ai_build` CHECK + target-consistency hold.
+
+**Frontend e2e (extend `l1-workspace.spec.ts`):**
+- L1 intake with no match → AI build → answer nodes → resolve → proposal created.
+- L1 build → escalate node → escalate handoff.
+- Admin toggles a category off → that problem class returns out-of-scope.
+
+**AI quality (plan-time):** small eval set of common L1 problems; assert trees stay in-scope, reach resolution or escalate cleanly, never emit hard-floor actions. Benchmark Sonnet vs Opus for the model-tier decision.
+
+## 12. Risks & open questions
+
+- **Hallucinated-but-plausible steps** for niche/company-specific apps. Mitigation: classification gate + constrained prompt + escalate-early + disclaimer. Residual risk accepted for v1; eval set bounds it.
+- **Latency on a live call.** Node-by-node generation means ~2–4s per node. Mitigation: Sonnet, small per-node token budget, clear loading affordance. Benchmark at plan time.
+- **Coherence across independently-generated nodes.** Mitigation: full walked-path context every call.
+- **Classification accuracy.** A misclassification could wrongly gate a valid problem out or let a borderline one through. Mitigation: the hard floor is category-independent; out-of-scope still offers adhoc/escalate (no dead end).
+- **Open (product, for spec review):** the default category allowlist (§5.2) and the hard-floor list (§5.1) — confirm/edit. Model tier — confirm Sonnet pending benchmark.
+
+## 13. Out of scope (restated)
+KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.