resolutionflow/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
Michael Chihlas 5b58702b20 docs(spec): L1 AI decision-tree builder — Phase 2A design
Brainstormed design for real-time AI tree building when no KB/flow matches.
Overrides the original "no empty-KB build" rule: build from generic L1
knowledge under a layered safety model (classification gate, constrained
generation, per-node validation with a hard floor, standing disclaimer).
Approach C — dedicated ai_tree_builder + match_or_build orchestrator,
reusing flow_matching_engine and the knowledge_flywheel proposal pipeline.

Scope: streaming node-by-node builder, admin-configurable categories,
flywheel capture of resolved trees, minimum escalation handoff (notify +
engineer surface). KB ingestion/connectors, PSA reassign, escalation
package, and AI chat handoff deferred to later phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 01:22:37 -04:00

# L1 AI Decision-Tree Builder — Phase 2A Design
**Status:** Draft for review
**Date:** 2026-05-29
**Author:** previous session (brainstorming)
**Predecessor:** [`2026-05-28-l1-workspace-design.md`](2026-05-28-l1-workspace-design.md) (full L1 vision), [`2026-05-28-l1-workspace-phase-1-acceptance.md`](2026-05-28-l1-workspace-phase-1-acceptance.md) (what shipped in Phase 1)
---
## 1. Goal
When an L1 tech describes a problem and there is **no matching authored flow or AI draft**, the platform builds a yes/no decision tree **in real time from the model's general L1 knowledge** and walks the tech through it node by node. Scoped to L1-appropriate troubleshooting: simple yes/no questions and reversible step-by-step instructions. Successful trees are captured as outcome-validated drafts for engineer review, compounding the account's knowledge base from real resolutions.
This **overrides** the original spec's "no empty-KB build" rule (§8.1 of the predecessor), which aborted to a degradation screen when no KB existed. Instead of aborting, we build from generic knowledge under a layered safety model.
KB grounding (RAG over ingested documents) is **explicitly deferred to Phase 2B** — Phase 2A builds from generic knowledge only, plus matching against already-authored flows.
## 2. Scope
**In scope (Phase 2A):**
- `match_or_build` orchestrator inserted at L1 intake (match-first, build-on-miss).
- `ai_tree_builder` service: node-by-node ("streaming") tree generation, constrained + escalate-early.
- Admin-configurable L1 category allowlist (Account Owner/Admin control panel).
- Standing AI-disclaimer banner on AI-built walks.
- Flywheel capture: resolved AI trees become outcome-validated `FlowProposal`s.
- Minimum escalation handoff: engineer bell-badge notification + an engineer-visible "escalated from L1" surface.
**Deferred:**
- KB document ingestion + connectors (IT Glue, Hudu, SharePoint/OneDrive) — Phase 2B.
- RAG grounding of the builder on ingested KB — Phase 2B.
- PSA ticket reassign on escalation, escalation-package generation, AI chat handoff — later phase.
- `BuildAbortedNoKB` screen from the original spec — **dropped** (superseded by build-from-generic).
## 3. Architecture (Approach C)
Dedicated builder for the constrained node generation; reuse existing rails for matching and capture.
**New services:**
| File | Responsibility |
|---|---|
| `backend/app/services/match_or_build.py` | Orchestrator. `match_or_build(account_id, problem_text, ticket_ref, *, force_build=False) -> MatchOrBuildResult`. Classify → category gate → match pass → build/suggest/out-of-scope decision. |
| `backend/app/services/ai_tree_builder.py` | Node-by-node generation. `generate_next_node(problem_text, category, walked_path) -> TreeNode`. Reuses `get_ai_provider` + `generate_json` + `parse_llm_json`. Owns the constrained system prompt and per-node validation. |
| `backend/app/services/l1_category_service.py` | Read/write an account's enabled L1 categories; expose the default allowlist and the always-forbidden hard floor. |
**Reused as-is:**
- `flow_matching_engine.find_matches()` — semantic + keyword + recency match pass.
- `knowledge_flywheel` proposal-creation + dedupe (`_find_similar_pending_proposal`) — outcome-validated capture.
- `notification_service` — engineer escalation notification.
- Phase 1 `L1WalkTreeVariant` walker — its stubbed synthetic-step UI is replaced by real AI node rendering.
**Intake decision flow:**
```
POST /l1/intake (problem_statement, customer_*, force_build?)
  → match_or_build(account_id, problem_text, ticket_ref, force_build=force_build):
      1. category = classify(problem_text)                               # new
      2. if category not in account.enabled_l1_categories:
             return {outcome: 'out_of_scope', category}
      3. if not force_build:
             hits = flow_matching_engine.find_matches(problem_text)
             best = max(hits, key=lambda h: h.score, default=None)       # None when there are no hits
             if best and best.score >= MATCH_THRESHOLD:
                 return {outcome: 'matched', target_id, session_kind}    # flow|proposal
             if best and best.score >= SUGGEST_THRESHOLD:
                 return {outcome: 'suggest', near_miss, can_build: true}
      4. return {outcome: 'build', session_kind: 'ai_build', category}
```
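The threshold logic in the flow above can be sketched as a pure function, which makes the boundary cases easy to test. This is a minimal illustration, not the planned implementation: `Hit` and `decide` are hypothetical names, classification is assumed to have already run, and `near_miss` is assumed to carry the near-miss target's id.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

MATCH_THRESHOLD = 0.75    # defaults from the tunables table (§5.3)
SUGGEST_THRESHOLD = 0.60


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for one flow_matching_engine.find_matches() result."""
    target_id: str
    session_kind: str  # "flow" or "proposal"
    score: float


def decide(category: str, enabled: Set[str], hits: Sequence[Hit],
           force_build: bool = False) -> dict:
    """Return the intake outcome for an already-classified problem."""
    # Step 2: category gate.
    if category not in enabled:
        return {"outcome": "out_of_scope", "category": category}
    # Step 3: match pass, skipped entirely when the tech chose "Build new".
    if not force_build:
        best: Optional[Hit] = max(hits, key=lambda h: h.score, default=None)
        if best is not None and best.score >= MATCH_THRESHOLD:
            return {"outcome": "matched", "target_id": best.target_id,
                    "session_kind": best.session_kind}
        if best is not None and best.score >= SUGGEST_THRESHOLD:
            return {"outcome": "suggest", "near_miss": best.target_id,
                    "can_build": True}
    # Step 4: nothing close enough, so build from generic knowledge.
    return {"outcome": "build", "session_kind": "ai_build", "category": category}
```

Both thresholds are inclusive here (`>=`), so a score exactly equal to `MATCH_THRESHOLD` matches and one exactly equal to `SUGGEST_THRESHOLD` suggests.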
Frontend dispatches on `outcome`:
- `matched` → start a `flow`/`proposal` walk (Phase 1 paths).
- `suggest` → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with `force_build=true`.
- `out_of_scope` → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
- `build` → create an `ai_build` session, navigate to the walker, fetch the first node.
## 4. The streaming build & node schema
`ai_tree_builder.generate_next_node()` is called with the problem statement, the resolved category, and the **full walked path so far**. It returns exactly one node. Passing the whole path every call is what keeps independently-generated nodes coherent and lets the model decide when it has exhausted safe steps.
**Node shape (`proposed_flow_data` node, also the live `walked_path` entry):**
```json
// question — yes/no branch; both branches regenerate
{ "node_type": "question", "id": "n3", "text": "Is the printer showing a 'ready' status light?",
"yes_next": "generate", "no_next": "generate" }
// instruction — a single safe, reversible action; advances on acknowledgement
{ "node_type": "instruction", "id": "n4", "text": "Unplug the printer for 30 seconds, then power it back on.",
"next": "generate" }
// resolved — terminal success
{ "node_type": "resolved", "id": "n7", "text": "Printer is back online and printing test pages." }
// escalate — terminal handoff (escalate-early safety valve)
{ "node_type": "escalate", "id": "n7", "reason_category": "exhausted_safe_steps",
"text": "This looks like a driver-level fault beyond L1 scope — escalating to engineering." }
```
`"generate"` is a sentinel meaning "call `generate_next_node` again with the new answer appended." The first node is fetched synchronously on `ai_build` session creation (intake). Each subsequent node is fetched when the tech answers or acknowledges the current one. Target latency is ~2-4 s per node; show a per-node "Thinking through the next step…" affordance while waiting.
**Endpoint:** `POST /l1/sessions/{id}/next-node` body `{node_id, answer?: 'yes'|'no', acknowledged?: true, note?}`. Appends the answered node to `walked_path`, then generates and returns the next node (or a terminal node). Replaces the Phase 1 synthetic stepping in `L1WalkTreeVariant`.
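A rough sketch of what the `next-node` handler does with the walked path, under stated assumptions: the `answer` key stored on each path entry and the `generate` callback are hypothetical stand-ins (the callback plays the role of `ai_tree_builder.generate_next_node`), and persistence and auth are left out.

```python
from typing import Callable, List, Optional

TERMINAL_TYPES = {"resolved", "escalate"}


def next_node(walked_path: List[dict], current: dict, answer: Optional[str],
              generate: Callable[[List[dict]], dict]) -> dict:
    """Append the answered node to the path, then ask for exactly one more node."""
    if current["node_type"] in TERMINAL_TYPES:
        raise ValueError("terminal nodes have no successor")
    if current["node_type"] == "question" and answer not in ("yes", "no"):
        raise ValueError("question nodes need a yes/no answer")
    entry = dict(current)  # copy so the caller's node is not mutated
    # Instructions carry no yes/no answer; record the acknowledgement instead.
    entry["answer"] = answer if current["node_type"] == "question" else "acknowledged"
    walked_path.append(entry)
    # The generator always sees the full path, which is what keeps nodes coherent.
    return generate(walked_path)


def scripted_generator(script: List[dict]) -> Callable[[List[dict]], dict]:
    """Test double: returns pre-written nodes in order instead of calling a model."""
    nodes = iter(script)
    return lambda _path: next(nodes)
```

In a test, `scripted_generator` replaces the model so a whole walk can be replayed deterministically.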
## 5. Safety model (layered)
**Layer 1 — classification gate.** `classify(problem_text)` maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or `unknown`); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is `unknown`), intake returns `out_of_scope`; no build happens.
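The model-then-keyword fallback could look roughly like the sketch below. The alias table and the `model_call` signature are assumptions made for illustration; the spec does not define either.

```python
from typing import Callable, Optional, Sequence, Set

UNKNOWN = "unknown"

# Hypothetical alias table; the real category aliases are not defined in this spec.
CATEGORY_ALIASES = {
    "printer": ("printer", "print queue", "toner"),
    "vpn_connect": ("vpn",),
    "password_reset": ("password",),
}


def keyword_classify(problem_text: str, enabled: Set[str]) -> str:
    """Fallback: first enabled category with an alias appearing in the text."""
    text = problem_text.lower()
    for category, aliases in CATEGORY_ALIASES.items():
        if category in enabled and any(alias in text for alias in aliases):
            return category
    return UNKNOWN


def classify(problem_text: str, enabled: Set[str],
             model_call: Optional[Callable[[str, Sequence[str]], str]] = None) -> str:
    """Try the model first; use keywords only if the model call raises or is absent."""
    if model_call is not None:
        try:
            result = model_call(problem_text, sorted(enabled))
            # Anything outside the enabled set is treated as unknown, never trusted.
            return result if result in enabled else UNKNOWN
        except Exception:
            pass  # model failure: fall through to the keyword fallback
    return keyword_classify(problem_text, enabled)
```

Because both paths only ever return an enabled key or `unknown`, the intake gate reduces to a single membership check.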
**Layer 2 — constrained generation.** The `ai_tree_builder` system prompt restricts output to:
- Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
- A **hard floor of always-forbidden actions** (see §5.1) that NO category may unlock.
- An explicit instruction to emit an `escalate` node — never guess — once it runs out of in-scope safe steps.
**Layer 3 — per-node validation.** Server-side, every generated node is checked before being returned:
- Reject (and regenerate once, then escalate) nodes whose text matches forbidden-action patterns (§5.1).
- Enforce a **depth cap** (default `L1_BUILD_MAX_DEPTH = 12`): once the walked path hits the cap, force an `escalate` node.
- Validate node JSON shape (Pydantic); malformed → regenerate once, then escalate.
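The three checks above share one control flow: try, regenerate once, then escalate. The sketch below illustrates it with stdlib code only; it uses a plain required-field check as a stand-in for the Pydantic model, takes the forbidden-pattern check as a callback, and the `reason_category` values `depth_cap` and `validation_failed` are hypothetical.

```python
from typing import Callable, List

L1_BUILD_MAX_DEPTH = 12

# Required keys per node type, taken from the node shapes in §4.
REQUIRED_FIELDS = {
    "question": {"id", "text", "yes_next", "no_next"},
    "instruction": {"id", "text", "next"},
    "resolved": {"id", "text"},
    "escalate": {"id", "text", "reason_category"},
}


def is_well_formed(node: object) -> bool:
    """Stand-in for Pydantic validation: known node_type with all required keys."""
    if not isinstance(node, dict):
        return False
    required = REQUIRED_FIELDS.get(node.get("node_type"))
    return required is not None and required.issubset(node.keys())


def escalate_node(node_id: str, reason: str) -> dict:
    return {"node_type": "escalate", "id": node_id, "reason_category": reason,
            "text": "Escalating to engineering."}


def validated_next_node(walked_path: List[dict],
                        generate: Callable[[List[dict]], dict],
                        is_forbidden: Callable[[str], bool],
                        max_depth: int = L1_BUILD_MAX_DEPTH) -> dict:
    """Return a node that passed every check, or a forced escalate node."""
    node_id = f"n{len(walked_path) + 1}"
    if len(walked_path) >= max_depth:
        return escalate_node(node_id, "depth_cap")
    for _attempt in range(2):  # first try plus exactly one regeneration
        node = generate(walked_path)
        if is_well_formed(node) and not is_forbidden(node["text"]):
            return node
    return escalate_node(node_id, "validation_failed")
```

Checking the depth cap before calling the model avoids paying for a generation whose result would be discarded.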
**Layer 4 — standing disclaimer.** Persistent banner on every `ai_build` walk:
> *"These are high-confidence troubleshooting steps, but they come from outside your organization's knowledge base — review them before acting. When in doubt, escalate early."*
### 5.1 Hard floor — always forbidden (admins cannot enable)
Regardless of enabled categories, the builder must never produce steps that:
- Modify the Windows registry, system files, or boot configuration.
- Delete, format, or repartition data/disks; remove user profiles or mailboxes.
- Change credentials, MFA, security/firewall/AV settings, or disable protections.
- Run scripts/commands with elevated/admin privileges.
- Touch domain controllers, DNS, DHCP, or production server config.
- Make purchases, license changes, or anything with billing impact.
*(This list is a product decision — review and edit during spec review.)*
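One way to back the hard floor with server-side pattern matching is a single compiled regex, sketched below. The patterns are illustrative examples only, roughly one per bullet above; they are not a vetted or complete list, and keyword matching of this kind will produce false positives that need tuning against the eval set.

```python
import re

# Illustrative patterns only; NOT the shipped hard-floor list.
FORBIDDEN_PATTERNS = [
    r"\bregedit\b|\bregistry\b|\bbcdedit\b",
    r"\bformat\b.*\b(disk|drive|volume)\b|\bdiskpart\b",
    r"\b(delete|remove)\b.*\b(profile|mailbox)\b",
    r"\bdisable\b.*\b(firewall|antivirus|defender|mfa)\b",
    r"\brun as administrator\b|\bsudo\b|\belevated\b",
    r"\bdomain controller\b|\bdhcp\b|\bdns (server|record|zone)\b",
    r"\b(purchase|buy)\b.*\blicen[cs]e",
]

# One alternation, compiled once, matched case-insensitively.
_FORBIDDEN_RE = re.compile(
    "|".join(f"(?:{pattern})" for pattern in FORBIDDEN_PATTERNS),
    re.IGNORECASE,
)


def is_forbidden(text: str) -> bool:
    """True if the node text matches any hard-floor pattern."""
    return _FORBIDDEN_RE.search(text) is not None
```

The check deliberately ignores the problem category, which is what makes the floor impossible to unlock from the admin panel.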
### 5.2 Default enabled category allowlist (admin-editable)
Ships enabled by default; Account Owners/Admins toggle per account:
`password_reset`, `account_lockout`, `printer`, `email_outlook_client`, `wifi_network_basics`, `vpn_connect`, `teams_zoom_av`, `browser_cache_cookies`, `peripheral_reconnect`, `os_restart_update`.
*(This list is a product decision — review and edit during spec review.)*
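When an admin saves the allowlist, the service presumably needs to reject keys it does not recognise. A minimal sketch, assuming the available set equals the default list above and using a hypothetical function name:

```python
from typing import Iterable, List, Sequence

DEFAULT_L1_CATEGORIES = (
    "password_reset", "account_lockout", "printer", "email_outlook_client",
    "wifi_network_basics", "vpn_connect", "teams_zoom_av",
    "browser_cache_cookies", "peripheral_reconnect", "os_restart_update",
)


def normalize_enabled_categories(
    requested: Iterable[str],
    available: Sequence[str] = DEFAULT_L1_CATEGORIES,
) -> List[str]:
    """Validate a requested allowlist and return it deduplicated, in canonical order."""
    wanted = set(requested)
    unknown = sorted(wanted - set(available))
    if unknown:
        raise ValueError(f"unknown L1 categories: {unknown}")
    return [category for category in available if category in wanted]
```

Returning the list in canonical order keeps the stored JSONB value stable regardless of the order in which toggles were clicked.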
### 5.3 Tunables
| Setting | Default | Notes |
|---|---|---|
| `MATCH_THRESHOLD` | 0.75 | Carried from predecessor spec §8.1. |
| `SUGGEST_THRESHOLD` | 0.60 | Carried from predecessor spec §8.1. |
| `L1_BUILD_MAX_DEPTH` | 12 | Force an `escalate` node once the walked path reaches this many nodes. |
| `get_model_for_action('l1_realtime_build')` | Sonnet | Latency-sensitive; benchmark Sonnet vs Opus during plan. |
| Per-node max_tokens | 1024 | One node is small. |
## 6. Flywheel capture
On `resolve` of an `ai_build` session (`l1_session_service.resolve` extension):
1. Build `proposed_flow_data` from the `walked_path` (the nodes that were actually traversed, normalized into a tree structure).
2. Create a `FlowProposal`: `source='ai_realtime_l1'`, `validated_by_outcome=true`, `proposed_flow_data=<tree>`, `linked_ticket_id/kind=<session ticket>`, `problem_domain=<category>`, `status='pending'`.
3. Run the existing `_find_similar_pending_proposal` dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
4. Emit the existing `proposal.pending` notification to the review queue.
Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by `flow_matching_engine` on future intakes → the KB compounds. No new review UI needed; `source='ai_realtime_l1'` rows surface in the existing queue (optionally badge them "AI · outcome-validated").
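Step 1 (normalizing the walked path into a tree) is not specified in detail. One possible shape is sketched below, assuming each path entry carries a hypothetical `answer` key, the terminal node is passed separately, and untaken branches are stored as `None` for an engineer to fill in during review. The `root`/`nodes` layout is likewise an assumption.

```python
from typing import List


def path_to_tree(walked_path: List[dict], terminal: dict) -> dict:
    """Link each traversed node to its successor; untaken branches stay None."""
    nodes = [dict(n) for n in walked_path] + [dict(terminal)]  # copies, inputs untouched
    for node, successor in zip(nodes, nodes[1:]):
        answer = node.pop("answer", None)
        if node["node_type"] == "question":
            # Replace the "generate" sentinels with the branch actually taken.
            node["yes_next"] = successor["id"] if answer == "yes" else None
            node["no_next"] = successor["id"] if answer == "no" else None
        else:
            node["next"] = successor["id"]
    return {"root": nodes[0]["id"], "nodes": nodes}
```

A single resolved walk only ever yields one root-to-leaf path, so merged near-duplicate proposals are where a fuller tree would come from.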
## 7. Minimum escalation handoff
On `escalate` (a terminal `escalate` node is reached, or the L1 tech escalates via the Escalate modal during an `ai_build` walk), `l1_session_service.escalate` is extended to do two things:
1. **Notify engineers:** `notification_service` sends the bell-badge event `l1.session.escalated` to the account's engineers (and `is_team_admin`/owner). Payload: ticket ref, problem summary, escalation reason category, link.
2. **Engineer-visible surface:** escalated L1 sessions appear in an engineer-facing list. Either reuse/extend the existing `/escalations` queue (`EscalationQueuePage`) with an "L1 escalations" section, or add a dedicated `GET /l1/escalations` consumed there. Each row shows the problem, the walked-path summary, who escalated, and when.
**Still deferred** (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.
## 8. Data model & migrations
**Migration 1 — `ai_build` session kind.**
- Extend `l1_walk_sessions` `ck_l1_walk_sessions_session_kind` CHECK to include `'ai_build'`.
- Extend `ck_l1_walk_sessions_target_consistency`: for `ai_build`, both `flow_id` and `flow_proposal_id` are NULL (same as `adhoc`).
**Migration 2 — account L1 category settings.**
- Add `accounts.enabled_l1_categories` `JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb` (list of category keys). RLS already covers `accounts`.
No new tables — live build state rides on the existing `l1_walk_sessions.walked_path`; persisted trees ride on `FlowProposal.proposed_flow_data`.
## 9. API surface
| Method | Path | Notes | Auth |
|---|---|---|---|
| POST | `/l1/intake` | **Extended**: now runs `match_or_build`; response carries `outcome` (`matched`/`suggest`/`out_of_scope`/`build`). | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/next-node` | **New**: record answer/ack on current node, generate + return next node (or terminal). | `require_l1_or_coverage` |
| GET | `/accounts/me/l1-categories` | **New**: list enabled + available categories + hard-floor (read-only) list. | `require_l1_or_above` (read) |
| PATCH | `/accounts/me/l1-categories` | **New**: set enabled categories. | `require_engineer_or_admin` (owner/admin) |
| GET | `/l1/escalations` | **New** (or extend `/escalations`): engineer-visible escalated-from-L1 list. | `require_engineer_or_admin` |
## 10. Frontend
- `L1WalkTreeVariant` — replace synthetic stepping with real node rendering driven by `/next-node`; render `question` (yes/no), `instruction` (acknowledge), `resolved`/`escalate` (terminal). Per-node loading affordance. Disclaimer banner mounted for `ai_build` sessions.
- `L1Dashboard` intake handler — dispatch on `match_or_build` `outcome` (suggest prompt, out-of-scope prompt, build → walker).
- New admin settings panel (under `/account`) — toggle enabled L1 categories; show hard-floor list as read-only "always excluded."
- Engineer escalations surface — "L1 escalations" section/list.
## 11. Testing strategy
**Backend unit:**
- `ai_tree_builder.generate_next_node` — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
- Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
- `match_or_build` — all four outcomes at threshold boundaries (`score == MATCH_THRESHOLD`, `== SUGGEST_THRESHOLD`), `force_build` bypasses match, `out_of_scope` when category disabled.
- `classify` — known categories map correctly; unknown → out_of_scope.
- Flywheel capture — resolve creates `ai_realtime_l1` proposal; dedupe merges near-duplicate.
- Escalation handoff — notification fired; escalated session appears in engineer query.
**Backend integration:**
- Full intake→build→resolve creates an outcome-validated proposal.
- Intake→build→escalate notifies engineers and surfaces in the escalations list.
- Migrations roundtrip; `ai_build` CHECK + target-consistency hold.
**Frontend e2e (extend `l1-workspace.spec.ts`):**
- L1 intake with no match → AI build → answer nodes → resolve → proposal created.
- L1 build → escalate node → escalate handoff.
- Admin toggles a category off → that problem class returns out-of-scope.
**AI quality (plan-time):** small eval set of common L1 problems; assert trees stay in-scope, reach resolution or escalate cleanly, never emit hard-floor actions. Benchmark Sonnet vs Opus for the model-tier decision.
## 12. Risks & open questions
- **Hallucinated-but-plausible steps** for niche/company-specific apps. Mitigation: classification gate + constrained prompt + escalate-early + disclaimer. Residual risk accepted for v1; eval set bounds it.
- **Latency on a live call.** Node-by-node generation means ~2-4 s per branch. Mitigation: Sonnet, small per-node token budget, clear loading affordance. Benchmark at plan time.
- **Coherence across independently-generated nodes.** Mitigation: full walked-path context every call.
- **Classification accuracy.** A misclassification could wrongly gate a valid problem out, or let a borderline one through. Mitigation: the hard floor is category-independent, and `out_of_scope` still offers an ad-hoc walk or escalation (no dead end).
- **Open (product, for spec review):** the default category allowlist (§5.2) and the hard-floor list (§5.1) — confirm/edit. Model tier — confirm Sonnet pending benchmark.
## 13. Out of scope (restated)
KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.