resolutionflow/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
Michael Chihlas 5b58702b20 docs(spec): L1 AI decision-tree builder — Phase 2A design
Brainstormed design for real-time AI tree building when no KB/flow matches.
Overrides the original "no empty-KB build" rule: build from generic L1
knowledge under a layered safety model (classification gate, constrained
generation, per-node validation with a hard floor, standing disclaimer).
Approach C — dedicated ai_tree_builder + match_or_build orchestrator,
reusing flow_matching_engine and the knowledge_flywheel proposal pipeline.

Scope: streaming node-by-node builder, admin-configurable categories,
flywheel capture of resolved trees, minimum escalation handoff (notify +
engineer surface). KB ingestion/connectors, PSA reassign, escalation
package, and AI chat handoff deferred to later phases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 01:22:37 -04:00


L1 AI Decision-Tree Builder — Phase 2A Design

Status: Draft for review
Date: 2026-05-29
Author: previous session (brainstorming)
Predecessor: 2026-05-28-l1-workspace-design.md (full L1 vision), 2026-05-28-l1-workspace-phase-1-acceptance.md (what shipped in Phase 1)


1. Goal

When an L1 tech describes a problem and there is no matching authored flow or AI draft, the platform builds a yes/no decision tree in real time from the model's general L1 knowledge and walks the tech through it node by node. Scoped to L1-appropriate troubleshooting: simple yes/no questions and reversible step-by-step instructions. Successful trees are captured as outcome-validated drafts for engineer review, compounding the account's knowledge base from real resolutions.

This overrides the original spec's "no empty-KB build" rule (§8.1 of the predecessor), which aborted to a degradation screen when no KB existed. Instead of aborting, we build from generic knowledge under a layered safety model.

KB grounding (RAG over ingested documents) is explicitly deferred to Phase 2B — Phase 2A builds from generic knowledge only, plus matching against already-authored flows.

2. Scope

In scope (Phase 2A):

  • match_or_build orchestrator inserted at L1 intake (match-first, build-on-miss).
  • ai_tree_builder service: node-by-node ("streaming") tree generation, constrained + escalate-early.
  • Admin-configurable L1 category allowlist (Account Owner/Admin control panel).
  • Standing AI-disclaimer banner on AI-built walks.
  • Flywheel capture: resolved AI trees become outcome-validated FlowProposals.
  • Minimum escalation handoff: engineer bell-badge notification + an engineer-visible "escalated from L1" surface.

Deferred:

  • KB document ingestion + connectors (IT Glue, Hudu, SharePoint/OneDrive) — Phase 2B.
  • RAG grounding of the builder on ingested KB — Phase 2B.
  • PSA ticket reassign on escalation, escalation-package generation, AI chat handoff — later phase.
  • BuildAbortedNoKB screen from the original spec — dropped (superseded by build-from-generic).

3. Architecture (Approach C)

Dedicated builder for the constrained node generation; reuse existing rails for matching and capture.

New services:

| File | Responsibility |
| --- | --- |
| backend/app/services/match_or_build.py | Orchestrator. match_or_build(account_id, problem_text, ticket_ref, *, force_build=False) -> MatchOrBuildResult. Classify → category gate → match pass → build/suggest/out-of-scope decision. |
| backend/app/services/ai_tree_builder.py | Node-by-node generation. generate_next_node(problem_text, category, walked_path) -> TreeNode. Reuses get_ai_provider + generate_json + parse_llm_json. Owns the constrained system prompt and per-node validation. |
| backend/app/services/l1_category_service.py | Read/write an account's enabled L1 categories; expose the default allowlist and the always-forbidden hard floor. |

Reused as-is:

  • flow_matching_engine.find_matches() — semantic + keyword + recency match pass.
  • knowledge_flywheel proposal-creation + dedupe (_find_similar_pending_proposal) — outcome-validated capture.
  • notification_service — engineer escalation notification.
  • Phase 1 L1WalkTreeVariant walker — its stubbed synthetic-step UI is replaced by real AI node rendering.

Intake decision flow:

POST /l1/intake (problem_statement, customer_*, force_build?)
  → match_or_build(account_id, problem_text, ticket_ref, force_build):
      1. category = classify(problem_text)                       # new
      2. if category not in account.enabled_l1_categories:
             return {outcome: 'out_of_scope', category}
      3. if not force_build:
             hits = flow_matching_engine.find_matches(problem_text)
             best = max(hits, key=lambda h: h.score, default=None)   # None when nothing matched
             if best is not None and best.score >= MATCH_THRESHOLD:
                 return {outcome: 'matched', target_id, session_kind}   # flow|proposal
             if best is not None and best.score >= SUGGEST_THRESHOLD:
                 return {outcome: 'suggest', near_miss, can_build: true}
      4. return {outcome: 'build', session_kind: 'ai_build', category}
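
The decision flow above can be sketched as a small pure function. This is a minimal illustration, not the planned implementation: the Match dataclass, the field names on MatchOrBuildResult, and the injected classify and find_matches callables are assumptions made so the example runs on its own. The real orchestrator also takes account_id and ticket_ref, which are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75    # defaults from section 5.3
SUGGEST_THRESHOLD = 0.60


@dataclass
class Match:
    """Hypothetical shape of one flow_matching_engine hit."""
    target_id: str
    session_kind: str  # "flow" or "proposal"
    score: float


@dataclass
class MatchOrBuildResult:
    """Hypothetical result shape; only `outcome` is named in the spec."""
    outcome: str  # "matched" | "suggest" | "out_of_scope" | "build"
    category: str
    target_id: Optional[str] = None
    session_kind: Optional[str] = None
    near_miss: Optional[Match] = None
    can_build: bool = False


def match_or_build(
    problem_text: str,
    enabled_categories: set[str],
    classify: Callable[[str], str],
    find_matches: Callable[[str], Sequence[Match]],
    *,
    force_build: bool = False,
) -> MatchOrBuildResult:
    # Steps 1-2: classify, then gate on the account's enabled categories.
    category = classify(problem_text)
    if category not in enabled_categories:
        return MatchOrBuildResult("out_of_scope", category)

    # Step 3: match pass, skipped when the tech chose "Build new".
    if not force_build:
        hits = find_matches(problem_text)
        best = max(hits, key=lambda m: m.score, default=None)
        if best is not None and best.score >= MATCH_THRESHOLD:
            return MatchOrBuildResult(
                "matched", category, best.target_id, best.session_kind
            )
        if best is not None and best.score >= SUGGEST_THRESHOLD:
            return MatchOrBuildResult(
                "suggest", category, near_miss=best, can_build=True
            )

    # Step 4: nothing close enough, so build from generic knowledge.
    return MatchOrBuildResult("build", category, session_kind="ai_build")
```

Both thresholds are inclusive, so a score exactly equal to MATCH_THRESHOLD counts as a match and a score exactly equal to SUGGEST_THRESHOLD produces a suggestion. An empty hit list falls through to build.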

Frontend dispatches on outcome:

  • matched → start a flow/proposal walk (Phase 1 paths).
  • suggest → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with force_build=true.
  • out_of_scope → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
  • build → create an ai_build session, navigate to the walker, fetch the first node.

4. The streaming build & node schema

ai_tree_builder.generate_next_node() is called with the problem statement, the resolved category, and the full walked path so far. It returns exactly one node. Passing the whole path every call is what keeps independently-generated nodes coherent and lets the model decide when it has exhausted safe steps.

Node shape (proposed_flow_data node, also the live walked_path entry):

// question — yes/no branch; both branches regenerate
{ "node_type": "question", "id": "n3", "text": "Is the printer showing a 'ready' status light?",
  "yes_next": "generate", "no_next": "generate" }

// instruction — a single safe, reversible action; advances on acknowledgement
{ "node_type": "instruction", "id": "n4", "text": "Unplug the printer for 30 seconds, then power it back on.",
  "next": "generate" }

// resolved — terminal success
{ "node_type": "resolved", "id": "n7", "text": "Printer is back online and printing test pages." }

// escalate — terminal handoff (escalate-early safety valve)
{ "node_type": "escalate", "id": "n7", "reason_category": "exhausted_safe_steps",
  "text": "This looks like a driver-level fault beyond L1 scope — escalating to engineering." }

"generate" is a sentinel meaning "call generate_next_node again with the new answer appended." The first node is fetched synchronously on ai_build session creation (intake). Each subsequent node is fetched when the tech answers/acknowledges — target latency ~24s per node; show a per-node "Thinking through the next step…" affordance.

Endpoint: POST /l1/sessions/{id}/next-node body {node_id, answer?: 'yes'|'no', acknowledged?: true, note?}. Appends the answered node to walked_path, then generates and returns the next node (or a terminal node). Replaces the Phase 1 synthetic stepping in L1WalkTreeVariant.
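
The core of the next-node handler might look like the sketch below: record the answer or acknowledgement on the current node, append it to walked_path, then ask the builder for the next node. The function name record_and_advance and the way the answer, acknowledgement, and note are stored on the path entry are assumptions for illustration; the spec does not fix the walked_path entry layout beyond the node shape.

```python
from typing import Any, Callable, Optional

Node = dict[str, Any]
# Mirrors generate_next_node(problem_text, category, walked_path).
Generator = Callable[[str, str, list[Node]], Node]


def record_and_advance(
    problem_text: str,
    category: str,
    walked_path: list[Node],
    current_node: Node,
    generate_next_node: Generator,
    *,
    answer: Optional[str] = None,
    acknowledged: bool = False,
    note: Optional[str] = None,
) -> Node:
    """Append the answered node to walked_path and return the next node."""
    node_type = current_node["node_type"]
    if node_type == "question":
        if answer not in ("yes", "no"):
            raise ValueError("question nodes need answer='yes' or 'no'")
        entry = {**current_node, "answer": answer}  # assumed storage key
    elif node_type == "instruction":
        if not acknowledged:
            raise ValueError("instruction nodes must be acknowledged")
        entry = {**current_node, "acknowledged": True}  # assumed storage key
    else:
        # resolved / escalate are terminal, so there is nothing to generate.
        raise ValueError(f"{node_type} nodes are terminal")
    if note:
        entry["note"] = note
    walked_path.append(entry)
    # The "generate" sentinel resolves to exactly this call.
    return generate_next_node(problem_text, category, walked_path)
```

Passing the whole walked_path on every call keeps the handler stateless between requests: the session row carries all the context the builder needs.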

5. Safety model (layered)

Layer 1 — classification gate. classify(problem_text) maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or unknown); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is unknown), intake returns out_of_scope; no build happens.
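
A minimal sketch of the gate's fallback behaviour, assuming the model call is injected as a callable. The alias table, the model_classify signature, and the UNKNOWN constant are hypothetical; only the model-first, keyword-fallback, unknown-on-miss behaviour comes from the text above.

```python
from typing import Callable, Iterable, Mapping, Sequence

UNKNOWN = "unknown"

# Hypothetical alias table; real aliases would live with the category definitions.
CATEGORY_ALIASES: Mapping[str, Sequence[str]] = {
    "printer": ("printer", "print queue", "toner"),
    "vpn_connect": ("vpn",),
    "password_reset": ("password",),
}


def keyword_classify(
    problem_text: str,
    enabled: Iterable[str],
    aliases: Mapping[str, Sequence[str]] = CATEGORY_ALIASES,
) -> str:
    """Fallback: first enabled category with an alias found in the text."""
    text = problem_text.lower()
    for category in sorted(enabled):  # sorted for a deterministic result
        if any(alias in text for alias in aliases.get(category, ())):
            return category
    return UNKNOWN


def classify(
    problem_text: str,
    enabled: Iterable[str],
    model_classify: Callable[[str, set[str]], str],
) -> str:
    """Model call first; keyword matching only when the model call fails."""
    enabled_set = set(enabled)
    try:
        key = model_classify(problem_text, enabled_set)
    except Exception:
        return keyword_classify(problem_text, enabled_set)
    # Anything outside the enabled set is treated as unknown.
    return key if key in enabled_set else UNKNOWN
```

Because the function only ever returns an enabled key or UNKNOWN, the caller's out_of_scope check reduces to a single membership test.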

Layer 2 — constrained generation. The ai_tree_builder system prompt restricts output to:

  • Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
  • A hard floor of always-forbidden actions (see §5.1) that NO category may unlock.
  • An explicit instruction to emit an escalate node — never guess — once it runs out of in-scope safe steps.

Layer 3 — per-node validation. Server-side, every generated node is checked before being returned:

  • Reject (and regenerate once, then escalate) nodes whose text matches forbidden-action patterns (§5.1).
  • Enforce a depth cap (default L1_BUILD_MAX_DEPTH = 12): once the walked path hits the cap, force an escalate node.
  • Validate node JSON shape (Pydantic); malformed → regenerate once, then escalate.
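
The three checks above could be combined roughly as follows. This is a sketch under stated assumptions: it uses plain dictionaries rather than the Pydantic models the spec calls for so that it runs with the standard library alone, the regex patterns are illustrative stand-ins for a real list derived from §5.1, and the reason_category values depth_cap and validation_failed plus the node-id scheme are hypothetical.

```python
import re
from typing import Any, Callable

Node = dict[str, Any]
L1_BUILD_MAX_DEPTH = 12  # default from section 5.3

# Required keys per node_type, taken from the node examples in section 4.
REQUIRED_KEYS = {
    "question": {"node_type", "id", "text", "yes_next", "no_next"},
    "instruction": {"node_type", "id", "text", "next"},
    "resolved": {"node_type", "id", "text"},
    "escalate": {"node_type", "id", "text", "reason_category"},
}

# Illustrative patterns only; a real list would be derived from section 5.1.
FORBIDDEN_PATTERNS = [
    re.compile(r"\b(regedit|registry)\b", re.IGNORECASE),
    re.compile(r"\b(format|repartition)\b.*\b(disk|drive)\b", re.IGNORECASE),
    re.compile(r"\b(sudo|run as administrator)\b", re.IGNORECASE),
    re.compile(r"\bdisable\b.*\b(firewall|antivirus|mfa)\b", re.IGNORECASE),
]


def is_acceptable(node: Any) -> bool:
    """Shape check plus forbidden-action scan for one generated node."""
    if not isinstance(node, dict):
        return False
    node_type = node.get("node_type")
    if not isinstance(node_type, str) or node_type not in REQUIRED_KEYS:
        return False
    required = REQUIRED_KEYS[node_type]
    if set(node) != required:  # strict: no missing and no extra keys
        return False
    if not all(isinstance(node[k], str) and node[k] for k in required):
        return False
    return not any(p.search(node["text"]) for p in FORBIDDEN_PATTERNS)


def escalate_node(node_id: str, reason: str, text: str) -> Node:
    return {"node_type": "escalate", "id": node_id,
            "reason_category": reason, "text": text}


def next_validated_node(walked_path: list[Node],
                        generate: Callable[[], Any]) -> Node:
    node_id = f"n{len(walked_path) + 1}"  # hypothetical id scheme
    # Depth cap: force an escalate node without calling the model at all.
    if len(walked_path) >= L1_BUILD_MAX_DEPTH:
        return escalate_node(node_id, "depth_cap",
                             "Step limit reached; escalating to engineering.")
    # First attempt plus exactly one regeneration, then escalate.
    for _attempt in range(2):
        candidate = generate()
        if is_acceptable(candidate):
            return candidate
    return escalate_node(node_id, "validation_failed",
                         "No safe next step available; escalating to engineering.")
```

Text pattern matching is a backstop rather than a guarantee: it catches obvious phrasing, while the constrained prompt in Layer 2 remains the primary control. A model-generated escalate node whose text happens to mention a forbidden term is also rejected here, which still ends in an escalation.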

Layer 4 — standing disclaimer. Persistent banner on every ai_build walk:

"These are high-confidence troubleshooting steps, but they come from outside your organization's knowledge base — review them before acting. When in doubt, escalate early."

5.1 Hard floor — always forbidden (admins cannot enable)

Regardless of enabled categories, the builder must never produce steps that:

  • Modify the Windows registry, system files, or boot configuration.
  • Delete, format, or repartition data/disks; remove user profiles or mailboxes.
  • Change credentials, MFA, security/firewall/AV settings, or disable protections.
  • Run scripts/commands with elevated/admin privileges.
  • Touch domain controllers, DNS, DHCP, or production server config.
  • Make purchases, license changes, or anything with billing impact.

(This list is a product decision — review and edit during spec review.)

5.2 Default enabled category allowlist (admin-editable)

Ships enabled by default; Account Owners/Admins toggle per account: password_reset, account_lockout, printer, email_outlook_client, wifi_network_basics, vpn_connect, teams_zoom_av, browser_cache_cookies, peripheral_reconnect, os_restart_update.

(This list is a product decision — review and edit during spec review.)
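
As a rough sketch of what l1_category_service might do on a PATCH, the function below validates a requested set against the known catalog and normalizes it before it is stored. The function name is hypothetical, and treating the default allowlist as the full catalog of available categories is an assumption; the spec does not say whether other categories exist.

```python
from typing import Iterable, Sequence

# Default allowlist from section 5.2, assumed here to be the full catalog.
DEFAULT_L1_CATEGORIES: Sequence[str] = (
    "password_reset", "account_lockout", "printer", "email_outlook_client",
    "wifi_network_basics", "vpn_connect", "teams_zoom_av",
    "browser_cache_cookies", "peripheral_reconnect", "os_restart_update",
)


def normalize_enabled_categories(
    requested: Iterable[str],
    available: Sequence[str] = DEFAULT_L1_CATEGORIES,
) -> list[str]:
    """Reject unknown keys; return the rest deduplicated, in catalog order."""
    requested_set = set(requested)
    unknown = sorted(requested_set - set(available))
    if unknown:
        raise ValueError(f"unknown L1 categories: {unknown}")
    return [key for key in available if key in requested_set]
```

Rejecting unknown keys at write time keeps the stored list limited to categories the classifier can actually return.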

5.3 Tunables

| Setting | Default | Notes |
| --- | --- | --- |
| MATCH_THRESHOLD | 0.75 | Carried from predecessor spec §8.1. |
| SUGGEST_THRESHOLD | 0.60 | Carried from predecessor spec §8.1. |
| L1_BUILD_MAX_DEPTH | 12 | Force escalate beyond this many nodes. |
| get_model_for_action('l1_realtime_build') | Sonnet | Latency-sensitive; benchmark Sonnet vs Opus during plan. |
| Per-node max_tokens | 1024 | One node is small. |

6. Flywheel capture

On resolve of an ai_build session (l1_session_service.resolve extension):

  1. Build proposed_flow_data from the walked_path (the nodes that were actually traversed, normalized into a tree structure).
  2. Create a FlowProposal: source='ai_realtime_l1', validated_by_outcome=true, proposed_flow_data=<tree>, linked_ticket_id/kind=<session ticket>, problem_domain=<category>, status='pending'.
  3. Run the existing _find_similar_pending_proposal dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
  4. Emit the existing proposal.pending notification to the review queue.

Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by flow_matching_engine on future intakes → the KB compounds. No new review UI needed; source='ai_realtime_l1' rows surface in the existing queue (optionally badge them "AI · outcome-validated").
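
Step 1 of the capture, turning the walked path into a tree, might look like the sketch below. Several things here are assumptions, since the spec does not define them: the {"root", "nodes"} output layout, the "answer" key on question entries, and the choice to leave untaken branches pointing at the "generate" sentinel. The dedupe and proposal-creation steps are left to the existing knowledge_flywheel code and are not shown.

```python
from typing import Any

Node = dict[str, Any]
# Per-walk bookkeeping keys (assumed names) that should not be persisted.
WALK_ONLY_KEYS = ("answer", "acknowledged", "note")


def build_proposed_flow_data(walked_path: list[Node],
                             terminal: Node) -> dict[str, Any]:
    """Link each traversed node to the node that actually followed it."""
    chain = [*walked_path, terminal]
    nodes: dict[str, Node] = {}
    for entry, following in zip(chain, chain[1:]):
        node = {k: v for k, v in entry.items() if k not in WALK_ONLY_KEYS}
        if node["node_type"] == "question":
            # Only the branch the tech took is known; the other branch
            # keeps its "generate" sentinel.
            taken = "yes_next" if entry["answer"] == "yes" else "no_next"
            node[taken] = following["id"]
        else:
            node["next"] = following["id"]
        nodes[node["id"]] = node
    nodes[terminal["id"]] = dict(terminal)
    return {"root": chain[0]["id"], "nodes": nodes}
```

A single walk only ever yields one root-to-leaf path, so a captured proposal starts out as a chain with open side branches; merging near-duplicates in the existing dedupe step is what could fill those branches in over time.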

7. Minimum escalation handoff

On escalate (an escalate terminal node is reached, or the L1 tech opens the Escalate modal during an ai_build walk), an extension to l1_session_service.escalate does the following:

  1. Notify engineers: notification_service sends a bell-badge event, l1.session.escalated, to the account's engineers (and is_team_admin/owner). Payload: ticket ref, problem summary, escalation reason category, link.
  2. Engineer-visible surface — escalated L1 sessions appear in an engineer-facing list. Reuse/extend the existing /escalations queue (EscalationQueuePage) with an "L1 escalations" section, or a dedicated GET /l1/escalations consumed there. Each row shows problem, the walked path summary, who escalated, when.

Still deferred (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.

8. Data model & migrations

Migration 1 — ai_build session kind.

  • Extend l1_walk_sessions ck_l1_walk_sessions_session_kind CHECK to include 'ai_build'.
  • Extend ck_l1_walk_sessions_target_consistency: for ai_build, both flow_id and flow_proposal_id are NULL (same as adhoc).

Migration 2 — account L1 category settings.

  • Add accounts.enabled_l1_categories JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb (list of category keys). RLS already covers accounts.

No new tables — live build state rides on the existing l1_walk_sessions.walked_path; persisted trees ride on FlowProposal.proposed_flow_data.
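
The intent of Migration 1 can be exercised with a small in-memory check. The sketch below uses SQLite from the Python standard library purely to demonstrate the constraint logic; the production database is assumed to be PostgreSQL, and the real change would ship as a migration. The existing session kinds ('flow', 'proposal', 'adhoc') and the reduced column set are inferred from this document, not copied from the actual schema.

```python
import sqlite3

DDL = """
CREATE TABLE l1_walk_sessions (
    id INTEGER PRIMARY KEY,
    session_kind TEXT NOT NULL,
    flow_id INTEGER,
    flow_proposal_id INTEGER,
    CONSTRAINT ck_l1_walk_sessions_session_kind
        CHECK (session_kind IN ('flow', 'proposal', 'adhoc', 'ai_build')),
    CONSTRAINT ck_l1_walk_sessions_target_consistency CHECK (
        (session_kind = 'flow'
            AND flow_id IS NOT NULL AND flow_proposal_id IS NULL)
        OR (session_kind = 'proposal'
            AND flow_id IS NULL AND flow_proposal_id IS NOT NULL)
        OR (session_kind IN ('adhoc', 'ai_build')
            AND flow_id IS NULL AND flow_proposal_id IS NULL)
    )
)
"""


def try_insert(conn, kind, flow_id=None, proposal_id=None) -> bool:
    """Return True if the row satisfies both CHECK constraints."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO l1_walk_sessions "
                "(session_kind, flow_id, flow_proposal_id) VALUES (?, ?, ?)",
                (kind, flow_id, proposal_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

With this schema, try_insert(conn, "ai_build") succeeds, while try_insert(conn, "ai_build", flow_id=1) is rejected by the target-consistency constraint, which is the behaviour the migration roundtrip test in §11 is meant to confirm.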

9. API surface

| Method | Path | Notes | Auth |
| --- | --- | --- | --- |
| POST | /l1/intake | Extended: now runs match_or_build; response carries outcome (matched/suggest/out_of_scope/build). | require_l1_or_coverage |
| POST | /l1/sessions/{id}/next-node | New: record answer/ack on current node, generate + return next node (or terminal). | require_l1_or_coverage |
| GET | /accounts/me/l1-categories | New: list enabled + available categories + hard-floor (read-only) list. | require_l1_or_above (read) |
| PATCH | /accounts/me/l1-categories | New: set enabled categories. | require_engineer_or_admin (owner/admin) |
| GET | /l1/escalations | New (or extend /escalations): engineer-visible escalated-from-L1 list. | require_engineer_or_admin |

10. Frontend

  • L1WalkTreeVariant — replace synthetic stepping with real node rendering driven by /next-node; render question (yes/no), instruction (acknowledge), resolved/escalate (terminal). Per-node loading affordance. Disclaimer banner mounted for ai_build sessions.
  • L1Dashboard intake handler — dispatch on match_or_build outcome (suggest prompt, out-of-scope prompt, build → walker).
  • New admin settings panel (under /account) — toggle enabled L1 categories; show hard-floor list as read-only "always excluded."
  • Engineer escalations surface — "L1 escalations" section/list.

11. Testing strategy

Backend unit:

  • ai_tree_builder.generate_next_node — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
  • Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
  • match_or_build — all four outcomes at threshold boundaries (score == MATCH_THRESHOLD, == SUGGEST_THRESHOLD), force_build bypasses match, out_of_scope when category disabled.
  • classify — known categories map correctly; unknown → out_of_scope.
  • Flywheel capture — resolve creates ai_realtime_l1 proposal; dedupe merges near-duplicate.
  • Escalation handoff — notification fired; escalated session appears in engineer query.

Backend integration:

  • Full intake→build→resolve creates an outcome-validated proposal.
  • Intake→build→escalate notifies engineers and surfaces in the escalations list.
  • Migrations roundtrip; ai_build CHECK + target-consistency hold.

Frontend e2e (extend l1-workspace.spec.ts):

  • L1 intake with no match → AI build → answer nodes → resolve → proposal created.
  • L1 build → escalate node → escalate handoff.
  • Admin toggles a category off → that problem class returns out-of-scope.

AI quality (plan-time): small eval set of common L1 problems; assert trees stay in-scope, reach resolution or escalate cleanly, never emit hard-floor actions. Benchmark Sonnet vs Opus for the model-tier decision.

12. Risks & open questions

  • Hallucinated-but-plausible steps for niche/company-specific apps. Mitigation: classification gate + constrained prompt + escalate-early + disclaimer. Residual risk accepted for v1; eval set bounds it.
  • Latency on a live call. Node-by-node means ~24s per branch. Mitigation: Sonnet, small per-node token budget, clear loading affordance. Benchmark at plan time.
  • Coherence across independently-generated nodes. Mitigation: full walked-path context every call.
  • Classification accuracy. A misclassify could wrongly gate a valid problem out, or let a borderline one through. Mitigation: hard floor is category-independent; out-of-scope still offers adhoc/escalate (no dead end).
  • Open (product, for spec review): the default category allowlist (§5.2) and the hard-floor list (§5.1) — confirm/edit. Model tier — confirm Sonnet pending benchmark.

13. Out of scope (restated)

KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.