resolutionflow/docs/superpowers/specs/2026-05-29-l1-ai-tree-builder-phase-2a-design.md
Michael Chihlas f62712d11c docs(spec): resolve 6 Codex review findings on L1 AI tree builder spec
- Blocker: FlowProposal can't link an l1_walk_session (source_session_id is
  NOT NULL FK→ai_sessions, UI links /pilot). Add nullable l1_session_id +
  exactly-one CHECK + read-only walked-path link for L1-sourced proposals.
- High: flow_matching_engine matches published flows only; scope match pass
  to flows, defer proposal-matching.
- High: notification system is FlowPilot-shaped; enumerate the 3 changes for
  l1.session.escalated (VALID_EVENTS, link+body builder, explicit engineer
  recipients). Engineer-visible surface is the primary handoff.
- Medium: match before category gate so authored flows aren't blocked.
- Medium: define normalize_walked_path → valid tree with root id, unexplored
  branches as needs_review stubs.
- Medium: category write auth needs owner/admin, not engineer; add
  require_account_owner_or_admin dep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 03:04:49 -04:00


L1 AI Decision-Tree Builder — Phase 2A Design

Status: Draft for review
Date: 2026-05-29
Author: previous session (brainstorming)
Predecessor: 2026-05-28-l1-workspace-design.md (full L1 vision), 2026-05-28-l1-workspace-phase-1-acceptance.md (what shipped in Phase 1)


1. Goal

When an L1 tech describes a problem and there is no matching authored flow or AI draft, the platform builds a yes/no decision tree in real time from the model's general L1 knowledge and walks the tech through it node by node. Scoped to L1-appropriate troubleshooting: simple yes/no questions and reversible step-by-step instructions. Successful trees are captured as outcome-validated drafts for engineer review, compounding the account's knowledge base from real resolutions.

This overrides the original spec's "no empty-KB build" rule (§8.1 of the predecessor), which aborted to a degradation screen when no KB existed. Instead of aborting, we build from generic knowledge under a layered safety model.

KB grounding (RAG over ingested documents) is explicitly deferred to Phase 2B — Phase 2A builds from generic knowledge only, plus matching against already-authored flows.

2. Scope

In scope (Phase 2A):

  • match_or_build orchestrator inserted at L1 intake (match-first, build-on-miss).
  • ai_tree_builder service: node-by-node ("streaming") tree generation, constrained + escalate-early.
  • Admin-configurable L1 category allowlist (Account Owner/Admin control panel).
  • Standing AI-disclaimer banner on AI-built walks.
  • Flywheel capture: resolved AI trees become outcome-validated FlowProposals.
  • Minimum escalation handoff: engineer bell-badge notification + an engineer-visible "escalated from L1" surface.

Deferred:

  • KB document ingestion + connectors (IT Glue, Hudu, SharePoint/OneDrive) — Phase 2B.
  • RAG grounding of the builder on ingested KB — Phase 2B.
  • PSA ticket reassign on escalation, escalation-package generation, AI chat handoff — later phase.
  • BuildAbortedNoKB screen from the original spec — dropped (superseded by build-from-generic).

3. Architecture (Approach C)

Dedicated builder for the constrained node generation; reuse existing rails for matching and capture.

New services:

| File | Responsibility |
| --- | --- |
| backend/app/services/match_or_build.py | Orchestrator. match_or_build(account_id, problem_text, ticket_ref, *, force_build=False) -> MatchOrBuildResult. Classify → category gate → match pass → build/suggest/out-of-scope decision. |
| backend/app/services/ai_tree_builder.py | Node-by-node generation. generate_next_node(problem_text, category, walked_path) -> TreeNode. Reuses get_ai_provider + generate_json + parse_llm_json. Owns the constrained system prompt and per-node validation. |
| backend/app/services/l1_category_service.py | Read/write an account's enabled L1 categories; expose the default allowlist and the always-forbidden hard floor. |

Reused as-is:

  • flow_matching_engine.find_matches() — semantic + keyword + recency match pass.
  • knowledge_flywheel proposal-creation + dedupe (_find_similar_pending_proposal) — outcome-validated capture.
  • notification_service — engineer escalation notification.
  • Phase 1 L1WalkTreeVariant walker — its stubbed synthetic-step UI is replaced by real AI node rendering.

Intake decision flow:

Order matters: match first, gate only the build path. The category allowlist exists to bound generic AI building for safety — it must not block a human-authored flow that already exists for that problem. So matching against published flows runs before any category check; the category gate applies only when we fall through to building.

POST /l1/intake (problem_statement, customer_*, force_build?)
  → match_or_build(account_id, problem_text, problem_domain, ticket_ref, force_build):
      1. if not force_build:
             hits = flow_matching_engine.find_matches(problem_text, problem_domain, account_id)
             best = max(hits, key=lambda h: h.score, default=None)   # highest-scoring published flow (Trees only)
             if best and best.score >= MATCH_THRESHOLD:
                 return {outcome: 'matched', flow_id, session_kind: 'flow'}
             if best and best.score >= SUGGEST_THRESHOLD:
                 return {outcome: 'suggest', near_miss, can_build: true}
      2. category = classify(problem_text)                        # new — only on build path
      3. if category not in account.enabled_l1_categories:
             return {outcome: 'out_of_scope', category}
      4. return {outcome: 'build', session_kind: 'ai_build', category}

Match scope (Finding 2): flow_matching_engine.find_matches() matches published flows (trees) only — it returns {tree_id, tree_name, score, ...} and has no notion of FlowProposals. Phase 2A therefore matches against published flows only; the matched outcome is always session_kind: 'flow'. This is sufficient because the flywheel promotes good AI drafts to published flows (§6), which then become matchable on future intakes. Matching against not-yet-promoted proposals is a deferred enhancement (would require extending the engine), noted in §13.
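The intake decision flow above can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the service's actual code: the matcher and classifier are injected as callables instead of calling flow_matching_engine directly, the account lookup is reduced to an enabled_categories argument, and the MatchHit and MatchOrBuildResult field names are hypothetical stand-ins. Thresholds use the defaults from §5.3.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75    # default from section 5.3
SUGGEST_THRESHOLD = 0.60  # default from section 5.3


@dataclass(frozen=True)
class MatchHit:
    """Hypothetical stand-in for one find_matches() result."""
    tree_id: str
    tree_name: str
    score: float


@dataclass(frozen=True)
class MatchOrBuildResult:
    """Hypothetical result shape; outcome is one of the four spec outcomes."""
    outcome: str  # 'matched' | 'suggest' | 'out_of_scope' | 'build'
    session_kind: Optional[str] = None
    flow_id: Optional[str] = None
    near_miss: Optional[MatchHit] = None
    category: Optional[str] = None
    can_build: bool = False


def match_or_build(
    problem_text: str,
    enabled_categories: Sequence[str],
    find_matches: Callable[[str], Sequence[MatchHit]],
    classify: Callable[[str], str],
    *,
    force_build: bool = False,
) -> MatchOrBuildResult:
    # Step 1: match pass against published flows, skipped on "Build new".
    if not force_build:
        hits = find_matches(problem_text)
        best = max(hits, key=lambda h: h.score, default=None)
        if best is not None and best.score >= MATCH_THRESHOLD:
            return MatchOrBuildResult("matched", session_kind="flow", flow_id=best.tree_id)
        if best is not None and best.score >= SUGGEST_THRESHOLD:
            return MatchOrBuildResult("suggest", near_miss=best, can_build=True)
    # Steps 2-3: the category gate applies only on the build path.
    category = classify(problem_text)
    if category not in enabled_categories:
        return MatchOrBuildResult("out_of_scope", category=category)
    # Step 4: fall through to a real-time AI build.
    return MatchOrBuildResult("build", session_kind="ai_build", category=category)
```

Because the match pass returns before the classifier is ever called, a published flow that scores at or above MATCH_THRESHOLD is returned even when the account has every category disabled, which is the ordering property Finding 4 asks for.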

Frontend dispatches on outcome:

  • matched → start a flow walk (Phase 1 path).
  • suggest → inline prompt ("Found a similar flow — use it, or build new?"); "Build new" re-calls intake with force_build=true (which skips the match pass and runs the category gate before building).
  • out_of_scope → inline prompt offering ad-hoc walk or escalate-without-walk (Phase 1 paths).
  • build → create an ai_build session, navigate to the walker, fetch the first node.

4. The streaming build & node schema

ai_tree_builder.generate_next_node() is called with the problem statement, the resolved category, and the full walked path so far. It returns exactly one node. Passing the whole path every call is what keeps independently-generated nodes coherent and lets the model decide when it has exhausted safe steps.

Node shape (proposed_flow_data node, also the live walked_path entry):

// question — yes/no branch; both branches regenerate
{ "node_type": "question", "id": "n3", "text": "Is the printer showing a 'ready' status light?",
  "yes_next": "generate", "no_next": "generate" }

// instruction — a single safe, reversible action; advances on acknowledgement
{ "node_type": "instruction", "id": "n4", "text": "Unplug the printer for 30 seconds, then power it back on.",
  "next": "generate" }

// resolved — terminal success
{ "node_type": "resolved", "id": "n7", "text": "Printer is back online and printing test pages." }

// escalate — terminal handoff (escalate-early safety valve)
{ "node_type": "escalate", "id": "n7", "reason_category": "exhausted_safe_steps",
  "text": "This looks like a driver-level fault beyond L1 scope — escalating to engineering." }

"generate" is a sentinel meaning "call generate_next_node again with the new answer appended." The first node is fetched synchronously on ai_build session creation (intake). Each subsequent node is fetched when the tech answers or acknowledges. Target latency is ~2-4s per node; show a per-node "Thinking through the next step…" affordance while the next node is being generated.

Endpoint: POST /l1/sessions/{id}/next-node body {node_id, answer?: 'yes'|'no', acknowledged?: true, note?}. Appends the answered node to walked_path, then generates and returns the next node (or a terminal node). Replaces the Phase 1 synthetic stepping in L1WalkTreeVariant.
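The record-then-generate step behind the next-node endpoint might look like the following. This is a sketch only: the generator is injected as a callable that receives the walked path, and storing the tech's response under "answer" / "acknowledged" keys on the walked_path entry is an assumption, since the document does not specify how answers are persisted.

```python
from typing import Any, Callable, Dict, List, Optional

Node = Dict[str, Any]
TERMINAL_TYPES = {"resolved", "escalate"}


def record_and_advance(
    walked_path: List[Node],
    current: Node,
    generate_next_node: Callable[[List[Node]], Node],
    *,
    answer: Optional[str] = None,
    acknowledged: bool = False,
) -> Node:
    """Append the answered node to walked_path, then fetch the next node."""
    node_type = current["node_type"]
    if node_type in TERMINAL_TYPES:
        raise ValueError("terminal nodes have no next node")
    if node_type == "question":
        if answer not in ("yes", "no"):
            raise ValueError("question nodes need answer 'yes' or 'no'")
        entry = {**current, "answer": answer}  # assumed storage key
    elif node_type == "instruction":
        if not acknowledged:
            raise ValueError("instruction nodes must be acknowledged")
        entry = {**current, "acknowledged": True}  # assumed storage key
    else:
        raise ValueError(f"unknown node_type: {node_type!r}")
    walked_path.append(entry)
    # Every outgoing edge holds the "generate" sentinel, so the next node
    # always comes from the generator, which sees the full path so far.
    return generate_next_node(walked_path)
```

Validating the response against the node type before appending keeps a malformed request from corrupting walked_path, which matters because the same list later feeds tree normalization.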

5. Safety model (layered)

Layer 1 — classification gate (build path only). Runs only after the match pass misses (§3) — a human-authored flow is never blocked by category settings. classify(problem_text) maps the problem to a category via a lightweight model call (low token budget, returns one category key from the enabled set or unknown); on model failure it falls back to keyword matching against category aliases. If the result is not in the account's enabled set (or is unknown), intake returns out_of_scope (offer adhoc/escalate); no build happens.
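A minimal sketch of the classifier with its keyword fallback, assuming the model call is injected as a callable and that category aliases live in a simple lookup table. The alias table below is hypothetical (the real aliases are not specified in this document), and catching any exception as "model failure" is a simplification.

```python
from typing import Callable, Dict, Sequence

# Hypothetical alias table, for illustration only.
CATEGORY_ALIASES: Dict[str, Sequence[str]] = {
    "printer": ("printer", "print queue", "toner"),
    "vpn_connect": ("vpn",),
    "password_reset": ("password",),
}


def keyword_classify(problem_text: str, enabled: Sequence[str]) -> str:
    """Fallback: first enabled category with an alias found in the text."""
    text = problem_text.lower()
    for key in enabled:
        if any(alias in text for alias in CATEGORY_ALIASES.get(key, ())):
            return key
    return "unknown"


def classify(
    problem_text: str,
    enabled: Sequence[str],
    model_classify: Callable[[str, Sequence[str]], str],
) -> str:
    """Return one enabled category key, or 'unknown'."""
    try:
        key = model_classify(problem_text, enabled)
    except Exception:  # model failure: fall back to keyword matching
        key = keyword_classify(problem_text, enabled)
    # Anything outside the enabled set collapses to 'unknown', which the
    # orchestrator turns into an out_of_scope outcome.
    return key if key in enabled else "unknown"
```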

Layer 2 — constrained generation. The ai_tree_builder system prompt restricts output to:

  • Safe, reversible, observe-or-restart-class steps only (toggle/restart/reconnect/re-enter, check-status questions).
  • A hard floor of always-forbidden actions (see §5.1) that NO category may unlock.
  • An explicit instruction to emit an escalate node — never guess — once it runs out of in-scope safe steps.

Layer 3 — per-node validation. Server-side, every generated node is checked before being returned:

  • Reject (and regenerate once, then escalate) nodes whose text matches forbidden-action patterns (§5.1).
  • Enforce a depth cap (default L1_BUILD_MAX_DEPTH = 12): once the walked path hits the cap, force an escalate node.
  • Validate node JSON shape (Pydantic); malformed → regenerate once, then escalate.
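The three Layer 3 checks can be combined into one wrapper around the generator. The sketch below uses plain dict checks from the standard library as a stand-in for the Pydantic models the spec calls for, and the regex list is illustrative only, not the real forbidden-action pattern set. The reason_category values "depth_cap" and "validation_failed" are hypothetical names.

```python
import re
from typing import Any, Callable, Dict, List

Node = Dict[str, Any]
L1_BUILD_MAX_DEPTH = 12  # default from section 5.3

# Illustrative patterns only; the real list would be derived from section 5.1.
FORBIDDEN_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"\bregedit\b",
        r"\bregistry\b",
        r"\bformat\b.*\b(disk|drive)\b",
        r"\bdisable\b.*\b(firewall|antivirus)\b",
        r"\brun as administrator\b",
    )
]
REQUIRED_KEYS = {
    "question": {"id", "text", "yes_next", "no_next"},
    "instruction": {"id", "text", "next"},
    "resolved": {"id", "text"},
    "escalate": {"id", "text", "reason_category"},
}


def escalate_node(node_id: str, reason: str) -> Node:
    return {"node_type": "escalate", "id": node_id,
            "reason_category": reason, "text": "Escalating to engineering."}


def is_valid(node: Any) -> bool:
    """Shape check plus forbidden-action text check."""
    if not isinstance(node, dict):
        return False
    node_type = node.get("node_type")
    if not isinstance(node_type, str) or node_type not in REQUIRED_KEYS:
        return False
    if not REQUIRED_KEYS[node_type].issubset(node.keys()):
        return False
    text = node["text"]
    return isinstance(text, str) and not any(p.search(text) for p in FORBIDDEN_PATTERNS)


def next_validated_node(walked_path: List[Node],
                        generate: Callable[[List[Node]], Any]) -> Node:
    node_id = f"n{len(walked_path) + 1}"
    if len(walked_path) >= L1_BUILD_MAX_DEPTH:
        return escalate_node(node_id, "depth_cap")  # cap hit: no model call
    for _ in range(2):  # first attempt plus exactly one regeneration
        node = generate(walked_path)
        if is_valid(node):
            return node
    return escalate_node(node_id, "validation_failed")
```

Regex matching on node text is a coarse filter that will produce both false positives and false negatives, so in this design it backs up the constrained prompt and does not replace it.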

Layer 4 — standing disclaimer. Persistent banner on every ai_build walk:

"These are high-confidence troubleshooting steps, but they come from outside your organization's knowledge base — review them before acting. When in doubt, escalate early."

5.1 Hard floor — always forbidden (admins cannot enable)

Regardless of enabled categories, the builder must never produce steps that:

  • Modify the Windows registry, system files, or boot configuration.
  • Delete, format, or repartition data/disks; remove user profiles or mailboxes.
  • Change credentials, MFA, security/firewall/AV settings, or disable protections.
  • Run scripts/commands with elevated/admin privileges.
  • Touch domain controllers, DNS, DHCP, or production server config.
  • Make purchases, license changes, or anything with billing impact.

(This list is a product decision — review and edit during spec review.)

5.2 Default enabled category allowlist (admin-editable)

Ships enabled by default; Account Owners/Admins toggle per account: password_reset, account_lockout, printer, email_outlook_client, wifi_network_basics, vpn_connect, teams_zoom_av, browser_cache_cookies, peripheral_reconnect, os_restart_update.

(This list is a product decision — review and edit during spec review.)

5.3 Tunables

| Setting | Default | Notes |
| --- | --- | --- |
| MATCH_THRESHOLD | 0.75 | Carried from predecessor spec §8.1. |
| SUGGEST_THRESHOLD | 0.60 | Carried from predecessor spec §8.1. |
| L1_BUILD_MAX_DEPTH | 12 | Force escalate beyond this many nodes. |
| get_model_for_action('l1_realtime_build') | Sonnet | Latency-sensitive; benchmark Sonnet vs Opus during plan. |
| Per-node max_tokens | 1024 | One node is small. |

6. Flywheel capture

On resolve of an ai_build session (l1_session_service.resolve extension):

  1. Normalize the walked_path into a complete, valid tree_structure (§6.1) — approval requires a dict with a real id (see Finding 5 / _create_tree_from_proposal).
  2. Create a FlowProposal: source='ai_realtime_l1', validated_by_outcome=true, proposed_flow_data={tree_structure, match_keywords}, l1_session_id=<this session> (NOT source_session_id — see §6.2 / Finding 1), linked_ticket_id/kind=<session ticket>, problem_domain=<category>, status='pending'.
  3. Run the existing _find_similar_pending_proposal dedupe — merge (bump supporting count) if a near-duplicate pending proposal exists, else insert.
  4. Emit the existing proposal.pending notification to the review queue.

Engineers promote good proposals to authored flows in the existing review queue. Promoted flows are then found by flow_matching_engine on future intakes → the KB compounds. source='ai_realtime_l1' rows surface in the existing queue (badge them "AI · outcome-validated").

6.1 Tree normalization (Finding 5)

The live walked_path holds only traversed nodes, and "generate" is a runtime sentinel, not a real edge — that is not a valid tree and would fail the _create_tree_from_proposal guard (tree_structure must be a dict with an id). At resolve time, ai_tree_builder.normalize_walked_path(walked_path) -> tree_structure produces a complete object:

  • Assign stable string ids to every node; the first node becomes the root and tree_structure.id = root id.
  • question nodes: the traversed branch (yes/no the tech actually chose) points to the next traversed node; the untraversed branch points to a terminal {node_type: 'needs_review', text: 'Branch not explored during the originating call'} stub.
  • instruction nodes point to the next traversed node.
  • The traversal ends at the real terminal node (resolved or escalate). This yields a structurally valid, reviewable tree: engineers fill in the needs_review branches when promoting. (Trees are tree_type='troubleshooting'.)
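The normalization rules above can be sketched as follows. Several details are assumptions made for illustration: question entries in walked_path carry the tech's choice under an "answer" key, the path ends with the terminal node, and the output uses a flat {"id", "nodes"} layout. The tree_structure layout that _create_tree_from_proposal actually expects is not given here beyond "a dict with an id".

```python
from typing import Any, Dict, List

Node = Dict[str, Any]
STUB_TEXT = "Branch not explored during the originating call"


def normalize_walked_path(walked_path: List[Node]) -> Dict[str, Any]:
    """Turn a linear walked path into a tree with needs_review stubs."""
    if not walked_path:
        raise ValueError("walked_path is empty")
    if walked_path[-1]["node_type"] not in ("resolved", "escalate"):
        raise ValueError("walked_path must end at a terminal node")

    ids = [f"n{i}" for i in range(1, len(walked_path) + 1)]  # stable string ids
    nodes: Dict[str, Node] = {}
    for i, entry in enumerate(walked_path):
        node: Node = {"id": ids[i], "node_type": entry["node_type"], "text": entry["text"]}
        next_id = ids[i + 1] if i + 1 < len(ids) else None
        if entry["node_type"] == "question":
            # The branch the tech did not take becomes a terminal stub.
            stub_id = f"{ids[i]}_unexplored"
            nodes[stub_id] = {"id": stub_id, "node_type": "needs_review", "text": STUB_TEXT}
            took_yes = entry["answer"] == "yes"  # assumed storage key
            node["yes_next"] = next_id if took_yes else stub_id
            node["no_next"] = stub_id if took_yes else next_id
        elif entry["node_type"] == "instruction":
            node["next"] = next_id
        elif entry["node_type"] == "escalate":
            node["reason_category"] = entry.get("reason_category")
        nodes[ids[i]] = node
    # The first traversed node is the root; its id is the tree id.
    return {"id": ids[0], "nodes": nodes}
```

No "generate" sentinel survives in the output: every edge points either at a real traversed node or at a needs_review stub.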

6.2 FlowProposal L1 source linkage (Finding 1 — Blocker)

FlowProposal.source_session_id is currently nullable=False FK → ai_sessions, and the review UI (ProposalDetail.tsx) links the "Source Session" to /pilot/{source_session_id} (a FlowPilot chat surface). An L1 ai_build session is an l1_walk_session, not an ai_session, so it cannot populate source_session_id. Changes:

  • Model/migration: add FlowProposal.l1_session_id (nullable FK → l1_walk_sessions.id, ondelete=SET NULL, indexed). Make source_session_id nullable. Add CHECK ((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)) — exactly one source set.
  • Review UI: when l1_session_id is set (source ai_realtime_l1), render the "Source" block as a read-only walked-path summary (problem statement + the resolved path) instead of a /pilot/... link. Existing ai_session-sourced proposals are unchanged.
  • Tree promotion: _create_tree_from_proposal sets Tree.source_session_id from the proposal — for L1-sourced proposals leave it NULL (confirm Tree.source_session_id is nullable; if not, include in the migration).
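The exactly-one CHECK is an XOR over two nullability tests. The sketch below demonstrates its behavior with an in-memory SQLite table as a stand-in for the real database; the table is reduced to the two source columns, and foreign keys are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE flow_proposals (
        id INTEGER PRIMARY KEY,
        source_session_id TEXT,
        l1_session_id TEXT,
        -- exactly one of the two sources must be set
        CHECK ((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL))
    )
    """
)


def try_insert(source_session_id, l1_session_id) -> bool:
    """Return True if the row satisfies the CHECK, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO flow_proposals (source_session_id, l1_session_id) VALUES (?, ?)",
            (source_session_id, l1_session_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with only source_session_id or only l1_session_id set are accepted; rows with both set or both NULL are rejected. Existing ai_session-sourced rows already have source_session_id set and l1_session_id NULL, so they satisfy the constraint without a backfill.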

7. Minimum escalation handoff

On escalate (terminal node reached, or the L1 hits the Escalate modal during an ai_build walk) — extends l1_session_service.escalate. The engineer-visible surface is the primary, dependency-free handoff; the bell-badge notification is a thin addition that requires three specific extensions to the FlowPilot-shaped notification system (Finding 3).

  1. Engineer-visible surface (primary). Escalated L1 sessions appear in an engineer-facing list — extend the existing /escalations queue (EscalationQueuePage) with an "L1 escalations" section, backed by a new GET /l1/escalations. Each row: problem statement, walked-path summary, who escalated, when, reason category. Pollable; no dependency on the notification subsystem.

  2. Bell-badge notification (Finding 3 — three explicit changes). The notification system is currently FlowPilot-specific:

    • VALID_EVENTS (backend/app/schemas/notification.py) has no l1.session.escalated. Add it to the set (and to the default events_enabled map).
    • _build_notification_link (notification_service.py) only knows session.escalated → /pilot/{session_id}?pickup=true. Add l1.session.escalated → /escalations and add a body template for the new event. The existing session.escalated event must NOT be reused — an L1 escalation has no ai_session and no /pilot pickup flow.
    • Default recipients (_resolve_recipients, ~line 184) are owner/admin/team_admin only — ordinary engineers are excluded. Since L1 escalations must reach engineers who can pick them up, the call must pass explicit target_user_ids = the account's active engineer-role users (plus owner/admin), not rely on the default set.
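A sketch of the second and third changes, assuming a simple user record with hypothetical field names (id, account_role, is_active). The function names are illustrative and are not the existing notification_service helpers.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AccountUser:
    """Hypothetical user shape for illustration."""
    id: str
    account_role: str
    is_active: bool = True


# Engineers must be included explicitly; the default recipient set omits them.
L1_ESCALATION_ROLES = {"engineer", "owner", "admin"}


def l1_escalation_recipients(users: Iterable[AccountUser]) -> List[str]:
    """Explicit target_user_ids for an l1.session.escalated notification."""
    return [u.id for u in users if u.is_active and u.account_role in L1_ESCALATION_ROLES]


def build_notification_link(event: str, session_id: str) -> str:
    """Route each escalation event to its own pickup surface."""
    if event == "session.escalated":
        return f"/pilot/{session_id}?pickup=true"  # FlowPilot pickup flow
    if event == "l1.session.escalated":
        return "/escalations"  # L1 escalations have no /pilot session
    raise ValueError(f"no link builder for event {event!r}")
```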

Still deferred (documented, not built): PSA ticket reassign, escalation-package markdown generation, AI chat handoff/session creation.

8. Data model & migrations

Migration 1 — ai_build session kind.

  • Extend l1_walk_sessions ck_l1_walk_sessions_session_kind CHECK to include 'ai_build'.
  • Extend ck_l1_walk_sessions_target_consistency: for ai_build, both flow_id and flow_proposal_id are NULL (same as adhoc).

Migration 2 — account L1 category settings.

  • Add accounts.enabled_l1_categories JSONB NOT NULL DEFAULT '<default allowlist>'::jsonb (list of category keys). RLS already covers accounts.

Migration 3 — FlowProposal L1 source linkage (Finding 1).

  • Add flow_proposals.l1_session_id nullable FK → l1_walk_sessions.id (ondelete=SET NULL, indexed).
  • Make flow_proposals.source_session_id nullable (was NOT NULL).
  • Add CHECK ((source_session_id IS NOT NULL) <> (l1_session_id IS NOT NULL)) — exactly one source.
  • Confirm trees.source_session_id is nullable (L1-promoted trees leave it NULL); if not, drop its NOT NULL here.

No new tables — live build state rides on the existing l1_walk_sessions.walked_path; persisted trees ride on FlowProposal.proposed_flow_data.

9. API surface

| Method | Path | Notes | Auth |
| --- | --- | --- | --- |
| POST | /l1/intake | Extended: now runs match_or_build; response carries outcome (matched/suggest/out_of_scope/build). | require_l1_or_coverage |
| POST | /l1/sessions/{id}/next-node | New: record answer/ack on current node, generate + return next node (or terminal). | require_l1_or_coverage |
| GET | /accounts/me/l1-categories | New: list enabled + available categories + hard-floor (read-only) list. | require_l1_or_above (read) |
| PATCH | /accounts/me/l1-categories | New: set enabled categories. | require_account_owner_or_admin (Finding 6) |
| GET | /l1/escalations | New (or extend /escalations): engineer-visible escalated-from-L1 list. | require_engineer_or_admin |

Finding 6 — new auth dep. The category control is an owner/admin setting, but require_engineer_or_admin also admits engineer. No existing dep matches "owner or account-admin" (require_account_owner is owner-only; require_admin is super-admin-only). Add require_account_owner_or_admin to deps.py: allow super_admin bypass, then account_role in ('owner', 'admin'), else 403. Use it for the PATCH.
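The check order described for the new dependency can be sketched framework-free. The CurrentUser shape and the Forbidden exception are hypothetical stand-ins; in deps.py the function would presumably receive the authenticated user from the existing auth dependency and raise the framework's own 403 error.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Stand-in for the framework's HTTP 403 error."""
    status_code = 403


@dataclass(frozen=True)
class CurrentUser:
    """Hypothetical user shape for illustration."""
    account_role: str
    is_super_admin: bool = False


def require_account_owner_or_admin(user: CurrentUser) -> CurrentUser:
    # 1. Super admins bypass the account-role check.
    if user.is_super_admin:
        return user
    # 2. Account owners and account admins may write category settings.
    if user.account_role in ("owner", "admin"):
        return user
    # 3. Everyone else, including engineers, is rejected.
    raise Forbidden("account owner or admin role required")
```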

10. Frontend

  • L1WalkTreeVariant — replace synthetic stepping with real node rendering driven by /next-node; render question (yes/no), instruction (acknowledge), resolved/escalate (terminal). Per-node loading affordance. Disclaimer banner mounted for ai_build sessions.
  • L1Dashboard intake handler — dispatch on match_or_build outcome (suggest prompt, out-of-scope prompt, build → walker).
  • New admin settings panel (under /account) — toggle enabled L1 categories; show hard-floor list as read-only "always excluded."
  • Engineer escalations surface — "L1 escalations" section/list.

11. Testing strategy

Backend unit:

  • ai_tree_builder.generate_next_node — returns valid node per type; escalate-early when path is deep / model signals exhaustion; regenerate-then-escalate on malformed/forbidden output; depth cap forces escalate.
  • Per-node validation — forbidden-action patterns rejected; hard-floor enforced even if a category is enabled.
  • match_or_build — all four outcomes at threshold boundaries (score == MATCH_THRESHOLD, == SUGGEST_THRESHOLD); match runs before the category gate (a matched published flow is returned even when its category is disabled — Finding 4); force_build skips match but still applies the category gate; out_of_scope only on the build path when category disabled/unknown.
  • classify — known categories map correctly; unknown → out_of_scope.
  • normalize_walked_path (Finding 5) — produces a dict with a root id; untraversed question branches become needs_review stubs; output passes the _create_tree_from_proposal validity guard.
  • Flywheel capture — resolve creates ai_realtime_l1 proposal with l1_session_id set and source_session_id NULL (Finding 1); CHECK accepts exactly-one-source; dedupe merges near-duplicate.
  • Escalation handoff — l1.session.escalated accepted by the notification schema (Finding 3); link resolves to /escalations; explicit engineer target_user_ids receive it; escalated session appears in GET /l1/escalations.

Backend integration:

  • Full intake→build→resolve creates an outcome-validated proposal.
  • Intake→build→escalate notifies engineers and surfaces in the escalations list.
  • Migrations roundtrip; ai_build CHECK + target-consistency hold.

Frontend e2e (extend l1-workspace.spec.ts):

  • L1 intake with no match → AI build → answer nodes → resolve → proposal created.
  • L1 build → escalate node → escalate handoff.
  • Admin toggles a category off → that problem class returns out-of-scope.

AI quality (plan-time): small eval set of common L1 problems; assert trees stay in-scope, reach resolution or escalate cleanly, never emit hard-floor actions. Benchmark Sonnet vs Opus for the model-tier decision.

12. Risks & open questions

  • Hallucinated-but-plausible steps for niche/company-specific apps. Mitigation: classification gate + constrained prompt + escalate-early + disclaimer. Residual risk accepted for v1; eval set bounds it.
  • Latency on a live call. Node-by-node generation means ~2-4s per branch. Mitigation: Sonnet, small per-node token budget, clear loading affordance. Benchmark at plan time.
  • Coherence across independently-generated nodes. Mitigation: full walked-path context every call.
  • Classification accuracy. A misclassify could wrongly gate a valid problem out, or let a borderline one through. Mitigation: hard floor is category-independent; out-of-scope still offers adhoc/escalate (no dead end).
  • Open (product, for spec review): the default category allowlist (§5.2) and the hard-floor list (§5.1) — confirm/edit. Model tier — confirm Sonnet pending benchmark.

13. Out of scope (restated)

KB ingestion + connectors, RAG grounding, PSA reassign, escalation-package generation, AI chat handoff. Each is its own later phase with its own spec.

Also deferred (surfaced in review):

  • Matching against unpromoted FlowProposals (Finding 2). flow_matching_engine matches published flows only. Extending it to also surface outcome-validated drafts before promotion is a later enhancement; Phase 2A relies on engineer promotion (draft → published flow → matchable).

14. Review revisions (2026-05-29 Codex review)

All six findings verified against code and resolved in this spec:

  1. Blocker — FlowProposal source linkage: §6.2 + §8 Migration 3 (new nullable l1_session_id, source_session_id made nullable, exactly-one CHECK, review-UI link change).
  2. High — match scope: §3 (match published flows only; proposal-matching deferred §13).
  3. High — escalation notification: §7 (engineer surface is primary; three explicit notification-system changes enumerated).
  4. Medium — gate ordering: §3 + §5 Layer 1 (match first; category gate only on the build path).
  5. Medium — flywheel tree shape: §6.1 (normalize_walked_path produces a valid tree with root id; unexplored branches → needs_review stubs).
  6. Medium — category write auth: §9 (new require_account_owner_or_admin dep; require_engineer_or_admin was too broad).