Restructure walked_path off FlowProposal onto new l1_walk_sessions table (each L1 walk has its own path; proposal carries only the validation bit). Add adhoc walk variant for live calls when no KB content exists, with a dedicated BuildAbortedNoKB screen offering ad-hoc/escalate/near-miss options. Introduce SUGGEST_THRESHOLD below MATCH_THRESHOLD so near-miss flows surface as suggestions instead of triggering a 10s build. Define empty-state dashboard mode for first-run accounts. Spec the Microsoft Graph OAuth flow concretely (multi-tenant app, redirect callback, token refresh). Add seat enforcement for both L1 and engineer tracks via shared helper (engineer enforcement was missing in current code). Make audit policy explicit (resolve/escalate only, not per-step). Add session lifecycle (concurrent sessions, browser-close recovery, 24h abandonment). Clarify KB doc visibility is owner/engineer only (L1s see citations in walker, not /account/kb directly). Acknowledge escalation notification noise as v1 limitation with targeted notification deferred to v2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
68 KiB
L1 Workspace — Design Spec
Date: 2026-05-28 Status: Draft (pending implementation plan) Audience for this doc: engineers + reviewers building the L1 workspace feature
1. Summary
Introduce a dedicated L1 helpdesk workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.
This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.
2. Motivation
The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (AssistantChatPage mounted at /pilot). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:
- Take inbound phone calls and need a fast, deterministic click-through UX
- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
- Are not authorized to escalate complex issues themselves; they hand off to engineers
A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.
3. Users & roles
3.1 Role hierarchy
super_admin > owner > engineer > l1_tech > viewer
l1_tech is added to the account_role enum. Permissions enforced via app/core/permissions.py and app/api/deps.py.
3.2 What L1 can do
- Use the
/l1/*surface - Open tickets from their queue (PSA-fed or internal)
- Intake walk-in/phone-call problems (creates a ticket as a side effect)
- Walk authored flows and AI-built FlowProposal drafts
- Resolve or escalate a session
- View their own AI drafts list (read-only — outcome tags shown)
3.3 What L1 cannot do
- See the chat surface (
/pilot) — sidebar hidden, route 403s - Author or edit flows
- See
/review-queueor/escalations(engineer inboxes) - See team analytics (only
/analytics/me) - Promote AI drafts (engineers/owners only, via existing review queue)
- Configure KB connectors (owner-only)
3.4 Engineer L1 coverage
Engineers do NOT see the L1 surface by default. Owners can toggle users.can_cover_l1 = true on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in /l1/* with a sticky banner: "Covering L1 — actions logged as coverage." Coverage actions are audit-logged with acting_as = 'l1_coverage'.
Backend dep: require_l1_or_coverage = l1_tech | (engineer AND can_cover_l1) | owner | super_admin.
This mirrors the existing orthogonal-flag pattern (is_team_admin) — no new architectural concept.
3.5 Billing data model
accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0(new column)- Existing
accounts.seats_purchasedcontinues to represent engineer seats - New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band
3.6 Seat enforcement (L1 + engineer together)
Important context surfaced during spec review: there is currently no seat-limit enforcement in the codebase. subscription.seat_limit is stored from Stripe webhook payloads and surfaced in API responses, but no endpoint blocks invites when the limit is reached. To avoid shipping L1 with enforcement while engineer seats remain unbounded (inconsistent SKU story), this spec adds enforcement for both tracks as part of v1.
Shared helper: services/seat_enforcement.py:
def check_seat_available(
account: Account,
subscription: Subscription,
role: Literal['engineer', 'l1_tech'],
db: AsyncSession,
) -> SeatCheckResult
Counts active users in the account at the given role, compares against the subscription's role-specific limit (seat_limit for engineer, l1_seat_limit for L1). Returns {available: bool, current: int, limit: int}.
Enforcement points:
POST /api/v1/invites(invite create) — blocks with402 Payment Required(or422with codeseat_limit_exceeded) when the target role's seats are full. Body:{current, limit, role, upgrade_url: <stripe customer portal link>}.- Invite accept (
/api/v1/accept-invite) — re-checks at acceptance time (race-condition guard). - Role change on existing user (e.g., promoting
viewertoengineer) — same check before commit. - Admin "assign role" UI — pre-checks seat availability and disables the option when full.
Grandfathering: any account currently over-seated (existing inviting beyond the limit was technically allowed before) is not retroactively kicked. The enforcement applies from migration-time forward — existing over-seated accounts get a banner prompting upgrade or seat removal but functionality is preserved until they invite a new user.
Frontend: /admin/users and /account/users show a seat counter widget for each role (3 / 5 engineer seats used · 2 / 5 L1 seats used). When a count exceeds the limit, the widget renders amber with a tooltip explaining grandfathering.
4. Architecture overview
4.1 New components
Frontend:
pages/l1/L1Dashboard.tsx— landing page; ticket queue + describe-the-problem intake. Two modes (empty-state + active).pages/l1/L1WalkPage.tsx— purpose-built walker with two internal variants: tree (flow/proposal) and adhoc (note-taking).pages/l1/L1NoKBScreen.tsx— BuildAbortedNoKB screen with three CTAs (adhoc / escalate / use near-miss).pages/l1/L1DraftsPage.tsx— read-only list of the L1's AI drafts and promotion status.pages/l1/L1TicketsPage.tsx— full-page queue (PSA + internal merged).components/l1/L1CoverageBanner.tsx— slim banner shown to engineer-coverers.components/l1/SuggestPrompt.tsx— inline near-miss suggestion ("Use this flow / Build new").components/admin/SeatCounterWidget.tsx— engineer + L1 seat usage counts on/admin/usersand/account/users.
Backend:
services/match_or_build.py— orchestrator (RAG match → fallback to AI build)services/ai_tree_builder.py— real-time AI tree generation via Anthropicservices/kb_connectors/package — base, registry, encryption, plusitglue.py,hudu.py,microsoft_graph.pyservices/kb_ingestion_writer.py— shared writer used by manual upload + all connectorsservices/kb_ingestion_scheduler.py— APScheduler job,max_instances=1, per-connector syncservices/internal_ticket_service.py— CRUD + status transitions for the no-PSA fallbackservices/l1_session_service.py— walking-session lifecycleapi/endpoints/l1.py— L1-role endpointsapi/endpoints/kb_connectors.py— KB connector config endpoints (owner-only for write)
Reused / extended:
services/rag_service.py— flow & KB matching (existing)services/flow_matching_engine.py— existingservices/escalation_package_generator.py— extended to include walked path, AI draft pointer, KB citationsmodels/FlowProposal— new columns (see §5)- New
models/L1WalkSession— per-session state for tree walks and adhoc walks (see §5.3) services/psa/— already supports ticket create + reassign across CW/Autotask/HaloPSAservices/embedding_service.py— used by KB ingestion writer- New
kb_documents+kb_document_chunkstables for RAG-retrievable document storage, separate from the existingkb_imports(which is a document→tree conversion record, not a persistent KB store — see §5) - Audit log writer — gains
acting_asfield
4.2 Data flow — walk-in / phone-call intake
L1 types: "User can't connect Outlook after password reset"
POST /api/v1/l1/intake
body: { problem_statement, customer_name?, customer_contact? }
→ create ticket
- PSA if configured: psa_provider.create_ticket(...)
- else: internal_tickets row
→ match_or_build(account_id, problem_text, ticket_ref)
→ rag_service.match_flows(...) → top hit; if score ≥ threshold return as 'flow'
→ rag_service.match_proposals(... where validated_by_outcome=true)
→ top hit; if score ≥ threshold return as 'proposal'
→ ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
→ persist FlowProposal(source='ai_realtime_l1',
linked_ticket_id,
linked_ticket_kind,
validated_by_outcome=false)
→ return as 'proposal'
→ l1_session_service.start(...)
→ return { session_id, target_kind, target_id, intake_type }
→ navigate to /l1/walk/{session_id}
4.3 Data flow — PSA-queue intake
The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls POST /api/v1/l1/tickets/{ticket_ref}/start which is the same match_or_build path (the problem_statement is the ticket subject + description) followed by walker navigation.
5. Data model
All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are TIMESTAMPTZ. No --rev-id on Alembic; no --autogenerate for enum/RLS work.
5.1 FlowProposal — extended
Existing AI-draft model. Add columns:
| Column | Type | Notes |
|---|---|---|
source |
VARCHAR(30) NOT NULL |
'ai_realtime_l1' | 'kb_accelerator' | 'manual_draft'. Backfill existing rows to 'manual_draft'. |
linked_ticket_id |
VARCHAR(64) NULL |
PSA id or internal_tickets UUID (stored as text) |
linked_ticket_kind |
VARCHAR(10) NULL |
'psa' | 'internal' |
validated_by_outcome |
BOOLEAN NOT NULL DEFAULT FALSE |
Flipped to true when any L1 walks this proposal to a helpful resolve |
Note (revised after spec review): the walked path lives on the session (
l1_walk_sessions, §5.3), not the proposal. A single proposal may be walked by multiple L1s over time — each walk has its own path. The proposal carries only the boolean validation signal; engineer review queries the latest validated session's path for context.
Engineer review queue sort:
ORDER BY validated_by_outcome DESC, created_at DESC
5.2 internal_tickets — new
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
created_by_user_id UUID NOT NULL (the L1 who took the call)
customer_name VARCHAR(120)
customer_contact VARCHAR(200) NULL (email or phone, free text)
problem_statement TEXT NOT NULL
status VARCHAR(30) NOT NULL -- 'open' | 'walking' | 'resolved' | 'escalated'
flow_id UUID NULL FK trees
flow_proposal_id UUID NULL FK flow_proposals
ai_session_id UUID NULL FK ai_sessions (set when engineer picks up in chat post-escalation)
assigned_user_id UUID NULL (engineer post-escalation)
resolution_notes TEXT NULL
psa_promoted_ticket_id VARCHAR(64) NULL (set if later promoted to PSA)
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
resolved_at TIMESTAMPTZ NULL
RLS: account-scoped, WITH CHECK on insert/update.
5.3 l1_walk_sessions — new
Per-session state for an L1 walking a ticket. Supports three session kinds: walking an authored flow, walking an AI-built proposal, or an adhoc walk with no tree (used when no KB content exists and the L1 needs to handle the call manually but still wants the session/ticket/escalation framework).
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
created_by_user_id UUID NOT NULL (the L1, or coverage engineer)
acting_as VARCHAR(30) NULL -- 'l1_coverage' when engineer covers; null for native L1
ticket_id VARCHAR(64) NOT NULL -- PSA id or internal_tickets UUID as text
ticket_kind VARCHAR(10) NOT NULL -- 'psa' | 'internal'
session_kind VARCHAR(20) NOT NULL -- 'flow' | 'proposal' | 'adhoc'
flow_id UUID NULL FK trees
flow_proposal_id UUID NULL FK flow_proposals
current_node_id VARCHAR(100) NULL -- node within the tree; null for adhoc
walked_path JSONB NOT NULL DEFAULT '[]'::jsonb -- [{node_id, question, answer, l1_note}]; [] for adhoc
walk_notes JSONB NOT NULL DEFAULT '[]'::jsonb -- free-form notes (adhoc) or supplementary notes (tree walks)
status VARCHAR(20) NOT NULL DEFAULT 'active' -- 'active' | 'resolved' | 'escalated' | 'abandoned'
resolution_notes TEXT NULL
helpful BOOLEAN NULL -- the "did this work?" answer at resolve time
escalation_reason TEXT NULL
escalation_reason_category VARCHAR(30) NULL
started_at TIMESTAMPTZ NOT NULL
last_step_at TIMESTAMPTZ NOT NULL
resolved_at TIMESTAMPTZ NULL
Constraints:
CHECK (session_kind = 'flow' AND flow_id IS NOT NULL AND flow_proposal_id IS NULL) OR (session_kind = 'proposal' AND flow_proposal_id IS NOT NULL AND flow_id IS NULL) OR (session_kind = 'adhoc' AND flow_id IS NULL AND flow_proposal_id IS NULL)- Soft "abandoned" status: if
last_step_atis older than 24h and status is still'active', a cleanup task flips it to'abandoned'(preserves data; just gets it off the L1's "Resume in progress" widget).
RLS: account-scoped, WITH CHECK on insert/update.
Why a new table (rather than reusing ai_sessions): ai_sessions is the chat-conversation model — flat message list, no node-state, no flow/proposal linkage. An L1 walk has different state (current node, walked path, walk-kind constraint). Forcing it into ai_sessions would require multiple new nullable columns on a heavily-used model and overload its semantics. Separate table = cleaner separation and lower regression risk.
5.4 kb_connector_configs — new
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
provider VARCHAR(20) NOT NULL -- 'itglue' | 'hudu' | 'microsoft_graph'
display_name VARCHAR(80) NOT NULL
credentials_encrypted BYTEA NOT NULL -- Fernet, same pattern as services/psa/encryption.py
is_active BOOLEAN NOT NULL DEFAULT TRUE
sync_interval_minutes INTEGER NOT NULL DEFAULT 360
last_sync_at TIMESTAMPTZ NULL
last_sync_status VARCHAR(20) NULL -- 'success' | 'error' | 'running'
last_sync_error TEXT NULL
created_by_user_id UUID NOT NULL
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
UNIQUE (account_id, provider, display_name)
RLS: account-scoped, WITH CHECK.
5.5 New tables: kb_documents + kb_document_chunks
The existing kb_imports table is a document→tree conversion record (status lifecycle processing | ready | committed | failed, target tree_id) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval.
The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:
kb_documents — one row per ingested document:
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
source_kind VARCHAR(20) NOT NULL -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
source_ref VARCHAR(200) NULL -- provider-side document ID for re-sync
connector_config_id UUID NULL FK kb_connector_configs
title VARCHAR(500) NOT NULL
content TEXT NOT NULL -- full post-extraction text
content_hash VARCHAR(64) NOT NULL -- sha256 for change-detection
metadata JSONB NULL -- provider-specific (org_id, drive_id, etc.)
last_synced_at TIMESTAMPTZ NULL
deleted_at TIMESTAMPTZ NULL -- soft-delete on connector removal
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
Unique partial index: (connector_config_id, source_ref) WHERE source_ref IS NOT NULL.
kb_document_chunks — chunks with embeddings, used by rag_service.match_kb_chunks:
id UUID PRIMARY KEY
document_id UUID NOT NULL FK kb_documents ON DELETE CASCADE
account_id UUID NOT NULL -- denormalized for RLS
chunk_index INTEGER NOT NULL
content TEXT NOT NULL
embedding VECTOR(<dim>) NOT NULL -- dim matches embedding_service
metadata JSONB NULL -- section title, page number, etc.
created_at TIMESTAMPTZ NOT NULL
UNIQUE (document_id, chunk_index)
Pgvector index (ivfflat or hnsw) on embedding; choice tuned during implementation.
RLS on both tables: account-scoped, WITH CHECK on insert.
Coexistence with kb_imports: when an L1 (or owner) uploads a doc, the system can populate both — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into kb_documents for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to kb_documents (no auto-tree-conversion from synced docs in v1).
5.6 Other column additions
users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSEaccounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0audit_logs.acting_as VARCHAR(30) NULL—'l1_coverage'when engineer is in coverage mode; null otherwiseaccount_roleenum: add'l1_tech'subscriptions.l1_seat_limit INTEGER NULL(mirrors existingseat_limitwhich is treated as the engineer limit going forward)
5.6.1 Audit log policy (explicit)
Audit rows are written only at session terminal events — resolve and escalate — not on each step. The walked path is recorded incrementally on l1_walk_sessions.walked_path as it accumulates; the audit row at resolve/escalate captures the frozen final snapshot inline. Mid-walk step-by-step audit logging is not v1 because:
- MSP IT troubleshooting actions taken via an L1 walk are rarely high-stakes enough to justify the row-volume cost (~5–20 audit rows per call vs. 1).
- The
walked_pathon the session is itself the auditable record for the L1's path through the tree; the session table is account-scoped and retained. - If a customer-impacting incident traces back to an L1 walk, the path is recoverable from the session row even when the session is
abandoned(cleanup task preserves the row, just flips status).
If higher granularity is needed later (e.g., for compliance-heavy verticals), it's an additive change: subscribe to step events, emit an audit row per step. Not blocking v1.
5.7 Migration ordering
Eight manual Alembic revisions (no --rev-id, no --autogenerate):
- Add
'l1_tech'toaccount_roleenum. - Add
users.can_cover_l1,accounts.l1_seats_purchased,audit_logs.acting_as. - Extend
flow_proposalswith new columns + backfill existing rows tosource='manual_draft'. Do not addwalked_path_snapshot— that column lives on the new sessions table. - Create
l1_walk_sessions+ RLS policies (account-scoped, WITH CHECK) + check constraint on session_kind combinations. - Create
internal_tickets+ RLS policies. - Create
kb_connector_configs+ RLS policies. - Create
kb_documents+kb_document_chunkstables + RLS policies + pgvector index on chunks. - Add seat-enforcement support:
subscriptions.l1_seat_limit INTEGER NULL(already haveseat_limitfor engineers — kept as-is and treated as the engineer limit going forward).
Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass account_id= explicitly. Grep all Model( sites before merge.
6. Backend services & endpoints
6.1 New services
| Module | Purpose |
|---|---|
services/match_or_build.py |
Orchestrator. Single async entrypoint match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult. |
services/ai_tree_builder.py |
Real-time AI tree generation. Anthropic via existing _call_anthropic_cached pattern. Model tier via settings.get_model_for_action('l1_realtime_build'). Output validated against the flow node schema with Pydantic; rejects malformed output. |
services/kb_connectors/base.py |
Abstract KBConnector with test_credentials, list_documents, fetch_content, subscribe_to_changes (optional). |
services/kb_connectors/itglue.py |
IT Glue REST client. |
services/kb_connectors/hudu.py |
Hudu REST client. |
services/kb_connectors/microsoft_graph.py |
Microsoft Graph (SharePoint/OneDrive) client. |
services/kb_connectors/registry.py |
KBConnectorRegistry (mirrors PsaProviderRegistry). |
services/kb_connectors/encryption.py |
Fernet wrapper (or reuse the PSA one if generic). |
services/kb_ingestion_writer.py |
Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
services/kb_ingestion_scheduler.py |
APScheduler interval job, max_instances=1. Sequential per account; concurrency cap = 4 accounts simultaneously. |
services/internal_ticket_service.py |
CRUD + status transitions for internal_tickets. |
services/l1_session_service.py |
Walking-session lifecycle: start (flow/proposal/adhoc), step, notes, resolve, escalate, escalate-without-walk. Owns l1_walk_sessions writes. |
services/l1_session_cleanup.py |
APScheduler job (hourly, max_instances=1) flipping stale active sessions to abandoned after 24h of inactivity. |
services/seat_enforcement.py |
Shared helper used by invite, accept-invite, and role-change paths. Returns SeatCheckResult for engineer + L1 roles consistently. |
6.2 Extended services
services/escalation_package_generator.py— adds inputs:walked_path,ai_draft_proposal_id,kb_citations. New caller path froml1_session_service.escalate(...).- KB Accelerator endpoint — accepts ingested content via the shared
kb_ingestion_writer. Manual upload and connector sync share the same persistence path.
6.3 New endpoints
All under require_l1_or_coverage unless noted. Mounted under /api/v1/l1.
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /l1/queue |
Merged ticket queue (PSA + internal). Pagination + status filter. | require_l1_or_coverage |
| POST | /l1/intake |
Walk-in intake. Body {problem_statement, customer_name?, customer_contact?, force_build?}. Creates ticket, runs match_or_build. Response is one of: {outcome: 'matched', session_id, session_kind, target_id} · {outcome: 'suggest', suggestion, can_build} (frontend prompts user) · {outcome: 'aborted_no_kb', near_miss?, ticket_ref} (frontend renders BuildAbortedNoKB screen §8.4). |
require_l1_or_coverage |
| POST | /l1/tickets/{ticket_ref}/start |
Start walker from an existing ticket. Internally same as intake but skips ticket creation. | require_l1_or_coverage |
| POST | /l1/sessions/{id}/step |
Record an answer (tree walks only). Body {node_id, answer, note?}. Appends to l1_walk_sessions.walked_path. |
require_l1_or_coverage |
| POST | /l1/sessions/{id}/notes |
Update walk notes (adhoc walks only). Body {notes: JSONB}. Replaces l1_walk_sessions.walk_notes. Debounced auto-save from frontend. |
require_l1_or_coverage |
| POST | /l1/sessions/{id}/resolve |
Close as resolved. Body {resolution_notes, helpful: bool}. Sets validated_by_outcome=true on the proposal when helpful=true AND session_kind='proposal'. Closes the ticket. |
require_l1_or_coverage |
| POST | /l1/sessions/{id}/escalate |
Generate escalation package + reassign ticket. Body {reason, reason_category}. |
require_l1_or_coverage |
| POST | /l1/sessions/adhoc |
Start an adhoc walk. Body {ticket_ref?, ticket_kind?, problem_statement, customer_name?, customer_contact?}. If ticket_ref omitted, creates a ticket first (PSA or internal). Returns {session_id}. |
require_l1_or_coverage |
| POST | /l1/escalate-without-walk |
Escalate immediately without a walking session (used from the BuildAbortedNoKB screen). Body {problem_statement, customer_name?, customer_contact?, reason_category}. Creates ticket + escalated l1_walk_sessions row + escalation package. |
require_l1_or_coverage |
| GET | /l1/drafts |
List current user's AI drafts with promotion status. | require_l1_or_coverage |
KB connector endpoints (/api/v1/kb-connectors):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /kb-connectors |
List configured connectors for account. | require_l1_or_above |
| POST | /kb-connectors |
Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | require_account_owner |
| DELETE | /kb-connectors/{id} |
Remove (soft-disable). | require_account_owner |
| POST | /kb-connectors/{id}/sync |
Trigger immediate sync (enqueued). | require_account_owner |
| GET | /kb-connectors/{id}/status |
Sync status + doc count + last error. | require_l1_or_above |
Internal ticket endpoints (/api/v1/internal-tickets):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /internal-tickets |
List (account-scoped). | require_l1_or_coverage |
| GET | /internal-tickets/{id} |
Detail. | require_l1_or_coverage |
| POST | /internal-tickets/{id}/promote-to-psa |
Push to configured PSA, set psa_promoted_ticket_id. |
require_account_owner |
User management additions:
| Method | Path | Purpose | Auth |
|---|---|---|---|
| PATCH | /users/{id}/coverage |
Set can_cover_l1 flag. Body {can_cover_l1: bool}. |
require_account_owner |
| GET | /accounts/me/seats |
Returns seat usage {engineer: {current, limit}, l1_tech: {current, limit}}. Used by admin/users UIs to render the counter widget. |
require_engineer_or_admin |
Seat-enforcement integration points (no new endpoints — enforcement is inserted into existing flows):
POST /api/v1/invites(invite create) — returns402 Payment Required(or422withcode: seat_limit_exceeded) when target role has no remaining seats. Body includes{current, limit, role, upgrade_url}.POST /api/v1/accept-invite— race-condition re-check at acceptance time.- Role-change endpoints — same check.
7. Frontend surface
7.1 Sidebar — L1 view
LOGO
─────────────
Workspace /l1
Tickets /l1/tickets
My Drafts /l1/drafts
─────────────
Guides /guides
Account /account (filtered — no integrations, no categories)
No /pilot, no /trees, no /flows, no /review-queue, no /escalations, no team analytics. Sidebar.tsx picks the nav array by role.
7.2 Sidebar — engineer coverage view
Engineer's existing sidebar plus a single appended entry "L1 Workspace" → /l1. Shown when canCoverL1 || isOwner || isSuperAdmin.
7.3 /l1 dashboard layout
The dashboard has two modes determined on load: empty-state (account has no flows AND no KB documents) or active (normal state).
Active mode — four vertical zones, single column, max width ~1100px:
- Greeting — uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
- Describe the problem card — large textarea (autofocus on load), optional
customer_name+customer_contactfields, single primary CTA "Start walk →" (the only electric-blue element on the page) - Open tickets — section label, count, table rows (merged PSA + internal with origin badges), row hover
bg-elevated - Resume in progress — shown when L1 has any session with
status='active'. Lists ALL active sessions, not just one, sorted bylast_step_at DESC. Each row shows ticket #, customer name, current node summary, "Step N · estimated M" or "Adhoc walk · {len(walk_notes)} notes".
Empty-state mode (first-run experience) — shown when count(flows) == 0 AND count(kb_documents) == 0 for the account:
┌──────────────────────────────────────────────────┐
│ Good morning, {firstName}. │
│ │
│ ╔══════════════════════════════════════════════╗ │
│ ║ Your knowledge base is empty ║ │
│ ║ ║ │
│ ║ L1 Workspace works best when your account ║ │
│ ║ has KB content or authored flows. Right ║ │
│ ║ now there's nothing to match against. ║ │
│ ║ ║ │
│ ║ [for L1 role:] ║ │
│ ║ Ask your admin to: ║ │
│ ║ • Upload KB documents ║ │
│ ║ • Configure a KB connector (IT Glue, etc.) ║ │
│ ║ • Or author a flow ║ │
│ ║ ║ │
│ ║ [for owner/coverage engineer:] ║ │
│ ║ [ Upload KB content ] [ Configure connector ]│ │
│ ║ ║ │
│ ║ You can still take calls — they'll start ║ │
│ ║ as ad-hoc walks. ║ │
│ ╚══════════════════════════════════════════════╝ │
│ │
│ Describe the problem (still works — will start │
│ as ad-hoc walk): │
│ [ ... textarea ... ] │
│ [ Start ad-hoc walk → ] │
└──────────────────────────────────────────────────┘
The empty-state card never blocks intake — an L1 can still take a call and the system gracefully starts an ad-hoc walk (since match_or_build will return aborted_no_kb).
Tailwind v4 tokens: bg-page base, bg-card zones, bg-elevated row hover, electric-blue accent only on primary CTA. No text-secondary. All borders border-default.
7.4 /l1/walk/{sessionId} walker
The walker renders one of two variants based on l1_walk_sessions.session_kind:
- Tree variant (§7.4.A) — for
session_kind in ('flow', 'proposal') - Adhoc variant (§7.4.B) — for
session_kind = 'adhoc'
Both share the sticky header, persistent Escalate + Resolve buttons, customer info, and the resolve/escalate modals.
7.4.A Tree variant (flow + proposal walks)
Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs flex + flex-1 + min-h-0).
Header:
- Back arrow + ticket ref + customer name + AI-built badge (when
session_kind='proposal') - Problem statement line
- Persistent action buttons:
[ Escalate ][ Resolve ✓ ]
Left pane (main):
- "Step N · estimated M" label
- Current node card — large yes/no/answer buttons (min 44px tap target)
- Optional note textarea below the card (appended to
walked_pathasl1_note) - On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"
Right pane (transcript):
- Walked-so-far list (node title + answer chosen)
- Current step highlight
- "Source:" section listing KB citations for the current node (proposal walks only)
7.4.B Adhoc variant (no tree)
Same sticky header (no AI-built badge since there's no tree). Single-pane body instead of two-pane:
Header:
- Back arrow + ticket ref + customer name + "Ad-hoc walk" pill
- Problem statement line
- Persistent action buttons:
[ Escalate ][ Resolve ✓ ]
Body:
- Large notes editor (rich-text-lite — paragraph breaks, bullet lists, no formatting toolbar bloat)
- Auto-save on debounce (300ms) to
l1_walk_sessions.walk_notesviaPOST /l1/sessions/{id}/notes - Subtle saved-state indicator ("Saved 2s ago")
- Optional "Add a step" button — appends a structured entry
{timestamp, content}towalk_notesrather than free prose. Useful for recording sequential actions taken.
Why a separate variant rather than blank tree: the tree pane is built around the question/answer/transcript trio. Forcing an adhoc session through that frame produces a confusing UX (empty transcript pane, no current node). A dedicated note-taking surface respects the L1's actual job in this mode.
7.4.C Resolve modal (both variants)
- "Did this resolve it?"
[ Yes ][ No ] - Resolution notes textarea (pre-filled with the most recent adhoc walk_notes entry if adhoc)
- Yes + target was proposal → sets
validated_by_outcome=trueon the proposal - Yes + target was flow → no proposal change; flow's hit_count increments (telemetry only)
- Yes + adhoc → no proposal/flow change; resolution_notes saved on session and ticket
- No → prompt to escalate instead
7.4.D Escalate modal (both variants)
- Reason category dropdown: Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · No KB available · Other
- Free-text reason
- Confirm
7.5 /l1/drafts page
Read-only list, columns: created · problem (truncated) · ticket # · status (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.
7.6 /l1/tickets page
Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.
7.7 Coverage banner
<L1CoverageBanner /> — slim ~32px band, info-cyan-dim background, mounted at the top of all /l1/* pages when !isL1Tech && (canCoverL1 || isOwner || isSuperAdmin):
You're covering L1. Actions logged as coverage. [Switch back →]
The "Switch back" link returns to /.
7.8 Routing
const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
Mounted under the / ProtectedRoute branch at:
/l1→L1Dashboard/l1/walk/:sessionId→L1WalkPage/l1/drafts→L1DraftsPage/l1/tickets→L1TicketsPage
Wrapped in L1RouteGuard (403 if not l1_tech AND not coverage-flagged). ProtectedRoute.tsx post-login redirect: L1 users land on /l1 instead of /.
lazyWithRetry, not React.lazy (per existing convention).
7.9 Session lifecycle, concurrency, and recovery
Concurrent sessions: an L1 may have multiple l1_walk_sessions rows with status='active' at the same time. The model imposes no single-session constraint — call patterns vary (one tech juggling two calls; one call drops and is resumed while another comes in; coverage engineer handling overflow). The dashboard's "Resume in progress" widget lists all active sessions ordered by last_step_at DESC.
Browser-close recovery: every POST /l1/sessions/{id}/step and adhoc POST /l1/sessions/{id}/notes writes the incremental state to the server. If the browser closes mid-walk (crash, reload, accidental tab close), revisiting /l1/walk/{sessionId} reloads the session from l1_walk_sessions — current node, walked path so far, notes, customer info — and resumes exactly where the L1 left off. No client-side persistence required.
Abandoned sessions: an APScheduler job (max_instances=1, hourly) flips sessions to status='abandoned' when status='active' AND last_step_at < now() - interval '24 hours'. Preserves the row for audit but removes it from the L1's "Resume in progress" widget. Abandoned sessions still appear in /l1/drafts filtered views if they walked a proposal.
No multi-tab guardrail in v1: if the same L1 opens the same session in two tabs, last-write-wins on walked_path. Acceptable for v1 — multi-tab is rare in helpdesk workflows. v2 could add optimistic-locking on the session row.
8. AI match-or-build pipeline
8.1 Match-or-build algorithm
match_or_build(account_id, problem_text, ticket_ref):
embedding = embedding_service.embed(problem_text)
# 1. Match authored flows
flow_hits = rag_service.match_flows(account_id, embedding, k=5)
if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
return {kind: 'flow', id: flow_hits[0].flow_id, score: ...}
# 2. Match outcome-validated proposals only
proposal_hits = rag_service.match_proposals(
account_id, embedding, k=5,
where=validated_by_outcome=true,
)
if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
return {kind: 'proposal', id: proposal_hits[0].proposal_id, score: ...}
# 3. Near-miss zone: surface as suggestion, do NOT auto-build
near_miss = max(
(h for h in (flow_hits + proposal_hits) if h.score >= SUGGEST_THRESHOLD),
key=lambda h: h.score,
default=None,
)
# 4. Try to build fresh
kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
if not kb_chunks:
return {
kind: 'aborted_no_kb',
near_miss: near_miss, # might still be useful as a starting point
}
nearest_flows = flow_hits[:3]
if near_miss:
# Frontend prompts: "Found a similar flow — use it, or build new?"
return {kind: 'suggest', suggestion: near_miss, can_build: True}
proposal = ai_tree_builder.build(
problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
)
return {kind: 'proposal', id: proposal.id, score: None}
Thresholds (per-account configurable):
MATCH_THRESHOLDdefault0.75(cosine) — auto-use without askingSUGGEST_THRESHOLDdefault0.60(cosine) — surface as suggestion ("Found a similar flow — use it, or build new?")
Near-miss handling rationale: if a flow scores 0.74 against a 0.75 match threshold, building a fresh AI tree means a 5–15s wait when there's likely a directly usable flow already authored. Surfacing it as an L1 choice saves the build time and gives the L1 agency. Below SUGGEST_THRESHOLD (0.60), the match is too weak to be worth offering and we fall through to build (or abort).
The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes. When this aborts, the frontend renders the BuildAbortedNoKB UX (§8.4).
8.2 AI tree-build details
Model: settings.get_model_for_action('l1_realtime_build'). Recommend Sonnet for v1 (latency-sensitive).
Schema: output validated against the existing flow node schema (matches tree_editor output). Validation failure aborts the build rather than persisting malformed data.
Prompt strategy (per Lesson on prompt anti-parrot — critical):
- System prompt: role definition + output schema using
<placeholder>notation only. Never literal field values. - Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
- User message:
{problem_statement}+{kb_context: [doc_title, section, content]}+{nearest_flow_summaries}+ instruction to cite KB chunks per node. - Output includes
kb_citations: [{node_id, kb_doc_id, snippet}]for walker's "Source:" pane and engineer review.
Latency: whole-tree-then-return (~5–15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.
Anthropic SDK config (per Lesson): max_retries=1. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).
Telemetry:
l1.match_or_build.duration_ms,l1.match_or_build.outcome(flow_match/proposal_match/built/aborted_no_kb)anthropic.cacheevents (existing pattern) taggedaction=l1_realtime_buildl1.tree_build.tokens_in,tokens_out
Anti-parrot guardrail: the existing tests/test_prompt_anti_parrot.py auto-discovers new prompt constants via pattern match on *_PROMPT / *_SCHEMA / *_PROTOCOL / *_FORMAT. No new test required.
8.3 Hallucinated-citation defense
After build, the writer verifies every kb_doc_id in kb_citations exists in the account's KB. Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.
8.4 BuildAbortedNoKB UX (live-call graceful degradation)
The L1 is on a phone call when this fires. A generic "error" toast is unacceptable. The frontend renders a dedicated screen instead of navigating into a walker:
┌────────────────────────────────────────────────────┐
│ No knowledge base content yet │
│ │
│ We couldn't match an existing flow and there's │
│ nothing in your KB to build a new one from. │
│ │
│ You have three options for this call: │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Start an ad-hoc walk │ → │ ← primary CTA
│ │ Take notes, capture the resolution │ │
│ └──────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────┐ │
│ │ Escalate to engineering │ → │
│ │ Reason pre-filled: "No KB available" │ │
│ └──────────────────────────────────────────┘ │
│ │
│ [ (near_miss present?) ] │
│ ┌──────────────────────────────────────────┐ │
│ │ Try this similar flow instead │ → │
│ │ "{near_miss.title}" · {score} match │ │
│ └──────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────── │
│ Tip: ask your admin to upload KB content or │
│ configure a connector under Account → KB. │
└────────────────────────────────────────────────────┘
Each option triggers a distinct backend path:
- Start an ad-hoc walk →
POST /l1/sessions/adhoc→ createsl1_walk_sessionsrow withsession_kind='adhoc', no flow/proposal. Navigates to/l1/walk/{id}rendering the adhoc walker variant (§7.4.B). - Escalate →
POST /l1/escalate-without-walk(a thin variant of the session-escalate endpoint that takes no session id; creates an immediately-escalated session record and reassigns the ticket). Pre-fillsreason_category='No KB available'. - Try similar flow (only when
near_misswas returned) → starts a flow session against the suggested flow, same as if matched.
This is the graceful degradation contract: no L1 should ever hit a dead end on a live call.
8.5 Near-miss "Suggest" UX
When match_or_build returns {kind: 'suggest', suggestion: ..., can_build: true}, the intake response triggers an inline prompt on the dashboard (no full-page transition):
┌────────────────────────────────────────────────────┐
│ Found a similar flow │
│ │
│ "Outlook can't connect after password reset" │
│ Match: 67% · last updated 2 weeks ago │
│ │
│ [ Use this flow ] [ Build new tree ] │
└────────────────────────────────────────────────────┘
- Use this flow → starts a flow session against the suggestion.
- Build new tree → re-calls
match_or_buildwithforce_build=trueparameter, bypasses the suggest pass, goes directly to build.
This keeps the L1 in control while saving the 5–15s build time when there's an obvious starting point.
9. KB ingestion
9.1 Connector interface
class KBConnector(ABC):
async def test_credentials(self) -> bool
async def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]
async def fetch_content(self, ref: KBDocRef) -> KBDocContent
async def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent] # optional, no-op v1
Registry dispatches by provider string. Credentials encrypted at rest via Fernet (reuse services/psa/encryption.py pattern).
9.2 Per-connector specifics
| IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) | |
|---|---|---|---|
| Auth | API token (header) | API key (header) | OAuth 2.0 |
| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching (secret|confidential|private) heuristic; files with a tenant Sensitivity Label |
| Filtering | Per-org (techs see all client orgs they have permission to) | Per-folder | Per-site / per-drive (owner picks at config time) |
| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |
All three deliver content to kb_ingestion_writer which:
- Chunks (paragraph-aware, configurable size with overlap)
- Embeds via
embedding_service - Upserts into
kb_documentskeyed on(connector_config_id, source_ref); chunks intokb_document_chunks
Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped source_ref). Engineers can dedup manually if needed.
9.2.1 Microsoft Graph OAuth flow (called out — non-trivial)
Unlike IT Glue and Hudu (simple API token entry), Microsoft Graph requires a full OAuth 2.0 flow. This is materially more complex and worth specifying:
Prerequisites:
- Register a Microsoft Entra ID app for ResolutionFlow. Single-tenant or multi-tenant: multi-tenant so MSPs can authorize against their own M365 tenants.
- Configured redirect URI:
https://resolutionflow.com/api/v1/kb-connectors/microsoft_graph/oauth/callback(plus a localhost variant for dev). - Scopes (least privilege):
Files.Read.All+Sites.Read.All+offline_access(for refresh token). User must consent at the tenant level (admin consent required if the tenant has restricted user-consent).
Flow:
- Owner clicks "Connect SharePoint/OneDrive" on
/account/kb-connectors. Frontend callsPOST /api/v1/kb-connectorswithprovider='microsoft_graph'and minimal body (no credentials yet) → backend returns{authorize_url}with state token (signed JWT containing account_id + nonce, ~10min TTL). - Frontend opens
authorize_urlin a popup (preferred) or full-page redirect. User signs into Microsoft, consents. - Microsoft redirects to ResolutionFlow callback
/api/v1/kb-connectors/microsoft_graph/oauth/callback?code=...&state=.... - Backend validates state JWT (extracts account_id, verifies nonce). Exchanges
codefor{access_token, refresh_token, expires_in}via Microsoft token endpoint. - Backend stores both tokens encrypted (Fernet) into
kb_connector_configs.credentials_encryptedas a JSON blob{access_token, refresh_token, expires_at, tenant_id}. Setsdisplay_namefrom the user's M365 tenant name. - Backend returns
{success: true}to the popup window which postMessage's the parent and closes.
Site/drive selection:
After the initial OAuth, owner picks which SharePoint sites and OneDrive drives to ingest. The connector exposes a discovery endpoint that lists available sites; owner picks. Selection persists in kb_connector_configs.metadata JSONB: {site_ids: [...], drive_ids: [...]}.
Access token refresh:
The connector client (services/kb_connectors/microsoft_graph.py) wraps every API call: check expires_at, if within 5min of expiry call refresh endpoint, update stored tokens. Refresh failures (refresh_token expired or revoked) flip kb_connector_configs.last_sync_status='auth_expired' and surface in the connector status UI prompting owner to re-authorize.
Scope creep risk: keep to Files.Read.All + Sites.Read.All. Do not request write scopes, mailbox scopes, or directory scopes even if convenient — read-only KB is the entire value prop.
9.2.2 KB document visibility
Clarification (was ambiguous in initial spec): /account/kb is owner + engineer accessible only. L1s do NOT see KB documents directly — they only see KB content surfaced via walker citations during a walk. This matches the principle that L1 staff are downstream consumers of the knowledge curated by their account's owner/engineers.
Frontend route: /account/kb gated by require_engineer_or_admin. L1 hitting it → redirect to /l1 with toast "KB management is owner/engineer only."
9.3 Sync scheduling
kb_ingestion_scheduler.py runs as APScheduler interval job, max_instances=1. Per cycle:
- Query active
kb_connector_configswherelast_sync_atis older thansync_interval_minutes(default 360 = 6h). - Dispatch per account; concurrency cap = 4 simultaneous accounts.
- For each connector:
list_documents(since=last_sync_at)→ for each ref,fetch_content→ write. - Compute the diff between current refs and existing rows (same
connector_config_id); soft-delete missing ones viadeleted_at. - Update
last_sync_at,last_sync_status,last_sync_error.
Must use _admin_session_factory() not get_db() for startup-side and scheduler-side queries (per Lesson on RLS at startup — no app.current_account_id set).
Immediate sync via POST /api/v1/kb-connectors/{id}/sync enqueues a job; scheduler picks it up within ~30s.
10. Escalation flow
- L1 clicks Escalate → modal (reason category + optional free text).
POST /api/v1/l1/sessions/{id}/escalate→ backend:- Calls extended
escalation_package_generator.generate(session_id, include_l1_walk=true). Package contents:problem_statement, customer_name, customer_contact, ticket_ref (PSA id or internal id), target_kind ('flow' | 'proposal'), target_id, walked_path, ai_draft_proposal_id, kb_citations, escalation_reason, reason_category, l1_user_id - Creates an
ai_sessionwith the package serialized into system context for the chat surface. - If PSA-backed:
psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name). Default'Tier 2'. Owner configurable in/account/integrations. - If internal-backed:
internal_tickets.status='escalated',assigned_user_id=null(round-robin assignment is out of scope). - Writes notification via existing
notification_service— bell badge to all engineers in account. - Audit log entry;
acting_asreflects whether L1 or coverage-engineer escalated.
- Calls extended
- Toast on L1 side, return to
/l1. - Engineer clicks notification →
/pilot/{sessionId}→ chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.
Un-escalate is out of scope. If engineer wants to bounce back, they reassign in PSA manually.
Known limitation — escalation notification noise: "notify all engineers" is intentionally simple for v1 but does not scale. A 20-engineer account will get 20 bell badges per escalation, which trains everyone to ignore them. v2 work (§13) covers targeted notification — on-duty engineer presence, round-robin assignment, or an owner-designated escalation recipients list. Acknowledged as a real product issue, not a hidden one.
11. Internal ticket fallback
When the account has no active PSA provider:
- Intake creates
internal_ticketsrow instead of a PSA ticket. - Queue surface merges PSA + internal with
Internal/PSAorigin badge. - Escalation flips
internal_tickets.status='escalated'and assigns engineer (or leaves null for any engineer to claim — v1 behavior). - Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.
Promote to PSA: owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets psa_promoted_ticket_id. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.
12. Outcome-validation lifecycle
1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
validated_by_outcome=false,
linked_ticket_id=...)
→ L1WalkSession(session_kind='proposal',
flow_proposal_id=...,
status='active')
2. L1 walks → POST /l1/sessions/{id}/step appends to l1_walk_sessions.walked_path
(NOTE: walked_path lives on the session, not the proposal — multiple L1s
may walk the same proposal independently)
3. L1 hits Resolve:
modal: "Did this resolve it?" [Yes] [No] + resolution_notes
4. helpful=true → flow_proposal.validated_by_outcome = true (set if not already)
→ l1_walk_sessions.status = 'resolved', helpful = true
→ ticket closed (PSA or internal)
helpful=false → flow_proposal.validated_by_outcome unchanged
→ l1_walk_sessions.status = 'resolved', helpful = false
→ L1 prompted: "Escalate instead?"
5. Engineer review queue:
ORDER BY validated_by_outcome DESC, created_at DESC
- Outcome-validated drafts surface first
- Review pane shows the most recent helpful=true walk's walked_path as evidence
- Promote / edit-and-promote / retire
6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
→ future match_or_build matches the new flow on the flow-match pass
Why validated_by_outcome on the proposal but walked_path on the session: validated_by_outcome is a one-bit signal that aggregates across all walks of a proposal (one L1 saying "this worked" is enough to flag the proposal as worth engineer attention). walked_path is the per-walk evidence and must be kept per-session — multiple paths through the same tree by different L1s tell different stories. Engineer review pulls the LATEST helpful=true session's path as the canonical "this is how it worked" record.
13. Out of scope (v1 non-goals)
- End-user / self-service portal ("L0" tier).
- Engineer warm-transfer / live take-over during a call.
- L1 ↔ engineer real-time chat during a call.
- Multi-language UI / customer-language toggle in walker.
- Auto-promote internal tickets to PSA on integration install.
- AI tree streaming (node-by-node).
- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
- Confluence connector.
- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
- Cancel/un-escalate flow.
- Round-robin engineer assignment on internal-ticket escalations.
- Targeted escalation notification (on-duty presence, round-robin, owner-designated recipients) — v1 notifies all engineers; this will not scale past mid-size accounts. v2 work.
- Quick-select problem shortcuts on the L1 dashboard (top-N common problems as one-click intake buttons). Worth doing in v2 once telemetry reveals which problems dominate. Reduces typing on calls.
- Rich-text resolution notes with formatting toolbar. v1 is plain text + paragraph breaks only.
- Multi-tab session locking — last-write-wins on concurrent same-session edits in v1.
- Step-by-step audit log rows — v1 audits only at resolve/escalate (§5.6.1). Higher granularity is additive later.
- Bulk KB document delete in
/account/kb— per-row delete only in v1.
14. Testing strategy
14.1 Backend (pytest)
- Unit:
match_or_buildcovers all five paths (flow-match, proposal-match, suggest, built, aborted_no_kb). Assert thresholds work at boundaries (score = MATCH_THRESHOLD, score = SUGGEST_THRESHOLD, etc.). - Unit:
ai_tree_builderschema validation — assert rejection of malformed Anthropic output before persistence. - Unit: each connector's
list_documents+fetch_contentagainst recorded HTTP fixtures. - Unit: Microsoft Graph OAuth flow — state JWT validation, token exchange, refresh, auth-expired surfacing.
- Unit:
seat_enforcement.check_seat_available— engineer + L1 paths, grandfathered case. - Integration: intake → walk(flow) → resolve(helpful=true) → assert flow's hit_count incremented, ticket closed (no proposal change).
- Integration: intake → walk(proposal) → resolve(helpful=true) → assert
FlowProposal.validated_by_outcome=true,l1_walk_sessions.helpful=true, ticket closed. - Integration: intake → walk → escalate → assert PSA
reassign_ticketinvoked,ai_sessioncreated with package, audit log entry written ONLY at escalate (not steps), notification dispatched. - Integration: intake on empty-KB account → assert
outcome='aborted_no_kb'returned, no proposal created. - Integration:
/l1/sessions/adhoc→ walker variant flag set → resolve → ticket closed, no proposal/flow touched. - Integration:
/l1/escalate-without-walk→ escalated session row created, no walked_path, package generated. - Integration: KB scheduler —
max_instances=1, sequential per-account, soft-delete on removal. - Integration: Microsoft Graph refresh-token expiry →
last_sync_status='auth_expired'surfaced. - Integration: invite past seat limit → 402 returned; accept-invite at limit → 422; role-change at limit → blocked.
- Integration: grandfathered over-seated account → existing users keep access, new invite blocks.
- Integration: concurrent session creation by same L1 → both rows persist, dashboard returns both in "Resume in progress" sorted by
last_step_at DESC. - Integration: session abandonment job — flips
status='active'rows withlast_step_at < now() - 24hto'abandoned'. - RLS regression (highest priority):
l1_techuser in account A cannot read account B's tickets, drafts, KB docs, connector configs, or walk sessions. Added to existing RLS test suite. - Anti-parrot: existing CI test auto-discovers new prompt module.
14.2 Frontend
- Unit:
usePermissions— L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths. - Unit:
L1WalkPagetree variant — node advance, escalate modal, resolve modal flipsvalidated_by_outcomecorrectly. - Unit:
L1WalkPageadhoc variant — notes auto-save (debounced), no node card rendered, resolve uses notes as resolution_notes pre-fill. - Unit:
L1Dashboardempty-state — renders empty card when flows+KB are both zero; intake still works. - Unit:
L1Dashboardresume-in-progress — lists multiple active sessions ordered bylast_step_at DESC. - Unit:
L1CoverageBanner— visible for engineer-with-flag on/l1/*, hidden for L1 users. - Unit: BuildAbortedNoKB screen — renders three CTAs (with/without near_miss), routes correctly to adhoc/escalate/use-suggestion.
- Unit: SuggestPrompt component — accepts a suggestion, "Build new tree" re-calls intake with
force_build=true. - E2E (Playwright, scoped selectors per Lesson):
- L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
- L1 on empty-KB account → intake → BuildAbortedNoKB screen → "Start ad-hoc walk" → adhoc walker → resolve.
- L1 with near-miss → intake → suggest prompt → "Use this flow" → flow walker.
- L1 browser-close mid-walk → re-open
/l1/walk/{id}→ state restored. - Engineer with
can_cover_l1→ sidebar entry visible → click → coverage banner shows → walks a session → audit log recordsacting_as='l1_coverage'. - Owner invites past seat limit → blocked with upgrade prompt.
- L1 hitting
/pilot,/trees/new,/escalations,/account/kb→ 403 or redirect.
15. Acceptance criteria (v1 ships when…)
- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
- L1 intake creates a ticket (PSA or internal) and lands in walker session — OR renders the BuildAbortedNoKB screen when KB is empty, OR renders the suggest prompt when near-miss exists.
- Walker handles flow walks, proposal walks, AND adhoc walks (single-pane note-taking variant). All three resolve and escalate correctly.
- Concurrent sessions supported; browser-close mid-walk recoverable; abandoned sessions auto-flipped after 24h inactivity.
- First-run empty-state card renders on dashboard when account has no flows AND no KB docs; intake still works (degrades to adhoc).
- Escalate generates package, reassigns ticket, notifies engineers. Escalate from BuildAbortedNoKB pre-fills reason category.
- Resolve flips
validated_by_outcomeon proposals; review queue prioritizes outcome-validated drafts and surfaces the latest helpful walk's path as evidence. - All three KB connectors configurable; initial sync + periodic re-sync + soft-delete on removal. Microsoft Graph OAuth flow completes end-to-end including refresh token rotation.
- AI build refuses cleanly when account KB is empty (returns
aborted_no_kb, not an exception). - Coverage flag works end-to-end with audit-log tagging (
acting_as='l1_coverage'). - Seat enforcement: invite blocks with structured 402/422 when target-role seats are exhausted, for BOTH L1 and engineer roles.
- RLS blocks cross-tenant reads on every new table (
l1_walk_sessions,internal_tickets,kb_connector_configs,kb_documents,kb_document_chunks). - L1 seat count tracked separately from engineer seats; seat counter widget visible in admin/users UI.
- L1s cannot access
/account/kb(owner+engineer only) — confirmed by route guard test.
16. Risks & mitigations
| Risk | Mitigation |
|---|---|
| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before draft becomes "real" flow. v1 refuses to build when KB is empty. |
| Hallucinated KB citations | Post-build verification that each kb_doc_id exists; unverified citations stripped from walker, surfaced as warning in engineer review. |
| Duplicate proposals for same problem | Validated-proposal match pass deduplicates after one L1 validates; pre-validation dups are tolerated and dedup'd during engineer review. |
| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners exclude specific folders/sites at config. Ingested docs visible only to owners + engineers (NOT L1s) in /account/kb for manual deletion. |
| AI build latency frustrates customer on call | Build-progress UI sets expectation. Escalate button visible from page load. Future: pre-warm builds on PSA-ticket-landed event. |
| Three connectors is more scope than originally proposed | Acknowledged. Each connector is ~1–2 weeks of work; Microsoft Graph OAuth is the heaviest (§9.2.1). Plan should sequence them and allow shipping with IT Glue + Hudu first if SharePoint slips. |
| Engineer review queue backlog stalls library growth | Validated-proposal match pass means good drafts get reused without engineer review. Backlog only delays the move from 'proposal' to 'flow', not the L1's ability to use validated content. |
walked_path JSONB grows unboundedly on long calls with many notes |
Per-call paths are bounded by tree depth (typically <20 nodes); per-L1 notes are typically short. Real risk only emerges for adhoc walks with verbose note-taking on multi-hour calls. v1 caps walk_notes JSONB at 256 KB at the API layer with a 400 error and "notes too long — consider escalating." Future v2: normalize notes into a separate l1_walk_notes table if size becomes a real issue. |
| Engineer notification overload at scale | Acknowledged — see §10 "Known limitation." v1 notifies all engineers; v2 work covers targeted notification. Mid-size accounts (10+ engineers) will feel this first; flag in onboarding docs. |
| L1 seat enforcement breaks for accounts grandfathered over their seat count | §3.6 specifies non-retroactive enforcement: existing over-seated accounts get a banner but functionality is preserved until next invite. Confirm test coverage for grandfathered state. |
17. Naming reference
| Layer | Value |
|---|---|
DB enum (account_role) |
l1_tech |
| UI display | "L1 Tech" / "L1" |
| Sidebar entry | "L1 Workspace" |
| URL prefix | /l1 |
| Coverage flag column | users.can_cover_l1 |
| Coverage audit tag | acting_as = 'l1_coverage' |
| Pricing label | "L1 seat" |
| Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now |
18. Open implementation decisions (deferred to plan, not blocking design)
- Specific
MATCH_THRESHOLDdefault value validation (initial 0.75, tune from telemetry post-launch). - Specific Anthropic model choice for
l1_realtime_build(Sonnet vs Opus — pick based on quality benchmark during plan). - Chunk size + overlap for KB ingestion writer (tune in implementation).
- Engineer queue label default (
'Tier 2'vs'Engineering') — owner-configurable anyway. - Exact look of the build-progress shimmer animation — design-system handoff.
These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here.
Note on scope and phasing
This is a substantive feature: new role, four frontend pages, ~12 endpoints, AI tree-builder, three KB connectors, escalation extensions, and six migrations. The implementation plan will almost certainly phase the work — a reasonable cut is:
- Phase 1: role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
- Phase 2:
kb_documentsschema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually-uploaded KB. - Phase 3: the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained — can ship one at a time and reorder if a connector blocks.
Phasing is a plan-level decision; the spec captures the full feature.
End of spec.