New seat tier between engineer and viewer. Dedicated /l1 surface (dashboard + walker + drafts) for first-call helpdesk staff. Walk-in intake + PSA queue both produce tickets. Match-or-build pipeline prefers authored flows, then outcome-validated AI drafts, then builds fresh from KB. Three KB connectors: IT Glue, Hudu, SharePoint/OneDrive. Escalation via package + PSA reassign, picked up in chat. Engineer coverage via per-user can_cover_l1 flag with audit-log tagging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
L1 Workspace — Design Spec
Date: 2026-05-28
Status: Draft (pending implementation plan)
Audience for this doc: engineers + reviewers building the L1 workspace feature
1. Summary
Introduce a dedicated L1 helpdesk workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.
This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.
2. Motivation
The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (AssistantChatPage mounted at /pilot). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:
- Take inbound phone calls and need a fast, deterministic click-through UX
- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
- Are not authorized to resolve complex issues themselves; they escalate those to engineers
A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.
3. Users & roles
3.1 Role hierarchy
super_admin > owner > engineer > l1_tech > viewer
l1_tech is added to the account_role enum. Permissions enforced via app/core/permissions.py and app/api/deps.py.
3.2 What L1 can do
- Use the /l1/* surface
- Open tickets from their queue (PSA-fed or internal)
- Intake walk-in/phone-call problems (creates a ticket as a side effect)
- Walk authored flows and AI-built FlowProposal drafts
- Resolve or escalate a session
- View their own AI drafts list (read-only — outcome tags shown)
3.3 What L1 cannot do
- See the chat surface (/pilot): sidebar hidden, route 403s
- Author or edit flows
- See /review-queue or /escalations (engineer inboxes)
- See team analytics (only /analytics/me)
- Promote AI drafts (engineers/owners only, via existing review queue)
- Configure KB connectors (owner-only)
3.4 Engineer L1 coverage
Engineers do NOT see the L1 surface by default. Owners can toggle users.can_cover_l1 = true on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in /l1/* with a sticky banner: "Covering L1 — actions logged as coverage." Coverage actions are audit-logged with acting_as = 'l1_coverage'.
Backend dep: require_l1_or_coverage = l1_tech | (engineer AND can_cover_l1) | owner | super_admin.
This mirrors the existing orthogonal-flag pattern (is_team_admin) — no new architectural concept.
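The access rule above can be sketched as a pure predicate. This is a minimal illustration, not the actual dependency in app/api/deps.py: the UserAccess type and both function names are hypothetical, and tagging owners and super_admins as coverage is an assumption drawn from the banner condition in §7.7.

```python
from dataclasses import dataclass
from typing import Optional

# Roles that always reach the L1 surface (from the rule in section 3.4).
L1_ALWAYS_ALLOWED = {"l1_tech", "owner", "super_admin"}


@dataclass(frozen=True)
class UserAccess:
    """Hypothetical minimal view of a user; the real model lives elsewhere."""
    role: str
    can_cover_l1: bool = False


def may_use_l1_surface(user: UserAccess) -> bool:
    """l1_tech | (engineer AND can_cover_l1) | owner | super_admin."""
    if user.role in L1_ALWAYS_ALLOWED:
        return True
    return user.role == "engineer" and user.can_cover_l1


def acting_as(user: UserAccess) -> Optional[str]:
    """Audit tag for an action taken on the L1 surface.

    Assumption: every non-l1_tech user who is allowed in counts as coverage.
    """
    if user.role != "l1_tech" and may_use_l1_surface(user):
        return "l1_coverage"
    return None
```

Keeping the rule in one predicate lets the route dependency and the audit-log writer agree on who counts as coverage.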
3.5 Billing data model
- accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0 (new column)
- Existing accounts.seats_purchased continues to represent engineer seats
- New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band
4. Architecture overview
4.1 New components
Frontend:
- pages/l1/L1Dashboard.tsx: landing page; ticket queue + describe-the-problem intake
- pages/l1/L1WalkPage.tsx: purpose-built walker; yes/no cards, transcript, persistent escalate/resolve
- pages/l1/L1DraftsPage.tsx: read-only list of the L1's AI drafts and promotion status
- pages/l1/L1TicketsPage.tsx: full-page queue (PSA + internal merged)
- components/l1/L1CoverageBanner.tsx: slim banner shown to engineer-coverers
Backend:
- services/match_or_build.py: orchestrator (RAG match → fallback to AI build)
- services/ai_tree_builder.py: real-time AI tree generation via Anthropic
- services/kb_connectors/ package: base, registry, encryption, plus itglue.py, hudu.py, microsoft_graph.py
- services/kb_ingestion_writer.py: shared writer used by manual upload + all connectors
- services/kb_ingestion_scheduler.py: APScheduler job, max_instances=1, per-connector sync
- services/internal_ticket_service.py: CRUD + status transitions for the no-PSA fallback
- services/l1_session_service.py: walking-session lifecycle
- api/endpoints/l1.py: L1-role endpoints
- api/endpoints/kb_connectors.py: KB connector config endpoints (owner-only for write)
Reused / extended:
- services/rag_service.py: flow & KB matching (existing)
- services/flow_matching_engine.py: existing
- services/escalation_package_generator.py: extended to include walked path, AI draft pointer, KB citations
- models/FlowProposal: new columns (see §5)
- services/psa/: already supports ticket create + reassign across CW/Autotask/HaloPSA
- services/embedding_service.py: used by KB ingestion writer
- New kb_documents + kb_document_chunks tables for RAG-retrievable document storage, separate from the existing kb_imports (which is a document→tree conversion record, not a persistent KB store; see §5)
- Audit log writer: gains acting_as field
4.2 Data flow — walk-in / phone-call intake
L1 types: "User can't connect Outlook after password reset"
POST /api/v1/l1/intake
  body: { problem_statement, customer_name?, customer_contact? }
  → create ticket
      - PSA if configured: psa_provider.create_ticket(...)
      - else: internal_tickets row
  → match_or_build(account_id, problem_text, ticket_ref)
      → rag_service.match_flows(...) → top hit; if score ≥ threshold return as 'flow'
      → rag_service.match_proposals(... where validated_by_outcome=true)
          → top hit; if score ≥ threshold return as 'proposal'
      → ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
          → persist FlowProposal(source='ai_realtime_l1',
                                 linked_ticket_id,
                                 linked_ticket_kind,
                                 validated_by_outcome=false)
          → return as 'proposal'
  → l1_session_service.start(...)
  → return { session_id, target_kind, target_id, intake_type }
→ navigate to /l1/walk/{session_id}
4.3 Data flow — PSA-queue intake
The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls POST /api/v1/l1/tickets/{ticket_ref}/start which is the same match_or_build path (the problem_statement is the ticket subject + description) followed by walker navigation.
5. Data model
All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are TIMESTAMPTZ. No --rev-id on Alembic; no --autogenerate for enum/RLS work.
5.1 FlowProposal — extended
Existing AI-draft model. Add columns:
| Column | Type | Notes |
|---|---|---|
| source | VARCHAR(30) NOT NULL | One of 'ai_realtime_l1', 'kb_accelerator', 'manual_draft'. Backfill existing rows to 'manual_draft'. |
| linked_ticket_id | VARCHAR(64) NULL | PSA id or internal_tickets UUID (stored as text) |
| linked_ticket_kind | VARCHAR(10) NULL | 'psa' or 'internal' |
| validated_by_outcome | BOOLEAN NOT NULL DEFAULT FALSE | Flipped to true when L1 resolves and marks helpful=true |
| walked_path_snapshot | JSONB NULL | Frozen at resolve/escalate; shape [{node_id, question, answer, l1_note}] |
Engineer review queue sort:
ORDER BY validated_by_outcome DESC, created_at DESC
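For readers who want to see the effect of that ordering on sample rows, here is an in-memory equivalent. It is only an illustration of the sort key; DraftRow and review_queue_order are hypothetical names, and the real queue would sort in SQL.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class DraftRow:
    """Hypothetical projection of a FlowProposal row for the review queue."""
    id: str
    validated_by_outcome: bool
    created_at: datetime


def review_queue_order(rows: Iterable[DraftRow]) -> List[DraftRow]:
    # Mirrors ORDER BY validated_by_outcome DESC, created_at DESC:
    # True sorts above False, then newest first within each group.
    return sorted(
        rows,
        key=lambda r: (r.validated_by_outcome, r.created_at),
        reverse=True,
    )
```

An older outcome-validated draft therefore sits above a newer unvalidated one.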
5.2 internal_tickets — new
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
created_by_user_id UUID NOT NULL (the L1 who took the call)
customer_name VARCHAR(120) NULL (optional at intake)
customer_contact VARCHAR(200) NULL (email or phone, free text)
problem_statement TEXT NOT NULL
status VARCHAR(30) NOT NULL -- 'open' | 'walking' | 'resolved' | 'escalated'
flow_id UUID NULL FK trees
flow_proposal_id UUID NULL FK flow_proposals
ai_session_id UUID NULL FK ai_sessions (set when engineer picks up in chat post-escalation)
assigned_user_id UUID NULL (engineer post-escalation)
resolution_notes TEXT NULL
psa_promoted_ticket_id VARCHAR(64) NULL (set if later promoted to PSA)
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
resolved_at TIMESTAMPTZ NULL
RLS: account-scoped, WITH CHECK on insert/update.
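The status column implies a small state machine. The sketch below is one plausible reading, not the spec's definition: the four status values come from the table above, but the allowed transitions are an assumption (resolved and escalated are treated as terminal because un-escalate is out of scope), and the names are hypothetical.

```python
# Assumed transition table for internal_tickets.status.
ALLOWED_TRANSITIONS = {
    "open": {"walking"},
    "walking": {"resolved", "escalated"},
    "resolved": set(),    # terminal
    "escalated": set(),   # terminal in v1 (no un-escalate)
}


class InvalidTransition(ValueError):
    """Raised when a status change is not in the assumed table."""


def transition(current: str, new: str) -> str:
    """Return the new status if the move is allowed, else raise."""
    if current not in ALLOWED_TRANSITIONS:
        raise InvalidTransition(f"unknown status: {current!r}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current!r} -> {new!r} is not allowed")
    return new
```

Centralizing the table in internal_ticket_service would keep the walker, resolve, and escalate endpoints from each encoding their own rules.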
5.3 kb_connector_configs — new
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
provider VARCHAR(20) NOT NULL -- 'itglue' | 'hudu' | 'microsoft_graph'
display_name VARCHAR(80) NOT NULL
credentials_encrypted BYTEA NOT NULL -- Fernet, same pattern as services/psa/encryption.py
is_active BOOLEAN NOT NULL DEFAULT TRUE
sync_interval_minutes INTEGER NOT NULL DEFAULT 360
last_sync_at TIMESTAMPTZ NULL
last_sync_status VARCHAR(20) NULL -- 'success' | 'error' | 'running'
last_sync_error TEXT NULL
created_by_user_id UUID NOT NULL
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
UNIQUE (account_id, provider, display_name)
RLS: account-scoped, WITH CHECK.
5.4 New tables: kb_documents + kb_document_chunks
The existing kb_imports table is a document→tree conversion record (status lifecycle processing | ready | committed | failed, target tree_id) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval.
The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:
kb_documents — one row per ingested document:
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
source_kind VARCHAR(20) NOT NULL -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
source_ref VARCHAR(200) NULL -- provider-side document ID for re-sync
connector_config_id UUID NULL FK kb_connector_configs
title VARCHAR(500) NOT NULL
content TEXT NOT NULL -- full post-extraction text
content_hash VARCHAR(64) NOT NULL -- sha256 for change-detection
metadata JSONB NULL -- provider-specific (org_id, drive_id, etc.)
last_synced_at TIMESTAMPTZ NULL
deleted_at TIMESTAMPTZ NULL -- soft-delete on connector removal
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
Unique partial index: (connector_config_id, source_ref) WHERE source_ref IS NOT NULL.
kb_document_chunks — chunks with embeddings, used by rag_service.match_kb_chunks:
id UUID PRIMARY KEY
document_id UUID NOT NULL FK kb_documents ON DELETE CASCADE
account_id UUID NOT NULL -- denormalized for RLS
chunk_index INTEGER NOT NULL
content TEXT NOT NULL
embedding VECTOR(<dim>) NOT NULL -- dim matches embedding_service
metadata JSONB NULL -- section title, page number, etc.
created_at TIMESTAMPTZ NOT NULL
UNIQUE (document_id, chunk_index)
Pgvector index (ivfflat or hnsw) on embedding; choice tuned during implementation.
RLS on both tables: account-scoped, WITH CHECK on insert.
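The content_hash column exists for change detection, so a re-sync can skip chunking and embedding when a document's text is unchanged. A minimal sketch using only the standard library; the function names are hypothetical, and hashing the post-extraction text as UTF-8 is an assumption.

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """sha256 hex digest (64 chars, fits VARCHAR(64))."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembed(stored_hash: Optional[str], new_content: str) -> bool:
    """True when the document is new or its text changed since last sync."""
    return stored_hash != content_hash(new_content)
```

With this check, a sync over an unchanged document only needs to update last_synced_at.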
Coexistence with kb_imports: when an L1 (or owner) uploads a doc, the system can populate both — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into kb_documents for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to kb_documents (no auto-tree-conversion from synced docs in v1).
5.5 Other column additions
- users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSE
- accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0
- audit_logs.acting_as VARCHAR(30) NULL: 'l1_coverage' when engineer is in coverage mode; null otherwise
- account_role enum: add 'l1_tech'
5.6 Migration ordering
Six manual Alembic revisions (no --rev-id, no --autogenerate):
1. Add 'l1_tech' to account_role enum.
2. Add users.can_cover_l1, accounts.l1_seats_purchased, audit_logs.acting_as.
3. Extend flow_proposals with new columns + backfill existing rows to source='manual_draft'.
4. Create internal_tickets + RLS policies (account-scoped, WITH CHECK).
5. Create kb_connector_configs + RLS policies.
6. Create kb_documents + kb_document_chunks tables + RLS policies + pgvector index on chunks.
Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass account_id= explicitly. Grep all Model( sites before merge.
6. Backend services & endpoints
6.1 New services
| Module | Purpose |
|---|---|
| services/match_or_build.py | Orchestrator. Single async entrypoint match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult. |
| services/ai_tree_builder.py | Real-time AI tree generation. Anthropic via existing _call_anthropic_cached pattern. Model tier via settings.get_model_for_action('l1_realtime_build'). Output validated against the flow node schema with Pydantic; rejects malformed output. |
| services/kb_connectors/base.py | Abstract KBConnector with test_credentials, list_documents, fetch_content, subscribe_to_changes (optional). |
| services/kb_connectors/itglue.py | IT Glue REST client. |
| services/kb_connectors/hudu.py | Hudu REST client. |
| services/kb_connectors/microsoft_graph.py | Microsoft Graph (SharePoint/OneDrive) client. |
| services/kb_connectors/registry.py | KBConnectorRegistry (mirrors PsaProviderRegistry). |
| services/kb_connectors/encryption.py | Fernet wrapper (or reuse the PSA one if generic). |
| services/kb_ingestion_writer.py | Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
| services/kb_ingestion_scheduler.py | APScheduler interval job, max_instances=1. Sequential per account; concurrency cap = 4 accounts simultaneously. |
| services/internal_ticket_service.py | CRUD + status transitions for internal_tickets. |
| services/l1_session_service.py | Walking-session lifecycle: start, step, resolve, escalate. Bridges ai_sessions and the walked target. |
6.2 Extended services
- services/escalation_package_generator.py: adds inputs walked_path, ai_draft_proposal_id, kb_citations. New caller path from l1_session_service.escalate(...).
- KB Accelerator endpoint: accepts ingested content via the shared kb_ingestion_writer. Manual upload and connector sync share the same persistence path.
6.3 New endpoints
All under require_l1_or_coverage unless noted. Mounted under /api/v1/l1.
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /l1/queue | Merged ticket queue (PSA + internal). Pagination + status filter. | require_l1_or_coverage |
| POST | /l1/intake | Walk-in intake. Body {problem_statement, customer_name?, customer_contact?}. Creates ticket, returns {session_id, target_kind, target_id, intake_type}. | require_l1_or_coverage |
| POST | /l1/tickets/{ticket_ref}/start | Start walker from an existing ticket. Internally same as intake but skips ticket creation. | require_l1_or_coverage |
| POST | /l1/sessions/{id}/step | Record an answer. Body {node_id, answer, note?}. Appends to walked_path_snapshot. | require_l1_or_coverage |
| POST | /l1/sessions/{id}/resolve | Close as resolved. Body {resolution_notes, helpful: bool}. Sets validated_by_outcome=true on the proposal when helpful=true AND target was a proposal. Closes the ticket. | require_l1_or_coverage |
| POST | /l1/sessions/{id}/escalate | Generate escalation package + reassign ticket. Body {reason, reason_category}. | require_l1_or_coverage |
| GET | /l1/drafts | List current user's AI drafts with promotion status. | require_l1_or_coverage |
KB connector endpoints (/api/v1/kb-connectors):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /kb-connectors | List configured connectors for account. | require_l1_or_above |
| POST | /kb-connectors | Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | require_account_owner |
| DELETE | /kb-connectors/{id} | Remove (soft-disable). | require_account_owner |
| POST | /kb-connectors/{id}/sync | Trigger immediate sync (enqueued). | require_account_owner |
| GET | /kb-connectors/{id}/status | Sync status + doc count + last error. | require_l1_or_above |
Internal ticket endpoints (/api/v1/internal-tickets):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /internal-tickets | List (account-scoped). | require_l1_or_coverage |
| GET | /internal-tickets/{id} | Detail. | require_l1_or_coverage |
| POST | /internal-tickets/{id}/promote-to-psa | Push to configured PSA, set psa_promoted_ticket_id. | require_account_owner |
User management addition:
| Method | Path | Purpose | Auth |
|---|---|---|---|
| PATCH | /users/{id}/coverage | Set can_cover_l1 flag. Body {can_cover_l1: bool}. | require_account_owner |
7. Frontend surface
7.1 Sidebar — L1 view
LOGO
─────────────
Workspace /l1
Tickets /l1/tickets
My Drafts /l1/drafts
─────────────
Guides /guides
Account /account (filtered — no integrations, no categories)
No /pilot, no /trees, no /flows, no /review-queue, no /escalations, no team analytics. Sidebar.tsx picks the nav array by role.
7.2 Sidebar — engineer coverage view
Engineer's existing sidebar plus a single appended entry "L1 Workspace" → /l1. Shown when canCoverL1 || isOwner || isSuperAdmin.
7.3 /l1 dashboard layout
Three vertical zones plus one conditional zone, single column, max width ~1100px:
- Greeting: uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
- Describe the problem card: large textarea (autofocus on load), optional customer_name + customer_contact fields, single primary CTA "Start walk →" (the only electric-blue element on the page)
- Open tickets: section label, count, table rows (merged PSA + internal with origin badges), row hover bg-elevated
- Resume in progress: shown only when L1 has a half-walked session
Tailwind v4 tokens: bg-page base, bg-card zones, bg-elevated row hover, electric-blue accent only on primary CTA. No text-secondary. All borders border-default.
7.4 /l1/walk/{sessionId} walker
Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs flex + flex-1 + min-h-0).
Header:
- Back arrow + ticket ref + customer name + AI-built badge (when target is proposal)
- Problem statement line
- Persistent action buttons:
[ Escalate ][ Resolve ✓ ]
Left pane (main):
- "Step N · estimated M" label
- Current node card — large yes/no/answer buttons (min 44px tap target)
- Optional note textarea below the card (appended to walked_path_snapshot)
- On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"
Right pane (transcript):
- Walked-so-far list (node title + answer chosen)
- Current step highlight
- "Source:" section listing KB citations for the current node (proposal walks only)
Resolve modal:
- "Did this resolve it?"
  [ Yes ][ No ]
- Resolution notes textarea
- Yes + target was proposal → sets validated_by_outcome=true
- No → prompt to escalate instead
Escalate modal:
- Reason category dropdown: Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · Other
- Free-text reason
- Confirm
7.5 /l1/drafts page
Read-only list, columns: created · problem (truncated) · ticket # · status (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.
7.6 /l1/tickets page
Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.
7.7 Coverage banner
<L1CoverageBanner /> — slim ~32px band, info-cyan-dim background, mounted at the top of all /l1/* pages when !isL1Tech && (canCoverL1 || isOwner || isSuperAdmin):
You're covering L1. Actions logged as coverage. [Switch back →]
The "Switch back" link returns to /.
7.8 Routing
const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
Mounted under the / ProtectedRoute branch at:
- /l1 → L1Dashboard
- /l1/walk/:sessionId → L1WalkPage
- /l1/drafts → L1DraftsPage
- /l1/tickets → L1TicketsPage

Wrapped in L1RouteGuard (403 unless the user is l1_tech, a coverage-flagged engineer, an owner, or a super_admin, mirroring require_l1_or_coverage). ProtectedRoute.tsx post-login redirect: L1 users land on /l1 instead of /.
lazyWithRetry, not React.lazy (per existing convention).
8. AI match-or-build pipeline
8.1 Match-or-build algorithm
def match_or_build(account_id, problem_text, ticket_ref):
    embedding = embedding_service.embed(problem_text)

    # 1. Match authored flows
    flow_hits = rag_service.match_flows(account_id, embedding, k=5)
    if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
        return {"kind": "flow", "id": flow_hits[0].flow_id, "score": flow_hits[0].score}

    # 2. Match outcome-validated proposals only
    proposal_hits = rag_service.match_proposals(
        account_id, embedding, k=5,
        where={"validated_by_outcome": True},
    )
    if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
        return {"kind": "proposal", "id": proposal_hits[0].proposal_id, "score": proposal_hits[0].score}

    # 3. Build fresh; refuse when there is no KB content to ground the tree
    kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
    if not kb_chunks:
        raise BuildAbortedNoKB(
            "Cannot build a tree with no KB content. "
            "Upload docs or wait for a connector sync."
        )
    nearest_flows = flow_hits[:3]  # below-threshold flows still serve as context
    proposal = ai_tree_builder.build(
        problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
    )
    return {"kind": "proposal", "id": proposal.id, "score": None}
MATCH_THRESHOLD — per-account configurable; default 0.75 (cosine).
The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes.
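To make the three-pass decision testable without the RAG or Anthropic layers, the control flow can be isolated as a pure function over already-retrieved hits. This is a simplified, synchronous sketch (the spec's entrypoint is async); Hit, the injected build callable, and the outcome field are illustrative assumptions, with outcome values borrowed from the telemetry labels in §8.2.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75  # spec default (cosine); per-account configurable


class BuildAbortedNoKB(Exception):
    """Raised when there is no KB content to ground a fresh build."""


@dataclass(frozen=True)
class Hit:
    """Hypothetical retrieval hit: target id plus similarity score."""
    id: str
    score: float


@dataclass(frozen=True)
class MatchOrBuildResult:
    kind: str                # 'flow' or 'proposal'
    id: str
    score: Optional[float]   # None for a freshly built proposal
    outcome: str             # flow_match / proposal_match / built


def match_or_build(
    flow_hits: Sequence[Hit],
    proposal_hits: Sequence[Hit],   # assumed pre-filtered to outcome-validated
    kb_chunks: Sequence[str],
    build: Callable[[Sequence[str], Sequence[Hit]], str],  # stand-in for the AI builder
    threshold: float = MATCH_THRESHOLD,
) -> MatchOrBuildResult:
    # Pass 1: authored flows win when the top hit clears the threshold.
    if flow_hits and flow_hits[0].score >= threshold:
        top = flow_hits[0]
        return MatchOrBuildResult("flow", top.id, top.score, "flow_match")
    # Pass 2: outcome-validated proposals.
    if proposal_hits and proposal_hits[0].score >= threshold:
        top = proposal_hits[0]
        return MatchOrBuildResult("proposal", top.id, top.score, "proposal_match")
    # Pass 3: build fresh, but never on an empty KB.
    if not kb_chunks:
        raise BuildAbortedNoKB("Cannot build a tree with no KB content.")
    proposal_id = build(kb_chunks, flow_hits[:3])
    return MatchOrBuildResult("proposal", proposal_id, None, "built")
```

Separating retrieval from the decision makes the four paths listed in §14.1 straightforward to cover with unit tests.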
8.2 AI tree-build details
Model: settings.get_model_for_action('l1_realtime_build'). Recommend Sonnet for v1 (latency-sensitive).
Schema: output validated against the existing flow node schema (matches tree_editor output). Validation failure aborts the build rather than persisting malformed data.
Prompt strategy (per Lesson on prompt anti-parrot — critical):
- System prompt: role definition + output schema using <placeholder> notation only. Never literal field values.
- Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
- User message: {problem_statement} + {kb_context: [doc_title, section, content]} + {nearest_flow_summaries} + instruction to cite KB chunks per node.
- Output includes kb_citations: [{node_id, kb_doc_id, snippet}] for walker's "Source:" pane and engineer review.
Latency: whole-tree-then-return (~5–15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.
Anthropic SDK config (per Lesson): max_retries=1. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).
Telemetry:
- l1.match_or_build.duration_ms, l1.match_or_build.outcome (flow_match / proposal_match / built / aborted_no_kb)
- anthropic.cache events (existing pattern) tagged action=l1_realtime_build
- l1.tree_build.tokens_in, tokens_out
Anti-parrot guardrail: the existing tests/test_prompt_anti_parrot.py auto-discovers new prompt constants via pattern match on *_PROMPT / *_SCHEMA / *_PROTOCOL / *_FORMAT. No new test required.
8.3 Hallucinated-citation defense
After build, the writer verifies every kb_doc_id in kb_citations exists in the account's KB. Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.
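The verification step amounts to partitioning citations by whether their kb_doc_id is known. A minimal sketch, assuming citations arrive as dicts in the kb_citations shape from §8.2 and that the caller has already loaded the account's document ids; verify_citations is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set, Tuple

Citation = Dict[str, str]  # {node_id, kb_doc_id, snippet}


def verify_citations(
    citations: Iterable[Citation],
    known_doc_ids: Set[str],
) -> Tuple[List[Citation], List[Citation]]:
    """Split citations into (verified, stripped).

    verified: shown in the walker's "Source:" pane.
    stripped: hidden from L1, surfaced as a warning in engineer review.
    """
    verified: List[Citation] = []
    stripped: List[Citation] = []
    for citation in citations:
        if citation.get("kb_doc_id") in known_doc_ids:
            verified.append(citation)
        else:
            stripped.append(citation)
    return verified, stripped
```

Returning both lists keeps the stripped citations available for the review warning instead of silently discarding them.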
9. KB ingestion
9.1 Connector interface
class KBConnector(ABC):
    async def test_credentials(self) -> bool: ...
    async def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]: ...
    async def fetch_content(self, ref: KBDocRef) -> KBDocContent: ...
    async def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent]: ...  # optional, no-op v1
Registry dispatches by provider string. Credentials encrypted at rest via Fernet (reuse services/psa/encryption.py pattern).
9.2 Per-connector specifics
| | IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) |
|---|---|---|---|
| Auth | API token (header) | API key (header) | OAuth 2.0 |
| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching (secret\|confidential\|private) heuristic; files with a tenant Sensitivity Label |
| Filtering | Per-org (techs see all client orgs they have permission to) | Per-folder | Per-site / per-drive (owner picks at config time) |
| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |
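The Microsoft Graph folder heuristic in the "Never ingested" row could be applied as below. This is a sketch under stated assumptions: matching is case-insensitive, applies to any folder segment of the path (not the file name), and the sensitivity-label flag is passed in by the caller. The function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Pattern taken from the table; case-insensitivity is an assumption.
SENSITIVE_FOLDER = re.compile(r"(secret|confidential|private)", re.IGNORECASE)


def is_ingestable(path: str, has_sensitivity_label: bool = False) -> bool:
    """False for labeled files and for files under a sensitive-looking folder."""
    if has_sensitivity_label:
        return False
    folders = PurePosixPath(path).parts[:-1]  # drop the file name itself
    return not any(SENSITIVE_FOLDER.search(folder) for folder in folders)
```

Because the rule is a substring heuristic, a folder such as "HR-Confidential" is excluded while a file merely named "private-notes.md" in an ordinary folder is not.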
All three deliver content to kb_ingestion_writer which:
- Chunks (paragraph-aware, configurable size with overlap)
- Embeds via embedding_service
- Upserts into kb_documents keyed on (connector_config_id, source_ref); chunks into kb_document_chunks
Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped source_ref). Engineers can dedup manually if needed.
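One way to read "paragraph-aware, configurable size with overlap" is to pack whole paragraphs into a chunk up to a character budget and carry the trailing paragraph(s) into the next chunk. The sketch below is an assumption about the chunking strategy, not the writer's actual algorithm; the defaults are placeholders, since chunk size and overlap are deferred to implementation.

```python
from typing import List


def chunk_paragraphs(
    text: str,
    max_chars: int = 1200,        # placeholder default
    overlap_paragraphs: int = 1,  # placeholder default
) -> List[str]:
    """Pack blank-line-separated paragraphs into chunks with paragraph overlap.

    A single paragraph longer than max_chars becomes its own oversized chunk,
    and carried-over overlap can push a chunk slightly past the budget.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        would_be = "\n\n".join(current + [para])
        if current and len(would_be) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned string would map to one kb_document_chunks row, with its list position as chunk_index.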
9.3 Sync scheduling
kb_ingestion_scheduler.py runs as APScheduler interval job, max_instances=1. Per cycle:
1. Query active kb_connector_configs where last_sync_at is older than sync_interval_minutes (default 360 = 6h).
2. Dispatch per account; concurrency cap = 4 simultaneous accounts.
3. For each connector: list_documents(since=last_sync_at) → for each ref, fetch_content → write.
4. Compute the diff between the provider's current refs and existing rows (same connector_config_id); soft-delete missing ones via deleted_at. This diff needs the provider's complete ref listing (since=None): the incremental listing in step 3 omits unchanged documents, which would otherwise be treated as missing.
5. Update last_sync_at, last_sync_status, last_sync_error.
Must use _admin_session_factory() not get_db() for startup-side and scheduler-side queries (per Lesson on RLS at startup — no app.current_account_id set).
Immediate sync via POST /api/v1/kb-connectors/{id}/sync enqueues a job; scheduler picks it up within ~30s.
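The due check and the per-account concurrency cap can be sketched with asyncio alone. This is an illustration of the scheduling shape, not the APScheduler job itself: is_due, run_cycle, and the sync_one callable are hypothetical, and treating a never-synced connector (last_sync_at is null) as due is an assumption.

```python
import asyncio
from datetime import datetime, timedelta
from typing import Awaitable, Callable, Dict, List, Optional


def is_due(last_sync_at: Optional[datetime], sync_interval_minutes: int, now: datetime) -> bool:
    """A connector is due when never synced or its interval has elapsed."""
    if last_sync_at is None:
        return True
    return now - last_sync_at >= timedelta(minutes=sync_interval_minutes)


async def run_cycle(
    connectors_by_account: Dict[str, List[str]],
    sync_one: Callable[[str], Awaitable[None]],
    max_accounts: int = 4,  # concurrency cap from the spec
) -> None:
    """Sync accounts concurrently (capped); connectors within an account run in order."""
    gate = asyncio.Semaphore(max_accounts)

    async def sync_account(connectors: List[str]) -> None:
        async with gate:
            for connector in connectors:  # sequential per account
                await sync_one(connector)

    await asyncio.gather(*(sync_account(c) for c in connectors_by_account.values()))
```

The semaphore bounds how many accounts are mid-sync at once, while the inner loop preserves the sequential-per-account guarantee.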
10. Escalation flow
1. L1 clicks Escalate → modal (reason category + optional free text).
2. POST /api/v1/l1/sessions/{id}/escalate → backend:
   - Calls extended escalation_package_generator.generate(session_id, include_l1_walk=true). Package contents: problem_statement, customer_name, customer_contact, ticket_ref (PSA id or internal id), target_kind ('flow' | 'proposal'), target_id, walked_path, ai_draft_proposal_id, kb_citations, escalation_reason, reason_category, l1_user_id
   - Creates an ai_session with the package serialized into system context for the chat surface.
   - If PSA-backed: psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name). Default 'Tier 2'. Owner configurable in /account/integrations.
   - If internal-backed: internal_tickets.status='escalated', assigned_user_id=null (round-robin assignment is out of scope).
   - Writes notification via existing notification_service: bell badge to all engineers in account.
   - Audit log entry; acting_as reflects whether L1 or coverage-engineer escalated.
3. Toast on L1 side, return to /l1.
4. Engineer clicks notification → /pilot/{sessionId} → chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.
Un-escalate is out of scope. If engineer wants to bounce back, they reassign in PSA manually.
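The package contents listed in the escalation steps map naturally onto a typed record that serializes to JSON for the chat session's system context. The following is a sketch only: the field names come from the package list, while the class, the choice of which fields are optional, and the JSON serialization are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EscalationPackage:
    """Hypothetical typed form of the escalation package."""
    problem_statement: str
    ticket_ref: str              # PSA id or internal id
    target_kind: str             # 'flow' or 'proposal'
    target_id: str
    escalation_reason: str
    reason_category: str
    l1_user_id: str
    customer_name: Optional[str] = None
    customer_contact: Optional[str] = None
    ai_draft_proposal_id: Optional[str] = None
    walked_path: List[dict] = field(default_factory=list)
    kb_citations: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.target_kind not in ("flow", "proposal"):
            raise ValueError(f"unexpected target_kind: {self.target_kind!r}")


def to_system_context(package: EscalationPackage) -> str:
    """Serialize the package for embedding in the ai_session system context."""
    return json.dumps(asdict(package), sort_keys=True)
```

A typed record gives the engineer-side "Escalation context" card one stable shape to render, whichever ticket backend produced it.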
11. Internal ticket fallback
When the account has no active PSA provider:
- Intake creates internal_tickets row instead of a PSA ticket.
- Queue surface merges PSA + internal with Internal / PSA origin badge.
- Escalation flips internal_tickets.status='escalated' and assigns engineer (or leaves null for any engineer to claim, which is the v1 behavior).
- Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.
Promote to PSA: owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets psa_promoted_ticket_id. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.
12. Outcome-validation lifecycle
1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
                                              validated_by_outcome=false,
                                              linked_ticket_id=...)
2. L1 walks → POST /l1/sessions/{id}/step appends to walked_path_snapshot
3. L1 hits Resolve:
     modal: "Did this resolve it?" [Yes] [No] + resolution_notes
4. helpful=true  → flow_proposal.validated_by_outcome = true
                 → walked_path_snapshot frozen
                 → ticket closed (PSA or internal)
   helpful=false → validated_by_outcome stays false
                 → L1 prompted: "Escalate instead?"
5. Engineer review queue:
     ORDER BY validated_by_outcome DESC, created_at DESC
     - Outcome-validated drafts surface first
     - Promote / edit-and-promote / retire
6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
           → future match_or_build matches the new flow on the flow-match pass
13. Out of scope (v1 non-goals)
- End-user / self-service portal ("L0" tier).
- Engineer warm-transfer / live take-over during a call.
- L1 ↔ engineer real-time chat during a call.
- Multi-language UI / customer-language toggle in walker.
- Auto-promote internal tickets to PSA on integration install.
- AI tree streaming (node-by-node).
- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
- Confluence connector.
- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
- Cancel/un-escalate flow.
- Round-robin engineer assignment on internal-ticket escalations.
14. Testing strategy
14.1 Backend (pytest)
- Unit: match_or_build covers all four paths (flow-match, proposal-match, built, aborted_no_kb).
- Unit: ai_tree_builder schema validation: assert rejection of malformed Anthropic output before persistence.
- Unit: each connector's list_documents + fetch_content against recorded HTTP fixtures.
- Integration: intake → walk → resolve(helpful=true) → assert FlowProposal.validated_by_outcome=true, ticket closed.
- Integration: intake → walk → escalate → assert PSA reassign_ticket invoked, ai_session created with package, audit log entry, notification dispatched.
- Integration: KB scheduler: max_instances=1, sequential per-account, soft-delete on removal.
- RLS regression (highest priority): l1_tech user in account A cannot read account B's tickets, drafts, KB docs, or connector configs. Added to existing RLS test suite.
- Anti-parrot: existing CI test auto-discovers new prompt module.
14.2 Frontend
- Unit: usePermissions: L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths.
- Unit: L1WalkPage: node advance, escalate modal, resolve modal flips validated_by_outcome correctly.
- Unit: L1CoverageBanner: visible for engineer-with-flag on /l1/*, hidden for L1 users.
- E2E (Playwright, scoped selectors per Lesson):
  - L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
  - Engineer with can_cover_l1 → sidebar entry visible → click → coverage banner shows → walks a session → audit log records acting_as='l1_coverage'.
  - L1 hitting /pilot, /trees/new, /escalations → 403 or redirect.
15. Acceptance criteria (v1 ships when…)
- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
- L1 intake creates a ticket (PSA or internal) and lands in walker session.
- Walker handles both flows and proposals; AI-built badge + sources shown for proposals.
- Escalate generates package, reassigns ticket, notifies engineers.
- Resolve flips validated_by_outcome; review queue prioritizes outcome-validated drafts.
- All three KB connectors configurable; initial sync + periodic re-sync + soft-delete on removal.
- AI build refuses with informative error when account KB is empty.
- Coverage flag works end-to-end with audit-log tagging.
- RLS blocks cross-tenant reads on every new table.
- L1 seat count tracked separately from engineer seats in admin/billing UI.
16. Risks & mitigations
| Risk | Mitigation |
|---|---|
| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before draft becomes "real" flow. v1 refuses to build when KB is empty. |
| Hallucinated KB citations | Post-build verification that each kb_doc_id exists; unverified citations stripped from walker, surfaced as warning in engineer review. |
| Duplicate proposals for same problem | Validated-proposal match pass deduplicates after one L1 validates; pre-validation dups are tolerated and dedup'd during engineer review. |
| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners exclude specific folders/sites at config. All ingested docs visible in /account/kb for manual deletion. |
| AI build latency frustrates customer on call | Build-progress UI sets expectation. Escalate button visible from page load. Future: pre-warm builds on PSA-ticket-landed event. |
| Three connectors is more scope than originally proposed | Acknowledged. Each connector is ~1–2 weeks of work. Plan should sequence them and allow shipping with IT Glue + Hudu first if SharePoint slips. |
| Engineer review queue backlog stalls library growth | Validated-proposal match pass means good drafts get reused without engineer review. Backlog only delays the move from 'proposal' to 'flow', not the L1's ability to use validated content. |
17. Naming reference
| Layer | Value |
|---|---|
| DB enum (account_role) | l1_tech |
| UI display | "L1 Tech" / "L1" |
| Sidebar entry | "L1 Workspace" |
| URL prefix | /l1 |
| Coverage flag column | users.can_cover_l1 |
| Coverage audit tag | acting_as = 'l1_coverage' |
| Pricing label | "L1 seat" |
| Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now |
18. Open implementation decisions (deferred to plan, not blocking design)
- Specific MATCH_THRESHOLD default value validation (initial 0.75, tune from telemetry post-launch).
- Specific Anthropic model choice for l1_realtime_build (Sonnet vs Opus; pick based on quality benchmark during plan).
- Chunk size + overlap for KB ingestion writer (tune in implementation).
- Engineer queue label default ('Tier 2' vs 'Engineering'); owner-configurable anyway.
- Exact look of the build-progress shimmer animation: design-system handoff.
These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here.
Note on scope and phasing
This is a substantive feature: new role, four frontend pages, 16 endpoints (7 L1, 5 KB connector, 3 internal ticket, 1 user management), AI tree-builder, three KB connectors, escalation extensions, and six migrations. The implementation plan will almost certainly phase the work. A reasonable cut is:
- Phase 1: role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
- Phase 2: kb_documents schema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually-uploaded KB.
- Phase 3: the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained, so they can ship one at a time and be reordered if a connector blocks.
Phasing is a plan-level decision; the spec captures the full feature.
End of spec.