# L1 Workspace — Design Spec

**Date:** 2026-05-28
**Status:** Draft (pending implementation plan)
**Audience for this doc:** engineers + reviewers building the L1 workspace feature

---

## 1. Summary

Introduce a dedicated **L1 helpdesk** workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.

This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.

---

## 2. Motivation

The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (`AssistantChatPage` mounted at `/pilot`). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:

- Take inbound phone calls and need a fast, deterministic click-through UX
- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
- Are not authorized to resolve complex issues themselves; they hand off to engineers

A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.

---

## 3.
Users & roles

### 3.1 Role hierarchy

`super_admin > owner > engineer > l1_tech > viewer`

`l1_tech` is added to the `account_role` enum. Permissions enforced via `app/core/permissions.py` and `app/api/deps.py`.

### 3.2 What L1 can do

- Use the `/l1/*` surface
- Open tickets from their queue (PSA-fed or internal)
- Intake walk-in/phone-call problems (creates a ticket as a side effect)
- Walk authored flows and AI-built FlowProposal drafts
- Resolve or escalate a session
- View their own AI drafts list (read-only — outcome tags shown)

### 3.3 What L1 cannot do

- See the chat surface (`/pilot`) — sidebar hidden, route 403s
- Author or edit flows
- See `/review-queue` or `/escalations` (engineer inboxes)
- See team analytics (only `/analytics/me`)
- Promote AI drafts (engineers/owners only, via existing review queue)
- Configure KB connectors (owner-only)

### 3.4 Engineer L1 coverage

Engineers do NOT see the L1 surface by default. Owners can toggle `users.can_cover_l1 = true` on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in `/l1/*` with a sticky banner: *"Covering L1 — actions logged as coverage."* Coverage actions are audit-logged with `acting_as = 'l1_coverage'`.

Backend dep: `require_l1_or_coverage` = `l1_tech | (engineer AND can_cover_l1) | owner | super_admin`.

This mirrors the existing orthogonal-flag pattern (`is_team_admin`) — no new architectural concept.

### 3.5 Billing data model

- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0` (new column)
- Existing `accounts.seats_purchased` continues to represent engineer seats
- New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band

---

## 4.
Architecture overview

### 4.1 New components

**Frontend:**

- `pages/l1/L1Dashboard.tsx` — landing page; ticket queue + describe-the-problem intake
- `pages/l1/L1WalkPage.tsx` — purpose-built walker; yes/no cards, transcript, persistent escalate/resolve
- `pages/l1/L1DraftsPage.tsx` — read-only list of the L1's AI drafts and promotion status
- `pages/l1/L1TicketsPage.tsx` — full-page queue (PSA + internal merged)
- `components/l1/L1CoverageBanner.tsx` — slim banner shown to engineer-coverers

**Backend:**

- `services/match_or_build.py` — orchestrator (RAG match → fallback to AI build)
- `services/ai_tree_builder.py` — real-time AI tree generation via Anthropic
- `services/kb_connectors/` package — base, registry, encryption, plus `itglue.py`, `hudu.py`, `microsoft_graph.py`
- `services/kb_ingestion_writer.py` — shared writer used by manual upload + all connectors
- `services/kb_ingestion_scheduler.py` — APScheduler job, `max_instances=1`, per-connector sync
- `services/internal_ticket_service.py` — CRUD + status transitions for the no-PSA fallback
- `services/l1_session_service.py` — walking-session lifecycle
- `api/endpoints/l1.py` — L1-role endpoints
- `api/endpoints/kb_connectors.py` — KB connector config endpoints (owner-only for write)

**Reused / extended:**

- `services/rag_service.py` — flow & KB matching (existing)
- `services/flow_matching_engine.py` — existing
- `services/escalation_package_generator.py` — extended to include walked path, AI draft pointer, KB citations
- `models/FlowProposal` — new columns (see §5)
- `services/psa/` — already supports ticket create + reassign across CW/Autotask/HaloPSA
- `services/embedding_service.py` — used by KB ingestion writer
- New `kb_documents` + `kb_document_chunks` tables for RAG-retrievable document storage, separate from the existing `kb_imports` (which is a document→tree conversion record, not a persistent KB store — see §5)
- Audit log writer — gains `acting_as` field

### 4.2 Data flow — walk-in / phone-call
intake

```
L1 types: "User can't connect Outlook after password reset"

POST /api/v1/l1/intake
  body: { problem_statement, customer_name?, customer_contact? }

  → create ticket
      - PSA if configured: psa_provider.create_ticket(...)
      - else: internal_tickets row

  → match_or_build(account_id, problem_text, ticket_ref)
      → rag_service.match_flows(...)
          → top hit; if score ≥ threshold return as 'flow'
      → rag_service.match_proposals(... where validated_by_outcome=true)
          → top hit; if score ≥ threshold return as 'proposal'
      → ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
          → persist FlowProposal(source='ai_realtime_l1', linked_ticket_id,
                                 linked_ticket_kind, validated_by_outcome=false)
          → return as 'proposal'

  → l1_session_service.start(...)
  → return { session_id, target_kind, target_id, intake_type }

→ navigate to /l1/walk/{session_id}
```

### 4.3 Data flow — PSA-queue intake

The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls `POST /api/v1/l1/tickets/{ticket_ref}/start` which is the same `match_or_build` path (the `problem_statement` is the ticket subject + description) followed by walker navigation.

---

## 5. Data model

All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are `TIMESTAMPTZ`. No `--rev-id` on Alembic; no `--autogenerate` for enum/RLS work.

### 5.1 `FlowProposal` — extended

Existing AI-draft model. Add columns:

| Column | Type | Notes |
|---|---|---|
| `source` | `VARCHAR(30) NOT NULL` | `'ai_realtime_l1' \| 'kb_accelerator' \| 'manual_draft'`. Backfill existing rows to `'manual_draft'`.
|
| `linked_ticket_id` | `VARCHAR(64) NULL` | PSA id or internal_tickets UUID (stored as text) |
| `linked_ticket_kind` | `VARCHAR(10) NULL` | `'psa' \| 'internal'` |
| `validated_by_outcome` | `BOOLEAN NOT NULL DEFAULT FALSE` | Flipped to true when L1 resolves and marks helpful=true |
| `walked_path_snapshot` | `JSONB NULL` | Frozen at resolve/escalate; shape `[{node_id, question, answer, l1_note}]` |

Engineer review queue sort:

```sql
ORDER BY validated_by_outcome DESC, created_at DESC
```

### 5.2 `internal_tickets` — new

```
id                      UUID PRIMARY KEY
account_id              UUID NOT NULL  (RLS-scoped)
created_by_user_id      UUID NOT NULL  (the L1 who took the call)
customer_name           VARCHAR(120)
customer_contact        VARCHAR(200) NULL  (email or phone, free text)
problem_statement       TEXT NOT NULL
status                  VARCHAR(30) NOT NULL  -- 'open' | 'walking' | 'resolved' | 'escalated'
flow_id                 UUID NULL FK trees
flow_proposal_id        UUID NULL FK flow_proposals
ai_session_id           UUID NULL FK ai_sessions  (set when engineer picks up in chat post-escalation)
assigned_user_id        UUID NULL  (engineer post-escalation)
resolution_notes        TEXT NULL
psa_promoted_ticket_id  VARCHAR(64) NULL  (set if later promoted to PSA)
created_at              TIMESTAMPTZ NOT NULL
updated_at              TIMESTAMPTZ NOT NULL
resolved_at             TIMESTAMPTZ NULL
```

RLS: account-scoped, WITH CHECK on insert/update.
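The four `status` values on `internal_tickets` suggest a small state machine. The sketch below is illustrative only: the spec lists the statuses but not the legal moves between them, so the transition table (open to walking, walking to resolved or escalated) is an assumption, and `ALLOWED_TRANSITIONS`, `InvalidTransition`, and `transition` are hypothetical names.

```python
# Hypothetical transition table for internal_tickets.status.
# The status values come from the schema above; the allowed moves
# between them are an assumption made for illustration.
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "open": frozenset({"walking"}),
    "walking": frozenset({"resolved", "escalated"}),
    "resolved": frozenset(),
    "escalated": frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a status change is not in ALLOWED_TRANSITIONS."""


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if current not in ALLOWED_TRANSITIONS:
        raise InvalidTransition(f"unknown status: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise InvalidTransition(f"{current!r} -> {target!r} is not allowed")
    return target
```

Keeping the table as data rather than branching logic would make it straightforward to adjust if the implementation plan settles on different transitions.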
### 5.3 `kb_connector_configs` — new

```
id                     UUID PRIMARY KEY
account_id             UUID NOT NULL  (RLS-scoped)
provider               VARCHAR(20) NOT NULL  -- 'itglue' | 'hudu' | 'microsoft_graph'
display_name           VARCHAR(80) NOT NULL
credentials_encrypted  BYTEA NOT NULL  -- Fernet, same pattern as services/psa/encryption.py
is_active              BOOLEAN NOT NULL DEFAULT TRUE
sync_interval_minutes  INTEGER NOT NULL DEFAULT 360
last_sync_at           TIMESTAMPTZ NULL
last_sync_status       VARCHAR(20) NULL  -- 'success' | 'error' | 'running'
last_sync_error        TEXT NULL
created_by_user_id     UUID NOT NULL
created_at             TIMESTAMPTZ NOT NULL
updated_at             TIMESTAMPTZ NOT NULL

UNIQUE (account_id, provider, display_name)
```

RLS: account-scoped, WITH CHECK.

### 5.4 New tables: `kb_documents` + `kb_document_chunks`

The existing `kb_imports` table is a document→tree conversion record (status lifecycle `processing | ready | committed | failed`, target `tree_id`) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval. The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:

**`kb_documents`** — one row per ingested document:

```
id                   UUID PRIMARY KEY
account_id           UUID NOT NULL  (RLS-scoped)
source_kind          VARCHAR(20) NOT NULL  -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
source_ref           VARCHAR(200) NULL  -- provider-side document ID for re-sync
connector_config_id  UUID NULL FK kb_connector_configs
title                VARCHAR(500) NOT NULL
content              TEXT NOT NULL  -- full post-extraction text
content_hash         VARCHAR(64) NOT NULL  -- sha256 for change-detection
metadata             JSONB NULL  -- provider-specific (org_id, drive_id, etc.)
last_synced_at       TIMESTAMPTZ NULL
deleted_at           TIMESTAMPTZ NULL  -- soft-delete on connector removal
created_at           TIMESTAMPTZ NOT NULL
updated_at           TIMESTAMPTZ NOT NULL
```

Unique partial index: `(connector_config_id, source_ref) WHERE source_ref IS NOT NULL`.
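The `content_hash` column is what lets a re-sync skip unchanged documents. A minimal sketch of that check, assuming the hash is taken over the UTF-8 bytes of the post-extraction text; `content_hash` and `needs_reingest` are hypothetical helper names, not existing functions in the codebase:

```python
from __future__ import annotations

import hashlib


def content_hash(text: str) -> str:
    """Return the sha256 hex digest of the extracted text (64 hex chars,
    which fits the VARCHAR(64) column)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reingest(stored_hash: str | None, fetched_text: str) -> bool:
    """True when the document is new (no stored hash) or its text changed
    since the last sync; False when re-chunking and re-embedding can be skipped."""
    return stored_hash != content_hash(fetched_text)
```

Skipping unchanged documents this way would avoid paying for re-embedding on every sync cycle.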
**`kb_document_chunks`** — chunks with embeddings, used by `rag_service.match_kb_chunks`:

```
id           UUID PRIMARY KEY
document_id  UUID NOT NULL FK kb_documents ON DELETE CASCADE
account_id   UUID NOT NULL  -- denormalized for RLS
chunk_index  INTEGER NOT NULL
content      TEXT NOT NULL
embedding    VECTOR() NOT NULL  -- dim matches embedding_service
metadata     JSONB NULL  -- section title, page number, etc.
created_at   TIMESTAMPTZ NOT NULL

UNIQUE (document_id, chunk_index)
```

Pgvector index (ivfflat or hnsw) on `embedding`; choice tuned during implementation.

RLS on both tables: account-scoped, WITH CHECK on insert.

**Coexistence with `kb_imports`:** when an L1 (or owner) uploads a doc, the system can populate **both** — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into `kb_documents` for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to `kb_documents` (no auto-tree-conversion from synced docs in v1).

### 5.5 Other column additions

- `users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSE`
- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0`
- `audit_logs.acting_as VARCHAR(30) NULL` — `'l1_coverage'` when engineer is in coverage mode; null otherwise
- `account_role` enum: add `'l1_tech'`

### 5.6 Migration ordering

Six manual Alembic revisions (no `--rev-id`, no `--autogenerate`):

1. Add `'l1_tech'` to `account_role` enum.
2. Add `users.can_cover_l1`, `accounts.l1_seats_purchased`, `audit_logs.acting_as`.
3. Extend `flow_proposals` with new columns + backfill existing rows to `source='manual_draft'`.
4. Create `internal_tickets` + RLS policies (account-scoped, WITH CHECK).
5. Create `kb_connector_configs` + RLS policies.
6. Create `kb_documents` + `kb_document_chunks` tables + RLS policies + pgvector index on chunks.

Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass `account_id=` explicitly.
Grep all `Model(` sites before merge.

---

## 6. Backend services & endpoints

### 6.1 New services

| Module | Purpose |
|---|---|
| `services/match_or_build.py` | Orchestrator. Single async entrypoint `match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult`. |
| `services/ai_tree_builder.py` | Real-time AI tree generation. Anthropic via existing `_call_anthropic_cached` pattern. Model tier via `settings.get_model_for_action('l1_realtime_build')`. Output validated against the flow node schema with Pydantic; rejects malformed output. |
| `services/kb_connectors/base.py` | Abstract `KBConnector` with `test_credentials`, `list_documents`, `fetch_content`, `subscribe_to_changes` (optional). |
| `services/kb_connectors/itglue.py` | IT Glue REST client. |
| `services/kb_connectors/hudu.py` | Hudu REST client. |
| `services/kb_connectors/microsoft_graph.py` | Microsoft Graph (SharePoint/OneDrive) client. |
| `services/kb_connectors/registry.py` | `KBConnectorRegistry` (mirrors `PsaProviderRegistry`). |
| `services/kb_connectors/encryption.py` | Fernet wrapper (or reuse the PSA one if generic). |
| `services/kb_ingestion_writer.py` | Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
| `services/kb_ingestion_scheduler.py` | APScheduler interval job, `max_instances=1`. Sequential per account; concurrency cap = 4 accounts simultaneously. |
| `services/internal_ticket_service.py` | CRUD + status transitions for `internal_tickets`. |
| `services/l1_session_service.py` | Walking-session lifecycle: start, step, resolve, escalate. Bridges `ai_sessions` and the walked target. |

### 6.2 Extended services

- `services/escalation_package_generator.py` — adds inputs: `walked_path`, `ai_draft_proposal_id`, `kb_citations`. New caller path from `l1_session_service.escalate(...)`.
- KB Accelerator endpoint — accepts ingested content via the shared `kb_ingestion_writer`. Manual upload and connector sync share the same persistence path.
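The chunk step of the shared `kb_ingestion_writer` (chunk → embed → upsert) can be sketched as follows. This is a minimal illustration, assuming character-based sizing and overlap counted in whole paragraphs; `chunk_paragraphs`, `max_chars`, and `overlap` are hypothetical names, and the actual size and overlap values are left to implementation (§18).

```python
def chunk_paragraphs(text: str, max_chars: int = 1200, overlap: int = 1) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_chars characters.

    The last `overlap` paragraphs of each emitted chunk are repeated at the
    start of the next one. Paragraphs are never split, so a single paragraph
    longer than max_chars (or carried overlap plus one paragraph) can still
    produce an oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and len(candidate) > max_chars:
            # Emit what has accumulated, then carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned string would become one `kb_document_chunks` row, with its list position as `chunk_index`.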
### 6.3 New endpoints

All under `require_l1_or_coverage` unless noted. Mounted under `/api/v1/l1`; paths in the tables below are relative to `/api/v1`.

| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/l1/queue` | Merged ticket queue (PSA + internal). Pagination + status filter. | `require_l1_or_coverage` |
| POST | `/l1/intake` | Walk-in intake. Body `{problem_statement, customer_name?, customer_contact?}`. Creates ticket, returns `{session_id, target_kind, target_id, intake_type}`. | `require_l1_or_coverage` |
| POST | `/l1/tickets/{ticket_ref}/start` | Start walker from an existing ticket. Internally same as intake but skips ticket creation. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/step` | Record an answer. Body `{node_id, answer, note?}`. Appends to `walked_path_snapshot`. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/resolve` | Close as resolved. Body `{resolution_notes, helpful: bool}`. Sets `validated_by_outcome=true` on the proposal when `helpful=true` AND target was a proposal. Closes the ticket. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/escalate` | Generate escalation package + reassign ticket. Body `{reason, reason_category}`. | `require_l1_or_coverage` |
| GET | `/l1/drafts` | List current user's AI drafts with promotion status. | `require_l1_or_coverage` |

KB connector endpoints (`/api/v1/kb-connectors`):

| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/kb-connectors` | List configured connectors for account. | `require_l1_or_above` |
| POST | `/kb-connectors` | Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | `require_account_owner` |
| DELETE | `/kb-connectors/{id}` | Remove (soft-disable). | `require_account_owner` |
| POST | `/kb-connectors/{id}/sync` | Trigger immediate sync (enqueued). | `require_account_owner` |
| GET | `/kb-connectors/{id}/status` | Sync status + doc count + last error.
| `require_l1_or_above` |

Internal ticket endpoints (`/api/v1/internal-tickets`):

| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/internal-tickets` | List (account-scoped). | `require_l1_or_coverage` |
| GET | `/internal-tickets/{id}` | Detail. | `require_l1_or_coverage` |
| POST | `/internal-tickets/{id}/promote-to-psa` | Push to configured PSA, set `psa_promoted_ticket_id`. | `require_account_owner` |

User management addition:

| Method | Path | Purpose | Auth |
|---|---|---|---|
| PATCH | `/users/{id}/coverage` | Set `can_cover_l1` flag. Body `{can_cover_l1: bool}`. | `require_account_owner` |

---

## 7. Frontend surface

### 7.1 Sidebar — L1 view

```
LOGO
─────────────
Workspace      /l1
Tickets        /l1/tickets
My Drafts      /l1/drafts
─────────────
Guides         /guides
Account        /account  (filtered — no integrations, no categories)
```

No `/pilot`, no `/trees`, no `/flows`, no `/review-queue`, no `/escalations`, no team analytics. Sidebar.tsx picks the nav array by role.

### 7.2 Sidebar — engineer coverage view

Engineer's existing sidebar plus a single appended entry "L1 Workspace" → `/l1`. Shown when `canCoverL1 || isOwner || isSuperAdmin`.

### 7.3 `/l1` dashboard layout

Four vertical zones (the fourth conditional), single column, max width ~1100px:

1. **Greeting** — uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
2. **Describe the problem** card — large textarea (autofocus on load), optional `customer_name` + `customer_contact` fields, single primary CTA "Start walk →" (the only electric-blue element on the page)
3. **Open tickets** — section label, count, table rows (merged PSA + internal with origin badges), row hover `bg-elevated`
4. **Resume in progress** — shown only when L1 has a half-walked session

Tailwind v4 tokens: `bg-page` base, `bg-card` zones, `bg-elevated` row hover, electric-blue accent only on primary CTA. No `text-secondary`. All borders `border-default`.
### 7.4 `/l1/walk/{sessionId}` walker

Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs `flex` + `flex-1` + `min-h-0`).

**Header:**

- Back arrow + ticket ref + customer name + AI-built badge (when target is proposal)
- Problem statement line
- Persistent action buttons: `[ Escalate ]` `[ Resolve ✓ ]`

**Left pane (main):**

- "Step N · estimated M" label
- Current node card — large yes/no/answer buttons (min 44px tap target)
- Optional note textarea below the card (appended to `walked_path_snapshot`)
- On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"

**Right pane (transcript):**

- Walked-so-far list (node title + answer chosen)
- Current step highlight
- "Source:" section listing KB citations for the current node (proposal walks only)

**Resolve modal:**

- "Did this resolve it?" `[ Yes ]` `[ No ]`
- Resolution notes textarea
- Yes + target was proposal → sets `validated_by_outcome=true`
- No → prompt to escalate instead

**Escalate modal:**

- Reason category dropdown: *Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · Other*
- Free-text reason
- Confirm

### 7.5 `/l1/drafts` page

Read-only list, columns: `created` · `problem (truncated)` · `ticket #` · `status` (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.

### 7.6 `/l1/tickets` page

Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.

### 7.7 Coverage banner

`<L1CoverageBanner />` — slim ~32px band, info-cyan-dim background, mounted at the top of all `/l1/*` pages when `!isL1Tech && (canCoverL1 || isOwner || isSuperAdmin)`:

```
You're covering L1. Actions logged as coverage.   [Switch back →]
```

The "Switch back" link returns to `/`.
### 7.8 Routing

```tsx
const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
```

Mounted under the `/` ProtectedRoute branch at:

- `/l1` → `L1Dashboard`
- `/l1/walk/:sessionId` → `L1WalkPage`
- `/l1/drafts` → `L1DraftsPage`
- `/l1/tickets` → `L1TicketsPage`

Wrapped in `L1RouteGuard` (403 if not `l1_tech` AND not coverage-flagged). `ProtectedRoute.tsx` post-login redirect: L1 users land on `/l1` instead of `/`.

`lazyWithRetry`, not `React.lazy` (per existing convention).

---

## 8. AI match-or-build pipeline

### 8.1 Match-or-build algorithm

```
match_or_build(account_id, problem_text, ticket_ref):
    embedding = embedding_service.embed(problem_text)

    # 1. Match authored flows
    flow_hits = rag_service.match_flows(account_id, embedding, k=5)
    if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'flow', id: flow_hits[0].flow_id, score: ...}

    # 2. Match outcome-validated proposals only
    proposal_hits = rag_service.match_proposals(
        account_id, embedding, k=5,
        where=validated_by_outcome=true,
    )
    if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'proposal', id: proposal_hits[0].proposal_id, score: ...}

    # 3. Build fresh
    kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
    if not kb_chunks:
        raise BuildAbortedNoKB(
            "Cannot build a tree with no KB content. "
            "Upload docs or wait for a connector sync."
        )
    nearest_flows = flow_hits[:3]
    proposal = ai_tree_builder.build(
        problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
    )
    return {kind: 'proposal', id: proposal.id, score: None}
```

`MATCH_THRESHOLD` — per-account configurable; default `0.75` (cosine).
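To make the threshold concrete, here is a small sketch of how a cosine score compares against `MATCH_THRESHOLD`. It is illustrative only: in practice the score would presumably come back from the vector store via `rag_service`, and `cosine_similarity` and `is_match` are hypothetical helpers, not existing functions.

```python
import math

MATCH_THRESHOLD = 0.75  # default from this spec; per-account configurable


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(query: list[float], candidate: list[float],
             threshold: float = MATCH_THRESHOLD) -> bool:
    """True when the candidate is similar enough to be returned instead of building."""
    return cosine_similarity(query, candidate) >= threshold
```

For example, the vectors `[2.0, 1.0]` and `[1.0, 0.0]` have cosine 2/√5 ≈ 0.894 and would match at the default threshold, while `[1.0, 0.0]` and `[1.0, 1.0]` have cosine 1/√2 ≈ 0.707 and would fall through to the next pass.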
The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes.

### 8.2 AI tree-build details

**Model:** `settings.get_model_for_action('l1_realtime_build')`. Recommend Sonnet for v1 (latency-sensitive).

**Schema:** output validated against the existing flow node schema (matches `tree_editor` output). Validation failure aborts the build rather than persisting malformed data.

**Prompt strategy** (per Lesson on prompt anti-parrot — critical):

- System prompt: role definition + output schema using `<placeholder>` notation only. Never literal field values.
- Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
- User message: `{problem_statement}` + `{kb_context: [doc_title, section, content]}` + `{nearest_flow_summaries}` + instruction to cite KB chunks per node.
- Output includes `kb_citations: [{node_id, kb_doc_id, snippet}]` for walker's "Source:" pane and engineer review.

**Latency:** whole-tree-then-return (~5–15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.

**Anthropic SDK config** (per Lesson): `max_retries=1`. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).

**Telemetry:**

- `l1.match_or_build.duration_ms`, `l1.match_or_build.outcome` (`flow_match`/`proposal_match`/`built`/`aborted_no_kb`)
- `anthropic.cache` events (existing pattern) tagged `action=l1_realtime_build`
- `l1.tree_build.tokens_in`, `tokens_out`

**Anti-parrot guardrail:** the existing `tests/test_prompt_anti_parrot.py` auto-discovers new prompt constants via pattern match on `*_PROMPT` / `*_SCHEMA` / `*_PROTOCOL` / `*_FORMAT`. No new test required.

### 8.3 Hallucinated-citation defense

After build, the writer verifies every `kb_doc_id` in `kb_citations` exists in the account's KB.
Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.

---

## 9. KB ingestion

### 9.1 Connector interface

```python
from __future__ import annotations  # KBDocRef, KBDocContent, ChangeEvent are defined elsewhere
from abc import ABC
from datetime import datetime
from typing import AsyncIterator

class KBConnector(ABC):
    async def test_credentials(self) -> bool: ...
    async def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]: ...
    async def fetch_content(self, ref: KBDocRef) -> KBDocContent: ...
    async def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent]: ...  # optional, no-op v1
```

Registry dispatches by `provider` string. Credentials encrypted at rest via Fernet (reuse `services/psa/encryption.py` pattern).

### 9.2 Per-connector specifics

| | IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) |
|---|---|---|---|
| Auth | API token (header) | API key (header) | OAuth 2.0 |
| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching `(secret\|confidential\|private)` heuristic; files with a tenant Sensitivity Label |
| Filtering | Per-org (techs see all client orgs they have permission to access) | Per-folder | Per-site / per-drive (owner picks at config time) |
| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |

All three deliver content to `kb_ingestion_writer` which:

1. Chunks (paragraph-aware, configurable size with overlap)
2. Embeds via `embedding_service`
3. Upserts into `kb_documents` keyed on `(connector_config_id, source_ref)`; chunks into `kb_document_chunks`

Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped `source_ref`). Engineers can dedup manually if needed.

### 9.3 Sync scheduling

`kb_ingestion_scheduler.py` runs as APScheduler interval job, `max_instances=1`. Per cycle:

1.
Query active `kb_connector_configs` where `last_sync_at` is older than `sync_interval_minutes` (default 360 = 6h).
2. Dispatch per account; concurrency cap = 4 simultaneous accounts.
3. For each connector: `list_documents(since=last_sync_at)` → for each ref, `fetch_content` → write.
4. Compute the diff between current refs and existing rows (same `connector_config_id`); soft-delete missing ones via `deleted_at`.
5. Update `last_sync_at`, `last_sync_status`, `last_sync_error`.

Must use `_admin_session_factory()` not `get_db()` for startup-side and scheduler-side queries (per Lesson on RLS at startup — no `app.current_account_id` set).

Immediate sync via `POST /api/v1/kb-connectors/{id}/sync` enqueues a job; scheduler picks it up within ~30s.

---

## 10. Escalation flow

1. L1 clicks **Escalate** → modal (reason category + optional free text).
2. `POST /api/v1/l1/sessions/{id}/escalate` → backend:
   - Calls extended `escalation_package_generator.generate(session_id, include_l1_walk=true)`. Package contents:
     ```
     problem_statement, customer_name, customer_contact,
     ticket_ref (PSA id or internal id),
     target_kind ('flow' | 'proposal'), target_id,
     walked_path, ai_draft_proposal_id, kb_citations,
     escalation_reason, reason_category, l1_user_id
     ```
   - Creates an `ai_session` with the package serialized into system context for the chat surface.
   - If PSA-backed: `psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name)`. Default `'Tier 2'`. Owner configurable in `/account/integrations`.
   - If internal-backed: `internal_tickets.status='escalated'`, `assigned_user_id=null` (round-robin assignment is out of scope).
   - Writes notification via existing `notification_service` — bell badge to all engineers in account.
   - Audit log entry; `acting_as` reflects whether L1 or coverage-engineer escalated.
3. Toast on L1 side, return to `/l1`.
4.
Engineer clicks notification → `/pilot/{sessionId}` → chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.

**Un-escalate is out of scope.** If engineer wants to bounce back, they reassign in PSA manually.

---

## 11. Internal ticket fallback

When the account has no active PSA provider:

- Intake creates `internal_tickets` row instead of a PSA ticket.
- Queue surface merges PSA + internal with `Internal` / `PSA` origin badge.
- Escalation flips `internal_tickets.status='escalated'` and assigns engineer (or leaves null for any engineer to claim — v1 behavior).
- Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.

**Promote to PSA:** owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets `psa_promoted_ticket_id`. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.

---

## 12. Outcome-validation lifecycle

```
1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
                                             validated_by_outcome=false,
                                             linked_ticket_id=...)
2. L1 walks → POST /l1/sessions/{id}/step appends to walked_path_snapshot
3. L1 hits Resolve:
     modal: "Did this resolve it?" [Yes] [No] + resolution_notes
4. helpful=true  → flow_proposal.validated_by_outcome = true
                 → walked_path_snapshot frozen
                 → ticket closed (PSA or internal)
   helpful=false → validated_by_outcome stays false
                 → L1 prompted: "Escalate instead?"
5. Engineer review queue:
     ORDER BY validated_by_outcome DESC, created_at DESC
     - Outcome-validated drafts surface first
     - Promote / edit-and-promote / retire
6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
           → future match_or_build matches the new flow on the flow-match pass
```

---

## 13. Out of scope (v1 non-goals)

- End-user / self-service portal ("L0" tier).
- Engineer warm-transfer / live take-over during a call.
- L1 ↔ engineer real-time chat during a call.
- Multi-language UI / customer-language toggle in walker.
- Auto-promote internal tickets to PSA on integration install.
- AI tree streaming (node-by-node).
- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
- Confluence connector.
- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
- Cancel/un-escalate flow.
- Round-robin engineer assignment on internal-ticket escalations.

---

## 14. Testing strategy

### 14.1 Backend (pytest)

- Unit: `match_or_build` covers all four paths (flow-match, proposal-match, built, aborted_no_kb).
- Unit: `ai_tree_builder` schema validation — assert rejection of malformed Anthropic output before persistence.
- Unit: each connector's `list_documents` + `fetch_content` against recorded HTTP fixtures.
- Integration: intake → walk → resolve(helpful=true) → assert `FlowProposal.validated_by_outcome=true`, ticket closed.
- Integration: intake → walk → escalate → assert PSA `reassign_ticket` invoked, `ai_session` created with package, audit log entry, notification dispatched.
- Integration: KB scheduler — `max_instances=1`, sequential per-account, soft-delete on removal.
- **RLS regression** (highest priority): `l1_tech` user in account A cannot read account B's tickets, drafts, KB docs, or connector configs. Added to existing RLS test suite.
- Anti-parrot: existing CI test auto-discovers new prompt module.

### 14.2 Frontend

- Unit: `usePermissions` — L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths.
- Unit: `L1WalkPage` — node advance, escalate modal, resolve modal flips `validated_by_outcome` correctly.
- Unit: `L1CoverageBanner` — visible for engineer-with-flag on `/l1/*`, hidden for L1 users.
- E2E (Playwright, scoped selectors per Lesson):
  - L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
  - Engineer with `can_cover_l1` → sidebar entry visible → click → coverage banner shows → walks a session → audit log records `acting_as='l1_coverage'`.
  - L1 hitting `/pilot`, `/trees/new`, `/escalations` → 403 or redirect.

---

## 15. Acceptance criteria (v1 ships when…)

- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
- L1 intake creates a ticket (PSA or internal) and lands in walker session.
- Walker handles both flows and proposals; AI-built badge + sources shown for proposals.
- Escalate generates package, reassigns ticket, notifies engineers.
- Resolve flips `validated_by_outcome`; review queue prioritizes outcome-validated drafts.
- All three KB connectors configurable; initial sync + periodic re-sync + soft-delete on removal.
- AI build refuses with informative error when account KB is empty.
- Coverage flag works end-to-end with audit-log tagging.
- RLS blocks cross-tenant reads on every new table.
- L1 seat count tracked separately from engineer seats in admin/billing UI.

---

## 16. Risks & mitigations

| Risk | Mitigation |
|---|---|
| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before draft becomes "real" flow. v1 refuses to build when KB is empty. |
| Hallucinated KB citations | Post-build verification that each `kb_doc_id` exists; unverified citations stripped from walker, surfaced as warning in engineer review. |
| Duplicate proposals for same problem | Validated-proposal match pass deduplicates after one L1 validates; pre-validation dups are tolerated and dedup'd during engineer review. |
| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners exclude specific folders/sites at config.
All ingested docs visible in `/account/kb` for manual deletion. | | AI build latency frustrates customer on call | Build-progress UI sets expectation. Escalate button visible from page load. Future: pre-warm builds on PSA-ticket-landed event. | | Three connectors is more scope than originally proposed | Acknowledged. Each connector is ~1–2 weeks of work. Plan should sequence them and allow shipping with IT Glue + Hudu first if SharePoint slips. | | Engineer review queue backlog stalls library growth | Validated-proposal match pass means good drafts get reused without engineer review. Backlog only delays the move from `'proposal'` to `'flow'`, not the L1's ability to use validated content. | --- ## 17. Naming reference | Layer | Value | |---|---| | DB enum (`account_role`) | `l1_tech` | | UI display | "L1 Tech" / "L1" | | Sidebar entry | "L1 Workspace" | | URL prefix | `/l1` | | Coverage flag column | `users.can_cover_l1` | | Coverage audit tag | `acting_as = 'l1_coverage'` | | Pricing label | "L1 seat" | | Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now | --- ## 18. Open implementation decisions (deferred to plan, not blocking design) - Specific `MATCH_THRESHOLD` default value validation (initial 0.75, tune from telemetry post-launch). - Specific Anthropic model choice for `l1_realtime_build` (Sonnet vs Opus — pick based on quality benchmark during plan). - Chunk size + overlap for KB ingestion writer (tune in implementation). - Engineer queue label default (`'Tier 2'` vs `'Engineering'`) — owner-configurable anyway. - Exact look of the build-progress shimmer animation — design-system handoff. These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here. ### Note on scope and phasing This is a substantive feature: new role, four frontend pages, ~12 endpoints, AI tree-builder, three KB connectors, escalation extensions, and six migrations. 
The implementation plan will almost certainly phase the work — a reasonable cut is:

- **Phase 1:** role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
- **Phase 2:** `kb_documents` schema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually-uploaded KB.
- **Phase 3:** the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained — can ship one at a time and reorder if a connector blocks.

Phasing is a plan-level decision; the spec captures the full feature.

---

*End of spec.*
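
Illustrative sketch for the Phase 2 match-or-build pipeline: the four outcomes named in the backend test list (flow-match, proposal-match, built, aborted_no_kb) can be pictured as a small dispatcher. This is a minimal sketch under stated assumptions, not the planned implementation. Only the outcome names and the initial 0.75 threshold come from the spec; the `Candidate` type, the precomputed `score` field, the `>=` comparison, the function signature, and the "authored flows before validated proposals" ordering are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence

# Initial default from the spec; expected to be tuned from telemetry post-launch.
MATCH_THRESHOLD = 0.75


class Outcome(str, Enum):
    FLOW_MATCH = "flow-match"
    PROPOSAL_MATCH = "proposal-match"
    BUILT = "built"
    ABORTED_NO_KB = "aborted_no_kb"


@dataclass(frozen=True)
class Candidate:
    """Hypothetical match candidate; `score` is a precomputed similarity in [0, 1]."""

    id: str
    score: float


def _best(candidates: Sequence[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the highest-scoring candidate at or above the threshold, if any."""
    eligible = [c for c in candidates if c.score >= threshold]
    return max(eligible, key=lambda c: c.score) if eligible else None


def match_or_build(
    flows: Sequence[Candidate],
    validated_proposals: Sequence[Candidate],
    kb_doc_count: int,
    threshold: float = MATCH_THRESHOLD,
) -> tuple[Outcome, Optional[str]]:
    """Pick one of the four paths; returns (outcome, matched id or None)."""
    # 1. Prefer an existing authored flow (assumed priority order).
    flow = _best(flows, threshold)
    if flow is not None:
        return Outcome.FLOW_MATCH, flow.id

    # 2. Otherwise reuse an outcome-validated AI draft.
    proposal = _best(validated_proposals, threshold)
    if proposal is not None:
        return Outcome.PROPOSAL_MATCH, proposal.id

    # 3. Refuse to build when the account KB is empty.
    if kb_doc_count == 0:
        return Outcome.ABORTED_NO_KB, None

    # 4. Otherwise a fresh tree would be built (the build itself is out of scope here).
    return Outcome.BUILT, None
```

Keeping the decision a pure function of scores and KB size would let the "covers all four paths" unit test run without an embedding store or an Anthropic call; whether the real module is factored this way is a plan-level choice.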