From d1cf77cd415c2fb996e0c98046a994349a975a17 Mon Sep 17 00:00:00 2001
From: Michael Chihlas
Date: Thu, 28 May 2026 03:33:32 -0400
Subject: [PATCH] docs(design): L1 workspace feature spec

New seat tier between engineer and viewer. Dedicated /l1 surface
(dashboard + walker + drafts) for first-call helpdesk staff.
Walk-in intake + PSA queue both produce tickets. Match-or-build
pipeline prefers authored flows, then outcome-validated AI drafts,
then builds fresh from KB. Three KB connectors: IT Glue, Hudu,
SharePoint/OneDrive. Escalation via package + PSA reassign, picked
up in chat. Engineer coverage via per-user can_cover_l1 flag with
audit-log tagging.

Co-Authored-By: Claude Opus 4.7
---
 .../specs/2026-05-28-l1-workspace-design.md | 717 ++++++++++++++++++
 1 file changed, 717 insertions(+)
 create mode 100644 docs/superpowers/specs/2026-05-28-l1-workspace-design.md

diff --git a/docs/superpowers/specs/2026-05-28-l1-workspace-design.md b/docs/superpowers/specs/2026-05-28-l1-workspace-design.md
new file mode 100644
index 00000000..6775ef7b
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-28-l1-workspace-design.md
@@ -0,0 +1,717 @@
+# L1 Workspace — Design Spec
+
+**Date:** 2026-05-28
+**Status:** Draft (pending implementation plan)
+**Audience for this doc:** engineers + reviewers building the L1 workspace feature
+
+---
+
+## 1. Summary
+
+Introduce a dedicated **L1 helpdesk** workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.
+
+This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.
+
+---
+
+## 2. Motivation
+
+The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (`AssistantChatPage` mounted at `/pilot`). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:
+
+- Take inbound phone calls and need a fast, deterministic click-through UX
+- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
+- Are not expected to resolve complex issues themselves; they hand off to engineers
+
+A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.
+
+---
+
+## 3. Users & roles
+
+### 3.1 Role hierarchy
+
+`super_admin > owner > engineer > l1_tech > viewer`
+
+`l1_tech` is added to the `account_role` enum. Permissions are enforced via `app/core/permissions.py` and `app/api/deps.py`.
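The hierarchy above and the coverage rule in §3.4 can be expressed as two small predicates. Below is a minimal sketch that assumes roles arrive as plain strings and `can_cover_l1` is a boolean on the user. `ROLE_RANK`, `User`, `at_least`, `may_use_l1_surface`, and `is_coverage_action` are hypothetical names and do not reflect the actual contents of `app/core/permissions.py`. `is_coverage_action` also encodes an assumption: any permitted non-`l1_tech` user on `/l1/*` counts as coverage, which matches the banner condition in §7.7.

```python
from dataclasses import dataclass

# Hypothetical rank table mirroring `super_admin > owner > engineer > l1_tech > viewer`.
ROLE_RANK = {"viewer": 0, "l1_tech": 1, "engineer": 2, "owner": 3, "super_admin": 4}


@dataclass(frozen=True)
class User:
    role: str
    can_cover_l1: bool = False  # mirrors users.can_cover_l1


def at_least(user: User, minimum: str) -> bool:
    """Rank comparison, e.g. a require_l1_or_above style check."""
    return ROLE_RANK[user.role] >= ROLE_RANK[minimum]


def may_use_l1_surface(user: User) -> bool:
    """l1_tech | (engineer AND can_cover_l1) | owner | super_admin."""
    if user.role in ("l1_tech", "owner", "super_admin"):
        return True
    return user.role == "engineer" and user.can_cover_l1


def is_coverage_action(user: User) -> bool:
    """Assumption: permitted non-l1_tech users on /l1/* act as coverage (acting_as='l1_coverage')."""
    return may_use_l1_surface(user) and user.role != "l1_tech"
```

The coverage check has to be an explicit predicate rather than a rank comparison because `can_cover_l1` is orthogonal to rank. An engineer outranks `l1_tech` but is still excluded unless the flag is set.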
+
+### 3.2 What L1 can do
+
+- Use the `/l1/*` surface
+- Open tickets from their queue (PSA-fed or internal)
+- Intake walk-in/phone-call problems (creates a ticket as a side effect)
+- Walk authored flows and AI-built FlowProposal drafts
+- Resolve or escalate a session
+- View their own AI drafts list (read-only — outcome tags shown)
+
+### 3.3 What L1 cannot do
+
+- See the chat surface (`/pilot`) — sidebar hidden, route 403s
+- Author or edit flows
+- See `/review-queue` or `/escalations` (engineer inboxes)
+- See team analytics (only `/analytics/me`)
+- Promote AI drafts (engineers/owners only, via existing review queue)
+- Configure KB connectors (owner-only)
+
+### 3.4 Engineer L1 coverage
+
+Engineers do NOT see the L1 surface by default. Owners can toggle `users.can_cover_l1 = true` on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in `/l1/*` with a sticky banner: *"Covering L1 — actions logged as coverage."* Coverage actions are audit-logged with `acting_as = 'l1_coverage'`.
+
+Backend dep: `require_l1_or_coverage` = `l1_tech | (engineer AND can_cover_l1) | owner | super_admin`.
+
+This mirrors the existing orthogonal-flag pattern (`is_team_admin`) — no new architectural concept.
+
+### 3.5 Billing data model
+
+- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0` (new column)
+- Existing `accounts.seats_purchased` continues to represent engineer seats
+- New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band
+
+---
+
+## 4. Architecture overview
+
+### 4.1 New components
+
+**Frontend:**
+- `pages/l1/L1Dashboard.tsx` — landing page; ticket queue + describe-the-problem intake
+- `pages/l1/L1WalkPage.tsx` — purpose-built walker; yes/no cards, transcript, persistent escalate/resolve
+- `pages/l1/L1DraftsPage.tsx` — read-only list of the L1's AI drafts and promotion status
+- `pages/l1/L1TicketsPage.tsx` — full-page queue (PSA + internal merged)
+- `components/l1/L1CoverageBanner.tsx` — slim banner shown to engineer-coverers
+
+**Backend:**
+- `services/match_or_build.py` — orchestrator (RAG match → fallback to AI build)
+- `services/ai_tree_builder.py` — real-time AI tree generation via Anthropic
+- `services/kb_connectors/` package — base, registry, encryption, plus `itglue.py`, `hudu.py`, `microsoft_graph.py`
+- `services/kb_ingestion_writer.py` — shared writer used by manual upload + all connectors
+- `services/kb_ingestion_scheduler.py` — APScheduler job, `max_instances=1`, per-connector sync
+- `services/internal_ticket_service.py` — CRUD + status transitions for the no-PSA fallback
+- `services/l1_session_service.py` — walking-session lifecycle
+- `api/endpoints/l1.py` — L1-role endpoints
+- `api/endpoints/kb_connectors.py` — KB connector config endpoints (owner-only for write)
+
+**Reused / extended:**
+- `services/rag_service.py` — flow & KB matching (existing)
+- `services/flow_matching_engine.py` — existing
+- `services/escalation_package_generator.py` — extended to include walked path, AI draft pointer, KB citations
+- `models/FlowProposal` — new columns (see §5)
+- `services/psa/` — already supports ticket create + reassign across CW/Autotask/HaloPSA
+- `services/embedding_service.py` — used by KB ingestion writer
+- New `kb_documents` + `kb_document_chunks` tables for RAG-retrievable document storage, separate from the existing `kb_imports` (which is a document→tree conversion record, not a persistent KB store — see §5)
+- Audit log writer — gains `acting_as` field
+
+### 4.2 Data flow — walk-in / phone-call intake
+
+```
+L1 types: "User can't connect Outlook after password reset"
+  POST /api/v1/l1/intake
+    body: { problem_statement, customer_name?, customer_contact? }
+    → create ticket
+        - PSA if configured: psa_provider.create_ticket(...)
+        - else: internal_tickets row
+    → match_or_build(account_id, problem_text, ticket_ref)
+        → rag_service.match_flows(...) → top hit; if score ≥ threshold return as 'flow'
+        → rag_service.match_proposals(... where validated_by_outcome=true)
+            → top hit; if score ≥ threshold return as 'proposal'
+        → ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
+            → persist FlowProposal(source='ai_realtime_l1',
+                                   linked_ticket_id,
+                                   linked_ticket_kind,
+                                   validated_by_outcome=false)
+            → return as 'proposal'
+    → l1_session_service.start(...)
+    → return { session_id, target_kind, target_id, intake_type }
+  → navigate to /l1/walk/{session_id}
+```
+
+### 4.3 Data flow — PSA-queue intake
+
+The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls `POST /api/v1/l1/tickets/{ticket_ref}/start`, which runs the same `match_or_build` path (using the ticket subject + description as `problem_statement`) and then navigates to the walker.
+
+---
+
+## 5. Data model
+
+All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are `TIMESTAMPTZ`. No `--rev-id` on Alembic; no `--autogenerate` for enum/RLS work.
+
+### 5.1 `FlowProposal` — extended
+
+Existing AI-draft model. Add columns:
+
+| Column | Type | Notes |
+|---|---|---|
+| `source` | `VARCHAR(30) NOT NULL` | `'ai_realtime_l1' \| 'kb_accelerator' \| 'manual_draft'`. Backfill existing rows to `'manual_draft'`. |
+| `linked_ticket_id` | `VARCHAR(64) NULL` | PSA id or internal_tickets UUID (stored as text) |
+| `linked_ticket_kind` | `VARCHAR(10) NULL` | `'psa' \| 'internal'` |
+| `validated_by_outcome` | `BOOLEAN NOT NULL DEFAULT FALSE` | Flipped to true when L1 resolves and marks helpful=true |
+| `walked_path_snapshot` | `JSONB NULL` | Frozen at resolve/escalate; shape `[{node_id, question, answer, l1_note}]` |
+
+Engineer review queue sort:
+```sql
+ORDER BY validated_by_outcome DESC, created_at DESC
+```
+
+### 5.2 `internal_tickets` — new
+
+```
+id                      UUID PRIMARY KEY
+account_id              UUID NOT NULL (RLS-scoped)
+created_by_user_id      UUID NOT NULL (the L1 who took the call)
+customer_name           VARCHAR(120)
+customer_contact        VARCHAR(200) NULL (email or phone, free text)
+problem_statement       TEXT NOT NULL
+status                  VARCHAR(30) NOT NULL  -- 'open' | 'walking' | 'resolved' | 'escalated'
+flow_id                 UUID NULL FK trees
+flow_proposal_id        UUID NULL FK flow_proposals
+ai_session_id           UUID NULL FK ai_sessions (set when engineer picks up in chat post-escalation)
+assigned_user_id        UUID NULL (engineer post-escalation)
+resolution_notes        TEXT NULL
+psa_promoted_ticket_id  VARCHAR(64) NULL (set if later promoted to PSA)
+created_at              TIMESTAMPTZ NOT NULL
+updated_at              TIMESTAMPTZ NOT NULL
+resolved_at             TIMESTAMPTZ NULL
+```
+
+RLS: account-scoped, WITH CHECK on insert/update.
+
+### 5.3 `kb_connector_configs` — new
+
+```
+id                     UUID PRIMARY KEY
+account_id             UUID NOT NULL (RLS-scoped)
+provider               VARCHAR(20) NOT NULL   -- 'itglue' | 'hudu' | 'microsoft_graph'
+display_name           VARCHAR(80) NOT NULL
+credentials_encrypted  BYTEA NOT NULL         -- Fernet, same pattern as services/psa/encryption.py
+is_active              BOOLEAN NOT NULL DEFAULT TRUE
+sync_interval_minutes  INTEGER NOT NULL DEFAULT 360
+last_sync_at           TIMESTAMPTZ NULL
+last_sync_status       VARCHAR(20) NULL       -- 'success' | 'error' | 'running'
+last_sync_error        TEXT NULL
+created_by_user_id     UUID NOT NULL
+created_at             TIMESTAMPTZ NOT NULL
+updated_at             TIMESTAMPTZ NOT NULL
+UNIQUE (account_id, provider, display_name)
+```
+
+RLS: account-scoped, WITH CHECK.
+
+### 5.4 New tables: `kb_documents` + `kb_document_chunks`
+
+The existing `kb_imports` table is a document→tree conversion record (status lifecycle `processing | ready | committed | failed`, target `tree_id`) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval.
+
+The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:
+
+**`kb_documents`** — one row per ingested document:
+
+```
+id                   UUID PRIMARY KEY
+account_id           UUID NOT NULL (RLS-scoped)
+source_kind          VARCHAR(20) NOT NULL   -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
+source_ref           VARCHAR(200) NULL      -- provider-side document ID for re-sync
+connector_config_id  UUID NULL FK kb_connector_configs
+title                VARCHAR(500) NOT NULL
+content              TEXT NOT NULL          -- full post-extraction text
+content_hash         VARCHAR(64) NOT NULL   -- sha256 for change-detection
+metadata             JSONB NULL             -- provider-specific (org_id, drive_id, etc.)
+last_synced_at       TIMESTAMPTZ NULL
+deleted_at           TIMESTAMPTZ NULL       -- soft-delete on connector removal
+created_at           TIMESTAMPTZ NOT NULL
+updated_at           TIMESTAMPTZ NOT NULL
+```
+
+Unique partial index: `(connector_config_id, source_ref) WHERE source_ref IS NOT NULL`.
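The `content_hash` column lets a re-sync skip documents whose text has not changed. Below is a minimal sketch. It assumes the ingestion writer can look up the previously stored hash by `(connector_config_id, source_ref)`, the uniqueness key above. `plan_upsert` and its `'insert'`/`'update'`/`'skip'` labels are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """sha256 hex digest: 64 characters, which fits content_hash VARCHAR(64)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_upsert(existing: dict[tuple[str, str], str], connector_config_id: str,
                source_ref: str, content: str) -> str:
    """Decide what the writer should do with one fetched document.

    `existing` maps (connector_config_id, source_ref) -> stored content_hash.
    Returns 'insert' (new row), 'update' (text changed: re-chunk and re-embed)
    or 'skip' (identical text: no embedding calls needed).
    """
    key = (connector_config_id, source_ref)
    if key not in existing:
        return "insert"
    return "skip" if existing[key] == content_hash(content) else "update"
```

Identical text under a different `connector_config_id` still yields `'insert'`. This is consistent with §9.2, where the same document in two connectors produces two rows.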
+
+**`kb_document_chunks`** — chunks with embeddings, used by `rag_service.match_kb_chunks`:
+
+```
+id            UUID PRIMARY KEY
+document_id   UUID NOT NULL FK kb_documents ON DELETE CASCADE
+account_id    UUID NOT NULL        -- denormalized for RLS
+chunk_index   INTEGER NOT NULL
+content       TEXT NOT NULL
+embedding     VECTOR() NOT NULL    -- dim matches embedding_service
+metadata      JSONB NULL           -- section title, page number, etc.
+created_at    TIMESTAMPTZ NOT NULL
+UNIQUE (document_id, chunk_index)
+```
+
+pgvector index (ivfflat or hnsw) on `embedding`; choice tuned during implementation.
+
+RLS on both tables: account-scoped, WITH CHECK on insert.
+
+**Coexistence with `kb_imports`:** when an L1 (or owner) uploads a doc, the system can populate **both** — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into `kb_documents` and `kb_document_chunks` for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to `kb_documents` (no auto-tree-conversion from synced docs in v1).
+
+### 5.5 Other column additions
+
+- `users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSE`
+- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0`
+- `audit_logs.acting_as VARCHAR(30) NULL` — `'l1_coverage'` when engineer is in coverage mode; null otherwise
+- `account_role` enum: add `'l1_tech'`
+
+### 5.6 Migration ordering
+
+Six manual Alembic revisions (no `--rev-id`, no `--autogenerate`):
+
+1. Add `'l1_tech'` to `account_role` enum.
+2. Add `users.can_cover_l1`, `accounts.l1_seats_purchased`, `audit_logs.acting_as`.
+3. Extend `flow_proposals` with new columns + backfill existing rows to `source='manual_draft'`.
+4. Create `internal_tickets` + RLS policies (account-scoped, WITH CHECK).
+5. Create `kb_connector_configs` + RLS policies.
+6. Create `kb_documents` + `kb_document_chunks` tables + RLS policies + pgvector index on chunks.
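pgvector runs the nearest-neighbour search inside Postgres. The retrieval rules implied by §5.4 can still be shown with an in-memory brute-force stand-in: same account only, and no chunks whose parent document is soft-deleted. This is a minimal sketch. The `Chunk` shape and `top_k_chunks` are hypothetical. Cosine similarity is assumed because §8.1 states the match threshold in cosine terms, and `k=8` follows the `match_kb_chunks` call in §8.1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    id: str
    account_id: str
    embedding: tuple[float, ...]
    document_deleted: bool = False  # parent kb_documents row has deleted_at set


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(chunks: list[Chunk], account_id: str,
                 query: tuple[float, ...], k: int = 8) -> list[tuple[float, Chunk]]:
    """Brute-force stand-in for an ivfflat/hnsw query over kb_document_chunks."""
    # Assumed: in the database this filtering comes from RLS plus a
    # deleted_at IS NULL condition on the parent document.
    visible = [c for c in chunks if c.account_id == account_id and not c.document_deleted]
    scored = [(cosine(query, c.embedding), c) for c in visible]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

An approximate index (ivfflat, hnsw) trades some recall for speed. A brute-force version like this one is still handy as a ground-truth oracle when tuning index parameters.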
+
+Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass `account_id=` explicitly. Grep all `Model(` sites before merge.
+
+---
+
+## 6. Backend services & endpoints
+
+### 6.1 New services
+
+| Module | Purpose |
+|---|---|
+| `services/match_or_build.py` | Orchestrator. Single async entrypoint `match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult`. |
+| `services/ai_tree_builder.py` | Real-time AI tree generation. Anthropic via existing `_call_anthropic_cached` pattern. Model tier via `settings.get_model_for_action('l1_realtime_build')`. Output validated against the flow node schema with Pydantic; rejects malformed output. |
+| `services/kb_connectors/base.py` | Abstract `KBConnector` with `test_credentials`, `list_documents`, `fetch_content`, `subscribe_to_changes` (optional). |
+| `services/kb_connectors/itglue.py` | IT Glue REST client. |
+| `services/kb_connectors/hudu.py` | Hudu REST client. |
+| `services/kb_connectors/microsoft_graph.py` | Microsoft Graph (SharePoint/OneDrive) client. |
+| `services/kb_connectors/registry.py` | `KBConnectorRegistry` (mirrors `PsaProviderRegistry`). |
+| `services/kb_connectors/encryption.py` | Fernet wrapper (or reuse the PSA one if generic). |
+| `services/kb_ingestion_writer.py` | Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
+| `services/kb_ingestion_scheduler.py` | APScheduler interval job, `max_instances=1`. Sequential per account; concurrency cap = 4 accounts simultaneously. |
+| `services/internal_ticket_service.py` | CRUD + status transitions for `internal_tickets`. |
+| `services/l1_session_service.py` | Walking-session lifecycle: start, step, resolve, escalate. Bridges `ai_sessions` and the walked target. |
+
+### 6.2 Extended services
+
+- `services/escalation_package_generator.py` — adds inputs: `walked_path`, `ai_draft_proposal_id`, `kb_citations`. New caller path from `l1_session_service.escalate(...)`.
+- KB Accelerator endpoint — accepts ingested content via the shared `kb_ingestion_writer`. Manual upload and connector sync share the same persistence path.
+
+### 6.3 New endpoints
+
+All under `require_l1_or_coverage` unless noted. Mounted under `/api/v1/l1`.
+
+| Method | Path | Purpose | Auth |
+|---|---|---|---|
+| GET | `/l1/queue` | Merged ticket queue (PSA + internal). Pagination + status filter. | `require_l1_or_coverage` |
+| POST | `/l1/intake` | Walk-in intake. Body `{problem_statement, customer_name?, customer_contact?}`. Creates ticket, returns `{session_id, target_kind, target_id, intake_type}`. | `require_l1_or_coverage` |
+| POST | `/l1/tickets/{ticket_ref}/start` | Start walker from an existing ticket. Internally same as intake but skips ticket creation. | `require_l1_or_coverage` |
+| POST | `/l1/sessions/{id}/step` | Record an answer. Body `{node_id, answer, note?}`. Appends to `walked_path_snapshot`. | `require_l1_or_coverage` |
+| POST | `/l1/sessions/{id}/resolve` | Close as resolved. Body `{resolution_notes, helpful: bool}`. Sets `validated_by_outcome=true` on the proposal when `helpful=true` AND target was a proposal. Closes the ticket. | `require_l1_or_coverage` |
+| POST | `/l1/sessions/{id}/escalate` | Generate escalation package + reassign ticket. Body `{reason, reason_category}`. | `require_l1_or_coverage` |
+| GET | `/l1/drafts` | List current user's AI drafts with promotion status. | `require_l1_or_coverage` |
+
+KB connector endpoints (`/api/v1/kb-connectors`):
+
+| Method | Path | Purpose | Auth |
+|---|---|---|---|
+| GET | `/kb-connectors` | List configured connectors for account. | `require_l1_or_above` |
+| POST | `/kb-connectors` | Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | `require_account_owner` |
+| DELETE | `/kb-connectors/{id}` | Remove (soft-disable). | `require_account_owner` |
+| POST | `/kb-connectors/{id}/sync` | Trigger immediate sync (enqueued). | `require_account_owner` |
+| GET | `/kb-connectors/{id}/status` | Sync status + doc count + last error. | `require_l1_or_above` |
+
+Internal ticket endpoints (`/api/v1/internal-tickets`):
+
+| Method | Path | Purpose | Auth |
+|---|---|---|---|
+| GET | `/internal-tickets` | List (account-scoped). | `require_l1_or_coverage` |
+| GET | `/internal-tickets/{id}` | Detail. | `require_l1_or_coverage` |
+| POST | `/internal-tickets/{id}/promote-to-psa` | Push to configured PSA, set `psa_promoted_ticket_id`. | `require_account_owner` |
+
+User management addition:
+
+| Method | Path | Purpose | Auth |
+|---|---|---|---|
+| PATCH | `/users/{id}/coverage` | Set `can_cover_l1` flag. Body `{can_cover_l1: bool}`. | `require_account_owner` |
+
+---
+
+## 7. Frontend surface
+
+### 7.1 Sidebar — L1 view
+
+```
+LOGO
+─────────────
+Workspace        /l1
+Tickets          /l1/tickets
+My Drafts        /l1/drafts
+─────────────
+Guides           /guides
+Account          /account  (filtered — no integrations, no categories)
+```
+
+No `/pilot`, no `/trees`, no `/flows`, no `/review-queue`, no `/escalations`, no team analytics. Sidebar.tsx picks the nav array by role.
+
+### 7.2 Sidebar — engineer coverage view
+
+Engineer's existing sidebar plus a single appended entry "L1 Workspace" → `/l1`. Shown when `canCoverL1 || isOwner || isSuperAdmin`.
+
+### 7.3 `/l1` dashboard layout
+
+Up to four vertical zones (the fourth is conditional), single column, max width ~1100px:
+
+1. **Greeting** — uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
+2. **Describe the problem** card — large textarea (autofocus on load), optional `customer_name` + `customer_contact` fields, single primary CTA "Start walk →" (the only electric-blue element on the page)
+3. **Open tickets** — section label, count, table rows (merged PSA + internal with origin badges), row hover `bg-elevated`
+4. **Resume in progress** — shown only when L1 has a half-walked session
+
+Tailwind v4 tokens: `bg-page` base, `bg-card` zones, `bg-elevated` row hover, electric-blue accent only on primary CTA. No `text-secondary`. All borders `border-default`.
+
+### 7.4 `/l1/walk/{sessionId}` walker
+
+Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs `flex` + `flex-1` + `min-h-0`).
+
+**Header:**
+- Back arrow + ticket ref + customer name + AI-built badge (when target is proposal)
+- Problem statement line
+- Persistent action buttons: `[ Escalate ]` `[ Resolve ✓ ]`
+
+**Left pane (main):**
+- "Step N · estimated M" label
+- Current node card — large yes/no/answer buttons (min 44px tap target)
+- Optional note textarea below the card (appended to `walked_path_snapshot`)
+- On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"
+
+**Right pane (transcript):**
+- Walked-so-far list (node title + answer chosen)
+- Current step highlight
+- "Source:" section listing KB citations for the current node (proposal walks only)
+
+**Resolve modal:**
+- "Did this resolve it?" `[ Yes ]` `[ No ]`
+- Resolution notes textarea
+- Yes + target was proposal → sets `validated_by_outcome=true`
+- No → prompt to escalate instead
+
+**Escalate modal:**
+- Reason category dropdown: *Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · Other*
+- Free-text reason
+- Confirm
+
+### 7.5 `/l1/drafts` page
+
+Read-only list, columns: `created` · `problem (truncated)` · `ticket #` · `status` (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.
+
+### 7.6 `/l1/tickets` page
+
+Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.
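`GET /l1/queue` and the `/l1/tickets` page both show PSA and internal tickets as one list with an origin badge, with filters for status, origin, and assigned-to-me. Below is a minimal sketch of the merge step. It assumes both sources have already been normalized into a common `QueueItem` shape (hypothetical). It also assumes a newest-first ordering, which the spec does not state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional

Origin = Literal["psa", "internal"]


@dataclass(frozen=True)
class QueueItem:
    ticket_ref: str
    origin: Origin  # drives the PSA / Internal badge
    status: str
    assigned_user_id: Optional[str]
    created_at: datetime


def merged_queue(psa_items: list[QueueItem], internal_items: list[QueueItem], *,
                 status: Optional[str] = None, origin: Optional[Origin] = None,
                 assigned_to: Optional[str] = None,
                 limit: int = 50, offset: int = 0) -> list[QueueItem]:
    """Merge both sources, apply the optional filters, then paginate."""
    items = [*psa_items, *internal_items]
    if status is not None:
        items = [i for i in items if i.status == status]
    if origin is not None:
        items = [i for i in items if i.origin == origin]
    if assigned_to is not None:  # "assigned-to-me": pass the caller's user id
        items = [i for i in items if i.assigned_user_id == assigned_to]
    items.sort(key=lambda i: i.created_at, reverse=True)  # assumption: newest first
    return items[offset:offset + limit]
```

Pagination is applied after the merge so page boundaries do not depend on which source a ticket came from. The trade-off is that both sources must be fetched before slicing.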
+
+### 7.7 Coverage banner
+
+`` — slim ~32px band, info-cyan-dim background, mounted at the top of all `/l1/*` pages when `!isL1Tech && (canCoverL1 || isOwner || isSuperAdmin)`:
+
+```
+You're covering L1. Actions logged as coverage.   [Switch back →]
+```
+
+The "Switch back" link returns to `/`.
+
+### 7.8 Routing
+
+```tsx
+const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
+const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
+const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
+const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
+```
+
+Mounted under the `/` ProtectedRoute branch at:
+- `/l1` → `L1Dashboard`
+- `/l1/walk/:sessionId` → `L1WalkPage`
+- `/l1/drafts` → `L1DraftsPage`
+- `/l1/tickets` → `L1TicketsPage`
+
+Wrapped in `L1RouteGuard` (403 if not `l1_tech` AND not coverage-flagged). `ProtectedRoute.tsx` post-login redirect: L1 users land on `/l1` instead of `/`.
+
+`lazyWithRetry`, not `React.lazy` (per existing convention).
+
+---
+
+## 8. AI match-or-build pipeline
+
+### 8.1 Match-or-build algorithm
+
+```
+match_or_build(account_id, problem_text, ticket_ref):
+    embedding = embedding_service.embed(problem_text)
+
+    # 1. Match authored flows
+    flow_hits = rag_service.match_flows(account_id, embedding, k=5)
+    if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
+        return {kind: 'flow', id: flow_hits[0].flow_id, score: ...}
+
+    # 2. Match outcome-validated proposals only
+    proposal_hits = rag_service.match_proposals(
+        account_id, embedding, k=5,
+        where=validated_by_outcome=true,
+    )
+    if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
+        return {kind: 'proposal', id: proposal_hits[0].proposal_id, score: ...}
+
+    # 3. Build fresh
+    kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
+    if not kb_chunks:
+        raise BuildAbortedNoKB(
+            "Cannot build a tree with no KB content. "
+            "Upload docs or wait for a connector sync."
+        )
+    nearest_flows = flow_hits[:3]
+    proposal = ai_tree_builder.build(
+        problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
+    )
+    return {kind: 'proposal', id: proposal.id, score: None}
+```
+
+`MATCH_THRESHOLD` — per-account configurable; default `0.75` (cosine).
+
+The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes.
+
+### 8.2 AI tree-build details
+
+**Model:** `settings.get_model_for_action('l1_realtime_build')`. Recommend Sonnet for v1 (latency-sensitive).
+
+**Schema:** output validated against the existing flow node schema (matches `tree_editor` output). Validation failure aborts the build rather than persisting malformed data.
+
+**Prompt strategy** (per Lesson on prompt anti-parrot — critical):
+- System prompt: role definition + output schema using `` notation only. Never literal field values.
+- Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
+- User message: `{problem_statement}` + `{kb_context: [doc_title, section, content]}` + `{nearest_flow_summaries}` + instruction to cite KB chunks per node.
+- Output includes `kb_citations: [{node_id, kb_doc_id, snippet}]` for walker's "Source:" pane and engineer review.
+
+**Latency:** whole-tree-then-return (~5–15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.
+
+**Anthropic SDK config** (per Lesson): `max_retries=1`. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).
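The §8.1 pseudocode can be made executable by injecting the retrieval and build steps as callables. This also makes the four outcomes listed in §14.1 (flow match, proposal match, built, aborted) easy to unit-test. The sketch below rests on several assumptions. `Hit`, `MatchOrBuildResult`, the callable parameters, and the synchronous style are illustrative, not the real `rag_service` or `ai_tree_builder` signatures. The embedding step is folded into the callables. `MATCH_THRESHOLD` uses the 0.75 default from §8.1.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75  # per-account configurable in the spec; cosine default


class BuildAbortedNoKB(Exception):
    """Raised instead of building from the model's general knowledge."""


@dataclass(frozen=True)
class Hit:
    id: str
    score: float


@dataclass(frozen=True)
class MatchOrBuildResult:
    kind: str  # 'flow' | 'proposal'
    id: str
    score: Optional[float]


def match_or_build(
    problem_text: str,
    match_flows: Callable[[str], Sequence[Hit]],
    match_validated_proposals: Callable[[str], Sequence[Hit]],
    match_kb_chunks: Callable[[str], Sequence[str]],
    build: Callable[[str, Sequence[str], Sequence[Hit]], str],
    threshold: float = MATCH_THRESHOLD,
) -> MatchOrBuildResult:
    # 1. Authored flows win when the best hit clears the threshold.
    flow_hits = sorted(match_flows(problem_text), key=lambda h: h.score, reverse=True)
    if flow_hits and flow_hits[0].score >= threshold:
        return MatchOrBuildResult("flow", flow_hits[0].id, flow_hits[0].score)

    # 2. Only outcome-validated proposals are eligible for reuse.
    proposal_hits = sorted(match_validated_proposals(problem_text),
                           key=lambda h: h.score, reverse=True)
    if proposal_hits and proposal_hits[0].score >= threshold:
        return MatchOrBuildResult("proposal", proposal_hits[0].id, proposal_hits[0].score)

    # 3. Build fresh, but never without KB grounding.
    kb_chunks = match_kb_chunks(problem_text)
    if not kb_chunks:
        raise BuildAbortedNoKB("Cannot build a tree with no KB content. "
                               "Upload docs or wait for a connector sync.")
    proposal_id = build(problem_text, kb_chunks, flow_hits[:3])  # nearest flows as context
    return MatchOrBuildResult("proposal", proposal_id, None)
```

Passing `threshold` as a parameter leaves room for the per-account override without touching the control flow.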
+
+**Telemetry:**
+- `l1.match_or_build.duration_ms`, `l1.match_or_build.outcome` (`flow_match`/`proposal_match`/`built`/`aborted_no_kb`)
+- `anthropic.cache` events (existing pattern) tagged `action=l1_realtime_build`
+- `l1.tree_build.tokens_in`, `tokens_out`
+
+**Anti-parrot guardrail:** the existing `tests/test_prompt_anti_parrot.py` auto-discovers new prompt constants via pattern match on `*_PROMPT` / `*_SCHEMA` / `*_PROTOCOL` / `*_FORMAT`. No new test required.
+
+### 8.3 Hallucinated-citation defense
+
+After build, the writer verifies every `kb_doc_id` in `kb_citations` exists in the account's KB. Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.
+
+---
+
+## 9. KB ingestion
+
+### 9.1 Connector interface
+
+```python
+from __future__ import annotations  # KBDocRef, KBDocContent, ChangeEvent are defined elsewhere
+from abc import ABC, abstractmethod
+from collections.abc import AsyncIterator
+from datetime import datetime
+class KBConnector(ABC):
+    @abstractmethod
+    async def test_credentials(self) -> bool: ...
+    @abstractmethod
+    def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]: ...  # subclasses implement as async generators
+    @abstractmethod
+    async def fetch_content(self, ref: KBDocRef) -> KBDocContent: ...
+    def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent]: ...  # optional hook; unused (no-op) in v1
+```
+
+Registry dispatches by `provider` string. Credentials encrypted at rest via Fernet (reuse `services/psa/encryption.py` pattern).
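The next sketch shows how a connector with the §9.1 method names and a provider-string registry fit together, using an in-memory test double. Several pieces are hypothetical: the `KBDocRef`/`KBDocContent` fields, the `register`/`create` registry API, the `collect` helper, and the `'memory'` provider. The real `KBConnectorRegistry` is only described as mirroring `PsaProviderRegistry`. The block does not subclass the spec's `KBConnector`, so it stays self-contained.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class KBDocRef:  # hypothetical fields
    source_ref: str
    updated_at: datetime


@dataclass(frozen=True)
class KBDocContent:
    ref: KBDocRef
    title: str
    text: str


class InMemoryConnector:
    """Test double with KBConnector's method names (subscribe_to_changes omitted)."""

    def __init__(self, docs: dict[str, tuple[str, str, datetime]]):
        self._docs = docs  # source_ref -> (title, text, updated_at)

    async def test_credentials(self) -> bool:
        return True

    async def list_documents(self, since: Optional[datetime]) -> AsyncIterator[KBDocRef]:
        # Async generator: yield only documents changed after `since` (all when None).
        for ref, (_, _, updated) in sorted(self._docs.items()):
            if since is None or updated > since:
                yield KBDocRef(ref, updated)

    async def fetch_content(self, ref: KBDocRef) -> KBDocContent:
        title, text, _ = self._docs[ref.source_ref]
        return KBDocContent(ref, title, text)


class ConnectorRegistry:
    """Dispatch by provider string, in the spirit of KBConnectorRegistry."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[..., object]] = {}

    def register(self, provider: str, factory: Callable[..., object]) -> None:
        self._factories[provider] = factory

    def create(self, provider: str, **config):
        if provider not in self._factories:
            raise KeyError(f"unknown KB provider: {provider}")
        return self._factories[provider](**config)


async def collect(connector, since: Optional[datetime] = None) -> list[KBDocContent]:
    """list_documents -> fetch_content, as the sync loop in §9.3 does per connector."""
    return [await connector.fetch_content(ref) async for ref in connector.list_documents(since)]


if __name__ == "__main__":
    registry = ConnectorRegistry()
    registry.register("memory", InMemoryConnector)
    ts = datetime(2026, 5, 1, tzinfo=timezone.utc)
    conn = registry.create("memory", docs={"kb-1": ("VPN reset", "Reconnect the client...", ts)})
    print([d.title for d in asyncio.run(collect(conn))])  # ['VPN reset']
```

A double like this lets the ingestion writer and scheduler be tested without recorded HTTP fixtures. The per-provider fixture tests in §14.1 remain necessary for the real clients.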
+
+### 9.2 Per-connector specifics
+
+| | IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) |
+|---|---|---|---|
+| Auth | API token (header) | API key (header) | OAuth 2.0 |
+| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
+| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching `(secret\|confidential\|private)` heuristic; files with a tenant Sensitivity Label |
+| Filtering | Per-org (techs see all client orgs they have permission to) | Per-folder | Per-site / per-drive (owner picks at config time) |
+| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |
+
+All three deliver content to `kb_ingestion_writer` which:
+1. Chunks (paragraph-aware, configurable size with overlap)
+2. Embeds via `embedding_service`
+3. Upserts into `kb_documents` keyed on `(connector_config_id, source_ref)`; chunks into `kb_document_chunks`
+
+Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped `source_ref`). Engineers can dedup manually if needed.
+
+### 9.3 Sync scheduling
+
+`kb_ingestion_scheduler.py` runs as an APScheduler interval job, `max_instances=1`. Per cycle:
+1. Query active `kb_connector_configs` where `last_sync_at` is null (never synced) or older than `sync_interval_minutes` (default 360 = 6h).
+2. Dispatch per account; concurrency cap = 4 simultaneous accounts.
+3. For each connector: `list_documents(since=last_sync_at)` → for each ref, `fetch_content` → write.
+4. Compute the diff between current refs and existing rows (same `connector_config_id`); soft-delete missing ones via `deleted_at`. This diff needs the full set of current refs: an incremental `since=` listing omits unchanged documents, so it cannot on its own detect deletions.
+5. Update `last_sync_at`, `last_sync_status`, `last_sync_error`.
+
+Must use `_admin_session_factory()` not `get_db()` for startup-side and scheduler-side queries (per Lesson on RLS at startup — no `app.current_account_id` set).
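Steps 1, 2 and 4 of the sync cycle need nothing beyond the standard library. Step 1 is a due check against `sync_interval_minutes`, step 2 is an `asyncio.Semaphore` that caps concurrent accounts at 4, and step 4 is a set difference yielding the refs to soft-delete. This is a minimal sketch. `Config`, `is_due`, `refs_to_soft_delete`, `sync_cycle`, and the injected `sync_account` coroutine are hypothetical. The real job would run under APScheduler with `max_instances=1` rather than being awaited directly.

```python
import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Awaitable, Callable, Optional

MAX_CONCURRENT_ACCOUNTS = 4


@dataclass
class Config:
    id: str
    account_id: str
    is_active: bool
    sync_interval_minutes: int = 360
    last_sync_at: Optional[datetime] = None


def is_due(cfg: Config, now: datetime) -> bool:
    """Step 1: active configs that never synced, or whose last sync is older than the interval."""
    if not cfg.is_active:
        return False
    if cfg.last_sync_at is None:
        return True
    return now - cfg.last_sync_at >= timedelta(minutes=cfg.sync_interval_minutes)


def refs_to_soft_delete(existing_refs: set[str], current_refs: set[str]) -> set[str]:
    """Step 4: stored refs for one connector that the full upstream listing no longer has."""
    return existing_refs - current_refs


async def sync_cycle(configs: list[Config], now: datetime,
                     sync_account: Callable[[str, list[Config]], Awaitable[None]]) -> list[str]:
    """Step 2: group due configs by account; at most 4 accounts sync at once."""
    due_by_account: dict[str, list[Config]] = {}
    for cfg in configs:
        if is_due(cfg, now):
            due_by_account.setdefault(cfg.account_id, []).append(cfg)
    gate = asyncio.Semaphore(MAX_CONCURRENT_ACCOUNTS)

    async def run(account_id: str, cfgs: list[Config]) -> None:
        async with gate:
            # sync_account is expected to walk this account's connectors one at a time.
            await sync_account(account_id, cfgs)

    await asyncio.gather(*(run(a, c) for a, c in due_by_account.items()))
    return sorted(due_by_account)
```

The semaphore bounds how many accounts are in flight, not how many are queued. All due accounts are scheduled in one `gather`, and the fifth and later accounts simply wait at the gate.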
+
+Immediate sync via `POST /api/v1/kb-connectors/{id}/sync` enqueues a job; scheduler picks it up within ~30s.
+
+---
+
+## 10. Escalation flow
+
+1. L1 clicks **Escalate** → modal (reason category + optional free text).
+2. `POST /api/v1/l1/sessions/{id}/escalate` → backend:
+   - Calls extended `escalation_package_generator.generate(session_id, include_l1_walk=true)`. Package contents:
+     ```
+     problem_statement, customer_name, customer_contact,
+     ticket_ref (PSA id or internal id),
+     target_kind ('flow' | 'proposal'), target_id,
+     walked_path,
+     ai_draft_proposal_id,
+     kb_citations,
+     escalation_reason, reason_category, l1_user_id
+     ```
+   - Creates an `ai_session` with the package serialized into system context for the chat surface.
+   - If PSA-backed: `psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name)`. Defaults to `'Tier 2'`; owner-configurable in `/account/integrations`.
+   - If internal-backed: `internal_tickets.status='escalated'`, `assigned_user_id=null` (round-robin assignment is out of scope).
+   - Writes notification via existing `notification_service` — bell badge to all engineers in account.
+   - Audit log entry; `acting_as` reflects whether L1 or coverage-engineer escalated.
+3. Toast on L1 side, return to `/l1`.
+4. Engineer clicks notification → `/pilot/{sessionId}` → chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.
+
+**Un-escalate is out of scope.** If an engineer wants to bounce a ticket back, they reassign it in the PSA manually.
+
+---
+
+## 11. Internal ticket fallback
+
+When the account has no active PSA provider:
+- Intake creates `internal_tickets` row instead of a PSA ticket.
+- Queue surface merges PSA + internal with `Internal` / `PSA` origin badge.
+- Escalation flips `internal_tickets.status='escalated'` and leaves `assigned_user_id` null so any engineer can claim it (v1 behavior; round-robin assignment is out of scope).
+- Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.
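The resolve rule (§6.3, §7.4) and the escalation routing (§10 and the fallback above) are small enough to pin down as pure functions over session state. The sketch below makes several assumptions. `Session`, `resolve`, `escalate`, and the returned action labels are hypothetical. The real `l1_session_service` also writes audit logs, notifications, and the escalation package, none of which appear here.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Session:
    target_kind: Literal["flow", "proposal"]
    ticket_kind: Literal["psa", "internal"]
    proposal_validated: bool = False  # stands in for FlowProposal.validated_by_outcome
    ticket_status: str = "walking"
    assigned_user_id: Optional[str] = None


def resolve(session: Session, helpful: bool) -> str:
    """Only a helpful resolve of an AI proposal marks it validated_by_outcome."""
    if not helpful:
        return "prompt_escalate"  # resolve modal answered "No": offer escalation instead
    if session.target_kind == "proposal":
        session.proposal_validated = True
    session.ticket_status = "resolved"
    return "closed"


def escalate(session: Session, engineer_queue_name: str = "Tier 2") -> tuple[str, str]:
    """PSA tickets are reassigned to the engineer queue; internal ones wait unassigned."""
    if session.ticket_kind == "psa":
        return ("psa_reassign", engineer_queue_name)
    session.ticket_status = "escalated"
    session.assigned_user_id = None  # v1: any engineer may claim it
    return ("internal_escalated", "unassigned")
```

Walking an authored flow never flips the validation flag, even when the resolve is helpful. Only AI drafts need outcome evidence before promotion.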
+
+**Promote to PSA:** owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets `psa_promoted_ticket_id`. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.
+
+---
+
+## 12. Outcome-validation lifecycle
+
+```
+1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
+                                              validated_by_outcome=false,
+                                              linked_ticket_id=...)
+2. L1 walks → POST /l1/sessions/{id}/step appends to walked_path_snapshot
+3. L1 hits Resolve:
+     modal: "Did this resolve it?" [Yes] [No] + resolution_notes
+4. helpful=true  → flow_proposal.validated_by_outcome = true
+                 → walked_path_snapshot frozen
+                 → ticket closed (PSA or internal)
+   helpful=false → validated_by_outcome stays false
+                 → L1 prompted: "Escalate instead?"
+5. Engineer review queue:
+     ORDER BY validated_by_outcome DESC, created_at DESC
+     - Outcome-validated drafts surface first
+     - Promote / edit-and-promote / retire
+6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
+           → future match_or_build matches the new flow on the flow-match pass
+```
+
+---
+
+## 13. Out of scope (v1 non-goals)
+
+- End-user / self-service portal ("L0" tier).
+- Engineer warm-transfer / live take-over during a call.
+- L1 ↔ engineer real-time chat during a call.
+- Multi-language UI / customer-language toggle in walker.
+- Auto-promote internal tickets to PSA on integration install.
+- AI tree streaming (node-by-node).
+- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
+- Confluence connector.
+- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
+- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
+- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
+- Cancel/un-escalate flow.
+- Round-robin engineer assignment on internal-ticket escalations.
+
+---
+
+## 14. Testing strategy
+
+### 14.1 Backend (pytest)
+
+- Unit: `match_or_build` covers all four paths (flow-match, proposal-match, built, aborted_no_kb).
+- Unit: `ai_tree_builder` schema validation — assert rejection of malformed Anthropic output before persistence.
+- Unit: each connector's `list_documents` + `fetch_content` against recorded HTTP fixtures.
+- Integration: intake → walk → resolve(helpful=true) → assert `FlowProposal.validated_by_outcome=true`, ticket closed.
+- Integration: intake → walk → escalate → assert PSA `reassign_ticket` invoked, `ai_session` created with package, audit log entry, notification dispatched.
+- Integration: KB scheduler — `max_instances=1`, sequential per-account, soft-delete on removal.
+- **RLS regression** (highest priority): `l1_tech` user in account A cannot read account B's tickets, drafts, KB docs, or connector configs. Added to existing RLS test suite.
+- Anti-parrot: existing CI test auto-discovers new prompt module.
+
+### 14.2 Frontend
+
+- Unit: `usePermissions` — L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths.
+- Unit: `L1WalkPage` — node advance, escalate modal, resolve modal flips `validated_by_outcome` correctly.
+- Unit: `L1CoverageBanner` — visible for engineer-with-flag on `/l1/*`, hidden for L1 users.
+- E2E (Playwright, scoped selectors per Lesson):
+  - L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
+  - Engineer with `can_cover_l1` → sidebar entry visible → click → coverage banner shows → walks a session → audit log records `acting_as='l1_coverage'`.
+  - L1 hitting `/pilot`, `/trees/new`, `/escalations` → 403 or redirect.
+
+---
+
+## 15. Acceptance criteria (v1 ships when…)
+
+- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
+- L1 intake creates a ticket (PSA or internal) and lands in walker session.
+- Walker handles both flows and proposals; AI-built badge + sources shown for proposals.
+- Escalate generates the package, reassigns the ticket, and notifies engineers.
+- A helpful Resolve flips `validated_by_outcome`; the review queue prioritizes outcome-validated drafts.
+- All three KB connectors are configurable; initial sync + periodic re-sync + soft-delete on removal work.
+- AI build refuses with an informative error when the account KB is empty.
+- Coverage flag works end-to-end with audit-log tagging.
+- RLS blocks cross-tenant reads on every new table.
+- L1 seat count is tracked separately from engineer seats in the admin/billing UI.
+
+---
+
+## 16. Risks & mitigations
+
+| Risk | Mitigation |
+|---|---|
+| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before a draft becomes a "real" flow. v1 refuses to build when the KB is empty. |
+| Hallucinated KB citations | Post-build verification checks that each cited `kb_doc_id` exists; unverified citations are stripped from the walker and surfaced as a warning in engineer review. |
+| Duplicate proposals for the same problem | The validated-proposal match pass deduplicates once an L1 validates a draft; duplicates created before validation are tolerated and deduplicated during engineer review. |
+| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners can exclude specific folders/sites during configuration. All ingested docs are visible in `/account/kb` for manual deletion. |
+| AI build latency frustrates the customer on the call | A build-progress UI sets expectations. The Escalate button is visible from page load. Future: pre-warm builds when a PSA ticket lands. |
+| Shipping three connectors is more scope than originally proposed | Acknowledged. Each connector is ~1–2 weeks of work. The plan should sequence them and allow shipping IT Glue + Hudu first if SharePoint slips. |
+| Engineer review queue backlog stalls library growth | The validated-proposal match pass lets good drafts be reused without engineer review. A backlog only delays promotion from `'proposal'` to `'flow'`, not the L1's ability to use validated content. |
+
+---
+
+## 17. Naming reference
+
+| Layer | Value |
+|---|---|
+| DB enum (`account_role`) | `l1_tech` |
+| UI display | "L1 Tech" / "L1" |
+| Sidebar entry | "L1 Workspace" |
+| URL prefix | `/l1` |
+| Coverage flag column | `users.can_cover_l1` |
+| Coverage audit tag | `acting_as = 'l1_coverage'` |
+| Pricing label | "L1 seat" |
+| Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now |
+
+---
+
+## 18. Open implementation decisions (deferred to plan, not blocking design)
+
+- Validation of the `MATCH_THRESHOLD` default (initially 0.75; tune from telemetry post-launch).
+- Anthropic model choice for `l1_realtime_build` (Sonnet vs Opus; pick based on a quality benchmark during planning).
+- Chunk size + overlap for the KB ingestion writer (tune in implementation).
+- Engineer queue label default (`'Tier 2'` vs `'Engineering'`); owner-configurable anyway.
+- Exact look of the build-progress shimmer animation; design-system handoff.
+
+These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here.
+
+### Note on scope and phasing
+
+This is a substantive feature: a new role, four frontend pages, ~12 endpoints, an AI tree-builder, three KB connectors, escalation extensions, and six migrations. The implementation plan will almost certainly phase the work. A reasonable cut is:
+
+- **Phase 1:** role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
+- **Phase 2:** `kb_documents` schema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually uploaded KB.
+- **Phase 3:** the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained, so they can ship one at a time and be reordered if a connector is blocked.
+
+Phasing is a plan-level decision; the spec captures the full feature.
+
+---
+
+*End of spec.*