resolutionflow/docs/superpowers/specs/2026-05-28-l1-workspace-design.md
Michael Chihlas d1cf77cd41 docs(design): L1 workspace feature spec
New seat tier between engineer and viewer. Dedicated /l1 surface
(dashboard + walker + drafts) for first-call helpdesk staff. Walk-in
intake + PSA queue both produce tickets. Match-or-build pipeline
prefers authored flows, then outcome-validated AI drafts, then builds
fresh from KB. Three KB connectors: IT Glue, Hudu, SharePoint/OneDrive.
Escalation via package + PSA reassign, picked up in chat. Engineer
coverage via per-user can_cover_l1 flag with audit-log tagging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 03:33:32 -04:00

# L1 Workspace — Design Spec
**Date:** 2026-05-28
**Status:** Draft (pending implementation plan)
**Audience for this doc:** engineers + reviewers building the L1 workspace feature
---
## 1. Summary
Introduce a dedicated **L1 helpdesk** workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.
This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.
---
## 2. Motivation
The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (`AssistantChatPage` mounted at `/pilot`). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:
- Take inbound phone calls and need a fast, deterministic click-through UX
- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
- Are not authorized to resolve complex issues themselves; they hand off to engineers
A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.
---
## 3. Users & roles
### 3.1 Role hierarchy
`super_admin > owner > engineer > l1_tech > viewer`
`l1_tech` is added to the `account_role` enum. Permissions enforced via `app/core/permissions.py` and `app/api/deps.py`.
### 3.2 What L1 can do
- Use the `/l1/*` surface
- Open tickets from their queue (PSA-fed or internal)
- Intake walk-in/phone-call problems (creates a ticket as a side effect)
- Walk authored flows and AI-built FlowProposal drafts
- Resolve or escalate a session
- View their own AI drafts list (read-only — outcome tags shown)
### 3.3 What L1 cannot do
- See the chat surface (`/pilot`) — sidebar hidden, route 403s
- Author or edit flows
- See `/review-queue` or `/escalations` (engineer inboxes)
- See team analytics (only `/analytics/me`)
- Promote AI drafts (engineers/owners only, via existing review queue)
- Configure KB connectors (owner-only)
### 3.4 Engineer L1 coverage
Engineers do NOT see the L1 surface by default. Owners can toggle `users.can_cover_l1 = true` on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in `/l1/*` with a sticky banner: *"Covering L1 — actions logged as coverage."* Coverage actions are audit-logged with `acting_as = 'l1_coverage'`.
Backend dep: `require_l1_or_coverage` = `l1_tech | (engineer AND can_cover_l1) | owner | super_admin`.
This mirrors the existing orthogonal-flag pattern (`is_team_admin`) — no new architectural concept.
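As a rough illustration of the `require_l1_or_coverage` predicate and the audit tag, here is a minimal sketch. The `User` dataclass and both function names are hypothetical stand-ins, not the real dependency in `app/api/deps.py`. Tagging owners and super_admins as coverage is an assumption based on the coverage banner wording.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:  # hypothetical stand-in for the real user model
    role: str  # one of the account_role enum values
    can_cover_l1: bool = False


def may_use_l1_surface(user: User) -> bool:
    """l1_tech | (engineer AND can_cover_l1) | owner | super_admin."""
    if user.role in ("l1_tech", "owner", "super_admin"):
        return True
    return user.role == "engineer" and user.can_cover_l1


def acting_as(user: User) -> Optional[str]:
    """Audit tag for an action taken on /l1/*.

    Assumption: anyone allowed on the surface who is not an l1_tech
    is treated as coverage; an l1_tech gets no tag (NULL).
    """
    if user.role != "l1_tech" and may_use_l1_surface(user):
        return "l1_coverage"
    return None
```

Keeping the check as a pure function over role and flag makes it easy to unit-test separately from the request-scoped dependency.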
### 3.5 Billing data model
- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0` (new column)
- Existing `accounts.seats_purchased` continues to represent engineer seats
- New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band
---
## 4. Architecture overview
### 4.1 New components
**Frontend:**
- `pages/l1/L1Dashboard.tsx` — landing page; ticket queue + describe-the-problem intake
- `pages/l1/L1WalkPage.tsx` — purpose-built walker; yes/no cards, transcript, persistent escalate/resolve
- `pages/l1/L1DraftsPage.tsx` — read-only list of the L1's AI drafts and promotion status
- `pages/l1/L1TicketsPage.tsx` — full-page queue (PSA + internal merged)
- `components/l1/L1CoverageBanner.tsx` — slim banner shown to engineer-coverers
**Backend:**
- `services/match_or_build.py` — orchestrator (RAG match → fallback to AI build)
- `services/ai_tree_builder.py` — real-time AI tree generation via Anthropic
- `services/kb_connectors/` package — base, registry, encryption, plus `itglue.py`, `hudu.py`, `microsoft_graph.py`
- `services/kb_ingestion_writer.py` — shared writer used by manual upload + all connectors
- `services/kb_ingestion_scheduler.py` — APScheduler job, `max_instances=1`, per-connector sync
- `services/internal_ticket_service.py` — CRUD + status transitions for the no-PSA fallback
- `services/l1_session_service.py` — walking-session lifecycle
- `api/endpoints/l1.py` — L1-role endpoints
- `api/endpoints/kb_connectors.py` — KB connector config endpoints (owner-only for write)
**Reused / extended:**
- `services/rag_service.py` — flow & KB matching (existing)
- `services/flow_matching_engine.py` — existing
- `services/escalation_package_generator.py` — extended to include walked path, AI draft pointer, KB citations
- `models/FlowProposal` — new columns (see §5)
- `services/psa/` — already supports ticket create + reassign across CW/Autotask/HaloPSA
- `services/embedding_service.py` — used by KB ingestion writer
- New `kb_documents` + `kb_document_chunks` tables for RAG-retrievable document storage, separate from the existing `kb_imports` (which is a document→tree conversion record, not a persistent KB store — see §5)
- Audit log writer — gains `acting_as` field
### 4.2 Data flow — walk-in / phone-call intake
```
L1 types: "User can't connect Outlook after password reset"
POST /api/v1/l1/intake
  body: { problem_statement, customer_name?, customer_contact? }
  → create ticket
      - PSA if configured: psa_provider.create_ticket(...)
      - else: internal_tickets row
  → match_or_build(account_id, problem_text, ticket_ref)
      → rag_service.match_flows(...) → top hit; if score ≥ threshold return as 'flow'
      → rag_service.match_proposals(... where validated_by_outcome=true)
          → top hit; if score ≥ threshold return as 'proposal'
      → ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
          → persist FlowProposal(source='ai_realtime_l1',
                                 linked_ticket_id,
                                 linked_ticket_kind,
                                 validated_by_outcome=false)
          → return as 'proposal'
  → l1_session_service.start(...)
  → return { session_id, target_kind, target_id, intake_type }
→ navigate to /l1/walk/{session_id}
```
### 4.3 Data flow — PSA-queue intake
The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls `POST /api/v1/l1/tickets/{ticket_ref}/start` which is the same `match_or_build` path (the `problem_statement` is the ticket subject + description) followed by walker navigation.
---
## 5. Data model
All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are `TIMESTAMPTZ`. No `--rev-id` on Alembic; no `--autogenerate` for enum/RLS work.
### 5.1 `FlowProposal` — extended
Existing AI-draft model. Add columns:
| Column | Type | Notes |
|---|---|---|
| `source` | `VARCHAR(30) NOT NULL` | `'ai_realtime_l1' \| 'kb_accelerator' \| 'manual_draft'`. Backfill existing rows to `'manual_draft'`. |
| `linked_ticket_id` | `VARCHAR(64) NULL` | PSA id or internal_tickets UUID (stored as text) |
| `linked_ticket_kind` | `VARCHAR(10) NULL` | `'psa' \| 'internal'` |
| `validated_by_outcome` | `BOOLEAN NOT NULL DEFAULT FALSE` | Flipped to true when L1 resolves and marks helpful=true |
| `walked_path_snapshot` | `JSONB NULL` | Frozen at resolve/escalate; shape `[{node_id, question, answer, l1_note}]` |
Engineer review queue sort:
```sql
ORDER BY validated_by_outcome DESC, created_at DESC
```
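For readers less familiar with multi-key descending sorts, the same ordering can be sketched in plain Python. The dict shape and the function name are illustrative assumptions only; the real queue is sorted in SQL.

```python
from datetime import datetime


def review_queue_order(proposals: list) -> list:
    """Outcome-validated drafts first, newest first within each group."""
    return sorted(
        proposals,
        # True sorts above False, so reverse=True puts validated rows first
        # and then orders by created_at descending.
        key=lambda p: (p["validated_by_outcome"], p["created_at"]),
        reverse=True,
    )
```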
### 5.2 `internal_tickets` — new
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
created_by_user_id UUID NOT NULL (the L1 who took the call)
customer_name VARCHAR(120)
customer_contact VARCHAR(200) NULL (email or phone, free text)
problem_statement TEXT NOT NULL
status VARCHAR(30) NOT NULL -- 'open' | 'walking' | 'resolved' | 'escalated'
flow_id UUID NULL FK trees
flow_proposal_id UUID NULL FK flow_proposals
ai_session_id UUID NULL FK ai_sessions (set when engineer picks up in chat post-escalation)
assigned_user_id UUID NULL (engineer post-escalation)
resolution_notes TEXT NULL
psa_promoted_ticket_id VARCHAR(64) NULL (set if later promoted to PSA)
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
resolved_at TIMESTAMPTZ NULL
```
RLS: account-scoped, WITH CHECK on insert/update.
### 5.3 `kb_connector_configs` — new
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
provider VARCHAR(20) NOT NULL -- 'itglue' | 'hudu' | 'microsoft_graph'
display_name VARCHAR(80) NOT NULL
credentials_encrypted BYTEA NOT NULL -- Fernet, same pattern as services/psa/encryption.py
is_active BOOLEAN NOT NULL DEFAULT TRUE
sync_interval_minutes INTEGER NOT NULL DEFAULT 360
last_sync_at TIMESTAMPTZ NULL
last_sync_status VARCHAR(20) NULL -- 'success' | 'error' | 'running'
last_sync_error TEXT NULL
created_by_user_id UUID NOT NULL
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
UNIQUE (account_id, provider, display_name)
```
RLS: account-scoped, WITH CHECK.
### 5.4 New tables: `kb_documents` + `kb_document_chunks`
The existing `kb_imports` table is a document→tree conversion record (status lifecycle `processing | ready | committed | failed`, target `tree_id`) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval.
The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:
**`kb_documents`** — one row per ingested document:
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
source_kind VARCHAR(20) NOT NULL -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
source_ref VARCHAR(200) NULL -- provider-side document ID for re-sync
connector_config_id UUID NULL FK kb_connector_configs
title VARCHAR(500) NOT NULL
content TEXT NOT NULL -- full post-extraction text
content_hash VARCHAR(64) NOT NULL -- sha256 for change-detection
metadata JSONB NULL -- provider-specific (org_id, drive_id, etc.)
last_synced_at TIMESTAMPTZ NULL
deleted_at TIMESTAMPTZ NULL -- soft-delete on connector removal
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
```
Unique partial index: `(connector_config_id, source_ref) WHERE source_ref IS NOT NULL`.
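The `content_hash` column exists for change detection on re-sync. A minimal sketch of how that comparison might work, assuming the hash is taken over the UTF-8 bytes of the extracted text (the function names are hypothetical):

```python
import hashlib
from typing import Optional


def content_hash(text: str) -> str:
    """Hex SHA-256 of the extracted text; 64 characters, fits VARCHAR(64)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(stored_hash: Optional[str], new_text: str) -> bool:
    """True for a new document, or when the extracted text has changed."""
    return stored_hash != content_hash(new_text)
```

With this check, a sync that sees an unchanged document can skip both chunking and the embedding call and only bump `last_synced_at`.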
**`kb_document_chunks`** — chunks with embeddings, used by `rag_service.match_kb_chunks`:
```
id UUID PRIMARY KEY
document_id UUID NOT NULL FK kb_documents ON DELETE CASCADE
account_id UUID NOT NULL -- denormalized for RLS
chunk_index INTEGER NOT NULL
content TEXT NOT NULL
embedding VECTOR(<dim>) NOT NULL -- dim matches embedding_service
metadata JSONB NULL -- section title, page number, etc.
created_at TIMESTAMPTZ NOT NULL
UNIQUE (document_id, chunk_index)
```
Pgvector index (ivfflat or hnsw) on `embedding`; choice tuned during implementation.
RLS on both tables: account-scoped, WITH CHECK on insert.
**Coexistence with `kb_imports`:** when an L1 (or owner) uploads a doc, the system can populate **both** — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into `kb_documents` for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to `kb_documents` (no auto-tree-conversion from synced docs in v1).
### 5.5 Other column additions
- `users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSE`
- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0`
- `audit_logs.acting_as VARCHAR(30) NULL`: `'l1_coverage'` when engineer is in coverage mode; null otherwise
- `account_role` enum: add `'l1_tech'`
### 5.6 Migration ordering
Six manual Alembic revisions (no `--rev-id`, no `--autogenerate`):
1. Add `'l1_tech'` to `account_role` enum.
2. Add `users.can_cover_l1`, `accounts.l1_seats_purchased`, `audit_logs.acting_as`.
3. Extend `flow_proposals` with new columns + backfill existing rows to `source='manual_draft'`.
4. Create `internal_tickets` + RLS policies (account-scoped, WITH CHECK).
5. Create `kb_connector_configs` + RLS policies.
6. Create `kb_documents` + `kb_document_chunks` tables + RLS policies + pgvector index on chunks.
Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass `account_id=` explicitly. Grep all `Model(` sites before merge.
---
## 6. Backend services & endpoints
### 6.1 New services
| Module | Purpose |
|---|---|
| `services/match_or_build.py` | Orchestrator. Single async entrypoint `match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult`. |
| `services/ai_tree_builder.py` | Real-time AI tree generation. Anthropic via existing `_call_anthropic_cached` pattern. Model tier via `settings.get_model_for_action('l1_realtime_build')`. Output validated against the flow node schema with Pydantic; rejects malformed output. |
| `services/kb_connectors/base.py` | Abstract `KBConnector` with `test_credentials`, `list_documents`, `fetch_content`, `subscribe_to_changes` (optional). |
| `services/kb_connectors/itglue.py` | IT Glue REST client. |
| `services/kb_connectors/hudu.py` | Hudu REST client. |
| `services/kb_connectors/microsoft_graph.py` | Microsoft Graph (SharePoint/OneDrive) client. |
| `services/kb_connectors/registry.py` | `KBConnectorRegistry` (mirrors `PsaProviderRegistry`). |
| `services/kb_connectors/encryption.py` | Fernet wrapper (or reuse the PSA one if generic). |
| `services/kb_ingestion_writer.py` | Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
| `services/kb_ingestion_scheduler.py` | APScheduler interval job, `max_instances=1`. Sequential per account; concurrency cap = 4 accounts simultaneously. |
| `services/internal_ticket_service.py` | CRUD + status transitions for `internal_tickets`. |
| `services/l1_session_service.py` | Walking-session lifecycle: start, step, resolve, escalate. Bridges `ai_sessions` and the walked target. |
### 6.2 Extended services
- `services/escalation_package_generator.py` — adds inputs: `walked_path`, `ai_draft_proposal_id`, `kb_citations`. New caller path from `l1_session_service.escalate(...)`.
- KB Accelerator endpoint — accepts ingested content via the shared `kb_ingestion_writer`. Manual upload and connector sync share the same persistence path.
### 6.3 New endpoints
All endpoints require `require_l1_or_coverage` unless noted. Paths are relative to `/api/v1` (for example, `/l1/intake` is served at `/api/v1/l1/intake`).
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/l1/queue` | Merged ticket queue (PSA + internal). Pagination + status filter. | `require_l1_or_coverage` |
| POST | `/l1/intake` | Walk-in intake. Body `{problem_statement, customer_name?, customer_contact?}`. Creates ticket, returns `{session_id, target_kind, target_id, intake_type}`. | `require_l1_or_coverage` |
| POST | `/l1/tickets/{ticket_ref}/start` | Start walker from an existing ticket. Internally same as intake but skips ticket creation. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/step` | Record an answer. Body `{node_id, answer, note?}`. Appends to `walked_path_snapshot`. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/resolve` | Close as resolved. Body `{resolution_notes, helpful: bool}`. Sets `validated_by_outcome=true` on the proposal when `helpful=true` AND target was a proposal. Closes the ticket. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/escalate` | Generate escalation package + reassign ticket. Body `{reason, reason_category}`. | `require_l1_or_coverage` |
| GET | `/l1/drafts` | List current user's AI drafts with promotion status. | `require_l1_or_coverage` |
KB connector endpoints (`/api/v1/kb-connectors`):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/kb-connectors` | List configured connectors for account. | `require_l1_or_above` |
| POST | `/kb-connectors` | Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | `require_account_owner` |
| DELETE | `/kb-connectors/{id}` | Remove (soft-disable). | `require_account_owner` |
| POST | `/kb-connectors/{id}/sync` | Trigger immediate sync (enqueued). | `require_account_owner` |
| GET | `/kb-connectors/{id}/status` | Sync status + doc count + last error. | `require_l1_or_above` |
Internal ticket endpoints (`/api/v1/internal-tickets`):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/internal-tickets` | List (account-scoped). | `require_l1_or_coverage` |
| GET | `/internal-tickets/{id}` | Detail. | `require_l1_or_coverage` |
| POST | `/internal-tickets/{id}/promote-to-psa` | Push to configured PSA, set `psa_promoted_ticket_id`. | `require_account_owner` |
User management addition:
| Method | Path | Purpose | Auth |
|---|---|---|---|
| PATCH | `/users/{id}/coverage` | Set `can_cover_l1` flag. Body `{can_cover_l1: bool}`. | `require_account_owner` |
---
## 7. Frontend surface
### 7.1 Sidebar — L1 view
```
LOGO
─────────────
Workspace /l1
Tickets /l1/tickets
My Drafts /l1/drafts
─────────────
Guides /guides
Account /account (filtered — no integrations, no categories)
```
No `/pilot`, no `/trees`, no `/flows`, no `/review-queue`, no `/escalations`, no team analytics. Sidebar.tsx picks the nav array by role.
### 7.2 Sidebar — engineer coverage view
Engineer's existing sidebar plus a single appended entry "L1 Workspace" → `/l1`. Shown when `canCoverL1 || isOwner || isSuperAdmin`.
### 7.3 `/l1` dashboard layout
Single column, max width ~1100px. Three fixed vertical zones plus a conditional fourth:
1. **Greeting** — uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
2. **Describe the problem** card — large textarea (autofocus on load), optional `customer_name` + `customer_contact` fields, single primary CTA "Start walk →" (the only electric-blue element on the page)
3. **Open tickets** — section label, count, table rows (merged PSA + internal with origin badges), row hover `bg-elevated`
4. **Resume in progress** — shown only when L1 has a half-walked session
Tailwind v4 tokens: `bg-page` base, `bg-card` zones, `bg-elevated` row hover, electric-blue accent only on primary CTA. No `text-secondary`. All borders `border-default`.
### 7.4 `/l1/walk/{sessionId}` walker
Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs `flex` + `flex-1` + `min-h-0`).
**Header:**
- Back arrow + ticket ref + customer name + AI-built badge (when target is proposal)
- Problem statement line
- Persistent action buttons: `[ Escalate ]` `[ Resolve ✓ ]`
**Left pane (main):**
- "Step N · estimated M" label
- Current node card — large yes/no/answer buttons (min 44px tap target)
- Optional note textarea below the card (appended to `walked_path_snapshot`)
- On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"
**Right pane (transcript):**
- Walked-so-far list (node title + answer chosen)
- Current step highlight
- "Source:" section listing KB citations for the current node (proposal walks only)
**Resolve modal:**
- "Did this resolve it?" `[ Yes ]` `[ No ]`
- Resolution notes textarea
- Yes + target was proposal → sets `validated_by_outcome=true`
- No → prompt to escalate instead
**Escalate modal:**
- Reason category dropdown: *Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · Other*
- Free-text reason
- Confirm
### 7.5 `/l1/drafts` page
Read-only list, columns: `created` · `problem (truncated)` · `ticket #` · `status` (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.
### 7.6 `/l1/tickets` page
Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.
### 7.7 Coverage banner
`<L1CoverageBanner />` — slim ~32px band, info-cyan-dim background, mounted at the top of all `/l1/*` pages when `!isL1Tech && (canCoverL1 || isOwner || isSuperAdmin)`:
```
You're covering L1. Actions logged as coverage. [Switch back →]
```
The "Switch back" link returns to `/`.
### 7.8 Routing
```tsx
const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
```
Mounted under the `/` ProtectedRoute branch at:
- `/l1` → `L1Dashboard`
- `/l1/walk/:sessionId` → `L1WalkPage`
- `/l1/drafts` → `L1DraftsPage`
- `/l1/tickets` → `L1TicketsPage`
Wrapped in `L1RouteGuard` (403 if not `l1_tech` AND not coverage-flagged). `ProtectedRoute.tsx` post-login redirect: L1 users land on `/l1` instead of `/`.
`lazyWithRetry`, not `React.lazy` (per existing convention).
---
## 8. AI match-or-build pipeline
### 8.1 Match-or-build algorithm
```
match_or_build(account_id, problem_text, ticket_ref):
    embedding = embedding_service.embed(problem_text)

    # 1. Match authored flows
    flow_hits = rag_service.match_flows(account_id, embedding, k=5)
    if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'flow', id: flow_hits[0].flow_id, score: ...}

    # 2. Match outcome-validated proposals only
    proposal_hits = rag_service.match_proposals(
        account_id, embedding, k=5,
        where=validated_by_outcome=true,
    )
    if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'proposal', id: proposal_hits[0].proposal_id, score: ...}

    # 3. Build fresh
    kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
    if not kb_chunks:
        raise BuildAbortedNoKB(
            "Cannot build a tree with no KB content. "
            "Upload docs or wait for a connector sync."
        )
    nearest_flows = flow_hits[:3]
    proposal = ai_tree_builder.build(
        problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
    )
    return {kind: 'proposal', id: proposal.id, score: None}
```
`MATCH_THRESHOLD` — per-account configurable; default `0.75` (cosine).
The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes.
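The three-pass decision above can be exercised without any services by passing the retrieval results in directly. The sketch below is one way to do that, not the real `services/match_or_build.py`. `Hit`, the `build` callback, and the flattened signature are hypothetical, and the per-account threshold override is reduced to a plain argument.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75  # spec default (cosine)


class BuildAbortedNoKB(Exception):
    """Raised when there is no KB content to ground a fresh build."""


@dataclass
class Hit:  # hypothetical shape of a RAG hit
    id: str
    score: float


@dataclass
class MatchOrBuildResult:
    kind: str  # 'flow' | 'proposal'
    id: str
    score: Optional[float]


def match_or_build(
    flow_hits: Sequence[Hit],
    proposal_hits: Sequence[Hit],  # assumed pre-filtered to validated_by_outcome=true
    kb_chunks: Sequence[str],
    build: Callable[[Sequence[str], Sequence[Hit]], str],  # returns the new proposal id
    threshold: float = MATCH_THRESHOLD,
) -> MatchOrBuildResult:
    # 1. Prefer an authored flow when the best hit clears the threshold.
    if flow_hits and flow_hits[0].score >= threshold:
        return MatchOrBuildResult("flow", flow_hits[0].id, flow_hits[0].score)
    # 2. Otherwise reuse an outcome-validated proposal.
    if proposal_hits and proposal_hits[0].score >= threshold:
        return MatchOrBuildResult("proposal", proposal_hits[0].id, proposal_hits[0].score)
    # 3. Build fresh, but never without KB grounding.
    if not kb_chunks:
        raise BuildAbortedNoKB("Cannot build a tree with no KB content.")
    proposal_id = build(kb_chunks, flow_hits[:3])  # below-threshold flows as context
    return MatchOrBuildResult("proposal", proposal_id, None)
```

Note that sub-threshold flow hits are still passed to the builder as `nearest_flows`, so a near miss on an authored flow can shape the generated tree.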
### 8.2 AI tree-build details
**Model:** `settings.get_model_for_action('l1_realtime_build')`. Recommend Sonnet for v1 (latency-sensitive).
**Schema:** output validated against the existing flow node schema (matches `tree_editor` output). Validation failure aborts the build rather than persisting malformed data.
**Prompt strategy** (per Lesson on prompt anti-parrot — critical):
- System prompt: role definition + output schema using `<placeholder>` notation only. Never literal field values.
- Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
- User message: `{problem_statement}` + `{kb_context: [doc_title, section, content]}` + `{nearest_flow_summaries}` + instruction to cite KB chunks per node.
- Output includes `kb_citations: [{node_id, kb_doc_id, snippet}]` for walker's "Source:" pane and engineer review.
**Latency:** whole-tree-then-return (~5-15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.
**Anthropic SDK config** (per Lesson): `max_retries=1`. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).
**Telemetry:**
- `l1.match_or_build.duration_ms`, `l1.match_or_build.outcome` (`flow_match`/`proposal_match`/`built`/`aborted_no_kb`)
- `anthropic.cache` events (existing pattern) tagged `action=l1_realtime_build`
- `l1.tree_build.tokens_in`, `tokens_out`
**Anti-parrot guardrail:** the existing `tests/test_prompt_anti_parrot.py` auto-discovers new prompt constants via pattern match on `*_PROMPT` / `*_SCHEMA` / `*_PROTOCOL` / `*_FORMAT`. No new test required.
### 8.3 Hallucinated-citation defense
After build, the writer verifies every `kb_doc_id` in `kb_citations` exists in the account's KB. Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.
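A minimal sketch of that verification step, assuming citations arrive as dicts shaped like the `kb_citations` entries in §8.2 and that the account's document IDs are available as an iterable. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_citations(
    citations: List[dict], known_doc_ids: Iterable[str]
) -> Tuple[List[dict], List[dict]]:
    """Partition citations into (verified, stripped) by kb_doc_id existence."""
    known = set(known_doc_ids)
    verified = [c for c in citations if c.get("kb_doc_id") in known]
    stripped = [c for c in citations if c.get("kb_doc_id") not in known]
    return verified, stripped
```

The verified list would feed the walker's "Source:" pane, while the stripped list is what engineer review could surface as a warning.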
---
## 9. KB ingestion
### 9.1 Connector interface
```python
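from __future__ import annotations  # KBDocRef, KBDocContent, ChangeEvent are defined elsewhere
from abc import ABC
from collections.abc import AsyncIterator
from datetime import datetime

class KBConnector(ABC):
    async def test_credentials(self) -> bool: ...
    async def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]: ...
    async def fetch_content(self, ref: KBDocRef) -> KBDocContent: ...
    async def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent]: ...  # optional, no-op v1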
```
Registry dispatches by `provider` string. Credentials encrypted at rest via Fernet (reuse `services/psa/encryption.py` pattern).
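Dispatch by `provider` string could look roughly like the sketch below. The spec does not show the shape of `PsaProviderRegistry`, so the `register` and `create` method names and the factory signature here are assumptions for illustration.

```python
from typing import Callable, Dict


class KBConnectorRegistry:
    """Maps a provider string ('itglue', 'hudu', ...) to a connector factory."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[dict], object]] = {}

    def register(self, provider: str, factory: Callable[[dict], object]) -> None:
        self._factories[provider] = factory

    def create(self, provider: str, credentials: dict) -> object:
        """Instantiate the connector for `provider` with decrypted credentials."""
        try:
            factory = self._factories[provider]
        except KeyError:
            raise ValueError(f"unknown KB provider: {provider!r}") from None
        return factory(credentials)
```

Raising on an unknown provider, rather than returning `None`, keeps a mistyped `provider` value in `kb_connector_configs` from silently skipping a sync.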
### 9.2 Per-connector specifics
| | IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) |
|---|---|---|---|
| Auth | API token (header) | API key (header) | OAuth 2.0 |
| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching `(secret\|confidential\|private)` heuristic; files with a tenant Sensitivity Label |
| Filtering | Per-org (techs see all client orgs they have permission to) | Per-folder | Per-site / per-drive (owner picks at config time) |
| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |
All three deliver content to `kb_ingestion_writer` which:
1. Chunks (paragraph-aware, configurable size with overlap)
2. Embeds via `embedding_service`
3. Upserts into `kb_documents` keyed on `(connector_config_id, source_ref)`; chunks into `kb_document_chunks`
Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped `source_ref`). Engineers can dedup manually if needed.
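Step 1 of the writer (paragraph-aware chunking with overlap) might be sketched as follows. The spec leaves the size and overlap configurable, so the character-based limit, the default values, and the function name are assumptions; a token-based limit would follow the same structure.

```python
from typing import List


def chunk_paragraphs(
    text: str, max_chars: int = 1200, overlap_paragraphs: int = 1
) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of about max_chars.

    Each new chunk starts with the last `overlap_paragraphs` paragraphs of
    the previous one. A single oversized paragraph becomes its own chunk
    rather than being split mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry trailing paragraphs forward as overlap (none when 0).
            current = current[-overlap_paragraphs:] if overlap_paragraphs else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The position of each returned chunk in the list maps naturally onto `kb_document_chunks.chunk_index`.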
### 9.3 Sync scheduling
`kb_ingestion_scheduler.py` runs as APScheduler interval job, `max_instances=1`. Per cycle:
1. Query active `kb_connector_configs` where `last_sync_at` is older than `sync_interval_minutes` (default 360 = 6h).
2. Dispatch per account; concurrency cap = 4 simultaneous accounts.
3. For each connector: `list_documents(since=last_sync_at)` → for each ref, `fetch_content` → write.
4. Compute the diff between current refs and existing rows (same `connector_config_id`); soft-delete missing ones via `deleted_at`.
5. Update `last_sync_at`, `last_sync_status`, `last_sync_error`.
Must use `_admin_session_factory()` not `get_db()` for startup-side and scheduler-side queries (per Lesson on RLS at startup — no `app.current_account_id` set).
Immediate sync via `POST /api/v1/kb-connectors/{id}/sync` enqueues a job; scheduler picks it up within ~30s.
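Steps 1 and 4 of the cycle reduce to two small pure functions, sketched here under stated assumptions. `ConnectorConfig` is a hypothetical stand-in for the `kb_connector_configs` row, and a never-synced connector is treated as due. `current_refs` is also assumed to come from a full listing, since an incremental `since=` listing would omit unchanged documents and make them look deleted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


@dataclass
class ConnectorConfig:  # hypothetical stand-in for a kb_connector_configs row
    id: str
    is_active: bool = True
    sync_interval_minutes: int = 360
    last_sync_at: Optional[datetime] = None


def is_due(cfg: ConnectorConfig, now: datetime) -> bool:
    """Active and either never synced or last synced at least one interval ago."""
    if not cfg.is_active:
        return False
    if cfg.last_sync_at is None:
        return True
    return now - cfg.last_sync_at >= timedelta(minutes=cfg.sync_interval_minutes)


def refs_to_soft_delete(existing_refs: Set[str], current_refs: Set[str]) -> Set[str]:
    """source_ref values stored for this connector but no longer listed upstream."""
    return existing_refs - current_refs
```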
---
## 10. Escalation flow
1. L1 clicks **Escalate** → modal (reason category + optional free text).
2. `POST /api/v1/l1/sessions/{id}/escalate` → backend:
- Calls extended `escalation_package_generator.generate(session_id, include_l1_walk=true)`. Package contents:
```
problem_statement, customer_name, customer_contact,
ticket_ref (PSA id or internal id),
target_kind ('flow' | 'proposal'), target_id,
walked_path,
ai_draft_proposal_id,
kb_citations,
escalation_reason, reason_category, l1_user_id
```
- Creates an `ai_session` with the package serialized into system context for the chat surface.
- If PSA-backed: `psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name)`. Default `'Tier 2'`. Owner configurable in `/account/integrations`.
- If internal-backed: `internal_tickets.status='escalated'`, `assigned_user_id=null` (round-robin assignment is out of scope).
- Writes notification via existing `notification_service` — bell badge to all engineers in account.
- Audit log entry; `acting_as` reflects whether L1 or coverage-engineer escalated.
3. Toast on L1 side, return to `/l1`.
4. Engineer clicks notification → `/pilot/{sessionId}` → chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.
**Un-escalate is out of scope.** If engineer wants to bounce back, they reassign in PSA manually.
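The package fields listed in step 2 could be modelled as a small dataclass and serialized for the chat surface's system context. This is a sketch only: the class name, the JSON serialization, and which fields are optional are assumptions, not the existing generator's contract.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class EscalationPackage:  # hypothetical container for the fields in step 2
    problem_statement: str
    ticket_ref: str  # PSA id or internal id
    target_kind: str  # 'flow' | 'proposal'
    target_id: str
    escalation_reason: str
    reason_category: str
    l1_user_id: str
    customer_name: Optional[str] = None
    customer_contact: Optional[str] = None
    walked_path: List[dict] = field(default_factory=list)
    ai_draft_proposal_id: Optional[str] = None
    kb_citations: List[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.target_kind not in ("flow", "proposal"):
            raise ValueError(f"invalid target_kind: {self.target_kind!r}")


def to_system_context(pkg: EscalationPackage) -> str:
    """Serialize the package for inclusion in the ai_session system context."""
    return json.dumps(asdict(pkg), sort_keys=True)
```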
---
## 11. Internal ticket fallback
When the account has no active PSA provider:
- Intake creates `internal_tickets` row instead of a PSA ticket.
- Queue surface merges PSA + internal with `Internal` / `PSA` origin badge.
- Escalation flips `internal_tickets.status='escalated'` and assigns engineer (or leaves null for any engineer to claim — v1 behavior).
- Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.
**Promote to PSA:** owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets `psa_promoted_ticket_id`. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.
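The intake-time choice between PSA and internal tickets can be sketched as a single function returning the values stored in `linked_ticket_kind` and `linked_ticket_id`. The callback parameters are hypothetical stand-ins for `psa_provider.create_ticket` and the internal ticket service.

```python
from typing import Callable, Optional, Tuple


def create_intake_ticket(
    problem_statement: str,
    psa_create: Optional[Callable[[str], str]],  # None when no active PSA provider
    internal_create: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (linked_ticket_kind, linked_ticket_id) for the new ticket."""
    if psa_create is not None:
        return "psa", psa_create(problem_statement)
    return "internal", internal_create(problem_statement)
```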
---
## 12. Outcome-validation lifecycle
```
1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
                                             validated_by_outcome=false,
                                             linked_ticket_id=...)
2. L1 walks → POST /l1/sessions/{id}/step appends to walked_path_snapshot
3. L1 hits Resolve:
     modal: "Did this resolve it?" [Yes] [No] + resolution_notes
4. helpful=true  → flow_proposal.validated_by_outcome = true
                 → walked_path_snapshot frozen
                 → ticket closed (PSA or internal)
   helpful=false → validated_by_outcome stays false
                 → L1 prompted: "Escalate instead?"
5. Engineer review queue:
     ORDER BY validated_by_outcome DESC, created_at DESC
     - Outcome-validated drafts surface first
     - Promote / edit-and-promote / retire
6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
           → future match_or_build matches the new flow on the flow-match pass
```
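Step 4 of the lifecycle is the only place `validated_by_outcome` changes, so it is worth isolating. The sketch below uses a hypothetical `Proposal` stand-in and returns the next action as a string. It assumes that a walk over an authored flow (no proposal) simply closes the ticket.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Proposal:  # hypothetical stand-in for the FlowProposal row
    validated_by_outcome: bool = False
    walked_path_snapshot: Optional[List[dict]] = None


def apply_resolve(
    helpful: bool, walked_path: List[dict], proposal: Optional[Proposal]
) -> str:
    """Return the next action: 'close_ticket' or 'prompt_escalate'.

    `proposal` is None when the walked target was an authored flow.
    """
    if not helpful:
        return "prompt_escalate"  # validated_by_outcome stays false
    if proposal is not None:
        proposal.validated_by_outcome = True
        proposal.walked_path_snapshot = list(walked_path)  # frozen copy
    return "close_ticket"
```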
---
## 13. Out of scope (v1 non-goals)
- End-user / self-service portal ("L0" tier).
- Engineer warm-transfer / live take-over during a call.
- L1 ↔ engineer real-time chat during a call.
- Multi-language UI / customer-language toggle in walker.
- Auto-promote internal tickets to PSA on integration install.
- AI tree streaming (node-by-node).
- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
- Confluence connector.
- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
- Cancel/un-escalate flow.
- Round-robin engineer assignment on internal-ticket escalations.
---
## 14. Testing strategy
### 14.1 Backend (pytest)
- Unit: `match_or_build` covers all four paths (flow-match, proposal-match, built, aborted_no_kb).
- Unit: `ai_tree_builder` schema validation — assert rejection of malformed Anthropic output before persistence.
- Unit: each connector's `list_documents` + `fetch_content` against recorded HTTP fixtures.
- Integration: intake → walk → resolve(helpful=true) → assert `FlowProposal.validated_by_outcome=true`, ticket closed.
- Integration: intake → walk → escalate → assert PSA `reassign_ticket` invoked, `ai_session` created with package, audit log entry, notification dispatched.
- Integration: KB scheduler — `max_instances=1`, sequential per-account, soft-delete on removal.
- **RLS regression** (highest priority): `l1_tech` user in account A cannot read account B's tickets, drafts, KB docs, or connector configs. Added to existing RLS test suite.
- Anti-parrot: existing CI test auto-discovers new prompt module.
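
The four `match_or_build` paths named in the first unit test can be sketched as a small decision function with a table of cases. This is an illustration under stated assumptions, not the real service: the signature, the string labels, the `>=` comparison, and the use of one threshold for both match passes are all hypothetical. Only the path names, the flow-then-proposal-then-build preference, and the initial `MATCH_THRESHOLD` of 0.75 come from the spec.

```python
from typing import Optional

MATCH_THRESHOLD = 0.75  # initial default from the spec; expected to be tuned


def match_or_build(
    best_flow_score: Optional[float],
    best_validated_proposal_score: Optional[float],
    kb_doc_count: int,
) -> str:
    """Return which path a ticket takes (hypothetical labels)."""
    # Pass 1: authored flows are preferred.
    if best_flow_score is not None and best_flow_score >= MATCH_THRESHOLD:
        return "flow_match"
    # Pass 2: outcome-validated AI drafts.
    if (
        best_validated_proposal_score is not None
        and best_validated_proposal_score >= MATCH_THRESHOLD
    ):
        return "proposal_match"
    # Pass 3: build from the KB, refusing when the account KB is empty.
    if kb_doc_count == 0:
        return "aborted_no_kb"
    return "built"


# One case per path, in the shape a parametrized test might take.
CASES = [
    ((0.90, 0.90, 5), "flow_match"),
    ((0.40, 0.80, 5), "proposal_match"),
    ((None, None, 5), "built"),
    ((0.40, 0.40, 0), "aborted_no_kb"),
]

for args, expected in CASES:
    assert match_or_build(*args) == expected
```

Keeping the decision pure (scores and a document count in, a label out) would let the unit test cover all four paths without touching the database or the Anthropic client.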
### 14.2 Frontend
- Unit: `usePermissions` — L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths.
- Unit: `L1WalkPage` — node advance, escalate modal, resolve modal flips `validated_by_outcome` correctly.
- Unit: `L1CoverageBanner` — visible for engineer-with-flag on `/l1/*`, hidden for L1 users.
- E2E (Playwright, scoped selectors per Lesson):
- L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
- Engineer with `can_cover_l1` → sidebar entry visible → click → coverage banner shows → walks a session → audit log records `acting_as='l1_coverage'`.
- L1 hitting `/pilot`, `/trees/new`, `/escalations` → 403 or redirect.
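
The access rules these tests exercise can be written down as a language-neutral truth table. The sketch below uses Python for consistency with the backend examples; the real check lives in the `usePermissions` hook and the route guards, and the function names here are hypothetical. It models only the `l1_tech` and `engineer` roles, and it assumes the audit tag is absent outside coverage sessions.

```python
from typing import Optional

L1_PREFIX = "/l1"


def is_l1_path(path: str) -> bool:
    # "/l1" and anything under "/l1/", but not a lookalike such as "/l1x".
    return path == L1_PREFIX or path.startswith(L1_PREFIX + "/")


def can_access(path: str, role: str, can_cover_l1: bool = False) -> bool:
    """Hypothetical route check for the two roles discussed in this spec."""
    if role == "l1_tech":
        return is_l1_path(path)  # L1 reaches the L1 surface only
    if role == "engineer":
        return (not is_l1_path(path)) or can_cover_l1  # flag opens /l1
    raise ValueError(f"role not modeled in this sketch: {role}")


def acting_as(path: str, role: str, can_cover_l1: bool = False) -> Optional[str]:
    """Audit tag for an engineer covering L1; None otherwise (assumption)."""
    if role == "engineer" and can_cover_l1 and is_l1_path(path):
        return "l1_coverage"
    return None


# The routes named in the E2E list are blocked for L1 users.
for blocked in ("/pilot", "/trees/new", "/escalations"):
    assert not can_access(blocked, "l1_tech")
assert can_access("/l1/walk", "engineer", can_cover_l1=True)
```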
---
## 15. Acceptance criteria (v1 ships when…)
- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
- L1 intake creates a ticket (PSA or internal) and lands in walker session.
- Walker handles both flows and proposals; AI-built badge + sources shown for proposals.
- Escalate generates package, reassigns ticket, notifies engineers.
- Resolve flips `validated_by_outcome`; review queue prioritizes outcome-validated drafts.
- All three KB connectors configurable; initial sync + periodic re-sync + soft-delete on removal.
- AI build refuses with informative error when account KB is empty.
- Coverage flag works end-to-end with audit-log tagging.
- RLS blocks cross-tenant reads on every new table.
- L1 seat count tracked separately from engineer seats in admin/billing UI.
---
## 16. Risks & mitigations
| Risk | Mitigation |
|---|---|
| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before a draft becomes a "real" flow. v1 refuses to build when the KB is empty. |
| Hallucinated KB citations | Post-build verification that each `kb_doc_id` exists; unverified citations stripped from walker, surfaced as warning in engineer review. |
| Duplicate proposals for same problem | Validated-proposal match pass deduplicates after one L1 validates; duplicates created before validation are tolerated and merged during engineer review. |
| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners exclude specific folders/sites at config. All ingested docs visible in `/account/kb` for manual deletion. |
| AI build latency frustrates customer on call | Build-progress UI sets expectation. Escalate button visible from page load. Future: pre-warm builds on PSA-ticket-landed event. |
| Three connectors is more scope than originally proposed | Acknowledged. Each connector is ~12 weeks of work. Plan should sequence them and allow shipping with IT Glue + Hudu first if SharePoint slips. |
| Engineer review queue backlog stalls library growth | Validated-proposal match pass means good drafts get reused without engineer review. Backlog only delays the move from `'proposal'` to `'flow'`, not the L1's ability to use validated content. |
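
The hallucinated-citation mitigation in the table above can be sketched as a post-build filter. This is a minimal illustration under assumptions: the node shape (a dict with `id` and `citations`) and the function name are hypothetical, and a real implementation would look up `kb_doc_id` values in `kb_documents` rather than an in-memory set. Citations whose id is unknown are stripped from the tree shown in the walker and reported as warnings for engineer review.

```python
from typing import Dict, List, Set, Tuple


def verify_citations(
    nodes: List[Dict], existing_doc_ids: Set[str]
) -> Tuple[List[Dict], List[str]]:
    """Strip citations whose kb_doc_id is unknown; return (nodes, warnings)."""
    cleaned: List[Dict] = []
    warnings: List[str] = []
    for node in nodes:
        kept = [c for c in node["citations"] if c in existing_doc_ids]
        dropped = [c for c in node["citations"] if c not in existing_doc_ids]
        # Copy the node so the caller's input is left unmodified.
        cleaned.append({**node, "citations": kept})
        for doc_id in dropped:
            warnings.append(f"node {node['id']}: unverified kb_doc_id {doc_id}")
    return cleaned, warnings


tree = [
    {"id": "n1", "citations": ["doc-1", "doc-404"]},
    {"id": "n2", "citations": ["doc-2"]},
]
cleaned_tree, review_warnings = verify_citations(tree, {"doc-1", "doc-2"})
print(cleaned_tree)
print(review_warnings)
```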
---
## 17. Naming reference
| Layer | Value |
|---|---|
| DB enum (`account_role`) | `l1_tech` |
| UI display | "L1 Tech" / "L1" |
| Sidebar entry | "L1 Workspace" |
| URL prefix | `/l1` |
| Coverage flag column | `users.can_cover_l1` |
| Coverage audit tag | `acting_as = 'l1_coverage'` |
| Pricing label | "L1 seat" |
| Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now |
---
## 18. Open implementation decisions (deferred to plan, not blocking design)
- Validation of the `MATCH_THRESHOLD` default (initial 0.75; tune from post-launch telemetry).
- Specific Anthropic model choice for `l1_realtime_build` (Sonnet vs Opus — pick based on quality benchmark during plan).
- Chunk size + overlap for KB ingestion writer (tune in implementation).
- Engineer queue label default (`'Tier 2'` vs `'Engineering'`) — owner-configurable anyway.
- Exact look of the build-progress shimmer animation — design-system handoff.
These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here.
### Note on scope and phasing
This is a substantive feature: new role, four frontend pages, ~12 endpoints, AI tree-builder, three KB connectors, escalation extensions, and six migrations. The implementation plan will almost certainly phase the work — a reasonable cut is:
- **Phase 1:** role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
- **Phase 2:** `kb_documents` schema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually-uploaded KB.
- **Phase 3:** the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained — can ship one at a time and reorder if a connector blocks.
Phasing is a plan-level decision; the spec captures the full feature.
---
*End of spec.*