docs(design): L1 workspace feature spec

New seat tier between engineer and viewer. Dedicated /l1 surface
(dashboard + walker + drafts) for first-call helpdesk staff. Walk-in
intake + PSA queue both produce tickets. Match-or-build pipeline
prefers authored flows, then outcome-validated AI drafts, then builds
fresh from KB. Three KB connectors: IT Glue, Hudu, SharePoint/OneDrive.
Escalation via package + PSA reassign, picked up in chat. Engineer
coverage via per-user can_cover_l1 flag with audit-log tagging.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 03:33:32 -04:00
parent 41f5519916
commit d1cf77cd41


@@ -0,0 +1,717 @@
# L1 Workspace — Design Spec
**Date:** 2026-05-28
**Status:** Draft (pending implementation plan)
**Audience for this doc:** engineers + reviewers building the L1 workspace feature
---
## 1. Summary
Introduce a dedicated **L1 helpdesk** workspace as a new seat tier in ResolutionFlow. L1 techs walk customers through yes/no decision trees on inbound tickets and phone calls. The platform either matches an existing authored flow, reuses an outcome-validated AI draft, or builds a fresh decision tree in real time from the MSP's ingested knowledge base. Drafts that resolve a call become "outcome-validated" and surface first in the engineer review queue for promotion to authored flows. KB ingestion supports manual upload plus three MSP-native connectors: IT Glue, Hudu, and Microsoft SharePoint/OneDrive.
This re-introduces the original deterministic tree-walker UX — which had been deprecated in favor of chat-primary FlowPilot — and repositions it as a frontline-tier product surface distinct from the engineer chat surface.
---
## 2. Motivation
The current ResolutionFlow product funnels every user — regardless of skill tier — into a single chat-primary surface (`AssistantChatPage` mounted at `/pilot`). The chat is excellent for engineers but is the wrong primitive for L1 helpdesk staff who:
- Take inbound phone calls and need a fast, deterministic click-through UX
- Resolve simple, recurring problems (password resets, mailbox connection issues, VPN disconnects, printer queue clears, etc.)
- Are not authorized to handle complex issues themselves; they hand off to engineers
A tree-walker UX serves this audience natively. The substrate already exists in the codebase — decision-tree data model, authoring tools, RAG, KB Accelerator, escalation packaging — but no first-class L1 surface ties it together. This spec defines that surface and the supporting AI/KB pipeline.
---
## 3. Users & roles
### 3.1 Role hierarchy
`super_admin > owner > engineer > l1_tech > viewer`
`l1_tech` is added to the `account_role` enum. Permissions enforced via `app/core/permissions.py` and `app/api/deps.py`.
### 3.2 What L1 can do
- Use the `/l1/*` surface
- Open tickets from their queue (PSA-fed or internal)
- Intake walk-in/phone-call problems (creates a ticket as a side effect)
- Walk authored flows and AI-built FlowProposal drafts
- Resolve or escalate a session
- View their own AI drafts list (read-only — outcome tags shown)
### 3.3 What L1 cannot do
- See the chat surface (`/pilot`) — sidebar hidden, route 403s
- Author or edit flows
- See `/review-queue` or `/escalations` (engineer inboxes)
- See team analytics (only `/analytics/me`)
- Promote AI drafts (engineers/owners only, via existing review queue)
- Configure KB connectors (owner-only)
### 3.4 Engineer L1 coverage
Engineers do NOT see the L1 surface by default. Owners can toggle `users.can_cover_l1 = true` on individual engineer users. Engineers with that flag (and all owners/super_admins) see an "L1 Workspace" entry in their sidebar. Clicking it puts them in `/l1/*` with a sticky banner: *"Covering L1 — actions logged as coverage."* Coverage actions are audit-logged with `acting_as = 'l1_coverage'`.
Backend dep: `require_l1_or_coverage` = `l1_tech | (engineer AND can_cover_l1) | owner | super_admin`.
This mirrors the existing orthogonal-flag pattern (`is_team_admin`) — no new architectural concept.
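The access rule above reduces to a small predicate. A minimal sketch, assuming roles arrive as plain strings; `can_access_l1` is a hypothetical helper name for illustration, not the actual dependency in `app/api/deps.py`:

```python
def can_access_l1(role: str, can_cover_l1: bool = False) -> bool:
    """Rule from this section: l1_tech | (engineer AND can_cover_l1) | owner | super_admin."""
    if role in {"l1_tech", "owner", "super_admin"}:
        return True
    # Engineers only get in when an owner has set the coverage flag.
    return role == "engineer" and can_cover_l1
```

A viewer with the flag set is still rejected, because the flag is only meaningful in combination with the `engineer` role.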
### 3.5 Billing data model
- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0` (new column)
- Existing `accounts.seats_purchased` continues to represent engineer seats
- New Stripe SKU placeholder for L1 seat; actual pricing set in Stripe dashboard out-of-band
---
## 4. Architecture overview
### 4.1 New components
**Frontend:**
- `pages/l1/L1Dashboard.tsx` — landing page; ticket queue + describe-the-problem intake
- `pages/l1/L1WalkPage.tsx` — purpose-built walker; yes/no cards, transcript, persistent escalate/resolve
- `pages/l1/L1DraftsPage.tsx` — read-only list of the L1's AI drafts and promotion status
- `pages/l1/L1TicketsPage.tsx` — full-page queue (PSA + internal merged)
- `components/l1/L1CoverageBanner.tsx` — slim banner shown to engineer-coverers
**Backend:**
- `services/match_or_build.py` — orchestrator (RAG match → fallback to AI build)
- `services/ai_tree_builder.py` — real-time AI tree generation via Anthropic
- `services/kb_connectors/` package — base, registry, encryption, plus `itglue.py`, `hudu.py`, `microsoft_graph.py`
- `services/kb_ingestion_writer.py` — shared writer used by manual upload + all connectors
- `services/kb_ingestion_scheduler.py` — APScheduler job, `max_instances=1`, per-connector sync
- `services/internal_ticket_service.py` — CRUD + status transitions for the no-PSA fallback
- `services/l1_session_service.py` — walking-session lifecycle
- `api/endpoints/l1.py` — L1-role endpoints
- `api/endpoints/kb_connectors.py` — KB connector config endpoints (owner-only for write)
**Reused / extended:**
- `services/rag_service.py` — flow & KB matching (existing)
- `services/flow_matching_engine.py` — existing
- `services/escalation_package_generator.py` — extended to include walked path, AI draft pointer, KB citations
- `models/FlowProposal` — new columns (see §5)
- `services/psa/` — already supports ticket create + reassign across CW/Autotask/HaloPSA
- `services/embedding_service.py` — used by KB ingestion writer
- New `kb_documents` + `kb_document_chunks` tables for RAG-retrievable document storage, separate from the existing `kb_imports` (which is a document→tree conversion record, not a persistent KB store — see §5)
- Audit log writer — gains `acting_as` field
### 4.2 Data flow — walk-in / phone-call intake
```
L1 types: "User can't connect Outlook after password reset"
  POST /api/v1/l1/intake
    body: { problem_statement, customer_name?, customer_contact? }
  → create ticket
      - PSA if configured: psa_provider.create_ticket(...)
      - else: internal_tickets row
  → match_or_build(account_id, problem_text, ticket_ref)
      → rag_service.match_flows(...) → top hit; if score ≥ threshold return as 'flow'
      → rag_service.match_proposals(... where validated_by_outcome=true)
          → top hit; if score ≥ threshold return as 'proposal'
      → ai_tree_builder.build(problem_text, kb_chunks, nearest_flows)
          → persist FlowProposal(source='ai_realtime_l1',
                                 linked_ticket_id,
                                 linked_ticket_kind,
                                 validated_by_outcome=false)
          → return as 'proposal'
  → l1_session_service.start(...)
  → return { session_id, target_kind, target_id, intake_type }
  → navigate to /l1/walk/{session_id}
```
### 4.3 Data flow — PSA-queue intake
The L1 dashboard polls the L1's PSA queue plus their internal tickets. Clicking a ticket row calls `POST /api/v1/l1/tickets/{ticket_ref}/start` which is the same `match_or_build` path (the `problem_statement` is the ticket subject + description) followed by walker navigation.
---
## 5. Data model
All new tenant-isolated tables get RLS policies (account-scoped, WITH CHECK). All TIMESTAMPs are `TIMESTAMPTZ`. No `--rev-id` on Alembic; no `--autogenerate` for enum/RLS work.
### 5.1 `FlowProposal` — extended
Existing AI-draft model. Add columns:
| Column | Type | Notes |
|---|---|---|
| `source` | `VARCHAR(30) NOT NULL` | `'ai_realtime_l1' \| 'kb_accelerator' \| 'manual_draft'`. Backfill existing rows to `'manual_draft'`. |
| `linked_ticket_id` | `VARCHAR(64) NULL` | PSA id or internal_tickets UUID (stored as text) |
| `linked_ticket_kind` | `VARCHAR(10) NULL` | `'psa' \| 'internal'` |
| `validated_by_outcome` | `BOOLEAN NOT NULL DEFAULT FALSE` | Flipped to true when L1 resolves and marks helpful=true |
| `walked_path_snapshot` | `JSONB NULL` | Frozen at resolve/escalate; shape `[{node_id, question, answer, l1_note}]` |
Engineer review queue sort:
```sql
ORDER BY validated_by_outcome DESC, created_at DESC
```
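For in-memory lists (tests, fixtures), the same ordering can be reproduced with two stable sorts. This is only a sketch; `ProposalRow` is a hypothetical stand-in for a `FlowProposal` row, not the real model:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ProposalRow:  # hypothetical stand-in for a FlowProposal row
    id: str
    validated_by_outcome: bool
    created_at: datetime


def review_queue_order(rows: list[ProposalRow]) -> list[ProposalRow]:
    # Sort by the secondary key first (newest first), then by the primary key.
    # Python's sort is stable, so recency order survives inside each group.
    by_recency = sorted(rows, key=lambda r: r.created_at, reverse=True)
    return sorted(by_recency, key=lambda r: r.validated_by_outcome, reverse=True)
```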
### 5.2 `internal_tickets` — new
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
created_by_user_id UUID NOT NULL (the L1 who took the call)
customer_name VARCHAR(120)
customer_contact VARCHAR(200) NULL (email or phone, free text)
problem_statement TEXT NOT NULL
status VARCHAR(30) NOT NULL -- 'open' | 'walking' | 'resolved' | 'escalated'
flow_id UUID NULL FK trees
flow_proposal_id UUID NULL FK flow_proposals
ai_session_id UUID NULL FK ai_sessions (set when engineer picks up in chat post-escalation)
assigned_user_id UUID NULL (engineer post-escalation)
resolution_notes TEXT NULL
psa_promoted_ticket_id VARCHAR(64) NULL (set if later promoted to PSA)
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
resolved_at TIMESTAMPTZ NULL
```
RLS: account-scoped, WITH CHECK on insert/update.
### 5.3 `kb_connector_configs` — new
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
provider VARCHAR(20) NOT NULL -- 'itglue' | 'hudu' | 'microsoft_graph'
display_name VARCHAR(80) NOT NULL
credentials_encrypted BYTEA NOT NULL -- Fernet, same pattern as services/psa/encryption.py
is_active BOOLEAN NOT NULL DEFAULT TRUE
sync_interval_minutes INTEGER NOT NULL DEFAULT 360
last_sync_at TIMESTAMPTZ NULL
last_sync_status VARCHAR(20) NULL -- 'success' | 'error' | 'running'
last_sync_error TEXT NULL
created_by_user_id UUID NOT NULL
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
UNIQUE (account_id, provider, display_name)
```
RLS: account-scoped, WITH CHECK.
### 5.4 New tables: `kb_documents` + `kb_document_chunks`
The existing `kb_imports` table is a document→tree conversion record (status lifecycle `processing | ready | committed | failed`, target `tree_id`) — designed to turn one document into one authored flow. It is NOT a persistent KB document store and does not power RAG retrieval.
The L1 feature needs a separate pair of tables that store ingested docs in RAG-retrievable form:
**`kb_documents`** — one row per ingested document:
```
id UUID PRIMARY KEY
account_id UUID NOT NULL (RLS-scoped)
source_kind VARCHAR(20) NOT NULL -- 'upload' | 'paste' | 'itglue' | 'hudu' | 'microsoft_graph'
source_ref VARCHAR(200) NULL -- provider-side document ID for re-sync
connector_config_id UUID NULL FK kb_connector_configs
title VARCHAR(500) NOT NULL
content TEXT NOT NULL -- full post-extraction text
content_hash VARCHAR(64) NOT NULL -- sha256 for change-detection
metadata JSONB NULL -- provider-specific (org_id, drive_id, etc.)
last_synced_at TIMESTAMPTZ NULL
deleted_at TIMESTAMPTZ NULL -- soft-delete on connector removal
created_at TIMESTAMPTZ NOT NULL
updated_at TIMESTAMPTZ NOT NULL
```
Unique partial index: `(connector_config_id, source_ref) WHERE source_ref IS NOT NULL`.
**`kb_document_chunks`** — chunks with embeddings, used by `rag_service.match_kb_chunks`:
```
id UUID PRIMARY KEY
document_id UUID NOT NULL FK kb_documents ON DELETE CASCADE
account_id UUID NOT NULL -- denormalized for RLS
chunk_index INTEGER NOT NULL
content TEXT NOT NULL
embedding VECTOR(<dim>) NOT NULL -- dim matches embedding_service
metadata JSONB NULL -- section title, page number, etc.
created_at TIMESTAMPTZ NOT NULL
UNIQUE (document_id, chunk_index)
```
Pgvector index (ivfflat or hnsw) on `embedding`; choice tuned during implementation.
RLS on both tables: account-scoped, WITH CHECK on insert.
**Coexistence with `kb_imports`:** when an L1 (or owner) uploads a doc, the system can populate **both** — the existing KBImport pipeline produces a draft tree, and the new ingestion writer additionally chunks+embeds the doc into `kb_documents` for RAG. Both paths share the upload endpoint but write to independent tables. Connectors only write to `kb_documents` (no auto-tree-conversion from synced docs in v1).
### 5.5 Other column additions
- `users.can_cover_l1 BOOLEAN NOT NULL DEFAULT FALSE`
- `accounts.l1_seats_purchased INTEGER NOT NULL DEFAULT 0`
- `audit_logs.acting_as VARCHAR(30) NULL`: `'l1_coverage'` when an engineer is in coverage mode; null otherwise
- `account_role` enum: add `'l1_tech'`
### 5.6 Migration ordering
Six manual Alembic revisions (no `--rev-id`, no `--autogenerate`):
1. Add `'l1_tech'` to `account_role` enum.
2. Add `users.can_cover_l1`, `accounts.l1_seats_purchased`, `audit_logs.acting_as`.
3. Extend `flow_proposals` with new columns + backfill existing rows to `source='manual_draft'`.
4. Create `internal_tickets` + RLS policies (account-scoped, WITH CHECK).
5. Create `kb_connector_configs` + RLS policies.
6. Create `kb_documents` + `kb_document_chunks` tables + RLS policies + pgvector index on chunks.
Per Lesson on tenant-isolated tables: any service-construction site that creates rows on these tables must pass `account_id=` explicitly. Grep all `Model(` sites before merge.
---
## 6. Backend services & endpoints
### 6.1 New services
| Module | Purpose |
|---|---|
| `services/match_or_build.py` | Orchestrator. Single async entrypoint `match_or_build(account_id, problem_text, ticket_ref) -> MatchOrBuildResult`. |
| `services/ai_tree_builder.py` | Real-time AI tree generation. Anthropic via existing `_call_anthropic_cached` pattern. Model tier via `settings.get_model_for_action('l1_realtime_build')`. Output validated against the flow node schema with Pydantic; rejects malformed output. |
| `services/kb_connectors/base.py` | Abstract `KBConnector` with `test_credentials`, `list_documents`, `fetch_content`, `subscribe_to_changes` (optional). |
| `services/kb_connectors/itglue.py` | IT Glue REST client. |
| `services/kb_connectors/hudu.py` | Hudu REST client. |
| `services/kb_connectors/microsoft_graph.py` | Microsoft Graph (SharePoint/OneDrive) client. |
| `services/kb_connectors/registry.py` | `KBConnectorRegistry` (mirrors `PsaProviderRegistry`). |
| `services/kb_connectors/encryption.py` | Fernet wrapper (or reuse the PSA one if generic). |
| `services/kb_ingestion_writer.py` | Shared writer: chunk → embed → upsert. Used by manual upload AND connector sync. |
| `services/kb_ingestion_scheduler.py` | APScheduler interval job, `max_instances=1`. Sequential per account; concurrency cap = 4 accounts simultaneously. |
| `services/internal_ticket_service.py` | CRUD + status transitions for `internal_tickets`. |
| `services/l1_session_service.py` | Walking-session lifecycle: start, step, resolve, escalate. Bridges `ai_sessions` and the walked target. |
### 6.2 Extended services
- `services/escalation_package_generator.py` — adds inputs: `walked_path`, `ai_draft_proposal_id`, `kb_citations`. New caller path from `l1_session_service.escalate(...)`.
- KB Accelerator endpoint — accepts ingested content via the shared `kb_ingestion_writer`. Manual upload and connector sync share the same persistence path.
### 6.3 New endpoints
All endpoints require `require_l1_or_coverage` unless noted. Paths below are relative to `/api/v1`.
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/l1/queue` | Merged ticket queue (PSA + internal). Pagination + status filter. | `require_l1_or_coverage` |
| POST | `/l1/intake` | Walk-in intake. Body `{problem_statement, customer_name?, customer_contact?}`. Creates ticket, returns `{session_id, target_kind, target_id, intake_type}`. | `require_l1_or_coverage` |
| POST | `/l1/tickets/{ticket_ref}/start` | Start walker from an existing ticket. Internally same as intake but skips ticket creation. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/step` | Record an answer. Body `{node_id, answer, note?}`. Appends to `walked_path_snapshot`. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/resolve` | Close as resolved. Body `{resolution_notes, helpful: bool}`. Sets `validated_by_outcome=true` on the proposal when `helpful=true` AND target was a proposal. Closes the ticket. | `require_l1_or_coverage` |
| POST | `/l1/sessions/{id}/escalate` | Generate escalation package + reassign ticket. Body `{reason, reason_category}`. | `require_l1_or_coverage` |
| GET | `/l1/drafts` | List current user's AI drafts with promotion status. | `require_l1_or_coverage` |
KB connector endpoints (`/api/v1/kb-connectors`):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/kb-connectors` | List configured connectors for account. | `require_l1_or_above` |
| POST | `/kb-connectors` | Create. OAuth handoff for Microsoft Graph; API token entry for IT Glue/Hudu. | `require_account_owner` |
| DELETE | `/kb-connectors/{id}` | Remove (soft-disable). | `require_account_owner` |
| POST | `/kb-connectors/{id}/sync` | Trigger immediate sync (enqueued). | `require_account_owner` |
| GET | `/kb-connectors/{id}/status` | Sync status + doc count + last error. | `require_l1_or_above` |
Internal ticket endpoints (`/api/v1/internal-tickets`):
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | `/internal-tickets` | List (account-scoped). | `require_l1_or_coverage` |
| GET | `/internal-tickets/{id}` | Detail. | `require_l1_or_coverage` |
| POST | `/internal-tickets/{id}/promote-to-psa` | Push to configured PSA, set `psa_promoted_ticket_id`. | `require_account_owner` |
User management addition:
| Method | Path | Purpose | Auth |
|---|---|---|---|
| PATCH | `/users/{id}/coverage` | Set `can_cover_l1` flag. Body `{can_cover_l1: bool}`. | `require_account_owner` |
---
## 7. Frontend surface
### 7.1 Sidebar — L1 view
```
LOGO
─────────────
Workspace /l1
Tickets /l1/tickets
My Drafts /l1/drafts
─────────────
Guides /guides
Account /account (filtered — no integrations, no categories)
```
No `/pilot`, no `/trees`, no `/flows`, no `/review-queue`, no `/escalations`, no team analytics. Sidebar.tsx picks the nav array by role.
### 7.2 Sidebar — engineer coverage view
Engineer's existing sidebar plus a single appended entry "L1 Workspace" → `/l1`. Shown when `canCoverL1 || isOwner || isSuperAdmin`.
### 7.3 `/l1` dashboard layout
Single column, max width ~1100px, with three always-present vertical zones plus a conditional fourth:
1. **Greeting** — uppercase tracking date label + Bricolage 700 hero ("Good morning, {firstName}.")
2. **Describe the problem** card — large textarea (autofocus on load), optional `customer_name` + `customer_contact` fields, single primary CTA "Start walk →" (the only electric-blue element on the page)
3. **Open tickets** — section label, count, table rows (merged PSA + internal with origin badges), row hover `bg-elevated`
4. **Resume in progress** — shown only when L1 has a half-walked session
Tailwind v4 tokens: `bg-page` base, `bg-card` zones, `bg-elevated` row hover, electric-blue accent only on primary CTA. No `text-secondary`. All borders `border-default`.
### 7.4 `/l1/walk/{sessionId}` walker
Sticky header + two-pane body, full-height (flex chain per Lesson — every ancestor needs `flex` + `flex-1` + `min-h-0`).
**Header:**
- Back arrow + ticket ref + customer name + AI-built badge (when target is proposal)
- Problem statement line
- Persistent action buttons: `[ Escalate ]` `[ Resolve ✓ ]`
**Left pane (main):**
- "Step N · estimated M" label
- Current node card — large yes/no/answer buttons (min 44px tap target)
- Optional note textarea below the card (appended to `walked_path_snapshot`)
- On a fresh proposal that's still building: shimmer placeholder + "Building from KB… ~10s"
**Right pane (transcript):**
- Walked-so-far list (node title + answer chosen)
- Current step highlight
- "Source:" section listing KB citations for the current node (proposal walks only)
**Resolve modal:**
- "Did this resolve it?" `[ Yes ]` `[ No ]`
- Resolution notes textarea
- Yes + target was proposal → sets `validated_by_outcome=true`
- No → prompt to escalate instead
**Escalate modal:**
- Reason category dropdown: *Out of L1 scope · Customer demanding senior · Tree dead-ended · AI tree wrong · Other*
- Free-text reason
- Confirm
### 7.5 `/l1/drafts` page
Read-only list, columns: `created` · `problem (truncated)` · `ticket #` · `status` (pending review / outcome-validated / promoted / retired). Click → read-only detail view showing tree + walked path. No edit affordances.
### 7.6 `/l1/tickets` page
Full-page version of the dashboard queue widget. Filter by status, origin (PSA/internal), assigned-to-me.
### 7.7 Coverage banner
`<L1CoverageBanner />` — slim ~32px band, info-cyan-dim background, mounted at the top of all `/l1/*` pages when `!isL1Tech && (canCoverL1 || isOwner || isSuperAdmin)`:
```
You're covering L1. Actions logged as coverage. [Switch back →]
```
The "Switch back" link returns to `/`.
### 7.8 Routing
```tsx
const L1Dashboard = lazyWithRetry(() => import('@/pages/l1/L1Dashboard'))
const L1WalkPage = lazyWithRetry(() => import('@/pages/l1/L1WalkPage'))
const L1DraftsPage = lazyWithRetry(() => import('@/pages/l1/L1DraftsPage'))
const L1TicketsPage = lazyWithRetry(() => import('@/pages/l1/L1TicketsPage'))
```
Mounted under the `/` ProtectedRoute branch at:
- `/l1` → `L1Dashboard`
- `/l1/walk/:sessionId` → `L1WalkPage`
- `/l1/drafts` → `L1DraftsPage`
- `/l1/tickets` → `L1TicketsPage`
Wrapped in `L1RouteGuard` (403 unless the user is `l1_tech`, a coverage-flagged engineer, an owner, or a super_admin, matching `require_l1_or_coverage`). `ProtectedRoute.tsx` post-login redirect: L1 users land on `/l1` instead of `/`.
`lazyWithRetry`, not `React.lazy` (per existing convention).
---
## 8. AI match-or-build pipeline
### 8.1 Match-or-build algorithm
```
match_or_build(account_id, problem_text, ticket_ref):
    embedding = embedding_service.embed(problem_text)

    # 1. Match authored flows
    flow_hits = rag_service.match_flows(account_id, embedding, k=5)
    if flow_hits and flow_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'flow', id: flow_hits[0].flow_id, score: ...}

    # 2. Match outcome-validated proposals only
    proposal_hits = rag_service.match_proposals(
        account_id, embedding, k=5,
        where=validated_by_outcome=true,
    )
    if proposal_hits and proposal_hits[0].score >= MATCH_THRESHOLD:
        return {kind: 'proposal', id: proposal_hits[0].proposal_id, score: ...}

    # 3. Build fresh
    kb_chunks = rag_service.match_kb_chunks(account_id, embedding, k=8)
    if not kb_chunks:
        raise BuildAbortedNoKB(
            "Cannot build a tree with no KB content. "
            "Upload docs or wait for a connector sync."
        )
    nearest_flows = flow_hits[:3]
    proposal = ai_tree_builder.build(
        problem_text, kb_chunks, nearest_flows, account_id, ticket_ref
    )
    return {kind: 'proposal', id: proposal.id, score: None}
```
`MATCH_THRESHOLD` — per-account configurable; default `0.75` (cosine).
The "no empty KB build" rule is enforced because an AI tree built on the model's general knowledge — without MSP-specific grounding — risks suggesting unsafe or hallucinated fixes.
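The three-pass decision can be exercised in isolation once the retrieval results are in hand. The following is a sketch under stated assumptions: `Hit`, `pick_target`, and the injected `build` callable are hypothetical names for illustration, and the real orchestrator is async and talks to `rag_service` directly.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MATCH_THRESHOLD = 0.75  # default from this spec; per-account configurable


@dataclass
class Hit:  # hypothetical shape for a RAG hit
    id: str
    score: float


class BuildAbortedNoKB(Exception):
    """Raised when a fresh build is requested but no KB chunks were retrieved."""


def pick_target(
    flow_hits: Sequence[Hit],
    proposal_hits: Sequence[Hit],  # assumed pre-filtered to validated_by_outcome=true
    kb_chunks: Sequence[str],
    build: Callable[[Sequence[str], Sequence[Hit]], str],
    threshold: float = MATCH_THRESHOLD,
) -> dict:
    # 1. Authored flows win when the top hit clears the threshold.
    if flow_hits and flow_hits[0].score >= threshold:
        return {"kind": "flow", "id": flow_hits[0].id, "score": flow_hits[0].score}
    # 2. Then outcome-validated proposals.
    if proposal_hits and proposal_hits[0].score >= threshold:
        top = proposal_hits[0]
        return {"kind": "proposal", "id": top.id, "score": top.score}
    # 3. Build fresh, but never without KB grounding.
    if not kb_chunks:
        raise BuildAbortedNoKB("Cannot build a tree with no KB content.")
    new_id = build(kb_chunks, flow_hits[:3])  # nearest flows passed as context
    return {"kind": "proposal", "id": new_id, "score": None}
```

Injecting `build` keeps the ordering logic testable without calling the model, which is how the four outcome paths listed under telemetry could each get a unit test.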
### 8.2 AI tree-build details
**Model:** `settings.get_model_for_action('l1_realtime_build')`. Recommend Sonnet for v1 (latency-sensitive).
**Schema:** output validated against the existing flow node schema (matches `tree_editor` output). Validation failure aborts the build rather than persisting malformed data.
**Prompt strategy** (per Lesson on prompt anti-parrot — critical):
- System prompt: role definition + output schema using `<placeholder>` notation only. Never literal field values.
- Few-shot examples loaded as user/assistant messages from a separate file, never inline in the system prompt.
- User message: `{problem_statement}` + `{kb_context: [doc_title, section, content]}` + `{nearest_flow_summaries}` + instruction to cite KB chunks per node.
- Output includes `kb_citations: [{node_id, kb_doc_id, snippet}]` for walker's "Source:" pane and engineer review.
**Latency:** whole-tree-then-return (~5–15s typical). UX is a shimmer "Building from KB…" placeholder. Streaming node-by-node deferred to v2.
**Anthropic SDK config** (per Lesson): `max_retries=1`. Prompt caching enabled on the stable system+few-shot bundle (high cache hit rate expected per account).
**Telemetry:**
- `l1.match_or_build.duration_ms`, `l1.match_or_build.outcome` (`flow_match`/`proposal_match`/`built`/`aborted_no_kb`)
- `anthropic.cache` events (existing pattern) tagged `action=l1_realtime_build`
- `l1.tree_build.tokens_in`, `tokens_out`
**Anti-parrot guardrail:** the existing `tests/test_prompt_anti_parrot.py` auto-discovers new prompt constants via pattern match on `*_PROMPT` / `*_SCHEMA` / `*_PROTOCOL` / `*_FORMAT`. No new test required.
### 8.3 Hallucinated-citation defense
After build, the writer verifies every `kb_doc_id` in `kb_citations` exists in the account's KB. Unverified citations are stripped from the walker's "Source:" pane (the node still renders, just without a source). Engineer review surfaces stripped citations as a warning.
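A minimal sketch of that verification step, assuming citations are plain dicts shaped like the `kb_citations` entries in §8.2 and that the caller has already loaded the account's KB document IDs; `split_citations` is a hypothetical helper name:

```python
def split_citations(
    citations: list[dict], known_doc_ids: set[str]
) -> tuple[list[dict], list[dict]]:
    """Return (verified, stripped).

    Verified citations are shown in the walker; stripped ones are kept
    so engineer review can surface them as a warning.
    """
    verified: list[dict] = []
    stripped: list[dict] = []
    for citation in citations:
        if citation.get("kb_doc_id") in known_doc_ids:
            verified.append(citation)
        else:
            stripped.append(citation)
    return verified, stripped
```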
---
## 9. KB ingestion
### 9.1 Connector interface
```python
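from __future__ import annotations  # KBDocRef, KBDocContent, ChangeEvent are not shown here
from abc import ABC
from collections.abc import AsyncIterator
from datetime import datetime

class KBConnector(ABC):
    async def test_credentials(self) -> bool: ...
    async def list_documents(self, since: datetime | None) -> AsyncIterator[KBDocRef]: ...
    async def fetch_content(self, ref: KBDocRef) -> KBDocContent: ...
    async def subscribe_to_changes(self) -> AsyncIterator[ChangeEvent]: ...  # optional, no-op v1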
```
Registry dispatches by `provider` string. Credentials encrypted at rest via Fernet (reuse `services/psa/encryption.py` pattern).
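Dispatch by `provider` string could look roughly like the following. This is an illustrative sketch only: the shape of the existing `PsaProviderRegistry` is not described in this spec, so the decorator-based API and the `HuduConnector` stand-in below are assumptions.

```python
class KBConnectorRegistry:
    """Hypothetical sketch: maps a provider string to a connector class."""

    def __init__(self) -> None:
        self._providers: dict[str, type] = {}

    def register(self, provider: str):
        def decorator(cls: type) -> type:
            self._providers[provider] = cls
            return cls
        return decorator

    def create(self, provider: str, credentials: dict):
        try:
            cls = self._providers[provider]
        except KeyError:
            raise ValueError(f"Unknown KB provider: {provider!r}") from None
        return cls(credentials)


registry = KBConnectorRegistry()


@registry.register("hudu")
class HuduConnector:  # stand-in; a real connector would subclass KBConnector
    def __init__(self, credentials: dict) -> None:
        self.credentials = credentials  # already decrypted by the caller
```

Usage: `registry.create(config.provider, decrypted_credentials)` returns a connector instance, and an unknown provider string fails loudly instead of silently skipping a sync.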
### 9.2 Per-connector specifics
| | IT Glue | Hudu | Microsoft Graph (SharePoint/OneDrive) |
|---|---|---|---|
| Auth | API token (header) | API key (header) | OAuth 2.0 |
| Ingested types | Documents, KB Articles | Articles | docx, pdf, md, txt |
| Never ingested | Passwords, Configurations, sensitive flex assets | Passwords, sensitive items | Files in folders matching `(secret\|confidential\|private)` heuristic; files with a tenant Sensitivity Label |
| Filtering | Per-org (techs see all client orgs they have permission to) | Per-folder | Per-site / per-drive (owner picks at config time) |
| Rate limits | ~100/min token bucket | ~250/min token bucket | Built-in Graph throttling backoff |
All three deliver content to `kb_ingestion_writer` which:
1. Chunks (paragraph-aware, configurable size with overlap)
2. Embeds via `embedding_service`
3. Upserts into `kb_documents` keyed on `(connector_config_id, source_ref)`; chunks into `kb_document_chunks`
Cross-connector conflicts: same doc text appearing in two connectors yields two rows (provider-scoped `source_ref`). Engineers can dedup manually if needed.
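The chunking and change-detection steps can be sketched as below. The spec does not fix the chunk size unit or the overlap policy, so this version assumes a character budget and a whole-paragraph overlap; `chunk_paragraphs` and `content_hash` are hypothetical helper names.

```python
import hashlib


def content_hash(text: str) -> str:
    """sha256 hex digest (64 chars), usable for kb_documents.content_hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk_paragraphs(
    text: str, max_chars: int = 1200, overlap_paragraphs: int = 1
) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks.

    A paragraph longer than max_chars is emitted whole rather than split.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        if current and len("\n\n".join(current + [para])) > max_chars:
            chunks.append("\n\n".join(current))
            # Carry the tail of the finished chunk forward as overlap,
            # unless that would push the next chunk over budget.
            carry = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
            if len("\n\n".join(carry + [para])) > max_chars:
                carry = []
            current = carry
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

On re-sync, comparing `content_hash(new_text)` with the stored hash lets the writer skip re-embedding documents whose text has not changed.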
### 9.3 Sync scheduling
`kb_ingestion_scheduler.py` runs as APScheduler interval job, `max_instances=1`. Per cycle:
1. Query active `kb_connector_configs` where `last_sync_at` is older than `sync_interval_minutes` (default 360 = 6h).
2. Dispatch per account; concurrency cap = 4 simultaneous accounts.
3. For each connector: `list_documents(since=last_sync_at)` → for each ref, `fetch_content` → write.
4. Compute the diff between current refs and existing rows (same `connector_config_id`); soft-delete missing ones via `deleted_at`.
5. Update `last_sync_at`, `last_sync_status`, `last_sync_error`.
Must use `_admin_session_factory()` not `get_db()` for startup-side and scheduler-side queries (per Lesson on RLS at startup — no `app.current_account_id` set).
Immediate sync via `POST /api/v1/kb-connectors/{id}/sync` enqueues a job; scheduler picks it up within ~30s.
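Two pieces of the cycle above are pure functions and easy to test on their own: deciding whether a connector is due, and computing which rows to soft-delete. This sketch assumes timezone-aware datetimes, and it assumes the set of current refs comes from a full listing of the provider (a listing filtered by `since` would make every unchanged document look missing). Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_due(last_sync_at: Optional[datetime], interval_minutes: int, now: datetime) -> bool:
    """A connector that has never synced is always due."""
    if last_sync_at is None:
        return True
    return now - last_sync_at >= timedelta(minutes=interval_minutes)


def refs_to_soft_delete(existing_refs: set[str], current_refs: set[str]) -> set[str]:
    """source_ref values stored locally that no longer exist upstream.

    Rows for these refs would get deleted_at set; nothing is hard-deleted.
    """
    return existing_refs - current_refs
```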
---
## 10. Escalation flow
1. L1 clicks **Escalate** → modal (reason category + optional free text).
2. `POST /api/v1/l1/sessions/{id}/escalate` → backend:
- Calls extended `escalation_package_generator.generate(session_id, include_l1_walk=true)`. Package contents:
```
problem_statement, customer_name, customer_contact,
ticket_ref (PSA id or internal id),
target_kind ('flow' | 'proposal'), target_id,
walked_path,
ai_draft_proposal_id,
kb_citations,
escalation_reason, reason_category, l1_user_id
```
- Creates an `ai_session` with the package serialized into system context for the chat surface.
- If PSA-backed: `psa_provider.reassign_ticket(ticket_id, to=account.engineer_queue_name)`. Default `'Tier 2'`. Owner configurable in `/account/integrations`.
- If internal-backed: `internal_tickets.status='escalated'`, `assigned_user_id=null` (round-robin assignment is out of scope).
- Writes notification via existing `notification_service` — bell badge to all engineers in account.
- Audit log entry; `acting_as` reflects whether L1 or coverage-engineer escalated.
3. Toast on L1 side, return to `/l1`.
4. Engineer clicks notification → `/pilot/{sessionId}` → chat surface renders the package as a sticky "Escalation context" card; engineer continues in chat.
**Un-escalate is out of scope.** If engineer wants to bounce back, they reassign in PSA manually.
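One way to carry the package fields listed in step 2 is a small dataclass that serializes to JSON for the chat session's system context. This is a sketch, not the existing generator's output format; the class name, the optional/required split, and the `to_system_context` method are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class EscalationPackage:  # hypothetical container for the fields in step 2
    problem_statement: str
    ticket_ref: str            # PSA id or internal id
    target_kind: str           # 'flow' | 'proposal'
    target_id: str
    escalation_reason: str
    reason_category: str
    l1_user_id: str
    customer_name: Optional[str] = None
    customer_contact: Optional[str] = None
    walked_path: list[dict] = field(default_factory=list)
    ai_draft_proposal_id: Optional[str] = None
    kb_citations: list[dict] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.target_kind not in ("flow", "proposal"):
            raise ValueError(f"invalid target_kind: {self.target_kind!r}")

    def to_system_context(self) -> str:
        """Serialize for embedding in the ai_session system context."""
        return json.dumps(asdict(self), sort_keys=True)
```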
---
## 11. Internal ticket fallback
When the account has no active PSA provider:
- Intake creates `internal_tickets` row instead of a PSA ticket.
- Queue surface merges PSA + internal with `Internal` / `PSA` origin badge.
- Escalation flips `internal_tickets.status='escalated'` and leaves `assigned_user_id` null so any engineer can claim it (v1 behavior; see §10).
- Engineer post-escalation sees the internal ticket as a session; no PSA roundtrip.
**Promote to PSA:** owner-only action on any internal ticket. Pushes the ticket into the configured PSA provider, sets `psa_promoted_ticket_id`. Manual; not automatic on PSA-install. Lets MSPs adopt PSA mid-flight without orphaning prior internal tickets.
---
## 12. Outcome-validation lifecycle
```
1. L1 intake → match_or_build → FlowProposal(source='ai_realtime_l1',
validated_by_outcome=false,
linked_ticket_id=...)
2. L1 walks → POST /l1/sessions/{id}/step appends to walked_path_snapshot
3. L1 hits Resolve:
modal: "Did this resolve it?" [Yes] [No] + resolution_notes
4. helpful=true → flow_proposal.validated_by_outcome = true
→ walked_path_snapshot frozen
→ ticket closed (PSA or internal)
helpful=false → validated_by_outcome stays false
→ L1 prompted: "Escalate instead?"
5. Engineer review queue:
ORDER BY validated_by_outcome DESC, created_at DESC
- Outcome-validated drafts surface first
- Promote / edit-and-promote / retire
6. Promote → new flow with source='ai_promoted'; original proposal kept with status='promoted'
→ future match_or_build matches the new flow on the flow-match pass
```
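Step 4 of the lifecycle can be summarized as a pure function of the walk target and the L1's answer. A hedged sketch: `apply_resolve` and the keys of the returned dict are hypothetical, and leaving the ticket open when `helpful=false` is an assumption, since the lifecycle above only states that the L1 is prompted to escalate.

```python
def apply_resolve(target_kind: str, helpful: bool) -> dict:
    """Summarize the side effects of the resolve modal (sketch only)."""
    if helpful:
        return {
            # Only proposals carry the flag; authored flows are unaffected.
            "set_validated_by_outcome": target_kind == "proposal",
            "freeze_walked_path": True,
            "close_ticket": True,
            "prompt_escalate": False,
        }
    return {
        "set_validated_by_outcome": False,
        "freeze_walked_path": False,
        "close_ticket": False,  # assumption: ticket stays open pending escalation
        "prompt_escalate": True,
    }
```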
---
## 13. Out of scope (v1 non-goals)
- End-user / self-service portal ("L0" tier).
- Engineer warm-transfer / live take-over during a call.
- L1 ↔ engineer real-time chat during a call.
- Multi-language UI / customer-language toggle in walker.
- Auto-promote internal tickets to PSA on integration install.
- AI tree streaming (node-by-node).
- KB write-back to IT Glue/Hudu/SharePoint (read-only ingestion).
- Confluence connector.
- Per-step KB citation editing in engineer review (engineers edit the tree, not citations).
- Final Stripe pricing SKU (data model supports differential pricing; price set in Stripe dashboard).
- "Switch to L1 mode" persistent toggle for engineers (coverage flag + banner only).
- Cancel/un-escalate flow.
- Round-robin engineer assignment on internal-ticket escalations.
---
## 14. Testing strategy
### 14.1 Backend (pytest)
- Unit: `match_or_build` covers all four paths (flow-match, proposal-match, built, aborted_no_kb).
- Unit: `ai_tree_builder` schema validation — assert rejection of malformed Anthropic output before persistence.
- Unit: each connector's `list_documents` + `fetch_content` against recorded HTTP fixtures.
- Integration: intake → walk → resolve(helpful=true) → assert `FlowProposal.validated_by_outcome=true`, ticket closed.
- Integration: intake → walk → escalate → assert PSA `reassign_ticket` invoked, `ai_session` created with package, audit log entry, notification dispatched.
- Integration: KB scheduler — `max_instances=1`, sequential per-account, soft-delete on removal.
- **RLS regression** (highest priority): `l1_tech` user in account A cannot read account B's tickets, drafts, KB docs, or connector configs. Added to existing RLS test suite.
- Anti-parrot: existing CI test auto-discovers new prompt module.
### 14.2 Frontend
- Unit: `usePermissions` — L1 sees L1 paths, blocked from engineer paths. Coverage flag opens L1 paths.
- Unit: `L1WalkPage` — node advance, escalate modal, resolve modal flips `validated_by_outcome` correctly.
- Unit: `L1CoverageBanner` — visible for engineer-with-flag on `/l1/*`, hidden for L1 users.
- E2E (Playwright, scoped selectors per Lesson):
- L1 sign-in → dashboard → intake → walker → resolve → verify ticket closed + proposal flagged.
- Engineer with `can_cover_l1` → sidebar entry visible → click → coverage banner shows → walks a session → audit log records `acting_as='l1_coverage'`.
- L1 hitting `/pilot`, `/trees/new`, `/escalations` → 403 or redirect.
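The access rules these frontend and E2E cases exercise can be summarized as a small decision function. This is a sketch written in Python for consistency with the backend examples, not the `usePermissions` hook itself. The `User` shape, the `"engineer"` role string, the `authorize` helper, and the default-deny fallback are assumptions; the `l1_tech` role, the `/l1` prefix, the three engineer paths, and the `l1_coverage` tag come from this spec.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

L1_PREFIX = "/l1"
# Engineer-only paths named in the E2E case above; a real list would be longer.
ENGINEER_ONLY = ("/pilot", "/trees/new", "/escalations")


@dataclass(frozen=True)
class User:
    role: str                  # "l1_tech" or "engineer" (other roles omitted)
    can_cover_l1: bool = False


def _under(path: str, prefix: str) -> bool:
    """True when path is the prefix itself or nested beneath it."""
    return path == prefix or path.startswith(prefix + "/")


def authorize(user: User, path: str) -> Tuple[bool, Optional[str]]:
    """Return (allowed, acting_as); acting_as is set only for coverage access."""
    if _under(path, L1_PREFIX):
        if user.role == "l1_tech":
            return True, None
        if user.role == "engineer" and user.can_cover_l1:
            # Coverage sessions are tagged so the audit log can tell them apart.
            return True, "l1_coverage"
        return False, None
    if any(_under(path, prefix) for prefix in ENGINEER_ONLY):
        return user.role == "engineer", None
    # Paths outside this sketch are denied by default (an assumption).
    return False, None
```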
---
## 15. Acceptance criteria (v1 ships when…)
- L1 role assignable; assigned L1 sees L1 sidebar only; no engineer route reachable.
- L1 intake creates a ticket (PSA or internal) and lands in walker session.
- Walker handles both flows and proposals; AI-built badge + sources shown for proposals.
- Escalate generates package, reassigns ticket, notifies engineers.
- Resolve flips `validated_by_outcome`; review queue prioritizes outcome-validated drafts.
- All three KB connectors configurable; initial sync + periodic re-sync + soft-delete on removal.
- AI build refuses with informative error when account KB is empty.
- Coverage flag works end-to-end with audit-log tagging.
- RLS blocks cross-tenant reads on every new table.
- L1 seat count tracked separately from engineer seats in admin/billing UI.
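The review-queue prioritization in the criteria above could be as simple as a two-part sort key. A minimal sketch, assuming a hypothetical `Proposal` record: only `validated_by_outcome` is named in this spec, while `created_at` and the oldest-first tie-break are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Proposal:
    id: str
    validated_by_outcome: bool
    created_at: datetime


def review_queue(proposals: List[Proposal]) -> List[Proposal]:
    """Outcome-validated drafts first; oldest first within each group."""
    # False sorts before True, so negating the flag puts validated drafts on top.
    return sorted(
        proposals,
        key=lambda p: (not p.validated_by_outcome, p.created_at),
    )
```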
---
## 16. Risks & mitigations
| Risk | Mitigation |
|---|---|
| AI builds an unsafe tree | Schema validation rejects malformed output. Engineer review is the gate before a draft becomes a "real" flow. v1 refuses to build when the KB is empty. |
| Hallucinated KB citations | Post-build verification that each `kb_doc_id` exists; unverified citations stripped from walker, surfaced as warning in engineer review. |
| Duplicate proposals for same problem | Validated-proposal match pass deduplicates after one L1 validates; pre-validation dups are tolerated and dedup'd during engineer review. |
| KB ingestion captures sensitive content | Per-connector deny-lists (passwords, sensitive flex assets, MS Graph Sensitivity Labels). Owners exclude specific folders/sites at config. All ingested docs visible in `/account/kb` for manual deletion. |
| AI build latency frustrates customer on call | Build-progress UI sets expectations. Escalate button is visible from page load. Future: pre-warm builds on PSA-ticket-landed event. |
| Three connectors is more scope than originally proposed | Acknowledged. Each connector is ~12 weeks of work. Plan should sequence them and allow shipping with IT Glue + Hudu first if SharePoint slips. |
| Engineer review queue backlog stalls library growth | Validated-proposal match pass means good drafts get reused without engineer review. Backlog only delays the move from `'proposal'` to `'flow'`, not the L1's ability to use validated content. |
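
The hallucinated-citation mitigation in the table above amounts to a post-build filter over the generated tree. A minimal sketch, assuming each node is a dict with an `id` and a `kb_doc_ids` list; that shape and the `verify_citations` helper are hypothetical, and only the `kb_doc_id` field name comes from this spec. The walker would render the cleaned nodes, and the warnings would surface in engineer review.

```python
from typing import Dict, Iterable, List, Set, Tuple


def verify_citations(
    nodes: Iterable[Dict],
    known_doc_ids: Set[str],
) -> Tuple[List[Dict], List[str]]:
    """Strip citations whose kb_doc_id is not in the account's KB.

    Returns (cleaned_nodes, warnings). Input nodes are not mutated.
    """
    cleaned: List[Dict] = []
    warnings: List[str] = []
    for node in nodes:
        cited = node.get("kb_doc_ids", [])
        # Keep only citations that resolve to a real KB document.
        kept = [doc_id for doc_id in cited if doc_id in known_doc_ids]
        for doc_id in cited:
            if doc_id not in known_doc_ids:
                warnings.append(
                    f"node {node['id']}: unknown kb_doc_id {doc_id!r}"
                )
        cleaned.append({**node, "kb_doc_ids": kept})
    return cleaned, warnings
```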
---
## 17. Naming reference
| Layer | Value |
|---|---|
| DB enum (`account_role`) | `l1_tech` |
| UI display | "L1 Tech" / "L1" |
| Sidebar entry | "L1 Workspace" |
| URL prefix | `/l1` |
| Coverage flag column | `users.can_cover_l1` |
| Coverage audit tag | `acting_as = 'l1_coverage'` |
| Pricing label | "L1 seat" |
| Stripe SKU | Set in Stripe dashboard at launch — data model supports differential pricing now |
---
## 18. Open implementation decisions (deferred to plan, not blocking design)
- Validation of the `MATCH_THRESHOLD` default (initially 0.75; tune from post-launch telemetry).
- Specific Anthropic model choice for `l1_realtime_build` (Sonnet vs Opus — pick based on quality benchmark during plan).
- Chunk size + overlap for KB ingestion writer (tune in implementation).
- Engineer queue label default (`'Tier 2'` vs `'Engineering'`) — owner-configurable anyway.
- Exact look of the build-progress shimmer animation — design-system handoff.
These are tuning/UX-polish details, not architectural forks. They land during the writing-plans phase, not here.
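To make the chunk-size and overlap knobs in the list above concrete, here is a minimal sliding-window sketch. It is an assumption about shape only: the `chunk_text` helper is hypothetical, the defaults of 1000 and 200 are placeholders rather than values from this spec, and it counts characters where a real ingestion writer might count tokens.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

A larger overlap reduces the chance that a relevant sentence is split across two chunks, at the cost of storing and embedding more near-duplicate text, which is why the spec leaves both values to tuning.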
### Note on scope and phasing
This is a substantive feature: new role, four frontend pages, ~12 endpoints, AI tree-builder, three KB connectors, escalation extensions, and six migrations. The implementation plan will almost certainly phase the work — a reasonable cut is:
- **Phase 1:** role + L1 surface against existing authored flows (no AI build, no connectors yet). Validates the seat model, walker UX, escalation, internal ticket fallback, and coverage flag end-to-end.
- **Phase 2:** `kb_documents` schema + AI tree-builder + match-or-build pipeline. Enables real-time AI flows grounded on manually-uploaded KB.
- **Phase 3:** the three KB connectors (IT Glue, Hudu, SharePoint/OneDrive). Each is roughly self-contained — can ship one at a time and reorder if a connector blocks.
Phasing is a plan-level decision; the spec captures the full feature.
---
*End of spec.*