diff --git a/docs/plans/2026-02-17-maintenance-flows-design.md b/docs/plans/2026-02-17-maintenance-flows-design.md
new file mode 100644
index 00000000..cafb9fae
--- /dev/null
+++ b/docs/plans/2026-02-17-maintenance-flows-design.md
@@ -0,0 +1,202 @@
+# Maintenance Flows — Design Document
+
+> **Date:** 2026-02-17
+> **Status:** Approved
+> **Phase:** Design (pre-implementation)
+
+---
+
+## Overview
+
+Add `maintenance` as a first-class flow type in ResolutionFlow, alongside `troubleshooting` and `procedural`. Maintenance flows are designed for MSP scheduled/repeatable infrastructure tasks (e.g., patching Citrix servers, updating FSLogix, updating RDS software). They share the procedural execution engine but add scheduling, multi-target batch launching, and saved target lists.
+
+---
+
+## Goals
+
+- Visual separation of maintenance flows from troubleshooting and project flows
+- Batch launch: one flow run against N servers/targets simultaneously, each tracked as an independent session
+- Saved target lists per team, with ad-hoc entry and future PSA/RMM import
+- Scheduled auto-session creation with in-app notifications
+- Re-use target lists from previous batch runs
+
+---
+
+## Data Model
+
+### `tree_type` expansion
+
+**Migration:** Drop and recreate the `ck_trees_tree_type` check constraint to allow `'troubleshooting' | 'procedural' | 'maintenance'`.
+
+Maintenance flows reuse `tree_structure` (step-by-step like procedural) and `intake_form` (for capturing target-specific context at session start, e.g., patch version).
+
+---
+
+### `target_lists` table (new)
+
+```sql
+id          UUID PRIMARY KEY DEFAULT gen_random_uuid()
+team_id     UUID NOT NULL REFERENCES teams(id) ON DELETE CASCADE
+created_by  UUID REFERENCES users(id) ON DELETE SET NULL
+name        VARCHAR(255) NOT NULL
+description TEXT
+targets     JSONB NOT NULL  -- [{ "label": "RDS-01", "notes": "..." }, ...]
+created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
+updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
+```
+
+- Scoped to team; any engineer can create/edit/delete their team's lists
+- Each target entry: `label` (required, display name / hostname) + `notes` (optional, IP, role, etc.)
+
+---
+
+### `maintenance_schedules` table (new)
+
+```sql
+id              UUID PRIMARY KEY DEFAULT gen_random_uuid()
+tree_id         UUID NOT NULL REFERENCES trees(id) ON DELETE CASCADE
+created_by      UUID REFERENCES users(id) ON DELETE SET NULL
+cron_expression VARCHAR(100) NOT NULL  -- e.g. "0 9 15 * *"
+timezone        VARCHAR(100) NOT NULL DEFAULT 'UTC'
+target_list_id  UUID REFERENCES target_lists(id) ON DELETE SET NULL
+is_active       BOOLEAN NOT NULL DEFAULT true
+next_run_at     TIMESTAMPTZ NOT NULL
+last_run_at     TIMESTAMPTZ
+created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
+updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
+```
+
+- One active schedule per maintenance flow (enforced at API level)
+- `target_list_id` is optional — if null, schedule auto-creates sessions without targets (engineer specifies targets on the pending sessions)
+- `next_run_at` is computed from `cron_expression` + `timezone` at creation/update
+
+---
+
+### Sessions — batch tracking fields (new columns)
+
+```sql
+batch_id     UUID          -- all sessions from one batch launch share this value
+target_label VARCHAR(255)  -- e.g. "RDS-01"
+```
+
+- `batch_id` is generated at batch launch time (not per-session)
+- `target_label` is the label from the target list entry or ad-hoc input
+
+---
+
+## Scheduling Engine
+
+**APScheduler** runs in-process with the FastAPI backend (async scheduler).
+
+On startup:
+1. Load all `is_active=true` maintenance schedules
+2. Register each as an APScheduler job using its `cron_expression` + `timezone`
+
+When a schedule fires:
+1. Resolve target list (`target_list_id` → targets, or empty list if null)
+2. Generate a new `batch_id`
+3. Create one `Session` per target with `batch_id`, `target_label`, status `pending`
+4. Update `last_run_at`, compute and update `next_run_at`
+5. Create in-app notification: "Maintenance run ready: [Flow Name] — N sessions created"
+
+Schedule changes (create/update/disable) are applied to APScheduler immediately via the API.
+
+---
+
+## Batch Launch (Ad-hoc)
+
+Triggered from the maintenance flow detail page. The engineer picks a target list via a modal with four tabs:
+
+| Tab | Description |
+|-----|-------------|
+| **Saved List** | Pick from team's saved target lists |
+| **Previous Run** | Browse this flow's past batches, re-use that target list |
+| **Manual Entry** | Paste/type server names (one per line) |
+| **PSA/RMM Import** | Placeholder — "Coming soon" |
+
+After choosing targets, the engineer sees a preview: "Will create N sessions for: RDS-01, RDS-02..."
+
+On confirm: creates N sessions with shared `batch_id`, status `pending`.
+
+---
+
+## UI / UX
+
+### Sidebar
+
+```
+All Flows           [total]
+  Troubleshooting   [count]
+  Projects          [count]
+  Maintenance       [count]  ← new
+```
+
+Links to `/trees?type=maintenance`.
+
+### TreeLibraryPage
+
+- `typeFilter` expands to `'all' | 'troubleshooting' | 'procedural' | 'maintenance'`
+- Maintenance flows show a distinct badge (wrench icon, amber accent color)
+
+### Flow Editor
+
+- New flow type selector includes "Maintenance"
+- Uses the same `ProceduralEditorPage` — no new editor needed
+
+### Maintenance Flow Detail Page (`/flows/:id/maintenance`)
+
+New page shown when opening a maintenance flow (via `getTreeNavigatePath`). Sections:
+- **Overview** — name, description, steps summary
+- **Schedule panel** — set/edit/disable cron schedule, timezone, assigned target list
+- **Batch Launch button** — opens target list modal
+- **Run history** — past batches grouped by `batch_id`, status rollup (e.g., "6/8 complete")
+
+### Sessions Page — Batch View
+
+Sessions with a shared `batch_id` are collapsed into a single row:
+- Flow name, launch date, target count, completion status
+- Expand to see individual target sessions
+
+### Target Lists Settings (`/account/target-lists`)
+
+New page under Team settings. Engineers can:
+- Create a named target list with target entries (label + optional notes)
+- Edit / delete existing lists
+- See last-used date per list
+
+### Routing
+
+`getTreeNavigatePath()` in `@/lib/routing` gains `'maintenance'` case → `/flows/:id/maintenance`.
+
+Individual session execution from the detail page still uses `ProceduralNavigationPage`.
+
+---
+
+## Rollout Phases
+
+| Phase | Scope |
+|-------|-------|
+| 1 — DB + API | Alembic migration, model changes, target_lists + schedules endpoints, batch session creation API |
+| 2 — Core UI | Sidebar entry, type filter, flow badge, maintenance detail page, batch launch modal |
+| 3 — Scheduler | APScheduler integration, auto-session creation, in-app notifications |
+| 4 — Target Lists | Saved lists settings page under Team settings |
+
+Each phase is independently shippable without breaking existing flows.
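The run-history rollup described for the detail page and the Sessions batch view (for example "6/8 complete") could be computed roughly as follows. This is a minimal sketch, not the project's implementation: `SessionRow` and `rollup_batches` are hypothetical names, and the `"completed"` status value is an assumption, since the design only names `pending`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionRow:
    """Hypothetical stand-in for a session record with the new batch columns."""
    batch_id: Optional[str]
    target_label: Optional[str]
    status: str  # "pending" is from the design; "completed" is assumed here


def rollup_batches(sessions: list[SessionRow]) -> dict[str, str]:
    """Group sessions by batch_id and summarise each batch as 'done/total complete'."""
    totals: dict[str, int] = defaultdict(int)
    done: dict[str, int] = defaultdict(int)
    for s in sessions:
        if s.batch_id is None:
            continue  # sessions launched outside a batch are not rolled up
        totals[s.batch_id] += 1
        if s.status == "completed":
            done[s.batch_id] += 1
    return {b: f"{done[b]}/{totals[b]} complete" for b in totals}
```

In practice this grouping would more likely be done in SQL with a `GROUP BY batch_id`; the in-memory version only illustrates the intended shape of the result.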
+
+---
+
+## Testing
+
+- `test_maintenance_tree_type.py` — CRUD, check constraint, filter by type
+- `test_target_lists.py` — create/list/update/delete, team scoping
+- `test_maintenance_schedules.py` — create/update/disable, `next_run_at` calculation, schedule fires + creates correct batch sessions
+- `test_batch_sessions.py` — correct session count, shared `batch_id`, `target_label` values, re-use previous session targets
+- Frontend: `npm run build` after each phase
+
+---
+
+## Future
+
+- PSA/RMM import (ConnectWise, Kaseya) for target lists — Phase 4 roadmap item
+- Patch window constraints (maintenance flows only run within defined windows)
+- Per-target session results dashboard
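The batch-creation steps shared by the scheduler ("When a schedule fires") and the ad-hoc launch modal can be sketched as below. This is an illustration under stated assumptions: `parse_manual_targets` and `build_batch_sessions` are hypothetical helper names, sessions are shown as plain dicts rather than ORM objects, and only the `batch_id`, `target_label` and `pending` status come from the design itself.

```python
import uuid


def parse_manual_targets(text: str) -> list[dict]:
    """Turn pasted server names (one per line) into target entries.

    Blank lines are skipped and surrounding whitespace is trimmed.
    """
    return [
        {"label": line.strip(), "notes": None}
        for line in text.splitlines()
        if line.strip()
    ]


def build_batch_sessions(tree_id: str, targets: list[dict]) -> list[dict]:
    """Create one pending session per target, all sharing a single batch_id."""
    batch_id = str(uuid.uuid4())  # generated once per launch, not per session
    return [
        {
            "tree_id": tree_id,
            "batch_id": batch_id,
            "target_label": target["label"],
            "status": "pending",
        }
        for target in targets
    ]
```

With an empty target list this sketch creates no sessions. The design says a schedule with a null `target_list_id` still creates sessions for the engineer to fill in, so that case would need separate handling that is not specified here.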