docs: add maintenance flows design document
Covers tree_type expansion, target_lists + maintenance_schedules data model, APScheduler-based auto-session creation, batch launch modal, and phased rollout plan.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
202
docs/plans/2026-02-17-maintenance-flows-design.md
Normal file
@@ -0,0 +1,202 @@
# Maintenance Flows — Design Document

> **Date:** 2026-02-17
> **Status:** Approved
> **Phase:** Design (pre-implementation)

---

## Overview

Add `maintenance` as a first-class flow type in ResolutionFlow, alongside `troubleshooting` and `procedural`. Maintenance flows are designed for MSP scheduled/repeatable infrastructure tasks (e.g., patching Citrix servers, updating FSLogix, updating RDS software). They share the procedural execution engine but add scheduling, multi-target batch launching, and saved target lists.

---

## Goals

- Visual separation of maintenance flows from troubleshooting and project flows
- Batch launch: one flow run against N servers/targets simultaneously, each tracked as an independent session
- Saved target lists per team, with ad-hoc entry and future PSA/RMM import
- Scheduled auto-session creation with in-app notifications
- Re-use target lists from previous batch runs

---

## Data Model

### `tree_type` expansion

**Migration:** Drop and recreate the `ck_trees_tree_type` check constraint to allow `'troubleshooting' | 'procedural' | 'maintenance'`.

Maintenance flows reuse `tree_structure` (step-by-step like procedural) and `intake_form` (for capturing target-specific context at session start, e.g., patch version).
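
The constraint swap described above could be sketched as a small helper that builds the two DDL statements, which an Alembic migration might then pass to `op.execute()`. The helper name and the `trees.tree_type` column name are assumptions made for illustration; only the constraint name and the allowed values come from this document.

```python
# Hypothetical helper: builds the DDL for swapping the tree_type check
# constraint. The table and column names are assumptions, not confirmed schema.
ALLOWED_TREE_TYPES = ("troubleshooting", "procedural", "maintenance")


def tree_type_constraint_sql(
    allowed=ALLOWED_TREE_TYPES,
    table="trees",
    column="tree_type",
    constraint="ck_trees_tree_type",
):
    """Return (drop_sql, create_sql) for recreating the check constraint."""
    values = ", ".join(f"'{value}'" for value in allowed)
    drop_sql = f"ALTER TABLE {table} DROP CONSTRAINT {constraint}"
    create_sql = (
        f"ALTER TABLE {table} ADD CONSTRAINT {constraint} "
        f"CHECK ({column} IN ({values}))"
    )
    return drop_sql, create_sql


drop_sql, create_sql = tree_type_constraint_sql()
print(drop_sql)
print(create_sql)
```

A downgrade would call the same helper with only the two original values, after confirming no `maintenance` rows remain.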

---

### `target_lists` table (new)

```sql
CREATE TABLE target_lists (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    team_id     UUID NOT NULL REFERENCES teams(id) ON DELETE CASCADE,
    created_by  UUID REFERENCES users(id) ON DELETE SET NULL,
    name        VARCHAR(255) NOT NULL,
    description TEXT,
    targets     JSONB NOT NULL,  -- [{ "label": "RDS-01", "notes": "..." }, ...]
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

- Scoped to team; any engineer can create/edit/delete their team's lists
- Each target entry: `label` (required, display name / hostname) + `notes` (optional, IP, role, etc.)
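
Because `targets` is free-form JSONB, the API would likely need to validate entries before storing them. The following is a minimal sketch of that check, assuming a hypothetical `normalize_targets` helper; the 255-character limit is borrowed from the `target_label` column on sessions and is an assumption here.

```python
def normalize_targets(raw_targets):
    """Validate and clean target entries before they are stored as JSONB."""
    if not isinstance(raw_targets, list):
        raise ValueError("targets must be a list")
    cleaned = []
    for index, entry in enumerate(raw_targets):
        if not isinstance(entry, dict):
            raise ValueError(f"target #{index} must be an object")
        label = str(entry.get("label") or "").strip()
        if not label:
            raise ValueError(f"target #{index} is missing a label")
        if len(label) > 255:  # assumed limit, mirrors target_label VARCHAR(255)
            raise ValueError(f"target #{index} label exceeds 255 characters")
        target = {"label": label}
        notes = str(entry.get("notes") or "").strip()
        if notes:  # notes are optional; drop empty ones
            target["notes"] = notes
        cleaned.append(target)
    return cleaned


targets = normalize_targets(
    [{"label": " RDS-01 ", "notes": "10.0.0.5"}, {"label": "RDS-02", "notes": ""}]
)
print(targets)
```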

---

### `maintenance_schedules` table (new)

```sql
CREATE TABLE maintenance_schedules (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tree_id         UUID NOT NULL REFERENCES trees(id) ON DELETE CASCADE,
    created_by      UUID REFERENCES users(id) ON DELETE SET NULL,
    cron_expression VARCHAR(100) NOT NULL,  -- e.g. "0 9 15 * *"
    timezone        VARCHAR(100) NOT NULL DEFAULT 'UTC',
    target_list_id  UUID REFERENCES target_lists(id) ON DELETE SET NULL,
    is_active       BOOLEAN NOT NULL DEFAULT true,
    next_run_at     TIMESTAMPTZ NOT NULL,
    last_run_at     TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

- One active schedule per maintenance flow (enforced at API level)
- `target_list_id` is optional — if null, schedule auto-creates sessions without targets (engineer specifies targets on the pending sessions)
- `next_run_at` is computed from `cron_expression` + `timezone` at creation/update
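
To illustrate how `next_run_at` could be derived, here is a deliberately simplified, stdlib-only sketch. It is a stand-in for whatever cron handling the scheduler library provides, not a replacement for it: it assumes five-field expressions that use only `*` or comma-separated integers, treats day-of-week as 0 = Sunday, and requires every field to match. The function names are hypothetical, and `now` must be timezone-aware.

```python
from datetime import datetime, timedelta, timezone


def _parse_field(field, low, high):
    """Expand one cron field ('*' or comma-separated integers) into a set."""
    if field == "*":
        return set(range(low, high + 1))
    values = {int(part) for part in field.split(",")}
    if not all(low <= value <= high for value in values):
        raise ValueError(f"cron field {field!r} is outside {low}-{high}")
    return values


def compute_next_run_at(cron_expression, tz, now):
    """Return the first matching minute strictly after `now`, as UTC."""
    fields = cron_expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    minutes = _parse_field(fields[0], 0, 59)
    hours = _parse_field(fields[1], 0, 23)
    days = _parse_field(fields[2], 1, 31)
    months = _parse_field(fields[3], 1, 12)
    weekdays = _parse_field(fields[4], 0, 6)  # assumption: 0 = Sunday

    # Step through UTC minutes and test each one in the schedule's local
    # time, so any offset changes in `tz` are handled by the conversion.
    candidate = now.astimezone(timezone.utc).replace(second=0, microsecond=0)
    for _ in range(366 * 24 * 60):  # give up after one year of minutes
        candidate += timedelta(minutes=1)
        local = candidate.astimezone(tz)
        if (
            local.minute in minutes
            and local.hour in hours
            and local.day in days
            and local.month in months
            and local.isoweekday() % 7 in weekdays
        ):
            return candidate
    raise ValueError("no run time found within one year")


now = datetime(2026, 2, 17, 12, 0, tzinfo=timezone.utc)
next_utc = compute_next_run_at("0 9 15 * *", timezone.utc, now)
next_minus5 = compute_next_run_at("0 9 15 * *", timezone(timedelta(hours=-5)), now)
print(next_utc.isoformat(), next_minus5.isoformat())
```

With a stored IANA name, the `tz` argument would be something like `zoneinfo.ZoneInfo(schedule.timezone)`; the fixed offsets above just keep the example independent of the system time zone database.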

---

### Sessions — batch tracking fields (new columns)

```sql
batch_id     UUID          -- all sessions from one batch launch share this value
target_label VARCHAR(255)  -- e.g. "RDS-01"
```

- `batch_id` is generated at batch launch time (not per-session)
- `target_label` is the label from the target list entry or ad-hoc input

---

## Scheduling Engine

**APScheduler** runs in-process with the FastAPI backend (async scheduler).

On startup:
1. Load all `is_active=true` maintenance schedules
2. Register each as an APScheduler job using its `cron_expression` + `timezone`

When a schedule fires:
1. Resolve target list (`target_list_id` → targets, or empty list if null)
2. Generate a new `batch_id`
3. Create one `Session` per target with `batch_id`, `target_label`, status `pending`
4. Update `last_run_at`, compute and update `next_run_at`
5. Create in-app notification: "Maintenance run ready: [Flow Name] — N sessions created"

Schedule changes (create/update/disable) are applied to APScheduler immediately via the API.
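
The "when a schedule fires" steps could be sketched as a plain function that the scheduler job calls, with database writes left to the caller. `PendingSession` and `fire_schedule` are hypothetical names. How many sessions to create when no target list is attached is not specified here, so this sketch assumes a single session with no `target_label`.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PendingSession:
    """Stand-in for the real Session model; field names are assumptions."""
    tree_id: str
    batch_id: str
    target_label: Optional[str]
    status: str = "pending"


def fire_schedule(tree_id, flow_name, targets, now=None):
    """Build the batch, the last_run_at value, and the notification text."""
    fired_at = now or datetime.now(timezone.utc)
    batch_id = str(uuid.uuid4())  # one id shared by every session in the batch
    # Assumption: with no targets, create one session without a target label.
    labels = [target["label"] for target in targets] or [None]
    sessions = [PendingSession(tree_id, batch_id, label) for label in labels]
    # "\u2014" is the dash used in the notification format above.
    notification = (
        f"Maintenance run ready: {flow_name} \u2014 "
        f"{len(sessions)} sessions created"
    )
    return {
        "batch_id": batch_id,
        "sessions": sessions,
        "last_run_at": fired_at,
        "notification": notification,
    }


run = fire_schedule(
    "tree-1", "Patch RDS hosts", [{"label": "RDS-01"}, {"label": "RDS-02"}]
)
print(run["notification"])
```

The caller would persist the sessions, store `last_run_at`, and recompute `next_run_at` in the same transaction before emitting the notification.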

---

## Batch Launch (Ad-hoc)

Triggered from the maintenance flow detail page. The engineer picks a target list via a modal with four tabs:

| Tab | Description |
|-----|-------------|
| **Saved List** | Pick from team's saved target lists |
| **Previous Run** | Browse this flow's past batches, re-use that target list |
| **Manual Entry** | Paste/type server names (one per line) |
| **PSA/RMM Import** | Placeholder — "Coming soon" |

After choosing targets, the engineer sees a preview: "Will create N sessions for: RDS-01, RDS-02..."

On confirm, N sessions are created with a shared `batch_id` and status `pending`.
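
As a rough illustration of the Manual Entry tab and the preview text, the sketch below parses pasted input into target entries and formats the confirmation message. Both function names are hypothetical, and dropping case-insensitive duplicates and truncating the preview after five labels are assumptions rather than stated requirements.

```python
def parse_manual_targets(text):
    """Turn pasted text (one server name per line) into target entries."""
    targets, seen = [], set()
    for line in text.splitlines():
        label = line.strip()
        # Skip blank lines and, by assumption, case-insensitive duplicates.
        if not label or label.lower() in seen:
            continue
        seen.add(label.lower())
        targets.append({"label": label})
    return targets


def preview_message(targets, max_shown=5):
    """Format the confirmation preview shown before sessions are created."""
    labels = [target["label"] for target in targets]
    shown = ", ".join(labels[:max_shown])
    suffix = "..." if len(labels) > max_shown else ""
    return f"Will create {len(labels)} sessions for: {shown}{suffix}"


pasted = "RDS-01\n\n  RDS-02  \nrds-01\nCTX-01\n"
manual_targets = parse_manual_targets(pasted)
print(preview_message(manual_targets))
```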

---

## UI / UX

### Sidebar

```
All Flows [total]
Troubleshooting [count]
Projects [count]
Maintenance [count] ← new
```

Links to `/trees?type=maintenance`.

### TreeLibraryPage

- `typeFilter` expands to `'all' | 'troubleshooting' | 'procedural' | 'maintenance'`
- Maintenance flows show a distinct badge (wrench icon, amber accent color)

### Flow Editor

- New flow type selector includes "Maintenance"
- Uses the same `ProceduralEditorPage` — no new editor needed

### Maintenance Flow Detail Page (`/flows/:id/maintenance`)

New page shown when opening a maintenance flow (via `getTreeNavigatePath`). Sections:
- **Overview** — name, description, steps summary
- **Schedule panel** — set/edit/disable cron schedule, timezone, assigned target list
- **Batch Launch button** — opens target list modal
- **Run history** — past batches grouped by `batch_id`, status rollup (e.g., "6/8 complete")

### Sessions Page — Batch View

Sessions with a shared `batch_id` are collapsed into a single row:
- Flow name, launch date, target count, completion status
- Expand to see individual target sessions
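
The batch row and the "6/8 complete" rollup could be computed along these lines. This is a sketch only: `rollup_batches` is a hypothetical helper, sessions are represented as plain dicts, and `completed` is an assumed status value (this document only names `pending`).

```python
def rollup_batches(sessions, done_status="completed"):
    """Group sessions by batch_id and summarise completion per batch."""
    batches = {}
    for session in sessions:
        batch_id = session.get("batch_id")
        if batch_id is None:  # non-batch sessions keep their normal rows
            continue
        batch = batches.setdefault(
            batch_id,
            {"batch_id": batch_id, "targets": [], "total": 0, "complete": 0},
        )
        batch["total"] += 1
        batch["targets"].append(session.get("target_label"))
        if session.get("status") == done_status:
            batch["complete"] += 1
    for batch in batches.values():
        batch["summary"] = f"{batch['complete']}/{batch['total']} complete"
    return list(batches.values())  # insertion order: first-seen batch first


rows = rollup_batches(
    [
        {"batch_id": "b1", "target_label": "RDS-01", "status": "completed"},
        {"batch_id": "b1", "target_label": "RDS-02", "status": "pending"},
        {"batch_id": None, "target_label": None, "status": "completed"},
        {"batch_id": "b2", "target_label": "CTX-01", "status": "completed"},
    ]
)
for row in rows:
    print(row["batch_id"], row["summary"], row["targets"])
```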

### Target Lists Settings (`/account/target-lists`)

New page under Team settings. Engineers can:
- Create a named target list with target entries (label + optional notes)
- Edit / delete existing lists
- See last-used date per list

### Routing

`getTreeNavigatePath()` in `@/lib/routing` gains `'maintenance'` case → `/flows/:id/maintenance`.

Individual session execution from the detail page still uses `ProceduralNavigationPage`.

---

## Rollout Phases

| Phase | Scope |
|-------|-------|
| 1 — DB + API | Alembic migration, model changes, target_lists + schedules endpoints, batch session creation API |
| 2 — Core UI | Sidebar entry, type filter, flow badge, maintenance detail page, batch launch modal |
| 3 — Scheduler | APScheduler integration, auto-session creation, in-app notifications |
| 4 — Target Lists | Saved lists settings page under Team settings |

Each phase is independently shippable without breaking existing flows.

---

## Testing

- `test_maintenance_tree_type.py` — CRUD, check constraint, filter by type
- `test_target_lists.py` — create/list/update/delete, team scoping
- `test_maintenance_schedules.py` — create/update/disable, `next_run_at` calculation, schedule fires + creates correct batch sessions
- `test_batch_sessions.py` — correct session count, shared `batch_id`, `target_label` values, re-use previous session targets
- Frontend: `npm run build` after each phase

---

## Future

- PSA/RMM import (ConnectWise, Kaseya) for target lists — Phase 4 roadmap item
- Patch window constraints (maintenance flows only run within defined windows)
- Per-target session results dashboard