resolutionflow/docs/plans/2026-02-17-maintenance-flows-design.md
chihlasm 3b506059f6 docs: add maintenance flows design document
Covers tree_type expansion, target_lists + maintenance_schedules data
model, APScheduler-based auto-session creation, batch launch modal,
and phased rollout plan.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-17 16:36:55 -05:00


Maintenance Flows — Design Document

Date: 2026-02-17
Status: Approved
Phase: Design (pre-implementation)


Overview

Add maintenance as a first-class flow type in ResolutionFlow, alongside troubleshooting and procedural. Maintenance flows are designed for MSP scheduled/repeatable infrastructure tasks (e.g., patching Citrix servers, updating FSLogix, updating RDS software). They share the procedural execution engine but add scheduling, multi-target batch launching, and saved target lists.


Goals

  • Visual separation of maintenance flows from troubleshooting and procedural ("Projects") flows
  • Batch launch: one flow run against N servers/targets simultaneously, each tracked as an independent session
  • Saved target lists per team, with ad-hoc entry and future PSA/RMM import
  • Scheduled auto-session creation with in-app notifications
  • Re-use target lists from previous batch runs

Data Model

tree_type expansion

Migration: Drop and recreate the ck_trees_tree_type check constraint to allow 'troubleshooting' | 'procedural' | 'maintenance'.
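
As a rough sketch of what that migration might execute, the snippet below builds the two SQL statements that drop and recreate the constraint. The constraint name and the trees table come from this document; the column name tree_type is assumed from the section title, and the helper names are hypothetical. An actual Alembic migration would more likely issue the same change through op.drop_constraint and op.create_check_constraint, or pass these strings to op.execute.

```python
# Sketch only: builds the SQL a migration could run.
# Assumption: the constrained column is named `tree_type`.
TREE_TYPES = ("troubleshooting", "procedural", "maintenance")
CONSTRAINT = "ck_trees_tree_type"


def upgrade_statements(types=TREE_TYPES):
    """Return DROP + ADD statements that rebuild the check constraint."""
    allowed = ", ".join(f"'{t}'" for t in types)
    return [
        f"ALTER TABLE trees DROP CONSTRAINT {CONSTRAINT}",
        f"ALTER TABLE trees ADD CONSTRAINT {CONSTRAINT} "
        f"CHECK (tree_type IN ({allowed}))",
    ]


def downgrade_statements():
    """Restore the original two-value constraint."""
    return upgrade_statements(("troubleshooting", "procedural"))


for stmt in upgrade_statements():
    print(stmt)
```

One thing to keep in mind with this shape: the downgrade would fail in PostgreSQL if any rows with tree_type 'maintenance' already exist, so a rollback plan would need to handle those rows first.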

Maintenance flows reuse tree_structure (step-by-step like procedural) and intake_form (for capturing target-specific context at session start, e.g., patch version).


target_lists table (new)

id          UUID PRIMARY KEY DEFAULT gen_random_uuid()
team_id     UUID NOT NULL REFERENCES teams(id) ON DELETE CASCADE
created_by  UUID REFERENCES users(id) ON DELETE SET NULL
name        VARCHAR(255) NOT NULL
description TEXT
targets     JSONB NOT NULL  -- [{ "label": "RDS-01", "notes": "..." }, ...]
created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
  • Scoped to team; any engineer can create/edit/delete their team's lists
  • Each target entry: label (required, display name / hostname) + notes (optional, IP, role, etc.)
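
Because targets is free-form JSONB, the API layer would probably need to validate entries before saving them. The following is a minimal sketch of such a check, assuming the rules listed above (label required, notes optional). The function name normalize_targets is hypothetical, and the 255-character cap is an assumption borrowed from the target_label column width on sessions.

```python
MAX_LABEL_LENGTH = 255  # assumption: matches sessions.target_label VARCHAR(255)


def normalize_targets(raw):
    """Validate and clean a list of target entries for the `targets` column.

    Each entry needs a non-empty string `label`; `notes` is optional.
    Raises ValueError on the first invalid entry.
    """
    if not isinstance(raw, list):
        raise ValueError("targets must be a list")
    cleaned = []
    for i, entry in enumerate(raw):
        if not isinstance(entry, dict):
            raise ValueError(f"targets[{i}] must be an object")
        label = entry.get("label")
        if not isinstance(label, str) or not label.strip():
            raise ValueError(f"targets[{i}].label is required")
        label = label.strip()
        if len(label) > MAX_LABEL_LENGTH:
            raise ValueError(f"targets[{i}].label is too long")
        item = {"label": label}
        notes = entry.get("notes")
        if notes is not None:
            if not isinstance(notes, str):
                raise ValueError(f"targets[{i}].notes must be a string")
            item["notes"] = notes
        cleaned.append(item)
    return cleaned
```

Returning a cleaned copy rather than mutating the input keeps the stored JSON limited to the two documented keys, even if a client sends extra fields.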

maintenance_schedules table (new)

id               UUID PRIMARY KEY DEFAULT gen_random_uuid()
tree_id          UUID NOT NULL REFERENCES trees(id) ON DELETE CASCADE
created_by       UUID REFERENCES users(id) ON DELETE SET NULL
cron_expression  VARCHAR(100) NOT NULL   -- e.g. "0 9 15 * *"
timezone         VARCHAR(100) NOT NULL DEFAULT 'UTC'
target_list_id   UUID REFERENCES target_lists(id) ON DELETE SET NULL
is_active        BOOLEAN NOT NULL DEFAULT true
next_run_at      TIMESTAMPTZ NOT NULL
last_run_at      TIMESTAMPTZ
created_at       TIMESTAMPTZ NOT NULL DEFAULT now()
updated_at       TIMESTAMPTZ NOT NULL DEFAULT now()
  • One active schedule per maintenance flow (enforced at API level)
  • target_list_id is optional — if null, schedule auto-creates sessions without targets (engineer specifies targets on the pending sessions)
  • next_run_at is computed from cron_expression + timezone at creation/update
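
To illustrate how next_run_at could be derived from cron_expression and timezone, here is a deliberately simplified sketch using only the standard library. It is not the planned implementation: it supports only "*", integers, and comma lists in each field, requires day-of-month and day-of-week to both match, and searches minute by minute. A real implementation would more likely delegate to the cron handling of the scheduling library. The function name compute_next_run_at is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def _parse_field(field):
    """Return None for '*', else the set of integers in a comma list."""
    if field == "*":
        return None
    return {int(part) for part in field.split(",")}


def compute_next_run_at(cron_expression, tz_name, after):
    """Next matching minute strictly after `after` (tz-aware), in UTC.

    Simplified: no ranges, steps, or names; all five fields must match.
    """
    minute, hour, dom, month, dow = map(_parse_field, cron_expression.split())
    tz = ZoneInfo(tz_name)
    # Step in UTC and convert to local time, so every candidate is a real
    # local time even across DST transitions.
    candidate = after.astimezone(timezone.utc).replace(second=0, microsecond=0)
    for _ in range(366 * 24 * 60):  # search at most one year ahead
        candidate += timedelta(minutes=1)
        local = candidate.astimezone(tz)
        cron_dow = (local.weekday() + 1) % 7  # cron convention: 0 = Sunday
        fields = (
            (minute, local.minute),
            (hour, local.hour),
            (dom, local.day),
            (month, local.month),
            (dow, cron_dow),
        )
        if all(allowed is None or value in allowed for allowed, value in fields):
            return candidate
    raise ValueError(f"no run within a year for {cron_expression!r}")
```

For the example expression "0 9 15 * *" with timezone "UTC" and a starting point of 2026-02-17, this returns 2026-03-15 09:00 UTC. Storing the result in UTC fits the TIMESTAMPTZ column while the schedule itself stays expressed in local wall-clock time.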

Sessions — batch tracking fields (new columns)

batch_id      UUID        -- all sessions from one batch launch share this value
target_label  VARCHAR(255) -- e.g. "RDS-01"
  • batch_id is generated at batch launch time (not per-session)
  • target_label is the label from the target list entry or ad-hoc input

Scheduling Engine

APScheduler runs in-process with the FastAPI backend (async scheduler).

On startup:

  1. Load all is_active=true maintenance schedules
  2. Register each as an APScheduler job using its cron_expression + timezone

When a schedule fires:

  1. Resolve target list (target_list_id → targets, or empty list if null)
  2. Generate a new batch_id
  3. Create one Session per target with batch_id, target_label, status pending
  4. Update last_run_at, compute and update next_run_at
  5. Create in-app notification: "Maintenance run ready: [Flow Name] — N sessions created"
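
The firing steps above can be sketched in plain Python as follows. This is an in-memory illustration under stated assumptions, not the real job: the schedule is a plain dict, PendingSession is a hypothetical stand-in for the Session model, compute_next is a hypothetical callable that returns the next run time, and the notification is returned as a payload rather than rendered text. With an empty target list this sketch creates no sessions; how that relates to the untargeted pending sessions described for a null target_list_id is left open here.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PendingSession:  # hypothetical stand-in for the real Session model
    tree_id: str
    batch_id: uuid.UUID
    target_label: str
    status: str = "pending"


def fire_schedule(schedule, targets, flow_name, compute_next, now=None):
    """Run one scheduled firing and return (sessions, notification payload).

    `targets` is the already-resolved target list (step 1); it is empty
    when the schedule has no target_list_id.
    """
    now = now or datetime.now(timezone.utc)
    batch_id = uuid.uuid4()  # step 2: one id shared by the whole run
    sessions = [  # step 3: one pending session per target
        PendingSession(schedule["tree_id"], batch_id, t["label"])
        for t in targets
    ]
    schedule["last_run_at"] = now  # step 4
    schedule["next_run_at"] = compute_next(schedule, now)
    notification = {  # step 5: data for the in-app notification
        "flow_name": flow_name,
        "session_count": len(sessions),
        "batch_id": batch_id,
    }
    return sessions, notification
```

Passing now and compute_next in as arguments keeps the function deterministic, which should make the "schedule fires + creates correct batch sessions" test case easier to write.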

Schedule changes (create/update/disable) are applied to APScheduler immediately via the API.


Batch Launch (Ad-hoc)

Triggered from the maintenance flow detail page. The engineer picks a target list in a modal with four tabs:

Tab             Description
Saved List      Pick from team's saved target lists
Previous Run    Browse this flow's past batches, re-use that target list
Manual Entry    Paste/type server names (one per line)
PSA/RMM Import  Placeholder: "Coming soon"

After choosing targets, the engineer sees a preview: "Will create N sessions for: RDS-01, RDS-02..."

On confirm, N sessions are created with a shared batch_id and status pending.
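
As an illustration of the Manual Entry path, the sketch below turns pasted text into target entries and builds the preview string. Both function names are hypothetical. Dropping duplicates case-insensitively and truncating the preview after five labels are assumptions, not requirements stated in this design.

```python
def parse_manual_entry(text):
    """Turn pasted text (one server name per line) into target entries.

    Blank lines are skipped and duplicates dropped, keeping first-seen order.
    """
    seen = set()
    targets = []
    for line in text.splitlines():
        label = line.strip()
        key = label.lower()  # assumption: hostnames compare case-insensitively
        if label and key not in seen:
            seen.add(key)
            targets.append({"label": label})
    return targets


def preview_text(targets, max_shown=5):
    """Build the confirmation preview shown before sessions are created."""
    labels = [t["label"] for t in targets]
    shown = ", ".join(labels[:max_shown])
    suffix = "..." if len(labels) > max_shown else ""
    return f"Will create {len(labels)} sessions for: {shown}{suffix}"


targets = parse_manual_entry("RDS-01\n\nRDS-02\n")
print(preview_text(targets))
```

The entries produced here have the same shape as saved target list entries (a label, with notes omitted), so the same batch creation call could serve every tab.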


UI / UX

Sidebar

All Flows          [total]
  Troubleshooting  [count]
  Projects         [count]
  Maintenance      [count]  ← new

Links to /trees?type=maintenance.

TreeLibraryPage

  • typeFilter expands to 'all' | 'troubleshooting' | 'procedural' | 'maintenance'
  • Maintenance flows show a distinct badge (wrench icon, amber accent color)

Flow Editor

  • New flow type selector includes "Maintenance"
  • Uses the same ProceduralEditorPage — no new editor needed

Maintenance Flow Detail Page (/flows/:id/maintenance)

New page shown when opening a maintenance flow (via getTreeNavigatePath). Sections:

  • Overview — name, description, steps summary
  • Schedule panel — set/edit/disable cron schedule, timezone, assigned target list
  • Batch Launch button — opens target list modal
  • Run history — past batches grouped by batch_id, status rollup (e.g., "6/8 complete")
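
The run history rollup could be computed along these lines. This is a sketch over plain dicts rather than the real Session model; the function name batch_rollups is hypothetical, and the "completed" status value is an assumption, since this document only names the pending status.

```python
from collections import defaultdict


def batch_rollups(sessions, done_status="completed"):
    """Group session dicts by batch_id and summarize completion per batch."""
    groups = defaultdict(list)
    for s in sessions:
        if s.get("batch_id") is not None:  # skip sessions outside any batch
            groups[s["batch_id"]].append(s)
    rollups = {}
    for batch_id, members in groups.items():
        done = sum(1 for s in members if s["status"] == done_status)
        rollups[batch_id] = f"{done}/{len(members)} complete"
    return rollups
```

In practice this aggregation might be pushed into a SQL GROUP BY on batch_id instead, which would avoid loading every session row just to render the summary.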

Sessions Page — Batch View

Sessions with a shared batch_id are collapsed into a single row:

  • Flow name, launch date, target count, completion status
  • Expand to see individual target sessions

Target Lists Settings (/account/target-lists)

New page under Team settings. Engineers can:

  • Create a named target list with target entries (label + optional notes)
  • Edit / delete existing lists
  • See last-used date per list

Routing

getTreeNavigatePath() in @/lib/routing gains 'maintenance' case → /flows/:id/maintenance.

Individual session execution from the detail page still uses ProceduralNavigationPage.


Rollout Phases

Phase             Scope
1 - DB + API      Alembic migration, model changes, target_lists + schedules endpoints, batch session creation API
2 - Core UI       Sidebar entry, type filter, flow badge, maintenance detail page, batch launch modal
3 - Scheduler     APScheduler integration, auto-session creation, in-app notifications
4 - Target Lists  Saved lists settings page under Team settings

Each phase is independently shippable without breaking existing flows.


Testing

  • test_maintenance_tree_type.py — CRUD, check constraint, filter by type
  • test_target_lists.py — create/list/update/delete, team scoping
  • test_maintenance_schedules.py — create/update/disable, next_run_at calculation, schedule fires + creates correct batch sessions
  • test_batch_sessions.py — correct session count, shared batch_id, target_label values, re-use previous session targets
  • Frontend: npm run build after each phase

Future

  • PSA/RMM import (ConnectWise, Kaseya) for target lists — Phase 4 roadmap item
  • Patch window constraints (maintenance flows only run within defined windows)
  • Per-target session results dashboard