resolutionflow/.ai/PROJECT_CONTEXT.md
Michael Chihlas f1be3abcc5
feat: self-serve signup Phase 2 (frontend cutover) (#162)
Co-authored-by: Michael Chihlas <michael@resolutionflow.com>
Co-committed-by: Michael Chihlas <michael@resolutionflow.com>
2026-05-07 18:42:20 +00:00


PROJECT_CONTEXT.md — ResolutionFlow

SaaS troubleshooting platform for MSPs. Stable architectural truth. Updated only when the repo's shape changes.


Product & naming

Canonical product name is ResolutionFlow. patherly is the legacy internal name — still present in DB name (patherly on Railway, resolutionflow locally), some Railway service names, and historical paths. Treat as aliases, not canonical. Docker containers are resolutionflow_*.

User terminology: "Flows" (not Trees), "Projects" (not Procedures), "Solutions Library" (not Step Library). Maintenance flows hidden from pilot UI (backend retains them). DB column tree_type values unchanged.


SaaS shape

Multi-tenant by account. Primary role hierarchy: super_admin > owner > engineer > viewer — driven by is_super_admin + account_role. Never role=='admin' — use is_super_admin. Separate team-scoped admin gate exists orthogonally to the role hierarchy: is_team_admin=True + valid team_id, enforced by require_team_admin. Backend deps in app/api/deps.py: get_current_active_user, require_engineer_or_admin, require_admin, require_account_owner, require_team_admin. Frontend: usePermissions() hook. Central logic in backend/app/core/permissions.py + frontend/src/hooks/usePermissions.ts.
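
The hierarchy above can be read as a rank comparison with a super-admin override. This is a minimal sketch, not the code in backend/app/core/permissions.py: the `User` dataclass, the `ROLE_RANK` table, and `has_role_at_least` are hypothetical names, and only `is_super_admin` and `account_role` come from the text above.

```python
from dataclasses import dataclass

# Hypothetical rank table for the account-level roles named above.
ROLE_RANK = {"viewer": 0, "engineer": 1, "owner": 2}


@dataclass
class User:
    # Illustrative stand-in for the real user model (assumption).
    account_role: str
    is_super_admin: bool = False


def has_role_at_least(user: User, minimum: str) -> bool:
    """Return True if the user meets or exceeds the minimum account role."""
    if user.is_super_admin:  # super_admin outranks every account role
        return True
    # Unknown roles (for example a stray "admin") rank below viewer and fail closed.
    return ROLE_RANK.get(user.account_role, -1) >= ROLE_RANK[minimum]
```

Note that the team-scoped gate (is_team_admin + team_id) is deliberately absent here, since it is orthogonal to this ranking.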


Status

Go-to-Market Validation (pre-PMF). Backend feature-complete (55+ endpoints, 100+ tests). Phase 0.5 FlowPilot telemetry baseline accruing. See CURRENT-STATE.md for live status, 03-DEVELOPMENT-ROADMAP.md for phases.


Tech stack

  • Backend: Python 3.12 + FastAPI, SQLAlchemy 2.0 async (asyncpg), Alembic, Pydantic v2, JWT (python-jose + bcrypt, JTI refresh rotation), APScheduler (in-process with FastAPI lifespan).
  • Frontend: React 19 + Vite + TypeScript, Tailwind v4 (CSS-only config in index.css), Zustand (immer + zundo), React Router v7, Axios (token-refresh interceptor), Lucide.
  • DB: PostgreSQL 16 (RLS enabled Phase 4, pgvector).

Project structure

resolutionflow/
├── backend/
│   ├── app/
│   │   ├── main.py                     # FastAPI entry
│   │   ├── api/endpoints/              # 50+ routers registered in api/router.py — auth/admin, trees/sessions, AI/chat, scripts, integrations, uploads, accounts, FlowPilot, etc.
│   │   ├── api/deps.py                 # auth deps (incl. require_team_admin)
│   │   ├── api/router.py               # registration
│   │   ├── core/                       # config, database, permissions, security, audit, rate_limit
│   │   ├── models/                     # SQLAlchemy (incl. FlowProposal)
│   │   ├── schemas/                    # Pydantic
│   │   ├── services/psa/               # PSA provider pattern (base, connectwise/, autotask/, halopsa/, cache, encryption, exceptions, registry, ticket_context, types)
│   │   ├── services/knowledge_flywheel.py + _scheduler.py
│   │   └── services/knowledge_gap_service.py
│   ├── alembic/versions/               # 001-070 sequential, then hex hash
│   ├── scripts/                        # seed_data, seed_trees, seed_test_users
│   └── tests/                          # pytest integration
├── frontend/
│   ├── src/
│   │   ├── api/                        # Axios client + endpoint modules
│   │   ├── components/                 # common, layout, dashboard, tree-editor, session, procedural, procedural-editor, library, step-library, ui, flowpilot
│   │   ├── hooks/                      # usePermissions, useSessionTimer, useKeyboardShortcuts
│   │   ├── pages/
│   │   ├── store/                      # Zustand (auth, treeEditor, proceduralEditor, userPreferences, scriptGeneratorStore)
│   │   └── types/
│   └── (Tailwind v4 CSS-only config in src/index.css)
├── docs/plans/archive/                 # pre-March 2026 plans
├── docs/connectwise/                   # CW API reference + best-practices guides
├── docs/LESSONS-ARCHIVE.md             # archived lessons (fixes in code)
├── .ai/                                # dual-agent handoff system (see .ai/README.md)
├── CLAUDE.md · AGENTS.md · CURRENT-STATE.md · DESIGN-SYSTEM.md · DEV-ENV.md

Dev commands

Full setup in DEV-ENV.md (host-agnostic, with homelab Proxmox reference topology). Day-to-day:

docker compose -f docker-compose.dev.yml up -d                      # start stack
cd backend && source venv/bin/activate && uvicorn app.main:app --reload
cd frontend && npm run dev
pytest --override-ini="addopts="                                    # tests (first time: CREATE DATABASE resolutionflow_test)
cd backend && alembic upgrade head                                  # migrate
cd backend && alembic revision -m "desc"                            # manual migration (preferred per Lesson 77)
cd backend && alembic revision --autogenerate -m "desc"             # picks up drift; review carefully
cd frontend && npm run build                                        # stricter than tsc --noEmit — final check
cd frontend && npx tsc -b                                           # TS-only check when dist/ has EACCES
docker exec -it resolutionflow_postgres psql -U postgres -d resolutionflow
python -m scripts.seed_trees                                        # seed (from backend/)

Never pass --rev-id to alembic — let it generate the hex hash.

On hosts without native python/node/npm (e.g. the code-server LXC), run commands inside the already-running containers instead:

docker exec resolutionflow_backend pytest --override-ini="addopts="
docker exec resolutionflow_backend alembic upgrade head
docker exec -w /app resolutionflow_frontend npm run build
docker exec -w /app resolutionflow_frontend npx tsc -b

URLs & test users

URLs: Frontend http://localhost:5173, backend http://localhost:8000, API docs http://localhost:8000/api/docs.

Test users (all password TestPass123!): admin@resolutionflow.example.com (super_admin), teamadmin@resolutionflow.example.com, engineer@resolutionflow.example.com, pro@resolutionflow.example.com.


CI

Gitea (gitea.resolutionflow.com/chihlasm/resolutionflow/actions). gh CLI works for issues/PRs on the GitHub mirror, but not CI runs.


Deployment (Railway)

  • Prod: resolutionflow.com (frontend), api.resolutionflow.com (backend).
  • Auto-deploy: Gitea push → GitHub mirror → Railway follows GitHub main.
  • PR environments auto-created; need manual domain generation + VITE_API_URL with https:// prefix.
  • ALLOW_RAILWAY_ORIGINS=true for *.up.railway.app CORS.
  • Shared Variables (Railway project-level) auto-propagate to PR envs — use for secrets like ANTHROPIC_API_KEY.
  • Super admin utility: backend/make_superadmin_simple.py list|<email>.

ConnectWise PSA

Reference: docs/connectwise/ — start with CONNECTWISE-API-REFERENCE.md, then the best-practices/ guides. Extracted OpenAPI spec in connectwise-psa-resolutionflow-reference.json (670 endpoints, v2025.16); full spec in connectwise-psa-openapi-full.json.

  • Auth: API Key (Base64 companyId+publicKey:privateKey) + clientId header every request. clientId is server-side (CW_CLIENT_ID in config.py) — identifies ResolutionFlow, not per-tenant. Per-connection: company_id, public_key, private_key, server_url.
  • Architecture: services/psa/ provider pattern — PSAProvider base, ConnectWiseProvider impl, PsaProviderRegistry for multi-PSA dispatch. Credentials encrypted at rest via services/psa/encryption.py (Fernet). Per-team credentials, never per-user. Endpoints in api/endpoints/integrations.py. In-memory TTL cache in services/psa/cache.py.
  • Integration flows: session docs → ticket notes (POST /service/tickets/{id}/notes, markdown supported); ticket context → FlowPilot; callbacks via /system/callbacks with HMAC verification.
  • API rules: pin version via Accept header application/vnd.connectwise.com+json; version=2025.16. Paginate ≤1000/page. Dynamic base URL via /login/companyinfo/{companyId}. Request minimal permissions (MY, not ALL).
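
The auth and version-pinning rules above can be sketched as a single header builder. This is an illustration under assumptions, not the ConnectWiseProvider code: `build_cw_headers` is a hypothetical helper, and the `Basic` scheme prefix on the Authorization header is an assumption based on the Base64 credential format described above.

```python
import base64


def build_cw_headers(
    company_id: str,
    public_key: str,
    private_key: str,
    client_id: str,
    version: str = "2025.16",
) -> dict[str, str]:
    """Build request headers for the ConnectWise auth scheme described above."""
    # Credential string is companyId+publicKey:privateKey, Base64-encoded.
    raw = f"{company_id}+{public_key}:{private_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {
        "Authorization": f"Basic {token}",  # scheme prefix is an assumption
        "clientId": client_id,  # server-side value, identifies the integration
        "Accept": f"application/vnd.connectwise.com+json; version={version}",
    }
```

Keeping the version in one default argument means a version bump touches a single place rather than every call site.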

Coding standards

  • Python: type hints everywhere, async/await for DB, Pydantic v2, DateTime(timezone=True) always.
  • TypeScript: interfaces for all data, const over let, functional components + hooks, shared logic in custom hooks.
  • Git: feature branch before committing (git checkout -b feat/feature-name). Commit format: type: description (feat/fix/refactor/docs/test/chore). Large features: commit per phase with npm run build validation. Push to Gitea — auto-mirrors to GitHub (.gitea/workflows/mirror-to-github.yml); never push GitHub directly. (Agent-specific Co-Authored-By trailers live in CLAUDE.md / AGENTS.md.)

After shipping: update CURRENT-STATE.md + 03-DEVELOPMENT-ROADMAP.md, gh issue close #N for resolved issues, add lessons only for non-obvious traps (otherwise let the code speak).


Common tasks

  • New endpoint: endpoints/ → router.py → schemas/ → tests → frontend API client.
  • New page: pages/ → route in router.tsx → nav in AppLayout.tsx.
  • New public route: top-level in router.tsx alongside /login, not inside ProtectedRoute.
  • New frontend API module: types in types/ → export from types/index.ts → client in api/ → export from api/index.ts.
  • Schema change: update model → alembic revision -m "desc" → review → alembic upgrade head.
  • New VITE_* env var: add as ARG + ENV in frontend/Dockerfile for Railway builds (Lesson 60 — Railway env vars are runtime-only, Vite bakes at build time).
  • Account sub-page: add route in router.tsx under account children + add link card in AccountSettingsPage.tsx. AccountLayout has NO sidebar nav.

Design system

Source of truth: DESIGN-SYSTEM.md. Read before any visual change.

  • Flat high-contrast dark theme, Sentry/PostHog-inspired. No glass, backdrop blur, ambient orbs, gradient surfaces.
  • Accent electric blue (#60a5fa dark / #2563eb light) — ≤5% of UI, interactive elements only. Warning amber (#fbbf24), info cyan (#67e8f9), success green (#34d399), danger red (#f87171). Each with -dim at 10% opacity.
  • Backgrounds: bg-sidebar (#0e1016) → bg-page (#16181f) → bg-card (#1e2028) → bg-elevated (#2a2d38). Borders border-default / border-hover.
  • Text: text-heading → text-primary → text-muted-foreground → text-muted.
  • Fonts: IBM Plex Sans (body), Bricolage Grotesque (heading, 700 weight for logo), JetBrains Mono (code).
  • Logo: 30px gradient square (ember orange) + "ResolutionFlow" in Bricolage Grotesque. Assets in brand-assets/, frontend/src/assets/brand/, frontend/public/icons/.
  • Mockups: docs/mockups/ (HTML).
  • Deprecated — do not use: glass-card, glass-stat, bg-gradient-brand, backdrop-filter: blur(), ambient orbs, purple gradients, ember orange as accent, cyan as accent (cyan is info only).

Frontend patterns

  • Component basics: cn() from @/lib/utils, Lucide icons, Modal.tsx for modals (mobile-responsive items-end sm:items-center + max-w-full sm:max-w-lg).
  • Types: Create in types/, export from types/index.ts, import type { T } from '@/types'.
  • Routing: getTreeNavigatePath() / getTreeEditorPath() from @/lib/routing. Tree editor is /trees/new. All dashboard session clicks → /pilot/:id regardless of session_type.
  • Lazy routes: lazyWithRetry from @/lib/lazyWithRetry.ts, not React.lazy (auto-reload on stale chunks).
  • Public pages: raw fetch() with full URL, NOT apiClient (which requires auth tokens).
  • Toast: toast.warning() not toast.warn(). Import from @/lib/toast — methods: success, error, warning, info.
  • Assistant chat: uses local React useState, not Zustand. All three send paths (handleSend, sendPrefill, handleResumeNew) must call setShowTaskLane(true) when response has actions/questions.
  • Chat backend wiring: aiSessionsApi.sendChatMessage → /ai-sessions/{id}/chat → unified_chat_service.py. NOT assistant_chat_service.py (removed except retention settings).
  • FlowPilot: Actions live in page header (Resolve/Escalate/Share Update + overflow). useBlocker for active-session nav guard. "Pause & Leave" auto-pauses.
  • AI markers: [QUESTIONS], [ACTIONS], [FORK], [DELTA]...[/DELTA] (editor), [TREE_UPDATE] (troubleshooting builder), [STEPS_UPDATE] (procedural builder), [METADATA]. Parsed in unified_chat_service.py; conversation history stores stripped display_content. If markers disappear: check system-prompt final reminder + per-user-message [SYSTEM: ...] injection in _call_anthropic_cached().
  • Image uploads: paste/attach → Railway S3 via uploadsApi.upload() → resized by storage_service.resize_image_for_vision() (Pillow, 1568px max, PNG→JPEG) → base64 → Claude multimodal blocks. Max 3/msg. Images NOT stored in history.
  • Async select-load-apply: guard with a ref (pattern in AssistantChatPage currentChatRef). Update synchronously on every selection change; after every await, bail out if ref.current !== thisId.
  • Editor-Embedded Flow Assist: EditorAIPanel (320px side panel) + useEditorAI. Ghost nodes via _suggestion: true. Route actions via settings.get_model_for_action().
  • Script Builder: /script-builder, chat-style. Backend ScriptBuilderSession, script_builder_service.py, endpoints /scripts/builder/. FlowPilot handoff via action_type: "open_script_builder" + sessionStorage.
  • Intake form field schema: variable_name + field_type (NOT name / type).
  • Node field priority (copilot, summaries): title → question → description → content → label.
  • Procedural sessions auto-start on page load (no intake/Start screen). Troubleshooting flows DO have a start screen.
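
The AI markers bullet above says history stores a stripped display_content. A minimal sketch of that idea for the paired [DELTA]...[/DELTA] marker only, assuming the payload between the tags is opaque text; `split_delta` is a hypothetical name and this is not the parser in unified_chat_service.py.

```python
import re

# Matches a paired [DELTA]...[/DELTA] block, including newlines inside it.
DELTA_BLOCK = re.compile(r"\[DELTA\](.*?)\[/DELTA\]", re.DOTALL)


def split_delta(raw: str) -> tuple[str, list[str]]:
    """Return (display_content, delta_payloads) for a raw model reply."""
    payloads = [m.strip() for m in DELTA_BLOCK.findall(raw)]
    display = DELTA_BLOCK.sub("", raw)
    # Collapse the run of blank lines left behind where blocks were removed.
    display = re.sub(r"\n{3,}", "\n\n", display).strip()
    return display, payloads
```

The unpaired markers ([QUESTIONS], [ACTIONS], and so on) are left out because their payload boundaries are not described here.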

Critical lessons

Lessons 1-40 archived to docs/LESSONS-ARCHIVE.md — fixes baked into the codebase. Grep the archive when an error message or symptom is unfamiliar, or after two failed attempts at resolving an issue. Don't pre-load for routine work.

Backend / data

  • APScheduler interval jobs always max_instances=1 — without it, overlapping runs reprocess records (TOCTOU).
  • get_db rolls back on exception — never remove the await session.rollback(), or one failed request poisons the connection with InFailedSQLTransaction cascading.
  • Startup routines on tenant-isolated tables must use _admin_session_factory(), not get_db(). Phase 4 RLS has no app.current_account_id set at startup. get_service_account_id is safe (reads cached app.state).
  • Backfill migrations adding account_id: grep ALL ModelClass( sites in service code to verify account_id= is passed. SQLAlchemy accepts None silently — Phase 4 RLS WITH CHECK surfaces the problem at runtime as InsufficientPrivilegeError: new row violates row-level security policy.
  • tree_shares.account_id = tree.account_id, never current_user.account_id. A super_admin sharing another tenant's tree must produce the share in the tree owner's tenant, or it becomes invisible post-RLS.
  • Global tables (no account_id, never in RLS migrations): script_categories, platform_steps, template_trees, plan_feature_defaults, accounts. Scan at class level — one .py file can hold multiple classes with different columns (e.g. ScriptCategory vs ScriptTemplate).
  • ai_sessions.status is VARCHAR(30), which fits requesting_escalation (21 chars). Migration f0aad74ea51b widened it from 20.
  • PostgreSQL func.sum(case(...)) returns Decimal via asyncpg — cast to int() before Pydantic dict[str, Any].
  • Enhancement / branch_addition proposals need modified_flow_data via "Edit & Publish" — backend 400 on direct approve. Only new_flow supports direct approve.
  • Adding email types: static async method on EmailService in core/email.py. Fire-and-forget from endpoints (log errors, don't fail the request).
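
The Decimal lesson above can be sketched as a small normalization step applied to an aggregate row before it is handed to a Pydantic dict[str, Any] field. `normalize_counts` is a hypothetical helper name, and the row is assumed to be a plain dict built from the query result.

```python
from decimal import Decimal
from typing import Any


def normalize_counts(row: dict[str, Any]) -> dict[str, Any]:
    """Cast Decimal aggregates (e.g. from func.sum(case(...))) to int."""
    return {
        key: int(value) if isinstance(value, Decimal) else value
        for key, value in row.items()
    }
```

Non-Decimal values pass through untouched, so the helper is safe to apply to mixed rows.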

AI / FlowPilot

  • Anthropic SDK max_retries=1 — default of 2 can take 3× the timeout.
  • Model tier routing: settings.get_model_for_action(action_type). Always alias form (claude-sonnet-4-6).
  • FlowPilot must ask GUI-vs-script before suggesting either when both are viable — see FLOWPILOT_SYSTEM_PROMPT in flowpilot_engine.py.
  • Telemetry events to grep: anthropic.cache (prompt-cache hit/create), mcp.turn (per-turn MCP availability), mcp.fallback (MCP silent-retry fired).
  • Don't put literal payloads in system prompts. Bit us twice in one day: a worked [QUESTIONS] example with literal "Outlook + jsmith" content, and a full DNS troubleshooting tree, both caused Claude to recite that content on unrelated tickets — the symptom looked like task-lane state leaking across chats. The fix is structural: every output example in a system prompt uses <placeholder> syntax ({"text": "<one short, specific question>"}), never literal field values. Real-looking format examples live in few-shot messages (separate file, separate code path), not system prompts. Guardrail: tests/test_prompt_anti_parrot.py scans every *_PROMPT/*_SCHEMA/*_PROTOCOL/*_FORMAT constant in app/services/ and app/core/; CI fails when a marker block contains a literal JSON value or when a known leaked token (jsmith, DC01, ADSync, Dnscache, etc.) appears anywhere in a prompt.
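
The anti-parrot guardrail above can be sketched as a scan over one prompt string. This is a simplified illustration, not tests/test_prompt_anti_parrot.py: `find_violations` is a hypothetical name, the token list holds only the four tokens named above, and the check looks at every JSON-style string value in the prompt rather than only those inside marker blocks.

```python
import re

# Leaked tokens named in the lesson above; the real list may be longer.
LEAKED_TOKENS = ("jsmith", "DC01", "ADSync", "Dnscache")

# A JSON-style string value after a colon, e.g. "text": "value".
JSON_STRING_VALUE = re.compile(r':\s*"([^"]*)"')


def find_violations(prompt: str) -> list[str]:
    """Return human-readable problems found in one prompt constant."""
    problems = []
    lowered = prompt.lower()
    for token in LEAKED_TOKENS:
        if token.lower() in lowered:
            problems.append(f"leaked token: {token}")
    for value in JSON_STRING_VALUE.findall(prompt):
        # Placeholder values look like "<one short, specific question>".
        if not (value.startswith("<") and value.endswith(">")):
            problems.append(f"literal JSON value: {value!r}")
    return problems
```

A test would call this for each prompt constant and fail when the returned list is non-empty.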

Frontend / UI

  • Flex height chain: every ancestor from app-shell grid to React Flow canvas needs flex + flex-1 + min-h-0 or h-full. Missing flex collapses to 0. Same rule for FlowPilot action bar and any tall scroller.
  • React Flow CSS in Tailwind v4: import in index.css, not component JS. Override dark theme via --xy-* CSS vars.
  • text-secondary renders invisible on dark — Tailwind v4 maps it to --color-secondary (a surface color). Use text-muted-foreground for readable secondary text. Avoid text-muted for body — labels only.
  • bg-accent is electric blue — never for code/kbd. Use bg-white/[0.12] border border-white/[0.06] for inline code, bg-white/[0.08] for kbd. Accent reserved for interactive elements.
  • landing.css uses self-contained --lp-* vars — never var(--color-*) theme tokens (they resolve incorrectly outside the app shell).
  • Never transition: all — list properties explicitly, or layout props animate and jank.
  • Date range filter end dates: setHours(23, 59, 59, 999) before sending, or the day's items are excluded. For string-based date inputs, append T23:59:59.999Z.
  • TopBar search: full bar hidden sm:block, icon button sm:hidden — both open CommandPalette.
  • Hover pop-out cards: scrim pointer-events-none, expanded card has its own click handler at z-50, dismiss via onMouseLeave on wrapper. Never put handlers on the scrim.
  • tsc -b in Dockerfile is stricter than tsc --noEmit — enforces noUnusedLocals / noUnusedParameters as hard errors. Check IDE yellow squiggles before pushing.
  • Dashboard prefill auto-submits via useEffect + prefillHandledRef guard — no double-enter.
  • Global Axios 5xx interceptor fires before component .catch() — fix optional-data endpoints at the source (return [] / {} on provider failure), not in the component.
  • Playwright strict mode: scope selectors to avoid sidebar/main ambiguity. Use getByRole('heading', { name }) or .animate-scale-in locators, not bare getByText().
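
The Axios 5xx lesson above asks for optional-data endpoints to degrade at the source. A minimal backend-side sketch, assuming the provider call is an async callable that raises on failure; `ProviderError` and `list_or_empty` are hypothetical names, not existing project code.

```python
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger(__name__)


class ProviderError(Exception):
    """Hypothetical error raised when an upstream provider call fails."""


async def list_or_empty(
    fetch: Callable[[], Awaitable[list[dict]]],
) -> list[dict]:
    """Return the provider's list, or [] if the provider fails."""
    try:
        return await fetch()
    except ProviderError:
        # Optional data: log and degrade instead of surfacing a 5xx,
        # so the global interceptor on the frontend never fires.
        logger.warning("optional provider data unavailable", exc_info=True)
        return []
```

Only the provider-specific error is caught, so genuine bugs in the endpoint still surface as errors.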

Env / infra

  • Node 20.19+ required (Vite 7). nvm use 20 or PATH="$HOME/.nvm/versions/node/v20.19.0/bin:$PATH".
  • Railway backend service is patherly, DB name railway. Public Postgres proxy: interchange.proxy.rlwy.net:45797.
  • Railway Object Storage bucket resolutionflow-uploads. Env vars STORAGE_*. boto3 in storage_service.py. Dockerfile needs Pillow + libjpeg-dev / zlib1g-dev.
  • PostHog: PostHogProvider + posthog.init() in main.tsx. Helpers in lib/analytics.ts. Env: VITE_PUBLIC_POSTHOG_KEY, VITE_PUBLIC_POSTHOG_HOST. identifyUser() in authStore.fetchUser(), resetAnalytics() on logout.
  • bun PATH on devserver01: BUN_INSTALL="$HOME/.bun", PATH="$BUN_INSTALL/bin:$PATH". Playwright Chromium needs libatk1.0-0 libatk-bridge2.0-0 libcups2 libxkbcommon0 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2.
  • Full-stack change: trace schema → endpoint → API client → hook → store → UI. Don't assume one end proves the other.
  • Dev env — see DEV-ENV.md for current topology, REPO_ROOT requirement when compose runs inside a container, Vite allowedHosts, linuxserver.io group_add + custom-cont-init.d workaround, docker compose up no-op-on-unchanged-hash gotcha.

Quick reference

| What | Where |
| --- | --- |
| Detailed status | CURRENT-STATE.md |
| Roadmap | 03-DEVELOPMENT-ROADMAP.md |
| Design system | DESIGN-SYSTEM.md |
| Dev env | DEV-ENV.md |
| Archived lessons | docs/LESSONS-ARCHIVE.md |
| ConnectWise API | docs/connectwise/ |
| GitHub issues | gh issue list --state open |
| Local API docs | http://localhost:8000/api/docs |
| Handoff system | .ai/README.md |