The code-server LXC has bun and docker but no python/node/npm on PATH,
which left Codex unable to reproduce build/test commands. Adds a 6-line
block to PROJECT_CONTEXT.md showing the docker exec resolutionflow_{backend,frontend}
form, and updates the AGENTS.md "Tooling you do NOT have" line to point
Codex at it instead of suggesting toolchain installs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PROJECT_CONTEXT.md — ResolutionFlow
SaaS troubleshooting platform for MSPs. Stable architectural truth. Updated only when the repo's shape changes.
Product & naming
Canonical product name is ResolutionFlow. patherly is the legacy internal name — still present in DB name (patherly on Railway, resolutionflow locally), some Railway service names, and historical paths. Treat as aliases, not canonical. Docker containers are resolutionflow_*.
User terminology: "Flows" (not Trees), "Projects" (not Procedures), "Solutions Library" (not Step Library). Maintenance flows hidden from pilot UI (backend retains them). DB column tree_type values unchanged.
SaaS shape
Multi-tenant by account. Primary role hierarchy: super_admin > owner > engineer > viewer — driven by is_super_admin + account_role. Never role=='admin' — use is_super_admin. Separate team-scoped admin gate exists orthogonally to the role hierarchy: is_team_admin=True + valid team_id, enforced by require_team_admin. Backend deps in app/api/deps.py: get_current_active_user, require_engineer_or_admin, require_admin, require_account_owner, require_team_admin. Frontend: usePermissions() hook. Central logic in backend/app/core/permissions.py + frontend/src/hooks/usePermissions.ts.
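A minimal sketch of how the two gates described above could be modeled. The `User` dataclass, `ROLE_RANK`, and both helper names are hypothetical illustrations, not the contents of backend/app/core/permissions.py; treating a "valid" team_id as "not None" is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ranking of account_role values (lowest to highest).
ROLE_RANK = {"viewer": 0, "engineer": 1, "owner": 2}


@dataclass
class User:
    account_role: str
    is_super_admin: bool = False
    is_team_admin: bool = False
    team_id: Optional[int] = None


def has_role_at_least(user: User, minimum: str) -> bool:
    """super_admin outranks every account_role; otherwise compare ranks."""
    if user.is_super_admin:
        return True
    return ROLE_RANK.get(user.account_role, -1) >= ROLE_RANK[minimum]


def passes_team_admin_gate(user: User) -> bool:
    """Team-admin gate: independent of account_role (assumes valid == not None)."""
    return user.is_team_admin and user.team_id is not None
```

Note that neither helper ever compares against a literal "admin" role: the hierarchy check keys off is_super_admin plus account_role, and the team gate ignores the hierarchy entirely.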
Status
Go-to-Market Validation (pre-PMF). Backend feature-complete (55+ endpoints, 100+ tests). Phase 0.5 FlowPilot telemetry baseline accruing. See CURRENT-STATE.md for live status, 03-DEVELOPMENT-ROADMAP.md for phases.
Tech stack
- Backend: Python 3.11 + FastAPI, SQLAlchemy 2.0 async (asyncpg), Alembic, Pydantic v2, JWT (python-jose + bcrypt, JTI refresh rotation), APScheduler (in-process with FastAPI lifespan).
- Frontend: React 19 + Vite + TypeScript, Tailwind v4 (CSS-only config in index.css), Zustand (immer + zundo), React Router v7, Axios (token-refresh interceptor), Lucide.
- DB: PostgreSQL 16 (RLS enabled Phase 4, pgvector).
Project structure
resolutionflow/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI entry
│ │ ├── api/endpoints/ # 50+ routers registered in api/router.py — auth/admin, trees/sessions, AI/chat, scripts, integrations, uploads, accounts, FlowPilot, etc.
│ │ ├── api/deps.py # auth deps (incl. require_team_admin)
│ │ ├── api/router.py # registration
│ │ ├── core/ # config, database, permissions, security, audit, rate_limit
│ │ ├── models/ # SQLAlchemy (incl. FlowProposal)
│ │ ├── schemas/ # Pydantic
│ │ ├── services/psa/ # PSA provider pattern (base, connectwise/, autotask/, halopsa/, cache, encryption, exceptions, registry, ticket_context, types)
│ │ ├── services/knowledge_flywheel.py + _scheduler.py
│ │ └── services/knowledge_gap_service.py
│ ├── alembic/versions/ # 001-070 sequential, then hex hash
│ ├── scripts/ # seed_data, seed_trees, seed_test_users
│ └── tests/ # pytest integration
├── frontend/
│ ├── src/
│ │ ├── api/ # Axios client + endpoint modules
│ │ ├── components/ # common, layout, dashboard, tree-editor, session, procedural, procedural-editor, library, step-library, ui, flowpilot
│ │ ├── hooks/ # usePermissions, useSessionTimer, useKeyboardShortcuts
│ │ ├── pages/
│ │ ├── store/ # Zustand (auth, treeEditor, proceduralEditor, userPreferences, scriptGeneratorStore)
│ │ └── types/
│ └── (Tailwind v4 CSS-only config in src/index.css)
├── docs/plans/archive/ # pre-March 2026 plans
├── docs/connectwise/ # CW API reference + best-practices guides
├── docs/LESSONS-ARCHIVE.md # archived lessons (fixes in code)
├── .ai/ # dual-agent handoff system (see .ai/README.md)
├── CLAUDE.md · AGENTS.md · CURRENT-STATE.md · DESIGN-SYSTEM.md · DEV-ENV.md
Dev commands
Full setup in DEV-ENV.md (host-agnostic, with homelab Proxmox reference topology). Day-to-day:
docker compose -f docker-compose.dev.yml up -d # start stack
cd backend && source venv/bin/activate && uvicorn app.main:app --reload
cd frontend && npm run dev
pytest --override-ini="addopts=" # tests (first time: CREATE DATABASE resolutionflow_test)
cd backend && alembic upgrade head # migrate
cd backend && alembic revision -m "desc" # manual migration (preferred per Lesson 77)
cd backend && alembic revision --autogenerate -m "desc" # picks up drift; review carefully
cd frontend && npm run build # stricter than tsc --noEmit — final check
cd frontend && npx tsc -b # TS-only check when dist/ has EACCES
docker exec -it resolutionflow_postgres psql -U postgres -d resolutionflow
python -m scripts.seed_trees # seed (from backend/)
Never pass --rev-id to alembic — let it generate the hex hash.
On hosts without native python/node/npm (e.g. the code-server LXC), run commands inside the already-running containers instead:
docker exec resolutionflow_backend pytest --override-ini="addopts="
docker exec resolutionflow_backend alembic upgrade head
docker exec -w /app resolutionflow_frontend npm run build
docker exec -w /app resolutionflow_frontend npx tsc -b
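For scripted use, the same fallback can be sketched in Python. This is an illustration under assumptions: `wrap_for_host` is a hypothetical helper that does not exist in the repo, and it only builds the argument list (it does not run anything).

```python
import shutil
from typing import List, Optional


def wrap_for_host(
    argv: List[str], container: str, workdir: Optional[str] = None
) -> List[str]:
    """Return argv unchanged when its tool is on PATH, else prefix docker exec."""
    if shutil.which(argv[0]) is not None:
        return argv
    prefix = ["docker", "exec"]
    if workdir:
        # -w sets the working directory inside the container.
        prefix += ["-w", workdir]
    return prefix + [container] + argv
```

On a host without npm, `wrap_for_host(["npm", "run", "build"], "resolutionflow_frontend", "/app")` yields the same `docker exec -w /app resolutionflow_frontend npm run build` form shown above.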
URLs & test users
URLs: Frontend http://localhost:5173, backend http://localhost:8000, API docs http://localhost:8000/api/docs.
Test users (all password TestPass123!): admin@resolutionflow.example.com (super_admin), teamadmin@resolutionflow.example.com, engineer@resolutionflow.example.com, pro@resolutionflow.example.com.
CI
Gitea (gitea.resolutionflow.com/chihlasm/resolutionflow/actions). gh CLI works for issues/PRs on the GitHub mirror, but not CI runs.
Deployment (Railway)
- Prod: resolutionflow.com (frontend), api.resolutionflow.com (backend).
- Auto-deploy: Gitea push → GitHub mirror → Railway follows GitHub main.
- PR environments auto-created; need manual domain generation + VITE_API_URL with https:// prefix.
- ALLOW_RAILWAY_ORIGINS=true for *.up.railway.app CORS.
- Shared Variables (Railway project-level) auto-propagate to PR envs; use for secrets like ANTHROPIC_API_KEY.
- Super admin utility: backend/make_superadmin_simple.py list|<email>.
ConnectWise PSA
Reference: docs/connectwise/ — start with CONNECTWISE-API-REFERENCE.md, then the best-practices/ guides. Extracted OpenAPI spec in connectwise-psa-resolutionflow-reference.json (670 endpoints, v2025.16); full spec in connectwise-psa-openapi-full.json.
- Auth: API Key (Base64 companyId+publicKey:privateKey) + clientId header on every request. clientId is server-side (CW_CLIENT_ID in config.py): it identifies ResolutionFlow, not the tenant. Per-connection: company_id, public_key, private_key, server_url.
- Architecture: services/psa/ provider pattern, with PSAProvider base, ConnectWiseProvider impl, PsaProviderRegistry for multi-PSA dispatch. Credentials encrypted at rest via services/psa/encryption.py (Fernet). Per-team credentials, never per-user. Endpoints in api/endpoints/integrations.py. In-memory TTL cache in services/psa/cache.py.
- Integration flows: session docs → ticket notes (POST /service/tickets/{id}/notes, markdown supported); ticket context → FlowPilot; callbacks via /system/callbacks with HMAC verification.
- API rules: pin version via Accept header application/vnd.connectwise.com+json; version=2025.16. Paginate ≤1000/page. Dynamic base URL via /login/companyinfo/{companyId}. Request minimal permissions (MY, not ALL).
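The auth and version-pinning rules above can be sketched as a header builder. This is a minimal illustration, not the code in services/psa/: `build_cw_headers` is a hypothetical name, and the use of the HTTP Basic scheme for the Base64 token is an assumption about how the API key is sent.

```python
import base64
from typing import Dict

CW_API_VERSION = "2025.16"  # pinned version from the API rules above


def build_cw_headers(
    company_id: str, public_key: str, private_key: str, client_id: str
) -> Dict[str, str]:
    """Build per-request headers: Base64 API key, clientId, pinned Accept."""
    raw = f"{company_id}+{public_key}:{private_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {
        # Assumption: the Base64 token is carried as HTTP Basic auth.
        "Authorization": f"Basic {token}",
        # Server-side value (CW_CLIENT_ID), the same for every tenant.
        "clientId": client_id,
        "Accept": f"application/vnd.connectwise.com+json; version={CW_API_VERSION}",
    }
```

The first three arguments come from the per-team connection record, while client_id comes from server config, which mirrors the per-connection vs server-side split described above.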
Coding standards
- Python: type hints everywhere, async/await for DB, Pydantic v2, DateTime(timezone=True) always.
- TypeScript: interfaces for all data, const over let, functional components + hooks, shared logic in custom hooks.
- Git: feature branch before committing (git checkout -b feat/feature-name). Commit format: type: description (feat/fix/refactor/docs/test/chore). Large features: commit per phase with npm run build validation. Push to Gitea, which auto-mirrors to GitHub (.gitea/workflows/mirror-to-github.yml); never push to GitHub directly. (Agent-specific Co-Authored-By trailers live in CLAUDE.md / AGENTS.md.)
After shipping: update CURRENT-STATE.md + 03-DEVELOPMENT-ROADMAP.md, gh issue close #N for resolved issues, add lessons only for non-obvious traps (otherwise let the code speak).
Common tasks
- New endpoint: endpoints/ → router.py → schemas/ → tests → frontend API client.
- New page: pages/ → route in router.tsx → nav in AppLayout.tsx.
- New public route: top-level in router.tsx alongside /login, not inside ProtectedRoute.
- New frontend API module: types in types/ → export from types/index.ts → client in api/ → export from api/index.ts.
- Schema change: update model → alembic revision -m "desc" → review → alembic upgrade head.
- New VITE_* env var: add as ARG + ENV in frontend/Dockerfile for Railway builds (Lesson 60: Railway env vars are runtime-only, Vite bakes at build time).
- Account sub-page: add route in router.tsx under account children + add link card in AccountSettingsPage.tsx. AccountLayout has NO sidebar nav.
Design system
Source of truth: DESIGN-SYSTEM.md. Read before any visual change.
- Flat high-contrast dark theme, Sentry/PostHog-inspired. No glass, backdrop blur, ambient orbs, gradient surfaces.
- Accent electric blue (#60a5fa dark / #2563eb light): ≤5% of UI, interactive elements only. Warning amber (#fbbf24), info cyan (#67e8f9), success green (#34d399), danger red (#f87171). Each with -dim at 10% opacity.
- Backgrounds: bg-sidebar (#0e1016) → bg-page (#16181f) → bg-card (#1e2028) → bg-elevated (#2a2d38). Borders: border-default / border-hover.
- Text: text-heading → text-primary → text-muted-foreground → text-muted.
- Fonts: IBM Plex Sans (body), Bricolage Grotesque (heading, 700 weight for logo), JetBrains Mono (code).
- Logo: 30px gradient square (ember orange) + "ResolutionFlow" in Bricolage Grotesque. Assets in brand-assets/, frontend/src/assets/brand/, frontend/public/icons/.
- Mockups: docs/mockups/ (HTML).
- Deprecated, do not use: glass-card, glass-stat, bg-gradient-brand, backdrop-filter: blur(), ambient orbs, purple gradients, ember orange as accent, cyan as accent (cyan is info only).
Frontend patterns
- Component basics: cn() from @/lib/utils, Lucide icons, Modal.tsx for modals (mobile-responsive items-end sm:items-center + max-w-full sm:max-w-lg).
- Types: create in types/, export from types/index.ts, import type { T } from '@/types'.
- Routing: getTreeNavigatePath() / getTreeEditorPath() from @/lib/routing. Tree editor is /trees/new. All dashboard session clicks → /pilot/:id regardless of session_type.
- Lazy routes: lazyWithRetry from @/lib/lazyWithRetry.ts, not React.lazy (auto-reload on stale chunks).
- Public pages: raw fetch() with full URL, NOT apiClient (which requires auth tokens).
- Toast: toast.warning() not toast.warn(). Import from @/lib/toast; methods: success, error, warning, info.
- Assistant chat: uses local React useState, not Zustand. All three send paths (handleSend, sendPrefill, handleResumeNew) must call setShowTaskLane(true) when response has actions/questions.
- Chat backend wiring: aiSessionsApi.sendChatMessage → /ai-sessions/{id}/chat → unified_chat_service.py. NOT assistant_chat_service.py (removed except retention settings).
- FlowPilot: Actions live in page header (Resolve/Escalate/Share Update + overflow). useBlocker for active-session nav guard. "Pause & Leave" auto-pauses.
- AI markers: [QUESTIONS], [ACTIONS], [FORK], [DELTA]...[/DELTA] (editor), [TREE_UPDATE] (troubleshooting builder), [STEPS_UPDATE] (procedural builder), [METADATA]. Parsed in unified_chat_service.py; conversation history stores stripped display_content. If markers disappear: check system-prompt final reminder + per-user-message [SYSTEM: ...] injection in _call_anthropic_cached().
- Image uploads: paste/attach → Railway S3 via uploadsApi.upload() → resized by storage_service.resize_image_for_vision() (Pillow, 1568px max, PNG→JPEG) → base64 → Claude multimodal blocks. Max 3/msg. Images NOT stored in history.
- Async select-load-apply: guard with a ref (pattern in AssistantChatPage currentChatRef). Update synchronously on every selection change; after every await, bail out if ref.current !== thisId.
- Editor-Embedded Flow Assist: EditorAIPanel (320px side panel) + useEditorAI. Ghost nodes via _suggestion: true. Route actions via settings.get_model_for_action().
- Script Builder: /script-builder, chat-style. Backend ScriptBuilderSession, script_builder_service.py, endpoints /scripts/builder/. FlowPilot handoff via action_type: "open_script_builder" + sessionStorage.
- Intake form field schema: variable_name + field_type (NOT name/type).
- Node field priority (copilot, summaries): title → question → description → content → label.
- Procedural sessions auto-start on page load (no intake/Start screen). Troubleshooting flows DO have a start screen.
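The node field priority in the list above amounts to a first-match lookup. A minimal sketch, assuming nodes are plain dicts and that blank strings count as missing (both assumptions; `node_display_text` is a hypothetical helper name):

```python
from typing import Any, Mapping, Optional

# Priority order from the list above: first populated field wins.
NODE_FIELD_PRIORITY = ("title", "question", "description", "content", "label")


def node_display_text(node: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-blank string field in priority order, else None."""
    for field in NODE_FIELD_PRIORITY:
        value = node.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return None
```

Keeping the order in a single tuple means copilot and summary code paths cannot drift apart on which field they prefer.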
Critical lessons
Lessons 1-40 archived to docs/LESSONS-ARCHIVE.md — fixes baked into the codebase. Grep the archive when an error message or symptom is unfamiliar, or after two failed attempts at resolving an issue. Don't pre-load for routine work.
Backend / data
- APScheduler interval jobs always max_instances=1: without it, overlapping runs reprocess records (TOCTOU).
- get_db rolls back on exception: never remove the await session.rollback(), or one failed request poisons the connection with cascading InFailedSQLTransaction errors.
- Startup routines on tenant-isolated tables must use _admin_session_factory(), not get_db(). Phase 4 RLS has no app.current_account_id set at startup. get_service_account_id is safe (reads cached app.state).
- Backfill migrations adding account_id: grep ALL ModelClass( sites in service code to verify account_id= is passed. SQLAlchemy accepts None silently; Phase 4 RLS WITH CHECK surfaces the problem at runtime as InsufficientPrivilegeError: new row violates row-level security policy.
- tree_shares.account_id = tree.account_id, never current_user.account_id. A super_admin sharing another tenant's tree must produce the share in the tree owner's tenant, or it becomes invisible post-RLS.
- Global tables (no account_id, never in RLS migrations): script_categories, platform_steps, template_trees, plan_feature_defaults, accounts. Scan at class level: one .py file can hold multiple classes with different columns (e.g. ScriptCategory vs ScriptTemplate).
- ai_sessions.status is VARCHAR(30), which fits requesting_escalation (21 chars). Migration f0aad74ea51b widened it from 20.
- PostgreSQL func.sum(case(...)) returns Decimal via asyncpg; cast to int() before Pydantic dict[str, Any].
- Enhancement / branch_addition proposals need modified_flow_data via "Edit & Publish"; the backend returns 400 on direct approve. Only new_flow supports direct approve.
- Adding email types: static async method on EmailService in core/email.py. Fire-and-forget from endpoints (log errors, don't fail the request).
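The Decimal lesson above can be sketched as a small normalization step applied to an aggregate row before it reaches a Pydantic dict[str, Any] field. `normalize_counts` is a hypothetical helper, and it assumes the aggregates are whole-number counts, so int() loses nothing.

```python
from decimal import Decimal
from typing import Any, Dict, Mapping


def normalize_counts(row: Mapping[str, Any]) -> Dict[str, Any]:
    """Cast Decimal aggregates to int; leave every other value untouched."""
    return {
        key: int(value) if isinstance(value, Decimal) else value
        for key, value in row.items()
    }
```

For example, `normalize_counts({"resolved": Decimal("3"), "name": "x"})` gives plain ints for the counts while the string passes through unchanged.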
AI / FlowPilot
- Anthropic SDK max_retries=1: the default of 2 can take 3× the timeout.
- Model tier routing: settings.get_model_for_action(action_type). Always alias form (claude-sonnet-4-6).
- FlowPilot must ask GUI-vs-script before suggesting either when both are viable; see FLOWPILOT_SYSTEM_PROMPT in flowpilot_engine.py.
- Telemetry events to grep: anthropic.cache (prompt-cache hit/create), mcp.turn (per-turn MCP availability), mcp.fallback (MCP silent-retry fired).
- Don't put literal payloads in system prompts. Bit us twice in one day: a worked [QUESTIONS] example with literal "Outlook + jsmith" content, and a full DNS troubleshooting tree, both caused Claude to recite that content on unrelated tickets; the symptom looked like task-lane state leaking across chats. The fix is structural: every output example in a system prompt uses <placeholder> syntax ({"text": "<one short, specific question>"}), never literal field values. Real-looking format examples live in few-shot messages (separate file, separate code path), not system prompts. Guardrail: tests/test_prompt_anti_parrot.py scans every *_PROMPT / *_SCHEMA / *_PROTOCOL / *_FORMAT constant in app/services/ and app/core/; CI fails when a marker block contains a literal JSON value or when a known leaked token (jsmith, DC01, ADSync, Dnscache, etc.) appears anywhere in a prompt.
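A rough sketch of the kind of check the anti-parrot guardrail describes. This is not the contents of tests/test_prompt_anti_parrot.py: `find_prompt_violations` is hypothetical, it scans the whole prompt rather than individual marker blocks, and both the regexes and the case-insensitive token match are simplifying assumptions.

```python
import re
from typing import List

# Tokens named in the lesson above; the real list may be longer.
LEAKED_TOKENS = ("jsmith", "DC01", "ADSync", "Dnscache")

# A JSON string value following a colon, e.g.  "text": "value"
_JSON_STRING_VALUE = re.compile(r':\s*"([^"]*)"')
# An allowed value is a single <placeholder> and nothing else.
_PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def find_prompt_violations(prompt: str) -> List[str]:
    """List leaked tokens and literal (non-placeholder) JSON string values."""
    problems: List[str] = []
    lowered = prompt.lower()
    for token in LEAKED_TOKENS:
        if token.lower() in lowered:
            problems.append(f"leaked token: {token}")
    for match in _JSON_STRING_VALUE.finditer(prompt):
        value = match.group(1)
        if not _PLACEHOLDER.match(value):
            problems.append(f"literal JSON value: {value!r}")
    return problems
```

A prompt whose example reads `{"text": "<one short, specific question>"}` passes, while one containing a realistic question about a named user is flagged on both counts.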
Frontend / UI
- Flex height chain: every ancestor from app-shell grid to React Flow canvas needs flex + flex-1 + min-h-0 or h-full. Missing flex collapses to 0. Same rule for FlowPilot action bar and any tall scroller.
- React Flow CSS in Tailwind v4: import in index.css, not component JS. Override dark theme via --xy-* CSS vars.
- text-secondary renders invisible on dark: Tailwind v4 maps it to --color-secondary (a surface color). Use text-muted-foreground for readable secondary text. Avoid text-muted for body; labels only.
- bg-accent is electric blue; never for code/kbd. Use bg-white/[0.12] border border-white/[0.06] for inline code, bg-white/[0.08] for kbd. Accent reserved for interactive elements.
- landing.css uses self-contained --lp-* vars, never var(--color-*) theme tokens (they resolve incorrectly outside the app shell).
- Never transition: all. List properties explicitly, or layout props animate and jank.
- Date range filter end dates: setHours(23, 59, 59, 999) before sending, or the day's items are excluded. For string-based date inputs, append T23:59:59.999Z.
- TopBar search: full bar hidden sm:block, icon button sm:hidden; both open CommandPalette.
- Hover pop-out cards: scrim pointer-events-none, expanded card has its own click handler at z-50, dismiss via onMouseLeave on wrapper. Never put handlers on the scrim.
- tsc -b in Dockerfile is stricter than tsc --noEmit: it enforces noUnusedLocals / noUnusedParameters as hard errors. Check IDE yellow squiggles before pushing.
- Dashboard prefill auto-submits via useEffect + prefillHandledRef guard; no double-enter.
- Global Axios 5xx interceptor fires before component .catch(): fix optional-data endpoints at the source (return [] / {} on provider failure), not in the component.
- Playwright strict mode: scope selectors to avoid sidebar/main ambiguity. Use getByRole('heading', { name }) or .animate-scale-in locators, not bare getByText().
Env / infra
- Node 20.19+ required (Vite 7). nvm use 20 or PATH="$HOME/.nvm/versions/node/v20.19.0/bin:$PATH".
- Railway backend service is patherly, DB name railway. Public Postgres proxy: interchange.proxy.rlwy.net:45797.
- Railway Object Storage bucket resolutionflow-uploads. Env vars STORAGE_*. boto3 in storage_service.py. Dockerfile needs Pillow + libjpeg-dev / zlib1g-dev.
- PostHog: PostHogProvider + posthog.init() in main.tsx. Helpers in lib/analytics.ts. Env: VITE_PUBLIC_POSTHOG_KEY, VITE_PUBLIC_POSTHOG_HOST. identifyUser() in authStore.fetchUser(), resetAnalytics() on logout.
- bun PATH on devserver01: BUN_INSTALL="$HOME/.bun", PATH="$BUN_INSTALL/bin:$PATH". Playwright Chromium needs libatk1.0-0 libatk-bridge2.0-0 libcups2 libxkbcommon0 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2.
- Full-stack change: trace schema → endpoint → API client → hook → store → UI. Don't assume one end proves the other.
- Dev env: see DEV-ENV.md for current topology, REPO_ROOT requirement when compose runs inside a container, Vite allowedHosts, linuxserver.io group_add + custom-cont-init.d workaround, docker compose up no-op-on-unchanged-hash gotcha.
Quick reference
| What | Where |
|---|---|
| Detailed status | CURRENT-STATE.md |
| Roadmap | 03-DEVELOPMENT-ROADMAP.md |
| Design system | DESIGN-SYSTEM.md |
| Dev env | DEV-ENV.md |
| Archived lessons | docs/LESSONS-ARCHIVE.md |
| ConnectWise API | docs/connectwise/ |
| GitHub issues | gh issue list --state open |
| Local API docs | http://localhost:8000/api/docs |
| Handoff system | .ai/README.md |