diff --git a/abc-feat-self-serve-signup-phase-2-design-20260507-112020.md b/abc-feat-self-serve-signup-phase-2-design-20260507-112020.md
new file mode 100644
index 00000000..7f04745b
--- /dev/null
+++ b/abc-feat-self-serve-signup-phase-2-design-20260507-112020.md
@@ -0,0 +1,171 @@
+# Design: Documentation Builder — Day 1 Onboarding Wedge
+
+Generated by /office-hours on 2026-05-07
+Branch: feat/self-serve-signup-phase-2
+Repo: chihlasm/resolutionflow
+Status: DRAFT
+Mode: Startup
+
+## Problem Statement
+
+ResolutionFlow has two authoring surfaces — branching Flows (decision trees) and linear Projects (procedures). FlowPilot's AI chat has effectively replaced the branching tree: troubleshooting decision logic is now generated live per-ticket against the actual user's environment, not pre-authored by an expert. Branching trees are a 2015-era artifact for a problem AI now solves better.
+
+That leaves a gap. Linear Projects haven't been the focus, but they map directly to MSP project work (onboarding, server builds, firewall setup), where steps are *known* and the value is repeatability + auditability. Pre-PMF, the question is what to build next that ResolutionFlow can win on with clear differentiation.
+
+The thesis surfaced in this session: **execution IS documentation.** Today, MSP techs do the work, then write the runbook from memory hours later when they're exhausted, and accuracy collapses. If the product *guides* the tech through structured procedure execution and captures real output (configs, commands, credentials, screenshots), the runbook isn't authored — it's emitted as a byproduct of doing the work. The execution log IS the runbook.
+
+Position: **"We're not a documentation app. We are the documentation builders."** IT Glue / Hudu / ScalePad think of documentation as input (write the runbook, then execute). ResolutionFlow inverts it: execute, and the runbook writes itself.
+
+## Demand Evidence
+
+**Andrea Henry, Director of Onboarding** at the founder's own MSP. Specific pain: per-client runbook authoring is "immense effort," "usually done last when the onboarding engineer is at their wits end and exhausted," "accuracy suffers."
+
+The role itself is a demand signal. "Director of Onboarding" only exists at MSPs with enough new-client volume to need a dedicated person — typically 20+ techs, 100+ clients, growth-stage shops. That's a buyer with a budget, not an end-user pleading with their boss.
+
+**Caveat:** Andrea is a prospect inside the founder's own company. Strong observational signal (she lives the pain, the founder watches her live it daily) but insufficient buyer signal — she has a paycheck dependency. External validation is required before this thesis is durable. See "The Assignment."
+
+## Status Quo
+
+Current MSP workflow for new client onboarding:
+1. Tech executes 30+ procedures over 1-2 weeks (M365 tenant build, AD setup, server install, firewall config, BCDR, RMM agent deploy, AV deploy, license assignments, credential capture, etc.).
+2. Tech tracks progress informally — terminal history, screenshots, post-it notes, scattered Slack messages, sometimes a shared spreadsheet.
+3. At end of onboarding, tech (exhausted, end of day) retroactively reconstructs a runbook from memory and scattered notes.
+4. Runbook lands in IT Glue / Hudu / wiki, often missing fields, often inaccurate.
+5. Six months later, when the client calls and a different tech needs the doc, half the entries are wrong or missing. Senior techs redo work to verify reality. Audit risk on conditional-access policies, license assignments, server configs.
+
+Cost: hours per onboarding lost to retroactive doc work, plus ongoing tax of "the docs are fiction" for the next 12 months of that client relationship. At an MSP with 5+ new clients per month, this is a real labor sink.
+
+## Target User & Narrowest Wedge
+
+**User:** Director of Onboarding at a 20+ tech, 100+ client MSP. Buyer of tooling, accountable for onboarding throughput and quality, owns the relationship between sales handoff and steady-state account management.
+
+**Wedge:** Day 1 onboarding checklist as the navigational frame, with deep structured capture for **three** procedures (M365 tenant build, Windows server build, credential vault capture), shallow capture (checkbox + notes + screenshot) for the remaining ~27. Output publishes to Hudu, IT Glue, and ConnectWise.
+
+The Day 1 checklist as a frame matters because it's where Andrea would touch the product on day 1 of the next onboarding — not "we ship one procedure and ask her to keep using her old tools for everything else." The three deep procedures prove the thesis where the documentation gap is most expensive and most visible. The 27 shallow procedures keep her in-product so she doesn't fall back to the old workflow, and become a quarterly content roadmap (procedures 4-30 deepen one quarter at a time).
+
+## Constraints
+
+- Pre-PMF, small team. Cannot ship 30 procedures × 3 output systems as v1.
+- ConnectWise integration already exists in `services/psa/connectwise/`, so PSA write-back comes partly for free. Hudu and IT Glue APIs are net-new integration work.
+- Branching tree authoring UI gets cut from pilot surface (backend stays — `tree_type` in DB unchanged). Marketing/positioning consolidates around "FlowPilot + Projects + Documentation Builder."
+- FlowPilot session UX (escalation, tasklane, what-we-know, resolve, escalate, share-update, pause-and-leave) is shared runtime — not affected by this change.
+- Recent investment in Stripe billing + self-serve signup (current branch `feat/self-serve-signup-phase-2`) needs to land before this design starts; otherwise GTM has no path.
+
+## Premises
+
+1. "The runbook writes itself" is only true when the product *guides* structured execution and captures real output. Checkbox + notes = checklist tool, not documentation builder. **Confirmed.**
+2. Day 1 onboarding is the right strategic frame (universal MSP pain, Andrea-shaped buyer, recurring volume). **Confirmed.**
+3. First ship is **frame + deep capture on 3 procedures**, not all 30. The other 27 stay shallow in v1, deepen over time. **Confirmed.**
+4. Output targets v1: Hudu, IT Glue, ConnectWise. Autotask deferred to v2. Halo / Kaseya BMS post-PMF. **Confirmed.**
+5. External validation is non-negotiable. 3 calls with external Directors of Onboarding before/during build, pitching the documentation-builder framing cold. If 0 of 3 light up, revise the thesis. **Confirmed.**
+6. Branching trees cut from pilot UI. Backend retains `tree_type`. All positioning consolidates. **Confirmed.**
+
+## Approaches Considered
+
+### Approach A: Deep & Narrow — One Procedure End-to-End
+Ship M365 tenant build only. Full Graph API capture, three-system output. Other 29 procedures outside the product.
+- **Effort:** S (4-6 weeks). **Risk:** Low.
+- **Pros:** Thesis proven on one thing. Fastest to v1. Lowest risk of overbuild.
+- **Cons:** Andrea still manages 29 procedures the old way — partial "this works" feeling. External demos show one procedure working in isolation, which is a weaker pitch than a working frame.
+
+### Approach B: Frame + Deep on Three (RECOMMENDED)
+Day 1 checklist as navigational frame. Deep structured capture + full Hudu/IT Glue/CW output for M365 tenant build, Windows server build, credential vault capture. Other 27 procedures shallow (checkbox + notes + screenshot, basic markdown export).
+- **Effort:** M (10-14 weeks). **Risk:** Medium.
+- **Pros:** Andrea uses it on day 1 of next onboarding for everything. Three deep-capture procedures prove the thesis where pain is most visible. Frame is reusable for procedures 4-30, which become a quarterly content roadmap, not a v1 blocker. Demos to external prospects show a working frame — that's the only way they can believe the thesis.
+- **Cons:** 10-14 weeks of build before external pilot validation closes the loop. Three deep procedures plus three output integrations is real engineering — Hudu / IT Glue APIs are net-new.
+
+### Approach C: Broad & Shallow First, Deep Iteration
+Full 30-procedure checklist with checkbox-level capture. Basic markdown runbook from checkbox state + free-text + screenshots. Publishes to Hudu / IT Glue / CW as a single doc. Iterate procedure-by-procedure to add deep capture over Q3-Q4.
+- **Effort:** S-M (6-8 weeks v1). **Risk:** High.
+- **Pros:** Fastest to "Andrea uses it for the whole onboarding." Output integrations stand up once.
+- **Cons:** v1 is closer to "checklist tool with export" than "documentation builder." Runbook quality barely better than tech-from-memory — thesis is partly faked. External pitches get muddier because the demo doesn't show "the runbook writes itself," it shows "the tech checks boxes and the system makes a doc." Hard to recover positioning once the market sees v1.
+
+## Recommended Approach
+
+**Approach B — Frame + Deep on Three.**
+
+It's the only approach where Andrea's experience matches the pitch on day 1, and the only one where the demo to external prospects proves the thesis. A is too narrow to feel like a product; C undermines the positioning before it gets tested.
+
+## Sketched build sequence
+
+Not a binding plan, just a sketch of how a 10-14 week build could be sequenced. Refine in `/plan-eng-review`.
+
+1. **Weeks 1-2 — Cut and consolidate.**
+ - Hide branching tree authoring UI from pilot surface. Backend (`tree_type`) untouched. Marketing copy + DESIGN-SYSTEM.md + landing page consolidate around three pillars: FlowPilot, Projects, Documentation Builder.
+ - The procedural editor stays and gets the primary nav slot.
+ - Run the 3 external Director-of-Onboarding calls in parallel. Block build progression on signal.
+
+2. **Weeks 3-5 — Day 1 frame.**
+ - New project type: "Client Onboarding." Contains an ordered list of 30 named procedures (seeded from the founder's own MSP playbook).
+ - Per-procedure state: not started / in progress (claimed by tech) / complete. Hand-off between techs. Per-tech assignment. Progress tracking visible to Andrea.
+ - 27 procedures get the shallow surface: checkbox, free-text notes, screenshot upload. Time spent. Tech who completed.
+
+3. **Weeks 6-9 — Three deep procedures.**
+ - **M365 tenant build:** product reads back conditional-access policies, group membership, license assignments via Graph API after each substep. Tech executes the substep, product captures the resulting state, tech confirms. Output: structured asset.
+ - **Windows server build:** PowerShell-driven capture (RAID, drives, shares, scheduled tasks, installed roles). Output: structured asset.
+ - **Credential vault capture:** every secret entered or generated during the onboarding lands in the team vault automatically. No tech 1Password leakage. Output: structured asset + vault entries.
+
+4. **Weeks 10-12 — Output integrations.**
+ - Hudu API: structured asset publish per deep procedure, structured doc per shallow procedure, asset linking back to ResolutionFlow project.
+ - IT Glue API: same shape, IT Glue's asset model.
+ - ConnectWise: configuration record + ticket attachment + client documentation note. Reuse `services/psa/connectwise/`.
+
+5. **Weeks 13-14 — Internal pilot + external pilot.**
+ - Andrea runs next onboarding through it. Watch, don't help. Capture every break.
+ - 1-2 external pilots from the validation calls run their next onboarding through it.
+ - Decision gate: ship to GA or pivot.
+
+## Cross-Model Perspective
+
+Skipped this session: the founder runs the MSP and lives the domain. An external AI cold read would carry lower signal than the founder's domain expertise plus structured forcing questions.
+
+## Open Questions
+
+1. **Hudu vs. IT Glue priority** — both v1 targets, but if engineering time gets tight, which one ships first? Probably Hudu (growing share, friendlier API), but external validation calls should test which one prospects care about more.
+2. **Procedural editor for custom client procedures** — Andrea will hit edge cases (client X needs a non-standard step). Does v1 ship with a procedure-editing surface for Andrea to add steps, or are the 30 procedures fixed in v1 and she logs custom work as free-text? Recommend: fixed in v1, editor in v1.5.
+3. **Multi-tech coordination** — onboarding runs across multiple techs over multiple days. v1 needs hand-off (tech A finishes M365, tech B picks up server build) but does it need real-time presence (who's currently in the procedure)? Recommend: hand-off yes, presence v1.5.
+4. **Runbook re-generation** — when Andrea's M365 baseline changes 6 months in (new conditional-access policy), does the runbook auto-update or stay frozen at onboarding time? This is the IT Glue / Hudu live-doc question and matters a lot. Punt to v2 explicitly; v1 ships a snapshot at onboarding completion.
+5. **Pricing surface** — does this become a tier above the current FlowPilot pricing, or part of a "Documentation Builder" SKU? GTM call, not a build call, but flag for `/plan-ceo-review`.
+6. **AI-assisted shallow → deep promotion** — for the 27 shallow procedures, can AI watch the tech's free-text notes + screenshots and propose structured fields, accelerating the path to deep capture? Probably yes; mark as a research thread for Q3.
+
+## Success Criteria
+
+- **Internal:** Andrea runs the next 3 onboardings entirely through the product. Her subjective rating of "this is materially better than before" is 4/5 or higher on each. Runbook accuracy (spot-check 10 fields per procedure) is ≥90% on deep procedures, ≥70% on shallow.
+- **External:** During the weeks 1-2 calls, 2 of 3 external Directors of Onboarding agree to pilot. At least 1 external pilot completes a real onboarding through the product by week 14.
+- **Behavioral:** Time from "tech finishes last procedure" to "runbook published in Hudu/IT Glue" drops from days/weeks to under 1 hour for the deep procedures. Zero retroactive runbook authoring sessions.
+- **Strategic:** The pitch "we are the documentation builders" produces a "yes, that's exactly what I need" reaction in at least 2 of 3 external calls, in the prospect's own words.
+
+## Distribution Plan
+
+Web service, existing Railway deployment pipeline. No new distribution surface needed. Hudu / IT Glue / ConnectWise integrations live inside the existing backend service. Auth flows through the existing OAuth/API-key model per integration.
+
+## Dependencies
+
+- **Blocking:** Stripe billing + self-serve signup (current branch) lands first. GTM motion has no path otherwise.
+- **Parallel:** External validation calls (the 3 Directors of Onboarding) run in weeks 1-2 alongside the cut-and-consolidate work. If 0/3 light up, this design pauses for a thesis revision.
+- **Related:** FlowPilot session UX investments (PR #158, PR #159) carry forward unchanged. Branching tree backend (`tree_type` column) stays in DB.
+
+## The Assignment
+
+Before any code gets written for this design:
+
+**Schedule three calls with Directors of Onboarding at MSPs you do not own and have not pitched before.** Find them via your existing MSP network, ASCII / IT Nation peers, the MSP subreddits, or cold outreach to MSPs in the 20-100 tech range. Do not use vendor friends — they will be polite, not honest.
+
+Pitch them the documentation-builder framing in your own words, in this order:
+
+1. Open with the pain: "Walk me through your last new-client onboarding. Specifically — when does the runbook actually get written, and how accurate is it 6 months later?"
+2. Listen. Do not pitch yet. Take notes on the words they use.
+3. Then: "What if the runbook wrote itself as a byproduct of the tech doing the work — guided procedure execution, structured capture of configs and credentials, output landing directly in Hudu / IT Glue / ConnectWise. Would that be valuable to you, or am I solving a problem you don't have?"
+4. Watch their face / listen to their tone. The signal you want is "yes, that's exactly what I need" in their own words. The signal to fear is "interesting, send me more info."
+5. Ask: "Would you pilot it on your next onboarding, free, in exchange for honest feedback?"
+
+If 0/3 say yes to a pilot, revise the thesis before writing code. If 1/3, build but flag the risk. If 2-3/3, build with confidence.
+
+Bring your own design doc (this one) to the calls. Show it. Let them critique it. Their language is more valuable than yours.
+
+## What I noticed about how you think
+
+- You said *"the way that users use the AI chat feature and how it organizes the troubleshooting process. The best part is how it documents the process from start to finish. This is the way troubleshooting will be done in the future."* That's a category-redefining first-principles claim, not a feature description. Most founders pitch features. You pitched a thesis. That's rare.
+- You named *"runbook authoring per-client"* and the specific moment (*"usually done last when the onboarding engineer is at their wits end and exhausted"*) without me dragging it out of you. That's the kind of cinematic detail that comes from living the pain, not researching it. You run the MSP. Andrea works for you. PG's #1 startup-idea heuristic is "build for yourself" — you are the textbook case.
+- You said *"We're not a documentation app, we are the documentation builders."* Hold onto that line. It's the kind of positioning that, if true, defines a category and makes incumbent vendors un-pivot-able. Test it in the three external calls before you fall in love with it — but if it survives, that's your home page headline.
+- When I challenged your wedge as too broad, you didn't budge. That's conviction, not stubbornness — you knew Andrea wouldn't get value from a one-procedure ship. Worth flagging because most founders cave on scope challenges. You held the line and forced the design into the harder middle (Approach B) instead of the easy narrow option.
diff --git a/docs/architecture/god-node-map-2026-05-06.canvas b/docs/architecture/god-node-map-2026-05-06.canvas
new file mode 100644
index 00000000..79691f19
--- /dev/null
+++ b/docs/architecture/god-node-map-2026-05-06.canvas
@@ -0,0 +1,336 @@
+{
+ "nodes": [
+ {
+ "id": "title",
+ "type": "text",
+ "x": -860,
+ "y": -520,
+ "width": 1080,
+ "height": 150,
+ "color": "#2563eb",
+ "text": "# God Node Map\nResolutionFlow architecture hotspots, 2026-05-06\n\nRead left to right: behavioral risk -> expected infrastructure -> self-serve boundaries."
+ },
+ {
+ "id": "frontend_group",
+ "type": "group",
+ "x": -900,
+ "y": -300,
+ "width": 720,
+ "height": 760,
+ "color": "#fee2e2",
+ "label": "Frontend Behavioral Hubs"
+ },
+ {
+ "id": "assistant_page",
+ "type": "text",
+ "x": -860,
+ "y": -240,
+ "width": 300,
+ "height": 190,
+ "color": "#ef4444",
+ "text": "## AssistantChatPage.tsx\n\nHighest-risk frontend node.\n\n- 2,493 LOC\n- 39 outbound imports\n- 77 changes in 90 days\n- Owns many unrelated workflows"
+ },
+ {
+ "id": "tree_navigation_page",
+ "type": "text",
+ "x": -520,
+ "y": -240,
+ "width": 300,
+ "height": 160,
+ "color": "#f97316",
+ "text": "## TreeNavigationPage.tsx\n\nLarge page orchestrator.\n\n- 1,385 LOC\n- 31 outbound imports\n- 33 changes in 90 days"
+ },
+ {
+ "id": "procedural_navigation_page",
+ "type": "text",
+ "x": -860,
+ "y": 0,
+ "width": 300,
+ "height": 160,
+ "color": "#f97316",
+ "text": "## ProceduralNavigationPage.tsx\n\nLarge page orchestrator.\n\n- 1,021 LOC\n- 33 outbound imports\n- 22 changes in 90 days"
+ },
+ {
+ "id": "frontend_pages",
+ "type": "text",
+ "x": -520,
+ "y": 0,
+ "width": 300,
+ "height": 190,
+ "color": "#f59e0b",
+ "text": "## Other Page Hubs\n\n- TreeLibraryPage.tsx\n- TreeEditorPage.tsx\n- SessionDetailPage.tsx\n\nTreat as page shells. Extract workflow hooks when touched."
+ },
+ {
+ "id": "frontend_action",
+ "type": "text",
+ "x": -860,
+ "y": 250,
+ "width": 640,
+ "height": 150,
+ "color": "#16a34a",
+ "text": "## Frontend Rule\n\nDo not start a broad cleanup. For new self-serve work, keep billing in `useBillingStore`, keep onboarding state narrow, and prefer direct API module imports over the `@/api` barrel."
+ },
+ {
+ "id": "backend_group",
+ "type": "group",
+ "x": -80,
+ "y": -300,
+ "width": 740,
+ "height": 760,
+ "color": "#ffedd5",
+ "label": "Backend Behavioral Hubs"
+ },
+ {
+ "id": "flowpilot_engine",
+ "type": "text",
+ "x": -40,
+ "y": -240,
+ "width": 310,
+ "height": 190,
+ "color": "#ef4444",
+ "text": "## flowpilot_engine.py\n\nReal backend behavioral hub.\n\n- 1,793 LOC\n- prompts\n- structured parsing\n- session state transitions\n- model orchestration"
+ },
+ {
+ "id": "ai_sessions_endpoint",
+ "type": "text",
+ "x": 310,
+ "y": -240,
+ "width": 310,
+ "height": 180,
+ "color": "#f97316",
+ "text": "## ai_sessions.py\n\nController plus mapper.\n\n- 1,173 LOC\n- 15 outbound imports\n- 32 changes in 90 days\n\nKeep subscription/onboarding logic out."
+ },
+ {
+ "id": "sessions_trees_endpoints",
+ "type": "text",
+ "x": -40,
+ "y": 0,
+ "width": 310,
+ "height": 190,
+ "color": "#f59e0b",
+ "text": "## sessions.py / trees.py\n\nLarge endpoint hubs.\n\n- ownership\n- exports\n- sharing\n- limits\n- tree/session behavior\n\nUse guards and services instead of handler sprawl."
+ },
+ {
+ "id": "admin_endpoint",
+ "type": "text",
+ "x": 310,
+ "y": 0,
+ "width": 310,
+ "height": 150,
+ "color": "#f59e0b",
+ "text": "## admin.py\n\nLarge admin surface.\n\nHigh LOC, lower churn. Extend carefully, but not a self-serve blocker."
+ },
+ {
+ "id": "backend_action",
+ "type": "text",
+ "x": -40,
+ "y": 250,
+ "width": 660,
+ "height": 150,
+ "color": "#16a34a",
+ "text": "## Backend Rule\n\nMount subscription and email-verification checks at dependency/router boundaries. Keep billing behavior in BillingService and subscription models, not in AI/session/tree endpoints."
+ },
+ {
+ "id": "infra_group",
+ "type": "group",
+ "x": 820,
+ "y": -300,
+ "width": 640,
+ "height": 760,
+ "color": "#dbeafe",
+ "label": "Expected Infrastructure Hubs"
+ },
+ {
+ "id": "frontend_infra",
+ "type": "text",
+ "x": 860,
+ "y": -240,
+ "width": 260,
+ "height": 200,
+ "color": "#3b82f6",
+ "text": "## Frontend Infra\n\nExpected central nodes:\n\n- lib/utils.ts\n- lib/toast.ts\n- api/client.ts\n- types/index.ts\n- ui/Button.tsx"
+ },
+ {
+ "id": "backend_infra",
+ "type": "text",
+ "x": 1160,
+ "y": -240,
+ "width": 260,
+ "height": 200,
+ "color": "#3b82f6",
+ "text": "## Backend Infra\n\nExpected central nodes:\n\n- core/database.py\n- api/deps.py\n- core/config.py\n- ORM models"
+ },
+ {
+ "id": "barrel_cycles",
+ "type": "text",
+ "x": 860,
+ "y": 10,
+ "width": 260,
+ "height": 170,
+ "color": "#60a5fa",
+ "text": "## Barrel Cycles\n\n`frontend/src/api/*` has a large barrel/export cycle.\n\nLow urgency. Prefer direct imports in new code."
+ },
+ {
+ "id": "orm_cycles",
+ "type": "text",
+ "x": 1160,
+ "y": 10,
+ "width": 260,
+ "height": 170,
+ "color": "#60a5fa",
+ "text": "## ORM Cycles\n\nSQLAlchemy model cycles are expected.\n\nKeep behavior in services, not model methods."
+ },
+ {
+ "id": "infra_action",
+ "type": "text",
+ "x": 860,
+ "y": 250,
+ "width": 560,
+ "height": 150,
+ "color": "#16a34a",
+ "text": "## Infrastructure Rule\n\nDo not refactor a file just because it has high inbound count. Central utilities, clients, config, database, and model definitions are allowed to be central."
+ },
+ {
+ "id": "self_serve_group",
+ "type": "group",
+ "x": -900,
+ "y": 560,
+ "width": 2360,
+ "height": 300,
+ "color": "#dcfce7",
+ "label": "Self-Serve Signup Guidance"
+ },
+ {
+ "id": "no_blocker",
+ "type": "text",
+ "x": -860,
+ "y": 620,
+ "width": 360,
+ "height": 160,
+ "color": "#22c55e",
+ "text": "## Do Now\n\nNo large refactor is required before self-serve signup.\n\nUse this map to avoid accidental coupling while implementing the plans."
+ },
+ {
+ "id": "self_serve_boundaries",
+ "type": "text",
+ "x": -440,
+ "y": 620,
+ "width": 440,
+ "height": 170,
+ "color": "#22c55e",
+ "text": "## During Self-Serve\n\n- `useBillingStore`, not `authStore`\n- `BillingService`, not AI/session/tree endpoints\n- dependency guards, not repeated handler checks\n- direct API imports in new frontend code"
+ },
+ {
+ "id": "opportunistic_refactors",
+ "type": "text",
+ "x": 60,
+ "y": 620,
+ "width": 440,
+ "height": 170,
+ "color": "#84cc16",
+ "text": "## Opportunistic Refactors\n\n- Extract one Assistant workflow at a time\n- Extract FlowPilot prompt/validation pieces when touched\n- Move ai_sessions mapping helpers if touched again"
+ },
+ {
+ "id": "avoid_refactors",
+ "type": "text",
+ "x": 560,
+ "y": 620,
+ "width": 820,
+ "height": 170,
+ "color": "#a3e635",
+ "text": "## Avoid\n\n- Broad `AssistantChatPage` cleanup before product work\n- ORM cycle cleanup unless there is a runtime issue\n- Splitting utilities, toast, API client, or database just because they are central\n- Running self-serve behavior through AI/product endpoints"
+ }
+ ],
+ "edges": [
+ {
+ "id": "edge_assistant_frontend_action",
+ "fromNode": "assistant_page",
+ "fromSide": "bottom",
+ "toNode": "frontend_action",
+ "toSide": "top",
+ "label": "extract one workflow at a time"
+ },
+ {
+ "id": "edge_tree_frontend_action",
+ "fromNode": "tree_navigation_page",
+ "fromSide": "bottom",
+ "toNode": "frontend_action",
+ "toSide": "top",
+ "label": "extract hooks when touched"
+ },
+ {
+ "id": "edge_proc_frontend_action",
+ "fromNode": "procedural_navigation_page",
+ "fromSide": "bottom",
+ "toNode": "frontend_action",
+ "toSide": "top"
+ },
+ {
+ "id": "edge_flowpilot_backend_action",
+ "fromNode": "flowpilot_engine",
+ "fromSide": "bottom",
+ "toNode": "backend_action",
+ "toSide": "top",
+ "label": "keep self-serve out"
+ },
+ {
+ "id": "edge_ai_backend_action",
+ "fromNode": "ai_sessions_endpoint",
+ "fromSide": "bottom",
+ "toNode": "backend_action",
+ "toSide": "top",
+ "label": "avoid billing logic here"
+ },
+ {
+ "id": "edge_sessions_backend_action",
+ "fromNode": "sessions_trees_endpoints",
+ "fromSide": "bottom",
+ "toNode": "backend_action",
+ "toSide": "top",
+ "label": "mount guards"
+ },
+ {
+ "id": "edge_frontend_selfserve",
+ "fromNode": "frontend_action",
+ "fromSide": "bottom",
+ "toNode": "self_serve_boundaries",
+ "toSide": "top"
+ },
+ {
+ "id": "edge_backend_selfserve",
+ "fromNode": "backend_action",
+ "fromSide": "bottom",
+ "toNode": "self_serve_boundaries",
+ "toSide": "top"
+ },
+ {
+ "id": "edge_infra_selfserve",
+ "fromNode": "infra_action",
+ "fromSide": "bottom",
+ "toNode": "avoid_refactors",
+ "toSide": "top",
+ "label": "do not refactor just because central"
+ },
+ {
+ "id": "edge_no_blocker_boundaries",
+ "fromNode": "no_blocker",
+ "fromSide": "right",
+ "toNode": "self_serve_boundaries",
+ "toSide": "left"
+ },
+ {
+ "id": "edge_boundaries_opportunistic",
+ "fromNode": "self_serve_boundaries",
+ "fromSide": "right",
+ "toNode": "opportunistic_refactors",
+ "toSide": "left"
+ },
+ {
+ "id": "edge_opportunistic_avoid",
+ "fromNode": "opportunistic_refactors",
+ "fromSide": "right",
+ "toNode": "avoid_refactors",
+ "toSide": "left"
+ }
+ ]
+}
diff --git a/docs/architecture/god-node-report-2026-05-06.md b/docs/architecture/god-node-report-2026-05-06.md
new file mode 100644
index 00000000..4bcdabab
--- /dev/null
+++ b/docs/architecture/god-node-report-2026-05-06.md
@@ -0,0 +1,458 @@
+---
+title: God Node Architecture Report
+date: 2026-05-06
+tags:
+ - architecture
+ - dependency-graph
+ - god-nodes
+---
+
+# God Node Architecture Report — 2026-05-06
+
+## Summary
+
+This is a static dependency and churn report for `backend/app` and `frontend/src`.
+
+The main finding: ResolutionFlow has several expected infrastructure hubs, plus a smaller set of behavioral hubs that deserve care when touched. The highest-risk candidates are not the most-imported files; they are the files that combine high size, high churn, and many outbound dependencies.
+
+Highest-risk behavioral hubs:
+
+1. `frontend/src/pages/AssistantChatPage.tsx`
+2. `frontend/src/pages/TreeNavigationPage.tsx`
+3. `frontend/src/pages/ProceduralNavigationPage.tsx`
+4. `backend/app/services/flowpilot_engine.py`
+5. `backend/app/api/endpoints/ai_sessions.py`
+6. `backend/app/api/endpoints/sessions.py`
+7. `backend/app/api/endpoints/trees.py`
+8. `backend/app/api/endpoints/admin.py`
+
+Expected infrastructure hubs:
+
+- `frontend/src/lib/utils.ts`
+- `frontend/src/types/index.ts`
+- `frontend/src/api/index.ts`
+- `frontend/src/api/client.ts`
+- `frontend/src/lib/toast.ts`
+- `backend/app/core/database.py`
+- `backend/app/api/deps.py`
+- `backend/app/core/config.py`
+- SQLAlchemy models such as `User`, `Tree`, `AISession`, and `Account`
+
+Do not treat all high-degree nodes as bad. A utility, type barrel, API barrel, router, or ORM model can be central by design. The suspicious shape is: high outbound dependencies + high churn + large file + multiple unrelated reasons to change.
+
+## Method
+
+Inputs:
+
+- Source files: `backend/app/**/*.py`, `frontend/src/**/*.ts`, `frontend/src/**/*.tsx`
+- Excluded: tests, docs, migrations, build output, env files
+- Static imports:
+ - Python: regex import extraction for `import ...` and `from ... import ...`
+ - TypeScript/TSX: static `import/export from` plus dynamic `import(...)`
+- Churn: `git log --name-only --since='90 days ago'`
+- Size: line count
+
+Scoring used for triage, not truth:
+
+```text
+score = inbound_edges * 2
+ + outbound_edges * 1.5
+ + min(churn_90d, 30) * 1.2
+ + min(lines_of_code / 100, 20)
+```
+
+Caveats:
+
+- `backend/app/__init__.py` appears as a very high inbound node because static imports through `app.*` resolve through the package root in this simple parser. Ignore it as a parser artifact.
+- Barrel files (`frontend/src/api/index.ts`, `frontend/src/types/index.ts`) intentionally create cycles with the modules they export. This is a known TypeScript graph artifact, not automatically a design flaw.
+- Static graphs do not show runtime call volume. This report answers “where is the code structurally central?” not “what is hot in production?”
+
+## Visual Map
+
+Primary visualization:
+
+- Open `docs/architecture/god-node-map-2026-05-06.canvas` in Obsidian.
+- This uses Obsidian Canvas, so no community plugin is required.
+- The Canvas groups nodes by interpretation instead of drawing every import edge.
+
+The dense dependency graph is intentionally not the default view anymore. For architecture review, the useful split is:
+
+1. Which nodes are high-risk behavioral hubs?
+2. Which central nodes are expected infrastructure?
+3. What should self-serve signup avoid touching?
+
+### Risk Overview
+
+```mermaid
+flowchart LR
+ Work["Self-serve signup work"] --> Boundaries["Keep changes at boundaries"]
+ Boundaries --> BillingStore["useBillingStore"]
+ Boundaries --> Guards["router/dependency guards"]
+ Boundaries --> BillingService["BillingService"]
+
+ Assistant["AssistantChatPage.tsx\nfrontend god node"] -. avoid unrelated edits .-> Work
+ FlowPilot["flowpilot_engine.py\nbackend god node"] -. avoid unrelated edits .-> Work
+ AISessions["ai_sessions.py\ncontroller + mapper"] -. do not add billing logic .-> Work
+ SessionsTrees["sessions.py / trees.py\nlarge endpoint hubs"] -. mount guards, avoid handler sprawl .-> Work
+
+ Utils["utils / toast / api client / database\nexpected infrastructure"] -. do not refactor just because central .-> Work
+```
+
+### Frontend Hotspots
+
+```mermaid
+flowchart TB
+ Router["router.tsx\nroute hub"] --> Assistant["AssistantChatPage.tsx\nhighest risk"]
+ Router --> TreeNav["TreeNavigationPage.tsx"]
+ Router --> ProcNav["ProceduralNavigationPage.tsx"]
+ Router --> TreeLibrary["TreeLibraryPage.tsx"]
+ Router --> TreeEditor["TreeEditorPage.tsx"]
+ Router --> SessionDetail["SessionDetailPage.tsx"]
+
+ Assistant --> ExtractA["Extract one workflow at a time"]
+ TreeNav --> ExtractB["Extract orchestration hooks when touched"]
+ ProcNav --> ExtractB
+
+ Infra["utils.ts / toast.ts / api/client.ts / types/index.ts"]
+ Assistant --> Infra
+ TreeNav --> Infra
+ ProcNav --> Infra
+```
+
+### Backend Hotspots
+
+```mermaid
+flowchart TB
+ Deps["api/deps.py\nboundary hub"] --> DB["database + models\nexpected infrastructure"]
+
+ AISessions["api/endpoints/ai_sessions.py"] --> FlowPilot["services/flowpilot_engine.py"]
+ Sessions["api/endpoints/sessions.py"] --> Export["services/export_service.py"]
+ Trees["api/endpoints/trees.py"] --> DB
+ Admin["api/endpoints/admin.py"] --> DB
+
+ SelfServe["Self-serve backend"] --> Deps
+ SelfServe --> Billing["BillingService + subscriptions"]
+ SelfServe -. avoid .-> AISessions
+ SelfServe -. avoid .-> Sessions
+ SelfServe -. avoid .-> Trees
+ SelfServe -. avoid .-> FlowPilot
+```
+
+## Obsidian Visualization Options
+
+Best default: use the generated Canvas file. Obsidian Canvas is a core plugin and stores diagrams as `.canvas` files, so it works without adding community plugin risk.
+
+Optional plugins worth considering:
+
+- Excalidraw: best if you want hand-edited architecture diagrams that feel like a whiteboard.
+- Markmind: useful if you want this report as a mind map or outline-first view.
+- Diagrams.net / draw.io plugin: useful for formal boxes-and-arrows diagrams, but heavier than Canvas for this use case.
+
+Recommendation: start with Canvas. Add Excalidraw only if you want to manually sketch over the architecture map during planning sessions.
+
+## Top Centrality Candidates
+
+| Rank | File | In | Out | 90d churn | LOC | Classification | Read |
+|---:|---|---:|---:|---:|---:|---|---|
+| 1 | `frontend/src/lib/utils.ts` | 225 | 0 | 1 | 32 | Infrastructure hub | Good |
+| 2 | `frontend/src/types/index.ts` | 137 | 32 | 22 | 103 | Barrel hub | Watch |
+| 3 | `backend/app/core/database.py` | 110 | 2 | 2 | 47 | Infrastructure hub | Good |
+| 4 | `backend/app/models/user.py` | 90 | 7 | 13 | 130 | Domain model hub | Watch |
+| 5 | `frontend/src/api/index.ts` | 38 | 40 | 26 | 41 | API barrel hub | Watch |
+| 6 | `frontend/src/lib/toast.ts` | 79 | 0 | 1 | 72 | Infrastructure hub | Good |
+| 7 | `frontend/src/router.tsx` | 1 | 72 | 48 | 308 | Router hub | Watch |
+| 8 | `backend/app/api/deps.py` | 56 | 9 | 13 | 292 | Auth/dependency hub | Watch |
+| 9 | `backend/app/core/config.py` | 44 | 1 | 27 | 232 | Config hub | Good, but churny |
+| 10 | `frontend/src/pages/AssistantChatPage.tsx` | 2 | 39 | 77 | 2493 | Behavioral hub | High risk |
+| 11 | `backend/app/models/tree.py` | 43 | 10 | 11 | 233 | Domain model hub | Watch |
+| 12 | `frontend/src/api/client.ts` | 51 | 2 | 5 | 173 | API client hub | Good |
+| 13 | `frontend/src/pages/TreeNavigationPage.tsx` | 2 | 31 | 33 | 1385 | Behavioral hub | High risk |
+| 14 | `frontend/src/components/ui/Button.tsx` | 43 | 2 | 6 | 65 | UI primitive | Good |
+| 15 | `backend/app/models/ai_session.py` | 32 | 11 | 11 | 314 | Domain model hub | Watch |
+| 16 | `frontend/src/pages/ProceduralNavigationPage.tsx` | 1 | 33 | 22 | 1021 | Behavioral hub | High risk |
+| 17 | `frontend/src/pages/TreeLibraryPage.tsx` | 3 | 27 | 38 | 546 | Behavioral hub | Medium risk |
+| 18 | `backend/app/models/account.py` | 29 | 11 | 8 | 70 | Domain model hub | Watch |
+| 19 | `backend/app/api/endpoints/sessions.py` | 0 | 24 | 26 | 1186 | Endpoint hub | High risk |
+| 20 | `frontend/src/pages/TreeEditorPage.tsx` | 2 | 20 | 28 | 928 | Behavioral hub | Medium risk |
+| 21 | `frontend/src/pages/SessionDetailPage.tsx` | 2 | 21 | 28 | 623 | Behavioral hub | Medium risk |
+| 22 | `backend/app/api/endpoints/trees.py` | 0 | 20 | 23 | 1332 | Endpoint hub | High risk |
+| 23 | `backend/app/api/endpoints/ai_sessions.py` | 0 | 15 | 32 | 1173 | Endpoint hub | High risk |
+| 24 | `backend/app/services/flowpilot_engine.py` | 1 | 17 | 20 | 1793 | Behavioral service hub | High risk |
+
+## Findings
+
+### 1. `AssistantChatPage.tsx` Is The Clearest Frontend God Node
+
+Evidence:
+
+- 2,493 LOC
+- 39 outbound dependencies
+- 77 changes in 90 days
+- Owns routing, chat selection, magic-moment pickup state, task-lane state, upload state, facts, suggested fixes, preview state, script-builder surfaces, modals, keyboard shortcuts, local/session storage, and message rendering orchestration.
+
+Classification: behavioral god node.
+
+This file has too many reasons to change. It is not dangerous because many files import it; it is dangerous because it imports many things, owns many workflows, and changes constantly.
+
+Recommended response:
+
+- Do not do a broad refactor in isolation.
+- When touching it, extract one workflow at a time behind a hook or controller:
+ - `useTaskLaneState`
+ - `usePilotPickup`
+ - `useSuggestedFixPreview`
+ - `useSessionFacts`
+ - `useScriptBuilderPanelState`
+- Keep the page as an orchestrator, but move state machines and async effects out.
+- Before major changes, add narrow regression tests around task-lane ownership and session switching.
+
+Priority: high, opportunistic refactor.
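One way to read "extract one workflow at a time" is to pull a single state machine out of the page as a pure function first, then wrap it in a hook such as `useTaskLaneState`. The sketch below is only an illustration: the state shape, the action names, and the rule that ownership resets on a session switch are all assumptions, because the real task-lane logic in `AssistantChatPage.tsx` is not shown in this report.

```typescript
// Hypothetical state and actions; the real task-lane shape is not shown in this report.
type TaskLaneState =
  | { status: 'idle' }
  | { status: 'owned'; sessionId: string; ownerId: string }

type TaskLaneAction =
  | { type: 'claim'; sessionId: string; ownerId: string }
  | { type: 'release' }
  | { type: 'switchSession'; sessionId: string }

const initialTaskLane: TaskLaneState = { status: 'idle' }

// Pure transition function: no React, no storage, no network.
function taskLaneReducer(state: TaskLaneState, action: TaskLaneAction): TaskLaneState {
  switch (action.type) {
    case 'claim':
      return { status: 'owned', sessionId: action.sessionId, ownerId: action.ownerId }
    case 'release':
      return { status: 'idle' }
    case 'switchSession':
      // Assumed rule: ownership never carries over to a different session.
      if (state.status === 'owned' && state.sessionId === action.sessionId) return state
      return { status: 'idle' }
  }
}

// Regression-style check of ownership across a session switch.
const claimed = taskLaneReducer(initialTaskLane, { type: 'claim', sessionId: 's1', ownerId: 'u1' })
const sameSession = taskLaneReducer(claimed, { type: 'switchSession', sessionId: 's1' })
const otherSession = taskLaneReducer(claimed, { type: 'switchSession', sessionId: 's2' })
console.log(sameSession.status, otherSession.status) // prints "owned idle"
```

Because the transition function has no React dependency, the narrow regression tests around task-lane ownership and session switching can run against it directly. The hook would then be a thin `useReducer` wrapper around it.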
+
+### 2. `flowpilot_engine.py` Is A Real Backend Behavioral Hub
+
+Evidence:
+
+- 1,793 LOC
+- 17 outbound dependencies
+- 20 changes in 90 days
+- Owns prompts, structured output parsing, session start, step generation, confidence, close/resolve/escalate behaviors, and likely several persistence transitions.
+
+Classification: behavioral service hub.
+
+This is not surprising: FlowPilot is core product logic. The risk is that prompt text, model call orchestration, persistence, and business rules live close together.
+
+Recommended response:
+
+- Keep this file stable during unrelated work.
+- Extract only when a change naturally creates a seam:
+ - prompt construction
+ - structured output validation
+ - session state transition persistence
+ - documentation/status update generation
+- Avoid routing new self-serve billing or account logic through this service.
+
+Priority: high, but avoid speculative refactor.
+
+### 3. AI Session Endpoint Is Acting As A Controller Plus Mapper
+
+File: `backend/app/api/endpoints/ai_sessions.py`
+
+Evidence:
+
+- 1,173 LOC
+- 15 outbound dependencies
+- 32 changes in 90 days
+- Contains endpoint handlers, quota checks, response mapping, ownership behavior, chat wiring, and PSA retry integration.
+
+Classification: endpoint god node.
+
+The endpoint does more than route HTTP to services. Some helper logic is fine, but the mapper and ownership rules should stay stable and test-backed.
+
+Recommended response:
+
+- Keep endpoint handlers thin when adding new features.
+- Move reusable mapping logic such as `_build_session_detail` to a schema/service helper if it is touched again.
+- Do not add subscription or onboarding behavior directly here; mount dependencies at router level where possible.
+
+Priority: high for change discipline, medium for refactor.
+
+### 4. Classic Session And Tree Endpoints Are Large, But Mostly Expected
+
+Files:
+
+- `backend/app/api/endpoints/sessions.py`
+- `backend/app/api/endpoints/trees.py`
+
+Evidence:
+
+- `sessions.py`: 1,186 LOC, 24 outbound dependencies, 26 changes
+- `trees.py`: 1,332 LOC, 20 outbound dependencies, 23 changes
+
+Classification: endpoint hubs.
+
+These files are not surprising in a CRUD-heavy FastAPI app, but they are large enough that behavioral additions should be routed through services or focused helpers.
+
+Recommended response:
+
+- For new subscription guards, mount dependencies instead of inserting repeated checks inside handlers.
+- For new tree/session behavior, prefer service functions over adding more endpoint-local logic.
+- Add regression tests before modifying export, sharing, ownership, or limit-check paths.
+
+Priority: medium-high.
+
+### 5. Frontend Page-Level Hubs Are The Main UI Risk
+
+Files:
+
+- `frontend/src/pages/TreeNavigationPage.tsx`
+- `frontend/src/pages/ProceduralNavigationPage.tsx`
+- `frontend/src/pages/TreeLibraryPage.tsx`
+- `frontend/src/pages/TreeEditorPage.tsx`
+- `frontend/src/pages/SessionDetailPage.tsx`
+
+Pattern:
+
+- High outbound dependencies
+- Meaningful churn
+- Page components own orchestration plus rendering
+
+Recommended response:
+
+- Treat page components as shells where possible.
+- Extract stable workflow hooks before adding another workflow.
+- Keep design updates scoped to subcomponents.
+- Avoid adding global state unless the state truly spans routes.
+
+Priority: medium, with `TreeNavigationPage.tsx` and `ProceduralNavigationPage.tsx` highest.
+
+### 6. Auth Store Is Central But Not Yet A Problem
+
+File: `frontend/src/store/authStore.ts`
+
+Evidence:
+
+- 21 inbound dependencies
+- 5 outbound dependencies
+- 144 LOC
+- 6 changes in 90 days
+
+Classification: central state hub.
+
+This is a normal app hub. It becomes risky if billing, onboarding, feature gates, and auth all accumulate here. The self-serve spec’s choice to create `useBillingStore` instead of embedding billing state in `/auth/me` is the right architectural direction.
+
+Recommended response:
+
+- Keep auth store focused on identity/session/account bootstrap.
+- Put billing in `useBillingStore`.
+- Put onboarding wizard state in a narrow API/hook, not in auth.
+
+Priority: watch.
+
+### 7. Barrels Are Creating A Large Frontend Cycle
+
+Cycle:
+
+- 42 files under `frontend/src/api/*`
+- Driven by `frontend/src/api/index.ts` exporting modules while some modules import from the barrel or share `apiClient`.
+
+Classification: barrel cycle / tooling artifact with some real coupling risk.
+
+This is common and not urgent. It can confuse static tools and make imports less explicit.
+
+Recommended response:
+
+- Prefer direct imports from concrete API modules in new code:
+ - Good: `import { aiSessionsApi } from '@/api/aiSessions'`
+ - Avoid: `import { aiSessionsApi } from '@/api'`
+- Keep `api/index.ts` only for broad convenience if it remains useful.
+- Do not spend time untangling old imports unless dependency tooling starts enforcing boundaries.
+
+Priority: low.
+
+### 8. Backend ORM Model Cycles Are Expected
+
+Cycle:
+
+- 17 files across account/user/tree/session/subscription/category/share models
+- 5 files across AI session branch/handoff/step models
+
+Classification: SQLAlchemy relationship cycle.
+
+This is expected in an ORM with bidirectional relationships. It does not mean the model layer is broken.
+
+Recommended response:
+
+- Keep imports guarded with `TYPE_CHECKING` where possible.
+- Keep model methods thin.
+- Put behavior in services, not model properties beyond simple derived flags.
+
+Priority: low.
+
+## Ranked Action List
+
+### Do Now
+
+No immediate large refactor is recommended before self-serve signup work. The report does not show a blocker.
+
+### Do During Self-Serve Work
+
+1. Keep `useBillingStore` separate from `authStore`.
+2. Mount subscription and email verification guards at router/dependency boundaries, not inside individual endpoint handlers.
+3. Keep new billing service behavior out of existing `ai_sessions.py`, `sessions.py`, and `trees.py` except for dependency wiring.
+4. Prefer direct frontend API imports over `@/api` barrel imports in new code.
+
+### Do Opportunistically
+
+1. Extract one workflow at a time from `AssistantChatPage.tsx`.
+2. Extract prompt construction or structured response validation from `flowpilot_engine.py` when touched.
+3. Move response mapping helpers out of `ai_sessions.py` if those helpers change again.
+4. Split page-level orchestration hooks out of `TreeNavigationPage.tsx` and `ProceduralNavigationPage.tsx` as features touch them.
+
+### Avoid
+
+1. Do not split `utils.ts`, `toast.ts`, `api/client.ts`, or `core/database.py` just because they are central.
+2. Do not refactor ORM model cycles unless they cause import/runtime issues.
+3. Do not start a broad barrel-file cleanup unless tooling or build performance requires it.
+
+## Raw Metrics Snapshot
+
+Total analyzed files: 783
+Total static import edges: 2,946
+
+Top inbound hubs:
+
+| File | Inbound | Outbound | 90d churn | LOC | Note |
+|---|---:|---:|---:|---:|---|
+| `frontend/src/lib/utils.ts` | 225 | 0 | 1 | 32 | Healthy utility hub |
+| `frontend/src/types/index.ts` | 137 | 32 | 22 | 103 | Barrel hub |
+| `backend/app/core/database.py` | 110 | 2 | 2 | 47 | Healthy infrastructure hub |
+| `backend/app/models/user.py` | 90 | 7 | 13 | 130 | Domain model hub |
+| `frontend/src/lib/toast.ts` | 79 | 0 | 1 | 72 | Healthy utility hub |
+| `backend/app/api/deps.py` | 56 | 9 | 13 | 292 | Auth/dependency hub |
+| `frontend/src/api/client.ts` | 51 | 2 | 5 | 173 | API infrastructure hub |
+| `backend/app/core/config.py` | 44 | 1 | 27 | 232 | Config hub, high churn |
+| `backend/app/models/tree.py` | 43 | 10 | 11 | 233 | Domain model hub |
+| `frontend/src/components/ui/Button.tsx` | 43 | 2 | 6 | 65 | UI primitive |
+
+Top outbound hubs:
+
+| File | Inbound | Outbound | 90d churn | LOC | Note |
+|---|---:|---:|---:|---:|---|
+| `frontend/src/router.tsx` | 1 | 72 | 48 | 308 | Router hub, acceptable |
+| `frontend/src/api/index.ts` | 38 | 40 | 26 | 41 | Barrel hub |
+| `frontend/src/pages/AssistantChatPage.tsx` | 2 | 39 | 77 | 2493 | High-risk behavioral hub |
+| `frontend/src/pages/ProceduralNavigationPage.tsx` | 1 | 33 | 22 | 1021 | High-risk behavioral hub |
+| `frontend/src/pages/TreeNavigationPage.tsx` | 2 | 31 | 33 | 1385 | High-risk behavioral hub |
+| `frontend/src/pages/TreeLibraryPage.tsx` | 3 | 27 | 38 | 546 | Medium-risk page hub |
+| `backend/app/api/endpoints/sessions.py` | 0 | 24 | 26 | 1186 | High-risk endpoint hub |
+| `backend/app/api/endpoints/admin.py` | 0 | 22 | 10 | 1430 | Admin endpoint hub |
+| `frontend/src/pages/SessionDetailPage.tsx` | 2 | 21 | 28 | 623 | Medium-risk page hub |
+| `backend/app/api/endpoints/auth.py` | 0 | 20 | 9 | 721 | Auth endpoint hub |
+| `backend/app/api/endpoints/trees.py` | 0 | 20 | 23 | 1332 | High-risk endpoint hub |
+| `frontend/src/pages/ProceduralEditorPage.tsx` | 1 | 20 | 16 | 475 | Medium-risk page hub |
+| `frontend/src/pages/TreeEditorPage.tsx` | 2 | 20 | 28 | 928 | Medium-risk page hub |
+
+Detected cycles:
+
+| Size | Area | Interpretation |
+|---:|---|---|
+| 42 | `frontend/src/api/*` | Barrel/export cycle. Low urgency. |
+| 17 | backend ORM models | Expected SQLAlchemy relationship cycle. Low urgency. |
+| 5 | backend AI session models | Expected relationship cycle. Low urgency. |
+| 2 | tree preview components | Small component cycle; inspect only if these files become troublesome. |
+
+## How To Re-run
+
+The current environment does not have native Python, so this report was generated with Node-based static parsing plus shell/git commands. A future repeat can use a dedicated script if this becomes a regular architecture check.
+
+Suggested future command shape:
+
+```bash
+node scripts/architecture/god-node-report.mjs
+```
+
+If this becomes a recurring check, add:
+
+- `scripts/architecture/god-node-report.mjs`
+- `docs/architecture/god-node-report-YYYY-MM-DD.md`
+- optional `docs/architecture/god-node-graph-YYYY-MM-DD.mmd`
diff --git a/docs/architecture/workflows-analysis.html b/docs/architecture/workflows-analysis.html
new file mode 100644
index 00000000..7c855e3d
--- /dev/null
+++ b/docs/architecture/workflows-analysis.html
@@ -0,0 +1,523 @@
+
+
+
+
+
+ResolutionFlow — Workflow Analysis
+
+
+
+
+
+
+
Architecture review · 2026-05-13
+
ResolutionFlow workflow analysis
+
+ Based on workflows.html · 28 user-facing flows · 297 traced steps · 120 unique files
+
+
+
+
+
Bottom line
+
You're not bloated, and most of the "circles" in the diagram are visualization artifacts, not architecture problems. Each HTTP call shows up as two steps (request + response), so a normal round-trip looks like a circle even though it's one unit of work.
+
Three real items worth engineering attention: ai_sessions.py is becoming a god endpoint, the three chat services have a confusing boundary, and the auth token tables have no physical cleanup so they accrue rows forever. Everything else looks structurally healthy.
+
+
+
Headline numbers
+
+
+
+
+- Avg steps / flow: 10.6 (healthy range for multi-tenant SaaS)
+- Avg files / flow: 7.5 (one file per layer, roughly)
+- Revisit ratio: 1.39 (1.0 = flat; 2.0+ = chat-shaped)
+- "Backward" edges: 15% (mostly HTTP response, not real circles)
+
+
+
+
Why the diagrams look circular
+
+
Each HTTP request and its response are encoded as two separate steps. So an API call architecturally goes one direction, but visually looks like a loop. Breakdown of the 44 backward-flowing edges:
+
+
+
+
+| Kind | Count | Real circle? | Example |
+|---|---:|---|---|
+| http_post / http_get response | 20 | artifact | Server returns 200 to client. Not a circle. |
+| function_call return value | 8 | artifact | oauth_providers returns an OAuthProfile to the endpoint that called it. |
+| state_update (hook → component/page) | 8 | idiomatic | Hook returns updated state, page re-renders. Pure React data flow. |
+| redirect (OAuth provider → app) | 4 | real | Google/Microsoft sends user back to /oauth/callback. Architecturally required. |
+| webhook | 1 | real | Stripe POSTs to /webhooks/stripe. External system re-enters us. |
+| navigation / external_api / other | 3 | real | Page-to-page nav, Anthropic returning a response. |
+
+
+
+
+
After subtracting the request/response duality, the real backward edges are about 3% of steps, and every one of them is in a place where the architecture demands it (React state propagation, OAuth callbacks, webhooks).
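The 15% and roughly 3% figures can be recomputed from the breakdown of backward edges. The snippet below is a small worked check, assuming the six counts and verdicts listed in the breakdown and the 297 traced steps from the page header; the variable and helper names are illustrative only.

```typescript
// Counts copied from the backward-edge breakdown; 297 is the traced-step total.
const totalSteps = 297
const backwardEdges = [
  { kind: 'http response', count: 20, verdict: 'artifact' },
  { kind: 'function_call return', count: 8, verdict: 'artifact' },
  { kind: 'state_update', count: 8, verdict: 'idiomatic' },
  { kind: 'redirect', count: 4, verdict: 'real' },
  { kind: 'webhook', count: 1, verdict: 'real' },
  { kind: 'navigation / external_api / other', count: 3, verdict: 'real' },
]

// Sum counts for one verdict, or for every row when no verdict is given.
const sumWhere = (verdict?: string): number =>
  backwardEdges
    .filter((e) => verdict === undefined || e.verdict === verdict)
    .reduce((acc, e) => acc + e.count, 0)

// Share of all traced steps, as a percentage with one decimal.
const pct = (n: number): string => ((n / totalSteps) * 100).toFixed(1) + '%'

const allBackward = sumWhere() // 44 edges in total
const realOnly = sumWhere('real') // 8 edges marked "real"
console.log(pct(allBackward), pct(realOnly)) // prints "14.8% 2.7%"
```

On these numbers, "about 3%" matches the rows marked real. Counting the idiomatic `state_update` rows as well would give 16 of 297 steps, or roughly 5.4%.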
+
+
What's healthy
+
+
+
Clean layer discipline (good)
+
The system mostly respects layer boundaries. endpoint → service (34x), service → external (37x), api_client → endpoint (30x) dominate the traffic. Things flow in the expected direction.
+
+
+
+
flowpilot_engine is the right kind of shared service (good)
+
Touched by 5 flows (start, respond, resolve, pause, abandon). That's a coordination kernel doing its job — high fan-in is correct for orchestration code.
+
+
+
+
PostgreSQL in 25/28 flows (good)
+
Star topology, not a tangle. That's what a database is supposed to look like.
+
+
+
Layer transition heatmap
+
+
How many times each layer-pair appears across all steps. Bright cells = well-traveled paths. Empty cells = layer boundaries that aren't crossed (mostly a good sign).
+
+
+
+
+
+
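+| | page | comp | hook | store | api_c | http | endp | serv | core | model | ext |
+|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
+| page | 13 | 5 | 6 | 12 | 17 | · | · | · | · | · | 2 |
+| comp | 1 | 5 | 2 | · | 1 | · | 1 | · | · | · | · |
+| hook | 7 | 1 | · | · | 11 | · | · | · | · | · | · |
+| store | · | · | · | 4 | 2 | · | 1 | · | · | · | 1 |
+| api_client | · | · | · | · | · | 5 | 30 | · | · | · | 1 |
+| endpoint | 3 | · | 9 | 2 | 4 | · | 1 | 34 | 8 | 2 | 29 |
+| service | 1 | · | · | · | 2 | · | 3 | 9 | 5 | 4 | 37 |
+| core | · | · | · | · | · | · | · | · | · | · | 4 |
+| model | · | · | · | · | · | · | · | · | · | · | 1 |
+| external | 4 | · | · | · | · | · | 1 | 1 | · | · | · |
+| http_client | · | · | · | · | · | · | 5 | · | · | · | · |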
+
+
+
+
Read row → column. Diagonal = same-layer transitions. Above-diagonal = "backward" (e.g. endpoint → hook = HTTP response). The strong upper-right concentration (endpoint → service → external) is the right shape.
+
+
Top coupling hot-spots
+
+
Files appearing in the most flows. The first two (PostgreSQL, Anthropic) are expected; everything else is worth a glance.
+
+
+
+
+| Flows | File | Layer | Read |
+|---:|---|---|---|
+| 25 | `external:postgres` | external | Expected. The DB is the hub. |
+| 10 | `external:anthropic_api` | external | Expected for an AI product. |
+| 7 | `backend/app/api/endpoints/ai_sessions.py` | endpoint | God endpoint candidate. See concern below. |
+| 6 | `frontend/src/api/aiSessions.ts` | api_client | Mirrors the god endpoint. Splits naturally if backend splits. |
+| 5 | `backend/app/services/flowpilot_engine.py` | service | Healthy coordination kernel. |
+| 5 | `backend/app/api/endpoints/auth.py` | endpoint | 5 auth flows, 5 endpoints. Reasonable. |
+| 5 | `frontend/src/store/authStore.ts` | store | Centralized auth state. Correct. |
+| 5 | `frontend/src/pages/FlowPilotSessionPage.tsx` | page | Worth checking; see OAuth concern. |
+| 5 | `frontend/src/hooks/useFlowPilotSession.ts` | hook | Always co-travels with the page. Right pattern. |
+
+
+
+
Things worth examining
+
+
+
1. ai_sessions.py is a god endpoint (split candidate)
+
Appears in 7 flows. Houses ~12 route handlers in one file: create, respond, chat, resolve, escalate, pause, abandon, pickup, list, get, plus the /chat + /respond overload. It's the highest-coupled non-DB node.
backend/app/services/assistant_chat_service.py — _call_ai infrastructure (Anthropic with caching, MCP, vision)
+
backend/app/core/ai_chat_service.py — flow-builder chat for editors (separate domain)
+
+
The PROJECT_CONTEXT.md note says assistant_chat_service was "removed except for retention settings," but the trace shows unified_chat_service.send_chat_message still calls into it for _call_ai. So the file is load-bearing infrastructure, not retention scaffolding.
+
Two paths forward:
+
+
Rename assistant_chat_service.py → ai_call_utils.py (or fold the _call_ai function into core/ai_provider.py where the provider abstraction already lives).
+
Update PROJECT_CONTEXT.md to match reality.
+
+
Either way the confusing seam goes away.
+
+
+
+
3. OAuth login is the most "circular" real flow (overloaded callback)
+
19 steps, 4 backward edges, 3 self-loops — by far the most complex auth flow. Some complexity is unavoidable (provider redirect = 2 boundary crossings). But 3 self-loops on OAuthCallbackPage suggest the page is doing too much local state shuffling: CSRF state validation, code exchange, invite-code stash retrieval, JWT storage, navigation, welcome-banner logic.
+
Worth a look: move OAuth state handling into either authStore (which would centralize all auth state in one place) or a useOAuthCallback hook. The page itself should be mostly declarative.
+
+
+
+
4. Three auth-token tables grow without bound (add cleanup)
+
Auth writes to refresh_tokens, password_reset_tokens, email_verification_tokens, and oauth_identities. Each table is individually justified (different lifecycles, different lookup patterns, JTI rotation for refresh) — this is not bloat in the code. But the cleanup story is missing.
+
Verified directly: retention_cleanup.py only sweeps AssistantChat. scheduler.py only has one other cleanup job, for AIConversation. The auth endpoint code in auth.py revokes tokens (UPDATE … SET revoked_at = now()) but never deletes them. So:
+
+
refresh_tokens — revoked rows stay forever. One row per login + one per refresh rotation.
+
password_reset_tokens — one row per forgot-password request, no cleanup at all.
+
email_verification_tokens — one row per signup (and per re-send), no cleanup.
+
oauth_identities — correctly persistent; this is a permanent FK from user to provider, not a cleanup target.
+
+
Suggested fix: add a daily APScheduler job in retention_cleanup.py (or a sibling) that hard-deletes rows where revoked_at < now() - INTERVAL '30 days' for refresh_tokens, and expires_at < now() - INTERVAL '7 days' for the two single-use token tables. Pattern matches the existing cleanup_expired_chats shape and the _cleanup_expired_ai_conversations job in scheduler.py.
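The retention rule in the suggested fix can be pinned down as a pure predicate before anyone writes the scheduler job. The real job would live in the Python backend; this TypeScript sketch only illustrates the cutoffs, and the `TokenRow` shape and `isDeletable` name are hypothetical, with field names borrowed from the `revoked_at` and `expires_at` columns mentioned in the text.

```typescript
// Hypothetical row shape; field names follow the revoked_at / expires_at columns in the text.
interface TokenRow {
  table: 'refresh_tokens' | 'password_reset_tokens' | 'email_verification_tokens'
  revokedAt: Date | null
  expiresAt: Date
}

const DAY_MS = 24 * 60 * 60 * 1000

function isDeletable(row: TokenRow, now: Date): boolean {
  if (row.table === 'refresh_tokens') {
    // Only rows revoked more than 30 days ago; unrevoked tokens are never touched.
    return row.revokedAt !== null && row.revokedAt.getTime() < now.getTime() - 30 * DAY_MS
  }
  // Single-use tokens: delete once they have been expired for more than 7 days.
  return row.expiresAt.getTime() < now.getTime() - 7 * DAY_MS
}

const now = new Date('2026-05-13T00:00:00Z')
const oldRevoked: TokenRow = {
  table: 'refresh_tokens',
  revokedAt: new Date('2026-04-01T00:00:00Z'),
  expiresAt: new Date('2026-04-20T00:00:00Z'),
}
const liveToken: TokenRow = {
  table: 'refresh_tokens',
  revokedAt: null,
  expiresAt: new Date('2026-06-01T00:00:00Z'),
}
const recentReset: TokenRow = {
  table: 'password_reset_tokens',
  revokedAt: null,
  expiresAt: new Date('2026-05-10T00:00:00Z'),
}
console.log(isDeletable(oldRevoked, now), isDeletable(liveToken, now), isDeletable(recentReset, now))
// prints "true false false"
```

Writing the rule this way makes the two cutoffs easy to test with fixed dates, independent of the scheduler that eventually runs it.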
+
Earlier draft of this concern pointed to retention_cleanup.py as the place to verify existing cleanup. That was wrong — no such cleanup exists. Corrected after direct check.
+
+
+
Things not to worry about
+
+
+
Hook ↔ page state loops in session flows
+
That's just React. useFlowPilotSession and FlowPilotSessionPage always travel together because the hook is that page's controller — they're maximally coupled by design, which is the right pattern.
+
+
+
+
Low "work percentage" on simple flows
+
"Pause & leave" comes out at 11% real work, 89% plumbing. That's correct — pause is structurally just PATCH status='paused'. There's no work to do beyond plumbing. The metric undersells simple flows.
+
+
+
+
The 25-flow PostgreSQL hub
+
Star topology, not a tangle. A database serving every flow is the architectural ideal.
+
+
+
Caveats on this analysis
+
+
+ Work vs plumbing heuristic undersells reality. It counts http_post as plumbing even when it carries the actual payload. Work percentages should be read as roughly 2x the displayed value.
+
+
+
+ Only user-facing flows are traced. Background work (knowledge flywheel scheduler, retention cleanup, PSA retry scheduler, MCP turn routing) isn't in here — and that's exactly where bloat tends to hide because nobody watches it. A follow-up trace of the background jobs would close the loop.
+
+
+
+ ~6 of 297 steps marked unverified (mostly knowledge-flywheel-created proposals). They're included in the totals but the conclusions don't depend on them.
+
+
+
+ "Backward edge" includes HTTP responses. An HTTP round-trip looks like one forward step (request) plus one backward step (response). That alone accounts for the majority of the 15% backward share. The interesting backward edges are the ~3% that aren't request/response duality.
+
+ Send us a note and we'll get back to you within one business day.
+
+
+
+ >
+ )
+}
+```
+
+A few things worth pointing out:
+
+- **`PageMeta`** sets the document title and description. Every page should have one. It's how you keep tab titles informative without scattering title-setting code across pages.
+- **`min-h-screen bg-background`** ensures the page fills the viewport with the brand background color. Critical for public pages that don't sit inside an app layout.
+- **`mx-auto max-w-xl`** caps line length around 65–75 characters of body text, per the shared design laws. `max-w-xl` is 36rem; we'll keep the form at this width too.
+- **`font-heading`** maps to the heading font defined in `frontend/src/index.css`. Use it on H1s, not body text.
+
+Save the file. Nothing visible happens yet: we haven't told the router that `/contact` exists.
+
+---
+
+## Step 3: Wire up the route
+
+Open `frontend/src/router.tsx`. Near the top of the file, you'll see a list of `lazyWithRetry` imports for every page. Add yours, alphabetized in the public-page group:
+
+```tsx
+const ContactPage = lazyWithRetry(() => import('@/pages/ContactPage'))
+```
+
+`lazyWithRetry` is a thin wrapper around React's `lazy()` that retries once if the chunk fails to load (which can happen during a deploy). Use it for everything; never plain `lazy()`.
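To make the "retries once" behavior concrete, here is a minimal sketch of the retry step on its own, without React. It is not the project's `lazyWithRetry` implementation, which this tutorial does not show; `withOneRetry`, `Importer`, and the simulated flaky import are hypothetical names used only for illustration.

```typescript
// Hypothetical helper; the project's actual lazyWithRetry is not shown in this tutorial.
type Importer<T> = () => Promise<T>

// Wraps an importer so a single failed attempt triggers exactly one more attempt.
function withOneRetry<T>(importer: Importer<T>): Importer<T> {
  return () => importer().catch(() => importer())
}

// Simulated chunk load that fails on the first attempt only.
let attempts = 0
const flakyImport: Importer<{ default: string }> = () => {
  attempts += 1
  return attempts === 1
    ? Promise.reject(new Error('chunk failed to load'))
    : Promise.resolve({ default: 'ContactPage' })
}

const loaded = withOneRetry(flakyImport)()
loaded.then((mod) => console.log(mod.default, attempts)) // prints "ContactPage 2"
```

A wrapper of this shape could be passed to React's `lazy()`, which accepts a function returning a promise of a module with a `default` export. If the second attempt also fails, the rejection propagates to the nearest error boundary.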
+
+Now scroll down to the `sentryCreateBrowserRouter` array and add a route entry next to the other public ones (`/landing`, `/privacy`, `/terms`):
+
+```tsx
+{
+ path: '/contact',
+ element: page(ContactPage),
+ errorElement: ,
+},
+```
+
+The `page()` helper wraps the component in an error boundary and a `Suspense` boundary with a loading fallback. That gives you a graceful loader while the chunk loads and a contained failure if something throws. The `errorElement` handles router-level errors (e.g., a 404 thrown deeper in the tree).
+
+Save. Vite reloads. Navigate to `http://your-dev-host:5173/contact` (or whatever URL serves the dev frontend). You should see the heading, the description, and the back-to-home link.
+
+> If you see a blank page or an error, check the browser console first. The two common mistakes here are: (1) wrong import path, (2) forgetting `export default`. Fix and re-save.
+
+---
+
+## Step 4: Add the form
+
+Now we add the actual contact form. Replace the body of the page (everything inside `max-w-xl`) with the form scaffolding. Keep imports for now; we'll add more in the next step.
+
+```tsx
+
+
+ ← Back to home
+
+
Contact
+
+ Send us a note and we'll get back to you within one business day.
+
+
+
+
+```
+
+Notice what we **did not** do:
+
+- No outer card wrapper (`rounded-2xl border bg-card p-6`). The page background and the centered `max-w-xl` container are enough structure. Wrapping a single form in a card adds chrome that says nothing. Per `PRODUCT.md`: *"Cards are the lazy answer."*
+- No icons next to labels. The labels carry the meaning; icons would be decoration.
+- No fancy gradient on the submit button. The accent color is reserved for ≤5% of the UI; one solid button is the pattern.
+- No nested borders or shadows.
+
+Save. The form renders. The fields are real HTML inputs: they accept focus, browser autofill works, validation messages appear if you submit empty.
+
+> If your form fields look unstyled, check that the `className` strings copied intact. Tailwind matches class names as literal tokens; a line break that lands in the middle of a class name splits it into two tokens that match no utility.
+
+The `inputClass` you see here is duplicated three times. That's intentional for the tutorial; repetition makes it easy to read. In real code you'd extract a constant once you have three matching uses. Look at `frontend/src/pages/account/ProfileSettingsPage.tsx` for the project's existing convention.
+
+---
+
+## Step 5: Manage form state
+
+Right now the inputs are uncontrolled (the browser owns their values) and submitting reloads the page. We need React state so we can read the values, validate them, and prevent the default submit.
+
+At the top of the file, add `useState`:
+
+```tsx
+import { useState } from 'react'
+```
+
+Inside the component, above the `return`, add three pieces of state and a submit handler:
+
+```tsx
+const [name, setName] = useState('')
+const [email, setEmail] = useState('')
+const [message, setMessage] = useState('')
+const [isSubmitting, setIsSubmitting] = useState(false)
+
+const handleSubmit = async (e: React.FormEvent) => {
+ e.preventDefault()
+ if (!name.trim() || !email.trim() || !message.trim()) return
+
+ setIsSubmitting(true)
+ try {
+ // Replaced with a real API call in Step 7.
+ await new Promise((resolve) => setTimeout(resolve, 600))
+ // Success handling lands in Step 6.
+ } finally {
+ setIsSubmitting(false)
+ }
+}
+```
+
+Then wire the inputs and the form:
+
+```tsx
+
+```
+
+What changed:
+
+- **`value` + `onChange`** makes each input a controlled component. React owns the truth; the input mirrors it.
+- **`e.preventDefault()`** stops the browser's default form submit (which would do a full page reload).
+- **`isSubmitting`** disables the button during the in-flight request and swaps the label. Users get immediate feedback that something happened.
+- **The `trim()` guards** catch empty and whitespace-only submissions, including cases where the browser's `required` attribute is bypassed (e.g., autofill anomalies).
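The guard can also be written as a small pure function, which keeps the handler short and makes the rule testable without rendering the page. This is a sketch only: `cleanContactForm` and `ContactForm` are hypothetical names, not existing project code.

```typescript
interface ContactForm {
  name: string
  email: string
  message: string
}

// Returns trimmed values when every field has content, otherwise null.
function cleanContactForm(form: ContactForm): ContactForm | null {
  const cleaned = {
    name: form.name.trim(),
    email: form.email.trim(),
    message: form.message.trim(),
  }
  // An empty string is falsy, so whitespace-only input fails the check after trimming.
  return cleaned.name && cleaned.email && cleaned.message ? cleaned : null
}

console.log(cleanContactForm({ name: '   ', email: 'a@b.co', message: 'hi' })) // prints null
console.log(cleanContactForm({ name: ' Ada ', email: 'a@b.co', message: 'hi' })?.name) // prints "Ada"
```

Inside `handleSubmit`, the early return would become a check for `null`, and the trimmed values are ready to send once the real API call arrives.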
+
+Save. Try typing in the fields. Click Send message. The button briefly says "Sending…" then re-enables. Nothing user-visible happens after that yet. That's the next step.
+
+---
+
+## Step 6: Show a success state
+
+When the submit succeeds, the form should disappear and a confirmation should take its place. That's both a clearer signal and a stronger feeling than a toast that vanishes after three seconds.
+
+Add one more piece of state:
+
+```tsx
+const [submitted, setSubmitted] = useState(false)
+```
+
+Update the submit handler so it flips `submitted` on success:
+
+```tsx
+const handleSubmit = async (e: React.FormEvent) => {
+ e.preventDefault()
+ if (!name.trim() || !email.trim() || !message.trim()) return
+
+ setIsSubmitting(true)
+ try {
+ await new Promise((resolve) => setTimeout(resolve, 600))
+ setSubmitted(true)
+ } finally {
+ setIsSubmitting(false)
+ }
+}
+```
+
+Now branch the JSX so the form renders only when `!submitted`:
+
+```tsx
+{submitted ? (
+
+
Message sent
+
+ Thanks, {name.trim()}. We'll reply at{' '}
+ {email.trim()} within one business day.
+
+
+
+) : (
+
+)}
+```
+
+A few teaching moments here:
+
+- **The success state is a single bordered region**, not a confetti card with a check icon. PRODUCT.md's tone is "competent, no fluff."
+- **It echoes the user's name and email back** so they know the right address received their message. This is a small touch that builds trust.
+- **There's a "Send another message" affordance** that resets the form. Don't trap users in success. Give them a way back.
+
+Save. Submit the form. The fields disappear and the confirmation appears. Click "Send another message" and you're back to the empty form.
+
+---
+
+## Step 7: Wire it to a real API endpoint
+
+So far the submit is a mock 600ms delay. To make it real, we need three things: an API endpoint, a frontend client function, and updated error handling.
+
+The backend endpoint setup is its own tutorial; for now we'll add the frontend client and call a not-yet-existing path, so the call fails gracefully with a toast. When the backend lands, you change one line of your client and you're done.
+
+Create `frontend/src/api/contact.ts`:
+
+```ts
+import { apiClient } from './client'
+
+export const contactApi = {
+ submit: (data: { name: string; email: string; message: string }) =>
+ apiClient.post('/contact', data).then((r) => r.data),
+}
+```
+
+That's the whole pattern. `apiClient` is a pre-configured Axios instance from `frontend/src/api/client.ts` with the base URL, auth, and error interceptors already wired. Every API module in `frontend/src/api/` follows this same shape. Read `frontend/src/api/betaFeedback.ts` to see another minimal example.
+
+Now in `ContactPage.tsx`, swap the mock for a real call. Add to imports:
+
+```tsx
+import { contactApi } from '@/api/contact'
+import { toast } from '@/lib/toast'
+```
+
+Update the submit handler:
+
+```tsx
+const handleSubmit = async (e: React.FormEvent) => {
+ e.preventDefault()
+ if (!name.trim() || !email.trim() || !message.trim()) return
+
+ setIsSubmitting(true)
+ try {
+ await contactApi.submit({
+ name: name.trim(),
+ email: email.trim(),
+ message: message.trim(),
+ })
+ setSubmitted(true)
+ } catch (err) {
+ console.error('Failed to send contact message:', err)
+ toast.error("We couldn't send your message. Please try again.")
+ } finally {
+ setIsSubmitting(false)
+ }
+}
+```
+
+What this gets you:
+
+- Backend errors (500, network failure, etc.) show a toast and keep the form filled. The user can retry without retyping.
+- The success path only fires if the API call succeeds, with no false positives.
+- `toast` comes from `@/lib/toast`, the project's wrapper around Sonner. It's themed and consistent with every other toast in the app.
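The success and failure paths can be checked without a browser by injecting the collaborators. The sketch below mirrors the handler's try/catch/finally shape under that assumption; `submitContact` and `SubmitDeps` are hypothetical names standing in for `contactApi.submit`, `toast.error`, and the two state setters.

```typescript
interface ContactPayload {
  name: string
  email: string
  message: string
}

// Hypothetical dependency bundle standing in for contactApi, toast, and the setState calls.
interface SubmitDeps {
  submit: (data: ContactPayload) => Promise<unknown>
  notifyError: (text: string) => void
  setSubmitting: (value: boolean) => void
  setSubmitted: (value: boolean) => void
}

async function submitContact(deps: SubmitDeps, data: ContactPayload): Promise<void> {
  deps.setSubmitting(true)
  try {
    await deps.submit(data)
    deps.setSubmitted(true) // reached only when the request resolves
  } catch {
    deps.notifyError("We couldn't send your message. Please try again.")
  } finally {
    deps.setSubmitting(false) // runs on both paths
  }
}

// Record what happens when the endpoint does not exist yet.
const events: string[] = []
const failing: SubmitDeps = {
  submit: () => Promise.reject(new Error('404')),
  notifyError: () => events.push('toast'),
  setSubmitting: (v) => events.push(`submitting:${v}`),
  setSubmitted: (v) => events.push(`submitted:${v}`),
}
const run = submitContact(failing, { name: 'Ada', email: 'a@b.co', message: 'hi' })
run.then(() => console.log(events.join(' '))) // prints "submitting:true toast submitting:false"
```

No `submitted:true` event appears on the failing path, which is the "no false positives" property described above. The form values are never cleared, so the user can retry without retyping.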
+
+Save. Submit the form. Because there's no `/contact` backend endpoint yet, the call will fail and you'll see an error toast. That's correct behavior. The frontend is doing exactly what it should. When someone implements the backend, the only frontend change that might be needed is the path in `contact.ts`.
+
+---
+
+## Step 8: Link from the landing page
+
+A page that nobody can reach isn't a page. Open `frontend/src/pages/LandingPage.tsx` and find the `