Updated documentation; added PERFORMANCE-HEALTH-CHECK.md

Michael Chihlas
2026-02-04 21:46:32 -05:00
parent 2733a00253
commit d7c5c8c9ce
2 changed files with 697 additions and 3 deletions


@@ -91,6 +91,20 @@ When adding new frontend pages or components, use "ResolutionFlow" for any user-
- Purple gradient theme, custom fonts (Plus Jakarta Sans, Inter, Outfit)
- Custom SVG logo in header and auth pages
- Updated favicon and browser tab title
- **Token Refresh Fix:**
- Silent refresh with single-flight queue (prevents concurrent 401 race conditions)
- Backend `get_refresh_token_payload` dependency extracts refresh token from Authorization header
- Frontend Axios interceptor queues failed requests during refresh, retries after success
- Auth store synced after silent refresh via `setTokens` action
- **Session Scratchpad (Floating Overlay):**
- Fixed-position overlay panel (420px wide, 55vh tall) on right edge
- Floating button when collapsed, slide-in panel when expanded
- Ctrl+/ keyboard shortcut to toggle
- Auto-save with 1s debounce, markdown preview, localStorage persistence
- Main content adjusts width via padding transition when panel opens
- **Global Thin Scrollbar Styling:**
- 6px thin scrollbars site-wide (Firefox `scrollbar-width: thin` + WebKit pseudo-elements)
- Theme-aware colors using CSS variables (`--border`, `--muted-foreground`)
### What's In Progress
@@ -180,7 +194,7 @@ patherly/
│ │ ├── router.tsx
│ │ ├── assets/brand/ # Brand logos (SVG)
│ │ ├── api/ # Axios API client
│ │ │ ├── client.ts # Axios instance with interceptors
│ │ │ ├── client.ts # Axios instance with refresh queue interceptor
│ │ │ ├── auth.ts
│ │ │ ├── trees.ts
│ │ │ └── sessions.ts
@@ -196,7 +210,7 @@ patherly/
│ │ │ ├── tree-editor/ # Tree editor components
│ │ │ ├── tree-preview/ # Visual tree preview
│ │ │ ├── step-library/ # Step library browser, forms, modals
│ │ │ ├── session/ # Session modals, scratchpad sidebar
│ │ │ ├── session/ # Session modals, scratchpad floating overlay
│ │ │ └── ui/ # MarkdownContent
│ │ ├── pages/
│ │ │ ├── LoginPage.tsx
@@ -508,6 +522,33 @@ Key state: `pendingStep`, `pendingContinuationNodeId`, `customBranchMode`, `bran
Custom steps are stored in session JSONB (`custom_steps` field) and referenced by UUID in `pathTaken`.
`findNode()` only searches tree structure -- use `findCustomStep()` for custom step UUIDs.
### Token Refresh: Match Frontend/Backend Contract
The refresh endpoint must accept tokens the same way the frontend sends them.
```python
# WRONG - Expects bare string, but frontend sends Authorization header
@router.post("/refresh")
async def refresh_token(refresh_token: str):
    payload = decode_token(refresh_token)

# CORRECT - Use dependency that reads from Authorization header
@router.post("/refresh")
async def refresh_token(
    payload: Annotated[dict, Depends(get_refresh_token_payload)],
):
    ...
```
The frontend Axios interceptor sends `Authorization: Bearer <refresh_token>`. The backend must extract it from the header, not expect it as a query/body parameter.
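As a rough illustration of the header-based contract, the extraction step can be sketched in plain Python. The `extract_bearer_token` helper below is hypothetical (it is not the project's `get_refresh_token_payload`), and it only parses the header value; decoding and validating the token is out of scope here.

```python
from typing import Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value.

    Hypothetical helper for illustration only. Returns None when the header
    is missing, uses another scheme, or carries an empty token.
    """
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()
```

A dependency built on something like this would reject the request (for example with a 401) whenever the helper returns `None`.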
### CORS Errors Can Mask Server 500s
When the backend returns a 500 Internal Server Error, CORS headers are not added to the response. The browser reports this as a CORS error, hiding the real cause. Always check backend logs first when debugging CORS issues locally.
### Run Migrations Before Local Testing
After cloning or pulling new changes, always run `alembic upgrade head` before starting the backend. Missing migrations cause 500 errors (e.g., `column does not exist`) that manifest as CORS errors in the browser.
---
## API Endpoints Reference
@@ -586,7 +627,7 @@ interface Decision {
### State Management
- **Auth:** `useAuthStore` - Zustand with localStorage persistence
- **Auth:** `useAuthStore` - Zustand with localStorage persistence (includes `setTokens` for silent refresh sync)
- **Theme:** `useThemeStore` - Dark/light/system preference
- **Tree Editor:** `useTreeEditorStore` - Zustand + immer + zundo (undo/redo)
- **User Preferences:** `useUserPreferencesStore` - Zustand with localStorage persistence (export format default)
@@ -612,9 +653,28 @@ interface Decision {
import api from '@/api/client'
// Token refresh handled automatically by interceptor
// Concurrent 401s are queued — only one refresh request fires at a time
// On refresh failure, user is logged out and redirected to /login
const response = await api.get('/api/v1/trees')
```
### Floating Overlay Pattern (Scratchpad)
The scratchpad uses `position: fixed` with an `onOpenChange` callback so the parent page can adjust layout:
```tsx
// Child: ScratchpadSidebar.tsx
onOpenChange?: (isOpen: boolean) => void
// Fires when collapsed state changes, parent uses it to add/remove padding
// Parent: TreeNavigationPage.tsx
const [scratchpadOpen, setScratchpadOpen] = useState(...)
<div className={cn('...', scratchpadOpen && 'pr-[440px]')}>
<div className="mx-auto max-w-4xl"> {/* centers in available space */}
```
Position overlay at `right-2` (not `right-0`) so it sits inside the page scrollbar, and use full `rounded-lg` (not `rounded-l-lg`).
---
## Common Tasks


@@ -0,0 +1,634 @@
# ResolutionFlow Performance Health Check
**Purpose:** Verify application performance and scalability before/during beta testing
**When to run:** Before beta launch, then monthly during growth phase
**Time required:** 2-3 hours first time, 30-60 minutes for routine checks
---
## Prerequisites
- [ ] Docker Desktop running
- [ ] Access to Railway dashboard
- [ ] VS Code open with ResolutionFlow project
- [ ] Python virtual environment activated
- [ ] Node.js installed (for the frontend build; k6 is a standalone binary, installed in Section 5.1)
---
## 1. Database Performance Check
### 1.1 Verify Indexes Exist
**Why:** Indexes work like the index in a book: without them, PostgreSQL scans every row, which gets slower as tables grow. With them, lookups stay fast even on large tables.
**Commands to run:**
```bash
# Connect to your Railway PostgreSQL database
# Get connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL
# Option 1: Use Railway CLI
railway connect PostgreSQL
# Option 2: Use psql directly
psql "your-database-url-here"
```
**Once connected, run:**
```sql
-- Check what indexes exist
SELECT
tablename,
indexname,
indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```
**What you're looking for:**
**GOOD:** You should see indexes on:
- `users.email` (for login lookups)
- `users.username` (for login lookups)
- `trees.created_by` (for "my trees" queries)
- `tree_nodes.tree_id` (for loading tree structure)
- `sessions.tree_id` (for session lookups)
**BAD:** If these are missing, queries will slow down as data grows
**Fix if needed:**
```sql
-- Example: Add missing index
CREATE INDEX idx_trees_created_by ON trees(created_by);
CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
CREATE INDEX idx_sessions_tree_id ON sessions(tree_id);
```
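If you would rather not eyeball the `pg_indexes` output, the comparison against the expected list can be scripted. This is a minimal sketch, assuming you have already fetched `(tablename, indexdef)` rows with any PostgreSQL client; `missing_indexes` and `EXPECTED_INDEXES` are illustrative names, and the check only looks at the leading column of each index.

```python
import re

# Columns the checklist above expects to be indexed, as (table, column) pairs.
EXPECTED_INDEXES = [
    ("users", "email"),
    ("users", "username"),
    ("trees", "created_by"),
    ("tree_nodes", "tree_id"),
    ("sessions", "tree_id"),
]


def missing_indexes(rows, expected=EXPECTED_INDEXES):
    """Return expected (table, column) pairs that no index leads with.

    `rows` is an iterable of (tablename, indexdef) tuples, as returned by
    the pg_indexes query in this section.
    """
    covered = set()
    for table, indexdef in rows:
        # indexdef looks like:
        #   CREATE INDEX name ON public.trees USING btree (created_by)
        match = re.search(r"\(([^)]*)\)", indexdef)
        if match:
            first = match.group(1).split(",")[0].strip()
            # Drop ordering keywords (e.g. "updated_at DESC") and quotes.
            covered.add((table, first.split(" ")[0].strip('"')))
    return [pair for pair in expected if pair not in covered]
```

An empty result means every expected column leads at least one index; anything returned is a candidate for a `CREATE INDEX` statement like the ones above.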
### 1.2 Test Query Performance
**Run realistic queries and time them:**
```sql
-- Enable timing
\timing
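-- Test: Full-text search on trees (simulates search bar)
-- Note: if description can be NULL, the concatenation is NULL and the row never matches;
-- wrap it as coalesce(description, '') in that case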
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
-- Test: Load tree with all nodes (simulates opening tree editor)
SELECT tn.*
FROM tree_nodes tn
WHERE tn.tree_id = 1 -- Replace with actual tree ID
ORDER BY tn.position;
-- Test: User's tree list (simulates dashboard)
SELECT * FROM trees
WHERE created_by = 1 -- Replace with actual user ID
ORDER BY updated_at DESC
LIMIT 20;
```
**Benchmarks:**
- ✅ **GOOD:** All queries < 50ms
- ⚠️ **WARNING:** Any query 50-200ms (optimize later)
- ❌ **BAD:** Any query > 200ms (optimize NOW)
### 1.3 Check Database Size
```sql
-- See how much data you have
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```
**What this tells you:** If tables are growing unexpectedly large, you might have data bloat or missing cleanup logic.
---
## 2. Frontend Performance Check
### 2.1 Test Large Tree Rendering
**Create a "stress test" tree:**
1. Log into ResolutionFlow frontend
2. Create a new tree called "Performance Test - Large Tree"
3. Add 50-100 nodes (use copy/paste to speed this up)
4. Save the tree
**What to watch:**
- Does the editor lag when adding nodes?
- Does scrolling feel smooth?
- Does saving take more than 2-3 seconds?
**Tools to use:**
Open Chrome DevTools (F12):
```
1. Go to Performance tab
2. Click Record (red circle)
3. Interact with large tree (scroll, add nodes, expand/collapse)
4. Stop recording
5. Look for red bars (blocking/slow operations)
```
**Benchmarks:**
- ✅ **GOOD:** No operations block for > 100ms
- ⚠️ **WARNING:** Some operations 100-300ms
- ❌ **BAD:** Operations > 300ms (users will notice lag)
### 2.2 Check Bundle Size
**Why:** Large JavaScript bundles = slow initial page load
```bash
# From your React frontend directory
cd frontend
npm run build
# Look at the output - it will show bundle sizes
```
**Benchmarks:**
- ✅ **GOOD:** Main bundle < 500KB gzipped
- ⚠️ **WARNING:** 500KB - 1MB
- ❌ **BAD:** > 1MB (investigate what's bloating it)
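Build tools do not always print gzipped sizes, so it can help to measure them directly. The sketch below is one way to do that with the standard library; the function names are illustrative, and the build output directory you pass in (for example `frontend/dist`) is an assumption about your setup.

```python
import gzip
from pathlib import Path


def gzipped_sizes(build_dir):
    """Return {relative_path: gzipped_size_in_bytes} for each .js file under build_dir."""
    root = Path(build_dir)
    sizes = {}
    for path in sorted(root.rglob("*.js")):
        sizes[str(path.relative_to(root))] = len(gzip.compress(path.read_bytes()))
    return sizes


def classify_bundle(size_bytes):
    """Map a gzipped size to the bands above (treating 1KB as 1024 bytes)."""
    if size_bytes < 500 * 1024:
        return "GOOD"
    if size_bytes <= 1024 * 1024:
        return "WARNING"
    return "BAD"


# Example usage (assumed output directory, adjust to your build):
#   for name, size in gzipped_sizes("frontend/dist").items():
#       print(f"{name}: {size / 1024:.1f} KB gzipped -> {classify_bundle(size)}")
```

The numbers will differ slightly from what a CDN serves, since compression level and algorithm (gzip vs. Brotli) vary, but they are close enough to place a bundle in the right band.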
### 2.3 Lighthouse Audit
**Chrome has this built-in:**
```
1. Open ResolutionFlow in Chrome
2. F12 → Lighthouse tab
3. Select "Desktop" + "Performance"
4. Click "Analyze page load"
```
**Benchmarks:**
- ✅ **GOOD:** Performance score > 80
- ⚠️ **WARNING:** 60-80
- ❌ **BAD:** < 60
**Common issues and fixes:**
- "Eliminate render-blocking resources" → lazy load components
- "Reduce unused JavaScript" → code splitting needed
- "Serve images in next-gen formats" → use WebP instead of PNG
---
## 3. API Response Time Check
### 3.1 Manual Timing Test
**Use Railway logs:**
```
1. Go to Railway dashboard → API service → Deployments
2. Click "View Logs"
3. Perform actions in ResolutionFlow frontend
4. Watch logs for response times
```
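Uvicorn's default access log does not include response times. With a timing middleware or a custom log format in place, FastAPI logs look like: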
```
INFO: 127.0.0.1 - "GET /api/trees HTTP/1.1" 200 OK [0.023s]
```
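One way to get a duration into each log line is a small ASGI middleware. This is a sketch rather than the project's actual logging setup: `TimingMiddleware` and the `request_timing` logger name are made up for illustration, and the log format only approximates the sample line above.

```python
import logging
import time

logger = logging.getLogger("request_timing")


class TimingMiddleware:
    """Minimal ASGI middleware that logs how long each HTTP request takes."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass websocket and lifespan traffic through untouched.
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = {"code": None}

        async def send_wrapper(message):
            # The status code arrives with the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed = time.perf_counter() - start
            logger.info(
                '"%s %s" %s [%.3fs]',
                scope["method"], scope["path"], status["code"], elapsed,
            )


# Usage (assumes `app` is your FastAPI instance):
#   app.add_middleware(TimingMiddleware)
```

Because it is written against the plain ASGI interface, the class has no framework imports; the `finally` block means a duration is logged even when the wrapped app raises.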
**Benchmarks:**
- ✅ **GOOD:** Most endpoints < 100ms
- ⚠️ **WARNING:** Some endpoints 100-300ms
- ❌ **BAD:** Any endpoint > 500ms
### 3.2 Automated API Testing
**Create a simple test script:**
```python
# File: tests/performance_test.py
import time
from statistics import mean

import httpx

API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login
headers = {"Authorization": f"Bearer {TOKEN}"}


def time_endpoint(method, path, **kwargs):
    """Time a single API request and return (elapsed_ms, status_code)."""
    start = time.perf_counter()
    response = httpx.request(method, f"{API_BASE}{path}", headers=headers, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
    return elapsed, response.status_code


# Test critical endpoints: (method, path, extra request kwargs)
# Adjust the paths to match your API prefix (e.g. /api/v1)
tests = [
    ("GET", "/api/trees", {}),
    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
    ("GET", "/api/trees/1/nodes", {}),
    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
]

print("API Performance Test Results:")
print("-" * 50)
for method, path, kwargs in tests:
    times = []
    statuses = set()
    for _ in range(5):  # Run each test 5 times
        elapsed, status = time_endpoint(method, path, **kwargs)
        times.append(elapsed)
        statuses.add(status)
    # Show status codes so a fast 401/404 is not mistaken for a good result
    print(f"{method} {path} (status: {sorted(statuses)})")
    print(f"  Average: {mean(times):.2f}ms")
    print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
    print()
```
**Run it:**
```bash
python tests/performance_test.py
```
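Averages hide slow outliers, and the load-test thresholds later in this document are expressed as a 95th percentile. A small helper along these lines could be added to the script to report p95 as well; `p95` is an illustrative name, and with only 5 runs per endpoint the figure is rough, so consider raising the run count (say, to 20 or more) before trusting it.

```python
from statistics import quantiles


def p95(samples_ms):
    """Return the 95th percentile of a list of timings in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples_ms, n=100, method="inclusive")[94]
```

The `inclusive` method interpolates between observed values and never reports a number above the slowest sample, which keeps small sample sets from producing inflated percentiles.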
---
## 4. Monitoring Setup
### 4.1 Railway Built-in Monitoring
**What Railway gives you for free:**
```
1. Go to Railway dashboard
2. Click each service (API, Frontend, PostgreSQL)
3. Go to "Metrics" tab
```
**Watch for:**
- CPU usage spikes (should stay < 50% normally)
- Memory usage growing over time (memory leak indicator)
- Request rate (see usage patterns)
**Set up alerts:**
```
1. Railway dashboard → Project Settings → Notifications
2. Add your email
3. Enable "Deployment Failed" and "Service Crashed"
```
### 4.2 Sentry Error Tracking (Recommended)
**Why add Sentry:**
- Free tier = 5,000 errors/month
- Email alerts when things break
- See exact user actions before crash
- Industry standard (your future dev team will expect this)
**Setup (5 minutes):**
**Backend (FastAPI):**
```bash
pip install "sentry-sdk[fastapi]"  # quotes keep shells like zsh from expanding the brackets
```
```python
# File: main.py (add at the top)
import os

import sentry_sdk

sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),  # Set in Railway variables; get the value from sentry.io after signup
    traces_sample_rate=0.1,  # 10% of requests (free tier friendly)
    environment="production",
)
```
**Frontend (React):**
```bash
npm install @sentry/react
```
```javascript
// File: src/index.js (add at the top)
import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: "your-sentry-dsn-here",
  // @sentry/react v8+ uses integration functions; `new Sentry.BrowserTracing()` was removed
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.1,
  environment: "production",
});
```
**Get your DSN:**
```
1. Sign up at sentry.io (free)
2. Create a project for each platform ("FastAPI" and "React"); each project gets its own DSN
3. Copy the DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
4. Add to Railway environment variables:
- SENTRY_DSN=your-dsn-here
```
**What you get:**
- Email when errors occur
- Stack traces showing exactly what broke
- User session replay (see what they clicked before crash; requires enabling the Replay integration in the frontend SDK)
- Performance monitoring (slow API calls flagged automatically)
---
## 5. Load Testing with k6
**Why k6:**
- Industry standard (Grafana Labs maintains it)
- Shows you EXACTLY how many concurrent users your app can handle
- Simple JavaScript syntax
- Free and open source
### 5.1 Install k6
**Windows (using Chocolatey):**
```powershell
choco install k6
```
**Or download directly:**
- Go to: https://k6.io/docs/get-started/installation/
- Download Windows installer
- Run installer
**Verify:**
```bash
k6 version
```
### 5.2 Create Load Test Script
**File: `tests/load_test.js`**
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  stages: [
    { duration: '30s', target: 10 }, // Ramp up to 10 users over 30s
    { duration: '1m', target: 10 },  // Stay at 10 users for 1 minute
    { duration: '30s', target: 20 }, // Ramp up to 20 users
    { duration: '1m', target: 20 },  // Stay at 20 users for 1 minute
    { duration: '30s', target: 0 },  // Ramp down to 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must complete in 500ms
    http_req_failed: ['rate<0.01'],   // Less than 1% of requests can fail
  },
};

const BASE_URL = 'https://api.resolutionflow.com';

// Setup: runs once before the test starts; its return value is passed to every virtual user
export function setup() {
  const loginRes = http.post(
    `${BASE_URL}/api/auth/login`,
    JSON.stringify({
      username: 'test_user', // Replace with test account
      password: 'test_password',
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  // Fail visibly if login breaks, otherwise every later request returns 401
  check(loginRes, { 'login succeeded': (r) => r.status === 200 });
  return { token: loginRes.json('access_token') };
}

// Main test: Simulate realistic user behavior
export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Scenario 1: Load dashboard (get trees list)
  let res = http.get(`${BASE_URL}/api/trees`, { headers });
  check(res, {
    'dashboard loaded': (r) => r.status === 200,
    'dashboard fast': (r) => r.timings.duration < 300,
  });
  sleep(1); // User reads for 1 second

  // Scenario 2: Open a tree
  res = http.get(`${BASE_URL}/api/trees/1`, { headers }); // Replace with real tree ID
  check(res, {
    'tree loaded': (r) => r.status === 200,
    'tree load fast': (r) => r.timings.duration < 500,
  });
  sleep(2); // User reads tree for 2 seconds

  // Scenario 3: Load tree nodes
  res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
  check(res, {
    'nodes loaded': (r) => r.status === 200,
    'nodes fast': (r) => r.timings.duration < 500,
  });
  sleep(1);

  // Scenario 4: Search trees
  res = http.post(
    `${BASE_URL}/api/trees/search`,
    JSON.stringify({ query: 'password reset' }),
    { headers }
  );
  check(res, {
    'search worked': (r) => r.status === 200,
    'search fast': (r) => r.timings.duration < 400,
  });
  sleep(2);
}
```
### 5.3 Run Load Test
**Basic test (follows `options.stages`, peaking at 20 users):**
```bash
k6 run tests/load_test.js
```
**Aggressive test (50 constant users for 2 minutes; the CLI flags take precedence over `options.stages`):**
```bash
k6 run --vus 50 --duration 2m tests/load_test.js
```
**What the output means:**
```
✓ dashboard loaded
✓ dashboard fast
checks.........................: 95.23% ✓ 1234 ✗ 78
data_received..................: 1.2 MB 20 kB/s
data_sent......................: 456 kB 7.6 kB/s
http_req_blocked...............: avg=1.2ms min=0s med=0s max=45ms p(90)=0s p(95)=0s
http_req_duration..............: avg=142ms min=23ms med=98ms max=1.2s p(90)=245ms p(95)=387ms
http_reqs......................: 1234 20.5/s
```
**How to read this:**
- `checks`: % of tests that passed (want > 95%)
- `http_req_duration p(95)`: 95% of requests faster than this (want < 500ms)
- `http_reqs`: Total requests sent, followed by throughput in requests per second
- `http_req_failed`: % of requests that errored (want < 1%)
### 5.4 Interpret Results
**✅ GOOD (Ready for beta):**
```
http_req_duration p(95) < 500ms
http_req_failed < 1%
All checks passing > 95%
```
**⚠️ WARNING (Watch closely during beta):**
```
http_req_duration p(95) 500-1000ms
http_req_failed 1-5%
Some checks failing
```
**❌ BAD (Fix before beta launch):**
```
http_req_duration p(95) > 1000ms
http_req_failed > 5%
Lots of timeouts or 500 errors
```
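If you record results from each run, the three bands above can be applied mechanically. This is a minimal sketch with an illustrative function name; it takes the two headline numbers from the k6 summary and ignores individual check failures.

```python
def verdict(p95_ms, failed_rate):
    """Classify a k6 run using the bands above.

    p95_ms: http_req_duration p(95), in milliseconds.
    failed_rate: http_req_failed as a fraction (0.01 means 1%).
    """
    if p95_ms > 1000 or failed_rate > 0.05:
        return "BAD"
    if p95_ms >= 500 or failed_rate >= 0.01:
        return "WARNING"
    return "GOOD"
```

For the sample output shown in Section 5.3 (p95 of 387ms), `verdict(387, 0.0)` falls in the GOOD band, assuming no failed requests.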
---
## 6. Pre-Launch Checklist
Run this checklist **before** inviting beta testers:
### Database
- [ ] All critical indexes exist (Section 1.1)
- [ ] Query performance < 200ms (Section 1.2)
- [ ] No unexplained table bloat (Section 1.3)
### Frontend
- [ ] Large tree (100 nodes) renders without lag (Section 2.1)
- [ ] Bundle size < 1MB (Section 2.2)
- [ ] Lighthouse score > 70 (Section 2.3)
### API
- [ ] All endpoints < 500ms under load (Section 3)
- [ ] Railway logs show no errors (Section 4.1)
### Monitoring
- [ ] Railway alerts configured (Section 4.1)
- [ ] Sentry installed (optional but recommended) (Section 4.2)
### Load Testing
- [ ] k6 test passes with 20 concurrent users (Section 5.3)
- [ ] No request failures during load test (Section 5.4)
---
## 7. Monthly Health Check (After Launch)
Once live with beta testers, run this monthly:
**Quick version (30 minutes):**
```bash
# 1. Check Railway metrics
# Look for: CPU/memory trends, error rate spikes
# 2. Review Sentry errors (if installed)
# Look for: New error patterns, increasing error rates
# 3. Run quick load test
k6 run tests/load_test.js
# 4. Check database query times
# Run queries from Section 1.2, watch for slowdowns
```
**When to do deep dive:**
- After adding major new features
- If users report slowness
- Before scaling to new MSP clients
- Every 3 months minimum
---
## 8. Common Performance Issues & Fixes
### Issue: "Search is slow"
**Diagnosis:**
```sql
EXPLAIN ANALYZE
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
```
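**Fix:** Add a GIN index on the same expression the query uses (PostgreSQL only uses an expression index when the query's expression matches it):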
```sql
CREATE INDEX idx_trees_fts ON trees USING GIN (to_tsvector('english', name || ' ' || description));
```
### Issue: "Loading tree nodes is slow"
**Diagnosis:** Missing index on foreign key
**Fix:**
```sql
CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
```
### Issue: "Dashboard takes forever to load"
**Diagnosis:** Fetching too much data
**Fix:** Add pagination to API:
```python
# Instead of: SELECT * FROM trees
# Use: SELECT * FROM trees ORDER BY updated_at DESC LIMIT 20 OFFSET 0
# (an ORDER BY keeps page contents stable between requests)
```
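On the API side, pagination usually means turning a page number from the query string into `LIMIT`/`OFFSET` values. A minimal sketch follows, with an illustrative helper name and an assumed cap of 100 rows per page so a client cannot request the whole table.

```python
def page_to_limit_offset(page, page_size=20, max_page_size=100):
    """Convert a 1-based page number into (limit, offset) for a SQL query."""
    page = max(1, page)  # Treat 0 or negative pages as the first page
    limit = max(1, min(page_size, max_page_size))
    return limit, (page - 1) * limit


# Example: page 3 at 20 rows per page
#   limit, offset = page_to_limit_offset(3)
#   SELECT * FROM trees ORDER BY updated_at DESC LIMIT 20 OFFSET 40
```

Pass the two values to the query as bound parameters rather than formatting them into the SQL string.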
### Issue: "Frontend feels sluggish"
**Diagnosis:** Re-rendering too often
**Fix:** Wrap expensive components in `React.memo()`, and give `useEffect`, `useMemo`, and `useCallback` accurate dependency arrays so they do not re-run on every render
### Issue: "API crashes under load"
**Diagnosis:** Not enough Railway resources
**Fix:**
```
1. Railway dashboard → API service → Settings
2. Increase memory limit (default is 512MB, try 1GB)
3. Enable auto-scaling if needed
```
---
## Resources
**Tools mentioned:**
- k6: https://k6.io/docs/
- Sentry: https://sentry.io/
- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
- Chrome Lighthouse: Built into Chrome DevTools (F12)
**When to get help:**
- k6 test failing badly (> 10% error rate)
- Database queries consistently > 1 second
- Sentry showing critical errors
- Railway CPU/memory maxing out
**Next steps after this checklist:**
- If all checks pass → Launch beta confidently
- If warnings found → Document them, monitor during beta
- If critical issues → Fix before launch, re-run tests