# ResolutionFlow Performance Health Check

**Purpose:** Verify application performance and scalability before and during beta testing

**When to run:** Before beta launch, then monthly during the growth phase

**Time required:** 2-3 hours the first time, 30-60 minutes for routine checks

---

## Prerequisites

- [ ] Docker Desktop running
- [ ] Access to Railway dashboard
- [ ] VS Code open with ResolutionFlow project
- [ ] Python virtual environment activated
- [ ] Node.js installed (for the frontend build in Section 2.2)
- [ ] k6 installed (Section 5.1; k6 is a standalone binary and does not need Node.js)

---

## 1. Database Performance Check

### 1.1 Verify Indexes Exist

**Why:** Indexes work like the index in a book. Without one, PostgreSQL has to scan every row of the table (a sequential scan), which gets slower as the table grows. With one, it can jump straight to the matching rows.

**Commands to run:**

```bash
# Connect to your Railway PostgreSQL database
# Get the connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL

# Option 1: Use the Railway CLI (use your database service's name as shown in the project)
railway connect PostgreSQL

# Option 2: Use psql directly
psql "your-database-url-here"
```

**Once connected, run:**

```sql
-- List every index in the public schema
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```

**What you're looking for:**

✅ **GOOD:** You should see indexes on:
- `users.email` (for login lookups)
- `users.username` (for login lookups)
- `trees.created_by` (for "my trees" queries)
- `tree_nodes.tree_id` (for loading tree structure)
- `sessions.tree_id` (for session lookups)

Columns declared `UNIQUE` or as a primary key already have an index that PostgreSQL created automatically (for example `users_email_key`), so `users.email` and `users.username` are likely covered if they are unique.

❌ **BAD:** If any of these are missing, the related queries will slow down as data grows.

**Fix if needed:**

```sql
-- Add any missing indexes.
-- CONCURRENTLY avoids blocking writes to the table while the index builds,
-- but it cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_trees_created_by ON trees(created_by);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tree_nodes_tree_id ON tree_nodes(tree_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_sessions_tree_id ON sessions(tree_id);
```

### 1.2 Test Query Performance

**Run realistic queries and time them:**

```sql
-- psql meta-command: toggles printing the elapsed time after each query.
-- The time includes the network round trip to Railway; EXPLAIN ANALYZE shows server-side time only.
\timing

-- Test: Full-text search on trees (simulates search bar)
-- coalesce() keeps trees with a NULL description searchable by name
SELECT *
FROM trees
WHERE to_tsvector('english', name || ' ' || coalesce(description, ''))
  @@ to_tsquery('english',
                'password');

-- Test: Load tree with all nodes (simulates opening tree editor)
SELECT tn.*
FROM tree_nodes tn
WHERE tn.tree_id = 1  -- Replace with actual tree ID
ORDER BY tn.position;

-- Test: User's tree list (simulates dashboard)
SELECT *
FROM trees
WHERE created_by = 1  -- Replace with actual user ID
ORDER BY updated_at DESC
LIMIT 20;
```

**Benchmarks:**
- ✅ **GOOD:** All queries < 50ms
- ⚠️ **WARNING:** Any query 50-200ms (optimize later)
- ❌ **BAD:** Any query > 200ms (optimize now)

If a query is slow, prefix it with `EXPLAIN ANALYZE` to see whether PostgreSQL uses an index or scans the whole table (Section 8 has an example).

### 1.3 Check Database Size

```sql
-- Total size of each table in the public schema (including its indexes and TOAST data)
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC;
```

**What this tells you:** If tables grow unexpectedly large between runs, you may have table bloat (dead rows that vacuum has not reclaimed yet) or missing cleanup logic (for example, old records that are never deleted).

---

## 2. Frontend Performance Check

### 2.1 Test Large Tree Rendering

**Create a "stress test" tree:**
1. Log into the ResolutionFlow frontend
2. Create a new tree called "Performance Test - Large Tree"
3. Add 50-100 nodes (use copy/paste to speed this up)
4. Save the tree

**What to watch:**
- Does the editor lag when adding nodes?
- Does scrolling feel smooth?
- Does saving take more than 2-3 seconds?

**Tools to use:** Open Chrome DevTools (F12):

```
1. Go to the Performance tab
2. Click Record (the circle button)
3. Interact with the large tree (scroll, add nodes, expand/collapse)
4. Stop recording
5. Look for long tasks in the Main track (Chrome marks them in red); these are blocking, slow operations
```

**Benchmarks:**
- ✅ **GOOD:** No operation blocks the main thread for > 100ms
- ⚠️ **WARNING:** Some operations take 100-300ms
- ❌ **BAD:** Operations > 300ms (users will notice lag)

### 2.2 Check Bundle Size

**Why:** Large JavaScript bundles mean a slow initial page load.

```bash
# From the repository root
cd frontend
npm run build
# The build output lists bundle sizes; most React toolchains also show the gzipped size
```

**Benchmarks:**
- ✅ **GOOD:** Main bundle < 500KB gzipped
- ⚠️ **WARNING:** 500KB - 1MB
- ❌ **BAD:** > 1MB (investigate what's bloating it)

### 2.3 Lighthouse Audit

**Chrome has this built in:**

```
1. Open ResolutionFlow in Chrome
2. F12 → Lighthouse tab
3. Select "Desktop" + "Performance"
4. Click "Analyze page load"
```

**Benchmarks:**
- ✅ **GOOD:** Performance score > 80
- ⚠️ **WARNING:** 60-80
- ❌ **BAD:** < 60

**Common issues and fixes:**
- "Eliminate render-blocking resources" → defer non-critical scripts and stylesheets
- "Reduce unused JavaScript" → code splitting (for example, lazy-load routes with `React.lazy`)
- "Serve images in next-gen formats" → use WebP or AVIF instead of PNG/JPEG

---

## 3. API Response Time Check

### 3.1 Manual Timing Test

**Use Railway logs:**

```
1. Go to Railway dashboard → API service → Deployments
2. Click "View Logs"
3. Perform actions in the ResolutionFlow frontend
4. Watch the logs for response times
```

Uvicorn (the server running FastAPI) writes access log lines like:

```
INFO:     127.0.0.1:54321 - "GET /api/trees HTTP/1.1" 200 OK
```

By default these lines do not include a response time. To see durations in the logs, add request-timing middleware to the API that logs how long each request took, or measure from the client side with the script in Section 3.2.

**Benchmarks:**
- ✅ **GOOD:** Most endpoints < 100ms
- ⚠️ **WARNING:** Some endpoints 100-300ms
- ❌ **BAD:** Any endpoint > 500ms

### 3.2 Automated API Testing

**Create a simple test script:**

```python
# File: tests/performance_test.py
import time
from statistics import mean

import httpx

API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login; never commit a real token
RUNS = 5  # Requests per endpoint

# (method, path, extra request kwargs)
TESTS = [
    ("GET", "/api/trees", {}),
    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
    ("GET", "/api/trees/1/nodes", {}),
    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
]


def time_endpoint(client, method, path, **kwargs):
    """Time a single API request; returns (milliseconds, status code)."""
    start = time.perf_counter()
    response = client.request(method, path, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
    return elapsed, response.status_code


def main():
    headers = {"Authorization": f"Bearer {TOKEN}"}
    # One client reuses the connection, so later runs measure the API rather than TLS handshakes
    with httpx.Client(base_url=API_BASE, headers=headers, timeout=30.0) as client:
        print("API Performance Test Results:")
        print("-" * 50)
        for method, path, kwargs in TESTS:
            times, statuses = [], set()
            for _ in range(RUNS):
                elapsed, status = time_endpoint(client, method, path, **kwargs)
                times.append(elapsed)
                statuses.add(status)
            print(f"{method} {path}  (status codes: {sorted(statuses)})")
            print(f"  Average: {mean(times):.2f}ms")
            print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
            print()


if __name__ == "__main__":
    main()
```

**Run it:**

```bash
python tests/performance_test.py
```

---

## 4. Monitoring Setup

### 4.1 Railway Built-in Monitoring

**What Railway gives you for free:**

```
1. Go to the Railway dashboard
2. Click each service (API, Frontend, PostgreSQL)
3. Go to the "Metrics" tab
```

**Watch for:**
- CPU usage spikes (should normally stay < 50%)
- Memory usage that keeps growing over time (a possible memory leak)
- Request rate (to see usage patterns)

**Set up alerts:**

```
1. Railway dashboard → Project Settings → Notifications
2. Add your email
3. Enable "Deployment Failed" and "Service Crashed"
```

### 4.2 Sentry Error Tracking (Recommended)

**Why add Sentry:**
- The free tier includes 5,000 errors/month (check current pricing)
- Email alerts when things break
- Breadcrumbs showing the user's actions leading up to a crash
- Industry standard (a future dev team will expect it)

**Setup (about 5 minutes):**

**Backend (FastAPI):**

```bash
# Quotes stop shells like zsh from treating the brackets as a glob pattern
pip install "sentry-sdk[fastapi]"
```

```python
# File: main.py (add at the top, before the FastAPI app is created)
import os

import sentry_sdk

sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),  # Set in Railway variables; no events are sent if unset
    traces_sample_rate=0.1,  # Trace 10% of requests (free tier friendly)
    environment="production",
)
```

**Frontend (React):**

```bash
npm install @sentry/react
```

```javascript
// File: src/index.js (add at the top)
import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: "your-sentry-dsn-here", // Frontend DSNs are public, so shipping this in the bundle is fine
  // Sentry SDK v8+ API; older SDKs used `new Sentry.BrowserTracing()`
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.1,
  environment: "production",
});
```

**Get your DSN:**

```
1. Sign up at sentry.io (free)
2. Create two projects: one "FastAPI" and one "React" (each gets its own DSN)
3. Copy each DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
4. Add the FastAPI DSN to the API service's Railway environment variables:
   - SENTRY_DSN=your-dsn-here
```

**What you get:**
- Email when errors occur
- Stack traces showing exactly what broke
- Session Replay on the frontend (see what users clicked before a crash; this requires enabling Sentry's replay integration, which the snippet above does not do)
- Performance monitoring (slow API calls show up in traced transactions)

---

## 5. Load Testing with k6

**Why k6:**
- Industry standard (maintained by Grafana Labs)
- Measures how response times and error rates hold up as concurrent users increase, so you can see where your app starts to degrade
- Simple JavaScript syntax
- Free and open source

### 5.1 Install k6

**Windows (using Chocolatey):**

```powershell
choco install k6
```

**Or download directly:**
- Go to: https://k6.io/docs/get-started/installation/
- Download the Windows installer
- Run the installer

**Verify:**

```bash
k6 version
```

### 5.2 Create Load Test Script

**File: `tests/load_test.js`**

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  stages: [
    { duration: '30s', target: 10 }, // Ramp up to 10 users over 30s
    { duration: '1m', target: 10 },  // Stay at 10 users for 1 minute
    { duration: '30s', target: 20 }, // Ramp up to 20 users
    { duration: '1m', target: 20 },  // Stay at 20 users for 1 minute
    { duration: '30s', target: 0 },  // Ramp down to 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must complete in under 500ms
    http_req_failed: ['rate<0.01'],   // Less than 1% of requests can fail
  },
};

const BASE_URL = 'https://api.resolutionflow.com';

// Setup: runs ONCE before the test starts; every virtual user shares the returned token
export function setup() {
  const loginRes = http.post(
    `${BASE_URL}/api/auth/login`,
    JSON.stringify({
      username: 'test_user', // Replace with a test account
      password: 'test_password',
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  // Abort the run instead of load testing with an invalid token
  if (loginRes.status !== 200) {
    throw new Error(`Login failed with status ${loginRes.status}`);
  }
  return { token: loginRes.json('access_token') };
}

// Main test: each virtual user loops through this realistic user flow
export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Scenario 1: Load dashboard (get trees list)
  let res = http.get(`${BASE_URL}/api/trees`, { headers });
  check(res, {
    'dashboard loaded': (r) => r.status === 200,
    'dashboard fast': (r) => r.timings.duration < 300,
  });
  sleep(1); // User reads for 1 second

  // Scenario 2: Open a tree
  res =
    http.get(`${BASE_URL}/api/trees/1`, { headers }); // Replace with real tree ID
  check(res, {
    'tree loaded': (r) => r.status === 200,
    'tree load fast': (r) => r.timings.duration < 500,
  });
  sleep(2); // User reads tree for 2 seconds

  // Scenario 3: Load tree nodes
  res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
  check(res, {
    'nodes loaded': (r) => r.status === 200,
    'nodes fast': (r) => r.timings.duration < 500,
  });
  sleep(1);

  // Scenario 4: Search trees
  res = http.post(
    `${BASE_URL}/api/trees/search`,
    JSON.stringify({ query: 'password reset' }),
    { headers }
  );
  check(res, {
    'search worked': (r) => r.status === 200,
    'search fast': (r) => r.timings.duration < 400,
  });
  sleep(2);
}
```

### 5.3 Run Load Test

**Basic test (uses the script's stages, ramping up to 20 users):**

```bash
k6 run tests/load_test.js
```

**Aggressive test (50 users):**

```bash
# --vus/--duration take precedence over the script's stages: a constant 50 users for 2 minutes
k6 run --vus 50 --duration 2m tests/load_test.js
```

**What the output means (illustrative numbers):**

```
✓ dashboard loaded
✓ dashboard fast

checks.........................: 94.05% ✓ 1234 ✗ 78
data_received..................: 1.2 MB 20 kB/s
data_sent......................: 456 kB 7.6 kB/s
http_req_blocked...............: avg=1.2ms min=0s med=0s max=45ms p(90)=0s p(95)=0s
http_req_duration..............: avg=142ms min=23ms med=98ms max=1.2s p(90)=245ms p(95)=387ms
http_reqs......................: 1234 20.5/s
```

**How to read this:**
- `checks`: percentage of `check()` assertions that passed (want > 95%)
- `http_req_duration p(95)`: 95% of requests were faster than this (want < 500ms)
- `http_reqs`: total requests, and the requests per second your app handled
- `http_req_failed`: percentage of requests that failed, meaning network errors or, by default, HTTP status codes outside 200-399 (want < 1%)

### 5.4 Interpret Results

**✅ GOOD (Ready for beta):**

```
http_req_duration p(95) < 500ms
http_req_failed < 1%
All checks passing > 95%
```

**⚠️ WARNING (Watch closely during beta):**

```
http_req_duration p(95) 500-1000ms
http_req_failed 1-5%
Some checks failing
```

**❌ BAD (Fix before beta launch):**

```
http_req_duration p(95) > 1000ms
http_req_failed > 5%
Lots of timeouts or 500 errors
```

---

## 6. Pre-Launch Checklist

Run this checklist **before** inviting beta testers:

### Database
- [ ] All critical indexes exist (Section 1.1)
- [ ] Query performance < 200ms (Section 1.2)
- [ ] No unexplained table bloat (Section 1.3)

### Frontend
- [ ] Large tree (100 nodes) renders without lag (Section 2.1)
- [ ] Bundle size < 1MB (Section 2.2)
- [ ] Lighthouse score > 70 (Section 2.3)

### API
- [ ] All endpoints < 500ms under load (Sections 3 and 5)
- [ ] Railway logs show no errors (Section 3.1)

### Monitoring
- [ ] Railway alerts configured (Section 4.1)
- [ ] Sentry installed (optional but recommended) (Section 4.2)

### Load Testing
- [ ] k6 test passes with 20 concurrent users (Section 5.3)
- [ ] No request failures during load test (Section 5.4)

---

## 7. Monthly Health Check (After Launch)

Once live with beta testers, run this monthly:

**Quick version (30 minutes):**

```bash
# 1. Check Railway metrics
#    Look for: CPU/memory trends, error rate spikes

# 2. Review Sentry errors (if installed)
#    Look for: new error patterns, increasing error rates

# 3. Run quick load test
k6 run tests/load_test.js

# 4. Check database query times
#    Run the queries from Section 1.2 and watch for slowdowns
```

**When to do a deep dive:**
- After adding major new features
- If users report slowness
- Before scaling to new MSP clients
- Every 3 months minimum

---

## 8. Common Performance Issues & Fixes

### Issue: "Search is slow"

**Diagnosis:**

```sql
-- "Seq Scan on trees" in the plan means PostgreSQL is reading every row
EXPLAIN ANALYZE
SELECT *
FROM trees
WHERE to_tsvector('english', name || ' ' || coalesce(description, ''))
  @@ to_tsquery('english', 'password');
```

**Fix:** Add a GIN index on the same expression the query uses (PostgreSQL only uses an expression index when the query's expression matches it exactly). On a small table PostgreSQL may still prefer a sequential scan, which is fine.

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_trees_fts ON trees
USING GIN (to_tsvector('english', name || ' ' || coalesce(description, '')));
```

### Issue: "Loading tree nodes is slow"

**Diagnosis:** Missing index on the `tree_id` foreign key (PostgreSQL does not index foreign key columns automatically)

**Fix:**

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tree_nodes_tree_id ON tree_nodes(tree_id);
```

### Issue: "Dashboard takes forever to load"

**Diagnosis:** Fetching too much data at once

**Fix:** Add pagination to the API:

```sql
-- Instead of:
SELECT * FROM trees;

-- Use (ORDER BY keeps pages stable between requests):
SELECT * FROM trees
ORDER BY updated_at DESC
LIMIT 20 OFFSET 0;
```

### Issue: "Frontend feels sluggish"

**Diagnosis:** Components re-rendering too often (the React DevTools Profiler shows which ones)

**Fix:** Wrap expensive components in `React.memo()`, and give `useEffect` accurate dependency arrays so effects don't re-run on every render

### Issue: "API crashes under load"

**Diagnosis:** The service is running out of Railway resources (check the Metrics tab from Section 4.1 for memory hitting its limit around the time of the crash)

**Fix:**

```
1. Railway dashboard → API service → Settings
2. Increase the memory limit (for example from 512MB to 1GB; available limits depend on your Railway plan)
3. Enable auto-scaling if needed
```

---

## Resources

**Tools mentioned:**
- k6: https://k6.io/docs/
- Sentry: https://sentry.io/
- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
- Chrome Lighthouse: Built into Chrome DevTools (F12)

**When to get help:**
- k6 test failing badly (> 10% error rate)
- Database queries consistently > 1 second
- Sentry showing critical errors
- Railway CPU/memory maxing out

**Next steps after this checklist:**
- If all checks pass → Launch beta confidently
- If warnings are found → Document them and monitor during beta
- If critical issues are found → Fix them before launch, then re-run the tests
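To make the go/no-go call above repeatable, the Section 5.4 load-test thresholds can be written down as a small function. This is a minimal sketch, not part of the ResolutionFlow codebase: the name `classify_load_test` is hypothetical, and it assumes you copy three numbers from the k6 summary by hand (`http_req_duration` p(95) in milliseconds, plus the `http_req_failed` and `checks` rates as fractions).

```python
# Hypothetical helper (not part of ResolutionFlow): encodes the Section 5.4 thresholds.


def classify_load_test(p95_ms: float, failed_rate: float, checks_rate: float) -> str:
    """Return "GOOD", "WARNING", or "BAD" for one k6 run.

    p95_ms:      http_req_duration p(95), in milliseconds
    failed_rate: http_req_failed as a fraction (0.02 means 2%)
    checks_rate: share of checks that passed, as a fraction (0.95 means 95%)
    """
    # BAD: p95 over 1000ms or more than 5% of requests failing -> fix before launch
    if p95_ms > 1000 or failed_rate > 0.05:
        return "BAD"
    # GOOD: p95 under 500ms, under 1% failures, and more than 95% of checks passing
    if p95_ms < 500 and failed_rate < 0.01 and checks_rate > 0.95:
        return "GOOD"
    # Everything in between -> watch closely during beta
    return "WARNING"


if __name__ == "__main__":
    # Example inputs only; replace them with the numbers from your own k6 run
    print(classify_load_test(p95_ms=387, failed_rate=0.004, checks_rate=0.97))
```

The BAD check runs first so a single red metric can't be masked by two green ones. If you change the thresholds in Section 5.4, update this function to match.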