ResolutionFlow Performance Health Check
Purpose: Verify application performance and scalability before/during beta testing
When to run: Before beta launch, then monthly during growth phase
Time required: 2-3 hours first time, 30-60 minutes for routine checks
Prerequisites
- Docker Desktop running
- Access to Railway dashboard
- VS Code open with ResolutionFlow project
- Python virtual environment activated
- Node.js installed (for the frontend build; k6 is a standalone binary and does not need Node.js)
1. Database Performance Check
1.1 Verify Indexes Exist
Why: Indexes work like the index in a book. Without them, PostgreSQL must scan every row, which gets slower as tables grow. With them, it can jump straight to the matching rows.
Commands to run:
# Connect to your Railway PostgreSQL database
# Get connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL
# Option 1: Use Railway CLI
railway connect PostgreSQL
# Option 2: Use psql directly
psql "your-database-url-here"
Once connected, run:
-- Check what indexes exist
SELECT
tablename,
indexname,
indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
What you're looking for:
✅ GOOD: You should see indexes on:
- users.email (for login lookups)
- users.username (for login lookups)
- trees.created_by (for "my trees" queries)
- tree_nodes.tree_id (for loading tree structure)
- sessions.tree_id (for session lookups)
❌ BAD: If these are missing, queries will slow down as data grows
Fix if needed:
-- Example: Add missing index
CREATE INDEX idx_trees_created_by ON trees(created_by);
CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
CREATE INDEX idx_sessions_tree_id ON sessions(tree_id);
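If you prefer not to eyeball the pg_indexes output, the comparison can be scripted. The following is a minimal sketch, not part of ResolutionFlow: it assumes you have copied the indexdef column values into a Python list, and the helper name missing_indexes and the REQUIRED list are hypothetical. It only recognizes plain column indexes (an expression index such as lower(email) is not counted as covering email) and does not handle quoted identifiers.

```python
import re

# Hypothetical list of (table, leading column) pairs, copied from the list above.
REQUIRED = [
    ("users", "email"),
    ("users", "username"),
    ("trees", "created_by"),
    ("tree_nodes", "tree_id"),
    ("sessions", "tree_id"),
]

# Matches e.g. "CREATE INDEX idx ON public.trees USING btree (created_by)"
# and captures the table name and the first indexed column.
_INDEXDEF = re.compile(
    r"\bON\s+(?:ONLY\s+)?(?:\w+\.)?(\w+)\s+USING\s+\w+\s+\(\s*(\w+)"
)


def missing_indexes(indexdefs, required=REQUIRED):
    """Return required (table, column) pairs that no index leads with."""
    covered = set()
    for indexdef in indexdefs:
        match = _INDEXDEF.search(indexdef)
        if match:
            covered.add((match.group(1), match.group(2)))
    return [pair for pair in required if pair not in covered]


if __name__ == "__main__":
    # Sample indexdef values, made up for illustration.
    sample = [
        "CREATE UNIQUE INDEX users_email_key ON public.users USING btree (email)",
        "CREATE INDEX idx_trees_created_by ON public.trees USING btree (created_by)",
    ]
    for table, column in missing_indexes(sample):
        print(f"Missing index on {table}.{column}")
```

Anything the script reports is a candidate for a CREATE INDEX statement like the ones above.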
1.2 Test Query Performance
Run realistic queries and time them:
-- Enable timing
\timing
-- Test: Full-text search on trees (simulates search bar)
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
-- Test: Load tree with all nodes (simulates opening tree editor)
SELECT tn.*
FROM tree_nodes tn
WHERE tn.tree_id = 1 -- Replace with actual tree ID
ORDER BY tn.position;
-- Test: User's tree list (simulates dashboard)
SELECT * FROM trees
WHERE created_by = 1 -- Replace with actual user ID
ORDER BY updated_at DESC
LIMIT 20;
Benchmarks:
- ✅ GOOD: All queries < 50ms
- ⚠️ WARNING: Any query 50-200ms (optimize later)
- ❌ BAD: Any query > 200ms (optimize NOW)
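To apply these bands consistently, you can paste the psql output into a small script. This is a sketch under the assumption that \timing prints lines of the form "Time: 12.345 ms"; the function names are hypothetical, and a value of exactly 50 or 200 ms is treated as WARNING.

```python
import re

# psql's \timing prints lines such as "Time: 12.345 ms".
_TIMING = re.compile(r"^Time:\s+([0-9.]+)\s+ms", re.MULTILINE)


def classify_query_time(ms, warn_at=50.0, bad_at=200.0):
    """Map a query time in milliseconds onto the GOOD/WARNING/BAD bands."""
    if ms > bad_at:
        return "BAD"
    if ms >= warn_at:
        return "WARNING"
    return "GOOD"


def classify_psql_timings(output):
    """Extract every 'Time: N ms' line and classify it."""
    return [(float(ms), classify_query_time(float(ms))) for ms in _TIMING.findall(output)]


if __name__ == "__main__":
    # Made-up psql output for illustration.
    pasted = "Time: 12.345 ms\nTime: 87.100 ms\nTime: 450.002 ms\n"
    for ms, verdict in classify_psql_timings(pasted):
        print(f"{ms:8.1f} ms  {verdict}")
```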
1.3 Check Database Size
-- See how much data you have
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
What this tells you: If tables are growing unexpectedly large, you might have data bloat or missing cleanup logic.
2. Frontend Performance Check
2.1 Test Large Tree Rendering
Create a "stress test" tree:
- Log into ResolutionFlow frontend
- Create a new tree called "Performance Test - Large Tree"
- Add 50-100 nodes (use copy/paste to speed this up)
- Save the tree
What to watch:
- Does the editor lag when adding nodes?
- Does scrolling feel smooth?
- Does saving take more than 2-3 seconds?
Tools to use:
Open Chrome DevTools (F12):
1. Go to Performance tab
2. Click Record (red circle)
3. Interact with large tree (scroll, add nodes, expand/collapse)
4. Stop recording
5. Look for red bars (blocking/slow operations)
Benchmarks:
- ✅ GOOD: No operations block for > 100ms
- ⚠️ WARNING: Some operations 100-300ms
- ❌ BAD: Operations > 300ms (users will notice lag)
2.2 Check Bundle Size
Why: Large JavaScript bundles = slow initial page load
# From your React frontend directory
cd frontend
npm run build
# Look at the output - it will show bundle sizes
Benchmarks:
- ✅ GOOD: Main bundle < 500KB gzipped
- ⚠️ WARNING: 500KB - 1MB
- ❌ BAD: > 1MB (investigate what's bloating it)
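If your build tool does not print gzipped sizes, you can measure them yourself. The sketch below is an assumption-laden helper, not part of the project: the function names are hypothetical, and the build output directory (for example build/ or dist/) depends on your bundler. It gzips each JavaScript file in memory and applies the bands above.

```python
import gzip
from pathlib import Path


def gzipped_size(path):
    """Return the size in bytes of the file after gzip compression."""
    return len(gzip.compress(Path(path).read_bytes()))


def bundle_report(build_dir, pattern="*.js"):
    """Map each matching file (relative path) to its gzipped size in bytes."""
    root = Path(build_dir)
    return {str(p.relative_to(root)): gzipped_size(p) for p in sorted(root.rglob(pattern))}


def classify_bundle(gz_bytes):
    """Apply the bundle-size bands: < 500KB good, 500KB to 1MB warning, > 1MB bad."""
    kb = gz_bytes / 1024
    if kb > 1024:
        return "BAD"
    if kb >= 500:
        return "WARNING"
    return "GOOD"


if __name__ == "__main__":
    import sys

    # Usage: python bundle_size.py path/to/build/output
    for name, size in bundle_report(sys.argv[1]).items():
        print(f"{name}: {size / 1024:.1f} KB gzipped ({classify_bundle(size)})")
```

The numbers will differ slightly from what a CDN serves, since compression level and algorithm (gzip vs. Brotli) vary, but they are close enough for the thresholds here.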
2.3 Lighthouse Audit
Chrome has this built-in:
1. Open ResolutionFlow in Chrome
2. F12 → Lighthouse tab
3. Select "Desktop" + "Performance"
4. Click "Analyze page load"
Benchmarks:
- ✅ GOOD: Performance score > 80
- ⚠️ WARNING: 60-80
- ❌ BAD: < 60
Common issues and fixes:
- "Eliminate render-blocking resources" → lazy load components
- "Reduce unused JavaScript" → code splitting needed
- "Serve images in next-gen formats" → use WebP instead of PNG
3. API Response Time Check
3.1 Manual Timing Test
Use Railway logs:
1. Go to Railway dashboard → API service → Deployments
2. Click "View Logs"
3. Perform actions in ResolutionFlow frontend
4. Watch logs for response times
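FastAPI logs look like the line below. Note that Uvicorn's default access log does not include the response time; a trailing duration such as [0.023s] appears only if your app logs it (for example, via timing middleware):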
INFO: 127.0.0.1 - "GET /api/trees HTTP/1.1" 200 OK [0.023s]
Benchmarks:
- ✅ GOOD: Most endpoints < 100ms
- ⚠️ WARNING: Some endpoints 100-500ms
- ❌ BAD: Any endpoint > 500ms
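For response times to show up in the logs at all, something in the app has to measure them. One way is a small ASGI middleware. The sketch below is not ResolutionFlow's actual code: the class name TimingMiddleware and the logger name are hypothetical, and it uses only the standard library. Because FastAPI apps are ASGI apps, it should be attachable with app.add_middleware(TimingMiddleware), but treat that wiring as an assumption to verify in your project.

```python
import logging
import time

logger = logging.getLogger("request_timing")  # hypothetical logger name


class TimingMiddleware:
    """Pure ASGI middleware that logs method, path, status and duration."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Pass websocket/lifespan traffic through untouched.
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = None

        async def send_wrapper(message):
            nonlocal status
            if message["type"] == "http.response.start":
                status = message["status"]  # remember the status code
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed = time.perf_counter() - start
            logger.info(
                '"%s %s" %s [%.3fs]', scope["method"], scope["path"], status, elapsed
            )


# Assumed wiring in main.py (verify against your app):
#   logging.basicConfig(level=logging.INFO)
#   app.add_middleware(TimingMiddleware)
```

The duration measured this way covers the time spent inside the app, not network latency between the user and Railway.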
3.2 Automated API Testing
Create a simple test script:
# File: tests/performance_test.py
import time
from statistics import mean
import httpx

API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login
headers = {"Authorization": f"Bearer {TOKEN}"}

def time_endpoint(method, path, **kwargs):
    """Time a single API request; returns (elapsed_ms, status_code)."""
    start = time.perf_counter()  # Monotonic clock, better suited to timing than time.time()
    response = httpx.request(method, f"{API_BASE}{path}", headers=headers, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
    return elapsed, response.status_code

# Test critical endpoints: (method, path, extra keyword arguments for httpx)
tests = [
    ("GET", "/api/trees", {}),
    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
    ("GET", "/api/trees/1/nodes", {}),
    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
]

print("API Performance Test Results:")
print("-" * 50)
for method, path, kwargs in tests:
    times = []
    statuses = set()
    for _ in range(5):  # Run each test 5 times
        elapsed, status = time_endpoint(method, path, **kwargs)
        times.append(elapsed)
        statuses.add(status)
    # Show status codes too: a fast 401 or 500 is not a passing result
    print(f"{method} {path} (status: {sorted(statuses)})")
    print(f"  Average: {mean(times):.2f}ms")
    print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
    print()
Run it:
python tests/performance_test.py
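Averages can hide a slow tail, which is why the load test later in this document uses the 95th percentile. If you want the same figure from the script's timings, a nearest-rank percentile is enough. This is a minimal sketch; the function name percentile is hypothetical, and with only 5 samples per endpoint the p95 is simply the slowest run, so increase the repeat count for a meaningful value.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Made-up timings in milliseconds, for illustration only.
    times = [23.0, 25.5, 24.1, 31.9, 210.4]
    print(f"p50: {percentile(times, 50):.1f}ms, p95: {percentile(times, 95):.1f}ms")
```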
4. Monitoring Setup
4.1 Railway Built-in Monitoring
What Railway gives you for free:
1. Go to Railway dashboard
2. Click each service (API, Frontend, PostgreSQL)
3. Go to "Metrics" tab
Watch for:
- CPU usage spikes (should stay < 50% normally)
- Memory usage growing over time (memory leak indicator)
- Request rate (see usage patterns)
Set up alerts:
1. Railway dashboard → Project Settings → Notifications
2. Add your email
3. Enable "Deployment Failed" and "Service Crashed"
4.2 Sentry Error Tracking (Recommended)
Why add Sentry:
- Free tier = 5,000 errors/month
- Email alerts when things break
- See exact user actions before crash
- Industry standard (your future dev team will expect this)
Setup (5 minutes):
Backend (FastAPI):
pip install "sentry-sdk[fastapi]"
# File: main.py (add at the top)
import sentry_sdk
sentry_sdk.init(
dsn="your-sentry-dsn-here", # Get from sentry.io after signup
traces_sample_rate=0.1, # 10% of requests (free tier friendly)
environment="production",
)
Frontend (React):
npm install @sentry/react
// File: src/index.js (add at the top)
import * as Sentry from "@sentry/react";
Sentry.init({
dsn: "your-sentry-dsn-here",
integrations: [Sentry.browserTracingIntegration()], // SDK v8+; on v7 use new Sentry.BrowserTracing()
tracesSampleRate: 0.1,
environment: "production",
});
Get your DSN:
1. Sign up at sentry.io (free)
2. Create a new project for each platform ("FastAPI" for the backend, "React" for the frontend); each project gets its own DSN
3. Copy the DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
4. Add to Railway environment variables:
- SENTRY_DSN=your-dsn-here
What you get:
- Email when errors occur
- Stack traces showing exactly what broke
- User session replay (see what they clicked before crash; requires enabling Sentry's Replay integration, which the setup above does not include)
- Performance monitoring (slow API calls flagged automatically)
5. Load Testing with k6
Why k6:
- Industry standard (Grafana Labs maintains it)
- Shows how your app behaves under a specific number of concurrent users
- Simple JavaScript syntax
- Free and open source
5.1 Install k6
Windows (using Chocolatey):
choco install k6
Or download directly:
- Go to: https://k6.io/docs/get-started/installation/
- Download Windows installer
- Run installer
Verify:
k6 version
5.2 Create Load Test Script
File: tests/load_test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
// Test configuration
export const options = {
stages: [
{ duration: '30s', target: 10 }, // Ramp up to 10 users over 30s
{ duration: '1m', target: 10 }, // Stay at 10 users for 1 minute
{ duration: '30s', target: 20 }, // Ramp up to 20 users
{ duration: '1m', target: 20 }, // Stay at 20 users for 1 minute
{ duration: '30s', target: 0 }, // Ramp down to 0
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests must complete in 500ms
http_req_failed: ['rate<0.01'], // Less than 1% of requests can fail
},
};
const BASE_URL = 'https://api.resolutionflow.com';
// Setup: runs once before the test starts; the returned token is shared by all virtual users
export function setup() {
const loginRes = http.post(`${BASE_URL}/api/auth/login`,
JSON.stringify({
username: 'test_user', // Replace with test account
password: 'test_password',
}),
{ headers: { 'Content-Type': 'application/json' } }
);
return { token: loginRes.json('access_token') };
}
// Main test: Simulate realistic user behavior
export default function (data) {
const headers = {
'Authorization': `Bearer ${data.token}`,
'Content-Type': 'application/json',
};
// Scenario 1: Load dashboard (get trees list)
let res = http.get(`${BASE_URL}/api/trees`, { headers });
check(res, {
'dashboard loaded': (r) => r.status === 200,
'dashboard fast': (r) => r.timings.duration < 300,
});
sleep(1); // User reads for 1 second
// Scenario 2: Open a tree
res = http.get(`${BASE_URL}/api/trees/1`, { headers }); // Replace with real tree ID
check(res, {
'tree loaded': (r) => r.status === 200,
'tree load fast': (r) => r.timings.duration < 500,
});
sleep(2); // User reads tree for 2 seconds
// Scenario 3: Load tree nodes
res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
check(res, {
'nodes loaded': (r) => r.status === 200,
'nodes fast': (r) => r.timings.duration < 500,
});
sleep(1);
// Scenario 4: Search trees
res = http.post(
`${BASE_URL}/api/trees/search`,
JSON.stringify({ query: 'password reset' }),
{ headers }
);
check(res, {
'search worked': (r) => r.status === 200,
'search fast': (r) => r.timings.duration < 400,
});
sleep(2);
}
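Before running the script, it helps to know how long the stages will take and what the peak load is. The sketch below recomputes that from the stages in the options block; the helper names are hypothetical, and the parser only handles durations built from s, m and h units (such as '30s', '1m' or '1m30s').

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+)([smh])")


def duration_seconds(text):
    """Convert a duration such as '30s', '1m' or '1m30s' to seconds."""
    parts = _PART.findall(text)
    # Reject anything the pattern did not consume in full (e.g. '500ms').
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def summarize_stages(stages):
    """Return (total seconds, peak target) for a list of (duration, target) stages."""
    total = sum(duration_seconds(duration) for duration, _ in stages)
    peak = max(target for _, target in stages)
    return total, peak


# The stages from tests/load_test.js, as (duration, target) pairs.
STAGES = [("30s", 10), ("1m", 10), ("30s", 20), ("1m", 20), ("30s", 0)]

if __name__ == "__main__":
    total, peak = summarize_stages(STAGES)
    print(f"Total: {total // 60}m{total % 60}s, peak: {peak} virtual users")
```

For the stages above this gives 210 seconds (3m30s) with a peak of 20 virtual users.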
5.3 Run Load Test
Basic test (ramps to 10, then 20 users, following the stages in the script):
k6 run tests/load_test.js
Aggressive test (50 constant users for 2 minutes; these flags override the stages defined in the script):
k6 run --vus 50 --duration 2m tests/load_test.js
What the output means:
✓ dashboard loaded
✓ dashboard fast
checks.........................: 95.23% ✓ 1234 ✗ 78
data_received..................: 1.2 MB 20 kB/s
data_sent......................: 456 kB 7.6 kB/s
http_req_blocked...............: avg=1.2ms min=0s med=0s max=45ms p(90)=0s p(95)=0s
http_req_duration..............: avg=142ms min=23ms med=98ms max=1.2s p(90)=245ms p(95)=387ms
http_reqs......................: 1234 20.5/s
How to read this:
- checks: % of checks that passed (want > 95%)
- http_req_duration p(95): 95% of requests faster than this (want < 500ms)
- http_reqs: Requests per second your app handled
- http_req_failed: % of requests that errored (want < 1%)
5.4 Interpret Results
✅ GOOD (Ready for beta):
http_req_duration p(95) < 500ms
http_req_failed < 1%
All checks passing > 95%
⚠️ WARNING (Watch closely during beta):
http_req_duration p(95) 500-1000ms
http_req_failed 1-5%
Some checks failing
❌ BAD (Fix before beta launch):
http_req_duration p(95) > 1000ms
http_req_failed > 5%
Lots of timeouts or 500 errors
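The three bands above can be written down as a small decision function, which is handy if you record results month to month. This is a sketch: the function name is hypothetical, you supply the two numbers by reading them off the k6 summary, and the worst of the two metrics decides the verdict.

```python
def classify_load_test(p95_ms, failed_rate):
    """Classify a k6 run.

    p95_ms:      http_req_duration p(95) in milliseconds
    failed_rate: http_req_failed as a fraction (0.01 means 1%)
    """
    if p95_ms > 1000 or failed_rate > 0.05:
        return "BAD"
    if p95_ms >= 500 or failed_rate >= 0.01:
        return "WARNING"
    return "GOOD"


if __name__ == "__main__":
    # p(95)=387ms from the sample output above; the failure rate is made up.
    print(classify_load_test(387, 0.002))
```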
6. Pre-Launch Checklist
Run this checklist before inviting beta testers:
Database
- All critical indexes exist (Section 1.1)
- Query performance < 200ms (Section 1.2)
- No unexplained table bloat (Section 1.3)
Frontend
- Large tree (100 nodes) renders without lag (Section 2.1)
- Bundle size < 1MB (Section 2.2)
- Lighthouse score > 70 (Section 2.3)
API
- All endpoints < 500ms under load (Section 3)
- Railway logs show no errors (Section 3.1)
Monitoring
- Railway alerts configured (Section 4.1)
- Sentry installed (optional but recommended) (Section 4.2)
Load Testing
- k6 test passes with 20 concurrent users (Section 5.3)
- Request failure rate < 1% during load test (Section 5.4)
7. Monthly Health Check (After Launch)
Once live with beta testers, run this monthly:
Quick version (30 minutes):
# 1. Check Railway metrics
# Look for: CPU/memory trends, error rate spikes
# 2. Review Sentry errors (if installed)
# Look for: New error patterns, increasing error rates
# 3. Run quick load test
k6 run tests/load_test.js
# 4. Check database query times
# Run queries from Section 1.2, watch for slowdowns
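"Watch for slowdowns" is easier to act on if you keep last month's numbers and compare. The following is a minimal sketch, assuming you record timings by hand into two dictionaries; the function name, the check names and the 20% tolerance are all hypothetical choices.

```python
def find_regressions(baseline_ms, current_ms, tolerance=0.20):
    """Return checks whose current time exceeds the baseline by more than `tolerance`.

    Both arguments map a check name to a time in milliseconds.
    The result maps each regressed name to (baseline, current).
    """
    regressions = {}
    for name, base in baseline_ms.items():
        current = current_ms.get(name)
        if current is not None and base > 0 and current > base * (1 + tolerance):
            regressions[name] = (base, current)
    return regressions


if __name__ == "__main__":
    # Made-up numbers for illustration.
    last_month = {"dashboard": 20.0, "tree_nodes": 15.0, "search": 40.0}
    this_month = {"dashboard": 22.0, "tree_nodes": 14.0, "search": 90.0}
    for name, (base, current) in find_regressions(last_month, this_month).items():
        print(f"{name}: {base:.0f}ms -> {current:.0f}ms")
```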
When to do deep dive:
- After adding major new features
- If users report slowness
- Before scaling to new MSP clients
- Every 3 months minimum
8. Common Performance Issues & Fixes
Issue: "Search is slow"
Diagnosis:
EXPLAIN ANALYZE
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
Fix: Add GIN index:
CREATE INDEX idx_trees_fts ON trees USING GIN (to_tsvector('english', name || ' ' || description));
Issue: "Loading tree nodes is slow"
Diagnosis: Missing index on foreign key
Fix:
CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
Issue: "Dashboard takes forever to load"
Diagnosis: Fetching too much data
Fix: Add pagination to API:
# Instead of: SELECT * FROM trees
# Use: SELECT * FROM trees ORDER BY updated_at DESC LIMIT 20 OFFSET 0
# (ORDER BY keeps page contents stable from one request to the next)
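On the API side, pagination usually means turning a page number from the request into LIMIT and OFFSET values. The helper below is a hypothetical sketch, not the project's actual code; the query string in the usage section is only an illustration with driver-style placeholders.

```python
def page_to_limit_offset(page, page_size=20):
    """Convert a 1-based page number into (limit, offset) for a SQL query."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return page_size, (page - 1) * page_size


if __name__ == "__main__":
    limit, offset = page_to_limit_offset(page=3, page_size=20)
    # Illustrative query; pass the values as parameters, never via string formatting.
    query = "SELECT * FROM trees ORDER BY updated_at DESC LIMIT %s OFFSET %s"
    print(query, (limit, offset))
```

Page 3 with a page size of 20 yields LIMIT 20 OFFSET 40.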
Issue: "Frontend feels sluggish"
Diagnosis: Re-rendering too often
Fix: Add React.memo() to components, use proper dependency arrays in useEffect
Issue: "API crashes under load"
Diagnosis: Possibly not enough Railway resources; confirm in the Metrics tab (Section 4.1) that CPU or memory is maxing out before scaling
Fix:
1. Railway dashboard → API service → Settings
2. Increase memory limit (default is 512MB, try 1GB)
3. Enable auto-scaling if needed
Resources
Tools mentioned:
- k6: https://k6.io/docs/
- Sentry: https://sentry.io/
- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
- Chrome Lighthouse: Built into Chrome DevTools (F12)
When to get help:
- k6 test failing badly (> 10% error rate)
- Database queries consistently > 1 second
- Sentry showing critical errors
- Railway CPU/memory maxing out
Next steps after this checklist:
- If all checks pass → Launch beta confidently
- If warnings found → Document them, monitor during beta
- If critical issues → Fix before launch, re-run tests