# ResolutionFlow Performance Health Check

**Purpose:** Verify application performance and scalability before/during beta testing
**When to run:** Before beta launch, then monthly during growth phase
**Time required:** 2-3 hours first time, 30-60 minutes for routine checks

---

## Prerequisites

- [ ] Docker Desktop running
- [ ] Access to Railway dashboard
- [ ] VS Code open with ResolutionFlow project
- [ ] Python virtual environment activated
- [ ] Node.js installed (for the frontend build; k6 is a standalone binary and does not need Node.js)

---

## 1. Database Performance Check

### 1.1 Verify Indexes Exist

**Why:** Indexes work like the index in a book. Without them, PostgreSQL has to scan every row, which gets slower as tables grow. With them, lookups stay fast even on large tables.

**Commands to run:**
```bash
# Connect to your Railway PostgreSQL database
# Get connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL

# Option 1: Use Railway CLI
railway connect PostgreSQL

# Option 2: Use psql directly
psql "your-database-url-here"
```

**Once connected, run:**
```sql
-- Check what indexes exist
SELECT
    tablename,
    indexname,
    indexdef
FROM pg_indexes
WHERE schemaname = 'public'
ORDER BY tablename, indexname;
```

**What you're looking for:**

✅ **GOOD:** You should see indexes on:
- `users.email` (for login lookups)
- `users.username` (for login lookups)
- `trees.created_by` (for "my trees" queries)
- `tree_nodes.tree_id` (for loading tree structure)
- `sessions.tree_id` (for session lookups)

❌ **BAD:** If any of these are missing, the matching queries will slow down as data grows

**Fix if needed:**
```sql
-- Example: Add missing indexes (IF NOT EXISTS makes the statements safe to re-run)
CREATE INDEX IF NOT EXISTS idx_trees_created_by ON trees(created_by);
CREATE INDEX IF NOT EXISTS idx_tree_nodes_tree_id ON tree_nodes(tree_id);
CREATE INDEX IF NOT EXISTS idx_sessions_tree_id ON sessions(tree_id);
```

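Comparing the `pg_indexes` output against the expected list by eye gets tedious as the schema grows. The sketch below shows one way to automate the comparison, assuming the query results have been copied into Python as `(tablename, indexname, indexdef)` tuples. The function names and the sample rows are illustrative, and the check only looks at the leading column of each index, so treat it as a rough first pass rather than a full coverage analysis.

```python
import re

# (table, column) pairs this checklist expects to be indexed (from the list above)
EXPECTED = [
    ("users", "email"),
    ("users", "username"),
    ("trees", "created_by"),
    ("tree_nodes", "tree_id"),
    ("sessions", "tree_id"),
]


def leading_column(indexdef):
    """Return the first name inside the parentheses of an indexdef string, or None."""
    match = re.search(r'\(\s*"?(\w+)"?', indexdef)
    return match.group(1) if match else None


def missing_indexes(rows, expected=EXPECTED):
    """rows: iterable of (tablename, indexname, indexdef) tuples from pg_indexes."""
    covered = {(table, leading_column(indexdef)) for table, _name, indexdef in rows}
    return [pair for pair in expected if pair not in covered]


# Hypothetical sample rows, shaped like the output of the pg_indexes query
sample_rows = [
    ("users", "users_email_key",
     "CREATE UNIQUE INDEX users_email_key ON public.users USING btree (email)"),
    ("trees", "idx_trees_created_by",
     "CREATE INDEX idx_trees_created_by ON public.trees USING btree (created_by)"),
]

for table, column in missing_indexes(sample_rows):
    print(f"Missing index on {table}.{column}")
```

With the sample rows, the script reports `users.username`, `tree_nodes.tree_id`, and `sessions.tree_id` as missing.
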
### 1.2 Test Query Performance

**Run realistic queries and time them:**
```sql
-- Enable timing (psql meta-command)
\timing

-- Test: Full-text search on trees (simulates search bar)
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');

-- Test: Load tree with all nodes (simulates opening tree editor)
SELECT tn.*
FROM tree_nodes tn
WHERE tn.tree_id = 1  -- Replace with actual tree ID
ORDER BY tn.position;

-- Test: User's tree list (simulates dashboard)
SELECT * FROM trees
WHERE created_by = 1  -- Replace with actual user ID
ORDER BY updated_at DESC
LIMIT 20;
```

**Benchmarks:**
- ✅ **GOOD:** All queries < 50ms
- ⚠️ **WARNING:** Any query 50-200ms (optimize later)
- ❌ **BAD:** Any query > 200ms (optimize NOW)

### 1.3 Check Database Size
```sql
-- See how much data you have
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```

**What this tells you:** If tables are growing unexpectedly large, you might have data bloat or missing cleanup logic.

---

## 2. Frontend Performance Check

### 2.1 Test Large Tree Rendering

**Create a "stress test" tree:**

1. Log into ResolutionFlow frontend
2. Create a new tree called "Performance Test - Large Tree"
3. Add 50-100 nodes (use copy/paste to speed this up)
4. Save the tree

**What to watch:**

- Does the editor lag when adding nodes?
- Does scrolling feel smooth?
- Does saving take more than 2-3 seconds?

**Tools to use:**

Open Chrome DevTools (F12):
```
1. Go to Performance tab
2. Click Record (red circle)
3. Interact with large tree (scroll, add nodes, expand/collapse)
4. Stop recording
5. Look for red bars (blocking/slow operations)
```

**Benchmarks:**
- ✅ **GOOD:** No operations block for > 100ms
- ⚠️ **WARNING:** Some operations 100-300ms
- ❌ **BAD:** Operations > 300ms (users will notice lag)

### 2.2 Check Bundle Size

**Why:** Large JavaScript bundles = slow initial page load
```bash
# From your React frontend directory
cd frontend
npm run build

# Look at the output - it will show bundle sizes
```

**Benchmarks:**
- ✅ **GOOD:** Main bundle < 500KB gzipped
- ⚠️ **WARNING:** 500KB - 1MB
- ❌ **BAD:** > 1MB (investigate what's bloating it)

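If the build output does not print gzipped sizes, they can be measured directly. The sketch below compresses each JavaScript file under a build directory and applies the thresholds above. The function names are illustrative, and the output directory is an assumption: Create React App writes to `frontend/build`, while other toolchains such as Vite use `dist`.

```python
import gzip
from pathlib import Path


def gzipped_sizes(build_dir, pattern="*.js"):
    """Return {relative_path: gzipped_size_in_bytes} for matching files under build_dir."""
    root = Path(build_dir)
    return {
        str(path.relative_to(root)): len(gzip.compress(path.read_bytes()))
        for path in sorted(root.rglob(pattern))
    }


def classify_bundle(size_bytes):
    """Apply this section's thresholds (500KB and 1MB, gzipped)."""
    if size_bytes < 500 * 1024:
        return "GOOD"
    if size_bytes <= 1024 * 1024:
        return "WARNING"
    return "BAD"


def report(build_dir):
    """Print one line per bundle: gzipped size in KB, rating, and file name."""
    for name, size in gzipped_sizes(build_dir).items():
        print(f"{size / 1024:8.1f} KB  {classify_bundle(size)}  {name}")
```

After `npm run build`, calling `report("frontend/build")` (or whichever directory your build writes to) lists every bundle with its rating. The sizes may differ slightly from what your host serves, since servers can use different compression settings.
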
### 2.3 Lighthouse Audit

**Chrome has this built-in:**
```
1. Open ResolutionFlow in Chrome
2. F12 → Lighthouse tab
3. Select "Desktop" + "Performance"
4. Click "Analyze page load"
```

**Benchmarks:**
- ✅ **GOOD:** Performance score > 80
- ⚠️ **WARNING:** 60-80
- ❌ **BAD:** < 60

**Common issues and fixes:**
- "Eliminate render-blocking resources" → defer non-critical scripts and styles, lazy load components
- "Reduce unused JavaScript" → code splitting needed
- "Serve images in next-gen formats" → use WebP instead of PNG

---

## 3. API Response Time Check

### 3.1 Manual Timing Test

**Use Railway logs:**
```
1. Go to Railway dashboard → API service → Deployments
2. Click "View Logs"
3. Perform actions in ResolutionFlow frontend
4. Watch logs for response times
```

Uvicorn's default access log shows the method, path, and status but not the duration. If the app logs request duration itself (for example, via a timing middleware), log lines look like:
```
INFO: 127.0.0.1 - "GET /api/trees HTTP/1.1" 200 OK [0.023s]
```

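One way to get per-request durations into the logs is a small timing middleware. The sketch below is a plain ASGI middleware that uses only the standard library. The class name `TimingMiddleware` and the logger name `request_timing` are illustrative assumptions, not existing ResolutionFlow code, and the log format is a suggestion rather than what Uvicorn emits.

```python
import logging
import time

logger = logging.getLogger("request_timing")


class TimingMiddleware:
    """Pure ASGI middleware that logs method, path, status, and duration."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Pass through anything that is not an HTTP request (websockets, lifespan).
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        status = {"code": None}

        async def send_wrapper(message):
            # The status code arrives in the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        start = time.perf_counter()
        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            logger.info(
                "%s %s -> %s [%.1fms]",
                scope["method"], scope["path"], status["code"], elapsed_ms,
            )
```

With FastAPI, this would typically be registered with `app.add_middleware(TimingMiddleware)`, and the `request_timing` logger needs to be enabled at INFO level for the lines to appear in Railway logs.
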
**Benchmarks:**
- ✅ **GOOD:** Most endpoints < 100ms
- ⚠️ **WARNING:** Some endpoints 100-500ms
- ❌ **BAD:** Any endpoint > 500ms

### 3.2 Automated API Testing

**Create a simple test script:**
```python
# File: tests/performance_test.py

import time
from statistics import mean

import httpx

API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login

headers = {"Authorization": f"Bearer {TOKEN}"}


def time_endpoint(method, path, **kwargs):
    """Time a single API request; returns (elapsed_ms, status_code)."""
    start = time.perf_counter()  # Monotonic clock, better suited to timing than time.time()
    response = httpx.request(method, f"{API_BASE}{path}", headers=headers, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
    return elapsed, response.status_code


# Test critical endpoints: (method, path, extra keyword arguments for httpx)
tests = [
    ("GET", "/api/trees", {}),
    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
    ("GET", "/api/trees/1/nodes", {}),
    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
]

print("API Performance Test Results:")
print("-" * 50)

for method, path, kwargs in tests:
    times = []
    statuses = set()
    for _ in range(5):  # Run each test 5 times
        elapsed, status = time_endpoint(method, path, **kwargs)
        times.append(elapsed)
        statuses.add(status)

    avg_time = mean(times)
    print(f"{method} {path}")
    print(f"  Status codes: {sorted(statuses)}")  # Anything other than 200 means the timing is not meaningful
    print(f"  Average: {avg_time:.2f}ms")
    print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
    print()
```

**Run it:**
```bash
python tests/performance_test.py
```

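The script above reports average, minimum, and maximum, while the load-testing thresholds later in this guide are expressed as a 95th percentile. The sketch below shows how a list of timings could be summarized with a median and p95 using the standard library. The function name `summarize` and the sample values are illustrative, and with only five samples per endpoint the p95 is a rough estimate at best.

```python
from statistics import mean, median, quantiles


def summarize(samples_ms):
    """Summarize latency samples in milliseconds (needs at least 2 samples)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(samples_ms, n=20, method="inclusive")[-1]
    return {
        "mean": mean(samples_ms),
        "median": median(samples_ms),
        "p95": p95,
        "max": max(samples_ms),
    }


# Hypothetical timings: four fast requests and one slow outlier
samples = [20, 22, 25, 21, 400]
for name, value in summarize(samples).items():
    print(f"{name:>6}: {value:.1f}ms")
```

In this sample, the single slow request pulls the mean up to 97.6ms while the median stays at 22ms, which is why percentiles tend to describe user experience better than averages.
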
---

## 4. Monitoring Setup

### 4.1 Railway Built-in Monitoring

**What Railway gives you for free:**
```
1. Go to Railway dashboard
2. Click each service (API, Frontend, PostgreSQL)
3. Go to "Metrics" tab
```

**Watch for:**
- CPU usage spikes (should stay < 50% normally)
- Memory usage growing over time (memory leak indicator)
- Request rate (see usage patterns)

**Set up alerts:**
```
1. Railway dashboard → Project Settings → Notifications
2. Add your email
3. Enable "Deployment Failed" and "Service Crashed"
```

### 4.2 Sentry Error Tracking (Recommended)

**Why add Sentry:**
- Free tier = 5,000 errors/month
- Email alerts when things break
- See exact user actions before crash
- Industry standard (your future dev team will expect this)

**Setup (5 minutes):**

**Backend (FastAPI):**
```bash
pip install "sentry-sdk[fastapi]"
```
```python
# File: main.py (add at the top)

import os

import sentry_sdk

sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),  # Set in Railway variables; get the value from sentry.io after signup
    traces_sample_rate=0.1,  # 10% of requests (free tier friendly)
    environment="production",
)
```

**Frontend (React):**
```bash
npm install @sentry/react
```
```javascript
// File: src/index.js (add at the top)

import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: "your-sentry-dsn-here",
  // SDK v8 and later; older v7 releases used `new Sentry.BrowserTracing()`
  integrations: [Sentry.browserTracingIntegration()],
  tracesSampleRate: 0.1,
  environment: "production",
});
```

**Get your DSN:**
```
1. Sign up at sentry.io (free)
2. Create a project for each platform ("FastAPI" for the backend, "React" for the frontend)
3. Copy each project's DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
4. Add the backend DSN to Railway environment variables:
   - SENTRY_DSN=your-dsn-here
```

**What you get:**

- Email when errors occur
- Stack traces showing exactly what broke
- User session replay (see what they clicked before crash; requires enabling Sentry's Session Replay integration in the frontend)
- Performance monitoring (slow API calls flagged automatically)

---

## 5. Load Testing with k6

**Why k6:**
- Industry standard (Grafana Labs maintains it)
- Shows how your app behaves as the number of concurrent users grows
- Simple JavaScript syntax
- Free and open source

### 5.1 Install k6

**Windows (using Chocolatey):**
```powershell
choco install k6
```

**Or download directly:**
- Go to: https://k6.io/docs/get-started/installation/
- Download Windows installer
- Run installer

**Verify:**
```bash
k6 version
```

### 5.2 Create Load Test Script

**File: `tests/load_test.js`**
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  stages: [
    { duration: '30s', target: 10 },  // Ramp up to 10 users over 30s
    { duration: '1m', target: 10 },   // Stay at 10 users for 1 minute
    { duration: '30s', target: 20 },  // Ramp up to 20 users
    { duration: '1m', target: 20 },   // Stay at 20 users for 1 minute
    { duration: '30s', target: 0 },   // Ramp down to 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests must complete in 500ms
    http_req_failed: ['rate<0.01'],    // Less than 1% of requests can fail
  },
};

const BASE_URL = 'https://api.resolutionflow.com';

// Setup: runs once before the test starts; the returned data is passed to every virtual user
export function setup() {
  const loginRes = http.post(`${BASE_URL}/api/auth/login`,
    JSON.stringify({
      username: 'test_user',  // Replace with test account
      password: 'test_password',
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(loginRes, { 'login succeeded': (r) => r.status === 200 });

  return { token: loginRes.json('access_token') };
}

// Main test: Simulate realistic user behavior
export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Scenario 1: Load dashboard (get trees list)
  let res = http.get(`${BASE_URL}/api/trees`, { headers });
  check(res, {
    'dashboard loaded': (r) => r.status === 200,
    'dashboard fast': (r) => r.timings.duration < 300,
  });
  sleep(1);  // User reads for 1 second

  // Scenario 2: Open a tree
  res = http.get(`${BASE_URL}/api/trees/1`, { headers });  // Replace with real tree ID
  check(res, {
    'tree loaded': (r) => r.status === 200,
    'tree load fast': (r) => r.timings.duration < 500,
  });
  sleep(2);  // User reads tree for 2 seconds

  // Scenario 3: Load tree nodes
  res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
  check(res, {
    'nodes loaded': (r) => r.status === 200,
    'nodes fast': (r) => r.timings.duration < 500,
  });
  sleep(1);

  // Scenario 4: Search trees
  res = http.post(
    `${BASE_URL}/api/trees/search`,
    JSON.stringify({ query: 'password reset' }),
    { headers }
  );
  check(res, {
    'search worked': (r) => r.status === 200,
    'search fast': (r) => r.timings.duration < 400,
  });
  sleep(2);
}
```

### 5.3 Run Load Test

**Basic test (follows the script's stages: ramps to 10, then 20 users):**
```bash
k6 run tests/load_test.js
```

**Aggressive test (50 users):**
```bash
# The --vus and --duration flags take precedence over the stages defined in the script
k6 run --vus 50 --duration 2m tests/load_test.js
```

**What the output means (illustrative, abridged sample):**
```
✓ dashboard loaded
✓ dashboard fast

checks.........................: 95.23% ✓ 1234 ✗ 78
data_received..................: 1.2 MB 20 kB/s
data_sent......................: 456 kB 7.6 kB/s
http_req_blocked...............: avg=1.2ms min=0s med=0s max=45ms p(90)=0s p(95)=0s
http_req_duration..............: avg=142ms min=23ms med=98ms max=1.2s p(90)=245ms p(95)=387ms
http_reqs......................: 1234 20.5/s
```

**How to read this:**

- `checks`: % of checks that passed (want > 95%)
- `http_req_duration p(95)`: 95% of requests were faster than this (want < 500ms)
- `http_reqs`: Total requests sent and the rate per second your app handled
- `http_req_failed`: % of requests that errored (want < 1%; not shown in the abridged sample above)

### 5.4 Interpret Results

**✅ GOOD (Ready for beta):**
```
http_req_duration p(95) < 500ms
http_req_failed < 1%
All checks passing > 95%
```

**⚠️ WARNING (Watch closely during beta):**
```
http_req_duration p(95) 500-1000ms
http_req_failed 1-5%
Some checks failing
```

**❌ BAD (Fix before beta launch):**
```
http_req_duration p(95) > 1000ms
http_req_failed > 5%
Lots of timeouts or 500 errors
```

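The three tiers above can be written down as a small function, which helps when results are recorded month over month. This is a minimal sketch: the function name is illustrative, the inputs are the p(95) duration in milliseconds and the failure rate as a fraction (0.01 means 1%), and it deliberately ignores the check pass rate.

```python
def classify_load_test(p95_ms, failed_rate):
    """Map a k6 run onto the GOOD / WARNING / BAD tiers from this section."""
    if p95_ms > 1000 or failed_rate > 0.05:
        return "BAD"
    if p95_ms >= 500 or failed_rate >= 0.01:
        return "WARNING"
    return "GOOD"


# Hypothetical runs: (label, p95 in ms, failure rate)
runs = [
    ("20 users", 387, 0.0),
    ("50 users", 750, 0.02),
    ("100 users", 1200, 0.0),
]
for label, p95_ms, failed_rate in runs:
    print(f"{label}: {classify_load_test(p95_ms, failed_rate)}")
```

The worse of the two signals decides the tier, so a run with fast responses but a 6% failure rate is still rated BAD.
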
---

## 6. Pre-Launch Checklist

Run this checklist **before** inviting beta testers:

### Database
- [ ] All critical indexes exist (Section 1.1)
- [ ] Query performance < 200ms (Section 1.2)
- [ ] No unexplained table bloat (Section 1.3)

### Frontend
- [ ] Large tree (100 nodes) renders without lag (Section 2.1)
- [ ] Bundle size < 1MB (Section 2.2)
- [ ] Lighthouse score > 70 (Section 2.3)

### API
- [ ] All endpoints < 500ms under load (Section 3)
- [ ] Railway logs show no errors (Section 3.1)

### Monitoring
- [ ] Railway alerts configured (Section 4.1)
- [ ] Sentry installed (optional but recommended) (Section 4.2)

### Load Testing
- [ ] k6 test passes with 20 concurrent users (Section 5.3)
- [ ] Request failure rate < 1% during load test (Section 5.4)

---

## 7. Monthly Health Check (After Launch)

Once live with beta testers, run this monthly:

**Quick version (30 minutes):**
```bash
# 1. Check Railway metrics
# Look for: CPU/memory trends, error rate spikes

# 2. Review Sentry errors (if installed)
# Look for: New error patterns, increasing error rates

# 3. Run quick load test
k6 run tests/load_test.js

# 4. Check database query times
# Run queries from Section 1.2, watch for slowdowns
```

**When to do deep dive:**
- After adding major new features
- If users report slowness
- Before scaling to new MSP clients
- Every 3 months minimum

---

## 8. Common Performance Issues & Fixes

### Issue: "Search is slow"

**Diagnosis:**
```sql
EXPLAIN ANALYZE
SELECT * FROM trees
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
```

**Fix:** Add a GIN index on the same expression the query uses (PostgreSQL only uses an expression index when the expressions match):
```sql
CREATE INDEX idx_trees_fts ON trees USING GIN (to_tsvector('english', name || ' ' || description));
```

### Issue: "Loading tree nodes is slow"

**Diagnosis:** Missing index on foreign key

**Fix:**
```sql
CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
```

### Issue: "Dashboard takes forever to load"

**Diagnosis:** Fetching too much data

**Fix:** Add pagination to API:
```python
# Instead of: SELECT * FROM trees
# Use:        SELECT * FROM trees ORDER BY updated_at DESC LIMIT 20 OFFSET 0
# (the ORDER BY keeps page contents stable between requests)
```

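The pagination fix can be sketched end to end with an in-memory SQLite database standing in for PostgreSQL. Everything here is illustrative: the helper names, the `MAX_PAGE_SIZE` cap, and the simplified `trees` table are assumptions, and the real API would run the equivalent query through its own database layer.

```python
import sqlite3

MAX_PAGE_SIZE = 100  # Hypothetical cap so a client cannot request an unbounded page


def page_to_limit_offset(page, page_size=20):
    """Translate a 1-based page number into (limit, offset), clamping bad input."""
    page = max(page, 1)
    page_size = min(max(page_size, 1), MAX_PAGE_SIZE)
    return page_size, (page - 1) * page_size


def list_trees(conn, user_id, page=1, page_size=20):
    """Return one page of a user's trees, newest first."""
    limit, offset = page_to_limit_offset(page, page_size)
    return conn.execute(
        "SELECT id, name FROM trees WHERE created_by = ? "
        "ORDER BY updated_at DESC, id DESC LIMIT ? OFFSET ?",
        (user_id, limit, offset),
    ).fetchall()


# Demo data: 45 trees owned by user 1
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trees (id INTEGER PRIMARY KEY, name TEXT, created_by INTEGER, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO trees (name, created_by, updated_at) VALUES (?, ?, ?)",
    [(f"Tree {i}", 1, i) for i in range(1, 46)],
)

first_page = list_trees(conn, user_id=1, page=1)
last_page = list_trees(conn, user_id=1, page=3)
print(len(first_page), len(last_page))  # prints "20 5"
```

Adding `id` as a tiebreaker in the `ORDER BY` keeps rows with identical `updated_at` values from moving between pages.
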
### Issue: "Frontend feels sluggish"

**Diagnosis:** Re-rendering too often

**Fix:** Wrap expensive components in `React.memo()` and use correct dependency arrays in `useEffect`

### Issue: "API crashes under load"

**Diagnosis:** Not enough Railway resources

**Fix:**
```
1. Railway dashboard → API service → Settings
2. Increase memory limit (default is 512MB, try 1GB)
3. Enable auto-scaling if needed
```

---

## Resources

**Tools mentioned:**
- k6: https://k6.io/docs/
- Sentry: https://sentry.io/
- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
- Chrome Lighthouse: Built into Chrome DevTools (F12)

**When to get help:**
- k6 test failing badly (> 10% error rate)
- Database queries consistently > 1 second
- Sentry showing critical errors
- Railway CPU/memory maxing out

**Next steps after this checklist:**
- If all checks pass → Launch beta confidently
- If warnings found → Document them, monitor during beta
- If critical issues → Fix before launch, re-run tests