Michael Chihlas d7c5c8c9ce Updated documentation; added PERFORMANCE-HEALTH-CHECK.md

2026-02-04 21:46:32 -05:00

ResolutionFlow Performance Health Check

Purpose: Verify application performance and scalability before/during beta testing
When to run: Before beta launch, then monthly during growth phase
Time required: 2-3 hours first time, 30-60 minutes for routine checks

Prerequisites

Docker Desktop running
Access to Railway dashboard
VS Code open with ResolutionFlow project
Python virtual environment activated
Node.js installed (for the frontend build in Section 2.2; k6 itself is a standalone binary and does not need Node.js)

1. Database Performance Check

1.1 Verify Indexes Exist

Why: Indexes are like the index in a book. Without one, PostgreSQL has to scan every row to find matches, which gets slower as the table grows. With one, it can jump straight to the matching rows.

Commands to run:

# Connect to your Railway PostgreSQL database
# Get connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL

# Option 1: Use Railway CLI
railway connect PostgreSQL

# Option 2: Use psql directly
psql "your-database-url-here"

Once connected, run:

-- Check what indexes exist
SELECT 
    tablename, 
    indexname, 
    indexdef 
FROM pg_indexes 
WHERE schemaname = 'public' 
ORDER BY tablename, indexname;

What you're looking for:

✅ GOOD: You should see indexes on:

users.email (for login lookups)
users.username (for login lookups)
trees.created_by (for "my trees" queries)
tree_nodes.tree_id (for loading tree structure)
sessions.tree_id (for session lookups)

❌ BAD: If these are missing, queries will slow down as data grows
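
The comparison against the expected list can also be done mechanically. The following is a minimal sketch, assuming the `indexdef` values from the query above have been copied into a Python list; the function name `missing_indexes` and the sample rows are hypothetical. It only looks at the leading column of each index, so expression indexes are ignored.

```python
import re

# Columns the checklist above expects to be indexed, as (table, column) pairs.
EXPECTED = [
    ("users", "email"),
    ("users", "username"),
    ("trees", "created_by"),
    ("tree_nodes", "tree_id"),
    ("sessions", "tree_id"),
]

# Pulls the table name and the leading indexed column out of an indexdef string
# such as: CREATE INDEX idx ON public.trees USING btree (created_by)
_INDEXDEF = re.compile(
    r'\bON\s+(?:ONLY\s+)?(?:"?\w+"?\.)?"?(\w+)"?\s+USING\s+\w+\s+\(\s*"?(\w+)"?(?=[\s,)])',
    re.IGNORECASE,
)


def missing_indexes(indexdefs, expected=EXPECTED):
    """Return the expected (table, column) pairs that no index leads with."""
    covered = set()
    for definition in indexdefs:
        match = _INDEXDEF.search(definition)
        if match:
            covered.add((match.group(1).lower(), match.group(2).lower()))
    return [pair for pair in expected if pair not in covered]


# Hypothetical indexdef rows, pasted from the pg_indexes query output.
sample = [
    "CREATE UNIQUE INDEX users_email_key ON public.users USING btree (email)",
    "CREATE INDEX idx_trees_created_by ON public.trees USING btree (created_by)",
]
for table, column in missing_indexes(sample):
    print(f"missing index on {table}.{column}")
```

Anything it prints is a candidate for the fix below.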

Fix if needed:

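-- Example: Add missing indexes
-- On a live database, CREATE INDEX CONCURRENTLY avoids blocking writes while the index builds
CREATE INDEX IF NOT EXISTS idx_trees_created_by ON trees(created_by);
CREATE INDEX IF NOT EXISTS idx_tree_nodes_tree_id ON tree_nodes(tree_id);
CREATE INDEX IF NOT EXISTS idx_sessions_tree_id ON sessions(tree_id);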

1.2 Test Query Performance

Run realistic queries and time them:

-- Enable timing
\timing

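-- Test: Full-text search on trees (simulates search bar)
-- Note: if description can be NULL, name || ' ' || description is NULL and the row never matches;
-- in that case wrap it as coalesce(description, '') here and in any index on this expression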
SELECT * FROM trees 
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');

-- Test: Load tree with all nodes (simulates opening tree editor)
SELECT tn.* 
FROM tree_nodes tn 
WHERE tn.tree_id = 1  -- Replace with actual tree ID
ORDER BY tn.position;

-- Test: User's tree list (simulates dashboard)
SELECT * FROM trees 
WHERE created_by = 1  -- Replace with actual user ID
ORDER BY updated_at DESC 
LIMIT 20;

Benchmarks:

✅ GOOD: All queries < 50ms
⚠️ WARNING: Any query 50-200ms (optimize later)
❌ BAD: Any query > 200ms (optimize NOW)
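
When several queries are timed, sorting the results into these bands by hand gets tedious. The sketch below assumes the `Time: ... ms` lines printed by psql after `\timing` have been copied into a list; the function names and the sample lines are hypothetical. Values exactly at 50 ms or 200 ms are treated as WARNING.

```python
import re


def classify_latency(ms, good_below=50.0, bad_above=200.0):
    """Map a latency in milliseconds to the GOOD / WARNING / BAD bands above."""
    if ms < good_below:
        return "GOOD"
    if ms > bad_above:
        return "BAD"
    return "WARNING"


def parse_psql_timing(line):
    """Extract milliseconds from a psql timing line such as 'Time: 12.345 ms'."""
    match = re.search(r"Time:\s*([0-9]+(?:\.[0-9]+)?)\s*ms", line)
    if match is None:
        raise ValueError(f"not a psql timing line: {line!r}")
    return float(match.group(1))


# Hypothetical timing lines, one per query from this section.
for line in ["Time: 12.345 ms", "Time: 87.100 ms", "Time: 431.902 ms"]:
    ms = parse_psql_timing(line)
    print(f"{ms:8.3f} ms  {classify_latency(ms)}")
```

The same function covers the API bands in Section 3.1 by passing different `good_below` and `bad_above` values.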

1.3 Check Database Size

-- See how much data you have
SELECT 
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

What this tells you: If tables are growing unexpectedly large, you might have data bloat or missing cleanup logic.

2. Frontend Performance Check

2.1 Test Large Tree Rendering

Create a "stress test" tree:

Log into ResolutionFlow frontend
Create a new tree called "Performance Test - Large Tree"
Add 50-100 nodes (use copy/paste to speed this up)
Save the tree

What to watch:

Does the editor lag when adding nodes?
Does scrolling feel smooth?
Does saving take more than 2-3 seconds?

Tools to use:

Open Chrome DevTools (F12):

1. Go to Performance tab
2. Click Record (red circle)
3. Interact with large tree (scroll, add nodes, expand/collapse)
4. Stop recording
5. Look for red bars (blocking/slow operations)

Benchmarks:

✅ GOOD: No operations block for > 100ms
⚠️ WARNING: Some operations 100-300ms
❌ BAD: Operations > 300ms (users will notice lag)

2.2 Check Bundle Size

Why: Large JavaScript bundles = slow initial page load

# From your React frontend directory
cd frontend
npm run build

# Look at the output - it will show bundle sizes

Benchmarks:

✅ GOOD: Main bundle < 500KB gzipped
⚠️ WARNING: 500KB - 1MB
❌ BAD: > 1MB (investigate what's bloating it)
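
If the build output does not report gzipped sizes, they can be approximated directly. This is a rough sketch, assuming the compiled JavaScript ends up under `frontend/build` (a hypothetical path; some toolchains write to `dist` instead). It uses Python's default gzip level, so the numbers may differ slightly from what the hosting layer actually serves.

```python
import gzip
from pathlib import Path


def gzipped_size(path):
    """Return the gzip-compressed size of one file, in bytes."""
    return len(gzip.compress(Path(path).read_bytes()))


def bundle_report(build_dir, limit_kb=500):
    """Return (file name, gzipped KB, within limit) per .js file, largest first."""
    root = Path(build_dir)
    if not root.is_dir():
        return []
    rows = []
    for js_file in root.rglob("*.js"):
        kb = gzipped_size(js_file) / 1024
        rows.append((js_file.name, round(kb, 1), kb < limit_kb))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # "frontend/build" is an assumed output directory; adjust to your build tool.
    for name, kb, ok in bundle_report("frontend/build"):
        print(f"{'OK ' if ok else 'BIG'} {kb:8.1f} KB  {name}")
```

The 500 KB default mirrors the GOOD threshold above.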

2.3 Lighthouse Audit

Chrome has this built-in:

1. Open ResolutionFlow in Chrome
2. F12 → Lighthouse tab
3. Select "Desktop" + "Performance"
4. Click "Analyze page load"

Benchmarks:

✅ GOOD: Performance score > 80
⚠️ WARNING: 60-80
❌ BAD: < 60

Common issues and fixes:

"Eliminate render-blocking resources" → lazy load components
"Reduce unused JavaScript" → code splitting needed
"Serve images in next-gen formats" → use WebP instead of PNG

3. API Response Time Check

3.1 Manual Timing Test

Use Railway logs:

1. Go to Railway dashboard → API service → Deployments
2. Click "View Logs"
3. Perform actions in ResolutionFlow frontend
4. Watch logs for response times

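FastAPI logs look like this (Uvicorn's default access log does not include a duration; the trailing [0.023s] appears only if the app logs request timing itself, for example from a middleware):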

INFO:     127.0.0.1 - "GET /api/trees HTTP/1.1" 200 OK [0.023s]

Benchmarks:

✅ GOOD: Most endpoints < 100ms
⚠️ WARNING: Some endpoints 100-500ms
❌ BAD: Any endpoint > 500ms
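
If the logs show no response times, the app has to record them itself. Below is a minimal sketch of a plain ASGI middleware that logs method, path, status, and duration for each request. The class name `TimingMiddleware` and the logger name are hypothetical, and the log format only imitates the sample line above.

```python
import logging
import time

logger = logging.getLogger("request_timing")  # hypothetical logger name


class TimingMiddleware:
    """Plain ASGI middleware that logs how long each HTTP request took."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        # Pass through anything that is not an HTTP request (websockets, lifespan).
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        start = time.perf_counter()
        status = {"code": 0}  # stays 0 if the app fails before responding

        async def send_wrapper(message):
            # The status code travels in the first response message.
            if message["type"] == "http.response.start":
                status["code"] = message["status"]
            await send(message)

        try:
            await self.app(scope, receive, send_wrapper)
        finally:
            elapsed = time.perf_counter() - start
            logger.info(
                '"%s %s" %s [%.3fs]',
                scope["method"], scope["path"], status["code"], elapsed,
            )
```

With FastAPI this would typically be registered with `app.add_middleware(TimingMiddleware)`, and the logger needs to be at INFO level for the lines to show up in Railway.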

3.2 Automated API Testing

Create a simple test script:

# File: tests/performance_test.py

import time
from statistics import mean

import httpx

API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login

headers = {"Authorization": f"Bearer {TOKEN}"}

def time_endpoint(client, method, path, **kwargs):
    """Time a single API request; returns (elapsed ms, status code)"""
    start = time.perf_counter()  # Monotonic clock, suited to measuring intervals
    response = client.request(method, path, **kwargs)
    elapsed = (time.perf_counter() - start) * 1000  # Convert to milliseconds
    return elapsed, response.status_code

# Test critical endpoints: (method, path, extra request arguments)
tests = [
    ("GET", "/api/trees", {}),
    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
    ("GET", "/api/trees/1/nodes", {}),
    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
]

print("API Performance Test Results:")
print("-" * 50)

# Reuse one client so connection setup is not counted in every request
with httpx.Client(base_url=API_BASE, headers=headers, timeout=10.0) as client:
    for method, path, kwargs in tests:
        times = []
        statuses = set()
        for _ in range(5):  # Run each test 5 times
            elapsed, status = time_endpoint(client, method, path, **kwargs)
            times.append(elapsed)
            statuses.add(status)

        print(f"{method} {path}")
        print(f"  Status codes: {sorted(statuses)}")  # Anything other than 200 means the timing is not meaningful
        print(f"  Average: {mean(times):.2f}ms")
        print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
        print()

Run it:

python tests/performance_test.py
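
The script reports mean, min, and max, while the k6 thresholds in Section 5 are expressed as a 95th percentile. To compare the two, a percentile can be computed from the same `times` list. This is a small sketch using the nearest-rank definition; k6 may interpolate differently, so values can differ slightly on small samples.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[rank - 1]


# Hypothetical timings in milliseconds from five runs of one endpoint.
times = [23.0, 31.5, 28.2, 140.7, 26.9]
print(f"  p95: {percentile(times, 95):.2f}ms")
```

With only five runs per endpoint the p95 is simply the slowest run, so raising the repeat count gives a more useful figure.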

4. Monitoring Setup

4.1 Railway Built-in Monitoring

What Railway gives you for free:

1. Go to Railway dashboard
2. Click each service (API, Frontend, PostgreSQL)
3. Go to "Metrics" tab

Watch for:

CPU usage spikes (should stay < 50% normally)
Memory usage growing over time (memory leak indicator)
Request rate (see usage patterns)
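
"Memory usage growing over time" can be put into numbers with a simple trend line. The sketch below assumes memory readings have been copied by hand from the Metrics tab as (hours elapsed, megabytes) pairs; the function names and the readings are hypothetical. It fits a least-squares slope and estimates how long until a given limit would be reached.

```python
def growth_per_hour(samples):
    """Least-squares slope of (hours_elapsed, memory_mb) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def hours_until_limit(samples, limit_mb):
    """Estimate hours until memory reaches limit_mb; None if usage is flat or falling."""
    slope = growth_per_hour(samples)
    if slope <= 0:
        return None
    latest_mb = samples[-1][1]
    return max(0.0, (limit_mb - latest_mb) / slope)


# Hypothetical readings: (hours since deploy, memory in MB).
readings = [(0, 200.0), (6, 215.0), (12, 230.0), (18, 245.0)]
print(f"growth: {growth_per_hour(readings):.1f} MB/hour")
print(f"hours until 512 MB: {hours_until_limit(readings, limit_mb=512)}")
```

A steady positive slope across deploys is worth investigating; a sawtooth that returns to the same baseline usually is not a leak.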

Set up alerts:

1. Railway dashboard → Project Settings → Notifications
2. Add your email
3. Enable "Deployment Failed" and "Service Crashed"

4.2 Sentry Error Tracking (Recommended)

Why add Sentry:

Free tier = 5,000 errors/month
Email alerts when things break
See exact user actions before crash
Industry standard (your future dev team will expect this)

Setup (5 minutes):

Backend (FastAPI):

pip install sentry-sdk[fastapi]

# File: main.py (add at the top)

import os

import sentry_sdk

sentry_sdk.init(
    dsn=os.environ.get("SENTRY_DSN"),  # Set in Railway variables (see "Get your DSN" below); Sentry stays disabled if unset
    traces_sample_rate=0.1,  # 10% of requests (free tier friendly)
    environment="production",
)

Frontend (React):

npm install @sentry/react

// File: src/index.js (add at the top)

import * as Sentry from "@sentry/react";

Sentry.init({
  dsn: "your-sentry-dsn-here",
  integrations: [Sentry.browserTracingIntegration()],  // SDK v8 and later; v7 used new Sentry.BrowserTracing()
  tracesSampleRate: 0.1,
  environment: "production",
});

Get your DSN:

1. Sign up at sentry.io (free)
2. Create one project per platform: "FastAPI" for the backend and "React" for the frontend (each project has its own DSN)
3. Copy the DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
4. Add to Railway environment variables:
   - SENTRY_DSN=your-dsn-here

What you get:

Email when errors occur
Stack traces showing exactly what broke
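User session replay (see what they clicked before crash; frontend only, and it requires adding Sentry's Replay integration, which the setup above does not enable)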
Performance monitoring (slow API calls flagged automatically)

5. Load Testing with k6

Why k6:

Industry standard (Grafana Labs maintains it)
Shows you EXACTLY how many concurrent users your app can handle
Simple JavaScript syntax
Free and open source

5.1 Install k6

Windows (using Chocolatey):

choco install k6

Or download directly:

Go to: https://k6.io/docs/get-started/installation/
Download Windows installer
Run installer

Verify:

k6 version

5.2 Create Load Test Script

File: tests/load_test.js

import http from 'k6/http';
import { check, sleep } from 'k6';

// Test configuration
export const options = {
  stages: [
    { duration: '30s', target: 10 },  // Ramp up to 10 users over 30s
    { duration: '1m', target: 10 },   // Stay at 10 users for 1 minute
    { duration: '30s', target: 20 },  // Ramp up to 20 users
    { duration: '1m', target: 20 },   // Stay at 20 users for 1 minute
    { duration: '30s', target: 0 },   // Ramp down to 0
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests must complete in 500ms
    http_req_failed: ['rate<0.01'],   // Less than 1% of requests can fail
  },
};

const BASE_URL = 'https://api.resolutionflow.com';
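
// Setup: runs once before the test starts (not once per virtual user);
// the returned object is passed to every VU, so all VUs share one token
export function setup() {
  const loginRes = http.post(`${BASE_URL}/api/auth/login`, 
    JSON.stringify({
      username: 'test_user',  // Replace with test account
      password: 'test_password',
    }),
    { headers: { 'Content-Type': 'application/json' } }
  );

  // Abort early if login fails; otherwise every request below would fail with 401
  if (loginRes.status !== 200) {
    throw new Error(`Login failed with status ${loginRes.status}`);
  }

  return { token: loginRes.json('access_token') };
}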

// Main test: Simulate realistic user behavior
export default function (data) {
  const headers = {
    'Authorization': `Bearer ${data.token}`,
    'Content-Type': 'application/json',
  };

  // Scenario 1: Load dashboard (get trees list)
  let res = http.get(`${BASE_URL}/api/trees`, { headers });
  check(res, {
    'dashboard loaded': (r) => r.status === 200,
    'dashboard fast': (r) => r.timings.duration < 300,
  });
  sleep(1);  // User reads for 1 second

  // Scenario 2: Open a tree
  res = http.get(`${BASE_URL}/api/trees/1`, { headers });  // Replace with real tree ID
  check(res, {
    'tree loaded': (r) => r.status === 200,
    'tree load fast': (r) => r.timings.duration < 500,
  });
  sleep(2);  // User reads tree for 2 seconds

  // Scenario 3: Load tree nodes
  res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
  check(res, {
    'nodes loaded': (r) => r.status === 200,
    'nodes fast': (r) => r.timings.duration < 500,
  });
  sleep(1);

  // Scenario 4: Search trees
  res = http.post(
    `${BASE_URL}/api/trees/search`,
    JSON.stringify({ query: 'password reset' }),
    { headers }
  );
  check(res, {
    'search worked': (r) => r.status === 200,
    'search fast': (r) => r.timings.duration < 400,
  });
  sleep(2);
}
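
Before running the script, it helps to know how long the configured stages take and what the peak load is. The following sketch adds up the stage durations from the options above; the function names are hypothetical and only the `s`, `m`, and `h` duration units are handled. The real run lasts a little longer, because k6 lets in-flight iterations finish.

```python
import re

# (duration, target VUs) pairs copied from the stages in the script above.
STAGES = [("30s", 10), ("1m", 10), ("30s", 20), ("1m", 20), ("30s", 0)]

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration):
    """Convert a duration such as '30s', '1m', or '1m30s' to seconds."""
    parts = re.findall(r"(\d+)([smh])", duration)
    # Reject anything the pattern did not fully consume (for example '500ms').
    if not parts or "".join(number + unit for number, unit in parts) != duration:
        raise ValueError(f"unsupported duration: {duration!r}")
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def summarize(stages):
    """Return (total seconds, peak target VUs) for a list of stages."""
    total = sum(to_seconds(duration) for duration, _ in stages)
    peak = max(target for _, target in stages)
    return total, peak


total, peak = summarize(STAGES)
print(f"planned duration: {total}s ({total / 60:.1f} min), peak: {peak} VUs")
```

For the stages above this gives 210 seconds and a peak of 20 virtual users.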

5.3 Run Load Test

Basic test (follows the stages in the script, peaking at 20 users):

k6 run tests/load_test.js

Aggressive test (50 constant users for 2 minutes; the --vus and --duration flags override the stages in the script):

k6 run --vus 50 --duration 2m tests/load_test.js

What the output means:

     ✓ dashboard loaded
     ✓ dashboard fast
     
     checks.........................: 94.05% ✓ 1234  ✗ 78
     data_received..................: 1.2 MB  20 kB/s
     data_sent......................: 456 kB  7.6 kB/s
     http_req_blocked...............: avg=1.2ms   min=0s   med=0s   max=45ms  p(90)=0s   p(95)=0s  
     http_req_duration..............: avg=142ms   min=23ms med=98ms max=1.2s  p(90)=245ms p(95)=387ms
     http_reqs......................: 1234   20.5/s

How to read this:

checks: % of check() assertions that passed (want > 95%)
http_req_duration p(95): 95% of requests faster than this (want < 500ms)
http_reqs: Total requests sent, followed by the average requests per second your app handled
http_req_failed: % of requests that errored (want < 1%)

5.4 Interpret Results

✅ GOOD (Ready for beta):

http_req_duration p(95) < 500ms
http_req_failed < 1%
More than 95% of checks passing

⚠️ WARNING (Watch closely during beta):

http_req_duration p(95) 500-1000ms
http_req_failed 1-5%
Some checks failing

❌ BAD (Fix before beta launch):

http_req_duration p(95) > 1000ms
http_req_failed > 5%
Lots of timeouts or 500 errors
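
The three bands above can be written down as one small function, which is handy when comparing runs month to month. This is a sketch under the assumption that the p(95) duration and the failure rate have been read off the k6 summary by hand; the function name is hypothetical. Values exactly on a boundary fall into the WARNING band.

```python
def load_test_verdict(p95_ms, failed_rate):
    """Classify a k6 run as GOOD, WARNING, or BAD using the bands above.

    p95_ms:      http_req_duration p(95), in milliseconds
    failed_rate: http_req_failed as a fraction (0.01 means 1%)
    """
    if p95_ms > 1000 or failed_rate > 0.05:
        return "BAD"
    if p95_ms >= 500 or failed_rate >= 0.01:
        return "WARNING"
    return "GOOD"


# Example using the p(95) value from the sample output in Section 5.3,
# with an assumed failure rate of 0.
print(load_test_verdict(p95_ms=387, failed_rate=0.0))
```

The worse of the two metrics decides the verdict, so a fast run with many failed requests still counts as BAD.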

6. Pre-Launch Checklist

Run this checklist before inviting beta testers:

Database

All critical indexes exist (Section 1.1)
Query performance < 200ms (Section 1.2)
No unexplained table bloat (Section 1.3)

Frontend

Large tree (100 nodes) renders without lag (Section 2.1)
Bundle size < 1MB (Section 2.2)
Lighthouse score > 70 (Section 2.3)

API

All endpoints < 500ms (Section 3), including under load (Section 5.4)
Railway logs show no errors (Section 3.1)

Monitoring

Railway alerts configured (Section 4.1)
Sentry installed (optional but recommended) (Section 4.2)

Load Testing

k6 test passes with 20 concurrent users (Section 5.3)
Request failure rate < 1% during load test (Section 5.4)

7. Monthly Health Check (After Launch)

Once live with beta testers, run this monthly:

Quick version (30 minutes):

# 1. Check Railway metrics
# Look for: CPU/memory trends, error rate spikes

# 2. Review Sentry errors (if installed)
# Look for: New error patterns, increasing error rates

# 3. Run quick load test
k6 run tests/load_test.js

# 4. Check database query times
# Run queries from Section 1.2, watch for slowdowns

When to do deep dive:

After adding major new features
If users report slowness
Before scaling to new MSP clients
Every 3 months minimum

8. Common Performance Issues & Fixes

Issue: "Search is slow"

Diagnosis:

EXPLAIN ANALYZE 
SELECT * FROM trees 
WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');

Fix: Add GIN index:

CREATE INDEX idx_trees_fts ON trees USING GIN (to_tsvector('english', name || ' ' || description));

Issue: "Loading tree nodes is slow"

Diagnosis: Missing index on foreign key

Fix:

CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);

Issue: "Dashboard takes forever to load"

Diagnosis: Fetching too much data

Fix: Add pagination to API:

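# Instead of: SELECT * FROM trees
# Use:        SELECT * FROM trees ORDER BY updated_at DESC LIMIT 20 OFFSET 0
# (ORDER BY keeps pages stable; without it, row order is not guaranteed between requests)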
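
On the API side, pagination mostly comes down to turning a page number into the LIMIT and OFFSET values. A minimal sketch, assuming 1-based page numbers and a cap on page size; the function name and the default of 100 for the cap are hypothetical choices.

```python
def page_window(page, page_size=20, max_page_size=100):
    """Translate a 1-based page number into (limit, offset) for LIMIT/OFFSET."""
    if page < 1:
        raise ValueError("page must be 1 or greater")
    if page_size < 1:
        raise ValueError("page_size must be 1 or greater")
    limit = min(page_size, max_page_size)  # stop clients from requesting everything at once
    offset = (page - 1) * limit
    return limit, offset


limit, offset = page_window(page=3)
print(f"LIMIT {limit} OFFSET {offset}")
```

The two values should be passed to the query as bound parameters, never formatted into the SQL string.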

Issue: "Frontend feels sluggish"

Diagnosis: Re-rendering too often

Fix: Add React.memo() to components, use proper dependency arrays in useEffect

Issue: "API crashes under load"

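Diagnosis: Possibly not enough Railway resources. Confirm in the Metrics tab (Section 4.1) that CPU or memory maxes out during the load test before changing limits.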

Fix:

1. Railway dashboard → API service → Settings
2. Increase memory limit (default is 512MB, try 1GB)
3. Enable auto-scaling if needed

Resources

Tools mentioned:

k6: https://k6.io/docs/
Sentry: https://sentry.io/
PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
Chrome Lighthouse: Built into Chrome DevTools (F12)

When to get help:

k6 test failing badly (> 10% error rate)
Database queries consistently > 1 second
Sentry showing critical errors
Railway CPU/memory maxing out

Next steps after this checklist:

If all checks pass → Launch beta confidently
If warnings found → Document them, monitor during beta
If critical issues → Fix before launch, re-run tests
