diff --git a/CLAUDE.md b/CLAUDE.md
index a3d9a594..10b246c0 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -91,6 +91,20 @@ When adding new frontend pages or components, use "ResolutionFlow" for any user-
   - Purple gradient theme, custom fonts (Plus Jakarta Sans, Inter, Outfit)
   - Custom SVG logo in header and auth pages
   - Updated favicon and browser tab title
+- **Token Refresh Fix:**
+  - Silent refresh with single-flight queue (prevents concurrent 401 race conditions)
+  - Backend `get_refresh_token_payload` dependency extracts refresh token from Authorization header
+  - Frontend Axios interceptor queues failed requests during refresh, retries after success
+  - Auth store synced after silent refresh via `setTokens` action
+- **Session Scratchpad (Floating Overlay):**
+  - Fixed-position overlay panel (420px wide, 55vh tall) on right edge
+  - Floating button when collapsed, slide-in panel when expanded
+  - Ctrl+/ keyboard shortcut to toggle
+  - Auto-save with 1s debounce, markdown preview, localStorage persistence
+  - Main content adjusts width via padding transition when panel opens
+- **Global Thin Scrollbar Styling:**
+  - 6px thin scrollbars site-wide (Firefox `scrollbar-width: thin` + WebKit pseudo-elements)
+  - Theme-aware colors using CSS variables (`--border`, `--muted-foreground`)
 
 ### What's In Progress
 
@@ -180,7 +194,7 @@ patherly/
 │   │   ├── router.tsx
 │   │   ├── assets/brand/        # Brand logos (SVG)
 │   │   ├── api/                 # Axios API client
-│   │   │   ├── client.ts        # Axios instance with interceptors
+│   │   │   ├── client.ts        # Axios instance with refresh queue interceptor
 │   │   │   ├── auth.ts
 │   │   │   ├── trees.ts
 │   │   │   └── sessions.ts
@@ -196,7 +210,7 @@ patherly/
 │   │   │   ├── tree-editor/     # Tree editor components
 │   │   │   ├── tree-preview/    # Visual tree preview
 │   │   │   ├── step-library/    # Step library browser, forms, modals
-│   │   │   ├── session/         # Session modals, scratchpad sidebar
+│   │   │   ├── session/         # Session modals, scratchpad floating overlay
 │   │   │   └── ui/              # MarkdownContent
 │   │   ├── pages/
 │   │   │   ├── LoginPage.tsx
@@ -508,6 +522,33 @@ Key state: `pendingStep`, `pendingContinuationNodeId`, `customBranchMode`, `bran
 Custom steps are stored in session JSONB (`custom_steps` field) and referenced by UUID in `pathTaken`. `findNode()` only searches tree structure -- use `findCustomStep()` for custom step UUIDs.
 
+### Token Refresh: Match Frontend/Backend Contract
+
+The refresh endpoint must accept tokens the same way the frontend sends them.
+
+```python
+# WRONG - Expects bare string, but frontend sends Authorization header
+@router.post("/refresh")
+async def refresh_token(refresh_token: str):
+    payload = decode_token(refresh_token)
+
+# CORRECT - Use dependency that reads from Authorization header
+@router.post("/refresh")
+async def refresh_token(
+    payload: Annotated[dict, Depends(get_refresh_token_payload)],
+):
+```
+
+The frontend Axios interceptor sends the refresh token in an `Authorization: Bearer` header. The backend must extract it from the header, not expect it as a query/body parameter.
+
+### CORS Errors Can Mask Server 500s
+
+When the backend returns a 500 Internal Server Error from an unhandled exception, CORS headers are not added to the response. The browser reports this as a CORS error, hiding the real cause. Always check backend logs first when debugging CORS issues locally.
+
+### Run Migrations Before Local Testing
+
+After cloning or pulling new changes, always run `alembic upgrade head` before starting the backend. Missing migrations cause 500 errors (e.g., `column does not exist`) that manifest as CORS errors in the browser.
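The triage order described in the two notes above (check for a server error before blaming CORS configuration) can be sketched as a small helper. This is only an illustration: `diagnose_browser_cors_error` is a hypothetical name, and it assumes you have already fetched the status code and response headers outside the browser, for example with a command-line HTTP client that sends an `Origin` header.

```python
def diagnose_browser_cors_error(status_code: int, headers: dict) -> str:
    """Classify a response that the browser reported as a CORS failure.

    Hypothetical helper: status_code and headers come from calling the
    endpoint directly, outside the browser.
    """
    # Header names are case-insensitive, so compare in lowercase.
    has_cors_header = any(
        name.lower() == "access-control-allow-origin" for name in headers
    )
    if has_cors_header:
        return "cors-ok"  # CORS headers present; look elsewhere
    if status_code >= 500:
        # Missing CORS headers on a 5xx usually means the request crashed
        # before the CORS layer ran: read the backend logs first.
        return "server-error"
    return "cors-misconfigured"  # Non-5xx without headers: check allowed origins


# Example: a 500 caused by a missing migration carries no CORS headers.
print(diagnose_browser_cors_error(500, {"content-type": "text/plain"}))
```

If the result is `server-error`, the backend log (for instance a `column does not exist` traceback) is the place to look, not the CORS settings.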
+
 ---
 
 ## API Endpoints Reference
@@ -586,7 +627,7 @@ interface Decision {
 
 ### State Management
 
-- **Auth:** `useAuthStore` - Zustand with localStorage persistence
+- **Auth:** `useAuthStore` - Zustand with localStorage persistence (includes `setTokens` for silent refresh sync)
 - **Theme:** `useThemeStore` - Dark/light/system preference
 - **Tree Editor:** `useTreeEditorStore` - Zustand + immer + zundo (undo/redo)
 - **User Preferences:** `useUserPreferencesStore` - Zustand with localStorage persistence (export format default)
@@ -612,9 +653,28 @@ interface Decision {
 import api from '@/api/client'
 
 // Token refresh handled automatically by interceptor
+// Concurrent 401s are queued; only one refresh request fires at a time
+// On refresh failure, user is logged out and redirected to /login
 const response = await api.get('/api/v1/trees')
 ```
 
+### Floating Overlay Pattern (Scratchpad)
+
+The scratchpad uses `position: fixed` with an `onOpenChange` callback so the parent page can adjust layout:
+
+```tsx
+// Child: ScratchpadSidebar.tsx
+onOpenChange?: (isOpen: boolean) => void
+// Fires when collapsed state changes, parent uses it to add/remove padding
+
+// Parent: TreeNavigationPage.tsx
+const [scratchpadOpen, setScratchpadOpen] = useState(...)
+
+  {/* centers in available space */}
+```
+
+Position overlay at `right-2` (not `right-0`) so it sits inside the page scrollbar, and use full `rounded-lg` (not `rounded-l-lg`).
+
 ---
 
 ## Common Tasks
diff --git a/docs/PERFORMANCE-HEALTH-CHECK.md b/docs/PERFORMANCE-HEALTH-CHECK.md
new file mode 100644
index 00000000..52104e0b
--- /dev/null
+++ b/docs/PERFORMANCE-HEALTH-CHECK.md
@@ -0,0 +1,634 @@
+# ResolutionFlow Performance Health Check
+
+**Purpose:** Verify application performance and scalability before/during beta testing
+**When to run:** Before beta launch, then monthly during growth phase
+**Time required:** 2-3 hours first time, 30-60 minutes for routine checks
+
+---
+
+## Prerequisites
+
+- [ ] Docker Desktop running
+- [ ] Access to Railway dashboard
+- [ ] VS Code open with ResolutionFlow project
+- [ ] Python virtual environment activated
+- [ ] Node.js installed (for the frontend build; k6 itself is a standalone binary)
+
+---
+
+## 1. Database Performance Check
+
+### 1.1 Verify Indexes Exist
+
+**Why:** Indexes are like the index in a book - without them, PostgreSQL scans every row (slow). With them, lookups on the indexed columns stay fast as tables grow.
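The effect of an index on the query plan can be seen in a self-contained sketch. Note the swap: this uses Python's built-in SQLite rather than PostgreSQL so it runs without a database server, and the `trees` table here is a simplified stand-in, not the real schema. The planner wording differs between the two databases, but the scan-versus-index-search distinction is the same idea.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's human-readable 'detail' strings for a query."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trees (id INTEGER PRIMARY KEY, name TEXT, created_by INTEGER)"
)
conn.executemany(
    "INSERT INTO trees (name, created_by) VALUES (?, ?)",
    [(f"tree-{i}", i % 50) for i in range(1000)],
)

lookup = "SELECT * FROM trees WHERE created_by = ?"

# Without an index the planner has to scan the whole table.
plan_before = query_plan(conn, lookup, (7,))

# With an index on the filter column it can search the index instead.
conn.execute("CREATE INDEX idx_trees_created_by ON trees(created_by)")
plan_after = query_plan(conn, lookup, (7,))

print("before:", plan_before)
print("after: ", plan_after)
```

On PostgreSQL the equivalent check is `EXPLAIN` on the same query, looking for a sequential scan versus an index scan.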
+
+**Commands to run:**
+```bash
+# Connect to your Railway PostgreSQL database
+# Get connection string from Railway dashboard → PostgreSQL service → Variables → DATABASE_URL
+
+# Option 1: Use Railway CLI
+railway connect PostgreSQL
+
+# Option 2: Use psql directly
+psql "your-database-url-here"
+```
+
+**Once connected, run:**
+```sql
+-- Check what indexes exist
+SELECT
+    tablename,
+    indexname,
+    indexdef
+FROM pg_indexes
+WHERE schemaname = 'public'
+ORDER BY tablename, indexname;
+```
+
+**What you're looking for:**
+
+✅ **GOOD:** You should see indexes on:
+- `users.email` (for login lookups)
+- `users.username` (for login lookups)
+- `trees.created_by` (for "my trees" queries)
+- `tree_nodes.tree_id` (for loading tree structure)
+- `sessions.tree_id` (for session lookups)
+
+❌ **BAD:** If these are missing, queries will slow down as data grows
+
+**Fix if needed:**
+```sql
+-- Example: Add missing index
+CREATE INDEX idx_trees_created_by ON trees(created_by);
+CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
+CREATE INDEX idx_sessions_tree_id ON sessions(tree_id);
+```
+
+### 1.2 Test Query Performance
+
+**Run realistic queries and time them:**
+```sql
+-- Enable timing
+\timing
+
+-- Test: Full-text search on trees (simulates search bar)
+SELECT * FROM trees
+WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
+
+-- Test: Load tree with all nodes (simulates opening tree editor)
+SELECT tn.*
+FROM tree_nodes tn
+WHERE tn.tree_id = 1  -- Replace with actual tree ID
+ORDER BY tn.position;
+
+-- Test: User's tree list (simulates dashboard)
+SELECT * FROM trees
+WHERE created_by = 1  -- Replace with actual user ID
+ORDER BY updated_at DESC
+LIMIT 20;
+```
+
+**Benchmarks:**
+- ✅ **GOOD:** All queries < 50ms
+- ⚠️ **WARNING:** Any query 50-200ms (optimize later)
+- ❌ **BAD:** Any query > 200ms (optimize NOW)
+
+### 1.3 Check Database Size
+```sql
+-- See how much data you have
+SELECT
+    schemaname,
+    tablename,
+    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
+FROM pg_tables
+WHERE schemaname = 'public'
+ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
+```
+
+**What this tells you:** If tables are growing unexpectedly large, you might have data bloat or missing cleanup logic.
+
+---
+
+## 2. Frontend Performance Check
+
+### 2.1 Test Large Tree Rendering
+
+**Create a "stress test" tree:**
+
+1. Log into ResolutionFlow frontend
+2. Create a new tree called "Performance Test - Large Tree"
+3. Add 50-100 nodes (use copy/paste to speed this up)
+4. Save the tree
+
+**What to watch:**
+
+- Does the editor lag when adding nodes?
+- Does scrolling feel smooth?
+- Does saving take more than 2-3 seconds?
+
+**Tools to use:**
+
+Open Chrome DevTools (F12):
+```
+1. Go to Performance tab
+2. Click Record (red circle)
+3. Interact with large tree (scroll, add nodes, expand/collapse)
+4. Stop recording
+5. Look for red bars (blocking/slow operations)
+```
+
+**Benchmarks:**
+- ✅ **GOOD:** No operations block for > 100ms
+- ⚠️ **WARNING:** Some operations 100-300ms
+- ❌ **BAD:** Operations > 300ms (users will notice lag)
+
+### 2.2 Check Bundle Size
+
+**Why:** Large JavaScript bundles = slow initial page load
+```bash
+# From your React frontend directory
+cd frontend
+npm run build
+
+# Look at the output - it will show bundle sizes
+```
+
+**Benchmarks:**
+- ✅ **GOOD:** Main bundle < 500KB gzipped
+- ⚠️ **WARNING:** 500KB - 1MB
+- ❌ **BAD:** > 1MB (investigate what's bloating it)
+
+### 2.3 Lighthouse Audit
+
+**Chrome has this built-in:**
+```
+1. Open ResolutionFlow in Chrome
+2. F12 → Lighthouse tab
+3. Select "Desktop" + "Performance"
+4. Click "Analyze page load"
+```
+
+**Benchmarks:**
+- ✅ **GOOD:** Performance score > 80
+- ⚠️ **WARNING:** 60-80
+- ❌ **BAD:** < 60
+
+**Common issues and fixes:**
+- "Eliminate render-blocking resources" → lazy load components
+- "Reduce unused JavaScript" → code splitting needed
+- "Serve images in next-gen formats" → use WebP instead of PNG
+
+---
+
+## 3. API Response Time Check
+
+### 3.1 Manual Timing Test
+
+**Use Railway logs:**
+```
+1. Go to Railway dashboard → API service → Deployments
+2. Click "View Logs"
+3. Perform actions in ResolutionFlow frontend
+4. Watch logs for response times
+```
+
+FastAPI logs look like (the trailing duration assumes request-timing middleware; Uvicorn's default access log omits it):
+```
+INFO: 127.0.0.1 - "GET /api/trees HTTP/1.1" 200 OK [0.023s]
+```
+
+**Benchmarks:**
+- ✅ **GOOD:** Most endpoints < 100ms
+- ⚠️ **WARNING:** Some endpoints 100-300ms
+- ❌ **BAD:** Any endpoint > 500ms
+
+### 3.2 Automated API Testing
+
+**Create a simple test script:**
+```python
+# File: tests/performance_test.py
+
+import httpx
+import time
+from statistics import mean
+
+API_BASE = "https://api.resolutionflow.com"  # Your Railway API URL
+TOKEN = "your-jwt-token-here"  # Get from browser DevTools after login
+
+headers = {"Authorization": f"Bearer {TOKEN}"}
+
+def time_endpoint(method, path, **kwargs):
+    """Time a single API request"""
+    start = time.time()
+    response = httpx.request(method, f"{API_BASE}{path}", headers=headers, **kwargs)
+    elapsed = (time.time() - start) * 1000  # Convert to milliseconds
+    return elapsed, response.status_code
+
+# Test critical endpoints: (method, path, extra keyword arguments for httpx.request)
+tests = [
+    ("GET", "/api/trees", {}),
+    ("GET", "/api/trees/1", {}),  # Replace with actual tree ID
+    ("GET", "/api/trees/1/nodes", {}),
+    ("POST", "/api/trees/search", {"json": {"query": "password"}}),
+]
+
+print("API Performance Test Results:")
+print("-" * 50)
+
+for method, path, kwargs in tests:
+    times = []
+    for i in range(5):  # Run each test 5 times
+        elapsed, status = time_endpoint(method, path, **kwargs)
+        times.append(elapsed)
+
+    avg_time = mean(times)
+    print(f"{method} {path}")
+    print(f"  Average: {avg_time:.2f}ms")
+    print(f"  Min: {min(times):.2f}ms, Max: {max(times):.2f}ms")
+    print()
+```
+
+**Run it:**
+```bash
+python tests/performance_test.py
+```
+
+---
+
+## 4. Monitoring Setup
+
+### 4.1 Railway Built-in Monitoring
+
+**What Railway gives you for free:**
+```
+1. Go to Railway dashboard
+2. Click each service (API, Frontend, PostgreSQL)
+3. Go to "Metrics" tab
+```
+
+**Watch for:**
+- CPU usage spikes (should stay < 50% normally)
+- Memory usage growing over time (memory leak indicator)
+- Request rate (see usage patterns)
+
+**Set up alerts:**
+```
+1. Railway dashboard → Project Settings → Notifications
+2. Add your email
+3. Enable "Deployment Failed" and "Service Crashed"
+```
+
+### 4.2 Sentry Error Tracking (Recommended)
+
+**Why add Sentry:**
+- Free tier = 5,000 errors/month
+- Email alerts when things break
+- See exact user actions before crash
+- Industry standard (your future dev team will expect this)
+
+**Setup (5 minutes):**
+
+**Backend (FastAPI):**
+```bash
+pip install sentry-sdk[fastapi]
+```
+```python
+# File: main.py (add at the top)
+
+import sentry_sdk
+
+sentry_sdk.init(
+    dsn="your-sentry-dsn-here",  # Get from sentry.io after signup
+    traces_sample_rate=0.1,  # 10% of requests (free tier friendly)
+    environment="production",
+)
+```
+
+**Frontend (React):**
+```bash
+npm install @sentry/react
+```
+```javascript
+// File: src/index.js (add at the top)
+
+import * as Sentry from "@sentry/react";
+
+Sentry.init({
+  dsn: "your-sentry-dsn-here",
+  integrations: [new Sentry.BrowserTracing()],
+  tracesSampleRate: 0.1,
+  environment: "production",
+});
+```
+
+**Get your DSN:**
+```
+1. Sign up at sentry.io (free)
+2. Create new project → Select "FastAPI" and "React"
+3. Copy the DSN (looks like: https://abc123@o123.ingest.sentry.io/456)
+4. Add to Railway environment variables:
+   - SENTRY_DSN=your-dsn-here
+```
+
+**What you get:**
+
+- Email when errors occur
+- Stack traces showing exactly what broke
+- User session replay (see what they clicked before crash)
+- Performance monitoring (slow API calls flagged automatically)
+
+---
+
+## 5. Load Testing with k6
+
+**Why k6:**
+- Industry standard (Grafana Labs maintains it)
+- Shows you EXACTLY how many concurrent users your app can handle
+- Simple JavaScript syntax
+- Free and open source
+
+### 5.1 Install k6
+
+**Windows (using Chocolatey):**
+```powershell
+choco install k6
+```
+
+**Or download directly:**
+- Go to: https://k6.io/docs/get-started/installation/
+- Download Windows installer
+- Run installer
+
+**Verify:**
+```bash
+k6 version
+```
+
+### 5.2 Create Load Test Script
+
+**File: `tests/load_test.js`**
+```javascript
+import http from 'k6/http';
+import { check, sleep } from 'k6';
+
+// Test configuration
+export const options = {
+  stages: [
+    { duration: '30s', target: 10 },  // Ramp up to 10 users over 30s
+    { duration: '1m', target: 10 },   // Stay at 10 users for 1 minute
+    { duration: '30s', target: 20 },  // Ramp up to 20 users
+    { duration: '1m', target: 20 },   // Stay at 20 users for 1 minute
+    { duration: '30s', target: 0 },   // Ramp down to 0
+  ],
+  thresholds: {
+    http_req_duration: ['p(95)<500'],  // 95% of requests must complete in 500ms
+    http_req_failed: ['rate<0.01'],    // Less than 1% of requests can fail
+  },
+};
+
+const BASE_URL = 'https://api.resolutionflow.com';
+let authToken;
+
+// Setup: runs once before the test; every virtual user shares the returned token
+export function setup() {
+  const loginRes = http.post(`${BASE_URL}/api/auth/login`,
+    JSON.stringify({
+      username: 'test_user',  // Replace with test account
+      password: 'test_password',
+    }),
+    { headers: { 'Content-Type': 'application/json' } }
+  );
+
+  return { token: loginRes.json('access_token') };
+}
+
+// Main test: Simulate realistic user behavior
+export default function (data) {
+  const headers = {
+    'Authorization': `Bearer ${data.token}`,
+    'Content-Type': 'application/json',
+  };
+
+  // Scenario 1: Load dashboard (get trees list)
+  let res = http.get(`${BASE_URL}/api/trees`, { headers });
+  check(res, {
+    'dashboard loaded': (r) => r.status === 200,
+    'dashboard fast': (r) => r.timings.duration < 300,
+  });
+  sleep(1);  // User reads for 1 second
+
+  // Scenario 2: Open a tree
+  res = http.get(`${BASE_URL}/api/trees/1`, { headers });  // Replace with real tree ID
+  check(res, {
+    'tree loaded': (r) => r.status === 200,
+    'tree load fast': (r) => r.timings.duration < 500,
+  });
+  sleep(2);  // User reads tree for 2 seconds
+
+  // Scenario 3: Load tree nodes
+  res = http.get(`${BASE_URL}/api/trees/1/nodes`, { headers });
+  check(res, {
+    'nodes loaded': (r) => r.status === 200,
+    'nodes fast': (r) => r.timings.duration < 500,
+  });
+  sleep(1);
+
+  // Scenario 4: Search trees
+  res = http.post(
+    `${BASE_URL}/api/trees/search`,
+    JSON.stringify({ query: 'password reset' }),
+    { headers }
+  );
+  check(res, {
+    'search worked': (r) => r.status === 200,
+    'search fast': (r) => r.timings.duration < 400,
+  });
+  sleep(2);
+}
+```
+
+### 5.3 Run Load Test
+
+**Basic test (staged ramp: 10, then 20 users):**
+```bash
+k6 run tests/load_test.js
+```
+
+**Aggressive test (50 users):**
+```bash
+k6 run --vus 50 --duration 2m tests/load_test.js
+```
+
+**What the output means:**
+```
+     ✓ dashboard loaded
+     ✓ dashboard fast
+
+     checks.........................: 95.23% ✓ 1234 ✗ 78
+     data_received..................: 1.2 MB 20 kB/s
+     data_sent......................: 456 kB 7.6 kB/s
+     http_req_blocked...............: avg=1.2ms min=0s med=0s max=45ms p(90)=0s p(95)=0s
+     http_req_duration..............: avg=142ms min=23ms med=98ms max=1.2s p(90)=245ms p(95)=387ms
+     http_reqs......................: 1234 20.5/s
+```
+
+**How to read this:**
+
+- `checks`: % of tests that passed (want > 95%)
+- `http_req_duration p(95)`: 95% of requests faster than this (want < 500ms)
+- `http_reqs`: Requests per second your app handled
+- `http_req_failed`: % of requests that errored (want < 1%)
+
+### 5.4 Interpret Results
+
+**✅ GOOD (Ready for beta):**
+```
+http_req_duration p(95) < 500ms
+http_req_failed < 1%
+All checks passing > 95%
+```
+
+**⚠️ WARNING (Watch closely during beta):**
+```
+http_req_duration p(95) 500-1000ms
+http_req_failed 1-5%
+Some checks failing
+```
+
+**❌ BAD (Fix before beta launch):**
+```
+http_req_duration p(95) > 1000ms
+http_req_failed > 5%
+Lots of timeouts or 500 errors
+```
+
+---
+
+## 6. Pre-Launch Checklist
+
+Run this checklist **before** inviting beta testers:
+
+### Database
+- [ ] All critical indexes exist (Section 1.1)
+- [ ] Query performance < 200ms (Section 1.2)
+- [ ] No unexplained table bloat (Section 1.3)
+
+### Frontend
+- [ ] Large tree (100 nodes) renders without lag (Section 2.1)
+- [ ] Bundle size < 1MB (Section 2.2)
+- [ ] Lighthouse score > 70 (Section 2.3)
+
+### API
+- [ ] All endpoints < 500ms under load (Section 3)
+- [ ] Railway logs show no errors (Section 4.1)
+
+### Monitoring
+- [ ] Railway alerts configured (Section 4.1)
+- [ ] Sentry installed (optional but recommended) (Section 4.2)
+
+### Load Testing
+- [ ] k6 test passes with 20 concurrent users (Section 5.3)
+- [ ] No request failures during load test (Section 5.4)
+
+---
+
+## 7. Monthly Health Check (After Launch)
+
+Once live with beta testers, run this monthly:
+
+**Quick version (30 minutes):**
+```bash
+# 1. Check Railway metrics
+# Look for: CPU/memory trends, error rate spikes
+
+# 2. Review Sentry errors (if installed)
+# Look for: New error patterns, increasing error rates
+
+# 3. Run quick load test
+k6 run tests/load_test.js
+
+# 4. Check database query times
+# Run queries from Section 1.2, watch for slowdowns
+```
+
+**When to do deep dive:**
+- After adding major new features
+- If users report slowness
+- Before scaling to new MSP clients
+- Every 3 months minimum
+
+---
+
+## 8. Common Performance Issues & Fixes
+
+### Issue: "Search is slow"
+
+**Diagnosis:**
+```sql
+EXPLAIN ANALYZE
+SELECT * FROM trees
+WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('english', 'password');
+```
+
+**Fix:** Add GIN index:
+```sql
+CREATE INDEX idx_trees_fts ON trees USING GIN (to_tsvector('english', name || ' ' || description));
+```
+
+### Issue: "Loading tree nodes is slow"
+
+**Diagnosis:** Missing index on foreign key
+
+**Fix:**
+```sql
+CREATE INDEX idx_tree_nodes_tree_id ON tree_nodes(tree_id);
+```
+
+### Issue: "Dashboard takes forever to load"
+
+**Diagnosis:** Fetching too much data
+
+**Fix:** Add pagination to API:
+```python
+# Instead of: SELECT * FROM trees
+# Use: SELECT * FROM trees LIMIT 20 OFFSET 0
+```
+
+### Issue: "Frontend feels sluggish"
+
+**Diagnosis:** Re-rendering too often
+
+**Fix:** Add React.memo() to components, use proper dependency arrays in useEffect
+
+### Issue: "API crashes under load"
+
+**Diagnosis:** Not enough Railway resources
+
+**Fix:**
+```
+1. Railway dashboard → API service → Settings
+2. Increase memory limit (default is 512MB, try 1GB)
+3. Enable auto-scaling if needed
+```
+
+---
+
+## Resources
+
+**Tools mentioned:**
+- k6: https://k6.io/docs/
+- Sentry: https://sentry.io/
+- PostgreSQL EXPLAIN: https://www.postgresql.org/docs/current/using-explain.html
+- Chrome Lighthouse: Built into Chrome DevTools (F12)
+
+**When to get help:**
+- k6 test failing badly (> 10% error rate)
+- Database queries consistently > 1 second
+- Sentry showing critical errors
+- Railway CPU/memory maxing out
+
+**Next steps after this checklist:**
+- If all checks pass → Launch beta confidently
+- If warnings found → Document them, monitor during beta
+- If critical issues → Fix before launch, re-run tests
\ No newline at end of file