docs: add permissions audit design doc and implementation plan

Full-stack RBAC audit covering frontend UX, backend architecture,
and adversarial analysis. Implementation plan phased by severity
(Critical → High → Medium → Low).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Michael Chihlas
2026-02-05 17:42:38 -05:00
parent 34daa26a67
commit 02bd97948e
2 changed files with 656 additions and 0 deletions

View File

@@ -0,0 +1,227 @@
# Permissions & RBAC Audit — Design Document
> **Date:** 2026-02-05
> **Status:** Draft
> **Scope:** Full-stack permissions audit across trees, steps, sessions, teams, and admin features
---
## Background
ResolutionFlow has a role-based access control system with three roles (`admin`, `engineer`, `viewer`), a `is_team_admin` boolean, and team-scoped resources. A comprehensive audit was performed across the entire codebase — frontend UX, backend architecture, and adversarial analysis — to assess the current state of permissions enforcement.
**Audit methodology:** Three independent reviews were conducted:
1. **Frontend UX audit** — every page, component, API client, store, and router
2. **Backend architecture audit** — every endpoint, dependency, model, schema, and test
3. **Devil's advocate critique** — security gaps, edge cases, challenged assumptions
---
## Current State Summary
### What Works
- **Session ownership** is properly enforced — users can only see/modify their own sessions
- **Folder ownership** is properly scoped to the creating user
- **Tree CRUD** has reasonable backend permission checks (engineer+ to create, author/admin/team-admin to edit, admin-only delete)
- **Step Library** enforces visibility levels (private/team/public) and ownership for edit/delete
- **Category/Tag management** is properly gated to admin and team admin roles
- **Invite code management** is admin-only
### What's Broken
The permission system has critical vulnerabilities, significant gaps, and inconsistencies across resource types.
---
## Findings by Severity
### CRITICAL
#### 1. Self-Assignable Admin Role at Registration
**Location:** `backend/app/schemas/user.py:14`, `backend/app/api/endpoints/auth.py:74-78`
The `UserCreate` schema accepts a `role` field from client input. The registration endpoint passes it directly to the User model. Any user with a valid invite code can register with `role: "admin"` and gain full admin privileges.
```python
# Current — VULNERABLE
class UserCreate(UserBase):
role: str = Field(default="engineer", ...)
# Registration blindly trusts client input
new_user = User(role=user_data.role, ...)
```
**Impact:** Complete authorization bypass. A single request escalates to admin.
#### 2. XSS in HTML Export
**Location:** `backend/app/api/endpoints/sessions.py:349-405`
The HTML export function directly interpolates user-provided strings without escaping:
```python
html.append(f'<h1>{tree_name}</h1>')
html.append(f'<p><strong>Ticket:</strong> {session.ticket_number}</p>')
html.append(f'<p><strong>Client:</strong> {session.client_name}</p>')
```
If any field contains `<script>` tags, the exported HTML becomes an XSS payload. Since exports are likely pasted into ticketing systems or emailed, this is a stored XSS vector.
**Impact:** Arbitrary JavaScript execution in any system that renders the exported HTML.
#### 3. Default Secret Key in Source Code
**Location:** `backend/app/core/config.py:29`
If `.env` is missing or `SECRET_KEY` isn't set, JWTs are signed with a publicly known default string. Anyone can forge admin tokens.
**Impact:** Complete authentication bypass if deployed without proper `.env` configuration.
---
### HIGH
#### 4. No Access Control on `start_session`
**Location:** `backend/app/api/endpoints/sessions.py:69-109`
The endpoint checks tree existence and active status but does NOT check if the user has permission to access the tree. Any authenticated user can start a session on any active tree (including private/team-only) if they know the UUID, receiving the full `tree_structure` in the session's `tree_snapshot`.
**Impact:** Information disclosure — bypasses all tree visibility rules.
#### 5. No Token Revocation
**Location:** `backend/app/api/endpoints/auth.py:188-193`
Logout is a client-side no-op. JWTs remain valid until expiration (15 min access, 7 day refresh). Combined with no account deactivation, a terminated employee retains 7 days of access.
**Impact:** Cannot immediately revoke access to client troubleshooting data.
#### 6. No Account Deactivation (`is_active`)
**Location:** `backend/app/api/deps.py:66-72`
The `get_current_active_user` dependency is a no-op — no `is_active` field exists on the User model. The only way to block a user is to delete their database row, which cascades all their data.
**Impact:** No graceful way to disable accounts.
#### 7. No Rate Limiting on Auth Endpoints
Login, registration, and invite code validation have no rate limiting. The unauthenticated `/invites/validate/{code}` endpoint allows unlimited invite code enumeration.
**Impact:** Brute force attacks and invite code enumeration.
#### 8. `is_team_admin` Cannot Be Set
**Location:** `backend/app/models/user.py:28`
The field exists and permission checks reference it, but no API endpoint can set it. The entire "team admin" permission tier is dead code in production.
**Impact:** Category management, step category management, and team-scoped tag management are admin-only in practice.
---
### MEDIUM
#### 9. Frontend Ignores All Roles
**Location:** `frontend/src/components/layout/ProtectedRoute.tsx`, `frontend/src/router.tsx`
`ProtectedRoute` only checks `isAuthenticated`. No role-based route guards, no conditional UI rendering, no admin pages. All users see the same interface regardless of role.
Specific issues:
- Edit button appears on ALL tree cards for ALL users (`TreeLibraryPage.tsx:309-318`)
- "Create Tree" button visible to viewers who will get 403 (`TreeLibraryPage.tsx:146-155`)
- No 403 error handling in the Axios interceptor
- No admin dashboard despite backend admin APIs existing
#### 10. `None == None` Team Visibility Bug
**Location:** `backend/app/api/endpoints/steps.py:29-37`
```python
if step.visibility == 'team':
return step.team_id == user.team_id or user.role == 'admin'
```
When both `step.team_id` and `user.team_id` are `None`, this evaluates to `True`. Users with no team can see "team" steps that also have no team.
#### 11. Admin List vs. Get Tree Inconsistency
`build_tree_access_filter` (used in `list_trees`) does NOT include an admin bypass, but `get_tree` does. Admins can access individual trees by ID but cannot discover them via the list endpoint.
#### 12. No Audit Logging
No record of administrative actions: tree deletion, invite code creation/revocation, category changes, permission modifications, or failed login attempts.
#### 13. No Delete UI for Trees or Steps
`treesApi.delete()` and `stepsApi.delete()` exist in the frontend API client but are never wired to any button or UI component. Trees and steps cannot be deleted from the frontend.
#### 14. Tree Node Deletion Without Confirmation
**Location:** `frontend/src/components/tree-editor/NodeList.tsx:288-298`
Clicking the trash icon in the tree editor immediately deletes the node with no confirmation dialog.
---
### LOW
| # | Issue | Location |
|---|-------|----------|
| 15 | `viewer` role accepted but barely enforced | `schemas/user.py`, all endpoints |
| 16 | No password complexity beyond 10-char minimum | `schemas/user.py:13` |
| 17 | Soft delete doesn't cascade-clean tags, folder memberships | `trees.py` delete endpoint |
| 18 | Debug CORS endpoint exposed in production | `main.py:86-94` |
| 19 | Tag search `ilike` doesn't escape `%` and `_` wildcards | `tags.py:97` |
| 20 | No team existence validation for admin step creation | `steps.py` create endpoint |
---
## MSP-Specific Concerns
1. **Weak multi-tenancy:** Admin role transcends all teams. Team A's admin could access Team B's client troubleshooting data (ticket numbers, client names, IP addresses in scratchpads).
2. **No data retention/purge:** Session scratchpads and decision notes contain sensitive client information with no cleanup mechanism.
3. **No team-level session visibility:** Supervisors can't review their team's sessions — only the session owner can view them.
4. **Step Library IP leakage:** Public steps expose team-specific troubleshooting procedures (commands, scripts, procedures) to all teams with no approval workflow.
---
## Challenged Assumptions
| Assumption | Reality |
|-----------|---------|
| "Invite codes protect registration" | They don't prevent role escalation — the role field is independent |
| "Team scoping = tenant isolation" | Admin role transcends all teams with no boundary |
| "Soft delete is sufficient" | Leaves data artifacts (sessions with tree_snapshot, folder memberships, tag assignments) |
| "JWT statelessness is acceptable" | For an MSP tool with client data, inability to immediately revoke access is a liability |
| "Frontend doesn't need role awareness" | Without it, every user sees affordances they can't use — confusing UX and attacker map |
---
## Test Coverage Gaps
The existing test suite does NOT cover:
- Cross-user resource access (user A accessing user B's sessions/folders/steps)
- Non-admin attempting tree deletion
- Viewer role restrictions
- Admin role escalation at registration
- Team-based access controls (no team fixtures exist)
- Step library permission checks (no step tests at all)
- Category/tag/folder cross-user access
- Concurrent session access across users
---
## References
- Frontend source: `frontend/src/`
- Backend endpoints: `backend/app/api/endpoints/`
- Auth dependencies: `backend/app/api/deps.py`
- Models: `backend/app/models/`
- Tests: `backend/tests/`

View File

@@ -0,0 +1,429 @@
# Permissions & RBAC Fixes — Implementation Plan
> **Date:** 2026-02-05
> **Status:** Draft
> **Depends on:** [2026-02-05-permissions-audit-design.md](2026-02-05-permissions-audit-design.md)
> **Scope:** Phased fix of all permission issues identified in the audit
---
## Phasing Strategy
Fixes are grouped into four phases by severity and dependency:
- **Phase A:** Critical security fixes (must-do before any multi-user use)
- **Phase B:** High-severity gaps (needed for MSP readiness)
- **Phase C:** Medium-severity improvements (better UX and consistency)
- **Phase D:** Low-severity cleanup (nice-to-haves)
Each phase is independently deployable. Phase A should be done as a single PR.
---
## Phase A: Critical Security Fixes
**Branch:** `fix/critical-security`
**Priority:** Immediate — blocks all other work
### A1. Remove Self-Assignable Admin Role
**Files to modify:**
- `backend/app/schemas/user.py` — Remove `role` from `UserCreate`
- `backend/app/api/endpoints/auth.py` — Hardcode `role="engineer"` in registration
**Changes:**
```python
# schemas/user.py — Remove role from UserCreate
class UserCreate(UserBase):
# role field REMOVED — always defaults to "engineer"
password: str = Field(..., min_length=10)
# auth.py — Hardcode role
new_user = User(
...
role="engineer", # Always engineer; admin set via admin endpoint only
...
)
```
**New endpoint (optional, can defer to Phase B):**
- `PUT /api/v1/admin/users/{user_id}/role` — Admin-only endpoint to change user roles
**Tests to add:**
- Attempt to register with `role: "admin"` → should still be engineer
- Verify no field in registration request can escalate privileges
### A2. Escape HTML Export Output
**Files to modify:**
- `backend/app/api/endpoints/sessions.py` — HTML export function
**Changes:**
```python
from html import escape
# Escape all user-provided content
html.append(f'<h1>{escape(tree_name)}</h1>')
html.append(f'<p><strong>Ticket:</strong> {escape(session.ticket_number or "")}</p>')
html.append(f'<p><strong>Client:</strong> {escape(session.client_name or "")}</p>')
# ... escape all interpolated values
```
**Tests to add:**
- Export session with `<script>` in ticket number → verify escaped in output
- Export session with HTML entities in notes → verify escaped
### A3. Fail on Default Secret Key in Production
**Files to modify:**
- `backend/app/core/config.py`
**Changes:**
```python
@validator("SECRET_KEY")
def validate_secret_key(cls, v):
if not v or v == "your-secret-key-change-in-production-use-openssl-rand-hex-32":
import os
if os.getenv("DEBUG", "true").lower() != "true":
raise ValueError("SECRET_KEY must be set to a secure value in production")
return v
```
### A4. Add Role Enum Constraint
**Files to modify:**
- `backend/app/models/user.py` — Add check constraint or use SQLAlchemy Enum
- New migration
**Changes:**
```python
from sqlalchemy import Enum as SAEnum
role = Column(SAEnum("admin", "engineer", "viewer", name="user_role"),
default="engineer", nullable=False)
```
**Migration:** `alembic revision --autogenerate -m "Add role enum constraint"`
---
## Phase B: High-Severity Gaps
**Branch:** `fix/high-security`
**Priority:** Before production MSP use
### B1. Add Tree Access Check to `start_session`
**Files to modify:**
- `backend/app/api/endpoints/sessions.py`
**Changes:** After fetching the tree, add the same access check used in `get_tree`:
```python
can_access = (
tree.is_default or tree.is_public or
tree.author_id == current_user.id or
(tree.team_id == current_user.team_id and current_user.team_id is not None) or
current_user.role == "admin"
)
if not can_access:
raise HTTPException(status_code=403, detail="You don't have access to this tree")
```
**Tests to add:**
- User starts session on own tree → success
- User starts session on public tree → success
- User starts session on private team tree they don't belong to → 403
- Admin starts session on any tree → success
### B2. Add `is_active` Field and Account Deactivation
**Files to modify:**
- `backend/app/models/user.py` — Add `is_active = Column(Boolean, default=True)`
- `backend/app/api/deps.py` — Implement `get_current_active_user` check
- New migration
**Changes:**
```python
# deps.py
async def get_current_active_user(current_user = Depends(get_current_user)):
if not current_user.is_active:
raise HTTPException(status_code=403, detail="Account has been deactivated")
return current_user
```
**Update all endpoints** to use `get_current_active_user` instead of `get_current_user` as their base dependency.
**New endpoint:**
- `PUT /api/v1/admin/users/{user_id}/deactivate` — Admin-only
### B3. Add User Management Endpoints
**New file:** `backend/app/api/endpoints/admin.py`
**Endpoints:**
| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/v1/admin/users` | List all users (admin only) |
| GET | `/api/v1/admin/users/{id}` | Get user details (admin only) |
| PUT | `/api/v1/admin/users/{id}/role` | Change user role (admin only) |
| PUT | `/api/v1/admin/users/{id}/team-admin` | Toggle is_team_admin (admin only) |
| PUT | `/api/v1/admin/users/{id}/deactivate` | Deactivate account (admin only) |
| PUT | `/api/v1/admin/users/{id}/activate` | Reactivate account (admin only) |
**Files to modify:**
- `backend/app/api/router.py` — Register admin router
- `backend/app/schemas/user.py` — Add admin response/update schemas
### B4. Add Rate Limiting
**New dependency:** `slowapi` or custom middleware
**Files to modify:**
- `backend/requirements.txt` — Add `slowapi`
- `backend/app/main.py` — Add rate limiter
- `backend/app/api/endpoints/auth.py` — Apply rate limits
**Rate limits:**
| Endpoint | Limit |
|----------|-------|
| `POST /auth/login` | 5/minute per IP |
| `POST /auth/register` | 3/minute per IP |
| `POST /auth/refresh` | 10/minute per IP |
| `GET /invites/validate/{code}` | 5/minute per IP |
### B5. Token Revocation (Short-Term Fix)
**Approach:** Switch to short-lived access tokens (5 min) and store refresh tokens in the database so they can be revoked.
**Files to modify:**
- `backend/app/models/` — New `RefreshToken` model (token hash, user_id, expires_at, revoked_at)
- `backend/app/api/endpoints/auth.py` — Store refresh tokens on login, validate on refresh, delete on logout
- `backend/app/core/config.py` — Reduce `ACCESS_TOKEN_EXPIRE_MINUTES` to 5
- New migration
**Logout becomes meaningful:**
```python
@router.post("/logout")
async def logout(payload = Depends(get_refresh_token_payload), db = Depends(get_db)):
# Revoke the refresh token in the database
await db.execute(
update(RefreshToken)
.where(RefreshToken.token_hash == hash_token(payload["jti"]))
.values(revoked_at=datetime.now(timezone.utc))
)
return {"message": "Successfully logged out"}
```
---
## Phase C: Medium-Severity Improvements
**Branch:** `feat/permissions-ux`
**Priority:** Better UX and consistency
### C1. Frontend Role-Based UI Gating
**Files to modify:**
- `frontend/src/components/layout/ProtectedRoute.tsx` — Add `requiredRole` prop
- `frontend/src/router.tsx` — Apply role guards to admin routes
- `frontend/src/pages/TreeLibraryPage.tsx` — Conditionally show create/edit buttons
- `frontend/src/api/client.ts` — Add 403 handling in Axios interceptor
**New component:**
```typescript
// ProtectedRoute with role checking
interface ProtectedRouteProps {
requiredRole?: UserRole | UserRole[]
requireTeamAdmin?: boolean
children: React.ReactNode
}
```
**403 interceptor pattern:**
```typescript
if (error.response?.status === 403) {
toast.error("You don't have permission to perform this action")
return Promise.reject(error)
}
```
### C2. Add Delete UI for Trees and Steps
**Files to modify:**
- `frontend/src/pages/TreeLibraryPage.tsx` — Add delete button (admin/owner only)
- `frontend/src/components/step-library/StepCard.tsx` — Add delete button (owner/admin only)
- `frontend/src/components/common/ConfirmDialog.tsx` — New reusable confirmation modal
**Visibility rules:**
- Tree delete button: visible to admin only (matches backend)
- Step delete button: visible to step owner and admin
- Both require confirmation dialog before API call
### C3. Fix `None == None` Team Visibility Bug
**Files to modify:**
- `backend/app/api/endpoints/steps.py`
**Change:**
```python
def can_view_step(user: User, step: StepLibrary) -> bool:
if step.visibility == 'team':
return (step.team_id is not None
and step.team_id == user.team_id) or user.role == 'admin'
```
### C4. Fix Admin Tree List Inconsistency
**Files to modify:**
- `backend/app/api/endpoints/trees.py`
**Change:** Add admin bypass to `build_tree_access_filter`:
```python
def build_tree_access_filter(current_user: User):
conditions = [
Tree.is_default == True,
Tree.is_public == True,
Tree.author_id == current_user.id,
]
if current_user.team_id:
conditions.append(Tree.team_id == current_user.team_id)
if current_user.role == "admin":
conditions.append(True) # Admin sees all
return or_(*conditions)
```
### C5. Add Node Deletion Confirmation
**Files to modify:**
- `frontend/src/components/tree-editor/NodeList.tsx` — Add confirmation before `deleteNode()`
### C6. Add Audit Log Table
**New file:** `backend/app/models/audit_log.py`
```python
class AuditLog(Base):
__tablename__ = "audit_logs"
id = Column(UUID, primary_key=True, default=uuid4)
user_id = Column(UUID, ForeignKey("users.id"))
action = Column(String(50)) # "tree.delete", "user.role_change", etc.
resource_type = Column(String(50)) # "tree", "step", "user", etc.
resource_id = Column(UUID)
details = Column(JSONB) # Before/after values
created_at = Column(DateTime(timezone=True), default=lambda: datetime.now(timezone.utc))
```
Integrate into admin endpoints and destructive actions.
---
## Phase D: Low-Severity Cleanup
**Branch:** `chore/permissions-cleanup`
**Priority:** When convenient
### D1. Define and Enforce Viewer Role Permissions
Decide what viewers can/cannot do and enforce consistently:
- **Proposed:** Viewers can browse trees, start sessions, and view steps. Cannot create trees, steps, or manage folders/tags/categories.
- Add `require_engineer_or_admin` to step creation, folder creation endpoints.
### D2. Add Password Complexity Validation
**Files to modify:** `backend/app/schemas/user.py`
Add regex validator: at least one uppercase, one lowercase, one digit, 10+ characters.
### D3. Soft Delete Cascade Cleanup
When soft-deleting a tree, also:
- Remove from all folder assignments
- Remove tag assignments
- Keep sessions intact (they have the snapshot)
### D4. Remove Debug Endpoint in Production
**Files to modify:** `backend/app/main.py`
Conditionally register `/debug/cors` only when `DEBUG=true`.
### D5. Escape Wildcards in Tag Search
**Files to modify:** `backend/app/api/endpoints/tags.py`
```python
q_escaped = q.replace("%", "\\%").replace("_", "\\_")
query = select(TreeTag).where(TreeTag.name.ilike(f"%{q_escaped}%"))
```
---
## Execution Order Summary
```
Phase A (Critical) ──── Single PR, immediate
A1. Remove self-assignable admin role
A2. Escape HTML export
A3. Fail on default secret key
A4. Add role enum constraint
Phase B (High) ──── 2-3 PRs
B1. Tree access check on start_session
B2. is_active field + account deactivation
B3. User management admin endpoints
B4. Rate limiting
B5. Token revocation
Phase C (Medium) ──── 3-4 PRs
C1. Frontend role-based UI gating
C2. Delete UI for trees and steps
C3. Fix None==None team visibility
C4. Fix admin tree list inconsistency
C5. Node deletion confirmation
C6. Audit log table
Phase D (Low) ──── As convenient
D1. Viewer role enforcement
D2. Password complexity
D3. Soft delete cascade cleanup
D4. Remove debug endpoint in prod
D5. Escape wildcards in tag search
```
---
## Estimated Scope
| Phase | Backend Changes | Frontend Changes | Migrations | New Tests |
|-------|----------------|-----------------|------------|-----------|
| A | 4 files modified | 0 | 1 | ~8 tests |
| B | 6 files modified, 2 new | 0 | 2 | ~15 tests |
| C | 3 files modified, 2 new | 5 files modified, 1 new | 1 | ~10 tests |
| D | 4 files modified | 0 | 0 | ~5 tests |
---
## Testing Strategy
Each phase should include:
1. **Unit tests** for new permission checks
2. **Integration tests** for cross-user access scenarios
3. **Negative tests** — verify unauthorized access returns 403, not 500
4. Manual verification of frontend behavior per role
**Priority test scenarios:**
- Register with `role: admin` → must be engineer
- User A accesses User B's session → 403
- Viewer creates a tree → 403
- Start session on private tree without access → 403
- Login with deactivated account → 403
- Exported HTML with `<script>` tag → escaped
- Brute force login → rate limited after 5 attempts
---
## Notes
- Phase A changes are backward-compatible — no frontend changes needed
- Phase B introduces new admin endpoints — frontend admin pages can follow in Phase C
- Token revocation (B5) is the most complex change — consider deferring to a dedicated PR
- All new endpoints should follow existing patterns in `deps.py` for consistency