diff --git a/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md b/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md new file mode 100644 index 00000000..1985f2c0 --- /dev/null +++ b/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md @@ -0,0 +1,572 @@ +# Tenant Data Isolation — Design Spec + +> **Date:** 2026-04-09 +> **Status:** Approved — ready for implementation planning +> **Approach:** PostgreSQL RLS as safety net; application-layer filtering as primary enforcement + +--- + +## Overview + +ResolutionFlow is a multi-tenant SaaS. The tenant boundary is `account_id` (UUID foreign key to `accounts.id`). Two accounts must be completely isolated at every layer: one client must never be able to see, access, query, or receive data belonging to another client — even if there is a bug in application code. + +This spec establishes the complete foundation that must be in place before further feature development proceeds. + +### Design Principle + +**Application code is the primary enforcement mechanism. PostgreSQL Row-Level Security (RLS) is the safety net.** + +Both layers must always be present. Developers must never rely on RLS to do filtering that application code should be doing. A missing `tenant_filter()` in application code is a code review failure even if RLS would have caught it. + +--- + +## Current State + +- Tenant boundary: `account_id` (not `team_id` — that column is legacy) +- Isolation today: 100% application-layer, manual per-endpoint `.where(account_id == ...)` +- No RLS, no DB session variables, no middleware tenant injection +- Nullable `account_id` on ~8 core models (User, Tree, TreeCategory, etc.) +- ~20 tables with no direct `account_id` (scoped only through join chains) +- 4 models with `team_id` only, no `account_id`: TargetList, ScriptBuilderSession, ScriptTemplate, ScriptGeneration +- Known existing gaps: see Section 4 + +--- + +## Section 1: Schema Changes + +### 1a. Denormalize `account_id` onto all tenant-relevant tables + +Add `account_id UUID NOT NULL` (with FK to `accounts.id` and index) to each of the following tables that currently lack it: + +| Table | Backfill path | +|---|---| +| `sessions` | `sessions.user_id → users.account_id` | +| `attachments` | `attachments.session_id → sessions.user_id → users.account_id` | +| `session_supporting_data` | `session_supporting_data.session_id → sessions.user_id → users.account_id` | +| `audit_logs` | `audit_logs.user_id → users.account_id` | +| `maintenance_schedules` | `maintenance_schedules.tree_id → trees.account_id` | +| `user_folders` | `user_folders.user_id → users.account_id` | +| `user_pinned_trees` | `user_pinned_trees.user_id → users.account_id` | +| `session_branches` | `session_branches.ai_session_id → ai_sessions.account_id` (verify column name — may be `session_id`) | +| `session_handoffs` | `session_handoffs.ai_session_id → ai_sessions.account_id` (verify column name — may be `session_id`) | +| `session_resolution_outputs` | `session_resolution_outputs.session_id → sessions.user_id → users.account_id` | +| `fork_points` | `fork_points.session_id → ai_sessions.account_id` | +| `ai_session_steps` | `ai_session_steps.session_id → ai_sessions.account_id` | +| `ai_suggestions` | `ai_suggestions.tree_id → trees.account_id` | +| `step_ratings` | `step_ratings.user_id → users.account_id` (backfill from rater, not the step) | +| `step_usage_logs` | `step_usage_logs.user_id → users.account_id` (backfill from user, not the step) | +| `psa_post_logs` | `psa_post_logs.psa_connection_id → psa_connections.account_id` (psa_connections already has account_id NOT NULL) | +| `psa_member_mappings` | `psa_member_mappings.psa_connection_id → psa_connections.account_id` | +| `notification_logs` | `notification_logs.notification_config_id → notification_configs.account_id` | + +**Deferred:** `tree_shares` — backfill strategy and RLS policy depend on whether sharing is intra-tenant only or cross-tenant. Must be resolved before RLS is enabled on this table. See Section 7 (Open Questions). + +### 1b. Make existing nullable `account_id` columns NOT NULL + +These tables have `account_id` already but it is nullable: + +- `users` +- `trees` +- `tree_categories` +- `tree_tags` +- `step_categories` +- `step_library` +- `tree_embeddings` +- `feedback` + +Action: Backfill any NULL rows (assign to correct account or delete if orphaned test/seed data), then `ALTER COLUMN account_id SET NOT NULL`. + +### 1c. Global / template content + +Move global content out of tenant tables into dedicated tables with no RLS: + +- **`template_trees`** — default and publicly visible trees (currently `is_default=True` or `visibility='public'` in `trees`). All tenants can read; no RLS. +- **`platform_steps`** — public steps from step library (currently `visibility='public'` in `step_library`). All tenants can read; no RLS. +- Tables already global (no change needed): `script_categories`, `platform_settings`, `plan_limits`, `feature_flags`, `plan_feature_defaults`. + +### 1d. Migration sequence (per table) + +Every table that gains `account_id` follows this exact sequence. Each step must succeed before the next begins: + +1. `ADD COLUMN account_id UUID` (nullable) +2. Backfill via `UPDATE ... JOIN ...` +3. `SELECT COUNT(*) WHERE account_id IS NULL` — must be zero before proceeding +4. `ALTER COLUMN account_id SET NOT NULL` +5. `CREATE INDEX ON (account_id)` +6. Enable RLS (in Phase 3, not here) + +Any migration that cannot reach step 3 (zero NULLs) must roll back completely. No partial state. + +--- + +## Section 2: PostgreSQL Roles & RLS Infrastructure + +### 2a. Database roles + +Two PostgreSQL roles: + +**`resolutionflow_app`** +- Used by all application requests +- Subject to all RLS policies +- Standard table privileges: `SELECT, INSERT, UPDATE, DELETE` on all tenant tables + +**`resolutionflow_admin`** +- Used by: Alembic migrations, seed scripts, super admin API endpoints, scheduled background jobs requiring cross-tenant access +- Has `BYPASSRLS` attribute — not subject to RLS policies +- Same table privileges as `resolutionflow_app` +- Connection string exposed as `DATABASE_ADMIN_URL` env var + +The current `postgres` superuser is replaced in the application by these two roles. The connection string in `DATABASE_URL` transitions to `resolutionflow_app`. Alembic uses `DATABASE_URL_SYNC` pointing to `resolutionflow_admin`. + +### 2b. Per-request tenant context injection + +Every request that passes through `get_db()` must execute, inside a transaction boundary: + +```sql +SET LOCAL app.current_account_id = ''; +``` + +`SET LOCAL` is transaction-scoped — it resets automatically when the transaction ends. No cleanup needed. This is implemented in a modified `get_db()` dependency that receives `current_user` and executes the SET before yielding the session. + +**Fail-closed behavior:** If `app.current_account_id` is not set (e.g., a bug where `SET LOCAL` was skipped), `current_setting('app.current_account_id', false)::uuid` returns NULL. `NULL = NULL` is false in SQL — the RLS policy matches zero rows. This is the correct fail-closed behavior. + +Public endpoints (no authenticated user) do not call `SET LOCAL`. RLS matches zero rows for all tenant tables. This is correct. + +### 2c. RLS policy pattern + +Every tenant table (from Section 1 + all existing tables with account_id) gets: + +```sql +ALTER TABLE
ENABLE ROW LEVEL SECURITY; +ALTER TABLE
FORCE ROW LEVEL SECURITY; + +-- SELECT +CREATE POLICY tenant_select ON
FOR SELECT + USING (account_id = current_setting('app.current_account_id', false)::uuid); + +-- INSERT +CREATE POLICY tenant_insert ON
FOR INSERT + WITH CHECK (account_id = current_setting('app.current_account_id', false)::uuid); + +-- UPDATE +CREATE POLICY tenant_update ON
FOR UPDATE + USING (account_id = current_setting('app.current_account_id', false)::uuid) + WITH CHECK (account_id = current_setting('app.current_account_id', false)::uuid); + +-- DELETE +CREATE POLICY tenant_delete ON
FOR DELETE + USING (account_id = current_setting('app.current_account_id', false)::uuid); +``` + +**`FORCE ROW LEVEL SECURITY`** ensures the table owner is also subject to policies. The `resolutionflow_admin` role bypasses via its `BYPASSRLS` attribute, not via ownership. + +**`audit_logs` exception:** SELECT policy only. No `WITH CHECK` on INSERT (app inserts audit logs freely). No UPDATE or DELETE policies ever. These constraints are permanent and must be documented in the migration comment. + +**Global tables** (`platform_settings`, `plan_limits`, `feature_flags`, `plan_feature_defaults`, `template_trees`, `platform_steps`): No RLS. + +### 2d. Connection pool reuse safety + +The `SET LOCAL` approach is transaction-scoped. A connection pool reuse test (see Section 6) must verify that a connection returned to the pool after tenant A's request does not carry tenant A's `account_id` into tenant B's request. This is guaranteed by `SET LOCAL` (resets on transaction end) but must be explicitly verified. + +--- + +## Section 3: Application-Layer Enforcement Patterns + +### 3a. `tenant_filter()` helper + +Add to `backend/app/core/filters.py`: + +```python +def tenant_filter(model, account_id: uuid.UUID): + """ + Primary app-layer tenant filter. + MUST be used in every SELECT/UPDATE/DELETE on tenant tables. + RLS is the safety net — this is the primary enforcement. + """ + return model.account_id == account_id +``` + +All existing filter helpers (`build_tree_access_filter`, `build_step_visibility_filter`) must internally call `tenant_filter()` as their base constraint. + +### 3b. Fetch-and-verify pattern + +For ID-based lookups, filter by **both** `id` AND `account_id` in the query — not fetch-then-check: + +```python +# Correct +stmt = select(Tree).where( + Tree.id == tree_id, + tenant_filter(Tree, current_user.account_id) +) +tree = (await db.execute(stmt)).scalar_one_or_none() +if not tree: + raise HTTPException(status_code=404) + +# Prohibited — fetch first, then check +tree = await db.get(Tree, tree_id) +if tree.account_id != current_user.account_id: + raise HTTPException(status_code=403) # Reveals existence — also wrong +``` + +Endpoints must return **404, not 403**, for cross-tenant ID lookups. Never confirm that a resource exists. + +### 3c. `get_tenant_context` dependency + +Add to `backend/app/api/deps.py`: + +```python +async def get_tenant_context( + current_user: User = Depends(get_current_active_user) +) -> uuid.UUID: + """ + Returns the current user's account_id. + Raises 403 if the user has no account association. + Inject this instead of accessing current_user.account_id directly. + """ + if current_user.account_id is None: + raise HTTPException(status_code=403, detail="User not associated with any account") + return current_user.account_id +``` + +### 3d. Insert pattern + +All inserts on tenant tables must explicitly set `account_id`. RLS `WITH CHECK` rejects inserts where `account_id` doesn't match the session variable, but application code must also set it: + +```python +new_record = Tree( + account_id=tenant_account_id, # Required — never omit + ... +) +``` + +### 3e. Code review checklist rule + +Every PR that touches a tenant table must be verified against: + +1. Every SELECT/UPDATE/DELETE includes `tenant_filter()` or a wrapper that calls it +2. ID lookups filter by both `id` and `account_id` in the same query +3. Inserts explicitly set `account_id` +4. 404 (not 403) returned for cross-tenant ID lookups +5. A cross-tenant isolation test is included (Phase 2 onwards) + +### 3f. CI grep check + +A grep-based CI check is active from the end of Phase 0 on all PRs. It flags: +- Queries on known tenant tables that don't include `tenant_filter` or `account_id` as a filter term +- Initial implementation: warn (not block) to allow calibration; switch to block after 2 weeks of false-positive tuning + +Pattern to be defined during Phase 0 implementation. + +--- + +## Section 4: Existing Gap Fixes + +### Immediate Hotfix (ships before Phase 0) + +**CRITICAL: Copilot tree access bypass** +- **Files:** `backend/app/api/endpoints/copilot.py`, `backend/app/services/copilot_service.py` +- **Issue:** `start_conversation()` loads a tree by UUID without verifying the requesting user has access. An attacker who knows a tree UUID from another account can extract its full structure, node names, and descriptions via the AI system prompt. +- **Fix:** Add `can_access_tree(current_user, tree)` check in the endpoint after loading the tree. Also add `tenant_filter(Tree, account_id)` to the tree query in `copilot_service.py`. Raise 404 (not 403) if the tree is not found or not accessible. +- Ships as an independent hotfix PR, merged immediately. + +### Phase 0 Fixes + +**LOW: Analytics flow endpoint** (`backend/app/api/endpoints/analytics.py`) +- `GET /analytics/flows/{tree_id}` returns analytics for any tree by UUID with no access check +- Fix: Add `tenant_filter(Tree, current_user.account_id)` to the tree fetch query. 404 if not found. + +**LOW: Category tree count** (`backend/app/api/endpoints/categories.py`) +- Tree count per category includes trees from all accounts +- Fix: Add `tenant_filter(Tree, current_user.account_id)` to the count subquery + +**LOW: AI session scope inconsistency** (`backend/app/api/endpoints/ai_sessions.py`) +- List endpoint is user-scoped (`user_id == current_user.id`); search endpoint uses `OR(user_id, account_id)` exposing other users' session summaries within the same account +- Sessions are user-scoped only. Cross-user access permitted only via explicit escalation or session sharing +- Fix: Restrict search endpoint to `user_id == current_user.id`. List and search must behave consistently. + +**Phase 0 UUID endpoint audit** +- Systematically review every endpoint with a `{resource_id}` URL parameter +- For each: verify that the ID lookup either (a) filters by `id AND account_id` in the query, or (b) calls `can_access_(current_user, resource)` on the fetched object +- Document every instance found, classify by severity, fix all before Phase 0 closes + +--- + +## Section 5: Legacy `team_id` Migration + +### 5a. TargetList — audit-gated + +Before any migration work on `TargetList`: + +1. Run a full codebase reference audit: grep for all references to `TargetList`, `target_list`, `target-list` +2. Query the production and staging databases for row count +3. Decision tree: + - Zero code references AND zero rows → drop the table entirely + - Zero rows but code references exist → deprecate code, then drop + - Rows exist → migrate (see sequence below) +4. Migration only proceeds after audit result is documented and approved + +If migration proceeds: +- Backfill path: `team_id → teams → users WHERE is_team_admin → account_id` +- If any row cannot be backfilled to a valid `account_id` → **full rollback**, no exceptions, no manual review queues + +### 5b. Pre-migration: teams-to-accounts orphan check + +Before any backfill using `team_id → account_id` chains, run: + +```sql +SELECT COUNT(*) FROM teams t +LEFT JOIN users u ON u.team_id = t.id AND u.account_id IS NOT NULL +WHERE u.id IS NULL; +``` + +Count must be zero before any backfill proceeds. Report and resolve orphaned teams first. + +### 5c. Approved migrations (no audit gate) + +**ScriptBuilderSession:** Add `account_id`, backfill from `user_id → users.account_id`, set NOT NULL. + +**ScriptTemplate:** Add `account_id`, backfill from `created_by_id → users.account_id`, set NOT NULL. + +**ScriptGeneration:** Add `account_id`, backfill from `user_id → users.account_id`, set NOT NULL. + +### 5d. team_id cleanup + +Do NOT drop `team_id` columns during migration. Keep until all application code is updated to use `account_id` exclusively. Drop `team_id` columns in a later cleanup migration after verification. + +--- + +## Section 6: Testing Strategy + +### Phase 1: RLS validation tests + +**File:** `backend/tests/test_rls_isolation.py` + +This test suite validates RLS policies at the database layer, independent of any application code. It connects to the test database using the `resolutionflow_app` role. + +**Setup fixture:** +- Creates two accounts (`account_a`, `account_b`) with seed rows in all tenant tables +- Creates an async DB connection using `resolutionflow_app` role +- Sets `SET LOCAL app.current_account_id = ''` before each test + +**Test cases per table (~5 cases per table, ~32 tables ≈ ~160 total):** + +1. **SELECT isolation** — querying as account_a returns zero rows for account_b's data +2. **INSERT enforcement** — inserting with `account_id = account_b_uuid` raises a PostgreSQL exception (rejected by `WITH CHECK`) +3. **INSERT cross-tenant FK** — inserting with correct `account_id` but a FK value (e.g., `tree_id`) that belongs to account_b is also rejected +4. **UPDATE enforcement** — updating account_b's rows as account_a affects zero rows +5. **DELETE enforcement** — deleting account_b's rows as account_a affects zero rows + +**Fail-closed test:** +```python +# Unset app.current_account_id — must raise a database exception +# Acceptable: psycopg2.errors.InvalidTextRepresentation (NULL::uuid cast fails) +# NOT acceptable: query returning zero rows silently +with pytest.raises(asyncpg.PostgresError): + await conn.execute("SELECT * FROM trees") +``` + +The exact exception type must be documented based on the behavior of `current_setting('app.current_account_id', false)::uuid` when unset. The test asserts on that specific exception. + +**audit_logs special cases:** +- No INSERT `WITH CHECK` test — audit logs must be insertable freely +- No UPDATE or DELETE tests — those policies must not exist +- Verify via `pg_policies` that only a SELECT policy exists on `audit_logs` + +**tree_shares:** Intentionally deferred. A TODO comment is included: +```python +# TODO: tree_shares RLS tests deferred pending sharing model decision. +# Must be added before RLS is enabled on the tree_shares table. +# See: docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md Section 7 +``` + +**Tables covered in Phase 1:** +`trees`, `ai_sessions`, `ai_chat_sessions`, `ai_conversations`, `sessions`, `step_library`, `tree_categories`, `tree_tags`, `step_categories`, `flow_proposals`, `attachments`, `audit_logs`, `psa_connections`, `copilot_conversations`, `file_uploads`, `kb_imports`, `subscriptions`, `account_invites`, `ai_usage`, `notifications`, `session_shares`, `script_builder_sessions`, `script_templates`, `script_generations`, `maintenance_schedules`, `user_folders`, `user_pinned_trees`, `session_supporting_data`, `session_branches`, `session_resolution_outputs`, `psa_post_logs`, `notification_logs` + +**Connection pool reuse test:** +```python +async def test_set_local_does_not_leak_between_connections(): + """SET LOCAL must not leak account_id when a connection is returned to the pool.""" + conn = await pool.acquire() + async with conn.transaction(): + await conn.execute("SET LOCAL app.current_account_id = $1", str(account_a_id)) + # transaction ends, SET LOCAL resets + # Return conn to pool, re-acquire — verify account_id is not set + conn2 = await pool.acquire() + result = await conn2.fetchval("SELECT current_setting('app.current_account_id', true)") + assert result is None or result == "" +``` + +### Phase 2: Per-endpoint cross-tenant tests + +As each endpoint is touched in any PR from Phase 1 onward, a cross-tenant isolation test is added in the same PR: + +```python +async def test_cannot_access_other_account_( + client_account_a, account_b_resource +): + """Account A cannot access Account B's resource by UUID.""" + response = await client_account_a.get(f"//{account_b_resource.id}") + assert response.status_code == 404 # Not 403 — never reveal existence +``` + +This is a hard requirement per the code review checklist. + +--- + +## Section 7: Phased Rollout Plan + +### Immediate Hotfix (before everything else) + +- Fix CRITICAL: Copilot tree access bypass (see Section 4) +- Ships as independent PR, merged immediately, does not wait for Phase 0 + +--- + +### Phase 0 — Foundation + +Goal: Fix all existing gaps; establish patterns and tooling; gate future PRs. + +1. Fix LOW: Analytics flow endpoint missing ownership check +2. Fix LOW: Category tree count scope +3. Fix LOW: AI session search/list inconsistency (restrict both to `user_id`) +4. **Full UUID endpoint audit** — every `{resource_id}` URL param checked. All gaps documented and fixed before Phase 0 closes. +5. Add `get_tenant_context` dependency to `deps.py` +6. Add `tenant_filter()` helper to `filters.py`. Update existing filter helpers to call it. +7. **Dead code audit:** TargetList references + database row count. Report result. +8. **Orphan check:** Teams without a resolvable account_id. Report result. +9. **Define and activate CI grep check** for missing `tenant_filter()` on tenant tables. Active on all PRs from Phase 1 forward. + +Gate: All gaps patched. CI grep check active. TargetList and team orphan audit results documented. + +--- + +### Phase 1 — Schema Migration + +Goal: Every tenant-relevant table has a direct, NOT NULL `account_id` column. + +10. Add `account_id` to all tables in Section 1a (migration per logical domain group: core sessions, PSA, AI, steps, notifications) +11. Make nullable `account_id` NOT NULL on models from Section 1b +12. Migrate `ScriptBuilderSession`, `ScriptTemplate`, `ScriptGeneration` from `team_id` to `account_id` +13. `TargetList`: execute result of Phase 0 audit (drop or migrate) +14. Create `template_trees` and `platform_steps` tables; migrate global content + +Gate: Zero NULL `account_id` values in any tenant table. All backfills verified. Database passes zero-NULL assertion query for every table in scope. + +--- + +### Phase 2 — PostgreSQL Infrastructure + +Goal: Database roles established; `SET LOCAL` wired into every request; RLS test suite green. + +15. Create `resolutionflow_app` and `resolutionflow_admin` PostgreSQL roles; grant privileges +16. Update `DATABASE_URL` to `resolutionflow_app`; add `DATABASE_ADMIN_URL` for admin connections +17. Update Alembic to use `DATABASE_ADMIN_URL` +18. Modify `get_db()` to execute `SET LOCAL app.current_account_id` per request, inside transaction boundary +19. Update super admin endpoints to use `resolutionflow_admin` connection +20. Write `test_rls_isolation.py` — all ~160 RLS tests. Must be 100% green before Phase 3. Connection pool reuse test included. +21. Measure `SET LOCAL` per-request overhead baseline + +Gate: All RLS tests green. All existing integration tests green. Connection pool reuse test green. Performance baseline documented. + +--- + +### Phase 3 — Enable RLS + +Goal: RLS policies active on all tenant tables. Phased by domain, not all at once. + +Enable RLS in these batches. Run full test suite after each batch before proceeding: + +**Batch A: Core data** +`trees`, `tree_categories`, `tree_tags`, `sessions`, `attachments`, `session_supporting_data`, `maintenance_schedules` + +**Batch B: AI & sessions** +`ai_sessions`, `ai_chat_sessions`, `ai_conversations`, `ai_usage`, `ai_session_steps`, `session_branches`, `session_handoffs`, `session_resolution_outputs`, `fork_points`, `copilot_conversations`, `flow_proposals` + +**Batch C: Steps & library** +`step_library`, `step_categories`, `step_ratings`, `step_usage_logs`, `ai_suggestions`, `template_trees` (no RLS), `platform_steps` (no RLS) + +**Batch D: Integrations & users** +`psa_connections`, `psa_post_logs`, `psa_member_mappings`, `subscriptions`, `account_invites`, `file_uploads`, `kb_imports`, `notifications`, `notification_logs`, `session_shares`, `user_folders`, `user_pinned_trees` + +**Batch E: Scripts & auth** +`script_builder_sessions`, `script_templates`, `script_generations`, `audit_logs` + +For each batch migration: `ENABLE ROW LEVEL SECURITY` + `FORCE ROW LEVEL SECURITY` + create all applicable policies. + +Gate: All RLS tests green after each batch. All integration tests green. Staging smoke test after Batch E. No performance regressions. + +--- + +### Phase 4 — Ongoing + +- Every PR that touches a tenant endpoint includes a cross-tenant isolation test +- CI grep check blocks PRs with missing `tenant_filter()` (warn → block after 2-week calibration) +- `tree_shares` RLS: design sharing model, implement tests, enable RLS before any sharing feature ships +- `team_id` columns: drop in a cleanup migration after all application code is fully migrated + +--- + +## Section 8: Open Questions + +| Question | Impact | Owner | +|---|---|---| +| Is tree sharing intra-tenant only, or can trees be shared across accounts? | Determines `tree_shares` schema, backfill strategy, and RLS policy. Tree_shares table deferred until resolved. | Product | +| What is the exact PostgreSQL exception raised when `current_setting('app.current_account_id', false)::uuid` is evaluated with no value set? | Determines the fail-closed test assertion. Must be tested in Phase 2. | Engineering | +| TargetList: zero references + zero rows? Or does data exist? | Determines whether table is dropped or migrated. | Phase 0 audit | + +--- + +## Appendix: Pre-Implementation Checklist + +Everything that must be in place before writing feature code on any tenant-data endpoint: + +### Foundation (Phase 0) +- [ ] Copilot tree access bypass hotfix shipped +- [ ] Analytics flow endpoint ownership check added +- [ ] Category tree count scoped to account +- [ ] AI session list and search both restricted to `user_id` +- [ ] Full UUID endpoint audit completed; all gaps documented and fixed +- [ ] `get_tenant_context` dependency added to `deps.py` +- [ ] `tenant_filter()` helper added to `filters.py` +- [ ] Existing filter helpers updated to use `tenant_filter()` internally +- [ ] TargetList dead code audit result documented +- [ ] Teams orphan count query run and result documented +- [ ] CI grep check defined and active + +### Schema (Phase 1) +- [ ] `account_id NOT NULL` on all tables in Section 1a denormalization list +- [ ] `account_id NOT NULL` on all existing nullable models from Section 1b +- [ ] ScriptBuilderSession, ScriptTemplate, ScriptGeneration migrated from team_id +- [ ] TargetList: dropped or migrated per audit result +- [ ] template_trees and platform_steps tables created +- [ ] Zero NULL assertion passes for every tenant table +- [ ] Migration sequence (add nullable → backfill → verify → NOT NULL → index) followed for each table + +### Infrastructure (Phase 2) +- [ ] `resolutionflow_app` role created with correct privileges +- [ ] `resolutionflow_admin` role created with `BYPASSRLS` +- [ ] `DATABASE_URL` updated to `resolutionflow_app` +- [ ] `DATABASE_ADMIN_URL` added for admin connections +- [ ] Alembic uses `DATABASE_ADMIN_URL` +- [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction +- [ ] Super admin endpoints use admin connection +- [ ] `test_rls_isolation.py` written with ~160 test cases +- [ ] All RLS tests pass (100%) +- [ ] Connection pool reuse test passes +- [ ] Fail-closed exception documented and asserted +- [ ] Performance baseline for `SET LOCAL` overhead documented + +### RLS Active (Phase 3) +- [ ] Batch A RLS policies applied and all tests green +- [ ] Batch B RLS policies applied and all tests green +- [ ] Batch C RLS policies applied and all tests green +- [ ] Batch D RLS policies applied and all tests green +- [ ] Batch E RLS policies applied and all tests green +- [ ] Staging smoke test passed +- [ ] All existing integration tests green with RLS active + +### Ongoing Standards +- [ ] Every new endpoint PR includes cross-tenant isolation test +- [ ] CI grep check blocks PRs with missing `tenant_filter()` +- [ ] `tree_shares` deferred — not enabled until sharing model documented +- [ ] `team_id` cleanup migration scheduled after full migration complete