diff --git a/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md b/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md index 1985f2c0..be49cc05 100644 --- a/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md +++ b/docs/superpowers/specs/2026-04-09-tenant-data-isolation-design.md @@ -506,7 +506,84 @@ Gate: All RLS tests green after each batch. All integration tests green. Staging --- -## Section 8: Open Questions +## Section 8: Background Job Policy + +Background jobs and scheduled tasks that process tenant data are a distinct isolation surface. Unlike request-scoped endpoints, they do not have a `current_user` and must manage tenant context explicitly. + +### Policy + +Every background job that touches tenant tables must comply with one of two patterns: + +**Pattern A — `resolutionflow_admin` with explicit per-query `account_id` filtering** + +Use when: the job is inherently cross-tenant (e.g., processes all pending work across all accounts in a single pass). The admin role bypasses RLS, so explicit `account_id` filters in every query are mandatory — RLS is not the safety net here. + +```python +# Allowed: cross-tenant batch SELECT for IDs only, then loop per-account +result = await db.execute( + select(Model.id, Model.account_id).where(Model.status == "pending") +) +for row in result.all(): + await _process_one(row.id, row.account_id, db) # account_id threaded through + +# In the processing function: all queries must filter by account_id +async def _process_one(record_id, account_id, db): + result = await db.execute( + select(Model).where(Model.id == record_id, Model.account_id == account_id) + ) +``` + +**Pattern B — `resolutionflow_app` with `SET LOCAL` per tenant loop iteration** + +Use when: the job processes tenants one at a time and it is practical to set the tenant context per iteration. + +```python +for account_id in account_ids: + async with async_session_maker() as db: + await db.execute( + text("SET LOCAL app.current_account_id = :id"), + {"id": str(account_id)} + ) + # All queries in this block are RLS-enforced to this tenant +``` + +### No cross-tenant queries without justification + +No background job may issue a SELECT, UPDATE, or DELETE that spans multiple tenants' data in a single query without explicit written justification in the code comment and in this spec. Approved cross-tenant operations are documented below. + +### Inventory: Current Background Jobs + +| Job | File | Pattern | Status | Notes | +|---|---|---|---|---| +| Knowledge Flywheel | `knowledge_flywheel_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must thread `account_id` from the session into all processing calls. | +| PSA Retry | `psa_retry_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must add `account_id` to batch query and thread through to `retry_failed_push`. | +| Chat Retention Cleanup | `retention_cleanup.py` | Admin + explicit filter | **Correct pattern** | Already loops per-account with explicit `account_id` in all queries. Model for other jobs. Needs role update to `resolutionflow_admin`. | +| Maintenance Schedule Firing | `scheduler.py` (`_fire_maintenance_schedule`) | Admin + explicit filter | **Needs update** | Fetches `MaintenanceSchedule` and `Tree` by ID without `account_id` filter. After Phase 1, `Session` creation must set `account_id`. Add `account_id` to all queries. | +| AI Conversation Expiry | `scheduler.py` (`_cleanup_expired_ai_conversations`) | Admin, cross-tenant approved | **Correct pattern** | Cross-tenant DELETE by `expires_at` is explicitly justified: rows are expired regardless of tenant. Needs role update to `resolutionflow_admin`. Document this approval in code comment. | + +### Approved cross-tenant queries + +The following cross-tenant operations are explicitly approved. All others require a new entry here before shipping: + +| Job | Query | Justification | +|---|---|---| +| AI Conversation Expiry | `DELETE FROM ai_conversations WHERE expires_at < NOW()` | Time-based expiry is tenant-independent. Deleting expired rows does not expose data across tenants. | +| Chat Retention Cleanup | `SELECT id FROM accounts` | Required to iterate per-account. Read of account IDs only; no tenant data accessed. | +| Knowledge Flywheel (after fix) | `SELECT id, account_id FROM ai_sessions WHERE analysis_status='pending'` | Cross-tenant ID harvest only. No tenant data in the SELECT. `account_id` is threaded into per-tenant processing. | +| PSA Retry (after fix) | `SELECT id, account_id FROM psa_post_logs WHERE status='pending_retry'` | Cross-tenant ID harvest only. `account_id` threaded into per-tenant processing. | + +### Checklist additions (Phase 2) + +- [ ] All background jobs updated to use `resolutionflow_admin` role via `DATABASE_ADMIN_URL` +- [ ] Knowledge Flywheel: `account_id` added to batch query and threaded through to `analyze_session` +- [ ] PSA Retry: `account_id` added to batch query and threaded through to `retry_failed_push` +- [ ] Maintenance Schedule Firing: all queries include `account_id`; `Session` creation sets `account_id` after Phase 1 migration +- [ ] AI Conversation Expiry: cross-tenant approval comment added to code +- [ ] All jobs reviewed against this policy before Phase 3 (RLS enable) + +--- + +## Section 9: Open Questions | Question | Impact | Owner | |---|---|---| @@ -550,6 +627,11 @@ Everything that must be in place before writing feature code on any tenant-data - [ ] Alembic uses `DATABASE_ADMIN_URL` - [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction - [ ] Super admin endpoints use admin connection +- [ ] All background jobs updated to use `resolutionflow_admin` role (see Section 8) +- [ ] Knowledge Flywheel: `account_id` threaded through batch query and `analyze_session` +- [ ] PSA Retry: `account_id` threaded through batch query and `retry_failed_push` +- [ ] Maintenance Schedule: all queries include `account_id`; `Session` creation sets `account_id` +- [ ] AI Conversation Expiry: cross-tenant approval comment added to code - [ ] `test_rls_isolation.py` written with ~160 test cases - [ ] All RLS tests pass (100%) - [ ] Connection pool reuse test passes