docs: add background job isolation policy to tenant isolation spec
Documents policy for all 5 existing background jobs: - Knowledge Flywheel and PSA Retry flagged for account_id threading - Chat Retention already follows correct pattern (model for others) - Maintenance Schedule Firing needs account_id in queries + Session creation - AI Conversation Expiry approved as cross-tenant with justification Adds approved cross-tenant query registry and Phase 2 checklist items. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -506,7 +506,84 @@ Gate: All RLS tests green after each batch. All integration tests green. Staging
|
||||
|
||||
---
|
||||
|
||||
## Section 8: Open Questions
|
||||
## Section 8: Background Job Policy
|
||||
|
||||
Background jobs and scheduled tasks that process tenant data are a distinct isolation surface. Unlike request-scoped endpoints, they do not have a `current_user` and must manage tenant context explicitly.
|
||||
|
||||
### Policy
|
||||
|
||||
Every background job that touches tenant tables must comply with one of two patterns:
|
||||
|
||||
**Pattern A — `resolutionflow_admin` with explicit per-query `account_id` filtering**
|
||||
|
||||
Use when: the job is inherently cross-tenant (e.g., processes all pending work across all accounts in a single pass). The admin role bypasses RLS, so explicit `account_id` filters in every query are mandatory — RLS is not the safety net here.
|
||||
|
||||
```python
|
||||
# Allowed: cross-tenant batch SELECT for IDs only, then loop per-account
|
||||
result = await db.execute(
|
||||
select(Model.id, Model.account_id).where(Model.status == "pending")
|
||||
)
|
||||
for row in result.all():
|
||||
await _process_one(row.id, row.account_id, db) # account_id threaded through
|
||||
|
||||
# In the processing function: all queries must filter by account_id
|
||||
async def _process_one(record_id, account_id, db):
|
||||
result = await db.execute(
|
||||
select(Model).where(Model.id == record_id, Model.account_id == account_id)
|
||||
)
|
||||
```
|
||||
|
||||
**Pattern B — `resolutionflow_app` with `SET LOCAL` per tenant loop iteration**
|
||||
|
||||
Use when: the job processes tenants one at a time and it is practical to set the tenant context per iteration.
|
||||
|
||||
```python
|
||||
for account_id in account_ids:
|
||||
async with async_session_maker() as db:
|
||||
await db.execute(
|
||||
text("SET LOCAL app.current_account_id = :id"),
|
||||
{"id": str(account_id)}
|
||||
)
|
||||
# All queries in this block are RLS-enforced to this tenant
|
||||
```
|
||||
|
||||
### No cross-tenant queries without justification
|
||||
|
||||
No background job may issue a SELECT, UPDATE, or DELETE that spans multiple tenants' data in a single query without explicit written justification in the code comment and in this spec. Approved cross-tenant operations are documented below.
|
||||
|
||||
### Inventory: Current Background Jobs
|
||||
|
||||
| Job | File | Pattern | Status | Notes |
|
||||
|---|---|---|---|---|
|
||||
| Knowledge Flywheel | `knowledge_flywheel_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must thread `account_id` from the session into all processing calls. |
|
||||
| PSA Retry | `psa_retry_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must add `account_id` to batch query and thread through to `retry_failed_push`. |
|
||||
| Chat Retention Cleanup | `retention_cleanup.py` | Admin + explicit filter | **Correct pattern** | Already loops per-account with explicit `account_id` in all queries. Model for other jobs. Needs role update to `resolutionflow_admin`. |
|
||||
| Maintenance Schedule Firing | `scheduler.py` (`_fire_maintenance_schedule`) | Admin + explicit filter | **Needs update** | Fetches `MaintenanceSchedule` and `Tree` by ID without `account_id` filter. After Phase 1, `Session` creation must set `account_id`. Add `account_id` to all queries. |
|
||||
| AI Conversation Expiry | `scheduler.py` (`_cleanup_expired_ai_conversations`) | Admin, cross-tenant approved | **Correct pattern** | Cross-tenant DELETE by `expires_at` is explicitly justified: rows are expired regardless of tenant. Needs role update to `resolutionflow_admin`. Document this approval in code comment. |
|
||||
|
||||
### Approved cross-tenant queries
|
||||
|
||||
The following cross-tenant operations are explicitly approved. All others require a new entry here before shipping:
|
||||
|
||||
| Job | Query | Justification |
|
||||
|---|---|---|
|
||||
| AI Conversation Expiry | `DELETE FROM ai_conversations WHERE expires_at < NOW()` | Time-based expiry is tenant-independent. Deleting expired rows does not expose data across tenants. |
|
||||
| Chat Retention Cleanup | `SELECT id FROM accounts` | Required to iterate per-account. Read of account IDs only; no tenant data accessed. |
|
||||
| Knowledge Flywheel (after fix) | `SELECT id, account_id FROM ai_sessions WHERE analysis_status='pending'` | Cross-tenant ID harvest only. No tenant data in the SELECT. `account_id` is threaded into per-tenant processing. |
|
||||
| PSA Retry (after fix) | `SELECT id, account_id FROM psa_post_logs WHERE status='pending_retry'` | Cross-tenant ID harvest only. `account_id` threaded into per-tenant processing. |
|
||||
|
||||
### Checklist additions (Phase 2)
|
||||
|
||||
- [ ] All background jobs updated to use `resolutionflow_admin` role via `DATABASE_ADMIN_URL`
|
||||
- [ ] Knowledge Flywheel: `account_id` added to batch query and threaded through to `analyze_session`
|
||||
- [ ] PSA Retry: `account_id` added to batch query and threaded through to `retry_failed_push`
|
||||
- [ ] Maintenance Schedule Firing: all queries include `account_id`; `Session` creation sets `account_id` after Phase 1 migration
|
||||
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
|
||||
- [ ] All jobs reviewed against this policy before Phase 3 (RLS enable)
|
||||
|
||||
---
|
||||
|
||||
## Section 9: Open Questions
|
||||
|
||||
| Question | Impact | Owner |
|
||||
|---|---|---|
|
||||
@@ -550,6 +627,11 @@ Everything that must be in place before writing feature code on any tenant-data
|
||||
- [ ] Alembic uses `DATABASE_ADMIN_URL`
|
||||
- [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction
|
||||
- [ ] Super admin endpoints use admin connection
|
||||
- [ ] All background jobs updated to use `resolutionflow_admin` role (see Section 8)
|
||||
- [ ] Knowledge Flywheel: `account_id` threaded through batch query and `analyze_session`
|
||||
- [ ] PSA Retry: `account_id` threaded through batch query and `retry_failed_push`
|
||||
- [ ] Maintenance Schedule: all queries include `account_id`; `Session` creation sets `account_id`
|
||||
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
|
||||
- [ ] `test_rls_isolation.py` written with ~160 test cases
|
||||
- [ ] All RLS tests pass (100%)
|
||||
- [ ] Connection pool reuse test passes
|
||||
|
||||
Reference in New Issue
Block a user