docs: add background job isolation policy to tenant isolation spec
Documents policy for all 5 existing background jobs: - Knowledge Flywheel and PSA Retry flagged for account_id threading - Chat Retention already follows correct pattern (model for others) - Maintenance Schedule Firing needs account_id in queries + Session creation - AI Conversation Expiry approved as cross-tenant with justification Adds approved cross-tenant query registry and Phase 2 checklist items. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -506,7 +506,84 @@ Gate: All RLS tests green after each batch. All integration tests green. Staging
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Section 8: Open Questions
|
## Section 8: Background Job Policy
|
||||||
|
|
||||||
|
Background jobs and scheduled tasks that process tenant data are a distinct isolation surface. Unlike request-scoped endpoints, they do not have a `current_user` and must manage tenant context explicitly.
|
||||||
|
|
||||||
|
### Policy
|
||||||
|
|
||||||
|
Every background job that touches tenant tables must comply with one of two patterns:
|
||||||
|
|
||||||
|
**Pattern A — `resolutionflow_admin` with explicit per-query `account_id` filtering**
|
||||||
|
|
||||||
|
Use when: the job is inherently cross-tenant (e.g., processes all pending work across all accounts in a single pass). The admin role bypasses RLS, so explicit `account_id` filters in every query are mandatory — RLS is not the safety net here.
|
||||||
|
|
||||||
|
```python
|
||||||
|
# Allowed: cross-tenant batch SELECT for IDs only, then loop per-account
|
||||||
|
result = await db.execute(
|
||||||
|
select(Model.id, Model.account_id).where(Model.status == "pending")
|
||||||
|
)
|
||||||
|
for row in result.all():
|
||||||
|
await _process_one(row.id, row.account_id, db) # account_id threaded through
|
||||||
|
|
||||||
|
# In the processing function: all queries must filter by account_id
|
||||||
|
async def _process_one(record_id, account_id, db):
|
||||||
|
result = await db.execute(
|
||||||
|
select(Model).where(Model.id == record_id, Model.account_id == account_id)
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Pattern B — `resolutionflow_app` with `SET LOCAL` per tenant loop iteration**
|
||||||
|
|
||||||
|
Use when: the job processes tenants one at a time and it is practical to set the tenant context per iteration.
|
||||||
|
|
||||||
|
```python
|
||||||
|
for account_id in account_ids:
|
||||||
|
async with async_session_maker() as db:
|
||||||
|
await db.execute(
|
||||||
|
text("SET LOCAL app.current_account_id = :id"),
|
||||||
|
{"id": str(account_id)}
|
||||||
|
)
|
||||||
|
# All queries in this block are RLS-enforced to this tenant
|
||||||
|
```
|
||||||
|
|
||||||
|
### No cross-tenant queries without justification
|
||||||
|
|
||||||
|
No background job may issue a SELECT, UPDATE, or DELETE that spans multiple tenants' data in a single query without explicit written justification in the code comment and in this spec. Approved cross-tenant operations are documented below.
|
||||||
|
|
||||||
|
### Inventory: Current Background Jobs
|
||||||
|
|
||||||
|
| Job | File | Pattern | Status | Notes |
|
||||||
|
|---|---|---|---|---|
|
||||||
|
| Knowledge Flywheel | `knowledge_flywheel_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must thread `account_id` from the session into all processing calls. |
|
||||||
|
| PSA Retry | `psa_retry_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must add `account_id` to batch query and thread through to `retry_failed_push`. |
|
||||||
|
| Chat Retention Cleanup | `retention_cleanup.py` | Admin + explicit filter | **Correct pattern** | Already loops per-account with explicit `account_id` in all queries. Model for other jobs. Needs role update to `resolutionflow_admin`. |
|
||||||
|
| Maintenance Schedule Firing | `scheduler.py` (`_fire_maintenance_schedule`) | Admin + explicit filter | **Needs update** | Fetches `MaintenanceSchedule` and `Tree` by ID without `account_id` filter. After Phase 1, `Session` creation must set `account_id`. Add `account_id` to all queries. |
|
||||||
|
| AI Conversation Expiry | `scheduler.py` (`_cleanup_expired_ai_conversations`) | Admin, cross-tenant approved | **Correct pattern** | Cross-tenant DELETE by `expires_at` is explicitly justified: rows are expired regardless of tenant. Needs role update to `resolutionflow_admin`. Document this approval in code comment. |
|
||||||
|
|
||||||
|
### Approved cross-tenant queries
|
||||||
|
|
||||||
|
The following cross-tenant operations are explicitly approved. All others require a new entry here before shipping:
|
||||||
|
|
||||||
|
| Job | Query | Justification |
|
||||||
|
|---|---|---|
|
||||||
|
| AI Conversation Expiry | `DELETE FROM ai_conversations WHERE expires_at < NOW()` | Time-based expiry is tenant-independent. Deleting expired rows does not expose data across tenants. |
|
||||||
|
| Chat Retention Cleanup | `SELECT id FROM accounts` | Required to iterate per-account. Read of account IDs only; no tenant data accessed. |
|
||||||
|
| Knowledge Flywheel (after fix) | `SELECT id, account_id FROM ai_sessions WHERE analysis_status='pending'` | Cross-tenant ID harvest only. No tenant data in the SELECT. `account_id` is threaded into per-tenant processing. |
|
||||||
|
| PSA Retry (after fix) | `SELECT id, account_id FROM psa_post_logs WHERE status='pending_retry'` | Cross-tenant ID harvest only. `account_id` threaded into per-tenant processing. |
|
||||||
|
|
||||||
|
### Checklist additions (Phase 2)
|
||||||
|
|
||||||
|
- [ ] All background jobs updated to use `resolutionflow_admin` role via `DATABASE_ADMIN_URL`
|
||||||
|
- [ ] Knowledge Flywheel: `account_id` added to batch query and threaded through to `analyze_session`
|
||||||
|
- [ ] PSA Retry: `account_id` added to batch query and threaded through to `retry_failed_push`
|
||||||
|
- [ ] Maintenance Schedule Firing: all queries include `account_id`; `Session` creation sets `account_id` after Phase 1 migration
|
||||||
|
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
|
||||||
|
- [ ] All jobs reviewed against this policy before Phase 3 (RLS enable)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Section 9: Open Questions
|
||||||
|
|
||||||
| Question | Impact | Owner |
|
| Question | Impact | Owner |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
@@ -550,6 +627,11 @@ Everything that must be in place before writing feature code on any tenant-data
|
|||||||
- [ ] Alembic uses `DATABASE_ADMIN_URL`
|
- [ ] Alembic uses `DATABASE_ADMIN_URL`
|
||||||
- [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction
|
- [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction
|
||||||
- [ ] Super admin endpoints use admin connection
|
- [ ] Super admin endpoints use admin connection
|
||||||
|
- [ ] All background jobs updated to use `resolutionflow_admin` role (see Section 8)
|
||||||
|
- [ ] Knowledge Flywheel: `account_id` threaded through batch query and `analyze_session`
|
||||||
|
- [ ] PSA Retry: `account_id` threaded through batch query and `retry_failed_push`
|
||||||
|
- [ ] Maintenance Schedule: all queries include `account_id`; `Session` creation sets `account_id`
|
||||||
|
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
|
||||||
- [ ] `test_rls_isolation.py` written with ~160 test cases
|
- [ ] `test_rls_isolation.py` written with ~160 test cases
|
||||||
- [ ] All RLS tests pass (100%)
|
- [ ] All RLS tests pass (100%)
|
||||||
- [ ] Connection pool reuse test passes
|
- [ ] Connection pool reuse test passes
|
||||||
|
|||||||
Reference in New Issue
Block a user