docs: add background job isolation policy to tenant isolation spec

Documents policy for all 5 existing background jobs:
- Knowledge Flywheel and PSA Retry flagged for account_id threading
- Chat Retention already follows correct pattern (model for others)
- Maintenance Schedule Firing needs account_id in queries + Session creation
- AI Conversation Expiry approved as cross-tenant with justification

Adds approved cross-tenant query registry and Phase 2 checklist items.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
chihlasm
2026-04-09 02:38:55 +00:00
parent a7dff9e143
commit 82bb7967d8

View File

@@ -506,7 +506,84 @@ Gate: All RLS tests green after each batch. All integration tests green. Staging
---
## Section 8: Open Questions
## Section 8: Background Job Policy
Background jobs and scheduled tasks that process tenant data are a distinct isolation surface. Unlike request-scoped endpoints, they do not have a `current_user` and must manage tenant context explicitly.
### Policy
Every background job that touches tenant tables must comply with one of two patterns:
**Pattern A — `resolutionflow_admin` with explicit per-query `account_id` filtering**
Use when: the job is inherently cross-tenant (e.g., processes all pending work across all accounts in a single pass). The admin role bypasses RLS, so explicit `account_id` filters in every query are mandatory — RLS is not the safety net here.
```python
# Allowed: cross-tenant batch SELECT for IDs only, then loop per-account
result = await db.execute(
select(Model.id, Model.account_id).where(Model.status == "pending")
)
for row in result.all():
await _process_one(row.id, row.account_id, db) # account_id threaded through
# In the processing function: all queries must filter by account_id
async def _process_one(record_id, account_id, db):
result = await db.execute(
select(Model).where(Model.id == record_id, Model.account_id == account_id)
)
```
**Pattern B — `resolutionflow_app` with `SET LOCAL` per tenant loop iteration**
Use when: the job processes tenants one at a time and it is practical to set the tenant context per iteration.
```python
for account_id in account_ids:
async with async_session_maker() as db:
await db.execute(
text("SET LOCAL app.current_account_id = :id"),
{"id": str(account_id)}
)
# All queries in this block are RLS-enforced to this tenant
```
### No cross-tenant queries without justification
No background job may issue a SELECT, UPDATE, or DELETE that spans multiple tenants' data in a single query without explicit written justification in the code comment and in this spec. Approved cross-tenant operations are documented below.
### Inventory: Current Background Jobs
| Job | File | Pattern | Status | Notes |
|---|---|---|---|---|
| Knowledge Flywheel | `knowledge_flywheel_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must thread `account_id` from the session into all processing calls. |
| PSA Retry | `psa_retry_scheduler.py` | Admin + explicit filter | **Needs update** | Batch SELECT across all accounts with no `account_id` filter. Must add `account_id` to batch query and thread through to `retry_failed_push`. |
| Chat Retention Cleanup | `retention_cleanup.py` | Admin + explicit filter | **Correct pattern** | Already loops per-account with explicit `account_id` in all queries. Model for other jobs. Needs role update to `resolutionflow_admin`. |
| Maintenance Schedule Firing | `scheduler.py` (`_fire_maintenance_schedule`) | Admin + explicit filter | **Needs update** | Fetches `MaintenanceSchedule` and `Tree` by ID without `account_id` filter. After Phase 1, `Session` creation must set `account_id`. Add `account_id` to all queries. |
| AI Conversation Expiry | `scheduler.py` (`_cleanup_expired_ai_conversations`) | Admin, cross-tenant approved | **Correct pattern** | Cross-tenant DELETE by `expires_at` is explicitly justified: rows are expired regardless of tenant. Needs role update to `resolutionflow_admin`. Document this approval in code comment. |
### Approved cross-tenant queries
The following cross-tenant operations are explicitly approved. All others require a new entry here before shipping:
| Job | Query | Justification |
|---|---|---|
| AI Conversation Expiry | `DELETE FROM ai_conversations WHERE expires_at < NOW()` | Time-based expiry is tenant-independent. Deleting expired rows does not expose data across tenants. |
| Chat Retention Cleanup | `SELECT id FROM accounts` | Required to iterate per-account. Read of account IDs only; no tenant data accessed. |
| Knowledge Flywheel (after fix) | `SELECT id, account_id FROM ai_sessions WHERE analysis_status='pending'` | Cross-tenant ID harvest only. No tenant data in the SELECT. `account_id` is threaded into per-tenant processing. |
| PSA Retry (after fix) | `SELECT id, account_id FROM psa_post_logs WHERE status='pending_retry'` | Cross-tenant ID harvest only. `account_id` threaded into per-tenant processing. |
### Checklist additions (Phase 2)
- [ ] All background jobs updated to use `resolutionflow_admin` role via `DATABASE_ADMIN_URL`
- [ ] Knowledge Flywheel: `account_id` added to batch query and threaded through to `analyze_session`
- [ ] PSA Retry: `account_id` added to batch query and threaded through to `retry_failed_push`
- [ ] Maintenance Schedule Firing: all queries include `account_id`; `Session` creation sets `account_id` after Phase 1 migration
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
- [ ] All jobs reviewed against this policy before Phase 3 (RLS enable)
---
## Section 9: Open Questions
| Question | Impact | Owner |
|---|---|---|
@@ -550,6 +627,11 @@ Everything that must be in place before writing feature code on any tenant-data
- [ ] Alembic uses `DATABASE_ADMIN_URL`
- [ ] `get_db()` executes `SET LOCAL app.current_account_id` per request inside transaction
- [ ] Super admin endpoints use admin connection
- [ ] All background jobs updated to use `resolutionflow_admin` role (see Section 8)
- [ ] Knowledge Flywheel: `account_id` threaded through batch query and `analyze_session`
- [ ] PSA Retry: `account_id` threaded through batch query and `retry_failed_push`
- [ ] Maintenance Schedule: all queries include `account_id`; `Session` creation sets `account_id`
- [ ] AI Conversation Expiry: cross-tenant approval comment added to code
- [ ] `test_rls_isolation.py` written with ~160 test cases
- [ ] All RLS tests pass (100%)
- [ ] Connection pool reuse test passes