After migration 174f442795b7 enforces NOT NULL on account_id, all
platform/global content must use the sentinel platform account instead
of NULL. Three categories of fixes:
1. trees.py: is_default trees now get PLATFORM_ACCOUNT_ID (not None)
2. admin_categories.py: global category CRUD now uses PLATFORM_ACCOUNT_ID
3. categories.py, tags.py, step_categories.py: creation endpoints coerce
None → PLATFORM_ACCOUNT_ID; IS NULL filter queries updated to
== PLATFORM_ACCOUNT_ID (IS NULL queries returned empty after migration
backfilled all global rows to the platform account)
Defines PLATFORM_ACCOUNT_ID constant in app/core/service_account.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The INSERT into template_trees incorrectly referenced `tags` as a column
on the `trees` table. Tags are a relationship via the `tree_tag_assignments`
join table — there is no direct column. Migration was failing with:
UndefinedColumn: column "tags" does not exist ... FROM trees
Fixed by replacing COALESCE(tags, '[]') with a correlated subquery that
aggregates tag names from tree_tag_assignments → tree_tags.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Migration 057 inserts 6 AD script templates with NULL team_id and NULL
created_by. Neither backfill path (created_by→users, team_id→team admin)
could attribute them to an account, causing the verify check to fail.
Fix: pre-create the platform sentinel account (ON CONFLICT DO NOTHING,
safe since 3a40fe11b427 also creates it idempotently) and add a final
fallback UPDATE assigning any remaining NULL script_templates to it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PostgreSQL UPDATE...FROM does not allow the updated table to be
referenced inside the FROM clause's JOIN conditions. Replace the
LEFT JOIN psa_connections with a correlated subquery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
session_resolution_outputs is created in migration 067 (sequential branch
from 064). On fresh databases, Alembic could run cc214c63aa30 before 067,
causing "table does not exist" errors. depends_on ensures 067 always runs
first regardless of branch traversal order.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Combines the Phase 1 tenant isolation chain (064 → ... → 174f442795b7)
with the main sequential chain (064 → ... → 070) into a single Alembic
head (a9f3b2c1d4e5) so `alembic upgrade head` in the Dockerfile works
without ambiguity.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
b8d2f4a6c091 was NOT the production head. The true head was 064
(064_normalize_script_builder_messages) via the chain:
b8d2f4a6c091 → f0aad74ea51b → 062 → 063 → 064
This caused 'multiple head revisions' on Railway deployment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The tags column was accidentally omitted from the is_default tree copy.
Now uses COALESCE(tags, '[]'::jsonb) to preserve source tree tags.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All previously-nullable account_id columns are now NOT NULL.
tree_embeddings and feedback backfilled before constraint applied.
Global content assigned to platform sentinel account (00000000-...-0001)
in preceding migration.
Tables updated: users, trees, tree_categories, tree_tags,
step_categories, step_library, tree_embeddings, feedback
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Creates template_trees and platform_steps (no account_id, no RLS).
Migrates is_default=TRUE trees and public steps into them.
Creates sentinel platform account (00000000-...-0001) for global
tree_categories, tree_tags, step_categories, step_library, and
is_default trees — clearing all NULL account_id rows in those tables
as prerequisite for Group 9 SET NOT NULL.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Zero rows in production — this is a schema-only migration in practice.
team_id kept for app code compatibility. Drop deferred to later cleanup.
Backfill: team_id → team admin user → account_id; fallback: created_by.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
team_id is kept in all three tables — drop deferred until app code
is fully migrated off team_id references.
Tables: script_builder_sessions, script_templates, script_generations
Backfill: user_id/created_by → users.account_id
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
psa_post_log: backfill via psa_connection, fallback to posted_by user
psa_member_mappings: backfill via psa_connection
notification_logs: backfill via notification_config
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Backfill from rater/user's account_id (not the step's account_id).
This is an explicit design decision — step rating data is attributed
to the account that performed the rating.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add tenant data isolation design spec
Complete architecture plan for multi-tenant data isolation across
all layers (PostgreSQL RLS, application-layer filtering, schema
migration, testing strategy, and phased rollout checklist).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add background job isolation policy to tenant isolation spec
Documents policy for all 5 existing background jobs:
- Knowledge Flywheel and PSA Retry flagged for account_id threading
- Chat Retention already follows correct pattern (model for others)
- Maintenance Schedule Firing needs account_id in queries + Session creation
- AI Conversation Expiry approved as cross-tenant with justification
Adds approved cross-tenant query registry and Phase 2 checklist items.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add tenant isolation Phase 0 implementation plan
8 tasks covering: CRITICAL copilot hotfix, tenant_filter() helper,
get_tenant_context dependency, analytics/category/AI session gap fixes,
full UUID endpoint audit, TargetList dead code audit, teams orphan
check, and CI grep check for missing tenant filters.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: add tenant_filter() helper and get_tenant_context dependency
tenant_filter(model, account_id) is the canonical app-layer tenant
scoping expression. Every query on a tenant table must use it.
build_tree_access_filter and build_step_visibility_filter updated
to call tenant_filter() internally for the account_id match.
get_tenant_context is a FastAPI dependency that returns account_id
or raises 403 if the user has no account — prevents raw access to
current_user.account_id and centralises the null check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: scope analytics/flows/{tree_id} to requesting account
Any authenticated user could read flow analytics (session counts,
completion rates, CSAT) for any tree UUID. Now returns 404 if the
tree doesn't belong to the requesting account.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: scope category tree_count to requesting account
tree_count on GET /categories/{id} was including trees from all
accounts, leaking cross-tenant row counts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: restrict AI session search to current user only
Search endpoint used OR(user_id, account_id), exposing other users'
problem_summary and problem_domain within the same account. Sessions
are user-scoped only — cross-user access requires explicit escalation
or sharing. List and search endpoints now behave consistently.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: add ownership check and 404 responses to ai-sessions endpoints
Cross-tenant isolation audit found:
- retry-psa-push had NO ownership check (CRITICAL) — any user could retry any session's PSA push
- save_task_lane used db.get() without ownership filter, returned 403 revealing existence
- get_session returned 403 instead of 404 for unauthorized access
- stream_documentation returned 403 instead of 404
All now use query-level user_id filtering and return 404 to avoid revealing existence.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-tenant session access
All session endpoints (get, update, complete, scratchpad, variables, export,
ticket-link) now return 404 instead of 403 when a user tries to access
another user's session. This prevents confirming existence of resources
across tenant boundaries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-tenant tree access
get_tree and update_tree now return 404 when a user cannot access a tree
(private tree from another account). Prevents confirming resource existence
across tenant boundaries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-tenant step access
get_step_or_404 now returns 404 when can_view_step or can_edit_step fails,
preventing confirmation of step existence across tenant boundaries.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-tenant upload access
get_upload_url and delete_upload now return 404 when the upload belongs to
a different account/user, preventing resource existence confirmation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-tenant share access
revoke_share and create_share now return 404 when the caller is not the
owner, preventing resource existence confirmation across users.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-team tree access in maintenance schedules
_get_tree_or_403 now returns 404 when the user's team does not match,
preventing confirmation of tree existence across teams.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-account tag access
get_tag now returns 404 for account-specific tags that belong to another
account, preventing resource existence confirmation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 instead of 403 for cross-account step category access
get_step_category now returns 404 for account-specific categories that
belong to another account, preventing resource existence confirmation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add cross-tenant isolation tests for Task 6 UUID audit
Tests cover:
- Tree GET/PUT returns 404 for cross-account access
- Session GET returns 404 for cross-user access
- AI session GET returns 404 for cross-user access
- AI session retry-psa-push requires ownership
- Upload URL returns 404 for cross-account access
- Share revoke returns 404 for cross-user access
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: return 404 (not 403) for get_documentation cross-user access; add missing Task 6 tests
get_documentation was revealing session existence via 403. Added pre-check
query filtering by session_id AND user_id before calling the engine.
Also add cross-tenant isolation tests for steps, tags, step_categories,
and maintenance_schedules endpoints fixed in Task 6 (TDD was skipped).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: address Task 6 quality review — rename helper, restore 403 for intra-account, add docs test
- Rename _get_tree_or_403 → _get_tree_or_404 in maintenance_schedules.py
(function now raises 404, old name was misleading)
- Restore HTTP 403 for intra-account permission failures in update_tree:
same-account users who can see a tree but can't edit it got 404 (wrong);
only cross-account lookups should return 404 to avoid confirming existence
- Apply same 403/404 distinction to update_tree_visibility
- Add test: get_documentation must return 404 for cross-user session access
- Add comment documenting owner-only design for documentation endpoints
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore: Task 7+8 — TargetList audit, CI tenant-filter grep check
Task 7: TargetList dead code audit
- Found active code references in 12+ files across backend and frontend
(full CRUD API + frontend page + MaintenanceScheduleSection + BatchLaunchModal)
- Decision: migrate to account_id in Phase 1 (cannot drop)
- DB row count not available from code-server — must verify from VPS SSH
before Phase 1 migration
- Teams orphan check query documented; must run from VPS SSH before Phase 1
- Results documented in spec Section 9
Task 8: CI tenant-filter enforcement check (warn mode)
- Create backend/scripts/check_tenant_filters.py
Scans endpoint and service files for select() on tenant tables without
tenant_filter/account_id/user_id in surrounding context. Currently
reports 109 warnings (Phase 1 backlog). Exits 0 (warn mode).
- Add Check tenant filter enforcement step to backend CI job
Add --fail flag after Phase 1 backlog clears to make it blocking.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: record Phase 0 audit results — 0 orphaned teams, 0 target_list rows
Both checks confirmed 2026-04-09 from production DB.
Phase 1 migration is safe to proceed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add tenant data isolation design spec
Complete architecture plan for multi-tenant data isolation across
all layers (PostgreSQL RLS, application-layer filtering, schema
migration, testing strategy, and phased rollout checklist).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add background job isolation policy to tenant isolation spec
Documents policy for all 5 existing background jobs:
- Knowledge Flywheel and PSA Retry flagged for account_id threading
- Chat Retention already follows correct pattern (model for others)
- Maintenance Schedule Firing needs account_id in queries + Session creation
- AI Conversation Expiry approved as cross-tenant with justification
Adds approved cross-tenant query registry and Phase 2 checklist items.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* docs: add tenant isolation Phase 0 implementation plan
8 tasks covering: CRITICAL copilot hotfix, tenant_filter() helper,
get_tenant_context dependency, analytics/category/AI session gap fixes,
full UUID endpoint audit, TargetList dead code audit, teams orphan
check, and CI grep check for missing tenant filters.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: CRITICAL — scope copilot tree query to current account
A user who knew another account's tree UUID could start a copilot
conversation, causing the tree's full node structure, names, and
descriptions to be sent to the AI as part of the system prompt.
Fix: add account_id (or is_default / visibility='public') filter to
the tree SELECT in copilot_service.start_conversation(). Returns 404
for inaccessible trees. Test added in test_tenant_isolation_p0.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes from beta tester session feedback:
1. MCP error handling (backend/app/services/assistant_chat_service.py)
- The MCP Microsoft Learn integration was catching only BadRequestError.
Any other error type (APIStatusError, APIConnectionError, timeout) from
the external MCP server propagated as a 502, causing the generic error.
- Now catches all Exception types when MCP is active and retries without
MCP using the stable client.messages.create endpoint.
2. Frontend error UX (frontend/src/pages/AssistantChatPage.tsx)
- catch {} was silently swallowing all errors and inserting a generic
assistant message. Now: differentiates 429 (rate limit) vs 502/503
(AI unavailable), removes the optimistic user message on failure,
restores the failed message to the input so users can retry without
retyping, and logs errors to console for debugging.
3. Image attachments visible in chat (frontend/src/components/assistant/ChatMessage.tsx)
- Uploaded images were sent to the AI correctly but never shown in the
chat thread. Now captures preview URLs before clearing pendingUploads
and renders thumbnails above the user bubble, clickable to full size.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add persistent session header with title, status badge, Resolve,
Escalate, and Update Ticket/Share Update buttons — mirrors
FlowPilotSessionPage pattern exactly
- Update Ticket label when psa_ticket_id present, Share Update otherwise
- Full mobile support via ⋯ overflow menu (Resolve, Escalate, Update, Pause)
- Strip _(not yet completed)_ markers from stored conversation_messages
in unified_chat_service to prevent stale task lane items from prior
turns leaking into new sessions via the AI's re-include instruction
- Add currentChatRef guard to handleResumeNew (was missing unlike handleSend)
- Remove Update/Conclude from chatbar — toolbar is now input utilities only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Import and call clearTaskState before updating questions/actions in
handleSend and handleTaskSubmit so new AI tasks always replace stale
sessionStorage cache instead of being overridden by it
- Include pending (not yet completed) tasks in the AI message on partial
submit so the AI knows which tasks were left unanswered
- Fix stale closure in TaskLane saveTaskLane useEffect — use refs for
questions/actions so the debounced backend save always uses current values
- Add responses field to pending_task_lane TypeScript type, removing the
unsafe double-cast in selectChat
- Instruct the AI to re-surface incomplete tasks unless ≥75% confident
the information is no longer needed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Reformat PSA resolution/escalation notes: clean single-line header,
steps with engineer responses inline, remove duplicate timing blocks,
remove AI confidence section, add follow-up recommendations
- Standardize time display to decimal hours (e.g. 0.25 hrs) across all
note formatters and status update context
- Add follow_up_recommendations to SessionDocumentation schema and
surface in SessionDocView; extracted from resolution suggestion steps
- Add _build_what_we_know() helper: uses session.evidence_items when
cockpit branch merges, falls back to deriving findings from steps
- Fix option label lookup in generate_status_update (was passing raw
machine values to AI instead of human-readable labels)
- Add 'What We Know' section to status update ticket notes prompt
- Improve _build_session_context in resolution_output_generator to
include intake text and full step details instead of truncated chat
- Add request_info audience type: client-facing information request
that skips the length step and generates a numbered question list
- Improve client_update and email_draft prompts with per-context
guidance (status/resolution/escalation) and fix escalation subject
line from 'Specialist Review' to 'Specialist Assistance'
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- script_builder endpoint: pg_advisory_xact_lock on user_id before
session count check, preventing concurrent creates from both passing
the MAX_SESSIONS_PER_USER guard
- script_builder_service send_message: pg_advisory_xact_lock on session_id
before message count check, preventing concurrent sends from both
passing the MAX_MESSAGES_PER_SESSION guard
- script_builder_service save_to_library: replace check-then-insert slug
logic with IntegrityError retry loop (3 attempts with fresh UUID suffix);
add unique constraint on script_templates.slug (migration 070)
- ScriptBuilderPage: add creatingSessionRef to serialize concurrent
handleSend calls that would otherwise both call createSession() while
session is still null
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The generate_status_update service inserted AISessionStep with
step_type='status_update' which violated the DB CHECK constraint,
causing a 500 error. Also fix incorrect field name confidence_score
(should be confidence_at_step) and remove nonexistent confidence_tier.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously save_to_library() hardcoded parameters_schema to empty and
always used session.latest_script. Now accepts optional overrides from
the frontend for parameterized script bodies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_build_session_detail was omitting pending_task_lane, is_branching, and
active_branch_id from the GET /ai-sessions/{id} response. The fields
existed on the schema and model but were never passed in the manual
constructor, so task lane state could never be restored on navigation.
Also adds console logging to AssistantChatPage selectChat flow to
diagnose message restoration, and fixes ScriptTemplateEditor stepper
dismiss firing during programmatic script_body updates.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add PUT /ai-sessions/{id}/task-lane endpoint that saves the full task
lane state (AI questions/actions + user's in-progress responses) to
the pending_task_lane JSONB column. TaskLane debounce-saves to the
backend every 2s after changes. On session load, user responses are
restored from the backend into sessionStorage so TaskLane picks them
up on mount. Users can now close the browser, come back later, and
find their task lane exactly where they left it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Renamed fc01_add_pending_task_lane → 068_add_pending_task_lane with
revision ID "068" and down_revision "067". Added migration naming
convention to CLAUDE.md to prevent future hex-hash migrations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The pending_task_lane migration was branching from fb1481317ff6 which
already had a child (4f4137ce79e5). Fixes multiple-heads error.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Task lane questions/actions are now saved to a pending_task_lane JSONB
column on ai_sessions, restoring them on session switch or page reload.
Partial submit no longer force-clears the lane — the AI response
controls what stays. Also removes redundant "New Session" button from
the sidebar (dashboard already provides this).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add DOCX MIME type to ALLOWED_DOCUMENT_TYPES in storage_service.py
- Add python-docx text extraction in _generate_ai_description
- Extract shared _store_document_content helper for PDF/DOCX
- Add python-docx>=1.1.0 to requirements.txt
- Add tests for docx upload acceptance and document fetch
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PDF uploads were stored in S3 and had text extracted during upload, but
fetch_upload_images() filtered exclusively for image MIME types, so
document content never reached the AI.
- Add fetch_upload_documents() in storage_service.py to retrieve
extracted_content for PDFs and text files
- Update ai_sessions.py chat endpoint to call both fetch_upload_images
and fetch_upload_documents, injecting document text as context
- Add PDF text extraction in _generate_ai_description (pypdf)
- Add pypdf>=4.0.0 to requirements.txt
- Fix test_db teardown to avoid connection pool issues
- Add 5 tests for fetch_upload_documents
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test was inserting a "Team Script" with team_id=NULL (since the
test user has no team), then expecting shared=true to return it.
SQL NULL=NULL is falsy, so the query correctly returned nothing.
Fix: create a team, assign it to the user, then test the filter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>