resolutionflow/legal/data-inventory.md
Michael Chihlas 41f5519916
docs(legal): add baseline legal documents (privacy, ToS, DPA, subprocessors, cookies)
Generated by the resolutionflow-legal skill from a code scan of the FastAPI
backend + React frontend on commit 0564646. Each document is a starting
point for attorney review, not legal advice.

Includes:
- privacy-policy.md, terms-of-service.md, cookie-policy.md (public-facing)
- dpa.md (contractual; signed with MSP customers)
- subprocessor-list.md (Railway, Anthropic, Voyage, Stripe, Resend, Sentry,
  PostHog, Google Fonts — confirmed live as of scan)
- data-inventory.md + classification.md (Phase 1/2 working files)
- attorney-review-checklist.md (consolidated [LEGAL REVIEW] punch list)
- implementation-verification.md (claim-by-claim audit vs. actual code)

Three blocking issues filed before public publication:
- #175 deletion-on-offboarding (or rewrite retention claims)
- #176 narrow Sentry send_default_pii + Session Replay config
- #177 EU/UK consent for PostHog + Google Fonts

Public-facing documents intentionally route physical-mail requests through
support@ rather than publishing the LLC's registered address.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 12:51:19 -04:00


ResolutionFlow Data Inventory

Generated: 2026-05-14
Repo path: /config/workspace/resolutionflow
Scanned commit: 0564646 (branch feat/public-landing-routing-refactor)

Derived directly from the FastAPI backend, React 19 frontend, and deployment config. Anything the scan left ambiguous is flagged in Section 7 (Open questions for the user) and must be confirmed by the user before generation.


1. First-party data (ResolutionFlow as controller)

These are categories where ResolutionFlow itself decides why and how the data is processed (i.e., its own users, billing, telemetry).

1a. Account identity & authentication

| Table | Fields | Sensitivity | Retention |
| --- | --- | --- | --- |
| users | email (unique), password_hash (bcrypt), name, phone, job_title, timezone, avatar_url, logo_data, company_display_name, role_at_signup, last_login, email_verified_at, deleted_at (soft) | Direct PII + credential | Indefinite (soft-delete only; no automated purge of soft-deleted rows) |
| accounts | name, display_code, stripe_customer_id, branding_*, team_size_bucket, primary_psa, chat_retention_days (default 90), chat_retention_max_count (default 100), session_idle_minutes, session_absolute_minutes, sso_provider, sso_config (JSONB) | Account metadata; tenant boundary | Indefinite |
| account_invites | email, code, role, invited_by_id, expires_at, revoked_at, email_sent_at | PII (invitee email) | Until expiry/revocation; no automated purge |
| oauth_identities | provider (google/microsoft), provider_subject, provider_email_at_link, user_id | PII (federated identity binding) | Until manual unlink/account deletion |
| email_verification_tokens | token_hash (SHA-256), user_id, expires_at, used_at | Auth token (hashed) | Until used or expired; no automated purge of expired rows confirmed |
| password_reset_tokens | (parallel structure expected) | Auth token (hashed) | Until used or expired |
| refresh_tokens | token_hash, user_id, expires_at, revoked_at | Auth token (hashed) | Idle 3d / absolute 14d defaults (overridable per-account); rows persist after expiry, no purge job confirmed |

Authentication mechanics: JWT with HS256, 5-min access tokens, refresh-token rotation (idle 3d / absolute 14d defaults from Settings.SESSION_*_MINUTES_DEFAULT). Passwords hashed with bcrypt (12 rounds). OAuth supported for Google and Microsoft.
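The idle and absolute windows described above can be illustrated with a minimal sketch. It is not the project's code: the function and constant names are hypothetical, and hashing refresh tokens with SHA-256 is an assumption carried over from the email_verification_tokens row (the scan only records a token_hash column for refresh_tokens).

```python
import hashlib
import secrets
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Defaults quoted in the inventory; the constant names are illustrative.
IDLE_WINDOW = timedelta(days=3)
ABSOLUTE_WINDOW = timedelta(days=14)


def new_refresh_token() -> Tuple[str, str]:
    """Return (raw_token, token_hash); only the hash would be persisted."""
    raw = secrets.token_urlsafe(32)
    # Assumption: SHA-256, as recorded for email_verification_tokens.
    return raw, hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_refresh_token_valid(
    issued_at: datetime,
    last_used_at: datetime,
    now: datetime,
    revoked: bool = False,
    idle: timedelta = IDLE_WINDOW,
    absolute: timedelta = ABSOLUTE_WINDOW,
) -> bool:
    """A token is usable only inside both the idle and the absolute window."""
    if revoked:
        return False
    within_idle = now - last_used_at <= idle
    within_absolute = now - issued_at <= absolute
    return within_idle and within_absolute
```

Under this model a token that has not been used for more than three days is rejected even if it was issued recently, and a token in constant use still stops working fourteen days after issue. Neither check removes the row, which is why the retention column above notes that rows persist after expiry.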

1b. Authorization & audit

| Table | Fields | Sensitivity | Retention |
| --- | --- | --- | --- |
| audit_logs | user_id, account_id, action, resource_type, resource_id, details (JSONB), ip_address (up to 45 chars, to fit IPv6) | PII (IP address), behavioral | Indefinite; no purge job |
| teams, team membership | team metadata | Tenant metadata | Indefinite |

1c. Billing & subscriptions

| Table | Fields | Sensitivity | Retention |
| --- | --- | --- | --- |
| subscriptions | account_id, stripe_subscription_id, stripe_price_id, plan, status, current_period_*, cancel_at_period_end, seat_limit | Billing metadata | Indefinite |
| plan_billing | (account billing snapshot fields) | Billing metadata | Indefinite |
| stripe_events | id (Stripe event id), event_type, payload_excerpt (JSONB), processed_at | Billing metadata | Indefinite (idempotency table) |

Card data: ResolutionFlow does not store card numbers. Stripe Elements (@stripe/stripe-js on the frontend) collects card details directly; only Stripe IDs are stored server-side.

1d. Telemetry, AI usage, product behavior

| Table | Fields | Notes |
| --- | --- | --- |
| ai_usage | user_id, account_id, conversation_id, tier_at_time, input_tokens, output_tokens, estimated_cost_usd, succeeded, extra_data (JSONB) | Per-AI-call accounting; no message bodies |
| feature_flag / overrides | flag membership | Operational |
| feedback, beta_feedback | user_id, reaction, category, text, page_url, session_id | User-supplied free-text feedback |
| survey_invite, survey_response | survey content | User-supplied |
| session_rating | 1-5 star rating + feedback text | User-supplied |

1e. Marketing / pre-signup leads

| Table | Fields | Notes |
| --- | --- | --- |
| sales_leads | email, name, company, team_size, message, source, posthog_distinct_id, status | Contact/demo requests from public pages |
| (beta signup endpoint) | similar; see api/endpoints/beta_signup.py | Pre-onboarding leads |

1f. Frontend telemetry (client-originated, server-collected)

  • PostHog (posthog-js) initialized in main.tsx: autocapture: true, capture_pageview: true, capture_pageleave: 'if_capture_pageview', persistence: 'localStorage+cookie'. Identified by user.id, grouped by account_id. Sends to us.i.posthog.com (US instance). Web Vitals events also forwarded.
  • Sentry (@sentry/react + sentry-sdk[fastapi]): error tracking + 20% traces sample rate in prod, Session Replay at 1% normal / 100% error sessions; maskAllText: false, blockAllMedia: false (instrument.ts), so replays can contain visible text and media unless an explicit data-sentry-mask is added.
  • Backend Sentry: send_default_pii=True (main.py:18) — Sentry receives user identifiers, request paths, and request body fragments by default.

2. Customer data (ResolutionFlow as processor)

Data flowing through ResolutionFlow on behalf of MSP customers. The MSP is the controller; ResolutionFlow processes on their instruction. These are the categories where the DPA's processor obligations apply.

2a. Troubleshooting session content

| Table | Fields | Notes |
| --- | --- | --- |
| ai_sessions | intake_content (JSONB: text, image URLs, log contents, ticket data), problem_summary, problem_domain, conversation_messages (full LLM history JSONB), system_prompt_snapshot, pending_task_lane, resolution_summary, resolution_action, resolution_note_markdown, escalation_reason, escalation_package (JSONB), escalation_package_markdown, session_feedback, ticket_data (PSA snapshot) | High sensitivity: may contain end-client names, hostnames, IPs, emails, internal credentials, ticket bodies. The MSP's clients are the data subjects here, not the MSP. |
| ai_session_steps | per-step actions/notes | Same sensitivity as parent |
| ai_session_embeddings | pgvector embeddings | Derived from session content |
| ai_conversations | AI flow-builder wizard state, messages (JSONB), wizard_state, generated_tree, expires_at | TTL: 24h, purged hourly via _cleanup_expired_ai_conversations |
| sessions (legacy guided sessions) | tree_snapshot, path_taken, decisions, custom_steps, scratchpad, next_steps, ticket_number, client_name, outcome_notes | Same sensitivity |
| session_branches, fork_point, session_handoff, session_facts, session_resolution_output, session_suggested_fixes | branching + handoff artifacts | Same sensitivity |
| assistant_chat, copilot_conversation | open-ended chat threads with the model | Same sensitivity. Retention: account-configurable, default 90 days OR 100-chat cap (retention_cleanup.py). Pinned chats are exempt. |
| ai_chat_session | parallel chat session table | Auto-archived after 30 days of inactivity (main.py:45); archived, not deleted |
| kb_import | uploaded KB content for ingestion | Same sensitivity |

2b. Flow / Tree authoring

| Table | Notes |
| --- | --- |
| trees, tree, tree_embedding, tree_share, tree_chunker, draft_template, template_tree, step_library, step_category, script_template, script_builder_session, network_diagram, flow_proposal, platform_step, supporting_data | Customer-authored content. Tenant-isolated except for template_trees, platform_steps, script_categories, plan_feature_defaults, accounts (global tables). |

2c. PSA connection & ticket data

| Table | Fields | Notes |
| --- | --- | --- |
| psa_connections | provider, display_name, site_url, company_id, credentials_encrypted (Fernet, key derived via HKDF from SECRET_KEY; see encryption.py), flowpilot_settings | One per account. Application-layer encryption of credentials at rest. |
| psa_activity_log, psa_post_log, psa_member_mapping | PSA push history, retry state | Internal audit of round-trip writes |

PSA ticket bodies, contact names, company names, and notes flow into ai_sessions.ticket_data and intake_content. ConnectWise is the MSP's existing data source, not a ResolutionFlow subprocessor (see references/msp-context.md and Subprocessor section below). When ResolutionFlow writes back (resolution notes, escalation packages), that's the MSP instructing a write to their own data store — resolution_note_external_id and escalation_package_external_id capture the round-trip pointer.

2d. File uploads

| Table | Fields | Storage | Retention |
| --- | --- | --- | --- |
| file_uploads | account_id, uploaded_by, session_id, filename, content_type, size_bytes, storage_key, ai_description, extracted_content, content_summary | Railway Object Storage (S3-compatible) bucket resolutionflow-uploads | Indefinite; no automated purge surfaced |
| attachments | session attachments | Same | Indefinite |

PDFs and DOCX files are text-extracted (pypdf, python-docx). Images are resized via Pillow and forwarded as multimodal blocks to Claude — but per repo convention, images are not stored in conversation history.

2e. Notifications & emails

| Table | Notes |
| --- | --- |
| notifications | In-app notifications |
| notification_log | Delivery attempts |
| notification_config | Per-user/account preferences |

Transactional email is sent via Resend (resend==2.21.0, RESEND_API_KEY). FROM address: invites@resolutionflow.com. Sales-lead notifications go to sales@resolutionflow.com.


3. Subprocessors

Each row reflects what the scan found in the codebase or deployment configuration.

Subprocessor: Railway

  • Service type: Application + database hosting + S3-compatible object storage
  • Data categories: All stored data — primary PostgreSQL database (DB name railway in prod, alias patherly), application compute, uploaded files in resolutionflow-uploads bucket
  • Location: US (Railway default region; confirm specific region used)
  • Detected via: backend/railway.toml, frontend/railway.toml, DATABASE_URL, STORAGE_* env vars
  • DPA reference: https://railway.com/legal/dpa

Subprocessor: Anthropic

  • Service type: LLM API (Claude — Sonnet 4.6 standard tier, Haiku 4.5 fast tier)
  • Data categories: Session intake text, conversation history, ticket data, file content (PDF/DOCX text + resized image bytes), prompt cache contents
  • Location: US
  • Purpose: FlowPilot guided troubleshooting, AI flow builder, chat, resolution-note + escalation-package generation, fact synthesis, template extraction, network-diagram generation, script builder
  • Detected via: ANTHROPIC_API_KEY, anthropic>=0.40.0, AI_PROVIDER='anthropic' in config.py:153-208
  • DPA reference: https://www.anthropic.com/legal/commercial-dpa
  • [LEGAL REVIEW: verify training carve-out] Anthropic's commercial API tier does not train on customer data by default — confirm the tier in use matches before publishing.

Subprocessor: Google AI (Gemini)

  • Service type: LLM API fallback
  • Data categories: Same as Anthropic when AI_PROVIDER='gemini'
  • Location: US
  • Detected via: GOOGLE_AI_API_KEY, google-genai>=1.0.0, AI_MODEL_GEMINI='gemini-2.5-flash'
  • DPA reference: https://cloud.google.com/terms/data-processing-addendum
  • [LEGAL REVIEW: confirm whether Gemini is currently active] The code path exists but Anthropic is the configured default. Disclose either as "primary + fallback" or remove if Gemini key is not provisioned in prod.

Subprocessor: Voyage AI

  • Service type: Embeddings (RAG / similarity search)
  • Data categories: Text excerpts from sessions and flows used to compute vector embeddings (voyage-3.5, 1024 dimensions)
  • Location: US
  • Detected via: VOYAGE_API_KEY, voyageai>=0.3.0, EMBEDDING_MODEL='voyage-3.5'
  • DPA reference: https://www.voyageai.com/dpa [LEGAL REVIEW: confirm Voyage DPA URL and zero-retention status]

Subprocessor: Stripe

  • Service type: Payment processing
  • Data categories: Billing contact, card details (collected by Stripe Elements client-side — ResolutionFlow does not see PANs), Stripe customer/subscription IDs, webhook event payloads
  • Location: US (Stripe Global)
  • Detected via: STRIPE_SECRET_KEY, STRIPE_PUBLISHABLE_KEY, STRIPE_WEBHOOK_SECRET, stripe==14.3.0, @stripe/stripe-js
  • DPA reference: https://stripe.com/legal/dpa
  • PCI: SAQ-A scope (Stripe Elements). ResolutionFlow never receives full card data.

Subprocessor: Resend

  • Service type: Transactional email
  • Data categories: Recipient email addresses, email subject + body content (account invites, password resets, email verification, feedback notifications, sales-lead notifications)
  • Location: US
  • Detected via: RESEND_API_KEY, resend==2.21.0, FROM_EMAIL='invites@resolutionflow.com'
  • DPA reference: https://resend.com/legal/dpa

Subprocessor: Sentry

  • Service type: Error tracking + performance tracing + Session Replay
  • Data categories: Stack traces, request paths, user IDs and request body fragments (send_default_pii=True), browser session replays at 1%/100% sampling with text + media unmasked, breadcrumbs
  • Location: US (Sentry SaaS) — [LEGAL REVIEW: confirm Sentry data region]
  • Detected via: SENTRY_DSN, sentry-sdk[fastapi]>=2.54.0, @sentry/react, main.py:14-26, instrument.ts
  • DPA reference: https://sentry.io/legal/dpa/
  • [LEGAL REVIEW: PII posture] send_default_pii=True + unmasked Session Replay is broader than typical defaults. Either narrow the configuration (recommended: enable text masking on sensitive routes; set send_default_pii=False; add Sentry scrubbing rules for intake_content, conversation_messages, ticket_data) or disclose explicitly.
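One way to approach the scrubbing recommendation above is a before_send hook, sketched here in plain Python. The three field names come from this inventory; everything else (the helper names, the "[redacted]" placeholder, and the assumption that these fields appear as keys somewhere inside the captured request body) is illustrative and would need checking against real events.

```python
from typing import Any, Optional

# Field names taken from the inventory. Treating them as keys inside the
# captured request body is an assumption about the payload shape.
SENSITIVE_KEYS = {"intake_content", "conversation_messages", "ticket_data"}
REDACTED = "[redacted]"


def _scrub(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_KEYS else _scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_scrub(item) for item in value]
    return value


def before_send(event: dict, hint: dict) -> Optional[dict]:
    """Sentry-style hook: scrub the captured request body before upload."""
    request = event.get("request")
    if isinstance(request, dict) and "data" in request:
        request["data"] = _scrub(request["data"])
    return event


# Wiring it up would look roughly like this (not executed here):
# sentry_sdk.init(dsn=..., send_default_pii=False, before_send=before_send)
```

A hook like this only covers backend events. Session Replay masking is configured separately in the frontend SDK, so narrowing one side does not narrow the other.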

Subprocessor: PostHog

  • Service type: Product analytics + Web Vitals
  • Data categories: User ID, account ID (as group), email + name + plan + role on identify, page paths, autocaptured DOM interactions, custom events
  • Location: US (us.i.posthog.com instance)
  • Detected via: posthog-js, @posthog/react, main.tsx:17-23, VITE_PUBLIC_POSTHOG_KEY
  • DPA reference: https://posthog.com/dpa
  • Cookies: PostHog sets a first-party cookie because persistence: 'localStorage+cookie' is configured — disclosure required in Cookie Policy and consent flow if EU/UK visitors are reachable on public pages.

Subprocessor: Google Fonts

  • Service type: Font CDN
  • Data categories: Visitor IP address (Google Fonts exposes IPs to Google)
  • Location: Global Google CDN
  • Detected via: index.html:11-13 (fonts.googleapis.com + fonts.gstatic.com)
  • DPA reference: Google's terms (Google Fonts is normally treated as a service, not a controller-controller share, but the IP exposure is a known disclosure)
  • [LEGAL REVIEW: Schrems II / EU caution] For EU/UK visitors, Google Fonts loaded over fonts.googleapis.com is a recurring GDPR enforcement target. Consider self-hosting (Bunny Fonts or bundling) to remove the disclosure.

NOT subprocessors (deliberately excluded)

  • ConnectWise PSA — MSP customer's existing data source/controller, not a ResolutionFlow subprocessor (see references/msp-context.md). Disclose as "data source the customer authorizes ResolutionFlow to read from and, when instructed, write to."
  • Autotask, HaloPSA — same classification (provider stubs exist in services/psa/; current scan suggests ConnectWise is the only live provider, but [OPEN QUESTION] below asks the user to confirm)
  • GoDaddy / DNS registrar — DNS only, no traffic proxy
  • GitHub mirror, Gitea — source control, no customer data flows
  • Microsoft Learn MCP — read-only documentation lookup; the MCP server returns docs to ResolutionFlow, no customer data flows to Microsoft as part of this integration

4. Cookies and trackers

| Name / pattern | Type | Set by | Purpose | Strict-necessary? |
| --- | --- | --- | --- | --- |
| ph_* (PostHog) | Persistent first-party | posthog-js (persistence: 'localStorage+cookie') | Analytics: distinct ID, session, feature-flag state | No; requires consent under GDPR/UK PECR |
| access_token, refresh_token | localStorage (NOT cookies) | authStore, OAuthCallbackPage, SessionExpiryToast | Auth bearer tokens for API calls | Strict-necessary |
| theme-storage | localStorage | index.html inline script | UI theme preference | Strict-necessary (preference) |
| rf-editor-fullscreen | localStorage | Modal.tsx | UI preference | Strict-necessary (preference) |
| rf-intended-plan | localStorage | RegisterPage.tsx | Carry pricing-page selection into signup | Strict-necessary (UX) |
| recentFlows storage key | localStorage | lib/recentFlows.ts | Recent flow MRU | Strict-necessary (UX) |
| Step-feedback "hint shown" flag | localStorage | StepFeedback.tsx | Suppress repeated coachmark | Strict-necessary (UX) |
| Rated-sessions list | localStorage | csatUtils.ts | Hide CSAT widget after rating | Strict-necessary (UX) |
| Escalation-queue "seen" set | localStorage | EscalationQueue.tsx | Mark notifications seen | Strict-necessary (UX) |

Backend-set cookies: None found. Auth uses bearer tokens delivered in JSON, stored client-side in localStorage. No Set-Cookie headers issued by FastAPI middleware.

Note on auth tokens in localStorage: This is a known security-disclosure point. Tokens in localStorage are accessible to any JS running on the page; XSS would expose them. Disclose in the security section of the Privacy Policy as a deliberate architecture choice.


5. Retention and deletion logic — confirmed gaps

What the scan confirms has automated retention:

  • AI flow-builder wizard conversations (ai_conversations): 24h TTL, purged hourly (scheduler.py:118)
  • Assistant chats (assistant_chat): account-configurable retention, default 90 days OR 100 chats (whichever first) for non-pinned chats; cleanup runs daily (retention_cleanup.py)
  • AI chat sessions (ai_chat_session): auto-archived (not deleted) after 30 days idle (main.py:45)
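The "90 days OR 100 chats, pinned exempt" rule for assistant chats can be read as the selection logic below. This is a sketch of the rule as described in this inventory, not the contents of retention_cleanup.py; the Chat class, the use of updated_at as the age reference, and the choice to leave pinned chats out of the count are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class Chat:  # hypothetical stand-in for an assistant_chat row
    id: int
    updated_at: datetime
    pinned: bool = False


def chats_to_purge(
    chats: Iterable[Chat],
    now: datetime,
    retention_days: int = 90,
    max_count: int = 100,
) -> List[int]:
    """Return ids of non-pinned chats that exceed either retention limit."""
    cutoff = now - timedelta(days=retention_days)
    # Newest first, so the count cap keeps the most recent chats.
    candidates = sorted(
        (chat for chat in chats if not chat.pinned),
        key=lambda chat: chat.updated_at,
        reverse=True,
    )
    purge = []
    for position, chat in enumerate(candidates):
        too_old = chat.updated_at < cutoff
        over_cap = position >= max_count
        if too_old or over_cap:
            purge.append(chat.id)
    return purge
```

Whether the real job counts pinned chats toward the cap, and whether it measures age from creation or last activity, changes what the Privacy Policy can truthfully say about retention, so both points are worth confirming against the code.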

What the scan confirms is missing:

  • audit_logs — no purge job; grows indefinitely (IP addresses retained forever)
  • refresh_tokens — expired/revoked rows persist; no GC
  • email_verification_tokens, password_reset_tokens — no purge of expired rows confirmed
  • file_uploads and Railway storage objects — no lifecycle policy surfaced
  • ai_sessions and full session content (intake, conversation, ticket snapshots) — no automated purge; tied only to soft-delete of the owning user
  • ai_usage — telemetry retained indefinitely
  • sales_leads, beta_feedback, survey_response — no purge job
  • notifications, notification_log — no purge job
  • stripe_events — idempotency table grows indefinitely
  • Soft-deleted users (users.deleted_at) — no hard-delete job; hard_delete_user exists as a super-admin endpoint only
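For the token tables in the list above, a purge job could be as small as the following sketch. It uses an in-memory-friendly SQLite schema so it runs on its own; the production database is PostgreSQL, the column set is reduced to what the inventory lists, and the seven-day grace period is a hypothetical value rather than anything found in the scan.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_dead_refresh_tokens(
    conn: sqlite3.Connection,
    now: datetime,
    grace: timedelta = timedelta(days=7),  # hypothetical grace period
) -> int:
    """Delete rows that expired or were revoked more than `grace` ago.

    Timestamps are stored as ISO-8601 strings in UTC, so plain string
    comparison orders them correctly in this sketch.
    """
    cutoff = (now - grace).isoformat()
    cursor = conn.execute(
        "DELETE FROM refresh_tokens "
        "WHERE expires_at < ? "
        "OR (revoked_at IS NOT NULL AND revoked_at < ?)",
        (cutoff, cutoff),
    )
    conn.commit()
    return cursor.rowcount  # number of rows removed
```

The same pattern would apply to email_verification_tokens and password_reset_tokens with used_at in place of revoked_at. Larger tables such as audit_logs would more likely need batched deletes and an agreed retention period first.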

Account deletion behavior (accounts.py:524): owner-only, blocked if other members exist, performs soft-delete of the user + revoke all refresh tokens. Account row, audit logs, sessions, files, etc. are not purged.

[LEGAL REVIEW: GDPR Article 5(1)(e) storage limitation] A controller-facing claim of "we retain data only as long as necessary" would conflict with the current state. The Privacy Policy should either (a) describe the actual state honestly ("retained until you request deletion") with an explicit deletion-on-request commitment and SLA, or (b) implement scheduled purge for the categories above before publishing.


6. Logging & encryption posture

Logging (app/core/middleware.py RequestLoggingMiddleware, ErrorLoggingMiddleware): request paths and errors logged via Python logging. [LEGAL REVIEW: confirm whether request bodies are logged] — if yes, structured PII (emails, ticket content) ends up in logs/ and on Railway. Audit logger.info / logger.exception call sites to verify.

At-rest encryption:

  • PSA credentials (psa_connections.credentials_encrypted): application-layer Fernet encryption, key derived from SECRET_KEY via HKDF. Confirmed.
  • Railway-managed Postgres + Object Storage: disk-level encryption from the platform. [LEGAL REVIEW: verify Railway encryption attestation] before claiming "encrypted at rest" globally.
  • No additional column-level encryption for password_hash (bcrypt is the protection there), ai_sessions.*, intake_content, conversation_messages, etc.
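The PSA-credential bullet above describes a Fernet key derived from SECRET_KEY via HKDF. The sketch below shows that derivation step using only the standard library, following RFC 5869. The hash (SHA-256), the empty salt, and the info label are assumptions; encryption.py was not reproduced in the scan, so its actual parameters may differ.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    if not salt:
        # RFC 5869: a missing salt defaults to HashLen zero bytes.
        salt = b"\x00" * hashlib.sha256().digest_size
    prk = hmac.new(salt, secret, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def fernet_key_from_secret(secret_key: str, info: bytes = b"psa-credentials") -> bytes:
    """Derive 32 bytes and encode them in the URL-safe base64 form Fernet expects.

    The info label is a hypothetical value chosen for this sketch.
    """
    raw = hkdf_sha256(secret_key.encode("utf-8"), salt=b"", info=info)
    return base64.urlsafe_b64encode(raw)
```

One consequence of this design for the documents: because the key is derived from SECRET_KEY, rotating SECRET_KEY without re-encrypting stored credentials would make existing psa_connections rows undecryptable.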

In transit: HTTPS on prod (resolutionflow.com, api.resolutionflow.com). Backend serves over HTTP locally; production CORS gated by ALLOW_RAILWAY_ORIGINS for PR envs.

Security headers: SecurityHeadersMiddleware present with CSP in report-only mode (CSP_REPORT_ONLY=True default).
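Report-only mode matters for any security claim in the public documents: the browser reports violations but does not block them. The difference comes down to which header name is emitted, as in this minimal sketch (the function name and the example policy string are hypothetical; only the two header names are standard).

```python
from typing import Tuple


def csp_header(policy: str, report_only: bool = True) -> Tuple[str, str]:
    """Pick the header name that matches the enforcement mode."""
    name = (
        "Content-Security-Policy-Report-Only"
        if report_only
        else "Content-Security-Policy"
    )
    return name, policy


# Hypothetical policy string, for illustration only.
EXAMPLE_POLICY = "default-src 'self'; report-uri /csp-report"
```

With the scanned default of CSP_REPORT_ONLY=True, the first header would be sent, so the policy documents should not describe CSP as an enforced control until that default changes.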


7. Open questions for the user

These must be confirmed before generation:

  1. Live PSA providers: services/psa/ has stubs for ConnectWise, Autotask, and HaloPSA. Is only ConnectWise active in production, or are Autotask/HaloPSA also enabled? (Affects DPA and Privacy Policy data-source list.)
  2. Gemini status — is GOOGLE_AI_API_KEY provisioned in prod, or is Anthropic the sole live LLM provider? (Disclose one or both.)
  3. Voyage AI status — is VOYAGE_API_KEY provisioned in prod? Embeddings are a live code path but the key may not be set.
  4. Sentry data region — US or EU? (Affects EU data-transfer disclosure.)
  5. Railway region — which region is the prod project deployed in? (Affects data-location claims.)
  6. Jurisdictions targeted — should we assume EU/UK reachable (default yes for B2B SaaS), California (yes), other US states (Virginia, Colorado, Connecticut, Texas — newer laws now in force)? Anything to exclude?
  7. Business entity — what is the legal entity name and address that should appear as "Controller" / "Service Provider" on the documents? (Required for binding contact / notices section.)
  8. DPO / privacy contact email — is there a dedicated address (e.g., privacy@resolutionflow.com), or should we use support@ / michael@resolutionflow.com?
  9. Whether Microsoft Learn MCP usage is enabled in prod: ENABLE_MCP_MICROSOFT_LEARN=True is the default. The integration retrieves docs only (no customer data outflow), but worth confirming.
  10. Non-codebase tools — does ResolutionFlow use any of: Zapier/n8n/Make, HubSpot/Salesforce CRM, DocuSign, Help Scout/Zendesk, transcription/voice (Whisper, Eleven Labs), customer-data-platform tooling? None found in code; common to be configured elsewhere.
  11. Age / children's data: confirm ResolutionFlow has no users under 13 (US COPPA; also the UK's age of digital consent) or under 16 (EU GDPR default, which member states may lower to 13). Should be implicit for a B2B MSP product but the policy needs to state it.
  12. Free tier / EULA — confirm whether the product accepts unauthenticated visitors who can submit anything other than the public sales-lead form and public flow shares.
  13. Backup retention — Railway Postgres backups (point-in-time recovery window) extend effective retention. Confirm the PITR window and disclose.

Stop point. Per the skill workflow, generation is blocked on user confirmation of this inventory. Please review and either confirm or correct each section — and answer Section 7 — before I move to Phase 2 (classification) and Phase 3 (generation).