Generated by the resolutionflow-legal skill from a code scan of the FastAPI
backend + React frontend on commit 0564646. Each document is a starting
point for attorney review, not legal advice.
Includes:
- privacy-policy.md, terms-of-service.md, cookie-policy.md (public-facing)
- dpa.md (contractual; signed with MSP customers)
- subprocessor-list.md (Railway, Anthropic, Voyage, Stripe, Resend, Sentry,
PostHog, Google Fonts — confirmed live as of scan)
- data-inventory.md + classification.md (Phase 1/2 working files)
- attorney-review-checklist.md (consolidated [LEGAL REVIEW] punch list)
- implementation-verification.md (claim-by-claim audit vs. actual code)
Three blocking issues filed before public publication:
- #175 deletion-on-offboarding (or rewrite retention claims)
- #176 narrow Sentry send_default_pii + Session Replay config
- #177 EU/UK consent for PostHog + Google Fonts
Public-facing documents intentionally route physical-mail requests through
support@ rather than publishing the LLC's registered address.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ResolutionFlow Data Inventory
Generated: 2026-05-14
Repo path: /config/workspace/resolutionflow
Scanned commit: 0564646 (branch feat/public-landing-routing-refactor)
Derived directly from the FastAPI backend, React 19 frontend, and deployment config. Anything ambiguous from the scan is flagged in Section 7 (Open questions for the user) and must be confirmed by the user before generation.
1. First-party data (ResolutionFlow as controller)
These are categories where ResolutionFlow itself decides why and how the data is processed (i.e., its own users, billing, telemetry).
1a. Account identity & authentication
| Table | Fields | Sensitivity | Retention |
|---|---|---|---|
| users | email (unique), password_hash (bcrypt), name, phone, job_title, timezone, avatar_url, logo_data, company_display_name, role_at_signup, last_login, email_verified_at, deleted_at (soft) | Direct PII + credential | Indefinite (soft-delete only; no automated purge of soft-deleted rows) |
| accounts | name, display_code, stripe_customer_id, branding_*, team_size_bucket, primary_psa, chat_retention_days (default 90), chat_retention_max_count (default 100), session_idle_minutes, session_absolute_minutes, sso_provider, sso_config (JSONB) | Account metadata; tenant boundary | Indefinite |
| account_invites | email, code, role, invited_by_id, expires_at, revoked_at, email_sent_at | PII (invitee email) | Until expiry/revocation; no automated purge |
| oauth_identities | provider (google/microsoft), provider_subject, provider_email_at_link, user_id | PII (federated identity binding) | Until manual unlink/account deletion |
| email_verification_tokens | token_hash (SHA-256), user_id, expires_at, used_at | Auth token (hashed) | Until used or expired; no automated purge of expired rows confirmed |
| password_reset_tokens | (parallel structure expected) | Auth token (hashed) | Until used or expired |
| refresh_tokens | token_hash, user_id, expires_at, revoked_at | Auth token (hashed) | Idle 3d / absolute 14d defaults (overridable per-account); rows persist after expiry; no purge job confirmed |
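
The token tables above store a hash rather than the raw token. A minimal sketch of that pattern, assuming a SHA-256 hex digest as noted for email_verification_tokens; the function names are hypothetical and are not taken from the codebase:

```python
import hashlib
import hmac
import secrets


def hash_token(raw: str) -> str:
    """SHA-256 hex digest of the raw token, suitable for a token_hash column."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def issue_token() -> tuple[str, str]:
    """Return (raw_token, token_hash). Only the hash would be persisted."""
    raw = secrets.token_urlsafe(32)  # delivered to the user, never stored
    return raw, hash_token(raw)


def token_matches(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare against the stored hash in constant time."""
    return hmac.compare_digest(hash_token(presented), stored_hash)
```

With this shape, a database leak exposes only digests of high-entropy random tokens, which is why the inventory classifies these rows as "Auth token (hashed)" rather than as live credentials.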
Authentication mechanics: JWT with HS256, 5-min access tokens, refresh-token rotation (idle 3d / absolute 14d defaults from Settings.SESSION_*_MINUTES_DEFAULT). Passwords hashed with bcrypt (12 rounds). OAuth supported for Google and Microsoft.
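
The idle and absolute session windows interact: a refresh is allowed only while both are still open. A minimal sketch of that check, assuming the 3-day and 14-day defaults stated above; the function and parameter names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

IDLE_MINUTES_DEFAULT = 3 * 24 * 60       # 3 days, per the stated default
ABSOLUTE_MINUTES_DEFAULT = 14 * 24 * 60  # 14 days, per the stated default


def session_is_live(
    issued_at: datetime,
    last_used_at: datetime,
    now: datetime,
    idle_minutes: int = IDLE_MINUTES_DEFAULT,
    absolute_minutes: int = ABSOLUTE_MINUTES_DEFAULT,
) -> bool:
    """True while neither the idle window nor the absolute window has elapsed."""
    idle_ok = now - last_used_at < timedelta(minutes=idle_minutes)
    absolute_ok = now - issued_at < timedelta(minutes=absolute_minutes)
    return idle_ok and absolute_ok
```

Passing per-account values for idle_minutes and absolute_minutes would model the session_idle_minutes and session_absolute_minutes overrides on the accounts table.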
1b. Authorization & audit
| Table | Fields | Sensitivity | Retention |
|---|---|---|---|
| audit_logs | user_id, account_id, action, resource_type, resource_id, details (JSONB), ip_address (up to 45 chars; IPv6) | PII (IP address), behavioral | Indefinite; no purge job |
| teams, team membership | team metadata | Tenant metadata | Indefinite |
1c. Billing & subscriptions
| Table | Fields | Sensitivity | Retention |
|---|---|---|---|
| subscriptions | account_id, stripe_subscription_id, stripe_price_id, plan, status, current_period_*, cancel_at_period_end, seat_limit | Billing metadata | Indefinite |
| plan_billing | (account billing snapshot fields) | Billing metadata | Indefinite |
| stripe_events | id (Stripe event id), event_type, payload_excerpt (JSONB), processed_at | Billing metadata | Indefinite (idempotency table) |
Card data: ResolutionFlow does not store card numbers. Stripe Elements (@stripe/stripe-js on the frontend) collects card details directly; only Stripe IDs are stored server-side.
1d. Telemetry, AI usage, product behavior
| Table | Fields | Notes |
|---|---|---|
| ai_usage | user_id, account_id, conversation_id, tier_at_time, input_tokens, output_tokens, estimated_cost_usd, succeeded, extra_data (JSONB) | Per-AI-call accounting; no message bodies |
| feature_flag / overrides | flag membership | Operational |
| feedback, beta_feedback | user_id, reaction, category, text, page_url, session_id | User-supplied free-text feedback |
| survey_invite, survey_response | survey content | User-supplied |
| session_rating | 1–5 star rating + feedback text | User-supplied |
1e. Marketing / pre-signup leads
| Table | Fields | Notes |
|---|---|---|
| sales_leads | email, name, company, team_size, message, source, posthog_distinct_id, status | Contact/demo requests from public pages |
| (beta signup endpoint) | similar; see api/endpoints/beta_signup.py | Pre-onboarding leads |
1f. Frontend telemetry (client-originated, server-collected)
- PostHog (posthog-js) initialized in main.tsx: autocapture: true, capture_pageview: true, capture_pageleave: 'if_capture_pageview', persistence: 'localStorage+cookie'. Identified by user.id, grouped by account_id. Sends to us.i.posthog.com (US instance). Web Vitals events also forwarded.
- Sentry (@sentry/react + sentry-sdk[fastapi]): error tracking + 20% traces sample rate in prod, Session Replay at 1% normal / 100% error sessions; maskAllText: false, blockAllMedia: false (instrument.ts), so replays can contain visible text and media unless an explicit data-sentry-mask is added.
- Backend Sentry: send_default_pii=True (main.py:18); Sentry receives user identifiers, request paths, and request body fragments by default.
2. Customer data (ResolutionFlow as processor)
Data flowing through ResolutionFlow on behalf of MSP customers. The MSP is the controller; ResolutionFlow processes on their instruction. These are the categories where the DPA's processor obligations apply.
2a. Troubleshooting session content
| Table | Fields | Notes |
|---|---|---|
| ai_sessions | intake_content (JSONB: text, image URLs, log contents, ticket data), problem_summary, problem_domain, conversation_messages (full LLM history JSONB), system_prompt_snapshot, pending_task_lane, resolution_summary, resolution_action, resolution_note_markdown, escalation_reason, escalation_package (JSONB), escalation_package_markdown, session_feedback, ticket_data (PSA snapshot) | High sensitivity: may contain end-client names, hostnames, IPs, emails, internal credentials, ticket bodies. The MSP's clients are the data subjects here, not the MSP. |
| ai_session_steps | per-step actions/notes | Same sensitivity as parent |
| ai_session_embeddings | pgvector embeddings | Derived from session content |
| ai_conversations | AI flow-builder wizard state, messages (JSONB), wizard_state, generated_tree, expires_at | TTL: 24h, purged hourly via _cleanup_expired_ai_conversations |
| sessions (legacy guided sessions) | tree_snapshot, path_taken, decisions, custom_steps, scratchpad, next_steps, ticket_number, client_name, outcome_notes | Same sensitivity |
| session_branches, fork_point, session_handoff, session_facts, session_resolution_output, session_suggested_fixes | branching + handoff artifacts | Same sensitivity |
| assistant_chat, copilot_conversation | open-ended chat threads with the model | Same sensitivity. Retention: account-configurable, default 90 days OR 100-chat cap (retention_cleanup.py). Pinned chats are exempt. |
| ai_chat_session | parallel chat session table | Auto-archived (not deleted) after 30 days of inactivity (main.py:45) |
| kb_import | uploaded KB content for ingestion | Same sensitivity |
2b. Flow / Tree authoring
| Table | Notes |
|---|---|
| trees, tree, tree_embedding, tree_share, tree_chunker, draft_template, template_tree, step_library, step_category, script_template, script_builder_session, network_diagram, flow_proposal, platform_step, supporting_data | Customer-authored content. Tenant-isolated except for template_trees, platform_steps, script_categories, plan_feature_defaults, accounts (global tables). |
2c. PSA connection & ticket data
| Table | Fields | Notes |
|---|---|---|
| psa_connections | provider, display_name, site_url, company_id, credentials_encrypted (Fernet, key derived via HKDF from SECRET_KEY; see encryption.py), flowpilot_settings | One per account. Application-layer encryption of credentials at rest. |
| psa_activity_log, psa_post_log, psa_member_mapping | PSA push history, retry state | Internal audit of round-trip writes |
PSA ticket bodies, contact names, company names, and notes flow into ai_sessions.ticket_data and intake_content. ConnectWise is the MSP's existing data source, not a ResolutionFlow subprocessor (see references/msp-context.md and Subprocessor section below). When ResolutionFlow writes back (resolution notes, escalation packages), that's the MSP instructing a write to their own data store — resolution_note_external_id and escalation_package_external_id capture the round-trip pointer.
2d. File uploads
| Table | Fields | Storage | Retention |
|---|---|---|---|
| file_uploads | account_id, uploaded_by, session_id, filename, content_type, size_bytes, storage_key, ai_description, extracted_content, content_summary | Railway Object Storage (S3-compatible) bucket resolutionflow-uploads | Indefinite; no automated purge surfaced |
| attachments | session attachments | Same | Indefinite |
PDFs and DOCX files are text-extracted (pypdf, python-docx). Images are resized via Pillow and forwarded as multimodal blocks to Claude — but per repo convention, images are not stored in conversation history.
2e. Notifications & emails
| Table | Notes |
|---|---|
| notifications | In-app notifications |
| notification_log | Delivery attempts |
| notification_config | Per-user/account preferences |
Transactional email is sent via Resend (resend==2.21.0, RESEND_API_KEY). FROM address: invites@resolutionflow.com. Sales-lead notifications go to sales@resolutionflow.com.
3. Subprocessors
Each row reflects what the scan found in the codebase or deployment configuration.
Subprocessor: Railway
- Service type: Application + database hosting + S3-compatible object storage
- Data categories: All stored data: primary PostgreSQL database (DB name railway in prod, alias patherly), application compute, uploaded files in resolutionflow-uploads bucket
- Location: US (Railway default region; confirm specific region used)
- Detected via: backend/railway.toml, frontend/railway.toml, DATABASE_URL, STORAGE_* env vars
- DPA reference: https://railway.com/legal/dpa
Subprocessor: Anthropic
- Service type: LLM API (Claude — Sonnet 4.6 standard tier, Haiku 4.5 fast tier)
- Data categories: Session intake text, conversation history, ticket data, file content (PDF/DOCX text + resized image bytes), prompt cache contents
- Location: US
- Purpose: FlowPilot guided troubleshooting, AI flow builder, chat, resolution-note + escalation-package generation, fact synthesis, template extraction, network-diagram generation, script builder
- Detected via: ANTHROPIC_API_KEY, anthropic>=0.40.0, AI_PROVIDER='anthropic' in config.py:153-208
- DPA reference: https://www.anthropic.com/legal/commercial-dpa
- [LEGAL REVIEW: verify training carve-out] Anthropic's commercial API tier does not train on customer data by default — confirm the tier in use matches before publishing.
Subprocessor: Google AI (Gemini)
- Service type: LLM API fallback
- Data categories: Same as Anthropic when AI_PROVIDER='gemini'
- Location: US
- Detected via: GOOGLE_AI_API_KEY, google-genai>=1.0.0, AI_MODEL_GEMINI='gemini-2.5-flash'
- DPA reference: https://cloud.google.com/terms/data-processing-addendum
- [LEGAL REVIEW: confirm whether Gemini is currently active] The code path exists but Anthropic is the configured default. Disclose either as "primary + fallback" or remove if Gemini key is not provisioned in prod.
Subprocessor: Voyage AI
- Service type: Embeddings (RAG / similarity search)
- Data categories: Text excerpts from sessions and flows used to compute vector embeddings (voyage-3.5, 1024 dimensions)
- Location: US
- Detected via: VOYAGE_API_KEY, voyageai>=0.3.0, EMBEDDING_MODEL='voyage-3.5'
- DPA reference: https://www.voyageai.com/dpa [LEGAL REVIEW: confirm Voyage DPA URL and zero-retention status]
Subprocessor: Stripe
- Service type: Payment processing
- Data categories: Billing contact, card details (collected by Stripe Elements client-side — ResolutionFlow does not see PANs), Stripe customer/subscription IDs, webhook event payloads
- Location: US (Stripe Global)
- Detected via: STRIPE_SECRET_KEY, STRIPE_PUBLISHABLE_KEY, STRIPE_WEBHOOK_SECRET, stripe==14.3.0, @stripe/stripe-js
- DPA reference: https://stripe.com/legal/dpa
- PCI: SAQ-A scope (Stripe Elements). ResolutionFlow never receives full card data.
Subprocessor: Resend
- Service type: Transactional email
- Data categories: Recipient email addresses, email subject + body content (account invites, password resets, email verification, feedback notifications, sales-lead notifications)
- Location: US
- Detected via: RESEND_API_KEY, resend==2.21.0, FROM_EMAIL='invites@resolutionflow.com'
- DPA reference: https://resend.com/legal/dpa
Subprocessor: Sentry
- Service type: Error tracking + performance tracing + Session Replay
- Data categories: Stack traces, request paths, user IDs and request body fragments (send_default_pii=True), browser session replays at 1%/100% sampling with text + media unmasked, breadcrumbs
- Location: US (Sentry SaaS). [LEGAL REVIEW: confirm Sentry data region]
- Detected via: SENTRY_DSN, sentry-sdk[fastapi]>=2.54.0, @sentry/react, main.py:14-26, instrument.ts
- DPA reference: https://sentry.io/legal/dpa/
- [LEGAL REVIEW: PII posture] send_default_pii=True + unmasked Session Replay is broader than typical defaults. Either narrow the configuration (recommended: enable text masking on sensitive routes; set send_default_pii=False; add Sentry scrubbing rules for intake_content, conversation_messages, ticket_data) or disclose explicitly.
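
One way the scrubbing recommendation could look on the backend is a before_send hook that redacts the three named fields before an event leaves the process. This is a sketch, not the project's configuration: the recursive walk, the placeholder string, and the assumption that request bodies arrive as parsed dicts (rather than raw strings) are all illustrative.

```python
from typing import Any

# Field names taken from the recommendation above.
SENSITIVE_KEYS = {"intake_content", "conversation_messages", "ticket_data"}
REDACTED = "[redacted]"  # hypothetical placeholder value


def scrub(value: Any) -> Any:
    """Return a copy of value with anything stored under a sensitive key replaced."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


def before_send(event: dict, hint: dict) -> dict:
    """Same call shape as the Sentry SDK's before_send hook: return the event to send."""
    return scrub(event)


# Wiring, shown as a comment because it needs a real DSN:
# sentry_sdk.init(dsn=..., send_default_pii=False, before_send=before_send)
```

Key-based redaction only catches structured payloads; free text that embeds ticket content inside an exception message would still pass through, so it complements rather than replaces setting send_default_pii=False.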
Subprocessor: PostHog
- Service type: Product analytics + Web Vitals
- Data categories: User ID, account ID (as group), email + name + plan + role on identify, page paths, autocaptured DOM interactions, custom events
- Location: US (us.i.posthog.com instance)
- Detected via: posthog-js, @posthog/react, main.tsx:17-23, VITE_PUBLIC_POSTHOG_KEY
- DPA reference: https://posthog.com/dpa
- Cookies: PostHog sets a first-party cookie because persistence: 'localStorage+cookie' is configured; disclosure required in Cookie Policy and consent flow if EU/UK visitors are reachable on public pages.
Subprocessor: Google Fonts
- Service type: Font CDN
- Data categories: Visitor IP address (Google Fonts exposes IPs to Google)
- Location: Global Google CDN
- Detected via: index.html:11-13 (fonts.googleapis.com + fonts.gstatic.com)
- DPA reference: Google's terms (Google Fonts is normally treated as a service, not a controller-controller share, but the IP exposure is a known disclosure)
- [LEGAL REVIEW: Schrems II / EU caution] For EU/UK visitors, Google Fonts loaded over fonts.googleapis.com is a recurring GDPR enforcement target. Consider self-hosting (Bunny Fonts or bundling) to remove the disclosure.
NOT subprocessors (deliberately excluded)
- ConnectWise PSA: MSP customer's existing data source/controller, not a ResolutionFlow subprocessor (see references/msp-context.md). Disclose as "data source the customer authorizes ResolutionFlow to read from and, when instructed, write to."
- Autotask, HaloPSA: same classification (provider stubs exist in services/psa/; current scan suggests ConnectWise is the only live provider, but [OPEN QUESTION] below asks the user to confirm)
- GoDaddy / DNS registrar: DNS only, no traffic proxy
- GitHub mirror, Gitea: source control, no customer data flows
- Microsoft Learn MCP: read-only documentation lookup; the MCP server returns docs to ResolutionFlow, no customer data flows to Microsoft as part of this integration
4. Cookies and trackers
| Name / pattern | Type | Set by | Purpose | Strict-necessary? |
|---|---|---|---|---|
| ph_* (PostHog) | Persistent first-party | posthog-js (persistence: 'localStorage+cookie') | Analytics: distinct ID, session, feature-flag state | No; requires consent under GDPR/UK PECR |
| access_token, refresh_token | localStorage (NOT cookies) | authStore, OAuthCallbackPage, SessionExpiryToast | Auth bearer tokens for API calls | Strict-necessary |
| theme-storage | localStorage | index.html inline script | UI theme preference | Strict-necessary (preference) |
| rf-editor-fullscreen | localStorage | Modal.tsx | UI preference | Strict-necessary (preference) |
| rf-intended-plan | localStorage | RegisterPage.tsx | Carry pricing-page selection into signup | Strict-necessary (UX) |
| recentFlows storage key | localStorage | lib/recentFlows.ts | Recent flow MRU | Strict-necessary (UX) |
| Step-feedback "hint shown" flag | localStorage | StepFeedback.tsx | Suppress repeated coachmark | Strict-necessary (UX) |
| Rated-sessions list | localStorage | csatUtils.ts | Hide CSAT widget after rating | Strict-necessary (UX) |
| Escalation-queue "seen" set | localStorage | EscalationQueue.tsx | Mark notifications seen | Strict-necessary (UX) |
Backend-set cookies: None found. Auth uses bearer tokens delivered in JSON, stored client-side in localStorage. No Set-Cookie headers issued by FastAPI middleware.
Note on auth tokens in localStorage: This is a known security-disclosure point. Tokens in localStorage are accessible to any JS running on the page; XSS would expose them. Disclose in the security section of the Privacy Policy as a deliberate architecture choice.
5. Retention and deletion logic — confirmed gaps
What the scan confirms has automated retention:
- AI flow-builder wizard conversations (ai_conversations): 24h TTL, purged hourly (scheduler.py:118)
- Assistant chats (assistant_chat): account-configurable retention, default 90 days OR 100 chats (whichever first) for non-pinned chats; cleanup runs daily (retention_cleanup.py)
- AI chat sessions (ai_chat_session): auto-archived (not deleted) after 30 days idle (main.py:45)
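
The "90 days OR 100 chats, pinned exempt" rule for assistant chats can be read as the selection below. This is a sketch under stated assumptions, not the logic in retention_cleanup.py: the Chat type is hypothetical, and it assumes pinned chats do not count toward the 100-chat cap, which the scan does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chat:
    id: int
    last_activity: datetime
    pinned: bool = False


def chats_to_purge(chats, now, retention_days=90, max_count=100):
    """Return ids of unpinned chats that are too old or fall beyond the newest max_count."""
    cutoff = now - timedelta(days=retention_days)
    # Newest first, so rank 0 is the most recently active unpinned chat.
    unpinned = sorted(
        (chat for chat in chats if not chat.pinned),
        key=lambda chat: chat.last_activity,
        reverse=True,
    )
    return [
        chat.id
        for rank, chat in enumerate(unpinned)
        if chat.last_activity < cutoff or rank >= max_count
    ]
```

Because either condition is enough to purge, an account with heavy chat volume can lose chats well before 90 days, which is worth stating plainly in any retention disclosure.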
What the scan confirms is missing:
- audit_logs: no purge job; grows indefinitely (IP addresses retained forever)
- refresh_tokens: expired/revoked rows persist; no GC
- email_verification_tokens, password_reset_tokens: no purge of expired rows confirmed
- file_uploads and Railway storage objects: no lifecycle policy surfaced
- ai_sessions and full session content (intake, conversation, ticket snapshots): no automated purge; tied only to soft-delete of the owning user
- ai_usage: telemetry retained indefinitely
- sales_leads, beta_feedback, survey_response: no purge job
- notifications, notification_log: no purge job
- stripe_events: idempotency table grows indefinitely
- Soft-deleted users (users.deleted_at): no hard-delete job; hard_delete_user exists as a super-admin endpoint only
Account deletion behavior (accounts.py:524): owner-only, blocked if other members exist, performs soft-delete of the user + revoke all refresh tokens. Account row, audit logs, sessions, files, etc. are not purged.
[LEGAL REVIEW: GDPR Article 5(1)(e) storage limitation] A controller-facing claim of "we retain data only as long as necessary" would conflict with the current state. The Privacy Policy should either (a) describe the actual state honestly ("retained until you request deletion") with an explicit deletion-on-request commitment and SLA, or (b) implement scheduled purge for the categories above before publishing.
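
For option (b), a scheduled purge of dead refresh tokens might look like the sketch below. It uses an in-memory SQLite table purely so the example is self-contained (production is PostgreSQL), the 30-day grace period is a hypothetical value, and timestamps are stored as UTC ISO-8601 strings so that string comparison matches time order.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def purge_dead_tokens(conn: sqlite3.Connection, now: datetime,
                      grace: timedelta = timedelta(days=30)) -> int:
    """Delete rows that expired, or were revoked, more than `grace` ago."""
    cutoff = (now - grace).isoformat()
    cur = conn.execute(
        "DELETE FROM refresh_tokens "
        "WHERE expires_at < ? "
        "OR (revoked_at IS NOT NULL AND revoked_at < ?)",
        (cutoff, cutoff),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


def make_demo_db(now: datetime) -> sqlite3.Connection:
    """In-memory table with the four refresh_tokens columns listed in section 1a."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE refresh_tokens "
        "(token_hash TEXT, user_id INTEGER, expires_at TEXT, revoked_at TEXT)"
    )
    day = timedelta(days=1)
    rows = [
        ("live", 1, now + 3 * day, None),
        ("expired-recently", 1, now - 1 * day, None),
        ("expired-long-ago", 1, now - 60 * day, None),
        ("revoked-long-ago", 2, now + 3 * day, now - 45 * day),
    ]
    conn.executemany(
        "INSERT INTO refresh_tokens VALUES (?, ?, ?, ?)",
        [(h, u, e.isoformat(), r.isoformat() if r else None) for h, u, e, r in rows],
    )
    return conn
```

The grace period keeps recently expired rows around for incident investigation; whatever value is chosen becomes the retention figure the Privacy Policy can truthfully state for this table.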
6. Logging & encryption posture
Logging (app/core/middleware.py RequestLoggingMiddleware, ErrorLoggingMiddleware): request paths and errors logged via Python logging. [LEGAL REVIEW: confirm whether request bodies are logged] — if yes, structured PII (emails, ticket content) ends up in logs/ and on Railway. Audit logger.info / logger.exception call sites to verify.
At-rest encryption:
- PSA credentials (psa_connections.credentials_encrypted): application-layer Fernet encryption, key derived from SECRET_KEY via HKDF. ✅ Confirmed.
- Railway-managed Postgres + Object Storage: disk-level encryption from the platform. [LEGAL REVIEW: verify Railway encryption attestation] before claiming "encrypted at rest" globally.
- No additional column-level encryption for password_hash (bcrypt is the protection there), ai_sessions.*, intake_content, conversation_messages, etc.
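
To make the "key derived from SECRET_KEY via HKDF" step concrete, here is a standard-library sketch of HKDF-SHA256 (RFC 5869) producing a key in the format Fernet expects: 32 bytes, URL-safe base64 encoded. The empty salt and the info label are assumptions for illustration; the actual parameters in encryption.py were not captured in this inventory.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF per RFC 5869 with SHA-256: extract a PRK, then expand to `length` bytes."""
    if not salt:
        salt = b"\x00" * hashlib.sha256().digest_size  # RFC default for a missing salt
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key(secret_key: str, info: bytes = b"psa-credentials") -> bytes:
    """Derive a Fernet-format key. The info label here is hypothetical."""
    raw = hkdf_sha256(secret_key.encode("utf-8"), salt=b"", info=info)
    return base64.urlsafe_b64encode(raw)
```

One consequence of this design for the DPA: rotating SECRET_KEY changes the derived key, so stored PSA credentials would need re-encryption (or re-entry) as part of any rotation procedure.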
In transit: HTTPS on prod (resolutionflow.com, api.resolutionflow.com). Backend serves over HTTP locally; production CORS gated by ALLOW_RAILWAY_ORIGINS for PR envs.
Security headers: SecurityHeadersMiddleware present with CSP in report-only mode (CSP_REPORT_ONLY=True default).
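
Report-only mode means the browser reports CSP violations but does not block anything, so the policy is not yet an enforced control. The difference comes down to which header name is emitted, as in this sketch; the function name and the example policy string are hypothetical and do not reflect the contents of SecurityHeadersMiddleware.

```python
def csp_header(policy: str, report_only: bool = True) -> tuple[str, str]:
    """Return the (name, value) pair for a CSP header in report-only or enforcing mode."""
    name = (
        "Content-Security-Policy-Report-Only"  # violations reported, nothing blocked
        if report_only
        else "Content-Security-Policy"         # violations blocked
    )
    return name, policy


# Example with a placeholder policy:
header_name, header_value = csp_header("default-src 'self'")
```

Any security-section wording should therefore avoid claiming CSP as an XSS mitigation while CSP_REPORT_ONLY remains True.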
7. Open questions for the user
These must be confirmed before generation:
- Live PSA providers: services/psa/ has stubs for ConnectWise, Autotask, and HaloPSA. Is only ConnectWise active in production, or are Autotask/HaloPSA also enabled? (Affects DPA and Privacy Policy data-source list.)
- Gemini status: is GOOGLE_AI_API_KEY provisioned in prod, or is Anthropic the sole live LLM provider? (Disclose one or both.)
- Voyage AI status: is VOYAGE_API_KEY provisioned in prod? Embeddings are a live code path but the key may not be set.
- Sentry data region: US or EU? (Affects EU data-transfer disclosure.)
- Railway region: which region is the prod project deployed in? (Affects data-location claims.)
- Jurisdictions targeted: should we assume EU/UK reachable (default yes for B2B SaaS), California (yes), other US states (Virginia, Colorado, Connecticut, Texas; newer laws now in force)? Anything to exclude?
- Business entity: what is the legal entity name and address that should appear as "Controller" / "Service Provider" on the documents? (Required for binding contact / notices section.)
- DPO / privacy contact email: is there a dedicated address (e.g., privacy@resolutionflow.com), or should we use support@ / michael@resolutionflow.com?
- Microsoft Learn MCP: is usage enabled in prod? ENABLE_MCP_MICROSOFT_LEARN=True is the default. The integration retrieves docs only (no customer data outflow), but worth confirming.
- Non-codebase tools: does ResolutionFlow use any of: Zapier/n8n/Make, HubSpot/Salesforce CRM, DocuSign, Help Scout/Zendesk, transcription/voice (Whisper, Eleven Labs), customer-data-platform tooling? None found in code; common to be configured elsewhere.
- Children's data (age): confirm ResolutionFlow has no users under 13 (US COPPA; also the UK age of digital consent) or under 16 (EU GDPR Article 8 default). Should be implicit for a B2B MSP product but the policy needs to state it.
- Free tier / EULA: confirm whether the product accepts unauthenticated visitors who can submit anything other than the public sales-lead form and public flow shares.
- Backup retention: Railway Postgres backups (point-in-time recovery window) extend effective retention. Confirm the PITR window and disclose.
Stop point. Per the skill workflow, generation is blocked on user confirmation of this inventory. Please review and either confirm or correct each section — and answer Section 7 — before I move to Phase 2 (classification) and Phase 3 (generation).