# Phase C: Sensitive Data Redaction - Consolidated Implementation Plan

> **Status:** Approved, ready for implementation
> **Spec:** `docs/plans/2026-02-13-EXPORT-IMPROVEMENTS-SPEC.md` section C1
> **UI Decision:** Simple toggle (Option 1)
> **Redaction Posture:** Conservative (false positives are preferred over false negatives)
> **Branch:** `feat/export-phase-c`
> **No DB migration required**

---

## Overview

Server-side regex redaction with a simple checkbox toggle in the export preview modal. Redaction runs **after** export generation and variable resolution, so that no sensitive data slips through via late substitution. There is no rich editor; the existing textarea is kept. The user sees a summary of what was masked and can manually edit the result. Redaction is **non-persistent** and **request-scoped**: database records are never mutated.

---

## Scope

**In scope:**

- Redaction for exported content in SessionDetailPage preview/download/copy flows
- Backend redaction summary returned to frontend for user visibility
- Conservative pattern set (IPv4, IPv6, email, bearer/API/JWT-like tokens, UNC paths)

**Out of scope:**

- Rich editor / highlight / per-item unmask controls
- Redaction changes to non-export APIs or persisted session data
- Hostname masking (MSP tickets legitimately reference hostnames)

---

## Design Decisions

| Decision | Rationale |
|----------|-----------|
| Redaction runs post-generation, post-variable-substitution | Prevents misses from late substitutions; redacts the final rendered text |
| Fail-closed on error | If `redaction_mode="mask"` and redaction processing fails, return 500; never leak unredacted content |
| Conservative detection | Prefer false positives over false negatives; users can manually edit |
| Idempotent output | Running redaction twice on already-redacted content produces the same result |
| Deterministic replacement order | Patterns applied in fixed order to prevent overlapping-match inconsistencies |
| Non-persistent | DB records are never mutated; redaction is request-scoped |
| Hostname exclusion | MSP tickets legitimately reference hostnames |

---

## Backend

### 1. New File: `backend/app/services/redaction_service.py`

**`RedactionSummary` dataclass:**

```python
from dataclasses import dataclass


@dataclass
class RedactionSummary:
    ips: int = 0
    emails: int = 0
    tokens: int = 0
    unc_paths: int = 0

    @property
    def total(self) -> int:
        return self.ips + self.emails + self.tokens + self.unc_paths
```

**Compiled regex pattern registry (deterministic order):**

| Priority | Pattern | Regex | Replacement |
|----------|---------|-------|-------------|
| 1 | Bearer tokens | `Bearer\s+[A-Za-z0-9._-]+` | `[TOKEN REDACTED]` |
| 2 | API key patterns | Long hex/base64 strings (32+ chars) | `[TOKEN REDACTED]` |
| 3 | UNC paths | `\\\\[\w.-]+\\[\w$.-]+` | `[UNC PATH REDACTED]` |
| 4 | Email | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | `[EMAIL REDACTED]` |
| 5 | IPv6 | `\b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b` | `[IP REDACTED]` |
| 6 | IPv4 | `\b(?:\d{1,3}\.){3}\d{1,3}\b` | `[IP REDACTED]` |

> **Priority rationale:** More specific/longer patterns match first to prevent partial matches. Bearer tokens before general tokens, IPv6 before IPv4, etc.

**Core function:**

```python
def apply_redaction_to_text(content: str) -> tuple[str, RedactionSummary]:
    """
    Apply all redaction patterns to text content.
    Uses re.subn for replacement + counting in one pass per pattern.
    Returns (redacted_content, summary).
    """
```

- Compile all patterns at module load time (not per-request)
- Use `re.subn()` for simultaneous replacement and counting
- Ensure idempotent output: already-redacted placeholders like `[IP REDACTED]` must not be re-matched
- Raise an exception on unexpected errors (fail-closed behavior is enforced by the caller)

### 2. Schema Change: `backend/app/schemas/session.py`

Add to `SessionExport`:

```python
# Requires: from typing import Literal
redaction_mode: Literal["none", "mask"] = "none"
```

### 3. Endpoint Integration: `backend/app/api/endpoints/sessions.py`

Update export flow with this execution order:

```
1. Fetch session
2. Generate export by format (markdown/text/html)
3. Resolve variables
4. IF redaction_mode == "mask":
     Call redaction service on final rendered content
     If redaction raises → return 500 (fail-closed)
5. Set response headers
6. Return content
```

**Critical: Redaction happens AFTER steps 2-3**, not before format branching.

**Response headers:**

- `X-Redaction-Mode: none|mask` is always present on export responses
- `X-Redaction-Summary: {"ips": 3, "emails": 2, "tokens": 1, "unc_paths": 0, "total": 6}` is present only when mode is `mask`

**Redaction footer appended to export content when matches exist:**

```
---
Redacted: 3 IPs, 2 emails, 1 token
---
```

Keep existing media types and exported-flag behavior unchanged.

### 4. CORS Header Exposure: `backend/main.py`

Update **both** CORS middleware branches to expose redaction headers:

```python
expose_headers=[
    "X-Redaction-Mode",
    "X-Redaction-Summary",
    "X-Correlation-ID",
    "X-Process-Time"
]
```

> **Without this, the frontend cannot read custom headers from the response.** On cross-origin requests, browsers expose only CORS-safelisted response headers to JavaScript unless the server lists additional ones.

---

## Frontend

### 5. Types: `frontend/src/types/session.ts`

Add to `SessionExport` type:

```typescript
redaction_mode?: 'none' | 'mask';
```

Add new interface:

```typescript
interface RedactionSummary {
  ips: number;
  emails: number;
  tokens: number;
  unc_paths: number;
  total: number;
}
```

### 6. API Layer: `frontend/src/api/sessions.ts`

**Keep existing `export()` function unchanged** for backward compatibility.

**Add new function:**

```typescript
async function exportWithMeta(
  id: string,
  options: SessionExport
): Promise<{
  content: string;
  redactionMode: 'none' | 'mask';
  redactionSummary: RedactionSummary | null;
}> {
  // Makes same API call but parses response headers
  // Safely parse X-Redaction-Summary with try/catch
  // Returns structured metadata alongside content
}
```

> **Why a separate function?** Existing callers of `export()` don't break. Preview flows that need metadata use the new function. Clean separation.

### 7. Session Detail Page: `frontend/src/pages/SessionDetailPage.tsx`

- Add state: `redactionMode: 'none' | 'mask'` (default: `'none'`)
- Add state: `redactionSummary: RedactionSummary | null`
- Use `exportWithMeta()` for preview and toggle-refresh flows
- Pass toggle callback and summary to `ExportPreviewModal`
- Keep "Copy for Ticket" and non-preview copy behavior unchanged unless explicitly toggled
- Follow same pattern as existing `includeSummary` state

### 8. Export Preview Modal: `frontend/src/components/session/ExportPreviewModal.tsx`

**New props:**

```typescript
redactionEnabled?: boolean;
onToggleRedaction?: (enabled: boolean) => void;
redactionSummary?: RedactionSummary | null;
```

**Checkbox (match existing "Include Summary" visual pattern):**

```tsx
```

**Summary display:**

- When matches exist: `"Masked: 3 IPs, 2 emails, 1 token"` in `text-blue-400`
- When mask is on but no matches: `"No sensitive data detected"` in `text-white/40`
- Helper text below toggle: `"Toggling reloads content and replaces any manual edits"` in `text-white/30 text-xs`

---

## Testing

### Backend Unit Tests: `backend/tests/test_redaction_service.py`

| Test Case | Description |
|-----------|-------------|
| Individual patterns | Each pattern type independently (IPv4, IPv6, email, bearer token, API key, UNC path) |
| Mixed content | Multiple pattern types in single text block, verify aggregate counts |
| No matches | Input with no sensitive data returns unchanged text and zero counts |
| Idempotency | Already-redacted placeholders (`[IP REDACTED]`) are not re-matched or double-counted |
| Token boundaries | Conservative token detection minimum-length boundaries (32+ chars) |
| Edge cases | Empty strings, None handling, very long strings |
| Total calculation | `summary.total` matches sum of individual counts |

### Backend Integration Tests: `backend/tests/test_sessions.py` (extend)

| Test Case | Description |
|-----------|-------------|
| `redaction_mode=none` | Returns unmasked export and `X-Redaction-Mode: none` header |
| `redaction_mode=mask` | Masks content and sets parseable `X-Redaction-Summary` header |
| Variable substitution | Content from variable resolution is also masked when matching patterns |
| Media types unchanged | Export content types remain the same regardless of redaction |
| Exported flag unchanged | Existing exported-flag semantics for completed/in-progress sessions unchanged |
| Error behavior | Redaction failure returns 500, not unredacted content |

### Frontend Validation

- `npm run build` validates types
- `npm run test` for any existing test suites
- Verify `exportWithMeta` header parsing behavior
- Verify `ExportPreviewModal` toggle and summary rendering states

### Manual QA Checklist

- [ ] Preview with redaction OFF shows original content
- [ ] Preview with redaction ON masks sensitive data and shows accurate summary
- [ ] Toggle redaction repeatedly and verify stable counts and content
- [ ] Download from preview uses the currently shown (edited/masked) content
- [ ] Copy for Ticket respects current redaction choice
- [ ] Content with variables resolves correctly, then redacts
- [ ] Redaction footer appears in exported content when matches exist
- [ ] Summary line disappears when redaction is toggled off

---

## Acceptance Criteria

1. User can enable/disable masking via preview toggle without page reload
2. Masked output contains no raw matches for any covered pattern
3. Summary counts are visible in UI and match backend-calculated values
4. No persisted session fields are changed by export redaction
5. Existing export formats and Phase B features continue to pass current tests
6. Redaction failure results in 500 error, never unredacted content delivery

---

## Files to Create/Modify

| Action | File | Notes |
|--------|------|-------|
| **Create** | `backend/app/services/redaction_service.py` | Core redaction engine |
| **Create** | `backend/tests/test_redaction_service.py` | Unit tests for redaction |
| **Modify** | `backend/app/schemas/session.py` | Add `redaction_mode` to `SessionExport` |
| **Modify** | `backend/app/api/endpoints/sessions.py` | Integration point (post-generation) |
| **Modify** | `backend/main.py` | CORS `expose_headers` for both branches |
| **Modify** | `frontend/src/types/session.ts` | Add `RedactionSummary` interface + `redaction_mode` |
| **Modify** | `frontend/src/api/sessions.ts` | Add `exportWithMeta()` function |
| **Modify** | `frontend/src/components/session/ExportPreviewModal.tsx` | Checkbox + summary UI |
| **Modify** | `frontend/src/pages/SessionDetailPage.tsx` | State management + wiring |
| **Extend** | `backend/tests/test_sessions.py` | Integration tests for export + redaction |

---

## Implementation Order

1. `redaction_service.py` + unit tests (standalone, no dependencies)
2. Schema change in `session.py`
3. Endpoint integration in `sessions.py` + CORS update in `main.py`
4. Backend integration tests
5. Frontend types + API layer (`session.ts`, `sessions.ts`)
6. Frontend UI (`ExportPreviewModal.tsx`, `SessionDetailPage.tsx`)
7. Manual QA against checklist

---

## Assumptions & Defaults

- Default redaction mode is `none`
- Redaction scope is export content only, not stored session data
- Hostnames are intentionally not masked
- Conservative detection is accepted, including possible false positives
- No DB migration is required
- Existing `export()` API function remains unchanged for backward compatibility
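
---

The redaction service described in Backend step 1 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the approved implementation: the long-token regex is a guess (the plan only describes that pattern as "long hex/base64 strings (32+ chars)"), `_PATTERNS` and `summary_header_value` are hypothetical names, and the email pattern uses `[A-Za-z]` for the top-level domain so that a literal `|` is not accepted there.

```python
import json
import re
from dataclasses import asdict, dataclass


@dataclass
class RedactionSummary:
    ips: int = 0
    emails: int = 0
    tokens: int = 0
    unc_paths: int = 0

    @property
    def total(self) -> int:
        return self.ips + self.emails + self.tokens + self.unc_paths


# (summary field, compiled pattern, placeholder), applied in this fixed order.
# Compiled once at import time rather than per request.
# ASSUMPTION: the second entry is an illustrative stand-in for the plan's
# "long hex/base64 strings (32+ chars)"; the plan gives no regex for it.
_PATTERNS = [
    ("tokens", re.compile(r"Bearer\s+[A-Za-z0-9._-]+"), "[TOKEN REDACTED]"),
    ("tokens", re.compile(r"\b[A-Za-z0-9+/_-]{32,}={0,2}"), "[TOKEN REDACTED]"),
    ("unc_paths", re.compile(r"\\\\[\w.-]+\\[\w$.-]+"), "[UNC PATH REDACTED]"),
    (
        "emails",
        re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
        "[EMAIL REDACTED]",
    ),
    (
        "ips",
        re.compile(r"\b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b"),
        "[IP REDACTED]",
    ),
    ("ips", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP REDACTED]"),
]


def apply_redaction_to_text(content: str) -> tuple[str, RedactionSummary]:
    """Apply every pattern in order; return (redacted_content, summary)."""
    summary = RedactionSummary()
    for field_name, pattern, placeholder in _PATTERNS:
        # subn replaces and counts in one pass for this pattern.
        content, count = pattern.subn(placeholder, content)
        setattr(summary, field_name, getattr(summary, field_name) + count)
    return content, summary


def summary_header_value(summary: RedactionSummary) -> str:
    """Serialize the summary as JSON, including the derived total
    (hypothetical helper for building the X-Redaction-Summary value)."""
    return json.dumps({**asdict(summary), "total": summary.total})


if __name__ == "__main__":
    sample = r"Mail admin@example.com at 10.0.0.5, share \\fileserver\share$"
    redacted, summary = apply_redaction_to_text(sample)
    print(redacted)
    print(summary_header_value(summary))
```

None of the placeholders contain characters that the six patterns can match, so a second pass over already-redacted text leaves it unchanged and reports zero counts, which is the idempotency property the unit tests check. Exceptions are deliberately not caught here; under the fail-closed rule the endpoint is the place that turns any failure into a 500.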