resolutionflow/docs/archive/phase-c-redaction-consolidated-plan.md
chihlasm 350c977eda feat: add procedural flows with intake forms, navigation, and seed templates
Adds a new "procedural" tree type for linear step-by-step project workflows
(domain controller setup, M365 onboarding, VPN config, etc). Includes intake
form builder, two-panel step navigation, variable resolution, procedural
exports, 3 seed templates, and UI rename from "Trees" to "Flows".

Also archives 19 implemented plan docs and creates deferred features backlog.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 04:13:52 -05:00


Phase C: Sensitive Data Redaction — Consolidated Implementation Plan

Status: Approved — ready for implementation
Spec: docs/plans/2026-02-13-EXPORT-IMPROVEMENTS-SPEC.md section C1
UI Decision: Simple toggle (Option 1)
Redaction Posture: Conservative (false positives > false negatives)
Branch: feat/export-phase-c
No DB migration required


Overview

Server-side regex redaction, enabled by a simple checkbox toggle in the export preview modal. Redaction runs after export generation and variable resolution, so sensitive data introduced by late substitution is still caught. There is no rich editor; the existing textarea is kept. The user sees a summary of what was masked and can manually edit the result.

Redaction is non-persistent and request-scoped — database records are never mutated.


Scope

In scope:

  • Redaction for exported content in SessionDetailPage preview/download/copy flows
  • Backend redaction summary returned to frontend for user visibility
  • Conservative pattern set (IPv4, IPv6, email, bearer/API/JWT-like tokens, UNC paths)

Out of scope:

  • Rich editor / highlight / per-item unmask controls
  • Redaction changes to non-export APIs or persisted session data
  • Hostname masking (MSP tickets legitimately reference hostnames)

Design Decisions

| Decision | Rationale |
| --- | --- |
| Redaction runs post-generation, post-variable-substitution | Prevents misses from late substitutions; redacts the final rendered text |
| Fail-closed on error | If redaction_mode="mask" and redaction processing fails, return 500; never leak unredacted content |
| Conservative detection | Prefer false positives over false negatives; users can manually edit |
| Idempotent output | Running redaction twice on already-redacted content produces the same result |
| Deterministic replacement order | Patterns applied in fixed order to prevent overlapping-match inconsistencies |
| Non-persistent | DB records are never mutated; redaction is request-scoped |
| Hostname exclusion | MSP tickets legitimately reference hostnames |

Backend

1. New File: backend/app/services/redaction_service.py

RedactionSummary dataclass:

@dataclass
class RedactionSummary:
    ips: int = 0
    emails: int = 0
    tokens: int = 0
    unc_paths: int = 0

    @property
    def total(self) -> int:
        return self.ips + self.emails + self.tokens + self.unc_paths

Compiled regex pattern registry (deterministic order):

| Priority | Pattern | Regex | Replacement |
| --- | --- | --- | --- |
| 1 | Bearer tokens | Bearer\s+[A-Za-z0-9._-]+ | [TOKEN REDACTED] |
| 2 | API key patterns | Long hex/base64 strings (32+ chars) | [TOKEN REDACTED] |
| 3 | UNC paths | \\\\[\w.-]+\\[\w$.-]+ | [UNC PATH REDACTED] |
| 4 | Email | \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b | [EMAIL REDACTED] |
| 5 | IPv6 | \b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b | [IP REDACTED] |
| 6 | IPv4 | \b(?:\d{1,3}\.){3}\d{1,3}\b | [IP REDACTED] |

Priority rationale: More specific or longer patterns are applied first so that a broader pattern cannot consume part of a match and leave a fragment behind. Bearer tokens run before general tokens, and IPv6 runs before IPv4.
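
To make the ordering argument concrete, here is a minimal sketch that applies the same two patterns in both orders. The hex-only key regex is an assumption standing in for the plan's "long hex/base64 strings (32+ chars)" entry, and apply_in_order is a hypothetical helper, not part of the planned service.

```python
import re

# Both entries are illustrative. HEX_KEY is an assumed, simplified stand-in
# for the plan's "long hex/base64 strings (32+ chars)" pattern.
BEARER = (re.compile(r"Bearer\s+[A-Za-z0-9._-]+"), "[TOKEN REDACTED]")
HEX_KEY = (re.compile(r"\b[A-Fa-f0-9]{32,}\b"), "[TOKEN REDACTED]")


def apply_in_order(text, patterns):
    """Apply each (regex, replacement) pair in sequence; return (text, hits)."""
    hits = 0
    for regex, replacement in patterns:
        text, n = regex.subn(replacement, text)
        hits += n
    return text, hits


sample = "Authorization: Bearer " + "a1b2c3d4" * 5  # 40 hex characters

# Specific pattern first: the whole "Bearer <token>" span is replaced.
specific_first = apply_in_order(sample, [BEARER, HEX_KEY])

# General pattern first: only the hex run is replaced, and the Bearer
# pattern can no longer match because "[" is not in its character class.
general_first = apply_in_order(sample, [HEX_KEY, BEARER])

print(specific_first)
print(general_first)
```

Both orders hide the secret here, but they produce different text, which is the inconsistency a fixed priority order avoids.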

Core function:

def apply_redaction_to_text(content: str) -> tuple[str, RedactionSummary]:
    """
    Apply all redaction patterns to text content.
    Uses re.subn for replacement + counting in one pass per pattern.
    Returns (redacted_content, summary).
    """

  • Compile all patterns at module load time (not per-request)
  • Use re.subn() for simultaneous replacement and counting
  • Ensure idempotent output — already-redacted placeholders like [IP REDACTED] must not be re-matched
  • Raise exception on unexpected errors (fail-closed behavior enforced by caller)
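
The requirements above could be sketched roughly as follows. This is one possible shape under stated assumptions, not the final implementation: the API-key regex is an assumption (the plan only describes it as long hex/base64 strings of 32+ characters), and the _PATTERNS name and the TypeError on non-string input are illustrative choices.

```python
import re
from dataclasses import dataclass


@dataclass
class RedactionSummary:
    ips: int = 0
    emails: int = 0
    tokens: int = 0
    unc_paths: int = 0

    @property
    def total(self) -> int:
        return self.ips + self.emails + self.tokens + self.unc_paths


# Compiled once at import time, listed in priority order.
# Each entry: (summary field, compiled regex, replacement).
_PATTERNS = [
    ("tokens", re.compile(r"Bearer\s+[A-Za-z0-9._-]+"), "[TOKEN REDACTED]"),
    # Assumed regex for "long hex/base64 strings (32+ chars)".
    ("tokens", re.compile(r"\b[A-Za-z0-9+/]{32,}={0,2}"), "[TOKEN REDACTED]"),
    ("unc_paths", re.compile(r"\\\\[\w.-]+\\[\w$.-]+"), "[UNC PATH REDACTED]"),
    ("emails",
     re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
     "[EMAIL REDACTED]"),
    ("ips",
     re.compile(r"\b(?:[0-9a-fA-F]{1,4}:){2,7}[0-9a-fA-F]{1,4}\b"),
     "[IP REDACTED]"),
    ("ips", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP REDACTED]"),
]


def apply_redaction_to_text(content: str) -> tuple[str, RedactionSummary]:
    """Apply every pattern in priority order; return (redacted, summary)."""
    if not isinstance(content, str):
        # Raising lets the caller enforce fail-closed behavior.
        raise TypeError("content must be a string")
    summary = RedactionSummary()
    for field, regex, replacement in _PATTERNS:
        # subn replaces and counts in a single pass per pattern.
        content, count = regex.subn(replacement, content)
        setattr(summary, field, getattr(summary, field) + count)
    return content, summary


if __name__ == "__main__":
    redacted, summary = apply_redaction_to_text(
        "Ping 10.0.0.5 failed; contact admin@example.com"
    )
    print(redacted)
    print(summary, summary.total)
```

Idempotency holds in this sketch because none of the placeholders contain text that any pattern matches, so a second pass yields the same string and zero counts.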

2. Schema Change: backend/app/schemas/session.py

Add to SessionExport:

redaction_mode: Literal["none", "mask"] = "none"

3. Endpoint Integration: backend/app/api/endpoints/sessions.py

Update export flow with this execution order:

1. Fetch session
2. Generate export by format (markdown/text/html)
3. Resolve variables
4. IF redaction_mode == "mask":
     Call redaction service on final rendered content
     If redaction raises → return 500 (fail-closed)
5. Set response headers
6. Return content

Critical: Redaction happens AFTER steps 2-3, not before format branching.

Response headers:

  • X-Redaction-Mode: none|mask (always present on export responses)
  • X-Redaction-Summary: {"ips": 3, "emails": 2, "tokens": 1, "unc_paths": 0, "total": 6} (present only when mode is mask)
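
The fail-closed step and the header rules can be sketched without any web framework. In this sketch finalize_export and the Redactor alias are hypothetical names, the redactor is injected so the example stays self-contained, and the real endpoint would return its framework's response object rather than a tuple.

```python
import json
from typing import Callable, Literal

# Hypothetical alias: a redactor returns (redacted_text, summary_dict).
Redactor = Callable[[str], tuple[str, dict[str, int]]]


def finalize_export(
    rendered: str,
    redaction_mode: Literal["none", "mask"],
    redact: Redactor,
) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for already rendered export content.

    `rendered` is the content after format generation and variable
    resolution, so redaction always sees the final text.
    """
    headers = {"X-Redaction-Mode": redaction_mode}
    if redaction_mode == "mask":
        try:
            rendered, summary = redact(rendered)
        except Exception:
            # Fail closed: the unredacted content never leaves this function.
            return 500, headers, "Export failed: redaction error"
        headers["X-Redaction-Summary"] = json.dumps(summary)
    return 200, headers, rendered


def failing_redactor(_: str) -> tuple[str, dict[str, int]]:
    raise RuntimeError("simulated redaction failure")


if __name__ == "__main__":
    print(finalize_export("server at 10.0.0.5", "none", failing_redactor))
    print(finalize_export("server at 10.0.0.5", "mask", failing_redactor))
```

With mode none the redactor is never called, so a broken redactor cannot affect unmasked exports; with mode mask any exception turns into a 500 whose body contains none of the original content.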

Redaction footer appended to export content when matches exist:

--- Redacted: 3 IPs, 2 emails, 1 token ---
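
A small helper could build that footer line from the summary counts. This is a sketch under assumptions: the function name format_redaction_footer is hypothetical, zero-count categories are omitted to match the example above, and the "UNC path" label is a guess since the plan's example does not show one.

```python
def format_redaction_footer(
    ips: int = 0, emails: int = 0, tokens: int = 0, unc_paths: int = 0
) -> str:
    """Build the footer line; return an empty string when nothing matched."""
    labels = [
        (ips, "IP", "IPs"),
        (emails, "email", "emails"),
        (tokens, "token", "tokens"),
        (unc_paths, "UNC path", "UNC paths"),  # assumed wording
    ]
    # Skip zero-count categories and pick singular or plural per count.
    parts = [
        f"{count} {singular if count == 1 else plural}"
        for count, singular, plural in labels
        if count > 0
    ]
    if not parts:
        return ""
    return "--- Redacted: " + ", ".join(parts) + " ---"


if __name__ == "__main__":
    print(format_redaction_footer(ips=3, emails=2, tokens=1))
```

Returning an empty string for zero matches lets the caller append the result unconditionally while still honoring "appended only when matches exist".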

Keep existing media types and exported-flag behavior unchanged.

4. CORS Header Exposure: backend/main.py

Update both CORS middleware branches to expose redaction headers:

expose_headers=[
    "X-Redaction-Mode",
    "X-Redaction-Summary",
    "X-Correlation-ID",
    "X-Process-Time"
]

Without this, browser JavaScript cannot read these custom headers on cross-origin responses: under CORS, only the safelisted response headers are exposed unless the server lists others in Access-Control-Expose-Headers.


Frontend

5. Types: frontend/src/types/session.ts

Add to SessionExport type:

redaction_mode?: 'none' | 'mask';

Add new interface:

interface RedactionSummary {
  ips: number;
  emails: number;
  tokens: number;
  unc_paths: number;
  total: number;
}

6. API Layer: frontend/src/api/sessions.ts

Keep existing export() function unchanged for backward compatibility.

Add new function:

async function exportWithMeta(
  id: string,
  options: SessionExport
): Promise<{
  content: string;
  redactionMode: 'none' | 'mask';
  redactionSummary: RedactionSummary | null;
}> {
  // Makes same API call but parses response headers
  // Safely parse X-Redaction-Summary with try/catch
  // Returns structured metadata alongside content
}

Why a separate function? Existing callers of export() don't break. Preview flows that need metadata use the new function. Clean separation.

7. Session Detail Page: frontend/src/pages/SessionDetailPage.tsx

  • Add state: redactionMode: 'none' | 'mask' (default: 'none')
  • Add state: redactionSummary: RedactionSummary | null
  • Use exportWithMeta() for preview and toggle-refresh flows
  • Pass toggle callback and summary to ExportPreviewModal
  • Keep "Copy for Ticket" and non-preview copy behavior unchanged unless explicitly toggled
  • Follow same pattern as existing includeSummary state

8. Export Preview Modal: frontend/src/components/session/ExportPreviewModal.tsx

New props:

redactionEnabled?: boolean;
onToggleRedaction?: (enabled: boolean) => void;
redactionSummary?: RedactionSummary | null;

Checkbox — match existing "Include Summary" visual pattern:

<label className="flex items-center gap-2 text-sm text-white/60 cursor-pointer">
  <input
    type="checkbox"
    checked={redactionEnabled ?? false}
    onChange={(e) => onToggleRedaction?.(e.target.checked)}
  />
  Mask Sensitive Data
</label>

Summary display:

  • When matches exist: "Masked: 3 IPs, 2 emails, 1 token" in text-blue-400
  • When mask is on but no matches: "No sensitive data detected" in text-white/40
  • Helper text below toggle: "Toggling reloads content and replaces any manual edits" in text-white/30 text-xs

Testing

Backend Unit Tests: backend/tests/test_redaction_service.py

| Test Case | Description |
| --- | --- |
| Individual patterns | Each pattern type independently (IPv4, IPv6, email, bearer token, API key, UNC path) |
| Mixed content | Multiple pattern types in single text block, verify aggregate counts |
| No matches | Input with no sensitive data returns unchanged text and zero counts |
| Idempotency | Already-redacted placeholders ([IP REDACTED]) are not re-matched or double-counted |
| Token boundaries | Conservative token detection minimum-length boundaries (32+ chars) |
| Edge cases | Empty strings, None handling, very long strings |
| Total calculation | summary.total matches sum of individual counts |
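
As an illustration of the token-boundary and idempotency cases, the tests below exercise a stand-in token pattern. TOKEN_RE and redact_tokens are assumptions made so the sketch runs on its own; the real tests would import apply_redaction_to_text from the redaction service instead.

```python
import re

# Assumed hex-only stand-in for the 32+ character token pattern.
TOKEN_RE = re.compile(r"\b[A-Fa-f0-9]{32,}\b")


def redact_tokens(text: str) -> tuple[str, int]:
    """Stand-in redactor returning (redacted_text, match_count)."""
    return TOKEN_RE.subn("[TOKEN REDACTED]", text)


def test_token_minimum_length_boundary():
    below = "id=" + "a" * 31
    at_limit = "id=" + "a" * 32
    assert redact_tokens(below) == (below, 0)
    assert redact_tokens(at_limit) == ("id=[TOKEN REDACTED]", 1)


def test_commit_hash_is_an_accepted_false_positive():
    # A 40-character hex commit hash looks like a token; the conservative
    # posture accepts masking it.
    sha = "0123456789abcdef0123456789abcdef01234567"
    assert redact_tokens("commit " + sha) == ("commit [TOKEN REDACTED]", 1)


def test_second_pass_is_a_no_op():
    once, first_count = redact_tokens("key " + "f" * 40)
    twice, second_count = redact_tokens(once)
    assert first_count == 1
    assert (twice, second_count) == (once, 0)


if __name__ == "__main__":
    test_token_minimum_length_boundary()
    test_commit_hash_is_an_accepted_false_positive()
    test_second_pass_is_a_no_op()
    print("ok")
```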

Backend Integration Tests: backend/tests/test_sessions.py (extend)

| Test Case | Description |
| --- | --- |
| redaction_mode=none | Returns unmasked export and X-Redaction-Mode: none header |
| redaction_mode=mask | Masks content and sets parseable X-Redaction-Summary header |
| Variable substitution | Content from variable resolution is also masked when matching patterns |
| Media types unchanged | Export content types remain the same regardless of redaction |
| Exported flag unchanged | Existing exported-flag semantics for completed/in-progress sessions unchanged |
| Error behavior | Redaction failure returns 500, not unredacted content |

Frontend Validation

  • npm run build validates types
  • npm run test for any existing test suites
  • Verify exportWithMeta header parsing behavior
  • Verify ExportPreviewModal toggle and summary rendering states

Manual QA Checklist

  • Preview with redaction OFF shows original content
  • Preview with redaction ON masks sensitive data and shows accurate summary
  • Toggle redaction repeatedly — verify stable counts and content
  • Download from preview uses the currently shown (edited/masked) content
  • Copy for Ticket respects current redaction choice
  • Content with variables resolves correctly, then redacts
  • Redaction footer appears in exported content when matches exist
  • Summary line disappears when redaction is toggled off

Acceptance Criteria

  1. User can enable/disable masking via preview toggle without page reload
  2. Masked output contains no raw matches for any covered pattern
  3. Summary counts are visible in UI and match backend-calculated values
  4. No persisted session fields are changed by export redaction
  5. Existing export formats and Phase B features continue to pass current tests
  6. Redaction failure results in 500 error, never unredacted content delivery

Files to Create/Modify

| Action | File | Notes |
| --- | --- | --- |
| Create | backend/app/services/redaction_service.py | Core redaction engine |
| Create | backend/tests/test_redaction_service.py | Unit tests for redaction |
| Modify | backend/app/schemas/session.py | Add redaction_mode to SessionExport |
| Modify | backend/app/api/endpoints/sessions.py | Integration point (post-generation) |
| Modify | backend/main.py | CORS expose_headers for both branches |
| Modify | frontend/src/types/session.ts | Add RedactionSummary interface + redaction_mode |
| Modify | frontend/src/api/sessions.ts | Add exportWithMeta() function |
| Modify | frontend/src/components/session/ExportPreviewModal.tsx | Checkbox + summary UI |
| Modify | frontend/src/pages/SessionDetailPage.tsx | State management + wiring |
| Extend | backend/tests/test_sessions.py | Integration tests for export + redaction |

Implementation Order

  1. redaction_service.py + unit tests (standalone, no dependencies)
  2. Schema change in session.py
  3. Endpoint integration in sessions.py + CORS update in main.py
  4. Backend integration tests
  5. Frontend types + API layer (session.ts, sessions.ts)
  6. Frontend UI (ExportPreviewModal.tsx, SessionDetailPage.tsx)
  7. Manual QA against checklist

Assumptions & Defaults

  • Default redaction mode is none
  • Redaction scope is export content only, not stored session data
  • Hostnames are intentionally not masked
  • Conservative detection is accepted, including possible false positives
  • No DB migration is required
  • Existing export() API function remains unchanged for backward compatibility