Files
resolutionflow/.ai/DECISIONS.md
Michael Chihlas fb2dc222fd
All checks were successful
Mirror to GitHub / mirror (push) Successful in 5s
CI / frontend (pull_request) Successful in 5m9s
CI / backend (pull_request) Successful in 9m43s
CI / e2e (pull_request) Successful in 10m13s
docs(ai): handoff for fresh session — AI consolidation plan locked
- HANDOFF: rewritten resume point. AI summary blocker is the active
  task; consolidation plan is the path. 5-step implementation order
  with watch-outs and breadcrumbs.
- CURRENT_TASK: updated commit table through 0d1b305. Documents the
  live-test results (what works, the AI summary blocker), full
  consolidation design with proposed payload shape.
- SESSION_LOG: chronological entry covering live QA bash, two
  pickup bugs found + fixed, the three Enter/dashboard/timeout
  fixes, and the architectural smell that surfaced.
- DECISIONS: new entry "Consolidate the three per-escalation AI
  calls into one structured generation" — rejected alternatives
  (bump timeout further, copy status-update content the wrong way,
  switch to Haiku) and consequences (5s magic-moment, ~60% token
  reduction, instant Ticket Notes button, schema enforcement
  required, migration concerns documented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-29 00:21:30 -04:00

9.5 KiB

DECISIONS.md

Append-only architectural decision log. Newest entries at the top. Entry format:

## YYYY-MM-DD — <short title>
**Context:** why this came up
**Decision:** what we chose
**Rejected:** what we didn't choose and why
**Consequences:** what this means going forward

2026-04-29 — Consolidate the three per-escalation AI calls into one structured generation

Context: A single user-initiated escalation currently triggers three separate Sonnet calls, all summarizing the same source material (session state, steps taken, "what we know") from slightly different angles:

  1. _build_escalation_package_enhanced — runs in the background enrich_escalation_async task, builds a rich JSON payload that's saved to ai_session.escalation_package.
  2. _generate_ai_assessment — also background, returns the magic-moment screen fields (likely_cause, suggested_steps[], confidence).
  3. generate_status_update — engineer-triggered when they click "Ticket Notes" / "Client Update" / "Email Draft" in the conclude modal, generates audience-specific PSA prose.

The user surfaced the smell: the engineer is typically generating a status update during the escalate flow, so the AI assessment work is being done twice with overlapping context and the engineer's PSA prose is being thrown away. Live test on 2026-04-29 also showed that bumping the assessment timeout 15s → 45s did NOT fix the empty-placeholder bug — meaning the architectural smell is also a demo blocker.

Decision: ONE structured AI call per escalation that produces a single payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Persist to SessionHandoff. The conclude modal's "Ticket Notes" button reads from the saved prose instead of calling the model. "Client Update" and "Email Draft" buttons trigger a cheap Haiku transformation over the saved prose (tone shift only, not a re-summarization).

Proposed payload shape (final form decided during implementation):

{
  "summary_prose": "<PSA-flavored ticket-notes paragraph>",
  "what_we_know": ["<one-liner>"],
  "likely_cause": "<one sentence>",
  "suggested_steps": ["<short step>"],
  "confidence": "low | medium | high",
  "audience_variants": {"client_update": null, "email_draft": null}
}

audience_variants filled lazily on first user request, cached.

Rejected:

  • Just bumping the timeout further. Already tried 5s → 15s → 45s. The architectural redundancy is the real cost — even if Sonnet completed reliably, three calls per escalation is wasteful and creates three places where state can diverge.
  • Reusing the engineer's status update content as the AI assessment. User's first instinct, but: status updates aren't always generated (engineer has to click), they're audience-specific (so you'd pick which one to copy), and they're prose without the structured fields the magic-moment screen needs. The right consolidation is the OTHER direction — generate ONE structured payload that the status-update buttons consume.
  • Switching the assessment to Haiku for speed. Faster but solves only the latency symptom, not the redundancy. Doesn't help the conclude modal's status-update buttons.

Consequences:

  • Magic-moment screen populates in ~5s instead of 25s+ (work happens in the foreground escalate path, not in a background task that races with the senior's pickup).
  • Token spend per escalation drops by ~60% — one Sonnet call replaces two; the third (audience variants) becomes Haiku.
  • Engineer's "Ticket Notes" button is instant — no model round-trip.
  • Schema enforcement matters. The current _generate_ai_assessment returns freeform prose that the frontend stuffs into assessment_text because the structured fields aren't reliably parseable. The new call must use Anthropic's structured output / tool-use to enforce the schema.
  • Migration concern: ai_session.escalation_package JSON column has live data on existing sessions. Keep it READABLE for backward compatibility; just stop writing the enhanced payload from enrich_escalation_async. If downstream queue summaries depend on it, dual-write the basic snapshot.
  • Test fixtures (test_handoff_manager.py, test_session_handoffs_api.py) currently stub _generate_ai_assessment via AsyncMock. Updating the stubs is part of the rename.
  • The frontend SSE assessment-ready subscription (added in 0f00ee5) stays as-is — it just listens for the new event payload.

2026-04-28 — Tag the task-lane state with an owner chatId

Context: A recurring bug — every time the user returned to test escalation work, creating a new session would flash the previous session's task-lane data (questions, actions, "Tasks" pill counts) before the new session's AI response landed. The first attempt to fix it (8914391) added initializer-time guards (incomingPrefill || isPickup) that skipped the sessionStorage restore on mount. That covered exactly two entry paths and missed every other case: in-place URL navigation, mid-flight pickup, HMR re-runs, and the gap between setActiveChatId(B) and the AI response that finally populates B's questions/actions. The persistence effect made it worse by writing {chatId: activeChatId, questions: activeQuestions} — at any moment where activeChatId had flipped before the questions were updated, sessionStorage was stamped with {chatId: B, questions: [A's data]} and a subsequent restore would happily render A's data for B.

The root cause was that activeQuestions / activeActions / showTaskLane were three independent state slices implicitly assumed to be in sync with activeChatId. The synchronization was by convention, not by structure. Every code path that mutated them had to remember to call resetSessionDerivedState first; missing one created stale UI.

Decision: Add a taskLaneOwnerChatId state that records which chatId the in-memory questions/actions belong to, set at every site that populates them (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix), cleared in resetSessionDerivedState. The persistence effect writes ownerChatId as the chatId tag. Render is gated on taskLaneOwnerChatId === activeChatId and ANDed into all three render conditions (toolbar Tasks button, narrow-viewport floating drawer, main side panel). The mount-time skipTaskLaneRestore guard stays as belt-and-braces for the prefill/pickup entry-flash window, which the owner-gate alone doesn't cover.

Rejected:

  • More entry-path guards. That's whack-a-mole — the next path nobody anticipated will reproduce the bug. The owner-gate makes the bug structurally impossible regardless of which path triggers it.
  • Combining the four state slices into a single tagged object. Cleaner long-term but a bigger refactor with more touch points. The owner-tracking approach gets the structural guarantee with a minimal diff and keeps the existing setState patterns.
  • Inlining the comparison at every render site. Works but proliferates the comparison; one named derived value (taskLaneIsForActiveChat) reads better and groups the gate with the persistence-effect / state declarations as a named concept.

Consequences:

  • Stale task-lane data is structurally unable to display. The lane is hidden during any window where ownerChatId !== activeChatId, no matter what mutation path got you there.
  • Adding new sites that populate activeQuestions / activeActions requires also setting taskLaneOwnerChatId. The pattern is documented in the commit message and visible in every existing populate site as a paired call.
  • The mount-time skipTaskLaneRestore guard is now redundant in steady-state but kept for the few-hundred-ms flash window between component mount and the first sendPrefill / selectChat effect. Deleting it would re-introduce a (smaller) flash without strong reason.
  • Future task-lane state slices (e.g. facts, activeFix) follow the same pattern: gate their visibility on the owner check via the existing render conditions. Tagging more slices with their own *OwnerChatId is a future refactor if the slices diverge.

2026-04-24 — Adopt dual-agent handoff system (.ai/ + CLAUDE.md + AGENTS.md)

Context: Claude Code hits session and weekly usage limits. Work stalls when the primary agent is locked out. Needed a structured way for OpenAI Codex to resume where Claude left off without losing architectural truth or drifting across sessions.

Decision: Split the old CLAUDE.md into .ai/PROJECT_CONTEXT.md (stable repo truth), agent-specific root files (CLAUDE.md, AGENTS.md) with a shared protocol block, and a small handoff toolkit (CURRENT_TASK.md, HANDOFF.md, TODO.md, DECISIONS.md, SESSION_LOG.md, README.md). Previous CLAUDE.md snapshotted in commit e110fed before the migration.

Rejected:

  • Single symlinked CLAUDE.md/AGENTS.md — diverges silently, hides agent-specific tooling differences.
  • Putting GitNexus/gstack content in AGENTS.md — Codex doesn't have those tools; would mislead the resume agent.
  • Keeping the old CLAUDE.md as-is and adding AGENTS.md alongside it — duplicated truth, drift guaranteed.

Consequences:

  • First read for either agent: .ai/PROJECT_CONTEXT.md + .ai/CURRENT_TASK.md + .ai/HANDOFF.md.
  • Architectural changes in the repo require updating PROJECT_CONTEXT.md, not the root agent files.
  • Git trailers differ per agent (Claude Opus 4.7 vs Codex) — preserved in each root file.
  • Legacy SESSION-HANDOFF.md deleted in the same commit; superseded by .ai/HANDOFF.md.