DECISIONS.md
Append-only architectural decision log. Newest entries at the top. Entry format:
## YYYY-MM-DD — <short title>
**Context:** why this came up
**Decision:** what we chose
**Rejected:** what we didn't choose and why
**Consequences:** what this means going forward
2026-06-09 — L1 ai_build context lives in columns, not a hidden meta walked_path entry
Context: PR #193 review found that the intake category was smuggled into the
ai_build session's walked_path as a fake {"node_type":"meta","category":...}
entry that every consumer had to remember to skip. Most didn't: it made an
otherwise-empty walk truthy (junk pending proposals reached the review queue),
pushed the depth cap off by one (counted as a real step), and rendered as a blank
row in the escalations UI. Compounding it, AI-generated nodes carried no id, but
the advance protocol keys on node_id — so the walk could never advance past the
first question (the headline feature was non-functional end-to-end).
Decision: Add real category, problem_text, and pending_node columns to
l1_walk_sessions (migration 61dda4f615c6) and delete the meta-entry convention
entirely. Intake stores category/problem_text on the session; /next-node
reads them off the row (no ticket re-fetch, no walked_path scan). The server assigns
every node a uuid4().hex[:8] id (ai_tree_builder._assign_id) — never the model.
pending_node persists the served-but-unanswered node so a refresh / StrictMode
double-mount replays it instead of firing a fresh paid LLM call.
Rejected: Symptom-level strip-meta fixes (filter the meta entry at each consumer). Smaller diff, but leaves the landmine convention in place for the next consumer to trip over — contrary to the project principle (correct architecture over minimal diff). Asking the LLM to invent node ids: not stable, not trustworthy.
Consequences: walked_path now holds only real steps. Adding a new consumer no
longer requires knowing about a hidden entry. WalkSessionResponse exposes
category/problem_text (escalations UI shows the real problem). The meta
node_type and _strip_meta are gone.
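A minimal sketch of the two server-side behaviors in this entry: ids assigned by the server, and replay of a served-but-unanswered node. Only the column names and the uuid4().hex[:8] id shape come from the entry. The WalkSession dataclass, the generate callable (standing in for the paid LLM call), and the answer helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional
from uuid import uuid4


@dataclass
class WalkSession:
    """Hypothetical stand-in for an l1_walk_sessions row."""
    category: str
    problem_text: str
    walked_path: list = field(default_factory=list)  # real steps only
    pending_node: Optional[dict] = None  # served but not yet answered


def assign_id(node: dict) -> dict:
    """Server-side id assignment; any id the model supplied is overwritten."""
    return {**node, "id": uuid4().hex[:8]}


def next_node(session: WalkSession,
              generate: Callable[[str, str, list], dict]) -> dict:
    """Replay the pending node if one exists, else generate and persist one."""
    if session.pending_node is not None:
        return session.pending_node  # refresh / double-mount: no new LLM call
    node = assign_id(
        generate(session.category, session.problem_text, session.walked_path)
    )
    session.pending_node = node
    return node


def answer(session: WalkSession, node_id: str, choice: str) -> None:
    """Advance the walk; the protocol keys on the server-assigned node id."""
    pending = session.pending_node
    if pending is None or pending["id"] != node_id:
        raise ValueError("answer does not match the pending node")
    session.walked_path.append({**pending, "answer": choice})
    session.pending_node = None
```

Calling next_node twice without an answer in between returns the same node and invokes generate once, which is the refresh case described above.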
2026-06-09 — Keep the L1 ad-hoc walk fallback (don't drop it)
Context: The Phase 2A intake rewrite dropped the else: start_adhoc_session(...)
branch, leaving start_adhoc_session with zero callers and the out_of_scope prompt
offering only Escalate/Cancel — while L1CategoriesPage copy still promised "Disabled
categories fall back to an ad-hoc walk or escalation." A capability silently regressed.
Decision: Restore it (review Finding 5 option a). Intake honors adhoc=True
(a new IntakeRequest field → "adhoc" outcome) and the out_of_scope prompt gained a
"Walk it ad-hoc" button. This preserves the pre-existing free-form-walk capability and
keeps the settings copy honest.
Rejected: Dropping ad-hoc and fixing the copy. It removes a capability techs had, for a problem class (out-of-scope) where a free-form walk is the natural fallback before escalation. Cheaper, but a product regression dressed as cleanup.
Consequences: start_adhoc_session has a caller again. The walker renders adhoc
sessions via its existing non-ai_build branch (free-form notes, no AI tree).
2026-05-29 — Single source of truth for plan-tier taxonomy (derive admin UI + validation from plan_limits)
Context: A prod report ("AI sessions aren't working") traced to the owner account having no paid plan (AI is plan-gated), compounded by a real bug: the admin "Change Plan" dropdown (AccountDetailPage.tsx:443-445) still offered the dead team slug (renamed to enterprise in migration 4ce3e594cb87, 2026-05-07) and omitted starter/enterprise. Selecting "Team" 400s against the hardcoded allow-list in admin.py:994. The dropdown was missed during the 2026-05-07 taxonomy reconciliation because the allowed-plan list is hand-duplicated across ≥6 backend + frontend sites. Second taxonomy-drift incident.
Decision: Option B — make plan_limits the single source of truth: admin dropdown + pricing/checkout derive plan options from a plans endpoint (filter is_public, order by sort_order, label from display_name), and backend validation checks against actual plan_limits rows rather than a hardcoded tuple. Implementation deferred (active work is on another branch); fully specced in TODO.md. A trivial dropdown-options fix may land first to unblock the admin tool.
Rejected: Option A (patch only the AccountDetailPage dropdown). Fixes the symptom but leaves the duplication that has now caused two drift incidents — and there is no outage forcing a minimal diff (bug is admin-only and was already worked around via direct Pro assignment). Conflicts with the repo principle "prefer correct architecture over minimal diff."
Consequences: New plan tiers become a data change (a plan_limits row) instead of a multi-file code edit; UI and validation can no longer drift from the catalog. Requires a public-plans read endpoint (or extending billing state) consumed by the admin UI + pricing page. The 'team' visibility string (Tree.visibility / StepLibrary.visibility) is a separate domain and is explicitly out of scope.
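A rough sketch of what "derive from plan_limits" could look like, assuming the rows are already loaded. The field names is_public, sort_order, and display_name come from the Decision. The PlanLimit dataclass and both function names are hypothetical, and the real implementation is deferred per TODO.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanLimit:
    """Hypothetical shape of a plan_limits row (only the fields used here)."""
    plan: str
    display_name: str
    is_public: bool
    sort_order: int


def public_plan_options(rows: list[PlanLimit]) -> list[dict]:
    """Options for dropdowns and pricing: public plans, ordered by sort_order."""
    visible = sorted((r for r in rows if r.is_public), key=lambda r: r.sort_order)
    return [{"value": r.plan, "label": r.display_name} for r in visible]


def validate_plan(slug: str, rows: list[PlanLimit]) -> str:
    """Accept any slug that exists as a plan_limits row; no hardcoded tuple."""
    if slug not in {r.plan for r in rows}:
        raise ValueError(f"unknown plan: {slug!r}")
    return slug
```

With this shape, a renamed or removed slug such as team fails validation as soon as its row is gone, and a new tier appears in the dropdown as soon as its row exists.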
2026-05-28 — Scope Anthropic structured outputs to flat-array JSON only
Context: Optimizing the existing Claude API usage (no model change). The Anthropic path in generate_json (ai_provider.py) had no equivalent to the Gemini path's response_mime_type="application/json" — it prompted for JSON and relied on downstream defenses: _strip_markdown_fences (ai_fix), parse_llm_json (knowledge_flywheel), and _try_repair_json (kb_conversion, which balances unclosed braces on truncated output). Anthropic structured outputs (output_config.format with a JSON schema) guarantee valid, parseable JSON and would eliminate those band-aids. The question was which of the four generate_json call sites can adopt it.
Structured outputs has hard schema limits: no recursive schemas, and every object must set additionalProperties: false (so the schema must enumerate exactly the fields the model emits — a superset is impossible, an omission makes a field unproducible). Tracing the call sites against those limits:
- kb_conversion → output is {title, description, nodes: [...]} / {...steps[], intake_form[]}: flat arrays, references by next_node_id / id, no nesting. Expressible.
- ai_fix → returns a fixed node that is itself a subtree; _find_node_by_id recurses node["children"] and the prompt requires decision nodes to have ≥2 children. Recursive, arbitrary depth.
- knowledge_flywheel flow-gen → emits tree_structure, a decision-tree root with nested children / options, persisted as an opaque blob.
- knowledge_flywheel enhancement → flat new_nodes[] + modified_options[]; expressible but low-frequency and only fence-stripped.
Decision: Apply structured outputs to flat-array outputs only — i.e. kb_conversion. Wired via an optional schema= param on AIProvider.generate_json (None = legacy prompt-only behavior; Anthropic maps it to output_config.format, Gemini ignores it), with the two KB schemas + _schema_for_target_type() in kb_conversion_service.py, gated behind settings.AI_KB_CONVERT_STRUCTURED_OUTPUT (default False) pending a live constrained-decoding smoke-test in staging. The robustness fixes that motivated the work — _extract_text_from_response (skip non-text blocks, log max_tokens/refusal, raise on no-text) — live in the shared provider, so all four callers already benefit regardless of schema adoption.
Rejected:
- Forcing schemas on ai_fix / flow-gen. Their outputs are recursive/nested decision trees; a bounded-depth schema would reject valid deeper trees and break generation. Wrong architecture for marginal/zero benefit (flow-gen's tree is stored as a blob, never schema-validated downstream).
- Wiring the flywheel enhancement site. Flat and technically expressible, but low call frequency and only fence-stripping today: marginal benefit against the risk of a blind (un-live-tested) additionalProperties: false schema.
- Deleting the fence-strip / repair helpers now. _strip_markdown_fences / parse_llm_json must stay; they protect the recursive paths that can't use schemas. Only _try_repair_json (kb-only) becomes removable, and only after the flag is validated in staging.
Consequences:
- Structured outputs is the tool for flat JSON; recursive decision-tree outputs are excluded by design. New flat-JSON generate_json callers can opt in via schema=; recursive ones should not.
- AI_KB_CONVERT_STRUCTURED_OUTPUT must be smoke-tested against the live model (both target types) before production enablement. Open risk: whether Anthropic accepts optional (non-required) fields; if not, the schemas need every field in required with nullable types. The flag makes this fully reversible.
- Deferred cleanup: once the flag is validated, remove only _try_repair_json from the kb_conversion Anthropic path; leave the fence-strippers.
- Work lives on branch feat/ai-structured-outputs (commits 84a02a5, 1388357), based on design/l1-workspace.
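To make the two schema limits concrete, here is a small sketch of a pre-flight check a caller could run on a schema before passing it as schema=. The schema_violations helper and both example schemas are hypothetical and are not the schemas in kb_conversion_service.py. The node field text is invented for illustration, and any $ref is flagged conservatively because a self-reference is the only way JSON Schema expresses recursion.

```python
def schema_violations(schema: dict, path: str = "$") -> list[str]:
    """List places where a JSON schema breaks the two limits named above."""
    problems = []
    if "$ref" in schema:
        # Conservative: every $ref is flagged, not only self-references.
        problems.append(f"{path}: $ref (possible recursion)")
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in schema.get("properties", {}).items():
            problems.extend(schema_violations(sub, f"{path}.{name}"))
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(schema_violations(schema["items"], f"{path}[]"))
    return problems


# Illustrative flat schema: nodes point at each other by id, never by nesting.
# Every field is required, which sidesteps the open optional-field question.
KB_FLAT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["title", "description", "nodes"],
    "properties": {
        "title": {"type": "string"},
        "description": {"type": "string"},
        "nodes": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "required": ["id", "text", "next_node_id"],
                "properties": {
                    "id": {"type": "string"},
                    "text": {"type": "string"},
                    "next_node_id": {"type": "string"},
                },
            },
        },
    },
}

# A tree whose nodes nest children needs a self-reference, so it is rejected.
RECURSIVE_TREE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"children": {"type": "array", "items": {"$ref": "#"}}},
}
```

The flat schema passes with no violations, while the nested-children schema is flagged at its items entry, which mirrors why kb_conversion qualifies and ai_fix / flow-gen do not.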
2026-05-13 — Session expiration policy: 3d idle / 14d absolute defaults + per-account override
Context: User report: "I login to ResolutionFlow and never have to log back in." Investigation found refresh tokens at REFRESH_TOKEN_EXPIRE_DAYS=7 with JTI rotation (security.py:36) — every /auth/refresh minted a fresh 7-day window. Net effect: a sliding 7-day session with no absolute cap. Visit once a week, logged in forever. Acceptable for pilot but not for MSP buyers whose SOC2 / cyber-insurance auditors require enforced session timeouts. Required for the same Phase O launch readiness as the other gates already in flight.
Decision: Two-window model snapshotted into the refresh JWT at login. Defaults to Strict (3-day idle, 14-day absolute), bounded by env-var system min/max. Per-account override via two new accounts columns (NULL = use system default). Owner-only GET/PATCH /accounts/me/security endpoint with effective-value validation (partial-override case caught at the app layer because the DB CHECK can't see Settings). Sibling POST /accounts/me/security/revoke-sessions for all|others-scoped bulk revocation. Frontend: Strict/Standard/Custom presets, active-users list (name + email + last-login-ago), differentiated SessionExpiryToast (idle = warning amber with "Stay signed in" → /auth/refresh; absolute = info cyan, informational only), cyan info-tone banner on /login?reason=session_expired, auto-redirect after scope=all bulk-revoke. Error-detail taxonomy on the wire: session_expired_idle, session_expired_absolute, invalid_refresh_token. Grandfather path: legacy refresh tokens (no auth_time claim) get one free rotation under the new policy. Atomic-revoke-then-check on /auth/refresh so absolute-expired tokens can't be replayed.
8 commits on feat/session-expiration-policy branch (92fa3bc → c7cd711), ~1300 LoC backend + frontend including 28 backend tests. Plan + design review at docs/plans/2026-05-13-session-expiration-policy.md (initial design score 4/10 → final 9/10 via /plan-design-review; 7 design decisions locked).
Rejected:
- Idle-only or absolute-only enforcement. Idle without absolute is the current broken state (sliding forever). Absolute without idle is too strict — kicks users out daily.
- Hard cutover on deploy (SECRET_KEY rotation). Forces every pilot to log in again immediately; high support cost. Grandfather path is friendlier and adds ~50 lines of code.
- Distinguish session_revoked_by_admin from invalid_refresh_token on the wire for users whose sessions were killed via bulk-revoke. Requires tracking revocation reason per refresh_tokens row. Not worth the complexity for v1: affected users see they're logged out, same as any other revoke.
- Per-user device list with per-device revoke. Refresh tokens don't carry device/user-agent metadata today. Account-wide bulk revoke covers the breach-response use case; per-device is a follow-up if pilots ask.
- "Loose" preset (90d). Strict default suggests we shouldn't ship a one-click loose option. Owners who want a loose policy can use Custom and own the choice explicitly.
- Always-required idle_minutes + absolute_minutes (XOR-NULL invariant). Forces owners who only want to override idle to also re-declare the absolute window, leaking the system default into account data. Partial overrides allowed; validated at the app layer against current defaults.
- Reveal-on-Custom UI for the minute inputs. Hidden-by-default-reveal-on-radio shifts page layout when Custom is selected. Always-visible-but-disabled is more stable and previews the Custom interaction.
- Modal-stays-open-success-state for scope=all bulk-revoke. User preferred auto-redirect-with-toast (more standard SaaS pattern); the toast acts as the success acknowledgment before /login loads.
Consequences:
- "Logged in forever" is fixed. Every user sees a hard 14-day re-auth at minimum (3-day idle in practice for typical usage).
- Account owners get a complete self-service surface for policy + bulk session control. New /account/security route, owner-gated.
- Audit-log entries on both mutations: account.session_policy_update and account.sessions_revoked_bulk. SOC2-ready.
- Frontend idle_expires_at + absolute_expires_at flow through the entire auth surface (Token, OAuthCallbackResponse, authStore, persistence). useAuthSessionExpiry hook is the single source for "is the session about to end."
- Future improvements (filed as follow-ups in plan §9): per-user device list (requires refresh_tokens.last_used_at column), super-admin global ceiling UI, per-user policy. None block current shipping.
- Cyan info-tone banner on /login is the first of its kind in the app; sets precedent for future neutral system messages.
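A minimal sketch of the two-window check and the partial-override case, under stated assumptions. The 3-day / 14-day defaults, the auth_time claim, and the two error-detail strings come from this entry. The function names, the last_refresh parameter, the order of the two checks, and the idle <= absolute rule are assumptions, and the env-var min/max bounds are omitted.

```python
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_IDLE = timedelta(days=3)       # Strict preset
DEFAULT_ABSOLUTE = timedelta(days=14)


def effective_policy(idle_minutes: Optional[int],
                     absolute_minutes: Optional[int]) -> tuple:
    """Resolve per-account overrides (None = system default) and validate them."""
    idle = DEFAULT_IDLE if idle_minutes is None else timedelta(minutes=idle_minutes)
    absolute = (DEFAULT_ABSOLUTE if absolute_minutes is None
                else timedelta(minutes=absolute_minutes))
    # A partial override can only be judged against the current defaults,
    # which is why this lives in the app layer rather than a DB CHECK.
    if idle > absolute:
        raise ValueError("idle window cannot exceed absolute window")
    return idle, absolute


def check_refresh(auth_time: datetime, last_refresh: datetime, now: datetime,
                  idle: timedelta = DEFAULT_IDLE,
                  absolute: timedelta = DEFAULT_ABSOLUTE) -> Optional[str]:
    """Return None if the session may continue, else the wire error detail."""
    # Absolute window: measured from the original login, never slides.
    if now - auth_time >= absolute:
        return "session_expired_absolute"
    # Idle window: measured from the last refresh, slides on each rotation.
    if now - last_refresh >= idle:
        return "session_expired_idle"
    return None
```

Overriding only the absolute window to 60 minutes while leaving idle at the 3-day default is rejected by effective_policy, which is the kind of combination a column-level constraint cannot detect.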
2026-05-07 — Per-email allowlist (INTERNAL_TESTER_EMAILS) for self-serve soft cutover
Context: Phase O Task 46 ("internal validation pass") needed a way to exercise the full self-serve flow against the prod backend before flipping SELF_SERVE_ENABLED=true for everyone. The plan doc described the mechanism but the backend support was never built — flagged in SESSION_LOG.md as a code blocker. Stripe live-mode setup is also gated on having a working internal-tester path in prod test mode.
Decision: Comma-separated allowlist INTERNAL_TESTER_EMAILS parsed by a Pydantic field_validator into a normalized lowercase list. Two helpers on Settings: is_internal_tester(email) (case-insensitive membership check) and is_self_serve_active_for(email) (returns SELF_SERVE_ENABLED OR is_internal_tester(email)). Both endpoints that gate on the global flag now call the helper:
- /config/public accepts optional auth via new get_current_user_optional dep; returns self_serve_enabled=true for allowlisted authenticated callers; anonymous calls always see the global flag.
- /auth/register allows allowlisted emails to register without an invite code.
Rejected:
- Custom header X-Internal-Tester-Email for anonymous flows. Spoofable. The auth/register-payload checks are sufficient because the user has to OWN the email to register or log in.
- Separate allowlists per surface (INTERNAL_PRICING_TESTERS, INTERNAL_OAUTH_TESTERS). Premature splitting. The Phase O use case is "this small set of people can see the new flow"; one variable handles it. If finer granularity emerges, split then.
- Database table for the allowlist. Env var matches the spec from the plan doc and fits the soft-cutover lifecycle: list is small, changes infrequently, lives alongside other deployment-time config.
Consequences:
- Stripe internal validation can run end-to-end in prod test mode without flipping the global flag.
- Anonymous callers always see the global flag, so the allowlist never leaks via unauthenticated request content. Three regression tests in test_config_public.py enforce this.
- INTERNAL_TESTER_EMAILS plumbed through docker-compose.dev.yml and documented in backend/.env.example. Railway prod env will need the same var set during Phase O cutover.
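The helper logic can be sketched in plain Python. The entry describes a Pydantic field_validator on Settings; this sketch replaces it with a standalone parse function and a dataclass so it runs without dependencies. The method names match the entry, while parse_tester_emails and the handling of None for anonymous callers are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_tester_emails(raw: str) -> list:
    """Split a comma-separated env value into a normalized lowercase list."""
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


@dataclass
class Settings:
    """Plain-Python stand-in for the real Pydantic Settings class."""
    SELF_SERVE_ENABLED: bool = False
    INTERNAL_TESTER_EMAILS: list = field(default_factory=list)

    def is_internal_tester(self, email: Optional[str]) -> bool:
        """Case-insensitive membership check; None (anonymous) never matches."""
        return bool(email) and email.strip().lower() in self.INTERNAL_TESTER_EMAILS

    def is_self_serve_active_for(self, email: Optional[str]) -> bool:
        """Global flag OR allowlist membership."""
        return self.SELF_SERVE_ENABLED or self.is_internal_tester(email)
```

Passing None for an anonymous caller means the result always equals the global flag, which is the non-leak property the regression tests guard.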
2026-05-07 — Reconcile plan tier taxonomy (rename team → enterprise, add starter)
Context: PR #162 left a real architectural gap. Marketing surface (PricingPage, Stripe products) was wired for Starter / Pro / Enterprise while backend was on free / pro / team. plan_billing.plan FK referenced plan_limits.plan so the BillingPlan schema's Literal["pro", "starter", "team", "enterprise"] could accept values that violated the FK. plan_billing was unseeded in dev, so no checkout could complete. Subscription.plan.in_(["pro", "team"]) paid-plan checks wouldn't recognize enterprise. Self-serve cutover was blocked at the data layer.
Decision: Reconcile to a single taxonomy — backend slugs become free / pro / starter / enterprise, matching the marketing surface and Stripe products. Migration 4ce3e594cb87:
- Defensive UPDATE subscriptions SET plan='enterprise' WHERE plan='team' (dev had zero such rows; safety for any prod stragglers).
- Rename the plan_limits.plan='team' row to 'enterprise'.
- Insert a starter row with caps interpolated between free and pro: max_trees=10, max_sessions=75, max_users=1, max_ai_builds_per_month=15, no KB Accelerator, no custom branding, no priority support.
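The three data steps can be replayed against an in-memory SQLite database as a sanity check. This is an illustration only: the real change is migration 4ce3e594cb87, and the table definitions below are a simplified assumption (no foreign keys, only the numeric caps named above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plan_limits (
        plan TEXT PRIMARY KEY,
        max_trees INTEGER, max_sessions INTEGER,
        max_users INTEGER, max_ai_builds_per_month INTEGER
    );
    CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, plan TEXT);
    INSERT INTO plan_limits (plan) VALUES ('free'), ('pro'), ('team');
    INSERT INTO subscriptions (plan) VALUES ('pro'), ('team');
""")


def upgrade(conn: sqlite3.Connection) -> None:
    """Apply the three data steps in the order listed above."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE subscriptions SET plan = 'enterprise' WHERE plan = 'team'"
        )
        conn.execute(
            "UPDATE plan_limits SET plan = 'enterprise' WHERE plan = 'team'"
        )
        conn.execute(
            "INSERT INTO plan_limits "
            "(plan, max_trees, max_sessions, max_users, max_ai_builds_per_month) "
            "VALUES ('starter', 10, 75, 1, 15)"
        )


upgrade(conn)
plans = sorted(row[0] for row in conn.execute("SELECT plan FROM plan_limits"))
```

After upgrade, plans holds the four reconciled slugs and no subscription row still references team.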
Code rename across schemas, Subscription paid-plan/has_pro_entitlement checks, admin endpoints, frontend useSubscription.isPaidPlan. Resource visibility (Tree.visibility='team', StepLibrary.visibility='team') is a separate domain and intentionally untouched — that string means "shared with my account" and has nothing to do with the subscription tier.
New backend/scripts/sync_stripe_plan_ids.py — idempotent upsert of plan_billing rows from Stripe products by exact name match (ResolutionFlow Starter / Pro / Enterprise). Picks the active monthly recurring price for tiers that have one. Annual fields stay NULL by design — annual pricing is intentionally out of scope for the soft cutover ("want to be able to exit if necessary without breaching any terms").
Rejected:
- Map marketing names to existing slugs (Option A from the discussion). Smallest diff but means PricingPage cards have to translate enterprise → team at render time, and "Starter" can't exist as a real backend tier; it'd have to be hidden or dropped. Kicks the can.
- Add starter only, keep team slug as cosmetic enterprise (Option C). Mixed taxonomy across layers: slug-vs-display-name divergence guarantees confusion in 6 months. Compromise that's worse than either pure choice.
- Annual pricing in this iteration. User's explicit constraint: skip annual to keep exit-flexibility. Schema columns (annual_price_cents, stripe_annual_price_id) preserved as nullable for future re-enable.
- Auto-archive the existing Enterprise $500/mo test-mode price. Done manually via Stripe MCP after un-setting the product's default_price first. Spec says Enterprise is sales-led with no catalog price.
Consequences:
- plan_billing table is now seedable and seeded. Test-mode plan_billing populated for all 3 tiers via sync_stripe_plan_ids.py. Live mode runs the same script after manual Dashboard setup of products + prices.
- New consumers of Subscription.plan literal must use ("free", "pro", "starter", "enterprise"). Three call sites already updated. Backend-wide grep is the safety net for new ones.
- Subscription.is_paid and has_pro_entitlement now include starter: Starter is a paid tier with a real $19.99/mo price.
- 86/86 passing across the subscription/billing/plan/invite/admin sweep after the rename.
- Test fixtures: conftest.py plan_limits seed updated to the new taxonomy. _seed_plan_limits helper in test_plans_public.py is now a true upsert so tests can override max_users even when conftest seeded the canonical value.
2026-05-07 — Standardize backend Python on 3.12
Context: Runtime facts had drifted from docs. The backend Dockerfiles and running dev container were already on Python 3.12, GitHub CI had just been updated to 3.12, but project docs still said Python 3.11 and Gitea CI relied on the runner's ambient Python.
Decision: Treat Python 3.12 as the backend standard. Pin local pyenv via .python-version to 3.12.13, matching the current python:3.12-slim container patch level. Add explicit Python 3.12 setup to Gitea CI and keep GitHub CI on Python 3.12.
Rejected: Moving Docker/runtime back to Python 3.11. The application was already building and running on 3.12, so reverting the runtime would add churn without a product or dependency reason.
Consequences: Native backend work should use backend/venv created from Python 3.12.13. Future docs/CI/runtime changes should preserve Python 3.12 unless a deliberate upgrade decision is recorded.
2026-04-30 — Add applied_pending non-terminal status to suggested fixes
Context: The verifying banner forces a synchronous verdict — worked / didn't / partial — but a lot of real MSP fixes are async. Engineer ran the script but is waiting on the client to power-cycle, AD replication, an O365 license sync. With only the existing outcomes, the engineer either leaves the banner stale (eroding the verifying signal) or guesses wrong (corrupting outcome data). User flagged the gap directly. Today's NudgeBanner "Still checking" button just silences the nudge — it doesn't tell the system anything.
Decision: Add a fourth, non-terminal outcome applied_pending, parallel to applied_partial. Required pending_reason Text column stores the "what are you waiting on?" reason. Outcome endpoint allows pending → {success, failed, partial, dismissed} transitions; pending stamps applied_at but NOT verified_at (it's parked, not verified). Resolution-note generator frames the fix as provisional (no closure language); escalation-package generator surfaces pending verification as the leading hypothesis with a reference to what's being waited on. Frontend exposes the state via a new PendingBanner component (info-tone, mirrors PartialBanner) plus a "Waiting to verify…" overflow option in the verifying banner. NudgeBanner "Still checking" now records pending with a reason instead of just silencing.
Rejected:
- Reuse applied_partial. Semantically wrong: partial means "I did some of it." Pending means "I did all of it, just can't tell if it worked." Generators write different prose for each, and conflating them would lose the distinction in the customer-facing resolution note and the next-engineer escalation handoff.
- Add a pending_reason column without a new status. The status field is what the dashboard, banner, and generators all branch on. Hiding pending state in a separate column would proliferate IF pending_reason IS NOT NULL checks across every consumer.
- Cross-session "Follow-ups" dashboard rollup in v1. Per-session PendingBanner is the chat-anchored reminder. Add the dashboard surface only if engineers report losing track across multiple pending sessions in pilot use.
- Optional follow-up timer ("remind me in 30m"). Out of scope; nice-to-have but not the wedge.
Consequences:
- Engineers can park a fix honestly without losing the verifying signal. The state survives across sessions because it's persisted server-side.
- pending_reason is preserved as audit trail when the engineer advances pending → success/failed/dismissed; it is not auto-cleared. Intentional: it tells the next reader "we waited for X, then it worked."
- New consumers of FixStatus must handle the applied_pending case. Currently three: the banner derivation in AssistantChatPage, the resolution-note generator, and the escalation-package generator. All three updated in this change.
- Migration c0f3a4b7e91d is reversible: downgrade rewrites pending rows back to applied_partial and copies pending_reason into partial_notes if the partial slot was empty, then drops the column.
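A minimal sketch of the parked-state rules, using plain dicts in place of the real model. The allowed transitions, the required reason, and the applied_at / verified_at behavior on parking come from the Decision. The function names and the short outcome strings are shorthand assumptions, and the entry does not say how verified_at is stamped on the later outcome, so the sketch leaves it untouched.

```python
from datetime import datetime, timezone
from typing import Optional

# Transitions out of the parked state, as listed in the Decision.
ALLOWED_FROM_PENDING = {"success", "failed", "partial", "dismissed"}


def park_fix(fix: dict, reason: str, now: Optional[datetime] = None) -> dict:
    """Record applied_pending: stamps applied_at, leaves verified_at unset."""
    if not reason.strip():
        raise ValueError("pending_reason is required")
    now = now or datetime.now(timezone.utc)
    return {
        **fix,
        "status": "applied_pending",
        "pending_reason": reason.strip(),
        "applied_at": now,
        "verified_at": None,  # parked, not verified
    }


def resolve_pending(fix: dict, outcome: str) -> dict:
    """Move a parked fix to a later outcome; pending_reason is kept as audit trail."""
    if fix.get("status") != "applied_pending":
        raise ValueError("fix is not pending")
    if outcome not in ALLOWED_FROM_PENDING:
        raise ValueError(f"invalid transition: pending -> {outcome}")
    return {**fix, "status": outcome}
```

Resolving a parked fix changes only the status, so the stored reason still reads "we waited for X" after the verdict lands.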
2026-04-30 — Allow escalated_to_id to send chat messages in claimed sessions
Context: During browser QA, clicking "Get AI analysis" on the magic-moment screen returned POST /ai-sessions/{id}/chat → 400. The senior tech who claimed the session is stored as escalated_to_id on AISession, not user_id (which remains the junior who created the session). unified_chat_service.send_chat_message queried WHERE ai_sessions.user_id = :user_id, so the senior's ID never matched and the endpoint rejected the request.
Decision: Extend the ownership check in send_chat_message to OR ai_sessions.escalated_to_id = :user_id using SQLAlchemy or_(). This is the minimal, correct fix: the session model already has a semantically valid "also owns" field for the claiming senior; extending the WHERE clause makes that ownership real.
Rejected:
- Transfer user_id to the senior on claim. Breaks the audit trail: user_id is the originating engineer throughout the session lifecycle. Any query scoped to "sessions this engineer worked on" would silently lose the junior's history.
- A separate can_send_message service method. Adds indirection with no benefit for v1. One or_() line in the existing query is sufficient.
- Checking a role/permission flag instead. Role gating (engineer/admin) already happens at the claim endpoint. The chat-send check is about session ownership, not role. Mixing the two concerns would be confusing.
Consequences:
- Seniors can send AI briefings and continue chat work in sessions they have claimed. Core escalation pickup flow unblocked.
- Any future caller of send_chat_message should be aware that "user_id or escalated_to_id" is the ownership rule. The service-level check is the single enforcement point.
- user_id remains the originating engineer for all audit, history, and analytics queries. No data migration needed.
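The ownership rule can be demonstrated with raw SQL against an in-memory SQLite table. The real fix uses SQLAlchemy or_() inside send_chat_message; this sketch only shows the equivalent WHERE clause. The table definition, the sample rows, and the find_session_for_sender name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ai_sessions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,      -- originating engineer, never reassigned
        escalated_to_id INTEGER        -- senior who claimed the session, if any
    );
    INSERT INTO ai_sessions (id, user_id, escalated_to_id) VALUES
        (1, 10, NULL),   -- unclaimed, created by junior 10
        (2, 10, 20);     -- created by junior 10, claimed by senior 20
""")


def find_session_for_sender(conn: sqlite3.Connection,
                            session_id: int, user_id: int):
    """Return the session row if the caller created it or has claimed it."""
    return conn.execute(
        "SELECT id, user_id, escalated_to_id FROM ai_sessions "
        "WHERE id = ? AND (user_id = ? OR escalated_to_id = ?)",
        (session_id, user_id, user_id),
    ).fetchone()
```

Senior 20 matches session 2 but not session 1, and junior 10 still matches both, so the originating engineer keeps access after a claim.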
2026-04-29 — Consolidate the three per-escalation AI calls into one structured generation
Context: A single user-initiated escalation currently triggers three separate Sonnet calls, all summarizing the same source material (session state, steps taken, "what we know") from slightly different angles:
- _build_escalation_package_enhanced: runs in the background enrich_escalation_async task, builds a rich JSON payload that's saved to ai_session.escalation_package.
- _generate_ai_assessment: also background, returns the magic-moment screen fields (likely_cause, suggested_steps[], confidence).
- generate_status_update: engineer-triggered when they click "Ticket Notes" / "Client Update" / "Email Draft" in the conclude modal, generates audience-specific PSA prose.
The user surfaced the smell: the engineer is typically generating a status update during the escalate flow, so the AI assessment work is being done twice with overlapping context and the engineer's PSA prose is being thrown away. Live test on 2026-04-29 also showed that bumping the assessment timeout 15s → 45s did NOT fix the empty-placeholder bug — meaning the architectural smell is also a demo blocker.
Decision: ONE structured AI call per escalation that produces a single payload covering both the magic-moment screen's diagnostic fields AND the PSA-ready prose. Persist to SessionHandoff. The conclude modal's "Ticket Notes" button reads from the saved prose instead of calling the model. "Client Update" and "Email Draft" buttons trigger a cheap Haiku transformation over the saved prose (tone shift only, not a re-summarization).
Proposed payload shape (final form decided during implementation):
{
"summary_prose": "<PSA-flavored ticket-notes paragraph>",
"what_we_know": ["<one-liner>"],
"likely_cause": "<one sentence>",
"suggested_steps": ["<short step>"],
"confidence": "low | medium | high",
"audience_variants": {"client_update": null, "email_draft": null}
}
audience_variants filled lazily on first user request, cached.
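One way the lazy fill could work, sketched against the proposed payload shape. The key names come from the payload above. get_variant is a hypothetical helper, transform stands in for the cheap Haiku tone-shift call, and persisting the updated payload back to SessionHandoff is left out.

```python
from typing import Callable

AUDIENCES = ("client_update", "email_draft")


def get_variant(payload: dict, audience: str,
                transform: Callable[[str, str], str]) -> str:
    """Return the cached variant, generating it from summary_prose on first use."""
    if audience not in AUDIENCES:
        raise ValueError(f"unknown audience: {audience!r}")
    variants = payload.setdefault("audience_variants", {})
    if variants.get(audience) is None:
        # First request: one tone-shift call over the already-saved prose.
        variants[audience] = transform(payload["summary_prose"], audience)
    return variants[audience]
```

A second request for the same audience returns the cached text without calling transform again, and untouched audiences stay null until someone asks for them.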
Rejected:
- Just bumping the timeout further. Already tried 5s → 15s → 45s. The architectural redundancy is the real cost — even if Sonnet completed reliably, three calls per escalation is wasteful and creates three places where state can diverge.
- Reusing the engineer's status update content as the AI assessment. User's first instinct, but: status updates aren't always generated (engineer has to click), they're audience-specific (so you'd pick which one to copy), and they're prose without the structured fields the magic-moment screen needs. The right consolidation is the OTHER direction — generate ONE structured payload that the status-update buttons consume.
- Switching the assessment to Haiku for speed. Faster but solves only the latency symptom, not the redundancy. Doesn't help the conclude modal's status-update buttons.
Consequences:
- Magic-moment screen populates in ~5s instead of 25s+ (work happens in the foreground escalate path, not in a background task that races with the senior's pickup).
- Token spend per escalation drops by ~60% — one Sonnet call replaces two; the third (audience variants) becomes Haiku.
- Engineer's "Ticket Notes" button is instant — no model round-trip.
- Schema enforcement matters. The current _generate_ai_assessment returns freeform prose that the frontend stuffs into assessment_text because the structured fields aren't reliably parseable. The new call must use Anthropic's structured output / tool-use to enforce the schema.
- Migration concern: ai_session.escalation_package JSON column has live data on existing sessions. Keep it READABLE for backward compatibility; just stop writing the enhanced payload from enrich_escalation_async. If downstream queue summaries depend on it, dual-write the basic snapshot.
- Test fixtures (test_handoff_manager.py, test_session_handoffs_api.py) currently stub _generate_ai_assessment via AsyncMock. Updating the stubs is part of the rename.
- The frontend SSE assessment-ready subscription (added in 0f00ee5) stays as-is; it just listens for the new event payload.
2026-04-28 — Tag the task-lane state with an owner chatId
Context: A recurring bug — every time the user returned to test escalation work, creating a new session would flash the previous session's task-lane data (questions, actions, "Tasks" pill counts) before the new session's AI response landed. The first attempt to fix it (8914391) added initializer-time guards (incomingPrefill || isPickup) that skipped the sessionStorage restore on mount. That covered exactly two entry paths and missed every other case: in-place URL navigation, mid-flight pickup, HMR re-runs, and the gap between setActiveChatId(B) and the AI response that finally populates B's questions/actions. The persistence effect made it worse by writing {chatId: activeChatId, questions: activeQuestions} — at any moment where activeChatId had flipped before the questions were updated, sessionStorage was stamped with {chatId: B, questions: [A's data]} and a subsequent restore would happily render A's data for B.
The root cause was that activeQuestions / activeActions / showTaskLane were three independent state slices implicitly assumed to be in sync with activeChatId. The synchronization was by convention, not by structure. Every code path that mutated them had to remember to call resetSessionDerivedState first; missing one created stale UI.
Decision: Add a taskLaneOwnerChatId state that records which chatId the in-memory questions/actions belong to, set at every site that populates them (sendPrefill, selectChat, handleSend, handleTaskSubmit, handleResumeNew, refreshFacts, handleApplyFix), cleared in resetSessionDerivedState. The persistence effect writes ownerChatId as the chatId tag. Render is gated on taskLaneOwnerChatId === activeChatId and ANDed into all three render conditions (toolbar Tasks button, narrow-viewport floating drawer, main side panel). The mount-time skipTaskLaneRestore guard stays as belt-and-braces for the prefill/pickup entry-flash window, which the owner-gate alone doesn't cover.
Rejected:
- More entry-path guards. That's whack-a-mole — the next path nobody anticipated will reproduce the bug. The owner-gate makes the bug structurally impossible regardless of which path triggers it.
- Combining the four state slices into a single tagged object. Cleaner long-term but a bigger refactor with more touch points. The owner-tracking approach gets the structural guarantee with a minimal diff and keeps the existing setState patterns.
- Inlining the comparison at every render site. Works but proliferates the comparison; one named derived value (taskLaneIsForActiveChat) reads better and groups the gate with the persistence-effect / state declarations as a named concept.
Consequences:
- Stale task-lane data is structurally unable to display. The lane is hidden during any window where ownerChatId !== activeChatId, no matter what mutation path got you there.
- Adding new sites that populate activeQuestions / activeActions requires also setting taskLaneOwnerChatId. The pattern is documented in the commit message and visible in every existing populate site as a paired call.
- The mount-time skipTaskLaneRestore guard is now redundant in steady-state but kept for the few-hundred-ms flash window between component mount and the first sendPrefill / selectChat effect. Deleting it would re-introduce a (smaller) flash without strong reason.
- Future task-lane state slices (e.g. facts, activeFix) follow the same pattern: gate their visibility on the owner check via the existing render conditions. Tagging more slices with their own *OwnerChatId is a future refactor if the slices diverge.
2026-04-24 — Adopt dual-agent handoff system (.ai/ + CLAUDE.md + AGENTS.md)
Context: Claude Code hits session and weekly usage limits. Work stalls when the primary agent is locked out. Needed a structured way for OpenAI Codex to resume where Claude left off without losing architectural truth or drifting across sessions.
Decision: Split the old CLAUDE.md into .ai/PROJECT_CONTEXT.md (stable repo truth), agent-specific root files (CLAUDE.md, AGENTS.md) with a shared protocol block, and a small handoff toolkit (CURRENT_TASK.md, HANDOFF.md, TODO.md, DECISIONS.md, SESSION_LOG.md, README.md). Previous CLAUDE.md snapshotted in commit e110fed before the migration.
Rejected:
- Single symlinked CLAUDE.md/AGENTS.md — diverges silently, hides agent-specific tooling differences.
- Putting GitNexus/gstack content in AGENTS.md — Codex doesn't have those tools; would mislead the resume agent.
- Keeping the old CLAUDE.md as-is and adding AGENTS.md alongside it — duplicated truth, drift guaranteed.
Consequences:
- First read for either agent: .ai/PROJECT_CONTEXT.md + .ai/CURRENT_TASK.md + .ai/HANDOFF.md.
- Architectural changes in the repo require updating PROJECT_CONTEXT.md, not the root agent files.
- Git trailers differ per agent (Claude Opus 4.7 vs Codex), preserved in each root file.
- Legacy SESSION-HANDOFF.md deleted in the same commit; superseded by .ai/HANDOFF.md.