fix(handoff): bound escalation assessment latency

Co-Authored-By: Codex <noreply@openai.com>
2026-04-27 20:03:14 -04:00
parent fff8338bf2
commit 9bdd9959a8
6 changed files with 77 additions and 5 deletions
--- a/.ai/HANDOFF.md
+++ b/.ai/HANDOFF.md
@@ -23,10 +23,12 @@ Fixes made:
 - The SSE handshake test now calls `stream_escalations()` directly and consumes only the first generator yield, avoiding HTTPX's infinite-stream buffering behavior.
 - Handoff manager/API tests stub `_generate_ai_assessment()` with an `AsyncMock`.
 - `EscalationBus` normalizes string/UUID account IDs at subscribe/publish/unsubscribe/subscriber_count boundaries, with a regression test.
+- Follow-up fix: escalation AI assessment is now latency-bounded by `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s). If it times out, handoff creation proceeds with no assessment instead of blocking on the model/network path.

 Verified:
 - `pytest tests/test_escalation_bus.py tests/test_handoff_manager.py tests/test_session_handoffs_api.py tests/test_flowpilot_analytics_escalations.py --override-ini=addopts= -q --durations=20` → `31 passed in 46.95s`
 - Same subset with `-n auto` → `31 passed in 17.80s`
+- After the assessment-timeout fix: same subset with `-n auto` → `32 passed in 17.77s`
 - No remaining pytest processes or `resolutionflow%test%` Postgres sessions after the run.

 ## Resume point
@@ -48,4 +50,4 @@ Verified:
 - Do not reintroduce `client.stream()`/ASGITransport tests for infinite SSE responses; test the generator directly or use a real server-level test.
 - `DROP SCHEMA public CASCADE` per test is still the dominant cost: DB-backed tests spend ~1.7-2.8s in setup. Use `-n auto` for focused backend loops.
 - The bus is acceptable for v1 pilot scale only because Railway is single-replica. Redis pub/sub is the obvious swap when horizontal scaling appears.
-- Synchronous `_generate_ai_assessment()` during escalation creation remains product-latency risk; tests are now isolated from it, but the UX path should be watched as the magic-moment screen is built.
+- Escalation assessment can be missing when the 5s timeout fires. The handoff-context UI must render a graceful "assessment unavailable/in progress" state rather than treating it as required.
--- a/.ai/SESSION_LOG.md
+++ b/.ai/SESSION_LOG.md
@@ -21,8 +21,9 @@
 - Stubbed `_generate_ai_assessment()` in handoff manager/API tests so escalation handoff tests no longer wait on the real AI path.
 - Normalized account IDs inside `EscalationBus` so string UUIDs and `UUID` objects hit the same subscriber bucket; added a regression test.
 - Verified focused backend subset: serial `31 passed in 46.95s`; xdist `31 passed in 17.80s`. Confirmed no lingering pytest processes or test DB sessions afterward.
+- Follow-up in the same session: fixed the product latency risk by adding `ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS` (default 5s) around escalation AI assessment generation. If the optional assessment times out, handoff creation continues with no assessment. Added regression coverage; focused xdist subset now `32 passed in 17.77s`.
 - Left for next session: continue frontend SSE subscription in `EscalationQueue.tsx`, then the magic-moment handoff-context screen.
-- Files touched: `backend/app/api/endpoints/session_handoffs.py`, `backend/app/core/escalation_bus.py`, `backend/tests/test_escalation_bus.py`, `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`.
+- Files touched: `backend/app/api/endpoints/session_handoffs.py`, `backend/app/core/config.py`, `backend/app/core/escalation_bus.py`, `backend/app/services/handoff_manager.py`, `backend/tests/test_escalation_bus.py`, `backend/tests/test_handoff_manager.py`, `backend/tests/test_session_handoffs_api.py`, `.ai/HANDOFF.md`, `.ai/SESSION_LOG.md`, `.ai/TODO.md`.

 ## 2026-04-26 03:50 EDT — Claude Code — Ship AssistantChatPage prefill `currentChatRef` fix; close out PR #150

--- a/.ai/TODO.md
+++ b/.ai/TODO.md
@@ -16,8 +16,6 @@
 - [ ] **Consider `pytest-testmon` for PR-time test selection.** Tracks which tests touched which source files and only re-runs affected ones. Best for small PRs touching ~few files. Adds cache-invalidation complexity; only worth it if the suite stays painfully long even after xdist.
 - [ ] **AssistantChatPage `currentChatRef` guard is a silent return** — `handleSend`, `handleTaskSubmit`, `selectChat`, `refreshFacts`, `refreshActiveFix`, and `refreshPreview` all bail with `if (currentChatRef.current !== sentForChatId) return` when stale. This is by design for chat switching, but it also silently masked the prefill-ref bug fixed in PR #153 — the user just saw "no AI response" with no log, no toast, no Sentry event. Either (a) log a `console.warn`/Sentry breadcrumb on the mismatch path so future drift is visible, or (b) split "expected stale" (chat switch) from "unexpected stale" (ref never updated) so only the latter alerts. Pair with an audit of every `currentChatRef.current = ...` assignment vs every `setActiveChatId(...)` call to make sure they're paired everywhere.

-- [ ] **Make escalation AI assessment non-blocking or latency-bounded.** `HandoffManager.create_handoff(intent="escalate")` currently calls `_generate_ai_assessment()` synchronously before the handoff commit. Tests now stub this path, but the product path can still make the junior tech's Escalate action wait on model/network latency. For v1, either set a strict timeout with graceful fallback or move assessment generation behind the committed handoff and let the handoff-context screen render partial state until the assessment arrives.
-
 - [ ] **Allow peer-tech to escalate a colleague's session.** Today `POST /ai-sessions/{session_id}/handoff` in [endpoints/session_handoffs.py:48](backend/app/api/endpoints/session_handoffs.py#L48) filters by `AISession.user_id == current_user.id`, so only the session owner can escalate. Real MSP shops have peer hand-offs: Junior A is on lunch, Junior B sees the session is stuck and should be able to escalate it. Auth tweak: switch from session-owner check to `require_engineer_or_admin` + same-account scope. Add a `handed_off_by` audit column (already exists on `SessionHandoff`) so the original-owner-vs-actual-escalator distinction is preserved. Surfaced from /plan-eng-review on the Escalation-Mode wedge plan; v1 wedge demo doesn't need this (solo-founder pilot), but capture for v2 once 3+ pilots are live and a peer-claim need surfaces.

 - [ ] **Mobile/responsive design for EscalationQueue + handoff-context screen.** Pre-PMF wedge demo targets desktop only — MSP techs work on laptops/desktops in shop environments. Once 3+ paying customers exist and a tech requests mobile (likely on-call use case), spec the responsive behavior: stacked card layout below `sm:` breakpoint, full-bleed handoff-context overlay on mobile, swipe-to-claim gesture instead of Pick Up button. Surfaced from /plan-design-review on the Escalation-Mode wedge plan.
--- a/backend/app/core/config.py
+++ b/backend/app/core/config.py
@@ -111,6 +111,7 @@ class Settings(BaseSettings):
    GOOGLE_AI_API_KEY: Optional[str] = None
    AI_MODEL_GEMINI: str = "gemini-2.5-flash"
    AI_MODEL_ANTHROPIC: str = "claude-sonnet-4-6"
+    ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS: float = 5.0

    # Model tier routing — maps action types to model tiers
    AI_MODEL_TIERS: dict[str, str] = {
--- a/backend/app/services/handoff_manager.py
+++ b/backend/app/services/handoff_manager.py
@@ -57,7 +57,9 @@ class HandoffManager:
        ai_assessment = None
        ai_assessment_data = None
        if intent == "escalate":
-            ai_assessment, ai_assessment_data = await self._generate_ai_assessment(session)
+            ai_assessment, ai_assessment_data = (
+                await self._generate_ai_assessment_with_timeout(session)
+            )

        handoff = SessionHandoff(
            session_id=session_id,
@@ -311,6 +313,24 @@ class HandoffManager:
            logger.exception("Failed to generate AI assessment")
            return None, None

+    async def _generate_ai_assessment_with_timeout(
+        self, session: AISession
+    ) -> tuple[str | None, dict[str, Any] | None]:
+        """Generate optional escalation assessment within the click-path budget."""
+        timeout = settings.ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS
+        try:
+            return await asyncio.wait_for(
+                self._generate_ai_assessment(session),
+                timeout=timeout,
+            )
+        except asyncio.TimeoutError:
+            logger.warning(
+                "Escalation AI assessment timed out after %ss for session %s",
+                timeout,
+                session.id,
+            )
+            return None, None
+
    async def generate_briefing(
        self, handoff_id: UUID, claiming_user_id: UUID
    ) -> str:
--- a/backend/tests/test_handoff_manager.py
+++ b/backend/tests/test_handoff_manager.py
@@ -1,4 +1,5 @@
 """Integration tests for HandoffManager service."""
+import asyncio
 from unittest.mock import AsyncMock, patch

 import pytest
@@ -101,6 +102,55 @@ async def test_create_escalate_handoff(client: AsyncClient, test_user, auth_head
    assert "branch_map" in session.escalation_package or "snapshot" in session.escalation_package


+@pytest.mark.asyncio
+async def test_create_escalate_handoff_does_not_wait_on_slow_ai_assessment(
+    client: AsyncClient, test_user, auth_headers, test_db, monkeypatch
+):
+    """Escalate should commit a handoff even when optional AI assessment is slow."""
+    session = AISession(
+        user_id=test_user["user_data"]["id"],
+        account_id=test_user["user_data"]["account_id"],
+        session_type="guided",
+        intake_type="free_text",
+        intake_content={"text": "test"},
+        status="active",
+        confidence_tier="discovery",
+        conversation_messages=[],
+    )
+    test_db.add(session)
+    await test_db.flush()
+
+    async def slow_assessment(self, session):
+        await asyncio.sleep(0.2)
+        return "too slow", {"confidence": "medium"}
+
+    monkeypatch.setattr(
+        "app.services.handoff_manager.settings."
+        "ESCALATION_AI_ASSESSMENT_TIMEOUT_SECONDS",
+        0.01,
+    )
+    with patch.object(
+        HandoffManager,
+        "_generate_ai_assessment",
+        new=slow_assessment,
+    ):
+        manager = HandoffManager(test_db)
+        handoff = await manager.create_handoff(
+            session_id=session.id,
+            intent="escalate",
+            engineer_notes="Need senior help",
+            user_id=test_user["user_data"]["id"],
+        )
+
+    assert handoff.intent == "escalate"
+    assert handoff.ai_assessment is None
+    assert handoff.ai_assessment_data is None
+
+    await test_db.refresh(session)
+    assert session.status == "escalated"
+    assert session.handoff_count == 1
+
+
 @pytest.mark.asyncio
 async def test_claim_session(client: AsyncClient, test_user, test_admin, auth_headers, test_db):
    """Claiming a handoff sets claimed_by and reactivates session."""