resolutionflow/backend/app/api/endpoints/ai_sessions.py
chihlasm b3dba57bc5 feat: tenant isolation Phase 0 — app-layer filters, UUID audit, CI gate (#132)
* docs: add tenant data isolation design spec

Complete architecture plan for multi-tenant data isolation across
all layers (PostgreSQL RLS, application-layer filtering, schema
migration, testing strategy, and phased rollout checklist).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add background job isolation policy to tenant isolation spec

Documents policy for all 5 existing background jobs:
- Knowledge Flywheel and PSA Retry flagged for account_id threading
- Chat Retention already follows correct pattern (model for others)
- Maintenance Schedule Firing needs account_id in queries + Session creation
- AI Conversation Expiry approved as cross-tenant with justification

Adds approved cross-tenant query registry and Phase 2 checklist items.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add tenant isolation Phase 0 implementation plan

8 tasks covering: CRITICAL copilot hotfix, tenant_filter() helper,
get_tenant_context dependency, analytics/category/AI session gap fixes,
full UUID endpoint audit, TargetList dead code audit, teams orphan
check, and CI grep check for missing tenant filters.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add tenant_filter() helper and get_tenant_context dependency

tenant_filter(model, account_id) is the canonical app-layer tenant
scoping expression. Every query on a tenant table must use it.
build_tree_access_filter and build_step_visibility_filter updated
to call tenant_filter() internally for the account_id match.

get_tenant_context is a FastAPI dependency that returns account_id
or raises 403 if the user has no account — prevents raw access to
current_user.account_id and centralises the null check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
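
As a rough illustration of the helper pair described in the item above, here is a dependency-free sketch. It is not the project's code: the real tenant_filter() presumably returns a SQLAlchemy expression, and get_tenant_context raises an HTTP 403 through FastAPI. This version swaps in the standard library's sqlite3 and a plain exception so it runs on its own. The `trees` table, the `NoAccountError` class, and the (fragment, params) return shape are all assumptions made for the example.

```python
import sqlite3


class NoAccountError(Exception):
    """Stand-in for the HTTP 403 the real dependency raises (assumption)."""


def get_tenant_context(account_id):
    """Return the caller's account_id, centralising the null check."""
    if account_id is None:
        raise NoAccountError("User has no account")
    return account_id


def tenant_filter(table, account_id):
    """Return a (sql_fragment, params) pair that scopes `table` to one account.

    `table` must be a trusted, hard-coded name; only the value is parameterised.
    """
    return f"{table}.account_id = ?", (account_id,)


# Hypothetical table holding rows from two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trees (id TEXT PRIMARY KEY, account_id TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO trees VALUES (?, ?)",
    [("t1", "acct-a"), ("t2", "acct-a"), ("t3", "acct-b")],
)

# Every query on the tenant table goes through the same scoping helper.
clause, params = tenant_filter("trees", get_tenant_context("acct-a"))
visible = [
    row[0]
    for row in conn.execute(f"SELECT id FROM trees WHERE {clause} ORDER BY id", params)
]
```

Routing every tenant-table query through one helper gives the CI check a single name to look for, and keeping the null check in one dependency means handlers never read current_user.account_id directly.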

* fix: scope analytics/flows/{tree_id} to requesting account

Any authenticated user could read flow analytics (session counts,
completion rates, CSAT) for any tree UUID. Now returns 404 if the
tree doesn't belong to the requesting account.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: scope category tree_count to requesting account

tree_count on GET /categories/{id} was including trees from all
accounts, leaking cross-tenant row counts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: restrict AI session search to current user only

Search endpoint used OR(user_id, account_id), exposing other users'
problem_summary and problem_domain within the same account. Sessions
are user-scoped only — cross-user access requires explicit escalation
or sharing. List and search endpoints now behave consistently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add ownership check and 404 responses to ai-sessions endpoints

Cross-tenant isolation audit found:
- retry-psa-push had NO ownership check (CRITICAL) — any user could retry any session's PSA push
- save_task_lane used db.get() without ownership filter, returned 403 revealing existence
- get_session returned 403 instead of 404 for unauthorized access
- stream_documentation returned 403 instead of 404

All now use query-level user_id filtering and return 404 to avoid revealing existence.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
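
The "query-level user_id filtering" pattern from the item above can be sketched as follows. This is an illustration only, using sqlite3 in place of the project's SQLAlchemy models; the table layout and the `lookup_status` helper are hypothetical. The point it shows is that when ownership is part of the WHERE clause, a session owned by someone else and a session that does not exist produce the same empty result, so the handler has nothing to leak.

```python
import sqlite3

# Hypothetical single-table layout standing in for the AISession model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ai_sessions (id TEXT PRIMARY KEY, user_id TEXT NOT NULL)")
conn.execute("INSERT INTO ai_sessions VALUES ('s1', 'alice')")


def lookup_status(conn, session_id, user_id):
    """Return the HTTP status a handler would send for this lookup."""
    row = conn.execute(
        # Ownership is checked in the query itself, not after loading the row.
        "SELECT id FROM ai_sessions WHERE id = ? AND user_id = ?",
        (session_id, user_id),
    ).fetchone()
    return 200 if row is not None else 404


owner_status = lookup_status(conn, "s1", "alice")
other_user_status = lookup_status(conn, "s1", "bob")
missing_status = lookup_status(conn, "does-not-exist", "bob")
```

Loading the row first and comparing owners afterwards tends to invite a 403 branch, which is the behaviour the audit flagged.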

* fix: return 404 instead of 403 for cross-tenant session access

All session endpoints (get, update, complete, scratchpad, variables, export,
ticket-link) now return 404 instead of 403 when a user tries to access
another user's session. This prevents confirming existence of resources
across tenant boundaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-tenant tree access

get_tree and update_tree now return 404 when a user cannot access a tree
(private tree from another account). Prevents confirming resource existence
across tenant boundaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-tenant step access

get_step_or_404 now returns 404 when can_view_step or can_edit_step fails,
preventing confirmation of step existence across tenant boundaries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-tenant upload access

get_upload_url and delete_upload now return 404 when the upload belongs to
a different account/user, preventing resource existence confirmation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-tenant share access

revoke_share and create_share now return 404 when the caller is not the
owner, preventing resource existence confirmation across users.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-team tree access in maintenance schedules

_get_tree_or_403 now returns 404 when the user's team does not match,
preventing confirmation of tree existence across teams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-account tag access

get_tag now returns 404 for account-specific tags that belong to another
account, preventing resource existence confirmation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 instead of 403 for cross-account step category access

get_step_category now returns 404 for account-specific categories that
belong to another account, preventing resource existence confirmation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test: add cross-tenant isolation tests for Task 6 UUID audit

Tests cover:
- Tree GET/PUT returns 404 for cross-account access
- Session GET returns 404 for cross-user access
- AI session GET returns 404 for cross-user access
- AI session retry-psa-push requires ownership
- Upload URL returns 404 for cross-account access
- Share revoke returns 404 for cross-user access

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: return 404 (not 403) for get_documentation cross-user access; add missing Task 6 tests

get_documentation was revealing session existence via 403. Added pre-check
query filtering by session_id AND user_id before calling the engine.

Also add cross-tenant isolation tests for steps, tags, step_categories,
and maintenance_schedules endpoints fixed in Task 6 (TDD was skipped).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address Task 6 quality review — rename helper, restore 403 for intra-account, add docs test

- Rename _get_tree_or_403 → _get_tree_or_404 in maintenance_schedules.py
  (function now raises 404, old name was misleading)
- Restore HTTP 403 for intra-account permission failures in update_tree:
  same-account users who can see a tree but can't edit it got 404 (wrong);
  only cross-account lookups should return 404 to avoid confirming existence
- Apply same 403/404 distinction to update_tree_visibility
- Add test: get_documentation must return 404 for cross-user session access
- Add comment documenting owner-only design for documentation endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
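
The 403/404 distinction restored in the item above can be summarised as a small decision function. This is a sketch of the rule as the commit message states it, not the code in update_tree; the function name and parameters are made up for the example.

```python
def edit_access_status(resource_account_id, caller_account_id, caller_can_edit):
    """Pick the status code for an edit attempt on a tenant-owned resource."""
    if resource_account_id != caller_account_id:
        # Cross-account: answer as if the resource did not exist.
        return 404
    if not caller_can_edit:
        # Same account: the caller can already see the resource, so an
        # honest "forbidden" reveals nothing new.
        return 403
    return 200


cross_account = edit_access_status("acct-a", "acct-b", caller_can_edit=False)
same_account_read_only = edit_access_status("acct-a", "acct-a", caller_can_edit=False)
same_account_editor = edit_access_status("acct-a", "acct-a", caller_can_edit=True)
```

Checking the account boundary first matters: if the permission check ran first, a cross-account caller would get 403 and learn that the UUID exists.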

* chore: Task 7+8 — TargetList audit, CI tenant-filter grep check

Task 7: TargetList dead code audit
- Found active code references in 12+ files across backend and frontend
  (full CRUD API + frontend page + MaintenanceScheduleSection + BatchLaunchModal)
- Decision: migrate to account_id in Phase 1 (cannot drop)
- DB row count not available from code-server — must verify from VPS SSH
  before Phase 1 migration
- Teams orphan check query documented; must run from VPS SSH before Phase 1
- Results documented in spec Section 9

Task 8: CI tenant-filter enforcement check (warn mode)
- Create backend/scripts/check_tenant_filters.py
  Scans endpoint and service files for select() on tenant tables without
  tenant_filter/account_id/user_id in surrounding context. Currently
  reports 109 warnings (Phase 1 backlog). Exits 0 (warn mode).
- Add Check tenant filter enforcement step to backend CI job
  Add --fail flag after Phase 1 backlog clears to make it blocking.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
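
For orientation, a check like the one Task 8 describes might work roughly as below. This is a guess at the approach, not the contents of backend/scripts/check_tenant_filters.py: the model list, the size of the context window, and the function names are all assumptions.

```python
import re

# Hypothetical configuration; the real script's lists are not shown here.
TENANT_MODELS = ("AISession", "Tree")
SCOPE_MARKERS = ("tenant_filter", "account_id", "user_id")
CONTEXT_LINES = 5  # how many following lines count as "surrounding context"

SELECT_RE = re.compile(r"select\(\s*(%s)\b" % "|".join(TENANT_MODELS))


def find_unscoped_selects(source):
    """Return 1-based line numbers of select() calls with no scope marker nearby."""
    lines = source.splitlines()
    warnings = []
    for index, line in enumerate(lines):
        if not SELECT_RE.search(line):
            continue
        window = "\n".join(lines[index : index + CONTEXT_LINES + 1])
        if not any(marker in window for marker in SCOPE_MARKERS):
            warnings.append(index + 1)
    return warnings


def exit_code(warnings, fail=False):
    """Warn mode always exits 0; with fail=True any warning becomes blocking."""
    return 1 if (fail and warnings) else 0


scoped = "q = select(AISession).where(\n    AISession.user_id == uid,\n)"
unscoped = "q = select(Tree).where(\n    Tree.id == tree_id,\n)"
scoped_warnings = find_unscoped_selects(scoped)
unscoped_warnings = find_unscoped_selects(unscoped)
```

A text scan like this is only a heuristic: it can miss queries built across helper functions and can flag approved cross-tenant queries, which is one reason to start in warn mode.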

* docs: record Phase 0 audit results — 0 orphaned teams, 0 target_list rows

Both checks confirmed 2026-04-09 from production DB.
Phase 1 migration is safe to proceed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-09 00:42:19 -04:00

1105 lines
37 KiB
Python

"""FlowPilot AI session endpoints.

CRUD and interaction endpoints for AI-powered troubleshooting sessions:
    POST /ai-sessions — Start a new session
    POST /ai-sessions/{id}/respond — Submit step response, get next step
    POST /ai-sessions/{id}/resolve — Resolve the session
    POST /ai-sessions/{id}/escalate — Escalate the session
    GET /ai-sessions — List user's sessions (paginated)
    GET /ai-sessions/{id} — Get session detail with all steps
    GET /ai-sessions/{id}/documentation — Get auto-generated documentation
    POST /ai-sessions/{id}/rate — Submit post-session rating
"""
import logging
from datetime import datetime
from typing import Annotated, Optional
from uuid import UUID
from fastapi import APIRouter, Depends, HTTPException, Query, Request, status
from sqlalchemy import or_, select, func, text
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from app.core.rate_limit import limiter
from app.api.deps import get_current_active_user, get_db, require_engineer_or_admin
from app.core.config import settings
from app.core.ai_quota_service import check_ai_quota, record_ai_usage, get_user_plan
from app.models.user import User
from app.models.ai_session import AISession
from app.schemas.ai_session import (
    AISessionCreateRequest,
    AISessionCreateResponse,
    StepResponseRequest,
    StepResponseResponse,
    ResolveSessionRequest,
    EscalateSessionRequest,
    SessionCloseResponse,
    SessionDocumentation,
    RateSessionRequest,
    StatusUpdateRequest,
    StatusUpdateResponse,
    PickupSessionRequest,
    LinkTicketRequest,
    AISessionSummary,
    AISessionDetail,
    AISessionStepResponse,
    AISessionSearchResult,
    StepOptionSchema,
    ChatSessionCreateResponse,
    ChatMessageRequest,
    ChatMessageResponse,
    SaveTaskLaneRequest,
)
from app.services import flowpilot_engine
from app.services import unified_chat_service
from app.services.psa_documentation_service import retry_failed_push
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/ai-sessions", tags=["ai-sessions"])


def _build_session_detail(session: AISession) -> AISessionDetail:
    """Build AISessionDetail from ORM session with properly mapped steps.

    AISessionDetail.model_validate(session) fails because the ORM steps
    relationship uses 'id' while AISessionStepResponse expects 'step_id'.
    This helper manually maps all fields to avoid that validation error.
    """
    step_responses = []
    for step in (session.steps or []):
        options = []
        if step.options_presented:
            options = [
                StepOptionSchema(
                    label=opt.get("label", ""),
                    value=opt.get("value", ""),
                    followup_hint=opt.get("followup_hint"),
                )
                for opt in step.options_presented
            ]
        content = step.content or {}
        step_responses.append(AISessionStepResponse(
            step_id=step.id,
            step_order=step.step_order,
            step_type=step.step_type,
            content=content,
            context_message=step.context_message,
            options=options,
            allow_free_text=content.get("allow_free_text", True),
            allow_skip=content.get("allow_skip", True),
            confidence_tier=session.confidence_tier,
            confidence_score=step.confidence_at_step,
        ))
    return AISessionDetail(
        id=session.id,
        session_type=getattr(session, 'session_type', 'guided'),
        title=getattr(session, 'title', None),
        status=session.status,
        intake_type=session.intake_type,
        intake_content=session.intake_content or {},
        problem_summary=session.problem_summary,
        problem_domain=session.problem_domain,
        confidence_tier=session.confidence_tier,
        step_count=session.step_count,
        session_rating=session.session_rating,
        psa_ticket_id=session.psa_ticket_id,
        psa_connection_id=session.psa_connection_id,
        escalation_reason=session.escalation_reason,
        matched_flow_id=session.matched_flow_id,
        match_score=getattr(session, 'match_score', None),
        resolution_summary=session.resolution_summary,
        resolution_action=getattr(session, 'resolution_action', None),
        session_feedback=session.session_feedback,
        ticket_data=session.ticket_data,
        created_at=session.created_at,
        resolved_at=session.resolved_at,
        steps=step_responses,
        conversation_messages=session.conversation_messages or [],
        pending_task_lane=session.pending_task_lane,
        is_branching=getattr(session, 'is_branching', False),
        active_branch_id=str(session.active_branch_id) if getattr(session, 'active_branch_id', None) else None,
    )


def _require_ai_enabled() -> None:
    if not settings.ai_enabled:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="AI is not configured. Set GOOGLE_AI_API_KEY or ANTHROPIC_API_KEY.",
        )


async def _check_quota(user: User, db: AsyncSession) -> None:
    """Check AI quota and raise 429 if exceeded."""
    allowed, quota_status = await check_ai_quota(
        user_id=user.id,
        account_id=user.account_id,
        db=db,
        billing_anchor=user.ai_billing_cycle_anchor_at,
        is_super_admin=user.is_super_admin,
    )
    if not allowed:
        # Read deny_reason once with .get() so a missing key cannot raise
        # KeyError while the 429 response is being built.
        deny_reason = quota_status.get("deny_reason")
        reset_key = "daily_reset_at" if deny_reason == "daily" else "monthly_reset_at"
        raise HTTPException(
            status_code=status.HTTP_429_TOO_MANY_REQUESTS,
            detail={
                "message": f"AI limit exceeded ({deny_reason})",
                "reset_at": quota_status.get(reset_key),
                "quota": quota_status,
            },
        )


async def _record_usage(
    user: User,
    db: AsyncSession,
    generation_type: str,
    input_tokens: int,
    output_tokens: int,
    succeeded: bool,
    session_id: Optional[UUID] = None,
    error_code: Optional[str] = None,
) -> None:
    """Record AI usage after an LLM call."""
    plan = await get_user_plan(user.account_id, db)
    # Flat rates: 3.0 per million input tokens, 15.0 per million output tokens.
    estimated_cost = (
        input_tokens * 3.0 / 1_000_000
        + output_tokens * 15.0 / 1_000_000
    )
    await record_ai_usage(
        user_id=user.id,
        account_id=user.account_id,
        conversation_id=None,
        generation_type=generation_type,
        tier=plan,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        estimated_cost=estimated_cost,
        succeeded=succeeded,
        counts_toward_quota=True,
        error_code=error_code,
        extra_data={"ai_session_id": str(session_id)} if session_id else None,
        db=db,
    )


# ── Create session ──
@router.post("", status_code=201)
@limiter.limit("5/minute")
async def create_session(
    request: Request,
    data: AISessionCreateRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Start a new FlowPilot or chat session."""
    _require_ai_enabled()
    await _check_quota(current_user, db)

    # Chat sessions use a different creation path
    if data.session_type == "chat":
        try:
            session = await unified_chat_service.create_chat_session(
                user_id=current_user.id,
                account_id=current_user.account_id,
                team_id=current_user.team_id,
                intake_content=data.intake_content,
                db=db,
            )
        except Exception as e:
            logger.exception("Chat session creation failed: %s", e)
            raise HTTPException(
                status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
                detail="Failed to create chat session",
            )
        await db.commit()
        return ChatSessionCreateResponse(
            session_id=session.id,
            title=session.title or "New Chat",
            status=session.status,
        )

    try:
        result = await flowpilot_engine.start_session(
            request=data,
            user_id=current_user.id,
            account_id=current_user.account_id,
            team_id=current_user.team_id,
            db=db,
        )
    except Exception as e:
        logger.exception("FlowPilot session start failed: %s", e)
        # Rollback the failed transaction before attempting usage recording
        await db.rollback()
        try:
            await _record_usage(
                current_user, db,
                generation_type="flowpilot_start",
                input_tokens=0, output_tokens=0,
                succeeded=False, error_code=type(e).__name__,
            )
            await db.commit()
        except Exception:
            logger.warning("Failed to record usage after session start failure", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail=f"AI provider error ({type(e).__name__}). Please try again.",
        )

    await _record_usage(
        current_user, db,
        generation_type="flowpilot_start",
        # Token counts are tracked on the session, so record 0 here. The
        # previous `confidence_score and 0` passed None through whenever the
        # score was None, which breaks the cost arithmetic in _record_usage.
        input_tokens=0,
        output_tokens=0,
        succeeded=True,
        session_id=result.session_id,
    )
    await db.commit()
    return result


# ── Chat message ──
@router.post("/{session_id}/chat", response_model=ChatMessageResponse)
@limiter.limit("10/minute")
async def send_chat_message(
    request: Request,
    session_id: UUID,
    data: ChatMessageRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Send a message in a chat session and get AI response."""
    _require_ai_enabled()
    await _check_quota(current_user, db)
    user_id = current_user.id
    account_id = current_user.account_id

    # Fetch attached uploads from S3 (if any)
    images = None
    message = data.message
    if data.upload_ids:
        from app.services.storage_service import fetch_upload_images, fetch_upload_documents
        images = await fetch_upload_images(data.upload_ids, account_id, db) or None
        # Inject document text (PDFs, text files) as context in the message
        documents = await fetch_upload_documents(data.upload_ids, account_id, db)
        if documents:
            doc_parts = []
            for doc in documents:
                doc_parts.append(f"--- Attached file: {doc['filename']} ---\n{doc['text']}")
            doc_context = "\n\n".join(doc_parts)
            message = f"{message}\n\n[Attached document content]\n{doc_context}"

    try:
        (
            ai_content,
            suggested_flows,
            session,
            fork_metadata,
            actions_data,
            questions_data,
        ) = await unified_chat_service.send_chat_message(
            session_id=session_id,
            user_id=user_id,
            account_id=account_id,
            message=message,
            db=db,
            images=images,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except Exception as e:
        logger.exception("Chat message failed: %s", e)
        await db.rollback()
        try:
            await _record_usage(
                current_user, db,
                generation_type="chat_message",
                input_tokens=0, output_tokens=0,
                succeeded=False,
                session_id=session_id,
                error_code=type(e).__name__,
            )
            await db.commit()
        except Exception:
            logger.warning("Failed to record usage after chat failure", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail=f"AI provider error ({type(e).__name__}). Please try again.",
        )

    await _record_usage(
        current_user, db,
        generation_type="chat_message",
        input_tokens=0, output_tokens=0,
        succeeded=True,
        session_id=session_id,
    )
    await db.commit()
    return ChatMessageResponse(
        content=ai_content,
        suggested_flows=suggested_flows,
        fork=fork_metadata,
        actions=actions_data,
        questions=questions_data,
    )


# ── Respond to step ──
@router.post("/{session_id}/respond", response_model=StepResponseResponse)
@limiter.limit("15/minute")
async def respond_to_step(
    request: Request,
    session_id: UUID,
    data: StepResponseRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Submit an engineer's response to a FlowPilot step and get the next step."""
    _require_ai_enabled()
    await _check_quota(current_user, db)
    try:
        result = await flowpilot_engine.process_response(
            session_id=session_id,
            request=data,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    except Exception as e:
        logger.exception("FlowPilot response failed: %s", e)
        await db.rollback()
        try:
            await _record_usage(
                current_user, db,
                generation_type="flowpilot_respond",
                input_tokens=0, output_tokens=0,
                succeeded=False,
                session_id=session_id,
                error_code=type(e).__name__,
            )
            await db.commit()
        except Exception:
            logger.warning("Failed to record usage after response failure", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail=f"AI provider error ({type(e).__name__}). Please try again.",
        )

    await _record_usage(
        current_user, db,
        generation_type="flowpilot_respond",
        input_tokens=0, output_tokens=0,
        succeeded=True,
        session_id=session_id,
    )
    await db.commit()
    return result


# ── Resolve ──
@router.post("/{session_id}/resolve", response_model=SessionCloseResponse)
@limiter.limit("15/minute")
async def resolve_session(
    request: Request,
    session_id: UUID,
    data: ResolveSessionRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Resolve a session. Returns immediately; use /documentation/stream for ticket notes."""
    try:
        result = await flowpilot_engine.resolve_session(
            session_id=session_id,
            request=data,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()

    # Fire-and-forget: resolution outputs (don't block the response)
    import asyncio

    async def _post_resolve_tasks():
        try:
            from app.services.resolution_output_generator import ResolutionOutputGenerator
            gen = ResolutionOutputGenerator(db)
            await gen.generate_all(session_id)
        except Exception:
            # Lazy %-formatting, consistent with the other logger calls here
            logger.exception("Failed to generate resolution outputs for session %s", session_id)

    asyncio.create_task(_post_resolve_tasks())
    return result


# ── Escalate ──
@router.post("/{session_id}/escalate", response_model=SessionCloseResponse)
@limiter.limit("15/minute")
async def escalate_session(
    request: Request,
    session_id: UUID,
    data: EscalateSessionRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Escalate a FlowPilot session to another engineer."""
    try:
        result = await flowpilot_engine.escalate_session(
            session_id=session_id,
            request=data,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()
    return result


# ── Pause ──
@router.post("/{session_id}/pause", status_code=204)
@limiter.limit("15/minute")
async def pause_session(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Pause an active FlowPilot session for later resume."""
    try:
        await flowpilot_engine.pause_session(
            session_id=session_id,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()


# ── Save Task Lane ──
@router.put("/{session_id}/task-lane", status_code=204)
@limiter.limit("30/minute")
async def save_task_lane(
    request: Request,
    session_id: UUID,
    body: SaveTaskLaneRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Save the current task lane state including user's in-progress responses."""
    result = await db.execute(
        select(AISession).where(
            AISession.id == session_id,
            AISession.user_id == current_user.id,
        )
    )
    session = result.scalar_one_or_none()
    if not session:
        raise HTTPException(status_code=404, detail="Session not found")

    payload = {
        "questions": [q.model_dump() for q in body.questions],
        "actions": [a.model_dump() for a in body.actions],
        "responses": body.responses,
    }
    # Guard against oversized payloads (max 256KB serialized)
    import json
    if len(json.dumps(payload)) > 256 * 1024:
        raise HTTPException(status_code=413, detail="Task lane payload too large")
    session.pending_task_lane = payload
    await db.commit()


# ── Resume ──
@router.post("/{session_id}/resume", status_code=204)
@limiter.limit("15/minute")
async def resume_session(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Resume a paused FlowPilot session."""
    try:
        await flowpilot_engine.resume_session(
            session_id=session_id,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()


# ── Abandon / Close ──
@router.post("/{session_id}/abandon", status_code=204)
@limiter.limit("15/minute")
async def abandon_session(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
    reason: str | None = None,
):
    """Close a session without resolving or escalating."""
    try:
        await flowpilot_engine.abandon_session(
            session_id=session_id,
            user_id=current_user.id,
            reason=reason,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()


# ── Delete ──
@router.delete("/{session_id}", status_code=204)
async def delete_session(
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
):
    """Delete a session (owner only)."""
    result = await db.execute(
        select(AISession).where(
            AISession.id == session_id,
            AISession.user_id == current_user.id,
        )
    )
    session = result.scalar_one_or_none()
    if not session:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")
    await db.delete(session)
    await db.commit()


# ── Escalation Queue ──
@router.get("/escalation-queue", response_model=list[AISessionSummary])
@limiter.limit("30/minute")
async def get_escalation_queue(
    request: Request,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """List sessions requesting escalation for the current user's team/account."""
    # Match by team_id if available, otherwise fall back to account_id
    if current_user.team_id:
        scope_filter = AISession.team_id == current_user.team_id
    elif current_user.account_id:
        scope_filter = AISession.account_id == current_user.account_id
    else:
        return []
    result = await db.execute(
        select(AISession)
        .where(
            scope_filter,
            AISession.status == "requesting_escalation",
        )
        .order_by(AISession.created_at.desc())
    )
    sessions = result.scalars().all()
    return [AISessionSummary.model_validate(s) for s in sessions]


# ── Pickup Escalated Session ──
@router.post("/{session_id}/pickup", response_model=StepResponseResponse)
@limiter.limit("5/minute")
async def pickup_session(
    request: Request,
    session_id: UUID,
    data: PickupSessionRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Pick up an escalated session as a new engineer."""
    _require_ai_enabled()
    await _check_quota(current_user, db)
    try:
        result = await flowpilot_engine.pickup_session(
            session_id=session_id,
            resume_mode=data.resume_mode,
            additional_context=data.additional_context,
            user_id=current_user.id,
            team_id=current_user.team_id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    except Exception as e:
        logger.exception("FlowPilot pickup failed: %s", e)
        await db.rollback()
        try:
            await _record_usage(
                current_user, db,
                generation_type="flowpilot_pickup",
                input_tokens=0, output_tokens=0,
                succeeded=False,
                session_id=session_id,
                error_code=type(e).__name__,
            )
            await db.commit()
        except Exception:
            logger.warning("Failed to record usage after pickup failure", exc_info=True)
        raise HTTPException(
            status_code=status.HTTP_502_BAD_GATEWAY,
            detail=f"AI provider error ({type(e).__name__}). Please try again.",
        )

    await _record_usage(
        current_user, db,
        generation_type="flowpilot_pickup",
        input_tokens=0, output_tokens=0,
        succeeded=True,
        session_id=session_id,
    )
    await db.commit()
    return result


# ── Link Ticket ──
@router.post("/{session_id}/link-ticket", response_model=AISessionDetail)
@limiter.limit("10/minute")
async def link_ticket_to_session(
    request: Request,
    session_id: UUID,
    data: LinkTicketRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Link a PSA ticket to an in-progress session retroactively."""
    try:
        await flowpilot_engine.link_ticket(
            session_id=session_id,
            psa_ticket_id=data.psa_ticket_id,
            psa_connection_id=data.psa_connection_id,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()

    # Return updated session detail
    result = await db.execute(
        select(AISession)
        .options(selectinload(AISession.steps))
        .where(AISession.id == session_id)
    )
    session = result.scalar_one_or_none()
    if not session:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")
    return _build_session_detail(session)


# ── Search sessions (Command Palette) ──
@router.get("/search", response_model=list[AISessionSearchResult])
@limiter.limit("30/minute")
async def search_sessions(
    request: Request,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    q: str = Query(..., min_length=2, max_length=200),
    limit: int = Query(5, ge=1, le=20),
):
    """Search AI sessions by content using full-text search. Used by Command Palette."""
    # Sessions are user-scoped. The list endpoint uses user_id only;
    # search must be consistent. Cross-user access requires explicit
    # escalation or session sharing — not ambient account membership.
    result = await db.execute(
        select(AISession)
        .where(
            AISession.user_id == current_user.id,
            text("ai_sessions.search_vector @@ plainto_tsquery('english', :q)"),
        )
        .params(q=q)
        .order_by(AISession.created_at.desc())
        .limit(limit)
    )
    sessions = result.scalars().all()
    return [
        AISessionSearchResult(
            id=s.id,
            problem_summary=s.problem_summary,
            problem_domain=s.problem_domain,
            status=s.status,
            created_at=s.created_at,
        )
        for s in sessions
    ]


# ── Similar Sessions ──
@router.get("/{session_id}/similar")
@limiter.limit("15/minute")
async def get_similar_sessions(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    limit: int = Query(5, ge=1, le=20),
):
    """Find sessions semantically similar to this one using vector embeddings."""
    from app.services.session_embedding_service import find_similar_sessions

    if not current_user.account_id:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="No account")
    results = await find_similar_sessions(
        session_id=session_id,
        account_id=current_user.account_id,
        db=db,
        limit=limit,
    )
    return results


# ── List sessions ──
@router.get("", response_model=list[AISessionSummary])
@limiter.limit("30/minute")
async def list_sessions(
    request: Request,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    session_status: Optional[str] = Query(None, alias="status"),
    skip: int = Query(0, ge=0),
    limit: int = Query(20, ge=1, le=100),
    problem_domain: Optional[str] = Query(None),
    matched_flow_id: Optional[UUID] = Query(None),
    confidence_tier: Optional[str] = Query(None, pattern="^(guided|exploring|discovery)$"),
    ticket_id: Optional[str] = Query(None),
    session_type: Optional[str] = Query(None, pattern="^(guided|chat)$"),
    date_from: Optional[datetime] = Query(None),
    date_to: Optional[datetime] = Query(None),
    q: Optional[str] = Query(None, min_length=2, max_length=200),
):
    """List the current user's AI sessions (owned or picked up)."""
    user_id_str = str(current_user.id)
    query = (
        select(AISession)
        .where(
            or_(
                AISession.user_id == current_user.id,
                AISession.escalation_package["picked_up_by"].as_string() == user_id_str,
            )
        )
        .order_by(AISession.created_at.desc())
        .offset(skip)
        .limit(limit)
    )
    if session_type:
        query = query.where(AISession.session_type == session_type)
    if session_status:
        query = query.where(AISession.status == session_status)
    if problem_domain:
        query = query.where(AISession.problem_domain == problem_domain)
    if matched_flow_id:
        query = query.where(AISession.matched_flow_id == matched_flow_id)
    if confidence_tier:
        query = query.where(AISession.confidence_tier == confidence_tier)
    if ticket_id:
        query = query.where(AISession.psa_ticket_id == ticket_id)
    if date_from:
        query = query.where(AISession.created_at >= date_from)
    if date_to:
        query = query.where(AISession.created_at <= date_to)
    if q:
        query = query.where(
            text("ai_sessions.search_vector @@ plainto_tsquery('english', :q)")
        ).params(q=q)
    result = await db.execute(query)
    sessions = result.scalars().all()
    return [AISessionSummary.model_validate(s) for s in sessions]


# ── Get session detail ──

@router.get("/{session_id}", response_model=AISessionDetail)
@limiter.limit("30/minute")
async def get_session(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
):
    """Get full session detail with all steps."""
    result = await db.execute(
        select(AISession)
        .options(selectinload(AISession.steps))
        .where(AISession.id == session_id)
    )
    session = result.scalar_one_or_none()
    if not session:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")

    # Allow access if user is owner, escalation target, or picked-up handler.
    # A denied caller gets the same 404 as a missing session.
    pkg = session.escalation_package or {}
    is_owner = session.user_id == current_user.id
    is_escalation_target = session.escalated_to_id == current_user.id
    is_handler = pkg.get("picked_up_by") == str(current_user.id)
    if not (is_owner or is_escalation_target or is_handler):
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")

    return _build_session_detail(session)
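

# Illustrative sketch, not part of the original module: the access rule that
# get_session applies (owner, escalation target, or picked-up handler), pulled
# out as a pure predicate so it could be unit-tested without a database.
# The name `_can_view_session_sketch` and its parameters are hypothetical, and
# no endpoint in this file calls it.
from typing import Any, Optional
from uuid import UUID


def _can_view_session_sketch(
    viewer_id: UUID,
    owner_id: Optional[UUID],
    escalated_to_id: Optional[UUID],
    escalation_package: Optional[dict[str, Any]],
) -> bool:
    """Return True if the viewer owns, was escalated, or picked up the session."""
    pkg = escalation_package or {}
    # picked_up_by is stored as a string inside the JSON package, so compare
    # against the string form of the viewer's id.
    is_handler = pkg.get("picked_up_by") == str(viewer_id)
    return viewer_id in (owner_id, escalated_to_id) or is_handler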


# ── Documentation ──

@router.get("/{session_id}/documentation", response_model=SessionDocumentation)
@limiter.limit("30/minute")
async def get_documentation(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
):
    """Get auto-generated documentation for a session."""
    # Verify session ownership (owner only). Documentation endpoints require
    # direct ownership; escalated_to_id / picked_up_by handlers use get_session
    # (read-only). This is consistent with stream_documentation, which has the
    # same owner-only check.
    result = await db.execute(
        select(AISession).where(
            AISession.id == session_id,
            AISession.user_id == current_user.id,
        )
    )
    if not result.scalar_one_or_none():
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")

    try:
        return await flowpilot_engine.get_session_documentation(
            session_id=session_id,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))


@router.get("/{session_id}/documentation/stream")
@limiter.limit("20/minute")
async def stream_documentation(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
):
    """Stream AI-generated ticket notes as Server-Sent Events."""
    from starlette.responses import StreamingResponse

    # Verify session ownership (owner only, same rule as get_documentation)
    result = await db.execute(
        select(AISession).where(
            AISession.id == session_id,
            AISession.user_id == current_user.id,
        )
    )
    session = result.scalar_one_or_none()
    if not session:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Session not found")

    async def event_generator():
        try:
            async for chunk in flowpilot_engine.stream_ticket_notes(
                session_id=session_id,
                user_id=current_user.id,
                db=db,
            ):
                # SSE format: data: <text>\n\n
                # Chunks are forwarded as-is; a chunk that contains a newline
                # is not split into one `data:` line per payload line.
                yield f"data: {chunk}\n\n"
            yield "data: [DONE]\n\n"
        except Exception as e:
            # logger.exception already records the traceback for `e`.
            logger.exception("SSE stream error for session %s", session_id)
            yield f"data: [ERROR] {e}\n\n"

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable nginx buffering
        },
    )
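

# Illustrative sketch, not part of the original module: one way to frame a
# text chunk as a Server-Sent Event when the chunk may contain newlines. The
# SSE format expects a `data:` prefix on every line of the payload, and the
# client rejoins those lines with a newline. The helper name
# `_sse_frame_sketch` is hypothetical, and stream_documentation does not call
# it.
def _sse_frame_sketch(chunk: str) -> str:
    """Return `chunk` encoded as a single SSE message."""
    # Normalise CRLF and bare CR so a stray carriage return cannot end a
    # line early on the client side.
    lines = chunk.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    # One `data:` line per payload line, then a blank line to end the event.
    return "".join(f"data: {line}\n" for line in lines) + "\n"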


# ── Status Update ──

@router.post("/{session_id}/status-update", response_model=StatusUpdateResponse)
@limiter.limit("20/minute")
async def create_status_update(
    request: Request,
    session_id: UUID,
    data: StatusUpdateRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
):
    """Generate a status update for ticket notes, client, or email."""
    try:
        return await flowpilot_engine.generate_status_update(
            session_id=session_id,
            request=data,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))


# ── Rate ──

@router.post("/{session_id}/rate", status_code=204)
@limiter.limit("15/minute")
async def rate_session(
    request: Request,
    session_id: UUID,
    data: RateSessionRequest,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Submit a post-session rating."""
    try:
        await flowpilot_engine.rate_session(
            session_id=session_id,
            rating=data.rating,
            feedback=data.feedback,
            user_id=current_user.id,
            db=db,
        )
    except ValueError as e:
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=str(e))
    except PermissionError as e:
        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail=str(e))
    await db.commit()


# ── Retry PSA Push ──

@router.post("/{session_id}/retry-psa-push")
@limiter.limit("5/minute")
async def retry_psa_push_endpoint(
    request: Request,
    session_id: UUID,
    current_user: Annotated[User, Depends(get_current_active_user)],
    db: Annotated[AsyncSession, Depends(get_db)],
    _: None = Depends(require_engineer_or_admin),
):
    """Manually retry a failed PSA documentation push."""
    from app.models.psa_post_log import PsaPostLog

    # Verify the session belongs to the current user
    session_result = await db.execute(
        select(AISession).where(
            AISession.id == session_id,
            AISession.user_id == current_user.id,
        )
    )
    if not session_result.scalar_one_or_none():
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="Session not found",
        )

    # Find the latest failed push log for this session
    result = await db.execute(
        select(PsaPostLog)
        .where(
            PsaPostLog.ai_session_id == session_id,
            PsaPostLog.status.in_(["failed", "pending_retry"]),
        )
        .order_by(PsaPostLog.posted_at.desc())
        .limit(1)
    )
    log_entry = result.scalar_one_or_none()
    if not log_entry:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="No failed PSA push found for this session",
        )

    # Reset to pending_retry and attempt immediately
    log_entry.status = "pending_retry"
    log_entry.retry_count = max(0, log_entry.retry_count - 1)  # Give one more attempt

    success = await retry_failed_push(log_entry, db)
    await db.commit()

    return {
        "psa_push_status": "sent" if success else log_entry.status,
        "psa_push_error": log_entry.error_message if not success else None,
    }