resolutionflow/docs/plans/2026-05-13-session-expiration-policy.md
Michael Chihlas c7cd711859 feat: AccountSecuritySettingsPage + active-users list + toast + login banner
Eighth commit in the session-expiration-policy series. Surfaces all
the owner controls and user-facing expiry UX that the prior commits
plumbed through, designed end-to-end via /plan-design-review (initial
4/10 -> final 9/10; 7 decisions locked in the plan).

Backend additions:
- accounts/me/security GET response gains active_users: list of
  {user_id, name, email, last_login_at} for users in this account
  with at least one un-revoked refresh token. Joined query on
  refresh_tokens + users, distinct, ordered by last_login desc.
  Drives the Active Sessions section.

Frontend additions:
- api/accountSecurity.ts: typed client for GET/PATCH/revoke-sessions.
- hooks/useAuthSessionExpiry.ts: reads idle/absolute expiry from the
  auth store, returns warning ('none'|'soon'|'now') + reason
  ('idle'|'absolute') so consumers can pick the right UX for the
  closer window. Re-evaluates every 30s.
- components/common/SessionExpiryToast.tsx: top-of-app notice that
  fires at T-5min. Idle case: warning-amber tone, [Stay signed in]
  button hits authApi.refresh() and updates the store on success.
  Absolute case: info-cyan tone, [Sign in now] link to /login (no
  recoverable action). Dismissable, doesn't re-fire after dismissal.
- components/account/RevokeSessionsModal.tsx: confirmation modal for
  the two bulk-revoke scopes. Title, body, and confirm-label vary by
  scope; danger-styled confirm button.
- pages/account/AccountSecuritySettingsPage.tsx: the main page.
  Header (Shield icon), intro, Policy card with Strict/Standard/Custom
  radios + always-visible-disabled Custom inputs (idle/absolute
  minutes) with inline validation, Save button + emerald success ping,
  info note about 'applies at next login'. Active sessions card with
  count-aware copy, list of {name, email, last-login-ago} rows
  (caller tagged '(you)'), two buttons — 'except me' hidden when
  count=1, 'sign me out and everyone else' uses danger-tinted styling.
- pages/AccountSettingsPage.tsx: 'Session security' row added to the
  owner-only settings list.
- router.tsx: /account/security route, owner-gated via ProtectedRoute.
- pages/LoginPage.tsx: cyan info-tone banner above form when
  ?reason=session_expired is in the URL.
- components/layout/AppLayout.tsx: mounts <SessionExpiryToast />.

Scope=all bulk-revoke UX (the most jarring moment): on success,
toast.success(N sessions), 1.5s delay, then clear localStorage +
useAuthStore.logout() + window.location='/login' (no banner — the
owner just did this).

Backend tests: existing 22/22 still green plus the GET test now
asserts active_users is present + non-empty after login. Frontend:
tsc clean, authStore test 2/2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-13 17:07:14 -04:00

# Session Expiration Policy — Design & Implementation Plan
**Date:** 2026-05-13
**Owner:** Michael Chihlas
**Status:** Draft — pending review
**Related issue:** none yet (file after plan approval)
---
## 1. Problem
Today, once a user logs in to ResolutionFlow, they effectively stay logged in forever:
- Access token: 5 minutes — fine.
- Refresh token: 7 days, with JTI rotation. Every `/auth/refresh` mints a fresh 7-day window and revokes the old JTI.
- Frontend stores both in `localStorage`; Axios interceptor silently refreshes on every 401.
Net effect: a **sliding 7-day session with no absolute cap**. As long as a user opens the app at least once a week, the refresh token rolls forward indefinitely. There is no enforced re-authentication, no idle-timeout cap, no maximum session lifetime — and no per-account control for MSP owners whose customers may demand stricter security.
This was acceptable for pilot but is **not acceptable for self-serve launch**:
- MSP buyers' SOC2 / cyber-insurance auditors routinely require enforced session timeouts.
- A stolen device with an unlocked browser hands an attacker indefinite access.
- Owners of paying accounts expect to be able to set policy for their members.
## 2. Goals
1. **System-level absolute cap** — no session can exceed N days regardless of activity.
2. **Idle cap** — sessions inactive for N days must require re-login.
3. **Per-account owner override** — account owners can tighten or (within sysadmin-imposed ceilings) loosen the policy for their account.
4. **Graceful UX** — users get warned before forced re-login; rotation continues to be silent within the active window.
5. **Backward-compatible rollout** — existing refresh tokens are grandfathered for one rotation, not invalidated at deploy.
## 3. Non-goals
- Multi-device session management (revoke individual devices). Tracked separately; out of scope here.
- "Remember this device" / trusted device list. Out of scope.
- Per-user (vs per-account) overrides. Out of scope.
- Re-auth on sensitive action (step-up auth). Out of scope.
- Annual review of session policy (analytics dashboards). Out of scope.
## 4. Design
### 4.1 Two windows, both enforced
| Window | Default | Meaning |
|---|---|---|
| **Idle** | 3 days | Maximum time between `/auth/refresh` calls. Rotation extends this window. |
| **Absolute** | 14 days | Hard cap from original login (`auth_time`). Rotation does **not** extend this. |
The shorter of the two governs: a token is valid only if `now < min(idle_exp, auth_time + abs_max)`, where `abs_max` is the absolute window in seconds (claim names are defined in §4.2).
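As a minimal sketch of that rule, assuming all values are Unix seconds: the function name `session_is_valid` and the sample timestamps are hypothetical and exist only for illustration.

```python
import time
from typing import Optional


def session_is_valid(idle_exp: int, auth_time: int, abs_max: int,
                     now: Optional[int] = None) -> bool:
    """True while `now` is strictly before both the idle and absolute deadlines."""
    if now is None:
        now = int(time.time())
    absolute_deadline = auth_time + abs_max  # never extended by rotation
    return now < min(idle_exp, absolute_deadline)


# Worked example with the default policy (3 days idle, 14 days absolute).
DAY = 86_400
login = 1_700_000_000
# Day 2: inside both windows.
print(session_is_valid(login + 3 * DAY, login, 14 * DAY, now=login + 2 * DAY))
# Day 14 exactly: a rotation on day 13 pushed the idle expiry out to day 16,
# but the absolute deadline has been reached, so the session is over.
print(session_is_valid(login + 16 * DAY, login, 14 * DAY, now=login + 14 * DAY))
```

The second call shows why rotation alone cannot keep a session alive past the absolute cap.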
### 4.2 JWT payload changes
Refresh-token JWT today (`backend/app/core/security.py:36`):
```json
{ "sub": "<user_id>", "type": "refresh", "jti": "<uuid>", "exp": <idle_exp> }
```
New refresh-token JWT:
```json
{
  "sub": "<user_id>",
  "type": "refresh",
  "jti": "<uuid>",
  "exp": <idle_exp>,              // unchanged semantics, now = idle window
  "auth_time": <login_unix_ts>,   // original login (Unix seconds); NOT reset on rotation
  "idle_max": <idle_seconds>,     // captured at login (account policy snapshot, seconds)
  "abs_max": <abs_seconds>        // captured at login (account policy snapshot, seconds)
}
```
**Unit convention (single source of truth):**
| Surface | Unit | Why |
|---|---|---|
| `Settings.SESSION_*_MINUTES`, `accounts.session_*_minutes`, PATCH `/accounts/me/security` request/response, frontend form inputs | **minutes** | Human-readable, matches the column names, what owners actually edit |
| `idle_max`, `abs_max` inside the refresh JWT, `auth_time` | **seconds (Unix)** | Lets `auth_time + abs_max` be direct Unix math against `int(time.time())` with no conversion at check time |
| `idle_expires_at`, `absolute_expires_at` on API responses, `useAuthSessionExpiry` hook | **ISO 8601 UTC strings** | Matches the rest of the API surface (`DateTime(timezone=True)` everywhere) |
`resolve_session_policy(account)` (see §4.4) returns minutes; the `_mint_session_tokens` helper multiplies by 60 once when stamping the JWT. That's the only place the conversion happens.
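To make the unit convention concrete, here is a sketch of how the claims for a fresh login could be assembled. `build_refresh_claims` is a hypothetical stand-in for the claim-building part of `_mint_session_tokens`; signing and persistence are left out.

```python
import time
import uuid
from typing import Optional


def build_refresh_claims(user_id: str, idle_minutes: int, absolute_minutes: int,
                         now: Optional[int] = None) -> dict:
    """Build refresh-token claims for a fresh login (hypothetical helper)."""
    if now is None:
        now = int(time.time())
    idle_max = idle_minutes * 60      # the single minutes -> seconds conversion
    abs_max = absolute_minutes * 60
    return {
        "sub": user_id,
        "type": "refresh",
        "jti": str(uuid.uuid4()),
        "exp": now + idle_max,        # idle window
        "auth_time": now,             # original login; carried forward on rotation
        "idle_max": idle_max,
        "abs_max": abs_max,
    }


# Default policy in minutes (4320 / 20160) becomes 259200 / 1209600 seconds.
claims = build_refresh_claims("user-1", 4320, 20160, now=1_700_000_000)
print(claims["idle_max"], claims["abs_max"])
```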
Why snapshot `idle_max`/`abs_max` into the JWT instead of looking up the account policy on every refresh? Two reasons:
- Refresh path stays DB-cheap (one query, not two).
- If an owner tightens the policy after a user has logged in, the user's existing session continues under the policy in effect at login — fairer UX, matches what Okta and Microsoft do. New logins pick up the tightened policy.
Counter-consideration: if an owner *loosens* policy, existing sessions stay tight until next login. Acceptable; users won't notice. The owner-tightens case (security event) is the one that matters, and the bulk session revocation endpoint (§4.11) covers that scenario by forcing every session in the account to sign in again under the new policy.
### 4.3 Per-account policy storage
New columns on `accounts`:
| Column | Type | Nullable | Meaning |
|---|---|---|---|
| `session_idle_minutes` | `Integer` | yes | NULL = use system default |
| `session_absolute_minutes` | `Integer` | yes | NULL = use system default |
Minutes (not days) so admins can configure shorter windows for high-security tenants if needed. Stored as Integer to match existing pattern; conversion to `timedelta` happens at use site.
System-imposed bounds (in `Settings`, environment-overridable):
| Setting | Value | Role |
|---|---|---|
| `SESSION_IDLE_MINUTES_DEFAULT` | 4320 (3d) | system default for the idle window |
| `SESSION_ABSOLUTE_MINUTES_DEFAULT` | 20160 (14d) | system default for the absolute window |
| `SESSION_IDLE_MINUTES_MIN` | 15 | hard floor; account override cannot go below |
| `SESSION_IDLE_MINUTES_MAX` | 43200 (30d) | hard ceiling; account override cannot go above |
| `SESSION_ABSOLUTE_MINUTES_MIN` | 60 (1h) | hard floor; account override cannot go below |
| `SESSION_ABSOLUTE_MINUTES_MAX` | 129600 (90d) | hard ceiling; account override cannot go above |
Plus invariant: an account's *effective* idle window must not exceed its *effective* absolute window. Enforcement is layered:
- **App-level (PATCH endpoint, authoritative):** before writing the row, resolve both effective values (`override ?? system_default`) and reject when effective idle > effective absolute. This is the only place that knows the current system defaults, so it's the only place that can catch a partial-override hole like `session_idle_minutes=43200, session_absolute_minutes=NULL` when the system absolute default is 20160.
- **DB CHECK constraint (defense in depth, narrower):** `session_idle_minutes IS NULL OR session_absolute_minutes IS NULL OR session_idle_minutes <= session_absolute_minutes`. This only catches the both-set case; the partial-override case is intentionally outside the DB's reach because the DB can't see `Settings`. Document this in a comment on the constraint.
Alternative considered: require both columns to be NULL or both set (XOR-with-NULL). Rejected because it forces an owner who only wants to override idle to also re-declare the absolute window, which leaks the system default into account data and makes the system default harder to evolve later.
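The app-level check from this section can be sketched as follows. The module-level constants are hypothetical stand-ins for the `Settings` values in the bounds table, and `validate_policy` is an illustrative name, not an existing function.

```python
from typing import Optional

# Hypothetical stand-ins for the Settings values listed in the bounds table.
IDLE_DEFAULT, ABS_DEFAULT = 4320, 20160
IDLE_MIN, IDLE_MAX = 15, 43200
ABS_MIN, ABS_MAX = 60, 129600


def validate_policy(idle: Optional[int], absolute: Optional[int]) -> list[str]:
    """Return validation errors; an empty list means the PATCH is acceptable."""
    errors = []
    if idle is not None and not (IDLE_MIN <= idle <= IDLE_MAX):
        errors.append("session_idle_minutes out of bounds")
    if absolute is not None and not (ABS_MIN <= absolute <= ABS_MAX):
        errors.append("session_absolute_minutes out of bounds")
    # Resolve effective values (override, else system default) before comparing,
    # so a partial override is checked against the current defaults.
    eff_idle = idle if idle is not None else IDLE_DEFAULT
    eff_abs = absolute if absolute is not None else ABS_DEFAULT
    if eff_idle > eff_abs:
        errors.append("effective idle window exceeds effective absolute window")
    return errors


print(validate_policy(60, 240))      # both set, consistent
print(validate_policy(43200, None))  # partial override the DB constraint cannot see
```

In the endpoint, a non-empty list would translate into the 422 response described in the test plan.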
### 4.4 Resolution function
```python
# backend/app/core/security.py
def resolve_session_policy(account: Account) -> tuple[int, int]:
    """Return (idle_minutes, absolute_minutes) for an account, applying defaults."""
    # A NULL column means "use the system default" (see §4.3).
    idle = account.session_idle_minutes or settings.SESSION_IDLE_MINUTES_DEFAULT
    abs_ = account.session_absolute_minutes or settings.SESSION_ABSOLUTE_MINUTES_DEFAULT
    return idle, abs_
```
Called once at each of the four token-issuing entry points listed in §4.6 (`/auth/login`, `/auth/login/json`, `/auth/google/callback`, `/auth/microsoft/callback`) and snapshotted into the JWT via `_mint_session_tokens`. Not called on `/auth/refresh` — that path carries forward the existing snapshot.
### 4.5 Refresh endpoint changes
`POST /auth/refresh` (`backend/app/api/endpoints/auth.py:377`) currently:
1. Decodes refresh JWT (via `get_refresh_token_payload` dep).
2. Atomically revokes old JTI (`UPDATE … SET revoked_at=now() WHERE token_hash=? AND revoked_at IS NULL RETURNING …`).
3. Mints new refresh + access tokens with same `sub`.
New algorithm (precise):
1. Decode refresh JWT (idle expiry already surfaced as `session_expired_idle` by `decode_refresh_token_strict`; see §4.10).
2. **NEW:** load `user` and `user.account` by `sub` from the decoded payload. Needed before any legacy-token handling because the grandfather path needs to read the account's current policy. If the user is missing or inactive, return 401 with `detail="invalid_refresh_token"` (existing behavior, unchanged).
3. **NEW (grandfather path):** if `auth_time` is missing from the payload (legacy token issued before this PR), treat it as `now()` and snapshot the loaded account's current policy via `resolve_session_policy(account)` into `idle_max`/`abs_max`. One free rotation under the new policy.
4. **NEW:** compute `absolute_deadline = auth_time + abs_max` (both in Unix seconds). Compare with `now >= absolute_deadline`, not `>` — a token whose deadline equals `now()` is expired, not valid.
5. **Atomically revoke the JTI regardless of outcome** (single UPDATE, same statement as today). This consumes the token whether or not the absolute check passes — so an absolute-expired token cannot be replayed forever; a second attempt finds the row already `revoked_at IS NOT NULL` and falls through to the existing "invalid or revoked refresh token" 401.
6. If the atomic UPDATE matched zero rows (already revoked): 401 with `detail="invalid_refresh_token"`.
7. If `now >= absolute_deadline`: 401 with `detail="session_expired_absolute"`. (The row is already revoked from step 5.)
8. Otherwise mint new tokens, **carrying forward `auth_time`, `idle_max`, `abs_max` unchanged** from the old token (or freshly snapshotted if grandfathered in step 3).
Helper contract: `_refresh_session_tokens(payload, user, account, db) -> Token`. Takes the validated decoded payload plus the already-loaded user/account so it doesn't re-query. Returns the same `Token` shape as `_mint_session_tokens` (with the two new ISO expiry fields). Distinct from `_mint_session_tokens` because the refresh path carries claims forward instead of resolving policy.
Idle expiry is handled earlier in the chain. Today `get_refresh_token_payload` calls `decode_token`, which returns `None` for any JWT past `exp` and produces the existing generic 401. §4.10 describes how that path changes so idle expiry is reported as `session_expired_idle` rather than as a generic invalid-token error.
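Steps 3 through 8 of the new algorithm can be sketched in memory as below. This is an illustration under stated assumptions, not the endpoint code: a Python `set` of live JTIs stands in for the `refresh_tokens` table and its atomic `UPDATE`, step 2 (loading the user and account) is reduced to a `policy_minutes` argument, and `RefreshError` and `refresh` are hypothetical names.

```python
import time
import uuid
from typing import Optional


class RefreshError(Exception):
    """Carries the `detail` string returned with the 401."""


def refresh(payload: dict, active_jtis: set, policy_minutes: tuple,
            now: Optional[int] = None) -> dict:
    """Rotate a refresh token following steps 3-8 (in-memory sketch)."""
    if now is None:
        now = int(time.time())
    # Step 3: grandfather legacy tokens that predate the new claims.
    if "auth_time" not in payload:
        idle_min, abs_min = policy_minutes
        auth_time, idle_max, abs_max = now, idle_min * 60, abs_min * 60
    else:
        auth_time = payload["auth_time"]
        idle_max, abs_max = payload["idle_max"], payload["abs_max"]
    # Step 4: absolute deadline in Unix seconds.
    absolute_deadline = auth_time + abs_max
    # Steps 5-6: consume the JTI regardless of outcome. The real code does
    # this with a single atomic UPDATE; a set lookup stands in for it here.
    if payload["jti"] not in active_jtis:
        raise RefreshError("invalid_refresh_token")
    active_jtis.remove(payload["jti"])
    # Step 7: `>=`, so a deadline equal to `now` counts as expired.
    if now >= absolute_deadline:
        raise RefreshError("session_expired_absolute")
    # Step 8: mint the successor, carrying the snapshot forward unchanged.
    new_jti = str(uuid.uuid4())
    active_jtis.add(new_jti)
    return {"sub": payload["sub"], "type": "refresh", "jti": new_jti,
            "exp": now + idle_max, "auth_time": auth_time,
            "idle_max": idle_max, "abs_max": abs_max}
```

Because the JTI is removed before the absolute check, a second attempt with an absolute-expired token fails with `invalid_refresh_token`, which is the replay behavior step 5 calls for.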
### 4.6 Login endpoints
Token-issuing endpoints that need the snapshot logic (verified against the codebase):
| Endpoint | File:line | Response model |
|---|---|---|
| `POST /auth/login` (form-encoded, OAuth2PasswordRequestForm) | `backend/app/api/endpoints/auth.py:303` | `Token` |
| `POST /auth/login/json` (JSON body — what the frontend actually calls) | `backend/app/api/endpoints/auth.py:342` | `Token` |
| `POST /auth/google/callback` | `backend/app/api/endpoints/oauth.py:174` | `OAuthCallbackResponse` |
| `POST /auth/microsoft/callback` | `backend/app/api/endpoints/oauth.py:204` | `OAuthCallbackResponse` |
| `POST /auth/refresh` | `backend/app/api/endpoints/auth.py:377` | `Token` |
`POST /auth/register` (`auth.py:92`) returns `UserResponse` and **does not auto-login** — the frontend follows up with a separate call to `/auth/login/json`. No token-minting changes needed in `/register` itself; the subsequent `/login/json` call will pick up the new claims naturally.
Each of the four token-issuing endpoints (login, login/json, both OAuth callbacks) calls `create_refresh_token` with the extra claims. Wrap in a helper `_mint_session_tokens(user, account, db) -> Token` (or `OAuthCallbackResponse`; see §4.8 on shared response fields) to avoid drift across four sites. `/auth/refresh` uses a variant that carries forward existing claims instead of re-snapshotting policy.
### 4.7 Account security endpoint
New endpoint module: `backend/app/api/endpoints/account_security.py`
```
GET   /accounts/me/security  → returns {
          idle_minutes, absolute_minutes,
          effective_idle_minutes, effective_absolute_minutes,
          system_min/max bounds,
          active_users: [{user_id, name, email, last_login_at}, ...]
      }
PATCH /accounts/me/security  → owner only; validates bounds + invariant; writes account row
```
`require_account_owner` from `app/api/deps.py:189` enforces ownership. Returns the *effective* values (after defaults applied) so the frontend doesn't have to know about NULL semantics.
**`active_users` field** (added during plan-design-review pass on 2026-05-13): the GET response includes a list of users with at least one un-revoked refresh token in this account. Query: `SELECT DISTINCT u.id, u.email, u.name, u.last_login FROM users u JOIN refresh_tokens rt ON rt.user_id = u.id WHERE u.account_id = :acct AND rt.revoked_at IS NULL`. The frontend uses this to render the "Active sessions" section with names + relative last-login timestamps (see §4.8) rather than a faceless count. Caveat: `last_login` updates only at login, not on refresh — so the relative timestamp is honest about "when they signed in," not "last touched the app." Per-refresh activity needs the deferred `refresh_tokens.last_used_at` follow-up (§9).
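The query can be exercised against an in-memory SQLite database as a quick check of the `DISTINCT` and revocation semantics. The two-table schema below is a simplified, hypothetical stand-in for the real models, and the `ORDER BY u.last_login DESC` clause is an assumption added so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER,
                        name TEXT, email TEXT, last_login TEXT);
    CREATE TABLE refresh_tokens (id INTEGER PRIMARY KEY, user_id INTEGER,
                                 revoked_at TEXT);
    INSERT INTO users VALUES
        (1, 10, 'Ada', 'ada@example.com', '2026-05-12T09:00:00+00:00'),
        (2, 10, 'Bo',  'bo@example.com',  '2026-05-10T09:00:00+00:00'),
        (3, 10, 'Cy',  'cy@example.com',  '2026-05-01T09:00:00+00:00'),
        (4, 99, 'Dee', 'dee@example.com', '2026-05-13T09:00:00+00:00');
    INSERT INTO refresh_tokens (user_id, revoked_at) VALUES
        (1, NULL), (1, NULL),
        (2, NULL),
        (3, '2026-05-02T00:00:00+00:00'),
        (4, NULL);
""")

ACTIVE_USERS_SQL = """
    SELECT DISTINCT u.id, u.name, u.email, u.last_login
    FROM users u
    JOIN refresh_tokens rt ON rt.user_id = u.id
    WHERE u.account_id = :acct AND rt.revoked_at IS NULL
    ORDER BY u.last_login DESC
"""


def active_users(conn, account_id):
    """Return one dict per user with at least one un-revoked refresh token."""
    rows = conn.execute(ACTIVE_USERS_SQL, {"acct": account_id}).fetchall()
    return [{"user_id": r[0], "name": r[1], "email": r[2], "last_login_at": r[3]}
            for r in rows]


# Ada has two live tokens but appears once; Cy only has a revoked token.
print([u["name"] for u in active_users(conn, 10)])
```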
### 4.8 Frontend changes
**Response-field naming (single scheme, used everywhere):**
Both `Token` (`/auth/login`, `/auth/login/json`, `/auth/refresh`) and `OAuthCallbackResponse` (`/auth/google/callback`, `/auth/microsoft/callback`) gain two new fields:
| Field | Type | Source |
|---|---|---|
| `idle_expires_at` | ISO 8601 UTC string | derived from refresh JWT `exp` |
| `absolute_expires_at` | ISO 8601 UTC string | derived from refresh JWT `auth_time + abs_max` |
ISO strings (not Unix ints) for consistency with the rest of the API surface, which uses `DateTime(timezone=True)` everywhere. Frontend parses with `new Date(...)`.
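On the backend, deriving the two fields from the refresh-token claims could look like the sketch below. `expiry_fields` is a hypothetical helper name; the claim names follow §4.2.

```python
from datetime import datetime, timezone


def expiry_fields(claims: dict) -> dict:
    """Derive the two ISO 8601 UTC response fields from refresh-token claims."""
    idle = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    absolute = datetime.fromtimestamp(claims["auth_time"] + claims["abs_max"],
                                      tz=timezone.utc)
    return {"idle_expires_at": idle.isoformat(),
            "absolute_expires_at": absolute.isoformat()}


# Login at 1_700_000_000 with the default 3d idle / 14d absolute policy.
fields = expiry_fields({"exp": 1_700_259_200, "auth_time": 1_700_000_000,
                        "abs_max": 1_209_600})
print(fields["idle_expires_at"])
print(fields["absolute_expires_at"])
```

Passing `tz=timezone.utc` keeps the output offset-aware, so the strings carry an explicit `+00:00` suffix rather than depending on the server's local time zone.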
**New hook:** `frontend/src/hooks/useAuthSessionExpiry.ts`
- Reads `idleExpiresAt` and `absoluteExpiresAt` from `authStore`.
- Returns `{ idleExpiresAt, absoluteExpiresAt, warning, reason }` where `warning ∈ {"none", "soon", "now"}` and `reason ∈ {"idle", "absolute"}` indicating which window is closer.
- "soon" fires at T-5min on whichever window comes first.
- Pairs with a top-of-app `<SessionExpiryToast />` mounted in `AppLayout.tsx`.
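
The hook itself is TypeScript; the sketch below expresses only its decision logic in Python, to stay in the same language as the backend examples. Two points are assumptions rather than decisions from this plan: `"now"` is taken to mean the nearer deadline has already passed, and a tie between the two deadlines is reported as `"absolute"`. `session_warning` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WARN_BEFORE = timedelta(minutes=5)  # "soon" fires at T-5min


def session_warning(idle_expires_at: Optional[str],
                    absolute_expires_at: Optional[str],
                    now: datetime) -> tuple:
    """Return (warning, reason) for the nearer of the two deadlines."""
    deadlines = []
    if idle_expires_at is not None:
        deadlines.append((datetime.fromisoformat(idle_expires_at), "idle"))
    if absolute_expires_at is not None:
        deadlines.append((datetime.fromisoformat(absolute_expires_at), "absolute"))
    if not deadlines:
        return "none", None  # unknown expiry (legacy state): don't warn yet
    deadline, reason = min(deadlines)  # nearer window wins
    remaining = deadline - now
    if remaining <= timedelta(0):
        return "now", reason   # assumption: "now" means the deadline has passed
    if remaining <= WARN_BEFORE:
        return "soon", reason
    return "none", reason


now = datetime(2026, 5, 13, 12, 0, tzinfo=timezone.utc)
print(session_warning("2026-05-13T12:04:00+00:00", "2026-05-20T00:00:00+00:00", now))
```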
**SessionExpiryToast — differentiated by `reason`** (locked during plan-design-review):
- **`reason === "idle"`** (idle window is closer): warning-amber tone. Copy: *"Your session times out in 5 minutes."* Action button: `[Stay signed in]` → triggers a manual `/auth/refresh` call (resets the idle window). On success, toast dismisses + the store updates `idleExpiresAt`. On failure (e.g. absolute cap is also nearby and the refresh hits `session_expired_absolute`), fall through to the standard 401-handling redirect.
- **`reason === "absolute"`** (absolute window is closer): info-cyan tone (matching the `?reason=session_expired` banner). Copy: *"Your session ends at HH:MM for security. You'll need to sign in again."* No action button — nothing the user can do extends an absolute cap. Optional secondary action: `[Sign in now]` link to `/login` for users who want to re-auth proactively.
- Toast does not auto-dismiss (persists until acted on or window expires).
- Re-fires only after a successful `/auth/refresh` extends the idle window past T-5min and we cross back into "soon" later. Does not nag.
**Modified:** `frontend/src/api/client.ts` interceptor
- On 401 with `detail="session_expired_absolute"` **or** `detail="session_expired_idle"`: **skip the refresh attempt**, flush tokens, redirect to `/login?reason=session_expired`. (Both surfaces go through the same banner — users don't need to distinguish the two.)
- On 401 with `detail="invalid_refresh_token"` or any other detail: current behavior (drop to `/login` without the reason banner).
- Existing access-token-expired flow (transparent `/auth/refresh`) unchanged.
**Modified:** `frontend/src/store/authStore.ts`
- `setTokens(token: Token)` (`authStore.ts:140`) is the single token-persistence path used by both `login()` and the OAuth flow. Extend the `Token` type with `idle_expires_at` + `absolute_expires_at`; `setTokens` writes them to store + localStorage alongside the access/refresh tokens. No new action.
- The Axios refresh interceptor (`api/client.ts:139`) destructures `access_token, refresh_token` today — extend to read the two new fields and call `setTokens` so refreshed sessions update their expiry metadata.
- **Legacy-state migration:** on store rehydrate, if tokens exist but `idle_expires_at` / `absolute_expires_at` are missing from localStorage, leave them `null` and let the next `/auth/refresh` populate them via response fields. The hook treats `null` as "unknown — don't warn yet." No forced logout for pre-deploy localStorage.
**Modified:** `frontend/src/pages/OAuthCallbackPage.tsx`
- The `setTokens({...})` call at `OAuthCallbackPage.tsx:102` currently passes `{access_token, refresh_token, token_type}` from the `OAuthCallbackResponse`. Add `idle_expires_at` and `absolute_expires_at` to the spread so OAuth-issued sessions get the same expiry metadata as password logins.
**New page:** `frontend/src/pages/account/AccountSecuritySettingsPage.tsx`
- Lives under existing `/account` routing with `requireRoleOwner` style guard. Card lives in `AccountSettingsPage.tsx` grid alongside Branding / Chat Retention; **hidden entirely for non-owners** (matches existing role-conditional rendering at `AccountSettingsPage.tsx:597-651`).
- Page shell matches `ChatRetentionSettingsPage.tsx`: `max-w-2xl mx-auto py-8 px-6`, header row with Lucide icon + Bricolage 22px page title, `card-flat rounded-2xl p-6 space-y-6` body.
- **Vertical order (top → bottom):**
1. Page header (Lucide `Shield` icon + "Session Security")
2. One-line intro paragraph (`text-muted-foreground`): *"Control how long sessions can last before users must sign in again."*
3. **Session policy** card: three radios (Strict / Standard / Custom) with effective minute values visible per option ("Strict — 3d idle, 14d absolute"), then two numeric inputs (Idle minutes, Absolute minutes). **Inputs are always visible; disabled when a preset is selected.** Below inputs: hint text showing the system min/max from the GET response. Save button (primary) + inline `text-emerald-400 "Settings saved"` success ping for 3s after save (matching `ChatRetentionSettingsPage.tsx:112-114`).
4. Info line directly below Save: *"New policy applies the next time each person signs in. Use **Active sessions** below to force it immediately."* (`text-muted-foreground`, bold on "Active sessions" — anchor link or just visual emphasis).
5. Visual divider (1px `border-default`).
6. **Active sessions** section (see below for details).
- **Initial GET loading state:** centered `Loader2 animate-spin` page-body, matching `ChatRetentionSettingsPage.tsx:46-51`.
- **Inline validation** on Custom inputs: debounced 300ms; red border (`border-danger`) + small error text below field; Save button disabled when any field is invalid. Server-side 422 from PATCH surfaces via the existing axios interceptor toast.
**Active sessions section (within the same page):**
- GET response includes `active_users: [{user_id, name, email, last_login_at}, ...]` — backend addition; see §4.7.
- Section header: "Active sessions"
- Subhead: "N people are signed in to this account." (singular: "Only you are signed in.")
- Active-users list: one row per active user — `name (email) · logged in 2d ago` (relative time from `last_login_at`). Caller's own row marked with a small "(you)" tag.
- Buttons below the list — count-aware:
- **count > 1:** Two ghost buttons side-by-side — `[Sign out everyone except me]` and `[Sign me out and everyone else]` (the latter uses `text-danger` color to telegraph the self-impact).
- **count = 1 (solo owner):** Hide the "except me" button (it would revoke 0 — confusing). Show only `[Sign me out everywhere]` (still useful — signs the owner out from their other devices).
**Bulk-revoke confirmation modal** (via `components/common/Modal.tsx`):
- **scope=others:** title *"Sign out other users?"* · body *"This signs out the N other active users in your account. They'll need to sign in again. You stay signed in."* · buttons `[Cancel]` (ghost) + `[Sign out N users]` (`text-danger`).
- **scope=all:** title *"Sign out everyone?"* · body *"This signs out all N active users including yourself. Everyone will need to sign in again."* · buttons `[Cancel]` (ghost) + `[Sign out everyone]` (`text-danger`).
- After success: modal closes, `toast.success("Signed out N sessions")`. For scope=all: 1.5s delay → `useAuthStore.getState().logout()` + `window.location = '/login'` (no banner — they just did this, they know why they're here).
**Modified:** `AccountSettingsPage.tsx`
- Add a "Session Security" link card to the existing grid (owner-only visibility, alongside Branding / Chat Retention). Lucide `Shield` icon.
**New login page banner:** when `?reason=session_expired` is present, show a small info-tone banner **above the email/password form**:
- Background: `info-dim` (cyan-dim, `rgba(103,232,249,0.10)` dark / `rgba(8,145,178,0.07)` light per DESIGN-SYSTEM.md)
- Text color: `info` text token
- Border: `1px solid info-dim`
- Padding: 12px 16px, `radius-sm` (5px)
- Icon: Lucide `Info` (16px, info color, left-aligned)
- Copy: *"You were signed out for security. Sign back in to continue."*
- Not dismissable — disappears naturally when the user submits the form (the query string clears on navigate).
- Note: this is the first cyan info-tone banner in the app; sets the precedent we'll reuse for future neutral system messages.
The same banner is shown for both idle and absolute expiry. No alarm UI, just clarity; the user doesn't need to learn the distinction.
### 4.9 Migration
`alembic revision -m "add session policy columns to accounts"` (manual, per Lesson 77).
```sql
ALTER TABLE accounts
    ADD COLUMN session_idle_minutes INTEGER,
    ADD COLUMN session_absolute_minutes INTEGER,
    ADD CONSTRAINT session_idle_le_absolute_when_both_set
        CHECK (session_idle_minutes IS NULL
               OR session_absolute_minutes IS NULL
               OR session_idle_minutes <= session_absolute_minutes);
COMMENT ON CONSTRAINT session_idle_le_absolute_when_both_set ON accounts IS
    'Defense in depth: catches idle > absolute when both are overridden. '
    'The partial-override case (one NULL, one set) is validated at the app layer '
    'against current system defaults, since the DB cannot see Settings.';
```
No backfill: NULL is the intended state for "use system default."
Confirm: `accounts` is in the global-tables list per PROJECT_CONTEXT.md, so the migration does **not** add RLS predicates. Verified — `accounts` is explicitly named there.
### 4.10 Error-detail taxonomy
`/auth/refresh` returns 401 with one of these `detail` values, so the frontend can distinguish UX paths:
| `detail` | When | Frontend action |
|---|---|---|
| `session_expired_idle` | refresh JWT past `exp` (idle window elapsed) | flush tokens, redirect `/login?reason=session_expired` |
| `session_expired_absolute` | refresh JWT alive, but `now >= auth_time + abs_max` | flush tokens, redirect `/login?reason=session_expired` |
| `invalid_refresh_token` | JTI not in DB, already revoked, signature bad, type mismatch | flush tokens, redirect `/login` (no banner) |
Implementation note: `decode_token` currently swallows `JWTError` and returns `None`, so idle expiry is indistinguishable from a signature failure at the dep level. Fix by switching `get_refresh_token_payload` (or adding a sibling) to call `jwt.decode` directly and catch `ExpiredSignatureError` separately from generic `JWTError`. Idle-expired tokens raise the former; map that to `session_expired_idle`. All other JWT errors map to `invalid_refresh_token`.
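The mapping can be illustrated without any third-party dependency. In the sketch below, `decode_hs256`, `TokenExpired`, and `TokenInvalid` are hypothetical stdlib-only stand-ins for the JWT library's decode call, its `ExpiredSignatureError`, and its generic `JWTError`; only the final `refresh_error_detail` function reflects the taxonomy in the table. The `now >= exp` boundary follows test 11 in §6.

```python
import base64
import hashlib
import hmac
import json


class TokenExpired(Exception):
    """Stand-in for the JWT library's ExpiredSignatureError."""


class TokenInvalid(Exception):
    """Stand-in for the JWT library's generic JWTError."""


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64url(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_hs256(claims: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url(sig)}"


def decode_hs256(token: str, key: bytes, now: int) -> dict:
    """Verify the signature first, then the expiry, raising distinct errors."""
    try:
        header, body, sig = token.split(".")
        expected = hmac.new(key, f"{header}.{body}".encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _unb64url(sig)):
            raise TokenInvalid("bad signature")
        claims = json.loads(_unb64url(body))
    except ValueError as exc:  # wrong segment count, bad base64, bad JSON
        raise TokenInvalid(str(exc)) from exc
    exp = claims.get("exp") if isinstance(claims, dict) else None
    if not isinstance(exp, int):
        raise TokenInvalid("missing exp")
    if now >= exp:
        raise TokenExpired("token past exp")
    return claims


def refresh_error_detail(token: str, key: bytes, now: int) -> tuple:
    """Return (payload, None) on success or (None, detail) for the 401 body."""
    try:
        payload = decode_hs256(token, key, now)
    except TokenExpired:
        return None, "session_expired_idle"
    except TokenInvalid:
        return None, "invalid_refresh_token"
    if payload.get("type") != "refresh":
        return None, "invalid_refresh_token"  # type mismatch
    return payload, None
```

Checking the signature before the expiry means a forged token that also happens to be expired is still reported as `invalid_refresh_token`, so the expiry banner is only shown for tokens the server actually issued.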
### 4.11 Bulk session revocation (kill-all-sessions)
**Endpoint:** `POST /accounts/me/security/revoke-sessions`, owner-only via `require_account_owner`.
**Request body:**
```json
{ "scope": "all" | "others" }
```
Default `"all"` if body omitted. `"others"` excludes the calling user's own refresh tokens (so the owner stays signed in); `"all"` includes them.
**Response:**
```json
{ "revoked_count": <int> }
```
**Behavior:**
- Single SQL UPDATE: `refresh_tokens.revoked_at = now()` for rows where `user_id IN (SELECT id FROM users WHERE account_id = :caller_account_id)` AND `revoked_at IS NULL`. If `scope="others"`, also AND `user_id != caller.id`.
- All affected users' next `/auth/refresh` matches zero rows in the atomic revoke (§4.5 step 5) → 401 `invalid_refresh_token` → redirect to `/login` (no banner — the user was signed out by an admin, not by expiry; the plain `/login` redirect is honest UX).
- Caller's access token is not revoked (we don't track access JTIs by design); it dies naturally on its 5-minute timer. For `scope="all"`, the frontend handles UX by clearing localStorage and redirecting to `/login` after the response — so the stale access token simply isn't used. Accept the 5-minute window where the caller's access token could in theory still hit endpoints; this matches the existing logout flow and is consistent with the threat model (the action is "kick everyone out," not "instantly invalidate every credential").
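
The single-UPDATE behavior can be sketched against an in-memory SQLite database. The schema is a simplified, hypothetical stand-in for the real tables, and `revoke_sessions` is an illustrative name; the production statement would run through the application's own database layer.

```python
import sqlite3


def revoke_sessions(conn, account_id: int, caller_id: int, scope: str = "all") -> int:
    """Stamp revoked_at on the account's live refresh tokens; return the row count."""
    if scope not in ("all", "others"):
        raise ValueError(f"unknown scope: {scope!r}")
    sql = """
        UPDATE refresh_tokens
        SET revoked_at = CURRENT_TIMESTAMP
        WHERE revoked_at IS NULL
          AND user_id IN (SELECT id FROM users WHERE account_id = :acct)
    """
    params = {"acct": account_id}
    if scope == "others":
        sql += " AND user_id != :caller"  # keep the caller signed in
        params["caller"] = caller_id
    with conn:  # commit on success, roll back on error
        return conn.execute(sql, params).rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE refresh_tokens (id INTEGER PRIMARY KEY, user_id INTEGER,
                                 revoked_at TEXT);
    INSERT INTO users VALUES (1, 10), (2, 10), (3, 99);
    INSERT INTO refresh_tokens (user_id) VALUES (1), (2), (2), (3);
""")
first = revoke_sessions(conn, 10, caller_id=1, scope="others")   # other user's two tokens
second = revoke_sessions(conn, 10, caller_id=1, scope="all")     # caller's remaining token
third = revoke_sessions(conn, 10, caller_id=1, scope="all")      # nothing left to revoke
print(first, second, third)
```

The `revoked_at IS NULL` predicate is what makes a repeated call return zero instead of re-stamping rows, and the account sub-select keeps the token belonging to the other account untouched.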
**Audit:** writes one `account.sessions_revoked_bulk` event with `{actor_user_id, account_id, scope, revoked_count}`.
**Out of scope:** distinguishing `session_revoked_by_admin` from `invalid_refresh_token` on the wire for affected users. Doing so requires tracking the revocation reason per `refresh_tokens` row (new column). Not worth the complexity right now — the affected user just sees they're logged out, same as if they'd been logged out for any other reason. Revisit if pilots ask for it.
**Why not also per-user-device revoke?** Refresh tokens today don't carry device/user-agent metadata; the unit of granularity is "all of user X's active sessions" (which is most of what people want anyway — e.g., I lost my laptop). The endpoint is account-scoped because that's the owner-control story we're shipping. Per-user device list is a follow-up if/when needed (§9).
## 5. Backward compatibility
### 5.1 Existing refresh tokens (no `auth_time` claim)
On first `/auth/refresh` after deploy:
- Backend detects missing `auth_time`, treats current time as `auth_time`, snapshots current account policy.
- User effectively gets one free 14-day absolute window starting at first post-deploy refresh.
Trade-off vs forcing universal re-login on deploy:
- ✅ Zero deploy-day support burden (no pilots flood Slack with "I got logged out").
- ❌ Users with active sessions see no enforcement for up to 14 days.
Given the user base is small (pilot phase) and the bigger goal is *new* signups have a secure default, the friendly path wins.
### 5.2 If we ever need to invalidate everyone
`SECRET_KEY` rotation kills all existing tokens. Documented in `DEV-ENV.md` but not part of this PR.
## 6. Test plan
Backend (`backend/tests/test_session_policy.py` — new file, unless noted):
1. **Default policy applied** — login without account override → JWT has `idle_max=259200`, `abs_max=1209600` (seconds; 3d/14d). Account/settings columns are minutes (4320/20160); the helper multiplies by 60 when stamping.
2. **Account override honored** — owner PATCHes `session_idle_minutes=60`, `session_absolute_minutes=240` → next login JWT has `idle_max=3600`, `abs_max=14400` (seconds).
3. **Override bounds enforced** — PATCH idle below `SESSION_IDLE_MINUTES_MIN` → 422; PATCH absolute above `SESSION_ABSOLUTE_MINUTES_MAX` → 422.
4. **Invariant enforced (both-set)** — PATCH idle=300, absolute=120 → 422.
5. **Invariant enforced (partial override)** — system default absolute=20160; PATCH idle=43200 with absolute=NULL → 422 (effective idle > effective absolute, app-layer check).
6. **DB constraint catches both-set inversion** — direct SQL `UPDATE accounts SET session_idle_minutes=300, session_absolute_minutes=120` rolls back with `CheckViolation`.
7. **Non-owner cannot PATCH** — engineer/viewer get 403.
8. **Refresh respects absolute cap (boundary)** — set `auth_time = now - abs_max` exactly → refresh 401 with `session_expired_absolute` (deadline check is `>=`, not `>`).
9. **Absolute-expired token is consumed** — attempt #1 returns `session_expired_absolute`; attempt #2 with the same token returns `invalid_refresh_token` (row was revoked atomically in #1, cannot be replayed).
10. **Refresh extends idle but not absolute** — rotate twice within `abs_max`; both succeed; `auth_time` unchanged across rotations.
11. **Idle expiry (boundary)** — set refresh `exp = now` → 401 with `session_expired_idle` (not generic `invalid_refresh_token`).
12. **Grandfather path** — legacy refresh token without `auth_time`/`idle_max`/`abs_max` → one successful rotation; new JWT has all three claims, `auth_time≈now()`.
13. **Tightening after login doesn't affect existing sessions** — login under policy A, owner tightens to policy B, refresh succeeds under A's snapshot.
14. **`/auth/login/json` carries new claims and response fields** — JWT decode shows `auth_time`/`idle_max`/`abs_max`; response body has `idle_expires_at` + `absolute_expires_at` as ISO strings.
15. **OAuth callback responses include expiry fields** — `/auth/google/callback` and `/auth/microsoft/callback` `OAuthCallbackResponse` bodies have both `idle_expires_at` and `absolute_expires_at`. Mock the Google/Microsoft token-exchange step; assert on the final response shape.
16. **Policy update writes audit row** — PATCH `/accounts/me/security` emits one `account.session_policy_update` audit event with `actor_user_id`, `account_id`, and a payload of `{old: {...}, new: {...}, effective_old: {...}, effective_new: {...}}`. Verify via the existing audit-log query in `core/audit.py`.
17. **Bulk revoke scope=all** — seed three active refresh tokens for two users in the account (caller + one other). POST `/accounts/me/security/revoke-sessions` with `{"scope": "all"}` → `revoked_count=3`; caller's own refresh token is now revoked too. Their next `/auth/refresh` → 401 `invalid_refresh_token`.
18. **Bulk revoke scope=others** — same seed. POST with `{"scope": "others"}` → `revoked_count=2` (caller's token survives). Caller's `/auth/refresh` still succeeds; the other user's `/auth/refresh` → 401 `invalid_refresh_token`.
19. **Bulk revoke is account-scoped** — seed tokens for users in account A and account B. Owner of A POSTs revoke → `revoked_count` reflects only A's tokens; B's tokens remain active.
20. **Bulk revoke is owner-only** — engineer/viewer POST → 403; super_admin POST against `/me` works only if they own an account (the endpoint is `/me`, not `/{account_id}`).
21. **Bulk revoke writes audit row** — `account.sessions_revoked_bulk` with `{actor_user_id, account_id, scope, revoked_count}`.
22. **Bulk revoke is idempotent** — second immediate POST returns `revoked_count=0` (no already-revoked rows are double-stamped).
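The ordering that tests #8 through #13 pin down can be sketched as a pure function over already-decoded claims. This is a minimal illustration, not the endpoint's code: `evaluate_refresh`, `RefreshOutcome`, and the in-memory `revoked` set are hypothetical stand-ins for the real refresh handler and the `refresh_tokens` rows, and the relative order of the idle check and the revocation check is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class RefreshOutcome:
    status: int                # HTTP status the endpoint would return
    detail: Optional[str]      # error detail on failure, None on success
    auth_time: Optional[int]   # auth_time stamped into the new token on success


def evaluate_refresh(token_id: str, claims: dict, now: int, revoked: Set[str]) -> RefreshOutcome:
    """Hypothetical decision logic for one refresh attempt (all times in Unix seconds)."""
    # Idle expiry: the token's own exp has been reached (inclusive boundary, test #11).
    if now >= claims["exp"]:
        return RefreshOutcome(401, "session_expired_idle", None)

    # A token that was already consumed cannot be replayed (test #9, attempt #2).
    if token_id in revoked:
        return RefreshOutcome(401, "invalid_refresh_token", None)

    # Consume the token BEFORE the absolute check, so an absolute-expired
    # token is revoked in the same step that rejects it.
    revoked.add(token_id)

    # Grandfather path: a legacy token without the new claims gets one
    # successful rotation, and the session clock starts now (test #12).
    if not all(key in claims for key in ("auth_time", "idle_max", "abs_max")):
        return RefreshOutcome(200, None, now)

    # Absolute cap: the deadline check is >=, not > (test #8).
    if now >= claims["auth_time"] + claims["abs_max"]:
        return RefreshOutcome(401, "session_expired_absolute", None)

    # Normal rotation: idle window is extended elsewhere, auth_time is carried over (test #10).
    return RefreshOutcome(200, None, claims["auth_time"])
```

Calling it twice with the same `token_id` at exactly `auth_time + abs_max` yields `session_expired_absolute` and then `invalid_refresh_token`, which is the replay behavior test #9 asserts.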
Frontend (`frontend/src/__tests__/` or colocated `*.test.tsx`):
- `useAuthSessionExpiry` returns `"soon"` within 5min of whichever of `idleExpiresAt`/`absoluteExpiresAt` comes first; `reason` field indicates which.
- Axios interceptor on 401 with `session_expired_absolute` redirects to `/login?reason=session_expired` instead of attempting refresh.
- Axios interceptor on 401 with `session_expired_idle` does the same.
- Axios interceptor on 401 with `invalid_refresh_token` redirects to `/login` *without* the reason banner.
- `authStore` rehydrate handles legacy localStorage shape (no `idleExpiresAt`/`absoluteExpiresAt`) without throwing or forced logout; hook treats `null` as "no warning."
Manual:
- Log in as `owner@`, set **Custom (idle=60 min, absolute=240 min)** under Account → Session Security, log out, log in as `engineer@` (same account), decode the refresh JWT in localStorage, confirm `idle_max=3600` and `abs_max=14400` (seconds — the configured minutes × 60).
- Confirm the existing `useSessionTimer` (troubleshooting-flow timer) is unaffected by the new hook.
- Pre-deploy localStorage path: install build, log in to capture token, deploy session-policy build, refresh page — confirm no forced logout and that the next `/auth/refresh` populates the new fields.
## 7. Rollout
1. Land migration + backend changes behind no flag (the absolute cap is the whole point — flagging it defeats the purpose).
2. Default policy is Strict (3d/14d) for new accounts. Existing pilot accounts get NULL → defaults; user can manually loosen any pilot account via the new endpoint or direct SQL if friction emerges.
3. After deploy, watch Sentry for spikes in `session_expired_absolute` 401s (expected: tiny — only legacy tokens approaching the 14-day mark hit this) and for unexpected refresh failures.
4. Announce in pilot Slack: "We added session expiration. You'll be asked to log in again every 2 weeks max. Account owners can adjust under Account → Session Security."
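The NULL-falls-back-to-defaults behavior in step 2, together with the minutes-to-seconds conversion and the validation rules from §6 tests #1 through #5, can be sketched as follows. This is a rough sketch under stated assumptions: the function names are hypothetical (the real helper is `resolve_session_policy(account)`), the bound values are the ones still awaiting sign-off in the review checklist, and applying the 15-minute floor to the absolute window and allowing idle equal to absolute are guesses.

```python
from typing import Optional, Tuple

# Defaults from the plan, in minutes (3d idle / 14d absolute).
SESSION_IDLE_MINUTES_DEFAULT = 4320
SESSION_ABSOLUTE_MINUTES_DEFAULT = 20160

# Bounds as listed in the review checklist (15min floor, 30d idle ceiling,
# 90d absolute ceiling). Using the same floor for absolute is an assumption.
SESSION_IDLE_MINUTES_MIN = 15
SESSION_IDLE_MINUTES_MAX = 30 * 24 * 60
SESSION_ABSOLUTE_MINUTES_MIN = 15
SESSION_ABSOLUTE_MINUTES_MAX = 90 * 24 * 60


class PolicyValidationError(ValueError):
    """Stands in for the 422 the PATCH endpoint would return."""


def effective_policy_minutes(idle: Optional[int], absolute: Optional[int]) -> Tuple[int, int]:
    """NULL account columns fall back to the system defaults."""
    return (
        idle if idle is not None else SESSION_IDLE_MINUTES_DEFAULT,
        absolute if absolute is not None else SESSION_ABSOLUTE_MINUTES_DEFAULT,
    )


def validate_override(idle: Optional[int], absolute: Optional[int]) -> None:
    """Bounds apply to supplied values; the invariant applies to effective values."""
    if idle is not None and not SESSION_IDLE_MINUTES_MIN <= idle <= SESSION_IDLE_MINUTES_MAX:
        raise PolicyValidationError("idle out of bounds")
    if absolute is not None and not SESSION_ABSOLUTE_MINUTES_MIN <= absolute <= SESSION_ABSOLUTE_MINUTES_MAX:
        raise PolicyValidationError("absolute out of bounds")
    eff_idle, eff_absolute = effective_policy_minutes(idle, absolute)
    if eff_idle > eff_absolute:  # covers the partial-override case (test #5)
        raise PolicyValidationError("effective idle exceeds effective absolute")


def policy_claims(idle: Optional[int], absolute: Optional[int]) -> dict:
    """Claims are stamped in seconds: configured minutes multiplied by 60."""
    eff_idle, eff_absolute = effective_policy_minutes(idle, absolute)
    return {"idle_max": eff_idle * 60, "abs_max": eff_absolute * 60}
```

With no override, `policy_claims(None, None)` gives `idle_max=259200` and `abs_max=1209600`; with the 60/240 override from test #2 it gives `3600` and `14400`.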
## 8. Files touched
### Backend
- `backend/app/core/config.py` — new `SESSION_*` settings (defaults + min/max bounds).
- `backend/app/core/security.py` — `create_refresh_token` signature change (accepts `auth_time`/`idle_max`/`abs_max`), `resolve_session_policy(account)` helper, `decode_refresh_token_strict()` that distinguishes `ExpiredSignatureError` from generic `JWTError`.
- `backend/app/api/deps.py` — update `get_refresh_token_payload` to surface idle-expiry as `session_expired_idle` instead of collapsing into a generic 401.
- `backend/app/api/endpoints/auth.py` — refresh-endpoint logic (atomic-revoke-then-check-absolute), `_mint_session_tokens(user, account, db) -> Token` helper, login + login/json call sites.
- `backend/app/api/endpoints/oauth.py` — both callbacks call `_mint_session_tokens`; `OAuthCallbackResponse` gains the two new fields.
- `backend/app/schemas/token.py` — `Token` (`token.py:5`) adds `idle_expires_at` + `absolute_expires_at` (ISO strings).
- `backend/app/schemas/oauth.py` — `OAuthCallbackResponse` adds the same two fields.
- `backend/app/api/endpoints/account_security.py` — NEW (~130 lines: GET/PATCH for policy + POST `/revoke-sessions`, audit logging for both mutations).
- `backend/app/api/router.py` — register new router.
- `backend/app/models/account.py` — two new columns + DB CHECK constraint.
- `backend/app/schemas/account_security.py` — NEW (request/response: policy GET/PATCH with effective + bounds; `RevokeSessionsRequest` + `RevokeSessionsResponse`).
- `backend/app/core/audit.py` — add `account.session_policy_update` event type (or use the existing generic emitter if it accepts free-form types — verify during impl).
- `backend/alembic/versions/<hash>_session_policy_columns.py` — NEW (manual; per Lesson 77, never `--rev-id`).
- `backend/tests/test_session_policy.py` — NEW.
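To illustrate why the DB CHECK constraint on `accounts` only catches the both-set inversion and leaves the partial-override case to the app layer, here is a small sketch using SQLite in place of the production database. The constraint name, the exact expression, and the choice of `<=` (allowing idle equal to absolute) are assumptions, and SQLite raises `sqlite3.IntegrityError` where the plan's test expects `CheckViolation`.

```python
import sqlite3

# Hypothetical schema: only the two new columns and the constraint matter here.
SCHEMA = """
CREATE TABLE accounts (
    id INTEGER PRIMARY KEY,
    session_idle_minutes INTEGER,
    session_absolute_minutes INTEGER,
    CONSTRAINT ck_session_idle_le_absolute CHECK (
        session_idle_minutes IS NULL
        OR session_absolute_minutes IS NULL
        OR session_idle_minutes <= session_absolute_minutes
    )
)
"""


def make_demo_db() -> sqlite3.Connection:
    """In-memory database with a single account that has no override set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO accounts (id) VALUES (1)")
    conn.commit()
    return conn


def try_update(conn: sqlite3.Connection, idle, absolute) -> bool:
    """Return True if the database accepted the values, False if the CHECK rejected them."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "UPDATE accounts SET session_idle_minutes = ?, "
                "session_absolute_minutes = ? WHERE id = 1",
                (idle, absolute),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch `try_update(conn, 300, 120)` is rejected, while `try_update(conn, 43200, None)` is accepted by the database even though it violates the effective-policy invariant, which is why test #5 relies on the app-layer check.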
### Frontend
- `frontend/src/api/client.ts` — interceptor branches on both `session_expired_idle` and `session_expired_absolute` (same redirect target `/login?reason=session_expired`); also propagates new expiry fields from successful `/auth/refresh` responses into `setTokens`.
- `frontend/src/api/auth.ts` — `Token` type adds the two new ISO fields.
- `frontend/src/store/authStore.ts` — `setTokens` persists the new expiry fields (no new action).
- `frontend/src/pages/OAuthCallbackPage.tsx` — pass `idle_expires_at` + `absolute_expires_at` through `setTokens({...})` at line 102.
- `frontend/src/hooks/useAuthSessionExpiry.ts` — NEW.
- `frontend/src/components/common/SessionExpiryToast.tsx` — NEW.
- `frontend/src/components/layout/AppLayout.tsx` — mount toast.
- `frontend/src/pages/account/AccountSecuritySettingsPage.tsx` — NEW (policy form + Active Sessions section with two revoke buttons + confirmation modal).
- `frontend/src/pages/AccountSettingsPage.tsx` — add link card.
- `frontend/src/router.tsx` — register route.
- `frontend/src/pages/LoginPage.tsx` — `?reason=session_expired` banner.
### Docs
- `.ai/DECISIONS.md` — entry for the 3d/14d default + per-account-override architecture.
- `CURRENT-STATE.md` — add session policy to "auth surface" summary.
Approximately 600 LoC across backend + frontend, plus tests.
## 9. Resolved decisions & follow-ups
Decisions baked into this plan (not open questions):
- **Audit logging is required.** PATCH `/accounts/me/security` writes one `account.session_policy_update` audit event; POST `/revoke-sessions` writes `account.sessions_revoked_bulk`. Security-relevant by definition. Covered in §6 tests #16 and #21 and §8 backend file list.
- **Presets are Strict and Standard only**, plus Custom. No "Loose" preset; owners who want a loose policy can use Custom and own the choice explicitly.
- **Tightening policy mid-session does NOT force-logout existing sessions** — but owners *can* force it via the bulk-revoke endpoint in §4.11. Existing sessions continue under the policy snapshot they were issued under unless explicitly revoked. The Account Security page surfaces this in copy (§4.8).
- **Bulk revoke is account-scoped, two-mode (`all` / `others`).** Per-user device lists are out of scope (§4.11).
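The account-scoped, two-mode bulk revoke can be sketched as a single UPDATE, shown here against an in-memory SQLite database. The table and column names (`users.account_id`, `refresh_tokens.revoked_at`), the `revoke_sessions` function, and the reading of `others` as "every user except the caller" are all assumptions made for illustration, not the actual schema or endpoint code.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Account 1: caller (user 1, one token) and user 2 (two tokens). Account 2: user 3 (one token)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL);
        CREATE TABLE refresh_tokens (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            revoked_at TEXT
        );
        INSERT INTO users (id, account_id) VALUES (1, 1), (2, 1), (3, 2);
        INSERT INTO refresh_tokens (user_id) VALUES (1), (2), (2), (3);
        """
    )
    return conn


def revoke_sessions(conn: sqlite3.Connection, account_id: int, caller_user_id: int,
                    scope: str, now_iso: str) -> int:
    """Revoke active tokens for one account in a single UPDATE; return revoked_count."""
    if scope not in ("all", "others"):
        raise ValueError("scope must be 'all' or 'others'")
    sql = (
        "UPDATE refresh_tokens SET revoked_at = ? "
        "WHERE revoked_at IS NULL "  # already-revoked rows are never re-stamped
        "AND user_id IN (SELECT id FROM users WHERE account_id = ?)"
    )
    params = [now_iso, account_id]
    if scope == "others":
        sql += " AND user_id != ?"  # the caller's own tokens survive
        params.append(caller_user_id)
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(sql, params)
    return cursor.rowcount
```

Filtering on `revoked_at IS NULL` is what makes an immediate second call return `0`, and the subquery on `account_id` keeps another account's tokens untouched.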
Follow-up issues to file after this plan is approved (not blocking this PR):
1. **Super-admin global lock with UI** — today, env-var ceilings cover this. File an issue to expose `SESSION_*_MAX` as a sysadmin-editable setting if/when a customer asks.
2. **Per-user device list + per-device revoke** — refresh tokens would gain `user_agent` + `ip` + `last_used_at` columns; a new "Active devices" page would let users self-revoke individual sessions. File only if a real ask arrives. The account-wide bulk revoke covers the breach-response use case in the meantime.
3. **Per-user (not per-account) policy** — out of scope. File only if a real ask arrives.
## 10. Sequence of commits
1. `feat(auth): add session policy settings + account columns + migration` (settings + model + migration + DB CHECK; no behavior change yet).
2. `feat(auth): distinguish idle expiry from invalid refresh tokens` (`decode_refresh_token_strict`, `session_expired_idle` detail, test #11). Lands the error-detail taxonomy from §4.10 before anything depends on it.
3. `feat(auth): embed auth_time/idle_max/abs_max in refresh tokens` (`security.py` + `_mint_session_tokens` helper called from `/auth/login`, `/auth/login/json`, both OAuth callbacks; `Token` and `OAuthCallbackResponse` gain `idle_expires_at` + `absolute_expires_at`). Refresh still doesn't enforce absolute cap yet.
4. `feat(auth): enforce absolute session cap in /auth/refresh` (atomic-revoke-then-check, `session_expired_absolute` detail, grandfather logic, tests #8–#13).
5. `feat(api): add GET/PATCH /accounts/me/security endpoint` (router, schemas, owner gate, bounds + partial-override invariant validation, audit logging on PATCH).
6. `feat(api): add POST /accounts/me/security/revoke-sessions` (bulk-revoke endpoint with `scope=all|others`, single-UPDATE implementation, audit logging, tests #17–#22).
7. `feat(ui): handle session_expired_{idle,absolute} in axios interceptor + authStore` (new fields persisted, legacy-state migration, redirect to `/login?reason=session_expired`).
8. `feat: AccountSecuritySettingsPage + active-users list + toasts + login banner` (Strict/Standard/Custom presets with always-visible-disabled Custom inputs, count-aware Active Sessions section with name/email/last-login rows, differentiated SessionExpiryToast for idle-vs-absolute, cyan info-tone login banner, scope=all auto-redirect-after-toast UX. Includes a small backend addition: `active_users` field on `GET /accounts/me/security` — see §4.7).
9. `docs: add decision entry + update CURRENT-STATE auth surface` (`.ai/DECISIONS.md`, `CURRENT-STATE.md`).
Each commit independently passes `pytest --override-ini="addopts="` and `npm run build`. The two backend behavior gates (#2 and #4) ship behind no flag — they're the point of the work — but they're sequenced so any rollback is a single commit.
---
**Review checklist before implementation:**
- [x] Defaults confirmed: 3d idle / 14d absolute.
- [x] Per-account override approved.
- [x] Grandfather strategy (one free rotation) approved vs hard cutover.
- [x] Error-detail taxonomy approved (idle vs absolute distinct on the wire; same UX in the frontend).
- [x] Audit logging is a requirement, not optional.
- [x] Loose preset dropped; Strict / Standard / Custom only.
- [x] ISO timestamps (not Unix ints) for `idle_expires_at` / `absolute_expires_at` everywhere.
- [x] DB CHECK constraint scope documented; partial-override case validated app-side.
- [ ] System bounds in §4.3 acceptable as specified (15min floor, 30d idle ceiling, 90d absolute ceiling).
- [ ] Final approval on commit sequence in §10.
- [ ] No conflict with Phase O cutover sequencing (this can ship before OR after EIN/Stripe lands; independent path).
- [ ] File the kill-all-sessions follow-up issue per §9 before implementation begins, so the Account Security page can link to it (or leave the support-contact copy in place).
---
## GSTACK REVIEW REPORT
| Review | Trigger | Why | Runs | Status | Findings |
|--------|---------|-----|------|--------|----------|
| CEO Review | `/plan-ceo-review` | Scope & strategy | 0 | — | not run |
| Codex Review | `/codex review` | Independent 2nd opinion | 0 | — | not run |
| Eng Review | `/plan-eng-review` | Architecture & tests (required) | 0 | — | not run (the plan itself was eng-reviewed inline across 7 commits — backend complete & green) |
| Design Review | `/plan-design-review` | UI/UX gaps | 1 | CLEAR (PLAN) | score: 4/10 → 9/10, 7 decisions added |
| DX Review | `/plan-devex-review` | Developer experience gaps | 0 | — | not run |
**UNRESOLVED:** 0 design decisions; 3 plan-level checklist items remain (system bounds, commit sequence, Phase O sequencing — none block design).
**VERDICT:** DESIGN CLEARED — page layout, state coverage, post-revoke flow, toast logic, login banner tone, and form copy all locked. Commit 8 has a complete spec.