docs: fix 12 review issues in branching design spec

Critical fixes:
- _call_ai signature: return keys now match params (system_base, rag_context)
- Cross-branch context goes in rag_context (not system_base) to preserve caching
- HandoffManager builds own branch-aware snapshot (not reusing non-branch-aware fn)
- Added Integration Surface section: unified_chat_service + flowpilot_engine
  must check is_branching to avoid data divergence

Major fixes:
- active_branch_id: plain UUID, no FK (avoids circular FK)
- Resolution outputs: upsert on regeneration, not unique violation
- AI description pipeline: error handling + cost safeguards documented
- create_fork: pre-generates branch UUIDs for single-transaction insert

Minor fixes:
- status_changed_by FK target specified (users.id)
- Token budget: attachment descriptions inside cross-branch cap (4K combined)
- BranchAwarePromptBuilder: caller responsibility documented
- Removability: dual-write escalation_package shape divergence noted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
chihlasm
2026-03-24 07:46:39 +00:00
parent a9b2aadb9e
commit 3309b6d1ce

View File

@@ -47,7 +47,7 @@ The design is **additive and removable** — all branching state lives in new ta
| RAG search | `rag_service.py` | Branch messages use same RAG pipeline |
| Image upload + S3 storage | `file_uploads` table + `storage_service.py` | Extended with 5 new columns, no new table |
| PSA push | `psa_documentation_service.py` | Resolution outputs and handoff notes push through existing service |
| Escalation package | `_build_escalation_package_enhanced()` | HandoffManager reuses this for snapshot generation |
| Escalation package structure | `_build_escalation_package_enhanced()` pattern | HandoffManager builds its own branch-aware snapshot (the existing function is not branch-aware and uses a different AI call path). The snapshot JSONB follows the same general structure for compatibility. |
| Escalation queue | Existing `ai_sessions` query + frontend | Dual-write keeps old queue working |
| Session lifecycle | `flowpilot_engine.py` resolve/escalate/pause | Extended, not replaced |
@@ -76,7 +76,7 @@ The design is **additive and removable** — all branching state lives in new ta
| Column | Type | Default | Notes |
|---|---|---|---|
| `is_branching` | BOOLEAN | FALSE | Whether branching is active |
| `active_branch_id` | UUID FK NULLABLE | NULL | Currently viewed branch |
| `active_branch_id` | UUID NULLABLE (no FK — soft pointer to avoid circular FK) | NULL | Currently viewed branch. No FK constraint to `session_branches` because of circular reference (session → branch → session). Application-level integrity. |
| `handoff_count` | INTEGER | 0 | Times handed off |
| `total_active_seconds` | INTEGER | 0 | Cumulative active time |
| `total_parked_seconds` | INTEGER | 0 | Cumulative parked time |
@@ -101,7 +101,7 @@ The design is **additive and removable** — all branching state lives in new ta
All columns nullable. Existing rows unaffected. `ai_description` is always generated on upload (not just branching sessions) — useful for search, exports, PSA notes, Knowledge Flywheel.
Also add `'fork'` to `ai_session_steps.step_type` check constraint.
Also add `'fork'` to `ai_session_steps.step_type` check constraint. **Note:** modifying a PostgreSQL CHECK constraint requires `DROP CONSTRAINT` then `ADD CONSTRAINT` — this is NOT an additive operation. The Alembic migration must be written manually per CLAUDE.md lesson 77, using `op.drop_constraint('ck_ai_session_steps_step_type')` then `op.create_check_constraint(...)` with the new values list.
### New tables
@@ -118,7 +118,7 @@ Also add `'fork'` to `ai_session_steps.step_type` check constraint.
| `status` | VARCHAR(20) | `active`, `dead_end`, `solved`, `untried`, `revived` |
| `status_reason` | TEXT NULLABLE | AI-generated reason for status |
| `status_changed_at` | TIMESTAMP NULLABLE | |
| `status_changed_by` | UUID FK NULLABLE | |
| `status_changed_by` | UUID FK`users.id`, ondelete SET NULL, NULLABLE | |
| `conversation_messages` | JSONB | LLM message history scoped to this branch |
| `context_summary` | JSONB | `{tried: [], concluded: str, artifacts: []}` |
| `evidence_from_branch_id` | UUID FK NULLABLE | If revived, evidence source |
@@ -184,7 +184,7 @@ Dual-write: on create, also populates `ai_sessions.escalation_package` and `esca
| `created_at` | TIMESTAMP | |
| `updated_at` | TIMESTAMP | |
Constraints: `UNIQUE(session_id, output_type)`, check constraints on `output_type` and `status`.
Constraints: check constraints on `output_type` and `status`. Use `INSERT ... ON CONFLICT (session_id, output_type) DO UPDATE` (upsert) in `generate_all()` so outputs can be regenerated if a session is re-opened after resolution. `UNIQUE(session_id, output_type)` enforces one-of-each but allows replacement.
### Entity Relationships
@@ -256,7 +256,7 @@ Branch lifecycle management. Pure data operations + one LLM call pattern (contex
| Method | What it does | LLM call? |
|---|---|---|
| `create_root_branch(session_id)` | Creates root branch, sets `is_branching=True`, copies `session.conversation_messages` into root branch. Session-level field kept as pre-branching snapshot. | No |
| `create_fork(session_id, parent_branch_id, trigger_step_id, fork_reason, options[])` | Creates `ForkPoint` + N `SessionBranch` rows. Sets `is_fork_point=True` on trigger step. Unexplored options get status `untried`. | No |
| `create_fork(session_id, parent_branch_id, trigger_step_id, fork_reason, options[])` | Pre-generates all branch UUIDs in Python, then inserts `ForkPoint` (with branch_ids in options JSONB) + N `SessionBranch` rows in a single transaction. Sets `is_fork_point=True` on trigger step. Unexplored options get status `untried`. | No |
| `switch_branch(session_id, target_branch_id)` | Updates `session.active_branch_id`. Returns branch with context. | No |
| `mark_branch_status(branch_id, status, reason)` | Updates status. Generates `context_summary` via `_call_ai`. | Yes — summary |
| `revive_branch(branch_id, evidence_from_branch_id, evidence_description)` | Sets status `revived`, records evidence source, prepends revival context to branch messages. | No |
@@ -269,17 +269,22 @@ Pure function — takes data, returns assembled prompt components. No DB access,
**Single method:** `build(branch, sibling_summaries, session_context, attachments, token_budget)`
Returns: `{system_prompt: str, history: list[dict], new_message: str, images: list[dict]}`
Returns: `{system_base: str, rag_context: str, history: list[dict], new_message: str, images: list[dict]}`
Preserves `_call_ai`'s cache breakpoint behavior by separating history from new message.
**Important:** Return keys match `_call_ai`'s parameter names exactly. `system_base` is the stable system prompt (cached by Anthropic). `rag_context` contains the cross-branch summaries + attachment descriptions (NOT cached — changes per query). This preserves prompt caching: the base prompt is a cache hit across turns, while cross-branch context varies.
Callers invoke: `_call_ai(**builder.build(...))`.
**Data fetching responsibility:** The API endpoint (or a coordinator method `BranchManager.build_prompt_inputs(session_id, branch_id, db)`) pre-fetches all data from the DB — branch messages, sibling summaries via `build_cross_branch_context()`, session context, attachment descriptions — then passes the assembled data to `build()`. The builder itself does no DB queries.
**Assembly order:**
1. Session context (~2,000 tokens) — problem summary, domain, client info, PSA data
2. Cross-branch summaries (~3,000 token cap) — prioritized: active > untried > revived > dead_end
3. Revival context — if branch was revived, prepend evidence
4. Attachment descriptions (~1,000 tokens) — `ai_description` from other branches' uploads
5. Branch messages (remaining budget) — last 10-15 turns verbatim, older summarized
6. Token budget enforcement — compress: old messages → dead-end summaries → file content → never drop system prompt, last 5 messages, branch status map
1. `system_base``ASSISTANT_SYSTEM_PROMPT` + session context (~2,000 tokens). Problem summary, domain, client info, PSA data. This is stable across turns and gets cached.
2. `rag_context` — cross-branch summaries (~3,000 token cap, prioritized: active > untried > revived > dead_end) + attachment descriptions from other branches (~1,000 tokens). This changes per query and is NOT cached.
3. Revival context — if branch was revived, prepend evidence to `rag_context`.
4. `history` — branch's `conversation_messages` minus the last user message. Last 10-15 turns verbatim, older summarized.
5. `new_message` — the current user message (latest turn).
6. `images` — image references from current branch uploads.
7. Token budget enforcement — compress in order: old messages → dead-end summaries → file content → never drop system_base, last 5 messages, branch status map.
### `services/handoff_manager.py`
@@ -288,7 +293,7 @@ Unified park/escalate with dual-write backward compatibility.
| Method | What it does | LLM call? |
|---|---|---|
| `create_handoff(session_id, intent, engineer_notes, user_id)` | Creates `SessionHandoff`. Calls `generate_snapshot()`. If escalate, calls `generate_ai_assessment()`. Dual-writes to `session.escalation_package` + `escalated_to_id`. | Escalate only |
| `generate_snapshot(session_id)` | Serializes branch tree into snapshot JSONB. Reuses `_build_escalation_package_enhanced()` for steps-tried data. | No |
| `generate_snapshot(session_id)` | Serializes branch tree into snapshot JSONB. Builds its own branch-aware steps-tried data (the existing `_build_escalation_package_enhanced()` is not branch-aware — it iterates all steps without branch attribution and uses a different AI call path). Follows the same general snapshot structure for compatibility with the existing escalation queue. | No |
| `generate_ai_assessment(session_id)` | Full session + branch context → diagnostic assessment. | Yes |
| `generate_briefing(handoff_id, claiming_user_id)` | Natural-language handoff summary for claiming engineer. | Yes |
| `claim_session(handoff_id, claiming_user_id)` | Updates `claimed_by/at`, sets session `active`. Dual-writes `escalation_package`. | No |
@@ -311,12 +316,14 @@ Three LLM calls on resolve, each through `_call_ai`.
Not a new service — extends the existing upload endpoint in `uploads.py`:
1. Upload completes, response returned immediately.
1. Upload completes, response returned immediately (non-blocking).
2. Background task (via `asyncio.create_task`) calls `_call_ai` with image + prompt: "Describe this in one sentence for a troubleshooting context log."
3. Result written to `file_uploads.ai_description`.
4. For text files: extract content directly, call `_call_ai` for summary if >2,000 tokens.
4. For text files: extract content directly (no LLM), call `_call_ai` for summary only if content >2,000 tokens.
5. Always runs (not just branching sessions) — useful for search, exports, PSA notes.
**Safeguards:** The background task must catch and log all exceptions without crashing the upload response. If `_call_ai` fails (rate limit, timeout), `ai_description` stays NULL — the upload is still usable, just without cross-branch context. Cost is ~$0.005 per image on Sonnet (~1,650 input tokens + 50 output tokens). No additional rate limiting needed beyond the existing upload rate limit (10/minute).
---
## API Endpoints
@@ -356,16 +363,35 @@ Note: endpoints nest under `/ai-sessions` (not `/api/v1/sessions` as the origina
---
## Integration Surface — Existing Code Changes
**Critical:** The following existing code paths must check `session.is_branching` to avoid data divergence:
### `unified_chat_service.send_chat_message()`
Currently appends messages to `session.conversation_messages`. When `is_branching=True`, this must instead:
- Route the message to `session_branches[active_branch_id].conversation_messages`
- Use `BranchAwarePromptBuilder` for context assembly instead of the flat message history
- Still call `_call_ai` for the actual LLM interaction (same call path)
### `flowpilot_engine` step creation
Currently creates `AISessionStep` with no `branch_id`. When `is_branching=True`, must set `branch_id` to the active branch.
### Existing `/ai-sessions/{id}/chat` endpoint
Must detect `is_branching` and delegate to the branch message endpoint logic. Linear sessions continue through the existing path unchanged.
**Pattern:** Each integration point is a simple `if session.is_branching:` guard that delegates to branching services. The existing code path is the `else` — completely untouched. If the feature is rolled back, remove the guards and the else paths remain.
---
## Token Budget Strategy
### Budget Allocation
| Context Layer | Budget | Strategy |
|---|---|---|
| System prompt + session context | ~2,000 tokens | Fixed |
| Cross-branch summaries | ~3,000 tokens | Scales with branch count. Each summary ~200-500 tokens. Cap at 3,000. |
| System prompt + session context | ~2,000 tokens | Fixed. Lives in `system_base` (cached). |
| Cross-branch summaries + attachment descriptions | ~4,000 tokens combined | Scales with branch count. Each branch summary ~200-500 tokens + attachment descriptions ~100 tokens each. Lives in `rag_context` (not cached). Cap at 4,000 total. |
| Current branch messages | Remaining budget | Last 10-15 turns verbatim. Older summarized. |
| Attachment descriptions | ~1,000 tokens | Included in cross-branch summaries |
### Graceful Degradation (in order)
@@ -445,7 +471,8 @@ backend/app/
### Phase 1: Data Foundation (est. 2 days)
- Create 4 new models + Pydantic schemas
- Add columns to `ai_sessions`, `ai_session_steps`, `file_uploads`
- Single Alembic migration (all additive)
- Manual Alembic migration (per CLAUDE.md lesson 77). Mostly additive — new tables + nullable columns. One non-additive operation: `step_type` CHECK constraint must be dropped and recreated with `'fork'` added. Use `op.drop_constraint` / `op.create_check_constraint`.
- `active_branch_id` on `ai_sessions` is a plain UUID column with no FK constraint (avoids circular FK with `session_branches`)
- Unit tests for model creation and relationships
### Phase 2: Branch Engine (est. 2-3 days)
@@ -496,3 +523,5 @@ If this feature is pulled:
7. Delete endpoint files: `session_branches.py`, `session_handoffs.py`, `session_resolutions.py`
8. Delete frontend components/hooks/API clients listed above
9. Existing escalation flow, upload pipeline, chat service, PSA integration — **all untouched**
10. **Dual-write rollback note:** Sessions that were escalated via `HandoffManager` (while branching was active) will have `escalation_package` in the branching snapshot format (includes `branch_map`). The existing escalation queue UI should handle both the old flat format and the branching format gracefully — check for `branch_map` key presence. This is the one data shape difference that persists after rollback.
11. Remove `if session.is_branching:` guards from `unified_chat_service`, `flowpilot_engine`, and the chat endpoint. The else paths are the original code — unchanged.