diff --git a/docs/superpowers/specs/2026-03-24-conversational-branching-design.md b/docs/superpowers/specs/2026-03-24-conversational-branching-design.md index 898288dc..b0d35fef 100644 --- a/docs/superpowers/specs/2026-03-24-conversational-branching-design.md +++ b/docs/superpowers/specs/2026-03-24-conversational-branching-design.md @@ -47,7 +47,7 @@ The design is **additive and removable** — all branching state lives in new ta | RAG search | `rag_service.py` | Branch messages use same RAG pipeline | | Image upload + S3 storage | `file_uploads` table + `storage_service.py` | Extended with 5 new columns, no new table | | PSA push | `psa_documentation_service.py` | Resolution outputs and handoff notes push through existing service | -| Escalation package | `_build_escalation_package_enhanced()` | HandoffManager reuses this for snapshot generation | +| Escalation package structure | `_build_escalation_package_enhanced()` pattern | HandoffManager builds its own branch-aware snapshot (the existing function is not branch-aware and uses a different AI call path). The snapshot JSONB follows the same general structure for compatibility. | | Escalation queue | Existing `ai_sessions` query + frontend | Dual-write keeps old queue working | | Session lifecycle | `flowpilot_engine.py` resolve/escalate/pause | Extended, not replaced | @@ -76,7 +76,7 @@ The design is **additive and removable** — all branching state lives in new ta | Column | Type | Default | Notes | |---|---|---|---| | `is_branching` | BOOLEAN | FALSE | Whether branching is active | -| `active_branch_id` | UUID FK NULLABLE | NULL | Currently viewed branch | +| `active_branch_id` | UUID NULLABLE (no FK — soft pointer to avoid circular FK) | NULL | Currently viewed branch. No FK constraint to `session_branches` because of circular reference (session → branch → session). Application-level integrity. | | `handoff_count` | INTEGER | 0 | Times handed off | | `total_active_seconds` | INTEGER | 0 | Cumulative active time | | `total_parked_seconds` | INTEGER | 0 | Cumulative parked time | @@ -101,7 +101,7 @@ The design is **additive and removable** — all branching state lives in new ta All columns nullable. Existing rows unaffected. `ai_description` is always generated on upload (not just branching sessions) — useful for search, exports, PSA notes, Knowledge Flywheel. -Also add `'fork'` to `ai_session_steps.step_type` check constraint. +Also add `'fork'` to `ai_session_steps.step_type` check constraint. **Note:** modifying a PostgreSQL CHECK constraint requires `DROP CONSTRAINT` then `ADD CONSTRAINT` — this is NOT an additive operation. The Alembic migration must be written manually per CLAUDE.md lesson 77, using `op.drop_constraint('ck_ai_session_steps_step_type')` then `op.create_check_constraint(...)` with the new values list. ### New tables @@ -118,7 +118,7 @@ Also add `'fork'` to `ai_session_steps.step_type` check constraint. | `status` | VARCHAR(20) | `active`, `dead_end`, `solved`, `untried`, `revived` | | `status_reason` | TEXT NULLABLE | AI-generated reason for status | | `status_changed_at` | TIMESTAMP NULLABLE | | -| `status_changed_by` | UUID FK NULLABLE | | +| `status_changed_by` | UUID FK → `users.id`, ondelete SET NULL, NULLABLE | | | `conversation_messages` | JSONB | LLM message history scoped to this branch | | `context_summary` | JSONB | `{tried: [], concluded: str, artifacts: []}` | | `evidence_from_branch_id` | UUID FK NULLABLE | If revived, evidence source | @@ -184,7 +184,7 @@ Dual-write: on create, also populates `ai_sessions.escalation_package` and `esca | `created_at` | TIMESTAMP | | | `updated_at` | TIMESTAMP | | -Constraints: `UNIQUE(session_id, output_type)`, check constraints on `output_type` and `status`. +Constraints: check constraints on `output_type` and `status`. Use `INSERT ... ON CONFLICT (session_id, output_type) DO UPDATE` (upsert) in `generate_all()` so outputs can be regenerated if a session is re-opened after resolution. `UNIQUE(session_id, output_type)` enforces one-of-each but allows replacement. ### Entity Relationships @@ -256,7 +256,7 @@ Branch lifecycle management. Pure data operations + one LLM call pattern (contex | Method | What it does | LLM call? | |---|---|---| | `create_root_branch(session_id)` | Creates root branch, sets `is_branching=True`, copies `session.conversation_messages` into root branch. Session-level field kept as pre-branching snapshot. | No | -| `create_fork(session_id, parent_branch_id, trigger_step_id, fork_reason, options[])` | Creates `ForkPoint` + N `SessionBranch` rows. Sets `is_fork_point=True` on trigger step. Unexplored options get status `untried`. | No | +| `create_fork(session_id, parent_branch_id, trigger_step_id, fork_reason, options[])` | Pre-generates all branch UUIDs in Python, then inserts `ForkPoint` (with branch_ids in options JSONB) + N `SessionBranch` rows in a single transaction. Sets `is_fork_point=True` on trigger step. Unexplored options get status `untried`. | No | | `switch_branch(session_id, target_branch_id)` | Updates `session.active_branch_id`. Returns branch with context. | No | | `mark_branch_status(branch_id, status, reason)` | Updates status. Generates `context_summary` via `_call_ai`. | Yes — summary | | `revive_branch(branch_id, evidence_from_branch_id, evidence_description)` | Sets status `revived`, records evidence source, prepends revival context to branch messages. | No | @@ -269,17 +269,22 @@ Pure function — takes data, returns assembled prompt components. No DB access, **Single method:** `build(branch, sibling_summaries, session_context, attachments, token_budget)` -Returns: `{system_prompt: str, history: list[dict], new_message: str, images: list[dict]}` +Returns: `{system_base: str, rag_context: str, history: list[dict], new_message: str, images: list[dict]}` -Preserves `_call_ai`'s cache breakpoint behavior by separating history from new message. +**Important:** Return keys match `_call_ai`'s parameter names exactly. `system_base` is the stable system prompt (cached by Anthropic). `rag_context` contains the cross-branch summaries + attachment descriptions (NOT cached — changes per query). This preserves prompt caching: the base prompt is a cache hit across turns, while cross-branch context varies. + +Callers invoke: `_call_ai(**builder.build(...))`. + +**Data fetching responsibility:** The API endpoint (or a coordinator method `BranchManager.build_prompt_inputs(session_id, branch_id, db)`) pre-fetches all data from the DB — branch messages, sibling summaries via `build_cross_branch_context()`, session context, attachment descriptions — then passes the assembled data to `build()`. The builder itself does no DB queries. **Assembly order:** -1. Session context (~2,000 tokens) — problem summary, domain, client info, PSA data -2. Cross-branch summaries (~3,000 token cap) — prioritized: active > untried > revived > dead_end -3. Revival context — if branch was revived, prepend evidence -4. Attachment descriptions (~1,000 tokens) — `ai_description` from other branches' uploads -5. Branch messages (remaining budget) — last 10-15 turns verbatim, older summarized -6. Token budget enforcement — compress: old messages → dead-end summaries → file content → never drop system prompt, last 5 messages, branch status map +1. `system_base` — `ASSISTANT_SYSTEM_PROMPT` + session context (~2,000 tokens). Problem summary, domain, client info, PSA data. This is stable across turns and gets cached. +2. `rag_context` — cross-branch summaries (~3,000 token cap, prioritized: active > untried > revived > dead_end) + attachment descriptions from other branches (~1,000 tokens). This changes per query and is NOT cached. +3. Revival context — if branch was revived, prepend evidence to `rag_context`. +4. `history` — branch's `conversation_messages` minus the last user message. Last 10-15 turns verbatim, older summarized. +5. `new_message` — the current user message (latest turn). +6. `images` — image references from current branch uploads. +7. Token budget enforcement — compress in order: old messages → dead-end summaries → file content → never drop system_base, last 5 messages, branch status map. ### `services/handoff_manager.py` @@ -288,7 +293,7 @@ Unified park/escalate with dual-write backward compatibility. | Method | What it does | LLM call? | |---|---|---| | `create_handoff(session_id, intent, engineer_notes, user_id)` | Creates `SessionHandoff`. Calls `generate_snapshot()`. If escalate, calls `generate_ai_assessment()`. Dual-writes to `session.escalation_package` + `escalated_to_id`. | Escalate only | -| `generate_snapshot(session_id)` | Serializes branch tree into snapshot JSONB. Reuses `_build_escalation_package_enhanced()` for steps-tried data. | No | +| `generate_snapshot(session_id)` | Serializes branch tree into snapshot JSONB. Builds its own branch-aware steps-tried data (the existing `_build_escalation_package_enhanced()` is not branch-aware — it iterates all steps without branch attribution and uses a different AI call path). Follows the same general snapshot structure for compatibility with the existing escalation queue. | No | | `generate_ai_assessment(session_id)` | Full session + branch context → diagnostic assessment. | Yes | | `generate_briefing(handoff_id, claiming_user_id)` | Natural-language handoff summary for claiming engineer. | Yes | | `claim_session(handoff_id, claiming_user_id)` | Updates `claimed_by/at`, sets session `active`. Dual-writes `escalation_package`. | No | @@ -311,12 +316,14 @@ Three LLM calls on resolve, each through `_call_ai`. Not a new service — extends the existing upload endpoint in `uploads.py`: -1. Upload completes, response returned immediately. +1. Upload completes, response returned immediately (non-blocking). 2. Background task (via `asyncio.create_task`) calls `_call_ai` with image + prompt: "Describe this in one sentence for a troubleshooting context log." 3. Result written to `file_uploads.ai_description`. -4. For text files: extract content directly, call `_call_ai` for summary if >2,000 tokens. +4. For text files: extract content directly (no LLM), call `_call_ai` for summary only if content >2,000 tokens. 5. Always runs (not just branching sessions) — useful for search, exports, PSA notes. +**Safeguards:** The background task must catch and log all exceptions without crashing the upload response. If `_call_ai` fails (rate limit, timeout), `ai_description` stays NULL — the upload is still usable, just without cross-branch context. Cost is ~$0.005 per image on Sonnet (~1,650 input tokens + 50 output tokens). No additional rate limiting needed beyond the existing upload rate limit (10/minute). + --- ## API Endpoints @@ -356,16 +363,35 @@ Note: endpoints nest under `/ai-sessions` (not `/api/v1/sessions` as the origina --- +## Integration Surface — Existing Code Changes + +**Critical:** The following existing code paths must check `session.is_branching` to avoid data divergence: + +### `unified_chat_service.send_chat_message()` +Currently appends messages to `session.conversation_messages`. When `is_branching=True`, this must instead: +- Route the message to `session_branches[active_branch_id].conversation_messages` +- Use `BranchAwarePromptBuilder` for context assembly instead of the flat message history +- Still call `_call_ai` for the actual LLM interaction (same call path) + +### `flowpilot_engine` step creation +Currently creates `AISessionStep` with no `branch_id`. When `is_branching=True`, must set `branch_id` to the active branch. + +### Existing `/ai-sessions/{id}/chat` endpoint +Must detect `is_branching` and delegate to the branch message endpoint logic. Linear sessions continue through the existing path unchanged. + +**Pattern:** Each integration point is a simple `if session.is_branching:` guard that delegates to branching services. The existing code path is the `else` — completely untouched. If the feature is rolled back, remove the guards and the else paths remain. + +--- + ## Token Budget Strategy ### Budget Allocation | Context Layer | Budget | Strategy | |---|---|---| -| System prompt + session context | ~2,000 tokens | Fixed | -| Cross-branch summaries | ~3,000 tokens | Scales with branch count. Each summary ~200-500 tokens. Cap at 3,000. | +| System prompt + session context | ~2,000 tokens | Fixed. Lives in `system_base` (cached). | +| Cross-branch summaries + attachment descriptions | ~4,000 tokens combined | Scales with branch count. Each branch summary ~200-500 tokens + attachment descriptions ~100 tokens each. Lives in `rag_context` (not cached). Cap at 4,000 total. | | Current branch messages | Remaining budget | Last 10-15 turns verbatim. Older summarized. | -| Attachment descriptions | ~1,000 tokens | Included in cross-branch summaries | ### Graceful Degradation (in order) @@ -445,7 +471,8 @@ backend/app/ ### Phase 1: Data Foundation (est. 2 days) - Create 4 new models + Pydantic schemas - Add columns to `ai_sessions`, `ai_session_steps`, `file_uploads` -- Single Alembic migration (all additive) +- Manual Alembic migration (per CLAUDE.md lesson 77). Mostly additive — new tables + nullable columns. One non-additive operation: `step_type` CHECK constraint must be dropped and recreated with `'fork'` added. Use `op.drop_constraint` / `op.create_check_constraint`. +- `active_branch_id` on `ai_sessions` is a plain UUID column with no FK constraint (avoids circular FK with `session_branches`) - Unit tests for model creation and relationships ### Phase 2: Branch Engine (est. 2-3 days) @@ -496,3 +523,5 @@ If this feature is pulled: 7. Delete endpoint files: `session_branches.py`, `session_handoffs.py`, `session_resolutions.py` 8. Delete frontend components/hooks/API clients listed above 9. Existing escalation flow, upload pipeline, chat service, PSA integration — **all untouched** +10. **Dual-write rollback note:** Sessions that were escalated via `HandoffManager` (while branching was active) will have `escalation_package` in the branching snapshot format (includes `branch_map`). The existing escalation queue UI should handle both the old flat format and the branching format gracefully — check for `branch_map` key presence. This is the one data shape difference that persists after rollback. +11. Remove `if session.is_branching:` guards from `unified_chat_service`, `flowpilot_engine`, and the chat endpoint. The else paths are the original code — unchanged.