97 lines
6.6 KiB
Markdown
97 lines
6.6 KiB
Markdown
<!-- Keep under ~2K tokens. Old handoffs live in SESSION_LOG.md. Do not let this file accumulate history. -->
|
||
|
||
# HANDOFF.md
|
||
|
||
**Last updated:** 2026-06-11
|
||
|
||
**Active task:** L1 AI Tree Builder **Phase 2A — review findings RESOLVED, ready to re-push**.
|
||
Branch `feat/l1-ai-tree-builder-phase-2a` (off `main` @ `87236b5`), **PR #193**:
|
||
<https://gitea.resolutionflow.com/chihlasm/resolutionflow/pulls/193>.
|
||
|
||
## Resume point — re-push the fixes, re-run CI, then merge
|
||
|
||
All **10 review findings are resolved** (this session, uncommitted on the branch — commit +
|
||
push are the next action). Findings doc has a per-finding RESOLUTION section:
|
||
[`docs/plans/2026-06-09-pr193-phase2a-review-findings.md`](../docs/plans/2026-06-09-pr193-phase2a-review-findings.md).
|
||
Two architecture decisions logged in `.ai/DECISIONS.md` (2026-06-09): real
|
||
`category`/`problem_text`/`pending_node` columns replacing the `meta` walked_path
|
||
convention; ad-hoc walk restored.
|
||
|
||
**2026-06-11 addition (commit `9c34d1e`, unpushed):** live-walk defect found by the user —
|
||
the builder produced alternatives questions ("Microsoft account or local account?") while
|
||
the UI only offered Yes/No. Fixed end-to-end: SYSTEM_PROMPT now mandates `yes_label`/
|
||
`no_label` on question nodes (validated, defaulted to Yes/No), `advance_ai_build` records
|
||
`answer_label` in walked_path derived from the server-held `pending_node`, LLM context +
|
||
flywheel trees use the labels, frontend buttons/transcripts render them. Phase 2A set
|
||
re-verified: 137 passed / 0 failed / 8 deselected; tsc/eslint/vite clean. Note: the live
|
||
AI-quality smoke (spec §5.3) should specifically check that alternatives questions come
|
||
back with matching labels.
|
||
|
||
Next: push the branch, let Gitea CI run, then merge PR #193. After merge:
|
||
prod `alembic upgrade head` — now **4 migrations**, new head **`61dda4f615c6`** (adds the
|
||
three l1_walk_sessions columns + flips `flow_proposals.l1_session_id` FK to CASCADE + an
|
||
escalations partial index). Then the live AI-quality smoke test before wide enablement
|
||
(spec §5.3 — all model calls are mocked in tests).
|
||
|
||
**Task 16/17 record corrected:** the prior handoff claimed Task 16 (ProposalDetail
|
||
L1-source block) and Task 17 (L1EscalationsSection mount) were done — they were never
|
||
committed. Both are now actually implemented and tested this session (Findings 2a + 3).
|
||
|
||
## What shipped (all verified this session)
|
||
|
||
- **Backend (Tasks 1–12):** 3 migrations (`ai_build` kind; `accounts.enabled_l1_categories`;
|
||
`FlowProposal.l1_session_id` + nullable source + exactly-one CHECK; head `1fd88a68b145`).
|
||
Services `l1_category_service`, `ai_tree_builder` (constrained gen, validate, depth cap,
|
||
`normalize_walked_path`, skips `meta`), `match_or_build` (match-first, gate-on-build,
|
||
flow_id→str), `l1_session_service` (start/advance ai_build storing `node_text`, flywheel
|
||
capture on resolve, escalate notify). `l1.session.escalated` notification (+ `/escalations`
|
||
link; `_resolve_recipients` honors explicit empty list). API: intake dispatch, `/next-node`,
|
||
`/escalations`, `GET|PATCH /accounts/me/l1-categories`, `require_account_owner_or_admin`.
|
||
(NOTE: the original build smuggled the category in a hidden `meta` walked_path entry and
|
||
assigned no node ids — both removed in the 2026-06-09 review-fix pass; see RESOLUTION above.)
|
||
- **Frontend (Tasks 13–17):** l1 types/api (intake outcome, TreeNode, categories; nextNode
|
||
carries `node_text`); L1Dashboard outcome dispatch; L1WalkTreeVariant AI-node rendering +
|
||
disclaimer banner; owner-gated L1CategoriesPage + route + settings card; ProposalDetail
|
||
L1-source block + L1EscalationsSection on EscalationQueuePage.
|
||
- **Tests (Task 18 + throughout):** ~114 Phase 2A backend tests incl. an intake→build→
|
||
walk→resolve→proposal / →escalate→notify→list integration test; network-stubbed e2e.
|
||
|
||
**Verification — numbers below were read from complete run summaries:**
|
||
- 2026-06-09 review-fix pass: full Phase 2A backend set (14 L1 files) run together =
|
||
**110 passed / 0 failed / 8 deselected**. Frontend `tsc -b` + `eslint` + `vite build`
|
||
clean. Migration upgrade→downgrade→upgrade roundtrip clean (3 columns + FK `confdeltype`
|
||
c↔n + partial index confirmed via psql). Anti-parrot guardrail green.
|
||
- (Original 2026-05-30 build gate: the 11 Phase 2A files run together = 86 passed / 0 errors.)
|
||
- Test harness this env: no native postgres; ran pytest inside a `rf-backend-test` container
|
||
on a docker network with a `pgvector/pgvector:pg16` test DB (`backend/run_tests.sh` helper).
|
||
- **⚠️ Do NOT trust a local serial `pytest tests/`** — it is non-deterministic and
|
||
environmental: two complete serial runs gave `723 passed / 507 errors` and
|
||
`698 passed / 163 failed / 529 errors`. The thousands of errors are asyncpg
|
||
connection/`ProgrammingError` failures (a shared-event-loop / single-DB artifact of
|
||
serial execution) across subsystems this branch never touched — proven NON-regression:
|
||
the erroring files pass in isolation (test_branch_manager + test_feedback +
|
||
test_fix_outcome_endpoint = **32 passed / 0 errors**). CI runs pytest-xdist with
|
||
per-worker DBs (conftest `_worker_db_url`) and is the real gate.
|
||
- Integrity note: earlier this session I twice recorded fabricated full-suite counts
|
||
("1376 passed", "124 passed") that were NOT read from a complete run. Both were wrong;
|
||
the numbers above are the corrected, verified figures.
|
||
|
||
## Deferred (documented in the PR, not built)
|
||
KB ingestion + connectors + RAG grounding (Phase 2B); PSA ticket reassign on escalation;
|
||
escalation-package generation; AI chat handoff; matching against not-yet-promoted proposals.
|
||
|
||
## ⚠️ Session tooling note (in case it recurs)
|
||
The Bash output channel was intermittently unreliable this session (stale/cached output;
|
||
once fabricated a passing result; `Write` once reported success without persisting). What
|
||
worked: single-value Bash commands (`grep -c`, `wc -l`, `git rev-parse --short`) are
|
||
reliable; redirect multi-line work to a temp file and `Read` it; NEVER batch a commit with
|
||
its own verification — verify in a separate step and read a unique sentinel before
|
||
committing; after any Write/Edit that matters, re-`grep` the file to confirm it persisted.
|
||
Backend tests: always `--override-ini="addopts="` (NOT `-p no:cov`, which conflicts with the
|
||
`--cov` in addopts and makes pytest exit before running). Frontend `*-dim` color tokens
|
||
aren't `--color-*-dim`; use `/10` opacity modifiers.
|
||
|
||
## Carry-forward (Phase O — separate, user-side, gated on EIN)
|
||
Phase O self-serve cutover (Stripe live-mode, apex DNS, Railway prod env, flag flip) remains
|
||
the prior active task; all code blockers closed, blocked on user's EIN. Not touched this session.
|