chihlasm 97cd297f46 feat: AI-assisted flow builder with 4-stage wizard (#87)
* feat: AI-assisted flow builder with 4-stage wizard

Implements the complete AI flow builder feature using a guided 4-stage
wizard (Foundation → Scaffold → Branch Detail → Review & Assemble).
AI assists at bounded points using Claude Haiku for cost-efficient
structured JSON generation (~$0.01-0.03/flow).

Backend: new models (ai_conversations, ai_usage), Alembic migration,
quota enforcement with billing anchor, Anthropic API integration with
prompt caching, tree validation, conversation CRUD with 24h TTL,
APScheduler cleanup job, 5 API endpoints, Pydantic schemas.

Frontend: TypeScript types, API client, Zustand store for wizard state,
7 components (modal, step indicator, foundation form, branch selector,
branch detail view, tree preview, quota display), MyTreesPage integration
with "Build with AI" button (hidden when AI not configured).

Tests: 14 validator unit tests + 11 endpoint integration tests with
mocked Anthropic (zero real API spend). All 25 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: dashboard design doc and implementation plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Phase 1 — pinnedFlowsStore, pagination hook, cached quota hook, sidebar refactor

- Add pin() to pinnedFlowsApi
- Create pinnedFlowsStore (Zustand) — single source of truth for pin state
- Add dashboardMyFlowsView preference to userPreferencesStore
- Create usePaginationParams hook (URL-synced)
- Create useCachedQuota hook (5-min TTL)
- Sidebar uses pinnedFlowsStore instead of local state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Phase 2 — pin/favorite buttons on all library view components

- TreeGridView: star in top-right corner of cards
- TreeListView: star at end of each row
- TreeTableView: dedicated leftmost Favorite column
- All with proper a11y (aria-label), event isolation, loading states

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Phase 3 — Library page create dropdown + AI Builder + pin wiring

- Replace single Create link with dropdown menu (3 flow types + AI Builder)
- Wire pinnedFlowsStore to all view components
- AI Builder modal integration via useCachedQuota hook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Phase 4 — Dashboard refactor with Favorites grid + paginated My Flows

- Favorites section: compact grid from pinnedFlowsStore, max 2 rows, expandable
- My Flows: author_id filter, URL-synced pagination (10/25/50/All)
- View toggle (grid/list/table) with independent preference
- Skeleton loaders, empty states with CTAs
- Create dropdown with AI Builder option
- 500-item ceiling for "Show All" mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Phase 5 — Sidebar pinned section dual collapse + show more/less

- Header collapse hides entire section, resets to 5 items on re-expand
- List truncation: show first 5, "Show more (N)" expands to all
- Clicking a flow auto-collapses back to 5
- Smooth max-height CSS transition (250ms ease-out)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: stabilize usePaginationParams to prevent infinite re-render loop

allowedPageSizes array was recreated every render as a useMemo dep,
causing infinite updates. Use useRef to stabilize the reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove Set-based Zustand selectors causing infinite re-render loop

Zustand selectors returning new Set() on every call fail Object.is
equality check, triggering continuous re-renders. Replaced with
useMemo-derived Sets in consuming components.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: pin route ordering and star icon overlap in grid view

Move GET /pinned and PATCH /pinned/reorder before GET /{tree_id} to
prevent FastAPI from matching "pinned" as a UUID path parameter (422).
Relocate star button from absolute positioning into the header row to
avoid overlapping privacy icons and category badges.
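
The ordering problem can be reproduced without FastAPI. The sketch below is a toy first-match router, not FastAPI's own implementation; the handler names `get_pinned` and `get_tree` are hypothetical. It only illustrates why a catch-all path parameter registered first captures "pinned" and then fails UUID validation.

```python
import re
import uuid

# Hypothetical route table; the patterns model a literal path and a catch-all parameter.
PATTERNS = {
    "get_pinned": re.compile(r"^/pinned$"),
    "get_tree": re.compile(r"^/(?P<tree_id>[^/]+)$"),
}


def build_routes(order):
    """Return (pattern, handler_name) pairs in registration order."""
    return [(PATTERNS[name], name) for name in order]


def dispatch(routes, path):
    """Pick the first route whose pattern matches, then validate its parameter."""
    for pattern, name in routes:
        match = pattern.match(path)
        if match is None:
            continue
        if name == "get_tree":
            try:
                uuid.UUID(match.group("tree_id"))
            except ValueError:
                # The path matched, but the parameter is not a UUID.
                return 422, name
        return 200, name
    return 404, None


bad_order = build_routes(["get_tree", "get_pinned"])
good_order = build_routes(["get_pinned", "get_tree"])

print(dispatch(bad_order, "/pinned"))
print(dispatch(good_order, "/pinned"))
```

With the catch-all route first, "/pinned" never reaches its own handler; registering the literal route first resolves it.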

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: code review fixes — date calc, input validation, rate limits, shared components

- Fix monthly_reset_at crash when billing anchor day exceeds next month's length
- Add environment_tags sanitization (max 20 tags, 100 chars each) to prevent prompt injection
- Add @limiter.limit("10/minute") rate limiting to all AI endpoints
- Use getTreeNavigatePath() routing helper instead of hardcoded paths
- Extract shared CreateFlowDropdown component from QuickStartPage and TreeLibraryPage
- Clear useCachedQuota on logout to prevent stale data across user sessions
- Add useRef guard to scaffold useEffect to prevent potential double-fire
- Use node.id as React key instead of array index in BranchDetailView
- Remove redundant dead logic in ai_tree_validator
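
A minimal sketch of the billing-anchor date fix listed above, assuming the reset lands on the anchor day clamped to the length of the target month. The function names are hypothetical and the project's actual monthly_reset_at logic may differ.

```python
import calendar
from datetime import date


def clamp_day(year: int, month: int, day: int) -> date:
    """Build a date, pulling the day back to the last day of a short month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))


def next_monthly_reset(today: date, anchor_day: int) -> date:
    """Return the first anchor date strictly after today (hypothetical helper)."""
    candidate = clamp_day(today.year, today.month, anchor_day)
    if candidate > today:
        return candidate
    if today.month == 12:
        return clamp_day(today.year + 1, 1, anchor_day)
    return clamp_day(today.year, today.month + 1, anchor_day)


# An anchor on the 31st no longer crashes when the next month is February.
print(next_monthly_reset(date(2026, 1, 31), 31))
```

Calling `date(year, month, 31)` directly for February raises ValueError, which is the kind of crash the clamp avoids.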

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct Anthropic model ID to full dated version

claude-haiku-4-5 is not a valid model alias — Anthropic requires the
full dated model ID claude-haiku-4-5-20251001.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: strip markdown code fences from AI JSON responses

Haiku sometimes wraps its JSON in ```json ... ``` despite the prompt
instructing otherwise. Strip fences before parsing to avoid JSONDecodeError
at char 0.
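
One way to do the stripping, as a sketch: the helper names are hypothetical and the regex is an assumption about what the wrapped output looks like, not the project's actual code. The fence marker is built from single backticks so the example contains no literal fence.

```python
import json
import re

TICKS = "`" * 3  # three backticks, assembled to avoid a literal fence in this example

# Optional language tag after the opening fence, body captured lazily.
FENCE_RE = re.compile(
    r"^\s*" + TICKS + r"[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?[ \t]*" + TICKS + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the fenced body if the whole reply is one fenced block, else the text."""
    match = FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


def parse_model_json(text: str):
    """Parse a model reply as JSON after removing any surrounding fence."""
    return json.loads(strip_code_fences(text))


wrapped = TICKS + "json\n{\"branches\": [\"VPN drops\"]}\n" + TICKS
print(parse_model_json(wrapped))
```

Unfenced replies pass through unchanged, so the helper is safe to apply to every response.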

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: increase branch_detail max_tokens to 8192 and add response logging

Truncated output at 4096 tokens produces invalid JSON mid-generation.
Also logs stop_reason and output_tokens per attempt to diagnose failures.
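
A sketch of the per-attempt logging, assuming the reply exposes a stop_reason and an output token count. `ModelReply` is a stand-in dataclass, not the SDK's response type, and the logger name is hypothetical. A stop_reason of "max_tokens" is the signal that output was cut off at the limit.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("ai_flow_builder")  # hypothetical logger name


@dataclass
class ModelReply:
    """Stand-in for the response fields that matter when diagnosing failures."""
    text: str
    stop_reason: str
    output_tokens: int


def is_truncated(reply: ModelReply) -> bool:
    """True when generation stopped because it hit the token limit."""
    return reply.stop_reason == "max_tokens"


def log_attempt(attempt: int, reply: ModelReply) -> bool:
    """Log the attempt and report whether the reply is worth parsing."""
    logger.info(
        "attempt=%d stop_reason=%s output_tokens=%d",
        attempt, reply.stop_reason, reply.output_tokens,
    )
    return not is_truncated(reply)
```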

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pass explicit status='draft' when creating AI-generated flow

Tree model defaults to 'published' in the DB schema, but passing status=None
from the constructor overrides that default, causing a nullable=False violation
and a 500 on save.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: auto-advance branch detail and pin navigation bar

- Auto-advance to next undetailed branch after generation completes,
  using a useEffect that watches the count of detailed branches
- Cap tree preview at max-h-48 with internal scroll so the nav bar
  is never pushed off screen
- Make nav bar sticky bottom-0 with bg-card so it stays visible
  regardless of content height

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: increase branch retries to 3 and relax cross-reference validation on final attempt

next_node_id mismatches are a common model hallucination that the retry
prompt doesn't reliably fix. On the final (3rd) attempt, accept the branch
with strict=False so only truly fatal errors (missing fields, dead ends,
bad JSON) cause a hard failure. Cross-reference issues are minor and
fixable in the tree editor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: strengthen prompt to prevent next_node_id mismatches, keep strict validation

Rather than lowering the validation bar, improve the system prompt:
- Rule 6 now explicitly states next_node_id must match a direct child's id
- Added rule 10: build tree bottom-up to avoid forward-reference errors
- Corrective prompt now calls out the ID mismatch constraint specifically

Reverts the strict=False fallback — flows must be correct before saving.
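
The resulting behavior can be sketched as a bounded retry loop in which every attempt is validated strictly and the previous errors feed the corrective prompt. The `generate` and `validate` callables are hypothetical stand-ins, and the limit of 3 attempts is assumed from the earlier retry change.

```python
from typing import Callable, List, Optional

MAX_ATTEMPTS = 3  # assumed from the retry count mentioned earlier


def generate_with_retries(
    generate: Callable[[Optional[List[str]]], dict],
    validate: Callable[[dict], List[str]],
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Return the first branch that passes validation; never relax the checks."""
    errors: Optional[List[str]] = None
    for _ in range(max_attempts):
        # Errors from the previous attempt (None on the first) drive the corrective prompt.
        branch = generate(errors)
        errors = validate(branch)
        if not errors:
            return branch
    raise ValueError(f"branch still invalid after {max_attempts} attempts: {errors}")
```

Raising on the final failure, rather than accepting a partially valid branch, matches the stated goal that flows must be correct before saving.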

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: persist branch viewing index in store to survive phase remounts

Local useState resets to 0 every time phase transitions from 'generating'
back to 'detailing', causing the view to snap back to branch 1.

Move viewingIndex to store's currentBranchIndex (already existed) and
advance it in generateBranchDetail after success. Component reads from
store so remounts no longer lose position.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct publish validation to check title instead of action/solution fields

The publish validator was checking for an 'action' field on action nodes
and a 'solution' field on solution nodes, but the actual node schema
(confirmed from seed data and frontend types) uses 'title'/'description'.
This caused all AI-generated trees to fail publish validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct action node schema and improve AI flow quality

- Fix action nodes to use next_node_id (not children) for continuation,
  matching how TreeNavigationPage.tsx navigates action nodes
- Validator now requires next_node_id on all action nodes and flags
  missing ones as broken dead ends
- Update _check_branch_termination: action nodes are not dead ends since
  they continue via next_node_id (validated separately)
- Improve scaffold prompt: branch names must describe observable symptoms
  users can self-identify, not internal category names
- Update branch_detail prompt with clearer action node schema, corrected
  few-shot example showing proper next_node_id on action nodes
- Improve assemble_tree root question to be more user-facing
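
A minimal sketch of the two dead-end rules described above. It assumes a flat list of node dicts with "id", "type", "children", and "next_node_id" keys; that shape and the function name are assumptions for illustration, not the real ai_tree_validator.

```python
from typing import List


def find_dead_ends(nodes: List[dict]) -> List[str]:
    """Flag action nodes without a valid next_node_id and childless decision nodes."""
    known_ids = {node["id"] for node in nodes}
    errors = []
    for node in nodes:
        kind = node.get("type")
        if kind == "action":
            target = node.get("next_node_id")
            if not target:
                errors.append(f"{node['id']}: action node missing next_node_id")
            elif target not in known_ids:
                errors.append(f"{node['id']}: next_node_id {target!r} does not exist")
        elif kind == "decision" and not node.get("children"):
            errors.append(f"{node['id']}: decision node has no children")
    return errors


tree = [
    {"id": "d1", "type": "decision", "children": ["a1", "s1"]},
    {"id": "a1", "type": "action", "title": "Restart the service", "next_node_id": "s1"},
    {"id": "s1", "type": "solution", "title": "Service restored"},
]
print(find_dead_ends(tree))
```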

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add AI flow builder gotchas to CLAUDE.md (#23-25)

- Action nodes use next_node_id (not children) for navigation
- Anthropic model IDs require full dated version string
- Claude API may wrap JSON in markdown fences

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve CI lint errors and httpx dependency conflict

- Fix httpx version conflict: requirements-dev.txt now uses >=0.27.0 to match requirements.txt
- Extract CSAT helper functions to csatUtils.ts to fix react-refresh/only-export-components
- Remove default export from admin/EmptyState.tsx shim (same rule)
- Fix empty catch block in Modal.tsx (no-empty)
- Add eslint-disable comments for intentional setState-in-effect patterns in
  FlowAnalyticsPanel, QuickLaunch, NodeEditorPanel, useCachedQuota,
  MyAnalyticsPage, TeamAnalyticsPage
- Add eslint-disable comments for intentional _children destructure in NodeEditorPanel
- Fix _parentId unused var in useTreeLayout.ts
- Rewrite usePaginationParams.ts to avoid reading refs during render

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update tests to match action node schema (next_node_id, not children)

- Update _make_valid_tree() in test_ai_tree_validator to use next_node_id
  on action nodes (solution is a sibling, not a child)
- Fix test_dead_end_action_node → test_dead_end_decision_node (action nodes
  don't have child-based dead ends; dead ends are decision nodes with no children)
- Add test_action_missing_next_node_id for the new validation rule
- Update BRANCH_DETAIL_JSON in test_ai_endpoints to use next_node_id pattern
- Update test_draft_trees.py to use "title" field for action/solution nodes
  (tree_validation.py was updated this branch to require "title" not "action"/"solution")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update remaining tests and session_to_tree for title field rename

- test_tree_validation.py: replace "action"/"solution" content fields with "title"
- test_procedural_flows.py: update solution node fixtures to use "title"
- test_save_session_as_tree.py: update fixtures and assertions for "title" field
- session_to_tree.py: generate "title" instead of "action"/"solution" on converted nodes;
  fall back to legacy field names when reading from old tree snapshots for compatibility
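
The legacy fallback can be sketched as a small reader that prefers "title" and only then consults the old field names. The helper name and the empty-string default are assumptions; the real session_to_tree.py may handle missing values differently.

```python
LEGACY_TITLE_FIELDS = ("action", "solution")  # old content field names


def read_title(node: dict) -> str:
    """Return the node's title, falling back to legacy fields from old snapshots."""
    if node.get("title"):
        return node["title"]
    for field in LEGACY_TITLE_FIELDS:
        if node.get(field):
            return node[field]
    return ""


print(read_title({"type": "action", "action": "Flush DNS cache"}))
```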

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 00:03:54 -05:00

ResolutionFlow

Take the path MOST traveled.

Project Status: 🚀 Phase 2 - Active Development

Backend: Complete and tested (18 API endpoints, 40+ integration tests)
Frontend: Core features complete, Tree Editor in progress
Tree Editor: Visual editor with form-based editing and live preview panel


The Problem

MSP engineers face constant context switching between diverse technical issues (file shares, server outages, VPN failures, Active Directory problems). This creates:

  • Cognitive overload: 15-25 minutes to regain focus after each context switch
  • Inconsistent documentation: Under pressure, notes are rushed or incomplete
  • Lost tribal knowledge: Best troubleshooting paths live only in senior engineers' heads
  • Repeated work: Same issues investigated from scratch each time
  • Burnout: Research shows context switching is a major contributor to burnout

The Solution

An intelligent decision tree system that:

  • Guides engineers through proven troubleshooting paths
  • Captures decisions and notes automatically as you work
  • Generates professional ticket documentation with one click
  • Builds institutional knowledge that improves over time
  • Reduces cognitive load during high-stress situations

Success Metric

If Michael (our primary user) uses this tool for 50% of his tickets in 3 months, we've succeeded.


Key Features

MVP (Weeks 1-3)

  • 🌳 Tree Navigation - Step-by-step guided troubleshooting
  • 📝 Automatic Notes - Capture context at each decision point
  • 📄 Export - Generate professional documentation (plain text, markdown, HTML)
  • 🔐 Multi-User - Team authentication and access control
  • 📚 Documentation Links - Contextual links to KB articles and vendor docs

Phase 2 (Weeks 4-6)

  • 👥 Team Management - Controlled authorship, shared access
  • ✏️ Tree Editor - Visual interface to create/modify decision trees
  • 📱 Mobile Responsive - Works on phone/tablet for on-site work
  • 🔀 Custom Branches - Add unique steps on-the-fly during troubleshooting
  • 🔍 Search & Categories - Find the right tree quickly

Phase 3 (Weeks 7-12)

  • 📎 Attachments - Upload screenshots, logs, command outputs
  • 💾 Offline Mode - Continue working without internet, sync when back online
  • 🏢 Client Context - Auto-fill client-specific details (server names, topologies)
  • 📧 Send to Engineer - Generate simplified checklist for onsite techs
  • 📊 Analytics - Track usage, common paths, team performance

Phase 4 (Months 4-6)

  • 🔌 API & Integrations - Connect to ConnectWise, Kaseya, LabTech
  • Automation - Execute PowerShell scripts directly from trees
  • 🏢 Enterprise Features - SSO, white-labeling, advanced RBAC
  • 🌐 Marketplace - Share and discover community-contributed trees

Tech Stack

Frontend

  • React - Modern, flexible, excellent offline support
  • Tailwind CSS - Rapid UI development
  • Service Workers - Offline capability
  • IndexedDB - Local data storage

Backend

  • Python FastAPI - Modern, fast, async support
  • SQLAlchemy - ORM with async support
  • PostgreSQL - Reliable database with excellent JSON support
  • Alembic - Database migrations

Infrastructure

  • S3-Compatible Storage - File attachments (MinIO for dev, S3/Spaces for prod)
  • Railway/Render - Simple hosting to start
  • Docker - Containerized development environment

Project Structure

troubleshooting-tree-app/
├── docs/
│   ├── 01-PROJECT-OVERVIEW.md          # Vision, goals, market analysis
│   ├── 02-TECHNICAL-ARCHITECTURE.md    # System design, data models, API specs
│   ├── 03-DEVELOPMENT-ROADMAP.md       # Phases, timeline, milestones
│   ├── 04-FEATURE-SPECIFICATIONS.md    # Detailed feature descriptions
│   └── 05-QUESTIONS-AND-ACTION-ITEMS.md # Decisions needed, next steps
├── backend/                             # Python FastAPI application (future)
├── frontend/                            # React application (future)
├── database/                            # Database schemas, migrations (future)
└── README.md                            # This file

Getting Started

For Michael (Primary User)

Immediate Action Items:

  1. Answer Key Questions (see docs/05-QUESTIONS-AND-ACTION-ITEMS.md)

    • Timeline needs
    • Budget for hosting
    • Team size
    • Branding preferences
  2. Document 5 Troubleshooting Scenarios

    • Citrix VDA Not Registering
    • FSLogix Profile Issues
    • Active Directory Replication Failure
    • SonicWall VPN Tunnel Down
    • User Unable to Access File Share

    See template in 05-QUESTIONS-AND-ACTION-ITEMS.md

  3. Provide Sample Export

    • Show how you currently write ticket notes
    • What format/level of detail is needed
  4. Review Documentation

    • Read through all docs in docs/ folder
    • Flag anything unclear or that you disagree with
    • Add your own thoughts/ideas

For Developers (Future)

Once development starts:

  1. Clone repository
  2. Set up development environment (Docker)
  3. Install dependencies
  4. Run migrations
  5. Start development servers
  6. See CONTRIBUTING.md for coding standards

Development Principles

  1. User First - Every feature must solve a real problem for Michael and his team
  2. Speed Matters - Tool must be faster than doing it manually
  3. Progressive Enhancement - Start simple, add complexity only when needed
  4. Offline Capable - Many MSP sites have poor connectivity
  5. Automation-Ready - Architecture supports future integration with scripts/tools
  6. Documentation Over Memory - Capture tribal knowledge explicitly
  7. Fail Gracefully - Never lose user's work, even if server fails

Use Cases

Scenario 1: Standard Troubleshooting

Michael gets a ticket: "User can't access file share"

  1. Opens app, selects "File Share Access Issues" tree
  2. Enters ticket number, client name
  3. Follows decision tree, making selections and adding notes
  4. Reaches resolution in 10 minutes
  5. Clicks "Export", copies formatted notes into ticket
  6. Done - professional documentation with zero extra effort
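
The export step in this scenario could look roughly like the sketch below. It assumes a session is a list of (question, answer, note) tuples; the function name, the sample ticket number, and the output layout are all hypothetical rather than the app's actual export format.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (question, answer, note); assumed shape


def export_markdown(ticket: str, client: str, tree_name: str, steps: List[Step]) -> str:
    """Render the path taken through a tree as markdown ticket notes."""
    lines = [f"# {tree_name}", "", f"Ticket: {ticket}", f"Client: {client}", "", "## Steps"]
    for number, (question, answer, note) in enumerate(steps, start=1):
        lines.append(f"{number}. {question} -> {answer}")
        if note:
            lines.append(f"   Note: {note}")
    return "\n".join(lines)


notes = export_markdown(
    "T-1001",
    "Acme",
    "File Share Access Issues",
    [
        ("Can the user reach the file server?", "Yes", ""),
        ("Are share permissions correct?", "No", "User missing from security group"),
    ],
)
print(notes)
```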

Scenario 2: Complex Multi-Step Issue

Michael is troubleshooting a Citrix VDA registration failure

  1. Starts with "VDA Not Registering" tree
  2. Discovers network issue, branches to "Network Connectivity" tree
  3. Finds firewall blocking traffic, attaches screenshot of rule
  4. Returns to VDA tree, continues troubleshooting
  5. Automation script restarts services, captures output
  6. VDA registers successfully
  7. Exports comprehensive notes showing entire diagnostic path

Scenario 3: Junior Engineer Learning

New engineer Sarah gets an escalated Active Directory issue

  1. Selects "AD Replication Failure" tree (created by Michael)
  2. Tree guides her step-by-step with commands to run
  3. At each step, links to Microsoft Learn docs explain concepts
  4. She adds detailed notes about what she found
  5. Reaches point requiring senior help, shares session link with Michael
  6. Michael reviews her work, sees exactly what she tried
  7. Guides her through final steps over Slack
  8. Sarah learns the process, documents it properly

Scenario 4: On-Site Technician

Michael needs hands at a remote site

  1. Creates troubleshooting plan in app
  2. Clicks "Send to Engineer", generates simplified checklist
  3. Sends link to on-site tech via text
  4. Tech follows steps, checks boxes, adds photos of error messages
  5. Reports back results in real-time
  6. Michael adjusts plan remotely if needed
  7. Issue resolved with minimal back-and-forth

Why This Could Be Special

For Individual Engineers

  • Save 30+ minutes per complex ticket
  • Never lose track of troubleshooting progress
  • Professional documentation every time
  • Learn from experienced engineers' approaches
  • Build personal knowledge base over time

For MSP Teams

  • Standardize troubleshooting procedures
  • Onboard junior engineers faster
  • Capture institutional knowledge before engineers leave
  • Improve ticket documentation quality
  • Identify training gaps and common issues
  • Track team performance and efficiency

For the Market

  • 30,000+ MSPs in North America alone
  • Adjacent markets: Internal IT, DevOps, Technical Support
  • Current solutions are either too generic (flowchart tools) or too rigid (static runbooks)
  • Unique Value: Purpose-built for technical troubleshooting with automation integration

Potential Business Model

  • Free Tier: Personal use, limited trees
  • Pro Tier: $15-25/user/month - Team features, unlimited trees, analytics
  • Enterprise: Custom pricing - API, SSO, white-labeling
  • Marketplace: Revenue share on community trees
  • Professional Services: Custom tree development, training, consulting

Inspiration & Similar Tools

What Exists Today

  • ServiceNow Knowledge Base - Good for static docs, not interactive troubleshooting
  • IT Glue - Documentation repository, not a troubleshooting guide
  • Confluence Decision Trees - Generic flowcharts, not execution-focused
  • Custom Runbooks - Static, not adaptive, no automation

What We're Building

Imagine if ServiceNow Knowledge, Flowchart tools, and PowerShell automation had a baby specifically designed for MSP troubleshooting. That's this.


FAQ

Q: Why not just use a wiki or documentation system?
A: Wikis are great for reference, but they don't guide you through troubleshooting in real-time or automatically generate ticket notes from your actions.

Q: Won't creating trees take more time than just doing the work?
A: Initially, yes. But after 2-3 uses of a tree, you've saved more time than you spent creating it. Plus, the tree captures knowledge that helps the entire team.

Q: What if the tree doesn't cover my specific issue?
A: You can add custom branches on-the-fly during troubleshooting. These custom paths can then be incorporated into the tree for next time.

Q: How is this different from a flowchart tool?
A: Flowcharts are static diagrams. This is an active troubleshooting companion that captures your work and generates documentation.

Q: Can I use this offline?
A: Yes (Phase 3). Trees are cached locally, you can work offline, and changes sync when you're back online.

Q: Will this replace my ticketing system?
A: No, it complements it. You still create tickets in your PSA, but this generates the detailed notes you paste into tickets.

Q: Can I automate steps?
A: Yes (Phase 4). Integrate PowerShell scripts and other automation that can be triggered directly from decision nodes.


Contributing

This is currently a private project in the planning phase. Once we move to active development, we'll create a CONTRIBUTING.md with:

  • Code of conduct
  • Development workflow
  • Coding standards
  • Testing requirements
  • PR process

Contact & Feedback

Primary User: Michael Chihlas
Project Lead: [To be determined]
Communication: [To be determined]

For questions, suggestions, or to get involved, contact Michael.


License

[To be determined]

Options being considered:

  • Open source (MIT/Apache 2.0) - maximize adoption
  • Source-available with commercial license - protect business interests
  • Proprietary - if building as commercial product

Acknowledgments

  • Research on context switching and burnout that inspired this project
  • MSP community for sharing their pain points and workflows
  • All the engineers who've struggled with documentation and wished for a better way

Roadmap at a Glance

└─ [✅ Planning] COMPLETE
   ├─ ✅ Document requirements
   ├─ ✅ Make key decisions
   └─ ✅ Setup initial architecture

└─ [✅ Phase 1: MVP] COMPLETE
   ├─ ✅ Backend API (18 endpoints)
   ├─ ✅ Tree navigation UI
   ├─ ✅ Session tracking
   └─ ✅ Export functionality

└─ [🚀 Phase 2: Team Ready] ← IN PROGRESS
   ├─ ✅ Tree Editor (form-based with preview)
   ├─ ⏳ Team management
   └─ ⏳ Mobile responsive

└─ [📋 Phase 3: Professional]
   ├─ Attachments
   ├─ Offline mode
   └─ Analytics

└─ [📋 Phase 4: Platform]
   ├─ API & integrations
   ├─ Automation
   └─ Enterprise features

Last Updated: 2026-01-28
Project Status: Phase 2 - Active Development
Next Milestone: Complete Tree Editor polishing, Team management features

Description
Troubleshooting decision tree application for MSP engineers - automatically generates professional documentation from guided diagnostic workflows