# Troubleshooting Scenarios for Decision Tree App

## Scenario 1: FSLogix Profile Not Loading

### Issue Details

- **Issue Name:** FSLogix Profile Not Loading
- **Category:** Citrix/Virtual Desktop
- **Estimated Time:** 10-15 minutes
- **Common For:** Warner Robins City, other Citrix environments

### First Thing You Check

Can the user log into the server at all?

### Decision Tree

**Step 1: Can user log into server?**

- **YES** → Step 2: Check FSLogix service status
- **NO** → Different tree (AD account/licensing issue)

**Step 2: Is FSLogix service running on the server?**

- **RUNNING** → Step 3: Check frxtray.exe in user's task manager
- **STOPPED** → Step 4: Start service and check event log for cause
- **STUCK (Starting/Stopping)** → Step 5: Kill service process and restart

**Step 3: Is frxtray.exe running in user's task manager?**

- **YES** → Step 6: Check if profile VHD exists in share
- **NO** → Step 7: Check FSLogix agent installation
- **MULTIPLE INSTANCES** → Step 8: Kill all frxtray.exe, log user off, try again

**Step 4: Service Start Result**

*Action: `Start-Service -Name 'frxsvc'`*

- **Started successfully** → Step 9: Check Event Viewer for previous failure reason
- **Failed to start** → Step 10: Check service dependencies (NetLogon, RPC)
- **Started but stopped again** → Step 11: Check for file locks or permissions

**Step 5: Service Kill and Restart**

*Action: `Stop-Process -Name frxsvc -Force; Start-Service frxsvc`*

- **Service now running** → Step 3: Verify frxtray.exe
- **Still stuck** → Step 12: Check for corrupt profile or registry

**Step 6: Does user have a profile VHD in the share?**

*Check: `\\server\fslogix\username\Profile_username.vhdx`*

- **YES, file exists** → Step 13: Check VHD file permissions
- **NO, file missing** → Step 14: Check FSLogix registry path configuration
- **YES, but 0 bytes** → Step 15: Delete corrupt VHD, recreate profile

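The three outcomes of Step 6 (file exists, file missing, file is 0 bytes) can be told apart with a short script. The following is a minimal Python sketch, assuming the `\\server\fslogix\username\Profile_username.vhdx` layout shown in the step. The names `profile_vhd_path`, `classify_vhd`, and `NEXT_STEP` are hypothetical, and the share root is passed in rather than hard-coded.

```python
from pathlib import Path


def profile_vhd_path(share_root: str, username: str) -> Path:
    """Build <share_root>/<username>/Profile_<username>.vhdx (layout assumed from Step 6)."""
    return Path(share_root) / username / f"Profile_{username}.vhdx"


def classify_vhd(path: Path) -> str:
    """Return 'missing', 'empty' (0 bytes), or 'present'."""
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    return "present"


# Map each state to the follow-up step named in the tree.
NEXT_STEP = {"present": 13, "missing": 14, "empty": 15}
```

For example, `NEXT_STEP[classify_vhd(profile_vhd_path(share_root, username))]` gives the step number to continue with.
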

**Step 7: Is FSLogix agent installed?**

*Check: `C:\Program Files\FSLogix\Apps\frxsvc.exe` exists*

- **YES** → Step 16: Repair FSLogix agent
- **NO** → Step 17: Install FSLogix agent

**Step 8: Multiple frxtray instances**

*Action: `Get-Process frxtray | Stop-Process -Force`*

- **Killed successfully** → Log user off, have them log back in
- **Cannot kill** → Step 18: Check for file/folder locks

**Step 9: Check Event Viewer**

*Action: Check Application log for FSLogix errors*

- **Error 50 (Can't access network path)** → Step 19: Verify network path accessible
- **Error 13 (VHD locked)** → Step 20: Check for locks on VHD from other servers
- **Error 52 (Profile path not found)** → Step 14: Check registry settings

**Step 10: Check Service Dependencies**

*Action: `Get-Service NetLogon, RpcSs`, then check the status of each service*

- **All running** → Step 21: Check antivirus blocking
- **NetLogon stopped** → Start NetLogon, then retry FSLogix
- **RPC stopped** → Critical issue, escalate to senior engineer

**Step 11: Check for File Locks**

*Action: Run Chihlas file lock checker on profile share*

- **No locks** → Step 22: Check disk space on profile server
- **Locked by another server** → Step 20: Release lock or force user logoff from other session

**Step 13: Check VHD Permissions**

*Action: `Get-Acl` on `Profile_username.vhdx`*

- **User has Full Control** → Step 23: Try mounting VHD manually
- **User missing permissions** → Step 24: Grant user full control
- **Everyone has permission but still fails** → Step 25: Check parent folder permissions

**Step 14: Check FSLogix Registry Path**

*Check: `HKLM\SOFTWARE\FSLogix\Profiles`, value `VHDLocations`*

- **Path is correct** → Step 26: Check DNS resolution of server name
- **Path has typo** → Fix registry path, log user off and back on
- **Path uses old server** → Update to correct server path

**Step 15: Delete Corrupt VHD**

*Action: Delete 0-byte VHD file*

- **Deleted successfully** → User will get new profile on next login
- **Cannot delete (in use)** → Step 20: Check locks, force release

**Step 17: Install FSLogix Agent**

*Action: Run FSLogix installer from network share*

- **Installed successfully** → Reboot server, have user try again
- **Installation failed** → Step 27: Check server OS version compatibility

**Step 19: Verify Network Path**

*Action: `Test-Path \\server\fslogix` from problem server*

- **Accessible** → Step 28: Check firewall between servers
- **Not accessible** → Check DNS, check network connectivity
- **Accessible but slow** → Step 29: Check network performance

**Step 20: Check VHD Locks**

*Action: Use `openfiles /query` or `handle.exe` to check locks*

- **Locked by same server** → Kill locking process
- **Locked by different server** → Log user off from that server
- **Lock from crashed session** → Clear stale session, release lock

**Step 21: Check Antivirus**

*Action: Check if AV is scanning/blocking FSLogix folders*

- **FSLogix folders excluded** → Step 30: Check Windows Defender exclusions too
- **Not excluded** → Add exclusions, restart FSLogix service
- **Exclusions present but still blocking** → Temporarily disable AV to test

**Step 23: Try Mounting VHD Manually**

*Action: `Mount-VHD -Path \\server\fslogix\...\Profile.vhdx`*

- **Mounts successfully** → Profile is good, issue elsewhere (back to Step 2)
- **Fails to mount** → Step 31: Check VHD integrity
- **Mounts but takes forever** → Step 29: Network performance issue

**Step 24: Grant User Permissions**

*Action: Use `icacls` to grant the user Full Control on the VHD*

- **Permissions granted** → Have user log off and back on
- **Cannot modify permissions** → Check if admin has access, check share permissions

**Step 31: Check VHD Integrity**

*Action: `Test-VHD -Path ...` in PowerShell*

- **VHD is healthy** → Issue is mounting or permissions
- **VHD is corrupt** → Step 15: Delete and recreate
- **Cannot test (access denied)** → Permission issue on share

**RESOLUTION: Profile loads successfully**

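Several branches in this tree point at steps that have no block of their own in this scenario (Step 12 and Step 16, for example). Before loading a tree into the app, it can help to list those dangling targets. The following is a minimal Python sketch; the dictionary is a hand transcription of the step numbers above, and `undefined_targets` is a hypothetical helper name, not part of any existing tool.

```python
def undefined_targets(tree: dict[int, list[int]]) -> set[int]:
    """Return step numbers that some branch points to but that are never defined."""
    referenced = {target for targets in tree.values() for target in targets}
    return referenced - set(tree)


# Scenario 1 as written above: step -> steps its branches point to.
SCENARIO_1 = {
    1: [2], 2: [3, 4, 5], 3: [6, 7, 8], 4: [9, 10, 11], 5: [3, 12],
    6: [13, 14, 15], 7: [16, 17], 8: [18], 9: [19, 20, 14], 10: [21],
    11: [22, 20], 13: [23, 24, 25], 14: [26], 15: [20], 17: [27],
    19: [28, 29], 20: [], 21: [30], 23: [2, 31, 29], 24: [], 31: [15],
}

print(sorted(undefined_targets(SCENARIO_1)))
# prints [12, 16, 18, 22, 25, 26, 27, 28, 29, 30]
```

Each number in the output is a step that still needs a block (or a pointer to another tree) before the scenario is complete.
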
### Common Pitfalls

- VHD file locked by another server (user has session on multiple servers)
- Profile path in registry has typo or uses old server name
- Antivirus blocking VHD access or scanning profile folder
- NetLogon service stopped, preventing network authentication
- Disk full on profile share
- DNS not resolving profile server name
- Stale sessions from crashed RDP connections

### Resolution Indicators

- User can log in successfully
- Profile loads within 30 seconds
- No FSLogix errors in Event Viewer
- frxtray.exe running in task manager
- User's desktop and documents appear correctly

### Documentation Links

- FSLogix Profile Troubleshooting: https://docs.microsoft.com/en-us/fslogix/troubleshooting-profile-container
- Event Log Error Codes: https://docs.microsoft.com/en-us/fslogix/profile-container-configuration-reference
- VHD Troubleshooting: Internal KB #FSL-001

---

## Scenario 2: Citrix VDA Not Registering

### Issue Details

- **Issue Name:** Citrix VDA Not Registering with Delivery Controller
- **Category:** Citrix/Virtual Desktop
- **Estimated Time:** 10-20 minutes
- **Common For:** Warner Robins City, all Citrix environments

### First Thing You Check

Can you ping the VDA from the Delivery Controller?

### Decision Tree

**Step 1: Can you ping VDA from DDC?**

*Action: `Test-Connection -ComputerName VDA-HOSTNAME`*

- **YES (replies)** → Step 2: Check VDA service status
- **NO (request timed out)** → Step 3: Network connectivity issue

**Step 2: What is VDA service status?**

*Action: `Get-Service -Name 'BrokerAgent'` on VDA*

- **RUNNING** → Step 4: Check DDC connection from VDA
- **STOPPED** → Step 5: Start VDA service
- **STUCK** → Step 6: Force kill and restart service

**Step 3: Network Connectivity Issue**

*Troubleshooting network layer*

- **VDA powered off** → Power on VDA, wait for boot
- **VDA on different subnet** → Step 7: Check routing/firewall
- **DNS not resolving** → Step 8: Check DNS configuration
- **Network cable unplugged** → Physical layer issue

**Step 4: Can VDA reach DDC on port 80/443?**

*Action: `Test-NetConnection -ComputerName DDC-HOSTNAME -Port 80`*

- **Port 80 success** → Step 9: Check VDA registration in Studio
- **Port 80 blocked** → Step 10: Check firewall rules
- **DNS fails** → Step 8: Check DNS

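Step 4 names both ports 80 and 443, but the action only probes port 80. A small script can probe both in one pass. The following is a minimal Python sketch of a plain TCP connect test, assuming it is run from the VDA. The helper names `tcp_port_open` and `check_ddc_ports` are hypothetical, and `DDC-HOSTNAME` is the placeholder used in the step.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ddc_ports(host: str, ports=(80, 443)) -> dict[int, bool]:
    """Probe each port and report which ones accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

Calling `check_ddc_ports("DDC-HOSTNAME")` returns a dictionary keyed by port. A `False` entry only shows that the connection did not complete, so a DNS failure and a blocked port look the same here and still need to be separated by Steps 8 and 10.
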

**Step 5: Start VDA Service**

*Action: `Start-Service -Name 'BrokerAgent'`*

- **Started successfully** → Step 11: Wait 60 seconds, check registration
- **Failed to start** → Step 12: Check Event Viewer for error
- **Started then stopped** → Step 13: Check service dependencies

**Step 6: Force Kill VDA Service**

*Action: `Stop-Process -Name BrokerAgent -Force`*

- **Killed successfully** → Step 5: Start service normally
- **Cannot kill (access denied)** → Restart VDA server
- **Killed but immediately respawns** → Step 14: Check for loops

**Step 7: Check Routing/Firewall**

*Between VDA and DDC*

- **Different VLANs** → Verify inter-VLAN routing configured
- **SonicWall between them** → Step 15: Check SonicWall rules
- **Switches involved** → Check VLAN tagging, trunk ports

**Step 8: Check DNS Configuration**

*Action: `Resolve-DnsName DDC-HOSTNAME` from VDA*

- **Resolves correctly** → DNS is fine, go back to network troubleshooting
- **Does not resolve** → Step 16: Check VDA DNS server settings
- **Resolves to wrong IP** → Step 17: Check DNS A record

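The three outcomes of Step 8 depend on knowing which address the DDC should resolve to. The following is a minimal Python sketch of that comparison, assuming the expected IPv4 address is known in advance. `classify_dns` and `NEXT_STEP` are hypothetical names, and the resolver is injectable so the logic can be exercised without touching real DNS.

```python
import socket
from typing import Callable


def classify_dns(name: str, expected_ip: str,
                 resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Return 'ok', 'unresolved', or 'wrong_ip' for a hostname lookup."""
    try:
        actual = resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return "unresolved"
    return "ok" if actual == expected_ip else "wrong_ip"


# Follow-up steps named in the tree for the two failure outcomes.
NEXT_STEP = {"unresolved": 16, "wrong_ip": 17}
```

The default resolver, `socket.gethostbyname`, returns a single IPv4 address, so a host with several A records may need a fuller comparison than this sketch performs.
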

**Step 9: Check VDA in Citrix Studio**

*Action: Open Studio > Machine Catalogs*

- **VDA shows "Registered"** → Issue resolved!
- **VDA shows "Unregistered"** → Step 18: Check ListOfDDCs registry
- **VDA not in catalog** → Step 19: Add VDA to catalog

**Step 10: Check Firewall Rules**

*Between VDA and DDC*

- **Windows Firewall blocking** → Create rule to allow DDC traffic
- **Hardware firewall blocking** → Step 15: Update SonicWall rules
- **NSG rules (if Azure)** → Add allow rule for ports 80, 443, 1494, 2598

**Step 11: Wait and Verify Registration**

*Action: Wait 60 seconds, refresh Studio*

- **Now registered** → Resolution confirmed!
- **Still unregistered** → Step 18: Check ListOfDDCs
- **Shows error in Studio** → Step 20: Check specific error code

**Step 12: Check Event Viewer**

*Action: Application log, filter for Citrix*

- **Error 1001 (cannot contact DDC)** → Step 4: Check connectivity
- **Error 1006 (auth failure)** → Step 21: Check machine account
- **Error 1035 (database connection failed)** → Escalate to DDC troubleshooting

**Step 13: Check Service Dependencies**

*Action: Check dependent services*

- **NetLogon stopped** → Start NetLogon first
- **Remote Registry stopped** → Start Remote Registry
- **Windows Event Log stopped** → Critical, may need reboot

**Step 15: Check SonicWall Rules**

*Between VDA subnet and DDC subnet*

- **No rule exists** → Create LAN→LAN allow rule for Citrix ports
- **Rule exists but wrong ports** → Add ports 80, 443, 1494, 2598
- **Rule exists, looks correct** → Check packet capture on SonicWall

**Step 16: Check VDA DNS Settings**

*Action: `Get-DnsClientServerAddress` on VDA*

- **Points to wrong DNS** → Set to correct DNS server
- **Points to correct DNS** → Step 17: Check DNS server itself
- **No DNS configured** → Configure DNS, restart VDA

**Step 17: Check DNS A Record**

*On DNS server*

- **A record correct** → Clear DNS cache on VDA
- **A record wrong IP** → Update A record, clear cache
- **A record missing** → Create A record for DDC

**Step 18: Check ListOfDDCs Registry**

*Action: Check `HKLM\Software\Citrix\VirtualDesktopAgent`, value `ListOfDDCs`*

- **Points to correct DDC** → Step 22: Re-register VDA manually
- **Points to old/wrong DDC** → Update registry to correct DDC name
- **Registry key missing** → Run Citrix VDA installer repair

**Step 19: Add VDA to Catalog**

*In Citrix Studio*

- **Added successfully** → VDA should register within 60 seconds
- **Cannot add (not found)** → Step 1: Network connectivity issue
- **Cannot add (duplicate)** → VDA may be in different catalog, search

**Step 21: Check Machine Account**

*In Active Directory*

- **Account exists, enabled** → Step 23: Check computer trust relationship
- **Account disabled** → Enable account, restart VDA
- **Account missing** → Re-join VDA to domain

**Step 22: Re-register VDA Manually**

*Action: Run `"C:\Program Files\Citrix\Virtual Desktop Agent\BrokerAgent.exe" -RegisterWithDDC`*

- **Registration successful** → Verify in Studio
- **Registration failed** → Check error message, return to Step 4
- **Command not found** → VDA install corrupted, reinstall

**Step 23: Check Computer Trust Relationship**

*Action: `Test-ComputerSecureChannel` on VDA*

- **Trust relationship good** → Back to Step 2
- **Trust relationship broken** → Repair: `Reset-ComputerMachinePassword`
- **Repair failed** → Re-join domain

**RESOLUTION: VDA shows as Registered in Studio**

### Common Pitfalls

- Firewall blocking ports 80/443 between VDA and DDC
- DNS not resolving DDC hostname
- ListOfDDCs registry points to old/decommissioned DDC
- Machine account password expired or trust relationship broken
- VDA service won't stay running due to corrupt installation
- Network routing issue between VDA and DDC subnets
- VDA trying to register to wrong DDC in multi-site setup

### Resolution Indicators

- VDA shows "Registered" in Citrix Studio
- Users can successfully launch sessions to VDA
- No Citrix errors in Event Viewer
- VDA appears in correct delivery group

### Documentation Links

- VDA Registration: https://docs.citrix.com/en-us/citrix-virtual-apps-desktops/manage-deployment/vda-registration
- Troubleshooting: https://support.citrix.com/article/CTX136668
- Event Log Errors: https://support.citrix.com/article/CTX127348

---

## Scenario 3: User Cannot Access File Share

### Issue Details

- **Issue Name:** User Cannot Access Network File Share
- **Category:** File Services / Permissions
- **Estimated Time:** 5-15 minutes
- **Common For:** All clients with file servers

### First Thing You Check

Can the user ping the file server?

### Decision Tree

**Step 1: Can user ping file server by name?**

*Action: `ping FILE-SERVER-NAME`*

- **YES (replies)** → Step 2: Can user access share path
- **NO (timeout/host unreachable)** → Step 3: Network connectivity issue
- **Unknown host** → Step 4: DNS resolution issue

**Step 2: Can user access `\\server\share` in File Explorer?**

*Action: Navigate to `\\SERVER\SHARE`*

- **YES, opens** → Step 5: Check specific folder permissions
- **NO, access denied** → Step 6: Check share permissions
- **NO, network path not found** → Step 7: Check SMB service

**Step 3: Network Connectivity Issue**

*Troubleshooting layer 3*

- **User on VPN** → Step 8: Check VPN tunnel status
- **User on different site** → Step 9: Check site-to-site connectivity
- **Server on different VLAN** → Check inter-VLAN routing
- **Cable unplugged** → Physical issue

**Step 4: DNS Resolution Issue**

*Action: `nslookup FILE-SERVER-NAME`*

- **Resolves to correct IP** → Try accessing by IP: `\\192.168.1.10\share`
- **Does not resolve** → Step 10: Check DNS configuration
- **Resolves to wrong IP** → Step 11: Update DNS record

**Step 5: Can user access specific folder?**

*Action: Open `\\server\share\specific-folder`*

- **YES** → Issue resolved!
- **NO, access denied** → Step 12: Check NTFS permissions on folder
- **Folder doesn't exist** → Verify correct path, check if moved

**Step 6: Check Share Permissions**

*Action: Right-click share > Properties > Sharing > Permissions*

- **User has Read or Change** → Step 12: Check NTFS permissions
- **User not in permissions** → Step 13: Add user to share permissions
- **Everyone has Full Control** → Share perms OK, issue is NTFS

**Step 7: Check SMB Service**

*Action: `Get-Service -Name LanmanServer` on file server*

- **Running** → Step 14: Check SMB signing requirements
- **Stopped** → Start service, verify user can access
- **Disabled** → Enable and start service

**Step 8: Check VPN Tunnel**

*If user is remote*

- **VPN connected** → Step 15: Check VPN routing for file server subnet
- **VPN disconnected** → Reconnect VPN, retry
- **VPN connected but can't reach internal** → Step 16: Check split tunneling

**Step 9: Site-to-Site Connectivity**

*Between user's site and file server site*

- **Ping works between sites** → Not a site link issue
- **Ping fails between sites** → Step 17: Check VPN tunnel between sites
- **Some services work, files don't** → Check port 445 specifically

**Step 10: Check User's DNS Settings**

*Action: `ipconfig /all` on user's PC*

- **DNS points to DC** → Step 18: Check DNS server health
- **DNS points to wrong server** → Set correct DNS via DHCP or static
- **No DNS configured** → Configure DNS

**Step 12: Check NTFS Permissions**

*Action: Right-click folder > Properties > Security*

- **User has Read & Execute** → User should have access
- **User not listed** → Step 19: Check group memberships
- **User has Deny** → Step 20: Remove explicit Deny

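Steps 6 and 12 check two separate permission layers. For access over the network, the more restrictive of the share permission and the NTFS permission is what the user actually gets, and a Deny that applies to the user blocks access regardless. The following is a minimal Python sketch of that rule. The four-level scale in `LEVELS` is a simplification introduced here (NTFS Modify and share Change are treated as the same level), and `effective_access` is a hypothetical helper name.

```python
# Simplified access levels, ordered from least to most permissive (an assumption).
LEVELS = {"none": 0, "read": 1, "change": 2, "full": 3}


def effective_access(share: str, ntfs: str, deny_applies: bool = False) -> str:
    """Over the network, the more restrictive of share and NTFS access applies."""
    if deny_applies:
        return "none"
    rank = min(LEVELS[share], LEVELS[ntfs])
    return next(name for name, value in LEVELS.items() if value == rank)


print(effective_access("full", "read"))   # prints read
print(effective_access("none", "full"))   # prints none
```

The second call is the "Everyone has Full Control on NTFS but the user is not in the share permissions" case: the user still gets nothing.
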

**Step 13: Add User to Share Permissions**

*Action: Add user or user's group with appropriate access*

- **Added successfully** → User should now be able to access
- **Cannot add (grayed out)** → Check if Advanced Sharing is needed
- **Added but still fails** → Step 12: Check NTFS permissions

**Step 14: Check SMB Signing**

*Action: Check SMB server/client signing requirements*

- **Client requires signing, server doesn't** → Enable signing on server
- **Mismatch in SMB versions** → Step 21: Enable SMB 2.0/3.0
- **Settings match** → Not SMB signing issue

**Step 15: Check VPN Routing**

*Verify file server subnet is routed through VPN*

- **Route exists** → Check firewall rules on VPN
- **Route missing** → Add route for file server subnet
- **Route exists but traffic blocked** → Step 22: Check firewall

**Step 17: Check Site-to-Site VPN**

*Between locations*

- **Tunnel up** → Step 23: Check Phase 2 includes port 445
- **Tunnel down** → Troubleshoot VPN (separate tree)
- **Tunnel flapping** → Check for routing loops

**Step 18: Check DNS Server**

*On domain controller/DNS server*

- **DNS service running** → Check if A record exists for file server
- **DNS service stopped** → Start DNS service
- **High CPU/memory** → May need DNS server restart

**Step 19: Check Group Memberships**

*Action: Check what groups user belongs to*

- **User in correct group** → Step 24: Refresh user token (log off and back on)
- **User not in group** → Add user to appropriate group
- **User added recently** → User needs to log off and back on

**Step 20: Remove Explicit Deny**

*An explicit Deny takes precedence over Allow entries; an inherited Deny can be overridden by an explicit Allow set directly on the folder*

- **Deny removed** → User should now have access
- **Deny is inherited** → Step 25: Check parent folder permissions
- **Cannot remove (grayed out)** → Disable inheritance, then remove

**Step 21: Enable SMB 2.0/3.0**

*Action: Enable SMB versions on server*

- **Enabled successfully** → User should now connect
- **Already enabled** → Check Windows version compatibility
- **Cannot enable** → OS version too old, may need upgrade

**Step 24: Refresh User Token**

*Action: Have user log off and back on (or run `klist purge`)*

- **After logoff/logon, works** → Resolution confirmed
- **Still fails after logoff** → Step 26: Check effective permissions

**Step 26: Check Effective Permissions**

*Action: Advanced Security > Effective Access*

- **Shows user should have access** → Step 27: Check for inheritance issues
- **Shows user has no access** → Permission configuration error
- **Tool shows access but user still can't** → Clear SMB cache

**RESOLUTION: User can access share and specific folders**

### Common Pitfalls

- User has NTFS permissions but not share permissions (or vice versa)
- User added to group but hasn't logged off/on to refresh token
- Explicit Deny permission overriding Allow permissions
- DNS not resolving file server name
- Firewall blocking port 445 (SMB)
- DFS namespace issues (different issue, separate tree)
- Offline Files caching causing stale view

### Resolution Indicators

- User can open `\\server\share`
- User can create/modify files if they should have write access
- File Explorer shows correct folders
- No "Access Denied" or "Network Path Not Found" errors

### Documentation Links

- SMB Troubleshooting: https://docs.microsoft.com/en-us/windows-server/storage/file-server/troubleshoot/
- File Permissions: Internal KB #NTFS-PERMS-001
- DFS Issues: Internal KB #DFS-TROUBLESHOOT

---

## Scenario 4: Active Directory Replication Failure

### Issue Details

- **Issue Name:** Active Directory Replication Not Working
- **Category:** Active Directory / Infrastructure
- **Estimated Time:** 15-30 minutes
- **Common For:** Multi-DC environments, especially after DC issues

### First Thing You Check

Can the DCs ping each other?

### Decision Tree

**Step 1: Can DCs ping each other by name?**

*Action: `Test-Connection` between all DCs*

- **YES, all reply** → Step 2: Check replication status
- **NO, some don't reply** → Step 3: Network connectivity issue
- **Name doesn't resolve** → Step 4: DNS issue

**Step 2: What does `repadmin /showrepl` show?**

*Action: `repadmin /showrepl` on each DC*

- **Last replication: recent (< 1 hour)** → Replication working
- **Last replication: old (> 3 hours)** → Step 5: Check for specific errors
- **Replication failing with error** → Step 6: Identify error code

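Step 2 gives two thresholds (under 1 hour is recent, over 3 hours is old) and leaves the range in between undefined. The following is a minimal Python sketch of that bucketing, assuming the timestamp of the last successful replication has already been read from the `repadmin` output. `classify_replication_age` is a hypothetical name, and the `"borderline"` label for the 1 to 3 hour gap is an assumption added here, not something the tree defines.

```python
from datetime import datetime, timedelta


def classify_replication_age(last_success: datetime, now: datetime) -> str:
    """Bucket the age of the last successful replication using the Step 2 thresholds."""
    age = now - last_success
    if age < timedelta(hours=1):
        return "recent"
    if age > timedelta(hours=3):
        return "old"
    # Between 1 and 3 hours: not covered by the tree; this label is an assumption.
    return "borderline"
```

A `"borderline"` result is a reasonable cue to re-check later or to force replication rather than to start error analysis immediately.
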

**Step 3: Network Connectivity Between DCs**

*Layer 3 troubleshooting*

- **Different sites** → Step 7: Check site link configuration
- **Firewall between DCs** → Step 8: Check firewall rules
- **Same site but can't reach** → Check switches, VLANs

**Step 4: DNS Issues Between DCs**

*Action: `nslookup DC-NAME` from other DC*

- **Resolves correctly** → Not DNS issue, back to Step 1
- **Doesn't resolve** → Step 9: Check DNS zone replication
- **Resolves to wrong IP** → Step 10: Update DNS A record

**Step 5: Check for Specific Replication Errors**

*Review `repadmin` output*

- **"Last attempt was successful"** → False alarm, replication OK
- **Shows specific error code** → Step 6: Identify error code
- **No errors but time is old** → Step 11: Force replication

**Step 6: Identify Replication Error Code**

*Common error codes*

- **Error 8606 (insufficient attributes)** → Step 12: Metadata cleanup needed
- **Error 8451/8452 (naming context)** → Step 13: Name server not advertising
- **Error 1722 (RPC server unavailable)** → Step 14: RPC/firewall issue
- **Error 1256 (domain trust issue)** → Step 15: Secure channel problem
- **Error 8614 (version mismatch)** → Step 16: Schema version issue

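The routing in Step 6 is a plain lookup from error code to follow-up step. The following is a minimal Python sketch. It assumes a failure line contains the text `result <code>`, which is an assumption about the `repadmin` output format and should be checked against real output. The sample line in the usage note is illustrative, not captured output, and the function names are hypothetical.

```python
import re
from typing import Optional

# Error code -> follow-up step, copied from the Step 6 branches.
ERROR_TO_STEP = {8606: 12, 8451: 13, 8452: 13, 1722: 14, 1256: 15, 8614: 16}


def extract_result_code(line: str) -> Optional[int]:
    """Pull the numeric code out of a line containing 'result <code>' (assumed format)."""
    match = re.search(r"result (\d+)", line)
    return int(match.group(1)) if match else None


def next_step_for_line(line: str) -> Optional[int]:
    """Return the follow-up step for a failure line, or None if the code is unlisted."""
    code = extract_result_code(line)
    return ERROR_TO_STEP.get(code) if code is not None else None
```

For an illustrative line such as `"Last attempt @ 2024-01-01 07:00:40 failed, result 1722 (0x6ba):"`, `next_step_for_line` returns `14`. A `None` result means the code is not one of the six listed and needs manual lookup.
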

**Step 7: Check Site Link Configuration**

*Action: Check AD Sites and Services*

- **Site link exists** → Step 17: Check site link schedule
- **No site link** → Create site link between sites
- **Link cost too high** → Adjust link cost if needed

**Step 8: Check Firewall Rules Between DCs**

*Required ports for AD replication*

- **Ports 53, 88, 135, 389, 445, 636, 3268, 49152+ open** → Not firewall issue
- **Some ports blocked** → Step 18: Open required AD ports
- **All ports open but still fails** → Back to Step 6 for errors

**Step 9: Check DNS Zone Replication**

*Action: Check `_msdcs` zone on both DCs*

- **Zone present on both** → Step 19: Check SRV records
- **Zone missing on one DC** → Step 20: Force DNS zone replication
- **Zone present but not replicating** → Check DNS application partition

**Step 11: Force Replication**

*Action: `repadmin /syncall /AdeP`*

- **Replication succeeded** → Check if ongoing or one-time issue
- **Still failing** → Step 6: Check specific error
- **Partially succeeded** → Identify which DCs are failing

**Step 12: Metadata Cleanup for Error 8606**

*Action: `ntdsutil` metadata cleanup*

- **Phantom DC found** → Remove phantom DC object
- **No phantoms** → Step 21: Check USN rollback
- **Cleanup completed** → Force replication, verify

**Step 13: Name Server Not Advertising (8451/8452)**

*DC not advertising itself properly*

- **Netlogon service stopped** → Start Netlogon service
- **Netlogon running** → Step 22: Re-register Netlogon DNS records
- **After re-register, still fails** → Check DNS zone for SRV records

**Step 14: RPC Server Unavailable (1722)**

*RPC connectivity issue*

- **Port 135 blocked** → Step 8: Open port 135
- **Port open but RPC fails** → Step 23: Check RPC service status
- **RPC service running** → Check endpoint mapper

**Step 15: Secure Channel Problem (1256)**

*Computer account trust issue*

- **Password mismatch** → Step 24: Reset computer account
- **Account locked** → Unlock computer account in AD
- **Account missing** → Serious issue, may need DC demotion/promotion

**Step 16: Schema Version Mismatch (8614)**

*Schema versions don't match*

- **One DC has older schema** → Step 25: Update schema on older DC
- **Schema versions match** → May be false positive, check metadata

**Step 17: Check Site Link Schedule**

*Action: Site link properties > Change Schedule*

- **Replication blocked in current time** → Wait or adjust schedule
- **Schedule allows replication** → Step 26: Check site link cost
- **Schedule set to never** → Configure proper schedule

**Step 18: Open Required AD Ports**

*On firewall between DCs*

- **Rules added** → Test replication after 5 minutes
- **Cannot add rules** → Escalate to network team
- **Rules exist but traffic blocked** → Check for other firewalls

**Step 19: Check SRV Records**

*Action: `nslookup -type=SRV _ldap._tcp.dc._msdcs.DOMAIN`*

- **Both DCs listed** → DNS is good
- **One DC missing** → Step 22: Re-register DNS
- **No DCs listed** → Critical DNS issue, Step 20

**Step 20: Force DNS Zone Replication**

*Action: `repadmin /replicate` for DNS partitions*

- **DNS replicated** → Verify SRV records now present
- **DNS replication failed** → Check for DNS-specific errors
- **Partial replication** → May need multiple attempts

**Step 22: Re-register Netlogon DNS Records**

*Action: `nltest /dsregdns` on problem DC*

- **Registration succeeded** → Check DNS for new SRV records
- **Registration failed** → Check DNS service, Event Viewer
- **Succeeded but records still missing** → Manual creation needed

**Step 23: Check RPC Service**

*Action: `Get-Service RPCSS`*

- **Running** → Step 27: Check RPC port range
- **Stopped** → Start RPCSS service (critical!)
- **Stuck starting** → Reboot DC (after-hours if possible)

**Step 24: Reset Computer Account**

*Action: `Reset-ComputerMachinePassword -Server PDC`*

- **Reset successful** → Force replication, verify
- **Reset failed** → May need to reset from authoritative DC
- **After reset, still fails** → Deeper trust issue, may need demotion

**Step 27: Check RPC Port Range**

*Action: Check dynamic port range*

- **Default range (49152-65535)** → Range is fine
- **Custom restricted range** → Step 28: Ensure both DCs use same range
- **No dynamic ports available** → Exhaustion issue, investigate

**RESOLUTION: `repadmin /showrepl` shows recent successful replication on all DCs**

### Common Pitfalls

- Firewall blocking high ports (49152+) needed for RPC
- DNS SRV records missing or incorrect
- Phantom domain controller objects in AD Sites and Services
- Secure channel broken between DCs
- Time skew between DCs (> 5 minutes causes Kerberos failures)
- Antivirus blocking AD replication traffic
- Incorrect site link configuration

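The time-skew pitfall is easy to check once the current time has been read from each DC. The following is a minimal Python sketch of the comparison, using the 5-minute threshold quoted in the list. How the two timestamps are collected is left out, and `skew_exceeds_limit` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=5)  # threshold quoted in the pitfalls list


def skew_exceeds_limit(dc_a_time: datetime, dc_b_time: datetime,
                       limit: timedelta = MAX_SKEW) -> bool:
    """Return True when the two clocks differ by more than the limit."""
    return abs(dc_a_time - dc_b_time) > limit
```

The check is symmetric, so the order in which the two DC times are passed does not matter.
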
### Resolution Indicators

- `repadmin /showrepl` shows successful replication within last hour
- No replication errors in Directory Services event log
- `dcdiag /test:replications` passes
- Changes propagate between DCs within expected timeframe
- No Event ID 2042 (too long since last replication)

### Documentation Links

- AD Replication: https://docs.microsoft.com/en-us/windows-server/identity/ad-ds/manage/troubleshoot/
- Repadmin Guide: https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/cc770963(v=ws.11)
- Error Codes: Internal KB #AD-REPL-ERRORS

---

## Scenario 5: Password Reset Request (Simple Example)

### Issue Details

- **Issue Name:** User Forgot Password - Needs Reset
- **Category:** Account Management
- **Estimated Time:** 2-5 minutes
- **Common For:** Daily helpdesk task

### First Thing You Check

Verify the user's identity.

### Decision Tree

**Step 1: Can you verify user's identity?**

*Check against company verification policy*

- **YES (verified via phone/email/manager)** → Step 2: Locate user account
- **NO (cannot verify)** → Deny request, inform user of verification process
- **User is contractor** → Step 3: Check if manager approval required

**Step 2: Can you find user account in AD?**

*Action: Search Active Directory for username*

- **Account found** → Step 4: Check account status
- **Account not found** → Step 5: Check if name spelled correctly
- **Multiple accounts** → Step 6: Identify correct account

**Step 3: Manager Approval for Contractor**

*Per company policy*

- **Manager approves** → Step 2: Proceed with reset
- **Manager denies** → Inform contractor, deny request
- **Cannot reach manager** → Escalate to IT manager

**Step 4: Is account enabled?**

*Check account status*

- **Enabled** → Step 7: Reset password
- **Disabled** → Step 8: Check why disabled
- **Locked out** → Step 9: Unlock and reset

**Step 5: Check Name Spelling**

*Verify with user*

- **Found with correct spelling** → Step 4: Check status
- **Still not found** → Check if account exists, may need creation
- **User doesn't have account** → Route to new user request process

**Step 6: Identify Correct Account**

*Multiple John Smiths, etc.*

- **Identified by employee ID** → Step 4: Proceed
- **Identified by department** → Step 4: Proceed
- **Cannot identify** → Ask user for more info (start date, manager, etc.)

**Step 7: Reset Password**

*Action: Set temporary password in AD*

- **Reset successful** → Step 10: Communicate new password to user
- **Cannot reset (permission denied)** → Escalate to higher-level admin
- **Reset but user still can't login** → Step 11: Check for other issues

**Step 8: Account Disabled - Check Why**

*Look at account notes or ticket history*

- **Disabled for termination** → Do not enable, inform requester
- **Disabled for inactivity** → Step 12: Verify if user still employed
- **Disabled in error** → Enable account and reset password

**Step 9: Unlock Account**

*Action: Unlock account in AD*

- **Unlocked successfully** → Step 7: Reset password
- **Unlock failed** → Wait 15 minutes (lockout duration), try again
- **Immediately locks again** → Step 13: Check for automated login attempts

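When an unlock fails, Step 9 says to wait out the 15-minute lockout. The remaining wait can be computed from the time the account locked. The following is a minimal Python sketch, assuming the lockout time is already known. The 15-minute default is the value quoted in the step and may not match the actual domain policy, and `lockout_remaining` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

LOCKOUT_DURATION = timedelta(minutes=15)  # value from Step 9; actual policy may differ


def lockout_remaining(locked_at: datetime, now: datetime,
                      duration: timedelta = LOCKOUT_DURATION) -> timedelta:
    """Time left until the lockout lapses; zero if it has already expired."""
    remaining = locked_at + duration - now
    return max(remaining, timedelta(0))
```

A zero result means the lockout window has passed, so a repeated lock at that point suggests the automated-login branch (Step 13) rather than more waiting.
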

**Step 10: Communicate New Password**

*Securely provide temp password*

- **Told user over phone** → Instruct user that they must change it at next login
- **Sent via secure portal** → Provide portal link
- **User received password** → Step 14: Verify user can login

**Step 11: Reset Success But Login Failed**

*After reset, user still can't login*

- **Wrong username** → Provide correct username
- **Caps Lock on** → Inform user
- **Password not synced yet** → Wait 2-3 minutes, retry
- **MFA issue** → Different troubleshooting path

**Step 12: Verify User Still Employed**

*Check with HR or manager*

- **Still employed** → Enable account, reset password
- **Terminated** → Do not enable, close ticket
- **Unknown status** → Escalate to IT manager

**Step 13: Check for Automated Login Attempts**

*Look for stale saved credentials on other devices or services*

- **Old laptop auto-logging** → Have user change password on laptop
- **Mobile device** → Remove saved password on phone
- **Service account** → Update service account password
- **Can't identify source** → Change password multiple times

**Step 14: Verify User Can Login**

*Confirm with user*

- **Login successful** → Step 15: Set user must change password
- **Still cannot login** → Return to Step 11
- **Login works but can't access email** → Different issue

**Step 15: Force Password Change at Next Login**

*If not already set*

- **User will be prompted** → Document ticket, close
- **User successfully changed** → Resolution confirmed!
- **User locked out again** → May be complexity requirement issue

**RESOLUTION: User successfully logged in with new password**

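Because this scenario is the simple example, it is a convenient one for showing how a tree like this can be stored and walked as data. The following is a minimal Python sketch covering only Steps 1, 2, 4, and 7. The short outcome labels such as `"verified"` and `"found"` are abbreviations introduced here, and `TREE` and `walk` are hypothetical names rather than the app's actual schema.

```python
from typing import Union

# Excerpt of Scenario 5: step -> {outcome: next step number or terminal action}.
TREE: dict[int, dict[str, Union[int, str]]] = {
    1: {"verified": 2, "not verified": "Deny request", "contractor": 3},
    2: {"found": 4, "not found": 5, "multiple": 6},
    4: {"enabled": 7, "disabled": 8, "locked out": 9},
    7: {"reset successful": 10, "permission denied": "Escalate"},
}


def walk(tree, start: int, outcomes: list[str]) -> list[Union[int, str]]:
    """Follow the recorded outcomes from `start` and return the path taken."""
    path: list[Union[int, str]] = [start]
    current: Union[int, str] = start
    for outcome in outcomes:
        if current not in tree:
            break  # reached a terminal action or a step outside the excerpt
        current = tree[current][outcome]
        path.append(current)
    return path


print(walk(TREE, 1, ["verified", "found", "enabled", "reset successful"]))
# prints [1, 2, 4, 7, 10]
```

Recording the path this way gives a session log of which branches a technician took, which is the kind of data the resolution notes for a ticket can be built from.
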
### Common Pitfalls

- Not verifying user identity properly
- Forgetting to check if account is locked (not just disabled)
- Not telling user to change password at next login
- Multiple accounts for same name, resetting wrong one
- Account syncs slowly to other systems (email, VPN, etc.)
- User typing username incorrectly after reset

### Resolution Indicators

- User confirms successful login
- Account shows last login timestamp updated
- No subsequent lockout or password reset requests
- User able to access all required systems

### Documentation Links

- Password Policy: Internal KB #PWD-POLICY
- Identity Verification: Internal KB #ID-VERIFY
- Account Management: Internal KB #AD-ACCOUNTS