#!/usr/bin/env python3
"""
ResolutionFlow Decision Trees - Batch 2b: Active Directory / Entra ID
Six AD/Entra ID troubleshooting trees for MSP engineers.
Imported by seed_trees_v2.py for seeding.
"""
from typing import Any
def get_repeated_lockout_tree() -> dict[str, Any]:
"""User Account Locked Out (Repeated) - AD tree."""
return {
"name": "User Account Locked Out (Repeated)",
"description": "Investigate and resolve repeated Active Directory account lockouts. Covers lockout source identification, common causes like stale credentials, service accounts, and mobile devices, with PowerShell diagnostics.",
"category": "Active Directory",
"tree_structure": {
"id": "root",
"type": "decision",
"question": "Is this a one-time lockout or has the user been locked out multiple times recently?",
"help_text": "Check AD account properties and recent lockout history. A single lockout is usually a forgotten password; repeated lockouts indicate a deeper issue.",
"options": [
{"id": "one_time", "label": "First or one-time lockout", "next_node_id": "simple_unlock"},
{"id": "repeated", "label": "Multiple lockouts (keeps happening)", "next_node_id": "find_lockout_source"},
{"id": "many_users", "label": "Multiple users getting locked out", "next_node_id": "check_brute_force"}
],
"children": [
{
"id": "simple_unlock",
"type": "action",
"title": "Unlock Account and Verify",
"description": "Simple lockout — unlock and confirm.\n\n**PowerShell:**\n```\nUnlock-ADAccount -Identity \"username\"\nGet-ADUser -Identity \"username\" -Properties LockedOut,PasswordLastSet,PasswordExpired\n```\n\n**Ask the user:**\n- Did you recently change your password?\n- Are you typing the right password?\n- Is Caps Lock on?\n\n**If password expired:** Reset it.\n**If user forgot password:** Reset and have them set a new one at next login.",
"next_node_id": "verify_simple_unlock"
},
{
"id": "verify_simple_unlock",
"type": "decision",
"question": "Can the user log in successfully now?",
"help_text": "Have the user try logging in after the unlock",
"options": [
{"id": "success", "label": "Yes, user is logged in", "next_node_id": "solution_simple_unlock"},
{"id": "locked_again", "label": "User got locked out again within minutes", "next_node_id": "find_lockout_source"},
{"id": "wrong_password", "label": "User says password is wrong (but it's correct in AD)", "next_node_id": "check_password_sync"}
],
"children": [
{
"id": "solution_simple_unlock",
"type": "solution",
"title": "Resolved: Account Unlocked",
"description": "Simple lockout resolved by unlocking the account.\n\n**Ticket Notes:** Account was locked due to failed login attempts. Unlocked via PowerShell. User confirmed successful login.\n\n**If this recurs:** Use the 'repeated lockout' path to investigate the source."
},
{
"id": "check_password_sync",
"type": "action",
"title": "Check Password Sync Status",
"description": "User's password works in AD but not at the login prompt. This may be a sync/replication issue.\n\n**Check AD replication:**\n```\nrepadmin /replsummary\nrepadmin /showrepl\n```\n\n**Check which DC the user is authenticating against:**\n```\nnltest /dsgetdc:yourdomain.local\necho %LOGONSERVER%\n```\n\n**If using Entra ID / M365:** Check if password hash sync is current in Entra Connect.\n\n**Common cause:** Password was reset on DC1 but DC2 hasn't replicated yet. User's workstation is authenticating against DC2.",
"next_node_id": "find_lockout_source"
}
]
},
{
"id": "find_lockout_source",
"type": "action",
"title": "Identify Lockout Source Computer",
"description": "Find which computer or device is causing the lockouts.\n\n**Step 1: Find the PDC Emulator** (lockout events are forwarded here):\n```\nGet-ADDomain | Select PDCEmulator\n```\n\n**Step 2: Query lockout events on the PDC** (filter by user first, then keep the 20 most recent matches):\n```\nGet-WinEvent -ComputerName <PDC_NAME> -FilterHashtable @{\n LogName='Security'\n Id=4740\n} | Where-Object {\n $_.Properties[0].Value -eq 'username'\n} | Select TimeCreated,\n @{N='User';E={$_.Properties[0].Value}},\n @{N='SourceComputer';E={$_.Properties[1].Value}} -First 20\n```\n\n**Alternative:** Use Microsoft Account Lockout Status Tool (LockoutStatus.exe) for a GUI approach.\n\n**Document:** The source computer name and timestamps.",
"next_node_id": "lockout_source_result"
},
{
"id": "lockout_source_result",
"type": "decision",
"question": "What is the lockout source?",
"help_text": "The SourceComputer field in Event 4740 tells you where the bad attempts come from",
"options": [
{"id": "user_workstation", "label": "User's own workstation", "next_node_id": "check_cached_creds_workstation"},
{"id": "mobile_device", "label": "Mobile device or Exchange/ActiveSync", "next_node_id": "check_mobile_device"},
{"id": "server", "label": "A server (file server, app server, etc.)", "next_node_id": "check_service_account"},
{"id": "multiple_sources", "label": "Multiple different source computers", "next_node_id": "check_brute_force"},
{"id": "cant_determine", "label": "Source is blank or can't determine", "next_node_id": "enable_netlogon_logging"}
],
"children": [
{
"id": "check_cached_creds_workstation",
"type": "action",
"title": "Check for Cached/Saved Credentials on Workstation",
"description": "The user's own workstation is sending bad credentials.\n\n**Check on the user's workstation:**\n\n**1. Windows Credential Manager:**\n```\nrundll32.exe keymgr.dll, KRShowKeyMgr\n# Or: Control Panel > Credential Manager\n```\nLook for saved credentials with old passwords.\n\n**2. Mapped drives with saved credentials:**\n```\nnet use\n```\nCheck for drives mapped with explicit credentials.\n\n**3. Scheduled tasks running as the user:**\n```\nGet-ScheduledTask | Where-Object {$_.Principal.UserId -like '*username*'}\n```\n\n**4. Browser saved passwords** — check Edge, Chrome for saved domain passwords.\n\n**5. RDP saved connections** — check for .rdp files with saved credentials.",
"next_node_id": "cached_cred_result"
},
{
"id": "cached_cred_result",
"type": "decision",
"question": "Did you find stale credentials?",
"help_text": "Any saved password that doesn't match the current AD password will cause lockouts",
"options": [
{"id": "found_cred_manager", "label": "Found old entries in Credential Manager", "next_node_id": "fix_credential_manager"},
{"id": "found_mapped_drive", "label": "Found mapped drive with saved creds", "next_node_id": "fix_mapped_drives"},
{"id": "found_scheduled_task", "label": "Found scheduled task running as user", "next_node_id": "fix_scheduled_task"},
{"id": "nothing_found", "label": "Nothing obvious found", "next_node_id": "check_deeper_sources"}
],
"children": [
{
"id": "fix_credential_manager",
"type": "solution",
"title": "Resolved: Remove Stale Credential Manager Entries",
"description": "Old passwords saved in Credential Manager were causing lockouts.\n\n**Fix:**\n1. Open Credential Manager (Control Panel)\n2. Under 'Windows Credentials', find entries for your domain\n3. Remove or update entries with the correct password\n4. Restart the workstation\n5. Unlock the AD account: `Unlock-ADAccount -Identity \"username\"`\n\n**Prevention:** Educate user that after password changes, they should update saved credentials.\n\n**Ticket Notes:** Stale credentials in Credential Manager causing lockouts. Entries removed/updated."
},
{
"id": "fix_mapped_drives",
"type": "solution",
"title": "Resolved: Fix Mapped Drive Credentials",
"description": "A mapped network drive was using old credentials.\n\n**Fix:**\n```\n# Remove the problematic mapping\nnet use Z: /delete\n\n# Remap without saved credentials (will use current login)\nnet use Z: \\\\server\\share /persistent:yes\n```\n\n**Or use Group Policy** to manage drive mappings (preferred for enterprise).\n\n**After fixing:** Unlock the account and monitor for recurrence."
},
{
"id": "fix_scheduled_task",
"type": "solution",
"title": "Resolved: Fix Scheduled Task Credentials",
"description": "A scheduled task was running with the user's old password.\n\n**Fix:**\n1. Open Task Scheduler on the affected machine\n2. Find the task running as the user\n3. Update the password in the task properties\n\n**PowerShell:**\n```\nGet-ScheduledTask | Where-Object {$_.Principal.UserId -like '*username*'} | Select TaskName,TaskPath\n```\n\n**Best practice:** Scheduled tasks should use service accounts, not user accounts.\n\n**After fixing:** Unlock the account."
},
{
"id": "check_deeper_sources",
"type": "action",
"title": "Check Less Obvious Lockout Sources",
"description": "Common sources cleared. Check these less obvious causes:\n\n**1. Outlook/Teams on another device:**\nIs the user logged into Outlook or Teams on a second computer, tablet, or phone with old password?\n\n**2. WiFi authentication (802.1X):**\nIf your WiFi uses domain credentials, the saved WiFi password may be old.\n\n**3. VPN client:**\nSaved VPN credentials with old password.\n\n**4. Applications with saved logins:**\nLOB apps, web portals using Windows auth.\n\n**5. Another user's machine:**\nIs someone else trying to access a share using this person's credentials?\n\n**Ask the user:** Have you logged into any other devices recently? Changed your password recently? Using any company apps on your phone?",
"next_node_id": "escalate_persistent_lockout"
},
{
"id": "escalate_persistent_lockout",
"type": "solution",
"title": "Escalate: Persistent Lockout - Source Unknown",
"description": "Unable to identify the lockout source through standard methods.\n\n**Advanced investigation needed:**\n1. Enable detailed Netlogon logging on DCs\n2. Use network packet capture to find authentication attempts\n3. Review RADIUS/NPS logs if using 802.1X\n4. Check Entra ID sign-in logs for cloud auth attempts\n\n**Temporary workaround:**\n- Increase account lockout threshold temporarily\n- Or add user to a 'lockout exempt' fine-grained password policy (if available)\n\n**Escalate to:** Senior Systems Administrator\n**Include:** Event 4740 logs, source computers found, items already checked."
}
]
},
{
"id": "check_mobile_device",
"type": "solution",
"title": "Fix Mobile Device / Exchange ActiveSync",
"description": "A mobile device (phone/tablet) is sending old credentials via ActiveSync or Outlook mobile.\n\n**Fix:**\n1. Have the user update their password on their mobile device:\n - iPhone: Settings > Passwords & Accounts > Exchange > re-enter password\n - Android: Settings > Accounts > Exchange > update password\n - Outlook Mobile: Profile > Account > re-enter password\n2. If that doesn't work, remove and re-add the email account on the device\n\n**To confirm it's ActiveSync:**\nCheck Exchange/M365 ActiveSync logs for the user:\n```\nGet-MobileDeviceStatistics -Mailbox user@domain.com | Select DeviceFriendlyName,LastSyncAttemptTime,Status\n```\n\n**After fixing:** Unlock the AD account.\n\n**Prevention:** Consider using Intune or MDM to manage device password policies."
},
{
"id": "check_service_account",
"type": "solution",
"title": "Fix Service or Application Using User Credentials",
"description": "A server or application is using this user's credentials (usually incorrectly).\n\n**Check on the source server:**\n```\n# Services running as this user\nGet-WmiObject Win32_Service | Where-Object {$_.StartName -like '*username*'} | Select Name,StartName,State\n\n# IIS App Pools\nGet-IISAppPool | Where-Object {$_.ProcessModel.UserName -like '*username*'}\n\n# Scheduled Tasks\nGet-ScheduledTask | Where-Object {$_.Principal.UserId -like '*username*'}\n\n# COM+ Applications\n# Check via Component Services (dcomcnfg)\n```\n\n**Best practice:** Services should use dedicated service accounts (preferably Managed Service Accounts), never personal user accounts.\n\n**Fix:** Update the password in the service/app or migrate to a proper service account.\n\n**After fixing:** Unlock the AD account."
},
{
"id": "check_brute_force",
"type": "action",
"title": "Investigate Potential Brute Force Attack",
"description": "Multiple users getting locked out or lockouts from many different sources could indicate an attack.\n\n**Check Security Event Log for patterns:**\n```\n# Failed logon attempts (Event 4625)\nGet-WinEvent -FilterHashtable @{LogName='Security';Id=4625} -MaxEvents 100 |\n Group-Object {$_.Properties[5].Value} | Sort Count -Descending |\n Select Count,Name -First 20\n```\n\n**Red flags:**\n- Lockouts from unknown/external IPs\n- Lockouts happening at unusual hours\n- Many accounts targeted simultaneously\n- Attempts from multiple geographic locations\n\n**If this looks like an attack:**\n1. Do NOT just unlock accounts — investigate first\n2. Check if any accounts were actually compromised\n3. Review VPN and external-facing authentication logs",
"next_node_id": "brute_force_result"
},
{
"id": "brute_force_result",
"type": "decision",
"question": "Does this appear to be a security incident?",
"help_text": "Look at the pattern of lockouts, source IPs, and timing",
"options": [
{"id": "likely_attack", "label": "Yes, appears to be an attack / security incident", "next_node_id": "escalate_security"},
{"id": "not_attack", "label": "No, appears to be a system/config issue", "next_node_id": "check_common_mass_lockout"}
],
"children": [
{
"id": "escalate_security",
"type": "solution",
"title": "SECURITY INCIDENT: Escalate Immediately",
"description": "**Priority: CRITICAL — Potential security incident.**\n\n**Do NOT just unlock accounts.**\n\n**Immediate actions:**\n1. Document all affected accounts and lockout sources\n2. Check if any accounts show successful logins from suspicious IPs\n3. Preserve event logs for forensics\n4. Check if MFA was bypassed\n\n**Escalate to:** Security team / CISO immediately\n**Include:** Event log exports, list of affected accounts, source IPs, timeline\n\n**Consider:**\n- Blocking suspicious source IPs at the firewall\n- Forcing password resets for affected accounts\n- Enabling enhanced logging\n\n**Communication:** Follow your incident response plan."
},
{
"id": "check_common_mass_lockout",
"type": "solution",
"title": "Investigate Mass Lockout (Non-Security)",
"description": "Multiple users locked out but doesn't appear to be an attack.\n\n**Common causes of mass lockouts:**\n\n1. **Password policy change:** New policy locked accounts that don't comply\n2. **Application with hardcoded credentials:** An app using a shared credential that was changed\n3. **GPO change:** New GPO tightened lockout thresholds\n4. **Service account cascade:** A service account got locked, causing dependent services to fail and retry\n5. **Kerberos ticket issues:** Time sync problem between DCs and clients\n\n**Check:**\n```\n# Recent GPO changes\nGet-GPO -All | Sort ModificationTime -Descending | Select DisplayName,ModificationTime -First 10\n\n# Time sync\nw32tm /query /status\n```\n\n**Escalate to:** Senior AD Administrator with pattern analysis."
}
]
},
{
"id": "enable_netlogon_logging",
"type": "solution",
"title": "Enable Netlogon Logging for Detailed Tracking",
"description": "Event 4740 doesn't show the source. Enable Netlogon debug logging.\n\n**On the PDC Emulator:**\n```\n# Enable Netlogon debug logging\nnltest /dbflag:0x2080ffff\n\n# Log location\n# C:\\Windows\\debug\\netlogon.log\n```\n\n**Wait for the next lockout**, then search the log:\n```\nSelect-String -Path C:\\Windows\\debug\\netlogon.log -Pattern 'username'\n```\n\n**IMPORTANT:** Disable logging after troubleshooting:\n```\nnltest /dbflag:0x0\n```\n\nNetlogon logging is verbose and can fill disk space if left on.\n\n**Escalate to:** Senior AD admin if you need help interpreting the logs."
}
]
}
]
}
}
def get_ad_replication_tree() -> dict[str, Any]:
"""AD Replication Failures - Systems administration tree."""
return {
"name": "AD Replication Failures",
"description": "Diagnose and resolve Active Directory replication issues between domain controllers. Covers repadmin diagnostics, common error codes, DNS dependencies, and RPC connectivity troubleshooting.",
"category": "Active Directory",
"tree_structure": {
"id": "root",
"type": "decision",
"question": "How was the AD replication issue discovered?",
"help_text": "Replication failures can cause inconsistent data across DCs — different users see different results for passwords, group memberships, GPOs, and DNS.",
"options": [
{"id": "monitoring_alert", "label": "Monitoring alert / repadmin check", "next_node_id": "run_repl_diagnostics"},
{"id": "user_symptoms", "label": "User-reported symptoms (password not working on some PCs, etc.)", "next_node_id": "confirm_repl_issue"},
{"id": "dcdiag_failure", "label": "DCDiag reported failures", "next_node_id": "run_repl_diagnostics"},
{"id": "new_dc", "label": "New DC not replicating", "next_node_id": "check_new_dc"}
],
"children": [
{
"id": "confirm_repl_issue",
"type": "action",
"title": "Confirm This Is a Replication Issue",
"description": "User symptoms may or may not be replication. Quick check:\n\n```\nrepadmin /replsummary\n```\n\nIf you see failures or large 'number of failures' counts, replication is broken.\n\n**Also try:**\n```\nrepadmin /showrepl\ndcdiag /test:replications\n```\n\n**If replication looks healthy:** The user's issue is likely something else (password reset needed, group membership change, etc.)",
"next_node_id": "repl_confirmed"
},
{
"id": "repl_confirmed",
"type": "decision",
"question": "Does repadmin /replsummary show failures?",
"help_text": "Look for non-zero failure counts and error codes",
"options": [
{"id": "yes_failures", "label": "Yes, replication failures shown", "next_node_id": "run_repl_diagnostics"},
{"id": "no_failures", "label": "No, replication looks healthy", "next_node_id": "solution_repl_healthy"}
],
"children": [
{
"id": "solution_repl_healthy",
"type": "solution",
"title": "AD Replication is Healthy",
"description": "Replication is working correctly. The user's issue has a different root cause.\n\n**Common alternative causes for 'replication-like' symptoms:**\n- Password was recently changed and user hit a DC that hasn't processed it yet (wait 15 min, normal delay)\n- Group membership change (Kerberos ticket needs renewal — user must log out/in)\n- DNS stale record (different from AD replication)\n\n**Ticket Notes:** AD replication verified healthy. User issue has different root cause."
}
]
},
{
"id": "run_repl_diagnostics",
"type": "action",
"title": "Run Detailed Replication Diagnostics",
"description": "Gather comprehensive replication status.\n\n```\n# Summary of all replication partnerships\nrepadmin /replsummary\n\n# Detailed per-DC replication status\nrepadmin /showrepl * /csv > C:\\temp\\replstatus.csv\n\n# Check for lingering objects (advisory mode only reports, it removes nothing)\nrepadmin /removelingeringobjects <DEST_DC> <SOURCE_DC_GUID> <NAMING_CONTEXT_DN> /advisory_mode\n\n# Full DC health check\ndcdiag /v /c /d /e /s:<DC_NAME>\n```\n\n**Key things to note:**\n- Which DCs are failing?\n- What error codes are shown?\n- How long has replication been failing?\n- Is it one-way or both directions?",
"next_node_id": "repl_error_type"
},
{
"id": "repl_error_type",
"type": "decision",
"question": "What replication error code or message do you see?",
"help_text": "Check the error code in repadmin /showrepl output",
"options": [
{"id": "rpc_error", "label": "RPC server unavailable (Error 1722)", "next_node_id": "fix_rpc"},
{"id": "dns_error", "label": "DNS lookup failure (Error 8524)", "next_node_id": "fix_repl_dns"},
{"id": "access_denied", "label": "Access denied (Error 8453/5)", "next_node_id": "fix_repl_access"},
{"id": "schema_mismatch", "label": "Schema mismatch / version error", "next_node_id": "fix_schema"},
{"id": "other_error", "label": "Different error or not sure", "next_node_id": "general_repl_troubleshooting"}
],
"children": [
{
"id": "fix_rpc",
"type": "action",
"title": "Fix RPC Connectivity (Error 1722)",
"description": "AD replication uses RPC. Error 1722 means DCs can't communicate.\n\n**Test RPC connectivity:**\n```\n# Test from source DC to destination DC\nTest-NetConnection -ComputerName <TARGET_DC> -Port 135\nTest-NetConnection -ComputerName <TARGET_DC> -Port 445\n\n# Test RPC endpoint mapper\nportqry -n <TARGET_DC> -e 135\n```\n\n**Common causes:**\n- Firewall blocking RPC ports (135 + dynamic range 49152-65535)\n- DC is offline or unreachable\n- DNS returning wrong IP for the DC\n- Windows Firewall enabled with wrong rules\n\n**Check DNS resolution for the DC:**\n```\nnslookup <TARGET_DC_NAME>\nnslookup <TARGET_DC_FQDN>\n```",
"next_node_id": "rpc_result"
},
{
"id": "rpc_result",
"type": "decision",
"question": "Can you reach the target DC on port 135?",
"help_text": "Check TcpTestSucceeded in the Test-NetConnection output for port 135",
"options": [
{"id": "port_blocked", "label": "Port 135 blocked", "next_node_id": "escalate_rpc_firewall"},
{"id": "dc_offline", "label": "DC is completely unreachable", "next_node_id": "escalate_dc_offline"},
{"id": "port_open_still_fails", "label": "Port open but replication still fails", "next_node_id": "check_rpc_dynamic_ports"}
],
"children": [
{
"id": "escalate_rpc_firewall",
"type": "solution",
"title": "Escalate: Firewall Blocking AD Replication",
"description": "A firewall is blocking RPC between DCs.\n\n**Required ports for AD replication:**\n- TCP 135 (RPC Endpoint Mapper)\n- TCP 389 (LDAP)\n- TCP 636 (LDAP SSL)\n- TCP 3268 (Global Catalog)\n- TCP 88 (Kerberos)\n- TCP 445 (SMB)\n- TCP 49152-65535 (RPC dynamic ports)\n - Or restrict to a fixed port range via registry\n\n**Escalate to:** Network team to open required ports between DCs.\n**Priority:** High — AD replication is critical infrastructure."
},
{
"id": "escalate_dc_offline",
"type": "solution",
"title": "Escalate: Domain Controller Offline",
"description": "The target DC is unreachable.\n\n**Check:**\n1. Is the server powered on? (hypervisor, iLO/iDRAC)\n2. Is the OS running? (try RDP, ping)\n3. Was it recently decommissioned?\n\n**If permanently offline:** The DC metadata needs to be cleaned from AD:\n```\nntdsutil\n metadata cleanup\n connections\n connect to server <WORKING_DC>\n quit\n select operation target\n list domains\n ...\n```\n\n**Escalate to:** Senior AD Administrator\n**Priority:** High"
},
{
"id": "check_rpc_dynamic_ports",
"type": "solution",
"title": "Check RPC Dynamic Port Range",
"description": "Port 135 is open but RPC dynamic ports may be blocked.\n\nAD replication uses dynamic RPC ports (49152-65535 by default).\n\n**To pin AD replication to a single fixed port** (makes firewall rules easier):\n```\n# On each DC - set one fixed RPC port for AD replication (50000 is an example value)\nreg add HKLM\\SYSTEM\\CurrentControlSet\\Services\\NTDS\\Parameters /v \"TCP/IP Port\" /t REG_DWORD /d 50000\n```\nRestart the NTDS service after.\n\n**Escalate to:** Network team with the fixed port and the dynamic port range information."
}
]
},
{
"id": "fix_repl_dns",
"type": "solution",
"title": "Fix DNS Issues Blocking Replication",
"description": "AD replication depends heavily on DNS. DCs find each other via SRV records.\n\n**Check DNS health:**\n```\n# Verify DC SRV records exist\nnslookup -type=srv _ldap._tcp.dc._msdcs.yourdomain.local\n\n# Re-register DC DNS records\nipconfig /registerdns\nnet stop netlogon && net start netlogon\n\n# Verify DNS on the DC\ndcdiag /test:dns /v\n```\n\n**Common causes:**\n- DC's DNS records missing or stale\n- DC pointing to wrong DNS server\n- DNS zone not replicating\n\n**Each DC should point to:** Itself and at least one other DC for DNS.\n\n**Escalate to:** DNS/AD Administrator if records are missing and won't re-register."
},
{
"id": "fix_repl_access",
"type": "solution",
"title": "Fix Access Denied Errors in Replication",
"description": "Replication is being denied — authentication or permission issue.\n\n**Common causes:**\n- Time skew between DCs (Kerberos requires <5 min difference)\n- Computer account password expired\n- Permissions removed from DC object in AD\n\n**Check time sync:**\n```\nw32tm /query /status\nw32tm /query /peers\n\n# Force time resync\nw32tm /resync /force\n```\n\n**If time is more than 5 minutes off:** Kerberos will fail. Fix time sync first.\n\n**Check secure channel:**\n```\nTest-ComputerSecureChannel -Verbose\nTest-ComputerSecureChannel -Repair\n```\n\n**Escalate to:** Senior AD Administrator if permissions or secure channel repair fails."
},
{
"id": "fix_schema",
"type": "solution",
"title": "Escalate: Schema Version Mismatch",
"description": "Schema versions don't match between DCs.\n\n**Check schema version:**\n```\nGet-ADObject (Get-ADRootDSE).schemaNamingContext -Properties objectVersion | Select objectVersion\n```\n\n**This usually happens when:** A DC was promoted or demoted improperly, or an AD upgrade (schema extension) partially completed.\n\n**This requires:** Senior AD administrator intervention. Do not attempt schema repairs without expertise.\n\n**Escalate to:** Senior AD Administrator / Directory Services specialist\n**Priority:** High — schema issues can corrupt the directory."
},
{
"id": "general_repl_troubleshooting",
"type": "solution",
"title": "General Replication Troubleshooting",
"description": "For errors not covered above, try these general steps:\n\n**1. Force replication:**\n```\nrepadmin /syncall /APed\n```\n\n**2. Check DC health:**\n```\ndcdiag /v /c\n```\n\n**3. Check event logs:**\n```\nGet-WinEvent -FilterHashtable @{LogName='Directory Service';Level=2,3} -MaxEvents 20\n```\n\n**4. Verify AD sites and subnets:**\nAD Sites and Services — are DCs in the correct sites? Are site links configured?\n\n**5. Check USN rollback:**\nIf a DC was restored from snapshot incorrectly, USN rollback can break replication permanently for that DC.\n\n**Escalate to:** Senior AD Administrator with dcdiag output and event logs."
}
]
},
{
"id": "check_new_dc",
"type": "solution",
"title": "Troubleshoot New DC Not Replicating",
"description": "A newly promoted DC isn't replicating.\n\n**Check in order:**\n\n1. **DNS:** Is the new DC registered in DNS? Can it resolve other DCs?\n```\nnslookup <OTHER_DC_NAME>\ndcdiag /test:dns\n```\n\n2. **Site assignment:** Is the new DC in the correct AD site?\n Open AD Sites and Services and verify.\n\n3. **Replication partners:** Does it have replication partners?\n```\nrepadmin /showrepl <NEW_DC_NAME>\n```\n\n4. **Initial replication:** After promotion, initial replication can take time. Wait 15-30 minutes.\n\n5. **Network:** Can the new DC reach other DCs on required ports?\n\n**If still not replicating after 30 minutes:** Run `dcdiag /v` and `repadmin /showrepl` and escalate with the output."
}
]
}
}
def get_gpo_not_applying_tree() -> dict[str, Any]:
"""Group Policy Not Applying - AD tree."""
return {
"name": "Group Policy Not Applying",
"description": "Troubleshoot Group Policy Objects that aren't applying to users or computers. Covers GPResult diagnostics, scope filtering, WMI filters, inheritance, and common GPO processing issues.",
"category": "Active Directory",
"tree_structure": {
"id": "root",
"type": "decision",
"question": "Is the GPO not applying to a single user/computer or multiple?",
"help_text": "This determines whether it's a scoping/targeting issue or a broader GPO infrastructure problem.",
"options": [
{"id": "single_target", "label": "Single user or computer", "next_node_id": "run_gpresult"},
{"id": "multiple_targets", "label": "Multiple users/computers", "next_node_id": "check_gpo_config"},
{"id": "new_gpo", "label": "Newly created GPO not working", "next_node_id": "check_new_gpo"},
{"id": "gpo_stopped", "label": "GPO was working but stopped", "next_node_id": "check_gpo_changes"}
],
"children": [
{
"id": "run_gpresult",
"type": "action",
"title": "Run GPResult on the Affected Machine",
"description": "GPResult shows exactly which GPOs are applied and which are filtered out.\n\n**PowerShell (as Administrator on the affected machine):**\n```\n# Full HTML report (most useful)\ngpresult /h C:\\temp\\gpresult.html\nstart C:\\temp\\gpresult.html\n\n# Quick console output\ngpresult /r\n\n# For a specific user\ngpresult /user domain\\username /r\n```\n\n**Look for your GPO in the report:**\n- Is it listed under 'Applied GPOs'?\n- Is it listed under 'Denied GPOs' or 'Filtered GPOs'?\n- Is it missing entirely?",
"next_node_id": "gpresult_result"
},
{
"id": "gpresult_result",
"type": "decision",
"question": "Where does your GPO appear in the GPResult report?",
"help_text": "Check both Computer Configuration and User Configuration sections",
"options": [
{"id": "applied", "label": "GPO shows as Applied but settings not working", "next_node_id": "check_conflicting_gpo"},
{"id": "filtered_security", "label": "GPO shows as Filtered (Security)", "next_node_id": "fix_security_filtering"},
{"id": "filtered_wmi", "label": "GPO shows as Filtered (WMI)", "next_node_id": "fix_wmi_filter"},
{"id": "not_listed", "label": "GPO not listed at all", "next_node_id": "check_gpo_link"},
{"id": "denied", "label": "GPO shows as Denied", "next_node_id": "check_block_inheritance"}
],
"children": [
{
"id": "check_conflicting_gpo",
"type": "solution",
"title": "Check for Conflicting GPO / Precedence",
"description": "GPO is applied but settings aren't taking effect. Another GPO may be overriding it.\n\n**GPO processing order (first applied to last applied, so lowest to highest precedence):**\n1. Local GPO\n2. Site GPOs\n3. Domain GPOs\n4. OU GPOs (child OU overrides parent OU)\n\n**Later-applied GPOs win** when settings conflict, unless an earlier GPO is Enforced.\n\n**In the GPResult report:** Look for other GPOs that configure the same setting. The last one applied wins.\n\n**Also check:**\n- Is the setting under Computer or User configuration? It must match what you configured.\n- Are Preferences vs Policies confused? (Preferences can be overridden by users)\n\n**Fix:** Adjust GPO link order, use Enforced on the important GPO, or remove conflicting settings."
},
{
"id": "fix_security_filtering",
"type": "solution",
"title": "Fix Security Filtering",
"description": "GPO is filtered out by security permissions.\n\n**Check in Group Policy Management Console:**\n1. Select the GPO\n2. Check 'Security Filtering' section\n3. By default, 'Authenticated Users' should be listed\n\n**Common issues:**\n- Removed 'Authenticated Users' and added a specific group, but target isn't in that group\n- Missing 'Domain Computers' read permission (required since MS16-072 patch)\n\n**Fix for MS16-072:**\nThe GPO needs 'Domain Computers' (for computer policies) or 'Authenticated Users' with Read permission in the Delegation tab, even if security filtering targets a specific group.\n\n**GPMC:** GPO > Delegation tab > Add 'Domain Computers' with Read permission."
},
{
"id": "fix_wmi_filter",
"type": "solution",
"title": "Fix WMI Filter",
|
|
"description": "A WMI filter is preventing the GPO from applying.\n\n**Check the WMI filter query:**\nGPMC > Select GPO > WMI Filtering section — note the filter name.\nThen check: GPMC > WMI Filters > open the filter to see the query.\n\n**Test the WMI filter on the target machine:**\n```\n# Run the WMI query directly\nGet-WmiObject -Query \"SELECT * FROM Win32_OperatingSystem WHERE Version LIKE '10%'\"\n```\nIf it returns nothing, the filter is excluding this machine.\n\n**Common WMI filter issues:**\n- OS version filter excludes newer Windows versions\n- Hardware filter doesn't match (laptop vs desktop)\n- WMI repository corruption on client\n\n**Fix WMI on client:** `winmgmt /salvagerepository`"
|
|
},
|
|
{
|
|
"id": "check_gpo_link",
|
|
"type": "solution",
|
|
"title": "GPO Not Linked or Wrong OU",
|
|
"description": "GPO doesn't appear in GPResult at all — it's likely not linked to the correct OU or the object is in the wrong OU.\n\n**Check:**\n1. **Where is the user/computer in AD?**\n```\nGet-ADUser -Identity username | Select DistinguishedName\nGet-ADComputer -Identity computername | Select DistinguishedName\n```\n\n2. **Where is the GPO linked?**\nGPMC > Select GPO > check 'Scope' tab > 'Links' section\n\n3. **Does the OU match?** The GPO link OU must be the same OU (or a parent OU) where the user/computer object lives.\n\n**Common issues:**\n- Computer/user in wrong OU\n- GPO linked to wrong OU\n- GPO link is disabled (check the link status)\n\n**Fix:** Move the object to correct OU or link GPO to correct OU."
|
|
},
|
|
{
|
|
"id": "check_block_inheritance",
|
|
"type": "solution",
|
|
"title": "Check Block Inheritance / Enforced",
|
|
"description": "GPO is being denied — likely by Block Inheritance on the OU.\n\n**In GPMC:** Check the OU where the target object resides. If it has a blue exclamation mark, 'Block Inheritance' is enabled.\n\n**Options to fix:**\n1. Remove Block Inheritance on the OU (affects all GPOs)\n2. Set the GPO to 'Enforced' — this overrides Block Inheritance\n3. Link the GPO directly to the blocking OU\n\n**Use Enforced sparingly** — it overrides normal precedence and can cause unexpected behavior."
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_gpo_config",
|
|
"type": "solution",
|
|
"title": "Check GPO Configuration for Multiple Targets",
|
|
"description": "GPO not applying to multiple targets. Check the GPO itself.\n\n**In GPMC:**\n1. Is the GPO link enabled? (Not disabled or unenforced)\n2. Is the GPO status correct? (Not 'All settings disabled')\n - GPO > Details tab > GPO Status\n3. Are the settings in the correct section?\n - Computer settings only apply to computer objects\n - User settings only apply to user objects\n\n**Force GP update on a test machine:**\n```\ngpupdate /force\n```\n\n**Check SYSVOL replication:**\n```\n# Compare GPO version on different DCs\nGet-GPO -Name \"Your GPO Name\" -Server DC1 | Select DisplayName,Computer,User\nGet-GPO -Name \"Your GPO Name\" -Server DC2 | Select DisplayName,Computer,User\n```\n\n**If versions differ:** SYSVOL replication (DFS-R or FRS) may be broken."
|
|
},
|
|
{
|
|
"id": "check_new_gpo",
|
|
"type": "solution",
|
|
"title": "New GPO Checklist",
|
|
"description": "Newly created GPO not working. Verify these common mistakes:\n\n**1. Is it linked?** Creating a GPO doesn't link it automatically.\n**2. Is the link enabled?** Check for the green checkmark on the link.\n**3. Security filtering:** Default is 'Authenticated Users' (correct).\n**4. Computer vs User settings:** Make sure settings are in the right section.\n**5. Loopback processing:** If applying user settings based on computer location, you need loopback processing enabled.\n**6. Replication time:** New GPO needs to replicate to all DCs. Wait 15-30 minutes.\n\n**Force update:**\n```\ngpupdate /force\ngpresult /r\n```\n\n**Still not working:** Check the GPResult report for why it's filtered."
|
|
},
|
|
{
|
|
"id": "check_gpo_changes",
|
|
"type": "solution",
|
|
"title": "Investigate GPO That Stopped Working",
|
|
"description": "GPO was working but stopped. Something changed.\n\n**Check recent changes:**\n```\n# When was the GPO last modified?\nGet-GPO -Name \"Your GPO Name\" | Select DisplayName,ModificationTime\n\n# All recently modified GPOs\nGet-GPO -All | Where-Object {$_.ModificationTime -gt (Get-Date).AddDays(-7)} | Sort ModificationTime -Descending\n```\n\n**Common causes:**\n- Someone edited the GPO and broke a setting\n- Security filtering was changed\n- WMI filter was added or modified\n- OU structure changed (objects moved)\n- SYSVOL replication broke\n- A Windows update changed how a setting works\n\n**Check SYSVOL health:**\n```\ndcdiag /test:sysvolcheck\ndcdiag /test:dfsrevent\n```\n\n**Escalate to:** Whoever manages GPOs with the modification timeline."
|
|
}
|
|
]
|
|
}
|
|
}
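
# Illustrative sketch, not part of the original module: a hypothetical helper
# that lists every "next_node_id" in a tree that matches no node "id".
# It assumes only the keys the trees in this file already use
# ("id", "options", "children", "next_node_id"). An empty result means
# every link in the tree resolves to a node.
def find_dangling_links(tree_structure: dict) -> list[str]:
    """Return the sorted next_node_id values that point at no node in the tree."""
    node_ids: set[str] = set()
    targets: set[str] = set()
    stack = [tree_structure]
    while stack:
        node = stack.pop()
        node_ids.add(node["id"])
        # Action nodes link onward directly through next_node_id.
        if "next_node_id" in node:
            targets.add(node["next_node_id"])
        # Decision nodes link onward through their options.
        for option in node.get("options", []):
            targets.add(option["next_node_id"])
        # Walk nested children at any depth.
        stack.extend(node.get("children", []))
    return sorted(targets - node_ids)
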
|
|
|
|
|
|
def get_entra_id_sync_tree() -> dict[str, Any]:
|
|
"""Entra ID Sync Issues (AD Connect) - Cloud identity tree."""
|
|
return {
|
|
"name": "Entra ID Sync Issues (AD Connect)",
|
|
"description": "Troubleshoot Microsoft Entra Connect (formerly Azure AD Connect) synchronization failures. Covers sync cycle errors, password hash sync, attribute conflicts, and connector space issues.",
|
|
"category": "Active Directory",
|
|
"tree_structure": {
|
|
"id": "root",
|
|
"type": "decision",
|
|
"question": "What type of Entra ID sync issue are you experiencing?",
|
|
"help_text": "Entra Connect syncs on-premises AD objects to Entra ID (Azure AD). Issues affect M365 services, SSO, and cloud app access.",
|
|
"options": [
|
|
{"id": "sync_stopped", "label": "Sync has completely stopped", "next_node_id": "check_sync_service"},
|
|
{"id": "specific_user", "label": "Specific user/group not syncing", "next_node_id": "check_user_sync"},
|
|
{"id": "password_sync", "label": "Password changes not syncing to cloud", "next_node_id": "check_password_hash_sync"},
|
|
{"id": "export_errors", "label": "Sync errors / export failures", "next_node_id": "check_sync_errors"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "check_sync_service",
|
|
"type": "action",
|
|
"title": "Check Entra Connect Sync Service",
|
|
"description": "Verify the sync service is running.\n\n**On the Entra Connect server:**\n```\n# Check sync service status\nGet-Service ADSync\n\n# Check last sync time\nGet-ADSyncScheduler\n\n# Check sync cycle status\nGet-ADSyncScheduler | Select SyncCycleEnabled,NextSyncCycleStartTimeInUTC,CurrentlyEffectiveSyncCycleInterval\n```\n\n**Also check:** Entra admin center > Entra Connect > Sync status\n\n**If the service is stopped:** Start it: `Start-Service ADSync`",
|
|
"next_node_id": "sync_service_result"
|
|
},
|
|
{
|
|
"id": "sync_service_result",
|
|
"type": "decision",
|
|
"question": "What is the sync service status?",
|
|
"help_text": "Check service state and scheduler status",
|
|
"options": [
|
|
{"id": "service_stopped", "label": "ADSync service is stopped", "next_node_id": "fix_sync_service"},
|
|
{"id": "scheduler_disabled", "label": "Service running but scheduler disabled", "next_node_id": "enable_scheduler"},
|
|
{"id": "service_running", "label": "Service running, scheduler active", "next_node_id": "check_sync_errors"},
|
|
{"id": "server_unreachable", "label": "Entra Connect server is down", "next_node_id": "escalate_connect_server"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "fix_sync_service",
|
|
"type": "action",
|
|
"title": "Start ADSync Service",
|
|
"description": "```\nStart-Service ADSync\nGet-Service ADSync\n\n# If it won't start, check event logs\nGet-WinEvent -FilterHashtable @{LogName='Application';ProviderName='ADSync'} -MaxEvents 20\n```\n\n**Common causes of service failure:**\n- SQL Server Express instance is down (ADSync uses a local SQL)\n- Disk space full on the Entra Connect server\n- Service account password changed\n- Windows update broke something\n\n**Check SQL:** `Get-Service 'ADSync' ; Get-Service MSSQL*`",
|
|
"next_node_id": "check_sync_errors"
|
|
},
|
|
{
|
|
"id": "enable_scheduler",
|
|
"type": "solution",
|
|
"title": "Re-enable Sync Scheduler",
|
|
"description": "Scheduler was disabled (commonly done during maintenance).\n\n```\n# Re-enable the scheduler\nSet-ADSyncScheduler -SyncCycleEnabled $true\n\n# Trigger an immediate sync\nStart-ADSyncSyncCycle -PolicyType Delta\n\n# Verify\nGet-ADSyncScheduler\n```\n\n**Note:** Scheduler is sometimes disabled during maintenance or troubleshooting. If someone disabled it, check if there's ongoing work before re-enabling.\n\n**Ticket Notes:** Sync scheduler was disabled. Re-enabled and triggered delta sync."
|
|
},
|
|
{
|
|
"id": "escalate_connect_server",
|
|
"type": "solution",
|
|
"title": "CRITICAL: Entra Connect Server Down",
|
|
"description": "**Priority: HIGH** — Sync will stop but existing cloud accounts continue working.\n\n**Impact:** Password changes, new users, and group changes won't sync to M365.\n\n**Immediate actions:**\n1. Check VM/server status in hypervisor\n2. Existing users can still log into M365 (cached auth)\n3. Password changes won't sync until server is back\n\n**Escalate to:** Infrastructure team to restore the server\n**Note:** If server can't be recovered, Entra Connect can be reinstalled on another server (requires config backup or reconfiguration)."
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_user_sync",
|
|
"type": "action",
|
|
"title": "Check Why Specific User Isn't Syncing",
|
|
"description": "Use the Entra Connect Synchronization Service Manager or PowerShell.\n\n```\n# Search for the user in connector space\n$csUser = Get-ADSyncCSObject -ConnectorName \"yourdomain.local\" -DistinguishedName \"CN=User Name,OU=Users,DC=yourdomain,DC=local\"\n\n# Check if user is in sync scope\n# (Simpler approach - check if user exists in Entra)\nGet-AzureADUser -SearchString \"username\" | Select DisplayName,UserPrincipalName,DirSyncEnabled\n```\n\n**Common reasons a user doesn't sync:**\n- User is in an OU not selected for sync (OU filtering)\n- User is filtered by attribute-based sync rule\n- Duplicate or conflicting attribute (UPN, proxyAddress)\n- User was soft-deleted in Entra and conflicts",
|
|
"next_node_id": "user_sync_result"
|
|
},
|
|
{
|
|
"id": "user_sync_result",
|
|
"type": "decision",
|
|
"question": "Why is the user not syncing?",
|
|
"help_text": "Check Entra Connect OU filtering and sync rules",
|
|
"options": [
|
|
{"id": "wrong_ou", "label": "User is in an OU not selected for sync", "next_node_id": "fix_ou_filtering"},
|
|
{"id": "attribute_conflict", "label": "Duplicate attribute conflict (UPN, email)", "next_node_id": "fix_attribute_conflict"},
|
|
{"id": "filtered_rule", "label": "Filtered by a sync rule", "next_node_id": "fix_sync_rule"},
|
|
{"id": "unclear", "label": "Not sure why", "next_node_id": "check_sync_errors"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "fix_ou_filtering",
|
|
"type": "solution",
|
|
"title": "Fix OU Filtering",
|
|
"description": "The user's OU is not included in the sync scope.\n\n**Options:**\n1. **Move the user** to an OU that's in sync scope\n2. **Add the OU** to the sync configuration:\n - Run the Entra Connect wizard\n - Choose 'Customize synchronization options'\n - Select the additional OU\n - Complete the wizard\n\n**After changing:** Run a delta sync:\n```\nStart-ADSyncSyncCycle -PolicyType Delta\n```\n\n**Caution:** Adding a large OU may sync many objects — verify your Entra ID license count."
|
|
},
|
|
{
|
|
"id": "fix_attribute_conflict",
|
|
"type": "solution",
|
|
"title": "Fix Duplicate Attribute Conflict",
|
|
"description": "Another object already has the same UPN or proxyAddress in Entra ID.\n\n**Check Entra admin center:**\nEntra ID > Users > search for the conflicting UPN or email.\n\n**Common conflicts:**\n- User was deleted and recreated with same UPN (soft-deleted copy still in Entra recycle bin)\n- Two AD users have the same proxyAddress/email\n- A cloud-only user exists with the same UPN\n\n**Fixes:**\n1. If soft-deleted: Permanently delete the old object in Entra recycle bin\n2. If duplicate email: Fix the duplicate in AD\n3. If cloud-only conflict: Delete the cloud user or change its UPN\n\n**After fixing:** Run delta sync: `Start-ADSyncSyncCycle -PolicyType Delta`"
|
|
},
|
|
{
|
|
"id": "fix_sync_rule",
|
|
"type": "solution",
|
|
"title": "Escalate: Custom Sync Rule Filtering",
|
|
"description": "A custom sync rule is filtering out this user.\n\n**Check sync rules:**\nOpen 'Synchronization Rules Editor' on the Entra Connect server.\n\nCustom rules are risky to modify without understanding the full sync configuration.\n\n**Escalate to:** Identity/Cloud Administrator who manages Entra Connect\n**Include:** User's DN, the sync rule name, and why the user needs to sync."
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_password_hash_sync",
|
|
"type": "solution",
|
|
"title": "Troubleshoot Password Hash Sync",
|
|
"description": "Password changes in AD aren't reflecting in M365/Entra ID.\n\n**Check PHS status:**\n```\nInvoke-ADSyncDiagnostics -PasswordSync\n```\n\n**Check Event Log:**\n```\nGet-WinEvent -FilterHashtable @{LogName='Application';ProviderName='Directory Synchronization';Id=656,657} -MaxEvents 10\n```\n\n**Event 656:** Successful password sync\n**Event 657:** Failed password sync\n\n**Common causes:**\n- Password hash sync feature disabled in Entra Connect config\n- Connector account permissions changed in AD\n- Recent password change hasn't synced yet (wait for next cycle, usually 2 min)\n\n**Force immediate password sync:**\n```\nInvoke-ADSyncDiagnostics -PasswordSync\n```\n\n**If PHS is disabled:** Re-run the Entra Connect wizard and enable it.\n\n**Escalate to:** Identity Administrator if the connector account needs permission fixes."
|
|
},
|
|
{
|
|
"id": "check_sync_errors",
|
|
"type": "solution",
|
|
"title": "Review Sync Errors",
|
|
"description": "Check for export errors and sync failures.\n\n**On the Entra Connect server:**\n1. Open **Synchronization Service Manager**\n2. Check the **Operations** tab for recent sync cycles\n3. Look for 'export' operations with errors\n4. Click on the error count for details\n\n**PowerShell:**\n```\n# Get recent sync results\nGet-ADSyncRunProfileResult | Sort StartDate -Descending | Select -First 5\n\n# Check Entra portal\n# Entra admin center > Entra Connect > Sync errors\n```\n\n**Common export errors:**\n- InvalidSoftMatch: Attribute conflict in cloud\n- DataValidationFailed: Invalid characters in attributes\n- LargeObject: Object exceeds attribute size limits\n\n**Escalate to:** Identity/Cloud Administrator with the specific error details."
|
|
}
|
|
]
|
|
}
|
|
}
|
|
|
|
|
|
def get_domain_join_tree() -> dict[str, Any]:
|
|
"""User Cannot Join Domain - AD tree."""
|
|
return {
|
|
"name": "Computer Cannot Join Domain",
|
|
"description": "Troubleshoot domain join failures for new or reimaged computers. Covers DNS requirements, authentication issues, computer account limits, and common error codes.",
|
|
"category": "Active Directory",
|
|
"tree_structure": {
|
|
"id": "root",
|
|
"type": "decision",
|
|
"question": "What error occurs when trying to join the domain?",
|
|
"help_text": "Try joining: System Properties > Computer Name > Change > Domain. Note the exact error message.",
|
|
"options": [
|
|
{"id": "domain_not_found", "label": "Domain could not be contacted / not found", "next_node_id": "check_dns_for_domain"},
|
|
{"id": "access_denied", "label": "Access denied / insufficient permissions", "next_node_id": "check_join_permissions"},
|
|
{"id": "account_exists", "label": "Computer account already exists", "next_node_id": "fix_existing_account"},
|
|
{"id": "other_error", "label": "Different error message", "next_node_id": "check_general_join"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "check_dns_for_domain",
|
|
"type": "action",
|
|
"title": "Verify DNS Can Resolve Domain Controllers",
|
|
"description": "Domain join requires DNS to find DCs. This is the most common failure.\n\n**On the computer being joined:**\n```\n# Check DNS settings\nipconfig /all\n\n# Can you resolve the domain?\nnslookup yourdomain.local\n\n# Can you find DC SRV records?\nnslookup -type=srv _ldap._tcp.dc._msdcs.yourdomain.local\n\n# Can you ping a DC?\nping <DC_HOSTNAME>\n```\n\n**The computer's DNS MUST point to an internal DNS server** that has the AD DNS zones. Public DNS (8.8.8.8) won't work for domain join.",
|
|
"next_node_id": "dns_join_result"
|
|
},
|
|
{
|
|
"id": "dns_join_result",
|
|
"type": "decision",
|
|
"question": "Can the computer resolve the domain name?",
|
|
"help_text": "nslookup should return DC IP addresses",
|
|
"options": [
|
|
{"id": "wrong_dns", "label": "DNS is pointing to wrong server (public DNS, etc.)", "next_node_id": "fix_dns_for_join"},
|
|
{"id": "dns_ok_cant_reach", "label": "DNS resolves but can't reach the DC", "next_node_id": "check_network_to_dc"},
|
|
{"id": "dns_resolves_ok", "label": "DNS resolves and can ping DC", "next_node_id": "check_join_permissions"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "fix_dns_for_join",
|
|
"type": "action",
|
|
"title": "Set DNS to Internal DNS Servers",
|
|
"description": "The computer must use your AD DNS servers.\n\n```\n# Set DNS to your domain controllers/DNS servers\nSet-DnsClientServerAddress -InterfaceAlias 'Ethernet' -ServerAddresses '<DC1_IP>','<DC2_IP>'\n\n# Verify\nnslookup yourdomain.local\n```\n\n**If using DHCP:** The DHCP scope should be assigning internal DNS. If not, fix the DHCP scope options.\n\n**After setting DNS:** Retry the domain join.",
|
|
"next_node_id": "retry_join"
|
|
},
|
|
{
|
|
"id": "check_network_to_dc",
|
|
"type": "solution",
|
|
"title": "Check Network Connectivity to Domain Controller",
|
|
"description": "DNS resolves but can't reach the DC. Check network path.\n\n```\nTest-NetConnection -ComputerName <DC_IP> -Port 389\nTest-NetConnection -ComputerName <DC_IP> -Port 445\ntracert <DC_IP>\n```\n\n**Required ports for domain join:**\n- TCP/UDP 389 (LDAP)\n- TCP 445 (SMB)\n- TCP/UDP 88 (Kerberos)\n- TCP 135 + dynamic RPC\n- TCP/UDP 53 (DNS)\n\n**Common causes:** VLAN isolation, firewall blocking, VPN not connected.\n\n**Escalate to:** Network team if ports are blocked."
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_join_permissions",
|
|
"type": "decision",
|
|
"question": "What credentials are being used to join the domain?",
|
|
"help_text": "Domain join requires specific permissions in AD",
|
|
"options": [
|
|
{"id": "regular_user", "label": "Regular domain user account", "next_node_id": "check_join_quota"},
|
|
{"id": "admin_account", "label": "Domain admin or delegated join account", "next_node_id": "check_admin_join_issue"},
|
|
{"id": "wrong_creds", "label": "Credentials might be wrong / expired", "next_node_id": "verify_credentials"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "check_join_quota",
|
|
"type": "solution",
|
|
"title": "Check Domain Join Quota",
|
|
"description": "Regular users can join up to **10 computers** by default (ms-DS-MachineAccountQuota).\n\n**Check current quota:**\n```\nGet-ADObject -Identity (Get-ADDomain).DistinguishedName -Properties ms-DS-MachineAccountQuota | Select ms-DS-MachineAccountQuota\n```\n\n**Check how many the user has joined:**\n```\nGet-ADComputer -Filter {ms-DS-CreatorSID -eq $((Get-ADUser username).SID)} | Measure-Object\n```\n\n**If quota exceeded:**\n1. Use a domain admin account to join instead\n2. Or pre-stage the computer account in AD (allows the user to join that specific computer)\n3. Or increase the quota (not recommended for security)\n\n**Best practice:** Pre-stage computer accounts or use a dedicated join account with delegated permissions."
|
|
},
|
|
{
|
|
"id": "check_admin_join_issue",
|
|
"type": "solution",
|
|
"title": "Admin Account Can't Join - Check OU Permissions",
|
|
"description": "Even admin accounts can fail if OU permissions are restricted.\n\n**Check:**\n1. Does a computer account already exist with this name? `Get-ADComputer -Identity \"COMPUTERNAME\"`\n2. If pre-staged, does the joining user have 'Reset Password' and 'Write Account Restrictions' on that computer object?\n3. Is the target OU restricted via delegation?\n\n**Try joining to default Computers container first:** If that works, it's an OU permissions issue.\n\n**If admin account is locked or expired:**\n```\nGet-ADUser -Identity adminaccount -Properties LockedOut,Enabled,PasswordExpired\n```"
|
|
},
|
|
{
|
|
"id": "verify_credentials",
|
|
"type": "solution",
|
|
"title": "Verify Domain Credentials",
|
|
"description": "Make sure the credentials are correct.\n\n**Use the full domain format:**\n- `DOMAIN\\username` or `username@domain.local`\n\n**Verify the account works:**\n- Try logging into another domain-joined PC\n- Or test: `runas /user:DOMAIN\\username cmd`\n\n**Check if account is locked/disabled:**\n```\nGet-ADUser -Identity username -Properties LockedOut,Enabled,PasswordExpired\n```"
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "fix_existing_account",
|
|
"type": "solution",
|
|
"title": "Fix Existing Computer Account Conflict",
|
|
"description": "A computer account with this name already exists in AD.\n\n**Options:**\n1. **Delete the old account** (if the old computer is decommissioned):\n```\nRemove-ADComputer -Identity \"COMPUTERNAME\"\n```\n\n2. **Reset the old account** (allows rejoin):\n```\nReset-ComputerMachinePassword -Server <DC_NAME> -Credential (Get-Credential)\n```\n\n3. **Use a different computer name**\n\n4. **Pre-stage:** If the account was pre-staged, the joining user needs permission on that specific object.\n\n**After fixing:** Retry the domain join."
|
|
},
|
|
{
|
|
"id": "check_general_join",
|
|
"type": "solution",
|
|
"title": "General Domain Join Troubleshooting",
|
|
"description": "For other domain join errors:\n\n**Check the basics:**\n1. Time sync: Is the computer within 5 minutes of the DC?\n ```\n w32tm /query /status\n net time \\\\<DC_NAME>\n ```\n2. Network: Can you access `\\\\<DC_NAME>\\SYSVOL`?\n3. Firewall: Is Windows Firewall blocking domain traffic?\n4. Secure channel: For rejoins, try: `Test-ComputerSecureChannel -Repair`\n\n**Common error codes:**\n- 53: Network path not found (connectivity issue)\n- 1355: Domain not found (DNS issue)\n- 2224: Account already exists\n- 2691: Already joined to a domain (unjoin first)\n\n**Escalate to:** AD Administrator with the exact error code and message."
|
|
},
|
|
{
|
|
"id": "retry_join",
|
|
"type": "decision",
|
|
"question": "Did the domain join succeed after fixing DNS?",
|
|
"help_text": "Retry: System Properties > Computer Name > Change > Domain",
|
|
"options": [
|
|
{"id": "success", "label": "Yes, joined successfully", "next_node_id": "solution_joined"},
|
|
{"id": "different_error", "label": "Different error now", "next_node_id": "check_general_join"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "solution_joined",
|
|
"type": "solution",
|
|
"title": "Resolved: Computer Joined Domain",
|
|
"description": "Computer successfully joined the domain.\n\n**Post-join steps:**\n1. Restart the computer (required)\n2. Log in with domain credentials\n3. Verify Group Policy: `gpupdate /force`\n4. Move computer to correct OU if needed\n\n**Ticket Notes:** Domain join completed. Root cause was [DNS/permissions/etc]."
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
}
|
|
|
|
|
|
def get_kerberos_auth_tree() -> dict[str, Any]:
|
|
"""Kerberos/NTLM Authentication Failures - AD tree."""
|
|
return {
|
|
"name": "Kerberos / NTLM Authentication Failures",
|
|
"description": "Troubleshoot authentication failures including Kerberos ticket issues, NTLM fallback problems, SPN misconfigurations, and time sync issues that affect logins, file shares, and web applications.",
|
|
"category": "Active Directory",
|
|
"tree_structure": {
|
|
"id": "root",
|
|
"type": "decision",
|
|
"question": "What authentication symptom is the user experiencing?",
|
|
"help_text": "Authentication issues can manifest as login failures, access denied to resources, or double-prompts for credentials.",
|
|
"options": [
|
|
{"id": "login_failure", "label": "Can't log into Windows at all", "next_node_id": "check_dc_connectivity"},
|
|
{"id": "resource_access", "label": "Logged in but can't access file shares/apps", "next_node_id": "check_kerberos_tickets"},
|
|
{"id": "double_prompt", "label": "Gets prompted for credentials repeatedly (SSO not working)", "next_node_id": "check_spn_issues"},
|
|
{"id": "intermittent", "label": "Authentication works sometimes, fails other times", "next_node_id": "check_time_sync"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "check_dc_connectivity",
|
|
"type": "action",
|
|
"title": "Check Domain Controller Connectivity",
|
|
"description": "Windows login requires DC access for Kerberos authentication.\n\n**On the affected machine:**\n```\n# Which DC is being used?\necho %LOGONSERVER%\nnltest /dsgetdc:yourdomain.local\n\n# Can you reach a DC?\nTest-NetConnection -ComputerName <DC_NAME> -Port 88\nTest-NetConnection -ComputerName <DC_NAME> -Port 389\n```\n\n**If offline:** Windows will use cached credentials for login (if previously logged in). First-time logins require DC connectivity.\n\n**No DC available:** Check network, VPN, DNS settings.",
|
|
"next_node_id": "dc_connect_result"
|
|
},
|
|
{
|
|
"id": "dc_connect_result",
|
|
"type": "decision",
|
|
"question": "Can the machine reach a domain controller?",
|
|
"help_text": "Kerberos uses port 88, LDAP uses port 389",
|
|
"options": [
|
|
{"id": "no_dc", "label": "Can't reach any DC", "next_node_id": "fix_dc_connectivity"},
|
|
{"id": "dc_reachable", "label": "DC is reachable but login still fails", "next_node_id": "check_account_status"},
|
|
{"id": "cached_login", "label": "Can log in with cached creds only", "next_node_id": "fix_dc_connectivity"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "fix_dc_connectivity",
|
|
"type": "solution",
|
|
"title": "Restore Domain Controller Connectivity",
|
|
"description": "Machine can't reach a DC. Check:\n\n1. **Network:** Is the machine connected? `ipconfig /all`\n2. **DNS:** Pointing to internal DNS? `nslookup yourdomain.local`\n3. **VPN:** If remote, is VPN connected?\n4. **Firewall:** Ports 88, 389, 445, 135 open to DC?\n5. **DC status:** Are DCs actually online?\n\n**If VPN user:** Connect VPN first, then Ctrl+Alt+Del > Switch User > log in with domain creds (forces DC authentication).\n\n**If all DCs are down:** This is a major outage. Users can only use cached logins.\n\n**Escalate to:** Network team (if routing issue) or Infrastructure (if DC issue)."
|
|
},
|
|
{
|
|
"id": "check_account_status",
|
|
"type": "solution",
|
|
"title": "Check AD Account Status",
|
|
"description": "DC is reachable but auth fails. Check the account.\n\n```\nGet-ADUser -Identity username -Properties LockedOut,Enabled,PasswordExpired,PasswordLastSet,AccountExpirationDate\n```\n\n**Possible issues:**\n- Account locked out → Unlock it\n- Account disabled → Enable or investigate why\n- Password expired → Reset password\n- Account expired → Extend expiration date\n\n**Also check:** Is the computer's secure channel healthy?\n```\nTest-ComputerSecureChannel -Verbose\n```\nIf broken: `Test-ComputerSecureChannel -Repair -Credential (Get-Credential)`"
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_kerberos_tickets",
|
|
"type": "action",
|
|
"title": "Check Kerberos Tickets",
|
|
"description": "User is logged in but can't access resources. Check Kerberos tickets.\n\n```\n# List current Kerberos tickets\nklist\n\n# Purge and get new tickets (forces re-authentication)\nklist purge\n\n# Then access the resource again — new tickets will be requested\n```\n\n**Look for:**\n- Are there valid TGT (krbtgt) tickets?\n- Are there service tickets for the resource you're accessing?\n- Have tickets expired?\n\n**If no tickets at all:** The machine may not be properly domain-joined or DC unreachable.",
|
|
"next_node_id": "ticket_result"
|
|
},
|
|
{
|
|
"id": "ticket_result",
|
|
"type": "decision",
|
|
"question": "Did purging and refreshing tickets fix the issue?",
|
|
"help_text": "After klist purge, try accessing the resource again",
|
|
"options": [
|
|
{"id": "fixed", "label": "Yes, resource access works now", "next_node_id": "solution_ticket_refresh"},
|
|
{"id": "still_fails", "label": "Still can't access the resource", "next_node_id": "check_spn_issues"},
|
|
{"id": "ntlm_fallback", "label": "Works but with credential prompt (NTLM fallback)", "next_node_id": "check_spn_issues"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "solution_ticket_refresh",
|
|
"type": "solution",
|
|
"title": "Resolved: Stale Kerberos Tickets",
|
|
"description": "Old Kerberos tickets were cached with outdated information.\n\n**Common causes:** Group membership change, password change, DC switchover.\n\n**Resolution:** Purged ticket cache with `klist purge`.\n\n**If this happens frequently:** The user may need to log out and back in after permission changes, or there may be a time sync issue.\n\n**Ticket Notes:** Stale Kerberos tickets cleared. User can access resources normally."
|
|
}
|
|
]
|
|
},
|
|
{
|
|
"id": "check_spn_issues",
|
|
"type": "solution",
|
|
"title": "Check SPN Configuration",
|
|
"description": "Kerberos requires correct Service Principal Names (SPNs) on the target service.\n\n**Check SPNs for a service account:**\n```\nsetspn -L <SERVICE_ACCOUNT_OR_COMPUTER>\n\n# Check for duplicate SPNs (common problem)\nsetspn -X\n```\n\n**Common SPN issues:**\n- Missing SPN: Kerberos can't find the service, falls back to NTLM\n- Duplicate SPN: Two accounts claim the same service — Kerberos fails\n- Wrong SPN format: Must match how clients access the service\n\n**Example SPNs:**\n- File share: `HOST/servername`\n- Web app: `HTTP/webapp.domain.local`\n- SQL: `MSSQLSvc/sqlserver.domain.local:1433`\n\n**Fix duplicate SPNs:** Remove the incorrect one: `setspn -D <SPN> <WRONG_ACCOUNT>`\n\n**Escalate to:** Senior AD admin for SPN changes — incorrect SPNs can break other services."
|
|
},
|
|
{
|
|
"id": "check_time_sync",
|
|
"type": "action",
|
|
"title": "Check Time Synchronization",
|
|
"description": "Kerberos requires clocks to be within 5 minutes of each other.\n\n```\n# Check current time vs DC time\nw32tm /query /status\nnet time \\\\<DC_NAME>\n\n# Check time source\nw32tm /query /source\n\n# Force time resync\nw32tm /resync /force\n\n# Check time offset\nw32tm /stripchart /computer:<DC_NAME> /samples:5\n```\n\n**If time is off by more than 5 minutes:** Kerberos authentication will fail completely.\n\n**Common causes of time drift:**\n- VM time sync disabled\n- Laptop was offline for extended period\n- NTP source unreachable\n- Hyper-V time sync conflicting with domain time",
|
|
"next_node_id": "time_result"
|
|
},
|
|
{
|
|
"id": "time_result",
|
|
"type": "decision",
|
|
"question": "Was the time more than 5 minutes off?",
|
|
"help_text": "Compare client time to DC time",
|
|
"options": [
|
|
{"id": "time_fixed", "label": "Yes, fixed time sync — auth works now", "next_node_id": "solution_time_sync"},
|
|
{"id": "time_ok", "label": "Time was fine, issue is something else", "next_node_id": "check_kerberos_tickets"}
|
|
],
|
|
"children": [
|
|
{
|
|
"id": "solution_time_sync",
|
|
"type": "solution",
|
|
"title": "Resolved: Time Sync Issue",
|
|
"description": "Kerberos was failing due to clock skew greater than 5 minutes.\n\n**Prevention:**\n- Ensure all domain members sync time from the DC\n- PDC Emulator should sync from an external NTP source\n- For VMs: Disable hypervisor time sync (use domain time hierarchy)\n\n**Verify domain time hierarchy:**\n```\nw32tm /query /source\n```\nDomain members should show a DC. PDC should show an NTP server.\n\n**Ticket Notes:** Authentication failure due to clock skew. Resynced time."
|
|
}
|
|
]
|
|
}
|
|
]
|
|
}
|
|
}
|