diff --git a/TSG/EnvironmentValidator/README.md b/TSG/EnvironmentValidator/README.md
index 72d594c..11611c1 100644
--- a/TSG/EnvironmentValidator/README.md
+++ b/TSG/EnvironmentValidator/README.md
@@ -5,6 +5,7 @@ This folder contains the TSG's related to Environment Validators.
* [Troubleshooting External Connectivity Failures in Environment Checker](./Troubleshooting-External-Connectivity-Failures-in-Environment-Checker.md)
* [Troubleshooting Test NetAdapter API Failure](./Troubleshooting-Test-NetAdapter-API.md)
* [Troubleshooting Test PhysicalDisk API Failure](./Troubleshooting-Test-PhysicalDisk-API.md)
+* [Troubleshooting Test System Drive Free Space](./Troubleshooting-Test-SystemDrive-Free-Space.md)
* [Troubleshooting TestPowerShell Module Version](./Troubleshooting-Test-PowerShell-Module-Version.md)
* [Troubleshooting Module Versions](Troubleshooting-Module-Versions.md)
* [Troubleshooting MSI Does Not Have Access to Subscription](Troubleshooting-MSI-Does-Not-Have-Access-To-Subscription.md)
diff --git a/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
new file mode 100644
index 0000000..82ac1c9
--- /dev/null
+++ b/TSG/EnvironmentValidator/Troubleshooting-Test-SystemDrive-Free-Space.md
@@ -0,0 +1,393 @@
+# AzStackHci_Hardware_Test_SystemDrive_Free_Space
+
+| Property | Value |
+| --- | --- |
+| Name | AzStackHci_Hardware_Test_SystemDrive_Free_Space |
+| Telemetry / health-scanner name | AzStackHci_Hardware_SystemDriveFreeSpace (same check; this is the name used in Azure telemetry and the health-fault scanner) |
+| Display name | Test System Drive Free Space |
+| Component | Hardware (Environment Validator / Environment Checker) |
+| Severity | Critical: this validator blocks deployment and update operations until the machine is back above the minimum. |
+| Required free space | 30 GB on the system drive (C:) of every machine. |
+| Applicable Scenarios | Deployment, Add Node, and Update / Upgrade (pre-update health check). |
+| Affected Versions | Azure Local, version 23H2 and later. |
+
+## Overview
+
+This validator checks that the system drive (the `C:` drive) on each Azure Local
+machine has enough free space for the platform to operate and to install updates.
+It fails when free space on `C:` drops below the required minimum of **30 GB** on
+any machine in the cluster.
+
+A low system drive is a real problem, not just a warning. While the check is
+failing:
+
+- Solution (Azure Local) updates and upgrades are blocked at pre-update
+ validation, so the cluster cannot be patched.
+- Adding a machine to the cluster can fail validation.
+- New Arc and Kubernetes extensions may fail to deploy.
+- Existing workloads keep running, but the machine cannot be lifecycle-managed
+ reliably, and a drive that fills to zero can destabilize the node.
+
+## Where this failure appears
+
+You can see this failure in two places: the Azure portal and the machine itself.
+Both show the same underlying result.
+
+### In the Azure portal
+
+The check runs as part of the update readiness and system health checks, so it
+shows up in Azure Update Manager:
+
+1. Go to **Azure Update Manager > Resources > Azure Local**, or open the **Azure
+ Local** resource and its **Updates** page.
+2. In the system list, select the **Update readiness** status. A system that needs
+ attention shows a **Critical** or **Warning** state.
+3. Review the list of readiness checks. On current builds this appears as **System
+ Drive Free Space** (earlier builds: **Test System Drive Free Space**).
+4. Select the link under **Details**. The details pane shows the per-machine
+ results and a **Remediation** link (`https://aka.ms/hci-envch`).
+
+The portal does not show the raw JSON shown below. It renders the same result as a
+row in the readiness check list, with the display name, the Critical severity, the
+affected machine, and the remediation link.
+
+This check is reported in two scenarios, and the results can differ between them
+because each uses a different version of the validation logic:
+
+- **System health checks**, which run once every 24 hours.
+- **Update readiness checks**, which run after the update content is downloaded
+ and before installation.
+
+### On the machine
+
+Two on-box sources carry the result.
+
+**Event log (per machine).** The Environment Checker writes every check result to
+the **AzStackHciEnvironmentChecker** event log, located at
+`C:\Windows\System32\winevt\Logs\AzStackHciEnvironmentChecker.evtx`. Each result is
+the JSON body of an **Event ID 17205** entry. To read this check's most recent
+result on a machine:
+
+```powershell
+Get-WinEvent -LogName AzStackHciEnvironmentChecker -FilterXPath '*[System[(EventID=17205)]]' -MaxEvents 2000 |
+ Where-Object { $_.Message -match 'AzStackHci_Hardware_(Test_SystemDrive_Free_Space|SystemDriveFreeSpace)' } |
+ Select-Object -First 1 -ExpandProperty Message
+```
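+
+The command above returns the raw JSON text. As a minimal sketch, assuming the
+event message parses as a single JSON object shaped like the sample result in this
+guide, you can convert it and read only the fields you need:
+
+```powershell
+# Assumption: the Event ID 17205 message body is one JSON object.
+$msg = Get-WinEvent -LogName AzStackHciEnvironmentChecker -FilterXPath '*[System[(EventID=17205)]]' -MaxEvents 2000 |
+    Where-Object { $_.Message -match 'AzStackHci_Hardware_(Test_SystemDrive_Free_Space|SystemDriveFreeSpace)' } |
+    Select-Object -First 1 -ExpandProperty Message
+if ($msg) {
+    $result = $msg | ConvertFrom-Json
+    $result | Select-Object Name, Status, Severity, TargetResourceName
+    $result.AdditionalData.Detail   # the line that names the machine and the free space found
+}
+```
+
+If `ConvertFrom-Json` fails on your build, the message is not pure JSON; fall back
+to reading the raw text.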
+
+**Pre-update health check result file (cluster-wide).** The pre-update health
+check writes its full result set to the cluster infrastructure share:
+
+```
+C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\Updates\HealthCheck\System\HealthCheckResult.EnvironmentChecker..json
+```
+
+This file is on cluster storage, so it is the same from any machine in the
+cluster. The newest `HealthCheckResult.EnvironmentChecker.*.json` holds the latest
+run. (A separate `HealthCheckResult.CheckCloudHealth.*.json` covers other checks
+and does not contain this one.)
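+
+To pull this check out of the newest file without opening it by hand, a minimal
+sketch follows. It uses a plain text search, so it makes no assumption about the
+file's JSON layout; if the file is written as a single line, the match returns that
+whole line.
+
+```powershell
+# Folder taken from the path above; the wildcard stands in for the run-specific part of the name
+$dir = 'C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\Updates\HealthCheck\System'
+$latest = Get-ChildItem $dir -Filter 'HealthCheckResult.EnvironmentChecker.*.json' |
+    Sort-Object LastWriteTime -Descending | Select-Object -First 1
+if ($latest) {
+    "Newest result file: $($latest.Name) ($($latest.LastWriteTime))"
+    # Search for the wording used in this check's Detail line
+    Select-String -Path $latest.FullName -Pattern 'free space on root folder path' |
+        ForEach-Object { $_.Line.Trim() }
+}
+```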
+
+In both sources the result for this check looks like this:
+
+```json
+{
+ "Name": "AzStackHci_Hardware_SystemDriveFreeSpace",
+ "DisplayName": "System Drive Free Space",
+ "Title": "System Drive Free Space",
+ "Severity": "Critical",
+ "Status": "FAILURE",
+ "Description": "Checking System Drive Free Space",
+ "TargetResourceType": "Disk",
+ "TargetResourceName": "Machine: AzL-Node-01, Class: Disk, DriveLetter: C:",
+ "Remediation": "https://aka.ms/hci-envch",
+ "AdditionalData": {
+ "Detail": "Checking Hostname AzL-Node-01 for free space on root folder path 'C:' 25 GB. Expected at least 30 GB.",
+ "Status": "FAILURE",
+ "Resource": "AzL-Node-01"
+ }
+}
+```
+
+> [!NOTE]
+> The `Name`, `DisplayName`, and `Title` vary by build. Current builds emit the
+> telemetry / health-scanner name shown above (`AzStackHci_Hardware_SystemDriveFreeSpace` /
+> `System Drive Free Space`); earlier builds emit the env-checker name
+> (`AzStackHci_Hardware_Test_SystemDrive_Free_Space` / `Test System Drive Free Space`). The
+> `Detail` line is identical on both, and the `Get-WinEvent` filter above matches either name.
+
+The `Detail` line is the key part. It names the machine (`AzL-Node-01` above), the
+free space it found (25 GB), and the minimum it expected (30 GB). A passing result
+has `Status` of `0` or `SUCCESS`; a failing result has a non-zero status or
+`FAILURE`.
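+
+If you collect `Detail` lines from several machines, you can split them into
+fields. This is a sketch only: the pattern is hypothetical and assumes the wording
+matches the sample above, so adjust it if your build words the line differently.
+
+```powershell
+# Sample Detail text copied from the failing result above
+$detail = "Checking Hostname AzL-Node-01 for free space on root folder path 'C:' 25 GB. Expected at least 30 GB."
+# Hypothetical pattern written against the sample wording
+$pattern = "Hostname (?<Node>\S+) for free space on root folder path '(?<Drive>[^']+)' (?<FreeGB>[\d.]+) GB\. Expected at least (?<MinGB>[\d.]+) GB"
+if ($detail -match $pattern) {
+    [pscustomobject]@{
+        Node        = $Matches.Node
+        Drive       = $Matches.Drive
+        FreeGB      = [double]$Matches.FreeGB
+        MinGB       = [double]$Matches.MinGB
+        ShortfallGB = [double]$Matches.MinGB - [double]$Matches.FreeGB   # how much to reclaim at minimum
+    }
+}
+```
+
+For the sample line this reports a shortfall of 5 GB on `AzL-Node-01`.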
+
+## Requirements
+
+1. Each Azure Local machine must have at least **30 GB** free on its system drive
+ (`C:`).
+2. You run the steps below on the affected machine, signed in as an administrator,
+ in a PowerShell session.
+
+## Troubleshooting Steps
+
+### 1. Confirm which machine is low
+
+Check the free space directly on each machine:
+
+```powershell
+Get-PSDrive C | Select-Object @{n='FreeGB';e={[math]::Round($_.Free/1GB,1)}},
+ @{n='UsedGB';e={[math]::Round($_.Used/1GB,1)}}
+```
+
+If `FreeGB` is below 30, this check will fail on that machine. To check every
+machine in the cluster at once:
+
+```powershell
+Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
+ [pscustomobject]@{ Node = $env:COMPUTERNAME
+ FreeGB = [math]::Round((Get-PSDrive C).Free/1GB,1) }
+} | Sort-Object FreeGB
+```
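+
+To list only the machines that are below the minimum, together with how much each
+needs to reclaim, the same query can be filtered. This is a sketch; `$minGB` and
+the `ReclaimGB` column are illustrative names, not part of the validator.
+
+```powershell
+$minGB = 30   # required minimum from this guide
+Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
+    [pscustomobject]@{ Node   = $env:COMPUTERNAME
+                       FreeGB = [math]::Round((Get-PSDrive C).Free/1GB,1) }
+} | Where-Object FreeGB -lt $minGB |
+    Select-Object Node, FreeGB, @{n='ReclaimGB';e={[math]::Round($minGB - $_.FreeGB,1)}}
+```
+
+An empty result means every machine that responded is at or above 30 GB free.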
+
+### 2. Find what is using the system drive
+
+Before deleting anything, see where the space went. These commands are read-only.
+
+```powershell
+# Largest top-level folders on C: (this recursive scan can take a minute or two)
+Get-ChildItem C:\ -Directory -Force -ErrorAction SilentlyContinue | ForEach-Object {
+ $b = (Get-ChildItem $_.FullName -Recurse -File -Force -ErrorAction SilentlyContinue |
+ Measure-Object Length -Sum).Sum
+ [pscustomobject]@{ Folder = $_.Name; GB = [math]::Round($b/1GB,2) }
+} | Sort-Object GB -Descending | Select-Object -First 12
+
+# How much the Windows component store (WinSxS) can reclaim
+Dism.exe /Online /Cleanup-Image /AnalyzeComponentStore
+
+# Largest Windows event logs
+Get-ChildItem C:\Windows\System32\winevt\Logs -File | Sort-Object Length -Descending |
+ Select-Object -First 8 Name, @{n='GB';e={[math]::Round($_.Length/1GB,2)}}
+```
+
+On an Azure Local machine the usual large consumers are the Windows folder
+(including the WinSxS component store), the monitoring agent cache
+(`C:\GMACache`), Windows event logs, and the Windows Update download cache.
+
+One specific cause worth ruling out is leftover Environment Checker package folders
+piling up under the orchestrator's temp directory. If you see many folders there,
+follow the dedicated guide:
+[Known Issue: High Disk Space Usage in TEMP](./Known-Issue-High-Disk-Space-usage-in-TEMP.md).
+
+### 3. Reclaim space safely
+
+Work top to bottom. Tier 1 is safe and Microsoft-supported. Stop once the machine
+is back above 30 GB free with some margin.
+
+**Production safety at a glance.** None of the steps below require cluster downtime,
+and none normally require a reboot. A few need light coordination:
+
+| Action | Safe while fully in production? |
+| --- | --- |
+| Tier 1a: WinSxS component cleanup | Yes, no reboot. It is IO and CPU intensive and can take several minutes, so prefer a quieter period. |
+| Tier 1b: clear Windows Update cache | Yes, but not while a solution update or upgrade is in progress, because it briefly stops the Windows Update and BITS services. |
+| Tier 1c: remove crash dumps | Yes, deletes files only. |
+| Tier 1d: clear temporary files | Yes, deletes files only. |
+| Tier 2: clear large event logs | Yes for uptime, but this erases diagnostic and audit history, and clearing the Security log has compliance implications. Export first. |
+| Tier 3: platform-managed areas | Do not delete. Fixing the cause has no workload impact. |
+
+If a machine is already near zero free space and at risk of dropping out of the
+cluster, treat that one machine as a maintenance action: pause and drain it first
+so its workloads move to other machines, then clean up, then resume. The cluster
+stays in production throughout, because the workloads live-migrate.
+
+```powershell
+$node = $env:COMPUTERNAME                # the affected machine you are signed in to
+Suspend-ClusterNode -Name $node -Drain   # move workloads off this machine
+# ... run the cleanup steps below ...
+Resume-ClusterNode -Name $node           # return the machine to service
+```
+
+#### Tier 1: safe to reclaim now
+
+**a. Clean the Windows component store (WinSxS).** This removes superseded update
+components and is fully supported. It is usually the largest safe win. Safe to run
+while fully in production with no reboot; it is IO and CPU intensive and can take
+several minutes to complete, so prefer a quieter period. A small number of packages
+can need a reboot to finish, so if the analysis still reports reclaimable packages
+afterward, a maintenance reboot completes the cleanup.
+
+```powershell
+Dism.exe /Online /Cleanup-Image /StartComponentCleanup
+```
+
+**b. Clear the Windows Update download cache.** Safe to clear; Windows re-downloads
+what it needs. Do not run this while a solution update or upgrade is in progress,
+because it briefly stops the Windows Update (`wuauserv`) and BITS services. Outside
+an active update there is no workload impact.
+
+```powershell
+Stop-Service wuauserv, bits -ErrorAction Stop
+try {
+ Remove-Item 'C:\Windows\SoftwareDistribution\Download\*' -Recurse -Force -ErrorAction SilentlyContinue
+} finally {
+ Start-Service wuauserv, bits
+}
+```
+
+**c. Remove crash dumps.** Collect them first only if you have an open support case
+that needs them. Safe in production; this deletes files only.
+
+```powershell
+Remove-Item C:\Windows\MEMORY.DMP -Force -ErrorAction SilentlyContinue
+Remove-Item C:\Windows\Minidump\* -Force -ErrorAction SilentlyContinue
+Remove-Item C:\Windows\LiveKernelReports\* -Recurse -Force -ErrorAction SilentlyContinue
+Remove-Item "$env:ProgramData\Microsoft\Windows\WER\ReportQueue\*" -Recurse -Force -ErrorAction SilentlyContinue
+```
+
+**d. Clear temporary files.** Safe in production; this deletes files only, and
+files in use are skipped.
+
+```powershell
+Remove-Item C:\Windows\Temp\* -Recurse -Force -ErrorAction SilentlyContinue
+Remove-Item $env:TEMP\* -Recurse -Force -ErrorAction SilentlyContinue
+```
+
+#### Tier 2: diagnostic logs (reclaim with care)
+
+Large event logs such as `Microsoft-Windows-FailoverClustering%4Diagnostic` and
+`Security` can each be 1 GB or more. They hold troubleshooting history and they
+regrow to their configured maximum size, so clearing them is a temporary gain.
+Clearing a log needs no reboot or downtime, but it erases troubleshooting and audit
+history, and clearing the Security log has compliance implications, so treat it as
+a data-retention decision.
+
+If you do not need the history, clear the log directly:
+
+```powershell
+wevtutil clear-log 'Microsoft-Windows-FailoverClustering/Diagnostic' # example
+```
+
+If you want to keep the history, export first, then clear. Write the export to a
+volume other than `C:` or to a network share, because the export is the same size
+as the log (often 1 GB or more), so writing it to `C:` would consume the very space
+you are trying to reclaim. Delete the export once you confirm you no longer need it.
+
+```powershell
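+$log  = 'Microsoft-Windows-FailoverClustering/Diagnostic'   # example
+$dest = ''   # e.g. E:\logbackup or \\server\share (must not be C:)
+# Refuse to continue if the destination is unset or on C:
+if (-not $dest -or $dest -like 'C:*') { throw 'Set $dest to a path that is not on C: before running.' }
+New-Item -ItemType Directory $dest -Force | Out-Null
+wevtutil export-log $log (Join-Path $dest (($log -replace '/','_') + '.evtx')) /overwrite:true
+# Clear the log only if the export succeeded
+if ($LASTEXITCODE -eq 0) { wevtutil clear-log $log }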
+```
+
+Do not disable or permanently shrink platform diagnostic logs without guidance,
+because they are needed to investigate cluster issues.
+
+#### Tier 3: platform-managed areas (do not delete; find the cause)
+
+Some large folders are managed by the platform. Deleting them can break monitoring
+or updates, and it does not fix the underlying cause.
+
+- **`C:\GMACache` (monitoring agent cache).** A large `GMACache`, especially
+ `GMACache\TelemetryCache`, usually means the machine cannot upload telemetry to
+ Azure, so the data backs up on disk. The fix is to restore outbound connectivity
+ and the Arc connection so the cache drains on its own. Do not delete the cache to
+ free space; that loses buffered data, and the folder simply refills while
+ connectivity is broken.
+- **`C:\Observability`, `C:\NugetStore`, `C:\ImageComposition`, `C:\CloudContent`,
+ `C:\Agents`.** These hold platform logs, solution packages, and update content.
+ They are managed and rotated automatically. Do not delete them. If one of them is
+ unusually large, open a support case rather than removing files.
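+
+To see how large these areas are without touching them, a read-only sketch follows.
+It only measures the folders named above and deletes nothing; the recursive scan
+can take a few minutes on large folders.
+
+```powershell
+# Read-only: measures size, deletes nothing
+$folders = @('C:\GMACache', 'C:\GMACache\TelemetryCache', 'C:\Observability',
+             'C:\NugetStore', 'C:\ImageComposition', 'C:\CloudContent', 'C:\Agents')
+$folders | Where-Object { Test-Path $_ } | ForEach-Object {
+    $bytes = (Get-ChildItem $_ -Recurse -File -Force -ErrorAction SilentlyContinue |
+        Measure-Object Length -Sum).Sum
+    [pscustomobject]@{ Folder = $_; GB = [math]::Round($bytes/1GB, 2) }
+} | Sort-Object GB -Descending
+```
+
+`GMACache\TelemetryCache` is counted inside `GMACache` as well, so do not add the
+two rows together.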
+
+### 4. Verify the fix
+
+First confirm the machine is back above the minimum:
+
+```powershell
+Get-PSDrive C | Select-Object @{n='FreeGB';e={[math]::Round($_.Free/1GB,1)}}
+```
+
+Then re-validate. You have two options.
+
+**Fast: run just this one validator.** The Environment Checker module ships on every
+Azure Local machine, so you can run this single hardware check directly and get a
+result back in a few seconds, without running the full pre-update health check:
+
+```powershell
+$r = Invoke-AzStackHciHardwareValidation -Include Test-SystemDriveFreeSpace -PassThru
+$r | Select-Object Name, Status, Severity
+$r.AdditionalData.Detail
+```
+
+A healthy machine returns `Status` of `SUCCESS` and a detail line like
+`Checking Hostname for free space on root folder path 'C:' 56 GB. Expected at least 30 GB.`
+This is the quickest way to confirm your cleanup worked on the machine you just
+fixed. (`-Include Test-SystemDriveFreeSpace` runs only this check; drop the
+`-Include` to run the full hardware validation.)
+
+**Authoritative: re-run the pre-update health check.** This is what the portal
+readiness view and the cluster-wide result file reflect, so run it to clear the
+failure everywhere it is reported. It runs the full readiness check, so allow
+several minutes for the results to refresh:
+
+```powershell
+Invoke-SolutionUpdatePrecheck
+```
+
+After the re-run, **System Drive Free Space** (**Test System Drive Free Space** on
+earlier builds) should report success. You can
+confirm it in any of the places listed under [Where this failure
+appears](#where-this-failure-appears): the portal readiness checks, the
+`AzStackHciEnvironmentChecker` event log (Event ID 17205), or the newest
+`HealthCheckResult.EnvironmentChecker.*.json` on the infrastructure share.
+
+> **The portal can show a stale failure right after you reclaim space.** The
+> portal readiness view and the `HealthCheckResult.EnvironmentChecker.*.json` file
+> report the result of the *last* health check, so they keep showing the failure
+> until that result is refreshed, either by the pre-update health check above or by
+> the next scheduled periodic health check (roughly once a day). The fast targeted
+> check reflects the machine's live free space immediately, so use it to confirm
+> your fix and do not wait on the portal to update.
+
+If it still fails, repeat step 2 to see what refilled the drive. A drive that
+refills quickly is usually caused by a backed-up `GMACache` (a connectivity
+problem) or a runaway log, not a one-time pile of files.
+
+## When to escalate
+
+Open a support case if any of the following are true:
+
+- The drive refills faster than you can reclaim it, even after you fix outbound
+ connectivity.
+- A platform-managed folder (Tier 3) is the dominant consumer, and you cannot find
+ a connectivity or update cause.
+- The machine is at or near zero free space and will not boot or stay in the
+ cluster.
+
+## Related
+
+- [Known Issue: High Disk Space Usage in TEMP](./Known-Issue-High-Disk-Space-usage-in-TEMP.md)
+- General Environment Checker remediation link shown in the validator output:
+ https://aka.ms/hci-envch
+- Azure Local low-capacity requirements:
+ https://aka.ms/azurelocallowcapacityrequirements