fix(linbo_partition): retry umount /cache to survive concurrent re-mounts #155

Open
helge42 wants to merge 1 commit into linuxmuster:main from helge42:fix/linbo-partition-umount-cache-race

helge42 commented Jun 25, 2026

Symptom

linbo-remote -i <host> -c format (or partition) aborts with:

Cannot unmount /cache.

Running the same operation locally via linbo-wrapper format from the LINBO GUI / shell works. The bug therefore looks client-state-dependent but is in fact a timing race: it hits fast hardware (NVMe) reliably and slower hardware (HDD / older SSD) only sporadically.

Root cause

linbo_gui periodically re-mounts /cache for status polling. On an otherwise idle client this is visible in dmesg as recurring cycles roughly every second:

EXT4-fs (nvme0n1p6): mounted filesystem ... r/w with ordered data mode.
EXT4-fs (nvme0n1p6): unmounting filesystem ...
EXT4-fs (nvme0n1p6): mounted filesystem ... ro with ordered data mode.
EXT4-fs (nvme0n1p6): unmounting filesystem ...

linbo_partition does (lines 213–218):

cd /
mount | grep -q ' /cache ' && umount /cache &> /dev/null && sleep 3
if mount | grep -q ' /cache '; then
  echo "Cannot unmount /cache." >&2
  exit 1
fi

The umount succeeds — but during the sleep 3 the polling re-mounts /cache, and the subsequent mount | grep re-check then aborts. gui_ctl disable, which linbo-remote calls before format, hides the GUI but does not stop the status polling.

Reproduction

  • Client sitting at the LINBO GUI, /cache mounted.
  • Watch the cycle on the client: while :; do date '+%H:%M:%S'; mount | grep cache; sleep 0.2; done
  • Trigger from the server: linbo-remote -i <client> -c format — fails with Cannot unmount /cache.
  • For comparison, in a shell on the client (Dropbear, port 2222): linbo-wrapper format — works.
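
The watch loop in the steps above prints every sample, which can be hard to read at a 0.2 s interval. As a rough sketch (not part of this PR), the same sampling can be reduced to a count of mount-state changes. The names `cache_mounted` and `count_mount_flips` are hypothetical helpers invented for this illustration, and the fractional `sleep` assumes a `sleep` that accepts it, as the loop above already does.

```shell
# Hypothetical helpers for illustration only; not part of linbo_partition.

# Returns 0 if something is currently mounted at the mount point $1.
cache_mounted() {
  mount | grep -q " $1 "
}

# Sample the mount state of $1 for $2 samples, $3 seconds apart,
# and print how often the state changed between two consecutive samples.
count_mount_flips() {
  mnt="$1"; samples="${2:-50}"; interval="${3:-0.2}"
  flips=0; prev=""; n=0
  while [ "$n" -lt "$samples" ]; do
    if cache_mounted "$mnt"; then cur=1; else cur=0; fi
    # Count a flip only once a previous sample exists to compare against.
    if [ -n "$prev" ] && [ "$cur" != "$prev" ]; then
      flips=$((flips + 1))
    fi
    prev="$cur"
    sleep "$interval"
    n=$((n + 1))
  done
  echo "$flips"
}
```

Running `count_mount_flips /cache 50 0.2` on an idle client at the LINBO GUI would be expected to print a non-zero count if the polling cycle described under "Root cause" is active. Sampling can miss cycles shorter than the interval, so the number is a lower bound.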

Fix

Replace the single umount + sleep 3 with a short retry loop (up to 10 × 1 s). The loop exits as soon as one iteration finds /cache unmounted, and the existing failure check still fires if /cache is still mounted after all attempts, i.e. when it is permanently busy. There are no new dependencies, and behavior is unchanged for the "not busy at all" case.
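
A minimal sketch of such a retry loop follows. It illustrates the idea described here and is not the literal diff of this PR. The function names `is_mounted` and `umount_retry` and the parameterized attempt count and delay are assumptions made for this example.

```shell
# Illustrative sketch only; names and parameters are hypothetical.
# Variables are global because POSIX sh has no "local".

# Returns 0 if something is currently mounted at the mount point $1.
is_mounted() {
  mount | grep -q " $1 "
}

# Try to unmount $1 up to $2 times (default 10), sleeping $3 seconds
# (default 1) after each attempt. Returns 0 as soon as a check finds the
# mount point unmounted, non-zero if it is still mounted after all attempts.
umount_retry() {
  mnt="$1"; tries="${2:-10}"; delay="${3:-1}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Leave the loop as soon as the mount point is free.
    is_mounted "$mnt" || return 0
    umount "$mnt" > /dev/null 2>&1
    sleep "$delay"
    i=$((i + 1))
  done
  # Final check, equivalent to the existing "Cannot unmount" test.
  ! is_mounted "$mnt"
}
```

In linbo_partition this could be used as `umount_retry /cache 10 1 || { echo "Cannot unmount /cache." >&2; exit 1; }`, which keeps the existing error message and exit code for the permanently busy case.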

Test

Patched linbo_partition via a pre-hook in update-linbofs.pre.d on a production linuxmuster.net 7 server (linbofs64, 4.3.33). After PXE-booting test clients, linbo-remote -c format runs reliably across multiple invocations and on multiple clients. No regression observed for local linbo-wrapper format.

…unts

linbo_gui periodically re-mounts /cache for status polling (visible as
recurring mount/unmount cycles in dmesg on an otherwise idle client).
The single "umount /cache; sleep 3; re-check" sequence in linbo_partition
races against this polling: the umount succeeds, then the polling
re-mounts /cache during the 3 seconds, and the re-check aborts with
"Cannot unmount /cache."

This reproduces reliably with `linbo-remote -c format` (or partition) on
fast hardware (e.g. NVMe), while `linbo-wrapper format` started locally
from the GUI tends to miss the race window and works.

Replace the single attempt with a short retry loop (up to 10 x 1 s).
As soon as one iteration finds /cache unmounted, the loop exits and the
existing failure check still fires if /cache is permanently busy.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>