Emit named IN_CREATE/IN_DELETE for inotify directory watches#58
Emit named IN_CREATE/IN_DELETE for inotify directory watches#58chenhunghan wants to merge 3 commits into
Conversation
4aa4da1 to
7fd5187
Compare
| * On failure keep the previous baseline; the next successful snapshot | ||
| * reconciles whatever changed in between. | ||
| */ | ||
| if (dir_snapshot(w->path, &now, &now_n)) { |
There was a problem hiding this comment.
Stale w->path reads the wrong directory after rename
The snapshot/diff design rests on w->path (captured by strdup(path) at inotify_add_watch time, src/syscall/inotify.c:594) being a valid handle to the directory whose entries the diff should compare. The watch's identity, however, is w->host_fd — an O_EVTONLY fd that follows the inode across renames. Once the two diverge, dir_snapshot(w->path, ...) at src/syscall/inotify.c:409 no longer reads the directory the watch is registered for.
I reproduced both failure modes with two small static aarch64 binaries cross-built and run against build/elfuse at this PR's tip (7fd5187).
Scenario A — old path no longer exists
mkdtemp /tmp/A
inotify_add_watch(fd, /tmp/A, IN_CREATE | IN_DELETE)
rename /tmp/A -> /tmp/A-renamed /* watched inode moves with the fd */
touch /tmp/A-renamed/child /* in the watched dir */
mv raises NOTE_RENAME on the watched fd (delivered as IN_MOVE_SELF, correct). The subsequent touch raises NOTE_WRITE on the same fd; process_vnode_event calls dir_snapshot("/tmp/A", ...), opendir returns ENOENT, the function returns false, the keep-baseline branch fires — and no IN_CREATE for child is ever emitted.
Actual output:
watching /tmp/elfuse-renA-KRUJg3 (wd=1)
renamed /tmp/elfuse-renA-KRUJg3 -> /tmp/elfuse-renA-KRUJg3-renamed; old path is now nonexistent
created /tmp/elfuse-renA-KRUJg3-renamed/child in the renamed (watched) dir
IN_CREATE name="child" count: 0 (expected 1 on Linux)
verdict: BUG REPRODUCED (named event lost after rename)
From the caller's perspective the watch silently stops naming children after a rename — exactly the failure mode #55 was opened to fix.
Scenario B — old path is taken by a different inode
mkdtemp /tmp/A
inotify_add_watch(fd, /tmp/A, IN_CREATE | IN_DELETE)
rename /tmp/A -> /tmp/A-renamed /* watched inode moves with the fd */
mkdir /tmp/A /* fresh dir, different inode, unwatched */
touch /tmp/A/decoy /* in the new unwatched dir */
touch /tmp/A-renamed/child /* in the watched dir */
Now opendir("/tmp/A") succeeds, but it reads the new /tmp/A, not the directory the watch tracks. The diff compares the new directory's entries (["decoy"]) to the baseline ([]) and synthesises an IN_CREATE for the decoy. The real child event in the renamed directory never surfaces because the watched dir is never sampled.
Actual output:
step 1: watching /tmp/elfuse-rename-A-nydrPl (wd=1)
step 2: rename /tmp/elfuse-rename-A-nydrPl -> /tmp/elfuse-rename-A-nydrPl-renamed (watched inode moves with the fd)
step 3: mkdir /tmp/elfuse-rename-A-nydrPl again (NEW inode at old path -- unwatched)
step 4a: created /tmp/elfuse-rename-A-nydrPl/decoy in the NEW (unwatched) dir
step 4b: created /tmp/elfuse-rename-A-nydrPl-renamed/child in the WATCHED (renamed) dir
step 5: draining events for ~1s
event: wd=1 mask=0x00000100 (IN_CREATE) len=8 name=decoy
summary:
total IN_CREATE: 1
IN_CREATE name="decoy" (BUG if >0): 1
IN_CREATE name="child" (expected: 1): 0
verdict: BUG REPRODUCED
The emitted event is objectively wrong on both axes: it names a file in a directory the caller never watched, and it suppresses the named event for the actual child of the watched directory. This isn't theoretical — build systems (yarn/pnpm/Cargo) and test runners routinely mv build → build.old; mkdir build; … for atomic swap; the watch on the old build/ will start fabricating events for the unrelated replacement directory after the swap.
Reproducers
Both binaries are small static aarch64 ELFs cross-built with aarch64-linux-gnu-gcc -static -O2 -D_GNU_SOURCE. Sources attached as Scenario A / Scenario B above; they exit non-zero on the buggy path. A correct fix should make both exit zero (Scenario A: emit IN_CREATE name="child"; Scenario B: emit IN_CREATE name="child" and not name="decoy").
This comment was marked as resolved.
This comment was marked as resolved.
EVFILT_VNODE reports that a watched directory changed but not which child, so directory events were queued with no name. fsnotify-based consumers (notably the k0s manifest applier, which re-applies only when a *.yaml entry appears) filter on the entry name and silently drop nameless events, so manifests written after the watch was established were never picked up. Keep a per-watch snapshot of the directory's entry names; on each NOTE_WRITE re-list the directory and diff against the snapshot to emit a named IN_CREATE per added child and IN_DELETE per removed one, matching real inotify semantics. The blocking-read and non-blocking collect paths share one process_vnode_event() helper. The snapshot is allocated on add_watch and freed on rm_watch and inotify_close. Add a regression test (test-inotify Test 6) that watches a fresh directory, creates a child, and asserts a named IN_CREATE for it is delivered; this fails before the fix (the event arrives without a name). Validated with make check on Apple Silicon. (cherry picked from commit 2a5fa28b5257ceb5ed116c231aecc7c4a2ef54ab)
A directory watch snapshots the child-name set on each change and diffs it against the previous baseline to recover the name kqueue omits. dir_snapshot returned an empty list on any failure (opendir error, a mid-read readdir error, or an allocation failure), which the diff could not tell apart from a genuinely empty directory. On a transient failure the IN_DELETE pass then saw every known child as missing and queued a spurious IN_DELETE for each, and the baseline was overwritten with the empty list, so every later change re-reported the whole directory as IN_CREATE. The corruption was permanent. Make dir_snapshot return bool: on failure it frees any partial result and reports false, and readdir read errors are now detected by clearing errno before each call. process_vnode_event only diffs against and advances to a snapshot that succeeded; on failure it keeps the previous baseline so the next successful snapshot reconciles whatever changed in between. The watch-add path documents that a failed initial snapshot is a best-effort empty baseline. Validated with make check on Apple Silicon; test-inotify still passes 6/6.
7fd5187 to
c0a1182
Compare
process_vnode_event() ran opendir()/readdir() (via dir_snapshot) for a directory watch's NOTE_WRITE while holding the global inotify_lock, so a large or slow directory scan blocked every other inotify operation -- on all instances -- for its duration. This is new I/O-under-lock introduced with the named-event diffing. Take the snapshot with the lock released: copy the watch's path and dev/ino under the lock, drop the lock for the opendir/readdir I/O, then re-acquire and re-validate the instance (by guest_fd) and the watch (by host_fd + dev/ino) before applying the diff. A close or host_fd reuse during the window discards the stale snapshot. Both collect paths (collect_events and the blocking read) re-check the instance after the call and return EBADF if it was torn down, mirroring the existing post-kevent re-validation. No change to the emitted events (test-inotify still 6/6); addresses the lock-contention review finding.
|
Rebased onto the latest While here I also addressed the lock-contention review finding in a separate commit (9c28c56): the directory |
There was a problem hiding this comment.
2 issues found across 1 file (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/syscall/inotify.c">
<violation number="1" location="src/syscall/inotify.c:417">
P1: Directory diffs are path-based, so a rename makes named IN_CREATE/IN_DELETE report against the stale pathname instead of the watched inode.</violation>
<violation number="2" location="src/syscall/inotify.c:433">
P2: Revalidation uses recycled numeric fds as instance identity, so a close+reopen race can let a stale directory snapshot mutate the new watch.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| /* Copy the path + identity, then release the lock for the opendir/ | ||
| * readdir snapshot so filesystem I/O does not block other instances. | ||
| */ | ||
| char *path = strdup(w->path); |
There was a problem hiding this comment.
P1: Directory diffs are path-based, so a rename makes named IN_CREATE/IN_DELETE report against the stale pathname instead of the watched inode.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall/inotify.c, line 417:
<comment>Directory diffs are path-based, so a rename makes named IN_CREATE/IN_DELETE report against the stale pathname instead of the watched inode.</comment>
<file context>
@@ -398,41 +406,77 @@ static int process_vnode_event(inotify_instance_t *inst,
+ /* Copy the path + identity, then release the lock for the opendir/
+ * readdir snapshot so filesystem I/O does not block other instances.
+ */
+ char *path = strdup(w->path);
+ if (path) {
+ dev_t dev = w->dev;
</file context>
| * different file. Any of these discards the stale snapshot. | ||
| */ | ||
| widx = watch_find_by_hostfd(inst, host_fd); | ||
| if (inotify_state[slot].guest_fd != guest_fd || widx < 0) { |
There was a problem hiding this comment.
P2: Revalidation uses recycled numeric fds as instance identity, so a close+reopen race can let a stale directory snapshot mutate the new watch.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/syscall/inotify.c, line 433:
<comment>Revalidation uses recycled numeric fds as instance identity, so a close+reopen race can let a stale directory snapshot mutate the new watch.</comment>
<file context>
@@ -398,41 +406,77 @@ static int process_vnode_event(inotify_instance_t *inst,
+ * different file. Any of these discards the stale snapshot.
+ */
+ widx = watch_find_by_hostfd(inst, host_fd);
+ if (inotify_state[slot].guest_fd != guest_fd || widx < 0) {
+ free_dir_snapshot(now, now_n);
+ return 0;
</file context>
|
cubic-dev-ai flagged on PR #58:
This is correct, and it currently has no test coverage. The named-event Root cause A directory watch has two identities that are captured once at
Once the watched directory is renamed, Proposed regression testsTwo tests added to /* Read inotify events for ~1s, recording whether a named IN_CREATE for the
* (optional) want and forbid names was observed. Drains the whole window so a
* "must NOT appear" assertion is meaningful. */
static void watch_creates(int fd,
const char *want, bool *saw_want,
const char *forbid, bool *saw_forbid)
{
if (saw_want) *saw_want = false;
if (saw_forbid) *saw_forbid = false;
for (int attempt = 0; attempt < 50; attempt++) {
char buf[2048];
ssize_t n = read(fd, buf, sizeof(buf));
for (ssize_t off = 0;
n > 0 && off + (ssize_t) sizeof(struct inotify_event) <= n;) {
struct inotify_event *ev = (struct inotify_event *) (buf + off);
if ((ev->mask & IN_CREATE) && ev->len > 0) {
if (want && saw_want && strcmp(ev->name, want) == 0)
*saw_want = true;
if (forbid && saw_forbid && strcmp(ev->name, forbid) == 0)
*saw_forbid = true;
}
off += (ssize_t) sizeof(struct inotify_event) + ev->len;
}
usleep(20000);
}
}
/* Test 7 : a child created in the renamed (still-watched) directory
* must still produce a named IN_CREATE; today the stale-path snapshot fails and
* the event is silently lost. */
static void test_dir_rename_oldpath_gone(void)
{
TEST("named IN_CREATE survives dir rename (old path gone)");
char dir[] = "/tmp/elfuse-inotify-renA-XXXXXX";
if (!mkdtemp(dir)) { FAIL("mkdtemp"); return; }
char renamed[300];
snprintf(renamed, sizeof(renamed), "%s-renamed", dir);
int fd = inotify_init1(IN_NONBLOCK);
if (fd < 0) { FAIL("inotify_init1"); rmdir(dir); return; }
int wd = inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE);
if (wd < 0) { FAIL("inotify_add_watch"); close(fd); rmdir(dir); return; }
if (rename(dir, renamed) != 0) {
FAIL("rename watched dir");
inotify_rm_watch(fd, wd); close(fd); rmdir(dir); return;
}
const char *child = "child";
char path[400];
snprintf(path, sizeof(path), "%s/%s", renamed, child);
int tfd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
if (tfd < 0) {
FAIL("create child in renamed dir");
inotify_rm_watch(fd, wd); close(fd); rmdir(renamed); return;
}
close(tfd);
bool saw_child = false;
watch_creates(fd, child, &saw_child, NULL, NULL);
EXPECT_TRUE(saw_child,
"named IN_CREATE for child lost after rename (old path gone)");
unlink(path); inotify_rm_watch(fd, wd); close(fd); rmdir(renamed);
}
/* Test 8 : after the atomic-swap pattern, the watched dir's child
* must be named and the replacement dir's file must NOT appear. This is also
* the case the dev/ino re-validation cannot catch -- the watch identity never
* changed, only the path -> vnode binding (an ABA). */
static void test_dir_rename_oldpath_reused(void)
{
TEST("dir rename: no fabricated event from replacement dir");
char dir[] = "/tmp/elfuse-inotify-renB-XXXXXX";
if (!mkdtemp(dir)) { FAIL("mkdtemp"); return; }
char renamed[300];
snprintf(renamed, sizeof(renamed), "%s-renamed", dir);
int fd = inotify_init1(IN_NONBLOCK);
if (fd < 0) { FAIL("inotify_init1"); rmdir(dir); return; }
int wd = inotify_add_watch(fd, dir, IN_CREATE | IN_DELETE);
if (wd < 0) { FAIL("inotify_add_watch"); close(fd); rmdir(dir); return; }
if (rename(dir, renamed) != 0) {
FAIL("rename watched dir");
inotify_rm_watch(fd, wd); close(fd); rmdir(dir); return;
}
if (mkdir(dir, 0755) != 0) {
FAIL("mkdir replacement dir");
inotify_rm_watch(fd, wd); close(fd); rmdir(renamed); return;
}
const char *decoy = "decoy", *child = "child";
char decoy_path[400], child_path[400];
snprintf(decoy_path, sizeof(decoy_path), "%s/%s", dir, decoy);
snprintf(child_path, sizeof(child_path), "%s/%s", renamed, child);
int dfd = open(decoy_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
if (dfd >= 0) close(dfd);
int cfd = open(child_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
if (cfd >= 0) close(cfd);
if (dfd < 0 || cfd < 0) {
FAIL("create decoy/child");
unlink(decoy_path); unlink(child_path);
inotify_rm_watch(fd, wd); close(fd); rmdir(dir); rmdir(renamed); return;
}
bool saw_child = false, saw_decoy = false;
watch_creates(fd, child, &saw_child, decoy, &saw_decoy);
if (!(saw_child && !saw_decoy))
printf("[child=%d decoy=%d] ", saw_child, saw_decoy);
EXPECT_TRUE(saw_child && !saw_decoy,
saw_decoy ? "fabricated IN_CREATE for replacement-dir file"
: "named IN_CREATE for watched child lost");
unlink(decoy_path); unlink(child_path);
inotify_rm_watch(fd, wd); close(fd); rmdir(dir); rmdir(renamed);
}Register both in test_dir_create_named();
test_dir_rename_oldpath_gone();
test_dir_rename_oldpath_reused();Verified current behavior (RED on this PR's tip)Test 6 passing alongside the two new failures confirms the failures are the |
Fixes #55.
Synthesizes named
IN_CREATE/IN_DELETEfor directory watches by snapshotting the directory's entry names and diffing on eachNOTE_WRITE(kqueueEVFILT_VNODEreports only that the directory changed, not which child). Full rationale in the commit message.Adds a regression test (
test-inotifyTest 6) that watches a fresh directory, creates a child, and asserts a namedIN_CREATEis delivered.Validation on Apple Silicon:
make elfuseandmake checkpass; the new test fails before this change (event arrives with an empty name) and passes after.Summary by cubic
Emit named IN_CREATE/IN_DELETE for directory watches by diffing per‑watch snapshots, and keep the previous baseline on snapshot failure. Snapshots now run without holding
inotify_lockto avoid blocking other watches. Fixes nameless directory events so fsnotify tools can match child names (fixes #55).Written for commit 9c28c56. Summary will update on new commits.