Skip to content

fix(ntx): split internal nodes correctly in a deep index (off-by-one)#92

Closed
Admnwk wants to merge 2 commits into
FiveTechSoft:mainfrom
Admnwk:fix/ntx-internal-split
Closed

fix(ntx): split internal nodes correctly in a deep index (off-by-one)#92
Admnwk wants to merge 2 commits into
FiveTechSoft:mainfrom
Admnwk:fix/ntx-internal-split

Conversation

@Admnwk

@Admnwk Admnwk commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Problem

Once an NTX index grows a third level — an internal (branch) node fills up and has to split, which happens around 1500+ keys for a 15-byte key — the build produced a slightly corrupt tree: a full Skip walk returned more rows than exist (e.g. 2003 vs 2000), keys came back out of order, and a few Seeks missed. Unlike the duplicate-key case (#86), this hits plain ascending unique keys, so any large index is affected.

Root cause

In insert(), the internal-node split builds the parent's entry list, inserts the promoted separator at pos, then wired the new right sibling in with:

ents[pos + 1].lchild = prop_right;

For an ascending key stream the descent always lands at the rightmost slot (pos == key_count), so the separator is appended at the end and ents[pos + 1] reads one past the vector. Worse, prop_right was never recorded as the node's rightmost child — the stale original sentinel was reused instead, which for a rightmost leaf split aliases prop_left. So prop_left ended up referenced twice and prop_right was orphaned, injecting one phantom / out-of-order entry per internal split.

Fix

When the separator is appended at the end, prop_right becomes the node's new rightmost child (mirroring the non-splitting "room in parent" path); otherwise the original sentinel is preserved. This also removes the one-past-the-end vector access.

std::uint32_t node_rightmost;
if (pos + 1 < ents.size()) {
    ents[pos + 1].lchild = prop_right;   // separator before the end
    node_rightmost = right_sentinel;
} else {
    node_rightmost = prop_right;          // separator appended at the end
}
...
fill_internal(parent_right, p_right, node_rightmost);

Validation

  • New regression test tests/unit/abi_ntx_internal_split_test.cpp: builds a 2000-key unique index (no duplicates, no PACK) and asserts a full Skip walk visits exactly N records in order and every key is reachable by Seek.
    • RED before the fix: walk 2003/2000, out of order, Seek 1943/2000.
    • GREEN after.
  • Verified by an independent repro at N up to 10000 (unique and duplicate streams): every record reachable, in order.
  • Full unit suite: 786/786, 0 failures. Builds clean under -DOPENADS_WARNINGS_AS_ERRORS=ON (MSVC x64).

Notes

  • Stacked on fix(ntx): build a complete multi-page index over runs of duplicate keys #86 (the duplicate-key fix). The first commit here is fix(ntx): build a complete multi-page index over runs of duplicate keys #86; once that merges this collapses to the single internal-split commit. Both touch only src/drivers/ntx/ntx_index.{cpp,h} in different regions and are logically independent.
  • Unrelated pre-existing observation: abi_adi_multilevel_build_test is order-dependent — it fails when run in isolation (-tc=...) and passes inside the full suite, on plain v1.4.0 before any of these changes. It's an AdsCreateIndex61 → 5000 on an 8000-row ADI build that depends on warm heap state; not introduced or affected by this change (this fix in fact removes a one-past-the-end write). Flagging it as a separate item worth a look.

Admnwk and others added 2 commits June 26, 2026 01:03
An NTX index built over a key stream containing a clustered run of
duplicate keys came out structurally incomplete: a top-to-bottom Skip
walk reached only a fraction of the records and AdsSeek found almost
none of the survivors, even though AdsCreateIndex reported success. Two
independent defects in the B-tree insert path were responsible.

1. seek_key_for_write_(), shared by insert() and erase(), stopped the
   descent at an internal node whenever the key matched a separator
   exactly. NTX keeps keys in internal nodes (a B-tree, not a B+tree),
   so for erase / read positioning that is correct, but an insert must
   always reach a leaf. Stopping at a branch made insert() write the new
   entry into that internal node with a 0 left child, orphaning the
   whole subtree. Exact matches only happen for duplicate keys, so
   unique streams never tripped it. Fix: a descend_to_leaf flag (set by
   insert) ignores the exact-match branch shortcut and keeps walking
   down to a leaf.

2. The "room in parent" split path reused the parent's sentinel slot
   (its rightmost child) for the promoted separator without preserving
   the original rightmost child when the separator landed left of the
   end (pos < key_count). Runs of duplicates sort into the leftmost
   subtree, so their separators are inserted left of the end, dropping
   the right subtree and leaving an unreachable branch with a 0 right
   child. Ascending unique keys always split at the rightmost position,
   so they never tripped it. Fix: capture the rightmost child before the
   offset shift and reinstate it as the new sentinel when pos < key_count.

Adds abi_ntx_multipage_dup_test: builds a multi-page NTX over a
duplicate-laden key stream and asserts every record is reachable both by
a full Skip walk (in order) and an individual Seek. RED before the fix
(walk reached 30/300, Seek found 24/240), GREEN after. Full unit suite
786/786, 0 regressions; builds clean under -Werror.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Once an NTX index grows a third level — an internal (branch) node fills
up and must split, which happens around 1500+ keys for a 15-byte key —
insert() built the parent's entry list, inserted the promoted separator
at `pos`, then wired the new right sibling in with
`ents[pos + 1].lchild = prop_right`. For an ascending key stream the
descent always lands at the rightmost slot (pos == key_count), so the
separator is appended at the end and `ents[pos + 1]` reads one past the
vector. Worse, prop_right was never recorded as the node's rightmost
child: the stale original sentinel was reused instead, which for a
rightmost leaf split aliases prop_left — so prop_left ended up referenced
twice and prop_right was orphaned.

Effect: every internal-node split injected one phantom / out-of-order
entry. A full Skip walk returned slightly more rows than exist (e.g.
2003 vs 2000), keys came back out of order, and a few Seeks missed.

Fix: when the separator is appended at the end, prop_right becomes the
node's new rightmost child (mirroring the non-splitting "room in parent"
path); otherwise the original sentinel is preserved. This also removes
the one-past-the-end vector access.

Adds abi_ntx_internal_split_test: builds a 2000-key unique index (no
duplicates, no PACK) and asserts a full Skip walk visits exactly N
records in order and every key is reachable by Seek. RED before the fix
(walk 2003/2000, out of order, Seek 1943/2000), GREEN after. Full unit
suite passes; builds clean under -Werror.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes critical bugs in the NTX B-tree index implementation related to duplicate key insertion and internal node splitting, which previously caused orphaned subtrees and out-of-order/missing keys. It introduces a descend_to_leaf flag to ensure insertions always reach a leaf node, and corrects the handling of the rightmost child sentinel during internal node splits. Two regression tests are added to cover these scenarios. The review feedback highlights potential buffer overflow vulnerabilities in the newly added test files where fixed-size buffers are populated using std::memcpy without bounds checking, and suggests safer alternatives using std::snprintf or assertions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +41 to +42
UNSIGNED8 fld[32];
std::memcpy(fld, field, std::strlen(field) + 1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using std::memcpy with std::strlen to copy into a fixed-size buffer fld of size 32 can lead to a buffer overflow if field exceeds 31 characters. It is safer to use std::snprintf to ensure null-termination and prevent out-of-bounds writes.

Suggested change
UNSIGNED8 fld[32];
std::memcpy(fld, field, std::strlen(field) + 1);
UNSIGNED8 fld[32];
std::snprintf(reinterpret_cast<char*>(fld), sizeof(fld), "%s", field);

Comment on lines +69 to +70
UNSIGNED8 srv[256];
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Copying dir.string() into a fixed-size buffer srv of size 256 using std::memcpy can cause a buffer overflow if the temporary directory path is extremely long (which can happen in some deep CI or Windows environments). Adding a REQUIRE check ensures that the path fits within the buffer limits.

Suggested change
UNSIGNED8 srv[256];
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);
UNSIGNED8 srv[256];
REQUIRE(dir.string().size() < sizeof(srv));
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);

Comment on lines +51 to +52
UNSIGNED8 fld[32];
std::memcpy(fld, field, std::strlen(field) + 1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using std::memcpy with std::strlen to copy into a fixed-size buffer fld of size 32 can lead to a buffer overflow if field exceeds 31 characters. It is safer to use std::snprintf to ensure null-termination and prevent out-of-bounds writes.

Suggested change
UNSIGNED8 fld[32];
std::memcpy(fld, field, std::strlen(field) + 1);
UNSIGNED8 fld[32];
std::snprintf(reinterpret_cast<char*>(fld), sizeof(fld), "%s", field);

Comment on lines +79 to +80
UNSIGNED8 srv[256];
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Copying dir.string() into a fixed-size buffer srv of size 256 using std::memcpy can cause a buffer overflow if the temporary directory path is extremely long (which can happen in some deep CI or Windows environments). Adding a REQUIRE check ensures that the path fits within the buffer limits.

Suggested change
UNSIGNED8 srv[256];
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);
UNSIGNED8 srv[256];
REQUIRE(dir.string().size() < sizeof(srv));
std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1);

@Admnwk

Admnwk commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator Author

Merged directly to main as b0ee044 (stacked on #86's c5ce0d5; both cherry-picked onto current main). Full unit suite 800/800, 0 failures, -Werror clean. Note: the pre-existing abi_adi_multilevel_build_test order-dependence flagged above is unaffected (passes in the full suite). Closing as merged — thanks!

@Admnwk Admnwk closed this Jun 26, 2026
@Admnwk Admnwk deleted the fix/ntx-internal-split branch June 26, 2026 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant