fix(cdx): rewrite stored key expression when re-creating a tag (re-filter on another column)#93
Conversation
Re-running INDEX ON <other column> TAG ORDERX over an existing .cdx
without first deleting the bag — the "re-filter on another column"
idiom — left the tag's stored key expression pointing at the OLD
column. The tag-already-exists path cleared the B+tree (clear_data)
and rewrote the unique/descend/keysize options (set_options) but never
touched the key expression in the sub-tag header. The freshly built
tree carried the new column's keys, so a read straight after the
re-create looked correct, but:
- AdsGetIndexExpr reported the old column; and
- the next dbAppend/dbReplace (sync_all_indexes_ evaluates
idx->expression()) keyed the new record off the OLD column and
inserted it out of order — the "al cambiar de columna la lista se
desordenaba" symptom, surfacing only after a later write/reopen.
Native DBFCDX has no such gap: hb_cdxOrderCreate deletes the existing
tag wholesale (hb_cdxIndexDelTag) and adds a fresh one, so expression,
FOR clause, options and tree all follow the new column. rddads simply
forwards OrdCreate to AdsCreateIndex61, so matching DBFCDX here also
removes the need for the app-side OrdSetFocus(0)+OrdBagClear+FERASE
workaround callers were using to defend against the stale tree.
Fix: on the CDX re-create path, rewrite the whole key-/FOR-expression
pool via a new CdxIndex::set_expression(key_expr, for_expr). The pool
layout (offsets 504..1024, NUL-terminated, FOR just past the key NUL)
is factored out of build_subtag_header into write_expr_pool_ and reused
so the fresh-create and re-create paths can't drift. set_expression
subsumes the former set_condition (which only rewrote FOR and now has
no callers), so that dead method is removed.
Adds abi_cdx_recreate_tag_diff_expr_test: builds tag ORDERX on K1,
re-creates it on K2 (reverse order, no FERASE), and asserts the stored
expression is K2, the walk follows K2, and a record appended AFTER the
re-create lands in K2 order. RED before the fix (expression == "K1",
append misplaced); GREEN after. Full unit suite 785/785 (the handful
of intermittently-red tests are pre-existing infra-flaky ADI/ADT/
serverd cases on fixed temp paths, unrelated to this CDX change and
green on isolated/repeat runs). Builds clean, no new warnings.
Scope: CDX only. The ADT/.adi re-create path has the same shape and is
left for a follow-up.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the index re-creation logic in AdsCreateIndex61 to correctly overwrite the stored key expression and FOR clause when a tag is re-created on a different column, matching native DBFCDX behavior. It introduces a helper write_expr_pool_ to handle rewriting the expression pool and adds a comprehensive unit test. The review feedback suggests restoring validation checks in set_expression to prevent silent truncation of expressions exceeding the pool limit, and replacing a fixed-size buffer in the unit test with a dynamically sized vector to avoid potential buffer overflows.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if (sub_header_offset_ == 0) { | ||
| return util::Error{6106, 0, | ||
| "CDX sub-tag header offset uninitialised", ""}; | ||
| } |
There was a problem hiding this comment.
The previous implementation of set_condition explicitly prevented silent truncation of the FOR expression if it exceeded the available pool space. By replacing it with set_expression and relying solely on write_expr_pool_ (which silently truncates using std::min), we lose this safety check. This could lead to silent index corruption or incorrect query filtering if a user provides expressions that exceed the 510-byte pool limit. We should restore this validation check in set_expression.
| if (sub_header_offset_ == 0) { | |
| return util::Error{6106, 0, | |
| "CDX sub-tag header offset uninitialised", ""}; | |
| } | |
| if (sub_header_offset_ == 0) { | |
| return util::Error{6106, 0, | |
| "CDX sub-tag header offset uninitialised", ""}; | |
| } | |
| if (key_expr.size() > 510) { | |
| return util::Error{6106, 0, | |
| "Key expression too long for CDX header pool", ""}; | |
| } | |
| const std::size_t exp_len_nul = key_expr.size() + 1; | |
| const std::size_t for_room = (exp_len_nul <= 510) ? (510 - exp_len_nul) : 0; | |
| if (for_expr.size() > for_room) { | |
| return util::Error{6106, 0, | |
| "FOR expression too long for CDX header pool", ""}; | |
| } |
There was a problem hiding this comment.
Restored the fail-loud guard in set_expression: it now returns 6106 if the key (>510) or FOR (> remaining pool) would not fit, instead of letting write_expr_pool_ clamp with std::min. Covers both halves of the pool, not just the FOR as the old set_condition did. Added a unit case asserting the rejection. (769bc15)
| UNSIGNED8 srv[256]; | ||
| std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1); |
There was a problem hiding this comment.
Using a fixed-size buffer of 256 bytes for the server path (srv) can lead to a buffer overflow if the temporary directory path is very long (which is common in nested CI/CD environments). Using a dynamically sized std::vector<UNSIGNED8> avoids this risk entirely.
| UNSIGNED8 srv[256]; | |
| std::memcpy(srv, dir.string().c_str(), dir.string().size() + 1); | |
| std::vector<UNSIGNED8> srv(dir.string().begin(), dir.string().end()); | |
| srv.push_back(0); |
There was a problem hiding this comment.
Switched srv to a std::vector sized to the path. Note: building it directly from dir.string().begin()/.end() reads iterators into two distinct temporaries (length_error at runtime), so the string is materialised once first. (769bc15)
… test Addresses the Gemini Code Assist review on the PR: - set_expression now rejects a key or FOR expression that does not fit the 510-byte sub-header pool (error 6106) instead of letting write_expr_pool_ silently clamp it with std::min — a clipped key expression is a corrupt index and a clipped FOR changes which rows are indexed. Restores the fail-loud guard the removed set_condition used to enforce, now covering both halves of the pool. Adds a unit case asserting the rejection. - The test's server-path buffer is a std::vector sized to the path instead of a fixed 256-byte array, so a long temp dir (nested CI) can't overflow. (The materialised string avoids building the vector from iterators into two distinct std::string temporaries.) Full unit suite 788/788. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
Re-running
INDEX ON <other column> TAG ORDERXover an existing.cdxwithout first deleting the bag — the common Clipper/Harbour "re-filter on another column" idiom — left the tag's stored key expression pointing at the old column.The tag-already-exists path (
AdsCreateIndex61,is_cdx && exists) cleared the B+tree (clear_data) and rewrote the unique/descend/keysize options (set_options), but never rewrote the key expression in the sub-tag header. The freshly built tree carried the new column's keys, so a read straight after the re-create looked fine, but:AdsGetIndexExprreported the old column; anddbAppend/dbReplace(sync_all_indexes_evaluatesidx->expression()) keyed the new record off the old column and inserted it out of order — silent disorder that only surfaces after a later write or reopen.Reference behavior
Native DBFCDX has no such gap:
hb_cdxOrderCreate(src/rdd/dbfcdx/dbfcdx1.c) deletes the existing tag wholesale (hb_cdxIndexDelTag) and adds a fresh one, so expression, FOR clause, options and tree all follow the new column.rddadssimply forwardsOrdCreatetoAdsCreateIndex61, so matching DBFCDX here also removes the need for the app-sideOrdSetFocus(0)+OrdBagClear+FERASEworkaround callers used to defend against the stale tree.Fix
On the CDX re-create path, rewrite the whole key-/FOR-expression pool via a new
CdxIndex::set_expression(key_expr, for_expr). The pool layout (offsets 504..1024, NUL-terminated, FOR just past the key NUL) is factored out ofbuild_subtag_headerintowrite_expr_pool_and reused, so the fresh-create and re-create paths can't drift.set_expressionsubsumes the formerset_condition(which only rewrote FOR and now has no callers), so that dead method is removed.Test
abi_cdx_recreate_tag_diff_expr_test: builds tagORDERXonK1, re-creates it onK2(reverse order, noFERASE), and asserts the stored expression isK2, the walk followsK2, and a record appended after the re-create lands inK2order. RED before the fix (expression == "K1", appended record misplaced); GREEN after.Full unit suite 785/785, builds clean, no new warnings. (A few ADI/ADT/serverd tests on fixed temp paths are intermittently red regardless of this change — they pass on isolated and repeat runs; this CDX change is green deterministically.)
Scope
CDX only. The ADT/
.adire-create path has the same shape and is left for a follow-up.