Skip to content

string_view: fix HasWord false negative after a non-boundary partial match#463

Open
EylonKrause wants to merge 1 commit into
google:mainfrom
EylonKrause:fix/stringview-hasword-boundary
Open

string_view: fix HasWord false negative after a non-boundary partial match#463
EylonKrause wants to merge 1 commit into
google:mainfrom
EylonKrause:fix/stringview-hasword-boundary

Conversation

@EylonKrause

Copy link
Copy Markdown

Description

CpuFeatures_StringView_HasWord detects a separator-delimited word in a line (used to find CPU features in /proc/cpuinfo flag lists). It searches for the word in remainder — which advances every loop iteration — but runs the separator-boundary checks by slicing before/after from the original line using that remainder-relative index:

const int index_of_word = CpuFeatures_StringView_IndexOf(remainder, word);
...
const StringView before = CpuFeatures_StringView_KeepFront(line, index_of_word);
const StringView after  = CpuFeatures_StringView_PopFront(line, index_of_word + word.size);
...
remainder = CpuFeatures_StringView_PopFront(remainder, index_of_word + word.size);

On the first iteration remainder == line, so the index is correct. But once an earlier, non-boundary partial occurrence advances remainder, index_of_word is relative to the shorter remainder while still being applied to the full line — so the boundary checks examine the wrong characters and the function returns a false negative for a word that really is present as a token.

Example: HasWord("foox foo", "foo", ' ') returns false even though foo is the second token. This affects any feature whose name first appears as a non-boundary substring of an earlier token and then again as its own token (e.g. MIPS/LoongArch flag lists such as ... mips32r2 mips32 ... searching for mips32).

Fix

Convert the remainder-relative index to an absolute index into line before slicing (remainder is always a suffix of line, so its offset is line.size - remainder.size). This keeps the boundary checks anchored to the full line.

Note: naively switching the slices to remainder instead would introduce a false positive at remainder-start boundaries — e.g. HasWord("foofoo bar", "foo", ' ') would then wrongly match. The absolute-index form avoids that.

Testing

Extended the existing StringViewTest.CpuFeatures_StringView_HasWord test:

  • Before the fix: HasWord("foox foo","foo",' ') and HasWord("xvme vme","vme",' ') return false → the test FAILS.
  • After the fix: both return true, the no-false-positive guard (HasWord("foofoo bar","foo",' ')false) holds, and the full suite is green under ASan/UBSan (ctest: 4/4; string_view_test: 17/17).

…match

CpuFeatures_StringView_HasWord searches for the word in `remainder`
(which advances every loop iteration) but runs the separator-boundary
checks by slicing `before`/`after` from the full `line` using that
remainder-relative index. On the first iteration remainder == line so
the index is correct; once an earlier non-boundary partial occurrence
advances `remainder`, the index is relative to the shorter remainder yet
applied to `line`, so the boundary checks read the wrong characters and
the word is reported absent even when it is a real token.

E.g. HasWord("foox foo", "foo", ' ') returned false. This affects any
feature whose name first appears as a non-boundary substring of an
earlier token and then again as its own token (e.g. MIPS/LoongArch flag
lists).

Fix: convert the remainder-relative index to an absolute index into
`line` (remainder is always a suffix, offset = line.size -
remainder.size). Anchoring the slices to the full line also avoids the
false positive that slicing `remainder` directly would create at
remainder-start boundaries.

Extends StringViewTest.CpuFeatures_StringView_HasWord with the
false-negative repro and a no-false-positive guard.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant