string_view: fix HasWord false negative after a non-boundary partial match#463
Open
EylonKrause wants to merge 1 commit into
Open
string_view: fix HasWord false negative after a non-boundary partial match#463EylonKrause wants to merge 1 commit into
EylonKrause wants to merge 1 commit into
Conversation
…match
CpuFeatures_StringView_HasWord searches for the word in `remainder`
(which advances every loop iteration) but runs the separator-boundary
checks by slicing `before`/`after` from the full `line` using that
remainder-relative index. On the first iteration remainder == line so
the index is correct; once an earlier non-boundary partial occurrence
advances `remainder`, the index is relative to the shorter remainder yet
applied to `line`, so the boundary checks read the wrong characters and
the word is reported absent even when it is a real token.
E.g. HasWord("foox foo", "foo", ' ') returned false. This affects any
feature whose name first appears as a non-boundary substring of an
earlier token and then again as its own token (e.g. MIPS/LoongArch flag
lists).
Fix: convert the remainder-relative index to an absolute index into
`line` (remainder is always a suffix, offset = line.size -
remainder.size). Anchoring the slices to the full line also avoids the
false positive that slicing `remainder` directly would create at
remainder-start boundaries.
Extends StringViewTest.CpuFeatures_StringView_HasWord with the
false-negative repro and a no-false-positive guard.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
CpuFeatures_StringView_HasWorddetects a separator-delimited word in a line (used to find CPU features in/proc/cpuinfoflag lists). It searches for the word inremainder— which advances every loop iteration — but runs the separator-boundary checks by slicingbefore/afterfrom the originallineusing that remainder-relative index:On the first iteration
remainder == line, so the index is correct. But once an earlier, non-boundary partial occurrence advancesremainder,index_of_wordis relative to the shorterremainderwhile still being applied to the fullline— so the boundary checks examine the wrong characters and the function returns a false negative for a word that really is present as a token.Example:
HasWord("foox foo", "foo", ' ')returnsfalseeven thoughfoois the second token. This affects any feature whose name first appears as a non-boundary substring of an earlier token and then again as its own token (e.g. MIPS/LoongArch flag lists such as... mips32r2 mips32 ...searching formips32).Fix
Convert the remainder-relative index to an absolute index into
linebefore slicing (remainderis always a suffix ofline, so its offset isline.size - remainder.size). This keeps the boundary checks anchored to the full line.Note: naively switching the slices to
remainderinstead would introduce a false positive at remainder-start boundaries — e.g.HasWord("foofoo bar", "foo", ' ')would then wrongly match. The absolute-index form avoids that.Testing
Extended the existing
StringViewTest.CpuFeatures_StringView_HasWordtest:HasWord("foox foo","foo",' ')andHasWord("xvme vme","vme",' ')returnfalse→ the test FAILS.true, the no-false-positive guard (HasWord("foofoo bar","foo",' ')→false) holds, and the full suite is green under ASan/UBSan (ctest: 4/4;string_view_test: 17/17).