Add optional support for digit separators and cpp prefixes#369
Conversation
|
Thanks, to be reviewed. |
|
I rebased the PR here : https://github.com/fastfloat/fast_float/tree/pr-369-rebase I get that this PR slows down the performance by 5% to 10%. We could hack things so that the it is performance neutral. But I am not convinced. I don't know who needs this feature. I am concerned about adding a function that nobody would use but that we would need to maintain. |
Re-implements the optional digit-separator and base-prefix parsing (originally PR fastfloat#369) on top of the current store_spans hot-path architecture, with the goal of zero overhead when the features are not used. Key changes vs. the original PR: - has_separator is now a *compile-time* template parameter on parse_number_string, dispatched once (from_chars_advanced -> parse_number_string_options) based on options.digit_separator. The has_separator==false instantiation that every default caller uses is byte-for-byte the separator-free parser: no separator comparison ever enters a digit loop and the SIMD eight-digit fast path is preserved. The separator-aware code lives only in the cold true instantiation. - The store_spans no-span hot path (added to main after the original PR was branched) is preserved. - parse_options_t fields are ordered so the two new single-byte fields (digit_separator, format_options) fall into the existing padding for UC == char. sizeof(parse_options_t<char>) stays 16 bytes, so the struct is still register-passed and the call boundary is unchanged. Result: for the default (no separator, no prefix) path, the generated assembly of from_chars<double> is identical to main, and benchmarks show no measurable regression (the original PR was 5-10% slower). Adds basictest coverage for digit separators (including the >19-digit overflow re-scan paths) and prefix skipping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-implements the optional digit-separator and base-prefix parsing (originally PR fastfloat#369) on top of the current store_spans hot-path architecture, with the goal of zero overhead when the features are not used. Key changes vs. the original PR: - has_separator is now a *compile-time* template parameter on parse_number_string, dispatched once (from_chars_advanced -> parse_number_string_options) based on options.digit_separator. The has_separator==false instantiation that every default caller uses is byte-for-byte the separator-free parser: no separator comparison ever enters a digit loop and the SIMD eight-digit fast path is preserved. The separator-aware code lives only in the cold true instantiation. - The store_spans no-span hot path (added to main after the original PR was branched) is preserved. - parse_options_t fields are ordered so the two new single-byte fields (digit_separator, format_options) fall into the existing padding for UC == char. sizeof(parse_options_t<char>) stays 16 bytes, so the struct is still register-passed and the call boundary is unchanged. Result: for the default (no separator, no prefix) path, the generated assembly of from_chars<double> is identical to main, and benchmarks show no measurable regression (the original PR was 5-10% slower). Adds basictest coverage for digit separators (including the >19-digit overflow re-scan paths) and prefix skipping.
|
I think the useful precedent here is rust-lexical. Its float parser uses an Eisel-Lemire implementation, so this is not a random general-purpose parser with unrelated tradeoffs. lexical explicitly supports this class of use case: configurable digit separators, base prefixes/radices, and predefined number formats for data formats and language literal grammars such as JSON, TOML, YAML, Rust, Python, C++, C#, FORTRAN, COBOL, and others.
-> users parsing numeric literals or config/data formats whose grammar is intentionally wider than |
|
The performance concern is also handled in the same spirit. In lexical, the number format is a compile-time format specification; in this PR, the separator path is similarly isolated behind a compile-time So I agree this feature should not regress the default parser. My point is that the current design confines the extra behavior to an opt-in cold instantiation, while keeping the standard-number hot path unchanged. That seems to address both concerns: no runtime cost for default users, and a bounded maintenance surface for users who actually need language/config literal parsing. |
|
Look. I understand you have good intentions, and that you have put in some work into this, and I do invite contributions... But you need to tell us what the motivation is. My view is that having code in the library that nobody uses is a net negative. It makes the library larger, it increases the maintenance burden. So who is the consumer here ? Let us look at the PR: Firstly, I am measuring a performance degradation following this PR in our benchmarks using GCC 14 and an x64 processor. I am sure we can alleviate this with some engineering effort. Secondly why does it need to be in I have never encountered a case where I need to parse lots of floats with separators. I know that some programming languages allow it in code, but you rarely have lots and lots of floats to parse in code... and the compilers already have their own number parsers... can you point at one that would adopt |
|
Thanks for taking the time to lay this out, I think your core position is right, and I want to agree with it up front: carrying code that no clear consumer needs is a net negative, and the maintenance/size cost is real regardless of how cheap the runtime path is. So I won’t argue that this must go in.
Given that, I’m completely fine with you closing this at your discretion. The work wasn’t wasted, the perf-neutral approach was a worthwhile exercise, and the branch will stay available if a concrete consumer ever shows up. Thanks for the thoughtful review either way |
Add optional support in
from_chars_advancedto:')0x/0X,0b/0B) before parsing (decimal-only; no hex/binary floats)Resolves #124
no measurable regression on standard inputs (canada.txt / mesh.txt).