muellerj2
Contributor

Towards #5468. This replaces the weird usage of _Cmp_chrange() by a straightforward call to std::search() in _Matcher2::_Skip().
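As a rough illustration of the idea (not the actual `_Matcher2::_Skip()` code), a skip heuristic for a pattern that starts with a literal string can locate the next candidate position with a single `std::search()` call. The helper names `skip_to_literal` and `skip_to_literal_icase` are hypothetical, and the ASCII-only case folding is an assumption made for brevity; the real implementation goes through the regex traits.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not the STL's internal code: returns the first
// position in [first, last) where the literal [lit_first, lit_last) occurs,
// or last if it does not occur at all.
template <class BidIt, class FwdIt>
BidIt skip_to_literal(BidIt first, BidIt last, FwdIt lit_first, FwdIt lit_last) {
    return std::search(first, last, lit_first, lit_last);
}

// Assumed case-insensitive variant: std::search() also accepts a binary
// predicate. Only ASCII folding via std::tolower is shown here.
template <class BidIt, class FwdIt>
BidIt skip_to_literal_icase(BidIt first, BidIt last, FwdIt lit_first, FwdIt lit_last) {
    return std::search(first, last, lit_first, lit_last, [](char lhs, char rhs) {
        return std::tolower(static_cast<unsigned char>(lhs))
            == std::tolower(static_cast<unsigned char>(rhs));
    });
}
```

A matcher would then attempt a full match only at the returned position instead of at every character of the input.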

I also fix a copy-paste mistake in the comment describing _Cmp_icase_translateleft.

I also tried replacing _Cmp_chrange()'s implementation with a straightforward call to std::mismatch(), but in practice that seems to be a pessimization of about 10% (probably because the strings tend to be quite short).
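For context, the two styles being compared look roughly like the sketch below. The function names and the contract (return the end of the matched text if the whole pattern matches at `first`, otherwise return `first` unchanged) are assumptions for illustration only and are not taken from the actual `_Cmp_chrange()` source.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hand-written loop (hypothetical): compare the pattern against the input
// starting at first; give up as soon as a character differs or input ends.
template <class It1, class It2>
It1 match_prefix_loop(It1 first, It1 last, It2 pat_first, It2 pat_last) {
    It1 cur = first;
    for (; pat_first != pat_last; ++cur, ++pat_first) {
        if (cur == last || *cur != *pat_first) {
            return first; // mismatch or input exhausted: no progress
        }
    }
    return cur; // whole pattern matched
}

// Same assumed contract expressed with the four-iterator std::mismatch()
// overload (C++14), which stops at the end of the shorter range.
template <class It1, class It2>
It1 match_prefix_mismatch(It1 first, It1 last, It2 pat_first, It2 pat_last) {
    const auto result = std::mismatch(first, last, pat_first, pat_last);
    return result.second == pat_last ? result.first : first;
}
```

Both versions do the same work per character; for very short patterns the fixed overhead of the library call can plausibly outweigh any gain, which would be consistent with the slowdown reported above.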

There will still be one follow-up to make an obvious improvement to the skip heuristic for regex and wregex in collate mode. Otherwise, I think this is basically it for simple improvements to the skip heuristic. A few opportunities for further improvement remain -- handling several branches, not walking the NFA on each _Skip() call, or not re-comparing in _Match_pat() the NFA nodes already matched by _Skip() -- but they are not straightforward to implement.

Benchmark

| benchmark | before | after | speedup |
|---|---|---|---|
| `bm_lorem_search/"^bibe"/2` | 28.2506 | 28.8783 | 0.98 |
| `bm_lorem_search/"^bibe"/3` | 27.6228 | 29.82 | 0.93 |
| `bm_lorem_search/"^bibe"/4` | 28.8783 | 32.4707 | 0.89 |
| `bm_lorem_search/"bibe"/2` | 43492.7 | 2622.77 | 16.58 |
| `bm_lorem_search/"bibe"/3` | 90680.8 | 5000 | 18.14 |
| `bm_lorem_search/"bibe"/4` | 172631 | 9626.07 | 17.93 |
| `bm_lorem_search/"(bibe)"/2` | 47538.5 | 4296.88 | 11.06 |
| `bm_lorem_search/"(bibe)"/3` | 92071.8 | 8370.5 | 11.00 |
| `bm_lorem_search/"(bibe)"/4` | 181370 | 15485.6 | 11.71 |
| `bm_lorem_search/"(bibe)+"/2` | 64062.5 | 10253.9 | 6.25 |
| `bm_lorem_search/"(bibe)+"/3` | 153460 | 20856.3 | 7.36 |
| `bm_lorem_search/"(bibe)+"/4` | 249062 | 40108.8 | 6.21 |
| `bm_lorem_search/"(?:bibe)+"/2` | 49178 | 4603.8 | 10.68 |
| `bm_lorem_search/"(?:bibe)+"/3` | 94164.3 | 8998.29 | 10.46 |
| `bm_lorem_search/"(?:bibe)+"/4` | 188354 | 17578.3 | 10.72 |
| `bm_lorem_search/R"(\bbibe)"/2` | 96256.9 | 89979.2 | 1.07 |
| `bm_lorem_search/R"(\bbibe)"/3` | 194972 | 188354 | 1.04 |
| `bm_lorem_search/R"(\bbibe)"/4` | 374930 | 368369 | 1.02 |
| `bm_lorem_search/R"(\Bibe)"/2` | 235395 | 222178 | 1.06 |
| `bm_lorem_search/R"(\Bibe)"/3` | 404531 | 461498 | 0.88 |
| `bm_lorem_search/R"(\Bibe)"/4` | 941265 | 983099 | 0.96 |
| `bm_lorem_search/R"((?=....)bibe)"/2` | 48131.7 | 3138.95 | 15.33 |
| `bm_lorem_search/R"((?=....)bibe)"/3` | 96256.9 | 6277.9 | 15.33 |
| `bm_lorem_search/R"((?=....)bibe)"/4` | 179983 | 12207 | 14.74 |
| `bm_lorem_search/R"((?=bibe)....)"/2` | 44327.8 | 2915.74 | 15.20 |
| `bm_lorem_search/R"((?=bibe)....)"/3` | 87886.7 | 5580.36 | 15.75 |
| `bm_lorem_search/R"((?=bibe)....)"/4` | 179983 | 10986.3 | 16.38 |
| `bm_lorem_search/R"((?!lorem)bibe)"/2` | 45515.6 | 2999.44 | 15.17 |
| `bm_lorem_search/R"((?!lorem)bibe)"/3` | 92071.8 | 5859.38 | 15.71 |
| `bm_lorem_search/R"((?!lorem)bibe)"/4` | 188354 | 11160.7 | 16.88 |

Note that this means that all improvements since #5509 have sped up searching for the regex (bibe)+ by a factor of about 450 and for (?:bibe)+ by a factor of about 1000 in this benchmark.

@muellerj2 muellerj2 requested a review from a team as a code owner June 14, 2025 14:01
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Jun 14, 2025
@StephanTLavavej StephanTLavavej self-assigned this Jun 14, 2025
@StephanTLavavej StephanTLavavej added the labels performance (Must go faster) and regex (meow is a substring of homeowner) Jun 14, 2025
@StephanTLavavej StephanTLavavej removed their assignment Aug 2, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews Aug 2, 2025
@StephanTLavavej
Member

Thanks, looks great! 😻

@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews Aug 7, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit c1ce930 into microsoft:main Aug 8, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews Aug 8, 2025
@StephanTLavavej
Member

Thanks again for figuring out how to make this faster! 🚗 🦖 🎉
