muellerj2 (Contributor) commented May 1, 2025

Fixes #5452. This is the minimal fix in _Matcher::_Skip that I mentioned in the issue: it avoids worst-case running time that is quadratic in the length of the searched string by giving up on _N_if nodes with two or more branches.
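
A minimal sketch of why giving up can help, under stated assumptions: this is a toy model, not the STL's _Matcher::_Skip. It assumes the pre-fix lookahead scanned ahead separately for every branch of a multi-branch node and used the earliest hit. `Pattern`, `skip_min_over_branches`, `skip_give_up` and `search_pattern` are hypothetical names. The counter shows how a branch that never matches rescans the rest of the text after every failed candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy pattern: a leading alternation of single characters
// (a stand-in for an _N_if node with several branches) followed by one
// required character. For example, {{'z', 'a'}, 'b'} models (z|a)b.
struct Pattern {
    std::vector<char> alternatives; // first character of each branch
    char follow;                    // character that must come next
};

// Assumed "before" behavior: scan ahead separately for every branch and
// return the earliest hit. A branch with no hit scans to the end of the text.
std::size_t skip_min_over_branches(const std::string& text, std::size_t pos,
                                   const Pattern& pat, std::size_t& inspected) {
    std::size_t best = text.size();
    for (char c : pat.alternatives) {
        std::size_t i = pos;
        while (i < text.size()) {
            ++inspected; // count every character the skip step looks at
            if (text[i] == c) break;
            ++i;
        }
        if (i < best) best = i;
    }
    return best;
}

// "After" behavior: give up (no skipping) unless there is exactly one branch.
std::size_t skip_give_up(const std::string& text, std::size_t pos,
                         const Pattern& pat, std::size_t& inspected) {
    if (pat.alternatives.size() != 1) return pos;
    std::size_t i = pos;
    while (i < text.size()) {
        ++inspected;
        if (text[i] == pat.alternatives[0]) break;
        ++i;
    }
    return i;
}

// Simplified search loop: skip to a candidate, try a match there, and on
// failure advance by one position. Returns text.size() if nothing matches.
template <class Skip>
std::size_t search_pattern(const std::string& text, const Pattern& pat,
                           Skip skip, std::size_t& inspected) {
    std::size_t pos = 0;
    while (pos < text.size()) {
        pos = skip(text, pos, pat, inspected);
        if (pos >= text.size()) break;
        bool first_ok = false;
        for (char c : pat.alternatives) first_ok = first_ok || text[pos] == c;
        if (first_ok && pos + 1 < text.size() && text[pos + 1] == pat.follow) return pos;
        ++pos; // failed attempt: the next iteration calls skip again
    }
    return text.size();
}
```

In this model, searching n copies of 'a' for (z|a)b makes the first skip inspect n(n+1)/2 + n characters in total (501500 for n = 1000), because the scan for 'z' restarts at every failed candidate; the give-up version inspects none and leaves the work to the linear candidate loop.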

Unfortunately, we can't simply give up whenever an _N_if node is encountered, because the parser currently tends to generate such nodes with only a single branch. In particular, there is one such node at the beginning of every generated NFA, so giving up on all _N_if nodes would render _Skip pointless for every regular expression.
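
To illustrate why only multi-branch nodes are abandoned, here is a hypothetical, heavily simplified node model; `Node`, `find_literal`, `skip_every_if` and `skip_multi_branch_only` are illustrative names, not the STL's internals. It assumes, as described above, that the NFA begins with a single-branch if node wrapping the real first node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical NFA node: either a literal character or an "if" node holding
// alternative branches (a loose stand-in for _N_if).
struct Node {
    enum class Kind { Literal, If } kind;
    char ch = '\0';                    // used when kind == Literal
    std::vector<const Node*> branches; // used when kind == If
};

// Position of the next occurrence of c at or after pos, or text.size().
std::size_t find_literal(const std::string& text, std::size_t pos, char c) {
    std::size_t i = text.find(c, pos);
    return i == std::string::npos ? text.size() : i;
}

// Too aggressive: give up on every if node. With a single-branch if node
// at the root of every NFA, this never skips anything.
std::size_t skip_every_if(const std::string& text, std::size_t pos, const Node& n) {
    if (n.kind == Node::Kind::If) return pos;
    return find_literal(text, pos, n.ch);
}

// Mirrors the rule described above: descend through single-branch if nodes
// and give up only when there are two or more branches (or none).
std::size_t skip_multi_branch_only(const std::string& text, std::size_t pos, const Node& n) {
    if (n.kind == Node::Kind::If) {
        if (n.branches.size() != 1) return pos;
        return skip_multi_branch_only(text, pos, *n.branches[0]);
    }
    return find_literal(text, pos, n.ch);
}
```

For a root modeling a leading single-branch node around the literal 'x', `skip_every_if` on "....x" returns 0 and skips nothing, while `skip_multi_branch_only` descends into the branch and returns 4.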

I confirmed locally that this change is sufficient to fix the quadratic running time exposed by #5452's test case (though the quantifiers still almost double the running time).

…ts with `?` quantifier or several alternatives
muellerj2 requested a review from a team as a code owner May 1, 2025 18:09
github-project-automation bot moved this to Initial Review in STL Code Reviews May 1, 2025
StephanTLavavej added the labels performance (Must go faster) and regex (meow is a substring of homeowner) May 1, 2025
StephanTLavavej self-assigned this May 1, 2025
StephanTLavavej removed their assignment May 3, 2025
StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 3, 2025
StephanTLavavej (Member) commented:

Locally verified too. Given the very clear comment, and the difficulty of verifying this in automated testing without burning a bunch of CPU time, I'm okay with not having test coverage for this.

StephanTLavavej (Member) commented:

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej added a commit to StephanTLavavej/STL that referenced this pull request May 9, 2025
StephanTLavavej merged commit 5f5e5ea into microsoft:main May 10, 2025
39 checks passed
github-project-automation bot moved this from Merging to Done in STL Code Reviews May 10, 2025
StephanTLavavej (Member) commented:

O(Thanks²) 😹 🚀 🎉

muellerj2 deleted the regex-matcher-skip-fix-quadratic-worst-case branch May 31, 2025 21:37
Labels: performance (Must go faster), regex (meow is a substring of homeowner)
Projects: Archived in project
Development: successfully merging this pull request may close these issues:
<regex>: Nonlinear slowdown with increasing string length
3 participants