<regex>: wregex with regular expression [\w\s] fails to match some spaces #5243

@muellerj2

Description

With wregex, the regular expression [\w\s] fails to match whitespace characters with code points > 255, even though [\s] alone matches them.

Test case

#include <iostream>
#include <regex>

using namespace std;

int main() {
    const wregex re1(LR"([\s])");
    const wregex re2(LR"([\w\s])");
    cout << R"(U+0020 SPACE is matched by "[\s]": )" << regex_match(L" ", re1) << '\n';
    cout << R"(U+0020 SPACE is matched by "[\w\s]": )" << regex_match(L" ", re2) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\s]": )" << regex_match(L"\u2028", re1) << '\n';
    cout << R"(U+2028 LINE SEPARATOR is matched by "[\w\s]": )" << regex_match(L"\u2028", re2) << '\n';
}

https://godbolt.org/z/oEdTs3Th4

This prints:

U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 0

Expected result

This should print:

U+0020 SPACE is matched by "[\s]": 1
U+0020 SPACE is matched by "[\w\s]": 1
U+2028 LINE SEPARATOR is matched by "[\s]": 1
U+2028 LINE SEPARATOR is matched by "[\w\s]": 1
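
The expectation above amounts to saying that [\w\s] should match every character that [\s] matches. As a rough sketch only, the check can be generalized to an arbitrary list of candidate characters. The helper names `matches_char` and `find_superset_violations` are hypothetical and not part of `<regex>`, and which characters count as whitespace depends on the implementation and locale:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of <regex>): does `re` match the
// one-character string consisting of `ch`?
inline bool matches_char(const std::wregex& re, wchar_t ch) {
    return std::regex_match(std::wstring(1, ch), re);
}

// Hypothetical helper: returns every candidate that is matched by [\s]
// but not by [\w\s]. An empty result means no discrepancy was observed
// for the given candidates.
inline std::vector<wchar_t> find_superset_violations(const std::vector<wchar_t>& candidates) {
    const std::wregex space_only(LR"([\s])");
    const std::wregex word_or_space(LR"([\w\s])");

    std::vector<wchar_t> violations;
    for (const wchar_t ch : candidates) {
        if (matches_char(space_only, ch) && !matches_char(word_or_space, ch)) {
            violations.push_back(ch);
        }
    }
    return violations;
}
```

Calling `find_superset_violations` from `main` with candidates such as `L' '` and `L'\u2028'` would be expected to report U+2028 on an affected implementation, and nothing once the issue is fixed.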

Additional remarks

The underlying cause is #5242. Although I consider fixing #5242 to be ABI-breaking, I think this issue can be fixed without breaking ABI.

Metadata

Labels: bug, fixed, regex
