`<regex>`: `regex_traits::transform_primary` should yield primary sort keys appropriate for the imbued locale #5444
Thanks! 😻 I pushed some fixes, the most significant being
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
As foretold by the ancient prophecy, I had to push a commit to fix
I resolved a trivial adjacent-add conflict with #5437 in
Thanks for implementing this new LWG issue and fixing this ancient bug! 🐈⬛ 💚 🎉
Fixes #5435. Fixes #5291.
The actual work is done in two new functions, `__std_regex_transform_primary_char/wchar_t`, which are basically 1:1 copies of `_Strxfrm()` and `_Wcsxfrm()` but pass different flags to `__crtLCMapStringA/W`. I also took the liberty of correcting the SAL annotations.

`__crtLCMapStringA/W` are declared in `awint.hpp`, which includes `yvals.h`. I'm uncertain whether this is the best approach, but I undefined `_ENFORCE_ONLY_CORE_HEADERS` so that `awint.hpp` can be included.

`transform_primary` has to check the types of the collate facets using RTTI, so I made the function always return an empty string when dynamic RTTI is disabled (`_CPPRTTI` is undefined). The implementation itself is heavily based on `collate::do_transform` (including the change in #5431). It also needs access to the internals of `collate`, so I made `_Regex_traits` a friend of it.

There is a behavior change for the C locale: As I explained in more detail in #5435, the traits requirement in [re.req]/20 is actually misleading, since it is wrong for precisely one locale: the C locale (or the POSIX locale, see the collation order definition here: https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02_06). Since the equivalence classes are derived from POSIX and the definition of `regex_traits::transform_primary` also alludes to "primary sort keys", which indirectly references terminology from the POSIX standard (https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_02), I think we should do as POSIX says: "A" should not match `[[=a=]]`.

This has consequences:

- In `<regex>`: Properly parse and match collating symbols and equivalences #5392, I assumed [re.req]/20, so I didn't add any character translation using `translate` and `translate_nocase` when parsing equivalences. Now we have to add such logic in `_Parser::_Do_ex_class2` to handle potentially case-sensitive sort keys when case-insensitive regexes are used (else "A" would even fail to match `[[=A=]]`).
- Since matching and parsing of equivalences no longer go through `collate::transform`, related tests no longer have to be skipped under IDL mismatch.
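
For readers who want to try the C locale behavior change themselves, here is a minimal sketch. The helper name `matches_equivalence_of_a` is made up for illustration and is not part of this PR. It matches a string against `[[=a=]]` with the default `std::regex_traits<char>` and whatever global locale is active. What it returns for `"A"` depends on the standard library implementation and on whether this change is present, so no expected result is given for that input.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for experimentation only (not part of the PR).
// Returns true if the whole of `text` matches the equivalence class of 'a'
// under the default regex traits and the current global locale.
bool matches_equivalence_of_a(const std::string& text) {
    const std::regex re("[[=a=]]");
    return std::regex_match(text, re);
}
```

Calling `matches_equivalence_of_a("a")` is expected to succeed everywhere, while `matches_equivalence_of_a("A")` is the case whose result this PR changes for the C locale.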
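
The RTTI-based facet check mentioned in the description could look roughly like the following. This is only a sketch of the idea, not the code in this PR: `has_standard_collate_facet` and `custom_collate` are hypothetical names, and the assumption that both `collate` and `collate_byname` count as "known" facet types is mine. The `#else` branch mirrors the described fallback, where nothing can be inspected without RTTI.

```cpp
#include <locale>
#include <typeinfo>

// Hypothetical helper (not the STL's internal code): reports whether the
// collate facet installed in `loc` is exactly one of the standard facet
// types, so that the format of its sort keys could be assumed to be known.
template <class Elem>
bool has_standard_collate_facet(const std::locale& loc) {
#if defined(__cpp_rtti) || defined(_CPPRTTI)
    const std::collate<Elem>& facet = std::use_facet<std::collate<Elem>>(loc);
    return typeid(facet) == typeid(std::collate<Elem>)
        || typeid(facet) == typeid(std::collate_byname<Elem>);
#else
    (void) loc;
    return false; // without RTTI the dynamic facet type cannot be inspected
#endif
}

// A user-defined facet whose sort key format is unknown to the library.
struct custom_collate : std::collate<char> {};
```

A locale built as `std::locale(std::locale::classic(), new custom_collate)` fails this check, which corresponds to the case where `transform_primary` would return an empty string.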
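
To illustrate why translation is needed once primary sort keys may be case-sensitive, the sketch below translates a character before computing its key. `equivalence_key` is a hypothetical helper, not the logic in `_Parser::_Do_ex_class2`. It only uses the public traits members `translate`, `translate_nocase`, and `transform_primary`. Choosing `translate` whenever `icase` is false is a simplification of how the regex flags are actually honored.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: computes the key used to compare a character against
// an equivalence class. The character is translated first, so that a
// case-insensitive regex compares the keys of case-folded characters.
template <class Traits>
typename Traits::string_type equivalence_key(
    const Traits& traits, typename Traits::char_type ch, bool icase) {
    const typename Traits::char_type translated =
        icase ? traits.translate_nocase(ch) : traits.translate(ch);
    return traits.transform_primary(&translated, &translated + 1);
}
```

With `std::regex_traits<char>`, `equivalence_key(traits, 'A', true)` and `equivalence_key(traits, 'a', true)` yield the same key, because both characters are folded to the same character before the key is computed.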