-
-
Notifications
You must be signed in to change notification settings - Fork 224
More hostname label separator tests #776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
More hostname label separator tests #776
Conversation
These are mostly meant to test that the additional `idn-hostname` label separators are treated like `.` and are only allowed between labels.
Regular `hostname` should not support the extended label separators used in `idn-hostname`.
This tests that the extended label separators used in `idn-hostname` properly validate individual labels and don't treat the whole instance as a single label.
A lot of these fail in my implementation (they show as valid where the test says it's invalid), but that likely just means I need to mark these as TODO, due to a gap in the library I'm using. |
Looking at the IDNA specifications some more, it's unclear if the extended set of label separators is included in RFC 5890. The only reference I can find is in an appendix of RFC 5891 ("Summary of Major Changes from IDNA2003" (RFC 3490)):
Not sure what that means. They're in RFC 3490 (as noted in #760), but I don't know how relevant that is since JSON schema uses 5890 for They're also included in Unicode technical standard 46, which references both IDNA versions. 🤷 Leading/trailing separators are similarly confusing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been looking into IDN validation requirements myself recently and I came across this PR. I'm sure I saw it before, wasn't in a place to understand it at the time.
Good call on adding tests for empty labels. That's definitely a gap we want to fill.
I hadn't noticed that the separator tests reference the wrong specification. I'm not sure either if the extended set of separators applies to IDNA2008. I'm ok with leaving them there and even adding to them until we can find some confirmation that it's not correct. It appears to be correct according to UTS #46
which is what most implementations are probably using anyway.
{ | ||
"description": "dot separator with label that is too long when separator is ignored", | ||
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com", | ||
"valid": true | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these tests are right. Label length limits apply to the A-label length, not the U-label length. When I convert to ASCII I get, xn--hxaaaaaa3ababababababwdddddeeeeeegfffff2jggggg9ghhhhh3miiiiiljjjjj.com
. The first label is 70 characters, which is already too long regardless of the separator.
I think this captures the intention of the test.
{ | |
"description": "dot separator with label that is too long when separator is ignored", | |
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com", | |
"valid": true | |
}, | |
{ | |
"description": "dot separator with label that is too long when separator is ignored", | |
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπα.com", | |
"valid": true | |
}, |
That way the label length is 60 (valid) if the dot is recognized as a separator and 64 (too long) if the dot is not recognized as a separator.
{ | ||
"description": "dot separator with label that is too long when separator is respected", | ||
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com", | ||
"valid": false | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these tests have anything to do with separators. If the label is too long, it's going to fail regardless of the separator. I think these just amount to duplicates of label length tests that we already have.
My implementation doesn't yet support all
idn-hostname
label separators properly, but (for multiple reasons) it is still passing the tests added here: #760These are the tests I came up with while fixing/reproducing the issues.