davishmcclurg
Contributor

My implementation doesn't yet support all idn-hostname label separators properly, but (for multiple reasons) it is still passing the tests added here: #760

These are the tests I came up with while fixing/reproducing the issues.

These are mostly meant to test that the additional `idn-hostname` label
separators are treated like `.` and are only allowed between labels.

Regular `hostname` should not support the extended label separators used
in `idn-hostname`.

This tests that the extended label separators used in `idn-hostname`
properly validate individual labels and don't treat the whole instance
as a single label.
@davishmcclurg davishmcclurg requested a review from a team as a code owner June 16, 2025 00:20
@karenetheridge
Member

A lot of these fail in my implementation (they show as valid where the test says it's invalid), but that likely just means I need to mark these as TODO, due to a gap in the library I'm using.

@davishmcclurg
Contributor Author

Looking at the IDNA specifications some more, it's unclear if the extended set of label separators is included in RFC 5890. The only reference I can find is in an appendix of RFC 5891 ("Summary of Major Changes from IDNA2003" (RFC 3490)):

> Remove the dot separator from the mandatory part of the protocol.

Not sure what that means.

They're in RFC 3490 (as noted in #760), but I don't know how relevant that is since JSON schema uses 5890 for idn-hostname.

They're also included in Unicode Technical Standard #46 (UTS #46), which references both IDNA versions. 🤷

Leading/trailing separators are similarly confusing.
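
For reference, here is a minimal sketch of what "treating the extended separators like `.`" could look like. The four code points are the ones listed in RFC 3490, section 3.1; whether they apply under RFC 5890 is exactly the open question above. The name `split_labels` is hypothetical, and the sketch does not decide whether a leading or trailing separator is acceptable; it only makes the resulting empty labels visible.

```python
import re

# Label separators from RFC 3490, section 3.1 (IDNA2003):
# U+002E full stop, U+3002 ideographic full stop,
# U+FF0E fullwidth full stop, U+FF61 halfwidth ideographic full stop.
SEPARATOR_PATTERN = re.compile("[\u002e\u3002\uff0e\uff61]")


def split_labels(hostname):
    """Split a hostname into labels on any of the four separators.

    Hypothetical helper for illustration only: it performs no IDNA
    mapping or validation, it just splits.
    """
    return SEPARATOR_PATTERN.split(hostname)


# An extended separator between labels behaves like ".".
print(split_labels("a\u3002b.c"))

# A leading, trailing, or doubled separator shows up as an empty label.
print(split_labels("\uff0ea"))
print(split_labels("a\uff61\uff61b"))
```

A validator that adopts these separators would then check each label on its own, and could reject any empty label, rather than validating the whole string as a single label.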

Member

@jdesrosiers jdesrosiers left a comment

I've been looking into IDN validation requirements myself recently and I came across this PR. I'm sure I saw it before, but I wasn't in a place to understand it at the time.

Good call on adding tests for empty labels. That's definitely a gap we want to fill.

I hadn't noticed that the separator tests reference the wrong specification. I'm not sure either if the extended set of separators applies to IDNA2008. I'm ok with leaving them there and even adding to them until we can find some confirmation that it's not correct. It appears to be correct according to UTS #46 which is what most implementations are probably using anyway.

Comment on lines +427 to +431
{
"description": "dot separator with label that is too long when separator is ignored",
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com",
"valid": true
},
Member

I don't think these tests are right. Label length limits apply to the A-label length, not the U-label length. When I convert to ASCII, I get `xn--hxaaaaaa3ababababababwdddddeeeeeegfffff2jggggg9ghhhhh3miiiiiljjjjj.com`. The first label is 70 characters, which is already over the 63-character limit regardless of the separator.

I think this captures the intention of the test.

Suggested change
{
"description": "dot separator with label that is too long when separator is ignored",
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com",
"valid": true
},
{
"description": "dot separator with label that is too long when separator is ignored",
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπα.com",
"valid": true
},

That way the label length is 60 (valid) if the dot is recognized as a separator and 64 (too long) if the dot is not recognized as a separator.
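
To make the A-label versus U-label distinction easy to check, here is a rough sketch using Python's built-in `punycode` codec. The helper names `to_a_label` and `label_too_long` are hypothetical, and the sketch assumes the label is already lowercase and normalized: it skips all IDNA mapping and validation and only Punycode-encodes the label and adds the `xn--` prefix.

```python
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_a_label(u_label):
    """Return the ASCII (A-label) form of a single label.

    Hypothetical helper: no IDNA mapping or validation is performed.
    ASCII-only labels are returned unchanged.
    """
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def label_too_long(u_label):
    """Check the length limit against the A-label, not the U-label."""
    return len(to_a_label(u_label)) > MAX_LABEL_LENGTH


long_label = "παράδειγμα" * 6

# The U-label is 60 code points, which looks fine on its own ...
print(len(long_label))

# ... but the A-label is what counts, and it is over the limit.
print(len(to_a_label(long_label)), label_too_long(long_label))
```

Running candidate test data through something like this before adding it helps confirm that a label sits on the intended side of the limit once it is converted to ASCII.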

Comment on lines +447 to +451
{
"description": "dot separator with label that is too long when separator is respected",
"data": "παράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμαπαράδειγμα.com",
"valid": false
},
Member

I don't think these tests have anything to do with separators. If the label is too long, it's going to fail regardless of the separator. I think these just amount to duplicates of label length tests that we already have.

3 participants