NormalizedURL#to_s returns punycode, other instance methods does not #89
Comments
I think we need to do #68 before this can be solved, so we can check whether the strings that |
Hmm, why? |
def trd
  url_trd = public_suffix_domain.trd.to_s
  url_trd = convert_to_ascii(url_trd) if normalized?
  url_trd
end |
I see, but that's not how we are going to do it :) I think we should parse/convert only one time (when
def trd
  @normalized_trd || @trd
end
I think we should expand #71 to include not only host, but all parts that can exist in the two formats |
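The "convert only one time" idea can be sketched roughly as below. This is not the twingly-url implementation; `HostPart`, its keyword arguments and the `to_ascii` callable are hypothetical names made up for illustration, and the converter here is a stand-in rather than a real punycode library.

```ruby
# Hypothetical sketch: each host part stores both of its forms, and the
# ASCII conversion runs once in the constructor instead of in every reader.
class HostPart
  attr_reader :original, :normalized

  # `to_ascii` is an assumed callable (for example a punycode converter).
  # It is only invoked here, never from the reader methods below.
  def initialize(original, normalize: false, to_ascii: ->(s) { s })
    @original = original
    @normalized = normalize ? to_ascii.call(original) : nil
  end

  # Same shape as `@normalized_trd || @trd` above: prefer the normalized
  # form when it exists, otherwise fall back to the original.
  def to_s
    @normalized || @original
  end
end

# Stand-in converter that only knows one suffix and counts its calls.
calls = 0
fake_to_ascii = lambda do |s|
  calls += 1
  s == "한국" ? "xn--3e0b707e" : s
end

tld = HostPart.new("한국", normalize: true, to_ascii: fake_to_ascii)
3.times { tld.to_s } # reading repeatedly does not convert again
puts tld.to_s
puts calls
```

With this shape, every reader (`trd`, `sld`, `tld`, `host`) can use the same one-line fallback, which is what makes expanding #71 to all parts attractive.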
Ah, yes, that's a much better idea. Just ignore my comments then 😄 |
Sorry, talked a little bit with @walro, he thinks we should make the
def normalized
  normalized_url = addressable_uri.dup
  normalized_url.scheme = normalized_scheme
  normalized_url.host = normalized_host
  normalized_url.path = normalized_path
  self.class.parse(normalized_url)
end
Above we have only thought of addressable, I think addressable when it comes to TLDs doesn't do what we expect it to do... you can experiment with this if you want, or we can take a look together later. Gotta go now :) |
Just realized that this doesn't just have to do with normalized URLs. How do we want to handle this case:
[1] pry(main)> u = Twingly::URL.parse("http://teståäö.xn--3e0b707e")
=> #<Twingly::URL:0x3fdd006c6b10 http://teståäö.xn--3e0b707e>
[2] pry(main)> u.to_s
=> "http://teståäö.xn--3e0b707e"
[3] pry(main)> u.tld
=> "한국"
I think the most logical thing would be to return the |
How will we know what the original tld looks like if we can only extract the tld by using public_suffix, which only works with non-punycoded domains (because the suffix list doesn't contain punycoded domains)? I think I give up on this for today 😄 |
Maybe we need to create the reverse public suffix list
|
On the plane ride home I played around with twingly-url and public suffix 2, and I have the reverse list now :) |
public_suffix has an add method that you can use to insert new rules into the list. Maybe that can be used somehow. I'll have to try and see if I can make something work :) |
Yes, that's what I used |
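For anyone following along, the reverse-list idea could look something like the sketch below. It is only an illustration under a few assumptions: `ReverseSuffixList` is a hypothetical module name, the encoder is a hand-rolled version of the RFC 3492 Punycode algorithm rather than a call into Addressable or public_suffix, the input suffixes are hand-picked instead of read from the real suffix list, and no case folding or other IDNA preparation is done.

```ruby
# Hypothetical sketch of a "reverse" suffix list: punycoded suffix => Unicode suffix.
module ReverseSuffixList
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  # Bias adaptation from RFC 3492, section 6.1.
  def self.adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Digit values 0..25 map to a..z and 26..35 map to 0..9.
  def self.digit(value)
    (value < 26 ? 97 + value : 22 + value).chr
  end

  # Punycode-encodes one label, without the "xn--" prefix.
  def self.encode(label)
    cps = label.codepoints
    output = cps.select { |c| c < 128 }.pack("U*")
    basic = handled = output.length
    output << "-" if basic > 0
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS
    while handled < cps.length
      m = cps.select { |c| c >= n }.min
      delta += (m - n) * (handled + 1)
      n = m
      cps.each do |c|
        delta += 1 if c < n
        next unless c == n
        q = delta
        k = BASE
        loop do
          t = if k <= bias then TMIN
              elsif k >= bias + TMAX then TMAX
              else k - bias
              end
          break if q < t
          output << digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << digit(q)
        bias = adapt(delta, handled + 1, handled == basic)
        delta = 0
        handled += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # ASCII-only labels pass through unchanged; others get the ACE prefix.
  def self.to_ascii_label(label)
    label.ascii_only? ? label : "xn--" + encode(label)
  end

  # Builds the reverse map, keeping only suffixes whose two forms differ.
  def self.build(unicode_suffixes)
    unicode_suffixes.each_with_object({}) do |suffix, map|
      ascii = suffix.split(".").map { |l| to_ascii_label(l) }.join(".")
      map[ascii] = suffix unless ascii == suffix
    end
  end
end

reverse = ReverseSuffixList.build(["한국", "com"])
puts reverse.inspect
```

A lookup in such a map would let a punycoded TLD be translated back to the form the suffix list knows about, or the other way around, which is the piece that was missing in the `u.tld` example earlier in the thread.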
This is the case with the changes made in #90 – we no longer pass Addressable If we merge #90, we can close this issue by adding more comprehensive tests for our normalize methods, e.g. normalizing some different types of URLs and expecting correct output for each part and the whole URL. |
Calling #to_s or #host on a normalized URL returns a punycoded string.
Calling #domain, #tld, #sld etc. on the normalized URL returns the non-punycoded version of the URL. What those methods have in common is that they all use public_suffix.