NormalizedURL#to_s returns punycode, other instance methods does not #89

roback · 2016-09-06T08:03:20Z

Calling #.to_s or #host on a normalized URL returns a punycoded string.

Calling #domain, #tld, #sld etc. on the normalized url returns the non-punycoded version of the URL. What those methods have in common is that they all uses public_suffix.

[3] pry(main)> url = Twingly::URL.parse("http://test.भारत")
=> #<Twingly::URL:0x3fe40956dc7c http://test.भारत>
[4] pry(main)> normalized_url = url.normalized
=> #<Twingly::URL:0x3fe40acdbfa8 http://www.test.xn--h2brj9c/>
[5] pry(main)> url.to_s
=> "http://test.भारत"
[6] pry(main)> normalized_url.to_s
=> "http://www.test.xn--h2brj9c/"
[7] pry(main)> normalized_url.tld
=> "भारत"
[8] pry(main)> normalized_url.domain
=> "test.भारत"

The text was updated successfully, but these errors were encountered:

roback · 2016-09-06T15:07:09Z

I think we need to do #68 before this can be solved, so we can check whether the strings that public_suffix_domain returns (in these methods) should be converted to ascii.

dentarg · 2016-09-06T15:21:39Z

Hmm, why?

roback · 2016-09-07T06:23:28Z

Hmm, why?

    def trd
      url_trd = public_suffix_domain.trd.to_s
      url_trd = convert_to_ascii(url_trd) if normalized?
      url_trd
    end

dentarg · 2016-09-07T07:40:49Z

I see, but that's not how we are going to do it :) I think we should parse/convert only one time (when #normalized is called), save the values for later, and return them when they are asked for.

    def trd
      @normalized_trd || @trd
    end

I think we should expand #71 to include not only host, but all parts that can exist in the two formats

roback · 2016-09-07T07:44:24Z

I see, but that's not have we are going to do it :) I think we should parse/convert only one time (when #normalized is called), save the values for later, and return them when they are asked for.

Ah, yes, thats a much better idea. Just ignore my comments then 😄

dentarg · 2016-09-07T07:47:29Z

Sorry, talked a little bit with @walro, he thinks we should make the #normalized method work, not keeping both states of both non-normalized and normalized

    def normalized
      normalized_url = addressable_uri.dup

      normalized_url.scheme = normalized_scheme
      normalized_url.host   = normalized_host
      normalized_url.path   = normalized_path

      self.class.parse(normalized_url)
    end

Above we have only thought of addressable, I think addressable when it comes to TLDs doesn't do what we expect it to do... you can experiment with this if you want, or we can take a look together later. Gotta go now :)

roback · 2016-09-07T08:16:44Z

Just realized that this doesn't just have to do with normalized URLs. How do we want to do in this case:

[1] pry(main)> u = Twingly::URL.parse("http://teståäö.xn--3e0b707e")
=> #<Twingly::URL:0x3fdd006c6b10 http://teståäö.xn--3e0b707e>
[2] pry(main)> u.to_s
=> "http://teståäö.xn--3e0b707e"
[3] pry(main)> u.tld
=> "한국"

I think the most logical thing would be to return the tld in it's original form ("xn--3e0b707e", the same as .to_s returns).

roback · 2016-09-08T14:36:52Z

I think the most logical thing would be to return the tld in it's original form ("xn--3e0b707e", the same as .to_s returns).

How will we know what the original tld looks like if we can only extract the tld by using public_suffix, which only works with non-punycoded domains (because the suffix list doesn't contain punycoded domains)?

I think I give up on this for today 😄

dentarg · 2016-09-08T15:07:21Z

Maybe we need to create the reverse public suffix list

On 8 sep. 2016, at 16:36, Mattias Roback [email protected] wrote:

I think the most logical thing would be to return the tld in it's original form ("xn--3e0b707e", the same as .to_s returns).

How will we know what the original tld looks like if we can only extract the tld by using public_suffix, which only works with non-punycoded domains (because the suffix list doesn't contain punycoded domains)?

I think I give up on this for today 😄

—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.

dentarg · 2016-09-08T22:19:19Z

On the plane ride home I played around with twingly-url and public suffix 2, and I have the reverse list now :)

roback · 2016-09-09T06:47:13Z

Maybe we need to create the reverse public suffix list

public_suffix has an add method that you can use to insert new rules into the list. Maby that can be used somehow. I'll have to try and see if I can make something work :)

dentarg · 2016-09-09T08:15:27Z

Yes, that's what I used

Related to #89 and #71. Close #85.

dentarg · 2016-09-09T10:11:02Z

I think the most logical thing would be to return the tld in it's original form ("xn--3e0b707e", the same as .to_s returns).

This is the case with the changes made in #90 – we no longer pass Addressable #display_uri.host (which always converted punycoded names (xn--`)) to public suffix.

If we merge #90, we can close this issue by adding more comphrensive tests for our normalize methods, e.g. normalizing some different types of URLs and expecting correct output for each part and the whole URL.

Close #89.

roback added the bug label Sep 6, 2016

dentarg added a commit that referenced this issue Sep 9, 2016

Use PublicSuffix 2

cd38e55

Related to #89 and #71. Close #85.

dentarg mentioned this issue Sep 9, 2016

Ensure normalized IDNA domains return ASCII strings #90

Merged

dentarg added a commit that referenced this issue Sep 14, 2016

Test all relevant parts of normalized IDN

765db07

Close #89.

dentarg closed this as completed in #90 Sep 15, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NormalizedURL#to_s returns punycode, other instance methods does not #89

NormalizedURL#to_s returns punycode, other instance methods does not #89

roback commented Sep 6, 2016

roback commented Sep 6, 2016

dentarg commented Sep 6, 2016

roback commented Sep 7, 2016

dentarg commented Sep 7, 2016 •

edited

Loading

roback commented Sep 7, 2016

dentarg commented Sep 7, 2016

roback commented Sep 7, 2016

roback commented Sep 8, 2016

dentarg commented Sep 8, 2016

dentarg commented Sep 8, 2016

roback commented Sep 9, 2016 •

edited

Loading

dentarg commented Sep 9, 2016

dentarg commented Sep 9, 2016 •

edited

Loading

NormalizedURL#to_s returns punycode, other instance methods does not #89

NormalizedURL#to_s returns punycode, other instance methods does not #89

Comments

roback commented Sep 6, 2016

roback commented Sep 6, 2016

dentarg commented Sep 6, 2016

roback commented Sep 7, 2016

dentarg commented Sep 7, 2016 • edited Loading

roback commented Sep 7, 2016

dentarg commented Sep 7, 2016

roback commented Sep 7, 2016

roback commented Sep 8, 2016

dentarg commented Sep 8, 2016

dentarg commented Sep 8, 2016

roback commented Sep 9, 2016 • edited Loading

dentarg commented Sep 9, 2016

dentarg commented Sep 9, 2016 • edited Loading

dentarg commented Sep 7, 2016 •

edited

Loading

roback commented Sep 9, 2016 •

edited

Loading

dentarg commented Sep 9, 2016 •

edited

Loading