Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

domain, sld, trd (maybe others) can be nil #52

Closed
dentarg opened this issue Oct 26, 2015 · 15 comments · Fixed by #78
Closed

domain, sld, trd (maybe others) can be nil #52

dentarg opened this issue Oct 26, 2015 · 15 comments · Fixed by #78
Labels

Comments

@dentarg
Copy link
Collaborator

dentarg commented Oct 26, 2015

This breaks our applications.

[8] pry(main)> Twingly::URL.parse("https://.com/").domain
=> nil
[9] pry(main)> Twingly::URL.parse("https://.com/").sld
=> nil
[10] pry(main)> Twingly::URL.parse("https://.com/").trd
=> nil
@dentarg dentarg added the bug label Oct 26, 2015
@walro
Copy link
Contributor

walro commented Oct 26, 2015

[5] pry(main)> uri = Addressable::URI.heuristic_parse("https://.com/path")
=> #<Addressable::URI:0x3ffeb1525144 URI:https://.com/path>
[6] pry(main)> uri.display_uri.host
=> ".com"
[7] pry(main)> PublicSuffix.parse(".com")
=> #<PublicSuffix::Domain:0x007ffd6329a1e0 @sld=nil, @tld="com", @trd=nil>

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 26, 2015

[13] pry(main)> url = Twingly::URL.parse("https://.com/")
=> #<Twingly::URL:0x3fe1dcd5b828 https://.com/>
[14] pry(main)> cd url
[15] pry(#<Twingly::URL>):1> ls
Comparable#methods: <  <=  ==  >  >=  between?
Twingly::URL#methods: <=>  domain  host  inspect  normalized  normalized_host  normalized_path  normalized_scheme  origin  path  scheme  sld  tld  to_s  trd  valid?  without_scheme
self.methods: __pry__
instance variables: @addressable_uri  @public_suffix_domain
locals: _  __  _dir_  _ex_  _file_  _in_  _out_  _pry_
[16] pry(#<Twingly::URL>):1> @public_suffix_domain
=> #<PublicSuffix::Domain:0x007fc3b9ab7190 @sld=nil, @tld="com", @trd=nil>
[17] pry(#<Twingly::URL>):1> @public_suffix_domain.domain
=> nil

Strange :)

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 26, 2015

[19] pry(#<Twingly::URL>):1> @public_suffix_domain.valid?
=> false
[20] pry(#<Twingly::URL>):1> @public_suffix_domain.valid_domain?
=> false

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 26, 2015

Still think this case should count as a valid URL.

$ curl -s -v `dig +short to`
* Rebuilt URL to: 216.74.32.107/
*   Trying 216.74.32.107...
* Connected to 216.74.32.107 (216.74.32.107) port 80 (#0)
> GET / HTTP/1.1
> Host: 216.74.32.107
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 26 Oct 2015 16:33:40 GMT
< Server: Apache/2.2.22
< Last-Modified: Thu, 07 Nov 2013 06:30:52 GMT
< ETag: "6e01ed-b1-4ea90683f1f00"
< Accept-Ranges: bytes
< Content-Length: 177
< Vary: Accept-Encoding
< Content-Type: text/html
<
<html><body><h1>It works!</h1>
<p>This is the default web page for this server.</p>
<p>The web server software is running but no content has been added, yet.</p>
</body></html>
* Connection #0 to host 216.74.32.107 left intact

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 26, 2015

.to is not the only one: http://blog.towo.eu/a-records-on-top-level-domains/

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 26, 2015

[4] pry(main)> Faraday.get "http://.to"
Faraday::ConnectionFailed: getaddrinfo: nodename nor servname provided, or not known
from /Users/dentarg/.rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:879:in `initialize'
[5] pry(main)> Faraday.get "http://to"
Faraday::ConnectionFailed: getaddrinfo: nodename nor servname provided, or not known
from /Users/dentarg/.rubies/ruby-2.2.2/lib/ruby/2.2.0/net/http.rb:879:in `initialize'

:-(

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 27, 2015

More variants to consider.

"http://[email protected]/"
"http://...com"
"http://.ly/xxx"
"http://.com.my/"
"http://.net"
"http://.com."
"http://.gl/xxx"

@dentarg
Copy link
Collaborator Author

dentarg commented Dec 11, 2015

I'm okay with these being invalid URLs, just want us to solve this and not have the nils :)

@roback
Copy link
Member

roback commented Jan 29, 2016

The best solution I can come up with is adding the following to Twingly::URL.internal_parse:

raise Twingly::URL::Error::ParseError if addressable_uri.host.start_with?(".") || addressable_uri.host.end_with?(".")

Works with all urls in this issue, but http://.twingly.com (see #70) becomes a NullURL then too.

@roback
Copy link
Member

roback commented Jan 29, 2016

Another way is adding .to_s in #domain, #trd etc., but then all URLs above would be considered valid :(

@walro
Copy link
Contributor

walro commented Jan 29, 2016

Maybe #74 will fix some (all) of these?

@roback
Copy link
Member

roback commented Jan 29, 2016

Maybe #74 will fix some (all) of these?

Looks like it (disclaimer: I have not read the whole RFC :))

@roback
Copy link
Member

roback commented Jan 29, 2016

Rubys URI.parse handles the above urls, so maby they're valid according to the rfc? In that case #74 doesn't fix any of these.

@walro
Copy link
Contributor

walro commented Jan 29, 2016

Ah, I see. Too bad.

@roback
Copy link
Member

roback commented Jan 29, 2016

Another solution is that the URL must always have a second level domain, which makes all above urls invalid.

raise Twingly::URL::Error::ParseError unless public_suffix_domain.sld

roback added a commit that referenced this issue Jan 29, 2016
The list of URLs is taken from the comments in #52.
This does not fix #52 completely as Twingly::URL#trd still
could be nil.
roback added a commit that referenced this issue Jan 29, 2016
close #52

Only added test for #trd since all other methods are tested in the
"when given bad input" context.
@roback roback closed this as completed in #78 Feb 3, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants