Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby compatibility (drop libidn requirement) #66

Closed
walro opened this issue Nov 16, 2015 · 23 comments
Closed

JRuby compatibility (drop libidn requirement) #66

walro opened this issue Nov 16, 2015 · 23 comments

Comments

@walro
Copy link
Contributor

walro commented Nov 16, 2015

Since the introduction of the idn-ruby gem in #63 we can no longer run under the Java runtime. Some ideas from the top of my head:

  • Stop using the gem
  • Is it possibly to have conditionals in the Gemfile? As in, don't use idn-ruby for Java platform.
  • If above is possible it would mean that we'd handle urls different depending upon runtime which is not that desirable.
@jage
Copy link
Contributor

jage commented Nov 16, 2015

Is it possibly to have conditionals in the Gemfile? As in, don't use ìdn-ruby` for Java platform.

It is possible (See platform in http://bundler.io/man/gemfile.5.html)

If above is possible it would mean that we'd handle urls different depending upon runtime which is not that desirable.

Agree, we'd like the same behaviour under JRuby and Ruby.

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

Since the introduction of the idn-ruby gem in #63 we can no longer run under the Java runtime.

#63 fixed #48 which was about sporkmonger/addressable#215, which now has been fixed in sporkmonger/addressable@cf8cc7d and released in Addressable 2.4.0.

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

Is it possibly to have conditionals in the Gemfile? As in, don't use idn-ruby for Java platform.

Yes: https://github.com/sporkmonger/addressable/blob/addressable-2.4.0/Gemfile#L27

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

Tried Adressable 2.4.0 without idn-ruby

$ git d
diff --git a/lib/twingly/url.rb b/lib/twingly/url.rb
index 282e13a..499579e 100644
--- a/lib/twingly/url.rb
+++ b/lib/twingly/url.rb
@@ -1,5 +1,4 @@
 require "addressable/uri"
-require "addressable/idna/native"
 require "public_suffix"

 require_relative "url/null_url"
@@ -16,7 +15,7 @@ module Twingly
     ERRORS = [
       Addressable::URI::InvalidURIError,
       PublicSuffix::DomainInvalid,
-      IDN::Idna::IdnaError,
+      Addressable::IDNA::PunycodeBadInput,
     ]

     def self.parse(potential_url)
diff --git a/twingly-url.gemspec b/twingly-url.gemspec
index 915e5d2..b8be843 100644
--- a/twingly-url.gemspec
+++ b/twingly-url.gemspec
@@ -12,9 +12,8 @@ Gem::Specification.new do |s|
   s.license     = "MIT"
   s.required_ruby_version = "~> 2.2"

-  s.add_dependency "addressable", "~> 2"
+  s.add_dependency "addressable", "~> 2.4"
   s.add_dependency "public_suffix", "~> 1.4"
-  s.add_dependency "idn-ruby", "~> 0.1"

   s.add_development_dependency "rake", "~> 10"
   s.add_development_dependency "rspec", "~> 3"

Two failures (about the same URL)

Failures:

  1) Twingly::URL.parse when given valid urls does not ruin the url "http://xn--zckp1cyg1.sblo.jp/"
     Failure/Error: expect(described_class.parse(valid_url).to_s).to eq(valid_url)

       expected: "http://xn--zckp1cyg1.sblo.jp/"
            got: ""

       (compared using ==)
     # ./spec/lib/twingly/url_spec.rb:59:in `block (5 levels) in <top (required)>'

  2) Twingly::URL#valid? returns true for the valid url "http://xn--zckp1cyg1.sblo.jp/"
     Failure/Error: expect(described_class.parse(valid_url).valid?).to be true

       expected true
            got false
     # ./spec/lib/twingly/url_spec.rb:195:in `block (4 levels) in <top (required)>'

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

# with idn-ruby
[6] pry(main)> Addressable::URI.parse("http://xn--zckp1cyg1.sblo.jp/").display_uri
=> #<Addressable::URI:0x3ff9cf027390 URI:http://xn--zckp1cyg1.sblo.jp/>

# without
[6] pry(main)> Addressable::URI.parse("http://xn--zckp1cyg1.sblo.jp/").display_uri
Addressable::IDNA::PunycodeBadInput: Input is invalid.
from /Users/dentarg/.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/pure.rb:562:in `punycode_decode'

@walro
Copy link
Contributor Author

walro commented Dec 7, 2015

Is http://xn--zckp1cyg1.sblo.jp/ valid? :)

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

It's resolvable at least:

$ dig xn--zckp1cyg1.sblo.jp

; <<>> DiG 9.9.5 <<>> xn--zckp1cyg1.sblo.jp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6862
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;xn--zckp1cyg1.sblo.jp.     IN  A

;; ANSWER SECTION:
xn--zckp1cyg1.sblo.jp.  28712   IN  A   59.106.18.133
xn--zckp1cyg1.sblo.jp.  28712   IN  A   59.106.18.137
xn--zckp1cyg1.sblo.jp.  28712   IN  A   59.106.18.136
xn--zckp1cyg1.sblo.jp.  28712   IN  A   59.106.18.132

;; Query time: 1 msec
;; SERVER: 10.21.1.22#53(10.21.1.22)
;; WHEN: Mon Dec 07 15:39:38 CET 2015
;; MSG SIZE  rcvd: 114
$ curl -v -L http://xn--zckp1cyg1.sblo.jp/
*   Trying 59.106.18.133...
* Connected to xn--zckp1cyg1.sblo.jp (59.106.18.133) port 80 (#0)
> GET / HTTP/1.1
> Host: xn--zckp1cyg1.sblo.jp
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 400 Bad Request
< Date: Mon, 07 Dec 2015 14:38:50 GMT
< Server: Apache
< Content-Length: 226
< Connection: close
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>
* Closing connection 0

@walro
Copy link
Contributor Author

walro commented Dec 7, 2015

Yes, but someone might have registered a "faulty" sub-domain. I tried some different online tools and they all had troubles converting it.

@walro
Copy link
Contributor Author

walro commented Dec 7, 2015

But even so, we should of course not blow up.

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

@walro If you want to you can weigh in at sporkmonger/addressable#219

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

@walro what tools did you try? sounds like we can change our spec then, if we revert the use of libidn

@walro
Copy link
Contributor Author

walro commented Dec 7, 2015

@walro what tools did you try? sounds like we can change our spec then, if we revert the use of libidn

https://www.punycoder.com/ and http://www.charset.org/punycode.php?encoded=xn--zckp1cyg1&decode=Punycode+to+normal+text

The latter does infact handle most part of it, but I suspect that the "box" at the end implicates an error.

However, I think our library should just treat it as a normal string that just happens to start with xn--, rather than blowing up so I don't see us reverting the use of libidn just yet.

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

I played around with the CLI for libidn (from Homebrew)

$ idn --version
idn (GNU Libidn) 1.32
Copyright (C) 2015 Simon Josefsson.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Simon Josefsson.
$ idn --debug --idna-to-unicode xn--zckp1cyg1
Charset `UTF-8'.
input[0] = U+0078
input[1] = U+006e
input[2] = U+002d
input[3] = U+002d
input[4] = U+007a
input[5] = U+0063
input[6] = U+006b
input[7] = U+0070
input[8] = U+0031
input[9] = U+0063
input[10] = U+0079
input[11] = U+0067
input[12] = U+0031
output[0] = U+0078
output[1] = U+006e
output[2] = U+002d
output[3] = U+002d
output[4] = U+007a
output[5] = U+0063
output[6] = U+006b
output[7] = U+0070
output[8] = U+0031
output[9] = U+0063
output[10] = U+0079
output[11] = U+0067
output[12] = U+0031
xn--zckp1cyg1

$ echo $?
0

Something valid:

$ idn --debug --idna-to-unicode xn--rksmrgs-5wao1o
Charset `UTF-8'.
input[0] = U+0078
input[1] = U+006e
input[2] = U+002d
input[3] = U+002d
input[4] = U+0072
input[5] = U+006b
input[6] = U+0073
input[7] = U+006d
input[8] = U+0072
input[9] = U+0067
input[10] = U+0073
input[11] = U+002d
input[12] = U+0035
input[13] = U+0077
input[14] = U+0061
input[15] = U+006f
input[16] = U+0031
input[17] = U+006f
output[0] = U+0072
output[1] = U+00e4
output[2] = U+006b
output[3] = U+0073
output[4] = U+006d
output[5] = U+00f6
output[6] = U+0072
output[7] = U+0067
output[8] = U+00e5
output[9] = U+0073
räksmörgås

$ echo $?
0

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

Played around some more

$ idn --punycode-decode xn--rksmrgs-5wao1o
xn--rksmrÂgÏÃs
$ idn --punycode-decode xn--zckp1cyg1
idn: punycode_decode: Invalid input
$ idn -h
Usage: idn [OPTION]... [STRINGS]...
Internationalized Domain Name (IDN) convert STRINGS, or standard input.

Command line interface to the internationalized domain name library.

All strings are expected to be encoded in the preferred charset used
by your locale.  Use `--debug' to find out what this charset is.  You
can override the charset used by setting environment variable CHARSET.

To process a string that starts with `-', for example `-foo', use `--'
to signal the end of parameters, as in `idn --quiet -a -- -foo'.

Mandatory arguments to long options are mandatory for short options too.
  -h, --help               Print help and exit
  -V, --version            Print version and exit
  -s, --stringprep         Prepare string according to nameprep profile
  -d, --punycode-decode    Decode Punycode
  -e, --punycode-encode    Encode Punycode
  -a, --idna-to-ascii      Convert to ACE according to IDNA (default mode)
  -u, --idna-to-unicode    Convert from ACE according to IDNA
      --allow-unassigned   Toggle IDNA AllowUnassigned flag (default off)
      --usestd3asciirules  Toggle IDNA UseSTD3ASCIIRules flag (default off)
      --no-tld             Don't check string for TLD specific rules
                             Only for --idna-to-ascii and --idna-to-unicode
  -n, --nfkc               Normalize string according to Unicode v3.2 NFKC
  -p, --profile=STRING     Use specified stringprep profile instead
                             Valid stringprep profiles: `Nameprep',
                             `iSCSI', `Nodeprep', `Resourceprep',
                             `trace', `SASLprep'
      --debug              Print debugging information
      --quiet              Silent operation

Report bugs to: [email protected]
GNU Libidn home page: <http://www.gnu.org/software/libidn/>
General help using GNU software: <http://www.gnu.org/gethelp/>

@dentarg
Copy link
Collaborator

dentarg commented Dec 7, 2015

Ah [21:47] Robin: prova punycode-decode utan xn--

$ idn --punycode-decode zckp1cyg1
idn: punycode_decode: Invalid input

$ idn --punycode-decode rksmrgs-5wao1o
räksmörgås

@dentarg
Copy link
Collaborator

dentarg commented Jan 5, 2016

sporkmonger/addressable#219

Has been solved (sporkmonger/addressable@295d7ba) but not yet released.

@dentarg
Copy link
Collaborator

dentarg commented Jan 5, 2016

sporkmonger/addressable#219

Has been solved (sporkmonger/addressable@295d7ba) but not yet released.

Summary: with the fix in addressable we would be able to handle #48 without libidn

@dentarg dentarg changed the title JRuby compatibility JRuby compatibility (drop libidn requirement) Jan 5, 2016
@dentarg
Copy link
Collaborator

dentarg commented Sep 12, 2016

Summary: with the fix in addressable we would be able to handle #48 without libidn

with #90 we can drop the use of libidn, because we no longer need to use #display_uri in addressable (which has the bug that has been fixed but not yet released)

@dentarg
Copy link
Collaborator

dentarg commented Sep 12, 2016

Hmm, this is tricky... :|

We have seen URLs like

http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com

Without libidn it's valid:

[1] pry(main)> Twingly::URL.parse("http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com")
=> #<Twingly::URL:0x3fc872208c4c http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com>

With libidn it's not valid:

[1] pry(main)> Twingly::URL.parse("http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com")
=> #<Twingly::URL::NullURL:0x007fe54b31c378>

this is the error

[4] pry(main)> Addressable::URI.heuristic_parse("http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com").normalized_host
IDN::Idna::IdnaError: Output would be too large or too small (5)
from /Users/dentarg/.gem/ruby/2.2.5/gems/addressable-2.4.0/lib/addressable/idna/native.rb:38:in `toASCII'

no libidn (using the pure Ruby version in addressable):

[1] pry(main)> Addressable::URI.heuristic_parse("http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com").normalized_host
=> "acinusfallumtrompetumnulluncreditumvisumestatcuadlongumetcefallumest.com"

@walro
Copy link
Contributor Author

walro commented Sep 12, 2016

@dentarg
Copy link
Collaborator

dentarg commented Sep 12, 2016

I guess the consequence would be that we will see more SocketError happening:

[3] pry(main)> Net::HTTP.get(URI("http://AcinusFallumTrompetumNullunCreditumVisumEstAtCuadLongumEtCefallumEst.com"))
SocketError: getaddrinfo: nodename nor servname provided, or not known
from /Users/dentarg/.rubies/ruby-2.2.5/lib/ruby/2.2.0/net/http.rb:879:in `initialize'

dentarg added a commit that referenced this issue Sep 12, 2016
Currently, this URL isn't valid, but if we drop libidn (which we talked
about), it would be valid, due to the way Addressable 2.4.0 works.
See comment at: #66 (comment)

Adding this URL to the tests so we will notice the change in behaviour,
if/when we drop libidn.
@dentarg
Copy link
Collaborator

dentarg commented Sep 12, 2016

I reported the above mentioned difference in behaviour to addressable: sporkmonger/addressable#239

dentarg added a commit that referenced this issue Nov 3, 2016
Instead, we use the pure Ruby IDNA solution in addressable.

Close #66.
@dentarg dentarg mentioned this issue Nov 3, 2016
@dentarg
Copy link
Collaborator

dentarg commented Nov 5, 2016

I reported the above mentioned difference in behaviour to addressable: sporkmonger/addressable#239

And it was fixed in sporkmonger/addressable@c73810f

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants