Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use getPath instead of getRawPath to prevent doulbe encoding of the URI #3483

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jensmatw
Copy link
Contributor

Use getPath instead of getRawPath to prevent doulbe encoding of the URI to fix issue #3482.

@jensmatw jensmatw marked this pull request as ready for review July 30, 2024 12:33
@rworsnop
Copy link

See #3437

Copy link
Member

@spencergibb spencergibb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a test

@rworsnop
Copy link

rworsnop commented Sep 25, 2024

I'm not sure this is going to work.
Consider the url "http://example.com/foo–bar foo" (that's an en dash, not a dash):

 String url = "http://example.com/foo%E2%80%93bar%20foo";
 URI uri = URI.create(url);
 String path = uri.getPath();
 URI uri2 = UriComponentsBuilder.fromUri(uri).replacePath(path).build().toUri(); 

uri2 is "http://example.com/foo–bar%20foo", so the space got re-encoded properly, but the en dash did not.

@jensmatw
Copy link
Contributor Author

jensmatw commented Sep 27, 2024

I'm not sure this is going to work. Consider the url "http://example.com/foo–bar foo" (that's an en dash, not a dash):

You are right, it only solves the issue for ascii characters because it seems UriComponentsBuilder does not encode anything else.
I'm not sure if there is an easy solution. I have seen your PR (#3437), which would encode space and the en dash correctly. But for rewrite path it would do the regex on the encoded url which i guess no one expects when writing a rewrite rule.

@rworsnop @spencergibb
Do you have an idea how we could support both, rewrite (applying regex) on the correct path and still maintain support for non-ascii?

@jensmatw
Copy link
Contributor Author

@rworsnop it seems that neither jdks URI nor the UriComponentsBuilder from Spring escapes unicodes outside of ascii.

jshell> new URI("http", "localhost", "/te-st test", null)
$32 ==> http://localhost/te-st%20test
UriComponentsBuilder.fromUri(request.uri()).replacePath("/te-st test").build().toUri();
http://localhost:8080/te-st%20test

Yet, the code from PR is still working for unicode characters, because it seems the http-client or something after the filter is escaping the en-dash correctly. But I found another issue: While the URI from componentsbuilder with te-st is sending te%E2%80%93st and te set is sending te%20% there is an issue if you have both! If we leave the filter with the URI te-st test it gets send to the downstream server as te%E2%80%93st%2520test, so it is double-encoded again. Something in the spring chain is "detecting" the unicode character and decides to encode everything again. Maybe it is also somewhere in the JDK.

@jensmatw
Copy link
Contributor Author

I've found a solution, it seems .encode() from the UriComponentsBuilder does encode everything correctly. It does not double encode the space. It still encodes the unicodes (see tests).

@spencergibb tests have also been added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants