
possible url normalization issues #72

Open
philbudne opened this issue Dec 29, 2023 · 1 comment
Labels: bug (Something isn't working), question (Further information is requested)

Comments

@philbudne
Contributor

A few things I found while looking for pathological cases:

http://do.ma.in:80/what/ever is normalized to http://do.ma.in/what/ever,
and so is https://do.ma.in/what/ever (the https scheme is rewritten to http),
but https://do.ma.in:442/what/ever comes out as http://do.ma.in:443/what/ever.
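For comparison, here is a minimal sketch of scheme-based default-port removal in the spirit of RFC 3986 §6.2.3, assuming Python's standard `urllib.parse`. `drop_default_port` and `DEFAULT_PORTS` are hypothetical names, not this project's API. It keeps the scheme, strips :80 only for http and :443 only for https, and leaves every other port alone.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of default ports per scheme (RFC 3986 section 6.2.3).
DEFAULT_PORTS = {"http": 80, "https": 443}


def drop_default_port(url: str) -> str:
    """Remove the port only when it is the default for the URL's own scheme."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    netloc = parts.netloc
    port = parts.port  # int or None; raises ValueError for a malformed port
    if port is not None and port == DEFAULT_PORTS.get(scheme):
        # Strip the trailing ":<port>"; rpartition keeps userinfo and IPv6 brackets.
        netloc = netloc.rpartition(":")[0]
    # The scheme is kept as-is, so https is never rewritten to http here.
    return urlunsplit((scheme, netloc, parts.path, parts.query, parts.fragment))
```

Under this rule, https://do.ma.in:442/what/ever would come out unchanged instead of as http://do.ma.in:443/what/ever.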

http://10.2.3.4/hello/world.html comes out as http://2.3.4/hello/world.html (the first octet of the IP address is dropped).
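It isn't clear from the output which step trims the first octet. Assuming it is some label-based host rule, one plausible guard is to detect IP literals with Python's `ipaddress` module before applying that rule. In this sketch `strip_www` is a hypothetical stand-in for whatever label rule is at fault, not the project's code.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """True if host is an IPv4 or IPv6 address (IPv6 may be bracketed)."""
    try:
        ipaddress.ip_address(host.strip("[]"))
    except ValueError:
        return False
    return True


def strip_www(host: str) -> str:
    """Hypothetical label rule: drop a leading 'www.' but never touch IP addresses."""
    if is_ip_literal(host):
        return host  # e.g. '10.2.3.4' must keep its first octet
    return host[4:] if host.lower().startswith("www.") else host
```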

Spaces and %20 in query strings are normalized to +,
but in the path %20 and + are left as is,
and a literal space is changed to %20.
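A sketch of component-aware space handling, assuming the query string is application/x-www-form-urlencoded (not guaranteed for arbitrary URLs). In that encoding a space, %20 and + all mean a space, while in the path + is a literal plus and only %20 means a space. `normalize_spaces` is a hypothetical helper built on `urllib.parse`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_spaces(url: str) -> str:
    """Hypothetical space handling that follows each component's own rules."""
    parts = urlsplit(url)
    # Path: '+' and '%20' are different characters, so keep both untouched
    # and only escape raw spaces as '%20'.
    path = parts.path.replace(" ", "%20")
    # Query (assumed form-encoded): parse_qsl decodes ' ', '%20' and '+' to a
    # space, and urlencode re-emits every space as '+'.
    query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

Round-tripping through `parse_qsl`/`urlencode` also re-escapes other query characters and turns a bare `flag` into `flag=`, so this is an illustration of the rules rather than a drop-in replacement.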

UTF-8 in the path is %-quoted, but %27 is turned into '.
(A literal ' is left alone, so the result is at least uniform; however, ' is officially a reserved "sub-delims" character in https://datatracker.ietf.org/doc/html/rfc3986#section-2.2.)

The space/+ and %27 issues above were both seen in the wild in:
http://www.seychellesnewsagency.com/articles/19841/Over++Seychelles%27+households+received+financial+assistance+following+Dec.++disasters
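RFC 3986 §2.2 says URIs that differ only in replacing a reserved character with its percent-encoded octet are not equivalent, and §6.2.2.2 only recommends decoding escapes of unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~"). Below is a minimal sketch of that rule using Python's `re` module; `normalize_percent_encoding` is a hypothetical name. It also uppercases the hex digits of the escapes it keeps (§6.2.2.1).

```python
import re
import string

# RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(component: str) -> str:
    """Decode %XX only when XX is an unreserved ASCII character; otherwise
    keep the escape but uppercase its hex digits."""
    def fix(m: re.Match) -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return PCT.sub(fix, component)
```

Applied to the path of the URL above, this leaves %27 (and the + signs) untouched, because ' is a sub-delim rather than an unreserved character.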

@rahulbot
Contributor

Thinking about the application domain, I think ports should be ignored for normalization.
The IP address being trimmed seems like a bug, though perhaps not one that has negative impacts.
I'm not sure what to do about the escaping.
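If ports are ignored, one way to express that is as a comparison key instead of a rewritten URL. A minimal sketch using `urllib.parse`; the hypothetical `match_key` also ignores the scheme, on the assumption that the existing https-to-http collapse stays.

```python
from typing import Tuple
from urllib.parse import urlsplit


def match_key(url: str) -> Tuple[str, str, str]:
    """Hypothetical comparison key built from host, path and query only.

    The port is ignored entirely, and so is the scheme (assumption).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased, without port or brackets
    return (host, parts.path or "/", parts.query)
```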

@rahulbot added the bug (Something isn't working) and question (Further information is requested) labels on Jan 17, 2024.