A few things I found while looking for pathological cases:

- http://do.ma.in:80/what/ever is normalized to http://do.ma.in/what/ever
  - and so is https://do.ma.in/what/ever
  - but https://do.ma.in:442/what/ever comes out as http://do.ma.in:443/what/ever
- http://10.2.3.4/hello/world.html comes out as http://2.3.4/hello/world.html
- Spaces and `%20` in query strings are normalized to `+`
  - but `%20` and `+` in the path are left as is
  - and a literal space is changed to `%20`
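The query/path difference lines up with how the two parts are usually decoded: in `application/x-www-form-urlencoded` query strings, `+` and `%20` both mean a space, while in a path `+` is just a literal character. A minimal sketch using Python's `urllib.parse`, assuming the query is form-encoded (`normalize_form_query` is a hypothetical helper, not this project's code):

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode


def normalize_form_query(query: str) -> str:
    """Hypothetical helper: give equivalent form-encoded queries one spelling.

    parse_qsl decodes both '+' and '%20' to a space; urlencode with
    quote_via=quote re-encodes spaces as '%20' (the default, quote_plus,
    would produce '+', like the behavior reported above).
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    return urlencode(pairs, quote_via=quote)


# In a path, '+' is not a space: unquote leaves it untouched.
path_segment = unquote("Over++Seychelles")
```

Whether the canonical spelling is `+` or `%20` matters less than picking one; the path, however, should not get the `+` treatment.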
Thinking about the application domain, I think ports should be ignored for normalization.
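If "ignored" means dropping the port entirely, a minimal sketch with Python's `urllib.parse` could look like the following. `strip_port` is a hypothetical helper; its `only_default` flag shows the more conservative alternative from RFC 3986 section 6.2.3, which removes a port only when it equals the scheme's default (so `:80` on http goes, but `:442` stays).

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme; only http/https are assumed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def strip_port(url: str, only_default: bool = False) -> str:
    """Hypothetical helper: remove the port from a URL's authority.

    only_default=False drops any explicit port (ports ignored entirely);
    only_default=True drops it only when it equals the scheme's default.
    """
    parts = urlsplit(url)
    port = parts.port  # None if the URL has no explicit port
    if port is None:
        return url
    if only_default and DEFAULT_PORTS.get(parts.scheme) != port:
        return url
    host = parts.hostname or ""  # urllib lowercases the host
    if ":" in host:  # IPv6 literal: restore the brackets
        host = f"[{host}]"
    userinfo, at, _ = parts.netloc.rpartition("@")  # keep any user:pass@
    return urlunsplit((parts.scheme, f"{userinfo}{at}{host}",
                       parts.path, parts.query, parts.fragment))
```

The sketch leaves the scheme alone, so it does not reproduce the https-to-http rewrite reported above; that would be a separate decision.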
The IP address being trimmed seems like a bug, though perhaps not one that has negative impacts.
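The report doesn't say where the trimming happens. One plausible but unconfirmed cause is a host-rewriting rule that drops a leading label (for example a `www.` strip) and gets applied to an IPv4 address. A sketch of a guard using Python's standard `ipaddress` module; `normalize_host` and its `www.` rule are hypothetical and only show where such a check would sit.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """True if host is an IPv4 or IPv6 address rather than a DNS name."""
    try:
        ipaddress.ip_address(host.strip("[]"))  # accept bracketed IPv6
    except ValueError:
        return False
    return True


def normalize_host(host: str) -> str:
    """Hypothetical host-rewriting step, guarded against IP literals."""
    host = host.lower()
    if is_ip_literal(host):
        return host  # never drop labels from an address
    if host.startswith("www."):  # illustrative rule only
        return host[len("www."):]
    return host
```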
I'm not sure what to do about the escaping.
One more case: UTF-8 in the path is %-quoted, but `%27` is turned into `'`. A literal `'` is left alone, so the result is uniform, but `'` is officially a delimiter (a reserved sub-delimiter) per https://datatracker.ietf.org/doc/html/rfc3986#section-2.2.

The `%27` case and the `+` in the path were both seen in the wild in:
http://www.seychellesnewsagency.com/articles/19841/Over++Seychelles%27+households+received+financial+assistance+following+Dec.++disasters
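RFC 3986 section 6.2.2 describes a conservative percent-encoding normalization: uppercase the hex digits of every `%XX`, and decode only octets that map to unreserved characters (`A-Z a-z 0-9 - . _ ~`). Under that rule `%27` stays `%27`, because `'` is reserved, and UTF-8 escapes such as `%C3%A9` stay encoded. A minimal Python sketch; `normalize_percent_encoding` is a hypothetical name, not this project's API.

```python
import re

# RFC 3986 section 2.3 unreserved characters.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(component: str) -> str:
    """Hypothetical helper: decode only unreserved %XX, uppercase the rest."""

    def fix(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch  # e.g. %7E -> ~
        return "%" + match.group(1).upper()  # e.g. %27 stays, %2f -> %2F

    return _PCT.sub(fix, component)
```

It deliberately leaves literal `'` and `+` untouched; encoding a raw space as `%20` would be a separate step.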