Consider Text instead of ByteString #54

Open
tysonzero opened this issue Aug 28, 2019 · 2 comments

@tysonzero

URIs are explicitly defined as a sequence of characters, not a sequence of octets, as per the RFC.

Thus ByteString seems like a dangerous type to use for this purpose, as it represents a sequence of octets and not a sequence of characters.

Text would also be more compatible with IRIs, which according to their RFC are likewise sequences of characters, and whose characters are not limited to ASCII.

@tysonzero changed the title from "What is the reason for choosing bytestring over text" to "Consider Text instead of ByteString" on Aug 28, 2019
@hasufell
Collaborator

I don't think this library clashes with the spec. The interpretation of the bytestrings is left to the caller. That actually seems like the right thing to do:

> This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text.

The characters in the ABNF grammar are ASCII and as such we don't need to know the encoding to parse:

> The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII]. Because a URI is a sequence of characters, we must invert that relation in order to understand the URI syntax.
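
To illustrate the point about parsing without knowing the encoding, here is a minimal sketch that splits the scheme off a URI by comparing raw bytes against the ASCII values used in the grammar. It is not this library's actual parser; `isAlpha`, `isSchemeByte` and `splitScheme` are hypothetical names, and the sketch assumes the input uses an ASCII-compatible encoding.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- RFC 3986, section 3.1:
--   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

-- ASCII letters A-Z and a-z, compared as raw byte values.
isAlpha :: Word8 -> Bool
isAlpha w = (w >= 0x41 && w <= 0x5A) || (w >= 0x61 && w <= 0x7A)

-- Any byte allowed inside a scheme: letter, digit, '+', '-' or '.'.
isSchemeByte :: Word8 -> Bool
isSchemeByte w =
  isAlpha w || (w >= 0x30 && w <= 0x39) || w `elem` [0x2B, 0x2D, 0x2E]

-- Hypothetical helper: returns (scheme, everything after the ':').
-- Bytes after the colon are never inspected or decoded.
splitScheme :: B.ByteString -> Maybe (B.ByteString, B.ByteString)
splitScheme uri =
  case B.break (== 0x3A) uri of          -- 0x3A is ':'
    (scheme, rest)
      | Just (c, _) <- B.uncons scheme   -- scheme must be non-empty
      , isAlpha c                        -- and start with a letter
      , B.all isSchemeByte scheme
      , Just (_, afterColon) <- B.uncons rest  -- drop the ':' itself
      -> Just (scheme, afterColon)
    _ -> Nothing
```

The remainder may contain arbitrary non-ASCII bytes; the split still succeeds because only the delimiters and scheme characters are matched.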

@hasufell
Collaborator

That said, I'd actually say 'Text' is wrong and dangerous, because it makes the decoding choice for you (UTF-8), which is not what the spec says.
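
As a rough sketch of what "left to the caller" can mean in practice, the same bytes can be decoded differently depending on what the surrounding protocol specifies. The `segment` value is made up for illustration (real URIs would normally percent-encode a non-ASCII byte), and `asUtf8` and `asLatin1` are hypothetical names; only `decodeUtf8'` and `decodeLatin1` come from the `text` package.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical path segment: "caf" followed by the single byte 0xE9.
segment :: B.ByteString
segment = B.pack [0x63, 0x61, 0x66, 0xE9]

-- A caller that assumes UTF-8 has to handle failure:
-- a lone 0xE9 is not a valid UTF-8 sequence.
asUtf8 :: Maybe T.Text
asUtf8 = either (const Nothing) Just (TE.decodeUtf8' segment)

-- A caller that knows the protocol uses Latin-1 gets U+00E9 for 0xE9.
asLatin1 :: T.Text
asLatin1 = TE.decodeLatin1 segment
```

Neither result is "the" correct one; which decoding applies depends on the context the URI came from.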
