-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Console input/output procedures. #55
base: master
Are you sure you want to change the base?
Conversation
buffer.setLen(int(min(bytesRead, uint32(maxBytes)))) | ||
|
||
# Trim CR/CRLF from buffer. | ||
if len(buffer) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like duplicating strutils.stripLineEnd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem of strutils.stripLineEnd
is that it thinks that \v
and \f
are line end, but this is not a proper markers of end of line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These characters are quite obscure nowadays, but OK - if this is a general purpose procedure for reading a line of characters from the screen, why should it preserve a page break character at the end of the line, but not a new line character in the same place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zah we are supposed to read UTF-8 encoded strings are you sure \v
and \f
could not be part of UTF-8 encoded symbol which was cut because of buffer limits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in UTF-8, the first 128 characters in the ASCII table are encoded with a single byte that has a leading zero bit. All other characters are encoded with a sequence of bytes where each byte has a leading bit set to 1. More details here:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zah you are right, but i still think \v
and \f
is not a proper condition for line end which we are going to receive from console. When we reading from console the only proper line end is \n
and \r\n
and nothing else, strutils
procedure is incorrect.
Add UTF-8 offset procedure. Add UTF-8 substr procedure. Add wchar_t to UTF-8 conversion procedure. Add multibyte to wchar_t conversion procedure (posix). Add UTF-8 tests. Fix password reader to validate utf-8 encoding when reading from pipe. Fix password reader to read utf-8 encoded strings from *nix console.
Add tests for UTF-8 to UTF-32 and UTF-32 to UTF-8 encoders.
I think we can extract the utf8 module + tests and combine it with my utf-8/16 module. we have some overlap functionality and some are distinct. utf8 module in this PR targeting I/O validation. my utf-8/16 module targeting lexer/parser construction. we can put them all in one module. |
Different console procedures which are missing or not properly implemented in Nim' stdlib.
Current PR introduces cross-platform password reading procedure which: