Change trunc to handle unicode (rune counting) #308

Open
wants to merge 1 commit into
base: master

andrewmostello
Contributor

Update the trunc func to count and truncate by rune instead of by byte. Truncating by byte can cut through a multi-byte UTF-8 character, which splits the character and leaves invalid bytes in the result. trunc now compares against the rune count and slices a []rune when it truncates.

After some searching, it looks like len([]rune(s)) is optimized by the compiler (https://go-review.googlesource.com/c/go/+/108985) so that it counts runes without allocating a slice. trunc therefore checks the rune count first and only converts the string into a rune slice once it knows truncation is needed.
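
The count-first approach could look roughly like the sketch below. `truncRunes` is a hypothetical name, and the handling of a negative count (keep the last runes of the string) is an assumption made for this example rather than something stated in this description; the actual diff may differ.

```go
package main

import "fmt"

// truncRunes is a hypothetical rune-aware truncation helper (a sketch, not
// the PR's actual code). A non-negative c keeps the first c runes; a
// negative c is ASSUMED to keep the last -c runes.
func truncRunes(c int, s string) string {
	// Count runes first; the []rune conversion below only happens
	// on the branches that actually truncate.
	n := len([]rune(s))
	if c < 0 && n+c > 0 {
		return string([]rune(s)[n+c:])
	}
	if c >= 0 && n > c {
		return string([]rune(s)[:c])
	}
	// Nothing to cut: return the original string unchanged.
	return s
}

func main() {
	fmt.Println(truncRunes(2, "héllo"))
	fmt.Println(truncRunes(3, "日本語テキスト"))
	fmt.Println(truncRunes(-2, "héllo"))
}
```

Because the slice boundaries are rune indexes rather than byte offsets, the result is always cut between characters, at the cost of one conversion when truncation happens.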

Note that this does not fix abbrev, which still counts bytes the way trunc did; abbrev lives in a separate package (goutils). If this PR is merged, or if a maintainer requests it during review, I can open a PR against goutils.
