Change trunc to handle unicode (rune counting) #308

andrewmostello · 2021-06-14T16:00:00Z

Update the trunc func to count and truncate by rune instead of by byte. When truncating by byte, a multi-byte unicode character that straddles the cut point is split, leaving invalid UTF-8 at the end of the result. Instead, trunc now checks the rune count to decide whether truncation is needed and slices a []rune to perform it.
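To illustrate the problem, here is a minimal sketch of byte-based truncation. The name `truncBytes` is hypothetical and only stands in for the old behavior described above; it is not the package's actual code.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncBytes is a hypothetical stand-in for byte-based truncation:
// it keeps the first c bytes of s, regardless of character boundaries.
func truncBytes(c int, s string) string {
	if c >= 0 && len(s) > c {
		return s[:c]
	}
	return s
}

func main() {
	s := "héllo" // 'é' takes two bytes in UTF-8

	fmt.Println(len(s), utf8.RuneCountInString(s)) // 6 5

	out := truncBytes(2, s)           // cuts through the middle of 'é'
	fmt.Println(utf8.ValidString(out)) // false
}
```

Cutting at byte 2 keeps `h` plus only the first byte of `é`, so the result is no longer valid UTF-8.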

After some searching, it looks like len([]rune(s)) is optimized by the compiler (https://go-review.googlesource.com/c/go/+/108985), so the string is not actually converted into a rune slice until trunc knows truncation is needed.
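The approach described above could look roughly like the sketch below. The name `truncRunes` is hypothetical, and the handling of negative counts (returning the string unchanged) is an assumption made to keep the example short; the actual change in this PR may differ.

```go
package main

import "fmt"

// truncRunes is a hypothetical sketch of rune-based truncation:
// it keeps the first c runes of s. Negative counts are not handled
// here and return s unchanged (an assumption for this example).
func truncRunes(c int, s string) string {
	// Count runes first; the []rune conversion used for slicing
	// below only happens when truncation is actually required.
	if c >= 0 && len([]rune(s)) > c {
		return string([]rune(s)[:c])
	}
	return s
}

func main() {
	fmt.Println(truncRunes(2, "héllo"))  // hé
	fmt.Println(truncRunes(10, "héllo")) // héllo
}
```

Because slicing happens on runes rather than bytes, the cut always falls on a character boundary.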

Note that this does not fix abbrev, which counts bytes the way trunc did but lives in a separate package. If this PR is merged, or if a maintainer requests it during review, I can open a PR against the goutils package.

