Encodings
All encodings are assumed to be utf-8. We now default to this behaviour everywhere and deprecate the `-K` flag (or we will after my next push). I've spent an extraordinary amount of time and effort trying to figure out what the encoding should be and how to make it correct, and I just don't see any reasonable way to get to the point of knowing.
Options like the `-K` flag expect the user to know how to fix their encoding issues when their environment is wrong. But when it's wrong:
- The user doesn't understand that that's the issue
- Even when they do, they don't understand why
- They don't know how to fix it
- It looks like SiB's fault (it's probably their editor being launched in different ways, which produces different and therefore nondeterministic environment configurations)
This is not just a case of newbie users. I would personally struggle to figure it out if I hit it (and in retrospect, I'm pretty sure that I have hit this situation and did not realize what the issue was). I can cite several other very experienced users who wound up in a similar situation. Less experienced users probably just give up on it.
I'll reconsider the issue if you can give me a comprehensive set of test cases that I can pass to make SiB right. I'll work with you on this. I'll do the work to make SiB pass.
The tests need to also show cases where the user's encoding is wrong (primarily that their file is utf-8 but their encodings cause Ruby to set it to ASCII-8BIT or IBM437). This is important because that problem occurs more than the need to set the encoding (based on feedback in issues and my own experience). So if I do attempt to make encodings configurable then this common case where it's configured incorrectly needs to still work.
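As a rough illustration of that misconfigured case, here is a minimal sketch of what it looks like from inside Ruby: the bytes are valid utf-8, but the string is labelled ASCII-8BIT or IBM437. The `assume_utf8` helper is hypothetical (it is not part of SiB); it only shows one way a test could assert that mislabelled utf-8 still comes out as utf-8.

```ruby
# The UTF-8 bytes for "café", as they would appear in a UTF-8 source file.
utf8_bytes = "caf\u00E9".encode(Encoding::UTF_8)

# Simulate a misconfigured environment: same bytes, wrong encoding label.
as_binary = utf8_bytes.dup.force_encoding(Encoding::ASCII_8BIT)
as_ibm437 = utf8_bytes.dup.force_encoding(Encoding::IBM437)

# Hypothetical helper (an assumption, not SiB's code): relabel a string as
# UTF-8 when its bytes are valid UTF-8, otherwise return it unchanged.
def assume_utf8(str)
  candidate = str.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : str
end

# With the wrong label, Ruby counts bytes rather than characters.
puts as_binary.length               # 5
puts assume_utf8(as_binary).length  # 4
puts assume_utf8(as_ibm437).encoding
```

Note that `force_encoding` only changes the label and never transcodes the bytes, which is why it suits the "file is utf-8 but Ruby was told otherwise" situation.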
Specifically, I'm pretty sure that this occurs when certain environment variables are missing, particularly the `LANG` environment variable, though #95 identifies some others as well.
If you can show how to detect that situation, then we can try to default to the utf-8 assumption in that case, bring back the `-K` flag, and add the `-E` flag to allow users to configure it when they are informed enough to know what they want.
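One possible shape for that detection, sketched under heavy assumptions: it treats the environment as unconfigured when none of the standard POSIX locale variables (`LC_ALL`, `LC_CTYPE`, `LANG`) has a non-empty value. Both method names are hypothetical, and the variable list is almost certainly incomplete, since other variables or platform defaults may also influence what Ruby picks.

```ruby
# Hypothetical heuristic (an assumption, not verified against every platform):
# the locale counts as unconfigured when none of these variables is set.
def locale_unconfigured?(env = ENV)
  %w[LC_ALL LC_CTYPE LANG].all? { |name| env[name].to_s.empty? }
end

# Fall back to UTF-8 only when the environment gives no locale information;
# otherwise trust whatever encoding was configured.
def assumed_encoding(env = ENV, configured = Encoding.default_external)
  locale_unconfigured?(env) ? Encoding::UTF_8 : configured
end

puts assumed_encoding({}, Encoding::IBM437)                  # UTF-8
puts assumed_encoding({ "LANG" => "C" }, Encoding::US_ASCII) # US-ASCII
```

Passing the environment in as an argument keeps the check testable without mutating the real `ENV`.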
- This is the first encoding issue I hit. You should probably read it if you want to make these test cases.
- Probably worth reading this too, where I explain why I think working with encodings is so difficult.
- This issue explains how Ruby combines strings that have different encodings.
- This explores how POSIX sets environment variables to specify the encodings. There may be more than is in there, though, because I seem to recall that it could figure out my environment sometimes even when I deleted the variables.
- This gist could also be useful. It hits Ruby with a barrage of variables and looks at what Ruby sets for the different places that encodings show up.
- You'll need a Windows VM to test that situation. I had to install something fancy to unzip it (I think maybe "The Unarchiver", but I don't totally remember) and then load it up in VirtualBox. Once it's installed, there is a search bar next to the start menu that you can use to find the shells. There are two shells, "powershell" and "cmd.exe". You'll want to work in powershell and then test any conclusions you come to in cmd.exe. You can install Ruby for Windows through RubyInstaller, and Git has a Windows version you can download. I also installed their package manager, chocolatey, but don't remember whether I needed to. It was rather difficult to figure out: I think I had to get into powershell, and then there were several scripts they talked about. I think I just tried 30 variations on them until I found the one that worked.
- Do a search in the various SiB repositories (especially Atom, Sublime, and the main one) for encodings to see what other users have reported.
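For the point in the list above about how Ruby combines strings with different encodings, here is a small sketch of the behaviour as I understand it: ASCII-only content combines freely, while non-ASCII bytes under incompatible labels raise `Encoding::CompatibilityError`. The `try_concat` helper is hypothetical and exists only to make the two outcomes easy to compare.

```ruby
utf8              = "caf\u00E9"    # UTF-8, contains a non-ASCII character
binary            = "\xC3\xA9".b   # ASCII-8BIT, contains non-ASCII bytes
ascii_only_binary = "abc".b        # ASCII-8BIT, but only ASCII bytes

# Hypothetical helper: return the encoding of the combined string, or
# :incompatible when Ruby refuses to combine the two.
def try_concat(a, b)
  (a + b).encoding
rescue Encoding::CompatibilityError
  :incompatible
end

p try_concat(utf8, ascii_only_binary) # ASCII-only side adopts UTF-8
p try_concat(utf8, binary)            # :incompatible
```

This is why a mislabelled string can appear to work for a long time and then fail on the first non-ASCII character.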
I wish I didn't have to do this. I'd much rather SiB be correct than "usually work for most people", but right now it fails far too often and I don't know what correct even is.