TextIOWrapper in __init__ is not constructed with UTF-8 encoding on Windows #2321

Open
Mrcubix opened this issue Oct 15, 2024 · 2 comments


@Mrcubix

Mrcubix commented Oct 15, 2024

Description

I'm currently using this tool to beautify JS files downloaded from specific websites.
Some scripts contain non-ASCII Unicode characters (Chinese comments, in my case), which aren't handled properly.
The tool does not seem to open either a specified input file or stdin with UTF-8 encoding when necessary, which leads to charmap codec errors.
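To illustrate where the "charmap" wording comes from: when no encoding is given, Python on Windows typically falls back to the legacy ANSI code page (on many Western-locale systems this is cp1252), and Python implements such code pages with its charmap codec. The sketch below is an illustration under that assumption, not code from js-beautify; the helper name describe_encode is hypothetical.

```python
# Illustration only (not js-beautify code): why a text stream using a legacy
# Windows code page produces "charmap" errors for Chinese text.
COMMENT = "// 这是一个注释"  # a JavaScript comment containing Chinese characters


def describe_encode(text: str, encoding: str) -> str:
    """Try to encode `text`; report the codec error instead of raising it."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Code pages such as cp1252 are implemented by Python's charmap
        # codec, so the error names 'charmap' rather than 'cp1252'.
        return f"{exc.encoding!r} codec: {exc.reason}"
    return "ok"


if __name__ == "__main__":
    for enc in ("cp1252", "utf-8"):  # cp1252: a common Windows ANSI code page
        print(enc, "->", describe_encode(COMMENT, enc))
```

Decoding goes wrong in a similar way: UTF-8 bytes read as cp1252 either raise the same charmap error or silently turn into mojibake, depending on which bytes appear.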

write_beautified_output and process_files are both affected.
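A minimal sketch of UTF-8-explicit input and output, assuming "-" stands for stdin/stdout. read_source and write_output are hypothetical helpers, not the actual write_beautified_output or process_files; they only show the pattern of passing encoding="utf-8" both to open() and to an io.TextIOWrapper built around the raw byte streams, so the platform locale no longer matters.

```python
import io
import sys
from typing import BinaryIO, Optional


def read_source(path: str, stdin: Optional[BinaryIO] = None) -> str:
    """Read JS source as UTF-8 from a file, or from stdin when path is '-'."""
    if path == "-":
        raw = stdin if stdin is not None else sys.stdin.buffer
        # Decode the raw bytes as UTF-8 regardless of the Windows code page.
        wrapper = io.TextIOWrapper(raw, encoding="utf-8")
        try:
            return wrapper.read()
        finally:
            wrapper.detach()  # leave the underlying byte stream open
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_output(text: str, path: str, stdout: Optional[BinaryIO] = None) -> None:
    """Write beautified text as UTF-8 to a file, or to stdout when path is '-'."""
    if path == "-":
        raw = stdout if stdout is not None else sys.stdout.buffer
        wrapper = io.TextIOWrapper(raw, encoding="utf-8")
        try:
            wrapper.write(text)
            wrapper.flush()
        finally:
            wrapper.detach()  # leave the underlying byte stream open
        return
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Calling detach() keeps the temporary wrapper from closing sys.stdin.buffer or sys.stdout.buffer when it is garbage-collected.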

Separately, and out of scope for this issue: if the input is opened with UTF-8 encoding, then at column 65096 a string is, for some reason, split onto a new line, and spaces are inserted into ids and classes, breaking a large portion of the beautified output.

Input

The code looked like this before beautification:

see https://drunkdeer-antler.com/js/app.d3cb498c.js

Expected Output

The code should have looked like this after beautification:

see https://gist.github.com/Mrcubix/354ca746a0053a6dc1653b8b3b34583b

Actual Output

If the input is opened with UTF-8:

see https://gist.github.com/Mrcubix/741b9568f84f9c3cbf3f699a1dff0119

Steps to Reproduce

  1. Install js-beautify via pip
  2. Do the following:
more ./app.d3cb498c.js | js-beautify -o app.d3cb498c.beautified.js

Environment

OS: Windows
Python: 3.10.4

Settings

Default

Mrcubix changed the title "TextIOWrapper in __init__ is not constructed with UTF-8 encoding" to "TextIOWrapper in __init__ is not constructed with UTF-8 encoding on Windows" on Oct 15, 2024
@bitwiseman
Member

@Mrcubix
Have you tried this with the Node.js version? I assume the problem doesn't occur there.
It is likely that all of this is historical: code intended to handle Unicode issues on Python 2.7. We need to remove all the code around these corner cases.
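If the Python 2.7 compatibility code is removed, one Python-3-only option (available since Python 3.7) is TextIOWrapper.reconfigure(), which switches an existing text stream such as sys.stdin to UTF-8 in place instead of building a new wrapper. A minimal sketch; force_utf8 is a hypothetical helper name, not part of js-beautify.

```python
import io


def force_utf8(stream: io.TextIOWrapper) -> io.TextIOWrapper:
    """Switch a text stream to UTF-8 in place (Python 3.7+).

    The encoding can only be changed before any data has been read from
    the stream, so this belongs at program startup.
    """
    stream.reconfigure(encoding="utf-8")
    return stream
```

At startup this would be called as force_utf8(sys.stdin) and force_utf8(sys.stdout), before anything is read or written. On the user side, setting PYTHONUTF8=1 (Python's UTF-8 Mode, 3.7+) changes the same defaults without any code change.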

@Mrcubix
Author

Mrcubix commented Nov 1, 2024

The Node.js version doesn't have this exact issue, but it does have the same formatting problem mentioned above (which is out of scope for this issue):

if the input is opened with the UTF-8 encoding, then at column 65096, a string for some reason is newlined, spaces are inserted in ids & classes, breaking a large portion of the beautifying process.

So I ended up switching to other CLI formatters twice (one of which didn't work) before finding one that actually works.
