'charmap' codec can't encode characters in position 0-2: character maps to <undefined> #313

Closed
Wonder-donbury opened this issue Feb 3, 2025 · 1 comment

Comments

@Wonder-donbury

I was testing the CLI on Korean PDF documents and got the following error:

PS C:\Users\donghwan.lee\Documents\markitdown> markitdown test.pdf > document.md
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Scripts\markitdown.exe\__main__.py", line 7, in <module>
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Lib\site-packages\markitdown\__main__.py", line 43, in main
    print(result.text_content)
  File "C:\Users\donghwan.lee\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

I have also tried the workaround of setting PYTHONIOENCODING=utf-8, but it doesn't work.
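For reference, the same error can be reproduced without markitdown. This is a minimal sketch, assuming the converted text begins with three Hangul characters (the sample string is made up, not taken from test.pdf). It shows that cp1252 has no mapping for Hangul, while UTF-8 encodes the same text without trouble.

```python
# Made-up sample text: three Hangul characters at positions 0, 1 and 2.
text = "한국어"

error = None
try:
    # cp1252 is the codec named in the traceback; it cannot represent Hangul.
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)  # the message is plain ASCII, so printing it is safe

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```

So the failure comes from the encoding Python picks for redirected stdout, not from the PDF conversion itself.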

@Linos1391

Try -o instead of > (markitdown test.pdf -o document.md). I think this was fixed in #116.
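A rough sketch of why writing the file directly would sidestep the problem, assuming -o opens the output file with an explicit UTF-8 encoding instead of printing to stdout (I have not checked how #116 implements it). The helper name save_markdown and the sample text are hypothetical and only for illustration.

```python
import tempfile
from pathlib import Path


def save_markdown(text: str, path: Path) -> None:
    """Hypothetical helper: write text as UTF-8, independent of the console code page."""
    path.write_text(text, encoding="utf-8")


# Made-up sample standing in for the converted document text.
sample = "한국어 문서"

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "document.md"
    save_markdown(sample, out_path)
    # Read the file back with the same encoding to confirm nothing was lost.
    round_trip = out_path.read_text(encoding="utf-8")

print(round_trip == sample)
```

Because the encoding is passed explicitly, the result does not depend on the code page that Python assigns to a redirected stdout.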
