UnicodeDecodeError: 'utf-8' codec can't decode byte #5

Open
raymondlowe opened this issue Aug 19, 2024 · 1 comment

Comments

@raymondlowe

I tried to add some (very old) static HTML files using /add filename.html and got this:

You: /add index.html
✅ Added 1 files to knowledge!


You: /add about.html
📂 Files currently in memory: index.html
Traceback (most recent call last):
  File "C:\Users\REDACTED\REDACTED.com\main.py", line 746, in <module>
    asyncio.run(main())
  File "C:\Program Files\Python311\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\REDACTED\REDACTED.com\main.py", line 675, in main
    default_chat_history = await handle_add_command(default_chat_history, *filepaths)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\REDACTED\REDACTED.com\main.py", line 283, in handle_add_command
    content = read_file_content(path)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\REDACTED\REDACTED.com\main.py", line 241, in read_file_content
    return file.read()
           ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 8493: invalid start byte

Inside a venv, Python 3.11, Windows 11.

Versions:

pip freeze
argcomplete==3.2.3
asttokens==2.4.1
attrs==23.2.0
beautifulsoup4==4.12.2
cattrs==23.2.3
certifi==2023.7.22
charset-normalizer==3.3.0
click==8.1.7
colorama==0.4.6
comm==0.2.2
debugpy==1.8.1
decorator==5.1.1
executing==2.0.1
filelock==3.12.4
idna==3.4
ipykernel==6.29.4
ipython==8.24.0
jedi==0.19.1
jupyter_client==8.6.1
jupyter_core==5.7.2
lxml==4.9.3
matplotlib-inline==0.1.7
nest-asyncio==1.6.0
numpy==1.26.0
packaging==24.0
pandas==2.1.1
parso==0.8.4
pipx==1.5.0
platformdirs==4.2.0
prompt-toolkit==3.0.43
psutil==5.9.8
pure-eval==0.2.2
Pygments==2.18.0
python-dateutil==2.8.2
pytz==2023.3.post1
pywin32==306
pyzmq==26.0.3
requests==2.31.0
requests-cache==1.2.0
requests-file==1.5.1
six==1.16.0
soupsieve==2.5
stack-data==0.6.3
tldextract==3.6.0
tornado==6.4
traitlets==5.14.3
typing_extensions==4.11.0
tzdata==2023.3
url-normalize==1.4.3
urllib3==2.0.6
userpath==1.9.2
wcwidth==0.2.13
@AllyourBaseBelongToUs


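That's because your HTML contains a byte that is not valid UTF-8. Byte 0xa9 is the copyright sign (©) in Latin-1 and Windows-1252, so the file was most likely saved in one of those legacy encodings rather than in UTF-8, which would be typical for very old static pages.

You can either update the read_file_content function to handle or filter out such bytes, or convert your HTML to UTF-8 with a tool that can do that.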
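
For the first option, here is a minimal sketch of what a more tolerant read_file_content could look like. The real function in main.py isn't shown in this thread, so this assumes it simply opens the path as UTF-8 text. The FALLBACK_ENCODINGS name and the choice of cp1252 as the second attempt are my own assumptions, not something taken from the project.

```python
from pathlib import Path

# Hypothetical fallback order (an assumption, not from the project's main.py):
# try UTF-8 first, then Windows-1252, which is common in old static HTML.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def read_file_content(path, encodings=FALLBACK_ENCODINGS):
    """Return the text of `path`, trying each encoding in turn."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            # This encoding cannot represent the bytes; try the next one.
            continue
    # Last resort: decode as UTF-8 and replace undecodable bytes with U+FFFD
    # so the /add command does not crash on a single bad byte.
    return data.decode("utf-8", errors="replace")
```

With this, a file containing the byte 0xa9 would come back with a © character instead of raising UnicodeDecodeError. Guessing the encoding can still produce wrong characters for files that are neither UTF-8 nor Windows-1252, so converting the files to UTF-8 once is the more reliable fix.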
