ValueError: md5 for D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt is 48f993223d568afedc2893f7cd76719c, expected 68fb709f2a556b132b4915f2b3893ce7 #1451

Open
YuhuYang opened this issue Jan 27, 2025 · 1 comment

Describe the bug
When I download the 'zh-hans' model, I get the following error:

To Reproduce
Steps to reproduce the behavior:
2025-01-27 23:16:08 INFO: Checking for updates to resources.json in case models have been updated. Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://huggingface.co/stanfordnlp/stanza-zh-hans/resolve/v1.10.0/models/tokenize/gsdsimp.pt: 100%|█| 1.38M/1.38
Traceback (most recent call last):
  File "d:\papers_2\cooperation\华语树库\stanza_parse.py", line 4, in <module>
    nlp = stanza.Pipeline(lang='zh-hans')
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\pipeline\core.py", line 252, in __init__
    download_models(download_list,
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\resources\common.py", line 532, in download_models
    request_file(
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\resources\common.py", line 159, in request_file
    assert_file_exists(path, md5, alternate_md5)
  File "D:\Anaconda\envs\stanza\lib\site-packages\stanza\resources\common.py", line 112, in assert_file_exists
    raise ValueError("md5 for %s is %s, expected %s" % (path, file_md5, md5))
ValueError: md5 for D:\Anaconda\envs\stanza\Lib\site-packages\stanza\stanza_resources\zh-hans\tokenize\gsdsimp.pt is 48f993223d568afedc2893f7cd76719c, expected 68fb709f2a556b132b4915f2b3893ce7
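
The check that fails in the traceback compares the MD5 of the downloaded file against an expected value. A minimal sketch for repeating that comparison by hand, using only the standard library, might look like the following. The helper names `file_md5` and `matches_expected` are hypothetical and are not part of stanza; this is not stanza's own implementation.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Return (actual_md5, ok), where ok is True if the digests agree."""
    actual = file_md5(path)
    return actual, actual.lower() == expected_md5.strip().lower()


# Hypothetical usage with the values from the error message above:
#   actual, ok = matches_expected(
#       r"D:\Anaconda\envs\stanza\Lib\site-packages\stanza"
#       r"\stanza_resources\zh-hans\tokenize\gsdsimp.pt",
#       "68fb709f2a556b132b4915f2b3893ce7",
#   )
#   print(actual, ok)
```

If the digest printed here equals the "is" value from the error on every fresh download, the file itself is probably arriving intact and the expected value is the part that is out of date.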

Expected behavior
No error at the end of download.

Environment (please complete the following information):

  • OS: Windows
  • Python version: Python 3.9.21 from Anaconda
  • Stanza version: 1.10.1
@YuhuYang YuhuYang added the bug label Jan 27, 2025

mil7 commented Jan 29, 2025

Hi there,

We see exactly the same error with stanza version 1.10.1, `resources_1.10.0.json`, and language "de". In our case, we solved the issue by invalidating the cache of our Artifactory (which our company expects us to pull our dependencies from).

Something you could check:

When I compared our cached copy of the JSON with the most recent version, I noticed that the md5 hashes differed.
Unfortunately, for your language the md5 hash looks the same. But that might just be because `resources_1.10.0.json` has been updated several times without a version change, which makes it hard for caching systems to stay up to date.
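
For anyone who wants to repeat that comparison, here is a rough sketch that lists every entry whose md5 differs between two copies of the resources JSON (for example, one served by a cache and one fetched directly). It walks the JSON generically rather than relying on a particular layout; the function names and the file names in the usage comment are hypothetical.

```python
import json


def load_json(path):
    """Load a JSON file from disk as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def collect_md5(node, prefix=()):
    """Map each key path to its 'md5' string, for every nested dict that has one."""
    found = {}
    if isinstance(node, dict):
        md5 = node.get("md5")
        if isinstance(md5, str):
            found[prefix] = md5
        for key, value in node.items():
            if isinstance(value, dict):
                found.update(collect_md5(value, prefix + (key,)))
    return found


def diff_md5(cached, fresh):
    """Return {key_path: (cached_md5, fresh_md5)} for entries that disagree."""
    old, new = collect_md5(cached), collect_md5(fresh)
    return {
        path: (old[path], new[path])
        for path in old.keys() & new.keys()
        if old[path] != new[path]
    }


# Hypothetical usage (both file names are placeholders):
#   changed = diff_md5(load_json("resources_cached.json"),
#                      load_json("resources_fresh.json"))
#   for path, (old, new) in sorted(changed.items()):
#       print("/".join(path), old, "->", new)
```

An empty result means both copies agree on every hash they share, in which case a stale JSON is unlikely to be the cause.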

Is this a known bug?
