This repository has been archived by the owner on Sep 24, 2024. It is now read-only.

CheckSumError #245

Open
devilunraveled opened this issue Oct 10, 2023 · 3 comments

@devilunraveled

TL;DR:
I get a NonMatchingChecksumError when trying to pre-train the model.

Detailed Issue:
I created a Google Cloud instance and followed the steps to download the repository as well as the checkpoint files.

Since I used a VM, the packages in requirements.txt installed successfully.

However, after running the command for pre-training on the arxiv dataset,

python3 pegasus/bin/train.py --params=arxiv_transformer --param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model --train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 --model_dir=ckpt/pegasus_ckpt/arxiv

I get a long error message, ending with:

raise NonMatchingChecksumError(resource.url, tmp_path)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.123cf34cfc6e49cf974127ed04f99bb3/uc, has wrong checksum. 

This might indicate:

 * The website may be down (e.g. returned a 503 status code). Please check the url.
 * For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
 * The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
 * If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally

Now, the first indication is false, since I manually checked the site and it's working.
Since the arXiv dataset is hosted on Google Drive, the second possibility seemed likely, so I manually downloaded the dataset to the instance using

gdown "https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download"

After this I extracted the zip file and placed the extracted folders into the tensorflow_datasets/downloads/extracted folder as well as the tensorflow_datasets/downloads/manual folder, hoping one of them would be picked up.

But I still get the same error. The files specified in the error message are temporary files generated during the download, so I can't simply place the zip or the extracted directory at that path.

I know I can probably try to treat this as a custom dataset, but I would like to avoid that if possible. Is there a way to manually set the data to a desired path and then continue from there?
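
For reference, one workaround I am considering (untested) is to pre-build the dataset once with TFDS so that train.py later finds it already prepared under ~/tensorflow_datasets. A rough sketch, assuming the installed tensorflow_datasets version exposes DownloadConfig(register_checksums=...); the data_dir below just mirrors the default path from the error message:

# Rough sketch: pre-build scientific_papers/arxiv once, re-registering whatever
# checksums Google Drive serves now instead of failing against the stale ones
# bundled with this TFDS release.
import tensorflow_datasets as tfds

builder = tfds.builder(
    "scientific_papers/arxiv",
    data_dir="/home/devilunraveled/tensorflow_datasets",
)
config = tfds.download.DownloadConfig(register_checksums=True)
builder.download_and_prepare(download_config=config)

If that works, rerunning the pre-training command should pick up the prepared dataset instead of re-downloading it.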

Detailed Error

WARNING:tensorflow:From pegasus/bin/train.py:95: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

WARNING:tensorflow:From /home/devilunraveled/pegasus/pegasus/ops/public_parsing_ops.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

W1010 13:06:03.250727 139865476806464 module_wrapper.py:139] From /home/devilunraveled/pegasus/pegasus/ops/public_parsing_ops.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From /home/devilunraveled/pegasus/pegasus/params/estimator_utils.py:50: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W1010 13:06:03.507730 139865476806464 module_wrapper.py:139] From /home/devilunraveled/pegasus/pegasus/params/estimator_utils.py:50: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f3473f112f0>) includes params argument, but params are not passed to Estimator.
W1010 13:06:03.508445 139865476806464 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f3473f112f0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/arxiv', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3473f0f550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I1010 13:06:03.509111 139865476806464 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/arxiv', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3473f0f550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1010 13:06:03.509505 139865476806464 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W1010 13:06:03.509690 139865476806464 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W1010 13:06:03.517121 139865476806464 deprecation.py:506] From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1010 13:06:03.517755 139865476806464 deprecation.py:323] From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I1010 13:06:03.759901 139865476806464 dataset_info.py:430] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: scientific_papers/arxiv/1.1.1
I1010 13:06:03.972944 139865476806464 dataset_info.py:361] Load dataset info from /tmp/tmps2963og0tfds
I1010 13:06:03.975173 139865476806464 dataset_info.py:401] Field info.description from disk and from code do not match. Keeping the one from code.
I1010 13:06:03.975362 139865476806464 dataset_info.py:401] Field info.citation from disk and from code do not match. Keeping the one from code.
I1010 13:06:03.975821 139865476806464 dataset_builder.py:334] Generating dataset scientific_papers (/home/devilunraveled/tensorflow_datasets/scientific_papers/arxiv/1.1.1)
Downloading and preparing dataset scientific_papers/arxiv/1.1.1 (download: 4.20 GiB, generated: 7.07 GiB, total: 11.27 GiB) to /home/devilunraveled/tensorflow_datasets/scientific_papers/arxiv/1.1.1...
Dl Completed...: 0 url [00:00, ? url/s]          I1010 13:06:04.215519 139865476806464 download_manager.py:301] Downloading https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download into /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3...
Dl Completed...:   0%|                           I1010 13:06:04.218434 139865476806464 download_manager.py:301] Downloading https://drive.google.com/uc?id=1lvsqvsFi3W-pE1SqNZI0s8NR9rC1tsja&export=download into /home/devilunraveled/tensorflow_datasets/downloads/ucid_1lvsqvsFi3W-pE1SqNZI0s8NR_export_downloadY_jZrsD4nW0oeCgmL5TDLaYprWnpe0-DuXkeCnmmgwQ.tmp.f7946d585e11408bb56df702b6149b6d...
Dl Completed...:   0%|                           /usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
Extraction completed...: 0 file [00:00, ? file/s]███████████████| 2/2 [00:00<00:00,  1.46 url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Dl Completed...: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00,  2.86 url/s]
ERROR:tensorflow:Error recorded from training_loop: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
 * The website may be down (e.g. returned a 503 status code). Please check the url.
 * For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
 * The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
 * If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally

E1010 13:06:04.912575 139865476806464 error_handling.py:75] Error recorded from training_loop: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
 * The website may be down (e.g. returned a 503 status code). Please check the url.
 * For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
 * The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
 * If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally

INFO:tensorflow:training_loop marked as finished
I1010 13:06:04.912781 139865476806464 error_handling.py:101] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1010 13:06:04.912866 139865476806464 error_handling.py:135] Reraising captured error
Traceback (most recent call last):
  File "pegasus/bin/train.py", line 95, in <module>
    tf.app.run(main)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "pegasus/bin/train.py", line 90, in main
    max_steps=train_steps)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
    input_fn, ModeKeys.TRAIN))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
    self._call_input_fn(input_fn, mode))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2987, in _call_input_fn
    return input_fn(**kwargs)
  File "/home/devilunraveled/pegasus/pegasus/data/infeed.py", line 41, in input_fn
    dataset = all_datasets.get_dataset(input_pattern, training)
  File "/home/devilunraveled/pegasus/pegasus/data/all_datasets.py", line 51, in get_dataset
    dataset, _ = builder.build(input_pattern, shuffle_files)
  File "/home/devilunraveled/pegasus/pegasus/data/datasets.py", line 199, in build
    dataset, num_examples = self.load(build_name, split, shuffle_files)
  File "/home/devilunraveled/pegasus/pegasus/data/datasets.py", line 157, in load
    data_dir=self.data_dir)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 53, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 339, in load
    dbuilder.download_and_prepare(**download_and_prepare_kwargs)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 53, in disallow_positional_args_dec
    return fn(*args, **kwargs)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 364, in download_and_prepare
    download_config=download_config)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1072, in _download_and_prepare
    max_examples_per_split=download_config.max_examples_per_split,
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 933, in _download_and_prepare
    dl_manager, **split_generators_kwargs):
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/summarization/scientific_papers.py", line 112, in _split_generators
    dl_paths = dl_manager.download_and_extract(_URLS)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 419, in download_and_extract
    return _map_promise(self._download_extract, url_or_urls)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 462, in _map_promise
    res = utils.map_nested(_wait_on_promise, all_promises)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 147, in map_nested
    for k, v in data_struct.items()
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 147, in <dictcomp>
    for k, v in data_struct.items()
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 161, in map_nested
    return function(data_struct)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 446, in _wait_on_promise
    return p.get()
  File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 512, in get
    return self._target_settled_value(_raise=True)
  File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 516, in _target_settled_value
    return self._target()._settled_value(_raise)
  File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 226, in _settled_value
    reraise(type(raise_val), raise_val, self._traceback)
  File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 87, in try_catch
    return (handler(*args, **kwargs), None)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 306, in callback
    resource, download_dir_path, checksum, dl_size)
  File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 261, in _handle_download_result
    raise NonMatchingChecksumError(resource.url, tmp_path)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
 * The website may be down (e.g. returned a 503 status code). Please check the url.
 * For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
 * The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
 * If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
@jayasridharmireddi

Hello, did you find a solution for this? I have the same error.

@devilunraveled
Author

No, I did not find a resolution. My guess is that I'd have to change the datasets themselves, since the checksum error will not go away unless you modify the data-loading code itself. I eventually just used the model from Hugging Face; if you can, I recommend doing that, because this repository is probably no longer maintained.
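
For anyone landing here, the Hugging Face route I mean is roughly the following; a minimal sketch, assuming the transformers library (plus sentencepiece) is installed and the google/pegasus-arxiv checkpoint on the Hub is the one you want:

# Minimal sketch: summarize a paper with the fine-tuned PEGASUS arXiv checkpoint.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-arxiv"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text = "We present a new approach to abstractive summarization of scientific papers ..."
batch = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))

This only covers using the released checkpoints, not re-running the pre-training from this repository.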

@jayasridharmireddi

Ok, thank you.
