This repository has been archived by the owner on Sep 24, 2024. It is now read-only.
raise NonMatchingChecksumError(resource.url, tmp_path)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.123cf34cfc6e49cf974127ed04f99bb3/uc, has wrong checksum.
This might indicate:
* The website may be down (e.g. returned a 503 status code). Please check the url.
* For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
* The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
* If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
Now, the first indication is false, since I manually checked the site and it's working.
Since the arxiv dataset is on Google Drive, the second possibility seemed likely, so I manually downloaded the dataset to the instance using:
After this I extracted the zip file and placed the extracted folders into tensorflow_datasets/downloads/extracted as well as tensorflow_datasets/downloads/manual, in the hope that it would work.
But I still get the same error: the paths in the error message are temp files, generated on the fly during the download, so I can't simply place the zip or the extracted directory at that path.
I know I could probably treat this as a custom dataset, but I would like to avoid that if possible. Is there a way to manually set the data to a desired path and then continue from there?
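One option that may help here, depending on the tensorflow_datasets version, is to tell TFDS to record the checksums it observes instead of comparing against the stale ones shipped with the builder. A sketch, not tested against this exact setup:

```python
import tensorflow_datasets as tfds

# register_checksums=True makes TFDS record the checksums of whatever it
# downloads, rather than failing when they differ from the recorded ones.
config = tfds.download.DownloadConfig(register_checksums=True)
builder = tfds.builder("scientific_papers/arxiv")
builder.download_and_prepare(download_config=config)
```

If this prepares the dataset once into ~/tensorflow_datasets, subsequent runs of the training script should find the generated files and skip the download entirely.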
Detailed Error
WARNING:tensorflow:From pegasus/bin/train.py:95: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
WARNING:tensorflow:From /home/devilunraveled/pegasus/pegasus/ops/public_parsing_ops.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
W1010 13:06:03.250727 139865476806464 module_wrapper.py:139] From /home/devilunraveled/pegasus/pegasus/ops/public_parsing_ops.py:92: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.
WARNING:tensorflow:From /home/devilunraveled/pegasus/pegasus/params/estimator_utils.py:50: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
W1010 13:06:03.507730 139865476806464 module_wrapper.py:139] From /home/devilunraveled/pegasus/pegasus/params/estimator_utils.py:50: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f3473f112f0>) includes params argument, but params are not passed to Estimator.
W1010 13:06:03.508445 139865476806464 estimator.py:1994] Estimator's model_fn (<function _estimator_model_fn.<locals>.model_fn at 0x7f3473f112f0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Using config: {'_model_dir': 'ckpt/pegasus_ckpt/arxiv', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3473f0f550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
I1010 13:06:03.509111 139865476806464 estimator.py:212] Using config: {'_model_dir': 'ckpt/pegasus_ckpt/arxiv', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': 1000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3473f0f550>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=None, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None, eval_training_input_configuration=2, experimental_host_call_every_n_steps=1), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
I1010 13:06:03.509505 139865476806464 tpu_context.py:220] _TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
W1010 13:06:03.509690 139865476806464 tpu_context.py:222] eval_on_tpu ignored because use_tpu is False.
WARNING:tensorflow:From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W1010 13:06:03.517121 139865476806464 deprecation.py:506] From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W1010 13:06:03.517755 139865476806464 deprecation.py:323] From /home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I1010 13:06:03.759901 139865476806464 dataset_info.py:430] Load pre-computed DatasetInfo (eg: splits, num examples,...) from GCS: scientific_papers/arxiv/1.1.1
I1010 13:06:03.972944 139865476806464 dataset_info.py:361] Load dataset info from /tmp/tmps2963og0tfds
I1010 13:06:03.975173 139865476806464 dataset_info.py:401] Field info.description from disk and from code do not match. Keeping the one from code.
I1010 13:06:03.975362 139865476806464 dataset_info.py:401] Field info.citation from disk and from code do not match. Keeping the one from code.
I1010 13:06:03.975821 139865476806464 dataset_builder.py:334] Generating dataset scientific_papers (/home/devilunraveled/tensorflow_datasets/scientific_papers/arxiv/1.1.1)
Downloading and preparing dataset scientific_papers/arxiv/1.1.1 (download: 4.20 GiB, generated: 7.07 GiB, total: 11.27 GiB) to /home/devilunraveled/tensorflow_datasets/scientific_papers/arxiv/1.1.1...
Dl Completed...: 0 url [00:00, ? url/s] I1010 13:06:04.215519 139865476806464 download_manager.py:301] Downloading https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download into /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3...
Dl Completed...: 0%| I1010 13:06:04.218434 139865476806464 download_manager.py:301] Downloading https://drive.google.com/uc?id=1lvsqvsFi3W-pE1SqNZI0s8NR9rC1tsja&export=download into /home/devilunraveled/tensorflow_datasets/downloads/ucid_1lvsqvsFi3W-pE1SqNZI0s8NR_export_downloadY_jZrsD4nW0oeCgmL5TDLaYprWnpe0-DuXkeCnmmgwQ.tmp.f7946d585e11408bb56df702b6149b6d...
Dl Completed...: 0%| /usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'drive.google.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
Extraction completed...: 0 file [00:00, ? file/s]███████████████| 2/2 [00:00<00:00, 1.46 url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Dl Completed...: 100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 2.86 url/s]
ERROR:tensorflow:Error recorded from training_loop: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
* The website may be down (e.g. returned a 503 status code). Please check the url.
* For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
* The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
* If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
E1010 13:06:04.912575 139865476806464 error_handling.py:75] Error recorded from training_loop: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
* The website may be down (e.g. returned a 503 status code). Please check the url.
* For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
* The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
* If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
INFO:tensorflow:training_loop marked as finished
I1010 13:06:04.912781 139865476806464 error_handling.py:101] training_loop marked as finished
WARNING:tensorflow:Reraising captured error
W1010 13:06:04.912866 139865476806464 error_handling.py:135] Reraising captured error
Traceback (most recent call last):
File "pegasus/bin/train.py", line 95, in <module>
tf.app.run(main)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "pegasus/bin/train.py", line 90, in main
max_steps=train_steps)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3035, in train
rendezvous.raise_errors()
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 136, in raise_errors
six.reraise(typ, value, traceback)
File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 3030, in train
saving_listeners=saving_listeners)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 370, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1161, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1188, in _train_model_default
input_fn, ModeKeys.TRAIN))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1025, in _get_features_and_labels_from_input_fn
self._call_input_fn(input_fn, mode))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2987, in _call_input_fn
return input_fn(**kwargs)
File "/home/devilunraveled/pegasus/pegasus/data/infeed.py", line 41, in input_fn
dataset = all_datasets.get_dataset(input_pattern, training)
File "/home/devilunraveled/pegasus/pegasus/data/all_datasets.py", line 51, in get_dataset
dataset, _ = builder.build(input_pattern, shuffle_files)
File "/home/devilunraveled/pegasus/pegasus/data/datasets.py", line 199, in build
dataset, num_examples = self.load(build_name, split, shuffle_files)
File "/home/devilunraveled/pegasus/pegasus/data/datasets.py", line 157, in load
data_dir=self.data_dir)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 53, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/registered.py", line 339, in load
dbuilder.download_and_prepare(**download_and_prepare_kwargs)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/api_utils.py", line 53, in disallow_positional_args_dec
return fn(*args, **kwargs)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 364, in download_and_prepare
download_config=download_config)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 1072, in _download_and_prepare
max_examples_per_split=download_config.max_examples_per_split,
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/dataset_builder.py", line 933, in _download_and_prepare
dl_manager, **split_generators_kwargs):
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/summarization/scientific_papers.py", line 112, in _split_generators
dl_paths = dl_manager.download_and_extract(_URLS)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 419, in download_and_extract
return _map_promise(self._download_extract, url_or_urls)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 462, in _map_promise
res = utils.map_nested(_wait_on_promise, all_promises)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 147, in map_nested
for k, v in data_struct.items()
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 147, in <dictcomp>
for k, v in data_struct.items()
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/utils/py_utils.py", line 161, in map_nested
return function(data_struct)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 446, in _wait_on_promise
return p.get()
File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 512, in get
return self._target_settled_value(_raise=True)
File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 516, in _target_settled_value
return self._target()._settled_value(_raise)
File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 226, in _settled_value
reraise(type(raise_val), raise_val, self._traceback)
File "/usr/local/lib/python3.7/dist-packages/six.py", line 703, in reraise
raise value
File "/usr/local/lib/python3.7/dist-packages/promise/promise.py", line 87, in try_catch
return (handler(*args, **kwargs), None)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 306, in callback
resource, download_dir_path, checksum, dl_size)
File "/home/devilunraveled/.local/lib/python3.7/site-packages/tensorflow_datasets/core/download/download_manager.py", line 261, in _handle_download_result
raise NonMatchingChecksumError(resource.url, tmp_path)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact https://drive.google.com/uc?id=1b3rmCSIoh6VhD4HKWjI4HOW-cSwcwbeC&export=download, downloaded to /home/devilunraveled/tensorflow_datasets/downloads/ucid_1b3rmCSIoh6VhD4H-cSwcwbeC_export_downloadwN6uevfZyH8l3632IfcSb3CNfcrG01PHVkiDCEoAAHY.tmp.b1856425a2844ecb95292d15954caef3/uc, has wrong checksum. This might indicate:
* The website may be down (e.g. returned a 503 status code). Please check the url.
* For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See https://github.com/tensorflow/datasets/issues/1482
* The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
* If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally
No, I did not find a resolution. My guess is that I'd need to change the dataset files themselves, since the NonMatchingChecksumError will not go away unless you modify the loading code itself. I eventually just used the model from Hugging Face. If you can do that, I recommend it, because this repository is probably not being maintained.
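For completeness, loading the equivalent checkpoint from Hugging Face looks roughly like the following, assuming the `google/pegasus-arxiv` checkpoint and an installed `transformers` library (a sketch, not run here):

```python
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-arxiv"  # Pegasus fine-tuned on the arxiv dataset
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

text = "..."  # paper body to summarize
batch = tokenizer(text, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True))
```

This sidesteps the TFDS download path entirely, since the checkpoint and tokenizer come from the Hugging Face hub.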
TL;DR:
NonMatchingChecksumError when trying to pre-train the model.
Detailed Issue:
I created a Google Cloud instance and followed the steps to download the repository and the checkpoint files.
Since I was using a VM, requirements.txt installed successfully.
However, after running the command for pre-training on the arxiv dataset, I get a long error message, ending with the NonMatchingChecksumError shown above.