This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
2023-05-10 08:56 INFO 259781:cc_net.jsonql - preparing [<cc_net.minify.MetadataFetcher object at 0x7f6b262a5d60>, <cc_net.jsonql.where object at 0x7f6b262a5b20>, <cc_net.jsonql.where object at 0x7f6b262a5d30>]
2023-05-10 08:56 INFO 259781:cc_net.jsonql - Opening /tmp/wet_2019-09.paths.gz with mode 'rt'
2023-05-10 08:56 INFO 259781:root - Starting download of https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz
/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py:1102: UserWarning: Swallowed error 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz while downloading https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz (1 out of 3)
warnings.warn(
/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py:1102: UserWarning: Swallowed error 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz while downloading https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz (2 out of 3)
warnings.warn(
2023-05-10 08:57 INFO 259781:split - Processed 0 documents in 0.017h ( 0.0 doc/s).
2023-05-10 08:57 INFO 259781:split - Found 0 splits.
2023-05-10 08:57 INFO 259781:MetadataFetcher - Processed 0 documents in 0.017h ( 0.0 doc/s).
2023-05-10 08:57 INFO 259781:MetadataFetcher - Read 0, stocking 0 doc in 0.1g.
2023-05-10 08:57 INFO 259781:where - Selected 0 documents out of 0 ( 0.0%)
2023-05-10 08:57 INFO 259781:where - Selected 0 documents out of 0 ( 0.0%)
submitit ERROR (2023-05-10 08:57:54,174) - Submitted job triggered an exception
2023-05-10 08:57 ERROR 259781:submitit - Submitted job triggered an exception
Traceback (most recent call last):
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
submitit_main()
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 72, in submitit_main
process_job(args.folder)
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 65, in process_job
raise error
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 54, in process_job
result = delayed.result()
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/utils.py", line 133, in result
self._result = self.function(*self.args, **self.kwargs)
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/mine.py", line 432, in _mine_shard
jsonql.run_pipes(
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 455, in run_pipes
write_jsons(data, output)
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 496, in write_jsons
for res in source:
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 284, in map
for x in source:
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 277, in map
for x in source:
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/process_wet_file.py", line 206, in __iter__
for doc in parse_warc_file(self.open_segment(segment), self.min_len):
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/process_wet_file.py", line 199, in open_segment
return jsonql.open_remote_file(url, cache=file)
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1124, in open_remote_file
raw_bytes = request_get_content(url)
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1101, in request_get_content
raise e
File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1095, in request_get_content
r.raise_for_status()
File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz
When I execute:
python -m cc_net --dump 2019-13
The full log, including the error, is shown above.
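The warnings in the log show the download being tried three times ("1 out of 3", "2 out of 3") before the 503 is re-raised and the job fails. One possible workaround, sketched here under assumptions rather than taken from cc_net itself, is to retry the request more times with a growing delay between attempts. The helper name `fetch_with_backoff` and its parameters (`max_tries`, `base_delay`, `max_delay`, `sleep`) are hypothetical and not part of cc_net or requests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_tries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch` until it succeeds, doubling the wait after each failure.

    Hypothetical helper: `fetch` is any zero-argument callable that raises
    on failure (for example on an HTTP 503) and returns the payload on success.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    for attempt in range(1, max_tries + 1):
        try:
            return fetch()
        except Exception as err:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_tries:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            print(f"attempt {attempt}/{max_tries} failed ({err}); retrying in {delay:.0f}s")
            sleep(delay)
    raise RuntimeError("unreachable")


# Hypothetical usage with requests (not executed here):
#   import requests
#
#   def get():
#       r = requests.get(url, timeout=60)
#       r.raise_for_status()  # raises requests.exceptions.HTTPError on a 503
#       return r.content
#
#   raw_bytes = fetch_with_backoff(get)
```

The `sleep` argument is injectable only so the helper can be exercised without real waiting. Whether longer waits are enough depends on how long the server keeps returning 503, which the log above does not tell us.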