This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url #49

Open
Hieunohair opened this issue May 10, 2023 · 1 comment

Comments

@Hieunohair

When I execute:
python -m cc_net --dump 2019-13

Here is the full log of the error:

2023-05-10 08:56 INFO 259781:cc_net.jsonql - preparing [<cc_net.minify.MetadataFetcher object at 0x7f6b262a5d60>, <cc_net.jsonql.where object at 0x7f6b262a5b20>, <cc_net.jsonql.where object at 0x7f6b262a5d30>]
2023-05-10 08:56 INFO 259781:cc_net.jsonql - Opening /tmp/wet_2019-09.paths.gz with mode 'rt'
2023-05-10 08:56 INFO 259781:root - Starting download of https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz
/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py:1102: UserWarning: Swallowed error 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz while downloading https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz (1 out of 3)
  warnings.warn(
/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py:1102: UserWarning: Swallowed error 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz while downloading https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz (2 out of 3)
  warnings.warn(
2023-05-10 08:57 INFO 259781:split - Processed 0 documents in 0.017h (  0.0 doc/s).
2023-05-10 08:57 INFO 259781:split - Found 0 splits.
2023-05-10 08:57 INFO 259781:MetadataFetcher - Processed 0 documents in 0.017h (  0.0 doc/s).
2023-05-10 08:57 INFO 259781:MetadataFetcher - Read 0, stocking 0 doc in 0.1g.
2023-05-10 08:57 INFO 259781:where - Selected 0 documents out of 0 ( 0.0%)
2023-05-10 08:57 INFO 259781:where - Selected 0 documents out of 0 ( 0.0%)
submitit ERROR (2023-05-10 08:57:54,174) - Submitted job triggered an exception
2023-05-10 08:57 ERROR 259781:submitit - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 72, in submitit_main
    process_job(args.folder)
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 65, in process_job
    raise error
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/submission.py", line 54, in process_job
    result = delayed.result()
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/submitit/core/utils.py", line 133, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/mine.py", line 432, in _mine_shard
    jsonql.run_pipes(
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 455, in run_pipes
    write_jsons(data, output)
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 496, in write_jsons
    for res in source:
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 284, in map
    for x in source:
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 277, in map
    for x in source:
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/process_wet_file.py", line 206, in __iter__
    for doc in parse_warc_file(self.open_segment(segment), self.min_len):
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/process_wet_file.py", line 199, in open_segment
    return jsonql.open_remote_file(url, cache=file)
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1124, in open_remote_file
    raw_bytes = request_get_content(url)
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1101, in request_get_content
    raise e
  File "/home/admin1/Documents/hieu/Code/ccnet/cc_net/cc_net/jsonql.py", line 1095, in request_get_content
    r.raise_for_status()
  File "/home/admin1/miniconda3/envs/ccnetpy38/lib/python3.8/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://data.commoncrawl.org/crawl-data/CC-MAIN-2019-09/segments/1550247479159.2/wet/CC-MAIN-20190215204316-20190215230316-00200.warc.wet.gz
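The log shows `request_get_content` retrying the download ("1 out of 3", "2 out of 3") and then re-raising the 503 on the third attempt. A 503 Service Unavailable generally signals a temporary server-side condition, so one possible workaround is to retry more times with longer, randomized delays. The stdlib-only snippet below is a minimal sketch of exponential backoff with full jitter. The names `retry_with_backoff` and `is_retryable`, the set of retryable status codes, and the default attempt and delay values are all hypothetical choices for illustration, not part of cc_net or requests.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these status codes are treated as transient and worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_retryable(exc: Exception) -> bool:
    """Return True if `exc` carries an HTTP response with a transient status.

    requests' HTTPError (raised by raise_for_status) exposes the failed
    response as `exc.response`; exceptions without one are not retried.
    """
    response = getattr(exc, "response", None)
    return getattr(response, "status_code", None) in RETRYABLE_STATUS


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 2.0,
    max_delay: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds, sleeping between failed attempts.

    Re-raises the last error when it is not retryable or when all
    `attempts` have been used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            # The delay cap doubles each attempt (2s, 4s, 8s, ...) up to max_delay;
            # full jitter sleeps a random time in [0, cap] so clients spread out.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

To use it with requests, `fn` could be a zero-argument closure that calls `requests.get(url, timeout=60)`, then `raise_for_status()`, and returns `.content`. The `HTTPError` this raises carries the response, which is what `is_retryable` inspects. Hooking this into cc_net would mean modifying `jsonql.request_get_content` locally; this sketch does not attempt that.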
@namespace-Pt

@Hieunohair Did you solve it? I have the same problem.
