SpaceNet8 Download broken #2366
Comments
Nevermind, I just need to learn how to use |
I'm not able to reproduce the exact error message (the download "succeeds" for me), but the downloaded file is corrupted, and tar crashes instead:
> python3
>>> from torchgeo.datasets import SpaceNet8
>>> ds = SpaceNet8(root="data", split="train", download=True)
download: s3://spacenet-dataset/spacenet/SN8_floods/tarballs/Germany_Training_Public.tar.gz to data/SN8_floods/train/Germany_Training_Public.tar.gz
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/Adam/torchgeo/torchgeo/datasets/spacenet.py", line 146, in __init__
    self._verify()
  File "/Users/Adam/torchgeo/torchgeo/datasets/spacenet.py", line 336, in _verify
    extract_archive(os.path.join(root, tarball), root)
  File "/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 374, in extract_archive
    extractor(from_path, to_path, compression)
  File "/Users/Adam/spack/var/spack/environments/default/.spack-env/view/lib/python3.11/site-packages/torchvision/datasets/utils.py", line 220, in _extract_tar
    tar.extractall(to_path)
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2265, in extractall
    self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2328, in _extract_one
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2411, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 2465, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/tarfile.py", line 252, in copyfileobj
    buf = src.read(bufsize)
          ^^^^^^^^^^^^^^^^^
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/gzip.py", line 301, in read
    return self._buffer.read(size)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Adam/spack/opt/spack/darwin-sequoia-m2/apple-clang-16.0.0/python-3.11.9-miamin5zo2vhkrb22ej7xpjqlcjsuugs/lib/python3.11/gzip.py", line 518, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
The checksum is indeed different. However, when I download the file outside of TorchGeo, I don't see this issue:
> aws s3 cp s3://spacenet-dataset/spacenet/SN8_floods/tarballs/Germany_Training_Public.tar.gz .
> md5 Germany_Training_Public.tar.gz
MD5 (Germany_Training_Public.tar.gz) = 5f1c9ac3ea94f2909da593d894680ea2
> tar xzf Germany_Training_Public.tar.gz
Unclear if this is a transient issue or something else.
P.S. I think I still have SN8 (and all other versions) downloaded on our AI4EO server if you need it immediately. |
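The checksum comparison described above can also be done from Python, which is handy when the md5 command is not available. This is only a minimal sketch: the helper names md5_of_file and matches_known_good are made up for illustration and are not part of TorchGeo, and the expected digest is simply the one reported above for the copy fetched directly with aws s3 cp.

```python
import hashlib
from pathlib import Path

# MD5 reported in this thread for a copy of Germany_Training_Public.tar.gz
# that was fetched directly with `aws s3 cp` and extracted without errors.
KNOWN_GOOD_MD5 = "5f1c9ac3ea94f2909da593d894680ea2"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path, expected=KNOWN_GOOD_MD5):
    """Return True if the file exists and its MD5 equals the expected digest."""
    p = Path(path)
    return p.is_file() and md5_of_file(p) == expected
```

For example, matches_known_good("data/SN8_floods/train/Germany_Training_Public.tar.gz") should return False for the corrupted copy shown in the log above and True for the copy downloaded outside of TorchGeo.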
Also, the lead on SN8 was Ronny Haensch from DLR. I have an email thread with him asking about the SN8 AOIs if you want me to ping him on this. But I think we need to get to the bottom of why it isn't working inside TorchGeo first. |
You are right, the corrupted download also happens for the "test" split. I wanted to download the dataset so that I can add a datamodule for SpaceNet 6 and 8. SpaceNet 6 downloads fine with no errors. |
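To check several tarballs (for example the "train" and "test" splits) without extracting them, one option is to decompress each file to the end and see whether Python raises the same EOFError as in the traceback above. The sketch below is an assumption about how one might do that and is not taken from TorchGeo; gzip_stream_is_complete is a hypothetical helper name.

```python
import gzip
import os
import zlib


def gzip_stream_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to the end.

    A truncated download raises EOFError while reading, a damaged one
    usually raises gzip.BadGzipFile (an OSError) or zlib.error. A missing
    or empty file is also reported as incomplete.
    """
    try:
        if os.path.getsize(path) == 0:
            return False
        with gzip.open(path, "rb") as f:
            # Discard the decompressed data; only the error state matters.
            while f.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

This only validates the gzip layer, so it complements the MD5 comparison rather than replacing it.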
Description
Steps to reproduce
Or potentially, I also need to configure something else? I do have aws-cli installed.
Version
0.7.0.dev0