README.md benchmark dataset code #2069
Comments
Hi @douglasmacdonald, sorry you ran into these issues!
@calebrob6 do we have any contacts we can use to upgrade the default torchgeo version on PC?
I am not able to reproduce this. What version of torchvision are you using? TorchGeo uses torchvision download utils, and torchvision 0.17.1+ switched from requests to gdown for all Google Drive downloads. It may resolve the issue if you delete the file, upgrade to torchvision 0.17.1+, and install gdown.
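A quick way to check both conditions from the comment above (torchvision at 0.17.1 or newer, and gdown installed) is sketched below. This is only an illustration: the helper names `parse_version`, `needs_upgrade`, and `report` are made up for this example and are not part of torchvision or TorchGeo, and the version parsing is deliberately simplistic.

```python
from importlib import metadata, util
from itertools import takewhile


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '0.17.1+cu121' into (0, 17, 1).

    Hypothetical helper: drops any local suffix after '+' and stops at the
    first component that does not start with a digit (for example 'dev0').
    """
    public = version.split("+")[0]
    parts = []
    for piece in public.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(installed: str, minimum: str = "0.17.1") -> bool:
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def report() -> str:
    """Summarize the local torchvision version and whether gdown is importable."""
    try:
        installed = metadata.version("torchvision")
    except metadata.PackageNotFoundError:
        return "torchvision is not installed"
    has_gdown = util.find_spec("gdown") is not None
    return (
        f"torchvision {installed}: "
        f"upgrade needed={needs_upgrade(installed)}, gdown installed={has_gdown}"
    )


if __name__ == "__main__":
    print(report())
```

If the report says an upgrade is needed or gdown is missing, the steps suggested above (delete the partial download, upgrade torchvision, install gdown) are the ones to try.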
This is indeed a version issue. The feature you are trying to use was added in #1082 and will be included in the 0.6.0 release. My personal recommendation would be to pick a different dataset; VHR-10 is actually one of the more complicated ones. If you're completely new to PyTorch, you're actually better off starting with a torchvision tutorial. All TorchGeo NonGeoDatasets are designed to be functionally identical to torchvision datasets, so if you know how to use torchvision, you know how to use torchgeo. If you still want to use VHR-10, either install the development version (0.6.0.dev0) or wait for the 0.6.0 release (maybe in 1 month?). |
Hello, One moment! Could it have anything to do with using:
Where I maybe should be using
? Best, |
VHR-10 requires 3 optional dependencies:
Running We may want to add 1 to EDIT: I opened pytorch/vision#8430 to help better document this. With this, we could use |
Should we also change the example to use a different dataset like InriaAIL or EuroSAT? |
The example is fixed (it should work now on main), but I would be happy to change to a different dataset too. I only used that example because VHR-10 was the first dataset I wrote and it had a cool prediction plot. |
@robmarkcole what version of torchvision are you using? You can turn off the integrity check by passing checksum=False to the dataset/datamodule. |
Can confirm no issues using There appears to be another issue, which I believe is due to
Usage:
class VHR10DataModule(L.LightningDataModule):
    def __init__(self, data_dir: str = "", batch_size: int = 4, num_workers: int = 0):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.num_workers = num_workers

    def setup(self, stage: str):
        return

    def train_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def val_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

    def test_dataloader(self):
        return DataLoader(dataset, batch_size=self.batch_size, collate_fn=collate_fn_detection, num_workers=self.num_workers)

datamodule = VHR10DataModule(data_dir="data", batch_size=4, num_workers=0)
datamodule.setup("fit") |
The collate fn is being applied, but the trainer doesn't accept a list of images; it expects a single tensor. That is definitely a bug: each image in the VHR-10 dataset is a different size, so they can't be stacked properly. |
In this batch, the images all had different shapes - presume I just need to add a cropping augmentation?
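To make the shape problem concrete, here is a minimal pure-Python sketch. It is not torchgeo's collate_fn_detection; the names `collate_as_lists`, `shape_2d`, and `can_stack` are hypothetical, and nested lists stand in for image tensors. The idea is that a detection-style collate keeps per-sample items in lists, and anything downstream that needs one stacked tensor only works when every image has the same shape.

```python
from typing import Any


def collate_as_lists(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group each field of the samples into a per-key list, without stacking."""
    keys = batch[0].keys()
    return {key: [sample[key] for sample in batch] for key in keys}


def shape_2d(image: list[list[float]]) -> tuple[int, int]:
    """Return (height, width) of a nested-list stand-in for an image."""
    return len(image), len(image[0])


def can_stack(images: list[list[list[float]]]) -> bool:
    """Images can only be stacked into one array if all shapes match."""
    return len({shape_2d(image) for image in images}) == 1


# Two samples with different image sizes, as in the batch described above.
batch = [
    {"image": [[0.0] * 4 for _ in range(3)], "labels": [1]},
    {"image": [[0.0] * 5 for _ in range(2)], "labels": [2, 3]},
]
collated = collate_as_lists(batch)
print([shape_2d(image) for image in collated["image"]])
print(can_stack(collated["image"]))
```

Cropping or resizing every image to a common size before batching makes `can_stack` true, which is what the cropping augmentation would achieve; the boxes would need the same transform applied.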
|
Yep that's correct. I know in the past that Kornia had some bugs with the augmentations not properly being applied to the boxes but that appears to have been fixed. |
OK just noticed So in summary:
|
Oof I thought you were already using it or I would have suggested the datamodule. I'll take a look at fixes for this. Thanks for being an A+ test engineer! |
Trying to catch up on this thread...
Well, that didn't age well. Looks like PC will be shutting down, so no need to worry about this anymore.
If we're seeing intermittent issues with GDrive, we could rehost the dataset on HF. It appears to be released under an MIT license.
I'm happy to submit a PR to fix this, but then no one will review it... Does anyone else want to submit a PR? @ashnair1 was the last person to touch this dataset. |
PC hub (the free compute) is shutting down, the 50+ PB of data hosting and APIs that let you index into it, explorer for visualizing it, and catalog are all unchanged AFAIK |
Good catch regarding normalization. By default the images are uint8 and are loaded as floats. During training, the images are normalized (in the datamodule) to a range of 0-1 before plotting, which is why the training plots look normal. However, when plotting samples via the method directly, the tensor is a float with values ranging from 0-255, which makes the plot incorrect. |
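The normalization point above can be illustrated with a tiny sketch. It uses nested lists instead of tensors, and the helper name `scale_for_plotting` is hypothetical rather than a TorchGeo function. It assumes the pixel values are floats in the 0-255 range; matplotlib's imshow clips float RGB data to 0-1, so unscaled values plot incorrectly.

```python
def scale_for_plotting(
    pixels: list[list[float]], max_value: float = 255.0
) -> list[list[float]]:
    """Rescale float pixel values from [0, max_value] to [0, 1]."""
    return [[value / max_value for value in row] for row in pixels]


raw = [[0.0, 127.5], [255.0, 51.0]]
scaled = scale_for_plotting(raw)
print(scaled)
```

With a real image tensor the same step would be a single division by 255 before calling the plotting method.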
@burakekim is going to inquire about redistributing VHR-10 on Hugging Face, which will allow us to get rid of the Google Drive issues and remove dependencies on rarfile and gdown. Hopefully that will solve some of the issues you encountered! I think we should also replace the README example with a simpler dataset like EuroSAT, which will finally close this issue. |
Issue
I need help getting the example code on the README.md to work. I am now concentrating on the Benchmark datasets (https://github.com/microsoft/torchgeo?tab=readme-ov-file#benchmark-datasets).
I am running on the Planetary Computer platform.
I did not have any luck with the platform's default torchgeo, and so ran !pip install torchgeo --upgrade
And this gives me version '0.5.2'.
However, I am still having problems....
dataset = VHR10('data', download=True, checksum=True)
RuntimeError: The MD5 checksum of the download file data/NWPU VHR-10 dataset.rar does not match the one on record.
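For anyone debugging the same error, a minimal sketch of computing the MD5 of the downloaded file by hand is shown below, so it can be compared against the value the library expects. The helper name `md5_of_file` is made up for this example, and the expected checksum is not given in this thread, so none is shown here. A truncated or HTML error-page download will typically be much smaller than the real archive and will produce a different hash.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Path taken from the error message above; adjust to your data directory.
    archive = Path("data") / "NWPU VHR-10 dataset.rar"
    if archive.exists():
        print(archive.stat().st_size, md5_of_file(str(archive)))
    else:
        print(f"{archive} not found")
```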
Fix
I assume version problems.