👉 satellite-image-deep-learning.com 👈

Datasets for deep learning applied to satellite and aerial imagery.

How to use this repository: if you know exactly what you are looking for (e.g. you have the paper name) you can Control+F to search for it in this page (or search in the raw markdown).

Lists of datasets

Earth Observation Database

awesome-satellite-imagery-datasets
Awesome_Satellite_Benchmark_Datasets
awesome-remote-sensing-change-detection -> dedicated to change detection
Callisto-Dataset-Collection -> datasets that use Copernicus/sentinel data
geospatial-data-catalogs -> A list of open geospatial datasets available on AWS, Earth Engine, Planetary Computer, and STAC Index
BED4RS
Satellite-Image-Time-Series-Datasets

Remote sensing dataset hubs

Radiant MLHub -> both datasets and models
Registry of Open Data on AWS
Microsoft Planetary Computer data catalog
Google Earth Engine Data Catalog

Sentinel

As part of the EU Copernicus program, multiple Sentinel satellites are capturing imagery -> see Wikipedia

awesome-sentinel -> a curated list of awesome tools, tutorials and APIs related to data from the Copernicus Sentinel Satellites.
Sentinel-2 Cloud-Optimized GeoTIFFs and Sentinel-2 L2A 120m Mosaic
Open access data on GCP
Paid access to Sentinel & Landsat data via sentinel-hub and python-api
Example loading Sentinel data in a notebook
Jupyter Notebooks for working with Sentinel-5P Level 2 data stored on S3. The data can be browsed here
Sentinel NetCDF data
Analyzing Sentinel-2 satellite data in Python with Keras
Xarray backend to Copernicus Sentinel-1 satellite data products
SEN2VENµS -> a dataset for the training of Sentinel-2 super-resolution algorithms
SEN12MS -> A Curated Dataset of Georeferenced Multi-spectral Sentinel-1/2 Imagery for Deep Learning and Data Fusion. Check out the SEN12MS toolbox and many referenced uses on paperswithcode.com
Sen4AgriNet -> A Sentinel-2 multi-year, multi-country benchmark dataset for crop classification and segmentation with deep learning, with and models
earthspy -> Monitor and study any place on Earth and in Near Real-Time (NRT) using the Sentinel Hub services developed by the EO research team at Sinergise
Space2Ground -> dataset with Space (Sentinel-1/2) and Ground (street-level images) components, annotated with crop-type labels for agriculture monitoring.
sentinel2tools -> downloading & basic processing of Sentinel-2 imagery. Read Sentinel2tools: simple lib for downloading Sentinel-2 satellite images
open-sentinel-map -> The OpenSentinelMap dataset contains Sentinel-2 imagery and per-pixel semantic label masks derived from OpenStreetMap
MSCDUnet -> change detection datasets containing VHR, multispectral (Sentinel-2) and SAR (Sentinel-1)
OMBRIA -> Sentinel-1 & 2 dataset for addressing the flood mapping problem
Canadian-cropland-dataset -> a novel patch-based dataset compiled using optical satellite images of Canadian agricultural croplands retrieved from Sentinel-2
Sentinel-2 Cloud Cover Segmentation Dataset on Radiant mlhub
The Azavea Cloud Dataset which is used to train this cloud-model
fMoW-Sentinel -> The Functional Map of the World - Sentinel-2 corresponding images (fMoW-Sentinel) dataset consists of image time series collected by the Sentinel-2 satellite, corresponding to locations from the Functional Map of the World (fMoW) dataset across several different times. Used in SatMAE
Earth Surface Water Dataset -> a dataset for deep learning of surface water features on Sentinel-2 satellite images. See this ref using it in torchgeo
Ship-S2-AIS dataset -> 13k tiles extracted from 29 free Sentinel-2 products. 2k images showing ships in Danish sovereign waters: one may detect cargo, fishing, or container ships
Amazon Rainforest dataset for semantic segmentation -> Sentinel 2 images
Mining and clandestine airstrips datasets
Satellite Burned Area Dataset -> segmentation dataset containing several satellite acquisitions related to past forest wildfires. It contains 73 acquisitions from Sentinel-2 and Sentinel-1 (Copernicus).
mmflood -> Flood delineation from Sentinel-1 SAR imagery, with paper
MATTER -> a Sentinel 2 dataset for Self-Supervised Training
Industrial Smoke Plumes
MARIDA: Marine Debris Archive
S2GLC -> High resolution Land Cover Map of Europe
Generating Imperviousness Maps from Multispectral Sentinel-2 Satellite Imagery
Sentinel-2 Water Edges Dataset (SWED)
Sentinel-1 for Science Amazonas -> forest loss time series dataset
Sentinel2 Munich480 -> dataset for crop mapping by exploiting the time series of Sentinel-2 satellite imagery
Meadows vs Orchards -> a pixel time series dataset
SEN12_GUM -> SEN12 Global Urban Mapping Dataset
Sentinel-1&2 Image Pairs (SAR & Optical)
Sentinel-2 Image Time Series for Crop Mapping -> data for the Lombardy region in Italy
Deforestation in Ukraine from Sentinel2 data
Multitask Learning for Estimating Power Plant Greenhouse Gas Emissions from Satellite Imagery
METER-ML: A Multi-sensor Earth Observation Benchmark for Automated Methane Source Mapping -> data on Zenodo
satellite-change-events -> CaiRoad & CalFire change detection Sentinel 2 datasets
OMS2CD -> hand-labelled images for change-detection in open-pit mining areas
coal power plants’ emissions -> a dataset of coal power plants’ emissions, including images, metadata and labels.
RapidAI4EO -> dense time series satellite imagery sampled at 500,000 locations across Europe, comprising S2 & Planet imagery, with CORINE Land Cover multiclass labels for 2018
Sentinel 2 super-resolved data cubes - 92 scenes over 2 regions in Switzerland spanning 5 years
MS-HS-BCD-dataset -> multisource change detection dataset used in paper: Building Change Detection with Deep Learning by Fusing Spectral and Texture Features of Multisource Remote Sensing Images: A GF-1 and Sentinel 2B Data Case
MSOSCD -> change detection datasets containing VHR, multispectral (Sentinel-2) and SAR (Sentinel-1)
Sentinel-2 dataset for ship detection, also edited and redistributed as VDS2RAW
MineSegSAT -> dataset for paper: AN AUTOMATED SYSTEM TO EVALUATE MINING DISTURBED AREA EXTENTS FROM SENTINEL-2 IMAGERY
CropNet: An Open Large-Scale Dataset with Multiple Modalities for Climate Change-aware Crop Yield Predictions -> terabyte-sized, publicly available, and multi-modal dataset for climate change-aware crop yield predictions
Tiny CropNet dataset
CaBuAr -> California Burned Areas dataset for delineation
sen12mscr -> Multimodal Cloud Removal
Greenearthnet -> dataset specifically designed for high-resolution vegetation forecasting
MultiSenGE -> large-scale multimodal and multitemporal benchmark dataset
Floating-Marine-Debris-Data -> floating marine debris, with annotations for six debris classes, including plastic, driftwood, seaweed, pumice, sea snot, and sea foam.
Sen2Fire -> A Challenging Benchmark Dataset for Wildfire Detection using Sentinel Data
L1BSR -> 3740 pairs of overlapping image crops extracted from two L1B products
GloSoFarID -> Global multispectral dataset for Solar Farm IDentification
SICKLE -> A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters. Multi-resolution time-series images from Landsat-8, Sentinel-1, and Sentinel-2
MARIDA -> Marine Debris detection from Sentinel-2
MADOS -> Marine Debris and Oil Spill from Sentinel-2
Sentinel-1 and Sentinel-2 Vessel Detection
TreeSatAI -> Sentinel-1, Sentinel-2
Sentinel-2 dataset for ship detection and characterization -> RGB
S2-SHIPS -> all 12 channels
ChatEarthNet -> A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models, utilizes Sentinel-2 data with captions generated by ChatGPT
UKFields -> over 2.3 million automatically delineated field boundaries spanning England, Wales, Scotland, and Northern Ireland
ShipWakes -> Keypoints Method for Recognition of Ship Wake Components in Sentinel-2 Images by Deep Learning
TimeSen2Crop -> a Million Labeled Samples Dataset of Sentinel 2 Image Time Series for Crop Type Classification
AgriSen-COG -> a Multicountry, Multitemporal Large-Scale Sentinel-2 Benchmark Dataset for Crop Mapping: includes an anomaly detection preprocessing step
MagicBathyNet -> a new multimodal benchmark dataset made up of image patches of Sentinel-2, SPOT-6 and aerial imagery, bathymetry in raster format and seabed classes annotations
AI2-S2-NAIP -> aligned NAIP, Sentinel-2, Sentinel-1, and Landsat images spanning the entire continental US
MuS2: A Benchmark for Sentinel-2 Multi-Image Super-Resolution
Sen4Map -> Sentinel-2 time series images, covering over 335,125 geo-tagged locations across the European Union. These geo-tagged locations are associated with detailed landcover and land-use information
CloudSEN12Plus -> the largest cloud detection dataset to date for Sentinel-2
mayrajeo S2 ship detection -> labels for Detecting marine vessels from Sentinel-2 imagery with YOLOv8
Fields of The World -> instance segmentation of agricultural field boundaries
ai4boundaries -> field boundaries with Sentinel-2 and aerial photography
California Wildfire GeoImaging Dataset - CWGID -> Development and Application of a Sentinel-2 Satellite Imagery Dataset for Deep-Learning Driven Forest Wildfire Detection
POPCORN: High-resolution Population Maps Derived from Sentinel-1 and Sentinel-2
substation-seg -> dataset for segmenting substations
PhilEO-downstream -> a 400GB Sentinel-2 dataset for building density estimation, road segmentation, and land cover classification.
PhilEO-pretrain -> a 500GB global dataset of Sentinel-2 images for model pre-training.
KappaSet: Sentinel-2 KappaZeta Cloud and Cloud Shadow Masks
AllClear -> A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery
Sentinel-2 reference cloud masks generated by an active learning method
Cloud gap-filling with deep learning for improved grassland monitoring

Landsat

Long-running US program -> see Wikipedia

8 bands at 15 to 60 m resolution, 185 km swath, 16-day temporal resolution
Landsat 4, 5, 7, and 8 imagery on Google, see the GCP bucket here, with Landsat 8 imagery in COG format analysed in this notebook
Landsat 8 imagery on AWS, with many tutorials and tools listed
https://github.com/kylebarron/landsat-mosaic-latest -> Auto-updating cloudless Landsat 8 mosaic from AWS SNS notifications
Visualise Landsat imagery using Datashader
Landsat-mosaic-tiler -> This repo hosts all the code for landsatlive.live website and APIs.
LandsatSCD -> a change detection dataset consisting of 8468 pairs of images, each 416 × 416 pixels in size
The Landsat Irish Coastal Segmentation Dataset
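A 16-day temporal resolution means a single Landsat satellite nominally revisits the same WRS-2 path/row every 16 days. A minimal sketch listing nominal overpass dates from one known acquisition date; the start date below is hypothetical, and the scenes you can actually use also depend on cloud cover and on which Landsat satellites are operating.

```python
from datetime import date, timedelta

REVISIT_DAYS = 16  # nominal repeat cycle of a single Landsat satellite over one path/row


def nominal_overpasses(first: date, until: date) -> list[date]:
    """Nominal acquisition dates over one path/row, starting from one known overpass."""
    dates = []
    current = first
    while current <= until:
        dates.append(current)
        current += timedelta(days=REVISIT_DAYS)
    return dates


# Hypothetical known overpass on 2023-01-01; list nominal revisits up to 2023-03-01
passes = nominal_overpasses(date(2023, 1, 1), date(2023, 3, 1))
print([d.isoformat() for d in passes])
```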

VENμS

Vegetation and Environment monitoring on a New Micro-Satellite (VENμS)

VENUS L2A Cloud-Optimized GeoTIFFs
VENμS cloud mask training dataset
Sen2Venµs -> a dataset for the training of Sentinel-2 super-resolution algorithms
sen2venus-pytorch-dataset -> torch dataloader and other utilities

Maxar

Satellites owned by Maxar (formerly DigitalGlobe) include GeoEye-1, WorldView-2, 3 & 4

Maxar Open Data Program provides pre and post-event high-resolution satellite imagery in support of emergency planning, response, damage assessment, and recovery
WorldView-2 European Cities -> dataset covering the most populated areas in Europe at 40 cm resolution

Planet

Planet’s high-resolution, analysis-ready mosaics of the world’s tropics, supported through Norway’s International Climate & Forests Initiative. BBC coverage
Planet have made imagery available via kaggle competitions
Alberta Wells Dataset -> Pinpointing Oil and Gas Wells from Satellite Imagery

UC Merced

Land use classification dataset with 21 classes and 100 RGB TIFF images for each class. Each image measures 256x256 pixels with a pixel resolution of 1 foot

http://weegee.vision.ucmerced.edu/datasets/landuse.html
Also available as a multi-label dataset
Read Vision Transformers for Remote Sensing Image Classification where a Vision Transformer classifier achieves 98.49% classification accuracy on Merced
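Image size and pixel resolution together determine how much ground each chip covers. A minimal sketch using only the UC Merced figures quoted above (256 × 256 pixels at 1 foot per pixel) and assuming square pixels; the helper name `chip_footprint_m` is hypothetical.

```python
FOOT_IN_METRES = 0.3048  # exact length of the international foot


def chip_footprint_m(pixels: int, gsd_m: float) -> float:
    """Side length in metres of a square chip of `pixels` x `pixels` at `gsd_m` metres per pixel."""
    if pixels <= 0 or gsd_m <= 0:
        raise ValueError("pixels and gsd_m must be positive")
    return pixels * gsd_m


# UC Merced: 256 px at 1 ft/px, so each chip spans roughly 78 m on a side
uc_merced_side = chip_footprint_m(256, FOOT_IN_METRES)
print(f"UC Merced chip side: {uc_merced_side:.1f} m")
```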

EuroSAT

Land use classification dataset of Sentinel-2 satellite images covering 13 spectral bands and consisting of 10 classes with 27000 labeled and geo-referenced samples. Available in RGB and 13 band versions

EuroSAT: Land Use and Land Cover Classification with Sentinel-2 -> publication where a CNN achieves a classification accuracy of 98.57%
Repos using fastai here and here
evolved_channel_selection -> explores the trade-off between mixed resolutions and whether to use a channel at all, with repo
RGB version available as a dataset in PyTorch, with the 13 band version in torchgeo. Check out the tutorial on data augmentation with this dataset
EuroSAT-SAR -> matched each Sentinel-2 image in EuroSAT with one Sentinel-1 patch according to the geospatial coordinates
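The 13 Sentinel-2 bands in EuroSAT do not share one native resolution, which is the "mixed resolutions" trade-off mentioned above. A minimal sketch grouping bands by native ground sampling distance; the table follows the published Sentinel-2 MSI band specification, while the helper name is hypothetical and the order in which a given EuroSAT file stores its bands should be checked separately.

```python
from collections import defaultdict

# Native ground sampling distance (metres) of the 13 Sentinel-2 MSI bands
S2_BAND_GSD_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B8A": 20, "B09": 60, "B10": 60, "B11": 20, "B12": 20,
}


def bands_by_resolution(band_gsd: dict[str, int]) -> dict[int, list[str]]:
    """Group band names by native resolution, finest resolution first."""
    groups: dict[int, list[str]] = defaultdict(list)
    for band, gsd in band_gsd.items():
        groups[gsd].append(band)
    return {gsd: groups[gsd] for gsd in sorted(groups)}


grouped = bands_by_resolution(S2_BAND_GSD_M)
for gsd, bands in grouped.items():
    print(f"{gsd:>2} m: {', '.join(bands)}")
```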

PatternNet

Land use classification dataset with 38 classes and 800 RGB JPG images for each class

Gaofen Image Dataset (GID) for classification

https://captain-whu.github.io/GID/
a large-scale classification set and a fine land-cover classification set

Million-AID

A large-scale benchmark dataset containing a million instances for remote sensing (RS) scene classification, with 51 scene categories organized in a hierarchy

https://captain-whu.github.io/DiRS/
Pretrained models
Also see AID, AID-Multilabel-Dataset & DFC15-multilabel-dataset

DIOR object detection dataset

A large-scale benchmark dataset for object detection in optical remote sensing images, which consists of 23,463 images and 192,518 object instances annotated with horizontal bounding boxes

https://gcheng-nwpu.github.io/
https://arxiv.org/abs/1909.00133
ors-detection -> Object Detection on the DIOR dataset using YOLOv3
dior_detect -> benchmarks for object detection on DIOR dataset
Tools -> for working with the DIOR dataset
Object_Detection_Satellite_Imagery_Yolov8_DIOR

Multiscene

MultiScene dataset aims at two tasks: developing algorithms for multi-scene recognition & network learning with noisy labels

https://multiscene.github.io/ & https://github.com/Hua-YS/Multi-Scene-Recognition

FAIR1M object detection dataset

A Benchmark Dataset for Fine-grained Object Recognition in High-Resolution Remote Sensing Imagery

arXiv paper
Download at gaofen-challenge.com
2020Gaofen -> 2020 Gaofen Challenge data, baselines, and metrics

DOTA object detection dataset

A Large-Scale Benchmark and Challenges for Object Detection in Aerial Images. Segmentation annotations available in iSAID dataset

https://captain-whu.github.io/DOTA/index.html
DOTA_devkit for loading dataset
Arxiv paper
Pretrained models in mmrotate
DOTA2VOCtools -> dataset split and transform to voc format
dotatron -> 2021 Learning to Understand Aerial Images Challenge on DOTA dataset
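DOTA labels are plain-text files with one oriented object per line. One common step when converting to a horizontal-box format such as VOC is taking the axis-aligned box that encloses each four-point polygon. A minimal sketch, assuming the DOTA v1.x layout of optional `imagesource:`/`gsd:` header lines followed by `x1 y1 x2 y2 x3 y3 x4 y4 category difficult`; DOTA_devkit remains the reference parser, and the names and sample values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DotaObject:
    polygon: list[tuple[float, float]]  # four (x, y) corners in file order
    category: str
    difficult: int

    def to_hbb(self) -> tuple[float, float, float, float]:
        """Axis-aligned (xmin, ymin, xmax, ymax) box enclosing the oriented polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


def parse_dota_labels(text: str) -> list[DotaObject]:
    """Parse DOTA-style label text, skipping header lines and blanks."""
    objects = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 9:  # e.g. 'imagesource:GoogleEarth', 'gsd:0.146', empty lines
            continue
        coords = [float(v) for v in parts[:8]]
        polygon = list(zip(coords[0::2], coords[1::2]))
        difficult = int(parts[9]) if len(parts) > 9 else 0
        objects.append(DotaObject(polygon, parts[8], difficult))
    return objects


# Hand-made sample in the assumed layout (not taken from the real dataset)
sample = """imagesource:GoogleEarth
gsd:0.146
10 20 50 15 55 40 15 45 small-vehicle 0
"""
objs = parse_dota_labels(sample)
print(objs[0].category, objs[0].to_hbb())
```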

iSAID instance segmentation dataset

A Large-scale Dataset for Instance Segmentation in Aerial Images

https://captain-whu.github.io/iSAID/dataset.html
Uses images from the DOTA dataset

HRSC RGB ship object detection dataset

SAR Ship Detection Dataset (SSDD)

High-Resolution SAR Rotation Ship Detection Dataset (SRSDD)

LEVIR ship dataset

A dataset for tiny ship detection under medium-resolution remote sensing images. Annotations in bounding box format

LEVIR-Ship

Hosted on Nucleus

SAR Aircraft Detection Dataset

2966 non-overlapped 224×224 slices are collected with 7835 aircraft targets

https://github.com/hust-rslab/SAR-aircraft-data

xView1: Objects in context for overhead imagery

A fine-grained object detection dataset with 60 object classes along an ontology of 8 class types. Over 1,000,000 objects across over 1,400 km^2 of 0.3m resolution imagery. Annotations in bounding box format

Official website
arXiv paper.
paperswithcode
Satellite_Imagery_Detection_YOLOV7 -> YOLOV7 applied to xView1

xView2: xBD building damage assessment

Annotated high-resolution satellite imagery for building damage assessment, precise segmentation masks and damage labels on a four-level spectrum, 0.3m resolution imagery

Official website
arXiv paper
paperswithcode
xView2_baseline -> baseline solution in tensorflow
metadamagenet -> pytorch solution
U-Net models from michal2409
DAHiTra -> code for 2022 paper: Large-scale Building Damage Assessment using a Novel Hierarchical Transformer Architecture on Satellite Images. Uses xView2 xBD dataset
Damage assessment using Amazon SageMaker geospatial capabilities and custom SageMaker models
Xview2_Strong_Baseline -> a simple implementation of a strong baseline
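Because the four damage levels are ordered, they are often encoded as ordinal integers for training. A minimal sketch, assuming the xBD label strings `no-damage`, `minor-damage`, `major-damage` and `destroyed` (check them against the JSON files you download); any other label, such as an unclassified building, is mapped to None so it can be excluded. The sample labels are made up.

```python
from collections import Counter
from typing import Optional

# Assumed xBD damage label strings, ordered from least to most severe
DAMAGE_LEVELS = ("no-damage", "minor-damage", "major-damage", "destroyed")
DAMAGE_TO_ORDINAL = {name: i for i, name in enumerate(DAMAGE_LEVELS)}


def damage_ordinal(label: str) -> Optional[int]:
    """Map a damage label to 0..3; return None for any label outside the four levels."""
    return DAMAGE_TO_ORDINAL.get(label)


labels = ["no-damage", "destroyed", "minor-damage", "un-classified", "no-damage"]
ordinals = [damage_ordinal(lbl) for lbl in labels]
histogram = Counter(o for o in ordinals if o is not None)  # class balance, unknowns dropped
print(ordinals)  # [0, 3, 1, None, 0]
print(histogram)
```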

xView3: Detecting dark vessels in SAR

Detecting dark vessels engaged in illegal, unreported, and unregulated (IUU) fishing activities on synthetic aperture radar (SAR) imagery. With human and algorithm annotated instances of vessels and fixed infrastructure across 43,200,000 km^2 of Sentinel-1 imagery, this multi-modal dataset enables algorithms to detect and classify dark vessels

Official website
arXiv paper
Github -> all reference code, dataset processing utilities, and winning model codes + weights
paperswithcode
xview3_ship_detection

Vehicle Detection in Aerial Imagery (VEDAI)

Vehicle Detection in Aerial Imagery. Bounding box annotations

Cars Overhead With Context (COWC)

Large set of annotated cars from overhead. Established baseline for object detection and counting tasks. Annotations in bounding box format

AI-TOD & AI-TOD-v2 - tiny object detection

The mean size of objects in AI-TOD is about 12.8 pixels, which is much smaller than other datasets. Annotations in bounding box format. V2 is a meticulous relabelling of the v1 dataset

https://github.com/jwwangchn/AI-TOD
https://chasel-tsui.github.io/AI-TOD-v2/
NWD -> code for 2021 paper: A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. Uses AI-TOD dataset
ORFENet -> Tiny Object Detection in Remote Sensing Images Based on Object Reconstruction and Multiple Receptive Field Adaptive Feature Enhancement. Uses LEVIR-ship & AI-TOD-v2

RarePlanes

RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft. Read the arXiv paper and check out this repo. Note the dataset is available through the AWS Open-Data Program for free download
Understanding the RarePlanes Dataset and Building an Aircraft Detection Model -> blog post
Read this article from NVIDIA which discusses fine-tuning a model pre-trained on synthetic data (RarePlanes) with 10% real data, then pruning the model to reduce its size, before quantizing the model to improve inference speed
yoltv4 includes examples on the RarePlanes dataset
rareplanes-yolov5 -> using YOLOv5 and the RarePlanes dataset to detect and classify sub-characteristics of aircraft, with article

Counting from Sky

A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method

https://github.com/gaoguangshuai/Counting-from-Sky-A-Large-scale-Dataset-for-Remote-Sensing-Object-Counting-and-A-Benchmark-Method

AIRS (Aerial Imagery for Roof Segmentation)

Public dataset for roof segmentation from very-high-resolution aerial imagery (7.5cm). Covers almost the full area of Christchurch, the largest city in the South Island of New Zealand.

On Kaggle
Rooftop-Instance-Segmentation -> VGG-16, Instance Segmentation, uses the AIRS dataset

Inria building/not building segmentation dataset

RGB GeoTIFF at spatial resolution of 0.3 m. Data covering Austin, Chicago, Kitsap County, Western & Eastern Tyrol, Innsbruck, San Francisco & Vienna

https://project.inria.fr/aerialimagelabeling/contest/
SemSegBuildings -> Project using fast.ai framework for semantic segmentation on Inria building segmentation dataset
UNet_keras_for_RSimage -> keras code for binary semantic segmentation

AICrowd Mapping Challenge: building segmentation dataset

300x300 pixel RGB images with annotations in COCO format. Imagery appears to be global but with significant fraction from North America

Dataset released as part of the mapping-challenge
Winning solution published by neptune.ai here, achieved precision 0.943 and recall 0.954 using U-Net with ResNet.
mappingchallenge -> YOLOv5 applied to the AICrowd Mapping Challenge dataset
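Annotations in COCO format store each box as `[x, y, width, height]` with `(x, y)` the top-left corner, and link annotations to images through `image_id`. A minimal stdlib sketch that indexes boxes by image file name and converts them to corner form; the helper name and the tiny sample are hypothetical, not real challenge data.

```python
import json
from collections import defaultdict

Box = tuple[float, float, float, float]


def index_coco(coco: dict) -> dict[str, list[Box]]:
    """Map each image file_name to its boxes as (xmin, ymin, xmax, ymax)."""
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes: dict[str, list[Box]] = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO: top-left corner plus width and height
        boxes[id_to_name[ann["image_id"]]].append((x, y, x + w, y + h))
    # Keep images without annotations as empty lists
    return {name: boxes.get(name, []) for name in id_to_name.values()}


sample = json.loads("""{
  "images": [{"id": 1, "file_name": "000001.jpg", "width": 300, "height": 300},
             {"id": 2, "file_name": "000002.jpg", "width": 300, "height": 300}],
  "annotations": [{"id": 10, "image_id": 1, "category_id": 1,
                   "bbox": [20.0, 30.0, 40.0, 25.0], "area": 1000.0, "iscrowd": 0}],
  "categories": [{"id": 1, "name": "building"}]
}""")
index = index_coco(sample)
print(index)
```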

BONAI - building footprint dataset

BONAI (Buildings in Off-Nadir Aerial Images) is a dataset for building footprint extraction (BFE) in off-nadir aerial images

https://github.com/jwwangchn/BONAI

LEVIR-CD building change detection dataset

https://justchenhao.github.io/LEVIR/
FCCDN_pytorch -> pytorch implementation of FCCDN for change detection task
RSICC -> the Remote Sensing Image Change Captioning dataset uses LEVIR-CD imagery

Onera (OSCD) Sentinel-2 change detection dataset

It comprises 24 pairs of multispectral images taken from the Sentinel-2 satellites between 2015 and 2018.

Website
change_detection_onera_baselines -> Siamese version of U-Net baseline model
Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks -> with paper
DS_UNet -> code for 2021 paper: Sentinel-1 and Sentinel-2 Data Fusion for Urban Change Detection using a Dual Stream U-Net, uses Onera Satellite Change Detection dataset
ChangeDetection_wOnera
OSCD + additional Dates -> extended with three different dates
MSOSCD -> change detection datasets containing VHR, multispectral (Sentinel-2) and SAR (Sentinel-1)

SECOND - semantic change detection

https://captain-whu.github.io/SCD/
Change detection at the pixel level
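Pixel-level change labels are typically scored per pixel for the changed class. A minimal sketch of precision, recall, F1 and IoU for binary masks where 1 marks change; plain lists keep it dependency-free (in practice these would be arrays), and it covers only the binary changed/unchanged case, not the semantic change classes SECOND also provides. The toy masks are made up.

```python
def change_metrics(pred: list[list[int]], truth: list[list[int]]) -> dict[str, float]:
    """Pixel-level precision, recall, F1 and IoU for the changed class (1) of binary masks."""
    if len(pred) != len(truth) or any(len(a) != len(b) for a, b in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # change predicted and present
            elif p:
                fp += 1  # change predicted but absent
            elif t:
                fn += 1  # change present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 0],
        [1, 1, 0]]
scores = change_metrics(pred, truth)
print(scores)
```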

Amazon and Atlantic Forest dataset

For semantic segmentation with Sentinel 2

Amazon and Atlantic Forest image datasets for semantic segmentation
attention-mechanism-unet -> An attention-based U-Net for detecting deforestation within satellite sensor imagery
TransUNetplus2 -> Rethinking attention gated TransU-Net for deforestation mapping

Functional Map of the World (fMoW)

https://github.com/fMoW/dataset
RGB & multispectral variants
High resolution, chip classification dataset
Purpose: predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features

HRSCD change detection

https://rcdaudt.github.io/hrscd/
291 coregistered image pairs of high resolution RGB aerial images
Pixel-level change and land cover annotations are provided

MiniFrance-DFC22 - semi-supervised semantic segmentation

The MiniFrance-DFC22 (MF-DFC22) dataset extends and modifies the MiniFrance dataset for training semi-supervised semantic segmentation models for land use/land cover mapping
dfc2022-baseline -> baseline solution to the 2022 IEEE GRSS Data Fusion Contest (DFC2022) using TorchGeo, PyTorch Lightning, and Segmentation Models PyTorch to train a U-Net with a ResNet-18 backbone and a loss function of Focal + Dice loss to perform semantic segmentation on the DFC2022 dataset
https://github.com/mveo/mveo-challenge

FLAIR

Semantic segmentation and domain adaptation challenge proposed by the French National Institute of Geographical and Forest Information (IGN). Uses a dataset composed of over 70,000 aerial imagery patches with pixel-based annotations and 50,000 Sentinel-2 satellite acquisitions.

ISPRS

Semantic segmentation dataset. 38 patches of 6000x6000 pixels, each consisting of a true orthophoto (TOP) extracted from a larger TOP mosaic, and a DSM. Resolution 5 cm

https://www.isprs.org/education/benchmarks/UrbanSemLab/2d-sem-label-potsdam.aspx
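Each 6000 × 6000 patch is far larger than most segmentation networks take as input, so patches are usually cut into smaller windows. A minimal sketch of computing window offsets; the 512-pixel tile and stride are illustrative assumptions, not part of the benchmark, and the last window along each axis is shifted back so every pixel is covered without padding.

```python
def tile_starts(length: int, tile: int, stride: int) -> list[int]:
    """Start offsets of windows of size `tile` along an axis of size `length`.

    Windows advance by `stride`; a final window is shifted back so that it
    ends exactly at the image edge instead of running past it.
    """
    if tile > length:
        raise ValueError("tile is larger than the image")
    if stride <= 0:
        raise ValueError("stride must be positive")
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_windows(height: int, width: int, tile: int, stride: int) -> list[tuple[int, int, int, int]]:
    """(row, col, tile_height, tile_width) windows covering a height x width image."""
    return [(r, c, tile, tile)
            for r in tile_starts(height, tile, stride)
            for c in tile_starts(width, tile, stride)]


windows = tile_windows(6000, 6000, tile=512, stride=512)
print(len(windows), windows[-1])
```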

SpaceNet

SpaceNet is a series of competitions with datasets and utilities provided. The challenges covered are: (1 & 2) building segmentation, (3) road segmentation, (4) off-nadir buildings, (5) road network extraction, (6) multi-sensor mapping, (7) multi-temporal urban change, (8) Flood Detection Challenge Using Multiclass Segmentation

spacenet.ai is an online hub for data, challenges, algorithms, and tools
The SpaceNet 7 Multi-Temporal Urban Development Challenge: Dataset Release
spacenet-three-topcoder solution
official utilities -> Packages intended to assist in the preprocessing of SpaceNet satellite imagery dataset to a format that is consumable by machine learning algorithms
andraugust spacenet-utils -> Display geotiff image with building-polygon overlay & label buildings using kNN on the pixel spectra
Spacenet-Building-Detection -> uses keras and Spacenet 1 dataset
Spacenet 8 winners blog post

WorldStrat Dataset

Nearly 10,000 km² of free high-resolution satellite imagery of unique locations which ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.

https://github.com/worldstrat/worldstrat
Quick tour of the WorldStrat Dataset
Each high-resolution image (1.5 m/pixel) comes with multiple temporally-matched low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites (10 m/pixel)
Several super-resolution benchmark models trained on it
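The two resolutions quoted above fix the upscaling factor a super-resolution model trained on WorldStrat has to bridge:

```latex
s = \frac{10\ \text{m/pixel}}{1.5\ \text{m/pixel}} \approx 6.67,
\qquad s^{2} \approx 44.4
```

so one Sentinel-2 pixel spans roughly 44 high-resolution pixels.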

Satlas Pretrain

SatlasPretrain is a large-scale pre-training dataset for tasks that involve understanding satellite images. Regularly-updated satellite data is publicly available for much of the Earth through sources such as Sentinel-2 and NAIP, and can inform numerous applications from tackling illegal deforestation to monitoring marine infrastructure.

Website
Code

FLAIR 1 & 2 Segmentation datasets

https://ignf.github.io/FLAIR/
The FLAIR #1 semantic segmentation dataset consists of 77,412 high resolution patches (512x512 at 0.2 m spatial resolution) with 19 semantic classes
FLAIR #2 includes an expanded dataset of Sentinel-2 time series for multi-modal semantic segmentation

Five Billion Pixels segmentation dataset

https://x-ytong.github.io/project/Five-Billion-Pixels.html
4m Gaofen-2 imagery over China
24 land cover classes
Paper and code demonstrating domain adaptation to Sentinel-2 and Planetscope imagery
Extends the GID15 large scale semantic segmentation dataset
GID -> the Gaofen Image Dataset is a large-scale land-cover dataset with Gaofen-2 (GF-2) satellite images

RF100 object detection benchmark

RF100 is compiled from 100 real world datasets that straddle a range of domains. The aim is that performance evaluation on this dataset will give a more nuanced guide to how a model will perform in different domains. Contains 10k aerial images

SODA-A rotated bounding boxes

https://shaunyuan22.github.io/SODA/
SODA-A comprises 2513 high-resolution images of aerial scenes, containing 872,069 instances annotated with oriented bounding boxes across 9 classes
https://github.com/shaunyuan22/CFINet

EarthView from Satellogic

https://huggingface.co/datasets/satellogic/EarthView
Dataset for foundation models, with Sentinel-1 & 2 and 1 m RGB imagery

Microsoft datasets

US Building Footprints -> building footprints in all 50 US states, GeoJSON format, generated using semantic segmentation. Australian, Canadian, Uganda-Tanzania, Kenya-Nigeria and GlobalMLBuildingFootprints are also available. Check out RasterizingBuildingFootprints to convert vector shapefiles to raster layers
Microsoft Planetary Computer is a Dask-Gateway enabled JupyterHub deployment focused on supporting scalable geospatial analysis, source repo
landcover-orinoquia -> Land cover mapping of the Orinoquía region in Colombia, in collaboration with Wildlife Conservation Society Colombia. An #AIforEarth project
RoadDetections dataset by Microsoft
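The footprints are distributed as GeoJSON polygons. A minimal stdlib sketch computing each footprint's bounding box, e.g. to decide which image chip a building falls in; it assumes RFC 7946 `[longitude, latitude]` coordinate order, handles only Polygon and MultiPolygon geometries, and the tiny feature collection is made up for illustration.

```python
import json


def feature_bbox(geometry: dict) -> tuple[float, float, float, float]:
    """(min_lon, min_lat, max_lon, max_lat) of a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        rings = geometry["coordinates"]  # list of linear rings, exterior first
    elif geometry["type"] == "MultiPolygon":
        rings = [ring for polygon in geometry["coordinates"] for ring in polygon]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    lons = [pt[0] for ring in rings for pt in ring]
    lats = [pt[1] for ring in rings for pt in ring]
    return min(lons), min(lats), max(lons), max(lats)


collection = json.loads("""{
  "type": "FeatureCollection",
  "features": [{"type": "Feature", "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [[
                  [-122.0, 37.0], [-121.999, 37.0], [-121.999, 37.001],
                  [-122.0, 37.001], [-122.0, 37.0]]]}}]
}""")
bboxes = [feature_bbox(f["geometry"]) for f in collection["features"]]
print(bboxes)
```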

Google datasets

open-buildings -> A dataset of building footprints to support social good applications covering 64% of the African continent. Read Mapping Africa’s Buildings with Satellite Imagery

Google Earth Engine (GEE)

Since there is a whole community around GEE, I will not reproduce its resources here but list a few select references. Get started at https://developers.google.com/earth-engine/

Various imagery and climate datasets, including Landsat & Sentinel imagery
Supports large scale processing with classical algorithms, e.g. clustering for land use. For deep learning, you export datasets from GEE as tfrecords, train on your preferred GPU platform, then upload inference results back to GEE
awesome-google-earth-engine
Awesome-GEE
awesome-earth-engine-apps
How to Use Google Earth Engine and Python API to Export Images to Roboflow -> to acquire training data
ee-fastapi is a simple FastAPI web application for performing flood detection using Google Earth Engine in the backend.
How to Download High-Resolution Satellite Data for Anywhere on Earth
wxee -> Export data from GEE to xarray using wxee then train with pytorch or tensorflow models. Useful since GEE only supports tfrecord export natively

Image captioning datasets

RSICD -> 10921 images with five descriptive sentences per image. Used in Fine tuning CLIP with Remote Sensing (Satellite) images and captions, models at this repo
RSICC -> the Remote Sensing Image Change Captioning dataset contains 10077 pairs of bi-temporal remote sensing images and 50385 sentences describing the differences between images. Uses LEVIR-CD imagery
ChatEarthNet -> A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models, utilizes Sentinel-2 data with captions generated by ChatGPT

Weather Datasets

NASA (submit a request and you are emailed when the data is ready) -> https://search.earthdata.nasa.gov
NOAA (requires BigQuery) -> https://www.kaggle.com/datasets/noaa/goes16/home
Time series weather data for several US cities -> https://www.kaggle.com/datasets/selfishgene/historical-hourly-weather-data
DeepWeather -> improve weather forecasting accuracy by analyzing satellite images

Cloud datasets

Planet-CR -> A Multi-Modal and Multi-Resolution Dataset for Cloud Removal in High Resolution Optical Remote Sensing Imagery, 3m resolution, with paper
The Azavea Cloud Dataset which is used to train this cloud-model
Sentinel-2 Cloud Cover Segmentation Dataset on Radiant mlhub
cloudsen12 -> see video
HRC_WHU -> High-Resolution Cloud Detection Dataset comprising 150 RGB images with resolution varying from 0.5 to 15 m across different global regions
AIR-CD -> a challenging cloud detection dataset with higher spatial resolution and more representative land cover types
Landsat 8 Cloud Cover Assessment Validation Data

Forest datasets

OpenForest -> A catalogue of open access forest datasets
awesome-forests -> A curated list of ground-truth forest datasets for the machine learning and forestry community
ReforesTree -> A dataset for estimating tropical forest biomass based on drone and field data
yosemite-tree-dataset -> a benchmark dataset for tree counting from aerial images
Amazon Rainforest dataset for semantic segmentation -> Sentinel 2 images. Used in the paper 'An attention-based U-Net for detecting deforestation within satellite sensor imagery'
Amazon and Atlantic Forest image datasets for semantic segmentation -> Sentinel 2 images. Used in paper 'An attention-based U-Net for detecting deforestation within satellite sensor imagery'
TreeSatAI -> Sentinel-1, Sentinel-2
PureForest -> VHR RGB + Near-Infrared & lidar, each patch represents a monospecific forest

Geospatial datasets

Resource Watch provides a wide range of geospatial datasets and a UI to visualise them

Time series & change detection datasets

BreizhCrops -> A Time Series Dataset for Crop Type Mapping
The SeCo dataset contains image patches from Sentinel-2 tiles captured at different timestamps at each geographical location. Download SeCo here
SYSU-CD -> The dataset contains 20000 pairs of 0.5-m aerial images of size 256×256 taken between the years 2007 and 2014 in Hong Kong

DEM (digital elevation maps)

Shuttle Radar Topography Mission, search online at usgs.gov
Copernicus Digital Elevation Model (DEM) on S3, represents the surface of the Earth including buildings, infrastructure and vegetation. Data is provided as Cloud Optimized GeoTIFFs. link
Awesome-DEM

UAV & Drone datasets

Many on https://www.visualdata.io
AU-AIR dataset -> a multi-modal UAV dataset for object detection.
ERA -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos.
Aerial Maritime Drone Dataset -> bounding boxes
RetinaNet for pedestrian detection -> bounding boxes
BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos -> Thermal IR videos of humans and animals
DroneVehicle -> Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning. Annotations are rotated bounding boxes. With GitHub repo
UAVOD10 -> 10 object classes at 15 cm resolution: building, ship, vehicle, prefabricated house, well, cable tower, pool, landslide, cultivation mesh cage, and quarry. Bounding boxes
Busy-parking-lot-dataset---vehicle-detection-in-UAV-video -> Vehicle instance segmentation. Annotation format unclear, possibly Matlab-specific
dd-ml-segmentation-benchmark -> DroneDeploy Machine Learning Segmentation Benchmark
SeaDronesSee -> Vision Benchmark for Maritime Search and Rescue. Bounding box object detection, single-object tracking and multi-object tracking annotations
aeroscapes -> semantic segmentation benchmark comprising images captured by a commercial drone at altitudes from 5 to 50 metres.
ALTO -> Aerial-view Large-scale Terrain-Oriented. For deep learning based UAV visual place recognition and localization tasks.
HIT-UAV-Infrared-Thermal-Dataset -> A High-altitude Infrared Thermal Object Detection Dataset for Unmanned Aerial Vehicles
caltech-aerial-rgbt-dataset -> synchronized RGB, thermal, GPS, and IMU data
Leafy Spurge Dataset -> Real-world Weed Classification Within Aerial Drone Imagery
UAV-HSI-Crop-Dataset -> dataset for "HSI-TransUNet: A Transformer based semantic segmentation model for crop mapping from UAV hyperspectral imagery"
UAVVaste -> COCO-like dataset and effective waste detection in aerial images

Other datasets

land-use-land-cover-datasets
EORSSD-dataset -> Extended Optical Remote Sensing Saliency Detection (EORSSD) Dataset
RSD46-WHU -> 46 scene classes for image classification, free for education, research and commercial use
RSOD-Dataset -> dataset for object detection in PASCAL VOC format. Aircraft, playgrounds, overpasses & oil tanks
VHR-10_dataset_coco -> Object detection and instance segmentation dataset based on NWPU VHR-10 dataset. RGB & SAR
HRSID -> high resolution SAR images dataset for ship detection, semantic segmentation, and instance segmentation tasks
MAR20 -> Military Aircraft Recognition dataset
RSSCN7 -> Dataset of the article “Deep Learning Based Feature Selection for Remote Sensing Scene Classification”
Sewage-Treatment-Plant-Dataset -> object detection
TGRS-HRRSD-Dataset -> High Resolution Remote Sensing Detection (HRRSD)
MUSIC4HA -> MUltiband Satellite Imagery for object Classification (MUSIC) to detect Hot Area
MUSIC4GC -> MUltiband Satellite Imagery for object Classification (MUSIC) to detect Golf Course
MUSIC4P3 -> MUltiband Satellite Imagery for object Classification (MUSIC) to detect Photovoltaic Power Plants (solar panels)
ABCDdataset -> damage detection dataset to identify whether buildings have been washed-away by tsunami
OGST -> Oil and Gas Tank Dataset
LS-SSDD-v1.0-OPEN -> Large-Scale SAR Ship Detection Dataset
S2Looking -> A Satellite Side-Looking Dataset for Building Change Detection, paper
AISD -> Aerial Imagery dataset for Shadow Detection
Awesome-Remote-Sensing-Relative-Radiometric-Normalization-Datasets
SearchAndRescueNet -> Satellite Imagery for Search And Rescue Dataset, with example Faster R-CNN model
geonrw -> orthorectified aerial photographs, LiDAR derived digital elevation models and segmentation maps with 10 classes. With repo
Thermal power plants dataset
University1652-Baseline -> A Multi-view Multi-source Benchmark for Drone-based Geo-localization
benchmark_ISPRS2021 -> A new stereo dense matching benchmark dataset for deep learning
WHU-SEN-City -> A paired SAR-to-optical image translation dataset which covers 34 big cities of China
SAR_vehicle_detection_dataset -> 104 SAR images for vehicle detection, collected from Sandia MiniSAR/FARAD SAR images and MSTAR images
ERA-DATASET -> A Dataset and Deep Learning Benchmark for Event Recognition in Aerial Videos
SSL4EO-S12 -> a large-scale dataset for self-supervised learning in Earth observation
UBC-dataset -> a dataset for building detection and classification from very high-resolution satellite imagery with the focus on object-level interpretation of individual buildings
AIR-PolSAR-Seg -> a challenging PolSAR terrain segmentation dataset
HRC_WHU -> High-Resolution Cloud Detection Dataset comprising 150 RGB images with resolutions varying from 0.5 to 15 m, covering different global regions
AeroRIT -> A New Scene for Hyperspectral Image Analysis
Building_Dataset -> High-speed Rail Line Building Dataset Display
Haiming-Z/MtS-WH-reference-map -> a reference map for change detection based on MtS-WH
MtS-WH-Dataset -> Multi-temporal Scene WuHan (MtS-WH) Dataset
Multi-modality-image-matching -> image matching dataset including several remote sensing modalities
RID -> Roof Information Dataset for CV-Based Photovoltaic Potential Assessment. With paper
APKLOT -> A dataset for aerial parking block segmentation
QXS-SAROPT -> Optical and SAR pairing dataset from the paper: The QXS-SAROPT Dataset for Deep Learning in SAR-Optical Data Fusion
SAR-ACD -> SAR-ACD consists of 4322 aircraft clips with 6 civil aircraft categories and 14 other aircraft categories
SODA -> A large-scale Small Object Detection dataset. SODA-A comprises 2510 high-resolution images of aerial scenes, which has 800203 instances annotated with oriented rectangle box annotations over 9 classes.
Data-CSHSI -> Open source datasets for Cross-Scene Hyperspectral Image Classification, includes Houston, Pavia & HyRank datasets
SynthWakeSAR -> A Synthetic SAR Dataset for Deep Learning Classification of Ships at Sea, with paper
SAR2Opt-Heterogeneous-Dataset -> SAR-optical images for benchmarking change detection and image translation on remote sensing images
urban-tree-detection-data -> Dataset for training and evaluating tree detectors in urban environments with aerial imagery
Attribute-Cooperated-Classification-Datasets -> Three datasets based on AID, UCM, and Sydney. For each image, there is a label of scene classification and a label vector of attribute items.
dynnet -> DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation
open_earth_map -> a benchmark dataset for global high-resolution land cover mapping
Satellite imagery datasets containing ships -> A list of radar and optical satellite datasets for ship detection, classification, semantic segmentation and instance segmentation tasks
SolarDK -> A high-resolution urban solar panel image classification and localization dataset
Roofline-Extraction -> dataset for paper 'Knowledge-Based 3D Building Reconstruction (3DBR) Using Single Aerial Images and Convolutional Neural Networks (CNNs)'
Building-detection-and-roof-type-recognition -> datasets for the paper 'A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image'
PanCollection -> Pansharpening Datasets from WorldView 2, WorldView 3, QuickBird, Gaofen 2 sensors
OnlyPlanes -> Synthetic dataset and pretrained models for Detectron2
Remote Sensing Satellite Video Dataset for Super-resolution
WHU-Stereo -> A Challenging Benchmark for Stereo Matching of High-Resolution Satellite Images
FireRisk -> A Remote Sensing Dataset for Fire Risk Assessment with Benchmarks Using Supervised and Self-supervised Learning
Road-Change-Detection-Dataset
3DCD -> infer 3D CD maps using only remote sensing optical bitemporal images as input without the need of Digital Elevation Models (DEMs)
Hyperspectral Change Detection Dataset Irrigated Agricultural Area
CNN-RNN-Yield-Prediction -> soybean dataset
HySpecNet-11k -> a large-scale hyperspectral benchmark dataset
Mumbai-Semantic-Segmentation-Dataset
SZTAKI -> A ground truth collection for change detection in optical aerial images taken several years apart
DSIFN -> change detection dataset consisting of six large bi-temporal high resolution images covering six cities in China
SV248S -> Single Object Tracking Dataset, tracking Vehicle, Large-Vehicle, Ship and Airplane
GAMUS -> A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data
Oil and Gas Infrastructure Mapping (OGIM) database -> includes locations and facility attributes of oil and gas infrastructure types that are important sources of methane emissions
openWUSU -> WUSU is a semantic understanding dataset focusing on urban structure and the urbanization process in Wuhan
Digital Typhoon Dataset -> aimed at benchmarking machine learning models for long-term spatio-temporal data
RSE_Cross-city -> Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks
AErial Lane -> The AErial Lane (AEL) Dataset is the first large-scale aerial image dataset built for lane detection, with high-quality polyline lane annotations on high-resolution images covering around 80 kilometers of road
GeoPile pretraining dataset -> compiles imagery from other datasets including RSD46-WHU, MLRSNet and RESISC45 for pretraining foundation models
NWPU-MOC -> A Benchmark for Fine-grained Multi-category Object Counting in Aerial Images
Chesapeake Roads Spatial Context (RSC)
STARCOP dataset: Semantic Segmentation of Methane Plumes with Hyperspectral Machine Learning Models
Toulouse Hyperspectral Data Set
CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds -> the dataset consists of 1,780 MODIS satellite images hand-labeled for the presence of more than 12,000 ship tracks.
Vehicle Perception from Satellite -> a large-scale benchmark for traffic monitoring from satellite
SARDet-100K -> Large-Scale Synthetic Aperture Radar (SAR) Object Detection
So2Sat-POP-DL -> Dataset discovery: So2Sat Population dataset covering 98 EU cities
Urban Vehicle Segmentation Dataset (UV6K)
TimeMatch -> dataset for cross-region adaptation for crop identification from SITS in four different regions in Europe
BirdSAT -> Cross-View iNAT Birds 2021: This cross-view birds species dataset consists of paired ground-level bird images and satellite images, along with meta-information associated with the iNaturalist-2021 dataset.
OpenSARWake -> A SAR ship wake rotation detection benchmark dataset.
TUE-CD -> A change detection dataset for building damage estimation after earthquakes
Overhead Wind Turbine Dataset - NAIP
Hi-UCD -> ultra-High Urban Change Detection for urban semantic change detection
LEVIR-CC-Dataset -> A Large Dataset for Remote Sensing Image Change Captioning
ShipRSImageNet -> A Large-scale Fine-Grained Dataset for Ship Detection in High-Resolution Optical Remote Sensing Images
pangaea-bench -> A Global and Inclusive Benchmark for Geospatial Foundation Models
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
SeeFar -> Satellite Agnostic Multi-Resolution Dataset for Geospatial Foundation Models
RSHaze+ -> remote sensing dehazing datasets in PhDnet: A novel physic-aware dehazing network for remote sensing images
GDCLD -> A globally distributed dataset of coseismic landslide mapping via multi-source high-resolution remote sensing images
10,000 Crop Field Boundaries across India -> using Airbus SPOT

Kaggle

Kaggle hosts over 200 satellite image datasets (search results here). The Kaggle blog is an interesting read.

Kaggle - Amazon from space - classification challenge

https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data
3-5 meter resolution GeoTIFF images from the Planet Dove satellite constellation
17 labels, including cloudy, primary and water; each image can carry several labels
1st place winner interview -> used 11 custom CNNs
FastAI Multi-label image classification
Multi-Label Classification of Satellite Photos of the Amazon Rainforest
Understanding the Amazon Rainforest with Multi-Label Classification + VGG-19, Inceptionv3, AlexNet & Transfer Learning
amazon-classifier -> compares random forest with CNN
multilabel-classification -> compares various CNN architectures
Planet-Amazon-Kaggle -> uses fast.ai
deforestation_deep_learning
Track-Human-Footprint-in-Amazon-using-Deep-Learning
Amazon-Rainforest-CNN -> uses a 3-layer CNN in Tensorflow
rainforest-tagging -> Convolutional Neural Net and Recurrent Neural Net in Tensorflow for satellite images multi-label classification
satellite-deforestation -> Using Satellite Imagery to Identify the Leading Indicators of Deforestation, applied to the Kaggle Challenge Understanding the Amazon from Space
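The Amazon challenge is multi-label: one image chip can carry a weather tag and several land-cover tags at once. A minimal sketch of preparing targets and decoding predictions, assuming each image's labels arrive as one space-separated tag string (the layout of the competition's training CSV, as far as we know). The tag subset in `VOCAB` and the per-tag thresholds are hypothetical, for illustration only:

```python
from typing import Dict, List, Sequence


def encode_tags(tag_string: str, vocabulary: Sequence[str]) -> List[int]:
    """Turn a space-separated tag string into a multi-hot vector over `vocabulary`."""
    present = set(tag_string.split())
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"tags not in vocabulary: {sorted(unknown)}")
    return [1 if tag in present else 0 for tag in vocabulary]


def decode_scores(scores: Sequence[float], vocabulary: Sequence[str],
                  thresholds: Dict[str, float], default: float = 0.5) -> List[str]:
    """Keep every tag whose sigmoid score reaches its own threshold (or the default)."""
    return [tag for tag, score in zip(vocabulary, scores)
            if score >= thresholds.get(tag, default)]


# Hypothetical subset of the tag vocabulary.
VOCAB = ["clear", "cloudy", "haze", "partly_cloudy", "primary", "agriculture", "road", "water"]

print(encode_tags("clear primary water", VOCAB))  # [1, 0, 0, 0, 1, 0, 0, 1]
print(decode_scores([0.9, 0.1, 0.2, 0.05, 0.8, 0.3, 0.6, 0.4], VOCAB, {"road": 0.7}))
# ['clear', 'primary']
```

Per-tag thresholds, rather than a single 0.5 cut-off, are one common design choice when some labels are much rarer than others; they would normally be tuned on a validation split.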

Kaggle - DSTL segmentation challenge

https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection
Rating - medium, many good examples (see the Discussion as well as kernels), but as this is an older competition many examples use Python 2
WorldView-3 - 45 satellite images, each covering 1 km x 1 km, in both 3-band (RGB) and 16-band (400 nm to SWIR) formats
10 labelled classes, including buildings, roads, trees, crops, waterways and vehicles
Interview with 1st place winner who used segmentation networks - 40+ models, each tweaked for a particular target (e.g. roads, trees)
ZF_UNET_224_Pretrained_Model -> 2nd place solution
3rd place solution -> explored pansharpening & calculating reflectance indices, with arXiv paper
Deepsense 4th place solution
Entry by lopuhin using UNet with batch-normalization
Multi-class semantic segmentation of satellite images using U-Net on the DSTL dataset, with TensorFlow 1 & Python 2.7. Accompanying article
Deep-Satellite-Image-Segmentation
Dstl-Satellite-Imagery-Feature-Detection-Improved
Satellite-imagery-feature-detection
Satellite_Image_Classification -> using XGBoost and ensemble classification methods
Unet-for-Satellite
building-segmentation -> TensorFlow U-Net implementation trained to segment buildings in satellite imagery
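The DSTL polygon labels are not in pixel coordinates; each image has its own normalized grid (Xmax, Ymin). A minimal sketch of the scaling described on the competition's data page and used in public tutorial kernels, stated here as an assumption: for an image of width W and height H, W' = W·W/(W+1), H' = H·H/(H+1), x' = (x / Xmax)·W' and y' = (y / Ymin)·H'. Ymin is negative, so y / Ymin comes out positive. The grid values below are made up:

```python
from typing import List, Tuple


def scale_to_pixels(points: List[Tuple[float, float]], width: int, height: int,
                    x_max: float, y_min: float) -> List[Tuple[float, float]]:
    """Map normalized DSTL-style polygon vertices to pixel coordinates.

    Assumed scaling: x' = x / x_max * W', y' = y / y_min * H',
    with W' = W * W / (W + 1) and H' = H * H / (H + 1).
    """
    w_prime = width * width / (width + 1)
    h_prime = height * height / (height + 1)
    return [(x / x_max * w_prime, y / y_min * h_prime) for x, y in points]


# Made-up grid values and one vertex halfway across and down the image.
pixels = scale_to_pixels([(0.005, -0.0045)], width=100, height=100,
                         x_max=0.01, y_min=-0.009)
print(pixels)
```

The resulting pixel polygons can then be rasterized into per-class masks with whatever polygon-filling tool you already use.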

Kaggle - DeepSat land cover classification

https://www.kaggle.com/datasets/crawford/deepsat-sat4 & https://www.kaggle.com/datasets/crawford/deepsat-sat6
DeepSat-Kaggle -> uses Julia
deepsat-aws-emr-pyspark -> Using PySpark for Image Classification on Satellite Imagery of Agricultural Terrains

Kaggle - Airbus ship detection challenge

https://www.kaggle.com/c/airbus-ship-detection/overview
Rating - medium, most solutions use deep learning, many kernels, good example kernel
Detecting ships in satellite imagery: five years later…
I believe there was a problem with this dataset, which led to many complaints that the competition was ruined
Lessons Learned from Kaggle’s Airbus Challenge
Airbus-Ship-Detection -> This solution ranked 139th out of 884 in the competition, combining a ResNeXt50-based classifier and a U-Net segmentation model
Ship-Detection-Project -> uses Mask R-CNN and UNet model
Airbus_SDC
Airbus_SDC_dup -> Project focused on detecting duplicate regions of overlapping satellite imagery. Applied to Airbus ship detection dataset
airbus-ship-detection -> CNN with REST API
Ship-Detection-from-Satellite-Images-using-YOLOV4 -> uses Kaggle Airbus Ship Detection dataset
Image Segmentation: Kaggle experience -> Medium article by gold medal winner Vlad Shmyhlo
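Ship masks in the Airbus challenge are distributed as run-length encoded (RLE) strings rather than image files. A minimal sketch of decoding and re-encoding them with NumPy, assuming the format described on the competition page: space-separated start/length pairs, 1-indexed pixel positions numbered top to bottom and then left to right, on 768x768 images:

```python
import numpy as np


def rle_decode(rle: str, shape=(768, 768)) -> np.ndarray:
    """Decode a 'start length start length ...' string into a binary mask.

    Pixels are assumed 1-indexed and numbered column by column
    (top to bottom, then left to right), hence the Fortran-order reshape.
    """
    height, width = shape
    flat = np.zeros(height * width, dtype=np.uint8)
    values = [int(v) for v in rle.split()]
    for start, length in zip(values[0::2], values[1::2]):
        flat[start - 1:start - 1 + length] = 1
    return flat.reshape(shape, order="F")


def rle_encode(mask: np.ndarray) -> str:
    """Inverse of rle_decode: binary mask -> run-length string."""
    flat = mask.flatten(order="F")
    padded = np.concatenate([[0], flat, [0]])
    # Indices where the value changes mark run starts and ends (shifted to 1-indexing).
    changes = np.flatnonzero(padded[1:] != padded[:-1]) + 1
    starts, ends = changes[0::2], changes[1::2]
    return " ".join(f"{s} {e - s}" for s, e in zip(starts, ends))


mask = rle_decode("1 3 10 2", shape=(4, 4))
print(rle_encode(mask))  # 1 3 10 2
```

Images without ships have an empty encoding in the labels file; `rle_decode("")` returns an all-zero mask. An image with several ships has one row per ship, so masks for the same image would be combined with a logical OR.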

Kaggle - Shipsnet classification dataset

https://www.kaggle.com/datasets/rhammell/ships-in-satellite-imagery -> Classify ships in San Francisco Bay using Planet satellite imagery
4000 80x80 RGB images labeled with either a "ship" or "no-ship" classification, 3 meter pixel size
shipsnet-detector -> Detect container ships in Planet imagery using machine learning
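If you use the JSON version of Shipsnet instead of the PNG chips, each image is one flat list of pixel values. A minimal sketch of restoring an (80, 80, 3) array, assuming the layout described in the dataset's documentation as far as we know: 19,200 integers per chip, all red values first, then green, then blue:

```python
import numpy as np


def flat_to_hwc(values, size: int = 80, channels: int = 3) -> np.ndarray:
    """Reshape a channel-first flat pixel list into a (size, size, channels) image."""
    arr = np.asarray(values, dtype=np.uint8)
    if arr.size != size * size * channels:
        raise ValueError(f"expected {size * size * channels} values, got {arr.size}")
    # (channels, rows, cols) -> (rows, cols, channels)
    return arr.reshape(channels, size, size).transpose(1, 2, 0)


# Synthetic chip: red plane = 10, green = 20, blue = 30.
flat = [10] * 6400 + [20] * 6400 + [30] * 6400
img = flat_to_hwc(flat)
print(img.shape, img[0, 0])  # (80, 80, 3) [10 20 30]
```

If PlanesNet uses the same layout for its 20x20 chips, `size=20` should apply unchanged; check its documentation first.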

Kaggle - Ships in Google Earth

https://www.kaggle.com/datasets/tomluther/ships-in-google-earth
794 jpegs showing various sized ships in satellite imagery, annotations in Pascal VOC format for object detection models
kaggle-ships-in-satellite-imagery-with-YOLOv8
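Because these annotations use the Pascal VOC XML format, the boxes can be read with the Python standard library alone. A minimal sketch; the inline XML is a made-up example following the usual VOC layout, and the `ship` label name is hypothetical:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[str, int, int, int, int]  # (label, xmin, ymin, xmax, ymax)


def parse_voc(xml_text: str) -> List[Box]:
    """Extract (label, xmin, ymin, xmax, ymax) tuples from a Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        # Some tools write float coordinates, so parse via float before int.
        coords = [int(float(bnd.find(tag).text)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, *coords))
    return boxes


EXAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ship</name>
    <bndbox><xmin>12</xmin><ymin>40</ymin><xmax>98</xmax><ymax>77</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc(EXAMPLE))  # [('ship', 12, 40, 98, 77)]
```

For files on disk, pass `ET.parse(path).getroot()` contents instead, or read the file into a string first.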

Kaggle - Ships in San Francisco Bay

https://www.kaggle.com/datasets/rhammell/ships-in-satellite-imagery
4000 80x80 RGB images labeled with either a "ship" or "no-ship" classification, provided by Planet
DeepLearningShipDetection
Ship-Detection-Using-Satellite-Imagery

Kaggle - Swimming pool and car detection using satellite imagery

https://www.kaggle.com/datasets/kbhartiya83/swimming-pool-and-car-detection
3750 satellite images of residential areas with annotation data for swimming pools and cars
Object detection on Satellite Imagery using RetinaNet

Kaggle - Planesnet classification dataset

https://www.kaggle.com/datasets/rhammell/planesnet -> Detect aircraft in Planet satellite image chips
20x20 RGB images, the "plane" class includes 8000 images and the "no-plane" class includes 24000 images
Dataset repo and planesnet-detector demonstrates a small CNN classifier on this dataset
ergo-planes-detector -> An ergo based project that relies on a convolutional neural network to detect airplanes from satellite imagery, uses the PlanesNet dataset
Using AWS SageMaker/PlanesNet to process Satellite Imagery
Airplane-in-Planet-Image -> pytorch model
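PlanesNet is imbalanced, with 8,000 "plane" and 24,000 "no-plane" chips. One common way to compensate, shown as a sketch rather than what the linked projects do, is inverse-frequency class weighting, w_c = N / (K · n_c), where N is the total number of samples, K the number of classes and n_c the count for class c:

```python
from typing import Dict


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by N / (K * n_c) so that rarer classes count more in the loss."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


weights = inverse_frequency_weights({"plane": 8000, "no-plane": 24000})
print(weights)  # {'plane': 2.0, 'no-plane': 0.6666666666666666}
```

With these weights each class contributes equally in aggregate: 8,000 × 2.0 = 16,000 and 24,000 × 0.667 ≈ 16,000. Oversampling the minority class is an alternative with a similar effect.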

Kaggle - CGI Planes in Satellite Imagery w/ BBoxes

https://www.kaggle.com/datasets/aceofspades914/cgi-planes-in-satellite-imagery-w-bboxes
500 computer generated satellite images of planes
Faster RCNN to detect airplanes
aircraft-detection-from-satellite-images-yolov3

Kaggle - Draper challenge to place images in order of time

https://www.kaggle.com/c/draper-satellite-image-chronology/data
Rating - hard. Not many useful kernels.
Images are grouped into sets of five, each set sharing the same setId. Each image in a set was taken on a different day (but not necessarily at the same time of day). The images in a set cover approximately the same area but are not exactly aligned.
Kaggle interviews for entrants who used XGBoost and a hybrid human/ML approach
deep-cnn-sat-image-time-series -> uses LSTM
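Since the images in a set share a setId, a first preprocessing step is to group files by set. A minimal sketch, assuming a hypothetical file naming pattern `set<setId>_<day>.<ext>` (check the actual names in the download), that groups images by set and orders each set by the index in the file name:

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical naming convention: set<setId>_<day>.<ext>, e.g. "set107_3.tif".
PATTERN = re.compile(r"set(\d+)_(\d+)\.\w+$")


def group_by_set(filenames: List[str]) -> Dict[int, List[str]]:
    """Group filenames by setId and sort each group by the index in the name."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.search(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        set_id, day = int(match.group(1)), int(match.group(2))
        groups[set_id].append((day, name))
    return {set_id: [name for _, name in sorted(items)] for set_id, items in groups.items()}


files = ["set2_3.tif", "set1_2.tif", "set2_1.tif", "set1_1.tif", "notes.txt"]
print(group_by_set(files))
# {2: ['set2_1.tif', 'set2_3.tif'], 1: ['set1_1.tif', 'set1_2.tif']}
```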

Kaggle - Dubai segmentation

https://www.kaggle.com/datasets/humansintheloop/semantic-segmentation-of-aerial-imagery
72 satellite images of Dubai, UAE, segmented into 6 classes
dubai-satellite-imagery-segmentation -> due to the small dataset, image augmentation was used
U-Net for Semantic Segmentation on Unbalanced Aerial Imagery -> using the Dubai dataset
Semantic-Segmentation-using-U-Net -> uses keras
unet_satelite_image_segmentation
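The Dubai masks are color-coded RGB images, so training a segmentation model usually starts by mapping each color to a class index. A minimal sketch; the hex palette below is the one reported in the dataset description as far as we know, and should be checked against the download:

```python
import numpy as np

# Assumed class palette (verify against the dataset description).
PALETTE = {
    "building": "#3C1098",
    "land": "#8429F6",
    "road": "#6EC1E4",
    "vegetation": "#FEDD3A",
    "water": "#E2A929",
    "unlabeled": "#9B9B9B",
}


def hex_to_rgb(code: str) -> tuple:
    """'#RRGGBB' -> (r, g, b) integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def rgb_mask_to_labels(mask: np.ndarray, palette=PALETTE) -> np.ndarray:
    """Convert an (H, W, 3) color mask to an (H, W) array of class indices.

    Pixels whose color is not in the palette are set to -1.
    """
    labels = np.full(mask.shape[:2], -1, dtype=np.int64)
    for index, code in enumerate(palette.values()):
        color = np.array(hex_to_rgb(code), dtype=mask.dtype)
        labels[np.all(mask == color, axis=-1)] = index
    return labels


mask = np.zeros((2, 2, 3), dtype=np.uint8)
mask[0, 0] = hex_to_rgb(PALETTE["road"])
mask[1, 1] = hex_to_rgb(PALETTE["water"])
print(rgb_mask_to_labels(mask).tolist())  # [[2, -1], [-1, 4]]
```

Keeping unmatched pixels at -1 makes palette mismatches easy to spot before training.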

Kaggle - Massachusetts Roads & Buildings Datasets - segmentation

https://www.kaggle.com/datasets/balraj98/massachusetts-roads-dataset
https://www.kaggle.com/datasets/balraj98/massachusetts-buildings-dataset
Official published dataset
Road_seg_dataset -> subset of the roads dataset containing only 200 images and masks
Road and Building Semantic Segmentation in Satellite Imagery uses U-Net on the Massachusetts Roads Dataset & keras
Semantic-segmentation repo by fuweifu-vtoo -> uses pytorch and the Massachusetts Buildings & Roads Datasets
ssai-cnn -> This is an implementation of Volodymyr Mnih's dissertation methods on his Massachusetts road & building dataset
building-footprint-segmentation -> pip installable library to train building footprint segmentation on satellite and aerial imagery, applied to Massachusetts Buildings Dataset and Inria Aerial Image Labeling Dataset
Road detection using semantic segmentation and albumentations for data augmentation using the Massachusetts Roads Dataset, U-net & Keras
Image-Segmentation -> using Massachusetts Road dataset and fast.ai

Kaggle - Deepsat classification challenge

Not satellite but airborne imagery. Each image patch is size-normalized to 28x28 pixels and consists of 4 bands: red, green, blue and near infrared. The training and test labels are one-hot encoded vectors, 1x4 for Sat4 and 1x6 for Sat6. Data is provided in .mat (Matlab) format.

Sat4 -> 500,000 image patches covering four broad land cover classes: barren land, trees, grassland, and a class containing all land cover types other than these three
Sat6 -> 405,000 image patches, each 28x28, covering 6 land cover classes: barren land, trees, grassland, roads, buildings and water bodies.
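A minimal sketch of turning the DeepSat arrays into a conventional (N, 28, 28, 4) image batch with integer class ids. It assumes the .mat files store images as 28x28x4xN arrays and labels as K x N one-hot arrays (K = 4 for Sat4, 6 for Sat6), loaded for example with `scipy.io.loadmat`; synthetic arrays stand in for the real files here. The NDVI helper assumes the band order red, green, blue, near infrared:

```python
import numpy as np


def to_batch(x: np.ndarray, y: np.ndarray):
    """(28, 28, 4, N) images and (K, N) one-hot labels -> (N, 28, 28, 4) and (N,) class ids."""
    images = np.transpose(x, (3, 0, 1, 2))
    labels = np.argmax(y, axis=0)
    return images, labels


def ndvi(images: np.ndarray) -> np.ndarray:
    """NDVI = (NIR - red) / (NIR + red), assuming band 0 is red and band 3 is NIR."""
    red = images[..., 0].astype(np.float32)
    nir = images[..., 3].astype(np.float32)
    return (nir - red) / np.maximum(nir + red, 1e-6)


# Synthetic stand-ins for the arrays loaded from the .mat file (key names: check the file).
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(28, 28, 4, 5)).astype(np.uint8)
y = np.eye(6, dtype=np.uint8)[:, [0, 3, 5, 1, 2]]  # five one-hot label columns
images, labels = to_batch(x, y)
print(images.shape, labels.tolist())  # (5, 28, 28, 4) [0, 3, 5, 1, 2]
```

NDVI is optional; with only four bands it is a cheap extra feature for separating trees and grassland from barren land.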

Kaggle - High resolution ship collections 2016 (HRSC2016)

https://www.kaggle.com/datasets/guofeng/hrsc2016
Ship images harvested from Google Earth
HRSC2016_SOTA -> Fair comparison of different algorithms on the HRSC2016 dataset

Kaggle - SWIM-Ship Wake Imagery Mass

https://www.kaggle.com/datasets/lilitopia/swimship-wake-imagery-mass
An optical ship wake detection benchmark dataset built for deep learning
WakeNet -> A CNN-based optical image ship wake detector, code for 2021 paper: Rethinking Automatic Ship Wake Detection: State-of-the-Art CNN-based Wake Detection via Optical Images

Kaggle - Understanding Clouds from Satellite Images

The challenge is to build a model that classifies cloud organization patterns from satellite images.

Kaggle - 38-Cloud Cloud Segmentation

https://www.kaggle.com/datasets/sorour/38cloud-cloud-segmentation-in-satellite-images
Contains 38 Landsat 8 images and manually extracted pixel-level ground truths
38-Cloud Github repository and follow up 95-Cloud dataset
How to create a custom Dataset / Loader in PyTorch, from Scratch, for multi-band Satellite Images Dataset from Kaggle
Cloud-Net: A semantic segmentation CNN for cloud detection -> an end-to-end cloud detection algorithm for Landsat 8 imagery, trained on 38-Cloud Training Set
Segmentation of Clouds in Satellite Images Using Deep Learning -> semantic segmentation using a Unet on the Kaggle 38-Cloud dataset
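As far as we know, 38-Cloud provides each training patch as four single-band files (red, green, blue and NIR) in sibling folders, which the linked PyTorch article combines into one multi-band input. A minimal sketch of that idea; the `train_<band>` folder and `<band>_...` file-prefix naming is an assumption to check against the download, and file reading is left to whatever TIFF reader you use:

```python
from pathlib import Path
from typing import Dict

import numpy as np

BANDS = ("red", "green", "blue", "nir")


def band_paths(red_path: Path) -> Dict[str, Path]:
    """Derive the other band files from a red-band path (assumed naming scheme)."""
    return {
        band: red_path.parent.parent / f"train_{band}" / red_path.name.replace("red", band, 1)
        for band in BANDS
    }


def stack_bands(bands: Dict[str, np.ndarray]) -> np.ndarray:
    """Stack integer single-band arrays into a (4, H, W) float32 array scaled to [0, 1]."""
    stacked = np.stack([bands[b] for b in BANDS])
    return stacked.astype(np.float32) / np.iinfo(stacked.dtype).max


paths = band_paths(Path("38-Cloud/train_red/red_patch_1.TIF"))
print(paths["nir"].as_posix())  # 38-Cloud/train_nir/nir_patch_1.TIF

fake = {b: np.full((384, 384), 32768, dtype=np.uint16) for b in BANDS}
print(stack_bands(fake).shape)  # (4, 384, 384)
```

Scaling by the dtype's maximum keeps the code independent of whether patches are stored as 8-bit or 16-bit integers.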

Kaggle - Airbus Aircraft Detection Dataset

https://www.kaggle.com/airbusgeo/airbus-aircrafts-sample-dataset
One hundred civilian airports and over 3000 annotated commercial aircraft
detecting-aircrafts-on-airbus-pleiades-imagery-with-yolov5
pytorch-remote-sensing -> Aircraft detection using the 'Airbus Aircraft Detection' dataset and Faster-RCNN with ResNet-50 backbone in pytorch

Kaggle - Airbus oil storage detection dataset

https://www.kaggle.com/airbusgeo/airbus-oil-storage-detection-dataset
Oil-Storage Tank Instance Segmentation with Mask R-CNN with accompanying article
Oil Storage Detection on Airbus Imagery with YOLOX -> uses the Kaggle Airbus Oil Storage Detection dataset
Oil-Storage-Tanks-Data-Preparation-YOLO-Format

Kaggle - Satellite images of hurricane damage

Kaggle - Austin Zoning Satellite Images

https://www.kaggle.com/datasets/franchenstein/austin-zoning-satellite-images
Classify images of Austin into its zoning categories, such as residential and industrial. 3667 satellite images

Kaggle - Statoil/C-CORE Iceberg Classifier Challenge

Classify the target in a SAR image chip as either a ship or an iceberg. The dataset for the competition included 5000 images extracted from multichannel SAR data collected by the Sentinel-1 satellite. Top entries used ensembles to boost prediction accuracy from about 92% to 97%.

https://www.kaggle.com/c/statoil-iceberg-classifier-challenge/data
An interview with David Austin: 1st place winner
radar-image-recognition
Iceberg-Classification-Using-Deep-Learning -> uses keras
Deep-Learning-Project -> uses keras
iceberg-classifier-challenge solution by ShehabSunny -> uses keras
Analyzing Satellite Radar Imagery with Deep Learning -> by Matlab, uses ensemble with greedy search
16th place solution
fastai solution
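Each sample in the iceberg challenge stores its two SAR polarizations (HH in `band_1`, HV in `band_2`, per the competition data page) as flat lists of 5,625 backscatter values in dB, i.e. 75x75 images. A minimal sketch of turning one record into a model input. The third channel (mean of the two bands) is a heuristic seen in public kernels, not part of the data, and the dB-to-linear helper is included only for models that prefer linear power:

```python
from typing import Dict, List

import numpy as np


def record_to_image(record: Dict[str, List[float]], size: int = 75) -> np.ndarray:
    """Build a (size, size, 3) float32 array from one record with band_1/band_2 lists."""
    hh = np.asarray(record["band_1"], dtype=np.float32).reshape(size, size)
    hv = np.asarray(record["band_2"], dtype=np.float32).reshape(size, size)
    # Third channel: mean of the two polarizations (a heuristic, not part of the data).
    return np.dstack([hh, hv, (hh + hv) / 2])


def db_to_linear(db: np.ndarray) -> np.ndarray:
    """Convert backscatter from decibels to linear power: 10 ** (dB / 10)."""
    return np.power(10.0, db / 10.0)


sample = {"band_1": [-20.0] * 5625, "band_2": [-30.0] * 5625}
img = record_to_image(sample)
print(img.shape, img[0, 0].tolist())  # (75, 75, 3) [-20.0, -30.0, -25.0]
print(db_to_linear(np.array([-10.0, 0.0, 10.0])))
```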

Kaggle - Land Cover Classification Dataset from DeepGlobe Challenge - segmentation

https://www.kaggle.com/datasets/balraj98/deepglobe-land-cover-classification-dataset
Satellite Imagery Semantic Segmentation with CNN -> 7 different segmentation classes, DeepGlobe Land Cover Classification Challenge dataset, with repo
Land Cover Classification with U-Net -> Satellite Image Multi-Class Semantic Segmentation Task with PyTorch Implementation of U-Net, uses DeepGlobe Land Cover Segmentation dataset, with code
DeepGlobe Land Cover Classification Challenge solution

Kaggle - Next Day Wildfire Spread

A Data Set to Predict Wildfire Spreading from Remote-Sensing Data

Kaggle - Satellite Next Day Wildfire Spread

Inspired by the above dataset, using different data sources

Kaggle - Spacenet 7 Multi-Temporal Urban Change Detection

https://www.kaggle.com/datasets/amerii/spacenet-7-multitemporal-urban-development
SatFootprint -> building segmentation on the Spacenet 7 dataset

Kaggle - Satellite Images to predict poverty in Africa

https://www.kaggle.com/datasets/sandeshbhat/satellite-images-to-predict-povertyafrica
Uses satellite imagery and nightlights data to predict poverty levels at a local level
Predicting-Poverty -> Combining satellite imagery and machine learning to predict poverty, in PyTorch

Kaggle - NOAA Fisheries Steller Sea Lion Population Count

https://www.kaggle.com/competitions/noaa-fisheries-steller-sea-lion-population-count -> count sea lions from aerial images
Sealion-counting
Sealion_Detection_Classification

Kaggle - Arctic Sea Ice Image Masking

Kaggle - Overhead-MNIST

A Benchmark Satellite Dataset as Drop-In Replacement for MNIST
https://www.kaggle.com/datasets/datamunge/overheadmnist -> kaggle
https://arxiv.org/abs/2102.04266 -> paper
https://github.com/reveondivad/ov-mnist -> github

Kaggle - Satellite Image Classification

Kaggle - EuroSAT - Sentinel-2 Dataset

https://www.kaggle.com/datasets/raoofnaushad/eurosat-sentinel2-dataset
RGB Land Cover and Land Use Classification using Sentinel-2 Satellite
Used in paper Image Augmentation for Satellite Images

Kaggle - Satellite Images of Water Bodies

https://www.kaggle.com/datasets/franciscoescobar/satellite-images-of-water-bodies
pytorch-waterbody-segmentation -> UNET model trained on the Satellite Images of Water Bodies dataset from Kaggle. The model is deployed on Hugging Face Spaces

Kaggle - NOAA sea lion count

https://www.kaggle.com/c/noaa-fisheries-steller-sea-lion-population-count
noaa -> UNET, object detection and image level regression approaches

Kaggle - miscellaneous

https://www.kaggle.com/datasets/reubencpereira/spatial-data-repo -> Satellite + loan data
https://www.kaggle.com/datasets/towardsentropy/oil-storage-tanks -> Image data of industrial oil tanks with bounding box annotations, estimate tank fill % from shadows
https://www.kaggle.com/datasets/airbusgeo/airbus-wind-turbines-patches -> Airbus SPOT satellites images over wind turbines for classification
https://www.kaggle.com/datasets/aceofspades914/cgi-planes-in-satellite-imagery-w-bboxes -> CGI planes object detection dataset
https://www.kaggle.com/datasets/atilol/aerialimageryforroofsegmentation -> Aerial Imagery for Roof Segmentation
https://www.kaggle.com/datasets/andrewmvd/ship-detection -> 621 images of boats and ships
https://www.kaggle.com/datasets/alpereniek/vehicle-detection-from-satellite-images-data-set
https://www.kaggle.com/datasets/sergiishchus/maxar-satellite-data -> Example Maxar data at 15 cm resolution
https://www.kaggle.com/datasets/cici118/swimming-pool-detection-algarves-landscape
https://www.kaggle.com/datasets/donkroco/solar-panel-module -> object detection for solar panels
https://www.kaggle.com/datasets/balraj98/deepglobe-road-extraction-dataset -> segment roads
https://www.kaggle.com/competitions/widsdatathon2019/ -> Palm oil plantations
https://www.kaggle.com/datasets/siddharthkumarsah/ships-in-aerial-images -> Ships/Vessels in Aerial Images
https://www.kaggle.com/datasets/jangsienicajzkowy/afo-aerial-dataset-of-floating-objects -> Aerial dataset for maritime Search and Rescue applications
https://www.kaggle.com/datasets/yaroslavnaychuk/satelliteimagesegmentation -> Segmentation on Gaofen Satellite Image, extracted from GID-15 dataset

Competitions

Competitions are an excellent source for accessing clean, ready-to-use satellite datasets and model benchmarks.

https://codalab.lisn.upsaclay.fr/competitions/9603 -> object detection from diversified satellite imagery
https://www.drivendata.org/competitions/143/tick-tick-bloom/ -> detect and classify algal bloom
https://www.drivendata.org/competitions/81/detect-flood-water/ -> map floodwater from radar imagery
https://platform.ai4eo.eu/enhanced-sentinel2-agriculture -> map cultivated land using Sentinel imagery
https://www.diu.mil/ai-xview-challenge -> multiple challenges ranging from detecting fishing vessels to estimating building damage
https://competitions.codalab.org/competitions/30440 -> flood detection
https://www.drivendata.org/competitions/83/cloud-cover/ -> cloud cover detection
https://www.drivendata.org/competitions/78/overhead-geopose-challenge/page/372/ -> predict geocentric pose from single-view oblique satellite images
https://www.drivendata.org/competitions/60/building-segmentation-disaster-resilience/ -> building segmentation
https://captain-whu.github.io/DOTA/ -> large dataset for object detection in aerial imagery
https://spacenet.ai/ -> set of 8 challenges such as road network detection
https://huggingface.co/spaces/competitions/ChaBuD-ECML-PKDD2023 -> binary image segmentation task on forest fires monitored over California

https://spaceml.org/repo/project/6269285b14d764000d798fde -> ML for floods
https://spaceml.org/repo/project/60002402f5647f00129f7287 -> lightning and extreme weather
https://spaceml.org/repo/project/6025107d79c197001219c481/true -> ~1TB dataset for precipitation forecasting
https://spaceml.org/repo/project/61c0a1b9ff8868000dfb79e1/true -> Sentinel-2 image super-resolution
