Datasets Hub for the IceVision Framework

Note: We need your help. If you find this work useful, please let other people know by starring and sharing it. Thank you!

Contributors

Documentation

Installation

pip install icedata

For more installation options, check our extensive documentation.

Important: We currently only support Linux/MacOS.
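Before installing, it can help to confirm that the current platform is one of the supported ones and whether the package is already available. The following is a minimal sketch that uses only the Python standard library; the helper names `is_supported_platform` and `is_installed` are illustrative and are not part of IceData.

```python
import importlib.util
import platform


def is_supported_platform(system=None):
    """Return True for Linux or macOS, the platforms listed as supported."""
    # platform.system() reports "Linux", "Darwin" (macOS), or "Windows"
    system = system or platform.system()
    return system in {"Linux", "Darwin"}


def is_installed(package="icedata"):
    """Return True if the top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("supported platform:", is_supported_platform())
    print("icedata installed:", is_installed())
```

Running the script prints both checks, so a failed `pip install icedata` can be told apart from an unsupported platform.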

Why IceData?

IceData is a dataset hub for the IceVision Framework
It includes community-maintained datasets and parsers and has out-of-the-box support for common annotation formats (COCO, VOC, etc.)
It provides an overview of each included dataset with a description, an annotation example, and other helpful information
It makes end-to-end training straightforward thanks to IceVision's unified API
It enables practitioners to get moving with object detection technology quickly

Datasets

Source

The Datasets class is designed to simplify loading and parsing a wide range of computer vision datasets.

Main Features:

Caches data so you don't need to download it over and over
Lightweight and fast
Transparent and pythonic API
Out-of-the-box parsers convert common dataset annotation formats into the unified IceVision Data Format

IceData provides several ready-to-use datasets. Some use common annotation formats such as COCO and VOC; others use custom formats handled by dedicated parsers, such as the WheatParser written for the Kaggle Global Wheat competition.
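The caching feature listed above follows a common "download once, reuse afterwards" pattern. The sketch below illustrates that general pattern with the standard library only. It is not IceData's actual implementation: the function name `cached_fetch`, its parameters, and the cache layout are all assumptions made for illustration.

```python
from pathlib import Path


def cached_fetch(name, fetch, cache_dir):
    """Return (path, was_downloaded) for a file stored under cache_dir.

    `fetch` is a hypothetical zero-argument callable that returns the file
    contents as bytes; it is only called when the file is not cached yet.
    """
    target = Path(cache_dir) / name
    if target.exists():
        # Cache hit: reuse the file that is already on disk
        return target, False
    # Cache miss: create the directory, fetch the data, and store it
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch())
    return target, True
```

Calling `cached_fetch` twice with the same `name` and `cache_dir` invokes `fetch` only on the first call.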

Usage

Object detection datasets use multiple annotation formats (COCO, VOC, and others). IceVision makes it easy to work across all of them with parsers that are easy to use and extend.
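One concrete difference between these formats is how a bounding box is stored: COCO uses `[x, y, width, height]` measured from the top-left corner, while VOC stores the two corners `xmin, ymin, xmax, ymax`. The sketch below shows the conversion a parser has to perform to bring both into one representation. The function names are illustrative and are not IceVision API.

```python
def coco_to_corners(bbox):
    """Convert a COCO box [x, y, width, height] to (xmin, ymin, xmax, ymax)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def corners_to_coco(box):
    """Convert a corner box (xmin, ymin, xmax, ymax) to COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]


if __name__ == "__main__":
    print(coco_to_corners([10, 20, 30, 40]))  # (10, 20, 40, 60)
```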

COCO and VOC compatible datasets

For COCO or VOC compatible datasets - especially ones that are not included in IceData - it is easiest to use the IceData COCO or VOC parser.

Example: Raccoon - a dataset using the VOC parser

# Imports
from icevision.all import *
import icedata


# WARNING: This assumes the raccoon dataset has already been cloned into ./raccoon_dataset
# Set images and annotations directories
data_dir = Path("raccoon_dataset")
images_dir = data_dir / "images"
annotations_dir = data_dir / "annotations"

# Define the class_map
class_map = ClassMap(["raccoon"])

# Create a parser for dataset using the predefined icevision VOC parser
parser = parsers.voc(
    annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map
)

# Parse the annotations to create the train and validation records
train_records, valid_records = parser.parse()
show_records(train_records[:3], ncols=3, class_map=class_map)

!!! info "Note" Notice how we use the predefined parsers.voc() function:

**parser = parsers.voc(
annotations_dir=annotations_dir, images_dir=images_dir, class_map=class_map
)**
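To make clearer what a VOC parser reads from the annotations directory, here is a minimal standard-library sketch that extracts the filename, image size, labels, and boxes from one VOC-style XML file. It is not how `parsers.voc` is implemented; the sample XML uses made-up values, and the function name `read_voc_annotation` and the returned dictionary layout are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC XML layout (values are illustrative)
SAMPLE_VOC_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>81</xmin><ymin>88</ymin><xmax>522</xmax><ymax>408</ymax></bndbox>
  </object>
</annotation>
"""


def read_voc_annotation(xml_text):
    """Parse one VOC XML document into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    record = {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": [],
    }
    # Each <object> element holds one labelled bounding box
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        record["objects"].append({"label": obj.findtext("name"), "bbox": corners})
    return record


if __name__ == "__main__":
    print(read_voc_annotation(SAMPLE_VOC_XML))
```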

Datasets included in IceData

Datasets included in IceData always have their own parser. It can be invoked with icedata.datasetname.parser(...).

Example: The IceData Fridge dataset

Please check out the fridge folder for more information on how this dataset is structured.

# Imports
from icevision.all import *
import icedata

# Load the Fridge Objects dataset
data_dir = icedata.fridge.load()

# Get the class_map
class_map = icedata.fridge.class_map()

# Parse the annotations
parser = icedata.fridge.parser(data_dir, class_map)
train_records, valid_records = parser.parse()

# Show images with their boxes and labels
show_records(train_records[:3], ncols=3, class_map=class_map)

!!! info "Note" Notice how we use the parser associated with the fridge dataset icedata.fridge.parser():

**parser = icedata.fridge.parser(data_dir, class_map)**
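In both examples, `parser.parse()` returns two lists: training records and validation records. How IceVision chooses that split is not described here. Purely as an illustration of the idea, the sketch below performs a seeded random split with the standard library; the 80/20 ratio, the seed, and the function name `random_split` are assumptions, not IceVision behaviour.

```python
import random


def random_split(items, valid_fraction=0.2, seed=42):
    """Shuffle items reproducibly and return (train, valid) lists."""
    items = list(items)
    # A dedicated Random instance keeps the split independent of global state
    rng = random.Random(seed)
    rng.shuffle(items)
    n_valid = int(round(len(items) * valid_fraction))
    return items[n_valid:], items[:n_valid]


if __name__ == "__main__":
    train, valid = random_split(range(10))
    print(len(train), len(valid))  # 8 2
```

Fixing the seed means the same records land in the validation set on every run, which keeps metrics comparable between experiments.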

Datasets with a new annotation format

Sometimes, you will need to define a new annotation format for your dataset. Additional information can be found in the documentation. In this case, we strongly recommend following the file structure and naming conventions used in the examples, such as the Fridge dataset or the PETS dataset.
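As a rough idea of the work a custom parser has to do, the sketch below groups the rows of a CSV annotation file (one bounding box per row) into one record per image. It uses only the standard library and does not use the IceVision parser base classes, whose exact interface should be taken from the documentation. The column names, the sample rows, and the function name `group_rows_by_image` are assumptions for illustration.

```python
import csv
import io
import json

# Made-up CSV with one box per row; bbox is stored as "[x, y, width, height]"
SAMPLE_CSV = """image_id,width,height,bbox
img_a,1024,1024,"[10.0, 20.0, 30.0, 40.0]"
img_a,1024,1024,"[100.0, 200.0, 50.0, 60.0]"
img_b,1024,1024,"[5.0, 5.0, 15.0, 25.0]"
"""


def group_rows_by_image(csv_text):
    """Collect all boxes that belong to the same image into one record."""
    records = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = records.setdefault(
            row["image_id"],
            {"width": int(row["width"]), "height": int(row["height"]), "bboxes": []},
        )
        # The bbox column is a JSON-style list; convert it to corner coordinates
        x, y, w, h = json.loads(row["bbox"])
        record["bboxes"].append((x, y, x + w, y + h))
    return records


if __name__ == "__main__":
    for image_id, record in group_rows_by_image(SAMPLE_CSV).items():
        print(image_id, record)
```

A real parser would read the file from disk and emit IceVision records instead of dictionaries, but the grouping and coordinate conversion steps stay the same.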

Disclaimer

Inspired by the excellent HuggingFace Datasets project, icedata is a utility library that downloads and prepares computer vision datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use them. It is your responsibility to determine whether you have permission to use a dataset under its license.

If you are a dataset owner and wish to update any of the information in IceData (description, citation, etc.), or do not want your dataset to be included, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

If you are interested in learning more about responsible AI practices, including fairness, please see Google AI's Responsible AI Practices.
