Measuring the effect of corrupted labels within training data on model performance.
In this work, we study how corrupted labels in the training dataset impair the performance of image classification models. We artificially alter images with the help of computer vision algorithms and can therefore label them 100% correctly, without discrepancies. We then introduce and steadily increase the ratio of falsified labels and measure the effect of the corrupted-label ratio on model performance. In doing so, we hope to draw generalizable conclusions about this effect for potential inference, and to find models or model architectures that are as robust as possible to datasets with incorrect annotations.
Note: "corrupted labels" and "false labels" are used interchangeably in this work.
Two different classification tasks (4 & 14 classes) are performed using computer vision deep learning models. Modifications to the images and respective labels are generated artificially, resulting in 100% correct data labels. As a baseline, the model is trained and tested on the 100% correctly labeled data.
Then, a portion of the labels is corrupted iteratively per training run (e.g. 0.5%, 1%, 2%, etc.) to match a desired corrupted-label rate in the training data. The model is trained on the modified data including the falsified labels, and tested using correctly labeled data. The resulting metrics are examined by comparing model performance across the different corruption rates and against the performance on the 100% correctly labeled data, using regression.
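As an illustration of this corruption step, here is a minimal sketch of how a given ratio of labels can be falsified; the function name `corrupt_labels` and the use of NumPy are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np

def corrupt_labels(labels: np.ndarray, ratio: float, num_classes: int,
                   seed: int = 42) -> np.ndarray:
    """Return a copy of `labels` in which `ratio` of the entries are
    replaced with a different, randomly chosen class index."""
    rng = np.random.default_rng(seed)
    corrupted = labels.copy()
    n_corrupt = int(round(ratio * len(labels)))
    # choose which samples get a falsified label
    indices = rng.choice(len(labels), size=n_corrupt, replace=False)
    for i in indices:
        # draw a class different from the true one, so the new label
        # is guaranteed to be wrong
        wrong_classes = [c for c in range(num_classes) if c != corrupted[i]]
        corrupted[i] = rng.choice(wrong_classes)
    return corrupted

# e.g., falsify 2% of the labels in the 4-class setting:
# y_train_corrupted = corrupt_labels(y_train, ratio=0.02, num_classes=4)
```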
Examples of altered images, each left to right: original image; original image with the region of change drawn in; original image with the drawn-in region including changed pixel values; and the changed image with which the neural networks were trained.
Using the `data_generator` module, 4 classes and 14 subclasses are generated (a sketch of one such alteration follows the list):
- Blobs: Add blobs to the image within the randomly generated polygonal shape
- uni-color Red
- uni-color Green
- uni-color Blue
- multi-color Blue / Green / Red
- Blur: Blur the image within the randomly generated polygonal shape
- Gaussian blur
- Color-channel-change: Randomly changes the order of the color channels within the generated polygon
- Blue - Green - Red
- Blue - Red - Green
- Green - Blue - Red
- Green - Red - Blue
- Red - Blue - Green
- Color distortion: Change the hue within the generated polygonal shape
- Red
- Green
- Blue
- All
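To make the alteration process more concrete, below is a minimal sketch of the color-channel-change case: the channels are reordered only inside a polygonal region. It assumes OpenCV and NumPy; the function name and polygon handling are illustrative, not the module's actual code:

```python
import numpy as np
import cv2

def swap_channels_in_polygon(image: np.ndarray, polygon: np.ndarray,
                             order=(2, 1, 0)) -> np.ndarray:
    """Reorder the color channels of `image` inside `polygon` only.

    `polygon` is an (N, 2) array of x/y vertices; `order` is the new
    channel order, e.g. (2, 1, 0) turns B-G-R into R-G-B.
    """
    # rasterize the polygon into a binary mask
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [polygon.astype(np.int32)], 255)
    altered = image.copy()
    # apply the channel permutation only where the mask is set
    altered[mask == 255] = image[mask == 255][:, list(order)]
    return altered

# e.g., swap Blue and Red inside a random triangle:
# polygon = np.random.randint(0, min(img.shape[:2]), size=(3, 2))
# altered = swap_channels_in_polygon(img, polygon, order=(2, 1, 0))
```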
The project is structured as follows:
- ./assets/:
- examples of artificial image alterations (png files)
- evaluation visualizations as created by the notebooks in `./notebooks/` (png files)
- ./config/: log configurations
- ./data/:
- artificially altered image data (based on Pascal VOC, png files) for training and evaluation, as well as labels (npy files)
- aggregated class reports as generated by `./src/evaluation/main.py` (csv file)
- ./logs/: training and evaluation logs
  - subfolder `class_report` will be created during training, including sklearn-based class reports after each epoch per training run (json files)
  - subfolder `label_mapping` will be created during training, including a mapping of the labels to their respective names for each training run (json files)
  - subfolder `model_summary` will be created during training, including a summary of the model architecture for each run (txt files)
  - subfolder `models` will be created during training, storing the trained model for each run (h5 files)
  - subfolder `scalars` will be created during training, including tensorboard scalar metrics logs for the training, validation and test phases for each run (tensorboard event files)
  - subfolder `regr_results` will be created during classification result evaluation (e.g., accuracy over false-labels-ratio regression), including the regression results (csv file)
- ./notebooks/: Jupyter notebooks
- ./notebooks/exploratory.ipynb: exploratory analysis of the classification labels
- ./notebooks/result_regression.ipynb: view regression results of false labels ratio prediction based on classification results
- ./notebooks/result_visualization.ipynb: visualizing classification performance results (e.g., accuracy, precision, recall, f1-score)
- ./src/: preprocessing, training and evaluation modules
- ./src/data_generator/: module to modify images and create labels (100% correctly labeled data)
- ./src/evaluation/: module to train and test regression on classification results
- ./src/false_labels_effect/: module to train and test the classification models
- ./tests/: testing module
The modules in `src` are set up as follows:
- ./src/false_labels_effect/: module to train and test the classification models
- ./src/false_labels_effect/callbacks.py: tensorflow callbacks for the models
- ./src/false_labels_effect/data_loader.py: keras data loader
- ./src/false_labels_effect/gpu_config.py: GPU configuration
- ./src/false_labels_effect/main.py: entry point for classification training and testing
- ./src/false_labels_effect/models.py: model architecture definition and initialization
- ./src/false_labels_effect/util.py: utility functions for data and label processing
- ./src/data_generator/: module to modify images and create labels (100% correctly labeled data)
- ./src/data_generator/generate_data.py: entry point for generating artificially altered images and respective labels
- ./src/data_generator/main.py: currently not used
- ./src/data_generator/util.py: utility functions for artificial image altering and label generation
- ./src/evaluation/: module to train and test regression on classification results (a sketch of this regression follows the list below)
- ./src/evaluation/main.py: entry point for regression training and testing
- ./src/evaluation/util.py: utility functions for data and label processing as well as model training
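To make the evaluation step more concrete, the following is a minimal sketch of the kind of regression performed there, relating test accuracy to the corrupted-label ratio with scikit-learn; the numbers are illustrative placeholders, not measured results, and the choice of a simple linear model is an assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# corrupted-label ratios used across training runs and the test
# accuracy measured for each run (illustrative placeholder values)
ratios = np.array([0.0, 0.005, 0.01, 0.02, 0.05, 0.10]).reshape(-1, 1)
accuracies = np.array([0.93, 0.92, 0.91, 0.90, 0.86, 0.80])

# fit a simple linear trend: accuracy as a function of the ratio
model = LinearRegression().fit(ratios, accuracies)
print(f"slope: {model.coef_[0]:.3f}, intercept: {model.intercept_:.3f}")
# R^2 indicates how well a linear trend explains the degradation
print(f"R^2: {model.score(ratios, accuracies):.3f}")
```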
To set up your local development environment, please use a fresh virtual environment.
To create the environment run:
conda env create --file=environment-dev.yml
To activate the environment run:
conda activate false-labels
To update this environment with your production dependencies run:
conda env update --file=environment.yml
Project flow:
- download the data and alter it using ./src/data_generator/generate_data.py
- train and test the classification models using ./src/false_labels_effect/main.py
- train and test the regression estimator on the classification results using ./src/evaluation/main.py
Optionally (a review may be required to meet your specific requirements), you can:
- explore the classification data and labels based on ./notebooks/exploratory.ipynb
- visualize classification results based on ./notebooks/result_visualization.ipynb
- visualize regression results based on ./notebooks/result_regression.ipynb
We use `pytest` as the test framework. To execute the tests, please run
python setup.py test
To run the tests with coverage information, please use
python setup.py testcov
and have a look at the `htmlcov` folder after the tests are done.
To use your module code (`src/`) in Jupyter notebooks (`notebooks/`) without running into import errors, make sure to install the source locally:
pip install -e .
This way, you'll always use the latest version of your module code in your notebooks via `import false_labels_effect`.
Assuming you already have Jupyter installed, you can make your virtual environment available as a separate kernel by running:
conda install ipykernel
python -m ipykernel install --user --name="false-labels-effect"
Note that we mainly use notebooks for experiments, visualizations and reports. Every piece of functionality that is meant to be reused should go into module code and be imported into notebooks.
Tensorboard is partially used for logging training, validation and testing results. To view Tensorboard with the logged metrics, open a terminal, activate the created environment, and from the project root directory start Tensorboard via
tensorboard --logdir ./logs/scalars
Then open http://localhost:6006/ in your preferred browser.
Before contributing, please set up the pre-commit hooks to reduce errors and ensure consistency:
pip install -U pre-commit
pre-commit install
© Alexander Thamm GmbH