TL;DR - We propose a method for estimating segmentation model performance when ground truth is not available.
This repository is part of our final project in the EE046211 Deep Learning course at the Technion - Israel Institute of Technology, by Daniel Katz and Natalie Mendelson, spring 2023. You are welcome to read our report; our presentation is available here, and a video in Hebrew is available here.
- Overview
- Motivation
- Model
- Dataset
- Approach
- Uncertainty Metric
- Student T-test between classes
- Entropy between classes
- Entropy between all predictions
- Our files and modifications
- How to use
- Install
- Train
- Inference
- Evaluate Performance
- Acknowledgements
Motivation
Detecting segmentation model errors is crucial in medical imaging due to the consequences of mistakes in this field. Segmentation model performance is usually measured with metrics such as Intersection-over-Union or the Dice coefficient, but these are applicable only when ground truth is present, and during inference we don't have ground truth. Our goal is to introduce a method to estimate a model's prediction uncertainty for medical image segmentation.
Model
We implemented our method on nnU-Net. nnU-Net is a semantic segmentation method that automatically adapts to a given dataset: it analyzes the provided training cases and automatically configures a matching U-Net-based segmentation pipeline. nnU-Net is widely recognized for its exceptional performance in image segmentation tasks. However, one limitation of nnU-Net is the lack of a measure indicating the possibility of failure or uncertainty, particularly in large-scale image segmentation applications with heterogeneous data. This is the issue we address in our project. If you are not familiar with nnU-Net, we advise you to take a look at the nnUNet git and paper.
Dataset
We used a publicly available dataset from the Harvard Cardiac MR Center Dataverse. It contains cardiac T1-weighted images for 210 patients, 5 slices per patient, and 11 T1-weighted images per slice. Manual epi- and endocardial contours are provided for each T1-weighted image, for a total of ~11.5K images and labels.
Approach
The project employs several steps to estimate the uncertainty of nnU-Net predictions. The learning rate schedule was modified to use the cyclic learning rate (CLR) technique: by changing the learning rate in a cyclic manner, the model is driven to converge to multiple minima, and at each of these minima multiple checkpoints are extracted (read about the checkpoint-saving rules in Train).
Next, we make predictions (probability maps) for each class from the extracted checkpoints. These predictions and the variance between them are used to assess prediction uncertainty. Our pipeline can produce an uncertainty map and score using three different methods, as described in Uncertainty Metric below. nnU-Net trains 5-fold cross-validation by default, but fewer folds are also possible. For each fold, prediction is done using the checkpoint with the best Dice metric during training. On further inference, all 5 folds are used to make predictions, and the final prediction is the one with the lowest uncertainty score.
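As an illustration, here is a minimal sketch of one common cyclic schedule (cosine with warm restarts); our actual schedule lives in nnunetv2\training\lr_scheduler\polylr.py and may differ, and the constants below are hypothetical:

```python
import math

def cyclic_lr(epoch: int, base_lr: float = 1e-2,
              num_epochs: int = 1200, num_cycles: int = 4) -> float:
    """Cosine-annealed cyclic learning rate: the LR restarts at base_lr at
    the start of each cycle and decays towards 0 by the cycle's end,
    pushing the model into a different local minimum every cycle."""
    epochs_per_cycle = num_epochs // num_cycles
    t = (epoch % epochs_per_cycle) / epochs_per_cycle  # position in cycle, in [0, 1)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t))

# Example: LR at the start, middle, and end of the first cycle.
for e in (0, 150, 299):
    print(e, round(cyclic_lr(e), 5))
```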
Uncertainty Metric
We implemented three types of uncertainty metrics (a combined code sketch of all three follows the list):
- Student T-test between classes: we run a pixel-wise t-test between the probability maps of each class. For n checkpoints we have a group of n probabilities per class per pixel, so we can run a statistical hypothesis test of H0: the two classes come from the same distribution. After running the test we get a p-value map; we then set the pixels where the p-value is lower than 5% to null.
- Entropy between classes: after obtaining the mean probability maps through ensembling (averaging over all saved checkpoints), we calculate the entropy value for each pixel.
- Entropy between all predictions: we calculate the entropy over all predictions and all classes for each pixel.
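A minimal NumPy/SciPy sketch of the three metrics, assuming `probs` holds the per-checkpoint softmax maps with shape (checkpoints, classes, H, W). The array names and the two-class setup of the t-test are illustrative; the project's actual implementation lives under nnunetv2\unnunet\ (see uncertainty_utils.py and run_uncertainty_on_fold.py):

```python
import numpy as np
from scipy.stats import ttest_ind

def uncertainty_maps(probs: np.ndarray, alpha: float = 0.05):
    """probs: (n_checkpoints, n_classes, H, W) softmax probability maps."""
    eps = 1e-12

    # 1) Student t-test between classes (shown for classes 0 and 1):
    #    pixel-wise test of H0 "both classes come from the same distribution"
    #    across the n checkpoints; confident pixels (p < alpha) are nulled.
    _, p_map = ttest_ind(probs[:, 0], probs[:, 1], axis=0)
    t_test_map = np.where(p_map < alpha, 0.0, p_map)

    # 2) Entropy between classes: average the checkpoints into a mean
    #    probability map, then take the per-pixel entropy over classes.
    mean_probs = probs.mean(axis=0)                    # (n_classes, H, W)
    class_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)

    # 3) Entropy between all predictions: per-pixel entropy over the joint
    #    set of all checkpoints and all classes, renormalized per pixel.
    flat = probs.reshape(-1, *probs.shape[2:])         # (n_ckpt * n_classes, H, W)
    flat = flat / (flat.sum(axis=0) + eps)
    total_entropy = -(flat * np.log(flat + eps)).sum(axis=0)

    return t_test_map, class_entropy, total_entropy
```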
Our files and modifications
We marked our changes within the nnUNet code files with the comment #$. Modified files in the nnUNet package:
| file | modification |
|---|---|
| setup.py | changed to support UnnUNet setup |
| nnunetv2\training\nnUNetTrainer\nnUNetTrainer.py | modified for cyclic lr support and custom checkpoint saving |
| nnunetv2\training\lr_scheduler\polylr.py | modified for cyclic lr support |
| nnunetv2\run\run_training.py | modified for cyclic lr |
| nnunetv2\utilities\utils.py | added an option for temperature in softmax |
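The temperature option mentioned in the last row might look roughly like the following sketch (a hedged illustration, not the exact code in utils.py):

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float = 1.0,
                             axis: int = 0) -> np.ndarray:
    """Softmax over `axis` with a temperature knob: T > 1 softens the
    distribution (higher entropy), T < 1 sharpens it; T = 1 is standard."""
    z = logits / temperature
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)
```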
UnnUNet new files
| file | description |
|---|---|
| nnunetv2\unnunet\predict_from_folder.py | prediction using multiple checkpoints |
| nnunetv2\unnunet\run_uncertainty_on_fold.py | uncertainty map calculation |
| nnunetv2\unnunet\uncertainty_utils.py | utilities for UnnUNet |
| nnunetv2\unnunet\visualize_results_and_correlation.ipynb | Jupyter notebook you can use to visualize results and compute the correlation between Dice and uncertainty |
Install
First, go to the nnUNet installation instructions and make sure you have all prerequisites.
Next, run:

```bash
git clone https://github.com/KanielDatz/UnnUNet.git
cd UnnUNet
pip install -e .
```
UnnUNet needs to know where you intend to save raw data, preprocessed data, and trained models. For this, you need to set a few environment variables. Please follow the instructions here.
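For example, on Linux you might set something like the following (the paths are placeholders, and we assume the variable names follow nnUNet v2's convention; check the linked instructions for the exact names this fork expects):

```bash
export nnUNet_raw="/path/to/UnnUNet_raw"
export nnUNet_preprocessed="/path/to/UnnUNet_preprocessed"
export nnUNet_results="/path/to/UnnUNet_results"
```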
Train
Note: instructions are given for the 2d configuration only, but the method can be applied to all nnUNet configurations. If you wish to dive deeper, we recommend reading the nnUNet 'how to use' file.
Dataset - before you can train your model, prepare your dataset to match the dataset format nnUNet expects; follow the instructions in the nnUNet documentation. Place your dataset in the UnnUNet_raw directory you set as described [here](documentation/setting_up_paths.md).
To run preprocessing on the data:

```bash
nnUNetv2_plan_and_preprocess -d DATASET --verify_dataset_integrity
```

Replace DATASET with your dataset id or name.
After preprocessing, the dataset fingerprint and training plans will be available in the UnnUNet_preprocessed directory.
To train each fold, run (using bash):

```bash
CUDA_VISIBLE_DEVICES=[Index of GPU] nnUNetv2_train [DATASET] 2d [FOLD] --npz -device cuda -num_epochs [NUM_E] -num_of_cycles [Tc] -checkpoints [RULE]
```
where:
- Index of GPU - the index of the GPU to run on on your machine
- DATASET - dataset name or id
- FOLD - which fold you wish to train
- Tc - number of learning-rate cycles; set to 1 to get the regular nnUNet (default is 1)
- NUM_E - total number of epochs (default is 1200)
- RULE - how the checkpoints are saved:
  - sparse - saves 6 evenly spaced checkpoints each cycle, starting at 0.7*(epochs per cycle)
  - late - saves checkpoints from the 10 last epochs of each cycle

You can adjust these as you want; we recommend experimenting with one fold first, e.g. with the example below.
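For example, a hypothetical run on GPU 0, training fold 0 of dataset 100 with 4 cycles and sparse checkpoint saving:

```bash
CUDA_VISIBLE_DEVICES=0 nnUNetv2_train 100 2d 0 --npz -device cuda -num_epochs 1200 -num_of_cycles 4 -checkpoints sparse
```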
Try `nnUNetv2_train -h` for help!
Inference
On inference, we first run a prediction for each checkpoint and then combine the results to output a final prediction and an uncertainty map.
To get probability maps:
- Make a directory with the images you wish to predict, in the nnUNet format.
- Run:

```bash
UnnUnet_predict_from_folder -dataset DATASET -fold FOLD -input_folder INPATH -output_folder OUTPATH -rule [RULE]
```
where:
- DATASET - dataset name or id
- FOLD - the fold whose checkpoints you wish to predict with
- INPATH - path to the folder with the images you want to predict
- OUTPATH - path to the output folder for the probability maps
- RULE - the checkpoint-saving rule used in training: sparse or late
Try `UnnUnet_predict_from_folder -h` for help!
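For example (all values hypothetical):

```bash
UnnUnet_predict_from_folder -dataset 100 -fold 0 -input_folder /data/imagesTs -output_folder /data/unnunet_probs -rule sparse
```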
Now the OUTPATH folder will contain a subfolder with each checkpoint's predictions.
To get an uncertainty map and score:
- Choose the uncertainty method for the calculation.
- Run:

```bash
UnnUnet_run_uncertainty_on_fold --proba_dir PATH --raw_path PATH --labels PATH --score_type TYPE --outpot_pred_path PATH
```
where:
- --proba_dir - path to the folder with the checkpoint folders (the output of the previous script)
- --raw_path - path to the folder with the dataset you want to predict (the input of the previous script)
- --labels - path to the labels of the dataset; optional - if given, the script will add Dice to the final output
- --score_type - the score type to use for the uncertainty score; default is class_entropy, other options are total_entropy and t_test
- --outpot_pred_path - path to the folder where the predictions will be saved; default is proba_dir + /unnunet_pred
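For example (paths hypothetical, using the default score type explicitly):

```bash
UnnUnet_run_uncertainty_on_fold --proba_dir /data/unnunet_probs --raw_path /data/imagesTs --score_type class_entropy
```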
Evaluate Performance
To visualize results and evaluate uncertainty reliability, you can use nnunetv2\unnunet\visualize_results_and_correlation.ipynb.
- When does the score fail? Overall, we see that the score works when the prediction is 'on the right track, but not there yet' or 'somewhat right but not exact', but it does not work when the prediction is entirely wrong. If you wish to see failure examples, head to nnunetv2\unnunet\visualize_results_and_correlation.ipynb and read 'Where does the uncertainty metric fail to predict model performance?' in UnnUnet_documentation\Estimating Uncertainty in nnUnet.pdf.
Acknowledgements
Our teachers:
- Prof. Daniel Soudry and TA Tal Daniel, Electrical and Computer Engineering Department, Technion
- Dr. Moti Freiman and Eyal H., Computational MRI Lab, Biomedical Engineering, Technion
The papers we relied on:
Special Thanks to nnUNet developers! nnU-Net is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging and the Division of Medical Image Computing at the German Cancer Research Center (DKFZ).