Added COVIDNet CXR-S and COVIDxSev for airspace grading to scripts
Maya Pavlova committed Apr 16, 2021
1 parent 74eaec6 commit aeaf3d0
Showing 13 changed files with 1,265 additions and 20 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
.ipynb_checkpoints
data
data_sev
.DS_Store
*.zip
__pycache__
14 changes: 8 additions & 6 deletions README.md
@@ -4,6 +4,7 @@

**Recording to webinar on [How we built COVID-Net in 7 days with Gensynth](https://darwinai.news/fny)**

**Update 04/16/2021:** We released a new COVIDNet CXR-S [model](docs/models.md) and [COVIDxSev](create_COVIDxSev.ipynb) dataset for airspace severity grading in COVID-19 positive patient CXR images. For more information on training, testing and inference please refer to severity [docs](docs/covidnet_severity.md).\
**Update 03/20/2021:** We released a new COVID-Net CXR-2 [model](docs/models.md) for COVID-19 positive/negative detection which was trained on the new COVIDx8B dataset with 16,352 CXR images from a multinational cohort of 15,346 patients from at least 51 countries. The test results are based on the new COVIDx8B test set of 200 COVID-19 positive and 200 negative CXR images.\
**Update 03/19/2021:** We released updated datasets and dataset curation scripts. The COVIDx V8A dataset and create_COVIDx.ipynb are for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V8B dataset and create_COVIDx_binary.ipynb are for COVID-19 positive/negative detection. Both datasets contain over 16000 CXR images with over 2300 positive COVID-19 images.\
**Update 01/28/2021:** We released updated datasets and dataset curation scripts. The COVIDx V7A dataset and create_COVIDx.ipynb are for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V7B dataset and create_COVIDx_binary.ipynb are for COVID-19 positive/negative detection. Both datasets contain over 15600 CXR images with over 1700 positive COVID-19 images.\
@@ -69,12 +70,13 @@ If you find our work useful, can cite our paper using:
## Quick Links
1. COVIDNet-CXR models (COVID-19 detection using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
2. COVIDNet-CT models (COVID-19 detection using chest CT scans): https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/models.md
3. COVIDNet-S models (COVID-19 lung severity assessment using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
4. COVIDx-CXR dataset: https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md
5. COVIDx-CT dataset: https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/dataset.md
6. COVIDx-S dataset: https://github.com/lindawangg/COVID-Net/tree/master/annotations
7. COVIDNet-P inference for pneumonia: https://github.com/lindawangg/COVID-Net/blob/master/docs/covidnet_pneumonia.md
8. CancerNet-SCa models for skin cancer detection: https://github.com/jamesrenhoulee/CancerNet-SCa/blob/main/docs/models.md
3. COVIDNet-CXR-S models (COVID-19 airspace severity grading using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
4. COVIDNet-S models (COVID-19 lung severity assessment using chest x-rays): https://github.com/lindawangg/COVID-Net/blob/master/docs/models.md
5. COVIDx-CXR dataset: https://github.com/lindawangg/COVID-Net/blob/master/docs/COVIDx.md
6. COVIDx-CT dataset: https://github.com/haydengunraj/COVIDNet-CT/blob/master/docs/dataset.md
7. COVIDx-S dataset: https://github.com/lindawangg/COVID-Net/tree/master/annotations
8. COVIDNet-P inference for pneumonia: https://github.com/lindawangg/COVID-Net/blob/master/docs/covidnet_pneumonia.md
9. CancerNet-SCa models for skin cancer detection: https://github.com/jamesrenhoulee/CancerNet-SCa/blob/main/docs/models.md

Training, inference, and evaluation scripts for COVIDNet-CXR, COVIDNet-CT, COVIDNet-S, and CancerNet-SCa models are available at the respective repos

253 changes: 253 additions & 0 deletions create_COVIDxSev.ipynb
@@ -0,0 +1,253 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"import os\n",
"import random \n",
"from shutil import copyfile\n",
"import pydicom as dicom\n",
"import cv2\n",
"import mdai\n",
"import json\n",
"from collections import Counter\n",
"from pathlib import Path"
]
},
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [],
"source": [
"# set data directory here\n",
"savepath = 'data_sev'\n",
"Path(os.path.join(savepath, 'test')).mkdir(parents=True, exist_ok=True)\n",
"Path(os.path.join(savepath, 'train')).mkdir(parents=True, exist_ok=True)\n",
"\n",
"seed = 0\n",
"np.random.seed(seed) # Reset the seed so all runs are the same.\n",
"random.seed(seed)\n",
"MAXVAL = 255 # Range [0 255]\n",
"\n",
"# COVIDxSev also requires the RICORD annotations to be downloaded; set their path here\n",
"ricord_annotations = 'create_ricord_dataset/1c_mdai_rsna_project_MwBeK3Nr_annotations_labelgroup_all_2021-01-08-164102.json'\n",
"\n",
"# path to ricord covid-19 images created by create_ricord_dataset/create_ricord_dataset.ipynb\n",
"# run create_ricord_dataset.ipynb before this notebook\n",
"ricord_imgpath = 'create_ricord_dataset/ricord_images'\n",
"ricord_txt = 'create_ricord_dataset/ricord_data_set.txt'\n",
"ricord_studyids = 'create_ricord_dataset/ricord_patientid_to_studyid_mapping.json'\n",
"\n",
"\n",
"\n",
"# parameters for COVIDx dataset\n",
"train = []\n",
"test = []\n",
"test_count = {'level1': 0,'level2': 0, 'NA': 0}\n",
"train_count = {'level1': 0,'level2': 0, 'NA': 0}\n",
"\n",
"\n",
"\n",
"# to avoid duplicates\n",
"patient_imgpath = {}"
]
},
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [],
"source": [
"mapping = {}\n",
"mapping['Mild Opacities (1-2 lung zones)'] = 'level1'\n",
"mapping['Moderate Opacities (3-4 lung zones)'] = 'level2'\n",
"mapping['Severe Opacities (>4 lung zones)'] = 'level2'\n",
"mapping['Invalid Study'] = 'NA'\n",
"\n",
"classification=[\"Typical Appearance\",\"Indeterminate Appearance\",\"Atypical Appearance\",\"Negative for Pneumonia\"]\n",
"airspace_Disease_Grading=[\"Mild Opacities (1-2 lung zones)\",\"Moderate Opacities (3-4 lung zones)\",\"Severe Opacities (>4 lung zones)\",\"Invalid Study\"]\n",
"\n",
" \n",
" \n",
"def get_label_study(annotations_df, studyid):\n",
"    airspace_grading_labels = []\n",
"    labels = annotations_df[\"annotations\"].loc[annotations_df[\"annotations\"][\"StudyInstanceUID\"]==studyid][\"labelName\"]\n",
"#     print(labels)\n",
"    for label in list(labels):\n",
"        if label in mapping.keys():\n",
"            airspace_grading_labels.append(mapping[label])\n",
"    \n",
"    severity = Counter(airspace_grading_labels).most_common()[0][0] if airspace_grading_labels else 'NA'\n",
"    return severity\n"
]
},
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Data distribution from covid datasets:\n",
"{'level1': 226, 'level2': 683, 'NA': 187}\n"
]
}
],
"source": [
"filename_label = {'level1': [],'level2': [], 'NA': []}\n",
"count = {'level1': 0,'level2': 0, 'NA':0}\n",
"covid_ds = {'ricord': []}\n",
" \n",
"# get ricord file names \n",
"with open(ricord_txt) as f:\n",
"    ricord_file_names = [line.split()[0] for line in f]\n",
" \n",
"# get study ids for every patientid\n",
"with open(ricord_studyids, 'r') as f:\n",
"    studyids = json.load(f)\n",
" \n",
"# load ricord annotations\n",
"annotations = mdai.common_utils.json_to_dataframe(ricord_annotations)\n",
"\n",
"for imagename in ricord_file_names:\n",
"    patientid = imagename.split('-')[3] + '-' + imagename.split('-')[4]\n",
"    study_uuid = imagename.split('-')[-2]\n",
"    \n",
"    # get complete study id from ricord_studyids json file to match to labels stored in ricord annotations\n",
"    for studyid in studyids[patientid]:\n",
"        if studyid[-5:] == study_uuid:\n",
"            severity_level = get_label_study(annotations, studyid)\n",
"            break\n",
"    count[severity_level] += 1\n",
"    entry = [patientid, imagename, severity_level, 'ricord']\n",
"    filename_label[severity_level].append(entry)\n",
"    \n",
"    covid_ds['ricord'].append(patientid)\n",
" \n",
"print('Data distribution from covid datasets:')\n",
"print(count)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"test count: {'level1': 52, 'level2': 98, 'NA': 0}\n",
"train count: {'level1': 174, 'level2': 585, 'NA': 0}\n"
]
}
],
"source": [
"# Write images into train and test directories accordingly\n",
"\n",
"# get test patients from label file\n",
"with open('labels/test_COVIDxSev.txt', 'r') as f:\n",
"    test_patients = [line.split()[0] for line in f]\n",
"\n",
"for label in filename_label.keys():\n",
"    # Skip all studies that do not have an airspace grading\n",
"    if label != 'NA':\n",
"        for image in filename_label[label]:\n",
"            patientid = image[0]\n",
"            if patientid in test_patients:\n",
"                copyfile(os.path.join(ricord_imgpath, image[1]), os.path.join(savepath, 'test', image[1]))\n",
"                test.append(image)\n",
"                test_count[image[2]] += 1\n",
"            else:\n",
"                copyfile(os.path.join(ricord_imgpath, image[1]), os.path.join(savepath, 'train', image[1]))\n",
"                train.append(image)\n",
"                train_count[image[2]] += 1\n",
"\n",
"print('test count: ', test_count)\n",
"print('train count: ', train_count)"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Final stats\n",
"Train count: {'level1': 174, 'level2': 585, 'NA': 0}\n",
"Test count: {'level1': 52, 'level2': 98, 'NA': 0}\n",
"Total length of train: 759\n",
"Total length of test: 150\n"
]
}
],
"source": [
"# final stats\n",
"print('Final stats')\n",
"print('Train count: ', train_count)\n",
"print('Test count: ', test_count)\n",
"print('Total length of train: ', len(train))\n",
"print('Total length of test: ', len(test))"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [],
"source": [
"# export to train and test files\n",
"# format as patientid, filename, label, source, separated by spaces\n",
"# where label is either \"level1\" for mild air space grading or \"level2\" for moderate and severe grading\n",
"with open(\"train_split.txt\",'w') as train_file:\n",
"    for sample in train:\n",
"        info = str(sample[0]) + ' ' + sample[1] + ' ' + sample[2] + ' ' + sample[3] + '\\n'\n",
"        train_file.write(info)\n",
"\n",
"with open(\"test_split.txt\", 'w') as test_file:\n",
"    for sample in test:\n",
"        info = str(sample[0]) + ' ' + sample[1] + ' ' + sample[2] + ' ' + sample[3] + '\\n'\n",
"        test_file.write(info)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
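
The notebook above assigns each study one severity level by mapping every airspace-grading label to `level1`, `level2`, or `NA` and taking the most common result. A minimal standalone sketch of that majority vote follows; the reader labels in `example` are hypothetical, and `majority_severity` is an illustrative name, not a function from the repository.

```python
from collections import Counter

# Same grouping as the notebook: moderate and severe both map to level2.
MAPPING = {
    "Mild Opacities (1-2 lung zones)": "level1",
    "Moderate Opacities (3-4 lung zones)": "level2",
    "Severe Opacities (>4 lung zones)": "level2",
    "Invalid Study": "NA",
}


def majority_severity(label_names):
    """Return the most common mapped severity, or 'NA' if no grading label exists."""
    graded = [MAPPING[name] for name in label_names if name in MAPPING]
    if not graded:
        return "NA"
    return Counter(graded).most_common()[0][0]


# Hypothetical labels attached to one study by several readers.
example = [
    "Typical Appearance",  # classification label, ignored by the mapping
    "Mild Opacities (1-2 lung zones)",
    "Severe Opacities (>4 lung zones)",
    "Moderate Opacities (3-4 lung zones)",
]
print(majority_severity(example))  # level2
```

Because moderate and severe gradings share one target, two readers who disagree between those grades still agree on `level2`. On current CPython a tie between `level1` and `level2` should go to whichever label was seen first.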

Large diffs are not rendered by default.

5 changes: 0 additions & 5 deletions data.py
@@ -7,10 +7,6 @@

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# To remove TF Warnings
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

def crop_top(img, percent=0.15):
    offset = int(img.shape[0] * percent)
    return img[offset:]
@@ -131,7 +127,6 @@ def __init__(
datasets[l.split()[2]].append(l)

if self.is_severity_model:
# For COVIDNet CXR-S upsample the severity level 1 cases to create balanced 50/50 batches
self.datasets = [
datasets['level2'], datasets['level1']
]
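
The comment in this hunk says the level 1 cases are upsampled to create balanced 50/50 batches. The generator code that does this is not shown in the diff, so the following is only a sketch of one way to draw such a batch, assuming sampling with replacement from each class list. `balanced_batch` and the item names are hypothetical; the class sizes are the train counts printed by create_COVIDxSev.ipynb.

```python
import random


def balanced_batch(level2_items, level1_items, batch_size, rng):
    """Draw half of the batch from each class, sampling with replacement."""
    half = batch_size // 2
    batch = rng.choices(level2_items, k=batch_size - half)
    batch += rng.choices(level1_items, k=half)
    rng.shuffle(batch)  # avoid a fixed class order inside the batch
    return batch


rng = random.Random(0)
level2 = [f"l2_{i}" for i in range(585)]  # placeholder entries, not real file names
level1 = [f"l1_{i}" for i in range(174)]
batch = balanced_batch(level2, level1, batch_size=8, rng=rng)
print(sum(item.startswith("l1_") for item in batch))  # 4
```

With 174 level 1 and 585 level 2 training images, sampling each class separately means the smaller class appears in every batch at the same rate as the larger one.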
3 changes: 2 additions & 1 deletion docs/COVIDx.md
@@ -1,4 +1,5 @@
# COVIDx Dataset
**Update 04/16/2021:Released COVIDxSev, a new airspace severity grading dataset for COVID-19 positive patients for COVIDNet CXR-S model.**\
**Update 03/19/2021:Released new datasets with both over 16,000 CXR images from a multinational cohort of over 15,100 patients from at least 51 countries. The dataset contains over 2,300 positive COVID-19 images from over 1,500 patients. The COVIDx V8A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V8B dataset is for COVID-19 positive/negative detection.**\
**Update 01/28/2021:Released new datasets with over 15600 CXR images and over 1700 positive COVID-19 images. The COVIDx V7A dataset is for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V7B dataset is for COVID-19 positive/negative detection.**\
**Update 01/05/2021: Released new dataset for binary classification (COVID-19 positive or COVID-19 negative). Train dataset contains 517 positive and 13794 negative samples. Test dataset contains 100 positive and 100 negative samples.**\
@@ -24,7 +25,7 @@ The current COVIDx dataset is constructed by the following open source chest rad
* `git clone https://github.com/agchung/Actualmed-COVID-chestxray-dataset.git`
* go to this [link](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database/version/3) to download the COVID-19 Radiography database. Only the COVID-19 image folder and metadata file is required. The overlaps between covid-chestxray-dataset are handled in the dataset curation scripts. **Note:** for COVIDx versions 8 & 7 please use Version 3 of the dataset, and for versions COVIDx6 and below please use Version 1.
* go to this [link](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/data) to download the RSNA pneumonia dataset
* go to this [link](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281) to download the RICORD COVID-19 dataset and clinical data csv
* go to this [link](https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70230281) to download the RICORD COVID-19 dataset, clinical data csv, and annotations
2. Create a `data` directory and within the data directory, create a `train` and `test` directory
3. Use [create\_ricord\_dataset\\create\_ricord\_dataset.ipynb](../create_ricord_dataset/create_ricord_dataset.ipynb) to pre-process the RICORD dataset before handling.
3. Use [create\_COVIDx\_binary.ipynb](../create_COVIDx_binary.ipynb) to combine the three datasets to create COVIDx for binary classification. Make sure to remember to change the file paths. Use [create\_COVIDx.ipynb](../create_COVIDx.ipynb) for datasets compatible with COVIDx5 and earlier models (not binary classification).
70 changes: 70 additions & 0 deletions docs/covidnet_severity.md
@@ -1,3 +1,73 @@
# COVIDNet CXR-S Air Space Severity Grading
The COVIDNet CXR-S model takes as input a batch of chest x-ray images of shape (N, 480, 480, 3), where N is the batch size,
and outputs the airspace severity of a SARS-CoV-2 positive patient. The airspace severity is grouped into two levels: 1) Level 1: opacities in 1-2 lung zones, and 2) Level 2: opacities in 3 or more lung zones.

If using the TF checkpoints, here are some useful tensors:

* input tensor: `input_1:0`
* logit tensor: `norm_dense_2/MatMul:0`
* output tensor: `norm_dense_2/Softmax:0`
* label tensor: `norm_dense_1_target:0`
* class weights tensor: `norm_dense_1_sample_weights:0`
* loss tensor: `Mean:0`
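
The logit tensor and the output tensor listed above are related by a softmax over the two severity classes. The sketch below shows that relationship in plain Python and decodes each row into a label. The index-to-label order in `CLASS_NAMES` is an assumption based on the ordering in data.py and should be checked against the scripts before use; the logit values are made up.

```python
import math

# Assumed index-to-label order; confirm against the repository's scripts.
CLASS_NAMES = ["level2", "level1"]


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits_batch):
    """Map each row of logits to (predicted label, probability of that label)."""
    results = []
    for logits in logits_batch:
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((CLASS_NAMES[idx], probs[idx]))
    return results


# Hypothetical logits for a batch of two images.
for label, confidence in decode([[2.0, 0.5], [-1.0, 1.0]]):
    print(label, round(confidence, 3))
```

When reading `norm_dense_2/Softmax:0` directly, the softmax step is already applied and only the argmax and label lookup remain.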

### Steps for training
To train the model, the COVIDxSev dataset is required. To create the dataset, please run [create_COVIDxSev.ipynb](../create_COVIDxSev.ipynb).
To train with TensorFlow from a pretrained model:
1. We provide you with the TensorFlow training script, [train_tf.py](../train_tf.py)
2. Locate the tensorflow checkpoint files (location of pretrained model)
3. To train from the COVIDNet-CXR-S pretrained model:
```
python3 train_tf.py \
--weightspath models/COVIDNet-CXR-S \
--metaname model.meta \
--ckptname model \
--n_classes 2 \
--datadir data_sev \
--trainfile labels/train_COVIDxSev.txt \
--testfile labels/test_COVIDxSev.txt \
--out_tensorname norm_dense_2/Softmax:0 \
--logit_tensorname norm_dense_2/MatMul:0 \
--is_severity_model
```
4. For more options and information, `python train_tf.py --help`
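
The split files written by create_COVIDxSev.ipynb hold one sample per line as space-separated `patientid filename label source`. Before training, it can help to confirm the class distribution of a split. The sketch below parses lines in that format; the sample lines and the `parse_split_lines` helper are hypothetical, and it is an assumption that the files under `labels/` use the same layout.

```python
from collections import Counter


def parse_split_lines(lines):
    """Parse 'patientid filename label source' lines into dictionaries."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        patientid, filename, label, source = parts[:4]
        records.append(
            {"patientid": patientid, "filename": filename, "label": label, "source": source}
        )
    return records


# Hypothetical lines in the split-file format.
sample_lines = [
    "patient-0001 example-image-1.png level1 ricord",
    "patient-0002 example-image-2.png level2 ricord",
    "patient-0002 example-image-3.png level2 ricord",
    "",
]
records = parse_split_lines(sample_lines)
label_counts = Counter(record["label"] for record in records)
print(label_counts["level1"], label_counts["level2"])  # 1 2
```

To check a real split, pass an open file handle instead of `sample_lines`, for example `parse_split_lines(open("train_split.txt"))`.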

### Steps for evaluation
To evaluate the model, the COVIDxSev dataset is required. To create the dataset, please run [create_COVIDxSev.ipynb](../create_COVIDxSev.ipynb).
1. We provide you with the tensorflow evaluation script, [eval.py](../eval.py)
2. Locate the tensorflow checkpoint files
3. To evaluate a tf checkpoint:
```
python eval.py \
--weightspath models/COVIDNet-CXR-S \
--metaname model.meta \
--ckptname model \
--n_classes 2 \
--testfolder data_sev/test \
--testfile labels/test_COVIDxSev.txt \
--out_tensorname norm_dense_2/Softmax:0 \
--is_severity_model
```
4. For more options and information, `python eval.py --help`
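
For a two-class severity model, a confusion matrix and per-class sensitivity are a natural summary of an evaluation run. The sketch below is not taken from eval.py, whose metrics are not shown here. It only illustrates how such numbers can be computed from true and predicted labels, using made-up predictions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def sensitivity(matrix):
    """Per-class sensitivity: correct predictions divided by true count."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical labels for five test images.
y_true = ["level1", "level1", "level2", "level2", "level2"]
y_pred = ["level1", "level2", "level2", "level2", "level1"]
matrix = confusion_matrix(y_true, y_pred, labels=["level2", "level1"])
print(matrix)  # [[2, 1], [1, 1]]
print(sensitivity(matrix))
```

Reporting sensitivity per class matters here because the test split is imbalanced (52 level 1 and 98 level 2 images), so overall accuracy alone can hide weak performance on the smaller class.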

### Steps for inference
**DISCLAIMER: Do not use this prediction for self-diagnosis. You should check with your local authorities for the latest advice on seeking medical assistance.**

1. Download a model from the [pretrained models section](models.md)
2. Locate the model files and the x-ray image to run inference on
3. To run inference:
```
python inference.py \
--weightspath models/COVIDNet-CXR-S \
--metaname model.meta \
--ckptname model \
--n_classes 2 \
--imagepath assets/ex-covid.jpeg \
--out_tensorname norm_dense_2/Softmax:0 \
--is_severity_model
```
4. For more options and information, `python inference.py --help`

# COVIDNet Lung Severity Scoring
COVIDNet-S-GEO and COVIDNet-S-OPC models take as input a batch of chest x-ray images of shape (N, 480, 480, 3), where N is the batch size, and output the SARS-CoV-2 severity scores for geographic extent and opacity extent, respectively. COVIDNet-S-GEO predicts the geographic severity. Geographic severity is based on the geographic extent score for right and left lung. For each lung: 0 = no involvement; 1 = <25%; 2 = 25-50%; 3 = 50-75%; 4 = >75% involvement, resulting in scores from 0 to 8. COVIDNet-S-OPC predicts the opacity severity. Opacity severity is based on the opacity extent score for right and left lung. For each lung, the score is from 0 to 4, with 0 = no opacity and 4 = white-out, resulting in scores from 0 to 8. For a detailed description of the COVIDNet lung severity scoring methodology, see the paper [here](https://arxiv.org/abs/2005.12855).
