Added new COVID-Net CXR-2 model and updated scripts to handle both binary and 3 class detection
lindawangg · Mar 18, 2021 · 28aac68
1 parent 3e3aa45
commit 28aac68
Showing 10 changed files with 16,982 additions and 51 deletions.
diff --git a/README.md b/README.md
@@ -4,6 +4,7 @@
 
 **Recording to webinar on [How we built COVID-Net in 7 days with Gensynth](https://darwinai.news/fny)**
 
+**Update 03/19/2021:** We released a new COVID-Net CXR-2 [model](docs/models.md) for COVID-19 positive/negative detection which was trained on the new COVIDx8B dataset with over 16,500 CXR images from a multinational cohort of 15,528 patients from at least 51 countries. The test results are based on the new COVIDx8B test set of 200 COVID-19 positive and 200 negative CXR images.\
 **Update 01/28/2021:** We released updated datasets and dataset curation scripts. The COVIDx V7A dataset and create_COVIDx.ipynb are for detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia, and COVIDx V7B dataset and create_COVIDx_binary.ipynb are for COVID-19 positive/negative detection. Both datasets contain over 15600 CXR images with over 1700 positive COVID-19 images.\
 **Update 01/05/2021:** We released a new COVIDx6 dataset for binary classification (COVID-19 positive or COVID-19 negative) with over 14500 CXR images and 617 positive COVID-19 images.\
 **Update 11/24/2020:** We released [CancerNet-SCa](https://github.com/jamesrenhoulee/CancerNet-SCa) for skin cancer detection, part of the CancerNet initiatives.\
@@ -18,9 +19,9 @@
 **Update 04/16/2020:** If you have questions, please check the new [FAQ](docs/FAQ.md) page first.
 
 <p align="center">
-	<img src="assets/covidnetv3-3p-rca.png" alt="photo not available" width="70%" height="70%">
+	<img src="assets/covidnet-cxr-2.png" alt="photo not available" width="70%" height="70%">
 	<br>
-	<em>Example chest radiography images of COVID-19 cases from 2 different patients and their associated critical factors (highlighted in red) as identified by GSInquire.</em>
+	<em>COVID-Net CXR-2 for COVID-19 positive/negative detection architecture and example chest radiography images of COVID-19 cases from 2 different patients and their associated critical factors (highlighted in red) as identified by GSInquire.</em>
 </p>
 
 The COVID-19 pandemic continues to have a devastating effect on the health and well-being of the global population.  A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiology examination using chest radiography.  It was found in early studies that patients present abnormalities in chest radiography images that are characteristic of those infected with COVID-19.  Motivated by this and inspired by the open source efforts of the research community, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest X-ray (CXR) images that is open source and available to the general public. To the best of the authors' knowledge, COVID-Net is one of the first open source network designs for COVID-19 detection from CXR images at the time of initial release.  We also introduce COVIDx, an open access benchmark dataset that we generated comprising 13,975 CXR images across 13,870 patient cases, with the largest number of publicly available COVID-19 positive cases to the best of the authors' knowledge.  Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to not only gain deeper insights into critical factors associated with COVID cases, which can aid clinicians in improved screening, but also audit COVID-Net in a responsible and transparent manner to validate that it is making decisions based on relevant information from the CXR images.  **By no means a production-ready solution**, the hope is that the open access COVID-Net, along with the description on constructing the open source COVIDx dataset, will be leveraged and built upon by both researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and accelerate treatment of those who need it the most.
@@ -127,6 +128,35 @@ Additional requirements to generate dataset:
 ## Results
 These are the final results for the COVIDNet models.
 
+### COVIDNet-CXR-2 on COVIDx8B (200 COVID-19 test)
+<div class="tg-wrap"><table class="tg">
+  <tr>
+    <th class="tg-7btt" colspan="2">Sensitivity (%)</th>
+  </tr>
+  <tr>
+    <td class="tg-7btt">Negative</td>
+    <td class="tg-7btt">Positive</td>
+  </tr>
+  <tr>
+    <td class="tg-c3ow">96.5</td>
+    <td class="tg-c3ow">95.5</td>
+  </tr>
+</table></div>
+
+<div class="tg-wrap"><table class="tg">
+  <tr>
+    <th class="tg-7btt" colspan="2">Positive Predictive Value (%)</th>
+  </tr>
+  <tr>
+    <td class="tg-7btt">Negative</td>
+    <td class="tg-7btt">Positive</td>
+  </tr>
+  <tr>
+    <td class="tg-c3ow">95.5</td>
+    <td class="tg-c3ow">96.5</td>
+  </tr>
+</table></div>
+
 ### COVIDNet-CXR4-A on COVIDx4 (100 COVID-19 test)
 <div class="tg-wrap"><table class="tg">
   <tr>
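
As a cross-check on the COVIDNet-CXR-2 tables in this README hunk, here is a minimal sketch of how per-class sensitivity and positive predictive value follow from a confusion matrix, using the same row and column sums as `eval.py`. The counts are an assumption: they are back-computed from the reported sensitivities on 200 test images per class and are not published in the README.

```python
# Rows are true classes, columns are predicted classes, ordered
# (negative, positive). ASSUMPTION: counts are back-computed from the
# reported sensitivities on 200 test images per class.
matrix = [
    [193, 7],    # true negative: 193 predicted negative, 7 predicted positive
    [9, 191],    # true positive: 9 predicted negative, 191 predicted positive
]


def sensitivity(cm):
    """Per-class recall: diagonal entry divided by its row sum."""
    return [cm[i][i] / sum(cm[i]) if sum(cm[i]) else 0 for i in range(len(cm))]


def ppv(cm):
    """Per-class precision: diagonal entry divided by its column sum."""
    col_sums = [sum(row[j] for row in cm) for j in range(len(cm))]
    return [cm[j][j] / col_sums[j] if col_sums[j] else 0 for j in range(len(cm))]


def accuracy(cm):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


sens_pct = [round(100 * s, 1) for s in sensitivity(matrix)]
ppv_pct = [round(100 * p, 1) for p in ppv(matrix)]
acc_pct = round(100 * accuracy(matrix), 1)
print('Sens', sens_pct)
print('PPV', ppv_pct)
print('Accuracy', acc_pct)
```

Under these assumed counts the sketch reproduces the 96.5/95.5 sensitivity and 95.5/96.5 PPV figures, along with the 96.0 accuracy listed for COVIDNet-CXR-2 in `docs/models.md`.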

diff --git a/assets/covidnet-cxr-2.png b/assets/covidnet-cxr-2.png
diff --git a/data.py b/data.py
@@ -7,6 +7,10 @@
 
 from tensorflow.keras.preprocessing.image import ImageDataGenerator
 
+# To remove TF Warnings
+tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
+os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
+
 def crop_top(img, percent=0.15):
     offset = int(img.shape[0] * percent)
     return img[offset:]
@@ -85,17 +89,16 @@ def __init__(
             is_training=True,
             batch_size=8,
             input_shape=(224, 224),
-            n_classes=3,
+            n_classes=2,
             num_channels=3,
             mapping={
-                'normal': 0,
-                'pneumonia': 1,
-                'COVID-19': 2
+                'negative': 0,
+                'positive': 1,
             },
             shuffle=True,
             augmentation=apply_augmentation,
-            covid_percent=0.3,
-            class_weights=[1., 1., 6.],
+            covid_percent=0.5,
+            class_weights=[1., 1.],
             top_percent=0.08
     ):
         'Initialization'
@@ -108,20 +111,34 @@ def __init__(
         self.n_classes = n_classes
         self.num_channels = num_channels
         self.mapping = mapping
-        self.shuffle = True
+        self.shuffle = shuffle
         self.covid_percent = covid_percent
         self.class_weights = class_weights
         self.n = 0
         self.augmentation = augmentation
         self.top_percent = top_percent
 
-        datasets = {'normal': [], 'pneumonia': [], 'COVID-19': []}
+        datasets = {}
+        for key in self.mapping.keys():
+            datasets[key] = []
+
         for l in self.dataset:
-            datasets[l.split()[2]].append(l)
-        self.datasets = [
-            datasets['normal'] + datasets['pneumonia'],
-            datasets['COVID-19'],
-        ]
+            if l.split()[-1] == 'sirm':
+                datasets[l.split()[3]].append(l)
+            else:
+                datasets[l.split()[2]].append(l)
+
+        if self.n_classes == 2:
+            self.datasets = [
+                datasets['negative'], datasets['positive']
+            ]
+        elif self.n_classes == 3:
+            self.datasets = [
+                datasets['normal'] + datasets['pneumonia'],
+                datasets['COVID-19'],
+            ]
+        else:
+            raise Exception('Only binary or 3 class classification currently supported.')
         print(len(self.datasets[0]), len(self.datasets[1]))
 
         self.on_epoch_end()
@@ -170,6 +187,10 @@ def __getitem__(self, idx):
         for i in range(len(batch_files)):
             sample = batch_files[i].split()
 
+            # Remove first item from sirm samples for proper indexing as a result of spacing in file name
+            if sample[-1] == 'sirm':
+                sample.pop(0)
+
             if self.is_training:
                 folder = 'train'
             else:
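
The label-file handling in the `data.py` changes can be sketched in isolation. This is a minimal illustration, assuming whitespace-separated lines where the class label is the third field, except for lines ending in `sirm`, whose file name contains a space and so shifts the label to the fourth field. The helper names `group_by_label` and `sampling_pools` and the sample lines are hypothetical, and `ValueError` stands in for the generic `Exception` used in the diff.

```python
def group_by_label(lines, mapping):
    """Bucket label-file lines by class name.

    ASSUMPTION: the label is field 2, or field 3 when the line ends in
    'sirm' (a space in the file name adds one extra field).
    """
    groups = {key: [] for key in mapping}
    for line in lines:
        fields = line.split()
        label_idx = 3 if fields[-1] == 'sirm' else 2
        groups[fields[label_idx]].append(line)
    return groups


def sampling_pools(groups, n_classes):
    """Return the two pools [non-COVID, COVID] that the generator keeps."""
    if n_classes == 2:
        return [groups['negative'], groups['positive']]
    if n_classes == 3:
        return [groups['normal'] + groups['pneumonia'], groups['COVID-19']]
    raise ValueError('Only binary or 3 class classification is supported.')


# Hypothetical sample lines, made up for illustration only.
sample_lines = [
    'p001 image_a.png negative rsna',
    'p002 image_b.png positive cohen',
    'p003 COVID-19 (7).png positive sirm',
]
groups = group_by_label(sample_lines, {'negative': 0, 'positive': 1})
pools = sampling_pools(groups, n_classes=2)
print(len(pools[0]), len(pools[1]))
```

Computing the label index once per line avoids calling `split()` twice and keeps the `sirm` special case in a single place.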

diff --git a/docs/models.md b/docs/models.md
@@ -3,6 +3,7 @@
 ## COVIDNet Chest X-Ray Classification
 |  Type | Input Resolution | COVID-19 Sensitivity | Accuracy | # Params (M) | MACs (G) |        Model        |
 |:-----:|:----------------:|:--------------------:|:--------:|:------------:|:--------:|:-------------------:|
+|  ckpt |      480x480     |         95.5         |   96.0   |      8.8    |  5.55   |[COVIDNet-CXR-2](https://bit.ly/COVIDNet-CXR-2)|
 |  ckpt |      480x480     |         95.0         |   94.3   |      40.2    |  23.63   |[COVIDNet-CXR4-A](https://bit.ly/COVIDNet-CXR4-A)|
 |  ckpt |      480x480     |         93.0         |   93.7   |      11.7    |   7.50   |[COVIDNet-CXR4-B](https://bit.ly/COVIDNet-CXR4-B)|
 |  ckpt |      480x480     |         96.0         |   93.3   |       9.2    |   5.55   |[COVIDNet-CXR4-C](https://bit.ly/COVIDNet-CXR4-C)|

diff --git a/docs/train_eval_inference.md b/docs/train_eval_inference.md
@@ -1,5 +1,64 @@
 # Training, Evaluation and Inference
-COVIDNet-CXR4 models takes as input an image of shape (N, 480, 480, 3) and outputs the softmax probabilities as (N, 3), where N is the number of batches.
+## COVID-19 positive/negative detection
+The COVIDNet-CXR-2 model takes as input a batch of images of shape (N, 480, 480, 3) and outputs the softmax probabilities for COVID-19 negative and positive as (N, 2), where N is the number of images in the batch.
+If using the TF checkpoints, here are some useful tensors:
+
+* input tensor: `input_1:0`
+* logit tensor: `norm_dense_2/MatMul:0`
+* output tensor: `norm_dense_2/Softmax:0`
+* label tensor: `norm_dense_1_target:0`
+* class weights tensor: `norm_dense_1_sample_weights:0`
+* loss tensor: `Mean:0`
+
+### Steps for training
+TF training script from a pretrained model:
+1. We provide you with the TensorFlow training script, [train_tf.py](../train_tf.py)
+2. Locate the tensorflow checkpoint files (location of pretrained model)
+3. To train from the COVIDNet-CXR-2 pretrained model:
+```
+python train_tf.py \
+    --weightspath models/COVIDNet-CXR-2 \
+    --metaname model.meta \
+    --ckptname model \
+    --n_classes 2 \
+    --trainfile train_COVIDx8B.txt \
+    --testfile test_COVIDx8B.txt \
+```
+4. For more options and information, `python train_tf.py --help`
+
+### Steps for evaluation
+
+1. We provide you with the tensorflow evaluation script, [eval.py](../eval.py)
+2. Locate the tensorflow checkpoint files
+3. To evaluate a tf checkpoint:
+```
+python eval.py \
+    --weightspath models/COVIDNet-CXR-2 \
+    --metaname model.meta \
+    --ckptname model \
+    --n_classes 2 \
+    --testfile test_COVIDx8B.txt
+```
+4. For more options and information, `python eval.py --help`
+
+### Steps for inference
+**DISCLAIMER: Do not use this prediction for self-diagnosis. You should check with your local authorities for the latest advice on seeking medical assistance.**
+
+1. Download a model from the [pretrained models section](models.md)
+2. Locate the model and the X-ray image to run inference on
+3. To run inference:
+```
+python inference.py \
+    --weightspath models/COVIDNet-CXR-2 \
+    --metaname model.meta \
+    --ckptname model \
+    --n_classes 2 \
+    --imagepath assets/ex-covid.jpeg
+```
+4. For more options and information, `python inference.py --help`
+
+## Detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia
+COVIDNet-CXR4 models take as input an image of shape (N, 480, 480, 3) and output the softmax probabilities as (N, 3), where N is the number of batches.
 If using the TF checkpoints, here are some useful tensors:
 
 * input tensor: `input_1:0`
@@ -9,7 +68,7 @@ If using the TF checkpoints, here are some useful tensors:
 * class weights tensor: `norm_dense_1_sample_weights:0`
 * loss tensor: `loss/mul:0`
 
-## Steps for training
+### Steps for training
 TF training script from a pretrained model:
 1. We provide you with the tensorflow evaluation script, [train_tf.py](../train_tf.py)
 2. Locate the tensorflow checkpoint files (location of pretrained model)
@@ -19,12 +78,13 @@ python train_tf.py \
     --weightspath models/COVIDNet-CXR4-A \
     --metaname model.meta \
     --ckptname model-18540 \
+    --n_classes 3 \
     --trainfile train_COVIDx5.txt \
     --testfile test_COVIDx5.txt \
 ```
 4. For more options and information, `python train_tf.py --help`
 
-## Steps for evaluation
+### Steps for evaluation
 
 1. We provide you with the tensorflow evaluation script, [eval.py](../eval.py)
 2. Locate the tensorflow checkpoint files
@@ -33,11 +93,13 @@ python train_tf.py \
 python eval.py \
     --weightspath models/COVIDNet-CXR4-A \
     --metaname model.meta \
-    --ckptname model-18540
+    --ckptname model-18540 \
+    --n_classes 3 \
+    --testfile test_COVIDx7A.txt
 ```
 4. For more options and information, `python eval.py --help`
 
-## Steps for inference
+### Steps for inference
 **DISCLAIMER: Do not use this prediction for self-diagnosis. You should check with your local authorities for the latest advice on seeking medical assistance.**
 
 1. Download a model from the [pretrained models section](models.md)
@@ -48,6 +110,7 @@ python inference.py \
     --weightspath models/COVIDNet-CXR4-A \
     --metaname model.meta \
     --ckptname model-18540 \
+    --n_classes 3 \
     --imagepath assets/ex-covid.jpeg
 ```
 4. For more options and information, `python inference.py --help`
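
To make the documented output shape concrete, the following sketch applies a softmax to a batch of logits and maps each row to a class label. It is an illustration only, assuming made-up logit values and the `negative`/`positive` mapping used for two-class detection; it does not load the actual checkpoint.

```python
import math

# Two-class mapping from the scripts; inverted to turn an index into a label.
mapping = {'negative': 0, 'positive': 1}
inv_mapping = {index: name for name, index in mapping.items()}


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# ASSUMPTION: made-up logits for a batch of N = 3 images, shape (3, 2).
batch_logits = [[2.0, -1.0], [0.0, 0.0], [-0.5, 1.5]]

probs = [softmax(row) for row in batch_logits]  # shape (N, 2)
labels = [inv_mapping[row.index(max(row))] for row in probs]
print(labels)
```

Each row of `probs` sums to 1, and the argmax over the class axis selects the predicted label, which is the same step `inference.py` performs with `pred.argmax(axis=1)`.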

diff --git a/eval.py b/eval.py
@@ -6,9 +6,11 @@
 
 from data import process_image_file
 
-mapping = {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
+# To remove TF Warnings
+tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
+os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
 
-def eval(sess, graph, testfile, testfolder, input_tensor, output_tensor, input_size):
+def eval(sess, graph, testfile, testfolder, input_tensor, output_tensor, input_size, mapping):
     image_tensor = graph.get_tensor_by_name(input_tensor)
     pred_tensor = graph.get_tensor_by_name(output_tensor)
 
@@ -29,23 +31,22 @@ def eval(sess, graph, testfile, testfolder, input_tensor, output_tensor, input_s
     print(matrix)
     #class_acc = np.array(cm_norm.diagonal())
     class_acc = [matrix[i,i]/np.sum(matrix[i,:]) if np.sum(matrix[i,:]) else 0 for i in range(len(matrix))]
-    print('Sens Normal: {0:.3f}, Pneumonia: {1:.3f}, COVID-19: {2:.3f}'.format(class_acc[0],
-                                                                               class_acc[1],
-                                                                               class_acc[2]))
+
+    mapping_keys = list(mapping.keys())
+    print('Sens', ' '.join(mapping_keys[i].capitalize() + ': ' + str(class_acc[i]) + ' ' for i in range(len(mapping))))
     ppvs = [matrix[i,i]/np.sum(matrix[:,i]) if np.sum(matrix[:,i]) else 0 for i in range(len(matrix))]
-    print('PPV Normal: {0:.3f}, Pneumonia {1:.3f}, COVID-19: {2:.3f}'.format(ppvs[0],
-                                                                             ppvs[1],
-                                                                             ppvs[2]))
+    print('PPV', ' '.join(mapping_keys[i].capitalize() + ': ' + str(ppvs[i]) + ' ' for i in range(len(mapping))))
 
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='COVID-Net Evaluation')
-    parser.add_argument('--weightspath', default='models/COVIDNet-CXR4-A', type=str, help='Path to output folder')
+    parser.add_argument('--weightspath', default='models/COVIDNet-CXR-2', type=str, help='Path to output folder')
     parser.add_argument('--metaname', default='model.meta', type=str, help='Name of ckpt meta file')
-    parser.add_argument('--ckptname', default='model-18540', type=str, help='Name of model ckpts')
-    parser.add_argument('--testfile', default='test_COVIDx5.txt', type=str, help='Name of testfile')
+    parser.add_argument('--ckptname', default='model', type=str, help='Name of model ckpts')
+    parser.add_argument('--n_classes', default=2, type=int, help='Number of detected classes, defaults to 2')
+    parser.add_argument('--testfile', default='labels/test_COVIDx8B.txt', type=str, help='Name of testfile')
     parser.add_argument('--testfolder', default='data/test', type=str, help='Folder where test data is located')
     parser.add_argument('--in_tensorname', default='input_1:0', type=str, help='Name of input tensor to graph')
-    parser.add_argument('--out_tensorname', default='norm_dense_1/Softmax:0', type=str, help='Name of output tensor from graph')
+    parser.add_argument('--out_tensorname', default='norm_dense_2/Softmax:0', type=str, help='Name of output tensor from graph')
     parser.add_argument('--input_size', default=480, type=int, help='Size of input (ex: if 480x480, --input_size 480)')
 
     args = parser.parse_args()
@@ -60,4 +61,22 @@ def eval(sess, graph, testfile, testfolder, input_tensor, output_tensor, input_s
     file = open(args.testfile, 'r')
     testfile = file.readlines()
 
-    eval(sess, graph, testfile, args.testfolder, args.in_tensorname, args.out_tensorname, args.input_size)
+    if args.n_classes == 2:
+        # For COVID-19 positive/negative detection
+        mapping = {
+            'negative': 0,
+            'positive': 1,
+        }
+    elif args.n_classes == 3:
+        # For detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia
+        mapping = {
+            'normal': 0,
+            'pneumonia': 1,
+            'COVID-19': 2
+        }
+    else:
+        raise Exception('''COVID-Net currently only supports 2 class COVID-19 positive/negative detection
+            or 3 class detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia''')
+
+
+    eval(sess, graph, testfile, args.testfolder, args.in_tensorname, args.out_tensorname, args.input_size, mapping)
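
The branch on `--n_classes` appears in both `eval.py` and `inference.py`. One possible way to express it once is a small helper, sketched below. The function names `get_mapping` and `format_metrics` are hypothetical and not part of the repository, and `ValueError` is a substitution for the generic `Exception` raised in the diff.

```python
def get_mapping(n_classes):
    """Return the class-name to index mapping for 2 or 3 class detection."""
    if n_classes == 2:
        # COVID-19 positive/negative detection
        return {'negative': 0, 'positive': 1}
    if n_classes == 3:
        # no pneumonia / non-COVID-19 pneumonia / COVID-19 pneumonia
        return {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
    raise ValueError('n_classes must be 2 or 3, got {}'.format(n_classes))


def format_metrics(prefix, values, mapping):
    """Build one report line with each value rounded to three decimals."""
    names = sorted(mapping, key=mapping.get)  # order names by class index
    parts = ['{}: {:.3f}'.format(name.capitalize(), value)
             for name, value in zip(names, values)]
    return prefix + ' ' + ', '.join(parts)


line = format_metrics('Sens', [0.965, 0.955], get_mapping(2))
print(line)
```

Formatting with `{:.3f}` keeps the three-decimal output that the earlier hard-coded print statements produced, while still working for either class count.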
diff --git a/inference.py b/inference.py
@@ -6,19 +6,30 @@
 from data import process_image_file
 
 parser = argparse.ArgumentParser(description='COVID-Net Inference')
-parser.add_argument('--weightspath', default='models/COVIDNet-CXR4-A', type=str, help='Path to output folder')
+parser.add_argument('--weightspath', default='models/COVIDNet-CXR-2', type=str, help='Path to output folder')
 parser.add_argument('--metaname', default='model.meta', type=str, help='Name of ckpt meta file')
-parser.add_argument('--ckptname', default='model-18540', type=str, help='Name of model ckpts')
+parser.add_argument('--ckptname', default='model', type=str, help='Name of model ckpts')
+parser.add_argument('--n_classes', default=2, type=int, help='Number of detected classes, defaults to 2')
 parser.add_argument('--imagepath', default='assets/ex-covid.jpeg', type=str, help='Full path to image to be inferenced')
 parser.add_argument('--in_tensorname', default='input_1:0', type=str, help='Name of input tensor to graph')
-parser.add_argument('--out_tensorname', default='norm_dense_1/Softmax:0', type=str, help='Name of output tensor from graph')
+parser.add_argument('--out_tensorname', default='norm_dense_2/Softmax:0', type=str, help='Name of output tensor from graph')
 parser.add_argument('--input_size', default=480, type=int, help='Size of input (ex: if 480x480, --input_size 480)')
 parser.add_argument('--top_percent', default=0.08, type=float, help='Percent top crop from top of image')
 
 args = parser.parse_args()
 
-mapping = {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
-inv_mapping = {0: 'normal', 1: 'pneumonia', 2: 'COVID-19'}
+if args.n_classes == 2:
+    # For COVID-19 positive/negative detection
+    mapping = {'negative': 0, 'positive': 1}
+    inv_mapping = {0: 'negative', 1: 'positive'}
+elif args.n_classes == 3:
+    # For detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia
+    mapping = {'normal': 0, 'pneumonia': 1, 'COVID-19': 2}
+    inv_mapping = {0: 'normal', 1: 'pneumonia', 2: 'COVID-19'}
+else:
+    raise Exception('''COVID-Net currently only supports 2 class COVID-19 positive/negative detection
+        or 3 class detection of no pneumonia/non-COVID-19 pneumonia/COVID-19 pneumonia''')
+mapping_keys = list(mapping.keys())
 
 sess = tf.Session()
 tf.get_default_graph()
@@ -36,6 +47,6 @@
 
 print('Prediction: {}'.format(inv_mapping[pred.argmax(axis=1)[0]]))
 print('Confidence')
-print('Normal: {:.3f}, Pneumonia: {:.3f}, COVID-19: {:.3f}'.format(pred[0][0], pred[0][1], pred[0][2]))
+print(' '.join(mapping_keys[i].capitalize() + ': ' + str(pred[0][i]) + ' ' for i in range(args.n_classes)))
 print('**DISCLAIMER**')
 print('Do not use this prediction for self-diagnosis. You should check with your local authorities for the latest advice on seeking medical assistance.')