This is the PyTorch implementation for the paper:
RadFusion: Benchmarking Performance and Fairness for Multimodal Pulmonary Embolism Detection from CT and EHR. Paper in arXiv.
Automatic pulmonary embolism detection from CT data and electronic health records (EHR).
- data: the CT images and EHR data should be put under this directory.
- dataset: custom dataset/dataloader classes (inheriting from Dataset/DataLoader).
- models: PENet (a CNN model) and the fusion model. (A vision transformer model will be uploaded later.)
The required packages are listed below; the latest version of each is used.
- PyTorch
- NumPy
- pandas
- scikit-learn (sklearn)
- SciPy
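Before running anything, it may help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and the module names are the usual import names for these packages.

```python
from importlib.util import find_spec

# Usual import names for the packages listed above
# (PyTorch is imported as torch, scikit-learn as sklearn).
REQUIRED_MODULES = ("torch", "numpy", "pandas", "sklearn", "scipy")


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks the modules up without importing them, so it stays fast even when PyTorch is installed.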
- The paths in train.sh and test.sh have to be Windows-style.
- The paths in read_pkl.py, ./scripts/create_pe_hdf5_update.py and generate_ehr.py have to be Windows-style.
- Go to datasets/ct_pe_dataset_3d.py and find _load_volume() to see the comment in that function.
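As a rough illustration of what a Windows-style path looks like, the sketch below converts a forward-slash path into its backslash form with the standard library. It assumes that "Windows-style" here means backslash-separated paths; the helper name to_windows_style is hypothetical and does not exist in the repository.

```python
from pathlib import PureWindowsPath


def to_windows_style(path):
    """Render a path with Windows separators (backslashes).

    PureWindowsPath accepts both '/' and '\\' as separators on input
    and always prints backslashes, on any operating system.
    """
    return str(PureWindowsPath(path))


if __name__ == "__main__":
    # Example path taken from the directory layout described in this README.
    print(to_windows_style("data/processed/data.hdf5"))
```

When a path like this is written inside a Python string literal, a raw string (for example r"data\processed") avoids accidental escape sequences.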
CT data and EHR records can be downloaded here
If you choose to use only part of the CT data:
- Put the .npy files into the directory data/raw.
- Put the .csv files into data/.
- Generate the .pkl file for the selected data: modify the list part_of_study in read_pkl.py by filling it with the 'idx' values of the data you choose, then run python read_pkl.py. A file named series_list.pkl will appear in data/processed.
- Generate the hdf5 file for the selected data: run python ./scripts/create_pe_hdf5_update.py to generate the data.hdf5 file under the directory data/processed.
- Generate the combined EHR record for the selected data: modify the list part_of_study in generate_ehr.py by filling it with the 'idx' values of the data you choose, then run python generate_ehr.py. A file named part_of_ehr.csv will appear in data/processed.
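The idea behind the part_of_study list can be sketched as a simple filter on the 'idx' column. This is only an illustration under stated assumptions, not the code in read_pkl.py or generate_ehr.py: the function name select_studies, the sample column age, and the sample values are hypothetical, and the standard csv module is used to keep the example self-contained.

```python
import csv
import io


def select_studies(csv_text, part_of_study, key="idx"):
    """Keep only the rows whose `key` value is listed in part_of_study."""
    # csv yields strings, so compare against string versions of the ids.
    wanted = {str(idx) for idx in part_of_study}
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[key] in wanted]


# Hypothetical in-memory table standing in for one of the EHR .csv files.
sample = "idx,age\n1,63\n2,71\n3,58\n"

# Hypothetical selection: keep studies 1 and 3 only.
selected = select_studies(sample, part_of_study=[1, 3])
print([row["idx"] for row in selected])
```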
After the steps above, check that these three files exist in data/processed:
- series_list.pkl
- data.hdf5
- part_of_ehr.csv
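This check can also be scripted. The sketch below is an assumption-labeled helper rather than repository code: the function name missing_outputs is hypothetical, and it only tests that the three files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = ("series_list.pkl", "data.hdf5", "part_of_ehr.csv")


def missing_outputs(processed_dir="data/processed"):
    """Return the expected file names that are not present in processed_dir."""
    root = Path(processed_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All three files are present.")
```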
Our model has two parts: PENet and Elasticnet. Download the best checkpoint of PENet and put it into ./data/ckpt.
To train the fusion model, run sh train.sh. After training finishes, the trained model is stored in ./train_logs.
To test the model, modify the ckpt_path in test.sh and run sh test.sh.
If you choose to use all the CT data, just put the corresponding .hdf5, series_list.pkl and part_of_ehr.csv into data/processed.