The repository for the paper "A Multimodal Transformer: Fusing Clinical Notes With Structured EHR Data for Interpretable In-Hospital Mortality Prediction" submitted to AMIA'22 Annual Symposium.
The codes are tested on CUDA 11.4 with 24GB RAM GPU. For environment setup, please follow the install instruction in Section 'Clinical Data Processing'.
Clone https://github.com/YerevaNN/mimic3-benchmarks (Harutyunyan et al.) to 'Multimodal_Transformer/mimic3-benchmarks' folder. Setup the environment, and run all data generation steps to generate training data without text features.
create folder 'data-mimic3' under 'Multimodal_Transformer' folder, and all the MIMIC-III processed data will be stored in 'data-mimi3' folder.
Clinical Notes processing is based on repository in https://github.com/kaggarwal/ClinicalNotesICU.
setup the environment for notes processing and model training. Install environment:
pip install -r requrements.txt
Update all paths and configuration in 'mmtransformer/config.py'.
- Run 'mmtransformer/scripts/extract_notes.py', the folder 'data-mimic3/root/test_text_fixed/', and 'data-mimic3/root/text_fixed/' will be generated.
- Run 'mmtransformer/scripts/extract_T0.py' file.
For our well-trained model, you can download from GoogleDrive. Unzip the file and put them in './Multimodal_Transformer/mmtransformer/models/Checkpoints' and './Multimodal_Transformer/mmtransformer/models/Data' accordingly. Or you can generate the files yourself.
For model with only clinical notes (mbert), run
python mbert.py --gpu_id 1
For multimodal transformer, run
python IHM_mmtransformer.py --mode test --model_type both --model_name BioBert --TSModel Transformer --checkpoint_path Multimodal_Transformer --MaxLen 512 --NumOfNotes 0 --TextModelCheckpoint BioClinicalBERT_FT --freeze_model 1 --number_epoch 5 --batch_size 5 --load_model 1 --gpu_id 1
For multimodal transformer training, run
python IHM_mmtransformer.py --mode train --model_type both --model_name BioBert --TSModel Transformer --checkpoint_path Multimodal_Transformer --MaxLen 512 --NumOfNotes 0 --TextModelCheckpoint BioClinicalBERT_FT --freeze_model 1 --number_epoch 5 --batch_size 5 --load_model 0 --gpu_id 1
The output of all analysis are in 'Analysis' folder. For important clinical words analysis and visualization in clinical notes,
-
Run 'notes_analysis.py' to get the IG value with associated words, stored in file 'Analysis/bert_analysis_pred_all2.pkl'
-
Run 'notes_analysis3.py' to get the word list with frequency, stored in 'pred_tokenlist_top10_l0_2.txt'. We further filtered the list to remove the irrelavent words and tokens, which is stored in 'filter_pred_tokenlist_top10_l0_2.txt'.
It will also generate the word cloud 'filter_pred_tokenlist_top10_l0_2.png'.
The code is based on repository by Khadanga et al. given in https://github.com/kaggarwal/ClinicalNotesICU, and by Deznabi et al. given in https://github.com/Information-Fusion-Lab-Umass/ClinicalNotes_TimeSeries for experimental setup.
The MIMIC-III clinical variables pre-processing is clone from repository by Harutyunyan et al. given in https://github.com/YerevaNN/mimic3-benchmarks