This repository provides the official implementation of "LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies" (AAAI 2025). KG-LLaVA integrates Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) with Vision-Language Models (VLMs) to generate Natural Language Explanations (NLEs) for medical imaging.
To set up the environment, run:
git clone https://github.com/yourusername/AAAI-Reproducibility.git
cd AAAI-Reproducibility
pip install -r requirements.txt
To set up the MedCLIP environment separately:
pip install git+https://github.com/RyanWangZf/MedCLIP.git
pip install faiss-gpu
To obtain the MIMIC-NLE dataset, follow the instructions at:
- MIMIC-NLE Repository
- Download MIMIC-CXR reports: PhysioNet
We use the training set defined by the official MIMIC-NLE split.
To generate Knowledge Graph (KG) triplets from medical reports, we utilize RadGraph. Follow the instructions below:
- Download and set up RadGraph from: Stanford-AIMI RadGraph
- Run the inference script to extract triplets:
python dataset_preparation.py
- We filter triplets to retain only those with suggestive_of relationships.
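The filtering step can be sketched as follows. This is a minimal illustration, not the repository's code: the `(head, relation, tail)` tuple layout and the helper names `filter_suggestive_of` and `format_triplets` are assumptions made for the example, and RadGraph's raw output would need to be converted into this shape first.

```python
from typing import Iterable, List, Tuple

# Assumed layout for illustration: (head entity, relation, tail entity)
Triplet = Tuple[str, str, str]


def filter_suggestive_of(triplets: Iterable[Triplet]) -> List[Triplet]:
    """Keep only triplets whose relation is `suggestive_of`."""
    return [t for t in triplets if t[1] == "suggestive_of"]


def format_triplets(triplets: Iterable[Triplet]) -> str:
    """Render triplets as 'head relation tail;' strings separated by spaces."""
    return " ".join(f"{h} {r} {t};" for h, r, t in triplets)


if __name__ == "__main__":
    # Toy triplets, not taken from any real report
    example = [
        ("opacities", "suggestive_of", "effusions"),
        ("opacity", "located_at", "retrocardiac"),
        ("effusion", "modify", "small"),
    ]
    kept = filter_suggestive_of(example)
    print(format_triplets(kept))
```

Only the first toy triplet survives the filter, so the rendered string matches the `head suggestive_of tail;` pattern used in the training prompts.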
A datastore is built using MedCLIP and FAISS to facilitate knowledge retrieval. The datastore includes:
kg_nle_index
kg_nle_index_captions.json
To retrieve triplets for test images, use:
python datastore_retrieval.py
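Conceptually, retrieval embeds the test image, finds its nearest neighbours among the stored embeddings, and returns the triplets attached to them. The sketch below illustrates that idea with brute-force cosine similarity in plain Python. It does not call MedCLIP or FAISS, and the toy vectors, the `retrieve_triplets` name, and the key/value layout are assumptions for illustration rather than the contents of `datastore_retrieval.py`.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_triplets(
    query: Sequence[float],
    keys: List[Sequence[float]],
    values: List[str],
    k: int = 2,
) -> List[str]:
    """Return the values whose keys are the k most similar to the query."""
    scored = sorted(
        ((cosine(query, key), i) for i, key in enumerate(keys)),
        reverse=True,
    )
    return [values[i] for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real image embeddings are much higher-dimensional
    keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    values = [
        "opacities suggestive_of effusions;",
        "placeholder triplet B;",
        "placeholder triplet C;",
    ]
    print(retrieve_triplets([1.0, 0.0], keys, values, k=2))
```

A vector index such as FAISS replaces the exhaustive loop with an optimized search, which matters once the datastore holds many entries.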
To prepare the dataset for LLaVA training, execute the following:
python dataset_preparation.py
This ensures the dataset is in the required format:
[
  {
    "id": "0",
    "split": "train",
    "image": "p11/p11941242/s50000014/dffc8ab2-ff37704f-2fb29e6d-51e08075-88bca914.jpg",
    "conversations": [
      {
        "from": "human",
        "value": "<image>\nThe image-specific triplets from the knowledge graph are: opacities suggestive_of effusions; TRIPLETS HERE BASED ON KG;. And for the given image, Which signs show that the patient has uncertain Atelectasis, positive Lung Opacity, uncertain Pleural Effusion, uncertain Pneumonia?"
      },
      {
        "from": "gpt",
        "value": "Retrocardiac opacity with silhouetting of the left hemidiaphragm and lateral border of the descending aorta is nonspecific and could reflect any of a combination of atelectasis, focal pneumonia or even a small effusion."
      }
    ]
  }
]
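As a rough guide to how a record in the format shown above could be assembled, here is a minimal sketch. The `build_record` helper, its parameters, and the placeholder image path are hypothetical and are not taken from `dataset_preparation.py`; only the field names and the prompt wording follow the example record.

```python
import json
from typing import Dict, List


def build_record(
    record_id: str,
    split: str,
    image_path: str,
    triplets: List[str],
    findings: List[str],
    explanation: str,
) -> Dict:
    """Assemble one conversation record with the fields used in the example."""
    # Each triplet is terminated by ';', matching the example prompt
    triplet_text = " ".join(f"{t};" for t in triplets)
    question = (
        "<image>\n"
        "The image-specific triplets from the knowledge graph are: "
        f"{triplet_text}. And for the given image, Which signs show that "
        f"the patient has {', '.join(findings)}?"
    )
    return {
        "id": record_id,
        "split": split,
        "image": image_path,
        "conversations": [
            {"from": "human", "value": question},
            {"from": "gpt", "value": explanation},
        ],
    }


if __name__ == "__main__":
    record = build_record(
        record_id="0",
        split="train",
        image_path="path/to/image.jpg",  # hypothetical path
        triplets=["opacities suggestive_of effusions"],
        findings=["positive Lung Opacity", "uncertain Pleural Effusion"],
        explanation="Placeholder explanation text.",
    )
    # The training file is a JSON list of such records
    print(json.dumps([record], indent=2))
```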
To train LLaVA with KG triplets, run:
bash KG-LLaVA/models/LLaVA/scripts/v1_5/finetune_task_lora.sh
For model evaluation, use:
bash KG-LLaVA/models/LLaVA/scripts/v1_5/eval/vqav2.sh
This repository builds on the open-source projects referenced above, including LLaVA, MedCLIP, FAISS, and RadGraph.
This work will appear at AAAI 2025. If you use it, please cite:
@article{hamza2024llava,
  title={LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies},
  author={Hamza, Ameer and Ahn, Yong Hyun and Lee, Sungyoung and Kim, Seong Tae and others},
  journal={arXiv preprint arXiv:2410.04749},
  year={2024}
}