This repository contains the official PyTorch implementation of the paper "Knowledge-Augmented Visual Question Answering with Natural Language Explanation", published in IEEE Transactions on Image Processing (TIP), 2024.
The KICNLE model improves visual question answering with an iterative scheme: at each step, the answer is refined using the explanation produced in the previous step. A knowledge retrieval module supplies relevant knowledge so that generation is grounded in accurate information. The resulting answers and explanations are consistent with each other and closely tied to the visual content.
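The iterative scheme described above can be pictured with a small sketch. This is not the repository's code: `iterative_refine`, `toy_step`, the `max_iters` parameter, and the stop-on-no-change rule are all hypothetical names and assumptions made for illustration, and the image input is omitted to keep the example self-contained.

```python
from typing import Callable, List, Tuple

# Hypothetical step signature (an assumption, not the repository's API):
# (question, retrieved_knowledge, previous_explanation) -> (answer, explanation)
Step = Callable[[str, List[str], str], Tuple[str, str]]


def iterative_refine(question: str, knowledge: List[str], step: Step,
                     max_iters: int = 3) -> Tuple[str, str]:
    """Refine the answer by feeding the previous explanation back into `step`."""
    answer, explanation = "", ""
    for _ in range(max_iters):
        new_answer, new_explanation = step(question, knowledge, explanation)
        # Assumed stopping rule: stop once a step no longer changes the output.
        converged = (new_answer, new_explanation) == (answer, explanation)
        answer, explanation = new_answer, new_explanation
        if converged:
            break
    return answer, explanation


def toy_step(question: str, knowledge: List[str], prev_explanation: str) -> Tuple[str, str]:
    """Stand-in for a real model: it only answers once it has an explanation to build on."""
    if not prev_explanation:
        return "unknown", "need more evidence"
    return "umbrella", "because " + knowledge[0]


result = iterative_refine("What is the person holding?",
                          ["it is raining in the image"], toy_step)
print(result)
```

In a real pipeline, `step` would wrap the trained model and would also receive image features; the loop structure is the only part this sketch is meant to convey.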
- Install an Anaconda or Miniconda distribution with Python 3.8
- Main packages: PyTorch = 1.12, transformers = 4.30
- CLIP ViT-based model
pip install git+https://github.com/openai/CLIP.git
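To confirm that the environment matches the versions listed above, a small check along these lines may help. It is a sketch rather than part of the repository: `check_environment` and `matches` are hypothetical helpers, and reading "= 1.12" and "= 4.30" as "any release in that major.minor series" is an assumption. The distribution names `torch` and `transformers` are the ones used by pip.

```python
from importlib import metadata

# Versions taken from the requirements above; treated as major.minor series (assumption).
REQUIRED = {"torch": "1.12", "transformers": "4.30"}


def matches(installed: str, wanted: str) -> bool:
    """True if `installed` (e.g. '1.12.1+cu113') belongs to the `wanted` major.minor series."""
    return installed.split("+")[0].split(".")[:2] == wanted.split(".")[:2]


def check_environment(required=REQUIRED, get_version=metadata.version):
    """Return a {package: status} report; `get_version` is injectable for testing."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if matches(installed, wanted):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, expected {wanted}.x"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Run it inside the activated conda environment; any package reported as `missing` or with a mismatched version can then be reinstalled before launching the training scripts.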
- For the VQA-X dataset
python vqaX.py
- For the A-OKVQA dataset
python a_okvqa.py