This repository contains the code and data pipeline needed to replicate the experiments from the paper VQA: Visual Question Answering. The goal is to build a model that answers natural-language questions about images, following the approach presented in that paper.
- Download Data

```bash
python download_data.py
```

- Preprocess Images

```bash
python preprocess_image.py
```

- Create Vocabulary

```bash
python make_vocabulary.py
```

- Prepare VQA Inputs

```bash
python make_vqa_inputs.py
```
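The exact preprocessing performed by these scripts is repo-specific, but the vocabulary step typically tokenizes every question and keeps the most frequent words. A minimal sketch of that idea (the tokenizer, the word limit, and the special tokens below are illustrative assumptions, not the actual behavior of `make_vocabulary.py`):

```python
import re
from collections import Counter

def build_vocabulary(questions, max_words=10000):
    """Count word frequencies over all questions and keep the top entries.

    NOTE: a simplified sketch; make_vocabulary.py may tokenize and
    filter differently.
    """
    counter = Counter()
    for q in questions:
        # Lowercase and extract alphanumeric tokens.
        counter.update(re.findall(r"[a-z0-9']+", q.lower()))
    # Reserve <pad>/<unk> slots, then append the most common words.
    return ["<pad>", "<unk>"] + [w for w, _ in counter.most_common(max_words)]

questions = ["What color is the cat?", "Is the cat sleeping?"]
vocab = build_vocabulary(questions)
print(vocab[:4])  # → ['<pad>', '<unk>', 'is', 'the']
```

The answer vocabulary is usually built the same way, but truncated much more aggressively (VQA models commonly classify over only the top 1000 answers).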
```
datasets
├── Annotations
├── Images
├── Questions
├── Resized_Images
├── test-dev.npy
├── test.npy
├── train_valid.npy
├── train.npy
├── valid.npy
├── vocab_answers.txt
└── vocab_questions.txt
```
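The prepared `.npy` files can be inspected with NumPy. Assuming each file stores a pickled object array of question/answer records (the record fields below are an assumption, not the repo's documented format), loading looks like this, demonstrated on a stand-in file:

```python
import numpy as np

# Illustrative record format; the actual fields written by
# make_vqa_inputs.py may differ.
records = np.array(
    [{"image_path": "Resized_Images/train/0001.jpg",
      "question": "What color is the cat?",
      "answer": "black"}],
    dtype=object,
)
np.save("train_demo.npy", records)  # stands in for datasets/train.npy

# Pickled object arrays require allow_pickle=True to load.
loaded = np.load("train_demo.npy", allow_pickle=True)
print(loaded[0]["question"])
```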
- Train Model

```bash
python train.py
```
- COCO Dataset: http://cocodataset.org/
- Paper: VQA: Visual Question Answering (Antol et al., ICCV 2015): https://arxiv.org/abs/1505.00468