This is a simple implementation of image captioning, trained on the MS COCO dataset.
The project is based on these repos:
system: win10 x64
cuda version: 8.0
cudnn version: 5.1
You should also have a GPU with at least 4GB of graphics memory. An Nvidia GTX 1060 or better is recommended.
joblib==0.11
numpy==1.12.1
tensorflow-gpu==0.12.0
keras==1.2.2
Download inception_v3_2016_08_28_frozen.pb and unpack it to model/inception_v3_2016_08_28_frozen.pb
Download the COCO 2014 Training images [80K/13GB] dataset and unpack all training jpg files to train/images/
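Before running the scripts, it can help to confirm that the downloads ended up where the steps above place them. The following is a minimal sketch, not part of the project: the function name check_layout is hypothetical, and it only checks the two paths named above, relative to a project root that you pass in.

```python
import os

# Paths taken from the setup steps, relative to the project root.
REQUIRED_FILE = os.path.join("model", "inception_v3_2016_08_28_frozen.pb")
IMAGE_DIR = os.path.join("train", "images")


def check_layout(root="."):
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    if not os.path.isfile(os.path.join(root, REQUIRED_FILE)):
        problems.append("missing file: " + REQUIRED_FILE)
    image_dir = os.path.join(root, IMAGE_DIR)
    if not os.path.isdir(image_dir):
        problems.append("missing directory: " + IMAGE_DIR)
    elif not any(name.lower().endswith(".jpg") for name in os.listdir(image_dir)):
        problems.append("no .jpg files in: " + IMAGE_DIR)
    return problems
```

Calling check_layout() from the project root and printing the returned list shows what, if anything, still needs to be downloaded or moved.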
anns.csv is a table containing the training images' paths and their captions. During training, ONLY the captions in anns.csv are used.
We provide a default anns.csv containing about 56K captions. You can also generate this file on your own.
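If you want to build your own anns.csv, one possible approach is to convert the official COCO caption annotations. The sketch below makes several assumptions that you should check against the provided default file: that anns.csv has two columns (image path, caption) with no header row, that image paths are written relative to train/images, and that the annotation file is the standard COCO captions JSON with "images" and "annotations" lists. The function names and the JSON location in the usage note are hypothetical.

```python
import csv
import json


def build_rows(coco, image_dir="train/images"):
    """Turn a parsed COCO captions dict into (image_path, caption) rows."""
    # Map image id -> file name using the "images" list of the COCO file.
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    rows = []
    for ann in coco["annotations"]:
        name = file_names.get(ann["image_id"])
        if name is None:
            continue  # skip captions whose image entry is missing
        # Collapse newlines and repeated spaces inside the caption.
        caption = " ".join(ann["caption"].split())
        rows.append((image_dir + "/" + name, caption))
    return rows


def convert(json_path, out_path="anns.csv"):
    """Read a COCO captions JSON file and write the assumed two-column CSV."""
    with open(json_path, encoding="utf-8") as f:
        rows = build_rows(json.load(f))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

For example, convert("annotations/captions_train2014.json") would write one row per caption, assuming that is where you unpacked the annotation file. To train on fewer captions, filter the rows before writing them.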
Run
python extractor.py
to generate image features.
Warning: the pickle.dump method in Python uses a large amount of memory.
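If memory becomes a problem at this step, one common workaround is to pickle the data in several small chunks rather than as one large object. The sketch below is not how extractor.py works; it is an illustration that assumes the features can be produced as an iterable of items, and the names dump_in_chunks and load_chunks are hypothetical. Using it would also require changing whatever code reads the feature file.

```python
import pickle


def dump_in_chunks(items, path, chunk_size=1000):
    """Pickle `items` as several small lists instead of one large object."""
    with open(path, "wb") as f:
        chunk = []
        for item in items:
            chunk.append(item)
            if len(chunk) == chunk_size:
                pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)
                chunk = []
        if chunk:
            # Write the final, partially filled chunk.
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    """Yield the items back one by one, reading one chunk at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                chunk = pickle.load(f)
            except EOFError:
                return  # no more chunks in the file
            for item in chunk:
                yield item
```

Because each pickle.dump call only sees chunk_size items, the extra memory needed for serialization stays roughly proportional to the chunk size instead of the whole feature set.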
When using TensorFlow as the Keras backend, you may need to modify keras/optimizers.py
like this.
Run
python train.py
to train the models. Checkpoint files will be saved to weights/.
Set model_path
to the checkpoint file you obtained from training, then run
python test.py path/to/test/image.jpg
to get the result.
- add val data