Manu Gaur, Darshan Singh S, Makarand Tapaswi
This repository contains data and code for self-retrieval evaluation on the TrueMatch benchmark, along with MLE training and REINFORCE fine-tuning of captioning models.
⚡ For instant visualization of data samples, please visit our Project Page
```bash
conda create -n ndlb python=3.10 -y
conda activate ndlb
pip install -r requirements.txt
```
- The benchmark is available on Hugging Face.
- Make sure you have git-lfs installed.
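If git-lfs was freshly installed, enable it once before cloning:

```bash
git lfs install
```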
```bash
mkdir data && cd data
git clone https://huggingface.co/datasets/manu-gaur/NDLB-TrueMatch-Benchmark
```
Ensure the following directory structure for `NDLB-TrueMatch-Benchmark`:
```
NDLB-TrueMatch-Benchmark
├── truematch_images
│   ├── COCOID.jpg
│   ├── COCOID.jpg
│   ├── ...
├── benchmark
│   ├── 1.json
│   ├── 3.json
│   ├── ...
├── test_val_cocoid2idx.json
```
The `benchmark` folder contains JSON files, each named by bag size. Each JSON file lists groups of images (bags), where each bag contains the COCOIDs of its images.
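As a quick sanity check, a bag file can be inspected directly. The sketch below assumes each JSON file holds a list of bags, with each bag being a list of COCOIDs; adjust the keys if the actual schema differs:

```python
import json

# Load the bags of size 3 (filenames follow the bag-size convention above).
with open("data/NDLB-TrueMatch-Benchmark/benchmark/3.json") as f:
    bags = json.load(f)

# Assumed schema: a list of bags, each bag a list of COCOIDs.
print(f"{len(bags)} bags of size 3")
print("first bag:", bags[0])
```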
- Generate fine-grained captions for each image in `./data/NDLB-TrueMatch-Benchmark/truematch_images`.
- Store the generated captions in a Python dict saved as a `.pkl` file, with (COCOID, Caption) as key-value pairs (see the sketch below).
Simply extract the COCOID for each image from its filename:
filename = "COCO_val2014_000000003310.jpg"
cocoid = int(filename.split(".")[0].split("_")[-1])
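Putting the pieces together, the sketch below builds the predictions dict and serializes it with pickle; `generate_caption` is a hypothetical stand-in for your own model's inference call:

```python
import os
import pickle

IMAGE_DIR = "./data/NDLB-TrueMatch-Benchmark/truematch_images"

def generate_caption(image_path: str) -> str:
    """Hypothetical stand-in: replace with your captioning model's inference."""
    raise NotImplementedError

preds = {}
for filename in os.listdir(IMAGE_DIR):
    if not filename.endswith(".jpg"):
        continue
    # Extract the COCOID from the filename as shown above.
    cocoid = int(filename.split(".")[0].split("_")[-1])
    preds[cocoid] = generate_caption(os.path.join(IMAGE_DIR, filename))

# Save (COCOID, Caption) pairs as a .pkl file for truematch_eval.py.
with open("captions.pkl", "wb") as f:
    pickle.dump(preds, f)
```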
- To evaluate the captioning system on the TrueMatch benchmark, run:
```bash
python truematch_eval.py \
    --preds_path [PATH_TO_GENERATED_CAPTIONS] \
    --out_dir [PATH_TO_YOUR_OUTPUT_DIR]
```
You can adjust the following:
- `--preds_path`: Path to the captions independently generated by your model for TrueMatch images.
- `--out_dir`: Path to a directory in which R@1 scores for all bags in TrueMatch are stored.
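For example, with the captions pickle produced by the sketch above (paths are illustrative):

```bash
python truematch_eval.py \
    --preds_path ./captions.pkl \
    --out_dir ./truematch_results
```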
To MLE train CLIPCap on a particular dataset [COCO, BlendCap, HolisticCap], run:
```bash
python -m egg.zoo.emergent_captioner.finetuning.train dataset mle
```
To REINFORCE fine-tune a model that has been MLE trained on a dataset [COCO, BlendCap, HolisticCap] with a particular reward (SR, CIDEr), run:
```bash
python -m egg.zoo.emergent_captioner.finetuning.train dataset reward
```
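For instance, to MLE train on COCO and then fine-tune with the self-retrieval (SR) reward (assuming the positional arguments are spelled exactly as the dataset and reward names above; check the training script's argument parser if not):

```bash
# MLE train CLIPCap on COCO, then REINFORCE fine-tune with the SR reward.
python -m egg.zoo.emergent_captioner.finetuning.train COCO mle
python -m egg.zoo.emergent_captioner.finetuning.train COCO SR
```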
If you find our work useful, please cite:
```bibtex
@article{gaur2024detect,
  title={No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning},
  author={Gaur, Manu and Singh S, Darshan and Tapaswi, Makarand},
  journal={arXiv preprint arXiv:2409.03025},
  year={2024}
}
```
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.