No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning

Manu Gaur, Darshan Singh S, Makarand Tapaswi

arXiv | GitHub Pages | HF Model: NDLB | HF Dataset: TrueMatch | HF Dataset: VCB | Datasets

Welcome to the code repository for the TMLR 2024 accepted paper No Detail Left Behind.

This repository contains data and code for self-retrieval evaluation on the TrueMatch benchmark, as well as for MLE training and REINFORCE fine-tuning.

⚡ For instant visualization of data samples, please visit our Project Page

🧰 Setting up the repository

🌏 Setting up the environment

conda create -n ndlb python=3.10 -y
conda activate ndlb
pip install -r requirements.txt

Setting up the TrueMatch Benchmark 💿

mkdir data && cd data
git clone https://huggingface.co/datasets/manu-gaur/NDLB-TrueMatch-Benchmark

Ensure the following directory structure for NDLB-TrueMatch-Benchmark:

├── truematch_images
│   ├── COCOID.jpg
│   ├── ...
├── benchmark
│   ├── 1.json
│   ├── 3.json
│   ├── ...
├── test_val_cocoid2idx.json

The benchmark folder contains one JSON file per bag size (e.g., 3.json holds bags of three images). Each file lists groups of images (bags), where each bag is specified by the COCOIDs of its images.
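
The exact schema inside each JSON file is not documented here; below is a minimal loading sketch, assuming each file holds a list of bags and each bag is a list of COCOIDs:

import json

# Assumed schema: each benchmark file is a list of bags,
# where each bag is a list of COCOIDs.
with open("data/NDLB-TrueMatch-Benchmark/benchmark/3.json") as f:
    bags = json.load(f)

for bag in bags:
    print(bag)  # one bag of visually similar images, given by COCOIDs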


Self-Retrieval Evaluation on TrueMatch

  1. Generate fine-grained captions for each image in ./data/NDLB-TrueMatch-Benchmark/truematch_images.
  2. Store the generated captions in a Python dict as a .pkl file with (COCOID, Caption) as key-value pairs (see the sketch below).

Simply extract the COCOID for each image from its filename:

filename = "COCO_val2014_000000003310.jpg"
cocoid = int(filename.split(".")[0].split("_")[-1])
  3. To evaluate your captioning system on the TrueMatch benchmark, run:
python truematch_eval.py \
  --preds_path [PATH_TO_GENERATED_CAPTIONS] \
  --out_dir [PATH_TO_YOUR_OUTPUT_DIR]

You can adjust the following:

  • --preds_path: Path to captions independently generated by your model for TrueMatch images
  • --out_dir: Path to a directory to store R@1 scores for all bags in TrueMatch
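
Putting steps 1–3 together, here is a minimal end-to-end sketch for building the predictions .pkl (generate_caption() is a hypothetical stand-in for your model's inference call):

import os
import pickle

def generate_caption(image_path):
    # Hypothetical placeholder: swap in your captioning model's inference call.
    raise NotImplementedError

image_dir = "data/NDLB-TrueMatch-Benchmark/truematch_images"
preds = {}
for filename in os.listdir(image_dir):
    if not filename.endswith(".jpg"):
        continue
    # The COCOID is the trailing integer in names like COCO_val2014_000000003310.jpg
    cocoid = int(filename.split(".")[0].split("_")[-1])
    preds[cocoid] = generate_caption(os.path.join(image_dir, filename))

# Save the (COCOID, Caption) dict in the format expected by --preds_path
with open("preds.pkl", "wb") as f:
    pickle.dump(preds, f)

The resulting preds.pkl can then be passed to truematch_eval.py via --preds_path.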

MLE and REINFORCE (CIDEr, SR, CIDEr + SR) Training

To MLE-train CLIPCap on a particular dataset [COCO, BlendCap, HolisticCap], run:

python -m egg.zoo.emergent_captioner.finetuning.train [DATASET] mle

To REINFORCE fine-tune a model that has been MLE-trained on a dataset [COCO, BlendCap, HolisticCap] with a particular reward [SR, CIDEr], run:

python -m egg.zoo.emergent_captioner.finetuning.train [DATASET] [REWARD]

BibTeX

If you find our work useful, please cite as below:

@article{gaur2024detect,
  title={No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning},
  author={Gaur, Manu and Singh S, Darshan and Tapaswi, Makarand},
  journal={arXiv preprint arXiv:2409.03025},
  year={2024}
}

CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
