No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning

Manu Gaur, Darshan Singh S, Makarand Tapaswi

arXiv | GitHub Pages | HF Model: NDLB | HF Dataset: TrueMatch | HF Dataset: VCB | Datasets

Welcome to the code repository for the TMLR 2024 accepted paper No Detail Left Behind.

This repository contains data and code for self-retrieval evaluation on the TrueMatch benchmark, as well as for MLE training and REINFORCE fine-tuning.

⚡ For instant visualization of data samples, please visit our Project Page

🧰 Setting up the repository

🌏 Setting up the environment

conda create -n ndlb python=3.10 -y
conda activate ndlb
pip install -r requirements.txt

Setting up the TrueMatch Benchmark 💿

mkdir data && cd data
git clone https://huggingface.co/datasets/manu-gaur/NDLB-TrueMatch-Benchmark

Ensure the following directory structure for NDLB-TrueMatch-Benchmark:

├── truematch_images
│   ├── COCOID.jpg
│   ├── ...
├── benchmark
│   ├── 1.json
│   ├── 3.json
│   ├── ...
├── test_val_cocoid2idx.json

The benchmark folder contains one JSON file per bag size (e.g., 3.json holds bags of three images). Each file lists groups of images (bags), where each bag is specified by the COCOIDs of its images.
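
The exact schema inside each JSON file is not documented here; below is a minimal loading sketch, assuming each file holds a list of bags and each bag is a list of COCOIDs:

import json

# Assumed schema: each benchmark file is a list of bags,
# where each bag is a list of COCOIDs.
with open("data/NDLB-TrueMatch-Benchmark/benchmark/3.json") as f:
    bags = json.load(f)

for bag in bags:
    print(bag)  # one bag of visually similar images, given by COCOIDs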


Self-Retrieval Evaluation on TrueMatch

  1. Generate fine-grained captions for each image in ./data/NDLB-TrueMatch-Benchmark/truematch_images.
  2. Store the generated captions in a Python dict as a .pkl file with (COCOID, Caption) as key-value pairs (see the sketch below).

Simply extract the COCOID for each image from its filename:

filename = "COCO_val2014_000000003310.jpg"
cocoid = int(filename.split(".")[0].split("_")[-1])
  3. To evaluate your captioning system on the TrueMatch benchmark, run:
python truematch_eval.py \
  --preds_path [PATH_TO_GENERATED_CAPTIONS] \
  --out_dir [PATH_TO_YOUR_OUTPUT_DIR]

You can adjust the following:

  • --preds_path: Path to captions independently generated by your model for TrueMatch images
  • --out_dir: Path to a directory to store R@1 scores for all bags in TrueMatch
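
Putting steps 1–3 together, here is a minimal end-to-end sketch for building the predictions .pkl (generate_caption() is a hypothetical stand-in for your model's inference call):

import os
import pickle

def generate_caption(image_path):
    # Hypothetical placeholder: swap in your captioning model's inference call.
    raise NotImplementedError

image_dir = "data/NDLB-TrueMatch-Benchmark/truematch_images"
preds = {}
for filename in os.listdir(image_dir):
    if not filename.endswith(".jpg"):
        continue
    # The COCOID is the trailing integer in names like COCO_val2014_000000003310.jpg
    cocoid = int(filename.split(".")[0].split("_")[-1])
    preds[cocoid] = generate_caption(os.path.join(image_dir, filename))

# Save the (COCOID, Caption) dict in the format expected by --preds_path
with open("preds.pkl", "wb") as f:
    pickle.dump(preds, f)

The resulting preds.pkl can then be passed to truematch_eval.py via --preds_path.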

MLE and REINFORCE (CIDEr, SR, CIDEr + SR) Training

To MLE-train CLIPCap on a particular dataset [COCO, BlendCap, HolisticCap], run:

python -m egg.zoo.emergent_captioner.finetuning.train [DATASET] mle

To REINFORCE fine-tune a model that has been MLE-trained on a dataset [COCO, BlendCap, HolisticCap] with a particular reward [SR, CIDEr], run:

python -m egg.zoo.emergent_captioner.finetuning.train [DATASET] [REWARD]

BibTeX

If you find our work useful, please cite as below:

@article{gaur2024detect,
  title={No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning},
  author={Gaur, Manu and Singh S, Darshan and Tapaswi, Makarand},
  journal={arXiv preprint arXiv:2409.03025},
  year={2024}
}

CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
