Taming Self-Training for Open-Vocabulary Object Detection

Official implementation of SAS-Det: online self-training and a split-and-fusion (SAF) head for Open-Vocabulary Object Detection (OVD). The project was previously named Improving Pseudo Labels for Open-Vocabulary Object Detection.

arXiv

Installation

This project is built on Detectron2. Either follow Detectron2's official installation instructions or use the steps below.

# create new environment
conda create -n sas_det python=3.8
conda activate sas_det

# install pytorch
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch

# install Detectron2 from a local clone
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

Install CLIP

# install CLIP
pip install scipy
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
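After installation, a quick sanity check can confirm that the pinned versions are the ones actually present. The following is a minimal sketch, not part of this repository: the helper name `check_environment` is hypothetical, and `clip` is assumed to be the distribution name under which the OpenAI CLIP package registers itself.

```python
from importlib import metadata

# Versions pinned by the conda command above. detectron2 and clip are
# installed from source, so only their presence is checked (None = any).
EXPECTED = {
    "torch": "1.12.0",
    "torchvision": "0.13.0",
    "torchaudio": "0.12.0",
    "detectron2": None,
    "clip": None,  # assumed distribution name of the OpenAI CLIP package
}


def check_environment(expected=EXPECTED):
    """Return {package: problem} for every missing or mismatched package."""
    problems = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Builds may append a local tag such as "+cu113"; compare the base part.
        if wanted is not None and found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


for name, problem in check_environment().items():
    print(f"{name}: {problem}")
```

An empty output means every listed package was found with the expected version.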

Datasets

Please follow RegionCLIP's dataset instructions to prepare the COCO and LVIS datasets.
Download the dataset metadata and place it in the datasets folder (i.e., the $DETECTRON2_DATASETS directory from the previous step). It is used for both evaluation and training.
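The evaluation commands in this README refer to metadata files with a literal `./datasets/...` path. If `DETECTRON2_DATASETS` points somewhere else, those paths need to be adjusted accordingly. The sketch below shows one way to derive them. The helper names are hypothetical, and it assumes the metadata files sit directly in the dataset root.

```python
import os
from pathlib import Path


def dataset_root():
    # Detectron2's builtin datasets read this variable and default to "datasets".
    return Path(os.environ.get("DETECTRON2_DATASETS", "datasets"))


def category_info_path(filename):
    """Path to a metadata file inside the dataset root (assumed layout)."""
    return dataset_root() / filename


# File name taken from the COCO evaluation command in this README.
print(category_info_path("coco_ovd_continue_cat_ids.json"))
```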

Download pretrained weights

Download RegionCLIP's pretrained weights. Check here for more details. Create a new folder pretrained_ckpt and put the weights in it. This repository uses the regionclip, concept_emb and rpn folders.
Download our pretrained weights and put them in the corresponding folders in pretrained_ckpt. Our pretrained weights include:
- r50_3x_pre_RegCLIP_cocoRPN_2: RPN weights pretrained on COCO base categories only. They are used for the COCO experiments to avoid potential data leakage.
- concept_emb: Complementary to RegionCLIP's concept_emb.
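Before running evaluation, it can help to confirm that the checkpoint folder has the expected layout. The following is a small sketch under stated assumptions: the helper `missing_checkpoint_dirs` is hypothetical, and the `sas_det` subfolder is inferred from the `MODEL.WEIGHTS` paths in the evaluation commands rather than stated in this section.

```python
from pathlib import Path

# Subfolders named in this README: three from RegionCLIP, plus sas_det,
# which the evaluation commands read the released detector weights from.
EXPECTED_SUBDIRS = ("regionclip", "concept_emb", "rpn", "sas_det")


def missing_checkpoint_dirs(root="pretrained_ckpt", expected=EXPECTED_SUBDIRS):
    """Return the expected subfolders that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).is_dir()]


print(missing_checkpoint_dirs())
```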

Evaluation with released weights

Results on COCO-OVD

Configs	Novel AP	Base AP	Overall AP
w/o SAF head	31.4	55.7	49.4
with SAF head	37.4	58.5	53.0

Evaluation without the SAF head (the baseline in the paper):

python3 ./test_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file ./sas_det/configs/regionclip/COCO-InstanceSegmentation/customized/CLIP_fast_rcnn_R_50_C4_ovd_PLs.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_coco_no_saf_head_baseline.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \
    OUTPUT_DIR output/eval

Evaluation with the SAF head:

python3 ./test_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file ./sas_det/configs/ovd_coco_R50_C4_ensemble_PLs.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_coco.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
    MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_coco_48_base_17_cls_emb.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \
    MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/coco_ovd_continue_cat_ids.json" \
    MODEL.ENSEMBLE.ALPHA 0.3 MODEL.ENSEMBLE.BETA 0.7 \
    OUTPUT_DIR output/eval
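This README does not say how `MODEL.ENSEMBLE.ALPHA` and `MODEL.ENSEMBLE.BETA` enter the prediction. One common scheme in open-vocabulary detection is a geometric-mean fusion of two branches, with one weight for base categories and another for novel categories. The sketch below illustrates that scheme only. It is an assumption, not a description of this repository's code, and the function and argument names are hypothetical.

```python
def fuse_scores(closed, open_, is_novel, alpha=0.3, beta=0.7):
    """Geometric-mean fusion of per-class scores from two branches.

    closed, open_: per-class probabilities from the two branches.
    is_novel: one boolean per class, True for novel categories.
    alpha weights the open branch on base classes, beta on novel classes
    (assumed roles; check the repository's code for the actual ones).
    """
    fused = []
    for p_closed, p_open, novel in zip(closed, open_, is_novel):
        w = beta if novel else alpha
        fused.append(p_closed ** (1.0 - w) * p_open ** w)
    return fused


# One base class and one novel class with the same branch scores.
print(fuse_scores([0.9, 0.9], [0.4, 0.4], [False, True]))
```

Under this reading, a larger weight shifts the fused score toward the open branch, which is why the novel-category weight would be the larger of the two.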

Results on LVIS-OVD

Configs	APr	APc	APf	AP
RN50-C4 as backbone	20.1	27.1	32.9	28.1
RN50x4-C4 as backbone	29.0	32.3	36.8	33.5

Evaluation with RN50-C4 as the backbone:

python3 ./test_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file ./sas_det/configs/ovd_lvis_R50_C4_ensemble_PLs.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_lvis_r50.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866_lsj.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_866_base_cls_emb.pth \
    MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_lvis_866_base_337_cls_emb.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb.pth \
    MODEL.CLIP.OFFLINE_RPN_LSJ_PRETRAINED True \
    MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/lvis_ovd_continue_cat_ids.json" \
    MODEL.ENSEMBLE.ALPHA 0.33 MODEL.ENSEMBLE.BETA 0.67 \
    OUTPUT_DIR output/eval

Evaluation with RN50x4-C4 as the backbone:

python3 ./test_net.py \
    --num-gpus 8 \
    --eval-only \
    --config-file ./sas_det/configs/ovd_lvis_R50_C4_ensemble_PLs.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_lvis_r50x4.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866_lsj.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_866_base_cls_emb_rn50x4.pth \
    MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_lvis_866_base_337_cls_emb_rn50x4.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb_rn50x4.pth \
    MODEL.CLIP.OFFLINE_RPN_LSJ_PRETRAINED True \
    MODEL.CLIP.TEXT_EMB_DIM 640 \
    MODEL.RESNETS.DEPTH 200 \
    MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION 18 \
    MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION 18 \
    MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/lvis_ovd_continue_cat_ids.json" \
    MODEL.ENSEMBLE.ALPHA 0.33 MODEL.ENSEMBLE.BETA 0.67 \
    OUTPUT_DIR output/eval
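The evaluation commands above share the same shape: fixed flags, a config file, a weights file, and a list of `KEY VALUE` overrides. When running several of them, it may be easier to assemble the argument list programmatically. The following is a minimal sketch. `build_eval_command` is a hypothetical helper, not part of this repository, and it only reproduces the flags and keys that appear in this README.

```python
import shlex


def build_eval_command(config_file, weights, overrides=None, num_gpus=8,
                       output_dir="output/eval", script="./test_net.py"):
    """Assemble the argument list for one evaluation run."""
    cmd = ["python3", script, "--num-gpus", str(num_gpus), "--eval-only",
           "--config-file", config_file, "MODEL.WEIGHTS", weights]
    # Overrides are passed as alternating KEY VALUE tokens.
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    cmd += ["OUTPUT_DIR", output_dir]
    return cmd


cmd = build_eval_command(
    "./sas_det/configs/ovd_lvis_R50_C4_ensemble_PLs.yaml",
    "./pretrained_ckpt/sas_det/sas_det_lvis_r50.pth",
    overrides={"MODEL.ENSEMBLE.ALPHA": 0.33, "MODEL.ENSEMBLE.BETA": 0.67},
)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root. The example omits most overrides for brevity, so the remaining keys from the commands above would need to be added to `overrides` for a real run.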

Acknowledgement

This repository was built on top of Detectron2, RegionCLIP, and VLDet. We thank the community for their efforts.
