Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection

Yicheng Xiao¹*, Zhuoyan Luo¹*, Yong Liu¹, Yue Ma¹, Hengwei Bian², Yatai Ji¹, Yujiu Yang¹ and Xiu Li¹

¹ Tsinghua University, ² Carnegie Mellon University

📖 Abstract

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis. Recent approaches treat MR and HD as similar video grounding problems and address them together with transformer-based architectures. However, we observe that the emphasis of MR and HD differs: one requires the perception of local relationships, while the other prioritizes the understanding of global context. Consequently, the lack of task-specific design inevitably limits how well the intrinsic specialty of the two tasks can be associated. To tackle this issue, we propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively. By performing progressive integration on intra- and inter-modality across multiple granularities, UVCOM achieves a comprehensive understanding of a video. Moreover, we present multi-aspect contrastive learning to consolidate local relation modeling and global knowledge accumulation via a well-aligned multi-modal space. Extensive experiments on the QVHighlights, Charades-STA, TACoS, YouTube Highlights and TVSum datasets demonstrate the effectiveness and rationality of UVCOM, which outperforms state-of-the-art methods by a remarkable margin.

📚 Datasets

QVHighlights: The data is organized as shown below. Replace the feat_root path in the bash file with your own. The official QVHighlights features can be downloaded as moment_detr_features.tar.gz.

QVHighlight
└──── features
    ├── slowfast_features
    ├── clip_text_features
    ├── clip_features
    ├── pann_features
    └── clip_sub_features
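
Before launching training, it can help to confirm that the feature folders are where the scripts expect them. The following is a minimal sketch, not part of the repository: the function name `missing_feature_dirs` is hypothetical, and it assumes you pass the path of the `features` directory from the tree above.

```python
from pathlib import Path

# Sub-directory names copied from the directory tree above.
EXPECTED_FEATURE_DIRS = (
    "slowfast_features",
    "clip_text_features",
    "clip_features",
    "pann_features",
    "clip_sub_features",
)


def missing_feature_dirs(features_dir):
    """Return the expected sub-directories that are absent from features_dir."""
    root = Path(features_dir)
    return [name for name in EXPECTED_FEATURE_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical location; replace it with your own path.
    missing = missing_feature_dirs("QVHighlight/features")
    print("missing:", missing if missing else "none")
```

An empty list means all five folders are present; anything else names the folders that still need to be extracted.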

🛠️ Environment Setup

conda create -n uvcom python=3.7
conda activate uvcom

# Install pytorch 
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

# Install other packages
pip install -r requirements.txt

Tip: To reproduce the reported results exactly, use the package versions listed above and run on an RTX 3090.
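
Since exact reproduction depends on the pinned versions, a quick comparison of what is installed against the pins can catch a mismatched environment early. This is a small sketch under assumptions: the helper names `version_mismatches` and `installed_versions` are hypothetical, and it assumes each package exposes its version string as `__version__`.

```python
import importlib

# Pins copied from the pip install command above.
EXPECTED_VERSIONS = {
    "torch": "1.9.0+cu111",
    "torchvision": "0.10.0+cu111",
    "torchaudio": "0.9.0",
}


def version_mismatches(installed, expected=EXPECTED_VERSIONS):
    """Map each package whose version differs to an (expected, found) pair."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


def installed_versions(names=tuple(EXPECTED_VERSIONS)):
    """Import each package and read __version__; None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            found[name] = getattr(importlib.import_module(name), "__version__", None)
        except ImportError:
            found[name] = None
    return found
```

Inside the activated `uvcom` environment, `version_mismatches(installed_versions())` returns an empty dict when all three packages match the pins.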

🍺 Main Results

QVHighlights

Extra Training Data	Use Audio	Set Split	MR R1@0.5	MR R1@0.7	MR mAP	HD mAP	HD HIT@1	Log/ckpt
✗	✗	Val	65.10	51.81	45.79	40.03	63.29	log/ckpt
✗	✗	Test	63.55	47.47	43.18	39.74	64.20	log/ckpt
✗	✔	Test	63.18	48.70	43.27	39.79	64.79	--/--
ASR	✗	Test	64.53	48.31	43.80	39.98	65.58	--/--

Charades-STA

Extra Training Data	Use Audio	Set Split	MR R1@0.5	MR R1@0.7	Log/ckpt
✗	✗	Test	59.25	36.64	log/ckpt
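
For readers unfamiliar with the moment-retrieval recall columns, Recall@1 at an IoU threshold is commonly defined as the percentage of queries whose top-ranked predicted window overlaps a ground-truth window with temporal IoU at or above the threshold (0.5 and 0.7 in the tables above). The sketch below illustrates that common definition only; it is not the code in `standalone_eval`, and the function names and the (start, end)-in-seconds window format are assumptions.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) windows, in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, ground_truths, iou_threshold):
    """Percentage of queries whose top-1 window matches any ground-truth window.

    top1_preds maps query id -> (start, end);
    ground_truths maps query id -> list of (start, end) windows.
    """
    if not top1_preds:
        return 0.0
    hits = 0
    for qid, pred in top1_preds.items():
        # Best overlap against any ground-truth window for this query.
        best = max(
            (temporal_iou(pred, gt) for gt in ground_truths.get(qid, [])),
            default=0.0,
        )
        if best >= iou_threshold:
            hits += 1
    return 100.0 * hits / len(top1_preds)


# Toy example: q1 is a perfect match, q2 overlaps with IoU 1/3.
preds = {"q1": (0.0, 10.0), "q2": (0.0, 10.0)}
gts = {"q1": [(0.0, 10.0)], "q2": [(5.0, 15.0)]}
print(recall_at_1(preds, gts, 0.5))  # 50.0
```

Raising the threshold from 0.5 to 0.7 demands tighter localization, which is why the R1@0.7 numbers are lower than the R1@0.5 numbers in every row.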

🚀 Train & Evaluate

Train from scratch

QVHighlights

bash scripts/train_QV_scratch.sh

Modify the relevant paths in the script to match your own setup.

Evaluate

QVHighlights

bash scripts/eval_QV_scratch.sh

Modify the resume checkpoint path in the script to point to your own checkpoint.

❤️ Acknowledgement

The code in this repository builds on several public repositories. Thanks to the authors of Moment-DETR and QD-DETR for their wonderful work!

⭐️ BibTeX

If you find this work useful for your research, please cite:

@article{DBLP:journals/corr/abs-2311-16464,
  author       = {Yicheng Xiao and
                  Zhuoyan Luo and
                  Yong Liu and
                  Yue Ma and
                  Hengwei Bian and
                  Yatai Ji and
                  Yujiu Yang and
                  Xiu Li},
  title        = {Bridging the Gap: {A} Unified Video Comprehension Framework for Moment
                  Retrieval and Highlight Detection},
  journal      = {CoRR},
  volume       = {abs/2311.16464},
  year         = {2023}
}

☑️ LICENSE

Our code is released under the MIT license.

🎤🎤🎤 Todo

[ ✔ ] Release the code.
[ ✔ ] Release the config and checkpoints.
