After preparing the data and before running a training script, set the data_root variable in the corresponding script, e.g.,
data_root='./dataset/cmumosei'
Download the CMU-MOSEI [link] meta files and videos, and organize the data as follows:
Dataset
│
├── CMU_MOSEI
│ ├── train
│ │ ├── audio
│ │ ├── video
│ │ └── text
│ ├── valid
│ ├── test
│ ├── labels
│ └── labels_emotion
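
Before fine-tuning, it can help to confirm that the directory tree above is in place. The following is a minimal sketch, not part of the repository: the `missing_entries` helper and the `Dataset/CMU_MOSEI` location are assumptions for illustration, and only the sub-directories shown in the tree are checked.

```python
from pathlib import Path

# Expected sub-directories, copied from the tree above.
EXPECTED_DIRS = [
    "train/audio",
    "train/video",
    "train/text",
    "valid",
    "test",
    "labels",
    "labels_emotion",
]


def missing_entries(root, expected=EXPECTED_DIRS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: the dataset lives at ./Dataset/CMU_MOSEI; adjust to your data_root.
    missing = missing_entries("Dataset/CMU_MOSEI")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("CMU_MOSEI layout looks complete.")
```

The check only tests for existence; it does not inspect the contents of any folder.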
Sentiment analysis
bash scripts/finetune_mosei.sh
Emotion analysis
bash scripts/finetune_moseiemo.sh
Download the VQAv2 [link] meta files, audio files [link], and images, and organize the data as follows:
Dataset
│
├── VQA
│ ├── audios
│ ├── train2014
│ │ ├── 0.jpg
│ │ └── ...
│ ├── val2014
│ ├── test2014
│ ├── train.jsonl
│ ├── dev.jsonl
│ └── test.jsonl
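
The three split files can be inspected quickly before training. The sketch below assumes the usual JSON Lines convention (one JSON value per line) and a `Dataset/VQA` location; the field names inside each record are not documented here, so the script only reports what it finds. `read_jsonl` is a hypothetical helper, not a function from this repository.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Assumption: the dataset lives at ./Dataset/VQA; adjust to your data_root.
    root = Path("Dataset/VQA")
    for name in ("train.jsonl", "dev.jsonl", "test.jsonl"):
        path = root / name
        if not path.is_file():
            print(f"{name}: not found")
            continue
        records = list(read_jsonl(path))
        # Report the field names of the first record if it is a JSON object.
        first = records[0] if records else None
        keys = sorted(first) if isinstance(first, dict) else []
        print(f"{name}: {len(records)} records, fields of first record: {keys}")
```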
bash scripts/finetune_vqa.sh
Download the meta files from frozen-in-time, or directly from [link], together with the videos, and organize the data as follows:
Dataset
│
├── MSRVTT
│ ├── high-quality
│ ├── structured-symlinks
│ ├── annotation
│ │ └── MSR_VTT.json
│ ├── videos
│ └── raw_audios
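
Since the layout above keeps videos and audio in separate folders, a quick cross-check can reveal videos that have no audio counterpart. This is a rough sketch under a loud assumption: that each file in `raw_audios` shares its base name (file name without extension) with a file in `videos`. The actual naming scheme is not stated here, so treat the helper names and the `Dataset/MSRVTT` path as hypothetical.

```python
from pathlib import Path


def stems(directory):
    """Return the set of base names (no extension) of files directly in directory."""
    return {p.stem for p in Path(directory).iterdir() if p.is_file()}


def videos_without_audio(video_dir, audio_dir):
    """Return sorted base names present in video_dir but absent from audio_dir."""
    return sorted(stems(video_dir) - stems(audio_dir))


if __name__ == "__main__":
    # Assumption: the dataset lives at ./Dataset/MSRVTT; adjust to your data_root.
    root = Path("Dataset/MSRVTT")
    video_dir, audio_dir = root / "videos", root / "raw_audios"
    if video_dir.is_dir() and audio_dir.is_dir():
        unmatched = videos_without_audio(video_dir, audio_dir)
        print(f"{len(unmatched)} video(s) have no audio file with the same base name")
    else:
        print("videos/ or raw_audios/ not found under", root)
```

Unmatched entries are not necessarily errors, since a video may simply have no audio track.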
bash scripts/finetune_msrvtt.sh