Official Repo of M2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
(👉 Under construction! The key code has been uploaded, but the current version still contains some redundancies and the commands/instructions are not yet ready for a formal release. I will update it gradually; please stay tuned.)
This repository contains the official PyTorch implementation of M2PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning. Our work builds on LLaVA, and we thank its authors for their great work.
Figure 1: Overview of our M2PT approach. Here, visual prompts are embedded into each layer of the Visual Encoder, and textual prompts are embedded into each layer of the LLM. These prompts facilitate the extraction and alignment of features across modalities (e.g., vision, language). The cross-modality interaction between visual and textual features is enhanced through layered integration, ultimately improving the model's capability in zero-shot instruction learning tasks.
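As a rough illustration of the layer-wise prompting described above, here is a minimal PyTorch sketch of deep prompt tuning: learnable prompt tokens are prepended to the input of every (frozen) transformer layer and stripped again before the next layer. The class and variable names are invented for illustration; this is not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class PromptedEncoderLayer(nn.Module):
    """Wraps a transformer layer and prepends learnable prompt tokens to its input."""

    def __init__(self, layer: nn.Module, prompt_len: int, hidden_dim: int):
        super().__init__()
        self.layer = layer
        # One set of learnable prompts per layer (deep prompt tuning).
        self.prompts = nn.Parameter(torch.randn(prompt_len, hidden_dim) * 0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz = hidden_states.size(0)
        prompts = self.prompts.unsqueeze(0).expand(bsz, -1, -1)
        # Prepend prompts, run the (frozen) layer, then drop the prompt positions
        # so the sequence length seen by the next layer is unchanged.
        out = self.layer(torch.cat([prompts, hidden_states], dim=1))
        return out[:, self.prompts.size(0):, :]


# Toy usage with a small stack of vanilla transformer encoder layers.
hidden_dim, prompt_len = 64, 4
base_layers = [nn.TransformerEncoderLayer(hidden_dim, nhead=4, batch_first=True) for _ in range(2)]
model = nn.Sequential(*[PromptedEncoderLayer(l, prompt_len, hidden_dim) for l in base_layers])

for p in model.parameters():       # freeze everything...
    p.requires_grad_(False)
for m in model:                    # ...except the prompt tokens
    m.prompts.requires_grad_(True)

x = torch.randn(2, 16, hidden_dim)  # (batch, seq_len, hidden)
print(model(x).shape)               # torch.Size([2, 16, 64])
```

In M2PT this idea is applied to both towers: visual prompts in the vision encoder layers and textual prompts in the LLM layers, with only the prompts being tuned.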
- Clone this repository and navigate to the M2PT folder
git clone git@github.com:William-wAng618/M2PT.git
cd M2PT
- Install Package
conda create -n M2PT python=3.10 -y
conda activate M2PT
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
The weights for the stage-1 alignment are liuhaotian/llava-pretrain-vicuna-7b-v1.3 and lmsys/vicuna-7b-v1.3; please download them for M2PT.
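One hedged way to fetch both checkpoints is via huggingface_hub; the local directories below are arbitrary placeholders, so point them wherever you keep checkpoints.

```python
from huggingface_hub import snapshot_download

# Download the stage-1 alignment weights and the Vicuna-7B v1.3 base LLM.
# The target directories are placeholders, not paths expected by the repo.
snapshot_download(repo_id="liuhaotian/llava-pretrain-vicuna-7b-v1.3",
                  local_dir="checkpoints/llava-pretrain-vicuna-7b-v1.3")
snapshot_download(repo_id="lmsys/vicuna-7b-v1.3",
                  local_dir="checkpoints/vicuna-7b-v1.3")
```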
- Prepare data.
Please download the annotations of the Vision-Flan 191k data and place them in `playground` (a quick sanity-check snippet follows the directory tree below).
├── M2PT
│   └── playground
│       └── Vision-Flan (unzip here)
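Optionally, you can sanity-check that the annotations landed where the training script can find them. The JSON filename below is a placeholder; substitute the actual annotation file from the Vision-Flan release.

```python
import json
from pathlib import Path

# "annotations.json" is a placeholder name; replace it with the actual
# annotation file shipped with the Vision-Flan 191k download.
ann_path = Path("playground/Vision-Flan/annotations.json")
assert ann_path.exists(), f"annotation file not found at {ann_path}"

with ann_path.open() as f:
    data = json.load(f)
print(f"loaded {len(data)} instruction examples")
```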
- Start training.
There are several parameters to note in `scripts/PT_full_schedule.sh` (a rough parameter-count sketch follows this list):
- `--PT_len_llm`: the number of textual prompts added to the LLM.
- `--PT_len_vision_encoder`: the number of visual prompts added to the vision encoder.
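A back-of-the-envelope sketch of how many trainable parameters these two flags introduce, assuming one prompt set per layer (as in the figure above) and the usual vicuna-7b-v1.3 (hidden 4096, 32 layers) and CLIP ViT-L vision tower (hidden 1024, 24 layers) dimensions; these sizes are assumptions, not values read from the repository.

```python
# Rough count of the extra trainable parameters added by the prompt flags.
# Dimensions assume vicuna-7b-v1.3 (4096 x 32) and a CLIP ViT-L vision tower
# (1024 x 24); adjust if your backbone differs.
PT_len_llm = 10
PT_len_vision_encoder = 10

textual = PT_len_llm * 4096 * 32            # prompts in every LLM layer
visual = PT_len_vision_encoder * 1024 * 24  # prompts in every vision-encoder layer
print(f"~{(textual + visual) / 1e6:.2f}M extra trainable parameters")
```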
Then run:
bash scripts/PT_full_schedule.sh
- Evaluation. Please use:
./M2PT/eval/model_vqa_loader_PT_mme.py