⭐️ Our series works: [MMStar] [ShareGPT4V] [ShareGPT4Omni]
🚀🚀🚀 Official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
Here is a video introducing ShareGPT4Video:
demo_clip_v2.mp4
- Authors: Lin Chen*, Xilin Wei*, Jinsong Li*, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao 📧, Jiaqi Wang 📧
- Institutes: University of Science and Technology of China; The Chinese University of Hong Kong; Peking University; Shanghai AI Laboratory
- Resources: [Paper] [Project Page] [ShareGPT4Video Dataset]
- Models: [🤗ShareGPT4Video-8B] [🤗ShareCaptioner-Video]
- Demo: [🤗ShareGPT4Video-8B] [🤗ShareCaptioner-Video]
- 🔥 A large-scale, highly descriptive video-text dataset, with 40K GPT4-Vision-generated video captions and around 400K implicit video split captions.
- 🔥 A general video captioner for videos of various durations, resolutions, and aspect ratios, approaching GPT4-Vision's captioning capability and featuring two inference modes targeting quality and efficiency, respectively.
- 🔥 A superior large video-language model, ShareGPT4Video-8B, trained in just 5 hours on 8xA100 GPUs.
- 🔥 Improved text-to-video performance with high-quality video captions generated by our ShareCaptioner-Video. Thanks to Open-Sora-Plan.
[2024/6/11] The web demo and local demo of ShareCaptioner-Video are available now!
[2024/6/11] The web demo and local demo of ShareGPT4Video-8B are available now!
[2024/6/7] Our paper was featured in HuggingFace Daily Papers and ranked 1st on June 7.
[2024/5/27] The ShareGPT4Video-8B model is released!
[2024/5/26] The ShareGPT4Video dataset and project page are released!
- Training and evaluation code for ShareGPT4Video-8B
- Batch inference code for ShareCaptioner-Video
- Web demo and local demo of ShareCaptioner-Video
- Web demo and local demo of ShareGPT4Video-8B
- Checkpoints of ShareGPT4Video-8B
You can directly chat with our ShareGPT4Video model about your own video with the following command:
python run.py --model-path Lin-Chen/sharegpt4video-8b --video examples/yoga.mp4 --query "Describe this video in detail."
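To caption many videos in one go, the single-video command above can be scripted. A minimal sketch, assuming run.py accepts exactly the flags shown above; the batching helper itself is hypothetical and not part of the repo:

```python
import subprocess
from pathlib import Path

def build_command(video: str, query: str,
                  model_path: str = "Lin-Chen/sharegpt4video-8b") -> list[str]:
    """Assemble the run.py invocation for one video (flags as in the README)."""
    return ["python", "run.py",
            "--model-path", model_path,
            "--video", video,
            "--query", query]

def caption_folder(folder: str, query: str = "Describe this video in detail.") -> None:
    """Hypothetical helper: run the model over every .mp4 in a folder."""
    for video in sorted(Path(folder).glob("*.mp4")):
        subprocess.run(build_command(str(video), query), check=True)
```

Passing the arguments as a list (rather than one shell string) sidesteps quoting issues with queries that contain spaces.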
Or you can launch a local demo to try our ShareGPT4Video-8B with the following command:
python app.py
You can launch a local demo to try our ShareCaptioner-Video with the following commands:
cd captioner
python app.py
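Handling videos of arbitrary duration, as ShareCaptioner-Video does, generally involves sampling a fixed number of frames spread across the clip. A minimal, stdlib-only sketch of uniform frame-index selection; the function and sampling scheme are illustrative, not the captioner's actual implementation:

```python
def uniform_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick `num_samples` frame indices spread evenly across a video.

    Illustrative only: the real captioner's sampling strategy may differ.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))
    # Center each sampled index inside its equal-length segment of the video.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For example, a 100-frame clip sampled at 4 frames yields indices [12, 37, 62, 87], one per quarter of the video.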
git clone https://github.com/ShareGPT4Omni/ShareGPT4Video
conda create -n share4video python=3.10 -y
conda activate share4video
cd ShareGPT4Video
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
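After installing, a quick sanity check that the key packages resolved can save a failed training run later. A small stdlib-only sketch; the package list is an assumption inferred from the install steps above, so adjust it to your setup:

```python
import importlib.util

def missing_packages(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed dependencies implied by the pip steps above (flash-attn imports as flash_attn).
REQUIRED = ["torch", "transformers", "flash_attn"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```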
If you find our work helpful for your research, please consider giving a star ⭐ and citation 📝
@article{chen2024sharegpt4video,
title={ShareGPT4Video: Improving Video Understanding and Generation with Better Captions},
author={Chen, Lin and Wei, Xilin and Li, Jinsong and Dong, Xiaoyi and Zhang, Pan and Zang, Yuhang and Chen, Zehui and Duan, Haodong and Lin, Bin and Tang, Zhenyu and others},
journal={arXiv preprint arXiv:2406.04325},
year={2024}
}
- LLaVA: the codebase we built upon. Thanks for their wonderful work.
- Open-Sora-Plan: an excellent open-source codebase for Sora-like text-to-video implementation. Thanks for their wonderful work.
- Open-LLaVA-NeXT: an open-source codebase for reproducing the training procedure of the LLaVA-NeXT series.