👋 Join our WeChat community
The Mini Sora open-source community is organized spontaneously by community members (it is completely free and charges nothing). Mini Sora aims to explore how Sora can be implemented and where the technology is heading:
- Regular Sora roundtables to explore the possibilities together with the community
- Discussion of existing technical approaches to video generation
The main papers targeted for reproduction are:
- DiT with OpenDiT
- SiT
- W.A.L.T
Everyone is welcome to join the Sora paper-reproduction group!
Speaker: Zhen Xing, PhD student at the Vision and Learning Lab, Fudan University
Highlights: fundamentals of diffusion models for image generation / the evolution of text-to-video diffusion models / a first look at the technology behind Sora and the challenges of reproducing it
Live stream: 02/28, 20:00-21:00
Replay: search "聊聊 Sora 之 Video Diffusion 综述" on WeChat Channels
Slides: Feishu download link
- Sora: Creating video from text; technical report: Video generation models as world simulators
- DiT: Scalable Diffusion Models with Transformers
- Latte: Latent Diffusion Transformer for Video Generation (latte论文精读翻译.pdf, Latte paper walkthrough)
- More to come...
**Diffusion Model**

| Papers | Links |
| --- | --- |
| 1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | Paper, GitHub |
| 2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | Paper, GitHub |
| 3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | Paper, GitHub |
| 4) DDPM: Denoising Diffusion Probabilistic Models | Paper, GitHub |
| 5) DDIM: Denoising Diffusion Implicit Models | Paper, GitHub |
| 6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | Paper, GitHub, Blog |
| 7) Stable Cascade: Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models | Paper, GitHub, Blog |
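Several entries above (DDPM, DDIM, EDM) share the same forward noising process, which can be sampled in closed form at any timestep. A minimal NumPy sketch, assuming the linear beta schedule from the DDPM paper and a toy 4-pixel signal standing in for a real image:

```python
import numpy as np

# Linear beta schedule as in DDPM (Ho et al., 2020): 1000 steps from 1e-4 to 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

rng = np.random.default_rng(0)
x0 = np.ones(4)                      # toy "image": a 4-pixel constant signal
x_t, eps = q_sample(x0, T - 1, rng)  # at t = T-1 the signal is almost fully destroyed
print(alpha_bars[-1])                # close to 0, so x_T is essentially pure noise
```

Training a denoiser then amounts to predicting `eps` from `x_t` and `t`; DDIM reuses exactly this forward process but changes the sampler.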
**Diffusion Transformer**

| Papers | Links |
| --- | --- |
| 1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | Paper, GitHub, ModelScope |
| 2) DiT: Scalable Diffusion Models with Transformers | Paper, GitHub, ModelScope |
| 3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | Paper, GitHub, ModelScope |
| 4) FiT: Flexible Vision Transformer for Diffusion Model | Paper, GitHub |
| 5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | Paper, GitHub |
| 6) OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference | GitHub |
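DiT, SiT and the other transformer backbones above all start by flattening a latent into a sequence of patch tokens before applying transformer blocks. A minimal NumPy sketch of that patchify step; the latent shape and patch size here are illustrative, not taken from any particular paper:

```python
import numpy as np

def patchify(latent, patch):
    """Split a (C, H, W) latent into an (N, patch*patch*C) token sequence,
    N = (H // patch) * (W // patch) -- the tokenization DiT applies before
    its transformer blocks."""
    c, h, w = latent.shape
    assert h % patch == 0 and w % patch == 0
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, p, p, C) -> (N, p*p*C)
    return (latent.reshape(c, gh, patch, gw, patch)
                  .transpose(1, 3, 2, 4, 0)
                  .reshape(gh * gw, patch * patch * c))

latent = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)
tokens = patchify(latent, patch=2)
print(tokens.shape)  # (256, 16): a 16x16 grid of 2x2x4 patches
```

The inverse "unpatchify" at the output reverses these reshapes; a smaller patch size means more tokens and quadratically more attention cost.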
**Video Generation**

| Papers | Links |
| --- | --- |
| 1) AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | Paper, GitHub, ModelScope |
| 2) I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Paper, GitHub, ModelScope |
| 3) Imagen Video: High Definition Video Generation with Diffusion Models | Paper |
| 4) MoCoGAN: Decomposing Motion and Content for Video Generation | Paper |
| 5) Adversarial Video Generation on Complex Datasets | Paper |
| 6) W.A.L.T: Photorealistic Video Generation with Diffusion Models | Paper, Project |
| 7) VideoGPT: Video Generation using VQ-VAE and Transformers | Paper, GitHub |
| 8) Video Diffusion Models | Paper, GitHub, Project |
| 9) MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | Paper, GitHub, Project, Blog |
| 10) VideoPoet: A Large Language Model for Zero-Shot Video Generation | Paper |
| 11) MAGVIT: Masked Generative Video Transformer | Paper, GitHub, Project, Colab |
| 12) EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Paper, GitHub, Project |
| 13) SimDA: Simple Diffusion Adapter for Efficient Video Generation | Paper, GitHub, Project |
| 14) [ICCV 23] StableVideo: Text-driven Consistency-aware Diffusion Video Editing | Paper, GitHub, Project |
**Long-context**

| Papers | Links |
| --- | --- |
| 1) World Model on Million-Length Video and Language with RingAttention | Paper, GitHub |
| 2) Ring Attention with Blockwise Transformers for Near-Infinite Context | Paper, GitHub |
| 3) Extending LLMs' Context Window with 100 Samples | Paper, GitHub |
| 4) Efficient Streaming Language Models with Attention Sinks | Paper, GitHub |
| 5) The What, Why, and How of Context Length Extension Techniques in Large Language Models: A Detailed Survey | Paper |
| 6) [CVPR 24] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding | Paper, GitHub, Project |
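RingAttention and blockwise transformers (entries 1 and 2 above) build on the observation that softmax attention over a very long key sequence can be computed one block at a time, keeping only a running max, normalizer, and weighted sum. A single-query NumPy sketch of that online-softmax recurrence; the shapes and block size are illustrative:

```python
import numpy as np

def blockwise_attention(q, k, v, block=64):
    """Attention for one query over a long key/value sequence, processed
    block by block with running (max, normalizer, weighted-sum) state --
    the online-softmax trick that blockwise/ring attention distributes."""
    m = -np.inf                # running max of scores (numerical stability)
    denom = 0.0                # running softmax normalizer
    acc = np.zeros_like(v[0])  # running unnormalized output
    for start in range(0, len(k), block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = kb @ q
        m_new = max(m, scores.max())
        scale = np.exp(m - m_new)       # rescale old state to the new max
        w = np.exp(scores - m_new)
        denom = denom * scale + w.sum()
        acc = acc * scale + w @ vb
        m = m_new
    return acc / denom

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
k = rng.standard_normal((1000, 8))
v = rng.standard_normal((1000, 8))
out = blockwise_attention(q, k, v)
# Matches the one-shot softmax over the full sequence:
s = np.exp(k @ q - (k @ q).max())
ref = (s / s.sum()) @ v
print(np.allclose(out, ref))  # True
```

Because each block's state fits in constant memory, RingAttention can pass these blocks around a ring of devices, which is what makes million-token video-plus-language contexts feasible.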
**Base Video Models**

| Papers | Links |
| --- | --- |
| 1) ViViT: A Video Vision Transformer | Paper, GitHub |
| 2) VideoLDM: Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | Paper |
| 3) LVDM: Latent Video Diffusion Models for High-Fidelity Long Video Generation | Paper, GitHub |
| 4) LFDM: Conditional Image-to-Video Generation with Latent Flow Diffusion Models | Paper, GitHub |
| 5) MotionDirector: Motion Customization of Text-to-Video Diffusion Models | Paper, GitHub |
**Existing high-quality resources**

| Resources | Links |
| --- | --- |
| 1) Datawhale - AI video generation study group | Feishu doc |
| 2) A Survey on Generative Diffusion Model | Paper, GitHub |
| 3) Awesome-Video-Diffusion-Models | Paper, GitHub |
| 4) Awesome-Text-To-Video: A Survey on Text-to-Video Generation/Synthesis | GitHub |
| 5) video-generation-survey: A reading list of video generation | GitHub |
| 6) Video Generation task on Papers With Code | Task |
| 7) Awesome-Video-Diffusion | GitHub |
| 8) Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Paper |