VideoAutoEncoder

VideoAutoencoder Logo

A small experiment to build an efficient video autoencoder for GPUs with little VRAM.

Datasets used:

Refactor: https://huggingface.co/datasets/Fredtt3/Videos
Original: https://huggingface.co/datasets/lmms-lab/VideoDetailCaption

AdaptiveEfficientVideoAutoencoder (Version 0.3.0)

The new AdaptiveEfficientVideoAutoencoder can handle videos of different qualities and durations. Tests and improvements on this autoencoder are ongoing; we have noticed that, depending on the quality and duration, it takes longer to learn to reconstruct.

All information regarding VideoAutoEncoder usage and training is in the Test folder.
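Since the adaptive variant accepts clips of different durations, one plausible preprocessing step (an assumption for illustration, not taken from the repo — the real code lives in the Test folder) is to pad or truncate each clip to a fixed frame count before batching:

```python
def normalize_length(frames, target):
    """Pad (by repeating the last frame) or truncate to `target` frames.

    Hypothetical preprocessing sketch; `frames` stands in for a list of
    video frames of any type.
    """
    if len(frames) >= target:
        return frames[:target]
    return frames + [frames[-1]] * (target - len(frames))

# A 3-frame clip padded up to 5 frames:
clip = ["f0", "f1", "f2"]
padded = normalize_length(clip, 5)  # ["f0", "f1", "f2", "f2", "f2"]
```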

Memory usage for 480p, 5-second videos at 15 fps

Reconstruction at 480p, 5 s, 15 fps
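For context on why VRAM is the bottleneck, the raw size of a single such clip can be estimated with simple arithmetic (figures here assume an 854x480 RGB float32 tensor; the actual pipeline may store frames differently):

```python
# Rough raw-tensor size for one 480p, 5 s, 15 fps clip.
# Illustrative only; the training pipeline may use other dtypes or layouts.
height, width, channels = 480, 854, 3
frames = 5 * 15             # 5 seconds at 15 fps -> 75 frames
bytes_per_value = 4         # float32

total_bytes = height * width * channels * frames * bytes_per_value
total_mib = total_bytes / 2**20  # roughly 352 MiB before any activations
```

Activations inside the network multiply this figure several times over, which is why low-VRAM training needs careful optimization.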

Memory Usage Comparison

Version 0.1.0

RAM

VRAM

Version 0.2.0

RAM

VRAM

Version 0.3.0

You can now train from Colab on 240p, 10-second videos at 15 fps.

Installation

git clone https://github.com/Rivera-ai/VideoAutoencoder.git
cd VideoAutoencoder
pip install -e .

Installation via PyPI

pip install VideoAutoencoder
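As a rough intuition for what an autoencoder like this buys you, the sketch below computes the compression ratio between raw pixels and a latent code. The downsampling factors and latent channel count are hypothetical placeholders, not figures from this project:

```python
def compression_ratio(h, w, t, c_in=3, spatial=8, temporal=4, c_latent=16):
    """Ratio of raw pixel volume to latent volume.

    All factors (8x spatial, 4x temporal, 16 latent channels) are
    illustrative assumptions, not this repo's actual architecture.
    """
    raw = h * w * t * c_in
    latent = (h // spatial) * (w // spatial) * (t // temporal) * c_latent
    return raw / latent

# e.g. an 800x480 clip with 80 frames under these assumptions:
ratio = compression_ratio(800, 480, 80)  # 48.0x smaller in latent space
```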

Training Results V0.1.0

Epoch 0 Reconstruction Progress

The following demonstrations show the reconstruction quality at different steps during the first epoch of training:

Step 0

Step 0 Reconstruction

Step 50

Step 50 Reconstruction

Step 100

Step 100 Reconstruction

Step 150

Step 150 Reconstruction

Step 200

Step 200 Reconstruction

Training Results V0.2.0

Epoch 0 Reconstruction Progress

The following demonstrations show the reconstruction quality at different steps during the first epoch of training:

Step 0

Step 0 Reconstruction

Step 200

Step 200 Reconstruction

Epoch 1

Step 450

Step 450 Reconstruction

Epoch 2

Step 650

Step 650 Reconstruction

Epoch 3

Step 850

Step 850 Reconstruction

Epoch 4

Step 1050

Step 1050 Reconstruction

Training on larger datasets and for more epochs will, of course, yield better reconstructions. Version 0.2.0 is much better optimized and can train on as little as 3 GB of VRAM, at the cost of requiring more epochs and training steps.
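One common trick for fitting video training into small VRAM budgets is to process a clip in temporal chunks, so peak activation memory scales with the chunk length rather than the full clip. This is a generic sketch of the bookkeeping, assumed for illustration, not a description of this repo's actual optimization:

```python
def chunk_frames(num_frames, chunk_size):
    """Yield (start, end) index pairs covering all frames in order.

    Generic chunking helper; whether this project uses temporal
    chunking is an assumption, not confirmed by the README.
    """
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)

# A 5 s clip at 15 fps has 75 frames; chunks of 15 frames -> 5 passes:
chunks = list(chunk_frames(75, 15))
# [(0, 15), (15, 30), (30, 45), (45, 60), (60, 75)]
```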