VRAM requirements? #1

Open
jpgallegoar opened this issue Jan 7, 2025 · 16 comments

@jpgallegoar

Hello, first of all, thank you for this awesome model. I wanted to ask what the VRAM requirements are, since I tried to run the I2VGen-XL-based model and it ran out of memory on an RTX 4090 (24 GB).

@Falkonar

Falkonar commented Jan 8, 2025

I also tried to run the I2VGen-XL-based model on 24 GB and got an out-of-memory error.

@SEGAUG

SEGAUG commented Jan 8, 2025

Regarding this, I would also appreciate guidance on specific parameters. Currently, inference on a V100 seems to run into out-of-memory issues. Is there support for FP16 or INT8? Best regards.

@CSRuiXie (Collaborator)

CSRuiXie commented Jan 8, 2025

Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size.
You can set the frame length to 12, which should work within 24GB of VRAM.
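
To make the two knobs concrete, here is a rough, illustrative sketch of what they control (this is not the project's code; `decode_fn` is a placeholder for the actual decoder):

```python
# Illustrative only: how frame_length and chunk_size trade context/throughput for VRAM.
import torch

def split_into_clips(frames: torch.Tensor, frame_length: int):
    """Split a (T, C, H, W) video tensor into clips of at most frame_length frames.
    Each clip is restored independently, so peak activation memory scales with
    frame_length instead of the full video length T."""
    return [frames[i:i + frame_length] for i in range(0, frames.shape[0], frame_length)]

def decode_in_chunks(latents: torch.Tensor, decode_fn, chunk_size: int):
    """Decode latents chunk_size frames at a time instead of all at once."""
    outputs = [decode_fn(latents[i:i + chunk_size])
               for i in range(0, latents.shape[0], chunk_size)]
    return torch.cat(outputs, dim=0)
```

A smaller frame_length lowers memory but gives the model less temporal context, while a smaller chunk_size mainly affects decoding throughput (assuming it only controls how many frames are decoded at once).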

@cxzhou35

cxzhou35 commented Jan 10, 2025

> Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size. You can set the frame length to 12, which should work within 24GB of VRAM.

Hi @CSRuiXie,
I am running the toy example on a 4090 GPU (24 GB memory) with the settings below:

  1. frame_length = 12
  2. upscale = 4
  3. chunk_size = 2

I still get an OOM error. How can I fix that?
Another question: are there any limitations on the input video resolution?
For example, my input video is 1920x1080. Thanks in advance 🙏

@CSRuiXie (Collaborator)

> Hi @CSRuiXie, I am running the toy example on a 4090 GPU (24 GB memory) with frame_length = 12, upscale = 4, and chunk_size = 2. I still get an OOM error. How can I fix that? Another question: are there any limitations on the input video resolution? For example, my input video is 1920x1080.

Hi, I believe the main issue is that your input video resolution is too large for 4x upscaling. For example, with the default settings, upscaling a 640x480 video by 4x can require more than 80GB of VRAM.
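
A rough back-of-the-envelope comparison makes the point (memory grows with the number of output pixels, so the ratio is only indicative):

```python
# Output size at 4x upscaling for the two resolutions discussed above.
for w, h in [(640, 480), (1920, 1080)]:
    out_w, out_h = 4 * w, 4 * h
    print(f"{w}x{h} -> {out_w}x{out_h} ({out_w * out_h / 1e6:.1f} MP per output frame)")
# 640x480   -> 2560x1920 (4.9 MP per output frame)
# 1920x1080 -> 7680x4320 (33.2 MP per output frame), about 6.75x more pixels
```

So if the 640x480 case already peaks above 80 GB, a 1080p input at 4x is far out of reach on a single GPU at these settings; reducing the upscale factor or the input resolution is the practical workaround.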

@FurkanGozukara

The numbers are huge. Is there any way to quantize, tile, or slice to reduce VRAM?

@CSRuiXie (Collaborator)

> The numbers are huge. Is there any way to quantize, tile, or slice to reduce VRAM?

Yes, we are aware of the VRAM issue, and we definitely plan to introduce some techniques to optimize it, such as tiling. In the meantime, you can follow the instructions above to reduce VRAM usage.
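
For what it's worth, the general idea behind spatial tiling looks roughly like the sketch below (purely illustrative, not part of the current codebase; `model_upscale`, the tile size, and the overlap are all assumptions):

```python
import torch

def tiled_upscale(frame: torch.Tensor, model_upscale, scale: int = 4,
                  tile: int = 256, overlap: int = 32) -> torch.Tensor:
    """Upscale one (C, H, W) frame by running the model on overlapping tiles
    and averaging the overlaps, so peak memory depends on the tile size, not H x W."""
    c, h, w = frame.shape
    out = torch.zeros(c, h * scale, w * scale)
    weight = torch.zeros(1, h * scale, w * scale)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y0 = min(y, max(h - tile, 0))          # clamp so the tile fits inside the frame
            x0 = min(x, max(w - tile, 0))
            up = model_upscale(frame[:, y0:y0 + tile, x0:x0 + tile])  # hypothetical model call
            ys, xs = y0 * scale, x0 * scale
            out[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += up
            weight[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += 1
    return out / weight.clamp(min=1)               # average wherever tiles overlap
```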

@FurkanGozukara

@CSRuiXie can you modify the app here and add these two options?

https://huggingface.co/spaces/SherryX/STAR/blob/main/app.py

ty so much

@nitinmukesh

@CSRuiXie

Thank you for sharing your work with us.
It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

@FurkanGozukara

> @CSRuiXie
>
> Thank you for sharing your work with us. It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

Yes, this is needed so that longer videos can be processed.

@nitinmukesh

nitinmukesh commented Jan 11, 2025

I also did a quick test using the sample video (023_klingai_reedit.mp4) on 8 GB VRAM + 8 GB shared. Earlier I was getting OOM on the first step, but after making a few changes it started to work, though very slowly:

  • 77 frames
  • 426 x 248
  • 2x upscale
  • frame length 8

  • With a few changes, the memory consumption is 11 GB
  • Each step takes 30 minutes, so 50 steps * 30 = 1500 minutes = 25 hours
  • I killed it after 5 steps.

I guess frame-wise processing should make it work on at least 16/24 GB VRAM. Any dirty/quick code to try frame-wise processing?

@CSRuiXie (Collaborator)

> @CSRuiXie can you modify the app here and add these two options?
>
> https://huggingface.co/spaces/SherryX/STAR/blob/main/app.py
>
> ty so much

We have now added these two options to the STAR demo. You can visit the Hugging Face demo to check the updates.

@CSRuiXie (Collaborator)

> @CSRuiXie
>
> Thank you for sharing your work with us. It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

Thank you for your interest in our work. You can set the frame_length to 1 for frame-wise processing. However, the restored results may be worse than the default setting, mainly due to two reasons: (1) the frame length during inference differs significantly from that during training, and (2) the model cannot extract temporal information from other frames.
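
To connect this with the earlier request about dumping frames to disk, a rough sketch of streaming, chunk-by-chunk processing could look like the following (illustrative only; `restore_chunk` is a placeholder for the model call, and imageio is just one convenient way to read/write frames):

```python
import os
import numpy as np
import imageio.v3 as iio

def restore_video_streaming(in_path, out_dir, restore_chunk, frame_length=1):
    """Read frames lazily, restore frame_length frames at a time, and write each
    restored frame straight to disk so nothing accumulates in memory."""
    os.makedirs(out_dir, exist_ok=True)
    buffer, written = [], 0

    def flush():
        nonlocal written
        for out_frame in restore_chunk(np.stack(buffer)):   # placeholder model call
            iio.imwrite(os.path.join(out_dir, f"{written:06d}.png"), out_frame)
            written += 1
        buffer.clear()

    for frame in iio.imiter(in_path):        # yields video frames one by one
        buffer.append(frame)
        if len(buffer) == frame_length:
            flush()
    if buffer:                               # flush any trailing frames
        flush()
```

As noted above, frame_length = 1 removes temporal context, so a small chunk (for example the 8 or 12 frames mentioned elsewhere in this thread) is usually a better trade-off when it fits in memory.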

@somenewaccountthen

Please put the minimum VRAM in the install instructions.
Even 1 frame doesn't work on my card (11 GB).
It would save a lot of people a lot of time.

@CSRuiXie (Collaborator)

> Please put the minimum VRAM in the install instructions. Even 1 frame doesn't work on my card (11 GB). It would save a lot of people a lot of time.

Thanks for your advice! We will add more details about the VRAM requirements in the installation instructions.

@Falkonar

Falkonar commented Feb 18, 2025

> Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size. You can set the frame length to 12, which should work within 24GB of VRAM.

Thank you for the detailed explanation of VRAM management. I have a follow-up question: would your script work with two RTX 3090 GPUs using DistributedDataParallel (DDP), or with model parallelism?
