VRAM requirements? #1

Open
jpgallegoar opened this issue Jan 7, 2025 · 16 comments

@jpgallegoar

Hello, first of all, thank you for this awesome model. I wanted to ask what the VRAM requirements are, since I tried to run the I2VGen-XL-based model and it ran out of memory on an RTX 4090 (24 GB).

@Falkonar

Falkonar commented Jan 8, 2025

I also tried to run the I2VGen-XL-based model on 24 GB and got an out-of-memory error.

@SEGAUG

SEGAUG commented Jan 8, 2025

Regarding this, I would also appreciate guidance on specific parameters. Currently, inference on a V100 seems to run into out-of-memory issues. Is there support for FP16 or INT8? Best regards.

@CSRuiXie (Collaborator)

CSRuiXie commented Jan 8, 2025

Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size.
You can set the frame length to 12, which should work within 24GB of VRAM.
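
To make the two knobs concrete, here is a rough, illustrative sketch of what they control (this is not the project's code; `decode_fn` is a placeholder for the actual decoder):

```python
# Illustrative only: how frame_length and chunk_size trade context/throughput for VRAM.
import torch

def split_into_clips(frames: torch.Tensor, frame_length: int):
    """Split a (T, C, H, W) video tensor into clips of at most frame_length frames.
    Each clip is restored independently, so peak activation memory scales with
    frame_length instead of the full video length T."""
    return [frames[i:i + frame_length] for i in range(0, frames.shape[0], frame_length)]

def decode_in_chunks(latents: torch.Tensor, decode_fn, chunk_size: int):
    """Decode latents chunk_size frames at a time instead of all at once."""
    outputs = [decode_fn(latents[i:i + chunk_size])
               for i in range(0, latents.shape[0], chunk_size)]
    return torch.cat(outputs, dim=0)
```

A smaller frame_length lowers memory but gives the model less temporal context, while a smaller chunk_size mainly affects decoding throughput (assuming it only controls how many frames are decoded at once).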

@cxzhou35

cxzhou35 commented Jan 10, 2025

> Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size. You can set the frame length to 12, which should work within 24GB of VRAM.

Hi @CSRuiXie,
I am running the toy example on a 4090 GPU (24 GB memory) with the settings below:

  1. frame_length = 12
  2. upscale = 4
  3. chunk_size = 2

I still get an OOM error. How can I fix that?
Another question: are there any limitations on the input video resolution?
For example, my input video is 1920x1080. Thanks in advance 🙏

@CSRuiXie (Collaborator)

> Hi @CSRuiXie, I am running the toy example on a 4090 GPU (24 GB memory) with frame_length = 12, upscale = 4, and chunk_size = 2. I still get an OOM error. How can I fix that? Another question: are there any limitations on the input video resolution? For example, my input video is 1920x1080.

Hi, I believe the main issue is that your input video resolution is too large for 4x upscaling. For example, with the default settings, upscaling a 640x480 video by 4x can require more than 80GB of VRAM.
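
A rough back-of-the-envelope comparison makes the point (memory grows with the number of output pixels, so the ratio is only indicative):

```python
# Output size at 4x upscaling for the two resolutions discussed above.
for w, h in [(640, 480), (1920, 1080)]:
    out_w, out_h = 4 * w, 4 * h
    print(f"{w}x{h} -> {out_w}x{out_h} ({out_w * out_h / 1e6:.1f} MP per output frame)")
# 640x480   -> 2560x1920 (4.9 MP per output frame)
# 1920x1080 -> 7680x4320 (33.2 MP per output frame), about 6.75x more pixels
```

So if the 640x480 case already peaks above 80 GB, a 1080p input at 4x is far out of reach on a single GPU at these settings; reducing the upscale factor or the input resolution is the practical workaround.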

@FurkanGozukara

The numbers are huge. Is there any way to quantize, tile, or slice to reduce VRAM?

@CSRuiXie (Collaborator)

> The numbers are huge. Is there any way to quantize, tile, or slice to reduce VRAM?

Yes, we are aware of the VRAM issue, and we definitely plan to introduce some techniques to optimize it, such as tiling. In the meantime, you can follow the instructions above to reduce VRAM usage.
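
For what it's worth, the general idea behind spatial tiling looks roughly like the sketch below (purely illustrative, not part of the current codebase; `model_upscale`, the tile size, and the overlap are all assumptions):

```python
import torch

def tiled_upscale(frame: torch.Tensor, model_upscale, scale: int = 4,
                  tile: int = 256, overlap: int = 32) -> torch.Tensor:
    """Upscale one (C, H, W) frame by running the model on overlapping tiles
    and averaging the overlaps, so peak memory depends on the tile size, not H x W."""
    c, h, w = frame.shape
    out = torch.zeros(c, h * scale, w * scale)
    weight = torch.zeros(1, h * scale, w * scale)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y0 = min(y, max(h - tile, 0))          # clamp so the tile fits inside the frame
            x0 = min(x, max(w - tile, 0))
            up = model_upscale(frame[:, y0:y0 + tile, x0:x0 + tile])  # hypothetical model call
            ys, xs = y0 * scale, x0 * scale
            out[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += up
            weight[:, ys:ys + up.shape[1], xs:xs + up.shape[2]] += 1
    return out / weight.clamp(min=1)               # average wherever tiles overlap
```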

@FurkanGozukara

@CSRuiXie can you modify the app here and add these two options?

https://huggingface.co/spaces/SherryX/STAR/blob/main/app.py

ty so much

@nitinmukesh

@CSRuiXie

Thank you for sharing your work with us.
It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

@FurkanGozukara

> @CSRuiXie
>
> Thank you for sharing your work with us. It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

Yes, this is needed so that longer videos can be processed.

@nitinmukesh

nitinmukesh commented Jan 11, 2025

I also did a quick test using the sample video (023_klingai_reedit.mp4) on 8 GB VRAM + 8 GB shared. Earlier I was getting OOM on the first step, but after making a few changes it started to work, though very slowly:

  • 77 frames
  • 426 x 248
  • 2x upscale
  • frame length 8

  • With a few changes, the memory consumption is 11 GB
  • Each step takes 30 minutes, so 50 steps * 30 = 1500 minutes = 25 hours
  • I killed it after 5 steps.

I guess frame-wise processing should make it work on at least 16/24 GB VRAM. Any dirty/quick code to try frame-wise processing?

@CSRuiXie (Collaborator)

> @CSRuiXie can you modify the app here and add these two options?
>
> https://huggingface.co/spaces/SherryX/STAR/blob/main/app.py
>
> ty so much

We have now added these two options to the STAR demo. You can visit the Hugging Face demo to check the updates.

@CSRuiXie (Collaborator)

> @CSRuiXie
>
> Thank you for sharing your work with us. It seems all the frames are processed and kept in memory. Is there a way to process one frame at a time and dump it to the hard disk? I am not sure if this can be implemented, but it would help a lot of users and let us run it on consumer GPUs.

Thank you for your interest in our work. You can set the frame_length to 1 for frame-wise processing. However, the restored results may be worse than the default setting, mainly due to two reasons: (1) the frame length during inference differs significantly from that during training, and (2) the model cannot extract temporal information from other frames.
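
To connect this with the earlier request about dumping frames to disk, a rough sketch of streaming, chunk-by-chunk processing could look like the following (illustrative only; `restore_chunk` is a placeholder for the model call, and imageio is just one convenient way to read/write frames):

```python
import os
import numpy as np
import imageio.v3 as iio

def restore_video_streaming(in_path, out_dir, restore_chunk, frame_length=1):
    """Read frames lazily, restore frame_length frames at a time, and write each
    restored frame straight to disk so nothing accumulates in memory."""
    os.makedirs(out_dir, exist_ok=True)
    buffer, written = [], 0

    def flush():
        nonlocal written
        for out_frame in restore_chunk(np.stack(buffer)):   # placeholder model call
            iio.imwrite(os.path.join(out_dir, f"{written:06d}.png"), out_frame)
            written += 1
        buffer.clear()

    for frame in iio.imiter(in_path):        # yields video frames one by one
        buffer.append(frame)
        if len(buffer) == frame_length:
            flush()
    if buffer:                               # flush any trailing frames
        flush()
```

As noted above, frame_length = 1 removes temporal context, so a small chunk (for example the 8 or 12 frames mentioned elsewhere in this thread) is usually a better trade-off when it fits in memory.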

@somenewaccountthen

Please put the minimum VRAM in the install instructions.
Even 1 frame doesn't work on my card (11 GB).
It would save a lot of people a lot of time.

@CSRuiXie (Collaborator)

> Please put the minimum VRAM in the install instructions. Even 1 frame doesn't work on my card (11 GB). It would save a lot of people a lot of time.

Thanks for your advice! We will add more details about the VRAM requirements in the installation instructions.

@Falkonar

Falkonar commented Feb 18, 2025

> Thank you for your interest in our work! Regarding the VRAM requirements, with the default settings and the toy example we provided, the GPU peak memory usage is approximately 39GB. Currently, there are two ways to reduce the VRAM requirements: (1) decrease the frame_length; and (2) decrease the chunk_size. You can set the frame length to 12, which should work within 24GB of VRAM.

Thank you for the detailed explanation of VRAM management. I have a follow-up question: would your script work with two RTX 3090 GPUs using DistributedDataParallel (DDP), or with model parallelism?
