
Why isn’t VRAM being released after training LoRA? #9876

Open
hjw-0909 opened this issue Nov 6, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@hjw-0909

hjw-0909 commented Nov 6, 2024

Describe the bug

When I use train_dreambooth_lora_sdxl.py, the VRAM is not released after training. How can I fix this?

Reproduction

Not used.

Logs

No response

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.2.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.2
  • Transformers version: 4.45.2
  • Accelerate version: 1.0.1
  • PEFT version: 0.13.2
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA H800, 81559 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

@hjw-0909 hjw-0909 added the bug Something isn't working label Nov 6, 2024
@SahilCarterr
Contributor

I think you should try manually flushing the GPU memory.
To see the PID of the process: sudo fuser -v /dev/nvidia*
Then kill the PID that you no longer need: sudo kill -9 PID

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@SahilCarterr I mean that after training, I want to perform other tasks without ending the entire Python script. In theory, VRAM should be released once train_lora.py completes the training, but it isn’t being freed.

@charchit7
Contributor

As @SahilCarterr mentioned, your process might be stalled.
Or try freeing the GPU memory in your code after the training loop completes (see the sketch below).
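
A minimal, self-contained sketch of that idea, using a dummy tensor as a stand-in for the trained weights (in the real script the del would target whatever variables hold the UNet, text encoders, optimizer, pipeline, and so on):

import gc
import torch

# Dummy ~1 GiB allocation standing in for the trained weights / optimizer state.
big = torch.zeros(1024, 1024, 256, device="cuda")
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved by the caching allocator")

del big                   # in the real script: del unet, text_encoder_one, optimizer, ...
gc.collect()              # break any lingering reference cycles
torch.cuda.empty_cache()  # hand the now-unused cached blocks back to the driver
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved after cleanup")

The key point is the order: empty_cache() can only return memory that no Python object references any more, so the del and gc.collect() have to come first.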

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@charchit7 I have added torch.cuda.empty_cache() after training, but it did not work.

@sayakpaul
Member

Can you share a snapshot of the memory usage to confirm that the memory is actually not being released?

@hjw-0909
Author

hjw-0909 commented Nov 8, 2024

@sayakpaul I ensure that the memory is properly released at the end of the .py script. However, I have noticed that after training with LoRA, the memory isn't fully released.

@sayakpaul
Member

I ensure that the memory is properly released at the end of the .py script.

I don't understand what this means. Could you explain further?

@elismasilva
Contributor

Hi @hjw-0909, this usually happens to me when I'm doing inference. You can try wrapping all inference calls inside the script in with torch.no_grad():, and also try putting it before the start of the training loop (I've never tested that, so I don't know whether it can affect the final result). At the end of the process, call torch.cuda.empty_cache(). A rough sketch follows.
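
A rough sketch of that suggestion for the post-training inference part (the SDXL checkpoint, prompt, and step count below are placeholders, not taken from this issue):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Run inference without building an autograd graph, so activations
# are not kept alive in VRAM.
with torch.no_grad():
    image = pipe("a photo of sks dog", num_inference_steps=25).images[0]

# Release the cached allocator blocks once inference is done.
torch.cuda.empty_cache()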

@asomoza
Member

asomoza commented Nov 15, 2024

The training script is meant for training only; it was never intended to be part of something bigger or to keep doing inference after it finishes.

torch.cuda.empty_cache() probably isn't freeing the VRAM because you also have to delete the pipelines and any remaining references to the models the script holds.

Usually what I do is create a separate thread for training and then delete the whole thing afterwards, so nothing keeps a reference to the training objects.
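
One way to get that kind of isolation, sketched here with a child process instead of a thread (CUDA memory owned by a child process is returned to the driver when it exits; train_fn and its arguments are placeholders, not part of the diffusers scripts):

import torch.multiprocessing as mp

def train_fn(config):
    # Placeholder: run the LoRA training here, e.g. by calling the
    # training script's main() with the parsed arguments.
    ...

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # "spawn" is needed when the child uses CUDA
    p = ctx.Process(target=train_fn, args=({"max_train_steps": 500},))
    p.start()
    p.join()  # once the child exits, all of its VRAM is handed back to the driver

    # The parent now sees a clean GPU and can load a pipeline for inference.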

@sayakpaul
Member

torch.no_grad() may not be needed as we also keep our pipeline calls decorated under torch.no_grad().

@elismasilva
Contributor

torch.no_grad() may not be needed as we also keep our pipeline calls decorated under torch.no_grad().

You are right, the pipeline already has it. However, in my local tests I usually call encode_prompt and prepare_ip_adapter_image_embeds before the pipeline call, and I wrap them in with torch.no_grad(): to avoid this problem. I don't know whether this happens in all environments, though.
In the training script I noticed that line 1713 does not have this instruction before

prompt_embeds, pooled_prompt_embeds = encode_prompt(

unlike another line in the script that does, so maybe this is the cause of the problem.

It might be worth testing whether this fixes it.
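
If that is the cause, the change would be along these lines (a sketch only; the exact arguments of encode_prompt at that point in the script are assumptions):

# Before: the embeddings are computed with autograd enabled, keeping activations alive in VRAM.
# prompt_embeds, pooled_prompt_embeds = encode_prompt(text_encoders, tokenizers, prompt)

# After: compute them without tracking gradients.
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds = encode_prompt(text_encoders, tokenizers, prompt)

Note that if the text encoders are being trained at that point, the call does need gradients, which may be why it isn't wrapped.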
