
Why isn’t VRAM being released after training LoRA? #9876

Open
hjw-0909 opened this issue Nov 6, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@hjw-0909

hjw-0909 commented Nov 6, 2024

Describe the bug

When I use train_dreambooth_lora_sdxl.py, the VRAM is not released after training. How can I fix this?

Reproduction

Not used.

Logs

No response

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.17
  • Running on Google Colab?: No
  • Python version: 3.8.20
  • PyTorch version (GPU?): 2.2.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.25.2
  • Transformers version: 4.45.2
  • Accelerate version: 1.0.1
  • PEFT version: 0.13.2
  • Bitsandbytes version: 0.44.1
  • Safetensors version: 0.4.5
  • xFormers version: not installed
  • Accelerator: NVIDIA H800, 81559 MiB
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

@hjw-0909 hjw-0909 added the bug Something isn't working label Nov 6, 2024
@SahilCarterr
Contributor

I think you should try manually flushing the GPU memory.
To see the PID of the process: sudo fuser -v /dev/nvidia*
Then kill the PID that you no longer need: sudo kill -9 PID

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@SahilCarterr I mean that after training, I want to perform other tasks without ending the entire Python script. In theory, VRAM should be released once train_lora.py completes the training, but it isn’t being freed.

@charchit7
Contributor

As @SahilCarterr mentioned, your process might be stalled.
Or try freeing the GPU memory in your code after the training loop completes (see the sketch below).
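
A minimal, self-contained sketch of that idea, using a dummy tensor as a stand-in for the trained weights (in the real script the del would target whatever variables hold the UNet, text encoders, optimizer, pipeline, and so on):

import gc
import torch

# Dummy ~1 GiB allocation standing in for the trained weights / optimizer state.
big = torch.zeros(1024, 1024, 256, device="cuda")
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved by the caching allocator")

del big                   # in the real script: del unet, text_encoder_one, optimizer, ...
gc.collect()              # break any lingering reference cycles
torch.cuda.empty_cache()  # hand the now-unused cached blocks back to the driver
print(torch.cuda.memory_reserved() // 2**20, "MiB reserved after cleanup")

The key point is the order: empty_cache() can only return memory that no Python object references any more, so the del and gc.collect() have to come first.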

@hjw-0909
Author

hjw-0909 commented Nov 6, 2024

@charchit7 I have added torch.cuda.empty_cache() after training, but it did not work.

@sayakpaul
Member

Can you share a snapshot of the memory usage to confirm that the memory is actually not being released?

@hjw-0909
Author

hjw-0909 commented Nov 8, 2024

@sayakpaul I ensure that the memory is properly released at the end of the .py script. However, I have noticed that after training with LoRA, the memory isn't fully released.

@sayakpaul
Member

I ensure that the memory is properly released at the end of the .py script.

I don't understand what this means. Could you explain further?

@elismasilva
Contributor

Hi @hjw-0909, this usually happens to me when I'm doing inference. You can try wrapping all inference calls inside the script in with torch.no_grad():, and also try putting it before the start of the training loop (I've never tested that, so I don't know whether it can affect the final result). At the end of the process, call torch.cuda.empty_cache(). A rough sketch follows.
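
A rough sketch of that suggestion for the post-training inference part (the SDXL checkpoint, prompt, and step count below are placeholders, not taken from this issue):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Run inference without building an autograd graph, so activations
# are not kept alive in VRAM.
with torch.no_grad():
    image = pipe("a photo of sks dog", num_inference_steps=25).images[0]

# Release the cached allocator blocks once inference is done.
torch.cuda.empty_cache()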

@asomoza
Member

asomoza commented Nov 15, 2024

The training script is meant for training only; it was never intended to be part of something bigger or to keep doing inference after it finishes.

torch.cuda.empty_cache() probably isn't freeing the VRAM because you also have to delete the pipelines and any remaining references to the models the script holds.

Usually what I do is create a separate thread for training and then delete the whole thing afterwards, so nothing keeps a reference to the training objects.
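
One way to get that kind of isolation, sketched here with a child process instead of a thread (CUDA memory owned by a child process is returned to the driver when it exits; train_fn and its arguments are placeholders, not part of the diffusers scripts):

import torch.multiprocessing as mp

def train_fn(config):
    # Placeholder: run the LoRA training here, e.g. by calling the
    # training script's main() with the parsed arguments.
    ...

if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # "spawn" is needed when the child uses CUDA
    p = ctx.Process(target=train_fn, args=({"max_train_steps": 500},))
    p.start()
    p.join()  # once the child exits, all of its VRAM is handed back to the driver

    # The parent now sees a clean GPU and can load a pipeline for inference.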

@sayakpaul
Member

torch.no_grad() may not be needed as we also keep our pipeline calls decorated under torch.no_grad().

@elismasilva
Contributor

torch.no_grad() may not be needed as we also keep our pipeline calls decorated under torch.no_grad().

You are right, the pipeline already has it. However, in my local tests I usually call encode_prompt and prepare_ip_adapter_image_embeds before the pipeline call, and I wrap them in with torch.no_grad(): to avoid this problem. I don't know whether this happens in all environments, though.
In the training script I noticed that line 1713 does not have this instruction before

prompt_embeds, pooled_prompt_embeds = encode_prompt(

unlike another line in the script that does, so maybe this is the cause of the problem.

It might be worth testing whether this fixes it.
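
If that is the cause, the change would be along these lines (a sketch only; the exact arguments of encode_prompt at that point in the script are assumptions):

# Before: the embeddings are computed with autograd enabled, keeping activations alive in VRAM.
# prompt_embeds, pooled_prompt_embeds = encode_prompt(text_encoders, tokenizers, prompt)

# After: compute them without tracking gradients.
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds = encode_prompt(text_encoders, tokenizers, prompt)

Note that if the text encoders are being trained at that point, the call does need gradients, which may be why it isn't wrapped.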
