Why isn’t VRAM being released after training LoRA? #9876
Comments
I think you should try manually flushing the GPU memory.
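A minimal sketch of what a manual flush could look like, assuming PyTorch is the framework in use. The helper name `flush_gpu_memory` is hypothetical, and the import is guarded only so the snippet also loads on a machine without PyTorch. Note that `torch.cuda.empty_cache()` can only return cached blocks that no live tensor still occupies, so references have to be dropped first.

```python
import gc


def flush_gpu_memory():
    """Collect unreachable Python objects, then release PyTorch's cached CUDA blocks.

    Returns the number of unreachable objects found by the garbage collector.
    """
    # Break reference cycles first so tensors held only by garbage are freed.
    collected = gc.collect()
    try:
        import torch  # assumption: PyTorch is installed in the training environment
    except ImportError:
        return collected
    if torch.cuda.is_available():
        # Hands unused cached blocks back to the driver; tensors that are
        # still referenced somewhere are not affected.
        torch.cuda.empty_cache()
    return collected
```

Calling `flush_gpu_memory()` right after training only helps if the model, optimizer, and pipeline variables have already been deleted or gone out of scope.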
@SahilCarterr I mean that after training, I want to perform other tasks without ending the entire Python script. In theory, VRAM should be released once |
As @SahilCarterr mentioned, your process might be stalled.
@charchit7 I have added |
Can you share a snapshot of the memory usage to confirm that the memory is actually not being released?
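One possible way to take such a snapshot from inside the script, sketched under the assumption that PyTorch with CUDA is in use. The helper name `cuda_memory_report` is hypothetical. It reads `torch.cuda.memory_allocated` (memory occupied by live tensors) and `torch.cuda.memory_reserved` (memory held by the caching allocator), which helps tell "tensors are still referenced" apart from "the cache has not been emptied yet".

```python
def cuda_memory_report(device=0):
    """Return allocated and reserved CUDA memory in MiB, or None without CUDA."""
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    mib = 1024 ** 2
    return {
        # Memory currently occupied by live tensors on this device.
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # Memory the caching allocator is holding, including free cached blocks.
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }
```

Printing the report once before training, once after training, and once after cleanup gives three numbers to compare against what nvidia-smi shows for the process.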
@sayakpaul I ensure that the memory is properly released at the end of the |
I don't understand what this means. Could you explain further?
Hi @hjw-0909, this usually happens to me when I'm running inference. So you can try adding |
The training script is meant only for training; it was never intended to be part of something bigger or to keep running inference afterwards. The CUDA empty-cache call (torch.cuda.empty_cache()) probably isn't freeing the VRAM because you also have to delete the pipelines and every reference to the models that the script holds. What I usually do is run training in a separate thread and then delete the whole thread, so that nothing keeps a reference to the training objects.
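A related way to get the same isolation, sketched here with a child process instead of the thread described above: when a process exits, the driver releases all GPU memory that process held, regardless of which Python references were left behind. The helper name `run_isolated` is hypothetical, and this is a swapped-in technique, not what the comment's author does.

```python
import subprocess
import sys


def run_isolated(argv):
    """Run a command in a child process and return its exit code.

    Any GPU memory the child allocated is reclaimed when it exits, so the
    parent script can continue with other work afterwards.
    """
    completed = subprocess.run(argv, check=False)
    return completed.returncode


# Hypothetical usage, where training_args is a list of CLI arguments you supply:
# run_isolated([sys.executable, "train_dreambooth_lora_sdxl.py", *training_args])
```

The trade-off is that results have to be passed back through files (for example the saved LoRA weights) rather than through in-memory objects.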
You are right, the pipeline already has it. However, in my local tests I usually call encode_prompt and prepare_ip_adapter_image_embeds before the pipeline call, and I wrap them in with torch.no_grad() to avoid this problem. I don't know whether this happens in all environments.
unlike the line that does, maybe this is the cause of the problem.
Maybe it's worth testing to see if this works.
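To illustrate why wrapping such calls in torch.no_grad() matters, here is a small sketch using a stand-in `torch.nn.Linear` layer as the encoder. It does not call the real encode_prompt or prepare_ip_adapter_image_embeds; the names `embed_without_grad` and `demo` are hypothetical, and the import is guarded only so the snippet loads without PyTorch. Outputs computed with gradient tracking keep the autograd graph and its intermediate activations alive, which is memory that stays allocated for as long as the output is referenced.

```python
try:
    import torch  # assumption: PyTorch is installed
except ImportError:
    torch = None


def embed_without_grad(encoder, inputs):
    """Run an encoder without recording an autograd graph."""
    with torch.no_grad():
        return encoder(inputs)


def demo():
    """Return (tracked, untracked) requires_grad flags, or None without PyTorch."""
    if torch is None:
        return None
    encoder = torch.nn.Linear(4, 2)  # stand-in for a real text or image encoder
    inputs = torch.randn(1, 4)
    tracked = encoder(inputs)  # keeps a graph because the weights require grad
    untracked = embed_without_grad(encoder, inputs)  # no graph is retained
    return tracked.requires_grad, untracked.requires_grad
```

If embeddings are computed outside the pipeline call, checking their `requires_grad` attribute is a quick way to see whether they are holding on to a graph.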
Describe the bug
When I use train_dreambooth_lora_sdxl.py, the VRAM is not released after training. How can I fix this?
Reproduction
Not used.
Logs
No response
System Info
Who can help?
No response