resource tracker and train process kill #172

Open
Feng204825 opened this issue Jan 15, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@Feng204825

Describe the bug
While fine-tuning the model, a resource leak is reported by the resource tracker and the training process is killed. An initial check shows that system resources are sufficient. I would appreciate your help in determining the cause.

To Reproduce
Please provide a code snippet of a minimal reproducible example for the error.

python -m cli.train -cp conf/finetune run_name=example_run model=moirai_1.1_R_small data=etth1 val_data=etth1  

Expected behavior
Fine-tuning should run to completion without a resource leak being reported and without the training process being terminated. Instead, the process is killed even though an initial check shows that resources are sufficient.
[Screenshot attached: Pasted Graphic 3]

Environment

  • Operating system: Ubuntu 20.04.6 LTS
  • Python version: Python 3.10.16
  • PyTorch version: torch 2.4.1
  • uni2ts version: 1.2.0
@liuxu77
Contributor

liuxu77 commented Jan 15, 2025

Hi @Feng204825, thanks for asking this question. I don't think this is a problem. My guess is that you terminated the previous command and started a new one immediately, so some resources had not been fully released. To prevent this, after you stop the program, wait for all resources to be released before starting the next run. Hope this helps.
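
For anyone who launches runs from a script, the "wait before restarting" advice can be sketched roughly as below. This is only an illustration using the Python standard library, and it assumes the previous run was started as a child process via `subprocess.Popen`. `stop_and_wait` is a hypothetical helper name, not part of uni2ts, and it only confirms that the process itself has exited, not that every resource it held has been reclaimed.

```python
import subprocess


def stop_and_wait(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
    """Ask a child process to stop, then block until it has actually exited.

    Hypothetical helper for illustration; not part of uni2ts.
    """
    # Request a graceful shutdown (SIGTERM on POSIX).
    proc.terminate()
    try:
        # Block until the process exits or the grace period runs out.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored the request, so force it to stop
        # and wait until the operating system has reaped it.
        proc.kill()
        return proc.wait()
```

A launcher would call `stop_and_wait(previous_run)` and start the next training command only after it returns. If the previous run was stopped by hand in a terminal instead, the equivalent is to check that no leftover training processes remain before rerunning the command.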
