Hello,
I apologize for disturbing you! Thank you for your open-source project ReST, which has been incredibly helpful for my studies. During my learning process, I ran into the following issue:
Based on the original code, I only modified the dataset-related parameters in the Wildtrack.yml file. When training the SG model, the first and second runs produced correct results. However, starting from the third run, the model no longer converged: once avg_train_loss reached 0.0045, the loss stopped decreasing. All subsequent runs failed in the same way as the third.
I printed the model's back-propagation gradients during each training run and saved them in the training logs (roughly as sketched below). The gradient values shrink rapidly, and I suspect this is the cause of the issue.
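For reference, this is roughly how the gradient norms were recorded in the logs (a minimal sketch assuming a standard PyTorch training loop; the function name and print format are mine, not taken from the ReST code):

```python
import torch

def log_grad_norms(model: torch.nn.Module) -> None:
    """Print the L2 norm of each parameter's gradient.

    Call this after loss.backward() and before optimizer.step().
    """
    total_sq = 0.0
    for name, param in model.named_parameters():
        if param.grad is not None:
            norm = param.grad.detach().norm(2).item()
            total_sq += norm ** 2
            print(f"grad_norm/{name}: {norm:.6e}")
    print(f"grad_norm/total: {total_sq ** 0.5:.6e}")

# Usage inside the training loop:
#   loss.backward()
#   log_grad_norms(model)   # the norms shrink rapidly in the failing runs
#   optimizer.step()
```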
Notes:
The code was not modified between these training runs; it was exactly the same, and the intervals between runs were very short.
In the attachments, log_1 and log_2 are the correct training logs; log_3 and log_4 are the incorrect training logs; log_5 and log_6 are the incorrect training logs with the training gradients saved.
I am not sure how to resolve this issue and hope you can guide me on how to fix it. Looking forward to your reply, thank you very much.
Attachments: log_1.txt, log_2.txt, log_3.txt, log_4.txt, log_5.txt, log_6.txt
Hello HyCfI,
Were you able to solve it? I have the same problem; could you help me?