Training runs for a while (random time), then just stops with errors. Second try to get some answers #3028

Open
Erlandsson opened this issue Dec 31, 2024 · 1 comment

Comments

@Erlandsson

Erlandsson commented Dec 31, 2024

I have so many problems. I thought I would set up a new system, this time with Windows 11: completely clean, with just Firefox installed.

i9, 64 GB of RAM, RTX 4080 Super with the latest drivers.

Installed all requirements as per the instructions.
Git-cloned a fresh kohya and set it up the same way as usual.

It installs with no errors, and the browser window starts. I set up training the way I did many times when it did work on my Windows 10 system (I successfully trained several LoRAs, from 15 minutes to 10 hours, with no problems).

I use standard constant training with AdamW or Adafactor, no strange settings. I tried 5 repeats, 10, 20, etc.
3 epochs, 5, 10, etc. No difference.
All these new clean installs train for a while, sometimes 5 minutes, sometimes 2 hours, but ultimately fail with the errors in the attached picture.
CUDA utilization is at 90-100% while training.

Deleting the env and restarting does not help.

I was training while writing this, and it stopped. Same training, but it got a little further this time: 38%.
(Attached image: kohya error)

@Erlandsson
Author

Problem is solved. Everything works now (Automatic1111, Forge UI, Kohya).

I found out it was Intel CPU boost! I noticed that when running Kohya, Automatic1111, etc., the CPU frequency spiked to around 5.5-5.8 GHz (i9-14900K, 3.2 GHz base), and I began to wonder whether files being read back and forth were somehow getting corrupted at that speed.
No other program I run shows any problems, Blender for example. On the other hand, it does not access the drives that much.

So I tried turning off Intel Turbo Boost in the BIOS. That helped right away. No problems at all installing Kohya, no more failed files, no more failed downloads, etc. I have now run several 4-5 hour training runs without a hiccup.
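
For anyone who wants to test the "files get corrupted while being read" suspicion on their own machine, here is a minimal sketch (not something from the original report): it hashes the same file several times and flags any disagreement between rounds. The function names `sha256_of` and `hashes_are_stable` are made up for this example. Keep in mind that repeated reads are usually served from the OS file cache, so this mostly exercises the CPU and memory path rather than the drive itself, and a clean result does not prove the system is stable.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large model files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_are_stable(path, rounds=5):
    """Return True if every round yields the same digest for the same file."""
    digests = {sha256_of(path) for _ in range(rounds)}
    return len(digests) == 1


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_hash.py model.safetensors
    for name in sys.argv[1:]:
        status = "stable" if hashes_are_stable(name) else "MISMATCH"
        print(f"{name}: {status}")
```

If a large file such as a model checkpoint ever reports MISMATCH with boost enabled and never with it disabled, that would point in the same direction as the result above.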

Now I just need to see if I can get Windows' own CPU limiter to work. Maybe I can work my way up to a higher clock that is still stable.
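
One possible way to drive the Windows limiter from a script is the "Maximum processor state" power setting, which `powercfg` exposes under the aliases `SUB_PROCESSOR` and `PROCTHROTTLEMAX`. The sketch below is an assumption about how one might do it, not a tested fix for this issue; the helper names are invented for the example, and it only covers the plugged-in (AC) setting. The percentage maps only roughly to a clock speed, so finding a stable value would still take trial and error.

```python
import platform
import subprocess


def max_cpu_state_commands(percent):
    """Build the powercfg calls that cap 'Maximum processor state' on AC power."""
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return [
        # Set the cap on the currently active power scheme.
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMAX", str(percent)],
        # Re-activate the scheme so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


def apply_max_cpu_state(percent):
    """Run the commands; only meaningful on Windows."""
    if platform.system() != "Windows":
        raise RuntimeError("powercfg is only available on Windows")
    for cmd in max_cpu_state_commands(percent):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in max_cpu_state_commands(99):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to check them before calling `apply_max_cpu_state`, and setting the value back to 100 restores the default.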
