Warmup steps #272

Open
mikehdt opened this issue Jan 16, 2025 · 10 comments

Comments

@mikehdt

mikehdt commented Jan 16, 2025

Just learning some lora training, and I'm trying to use the LR scheduler constant with warmup, with a Warmup Ratio set. However, I'm not sure the field is doing anything, or if I've used it correctly.

I'm guessing the value is 0 -> 1 where eg. 0.05 would be 5% of your steps used as warmup.

First, would this not make more sense as a field of 0 -> 100% instead of a fractional value?

Second, how can I validate that the correct number of warmup steps have been set when training starts? I don't see any value for how many warmup steps it should be using in the logs, hence why I'm not sure whether it worked or not.

Thanks!

@Jelosus2
Collaborator

When you train you should be able to see the total warm up steps in the settings of the optimizer

@mikehdt
Author

mikehdt commented Jan 19, 2025

To be clear, yes, I can see the warmup steps in the UI, albeit as a fractional number which as noted I'm assuming is meant to be between 0 -> 1. However, when I start training, there's no indication in the logs as to how many steps it's calculated for warmup.

I'm comparing this to using Kohya's GUI, which does output in the log window the calculated number of steps.

To be clear I know that it's just logging info, but without seeing what it's calculated, I'm not sure if what I've chosen is right.

@Jelosus2
Collaborator

Jelosus2 commented Jan 19, 2025

No, like, literally in the scheduler parameters

Image

@uwidev

uwidev commented Jan 19, 2025

I made some modifications on my end that allow float values to be used. Unfortunately, this isn't as simple as modifying the scheduler, because the max steps aren't passed at all on sd-scripts' end. I got around this by adding these lines of code to sd-scripts' library/train_util.py.

https://github.com/uwidev/sd-scripts/blob/491c32c57c406e12c548e2e267c1d0592ef6b6e1/library/train_util.py#L4429-L4438

This parses the args so that a float value is treated as a fraction of the max steps. You use the same variables.

e.g.
I set training for 100 steps.
first_cycle_max_steps = 1.0 OR 0 -> first_cycle_max_steps = 100 (must be 1.0, not 1)
warmup_steps = 0.1 -> warmup_steps = 10
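
The conversion in the example above can be sketched roughly as follows. This is not the code at the linked commit: `resolve_step_arg` is a hypothetical helper, rounding to the nearest step is an assumption, and the special case where 0 means "all steps" is deliberately left out.

```python
def resolve_step_arg(value, max_train_steps):
    """Hypothetical helper: turn a scheduler argument into a step count.

    A float is read as a fraction of max_train_steps; an int is taken
    as a literal number of steps.
    """
    if isinstance(value, bool):
        raise TypeError("expected an int or a float, got a bool")
    if isinstance(value, float):
        if not 0.0 <= value <= 1.0:
            raise ValueError("a float value must lie between 0.0 and 1.0")
        # Assumption: round to the nearest whole step.
        return int(round(value * max_train_steps))
    return int(value)


print(resolve_step_arg(0.1, 100))  # fraction of the run
print(resolve_step_arg(1.0, 100))  # the whole run
print(resolve_step_arg(1, 100))    # a single literal step
```

This is also why the value has to be written as 1.0 rather than 1: the type of the value, not its magnitude, decides how it is interpreted.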

This specific code probably won't be added to sd-scripts, but a more generic option would be for sd-scripts to at least pass num_training_steps.

@uwidev

uwidev commented Jan 19, 2025

Just made a pull request that would allow custom schedulers, like the implemented REX, to utilize pre-calculated variables related to the scheduler. I've adjusted REX on my end to utilize those values, but until the pull request is merged, use what I posted in my previous post.

kohya-ss/sd-scripts#1883

@mikehdt
Author

mikehdt commented Jan 19, 2025

No, like, literally in the scheduler parameters

Hmm. I don't see that, at least not with AdamW. Here's how I've set the optimizer, just with 30% as a number to test:

Image

And what I get in the logs:

Image

Interestingly I went back to KSS GUI just to compare (although I've found the generation is inexplicably slower with it, probably a setting that is set in it but not in Lora ETS, I'm not experienced with training and the parameters enough to determine though). This is what's in the log. It does show calculated steps:

Image

But oddly once it hits the optimizer it doesn't set it at all?:

Image

Very confusing. Again, I'm not an expert with this, maybe I've set something incorrectly?

@uwidev

uwidev commented Jan 19, 2025

My bad. I read this wrong and assumed it was about the custom schedulers. If you use the normal schedulers, a float (decimal) is treated as a fraction of the max steps (e.g. 0.3 is 30%), while an integer (whole number) gives exactly that many warmup steps.

If the warmup was displayed as a percentage, users would be locked to using a percentage rather than a static number of warmup steps. Of course, you could have a way to change the units from percent to whole number, but having a single value makes it subjectively neater on the backend.

As for validation, you will have to log the training. Make sure you set up the logging directory, then install tensorboard through Python with
python -m pip install tensorboard

And then run tensorboard with
tensorboard --logdir <your logging directory>

It should give you a link to see the training details. There you can verify that the warmup steps are correct. You need to actually run the training and wait until it reaches the end of the intended warmup; only then will you know whether it's correct.

Image
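
When reading the learning-rate curve in TensorBoard, it helps to know what shape to expect. The sketch below assumes the scheduler ramps linearly from 0 to the base learning rate over the warmup steps and then holds it constant, which is how a constant-with-warmup schedule is commonly defined. `expected_lr` is a hypothetical helper and the numbers are made up for illustration.

```python
def expected_lr(step, base_lr, warmup_steps):
    """Expected LR at `step`: linear warmup, then a constant rate."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr


# Example: base LR of 1e-4 with 30 warmup steps (illustrative values only).
for step in (0, 15, 30, 60):
    print(step, expected_lr(step, 1e-4, 30))
```

If the logged curve flattens out at a different step than the one you computed, the warmup setting was not interpreted the way you intended.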

@uwidev

uwidev commented Jan 20, 2025

What @Jelosus2 posted is specifically for custom schedulers. They need a specific parameter, warmup_steps, in the optional args and do not use Warmup Ratio, which is used for built-in schedulers. constant with warmup, which is what you, @mikehdt, are using, uses Warmup Ratio, and it is not displayed on the console as far as I am aware.

@Jelosus2
Collaborator

Jelosus2 commented Jan 20, 2025

You are right, but regardless, the warmup steps are calculated in the backend and then passed as an argument to the scripts. To know how many warmup steps you are using, you can use this formula: (steps * batch size) * warmup percentage. Keep in mind that gradient accumulation counts toward the batch size, so the total batch size is gradient accumulation * batch size, and steps is the value you see in the console while training.
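
As a quick way to run the numbers, the formula above can be written out as follows. This simply transcribes the formula as stated in this comment and has not been checked against the backend; `estimate_warmup_steps` and its parameter names are hypothetical, and rounding to a whole step is an assumption.

```python
def estimate_warmup_steps(console_steps, batch_size, grad_accum, warmup_ratio):
    """Transcription of: (steps * total batch size) * warmup percentage."""
    # Gradient accumulation is counted as part of the batch size.
    total_batch_size = batch_size * grad_accum
    # Assumption: the result is rounded to a whole number of steps.
    return round(console_steps * total_batch_size * warmup_ratio)


# Illustrative values: 100 console steps, batch size 2, no accumulation, 30% warmup.
print(estimate_warmup_steps(100, 2, 1, 0.3))
```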

@derrian-distro
Owner

derrian-distro commented Jan 25, 2025

it appears that kohya doesn't display the warmup steps if provided an integer value. I have not tested how it displays in the event of providing it a ratio, but it seems like kohya supports that naturally now.

edit: it appears that nothing is actually displayed regardless. I will add a print statement so that the calculated value is visible to the end user. However, it will just be standard text, so it might be a bit hard to pick out if it moves too fast.

edit edit: I have decided to try and replicate the logger output so it's easier to pick out
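
For illustration only, emitting the calculated value through Python's standard logging module might look like the sketch below. This is not the change that was made in the project: the logger name, the message wording, and the `format_warmup_message` helper are all hypothetical.

```python
import logging

logger = logging.getLogger("warmup_example")  # hypothetical logger name


def format_warmup_message(warmup_steps, max_train_steps):
    """Build a one-line summary of the calculated warmup (made-up wording)."""
    pct = 100.0 * warmup_steps / max_train_steps if max_train_steps else 0.0
    return f"warmup steps: {warmup_steps} of {max_train_steps} ({pct:.1f}%)"


if __name__ == "__main__":
    # A timestamped format makes the line easier to spot among other log output.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    logger.info(format_warmup_message(30, 100))
```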


4 participants