
Add Learning Rate Scheduler to neurallamConfig #107

Open · wants to merge 24 commits into base: main

Conversation

@matschreiner commented Jan 27, 2025

Describe your changes


I have added an `optimization` field to the `TrainingConfig` in `neural_lam/config.py`. This field takes `lr`, `lr_scheduler`, and `lr_scheduler_kwargs`.

After the field is initialized, `lr_scheduler` is checked against the schedulers available in PyTorch; if the named scheduler does not exist, an `InvalidConfigError` is raised.
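To make the validation concrete, here is a minimal sketch of what such a check could look like; the `InvalidConfigError` name comes from the description above, while the dataclass layout and default values are assumptions, not the PR's exact code:

```python
import dataclasses
from typing import Optional

import torch.optim.lr_scheduler


class InvalidConfigError(Exception):
    """Raised when the configuration contains invalid values."""


@dataclasses.dataclass
class OptimizationConfig:
    lr: float = 1.0e-3  # placeholder default, not necessarily the PR's value
    lr_scheduler: Optional[str] = None
    lr_scheduler_kwargs: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        # Check the scheduler name against what PyTorch actually provides
        if self.lr_scheduler is not None and not hasattr(
            torch.optim.lr_scheduler, self.lr_scheduler
        ):
            raise InvalidConfigError(
                f"Unknown lr_scheduler: {self.lr_scheduler!r}"
            )
```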

The optimization config is added as an attribute of the `ARModel` class so that it can be accessed in `configure_optimizers`.
I think it would have been better to add the entire config, but maybe it is a conscious choice that the config is not an attribute of the model.
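For context, a rough sketch of how `configure_optimizers` could wire this up, assuming the config is stored as `self.optimization` (the optimizer class and return shape here are illustrative, not the PR's exact code):

```python
import torch


def configure_optimizers(self):
    # `self.optimization` is the optimization config attached to ARModel
    opt = torch.optim.AdamW(self.parameters(), lr=self.optimization.lr)
    if self.optimization.lr_scheduler is None:
        return opt
    # Look up the scheduler class by name and pass the configured kwargs
    scheduler_cls = getattr(
        torch.optim.lr_scheduler, self.optimization.lr_scheduler
    )
    scheduler = scheduler_cls(opt, **self.optimization.lr_scheduler_kwargs)
    return [opt], [scheduler]
```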

I have included tests checking that configuration instantiation fails with an invalid scheduler and succeeds with a valid one.
I also test that the learning rate on the optimizer decreases as expected when the learning-rate scheduler is stepped.
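For illustration, a self-contained version of the decreasing-lr check might look like this (the scheduler choice and step counts are arbitrary, not the PR's actual test):

```python
import torch


def test_lr_decreases_when_scheduler_steps():
    param = torch.nn.Parameter(torch.zeros(1))
    opt = torch.optim.SGD([param], lr=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=10, eta_min=0.0
    )
    initial_lr = opt.param_groups[0]["lr"]
    for _ in range(5):
        opt.step()
        scheduler.step()
    # The cosine schedule should have lowered the learning rate by now
    assert opt.param_groups[0]["lr"] < initial_lr
```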

This is a breaking change because I have removed the `--lr` option from the argument parser and moved it to the configuration file.

If, for example, we wanted to add a `CosineAnnealingLR` with `T_max = 1000` and `eta_min = 0.0`, we would add the following to the config YAML:

```yaml
datastore:
  ...
training:
  state_feature_weighting:
    ...
  optimization:
    lr_scheduler: CosineAnnealingLR
    lr_scheduler_kwargs:
      T_max: 1000
      eta_min: 0.0
```
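For reference, those kwargs map directly onto the scheduler constructor, roughly like this (the optimizer and its `lr` are illustrative stand-ins):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model parameters
opt = torch.optim.AdamW(params, lr=1.0e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    opt, T_max=1000, eta_min=0.0
)
```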

Some tests are also updated to use a fixture for `model_args`.

Motivation and context
I was tasked with exploring different schedulers and learning rates and thought it would be nice to log these things in the configuration files.

Dependencies
None

Issue Link

#97

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe their purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@matschreiner (Author) commented Jan 27, 2025

@khintz
@leifdenby

@leifdenby (Member) commented Jan 28, 2025

From a quick glance this is looking really great! Thanks for doing this @matschreiner! Since this is looking quite close to complete and is quite an isolated feature, do you want to propose this for the next release (v0.4.0, edit: I can see @joeloskarsson already proposed this for v0.5.0 on #97 so maybe we keep this feature for that release)?

It looks like the linting is complaining right now. We use pre-commit for linting. You can install the development tools with `python -m pip install -e ".[dev]"`, and you can install a git hook so pre-commit runs automatically with `pre-commit install` :)

Maybe you could fix the linting issues, and then I can give this a proper testing and review?

@joeloskarsson (Collaborator) commented
This is really nice to have! A couple of thoughts:

  1. It would probably be good to handle the lr-scheduler state similarly to how the optimizer state is handled:

     ```python
     if not self.restore_opt:
         opt = self.configure_optimizers()
         checkpoint["optimizer_states"] = [opt.state_dict()]
     ```

     so that there is some control over whether it should be loaded from a previous checkpoint or not.
  2. With this, the lr is configured in the config file rather than with argparse flags. I think there is a larger discussion to be had about what should be in the config vs command-line arguments. To me it is a bit unintuitive why e.g. lr should be somewhere else than the number of epochs I train for, or the ar_steps. I would always adjust these together, meaning I would have to change both the config file and the bash script I use for starting training runs. Up until now the setup in practice has been that you only write the config once, to tell neural-lam where to read data and how to weight variables, and then the actual training setup being run is specified using command-line args. There is definitely an argument to be made that all/most things should be in a config file. For now I just want us to have a thought-through idea of what goes where.

@matschreiner (Author) commented

@joeloskarsson

Thanks for having a look at my code :)

  1. I've changed it now so that we can restore lr_schedulers the same way as we restore the optimizer (see the sketch after this list) :)

  2. I agree that training time and learning-rate scheduler setup are related, and I think the issue lies partly in the way PyTorch is set up. But since the model is responsible for instantiating optimizers and schedulers, I think it makes sense to also include them in the model configuration file. The number of epochs/training steps, on the other hand, are arguments for the trainer, so I'd say they belong in the arguments for the training script.
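A minimal sketch of that restore pattern, mirroring the existing `restore_opt` handling (the `restore_lr_scheduler` flag name is hypothetical, and it assumes `configure_optimizers` returns `([optimizers], [schedulers])` as in the earlier sketch; `"lr_schedulers"` is the key PyTorch Lightning uses in its checkpoints):

```python
def on_load_checkpoint(self, checkpoint):
    if not self.restore_opt:
        # Replace the saved optimizer state with freshly initialized state
        opts, _schedulers = self.configure_optimizers()
        checkpoint["optimizer_states"] = [opt.state_dict() for opt in opts]
    if not self.restore_lr_scheduler:
        # Drop the saved scheduler state so the newly configured
        # scheduler starts from scratch
        checkpoint["lr_schedulers"] = []
```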

@matschreiner (Author) commented Feb 4, 2025

Also, I can't get the tests to pass because of timeout errors. Maybe it could be fixed by merging in Leif's PR.

@joeloskarsson (Collaborator) commented

Yea, the way these things are handled in PyTorch and Lightning really is not optimal (from a software-design standpoint). I'm not sure I have any idea for a good solution. Just wanted to add that I don't think it is suitable to think of the config file as a model config, as the model architecture is mainly defined through command-line arguments. I think of it more as a config for how the model should read and weight data at the moment, but maybe that is something that should actually change (what the config is for).
