
Add Learning Rate Scheduler to neurallamConfig #107

Open · wants to merge 24 commits into base: main

Conversation

@matschreiner commented Jan 27, 2025

Describe your changes


I have added an `optimization` field to the `TrainingConfig` in `neural_lam/config.py`. This field takes `lr`, `lr_scheduler`, and `lr_scheduler_kwargs`.

After the field is initialized, `lr_scheduler` is checked against the schedulers available in PyTorch; if the named scheduler does not exist, an `InvalidConfigError` is raised.
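To make the validation concrete, here is a minimal sketch of what such a check could look like; the `InvalidConfigError` name comes from the description above, while the dataclass layout and default values are assumptions, not the PR's exact code:

```python
import dataclasses
from typing import Optional

import torch.optim.lr_scheduler


class InvalidConfigError(Exception):
    """Raised when the configuration contains invalid values."""


@dataclasses.dataclass
class OptimizationConfig:
    lr: float = 1.0e-3  # placeholder default, not necessarily the PR's value
    lr_scheduler: Optional[str] = None
    lr_scheduler_kwargs: dict = dataclasses.field(default_factory=dict)

    def __post_init__(self):
        # Check the scheduler name against what PyTorch actually provides
        if self.lr_scheduler is not None and not hasattr(
            torch.optim.lr_scheduler, self.lr_scheduler
        ):
            raise InvalidConfigError(
                f"Unknown lr_scheduler: {self.lr_scheduler!r}"
            )
```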

The optimization config is added as an attribute of the `ARModel` class so that it can be accessed in `configure_optimizers`.
I think it would have been better to add the entire config, but maybe it is a conscious choice that the config is not an attribute of the model.
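For context, a rough sketch of how `configure_optimizers` could wire this up, assuming the config is stored as `self.optimization` (the optimizer class and return shape here are illustrative, not the PR's exact code):

```python
import torch


def configure_optimizers(self):
    # `self.optimization` is the optimization config attached to ARModel
    opt = torch.optim.AdamW(self.parameters(), lr=self.optimization.lr)
    if self.optimization.lr_scheduler is None:
        return opt
    # Look up the scheduler class by name and pass the configured kwargs
    scheduler_cls = getattr(
        torch.optim.lr_scheduler, self.optimization.lr_scheduler
    )
    scheduler = scheduler_cls(opt, **self.optimization.lr_scheduler_kwargs)
    return [opt], [scheduler]
```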

I have included tests checking that configuration instantiation fails with an invalid scheduler and succeeds with a valid one.
I also test that the learning rate on the optimizer decreases as expected when the learning-rate scheduler is stepped.
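For illustration, a self-contained version of the decreasing-lr check might look like this (the scheduler choice and step counts are arbitrary, not the PR's actual test):

```python
import torch


def test_lr_decreases_when_scheduler_steps():
    param = torch.nn.Parameter(torch.zeros(1))
    opt = torch.optim.SGD([param], lr=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        opt, T_max=10, eta_min=0.0
    )
    initial_lr = opt.param_groups[0]["lr"]
    for _ in range(5):
        opt.step()
        scheduler.step()
    # The cosine schedule should have lowered the learning rate by now
    assert opt.param_groups[0]["lr"] < initial_lr
```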

This is a breaking change because I have removed the `--lr` option from the argument parser and moved it to the configuration file.

If, for example, we wanted to add a `CosineAnnealingLR` with `T_max = 1000` and `eta_min = 0.0`, we would add the following to the config YAML:

```yaml
datastore:
  ...
training:
  state_feature_weighting:
    ...
  optimization:
    lr_scheduler: CosineAnnealingLR
    lr_scheduler_kwargs:
      T_max: 1000
      eta_min: 0.0
```
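For reference, those kwargs map directly onto the scheduler constructor, roughly like this (the optimizer and its `lr` are illustrative stand-ins):

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for model parameters
opt = torch.optim.AdamW(params, lr=1.0e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    opt, T_max=1000, eta_min=0.0
)
```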

Some tests are also updated to use a fixture for `model_args`.

Motivation and context
I was tasked with exploring different schedulers and learning rates and thought it would be nice to log these things in the configuration files.

Dependencies
None

Issue Link

#97

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe their purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@matschreiner (Author) commented Jan 27, 2025

@khintz
@leifdenby

@leifdenby (Member) commented Jan 28, 2025

From a quick glance this is looking really great! Thanks for doing this @matschreiner! Since this is looking quite close to complete and is quite an isolated feature, do you want to propose this for the next release (v0.4.0, edit: I can see @joeloskarsson already proposed this for v0.5.0 on #97 so maybe we keep this feature for that release)?

It looks like the linting is complaining right now. We use pre-commit for linting. You can install the development tools with `python -m pip install -e ".[dev]"`, and you can install a git hook so pre-commit runs automatically with `pre-commit install` :)

Maybe you could fix the linting issues, and then I can give this a proper testing and review?

@joeloskarsson (Collaborator) commented
This is really nice to have! A couple of thoughts:

  1. It would probably be good to handle the lr-scheduler state similarly to how the optimizer state is handled:

     ```python
     if not self.restore_opt:
         opt = self.configure_optimizers()
         checkpoint["optimizer_states"] = [opt.state_dict()]
     ```

     so that there is some control over whether it should be loaded from a previous checkpoint or not.
  2. With this, the lr is configured in the config file rather than with argparse flags. I think there is a larger discussion to be had about what should be in the config vs command-line arguments. To me it is a bit unintuitive why e.g. lr should be somewhere else than the number of epochs I train for, or the ar_steps. I would always adjust these together, meaning I would have to change both the config file and the bash script I use for starting training runs. Up until now the setup in practice has been that you only write the config once, to tell neural-lam where to read data and how to weight variables, and then the actual training setup being run is specified using command-line args. There is definitely an argument to be made that all/most things should be in a config file. For now I just want us to have a thought-through idea of what goes where.

@matschreiner (Author) commented

@joeloskarsson

Thanks for having a look at my code :)

  1. I've changed it now so that we can restore lr_schedulers the same way as we restore the optimizer (see the sketch after this list) :)

  2. I agree that training time and learning-rate scheduler setup are related, and I think the issue lies partly in the way PyTorch is set up. But since the model is responsible for instantiating optimizers and schedulers, I think it makes sense to also include them in the model configuration file. The number of epochs/training steps, on the other hand, are arguments for the trainer, so I'd say they belong in the arguments for the training script.
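A minimal sketch of that restore pattern, mirroring the existing `restore_opt` handling (the `restore_lr_scheduler` flag name is hypothetical, and it assumes `configure_optimizers` returns `([optimizers], [schedulers])` as in the earlier sketch; `"lr_schedulers"` is the key PyTorch Lightning uses in its checkpoints):

```python
def on_load_checkpoint(self, checkpoint):
    if not self.restore_opt:
        # Replace the saved optimizer state with freshly initialized state
        opts, _schedulers = self.configure_optimizers()
        checkpoint["optimizer_states"] = [opt.state_dict() for opt in opts]
    if not self.restore_lr_scheduler:
        # Drop the saved scheduler state so the newly configured
        # scheduler starts from scratch
        checkpoint["lr_schedulers"] = []
```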

@matschreiner (Author) commented Feb 4, 2025

Also, I can't get the tests to pass because of timeout errors. Maybe it could be fixed by merging in Leif's PR.

@joeloskarsson (Collaborator) commented

Yea, the way these things are handled in PyTorch and Lightning really is not optimal (from a software-design standpoint). I'm not sure I have any idea for a good solution. Just wanted to add that I don't think it is suitable to think of the config file as a model config, as the model architecture is mainly defined through command-line arguments. I think of it more as a config for how the model should read and weight data at the moment, but maybe that is something that should actually change (what the config is for).
