Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #718 WandB in OpenFL and example to change number of epochs per round #895

Closed
wants to merge 2 commits into from

Conversation

CasellaJr
Copy link

With this PR it is possible to use WandB for tracking metrics of OpenFL experiments. For the Aggregator and each Collaborator (thanks to @Giemp95) will be created a new WandB run, that will be grouped under the name of the Aggregator for the sake of cleanliness.
Moreover, in the PyTorch_TinyImageNet.ipynb you can find how to track the metrics, and how to modify the number of epochs per round.

Signed-off-by: Bruno Casella <[email protected]>
Signed-off-by: Bruno Casella <[email protected]>
@CasellaJr
Copy link
Author

Sorry @psfoley, there was a typo in the aggregator.py. I fixed it.

@psfoley psfoley self-requested a review November 14, 2023 16:47
Copy link
Contributor

@psfoley psfoley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CasellaJr - thanks for this contribution. There's some changes that are required before this can be approved. Those changes fall into two categories:

  1. wandb cannot be assumed to be present in all federations. It should be made optional, and ideally this would be configurable in the FL Plan. The LOG_WANDB variable you defined I think aims to make it's use optional, but this is not something a user could set without modifying aggregator.py or collaborator.py directly in the current version. See the get_aggregator function in plan.py as an example of how the aggregator object gets constructed, and how the plan.yaml parameters are used to initialize the object.
  2. Hard coded variables. The wandb project name and configuration should be made configurable (again through the plan).

@@ -18,6 +18,8 @@
from openfl.utilities import TensorKey
from openfl.utilities.logs import write_metric

import wandb
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require wandb as an import for OpenFL, which is something we want to avoid.

@@ -18,6 +18,8 @@
from openfl.utilities import TensorKey
from openfl.utilities.logs import write_metric

import wandb
LOG_WANDB = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable by the definer of the experiment, i.e. pass this through the OpenFL Plan

#The following variable, my_aggregator_name, can be changed to whatever you want. Right now I suppose that the name of the collaborators of the federation is in the format DATASETNAME_ENV_NUMBER
my_aggregator_name = '_'.join(set(element.split('_')[0] for element in authorized_cols))
if LOG_WANDB:
wandb.init(project="my_project", entity="my_group", group=f"{my_aggregator_name}", tags=["my_tag"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the project, entity, group, etc. should also be configurable through the plan

@@ -55,6 +57,17 @@ def __init__(self,
log_metric_callback=None,
**kwargs):
"""Initialize."""
#INITIALIZE WANDB WITH CORRECT NAME
#The following variable, my_aggregator_name, can be changed to whatever you want. Right now I suppose that the name of the collaborators of the federation is in the format DATASETNAME_ENV_NUMBER
my_aggregator_name = '_'.join(set(element.split('_')[0] for element in authorized_cols))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved into the if LOG_WANDB block because it's not used elsewhere. Also, it's a bit confusing that the my_aggregator_name variable is a combined list of collaborators. May want to modify this to:

Suggested change
my_aggregator_name = '_'.join(set(element.split('_')[0] for element in authorized_cols))
my_aggregator_name = 'Aggregator_' + '_'.join(set(element.split('_')[0] for element in authorized_cols))

Comment on lines +66 to +67
"num_clients": 4,
"rounds": 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with the wandb configuration, but the same guidance applies here as earlier comments. I would expect num_clients and rounds to be configurable or based on existing parameters used elsewhere

Comment on lines +17 to +18
import wandb
LOG_WANDB = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. Should only import this if wandb is required in the plan

Comment on lines +139 to +144
wandb.init(project="my_project", entity="my_group", tags=["my_tags"],
config={
"num_clients": 4,
"rounds": 100,
},
name=self.collaborator_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above. Make project, entity, tags configurable. Use client count and rounds from existing plan variables.

@theakshaypant
Copy link
Collaborator

@CasellaJr can you please resolve the comments/conflicts and request for review again?

@psfoley
Copy link
Contributor

psfoley commented Jan 3, 2025

Hi @CasellaJr - OpenFL will be adding a callback API in 1.8 so that WandB and other MLOps tools can be more easily inserted into the experiment. Closing this for now

@psfoley psfoley closed this Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants