Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task Runner API: Secure Aggregation #1346

Open
wants to merge 23 commits into
base: develop
Choose a base branch
from

Conversation

theakshaypant
Copy link
Collaborator

@theakshaypant theakshaypant commented Feb 6, 2025

Introduction

This PR introduces secure aggregation to Task Runner API.

Would specially apprecaite comments on how tensors are exchanged between the participants.

Changes

  • Secure aggregation on client side is implemented as callback.
  • This callback is run on_experiment_begin.
  • Collaborator side steps are implemented using CollaboratorSecAgg callback.
  • Aggregator side steps are implemented using openfl.utilities.secagg.Setup whcih performs "aggregation" of the tensors sent by the collaborators..
  • send_local_task_results is modfied to save secure aggregation setup stage tensors to aggregator tensor db.
  • Masks are added to "metric"s when a task is completed before sending the result to the aggregator.
  • SecureAggregation is used for "metric"s when enabled instead of WeightedAverage.
  • Added a secure_aggregation: bool flag to the plan under
    • aggregator.settings
    • collaborator.settings

NOTE

  • Does not cover dropouts, will be handled as part of SecAgg+ integration in future PRs.
  • In this implementation, there is a 60 second wait period when collaborator tries to fetch keys from the aggregator, fails if there is no response during this time.
    • This time can be increased or further retries can be added.

Testing

  • Aggregator logs (with extra debug prints)
    Screenshot 2025-02-11 at 1 52 04 AM

  • Collaborator 1 logs
    Screenshot 2025-02-11 at 1 52 41 AM

  • Collaborator 2 logs
    Screenshot 2025-02-11 at 1 53 02 AM

@@ -94,6 +94,7 @@ def run(self):
'tensorboardX',
'protobuf>=4.22,<6.0.0',
'grpcio>=1.56.2,<1.66.0',
'pycryptodome'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep this as setup, even when secure aggregation not needed by customer, they need to install it.

@@ -0,0 +1,154 @@
# Copyright 2020-2025 Intel Corporation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is cryptography package in openfl. Isn't secagg packages more suitable inside cryptography.

openfl/utilities/secagg/crypto.py Outdated Show resolved Hide resolved
shape (Tuple): Shape of the numpy array to be generated.

Returns:
np.ndarray: array with pseudo-randomly generated numbers.

Check notice

Code scanning / Bandit

Standard pseudo-random generators are not suitable for security/cryptographic purposes. Note

Standard pseudo-random generators are not suitable for security/cryptographic purposes.
"""
super().__init__(batch_size, **kwargs)

# TODO: We should be downloading the dataset shard into a directory
Copy link
Collaborator

@tanwarsh tanwarsh Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this comment - refer #1333 (comment)

@@ -168,6 +175,18 @@ def set_available_devices(self, cuda: Tuple[str] = ()):
def run(self):
"""Run the collaborator."""
# Experiment begin

# FIXME: Not working when added to callbacks on line 157.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MasterSkepticista Need your input on this.

openfl-workspace/keras/mnist_secagg/plan/plan.yaml Outdated Show resolved Hide resolved
openfl-workspace/keras/mnist_secagg/plan/plan.yaml Outdated Show resolved Hide resolved
@@ -0,0 +1,118 @@
# Copyright (C) 2020-2021 Intel Corporation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it not possible to refer the original file, if there is no sec_agg specific changes in here?
If not possible, we can change name to utils.py, mnist is already in the path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have followed the same format as all other workspaces where the funnction definition is in all of them.
Changed file name in 1579c05.

@@ -168,6 +175,18 @@ def set_available_devices(self, cuda: Tuple[str] = ()):
def run(self):
"""Run the collaborator."""
# Experiment begin

# FIXME: Not working when added to callbacks on line 157.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix this or remove the comment if no longer required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need input on this. Refer #1346 (comment).

openfl/component/aggregator/aggregator.py Outdated Show resolved Hide resolved
openfl/component/aggregator/aggregator.py Outdated Show resolved Hide resolved
openfl/component/aggregator/aggregator.py Outdated Show resolved Hide resolved
tensor_name = named_tensor.name
# Check if all collaborators have sent their data for the
# current key.
all_collaborators_sent = self.secagg.wait_for_all_collaborators(tensor_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of staggler, will it hang indefinitely?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no wait in this method. Only checks if all collaborators have shared "tensor_name", will rename it appropriately.

openfl/utilities/secagg/setup.py Outdated Show resolved Hide resolved
@payalcha
Copy link
Collaborator

payalcha commented Feb 13, 2025

@theakshaypant are we are storing keys and cipher text right in persistent storage?
Collaborator restart should not impact, secure aggregation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants