updated changes

securefederatedai · Nov 21, 2024 · 05df4a3
1 parent c734c73
commit 05df4a3
Showing 1 changed file with 103 additions and 62 deletions.
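
For orientation, the effect of the larger input range on the contiguous sharding used by `ShardSplitter` can be sketched in plain Python. This is a minimal sketch, not the notebook's code: the helper names `split_into_shards` and `average_parameters` are hypothetical, and the collaborator count of 4 is an assumption (the diff does not show it). It mirrors the rule in `ShardSplitter.split` (equal contiguous shards, remainder in the last shard) and the unweighted parameter mean in `FedAvg`.

```python
def split_into_shards(num_samples, num_shards):
    """Return contiguous index lists; the last shard absorbs any remainder."""
    shard_size = num_samples // num_shards
    shards = []
    for i in range(num_shards):
        start = i * shard_size
        end = num_samples if i == num_shards - 1 else start + shard_size
        shards.append(list(range(start, end)))
    return shards


def average_parameters(coef_list, intercept_list):
    """Unweighted element-wise mean of per-collaborator coefficients and intercepts."""
    n = len(coef_list)
    mean_coef = [sum(column) / n for column in zip(*coef_list)]
    mean_intercept = sum(intercept_list) / n
    return mean_coef, mean_intercept


# Number of angles produced by range(60, 400, 4) after this commit.
num_samples = len(range(60, 400, 4))

# num_shards=4 is a hypothetical collaborator count, chosen only for illustration.
shards = split_into_shards(num_samples, num_shards=4)
shard_sizes = [len(shard) for shard in shards]
print(num_samples, shard_sizes)  # 85 [21, 21, 21, 22]

# Toy per-collaborator parameters (made-up values, not results from the notebook).
mean_coef, mean_intercept = average_parameters([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.5])
print(mean_coef, mean_intercept)  # [2.0, 3.0] 1.0
```

Because the shards are contiguous and unshuffled, each collaborator sees a different slice of the sine curve, and the plain mean gives every collaborator equal weight regardless of shard size.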
diff --git a/openfl-tutorials/experimental/106_Scikit_Learn_Linear_Regression_Workflow.ipynb b/openfl-tutorials/experimental/106_Scikit_Learn_Linear_Regression_Workflow.ipynb
@@ -5,11 +5,10 @@
    "id": "9a61e30e-83d1-422d-aa0a-ebd4376197ec",
    "metadata": {},
    "source": [
-    "## Scikit-learn Linear Regression Tutorial using Workflow Interface with Ridge Regularization\n",
+    "# Scikit-learn Linear Regression Tutorial using Workflow Interface with Ridge Regularization\n",
     "\n",
     "\n",
-    "This tutorial demonstrates how to train a linear regression model using scikit-learn with Ridge regularization on a dataset, leveraging the new FedAI Workflow Interface. The Workflow Interface provides a novel way to compose federated learning experiments with OpenFL, enabling researchers to handle non-IID data and perform federated averaging with optimizations like FedProx. Through this tutorial, you will learn how to set up the federated learning environment, define the flow, and execute the training process across multiple collaborators\n",
-    "\n"
+    "This tutorial demonstrates how to train a linear regression model using scikit-learn with Ridge regularization on a dataset, leveraging the new FedAI Workflow Interface. The Workflow Interface provides a novel way to compose federated learning experiments with OpenFL, enabling researchers to handle non-IID data and perform federated averaging. Through this tutorial, you will learn how to set up the federated learning environment, define the flow, and execute the training process across multiple collaborators."
    ]
   },
   {
@@ -19,7 +18,7 @@
     "tags": []
    },
    "source": [
-    "# We will use MSE as loss function and Ridge weights regularization\n",
+    "## We will use MSE as the loss function with Ridge weight regularization\n",
     "![image.png](https://www.analyticsvidhya.com/wp-content/uploads/2016/01/eq5-1.png)"
    ]
   },
@@ -156,8 +155,8 @@
    },
    "outputs": [],
    "source": [
-    "# Define input array with angles from 60deg to 300deg converted to radians\n",
-    "x = np.array([i*np.pi/180 for i in range(60,300,4)])\n",
+    "# Define input array with angles from 60deg to 400deg converted to radians\n",
+    "x = np.array([i*np.pi/180 for i in range(60,400,4)])\n",
     "np.random.seed(10)  # Setting seed for reproducibility\n",
     "y = np.sin(x) + np.random.normal(0,0.15,len(x))\n",
     "# plt.plot(x,y,'.')"
@@ -222,20 +221,12 @@
     "## Now we run the same training using the federated learning Workflow API"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "id": "66cb4ecf-cb27-4367-891a-6cea2f1d347f",
-   "metadata": {},
-   "source": [
-    "## Test on a Federation"
-   ]
-  },
   {
    "cell_type": "markdown",
    "id": "08527aab-4b0f-472b-af0c-27ed4ade85c1",
    "metadata": {},
    "source": [
-    "## Import necessary libraries for federated learning"
+    "## Import required libraries for federated learning"
    ]
   },
   {
@@ -247,11 +238,10 @@
    },
    "outputs": [],
    "source": [
-    "# Import necessary libraries for federated learning\n",
+    "# Import necessary libraries\n",
     "import numpy as np\n",
     "from sklearn.linear_model import Lasso\n",
     "from sklearn.metrics import mean_squared_error\n",
-    "from sklearn.datasets import make_regression\n",
     "from openfl.experimental.interface import FLSpec\n",
     "from openfl.experimental.placement import aggregator, collaborator\n",
     "from openfl.experimental.runtime import FederatedRuntime\n",
@@ -269,35 +259,12 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "42187367-6a96-4d78-a576-f7154e8f987f",
+   "cell_type": "markdown",
+   "id": "052a4195-a410-4983-84b9-942159d9d345",
    "metadata": {},
-   "outputs": [],
    "source": [
-    "# Define a class to split the dataset into shards\n",
-    "class ShardSplitter:\n",
-    "    def __init__(self, num_shards):\n",
-    "        self.num_shards = num_shards\n",
-    "\n",
-    "    def split(self, X, y):\n",
-    "        \"\"\"Split the given 2D numpy arrays X and y into equal shards and return list of indexes for each shard.\"\"\"\n",
-    "        num_samples = X.shape[0]\n",
-    "        shard_size = num_samples // self.num_shards\n",
-    "        indexes = np.arange(num_samples)\n",
-    "        #np.random.shuffle(indexes)\n",
-    "        \n",
-    "        shards = []\n",
-    "        for i in range(self.num_shards):\n",
-    "            start_idx = i * shard_size\n",
-    "            if i == self.num_shards - 1:\n",
-    "                # Include any remaining samples in the last shard\n",
-    "                end_idx = num_samples\n",
-    "            else:\n",
-    "                end_idx = start_idx + shard_size\n",
-    "            shards.append(indexes[start_idx:end_idx])\n",
-    "        \n",
-    "        return shards"
+    "## Federated Learning Helper Functions\n",
+    "Define helper functions for training and validating the federated models."
    ]
   },
   {
@@ -351,42 +318,89 @@
   },
   {
    "cell_type": "markdown",
-   "id": "8a3aefde-45a3-4bba-8b5b-6040808de966",
+   "id": "a5cf193b-62a2-403a-96b1-5fa716d9087f",
    "metadata": {},
    "source": [
-    "## Define the federated learning workflow"
+    "## Shard Splitter Class\n",
+    "Define a helper class to split the data into shards for federated learning."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c04b4ab2-1d40-44c7-907b-a6a7d176c959",
+   "id": "42187367-6a96-4d78-a576-f7154e8f987f",
    "metadata": {},
    "outputs": [],
    "source": [
-    "# Define the federated learning workflow\n",
-    "from openfl.experimental.placement import aggregator, collaborator\n",
+    "# Define a class to split the dataset into shards\n",
+    "class ShardSplitter:\n",
+    "    def __init__(self, num_shards):\n",
+    "        self.num_shards = num_shards\n",
     "\n",
-    "def inference(model, test_loader, batch_size):\n",
-    "    x_test, y_test = test_loader\n",
-    "    loss, accuracy = model.evaluate(\n",
-    "        x_test,\n",
-    "        y_test,\n",
-    "        batch_size=batch_size,\n",
-    "        verbose=1\n",
-    "    )\n",
-    "    accuracy_percentage = accuracy * 100\n",
-    "    print(f\"Test set: Avg. loss: {loss}, Accuracy: {accuracy_percentage:.2f}%\")\n",
-    "    return accuracy\n",
-    "    \n",
+    "    def split(self, X, y):\n",
+    "        \"\"\"Split the given 2D numpy arrays X and y into equal shards and return list of indexes for each shard.\"\"\"\n",
+    "        num_samples = X.shape[0]\n",
+    "        shard_size = num_samples // self.num_shards\n",
+    "        indexes = np.arange(num_samples)\n",
+    "        #np.random.shuffle(indexes)\n",
+    "        \n",
+    "        shards = []\n",
+    "        for i in range(self.num_shards):\n",
+    "            start_idx = i * shard_size\n",
+    "            if i == self.num_shards - 1:\n",
+    "                # Include any remaining samples in the last shard\n",
+    "                end_idx = num_samples\n",
+    "            else:\n",
+    "                end_idx = start_idx + shard_size\n",
+    "            shards.append(indexes[start_idx:end_idx])\n",
+    "        \n",
+    "        return shards"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a915fa13-fc10-4b02-808c-ad2c05cee297",
+   "metadata": {},
+   "source": [
+    "## Define Federated Averaging Method\n",
+    "The FedAvg method is used to average the models from all the collaborators after training."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "65e2b0a2-5a79-4f92-b255-4c8d3e28b635",
+   "metadata": {},
+   "outputs": [],
+   "source": [
     "# Federated Averaging for Lasso models\n",
     "def FedAvg(models):\n",
     "    new_model = models[0]\n",
     "    coef_list = [model.model.coef_ for model in models]\n",
     "    intercept_list = [model.model.intercept_ for model in models]\n",
     "    new_model.coef_ = np.mean(coef_list, axis=0)\n",
     "    new_model.intercept_ = np.mean(intercept_list, axis=0)\n",
-    "    return new_model\n",
+    "    return new_model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8a3aefde-45a3-4bba-8b5b-6040808de966",
+   "metadata": {},
+   "source": [
+    "## Define Federated Learning Workflow\n",
+    "Define the workflow for federated learning using OpenFL's FLSpec."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c04b4ab2-1d40-44c7-907b-a6a7d176c959",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define the federated learning workflow\n",
+    "from openfl.experimental.placement import aggregator, collaborator\n",
     "\n",
     "# Federated Learning Workflow using OpenFL's Workflow API\n",
     "class FederatedLassoFlow(FLSpec):\n",
@@ -463,6 +477,15 @@
     "        print(f\"Final aggregated model MSE on test data: {final_mse:.4f}\")"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "3255cac0-caf3-4713-beef-9ff32fe73372",
+   "metadata": {},
+   "source": [
+    "## Start the Federated Learning Process\n",
+    "Create an instance of FederatedLassoFlow and run it with the new larger dataset."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -479,6 +502,24 @@
     "# Start the federated learning process\n",
     "federated_flow.run()"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddfb3ac6-d14a-4e00-83a9-04195b1efdf8",
+   "metadata": {},
+   "source": [
+    "## 🎉 Congratulations! 🎉"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbb985a7-1a10-410e-99d3-cebdf2b809a2",
+   "metadata": {},
+   "source": [
+    "You have now completed the Workflow Interface notebook for **scikit-learn Linear Regression** using federated learning.\n",
+    "\n",
+    "### Happy learning and happy coding with OpenFL! 🎉"
+   ]
   }
  ],
  "metadata": {