Examples for use XPU with IPEX (securefederatedai#904)

* Added XPU example for Workflow Interface

  This commit introduces an example of using an XPU with the Workflow Interface. The example demonstrates how to leverage the power of XPU to optimize the execution of complex workflows with OpenFL.

* Added XPU example for non-federated_case

  This commit introduces an example of using an XPU with the non-federated_case.

* Removed non-federated_case_XPU file
* Updated Workflow_Interface_104_MNIST_XPU file
* Added TinyImagenet example for interactive api and XPU
* Update xpu definition and copyright
* Added link for download xpu driver

Signed-off-by: nammbash <[email protected]>
nammbash · Feb 27, 2024 · ccf23fa · ccf23fa
1 parent ca9895e
commit ccf23fa
Showing 12 changed files with 1,513 additions and 0 deletions.
diff --git a/openfl-tutorials/experimental/Workflow_Interface_104_MNIST_XPU.ipynb b/openfl-tutorials/experimental/Workflow_Interface_104_MNIST_XPU.ipynb
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/README.md b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/README.md
@@ -0,0 +1,90 @@
+# PyTorch_TinyImageNet
+
+## **How to run this tutorial (without TLS and locally as a simulation):**
+<br/>
+
+Before we dive in, let's clarify some terms. XPU is a term coined by Intel to describe their line of computing devices, which includes CPUs, GPUs, FPGAs, and other accelerators. In this tutorial, we will be focusing on the Intel® Data Center GPU Max Series model, a GPU that is part of Intel's XPU lineup.
+
+### 0a. If you haven't done so already, create a virtual environment, install OpenFL, and upgrade pip:
+  - For help with this step, visit the "Install the Package" section of the [OpenFL installation instructions](https://openfl.readthedocs.io/en/latest/install.html#install-the-package).
+
+<br/>
+
+### 0b. Quick XPU Setup
+  In this tutorial, when we refer to XPU, we are specifically referring to the Intel® Data Center GPU Max Series. When using the Intel® Extension for PyTorch* package, selecting the device as 'xpu' will refer to this Intel® Data Center GPU Max Series.
+
+  For a successful setup, please follow the steps outlined in the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/tutorials/installation.html). This guide provides detailed information on system requirements and the installation process for the Intel® Extension for PyTorch. For a deeper understanding of features, APIs, and technical details, refer to the [Intel® Extension for PyTorch* Documentation](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/index.html).
+
+Hardware Prerequisite: Intel® Data Center GPU Max Series.
+
+This Jupyter Notebook has been tested and confirmed to work with the following versions:
+
+  - intel-extension-for-pytorch==2.0.120 (xpu)
+  - pytorch==2.0.1
+  - torchvision==0.15.2
+
+These versions were obtained from official Intel® channels.
+
+Additionally, the XPU driver version used in testing was:
+
+  - [XPU_Driver==803](https://dgpu-docs.intel.com/driver/installation.html)
+
+
+<br/>
+
+### 1. Split terminal into 3 (1 terminal for the director, 1 for the envoy, and 1 for the experiment)
+
+<br/> 
+
+### 2. Do the following in each terminal:
+   - Activate the virtual environment from step 0:
+
+   ```sh
+   source venv/bin/activate
+   ```
+   - If you are in a network environment with a proxy, ensure proxy environment variables are set in each of your terminals.
+   - Navigate to the tutorial:
+
+   ```sh
+   cd openfl/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU
+   ```
+
+<br/>
+
+### 3. In the first terminal, run the director:
+
+```sh
+cd director
+./start_director.sh
+```
+
+<br/>
+
+### 4. In the second terminal, install requirements and run the envoy:
+
+```sh
+cd envoy
+pip install -r requirements.txt
+./start_envoy.sh env_one envoy_config.yaml
+```
+
+Optional: Run a second envoy in an additional terminal:
+  - Ensure step 2 is complete for this terminal as well.
+  - Run the second envoy:
+```sh
+cd envoy
+./start_envoy.sh env_two envoy_config.yaml
+```
+
+<br/>
+
+### 5. Now that your director and envoy terminals are set up, run the Jupyter Notebook in your experiment terminal:
+
+```sh
+cd workspace
+jupyter lab pytorch_tinyimagenet_XPU.ipynb
+```
+- A Jupyter Server URL will appear in your terminal. Open that link in your browser. Once the page loads, click on the pytorch_tinyimagenet_XPU.ipynb file.
+- To run the experiment, select the icon that looks like two triangles ("Restart Kernel and Run All Cells").
+- You will see activity in your terminals as the experiment runs. When it finishes, the director terminal displays a message that the experiment completed successfully.
+
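The README states that, with the Intel® Extension for PyTorch* installed, selecting the device `'xpu'` targets the Intel® Data Center GPU Max Series. A minimal sketch of how a notebook might pick the device defensively is shown here. The helper name `select_device` is hypothetical and not part of this commit; the sketch assumes that importing `intel_extension_for_pytorch` registers the `xpu` backend and that `torch.xpu.is_available()` reports whether a device is usable. It falls back to `'cpu'` when either package is missing.

```python
import importlib


def select_device() -> str:
    """Return 'xpu' when an Intel XPU appears usable, otherwise 'cpu'.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    try:
        torch = importlib.import_module("torch")
        # Assumption: importing the extension registers the 'xpu' backend.
        importlib.import_module("intel_extension_for_pytorch")
    except (ImportError, OSError):
        # Either package is missing or its native libraries failed to load.
        return "cpu"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = select_device()
print(f"Using device: {device}")
```

The returned string can then be passed to calls such as `model.to(device)`, so the same notebook cell runs on machines with or without the XPU stack.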
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/director_config.yaml b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/director_config.yaml
@@ -0,0 +1,5 @@
+settings:
+  listen_host: localhost
+  listen_port: 50051
+  sample_shape: ['64', '64', '3']
+  target_shape: ['64', '64']
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/start_director.sh b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/start_director.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+
+fx director start --disable-tls -c director_config.yaml
diff --git a/...fl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/start_director_with_tls.sh b/...fl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/director/start_director_with_tls.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
+FQDN=$1
+fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/envoy_config.yaml b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/envoy_config.yaml
@@ -0,0 +1,10 @@
+params:
+  cuda_devices: []
+
+optional_plugin_components: {}
+
+shard_descriptor:
+  template: tinyimagenet_shard_descriptor.TinyImageNetShardDescriptor
+  params:
+    data_folder: tinyimagenet_data
+    rank_worldsize: 1,1
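The `rank_worldsize` value is a "rank,worldsize" pair that the shard descriptor in this commit splits on the comma and uses to stride over the sorted image paths (`[rank - 1::worldsize]`). The following sketch reproduces that slicing on a toy list to show which items each envoy would receive. The function names and the file names are illustrative assumptions, not part of the tutorial.

```python
from typing import List, Tuple


def parse_rank_worldsize(value: str) -> Tuple[int, int]:
    """Split a 'rank,worldsize' string into two integers."""
    rank, worldsize = (int(num) for num in value.split(","))
    return rank, worldsize


def shard(items: List[str], rank: int, worldsize: int) -> List[str]:
    """Take every worldsize-th item of the sorted list, starting at rank - 1."""
    return sorted(items)[rank - 1::worldsize]


# Illustrative file names only.
paths = [f"img_{i}.JPEG" for i in range(6)]

first = shard(paths, *parse_rank_worldsize("1,2"))
second = shard(paths, *parse_rank_worldsize("2,2"))
print(first)   # ['img_0.JPEG', 'img_2.JPEG', 'img_4.JPEG']
print(second)  # ['img_1.JPEG', 'img_3.JPEG', 'img_5.JPEG']
```

With the default `1,1`, every envoy that uses this config sees the full dataset. Giving two envoys separate configs with `1,2` and `2,2` would, under this slicing, hand them disjoint halves.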
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/requirements.txt b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/requirements.txt
@@ -0,0 +1 @@
+Pillow==10.0.1
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/start_envoy.sh b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/start_envoy.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+set -e
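+# Usage: ./start_envoy.sh [ENVOY_NAME] [ENVOY_CONFIG]; defaults match the single-envoy setup.
+fx envoy start -n "${1:-env_one}" --disable-tls --envoy-config-path "${2:-envoy_config.yaml}" -dh localhost -dp 50051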
diff --git a/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/start_envoy_with_tls.sh b/openfl-tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/start_envoy_with_tls.sh
@@ -0,0 +1,6 @@
+#!/bin/bash
+set -e
+ENVOY_NAME=$1
+DIRECTOR_FQDN=$2
+
+fx envoy start -n "$ENVOY_NAME" --envoy-config-path envoy_config.yaml -dh "$DIRECTOR_FQDN" -dp 50051 -rc cert/root_ca.crt -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt
diff --git a/...tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/tinyimagenet_shard_descriptor.py b/...tutorials/interactive_api/PyTorch_TinyImageNet_XPU/envoy/tinyimagenet_shard_descriptor.py
@@ -0,0 +1,120 @@
+# Copyright (C) 2020-2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+"""TinyImageNet Shard Descriptor."""
+
+import glob
+import logging
+import os
+import shutil
+from pathlib import Path
+from typing import Tuple
+
+from PIL import Image
+
+from openfl.interface.interactive_api.shard_descriptor import ShardDataset
+from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor
+
+logger = logging.getLogger(__name__)
+
+
+class TinyImageNetDataset(ShardDataset):
+    """TinyImageNet shard dataset class."""
+
+    NUM_IMAGES_PER_CLASS = 500
+
+    def __init__(self, data_folder: Path, data_type='train', rank=1, worldsize=1):
+        """Initialize TinyImageNetDataset."""
+        self.data_type = data_type
+        self._common_data_folder = data_folder
+        self._data_folder = os.path.join(data_folder, data_type)
+        self.labels = {}  # fname - label number mapping
+        self.image_paths = sorted(
+            glob.iglob(
+                os.path.join(self._data_folder, '**', '*.JPEG'),
+                recursive=True
+            )
+        )[rank - 1::worldsize]
+        wnids_path = os.path.join(self._common_data_folder, 'wnids.txt')
+        with open(wnids_path, 'r', encoding='utf-8') as fp:
+            self.label_texts = sorted([text.strip() for text in fp.readlines()])
+        self.label_text_to_number = {text: i for i, text in enumerate(self.label_texts)}
+        self.fill_labels()
+
+    def __len__(self) -> int:
+        """Return the len of the shard dataset."""
+        return len(self.image_paths)
+
+    def __getitem__(self, index: int) -> Tuple[Image.Image, int]:
+        """Return an item by the index."""
+        file_path = self.image_paths[index]
+        label = self.labels[os.path.basename(file_path)]
+        return self.read_image(file_path), label
+
+    def read_image(self, path: Path) -> Image.Image:
+        """Read the image."""
+        img = Image.open(path)
+        return img
+
+    def fill_labels(self) -> None:
+        """Fill labels."""
+        if self.data_type == 'train':
+            for label_text, i in self.label_text_to_number.items():
+                for cnt in range(self.NUM_IMAGES_PER_CLASS):
+                    self.labels[f'{label_text}_{cnt}.JPEG'] = i
+        elif self.data_type == 'val':
+            val_annotations_path = os.path.join(self._data_folder, 'val_annotations.txt')
+            with open(val_annotations_path, 'r', encoding='utf-8') as fp:
+                for line in fp.readlines():
+                    terms = line.split('\t')
+                    file_name, label_text = terms[0], terms[1]
+                    self.labels[file_name] = self.label_text_to_number[label_text]
+
+
+class TinyImageNetShardDescriptor(ShardDescriptor):
+    """Shard descriptor class."""
+
+    def __init__(
+            self,
+            data_folder: str = 'data',
+            rank_worldsize: str = '1,1',
+            **kwargs
+    ):
+        """Initialize TinyImageNetShardDescriptor."""
+        self.common_data_folder = Path.cwd() / data_folder
+        self.data_folder = Path.cwd() / data_folder / 'tiny-imagenet-200'
+        self.download_data()
+        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))
+
+    def download_data(self):
+        """Download prepared shard dataset."""
+        zip_file_path = self.common_data_folder / 'tiny-imagenet-200.zip'
+        os.makedirs(self.common_data_folder, exist_ok=True)
+        os.system(f'wget --no-clobber http://cs231n.stanford.edu/tiny-imagenet-200.zip'
+                  f' -O {zip_file_path}')
+        shutil.unpack_archive(str(zip_file_path), str(self.common_data_folder))
+
+    def get_dataset(self, dataset_type):
+        """Return a shard dataset by type."""
+        return TinyImageNetDataset(
+            data_folder=self.data_folder,
+            data_type=dataset_type,
+            rank=self.rank,
+            worldsize=self.worldsize
+        )
+
+    @property
+    def sample_shape(self):
+        """Return the sample shape info."""
+        return ['64', '64', '3']
+
+    @property
+    def target_shape(self):
+        """Return the target shape info."""
+        return ['64', '64']
+
+    @property
+    def dataset_description(self) -> str:
+        """Return the shard dataset description."""
+        return (f'TinyImageNetDataset dataset, shard number {self.rank}'
+                f' out of {self.worldsize}')
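To illustrate how `fill_labels` maps validation files to class indices, the sketch below applies the same steps to in-memory data: WordNet IDs are sorted and enumerated, then each tab-separated annotation line contributes a `file name -> class index` entry. The function names and sample IDs are illustrative assumptions, and the sketch assumes that each annotation line starts with the file name and the WordNet ID, which is what the code above reads from `val_annotations.txt`.

```python
from typing import Dict, Iterable


def build_label_index(wnids: Iterable[str]) -> Dict[str, int]:
    """Map each WordNet ID to its position in the sorted ID list."""
    return {text: i for i, text in enumerate(sorted(w.strip() for w in wnids))}


def parse_val_annotations(
    lines: Iterable[str], label_text_to_number: Dict[str, int]
) -> Dict[str, int]:
    """Build a file-name -> class-index mapping from tab-separated lines."""
    labels = {}
    for line in lines:
        terms = line.rstrip("\n").split("\t")
        if len(terms) < 2:
            continue  # skip blank or malformed lines
        file_name, label_text = terms[0], terms[1]
        labels[file_name] = label_text_to_number[label_text]
    return labels


# Illustrative sample data, not taken from the dataset.
wnids = ["n02124075\n", "n01443537\n"]
annotations = [
    "val_0.JPEG\tn01443537\t0\t32\t44\t62\n",
    "val_1.JPEG\tn02124075\t5\t5\t60\t60\n",
]

index = build_label_index(wnids)
labels = parse_val_annotations(annotations, index)
print(labels)  # {'val_0.JPEG': 0, 'val_1.JPEG': 1}
```

Because the IDs are sorted before enumeration, the class index depends only on the set of IDs, not on their order in `wnids.txt`.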