Examples for use XPU with IPEX (securefederatedai#904)
* Added XPU example for Workflow Interface

This commit introduces an example of using an XPU with the Workflow Interface. The example demonstrates how to leverage the power of XPU to optimize the execution of complex workflows with OpenFL

* Added XPU example for non-federated_case

This commit introduces an example of using an XPU with the non-federated_case.

* Removed non-federated_case_XPU file

* Updated Workflow_Interface_104_MNIST_XPU file

* Added TinyImagenet example for interactive api and XPU

* Update xpu definition and copyright

* Added link for download xpu driver

Signed-off-by: nammbash <[email protected]>
manuelhsantana authored and nammbash committed Feb 27, 2024
1 parent ca9895e commit ccf23fa
Showing 12 changed files with 1,513 additions and 0 deletions.
735 changes: 735 additions & 0 deletions openfl-tutorials/experimental/Workflow_Interface_104_MNIST_XPU.ipynb

Large diffs are not rendered by default.

@@ -0,0 +1,90 @@
# PyTorch_TinyImageNet

## **How to run this tutorial (without TLS and locally as a simulation):**
<br/>

Before we dive in, let's clarify some terms. XPU is a term coined by Intel to describe their line of computing devices, which includes CPUs, GPUs, FPGAs, and other accelerators. In this tutorial, we will be focusing on the Intel® Data Center GPU Max Series model, a GPU that is part of Intel's XPU lineup.

### 0a. If you haven't done so already, create a virtual environment, install OpenFL, and upgrade pip:
- For help with this step, visit the "Install the Package" section of the [OpenFL installation instructions](https://openfl.readthedocs.io/en/latest/install.html#install-the-package).

<br/>

### 0b. Quick XPU Setup
In this tutorial, XPU refers specifically to the Intel® Data Center GPU Max Series. With the Intel® Extension for PyTorch* package, selecting the device `'xpu'` targets this GPU.
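
As a quick sanity check before starting, the device selection described above can be sketched as follows. This is a minimal sketch, not part of the tutorial code: the helper name `pick_device` is hypothetical, and it assumes that importing `intel_extension_for_pytorch` is what makes `torch.xpu` available in the tested PyTorch version. It falls back to `'cpu'` if either package or the device is missing.

```python
def pick_device() -> str:
    """Return 'xpu' if an Intel GPU is usable through IPEX, otherwise 'cpu'.

    Hypothetical helper for illustration only.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return "cpu"

    try:
        # Importing the extension registers the 'xpu' backend on older PyTorch.
        import intel_extension_for_pytorch  # noqa: F401
    except (ImportError, OSError):
        # Extension missing or its native libraries failed to load.
        pass

    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine with a Data Center GPU Max Series card, the driver or extension installation is the first thing to revisit.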

For a successful setup, please follow the steps outlined in the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/tutorials/installation.html). This guide provides detailed information on system requirements and the installation process for the Intel® Extension for PyTorch. For a deeper understanding of features, APIs, and technical details, refer to the [Intel® Extension for PyTorch* Documentation](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/index.html).

Hardware Prerequisite: Intel® Data Center GPU Max Series.

This Jupyter Notebook has been tested and confirmed to work with the following versions:

- intel-extension-for-pytorch==2.0.120 (xpu)
- pytorch==2.0.1
- torchvision==0.15.2

These versions were obtained from official Intel® channels.
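
To compare an environment against the versions listed above, something like the following could be used. It is a sketch under assumptions: the distribution names (`torch` for PyTorch, `intel_extension_for_pytorch` for the extension) and the helper name `compare_versions` are not taken from this tutorial, and the prefix match is deliberately lenient because Intel builds often carry suffixes after the base version.

```python
from importlib import metadata

# Versions listed in this README. The distribution names are assumptions.
TESTED = {
    "intel_extension_for_pytorch": "2.0.120",
    "torch": "2.0.1",
    "torchvision": "0.15.2",
}


def compare_versions(tested=TESTED):
    """Map each package to (installed_version_or_None, matches_tested_prefix)."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Lenient check: drop any local suffix after '+', then compare prefixes.
        matches = installed is not None and installed.split("+")[0].startswith(wanted)
        report[name] = (installed, matches)
    return report


if __name__ == "__main__":
    for package, (installed, ok) in compare_versions().items():
        status = "ok" if ok else "differs or missing"
        print(f"{package}: installed={installed} ({status})")
```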

Additionally, the XPU driver version used in testing was:

- [XPU_Driver==803](https://dgpu-docs.intel.com/driver/installation.html)


<br/>

### 1. Split terminal into 3 (1 terminal for the director, 1 for the envoy, and 1 for the experiment)

<br/>

### 2. Do the following in each terminal:
- Activate the virtual environment from step 0:

```sh
source venv/bin/activate
```
- If you are in a network environment with a proxy, ensure proxy environment variables are set in each of your terminals.
- Navigate to the tutorial:

```sh
cd openfl/openfl-tutorials/interactive_api/PyTorch_TinyImageNet
```

<br/>

### 3. In the first terminal, run the director:

```sh
cd director
./start_director.sh
```
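
Before starting the envoy, it can help to confirm that the director is accepting connections. The sketch below only checks that a TCP connection can be opened; the host and port defaults mirror `director_config.yaml` in this tutorial, and the helper name `director_is_listening` is hypothetical.

```python
import socket


def director_is_listening(host: str = "localhost", port: int = 50051,
                          timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    print("director reachable:", director_is_listening())
```

A successful connection does not prove the director is healthy, only that something is listening on that port.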

<br/>

### 4. In the second terminal, install requirements and run the envoy:

```sh
cd envoy
pip install -r requirements.txt
./start_envoy.sh env_one envoy_config.yaml
```

Optional: Run a second envoy in an additional terminal:
- Ensure step 2 is complete for this terminal as well.
- Run the second envoy:
```sh
cd envoy
./start_envoy.sh env_two envoy_config.yaml
```
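
Note that both envoys above read the same `envoy_config.yaml`, whose `rank_worldsize: 1,1` gives each of them the full dataset. The shard descriptor selects images with a strided slice, `[rank - 1::worldsize]`, over the sorted file list. The sketch below reproduces that slicing on a toy list to show what values such as `1,2` and `2,2` would select; the file names and helper names are hypothetical, and using a separate config file per envoy is an assumption, not something this tutorial sets up.

```python
def parse_rank_worldsize(value: str):
    """Parse a 'rank,worldsize' string the same way the shard descriptor does."""
    rank, worldsize = tuple(int(num) for num in value.split(","))
    return rank, worldsize


def shard(items, rank: int, worldsize: int):
    """Return every worldsize-th item of the sorted list, starting at rank - 1."""
    return sorted(items)[rank - 1::worldsize]


if __name__ == "__main__":
    files = [f"img_{i}.JPEG" for i in range(6)]  # toy stand-ins for image paths
    for setting in ("1,1", "1,2", "2,2"):
        rank, worldsize = parse_rank_worldsize(setting)
        print(setting, "->", shard(files, rank, worldsize))
```

With `1,2` and `2,2` the two shards are disjoint and together cover every file.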

<br/>

### 5. Now that your director and envoy terminals are set up, run the Jupyter Notebook in your experiment terminal:

```sh
cd workspace
jupyter lab pytorch_tinyimagenet_XPU.ipynb
```
- A Jupyter Server URL will appear in your terminal. In your browser, proceed to that link. Once the webpage loads, click on the pytorch_tinyimagenet_XPU.ipynb file.
- To run the experiment, select the icon that looks like two triangles to "Restart Kernel and Run All Cells".
- You will notice activity in your terminals as the experiment runs, and when the experiment is finished the director terminal will display a message that the experiment has finished successfully.
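
The notebook itself is not reproduced here, so the following is only a rough sketch of what a single training step on the `xpu` device might look like with the Intel® Extension for PyTorch*. The toy linear model, batch size, and function name are assumptions made for illustration; the input size follows the 64x64x3 sample shape and the 200 classes of TinyImageNet. The function returns `None` when PyTorch is not installed.

```python
def one_training_step(device: str = "cpu"):
    """Run one SGD step on random data; return the loss, or None without torch."""
    try:
        import torch
        from torch import nn
    except ImportError:
        return None

    # Toy classifier (an assumption, not the notebook's model).
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 200)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    if device == "xpu":
        # ipex.optimize returns the optimized model and optimizer for training.
        import intel_extension_for_pytorch as ipex
        model, optimizer = ipex.optimize(model, optimizer=optimizer)

    # Random batch standing in for TinyImageNet images and labels.
    images = torch.randn(4, 3, 64, 64, device=device)
    labels = torch.randint(0, 200, (4,), device=device)

    loss = nn.functional.cross_entropy(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    print("loss:", one_training_step("cpu"))
```

Passing `"xpu"` instead of `"cpu"` only works on a machine where the extension and driver from step 0b are installed.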

@@ -0,0 +1,5 @@
settings:
  listen_host: localhost
  listen_port: 50051
  sample_shape: ['64', '64', '3']
  target_shape: ['64', '64']
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
@@ -0,0 +1,4 @@
#!/bin/bash
set -e
FQDN=$1
fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
@@ -0,0 +1,10 @@
params:
  cuda_devices: []

optional_plugin_components: {}

shard_descriptor:
  template: tinyimagenet_shard_descriptor.TinyImageNetShardDescriptor
  params:
    data_folder: tinyimagenet_data
    rank_worldsize: 1,1
@@ -0,0 +1 @@
Pillow==10.0.1
@@ -0,0 +1,4 @@
#!/bin/bash
set -e

fx envoy start -n "${1:-env_one}" --disable-tls --envoy-config-path "${2:-envoy_config.yaml}" -dh localhost -dp 50051
@@ -0,0 +1,6 @@
#!/bin/bash
set -e
ENVOY_NAME=$1
DIRECTOR_FQDN=$2

fx envoy start -n "$ENVOY_NAME" --envoy-config-path envoy_config.yaml -dh "$DIRECTOR_FQDN" -dp 50051 -rc cert/root_ca.crt -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt
@@ -0,0 +1,120 @@
# Copyright (C) 2020-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""TinyImageNet Shard Descriptor."""

import glob
import logging
import os
import shutil
from pathlib import Path
from typing import Tuple

from PIL import Image

from openfl.interface.interactive_api.shard_descriptor import ShardDataset
from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor

logger = logging.getLogger(__name__)


class TinyImageNetDataset(ShardDataset):
    """TinyImageNet shard dataset class."""

    NUM_IMAGES_PER_CLASS = 500

    def __init__(self, data_folder: Path, data_type='train', rank=1, worldsize=1):
        """Initialize TinyImageNetDataset."""
        self.data_type = data_type
        self._common_data_folder = data_folder
        self._data_folder = os.path.join(data_folder, data_type)
        self.labels = {}  # fname - label number mapping
        self.image_paths = sorted(
            glob.iglob(
                os.path.join(self._data_folder, '**', '*.JPEG'),
                recursive=True
            )
        )[rank - 1::worldsize]
        wnids_path = os.path.join(self._common_data_folder, 'wnids.txt')
        with open(wnids_path, 'r', encoding='utf-8') as fp:
            self.label_texts = sorted([text.strip() for text in fp.readlines()])
        self.label_text_to_number = {text: i for i, text in enumerate(self.label_texts)}
        self.fill_labels()

    def __len__(self) -> int:
        """Return the len of the shard dataset."""
        return len(self.image_paths)

    def __getitem__(self, index: int) -> Tuple['Image', int]:
        """Return an item by the index."""
        file_path = self.image_paths[index]
        label = self.labels[os.path.basename(file_path)]
        return self.read_image(file_path), label

    def read_image(self, path: Path) -> Image:
        """Read the image."""
        img = Image.open(path)
        return img

    def fill_labels(self) -> None:
        """Fill labels."""
        if self.data_type == 'train':
            for label_text, i in self.label_text_to_number.items():
                for cnt in range(self.NUM_IMAGES_PER_CLASS):
                    self.labels[f'{label_text}_{cnt}.JPEG'] = i
        elif self.data_type == 'val':
            val_annotations_path = os.path.join(self._data_folder, 'val_annotations.txt')
            with open(val_annotations_path, 'r', encoding='utf-8') as fp:
                for line in fp.readlines():
                    terms = line.split('\t')
                    file_name, label_text = terms[0], terms[1]
                    self.labels[file_name] = self.label_text_to_number[label_text]


class TinyImageNetShardDescriptor(ShardDescriptor):
    """Shard descriptor class."""

    def __init__(
            self,
            data_folder: str = 'data',
            rank_worldsize: str = '1,1',
            **kwargs
    ):
        """Initialize TinyImageNetShardDescriptor."""
        self.common_data_folder = Path.cwd() / data_folder
        self.data_folder = Path.cwd() / data_folder / 'tiny-imagenet-200'
        self.download_data()
        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))

    def download_data(self):
        """Download prepared shard dataset."""
        zip_file_path = self.common_data_folder / 'tiny-imagenet-200.zip'
        os.makedirs(self.common_data_folder, exist_ok=True)
        os.system(f'wget --no-clobber http://cs231n.stanford.edu/tiny-imagenet-200.zip'
                  f' -O {zip_file_path}')
        shutil.unpack_archive(str(zip_file_path), str(self.common_data_folder))

    def get_dataset(self, dataset_type):
        """Return a shard dataset by type."""
        return TinyImageNetDataset(
            data_folder=self.data_folder,
            data_type=dataset_type,
            rank=self.rank,
            worldsize=self.worldsize
        )

    @property
    def sample_shape(self):
        """Return the sample shape info."""
        return ['64', '64', '3']

    @property
    def target_shape(self):
        """Return the target shape info."""
        return ['64', '64']

    @property
    def dataset_description(self) -> str:
        """Return the shard dataset description."""
        return (f'TinyImageNetDataset dataset, shard number {self.rank}'
                f' out of {self.worldsize}')
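
For reference, the validation branch of `fill_labels` above can be exercised in isolation. The sketch below mirrors its logic on in-memory lines rather than on `val_annotations.txt`; the function name `build_val_labels`, the WordNet ids, and the sample annotation lines are hypothetical, and it assumes each line is tab-separated with the file name first and the class id second.

```python
def build_val_labels(annotation_lines, wnids):
    """Map validation file names to label numbers, as fill_labels does for 'val'."""
    # Label numbers are positions in the sorted list of class ids.
    label_text_to_number = {text: i for i, text in enumerate(sorted(wnids))}
    labels = {}
    for line in annotation_lines:
        terms = line.rstrip("\n").split("\t")
        file_name, label_text = terms[0], terms[1]
        labels[file_name] = label_text_to_number[label_text]
    return labels


if __name__ == "__main__":
    # Hypothetical class ids and annotation lines (trailing fields are ignored).
    wnids = ["n02", "n01"]
    lines = [
        "val_0.JPEG\tn02\t0\t0\t10\t10\n",
        "val_1.JPEG\tn01\t5\t5\t20\t20\n",
    ]
    print(build_val_labels(lines, wnids))
```

Because the ids are sorted before numbering, the label assigned to a class does not depend on its order in `wnids.txt`.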