# Examples for using XPU with IPEX (#904)

Merged · 7 commits · Feb 16, 2024
735 changes: 735 additions & 0 deletions openfl-tutorials/experimental/Workflow_Interface_104_MNIST_XPU.ipynb
_(Large diff not rendered.)_

---

**Tutorial README** (new file):
# PyTorch_TinyImageNet

## **How to run this tutorial (without TLS and locally as a simulation):**
<br/>

Before we dive in, let's clarify some terms. XPU is a term coined by Intel to describe their line of computing devices, which includes CPUs, GPUs, FPGAs, and other accelerators. In this tutorial, we will be focusing on the Intel® Data Center GPU Max Series model, a GPU that is part of Intel's XPU lineup.

### 0a. If you haven't done so already, create a virtual environment, install OpenFL, and upgrade pip:
- For help with this step, visit the "Install the Package" section of the [OpenFL installation instructions](https://openfl.readthedocs.io/en/latest/install.html#install-the-package).

<br/>

### 0b. Quick XPU Setup
In this tutorial, when we refer to XPU, we specifically mean the Intel® Data Center GPU Max Series. When using the Intel® Extension for PyTorch* package, selecting 'xpu' as the device targets this GPU.

For a successful setup, please follow the steps outlined in the [Installation Guide](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/tutorials/installation.html). This guide provides detailed information on system requirements and the installation process for the Intel® Extension for PyTorch. For a deeper understanding of features, APIs, and technical details, refer to the [Intel® Extension for PyTorch* Documentation](https://intel.github.io/intel-extension-for-pytorch/xpu/2.1.10+xpu/index.html).
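After installation, a quick sanity check can confirm whether the 'xpu' device is visible. This is a minimal sketch, not code from the notebook; the helper name `pick_device` is ours, and it deliberately falls back to CPU when IPEX or the GPU is absent:

```python
def pick_device() -> str:
    """Return 'xpu' when Intel Extension for PyTorch exposes a usable GPU, else 'cpu'."""
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (importing it registers the 'xpu' device)
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return "xpu"
    except ImportError:
        pass
    return "cpu"

print(pick_device())
```

On a correctly configured Intel® Data Center GPU Max Series machine this should print `xpu`; anywhere else it prints `cpu`.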

Hardware Prerequisite: Intel® Data Center GPU Max Series.

This Jupyter Notebook has been tested and confirmed to work with the following versions:

- intel-extension-for-pytorch==2.0.120 (xpu)
- pytorch==2.0.1
- torchvision==0.15.2

These versions were obtained from official Intel® channels.

Additionally, the XPU driver version used in testing was:

- [XPU_Driver==803](https://dgpu-docs.intel.com/driver/installation.html)
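With the driver and packages in place, moving a model to the XPU follows the usual PyTorch device pattern, with `ipex.optimize` applied afterwards. The sketch below is an assumption-laden illustration under the versions listed above (the helper name `prepare_for_xpu` is ours, not from the tutorial):

```python
def prepare_for_xpu(model, optimizer):
    """Move model/optimizer to the Intel GPU and apply IPEX optimizations when available.

    Returns the inputs unchanged when IPEX (or an XPU device) is absent,
    so the same training code runs on CPU-only machines.
    """
    try:
        import torch
        import intel_extension_for_pytorch as ipex
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            model = model.to("xpu")
            # ipex.optimize applies XPU-specific fusions and memory-layout optimizations
            model, optimizer = ipex.optimize(model, optimizer=optimizer)
    except ImportError:
        pass
    return model, optimizer
```

The training loop itself is unchanged apart from moving input batches to the same device as the model.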


<br/>

### 1. Split terminal into 3 (1 terminal for the director, 1 for the envoy, and 1 for the experiment)

<br/>

### 2. Do the following in each terminal:
- Activate the virtual environment from step 0:

```sh
source venv/bin/activate
```
- If you are in a network environment with a proxy, ensure proxy environment variables are set in each of your terminals.
- Navigate to the tutorial:

```sh
cd openfl/openfl-tutorials/interactive_api/PyTorch_TinyImageNet
```

<br/>

### 3. In the first terminal, run the director:

```sh
cd director
./start_director.sh
```

<br/>

### 4. In the second terminal, install requirements and run the envoy:

```sh
cd envoy
pip install -r requirements.txt
./start_envoy.sh env_one envoy_config.yaml
```

Optional: Run a second envoy in an additional terminal:
- Ensure step 2 is complete for this terminal as well.
- Run the second envoy:
```sh
cd envoy
./start_envoy.sh env_two envoy_config.yaml
```

<br/>

### 5. Now that your director and envoy terminals are set up, run the Jupyter Notebook in your experiment terminal:

```sh
cd workspace
jupyter lab pytorch_tinyimagenet_XPU.ipynb
```
- A Jupyter Server URL will appear in your terminal. Open that link in your browser; once JupyterLab loads, the pytorch_tinyimagenet_XPU.ipynb notebook opens (if it does not, click it in the file browser).
- To run the experiment, select the "Restart Kernel and Run All Cells" button (the icon that looks like two triangles).
- You will see activity in your terminals as the experiment runs; when it finishes, the director terminal reports that the experiment completed successfully.

---

**Director configuration** (`director_config.yaml`):

```yaml
settings:
  listen_host: localhost
  listen_port: 50051
  sample_shape: ['64', '64', '3']
  target_shape: ['64', '64']
```
**Director startup script, no-TLS** (`start_director.sh`):

```sh
#!/bin/bash
set -e

fx director start --disable-tls -c director_config.yaml
```
**Director startup script, TLS variant**:

```sh
#!/bin/bash
set -e
FQDN=$1
fx director start -c director_config.yaml -rc cert/root_ca.crt -pk cert/"${FQDN}".key -oc cert/"${FQDN}".crt
```
**Envoy configuration** (`envoy_config.yaml`):

```yaml
params:
  cuda_devices: []

optional_plugin_components: {}

shard_descriptor:
  template: tinyimagenet_shard_descriptor.TinyImageNetShardDescriptor
  params:
    data_folder: tinyimagenet_data
    rank_worldsize: 1,1
```
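The `rank_worldsize` pair controls how the shard descriptor splits the sorted image list across envoys: rank r of w keeps every w-th path starting at index r − 1. A small standalone illustration (the file names are made up):

```python
# Hypothetical sorted file list; the real descriptor globs *.JPEG paths.
paths = sorted(f"img_{i}.JPEG" for i in range(6))

rank, worldsize = 1, 2          # first envoy of two
shard = paths[rank - 1::worldsize]
print(shard)  # ['img_0.JPEG', 'img_2.JPEG', 'img_4.JPEG']
```

With `rank_worldsize: 2,2` the second envoy would get the complementary half, so the two shards partition the dataset without overlap.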
**Envoy requirements** (`requirements.txt`):

```
Pillow==10.0.1
```
**Envoy startup script, no-TLS** (`start_envoy.sh`):

```sh
#!/bin/bash
set -e

fx envoy start -n env_one --disable-tls --envoy-config-path envoy_config.yaml -dh localhost -dp 50051
```
**Envoy startup script, TLS variant**:

```sh
#!/bin/bash
set -e
ENVOY_NAME=$1
DIRECTOR_FQDN=$2

fx envoy start -n "$ENVOY_NAME" --envoy-config-path envoy_config.yaml -dh "$DIRECTOR_FQDN" -dp 50051 -rc cert/root_ca.crt -pk cert/"$ENVOY_NAME".key -oc cert/"$ENVOY_NAME".crt
```
**Shard descriptor** (`tinyimagenet_shard_descriptor.py`):

```python
# Copyright (C) 2020-2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

"""TinyImageNet Shard Descriptor."""

import glob
import logging
import os
import shutil
from pathlib import Path
from typing import Tuple

from PIL import Image

from openfl.interface.interactive_api.shard_descriptor import ShardDataset
from openfl.interface.interactive_api.shard_descriptor import ShardDescriptor

logger = logging.getLogger(__name__)


class TinyImageNetDataset(ShardDataset):
    """TinyImageNet shard dataset class."""

    NUM_IMAGES_PER_CLASS = 500

    def __init__(self, data_folder: Path, data_type='train', rank=1, worldsize=1):
        """Initialize TinyImageNetDataset."""
        self.data_type = data_type
        self._common_data_folder = data_folder
        self._data_folder = os.path.join(data_folder, data_type)
        self.labels = {}  # fname - label number mapping
        # Shard the sorted file list: rank r of w keeps every w-th image
        # starting at index r - 1.
        self.image_paths = sorted(
            glob.iglob(
                os.path.join(self._data_folder, '**', '*.JPEG'),
                recursive=True
            )
        )[rank - 1::worldsize]
        wnids_path = os.path.join(self._common_data_folder, 'wnids.txt')
        with open(wnids_path, 'r', encoding='utf-8') as fp:
            self.label_texts = sorted([text.strip() for text in fp.readlines()])
        self.label_text_to_number = {text: i for i, text in enumerate(self.label_texts)}
        self.fill_labels()

    def __len__(self) -> int:
        """Return the length of the shard dataset."""
        return len(self.image_paths)

    def __getitem__(self, index: int) -> Tuple[Image.Image, int]:
        """Return an item by the index."""
        file_path = self.image_paths[index]
        label = self.labels[os.path.basename(file_path)]
        return self.read_image(file_path), label

    def read_image(self, path: Path) -> Image.Image:
        """Read the image."""
        img = Image.open(path)
        return img

    def fill_labels(self) -> None:
        """Fill the fname -> label mapping for the train or val split."""
        if self.data_type == 'train':
            for label_text, i in self.label_text_to_number.items():
                for cnt in range(self.NUM_IMAGES_PER_CLASS):
                    self.labels[f'{label_text}_{cnt}.JPEG'] = i
        elif self.data_type == 'val':
            val_annotations_path = os.path.join(self._data_folder, 'val_annotations.txt')
            with open(val_annotations_path, 'r', encoding='utf-8') as fp:
                for line in fp.readlines():
                    terms = line.split('\t')
                    file_name, label_text = terms[0], terms[1]
                    self.labels[file_name] = self.label_text_to_number[label_text]


class TinyImageNetShardDescriptor(ShardDescriptor):
    """Shard descriptor class."""

    def __init__(
            self,
            data_folder: str = 'data',
            rank_worldsize: str = '1,1',
            **kwargs
    ):
        """Initialize TinyImageNetShardDescriptor."""
        self.common_data_folder = Path.cwd() / data_folder
        self.data_folder = Path.cwd() / data_folder / 'tiny-imagenet-200'
        self.download_data()
        self.rank, self.worldsize = tuple(int(num) for num in rank_worldsize.split(','))

    def download_data(self):
        """Download the prepared shard dataset (skips the download if the archive exists)."""
        zip_file_path = self.common_data_folder / 'tiny-imagenet-200.zip'
        os.makedirs(self.common_data_folder, exist_ok=True)
        os.system(f'wget --no-clobber http://cs231n.stanford.edu/tiny-imagenet-200.zip'
                  f' -O {zip_file_path}')
        shutil.unpack_archive(str(zip_file_path), str(self.common_data_folder))

    def get_dataset(self, dataset_type):
        """Return a shard dataset by type."""
        return TinyImageNetDataset(
            data_folder=self.data_folder,
            data_type=dataset_type,
            rank=self.rank,
            worldsize=self.worldsize
        )

    @property
    def sample_shape(self):
        """Return the sample shape info."""
        return ['64', '64', '3']

    @property
    def target_shape(self):
        """Return the target shape info."""
        return ['64', '64']

    @property
    def dataset_description(self) -> str:
        """Return the shard dataset description."""
        return (f'TinyImageNetDataset dataset, shard number {self.rank}'
                f' out of {self.worldsize}')
```