This repository provides a PyTorch Lightning template for distributed training on Azure ML.
The template targets PyTorch Lightning 2.0 or higher.
PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. It keeps your code neatly organized and provides many useful features, such as the ability to run a model on CPU, GPU, multi-GPU clusters, and TPU.
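As a quick illustration of that flexibility, the same `Trainer` object can switch between hardware targets through its constructor arguments alone. This is a minimal sketch assuming the `lightning` package (2.0+) is installed; `model` and `dm` stand in for your own `LightningModule` and `LightningDataModule`:

```python
# Hypothetical excerpt from src/trainer.py; assumes `pip install lightning`.
import lightning as L

# "auto" lets Lightning pick CPU, GPU, or TPU based on what is available.
trainer = L.Trainer(accelerator="auto", devices="auto", max_epochs=10)

# For multi-node training, the same code only needs different arguments, e.g.:
# trainer = L.Trainer(accelerator="gpu", devices=4, num_nodes=2, strategy="ddp")

# trainer.fit(model, datamodule=dm)  # model/dm are your own modules
```

The training code itself does not change between these configurations, which is what makes the `src/` folder portable across compute targets.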
Azure also offers Azure OpenAI, which grew out of the partnership between Microsoft Azure and OpenAI. This collaboration provides a cloud-based platform designed to help developers and data scientists build and deploy AI models quickly. Through Azure OpenAI, users gain access to a comprehensive suite of AI tools and technologies for building intelligent applications that draw on natural language processing, computer vision, and deep learning.
To adhere to best practices, we store all Azure SDK-related code in separate Python files in the azure-jobs folder. Jobs can be seen as the connecting element between the compute cluster, the data asset components, and the PyTorch code. The native PyTorch code, in turn, lives in the src folder. As a result, if we decide to run our training on a different cloud provider, no modifications are required in the src folder. Here is an example of what the folder structure could look like:
```
.
├── README.md
├── azure-jobs/
│   ├── config/
│   │   └── workspace.json
│   └── job.py
└── src/
    ├── datamodule.py
    ├── model.py
    ├── transforms.py
    └── trainer.py
```
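The `workspace.json` file holds the connection details that the Azure ML client needs to find your workspace. A sketch of its standard shape, with placeholders instead of real identifiers:

```json
{
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>"
}
```

Keeping these values in a config file rather than hard-coding them in `job.py` makes it easy to point the same job script at a different workspace.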
I wrote this tutorial as a comprehensive guide to distributed training (multiple nodes, multiple GPUs per node) with PyTorch Lightning on Azure ML.
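To make the role of `azure-jobs/job.py` concrete, here is a hedged sketch of how a multi-node job could be configured and submitted with the Azure ML Python SDK v2 (`azure-ai-ml`). The compute name `gpu-cluster` and the environment string are hypothetical placeholders you would replace with your own resources:

```python
# azure-jobs/job.py — a configuration sketch, assuming `pip install azure-ai-ml azure-identity`.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

# Read subscription/resource group/workspace from config/workspace.json.
ml_client = MLClient.from_config(
    credential=DefaultAzureCredential(),
    path="config/workspace.json",
)

# Describe the job: the entire src/ folder is uploaded, and only the
# distribution settings (not the PyTorch code) encode the cluster layout.
job = command(
    code="../src",
    command="python trainer.py",
    environment="<your-curated-or-custom-environment>@latest",  # placeholder
    compute="gpu-cluster",  # hypothetical compute cluster name
    instance_count=2,  # number of nodes
    distribution={"type": "PyTorch", "process_count_per_instance": 4},  # GPUs per node
)

# Submit the job to the workspace.
# ml_client.create_or_update(job)
```

Because the distribution settings live here, switching between single-GPU debugging and a multi-node run is a change to `job.py` only; the code in `src/` stays untouched.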