
felipevillaarenas/Template-Distribute-Training-AzureML-PyTorchLightning

Template-Distribute-Training-AzureML-PyTorchLightning

Python · PyTorch · Lightning · Azure ML

📌  Introduction

This repository provides a PyTorch Lightning template for distributed training on Azure ML.

This template targets PyTorch Lightning 2.0 or higher.

Why PyTorch Lightning?

PyTorch Lightning is a lightweight PyTorch wrapper for high-performance AI research. It keeps your code neatly organized and provides many useful features, such as the ability to run a model on CPU, GPU, a multi-GPU cluster, or TPU.

Why Azure ML?

One reason is the partnership between Microsoft Azure and OpenAI. This collaboration brings a cloud-based platform that lets developers and data scientists build and deploy AI models quickly. Through Azure OpenAI, users get access to a broad set of AI tools and technologies for building applications that use natural language processing, computer vision, and deep learning.

Code Structure

To follow best practices, all Azure SDK-related code lives in separate Python files in the azure-jobs folder. A job is the connecting element between the compute cluster, the data asset components, and the PyTorch code. The native PyTorch code, by contrast, lives in the src folder. As a result, running the training on a different cloud provider requires no changes to the src folder. Here is an example of how the folder structure could look:

.
├── README.md
├── azure-jobs/
│   ├── config/
│   │   └── workspace.json
│   └── job.py
└── src/
    ├── datamodule.py
    ├── model.py
    ├── transforms.py
    └── trainer.py
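
To illustrate how azure-jobs/job.py could connect the compute cluster to the code in src, here is a minimal sketch using the Azure ML Python SDK v2 (`azure-ai-ml`). It is not the repository's actual job.py. The key names read from workspace.json, the `--num_nodes` and `--devices` flags passed to trainer.py, and the compute and environment names are all assumptions made for illustration. The SDK imports are deferred to `submit` so that the job description itself stays plain data.

```python
import json
from pathlib import Path


def load_workspace(path):
    """Read the workspace coordinates from a JSON file.

    Assumption: the file holds the keys subscription_id, resource_group
    and workspace_name (the layout of a standard Azure ML config file).
    """
    cfg = json.loads(Path(path).read_text())
    return cfg["subscription_id"], cfg["resource_group"], cfg["workspace_name"]


def job_spec(num_nodes, gpus_per_node, compute, environment):
    """Build keyword arguments for azure.ai.ml.command().

    The CLI flags given to trainer.py are hypothetical; adapt them to
    whatever arguments your training script actually parses.
    """
    return {
        "code": "./src",  # assumes the job is submitted from the repo root
        "command": (
            f"python trainer.py --num_nodes {num_nodes} --devices {gpus_per_node}"
        ),
        "compute": compute,
        "environment": environment,
        "instance_count": num_nodes,  # one instance per node
        "distribution": {
            "type": "PyTorch",
            "process_count_per_instance": gpus_per_node,  # one process per GPU
        },
    }


def submit(workspace_path, spec):
    """Submit the job. Requires azure-ai-ml and azure-identity."""
    from azure.ai.ml import MLClient, command
    from azure.identity import DefaultAzureCredential

    subscription_id, resource_group, workspace_name = load_workspace(workspace_path)
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace_name
    )
    return ml_client.jobs.create_or_update(command(**spec))
```

A call such as `submit("azure-jobs/config/workspace.json", job_spec(2, 4, "my-gpu-cluster", "my-env@latest"))` would request two nodes with four processes each, where the cluster and environment names are placeholders for resources that already exist in your workspace. Because src only receives command-line arguments, it stays free of Azure-specific imports.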

Zero to Hero: A Comprehensive Tutorial for Distributed Training with PyTorch Lightning on Azure ML

I wrote this tutorial as a comprehensive guide to distributed training (multiple nodes and multiple GPUs per node) with PyTorch Lightning on Azure ML.
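
As a rough illustration of what "multiple nodes and multiple GPUs per node" means on the Lightning side, the sketch below builds the arguments for a Lightning 2.x `Trainer`. It is an assumption about how src/trainer.py might be configured, not the tutorial's code; the helper names are hypothetical, and the `lightning` package is imported only inside `build_trainer`.

```python
def trainer_kwargs(num_nodes, gpus_per_node):
    """Return Trainer arguments for a given cluster shape.

    With zero GPUs the sketch falls back to a single CPU process,
    which is handy for local smoke tests.
    """
    if gpus_per_node == 0:
        return {"accelerator": "cpu", "devices": 1, "num_nodes": 1, "strategy": "auto"}
    return {
        "accelerator": "gpu",
        "devices": gpus_per_node,  # GPUs used on each node
        "num_nodes": num_nodes,    # number of machines in the job
        "strategy": "ddp",         # DistributedDataParallel across all processes
    }


def world_size(kwargs):
    """Total number of training processes: nodes times devices per node."""
    return kwargs["num_nodes"] * kwargs["devices"]


def build_trainer(num_nodes, gpus_per_node, max_epochs=1):
    """Create the Trainer. Requires the lightning package (2.0 or higher)."""
    import lightning.pytorch as pl

    return pl.Trainer(max_epochs=max_epochs, **trainer_kwargs(num_nodes, gpus_per_node))
```

For example, `trainer_kwargs(2, 4)` describes two nodes with four GPUs each, so `world_size` gives eight processes in total. The node and GPU counts passed here should match the instance count and per-instance process count requested from Azure ML.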
