MLBatch

This repository describes the setup and use of the MLBatch queuing and quota management system on OpenShift clusters. MLBatch leverages Kueue, the Kubeflow Training Operator, KubeRay, and the CodeFlare Operator from Red Hat OpenShift AI. It also enables AppWrappers, adds Coscheduler, and includes a number of configuration steps that help these components work in harmony and support large workloads on large clusters.

MLBatch handles the queuing and dispatching of batch workloads on OpenShift clusters. It enforces team quotas at the namespace level. It automates the borrowing and reclamation of unused quotas across teams. Teams can use priorities within their namespaces without impact on other teams. Using AppWrappers to submit workloads activates a number of fault detection and recovery capabilities, including automatically detecting failed pods and automatically retrying failed workloads. Coscheduler supports gang scheduling and minimizes fragmentation by preferentially packing jobs requiring less than a full node's worth of GPUs together.

Cluster Setup

To learn how to set up MLBatch on a cluster and onboard teams, see SETUP.md.

Quick Start

To learn how to use MLBatch to run workloads, see USAGE.md.

PyTorchJobs via the MLBatch Helm Chart

Properly configuring a distributed PyTorchJob to make effective use of the MLBatch system and hardware accelerators (GPUs, RoCE GDR) can be tedious. To automate this process, we provide a Helm chart that captures best practices and common configuration options. Using this Helm chart helps eliminate common mistakes. Please see pytorchjob-generator for detailed usage instructions.
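As a rough sketch of that workflow, assuming the chart is rendered with `helm template` and the result is submitted with `oc create` (both standard Helm and OpenShift CLI commands), the function below only assembles and prints the command line so it can be reviewed before anything touches a cluster. The function name `render_cmd`, the default chart location `./tools/pytorchjob-generator/chart`, and the file name `settings.yaml` are placeholders chosen for illustration and are not confirmed by this README; the pytorchjob-generator documentation remains the authoritative reference.

```shell
#!/bin/sh
# Sketch only: print (do not run) a render-and-submit pipeline.
# The default settings file and chart path are hypothetical placeholders.
render_cmd() {
    settings="${1:-settings.yaml}"
    chart="${2:-./tools/pytorchjob-generator/chart}"
    # helm template renders the chart locally; oc create -f - reads the
    # rendered manifests from standard input.
    printf 'helm template -f %s %s | oc create -f -\n' "$settings" "$chart"
}

render_cmd my-job.yaml
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without cluster access; piping the printed line to `sh` would execute it.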

Development Setup

If you will be contributing to the development of the MLBatch project, you must set up pre-commit hooks for your local clone of the repository. Do the following once, immediately after cloning this repo:

helm plugin install https://github.com/helm-unittest/helm-unittest.git
pre-commit install
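
Both commands assume that `helm` and `pre-commit` are already installed. A minimal sketch of a guard that could run first is shown here; the function name `missing_tools` is made up for illustration and is not part of the repository.

```shell
#!/bin/sh
# Print each required tool that is not found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools helm pre-commit)"
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on a single line.
    echo "Install these before continuing:" $missing
else
    echo "helm and pre-commit found"
fi
```

Once the hooks are installed, `pre-commit run --all-files` runs every configured hook against the whole working tree, which is a quick way to confirm the setup works.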

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
