PyTorch Connector Testing

This repo contains subdirectories which test various PyTorch connectors for loading data from OCI object storage.

Connectors that pull data from object storage generally recommend converting the dataset into a format optimized for cloud storage, typically larger shards that are more efficient to load. Where a connector provides such a conversion, the tests convert the data to the optimized format before running, so that each test follows that connector's best practices.
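
To illustrate why the shard-based layout helps, here is a minimal sketch of packing many small serialized records into a few larger shards. It does not reproduce any connector's actual on-disk format or optimizer; `pack_into_shards` and `max_shard_bytes` are hypothetical names used only for this example.

```python
from typing import Iterable, Iterator, List


def pack_into_shards(records: Iterable[bytes], max_shard_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized records into shards of at most max_shard_bytes each.

    A single record larger than the limit is placed in a shard of its own.
    """
    shard: List[bytes] = []
    shard_size = 0
    for record in records:
        # Close the current shard if adding this record would exceed the limit.
        if shard and shard_size + len(record) > max_shard_bytes:
            yield shard
            shard, shard_size = [], 0
        shard.append(record)
        shard_size += len(record)
    if shard:
        # Emit the final, possibly partially filled shard.
        yield shard


# Four records become three shards; each shard would be one object in storage.
records = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 200]
shards = list(pack_into_shards(records, max_shard_bytes=100))
print([sum(len(r) for r in s) for s in shards])
```

Fewer, larger objects mean fewer requests per epoch, which is the general motivation behind the conversion step; the actual shard size and format should come from each connector's own tooling.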

For a given test and dataset, transformers.Trainer() is used with LoRA (via PEFT) for fine-tuning. The Trainer uses a custom callback to save checkpoints, writing them directly to OCI object storage. The general testing flow is as follows:

1. Optimize the dataset using the library-specific optimizer in OCI object storage.
2. Stream the dataset from OCI object storage during training.
3. Write the model checkpoints using a custom callback that uses the connector's writing functionality (if provided).
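
The checkpoint step could be sketched roughly as follows. This is an illustration under stated assumptions, not the repo's actual callback: it uploads the checkpoint files after the Trainer has saved them locally rather than streaming them directly to object storage, and `upload_file` is a hypothetical callable standing in for whatever write function a given connector provides. `plan_uploads` and `ObjectStorageCheckpointCallback` are likewise names invented for this example. The only Trainer behavior relied on is the `on_save` callback event and the `checkpoint-<global_step>` directory naming under `output_dir`.

```python
import os
import tempfile
from types import SimpleNamespace
from typing import Callable, List, Tuple

try:
    from transformers import TrainerCallback
except ImportError:  # fallback so the sketch can be run without transformers installed
    TrainerCallback = object


def plan_uploads(checkpoint_dir: str, prefix: str) -> List[Tuple[str, str]]:
    """Return (local_path, object_name) pairs for every file under checkpoint_dir."""
    parent = os.path.dirname(os.path.normpath(checkpoint_dir))
    pairs = []
    for root, _dirs, files in os.walk(checkpoint_dir):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # Keep the "checkpoint-<step>/..." part of the path in the object name.
            relative = os.path.relpath(local_path, parent).replace(os.sep, "/")
            pairs.append((local_path, f"{prefix.rstrip('/')}/{relative}"))
    return pairs


class ObjectStorageCheckpointCallback(TrainerCallback):
    """Upload each saved checkpoint through a connector-provided writer (assumed)."""

    def __init__(self, upload_file: Callable[[str, str], None], prefix: str):
        self.upload_file = upload_file  # hypothetical: (local_path, object_name) -> None
        self.prefix = prefix            # non-empty object-name prefix for this run

    def on_save(self, args, state, control, **kwargs):
        # Only the main process uploads, to avoid duplicate writes in multi-GPU runs.
        if not getattr(state, "is_world_process_zero", True):
            return control
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        for local_path, object_name in plan_uploads(checkpoint_dir, self.prefix):
            self.upload_file(local_path, object_name)
        return control


# Usage with a fake checkpoint directory and a recording uploader.
uploaded = []
with tempfile.TemporaryDirectory() as output_dir:
    ckpt = os.path.join(output_dir, "checkpoint-10")
    os.makedirs(ckpt)
    with open(os.path.join(ckpt, "adapter_model.safetensors"), "wb") as f:
        f.write(b"weights")  # stand-in for real adapter weights
    callback = ObjectStorageCheckpointCallback(
        upload_file=lambda local, name: uploaded.append(name),
        prefix="runs/demo",
    )
    callback.on_save(
        SimpleNamespace(output_dir=output_dir),
        SimpleNamespace(global_step=10, is_world_process_zero=True),
        control=None,
    )
print(uploaded)
```

In a real run the callback would be passed to the Trainer through its `callbacks` argument, with `upload_file` replaced by the connector's own writer where one exists.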

Evaluations

Both technical and experiential evaluations will be performed on each connector. That way, if connectors have similar performance characteristics, the experiential evaluations can guide the choice between them.

Experiential Evaluation Criteria

The following experiential evaluations will be made with respect to each connector, as they are important for adoption:

Ease of use: How much work, study, and customization is needed to get the connector working?
Flexibility: Can it handle a variety of data types?
Integration: How well does it integrate with existing and commonly used tooling?
Popularity: Is it easy to find information and example usage? Does it have many GitHub stars?
Documentation: How good is the connector's documentation?

Technical Evaluation Criteria

The following technical evaluations will be made with respect to each connector as performance measurements:

Metric	Description
Total Time (s)	End-to-end time of fine-tuning, in seconds
GPU % Utilization	Percent of GPU compute utilized
CPU % Utilization	Percent of CPU compute utilized
GPU % Memory Utilization	Percent of GPU memory utilized
CPU % Memory Utilization	Percent of CPU memory utilized
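
One way these measurements could be collected is sketched below: a background thread samples utilization at a fixed interval while training runs, and the samples are reduced to a mean and a peak per metric. This is an assumption about methodology, not the repo's measurement code. `UtilizationRecorder`, `summarize`, and the `sample` callable are hypothetical; in practice `sample` would read real values from a system or GPU monitoring library, whereas the demo uses fixed numbers.

```python
import statistics
import threading
import time
from typing import Callable, Dict, List


def summarize(samples: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Reduce per-interval samples to the mean and peak of each metric."""
    summary = {}
    for key in samples[0]:
        values = [s[key] for s in samples]
        summary[key] = {"mean": statistics.fmean(values), "peak": max(values)}
    return summary


class UtilizationRecorder:
    """Context manager that times a block and samples utilization while it runs."""

    def __init__(self, sample: Callable[[], Dict[str, float]], interval_s: float = 1.0):
        self._sample = sample          # hypothetical: returns {"metric_name": percent}
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self.samples: List[Dict[str, float]] = []
        self.total_time_s = 0.0

    def _loop(self) -> None:
        while True:
            self.samples.append(self._sample())
            # wait() returns True once stop is set, which ends the loop.
            if self._stop.wait(self._interval_s):
                break

    def __enter__(self) -> "UtilizationRecorder":
        self._start = time.perf_counter()
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        self._stop.set()
        self._thread.join()
        self.total_time_s = time.perf_counter() - self._start


def fake_sample() -> Dict[str, float]:
    # Fixed placeholder values; a real sampler would query the hardware.
    return {"gpu_util_pct": 50.0, "cpu_util_pct": 25.0}


with UtilizationRecorder(fake_sample, interval_s=0.01) as recorder:
    time.sleep(0.05)  # stand-in for trainer.train()

summary = summarize(recorder.samples)
print(round(recorder.total_time_s, 2), summary)
```

Reporting both mean and peak is a design choice made for this sketch: the mean reflects how well the data loader keeps the GPU fed over the whole run, while the peak memory figures indicate headroom.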
