
# How to reduce costs and improve performance of your Machine Learning (ML) workloads?

In this repo you'll learn how to use AWS Trainium and AWS Inferentia with Amazon SageMaker and Hugging Face Optimum Neuron to optimize your ML workloads. You'll find workshops, tutorials, blog post content, and more that you can use to learn from and to inspire your own solutions.

The content here focuses on particular use cases. If you're looking for standalone model samples for inference and training, please check this other repo: https://github.com/aws-neuron/aws-neuron-samples.
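To give a concrete sense of what Optimum Neuron looks like in practice, here is a minimal sketch of compiling and running a text classifier on Inferentia. The model id, input shapes, and example sentence are illustrative assumptions, not taken from this repo's workshops:

```python
# Minimal sketch: compiling a Hugging Face model for Inferentia with Optimum Neuron.
# Assumptions: optimum-neuron is installed on a Neuron-enabled instance (e.g. inf2);
# the model id and the static input shapes below are illustrative choices.
from optimum.neuron import NeuronModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # hypothetical model

# export=True runs the Neuron compiler; Neuron requires static input shapes,
# which are passed here as batch_size / sequence_length.
model = NeuronModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=128,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pad to the compiled sequence length so inputs match the static shape.
inputs = tokenizer(
    "This repo made Inferentia easy to try out!",
    return_tensors="pt",
    padding="max_length",
    max_length=128,
    truncation=True,
)
print(model(**inputs).logits)
```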

## Workshops

| Title | Description |
| --- | --- |
| Fine-tune and deploy an LLM from Hugging Face on AWS Trainium and AWS Inferentia | Learn how to create a spam classifier that can be easily integrated into your own application |
| Adapting LLMs for domain-aware applications with AWS Trainium post-training | Learn how to adapt a pre-trained model to your own business needs and add a conversational interface your customers can interact with |

These workshops are supported by AWS Workshop Studio.

## Tutorials

| Accelerator | Description |
| --- | --- |
| inf1 | Extract embeddings from raw text |
| inf1 | Track objects in streaming video using computer vision (CV) |
| inf1 | Create a closed-question Q&A model |
| inf2 | Generate images using Stable Diffusion (SD) |
| inf1 | Answer questions given a context |
| trn1 | Fine-tune an LLM using distributed training (see the sketch after this table) |
| inf2 | Deploy an LLM to Hugging Face Text Generation Inference (TGI) |
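The trn1 tutorial above fine-tunes across NeuronCores with distributed training. As a rough orientation, here is a minimal sketch of what a Trainium fine-tuning script can look like with Optimum Neuron's drop-in Trainer classes; the base model, dataset, and hyperparameters are illustrative assumptions, not the tutorial's actual code:

```python
# train.py -- minimal sketch of fine-tuning on Trainium with optimum-neuron.
# Assumptions: a trn1 instance with the Neuron SDK installed; the model, dataset,
# and hyperparameters below are illustrative, not taken from the tutorial itself.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "bert-base-uncased"  # hypothetical base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

# Tiny slice of a public dataset, just to keep the sketch self-contained.
ds = load_dataset("imdb", split="train[:1%]")
ds = ds.map(
    lambda batch: tokenizer(batch["text"], padding="max_length",
                            truncation=True, max_length=128),
    batched=True,
)

args = NeuronTrainingArguments(
    output_dir="./out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    bf16=True,  # bf16 is the usual training precision on Trainium
)

# NeuronTrainer mirrors transformers.Trainer but handles Neuron compilation
# and distributed training across NeuronCores.
NeuronTrainer(model=model, args=args, train_dataset=ds).train()
```

Launched with `torchrun --nproc_per_node=2 train.py` on a trn1.2xlarge, a script like this runs data-parallel across the instance's two NeuronCores.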

## Blog posts

| Description |
| --- |
| Llama3-8B deployment on AWS Inferentia 2 with Amazon EKS and vLLM |
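For a feel of the serving side of that post, here is a minimal sketch of offline inference with vLLM's Neuron backend. The model id, parallelism degree, and shape limits are illustrative assumptions, not the blog post's exact setup, and a vLLM build with Neuron support is required:

```python
# Minimal sketch: running an LLM on Inferentia 2 through vLLM's Neuron backend.
# Assumptions: vLLM built with Neuron support on an inf2 instance; the model id
# and the limits below are illustrative choices.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is", "AWS Inferentia is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # hypothetical model choice
    max_num_seqs=8,
    max_model_len=128,
    block_size=128,
    device="neuron",
    tensor_parallel_size=2,  # shard across two NeuronCores
)

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```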

## Contributing

If you have questions, comments, or suggestions, please feel free to open an issue in this repo.

Also, please refer to the CONTRIBUTING document for further details on contributing to this repository.