Modify the names of the baselines and the catalogue. (#115)
* Add files via upload

* Add files via upload

* Add files via upload

* Update README.md

* Rename figure4overview.pdf to overview.pdf

* Update README.md

* Add files via upload

All images in the paper; the format is PNG.

* Update README.md

* Update README.md

* Update README.md

* Delete assets/strategy_compare.pdf

* Delete assets/GPU_memory_usage.pdf

* Delete assets/Exp-Adaptive.pdf

* Delete assets/Multi-LORA.pdf

* Delete assets/Throuput_compare.pdf

* Delete assets/abnormal_and_normal.pdf

* Delete assets/data_distribution.pdf

* Delete assets/early_stop_and_original.pdf

* Delete assets/early_stop_example.pdf

* Delete assets/gpu-memory-utilization.pdf

* Delete assets/overview.pdf

* Delete assets/pad.pdf

* Delete assets/Adaptive_scheduling.png

* Delete assets/pad_example.png

* Delete assets/strategy_compare.png

* Delete assets/minpad.png

* Delete assets/join-accuracy-and-loss.png

* Delete assets/early_stop_example.png

* Delete assets/early_stop_and_original.png

* Delete assets/different_sequence_length.png

* Delete assets/data_distribution.png

* Delete assets/abnormal_and_normal.png

* Delete assets/LoRA_and_MultiLoRA.png

* Delete assets/Exp-Mem.png

* Delete assets/gpu-memory-utilization.png

* Update README.md

* Update README.md

* Modify README: add supported models table and example

* Modify the names of the baselines (e.g., SYNC -> Alpaca-Parallel), modify the catalogue, and add more explanation.
Trilarflagz authored Dec 6, 2023
1 parent 3c6e9db commit 8fd0283
Showing 1 changed file with 28 additions and 11 deletions.
39 changes: 28 additions & 11 deletions README.md
@@ -14,6 +14,7 @@ ASPEN (a.k.a Multi-Lora Fine-Tune) is an open-source framework for fine-tuning L
## Contents

- [Updates](#updates)
- [Supported Models](#Models)
- [Overview](#overview)
- [Getting Started](#Quickstart)
- [Installation](#Installation)
@@ -26,6 +27,21 @@ ASPEN (a.k.a Multi-Lora Fine-Tune) is an open-source framework for fine-tuning L
- Support multiple LLaMA fine-tuning
- On the way, Baichuan

## Models

| | Model | Model size |
|---------------------------------|------------------------------------------------|-----------------|
| <input type="checkbox" checked> | [ChatGLM](https://github.com/THUDM/ChatGLM-6B) | 6B |
| <input type="checkbox" checked> | [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B) | 6B/12B |
| <input type="checkbox"> | [ChatGLM3](https://github.com/THUDM/ChatGLM3) | 6B | |
| <input type="checkbox" checked> | [LLaMA](https://github.com/facebookresearch/llama) | 7B//13B/33B/65B |
| <input type="checkbox" checked> | [LLaMA-2](https://huggingface.co/meta-llama) | 7B/13B/70B |
| <input type="checkbox"> | [Baichuan](https://github.com/baichuan-inc/Baichuan-13B) | 7B/13B |
| <input type="checkbox"> | [Baichuan2](https://github.com/baichuan-inc/Baichuan2) | 7B/13B |

> **Example:** Use our system to fine-tune LLaMA-2 with fewer resources:
> https://www.kaggle.com/code/rraydata/multi-lora-example/notebook

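To make the multi-LoRA idea concrete, here is a minimal sketch of attaching several LoRA adapters to one shared base model with HuggingFace PEFT. It illustrates the concept only and is not ASPEN's API; the model name, adapter names, and LoRA hyper-parameters are placeholders.

```python
# Conceptual sketch (not ASPEN's API): several LoRA adapters sharing one frozen base model.
# Assumes `transformers` and `peft` are installed; names and hyper-parameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# One LoRA config per fine-tuning task; all adapters reuse the same base weights.
task_a = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
task_b = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

model = get_peft_model(base, task_a, adapter_name="task_a")
model.add_adapter("task_b", task_b)

# Switch the active adapter per batch; only the small LoRA matrices are trained.
model.set_adapter("task_a")
```
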
## Overview

**ASPEN** is a high-throughput LLM fine-tuning framework based on LoRA and QLoRA, compatible with HuggingFace-Transformers LLaMA Models and ChatGLM Models.
@@ -48,45 +64,46 @@ ASPEN requires [PyTorch](https://pytorch.org/) and [NVIDIA CUDA](https://develop

Environment: NVIDIA RTX A6000 with Intel Xeon Silver 4314 on Ubuntu 22.04.3

Baseline: We utilized the widely adopted [Alpaca-LoRA](https://github.com/tloen/alpaca-lora) as a foundation. On a single GPU, we independently ran multiple Alpaca-LoRA processes in parallel (marked as *Baseline@SYNC*) and sequentially (marked as *Baseline@SEQ*), forming two baseline methods for the experiments.
Baseline: We utilized the widely adopted [Alpaca-LoRA](https://github.com/tloen/alpaca-lora) as a foundation. On a single GPU, we independently ran multiple Alpaca-LoRA processes in parallel (marked as *Baseline@Alpaca-Parallel*) and sequentially (marked as *Baseline@Alpaca-Seq*), forming two baseline methods for the experiments. We ran this test on an A100; the rest of the results are based on the same GPU configuration.

#### Training Latency and Throughput

Method|Latency|Throughput
:---:|:---:|:---:
Baseline@SEQ|10.51h|608.41 token/s
Baseline@SYNC|9.85h|649.30 token/s
Baseline@Alpaca-Seq|10.51h|608.41 token/s
Baseline@Alpaca-Parallel|9.85h|649.30 token/s
ASPEN|9.46h|674.58 token/s

We conducted four identical fine-tuning jobs with same dataset and same hyper-parameters, incorporating two baselines and ASPEN. During the experimental process, we collected the completion times for each task in the baseline methods and calculated the time taken by the slowest task as the *Training Latency*. As shown in Table, ASPEN exhibits lower *Training Latency* compared to both baseline methods. Specifically, ASPEN is 9.99% faster than *Baseline@SEQ* and 3.92% faster than *Baseline@SYNC*.
We conducted four identical fine-tuning jobs with the same dataset and the same hyper-parameters, incorporating the two baselines and ASPEN. During the experiments, we collected the completion time of each task in the baseline methods and took the time of the slowest task as the *Training Latency*. As shown in the table, ASPEN exhibits lower *Training Latency* than both baseline methods: it is 9.99% faster than *Baseline@Alpaca-Seq* and 3.92% faster than *Baseline@Alpaca-Parallel*.
<div align="center"><img src="./assets/throughput_compare.png" width="100%"></div>

#### Video Memory Usage
<div align="center"><img src="./assets/GPU_memory_usage.png" width="100%"></div>

We conducted several fine-tuning jobs with same dataset and `batch_size = {2,4, 6, 8}`, incorporating *Baseline@SYNC* and ASPEN.
We conducted several fine-tuning jobs with the same dataset and `batch_size = {2, 4, 6, 8}`, incorporating *Baseline@Alpaca-Parallel* and ASPEN.

*Baseline@SYNC* triggered OOM error after 3 parallel tasks when batch size = 8, while ASPEN can handle twice that amount.
*Baseline@Alpaca-Parallel* triggered an OOM error after 3 parallel tasks when the batch size was 8, while ASPEN can handle twice that number.

#### Batching Strategies

Method|Training Latency|Peak Memory Usage|Average GPU Utilization|Training Throughput
:---:|:---:|:---:|:---:|:---:
Baseline@SEQ|27.73h|10.68GB|79.39%|653.35 token/s
Baseline@Alpaca-Seq|27.73h|10.68GB|79.39%|653.35 token/s
ASPEN@M1|36.82h|23.82GB|96.52%|672.54 token/s
ASPEN@M2|39.14h|23.86GB|96.41%|671.28 token/s
ASPEN@M3|22.97h|23.85GB|95.22%|674.41 token/s

We conducted four fine-tuning jobs with different dataset but same hyper-parameters, incorporating *Baseline@SEQ* and ASPEN.
We conducted four fine-tuning jobs with different datasets but the same hyper-parameters, incorporating *Baseline@Alpaca-Seq* and ASPEN.

During the experiments, we collected the following metrics:
+ *Training Latency* = Job completion time
+ *Throughput* = Number of tokens processed in the model forward pass / training latency (see the sketch after this list)
+ *Memory Usage* = Peak video memory usage
+ *GPU Utilization* = Average GPU utilization
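
To make the *Throughput* definition concrete, the sketch below inverts it to estimate the total number of forward-pass tokens for the ASPEN@M3 row of the table (a derived estimate, not a measured number).

```python
# Invert Throughput = tokens / latency to estimate total forward-pass tokens for ASPEN@M3.
throughput_tok_per_s = 674.41   # token/s, from the batching-strategies table
latency_s = 22.97 * 3600        # 22.97 h converted to seconds
total_tokens = throughput_tok_per_s * latency_s
print(f"~{total_tokens / 1e6:.1f}M tokens")  # roughly 55.8M tokens
```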

All metrics are computed for each job. `M1, M2, M3` represent three batch strategies of ASPEN: *Optimal-Fit, Trivial, and Fast-Fit*. `BASELINE` denotes *Baseline@SEQ*.
All metrics are computed for each job. `M1, M2, M3` represent the three batching strategies of ASPEN: *Optimal-Fit, Trivial, and Fast-Fit*. `BASELINE` denotes *Baseline@Alpaca-Seq*.

The *Optimal-Fit* strategy performs best across all four metrics, while the other two strategies also outperform the baseline on every metric except training latency.
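
The three strategies are only named above, so as a rough intuition the sketch below shows a generic length-bucketing batcher: grouping sequences of similar length keeps the compute wasted on padding small, which is the effect an Optimal-Fit-style strategy targets. The function and parameter names are invented for illustration; this is not ASPEN's implementation.

```python
# Illustrative length-bucketing batcher (NOT ASPEN's Optimal-Fit implementation).
# Sorting by length before slicing into batches keeps per-batch padding small.
from typing import List

def bucket_by_length(seq_lengths: List[int], batch_size: int) -> List[List[int]]:
    """Group sequence indices into batches of similar length to reduce padding."""
    order = sorted(range(len(seq_lengths)), key=lambda i: seq_lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

lengths = [512, 48, 1024, 64, 300, 290, 60, 1000]
for batch in bucket_by_length(lengths, batch_size=2):
    print(batch, [lengths[i] for i in batch])
# Pairs like (48, 60) and (1000, 1024) waste far less padding than randomly mixed batches.
```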
### Use Cases:
@@ -145,7 +162,7 @@ Submit a pull request with a detailed explanation of your changes.
Please cite this repository if you use its code.
```bibtex
@misc{Multi-LoRA,
author = {Zhengmao, Ye\textsuperscript{*} and Dengchun, Li\textsuperscript{*} and Tingfeng, Lan and Yanbo, Liang and Yexi, Jiang and Jie, Zuo and Hui, Lu and Lei, Duan and Mingjie, Tang},
author = {Zhengmao, Ye\textsuperscript{*} and Dengchun, Li\textsuperscript{*} and Jingqi, Tian and Tingfeng, Lan and Yanbo, Liang and Yexi, Jiang and Jie, Zuo and Hui, Lu and Lei, Duan and Mingjie, Tang},
title = {ASPEN: Efficient LLM Model Fine-tune and Inference via Multi-Lora Optimization},
year = {2023},
publisher = {GitHub},
