Extending TGI benchmarking and documentation (#621)
* Initial Llama3-70b test
* Missing .env file. .gitignore strikes again!
* Adding script to run multiple batch sizes at once
* changed mode to +x on shell script
* fixing my poor bash syntax
* Renaming directory to test on Trainium
* adding trainium
* Trainium compose example added
* Readme changes
* More Readme changes
* Adding BS1 numbers
* misspelling in export_model.mdx
* misspelling in benchmark/text-generation-inference/README.md
  Co-authored-by: David Corvoysier <[email protected]>
* Removing redundant HF_BATCH_SIZE and HF_SEQUENCE_LENGTH settings from .env and docker compose.
* Trainium batch size 8 numbers added.

---------

Co-authored-by: David Corvoysier <[email protected]>
1 parent a3bb344, commit af0506f
Showing 15 changed files with 257 additions and 12 deletions.
7 changes: 7 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/.env
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
MODEL_ID='NousResearch/Meta-Llama-3-70B-Instruct' | ||
HF_AUTO_CAST_TYPE='fp16' | ||
MAX_BATCH_SIZE=4 | ||
MAX_INPUT_LENGTH=4000 | ||
MAX_TOTAL_TOKENS=4096 | ||
# MESSAGES_API_ENABLED='true' # Enable the messages API | ||
|
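The `.env` above sets `MAX_INPUT_LENGTH=4000` against `MAX_TOTAL_TOKENS=4096`. A minimal sketch of what that implies, assuming `MAX_TOTAL_TOKENS` caps prompt plus generated tokens (as the TGI launcher describes it): the gap between the two is the per-request generation budget. The helper names `parse_env` and `generation_budget` are hypothetical and not part of this commit.

```python
ENV_TEXT = """\
MODEL_ID='NousResearch/Meta-Llama-3-70B-Instruct'
HF_AUTO_CAST_TYPE='fp16'
MAX_BATCH_SIZE=4
MAX_INPUT_LENGTH=4000
MAX_TOTAL_TOKENS=4096
# MESSAGES_API_ENABLED='true' # Enable the messages API
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and full-line # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment, then surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        settings[key.strip()] = value
    return settings


def generation_budget(settings):
    """Tokens left for generation when the prompt uses the full input length."""
    max_input = int(settings["MAX_INPUT_LENGTH"])
    max_total = int(settings["MAX_TOTAL_TOKENS"])
    if max_input >= max_total:
        raise ValueError("MAX_INPUT_LENGTH must be smaller than MAX_TOTAL_TOKENS")
    return max_total - max_input


settings = parse_env(ENV_TEXT)
print(generation_budget(settings))  # 4096 - 4000 = 96
```

With these settings a request that fills the whole input window can generate at most 96 new tokens, which is worth keeping in mind when reading the throughput numbers in the CSV files.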
29 changes: 29 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/docker-compose.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
version: '3.7' | ||
|
||
services: | ||
tgi-1: | ||
image: neuronx-tgi:latest | ||
ports: | ||
- "8080:8080" | ||
environment: | ||
- PORT=8080 | ||
- MODEL_ID=${MODEL_ID} | ||
- HF_AUTO_CAST_TYPE=${HF_AUTO_CAST_TYPE} | ||
- HF_NUM_CORES=24 | ||
- MAX_BATCH_SIZE=${MAX_BATCH_SIZE} | ||
- MAX_INPUT_LENGTH=${MAX_INPUT_LENGTH} | ||
- MAX_TOTAL_TOKENS=${MAX_TOTAL_TOKENS} | ||
- MAX_CONCURRENT_REQUESTS=512 | ||
devices: | ||
- "/dev/neuron0" | ||
- "/dev/neuron1" | ||
- "/dev/neuron2" | ||
- "/dev/neuron3" | ||
- "/dev/neuron4" | ||
- "/dev/neuron5" | ||
- "/dev/neuron6" | ||
- "/dev/neuron7" | ||
- "/dev/neuron8" | ||
- "/dev/neuron9" | ||
- "/dev/neuron10" | ||
- "/dev/neuron11" |
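The compose file pairs `HF_NUM_CORES=24` with twelve `/dev/neuron*` entries, and the Trainium variant later in this commit pairs 32 cores with sixteen devices. A small sketch of that relationship, assuming two NeuronCores per device (consistent with both files, but stated here as an assumption); `neuron_devices` is a hypothetical helper, not something the commit ships.

```python
def neuron_devices(num_cores, cores_per_device=2):
    """Return the /dev/neuronN paths needed to expose num_cores NeuronCores.

    cores_per_device=2 is an assumption that matches both compose files
    in this commit (24 cores -> 12 devices, 32 cores -> 16 devices).
    """
    if num_cores <= 0 or num_cores % cores_per_device:
        raise ValueError("num_cores must be a positive multiple of cores_per_device")
    return [f"/dev/neuron{i}" for i in range(num_cores // cores_per_device)]


# Print the entries in the same form as the compose `devices:` list.
for path in neuron_devices(24):
    print(f'- "{path}"')
```

Generating the list this way is only a convenience for keeping `HF_NUM_CORES` and the `devices:` section in step when adapting the example to another instance size.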
11 changes: 11 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/tgi-results-batchsize-1.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms) | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,30.170455300639418,0.7694021150018671,31.60879417184807 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,30.942167505908383,3.5238446079965797,42.88674224324184 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,31.216016638279726,11.17000110349909,70.63270124966144 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,31.442002397963858,28.047803349007154,138.61752441316904 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,31.622091010175804,60.1780687940045,290.1370155727129 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,31.734201827193452,123.7196121570014,523.1909448482422 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,31.72544803588566,250.5079138929941,1010.6170931343223 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,31.805759572717598,512.6742304505024,1997.340511562319 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,31.776200117214845,1025.654853393993,3954.0575741908333 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,31.715118036351587,2034.146784478493,8002.648273082725 |
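In the batch-size-1 results above, throughput stays near 31 t/s while the P50 time-to-first-token grows with concurrency. The sketch below, using the TTFT values copied from those rows, computes the growth factor between successive concurrency levels. One plausible reading (an interpretation, not something the commit states) is that requests queue behind a server that is already saturated, so doubling the load roughly doubles the wait.

```python
# (concurrent requests, TTFT P50 in seconds), copied from the rows above.
TTFT_P50 = [
    (1, 0.7694021150018671),
    (2, 3.5238446079965797),
    (4, 11.17000110349909),
    (8, 28.047803349007154),
    (16, 60.1780687940045),
    (32, 123.7196121570014),
    (64, 250.5079138929941),
    (128, 512.6742304505024),
    (256, 1025.654853393993),
    (512, 2034.146784478493),
]


def doubling_ratios(points):
    """For each step, return (concurrency, TTFT / TTFT at the previous level)."""
    return [(c2, t2 / t1) for (_, t1), (c2, t2) in zip(points, points[1:])]


for concurrency, ratio in doubling_ratios(TTFT_P50):
    print(f"{concurrency:>3} concurrent: TTFT x{ratio:.2f} vs previous level")
```

From 32 concurrent requests upward the factor stays close to 2, which is what linear queueing delay would look like.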
11 changes: 11 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/tgi-results.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms) | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,18.818667211424472,1.3884793975012144,51.46871325828836 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,32.22257477833452,2.0121661404991755,56.734265583687296 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,50.19917175671667,5.205651430500438,66.04042245148653 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,52.13272738944358,9.568476632499369,97.32615035298838 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,53.59997031445967,26.087651531999654,191.19227161475598 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,56.08684244759754,61.25285707449984,310.16900484570965 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,57.40338464731561,129.3146581359997,560.2474255463762 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,58.39025853766574,267.3882590960002,1094.9986170264501 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,58.589480601098536,541.6153878579971,2147.5413489446523 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,58.69645477077839,1085.1772966810022,4231.7554182432905 |
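A quick way to read a results file like the one above is to find where throughput stops improving. The sketch below parses the CSV with the standard library and reports the lowest concurrency that reaches a chosen fraction of peak throughput. The 95% threshold and the name `saturation_point` are illustrative assumptions, not part of the benchmark tooling.

```python
import csv
import io

RESULTS_CSV = """\
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms)
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,18.818667211424472,1.3884793975012144,51.46871325828836
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,32.22257477833452,2.0121661404991755,56.734265583687296
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,50.19917175671667,5.205651430500438,66.04042245148653
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,52.13272738944358,9.568476632499369,97.32615035298838
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,53.59997031445967,26.087651531999654,191.19227161475598
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,56.08684244759754,61.25285707449984,310.16900484570965
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,57.40338464731561,129.3146581359997,560.2474255463762
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,58.39025853766574,267.3882590960002,1094.9986170264501
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,58.589480601098536,541.6153878579971,2147.5413489446523
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,58.69645477077839,1085.1772966810022,4231.7554182432905
"""


def load_throughput(csv_text):
    """Return [(concurrency, throughput)] sorted by concurrency."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(int(r["concurrent requests"]), float(r["throughput (t/s)"])) for r in reader]
    return sorted(rows)


def saturation_point(rows, fraction=0.95):
    """Lowest concurrency whose throughput reaches `fraction` of the peak."""
    peak = max(throughput for _, throughput in rows)
    for concurrency, throughput in rows:
        if throughput >= fraction * peak:
            return concurrency
    raise ValueError("no row reaches the requested fraction of peak")


rows = load_throughput(RESULTS_CSV)
print(saturation_point(rows))
```

For this file the 95% mark is first reached at 32 concurrent requests. Beyond that point, extra concurrency mostly adds time-to-first-token rather than throughput.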
8 changes: 8 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/.env
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
#MODEL_ID='NousResearch/Meta-Llama-3-70B-Instruct' | ||
MODEL_ID='/data/exportedmodel' | ||
HF_AUTO_CAST_TYPE='fp16' | ||
MAX_BATCH_SIZE=4 | ||
MAX_INPUT_LENGTH=4000 | ||
MAX_TOTAL_TOKENS=4096 | ||
# MESSAGES_API_ENABLED='true' # Enable the messages API | ||
|
36 changes: 36 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/docker-compose.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
version: '3.7' | ||
|
||
services: | ||
tgi-1: | ||
image: neuronx-tgi:latest | ||
ports: | ||
- "8080:8080" | ||
environment: | ||
- PORT=8080 | ||
- MODEL_ID=${MODEL_ID} | ||
- HF_AUTO_CAST_TYPE=${HF_AUTO_CAST_TYPE} | ||
- HF_NUM_CORES=32 | ||
- MAX_BATCH_SIZE=${MAX_BATCH_SIZE} | ||
- MAX_INPUT_LENGTH=${MAX_INPUT_LENGTH} | ||
- MAX_TOTAL_TOKENS=${MAX_TOTAL_TOKENS} | ||
- MAX_CONCURRENT_REQUESTS=512 | ||
volumes: | ||
- $PWD:/data | ||
devices: | ||
- "/dev/neuron0" | ||
- "/dev/neuron1" | ||
- "/dev/neuron2" | ||
- "/dev/neuron3" | ||
- "/dev/neuron4" | ||
- "/dev/neuron5" | ||
- "/dev/neuron6" | ||
- "/dev/neuron7" | ||
- "/dev/neuron8" | ||
- "/dev/neuron9" | ||
- "/dev/neuron10" | ||
- "/dev/neuron11" | ||
- "/dev/neuron12" | ||
- "/dev/neuron13" | ||
- "/dev/neuron14" | ||
- "/dev/neuron15" | ||
|
11 changes: 11 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/tgi-results-batchsize-1.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms) | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,38.29638310438374,0.5521726660008426,24.784959740501066 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,38.98036959617541,2.72243953349971,32.827924415254174 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,39.39299322930307,8.926065296996967,63.795771842799695 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,39.85480734427003,22.479033984491252,110.33245410384168 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,39.797703130119444,48.74777327400079,218.4971534548553 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,39.88112179496438,98.32968477499526,419.0164926030421 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,40.021570341867225,201.50347035600862,787.0418267487788 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,40.15190355766733,412.9219288924942,1608.1377339868322 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,40.10404829156176,831.7238280020028,3167.7755826448656 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,39.94606130182408,1654.066714687011,6348.469898092637 |
11 changes: 11 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/tgi-results-batchsize-8.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms) | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,17.8322790536497,0.9939256490033586,54.45429111182844 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,31.140113024869468,1.418605798491626,58.17940704286386 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,52.71447508703364,3.691673280511168,65.510341492747 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,85.23757246875635,7.40343523149204,79.86574747355823 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,83.41704442714865,12.134337133495137,119.80365178993138 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,86.31413401709217,33.19637775150477,221.51387761253872 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,91.54051788296289,78.17263232148252,378.5575452672668 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,93.59227409861985,163.85781266850245,709.4836254794548 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,94.49695504491365,332.89309809000406,1342.054465909721 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,94.76202310893393,671.8385370509932,2633.1926459323054 |
11 changes: 11 additions & 0 deletions
benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/tgi-results.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
model_id,concurrent requests,throughput (t/s),Time-to-first-token @ P50 (s),average latency (ms) | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,1,27.321283482983713,0.9897541589998582,34.53017190612728 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,2,47.14780790833105,1.4317841799993403,38.47682874008382 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,4,75.46880157534952,3.7293467640001836,45.219761063884626 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,8,76.656177664245,6.710071522500584,67.5562098563004 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,16,78.10745154737947,18.174910198499674,130.32796764867985 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,32,80.94695720514072,42.99618862100033,211.52529640942643 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,64,83.41961944293132,90.68870028399942,387.7336944140728 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,128,84.68410927601217,187.20342993849863,761.1909438667759 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,256,85.08930039980858,376.98190486400017,1484.3806421055476 | ||
huggingface/NousResearch/Meta-Llama-3-70B-Instruct,512,84.99711473871804,758.8232675055006,2947.3092666464 |
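The result files in this commit can be compared through their peak throughput. The sketch below copies the highest `throughput (t/s)` value from each CSV and computes the trn1.32xlarge to inf2.48xlarge ratio per batch size. Treating the unsuffixed `tgi-results.csv` files as batch size 4 is an assumption based on `MAX_BATCH_SIZE=4` in the accompanying `.env` files; the commit does not label them explicitly.

```python
# Peak throughput (t/s) per (instance, batch size), copied from the CSV files
# in this commit. Batch size 4 for tgi-results.csv is an assumption taken
# from MAX_BATCH_SIZE=4 in the .env files.
PEAK_THROUGHPUT = {
    ("inf2.48xlarge", 1): 31.805759572717598,
    ("inf2.48xlarge", 4): 58.69645477077839,
    ("trn1.32xlarge", 1): 40.15190355766733,
    ("trn1.32xlarge", 4): 85.08930039980858,
    ("trn1.32xlarge", 8): 94.76202310893393,
}


def speedup(peaks, batch_size, fast="trn1.32xlarge", slow="inf2.48xlarge"):
    """Ratio of peak throughput between two instance types at one batch size."""
    return peaks[(fast, batch_size)] / peaks[(slow, batch_size)]


for batch_size in (1, 4):
    print(f"batch size {batch_size}: trn1/inf2 = {speedup(PEAK_THROUGHPUT, batch_size):.2f}x")
```

Only batch sizes 1 and 4 have results for both instance types, so the batch-size-8 entry is listed for completeness and not compared.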
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
#!/bin/bash -v | ||
# This is made to be run on the Hugging Face DLAMI on an inferentia/trainium system | ||
|
||
# at the end of this script, run | ||
# python generate_csv.py | ||
|
||
# change the modelname on the next line. | ||
modelname=${1:-NousResearch/Llama-2-7b-chat-hf} | ||
echo on | ||
#set for your environment if not already set | ||
#export LLMPerf=/home/ubuntu/llmperf | ||
|
||
for concurrency in 1 2 4 8 16 32 64 128 256 512 | ||
do | ||
|
||
./benchmark.sh ${modelname} ${concurrency} | ||
|
||
|
||
done |
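For readers who prefer to drive the sweep from Python (for example to chain it with `generate_csv.py`), the loop above can be sketched as follows. This is a hypothetical equivalent, not part of the commit: it assumes `./benchmark.sh` takes the model name and the concurrency as its two positional arguments, exactly as the bash loop calls it.

```python
import subprocess

# Same concurrency levels and default model name as the bash script above.
CONCURRENCIES = (1, 2, 4, 8, 16, 32, 64, 128, 256, 512)
DEFAULT_MODEL = "NousResearch/Llama-2-7b-chat-hf"


def build_commands(modelname=DEFAULT_MODEL, concurrencies=CONCURRENCIES):
    """Return one argv list per benchmark run, in sweep order."""
    return [["./benchmark.sh", modelname, str(c)] for c in concurrencies]


def run_sweep(modelname=DEFAULT_MODEL, concurrencies=CONCURRENCIES):
    """Run each benchmark in turn and stop at the first failing run."""
    for argv in build_commands(modelname, concurrencies):
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)


# Dry run: show what would be executed without launching anything.
for argv in build_commands():
    print(" ".join(argv))
```

Calling `run_sweep("NousResearch/Meta-Llama-3-70B-Instruct")` would execute the runs. Unlike the bash loop, which continues after a failed run, `check=True` aborts the sweep on the first non-zero exit status; drop it to match the original behaviour.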