update perf number for rel-8.2
Signed-off-by: Vincent Huang <[email protected]>
ttyio authored and rajeevsrao committed Mar 29, 2022
1 parent 864502a commit 7e190be
Showing 1 changed file with 100 additions and 100 deletions.
200 changes: 100 additions & 100 deletions demo/BERT/README.md
@@ -434,78 +434,78 @@ Our results for BERT were obtained by running the `scripts/inference_benchmark.s
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 0.33 | 0.97 | 0.58 | 0.75 | 0.75 | 0.72 |
-| 128 | 2 | 0.78 | 0.79 | 0.63 | 0.84 | 1.07 | 0.84 |
-| 128 | 4 | 0.76 | 0.98 | 0.76 | 1.13 | 1.46 | 1.14 |
-| 128 | 8 | 1.08 | 1.08 | 0.98 | 1.66 | 1.81 | 1.66 |
-| 128 | 12 | 1.26 | 1.63 | 1.27 | 2.07 | 2.07 | 2.07 |
-| 128 | 16 | 1.47 | 1.48 | 1.47 | 2.48 | 2.49 | 2.48 |
-| 128 | 24 | 2.13 | 2.13 | 2.13 | 3.47 | 3.49 | 3.46 |
-| 128 | 32 | 2.54 | 2.83 | 2.54 | 4.37 | 4.40 | 4.34 |
-| 128 | 64 | 4.58 | 4.59 | 4.54 | 8.70 | 8.79 | 8.65 |
-| 128 | 128 | 9.04 | 9.06 | 8.97 | 17.05 | 17.07 | 16.90 |
-| 384 | 1 | 1.15 | 1.15 | 1.15 | 1.43 | 1.44 | 1.43 |
-| 384 | 2 | 1.37 | 1.37 | 1.37 | 1.84 | 2.21 | 1.84 |
-| 384 | 4 | 1.73 | 1.74 | 1.73 | 2.47 | 2.48 | 2.47 |
-| 384 | 8 | 2.51 | 2.51 | 2.51 | 3.77 | 3.80 | 3.76 |
-| 384 | 12 | 3.61 | 3.62 | 3.61 | 5.36 | 5.37 | 5.30 |
-| 384 | 16 | 4.39 | 4.40 | 4.38 | 7.32 | 7.32 | 7.24 |
-| 384 | 24 | 6.24 | 6.24 | 6.23 | 10.50 | 10.51 | 10.41 |
-| 384 | 32 | 8.42 | 8.50 | 8.42 | 14.32 | 14.44 | 14.27 |
-| 384 | 64 | 16.48 | 16.52 | 16.36 | 27.51 | 27.54 | 27.33 |
-| 384 | 128 | 31.71 | 31.78 | 31.58 | | | |
+| 128 | 1 | 0.72 | 0.72 | 0.59 | 0.66 | 0.81 | 0.65 |
+| 128 | 2 | 0.68 | 0.68 | 0.64 | 0.97 | 0.97 | 0.79 |
+| 128 | 4 | 0.99 | 0.99 | 0.79 | 1.02 | 1.29 | 1.02 |
+| 128 | 8 | 0.94 | 1.21 | 0.94 | 1.38 | 1.39 | 1.38 |
+| 128 | 12 | 1.22 | 1.23 | 1.22 | 1.91 | 1.92 | 1.91 |
+| 128 | 16 | 1.40 | 1.40 | 1.40 | 2.19 | 2.20 | 2.19 |
+| 128 | 24 | 1.93 | 1.94 | 1.92 | 3.37 | 3.38 | 3.34 |
+| 128 | 32 | 2.48 | 2.48 | 2.47 | 4.08 | 4.14 | 4.07 |
+| 128 | 64 | 4.31 | 4.31 | 4.27 | 8.08 | 8.09 | 8.00 |
+| 128 | 128 | 8.37 | 8.38 | 8.31 | 16.14 | 16.21 | 16.02 |
+| 384 | 1 | 1.15 | 1.47 | 1.15 | 1.30 | 1.65 | 1.31 |
+| 384 | 2 | 1.34 | 1.72 | 1.35 | 1.66 | 1.67 | 1.66 |
+| 384 | 4 | 1.69 | 1.70 | 1.69 | 2.27 | 2.28 | 2.27 |
+| 384 | 8 | 2.29 | 2.30 | 2.28 | 3.67 | 3.70 | 3.66 |
+| 384 | 12 | 3.46 | 3.46 | 3.45 | 5.06 | 5.08 | 5.01 |
+| 384 | 16 | 4.20 | 4.20 | 4.19 | 6.73 | 6.75 | 6.67 |
+| 384 | 24 | 5.94 | 5.95 | 5.94 | 9.86 | 9.87 | 9.75 |
+| 384 | 32 | 7.93 | 7.94 | 7.92 | 13.56 | 13.61 | 13.44 |
+| 384 | 64 | 15.48 | 15.49 | 15.39 | 26.09 | 26.26 | 25.94 |
+| 384 | 128 | 29.92 | 29.95 | 29.68 | 51.65 | 51.71 | 51.02 |

##### BERT Large

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.24 | 1.56 | 1.24 | 1.73 | 2.11 | 1.73 |
-| 128 | 2 | 1.49 | 1.49 | 1.49 | 2.20 | 2.20 | 2.20 |
-| 128 | 4 | 1.91 | 1.92 | 1.91 | 3.22 | 3.23 | 3.22 |
-| 128 | 8 | 2.94 | 2.94 | 2.93 | 4.84 | 4.84 | 4.83 |
-| 128 | 12 | 3.34 | 3.34 | 3.34 | 5.95 | 5.96 | 5.90 |
-| 128 | 16 | 4.63 | 4.64 | 4.62 | 7.98 | 7.99 | 7.90 |
-| 128 | 24 | 5.87 | 5.88 | 5.87 | 11.05 | 11.08 | 10.94 |
-| 128 | 32 | 7.99 | 7.99 | 7.98 | 14.74 | 14.77 | 14.59 |
-| 128 | 64 | 14.74 | 17.74 | 14.56 | 28.09 | 28.25 | 27.85 |
-| 128 | 128 | 28.32 | 23.38 | 28.03 | 54.38 | 54.40 | 54.12 |
-| 384 | 1 | 2.80 | 2.80 | 2.80 | 3.49 | 3.49 | 3.48 |
-| 384 | 2 | 3.12 | 3.13 | 3.12 | 4.71 | 4.72 | 4.71 |
-| 384 | 4 | 4.27 | 4.27 | 4.27 | 6.70 | 6.71 | 6.70 |
-| 384 | 8 | 7.66 | 7.67 | 7.66 | 12.41 | 12.53 | 12.37 |
-| 384 | 12 | 10.07 | 10.08 | 10.07 | 17.63 | 17.76 | 17.56 |
-| 384 | 16 | 13.34 | 13.34 | 13.33 | 23.40 | 23.46 | 23.19 |
-| 384 | 24 | 19.36 | 19.38 | 19.22 | 34.34 | 34.36 | 34.10 |
-| 384 | 32 | 25.56 | 25.60 | 25.56 | 44.94 | 44.98 | 44.78 |
-| 384 | 64 | 49.84 | 49.92 | 49.60 | 87.26 | 87.56 | 86.77 |
-| 384 | 128 | 97.66 | 97.78 | 97.06 | 170.85 | 171.00 | 170.08 |
+| 128 | 1 | 1.24 | 1.25 | 1.24 | 1.58 | 1.60 | 1.58 |
+| 128 | 2 | 1.51 | 1.52 | 1.51 | 2.00 | 2.02 | 2.00 |
+| 128 | 4 | 1.83 | 1.84 | 1.82 | 2.95 | 2.96 | 2.95 |
+| 128 | 8 | 2.69 | 2.70 | 2.68 | 4.44 | 4.45 | 4.43 |
+| 128 | 12 | 3.11 | 3.12 | 3.11 | 5.25 | 5.30 | 5.23 |
+| 128 | 16 | 4.05 | 4.06 | 4.05 | 7.65 | 7.72 | 7.63 |
+| 128 | 24 | 5.24 | 5.25 | 5.23 | 10.14 | 10.16 | 10.09 |
+| 128 | 32 | 7.01 | 7.07 | 7.01 | 13.89 | 13.89 | 13.77 |
+| 128 | 64 | 13.15 | 13.18 | 13.05 | 26.10 | 26.13 | 26.00 |
+| 128 | 128 | 25.29 | 25.32 | 25.21 | 51.69 | 51.77 | 51.38 |
+| 384 | 1 | 2.66 | 2.66 | 2.66 | 3.09 | 3.10 | 3.09 |
+| 384 | 2 | 3.03 | 3.05 | 3.03 | 4.14 | 4.15 | 4.14 |
+| 384 | 4 | 4.04 | 4.05 | 4.04 | 5.99 | 5.99 | 5.93 |
+| 384 | 8 | 7.13 | 7.14 | 7.13 | 11.60 | 11.62 | 11.47 |
+| 384 | 12 | 9.21 | 9.22 | 9.20 | 16.33 | 16.34 | 16.09 |
+| 384 | 16 | 12.37 | 12.39 | 12.36 | 22.14 | 22.22 | 21.98 |
+| 384 | 24 | 17.51 | 17.52 | 17.49 | 32.44 | 32.56 | 32.29 |
+| 384 | 32 | 23.38 | 23.40 | 23.14 | 43.12 | 43.23 | 42.73 |
+| 384 | 64 | 45.20 | 45.25 | 45.07 | 83.75 | 83.92 | 83.15 |
+| 384 | 128 | 88.18 | 88.26 | 88.01 | 163.61 | 164.08 | 162.62 |

##### Megatron Large with Sparsity

| Sequence Length | Batch Size | INT8 QAT Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.16 | 1.46 | 1.17 |
-| 128 | 2 | 1.44 | 1.45 | 1.44 |
-| 128 | 4 | 1.69 | 1.7 | 1.69 |
-| 128 | 8 | 2.34 | 2.34 | 2.34 |
-| 128 | 12 | 2.8 | 2.8 | 2.8 |
-| 128 | 16 | 3.7 | 3.71 | 3.7 |
-| 128 | 24 | 4.63 | 4.63 | 4.62 |
-| 128 | 32 | 6.33 | 6.33 | 6.32 |
-| 128 | 64 | 11.34 | 11.35 | 11.24 |
-| 128 | 128 | 21.18 | 21.19 | 21.06 |
-| 384 | 1 | 1.61 | 1.61 | 1.61 |
-| 384 | 2 | 2.19 | 2.19 | 2.18 |
-| 384 | 4 | 3.31 | 3.31 | 3.31 |
-| 384 | 8 | 5.48 | 5.48 | 5.47 |
-| 384 | 12 | 7.69 | 7.7 | 7.69 |
-| 384 | 16 | 10.02 | 10.02 | 10.01 |
-| 384 | 24 | 14.15 | 14.15 | 14.14 |
-| 384 | 32 | 18.41 | 18.56 | 18.4 |
-| 384 | 64 | 35.71 | 35.73 | 35.44 |
-| 384 | 128 | 68.52 | 68.55 | 68.19 |
+| 128 | 1 | 1.16 | 1.48 | 1.17 |
+| 128 | 2 | 1.41 | 1.42 | 1.41 |
+| 128 | 4 | 1.87 | 1.88 | 1.87 |
+| 128 | 8 | 2.84 | 2.84 | 2.83 |
+| 128 | 12 | 3.30 | 3.31 | 3.30 |
+| 128 | 16 | 4.40 | 4.42 | 4.39 |
+| 128 | 24 | 5.86 | 5.87 | 5.85 |
+| 128 | 32 | 7.67 | 7.68 | 7.67 |
+| 128 | 64 | 13.81 | 13.82 | 13.79 |
+| 128 | 128 | 27.00 | 27.02 | 26.80 |
+| 384 | 1 | 1.72 | 1.78 | 1.72 |
+| 384 | 2 | 2.35 | 2.36 | 2.35 |
+| 384 | 4 | 3.80 | 3.81 | 3.80 |
+| 384 | 8 | 6.70 | 6.71 | 6.70 |
+| 384 | 12 | 8.98 | 8.99 | 8.97 |
+| 384 | 16 | 12.38 | 12.39 | 12.37 |
+| 384 | 24 | 17.52 | 17.54 | 17.51 |
+| 384 | 32 | 22.82 | 22.89 | 22.64 |
+| 384 | 64 | 43.78 | 43.90 | 43.59 |
+| 384 | 128 | 85.23 | 85.25 | 84.61 |


#### Inference performance: NVIDIA T4 (16GB)
@@ -517,49 +517,49 @@ Our results were obtained by running the `scripts/inference_benchmark.sh --gpu T
| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 1.55 | 1.57 | 1.33 | 2.00 | 2.06 | 1.93 |
-| 128 | 2 | 1.78 | 2.06 | 1.75 | 2.54 | 2.58 | 2.49 |
-| 128 | 4 | 2.80 | 2.88 | 2.74 | 4.25 | 4.34 | 4.16 |
-| 128 | 8 | 4.48 | 4.56 | 4.42 | 8.13 | 8.74 | 7.88 |
-| 128 | 12 | 6.28 | 6.31 | 6.12 | 11.67 | 12.12 | 11.30 |
-| 128 | 16 | 8.92 | 9.11 | 8.78 | 17.24 | 17.79 | 16.70 |
-| 128 | 24 | 12.70 | 12.84 | 12.53 | 24.48 | 24.85 | 24.90 |
-| 128 | 32 | 17.90 | 18.41 | 17.59 | 33.02 | 33.51 | 32.65 |
-| 128 | 64 | 34.80 | 34.83 | 34.31 | 65.38 | 65.43 | 64.28 |
-| 128 | 128 | 68.16 | 68.46 | 67.05 | 130.77 | 131.01 | 129.19 |
-| 384 | 1 | 2.47 | 2.53 | 2.43 | 3.76 | 3.81 | 3.69 |
-| 384 | 2 | 3.87 | 3.95 | 3.81 | 6.31 | 6.43 | 6.21 |
-| 384 | 4 | 7.15 | 7.18 | 6.97 | 12.16 | 12.22 | 12.03 |
-| 384 | 8 | 14.09 | 12.11 | 13.73 | 25.45 | 25.83 | 24.94 |
-| 384 | 12 | 20.99 | 21.12 | 20.66 | 38.15 | 38.38 | 37.51 |
-| 384 | 16 | 27.49 | 27.65 | 27.08 | 50.90 | 51.36 | 50.04 |
-| 384 | 24 | 41.93 | 42.17 | 41.36 | 77.25 | 78.16 | 76.05 |
-| 384 | 32 | 54.65 | 54.87 | 54.06 | 102.44 | 103.09 | 101.30 |
-| 384 | 64 | 109.78 | 110.42 | 108.24 | 200.58 | 201.20 | 198.68 |
-| 384 | 128 | 227.46 | 228.80 | 223.92 | 401.33 | 402.14 | 399.24 |
+| 128 | 1 | 1.64 | 1.68 | 1.37 | 2.00 | 2.15 | 1.96 |
+| 128 | 2 | 1.88 | 2.11 | 1.82 | 2.77 | 2.78 | 2.70 |
+| 128 | 4 | 2.70 | 2.70 | 2.63 | 4.55 | 4.57 | 4.48 |
+| 128 | 8 | 4.73 | 4.97 | 4.64 | 9.33 | 10.22 | 8.85 |
+| 128 | 12 | 6.63 | 6.73 | 6.55 | 12.82 | 13.19 | 12.39 |
+| 128 | 16 | 9.45 | 9.77 | 9.31 | 18.08 | 18.63 | 17.35 |
+| 128 | 24 | 14.07 | 14.35 | 13.63 | 27.77 | 28.77 | 26.88 |
+| 128 | 32 | 19.75 | 20.59 | 19.12 | 37.42 | 37.79 | 36.66 |
+| 128 | 64 | 37.78 | 38.34 | 37.02 | 72.84 | 72.88 | 71.84 |
+| 128 | 128 | 74.62 | 75.10 | 73.61 | 147.01 | 147.83 | 145.46 |
+| 384 | 1 | 2.59 | 2.63 | 2.51 | 4.12 | 4.16 | 4.03 |
+| 384 | 2 | 4.11 | 4.13 | 3.98 | 6.85 | 7.38 | 6.62 |
+| 384 | 4 | 7.43 | 7.48 | 7.32 | 13.43 | 13.80 | 12.93 |
+| 384 | 8 | 14.94 | 15.08 | 14.73 | 28.62 | 29.65 | 27.84 |
+| 384 | 12 | 22.51 | 22.86 | 22.05 | 42.67 | 43.18 | 41.88 |
+| 384 | 16 | 30.07 | 30.77 | 29.10 | 57.08 | 57.56 | 56.18 |
+| 384 | 24 | 45.34 | 45.90 | 44.62 | 87.62 | 88.20 | 85.69 |
+| 384 | 32 | 60.10 | 60.50 | 58.77 | 118.02 | 118.76 | 115.03 |
+| 384 | 64 | 121.20 | 121.69 | 118.76 | 235.94 | 237.30 | 230.79 |
+| 384 | 128 | 243.66 | 244.15 | 242.68 | 447.69 | 448.97 | 445.64 |

##### BERT Large

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
-| 128 | 1 | 3.59 | 3.61 | 3.51 | 5.10 | 5.18 | 5.02 |
-| 128 | 2 | 4.93 | 5.03 | 4.83 | 7.72 | 7.73 | 7.58 |
-| 128 | 4 | 8.15 | 8.19 | 7.93 | 13.67 | 13.85 | 13.56 |
-| 128 | 8 | 14.21 | 14.23 | 13.89 | 26.88 | 27.66 | 26.35 |
-| 128 | 12 | 22.41 | 22.47 | 21.91 | 41.04 | 41.29 | 40.30 |
-| 128 | 16 | 29.30 | 29.83 | 28.82 | 55.04 | 55.27 | 54.05 |
-| 128 | 24 | 44.60 | 44.63 | 43.92 | 81.24 | 82.28 | 79.59 |
-| 128 | 32 | 60.88 | 61.48 | 58.97 | 114.13 | 114.47 | 112.78 |
-| 128 | 64 | 111.78 | 112.02 | 110.77 | 224.24 | 225.02 | 221.97 |
-| 128 | 128 | 223.99 | 224.28 | 222.33 | 417.56 | 418.54 | 415.33 |
-| 384 | 1 | 7.18 | 7.27 | 7.07 | 11.74 | 11.96 | 11.51 |
-| 384 | 2 | 12.22 | 12.25 | 11.92 | 21.47 | 21.61 | 20.97 |
-| 384 | 4 | 35.95 | 36.43 | 35.63 | 42.03 | 42.35 | 41.36 |
-| 384 | 8 | 47.06 | 47.22 | 46.41 | 83.16 | 83.51 | 82.06 |
-| 384 | 12 | 66.04 | 66.04 | 65.89 | 127.10 | 127.99 | 127.46 |
-| 384 | 16 | 87.98 | 88.45 | 87.13 | 164.13 | 165.12 | 161.96 |
-| 384 | 24 | 132.56 | 132.96 | 131.24 | 262.76 | 263.68 | 258.96 |
-| 384 | 32 | 179.44 | 180.61 | 176.66 | 329.99 | 331.67 | 325.59 |
-| 384 | 64 | 352.81 | 353.39 | 350.21 | 684.19 | 686.39 | 674.76 |
-| 384 | 128 | 706.85 | 707.73 | 704.38 | 1318.74 | 1320.22 | 1315.10 |
+| 128 | 1 | 3.72 | 3.81 | 3.67 | 5.69 | 5.76 | 5.57 |
+| 128 | 2 | 5.08 | 5.27 | 5.00 | 8.52 | 8.64 | 8.40 |
+| 128 | 4 | 8.51 | 8.54 | 8.32 | 14.93 | 15.23 | 14.53 |
+| 128 | 8 | 14.77 | 14.89 | 14.58 | 28.77 | 29.22 | 28.25 |
+| 128 | 12 | 23.07 | 23.22 | 22.74 | 46.08 | 46.10 | 45.28 |
+| 128 | 16 | 30.29 | 30.56 | 29.57 | 60.23 | 60.93 | 58.35 |
+| 128 | 24 | 48.58 | 48.70 | 47.77 | 90.67 | 91.77 | 89.92 |
+| 128 | 32 | 64.13 | 64.80 | 63.15 | 117.89 | 118.47 | 116.12 |
+| 128 | 64 | 127.74 | 128.46 | 125.80 | 243.10 | 243.52 | 241.59 |
+| 128 | 128 | 242.26 | 242.86 | 240.10 | 465.64 | 466.77 | 463.31 |
+| 384 | 1 | 7.50 | 7.54 | 7.31 | 12.56 | 12.67 | 12.37 |
+| 384 | 2 | 12.46 | 12.58 | 12.23 | 23.09 | 23.11 | 22.57 |
+| 384 | 4 | 24.93 | 25.10 | 24.58 | 47.41 | 47.43 | 46.68 |
+| 384 | 8 | 50.73 | 50.95 | 49.83 | 93.40 | 94.03 | 92.25 |
+| 384 | 12 | 72.95 | 73.36 | 71.97 | 140.44 | 141.25 | 138.23 |
+| 384 | 16 | 95.85 | 96.26 | 94.21 | 186.44 | 187.16 | 184.91 |
+| 384 | 24 | 145.08 | 145.57 | 143.04 | 281.55 | 282.20 | 279.78 |
+| 384 | 32 | 188.62 | 189.24 | 187.12 | 375.30 | 375.91 | 372.80 |
+| 384 | 64 | 376.59 | 377.52 | 374.39 | 760.16 | 760.96 | 757.81 |
+| 384 | 128 | 758.68 | 759.85 | 754.89 | 1459.63 | 1460.42 | 1457.38 |
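
The latency tables can be turned into throughput and INT8-over-FP16 speedup figures with a little arithmetic. The sketch below is an illustration only, not part of the commit: it assumes each reported latency is the time to process one full batch, and the helper names `throughput` and `speedup` are hypothetical. The three sample rows are the updated T4 BERT Large averages for sequence length 384.

```python
# Illustrative only: derive throughput and speedup from the average latencies.
# Assumption: each latency is the time (ms) to run one whole batch.

# (batch size, INT8 average ms, FP16 average ms), copied from the updated
# T4 "BERT Large" rows for sequence length 384.
ROWS = [
    (1, 7.31, 12.37),
    (8, 49.83, 92.25),
    (128, 754.89, 1457.38),
]


def throughput(batch_size, latency_ms):
    """Sequences per second, given a per-batch latency in milliseconds."""
    return batch_size / (latency_ms / 1000.0)


def speedup(fp16_ms, int8_ms):
    """How many times faster INT8 is than FP16 at the same batch size."""
    return fp16_ms / int8_ms


for batch, int8_ms, fp16_ms in ROWS:
    print(
        f"batch={batch:>3}  "
        f"INT8 {throughput(batch, int8_ms):7.1f} seq/s  "
        f"FP16 {throughput(batch, fp16_ms):7.1f} seq/s  "
        f"speedup {speedup(fp16_ms, int8_ms):.2f}x"
    )
```

If the benchmark script reports latency per sequence rather than per batch, drop the `batch_size` factor in `throughput`.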
