switch ordering in readme

Swetha Mandava 2019-09-19 22:30:37 -07:00
parent 890fc1c143
commit 71fea240de


@@ -1081,7 +1081,6 @@ BERT BASE FP16
| 384 | 4 | 318.45 | 12.56 | 12.65 | 12.76 | 13.36 |
| 384 | 8 | 380.14 | 21.05 | 21.1 | 21.25 | 21.83 |
BERT BASE FP32
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
@@ -1096,7 +1095,6 @@ BERT BASE FP32
| 384 | 8 | 139.75 | 57.25 | 57.74 | 58.08 | 59.53 |
To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.
##### Inference performance: NVIDIA DGX-2 (1x V100 32G)
@@ -1126,7 +1124,6 @@ BERT LARGE FP16
| 384 | 4 | 121.04 | 33.05 | 33.08 | 33.31 | 34.97 |
| 384 | 8 | 142.03 | 56.33 | 56.46 | 57.49 | 59.85 |
BERT LARGE FP32
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
@@ -1140,7 +1137,6 @@ BERT LARGE FP32
| 384 | 4 | 42.79 | 93.48 | 94.73 | 96.52 | 104.37 |
| 384 | 8 | 45.91 | 174.24 | 175.34 | 176.59 | 183.76 |
BERT BASE FP16
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
@@ -1154,8 +1150,6 @@ BERT BASE FP16
| 384 | 4 | 318.45 | 12.56 | 12.65 | 12.76 | 13.36 |
| 384 | 8 | 380.14 | 21.05 | 21.1 | 21.25 | 21.83 |
BERT BASE FP32
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
@@ -1176,6 +1170,32 @@ BERT BASE FP32
Our results were obtained by running the `scripts/finetune_inference_benchmark.sh` script in the TensorFlow 19.06-py3 NGC container on NVIDIA Tesla T4 with 1x T4 16G GPU. Performance numbers (throughput in sentences per second and latency in milliseconds) were averaged over 1024 iterations. Latency is the time taken to process one batch, with batches fed to the model one after another, i.e., without pipelining.
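As a rough illustration of how the columns in the tables below relate to each other, the following sketch summarizes a list of per-batch latencies. It is not taken from the benchmark script: the function names and the nearest-rank percentile method are assumptions made for this example. The relation throughput = batch size / average latency is inferred from the reported rows, which it matches to within rounding.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (assumed method)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def throughput_from_latency(batch_size, avg_latency_ms):
    """Sentences per second when batches are processed one at a time."""
    return batch_size * 1000.0 / avg_latency_ms


def summarize(latencies_ms, batch_size):
    """Build one table row from per-batch latencies in milliseconds."""
    ordered = sorted(latencies_ms)
    avg_ms = sum(ordered) / len(ordered)
    return {
        "throughput_sent_per_sec": throughput_from_latency(batch_size, avg_ms),
        "latency_avg_ms": avg_ms,
        "latency_90_ms": percentile(ordered, 90),
        "latency_95_ms": percentile(ordered, 95),
        "latency_99_ms": percentile(ordered, 99),
    }
```

For example, `throughput_from_latency(8, 48.14)` gives about 166.18 sentences per second, which agrees with the 166.19 reported for BERT LARGE FP16 at sequence length 128 and batch size 8 up to rounding of the averaged latency.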
BERT LARGE FP16
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
|-----------------|------------|------------------------------|---------------------|-----------------|-----------------|-----------------|
| 128 | 1 | 53.56 | 18.67 | 20.22 | 20.31 | 20.49 |
| 128 | 2 | 95.39 | 20.97 | 22.86 | 23.15 | 23.73 |
| 128 | 4 | 137.44 | 29.1 | 30.34 | 30.62 | 31.5 |
| 128 | 8 | 166.19 | 48.14 | 49.38 | 49.73 | 50.86 |
| 384 | 1 | 34.28 | 29.17 | 30.58 | 30.77 | 31.28 |
| 384 | 2 | 41.89 | 47.74 | 49.05 | 49.34 | 50 |
| 384 | 4 | 47.15 | 84.83 | 86.79 | 87.41 | 88.73 |
| 384 | 8 | 50.28 | 159.11 | 161.75 | 162.85 | 165.72 |
BERT LARGE FP32
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
|-----------------|------------|------------------------------|---------------------|-----------------|-----------------|-----------------|
| 128 | 1 | 40.34 | 24.79 | 26.97 | 27.38 | 28.21 |
| 128 | 2 | 45.17 | 44.27 | 46.01 | 46.6 | 47.68 |
| 128 | 4 | 47.39 | 84.41 | 86.31 | 86.92 | 88.14 |
| 128 | 8 | 46.98 | 170.29 | 173.35 | 174.15 | 175.48 |
| 384 | 1 | 14.07 | 71.06 | 73 | 73.42 | 73.99 |
| 384 | 2 | 14.91 | 134.17 | 136.72 | 137.51 | 138.66 |
| 384 | 4 | 14.44 | 277.03 | 281.89 | 282.63 | 284.41 |
| 384 | 8 | 14.95 | 534.94 | 540.45 | 542.32 | 544.75 |
BERT BASE FP16
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
@@ -1203,34 +1223,6 @@ BERT BASE FP32
| 384 | 8 | 48.04 | 166.51 | 169.9 | 170.84 | 172.6 |
BERT LARGE FP16
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
|-----------------|------------|------------------------------|---------------------|-----------------|-----------------|-----------------|
| 128 | 1 | 53.56 | 18.67 | 20.22 | 20.31 | 20.49 |
| 128 | 2 | 95.39 | 20.97 | 22.86 | 23.15 | 23.73 |
| 128 | 4 | 137.44 | 29.1 | 30.34 | 30.62 | 31.5 |
| 128 | 8 | 166.19 | 48.14 | 49.38 | 49.73 | 50.86 |
| 384 | 1 | 34.28 | 29.17 | 30.58 | 30.77 | 31.28 |
| 384 | 2 | 41.89 | 47.74 | 49.05 | 49.34 | 50 |
| 384 | 4 | 47.15 | 84.83 | 86.79 | 87.41 | 88.73 |
| 384 | 8 | 50.28 | 159.11 | 161.75 | 162.85 | 165.72 |
BERT LARGE FP32
| Sequence Length | Batch Size | Throughput-Average(sent/sec) | Latency-Average(ms) | Latency-90%(ms) | Latency-95%(ms) | Latency-99%(ms) |
|-----------------|------------|------------------------------|---------------------|-----------------|-----------------|-----------------|
| 128 | 1 | 40.34 | 24.79 | 26.97 | 27.38 | 28.21 |
| 128 | 2 | 45.17 | 44.27 | 46.01 | 46.6 | 47.68 |
| 128 | 4 | 47.39 | 84.41 | 86.31 | 86.92 | 88.14 |
| 128 | 8 | 46.98 | 170.29 | 173.35 | 174.15 | 175.48 |
| 384 | 1 | 14.07 | 71.06 | 73 | 73.42 | 73.99 |
| 384 | 2 | 14.91 | 134.17 | 136.72 | 137.51 | 138.66 |
| 384 | 4 | 14.44 | 277.03 | 281.89 | 282.63 | 284.41 |
| 384 | 8 | 14.95 | 534.94 | 540.45 | 542.32 | 544.75 |
To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.
## Release notes