Updated fine-tuning training performance table numbers for DGX A100
Updated the mixed-precision throughput number (8 GPUs); the weak-scaling table columns were swapped and have been fixed.
This commit is contained in:
parent 004385d0ce
commit 36c0549c58
@@ -939,8 +939,8 @@ Our results were obtained by running the `scripts/run_squad.sh` training script

  | **GPUs** | **Batch size / GPU: mixed precision, TF32** | **Throughput - mixed precision** | **Throughput - TF32** | **Throughput speedup (TF32 to mixed precision)** | **Weak scaling - TF32** | **Weak scaling - mixed precision** |
  |----------|---------------------------------------------|----------------------------------|-----------------------|--------------------------------------------------|-------------------------|------------------------------------|
  | 1 | 32, 16 | 102.26 | 61.364 | 1.67 | 1.00 | 1.00 |
- | 4 | 32, 16 | 366.353 | 223.187 | 1.64 | 3.58 | 3.64 |
- | 8 | 32, 16 | 518.898 | 440.47 | 1.18 | 5.07 | 7.18 |
+ | 4 | 32, 16 | 366.353 | 223.187 | 1.64 | 3.64 | 3.58 |
+ | 8 | 32, 16 | 767.071 | 440.47 | 1.74 | 7.18 | 7.50 |

  Note: The respective values for TF32 runs that use a batch size of 32 are not available due to out of memory errors that arise.
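The corrected derived columns can be reproduced from the raw throughput numbers in the table: the speedup column is mixed-precision throughput divided by TF32 throughput, and each weak-scaling column is the N-GPU throughput divided by the 1-GPU throughput for the same precision. A minimal sketch (the dictionary names are illustrative, not from the repository):

```python
# Raw throughputs (sequences/s) taken from the corrected table above.
throughput_mixed = {1: 102.26, 4: 366.353, 8: 767.071}
throughput_tf32 = {1: 61.364, 4: 223.187, 8: 440.47}

for gpus in (1, 4, 8):
    # Speedup: mixed precision vs. TF32 at the same GPU count.
    speedup = throughput_mixed[gpus] / throughput_tf32[gpus]
    # Weak scaling: throughput at N GPUs relative to 1 GPU, per precision.
    scaling_tf32 = throughput_tf32[gpus] / throughput_tf32[1]
    scaling_mixed = throughput_mixed[gpus] / throughput_mixed[1]
    print(f"{gpus} GPUs: speedup {speedup:.2f}, "
          f"weak scaling TF32 {scaling_tf32:.2f}, mixed {scaling_mixed:.2f}")
```

Running this recovers the fixed values (e.g. 1.74 speedup and 7.18 / 7.50 weak scaling at 8 GPUs), which confirms that the previously published 4-GPU row had its two weak-scaling columns swapped.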