This script runs 2 epochs by default on the SQuAD v1.1 dataset and extracts performance numbers for various batch sizes and sequence lengths in both FP16 and FP32. These numbers are saved at `/results/squad_train_benchmark_bert_<bert_model>_gpu_<num_gpu>.log`.
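The saved log path encodes the model size and GPU count. As a sketch of how the pattern expands (the variable names below are illustrative placeholders, not taken from the script itself):

```shell
# Sketch only: how the training benchmark log path pattern expands.
# bert_model and num_gpu are hypothetical placeholders.
bert_model="large"
num_gpu=8
log_file="/results/squad_train_benchmark_bert_${bert_model}_gpu_${num_gpu}.log"
echo "${log_file}"
```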
#### Inference performance benchmark
Inference benchmarking can be performed by running the script:
This script runs 1024 eval iterations by default on the SQuAD v1.1 dataset and extracts performance and latency numbers for various batch sizes and sequence lengths in both FP16 with XLA and FP32 without XLA. These numbers are saved at `/results/squad_inference_benchmark_bert_<bert_model>.log`.
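The sweep described above can be pictured as a nested loop over every benchmarked configuration. The precision modes, batch sizes, and sequence lengths below are illustrative placeholders, not the values the script actually hard-codes:

```shell
# Sketch of the benchmark sweep: each (precision, batch size, sequence length)
# combination is timed in turn. The value lists are illustrative placeholders.
for precision in fp16_xla fp32; do
  for batch_size in 1 2 4 8; do
    for seq_length in 128 384; do
      echo "benchmarking: precision=${precision} batch_size=${batch_size} seq_length=${seq_length}"
    done
  done
done
```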
### Results
March 2019
### Known issues
- There is a known performance regression with the 19.08 release on Tesla V100 boards with 16 GB memory; smaller batch sizes may be a better choice for this model on these GPUs. 32 GB GPUs are not affected.
* [Pre-training training performance: multi-node on DGX-2 32G](#pre-training-training-performance-multi-node-on-dgx-2-32g)
* [Fine-tuning training performance for NER on DGX-2 32G](#fine-tuning-training-performance-for-ner-on-dgx-2-32g)
* [Release notes](#release-notes)
* [Changelog](#changelog)
* [Known issues](#known-issues)
To download and preprocess pre-training data as well as the required vocab files, run the following script:
```
bash biobert/scripts/biobert_data_download.sh
```
Datasets for fine-tuning can be obtained from this [repository](https://github.com/ncbi-nlp/BLUE_Benchmark/releases/tag/0.1).
Place them in `/workspace/bert/data/biobert/` to be automatically picked up by our scripts.
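A quick sanity check that the datasets are in place might look like the sketch below; the directory layout and file names (`train.tsv`, `devel.tsv`, `test.tsv`) are assumptions for the sketch, not guaranteed by the repository:

```shell
# Sketch: check that the fine-tuning datasets are where the scripts expect them.
# The file names checked here are assumptions for this sketch.
check_biobert_data() {
  dir="$1"
  missing=0
  for f in train.tsv devel.tsv test.tsv; do
    if [ ! -e "${dir}/${f}" ]; then
      echo "missing: ${dir}/${f}"
      missing=$((missing + 1))
    fi
  done
  return "${missing}"
}
check_biobert_data "/workspace/bert/data/biobert" || echo "some files are missing"
```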
```
bash scripts/docker/launch.sh
```
5. Download the pre-trained checkpoint, vocabulary, and configuration files.
We have uploaded checkpoints for fine-tuning and pre-training on biomedical corpora to the NGC Model Registry. You can download them directly from the [NGC model catalog](https://ngc.nvidia.com/catalog/models).
Place the downloaded `BioBERT checkpoints` in the `results/` directory so your scripts can easily access them.
This script runs inference on the test and dev sets and extracts performance and latency numbers for various batch sizes and sequence lengths in both FP16 with XLA and FP32 without XLA. These numbers are saved at `/results/tf_bert_biobert_<task>_inference_benchmark_<bert_model>_<cased/uncased>_num_gpu_<num_gpu>_<DATESTAMP>`.
## Results
The following sections provide detailed results of the downstream fine-tuning tasks on the NER and RE benchmarks.
### Training accuracy results
#### Pre-training accuracy
Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container.
Our results were obtained by running the `biobert/scripts/ner_bc5cdr-chem.sh` training script in the TensorFlow 19.08-py3 NGC container.
| DGX-2 32G | 64 |93.66|93.47|12.26|8.16|
### Training stability test
#### Fine-tuning stability test
The following tables compare F1 scores across 5 different training runs on the NER Chemical task with different seeds, for both FP16 and FP32. The runs showcase consistent convergence on all 5 seeds with very little deviation.
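As a sketch of how such a stability summary can be computed, the snippet below derives the mean and (population) standard deviation from five F1 values; the scores are illustrative placeholders, not measured results:

```shell
# Hypothetical F1 scores from 5 seeded runs (illustrative values only).
scores="93.52 93.61 93.47 93.58 93.55"
# Compute mean and population standard deviation with awk.
stats=$(echo "$scores" | awk '{
  for (i = 1; i <= NF; i++) { s += $i; ss += $i * $i }
  m = s / NF
  printf "mean=%.2f stddev=%.3f", m, sqrt(ss / NF - m * m)
}')
echo "$stats"
```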
### Training performance results
#### Training performance: NVIDIA DGX-1 (8x V100 16G)
##### Pre-training training performance: multi-node on DGX-1 16G
Our results were obtained by running the `biobert/scripts/run_biobert.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady state throughput.
Note: The values for FP32 runs that use a batch size of 16 at sequence length 128 and a batch size of 2 at sequence length 512 are not available due to out-of-memory errors.
##### Fine-tuning training performance for NER on DGX-1 16G
Our results were obtained by running the `biobert/scripts/ner_bc5cdr-chem.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the mean throughput from 2 epochs.
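As a sketch of how a mean-throughput figure of this kind is derived (all numbers below are hypothetical placeholders, not measured results):

```shell
# Sketch: deriving mean throughput in sentences/second over 2 epochs.
# All numbers are hypothetical placeholders, not measured results.
sentences_per_epoch=10000
epochs=2
elapsed_seconds=125   # wall-clock time for both epochs combined
throughput=$(( sentences_per_epoch * epochs / elapsed_seconds ))
echo "${throughput} sentences/sec"
```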
To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.
#### Training performance: NVIDIA DGX-1 (8x V100 32G)
##### Fine-tuning training performance for NER on DGX-1 32G
Our results were obtained by running the `biobert/scripts/ner_bc5cdr-chem.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 32G GPUs. Performance (in sentences per second) is the mean throughput from 2 epochs.
To achieve these same results, follow the [Quick Start Guide](#quick-start-guide) outlined above.
#### Training performance: NVIDIA DGX-2 (16x V100 32G)
##### Pre-training training performance: multi-node on DGX-2H 32G
Our results were obtained by running the `biobert/scripts/run_biobert.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-2H with 16x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput.
##### Fine-tuning training performance for NER on DGX-2 32G
Our results were obtained by running the `biobert/scripts/ner_bc5cdr-chem.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the mean throughput from 2 epochs.