diff --git a/CUDA-Optimized/FastSpeech/README.md b/CUDA-Optimized/FastSpeech/README.md
index e79fb89f..dc51a273 100644
--- a/CUDA-Optimized/FastSpeech/README.md
+++ b/CUDA-Optimized/FastSpeech/README.md
@@ -315,6 +315,8 @@ Sample result waveforms are [FP32](fastspeech/trt/samples) and [FP16](fastspeech
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/Kaldi/SpeechRecognition/README.md b/Kaldi/SpeechRecognition/README.md
index aa76d831..f3a22646 100644
--- a/Kaldi/SpeechRecognition/README.md
+++ b/Kaldi/SpeechRecognition/README.md
@@ -192,6 +192,8 @@ you can set `count` to `1` in the [`instance_group` section](https://docs.nvidia
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Metrics
 
diff --git a/MxNet/Classification/RN50v1.5/README.md b/MxNet/Classification/RN50v1.5/README.md
index c8d22c35..e75d9a71 100644
--- a/MxNet/Classification/RN50v1.5/README.md
+++ b/MxNet/Classification/RN50v1.5/README.md
@@ -552,6 +552,8 @@ By default:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 To benchmark training and inference, run:
diff --git a/PyTorch/Classification/ConvNets/efficientnet/README.md b/PyTorch/Classification/ConvNets/efficientnet/README.md
index c25dc4b7..a6a45c65 100644
--- a/PyTorch/Classification/ConvNets/efficientnet/README.md
+++ b/PyTorch/Classification/ConvNets/efficientnet/README.md
@@ -492,6 +492,8 @@ Quantized models could also be used to classify new images using the `classify.p
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/Classification/ConvNets/resnet50v1.5/README.md b/PyTorch/Classification/ConvNets/resnet50v1.5/README.md
index f5e22dda..40e619e6 100644
--- a/PyTorch/Classification/ConvNets/resnet50v1.5/README.md
+++ b/PyTorch/Classification/ConvNets/resnet50v1.5/README.md
@@ -498,6 +498,8 @@ To run inference on JPEG image using pretrained weights:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/Classification/ConvNets/resnext101-32x4d/README.md b/PyTorch/Classification/ConvNets/resnext101-32x4d/README.md
index f578bf76..997e4c23 100644
--- a/PyTorch/Classification/ConvNets/resnext101-32x4d/README.md
+++ b/PyTorch/Classification/ConvNets/resnext101-32x4d/README.md
@@ -481,6 +481,8 @@ To run inference on JPEG image using pretrained weights:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md b/PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md
index c8c89df7..825aa921 100644
--- a/PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md
+++ b/PyTorch/Classification/ConvNets/se-resnext101-32x4d/README.md
@@ -483,6 +483,8 @@ To run inference on JPEG image using pretrained weights:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/Classification/ConvNets/triton/resnet50/README.md b/PyTorch/Classification/ConvNets/triton/resnet50/README.md
index 2cc57444..fe8bdf4c 100644
--- a/PyTorch/Classification/ConvNets/triton/resnet50/README.md
+++ b/PyTorch/Classification/ConvNets/triton/resnet50/README.md
@@ -325,6 +325,8 @@ we can consider that all clients are local.
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Offline scenario
 
 This table lists the common variable parameters for all performance measurements:
diff --git a/PyTorch/Classification/ConvNets/triton/resnext101-32x4d/README.md b/PyTorch/Classification/ConvNets/triton/resnext101-32x4d/README.md
index 2addc5a0..ac6a591e 100644
--- a/PyTorch/Classification/ConvNets/triton/resnext101-32x4d/README.md
+++ b/PyTorch/Classification/ConvNets/triton/resnext101-32x4d/README.md
@@ -194,6 +194,8 @@ To process static configuration logs, `triton/scripts/process_output.sh` script
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Dynamic batching performance
 
 The Triton Inference Server has a dynamic batching mechanism built-in that can be enabled. When it is enabled, the server creates inference batches from multiple received requests. This allows us to achieve better performance than doing inference on each single request. The single request is assumed to be a single image that needs to be inferenced. With dynamic batching enabled, the server will concatenate single image requests into an inference batch. The upper bound of the size of the inference batch is set to 64. All these parameters are configurable.
diff --git a/PyTorch/Classification/ConvNets/triton/se-resnext101-32x4d/README.md b/PyTorch/Classification/ConvNets/triton/se-resnext101-32x4d/README.md
index df0ecde2..25e19c38 100644
--- a/PyTorch/Classification/ConvNets/triton/se-resnext101-32x4d/README.md
+++ b/PyTorch/Classification/ConvNets/triton/se-resnext101-32x4d/README.md
@@ -195,6 +195,8 @@ To process static configuration logs, `triton/scripts/process_output.sh` script
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Dynamic batching performance
 
 The Triton Inference Server has a dynamic batching mechanism built-in that can be enabled. When it is enabled, the server creates inference batches from multiple received requests. This allows us to achieve better performance than doing inference on each single request. The single request is assumed to be a single image that needs to be inferenced. With dynamic batching enabled, the server will concatenate single image requests into an inference batch. The upper bound of the size of the inference batch is set to 64. All these parameters are configurable.
diff --git a/PyTorch/Detection/SSD/README.md b/PyTorch/Detection/SSD/README.md
index ca25079d..98d0b8c6 100644
--- a/PyTorch/Detection/SSD/README.md
+++ b/PyTorch/Detection/SSD/README.md
@@ -565,6 +565,8 @@ To use the inference example script in your own code, you can call the `main` fu
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/LanguageModeling/BERT/README.md b/PyTorch/LanguageModeling/BERT/README.md
index d1cb5454..de8028f2 100755
--- a/PyTorch/LanguageModeling/BERT/README.md
+++ b/PyTorch/LanguageModeling/BERT/README.md
@@ -692,6 +692,8 @@ For SQuAD, to run inference interactively on question-context pairs, use the scr
 The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or GRPC endpoint, allowing remote clients to request inferencing for any model being managed by the server. More information on how to perform inference using NVIDIA Triton Inference Server can be found in [triton/README.md](./triton/README.md).
 
 ## Performance
+
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
 
 ### Benchmarking
 
diff --git a/PyTorch/LanguageModeling/BERT/triton/README.md b/PyTorch/LanguageModeling/BERT/triton/README.md
index 159fdd69..c238e83e 100644
--- a/PyTorch/LanguageModeling/BERT/triton/README.md
+++ b/PyTorch/LanguageModeling/BERT/triton/README.md
@@ -102,6 +102,8 @@ To make the machine wait until the server is initialized, and the model is ready
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 The numbers below are averages, measured on Triton on V100 32G GPU, with [static batching](https://docs.nvidia.com/deeplearning/sdk/tensorrt-inference-server-guide/docs/model_configuration.html#scheduling-and-batching).
 
 | Format | GPUs | Batch size | Sequence length | Throughput - FP32(sequences/sec) | Throughput - mixed precision(sequences/sec) | Throughput speedup (mixed precision/FP32) |
diff --git a/PyTorch/LanguageModeling/Transformer-XL/README.md b/PyTorch/LanguageModeling/Transformer-XL/README.md
index 5ba872a1..1ab0a50d 100644
--- a/PyTorch/LanguageModeling/Transformer-XL/README.md
+++ b/PyTorch/LanguageModeling/Transformer-XL/README.md
@@ -1113,6 +1113,8 @@ perplexity on the test dataset.
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model
diff --git a/PyTorch/Recommendation/DLRM/README.md b/PyTorch/Recommendation/DLRM/README.md
index 6f248f64..e7e65367 100644
--- a/PyTorch/Recommendation/DLRM/README.md
+++ b/PyTorch/Recommendation/DLRM/README.md
@@ -574,6 +574,8 @@ The NVIDIA Triton Inference Server provides a cloud inferencing solution optimiz
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/Recommendation/DLRM/triton/README.md b/PyTorch/Recommendation/DLRM/triton/README.md
index c854a11b..13644ce2 100644
--- a/PyTorch/Recommendation/DLRM/triton/README.md
+++ b/PyTorch/Recommendation/DLRM/triton/README.md
@@ -192,6 +192,8 @@ For more information about `perf_client` please refer to [official documentation
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Throughput/Latency results
 
 Throughput is measured in recommendations/second, and latency in milliseconds.
diff --git a/PyTorch/Recommendation/NCF/README.md b/PyTorch/Recommendation/NCF/README.md
index 3fb85853..71b1fe71 100644
--- a/PyTorch/Recommendation/NCF/README.md
+++ b/PyTorch/Recommendation/NCF/README.md
@@ -379,6 +379,8 @@ The script will then:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 #### Training performance benchmark
diff --git a/PyTorch/Segmentation/MaskRCNN/README.md b/PyTorch/Segmentation/MaskRCNN/README.md
index e018298d..21060da9 100755
--- a/PyTorch/Segmentation/MaskRCNN/README.md
+++ b/PyTorch/Segmentation/MaskRCNN/README.md
@@ -484,6 +484,8 @@ __Note__: The score is always the Average Precision(AP) at
 - maxDets = 100
 
 ## Performance
+
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
 ### Benchmarking
 
 Benchmarking can be performed for both training and inference. Both scripts run the Mask R-CNN model using the parameters defined in `configs/e2e_mask_rcnn_R_50_FPN_1x.yaml`. You can specify whether benchmarking is performed in FP16, TF32 or FP32 by specifying it as an argument to the benchmarking scripts.
diff --git a/PyTorch/Segmentation/nnUNet/README.md b/PyTorch/Segmentation/nnUNet/README.md
index 4c73002a..343b2b44 100755
--- a/PyTorch/Segmentation/nnUNet/README.md
+++ b/PyTorch/Segmentation/nnUNet/README.md
@@ -454,6 +454,8 @@ The script will then:
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks to measure the model performance in training and inference modes.
diff --git a/PyTorch/Segmentation/nnUNet/triton/README.md b/PyTorch/Segmentation/nnUNet/triton/README.md
index bb41f1c1..2d10b9b7 100644
--- a/PyTorch/Segmentation/nnUNet/triton/README.md
+++ b/PyTorch/Segmentation/nnUNet/triton/README.md
@@ -344,6 +344,8 @@ we can consider that all clients are local.
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Offline scenario
 
 This table lists the common variable parameters for all performance measurements:
diff --git a/PyTorch/SpeechRecognition/Jasper/README.md b/PyTorch/SpeechRecognition/Jasper/README.md
index 581883e7..30d52d4a 100644
--- a/PyTorch/SpeechRecognition/Jasper/README.md
+++ b/PyTorch/SpeechRecognition/Jasper/README.md
@@ -567,6 +567,8 @@ More information on how to perform inference using Triton Inference Server with
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.
diff --git a/PyTorch/SpeechRecognition/Jasper/triton/README.md b/PyTorch/SpeechRecognition/Jasper/triton/README.md
index 26438907..56c02d2c 100644
--- a/PyTorch/SpeechRecognition/Jasper/triton/README.md
+++ b/PyTorch/SpeechRecognition/Jasper/triton/README.md
@@ -274,6 +274,8 @@ For more information about `perf_client`, refer to the [official documentation](
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Inference Benchmarking in Triton Inference Server
 
 To benchmark the inference performance on Volta Turing or Ampere GPU, run `bash triton/scripts/execute_all_perf_runs.sh` according to [Quick-Start-Guide](#quick-start-guide) Step 7.
diff --git a/PyTorch/SpeechSynthesis/FastPitch/README.md b/PyTorch/SpeechSynthesis/FastPitch/README.md
index 648f0dc9..94143d98 100644
--- a/PyTorch/SpeechSynthesis/FastPitch/README.md
+++ b/PyTorch/SpeechSynthesis/FastPitch/README.md
@@ -532,6 +532,8 @@ More examples are presented on the website with [samples](https://fastpitch.gith
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Benchmarking
 
 The following section shows how to run benchmarks measuring the model
diff --git a/PyTorch/SpeechSynthesis/FastPitch/triton/README.md b/PyTorch/SpeechSynthesis/FastPitch/triton/README.md
index 9c93fd46..19b6e2ae 100644
--- a/PyTorch/SpeechSynthesis/FastPitch/triton/README.md
+++ b/PyTorch/SpeechSynthesis/FastPitch/triton/README.md
@@ -342,6 +342,8 @@ we can consider that all clients are local.
 
 ## Performance
 
+The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA’s latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference).
+
 ### Offline scenario
 
diff --git a/PyTorch/SpeechSynthesis/Tacotron2/README.md b/PyTorch/SpeechSynthesis/Tacotron2/README.md
index 7a605263..7e443b0f 100644
--- a/PyTorch/SpeechSynthesis/Tacotron2/README.md
+++ b/PyTorch/SpeechSynthesis/Tacotron2/README.md
@@ -524,6 +524,8 @@ python inference.py --tacotron2 --waveglow