Merge pull request #851 from GrzegorzKarchNV/gkarch/onnx-fixes

[Jasper/PyT] updated onnx runtime info
nv-kkudrynski 2021-03-02 13:04:35 +01:00 committed by GitHub
commit b181af5e5b
4 changed files with 8 additions and 6 deletions

Binary image file changed (not shown): 71 KiB → 81 KiB

Binary image file changed (not shown): 68 KiB → 77 KiB

Binary image file changed (not shown): 68 KiB → 77 KiB


@@ -13,11 +13,10 @@ This subfolder of the Jasper for PyTorch repository contains scripts for deploy
 - [Performance](#performance)
 * [Inference Benchmarking in Triton Inference Server](#inference-benchmarking-in-triton-inference-server)
 * [Results](#results)
-* [Performance Analysis for Triton Inference Server: NVIDIA T4
-](#performance-analysis-for-triton-inference-server-nvidia-t4)
+* [Performance Analysis for Triton Inference Server: NVIDIA T4](#performance-analysis-for-triton-inference-server-nvidia-t4)
 * [Maximum batch size](#maximum-batch-size)
 * [Batching techniques: Static versus Dynamic Batching](#batching-techniques-static-versus-dynamic)
-* [TensorRT, ONNX, and PyTorch JIT comparisons](#tensorrt-onnx-and-pytorch-jit-comparisons)
+* [TensorRT, ONNXRT-CUDA, and PyTorch JIT comparisons](#tensorrt-onnxrt-cuda-and-pytorch-jit-comparisons)
 - [Release Notes](#release-notes)
 * [Changelog](#change-log)
 * [Known issues](#known-issues)
@@ -327,7 +326,7 @@ Figure 5: Triton pipeline - Latency & Throughput vs Concurrency using dynamic Ba
 ![](../images/tensorrt_16.7s.png)
 Figure 6: Triton pipeline - Latency & Throughput vs Concurrency using dynamic Batching at maximum server batch size = 8, max_queue_delay_microseconds = 5000, input audio length = 16.7 seconds, TensorRT backend.
-##### TensorRT, ONNX, and PyTorch JIT comparisons
+##### TensorRT, ONNXRT-CUDA, and PyTorch JIT comparisons
 The following tables show inference throughput and latency comparisons across all 3 backends for mixed precision and static batching. The main observations are:
 Increasing the batch size leads to higher inference throughput and latency up to a certain batch size, after which it slowly saturates.
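To make the comparison concrete, here is a minimal sketch of how a client could send a single static-batch request to one of these Triton backends and derive the two metrics used in the tables that follow (latency in ms, throughput in inf/s). It uses the standard `tritonclient` HTTP API, but the model name `jasper-tensorrt`, the tensor name `AUDIO_SIGNAL`, and the input layout are assumptions made for illustration only; they are not taken from the repository's deployment scripts, and this is not how the published numbers were collected.

```python
# Illustrative sketch only: "jasper-tensorrt", "AUDIO_SIGNAL", and the (batch, samples)
# input layout are assumed names/shapes, not taken from the actual Jasper deployment.
import time
import numpy as np
import tritonclient.http as triton_http

client = triton_http.InferenceServerClient(url="localhost:8000")

batch_size = 1
audio = np.zeros((batch_size, 32000), dtype=np.float32)  # ~2.0 s of 16 kHz audio per sample
audio_in = triton_http.InferInput("AUDIO_SIGNAL", list(audio.shape), "FP32")
audio_in.set_data_from_numpy(audio)

# Single concurrent request with static batching: average the request latency (ms)
# and convert it to an inferences-per-second figure, the units used in the tables.
latencies_ms = []
for _ in range(20):
    start = time.perf_counter()
    client.infer(model_name="jasper-tensorrt", inputs=[audio_in])
    latencies_ms.append((time.perf_counter() - start) * 1000.0)

avg_ms = sum(latencies_ms) / len(latencies_ms)
print(f"avg latency: {avg_ms:.2f} ms, throughput: {batch_size * 1000.0 / avg_ms:.2f} inf/s")
```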
@@ -337,7 +336,7 @@ The longer the audio length, the lower the throughput and the higher the latency
 The following table shows the throughput benchmark results for all 3 model backends in Triton Inference Server using static batching under optimal concurrency.
-|Audio length in seconds|Batch Size|TensorRT (inf/s)|PyTorch (inf/s)|ONNX (inf/s)|TensorRT/PyTorch Speedup|TensorRT/Onnx Speedup|
+|Audio length in seconds|Batch Size|TensorRT (inf/s)|PyTorch (inf/s)|ONNXRT-CUDA (inf/s)|TensorRT/PyTorch Speedup|TensorRT/ONNXRT-CUDA Speedup|
 |--- |--- |--- |--- |--- |--- |--- |
 | 2.0| 1| 49.67| 55.67| 41.67| 0.89| 1.19|
 | 2.0| 2| 98.67| 96.00| 77.33| 1.03| 1.28|
@@ -356,7 +355,7 @@ The following table shows the throughput benchmark results for all 3 model backe
 The following table shows the latency benchmark results for all 3 model backends in Triton Inference Server using static batching and a single concurrent request.
-|Audio length in seconds|Batch Size|TensorRT (ms)|PyTorch (ms)|ONNX (ms)|TensorRT/PyTorch Speedup|TensorRT/Onnx Speedup|
+|Audio length in seconds|Batch Size|TensorRT (ms)|PyTorch (ms)|ONNXRT-CUDA (ms)|TensorRT/PyTorch Speedup|TensorRT/ONNXRT-CUDA Speedup|
 |--- |--- |--- |--- |--- |--- |--- |
 | 2.0| 1| 23.61| 25.06| 31.84| 1.06| 1.35|
 | 2.0| 2| 24.56| 25.11| 37.54| 1.02| 1.53|
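For reference, the speedup columns in the two tables above are consistent with a simple ratio of the raw measurements: for throughput (inf/s) the ratio is TensorRT over the other backend, while for latency (ms) it is the other backend over TensorRT, so a value above 1 always favors TensorRT. A quick check against the batch-size-1, 2.0-second rows shown above:

```python
# Reproduce the speedup columns from the batch-size-1, 2.0 s rows of the tables above.
throughput_inf_s = {"TensorRT": 49.67, "PyTorch": 55.67, "ONNXRT-CUDA": 41.67}
latency_ms = {"TensorRT": 23.61, "PyTorch": 25.06, "ONNXRT-CUDA": 31.84}

for backend in ("PyTorch", "ONNXRT-CUDA"):
    thr_speedup = throughput_inf_s["TensorRT"] / throughput_inf_s[backend]  # higher inf/s is better
    lat_speedup = latency_ms[backend] / latency_ms["TensorRT"]              # lower ms is better
    print(f"TensorRT vs {backend}: throughput speedup {thr_speedup:.2f}, "
          f"latency speedup {lat_speedup:.2f}")

# Prints 0.89 / 1.06 for PyTorch and 1.19 / 1.35 for ONNXRT-CUDA, matching the tables.
```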
@@ -375,6 +374,9 @@ The following table shows the throughput benchmark results for all 3 model backe
 ### Changelog
+March 2021
+* Updated ONNX runtime information
 February 2021
 * Updated Triton scripts for compatibility with Triton Inference Server version 2
 * Updated Quick Start Guide