Tacotron 2 and WaveGlow Inference with TensorRT

This is a subfolder of the Tacotron 2 for PyTorch repository. It is tested and maintained by NVIDIA, and provides scripts to perform high-performance inference using NVIDIA TensorRT.

The Tacotron 2 and WaveGlow models form a text-to-speech (TTS) system that enables users to synthesize natural-sounding speech from raw transcripts without any additional information such as patterns and/or rhythms of speech. More information about the TTS system and its training can be found in the Tacotron 2 PyTorch README.

NVIDIA TensorRT is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for deep learning inference applications. Optimizing the compute-intensive acoustic model with NVIDIA TensorRT increases inference throughput by up to 1.4x over native PyTorch in mixed precision.

Quick Start Guide

  1. Clone the repository.

    git clone https://github.com/NVIDIA/DeepLearningExamples
    cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2
    
  2. Download the pretrained Tacotron 2 and WaveGlow checkpoints from NGC and copy them to the ./checkpoints directory. The export commands in the later steps assume the checkpoint filenames nvidia_tacotron2pyt_fp16_20190427 and nvidia_waveglow256pyt_fp16.
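
    For example (a sketch: the filenames are the ones assumed by the export commands below, and the source paths are placeholders for wherever the NGC downloads were saved):

    mkdir -p checkpoints
    cp /path/to/nvidia_tacotron2pyt_fp16_20190427 ./checkpoints/
    cp /path/to/nvidia_waveglow256pyt_fp16 ./checkpoints/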

  3. Build the Tacotron 2 and WaveGlow PyTorch NGC container.

    bash scripts/docker/build.sh
    
  4. Start an interactive session in the NGC container to run training/inference. After you build the container image, you can start an interactive CLI session with:

    bash scripts/docker/interactive.sh
    
  5. Verify that the installed TensorRT version is 7.0 or greater. If necessary, download and install the latest release from https://developer.nvidia.com/nvidia-tensorrt-download:

    pip list | grep tensorrt
    dpkg -l | grep TensorRT
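
    A quick alternative check from inside the container (assuming the Python tensorrt bindings are installed, as they are in the NGC container):

    python -c "import tensorrt; print(tensorrt.__version__)"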
    
  6. Export the models to ONNX intermediate representation (ONNX IR). Export Tacotron 2 to three ONNX parts: Encoder, Decoder, and Postnet:

    mkdir -p output
    python exports/export_tacotron2_onnx.py --tacotron2 ./checkpoints/nvidia_tacotron2pyt_fp16_20190427 -o output/ --fp16
    

    Export WaveGlow to ONNX IR:

    python exports/export_waveglow_onnx.py --waveglow ./checkpoints/nvidia_waveglow256pyt_fp16 --wn-channels 256 -o output/ --fp16
    

    After running the above commands, there should be four new ONNX files in the ./output/ directory: encoder.onnx, decoder_iter.onnx, postnet.onnx, and waveglow.onnx.
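
    To sanity-check the exported graphs before building engines, the ONNX checker can be run on each file (a minimal sketch, assuming the onnx Python package is installed in the container):

    python -c "import onnx; onnx.checker.check_model(onnx.load('output/encoder.onnx'))"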

  7. Export the ONNX IRs to TensorRT engines with fp16 mode enabled:

    python trt/export_onnx2trt.py --encoder output/encoder.onnx --decoder output/decoder_iter.onnx --postnet output/postnet.onnx --waveglow output/waveglow.onnx -o output/ --fp16
    

    After running the command, there should be four new engine files in the ./output/ directory: encoder_fp16.engine, decoder_iter_fp16.engine, postnet_fp16.engine, and waveglow_fp16.engine.
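
    Optionally, each engine can be smoke-tested with trtexec, which deserializes the engine and times it on random inputs (this assumes trtexec from the TensorRT installation is on PATH; engines with dynamic shapes may also need --shapes arguments):

    trtexec --loadEngine=output/waveglow_fp16.engine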

  8. Run TTS inference pipeline with fp16:

    python trt/inference_trt.py -i phrases/phrase.txt --encoder output/encoder_fp16.engine --decoder output/decoder_iter_fp16.engine --postnet output/postnet_fp16.engine --waveglow output/waveglow_fp16.engine -o output/ --fp16
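
    The synthesized audio is written to the directory passed with -o; a quick way to confirm that the run produced output (exact filenames are determined by the script):

    ls -l output/*.wav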
    

Inference performance: NVIDIA T4

Our results were obtained by running the ./trt/run_latency_tests_trt.sh script in the PyTorch 19.11-py3 NGC container. Note that to reproduce the results, you need to provide pretrained checkpoints for Tacotron 2 and WaveGlow, and edit the script to set your checkpoint filenames. For all tests in this table, we used WaveGlow with 256 residual channels.
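
For example, after editing the checkpoint filenames in the script, the tests can be launched from the repository root:

    bash trt/run_latency_tests_trt.sh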

| Framework | Batch size | Input length | Precision | Avg latency (s) | Latency std (s) | Latency confidence interval 90% (s) | Latency confidence interval 95% (s) | Latency confidence interval 99% (s) | Throughput (samples/sec) | Speed-up PyT+TRT/PyT | Avg mels generated (81 mels = 1 sec of speech) | Avg audio length (s) | Avg RTF |
|-----------|------------|--------------|-----------|-----------------|------------------|-------------------------------------|-------------------------------------|-------------------------------------|--------------------------|----------------------|------------------------------------------------|----------------------|---------|
| PyT+TRT   | 1          | 128          | FP16      | 1.02            | 0.05             | 1.09                                | 1.10                                | 1.14                                | 150,439                  | 1.59                 | 602                                            | 6.99                 | 6.86    |
| PyT       | 1          | 128          | FP16      | 1.63            | 0.07             | 1.71                                | 1.73                                | 1.81                                | 94,758                   | 1.00                 | 601                                            | 6.98                 | 4.30    |