Model Preparation

Clone the repository

git clone
cd DeepLearningExamples

You will build our ConversationalAI in the Tacotron2 folder:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai

Download checkpoints

Download the PyTorch checkpoints from NGC:


Move the downloaded checkpoints to the models directory:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai
mv nvidia_tacotron2pyt_fp16_20190427 nvidia_waveglow256pyt_fp16 models/

Prepare Jasper

First, let's generate a TensorRT engine for Jasper using TensorRT version 7.

Download the Jasper checkpoint from NGC and move it to the Jasper/checkpoints/ directory:

mkdir -p DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/checkpoints
mv DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/checkpoints

Apply a patch to enable support of TensorRT 7:

cd DeepLearningExamples/ 
git apply --ignore-space-change --reject --whitespace=fix ../patch_jasper_trt7

Now, build a container for Jasper:

cd DeepLearningExamples/PyTorch/SpeechRecognition/Jasper/
bash trt/scripts/docker/

To run the container, type:

cd DeepLearningExamples/PyTorch/SpeechRecognition/Jasper
export JASPER_DIR=${PWD}
export DATA_DIR=$JASPER_DIR/data/
export CHECKPOINT_DIR=$JASPER_DIR/checkpoints/
export RESULT_DIR=$JASPER_DIR/results/
bash trt/scripts/docker/ $DATA_DIR $CHECKPOINT_DIR $RESULT_DIR
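
Before launching the container, it may help to make sure the three host directories exist, since Docker typically creates a missing host path for a `-v` bind mount as a root-owned directory. The following is a minimal sketch; the helper name `ensure_dirs` is hypothetical and not part of the repository.

```shell
# Sketch: create each directory passed as an argument and report it.
# ensure_dirs is a hypothetical helper, not a script shipped with Jasper.
ensure_dirs() {
    local dir
    for dir in "$@"; do
        # Stop at the first directory that cannot be created.
        mkdir -p "$dir" || return 1
        echo "ready: $dir"
    done
}

# Example (uses the variables exported above):
#   ensure_dirs "$DATA_DIR" "$CHECKPOINT_DIR" "$RESULT_DIR"
```

Running it with the exported variables before the launch command keeps the results directory writable by the current user.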

Inside the container, export the Jasper TensorRT engine by executing:

mkdir -p /results/onnxs/ /results/engines/
cd /jasper
python trt/ --batch_size 1 --engine_batch_size 1 --model_toml configs/jasper10x5dr_nomask.toml --ckpt_path /checkpoints/ --trt_fp16 --pyt_fp16 --engine_path /results/engines/fp16_DYNAMIC.engine --onnx_path /results/onnxs/fp32_DYNAMIC.onnx --seq_len 3600 --make_onnx

After successful export, copy the engine to model_repo:

cd DeepLearningExamples/PyTorch
mkdir -p SpeechSynthesis/Tacotron2/notebooks/conversationalai/model_repo/jasper-trt/1
cp SpeechRecognition/Jasper/results/engines/fp16_DYNAMIC.engine SpeechSynthesis/Tacotron2/notebooks/conversationalai/model_repo/jasper-trt/1/jasper_fp16.engine
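
A failed export can leave a missing or zero-byte engine file, so a quick check after the copy may save debugging time later. This is a sketch only; `require_nonempty` is a hypothetical helper name, and the check says nothing about whether the engine itself is valid.

```shell
# Sketch: succeed only if the given path exists and has a size greater than zero.
# require_nonempty is a hypothetical helper, not part of the repository.
require_nonempty() {
    local path="$1"
    if [ -s "$path" ]; then
        echo "ok: $path"
    else
        echo "missing or empty: $path" >&2
        return 1
    fi
}

# Example (path taken from the copy step above):
#   require_nonempty SpeechSynthesis/Tacotron2/notebooks/conversationalai/model_repo/jasper-trt/1/jasper_fp16.engine
```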

You will also need the Jasper feature extractor and decoder. Download them from NGC and move them to the model_repo:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/model_repo/
mkdir -p jasper-decoder/1 jasper-feature-extractor/1
wget -P jasper-decoder/
wget -P jasper-decoder/1/
wget -P jasper-feature-extractor/
wget -P jasper-feature-extractor/1/

Prepare BERT

With the generated Jasper model, we can proceed to BERT.

Download the BERT checkpoint from NGC and move it to the BERT/checkpoints/ directory:

mkdir -p DeepLearningExamples/PyTorch/LanguageModeling/BERT/checkpoints/
mv DeepLearningExamples/PyTorch/LanguageModeling/BERT/checkpoints/

Now, build a container for BERT:

cd PyTorch/LanguageModeling/BERT/
bash scripts/docker/

Use the Triton export script to convert the model checkpoints/ to ONNX:

bash triton/

The model will be saved in results/triton_models/bertQA-onnx, together with the Triton configuration file. Copy the model and configuration file to the model_repo:

cd DeepLearningExamples
cp -r PyTorch/LanguageModeling/BERT/results/triton_models/bertQA-onnx PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/model_repo/

Prepare Tacotron 2 and WaveGlow

Now to the final part: the TTS system.

Download the Tacotron 2 and WaveGlow checkpoints from NGC and move them to the Tacotron2/checkpoints/ directory:

mkdir -p DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/checkpoints/
mv nvidia_tacotron2pyt_fp16_20190427 nvidia_waveglow256pyt_fp16 DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/checkpoints/

Build the Tacotron 2 container:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/
bash scripts/docker/

Run the container in interactive mode by typing:

bash scripts/docker/

Export Tacotron 2 to TorchScript:

cd /workspace/tacotron2/
mkdir -p output
python exports/ --tacotron2 checkpoints/nvidia_tacotron2pyt_fp16_20190427 -o output/ --amp-run

To export WaveGlow to TensorRT 7, install ONNX-TRT:

cd /workspace && git clone
cd /workspace/onnx-tensorrt/ && git submodule update --init --recursive
cd /workspace/onnx-tensorrt && mkdir -p build
cd /workspace/onnx-tensorrt/build && cmake .. -DCMAKE_CXX_FLAGS=-isystem\ /usr/local/cuda/include && make -j12 && make install
cd /workspace/tacotron2

Export WaveGlow to ONNX intermediate representation:

python exports/ --waveglow checkpoints/nvidia_waveglow256pyt_fp16 --wn-channels 256 --amp-run -o output/

Use the exported ONNX IR to generate a TensorRT engine:

python trt/ --waveglow output/waveglow.onnx -o output/ --fp16

After successful export, exit the container and copy the Tacotron 2 model and the WaveGlow engine to model_repo:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/
mkdir -p notebooks/conversationalai/model_repo/tacotron2/1/ notebooks/conversationalai/model_repo/waveglow-trt/1/
cp output/ notebooks/conversationalai/model_repo/tacotron2/1/
cp output/waveglow_fp16.engine notebooks/conversationalai/model_repo/waveglow-trt/1/
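
At this point the model_repo should hold one directory per model prepared in the sections above. The sketch below lists any that are missing. The function name `check_model_repo` is hypothetical, and the check covers only the top-level model directories named in this guide, not their contents.

```shell
# Sketch: print every expected model directory that is absent from the repository.
# Returns 0 when all are present, 1 otherwise.
# check_model_repo is a hypothetical helper; the directory names come from the steps above.
check_model_repo() {
    local repo="$1"
    local missing=0
    local model
    for model in jasper-trt jasper-decoder jasper-feature-extractor \
                 bertQA-onnx tacotron2 waveglow-trt; do
        if [ ! -d "$repo/$model" ]; then
            echo "missing: $model"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_model_repo notebooks/conversationalai/model_repo
```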


With all models ready for deployment, go to the conversationalai/client folder and build the Triton client:

cd DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client
docker build -f Dockerfile --network=host -t speech_ai_client:demo .

From a terminal, start the Triton server:

NV_GPU=1 nvidia-docker run --ipc=host --network=host --rm -p8000:8000 -p8001:8001 \
-v /home/gkarch/dev/gtc2020/speechai/model_repo/:/models trtserver --model-store=/models --log-verbose 1
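
The server can take a while to load all models, so the client may fail if it is started too early. One option is to poll until the server answers. The `retry` helper below is a hypothetical sketch, and the health endpoint in the usage comment is an assumption about the server's HTTP API that should be checked against the server version in use.

```shell
# Sketch: run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# retry is a hypothetical helper, not part of the repository.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example (ASSUMPTION: the server exposes a readiness endpoint at this URL on port 8000):
#   retry 30 2 curl -sf http://localhost:8000/api/health/ready
```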

In another terminal, run the client:

docker run -it --rm --network=host --device /dev/snd:/dev/snd --device /dev/usb:/dev/usb speech_ai_client:demo bash /workspace/speech_ai_demo/