SSD/TF bug fixing and README updates

2019-11-27 16:45:11 +01:00
parent 437eaba2bc
commit 1164331725
27 changed files with 327 additions and 52 deletions
--- a/TensorFlow/Detection/SSD/README.md
+++ b/TensorFlow/Detection/SSD/README.md
@@ -3,40 +3,41 @@
 This repository provides a script and recipe to train SSD320 v1.2 to achieve state of the art accuracy, and is tested and maintained by NVIDIA.

 ## Table Of Contents
-* [The model](#the-model)
+* [Model overview](#model-overview)
  * [Default configuration](#default-configuration)
 * [Setup](#setup)
  * [Requirements](#requirements)
-* [Quick start guide](#quick-start-guide)
-* [Details](#details)
+* [Quick Start Guide](#quick-start-guide)
+* [Advanced](#advanced)
  * [Command line options](#command-line-options)
  * [Getting the data](#getting-the-data)
  * [Training process](#training-process)
    * [Data preprocessing](#data-preprocessing)
    * [Data augmentation](#data-augmentation)
  * [Enabling mixed precision](#enabling-mixed-precision)
-* [Benchmarking](#benchmarking)
-  * [Training performance benchmark](#training-performance-benchmark)
-  * [Inference performance benchmark](#inference-performance-benchmark)
-* [Results](#results)
-  * [Training accuracy results](#training-accuracy-results)
-  * [Training performance results](#training-performance-results)
-  * [Inference performance results](#inference-performance-results)
-* [Changelog](#changelog)
-* [Known issues](#known-issues)
+* [Performance](#performance)
+  * [Benchmarking](#benchmarking)
+    * [Training performance benchmark](#training-performance-benchmark)
+    * [Inference performance benchmark](#inference-performance-benchmark)
+  * [Results](#results)
+    * [Training accuracy results](#training-accuracy-results)
+    * [Training performance results](#training-performance-results)
+    * [Inference performance results](#inference-performance-results)
+* [Release notes](#release-notes)
+  * [Changelog](#changelog)
+  * [Known issues](#known-issues)

-## The model
+## Model overview

 The SSD320 v1.2 model is based on the [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper, which describes SSD as “a method for detecting objects in images using a single deep neural network”.

-We have altered the network in order to improve accuracy and increase throughput. Changes we have made include:
+Our implementation is based on the existing [model from the TensorFlow models repository](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config).
+The network was altered in order to improve accuracy and increase throughput. Changes include:
 - Replacing the VGG backbone with the more popular ResNet50.
 - Adding multi-scale detection to the backbone using [Feature Pyramid Networks](https://arxiv.org/pdf/1612.03144.pdf).
 - Replacing the original hard negative mining loss function with [Focal Loss](https://arxiv.org/pdf/1708.02002.pdf).
 - Decreasing the input size to 320 x 320.

-Our implementation is based on the existing [model from the TensorFlow models repository](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config).
-
 This model trains with mixed precision tensor cores on NVIDIA Volta GPUs, therefore you can get results much faster than training without tensor cores. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.

 The following features were implemented in this model:
@@ -160,7 +161,7 @@ If you want to run inference with tensor cores acceleration, run:
 bash examples/SSD320_evaluate.sh <path to checkpoint>
 ```

-## Details
+## Advanced

 The following sections provide greater details of the dataset, running training and inference, and the training results.

@@ -230,10 +231,12 @@ For information about:
 - How to access and enable AMP for TensorFlow, see [Using TF-AMP](https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tfamp) from the TensorFlow User Guide.
 - Techniques used for mixed precision training, see the [Mixed-Precision Training of Deep Neural Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) blog.

-## Benchmarking
+## Performance
+
+### Benchmarking
 The following section shows how to run benchmarks measuring the model performance in training and inference modes.

-### Training performance benchmark
+#### Training performance benchmark
 Training benchmark was run in various scenarios on V100 16G GPU. For each scenario, batch size was set to 32. 

 To benchmark training, run:
@@ -247,7 +250,7 @@ Where the `{NGPU}` defines number of GPUs used in benchmark, and the `{PREC}` de
 The benchmark runs training with only 1200 steps and computes average training speed of last 300 steps.


-### Inference performance benchmark
+#### Inference performance benchmark
 Inference benchmark was run with various batch-sizes on V100 16G GPU.
 For inference we are using single GPU setting. Examples are taken from the validation dataset.

@@ -267,11 +270,11 @@ We were using default values for the extra arguments during the experiments. For
 bash examples/SSD320_FP16_inference.sh --help
 ```

-## Results
+### Results

 The following sections provide details on how we achieved our performance and accuracy in training and inference.

-### Training accuracy results
+#### Training accuracy results
 Our results were obtained by running the `./examples/SSD320_FP{16,32}_{1,4,8}GPU.sh` script in the TensorFlow-19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
 All the results are obtained with batch size set to 32.

@@ -288,7 +291,7 @@ Here are example graphs of FP32 and FP16 training on 8 GPU configuration:

 ![ValidationAccuracy](./img/validation_accuracy.png)

-### Training performance results
+#### Training performance results

 Our results were obtained by running:

@@ -311,7 +314,7 @@ Those results can be improved when [XLA](https://www.tensorflow.org/xla) is used
 in conjunction with mixed precision, delivering up to 2x speedup over FP32 on a single GPU (~179 img/s).
 However XLA is still considered experimental.

-### Inference performance results
+#### Inference performance results

 Our results were obtained by running the `examples/SSD320_FP{16,32}_inference.sh` script in the TensorFlow-19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPUs.

@@ -328,7 +331,9 @@ Our results were obtained by running the `examples/SSD320_FP{16,32}_inference.sh

 To achieve same results, follow the [Quick start guide](#quick-start-guide) outlined above.

-## Changelog
+## Release notes
+
+### Changelog

 March 2019
 * Initial release
--- a/TensorFlow/Detection/SSD/configs/ssd320_bench.config
+++ b/TensorFlow/Detection/SSD/configs/ssd320_bench.config
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
 # loss (a.k.a Retinanet).
 # See Lin et al, https://arxiv.org/abs/1708.02002
--- a/TensorFlow/Detection/SSD/configs/ssd320_full_1gpus.config
+++ b/TensorFlow/Detection/SSD/configs/ssd320_full_1gpus.config
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
 # loss (a.k.a Retinanet).
 # See Lin et al, https://arxiv.org/abs/1708.02002
--- a/TensorFlow/Detection/SSD/configs/ssd320_full_4gpus.config
+++ b/TensorFlow/Detection/SSD/configs/ssd320_full_4gpus.config
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
 # loss (a.k.a Retinanet).
 # See Lin et al, https://arxiv.org/abs/1708.02002
--- a/TensorFlow/Detection/SSD/configs/ssd320_full_8gpus.config
+++ b/TensorFlow/Detection/SSD/configs/ssd320_full_8gpus.config
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 # SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
 # loss (a.k.a Retinanet).
 # See Lin et al, https://arxiv.org/abs/1708.02002
--- a/TensorFlow/Detection/SSD/download_all.sh
+++ b/TensorFlow/Detection/SSD/download_all.sh
@@ -29,7 +29,7 @@ wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
 tar -xzf resnet_v1_50_2016_08_28.tar.gz
 mkdir -p resnet_v1_50
 mv resnet_v1_50.ckpt resnet_v1_50/model.ckpt
-nvidia-docker run --rm -it -u 123 -v $COCO_DIR:/data/coco2017_tfrecords $CONTAINER bash -c '
+docker run --rm -u 123 -v $COCO_DIR:/data/coco2017_tfrecords $CONTAINER bash -c '
 # Create TFRecords
 bash /workdir/models/research/object_detection/dataset_tools/download_and_preprocess_mscoco.sh \
    /data/coco2017_tfrecords'
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_1GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_1GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_1GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_1GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=1
@@ -14,7 +28,7 @@ TRAIN_LOG=$(python -u ./object_detection/model_main.py \
       --model_dir=${CKPT_DIR} \
       --alsologtostder \
       "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "Single GPU mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
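
The replacement `PERF` pipeline above extracts every `global_step/sec` value from the training log with `sed`, keeps the last quarter of them, and multiplies their mean by `32*$GPUS`. The removed `awk` one-liner instead read the second whitespace-separated field and scaled the tail sum by `4/num`. A minimal standalone sketch of the new calculation follows. The function name `images_per_second`, the parameter `batch_per_gpu`, and the sample log line format are illustrative assumptions, not part of the commit.

```python
import re

# Mirrors the sed expression in the script: capture the token that follows
# "global_step/sec: " on each log line.
STEP_RATE = re.compile(r"global_step/sec: (\S+)")


def images_per_second(log_text, gpus, batch_per_gpu=32):
    """Mean throughput over the last quarter of logged step rates.

    batch_per_gpu=32 matches the constant hard-coded in the benchmark
    scripts; the parameter itself is a hypothetical convenience.
    """
    rates = [float(m.group(1)) for m in STEP_RATE.finditer(log_text)]
    # Skip warm-up: keep only the final 25% of the measurements.
    tail = rates[int(len(rates) * 3 / 4):]
    if not tail:
        raise ValueError("no global_step/sec entries found in log")
    return batch_per_gpu * gpus * sum(tail) / len(tail)


# Assumed log format, for illustration only.
sample_log = "\n".join(
    "INFO:tensorflow:global_step/sec: %s" % rate
    for rate in ["1.0", "2.0", "3.0", "4.0"]
)
print(images_per_second(sample_log, gpus=1), "img/s")  # prints "128.0 img/s"
```

Unlike the shell one-liner, this sketch raises a clear error when the log contains no step-rate entries rather than failing on a division by zero.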
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_4GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_4GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
 GPUS=4
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_4GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_4GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=4
@@ -24,7 +38,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
               --model_dir=${CKPT_DIR} \
               --alsologtostder \
               "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_8GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_8GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
 GPUS=8
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_8GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_8GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=8
@@ -24,7 +38,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
               --model_dir=${CKPT_DIR} \
               --alsologtostder \
               "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP16_inference.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP16_inference.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

 export TF_ENABLE_AUTO_MIXED_PRECISION=1
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_1GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_1GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_1GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_1GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=1
@@ -12,7 +26,7 @@ TRAIN_LOG=$(python -u ./object_detection/model_main.py \
       --model_dir=${CKPT_DIR} \
       --alsologtostder \
       "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "Single GPU single precision training performance: $PERF" | tee $CKPT_DIR/train_log
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_4GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_4GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
 GPUS=4
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_4GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_4GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=4
@@ -22,7 +36,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
               --model_dir=${CKPT_DIR} \
               --alsologtostder \
               "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_8GPU.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_8GPU.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
 GPUS=8
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_8GPU_BENCHMARK.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_8GPU_BENCHMARK.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
 PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
 GPUS=8
@@ -22,7 +36,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
               --model_dir=${CKPT_DIR} \
               --alsologtostder \
               "${@:3}" 2>&1)
-PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
+PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")

 mkdir -p $CKPT_DIR
 echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
--- a/TensorFlow/Detection/SSD/examples/SSD320_FP32_inference.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_FP32_inference.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

 SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")
--- a/TensorFlow/Detection/SSD/examples/SSD320_evaluate.sh
+++ b/TensorFlow/Detection/SSD/examples/SSD320_evaluate.sh
@@ -1,3 +1,17 @@
+# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 CHECKPINT_DIR=$1

 TENSOR_OPS=0
--- a/TensorFlow/Detection/SSD/models/AUTHORS
+++ b/TensorFlow/Detection/SSD/models/AUTHORS
@@ -8,3 +8,4 @@

 Google Inc.
 David Dao <daviddao@broad.mit.edu>
+NVIDIA Inc.
--- a/TensorFlow/Detection/SSD/models/research/object_detection/core/box_list_ops.py
+++ b/TensorFlow/Detection/SSD/models/research/object_detection/core/box_list_ops.py
@@ -743,9 +743,10 @@ def non_max_suppression(boxlist, thresh, max_output_size, scope=None):
      raise ValueError('boxlist must be a BoxList')
    if not boxlist.has_field('scores'):
      raise ValueError('input boxlist must have \'scores\' field')
-    selected_indices = tf.image.non_max_suppression(
-        boxlist.get(), boxlist.get_field('scores'),
-        max_output_size, iou_threshold=thresh)
+    with tf.device('/CPU:0'):
+        selected_indices = tf.image.non_max_suppression(
+            boxlist.get(), boxlist.get_field('scores'),
+            max_output_size, iou_threshold=thresh)
    return gather(boxlist, selected_indices)
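
This hunk, like the ones in `losses.py` and `post_processing.py`, wraps the non-max suppression call in `tf.device('/CPU:0')`. The commit does not state the motivation. For readers unfamiliar with the op being pinned, here is a rough plain-Python sketch of greedy non-max suppression. The helper names `iou` and `greedy_nms` are hypothetical, and the sketch only approximates what `tf.image.non_max_suppression` computes; it is not the TensorFlow implementation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (ymin, xmin, ymax, xmax)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, max_output_size, iou_threshold):
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box does not
    exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    selected = []
    for i in order:
        if len(selected) >= max_output_size:
            break
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in selected):
            selected.append(i)
    return selected


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# Box 1 overlaps box 0 with IoU 81/119 (about 0.68), so it is suppressed.
print(greedy_nms(boxes, scores, max_output_size=10, iou_threshold=0.5))  # prints [0, 2]
```

The loop is sequential and data-dependent: each decision depends on which boxes were kept before it.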


--- a/TensorFlow/Detection/SSD/models/research/object_detection/core/losses.py
+++ b/TensorFlow/Detection/SSD/models/research/object_detection/core/losses.py
@@ -593,8 +593,9 @@ class HardExampleMiner(object):
        num_hard_examples = self._num_hard_examples
      else:
        num_hard_examples = detection_boxlist.num_boxes()
-      selected_indices = tf.image.non_max_suppression(
-          box_locations, image_losses, num_hard_examples, self._iou_threshold)
+      with tf.device('/CPU:0'):
+          selected_indices = tf.image.non_max_suppression(
+              box_locations, image_losses, num_hard_examples, self._iou_threshold)
      if self._max_negatives_per_positive is not None and match:
        (selected_indices, num_positives,
         num_negatives) = self._subsample_selection_to_desired_neg_pos_ratio(
--- a/TensorFlow/Detection/SSD/models/research/object_detection/core/post_processing.py
+++ b/TensorFlow/Detection/SSD/models/research/object_detection/core/post_processing.py
@@ -151,23 +151,25 @@ def multiclass_non_max_suppression(boxes,

      if pad_to_max_output_size:
        max_selection_size = max_size_per_class
-        selected_indices, num_valid_nms_boxes = (
-            tf.image.non_max_suppression_padded(
+        with tf.device('/CPU:0'):
+            selected_indices, num_valid_nms_boxes = (
+                tf.image.non_max_suppression_padded(
+                    boxlist_and_class_scores.get(),
+                    boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
+                    max_selection_size,
+                    iou_threshold=iou_thresh,
+                    score_threshold=score_thresh,
+                    pad_to_max_output_size=True))
+      else:
+        max_selection_size = tf.minimum(max_size_per_class,
+                                        boxlist_and_class_scores.num_boxes())
+        with tf.device('/CPU:0'):
+            selected_indices = tf.image.non_max_suppression(
                boxlist_and_class_scores.get(),
                boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
                max_selection_size,
                iou_threshold=iou_thresh,
-                score_threshold=score_thresh,
-                pad_to_max_output_size=True))
-      else:
-        max_selection_size = tf.minimum(max_size_per_class,
-                                        boxlist_and_class_scores.num_boxes())
-        selected_indices = tf.image.non_max_suppression(
-            boxlist_and_class_scores.get(),
-            boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
-            max_selection_size,
-            iou_threshold=iou_thresh,
-            score_threshold=score_thresh)
+                score_threshold=score_thresh)
        num_valid_nms_boxes = tf.shape(selected_indices)[0]
        selected_indices = tf.concat(
            [selected_indices,
--- a/TensorFlow/Detection/SSD/models/research/object_detection/model_main.py
+++ b/TensorFlow/Detection/SSD/models/research/object_detection/model_main.py
@@ -82,7 +82,7 @@ def main(unused_argv):
  flags.mark_flag_as_required('model_dir')
  flags.mark_flag_as_required('pipeline_config_path')
  session_config = tf.ConfigProto()
-  session_config.gpu_options.allow_growth = True
+  session_config.gpu_options.per_process_gpu_memory_fraction=0.9
  session_config.gpu_options.visible_device_list = str(hvd.local_rank())
  if FLAGS.allow_xla:
      session_config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
--- a/TensorFlow/Detection/SSD/requirements.txt
+++ b/TensorFlow/Detection/SSD/requirements.txt
@@ -1,3 +1,2 @@
-pillow==6.2.0
 pycocotools==2.0.0
 contextlib2