SSD/TF bug fixing and README updates

This commit is contained in:
Przemek Strzelczyk 2019-11-27 16:45:11 +01:00
parent 437eaba2bc
commit 1164331725
27 changed files with 327 additions and 52 deletions

View file

@ -3,40 +3,41 @@
This repository provides a script and recipe to train SSD320 v1.2 to achieve state of the art accuracy, and is tested and maintained by NVIDIA.
## Table Of Contents
* [The model](#the-model)
* [Model overview](#model-overview)
* [Default configuration](#default-configuration)
* [Setup](#setup)
* [Requirements](#requirements)
* [Quick start guide](#quick-start-guide)
* [Details](#details)
* [Quick Start Guide](#quick-start-guide)
* [Advanced](#advanced)
* [Command line options](#command-line-options)
* [Getting the data](#getting-the-data)
* [Training process](#training-process)
* [Data preprocessing](#data-preprocessing)
* [Data augmentation](#data-augmentation)
* [Enabling mixed precision](#enabling-mixed-precision)
* [Benchmarking](#benchmarking)
* [Training performance benchmark](#training-performance-benchmark)
* [Inference performance benchmark](#inference-performance-benchmark)
* [Results](#results)
* [Training accuracy results](#training-accuracy-results)
* [Training performance results](#training-performance-results)
* [Inference performance results](#inference-performance-results)
* [Changelog](#changelog)
* [Known issues](#known-issues)
* [Performance](#performance)
* [Benchmarking](#benchmarking)
* [Training performance benchmark](#training-performance-benchmark)
* [Inference performance benchmark](#inference-performance-benchmark)
* [Results](#results)
* [Training accuracy results](#training-accuracy-results)
* [Training performance results](#training-performance-results)
* [Inference performance results](#inference-performance-results)
* [Release notes](#release-notes)
* [Changelog](#changelog)
* [Known issues](#known-issues)
## The model
## Model overview
The SSD320 v1.2 model is based on the [SSD: Single Shot MultiBox Detector](https://arxiv.org/abs/1512.02325) paper, which describes SSD as “a method for detecting objects in images using a single deep neural network”.
We have altered the network in order to improve accuracy and increase throughput. Changes we have made include:
Our implementation is based on the existing [model from the TensorFlow models repository](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config).
The network was altered in order to improve accuracy and increase throughput. Changes include:
- Replacing the VGG backbone with the more popular ResNet50.
- Adding multi-scale detection to the backbone using [Feature Pyramid Networks](https://arxiv.org/pdf/1612.03144.pdf).
- Replacing the original hard negative mining loss function with [Focal Loss](https://arxiv.org/pdf/1708.02002.pdf).
- Decreasing the input size to 320 x 320.
Our implementation is based on the existing [model from the TensorFlow models repository](https://github.com/tensorflow/models/blob/master/research/object_detection/samples/configs/ssd_resnet50_v1_fpn_shared_box_predictor_640x640_coco14_sync.config).
This model trains with mixed precision tensor cores on NVIDIA Volta GPUs, therefore you can get results much faster than training without tensor cores. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
The following features were implemented in this model:
@ -160,7 +161,7 @@ If you want to run inference with tensor cores acceleration, run:
bash examples/SSD320_evaluate.sh <path to checkpoint>
```
## Details
## Advanced
The following sections provide greater details of the dataset, running training and inference, and the training results.
@ -230,10 +231,12 @@ For information about:
- How to access and enable AMP for TensorFlow, see [Using TF-AMP](https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tfamp) from the TensorFlow User Guide.
- Techniques used for mixed precision training, see the [Mixed-Precision Training of Deep Neural Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) blog.
## Benchmarking
## Performance
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
### Training performance benchmark
#### Training performance benchmark
Training benchmark was run in various scenarios on V100 16G GPU. For each scenario, batch size was set to 32.
To benchmark training, run:
@ -247,7 +250,7 @@ Where the `{NGPU}` defines number of GPUs used in benchmark, and the `{PREC}` de
The benchmark runs training with only 1200 steps and computes average training speed of last 300 steps.
### Inference performance benchmark
#### Inference performance benchmark
Inference benchmark was run with various batch-sizes on V100 16G GPU.
For inference we are using single GPU setting. Examples are taken from the validation dataset.
@ -267,11 +270,11 @@ We were using default values for the extra arguments during the experiments. For
bash examples/SSD320_FP16_inference.sh --help
```
## Results
### Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
### Training accuracy results
#### Training accuracy results
Our results were obtained by running the `./examples/SSD320_FP{16,32}_{1,4,8}GPU.sh` script in the TensorFlow-19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
All the results are obtained with batch size set to 32.
@ -288,7 +291,7 @@ Here are example graphs of FP32 and FP16 training on 8 GPU configuration:
![ValidationAccuracy](./img/validation_accuracy.png)
### Training performance results
#### Training performance results
Our results were obtained by running:
@ -311,7 +314,7 @@ Those results can be improved when [XLA](https://www.tensorflow.org/xla) is used
in conjunction with mixed precision, delivering up to 2x speedup over FP32 on a single GPU (~179 img/s).
However XLA is still considered experimental.
### Inference performance results
#### Inference performance results
Our results were obtained by running the `examples/SSD320_FP{16,32}_inference.sh` script in the TensorFlow-19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPUs.
@ -328,7 +331,9 @@ Our results were obtained by running the `examples/SSD320_FP{16,32}_inference.sh
To achieve same results, follow the [Quick start guide](#quick-start-guide) outlined above.
## Changelog
## Release notes
### Changelog
March 2019
* Initial release

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# SSD with Resnet 50 v1 FPN feature extractor, shared box predictor and focal
# loss (a.k.a Retinanet).
# See Lin et al, https://arxiv.org/abs/1708.02002

View file

@ -29,7 +29,7 @@ wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -xzf resnet_v1_50_2016_08_28.tar.gz
mkdir -p resnet_v1_50
mv resnet_v1_50.ckpt resnet_v1_50/model.ckpt
nvidia-docker run --rm -it -u 123 -v $COCO_DIR:/data/coco2017_tfrecords $CONTAINER bash -c '
docker run --rm -u 123 -v $COCO_DIR:/data/coco2017_tfrecords $CONTAINER bash -c '
# Create TFRecords
bash /workdir/models/research/object_detection/dataset_tools/download_and_preprocess_mscoco.sh \
/data/coco2017_tfrecords'

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
@ -14,7 +28,7 @@ TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "Single GPU mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
@ -24,7 +38,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
@ -24,7 +38,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
export TF_ENABLE_AUTO_MIXED_PRECISION=1

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
@ -12,7 +26,7 @@ TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "Single GPU single precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
@ -22,7 +36,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
@ -22,7 +36,7 @@ TRAIN_LOG=$(mpirun --allow-run-as-root \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
PERF=$(echo "$TRAIN_LOG" | sed -n 's|.*global_step/sec: \(\S\+\).*|\1|p' | python -c "import sys; x = sys.stdin.readlines(); x = [float(a) for a in x[int(len(x)*3/4):]]; print(32*$GPUS*sum(x)/len(x), 'img/s')")
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")

View file

@ -1,3 +1,17 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
CHECKPINT_DIR=$1
TENSOR_OPS=0

View file

@ -8,3 +8,4 @@
Google Inc.
David Dao <daviddao@broad.mit.edu>
NVIDIA Inc.

View file

@ -743,9 +743,10 @@ def non_max_suppression(boxlist, thresh, max_output_size, scope=None):
raise ValueError('boxlist must be a BoxList')
if not boxlist.has_field('scores'):
raise ValueError('input boxlist must have \'scores\' field')
selected_indices = tf.image.non_max_suppression(
boxlist.get(), boxlist.get_field('scores'),
max_output_size, iou_threshold=thresh)
with tf.device('/CPU:0'):
selected_indices = tf.image.non_max_suppression(
boxlist.get(), boxlist.get_field('scores'),
max_output_size, iou_threshold=thresh)
return gather(boxlist, selected_indices)

View file

@ -593,8 +593,9 @@ class HardExampleMiner(object):
num_hard_examples = self._num_hard_examples
else:
num_hard_examples = detection_boxlist.num_boxes()
selected_indices = tf.image.non_max_suppression(
box_locations, image_losses, num_hard_examples, self._iou_threshold)
with tf.device('/CPU:0'):
selected_indices = tf.image.non_max_suppression(
box_locations, image_losses, num_hard_examples, self._iou_threshold)
if self._max_negatives_per_positive is not None and match:
(selected_indices, num_positives,
num_negatives) = self._subsample_selection_to_desired_neg_pos_ratio(

View file

@ -151,23 +151,25 @@ def multiclass_non_max_suppression(boxes,
if pad_to_max_output_size:
max_selection_size = max_size_per_class
selected_indices, num_valid_nms_boxes = (
tf.image.non_max_suppression_padded(
with tf.device('/CPU:0'):
selected_indices, num_valid_nms_boxes = (
tf.image.non_max_suppression_padded(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh,
pad_to_max_output_size=True))
else:
max_selection_size = tf.minimum(max_size_per_class,
boxlist_and_class_scores.num_boxes())
with tf.device('/CPU:0'):
selected_indices = tf.image.non_max_suppression(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh,
pad_to_max_output_size=True))
else:
max_selection_size = tf.minimum(max_size_per_class,
boxlist_and_class_scores.num_boxes())
selected_indices = tf.image.non_max_suppression(
boxlist_and_class_scores.get(),
boxlist_and_class_scores.get_field(fields.BoxListFields.scores),
max_selection_size,
iou_threshold=iou_thresh,
score_threshold=score_thresh)
score_threshold=score_thresh)
num_valid_nms_boxes = tf.shape(selected_indices)[0]
selected_indices = tf.concat(
[selected_indices,

View file

@ -82,7 +82,7 @@ def main(unused_argv):
flags.mark_flag_as_required('model_dir')
flags.mark_flag_as_required('pipeline_config_path')
session_config = tf.ConfigProto()
session_config.gpu_options.allow_growth = True
session_config.gpu_options.per_process_gpu_memory_fraction=0.9
session_config.gpu_options.visible_device_list = str(hvd.local_rank())
if FLAGS.allow_xla:
session_config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

View file

@ -1,3 +1,2 @@
pillow==6.2.0
pycocotools==2.0.0
contextlib2