Changes in TF models:

* added UNet for medical image segmentation
* added TF-AMP support for RN50
* small updates for other models (READMEs, benchmark & testing scripts)
Przemek Strzelczyk 2019-05-25 01:23:11 +02:00
parent 2d33a72240
commit d2bc3da0a1
124 changed files with 4252 additions and 224 deletions

View file

@ -18,7 +18,9 @@ The examples are organized first by framework, such as TensorFlow, PyTorch, etc.
- __ResNet-50__ [[MXNet](https://github.com/NVIDIA/DeepLearningExamples/tree/master/MxNet/Classification/RN50v1.5)] [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/RN50v1.5)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/RN50v1.5)]
- __SSD__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Detection/SSD)]
- __Mask R-CNN__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Segmentation/MaskRCNN)]
- __U-Net__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial)]
- __U-Net(industrial)__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial)]
- __U-Net(medical)__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Medical)]
### Natural Language Processing
- __GNMT__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/GNMT)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Translation/GNMT)]

View file

@ -1,4 +1,4 @@
FROM nvcr.io/nvidia/tensorflow:19.01-py3
FROM nvcr.io/nvidia/tensorflow:19.05-py3
## MAINTAINER Paweł Sołtysiak <psoltysiak@nvidia.com>
ADD . /workspace/rn50v15_tf

View file

@ -1,11 +1,202 @@
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
1. Definitions.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View file

@ -267,10 +267,10 @@ To control warmup and benchmark length, use `--warmup_steps`, `--num_iter` and `
To benchmark the inference performance on a specific batch size, run:
* FP32
`python ./main.py --mode=inference_benchmark --precision=fp32 --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
`python ./main.py --mode=inference_benchmark --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
* FP16
`python ./main.py --mode=inference_benchmark --precision=fp16 --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
`python ./main.py --mode=inference_benchmark --use_tf_amp --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
Each of these scripts, by default, runs 20 warm-up iterations and measures the next 80 iterations.
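As a rough illustration of the warm-up handling described above (20 warm-up iterations, 80 measured), a sketch of how per-iteration timings translate into the reported img/s; the function and its inputs are illustrative, not part of the repository:

```python
def throughput_img_per_s(iteration_times_s, batch_size, warmup_steps=20):
    # Drop the warm-up iterations, then average over the measured ones.
    measured = iteration_times_s[warmup_steps:]
    return batch_size * len(measured) / sum(measured)
```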
@ -307,9 +307,6 @@ Our results were obtained by running the `./scripts/benchmarking/DGX1V_trainbenc
Our results were obtained by running the `./scripts/benchmarking/DGX1V_inferbench_fp16.sh` and `./scripts/benchmarking/DGX1V_inferbench_fp32.sh` scripts in the tensorflow-19.02-py3 Docker container on NVIDIA DGX-1 with 8 V100 16G GPUs.
Those results can be improved when [XLA](https://www.tensorflow.org/xla) is used
in conjunction with mixed precision, delivering up to 3.3x speedup over FP32 on a single GPU (~1179 img/s).
However XLA is still considered experimental.
## Inference performance results
@ -331,5 +328,9 @@ However XLA is still considered experimental.
1. March 1, 2019
* Initial release
2. May 23, 2019
* TF-AMP support added
* Benchmark scripts updated
# Known issues
There are no known issues with this model.

View file

@ -1,4 +1,4 @@
# !/usr/bin/env python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@ -63,13 +63,12 @@ if __name__ == "__main__":
weight_decay=FLAGS.weight_decay,
momentum=FLAGS.momentum,
loss_scale=FLAGS.loss_scale,
use_auto_loss_scaling=FLAGS.use_auto_loss_scaling,
use_static_loss_scaling=FLAGS.use_static_loss_scaling,
distort_colors=False,
# ======= Optimization HParams ======== #
use_xla=FLAGS.use_xla,
use_tf_amp=FLAGS.use_tf_amp,
use_fast_math=FLAGS.use_fast_math,
seed=FLAGS.seed,
)
@ -93,7 +92,6 @@ if __name__ == "__main__":
# ======= Optimization HParams ======== #
use_xla=RUNNING_CONFIG.use_xla,
use_tf_amp=RUNNING_CONFIG.use_tf_amp,
use_fast_math=RUNNING_CONFIG.use_fast_math,
seed=RUNNING_CONFIG.seed
)
@ -110,7 +108,7 @@ if __name__ == "__main__":
learning_rate_init=RUNNING_CONFIG.learning_rate_init,
momentum=RUNNING_CONFIG.momentum,
loss_scale=RUNNING_CONFIG.loss_scale,
use_auto_loss_scaling=FLAGS.use_auto_loss_scaling,
use_static_loss_scaling=FLAGS.use_static_loss_scaling,
is_benchmark=RUNNING_CONFIG.mode == 'training_benchmark',
)

View file

@ -56,7 +56,6 @@ class Runner(object):
# ======= Optimization HParams ======== #
use_xla=False,
use_tf_amp=False,
use_fast_math=False,
# ======== Debug Flags ======== #
debug_verbosity=0,
@ -105,34 +104,19 @@ class Runner(object):
os.environ['TF_DISABLE_NVTX_RANGES'] = '1'
# ============================================
# TF-AMP and Fast Math Setup - Do not remove
# TF-AMP Setup - Do not remove
# ============================================
if dtype == tf.float16:
if use_fast_math:
raise RuntimeError("Fast Math can not be activated for FP16 precision")
if use_tf_amp:
raise RuntimeError("TF AMP can not be activated for FP16 precision")
elif use_fast_math and use_tf_amp:
raise RuntimeError("TF AMP and Fast Math can not be activated simultaneously")
else:
if use_fast_math:
if hvd.rank() == 0:
LOGGER.log("Fast Math computation is activated - Experimental Feature")
os.environ["TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32"] = "1"
os.environ["TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32"] = "1"
os.environ["TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32"] = "1"
elif use_tf_amp:
if hvd.rank() == 0:
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
elif use_tf_amp:
if hvd.rank() == 0:
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
# =================================================
@ -150,7 +134,6 @@ class Runner(object):
run_config_performance = tf.contrib.training.HParams(
num_preprocessing_threads=32,
use_fast_math=use_fast_math,
use_tf_amp=use_tf_amp,
use_xla=use_xla,
)
@ -159,7 +142,7 @@ class Runner(object):
model_dir=model_dir if not hvd_utils.is_using_hvd() or hvd.rank() == 0 else None,
log_dir=log_dir if not hvd_utils.is_using_hvd() or hvd.rank() == 0 else None,
data_dir=data_dir,
num_preprocessing_threads=32,
num_preprocessing_threads=16,
)
self.run_hparams = Runner._build_hparams(model_hparams, run_config_additional, run_config_performance)
@ -311,7 +294,7 @@ class Runner(object):
momentum=0.9,
log_every_n_steps=1,
loss_scale=256,
use_auto_loss_scaling=False,
use_static_loss_scaling=False,
is_benchmark=False
):
@ -321,15 +304,14 @@ class Runner(object):
if self.run_hparams.data_dir is None and not is_benchmark:
raise ValueError('`data_dir` must be specified for training!')
if self.run_hparams.use_fast_math or self.run_hparams.use_tf_amp or self.run_hparams.dtype == tf.float16:
if use_auto_loss_scaling:
if self.run_hparams.use_tf_amp or self.run_hparams.dtype == tf.float16:
if use_static_loss_scaling:
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "0"
else:
LOGGER.log("TF Loss Auto Scaling is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "1"
else:
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "0"
else:
use_auto_loss_scaling = False # Make sure it hasn't been set to True on FP32 training
use_static_loss_scaling = False # Make sure it hasn't been set to True on FP32 training
num_gpus = 1 if not hvd_utils.is_using_hvd() else hvd.size()
global_batch_size = batch_size * num_gpus
@ -407,7 +389,7 @@ class Runner(object):
'learning_rate_init': learning_rate_init,
'weight_decay': weight_decay,
'loss_scale': loss_scale,
'apply_loss_scaling': not use_auto_loss_scaling
'apply_loss_scaling': use_static_loss_scaling
}
image_classifier = self._get_estimator(
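For readers skimming the RN50 runner diff above, a condensed sketch of the new TF-AMP wiring: the environment-variable names are taken verbatim from the diff, while the helper function itself is illustrative.

```python
import os
import tensorflow as tf

def configure_tf_amp(dtype, use_tf_amp, use_static_loss_scaling):
    # Condensed from the runner changes above; the environment variables
    # must be set before the TensorFlow graph is built.
    if dtype == tf.float16:
        if use_tf_amp:
            raise RuntimeError("TF-AMP can not be activated for FP16 precision")
    elif use_tf_amp:
        # Enable the automatic mixed-precision graph rewrite.
        os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
        # Static loss scaling turns the automatic loss-scaling pass off.
        os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = (
            "0" if use_static_loss_scaling else "1"
        )
```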

View file

@ -0,0 +1,19 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches ResNet50 training in FP16 on 16 GPUs using 4096 batch size (256 per GPU)
# Usage ./RN50_FP16_16GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpiexec --allow-run-as-root --bind-to socket -np 16 \
python $1/main.py --num_iter=90 --iter_unit=epoch --data_dir=$2 --batch_size=256 --use_tf_amp --results_dir=$3

View file

@ -0,0 +1,19 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches ResNet50 training in FP32 on 16 GPUs using 2048 batch size (128 per GPU)
# Usage ./RN50_FP32_16GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpiexec --allow-run-as-root --bind-to socket -np 16 \
python $1/main.py --num_iter=90 --iter_unit=epoch --data_dir=$2 --batch_size=128 --results_dir=$3

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --use_tf_amp --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_infer_fp16.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 192 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --perf_args "use_tf_amp" "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 96 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --use_tf_amp --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_train_fp16.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp16.json --data_dir $1 --perf_args "use_tf_amp" --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 192 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp16.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 96 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp16.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,51 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"16": {
"total_ips": 1300.0
},
"32": {
"total_ips": 1600.0
},
"1": {
"total_ips": 160.0
},
"2": {
"total_ips": 320.0
},
"64": {
"total_ips": 1800.0
},
"4": {
"total_ips": 550.0
},
"128": {
"total_ips": 1950.0
},
"8": {
"total_ips": 950.0
},
"256": {
"total_ips": 2050.0
}
}
},
"model": "",
"ngpus": [
1
],
"bs": [
1,
2,
4,
8,
16,
32,
64,
128,
256
]
}
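The baseline JSON files added in this commit are keyed by GPU count, then batch size, down to the metrics listed in `metric_keys`. The actual comparison logic lives in `scripts/benchmarking/benchmark.py` (not shown here); the sketch below only illustrates how the layout might be read, with a made-up tolerance:

```python
import json

def baseline_total_ips(path, ngpus, batch_size):
    # Keys are stored as strings: metrics["<ngpus>"]["<batch_size>"]["total_ips"].
    with open(path) as f:
        baseline = json.load(f)
    return baseline["metrics"][str(ngpus)][str(batch_size)]["total_ips"]

def within_baseline(measured_ips, expected_ips, tolerance=0.1):
    return measured_ips >= expected_ips * (1 - tolerance)
```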

View file

@ -0,0 +1,47 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"16": {
"total_ips": 800.0
},
"32": {
"total_ips": 920.0
},
"1": {
"total_ips": 150.0
},
"2": {
"total_ips": 270.0
},
"64": {
"total_ips": 1000.0
},
"4": {
"total_ips": 450.0
},
"128": {
"total_ips": 1075.0
},
"8": {
"total_ips": 650.0
}
}
},
"model": "",
"ngpus": [
1
],
"bs": [
1,
2,
4,
8,
16,
32,
64,
128
]
}

View file

@ -0,0 +1,63 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"64": {
"total_ips": 630.0
},
"128": {
"total_ips": 710.0
},
"256": {
"total_ips": 750.0
}
},
"4": {
"64": {
"total_ips": 2250.0
},
"128": {
"total_ips": 2600.0
},
"256": {
"total_ips": 2900.0
}
},
"8": {
"64": {
"total_ips": 4650.0
},
"128": {
"total_ips": 5500.0
},
"256": {
"total_ips": 6000.0
}
},
"16": {
"64": {
"total_ips": 9000.0
},
"128": {
"total_ips": 10500.0
},
"256": {
"total_ips": 11500.0
}
}
},
"model": "",
"ngpus": [
1,
4,
8,
16
],
"bs": [
64,
128,
256
]
}

View file

@ -0,0 +1,63 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"32": {
"total_ips": 300.0
},
"64": {
"total_ips": 330.0
},
"128": {
"total_ips": 350.0
}
},
"4": {
"32": {
"total_ips": 1050.0
},
"64": {
"total_ips": 1250.0
},
"128": {
"total_ips": 1350.0
}
},
"8": {
"32": {
"total_ips": 2100.0
},
"64": {
"total_ips": 2500.0
},
"128": {
"total_ips": 2700.0
}
},
"16": {
"32": {
"total_ips": 4100.0
},
"64": {
"total_ips": 5100.0
},
"128": {
"total_ips": 5500.0
}
}
},
"model": "",
"ngpus": [
1,
4,
8,
16
],
"bs": [
32,
64,
128
]
}

View file

@ -150,10 +150,10 @@ def parse_cmdline():
_add_bool_argument(
parser=p,
name="use_auto_loss_scaling",
name="use_static_loss_scaling",
default=False,
required=False,
help="Use AutoLossScaling in FP16, FP32 - Fast Math or FP32 AMP."
help="Use static loss scaling in FP16 or FP32 AMP."
)
_add_bool_argument(
@ -164,14 +164,6 @@ def parse_cmdline():
help="Enable XLA (Accelerated Linear Algebra) computation for improved performance."
)
#Enable FastMath Computation using TensorCores to speedup FP32 computation.
p.add_argument(
"--use_fast_math",
action='store_true',
required=False,
help=argparse.SUPPRESS
)
_add_bool_argument(
parser=p,
name="use_tf_amp",

View file

@ -95,8 +95,8 @@ def get_tfrecords_input_fn(filenames, batch_size, height, width, training, disto
ds = ds.apply(
tf.data.experimental.parallel_interleave(
tf.data.TFRecordDataset,
cycle_length=4,
block_length=16,
cycle_length=10,
block_length=8,
sloppy=not deterministic,
prefetch_input_elements=16
)
@ -109,7 +109,7 @@ def get_tfrecords_input_fn(filenames, batch_size, height, width, training, disto
return image_processing.preprocess_image_record(record, height, width, _NUM_CHANNELS, training)
ds = ds.cache()
if training:
ds = ds.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=shuffle_buffer_size, seed=seed))
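The data-pipeline tuning above changes `parallel_interleave` to `cycle_length=10`, `block_length=8` and keeps a `cache()` ahead of `shuffle_and_repeat`. A minimal TF 1.x sketch of the resulting input pipeline; the parse function, shuffle buffer, and batching tail are placeholders, not the repository's exact code:

```python
import tensorflow as tf

def make_input_fn(filenames, batch_size, parse_fn, training,
                  shuffle_buffer_size=4096, seed=None):
    # Interleave TFRecord shards, cache, then shuffle-and-repeat for training.
    ds = tf.data.Dataset.from_tensor_slices(filenames)
    ds = ds.apply(
        tf.data.experimental.parallel_interleave(
            tf.data.TFRecordDataset,
            cycle_length=10,
            block_length=8,
            sloppy=True,
            prefetch_input_elements=16,
        )
    )
    ds = ds.cache()
    if training:
        ds = ds.apply(
            tf.data.experimental.shuffle_and_repeat(
                buffer_size=shuffle_buffer_size, seed=seed
            )
        )
    ds = ds.map(parse_fn, num_parallel_calls=16)
    ds = ds.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
    return ds
```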

View file

@ -1,4 +1,4 @@
FROM nvcr.io/nvidia/tensorflow:19.03-py3 as base
FROM nvcr.io/nvidia/tensorflow:19.05-py3 as base
FROM base as sha

View file

@ -108,7 +108,7 @@ Moreover the script will download pre-trained RN50 checkpoint in the `<checkpoin
### 4. Launch the NGC container to run training/inference.
```
nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v <data_dir_path>:/data -v <checkpoint_dir_path>:/checkpoints --ipc=host nvidia_ssd
nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v <data_dir_path>:/data/coco2017_tfrecords -v <checkpoint_dir_path>:/checkpoints --ipc=host nvidia_ssd
```
### 5. Start training.
@ -116,6 +116,7 @@ nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=6710
The `./examples` directory provides several sample scripts for various GPU settings that act as wrappers around
the `object_detection/model_main.py` script. The example scripts accept the following arguments:
- A path to the directory for checkpoints
- A path to the directory for configs
- Additional arguments passed through to `object_detection/model_main.py`
To train on 8 GPUs with Tensor Core acceleration and save checkpoints in the `/checkpoints` directory, run:
@ -178,7 +179,7 @@ The SSD320 v1.2 model was trained on the COCO 2017 dataset. The val2017 validati
The `download_data.sh` script will preprocess the data to tfrecords format.
This repository contains the `download_dataset.sh` script which will automatically download and preprocess the training,
validation and test datasets. By default, data will be downloaded to the `/data` directory.
validation and test datasets. By default, data will be downloaded to the `/data/coco2017_tfrecords` directory.
### Training process
Training the SSD model is implemented in the `object_detection/model_main.py` script.
@ -331,6 +332,8 @@ To achieve same results, follow the [Quick start guide](#quick-start-guide) outl
March 2019
* Initial release
May 2019
* Test scripts updated
## Known issues
There are no known issues with this model.

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -8,8 +8,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
time python -u /workdir/models/research/object_detection/model_main.py \
time python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,9 +9,13 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "Single GPU mixed precision training performance: " && \
python -u /workdir/models/research/object_detection/model_main.py \
TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "Single GPU mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_4gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -19,8 +19,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,8 +9,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs mixed precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -20,8 +19,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_8gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,6 +9,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
mkdir -p $CKPT_DIR
time mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
@ -19,8 +21,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}" 2>&1 | tee $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,8 +9,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs mixed precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -20,8 +19,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,4 +1,4 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -8,4 +8,4 @@ PYTHONPATH=$PYTHONPATH:$OBJECT_DETECTION
python $SCRIPT_DIR/SSD320_inference.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
"$@"
"${@:2}"

View file

@ -1,13 +1,13 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
time python -u /workdir/models/research/object_detection/model_main.py \
time python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
TENSOR_OPS=0
@ -7,9 +7,13 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "Single GPU single precision training performance: " && \
python -u /workdir/models/research/object_detection/model_main.py \
TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "Single GPU single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_4gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4
TENSOR_OPS=0
@ -17,8 +17,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
TENSOR_OPS=0
@ -7,8 +7,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs single precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -18,8 +17,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_8gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8
TENSOR_OPS=0
@ -7,6 +7,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
mkdir -p $CKPT_DIR
time mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
@ -17,8 +19,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}" 2>&1 | tee $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
TENSOR_OPS=0
@ -7,8 +7,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs single precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -18,8 +17,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,4 +1,4 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")
OBJECT_DETECTION=$(realpath $SCRIPT_DIR/../object_detection/)
@ -6,4 +6,4 @@ PYTHONPATH=$PYTHONPATH:$OBJECT_DETECTION
python $SCRIPT_DIR/SSD320_inference.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
"$@"
"${@:2}"

View file

@ -13,6 +13,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from absl import flags
from time import time
@ -63,7 +65,8 @@ class TimingHook(tf.train.SessionRunHook):
self.start_time = time()
def log_progress(self):
print(len(self.times) - FLAGS.warmup_iters, '/', FLAGS.benchmark_iters, ' '*10, end='\r')
if sys.stdout.isatty():
print(len(self.times) - FLAGS.warmup_iters, '/', FLAGS.benchmark_iters, ' '*10, end='\r')
def after_run(self, *args, **kwargs):
super(TimingHook, self).after_run(*args, **kwargs)
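The `TimingHook` change above only prints the progress counter when stdout is a terminal, so redirected benchmark logs stay clean. A stripped-down sketch of the same hook pattern (the class name and constructor are illustrative, not the repository's implementation):

```python
import sys
import time
import tensorflow as tf

class ProgressTimingHook(tf.train.SessionRunHook):
    def __init__(self, warmup_iters, benchmark_iters):
        self.warmup_iters = warmup_iters
        self.benchmark_iters = benchmark_iters
        self.times = []

    def before_run(self, run_context):
        self._start = time.time()

    def after_run(self, run_context, run_values):
        self.times.append(time.time() - self._start)
        if sys.stdout.isatty():
            # Overwrite the same console line instead of spamming the log.
            print(len(self.times) - self.warmup_iters, '/',
                  self.benchmark_iters, ' ' * 10, end='\r')
```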

View file

@ -1,23 +0,0 @@
TARGET_mAP=${TARGET_mAP:-0.0020408058}
TARGET_loss=${TARGET_loss:-2.2808013}
TOLERANCE=${TOLERANCE:-0.1}
PRECISION=${PRECISION:-FP16}
TRAIN_LOG=$(bash examples/SSD320_${PRECISION}_8GPU.sh /results/SSD320_${PRECISION}_8GPU --num_train_steps $((12500/27)) 2>&1 | tee /dev/tty)
mAP=$( echo $TRAIN_LOG | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
loss=$(echo $TRAIN_LOG | sed -n 's|.*Loss for final step: \(.*\)\.|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$mAP)")
loss_error=$(python -c "print(abs($TARGET_loss - $loss)/$loss)")
if [[ $mAP_error < $TOLERANCE && $loss_error < $TOLERANCE ]]
then
echo PASS
else
echo expected: mAP=$TARGET_mAP loss=$TARGET_loss
echo got: mAP=$mAP loss=$loss
echo FAIL
exit 1
fi

View file

@ -1 +0,0 @@
PRECISION=FP16 bash qa/testing_DGX1V_8GPU_1epoch.sh

View file

@ -1 +0,0 @@
PRECISION=FP32 bash qa/testing_DGX1V_8GPU_1epoch.sh

View file

@ -0,0 +1,26 @@
TARGET_mAP=${TARGET_mAP:-0.137}
TARGET_loss=${TARGET_loss:-2.3}
TOLERANCE=${TOLERANCE:-0.1}
PRECISION=${PRECISION:-FP16}
bash ../../examples/SSD320_${PRECISION}_8GPU_BENCHMARK.sh /results/SSD320_${PRECISION}_8GPU ../../configs
mAP=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
loss=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*Loss for final step: \(.*\)\.|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$mAP)")
loss_error=$(python -c "print(abs($TARGET_loss - $loss)/$loss)")
cat /results/SSD320_${PRECISION}_8GPU/train_log
echo expected: mAP=$TARGET_mAP loss=$TARGET_loss
echo got: mAP=$mAP loss=$loss
if [[ -n $mAP_error && $mAP_error < $TOLERANCE && -n $loss_error && $loss_error < $TOLERANCE ]]
then
echo PASS
else
echo FAIL
exit 1
fi
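The accuracy test above scrapes the training log with `sed` for the last reported mAP and final-step loss, then compares relative errors against the tolerance. A rough Python rendering of the same checks (the regexes mirror the `sed` patterns; not part of the repository):

```python
import re

def check_accuracy(train_log, target_map, target_loss, tolerance=0.1):
    maps = re.findall(r"DetectionBoxes_Precision/mAP = ([^,]+),", train_log)
    losses = re.findall(r"Loss for final step: (.*)\.", train_log)
    if not maps or not losses:
        return False  # metrics missing from the log
    map_err = abs(target_map - float(maps[-1])) / float(maps[-1])
    loss_err = abs(target_loss - float(losses[-1])) / float(losses[-1])
    return map_err < tolerance and loss_err < tolerance
```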

View file

@ -0,0 +1 @@
PRECISION=FP16 bash ../../qa/testing_DGX1V_accuracy.sh

View file

@ -0,0 +1 @@
PRECISION=FP32 bash ../../qa/testing_DGX1V_accuracy.sh

View file

@ -0,0 +1,20 @@
TARGET_mAP=${TARGET_mAP:-0.281}
TOLERANCE=${TOLERANCE:-0.04}
PRECISION=${PRECISION:-FP16}
bash ../../examples/SSD320_${PRECISION}_8GPU.sh /results/SSD320_${PRECISION}_8GPU ../../configs
mAP=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$TARGET_mAP)")
echo expected: mAP=$TARGET_mAP
echo got: mAP=$mAP
if [[ $mAP_error < $TOLERANCE ]]
then
echo PASS
else
echo FAIL
exit 1
fi

View file

@ -0,0 +1 @@
PRECISION=FP16 bash ../../qa/testing_DGX1V_convergence.sh

View file

@ -0,0 +1 @@
PRECISION=FP32 bash ../../qa/testing_DGX1V_convergence.sh

View file

@ -1,11 +1,20 @@
#!/bin/bash
BASELINES=(193.6 135.2 171.5 188.3 187 187.6 191.4)
BASELINES=(93.6 136.3 172.1 190.8 188.2 189.4 192.2)
TOLERANCE=0.07
PRECISION=FP16
for i in `seq 0 6`
do
echo "Testing mixed precision inference speed on batch size = $((2 ** $i))"
bash examples/SSD320_FP16_inference.sh --batch_size $((2 ** $i)) > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $3}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($3 - BASELINE)^2)/$3}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
BS=$((2 ** $i))
MSG="Testing mixed precision inference speed on batch size = $BS"
CMD="bash ../../examples/SSD320_${PRECISION}_inference.sh ../../configs --batch_size $BS"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
return $result

View file

@ -1,11 +1,20 @@
#!/bin/bash
BASELINES=(97.4 134 163 175.1 175.4 174.5 177.6)
BASELINES=(93.2 136.2 171.2 189.4 188.0 188.7 192.5)
PRECISION=FP32
TOLERANCE=0.07
for i in `seq 0 6`
do
echo "Testing single precision inference speed on batch size = $((2 ** $i))"
bash examples/SSD320_FP16_inference.sh --batch_size $((2 ** $i)) > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $3}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($3 - BASELINE)^2)/$3}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
BS=$((2 ** $i))
MSG="Testing single precision inference speed on batch size = $BS"
CMD="bash ../../examples/SSD320_${PRECISION}_inference.sh ../../configs --batch_size $BS"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
return $result

View file

@ -0,0 +1,24 @@
if [[ -z $CMD || -z $BASELINE || -z $TOLERANCE ]]
then
echo some variables are not set
exit 1
fi
echo $MSG
RESULT=$($CMD)
imgps=$(echo $RESULT | tail -n 1 | awk '{print $3}')
LB_imgps=$(python -c "print($BASELINE * (1-$TOLERANCE))")
echo imgs/s: $imgps expected imgs/s: $BASELINE
echo accepted minimum: $LB_imgps
if [[ $imgps > $LB_imgps ]]
then
echo PASSED
else
echo $RESULT
echo FAILED
exit 1
fi
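The shared helper above encodes a single pass/fail rule: the measured img/s (third field of the command's last output line) must exceed `BASELINE * (1 - TOLERANCE)`. A small sketch of the same rule, for illustration only:

```python
def meets_baseline(result_text, baseline, tolerance):
    # Third whitespace-separated field of the last output line is img/s.
    imgps = float(result_text.strip().splitlines()[-1].split()[2])
    return imgps > baseline * (1 - tolerance)
```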

View file

@ -1,13 +1,21 @@
#!/bin/bash
BASELINES=(125 430 750)
BASELINES=(120 480 800)
GPUS=(1 4 8)
PRECISION=FP16
TOLERANCE=0.11
i=0
for GPUS in 1 4 8
for i in {1..4}
do
echo "Testing mixed precision training speed on $GPUS GPUs"
bash examples/SSD320_FP16_${GPUS}_BENCHMARK.sh > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $7}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($7 - BASELINE)^2)/$7}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
i=$(($i + 1))
GPU=${GPUS[$i]}
MSG="Testing mixed precision training speed on $GPUS GPUs"
CMD="bash ../../examples/SSD320_FP16_${GPU}GPU_BENCHMARK.sh /results/SSD320_FP16_${GPU}GPU ../../configs"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
exit $result

View file

@ -1,13 +1,19 @@
#!/bin/bash
BASELINES=(87 330 569)
GPUS=(1 4 8)
PRECISION=FP32
TOLERANCE=0.11
i=0
for GPUS in 1 4 8
for i in {1..4}
do
echo "Testing single precision training speed on $GPUS GPUs"
bash examples/SSD320_FP32_${GPUS}_BENCHMARK.sh > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $7}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($7 - BASELINE)^2)/$7}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
i=$(($i + 1))
GPU=${GPUS[$i]}
MSG="Testing mixed precision training speed on $GPUS GPUs"
CMD="bash ../../examples/SSD320_FP16_${GPU}GPU_BENCHMARK.sh /results/SSD320_FP16_${GPU}GPU ../../configs"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done

View file

@ -18,7 +18,7 @@ then
download_1m
elif [[ ${DATASET_NAME} == "ml-20m" ]]
then
download_20m
download_20m
else
echo "Unsupported dataset name: $DATASET_NAME"
exit 1

View file

@ -14,7 +14,7 @@ set -e
DATASET_NAME=${1:-'ml-20m'}
RAW_DATADIR='/data'
CACHED_DATADIR='/data/cache/'${DATASET_NAME}
CACHED_DATADIR='/tmp/cache/'${DATASET_NAME}
# you can add another option to this case in order to support other datasets
case ${DATASET_NAME} in

View file

@ -16,7 +16,7 @@
#
# ==============================================================================
FROM nvcr.io/nvidia/tensorflow:19.03-py3
FROM nvcr.io/nvidia/tensorflow:19.05-py3
LABEL version="1.0" maintainer="Jonathan DEKHTIAR <jonathan.dekhtiar@nvidia.com>"

View file

@ -41,6 +41,7 @@ if __name__ == "__main__":
RUNNING_CONFIG = tf.contrib.training.HParams(
exec_mode=FLAGS.exec_mode,
save_eval_results_to_json=FLAGS.save_eval_results_to_json,
# ======= Directory HParams ======= #
log_dir=os.path.join(FLAGS.results_dir, "logs"),
@ -158,5 +159,6 @@ if __name__ == "__main__":
num_iter=RUNNING_CONFIG.num_iter if RUNNING_CONFIG.exec_mode != "train_and_evaluate" else 1,
warmup_steps=RUNNING_CONFIG.warmup_steps,
batch_size=RUNNING_CONFIG.batch_size,
is_benchmark=RUNNING_CONFIG.exec_mode == 'inference_benchmark'
is_benchmark=RUNNING_CONFIG.exec_mode == 'inference_benchmark',
save_eval_results_to_json=RUNNING_CONFIG.save_eval_results_to_json
)

View file

@ -22,6 +22,7 @@
from __future__ import print_function
import os
import json
import multiprocessing
import operator
import random
@ -509,7 +510,7 @@ class Runner(object):
if not hvd_utils.is_using_hvd() or hvd.local_rank() == 0:
LOGGER.log('Ending Model Training ...')
def evaluate(self, iter_unit, num_iter, batch_size, warmup_steps=50, is_benchmark=False):
def evaluate(self, iter_unit, num_iter, batch_size, warmup_steps=50, is_benchmark=False, save_eval_results_to_json=False):
if iter_unit not in ["epoch", "batch"]:
raise ValueError('`iter_unit` value is unknown: %s (allowed: ["epoch", "batch"])' % iter_unit)
@ -540,7 +541,7 @@ class Runner(object):
log_every=self.run_hparams.log_every_n_steps,
warmup_steps=warmup_steps,
is_training=False,
sample_dir=None
sample_dir=self.run_hparams.sample_dir
)
]
@ -630,5 +631,31 @@ class Runner(object):
LOGGER.log('TPR', tpr)
LOGGER.log('TNR', tnr)
if save_eval_results_to_json:
results_dict = {
'IoU': {
'0.75': str(eval_results["IoU_THS_0.75"]),
'0.85': str(eval_results["IoU_THS_0.85"]),
'0.95': str(eval_results["IoU_THS_0.95"]),
'0.99': str(eval_results["IoU_THS_0.99"]),
},
'TPR': {
'0.75': str(tpr[-4]),
'0.85': str(tpr[-3]),
'0.95': str(tpr[-2]),
'0.99': str(tpr[-1]),
},
'TNR': {
'0.75': str(tnr[-4]),
'0.85': str(tnr[-3]),
'0.95': str(tnr[-2]),
'0.99': str(tnr[-1]),
}
}
with open(os.path.join(self.run_hparams.model_dir, "..", "results.json"), 'w') as f:
json.dump(results_dict, f)
except KeyboardInterrupt:
print("Keyboard interrupt")

View file

@ -17,9 +17,11 @@
# This script launches UNet training in FP32-AMP on 1 GPU using 16 batch size (16 per GPU)
# Usage ./UNet_FP32AMP_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32-AMP on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./UNet_FP32AMP_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32-AMP on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./UNet_FP32AMP_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation in FP32-AMP on 1 GPUs using 16 batch size
# Usage ./UNet_FP32AMP_EVAL.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training in FP32 on 1 GPU using 16 batch size (16 per GPU)
# Usage ./UNet_FP32_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32 on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./UNet_FP32_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32 on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./UNet_FP32_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation in FP32 on 1 GPUs using 16 batch size
# Usage ./UNet_FP32_EVAL.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation benchmark in FP32-AMP on 1 GPUs using 16 batch size
# Usage ./DGX1v_evalbench_FP32AMP.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='inference_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation benchmark in FP32 on 1 GPUs using 16 batch size
# Usage ./DGX1v_evalbench_FP32.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='inference_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training benchmark in FP32-AMP on 1 GPU using 16 batch size (16 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32-AMP on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32-AMP on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training benchmark in FP32 on 1 GPU using 16 batch size (16 per GPU)
# Usage ./DGX1v_trainbench_FP32_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32 on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./DGX1v_trainbench_FP32_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32 on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./DGX1v_trainbench_FP32_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -95,6 +95,14 @@ def parse_cmdline():
help="""Directory in which to write training logs, summaries and checkpoints."""
)
_add_bool_argument(
parser=p,
name="save_eval_results_to_json",
default=False,
required=False,
help="Whether to save evaluation results in JSON format."
)
p.add_argument('--data_dir', required=False, default=None, type=str, help="Path to dataset directory")
p.add_argument(

View file

@ -20,6 +20,7 @@
# ==============================================================================
import os
import json
import time
import operator
@ -97,7 +98,7 @@ class ProfilerHook(tf.train.SessionRunHook):
# ==================== Samples ==================== #
if self._sample_dir is not None:
if self._sample_dir is not None and self._is_training:
additional_fetches["samples"] = {}
additional_fetches["samples"]["input_image"] = tf.get_default_graph(
@ -170,7 +171,7 @@ class ProfilerHook(tf.train.SessionRunHook):
LOGGER.log("False Positives:", run_values.results["confusion_matrix"]["fp"])
LOGGER.log("False Negatives:", run_values.results["confusion_matrix"]["fn"])
if self._sample_dir is not None:
if self._sample_dir is not None and self._is_training:
for key in sorted(run_values.results["samples"].keys(), key=operator.itemgetter(0)):
@ -208,3 +209,13 @@ class ProfilerHook(tf.train.SessionRunHook):
"\t[*] Total Processing Time: %dh %02dm %02ds\n" %
(avg_processing_speed, total_processing_hours, total_processing_minutes, total_processing_seconds)
)
perf_dict = {
'throughput': str(avg_processing_speed),
'processing_time': str(total_processing_time)
}
perf_filename = "performances_%s.json" % ("train" if self._is_training else "eval")
with open(os.path.join(self._sample_dir, "..", perf_filename), 'w') as f:
json.dump(perf_dict, f)

View file

@ -0,0 +1,8 @@
.idea/
.ipynb_checkpoints
/_python_build
*.pyc
__pycache__
*.swp
/results
*.zip

View file

@ -0,0 +1,6 @@
FROM nvcr.io/nvidia/tensorflow:19.05-py3
ADD . /workspace/unet
WORKDIR /workspace/unet
RUN pip install -r requirements.txt

View file

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2019 NVIDIA Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View file

@ -0,0 +1,17 @@
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This repository includes software from:
* TensorFlow, (https://github.com/tensorflow/tensorflow) licensed
under the Apache License, Version 2.0

View file

@ -0,0 +1,423 @@
# UNet
This repository provides a script and recipe to train U-Net Medical to achieve state of the art accuracy, and is tested and maintained by NVIDIA.
## Table of contents
1. [The model](#1-the-model)
1. [Model architecture](#11-model-architecture)
2. [Default configuration](#12-default-configuration)
3. [Feature support matrix](#13-feature-support-matrix)
1. [Features](#131-features)
2. [Setup](#2-setup)
1. [Requirements](#21-requirements)
3. [Quick start guide](#3-quick-start-guide)
1. [Clone the repository](#31-clone-the-repository)
2. [Download and preprocess the dataset](#32-download-and-preprocess-the-dataset)
3. [Build the U-Net TensorFlow container](#33-build-the-u-net-tensorflow-container)
4. [Start an interactive session in the NGC container to run training/inference](#34-start-an-interactive-session-in-the-ngc-container-to-run-traininginference)
5. [Start training](#35-start-training)
6. [Start inference/predictions](#36-start-inferencepredictions)
4. [Details](#4-details)
1. [Scripts and sample code](#41-scripts-and-sample-code)
2. [Parameters](#42-parameters)
3. [Command line options](#43-command-line-options)
4. [Getting the data](#44-getting-the-data)
1. [Dataset guidelines](#441-dataset-guidelines)
5. [Training process](#45-training-process)
1. [Optimizer](#451-optimizer)
2. [Augmentation](#452-augmentation)
6. [Inference process](#46-inference-process)
5. [Mixed precision training](#5-mixed-precision-training)
1. [Enabling mixed precision](#51-enabling-mixed-precision)
6. [Benchmarking](#6-benchmarking)
1. [Training performance benchmark](#61-training-performance-benchmark)
2. [Inference performance benchmark](#62-inference-performance-benchmark)
7. [Results](#7-results)
1. [Training accuracy results](#71-training-accuracy-results)
1. [NVIDIA DGX-1 (8x V100 16G)](#711-nvidia-dgx-1-8x-v100-16g)
2. [Training performance results](#72-training-performance-results)
1. [NVIDIA DGX-1 (1x V100 16G)](#721-nvidia-dgx-1-1x-v100-16g)
2. [NVIDIA DGX-1 (8x V100 16G)](#722-nvidia-dgx-1-8x-v100-16g)
3. [Inference performance results](#73-inference-performance-results)
1. [NVIDIA DGX-1 (1x V100 16G)](#731)
8. [Changelog](#8-changelog)
9. [Known issues](#9-known-issues)
## 1. The model
The U-Net model is a convolutional neural network for 2D image segmentation. This repository contains a U-Net implementation as described in the paper [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597), without any alteration.
This model is trained with mixed precision using tensor cores on NVIDIA Volta GPUs. Therefore, researchers can get results much faster than training without Tensor Cores, while experiencing the benefits of mixed precision training (for example, up to 3.5x performance boost). This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### 1.1. Model architecture
U-Net was first introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in the paper: U-Net: Convolutional Networks for Biomedical Image Segmentation. U-Net allows for seamless segmentation of 2D images, with high accuracy and performance, and can be adapted to solve many different segmentation problems.
The following figure shows the construction of the U-Net model and its different components. U-Net is composed of a contractive and an expanding path that aim at building a bottleneck in its centermost part through a combination of convolution and pooling operations. After this bottleneck, the image is reconstructed through a combination of convolutions and upsampling. Skip connections are added with the goal of helping the backward flow of gradients in order to improve the training.
![UNet](images/unet.png)
### 1.2. Default configuration
U-Net consists of a contractive (left-side) and expanding (right-side) path. It repeatedly applies unpadded convolutions followed by max pooling for downsampling. Every step in the expanding path consists of an upsampling of the feature maps and a concatenation with the correspondingly cropped feature map from the contractive path.
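For illustration only, below is a minimal sketch of one contracting and one expanding step in this style, written with `tf.keras.layers`; the helper names `down_block` and `up_block` are hypothetical and are not the repository's API:
```
import tensorflow as tf

def down_block(x, filters):
    # Two unpadded 3x3 convolutions followed by 2x2 max pooling (contracting path).
    c = tf.keras.layers.Conv2D(filters, 3, activation='relu')(x)
    c = tf.keras.layers.Conv2D(filters, 3, activation='relu')(c)
    p = tf.keras.layers.MaxPooling2D(pool_size=2)(c)
    return c, p  # keep the pre-pooling features for the skip connection

def up_block(x, skip, filters):
    # Upsample, crop the skip connection to the new spatial size, concatenate, convolve.
    u = tf.keras.layers.Conv2DTranspose(filters, 2, strides=2)(x)
    crop = (int(skip.shape[1]) - int(u.shape[1])) // 2  # assumes static spatial dimensions
    skip = tf.keras.layers.Cropping2D(cropping=crop)(skip)
    u = tf.keras.layers.Concatenate()([u, skip])
    u = tf.keras.layers.Conv2D(filters, 3, activation='relu')(u)
    u = tf.keras.layers.Conv2D(filters, 3, activation='relu')(u)
    return u
```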
The following features were implemented in this model:
* Data-parallel multi-GPU training with Horovod.
* Mixed precision support with TensorFlow Automatic Mixed Precision (TF-AMP), which enables mixed precision training without any changes to the code-base by performing automatic graph rewrites and loss scaling controlled by an environmental variable.
* Tensor Core operations to maximize throughput using NVIDIA Volta GPUs.
* Static loss scaling for tensor cores (mixed precision) training.
The following performance optimizations were implemented in this model:
* XLA support (experimental). For TensorFlow, mixed precision support is available through TF-AMP (TensorFlow Automatic Mixed Precision), which requires only minimal changes to the network code to leverage Tensor Core performance.
### 1.3. Feature support matrix
The following features are supported by this model.
| **Feature** | **UNet_Medical_TF** |
|:---:|:--------:|
| Horovod Multi-GPU (NCCL) | Yes |
### 1.3.1. Features
**Horovod** - Horovod is a distributed training framework for TensorFlow, Keras, PyTorch and MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. For more information about how to get started with Horovod, see the [Horovod: Official repository](https://github.com/horovod/horovod).
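As a rough, generic sketch of how data-parallel training is typically wired up with Horovod in TensorFlow 1.x (not the exact code used in this repository; scaling the learning rate by the number of workers is a common convention, not something prescribed here):
```
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per GPU, each with its own rank

# Pin each process to a single GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Wrap the optimizer so gradients are averaged across workers via NCCL allreduce.
opt = tf.train.MomentumOptimizer(learning_rate=0.01 * hvd.size(), momentum=0.99)
opt = hvd.DistributedOptimizer(opt)

# Broadcast initial variables from rank 0 so all workers start from the same state.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]
```
Such a script is launched with `mpirun -np <num_gpus> ...`, which is how the multi-GPU example scripts in this repository invoke training.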
## 2. Setup
The following section lists the requirements in order to start training the U-Net model.
### 2.1. Requirements
This repository contains a `Dockerfile` which extends the TensorFlow NGC container and encapsulates some additional dependencies. Aside from these dependencies, ensure you have the following components:
* [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
* [tensorflow:19.03-py3 NGC container](https://ngc.nvidia.com/registry/nvidia-tensorflow)
* [NVIDIA Volta based GPU](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
For more information about how to get started with NGC containers, see the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning DGX Documentation:
* [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
* [Accessing And Pulling From The NGC container registry](https://docs.nvidia.com/deeplearning/dgx/user-guide/index.html#accessing_registry)
* [Running Tensorflow](https://docs.nvidia.com/deeplearning/dgx/tensorflow-release-notes/running.html#running)
## 3. Quick start guide
To train your model using mixed precision with tensor cores or using FP32, perform the following steps using the default parameters of the U-Net model on the [EM segmentation challenge dataset](http://brainiac2.mit.edu/isbi_challenge/home).
### 3.1. Clone the repository
```
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/TensorFlow/Segmentation/UNet_Medical
```
### 3.2. Download and preprocess the dataset
The U-Net script main.py operates on data from the [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge/home), the dataset originally employed in the [U-Net paper](https://arxiv.org/abs/1505.04597). Upon registration, the challenge's data is made available through the following links:
* [train-volume.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-volume.tif)
* [train-labels.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-labels.tif)
* [test-volume.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/test-volume.tif)
The script `download_dataset.py` is provided for data download. It is possible to select the destination folder when downloading the files by using the `--data_dir` flag. For example:
```
python download_dataset.py --data_dir ./dataset
```
Training and test data are composed of 3 multi-page `TIF` files, each containing 30 2D-images. The training and test datasets are given as stacks of 30 2D-images provided as a multi-page `TIF` that can be read using the Pillow library and NumPy (both Python packages are installed by the `Dockerfile`):
```
from PIL import Image, ImageSequence
import numpy as np
im = Image.open(path)
slices = [np.array(i) for i in ImageSequence.Iterator(im)]
```
Once the data has been downloaded using the `download_dataset.py` script, it can be used to run the training and benchmark scripts described below by pointing `main.py` to its location with the `--data_dir` flag.
**Note:** Masks are only provided for training data.
### 3.3. Build the U-Net TensorFlow container
After Docker is correctly set up, the U-Net TensorFlow container can be built with:
```
user@~/Documents/unet_medical_tf # docker build -t unet_tf .
```
### 3.4. Start an interactive session in the NGC container to run training/inference.
Run the previously built Docker container:
```
user@~/path/to/unet_medical_tf # docker run --runtime=nvidia --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /path/to/dataset:/data unet_tf:latest bash
```
**Note:** Be sure to mount your dataset using the `-v` flag to make it available for training inside the NVIDIA Docker container.
### 3.5. Start training
To run training for a default configuration (for example 1/8 GPUs FP32/TF-AMP), run one of the scripts in the `./examples` directory, as follows:
```
bash examples/unet_{FP32, TF-AMP}_{1,8}.sh <path to main.py> <path to dataset> <path to results directory>
```
For example:
```
root@8e522945990f:/workspace/unet# bash examples/unet_FP32_1GPU.sh . /data results
```
### 3.6. Start inference/predictions
To run inference on a checkpointed model, run:
```
python main.py --data_dir /data --model_dir <path to checkpoint> --exec_mode predict
```
## 4. Details
The following sections provide greater details of the dataset, running training and inference, and the training results.
### 4.1. Scripts and sample code
In the root directory, the most important files are:
* `main.py`: Serves as the entry point to the application.
* `Dockerfile`: Container with the basic set of dependencies to run UNet
* `requirements.txt`: Set of extra requirements for running UNet
* `download_dataset.py`: Automatically downloads the dataset for training
The utils/ folder encapsulates the necessary tools to train and perform inference using UNet. Its main components are:
* `runner.py`: Implements the logic for training and inference
* `data_loader.py`: Implements the data loading and augmentation
* `hooks/profiler.py`: Collects different metrics to be used for benchmarking and testing
* `var_storage.py`: Helper functions for TF-AMP
The model/ folder contains information about the building blocks of UNet and the way they are assembled. Its contents are:
* `layers.py`: Defines the different blocks that are used to assemble UNet
* `unet.py`: Defines the model architecture using the blocks from the `layers.py` script
Other folders included in the root directory are:
* `dllogger/`: Contains the utils for logging
* `examples/`: Provides examples for training and benchmarking UNet
* `images/`: Contains a model diagram
### 4.2. Parameters
The complete list of available parameters for the `main.py` script is as follows (an example invocation is shown after the list):
* `--exec_mode`: Select the execution mode to run the model (default: train_and_predict)
* `--model_dir`: Set the output directory for information related to the model (default: result/)
* `--data_dir`: Set the input directory containing the dataset (default: None)
* `--batch_size`: Size of each minibatch per GPU (default: 1)
* `--max_steps`: Maximum number of steps (batches) for training (default: 1000)
* `--seed`: Set random seed for reproducibility (default: 0)
* `--weight_decay`: Weight decay coefficient (default: 0.0005)
* `--log_every`: Log performance every n steps (default: 100)
* `--warmup_steps`: Skip logging during the first n steps (default: 200)
* `--learning_rate`: Model's learning rate (default: 0.01)
* `--momentum`: Momentum coefficient for the model's optimizer (default: 0.99)
* `--decay_steps`: Number of steps before learning rate decay (default: 5000)
* `--decay_rate`: Decay rate for polynomial learning rate decay (default 0.95)
* `--augment`: Enable data augmentation (default: False)
* `--benchmark`: Enable performance benchmarking (default: False)
* `--use_amp`: Enable automatic mixed precision (default: False)
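For example, a hypothetical training invocation combining several of these flags (paths and values are placeholders) could look like:
```
python main.py \
  --exec_mode train_and_predict \
  --data_dir /data \
  --model_dir /results \
  --batch_size 8 \
  --max_steps 40000 \
  --use_amp \
  --augment
```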
### 4.3. Command line options
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option, for example:
```
root@ac1c9afe0a0b:/workspace/unet# python main.py --help
usage: main.py [-h]
[--exec_mode {train,train_and_predict,predict,benchmark}]
[--model_dir MODEL_DIR]
--data_dir DATA_DIR
[--batch_size BATCH_SIZE]
[--max_steps MAX_STEPS]
[--seed SEED]
[--weight_decay WEIGHT_DECAY]
[--log_every LOG_EVERY]
[--warmup_steps WARMUP_STEPS]
[--learning_rate LEARNING_RATE]
[--momentum MOMENTUM]
[--decay_steps DECAY_STEPS]
[--decay_rate DECAY_RATE]
[--augment]
[--no-augment]
[--benchmark]
[--no-benchmark]
[--use_amp]
```
### 4.4. Getting the data
The U-Net model was trained on the [EM segmentation challenge dataset](http://brainiac2.mit.edu/isbi_challenge/home). Test images provided by the organization were used to produce the resulting masks for submission.
Training and test data is comprised of three 512x512x30 `TIF` volumes (`test-volume.tif`, `train-volume.tif` and `train-labels.tif`). Files `test-volume.tif` and `train-volume.tif` contain grayscale 2D slices to be segmented. Additionally, training masks are provided in `train-labels.tif` as a 512x512x30 `TIF` volume, where each pixel has one of two classes:
* 0 indicating the presence of cellular membrane, and
* 1 corresponding to background.
The objective is to produce a set of masks that segment the data as accurately as possible. The results are expected to be submitted as a 32-bit `TIF` 3D image, with values between `0` (100% membrane certainty) and `1` (100% non-membrane certainty).
#### 4.4.1 Dataset guidelines
The process of loading, normalizing and augmenting the data contained in the dataset can be found in the `data_loader.py` script.
Initially, data is loaded from a multi-page `TIF` file and converted to 512x512x30 NumPy arrays with the use of Pillow. These NumPy arrays are fed to the model through `tf.data.Dataset.from_tensor_slices()`, in order to achieve high performance.
Intensities on the volumes are then normalized to the interval `[-1, 1]`, whereas labels are one-hot encoded for later use in the pixel-wise cross-entropy loss, becoming 512x512x30x2 tensors.
If augmentation is enabled, the following set of augmentation techniques are applied:
* Random horizontal flipping
* Random vertical flipping
* Elastic deformation through dense_image_warp
* Random rotation
* Crop to a random dimension and resize to input dimension
* Random brightness shifting
At the end, intensities are clipped to the `[-1, 1]` interval.
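A minimal sketch of this loading and preprocessing flow is shown below, assuming 8-bit grayscale inputs; the helper `load_volume` is illustrative and not part of the repository:
```
import numpy as np
import tensorflow as tf
from PIL import Image, ImageSequence

def load_volume(path):
    # Read a multi-page TIF into a (30, 512, 512) NumPy array.
    im = Image.open(path)
    return np.stack([np.array(page) for page in ImageSequence.Iterator(im)])

images = load_volume('train-volume.tif').astype(np.float32)
labels = load_volume('train-labels.tif')

# Normalize intensities from [0, 255] to [-1, 1].
images = images / 127.5 - 1.0

# One-hot encode the binary masks for the pixel-wise cross-entropy loss (..., 2).
labels = np.stack([labels == 0, labels != 0], axis=-1).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
```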
### 4.5. Training process
#### 4.5.1. Optimizer
The model trains for 40,000 batches, with the default U-Net setup as specified in the [original paper](https://arxiv.org/abs/1505.04597):
* SGD with momentum (0.99)
* Learning rate = 0.01
This default parametrization is employed when running scripts from the ./examples directory and when running main.py without explicitly overriding these fields.
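A minimal sketch of this optimizer configuration with TensorFlow 1.x APIs follows; the decay schedule below is only a stand-in built from the documented `decay_steps`/`decay_rate` defaults, and the repository's actual polynomial-style schedule may differ:
```
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()

# Stand-in decay schedule using the documented defaults
# (learning_rate=0.01, decay_steps=5000, decay_rate=0.95).
learning_rate = tf.train.exponential_decay(
    learning_rate=0.01,
    global_step=global_step,
    decay_steps=5000,
    decay_rate=0.95,
    staircase=True)

# SGD with momentum 0.99, as specified in the original paper.
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.99)
```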
#### 4.5.2. Augmentation
During training, we perform the following augmentation techniques:
* Random flip left and right
* Random flip up and down
* Elastic deformation
* Random rotation
* Random crop and resize
* Random brightness changes
To run a pre-parameterized configuration (1 or 8 GPUs, FP32 or AMP), run one of the scripts in the `./examples` directory, for example:
```
./examples/unet_{FP32, TF-AMP}_{1, 8}GPU.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>
```
Use `-h` or `--help` to obtain a list of available options in the `main.py` script.
**Note:** When calling the `main.py` script manually, data augmentation is disabled. In order to enable data augmentation, use the `--augment` flag at the end of your invocation.
Use the `--model_dir` flag to select the location where to store the artifacts of the training.
### 4.6. Inference process
To run inference on a checkpointed model, run the command below. Note that it requires a pre-trained model checkpoint and the dataset available under the path passed to `--data_dir`.
```
python main.py --data_dir /data --model_dir <path to checkpoint> --exec_mode predict
```
This script should produce the prediction results over a set of masks which will be located in `<path to checkpoint>/eval`.
## 5. Mixed precision training
Mixed precision is the combined use of different numerical precisions in a computational method. [Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network. Since the introduction of [tensor cores](https://developer.nvidia.com/tensor-cores) in the Volta and Turing architecture, significant training speedups are experienced by switching to mixed precision -- up to 3x overall speedup on the most arithmetically intense model architectures. Using mixed precision training requires two steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Adding loss scaling to preserve small gradient values.
The ability to train deep learning networks with lower precision was introduced in the Pascal architecture and first supported in [CUDA 8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep Learning SDK.
For information about:
- How to train using mixed precision, see the [Mixed Precision Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) documentation.
- Techniques used for mixed precision training, see the [Mixed-Precision Training of Deep Neural Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) blog.
- How to access and enable AMP for TensorFlow, see [Using TF-AMP](https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tfamp) from the TensorFlow User Guide.
- APEX tools for mixed precision training, see the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/).
### 5.1. Enabling mixed precision
In order to enable mixed precision training, the following environment variable must be defined with the correct value before the training starts:
```
TF_ENABLE_AUTO_MIXED_PRECISION=1
```
Exporting this variable ensures that loss scaling is performed correctly and automatically.
By supplying the `--use_amp` flag to the `main.py` script while training in FP32, the following variables are set to their correct value for mixed precision training inside the `./utils/runner.py` script:
```
if params['use_amp']:
assert params['dtype'] == tf.float32, "TF-AMP requires FP32 precision"
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
```
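For example, a hypothetical single-GPU mixed precision training run (paths are placeholders) could be launched as:
```
python main.py --exec_mode train_and_predict --data_dir /data --model_dir /results --use_amp
```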
## 6. Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
### 6.1. Training performance benchmark
To benchmark training, run one of the scripts in `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_{1, 8}GPU.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>`.
Each of these scripts will by default run 200 warm-up iterations and benchmark the performance during training in the next 100 iterations. To control the warm-up and benchmark lengths, use the `--warmup_steps` and `--max_steps` flags.
### 6.2. Inference performance benchmark
To benchmark inference, run one of the scripts in `./examples/unet_INFER_BENCHMARK_{FP32, TF-AMP}.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>`.
Each of these scripts will by default run 200 warm-up iterations and benchmark the performance during inference in the next 100 iterations. To control the warm-up and benchmark lengths, use the `--warmup_steps` and `--max_steps` flags.
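For example, a hypothetical TF-AMP inference benchmark run (paths and batch size are placeholders, and the exact script name follows the pattern above) could look like:
```
bash examples/unet_INFER_BENCHMARK_TF-AMP.sh . /data /results 8
```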
## 7. Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
### 7.1. Training accuracy results
#### 7.1.1 NVIDIA DGX-1 (8x V100 16G)
Our results were obtained by running the `./examples/unet_{FP32, TF-AMP}_{1, 8}GPU.sh` scripts in the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
Metrics employed by the organization are explained in detail [here](http://brainiac2.mit.edu/isbi_challenge/evaluation).
The results described below were obtained after the submission of our evaluations to the [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge) organizers.
| **Number of GPUs** | **FP32 Rand Score Thin** | **FP32 Information Score Thin** | **TF-AMP Rand Score Thin** | **TF-AMP Information Score Thin** | **Total time to train with FP16 (Hrs)** | **Total time to train with FP32 (Hrs)** |
|:---:|:--------:|:-------:|:--------:|:-------:|:--------:|:-------:|
|1 | 0.938508265 | 0.970255682 | 0.939619101 | 0.970120138 | 7.1 | 11.28 |
|8 | 0.932395087 | 0.9786346 | 0.941360867 | 0.976235311 | 0.9 | 1.41 |
### 7.2. Training performance results
#### 7.2.1 NVIDIA DGX-1 (1x V100 16G)
Our results were obtained by running the `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_1GPU.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPU while data augmentation is enabled.
| **Batch size** | **FP32 max img/s** | **TF-AMP max img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 12.37 | 21.91 | 1.77 |
| 8 | 13.81 | 29.58 | 2.14 |
| 16 | Out of memory | 30.77 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
#### 7.2.2 NVIDIA DGX-1 (8x V100 16G)
Our results were obtained by running the `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_8GPU.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPU while data augmentation is enabled.
| **Batch size per GPU** | **FP32 max img/s** | **TF-AMP max img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 89.93 | 126.66 | 1.41 |
| 8 | 105.35 | 130.66 | 1.24 |
| 16 | Out of memory | 132.78 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
### 7.3. Inference performance results
Our results were obtained by running the `./examples/unet_INFER_BENCHMARK_{FP32, TF-AMP}.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPU while data augmentation is enabled.
| **Batch size** | **FP32 img/s** | **TF-AMP img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 34.27 | 62.81 | 1.83 |
| 8 | 37.09 | 79.62 | 2.14 |
| 16 | Out of memory | 83.33 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
## 8. Changelog
May 2019
* Initial release
## 9. Known issues
There are no known issues in this release.

View file

@ -0,0 +1,19 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .logger import LOGGER, StdOutBackend, MLPerfBackend, JsonBackend, CompactBackend, Scope, AverageMeter, StandardMeter
from . import tags
__all__ = [LOGGER, StdOutBackend, MLPerfBackend, JsonBackend, CompactBackend, Scope, AverageMeter, StandardMeter, tags]

View file

@ -0,0 +1,60 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Common values reported
import subprocess
import xml.etree.ElementTree as ET
#TODO: print CUDA version, container version etc
def log_hardware(logger):
# TODO: asserts - what if you cannot launch those commands?
# number of CPU threads
cpu_info_command = 'cat /proc/cpuinfo'
cpu_info = subprocess.run(cpu_info_command.split(), stdout=subprocess.PIPE).stdout.split()
cpu_num_index = len(cpu_info) - cpu_info[::-1].index(b'processor') + 1
cpu_num = int(cpu_info[cpu_num_index]) + 1
# CPU name
cpu_name_begin_index = cpu_info.index(b'name')
cpu_name_end_index = cpu_info.index(b'stepping')
cpu_name = b' '.join(cpu_info[cpu_name_begin_index + 2:cpu_name_end_index]).decode('utf-8')
logger.log(key='cpu_info', value={"num": cpu_num, "name": cpu_name})
# RAM memory
ram_info_command = 'free -m -h'
ram_info = subprocess.run(ram_info_command.split(), stdout=subprocess.PIPE).stdout.split()
ram_index = ram_info.index(b'Mem:') + 1
ram = ram_info[ram_index].decode('utf-8')
logger.log(key='mem_info', value={"ram": ram})
# GPU
nvidia_smi_command = 'nvidia-smi -q -x'
nvidia_smi_output = subprocess.run(nvidia_smi_command.split(), stdout=subprocess.PIPE).stdout
nvidia_smi = ET.fromstring(nvidia_smi_output)
gpus = nvidia_smi.findall('gpu')
ver = nvidia_smi.findall('driver_version')
logger.log(key="gpu_info",
value={
"driver_version": ver[0].text,
"num": len(gpus),
"name": [g.find('product_name').text for g in gpus],
"mem": [g.find('fb_memory_usage').find('total').text for g in gpus]})
def log_args(logger, args):
logger.log(key='args', value=vars(args))

View file

@ -0,0 +1,519 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import json
import logging
import inspect
import sys
from contextlib import contextmanager
import functools
from collections import OrderedDict
import datetime
from . import autologging
NVLOGGER_NAME = 'nv_dl_logger'
NVLOGGER_VERSION = '0.3.1'
NVLOGGER_TOKEN = ':::NVLOG'
MLPERF_NAME = 'mlperf_logger'
MLPERF_VERSION = '0.5.0'
MLPERF_TOKEN = ':::MLP'
COMPACT_NAME = 'compact_logger'
DEFAULT_JSON_FILENAME = 'nvlog.json'
class Scope:
RUN = 0
EPOCH = 1
TRAIN_ITER = 2
class Level:
CRITICAL = 5
ERROR = 4
WARNING = 3
INFO = 2
DEBUG = 1
_data = OrderedDict([
('model', None),
('epoch', -1),
('iteration', -1),
('total_iteration', -1),
('metrics', OrderedDict()),
('timed_blocks', OrderedDict()),
('current_scope', Scope.RUN)
])
def get_caller(root_dir=None):
stack_files = [s.filename.split('/')[-1] for s in inspect.stack()]
stack_index = 0
while stack_index < len(stack_files) and stack_files[stack_index] != 'logger.py':
stack_index += 1
while (stack_index < len(stack_files) and
stack_files[stack_index] in ['logger.py', 'autologging.py', 'contextlib.py']):
stack_index += 1
caller = inspect.stack()[stack_index]
return "%s:%d" % (stack_files[stack_index], caller.lineno)
class StandardMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.value = None
def record(self, value):
self.value = value
def get_value(self):
return self.value
def get_last(self):
return self.value
class AverageMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.count = 0
self.value = 0
self.last = 0
def record(self, value, n = 1):
self.last = value
self.count += n
self.value += value * n
def get_value(self):
return self.value / self.count
def get_last(self):
return self.last
class JsonBackend(object):
def __init__(self, log_file=DEFAULT_JSON_FILENAME, logging_scope=Scope.TRAIN_ITER,
iteration_interval=1):
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.json_log = OrderedDict([
('run', OrderedDict()),
('epoch', OrderedDict()),
('iter', OrderedDict()),
('event', OrderedDict()),
])
self.json_log['epoch']['x'] = []
if self.logging_scope == Scope.TRAIN_ITER:
self.json_log['iter']['x'] = [[]]
def register_metric(self, key, metric_scope):
if (metric_scope == Scope.TRAIN_ITER and
self.logging_scope == Scope.TRAIN_ITER):
if not key in self.json_log['iter'].keys():
self.json_log['iter'][key] = [[]]
if metric_scope == Scope.EPOCH:
if not key in self.json_log['epoch'].keys():
self.json_log['epoch'][key] = []
def log(self, key, value):
if _data['current_scope'] == Scope.RUN:
self.json_log['run'][key] = value
elif _data['current_scope'] == Scope.EPOCH:
pass
elif _data['current_scope'] == Scope.TRAIN_ITER:
pass
else:
raise ValueError('log function for scope "', _data['current_scope'],
'" not implemented')
def log_event(self, key, value):
if not key in self.json_log['event'].keys():
self.json_log['event'][key] = []
entry = OrderedDict()
entry['epoch'] = _data['epoch']
entry['iter'] = _data['iteration']
entry['timestamp'] = time.time()
if value:
entry['value'] = value
self.json_log['event'][key].append(str(entry))
def log_iteration_summary(self):
if (self.logging_scope == Scope.TRAIN_ITER and
_data['total_iteration'] % self.iteration_interval == 0):
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
self.json_log['iter'][key][-1].append(str(m.get_last()))
# log x for iteration number
self.json_log['iter']['x'][-1].append(_data['iteration'])
def dump_json(self):
if self.log_file is None:
print(json.dumps(self.json_log, indent=4))
else:
with open(self.log_file, 'w') as f:
json.dump(self.json_log, fp=f, indent=4)
def log_epoch_summary(self):
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.EPOCH:
self.json_log['epoch'][key].append(str(m.get_value()))
elif (m.metric_scope == Scope.TRAIN_ITER and
self.logging_scope == Scope.TRAIN_ITER):
# create new sublists for each iter metric in the next epoch
self.json_log['iter'][key].append([])
# log x for epoch number
self.json_log['epoch']['x'].append(_data['epoch'])
# create new sublist for iter's x in the next epoch
if self.logging_scope == Scope.TRAIN_ITER:
self.json_log['iter']['x'].append([])
self.dump_json()
def timed_block_start(self, name):
pass
def timed_block_stop(self, name):
pass
def finish(self):
self.dump_json()
class _ParentStdOutBackend(object):
def __init__(self, name, token, version, log_file, logging_scope, iteration_interval):
self.root_dir = None
self.worker = [0]
self.prefix = ''
self.name = name
self.token = token
self.version = version
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.logger = logging.getLogger(self.name)
self.logger.setLevel(logging.DEBUG)
self.logger.handlers = []
if (self.log_file is None):
self.stream_handler = logging.StreamHandler(stream=sys.stdout)
self.stream_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.stream_handler)
else:
self.file_handler = logging.FileHandler(self.log_file, mode='w')
self.file_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.file_handler)
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
pass
def log_epoch_summary(self):
pass
def log_iteration_summary(self):
pass
def log(self, key, value):
if _data['current_scope'] > self.logging_scope:
pass
elif (_data['current_scope'] == Scope.TRAIN_ITER and
_data['total_iteration'] % self.iteration_interval != 0):
pass
else:
self.log_stdout(key, value)
def log_event(self, key, value):
self.log_stdout(key, value)
def log_stdout(self, key, value=None, forced=False):
# TODO: worker 0
# only the 0-worker will log
#if not forced and self.worker != 0:
# pass
if value is None:
msg = key
else:
str_json = json.dumps(str(value))
msg = '{key}: {value}'.format(key=key, value=str_json)
call_site = get_caller(root_dir=self.root_dir)
now = time.time()
message = '{prefix}{token}v{ver} {model} {secs:.9f} ({call_site}) {msg}'.format(
prefix=self.prefix, token=self.token, ver=self.version, secs=now,
model=_data['model'],
call_site=call_site, msg=msg)
self.logger.debug(message)
def timed_block_start(self, name):
self.log_stdout(key=name + "_start")
def timed_block_stop(self, name):
self.log_stdout(key=name + "_stop")
def finish(self):
pass
class StdOutBackend(_ParentStdOutBackend):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
_ParentStdOutBackend.__init__(self, name=NVLOGGER_NAME, token=NVLOGGER_TOKEN,
version=NVLOGGER_VERSION, log_file=log_file, logging_scope=logging_scope,
iteration_interval=iteration_interval)
class MLPerfBackend(_ParentStdOutBackend):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
_ParentStdOutBackend.__init__(self, name=MLPERF_NAME, token=MLPERF_TOKEN,
version=MLPERF_VERSION, log_file=log_file, logging_scope=logging_scope,
iteration_interval=iteration_interval)
class CompactBackend(object):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.logger = logging.getLogger(COMPACT_NAME)
self.logger.setLevel(logging.DEBUG)
self.logger.handlers = []
if (self.log_file is None):
self.stream_handler = logging.StreamHandler(stream=sys.stdout)
self.stream_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.stream_handler)
else:
self.file_handler = logging.FileHandler(self.log_file, mode='w')
self.file_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.file_handler)
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
pass
def timestamp_prefix(self):
return datetime.datetime.now().strftime('[%Y-%m-%d %H:%M:%S]')
def log(self, key, value):
if _data['current_scope'] == Scope.RUN:
self.log_event(key, value)
def log_event(self, key, value):
msg = self.timestamp_prefix() + ' ' + str(key)
if value is not None:
msg += ": " + str(value)
self.logger.debug(msg)
def log_epoch_summary(self):
if self.logging_scope >= Scope.EPOCH:
summary = self.timestamp_prefix() + ' Epoch {:<4} '.format(str(_data['epoch']) + ':')
for key, m in _data['metrics'].items():
if m.metric_scope >= Scope.EPOCH:
summary += str(key) + ": " + str(m.get_value()) + ", "
self.logger.debug(summary)
def log_iteration_summary(self):
if self.logging_scope >= Scope.TRAIN_ITER and _data['total_iteration'] % self.iteration_interval == 0:
summary = self.timestamp_prefix() + ' Iter {:<5} '.format(str(_data['iteration']) + ':')
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
summary += str(key) + ": " + str(m.get_last()) + ", "
self.logger.debug(summary)
def timed_block_start(self, name):
pass
def timed_block_stop(self, name):
pass
def finish(self):
pass
class _Logger(object):
def __init__(self):
self.backends = [
CompactBackend(),
JsonBackend()
]
self.level = Level.INFO
def set_model_name(self, name):
_data['model'] = name
def set_backends(self, backends):
self.backends = backends
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
if meter is None:
meter = StandardMeter()
#TODO: move to argument of Meter?
meter.metric_scope = metric_scope
_data['metrics'][key] = meter
for b in self.backends:
b.register_metric(key, metric_scope=metric_scope)
def log(self, key, value=None, forced=False, level=Level.INFO):
if level < self.level:
return
if _data['current_scope'] == Scope.TRAIN_ITER or _data['current_scope'] == Scope.EPOCH:
if key in _data['metrics'].keys():
if _data['metrics'][key].metric_scope == _data['current_scope']:
_data['metrics'][key].record(value)
for b in self.backends:
b.log(key, value)
def debug(self, *args, **kwargs):
self.log(*args, level=Level.DEBUG, **kwargs)
def info(self, *args, **kwargs):
self.log(*args, level=Level.INFO, **kwargs)
def warning(self, *args, **kwargs):
self.log(*args, level=Level.WARNING, **kwargs)
def error(self, *args, **kwargs):
self.log(*args, level=Level.ERROR, **kwargs)
def critical(self, *args, **kwargs):
self.log(*args, level=Level.CRITICAL, **kwargs)
def log_event(self, key, value=None):
for b in self.backends:
b.log_event(key, value)
def timed_block_start(self, name):
if not name in _data['timed_blocks']:
_data['timed_blocks'][name] = OrderedDict()
_data['timed_blocks'][name]['start'] = time.time()
for b in self.backends:
b.timed_block_start(name)
def timed_block_stop(self, name):
if not name in _data['timed_blocks']:
raise ValueError('timed_block_stop called before timed_block_start for ' + name)
_data['timed_blocks'][name]['stop'] = time.time()
delta = _data['timed_blocks'][name]['stop'] - _data['timed_blocks'][name]['start']
self.log(name + '_time', delta)
for b in self.backends:
b.timed_block_stop(name)
def iteration_start(self):
_data['current_scope'] = Scope.TRAIN_ITER
_data['iteration'] += 1
_data['total_iteration'] += 1
def iteration_stop(self):
for b in self.backends:
b.log_iteration_summary()
_data['current_scope'] = Scope.EPOCH
def epoch_start(self):
_data['current_scope'] = Scope.EPOCH
_data['epoch'] += 1
_data['iteration'] = -1
for n, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
m.reset()
def epoch_stop(self):
for b in self.backends:
b.log_epoch_summary()
_data['current_scope'] = Scope.RUN
def finish(self):
for b in self.backends:
b.finish()
def iteration_generator_wrapper(self, gen):
for g in gen:
self.iteration_start()
yield g
self.iteration_stop()
def epoch_generator_wrapper(self, gen):
for g in gen:
self.epoch_start()
yield g
self.epoch_stop()
@contextmanager
def timed_block(self, prefix, value=None, forced=False):
""" This function helps with timed blocks
----
Parameters:
prefix - one of items from TIMED_BLOCKS; the action to be timed
logger - NVLogger object
forced - if True then the events are always logged (even if it should be skipped)
"""
self.timed_block_start(prefix)
yield self
self.timed_block_stop(prefix)
def log_hardware(self):
autologging.log_hardware(self)
def log_args(self, args):
autologging.log_args(self, args)
def timed_function(self, prefix, variable=None, forced=False):
""" This decorator helps with timed functions
----
Parameters:
prefix - one of items from TIME_BLOCK; the action to be timed
logger - NVLogger object
forced - if True then the events are always logged (even if it should be skipped)
"""
def timed_function_decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
value = kwargs.get(variable, next(iter(args), None))
with self.timed_block(prefix=prefix, value=value, forced=forced):
return func(*args, **kwargs)
return wrapper
return timed_function_decorator
LOGGER = _Logger()
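# A minimal usage sketch, not part of the original module: it exercises the API defined
# above with a dummy training loop. The metric names, loop lengths and values below are
# illustrative assumptions only; call _example_usage() to see the default Compact and
# JSON backends in action.
def _example_usage():
    LOGGER.set_model_name('example_model')
    LOGGER.register_metric('train_loss', metric_scope=Scope.TRAIN_ITER)
    LOGGER.register_metric('top1', metric_scope=Scope.EPOCH)
    for _ in LOGGER.epoch_generator_wrapper(range(2)):           # two dummy epochs
        for _ in LOGGER.iteration_generator_wrapper(range(4)):   # four dummy iterations
            with LOGGER.timed_block('train_iteration'):
                LOGGER.log('train_loss', 0.5)                    # iteration-scope metric
        LOGGER.log('top1', 75.0)                                 # epoch-scope metric
    LOGGER.finish()                                              # dumps the accumulated JSON log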

View file

@ -0,0 +1,255 @@
# Copyright 2018 MLBenchmark Group. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Common values reported
VALUE_EPOCH = "epoch"
VALUE_ITERATION = "iteration"
VALUE_ACCURACY = "accuracy"
VALUE_BLEU = "bleu"
VALUE_TOP1 = "top1"
VALUE_TOP5 = "top5"
VALUE_BBOX_MAP = "bbox_map"
VALUE_MASK_MAP = "mask_map"
VALUE_BCE = "binary_cross_entropy"
# Timed blocks (used with timed_function & timed_block).
# For each of these, *_start and *_stop tags should be defined.
RUN_BLOCK = "run"
SETUP_BLOCK = "setup"
PREPROC_BLOCK = "preproc"
TRAIN_BLOCK = "train"
TRAIN_PREPROC_BLOCK = "train_preproc"
TRAIN_EPOCH_BLOCK = "train_epoch"
TRAIN_EPOCH_PREPROC_BLOCK = "train_epoch_preproc"
TRAIN_CHECKPOINT_BLOCK = "train_checkpoint"
TRAIN_ITER_BLOCK = "train_iteration"
EVAL_BLOCK = "eval"
EVAL_ITER_BLOCK = "eval_iteration"
#TODO: to remove?
TIMED_BLOCKS = {
RUN_BLOCK,
SETUP_BLOCK,
PREPROC_BLOCK,
TRAIN_BLOCK,
TRAIN_PREPROC_BLOCK,
TRAIN_EPOCH_BLOCK,
TRAIN_EPOCH_PREPROC_BLOCK,
TRAIN_CHECKPOINT_BLOCK,
TRAIN_ITER_BLOCK,
EVAL_BLOCK,
EVAL_ITER_BLOCK,
}
# Events
RUN_INIT = "run_init"
SETUP_START = "setup_start"
SETUP_STOP = "setup_stop"
PREPROC_START = "preproc_start"
PREPROC_STOP = "preproc_stop"
RUN_START = "run_start"
RUN_STOP = "run_stop"
RUN_FINAL = "run_final"
TRAIN_CHECKPOINT_START = "train_checkpoint_start"
TRAIN_CHECKPOINT_STOP = "train_checkpoint_stop"
TRAIN_PREPROC_START = "train_preproc_start"
TRAIN_PREPROC_STOP = "train_preproc_stop"
TRAIN_EPOCH_PREPROC_START = "train_epoch_preproc_start"
TRAIN_EPOCH_PREPROC_STOP = "train_epoch_preproc_stop"
TRAIN_ITER_START = "train_iter_start"
TRAIN_ITER_STOP = "train_iter_stop"
TRAIN_EPOCH_START = "train_epoch_start"
TRAIN_EPOCH_STOP = "train_epoch_stop"
# MLPerf specific tags
RUN_CLEAR_CACHES = "run_clear_caches"
PREPROC_NUM_TRAIN_EXAMPLES = "preproc_num_train_examples"
PREPROC_NUM_EVAL_EXAMPLES = "preproc_num_eval_examples"
PREPROC_TOKENIZE_TRAINING = "preproc_tokenize_training"
PREPROC_TOKENIZE_EVAL = "preproc_tokenize_eval"
PREPROC_VOCAB_SIZE = "preproc_vocab_size"
RUN_SET_RANDOM_SEED = "run_set_random_seed"
INPUT_SIZE = "input_size"
INPUT_BATCH_SIZE = "input_batch_size"
INPUT_ORDER = "input_order"
INPUT_SHARD = "input_shard"
INPUT_BN_SPAN = "input_bn_span"
INPUT_CENTRAL_CROP = "input_central_crop"
INPUT_CROP_USES_BBOXES = "input_crop_uses_bboxes"
INPUT_DISTORTED_CROP_MIN_OBJ_COV = "input_distorted_crop_min_object_covered"
INPUT_DISTORTED_CROP_RATIO_RANGE = "input_distorted_crop_aspect_ratio_range"
INPUT_DISTORTED_CROP_AREA_RANGE = "input_distorted_crop_area_range"
INPUT_DISTORTED_CROP_MAX_ATTEMPTS = "input_distorted_crop_max_attempts"
INPUT_MEAN_SUBTRACTION = "input_mean_subtraction"
INPUT_RANDOM_FLIP = "input_random_flip"
INPUT_RESIZE = "input_resize"
INPUT_RESIZE_ASPECT_PRESERVING = "input_resize_aspect_preserving"
# Opt
OPT_NAME = "opt_name"
OPT_LR = "opt_learning_rate"
OPT_MOMENTUM = "opt_momentum"
OPT_WEIGHT_DECAY = "opt_weight_decay"
OPT_HP_ADAM_BETA1 = "opt_hp_Adam_beta1"
OPT_HP_ADAM_BETA2 = "opt_hp_Adam_beta2"
OPT_HP_ADAM_EPSILON = "opt_hp_Adam_epsilon"
OPT_LR_WARMUP_STEPS = "opt_learning_rate_warmup_steps"
# Train
TRAIN_LOOP = "train_loop"
TRAIN_EPOCH = "train_epoch"
TRAIN_CHECKPOINT = "train_checkpoint"
TRAIN_LOSS = "train_loss"
TRAIN_ITERATION_LOSS = "train_iteration_loss"
# Eval
EVAL_START = "eval_start"
EVAL_SIZE = "eval_size"
EVAL_TARGET = "eval_target"
EVAL_ACCURACY = "eval_accuracy"
EVAL_STOP = "eval_stop"
# Perf
PERF_IT_PER_SEC = "perf_it_per_sec"
PERF_TIME_TO_TRAIN = "time_to_train"
EVAL_ITERATION_ACCURACY = "eval_iteration_accuracy"
# Model
MODEL_HP_LOSS_FN = "model_hp_loss_fn"
MODEL_HP_INITIAL_SHAPE = "model_hp_initial_shape"
MODEL_HP_FINAL_SHAPE = "model_hp_final_shape"
MODEL_L2_REGULARIZATION = "model_l2_regularization"
MODEL_EXCLUDE_BN_FROM_L2 = "model_exclude_bn_from_l2"
MODEL_HP_RELU = "model_hp_relu"
MODEL_HP_CONV2D_FIXED_PADDING = "model_hp_conv2d_fixed_padding"
MODEL_HP_BATCH_NORM = "model_hp_batch_norm"
MODEL_HP_DENSE = "model_hp_dense"
# GNMT specific
MODEL_HP_LOSS_SMOOTHING = "model_hp_loss_smoothing"
MODEL_HP_NUM_LAYERS = "model_hp_num_layers"
MODEL_HP_HIDDEN_SIZE = "model_hp_hidden_size"
MODEL_HP_DROPOUT = "model_hp_dropout"
EVAL_HP_BEAM_SIZE = "eval_hp_beam_size"
TRAIN_HP_MAX_SEQ_LEN = "train_hp_max_sequence_length"
EVAL_HP_MAX_SEQ_LEN = "eval_hp_max_sequence_length"
EVAL_HP_LEN_NORM_CONST = "eval_hp_length_normalization_constant"
EVAL_HP_LEN_NORM_FACTOR = "eval_hp_length_normalization_factor"
EVAL_HP_COV_PENALTY_FACTOR = "eval_hp_coverage_penalty_factor"
# NCF specific
PREPROC_HP_MIN_RATINGS = "preproc_hp_min_ratings"
PREPROC_HP_NUM_EVAL = "preproc_hp_num_eval"
PREPROC_HP_SAMPLE_EVAL_REPLACEMENT = "preproc_hp_sample_eval_replacement"
INPUT_HP_NUM_NEG = "input_hp_num_neg"
INPUT_HP_SAMPLE_TRAIN_REPLACEMENT = "input_hp_sample_train_replacement"
INPUT_STEP_TRAIN_NEG_GEN = "input_step_train_neg_gen"
INPUT_STEP_EVAL_NEG_GEN = "input_step_eval_neg_gen"
EVAL_HP_NUM_USERS = "eval_hp_num_users"
EVAL_HP_NUM_NEG = "eval_hp_num_neg"
MODEL_HP_MF_DIM = "model_hp_mf_dim"
MODEL_HP_MLP_LAYER_SIZES = "model_hp_mlp_layer_sizes"
# RESNET specific
EVAL_EPOCH_OFFSET = "eval_offset"
MODEL_HP_INITIAL_MAX_POOL = "model_hp_initial_max_pool"
MODEL_HP_BEGIN_BLOCK = "model_hp_begin_block"
MODEL_HP_END_BLOCK = "model_hp_end_block"
MODEL_HP_BLOCK_TYPE = "model_hp_block_type"
MODEL_HP_PROJECTION_SHORTCUT = "model_hp_projection_shortcut"
MODEL_HP_SHORTCUT_ADD = "model_hp_shorcut_add"
MODEL_HP_RESNET_TOPOLOGY = "model_hp_resnet_topology"
# Transformer specific
INPUT_MAX_LENGTH = "input_max_length"
MODEL_HP_INITIALIZER_GAIN = "model_hp_initializer_gain"
MODEL_HP_VOCAB_SIZE = "model_hp_vocab_size"
MODEL_HP_NUM_HIDDEN_LAYERS = "model_hp_hidden_layers"
MODEL_HP_EMBEDDING_SHARED_WEIGHTS = "model_hp_embedding_shared_weights"
MODEL_HP_ATTENTION_DENSE = "model_hp_attention_dense"
MODEL_HP_ATTENTION_DROPOUT = "model_hp_attention_dropout"
MODEL_HP_FFN_OUTPUT_DENSE = "model_hp_ffn_output_dense"
MODEL_HP_FFN_FILTER_DENSE = "model_hp_ffn_filter_dense"
MODEL_HP_RELU_DROPOUT = "model_hp_relu_dropout"
MODEL_HP_LAYER_POSTPROCESS_DROPOUT = "model_hp_layer_postprocess_dropout"
MODEL_HP_NORM = "model_hp_norm"
MODEL_HP_SEQ_BEAM_SEARCH = "model_hp_sequence_beam_search"
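# Hedged illustration, not part of the original file: these string constants are the keys
# passed to the logger's log/log_event/timed_block calls. The import names below are
# assumptions made for the sketch only; the real module paths may differ.
#
#   from logger import LOGGER   # assumed import path for the logger module above
#   import tags                 # assumed module name for this file
#
#   LOGGER.log_event(tags.RUN_INIT)
#   LOGGER.log(tags.OPT_LR, 0.1)                      # run-scope value
#   with LOGGER.timed_block(tags.TRAIN_EPOCH_BLOCK):  # logs train_epoch_time on exit
#       pass                                          # one epoch of training goes here
#   LOGGER.log_event(tags.RUN_FINAL)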

View file

@ -0,0 +1,38 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
PARSER = argparse.ArgumentParser(description="U-Net medical")
PARSER.add_argument('--data_dir',
type=str,
default='./data',
help="""Directory where to download the dataset""")
def main():
FLAGS = PARSER.parse_args()
if not os.path.exists(FLAGS.data_dir):
os.makedirs(FLAGS.data_dir)
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-volume.tif -P {}'.format(FLAGS.data_dir))
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-labels.tif -P {}'.format(FLAGS.data_dir))
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/test-volume.tif -P {}'.format(FLAGS.data_dir))
print("Finished downloading files for U-Net medical to {}".format(FLAGS.data_dir))
if __name__ == '__main__':
main()
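# Example invocation (hedged: the script's filename is not shown in this diff, so
# "download_dataset.py" below is an assumed name):
#   python download_dataset.py --data_dir ./data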

View file

@ -0,0 +1,27 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training benchmark in FP32 on 1 GPU with a batch size of 2
# Usage ./unet_TRAIN_BENCHMARK_FP32_1GPU.sh <path to this repository> <path to dataset> <path to results directory>
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 320000 \
--batch_size 2 \
--benchmark \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,37 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training benchmark in FP32 on 8 GPUs with a batch size of 2 per GPU
# Usage ./unet_TRAIN_BENCHMARK_FP32_8GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpirun \
-np 8 \
-H localhost:8 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 40000 \
--batch_size 2 \
--benchmark \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net inference benchmark in FP32 on 1 GPU with the given batch size
# Usage ./unet_INFER_BENCHMARK_FP32.sh <path to this repository> <path to dataset> <path to results directory> <batch size>
python $1/main.py --data_dir $2 --model_dir $3 --batch_size $4 --benchmark --exec_mode benchmark --augment --warmup_steps 200 --log_every 100 --max_steps 300

View file

@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net inference benchmark in TF-AMP on 1 GPU with the given batch size
# Usage ./unet_INFER_BENCHMARK_TF-AMP.sh <path to this repository> <path to dataset> <path to results directory> <batch size>
python $1/main.py --data_dir $2 --model_dir $3 --batch_size $4 --benchmark --use_amp --exec_mode benchmark --augment --warmup_steps 200 --log_every 100 --max_steps 300

View file

@ -0,0 +1,28 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training in TF-AMP on 1 GPU with a batch size of 2
# Usage ./unet_TF-AMP_1GPU.sh <path to this repository> <path to dataset> <path to results directory>
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 320000 \
--batch_size 2 \
--benchmark \
--use_amp \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,38 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training in TF-AMP on 8 GPUs with a batch size of 2 per GPU
# Usage ./unet_TF-AMP_8GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpirun \
-np 8 \
-H localhost:8 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 40000 \
--batch_size 2 \
--benchmark \
--use_amp \
--exec_mode train_and_predict \
--augment

Some files were not shown because too many files have changed in this diff.