Changes in TF models:

* added UNet for medical image segmentation
* added TF-AMP support for RN50
* small updates for other models (READMEs, benchmark & testing scripts)
Przemek Strzelczyk 2019-05-25 01:23:11 +02:00
parent 2d33a72240
commit d2bc3da0a1
124 changed files with 4252 additions and 224 deletions

View file

@ -18,7 +18,9 @@ The examples are organized first by framework, such as TensorFlow, PyTorch, etc.
- __ResNet-50__ [[MXNet](https://github.com/NVIDIA/DeepLearningExamples/tree/master/MxNet/Classification/RN50v1.5)] [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Classification/RN50v1.5)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Classification/RN50v1.5)]
- __SSD__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Detection/SSD)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Detection/SSD)]
- __Mask R-CNN__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Segmentation/MaskRCNN)]
- __U-Net__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial)]
- __U-Net(industrial)__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Industrial)]
- __U-Net(medical)__ [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Segmentation/UNet_Medical)]
### Natural Language Processing
- __GNMT__ [[PyTorch](https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/Translation/GNMT)] [[TensorFlow](https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/Translation/GNMT)]

View file

@ -1,4 +1,4 @@
FROM nvcr.io/nvidia/tensorflow:19.01-py3
FROM nvcr.io/nvidia/tensorflow:19.05-py3
## MAINTAINER Paweł Sołtysiak <psoltysiak@nvidia.com>
ADD . /workspace/rn50v15_tf

View file

@ -1,11 +1,202 @@
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
1. Definitions.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View file

@ -267,10 +267,10 @@ To control warmup and benchmark length, use `--warmup_steps`, `--num_iter` and `
To benchmark the inference performance on a specific batch size, run:
* FP32
`python ./main.py --mode=inference_benchmark --precision=fp32 --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
`python ./main.py --mode=inference_benchmark --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
* FP16
`python ./main.py --mode=inference_benchmark --precision=fp16 --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
`python ./main.py --mode=inference_benchmark --use_tf_amp --warmup_steps 20 --train_iter 100 --iter_unit batch --batch_size <batch size> --data_dir=<path to imagenet> --log_dir=<path to results directory>`
Each of these scripts, by default, runs 20 warm-up iterations and measures the next 80 iterations.
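As a rough illustration of the warm-up handling described above (20 warm-up iterations, 80 measured), a sketch of how per-iteration timings translate into the reported img/s; the function and its inputs are illustrative, not part of the repository:

```python
def throughput_img_per_s(iteration_times_s, batch_size, warmup_steps=20):
    # Drop the warm-up iterations, then average over the measured ones.
    measured = iteration_times_s[warmup_steps:]
    return batch_size * len(measured) / sum(measured)
```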
@ -307,9 +307,6 @@ Our results were obtained by running the `./scripts/benchmarking/DGX1V_trainbenc
Our results were obtained by running the `./scripts/benchmarking/DGX1V_inferbench_fp16.sh` and `./scripts/benchmarking/DGX1V_inferbench_fp32.sh` scripts in the tensorflow-19.02-py3 Docker container on NVIDIA DGX-1 with 8 V100 16G GPUs.
Those results can be improved when [XLA](https://www.tensorflow.org/xla) is used
in conjunction with mixed precision, delivering up to 3.3x speedup over FP32 on a single GPU (~1179 img/s).
However XLA is still considered experimental.
## Inference performance results
@ -331,5 +328,9 @@ However XLA is still considered experimental.
1. March 1, 2019
* Initial release
2. May 23, 2019
* TF-AMP support added
* Benchmark scripts updated
# Known issues
There are no known issues with this model.

View file

@ -1,4 +1,4 @@
# !/usr/bin/env python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
@ -63,13 +63,12 @@ if __name__ == "__main__":
weight_decay=FLAGS.weight_decay,
momentum=FLAGS.momentum,
loss_scale=FLAGS.loss_scale,
use_auto_loss_scaling=FLAGS.use_auto_loss_scaling,
use_static_loss_scaling=FLAGS.use_static_loss_scaling,
distort_colors=False,
# ======= Optimization HParams ======== #
use_xla=FLAGS.use_xla,
use_tf_amp=FLAGS.use_tf_amp,
use_fast_math=FLAGS.use_fast_math,
seed=FLAGS.seed,
)
@ -93,7 +92,6 @@ if __name__ == "__main__":
# ======= Optimization HParams ======== #
use_xla=RUNNING_CONFIG.use_xla,
use_tf_amp=RUNNING_CONFIG.use_tf_amp,
use_fast_math=RUNNING_CONFIG.use_fast_math,
seed=RUNNING_CONFIG.seed
)
@ -110,7 +108,7 @@ if __name__ == "__main__":
learning_rate_init=RUNNING_CONFIG.learning_rate_init,
momentum=RUNNING_CONFIG.momentum,
loss_scale=RUNNING_CONFIG.loss_scale,
use_auto_loss_scaling=FLAGS.use_auto_loss_scaling,
use_static_loss_scaling=FLAGS.use_static_loss_scaling,
is_benchmark=RUNNING_CONFIG.mode == 'training_benchmark',
)

View file

@ -56,7 +56,6 @@ class Runner(object):
# ======= Optimization HParams ======== #
use_xla=False,
use_tf_amp=False,
use_fast_math=False,
# ======== Debug Flags ======== #
debug_verbosity=0,
@ -105,34 +104,19 @@ class Runner(object):
os.environ['TF_DISABLE_NVTX_RANGES'] = '1'
# ============================================
# TF-AMP and Fast Math Setup - Do not remove
# TF-AMP Setup - Do not remove
# ============================================
if dtype == tf.float16:
if use_fast_math:
raise RuntimeError("Fast Math can not be activated for FP16 precision")
if use_tf_amp:
raise RuntimeError("TF AMP can not be activated for FP16 precision")
elif use_fast_math and use_tf_amp:
raise RuntimeError("TF AMP and Fast Math can not be activated simultaneously")
else:
if use_fast_math:
if hvd.rank() == 0:
LOGGER.log("Fast Math computation is activated - Experimental Feature")
os.environ["TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32"] = "1"
os.environ["TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32"] = "1"
os.environ["TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32"] = "1"
elif use_tf_amp:
if hvd.rank() == 0:
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
elif use_tf_amp:
if hvd.rank() == 0:
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
# =================================================
@ -150,7 +134,6 @@ class Runner(object):
run_config_performance = tf.contrib.training.HParams(
num_preprocessing_threads=32,
use_fast_math=use_fast_math,
use_tf_amp=use_tf_amp,
use_xla=use_xla,
)
@ -159,7 +142,7 @@ class Runner(object):
model_dir=model_dir if not hvd_utils.is_using_hvd() or hvd.rank() == 0 else None,
log_dir=log_dir if not hvd_utils.is_using_hvd() or hvd.rank() == 0 else None,
data_dir=data_dir,
num_preprocessing_threads=32,
num_preprocessing_threads=16,
)
self.run_hparams = Runner._build_hparams(model_hparams, run_config_additional, run_config_performance)
@ -311,7 +294,7 @@ class Runner(object):
momentum=0.9,
log_every_n_steps=1,
loss_scale=256,
use_auto_loss_scaling=False,
use_static_loss_scaling=False,
is_benchmark=False
):
@ -321,15 +304,14 @@ class Runner(object):
if self.run_hparams.data_dir is None and not is_benchmark:
raise ValueError('`data_dir` must be specified for training!')
if self.run_hparams.use_fast_math or self.run_hparams.use_tf_amp or self.run_hparams.dtype == tf.float16:
if use_auto_loss_scaling:
if self.run_hparams.use_tf_amp or self.run_hparams.dtype == tf.float16:
if use_static_loss_scaling:
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "0"
else:
LOGGER.log("TF Loss Auto Scaling is activated - Experimental Feature")
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "1"
else:
os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = "0"
else:
use_auto_loss_scaling = False # Make sure it hasn't been set to True on FP32 training
use_static_loss_scaling = False # Make sure it hasn't been set to True on FP32 training
num_gpus = 1 if not hvd_utils.is_using_hvd() else hvd.size()
global_batch_size = batch_size * num_gpus
@ -407,7 +389,7 @@ class Runner(object):
'learning_rate_init': learning_rate_init,
'weight_decay': weight_decay,
'loss_scale': loss_scale,
'apply_loss_scaling': not use_auto_loss_scaling
'apply_loss_scaling': use_static_loss_scaling
}
image_classifier = self._get_estimator(
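For readers skimming the RN50 runner diff above, a condensed sketch of the new TF-AMP wiring: the environment-variable names are taken verbatim from the diff, while the helper function itself is illustrative.

```python
import os
import tensorflow as tf

def configure_tf_amp(dtype, use_tf_amp, use_static_loss_scaling):
    # Condensed from the runner changes above; the environment variables
    # must be set before the TensorFlow graph is built.
    if dtype == tf.float16:
        if use_tf_amp:
            raise RuntimeError("TF-AMP can not be activated for FP16 precision")
    elif use_tf_amp:
        # Enable the automatic mixed-precision graph rewrite.
        os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_GRAPH_REWRITE"] = "1"
        # Static loss scaling turns the automatic loss-scaling pass off.
        os.environ["TF_ENABLE_AUTO_MIXED_PRECISION_LOSS_SCALING"] = (
            "0" if use_static_loss_scaling else "1"
        )
```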

View file

@ -0,0 +1,19 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches ResNet50 training in FP16 on 16 GPUs using 4096 batch size (256 per GPU)
# Usage ./RN50_FP16_16GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpiexec --allow-run-as-root --bind-to socket -np 16 \
python $1/main.py --num_iter=90 --iter_unit=epoch --data_dir=$2 --batch_size=256 --use_tf_amp --results_dir=$3

View file

@ -0,0 +1,19 @@
# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches ResNet50 training in FP32 on 16 GPUs using 2048 batch size (128 per GPU)
# Usage ./RN50_FP32_16GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpiexec --allow-run-as-root --bind-to socket -np 16 \
python $1/main.py --num_iter=90 --iter_unit=epoch --data_dir=$2 --batch_size=128 --results_dir=$3

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --use_tf_amp --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_infer_fp16.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 192 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --perf_args "use_tf_amp" "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 96 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --use_tf_amp --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_train_fp16.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp16.json --data_dir $1 --perf_args "use_tf_amp" --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 192 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp16.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -2,4 +2,6 @@
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 --bs 32 64 96 --baseline ./scripts/benchmarking/baselines/DGX1V_RN50_tensorflow_train_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode inference --bench-warmup 100 --bench-iterations 200 --ngpus 1 --bs 1 2 4 8 16 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_infer_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp16.json --perf_args "use_tf_amp" --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 64 128 256 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp16.json --perf_args "use_xla" "use_tf_amp" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,7 @@
#!/bin/bash
mkdir -p /tmp/results
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp32.json --data_dir $1 --results_dir $2
python ./scripts/benchmarking/benchmark.py --mode training --bench-warmup 200 --bench-iterations 500 --ngpus 1 4 8 16 --bs 32 64 128 --baseline ./scripts/benchmarking/baselines/DGX2_RN50_tensorflow_train_fp32.json --perf_args "use_xla" --data_dir $1 --results_dir $2/xla

View file

@ -0,0 +1,51 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"16": {
"total_ips": 1300.0
},
"32": {
"total_ips": 1600.0
},
"1": {
"total_ips": 160.0
},
"2": {
"total_ips": 320.0
},
"64": {
"total_ips": 1800.0
},
"4": {
"total_ips": 550.0
},
"128": {
"total_ips": 1950.0
},
"8": {
"total_ips": 950.0
},
"256": {
"total_ips": 2050.0
}
}
},
"model": "",
"ngpus": [
1
],
"bs": [
1,
2,
4,
8,
16,
32,
64,
128,
256
]
}
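The baseline JSON files added in this commit are keyed by GPU count, then batch size, down to the metrics listed in `metric_keys`. The actual comparison logic lives in `scripts/benchmarking/benchmark.py` (not shown here); the sketch below only illustrates how the layout might be read, with a made-up tolerance:

```python
import json

def baseline_total_ips(path, ngpus, batch_size):
    # Keys are stored as strings: metrics["<ngpus>"]["<batch_size>"]["total_ips"].
    with open(path) as f:
        baseline = json.load(f)
    return baseline["metrics"][str(ngpus)][str(batch_size)]["total_ips"]

def within_baseline(measured_ips, expected_ips, tolerance=0.1):
    return measured_ips >= expected_ips * (1 - tolerance)
```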

View file

@ -0,0 +1,47 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"16": {
"total_ips": 800.0
},
"32": {
"total_ips": 920.0
},
"1": {
"total_ips": 150.0
},
"2": {
"total_ips": 270.0
},
"64": {
"total_ips": 1000.0
},
"4": {
"total_ips": 450.0
},
"128": {
"total_ips": 1075.0
},
"8": {
"total_ips": 650.0
}
}
},
"model": "",
"ngpus": [
1
],
"bs": [
1,
2,
4,
8,
16,
32,
64,
128
]
}

View file

@ -0,0 +1,63 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"64": {
"total_ips": 630.0
},
"128": {
"total_ips": 710.0
},
"256": {
"total_ips": 750.0
}
},
"4": {
"64": {
"total_ips": 2250.0
},
"128": {
"total_ips": 2600.0
},
"256": {
"total_ips": 2900.0
}
},
"8": {
"64": {
"total_ips": 4650.0
},
"128": {
"total_ips": 5500.0
},
"256": {
"total_ips": 6000.0
}
},
"16": {
"64": {
"total_ips": 9000.0
},
"128": {
"total_ips": 10500.0
},
"256": {
"total_ips": 11500.0
}
}
},
"model": "",
"ngpus": [
1,
4,
8,
16
],
"bs": [
64,
128,
256
]
}

View file

@ -0,0 +1,63 @@
{
"metric_keys": [
"total_ips"
],
"metrics": {
"1": {
"32": {
"total_ips": 300.0
},
"64": {
"total_ips": 330.0
},
"128": {
"total_ips": 350.0
}
},
"4": {
"32": {
"total_ips": 1050.0
},
"64": {
"total_ips": 1250.0
},
"128": {
"total_ips": 1350.0
}
},
"8": {
"32": {
"total_ips": 2100.0
},
"64": {
"total_ips": 2500.0
},
"128": {
"total_ips": 2700.0
}
},
"16": {
"32": {
"total_ips": 4100.0
},
"64": {
"total_ips": 5100.0
},
"128": {
"total_ips": 5500.0
}
}
},
"model": "",
"ngpus": [
1,
4,
8,
16
],
"bs": [
32,
64,
128
]
}

View file

@ -150,10 +150,10 @@ def parse_cmdline():
_add_bool_argument(
parser=p,
name="use_auto_loss_scaling",
name="use_static_loss_scaling",
default=False,
required=False,
help="Use AutoLossScaling in FP16, FP32 - Fast Math or FP32 AMP."
help="Use static loss scaling in FP16 or FP32 AMP."
)
_add_bool_argument(
@ -164,14 +164,6 @@ def parse_cmdline():
help="Enable XLA (Accelerated Linear Algebra) computation for improved performance."
)
#Enable FastMath Computation using TensorCores to speedup FP32 computation.
p.add_argument(
"--use_fast_math",
action='store_true',
required=False,
help=argparse.SUPPRESS
)
_add_bool_argument(
parser=p,
name="use_tf_amp",

View file

@ -95,8 +95,8 @@ def get_tfrecords_input_fn(filenames, batch_size, height, width, training, disto
ds = ds.apply(
tf.data.experimental.parallel_interleave(
tf.data.TFRecordDataset,
cycle_length=4,
block_length=16,
cycle_length=10,
block_length=8,
sloppy=not deterministic,
prefetch_input_elements=16
)
@ -109,7 +109,7 @@ def get_tfrecords_input_fn(filenames, batch_size, height, width, training, disto
return image_processing.preprocess_image_record(record, height, width, _NUM_CHANNELS, training)
ds = ds.cache()
if training:
ds = ds.apply(tf.data.experimental.shuffle_and_repeat(buffer_size=shuffle_buffer_size, seed=seed))
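The data-pipeline tuning above changes `parallel_interleave` to `cycle_length=10`, `block_length=8` and keeps a `cache()` ahead of `shuffle_and_repeat`. A minimal TF 1.x sketch of the resulting input pipeline; the parse function, shuffle buffer, and batching tail are placeholders, not the repository's exact code:

```python
import tensorflow as tf

def make_input_fn(filenames, batch_size, parse_fn, training,
                  shuffle_buffer_size=4096, seed=None):
    # Interleave TFRecord shards, cache, then shuffle-and-repeat for training.
    ds = tf.data.Dataset.from_tensor_slices(filenames)
    ds = ds.apply(
        tf.data.experimental.parallel_interleave(
            tf.data.TFRecordDataset,
            cycle_length=10,
            block_length=8,
            sloppy=True,
            prefetch_input_elements=16,
        )
    )
    ds = ds.cache()
    if training:
        ds = ds.apply(
            tf.data.experimental.shuffle_and_repeat(
                buffer_size=shuffle_buffer_size, seed=seed
            )
        )
    ds = ds.map(parse_fn, num_parallel_calls=16)
    ds = ds.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)
    return ds
```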

View file

@ -1,4 +1,4 @@
FROM nvcr.io/nvidia/tensorflow:19.03-py3 as base
FROM nvcr.io/nvidia/tensorflow:19.05-py3 as base
FROM base as sha

View file

@ -108,7 +108,7 @@ Moreover the script will download pre-trained RN50 checkpoint in the `<checkpoin
### 4. Launch the NGC container to run training/inference.
```
nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v <data_dir_path>:/data -v <checkpoint_dir_path>:/checkpoints --ipc=host nvidia_ssd
nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v <data_dir_path>:/data/coco2017_tfrecords -v <checkpoint_dir_path>:/checkpoints --ipc=host nvidia_ssd
```
### 5. Start training.
@ -116,6 +116,7 @@ nvidia-docker run --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=6710
The `./examples` directory provides several sample scripts for various GPU settings that act as wrappers around
the `object_detection/model_main.py` script. The example scripts accept the following arguments:
- A path to the directory for checkpoints
- A path to the directory for configs
- Additional arguments passed through to `object_detection/model_main.py`
To train on 8 GPUs with Tensor Core acceleration and save checkpoints in the `/checkpoints` directory, run:
@ -178,7 +179,7 @@ The SSD320 v1.2 model was trained on the COCO 2017 dataset. The val2017 validati
The `download_data.sh` script will preprocess the data to tfrecords format.
This repository contains the `download_dataset.sh` script which will automatically download and preprocess the training,
validation and test datasets. By default, data will be downloaded to the `/data` directory.
validation and test datasets. By default, data will be downloaded to the `/data/coco2017_tfrecords` directory.
### Training process
Training the SSD model is implemented in the `object_detection/model_main.py` script.
@ -331,6 +332,8 @@ To achieve same results, follow the [Quick start guide](#quick-start-guide) outl
March 2019
* Initial release
May 2019
* Test scripts updated
## Known issues
There are no known issues with this model.

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -172,7 +172,7 @@ train_config: {
train_input_reader: {
tf_record_input_reader {
input_path: "/data/*train*"
input_path: "/data/coco2017_tfrecords/*train*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
}
@ -185,7 +185,7 @@ eval_config: {
eval_input_reader: {
tf_record_input_reader {
input_path: "/data/*val*"
input_path: "/data/coco2017_tfrecords/*val*"
}
label_map_path: "object_detection/data/mscoco_label_map.pbtxt"
shuffle: false

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -8,8 +8,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
time python -u /workdir/models/research/object_detection/model_main.py \
time python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,9 +9,13 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "Single GPU mixed precision training performance: " && \
python -u /workdir/models/research/object_detection/model_main.py \
TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "Single GPU mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_4gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -19,8 +19,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,8 +9,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs mixed precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -20,8 +19,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_8gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,6 +9,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
mkdir -p $CKPT_DIR
time mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
@ -19,8 +21,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}" 2>&1 | tee $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP16_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -9,8 +9,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs mixed precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -20,8 +19,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs mixed precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,4 +1,4 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
export TF_ENABLE_AUTO_MIXED_PRECISION=1
@ -8,4 +8,4 @@ PYTHONPATH=$PYTHONPATH:$OBJECT_DETECTION
python $SCRIPT_DIR/SSD320_inference.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
"$@"
"${@:2}"

View file

@ -1,13 +1,13 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
TENSOR_OPS=0
export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
time python -u /workdir/models/research/object_detection/model_main.py \
time python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_1GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=1
TENSOR_OPS=0
@ -7,9 +7,13 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "Single GPU single precision training performance: " && \
python -u /workdir/models/research/object_detection/model_main.py \
TRAIN_LOG=$(python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "Single GPU single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_4gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_4gpus.config"
GPUS=4
TENSOR_OPS=0
@ -17,8 +17,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}"

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
CKPT_DIR=${1:-"/results/SSD320_FP32_4GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=4
TENSOR_OPS=0
@ -7,8 +7,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs single precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -18,8 +17,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_8gpus.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_full_8gpus.config"
GPUS=8
TENSOR_OPS=0
@ -7,6 +7,8 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
mkdir -p $CKPT_DIR
time mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
@ -17,8 +19,8 @@ time mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}"
"${@:3}" 2>&1 | tee $CKPT_DIR/train_log

View file

@ -1,5 +1,5 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_bench.config
CKPT_DIR=${1:-"/results/SSD320_FP32_8GPU"}
PIPELINE_CONFIG_PATH=${2:-"/workdir/models/research/configs"}"/ssd320_bench.config"
GPUS=8
TENSOR_OPS=0
@ -7,8 +7,7 @@ export TF_ENABLE_CUBLAS_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
export TF_ENABLE_CUDNN_RNN_TENSOR_OP_MATH_FP32=${TENSOR_OPS}
echo -n "$GPUS GPUs single precision training performance: " && \
mpirun --allow-run-as-root \
TRAIN_LOG=$(mpirun --allow-run-as-root \
-np $GPUS \
-H localhost:$GPUS \
-bind-to none \
@ -18,8 +17,13 @@ mpirun --allow-run-as-root \
-x PATH \
-mca pml ob1 \
-mca btl ^openib \
python -u /workdir/models/research/object_detection/model_main.py \
python -u ./object_detection/model_main.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
--model_dir=${CKPT_DIR} \
--alsologtostder \
"${@:2}" 2>&1 | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}'
"${@:3}" 2>&1)
PERF=$(echo "$TRAIN_LOG" | awk -v GPUS=$GPUS '/global_step\/sec/{ array[num++]=$2 } END { for (x = 3*num/4; x < num; ++x) { sum += array[x] }; print GPUS*32*4*sum/num " img/s"}')
mkdir -p $CKPT_DIR
echo "$GPUS GPUs single precision training performance: $PERF" | tee $CKPT_DIR/train_log
echo "$TRAIN_LOG" >> $CKPT_DIR/train_log

View file

@ -1,4 +1,4 @@
PIPELINE_CONFIG_PATH=/workdir/models/research/configs/ssd320_full_1gpus.config
PIPELINE_CONFIG_PATH=${1:-"/workdir/models/research/configs"}"/ssd320_full_1gpus.config"
SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")
OBJECT_DETECTION=$(realpath $SCRIPT_DIR/../object_detection/)
@ -6,4 +6,4 @@ PYTHONPATH=$PYTHONPATH:$OBJECT_DETECTION
python $SCRIPT_DIR/SSD320_inference.py \
--pipeline_config_path=${PIPELINE_CONFIG_PATH} \
"$@"
"${@:2}"

View file

@ -13,6 +13,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from absl import flags
from time import time
@ -63,7 +65,8 @@ class TimingHook(tf.train.SessionRunHook):
self.start_time = time()
def log_progress(self):
print(len(self.times) - FLAGS.warmup_iters, '/', FLAGS.benchmark_iters, ' '*10, end='\r')
if sys.stdout.isatty():
print(len(self.times) - FLAGS.warmup_iters, '/', FLAGS.benchmark_iters, ' '*10, end='\r')
def after_run(self, *args, **kwargs):
super(TimingHook, self).after_run(*args, **kwargs)
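The `TimingHook` change above only prints the progress counter when stdout is a terminal, so redirected benchmark logs stay clean. A stripped-down sketch of the same hook pattern (the class name and constructor are illustrative, not the repository's implementation):

```python
import sys
import time
import tensorflow as tf

class ProgressTimingHook(tf.train.SessionRunHook):
    def __init__(self, warmup_iters, benchmark_iters):
        self.warmup_iters = warmup_iters
        self.benchmark_iters = benchmark_iters
        self.times = []

    def before_run(self, run_context):
        self._start = time.time()

    def after_run(self, run_context, run_values):
        self.times.append(time.time() - self._start)
        if sys.stdout.isatty():
            # Overwrite the same console line instead of spamming the log.
            print(len(self.times) - self.warmup_iters, '/',
                  self.benchmark_iters, ' ' * 10, end='\r')
```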

View file

@ -1,23 +0,0 @@
TARGET_mAP=${TARGET_mAP:-0.0020408058}
TARGET_loss=${TARGET_loss:-2.2808013}
TOLERANCE=${TOLERANCE:-0.1}
PRECISION=${PRECISION:-FP16}
TRAIN_LOG=$(bash examples/SSD320_${PRECISION}_8GPU.sh /results/SSD320_${PRECISION}_8GPU --num_train_steps $((12500/27)) 2>&1 | tee /dev/tty)
mAP=$( echo $TRAIN_LOG | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
loss=$(echo $TRAIN_LOG | sed -n 's|.*Loss for final step: \(.*\)\.|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$mAP)")
loss_error=$(python -c "print(abs($TARGET_loss - $loss)/$loss)")
if [[ $mAP_error < $TOLERANCE && $loss_error < $TOLERANCE ]]
then
echo PASS
else
echo expected: mAP=$TARGET_mAP loss=$TARGET_loss
echo got: mAP=$mAP loss=$loss
echo FAIL
exit 1
fi

View file

@ -1 +0,0 @@
PRECISION=FP16 bash qa/testing_DGX1V_8GPU_1epoch.sh

View file

@ -1 +0,0 @@
PRECISION=FP32 bash qa/testing_DGX1V_8GPU_1epoch.sh

View file

@ -0,0 +1,26 @@
TARGET_mAP=${TARGET_mAP:-0.137}
TARGET_loss=${TARGET_loss:-2.3}
TOLERANCE=${TOLERANCE:-0.1}
PRECISION=${PRECISION:-FP16}
bash ../../examples/SSD320_${PRECISION}_8GPU_BENCHMARK.sh /results/SSD320_${PRECISION}_8GPU ../../configs
mAP=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
loss=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*Loss for final step: \(.*\)\.|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$mAP)")
loss_error=$(python -c "print(abs($TARGET_loss - $loss)/$loss)")
cat /results/SSD320_${PRECISION}_8GPU/train_log
echo expected: mAP=$TARGET_mAP loss=$TARGET_loss
echo got: mAP=$mAP loss=$loss
if [[ -n $mAP_error && $mAP_error < $TOLERANCE && -n $loss_error && $loss_error < $TOLERANCE ]]
then
echo PASS
else
echo FAIL
exit 1
fi
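The accuracy test above scrapes the training log with `sed` for the last reported mAP and final-step loss, then compares relative errors against the tolerance. A rough Python rendering of the same checks (the regexes mirror the `sed` patterns; not part of the repository):

```python
import re

def check_accuracy(train_log, target_map, target_loss, tolerance=0.1):
    maps = re.findall(r"DetectionBoxes_Precision/mAP = ([^,]+),", train_log)
    losses = re.findall(r"Loss for final step: (.*)\.", train_log)
    if not maps or not losses:
        return False  # metrics missing from the log
    map_err = abs(target_map - float(maps[-1])) / float(maps[-1])
    loss_err = abs(target_loss - float(losses[-1])) / float(losses[-1])
    return map_err < tolerance and loss_err < tolerance
```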

View file

@ -0,0 +1 @@
PRECISION=FP16 bash ../../qa/testing_DGX1V_accuracy.sh

View file

@ -0,0 +1 @@
PRECISION=FP32 bash ../../qa/testing_DGX1V_accuracy.sh

View file

@ -0,0 +1,20 @@
TARGET_mAP=${TARGET_mAP:-0.281}
TOLERANCE=${TOLERANCE:-0.04}
PRECISION=${PRECISION:-FP16}
bash ../../examples/SSD320_${PRECISION}_8GPU.sh /results/SSD320_${PRECISION}_8GPU ../../configs
mAP=$(cat /results/SSD320_${PRECISION}_8GPU/train_log | sed -n 's|.*DetectionBoxes_Precision/mAP = \([^,]*\),.*|\1|p' | tail -n1)
mAP_error=$( python -c "print(abs($TARGET_mAP - $mAP)/$TARGET_mAP)")
echo expected: mAP=$TARGET_mAP
echo got: mAP=$mAP
if [[ $mAP_error < $TOLERANCE ]]
then
echo PASS
else
echo FAIL
exit 1
fi

View file

@ -0,0 +1 @@
PRECISION=FP16 bash ../../qa/testing_DGX1V_convergence.sh

View file

@ -0,0 +1 @@
PRECISION=FP32 bash ../../qa/testing_DGX1V_convergence.sh

View file

@ -1,11 +1,20 @@
#!/bin/bash
BASELINES=(193.6 135.2 171.5 188.3 187 187.6 191.4)
BASELINES=(93.6 136.3 172.1 190.8 188.2 189.4 192.2)
TOLERANCE=0.07
PRECISION=FP16
for i in `seq 0 6`
do
echo "Testing mixed precision inference speed on batch size = $((2 ** $i))"
bash examples/SSD320_FP16_inference.sh --batch_size $((2 ** $i)) > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $3}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($3 - BASELINE)^2)/$3}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
BS=$((2 ** $i))
MSG="Testing mixed precision inference speed on batch size = $BS"
CMD="bash ../../examples/SSD320_${PRECISION}_inference.sh ../../configs --batch_size $BS"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
return $result

View file

@ -1,11 +1,20 @@
#!/bin/bash
BASELINES=(97.4 134 163 175.1 175.4 174.5 177.6)
BASELINES=(93.2 136.2 171.2 189.4 188.0 188.7 192.5)
PRECISION=FP32
TOLERANCE=0.07
for i in `seq 0 6`
do
echo "Testing single precision inference speed on batch size = $((2 ** $i))"
bash examples/SSD320_FP16_inference.sh --batch_size $((2 ** $i)) > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $3}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($3 - BASELINE)^2)/$3}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
BS=$((2 ** $i))
MSG="Testing single precision inference speed on batch size = $BS"
CMD="bash ../../examples/SSD320_${PRECISION}_inference.sh ../../configs --batch_size $BS"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
return $result

View file

@ -0,0 +1,24 @@
if [[ -z $CMD || -z $BASELINE || -z $TOLERANCE ]]
then
echo some variables are not set
exit 1
fi
echo $MSG
RESULT=$($CMD)
imgps=$(echo $RESULT | tail -n 1 | awk '{print $3}')
LB_imgps=$(python -c "print($BASELINE * (1-$TOLERANCE))")
echo imgs/s: $imgps expected imgs/s: $BASELINE
echo accepted minimum: $LB_imgps
if [[ $imgps > $LB_imgps ]]
then
echo PASSED
else
echo $RESULT
echo FAILED
exit 1
fi
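The shared helper above encodes a single pass/fail rule: the measured img/s (third field of the command's last output line) must exceed `BASELINE * (1 - TOLERANCE)`. A small sketch of the same rule, for illustration only:

```python
def meets_baseline(result_text, baseline, tolerance):
    # Third whitespace-separated field of the last output line is img/s.
    imgps = float(result_text.strip().splitlines()[-1].split()[2])
    return imgps > baseline * (1 - tolerance)
```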

View file

@ -1,13 +1,21 @@
#!/bin/bash
BASELINES=(125 430 750)
BASELINES=(120 480 800)
GPUS=(1 4 8)
PRECISION=FP16
TOLERANCE=0.11
i=0
for GPUS in 1 4 8
for i in {1..4}
do
echo "Testing mixed precision training speed on $GPUS GPUs"
bash examples/SSD320_FP16_${GPUS}_BENCHMARK.sh > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $7}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($7 - BASELINE)^2)/$7}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
i=$(($i + 1))
GPU=${GPUS[$i]}
MSG="Testing mixed precision training speed on $GPUS GPUs"
CMD="bash ../../examples/SSD320_FP16_${GPU}GPU_BENCHMARK.sh /results/SSD320_FP16_${GPU}GPU ../../configs"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done
exit $result

View file

@ -1,13 +1,19 @@
#!/bin/bash
BASELINES=(87 330 569)
GPUS=(1 4 8)
PRECISION=FP32
TOLERANCE=0.11
i=0
for GPUS in 1 4 8
for i in {1..4}
do
echo "Testing single precision training speed on $GPUS GPUs"
bash examples/SSD320_FP32_${GPUS}_BENCHMARK.sh > tmp 2> /dev/null
echo -n "img/s: "; tail -n 1 tmp | awk '{print $7}'; echo "expected img/s: ${BASELINES[$i]}"; echo -n "relative error: "; err=`tail -n 1 tmp | awk -v BASELINE=${BASELINES[$i]} '{print sqrt(($7 - BASELINE)^2)/$7}'`; echo $err
rm tmp
if [[ $err > 0.1 ]]; then echo "FAILED" && exit 1; else echo "PASSED"; fi
i=$(($i + 1))
GPU=${GPUS[$i]}
MSG="Testing mixed precision training speed on $GPUS GPUs"
CMD="bash ../../examples/SSD320_FP16_${GPU}GPU_BENCHMARK.sh /results/SSD320_FP16_${GPU}GPU ../../configs"
if CMD=$CMD BASELINE=${BASELINES[$i]} TOLERANCE=$TOLERANCE MSG=$MSG bash ../../qa/testing_DGX1V_performance.sh
then
exit $?
fi
done

View file

@ -18,7 +18,7 @@ then
download_1m
elif [[ ${DATASET_NAME} == "ml-20m" ]]
then
download_20m
download_20m
else
echo "Unsupported dataset name: $DATASET_NAME"
exit 1

View file

@ -14,7 +14,7 @@ set -e
DATASET_NAME=${1:-'ml-20m'}
RAW_DATADIR='/data'
CACHED_DATADIR='/data/cache/'${DATASET_NAME}
CACHED_DATADIR='/tmp/cache/'${DATASET_NAME}
# you can add another option to this case in order to support other datasets
case ${DATASET_NAME} in

View file

@ -16,7 +16,7 @@
#
# ==============================================================================
FROM nvcr.io/nvidia/tensorflow:19.03-py3
FROM nvcr.io/nvidia/tensorflow:19.05-py3
LABEL version="1.0" maintainer="Jonathan DEKHTIAR <jonathan.dekhtiar@nvidia.com>"

View file

@ -41,6 +41,7 @@ if __name__ == "__main__":
RUNNING_CONFIG = tf.contrib.training.HParams(
exec_mode=FLAGS.exec_mode,
save_eval_results_to_json=FLAGS.save_eval_results_to_json,
# ======= Directory HParams ======= #
log_dir=os.path.join(FLAGS.results_dir, "logs"),
@ -158,5 +159,6 @@ if __name__ == "__main__":
num_iter=RUNNING_CONFIG.num_iter if RUNNING_CONFIG.exec_mode != "train_and_evaluate" else 1,
warmup_steps=RUNNING_CONFIG.warmup_steps,
batch_size=RUNNING_CONFIG.batch_size,
is_benchmark=RUNNING_CONFIG.exec_mode == 'inference_benchmark'
is_benchmark=RUNNING_CONFIG.exec_mode == 'inference_benchmark',
save_eval_results_to_json=RUNNING_CONFIG.save_eval_results_to_json
)

View file

@ -22,6 +22,7 @@
from __future__ import print_function
import os
import json
import multiprocessing
import operator
import random
@ -509,7 +510,7 @@ class Runner(object):
if not hvd_utils.is_using_hvd() or hvd.local_rank() == 0:
LOGGER.log('Ending Model Training ...')
def evaluate(self, iter_unit, num_iter, batch_size, warmup_steps=50, is_benchmark=False):
def evaluate(self, iter_unit, num_iter, batch_size, warmup_steps=50, is_benchmark=False, save_eval_results_to_json=False):
if iter_unit not in ["epoch", "batch"]:
raise ValueError('`iter_unit` value is unknown: %s (allowed: ["epoch", "batch"])' % iter_unit)
@ -540,7 +541,7 @@ class Runner(object):
log_every=self.run_hparams.log_every_n_steps,
warmup_steps=warmup_steps,
is_training=False,
sample_dir=None
sample_dir=self.run_hparams.sample_dir
)
]
@ -630,5 +631,31 @@ class Runner(object):
LOGGER.log('TPR', tpr)
LOGGER.log('TNR', tnr)
if save_eval_results_to_json:
results_dict = {
'IoU': {
'0.75': str(eval_results["IoU_THS_0.75"]),
'0.85': str(eval_results["IoU_THS_0.85"]),
'0.95': str(eval_results["IoU_THS_0.95"]),
'0.99': str(eval_results["IoU_THS_0.99"]),
},
'TPR': {
'0.75': str(tpr[-4]),
'0.85': str(tpr[-3]),
'0.95': str(tpr[-2]),
'0.99': str(tpr[-1]),
},
'TNR': {
'0.75': str(tnr[-4]),
'0.85': str(tnr[-3]),
'0.95': str(tnr[-2]),
'0.99': str(tnr[-1]),
}
}
with open(os.path.join(self.run_hparams.model_dir, "..", "results.json"), 'w') as f:
json.dump(results_dict, f)
except KeyboardInterrupt:
print("Keyboard interrupt")

View file

@ -17,9 +17,11 @@
# This script launches UNet training in FP32-AMP on 1 GPU using 16 batch size (16 per GPU)
# Usage ./UNet_FP32AMP_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32-AMP on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./UNet_FP32AMP_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32-AMP on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./UNet_FP32AMP_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation in FP32-AMP on 1 GPUs using 16 batch size
# Usage ./UNet_FP32AMP_EVAL.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training in FP32 on 1 GPU using 16 batch size (16 per GPU)
# Usage ./UNet_FP32_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32 on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./UNet_FP32_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training in FP32 on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./UNet_FP32_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../main.py \
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='train_and_evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation in FP32 on 1 GPUs using 16 batch size
# Usage ./UNet_FP32_EVAL.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../main.py \
pip install ${BASEDIR}/../dllogger/
python ${BASEDIR}/../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='evaluate' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation benchmark in FP32-AMP on 1 GPUs using 16 batch size
# Usage ./DGX1v_evalbench_FP32AMP.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='inference_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet evaluation benchmark in FP32 on 1 GPUs using 16 batch size
# Usage ./DGX1v_evalbench_FP32.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='inference_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training benchmark in FP32-AMP on 1 GPU using 16 batch size (16 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32-AMP on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32-AMP on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./DGX1v_trainbench_FP32AMP_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,9 +17,11 @@
# This script launches UNet training benchmark in FP32 on 1 GPU using 16 batch size (16 per GPU)
# Usage ./DGX1v_trainbench_FP32_1GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python ../../main.py \
pip install ${BASEDIR}/../../dllogger/
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32 on 4 GPUs using 16 batch size (4 per GPU)
# Usage ./DGX1v_trainbench_FP32_4GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 4 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -17,7 +17,9 @@
# This script launches UNet training benchmark in FP32 on 8 GPUs using 16 batch size (2 per GPU)
# Usage ./DGX1v_trainbench_FP32_8GPU.sh <path to result repository> <path to dataset> <dagm classID (1-10)>
pip install ../../dllogger/
BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
pip install ${BASEDIR}/../../dllogger/
mpirun \
-np 8 \
@ -29,7 +31,7 @@ mpirun \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python ../../main.py \
python ${BASEDIR}/../../main.py \
--unet_variant='tinyUNet' \
--activation_fn='relu' \
--exec_mode='training_benchmark' \

View file

@ -95,6 +95,14 @@ def parse_cmdline():
help="""Directory in which to write training logs, summaries and checkpoints."""
)
_add_bool_argument(
parser=p,
name="save_eval_results_to_json",
default=False,
required=False,
help="Whether to save evaluation results in JSON format."
)
p.add_argument('--data_dir', required=False, default=None, type=str, help="Path to dataset directory")
p.add_argument(

View file

@ -20,6 +20,7 @@
# ==============================================================================
import os
import json
import time
import operator
@ -97,7 +98,7 @@ class ProfilerHook(tf.train.SessionRunHook):
# ==================== Samples ==================== #
if self._sample_dir is not None:
if self._sample_dir is not None and self._is_training:
additional_fetches["samples"] = {}
additional_fetches["samples"]["input_image"] = tf.get_default_graph(
@ -170,7 +171,7 @@ class ProfilerHook(tf.train.SessionRunHook):
LOGGER.log("False Positives:", run_values.results["confusion_matrix"]["fp"])
LOGGER.log("False Negatives:", run_values.results["confusion_matrix"]["fn"])
if self._sample_dir is not None:
if self._sample_dir is not None and self._is_training:
for key in sorted(run_values.results["samples"].keys(), key=operator.itemgetter(0)):
@ -208,3 +209,13 @@ class ProfilerHook(tf.train.SessionRunHook):
"\t[*] Total Processing Time: %dh %02dm %02ds\n" %
(avg_processing_speed, total_processing_hours, total_processing_minutes, total_processing_seconds)
)
perf_dict = {
'throughput': str(avg_processing_speed),
'processing_time': str(total_processing_time)
}
perf_filename = "performances_%s.json" % ("train" if self._is_training else "eval")
with open(os.path.join(self._sample_dir, "..", perf_filename), 'w') as f:
json.dump(perf_dict, f)

View file

@ -0,0 +1,8 @@
.idea/
.ipynb_checkpoints
/_python_build
*.pyc
__pycache__
*.swp
/results
*.zip

View file

@ -0,0 +1,6 @@
FROM nvcr.io/nvidia/tensorflow:19.05-py3
ADD . /workspace/unet
WORKDIR /workspace/unet
RUN pip install -r requirements.txt

View file

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2019 NVIDIA Corporation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View file

@ -0,0 +1,17 @@
Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This repository includes software from:
* TensorFlow, (https://github.com/tensorflow/tensorflow) licensed
under the Apache License, Version 2.0

View file

@ -0,0 +1,423 @@
# UNet
This repository provides a script and recipe to train U-Net Medical to achieve state of the art accuracy, and is tested and maintained by NVIDIA.
## Table of contents
1. [The model](#1-the-model)
1. [Model architecture](#11-model-architecture)
2. [Default configuration](#12-default-configuration)
3. [Feature support matrix](#13-feature-support-matrix)
1. [Features](#131-features)
2. [Setup](#2-setup)
1. [Requirements](#21-requirements)
3. [Quick start guide](#3-quick-start-guide)
1. [Clone the repository](#31-clone-the-repository)
2. [Download and preprocess the dataset](#32-download-and-preprocess-the-dataset)
3. [Build the U-Net TensorFlow container](#33-build-the-u-net-tensorflow-container)
4. [Start an interactive session in the NGC container to run training/inference](#34-start-an-interactive-session-in-the-ngc-container-to-run-traininginference)
5. [Start training](#35-start-training)
6. [Start inference/predictions](#36-start-inferencepredictions)
4. [Details](#4-details)
1. [Scripts and sample code](#41-scripts-and-sample-code)
2. [Parameters](#42-parameters)
3. [Command line options](#43-command-line-options)
4. [Getting the data](#44-getting-the-data)
1. [Dataset guidelines](#441-dataset-guidelines)
5. [Training process](#45-training-process)
1. [Optimizer](#451-optimizer)
2. [Augmentation](#452-augmentation)
6. [Inference process](#46-inference-process)
5. [Mixed precision training](#5-mixed-precision-training)
1. [Enabling mixed precision](#51-enabling-mixed-precision)
6. [Benchmarking](#6-benchmarking)
1. [Training performance benchmark](#61-training-performance-benchmark)
2. [Inference performance benchmark](#62-inference-performance-benchmark)
7. [Results](#7-results)
1. [Training accuracy results](#71-training-accuracy-results)
1. [NVIDIA DGX-1 (8x V100 16G)](#711-nvidia-dgx-1-8x-v100-16g)
2. [Training performance results](#72-training-performance-results)
1. [NVIDIA DGX-1 (1x V100 16G)](#721-nvidia-dgx-1-1x-v100-16g)
2. [NVIDIA DGX-1 (8x V100 16G)](#722-nvidia-dgx-1-8x-v100-16g)
3. [Inference performance results](#73-inference-performance-results)
1. [NVIDIA DGX-1 (1x V100 16G)](#731)
8. [Changelog](#8-changelog)
9. [Known issues](#9-known-issues)
## 1. The model
The U-Net model is a convolutional neural network for 2D image segmentation. This repository contains a U-Net implementation as described in the paper [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597), without any alteration.
This model is trained with mixed precision using tensor cores on NVIDIA Volta GPUs. Therefore, researchers can get results much faster than training without Tensor Cores, while experiencing the benefits of mixed precision training (for example, up to 3.5x performance boost). This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### 1.1. Model architecture
U-Net was first introduced by Olaf Ronneberger, Philipp Fischer, and Thomas Brox in the paper: U-Net: Convolutional Networks for Biomedical Image Segmentation. U-Net allows for seamless segmentation of 2D images, with high accuracy and performance, and can be adapted to solve many different segmentation problems.
The following figure shows the construction of the U-Net model and its different components. U-Net is composed of a contractive and an expanding path that aim at building a bottleneck in its centermost part through a combination of convolution and pooling operations. After this bottleneck, the image is reconstructed through a combination of convolutions and upsampling. Skip connections are added with the goal of helping the backward flow of gradients in order to improve the training.
![UNet](images/unet.png)
### 1.2. Default configuration
U-Net consists of a contractive (left-side) and expanding (right-side) path. It repeatedly applies unpadded convolutions followed by max pooling for downsampling. Every step in the expanding path consists of an upsampling of the feature maps and a concatenation with the correspondingly cropped feature map from the contractive path.
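For illustration only, below is a minimal sketch of one contracting and one expanding step in this style, written with `tf.keras.layers`; the helper names `down_block` and `up_block` are hypothetical and are not the repository's API:
```
import tensorflow as tf

def down_block(x, filters):
    # Two unpadded 3x3 convolutions followed by 2x2 max pooling (contracting path).
    c = tf.keras.layers.Conv2D(filters, 3, activation='relu')(x)
    c = tf.keras.layers.Conv2D(filters, 3, activation='relu')(c)
    p = tf.keras.layers.MaxPooling2D(pool_size=2)(c)
    return c, p  # keep the pre-pooling features for the skip connection

def up_block(x, skip, filters):
    # Upsample, crop the skip connection to the new spatial size, concatenate, convolve.
    u = tf.keras.layers.Conv2DTranspose(filters, 2, strides=2)(x)
    crop = (int(skip.shape[1]) - int(u.shape[1])) // 2  # assumes static spatial dimensions
    skip = tf.keras.layers.Cropping2D(cropping=crop)(skip)
    u = tf.keras.layers.Concatenate()([u, skip])
    u = tf.keras.layers.Conv2D(filters, 3, activation='relu')(u)
    u = tf.keras.layers.Conv2D(filters, 3, activation='relu')(u)
    return u
```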
The following features were implemented in this model:
* Data-parallel multi-GPU training with Horovod.
* Mixed precision support with TensorFlow Automatic Mixed Precision (TF-AMP), which enables mixed precision training without any changes to the code-base by performing automatic graph rewrites and loss scaling controlled by an environmental variable.
* Tensor Core operations to maximize throughput using NVIDIA Volta GPUs.
* Static loss scaling for tensor cores (mixed precision) training.
The following performance optimizations were implemented in this model:
* XLA support (experimental). For TensorFlow, mixed precision support is available through TF-AMP (TensorFlow Automatic Mixed Precision), which requires only minimal changes to the network code to leverage Tensor Core performance.
### 1.3. Feature support matrix
The following features are supported by this model.
| **Feature** | **UNet_Medical_TF** |
|:---:|:--------:|
| Horovod Multi-GPU (NCCL) | Yes |
### 1.3.1. Features
**Horovod** - Horovod is a distributed training framework for TensorFlow, Keras, PyTorch and MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use. For more information about how to get started with Horovod, see the [Horovod: Official repository](https://github.com/horovod/horovod).
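As a rough, generic sketch of how data-parallel training is typically wired up with Horovod in TensorFlow 1.x (not the exact code used in this repository; scaling the learning rate by the number of workers is a common convention, not something prescribed here):
```
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()  # one process per GPU, each with its own rank

# Pin each process to a single GPU.
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Wrap the optimizer so gradients are averaged across workers via NCCL allreduce.
opt = tf.train.MomentumOptimizer(learning_rate=0.01 * hvd.size(), momentum=0.99)
opt = hvd.DistributedOptimizer(opt)

# Broadcast initial variables from rank 0 so all workers start from the same state.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]
```
Such a script is launched with `mpirun -np <num_gpus> ...`, which is how the multi-GPU example scripts in this repository invoke training.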
## 2. Setup
The following section lists the requirements in order to start training the U-Net model.
### 2.1. Requirements
This repository contains a `Dockerfile` which extends the TensorFlow NGC container and encapsulates some additional dependencies. Aside from these dependencies, ensure you have the following components:
* [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
* [tensorflow:19.03-py3 NGC container](https://ngc.nvidia.com/registry/nvidia-tensorflow)
* [NVIDIA Volta based GPU](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
For more information about how to get started with NGC containers, see the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning DGX Documentation:
* [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
* [Accessing And Pulling From The NGC container registry](https://docs.nvidia.com/deeplearning/dgx/user-guide/index.html#accessing_registry)
* [Running Tensorflow](https://docs.nvidia.com/deeplearning/dgx/tensorflow-release-notes/running.html#running)
## 3. Quick start guide
To train your model using mixed precision with tensor cores or using FP32, perform the following steps using the default parameters of the U-Net model on the [EM segmentation challenge dataset](http://brainiac2.mit.edu/isbi_challenge/home).
### 3.1. Clone the repository
```
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/TensorFlow/Segmentation/UNet_Medical
```
### 3.2. Download and preprocess the dataset
The U-Net script main.py operates on data from the [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge/home), the dataset originally employed in the [U-Net paper](https://arxiv.org/abs/1505.04597). Upon registration, the challenge's data is made available through the following links:
* [train-volume.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-volume.tif)
* [train-labels.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-labels.tif)
* [test-volume.tif](http://brainiac2.mit.edu/isbi_challenge/sites/default/files/test-volume.tif)
The script `download_dataset.py` is provided for data download. It is possible to select the destination folder when downloading the files by using the `--data_dir` flag. For example:
```
python download_dataset.py --data_dir ./dataset
```
Training and test data are composed of 3 multi-page `TIF` files, each containing 30 2D-images. The training and test datasets are given as stacks of 30 2D-images provided as a multi-page `TIF` that can be read using the Pillow library and NumPy (both Python packages are installed by the `Dockerfile`):
```
from PIL import Image, ImageSequence
import numpy as np
im = Image.open(path)
slices = [np.array(i) for i in ImageSequence.Iterator(im)]
```
Once the data has been downloaded using the `download_dataset.py` script, it can be used to run the training and benchmark scripts described below by pointing `main.py` to its location with the `--data_dir` flag.
**Note:** Masks are only provided for training data.
### 3.3. Build the U-Net TensorFlow container
After Docker is correctly set up, the U-Net TensorFlow container can be built with:
```
user@~/Documents/unet_medical_tf # docker build -t unet_tf .
```
### 3.4. Start an interactive session in the NGC container to run training/inference.
Run the previously built Docker container:
```
user@~/path/to/unet_medical_tf # docker run --runtime=nvidia --rm -it --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /path/to/dataset:/data unet_tf:latest bash
```
**Note:** Be sure to mount your dataset using the `-v` flag to make it available for training inside the NVIDIA Docker container.
### 3.5. Start training
To run training for a default configuration (for example 1/8 GPUs FP32/TF-AMP), run one of the scripts in the `./examples` directory, as follows:
```
bash examples/unet_{FP32, TF-AMP}_{1,8}.sh <path to main.py> <path to dataset> <path to results directory>
```
For example:
```
root@8e522945990f:/workspace/unet# bash examples/unet_FP32_1GPU.sh . /data results
```
### 3.6. Start inference/predictions
To run inference on a checkpointed model, run:
```
python main.py --data_dir /data --model_dir <path to checkpoint> --exec_mode predict
```
## 4. Details
The following sections provide greater details of the dataset, running training and inference, and the training results.
### 4.1. Scripts and sample code
In the root directory, the most important files are:
* `main.py`: Serves as the entry point to the application.
* `Dockerfile`: Container with the basic set of dependencies to run UNet
* `requirements.txt`: Set of extra requirements for running UNet
* `download_dataset.py`: Automatically downloads the dataset for training
The utils/ folder encapsulates the necessary tools to train and perform inference using UNet. Its main components are:
* `runner.py`: Implements the logic for training and inference
* `data_loader.py`: Implements the data loading and augmentation
* `hooks/profiler.py`: Collects different metrics to be used for benchmarking and testing
* `var_storage.py`: Helper functions for TF-AMP
The model/ folder contains information about the building blocks of UNet and the way they are assembled. Its contents are:
* `layers.py`: Defines the different blocks that are used to assemble UNet
* `unet.py`: Defines the model architecture using the blocks from the `layers.py` script
Other folders included in the root directory are:
* `dllogger/`: Contains the utils for logging
* `examples/`: Provides examples for training and benchmarking UNet
* `images/`: Contains a model diagram
### 4.2. Parameters
The complete list of available parameters for the `main.py` script is as follows (an example invocation is shown after the list):
* `--exec_mode`: Select the execution mode to run the model (default: train_and_predict)
* `--model_dir`: Set the output directory for information related to the model (default: result/)
* `--data_dir`: Set the input directory containing the dataset (default: None)
* `--batch_size`: Size of each minibatch per GPU (default: 1)
* `--max_steps`: Maximum number of steps (batches) for training (default: 1000)
* `--seed`: Set random seed for reproducibility (default: 0)
* `--weight_decay`: Weight decay coefficient (default: 0.0005)
* `--log_every`: Log performance every n steps (default: 100)
* `--warmup_steps`: Skip logging during the first n steps (default: 200)
* `--learning_rate`: Model's learning rate (default: 0.01)
* `--momentum`: Momentum coefficient for the model's optimizer (default: 0.99)
* `--decay_steps`: Number of steps before learning rate decay (default: 5000)
* `--decay_rate`: Decay rate for polynomial learning rate decay (default 0.95)
* `--augment`: Enable data augmentation (default: False)
* `--benchmark`: Enable performance benchmarking (default: False)
* `--use_amp`: Enable automatic mixed precision (default: False)
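For example, a hypothetical training invocation combining several of these flags (paths and values are placeholders) could look like:
```
python main.py \
  --exec_mode train_and_predict \
  --data_dir /data \
  --model_dir /results \
  --batch_size 8 \
  --max_steps 40000 \
  --use_amp \
  --augment
```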
### 4.3. Command line options
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option, for example:
```
root@ac1c9afe0a0b:/workspace/unet# python main.py --help
usage: main.py [-h]
[--exec_mode {train,train_and_predict,predict,benchmark}]
[--model_dir MODEL_DIR]
--data_dir DATA_DIR
[--batch_size BATCH_SIZE]
[--max_steps MAX_STEPS]
[--seed SEED]
[--weight_decay WEIGHT_DECAY]
[--log_every LOG_EVERY]
[--warmup_steps WARMUP_STEPS]
[--learning_rate LEARNING_RATE]
[--momentum MOMENTUM]
[--decay_steps DECAY_STEPS]
[--decay_rate DECAY_RATE]
[--augment]
[--no-augment]
[--benchmark]
[--no-benchmark]
[--use_amp]
```
### 4.4. Getting the data
The U-Net model was trained on the [EM segmentation challenge dataset](http://brainiac2.mit.edu/isbi_challenge/home). Test images provided by the organization were used to produce the resulting masks for submission.
Training and test data is comprised of three 512x512x30 `TIF` volumes (`test-volume.tif`, `train-volume.tif` and `train-labels.tif`). Files `test-volume.tif` and `train-volume.tif` contain grayscale 2D slices to be segmented. Additionally, training masks are provided in `train-labels.tif` as a 512x512x30 `TIF` volume, where each pixel has one of two classes:
* 0 indicating the presence of cellular membrane, and
* 1 corresponding to background.
The objective is to produce a set of masks that segment the data as accurately as possible. The results are expected to be submitted as a 32-bit `TIF` 3D image, with values between `0` (100% membrane certainty) and `1` (100% non-membrane certainty).
#### 4.4.1 Dataset guidelines
The process of loading, normalizing and augmenting the data contained in the dataset can be found in the `data_loader.py` script.
Initially, data is loaded from a multi-page `TIF` file and converted to 512x512x30 NumPy arrays with the use of Pillow. These NumPy arrays are fed to the model through `tf.data.Dataset.from_tensor_slices()`, in order to achieve high performance.
Intensities on the volumes are then normalized to the interval `[-1, 1]`, whereas labels are one-hot encoded for later use in the pixel-wise cross-entropy loss, becoming 512x512x30x2 tensors.
If augmentation is enabled, the following set of augmentation techniques are applied:
* Random horizontal flipping
* Random vertical flipping
* Elastic deformation through dense_image_warp
* Random rotation
* Crop to a random dimension and resize to input dimension
* Random brightness shifting
At the end, intensities are clipped to the `[-1, 1]` interval.
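A minimal sketch of this loading and preprocessing flow is shown below, assuming 8-bit grayscale inputs; the helper `load_volume` is illustrative and not part of the repository:
```
import numpy as np
import tensorflow as tf
from PIL import Image, ImageSequence

def load_volume(path):
    # Read a multi-page TIF into a (30, 512, 512) NumPy array.
    im = Image.open(path)
    return np.stack([np.array(page) for page in ImageSequence.Iterator(im)])

images = load_volume('train-volume.tif').astype(np.float32)
labels = load_volume('train-labels.tif')

# Normalize intensities from [0, 255] to [-1, 1].
images = images / 127.5 - 1.0

# One-hot encode the binary masks for the pixel-wise cross-entropy loss (..., 2).
labels = np.stack([labels == 0, labels != 0], axis=-1).astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
```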
### 4.5. Training process
#### 4.5.1. Optimizer
The model trains for 40,000 batches, with the default U-Net setup as specified in the [original paper](https://arxiv.org/abs/1505.04597):
* SGD with momentum (0.99)
* Learning rate = 0.01
This default parametrization is employed when running scripts from the ./examples directory and when running main.py without explicitly overriding these fields.
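A minimal sketch of this optimizer configuration with TensorFlow 1.x APIs follows; the decay schedule below is only a stand-in built from the documented `decay_steps`/`decay_rate` defaults, and the repository's actual polynomial-style schedule may differ:
```
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()

# Stand-in decay schedule using the documented defaults
# (learning_rate=0.01, decay_steps=5000, decay_rate=0.95).
learning_rate = tf.train.exponential_decay(
    learning_rate=0.01,
    global_step=global_step,
    decay_steps=5000,
    decay_rate=0.95,
    staircase=True)

# SGD with momentum 0.99, as specified in the original paper.
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.99)
```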
#### 4.5.2. Augmentation
During training, we perform the following augmentation techniques:
* Random flip left and right
* Random flip up and down
* Elastic deformation
* Random rotation
* Random crop and resize
* Random brightness changes
To run a pre-parameterized configuration (1 or 8 GPUs, FP32 or AMP), run one of the scripts in the `./examples` directory, for example:
```
./examples/unet_{FP32, TF-AMP}_{1, 8}GPU.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>
```
Use `-h` or `--help` to obtain a list of available options in the `main.py` script.
**Note:** When calling the `main.py` script manually, data augmentation is disabled. In order to enable data augmentation, use the `--augment` flag at the end of your invocation.
Use the `--model_dir` flag to select the location where to store the artifacts of the training.
### 4.6. Inference process
To run inference on a checkpointed model, run the command below. Note that it requires a pre-trained model checkpoint and the dataset available under the path passed to `--data_dir`.
```
python main.py --data_dir /data --model_dir <path to checkpoint> --exec_mode predict
```
This script should produce the prediction results over a set of masks which will be located in `<path to checkpoint>/eval`.
## 5. Mixed precision training
Mixed precision is the combined use of different numerical precisions in a computational method. [Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network. Since the introduction of [tensor cores](https://developer.nvidia.com/tensor-cores) in the Volta and Turing architecture, significant training speedups are experienced by switching to mixed precision -- up to 3x overall speedup on the most arithmetically intense model architectures. Using mixed precision training requires two steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Adding loss scaling to preserve small gradient values.
The ability to train deep learning networks with lower precision was introduced in the Pascal architecture and first supported in [CUDA 8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep Learning SDK.
For information about:
- How to train using mixed precision, see the [Mixed Precision Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) documentation.
- Techniques used for mixed precision training, see the [Mixed-Precision Training of Deep Neural Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) blog.
- How to access and enable AMP for TensorFlow, see [Using TF-AMP](https://docs.nvidia.com/deeplearning/dgx/tensorflow-user-guide/index.html#tfamp) from the TensorFlow User Guide.
- APEX tools for mixed precision training, see the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/).
### 5.1. Enabling mixed precision
In order to enable mixed precision training, the following environment variable must be defined with the correct value before the training starts:
```
TF_ENABLE_AUTO_MIXED_PRECISION=1
```
Exporting this variable ensures that loss scaling is performed correctly and automatically.
By supplying the `--use_amp` flag to the `main.py` script while training in FP32, the following variables are set to their correct value for mixed precision training inside the `./utils/runner.py` script:
```
if params['use_amp']:
assert params['dtype'] == tf.float32, "TF-AMP requires FP32 precision"
LOGGER.log("TF AMP is activated - Experimental Feature")
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
```
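For example, a hypothetical single-GPU mixed precision training run (paths are placeholders) could be launched as:
```
python main.py --exec_mode train_and_predict --data_dir /data --model_dir /results --use_amp
```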
## 6. Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
### 6.1. Training performance benchmark
To benchmark training, run one of the scripts in `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_{1, 8}GPU.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>`.
Each of these scripts will by default run 200 warm-up iterations and benchmark the performance during training in the next 100 iterations. To control the warm-up and benchmark lengths, use the `--warmup_steps` and `--max_steps` flags.
### 6.2. Inference performance benchmark
To benchmark inference, run one of the scripts in `./examples/unet_INFER_BENCHMARK_{FP32, TF-AMP}.sh <path/to/main.py> <path/to/dataset> <path/to/checkpoints> <batch size>`.
Each of these scripts will by default run 200 warm-up iterations and benchmark the performance during inference in the next 100 iterations. To control the warm-up and benchmark lengths, use the `--warmup_steps` and `--max_steps` flags.
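For example, a hypothetical TF-AMP inference benchmark run (paths and batch size are placeholders, and the exact script name follows the pattern above) could look like:
```
bash examples/unet_INFER_BENCHMARK_TF-AMP.sh . /data /results 8
```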
## 7. Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
### 7.1. Training accuracy results
#### 7.1.1 NVIDIA DGX-1 (8x V100 16G)
Our results were obtained by running the `./examples/unet_{FP32, TF-AMP}_{1, 8}GPU.sh` scripts in the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs.
Metrics employed by the organization are explained in detail [here](http://brainiac2.mit.edu/isbi_challenge/evaluation).
The results described below were obtained after the submission of our evaluations to the [ISBI Challenge](http://brainiac2.mit.edu/isbi_challenge) organizers.
| **Number of GPUs** | **FP32 Rand Score Thin** | **FP32 Information Score Thin** | **TF-AMP Rand Score Thin** | **TF-AMP Information Score Thin** | **Total time to train with FP16 (Hrs)** | **Total time to train with FP32 (Hrs)** |
|:---:|:--------:|:-------:|:--------:|:-------:|:--------:|:-------:|
|1 | 0.938508265 | 0.970255682 | 0.939619101 | 0.970120138 | 7.1 | 11.28 |
|8 | 0.932395087 | 0.9786346 | 0.941360867 | 0.976235311 | 0.9 | 1.41 |
### 7.2. Training performance results
#### 7.2.1 NVIDIA DGX-1 (1x V100 16G)
Our results were obtained by running the `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_1GPU.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPU while data augmentation is enabled.
| **Batch size** | **FP32 max img/s** | **TF-AMP max img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 12.37 | 21.91 | 1.77 |
| 8 | 13.81 | 29.58 | 2.14 |
| 16 | Out of memory | 30.77 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
#### 7.2.2 NVIDIA DGX-1 (8x V100 16G)
Our results were obtained by running the `./examples/unet_TRAIN_BENCHMARK_{FP32, TF-AMP}_8GPU.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPU while data augmentation is enabled.
| **Batch size per GPU** | **FP32 max img/s** | **TF-AMP max img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 89.93 | 126.66 | 1.41 |
| 8 | 105.35 | 130.66 | 1.24 |
| 16 | Out of memory | 132.78 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
### 7.3. Inference performance results
Our results were obtained by running the `./examples/unet_INFER_BENCHMARK_{FP32, TF-AMP}.sh` scripts in
the tensorflow:19.03-py3 NGC container on NVIDIA DGX-1 with 1x V100 16G GPU while data augmentation is enabled.
| **Batch size** | **FP32 img/s** | **TF-AMP img/s** | **Speedup factor** |
|:---:|:--------:|:-------:|:-------:|
| 1 | 34.27 | 62.81 | 1.83 |
| 8 | 37.09 | 79.62 | 2.14 |
| 16 | Out of memory | 83.33 | - |
To achieve these same results, follow the [Quick start guide](#3-quick-start-guide) outlined above.
## 8. Changelog
May 2019
* Initial release
## 9. Known issues
There are no known issues in this release.

View file

@ -0,0 +1,19 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from .logger import LOGGER, StdOutBackend, MLPerfBackend, JsonBackend, CompactBackend, Scope, AverageMeter, StandardMeter
from . import tags
__all__ = [LOGGER, StdOutBackend, MLPerfBackend, JsonBackend, CompactBackend, Scope, AverageMeter, StandardMeter, tags]

View file

@ -0,0 +1,60 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Common values reported
import subprocess
import xml.etree.ElementTree as ET
#TODO: print CUDA version, container version etc
def log_hardware(logger):
# TODO: asserts - what if you cannot launch those commands?
# number of CPU threads
cpu_info_command = 'cat /proc/cpuinfo'
cpu_info = subprocess.run(cpu_info_command.split(), stdout=subprocess.PIPE).stdout.split()
cpu_num_index = len(cpu_info) - cpu_info[::-1].index(b'processor') + 1
cpu_num = int(cpu_info[cpu_num_index]) + 1
# CPU name
cpu_name_begin_index = cpu_info.index(b'name')
cpu_name_end_index = cpu_info.index(b'stepping')
cpu_name = b' '.join(cpu_info[cpu_name_begin_index + 2:cpu_name_end_index]).decode('utf-8')
logger.log(key='cpu_info', value={"num": cpu_num, "name": cpu_name})
# RAM memory
ram_info_command = 'free -m -h'
ram_info = subprocess.run(ram_info_command.split(), stdout=subprocess.PIPE).stdout.split()
ram_index = ram_info.index(b'Mem:') + 1
ram = ram_info[ram_index].decode('utf-8')
logger.log(key='mem_info', value={"ram": ram})
# GPU
nvidia_smi_command = 'nvidia-smi -q -x'
nvidia_smi_output = subprocess.run(nvidia_smi_command.split(), stdout=subprocess.PIPE).stdout
nvidia_smi = ET.fromstring(nvidia_smi_output)
gpus = nvidia_smi.findall('gpu')
ver = nvidia_smi.findall('driver_version')
logger.log(key="gpu_info",
value={
"driver_version": ver[0].text,
"num": len(gpus),
"name": [g.find('product_name').text for g in gpus],
"mem": [g.find('fb_memory_usage').find('total').text for g in gpus]})
def log_args(logger, args):
logger.log(key='args', value=vars(args))

View file

@ -0,0 +1,519 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
import json
import logging
import inspect
import sys
from contextlib import contextmanager
import functools
from collections import OrderedDict
import datetime
from . import autologging
NVLOGGER_NAME = 'nv_dl_logger'
NVLOGGER_VERSION = '0.3.1'
NVLOGGER_TOKEN = ':::NVLOG'
MLPERF_NAME = 'mlperf_logger'
MLPERF_VERSION = '0.5.0'
MLPERF_TOKEN = ':::MLP'
COMPACT_NAME = 'compact_logger'
DEFAULT_JSON_FILENAME = 'nvlog.json'
class Scope:
RUN = 0
EPOCH = 1
TRAIN_ITER = 2
class Level:
CRITICAL = 5
ERROR = 4
WARNING = 3
INFO = 2
DEBUG = 1
_data = OrderedDict([
('model', None),
('epoch', -1),
('iteration', -1),
('total_iteration', -1),
('metrics', OrderedDict()),
('timed_blocks', OrderedDict()),
('current_scope', Scope.RUN)
])
def get_caller(root_dir=None):
stack_files = [s.filename.split('/')[-1] for s in inspect.stack()]
stack_index = 0
while stack_index < len(stack_files) and stack_files[stack_index] != 'logger.py':
stack_index += 1
while (stack_index < len(stack_files) and
stack_files[stack_index] in ['logger.py', 'autologging.py', 'contextlib.py']):
stack_index += 1
caller = inspect.stack()[stack_index]
return "%s:%d" % (stack_files[stack_index], caller.lineno)
class StandardMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.value = None
def record(self, value):
self.value = value
def get_value(self):
return self.value
def get_last(self):
return self.value
class AverageMeter(object):
def __init__(self):
self.reset()
def reset(self):
self.count = 0
self.value = 0
self.last = 0
def record(self, value, n = 1):
self.last = value
self.count += n
self.value += value * n
def get_value(self):
return self.value / self.count
def get_last(self):
return self.last
class JsonBackend(object):
def __init__(self, log_file=DEFAULT_JSON_FILENAME, logging_scope=Scope.TRAIN_ITER,
iteration_interval=1):
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.json_log = OrderedDict([
('run', OrderedDict()),
('epoch', OrderedDict()),
('iter', OrderedDict()),
('event', OrderedDict()),
])
self.json_log['epoch']['x'] = []
if self.logging_scope == Scope.TRAIN_ITER:
self.json_log['iter']['x'] = [[]]
def register_metric(self, key, metric_scope):
if (metric_scope == Scope.TRAIN_ITER and
self.logging_scope == Scope.TRAIN_ITER):
if not key in self.json_log['iter'].keys():
self.json_log['iter'][key] = [[]]
if metric_scope == Scope.EPOCH:
if not key in self.json_log['epoch'].keys():
self.json_log['epoch'][key] = []
def log(self, key, value):
if _data['current_scope'] == Scope.RUN:
self.json_log['run'][key] = value
elif _data['current_scope'] == Scope.EPOCH:
pass
elif _data['current_scope'] == Scope.TRAIN_ITER:
pass
else:
raise ValueError('log function for scope "', _data['current_scope'],
'" not implemented')
def log_event(self, key, value):
if not key in self.json_log['event'].keys():
self.json_log['event'][key] = []
entry = OrderedDict()
entry['epoch'] = _data['epoch']
entry['iter'] = _data['iteration']
entry['timestamp'] = time.time()
if value:
entry['value'] = value
self.json_log['event'][key].append(str(entry))
def log_iteration_summary(self):
if (self.logging_scope == Scope.TRAIN_ITER and
_data['total_iteration'] % self.iteration_interval == 0):
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
self.json_log['iter'][key][-1].append(str(m.get_last()))
# log x for iteration number
self.json_log['iter']['x'][-1].append(_data['iteration'])
def dump_json(self):
if self.log_file is None:
print(json.dumps(self.json_log, indent=4))
else:
with open(self.log_file, 'w') as f:
json.dump(self.json_log, fp=f, indent=4)
def log_epoch_summary(self):
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.EPOCH:
self.json_log['epoch'][key].append(str(m.get_value()))
elif (m.metric_scope == Scope.TRAIN_ITER and
self.logging_scope == Scope.TRAIN_ITER):
# create new sublists for each iter metric in the next epoch
self.json_log['iter'][key].append([])
# log x for epoch number
self.json_log['epoch']['x'].append(_data['epoch'])
# create new sublist for iter's x in the next epoch
if self.logging_scope == Scope.TRAIN_ITER:
self.json_log['iter']['x'].append([])
self.dump_json()
def timed_block_start(self, name):
pass
def timed_block_stop(self, name):
pass
def finish(self):
self.dump_json()
class _ParentStdOutBackend(object):
def __init__(self, name, token, version, log_file, logging_scope, iteration_interval):
self.root_dir = None
self.worker = [0]
self.prefix = ''
self.name = name
self.token = token
self.version = version
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.logger = logging.getLogger(self.name)
self.logger.setLevel(logging.DEBUG)
self.logger.handlers = []
if (self.log_file is None):
self.stream_handler = logging.StreamHandler(stream=sys.stdout)
self.stream_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.stream_handler)
else:
self.file_handler = logging.FileHandler(self.log_file, mode='w')
self.file_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.file_handler)
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
pass
def log_epoch_summary(self):
pass
def log_iteration_summary(self):
pass
def log(self, key, value):
if _data['current_scope'] > self.logging_scope:
pass
elif (_data['current_scope'] == Scope.TRAIN_ITER and
_data['total_iteration'] % self.iteration_interval != 0):
pass
else:
self.log_stdout(key, value)
def log_event(self, key, value):
self.log_stdout(key, value)
def log_stdout(self, key, value=None, forced=False):
# TODO: worker 0
# only the 0-worker will log
#if not forced and self.worker != 0:
# pass
if value is None:
msg = key
else:
str_json = json.dumps(str(value))
msg = '{key}: {value}'.format(key=key, value=str_json)
call_site = get_caller(root_dir=self.root_dir)
now = time.time()
message = '{prefix}{token}v{ver} {model} {secs:.9f} ({call_site}) {msg}'.format(
prefix=self.prefix, token=self.token, ver=self.version, secs=now,
model=_data['model'],
call_site=call_site, msg=msg)
self.logger.debug(message)
def timed_block_start(self, name):
self.log_stdout(key=name + "_start")
def timed_block_stop(self, name):
self.log_stdout(key=name + "_stop")
def finish(self):
pass
class StdOutBackend(_ParentStdOutBackend):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
_ParentStdOutBackend.__init__(self, name=NVLOGGER_NAME, token=NVLOGGER_TOKEN,
version=NVLOGGER_VERSION, log_file=log_file, logging_scope=logging_scope,
iteration_interval=iteration_interval)
class MLPerfBackend(_ParentStdOutBackend):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
_ParentStdOutBackend.__init__(self, name=MLPERF_NAME, token=MLPERF_TOKEN,
version=MLPERF_VERSION, log_file=log_file, logging_scope=logging_scope,
iteration_interval=iteration_interval)
class CompactBackend(object):
def __init__(self, log_file=None, logging_scope=Scope.TRAIN_ITER, iteration_interval=1):
self.log_file = log_file
self.logging_scope = logging_scope
self.iteration_interval = iteration_interval
self.logger = logging.getLogger(COMPACT_NAME)
self.logger.setLevel(logging.DEBUG)
self.logger.handlers = []
if (self.log_file is None):
self.stream_handler = logging.StreamHandler(stream=sys.stdout)
self.stream_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.stream_handler)
else:
self.file_handler = logging.FileHandler(self.log_file, mode='w')
self.file_handler.setLevel(logging.DEBUG)
self.logger.addHandler(self.file_handler)
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
pass
def timestamp_prefix(self):
return datetime.datetime.now().strftime('[%Y-%m-%d %H:%M:%S]')
def log(self, key, value):
if _data['current_scope'] == Scope.RUN:
self.log_event(key, value)
def log_event(self, key, value):
msg = self.timestamp_prefix() + ' ' + str(key)
if value is not None:
msg += ": " + str(value)
self.logger.debug(msg)
def log_epoch_summary(self):
if self.logging_scope >= Scope.EPOCH:
summary = self.timestamp_prefix() + ' Epoch {:<4} '.format(str(_data['epoch']) + ':')
for key, m in _data['metrics'].items():
if m.metric_scope >= Scope.EPOCH:
summary += str(key) + ": " + str(m.get_value()) + ", "
self.logger.debug(summary)
def log_iteration_summary(self):
if self.logging_scope >= Scope.TRAIN_ITER and _data['total_iteration'] % self.iteration_interval == 0:
summary = self.timestamp_prefix() + ' Iter {:<5} '.format(str(_data['iteration']) + ':')
for key, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
summary += str(key) + ": " + str(m.get_last()) + ", "
self.logger.debug(summary)
def timed_block_start(self, name):
pass
def timed_block_stop(self, name):
pass
def finish(self):
pass
class _Logger(object):
def __init__(self):
self.backends = [
CompactBackend(),
JsonBackend()
]
self.level = Level.INFO
def set_model_name(self, name):
_data['model'] = name
def set_backends(self, backends):
self.backends = backends
def register_metric(self, key, meter=None, metric_scope=Scope.EPOCH):
if meter is None:
meter = StandardMeter()
#TODO: move to argument of Meter?
meter.metric_scope = metric_scope
_data['metrics'][key] = meter
for b in self.backends:
b.register_metric(key, metric_scope=metric_scope)
def log(self, key, value=None, forced=False, level=Level.INFO):
if level < self.level:
return
if _data['current_scope'] == Scope.TRAIN_ITER or _data['current_scope'] == Scope.EPOCH:
if key in _data['metrics'].keys():
if _data['metrics'][key].metric_scope == _data['current_scope']:
_data['metrics'][key].record(value)
for b in self.backends:
b.log(key, value)
def debug(self, *args, **kwargs):
self.log(*args, level=Level.DEBUG, **kwargs)
def info(self, *args, **kwargs):
self.log(*args, level=Level.INFO, **kwargs)
def warning(self, *args, **kwargs):
self.log(*args, level=Level.WARNING, **kwargs)
def error(self, *args, **kwargs):
self.log(*args, level=Level.ERROR, **kwargs)
def critical(self, *args, **kwargs):
self.log(*args, level=Level.CRITICAL, **kwargs)
def log_event(self, key, value=None):
for b in self.backends:
b.log_event(key, value)
def timed_block_start(self, name):
if not name in _data['timed_blocks']:
_data['timed_blocks'][name] = OrderedDict()
_data['timed_blocks'][name]['start'] = time.time()
for b in self.backends:
b.timed_block_start(name)
def timed_block_stop(self, name):
if not name in _data['timed_blocks']:
raise ValueError('timed_block_stop called before timed_block_start for ' + name)
_data['timed_blocks'][name]['stop'] = time.time()
delta = _data['timed_blocks'][name]['stop'] - _data['timed_blocks'][name]['start']
self.log(name + '_time', delta)
for b in self.backends:
b.timed_block_stop(name)
def iteration_start(self):
_data['current_scope'] = Scope.TRAIN_ITER
_data['iteration'] += 1
_data['total_iteration'] += 1
def iteration_stop(self):
for b in self.backends:
b.log_iteration_summary()
_data['current_scope'] = Scope.EPOCH
def epoch_start(self):
_data['current_scope'] = Scope.EPOCH
_data['epoch'] += 1
_data['iteration'] = -1
for n, m in _data['metrics'].items():
if m.metric_scope == Scope.TRAIN_ITER:
m.reset()
def epoch_stop(self):
for b in self.backends:
b.log_epoch_summary()
_data['current_scope'] = Scope.RUN
def finish(self):
for b in self.backends:
b.finish()
def iteration_generator_wrapper(self, gen):
for g in gen:
self.iteration_start()
yield g
self.iteration_stop()
def epoch_generator_wrapper(self, gen):
for g in gen:
self.epoch_start()
yield g
self.epoch_stop()
@contextmanager
def timed_block(self, prefix, value=None, forced=False):
""" This function helps with timed blocks
----
Parameters:
prefix - one of items from TIMED_BLOCKS; the action to be timed
logger - NVLogger object
forced - if True then the events are always logged (even if it should be skipped)
"""
self.timed_block_start(prefix)
yield self
self.timed_block_stop(prefix)
def log_hardware(self):
autologging.log_hardware(self)
def log_args(self, args):
autologging.log_args(self, args)
def timed_function(self, prefix, variable=None, forced=False):
""" This decorator helps with timed functions
----
Parameters:
prefix - one of items from TIME_BLOCK; the action to be timed
logger - NVLogger object
forced - if True then the events are always logged (even if it should be skipped)
"""
def timed_function_decorator(func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
value = kwargs.get(variable, next(iter(args), None))
with self.timed_block(prefix=prefix, value=value, forced=forced):
return func(*args, **kwargs)
return wrapper
return timed_function_decorator
LOGGER = _Logger()
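# A minimal usage sketch, not part of the original module: it exercises the API defined
# above with a dummy training loop. The metric names, loop lengths and values below are
# illustrative assumptions only; call _example_usage() to see the default Compact and
# JSON backends in action.
def _example_usage():
    LOGGER.set_model_name('example_model')
    LOGGER.register_metric('train_loss', metric_scope=Scope.TRAIN_ITER)
    LOGGER.register_metric('top1', metric_scope=Scope.EPOCH)
    for _ in LOGGER.epoch_generator_wrapper(range(2)):           # two dummy epochs
        for _ in LOGGER.iteration_generator_wrapper(range(4)):   # four dummy iterations
            with LOGGER.timed_block('train_iteration'):
                LOGGER.log('train_loss', 0.5)                    # iteration-scope metric
        LOGGER.log('top1', 75.0)                                 # epoch-scope metric
    LOGGER.finish()                                              # dumps the accumulated JSON log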

View file

@ -0,0 +1,255 @@
# Copyright 2018 MLBenchmark Group. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Common values reported
VALUE_EPOCH = "epoch"
VALUE_ITERATION = "iteration"
VALUE_ACCURACY = "accuracy"
VALUE_BLEU = "bleu"
VALUE_TOP1 = "top1"
VALUE_TOP5 = "top5"
VALUE_BBOX_MAP = "bbox_map"
VALUE_MASK_MAP = "mask_map"
VALUE_BCE = "binary_cross_entropy"
# Timed blocks (used with timed_function & timed_block).
# For each of these, *_start and *_stop tags should be defined.
RUN_BLOCK = "run"
SETUP_BLOCK = "setup"
PREPROC_BLOCK = "preproc"
TRAIN_BLOCK = "train"
TRAIN_PREPROC_BLOCK = "train_preproc"
TRAIN_EPOCH_BLOCK = "train_epoch"
TRAIN_EPOCH_PREPROC_BLOCK = "train_epoch_preproc"
TRAIN_CHECKPOINT_BLOCK = "train_checkpoint"
TRAIN_ITER_BLOCK = "train_iteration"
EVAL_BLOCK = "eval"
EVAL_ITER_BLOCK = "eval_iteration"
#TODO: to remove?
TIMED_BLOCKS = {
RUN_BLOCK,
SETUP_BLOCK,
PREPROC_BLOCK,
TRAIN_BLOCK,
TRAIN_PREPROC_BLOCK,
TRAIN_EPOCH_BLOCK,
TRAIN_EPOCH_PREPROC_BLOCK,
TRAIN_CHECKPOINT_BLOCK,
TRAIN_ITER_BLOCK,
EVAL_BLOCK,
EVAL_ITER_BLOCK,
}
# Events
RUN_INIT = "run_init"
SETUP_START = "setup_start"
SETUP_STOP = "setup_stop"
PREPROC_START = "preproc_start"
PREPROC_STOP = "preproc_stop"
RUN_START = "run_start"
RUN_STOP = "run_stop"
RUN_FINAL = "run_final"
TRAIN_CHECKPOINT_START = "train_checkpoint_start"
TRAIN_CHECKPOINT_STOP = "train_checkpoint_stop"
TRAIN_PREPROC_START = "train_preproc_start"
TRAIN_PREPROC_STOP = "train_preproc_stop"
TRAIN_EPOCH_PREPROC_START = "train_epoch_preproc_start"
TRAIN_EPOCH_PREPROC_STOP = "train_epoch_preproc_stop"
TRAIN_ITER_START = "train_iter_start"
TRAIN_ITER_STOP = "train_iter_stop"
TRAIN_EPOCH_START = "train_epoch_start"
TRAIN_EPOCH_STOP = "train_epoch_stop"
# MLPerf specific tags
RUN_CLEAR_CACHES = "run_clear_caches"
PREPROC_NUM_TRAIN_EXAMPLES = "preproc_num_train_examples"
PREPROC_NUM_EVAL_EXAMPLES = "preproc_num_eval_examples"
PREPROC_TOKENIZE_TRAINING = "preproc_tokenize_training"
PREPROC_TOKENIZE_EVAL = "preproc_tokenize_eval"
PREPROC_VOCAB_SIZE = "preproc_vocab_size"
RUN_SET_RANDOM_SEED = "run_set_random_seed"
INPUT_SIZE = "input_size"
INPUT_BATCH_SIZE = "input_batch_size"
INPUT_ORDER = "input_order"
INPUT_SHARD = "input_shard"
INPUT_BN_SPAN = "input_bn_span"
INPUT_CENTRAL_CROP = "input_central_crop"
INPUT_CROP_USES_BBOXES = "input_crop_uses_bboxes"
INPUT_DISTORTED_CROP_MIN_OBJ_COV = "input_distorted_crop_min_object_covered"
INPUT_DISTORTED_CROP_RATIO_RANGE = "input_distorted_crop_aspect_ratio_range"
INPUT_DISTORTED_CROP_AREA_RANGE = "input_distorted_crop_area_range"
INPUT_DISTORTED_CROP_MAX_ATTEMPTS = "input_distorted_crop_max_attempts"
INPUT_MEAN_SUBTRACTION = "input_mean_subtraction"
INPUT_RANDOM_FLIP = "input_random_flip"
INPUT_RESIZE = "input_resize"
INPUT_RESIZE_ASPECT_PRESERVING = "input_resize_aspect_preserving"
# Opt
OPT_NAME = "opt_name"
OPT_LR = "opt_learning_rate"
OPT_MOMENTUM = "opt_momentum"
OPT_WEIGHT_DECAY = "opt_weight_decay"
OPT_HP_ADAM_BETA1 = "opt_hp_Adam_beta1"
OPT_HP_ADAM_BETA2 = "opt_hp_Adam_beta2"
OPT_HP_ADAM_EPSILON = "opt_hp_Adam_epsilon"
OPT_LR_WARMUP_STEPS = "opt_learning_rate_warmup_steps"
# Train
TRAIN_LOOP = "train_loop"
TRAIN_EPOCH = "train_epoch"
TRAIN_CHECKPOINT = "train_checkpoint"
TRAIN_LOSS = "train_loss"
TRAIN_ITERATION_LOSS = "train_iteration_loss"
# Eval
EVAL_START = "eval_start"
EVAL_SIZE = "eval_size"
EVAL_TARGET = "eval_target"
EVAL_ACCURACY = "eval_accuracy"
EVAL_STOP = "eval_stop"
# Perf
PERF_IT_PER_SEC = "perf_it_per_sec"
PERF_TIME_TO_TRAIN = "time_to_train"
EVAL_ITERATION_ACCURACY = "eval_iteration_accuracy"
# Model
MODEL_HP_LOSS_FN = "model_hp_loss_fn"
MODEL_HP_INITIAL_SHAPE = "model_hp_initial_shape"
MODEL_HP_FINAL_SHAPE = "model_hp_final_shape"
MODEL_L2_REGULARIZATION = "model_l2_regularization"
MODEL_EXCLUDE_BN_FROM_L2 = "model_exclude_bn_from_l2"
MODEL_HP_RELU = "model_hp_relu"
MODEL_HP_CONV2D_FIXED_PADDING = "model_hp_conv2d_fixed_padding"
MODEL_HP_BATCH_NORM = "model_hp_batch_norm"
MODEL_HP_DENSE = "model_hp_dense"
# GNMT specific
MODEL_HP_LOSS_SMOOTHING = "model_hp_loss_smoothing"
MODEL_HP_NUM_LAYERS = "model_hp_num_layers"
MODEL_HP_HIDDEN_SIZE = "model_hp_hidden_size"
MODEL_HP_DROPOUT = "model_hp_dropout"
EVAL_HP_BEAM_SIZE = "eval_hp_beam_size"
TRAIN_HP_MAX_SEQ_LEN = "train_hp_max_sequence_length"
EVAL_HP_MAX_SEQ_LEN = "eval_hp_max_sequence_length"
EVAL_HP_LEN_NORM_CONST = "eval_hp_length_normalization_constant"
EVAL_HP_LEN_NORM_FACTOR = "eval_hp_length_normalization_factor"
EVAL_HP_COV_PENALTY_FACTOR = "eval_hp_coverage_penalty_factor"
# NCF specific
PREPROC_HP_MIN_RATINGS = "preproc_hp_min_ratings"
PREPROC_HP_NUM_EVAL = "preproc_hp_num_eval"
PREPROC_HP_SAMPLE_EVAL_REPLACEMENT = "preproc_hp_sample_eval_replacement"
INPUT_HP_NUM_NEG = "input_hp_num_neg"
INPUT_HP_SAMPLE_TRAIN_REPLACEMENT = "input_hp_sample_train_replacement"
INPUT_STEP_TRAIN_NEG_GEN = "input_step_train_neg_gen"
INPUT_STEP_EVAL_NEG_GEN = "input_step_eval_neg_gen"
EVAL_HP_NUM_USERS = "eval_hp_num_users"
EVAL_HP_NUM_NEG = "eval_hp_num_neg"
MODEL_HP_MF_DIM = "model_hp_mf_dim"
MODEL_HP_MLP_LAYER_SIZES = "model_hp_mlp_layer_sizes"
# RESNET specific
EVAL_EPOCH_OFFSET = "eval_offset"
MODEL_HP_INITIAL_MAX_POOL = "model_hp_initial_max_pool"
MODEL_HP_BEGIN_BLOCK = "model_hp_begin_block"
MODEL_HP_END_BLOCK = "model_hp_end_block"
MODEL_HP_BLOCK_TYPE = "model_hp_block_type"
MODEL_HP_PROJECTION_SHORTCUT = "model_hp_projection_shortcut"
MODEL_HP_SHORTCUT_ADD = "model_hp_shorcut_add"
MODEL_HP_RESNET_TOPOLOGY = "model_hp_resnet_topology"
# Transformer specific
INPUT_MAX_LENGTH = "input_max_length"
MODEL_HP_INITIALIZER_GAIN = "model_hp_initializer_gain"
MODEL_HP_VOCAB_SIZE = "model_hp_vocab_size"
MODEL_HP_NUM_HIDDEN_LAYERS = "model_hp_hidden_layers"
MODEL_HP_EMBEDDING_SHARED_WEIGHTS = "model_hp_embedding_shared_weights"
MODEL_HP_ATTENTION_DENSE = "model_hp_attention_dense"
MODEL_HP_ATTENTION_DROPOUT = "model_hp_attention_dropout"
MODEL_HP_FFN_OUTPUT_DENSE = "model_hp_ffn_output_dense"
MODEL_HP_FFN_FILTER_DENSE = "model_hp_ffn_filter_dense"
MODEL_HP_RELU_DROPOUT = "model_hp_relu_dropout"
MODEL_HP_LAYER_POSTPROCESS_DROPOUT = "model_hp_layer_postprocess_dropout"
MODEL_HP_NORM = "model_hp_norm"
MODEL_HP_SEQ_BEAM_SEARCH = "model_hp_sequence_beam_search"
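# Hedged illustration, not part of the original file: these string constants are the keys
# passed to the logger's log/log_event/timed_block calls. The import names below are
# assumptions made for the sketch only; the real module paths may differ.
#
#   from logger import LOGGER   # assumed import path for the logger module above
#   import tags                 # assumed module name for this file
#
#   LOGGER.log_event(tags.RUN_INIT)
#   LOGGER.log(tags.OPT_LR, 0.1)                      # run-scope value
#   with LOGGER.timed_block(tags.TRAIN_EPOCH_BLOCK):  # logs train_epoch_time on exit
#       pass                                          # one epoch of training goes here
#   LOGGER.log_event(tags.RUN_FINAL)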

View file

@ -0,0 +1,38 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
PARSER = argparse.ArgumentParser(description="U-Net medical")
PARSER.add_argument('--data_dir',
type=str,
default='./data',
help="""Directory where to download the dataset""")
def main():
FLAGS = PARSER.parse_args()
if not os.path.exists(FLAGS.data_dir):
os.makedirs(FLAGS.data_dir)
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-volume.tif -P {}'.format(FLAGS.data_dir))
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/train-labels.tif -P {}'.format(FLAGS.data_dir))
os.system('wget http://brainiac2.mit.edu/isbi_challenge/sites/default/files/test-volume.tif -P {}'.format(FLAGS.data_dir))
print("Finished downloading files for U-Net medical to {}".format(FLAGS.data_dir))
if __name__ == '__main__':
main()
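# Example invocation (hedged: the script's filename is not shown in this diff, so
# "download_dataset.py" below is an assumed name):
#   python download_dataset.py --data_dir ./data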

View file

@ -0,0 +1,27 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training benchmark in FP32 on 1 GPU with a batch size of 2
# Usage ./unet_TRAIN_BENCHMARK_FP32_1GPU.sh <path to this repository> <path to dataset> <path to results directory>
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 320000 \
--batch_size 2 \
--benchmark \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,37 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training benchmark in FP32 on 8 GPUs with a batch size of 2 per GPU
# Usage ./unet_TRAIN_BENCHMARK_FP32_8GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpirun \
-np 8 \
-H localhost:8 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 40000 \
--batch_size 2 \
--benchmark \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net inference benchmark in FP32 on 1 GPU with the given batch size
# Usage ./unet_INFER_BENCHMARK_FP32.sh <path to this repository> <path to dataset> <path to results directory> <batch size>
python $1/main.py --data_dir $2 --model_dir $3 --batch_size $4 --benchmark --exec_mode benchmark --augment --warmup_steps 200 --log_every 100 --max_steps 300

View file

@ -0,0 +1,18 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net inference benchmark in TF-AMP on 1 GPU with the given batch size
# Usage ./unet_INFER_BENCHMARK_TF-AMP.sh <path to this repository> <path to dataset> <path to results directory> <batch size>
python $1/main.py --data_dir $2 --model_dir $3 --batch_size $4 --benchmark --use_amp --exec_mode benchmark --augment --warmup_steps 200 --log_every 100 --max_steps 300

View file

@ -0,0 +1,28 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training in TF-AMP on 1 GPU with a batch size of 2
# Usage ./unet_TF-AMP_1GPU.sh <path to this repository> <path to dataset> <path to results directory>
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 320000 \
--batch_size 2 \
--benchmark \
--use_amp \
--exec_mode train_and_predict \
--augment

View file

@ -0,0 +1,38 @@
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This script launches U-Net training in TF-AMP on 8 GPUs with a batch size of 2 per GPU
# Usage ./unet_TF-AMP_8GPU.sh <path to this repository> <path to dataset> <path to results directory>
mpirun \
-np 8 \
-H localhost:8 \
-bind-to none \
-map-by slot \
-x NCCL_DEBUG=INFO \
-x LD_LIBRARY_PATH \
-x PATH \
-mca pml ob1 -mca btl ^openib \
--allow-run-as-root \
python $1/main.py \
--data_dir $2 \
--model_dir $3 \
--warmup_steps 200 \
--log_every 100 \
--max_steps 40000 \
--batch_size 2 \
--benchmark \
--use_amp \
--exec_mode train_and_predict \
--augment

Some files were not shown because too many files have changed in this diff.