[TSPP/PyT][TFT/PyT] Initial release

Kyle Kranen 2021-11-08 14:08:58 -08:00 committed by Krzysztof Kudrynski
parent 8f86776757
commit 38d72e07d2
302 changed files with 35214 additions and 0 deletions

@ -0,0 +1,36 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.06-py3
FROM ${FROM_IMAGE_NAME}
RUN apt-get update && apt-get install -y libb64-dev libb64-0d
WORKDIR /workspace
#ENV PYTHONPATH /workspace
RUN pip uninstall -y typing
RUN apt update && apt install -y p7zip-full
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --ignore-installed -r requirements.txt
RUN pip install --no-cache-dir -e git+https://github.com/NVIDIA/dllogger#egg=dllogger
COPY . .
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
# AMP monkey-patch
RUN sed -i 's/ def forward(ctx,/ @amp.custom_fwd\(cast_inputs=torch.float32\)\n def forward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/ def backward(ctx,/ @amp.custom_bwd\n def backward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/^import torch$/import torch\nfrom torch.cuda import amp/' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@ -0,0 +1,25 @@
Individual Contributor License Agreement (CLA)
Thank you for submitting your contributions to this project.
By signing this CLA, you agree that the following terms apply to all of your past, present and future contributions to the project.
License.
You hereby represent that all present, past and future contributions are governed by the Apache 2.0 License copyright statement.
This entails that to the extent possible under law, you transfer all copyright and related or neighboring rights of the code or documents you contribute to the project itself or its maintainers. Furthermore you also represent that you have the authority to perform the above waiver with respect to the entirety of your contributions.
Moral Rights.
To the fullest extent permitted under applicable law, you hereby waive, and agree not to assert, all of your “moral rights” in or relating to your contributions for the benefit of the project.
Third Party Content.
If your Contribution includes or is based on any source code, object code, bug fixes, configuration changes, tools, specifications, documentation, data, materials, feedback, information or other works of authorship that were not authored by you (“Third Party Content”) or if you are aware of any third party intellectual property or proprietary rights associated with your Contribution (“Third Party Rights”), then you agree to include with the submission of your Contribution full details respecting such Third Party Content and Third Party Rights, including, without limitation, identification of which aspects of your Contribution contain Third Party Content or are associated with Third Party Rights, the owner/author of the Third Party Content and Third Party Rights, where you obtained the Third Party Content, and any applicable third party license terms or restrictions respecting the Third Party Content and Third Party Rights. For greater certainty, the foregoing obligations respecting the identification of Third Party Content and Third Party Rights do not apply to any portion of a Project that is incorporated into your Contribution to that same Project.
Representations.
You represent that, other than the Third Party Content and Third Party Rights identified by you in accordance with this Agreement, you are the sole author of your Contributions and are legally entitled to grant the foregoing licenses and waivers in respect of your Contributions. If your Contributions were created in the course of your employment with your past or present employer(s), you represent that such employer(s) has authorized you to make your Contributions on behalf of such employer(s) or such employer(s) has waived all of their right, title or interest in or to your Contributions.
Disclaimer.
To the fullest extent permitted under applicable law, your Contributions are provided on an "as is" basis, without any warranties or conditions, express or implied, including, without limitation, any implied warranties or conditions of non-infringement, merchantability or fitness for a particular purpose. You are not required to provide support for your Contributions, except to the extent you desire to provide support.
No Obligation.
You acknowledge that the maintainers of this project are under no obligation to use or incorporate your contributions into the project. The decision to use or incorporate your contributions into the project will be made at the sole discretion of the maintainers or their authorized delegates.

@ -0,0 +1,3 @@
TFT for PyTorch
This repository includes software from https://github.com/google-research/google-research/tree/master/tft licensed under the Apache License, Version 2.0

@ -0,0 +1,465 @@
# Temporal Fusion Transformer For PyTorch
This repository provides a script and recipe to train the Temporal Fusion Transformer model to achieve state-of-the-art accuracy. The content of this repository is tested and maintained by NVIDIA.
## Table Of Contents
- [Model overview](#model-overview)
    * [Model architecture](#model-architecture)
    * [Default configuration](#default-configuration)
    * [Feature support matrix](#feature-support-matrix)
        * [Features](#features)
    * [Mixed precision training](#mixed-precision-training)
        * [Enabling mixed precision](#enabling-mixed-precision)
        * [Enabling TF32](#enabling-tf32)
    * [Glossary](#glossary)
- [Setup](#setup)
    * [Requirements](#requirements)
- [Quick Start Guide](#quick-start-guide)
- [Advanced](#advanced)
    * [Scripts and sample code](#scripts-and-sample-code)
    * [Command-line options](#command-line-options)
    * [Getting the data](#getting-the-data)
        * [Dataset guidelines](#dataset-guidelines)
        * [Multi-dataset](#multi-dataset)
    * [Training process](#training-process)
    * [Inference process](#inference-process)
- [Performance](#performance)
    * [Benchmarking](#benchmarking)
        * [Training performance benchmark](#training-performance-benchmark)
        * [Inference performance benchmark](#inference-performance-benchmark)
    * [Results](#results)
        * [Training accuracy results](#training-accuracy-results)
            * [Training accuracy: NVIDIA DGX A100 (8x A100 80GB)](#training-accuracy-nvidia-dgx-a100-8x-a100-80gb)
            * [Training accuracy: NVIDIA DGX-1 (8x V100 16GB)](#training-accuracy-nvidia-dgx-1-8x-v100-16gb)
            * [Training stability test](#training-stability-test)
        * [Training performance results](#training-performance-results)
            * [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
            * [Training performance: NVIDIA DGX-1 (8x V100 16GB)](#training-performance-nvidia-dgx-1-8x-v100-16gb)
- [Release notes](#release-notes)
    * [Changelog](#changelog)
    * [Known issues](#known-issues)
## Model overview
The Temporal Fusion Transformer [TFT](https://arxiv.org/abs/1912.09363) model is a state-of-the-art architecture for interpretable, multi-horizon time-series prediction. The model was first developed and [implemented by Google](https://github.com/google-research/google-research/tree/master/tft) in collaboration with the University of Oxford.
This implementation differs from the reference implementation by addressing the issue of missing data, which is common in production datasets, either by masking missing values in attention matrices or by embedding them as a special value in the latent space.
This model enables the prediction of confidence intervals for future values of time series for multiple future timesteps.
This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.45x faster than training without Tensor Cores while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### Model architecture
The TFT model is a hybrid architecture that joins LSTM encoding of time series with the interpretability of transformer attention layers. Prediction is based on three types of variables: static (constant for a given time series), known (known in advance for the whole history and future), and observed (known only for historical data). All these variables come in two flavors: categorical and continuous. In addition to these covariates, we feed the model with historical values of the target time series. All variables are embedded in a high-dimensional space by learning an embedding vector. Categorical variable embeddings are learned in the classical sense of embedding discrete values. For each continuous variable, the model learns a single vector, which is then scaled by that variable's value for further processing. The next step is to filter variables through the Variable Selection Network (VSN), which assigns weights to the inputs according to their relevance to the prediction. Static variables are used as context for the variable selection of other variables and as the initial state of the LSTM encoders.
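
To make the embedding and variable-selection steps concrete, here is a minimal NumPy sketch. It is not the code from `modeling.py`: the class and function names are hypothetical, the parameters are random stand-ins for learned weights, and the variable selection is simplified to a single linear scoring layer followed by a softmax over variables (the real VSN uses gated residual networks, per-variable transforms, and static context).

```python
import numpy as np


class InputEmbedding:
    """Hypothetical sketch of TFT-style input embeddings.

    Each categorical variable gets a lookup table; each continuous variable
    gets one vector that is scaled by the variable's value.
    """

    def __init__(self, n_continuous, cat_cardinalities, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-ins for learned parameters.
        self.cont_vectors = rng.normal(size=(n_continuous, hidden))
        self.cat_tables = [rng.normal(size=(card, hidden)) for card in cat_cardinalities]

    def __call__(self, x_cont, x_cat):
        # x_cont: (batch, n_continuous) floats; x_cat: (batch, n_categorical) ints.
        # Scale each continuous variable's vector by that variable's value.
        cont = x_cont[:, :, None] * self.cont_vectors[None, :, :]
        # Look up one embedding row per categorical variable.
        cat = np.stack([table[x_cat[:, j]] for j, table in enumerate(self.cat_tables)], axis=1)
        # Result: (batch, n_categorical + n_continuous, hidden).
        return np.concatenate([cat, cont], axis=1)


def select_variables(embedded, score_weights):
    """Simplified variable selection: softmax weights over variables."""
    batch, n_vars, hidden = embedded.shape
    logits = embedded.reshape(batch, n_vars * hidden) @ score_weights  # (batch, n_vars)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of variable embeddings; the weights double as relevance scores.
    return (weights[:, :, None] * embedded).sum(axis=1), weights
```

The returned `weights` array plays the role of the VSN activations discussed below: one non-negative score per variable that sums to 1 for each sample.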
After encoding, variables are passed to multi-head attention layers (decoder), which produce the final prediction. The whole architecture is interwoven with residual connections and gating mechanisms that allow it to adapt to various problems by skipping some of its parts.
For the sake of explainability, the heads of the self-attention layers share value matrices. This allows interpreting self-attention as an ensemble of models predicting different temporal patterns over the same feature set. The other feature that helps us understand the model is the VSN activations, which tell us how relevant a given feature is to the prediction.
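
The effect of sharing the value matrix can be shown with a small NumPy sketch of interpretable multi-head attention as described in the TFT paper: each head has its own query and key projections, all heads share one value projection, and the head outputs are averaged instead of concatenated. The function name and weight shapes are illustrative assumptions, not the interface of `modeling.py`; a causal mask is applied so each time step attends only to itself and earlier steps.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def interpretable_mha(x, w_q, w_k, w_v, w_o, causal=True):
    """x: (T, d_model); w_q, w_k: (n_heads, d_model, d_head);
    w_v: (d_model, d_head), shared by all heads; w_o: (d_head, d_model)."""
    T = x.shape[0]
    v = x @ w_v  # a single value projection used by every head
    head_outputs, head_attn = [], []
    for wq, wk in zip(w_q, w_k):
        q, k = x @ wq, x @ wk
        scores = q @ k.T / np.sqrt(q.shape[-1])
        if causal:
            # Mask future positions so step t only attends to steps <= t.
            scores = np.where(np.tril(np.ones((T, T), dtype=bool)), scores, -np.inf)
        attn = softmax(scores)
        head_attn.append(attn)
        head_outputs.append(attn @ v)
    # Because V is shared, averaging head outputs equals applying the mean
    # attention matrix to V, so that one matrix summarizes all heads.
    out = np.mean(head_outputs, axis=0) @ w_o
    return out, np.mean(head_attn, axis=0)
```

The second return value is the averaged attention pattern, which is what makes the attention weights directly interpretable across heads.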
![](TFT_architecture.PNG)
*image source: https://arxiv.org/abs/1912.09363*
### Default configuration
The specific configuration of the TFT model depends on the dataset used. Not only the size of the model but also the data sampling and preprocessing strategies are subject to change. During preprocessing, data is normalized per feature. For some of the datasets, we apply scaling per time series, which takes into account shifts in distribution between entities (for example, a factory consumes more electricity than an average house). The model is trained with the quantile loss:

$$\sum_{i=1}^{N}\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{N t_{max}}$$

where $QL(y,\hat{y},q) = q\max(y-\hat{y},0) + (1-q)\max(\hat{y}-y,0)$, the sums run over $N$ samples and a forecast horizon of $t_{max}$ steps, and $\mathcal{Q}$ is the set of quantiles [0.1, 0.5, 0.9]. The default configurations are tuned for distributed training on DGX-1-32G with mixed precision. We use dynamic loss scaling. Specific values are provided in the table below.
| Dataset | Training samples | Validation samples | Test samples | History length | Forecast horizon | Dropout | Hidden size | Heads | Batch size | Learning rate | Gradient clipping |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 450k | 50k | 53.5k | 168 | 24 | 0.1 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
| Traffic | 450k | 50k | 139.6k | 168 | 24 | 0.3 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
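
The quantile loss described above can be written out in plain Python as a sketch. The function names and the nested-list layout of targets and predictions are illustrative assumptions; the training code in `criterions.py` operates on batched tensors instead.

```python
def quantile_loss(y, y_hat, q):
    """Quantile (pinball) loss QL(y, y_hat, q) for a single value."""
    diff = y - y_hat
    # Equals q * max(diff, 0) + (1 - q) * max(-diff, 0) for 0 <= q <= 1.
    return max(q * diff, (q - 1) * diff)


def tft_quantile_loss(targets, predictions, quantiles=(0.1, 0.5, 0.9)):
    """Average quantile loss over samples and horizon steps.

    targets[i][t] is the true value of sample i at horizon step t;
    predictions[i][t][k] is the forecast for quantiles[k].
    """
    n = len(targets)
    t_max = len(targets[0])
    total = 0.0
    for i in range(n):
        for t in range(t_max):
            for k, q in enumerate(quantiles):
                total += quantile_loss(targets[i][t], predictions[i][t][k], q)
    # Divide by N * t_max (not by the number of quantiles), as in the formula.
    return total / (n * t_max)
```

The asymmetry is what produces interval forecasts: at q = 0.9, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only 0.2, so the optimal forecast shifts toward the upper part of the distribution.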
### Feature support matrix
The following features are supported by this model:
| Feature | Supported |
|----------------------------|--------------------------|
| Distributed data parallel | Yes |
| PyTorch AMP | Yes |
#### Features
[Automatic Mixed Precision](https://pytorch.org/docs/stable/amp.html) provides an easy way to leverage Tensor Core performance. It allows the execution of parts of a network in lower precision. Refer to [Mixed precision training](#mixed-precision-training) for more information.
[PyTorch DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) is a module wrapper that enables easy multiprocess distributed data-parallel training.
### Mixed precision training
Mixed precision is the combined use of different numerical precisions in a
computational method.
[Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant
computational speedup by performing operations in half-precision format while
storing minimal information in single-precision to retain as much information
as possible in critical parts of the network. Since the introduction of [Tensor Cores](https://developer.nvidia.com/tensor-cores) in Volta, and continuing with
the Turing and Ampere architectures, switching to mixed precision has yielded
significant training speedups: up to 3x overall on the most arithmetically
intense model architectures. Using mixed precision training previously required two
steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Manually adding loss scaling to preserve small gradient values.
The ability to train deep learning networks with lower precision was introduced
in the Pascal architecture and first supported in [CUDA
8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep
Learning SDK.
For information about:
* How to train using mixed precision, refer to the [Mixed Precision
Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed
Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html)
documentation.
* Techniques used for mixed precision training, refer to the [Mixed-Precision
Training of Deep Neural
Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/)
blog.
* APEX tools for mixed precision training, refer to the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in
PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/) blog.
#### Enabling mixed precision
Mixed precision is enabled in PyTorch by using the Automatic Mixed Precision `torch.cuda.amp` module, which runs eligible operations in half precision while keeping the model weights in single-precision format. Furthermore, to preserve small gradient magnitudes in backpropagation, a [loss scaling](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#lossscaling) step must be included when applying gradients. In PyTorch, loss scaling can be applied automatically by the `GradScaler` class. All the necessary steps to implement AMP are described in detail in the [PyTorch AMP examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples).
To enable mixed precision for TFT, add the `--use_amp` option to the training script.
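
The following minimal sketch shows the standard `torch.cuda.amp` pattern (`autocast` plus `GradScaler`) that a flag like `--use_amp` typically toggles. It is not the training loop from `train.py`: the tiny linear model, the MSE placeholder loss (TFT itself uses the quantile loss), and the `train_step` helper are assumptions for illustration. AMP is switched on only when a CUDA device is present, so the sketch also runs on CPU.

```python
import torch
from torch import nn

torch.manual_seed(0)


def train_step(model, inputs, targets, optimizer, scaler, use_amp):
    """One optimization step with optional automatic mixed precision."""
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops (e.g. matmuls) in FP16 on CUDA when enabled.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(inputs), targets)  # placeholder loss
    # GradScaler scales the loss so small FP16 gradients do not underflow,
    # unscales them before the optimizer step, and adapts the scale over time.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


use_amp = torch.cuda.is_available()  # enable AMP only when a GPU is present
device = "cuda" if use_amp else "cpu"
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # a no-op when disabled

inputs = torch.randn(32, 8, device=device)
targets = torch.randn(32, 1, device=device)
losses = [train_step(model, inputs, targets, optimizer, scaler, use_amp) for _ in range(5)]
```

Recent PyTorch releases prefer the device-generic spellings `torch.amp.autocast("cuda", ...)` and `torch.amp.GradScaler("cuda", ...)`; the `torch.cuda.amp` forms used here match the PyTorch version in the 21.06 container but may emit deprecation warnings on newer versions.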
#### Enabling TF32
TensorFloat-32 (TF32) is the new math mode in [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs.
TF32 Tensor Cores can speed up networks using FP32, typically with no loss of accuracy. It is more robust than FP16 for models which require high dynamic range for weights or activations.
For more information, refer to the [TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) blog post.
TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
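
TF32 can also be controlled explicitly through PyTorch's `torch.backends` flags, as in the short sketch below. Defaults depend on the PyTorch version: starting with PyTorch 1.12, `torch.backends.cuda.matmul.allow_tf32` defaults to `False`, so setting it explicitly keeps behavior predictable across containers. The flags only take effect on NVIDIA Ampere or newer GPUs.

```python
import torch

# Allow TF32 Tensor Core math for FP32 matrix multiplications (cuBLAS).
torch.backends.cuda.matmul.allow_tf32 = True
# Allow TF32 Tensor Core math in cuDNN operations such as convolutions.
torch.backends.cudnn.allow_tf32 = True
```

Setting both flags to `False` forces full FP32 math, which can be useful when checking whether a numerical difference comes from TF32.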
### Glossary
**Multi-horizon prediction**
Process of estimating values of a time series for multiple future time steps.
**Quantiles**
Cut points dividing the range of a probability distribution into intervals with equal probabilities.
**Time series**
Series of data points indexed and equally spaced in time.
**Transformer**
The paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduces a novel architecture called Transformer that uses an attention mechanism and transforms one sequence into another.
## Setup
The following section lists the requirements that you need to meet in order to start training the TFT model.
### Requirements
This repository contains a Dockerfile, which extends the PyTorch NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
- [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
- Supported GPUs:
- [NVIDIA Volta architecture](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
- [NVIDIA Turing architecture](https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/)
- [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
- Running [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running)
If you cannot use the PyTorch NGC container to set up the required environment or create your own container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).
## Quick Start Guide
To train your model using mixed or TF32 precision with Tensor Cores, perform the following steps using the default parameters of the TFT model on any of the benchmark datasets. For the specifics concerning training and inference, refer to the [Advanced](#advanced) section.
1. Clone the repository.
```bash
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Forecasting/TFT
```
2. Build the TFT PyTorch NGC container.
```bash
docker build --network=host -t tft .
```
3. Start an interactive session in the NGC container to run training/inference.
```bash
docker run -it --rm --ipc=host --network=host --gpus all -v /path/to/your/data:/data/ tft
```
Note: Make sure to mount your dataset with the `-v` flag so that it is available for training inside the NVIDIA Docker container.
4. Download and preprocess datasets.
```bash
bash scripts/get_data.sh
```
5. Start training. Choose one of the scripts provided in the `scripts/` directory. Results are stored in the `/results` directory.
These scripts are tuned for DGX-1-32G. If you have a different system, use the `NGPU` and `BATCH_SIZE` variables to adjust the parameters for your system.
```bash
bash scripts/run_electricity.sh
bash scripts/run_traffic.sh
```
6. Start validation/evaluation. The metric we use for evaluation is q-risk. We can compare it per-quantile in the Pareto sense or jointly as one number indicating accuracy.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin
```
7. Start inference/predictions. Visualize and save predictions by running the following command.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin \
--visualize \
--save_predictions
```
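
The q-risk metric used in step 6 is, following the TFT paper, the quantile loss summed over all series and horizons, multiplied by 2, and normalized by the sum of absolute target values. The plain-Python sketch below illustrates that definition for one quantile; the flat-list input format and the function name are assumptions, and the exact aggregation performed by `inference.py` is not reproduced here.

```python
def q_risk(targets, forecasts, q):
    """Normalized quantile loss (q-risk) for one quantile q.

    targets and forecasts are flat, equally long sequences covering all
    series and forecast horizon steps.
    """
    def quantile_loss(y, y_hat):
        diff = y - y_hat
        return max(q * diff, (q - 1) * diff)

    numerator = 2.0 * sum(quantile_loss(y, f) for y, f in zip(targets, forecasts))
    denominator = sum(abs(y) for y in targets)
    return numerator / denominator
```

For q = 0.5, the metric reduces to the sum of absolute errors divided by the sum of absolute targets, which is why P50 q-risk is often read as a normalized absolute error.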
Now that you have your model trained and evaluated, you can compare your training results with our [Training accuracy results](#training-accuracy-results). You can also benchmark your performance by following the [Training performance benchmark](#training-performance-benchmark) section and compare it with the [Training performance results](#training-performance-results). Following the steps in these sections should enable you to achieve the same accuracy and performance results as stated in the [Results](#results) section.
## Advanced
The following sections provide more details about the dataset, running training and inference, and the training results.
### Scripts and sample code
In the root directory, the most important files are:
- `train.py`: Entry point for training
- `data_utils.py`: File containing the dataset implementation and preprocessing functions
- `modeling.py`: Definition of the model
- `configuration.py`: Contains configuration classes for various experiments
- `test.py`: Entry point for testing a trained model
- `Dockerfile`: Container definition
- `log_helper.py`: Contains helper functions for setting up dllogger
- `criterions.py`: Definitions of loss functions

The `scripts` directory contains scripts for default use cases:
- `run_electricity.sh`: Trains the default model on the electricity dataset
- `run_traffic.sh`: Trains the default model on the traffic dataset
### Command-line options
To view the full list of available options and their descriptions, use the `-h` or `--help` command-line option, for example:
`python train.py --help`.
The following example output is printed when running `python train.py --help`:
```
usage: train.py [-h] --data_path DATA_PATH --dataset {electricity,volatility,traffic,favorita} [--epochs EPOCHS] [--sample_data SAMPLE_DATA SAMPLE_DATA] [--batch_size BATCH_SIZE] [--lr LR] [--seed SEED] [--use_amp] [--clip_grad CLIP_GRAD]
[--early_stopping EARLY_STOPPING] [--results RESULTS] [--log_file LOG_FILE] [--distributed_world_size N] [--distributed_rank DISTRIBUTED_RANK] [--local_rank LOCAL_RANK] [--overwrite_config OVERWRITE_CONFIG]
optional arguments:
-h, --help show this help message and exit
--data_path DATA_PATH
--dataset {electricity,volatility,traffic,favorita}
--epochs EPOCHS
--sample_data SAMPLE_DATA SAMPLE_DATA
--batch_size BATCH_SIZE
--lr LR
--seed SEED
--use_amp Enable automatic mixed precision
--clip_grad CLIP_GRAD
--early_stopping EARLY_STOPPING
Stop training if validation loss does not improve for more than this number of epochs.
--results RESULTS
--log_file LOG_FILE
--distributed_world_size N
total number of GPUs across all nodes (default: all visible GPUs)
--distributed_rank DISTRIBUTED_RANK
rank of the current worker
--local_rank LOCAL_RANK
rank of the current worker
--overwrite_config OVERWRITE_CONFIG
JSON string used to overload config
```
### Getting the data
The TFT model was trained on the electricity and traffic benchmark datasets. This repository contains the `get_data.sh` download script, which, for the electricity and traffic datasets, automatically downloads and preprocesses the training, validation, and test datasets and produces files that contain scalers.
#### Dataset guidelines
The `data_utils.py` file contains all functions that are used to preprocess the data. Initially, the data is loaded into a `pandas.DataFrame` and parsed into a common format that contains the features used for training. The standardized data is then cleaned, normalized, encoded, and binarized.
This step does the following:
- Drop all columns that are not marked in the configuration file as used for training or preprocessing
- Flatten indices in case time series are indexed by more than one column
- Split the data into training, validation, and test splits
- Filter out all time series shorter than the minimal example length
- Normalize columns marked as continuous in the configuration file
- Encode columns marked as categorical as integers
- Save the data in CSV and binary formats
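Three of the steps above (filtering short series, normalizing continuous columns, and integer-encoding categorical columns) can be sketched in plain Python. This is a simplified illustration, not the code in `data_utils.py`: rows are assumed to be dicts, the column names and the `min_length` value are hypothetical, statistics are computed over all rows passed in (in practice they would typically come from the training split only), and the split and save steps are omitted.

```python
from collections import Counter
from statistics import mean, pstdev


def filter_short_series(rows, id_col, min_length):
    """Keep only rows whose time series has at least min_length rows."""
    counts = Counter(row[id_col] for row in rows)
    return [row for row in rows if counts[row[id_col]] >= min_length]


def normalize_continuous(rows, cols):
    """Z-score each continuous column in place; return the (mean, std) used per column."""
    scalers = {}
    for col in cols:
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values) or 1.0  # guard against constant columns
        scalers[col] = (mu, sigma)
        for row in rows:
            row[col] = (row[col] - mu) / sigma
    return scalers


def encode_categorical(rows, cols):
    """Replace categorical values with integer ids, in order of first appearance."""
    encodings = {}
    for col in cols:
        mapping = {}
        for row in rows:
            row[col] = mapping.setdefault(row[col], len(mapping))
        encodings[col] = mapping
    return encodings


# Hypothetical example data: series "b" is shorter than the minimal length.
rows = [
    {"id": "a", "time": 0, "power": 1.0, "weekday": "mon"},
    {"id": "a", "time": 1, "power": 3.0, "weekday": "tue"},
    {"id": "b", "time": 0, "power": 5.0, "weekday": "mon"},
]
rows = filter_short_series(rows, "id", min_length=2)
scalers = normalize_continuous(rows, ["power"])
encodings = encode_categorical(rows, ["weekday"])
```

Returning the scalers and encodings mirrors the idea that the same transformations must be reapplied to new data before inference.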
#### Multi-dataset
In order to use an alternate dataset, you have to write a function that parses your data into a common format. The format is as follows:
- There is at least one id column
- There is exactly one time column (that can also be used as a feature column)
- Each feature is in a separate column
- Each row represents a moment in time for only one time series
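Before writing a parser, it can help to check that the parsed rows follow this format. The sketch below assumes rows are dicts and uses hypothetical names (`format_problems`, `store`, `item`, `date`). It checks that at least one id column is given, that every row has the id and time columns, and that no (id, time) pair repeats, which is what "one moment in time for only one time series" implies.

```python
def format_problems(rows, id_cols, time_col):
    """Return a list of human-readable problems; an empty list means the rows look valid."""
    problems = []
    if not id_cols:
        problems.append("at least one id column is required")
    seen = set()
    for i, row in enumerate(rows):
        missing = [col for col in (*id_cols, time_col) if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        # A series is identified by the combination of all id columns.
        key = (tuple(row[col] for col in id_cols), row[time_col])
        if key in seen:
            problems.append(f"row {i}: duplicate (id, time) pair {key}")
        seen.add(key)
    return problems


rows = [
    {"store": 1, "item": 7, "date": "2021-01-01", "sales": 3},
    {"store": 1, "item": 7, "date": "2021-01-02", "sales": 5},
    {"store": 2, "item": 7, "date": "2021-01-01", "sales": 1},
]
print(format_problems(rows, ["store", "item"], "date"))  # prints []
```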
Additionally, you must specify a configuration of the network, including a data description. Refer to the example in the `configuration.py` file.
### Training process
The `train.py` script is an entry point for a training procedure. Refined recipes can be found in the `scripts` directory.
The model trains for at most `--epochs` epochs. If the `--early_stopping N` option is set, training ends once the validation loss has not improved for N consecutive epochs.
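The stopping rule can be sketched as follows. This is a hypothetical simulation, not the loop in `train.py`: a precomputed list of per-epoch validation losses stands in for real training, and training stops after N consecutive epochs without improvement as described above (the `--help` text says "more than this number of epochs", so the exact off-by-one in the script may differ). The best epoch is the one whose checkpoint would be kept.

```python
def run_epochs(val_losses, max_epochs, patience=None):
    """Simulate training with optional early stopping.

    val_losses[e] stands in for the validation loss after epoch e.
    Returns (number of epochs run, index of the best epoch).
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch in range(min(max_epochs, len(val_losses))):
        epochs_run += 1
        loss = val_losses[epoch]
        if loss < best_loss:
            # New best: this is where the checkpoint would be saved.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if patience is not None and epochs_without_improvement >= patience:
                break
    return epochs_run, best_epoch


losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]
print(run_epochs(losses, max_epochs=25, patience=3))  # (6, 2)
print(run_epochs(losses, max_epochs=25))  # (7, 6)
```

With a patience of 3, the run stops before reaching the later, better loss at epoch 6, which illustrates the trade-off of choosing a small `--early_stopping` value.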
The details of the architecture and the dataset configuration are encapsulated by the `--dataset` option, which selects one of the configurations stored in the `configuration.py` file. You can enable mixed precision training by providing the `--use_amp` option. The training script supports multi-GPU training with the APEX package. To enable distributed training, prepend the training command with `python -m torch.distributed.launch --nproc_per_node=${NGPU}`.
Example command:
```
python -m torch.distributed.launch --nproc_per_node=8 train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=1024 \
--sample 450000 50000 \
--lr 1e-3 \
--epochs 25 \
--early_stopping 5 \
--seed 1 \
--use_amp \
--results /results/TFT_electricity_bs8x1024_lr1e-3/seed_1
```
The model is trained by optimizing the quantile loss <img src="https://render.githubusercontent.com/render/math?math=\Large\sum_{i=1}^N\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{Nt_{max}}">. After training, the checkpoint with the lowest validation loss is evaluated on the test split with the q-risk metric <img src="https://render.githubusercontent.com/render/math?math=\Large\frac{2\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}QL(y_t,\hat{y}(q,t),q)}{\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}|y_t|}">.
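Both formulas build on the quantile (pinball) loss QL(y, ŷ, q) = q·max(y − ŷ, 0) + (1 − q)·max(ŷ − y, 0), as defined in the TFT paper. The sketch below is a plain-Python illustration of the two formulas, not the code in `criterions.py`; the nested-list layout of targets and predictions and the function names are assumptions.

```python
def quantile_loss(y, y_hat, q):
    """Pinball loss: under-prediction is weighted by q, over-prediction by (1 - q)."""
    return q * max(y - y_hat, 0.0) + (1.0 - q) * max(y_hat - y, 0.0)


def training_loss(targets, predictions, quantiles):
    """Sum of QL over series i, quantiles q and steps t, divided by N * t_max.

    targets[i][t] is y_it; predictions[i][k][t] is the forecast for quantiles[k].
    """
    n, t_max = len(targets), len(targets[0])
    total = 0.0
    for i in range(n):
        for k, q in enumerate(quantiles):
            for t in range(t_max):
                total += quantile_loss(targets[i][t], predictions[i][k][t], q)
    return total / (n * t_max)


def q_risk(targets, predictions_q, q):
    """q-risk for one quantile: 2 * sum of QL divided by sum of |y| over all series and steps."""
    numerator = sum(
        quantile_loss(y, y_hat, q)
        for ys, y_hats in zip(targets, predictions_q)
        for y, y_hat in zip(ys, y_hats)
    )
    denominator = sum(abs(y) for ys in targets for y in ys)
    return 2.0 * numerator / denominator


targets = [[1.0, 2.0, 3.0]]          # one series, t_max = 3
median_forecast = [[1.0, 1.0, 4.0]]  # forecast for q = 0.5
print(q_risk(targets, median_forecast, 0.5))
print(training_loss(targets, [[[1.0, 1.0, 4.0]]], [0.5]))
```

Note that, as in the training formula, the loss is normalized by N·t_max but not by the number of quantiles.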
Results are by default stored in the `/results` directory. This can be changed by providing the `--results` option. At the end of the training, the results directory will contain the trained checkpoint which had the lowest validation loss, dllogger logs (in dictionary per line format), and TensorBoard logs.
### Inference process
Inference can be run by launching the `inference.py` script. The script requires a trained checkpoint. It is crucial to prepare the data in the same way as the training data before running inference. Example command:
```
python inference.py \
--checkpoint /results/checkpoint.pt \
--data /data/processed/electricity_bin/test.csv \
--tgt_scalers /data/processed/electricity_bin/tgt_scalers.bin \
--cat_encodings /data/processed/electricity_bin/cat_encodings.bin \
--batch_size 2048 \
--visualize \
--save_predictions \
--joint_visualization \
--results /results \
--use_amp
```
By default, the script evaluates the model on the specified dataset and prints the q-risk evaluated on this dataset. To save the predictions, use the `--save_predictions` option; predictions are stored in CSV format in the directory specified by the `--results` option. The `--joint_visualization` option plots graphs in TensorBoard format, which lets you inspect the results and compare them to the true values. With `--visualize`, you can save plots for each example in a separate file.
## Performance
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
#### Training performance benchmark
In order to run training benchmarks, use the `scripts/benchmark.sh` script.
#### Inference performance benchmark
To benchmark the inference performance on a specific batch size and dataset, run the `inference.py` script.
### Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
#### Training accuracy results
We conducted an extensive hyperparameter search along with stability tests. The presented results are averages over hundreds of runs.
##### Training accuracy: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-------
| Electricity | 1 | 1024 | 0.027 / 0.059 / 0.029 | 0.028 / 0.058 / 0.029 | 1427s | 1087s | 1.313x
| Electricity | 8 | 1024 | 0.027 / 0.056 / 0.028 | 0.026 / 0.054 / 0.029 | 216s | 176s | 1.227x
| Traffic | 1 | 1024 | 0.040 / 0.103 / 0.075 | 0.040 / 0.103 / 0.075 | 957s | 726s | 1.318x
| Traffic | 8 | 1024 | 0.042 / 0.104 / 0.076 | 0.042 / 0.106 / 0.077 | 151s | 126s | 1.198x
##### Training accuracy: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-----------
| Electricity | 1 | 1024 | 0.027 / 0.056 / 0.028 | 0.027 / 0.058 / 0.029 | 2559s | 1598s | 1.601x
| Electricity | 8 | 1024 | 0.027 / 0.055 / 0.028 | 0.027 / 0.055 / 0.029 | 381s | 261s | 1.460x
| Traffic | 1 | 1024 | 0.040 / 0.102 / 0.075 | 0.041 / 0.101 / 0.074 | 1718s | 1062s | 1.618x
| Traffic | 8 | 1024 | 0.042 / 0.106 / 0.076 | 0.042 / 0.105 / 0.077 | 256s | 176s | 1.455x
##### Training stability test
To get a fuller picture of the model's accuracy, we performed a hyperparameter search along with stability tests on 100 random seeds for each configuration. Then, for each benchmark dataset, we chose the architecture with the lowest mean test q-risk. The table below summarizes the best configurations.
| Dataset | #GPU | Hidden size | #Heads | Local BS | LR | Gradient clipping | Dropout | Mean q-risk | Std q-risk | Min q-risk | Max q-risk
|-------------|------|-------------|--------|----------|------|-------------------|---------|-------------|------------| -----------|------
| Electricity | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.1 | 0.1131 | 0.0025 | 0.1080 | 0.1200
| Traffic | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.3 | 0.2180 | 0.0049 | 0.2069 | 0.2336
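The aggregation behind a table like this can be sketched as below. The per-seed values in the example are made-up placeholders, not our measurements; the sketch assumes the reported standard deviation is the sample standard deviation, and the function and configuration names are hypothetical.

```python
from statistics import mean, stdev


def summarize_q_risk(per_seed_q_risk):
    """Summary statistics of test q-risk over random seeds."""
    return {
        "mean": mean(per_seed_q_risk),
        "std": stdev(per_seed_q_risk),  # sample standard deviation (assumption)
        "min": min(per_seed_q_risk),
        "max": max(per_seed_q_risk),
    }


def pick_best_config(results):
    """results maps a configuration name to its per-seed q-risks; the lowest mean wins."""
    return min(results, key=lambda name: mean(results[name]))


# Placeholder values for illustration only.
results = {
    "hidden_128_dropout_0.1": [0.10, 0.12, 0.11],
    "hidden_64_dropout_0.1": [0.12, 0.13, 0.14],
}
best = pick_best_config(results)
print(best, summarize_q_risk(results[best]))
```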
#### Training performance results
##### Training performance: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 80GB GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision
|-------------|---|------|--------|--------|-------|-------|-----
| Electricity | 1 | 1024 | 10173 | 13703 | 1.35x | 1 | 1
| Electricity | 8 | 1024 | 80596 | 107761 | 1.34x | 7.92x | 7.86x
| Traffic | 1 | 1024 | 10197 | 13779 | 1.35x | 1 | 1
| Traffic | 8 | 1024 | 80692 | 107979 | 1.34x | 7.91x | 7.84x
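The derived columns can be reproduced from the raw throughputs. The sketch below assumes throughput speedup is mixed-precision throughput divided by TF32 throughput, and weak scaling is 8-GPU throughput divided by 1-GPU throughput at the same per-GPU batch size; these definitions are consistent with the Electricity rows above. The function names are hypothetical.

```python
def throughput_speedup(baseline_items_per_s, improved_items_per_s):
    """How many times more items per second the improved setup processes."""
    return improved_items_per_s / baseline_items_per_s


def weak_scaling(single_gpu_items_per_s, multi_gpu_items_per_s):
    """Multi-GPU throughput relative to one GPU, with a fixed batch size per GPU."""
    return multi_gpu_items_per_s / single_gpu_items_per_s


# Electricity rows from the DGX A100 table (items per second).
tf32_1gpu, amp_1gpu = 10173, 13703
tf32_8gpu, amp_8gpu = 80596, 107761

print(f"{throughput_speedup(tf32_1gpu, amp_1gpu):.2f}x")  # 1.35x
print(f"{weak_scaling(tf32_1gpu, tf32_8gpu):.2f}x")  # 7.92x
print(f"{weak_scaling(amp_1gpu, amp_8gpu):.2f}x")  # 7.86x
```

Ideal weak scaling on 8 GPUs would be 8x, so values around 7.9x indicate little communication overhead.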
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
##### Training performance: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision
|-------------|---|------|-------|-------|-------|------|----
| Electricity | 1 | 1024 | 5580 | 9148 | 1.64x | 1 | 1
| Electricity | 8 | 1024 | 43351 | 69855 | 1.61x | 7.77x | 7.64x
| Traffic | 1 | 1024 | 5593 | 9194 | 1.64x | 1 | 1
| Traffic | 8 | 1024 | 43426 | 69983 | 1.61x | 7.76x | 7.61x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
## Release notes
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to https://developer.nvidia.com/deep-learning-performance-training-inference.
### Changelog
October 2021
- Initial release
### Known issues
There are no known issues with this model.

@ -0,0 +1,36 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.06-py3
FROM ${FROM_IMAGE_NAME}
RUN apt-get update && apt-get install -y libb64-dev libb64-0d
WORKDIR /workspace
#ENV PYTHONPATH /workspace
RUN pip uninstall -y typing
RUN apt-get update && apt-get install -y p7zip-full
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --ignore-installed -r requirements.txt
RUN pip install --no-cache-dir -e git+https://github.com/NVIDIA/dllogger#egg=dllogger
COPY . .
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
# AMP monkey-patch
RUN sed -i 's/ def forward(ctx,/ @amp.custom_fwd\(cast_inputs=torch.float32\)\n def forward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/ def backward(ctx,/ @amp.custom_bwd\n def backward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/^import torch$/import torch\nfrom torch.cuda import amp/' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@ -0,0 +1,25 @@
Individual Contributor License Agreement (CLA)
Thank you for submitting your contributions to this project.
By signing this CLA, you agree that the following terms apply to all of your past, present and future contributions to the project.
License.
You hereby represent that all present, past and future contributions are governed by the Apache 2.0 License copyright statement.
This entails that to the extent possible under law, you transfer all copyright and related or neighboring rights of the code or documents you contribute to the project itself or its maintainers. Furthermore you also represent that you have the authority to perform the above waiver with respect to the entirety of your contributions.
Moral Rights.
To the fullest extent permitted under applicable law, you hereby waive, and agree not to assert, all of your “moral rights” in or relating to your contributions for the benefit of the project.
Third Party Content.
If your Contribution includes or is based on any source code, object code, bug fixes, configuration changes, tools, specifications, documentation, data, materials, feedback, information or other works of authorship that were not authored by you (“Third Party Content”) or if you are aware of any third party intellectual property or proprietary rights associated with your Contribution (“Third Party Rights”), then you agree to include with the submission of your Contribution full details respecting such Third Party Content and Third Party Rights, including, without limitation, identification of which aspects of your Contribution contain Third Party Content or are associated with Third Party Rights, the owner/author of the Third Party Content and Third Party Rights, where you obtained the Third Party Content, and any applicable third party license terms or restrictions respecting the Third Party Content and Third Party Rights. For greater certainty, the foregoing obligations respecting the identification of Third Party Content and Third Party Rights do not apply to any portion of a Project that is incorporated into your Contribution to that same Project.
Representations.
You represent that, other than the Third Party Content and Third Party Rights identified by you in accordance with this Agreement, you are the sole author of your Contributions and are legally entitled to grant the foregoing licenses and waivers in respect of your Contributions. If your Contributions were created in the course of your employment with your past or present employer(s), you represent that such employer(s) has authorized you to make your Contributions on behalf of such employer(s) or such employer(s) has waived all of their right, title or interest in or to your Contributions.
Disclaimer.
To the fullest extent permitted under applicable law, your Contributions are provided on an "as is" basis, without any warranties or conditions, express or implied, including, without limitation, any implied warranties or conditions of non-infringement, merchantability or fitness for a particular purpose. You are not required to provide support for your Contributions, except to the extent you desire to provide support.
No Obligation.
You acknowledge that the maintainers of this project are under no obligation to use or incorporate your contributions into the project. The decision to use or incorporate your contributions into the project will be made at the sole discretion of the maintainers or their authorized delegates.

@ -0,0 +1,3 @@
TFT for PyTorch
This repository includes software from https://github.com/google-research/google-research/tree/master/tft licensed under the Apache License, Version 2.0

@ -0,0 +1,465 @@
# Temporal Fusion Transformer For PyTorch
This repository provides a script and recipe to train the Temporal Fusion Transformer model to achieve state-of-the-art accuracy. The content of this repository is tested and maintained by NVIDIA.
## Table Of Contents
- [Model overview](#model-overview)
    * [Model architecture](#model-architecture)
    * [Default configuration](#default-configuration)
    * [Feature support matrix](#feature-support-matrix)
        * [Features](#features)
    * [Mixed precision training](#mixed-precision-training)
        * [Enabling mixed precision](#enabling-mixed-precision)
        * [Enabling TF32](#enabling-tf32)
    * [Glossary](#glossary)
- [Setup](#setup)
    * [Requirements](#requirements)
- [Quick Start Guide](#quick-start-guide)
- [Advanced](#advanced)
    * [Scripts and sample code](#scripts-and-sample-code)
    * [Command-line options](#command-line-options)
    * [Getting the data](#getting-the-data)
        * [Dataset guidelines](#dataset-guidelines)
        * [Multi-dataset](#multi-dataset)
    * [Training process](#training-process)
    * [Inference process](#inference-process)
- [Performance](#performance)
    * [Benchmarking](#benchmarking)
        * [Training performance benchmark](#training-performance-benchmark)
        * [Inference performance benchmark](#inference-performance-benchmark)
    * [Results](#results)
        * [Training accuracy results](#training-accuracy-results)
            * [Training accuracy: NVIDIA DGX A100 (8x A100 80GB)](#training-accuracy-nvidia-dgx-a100-8x-a100-80gb)
            * [Training accuracy: NVIDIA DGX-1 (8x V100 16GB)](#training-accuracy-nvidia-dgx-1-8x-v100-16gb)
            * [Training stability test](#training-stability-test)
        * [Training performance results](#training-performance-results)
            * [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
            * [Training performance: NVIDIA DGX-1 (8x V100 16GB)](#training-performance-nvidia-dgx-1-8x-v100-16gb)
- [Release notes](#release-notes)
    * [Changelog](#changelog)
    * [Known issues](#known-issues)
## Model overview
The Temporal Fusion Transformer [TFT](https://arxiv.org/abs/1912.09363) model is a state-of-the-art architecture for interpretable, multi-horizon time-series prediction. The model was first developed and [implemented by Google](https://github.com/google-research/google-research/tree/master/tft) in collaboration with the University of Oxford.
This implementation differs from the reference implementation by addressing the issue of missing data, which is common in production datasets, either by masking missing values in the attention matrices or by embedding them as a special value in the latent space.
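Masking a missing value in an attention matrix usually means giving its position zero attention weight. The sketch below shows that idea for a single row of attention scores in plain Python. It is an illustration under that assumption, not this repository's implementation (which may, for example, use a large negative constant instead of negative infinity), and `masked_softmax` is a hypothetical name.

```python
import math


def masked_softmax(scores, observed):
    """Softmax over attention scores that assigns zero weight to unobserved positions."""
    if not any(observed):
        raise ValueError("at least one position must be observed")
    masked = [s if ok else -math.inf for s, ok in zip(scores, observed)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
print(weights)  # the middle (missing) position gets weight 0.0
```

The remaining weights still sum to 1, so the attention output is a proper average over observed time steps only.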
This model enables the prediction of confidence intervals for future values of time series for multiple future timesteps.
This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.45x faster than training without Tensor Cores while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### Model architecture
The TFT model is a hybrid architecture that joins LSTM encoding of time series with the interpretability of transformer attention layers. Prediction is based on three types of variables: static (constant for a given time series), known (known in advance for the whole history and future), and observed (known only for historical data). All these variables come in two flavors: categorical and continuous. In addition to historical data, we feed the model with historical values of the time series. All variables are embedded in a high-dimensional space by learning an embedding vector. Categorical variable embeddings are learned in the classical sense of embedding discrete values. The model learns a single vector for each continuous variable, which is then scaled by this variable's value for further processing. The next step is to filter variables through the Variable Selection Network (VSN), which assigns weights to the inputs according to their relevance to the prediction. Static variables are used as a context for the variable selection of other variables and as the initial state of the LSTM encoders.
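The continuous-variable embedding and the variable-selection weighting described above can be sketched as follows. This is a deliberately simplified illustration: in TFT, the selection logits come from a gated residual network over all embeddings (optionally conditioned on the static context) and each embedding is further transformed before weighting, whereas here the logits are passed in directly. All names are hypothetical.

```python
import math


def embed_continuous(value, learned_vector):
    """One learned vector per continuous variable, scaled by the variable's value."""
    return [value * w for w in learned_vector]


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_variables(embeddings, selection_logits):
    """Weight each variable's embedding by softmax(logits) and sum the results."""
    weights = softmax(selection_logits)
    dim = len(embeddings[0])
    combined = [sum(w * emb[d] for w, emb in zip(weights, embeddings)) for d in range(dim)]
    return combined, weights


temperature_emb = embed_continuous(2.0, [0.5, -1.0])  # continuous input with value 2.0
weekday_emb = [0.0, 1.0]  # stands in for a looked-up categorical embedding
combined, weights = select_variables([temperature_emb, weekday_emb], [0.0, 0.0])
print(combined, weights)  # [0.5, -0.5] [0.5, 0.5]
```

The returned weights are the kind of VSN activations that can later be inspected to judge feature relevance.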
After encoding, variables are passed to multi-head attention layers (decoder), which produce the final prediction. The whole architecture is interwoven with residual connections with gating mechanisms that allow the architecture to adapt to various problems by skipping some parts of it.
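The gating idea can be sketched elementwise: a sigmoid gate scales a block's output before it is added back to the block's input, so a gate close to zero effectively skips the block. This is a simplification; the TFT paper applies a gated linear unit with learned projections followed by layer normalization, which are omitted here, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_residual(x, block_output, gate_logits):
    """Elementwise x + sigmoid(gate) * block_output."""
    return [xi + sigmoid(g) * hi for xi, hi, g in zip(x, block_output, gate_logits)]


# A strongly negative gate closes the first channel; a strongly positive one opens the second.
out = gated_residual([1.0, 2.0], [10.0, 10.0], [-50.0, 50.0])
print(out)  # approximately [1.0, 12.0]
```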
For the sake of explainability, the heads of the self-attention layers share value matrices. This allows interpreting self-attention as an ensemble of models predicting different temporal patterns over the same feature set. Another feature that helps us understand the model is the VSN activations, which tell us how relevant a given feature is to the prediction.
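Sharing the value matrix is what lets the heads be interpreted together: averaging the heads' outputs is then the same as applying the averaged attention weights to the shared values, so the averaged weights can be read as one importance pattern over time steps. The sketch below shows this for a single query in plain Python; the learned query, key, value, and output projections are omitted (queries and keys are given per head), and the names are hypothetical.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all time steps."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])


def interpretable_attention(queries_per_head, keys_per_head, shared_values):
    """Average the per-head weights, then apply them to the single shared value matrix."""
    per_head = [attention_weights(q, k) for q, k in zip(queries_per_head, keys_per_head)]
    steps = len(shared_values)
    avg = [sum(w[j] for w in per_head) / len(per_head) for j in range(steps)]
    dim = len(shared_values[0])
    output = [sum(avg[j] * shared_values[j][d] for j in range(steps)) for d in range(dim)]
    return output, avg


# Two heads attending to mirrored positions over two time steps.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
values = [[1.0, 2.0], [3.0, 4.0]]
output, avg_weights = interpretable_attention(queries, keys, values)
print(avg_weights, output)  # approximately [0.5, 0.5] [2.0, 3.0]
```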
![](TFT_architecture.PNG)
*image source: https://arxiv.org/abs/1912.09363*
### Default configuration
The specific configuration of the TFT model depends on the dataset used. Not only the size of the model but also the data sampling and preprocessing strategies are subject to change. During preprocessing, data is normalized per feature. For some of the datasets, we also apply per-time-series scaling, which takes into account shifts in distribution between entities (e.g., a factory consumes more electricity than an average house). The model is trained with the quantile loss <img src="https://render.githubusercontent.com/render/math?math=\Large\sum_{i=1}^N\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{Nt_{max}}"> for quantiles in [0.1, 0.5, 0.9]. The default configurations are tuned for distributed training on DGX-1-32G with mixed precision. We use dynamic loss scaling. Specific values are provided in the table below.
| Dataset | Training samples | Validation samples | Test samples | History length | Forecast horizon | Dropout | Hidden size | #Heads | BS | LR | Gradient clipping |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 450k | 50k | 53.5k | 168 | 24 | 0.1 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
| Traffic | 450k | 50k | 139.6k | 168 | 24 | 0.3 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
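The quantile loss used in these configurations can be sketched in NumPy as below. It mirrors the `QuantileLoss` module in `criterions.py` (weighted ReLUs of the prediction error, averaged over every axis except the quantile axis); the `(N, t_max, Q)` prediction shape and the toy numbers are assumptions for illustration. Summing the per-quantile means gives the triple sum divided by N * t_max.

```python
import numpy as np

QUANTILES = np.array([0.1, 0.5, 0.9])

def quantile_loss(predictions, targets, quantiles=QUANTILES):
    """predictions: (N, t_max, Q); targets: (N, t_max, 1). Returns mean loss per quantile, shape (Q,)."""
    diff = predictions - targets
    # Over-prediction is weighted by (1 - q), under-prediction by q.
    ql = (1 - quantiles) * np.maximum(diff, 0) + quantiles * np.maximum(-diff, 0)
    return ql.reshape(-1, ql.shape[-1]).mean(axis=0)

targets = np.array([[[1.0], [2.0]]])          # N = 1 series, t_max = 2 steps
predictions = np.array([[[0.0, 1.0, 2.0],     # P10, P50, P90 at t = 1
                         [2.0, 2.0, 2.0]]])   # P10, P50, P90 at t = 2
per_q = quantile_loss(predictions, targets)
total = per_q.sum()
```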
### Feature support matrix
The following features are supported by this model:
| Feature | Supported |
|----------------------------|-----------|
| Distributed data parallel | Yes |
| PyTorch AMP | Yes |
#### Features
[Automatic Mixed Precision](https://pytorch.org/docs/stable/amp.html)
provides an easy way to leverage Tensor Core performance. It allows the execution of parts of a network in lower precision. Refer to [Mixed precision training](#mixed-precision-training) for more information.
[PyTorch
DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) - a module
wrapper that enables easy multiprocess distributed data-parallel
training.
### Mixed precision training
Mixed precision is the combined use of different numerical precisions in a
computational method.
[Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant
computational speedup by performing operations in half-precision format while
storing minimal information in single-precision to retain as much information
as possible in critical parts of the network. Since the introduction of [Tensor Cores](https://developer.nvidia.com/tensor-cores) in Volta, and following with
both the Turing and Ampere architectures, significant training speedups are
experienced by switching to
mixed precision -- up to 3x overall speedup on the most arithmetically intense
model architectures. Using mixed precision training previously required two
steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Manually adding loss scaling to preserve small gradient values.
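Why step 2 is needed can be seen with plain NumPy: a gradient that is representable in FP32 can underflow to zero in FP16, while the same value multiplied by a loss-scale factor survives and can be divided back out in FP32. The factor 2**16 is an arbitrary choice for this sketch; dynamic loss scaling adjusts it during training.

```python
import numpy as np

grad = 1e-8                      # a small gradient value, fine in FP32
underflowed = np.float16(grad)   # below FP16's smallest subnormal (~6e-8), so it becomes 0.0

scale = 2.0 ** 16                # hypothetical loss-scale factor
# Scaling the loss scales every gradient by the same factor (chain rule).
scaled = np.float16(grad * scale)                   # ~6.55e-4, representable in FP16
recovered = np.float32(scaled) / np.float32(scale)  # unscale in FP32 before the weight update
```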
The ability to train deep learning networks with lower precision was introduced
in the Pascal architecture and first supported in [CUDA
8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep
Learning SDK.
For information about:
* How to train using mixed precision, refer to the [Mixed Precision
Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed
Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html)
documentation.
* Techniques used for mixed precision training, refer to the [Mixed-Precision
Training of Deep Neural
Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/)
blog.
* APEX tools for mixed precision training, refer to the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/).
#### Enabling mixed precision
Mixed precision is enabled in PyTorch by using the Automatic Mixed Precision module, `torch.cuda.amp`, which casts variables to half precision upon retrieval while storing variables in single-precision format. Furthermore, to preserve small gradient magnitudes in backpropagation, a [loss scaling](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#lossscaling) step must be included when applying gradients. In PyTorch, loss scaling can be applied automatically by the `GradScaler` class. All the necessary steps to implement AMP are described in detail in the [PyTorch AMP examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples).
To enable mixed precision for TFT, simply add the `--use_amp` option to the training script.
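For reference, a typical `torch.cuda.amp` training step looks like the sketch below. It is not taken from `train.py`: `nn.Linear`, `mse_loss`, and the `clip_grad` value are stand-ins, and the sketch falls back to plain FP32 on machines without CUDA. The `unscale_` call shows how gradient clipping (compare the `--clip_grad` option) is usually combined with `GradScaler`, so that the threshold applies to the real gradients rather than the scaled ones.

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()      # autocast/GradScaler here target CUDA devices
device = 'cuda' if use_amp else 'cpu'

model = nn.Linear(16, 3).to(device)      # stand-in for the TFT model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
clip_grad = 1.0                          # hypothetical value; 0.0 disables clipping

def train_step(x, y):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):   # eligible ops run in FP16
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()        # backpropagate the scaled loss
    if clip_grad > 0:
        scaler.unscale_(optimizer)       # clip the true (unscaled) gradients
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_grad)
    scaler.step(optimizer)               # skips the update if gradients contain inf/NaN
    scaler.update()                      # grows or shrinks the scale dynamically
    return loss.item()

x = torch.randn(32, 16, device=device)
y = torch.randn(32, 3, device=device)
loss = train_step(x, y)
```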
#### Enabling TF32
TensorFloat-32 (TF32) is the new math mode in [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs.
TF32 Tensor Cores can speed up networks using FP32, typically with no loss of accuracy. It is more robust than FP16 for models which require high dynamic range for weights or activations.
For more information, refer to the [TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) blog post.
TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
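TF32 is controlled in PyTorch by two global flags, shown below. In the PyTorch version shipped in the 21.06 container both default to `True`; starting with PyTorch 1.12, the matmul flag defaults to `False`, so on newer releases it may need to be set explicitly. `ENABLE_TF32` is just a local variable for this sketch.

```python
import torch

ENABLE_TF32 = True   # set to False to force full FP32 math, e.g. for an accuracy comparison

torch.backends.cuda.matmul.allow_tf32 = ENABLE_TF32   # matrix multiplications
torch.backends.cudnn.allow_tf32 = ENABLE_TF32         # cuDNN convolutions
```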
### Glossary
**Multi horizon prediction**
Process of estimating values of a time series for multiple future time steps.
**Quantiles**
Cut points dividing the range of a probability distribution into intervals with equal probabilities.
**Time series**
Series of data points indexed and equally spaced in time.
**Transformer**
The paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduces a novel architecture called Transformer that uses an attention mechanism and transforms one sequence into another.
## Setup
The following section lists the requirements that you need to meet in order to start training the TFT model.
### Requirements
This repository contains a Dockerfile that extends the PyTorch NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
- [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
- Supported GPUs:
- [NVIDIA Volta architecture](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
- [NVIDIA Turing architecture](https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/)
- [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
- Running [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running)
For those unable to use the PyTorch NGC container, to set up the required environment or create your own container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).
## Quick Start Guide
To train your model using mixed or TF32 precision with Tensor Cores, perform the following steps using the default parameters of the TFT model on any of the benchmark datasets. For the specifics concerning training and inference, refer to the [Advanced](#advanced) section.
1. Clone the repository.
```bash
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Forecasting/TFT
```
2. Build the TFT PyTorch NGC container.
```bash
docker build --network=host -t tft .
```
3. Start an interactive session in the NGC container to run training/inference.
```bash
docker run -it --rm --ipc=host --network=host --gpus all -v /path/to/your/data:/data/ tft
```
Note: Make sure to mount your dataset using the `-v` flag to make it available for training inside the NVIDIA Docker container.
4. Download and preprocess datasets.
```bash
bash scripts/get_data.sh
```
5. Start training. Choose one of the scripts provided in the `scripts/` directory. Results are stored in the `/results` directory.
These scripts are tuned for DGX-1-32G. If you have a different system, use the `NGPU` and `BATCH_SIZE` variables to adjust the parameters for your system.
```bash
bash scripts/run_electricity.sh
bash scripts/run_traffic.sh
```
6. Start validation/evaluation. The metric we use for evaluation is q-risk. We can compare it per-quantile in the Pareto sense or jointly as one number indicating accuracy.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin
```
7. Start inference/predictions. Visualize and save predictions by running the following command.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin \
--visualize \
--save_predictions
```
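The two ways of comparing q-risk mentioned in step 6 can be sketched in plain Python. Reading the three numbers as P10 / P50 / P90 q-risk, and using their sum as the joint score, are assumptions of this sketch; the sample values are the Electricity 8-GPU row of the DGX A100 training accuracy table below.

```python
def pareto_dominates(a, b):
    """True if per-quantile q-risks `a` are no worse than `b` everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def joint_q_risk(q_risks):
    """Collapse per-quantile q-risks into one number (assumed here: their sum)."""
    return sum(q_risks)

# Electricity, 8 GPUs, DGX A100: TF32 vs. mixed precision
tf32 = (0.027, 0.056, 0.028)
amp = (0.026, 0.054, 0.029)
```

Here neither run Pareto-dominates the other (mixed precision is better at two quantiles but worse at the third), so a joint score is needed to pick one.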
Now that you have your model trained and evaluated, you can choose to compare your training results with our [Training accuracy results](#training-accuracy-results). You can also choose to benchmark your performance against the [Training performance benchmark](#training-performance-results). Following the steps in these sections should allow you to achieve the same accuracy and performance results as stated in the [Results](#results) section.
## Advanced
The following sections provide more details about the dataset, running training and inference, and the training results.
### Scripts and sample code
In the root directory, the most important files are:
- `train.py`: Entry point for training
- `data_utils.py`: File containing the dataset implementation and preprocessing functions
- `modeling.py`: Definition of the model
- `configuration.py`: Contains configuration classes for various experiments
- `test.py`: Entry point for testing a trained model
- `Dockerfile`: Container definition
- `log_helper.py`: Contains helper functions for setting up dllogger
- `criterions.py`: Definitions of loss functions

The `scripts` directory contains scripts for default use cases:
- `run_electricity.sh`: Trains the default model on the electricity dataset
- `run_traffic.sh`: Trains the default model on the traffic dataset
### Command-line options
To view the full list of available options and their descriptions, use the `-h` or `--help` command-line option, for example:
`python train.py --help`.
The following example output is printed when running the model:
```
usage: train.py [-h] --data_path DATA_PATH --dataset {electricity,volatility,traffic,favorita} [--epochs EPOCHS] [--sample_data SAMPLE_DATA SAMPLE_DATA] [--batch_size BATCH_SIZE] [--lr LR] [--seed SEED] [--use_amp] [--clip_grad CLIP_GRAD]
[--early_stopping EARLY_STOPPING] [--results RESULTS] [--log_file LOG_FILE] [--distributed_world_size N] [--distributed_rank DISTRIBUTED_RANK] [--local_rank LOCAL_RANK] [--overwrite_config OVERWRITE_CONFIG]
optional arguments:
-h, --help show this help message and exit
--data_path DATA_PATH
--dataset {electricity,volatility,traffic,favorita}
--epochs EPOCHS
--sample_data SAMPLE_DATA SAMPLE_DATA
--batch_size BATCH_SIZE
--lr LR
--seed SEED
--use_amp Enable automatic mixed precision
--clip_grad CLIP_GRAD
--early_stopping EARLY_STOPPING
Stop training if validation loss does not improve for more than this number of epochs.
--results RESULTS
--log_file LOG_FILE
--distributed_world_size N
total number of GPUs across all nodes (default: all visible GPUs)
--distributed_rank DISTRIBUTED_RANK
rank of the current worker
--local_rank LOCAL_RANK
rank of the current worker
--overwrite_config OVERWRITE_CONFIG
JSON string used to overload config
```
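`--overwrite_config` takes a JSON string. How `train.py` applies it is not shown in this README; the sketch below assumes the simplest approach, setting matching attributes on the config object. `DemoConfig` and `overwrite_config` are hypothetical names.

```python
import json

class DemoConfig:
    """Hypothetical stand-in for a config class such as those in configuration.py."""
    def __init__(self):
        self.hidden_size = 128
        self.dropout = 0.1
        self.n_head = 4

def overwrite_config(config, json_string):
    """Hypothetical helper: apply {"field": value} overrides from a JSON string."""
    for key, value in json.loads(json_string).items():
        if not hasattr(config, key):
            raise KeyError(f"unknown config field: {key}")
        setattr(config, key, value)
    return config

cfg = overwrite_config(DemoConfig(), '{"dropout": 0.3, "hidden_size": 160}')
```

On the command line, such an override would be passed as, for example, `--overwrite_config '{"dropout": 0.3}'`. One caveat of this approach: derived fields that the config classes compute in `__init__` (such as `num_historic_vars`) are not recomputed when a field they depend on is overridden.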
### Getting the data
The TFT model was trained on the electricity and traffic benchmark datasets. This repository contains the `get_data.sh` download script, which, for the electricity and traffic datasets, automatically downloads and preprocesses the training, validation, and test datasets, and produces files that contain scalers.
#### Dataset guidelines
The `data_utils.py` file contains all functions that are used to preprocess the data. Initially, the data is loaded into a `pandas.DataFrame` and parsed into the common format, which contains the features we will use for training. The standardized data is then cleaned, normalized, encoded, and binarized.
This step does the following:
- Drop all the columns that are not marked in the configuration file as used for training or preprocessing
- Flatten indices in case time series are indexed by more than one column
- Split the data into training, validation, and test splits
- Filter out all the time series shorter than the minimal example length
- Normalize columns marked as continuous in the configuration file
- Encode as integers columns marked as categorical
- Save the data in CSV and binary formats
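Several of the steps above (filtering short series, normalizing continuous columns, integer-encoding categoricals) can be sketched with pandas as below. This is a hypothetical helper, not the code in `data_utils.py`: for brevity it fits the normalization on the whole frame, whereas a real pipeline should fit scalers on the training split only and reuse them for validation, test, and inference. The `scale_per_id` flag mirrors the option of the same name in `configuration.py`.

```python
import pandas as pd

def preprocess(df, id_col, cont_cols, cat_cols, min_length, scale_per_id=False):
    """Hypothetical sketch of the steps above; not the data_utils.py implementation."""
    # Filter out time series shorter than the minimal example length.
    lengths = df[id_col].map(df[id_col].value_counts())
    df = df[lengths >= min_length].copy()

    # Normalize continuous columns, either globally or per time series.
    if scale_per_id:
        grouped = df.groupby(id_col)[cont_cols]
        mean, std = grouped.transform('mean'), grouped.transform('std')
    else:
        mean, std = df[cont_cols].mean(), df[cont_cols].std()
    df[cont_cols] = (df[cont_cols] - mean) / std.replace(0, 1)  # avoid dividing by zero

    # Encode categorical columns as integers.
    for col in cat_cols:
        df[col] = pd.factorize(df[col])[0]
    return df

df = pd.DataFrame({
    'id': ['a'] * 4 + ['b'] * 2,
    'power_usage': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
    'categorical_id': ['a'] * 4 + ['b'] * 2,
})
out = preprocess(df, 'id', ['power_usage'], ['categorical_id'], min_length=3, scale_per_id=True)
```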
#### Multi-dataset
To use an alternate dataset, you have to write a function that parses your data into a common format. The format is as follows:
- There is at least one id column
- There is exactly one time column (that can also be used as a feature column)
- Each feature is in a separate column
- Each row represents a moment in time for only one time series

Additionally, you must specify a configuration of the network, including a data description. Refer to the example in the `configuration.py` file.
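As an example of such a parsing function: many raw datasets store one column per time series. The hypothetical helper below converts that wide layout into the long format described above with `DataFrame.melt`; the column names (`hours_from_start`, `MT_001`, `values`) are illustrative.

```python
import pandas as pd

def parse_wide_to_common(wide, time_col='hours_from_start', value_name='values'):
    """Hypothetical parser: one column per series -> one row per (id, time step)."""
    long_df = wide.melt(id_vars=[time_col], var_name='id', value_name=value_name)
    return long_df.sort_values(['id', time_col]).reset_index(drop=True)

wide = pd.DataFrame({
    'hours_from_start': [0, 1, 2],
    'MT_001': [1.0, 2.0, 3.0],
    'MT_002': [4.0, 5.0, 6.0],
})
common = parse_wide_to_common(wide)
```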
### Training process
The `train.py` script is an entry point for a training procedure. Refined recipes can be found in the `scripts` directory.
The model trains for at most `--epochs` epochs. If the `--early_stopping N` option is set, training ends once the validation loss has not improved for N consecutive epochs.
The details of the architecture and the dataset configuration are encapsulated by the `--dataset` option. This option chooses one of the configurations stored in the `configuration.py` file. You can enable mixed precision training by providing the `--use_amp` option. The training script supports multi-GPU training with the APEX package. To enable distributed training, prepend the training command with `python -m torch.distributed.launch --nproc_per_node=${NGPU}`.
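The early-stopping rule can be sketched in plain Python as below. Precomputed validation losses stand in for actual training epochs, and whether `train.py` counts exactly N or N + 1 non-improving epochs before stopping (the help text says "more than this number of epochs") is not confirmed here.

```python
def run_with_early_stopping(val_losses, epochs, early_stopping=None):
    """Return (best_epoch, last_epoch_run) for a sequence of per-epoch validation losses."""
    best_loss, best_epoch, epochs_without_improvement = float('inf'), -1, 0
    last_epoch = -1
    for epoch in range(min(epochs, len(val_losses))):
        last_epoch = epoch
        loss = val_losses[epoch]                 # stands in for one training + validation epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # the best checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if early_stopping is not None and epochs_without_improvement >= early_stopping:
                break
    return best_epoch, last_epoch

losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
stopped = run_with_early_stopping(losses, epochs=6, early_stopping=3)
```

With `early_stopping=3`, training stops after epoch 4 (0-based), so the late improvement at epoch 5 is never seen; the best checkpoint is the one from epoch 1.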
Example command:
```
python -m torch.distributed.launch --nproc_per_node=8 train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=1024 \
--sample 450000 50000 \
--lr 1e-3 \
--epochs 25 \
--early_stopping 5 \
--seed 1 \
--use_amp \
--results /results/TFT_electricity_bs8x1024_lr1e-3/seed_1
```
The model is trained by optimizing the quantile loss <img src="https://render.githubusercontent.com/render/math?math=\Large\sum_{i=1}^N\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{Nt_{max}}">. After training, the checkpoint with the lowest validation loss is evaluated on the test split with the q-risk metric <img src="https://render.githubusercontent.com/render/math?math=\Large\frac{2\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}QL(y_t,\hat{y}(q,t),q)}{\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}|y_t|}">.
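A NumPy sketch of the q-risk for a single quantile q, following the formula above: the pinball loss is summed over all series and time steps, doubled, and normalized by the sum of absolute target values. The array shapes are an assumption; any two arrays of matching shape work.

```python
import numpy as np

def q_risk(y, y_hat, q):
    """y, y_hat: arrays of targets and q-quantile predictions with the same shape."""
    diff = y - y_hat
    ql = np.maximum(q * diff, (q - 1) * diff)   # pinball (quantile) loss per element
    return 2 * ql.sum() / np.abs(y).sum()

y = np.array([1.0, 2.0])
y_hat = np.array([0.0, 2.0])
p50 = q_risk(y, y_hat, 0.5)
p90 = q_risk(y, y_hat, 0.9)
```

For q = 0.5 this reduces to the sum of absolute errors divided by the sum of absolute targets.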
Results are by default stored in the `/results` directory. This can be changed by providing the `--results` option. At the end of the training, the results directory will contain the trained checkpoint which had the lowest validation loss, dllogger logs (in dictionary per line format), and TensorBoard logs.
### Inference process
Inference can be run by launching the `inference.py` script. The script requires a trained checkpoint to run. It is crucial to prepare the data in the same way as training data prior to running the inference. Example command:
```
python inference.py \
--checkpoint /results/checkpoint.pt \
--data /data/processed/electricity_bin/test.csv \
--tgt_scalers /data/processed/electricity_bin/tgt_scalers.bin \
--cat_encodings /data/processed/electricity_bin/cat_encodings.bin \
--batch_size 2048 \
--visualize \
--save_predictions \
--joint_visualization \
--results /results \
--use_amp
```
In the default setting, the script evaluates the model on the specified dataset and prints the q-risk evaluated on this dataset. To save the predictions, use the `--save_predictions` option; predictions are stored in CSV format in the directory specified by the `--results` option. The `--joint_visualization` option plots graphs in TensorBoard format, which lets you inspect the results and compare them to true values. With `--visualize`, you can save plots for each example in a separate file.
## Performance
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
#### Training performance benchmark
In order to run training benchmarks, use the `scripts/benchmark.sh` script.
#### Inference performance benchmark
To benchmark the inference performance on a specific batch size and dataset, run the `inference.py` script.
### Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
#### Training accuracy results
We conducted an extensive hyperparameter search along with stability tests. The presented results are averages over hundreds of runs.
##### Training accuracy: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-------
| Electricity | 1 | 1024 | 0.027 / 0.059 / 0.029 | 0.028 / 0.058 / 0.029 | 1427s | 1087s | 1.313x
| Electricity | 8 | 1024 | 0.027 / 0.056 / 0.028 | 0.026 / 0.054 / 0.029 | 216s | 176s | 1.227x
| Traffic | 1 | 1024 | 0.040 / 0.103 / 0.075 | 0.040 / 0.103 / 0.075 | 957s | 726s | 1.318x
| Traffic | 8 | 1024 | 0.042 / 0.104 / 0.076 | 0.042 / 0.106 / 0.077 | 151s | 126s | 1.198x
##### Training accuracy: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-----------
| Electricity | 1 | 1024 | 0.027 / 0.056 / 0.028 | 0.027 / 0.058 / 0.029 | 2559s | 1598s | 1.601x
| Electricity | 8 | 1024 | 0.027 / 0.055 / 0.028 | 0.027 / 0.055 / 0.029 | 381s | 261s | 1.460x
| Traffic | 1 | 1024 | 0.040 / 0.102 / 0.075 | 0.041 / 0.101 / 0.074 | 1718s | 1062s | 1.618x
| Traffic | 8 | 1024 | 0.042 / 0.106 / 0.076 | 0.042 / 0.105 / 0.077 | 256s | 176s | 1.455x
##### Training stability test
To get a broader picture of the model's accuracy, we performed a hyperparameter search along with stability tests on 100 random seeds for each configuration. Then, for each benchmark dataset, we chose the architecture with the lowest mean test q-risk. The table below summarizes the best configurations.
| Dataset | #GPU | Hidden size | #Heads | Local BS | LR | Gradient clipping | Dropout | Mean q-risk | Std q-risk | Min q-risk | Max q-risk
|-------------|------|-------------|--------|----------|------|-------------------|---------|-------------|------------| -----------|------
| Electricity | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.1 | 0.1131 | 0.0025 | 0.1080 | 0.1200
| Traffic | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.3 | 0.2180 | 0.0049 | 0.2069 | 0.2336
#### Training performance results
##### Training performance: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 (A100 80GB) GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 - mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision
|-------------|---|------|--------|--------|-------|-------|-----
| Electricity | 1 | 1024 | 10173 | 13703 | 1.35x | 1 | 1
| Electricity | 8 | 1024 | 80596 | 107761 | 1.34x | 7.92x | 7.86x
| Traffic | 1 | 1024 | 10197 | 13779 | 1.35x | 1 | 1
| Traffic | 8 | 1024 | 80692 | 107979 | 1.34x | 7.91x | 7.84x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
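The derived columns in the table above are consistent with simple throughput ratios, as this check on the Electricity rows shows (values copied from the table; rounding to two decimals reproduces the reported 1.35x, 7.92x, and 7.86x).

```python
# Electricity throughput on DGX A100 (items/s), keyed by number of GPUs
tf32 = {1: 10173, 8: 80596}
amp = {1: 13703, 8: 107761}

speedup_1gpu = amp[1] / tf32[1]         # mixed precision vs. TF32 on one GPU
weak_scaling_tf32 = tf32[8] / tf32[1]   # 8-GPU throughput relative to 1 GPU
weak_scaling_amp = amp[8] / amp[1]
```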
##### Training performance: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision
|-------------|---|------|-------|-------|-------|------|----
| Electricity | 1 | 1024 | 5580 | 9148 | 1.64x | 1 | 1
| Electricity | 8 | 1024 | 43351 | 69855 | 1.61x | 7.77x | 7.64x
| Traffic | 1 | 1024 | 5593 | 9194 | 1.64x | 1 | 1
| Traffic | 8 | 1024 | 43426 | 69983 | 1.61x | 7.76x | 7.61x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
## Release notes
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to https://developer.nvidia.com/deep-learning-performance-training-inference.
### Changelog
October 2021
- Initial release
### Known issues
There are no known issues with this model.

@ -0,0 +1,128 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from data_utils import InputTypes, DataTypes, FeatureSpec
import datetime


class ElectricityConfig():
    def __init__(self):

        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('power_usage', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('hour', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'days_from_start' # This column contains time indices across which we split the data
        self.train_range = (1096, 1315)
        self.valid_range = (1308, 1339)
        self.test_range = (1332, 1346)
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = True
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [369]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.1
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])


class TrafficConfig():
    def __init__(self):

        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('values', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('time_on_day', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'sensor_day' # This column contains time indices across which we split the data
        self.train_range = (0, 151)
        self.valid_range = (144, 166)
        self.test_range = (159, float('inf'))
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = False
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [963]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.3
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])


CONFIGS = {'electricity': ElectricityConfig,
           'traffic': TrafficConfig,
           }

@ -0,0 +1,28 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantileLoss(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.register_buffer('q', torch.tensor(config.quantiles))

    def forward(self, predictions, targets):
        diff = predictions - targets
        # Pinball loss: over-prediction weighted by (1 - q), under-prediction by q
        ql = (1-self.q)*F.relu(diff) + self.q*F.relu(-diff)
        # Average over every dimension except the last (quantile) one -> one loss per quantile
        losses = ql.view(-1, ql.shape[-1]).mean(0)
        return losses

@ -0,0 +1,790 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################
# Copyright 2021 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import pickle
import enum
import datetime
from collections import namedtuple, OrderedDict
import sklearn.preprocessing
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
from bisect import bisect
import torch
from torch.utils.data import Dataset,IterableDataset,DataLoader


class DataTypes(enum.IntEnum):
    """Defines numerical types of each column."""
    CONTINUOUS = 0
    CATEGORICAL = 1
    DATE = 2
    STR = 3


class InputTypes(enum.IntEnum):
    """Defines input types of each column."""
    TARGET = 0
    OBSERVED = 1
    KNOWN = 2
    STATIC = 3
    ID = 4  # Single column used as an entity identifier
    TIME = 5  # Single column exclusively used as a time index


FeatureSpec = namedtuple('FeatureSpec', ['name', 'feature_type', 'feature_embed_type'])
DTYPE_MAP = {
    DataTypes.CONTINUOUS : np.float32,
    DataTypes.CATEGORICAL : np.int64,
    DataTypes.DATE:'datetime64[ns]',
    DataTypes.STR: str
}

FEAT_ORDER = [
    (InputTypes.STATIC, DataTypes.CATEGORICAL),
    (InputTypes.STATIC, DataTypes.CONTINUOUS),
    (InputTypes.KNOWN, DataTypes.CATEGORICAL),
    (InputTypes.KNOWN, DataTypes.CONTINUOUS),
    (InputTypes.OBSERVED, DataTypes.CATEGORICAL),
    (InputTypes.OBSERVED, DataTypes.CONTINUOUS),
    (InputTypes.TARGET, DataTypes.CONTINUOUS),
    (InputTypes.ID, DataTypes.CATEGORICAL)
]
FEAT_NAMES = ['s_cat' , 's_cont' , 'k_cat' , 'k_cont' , 'o_cat' , 'o_cont' , 'target', 'id']
DEFAULT_ID_COL = 'id'
class TFTBinaryDataset(Dataset):
def __init__(self, path, config):
super(TFTBinaryDataset).__init__()
self.features = [x for x in config.features if x.feature_embed_type != DataTypes.DATE]
self.example_length = config.example_length
self.stride = config.dataset_stride
self.grouped = pickle.load(open(path, 'rb'))
self.grouped = [x for x in self.grouped if x.shape[0] >= self.example_length]
self._cum_examples_in_group = np.cumsum([(g.shape[0] - self.example_length + 1)//self.stride for g in self.grouped])
self.feature_type_col_map = [[i for i,f in enumerate(self.features) if (f.feature_type, f.feature_embed_type) == x] for x in FEAT_ORDER]
# The list comprehension below rearranges the columns into FEAT_ORDER and simultaneously
# casts each feature group to its proper dtype. It could probably be written more neatly.
self.grouped = [
[
arr[:, idxs].view(dtype=np.float32).astype(DTYPE_MAP[t[1]])
for t, idxs in zip(FEAT_ORDER, self.feature_type_col_map)
]
for arr in self.grouped
]
def __len__(self):
return self._cum_examples_in_group[-1] if len(self._cum_examples_in_group) else 0
def __getitem__(self, idx):
g_idx = bisect(self._cum_examples_in_group, idx)
e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx
group = self.grouped[g_idx]
tensors = [
torch.from_numpy(feat[e_idx * self.stride:e_idx*self.stride + self.example_length])
if feat.size else torch.empty(0)
for feat in group
]
return OrderedDict(zip(FEAT_NAMES, tensors))
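# Illustrative sketch (hypothetical helper, not called anywhere in this module): the
# flat-index lookup used by TFTBinaryDataset.__getitem__. It assumes per-group example
# counts follow the same formula as `_cum_examples_in_group`. `bisect` returns the first
# group whose cumulative count exceeds `idx`, and the remainder is the window number
# inside that group.
def _sketch_locate_example(group_lengths, example_length, stride, idx):
    from bisect import bisect
    import numpy as np
    counts = [(n - example_length + 1) // stride for n in group_lengths]
    cum = np.cumsum(counts)
    g_idx = bisect(cum, idx)
    e_idx = idx - cum[g_idx - 1] if g_idx else idx
    start = e_idx * stride
    # (group index, first row, one-past-last row) of the requested window
    return int(g_idx), int(start), int(start + example_length)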
class TFTDataset(Dataset):
def __init__(self, path, config):
super().__init__()
self.features = config.features
self.data = pd.read_csv(path, index_col=0)
self.example_length = config.example_length
self.stride = config.dataset_stride
# The `name` field is a column name.
# There can be multiple entries with the same name because one column can be interpreted in several ways.
time_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.TIME)
id_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.ID)
if id_col_name not in self.data.columns:
id_col_name = DEFAULT_ID_COL
self.features = [x for x in self.features if x.feature_type!=InputTypes.ID]
self.features.append(FeatureSpec(DEFAULT_ID_COL, InputTypes.ID, DataTypes.CATEGORICAL))
col_dtypes = {v.name:DTYPE_MAP[v.feature_embed_type] for v in self.features}
self.data.sort_values(time_col_name,inplace=True)
self.data = self.data[list(set(x.name for x in self.features))]  # keep only relevant columns
self.data = self.data.astype(col_dtypes)
self.data = self.data.groupby(id_col_name).filter(lambda group: len(group) >= self.example_length)
self.grouped = list(self.data.groupby(id_col_name))
self._cum_examples_in_group = np.cumsum([(len(g[1]) - self.example_length + 1)//self.stride for g in self.grouped])
def __len__(self):
return self._cum_examples_in_group[-1] if len(self._cum_examples_in_group) else 0
def __getitem__(self, idx):
g_idx = bisect(self._cum_examples_in_group, idx)
e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx
group = self.grouped[g_idx][1]
sliced = group.iloc[e_idx * self.stride:e_idx*self.stride + self.example_length]
# We need to be sure that tensors are returned in the correct order
tensors = tuple([] for _ in range(8))
for v in self.features:
if v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[0].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[1].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[2].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[3].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[4].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[5].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.TARGET:
tensors[6].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.ID:
tensors[7].append(torch.from_numpy(sliced[v.name].to_numpy()))
tensors = [torch.stack(x, dim=-1) if x else torch.empty(0) for x in tensors]
return OrderedDict(zip(FEAT_NAMES, tensors))
def get_dataset_splits(df, config):
if hasattr(config, 'relative_split') and config.relative_split:
forecast_len = config.example_length - config.encoder_length
# The valid split is shifted from the train split by the number of forecast steps into the future.
# The test split is shifted by the same number of forecast steps from the valid split.
train = []
valid = []
test = []
for _, group in df.groupby(DEFAULT_ID_COL):
index = group[config.time_ids]
_train = group.loc[index < config.valid_boundary]
_valid = group.iloc[(len(_train) - config.encoder_length):(len(_train) + forecast_len)]
_test = group.iloc[(len(_train) - config.encoder_length + forecast_len):(len(_train) + 2*forecast_len)]
train.append(_train)
valid.append(_valid)
test.append(_test)
train = pd.concat(train, axis=0)
valid = pd.concat(valid, axis=0)
test = pd.concat(test, axis=0)
else:
index = df[config.time_ids]
train = df.loc[(index >= config.train_range[0]) & (index < config.train_range[1])]
valid = df.loc[(index >= config.valid_range[0]) & (index < config.valid_range[1])]
test = df.loc[(index >= config.test_range[0]) & (index < config.test_range[1])]
return train, valid, test
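# Illustrative sketch (hypothetical helper): the nominal iloc bounds used by the relative
# split above for a single series. Assuming `n_train` rows fall before `valid_boundary`,
# the valid window starts `encoder_length` rows before the end of train and the test
# window is shifted forward by one forecast horizon, so each window spans exactly
# `example_length` rows (fewer if the series is shorter).
def _sketch_relative_split_bounds(n_train, encoder_length, example_length):
    forecast_len = example_length - encoder_length
    valid = (n_train - encoder_length, n_train + forecast_len)
    test = (n_train - encoder_length + forecast_len, n_train + 2 * forecast_len)
    return valid, test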
def flatten_ids(df, config):
if config.missing_id_strategy == 'drop':
if hasattr(config, 'combine_ids') and config.combine_ids:
index = np.logical_or.reduce([df[c].isna() for c in config.combine_ids])
else:
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
index = df[id_col].isna()
# Drop rows whose id (or any of the combined id columns) is missing
df.drop(df.index[np.asarray(index, dtype=bool)], inplace=True)
if not (hasattr(config, 'combine_ids') and config.combine_ids):
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
ids = df[id_col].apply(str)
df.drop(id_col, axis=1, inplace=True)
encoder = sklearn.preprocessing.LabelEncoder().fit(ids.values)
df[DEFAULT_ID_COL] = encoder.transform(ids)
encoders = OrderedDict({id_col: encoder})
else:
encoders = {c:sklearn.preprocessing.LabelEncoder().fit(df[c].values) for c in config.combine_ids}
encoders = OrderedDict(encoders)
lens = [len(v.classes_) for v in encoders.values()]
clens = np.roll(np.cumprod(lens), 1)
clens[0] = 1
# Encode the combined id as a mixed-radix number: each column's label code is weighted by the
# product of the class counts of the preceding columns. Transforming whole columns avoids a slow row-wise apply.
df[DEFAULT_ID_COL] = sum(encoders[c].transform(df[c].values) * clens[i] for i, c in enumerate(encoders.keys()))
df.drop(config.combine_ids, axis=1, inplace=True)
return DEFAULT_ID_COL, encoders
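# Illustrative sketch (hypothetical helper): the mixed-radix scheme used above to merge
# several label-encoded id columns into one integer. Column i is weighted by the product
# of the class counts of columns 0..i-1, so every combination of codes maps to a distinct
# integer in [0, prod(lens)).
def _sketch_combine_codes(codes, lens):
    import numpy as np
    clens = np.roll(np.cumprod(lens), 1)
    clens[0] = 1
    return sum(np.asarray(c) * m for c, m in zip(codes, clens))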
def impute(df, config):
# XXX This ensures that our scaling will have the same mean. We still need to check the variance
if not hasattr(config, 'missing_data_label'):
return df, None
else:
imp = SimpleImputer(missing_values=config.missing_data_label, strategy='mean')
mask = df == config.missing_data_label
data = df.values
col_mask = (data == config.missing_data_label).all(axis=0)
data[:,~col_mask] = imp.fit_transform(data)
return data, mask
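# Illustrative sketch (hypothetical helper, NumPy only): what `impute` does with
# SimpleImputer(strategy='mean') when `missing_data_label` is set. Each entry equal to the
# label is replaced by the mean of the remaining entries in its column. Columns made up
# entirely of the label are left untouched, matching the `col_mask` handling above.
def _sketch_mean_impute(data, missing_label):
    import numpy as np
    data = np.array(data, dtype=float)  # work on a float copy
    missing = data == missing_label
    for j in range(data.shape[1]):
        col_missing = missing[:, j]
        if col_missing.all():
            continue
        data[col_missing, j] = data[~col_missing, j].mean()
    return data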
def normalize_reals(train, valid, test, config, id_col=DEFAULT_ID_COL):
tgt_cols = [x.name for x in config.features if x.feature_type == InputTypes.TARGET]
real_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CONTINUOUS).difference(set(tgt_cols)))
real_scalers = {}
tgt_scalers = {}
def apply_scalers(df, name=None):
if name is None:
name = df.name
mask = (df[real_cols] == config.missing_data_label) if hasattr(config, 'missing_data_label') else None
df[real_cols] = real_scalers[name].transform(df[real_cols])
if mask is not None and mask.values.any():
df[real_cols] = df[real_cols].mask(mask, 10**9)
df[tgt_cols] = tgt_scalers[name].transform(df[tgt_cols])
return df
if config.scale_per_id:
for identifier, sliced in train.groupby(id_col):
data = sliced[real_cols]
data, _ = impute(data, config)
real_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(data)
# XXX We should probably remove examples that contain NaN as a target
target = sliced[tgt_cols]
tgt_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(target)
train = train.groupby(id_col).apply(apply_scalers)
# For valid and testing leave only timeseries previously present in train subset
# XXX for proper data science we should consider encoding unseen timeseries as a special case, not throwing them away
valid = valid.loc[valid[id_col].isin(real_scalers.keys())]
valid = valid.groupby(id_col).apply(apply_scalers)
test = test.loc[test[id_col].isin(real_scalers.keys())]
test = test.groupby(id_col).apply(apply_scalers)
else:
data, _ = impute(train[real_cols], config)
real_scalers[''] = sklearn.preprocessing.StandardScaler().fit(data)
tgt_scalers[''] = sklearn.preprocessing.StandardScaler().fit(train[tgt_cols])
train = apply_scalers(train, name='')
valid = apply_scalers(valid, name='')
test = apply_scalers(test, name='')
return train, valid, test, real_scalers, tgt_scalers
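# Illustrative sketch (hypothetical helper, pandas only): the arithmetic of per-id
# standardization as in `normalize_reals` with `scale_per_id`. Like sklearn's
# StandardScaler, it uses the population standard deviation (ddof=0) and treats a zero
# deviation as 1. Unlike the code above, it computes statistics on the same frame it
# transforms; the real pipeline fits on train and reuses those scalers for valid/test.
def _sketch_scale_per_id(df, id_col, cols):
    grouped = df.groupby(id_col)[cols]
    mean = grouped.transform('mean')
    centered = df[cols] - mean
    std = (centered ** 2).groupby(df[id_col]).transform('mean') ** 0.5
    std = std.where(std > 0, 1.0)
    out = df.copy()
    out[cols] = centered / std
    return out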
def encode_categoricals(train, valid, test, config):
cat_encodings = {}
cat_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CATEGORICAL and v.feature_type != InputTypes.ID))
num_classes = [] # XXX Maybe we should modify config based on this value? Or send a warning?
# For Tensor Core performance reasons we might want num_classes[i] to be divisible by 8
# Train categorical encoders
for c in cat_cols:
if config.missing_cat_data_strategy == 'special_token':
# XXX this will probably require some data augmentation
unique = train[c].unique()
valid.loc[~valid[c].isin(unique), c] = '<UNK>'
test.loc[~test[c].isin(unique), c] = '<UNK>'
if config.missing_cat_data_strategy == 'encode_all' or \
config.missing_cat_data_strategy == 'special_token':
srs = pd.concat([train[c], valid[c], test[c]]).apply(str)
cat_encodings[c] = sklearn.preprocessing.LabelEncoder().fit(srs.values)
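elif config.missing_cat_data_strategy == 'drop':
# TODO: implement this. In addition to dropping rows, this has to split specific time series into chunks
# to prevent the data from having temporal gaps.
raise NotImplementedError("missing_cat_data_strategy='drop' is not implemented yet")
else:
raise ValueError('Unknown missing_cat_data_strategy: {}'.format(config.missing_cat_data_strategy))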
num_classes.append(srs.nunique())
print('Categorical variables encodings lens: ', num_classes)
for split in [train, valid, test]:
for c in cat_cols:
srs = split[c].apply(str)
split[c] = srs
split.loc[:,c] = cat_encodings[c].transform(srs)
return cat_encodings
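# Illustrative sketch (hypothetical helper, pandas only): the 'special_token' strategy
# above. Values in a valid/test column that never occur in the train column are replaced
# with '<UNK>'. All strings are then mapped to integer codes in sorted order, which is how
# sklearn's LabelEncoder orders its classes.
def _sketch_encode_with_unk(train_col, other_col, token='<UNK>'):
    other_col = other_col.where(other_col.isin(train_col.unique()), token)
    classes = sorted(set(train_col.astype(str)) | set(other_col.astype(str)))
    mapping = {c: i for i, c in enumerate(classes)}
    return other_col.astype(str).map(mapping), classes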
def preprocess(src_path, dst_path, config):
df = pd.read_csv(src_path, index_col=0)
for c in config.features:
if c.feature_embed_type == DataTypes.DATE:
df[c.name] = pd.to_datetime(df[c.name])
# Leave only columns relevant to preprocessing
relevant_columns = list(set([f.name for f in config.features] + [config.time_ids]))
df = df[relevant_columns].copy()
id_col, id_encoders = flatten_ids(df, config)
df = df.reindex(sorted(df.columns), axis=1)
train, valid, test = get_dataset_splits(df, config)
# Length-filter the data (all time series shorter than example_length are dropped)
#for df in [train, valid, test]:
# df.groupby(id_col).filter(lambda x: len(x) >= config.example_length)
train = pd.concat([x[1] for x in train.groupby(id_col) if len(x[1]) >= config.example_length])
valid = pd.concat([x[1] for x in valid.groupby(id_col) if len(x[1]) >= config.example_length])
test = pd.concat([x[1] for x in test.groupby(id_col) if len(x[1]) >= config.example_length])
train, valid, test, real_scalers, tgt_scalers = normalize_reals(train, valid, test, config, id_col)
cat_encodings = encode_categoricals(train, valid, test, config)
os.makedirs(dst_path, exist_ok=True)
train.to_csv(os.path.join(dst_path, 'train.csv'))
valid.to_csv(os.path.join(dst_path, 'valid.csv'))
test.to_csv(os.path.join(dst_path, 'test.csv'))
# Save relevant columns in binary form for faster data loading
# IMPORTANT: We always expect id to be a single column identifying a complete time series
# We also expect a copy of id in the form of a static categorical input!
col_names = [id_col] + [x.name for x in config.features if x.feature_embed_type != DataTypes.DATE and x.feature_type != InputTypes.ID]
grouped_train = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in train.groupby(id_col)]
grouped_valid = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in valid.groupby(id_col)]
grouped_test = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in test.groupby(id_col)]
with open(os.path.join(dst_path, 'train.bin'), 'wb') as f:
pickle.dump(grouped_train, f)
with open(os.path.join(dst_path, 'valid.bin'), 'wb') as f:
pickle.dump(grouped_valid, f)
with open(os.path.join(dst_path, 'test.bin'), 'wb') as f:
pickle.dump(grouped_test, f)
with open(os.path.join(dst_path, 'real_scalers.bin'), 'wb') as f:
pickle.dump(real_scalers, f)
with open(os.path.join(dst_path, 'tgt_scalers.bin'), 'wb') as f:
pickle.dump(tgt_scalers, f)
with open(os.path.join(dst_path, 'cat_encodings.bin'), 'wb') as f:
pickle.dump(cat_encodings, f)
with open(os.path.join(dst_path, 'id_encoders.bin'), 'wb') as f:
pickle.dump(id_encoders, f)
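# Illustrative sketch (hypothetical helper, NumPy only): the bit-level round trip behind
# the *.bin files. `preprocess` casts every column to float32 and reinterprets the bytes
# as int32. TFTBinaryDataset reinterprets them back as float32 and casts each feature
# group to its target dtype. Integers above 2**24 are not exactly representable in
# float32, so categorical codes are assumed to stay below that bound.
def _sketch_pack_unpack(values):
    import numpy as np
    packed = np.asarray(values).astype(np.float32).view(np.int32)
    restored = packed.view(np.float32)
    return packed, restored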
def sample_data(dataset, num_samples):
if num_samples < 0:
return dataset
else:
return torch.utils.data.Subset(dataset, np.random.choice(np.arange(len(dataset)), size=num_samples, replace=False))
def standarize_electricity(path):
"""Code taken from https://github.com/google-research/google-research/blob/master/tft/script_download_data.py"""
df = pd.read_csv(os.path.join(path, 'LD2011_2014.txt'), index_col=0, sep=';', decimal=',')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
# Used to determine the start and end dates of a series
output = df.resample('1h').mean().replace(0., np.nan)
earliest_time = output.index.min()
df_list = []
for label in output:
print('Processing {}'.format(label))
srs = output[label]
start_date = min(srs.fillna(method='ffill').dropna().index)
end_date = max(srs.fillna(method='bfill').dropna().index)
active_range = (srs.index >= start_date) & (srs.index <= end_date)
srs = srs[active_range].fillna(0.)
tmp = pd.DataFrame({'power_usage': srs})
date = tmp.index
tmp['t'] = (date - earliest_time).seconds / 60 / 60 + (
date - earliest_time).days * 24
tmp['days_from_start'] = (date - earliest_time).days
tmp['categorical_id'] = label
tmp['date'] = date
tmp['id'] = label
tmp['hour'] = date.hour
tmp['day'] = date.day
tmp['day_of_week'] = date.dayofweek
tmp['month'] = date.month
df_list.append(tmp)
output = pd.concat(df_list, axis=0, join='outer').reset_index(drop=True)
output['categorical_id'] = output['id'].copy()
output['hours_from_start'] = output['t']
output['categorical_day_of_week'] = output['day_of_week'].copy()
output['categorical_hour'] = output['hour'].copy()
output.to_csv(os.path.join(path, 'standarized.csv'))
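# Illustrative sketch (hypothetical helper, pandas only): the `t` / hours_from_start
# column above. A Timedelta's `.seconds` holds only the sub-day remainder, so whole days
# are added back separately. Without sub-second components the result equals
# `total_seconds() / 3600`.
def _sketch_hours_from_start(dates, earliest):
    import pandas as pd
    delta = pd.DatetimeIndex(dates) - pd.Timestamp(earliest)
    return delta.seconds / 60 / 60 + delta.days * 24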
def standarize_volatility(path):
df = pd.read_csv(os.path.join(path, 'oxfordmanrealizedvolatilityindices.csv'), index_col=0) # no explicit index
# Adds additional date/day fields
idx = [str(s).split('+')[0] for s in df.index]  # ignore timezones, we don't need them
dates = pd.to_datetime(idx)
df['date'] = dates
df['days_from_start'] = (dates - datetime.datetime(2000, 1, 3)).days
df['day_of_week'] = dates.dayofweek
df['day_of_month'] = dates.day
df['week_of_year'] = dates.weekofyear
df['month'] = dates.month
df['year'] = dates.year
df['categorical_id'] = df['Symbol'].copy()
# Processes log volatility
vol = df['rv5_ss'].copy()
vol.loc[vol == 0.] = np.nan
df['log_vol'] = np.log(vol)
# Adds static information
symbol_region_mapping = {
'.AEX': 'EMEA',
'.AORD': 'APAC',
'.BFX': 'EMEA',
'.BSESN': 'APAC',
'.BVLG': 'EMEA',
'.BVSP': 'AMER',
'.DJI': 'AMER',
'.FCHI': 'EMEA',
'.FTMIB': 'EMEA',
'.FTSE': 'EMEA',
'.GDAXI': 'EMEA',
'.GSPTSE': 'AMER',
'.HSI': 'APAC',
'.IBEX': 'EMEA',
'.IXIC': 'AMER',
'.KS11': 'APAC',
'.KSE': 'APAC',
'.MXX': 'AMER',
'.N225': 'APAC',
'.NSEI': 'APAC',
'.OMXC20': 'EMEA',
'.OMXHPI': 'EMEA',
'.OMXSPI': 'EMEA',
'.OSEAX': 'EMEA',
'.RUT': 'AMER',
'.SMSI': 'EMEA',
'.SPX': 'AMER',
'.SSEC': 'APAC',
'.SSMI': 'EMEA',
'.STI': 'APAC',
'.STOXX50E': 'EMEA'
}
df['Region'] = df['Symbol'].apply(lambda k: symbol_region_mapping[k])
# Performs final processing
output_df_list = []
for grp in df.groupby('Symbol'):
sliced = grp[1].copy()
sliced.sort_values('days_from_start', inplace=True)
# Impute log volatility values
sliced['log_vol'] = sliced['log_vol'].ffill()
output_df_list.append(sliced)
df = pd.concat(output_df_list, axis=0)
df.to_csv(os.path.join(path, 'standarized.csv'))
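# Illustrative sketch (hypothetical helper, pandas/NumPy only): the log-volatility target
# built above. Zero realized variance is treated as missing before taking the log, and
# gaps are forward-filled within each symbol only. It assumes rows are already sorted by
# date inside each symbol, as `standarize_volatility` ensures with `sort_values`.
def _sketch_log_vol(df, symbol_col='Symbol', vol_col='rv5_ss'):
    import numpy as np
    out = df.copy()
    out['log_vol'] = np.log(out[vol_col].replace(0.0, np.nan))
    out['log_vol'] = out.groupby(symbol_col)['log_vol'].ffill()
    return out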
def standarize_traffic(path):
def process_list(s, variable_type=int, delimiter=None):
"""Parses a line in the PEMS format to a list."""
if delimiter is None:
l = [
variable_type(i) for i in s.replace('[', '').replace(']', '').split()
]
else:
l = [
variable_type(i)
for i in s.replace('[', '').replace(']', '').split(delimiter)
]
return l
def read_single_list(filename):
"""Returns single list from a file in the PEMS-custom format."""
with open(os.path.join(path, filename), 'r') as dat:
l = process_list(dat.readlines()[0])
return l
def read_matrix(filename):
"""Returns a matrix from a file in the PEMS-custom format."""
array_list = []
with open(os.path.join(path, filename), 'r') as dat:
lines = dat.readlines()
for i, line in enumerate(lines):
if (i + 1) % 50 == 0:
print('Completed {} of {} rows for {}'.format(i + 1, len(lines),
filename))
array = [
process_list(row_split, variable_type=float, delimiter=None)
for row_split in process_list(
line, variable_type=str, delimiter=';')
]
array_list.append(array)
return array_list
shuffle_order = np.array(read_single_list('randperm')) - 1 # index from 0
train_dayofweek = read_single_list('PEMS_trainlabels')
train_tensor = read_matrix('PEMS_train')
test_dayofweek = read_single_list('PEMS_testlabels')
test_tensor = read_matrix('PEMS_test')
# Invert the shuffle permutation
print('Shuffling')
inverse_mapping = {
new_location: previous_location
for previous_location, new_location in enumerate(shuffle_order)
}
reverse_shuffle_order = np.array([
inverse_mapping[new_location]
for new_location, _ in enumerate(shuffle_order)
])
# Group and reorder based on the permutation matrix
print('Reordering')
day_of_week = np.array(train_dayofweek + test_dayofweek)
combined_tensor = np.array(train_tensor + test_tensor)
day_of_week = day_of_week[reverse_shuffle_order]
combined_tensor = combined_tensor[reverse_shuffle_order]
# Put everything back into a dataframe
print('Parsing as dataframe')
labels = ['traj_{}'.format(i) for i in read_single_list('stations_list')]
hourly_list = []
for day, day_matrix in enumerate(combined_tensor):
# Hourly data
hourly = pd.DataFrame(day_matrix.T, columns=labels)
hourly['hour_on_day'] = [int(i / 6) for i in hourly.index]  # sampled at 10 min intervals
if hourly['hour_on_day'].max() > 23 or hourly['hour_on_day'].min() < 0:
raise ValueError('Invalid hour! {}-{}'.format(
hourly['hour_on_day'].min(), hourly['hour_on_day'].max()))
hourly = hourly.groupby('hour_on_day', as_index=True).mean()[labels]
hourly['sensor_day'] = day
hourly['time_on_day'] = hourly.index
hourly['day_of_week'] = day_of_week[day]
hourly_list.append(hourly)
hourly_frame = pd.concat(hourly_list, axis=0, ignore_index=True, sort=False)
# Flatten such that each entity uses one row in dataframe
store_columns = [c for c in hourly_frame.columns if 'traj' in c]
other_columns = [c for c in hourly_frame.columns if 'traj' not in c]
flat_columns = ['values', 'prev_values', 'next_values'] + other_columns + ['id']
flat_frames = []
for store in store_columns:
print('Processing {}'.format(store))
sliced = hourly_frame[[store] + other_columns].copy()
sliced.columns = ['values'] + other_columns
sliced['id'] = int(store.replace('traj_', ''))
# Sort by sensor-date-time
key = sliced['id'].apply(str) \
+ sliced['sensor_day'].apply(lambda x: '_{:03d}'.format(x)) \
+ sliced['time_on_day'].apply(lambda x: '_{:03d}'.format(x))
sliced = sliced.set_index(key).sort_index()
sliced['values'] = sliced['values'].fillna(method='ffill')
sliced['prev_values'] = sliced['values'].shift(1)
sliced['next_values'] = sliced['values'].shift(-1)
# Collect per-sensor frames and concatenate once at the end (repeated DataFrame.append is quadratic)
flat_frames.append(sliced.dropna())
flat_df = pd.concat(flat_frames, axis=0, ignore_index=True, sort=False)[flat_columns]
# Filter to match range used by other academic papers
index = flat_df['sensor_day']
flat_df = flat_df[index < 173].copy()
# Creating columns for categorical inputs
flat_df['categorical_id'] = flat_df['id'].copy()
flat_df['hours_from_start'] = flat_df['time_on_day'] \
+ flat_df['sensor_day']*24.
flat_df['categorical_day_of_week'] = flat_df['day_of_week'].copy()
flat_df['categorical_time_on_day'] = flat_df['time_on_day'].copy()
flat_df.to_csv(os.path.join(path, 'standarized.csv'))
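# Illustrative sketch (hypothetical helper, NumPy only): `reverse_shuffle_order` above is
# the inverse of the permutation `shuffle_order`. Scattering positions through the
# permutation computes the same array in O(n) without the intermediate dict. The result
# also equals `np.argsort(shuffle_order)`.
def _sketch_inverse_permutation(shuffle_order):
    import numpy as np
    order = np.asarray(shuffle_order)
    inverse = np.empty_like(order)
    inverse[order] = np.arange(len(order))
    return inverse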
# XXX needs rework
def standarize_favorita(data_folder):
import gc
# Extract only a subset of data to save/process for efficiency
start_date = datetime.datetime(2015, 1, 1)
end_date = datetime.datetime(2016, 6, 1)
print('Regenerating data...')
# load temporal data
temporal = pd.read_csv(os.path.join(data_folder, 'train.csv'), index_col=0)
store_info = pd.read_csv(os.path.join(data_folder, 'stores.csv'), index_col=0)
oil = pd.read_csv(
os.path.join(data_folder, 'oil.csv'), index_col=0).iloc[:, 0]
holidays = pd.read_csv(os.path.join(data_folder, 'holidays_events.csv'))
items = pd.read_csv(os.path.join(data_folder, 'items.csv'), index_col=0)
transactions = pd.read_csv(os.path.join(data_folder, 'transactions.csv'))
# Parse dates so they can be filtered by the date range below
temporal['date'] = pd.to_datetime(temporal['date'])
# Filter dates to reduce storage space requirements
if start_date is not None:
temporal = temporal[(temporal['date'] >= start_date)]
if end_date is not None:
temporal = temporal[(temporal['date'] < end_date)]
dates = temporal['date'].unique()
# Add trajectory identifier
temporal['traj_id'] = temporal['store_nbr'].apply(
str) + '_' + temporal['item_nbr'].apply(str)
temporal['unique_id'] = temporal['traj_id'] + '_' + temporal['date'].apply(
str)
# Remove all IDs with negative returns
print('Removing returns data')
min_returns = temporal['unit_sales'].groupby(temporal['traj_id']).min()
valid_ids = set(min_returns[min_returns >= 0].index)
selector = temporal['traj_id'].apply(lambda traj_id: traj_id in valid_ids)
new_temporal = temporal[selector].copy()
del temporal
gc.collect()
temporal = new_temporal
temporal['open'] = 1
# Resampling
print('Resampling to regular grid')
resampled_dfs = []
for traj_id, raw_sub_df in temporal.groupby('traj_id'):
print('Resampling', traj_id)
sub_df = raw_sub_df.set_index('date', drop=True).copy()
sub_df = sub_df.resample('1d').last()
sub_df['date'] = sub_df.index
sub_df[['store_nbr', 'item_nbr', 'onpromotion']] \
= sub_df[['store_nbr', 'item_nbr', 'onpromotion']].fillna(method='ffill')
sub_df['open'] = sub_df['open'].fillna(0)  # flag where sales data is unknown
sub_df['log_sales'] = np.log(sub_df['unit_sales'])
resampled_dfs.append(sub_df.reset_index(drop=True))
new_temporal = pd.concat(resampled_dfs, axis=0)
del temporal
gc.collect()
temporal = new_temporal
print('Adding oil')
oil.name = 'oil'
oil.index = pd.to_datetime(oil.index)
# XXX The lines below match the oil price on a given date with the rest of the time series.
# Missing values in the oil series are copied from the preceding date. Then the oil series is joined
# with temporal. Some dates present in temporal are absent from oil; for those, the oil value is
# set to -1. The rationale for using -1 is unclear.
# TODO: check how many NaNs remain after the first step. Previously, the oil series was extended with
# the dates in `dates` (as NaN values), which were then forward-filled.
# This behavior is no longer supported by pandas, so we switched to the Index.isin method.
# This leaves more NaNs after the first step than before. To reproduce the previous behavior,
# we would have to join the series before filling NaNs.
temporal = temporal.join(
#oil.loc[oil.index.isin(dates)].fillna(method='ffill'), on='date', how='left')
oil.loc[oil.index.isin(dates)], on='date', how='left')
temporal['oil'] = temporal['oil'].fillna(method='ffill')
temporal['oil'] = temporal['oil'].fillna(-1)
print('Adding store info')
temporal = temporal.join(store_info, on='store_nbr', how='left')
print('Adding item info')
temporal = temporal.join(items, on='item_nbr', how='left')
transactions['date'] = pd.to_datetime(transactions['date'])
temporal = temporal.merge(
transactions,
left_on=['date', 'store_nbr'],
right_on=['date', 'store_nbr'],
how='left')
temporal['transactions'] = temporal['transactions'].fillna(-1)
# Additional date info
temporal['day_of_week'] = pd.to_datetime(temporal['date'].values).dayofweek
temporal['day_of_month'] = pd.to_datetime(temporal['date'].values).day
temporal['month'] = pd.to_datetime(temporal['date'].values).month
# Add holiday info
print('Adding holidays')
holiday_subset = holidays[holidays['transferred'].apply(
lambda x: not x)].copy()
holiday_subset.columns = [
s if s != 'type' else 'holiday_type' for s in holiday_subset.columns
]
holiday_subset['date'] = pd.to_datetime(holiday_subset['date'])
local_holidays = holiday_subset[holiday_subset['locale'] == 'Local']
regional_holidays = holiday_subset[holiday_subset['locale'] == 'Regional']
national_holidays = holiday_subset[holiday_subset['locale'] == 'National']
temporal['national_hol'] = temporal.merge(
national_holidays, left_on=['date'], right_on=['date'],
how='left')['description'].fillna('')
temporal['regional_hol'] = temporal.merge(
regional_holidays,
left_on=['state', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal['local_hol'] = temporal.merge(
local_holidays,
left_on=['city', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal.sort_values('unique_id', inplace=True)
# Transform date to integer index
start_date = pd.to_datetime(min(temporal['date']))
dates = temporal['date'].apply(pd.to_datetime)
temporal['days_from_start'] = (dates - start_date).dt.days
temporal['categorical_id'] = temporal['traj_id'].copy()
print('Saving processed file to {}'.format(os.path.join(data_folder, 'standarized.csv')))
temporal.to_csv(os.path.join(data_folder, 'standarized.csv'))
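# Illustrative sketch (hypothetical helper, pandas only): the per-trajectory resampling in
# `standarize_favorita`. Missing days become rows on a daily grid, promotion flags are
# forward-filled, and `open` is 0 where no sales record existed. It assumes `sub_df` is
# indexed by a DatetimeIndex and has `unit_sales`, `onpromotion` and `open` columns.
def _sketch_daily_grid(sub_df):
    out = sub_df.resample('1D').last()
    out['onpromotion'] = out['onpromotion'].ffill()
    out['open'] = out['open'].fillna(0)
    return out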

@ -0,0 +1,73 @@
# Copyright 2021 NVIDIA CORPORATION
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2019 Ross Wightman
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Exponential Moving Average (EMA) of model updates
"""
from collections import OrderedDict
from copy import deepcopy
import torch
import torch.nn as nn
class ModelEma(nn.Module):
""" Model Exponential Moving Average V2
Keep a moving average of everything in the model state_dict (parameters and buffers).
V2 of this module is simpler: it does not match params/buffers by name but simply
iterates over them in order. It works with TorchScript (JIT of the full model).
"""
def __init__(self, model, decay=0.999, device=None):
super().__init__()
# make a copy of the model for accumulating moving average of weights
self.module = deepcopy(model)
self.module.eval()
self.decay = decay
self.device = device # perform ema on different device from model if set
if self.device is not None:
self.module.to(device=device)
def update(self, model):
update_fn=lambda ema_v, model_v: self.decay * ema_v + (1. - self.decay) * model_v
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(update_fn(ema_v, model_v))
def set(self, model):
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(model_v)
def forward(self, x):
return self.module(x)
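# Illustrative sketch (hypothetical helper, plain Python): the scalar form of `update_fn`
# in ModelEma.update, ema <- decay * ema + (1 - decay) * value. Starting from the initial
# weights (ModelEma deep-copies the model), recent values receive geometrically larger
# weights. As a rough rule of thumb, the average spans about 1 / (1 - decay) updates.
def _sketch_ema(values, decay, init):
    ema = init
    history = []
    for v in values:
        ema = decay * ema + (1.0 - decay) * v
        history.append(ema)
    return history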

@ -0,0 +1,157 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import math
import os
import pathlib
import re
import pynvml
pynvml.nvmlInit()
def systemGetDriverVersion():
return pynvml.nvmlSystemGetDriverVersion()
def deviceGetCount():
return pynvml.nvmlDeviceGetCount()
class device:
# assume nvml returns list of 64 bit ints
_nvml_affinity_elements = math.ceil(os.cpu_count() / 64)
def __init__(self, device_idx):
super().__init__()
self.handle = pynvml.nvmlDeviceGetHandleByIndex(device_idx)
def getName(self):
return pynvml.nvmlDeviceGetName(self.handle)
def getCpuAffinity(self):
affinity_string = ''
for j in pynvml.nvmlDeviceGetCpuAffinity(
self.handle, device._nvml_affinity_elements
):
# assume nvml returns list of 64 bit ints
affinity_string = '{:064b}'.format(j) + affinity_string
affinity_list = [int(x) for x in affinity_string]
affinity_list.reverse() # so core 0 is in 0th element of list
ret = [i for i, e in enumerate(affinity_list) if e != 0]
return ret
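# Illustrative sketch (hypothetical helper, plain Python): decoding the CPU affinity
# bitmask the way `device.getCpuAffinity` does. It makes the same assumption that NVML
# returns 64-bit words with word 0 covering cores 0-63. Bit b of word w maps to core
# 64 * w + b, which avoids building and reversing a binary string.
def _sketch_decode_affinity(words):
    cores = []
    for word_idx, word in enumerate(words):
        for bit in range(64):
            if (word >> bit) & 1:
                cores.append(64 * word_idx + bit)
    return cores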
def set_socket_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity)
def set_single_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity[:1])
def set_single_unique_affinity(gpu_id, nproc_per_node):
devices = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in devices]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = sorted(set(socket_affinity) - set(siblings_dict.values()))
affinities = []
assigned = []
for socket_affinity in socket_affinities:
for core in socket_affinity:
if core not in assigned:
affinities.append([core])
assigned.append(core)
break
os.sched_setaffinity(0, affinities[gpu_id])
def set_socket_unique_affinity(gpu_id, nproc_per_node, mode):
devices = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in devices]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = sorted(set(socket_affinity) - set(siblings_dict.values()))
socket_affinities_to_device_ids = collections.defaultdict(list)
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities_to_device_ids[tuple(socket_affinity)].append(idx)
for socket_affinity, device_ids in socket_affinities_to_device_ids.items():
devices_per_group = len(device_ids)
cores_per_device = len(socket_affinity) // devices_per_group
for group_id, device_id in enumerate(device_ids):
if device_id == gpu_id:
if mode == 'interleaved':
affinity = list(socket_affinity[group_id::devices_per_group])
elif mode == 'continuous':
affinity = list(socket_affinity[group_id*cores_per_device:(group_id+1)*cores_per_device])
else:
raise RuntimeError('Unknown set_socket_unique_affinity mode')
# reintroduce siblings
affinity += [siblings_dict[aff] for aff in affinity if aff in siblings_dict]
os.sched_setaffinity(0, affinity)
def get_thread_siblings_list():
path = '/sys/devices/system/cpu/cpu*/topology/thread_siblings_list'
thread_siblings_list = []
pattern = re.compile(r'(\d+)\D(\d+)')
for fname in pathlib.Path(path[0]).glob(path[1:]):
with open(fname) as f:
content = f.read().strip()
res = pattern.findall(content)
if res:
pair = tuple(map(int, res[0]))
thread_siblings_list.append(pair)
return thread_siblings_list
def set_affinity(gpu_id, nproc_per_node, mode='socket'):
if mode == 'socket':
set_socket_affinity(gpu_id)
elif mode == 'single':
set_single_affinity(gpu_id)
elif mode == 'single_unique':
set_single_unique_affinity(gpu_id, nproc_per_node)
elif mode == 'socket_unique_interleaved':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'interleaved')
elif mode == 'socket_unique_continuous':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'continuous')
else:
raise RuntimeError('Unknown affinity mode')
affinity = os.sched_getaffinity(0)
return affinity

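`set_socket_unique_affinity` splits the sibling-free cores of one socket among the GPUs attached to it, either interleaved or in contiguous blocks. The pure-Python sketch below isolates only that slicing step so it can be tried without NVML; `split_socket_cores` is a hypothetical helper (not part of this repository) and the core IDs are made-up inputs rather than a real topology.

```python
def split_socket_cores(cores, n_devices, mode):
    """Return the cores assigned to each of n_devices sharing one socket.

    Hypothetical helper mirroring the slicing in set_socket_unique_affinity.
    """
    cores = list(cores)
    cores_per_device = len(cores) // n_devices
    result = []
    for group_id in range(n_devices):
        if mode == 'interleaved':
            # every n_devices-th core, starting at this device's offset
            result.append(cores[group_id::n_devices])
        elif mode == 'continuous':
            # one contiguous block of cores_per_device cores per device
            start = group_id * cores_per_device
            result.append(cores[start:start + cores_per_device])
        else:
            raise RuntimeError('Unknown mode')
    return result


print(split_socket_cores(range(8), 2, 'interleaved'))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(split_socket_cores(range(8), 2, 'continuous'))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With seven cores and two devices, continuous mode leaves core 6 unassigned because `cores_per_device` is computed with integer division, while interleaved mode hands it to the first device.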

@ -0,0 +1,239 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pandas as pd
import numpy as np
import pickle
import argparse
import torch
from torch.utils.data import DataLoader
from torch.cuda import amp
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from modeling import TemporalFusionTransformer
from configuration import ElectricityConfig
from data_utils import TFTDataset
from utils import PerformanceMeter
from criterions import QuantileLoss
import dllogger
from log_helper import setup_logger
def _unscale_per_id(config, values, ids, scalers):
    values = values.cpu().numpy()
    num_horizons = config.example_length - config.encoder_length + 1
    flat_values = pd.DataFrame(
            values,
            columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
            )
    flat_values['id'] = ids
    df_list = []
    for idx, group in flat_values.groupby('id'):
        scaler = scalers[idx]
        group_copy = group.copy()
        for col in group_copy.columns:
            if 'id' not in col:
                _col = np.expand_dims(group_copy[col].values, -1)
                _t_col = scaler.inverse_transform(_col)[:,-1]
                group_copy[col] = _t_col
        df_list.append(group_copy)
    flat_values = pd.concat(df_list, axis=0)

    flat_values = flat_values[[col for col in flat_values if 'id' not in col]]
    flat_tensor = torch.from_numpy(flat_values.values)
    return flat_tensor


def _unscale(config, values, scaler):
    values = values.cpu().numpy()
    num_horizons = config.example_length - config.encoder_length + 1
    flat_values = pd.DataFrame(
            values,
            columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
            )
    for col in flat_values.columns:
        if 'id' not in col:
            _col = np.expand_dims(flat_values[col].values, -1)
            _t_col = scaler.inverse_transform(_col)[:,-1]
            flat_values[col] = _t_col

    flat_values = flat_values[[col for col in flat_values if 'id' not in col]]
    flat_tensor = torch.from_numpy(flat_values.values)
    return flat_tensor


def predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=False):
    model.eval()
    predictions = []
    targets = []
    ids = []
    perf_meter = PerformanceMeter()
    n_workers = args.distributed_world_size if hasattr(args, 'distributed_world_size') else 1

    for step, batch in enumerate(data_loader):
        perf_meter.reset_current_lap()
        with torch.no_grad():
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            ids.append(batch['id'][:,0,:])
            targets.append(batch['target'])
            predictions.append(model(batch).float())

        perf_meter.update(args.batch_size * n_workers,
                          exclude_from_total=step in [0, len(data_loader)-1])

    targets = torch.cat(targets, dim=0)
    if not extend_targets:
        targets = targets[:,config.encoder_length:,:]
    predictions = torch.cat(predictions, dim=0)

    if config.scale_per_id:
        ids = torch.cat(ids, dim=0).cpu().numpy()

        unscaled_predictions = torch.stack(
                [_unscale_per_id(config, predictions[:,:,i], ids, scalers) for i in range(len(config.quantiles))],
                dim=-1)
        unscaled_targets = _unscale_per_id(config, targets[:,:,0], ids, scalers).unsqueeze(-1)
    else:
        ids = None
        unscaled_predictions = torch.stack(
                [_unscale(config, predictions[:,:,i], scalers['']) for i in range(len(config.quantiles))],
                dim=-1)
        unscaled_targets = _unscale(config, targets[:,:,0], scalers['']).unsqueeze(-1)

    return unscaled_predictions, unscaled_targets, ids, perf_meter


def visualize_v2(args, config, model, data_loader, scalers, cat_encodings):
    unscaled_predictions, unscaled_targets, ids, _ = predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=True)

    num_horizons = config.example_length - config.encoder_length + 1
    pad = unscaled_predictions.new_full((unscaled_targets.shape[0], unscaled_targets.shape[1] - unscaled_predictions.shape[1], unscaled_predictions.shape[2]), fill_value=float('nan'))
    pad[:,-1,:] = unscaled_targets[:,-num_horizons,:]
    unscaled_predictions = torch.cat((pad, unscaled_predictions), dim=1)

    ids = torch.from_numpy(ids.squeeze())
    joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
    graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
    for key, g in graphs.items():
        for i, ex in enumerate(g):
            df = pd.DataFrame(ex.numpy(),
                              index=range(num_horizons - ex.shape[0], num_horizons),
                              columns=['target'] + [f'P{int(q*100)}' for q in config.quantiles])
            fig = df.plot().get_figure()
            ax = fig.get_axes()[0]
            _values = df.values[config.encoder_length-1:,:]
            ax.fill_between(range(num_horizons), _values[:,1], _values[:,-1], alpha=0.2, color='green')
            os.makedirs(os.path.join(args.results, 'single_example_vis', str(key)), exist_ok=True)
            fig.savefig(os.path.join(args.results, 'single_example_vis', str(key), f'{i}.pdf'))


def inference(args, config, model, data_loader, scalers, cat_encodings):
    unscaled_predictions, unscaled_targets, ids, perf_meter = predict(args, config, model, data_loader, scalers, cat_encodings)

    if args.joint_visualization or args.save_predictions:
        ids = torch.from_numpy(ids.squeeze())
        #ids = torch.cat([x['id'][0] for x in data_loader.dataset])
        joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
        graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
        for key, g in graphs.items(): # time series id, joint targets and predictions
            _g = {'targets': g[:,:,0]}
            _g.update({f'P{int(q*100)}':g[:,:,i+1] for i, q in enumerate(config.quantiles)})

            if args.joint_visualization:
                summary_writer = SummaryWriter(log_dir=os.path.join(args.results, 'predictions_vis', str(key)))
                for q, t in _g.items(): # target and quantiles, values for each time horizon
                    if q == 'targets':
                        targets = torch.cat([t[:,0], t[-1,1:]]) # WIP
                        # We want to plot targets on the same graph as predictions. Probably could be written better.
                        for i, val in enumerate(targets):
                            summary_writer.add_scalars(str(key), {f'{q}':val}, i)
                        continue

                    # Tensor t contains different time horizons which are shifted in phase
                    # Next lines realign them
                    y = t.new_full((t.shape[0] + t.shape[1] -1, t.shape[1]), float('nan'))
                    for i in range(y.shape[1]):
                        y[i:i+t.shape[0], i] = t[:,i]

                    for i, vals in enumerate(y): # timestep, values for each time horizon
                        summary_writer.add_scalars(str(key), {f'{q}_t+{j+1}':v for j,v in enumerate(vals) if v == v}, i)
                summary_writer.close()

            if args.save_predictions:
                for q, t in _g.items():
                    df = pd.DataFrame(t.tolist())
                    df.columns = [f't+{i+1}' for i in range(len(df.columns))]
                    os.makedirs(os.path.join(args.results, 'predictions', str(key)), exist_ok=True)
                    df.to_csv(os.path.join(args.results, 'predictions', str(key), q+'.csv'))

    losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
    normalizer = unscaled_targets.abs().mean()
    q_risk = 2 * losses / normalizer

    perf_dict = {
                'throughput': perf_meter.avg,
                'latency_avg': perf_meter.total_time/len(perf_meter.intervals),
                'latency_p90': perf_meter.p(90),
                'latency_p95': perf_meter.p(95),
                'latency_p99': perf_meter.p(99),
                'total_inference_time': perf_meter.total_time,
                }

    return q_risk, perf_dict


def main(args):

    setup_logger(args)
    # Set up model
    state_dict = torch.load(args.checkpoint)
    config = state_dict['config']
    model = TemporalFusionTransformer(config).cuda()
    model.load_state_dict(state_dict['model'])
    model.eval()
    model.cuda()

    # Set up dataset
    test_split = TFTDataset(args.data, config)
    data_loader = DataLoader(test_split, batch_size=args.batch_size, num_workers=4)

    scalers = pickle.load(open(args.tgt_scalers, 'rb'))
    cat_encodings = pickle.load(open(args.cat_encodings, 'rb'))

    if args.visualize:
        # TODO: abstract away all forms of visualization.
        visualize_v2(args, config, model, data_loader, scalers, cat_encodings)

    quantiles, perf_dict = inference(args, config, model, data_loader, scalers, cat_encodings)
    quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
    finish_log = {**quantiles, **perf_dict}
    dllogger.log(step=(), data=finish_log, verbosity=1)
    # quantiles is now a dict, so format by key rather than unpacking it positionally
    print('Test q-risk: P10 {test_p10} | P50 {test_p50} | P90 {test_p90}'.format(**quantiles))
    print('Latency:\n\tAverage {:.3f}s\n\tp90 {:.3f}s\n\tp95 {:.3f}s\n\tp99 {:.3f}s'.format(
        perf_dict['latency_avg'], perf_dict['latency_p90'], perf_dict['latency_p95'], perf_dict['latency_p99']))


if __name__=='__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--checkpoint', type=str,
                        help='Path to the checkpoint')
    parser.add_argument('--data', type=str,
                        help='Path to the test split of the dataset')
    parser.add_argument('--tgt_scalers', type=str,
                        help='Path to the tgt_scalers.bin file produced by the preprocessing')
    parser.add_argument('--cat_encodings', type=str,
                        help='Path to the cat_encodings.bin file produced by the preprocessing')
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--visualize', action='store_true', help='Visualize predictions - each example on a separate plot')
    parser.add_argument('--joint_visualization', action='store_true', help='Visualize predictions - each time series on a separate plot. Projections will be concatenated.')
    parser.add_argument('--save_predictions', action='store_true')
    parser.add_argument('--results', type=str, default='/results')
    parser.add_argument('--log_file', type=str, default='dllogger.json')
    ARGS = parser.parse_args()
    main(ARGS)

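`inference` reports a normalized quantile risk: the output of `QuantileLoss` is multiplied by 2 and divided by the mean absolute target. The NumPy sketch below reproduces that normalization under the assumption that `QuantileLoss` returns the mean pinball loss for each quantile (`criterions.py` is not shown in this section), so `pinball_loss` and `q_risk` are hypothetical stand-ins rather than the repository's implementation.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball (quantile) loss per quantile; assumed to match QuantileLoss.

    y_true: (..., 1) targets; y_pred: (..., n_quantiles) predictions.
    """
    q = np.asarray(quantiles)
    errors = y_true - y_pred                       # broadcasts over the quantile axis
    losses = np.maximum(q * errors, (q - 1) * errors)
    return losses.reshape(-1, len(q)).mean(axis=0)

def q_risk(y_true, y_pred, quantiles):
    # Same normalization as inference(): 2 * loss / mean(|target|)
    return 2 * pinball_loss(y_true, y_pred, quantiles) / np.abs(y_true).mean()

y = np.array([[1.0], [2.0]])
pred = np.array([[1.0], [1.0]])
print(q_risk(y, pred, [0.5]))  # [0.33333333]
```

For the P90 quantile, under-predicting by 1 costs 0.9 while over-predicting by 1 costs only 0.1, which is what pushes that quantile's forecast upward.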

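The realignment loop in `inference` starts from a matrix where row `i` holds the forecasts made from example `i` for horizons t+1 to t+H, and shifts column `j` down by `j` rows so that each row collects every forecast targeting the same time step. A NumPy sketch of the same indexing, with `realign_horizons` as a hypothetical name:

```python
import numpy as np

def realign_horizons(t):
    """t: (n_examples, n_horizons) forecasts -> (n_examples + n_horizons - 1, n_horizons).

    Column j is shifted down by j rows, so row r holds all forecasts for time step r
    made at different lead times; cells with no forecast stay NaN.
    """
    n, h = t.shape
    y = np.full((n + h - 1, h), np.nan)
    for j in range(h):
        y[j:j + n, j] = t[:, j]
    return y

t = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
print(realign_horizons(t))
```

Row 1 of the result holds `3.0` (the horizon-1 forecast from example 1) and `2.0` (the horizon-2 forecast from example 0). The NaN cells are the ones `inference` skips with its `v == v` check.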
@ -0,0 +1,141 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import subprocess
import sys
import itertools
import atexit
import dllogger
from dllogger import Backend, JSONStreamBackend, StdOutBackend
import torch.distributed as dist
from torch.utils.tensorboard import SummaryWriter
class TensorBoardBackend(Backend):
    def __init__(self, verbosity, log_dir):
        super().__init__(verbosity=verbosity)
        self.summary_writer = SummaryWriter(log_dir=os.path.join(log_dir, 'TB_summary'),
                                            flush_secs=120,
                                            max_queue=200
                                            )
        self.hp_cache = None
        atexit.register(self.summary_writer.close)

    @property
    def log_level(self):
        return self._log_level

    def metadata(self, timestamp, elapsedtime, metric, metadata):
        pass

    def log(self, timestamp, elapsedtime, step, data):
        if step == 'HPARAMS':
            parameters = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
            # Unpack lists and tuples
            for d in [{k+f'_{i}':v for i,v in enumerate(l)} for k,l in data.items() if isinstance(l, (list, tuple))]:
                parameters.update(d)
            # Remove custom classes (filter the unpacked dict, not the raw data)
            parameters = {k: v for k, v in parameters.items() if isinstance(v, (int, float, str, bool))}
            parameters.update({k:'None' for k, v in data.items() if v is None})
            self.hp_cache = parameters
        if step == ():
            if self.hp_cache is None:
                print('Warning: Cannot save HParameters. Please log HParameters with step=\'HPARAMS\'', file=sys.stderr)
                return
            self.summary_writer.add_hparams(self.hp_cache, data)
        if not isinstance(step, int):
            return
        for k, v in data.items():
            self.summary_writer.add_scalar(k, v, step)

    def flush(self):
        pass


def setup_logger(args):
    os.makedirs(args.results, exist_ok=True)
    log_path = os.path.join(args.results, args.log_file)

    if os.path.exists(log_path):
        for i in itertools.count():
            s_fname = args.log_file.split('.')
            fname = '.'.join(s_fname[:-1]) + f'_{i}.' + s_fname[-1] if len(s_fname) > 1 else args.log_file + f'.{i}'
            log_path = os.path.join(args.results, fname)
            if not os.path.exists(log_path):
                break

    def metric_format(metric, metadata, value):
        return "{}: {}".format(metric, f'{value:.5f}' if isinstance(value, float) else value)

    def step_format(step):
        if step == ():
            return "Finished |"
        elif isinstance(step, int):
            return "Step {0: <5} |".format(step)
        return "Step {} |".format(step)

    if not dist.is_initialized() or not args.distributed_world_size > 1 or args.distributed_rank == 0:
        dllogger.init(backends=[JSONStreamBackend(verbosity=1, filename=log_path),
                                TensorBoardBackend(verbosity=1, log_dir=args.results),
                                StdOutBackend(verbosity=2,
                                              step_format=step_format,
                                              prefix_format=lambda x: "")#,
                                              #metric_format=metric_format)
                                ])
    else:
        dllogger.init(backends=[])
    dllogger.log(step='PARAMETER', data=vars(args), verbosity=0)

    container_setup_info = {**get_framework_env_vars(), **get_system_info()}
    dllogger.log(step='ENVIRONMENT', data=container_setup_info, verbosity=0)

    dllogger.metadata('loss', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'TRAIN', 'format': ':1f'})
    dllogger.metadata('val_loss', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format':':5f'})
    dllogger.metadata('val_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'VAL', 'format': ':1f'})
    dllogger.metadata('test_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('test_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('test_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('throughput', {'GOAL': 'MAXIMIZE', 'STAGE': 'TEST', 'format': ':1f'})
    dllogger.metadata('latency_p90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('latency_p95', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('latency_p99', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})


def get_framework_env_vars():
    return {
        'NVIDIA_PYTORCH_VERSION': os.environ.get('NVIDIA_PYTORCH_VERSION'),
        'PYTORCH_VERSION': os.environ.get('PYTORCH_VERSION'),
        'CUBLAS_VERSION': os.environ.get('CUBLAS_VERSION'),
        'NCCL_VERSION': os.environ.get('NCCL_VERSION'),
        'CUDA_DRIVER_VERSION': os.environ.get('CUDA_DRIVER_VERSION'),
        'CUDNN_VERSION': os.environ.get('CUDNN_VERSION'),
        'CUDA_VERSION': os.environ.get('CUDA_VERSION'),
        'NVIDIA_PIPELINE_ID': os.environ.get('NVIDIA_PIPELINE_ID'),
        'NVIDIA_BUILD_ID': os.environ.get('NVIDIA_BUILD_ID'),
        'NVIDIA_TF32_OVERRIDE': os.environ.get('NVIDIA_TF32_OVERRIDE'),
    }


def get_system_info():
    system_info = subprocess.run('nvidia-smi --query-gpu=gpu_name,memory.total,enforced.power.limit --format=csv'.split(), capture_output=True).stdout
    system_info = [i.decode('utf-8') for i in system_info.split(b'\n')]
    system_info = [x for x in system_info if x]
    return {'system_info': system_info}

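`TensorBoardBackend.log` flattens the `HPARAMS` dictionary before handing it to `SummaryWriter.add_hparams`: list and tuple values are expanded into `key_0`, `key_1`, and so on, anything that is not an `int`, `float`, `str` or `bool` is dropped, and `None` becomes the string `'None'`. The standalone sketch below repeats that logic, applying the type filter to the already-expanded dictionary, so it can be checked in isolation; `flatten_hparams` is a hypothetical name.

```python
def flatten_hparams(data):
    # Keep non-sequence entries as they are
    params = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
    # Expand sequences: {'quantiles': [0.1, 0.9]} -> {'quantiles_0': 0.1, 'quantiles_1': 0.9}
    for k, seq in data.items():
        if isinstance(seq, (list, tuple)):
            params.update({f'{k}_{i}': v for i, v in enumerate(seq)})
    # Drop values that are not plain scalars (e.g. custom objects)
    params = {k: v for k, v in params.items() if isinstance(v, (int, float, str, bool))}
    # Represent missing values explicitly
    params.update({k: 'None' for k, v in data.items() if v is None})
    return params


print(flatten_hparams({'lr': 1e-3, 'quantiles': [0.1, 0.5, 0.9], 'model': object(), 'seed': None}))
```

Filtering `params` rather than the raw `data` is what keeps the expanded `quantiles_*` entries; filtering `data` again would silently discard them.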

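`setup_logger` never overwrites an existing log: if `<results>/<log_file>` already exists it tries `name_0.ext`, `name_1.ext`, and so on (or `name.0`, `name.1` when the file name has no extension) until it finds a free path. The same search as a hypothetical standalone helper, `next_free_log_path`, exercised in a temporary directory:

```python
import itertools
import os
import tempfile

def next_free_log_path(results_dir, log_file):
    log_path = os.path.join(results_dir, log_file)
    if not os.path.exists(log_path):
        return log_path
    parts = log_file.split('.')
    for i in itertools.count():
        if len(parts) > 1:
            # 'dllogger.json' -> 'dllogger_0.json', 'dllogger_1.json', ...
            fname = '.'.join(parts[:-1]) + f'_{i}.' + parts[-1]
        else:
            # 'log' -> 'log.0', 'log.1', ...
            fname = log_file + f'.{i}'
        candidate = os.path.join(results_dir, fname)
        if not os.path.exists(candidate):
            return candidate


with tempfile.TemporaryDirectory() as d:
    print(os.path.basename(next_free_log_path(d, 'dllogger.json')))  # dllogger.json
    open(os.path.join(d, 'dllogger.json'), 'w').close()
    print(os.path.basename(next_free_log_path(d, 'dllogger.json')))  # dllogger_0.json
```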
@ -0,0 +1,367 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from typing import Dict, Tuple, Optional, List
if os.environ.get("TFT_SCRIPTING", False):
    from torch.nn import LayerNorm
else:
    from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm


class MaybeLayerNorm(nn.Module):
    def __init__(self, output_size, hidden_size, eps):
        super().__init__()
        if output_size and output_size == 1:
            self.ln = nn.Identity()
        else:
            self.ln = LayerNorm(output_size if output_size else hidden_size, eps=eps)

    def forward(self, x):
        return self.ln(x)


class GLU(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.lin = nn.Linear(hidden_size, output_size * 2)

    def forward(self, x: Tensor) -> Tensor:
        x = self.lin(x)
        x = F.glu(x)
        return x


class GRN(nn.Module):
    def __init__(self,
                 input_size,
                 hidden_size,
                 output_size=None,
                 context_hidden_size=None,
                 dropout=0):
        super().__init__()

        self.layer_norm = MaybeLayerNorm(output_size, hidden_size, eps=1e-3)
        self.lin_a = nn.Linear(input_size, hidden_size)
        if context_hidden_size is not None:
            self.lin_c = nn.Linear(context_hidden_size, hidden_size, bias=False)
        self.lin_i = nn.Linear(hidden_size, hidden_size)
        self.glu = GLU(hidden_size, output_size if output_size else hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(input_size, output_size) if output_size else None

    def forward(self, a: Tensor, c: Optional[Tensor] = None):
        x = self.lin_a(a)
        if c is not None:
            x = x + self.lin_c(c).unsqueeze(1)
        x = F.elu(x)
        x = self.lin_i(x)
        x = self.dropout(x)
        x = self.glu(x)
        y = a if not self.out_proj else self.out_proj(a)
        x = x + y
        x = self.layer_norm(x)
        return x


class TFTEmbedding(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.s_cat_inp_lens = config.static_categorical_inp_lens
        self.t_cat_k_inp_lens = config.temporal_known_categorical_inp_lens
        self.t_cat_o_inp_lens = config.temporal_observed_categorical_inp_lens
        self.s_cont_inp_size = config.static_continuous_inp_size
        self.t_cont_k_inp_size = config.temporal_known_continuous_inp_size
        self.t_cont_o_inp_size = config.temporal_observed_continuous_inp_size
        self.t_tgt_size = config.temporal_target_size

        self.hidden_size = config.hidden_size

        # There are 7 types of input:
        # 1. Static categorical
        # 2. Static continuous
        # 3. Temporal known a priori categorical
        # 4. Temporal known a priori continuous
        # 5. Temporal observed categorical
        # 6. Temporal observed continuous
        # 7. Temporal observed targets (time series observed so far)

        self.s_cat_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.s_cat_inp_lens]) if self.s_cat_inp_lens else None
        self.t_cat_k_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.t_cat_k_inp_lens]) if self.t_cat_k_inp_lens else None
        self.t_cat_o_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.t_cat_o_inp_lens]) if self.t_cat_o_inp_lens else None

        self.s_cont_embedding_vectors = nn.Parameter(torch.Tensor(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
        self.t_cont_k_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
        self.t_cont_o_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
        self.t_tgt_embedding_vectors = nn.Parameter(torch.Tensor(self.t_tgt_size, self.hidden_size))

        self.s_cont_embedding_bias = nn.Parameter(torch.zeros(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
        self.t_cont_k_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
        self.t_cont_o_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
        self.t_tgt_embedding_bias = nn.Parameter(torch.zeros(self.t_tgt_size, self.hidden_size))

        if self.s_cont_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.s_cont_embedding_vectors)
        if self.t_cont_k_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.t_cont_k_embedding_vectors)
        if self.t_cont_o_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.t_cont_o_embedding_vectors)
        torch.nn.init.xavier_normal_(self.t_tgt_embedding_vectors)

    def _apply_embedding(self,
                         cat: Optional[Tensor],
                         cont: Optional[Tensor],
                         cat_emb: Optional[nn.ModuleList],
                         cont_emb: Tensor,
                         cont_bias: Tensor,
                         ) -> Optional[Tensor]:
        e_cat = torch.stack([embed(cat[...,i]) for i, embed in enumerate(cat_emb)], dim=-2) if cat is not None else None
        if cont is not None:
            # The line below is equivalent to the following einsums:
            # e_cont = torch.einsum('btf,fh->btfh', cont, cont_emb)  # temporal features
            # e_cont = torch.einsum('bf,fh->bfh', cont, cont_emb)    # static features
            e_cont = torch.mul(cont.unsqueeze(-1), cont_emb)
            e_cont = e_cont + cont_bias
        else:
            e_cont = None

        if e_cat is not None and e_cont is not None:
            return torch.cat([e_cat, e_cont], dim=-2)
        elif e_cat is not None:
            return e_cat
        elif e_cont is not None:
            return e_cont
        else:
            return None

    def forward(self, x: Dict[str, Tensor]):
        # temporal/static categorical/continuous known/observed input
        s_cat_inp = x.get('s_cat', None)
        s_cont_inp = x.get('s_cont', None)
        t_cat_k_inp = x.get('k_cat', None)
        t_cont_k_inp = x.get('k_cont', None)
        t_cat_o_inp = x.get('o_cat', None)
        t_cont_o_inp = x.get('o_cont', None)
        t_tgt_obs = x['target'] # Has to be present

        # Static inputs are expected to be equal for all timesteps
        # For memory efficiency there is no assert statement
        s_cat_inp = s_cat_inp[:,0,:] if s_cat_inp is not None else None
        s_cont_inp = s_cont_inp[:,0,:] if s_cont_inp is not None else None

        s_inp = self._apply_embedding(s_cat_inp,
                                      s_cont_inp,
                                      self.s_cat_embed,
                                      self.s_cont_embedding_vectors,
                                      self.s_cont_embedding_bias)
        t_known_inp = self._apply_embedding(t_cat_k_inp,
                                            t_cont_k_inp,
                                            self.t_cat_k_embed,
                                            self.t_cont_k_embedding_vectors,
                                            self.t_cont_k_embedding_bias)
        t_observed_inp = self._apply_embedding(t_cat_o_inp,
                                               t_cont_o_inp,
                                               self.t_cat_o_embed,
                                               self.t_cont_o_embedding_vectors,
                                               self.t_cont_o_embedding_bias)

        # Temporal observed targets
        # t_observed_tgt = torch.einsum('btf,fh->btfh', t_tgt_obs, self.t_tgt_embedding_vectors)
        t_observed_tgt = torch.matmul(t_tgt_obs.unsqueeze(3).unsqueeze(4), self.t_tgt_embedding_vectors.unsqueeze(1)).squeeze(3)
        t_observed_tgt = t_observed_tgt + self.t_tgt_embedding_bias

        return s_inp, t_known_inp, t_observed_inp, t_observed_tgt


class VariableSelectionNetwork(nn.Module):
    def __init__(self, config, num_inputs):
        super().__init__()
        self.joint_grn = GRN(config.hidden_size*num_inputs, config.hidden_size, output_size=num_inputs, context_hidden_size=config.hidden_size)
        self.var_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(num_inputs)])

    def forward(self, x: Tensor, context: Optional[Tensor] = None):
        Xi = x.reshape(*x.shape[:-2], -1)
        grn_outputs = self.joint_grn(Xi, c=context)
        sparse_weights = F.softmax(grn_outputs, dim=-1)
        transformed_embed_list = [m(x[...,i,:]) for i, m in enumerate(self.var_grns)]
        transformed_embed = torch.stack(transformed_embed_list, dim=-1)
        # The line below performs batched matrix-vector multiplication:
        # for temporal features it's bthf,btf->bth
        # for static features it's bhf,bf->bh
        variable_ctx = torch.matmul(transformed_embed, sparse_weights.unsqueeze(-1)).squeeze(-1)

        return variable_ctx, sparse_weights


class StaticCovariateEncoder(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.vsn = VariableSelectionNetwork(config, config.num_static_vars)
        self.context_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(4)])

    def forward(self, x: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
        variable_ctx, sparse_weights = self.vsn(x)

        # Context vectors, in the order they are returned:
        # variable selection context (cs)
        # enrichment context (ce)
        # LSTM initial hidden state context (ch)
        # LSTM initial cell state context (cc)
        cs, ce, ch, cc = tuple(m(variable_ctx) for m in self.context_grns)

        return cs, ce, ch, cc


class InterpretableMultiHeadAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        assert config.hidden_size % config.n_head == 0
        self.d_head = config.hidden_size // config.n_head
        self.qkv_linears = nn.Linear(config.hidden_size, (2 * self.n_head + 1) * self.d_head, bias=False)
        self.out_proj = nn.Linear(self.d_head, config.hidden_size, bias=False)
        self.attn_dropout = nn.Dropout(config.attn_dropout)
        self.out_dropout = nn.Dropout(config.dropout)
        self.scale = self.d_head**-0.5
        self.register_buffer("_mask", torch.triu(torch.full((config.example_length, config.example_length), float('-inf')), 1).unsqueeze(0))

    def forward(self, x: Tensor, mask_future_timesteps: bool = True) -> Tuple[Tensor, Tensor]:
        bs, t, h_size = x.shape
        qkv = self.qkv_linears(x)
        q, k, v = qkv.split((self.n_head * self.d_head, self.n_head * self.d_head, self.d_head), dim=-1)
        q = q.view(bs, t, self.n_head, self.d_head)
        k = k.view(bs, t, self.n_head, self.d_head)
        v = v.view(bs, t, self.d_head)

        # attn_score = torch.einsum('bind,bjnd->bnij', q, k)
        attn_score = torch.matmul(q.permute((0, 2, 1, 3)), k.permute((0, 2, 3, 1)))
        attn_score.mul_(self.scale)

        if mask_future_timesteps:
            attn_score = attn_score + self._mask

        attn_prob = F.softmax(attn_score, dim=3)
        attn_prob = self.attn_dropout(attn_prob)

        # attn_vec = torch.einsum('bnij,bjd->bnid', attn_prob, v)
        attn_vec = torch.matmul(attn_prob, v.unsqueeze(1))
        m_attn_vec = torch.mean(attn_vec, dim=1)
        out = self.out_proj(m_attn_vec)
        out = self.out_dropout(out)

        return out, attn_vec


class TemporalFusionTransformer(nn.Module):
    """
    Implementation of https://arxiv.org/abs/1912.09363
    """
    def __init__(self, config):
        super().__init__()

        if hasattr(config, 'model'):
            config = config.model

        self.encoder_length = config.encoder_length # number of past time steps fed to the encoder

        self.embedding = TFTEmbedding(config)
        self.static_encoder = StaticCovariateEncoder(config)

        self.history_vsn = VariableSelectionNetwork(config, config.num_historic_vars)
        self.history_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.future_vsn = VariableSelectionNetwork(config, config.num_future_vars)
        self.future_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)

        self.input_gate = GLU(config.hidden_size, config.hidden_size)
        self.input_gate_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.enrichment_grn = GRN(config.hidden_size,
                                  config.hidden_size,
                                  context_hidden_size=config.hidden_size,
                                  dropout=config.dropout)
        self.attention = InterpretableMultiHeadAttention(config)
        self.attention_gate = GLU(config.hidden_size, config.hidden_size)
        self.attention_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.positionwise_grn = GRN(config.hidden_size,
                                    config.hidden_size,
                                    dropout=config.dropout)

        self.decoder_gate = GLU(config.hidden_size, config.hidden_size)
        self.decoder_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.quantile_proj = nn.Linear(config.hidden_size, len(config.quantiles))

    def forward(self, x: Dict[str, Tensor]) -> Tensor:
        s_inp, t_known_inp, t_observed_inp, t_observed_tgt = self.embedding(x)

        # Static context
        cs, ce, ch, cc = self.static_encoder(s_inp)
        ch, cc = ch.unsqueeze(0), cc.unsqueeze(0) # LSTM initial states

        # Temporal input
        _historical_inputs = [t_known_inp[:,:self.encoder_length,:], t_observed_tgt[:,:self.encoder_length,:]]
        if t_observed_inp is not None:
            _historical_inputs.insert(0,t_observed_inp[:,:self.encoder_length,:])

        historical_inputs = torch.cat(_historical_inputs, dim=-2)
        future_inputs = t_known_inp[:, self.encoder_length:]

        # Encoders
        historical_features, _ = self.history_vsn(historical_inputs, cs)
        history, state = self.history_encoder(historical_features, (ch, cc))
        future_features, _ = self.future_vsn(future_inputs, cs)
        future, _ = self.future_encoder(future_features, state)
        torch.cuda.synchronize() # this call gives perf boost for unknown reasons

        # skip connection
        input_embedding = torch.cat([historical_features, future_features], dim=1)
        temporal_features = torch.cat([history, future], dim=1)
        temporal_features = self.input_gate(temporal_features)
        temporal_features = temporal_features + input_embedding
        temporal_features = self.input_gate_ln(temporal_features)

        # Static enrichment
        enriched = self.enrichment_grn(temporal_features, c=ce)

        # Temporal self attention
        x, _ = self.attention(enriched, mask_future_timesteps=True)

        # Don't compute historical quantiles
        x = x[:, self.encoder_length:, :]
        temporal_features = temporal_features[:, self.encoder_length:, :]
        enriched = enriched[:, self.encoder_length:, :]

        x = self.attention_gate(x)
        x = x + enriched
        x = self.attention_ln(x)

        # Position-wise feed-forward
        x = self.positionwise_grn(x)

        # Final skip connection
        x = self.decoder_gate(x)
        x = x + temporal_features
        x = self.decoder_ln(x)

        out = self.quantile_proj(x)

        return out

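`InterpretableMultiHeadAttention` differs from standard multi-head attention in two ways visible in the code: every head shares one value projection (`v` has a single `d_head`-sized slice), and the per-head outputs are averaged instead of concatenated. Because the values are shared, averaging the head outputs gives the same result as applying the head-averaged attention weights to `v`, so one averaged attention pattern describes the whole layer. The NumPy sketch below checks that identity on random inputs; it omits the q/k/v and output projections and dropout, takes q and k already laid out as (batch, heads, time, d_head), and `interpretable_attention` is a hypothetical name.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries become exactly 0
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def interpretable_attention(q, k, v, causal=True):
    """q, k: (batch, heads, time, d_head); v: (batch, time, d_head), shared by all heads."""
    d_head = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)   # (batch, heads, time, time)
    if causal:
        t = scores.shape[-1]
        # -inf strictly above the diagonal, 0 elsewhere (same pattern as the _mask buffer)
        scores = scores + np.triu(np.full((t, t), -np.inf), 1)
    probs = softmax(scores, axis=-1)
    per_head = probs @ v[:, None]        # shared values broadcast over heads
    return per_head.mean(axis=1), probs  # average over heads instead of concatenating

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 4, 5, 3))
k = rng.normal(size=(2, 4, 5, 3))
v = rng.normal(size=(2, 5, 3))
out, probs = interpretable_attention(q, k, v)
# Averaging head outputs == applying the head-averaged attention weights to v
print(np.allclose(out, probs.mean(axis=1) @ v))  # True
```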

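`TFTEmbedding._apply_embedding` embeds each continuous feature by scaling a learned per-feature vector, `cont.unsqueeze(-1) * cont_emb`, instead of calling `torch.einsum`. The check below confirms that this broadcasted product matches `einsum('btf,fh->btfh', ...)` for temporal inputs and `einsum('bf,fh->bfh', ...)` for static inputs. NumPy stands in for PyTorch here because both follow the same broadcasting rules, and the shapes are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)
b, t, f, h = 2, 3, 4, 5
cont_emb = rng.normal(size=(f, h))   # one learned vector per continuous feature
cont_bias = rng.normal(size=(f, h))  # one learned bias per continuous feature

# Temporal features: (batch, time, features) -> (batch, time, features, hidden)
cont_t = rng.normal(size=(b, t, f))
e_bcast = cont_t[..., None] * cont_emb + cont_bias
e_einsum = np.einsum('btf,fh->btfh', cont_t, cont_emb) + cont_bias
print(e_bcast.shape, np.allclose(e_bcast, e_einsum))  # (2, 3, 4, 5) True

# Static features: (batch, features) -> (batch, features, hidden)
cont_s = rng.normal(size=(b, f))
s_bcast = cont_s[..., None] * cont_emb + cont_bias
s_einsum = np.einsum('bf,fh->bfh', cont_s, cont_emb) + cont_bias
print(s_bcast.shape, np.allclose(s_bcast, s_einsum))  # (2, 4, 5) True
```

The feature axis ends up just before the hidden axis, which is the layout `VariableSelectionNetwork` relies on when it indexes `x[...,i,:]`.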
@ -0,0 +1 @@
tensorboard


@ -0,0 +1,54 @@
#! /bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
NUM_GPUS=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)

[ $NUM_GPUS -eq 16 ] && WORKER_NUMS=(1 8 16) || WORKER_NUMS=(1 8)
DATASETS=(electricity traffic)

rm -rf /tmp/benchmark_results
for DATASET in ${DATASETS[@]}
do
    for NGPU in ${WORKER_NUMS[@]}
    do
        for BATCH_SIZE in 512 1024 1536 2048 2560
        do
            for USE_AMP in --use_amp ""
            do
                for AFFINITY in "--affinity disabled" "--affinity single" "--affinity socket_unique_interleaved"
                do
                    # Replace spaces so the affinity flag does not split the results path into two arguments
                    EXP_NAME="TFT_benchmark_${DATASET}_BS_${BATCH_SIZE}_${NGPU}GPU${USE_AMP}_${AFFINITY// /_}"
                    python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
                        --dataset ${DATASET} \
                        --data_path /data/processed/${DATASET}_bin \
                        --batch_size=${BATCH_SIZE} \
                        --lr 5e-4 \
                        --epochs 1 \
                        --sample 100000 5000 \
                        --seed 1 \
                        ${USE_AMP} \
                        ${AFFINITY} \
                        --clip_grad 0.1 \
                        --results "/tmp/benchmark_results/${EXP_NAME}"
                done
            done
        done
    done
done

for P in `ls /tmp/benchmark_results/`;
do
    echo ${P}
    tail -n 1 /tmp/benchmark_results/${P}/dllogger.json
done

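The benchmark script ends by printing the last line of each run's `dllogger.json`. When comparing many runs it can be easier to load those lines into Python. A minimal sketch, assuming each log line holds one JSON object that may be preceded by a short text prefix (the parser simply starts at the first `{`); `collect_results` and `last_json_record` are hypothetical helpers, and the `PREFIX` tag and record in the usage example are made up rather than real dllogger output.

```python
import json
import os
import tempfile
from pathlib import Path

def last_json_record(log_path):
    """Parse the last non-empty line of a JSON-lines log, skipping any prefix before '{'."""
    lines = [line for line in Path(log_path).read_text().splitlines() if line.strip()]
    last = lines[-1]
    return json.loads(last[last.index('{'):])

def collect_results(results_root):
    """Map each experiment directory name to the final record of its dllogger.json."""
    results = {}
    for exp_dir in sorted(Path(results_root).iterdir()):
        log = exp_dir / 'dllogger.json'
        if log.is_file():
            results[exp_dir.name] = last_json_record(log)
    return results

# Usage with a made-up run directory
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'run_a'))
with open(os.path.join(root, 'run_a', 'dllogger.json'), 'w') as f:
    f.write('PREFIX {"data": {"throughput": 1.0}}\n')
    f.write('PREFIX {"data": {"throughput": 2.0}}\n')
print(collect_results(root)['run_a']['data']['throughput'])  # 2.0
```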

@ -0,0 +1,40 @@
#!/bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
DATAPATH='/data'
declare -A URLS=( ['electricity']='https://archive.ics.uci.edu/ml/machine-learning-databases/00321/LD2011_2014.txt.zip'
['traffic']='https://archive.ics.uci.edu/ml/machine-learning-databases/00204/PEMS-SF.zip'
)
mkdir -p ${DATAPATH}/raw
mkdir -p ${DATAPATH}/processed
for DS in electricity traffic
do
DS_PATH=${DATAPATH}/raw/${DS}
ZIP_FNAME=${DS_PATH}.zip
if [ ! -d ${DS_PATH} ]
then
wget "${URLS[${DS}]}" -O ${ZIP_FNAME}
unzip ${ZIP_FNAME} -d ${DS_PATH}
fi
python -c "from data_utils import standarize_${DS} as standarize; standarize(\"${DS_PATH}\")"
python -c "from data_utils import preprocess; \
from configuration import ${DS^}Config as Config; \
preprocess(\"${DS_PATH}/standarized.csv\", \"${DATAPATH}/processed/${DS}_bin\", Config())"
done

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,36 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.06-py3
FROM ${FROM_IMAGE_NAME}
RUN apt-get update && apt-get install -y libb64-dev libb64-0d
WORKDIR /workspace
#ENV PYTHONPATH /workspace
RUN pip uninstall -y typing
RUN apt update && apt install -y p7zip-full
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --ignore-installed -r requirements.txt
RUN pip install --no-cache-dir -e git+https://github.com/NVIDIA/dllogger#egg=dllogger
COPY . .
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
# AMP monkey-patch
RUN sed -i 's/ def forward(ctx,/ @amp.custom_fwd\(cast_inputs=torch.float32\)\n def forward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/ def backward(ctx,/ @amp.custom_bwd\n def backward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/^import torch$/import torch\nfrom torch.cuda import amp/' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@ -0,0 +1,25 @@
Individual Contributor License Agreement (CLA)
Thank you for submitting your contributions to this project.
By signing this CLA, you agree that the following terms apply to all of your past, present and future contributions to the project.
License.
You hereby represent that all present, past and future contributions are governed by the Apache 2.0 License copyright statement.
This entails that to the extent possible under law, you transfer all copyright and related or neighboring rights of the code or documents you contribute to the project itself or its maintainers. Furthermore you also represent that you have the authority to perform the above waiver with respect to the entirety of your contributions.
Moral Rights.
To the fullest extent permitted under applicable law, you hereby waive, and agree not to assert, all of your “moral rights” in or relating to your contributions for the benefit of the project.
Third Party Content.
If your Contribution includes or is based on any source code, object code, bug fixes, configuration changes, tools, specifications, documentation, data, materials, feedback, information or other works of authorship that were not authored by you (“Third Party Content”) or if you are aware of any third party intellectual property or proprietary rights associated with your Contribution (“Third Party Rights”), then you agree to include with the submission of your Contribution full details respecting such Third Party Content and Third Party Rights, including, without limitation, identification of which aspects of your Contribution contain Third Party Content or are associated with Third Party Rights, the owner/author of the Third Party Content and Third Party Rights, where you obtained the Third Party Content, and any applicable third party license terms or restrictions respecting the Third Party Content and Third Party Rights. For greater certainty, the foregoing obligations respecting the identification of Third Party Content and Third Party Rights do not apply to any portion of a Project that is incorporated into your Contribution to that same Project.
Representations.
You represent that, other than the Third Party Content and Third Party Rights identified by you in accordance with this Agreement, you are the sole author of your Contributions and are legally entitled to grant the foregoing licenses and waivers in respect of your Contributions. If your Contributions were created in the course of your employment with your past or present employer(s), you represent that such employer(s) has authorized you to make your Contributions on behalf of such employer(s) or such employer(s) has waived all of their right, title or interest in or to your Contributions.
Disclaimer.
To the fullest extent permitted under applicable law, your Contributions are provided on an "as is" basis, without any warranties or conditions, express or implied, including, without limitation, any implied warranties or conditions of non-infringement, merchantability or fitness for a particular purpose. You are not required to provide support for your Contributions, except to the extent you desire to provide support.
No Obligation.
You acknowledge that the maintainers of this project are under no obligation to use or incorporate your contributions into the project. The decision to use or incorporate your contributions into the project will be made at the sole discretion of the maintainers or their authorized delegates.

@ -0,0 +1,3 @@
TFT for PyTorch
This repository includes software from https://github.com/google-research/google-research/tree/master/tft licensed under the Apache License, Version 2.0

@ -0,0 +1,465 @@
# Temporal Fusion Transformer For PyTorch
This repository provides a script and recipe to train the Temporal Fusion Transformer model to achieve state-of-the-art accuracy. The content of this repository is tested and maintained by NVIDIA.
## Table Of Contents
- [Model overview](#model-overview)
    * [Model architecture](#model-architecture)
    * [Default configuration](#default-configuration)
    * [Feature support matrix](#feature-support-matrix)
        * [Features](#features)
    * [Mixed precision training](#mixed-precision-training)
        * [Enabling mixed precision](#enabling-mixed-precision)
        * [Enabling TF32](#enabling-tf32)
    * [Glossary](#glossary)
- [Setup](#setup)
    * [Requirements](#requirements)
- [Quick Start Guide](#quick-start-guide)
- [Advanced](#advanced)
    * [Scripts and sample code](#scripts-and-sample-code)
    * [Command-line options](#command-line-options)
    * [Getting the data](#getting-the-data)
        * [Dataset guidelines](#dataset-guidelines)
        * [Multi-dataset](#multi-dataset)
    * [Training process](#training-process)
    * [Inference process](#inference-process)
- [Performance](#performance)
    * [Benchmarking](#benchmarking)
        * [Training performance benchmark](#training-performance-benchmark)
        * [Inference performance benchmark](#inference-performance-benchmark)
    * [Results](#results)
        * [Training accuracy results](#training-accuracy-results)
            * [Training accuracy: NVIDIA DGX A100 (8x A100 80GB)](#training-accuracy-nvidia-dgx-a100-8x-a100-80gb)
            * [Training accuracy: NVIDIA DGX-1 (8x V100 16GB)](#training-accuracy-nvidia-dgx-1-8x-v100-16gb)
            * [Training stability test](#training-stability-test)
        * [Training performance results](#training-performance-results)
            * [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
            * [Training performance: NVIDIA DGX-1 (8x V100 16GB)](#training-performance-nvidia-dgx-1-8x-v100-16gb)
- [Release notes](#release-notes)
    * [Changelog](#changelog)
    * [Known issues](#known-issues)
## Model overview
The Temporal Fusion Transformer [TFT](https://arxiv.org/abs/1912.09363) model is a state-of-the-art architecture for interpretable, multi-horizon time-series prediction. The model was first developed and [implemented by Google](https://github.com/google-research/google-research/tree/master/tft) in collaboration with the University of Oxford.
This implementation differs from the reference implementation by addressing the issue of missing data, which is common in production datasets. Missing values are either masked in the attention matrices or embedded as a special value in the latent space.
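
The two strategies for missing data can be illustrated with the following minimal sketch in plain Python. It is not the code used in this repository. The names `masked_softmax`, `encode_category`, and the reserved `MISSING_INDEX` row are hypothetical and exist only for illustration. Masking gives unobserved timesteps zero attention weight, which is equivalent to setting their scores to minus infinity before a standard softmax. The special-value approach reserves one row of an embedding table for "missing".

```python
import math

MISSING_INDEX = 0  # hypothetical reserved row of a categorical embedding table


def masked_softmax(scores, observed):
    """Attention weights that ignore unobserved timesteps.

    Equivalent to replacing masked scores with -inf before a regular softmax.
    """
    kept = [s for s, ok in zip(scores, observed) if ok]
    if not kept:
        raise ValueError("at least one timestep must be observed")
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, observed)]
    total = sum(exps)
    return [e / total for e in exps]


def encode_category(value, vocab):
    """Map a category to an embedding row; a missing value (None) gets the reserved row."""
    if value is None:
        return MISSING_INDEX
    return vocab[value] + 1  # shift real categories past the reserved row
```

For example, `masked_softmax([1.0, 5.0, 3.0], [True, False, True])` assigns weight 0 to the middle timestep despite its high score.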
This model enables the prediction of confidence intervals for future values of time series for multiple future timesteps.
This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.45x faster than training without Tensor Cores while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### Model architecture
The TFT model is a hybrid architecture that joins LSTM encoding of time series with the interpretability of transformer attention layers. Prediction is based on three types of variables: static (constant for a given time series), known (known in advance for the whole history and future), and observed (known only for historical data). All these variables come in two flavors: categorical and continuous. In addition to these variables, we feed the model with the historical values of the time series itself. All variables are embedded in a high-dimensional space by learning an embedding vector. Categorical variable embeddings are learned in the classical sense of embedding discrete values. For each continuous variable, the model learns a single vector, which is then scaled by that variable's value for further processing. The next step is to filter variables through the Variable Selection Network (VSN), which assigns weights to the inputs according to their relevance to the prediction. Static variables are used as a context for the variable selection of other variables and as an initial state of the LSTM encoders.
After encoding, variables are passed to multi-head attention layers (decoder), which produce the final prediction. The whole architecture is interwoven with gated residual connections, which allow it to adapt to various problems by skipping some of its parts.
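
The continuous-variable embedding and the variable selection step can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not this repository's implementation. In TFT, each embedding is also processed by its own gated residual network, and the selection logits come from a network conditioned on all inputs and the static context. Here the logits are passed in directly, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def embed_continuous(value, learned_vector):
    """Continuous input -> embedding: one learned vector per variable, scaled by the value."""
    return [value * w for w in learned_vector]


def select_variables(embeddings, selection_logits):
    """Weight per-variable embeddings by softmax(selection_logits) and sum them.

    Returns (weights, combined). The weights are the VSN activations that show
    how relevant each variable is to the prediction.
    """
    weights = softmax(selection_logits)
    dim = len(embeddings[0])
    combined = [sum(w * emb[j] for w, emb in zip(weights, embeddings)) for j in range(dim)]
    return weights, combined
```

With equal logits, every variable receives the same weight, and the combined vector is the plain average of the embeddings.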
For the sake of explainability, the heads of the self-attention layers share value matrices. This allows interpreting self-attention as an ensemble of models predicting different temporal patterns over the same feature set. Another feature that helps us understand the model is the VSN activations, which tell us how relevant a given feature is to the prediction.
![](TFT_architecture.PNG)
*image source: https://arxiv.org/abs/1912.09363*
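
The effect of sharing value matrices across attention heads can be checked with a small plain-Python sketch. It is an illustration only. It omits the query/key projections and the final output projection, and it assumes `shared_values` has already been multiplied by the shared value weights. Because every head uses the same values, averaging the head outputs equals multiplying the averaged attention matrix by the values. That averaged matrix is therefore a single attention pattern that can be inspected directly. All function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mean_of(matrices):
    """Element-wise mean of equally shaped matrices."""
    n = len(matrices)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*matrices)]


def interpretable_heads(head_attention, shared_values):
    """Combine attention heads that share one value matrix.

    Returns (output, combined_attention), where output == combined_attention @ V.
    """
    combined = mean_of(head_attention)
    return matmul(combined, shared_values), combined
```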
### Default configuration
The specific configuration of the TFT model depends on the dataset used. Not only the model size but also the data sampling and preprocessing strategies vary between datasets. During preprocessing, data is normalized per feature. For some datasets, we also apply per-time-series scaling, which takes into account shifts in distribution between entities (e.g., a factory consumes more electricity than an average house). The model is trained with the quantile loss:
$$\sum_{i=1}^{N}\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{N t_{max}}$$
where $QL(y,\hat{y},q) = q (y-\hat{y})_+ + (1-q) (\hat{y}-y)_+$ with $(x)_+ = \max(x, 0)$, and $\mathcal{Q}$ is the set of quantiles [0.1, 0.5, 0.9]. The default configurations are tuned for distributed training on DGX-1-32G with mixed precision. We use dynamic loss scaling. Specific values are provided in the table below.
| Dataset | Training samples | Validation samples | Test samples | History length | Forecast horizon | Dropout | Hidden size | #Heads | Batch size | LR | Gradient clipping |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 450k | 50k | 53.5k | 168 | 24 | 0.1 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
| Traffic | 450k | 50k | 139.6k | 168 | 24 | 0.3 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
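
The training objective above can be written out as a minimal plain-Python sketch. It is only for illustration. The training script computes the loss on tensors, and the function names here are hypothetical. `quantile_loss` uses the identity `max(q*d, (q-1)*d) == q*max(d, 0) + (1-q)*max(-d, 0)` for `d = y - y_hat`.

```python
def quantile_loss(y, y_hat, q):
    """Pinball loss QL(y, y_hat, q) = q*max(y - y_hat, 0) + (1 - q)*max(y_hat - y, 0)."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)


def tft_quantile_objective(targets, predictions, quantiles=(0.1, 0.5, 0.9)):
    """Sum QL over series i, quantiles q and horizon steps t, divided by N * t_max.

    targets[i][t] is y_it; predictions[i][t][k] is the forecast for quantiles[k].
    """
    n_series = len(targets)
    t_max = len(targets[0])
    total = 0.0
    for i in range(n_series):
        for t in range(t_max):
            for k, q in enumerate(quantiles):
                total += quantile_loss(targets[i][t], predictions[i][t][k], q)
    return total / (n_series * t_max)
```

Under-predicting penalizes the 0.9 quantile nine times more than over-predicting does. This asymmetry pushes the 0.1 and 0.9 outputs toward the edges of a prediction interval.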
### Feature support matrix
The following features are supported by this model:
| Feature | Supported |
|----------------------------|--------------------------|
| Distributed data parallel | Yes |
| PyTorch AMP | Yes |
#### Features
[Automatic Mixed Precision](https://pytorch.org/docs/stable/amp.html)
provides an easy way to leverage Tensor Core performance. It allows the execution of parts of a network in lower precision. Refer to [Mixed precision training](#mixed-precision-training) for more information.
[PyTorch
DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) - a module
wrapper that enables easy multiprocess distributed data-parallel
training.
### Mixed precision training
Mixed precision is the combined use of different numerical precisions in a
computational method.
[Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant
computational speedup by performing operations in half-precision format while
storing minimal information in single-precision to retain as much information
as possible in critical parts of the network. Since the introduction of [Tensor Cores](https://developer.nvidia.com/tensor-cores) in Volta, and continuing with
the Turing and Ampere architectures, switching to mixed precision has delivered
significant training speedups: up to 3x overall on the most arithmetically intense
model architectures. Using mixed precision training previously required two
steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Manually adding loss scaling to preserve small gradient values.
The ability to train deep learning networks with lower precision was introduced
in the Pascal architecture and first supported in [CUDA
8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep
Learning SDK.
For information about:
* How to train using mixed precision, refer to the [Mixed Precision
Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed
Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html)
documentation.
* Techniques used for mixed precision training, refer to the [Mixed-Precision
Training of Deep Neural
Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/)
blog.
* APEX tools for mixed precision training, refer to the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in
PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/) blog post.
#### Enabling mixed precision
Mixed precision is enabled in PyTorch by using the Automatic Mixed Precision `torch.cuda.amp` module. Its `autocast` context runs eligible operations in half precision while the model parameters remain stored in single precision. Furthermore, to preserve small gradient magnitudes in backpropagation, a [loss scaling](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#lossscaling) step must be included when applying gradients. In PyTorch, loss scaling can be applied automatically by the `GradScaler` class. All the necessary steps to implement AMP are described in detail in the [PyTorch AMP examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples).
To enable mixed precision for TFT, simply add the `--use_amp` option to the training script.
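
Why loss scaling is needed, and how a dynamic scale adapts, can be shown with a toy plain-Python sketch. It is an illustration only, not `GradScaler`'s implementation. It uses the standard-library `struct` half-precision format (`"e"`) to emulate FP16 rounding. The sketch multiplies gradients by the scale directly, which by linearity matches scaling the loss before backpropagation. The constructor defaults mirror the documented defaults of `torch.cuda.amp.GradScaler`. The class and function names are hypothetical.

```python
import math
import struct


def to_fp16(x):
    """Round x to IEEE 754 half precision; magnitudes too large become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


class ToyDynamicLossScaler:
    """Toy dynamic loss scaling on plain floats (an illustration, not GradScaler)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscaled_grads(self, true_grads):
        """Emulate FP16 gradients of the scaled loss, then unscale them.

        Returns None (meaning: skip the optimizer step) if any FP16 value overflowed.
        """
        scale = self.scale
        fp16_grads = [to_fp16(g * scale) for g in true_grads]
        if not all(math.isfinite(g) for g in fp16_grads):
            self.scale *= self.backoff_factor  # overflow: shrink the scale
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            self.scale *= self.growth_factor  # long run without overflow: grow
            self._good_steps = 0
        return [g / scale for g in fp16_grads]
```

A gradient of `1e-8` underflows to zero in FP16 (`to_fp16(1e-8) == 0.0`). Scaled by 2^16, it survives and is recovered almost exactly after unscaling. A gradient of `1.0` overflows at that scale, so the step is skipped and the scale is halved.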
#### Enabling TF32
TensorFloat-32 (TF32) is the new math mode in [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs.
TF32 Tensor Cores can speed up networks using FP32, typically with no loss of accuracy. It is more robust than FP16 for models which require high dynamic range for weights or activations.
For more information, refer to the [TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) blog post.
TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
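
The trade-off that TF32 makes can be illustrated in plain Python. TF32 keeps the 8-bit exponent of FP32, so it has the same dynamic range, but only 10 explicit mantissa bits, the same as FP16. The sketch below only emulates the rounding on an FP32 bit pattern. It uses round-half-away-from-zero as a simplification and does not reproduce how Tensor Cores round or accumulate. The function names are hypothetical.

```python
import struct


def to_tf32(x):
    """Round a float to TF32 precision (FP32 exponent range, 10 explicit mantissa bits).

    Adds half of the dropped range to the FP32 bit pattern, then clears the
    13 low mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def fits_fp16(x):
    """True if x can be stored in FP16 without overflowing to infinity."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False
```

For example, `1e30` keeps roughly three significant decimal digits in TF32 but does not fit in FP16 at all. This is why TF32 is more robust for models that need a high dynamic range.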
### Glossary
**Multi horizon prediction**
Process of estimating values of a time series for multiple future time steps.
**Quantiles**
Cut points dividing the range of a probability distribution into intervals with equal probabilities.
**Time series**
Series of data points indexed and equally spaced in time.
**Transformer**
The paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduces a novel architecture called Transformer that uses an attention mechanism and transforms one sequence into another.
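
The **Quantiles** entry above can be made concrete with Python's standard `statistics.quantiles` function. The toy data is only an illustration. The deciles at positions 0, 4 and 8 correspond to the 0.1, 0.5 and 0.9 levels that the default TFT configuration predicts.

```python
import statistics

data = list(range(1, 11))  # toy sample: 1, 2, ..., 10

# n=4 returns the 3 cut points (quartiles) splitting the sample into 4 equal-probability groups.
quartiles = statistics.quantiles(data, n=4)

# n=10 returns 9 deciles; indices 0, 4 and 8 are the 0.1, 0.5 and 0.9 quantiles.
deciles = statistics.quantiles(data, n=10)
q10, q50, q90 = deciles[0], deciles[4], deciles[8]
```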
## Setup
The following section lists the requirements that you need to meet in order to start training the TFT model.
### Requirements
This repository contains a Dockerfile, which extends the PyTorch NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
- [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
- Supported GPUs:
- [NVIDIA Volta architecture](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
- [NVIDIA Turing architecture](https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/)
- [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
- Running [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running)
If you are unable to use the PyTorch NGC container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html) to set up the required environment or create your own container.
## Quick Start Guide
To train your model using mixed or TF32 precision with Tensor Cores, perform the following steps using the default parameters of the TFT model on any of the benchmark datasets. For the specifics concerning training and inference, refer to the [Advanced](#advanced) section.
1. Clone the repository.
```bash
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Forecasting/TFT
```
2. Build the TFT PyTorch NGC container.
```bash
docker build --network=host -t tft .
```
3. Start an interactive session in the NGC container to run training/inference.
```bash
docker run -it --rm --ipc=host --network=host --gpus all -v /path/to/your/data:/data/ tft
```
Note: Ensure that you mount your dataset using the `-v` flag so that it is available for training inside the NVIDIA Docker container.
4. Download and preprocess datasets.
```bash
bash scripts/get_data.sh
```
5. Start training. Choose one of the scripts provided in the `scripts/` directory. Results are stored in the `/results` directory.
These scripts are tuned for DGX-1 32GB. If you have a different system, use the `NGPU` and `BATCH_SIZE` variables to adjust the parameters for your system.
```bash
bash scripts/run_electricity.sh
bash scripts/run_traffic.sh
```
6. Start validation/evaluation. The metric we use for evaluation is q-risk. We can compare it per-quantile in the Pareto sense or jointly as one number indicating accuracy.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin
```
7. Start inference/predictions. Visualize and save predictions by running the following command.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin \
--visualize \
--save_predictions
```
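Step 6 mentions two ways to compare q-risk results: per quantile in the Pareto sense, or jointly as one number. The sketch below shows both. The joint number is an assumption. The *Mean q-risk* column of the stability table (0.1131 for electricity) is close to the sum of the three per-quantile values in the accuracy tables (for example, 0.027 + 0.056 + 0.028 = 0.111), so the sketch takes the sum. The function names are hypothetical.

```python
from typing import Sequence

def pareto_dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if q-risk vector `a` is no worse than `b` at every quantile and
    strictly better at one or more of them (lower q-risk is better)."""
    if len(a) != len(b):
        raise ValueError("q-risk vectors must cover the same quantiles")
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def joint_q_risk(q_risks: Sequence[float]) -> float:
    # Assumption: the single-number summary is the sum of per-quantile q-risks.
    return float(sum(q_risks))

# Per-quantile q-risks from the DGX A100, 8-GPU electricity row of the accuracy table.
tf32 = [0.027, 0.056, 0.028]
amp = [0.026, 0.054, 0.029]
print(pareto_dominates(tf32, amp), pareto_dominates(amp, tf32))  # False False
print(round(joint_q_risk(tf32), 3), round(joint_q_risk(amp), 3))  # 0.111 0.109
```

Neither run dominates the other quantile by quantile, yet the joint number still ranks them. The per-quantile view is therefore the stricter comparison.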
Now that you have your model trained and evaluated, you can compare your training results with our [Training accuracy results](#training-accuracy-results). You can also benchmark your performance against the [Training performance benchmark](#training-performance-results). Following the steps in these sections should allow you to reproduce the accuracy and performance results stated in the [Results](#results) section.
## Advanced
The following sections provide more details about the dataset, running training and inference, and the training results.
### Scripts and sample code
In the root directory, the most important files are:
- `train.py`: Entry point for training
- `data_utils.py`: File containing the dataset implementation and preprocessing functions
- `modeling.py`: Definition of the model
- `configuration.py`: Contains configuration classes for various experiments
- `test.py`: Entry point for testing a trained model
- `Dockerfile`: Container definition
- `log_helper.py`: Contains helper functions for setting up dllogger
- `criterions.py`: Definitions of loss functions

The `scripts` directory contains scripts for default use cases:
- `run_electricity.sh`: Train the default model on the electricity dataset
- `run_traffic.sh`: Train the default model on the traffic dataset
### Command-line options
To view the full list of available options and their descriptions, use the `-h` or `--help` command-line option, for example:
`python train.py --help`.
Running `python train.py --help` prints the following output:
```
usage: train.py [-h] --data_path DATA_PATH --dataset {electricity,volatility,traffic,favorita} [--epochs EPOCHS] [--sample_data SAMPLE_DATA SAMPLE_DATA] [--batch_size BATCH_SIZE] [--lr LR] [--seed SEED] [--use_amp] [--clip_grad CLIP_GRAD]
[--early_stopping EARLY_STOPPING] [--results RESULTS] [--log_file LOG_FILE] [--distributed_world_size N] [--distributed_rank DISTRIBUTED_RANK] [--local_rank LOCAL_RANK] [--overwrite_config OVERWRITE_CONFIG]
optional arguments:
-h, --help show this help message and exit
--data_path DATA_PATH
--dataset {electricity,volatility,traffic,favorita}
--epochs EPOCHS
--sample_data SAMPLE_DATA SAMPLE_DATA
--batch_size BATCH_SIZE
--lr LR
--seed SEED
--use_amp Enable automatic mixed precision
--clip_grad CLIP_GRAD
--early_stopping EARLY_STOPPING
Stop training if validation loss does not improve for more than this number of epochs.
--results RESULTS
--log_file LOG_FILE
--distributed_world_size N
total number of GPUs across all nodes (default: all visible GPUs)
--distributed_rank DISTRIBUTED_RANK
rank of the current worker
--local_rank LOCAL_RANK
rank of the current worker
--overwrite_config OVERWRITE_CONFIG
JSON string used to overload config
```
### Getting the data
The TFT model was trained on the electricity and traffic benchmark datasets. This repository contains the `get_data.sh` download script. For the electricity and traffic datasets, it automatically downloads and preprocesses the training, validation and test datasets, and produces files that contain scalers.
#### Dataset guidelines
The `data_utils.py` file contains all the functions that are used to preprocess the data. Initially, the data is loaded into a `pandas.DataFrame` and parsed into the common format, which contains the features we will use for training. Then, the standardized data is cleaned, normalized, encoded and binarized.
This step does the following:
1. Drop all the columns that are not marked in the configuration file as used for training or preprocessing.
2. Flatten indices in case time series are indexed by more than one column.
3. Split the data into training, validation and test splits.
4. Filter out all the time series shorter than the minimal example length.
5. Normalize columns marked as continuous in the configuration file.
6. Encode columns marked as categorical as integers.
7. Save the data in CSV and binary formats.
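The preprocessing steps above can be sketched on a toy `pandas.DataFrame`. This is a simplified illustration and not the code in `data_utils.py`. The column names, time ranges and `example_length` are hypothetical, only a single id column is used, and saving to CSV/binary is omitted. As in `data_utils.py`, scalers are fitted per series on the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy data in the common format: one row per (id, time step).
df = pd.DataFrame({
    "id": np.repeat(["a", "b", "c"], [30, 30, 5]),  # series "c" is too short
    "time": np.concatenate([np.arange(30), np.arange(30), np.arange(5)]),
    "value": np.arange(65, dtype=float),
    "unused": 0,
})
example_length = 8                            # hypothetical minimal example length
train_range, valid_range = (0, 20), (20, 30)  # half-open [start, end) ranges

# Drop columns that the (hypothetical) configuration does not use.
df = df[["id", "time", "value"]].copy()
# Encode the single id column as integers.
df["id"] = LabelEncoder().fit_transform(df["id"].astype(str))

def time_split(frame, bounds):
    # Select rows whose time index falls in [bounds[0], bounds[1]).
    return frame[(frame["time"] >= bounds[0]) & (frame["time"] < bounds[1])]

def drop_short(frame):
    # Keep only series long enough to yield at least one example.
    return frame.groupby("id").filter(lambda g: len(g) >= example_length)

train = drop_short(time_split(df, train_range))
valid = drop_short(time_split(df, valid_range))

# Fit one scaler per series on the training split; series missing from
# train would have to be dropped from valid/test before this step.
scalers = {i: StandardScaler().fit(g[["value"]]) for i, g in train.groupby("id")}

def scale(frame):
    frame = frame.copy()
    for i, g in frame.groupby("id"):
        frame.loc[g.index, "value"] = scalers[i].transform(g[["value"]]).ravel()
    return frame

train, valid = scale(train), scale(valid)
print(train.groupby("id")["value"].agg(["mean", "std"]))
```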
#### Multi-dataset
To use an alternate dataset, you have to write a function that parses your data into a common format. The format is as follows:
- There is at least one id column.
- There is exactly one time column (that can also be used as a feature column).
- Each feature is in a separate column.
- Each row represents a moment in time for only one time series.

Additionally, you must specify a configuration of the network, including a data description. Refer to the example in the `configuration.py` file.
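Before writing a configuration, it can help to check a parsed DataFrame against the rules listed above. The helper below is hypothetical and not part of this repository, and the `store`/`sales`/`hour` column names are invented. A matching configuration would then list each column as a `FeatureSpec`, as `ElectricityConfig` does. For example, `FeatureSpec('store', InputTypes.ID, DataTypes.CATEGORICAL)` would describe the id column.

```python
import pandas as pd

def check_common_format(df: pd.DataFrame, id_cols: list, time_col: str) -> None:
    """Raise ValueError if df breaks the common-format rules listed above.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    if not id_cols:
        raise ValueError("at least one id column is required")
    missing = [c for c in [*id_cols, time_col] if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # One row per moment in time per series: (ids..., time) must be unique.
    if df.duplicated(subset=[*id_cols, time_col]).any():
        raise ValueError("found more than one row for the same series and time step")

# Toy data: two series ("store"), one time column, one column per feature.
df = pd.DataFrame({
    "store": ["s1", "s1", "s2", "s2"],
    "hours_from_start": [0, 1, 0, 1],
    "sales": [3.0, 4.0, 1.0, 0.0],  # target
    "hour": [0, 1, 0, 1],           # known input
})
check_common_format(df, id_cols=["store"], time_col="hours_from_start")
```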
### Training process
The `train.py` script is an entry point for a training procedure. Refined recipes can be found in the `scripts` directory.
The model trains for at most `--epochs` epochs. If the `--early_stopping N` option is set, training stops when the validation loss has not improved for N consecutive epochs.
The details of the architecture and the dataset configuration are encapsulated by the `--dataset` option. This option chooses one of the configurations stored in the `configuration.py` file. You can enable mixed precision training by providing the `--use_amp` option. The training script supports multi-GPU training with the APEX package. To enable distributed training, prepend the training command with `python -m torch.distributed.launch --nproc_per_node=${NGPU}`.
Example command:
```
python -m torch.distributed.launch --nproc_per_node=8 train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=1024 \
--sample 450000 50000 \
--lr 1e-3 \
--epochs 25 \
--early_stopping 5 \
--seed 1 \
--use_amp \
--results /results/TFT_electricity_bs8x1024_lr1e-3/seed_1
```
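The interaction between `--epochs` and `--early_stopping` can be sketched by replaying a list of per-epoch validation losses. This is an illustration, not the code in `train.py`. The function name is hypothetical, and the exact boundary (stopping after N or after more than N epochs without improvement) may differ from the script. The returned best epoch corresponds to the checkpoint with the lowest validation loss, which is the one kept.

```python
from typing import Optional, Sequence, Tuple

def run_with_early_stopping(val_losses: Sequence[float], epochs: int,
                            patience: Optional[int] = None) -> Tuple[int, int]:
    """Replay per-epoch validation losses; return (epochs_run, best_epoch), 1-based.

    Training stops once `patience` consecutive epochs pass without a new best loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # this checkpoint would be kept
        elif patience is not None and epoch - best_epoch >= patience:
            break
    return epochs_run, best_epoch

losses = [1.0, 0.8, 0.9, 0.85, 0.95, 0.7]
print(run_with_early_stopping(losses, epochs=25, patience=3))  # (5, 2)
print(run_with_early_stopping(losses, epochs=25))              # (6, 6)
```

With a patience of 3, training stops at epoch 5 and never sees the better loss at epoch 6. This is the usual trade-off when choosing a small `--early_stopping` value.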
The model is trained by optimizing the quantile loss

$$\sum_{i=1}^N\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{Nt_{max}}$$

After training, the checkpoint with the lowest validation loss is evaluated on the test split with the q-risk metric

$$\frac{2\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}QL(y_t,\hat{y}(q,t),q)}{\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}|y_t|}$$
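Both formulas use the same per-element quantile loss. Written out to match `QuantileLoss` in `criterions.py`, it is $QL(y,\hat{y},q) = q\max(y-\hat{y},0) + (1-q)\max(\hat{y}-y,0)$. The NumPy sketch below evaluates the training objective and the q-risk on a tiny hand-made example. The function names and array shapes are illustrative assumptions, not the repository's API. At $q = 0.5$, the q-risk reduces to $\sum|y-\hat{y}| / \sum|y|$.

```python
import numpy as np

def quantile_loss(y, y_hat, q):
    """QL(y, y_hat, q) = q * max(y - y_hat, 0) + (1 - q) * max(y_hat - y, 0).

    Same element-wise formula as QuantileLoss in criterions.py.
    """
    diff = y_hat - y
    return (1 - q) * np.maximum(diff, 0) + q * np.maximum(-diff, 0)

def training_loss(y, y_hat, quantiles):
    """Sum over quantiles of QL averaged over N series and t_max steps.

    y: (N, t_max) targets; y_hat: (N, t_max, len(quantiles)) predictions.
    """
    q = np.asarray(quantiles)
    per_quantile = quantile_loss(y[..., None], y_hat, q).mean(axis=(0, 1))
    return per_quantile.sum()

def q_risk(y, y_hat_q, q):
    """q-risk for one quantile q: 2 * sum(QL) / sum(|y|) over all series and steps."""
    return 2 * quantile_loss(y, y_hat_q, q).sum() / np.abs(y).sum()

y = np.array([[2.0, 4.0]])                       # N=1 series, t_max=2 steps
y_hat = np.stack([y - 1, y, y + 1], axis=-1)     # P10, P50, P90 forecasts
print(training_loss(y, y_hat, [0.1, 0.5, 0.9]))  # ≈ 0.2
print(q_risk(y, y + 1, 0.5))                     # ≈ 0.333
```

Over-predicting by 1 at the 0.9 quantile costs only 0.1 per step, whereas under-predicting by 1 there would cost 0.9. This asymmetry is what pushes each output toward its target quantile.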
By default, results are stored in the `/results` directory. This can be changed by providing the `--results` option. At the end of training, the results directory contains the checkpoint with the lowest validation loss, dllogger logs (one dictionary per line), and TensorBoard logs.
### Inference process
Inference can be run by launching the `inference.py` script. The script requires a trained checkpoint to run. Before running inference, the data must be prepared in the same way as the training data. Example command:
```
python inference.py \
--checkpoint /results/checkpoint.pt \
--data /data/processed/electricity_bin/test.csv \
--tgt_scalers /data/processed/electricity_bin/tgt_scalers.bin \
--cat_encodings /data/processed/electricity_bin/cat_encodings.bin \
--batch_size 2048 \
--visualize \
--save_predictions \
--joint_visualization \
--results /results \
--use_amp
```
In the default setting, the script evaluates the model on the specified dataset and prints the q-risk evaluated on this dataset. To save the predictions, use the `--save_predictions` option; predictions are stored in CSV format in the directory specified by the `--results` option. The `--joint_visualization` option plots graphs in TensorBoard format, which lets you inspect the results and compare them to the true values. Using `--visualize`, you can save plots for each example in a separate file.
## Performance
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
#### Training performance benchmark
In order to run training benchmarks, use the `scripts/benchmark.sh` script.
#### Inference performance benchmark
To benchmark the inference performance on a specific batch size and dataset, run the `inference.py` script.
### Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
#### Training accuracy results
We conducted an extensive hyperparameter search along with stability tests. The presented results are averages over hundreds of runs.
##### Training accuracy: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-------
| Electricity | 1 | 1024 | 0.027 / 0.059 / 0.029 | 0.028 / 0.058 / 0.029 | 1427s | 1087s | 1.313x
| Electricity | 8 | 1024 | 0.027 / 0.056 / 0.028 | 0.026 / 0.054 / 0.029 | 216s | 176s | 1.227x
| Traffic | 1 | 1024 | 0.040 / 0.103 / 0.075 | 0.040 / 0.103 / 0.075 | 957s | 726s | 1.318x
| Traffic | 8 | 1024 | 0.042 / 0.104 / 0.076 | 0.042 / 0.106 / 0.077 | 151s | 126s | 1.198x
##### Training accuracy: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-----------
| Electricity | 1 | 1024 | 0.027 / 0.056 / 0.028 | 0.027 / 0.058 / 0.029 | 2559s | 1598s | 1.601x
| Electricity | 8 | 1024 | 0.027 / 0.055 / 0.028 | 0.027 / 0.055 / 0.029 | 381s | 261s | 1.460x
| Traffic | 1 | 1024 | 0.040 / 0.102 / 0.075 | 0.041 / 0.101 / 0.074 | 1718s | 1062s | 1.618x
| Traffic | 8 | 1024 | 0.042 / 0.106 / 0.076 | 0.042 / 0.105 / 0.077 | 256s | 176s | 1.455x
##### Training stability test
To get a fuller picture of the model's accuracy, we performed a hyperparameter search along with stability tests on 100 random seeds for each configuration. Then, for each benchmark dataset, we chose the architecture with the lowest mean test q-risk. The table below summarizes the best configurations.
| Dataset | #GPU | Hidden size | #Heads | Local BS | LR | Gradient clipping | Dropout | Mean q-risk | Std q-risk | Min q-risk | Max q-risk
|-------------|------|-------------|--------|----------|------|-------------------|---------|-------------|------------| -----------|------
| Electricity | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.1 | 0.1131 | 0.0025 | 0.1080 | 0.1200
| Traffic | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.3 | 0.2180 | 0.0049 | 0.2069 | 0.2336
#### Training performance results
##### Training performance: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 (A100 80GB) GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 - mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision
|-------------|---|------|--------|--------|-------|-------|-----
| Electricity | 1 | 1024 | 10173 | 13703 | 1.35x | 1 | 1
| Electricity | 8 | 1024 | 80596 | 107761 | 1.34x | 7.92x | 7.86x
| Traffic | 1 | 1024 | 10197 | 13779 | 1.35x | 1 | 1
| Traffic | 8 | 1024 | 80692 | 107979 | 1.34x | 7.91x | 7.84x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
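The derived columns in the tables above follow from the raw throughput figures. *Throughput speedup* is mixed-precision throughput divided by TF32 throughput on the same number of GPUs. *Weak scaling* is throughput on 8 GPUs divided by throughput on 1 GPU at the same per-GPU batch size. The sketch below recomputes them from the DGX A100 electricity rows. These definitions are inferred from the numbers, which they reproduce, rather than stated in the original text.

```python
# Throughput (items/s) copied from the DGX A100 electricity rows above.
throughput = {
    ("tf32", 1): 10173, ("tf32", 8): 80596,
    ("amp", 1): 13703, ("amp", 8): 107761,
}

def speedup(n_gpus):
    # Mixed-precision throughput relative to TF32 on the same number of GPUs.
    return throughput[("amp", n_gpus)] / throughput[("tf32", n_gpus)]

def weak_scaling(precision, n_gpus):
    # Throughput on n_gpus relative to one GPU, with a fixed per-GPU batch size.
    return throughput[(precision, n_gpus)] / throughput[(precision, 1)]

print(f"{speedup(1):.2f}x {speedup(8):.2f}x")                            # 1.35x 1.34x
print(f"{weak_scaling('tf32', 8):.2f}x {weak_scaling('amp', 8):.2f}x")  # 7.92x 7.86x
```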
##### Training performance: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 - mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision
|-------------|---|------|-------|-------|-------|------|----
| Electricity | 1 | 1024 | 5580 | 9148 | 1.64x | 1 | 1
| Electricity | 8 | 1024 | 43351 | 69855 | 1.61x | 7.77x | 7.64x
| Traffic | 1 | 1024 | 5593 | 9194 | 1.64x | 1 | 1
| Traffic | 8 | 1024 | 43426 | 69983 | 1.61x | 7.76x | 7.61x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
## Release notes
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to https://developer.nvidia.com/deep-learning-performance-training-inference.
### Changelog
October 2021
- Initial release
### Known issues
There are no known issues with this model.


@ -0,0 +1,128 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from data_utils import InputTypes, DataTypes, FeatureSpec
import datetime
class ElectricityConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('power_usage', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('hour', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'days_from_start' # This column contains time indices across which we split the data
        self.train_range = (1096, 1315)
        self.valid_range = (1308, 1339)
        self.test_range = (1332, 1346)
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = True
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [369]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.1
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])

class TrafficConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('values', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('time_on_day', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'sensor_day' # This column contains time indices across which we split the data
        self.train_range = (0, 151)
        self.valid_range = (144, 166)
        self.test_range = (159, float('inf'))
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = False
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [963]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.3
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])


CONFIGS = {'electricity': ElectricityConfig,
           'traffic': TrafficConfig,
           }


@ -0,0 +1,28 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
import torch.nn.functional as F
class QuantileLoss(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.register_buffer('q', torch.tensor(config.quantiles))

    def forward(self, predictions, targets):
        diff = predictions - targets
        ql = (1-self.q)*F.relu(diff) + self.q*F.relu(-diff)
        losses = ql.view(-1, ql.shape[-1]).mean(0)
        return losses


@ -0,0 +1,790 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################
# Copyright 2021 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import pickle
import enum
import datetime
from collections import namedtuple, OrderedDict
import sklearn.preprocessing
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
from bisect import bisect
import torch
from torch.utils.data import Dataset,IterableDataset,DataLoader
class DataTypes(enum.IntEnum):
    """Defines numerical types of each column."""
    CONTINUOUS = 0
    CATEGORICAL = 1
    DATE = 2
    STR = 3

class InputTypes(enum.IntEnum):
    """Defines input types of each column."""
    TARGET = 0
    OBSERVED = 1
    KNOWN = 2
    STATIC = 3
    ID = 4  # Single column used as an entity identifier
    TIME = 5  # Single column exclusively used as a time index

FeatureSpec = namedtuple('FeatureSpec', ['name', 'feature_type', 'feature_embed_type'])
DTYPE_MAP = {
DataTypes.CONTINUOUS : np.float32,
DataTypes.CATEGORICAL : np.int64,
DataTypes.DATE:'datetime64[ns]',
DataTypes.STR: str
}
FEAT_ORDER = [
(InputTypes.STATIC, DataTypes.CATEGORICAL),
(InputTypes.STATIC, DataTypes.CONTINUOUS),
(InputTypes.KNOWN, DataTypes.CATEGORICAL),
(InputTypes.KNOWN, DataTypes.CONTINUOUS),
(InputTypes.OBSERVED, DataTypes.CATEGORICAL),
(InputTypes.OBSERVED, DataTypes.CONTINUOUS),
(InputTypes.TARGET, DataTypes.CONTINUOUS),
(InputTypes.ID, DataTypes.CATEGORICAL)
]
FEAT_NAMES = ['s_cat' , 's_cont' , 'k_cat' , 'k_cont' , 'o_cat' , 'o_cont' , 'target', 'id']
DEFAULT_ID_COL = 'id'
class TFTBinaryDataset(Dataset):
    def __init__(self, path, config):
        super().__init__()
        self.features = [x for x in config.features if x.feature_embed_type != DataTypes.DATE]
        self.example_length = config.example_length
        self.stride = config.dataset_stride

        with open(path, 'rb') as f:
            self.grouped = pickle.load(f)
        self.grouped = [x for x in self.grouped if x.shape[0] >= self.example_length]
        self._cum_examples_in_group = np.cumsum([(g.shape[0] - self.example_length + 1)//self.stride for g in self.grouped])

        self.feature_type_col_map = [[i for i,f in enumerate(self.features) if (f.feature_type, f.feature_embed_type) == x] for x in FEAT_ORDER]

        # The list comprehension below is an elaborate way of rearranging data into correct order,
        # simultaneously doing casting to proper types. Probably can be written neater
        self.grouped = [
            [
                arr[:, idxs].view(dtype=np.float32).astype(DTYPE_MAP[t[1]])
                for t, idxs in zip(FEAT_ORDER, self.feature_type_col_map)
            ]
            for arr in self.grouped
        ]

    def __len__(self):
        return self._cum_examples_in_group[-1] if len(self._cum_examples_in_group) else 0

    def __getitem__(self, idx):
        g_idx = bisect(self._cum_examples_in_group, idx)
        e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx

        group = self.grouped[g_idx]

        tensors = [
            torch.from_numpy(feat[e_idx * self.stride:e_idx*self.stride + self.example_length])
            if feat.size else torch.empty(0)
            for feat in group
        ]

        return OrderedDict(zip(FEAT_NAMES, tensors))

class TFTDataset(Dataset):
    def __init__(self, path, config):
        super().__init__()
        self.features = config.features
        self.data = pd.read_csv(path, index_col=0)
        self.example_length = config.example_length
        self.stride = config.dataset_stride

        # name field is a column name.
        # there can be multiple entries with the same name because one column can be interpreted in many ways
        time_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.TIME)
        id_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.ID)
        if id_col_name not in self.data.columns:
            id_col_name = DEFAULT_ID_COL
            self.features = [x for x in self.features if x.feature_type!=InputTypes.ID]
            self.features.append(FeatureSpec(DEFAULT_ID_COL, InputTypes.ID, DataTypes.CATEGORICAL))
        col_dtypes = {v.name:DTYPE_MAP[v.feature_embed_type] for v in self.features}

        self.data.sort_values(time_col_name,inplace=True)
        # pandas does not accept a set as a column indexer, so convert it to a list
        self.data = self.data[list(set(x.name for x in self.features))] #leave only relevant columns
        self.data = self.data.astype(col_dtypes)
        self.data = self.data.groupby(id_col_name).filter(lambda group: len(group) >= self.example_length)
        self.grouped = list(self.data.groupby(id_col_name))

        self._cum_examples_in_group = np.cumsum([(len(g[1]) - self.example_length + 1)//self.stride for g in self.grouped])

    def __len__(self):
        return self._cum_examples_in_group[-1]

    def __getitem__(self, idx):
        g_idx = len([x for x in self._cum_examples_in_group if x <= idx])
        e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx

        group = self.grouped[g_idx][1]
        sliced = group.iloc[e_idx * self.stride:e_idx*self.stride + self.example_length]

        # We need to be sure that tensors are returned in the correct order
        tensors = tuple([] for _ in range(8))
        for v in self.features:
            if v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[0].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[1].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[2].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[3].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[4].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[5].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.TARGET:
                tensors[6].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.ID:
                tensors[7].append(torch.from_numpy(sliced[v.name].to_numpy()))

        tensors = [torch.stack(x, dim=-1) if x else torch.empty(0) for x in tensors]

        return OrderedDict(zip(FEAT_NAMES, tensors))

def get_dataset_splits(df, config):

    if hasattr(config, 'relative_split') and config.relative_split:
        forecast_len = config.example_length - config.encoder_length
        # The valid split is shifted from the train split by number of the forecast steps to the future.
        # The test split is shifted by the number of the forecast steps from the valid split
        train = []
        valid = []
        test = []

        for _, group in df.groupby(DEFAULT_ID_COL):
            index = group[config.time_ids]
            _train = group.loc[index < config.valid_boundary]
            _valid = group.iloc[(len(_train) - config.encoder_length):(len(_train) + forecast_len)]
            _test = group.iloc[(len(_train) - config.encoder_length + forecast_len):(len(_train) + 2*forecast_len)]
            train.append(_train)
            valid.append(_valid)
            test.append(_test)

        train = pd.concat(train, axis=0)
        valid = pd.concat(valid, axis=0)
        test = pd.concat(test, axis=0)
    else:
        index = df[config.time_ids]
        train = df.loc[(index >= config.train_range[0]) & (index < config.train_range[1])]
        valid = df.loc[(index >= config.valid_range[0]) & (index < config.valid_range[1])]
        test = df.loc[(index >= config.test_range[0]) & (index < config.test_range[1])]

    return train, valid, test

def flatten_ids(df, config):
if config.missing_id_strategy == 'drop':
if hasattr(config, 'combine_ids') and config.combine_ids:
index = np.logical_or.reduce([df[c].isna() for c in config.combine_ids])
else:
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
index = df[id_col].isna()
index = index[index == True].index # Extract indices of nans
df.drop(index, inplace=True)
if not (hasattr(config, 'combine_ids') and config.combine_ids):
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
ids = df[id_col].apply(str)
df.drop(id_col, axis=1, inplace=True)
encoder = sklearn.preprocessing.LabelEncoder().fit(ids.values)
df[DEFAULT_ID_COL] = encoder.transform(ids)
encoders = OrderedDict({id_col: encoder})
else:
encoders = {c:sklearn.preprocessing.LabelEncoder().fit(df[c].values) for c in config.combine_ids}
encoders = OrderedDict(encoders)
lens = [len(v.classes_) for v in encoders.values()]
clens = np.roll(np.cumprod(lens), 1)
clens[0] = 1
# Vectorized mixed-radix encoding: transform each ID column once instead of calling the encoders row by row
df[DEFAULT_ID_COL] = sum(encoders[c].transform(df[c].values) * clens[i] for i, c in enumerate(encoders.keys()))
df.drop(config.combine_ids, axis=1, inplace=True)
return DEFAULT_ID_COL, encoders
def impute(df, config):
#XXX This ensures that our scaling will have the same mean. We still need to check the variance
if not hasattr(config, 'missing_data_label'):
return df, None
else:
imp = SimpleImputer(missing_values=config.missing_data_label, strategy='mean')
mask = df.applymap(lambda x: True if x == config.missing_data_label else False)
data = df.values
col_mask = (data == config.missing_data_label).all(axis=0)
data[:,~col_mask] = imp.fit_transform(data)
return data, mask
def normalize_reals(train, valid, test, config, id_col=DEFAULT_ID_COL):
tgt_cols = [x.name for x in config.features if x.feature_type == InputTypes.TARGET]
real_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CONTINUOUS).difference(set(tgt_cols)))
real_scalers = {}
tgt_scalers = {}
def apply_scalers(df, name=None):
if name is None:
name = df.name
mask = (df[real_cols] == config.missing_data_label) if hasattr(config, 'missing_data_label') else None
df[real_cols] = real_scalers[name].transform(df[real_cols])
if mask is not None and mask.values.any():
# DataFrame.mask returns a new frame, so the result has to be assigned back
df[real_cols] = df[real_cols].mask(mask, 10**9)
df[tgt_cols] = tgt_scalers[name].transform(df[tgt_cols])
return df
if config.scale_per_id:
for identifier, sliced in train.groupby(id_col):
data = sliced[real_cols]
data, _ = impute(data, config)
real_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(data)
# XXX We should probably remove examples that contain NaN as a target
target = sliced[tgt_cols]
tgt_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(target)
train = train.groupby(id_col).apply(apply_scalers)
# For valid and testing leave only timeseries previously present in train subset
# XXX for proper data science we should consider encoding unseen timeseries as a special case, not throwing them away
valid = valid.loc[valid[id_col].isin(real_scalers.keys())]
valid = valid.groupby(id_col).apply(apply_scalers)
test = test.loc[test[id_col].isin(real_scalers.keys())]
test = test.groupby(id_col).apply(apply_scalers)
else:
data, _ = impute(train[real_cols], config)
real_scalers[''] = sklearn.preprocessing.StandardScaler().fit(data)
tgt_scalers[''] = sklearn.preprocessing.StandardScaler().fit(train[tgt_cols])
train = apply_scalers(train, name='')
valid = apply_scalers(valid, name='')
test = apply_scalers(test, name='')
return train, valid, test, real_scalers, tgt_scalers
def encode_categoricals(train, valid, test, config):
cat_encodings = {}
cat_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CATEGORICAL and v.feature_type != InputTypes.ID))
num_classes = [] #XXX Maybe we should modify config based on this value? Or send a warning?
# For Tensor Core performance reasons we might want num_classes[i] to be divisible by 8
# Train categorical encoders
for c in cat_cols:
if config.missing_cat_data_strategy == 'special_token':
#XXX this will probably require some data augmentation
unique = train[c].unique()
# Replace categories that never occur in the train split with the special token
valid.loc[~valid[c].isin(unique), c] = '<UNK>'
test.loc[~test[c].isin(unique), c] = '<UNK>'
if config.missing_cat_data_strategy == 'encode_all' or \
config.missing_cat_data_strategy == 'special_token':
srs = pd.concat([train[c], valid[c], test[c]]).apply(str)
cat_encodings[c] = sklearn.preprocessing.LabelEncoder().fit(srs.values)
elif config.missing_cat_data_strategy == 'drop':
# TODO: implement this. In addition to dropping rows this has to split specific time series in chunks
# to prevent data from having temporal gaps
pass
num_classes.append(srs.nunique())
print('Categorical variables encodings lens: ', num_classes)
for split in [train, valid, test]:
for c in cat_cols:
srs = split[c].apply(str)
split[c] = srs
split.loc[:,c] = cat_encodings[c].transform(srs)
return cat_encodings
def preprocess(src_path, dst_path, config):
df = pd.read_csv(src_path, index_col=0)
for c in config.features:
if c.feature_embed_type == DataTypes.DATE:
df[c.name] = pd.to_datetime(df[c.name])
# Leave only columns relevant to preprocessing
relevant_columns = list(set([f.name for f in config.features] + [config.time_ids]))
df = df[relevant_columns]
id_col, id_encoders = flatten_ids(df, config)
df = df.reindex(sorted(df.columns), axis=1)
train, valid, test = get_dataset_splits(df, config)
# Filter by length: all time series shorter than config.example_length are dropped
#for df in [train, valid, test]:
# df.groupby(id_col).filter(lambda x: len(x) >= config.example_length)
train = pd.concat([x[1] for x in train.groupby(id_col) if len(x[1]) >= config.example_length])
valid = pd.concat([x[1] for x in valid.groupby(id_col) if len(x[1]) >= config.example_length])
test = pd.concat([x[1] for x in test.groupby(id_col) if len(x[1]) >= config.example_length])
train, valid, test, real_scalers, tgt_scalers = normalize_reals(train, valid, test, config, id_col)
cat_encodings = encode_categoricals(train, valid, test, config)
os.makedirs(dst_path, exist_ok=True)
train.to_csv(os.path.join(dst_path, 'train.csv'))
valid.to_csv(os.path.join(dst_path, 'valid.csv'))
test.to_csv(os.path.join(dst_path, 'test.csv'))
# Save relevant columns in binary form for faster dataloading
# IMPORTANT: We always expect id to be a single column indicating the complete timeseries
# We also expect a copy of id in form of static categorical input!!!
col_names = [id_col] + [x.name for x in config.features if x.feature_embed_type != DataTypes.DATE and x.feature_type != InputTypes.ID]
grouped_train = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in train.groupby(id_col)]
grouped_valid = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in valid.groupby(id_col)]
grouped_test = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in test.groupby(id_col)]
with open(os.path.join(dst_path, 'train.bin'), 'wb') as f:
pickle.dump(grouped_train, f)
with open(os.path.join(dst_path, 'valid.bin'), 'wb') as f:
pickle.dump(grouped_valid, f)
with open(os.path.join(dst_path, 'test.bin'), 'wb') as f:
pickle.dump(grouped_test, f)
with open(os.path.join(dst_path, 'real_scalers.bin'), 'wb') as f:
pickle.dump(real_scalers, f)
with open(os.path.join(dst_path, 'tgt_scalers.bin'), 'wb') as f:
pickle.dump(tgt_scalers, f)
with open(os.path.join(dst_path, 'cat_encodings.bin'), 'wb') as f:
pickle.dump(cat_encodings, f)
with open(os.path.join(dst_path, 'id_encoders.bin'), 'wb') as f:
pickle.dump(id_encoders, f)
def sample_data(dataset, num_samples):
if num_samples < 0:
return dataset
else:
return torch.utils.data.Subset(dataset, np.random.choice(np.arange(len(dataset)), size=num_samples, replace=False))
def standarize_electricity(path):
"""Code taken from https://github.com/google-research/google-research/blob/master/tft/script_download_data.py"""
df = pd.read_csv(os.path.join(path, 'LD2011_2014.txt'), index_col=0, sep=';', decimal=',')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
# Used to determine the start and end dates of a series
output = df.resample('1h').mean().replace(0., np.nan)
earliest_time = output.index.min()
df_list = []
for label in output:
print('Processing {}'.format(label))
srs = output[label]
start_date = min(srs.fillna(method='ffill').dropna().index)
end_date = max(srs.fillna(method='bfill').dropna().index)
active_range = (srs.index >= start_date) & (srs.index <= end_date)
srs = srs[active_range].fillna(0.)
tmp = pd.DataFrame({'power_usage': srs})
date = tmp.index
tmp['t'] = (date - earliest_time).seconds / 60 / 60 + (
date - earliest_time).days * 24
tmp['days_from_start'] = (date - earliest_time).days
tmp['categorical_id'] = label
tmp['date'] = date
tmp['id'] = label
tmp['hour'] = date.hour
tmp['day'] = date.day
tmp['day_of_week'] = date.dayofweek
tmp['month'] = date.month
df_list.append(tmp)
output = pd.concat(df_list, axis=0, join='outer').reset_index(drop=True)
output['categorical_id'] = output['id'].copy()
output['hours_from_start'] = output['t']
output['categorical_day_of_week'] = output['day_of_week'].copy()
output['categorical_hour'] = output['hour'].copy()
output.to_csv(os.path.join(path, 'standarized.csv'))
def standarize_volatility(path):
df = pd.read_csv(os.path.join(path, 'oxfordmanrealizedvolatilityindices.csv'), index_col=0) # no explicit index
# Adds additional date/day fields
idx = [str(s).split('+')[0] for s in df.index]  # ignore timezones, we don't need them
dates = pd.to_datetime(idx)
df['date'] = dates
df['days_from_start'] = (dates - pd.datetime(2000, 1, 3)).days
df['day_of_week'] = dates.dayofweek
df['day_of_month'] = dates.day
df['week_of_year'] = dates.weekofyear
df['month'] = dates.month
df['year'] = dates.year
df['categorical_id'] = df['Symbol'].copy()
# Processes log volatility
vol = df['rv5_ss'].copy()
vol.loc[vol == 0.] = np.nan
df['log_vol'] = np.log(vol)
# Adds static information
symbol_region_mapping = {
'.AEX': 'EMEA',
'.AORD': 'APAC',
'.BFX': 'EMEA',
'.BSESN': 'APAC',
'.BVLG': 'EMEA',
'.BVSP': 'AMER',
'.DJI': 'AMER',
'.FCHI': 'EMEA',
'.FTMIB': 'EMEA',
'.FTSE': 'EMEA',
'.GDAXI': 'EMEA',
'.GSPTSE': 'AMER',
'.HSI': 'APAC',
'.IBEX': 'EMEA',
'.IXIC': 'AMER',
'.KS11': 'APAC',
'.KSE': 'APAC',
'.MXX': 'AMER',
'.N225': 'APAC ',
'.NSEI': 'APAC',
'.OMXC20': 'EMEA',
'.OMXHPI': 'EMEA',
'.OMXSPI': 'EMEA',
'.OSEAX': 'EMEA',
'.RUT': 'EMEA',
'.SMSI': 'EMEA',
'.SPX': 'AMER',
'.SSEC': 'APAC',
'.SSMI': 'EMEA',
'.STI': 'APAC',
'.STOXX50E': 'EMEA'
}
df['Region'] = df['Symbol'].apply(lambda k: symbol_region_mapping[k])
# Performs final processing
output_df_list = []
for grp in df.groupby('Symbol'):
sliced = grp[1].copy()
sliced.sort_values('days_from_start', inplace=True)
# Impute log volatility values
sliced['log_vol'].fillna(method='ffill', inplace=True)
sliced.dropna()
output_df_list.append(sliced)
df = pd.concat(output_df_list, axis=0)
df.to_csv(os.path.join(path, 'standarized.csv'))
def standarize_traffic(path):
def process_list(s, variable_type=int, delimiter=None):
"""Parses a line in the PEMS format to a list."""
if delimiter is None:
l = [
variable_type(i) for i in s.replace('[', '').replace(']', '').split()
]
else:
l = [
variable_type(i)
for i in s.replace('[', '').replace(']', '').split(delimiter)
]
return l
def read_single_list(filename):
"""Returns single list from a file in the PEMS-custom format."""
with open(os.path.join(path, filename), 'r') as dat:
l = process_list(dat.readlines()[0])
return l
def read_matrix(filename):
"""Returns a matrix from a file in the PEMS-custom format."""
array_list = []
with open(os.path.join(path, filename), 'r') as dat:
lines = dat.readlines()
for i, line in enumerate(lines):
if (i + 1) % 50 == 0:
print('Completed {} of {} rows for {}'.format(i + 1, len(lines),
filename))
array = [
process_list(row_split, variable_type=float, delimiter=None)
for row_split in process_list(
line, variable_type=str, delimiter=';')
]
array_list.append(array)
return array_list
shuffle_order = np.array(read_single_list('randperm')) - 1 # index from 0
train_dayofweek = read_single_list('PEMS_trainlabels')
train_tensor = read_matrix('PEMS_train')
test_dayofweek = read_single_list('PEMS_testlabels')
test_tensor = read_matrix('PEMS_test')
# Invert the shuffle permutation
print('Shuffling')
inverse_mapping = {
new_location: previous_location
for previous_location, new_location in enumerate(shuffle_order)
}
reverse_shuffle_order = np.array([
inverse_mapping[new_location]
for new_location, _ in enumerate(shuffle_order)
])
# Group and reorder based on the permutation matrix
print('Reordering')
day_of_week = np.array(train_dayofweek + test_dayofweek)
combined_tensor = np.array(train_tensor + test_tensor)
day_of_week = day_of_week[reverse_shuffle_order]
combined_tensor = combined_tensor[reverse_shuffle_order]
# Put everything back into a dataframe
print('Parsing as dataframe')
labels = ['traj_{}'.format(i) for i in read_single_list('stations_list')]
hourly_list = []
for day, day_matrix in enumerate(combined_tensor):
# Hourly data
hourly = pd.DataFrame(day_matrix.T, columns=labels)
hourly['hour_on_day'] = [int(i / 6) for i in hourly.index]  # sampled at 10 min intervals
if hourly['hour_on_day'].max() > 23 or hourly['hour_on_day'].min() < 0:
raise ValueError('Invalid hour! {}-{}'.format(
hourly['hour_on_day'].min(), hourly['hour_on_day'].max()))
hourly = hourly.groupby('hour_on_day', as_index=True).mean()[labels]
hourly['sensor_day'] = day
hourly['time_on_day'] = hourly.index
hourly['day_of_week'] = day_of_week[day]
hourly_list.append(hourly)
hourly_frame = pd.concat(hourly_list, axis=0, ignore_index=True, sort=False)
# Flatten such that each entity uses one row in dataframe
store_columns = [c for c in hourly_frame.columns if 'traj' in c]
other_columns = [c for c in hourly_frame.columns if 'traj' not in c]
flat_df = pd.DataFrame(columns=['values', 'prev_values', 'next_values'] +
other_columns + ['id'])
for store in store_columns:
print('Processing {}'.format(store))
sliced = hourly_frame[[store] + other_columns].copy()
sliced.columns = ['values'] + other_columns
sliced['id'] = int(store.replace('traj_', ''))
# Sort by Sensor-date-time
key = sliced['id'].apply(str) \
+ sliced['sensor_day'].apply(lambda x: '_{:03d}'.format(x)) \
+ sliced['time_on_day'].apply(lambda x: '_{:03d}'.format(x))
sliced = sliced.set_index(key).sort_index()
sliced['values'] = sliced['values'].fillna(method='ffill')
sliced['prev_values'] = sliced['values'].shift(1)
sliced['next_values'] = sliced['values'].shift(-1)
flat_df = flat_df.append(sliced.dropna(), ignore_index=True, sort=False)
# Filter to match range used by other academic papers
index = flat_df['sensor_day']
flat_df = flat_df[index < 173].copy()
# Create columns for categorical inputs
flat_df['categorical_id'] = flat_df['id'].copy()
flat_df['hours_from_start'] = flat_df['time_on_day'] \
+ flat_df['sensor_day']*24.
flat_df['categorical_day_of_week'] = flat_df['day_of_week'].copy()
flat_df['categorical_time_on_day'] = flat_df['time_on_day'].copy()
flat_df.to_csv(os.path.join(path, 'standarized.csv'))
# XXX needs rework
def standarize_favorita(data_folder):
import gc
# Extract only a subset of data to save/process for efficiency
start_date = pd.datetime(2015, 1, 1)
end_date = pd.datetime(2016, 6, 1)
print('Regenerating data...')
# load temporal data
temporal = pd.read_csv(os.path.join(data_folder, 'train.csv'), index_col=0)
store_info = pd.read_csv(os.path.join(data_folder, 'stores.csv'), index_col=0)
oil = pd.read_csv(
os.path.join(data_folder, 'oil.csv'), index_col=0).iloc[:, 0]
holidays = pd.read_csv(os.path.join(data_folder, 'holidays_events.csv'))
items = pd.read_csv(os.path.join(data_folder, 'items.csv'), index_col=0)
transactions = pd.read_csv(os.path.join(data_folder, 'transactions.csv'))
# Parse dates so that the date range filter below can be applied
temporal['date'] = pd.to_datetime(temporal['date'])
# Filter dates to reduce storage space requirements
if start_date is not None:
temporal = temporal[(temporal['date'] >= start_date)]
if end_date is not None:
temporal = temporal[(temporal['date'] < end_date)]
dates = temporal['date'].unique()
# Add trajectory identifier
temporal['traj_id'] = temporal['store_nbr'].apply(
str) + '_' + temporal['item_nbr'].apply(str)
temporal['unique_id'] = temporal['traj_id'] + '_' + temporal['date'].apply(
str)
# Remove all IDs with negative returns
print('Removing returns data')
min_returns = temporal['unit_sales'].groupby(temporal['traj_id']).min()
valid_ids = set(min_returns[min_returns >= 0].index)
selector = temporal['traj_id'].apply(lambda traj_id: traj_id in valid_ids)
new_temporal = temporal[selector].copy()
del temporal
gc.collect()
temporal = new_temporal
temporal['open'] = 1
# Resampling
print('Resampling to regular grid')
resampled_dfs = []
for traj_id, raw_sub_df in temporal.groupby('traj_id'):
print('Resampling', traj_id)
sub_df = raw_sub_df.set_index('date', drop=True).copy()
sub_df = sub_df.resample('1d').last()
sub_df['date'] = sub_df.index
sub_df[['store_nbr', 'item_nbr', 'onpromotion']] \
= sub_df[['store_nbr', 'item_nbr', 'onpromotion']].fillna(method='ffill')
sub_df['open'] = sub_df['open'].fillna(
0) # flag where sales data is unknown
sub_df['log_sales'] = np.log(sub_df['unit_sales'])
resampled_dfs.append(sub_df.reset_index(drop=True))
new_temporal = pd.concat(resampled_dfs, axis=0)
del temporal
gc.collect()
temporal = new_temporal
print('Adding oil')
oil.name = 'oil'
oil.index = pd.to_datetime(oil.index)
#XXX The lines below attach the oil price for each date to the rest of the time series. Missing values in
# the oil series are copied from the previous date, and the oil series is then joined with temporal.
# Dates present in temporal but absent from oil get an oil value of -1. It is unclear why -1 is used.
#TODO: check how many NaNs remain after the first step. Previously the oil series was extended with the dates
# present in the `dates` variable (as NaN values), which were then forward filled.
# This behavior is no longer supported by pandas, so we switched to the Index.isin method.
# This leaves more NaNs after the first step than before. To reproduce the previous behavior
# we have to join the series before filling NaNs.
temporal = temporal.join(
#oil.loc[oil.index.isin(dates)].fillna(method='ffill'), on='date', how='left')
oil.loc[oil.index.isin(dates)], on='date', how='left')
temporal['oil'] = temporal['oil'].fillna(method='ffill')
temporal['oil'] = temporal['oil'].fillna(-1)
print('Adding store info')
temporal = temporal.join(store_info, on='store_nbr', how='left')
print('Adding item info')
temporal = temporal.join(items, on='item_nbr', how='left')
transactions['date'] = pd.to_datetime(transactions['date'])
temporal = temporal.merge(
transactions,
left_on=['date', 'store_nbr'],
right_on=['date', 'store_nbr'],
how='left')
temporal['transactions'] = temporal['transactions'].fillna(-1)
# Additional date info
temporal['day_of_week'] = pd.to_datetime(temporal['date'].values).dayofweek
temporal['day_of_month'] = pd.to_datetime(temporal['date'].values).day
temporal['month'] = pd.to_datetime(temporal['date'].values).month
# Add holiday info
print('Adding holidays')
holiday_subset = holidays[holidays['transferred'].apply(
lambda x: not x)].copy()
holiday_subset.columns = [
s if s != 'type' else 'holiday_type' for s in holiday_subset.columns
]
holiday_subset['date'] = pd.to_datetime(holiday_subset['date'])
local_holidays = holiday_subset[holiday_subset['locale'] == 'Local']
regional_holidays = holiday_subset[holiday_subset['locale'] == 'Regional']
national_holidays = holiday_subset[holiday_subset['locale'] == 'National']
temporal['national_hol'] = temporal.merge(
national_holidays, left_on=['date'], right_on=['date'],
how='left')['description'].fillna('')
temporal['regional_hol'] = temporal.merge(
regional_holidays,
left_on=['state', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal['local_hol'] = temporal.merge(
local_holidays,
left_on=['city', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal.sort_values('unique_id', inplace=True)
# Transform date to integer index
start_date = pd.to_datetime(min(temporal['date']))
dates = temporal['date'].apply(pd.to_datetime)
temporal['days_from_start'] = (dates - start_date).dt.days
temporal['categorical_id'] = temporal['traj_id'].copy()
print('Saving processed file to {}'.format(os.path.join(data_folder, 'standarized.csv')))
temporal.to_csv(os.path.join(data_folder, 'standarized.csv'))
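`flatten_ids` collapses several ID columns (`config.combine_ids`) into one integer ID with a mixed-radix scheme: each column is label-encoded, and the code of column `i` is multiplied by the product of the class counts of the preceding columns (`clens`). The sketch below reproduces that arithmetic with NumPy only. It assumes `np.unique(..., return_inverse=True)` as a stand-in for `sklearn.preprocessing.LabelEncoder` (both assign codes in sorted order of the unique values), and the helpers `combine_ids_mixed_radix` and `split_combined_id` are hypothetical, not part of this repository.

```python
import numpy as np


def combine_ids_mixed_radix(columns):
    """Map rows of several categorical columns to a single integer ID.

    Hypothetical helper mirroring the clens arithmetic in flatten_ids;
    np.unique stands in for LabelEncoder (both sort the unique values).
    """
    codes, lens = [], []
    for col in columns:
        classes, inverse = np.unique(np.asarray(col), return_inverse=True)
        codes.append(inverse)
        lens.append(len(classes))
    # Place values [1, l0, l0*l1, ...], built the same way as in flatten_ids
    clens = np.roll(np.cumprod(lens), 1)
    clens[0] = 1
    combined = sum(code * clens[i] for i, code in enumerate(codes))
    return combined, lens


def split_combined_id(combined, lens):
    """Invert the mixed-radix encoding back into per-column codes."""
    parts = []
    for n in lens:
        parts.append(combined % n)
        combined = combined // n
    return parts


# Usage: two stores x two items give four distinct combined IDs
ids, lens = combine_ids_mixed_radix([['s1', 's2', 's1', 's2'], ['a', 'a', 'b', 'b']])
print(ids)  # [0 1 2 3]
```

Because every column contributes a digit in its own base, distinct combinations of codes always map to distinct integers, and `split_combined_id` shows the mapping is reversible.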

@ -0,0 +1,73 @@
# Copyright 2021 NVIDIA CORPORATION
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2019 Ross Wightman
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Exponential Moving Average (EMA) of model updates
"""
from collections import OrderedDict
from copy import deepcopy
import torch
import torch.nn as nn
class ModelEma(nn.Module):
""" Model Exponential Moving Average V2
Keep a moving average of everything in the model state_dict (parameters and buffers).
V2 of this module is simpler, it does not match params/buffers based on name but simply
iterates in order. It works with torchscript (JIT of full model).
"""
def __init__(self, model, decay=0.999, device=None):
super().__init__()
# make a copy of the model for accumulating moving average of weights
self.module = deepcopy(model)
self.module.eval()
self.decay = decay
self.device = device # perform ema on different device from model if set
if self.device is not None:
self.module.to(device=device)
def update(self, model):
update_fn=lambda ema_v, model_v: self.decay * ema_v + (1. - self.decay) * model_v
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(update_fn(ema_v, model_v))
def set(self, model):
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(model_v)
def forward(self, x):
return self.module(x)
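`ModelEma.update` applies `ema_v <- decay * ema_v + (1 - decay) * model_v` to every entry of the state dict, pairing entries by position. The pure-Python sketch below applies the same recurrence to a list of floats that stands in for the tensors (an assumption made so the example runs without PyTorch); `ema_update` is a hypothetical name. With a constant model value `x`, the EMA after `n` updates equals `x + decay**n * (ema_0 - x)`, so it approaches `x` geometrically.

```python
def ema_update(ema_values, model_values, decay=0.999):
    """Return EMA-updated values, mirroring ModelEma.update's update_fn.

    Plain floats stand in for the tensors of a state_dict (assumption).
    As in ModelEma V2, entries are paired by position, not by name.
    """
    if len(ema_values) != len(model_values):
        raise ValueError('EMA and model must have the same number of entries')
    return [decay * e + (1.0 - decay) * m for e, m in zip(ema_values, model_values)]


# Usage: start the EMA at 0 and feed a constant model value of 1 with decay 0.5
ema = [0.0]
for _ in range(3):
    ema = ema_update(ema, [1.0], decay=0.5)
print(ema)  # [0.875]
```

Since entries are matched by position, the EMA copy must come from the same architecture, which `ModelEma` guarantees by building it with `deepcopy(model)`.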

@ -0,0 +1,157 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import math
import os
import pathlib
import re
import pynvml
pynvml.nvmlInit()
def systemGetDriverVersion():
return pynvml.nvmlSystemGetDriverVersion()
def deviceGetCount():
return pynvml.nvmlDeviceGetCount()
class device:
# assume nvml returns list of 64 bit ints
_nvml_affinity_elements = math.ceil(os.cpu_count() / 64)
def __init__(self, device_idx):
super().__init__()
self.handle = pynvml.nvmlDeviceGetHandleByIndex(device_idx)
def getName(self):
return pynvml.nvmlDeviceGetName(self.handle)
def getCpuAffinity(self):
affinity_string = ''
for j in pynvml.nvmlDeviceGetCpuAffinity(
self.handle, device._nvml_affinity_elements
):
# assume nvml returns list of 64 bit ints
affinity_string = '{:064b}'.format(j) + affinity_string
affinity_list = [int(x) for x in affinity_string]
affinity_list.reverse() # so core 0 is in 0th element of list
ret = [i for i, e in enumerate(affinity_list) if e != 0]
return ret
def set_socket_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity)
def set_single_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity[:1])
def set_single_unique_affinity(gpu_id, nproc_per_node):
devices = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in devices]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = list(set(socket_affinity) - set(siblings_dict.values()))
affinities = []
assigned = []
for socket_affinity in socket_affinities:
for core in socket_affinity:
if core not in assigned:
affinities.append([core])
assigned.append(core)
break
os.sched_setaffinity(0, affinities[gpu_id])
def set_socket_unique_affinity(gpu_id, nproc_per_node, mode):
device_ids = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in device_ids]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = list(set(socket_affinity) - set(siblings_dict.values()))
socket_affinities_to_device_ids = collections.defaultdict(list)
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities_to_device_ids[tuple(socket_affinity)].append(idx)
for socket_affinity, device_ids in socket_affinities_to_device_ids.items():
devices_per_group = len(device_ids)
cores_per_device = len(socket_affinity) // devices_per_group
for group_id, device_id in enumerate(device_ids):
if device_id == gpu_id:
if mode == 'interleaved':
affinity = list(socket_affinity[group_id::devices_per_group])
elif mode == 'continuous':
affinity = list(socket_affinity[group_id*cores_per_device:(group_id+1)*cores_per_device])
else:
raise RuntimeError('Unknown set_socket_unique_affinity mode')
# reintroduce siblings
affinity += [siblings_dict[aff] for aff in affinity if aff in siblings_dict]
os.sched_setaffinity(0, affinity)
def get_thread_siblings_list():
path = '/sys/devices/system/cpu/cpu*/topology/thread_siblings_list'
thread_siblings_list = []
pattern = re.compile(r'(\d+)\D(\d+)')
for fname in pathlib.Path(path[0]).glob(path[1:]):
with open(fname) as f:
content = f.read().strip()
res = pattern.findall(content)
if res:
pair = tuple(map(int, res[0]))
thread_siblings_list.append(pair)
return thread_siblings_list
def set_affinity(gpu_id, nproc_per_node, mode='socket'):
if mode == 'socket':
set_socket_affinity(gpu_id)
elif mode == 'single':
set_single_affinity(gpu_id)
elif mode == 'single_unique':
set_single_unique_affinity(gpu_id, nproc_per_node)
elif mode == 'socket_unique_interleaved':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'interleaved')
elif mode == 'socket_unique_continuous':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'continuous')
else:
raise RuntimeError('Unknown affinity mode')
affinity = os.sched_getaffinity(0)
return affinity
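`device.getCpuAffinity` turns the list of 64-bit words returned by `pynvml.nvmlDeviceGetCpuAffinity` into core indices by concatenating binary strings and then reversing them. The sketch below makes the same assumption as the code's own comment: each element is a 64-bit int, and element 0 covers cores 0-63. Under that assumption, a direct bit test gives the same result without string manipulation. `cores_from_affinity_words` is a hypothetical helper, not part of pynvml. It takes a plain list of ints so it runs without a GPU. The string-based version is copied from `getCpuAffinity` for comparison.

```python
def cores_from_affinity_words(words, bits_per_word=64):
    """Decode an NVML-style CPU affinity mask into a sorted list of core ids.

    Assumes words[0] holds cores 0..63, words[1] cores 64..127, and so on,
    matching the order getCpuAffinity produces after its reverse().
    """
    cores = []
    for word_idx, word in enumerate(words):
        for bit in range(bits_per_word):
            if (word >> bit) & 1:
                cores.append(word_idx * bits_per_word + bit)
    return cores


def cores_from_affinity_words_str(words):
    """String-based decoding, same logic as device.getCpuAffinity."""
    affinity_string = ''
    for j in words:
        affinity_string = '{:064b}'.format(j) + affinity_string
    affinity_list = [int(x) for x in affinity_string]
    affinity_list.reverse()  # so core 0 is in 0th element of list
    return [i for i, e in enumerate(affinity_list) if e != 0]


print(cores_from_affinity_words([0b1011, 0b1]))  # [0, 1, 3, 64]
```

The resulting list is what `set_socket_affinity` passes to `os.sched_setaffinity`. The `*_unique_*` modes then partition it across the GPUs that share a socket, either interleaved or in contiguous chunks.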

@ -0,0 +1,239 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pandas as pd
import numpy as np
import pickle
import argparse
import torch
from torch.utils.data import DataLoader
from torch.cuda import amp
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from modeling import TemporalFusionTransformer
from configuration import ElectricityConfig
from data_utils import TFTDataset
from utils import PerformanceMeter
from criterions import QuantileLoss
import dllogger
from log_helper import setup_logger
def _unscale_per_id(config, values, ids, scalers):
values = values.cpu().numpy()
num_horizons = config.example_length - config.encoder_length + 1
flat_values = pd.DataFrame(
values,
columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
)
flat_values['id'] = ids
df_list = []
for idx, group in flat_values.groupby('id'):
scaler = scalers[idx]
group_copy = group.copy()
for col in group_copy.columns:
if not 'id' in col:
_col = np.expand_dims(group_copy[col].values, -1)
_t_col = scaler.inverse_transform(_col)[:,-1]
group_copy[col] = _t_col
df_list.append(group_copy)
flat_values = pd.concat(df_list, axis=0)
flat_values = flat_values[[col for col in flat_values if not 'id' in col]]
flat_tensor = torch.from_numpy(flat_values.values)
return flat_tensor
def _unscale(config, values, scaler):
values = values.cpu().numpy()
num_horizons = config.example_length - config.encoder_length + 1
flat_values = pd.DataFrame(
values,
columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
)
for col in flat_values.columns:
if not 'id' in col:
_col = np.expand_dims(flat_values[col].values, -1)
_t_col = scaler.inverse_transform(_col)[:,-1]
flat_values[col] = _t_col
flat_values = flat_values[[col for col in flat_values if not 'id' in col]]
flat_tensor = torch.from_numpy(flat_values.values)
return flat_tensor
def predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=False):
model.eval()
predictions = []
targets = []
ids = []
perf_meter = PerformanceMeter()
n_workers = args.distributed_world_size if hasattr(args, 'distributed_world_size') else 1
for step, batch in enumerate(data_loader):
perf_meter.reset_current_lap()
with torch.no_grad():
batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
ids.append(batch['id'][:,0,:])
targets.append(batch['target'])
predictions.append(model(batch).float())
perf_meter.update(args.batch_size * n_workers,
exclude_from_total=step in [0, len(data_loader)-1])
targets = torch.cat(targets, dim=0)
if not extend_targets:
targets = targets[:,config.encoder_length:,:]
predictions = torch.cat(predictions, dim=0)
if config.scale_per_id:
ids = torch.cat(ids, dim=0).cpu().numpy()
unscaled_predictions = torch.stack(
[_unscale_per_id(config, predictions[:,:,i], ids, scalers) for i in range(len(config.quantiles))],
dim=-1)
unscaled_targets = _unscale_per_id(config, targets[:,:,0], ids, scalers).unsqueeze(-1)
else:
ids = None
unscaled_predictions = torch.stack(
[_unscale(config, predictions[:,:,i], scalers['']) for i in range(len(config.quantiles))],
dim=-1)
unscaled_targets = _unscale(config, targets[:,:,0], scalers['']).unsqueeze(-1)
return unscaled_predictions, unscaled_targets, ids, perf_meter
def visualize_v2(args, config, model, data_loader, scalers, cat_encodings):
unscaled_predictions, unscaled_targets, ids, _ = predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=True)
num_horizons = config.example_length - config.encoder_length + 1
pad = unscaled_predictions.new_full((unscaled_targets.shape[0], unscaled_targets.shape[1] - unscaled_predictions.shape[1], unscaled_predictions.shape[2]), fill_value=float('nan'))
pad[:,-1,:] = unscaled_targets[:,-num_horizons,:]
unscaled_predictions = torch.cat((pad, unscaled_predictions), dim=1)
ids = torch.from_numpy(ids.squeeze())
joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
for key, g in graphs.items():
for i, ex in enumerate(g):
df = pd.DataFrame(ex.numpy(),
index=range(num_horizons - ex.shape[0], num_horizons),
columns=['target'] + [f'P{int(q*100)}' for q in config.quantiles])
fig = df.plot().get_figure()
ax = fig.get_axes()[0]
_values = df.values[config.encoder_length-1:,:]
ax.fill_between(range(num_horizons), _values[:,1], _values[:,-1], alpha=0.2, color='green')
os.makedirs(os.path.join(args.results, 'single_example_vis', str(key)), exist_ok=True)
fig.savefig(os.path.join(args.results, 'single_example_vis', str(key), f'{i}.pdf'))
def inference(args, config, model, data_loader, scalers, cat_encodings):
unscaled_predictions, unscaled_targets, ids, perf_meter = predict(args, config, model, data_loader, scalers, cat_encodings)
if args.joint_visualization or args.save_predictions:
ids = torch.from_numpy(ids.squeeze())
#ids = torch.cat([x['id'][0] for x in data_loader.dataset])
joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
for key, g in graphs.items(): #timeseries id, joint targets and predictions
_g = {'targets': g[:,:,0]}
_g.update({f'P{int(q*100)}':g[:,:,i+1] for i, q in enumerate(config.quantiles)})
if args.joint_visualization:
summary_writer = SummaryWriter(log_dir=os.path.join(args.results, 'predictions_vis', str(key)))
for q, t in _g.items(): # target and quantiles, timehorizon values
if q == 'targets':
targets = torch.cat([t[:,0], t[-1,1:]]) # WIP
# Rebuild one continuous target series (the first step of every window plus the remaining steps of the last window) so it can share a graph with the predictions.
for i, val in enumerate(targets):
summary_writer.add_scalars(str(key), {f'{q}':val}, i)
continue
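# Column i of tensor t holds the t+(i+1) forecasts, which are shifted by i steps relative to column 0.
# The next lines shift each column down by i rows so that row r of y holds every forecast for timestep r.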
y = t.new_full((t.shape[0] + t.shape[1] -1, t.shape[1]), float('nan'))
for i in range(y.shape[1]):
y[i:i+t.shape[0], i] = t[:,i]
for i, vals in enumerate(y): # timestep, forecasts for that timestep from each horizon
summary_writer.add_scalars(str(key), {f'{q}_t+{j+1}':v for j,v in enumerate(vals) if v == v}, i) # v == v is False only for the NaN padding
summary_writer.close()
if args.save_predictions:
for q, t in _g.items():
df = pd.DataFrame(t.tolist())
df.columns = [f't+{i+1}' for i in range(len(df.columns))]
os.makedirs(os.path.join(args.results, 'predictions', str(key)), exist_ok=True)
df.to_csv(os.path.join(args.results, 'predictions', str(key), q+'.csv'))
losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
normalizer = unscaled_targets.abs().mean()
q_risk = 2 * losses / normalizer
perf_dict = {
'throughput': perf_meter.avg,
'latency_avg': perf_meter.total_time/len(perf_meter.intervals),
'latency_p90': perf_meter.p(90),
'latency_p95': perf_meter.p(95),
'latency_p99': perf_meter.p(99),
'total_inference_time': perf_meter.total_time,
}
return q_risk, perf_dict
def main(args):
setup_logger(args)
# Set up model
state_dict = torch.load(args.checkpoint)
config = state_dict['config']
model = TemporalFusionTransformer(config).cuda()
model.load_state_dict(state_dict['model'])
model.eval()
model.cuda()
# Set up dataset
test_split = TFTDataset(args.data, config)
data_loader = DataLoader(test_split, batch_size=args.batch_size, num_workers=4)
scalers = pickle.load(open(args.tgt_scalers, 'rb'))
cat_encodings = pickle.load(open(args.cat_encodings, 'rb'))
if args.visualize:
# TODO: abstract away all forms of visualization.
visualize_v2(args, config, model, data_loader, scalers, cat_encodings)
quantiles, perf_dict = inference(args, config, model, data_loader, scalers, cat_encodings)
quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
finish_log = {**quantiles, **perf_dict}
dllogger.log(step=(), data=finish_log, verbosity=1)
print('Test q-risk: P10 {test_p10} | P50 {test_p50} | P90 {test_p90}'.format(**quantiles))
print('Latency:\n\tAverage {:.3f}s\n\tp90 {:.3f}s\n\tp95 {:.3f}s\n\tp99 {:.3f}s'.format(
perf_dict['latency_avg'], perf_dict['latency_p90'], perf_dict['latency_p95'], perf_dict['latency_p99']))
if __name__=='__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--checkpoint', type=str,
help='Path to the checkpoint')
parser.add_argument('--data', type=str,
help='Path to the test split of the dataset')
parser.add_argument('--tgt_scalers', type=str,
help='Path to the tgt_scalers.bin file produced by the preprocessing')
parser.add_argument('--cat_encodings', type=str,
help='Path to the cat_encodings.bin file produced by the preprocessing')
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--visualize', action='store_true', help='Visualize predictions - each example on a separate plot')
parser.add_argument('--joint_visualization', action='store_true', help='Visualize predictions - each time series on a separate plot. Projections will be concatenated.')
parser.add_argument('--save_predictions', action='store_true', help='Save targets and quantile predictions of each time series as CSV files under <results>/predictions')
parser.add_argument('--results', type=str, default='/results')
parser.add_argument('--log_file', type=str, default='dllogger.json')
ARGS = parser.parse_args()
main(ARGS)
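The joint-visualization branch of `inference` realigns a `[num_examples, num_horizons]` forecast matrix so that each row collects every forecast made for the same target timestep. A minimal pure-Python sketch of that indexing, assuming consecutive examples are windows shifted by one step; the function name `realign_horizons` is hypothetical, and NaN marks slots with no forecast, mirroring `t.new_full(..., float('nan'))`:

```python
import math
from typing import List


def realign_horizons(t: List[List[float]]) -> List[List[float]]:
    """Shift column i (the t+(i+1) forecasts) down by i rows.

    Row r of the result holds every forecast whose target is timestep r;
    slots without a forecast stay NaN. Mirrors y[i:i+N, i] = t[:, i].
    """
    n_examples = len(t)
    n_horizons = len(t[0])
    y = [[math.nan] * n_horizons for _ in range(n_examples + n_horizons - 1)]
    for i in range(n_horizons):
        for row in range(n_examples):
            y[row + i][i] = t[row][i]
    return y


# Three windows with two horizons each: window j forecasts steps j and j+1.
forecasts = [[10.0, 11.0],
             [20.0, 21.0],
             [30.0, 31.0]]
for step, vals in enumerate(realign_horizons(forecasts)):
    # Same filter as `if v == v`: NaN is the only value unequal to itself.
    print(step, {f't+{j+1}': v for j, v in enumerate(vals) if v == v})
```

Each printed row shows how many horizons contributed a forecast for that step: the first and last steps get one, the middle steps get two.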

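`inference` reports a normalized quantile risk, `q_risk = 2 * losses / normalizer`, where `normalizer` is the mean absolute target. `QuantileLoss` lives in `criterions.py`, which is not shown here, so the sketch below assumes it returns the mean pinball loss for each quantile; treat it as an illustration of the metric, not a copy of that module (the helper names are hypothetical):

```python
from typing import List, Sequence


def pinball_loss(y: float, y_hat: float, q: float) -> float:
    """Quantile (pinball) loss of a single prediction."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)


def q_risk(targets: Sequence[float],
           preds: Sequence[Sequence[float]],
           quantiles: Sequence[float]) -> List[float]:
    """Normalized quantile risk per quantile: 2 * mean(loss) / mean(|y|).

    preds[k][n] is the forecast of quantile quantiles[k] for targets[n].
    """
    n = len(targets)
    normalizer = sum(abs(y) for y in targets) / n
    risks = []
    for q, p in zip(quantiles, preds):
        mean_loss = sum(pinball_loss(y, y_hat, q) for y, y_hat in zip(targets, p)) / n
        risks.append(2 * mean_loss / normalizer)
    return risks


targets = [10.0, 20.0]
preds = [[8.0, 18.0],   # P10
         [10.0, 22.0],  # P50
         [12.0, 25.0]]  # P90
print(q_risk(targets, preds, [0.1, 0.5, 0.9]))
```

Because the numerator and the normalizer are both means over the same elements, the ratio equals `2 * sum(loss) / sum(|y|)`, the form used in the TFT paper.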
@ -0,0 +1,141 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import subprocess
import sys
import itertools
import atexit
import dllogger
from dllogger import Backend, JSONStreamBackend, StdOutBackend
import torch.distributed as dist
from torch.utils.tensorboard import SummaryWriter
class TensorBoardBackend(Backend):
def __init__(self, verbosity, log_dir):
super().__init__(verbosity=verbosity)
self.summary_writer = SummaryWriter(log_dir=os.path.join(log_dir, 'TB_summary'),
flush_secs=120,
max_queue=200
)
self.hp_cache = None
atexit.register(self.summary_writer.close)
@property
def log_level(self):
return self._log_level
def metadata(self, timestamp, elapsedtime, metric, metadata):
pass
def log(self, timestamp, elapsedtime, step, data):
if step == 'HPARAMS':
parameters = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
# Unpack lists and tuples into key_0, key_1, ...
for d in [{k+f'_{i}':v for i,v in enumerate(l)} for k,l in data.items() if isinstance(l, (list, tuple))]:
parameters.update(d)
# Remove custom classes, keeping the unpacked list and tuple items
parameters = {k: v for k, v in parameters.items() if isinstance(v, (int, float, str, bool))}
parameters.update({k:'None' for k, v in data.items() if v is None})
self.hp_cache = parameters
if step == ():
if self.hp_cache is None:
print('Warning: Cannot save HParameters. Please log HParameters with step=\'HPARAMS\'', file=sys.stderr)
return
self.summary_writer.add_hparams(self.hp_cache, data)
if not isinstance(step, int):
return
for k, v in data.items():
self.summary_writer.add_scalar(k, v, step)
def flush(self):
pass
def setup_logger(args):
os.makedirs(args.results, exist_ok=True)
log_path = os.path.join(args.results, args.log_file)
if os.path.exists(log_path):
for i in itertools.count():
s_fname = args.log_file.split('.')
fname = '.'.join(s_fname[:-1]) + f'_{i}.' + s_fname[-1] if len(s_fname) > 1 else args.log_file + f'.{i}'
log_path = os.path.join(args.results, fname)
if not os.path.exists(log_path):
break
def metric_format(metric, metadata, value):
return "{}: {}".format(metric, f'{value:.5f}' if isinstance(value, float) else value)
def step_format(step):
if step == ():
return "Finished |"
elif isinstance(step, int):
return "Step {0: <5} |".format(step)
return "Step {} |".format(step)
if not dist.is_initialized() or not args.distributed_world_size > 1 or args.distributed_rank == 0:
dllogger.init(backends=[JSONStreamBackend(verbosity=1, filename=log_path),
TensorBoardBackend(verbosity=1, log_dir=args.results),
StdOutBackend(verbosity=2,
step_format=step_format,
prefix_format=lambda x: "")#,
#metric_format=metric_format)
])
else:
dllogger.init(backends=[])
dllogger.log(step='PARAMETER', data=vars(args), verbosity=0)
container_setup_info = {**get_framework_env_vars(), **get_system_info()}
dllogger.log(step='ENVIRONMENT', data=container_setup_info, verbosity=0)
dllogger.metadata('loss', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'TRAIN', 'format': ':1f'})
dllogger.metadata('val_loss', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format':':5f'})
dllogger.metadata('val_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'VAL', 'format': ':1f'})
dllogger.metadata('test_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('test_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('test_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('throughput', {'GOAL': 'MAXIMIZE', 'STAGE': 'TEST', 'format': ':1f'})
dllogger.metadata('latency_p90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('latency_p95', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('latency_p99', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
def get_framework_env_vars():
return {
'NVIDIA_PYTORCH_VERSION': os.environ.get('NVIDIA_PYTORCH_VERSION'),
'PYTORCH_VERSION': os.environ.get('PYTORCH_VERSION'),
'CUBLAS_VERSION': os.environ.get('CUBLAS_VERSION'),
'NCCL_VERSION': os.environ.get('NCCL_VERSION'),
'CUDA_DRIVER_VERSION': os.environ.get('CUDA_DRIVER_VERSION'),
'CUDNN_VERSION': os.environ.get('CUDNN_VERSION'),
'CUDA_VERSION': os.environ.get('CUDA_VERSION'),
'NVIDIA_PIPELINE_ID': os.environ.get('NVIDIA_PIPELINE_ID'),
'NVIDIA_BUILD_ID': os.environ.get('NVIDIA_BUILD_ID'),
'NVIDIA_TF32_OVERRIDE': os.environ.get('NVIDIA_TF32_OVERRIDE'),
}
def get_system_info():
system_info = subprocess.run('nvidia-smi --query-gpu=gpu_name,memory.total,enforced.power.limit --format=csv'.split(), capture_output=True).stdout
system_info = [i.decode('utf-8') for i in system_info.split(b'\n')]
system_info = [x for x in system_info if x]
return {'system_info': system_info}
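`setup_logger` avoids overwriting an earlier log by appending a counter to the file name. The same naming rule, pulled out into a pure function so it can be checked without touching the filesystem; the helper name and its `exists` callback are hypothetical:

```python
import itertools
from typing import Callable


def next_free_log_name(log_file: str, exists: Callable[[str], bool]) -> str:
    """Return log_file, or the first free variant with a counter suffix.

    'dllogger.json' -> 'dllogger_0.json', 'dllogger_1.json', ...
    A name without an extension gets a '.<i>' suffix instead.
    """
    if not exists(log_file):
        return log_file
    stem, dot, ext = log_file.rpartition('.')
    for i in itertools.count():
        candidate = f'{stem}_{i}.{ext}' if dot else f'{log_file}.{i}'
        if not exists(candidate):
            return candidate


existing = {'dllogger.json', 'dllogger_0.json'}
print(next_free_log_name('dllogger.json', existing.__contains__))
```

With a real results directory, `exists` would be something like `lambda name: os.path.exists(os.path.join(results_dir, name))`.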

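`TensorBoardBackend.log` turns the `HPARAMS` record into values that `SummaryWriter.add_hparams` accepts. A standalone sketch of the intended flattening, where lists and tuples are expanded first and the type filter is then applied to the expanded dict, so the unpacked items survive. The function name `flatten_hparams` is hypothetical:

```python
from typing import Any, Dict


def flatten_hparams(data: Dict[str, Any]) -> Dict[str, Any]:
    """Make an args dict acceptable to TensorBoard's add_hparams.

    Lists/tuples are unpacked into key_0, key_1, ...; None becomes 'None';
    anything that is not int/float/str/bool (e.g. custom objects) is dropped.
    """
    params: Dict[str, Any] = {}
    for k, v in data.items():
        if isinstance(v, (list, tuple)):
            params.update({f'{k}_{i}': item for i, item in enumerate(v)})
        else:
            params[k] = v
    flat = {k: v for k, v in params.items() if isinstance(v, (int, float, str, bool))}
    flat.update({k: 'None' for k, v in params.items() if v is None})
    return flat


print(flatten_hparams({'lr': 1e-3, 'sample_data': [450000, 50000],
                       'log_file': None, 'model': object()}))
```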
@ -0,0 +1,367 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from typing import Dict, Tuple, Optional, List
if os.environ.get("TFT_SCRIPTING", False):
from torch.nn import LayerNorm
else:
from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
class MaybeLayerNorm(nn.Module):
def __init__(self, output_size, hidden_size, eps):
super().__init__()
if output_size and output_size == 1:
self.ln = nn.Identity()
else:
self.ln = LayerNorm(output_size if output_size else hidden_size, eps=eps)
def forward(self, x):
return self.ln(x)
class GLU(nn.Module):
def __init__(self, hidden_size, output_size):
super().__init__()
self.lin = nn.Linear(hidden_size, output_size * 2)
def forward(self, x: Tensor) -> Tensor:
x = self.lin(x)
x = F.glu(x)
return x
class GRN(nn.Module):
def __init__(self,
input_size,
hidden_size,
output_size=None,
context_hidden_size=None,
dropout=0):
super().__init__()
self.layer_norm = MaybeLayerNorm(output_size, hidden_size, eps=1e-3)
self.lin_a = nn.Linear(input_size, hidden_size)
if context_hidden_size is not None:
self.lin_c = nn.Linear(context_hidden_size, hidden_size, bias=False)
self.lin_i = nn.Linear(hidden_size, hidden_size)
self.glu = GLU(hidden_size, output_size if output_size else hidden_size)
self.dropout = nn.Dropout(dropout)
self.out_proj = nn.Linear(input_size, output_size) if output_size else None
def forward(self, a: Tensor, c: Optional[Tensor] = None):
x = self.lin_a(a)
if c is not None:
x = x + self.lin_c(c).unsqueeze(1)
x = F.elu(x)
x = self.lin_i(x)
x = self.dropout(x)
x = self.glu(x)
y = a if not self.out_proj else self.out_proj(a)
x = x + y
x = self.layer_norm(x)
return x
class TFTEmbedding(nn.Module):
def __init__(self, config):
super().__init__()
self.s_cat_inp_lens = config.static_categorical_inp_lens
self.t_cat_k_inp_lens = config.temporal_known_categorical_inp_lens
self.t_cat_o_inp_lens = config.temporal_observed_categorical_inp_lens
self.s_cont_inp_size = config.static_continuous_inp_size
self.t_cont_k_inp_size = config.temporal_known_continuous_inp_size
self.t_cont_o_inp_size = config.temporal_observed_continuous_inp_size
self.t_tgt_size = config.temporal_target_size
self.hidden_size = config.hidden_size
# There are 7 types of input:
# 1. Static categorical
# 2. Static continuous
# 3. Temporal known a priori categorical
# 4. Temporal known a priori continuous
# 5. Temporal observed categorical
# 6. Temporal observed continuous
# 7. Temporal observed targets (time series observed so far)
self.s_cat_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.s_cat_inp_lens]) if self.s_cat_inp_lens else None
self.t_cat_k_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.t_cat_k_inp_lens]) if self.t_cat_k_inp_lens else None
self.t_cat_o_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.t_cat_o_inp_lens]) if self.t_cat_o_inp_lens else None
self.s_cont_embedding_vectors = nn.Parameter(torch.Tensor(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
self.t_cont_k_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
self.t_cont_o_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
self.t_tgt_embedding_vectors = nn.Parameter(torch.Tensor(self.t_tgt_size, self.hidden_size))
self.s_cont_embedding_bias = nn.Parameter(torch.zeros(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
self.t_cont_k_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
self.t_cont_o_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
self.t_tgt_embedding_bias = nn.Parameter(torch.zeros(self.t_tgt_size, self.hidden_size))
if self.s_cont_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.s_cont_embedding_vectors)
if self.t_cont_k_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.t_cont_k_embedding_vectors)
if self.t_cont_o_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.t_cont_o_embedding_vectors)
torch.nn.init.xavier_normal_(self.t_tgt_embedding_vectors)
def _apply_embedding(self,
cat: Optional[Tensor],
cont: Optional[Tensor],
cat_emb: Optional[nn.ModuleList],
cont_emb: Tensor,
cont_bias: Tensor,
) -> Tuple[Optional[Tensor], Optional[Tensor]]:
e_cat = torch.stack([embed(cat[...,i]) for i, embed in enumerate(cat_emb)], dim=-2) if cat is not None else None
if cont is not None:
# The line below is equivalent to one of the following einsums:
# temporal inputs: e_cont = torch.einsum('btf,fh->btfh', cont, cont_emb)
# static inputs:   e_cont = torch.einsum('bf,fh->bfh', cont, cont_emb)
e_cont = torch.mul(cont.unsqueeze(-1), cont_emb)
e_cont = e_cont + cont_bias
else:
e_cont = None
if e_cat is not None and e_cont is not None:
return torch.cat([e_cat, e_cont], dim=-2)
elif e_cat is not None:
return e_cat
elif e_cont is not None:
return e_cont
else:
return None
def forward(self, x: Dict[str, Tensor]):
# temporal/static categorical/continuous known/observed input
s_cat_inp = x.get('s_cat', None)
s_cont_inp = x.get('s_cont', None)
t_cat_k_inp = x.get('k_cat', None)
t_cont_k_inp = x.get('k_cont', None)
t_cat_o_inp = x.get('o_cat', None)
t_cont_o_inp = x.get('o_cont', None)
t_tgt_obs = x['target'] # Has to be present
# Static inputs are expected to be equal for all timesteps
# For memory efficiency there is no assert statement
s_cat_inp = s_cat_inp[:,0,:] if s_cat_inp is not None else None
s_cont_inp = s_cont_inp[:,0,:] if s_cont_inp is not None else None
s_inp = self._apply_embedding(s_cat_inp,
s_cont_inp,
self.s_cat_embed,
self.s_cont_embedding_vectors,
self.s_cont_embedding_bias)
t_known_inp = self._apply_embedding(t_cat_k_inp,
t_cont_k_inp,
self.t_cat_k_embed,
self.t_cont_k_embedding_vectors,
self.t_cont_k_embedding_bias)
t_observed_inp = self._apply_embedding(t_cat_o_inp,
t_cont_o_inp,
self.t_cat_o_embed,
self.t_cont_o_embedding_vectors,
self.t_cont_o_embedding_bias)
# Temporal observed targets
# t_observed_tgt = torch.einsum('btf,fh->btfh', t_tgt_obs, self.t_tgt_embedding_vectors)
t_observed_tgt = torch.matmul(t_tgt_obs.unsqueeze(3).unsqueeze(4), self.t_tgt_embedding_vectors.unsqueeze(1)).squeeze(3)
t_observed_tgt = t_observed_tgt + self.t_tgt_embedding_bias
return s_inp, t_known_inp, t_observed_inp, t_observed_tgt
class VariableSelectionNetwork(nn.Module):
def __init__(self, config, num_inputs):
super().__init__()
self.joint_grn = GRN(config.hidden_size*num_inputs, config.hidden_size, output_size=num_inputs, context_hidden_size=config.hidden_size)
self.var_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(num_inputs)])
def forward(self, x: Tensor, context: Optional[Tensor] = None):
Xi = x.reshape(*x.shape[:-2], -1)
grn_outputs = self.joint_grn(Xi, c=context)
sparse_weights = F.softmax(grn_outputs, dim=-1)
transformed_embed_list = [m(x[...,i,:]) for i, m in enumerate(self.var_grns)]
transformed_embed = torch.stack(transformed_embed_list, dim=-1)
# The line below performs a batched matrix-vector multiplication:
# for temporal features it's bthf,btf->bth
# for static features it's bhf,bf->bh
variable_ctx = torch.matmul(transformed_embed, sparse_weights.unsqueeze(-1)).squeeze(-1)
return variable_ctx, sparse_weights
class StaticCovariateEncoder(nn.Module):
def __init__(self, config):
super().__init__()
self.vsn = VariableSelectionNetwork(config, config.num_static_vars)
self.context_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(4)])
def forward(self, x: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
variable_ctx, sparse_weights = self.vsn(x)
# Context vectors:
# variable selection context
# enrichment context
# state_h context
# state_c context
cs, ce, ch, cc = tuple(m(variable_ctx) for m in self.context_grns)
return cs, ce, ch, cc
class InterpretableMultiHeadAttention(nn.Module):
def __init__(self, config):
super().__init__()
self.n_head = config.n_head
assert config.hidden_size % config.n_head == 0
self.d_head = config.hidden_size // config.n_head
self.qkv_linears = nn.Linear(config.hidden_size, (2 * self.n_head + 1) * self.d_head, bias=False)
self.out_proj = nn.Linear(self.d_head, config.hidden_size, bias=False)
self.attn_dropout = nn.Dropout(config.attn_dropout)
self.out_dropout = nn.Dropout(config.dropout)
self.scale = self.d_head**-0.5
self.register_buffer("_mask", torch.triu(torch.full((config.example_length, config.example_length), float('-inf')), 1).unsqueeze(0))
def forward(self, x: Tensor, mask_future_timesteps: bool = True) -> Tuple[Tensor, Tensor]:
bs, t, h_size = x.shape
qkv = self.qkv_linears(x)
q, k, v = qkv.split((self.n_head * self.d_head, self.n_head * self.d_head, self.d_head), dim=-1)
q = q.view(bs, t, self.n_head, self.d_head)
k = k.view(bs, t, self.n_head, self.d_head)
v = v.view(bs, t, self.d_head)
# attn_score = torch.einsum('bind,bjnd->bnij', q, k)
attn_score = torch.matmul(q.permute((0, 2, 1, 3)), k.permute((0, 2, 3, 1)))
attn_score.mul_(self.scale)
if mask_future_timesteps:
attn_score = attn_score + self._mask
attn_prob = F.softmax(attn_score, dim=3)
attn_prob = self.attn_dropout(attn_prob)
# attn_vec = torch.einsum('bnij,bjd->bnid', attn_prob, v)
attn_vec = torch.matmul(attn_prob, v.unsqueeze(1))
m_attn_vec = torch.mean(attn_vec, dim=1)
out = self.out_proj(m_attn_vec)
out = self.out_dropout(out)
return out, attn_vec
class TemporalFusionTransformer(nn.Module):
"""
Implementation of https://arxiv.org/abs/1912.09363
"""
def __init__(self, config):
super().__init__()
if hasattr(config, 'model'):
config = config.model
self.encoder_length = config.encoder_length # number of past timesteps the model reads as history
self.embedding = TFTEmbedding(config)
self.static_encoder = StaticCovariateEncoder(config)
self.history_vsn = VariableSelectionNetwork(config, config.num_historic_vars)
self.history_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
self.future_vsn = VariableSelectionNetwork(config, config.num_future_vars)
self.future_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
self.input_gate = GLU(config.hidden_size, config.hidden_size)
self.input_gate_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.enrichment_grn = GRN(config.hidden_size,
config.hidden_size,
context_hidden_size=config.hidden_size,
dropout=config.dropout)
self.attention = InterpretableMultiHeadAttention(config)
self.attention_gate = GLU(config.hidden_size, config.hidden_size)
self.attention_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.positionwise_grn = GRN(config.hidden_size,
config.hidden_size,
dropout=config.dropout)
self.decoder_gate = GLU(config.hidden_size, config.hidden_size)
self.decoder_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.quantile_proj = nn.Linear(config.hidden_size, len(config.quantiles))
def forward(self, x: Dict[str, Tensor]) -> Tensor:
s_inp, t_known_inp, t_observed_inp, t_observed_tgt = self.embedding(x)
# Static context
cs, ce, ch, cc = self.static_encoder(s_inp)
ch, cc = ch.unsqueeze(0), cc.unsqueeze(0) #lstm initial states
# Temporal input
_historical_inputs = [t_known_inp[:,:self.encoder_length,:], t_observed_tgt[:,:self.encoder_length,:]]
if t_observed_inp is not None:
_historical_inputs.insert(0,t_observed_inp[:,:self.encoder_length,:])
historical_inputs = torch.cat(_historical_inputs, dim=-2)
future_inputs = t_known_inp[:, self.encoder_length:]
# Encoders
historical_features, _ = self.history_vsn(historical_inputs, cs)
history, state = self.history_encoder(historical_features, (ch, cc))
future_features, _ = self.future_vsn(future_inputs, cs)
future, _ = self.future_encoder(future_features, state)
torch.cuda.synchronize() # this call gives perf boost for unknown reasons
# skip connection
input_embedding = torch.cat([historical_features, future_features], dim=1)
temporal_features = torch.cat([history, future], dim=1)
temporal_features = self.input_gate(temporal_features)
temporal_features = temporal_features + input_embedding
temporal_features = self.input_gate_ln(temporal_features)
# Static enrichment
enriched = self.enrichment_grn(temporal_features, c=ce)
# Temporal self attention
x, _ = self.attention(enriched, mask_future_timesteps=True)
# Don't compute historical quantiles
x = x[:, self.encoder_length:, :]
temporal_features = temporal_features[:, self.encoder_length:, :]
enriched = enriched[:, self.encoder_length:, :]
x = self.attention_gate(x)
x = x + enriched
x = self.attention_ln(x)
# Position-wise feed-forward
x = self.positionwise_grn(x)
# Final skip connection
x = self.decoder_gate(x)
x = x + temporal_features
x = self.decoder_ln(x)
out = self.quantile_proj(x)
return out
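Several comments in `modeling.py` state that a broadcast multiply or a `matmul` call is equivalent to an einsum. A quick check of those shape identities, with NumPy standing in for PyTorch (the broadcasting rules for these operations match) and arbitrary sizes with random data:

```python
import numpy as np

rng = np.random.default_rng(0)
b, t, f, h = 2, 3, 4, 5

# TFTEmbedding: each continuous feature scales its own learned vector.
cont = rng.standard_normal((b, t, f))          # [batch, time, features]
cont_emb = rng.standard_normal((f, h))         # [features, hidden]
broadcast = cont[..., None] * cont_emb         # torch.mul(cont.unsqueeze(-1), cont_emb)
reference = np.einsum('btf,fh->btfh', cont, cont_emb)

# TFTEmbedding: target embedding via matmul on two extra singleton dims.
tgt = rng.standard_normal((b, t, 1))           # one target variable
tgt_emb = rng.standard_normal((1, h))
tgt_out = np.matmul(tgt[..., None, None], tgt_emb[:, None, :]).squeeze(3)

# VariableSelectionNetwork: weighted sum of per-variable GRN outputs.
transformed = rng.standard_normal((b, t, h, f))   # stacked along the last dim
weights = rng.random((b, t, f))
weights /= weights.sum(-1, keepdims=True)         # stand-in for the softmax weights
mv = np.matmul(transformed, weights[..., None])[..., 0]

print(broadcast.shape, np.allclose(broadcast, reference))
print(tgt_out.shape, np.allclose(tgt_out, np.einsum('btf,fh->btfh', tgt, tgt_emb)))
print(mv.shape, np.allclose(mv, np.einsum('bthf,btf->bth', transformed, weights)))
```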

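`InterpretableMultiHeadAttention` differs from standard multi-head attention in two ways: all heads share one value projection (the fused projection emits `2 * n_head + 1` blocks of `d_head`), and the per-head outputs are averaged before `out_proj`. A single-sequence NumPy sketch of that forward pass with the causal `-inf` mask, without dropout; the function and weight names are hypothetical:

```python
import numpy as np


def interpretable_mha(x, w_q, w_k, w_v, w_out):
    """Causal interpretable multi-head attention for one sequence.

    x: [t, hidden]; w_q, w_k: [n_head, hidden, d_head];
    w_v: [hidden, d_head] (one value projection shared by all heads);
    w_out: [d_head, hidden].
    """
    t = x.shape[0]
    d_head = w_v.shape[1]
    q = np.einsum('th,nhd->ntd', x, w_q)
    k = np.einsum('th,nhd->ntd', x, w_k)
    v = x @ w_v                                          # [t, d_head], shared
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # [n_head, t, t]
    scores = scores + np.triu(np.full((t, t), -np.inf), 1)  # hide future steps
    scores -= scores.max(-1, keepdims=True)              # diagonal keeps max finite
    probs = np.exp(scores)
    probs /= probs.sum(-1, keepdims=True)
    heads = probs @ v                                    # [n_head, t, d_head]
    return heads.mean(0) @ w_out, probs                  # average heads, then project


rng = np.random.default_rng(0)
t, hidden, n_head = 5, 8, 2
d_head = hidden // n_head
x = rng.standard_normal((t, hidden))
params = (rng.standard_normal((n_head, hidden, d_head)),
          rng.standard_normal((n_head, hidden, d_head)),
          rng.standard_normal((hidden, d_head)),
          rng.standard_normal((d_head, hidden)))
out, probs = interpretable_mha(x, *params)

x2 = x.copy()
x2[-1] += 10.0                      # perturb only the last timestep
out2, _ = interpretable_mha(x2, *params)
print(np.allclose(out[:-1], out2[:-1]))  # earlier steps cannot see the change
```

Sharing the values means every head weights the same vectors, so the head-averaged attention matrix can be read as a single importance pattern over time.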
@ -0,0 +1 @@
tensorboard

@ -0,0 +1,54 @@
#! /bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
NUM_GPUS=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
[ $NUM_GPUS -eq 16 ] && WORKER_NUMS=(1 8 16) || WORKER_NUMS=(1 8)
DATASETS=(electricity traffic)
rm -rf /tmp/benchmark_results
for DATASET in ${DATASETS[@]}
do
for NGPU in ${WORKER_NUMS[@]}
do
for BATCH_SIZE in 512 1024 1536 2048 2560
do
for USE_AMP in --use_amp ""
do
for AFFINITY in disabled single socket_unique_interleaved
do
EXP_NAME="TFT_benchmark_${DATASET}_BS_${BATCH_SIZE}_${NGPU}GPU${USE_AMP}_${AFFINITY}"
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset ${DATASET} \
--data_path /data/processed/${DATASET}_bin \
--batch_size=${BATCH_SIZE} \
--lr 5e-4 \
--epochs 1 \
--sample 100000 5000 \
--seed 1 \
${USE_AMP} \
--affinity ${AFFINITY} \
--clip_grad 0.1 \
--results /tmp/benchmark_results/${EXP_NAME}
done
done
done
done
done
for P in `ls /tmp/benchmark_results/`;
do
echo ${P}
tail -n 1 /tmp/benchmark_results/${P}/dllogger.json
done
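The loop above prints the raw last line of each run's `dllogger.json`. To tabulate results instead, a small Python sketch follows. It assumes each record is a JSON object, possibly preceded by a `DLLL ` marker, and that metrics sit under a `"data"` key. Both are assumptions about the dllogger format, so the marker is stripped only when present and `"data"` is read with `.get`:

```python
import glob
import json
import os
from typing import Any, Dict, Iterable

PREFIX = 'DLLL '  # assumed dllogger record marker; stripped only when present


def parse_dllogger_line(line: str) -> Dict[str, Any]:
    """Decode one dllogger JSON log line into a dict."""
    line = line.strip()
    if line.startswith(PREFIX):
        line = line[len(PREFIX):]
    return json.loads(line)


def last_record(lines: Iterable[str]) -> Dict[str, Any]:
    """Like `tail -n 1` plus parsing: decode the last non-empty line."""
    last = None
    for line in lines:
        if line.strip():
            last = line
    if last is None:
        raise ValueError('log contains no records')
    return parse_dllogger_line(last)


# Summarize every benchmark run (prints nothing if the directory does not exist).
for path in sorted(glob.glob('/tmp/benchmark_results/*/dllogger.json')):
    with open(path) as f:
        run_name = os.path.basename(os.path.dirname(path))
        print(run_name, last_record(f).get('data'))
```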

@ -0,0 +1,40 @@
#!/bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
DATAPATH='/data'
declare -A URLS=( ['electricity']='https://archive.ics.uci.edu/ml/machine-learning-databases/00321/LD2011_2014.txt.zip'
['traffic']='https://archive.ics.uci.edu/ml/machine-learning-databases/00204/PEMS-SF.zip'
)
mkdir -p ${DATAPATH}/raw
mkdir -p ${DATAPATH}/processed
for DS in electricity traffic
do
DS_PATH=${DATAPATH}/raw/${DS}
ZIP_FNAME=${DS_PATH}.zip
if [ ! -d ${DS_PATH} ]
then
wget "${URLS[${DS}]}" -O ${ZIP_FNAME}
unzip ${ZIP_FNAME} -d ${DS_PATH}
fi
python -c "from data_utils import standarize_${DS} as standarize; standarize(\"${DS_PATH}\")"
python -c "from data_utils import preprocess; \
from configuration import ${DS^}Config as Config; \
preprocess(\"${DS_PATH}/standarized.csv\", \"${DATAPATH}/processed/${DS}_bin\", Config())"
done

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}
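The run directory name `bs${NGPU}x${BATCH_SIZE}` reflects that `BATCH_SIZE` is per process, so the global batch is their product. The sketch below gives a back-of-the-envelope count of the resulting epoch length. It assumes `--sample 450000 ...` yields exactly 450000 training examples and that training uses the `DistributedSampler(..., drop_last=True)` plus default-`DataLoader` setup of `load_dataset` in `train.py`. The helper name is hypothetical:

```python
import math


def steps_per_epoch(num_samples: int, ngpu: int, batch_size: int) -> int:
    """Per-rank optimizer steps for one epoch.

    Assumes a DistributedSampler with drop_last=True (each rank gets
    floor(num_samples / ngpu) samples) and a DataLoader that keeps the
    final partial batch (its default drop_last=False).
    """
    per_rank = num_samples // ngpu
    return math.ceil(per_rank / batch_size)


# Defaults of the script above: NGPU=8, BATCH_SIZE=1024, --sample 450000 50000.
print('global batch size:', 8 * 1024)
print('train steps per epoch:', steps_per_epoch(450_000, 8, 1024))
```

Each rank sees 56250 samples, which is 54 full batches of 1024 plus one partial batch of 954.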

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,294 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import pickle
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, RandomSampler
from apex import amp
from apex.optimizers import FusedAdam
#from torch.nn.parallel import DistributedDataParallel as DDP
from apex.parallel import DistributedDataParallel as DDP
import numpy as np
import dllogger
from modeling import TemporalFusionTransformer
from configuration import CONFIGS
from data_utils import TFTBinaryDataset, sample_data
from log_helper import setup_logger
from criterions import QuantileLoss
from inference import predict
from utils import PerformanceMeter
import gpu_affinity
from ema import ModelEma
def load_dataset(args, config):
    train_split = TFTBinaryDataset(os.path.join(args.data_path, 'train.bin'), config)
    train_split = sample_data(train_split, args.sample_data[0])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(train_split, args.distributed_world_size, args.distributed_rank, seed=args.seed + args.distributed_rank, drop_last=True)
    else:
        data_sampler = RandomSampler(train_split)
    train_loader = DataLoader(train_split, batch_size=args.batch_size, num_workers=4, sampler=data_sampler, pin_memory=True)

    valid_split = TFTBinaryDataset(os.path.join(args.data_path, 'valid.bin'), config)
    valid_split = sample_data(valid_split, args.sample_data[1])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(valid_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    valid_loader = DataLoader(valid_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    test_split = TFTBinaryDataset(os.path.join(args.data_path, 'test.bin'), config)
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(test_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    test_loader = DataLoader(test_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    print_once(f'Train split length: {len(train_split)}')
    print_once(f'Valid split length: {len(valid_split)}')
    print_once(f'Test split length: {len(test_split)}')

    return train_loader, valid_loader, test_loader
def print_once(*args, **kwargs):
    if not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)
def main(args):
    # Set CPU thread affinity for this worker's GPU
    nproc_per_node = torch.cuda.device_count()
    if args.affinity != 'disabled':
        affinity = gpu_affinity.set_affinity(
            args.local_rank,
            nproc_per_node,
            args.affinity
        )
        print(f'{args.local_rank}: thread affinity: {affinity}')

    # Enable CuDNN autotuner
    torch.backends.cudnn.benchmark = True

    ### INIT DISTRIBUTED
    if args.distributed_world_size > 1:
        args.local_rank = int(os.environ.get('LOCAL_RANK', args.local_rank))
        torch.cuda.set_device(args.local_rank)
        dist.init_process_group(backend='nccl', init_method='env://')
        args.distributed_world_size = int(os.environ['WORLD_SIZE'])
        args.distributed_rank = dist.get_rank()
        print_once(f'Distributed training with {args.distributed_world_size} GPUs')
        torch.cuda.synchronize()

    if args.seed:
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed(args.seed)

    setup_logger(args)

    config = CONFIGS[args.dataset]()
    if args.overwrite_config:
        config.__dict__.update(json.loads(args.overwrite_config))

    dllogger.log(step='HPARAMS', data={**vars(args), **vars(config)}, verbosity=1)

    model = TemporalFusionTransformer(config).cuda()
    if args.ema_decay:
        model_ema = ModelEma(model, decay=args.ema_decay)

    print_once('Model params: {}'.format(sum(p.numel() for p in model.parameters())))
    criterion = QuantileLoss(config).cuda()
    optimizer = FusedAdam(model.parameters(), lr=args.lr)
    if args.use_amp:
        model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale="dynamic")
    if args.distributed_world_size > 1:
        #model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True)
        model = DDP(model)

    train_loader, valid_loader, test_loader = load_dataset(args, config)

    global_step = 0
    perf_meter = PerformanceMeter()

    for epoch in range(args.epochs):
        start = time.time()
        dllogger.log(step=global_step, data={'epoch': epoch}, verbosity=1)

        model.train()
        for local_step, batch in enumerate(train_loader):
            perf_meter.reset_current_lap()
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            loss = p_losses.sum()

            if args.use_amp:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()
            if not args.grad_accumulation or (global_step+1) % args.grad_accumulation == 0:
                if args.clip_grad:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad)
                optimizer.step()
                optimizer.zero_grad()
                if args.ema_decay:
                    model_ema.update(model)

            if args.distributed_world_size > 1:
                dist.all_reduce(p_losses)
                p_losses /= args.distributed_world_size
                loss = p_losses.sum()

            torch.cuda.synchronize()
            ips = perf_meter.update(args.batch_size * args.distributed_world_size,
                                    exclude_from_total=local_step in [0, len(train_loader)-1])

            log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': loss.item(), 'items/s':ips}
            dllogger.log(step=global_step, data=log_dict, verbosity=1)
            global_step += 1

        validate(args, config, model_ema if args.ema_decay else model, criterion, valid_loader, global_step)

        if validate.early_stop_c >= args.early_stopping:
            print_once('Early stopping')
            break

    ### TEST PHASE ###
    state_dict = torch.load(os.path.join(args.results, 'checkpoint.pt'), map_location='cpu')
    if isinstance(model, DDP):
        model.module.load_state_dict(state_dict['model'])
    else:
        model.load_state_dict(state_dict['model'])
    model.cuda().eval()

    tgt_scalers = pickle.load(open(os.path.join(args.data_path, 'tgt_scalers.bin'), 'rb'))
    cat_encodings = pickle.load(open(os.path.join(args.data_path,'cat_encodings.bin'), 'rb'))

    unscaled_predictions, unscaled_targets, _, _ = predict(args, config, model, test_loader, tgt_scalers, cat_encodings)
    losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
    normalizer = unscaled_targets.abs().mean()
    quantiles = 2 * losses / normalizer

    if args.distributed_world_size > 1:
        quantiles = quantiles.cuda()
        dist.all_reduce(quantiles)
        quantiles /= args.distributed_world_size

    quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
    finish_log = {**quantiles, 'average_ips':perf_meter.avg, 'convergence_step':validate.conv_step}
    dllogger.log(step=(), data=finish_log, verbosity=1)
def validate(args, config, model, criterion, dataloader, global_step):
    if not hasattr(validate, 'best_valid_loss'):
        validate.best_valid_loss = float('inf')
    if not hasattr(validate, 'early_stop_c'):
        validate.early_stop_c = 0
    model.eval()

    losses = []
    validation_start = time.time()
    for batch in dataloader:
        with torch.no_grad():
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            bs = next(t for t in batch.values() if t is not None).shape[0]
            losses.append((p_losses, bs))

    validation_end = time.time()

    # Weight each batch by its size, since the last batch may not be full
    p_losses = sum([l[0]*l[1] for l in losses])/sum([l[1] for l in losses])
    if args.distributed_world_size > 1:
        dist.all_reduce(p_losses)
        p_losses = p_losses/args.distributed_world_size

    ips = len(dataloader.dataset) / (validation_end - validation_start)

    log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': p_losses.sum().item(), 'items/s':ips}

    if log_dict['loss'] < validate.best_valid_loss:
        validate.best_valid_loss = log_dict['loss']
        validate.early_stop_c = 0
        validate.conv_step = global_step
        if not dist.is_initialized() or dist.get_rank() == 0:
            state_dict = model.module.state_dict() if isinstance(model, (DDP, ModelEma)) else model.state_dict()
            ckpt = {'args':args, 'config':config, 'model':state_dict}
            torch.save(ckpt, os.path.join(args.results, 'checkpoint.pt'))
        if args.distributed_world_size > 1:
            dist.barrier()
    else:
        validate.early_stop_c += 1

    log_dict = {'val_'+k:v for k,v in log_dict.items()}
    dllogger.log(step=global_step, data=log_dict, verbosity=1)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', type=str, required=True,
                        help='Path to the dataset')
    parser.add_argument('--dataset', type=str, required=True, choices=CONFIGS.keys(),
                        help='Dataset name')
    parser.add_argument('--epochs', type=int, default=25,
                        help='Number of training epochs')
    parser.add_argument('--sample_data', type=lambda x: int(float(x)), nargs=2, default=[-1, -1],
                        help="""Subsample the dataset. Specify number of training and valid examples.
                        Values can be provided in scientific notation. Floats will be truncated.""")
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--lr', type=float, default=1e-3)
    parser.add_argument('--seed', type=int, default=1)
    parser.add_argument('--use_amp', action='store_true', help='Enable automatic mixed precision')
    parser.add_argument('--clip_grad', type=float, default=0.0,
                        help='Maximum gradient norm for clipping; 0 disables clipping')
    parser.add_argument('--grad_accumulation', type=int, default=0,
                        help='Number of steps to accumulate gradients over before each optimizer step; 0 disables accumulation')
    parser.add_argument('--early_stopping', type=int, default=1000,
                        help='Stop training once validation loss has not improved for this many consecutive epochs.')
    parser.add_argument('--results', type=str, default='/results',
                        help='Directory in which results are stored')
    parser.add_argument('--log_file', type=str, default='dllogger.json',
                        help='Name of dllogger output file')
    parser.add_argument('--distributed_world_size', type=int, metavar='N',
                        default=torch.cuda.device_count(),
                        help='total number of GPUs across all nodes (default: all visible GPUs)')
    parser.add_argument('--distributed_rank', default=os.getenv('LOCAL_RANK', 0), type=int,
                        help='rank of the current worker')
    parser.add_argument('--local_rank', default=0, type=int,
                        help='local rank of the current worker on its node')
    parser.add_argument('--overwrite_config', type=str, default='',
                        help='JSON string whose keys override attributes of the dataset config')
    parser.add_argument('--affinity', type=str,
                        default='socket_unique_interleaved',
                        choices=['socket', 'single', 'single_unique',
                                 'socket_unique_interleaved',
                                 'socket_unique_continuous',
                                 'disabled'],
                        help='type of CPU affinity')
    parser.add_argument("--ema_decay", type=float, default=0.0,
                        help='Decay rate of an exponential moving average of the model weights; 0 disables EMA')

    ARGS = parser.parse_args()
    main(ARGS)
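`validate()` averages the per-batch quantile losses weighted by batch size, so a short final batch does not skew the mean. It also tracks consecutive epochs without a strict improvement in the function attributes `best_valid_loss` and `early_stop_c`, and `main()` stops once `early_stop_c` reaches `--early_stopping`. The pure-Python sketch below reproduces that bookkeeping without PyTorch. The function names are hypothetical, the losses are plain floats standing in for the per-quantile tensors, and the sketch counts epochs where `main()` records `global_step`.

```python
from typing import Optional, Sequence, Tuple


def weighted_mean(batch_losses: Sequence[Tuple[float, int]]) -> float:
    """Average (loss, batch_size) pairs, weighting each batch by its size as validate() does."""
    total = sum(loss * bs for loss, bs in batch_losses)
    count = sum(bs for _, bs in batch_losses)
    return total / count


def early_stopping_epoch(val_losses: Sequence[float], patience: int) -> Tuple[Optional[int], int]:
    """Return (epoch at which training stops, or None; epoch of the best checkpoint).

    The counter resets on every strict improvement (where validate() saves a checkpoint)
    and training stops as soon as the counter reaches `patience`.
    """
    best = float('inf')
    counter = 0
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            counter = 0
            best_epoch = epoch  # a checkpoint would be written here
        else:
            counter += 1
        if counter >= patience:
            return epoch, best_epoch
    return None, best_epoch


# Two full batches of 4 and a final partial batch of 2: (4 + 8 + 8) / 10
print(weighted_mean([(1.0, 4), (2.0, 4), (4.0, 2)]))  # 2.0
# Best loss at epoch 1, then three epochs without improvement
print(early_stopping_epoch([0.9, 0.7, 0.8, 0.75, 0.72], patience=3))  # (4, 1)
```

An unweighted mean of the three batch losses above would be 7/3 instead of 2.0, which is why `validate()` keeps the batch size next to each loss.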

@@ -0,0 +1,46 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
class PerformanceMeter():
    def __init__(self):
        self.reset()

    def reset(self):
        self.avg = 0
        self.count = 0
        self.total_time = 0
        self.last_update_time = time.time()
        self.intervals = []

    def update(self, n, exclude_from_total=False):
        delta = time.time() - self.last_update_time
        self.intervals.append(delta)
        if not exclude_from_total:
            self.total_time += delta
            self.count += n
            self.avg = self.count / self.total_time
        self.last_update_time = time.time()

        return n/delta

    def reset_current_lap(self):
        self.last_update_time = time.time()

    def p(self, i):
        assert 0 <= i <= 100
        # Clamp the index so that p(100) returns the largest interval instead of raising IndexError
        idx = min(int(len(self.intervals) * i / 100), len(self.intervals) - 1)
        return sorted(self.intervals)[idx]
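`PerformanceMeter.update()` records every step interval, but it adds only non-excluded steps to `total_time` and `count`. As a result, `avg` is items per second over the steady-state steps. `train.py` excludes the first and last step of each epoch. `p(i)` reads a percentile from the sorted intervals. The sketch below replays that arithmetic on a fixed list of step durations instead of `time.time()`. `replay_meter` and `nearest_rank` are hypothetical helpers, and the percentile clamps its index so that `i=100` returns the slowest step rather than indexing past the end of the list.

```python
from typing import List, Sequence, Set


def replay_meter(intervals: Sequence[float], items_per_step: int, excluded: Set[int]) -> float:
    """Average items/s over the steps not in `excluded`, as PerformanceMeter.avg computes it."""
    total_time = 0.0
    count = 0
    for step, delta in enumerate(intervals):
        if step not in excluded:
            total_time += delta
            count += items_per_step
    return count / total_time


def nearest_rank(intervals: Sequence[float], i: float) -> float:
    """Percentile lookup in the style of PerformanceMeter.p(), with the index clamped."""
    assert 0 <= i <= 100
    ordered: List[float] = sorted(intervals)
    idx = min(int(len(ordered) * i / 100), len(ordered) - 1)
    return ordered[idx]


# A slow warm-up first step and a short last step, as at the edges of an epoch
steps = [5.0, 1.0, 1.0, 1.0, 0.5]
print(replay_meter(steps, items_per_step=100, excluded={0, len(steps) - 1}))  # 100.0
print(nearest_rank(steps, 50))   # 1.0
print(nearest_rank(steps, 100))  # 5.0
```

If every step were included, the same run would report 500 / 8.5 (about 58.8) items/s. The warm-up step would dominate, and that is the effect the exclusion in `train.py` avoids.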

@@ -0,0 +1,128 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from data_utils import InputTypes, DataTypes, FeatureSpec
import datetime
class ElectricityConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('power_usage', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('hour', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'days_from_start' # This column contains time indices across which we split the data
        self.train_range = (1096, 1315)
        self.valid_range = (1308, 1339)
        self.test_range = (1332, 1346)
        self.dataset_stride = 1 # how many timesteps between examples
        self.scale_per_id = True
        self.missing_id_strategy = None
        self.missing_cat_data_strategy = 'encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [369]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.1
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])
class TrafficConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('values', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('time_on_day', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'sensor_day' # This column contains time indices across which we split the data
        self.train_range = (0, 151)
        self.valid_range = (144, 166)
        self.test_range = (159, float('inf'))
        self.dataset_stride = 1 # how many timesteps between examples
        self.scale_per_id = False
        self.missing_id_strategy = None
        self.missing_cat_data_strategy = 'encode_all'

        # Feature sizes
        self.static_categorical_inp_lens = [963]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]

        self.example_length = 8 * 24
        self.encoder_length = 7 * 24

        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.3
        self.attn_dropout = 0.0

        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])

        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])

CONFIGS = {'electricity': ElectricityConfig,
           'traffic': TrafficConfig,
           }
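The derived sizes at the end of each config class are counts over `self.features`. `ElectricityConfig` has three known continuous inputs (`hour`, `day_of_week`, `hours_from_start`), one target, no observed inputs, and one static categorical input with 369 categories. The model therefore sees 1 static, 3 future and 4 historic variables. The sketch below recomputes these numbers. So that it runs standalone, it copies the `InputTypes`/`DataTypes` values and the `FeatureSpec` tuple from `data_utils.py`. `count_features` and `derived_sizes` are hypothetical helpers.

```python
import enum
from collections import namedtuple


class DataTypes(enum.IntEnum):
    CONTINUOUS = 0
    CATEGORICAL = 1
    DATE = 2
    STR = 3


class InputTypes(enum.IntEnum):
    TARGET = 0
    OBSERVED = 1
    KNOWN = 2
    STATIC = 3
    ID = 4
    TIME = 5


FeatureSpec = namedtuple('FeatureSpec', ['name', 'feature_type', 'feature_embed_type'])


def count_features(features, feature_type, embed_type=None):
    """Number of features with a given input type (and, optionally, embedding type)."""
    return len([f for f in features
                if f.feature_type == feature_type
                and (embed_type is None or f.feature_embed_type == embed_type)])


def derived_sizes(features, static_cat_lens, known_cat_lens, observed_cat_lens):
    """Recompute (num_static_vars, num_future_vars, num_historic_vars) as the configs do."""
    known_cont = count_features(features, InputTypes.KNOWN, DataTypes.CONTINUOUS)
    observed_cont = count_features(features, InputTypes.OBSERVED, DataTypes.CONTINUOUS)
    target = count_features(features, InputTypes.TARGET)
    static_cont = count_features(features, InputTypes.STATIC, DataTypes.CONTINUOUS)
    num_static = static_cont + len(static_cat_lens)
    num_future = known_cont + len(known_cat_lens)
    num_historic = num_future + observed_cont + target + len(observed_cat_lens)
    return num_static, num_future, num_historic


electricity_features = [
    FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
    FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
    FeatureSpec('power_usage', InputTypes.TARGET, DataTypes.CONTINUOUS),
    FeatureSpec('hour', InputTypes.KNOWN, DataTypes.CONTINUOUS),
    FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
    FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
    FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
]
print(derived_sizes(electricity_features, [369], [], []))  # (1, 3, 4)
```

The same file fixes the forecast horizon. `train.py` slices targets from `encoder_length` onward, so each example predicts `example_length - encoder_length = 8*24 - 7*24 = 24` time steps.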

@@ -0,0 +1,28 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
import torch.nn.functional as F
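class QuantileLoss(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.register_buffer('q', torch.tensor(config.quantiles))

    def forward(self, predictions, targets):
        diff = predictions - targets
        # Pinball loss: over-predictions (diff > 0) cost (1 - q), under-predictions cost q
        ql = (1-self.q)*F.relu(diff) + self.q*F.relu(-diff)
        # Average over all batch and time positions, keeping one loss per quantile
        losses = ql.view(-1, ql.shape[-1]).mean(0)
        return losses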
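`QuantileLoss` computes the pinball loss. For quantile q, an over-prediction costs (1 - q) per unit and an under-prediction costs q per unit, and the result is averaged over every batch and time position. The test phase in `train.py` then reports `2 * losses / unscaled_targets.abs().mean()`, which normalizes each quantile's loss by the mean absolute target. Both means run over the same positions, so this equals 2 * sum(loss) / sum(|y|). The pure-Python sketch below mirrors both formulas on lists of floats instead of tensors. The function names are hypothetical.

```python
from typing import List, Sequence


def pinball_losses(predictions: Sequence[Sequence[float]], targets: Sequence[float],
                   quantiles: Sequence[float]) -> List[float]:
    """Mean pinball loss per quantile.

    predictions[n][k] is the forecast for quantiles[k] at position n; targets[n] is the truth.
    """
    losses = []
    for k, q in enumerate(quantiles):
        total = 0.0
        for pred_row, y in zip(predictions, targets):
            diff = pred_row[k] - y  # > 0 means over-prediction
            total += (1 - q) * max(diff, 0.0) + q * max(-diff, 0.0)
        losses.append(total / len(targets))
    return losses


def normalized_quantile_losses(predictions, targets, quantiles) -> List[float]:
    """2 * mean pinball loss / mean |target|, as in the test phase of train.py."""
    normalizer = sum(abs(y) for y in targets) / len(targets)
    return [2 * loss / normalizer for loss in pinball_losses(predictions, targets, quantiles)]


preds = [[8.0, 11.0, 12.0], [22.0, 19.0, 18.0]]  # P10, P50, P90 forecasts at two positions
ys = [10.0, 20.0]
qs = [0.1, 0.5, 0.9]
print([round(x, 4) for x in pinball_losses(preds, ys, qs)])              # [1.0, 0.5, 1.0]
print([round(x, 4) for x in normalized_quantile_losses(preds, ys, qs)])  # [0.1333, 0.0667, 0.1333]
```

The P10 forecast of 22 at the second position is an over-prediction by 2, so it costs 0.9 * 2 = 1.8. This asymmetric weighting pushes the low quantile below the target.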

@@ -0,0 +1,790 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################
# Copyright 2021 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import pickle
import enum
import datetime
from collections import namedtuple, OrderedDict
import sklearn.preprocessing
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
from bisect import bisect
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader


class DataTypes(enum.IntEnum):
    """Defines numerical types of each column."""
    CONTINUOUS = 0
    CATEGORICAL = 1
    DATE = 2
    STR = 3


class InputTypes(enum.IntEnum):
    """Defines input types of each column."""
    TARGET = 0
    OBSERVED = 1
    KNOWN = 2
    STATIC = 3
    ID = 4  # Single column used as an entity identifier
    TIME = 5  # Single column exclusively used as a time index


FeatureSpec = namedtuple('FeatureSpec', ['name', 'feature_type', 'feature_embed_type'])
DTYPE_MAP = {
    DataTypes.CONTINUOUS: np.float32,
    DataTypes.CATEGORICAL: np.int64,
    DataTypes.DATE: 'datetime64[ns]',
    DataTypes.STR: str,
}

FEAT_ORDER = [
    (InputTypes.STATIC, DataTypes.CATEGORICAL),
    (InputTypes.STATIC, DataTypes.CONTINUOUS),
    (InputTypes.KNOWN, DataTypes.CATEGORICAL),
    (InputTypes.KNOWN, DataTypes.CONTINUOUS),
    (InputTypes.OBSERVED, DataTypes.CATEGORICAL),
    (InputTypes.OBSERVED, DataTypes.CONTINUOUS),
    (InputTypes.TARGET, DataTypes.CONTINUOUS),
    (InputTypes.ID, DataTypes.CATEGORICAL),
]
FEAT_NAMES = ['s_cat', 's_cont', 'k_cat', 'k_cont', 'o_cat', 'o_cont', 'target', 'id']
DEFAULT_ID_COL = 'id'

class TFTBinaryDataset(Dataset):
    def __init__(self, path, config):
        super().__init__()
        self.features = [x for x in config.features if x.feature_embed_type != DataTypes.DATE]
        self.example_length = config.example_length
        self.stride = config.dataset_stride

        with open(path, 'rb') as f:
            self.grouped = pickle.load(f)
        self.grouped = [x for x in self.grouped if x.shape[0] >= self.example_length]
        self._cum_examples_in_group = np.cumsum([(g.shape[0] - self.example_length + 1) // self.stride for g in self.grouped])

        self.feature_type_col_map = [[i for i, f in enumerate(self.features) if (f.feature_type, f.feature_embed_type) == x] for x in FEAT_ORDER]

        # Regroup each array's columns into FEAT_ORDER. In the same pass, the stored int32 bits are
        # reinterpreted as float32 and cast to the proper dtype. Probably can be written neater
        self.grouped = [
            [
                arr[:, idxs].view(dtype=np.float32).astype(DTYPE_MAP[t[1]])
                for t, idxs in zip(FEAT_ORDER, self.feature_type_col_map)
            ]
            for arr in self.grouped
        ]

    def __len__(self):
        return self._cum_examples_in_group[-1] if len(self._cum_examples_in_group) else 0

    def __getitem__(self, idx):
        g_idx = bisect(self._cum_examples_in_group, idx)
        e_idx = idx - self._cum_examples_in_group[g_idx - 1] if g_idx else idx

        group = self.grouped[g_idx]

        tensors = [
            torch.from_numpy(feat[e_idx * self.stride:e_idx * self.stride + self.example_length])
            if feat.size else torch.empty(0)
            for feat in group
        ]

        return OrderedDict(zip(FEAT_NAMES, tensors))

class TFTDataset(Dataset):
    def __init__(self, path, config):
        super().__init__()
        self.features = config.features
        self.data = pd.read_csv(path, index_col=0)
        self.example_length = config.example_length
        self.stride = config.dataset_stride

        # The name field is a column name.
        # There can be multiple entries with the same name because one column can be interpreted in many ways
        time_col_name = next(x.name for x in self.features if x.feature_type == InputTypes.TIME)
        id_col_name = next(x.name for x in self.features if x.feature_type == InputTypes.ID)
        if id_col_name not in self.data.columns:
            id_col_name = DEFAULT_ID_COL
            self.features = [x for x in self.features if x.feature_type != InputTypes.ID]
            self.features.append(FeatureSpec(DEFAULT_ID_COL, InputTypes.ID, DataTypes.CATEGORICAL))
        col_dtypes = {v.name: DTYPE_MAP[v.feature_embed_type] for v in self.features}

        self.data.sort_values(time_col_name, inplace=True)
        self.data = self.data[list(set(x.name for x in self.features))]  # leave only relevant columns
        self.data = self.data.astype(col_dtypes)
        self.data = self.data.groupby(id_col_name).filter(lambda group: len(group) >= self.example_length)
        self.grouped = list(self.data.groupby(id_col_name))

        self._cum_examples_in_group = np.cumsum([(len(g[1]) - self.example_length + 1) // self.stride for g in self.grouped])

    def __len__(self):
        return self._cum_examples_in_group[-1]

    def __getitem__(self, idx):
        # bisect returns the number of cumulative counts <= idx, which is the group index
        g_idx = bisect(self._cum_examples_in_group, idx)
        e_idx = idx - self._cum_examples_in_group[g_idx - 1] if g_idx else idx

        group = self.grouped[g_idx][1]
        sliced = group.iloc[e_idx * self.stride:e_idx * self.stride + self.example_length]

        # We need to be sure that tensors are returned in the correct order
        tensors = tuple([] for _ in range(8))
        for v in self.features:
            if v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[0].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[1].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[2].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[3].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CATEGORICAL:
                tensors[4].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CONTINUOUS:
                tensors[5].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.TARGET:
                tensors[6].append(torch.from_numpy(sliced[v.name].to_numpy()))
            elif v.feature_type == InputTypes.ID:
                tensors[7].append(torch.from_numpy(sliced[v.name].to_numpy()))

        tensors = [torch.stack(x, dim=-1) if x else torch.empty(0) for x in tensors]

        return OrderedDict(zip(FEAT_NAMES, tensors))

def get_dataset_splits(df, config):

    if hasattr(config, 'relative_split') and config.relative_split:
        forecast_len = config.example_length - config.encoder_length
        # The valid split is shifted forward from the train split by the number of forecast steps.
        # The test split is shifted forward from the valid split by the same number of forecast steps.
        train = []
        valid = []
        test = []

        for _, group in df.groupby(DEFAULT_ID_COL):

            index = group[config.time_ids]
            _train = group.loc[index < config.valid_boundary]
            _valid = group.iloc[(len(_train) - config.encoder_length):(len(_train) + forecast_len)]
            _test = group.iloc[(len(_train) - config.encoder_length + forecast_len):(len(_train) + 2 * forecast_len)]
            train.append(_train)
            valid.append(_valid)
            test.append(_test)

        train = pd.concat(train, axis=0)
        valid = pd.concat(valid, axis=0)
        test = pd.concat(test, axis=0)
    else:
        index = df[config.time_ids]
        train = df.loc[(index >= config.train_range[0]) & (index < config.train_range[1])]
        valid = df.loc[(index >= config.valid_range[0]) & (index < config.valid_range[1])]
        test = df.loc[(index >= config.test_range[0]) & (index < config.test_range[1])]

    return train, valid, test

def flatten_ids(df, config):

    if config.missing_id_strategy == 'drop':
        if hasattr(config, 'combine_ids') and config.combine_ids:
            # Row is dropped if any of the combined id columns is missing
            index = df[config.combine_ids].isna().any(axis=1)
        else:
            id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
            index = df[id_col].isna()
        index = index[index == True].index  # Extract indices of nans
        df.drop(index, inplace=True)

    if not (hasattr(config, 'combine_ids') and config.combine_ids):
        id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
        ids = df[id_col].apply(str)
        df.drop(id_col, axis=1, inplace=True)
        encoder = sklearn.preprocessing.LabelEncoder().fit(ids.values)
        df[DEFAULT_ID_COL] = encoder.transform(ids)
        encoders = OrderedDict({id_col: encoder})

    else:
        encoders = {c: sklearn.preprocessing.LabelEncoder().fit(df[c].values) for c in config.combine_ids}
        encoders = OrderedDict(encoders)
        lens = [len(v.classes_) for v in encoders.values()]
        clens = np.roll(np.cumprod(lens), 1)
        clens[0] = 1

        # This row-wise apply is very slow. It would probably be faster to encode each id column
        # first and then combine the encoded columns.
        df[DEFAULT_ID_COL] = df.apply(lambda row: sum([encoders[c].transform([row[c]])[0] * clens[i] for i, c in enumerate(encoders.keys())]), axis=1)
        df.drop(config.combine_ids, axis=1, inplace=True)

    return DEFAULT_ID_COL, encoders

def impute(df, config):
    # XXX This ensures that our scaling will have the same mean. We still need to check the variance
    if not hasattr(config, 'missing_data_label'):
        return df, None
    else:
        imp = SimpleImputer(missing_values=config.missing_data_label, strategy='mean')
        mask = df.applymap(lambda x: True if x == config.missing_data_label else False)
        data = df.values
        # Columns that are entirely missing are skipped by the imputer, so exclude them here as well
        col_mask = (data == config.missing_data_label).all(axis=0)
        data[:, ~col_mask] = imp.fit_transform(data)
        return data, mask

def normalize_reals(train, valid, test, config, id_col=DEFAULT_ID_COL):
    tgt_cols = [x.name for x in config.features if x.feature_type == InputTypes.TARGET]
    real_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CONTINUOUS).difference(set(tgt_cols)))
    real_scalers = {}
    tgt_scalers = {}

    def apply_scalers(df, name=None):
        if name is None:
            name = df.name
        mask = df.applymap(lambda x: True if x == config.missing_data_label else False) if hasattr(config, 'missing_data_label') else None
        df[real_cols] = real_scalers[name].transform(df[real_cols])
        if mask is not None and mask[real_cols].values.any():
            # Overwrite originally missing entries with a large sentinel value after scaling
            df[real_cols] = df[real_cols].mask(mask[real_cols], 10**9)
        df[tgt_cols] = tgt_scalers[name].transform(df[tgt_cols])
        return df

    if config.scale_per_id:
        for identifier, sliced in train.groupby(id_col):
            data = sliced[real_cols]
            data, _ = impute(data, config)
            real_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(data)
            # XXX We should probably remove examples that contain NaN as a target
            target = sliced[tgt_cols]
            tgt_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(target)

        train = train.groupby(id_col).apply(apply_scalers)
        # For validation and testing, keep only the time series that are present in the train subset
        # XXX for proper data science we should consider encoding unseen timeseries as a special case, not throwing them away
        valid = valid.loc[valid[id_col].isin(real_scalers.keys())]
        valid = valid.groupby(id_col).apply(apply_scalers)
        test = test.loc[test[id_col].isin(real_scalers.keys())]
        test = test.groupby(id_col).apply(apply_scalers)

    else:
        data, _ = impute(train[real_cols], config)
        real_scalers[''] = sklearn.preprocessing.StandardScaler().fit(data)
        tgt_scalers[''] = sklearn.preprocessing.StandardScaler().fit(train[tgt_cols])

        train = apply_scalers(train, name='')
        valid = apply_scalers(valid, name='')
        test = apply_scalers(test, name='')

    return train, valid, test, real_scalers, tgt_scalers

def encode_categoricals(train, valid, test, config):
    cat_encodings = {}
    cat_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CATEGORICAL and v.feature_type != InputTypes.ID))
    num_classes = []  # XXX Maybe we should modify config based on this value? Or send a warning?
    # For Tensor Core performance, num_classes[i] should ideally be divisible by 8

    # Train categorical encoders
    for c in cat_cols:
        if config.missing_cat_data_strategy == 'special_token':
            # XXX this will probably require some data augmentation
            unique = train[c].unique()
            # Replace categories that never occur in train with the special token
            valid.loc[~valid[c].isin(unique), c] = '<UNK>'
            test.loc[~test[c].isin(unique), c] = '<UNK>'

        if config.missing_cat_data_strategy in ('encode_all', 'special_token'):
            srs = pd.concat([train[c], valid[c], test[c]]).apply(str)
            cat_encodings[c] = sklearn.preprocessing.LabelEncoder().fit(srs.values)
        elif config.missing_cat_data_strategy == 'drop':
            # TODO: implement this. In addition to dropping rows this has to split specific time series in chunks
            # to prevent data from having temporal gaps
            raise NotImplementedError("missing_cat_data_strategy='drop' is not implemented")
        num_classes.append(srs.nunique())
    print('Categorical variables encodings lens: ', num_classes)

    for split in [train, valid, test]:
        for c in cat_cols:
            srs = split[c].apply(str)
            split[c] = srs
            split.loc[:, c] = cat_encodings[c].transform(srs)

    return cat_encodings

def preprocess(src_path, dst_path, config):
    df = pd.read_csv(src_path, index_col=0)

    for c in config.features:
        if c.feature_embed_type == DataTypes.DATE:
            df[c.name] = pd.to_datetime(df[c.name])

    # Leave only columns relevant to preprocessing
    relevant_columns = list(set([f.name for f in config.features] + [config.time_ids]))
    df = df[relevant_columns]

    id_col, id_encoders = flatten_ids(df, config)
    df = df.reindex(sorted(df.columns), axis=1)

    train, valid, test = get_dataset_splits(df, config)

    # Length filter the data (all timeseries shorter than example len will be dropped)
    # for df in [train, valid, test]:
    #     df.groupby(id_col).filter(lambda x: len(x) >= config.example_length)
    train = pd.concat([x[1] for x in train.groupby(id_col) if len(x[1]) >= config.example_length])
    valid = pd.concat([x[1] for x in valid.groupby(id_col) if len(x[1]) >= config.example_length])
    test = pd.concat([x[1] for x in test.groupby(id_col) if len(x[1]) >= config.example_length])

    train, valid, test, real_scalers, tgt_scalers = normalize_reals(train, valid, test, config, id_col)

    cat_encodings = encode_categoricals(train, valid, test, config)

    os.makedirs(dst_path, exist_ok=True)

    train.to_csv(os.path.join(dst_path, 'train.csv'))
    valid.to_csv(os.path.join(dst_path, 'valid.csv'))
    test.to_csv(os.path.join(dst_path, 'test.csv'))

    # Save relevant columns in binary form for faster dataloading
    # IMPORTANT: We always expect id to be a single column indicating the complete timeseries
    # We also expect a copy of id in form of static categorical input!!!
    col_names = [id_col] + [x.name for x in config.features if x.feature_embed_type != DataTypes.DATE and x.feature_type != InputTypes.ID]
    grouped_train = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in train.groupby(id_col)]
    grouped_valid = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in valid.groupby(id_col)]
    grouped_test = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in test.groupby(id_col)]

    with open(os.path.join(dst_path, 'train.bin'), 'wb') as f:
        pickle.dump(grouped_train, f)
    with open(os.path.join(dst_path, 'valid.bin'), 'wb') as f:
        pickle.dump(grouped_valid, f)
    with open(os.path.join(dst_path, 'test.bin'), 'wb') as f:
        pickle.dump(grouped_test, f)

    with open(os.path.join(dst_path, 'real_scalers.bin'), 'wb') as f:
        pickle.dump(real_scalers, f)
    with open(os.path.join(dst_path, 'tgt_scalers.bin'), 'wb') as f:
        pickle.dump(tgt_scalers, f)
    with open(os.path.join(dst_path, 'cat_encodings.bin'), 'wb') as f:
        pickle.dump(cat_encodings, f)
    with open(os.path.join(dst_path, 'id_encoders.bin'), 'wb') as f:
        pickle.dump(id_encoders, f)

def sample_data(dataset, num_samples):
    if num_samples < 0:
        return dataset
    else:
        return torch.utils.data.Subset(dataset, np.random.choice(np.arange(len(dataset)), size=num_samples, replace=False))

def standarize_electricity(path):
    """Code taken from https://github.com/google-research/google-research/blob/master/tft/script_download_data.py"""
    df = pd.read_csv(os.path.join(path, 'LD2011_2014.txt'), index_col=0, sep=';', decimal=',')
    df.index = pd.to_datetime(df.index)
    df.sort_index(inplace=True)

    # Used to determine the start and end dates of a series
    output = df.resample('1h').mean().replace(0., np.nan)

    earliest_time = output.index.min()

    df_list = []
    for label in output:
        print('Processing {}'.format(label))
        srs = output[label]

        start_date = min(srs.fillna(method='ffill').dropna().index)
        end_date = max(srs.fillna(method='bfill').dropna().index)

        active_range = (srs.index >= start_date) & (srs.index <= end_date)
        srs = srs[active_range].fillna(0.)

        tmp = pd.DataFrame({'power_usage': srs})
        date = tmp.index
        tmp['t'] = (date - earliest_time).seconds / 60 / 60 + (date - earliest_time).days * 24
        tmp['days_from_start'] = (date - earliest_time).days
        tmp['categorical_id'] = label
        tmp['date'] = date
        tmp['id'] = label
        tmp['hour'] = date.hour
        tmp['day'] = date.day
        tmp['day_of_week'] = date.dayofweek
        tmp['month'] = date.month

        df_list.append(tmp)

    output = pd.concat(df_list, axis=0, join='outer').reset_index(drop=True)

    output['categorical_id'] = output['id'].copy()
    output['hours_from_start'] = output['t']
    output['categorical_day_of_week'] = output['day_of_week'].copy()
    output['categorical_hour'] = output['hour'].copy()

    output.to_csv(os.path.join(path, 'standarized.csv'))

def standarize_volatility(path):
    df = pd.read_csv(os.path.join(path, 'oxfordmanrealizedvolatilityindices.csv'), index_col=0)  # no explicit index

    # Adds additional date/day fields
    idx = [str(s).split('+')[0] for s in df.index]  # ignore timezones, we don't need them
    dates = pd.to_datetime(idx)
    df['date'] = dates
    df['days_from_start'] = (dates - datetime.datetime(2000, 1, 3)).days
    df['day_of_week'] = dates.dayofweek
    df['day_of_month'] = dates.day
    df['week_of_year'] = dates.weekofyear
    df['month'] = dates.month
    df['year'] = dates.year
    df['categorical_id'] = df['Symbol'].copy()

    # Processes log volatility
    vol = df['rv5_ss'].copy()
    vol.loc[vol == 0.] = np.nan
    df['log_vol'] = np.log(vol)

    # Adds static information
    symbol_region_mapping = {
        '.AEX': 'EMEA',
        '.AORD': 'APAC',
        '.BFX': 'EMEA',
        '.BSESN': 'APAC',
        '.BVLG': 'EMEA',
        '.BVSP': 'AMER',
        '.DJI': 'AMER',
        '.FCHI': 'EMEA',
        '.FTMIB': 'EMEA',
        '.FTSE': 'EMEA',
        '.GDAXI': 'EMEA',
        '.GSPTSE': 'AMER',
        '.HSI': 'APAC',
        '.IBEX': 'EMEA',
        '.IXIC': 'AMER',
        '.KS11': 'APAC',
        '.KSE': 'APAC',
        '.MXX': 'AMER',
        '.N225': 'APAC',
        '.NSEI': 'APAC',
        '.OMXC20': 'EMEA',
        '.OMXHPI': 'EMEA',
        '.OMXSPI': 'EMEA',
        '.OSEAX': 'EMEA',
        '.RUT': 'EMEA',
        '.SMSI': 'EMEA',
        '.SPX': 'AMER',
        '.SSEC': 'APAC',
        '.SSMI': 'EMEA',
        '.STI': 'APAC',
        '.STOXX50E': 'EMEA'
    }

    df['Region'] = df['Symbol'].apply(lambda k: symbol_region_mapping[k])

    # Performs final processing
    output_df_list = []
    for grp in df.groupby('Symbol'):
        sliced = grp[1].copy()
        sliced.sort_values('days_from_start', inplace=True)
        # Impute log volatility values
        sliced['log_vol'] = sliced['log_vol'].fillna(method='ffill')
        sliced.dropna()  # NOTE: no-op as written, since the result is not assigned
        output_df_list.append(sliced)

    df = pd.concat(output_df_list, axis=0)

    df.to_csv(os.path.join(path, 'standarized.csv'))

def standarize_traffic(path):
    def process_list(s, variable_type=int, delimiter=None):
        """Parses a line in the PEMS format to a list."""
        if delimiter is None:
            l = [variable_type(i) for i in s.replace('[', '').replace(']', '').split()]
        else:
            l = [variable_type(i) for i in s.replace('[', '').replace(']', '').split(delimiter)]

        return l

    def read_single_list(filename):
        """Returns single list from a file in the PEMS-custom format."""
        with open(os.path.join(path, filename), 'r') as dat:
            l = process_list(dat.readlines()[0])
        return l

    def read_matrix(filename):
        """Returns a matrix from a file in the PEMS-custom format."""
        array_list = []
        with open(os.path.join(path, filename), 'r') as dat:

            lines = dat.readlines()
            for i, line in enumerate(lines):
                if (i + 1) % 50 == 0:
                    print('Completed {} of {} rows for {}'.format(i + 1, len(lines), filename))

                array = [
                    process_list(row_split, variable_type=float, delimiter=None)
                    for row_split in process_list(line, variable_type=str, delimiter=';')
                ]
                array_list.append(array)

        return array_list

    shuffle_order = np.array(read_single_list('randperm')) - 1  # index from 0
    train_dayofweek = read_single_list('PEMS_trainlabels')
    train_tensor = read_matrix('PEMS_train')
    test_dayofweek = read_single_list('PEMS_testlabels')
    test_tensor = read_matrix('PEMS_test')

    # Invert the shuffle permutation
    print('Shuffling')
    inverse_mapping = {
        new_location: previous_location
        for previous_location, new_location in enumerate(shuffle_order)
    }
    reverse_shuffle_order = np.array([
        inverse_mapping[new_location]
        for new_location, _ in enumerate(shuffle_order)
    ])

    # Group and reorder based on the permutation matrix
    print('Reordering')
    day_of_week = np.array(train_dayofweek + test_dayofweek)
    combined_tensor = np.array(train_tensor + test_tensor)

    day_of_week = day_of_week[reverse_shuffle_order]
    combined_tensor = combined_tensor[reverse_shuffle_order]

    # Put everything back into a dataframe
    print('Parsing as dataframe')
    labels = ['traj_{}'.format(i) for i in read_single_list('stations_list')]

    hourly_list = []
    for day, day_matrix in enumerate(combined_tensor):

        # Hourly data
        hourly = pd.DataFrame(day_matrix.T, columns=labels)
        hourly['hour_on_day'] = [int(i / 6) for i in hourly.index]  # sampled at 10 min intervals
        if hourly['hour_on_day'].max() > 23 or hourly['hour_on_day'].min() < 0:
            raise ValueError('Invalid hour! {}-{}'.format(
                hourly['hour_on_day'].min(), hourly['hour_on_day'].max()))

        hourly = hourly.groupby('hour_on_day', as_index=True).mean()[labels]
        hourly['sensor_day'] = day
        hourly['time_on_day'] = hourly.index
        hourly['day_of_week'] = day_of_week[day]

        hourly_list.append(hourly)

    hourly_frame = pd.concat(hourly_list, axis=0, ignore_index=True, sort=False)

    # Flatten such that each entity uses one row in dataframe
    store_columns = [c for c in hourly_frame.columns if 'traj' in c]
    other_columns = [c for c in hourly_frame.columns if 'traj' not in c]
    flat_df = pd.DataFrame(columns=['values', 'prev_values', 'next_values'] + other_columns + ['id'])

    sliced_list = []
    for store in store_columns:
        print('Processing {}'.format(store))

        sliced = hourly_frame[[store] + other_columns].copy()
        sliced.columns = ['values'] + other_columns
        sliced['id'] = int(store.replace('traj_', ''))

        # Sort by Sensor-date-time
        key = sliced['id'].apply(str) \
            + sliced['sensor_day'].apply(lambda x: '_{:03d}'.format(x)) \
            + sliced['time_on_day'].apply(lambda x: '_{:03d}'.format(x))
        sliced = sliced.set_index(key).sort_index()

        sliced['values'] = sliced['values'].fillna(method='ffill')
        sliced['prev_values'] = sliced['values'].shift(1)
        sliced['next_values'] = sliced['values'].shift(-1)

        sliced_list.append(sliced.dropna())

    # Concatenate once instead of calling DataFrame.append inside the loop
    flat_df = pd.concat([flat_df] + sliced_list, ignore_index=True, sort=False)

    # Filter to match range used by other academic papers
    index = flat_df['sensor_day']
    flat_df = flat_df[index < 173].copy()

    # Creating columns for categorical inputs
    flat_df['categorical_id'] = flat_df['id'].copy()
    flat_df['hours_from_start'] = flat_df['time_on_day'] + flat_df['sensor_day'] * 24.
    flat_df['categorical_day_of_week'] = flat_df['day_of_week'].copy()
    flat_df['categorical_time_on_day'] = flat_df['time_on_day'].copy()

    flat_df.to_csv(os.path.join(path, 'standarized.csv'))


# XXX needs rework
def standarize_favorita(data_folder):
    import gc
    # Extract only a subset of data to save/process for efficiency
    start_date = datetime.datetime(2015, 1, 1)
    end_date = datetime.datetime(2016, 6, 1)

    print('Regenerating data...')

    # load temporal data
    temporal = pd.read_csv(os.path.join(data_folder, 'train.csv'), index_col=0)

    store_info = pd.read_csv(os.path.join(data_folder, 'stores.csv'), index_col=0)
    oil = pd.read_csv(os.path.join(data_folder, 'oil.csv'), index_col=0).iloc[:, 0]
    holidays = pd.read_csv(os.path.join(data_folder, 'holidays_events.csv'))
    items = pd.read_csv(os.path.join(data_folder, 'items.csv'), index_col=0)
    transactions = pd.read_csv(os.path.join(data_folder, 'transactions.csv'))

    # Parse dates so they can be filtered to the [start_date, end_date) window
    temporal['date'] = pd.to_datetime(temporal['date'])

    # Filter dates to reduce storage space requirements
    if start_date is not None:
        temporal = temporal[(temporal['date'] >= start_date)]
    if end_date is not None:
        temporal = temporal[(temporal['date'] < end_date)]

    dates = temporal['date'].unique()

    # Add trajectory identifier
    temporal['traj_id'] = temporal['store_nbr'].apply(str) + '_' + temporal['item_nbr'].apply(str)
    temporal['unique_id'] = temporal['traj_id'] + '_' + temporal['date'].apply(str)

    # Remove all IDs with negative returns
    print('Removing returns data')
    min_returns = temporal['unit_sales'].groupby(temporal['traj_id']).min()
    valid_ids = set(min_returns[min_returns >= 0].index)
    selector = temporal['traj_id'].apply(lambda traj_id: traj_id in valid_ids)
    new_temporal = temporal[selector].copy()
    del temporal
    gc.collect()
    temporal = new_temporal
    temporal['open'] = 1

    # Resampling
    print('Resampling to regular grid')
    resampled_dfs = []
    for traj_id, raw_sub_df in temporal.groupby('traj_id'):
        print('Resampling', traj_id)
        sub_df = raw_sub_df.set_index('date', drop=True).copy()
        sub_df = sub_df.resample('1d').last()
        sub_df['date'] = sub_df.index
        sub_df[['store_nbr', 'item_nbr', 'onpromotion']] = sub_df[['store_nbr', 'item_nbr', 'onpromotion']].fillna(method='ffill')
        sub_df['open'] = sub_df['open'].fillna(0)  # flag where sales data is unknown
        sub_df['log_sales'] = np.log(sub_df['unit_sales'])

        resampled_dfs.append(sub_df.reset_index(drop=True))

    new_temporal = pd.concat(resampled_dfs, axis=0)
    del temporal
    gc.collect()
    temporal = new_temporal

    print('Adding oil')
    oil.name = 'oil'
    oil.index = pd.to_datetime(oil.index)
    # XXX The lines below match the oil price on a given date with the rest of the time series.
    # Missing values in the oil series are copied from the previous index. Then the oil series is joined with
    # temporal. Some dates present in temporal aren't present in oil; for those, the oil value
    # is substituted with -1. WHY?!
    # TODO: check how many NaNs there are after the first step. Previously the oil series was extended by the dates
    # present in the dates variable with NaN values, which were then forward filled.
    # This behavior is no longer supported by pandas, so we changed to the isin method.
    # This leaves us with more NaNs after the first step than previously. To achieve the previous behavior
    # we have to join the series before filling NaNs.
    temporal = temporal.join(
        # oil.loc[oil.index.isin(dates)].fillna(method='ffill'), on='date', how='left')
        oil.loc[oil.index.isin(dates)], on='date', how='left')
    temporal['oil'] = temporal['oil'].fillna(method='ffill')
    temporal['oil'] = temporal['oil'].fillna(-1)

    print('Adding store info')
    temporal = temporal.join(store_info, on='store_nbr', how='left')

    print('Adding item info')
    temporal = temporal.join(items, on='item_nbr', how='left')

    transactions['date'] = pd.to_datetime(transactions['date'])
    temporal = temporal.merge(
        transactions,
        left_on=['date', 'store_nbr'],
        right_on=['date', 'store_nbr'],
        how='left')
    temporal['transactions'] = temporal['transactions'].fillna(-1)

    # Additional date info
    temporal['day_of_week'] = pd.to_datetime(temporal['date'].values).dayofweek
    temporal['day_of_month'] = pd.to_datetime(temporal['date'].values).day
    temporal['month'] = pd.to_datetime(temporal['date'].values).month

    # Add holiday info
    print('Adding holidays')
    holiday_subset = holidays[holidays['transferred'].apply(lambda x: not x)].copy()
    holiday_subset.columns = [
        s if s != 'type' else 'holiday_type' for s in holiday_subset.columns
    ]
    holiday_subset['date'] = pd.to_datetime(holiday_subset['date'])
    local_holidays = holiday_subset[holiday_subset['locale'] == 'Local']
    regional_holidays = holiday_subset[holiday_subset['locale'] == 'Regional']
    national_holidays = holiday_subset[holiday_subset['locale'] == 'National']

    temporal['national_hol'] = temporal.merge(
        national_holidays, left_on=['date'], right_on=['date'],
        how='left')['description'].fillna('')
    temporal['regional_hol'] = temporal.merge(
        regional_holidays,
        left_on=['state', 'date'],
        right_on=['locale_name', 'date'],
        how='left')['description'].fillna('')
    temporal['local_hol'] = temporal.merge(
        local_holidays,
        left_on=['city', 'date'],
        right_on=['locale_name', 'date'],
        how='left')['description'].fillna('')

    temporal.sort_values('unique_id', inplace=True)

    # Transform date to integer index
    start_date = pd.to_datetime(min(temporal['date']))
    dates = temporal['date'].apply(pd.to_datetime)
    temporal['days_from_start'] = (dates - start_date).dt.days
    temporal['categorical_id'] = temporal['traj_id'].copy()

    print('Saving processed file to {}'.format(os.path.join(data_folder, 'standarized.csv')))
    temporal.to_csv(os.path.join(data_folder, 'standarized.csv'))

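`TFTBinaryDataset` maps a flat sample index to a (series, window) pair: it stores the cumulative number of windows per series and uses `bisect` (an alias of `bisect_right`) to find the series, then subtracts the previous cumulative count to get the window offset. The sketch below reproduces only that bookkeeping on plain series lengths. `build_index` and `locate` are hypothetical helper names used for illustration; they are not part of the module.

```python
from bisect import bisect

import numpy as np


def build_index(group_lengths, example_length, stride):
    """Mirror the windowing bookkeeping of TFTBinaryDataset on plain series lengths."""
    # Series shorter than one example are dropped, as in the dataset constructor.
    kept = [n for n in group_lengths if n >= example_length]
    windows_per_group = [(n - example_length + 1) // stride for n in kept]
    return kept, np.cumsum(windows_per_group)


def locate(cum, idx, stride):
    """Map a flat index to (group index, first row of the window)."""
    # bisect_right gives the first group whose cumulative count exceeds idx,
    # which also skips groups that contribute zero windows.
    g_idx = bisect(cum, idx)
    e_idx = idx - cum[g_idx - 1] if g_idx else idx
    return g_idx, int(e_idx) * stride


kept, cum = build_index([10, 3, 7], example_length=5, stride=2)
print(kept, cum.tolist())  # [10, 7] [3, 4]
for idx in range(int(cum[-1])):
    print(idx, locate(cum, idx, stride=2))
# 0 (0, 0)
# 1 (0, 2)
# 2 (0, 4)
# 3 (1, 0)
```

The per-series count `(n - example_length + 1) // stride` is conservative when `stride > 1`. For the length-7 series above, windows could start at rows 0 and 2, but `(7 - 5 + 1) // 2 == 1`, so only the first is used. The count never overshoots, so no window reads past the end of its series, but the last stride-aligned window can be skipped.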

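When `combine_ids` is set, `flatten_ids` folds several label-encoded id columns into one integer by treating them as digits of a mixed-radix number. With per-column class counts n0, n1, ..., the place values are [1, n0, n0·n1, ...], which is exactly what `np.roll(np.cumprod(lens), 1)` followed by `clens[0] = 1` produces, so distinct combinations get distinct ids. The source notes that its row-wise `df.apply` is very slow; the sketch below computes the same arithmetic in a vectorized way. `combine_ids` is a hypothetical helper, not part of the module, and it assumes the product of all class counts fits in int64.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def combine_ids(df, cols):
    """Vectorized mixed-radix combination of several id columns (hypothetical helper)."""
    encoders = {c: LabelEncoder().fit(df[c].values) for c in cols}
    lens = [len(enc.classes_) for enc in encoders.values()]
    # Place values [1, n0, n0*n1, ...]: the first column is the least significant digit.
    clens = np.roll(np.cumprod(lens), 1)
    clens[0] = 1
    # One column of integer codes per id column, then a single dot product per row.
    codes = np.stack([encoders[c].transform(df[c].values) for c in cols], axis=1)
    return codes @ clens, encoders


df = pd.DataFrame({'store': ['a', 'b', 'a', 'c'], 'item': [10, 10, 20, 20]})
ids, encoders = combine_ids(df, ['store', 'item'])
print(ids.tolist())  # [0, 1, 3, 5]
```

Here `store` has 3 classes and `item` has 2, so the place values are [1, 3] and the id is `store_code + 3 * item_code`. Each row needs one `transform` call per column instead of one per cell.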
@ -0,0 +1,73 @@
# Copyright 2021 NVIDIA CORPORATION
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2019 Ross Wightman
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Exponential Moving Average (EMA) of model updates
"""
from collections import OrderedDict
from copy import deepcopy

import torch
import torch.nn as nn


class ModelEma(nn.Module):
    """ Model Exponential Moving Average V2

    Keep a moving average of everything in the model state_dict (parameters and buffers).
    V2 of this module is simpler: it does not match params/buffers based on name but simply
    iterates in order. It works with torchscript (JIT of full model).
    """
    def __init__(self, model, decay=0.999, device=None):
        super().__init__()
        # make a copy of the model for accumulating moving average of weights
        self.module = deepcopy(model)
        self.module.eval()
        self.decay = decay
        self.device = device  # perform ema on different device from model if set
        if self.device is not None:
            self.module.to(device=device)

    def update(self, model):
        # ema = decay * ema + (1 - decay) * current, applied to every state_dict entry in order
        update_fn = lambda ema_v, model_v: self.decay * ema_v + (1. - self.decay) * model_v
        with torch.no_grad():
            for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
                if self.device is not None:
                    model_v = model_v.to(device=self.device)
                ema_v.copy_(update_fn(ema_v, model_v))

    def set(self, model):
        # Overwrite the averaged weights with the current model weights
        with torch.no_grad():
            for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
                if self.device is not None:
                    model_v = model_v.to(device=self.device)
                ema_v.copy_(model_v)

    def forward(self, x):
        return self.module(x)

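`ModelEma.update` applies ema = decay · ema + (1 - decay) · current to every `state_dict` entry. Because `state_dict()` returns tensors that share storage with the module's parameters and buffers, the in-place `copy_` updates the shadow model directly. The sketch below is a minimal standalone restatement of that rule, not the module itself (it omits the optional device transfer). It holds the live weight at 1.0 while the shadow copy starts at 0.0, so after k updates the shadow weight should be 1 - decay^k.

```python
import copy

import torch
import torch.nn as nn


def ema_update(ema_model, model, decay):
    """Same rule as ModelEma.update, without the optional device transfer."""
    with torch.no_grad():
        for ema_v, model_v in zip(ema_model.state_dict().values(), model.state_dict().values()):
            # state_dict() tensors share storage with the parameters, so copy_ writes through.
            ema_v.copy_(decay * ema_v + (1.0 - decay) * model_v)


model = nn.Linear(1, 1, bias=False)
with torch.no_grad():
    model.weight.fill_(0.0)
ema_model = copy.deepcopy(model)  # shadow copy starts at 0.0

with torch.no_grad():
    model.weight.fill_(1.0)  # pretend training moved the weight to 1.0

decay = 0.9
for _ in range(3):
    ema_update(ema_model, model, decay)

print(ema_model.weight.item())  # approximately 1 - 0.9**3 = 0.271
```

With the module's default `decay=0.999`, the shadow weights roughly average the last 1 / (1 - decay) = 1000 updates. Non-floating buffers, such as a BatchNorm layer's `num_batches_tracked`, go through the same formula, and `copy_` casts the floating-point result back to an integer.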
@ -0,0 +1,157 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import math
import os
import pathlib
import re
import pynvml
pynvml.nvmlInit()
def systemGetDriverVersion():
return pynvml.nvmlSystemGetDriverVersion()
def deviceGetCount():
return pynvml.nvmlDeviceGetCount()
class device:
# assume nvml returns list of 64 bit ints
_nvml_affinity_elements = math.ceil(os.cpu_count() / 64)
def __init__(self, device_idx):
super().__init__()
self.handle = pynvml.nvmlDeviceGetHandleByIndex(device_idx)
def getName(self):
return pynvml.nvmlDeviceGetName(self.handle)
def getCpuAffinity(self):
affinity_string = ''
for j in pynvml.nvmlDeviceGetCpuAffinity(
self.handle, device._nvml_affinity_elements
):
# assume nvml returns list of 64 bit ints
affinity_string = '{:064b}'.format(j) + affinity_string
affinity_list = [int(x) for x in affinity_string]
affinity_list.reverse() # so core 0 is in 0th element of list
ret = [i for i, e in enumerate(affinity_list) if e != 0]
return ret
def set_socket_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity)
def set_single_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity[:1])

def set_single_unique_affinity(gpu_id, nproc_per_node):
    devices = [device(i) for i in range(nproc_per_node)]
    socket_affinities = [dev.getCpuAffinity() for dev in devices]

    siblings_list = get_thread_siblings_list()
    siblings_dict = dict(siblings_list)

    # remove siblings
    for idx, socket_affinity in enumerate(socket_affinities):
        socket_affinities[idx] = list(set(socket_affinity) - set(siblings_dict.values()))

    affinities = []
    assigned = []

    for socket_affinity in socket_affinities:
        for core in socket_affinity:
            if core not in assigned:
                affinities.append([core])
                assigned.append(core)
                break
    os.sched_setaffinity(0, affinities[gpu_id])

def set_socket_unique_affinity(gpu_id, nproc_per_node, mode):
    devices = [device(i) for i in range(nproc_per_node)]
    socket_affinities = [dev.getCpuAffinity() for dev in devices]

    siblings_list = get_thread_siblings_list()
    siblings_dict = dict(siblings_list)

    # remove siblings; sort so that 'continuous' slices really are contiguous core ranges
    for idx, socket_affinity in enumerate(socket_affinities):
        socket_affinities[idx] = sorted(set(socket_affinity) - set(siblings_dict.values()))

    # group GPUs that share the same set of cores (i.e. the same CPU socket)
    socket_affinities_to_device_ids = collections.defaultdict(list)

    for idx, socket_affinity in enumerate(socket_affinities):
        socket_affinities_to_device_ids[tuple(socket_affinity)].append(idx)

    for socket_affinity, device_ids in socket_affinities_to_device_ids.items():
        devices_per_group = len(device_ids)
        cores_per_device = len(socket_affinity) // devices_per_group
        for group_id, device_id in enumerate(device_ids):
            if device_id == gpu_id:
                if mode == 'interleaved':
                    affinity = list(socket_affinity[group_id::devices_per_group])
                elif mode == 'continuous':
                    affinity = list(socket_affinity[group_id*cores_per_device:(group_id+1)*cores_per_device])
                else:
                    raise RuntimeError('Unknown set_socket_unique_affinity mode')

                # reintroduce siblings
                affinity += [siblings_dict[aff] for aff in affinity if aff in siblings_dict]
                os.sched_setaffinity(0, affinity)

def get_thread_siblings_list():
    path = '/sys/devices/system/cpu/cpu*/topology/thread_siblings_list'
    thread_siblings_list = []
    pattern = re.compile(r'(\d+)\D(\d+)')
    for fname in pathlib.Path(path[0]).glob(path[1:]):
        with open(fname) as f:
            content = f.read().strip()
            res = pattern.findall(content)
            if res:
                pair = tuple(map(int, res[0]))
                thread_siblings_list.append(pair)
    return thread_siblings_list


def set_affinity(gpu_id, nproc_per_node, mode='socket'):
    if mode == 'socket':
        set_socket_affinity(gpu_id)
    elif mode == 'single':
        set_single_affinity(gpu_id)
    elif mode == 'single_unique':
        set_single_unique_affinity(gpu_id, nproc_per_node)
    elif mode == 'socket_unique_interleaved':
        set_socket_unique_affinity(gpu_id, nproc_per_node, 'interleaved')
    elif mode == 'socket_unique_continuous':
        set_socket_unique_affinity(gpu_id, nproc_per_node, 'continuous')
    else:
        raise RuntimeError('Unknown affinity mode')

    affinity = os.sched_getaffinity(0)
    return affinity

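The bitmask handling in `device.getCpuAffinity` and the per-GPU core split in `set_socket_unique_affinity` can be checked without NVML. The following is a hypothetical pure-Python sketch: `decode_affinity_mask` and `split_cores` are illustrative names, and the mask values are made up rather than real `nvmlDeviceGetCpuAffinity` output. Like the code above, it assumes each element is a 64-bit word and that bit `b` of word `w` stands for core `64 * w + b`.

```python
from typing import List


def decode_affinity_mask(words: List[int]) -> List[int]:
    """Turn a list of 64-bit affinity words into an ascending list of core ids.

    Equivalent to the string-based decoding in device.getCpuAffinity:
    bit b of word w corresponds to core 64 * w + b.
    """
    cores = []
    for word_idx, word in enumerate(words):
        for bit in range(64):
            if (word >> bit) & 1:
                cores.append(64 * word_idx + bit)
    return cores


def split_cores(cores: List[int], n_devices: int, group_id: int, mode: str) -> List[int]:
    """Return the share of `cores` given to the `group_id`-th GPU on a socket."""
    if mode == 'interleaved':
        # every n_devices-th core, starting at group_id
        return cores[group_id::n_devices]
    if mode == 'continuous':
        # a contiguous block of len(cores) // n_devices cores
        per_device = len(cores) // n_devices
        return cores[group_id * per_device:(group_id + 1) * per_device]
    raise ValueError(f'unknown mode: {mode}')


# Hypothetical mask: cores 0-7 set in word 0, cores 64 and 65 set in word 1.
cores = decode_affinity_mask([0xFF, 0b11])
print(cores)                                          # [0, 1, 2, 3, 4, 5, 6, 7, 64, 65]
print(split_cores(cores[:8], 2, 1, 'interleaved'))    # [1, 3, 5, 7]
print(split_cores(cores[:8], 2, 1, 'continuous'))     # [4, 5, 6, 7]
```

Interleaving spreads each GPU's process across the whole core range, while the continuous mode keeps each process on a compact block; which is faster depends on the cache and memory topology of the machine.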

@ -0,0 +1,239 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pandas as pd
import numpy as np
import pickle
import argparse
import torch
from torch.utils.data import DataLoader
from torch.cuda import amp
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from modeling import TemporalFusionTransformer
from configuration import ElectricityConfig
from data_utils import TFTDataset
from utils import PerformanceMeter
from criterions import QuantileLoss
import dllogger
from log_helper import setup_logger

def _unscale_per_id(config, values, ids, scalers):
    values = values.cpu().numpy()
    num_horizons = config.example_length - config.encoder_length + 1
    flat_values = pd.DataFrame(
            values,
            columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
            )
    flat_values['id'] = ids
    df_list = []
    for idx, group in flat_values.groupby('id'):
        scaler = scalers[idx]
        group_copy = group.copy()
        for col in group_copy.columns:
            if 'id' not in col:
                _col = np.expand_dims(group_copy[col].values, -1)
                _t_col = scaler.inverse_transform(_col)[:,-1]
                group_copy[col] = _t_col
        df_list.append(group_copy)
    flat_values = pd.concat(df_list, axis=0)

    flat_values = flat_values[[col for col in flat_values if 'id' not in col]]
    flat_tensor = torch.from_numpy(flat_values.values)
    return flat_tensor


def _unscale(config, values, scaler):
    values = values.cpu().numpy()
    num_horizons = config.example_length - config.encoder_length + 1
    flat_values = pd.DataFrame(
            values,
            columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
            )
    for col in flat_values.columns:
        if 'id' not in col:
            _col = np.expand_dims(flat_values[col].values, -1)
            _t_col = scaler.inverse_transform(_col)[:,-1]
            flat_values[col] = _t_col

    flat_values = flat_values[[col for col in flat_values if 'id' not in col]]
    flat_tensor = torch.from_numpy(flat_values.values)
    return flat_tensor

def predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=False):
    model.eval()
    predictions = []
    targets = []
    ids = []
    perf_meter = PerformanceMeter()
    n_workers = args.distributed_world_size if hasattr(args, 'distributed_world_size') else 1

    for step, batch in enumerate(data_loader):
        perf_meter.reset_current_lap()
        with torch.no_grad():
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            ids.append(batch['id'][:,0,:])
            targets.append(batch['target'])
            predictions.append(model(batch).float())

        perf_meter.update(args.batch_size * n_workers,
            exclude_from_total=step in [0, len(data_loader)-1])

    targets = torch.cat(targets, dim=0)
    if not extend_targets:
        targets = targets[:,config.encoder_length:,:]
    predictions = torch.cat(predictions, dim=0)

    if config.scale_per_id:
        ids = torch.cat(ids, dim=0).cpu().numpy()

        unscaled_predictions = torch.stack(
                [_unscale_per_id(config, predictions[:,:,i], ids, scalers) for i in range(len(config.quantiles))],
                dim=-1)
        unscaled_targets = _unscale_per_id(config, targets[:,:,0], ids, scalers).unsqueeze(-1)
    else:
        ids = None
        unscaled_predictions = torch.stack(
                [_unscale(config, predictions[:,:,i], scalers['']) for i in range(len(config.quantiles))],
                dim=-1)
        unscaled_targets = _unscale(config, targets[:,:,0], scalers['']).unsqueeze(-1)

    return unscaled_predictions, unscaled_targets, ids, perf_meter

def visualize_v2(args, config, model, data_loader, scalers, cat_encodings):
    unscaled_predictions, unscaled_targets, ids, _ = predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=True)

    num_horizons = config.example_length - config.encoder_length + 1
    pad = unscaled_predictions.new_full((unscaled_targets.shape[0], unscaled_targets.shape[1] - unscaled_predictions.shape[1], unscaled_predictions.shape[2]), fill_value=float('nan'))
    pad[:,-1,:] = unscaled_targets[:,-num_horizons,:]
    unscaled_predictions = torch.cat((pad, unscaled_predictions), dim=1)

    ids = torch.from_numpy(ids.squeeze())
    joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
    graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
    for key, g in graphs.items():
        for i, ex in enumerate(g):
            df = pd.DataFrame(ex.numpy(),
                    index=range(num_horizons - ex.shape[0], num_horizons),
                    columns=['target'] + [f'P{int(q*100)}' for q in config.quantiles])
            fig = df.plot().get_figure()
            ax = fig.get_axes()[0]
            _values = df.values[config.encoder_length-1:,:]
            ax.fill_between(range(num_horizons), _values[:,1], _values[:,-1], alpha=0.2, color='green')
            os.makedirs(os.path.join(args.results, 'single_example_vis', str(key)), exist_ok=True)
            fig.savefig(os.path.join(args.results, 'single_example_vis', str(key), f'{i}.pdf'))

def inference(args, config, model, data_loader, scalers, cat_encodings):
    unscaled_predictions, unscaled_targets, ids, perf_meter = predict(args, config, model, data_loader, scalers, cat_encodings)

    if args.joint_visualization or args.save_predictions:
        ids = torch.from_numpy(ids.squeeze())
        #ids = torch.cat([x['id'][0] for x in data_loader.dataset])
        joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
        graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
        for key, g in graphs.items(): #timeseries id, joint targets and predictions
            _g = {'targets': g[:,:,0]}
            _g.update({f'P{int(q*100)}':g[:,:,i+1] for i, q in enumerate(config.quantiles)})

            if args.joint_visualization:
                summary_writer = SummaryWriter(log_dir=os.path.join(args.results, 'predictions_vis', str(key)))
                for q, t in _g.items(): # target and quantiles, timehorizon values
                    if q == 'targets':
                        targets = torch.cat([t[:,0], t[-1,1:]]) # WIP
                        # We want to plot targets on the same graph as predictions. Probably could be written better.
                        for i, val in enumerate(targets):
                            summary_writer.add_scalars(str(key), {f'{q}':val}, i)
                        continue

                    # Tensor t contains different time horizons which are shifted in phase
                    # Next lines realign them
                    y = t.new_full((t.shape[0] + t.shape[1] -1, t.shape[1]), float('nan'))
                    for i in range(y.shape[1]):
                        y[i:i+t.shape[0], i] = t[:,i]

                    for i, vals in enumerate(y): # timestep, timehorizon values value
                        # v == v is False only for NaN, so the padding added above is skipped
                        summary_writer.add_scalars(str(key), {f'{q}_t+{j+1}':v for j,v in enumerate(vals) if v == v}, i)
                summary_writer.close()

            if args.save_predictions:
                for q, t in _g.items():
                    df = pd.DataFrame(t.tolist())
                    df.columns = [f't+{i+1}' for i in range(len(df.columns))]
                    os.makedirs(os.path.join(args.results, 'predictions', str(key)), exist_ok=True)
                    df.to_csv(os.path.join(args.results, 'predictions', str(key), q+'.csv'))

    losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
    normalizer = unscaled_targets.abs().mean()
    q_risk = 2 * losses / normalizer

    perf_dict = {
                'throughput': perf_meter.avg,
                'latency_avg': perf_meter.total_time/len(perf_meter.intervals),
                'latency_p90': perf_meter.p(90),
                'latency_p95': perf_meter.p(95),
                'latency_p99': perf_meter.p(99),
                'total_inference_time': perf_meter.total_time,
                }

    return q_risk, perf_dict

def main(args):

    setup_logger(args)
    # Set up model
    state_dict = torch.load(args.checkpoint)
    config = state_dict['config']
    model = TemporalFusionTransformer(config).cuda()
    model.load_state_dict(state_dict['model'])
    model.eval()

    # Set up dataset
    test_split = TFTDataset(args.data, config)
    data_loader = DataLoader(test_split, batch_size=args.batch_size, num_workers=4)

    with open(args.tgt_scalers, 'rb') as f:
        scalers = pickle.load(f)
    with open(args.cat_encodings, 'rb') as f:
        cat_encodings = pickle.load(f)

    if args.visualize:
        # TODO: abstract away all forms of visualization.
        visualize_v2(args, config, model, data_loader, scalers, cat_encodings)

    quantiles, perf_dict = inference(args, config, model, data_loader, scalers, cat_encodings)
    quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
    finish_log = {**quantiles, **perf_dict}
    dllogger.log(step=(), data=finish_log, verbosity=1)
    print('Test q-risk: P10 {test_p10} | P50 {test_p50} | P90 {test_p90}'.format(**quantiles))
    print('Latency:\n\tAverage {:.3f}s\n\tp90 {:.3f}s\n\tp95 {:.3f}s\n\tp99 {:.3f}s'.format(
        perf_dict['latency_avg'], perf_dict['latency_p90'], perf_dict['latency_p95'], perf_dict['latency_p99']))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--checkpoint', type=str,
                        help='Path to the checkpoint')
    parser.add_argument('--data', type=str,
                        help='Path to the test split of the dataset')
    parser.add_argument('--tgt_scalers', type=str,
                        help='Path to the tgt_scalers.bin file produced by the preprocessing')
    parser.add_argument('--cat_encodings', type=str,
                        help='Path to the cat_encodings.bin file produced by the preprocessing')
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--visualize', action='store_true', help='Visualize predictions - each example on the separate plot')
    parser.add_argument('--joint_visualization', action='store_true', help='Visualize predictions - each timeseries on separate plot. Projections will be concatenated.')
    parser.add_argument('--save_predictions', action='store_true')
    parser.add_argument('--results', type=str, default='/results')
    parser.add_argument('--log_file', type=str, default='dllogger.json')
    ARGS = parser.parse_args()
    main(ARGS)

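`inference` reports the normalized quantile loss (q-risk) as `2 * losses / normalizer`, where `normalizer` is the mean absolute target. `QuantileLoss` lives in `criterions.py`, which is not part of this section, so the sketch below assumes it returns the mean pinball loss for each quantile. Under that assumption the metric can be reproduced in pure Python; `pinball_loss` and `q_risk` are hypothetical helper names, and the series values are illustrative.

```python
from typing import List, Sequence


def pinball_loss(y: Sequence[float], y_hat: Sequence[float], q: float) -> float:
    """Mean pinball (quantile) loss for a single quantile q."""
    total = 0.0
    for target, pred in zip(y, y_hat):
        err = target - pred
        # under-prediction is weighted by q, over-prediction by (1 - q)
        total += max(q * err, (q - 1) * err)
    return total / len(y)


def q_risk(y: Sequence[float], preds: List[Sequence[float]],
           quantiles: Sequence[float]) -> List[float]:
    """2 * mean pinball loss / mean |y|, one value per quantile."""
    normalizer = sum(abs(v) for v in y) / len(y)
    return [2 * pinball_loss(y, p, q) / normalizer for p, q in zip(preds, quantiles)]


y = [10.0, 20.0, 30.0]
preds = [[8.0, 18.0, 28.0],    # P10 forecast, 2 below the target everywhere
         [10.0, 20.0, 30.0],   # P50 forecast, exact
         [12.0, 22.0, 32.0]]   # P90 forecast, 2 above the target everywhere
print(q_risk(y, preds, [0.1, 0.5, 0.9]))
```

Because both the loss and the normalizer are means over the same number of elements, the ratio equals the `2 * sum(QL) / sum(|y|)` form used in the TFT paper. In the example the P10 and P90 forecasts each score about 0.02: they err on the side their quantile tolerates, so each error is weighted by 0.1.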

@ -0,0 +1,141 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import subprocess
import sys
import itertools
import atexit
import dllogger
from dllogger import Backend, JSONStreamBackend, StdOutBackend
import torch.distributed as dist
from torch.utils.tensorboard import SummaryWriter

class TensorBoardBackend(Backend):
    def __init__(self, verbosity, log_dir):
        super().__init__(verbosity=verbosity)
        self.summary_writer = SummaryWriter(log_dir=os.path.join(log_dir, 'TB_summary'),
                                            flush_secs=120,
                                            max_queue=200
                                            )
        self.hp_cache = None
        atexit.register(self.summary_writer.close)

    @property
    def log_level(self):
        return self._log_level

    def metadata(self, timestamp, elapsedtime, metric, metadata):
        pass

    def log(self, timestamp, elapsedtime, step, data):
        if step == 'HPARAMS':
            parameters = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
            #Unpack list and tuples
            for d in [{k+f'_{i}':v for i,v in enumerate(l)} for k,l in data.items() if isinstance(l, (list, tuple))]:
                parameters.update(d)
            #Remove custom classes, keeping the unpacked list and tuple entries
            parameters = {k: v for k, v in parameters.items() if isinstance(v, (int, float, str, bool))}
            parameters.update({k:'None' for k, v in data.items() if v is None})
            self.hp_cache = parameters
        if step == ():
            if self.hp_cache is None:
                print('Warning: Cannot save HParameters. Please log HParameters with step=\'HPARAMS\'', file=sys.stderr)
                return
            self.summary_writer.add_hparams(self.hp_cache, data)
        if not isinstance(step, int):
            return
        for k, v in data.items():
            self.summary_writer.add_scalar(k, v, step)

    def flush(self):
        pass

def setup_logger(args):
    os.makedirs(args.results, exist_ok=True)
    log_path = os.path.join(args.results, args.log_file)

    if os.path.exists(log_path):
        for i in itertools.count():
            s_fname = args.log_file.split('.')
            fname = '.'.join(s_fname[:-1]) + f'_{i}.' + s_fname[-1] if len(s_fname) > 1 else args.log_file + f'.{i}'
            log_path = os.path.join(args.results, fname)
            if not os.path.exists(log_path):
                break

    def metric_format(metric, metadata, value):
        return "{}: {}".format(metric, f'{value:.5f}' if isinstance(value, float) else value)

    def step_format(step):
        if step == ():
            return "Finished |"
        elif isinstance(step, int):
            return "Step {0: <5} |".format(step)
        return "Step {} |".format(step)

    if not dist.is_initialized() or not args.distributed_world_size > 1 or args.distributed_rank == 0:
        dllogger.init(backends=[JSONStreamBackend(verbosity=1, filename=log_path),
                                TensorBoardBackend(verbosity=1, log_dir=args.results),
                                StdOutBackend(verbosity=2,
                                              step_format=step_format,
                                              prefix_format=lambda x: "")#,
                                              #metric_format=metric_format)
                                ])
    else:
        dllogger.init(backends=[])
    dllogger.log(step='PARAMETER', data=vars(args), verbosity=0)

    container_setup_info = {**get_framework_env_vars(), **get_system_info()}
    dllogger.log(step='ENVIRONMENT', data=container_setup_info, verbosity=0)

    dllogger.metadata('loss', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
    dllogger.metadata('items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'TRAIN', 'format': ':1f'})
    dllogger.metadata('val_loss', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format':':5f'})
    dllogger.metadata('val_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
    dllogger.metadata('val_items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'VAL', 'format': ':1f'})
    dllogger.metadata('test_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('test_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('test_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('throughput', {'GOAL': 'MAXIMIZE', 'STAGE': 'TEST', 'format': ':1f'})
    dllogger.metadata('latency_p90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('latency_p95', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
    dllogger.metadata('latency_p99', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})

def get_framework_env_vars():
    return {
        'NVIDIA_PYTORCH_VERSION': os.environ.get('NVIDIA_PYTORCH_VERSION'),
        'PYTORCH_VERSION': os.environ.get('PYTORCH_VERSION'),
        'CUBLAS_VERSION': os.environ.get('CUBLAS_VERSION'),
        'NCCL_VERSION': os.environ.get('NCCL_VERSION'),
        'CUDA_DRIVER_VERSION': os.environ.get('CUDA_DRIVER_VERSION'),
        'CUDNN_VERSION': os.environ.get('CUDNN_VERSION'),
        'CUDA_VERSION': os.environ.get('CUDA_VERSION'),
        'NVIDIA_PIPELINE_ID': os.environ.get('NVIDIA_PIPELINE_ID'),
        'NVIDIA_BUILD_ID': os.environ.get('NVIDIA_BUILD_ID'),
        'NVIDIA_TF32_OVERRIDE': os.environ.get('NVIDIA_TF32_OVERRIDE'),
    }


def get_system_info():
    system_info = subprocess.run('nvidia-smi --query-gpu=gpu_name,memory.total,enforced.power.limit --format=csv'.split(), capture_output=True).stdout
    system_info = [i.decode('utf-8') for i in system_info.split(b'\n')]
    system_info = [x for x in system_info if x]
    return {'system_info': system_info}

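The `HPARAMS` branch of `TensorBoardBackend.log` flattens list and tuple values into indexed keys and keeps only value types that `SummaryWriter.add_hparams` accepts. The sketch below restates that filtering as a standalone function so the behavior can be checked without TensorBoard or dllogger; `flatten_hparams` is a hypothetical name and the sample arguments are illustrative.

```python
from typing import Any, Dict


def flatten_hparams(data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten list/tuple values to key_0, key_1, ... and drop unsupported types."""
    params: Dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, (list, tuple)):
            # e.g. quantiles=[0.1, 0.5] -> quantiles_0=0.1, quantiles_1=0.5
            for i, item in enumerate(value):
                params[f'{key}_{i}'] = item
        else:
            params[key] = value
    # keep only scalar types; custom objects (and None) are dropped here
    params = {k: v for k, v in params.items() if isinstance(v, (int, float, str, bool))}
    # top-level None values are logged as the string 'None' instead
    params.update({k: 'None' for k, v in data.items() if v is None})
    return params


args = {'lr': 5e-4, 'quantiles': [0.1, 0.5, 0.9], 'seed': None, 'device': object()}
print(flatten_hparams(args))
# {'lr': 0.0005, 'quantiles_0': 0.1, 'quantiles_1': 0.5, 'quantiles_2': 0.9, 'seed': 'None'}
```

Doing the type filter on the already-flattened dictionary is what keeps entries such as `quantiles_0`; filtering the raw `data` instead would silently discard every list-valued argument.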

@ -0,0 +1,367 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import torch
import torch.nn as nn
import torch.nn.functional as F

from torch import Tensor
from typing import Dict, Tuple, Optional, List

if os.environ.get("TFT_SCRIPTING", False):
    from torch.nn import LayerNorm
else:
    from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm

class MaybeLayerNorm(nn.Module):
    def __init__(self, output_size, hidden_size, eps):
        super().__init__()
        if output_size and output_size == 1:
            self.ln = nn.Identity()
        else:
            self.ln = LayerNorm(output_size if output_size else hidden_size, eps=eps)

    def forward(self, x):
        return self.ln(x)


class GLU(nn.Module):
    def __init__(self, hidden_size, output_size):
        super().__init__()
        self.lin = nn.Linear(hidden_size, output_size * 2)

    def forward(self, x: Tensor) -> Tensor:
        x = self.lin(x)
        x = F.glu(x)
        return x


class GRN(nn.Module):
    def __init__(self,
                 input_size,
                 hidden_size,
                 output_size=None,
                 context_hidden_size=None,
                 dropout=0):
        super().__init__()

        self.layer_norm = MaybeLayerNorm(output_size, hidden_size, eps=1e-3)
        self.lin_a = nn.Linear(input_size, hidden_size)
        if context_hidden_size is not None:
            self.lin_c = nn.Linear(context_hidden_size, hidden_size, bias=False)
        self.lin_i = nn.Linear(hidden_size, hidden_size)
        self.glu = GLU(hidden_size, output_size if output_size else hidden_size)
        self.dropout = nn.Dropout(dropout)
        self.out_proj = nn.Linear(input_size, output_size) if output_size else None

    def forward(self, a: Tensor, c: Optional[Tensor] = None):
        x = self.lin_a(a)
        if c is not None:
            x = x + self.lin_c(c).unsqueeze(1)
        x = F.elu(x)
        x = self.lin_i(x)
        x = self.dropout(x)
        x = self.glu(x)
        y = a if not self.out_proj else self.out_proj(a)
        x = x + y
        x = self.layer_norm(x)
        return x

class TFTEmbedding(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.s_cat_inp_lens = config.static_categorical_inp_lens
        self.t_cat_k_inp_lens = config.temporal_known_categorical_inp_lens
        self.t_cat_o_inp_lens = config.temporal_observed_categorical_inp_lens
        self.s_cont_inp_size = config.static_continuous_inp_size
        self.t_cont_k_inp_size = config.temporal_known_continuous_inp_size
        self.t_cont_o_inp_size = config.temporal_observed_continuous_inp_size
        self.t_tgt_size = config.temporal_target_size

        self.hidden_size = config.hidden_size

        # There are 7 types of input:
        # 1. Static categorical
        # 2. Static continuous
        # 3. Temporal known a priori categorical
        # 4. Temporal known a priori continuous
        # 5. Temporal observed categorical
        # 6. Temporal observed continuous
        # 7. Temporal observed targets (time series observed so far)

        self.s_cat_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.s_cat_inp_lens]) if self.s_cat_inp_lens else None
        self.t_cat_k_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.t_cat_k_inp_lens]) if self.t_cat_k_inp_lens else None
        self.t_cat_o_embed = nn.ModuleList([
            nn.Embedding(n, self.hidden_size) for n in self.t_cat_o_inp_lens]) if self.t_cat_o_inp_lens else None

        self.s_cont_embedding_vectors = nn.Parameter(torch.Tensor(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
        self.t_cont_k_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
        self.t_cont_o_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
        self.t_tgt_embedding_vectors = nn.Parameter(torch.Tensor(self.t_tgt_size, self.hidden_size))

        self.s_cont_embedding_bias = nn.Parameter(torch.zeros(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
        self.t_cont_k_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
        self.t_cont_o_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
        self.t_tgt_embedding_bias = nn.Parameter(torch.zeros(self.t_tgt_size, self.hidden_size))

        if self.s_cont_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.s_cont_embedding_vectors)
        if self.t_cont_k_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.t_cont_k_embedding_vectors)
        if self.t_cont_o_embedding_vectors is not None:
            torch.nn.init.xavier_normal_(self.t_cont_o_embedding_vectors)
        torch.nn.init.xavier_normal_(self.t_tgt_embedding_vectors)

    def _apply_embedding(self,
            cat: Optional[Tensor],
            cont: Optional[Tensor],
            cat_emb: Optional[nn.ModuleList],
            cont_emb: Tensor,
            cont_bias: Tensor,
            ) -> Optional[Tensor]:
        e_cat = torch.stack([embed(cat[...,i]) for i, embed in enumerate(cat_emb)], dim=-2) if cat is not None else None
        if cont is not None:
            #the line below is equivalent to the following einsums
            #e_cont = torch.einsum('btf,fh->btfh', cont, cont_emb)
            #e_cont = torch.einsum('bf,fh->bfh', cont, cont_emb)
            e_cont = torch.mul(cont.unsqueeze(-1), cont_emb)
            e_cont = e_cont + cont_bias
        else:
            e_cont = None

        if e_cat is not None and e_cont is not None:
            return torch.cat([e_cat, e_cont], dim=-2)
        elif e_cat is not None:
            return e_cat
        elif e_cont is not None:
            return e_cont
        else:
            return None

    def forward(self, x: Dict[str, Tensor]):
        # temporal/static categorical/continuous known/observed input
        s_cat_inp = x.get('s_cat', None)
        s_cont_inp = x.get('s_cont', None)
        t_cat_k_inp = x.get('k_cat', None)
        t_cont_k_inp = x.get('k_cont', None)
        t_cat_o_inp = x.get('o_cat', None)
        t_cont_o_inp = x.get('o_cont', None)
        t_tgt_obs = x['target'] # Has to be present

        # Static inputs are expected to be equal for all timesteps
        # For memory efficiency there is no assert statement
        s_cat_inp = s_cat_inp[:,0,:] if s_cat_inp is not None else None
        s_cont_inp = s_cont_inp[:,0,:] if s_cont_inp is not None else None

        s_inp = self._apply_embedding(s_cat_inp,
                                      s_cont_inp,
                                      self.s_cat_embed,
                                      self.s_cont_embedding_vectors,
                                      self.s_cont_embedding_bias)
        t_known_inp = self._apply_embedding(t_cat_k_inp,
                                            t_cont_k_inp,
                                            self.t_cat_k_embed,
                                            self.t_cont_k_embedding_vectors,
                                            self.t_cont_k_embedding_bias)
        t_observed_inp = self._apply_embedding(t_cat_o_inp,
                                               t_cont_o_inp,
                                               self.t_cat_o_embed,
                                               self.t_cont_o_embedding_vectors,
                                               self.t_cont_o_embedding_bias)

        # Temporal observed targets
        # t_observed_tgt = torch.einsum('btf,fh->btfh', t_tgt_obs, self.t_tgt_embedding_vectors)
        t_observed_tgt = torch.matmul(t_tgt_obs.unsqueeze(3).unsqueeze(4), self.t_tgt_embedding_vectors.unsqueeze(1)).squeeze(3)
        t_observed_tgt = t_observed_tgt + self.t_tgt_embedding_bias

        return s_inp, t_known_inp, t_observed_inp, t_observed_tgt

class VariableSelectionNetwork(nn.Module):
    def __init__(self, config, num_inputs):
        super().__init__()
        self.joint_grn = GRN(config.hidden_size*num_inputs, config.hidden_size, output_size=num_inputs, context_hidden_size=config.hidden_size)
        self.var_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(num_inputs)])

    def forward(self, x: Tensor, context: Optional[Tensor] = None):
        Xi = x.reshape(*x.shape[:-2], -1)
        grn_outputs = self.joint_grn(Xi, c=context)
        sparse_weights = F.softmax(grn_outputs, dim=-1)
        transformed_embed_list = [m(x[...,i,:]) for i, m in enumerate(self.var_grns)]
        transformed_embed = torch.stack(transformed_embed_list, dim=-1)
        #the line below performs batched matrix vector multiplication
        #for temporal features it's bthf,btf->bth
        #for static features it's bhf,bf->bh
        variable_ctx = torch.matmul(transformed_embed, sparse_weights.unsqueeze(-1)).squeeze(-1)

        return variable_ctx, sparse_weights


class StaticCovariateEncoder(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.vsn = VariableSelectionNetwork(config, config.num_static_vars)
        self.context_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(4)])

    def forward(self, x: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
        variable_ctx, sparse_weights = self.vsn(x)

        # Context vectors, in the order they are returned:
        # cs - variable selection context
        # ce - enrichment context
        # ch - LSTM initial hidden state (state_h) context
        # cc - LSTM initial cell state (state_c) context
        cs, ce, ch, cc = tuple(m(variable_ctx) for m in self.context_grns)

        return cs, ce, ch, cc

class InterpretableMultiHeadAttention(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.n_head = config.n_head
        assert config.hidden_size % config.n_head == 0
        self.d_head = config.hidden_size // config.n_head
        self.qkv_linears = nn.Linear(config.hidden_size, (2 * self.n_head + 1) * self.d_head, bias=False)
        self.out_proj = nn.Linear(self.d_head, config.hidden_size, bias=False)
        self.attn_dropout = nn.Dropout(config.attn_dropout)
        self.out_dropout = nn.Dropout(config.dropout)
        self.scale = self.d_head**-0.5
        self.register_buffer("_mask", torch.triu(torch.full((config.example_length, config.example_length), float('-inf')), 1).unsqueeze(0))

    def forward(self, x: Tensor, mask_future_timesteps: bool = True) -> Tuple[Tensor, Tensor]:
        bs, t, h_size = x.shape
        qkv = self.qkv_linears(x)
        q, k, v = qkv.split((self.n_head * self.d_head, self.n_head * self.d_head, self.d_head), dim=-1)
        q = q.view(bs, t, self.n_head, self.d_head)
        k = k.view(bs, t, self.n_head, self.d_head)
        v = v.view(bs, t, self.d_head)

        # attn_score = torch.einsum('bind,bjnd->bnij', q, k)
        attn_score = torch.matmul(q.permute((0, 2, 1, 3)), k.permute((0, 2, 3, 1)))
        attn_score.mul_(self.scale)

        if mask_future_timesteps:
            attn_score = attn_score + self._mask

        attn_prob = F.softmax(attn_score, dim=3)
        attn_prob = self.attn_dropout(attn_prob)

        # attn_vec = torch.einsum('bnij,bjd->bnid', attn_prob, v)
        attn_vec = torch.matmul(attn_prob, v.unsqueeze(1))
        m_attn_vec = torch.mean(attn_vec, dim=1)
        out = self.out_proj(m_attn_vec)
        out = self.out_dropout(out)

        return out, attn_vec


class TemporalFusionTransformer(nn.Module):
    """
    Implementation of https://arxiv.org/abs/1912.09363
    """
    def __init__(self, config):
        super().__init__()

        if hasattr(config, 'model'):
            config = config.model

        self.encoder_length = config.encoder_length #number of past timesteps the model conditions on

        self.embedding = TFTEmbedding(config)
        self.static_encoder = StaticCovariateEncoder(config)

        self.history_vsn = VariableSelectionNetwork(config, config.num_historic_vars)
        self.history_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
        self.future_vsn = VariableSelectionNetwork(config, config.num_future_vars)
        self.future_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)

        self.input_gate = GLU(config.hidden_size, config.hidden_size)
        self.input_gate_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.enrichment_grn = GRN(config.hidden_size,
                                  config.hidden_size,
                                  context_hidden_size=config.hidden_size,
                                  dropout=config.dropout)
        self.attention = InterpretableMultiHeadAttention(config)
        self.attention_gate = GLU(config.hidden_size, config.hidden_size)
        self.attention_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.positionwise_grn = GRN(config.hidden_size,
                                    config.hidden_size,
                                    dropout=config.dropout)

        self.decoder_gate = GLU(config.hidden_size, config.hidden_size)
        self.decoder_ln = LayerNorm(config.hidden_size, eps=1e-3)

        self.quantile_proj = nn.Linear(config.hidden_size, len(config.quantiles))

    def forward(self, x: Dict[str, Tensor]) -> Tensor:
        s_inp, t_known_inp, t_observed_inp, t_observed_tgt = self.embedding(x)

        # Static context
        cs, ce, ch, cc = self.static_encoder(s_inp)
        ch, cc = ch.unsqueeze(0), cc.unsqueeze(0) #lstm initial states

        # Temporal input
        _historical_inputs = [t_known_inp[:,:self.encoder_length,:], t_observed_tgt[:,:self.encoder_length,:]]
        if t_observed_inp is not None:
            _historical_inputs.insert(0,t_observed_inp[:,:self.encoder_length,:])

        historical_inputs = torch.cat(_historical_inputs, dim=-2)
        future_inputs = t_known_inp[:, self.encoder_length:]

        # Encoders
        historical_features, _ = self.history_vsn(historical_inputs, cs)
        history, state = self.history_encoder(historical_features, (ch, cc))
        future_features, _ = self.future_vsn(future_inputs, cs)
        future, _ = self.future_encoder(future_features, state)
        torch.cuda.synchronize() # empirically this call improves performance; the reason is unknown

        # skip connection
        input_embedding = torch.cat([historical_features, future_features], dim=1)
        temporal_features = torch.cat([history, future], dim=1)
        temporal_features = self.input_gate(temporal_features)
        temporal_features = temporal_features + input_embedding
        temporal_features = self.input_gate_ln(temporal_features)

        # Static enrichment
        enriched = self.enrichment_grn(temporal_features, c=ce)

        # Temporal self attention
        x, _ = self.attention(enriched, mask_future_timesteps=True)

        # Don't compute historical quantiles
        x = x[:, self.encoder_length:, :]
        temporal_features = temporal_features[:, self.encoder_length:, :]
        enriched = enriched[:, self.encoder_length:, :]

        x = self.attention_gate(x)
        x = x + enriched
        x = self.attention_ln(x)

        # Position-wise feed-forward
        x = self.positionwise_grn(x)

        # Final skip connection
        x = self.decoder_gate(x)
        x = x + temporal_features
        x = self.decoder_ln(x)

        out = self.quantile_proj(x)

        return out

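`InterpretableMultiHeadAttention` differs from standard multi-head attention in two ways visible in the code: every head has its own queries and keys, but all heads share a single value projection (the extra `d_head` columns of `qkv_linears`), and the per-head outputs are averaged with `torch.mean(attn_vec, dim=1)` instead of being concatenated. The `_mask` buffer adds `-inf` above the diagonal so that a timestep cannot attend to later ones. The pure-Python sketch below illustrates that data flow on tiny lists. It assumes the q/k/v projections have already been applied, omits dropout and `out_proj`, and uses hypothetical names (`causal_softmax`, `interpretable_attention`).

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_softmax(scores: Matrix) -> Matrix:
    """Row-wise softmax where position i may only attend to positions j <= i."""
    probs = []
    for i, row in enumerate(scores):
        visible = row[:i + 1]  # same effect as adding -inf above the diagonal
        m = max(visible)       # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        probs.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return probs


def interpretable_attention(qs: List[Matrix], ks: List[Matrix], v: Matrix) -> Matrix:
    """qs/ks: one [T][d] matrix per head; v: a single [T][d] matrix shared by all heads."""
    n_head = len(qs)
    t, d = len(v), len(v[0])
    scale = d ** -0.5
    out = [[0.0] * d for _ in range(t)]
    for q, k in zip(qs, ks):
        scores = [[scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(t)]
                  for i in range(t)]
        probs = causal_softmax(scores)
        for i in range(t):
            for c in range(d):
                # heads share v, so averaging their outputs stays in one value space
                out[i][c] += sum(probs[i][j] * v[j][c] for j in range(t)) / n_head
    return out


qs = [[[1.0], [1.0]], [[1.0], [1.0]]]  # 2 heads, T=2, d=1
ks = [[[0.0], [0.0]], [[0.0], [0.0]]]  # zero keys give uniform scores over visible steps
v = [[2.0], [4.0]]                      # values shared by both heads
print(interpretable_attention(qs, ks, v))  # [[2.0], [3.0]]
```

The first timestep can only see itself, so it returns `v[0]`; the second attends equally to both steps and returns their mean. Because the value space is shared, the averaged attention weights can be read directly as per-timestep importance, which is the interpretability argument behind this design.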

@ -0,0 +1 @@
tensorboard


@ -0,0 +1,54 @@
#! /bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
NUM_GPUS=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
[ $NUM_GPUS -eq 16 ] && WORKER_NUMS=(1 8 16) || WORKER_NUMS=(1 8)
DATASETS=(electricity traffic)
rm -r /tmp/benchmark_results
for DATASET in ${DATASETS[@]}
do
for NGPU in ${WORKER_NUMS[@]}
do
for BATCH_SIZE in 512 1024 1536 2048 2560
do
for USE_AMP in --use_amp ""
do
for AFFINITY in "--affinity disabled" "--affinity single" "--affinity socket_unique_interleaved"
do
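# Replace the space in the affinity flag so EXP_NAME (and the --results path) stays a single word
EXP_NAME="TFT_benchmark_${DATASET}_BS_${BATCH_SIZE}_${NGPU}GPU${USE_AMP}_${AFFINITY// /_}"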
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset ${DATASET} \
--data_path /data/processed/${DATASET}_bin \
--batch_size=${BATCH_SIZE} \
--lr 5e-4 \
--epochs 1 \
--sample 100000 5000 \
--seed 1 \
${USE_AMP} \
${AFFINITY} \
--clip_grad 0.1 \
--results /tmp/benchmark_results/${EXP_NAME}
done
done
done
done
done
for P in `ls /tmp/benchmark_results/`;
do
echo ${P}
tail -n 1 /tmp/benchmark_results/${P}/dllogger.json
done

@ -0,0 +1,40 @@
#!/bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
DATAPATH='/data'
declare -A URLS=( ['electricity']='https://archive.ics.uci.edu/ml/machine-learning-databases/00321/LD2011_2014.txt.zip'
['traffic']='https://archive.ics.uci.edu/ml/machine-learning-databases/00204/PEMS-SF.zip'
)
mkdir -p ${DATAPATH}/raw
mkdir -p ${DATAPATH}/processed
for DS in electricity traffic
do
DS_PATH=${DATAPATH}/raw/${DS}
ZIP_FNAME=${DS_PATH}.zip
if [ ! -d ${DS_PATH} ]
then
wget "${URLS[${DS}]}" -O ${ZIP_FNAME}
unzip ${ZIP_FNAME} -d ${DS_PATH}
fi
python -c "from data_utils import standarize_${DS} as standarize; standarize(\"${DS_PATH}\")"
python -c "from data_utils import preprocess; \
from configuration import ${DS^}Config as Config; \
preprocess(\"${DS_PATH}/standarized.csv\", \"${DATAPATH}/processed/${DS}_bin\", Config())"
done

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,36 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.06-py3
FROM ${FROM_IMAGE_NAME}
RUN apt-get update && apt-get install -y libb64-dev libb64-0d
WORKDIR /workspace
#ENV PYTHONPATH /workspace
RUN pip uninstall -y typing
RUN apt update && apt install -y p7zip-full
COPY requirements.txt .
RUN pip install --upgrade pip
RUN pip install --no-cache-dir --ignore-installed -r requirements.txt
RUN pip install --no-cache-dir -e git+https://github.com/NVIDIA/dllogger#egg=dllogger
COPY . .
ENV PYTHONPATH="${PYTHONPATH}:/workspace"
# AMP monkey-patch
RUN sed -i 's/ def forward(ctx,/ @amp.custom_fwd\(cast_inputs=torch.float32\)\n def forward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/ def backward(ctx,/ @amp.custom_bwd\n def backward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/^import torch$/import torch\nfrom torch.cuda import amp/' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@ -0,0 +1,25 @@
Individual Contributor License Agreement (CLA)
Thank you for submitting your contributions to this project.
By signing this CLA, you agree that the following terms apply to all of your past, present and future contributions to the project.
License.
You hereby represent that all present, past and future contributions are governed by the Apache 2.0 License copyright statement.
This entails that to the extent possible under law, you transfer all copyright and related or neighboring rights of the code or documents you contribute to the project itself or its maintainers. Furthermore you also represent that you have the authority to perform the above waiver with respect to the entirety of your contributions.
Moral Rights.
To the fullest extent permitted under applicable law, you hereby waive, and agree not to assert, all of your “moral rights” in or relating to your contributions for the benefit of the project.
Third Party Content.
If your Contribution includes or is based on any source code, object code, bug fixes, configuration changes, tools, specifications, documentation, data, materials, feedback, information or other works of authorship that were not authored by you (“Third Party Content”) or if you are aware of any third party intellectual property or proprietary rights associated with your Contribution (“Third Party Rights”), then you agree to include with the submission of your Contribution full details respecting such Third Party Content and Third Party Rights, including, without limitation, identification of which aspects of your Contribution contain Third Party Content or are associated with Third Party Rights, the owner/author of the Third Party Content and Third Party Rights, where you obtained the Third Party Content, and any applicable third party license terms or restrictions respecting the Third Party Content and Third Party Rights. For greater certainty, the foregoing obligations respecting the identification of Third Party Content and Third Party Rights do not apply to any portion of a Project that is incorporated into your Contribution to that same Project.
Representations.
You represent that, other than the Third Party Content and Third Party Rights identified by you in accordance with this Agreement, you are the sole author of your Contributions and are legally entitled to grant the foregoing licenses and waivers in respect of your Contributions. If your Contributions were created in the course of your employment with your past or present employer(s), you represent that such employer(s) has authorized you to make your Contributions on behalf of such employer(s) or such employer (s) has waived all of their right, title or interest in or to your Contributions.
Disclaimer.
To the fullest extent permitted under applicable law, your Contributions are provided on an "as is" basis, without any warranties or conditions, express or implied, including, without limitation, any implied warranties or conditions of non-infringement, merchantability or fitness for a particular purpose. You are not required to provide support for your Contributions, except to the extent you desire to provide support.
No Obligation.
You acknowledge that the maintainers of this project are under no obligation to use or incorporate your contributions into the project. The decision to use or incorporate your contributions into the project will be made at the sole discretion of the maintainers or their authorized delegates.

@ -0,0 +1,3 @@
TFT for PyTorch
This repository includes software from https://github.com/google-research/google-research/tree/master/tft licensed under the Apache License, Version 2.0

@ -0,0 +1,465 @@
# Temporal Fusion Transformer For PyTorch
This repository provides a script and recipe to train the Temporal Fusion Transformer model to achieve state-of-the-art accuracy. The content of this repository is tested and maintained by NVIDIA.
## Table Of Contents
- [Model overview](#model-overview)
    * [Model architecture](#model-architecture)
    * [Default configuration](#default-configuration)
    * [Feature support matrix](#feature-support-matrix)
        * [Features](#features)
    * [Mixed precision training](#mixed-precision-training)
        * [Enabling mixed precision](#enabling-mixed-precision)
        * [Enabling TF32](#enabling-tf32)
    * [Glossary](#glossary)
- [Setup](#setup)
    * [Requirements](#requirements)
- [Quick Start Guide](#quick-start-guide)
- [Advanced](#advanced)
    * [Scripts and sample code](#scripts-and-sample-code)
    * [Command-line options](#command-line-options)
    * [Getting the data](#getting-the-data)
        * [Dataset guidelines](#dataset-guidelines)
        * [Multi-dataset](#multi-dataset)
    * [Training process](#training-process)
    * [Inference process](#inference-process)
- [Performance](#performance)
    * [Benchmarking](#benchmarking)
        * [Training performance benchmark](#training-performance-benchmark)
        * [Inference performance benchmark](#inference-performance-benchmark)
    * [Results](#results)
        * [Training accuracy results](#training-accuracy-results)
            * [Training accuracy: NVIDIA DGX A100 (8x A100 80GB)](#training-accuracy-nvidia-dgx-a100-8x-a100-80gb)
            * [Training accuracy: NVIDIA DGX-1 (8x V100 16GB)](#training-accuracy-nvidia-dgx-1-8x-v100-16gb)
            * [Training stability test](#training-stability-test)
        * [Training performance results](#training-performance-results)
            * [Training performance: NVIDIA DGX A100 (8x A100 80GB)](#training-performance-nvidia-dgx-a100-8x-a100-80gb)
            * [Training performance: NVIDIA DGX-1 (8x V100 16GB)](#training-performance-nvidia-dgx-1-8x-v100-16gb)
- [Release notes](#release-notes)
    * [Changelog](#changelog)
    * [Known issues](#known-issues)
## Model overview
The Temporal Fusion Transformer ([TFT](https://arxiv.org/abs/1912.09363)) model is a state-of-the-art architecture for interpretable, multi-horizon time-series prediction. The model was first developed and [implemented by Google](https://github.com/google-research/google-research/tree/master/tft) in collaboration with the University of Oxford.
This implementation differs from the reference implementation by addressing the issue of missing data, which is common in production datasets: missing values are either masked in the attention matrices or embedded as a special value in the latent space.
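The NumPy sketch below shows one common way to keep missing observations out of attention: their logits are set to negative infinity before the softmax, combined with a causal mask that hides future timesteps. It is an illustration under stated assumptions (a boolean `missing` flag per timestep, a single head, and a hypothetical helper name `masked_attention_weights`), not this repository's implementation.

```python
import numpy as np


def masked_attention_weights(scores, missing, causal=True):
    """Softmax over the key axis that ignores missing (and optionally future) keys.

    scores:  (T, T) raw attention logits; rows are queries, columns are keys.
    missing: (T,) boolean, True where the observation at that timestep is missing.
    Hypothetical helper for illustration only.
    """
    T = scores.shape[-1]
    # Every query may attend to every non-missing key...
    allowed = np.ones((T, T), dtype=bool) & ~missing[np.newaxis, :]
    if causal:
        # ...but only to keys at or before its own timestep.
        allowed &= np.tril(np.ones((T, T), dtype=bool))
    masked = np.where(allowed, scores, -np.inf)
    # Subtract the row max for stability; a row with no valid key has max -inf, so use 0.
    row_max = masked.max(axis=-1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)  # exp(-inf) == 0 for masked keys
    denom = weights.sum(axis=-1, keepdims=True)
    # Rows with no valid key get all-zero weights instead of NaN.
    return np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)


scores = np.zeros((3, 3))
missing = np.array([False, True, False])
weights = masked_attention_weights(scores, missing)
# Row 2 may see keys 0 and 2 (key 1 is missing), so with equal scores they share the weight.
```

Masking at the logit level means the remaining weights still sum to one, so downstream layers see a proper convex combination of the observed timesteps only.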
This model enables the prediction of confidence intervals for future values of time series for multiple future timesteps.
This model is trained with mixed precision using Tensor Cores on Volta, Turing, and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.45x faster than training without Tensor Cores while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
### Model architecture
The TFT model is a hybrid architecture that combines LSTM encoding of time series with the interpretability of transformer attention layers. Prediction is based on three types of variables: static (constant for a given time series), known (known in advance for the whole history and future), and observed (known only for historical data). All these variables come in two flavors: categorical and continuous. In addition to these variables, we feed the model with the historical values of the time series itself. All variables are embedded in a high-dimensional space by learning an embedding vector. Categorical variable embeddings are learned in the classical sense of embedding discrete values. For each continuous variable, the model learns a single vector, which is then scaled by that variable's value for further processing. The next step is to filter variables through the Variable Selection Network (VSN), which assigns weights to the inputs according to their relevance to the prediction. Static variables are used as context for the variable selection of other variables and as the initial state of the LSTM encoders.
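A minimal NumPy sketch of the continuous-embedding and variable-selection steps described above: each continuous variable's learned vector is scaled by the variable's value, and a softmax over per-variable scores produces the selection weights used to combine the embeddings. The helper names are hypothetical, and a single linear layer stands in for the gated residual networks that the TFT paper uses inside the VSN, so treat this as an illustration of the data flow rather than the model's code.

```python
import numpy as np


def embed_continuous(values, vectors):
    """Scale one learned vector per continuous variable by that variable's value.

    values:  (B, n_vars) raw (normalized) inputs
    vectors: (n_vars, d) learned embedding vectors, one row per variable
    returns: (B, n_vars, d)
    """
    return values[..., np.newaxis] * vectors[np.newaxis, :, :]


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def select_variables(embeddings, w_select):
    """Weight variables by relevance and combine them into one vector per sample.

    embeddings: (B, n_vars, d); w_select: (n_vars * d, n_vars).
    Assumption: a single linear layer stands in for TFT's gated residual network.
    """
    batch, n_vars, d = embeddings.shape
    logits = embeddings.reshape(batch, n_vars * d) @ w_select   # (B, n_vars)
    weights = softmax(logits, axis=-1)                          # relevance per variable
    combined = np.einsum("bn,bnd->bd", weights, embeddings)     # weighted sum over variables
    return combined, weights


rng = np.random.default_rng(0)
values = rng.normal(size=(4, 3))     # batch of 4 samples, 3 continuous variables
vectors = rng.normal(size=(3, 16))   # hidden size 16, chosen for illustration
emb = embed_continuous(values, vectors)
combined, weights = select_variables(emb, rng.normal(size=(3 * 16, 3)))
```

The `weights` array is what makes VSN activations useful for interpretation: averaged over samples, it indicates how much each input variable contributed to the combined representation.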
After encoding, variables are passed to multi-head attention layers (decoder), which produce the final prediction. The whole architecture is interwoven with residual connections with gating mechanisms that allow the architecture to adapt to various problems by skipping some parts of it.
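The gated skip connections can be illustrated with a GLU-style gate followed by a residual add and layer normalization. In this hypothetical NumPy sketch (`gated_skip` and its weight names are assumptions, and the layer norm has no learned scale or shift), driving the gate close to zero makes the block reduce to `layer_norm(residual)`, which is how a gated layer can effectively be skipped.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis; learned affine parameters are omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gated_skip(x, residual, w_gate, b_gate, w_value, b_value):
    """GLU-style gate, residual add, then layer norm (hypothetical helper)."""
    gate = sigmoid(x @ w_gate + b_gate)   # in (0, 1): how much of the new signal passes
    value = x @ w_value + b_value
    return layer_norm(gate * value + residual)


rng = np.random.default_rng(0)
d = 8
x, residual = rng.normal(size=(5, d)), rng.normal(size=(5, d))
w_gate, w_value = rng.normal(size=(d, d)), rng.normal(size=(d, d))
# A strongly negative gate bias closes the gate, so the block acts as a plain skip connection.
closed = gated_skip(x, residual, w_gate, np.full(d, -50.0), w_value, np.zeros(d))
```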
For the sake of explainability, the heads of the self-attention layers share value matrices. This allows interpreting self-attention as an ensemble of models predicting different temporal patterns over the same feature set. Another feature that helps us understand the model is the VSN activations, which tell us how relevant a given feature is to the prediction.
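A NumPy sketch of attention with a value matrix shared across heads, following the description above: each head has its own query and key projections, but all heads reuse one value projection, so averaging the head outputs is the same as applying the head-averaged attention matrix once. That averaged matrix is the pattern one can inspect. The function name and shapes are assumptions, and the final output projection is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def interpretable_attention(x, w_q, w_k, w_v, causal=True):
    """x: (T, d_model); w_q, w_k: (H, d_model, d_head), one projection per head;
    w_v: (d_model, d_head), shared by every head. Output projection omitted."""
    T = x.shape[0]
    d_head = w_q.shape[-1]
    q = np.einsum("td,hde->hte", x, w_q)                   # (H, T, d_head)
    k = np.einsum("td,hde->hte", x, w_k)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)    # (H, T, T)
    if causal:                                              # hide future timesteps
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    attn = softmax(scores, axis=-1)                         # one pattern per head
    v = x @ w_v                                             # single shared value projection
    mean_attn = attn.mean(axis=0)                           # (T, T), directly inspectable
    # Because V is shared, averaging head outputs equals averaging attention weights.
    return mean_attn @ v, mean_attn


rng = np.random.default_rng(0)
T, d_model, H, d_head = 5, 8, 4, 2
x = rng.normal(size=(T, d_model))
out, attn = interpretable_attention(
    x,
    rng.normal(size=(H, d_model, d_head)),
    rng.normal(size=(H, d_model, d_head)),
    rng.normal(size=(d_model, d_head)),
)
```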
![](TFT_architecture.PNG)
*image source: https://arxiv.org/abs/1912.09363*
### Default configuration
The specific configuration of the TFT model depends on the dataset used. Not only is the size of the model subject to change, but so are the data sampling and preprocessing strategies. During preprocessing, data is normalized per feature. For some of the datasets, we also apply per-time-series scaling, which takes into account shifts in distribution between entities (e.g., a factory consumes more electricity than an average house). The model is trained with the quantile loss:

$$\sum_{i=1}^{N}\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{N t_{max}}$$

for quantiles in [0.1, 0.5, 0.9]. The default configurations are tuned for distributed training on DGX-1-32G with mixed precision. We use dynamic loss scaling. Specific values are provided in the table below.
| Dataset | Training samples | Validation samples | Test samples | History length | Forecast horizon | Dropout | Hidden size | #Heads | Batch size | Learning rate | Gradient clipping |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Electricity | 450k | 50k | 53.5k | 168 | 24 | 0.1 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
| Traffic | 450k | 50k | 139.6k | 168 | 24 | 0.3 | 128 | 4 | 8x1024 | 1e-3 | 0.0 |
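The quantile-loss training objective can be written in NumPy as follows, assuming QL is the pinball loss from the TFT paper, $QL(y, \hat{y}, q) = \max\big(q(y - \hat{y}), (q - 1)(y - \hat{y})\big)$. The array layout (series × horizon × quantile) and the function name are assumptions for illustration, not this repository's code.

```python
import numpy as np


def quantile_loss(y, y_hat, quantiles):
    """Pinball loss summed over series, horizon steps, and quantiles, divided by N * t_max.

    y:         (N, T) observed targets
    y_hat:     (N, T, Q) predicted quantiles, last axis aligned with `quantiles`
    quantiles: sequence of Q quantile levels, e.g. [0.1, 0.5, 0.9]
    """
    q = np.asarray(quantiles, dtype=float)   # (Q,)
    err = y[..., np.newaxis] - y_hat         # (N, T, Q)
    # Under-prediction costs q per unit, over-prediction costs (1 - q) per unit.
    ql = np.maximum(q * err, (q - 1.0) * err)
    n_series, horizon = y.shape
    return ql.sum() / (n_series * horizon)


y = np.array([[1.0]])
y_hat = np.array([[[0.0, 1.0, 2.0]]])        # P10, P50, P90 forecasts for one step
loss = quantile_loss(y, y_hat, [0.1, 0.5, 0.9])
```

Here the P10 and P90 forecasts each miss by 1 in the direction their quantile expects, so each costs 0.1 and the loss is 0.2. The pair of 0.1 and 0.9 quantile forecasts brackets an 80% prediction interval, which is how the model provides confidence intervals.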
### Feature support matrix
The following features are supported by this model:
| Feature | Supported |
|----------------------------|--------------------------|
| Distributed data parallel | Yes |
| PyTorch AMP | Yes |
#### Features
[Automatic Mixed Precision](https://pytorch.org/docs/stable/amp.html) provides an easy way to leverage Tensor Core performance. It allows the execution of parts of a network in lower precision. Refer to [Mixed precision training](#mixed-precision-training) for more information.

[PyTorch DistributedDataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel) - a module wrapper that enables easy multiprocess distributed data-parallel training.
### Mixed precision training
Mixed precision is the combined use of different numerical precisions in a
computational method.
[Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant
computational speedup by performing operations in half-precision format while
storing minimal information in single-precision to retain as much information
as possible in critical parts of the network. Since the introduction of [Tensor Cores](https://developer.nvidia.com/tensor-cores) in Volta, and following with
both the Turing and Ampere architectures, significant training speedups are
experienced by switching to
mixed precision -- up to 3x overall speedup on the most arithmetically intense
model architectures. Using mixed precision training previously required two
steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Manually adding loss scaling to preserve small gradient values.
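To see why step 2 is needed, the NumPy sketch below (illustrative values, not taken from TFT training) shows a small gradient that underflows to zero in FP16, and the same value surviving when it is first multiplied by a loss scale and then divided back out in FP32.

```python
import numpy as np

grad = np.float32(1e-8)                        # a small FP32 gradient value (illustrative)
naive = grad.astype(np.float16)                # below half of FP16's smallest subnormal (~6e-8): becomes 0
scale = np.float32(2.0**16)                    # loss scale; scaling the loss scales every gradient
scaled = (grad * scale).astype(np.float16)     # about 6.55e-4, well inside FP16's range
recovered = scaled.astype(np.float32) / scale  # unscale in FP32 before the weight update
```

Dynamic loss scaling automates the choice of `scale`: it is reduced when scaled gradients overflow to Inf/NaN and raised again after a run of stable steps.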
The ability to train deep learning networks with lower precision was introduced
in the Pascal architecture and first supported in [CUDA
8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep
Learning SDK.
For information about:
* How to train using mixed precision, refer to the [Mixed Precision
Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed
Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html)
documentation.
* Techniques used for mixed precision training, refer to the [Mixed-Precision
Training of Deep Neural
Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/)
blog.
* APEX tools for mixed precision training, refer to the [NVIDIA Apex: Tools for Easy Mixed-Precision Training in PyTorch](https://devblogs.nvidia.com/apex-pytorch-easy-mixed-precision-training/).
#### Enabling mixed precision
Mixed precision is enabled in PyTorch by using the Automatic Mixed Precision `torch.cuda.amp` module, which casts variables to half-precision upon retrieval while storing variables in single-precision format. Furthermore, to preserve small gradient magnitudes in backpropagation, a [loss scaling](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#lossscaling) step must be included when applying gradients. In PyTorch, loss scaling can be applied automatically by the `GradScaler` class. All the necessary steps to implement AMP are described in detail in the [PyTorch AMP examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples).
To enable mixed precision for TFT, simply add the `--use_amp` option to the training script.
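A minimal sketch of the standard `torch.cuda.amp` pattern: `autocast` for the forward pass and `GradScaler` for loss scaling. The toy `nn.Linear` model, the `train_step` helper, and the clipping threshold are illustrative assumptions, not code from `train.py`. AMP is switched on only when a CUDA device is present, so the same code also runs (in FP32) on a CPU.

```python
import torch
from torch import nn


def train_step(model, batch, target, optimizer, scaler, use_amp):
    """One AMP training step (hypothetical helper for illustration)."""
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops (e.g. matmuls) in FP16 on CUDA; a no-op when disabled.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(batch), target)
    scaler.scale(loss).backward()      # scale the loss so small FP16 gradients do not underflow
    scaler.unscale_(optimizer)         # unscale first so clipping sees true gradient magnitudes
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)  # illustrative threshold
    scaler.step(optimizer)             # skips the update if Inf/NaN gradients were found
    scaler.update()                    # adjusts the loss scale dynamically
    return loss.item()


use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
x = torch.randn(32, 8, device=device)
y = torch.randn(32, 1, device=device)
losses = [train_step(model, x, y, optimizer, scaler, use_amp) for _ in range(5)]
```

The `torch.cuda.amp` spellings used here are available in the PyTorch version shipped in the 21.06 container; recent PyTorch releases also expose the same functionality as `torch.amp.autocast("cuda")` and `torch.amp.GradScaler("cuda")`.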
#### Enabling TF32
TensorFloat-32 (TF32) is the new math mode in [NVIDIA A100](https://www.nvidia.com/en-us/data-center/a100/) GPUs for handling the matrix math, also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs.
TF32 Tensor Cores can speed up networks using FP32, typically with no loss of accuracy. It is more robust than FP16 for models which require high dynamic range for weights or activations.
For more information, refer to the [TensorFloat-32 in the A100 GPU Accelerates AI Training, HPC up to 20x](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/) blog post.
TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.
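TF32 is controlled by two global PyTorch flags. The snippet below sets them explicitly. It is a sketch, not something `train.py` is known to do. The default for the matmul flag depends on the PyTorch version: releases 1.7 through 1.11 (including the one in the 21.06 container) enable it by default, while 1.12 and later default it to `False`. Setting the flag explicitly makes the behavior independent of the version.

```python
import torch

# These flags only change behavior for FP32 matmuls/convolutions on
# Ampere or newer GPUs; setting them elsewhere is harmless.
torch.backends.cuda.matmul.allow_tf32 = True  # cuBLAS matrix multiplies
torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions

# Set both to False to force full FP32 math, e.g. to check whether
# TF32 affects accuracy for a given model.
print(torch.backends.cuda.matmul.allow_tf32, torch.backends.cudnn.allow_tf32)
```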
### Glossary
**Multi-horizon prediction**
Process of estimating values of a time series for multiple future time steps.
**Quantiles**
Cut points dividing the range of a probability distribution into intervals with equal probabilities.
**Time series**
Series of data points indexed and equally spaced in time.
**Transformer**
The paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762) introduces a novel architecture called Transformer that uses an attention mechanism and transforms one sequence into another.
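To make the quantile definition concrete, here is a small illustration on synthetic normal samples. TFT predicts the 0.1, 0.5 and 0.9 quantiles by default (the `quantiles` setting in `configuration.py`). The sample data here is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=100_000)

# The 0.1, 0.5 and 0.9 quantiles are cut points with roughly 10%, 50%
# and 90% of the probability mass below them.
q10, q50, q90 = np.quantile(samples, [0.1, 0.5, 0.9])
frac_below = [(samples <= q).mean() for q in (q10, q50, q90)]
print(q10, q50, q90)   # close to -1.28, 0.0, 1.28 for a standard normal
print(frac_below)      # close to 0.1, 0.5, 0.9
```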
## Setup
The following section lists the requirements that you need to meet in order to start training the TFT model.
### Requirements
This repository contains a Dockerfile that extends the PyTorch NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
- [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
- Supported GPUs:
- [NVIDIA Volta architecture](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/)
- [NVIDIA Turing architecture](https://www.nvidia.com/en-us/design-visualization/technologies/turing-architecture/)
- [NVIDIA Ampere architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/)
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
- Running [PyTorch](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/running.html#running)
If you are unable to use the PyTorch NGC container, to set up the required environment or create your own container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).
## Quick Start Guide
To train your model using mixed or TF32 precision with Tensor Cores, perform the following steps using the default parameters of the TFT model on any of the benchmark datasets. For the specifics concerning training and inference, refer to the [Advanced](#advanced) section.
1. Clone the repository.
```bash
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Forecasting/TFT
```
2. Build the TFT PyTorch NGC container.
```bash
docker build --network=host -t tft .
```
3. Start an interactive session in the NGC container to run training/inference.
```bash
docker run -it --rm --ipc=host --network=host --gpus all -v /path/to/your/data:/data/ tft
```
Note: Make sure to mount your dataset using the `-v` flag so that it is available for training inside the NVIDIA Docker container.
4. Download and preprocess datasets.
```bash
bash scripts/get_data.sh
```
5. Start training. Choose one of the scripts provided in the `scripts/` directory. Results are stored in the `/results` directory.
These scripts are tuned for DGX1-32G. If you have a different system, use the `NGPU` and `BATCH_SIZE` variables to adjust the parameters for your system.
```bash
bash scripts/run_electricity.sh
bash scripts/run_traffic.sh
```
6. Start validation/evaluation. The metric we use for evaluation is q-risk. We can compare it per-quantile in the Pareto sense or jointly as one number indicating accuracy.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin
```
7. Start inference/predictions. Visualize and save predictions by running the following command.
```bash
python inference.py \
--checkpoint <your_checkpoint> \
--data /data/processed/<dataset>/test.csv \
--cat_encodings /data/processed/<dataset>/cat_encodings.bin \
--tgt_scalers /data/processed/<dataset>/tgt_scalers.bin \
--visualize \
--save_predictions
```
Now that you have trained and evaluated your model, you can compare your training results with our [Training accuracy results](#training-accuracy-results). You can also benchmark your performance against the [Training performance benchmark](#training-performance-results). Following the steps in these sections will ensure that you achieve the same accuracy and performance results as stated in the [Results](#results) section.
## Advanced
The following sections provide more details about the dataset, running training and inference, and the training results.
### Scripts and sample code
In the root directory, the most important files are:
- `train.py`: Entry point for training
- `data_utils.py`: File containing the dataset implementation and preprocessing functions
- `modeling.py`: Definition of the model
- `configuration.py`: Contains configuration classes for various experiments
- `test.py`: Entry point for testing a trained model
- `Dockerfile`: Container definition
- `log_helper.py`: Contains helper functions for setting up dllogger
- `criterions.py`: Definitions of loss functions
The `scripts` directory contains scripts for default use cases:
- `run_electricity.sh`: train default model on the electricity dataset
- `run_traffic.sh`: train default model on the traffic dataset
### Command-line options
To view the full list of available options and their descriptions, use the `-h` or `--help` command-line option, for example:
`python train.py --help`.
The following example output is printed when running `python train.py --help`:
```
usage: train.py [-h] --data_path DATA_PATH --dataset {electricity,volatility,traffic,favorita} [--epochs EPOCHS] [--sample_data SAMPLE_DATA SAMPLE_DATA] [--batch_size BATCH_SIZE] [--lr LR] [--seed SEED] [--use_amp] [--clip_grad CLIP_GRAD]
                [--early_stopping EARLY_STOPPING] [--results RESULTS] [--log_file LOG_FILE] [--distributed_world_size N] [--distributed_rank DISTRIBUTED_RANK] [--local_rank LOCAL_RANK] [--overwrite_config OVERWRITE_CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  --data_path DATA_PATH
  --dataset {electricity,volatility,traffic,favorita}
  --epochs EPOCHS
  --sample_data SAMPLE_DATA SAMPLE_DATA
  --batch_size BATCH_SIZE
  --lr LR
  --seed SEED
  --use_amp             Enable automatic mixed precision
  --clip_grad CLIP_GRAD
  --early_stopping EARLY_STOPPING
                        Stop training if validation loss does not improve for more than this number of epochs.
  --results RESULTS
  --log_file LOG_FILE
  --distributed_world_size N
                        total number of GPUs across all nodes (default: all visible GPUs)
  --distributed_rank DISTRIBUTED_RANK
                        rank of the current worker
  --local_rank LOCAL_RANK
                        rank of the current worker
  --overwrite_config OVERWRITE_CONFIG
                        JSON string used to overload config
```
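The `--overwrite_config` option takes a JSON string that overrides fields of the selected configuration. The sketch below shows one way such an override can work; `ExampleConfig` and `overwrite_config` are hypothetical stand-ins, not the actual classes or code in `train.py`, and rejecting unknown keys is a design choice of this sketch.

```python
import json


class ExampleConfig:
    """Hypothetical stand-in for a configuration class from configuration.py."""

    def __init__(self):
        self.hidden_size = 128
        self.n_head = 4
        self.dropout = 0.1


def overwrite_config(config, json_str):
    """Hypothetical helper: override config attributes from a JSON string."""
    overrides = json.loads(json_str)
    for key, value in overrides.items():
        if not hasattr(config, key):
            raise KeyError(f'unknown config field: {key}')
        setattr(config, key, value)
    return config


cfg = overwrite_config(ExampleConfig(), '{"dropout": 0.3, "hidden_size": 256}')
print(cfg.hidden_size, cfg.n_head, cfg.dropout)  # 256 4 0.3
```

Because the configuration classes compute derived fields (such as `num_static_vars`) in `__init__`, overriding a field those values depend on after construction would leave them stale in this sketch.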
### Getting the data
The TFT model was trained on the electricity and traffic benchmark datasets. This repository contains the `get_data.sh` download script, which, for the electricity and traffic datasets, automatically downloads and preprocesses the training, validation and test datasets and produces files that contain scalers.
#### Dataset guidelines
The `data_utils.py` file contains all functions that are used to preprocess the data. Initially, the data is loaded into a `pandas.DataFrame` and parsed into a common format that contains the features used for training. The standardized data is then cleaned, normalized, encoded and binarized.
This step does the following:
- Drop all the columns that are not marked in the configuration file as used for training or preprocessing
- Flatten indices in case time series are indexed by more than one column
- Split the data into training, validation and test splits
- Filter out all the time series shorter than the minimal example length
- Normalize columns marked as continuous in the configuration file
- Encode columns marked as categorical as integers
- Save the data in CSV and binary formats
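A few of these steps (dropping unused columns, filtering short series, normalizing continuous columns and integer-encoding categorical ones) can be sketched with pandas as follows. This is an illustration, not the `data_utils.py` implementation: the `preprocess` function and its arguments are hypothetical, it z-scores per series, and it fits the statistics on the whole frame, whereas a real pipeline should fit scalers and encoders on the training split only.

```python
import pandas as pd


def preprocess(df, id_col, time_col, cont_cols, cat_cols, min_length):
    """Hypothetical sketch of several preprocessing steps listed above."""
    # Drop columns not used for training or preprocessing.
    df = df[[id_col, time_col] + cont_cols + cat_cols].copy()
    df = df.sort_values([id_col, time_col])
    # Filter out series shorter than the minimal example length.
    df = df.groupby(id_col).filter(lambda g: len(g) >= min_length)
    # Normalize continuous columns (z-score computed per series).
    df[cont_cols] = df.groupby(id_col)[cont_cols].transform(
        lambda g: (g - g.mean()) / g.std(ddof=0))
    # Encode categorical columns as integers (codes in order of appearance).
    for c in cat_cols:
        df[c] = pd.factorize(df[c])[0]
    return df


raw = pd.DataFrame({
    'id': ['a'] * 4 + ['b'] * 2,
    'time': [0, 1, 2, 3, 0, 1],
    'load': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
    'day': ['mon', 'tue', 'mon', 'tue', 'mon', 'tue'],
    'unused': [0] * 6,
})
out = preprocess(raw, 'id', 'time', ['load'], ['day'], min_length=3)
print(out)  # series 'b' is dropped because it has only 2 rows
```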
#### Multi-dataset
To use an alternate dataset, you have to write a function that parses your data into a common format. The format is as follows:
- There is at least one ID column
- There is exactly one time column (that can also be used as a feature column)
- Each feature is in a separate column
- Each row represents a moment in time for only one time series

Additionally, you must specify a configuration of the network, including a data description. Refer to the example in the `configuration.py` file.
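As an example of such a parsing function, the sketch below converts a hypothetical "wide" table (one column per series) into the common format: one ID column, one time column, and one row per series per time step. The column names, the `wide_to_common_format` helper and the derived `hour` feature are assumptions for illustration only.

```python
import pandas as pd


def wide_to_common_format(wide, time_col='timestamp', id_col='id', target_col='value'):
    """Hypothetical parser: one column per series -> one row per (series, time step)."""
    long = wide.melt(id_vars=[time_col], var_name=id_col, value_name=target_col)
    long = long.sort_values([id_col, time_col]).reset_index(drop=True)
    # Each feature lives in its own column, e.g. a known-in-advance time feature.
    long['hour'] = pd.to_datetime(long[time_col]).dt.hour
    return long


wide = pd.DataFrame({
    'timestamp': ['2014-01-01 00:00', '2014-01-01 01:00', '2014-01-01 02:00'],
    'series_a': [1.0, 2.0, 3.0],
    'series_b': [4.0, 5.0, 6.0],
})
out = wide_to_common_format(wide)
print(out)
```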
### Training process
The `train.py` script is an entry point for a training procedure. Refined recipes can be found in the `scripts` directory.
The model trains for at most `--epochs` epochs. If the `--early_stopping N` option is set, training ends once the validation loss has not improved for N consecutive epochs.
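The early-stopping rule can be sketched as a patience counter over per-epoch validation losses. The `stopping_epoch` helper is hypothetical. It stops after exactly N non-improving epochs, while the `--help` text says "more than this number of epochs", so the exact off-by-one behavior of `train.py` may differ.

```python
def stopping_epoch(val_losses, patience):
    """Return the 0-based epoch at which training stops, or None if it runs to the end.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far.
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss  # new best checkpoint would be saved here
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


print(stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=3))  # 4
print(stopping_epoch([3.0, 2.0, 1.0], patience=2))               # None
```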
The details of the architecture and the dataset configuration are encapsulated by the `--dataset` option. This option chooses one of the configurations stored in the `configuration.py` file. You can enable mixed precision training by providing the `--use_amp` option. The training script supports multi-GPU training with the APEX package. To enable distributed training, prepend the training command with `python -m torch.distributed.launch --nproc_per_node=${NGPU}`.
Example command:
```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=1024 \
--sample 450000 50000 \
--lr 1e-3 \
--epochs 25 \
--early_stopping 5 \
--seed 1 \
--use_amp \
--results /results/TFT_electricity_bs8x1024_lr1e-3/seed_1
```
The model is trained by optimizing the quantile loss

$$\sum_{i=1}^{N}\sum_{q\in\mathcal{Q}}\sum_{t=1}^{t_{max}}\frac{QL(y_{it},\hat{y}_i(q,t),q)}{N t_{max}}$$

After training, the checkpoint with the lowest validation loss is evaluated on the test split with the q-risk metric

$$\frac{2\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}QL(y_t,\hat{y}(q,t),q)}{\sum_{y\in\Omega}\sum_{t=1}^{t_{max}}|y_t|}$$
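The text does not define `QL`. The `QuantileLoss` class in `criterions.py` in this commit computes `(1-q)*relu(prediction - target) + q*relu(target - prediction)`, which corresponds to $QL(y,\hat{y},q) = q\max(y-\hat{y},0) + (1-q)\max(\hat{y}-y,0)$. The NumPy sketch below applies that definition and the q-risk formula for a single quantile. The function names are illustrative and are not the repository's evaluation code.

```python
import numpy as np


def quantile_loss(y, y_hat, q):
    """Elementwise QL: under-prediction weighted by q, over-prediction by (1 - q).

    Same form as QuantileLoss in criterions.py.
    """
    diff = y_hat - y
    return (1 - q) * np.maximum(diff, 0) + q * np.maximum(-diff, 0)


def q_risk(y, y_hat, q):
    """Normalized quantile loss for one quantile q, summed over series and time steps."""
    return 2 * quantile_loss(y, y_hat, q).sum() / np.abs(y).sum()


y = np.array([1.0, 2.0])
y_hat = np.array([1.5, 1.5])
print(q_risk(y, y_hat, 0.5))  # 2 * (0.25 + 0.25) / 3 = 0.333...
```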
Results are stored in the `/results` directory by default. This can be changed by providing the `--results` option. At the end of training, the results directory contains the trained checkpoint with the lowest validation loss, dllogger logs (one dictionary per line), and TensorBoard logs.
### Inference process
Inference can be run by launching the `inference.py` script. The script requires a trained checkpoint to run. It is crucial to prepare the data in the same way as training data prior to running the inference. Example command:
```bash
python inference.py \
--checkpoint /results/checkpoint.pt \
--data /data/processed/electricity_bin/test.csv \
--tgt_scalers /data/processed/electricity_bin/tgt_scalers.bin \
--cat_encodings /data/processed/electricity_bin/cat_encodings.bin \
--batch_size 2048 \
--visualize \
--save_predictions \
--joint_visualization \
--results /results \
--use_amp
```
By default, the script evaluates the model on the specified dataset and prints the q-risk for that dataset. To save the predictions, use the `--save_predictions` option; predictions are stored in CSV format in the directory specified by the `--results` option. The `--joint_visualization` option plots graphs in TensorBoard format so that you can inspect the results and compare them to the true values. With `--visualize`, a plot for each example is saved in a separate file.
## Performance
### Benchmarking
The following section shows how to run benchmarks measuring the model performance in training and inference modes.
#### Training performance benchmark
In order to run training benchmarks, use the `scripts/benchmark.sh` script.
#### Inference performance benchmark
To benchmark the inference performance on a specific batch size and dataset, run the `inference.py` script.
### Results
The following sections provide details on how we achieved our performance and accuracy in training and inference.
#### Training accuracy results
We conducted an extensive hyperparameter search along with stability tests. The presented results are averages from hundreds of runs.
##### Training accuracy: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-------
| Electricity | 1 | 1024 | 0.027 / 0.059 / 0.029 | 0.028 / 0.058 / 0.029 | 1427s | 1087s | 1.313x
| Electricity | 8 | 1024 | 0.027 / 0.056 / 0.028 | 0.026 / 0.054 / 0.029 | 216s | 176s | 1.227x
| Traffic | 1 | 1024 | 0.040 / 0.103 / 0.075 | 0.040 / 0.103 / 0.075 | 957s | 726s | 1.318x
| Traffic | 8 | 1024 | 0.042 / 0.104 / 0.076 | 0.042 / 0.106 / 0.077 | 151s | 126s | 1.198x
##### Training accuracy: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs.
| Dataset | GPUs | Batch size / GPU | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision)
|-------------|---|------|-----------------------|-----------------------|-------|-------|-----------
| Electricity | 1 | 1024 | 0.027 / 0.056 / 0.028 | 0.027 / 0.058 / 0.029 | 2559s | 1598s | 1.601x
| Electricity | 8 | 1024 | 0.027 / 0.055 / 0.028 | 0.027 / 0.055 / 0.029 | 381s | 261s | 1.460x
| Traffic | 1 | 1024 | 0.040 / 0.102 / 0.075 | 0.041 / 0.101 / 0.074 | 1718s | 1062s | 1.618x
| Traffic | 8 | 1024 | 0.042 / 0.106 / 0.076 | 0.042 / 0.105 / 0.077 | 256s | 176s | 1.455x
##### Training stability test
To get a fuller picture of the model's accuracy, we performed a hyperparameter search along with stability tests on 100 random seeds for each configuration. Then, for each benchmark dataset, we chose the architecture with the lowest mean test q-risk. The table below summarizes the best configurations.
| Dataset | #GPU | Hidden size | #Heads | Local BS | LR | Gradient clipping | Dropout | Mean q-risk | Std q-risk | Min q-risk | Max q-risk
|-------------|------|-------------|--------|----------|------|-------------------|---------|-------------|------------| -----------|------
| Electricity | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.1 | 0.1131 | 0.0025 | 0.1080 | 0.1200
| Traffic | 8 | 128 | 4 | 1024 | 1e-3 | 0.0 | 0.3 | 0.2180 | 0.0049 | 0.2069 | 0.2336
#### Training performance results
##### Training performance: NVIDIA DGX A100 (A100 80GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA A100 (A100 80GB) GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision
|-------------|---|------|--------|--------|-------|-------|-----
| Electricity | 1 | 1024 | 10173 | 13703 | 1.35x | 1 | 1
| Electricity | 8 | 1024 | 80596 | 107761 | 1.34x | 7.92x | 7.86x
| Traffic | 1 | 1024 | 10197 | 13779 | 1.35x | 1 | 1
| Traffic | 8 | 1024 | 80692 | 107979 | 1.34x | 7.91x | 7.84x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
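The derived columns in these tables are ratios of the throughput columns: weak scaling divides multi-GPU throughput by single-GPU throughput at the same per-GPU batch size, and the speedup divides mixed-precision throughput by TF32 throughput. The sketch below recomputes the Electricity row of the A100 table; the dictionary layout and helper names are illustrative only.

```python
# Electricity rows of the A100 table above, in items per second.
throughput = {
    ('tf32', 1): 10173, ('tf32', 8): 80596,
    ('amp', 1): 13703, ('amp', 8): 107761,
}


def weak_scaling(precision, n_gpus):
    """Throughput on n_gpus divided by single-GPU throughput (same per-GPU batch)."""
    return throughput[(precision, n_gpus)] / throughput[(precision, 1)]


def amp_speedup(n_gpus):
    """Mixed-precision throughput divided by TF32 throughput on the same GPU count."""
    return throughput[('amp', n_gpus)] / throughput[('tf32', n_gpus)]


print(round(weak_scaling('tf32', 8), 2))  # 7.92
print(round(weak_scaling('amp', 8), 2))   # 7.86
print(round(amp_speedup(1), 2))           # 1.35
print(round(amp_speedup(8), 2))           # 1.34
```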
##### Training performance: NVIDIA DGX-1 (V100 16GB)
Our results were obtained by running the `train.sh` training script in the [PyTorch 21.06 NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) on NVIDIA DGX-1 with V100 16GB GPUs. Performance numbers (in items per second) were averaged over an entire training epoch.
| Dataset | GPUs | Batch size / GPU | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision
|-------------|---|------|-------|-------|-------|------|----
| Electricity | 1 | 1024 | 5580 | 9148 | 1.64x | 1 | 1
| Electricity | 8 | 1024 | 43351 | 69855 | 1.61x | 7.77x | 7.64x
| Traffic | 1 | 1024 | 5593 | 9194 | 1.64x | 1 | 1
| Traffic | 8 | 1024 | 43426 | 69983 | 1.61x | 7.76x | 7.61x
To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
The performance metrics used were items per second.
## Release notes
The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to https://developer.nvidia.com/deep-learning-performance-training-inference.
### Changelog
October 2021
- Initial release
### Known issues
There are no known issues with this model.


@@ -0,0 +1,128 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from data_utils import InputTypes, DataTypes, FeatureSpec
import datetime


class ElectricityConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('power_usage', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('hour', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'days_from_start' # This column contains time indices across which we split the data
        self.train_range = (1096, 1315)
        self.valid_range = (1308, 1339)
        self.test_range = (1332, 1346)
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = True
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'
        # Feature sizes
        self.static_categorical_inp_lens = [369]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]
        self.example_length = 8 * 24
        self.encoder_length = 7 * 24
        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.1
        self.attn_dropout = 0.0
        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])


class TrafficConfig():
    def __init__(self):
        self.features = [
            FeatureSpec('id', InputTypes.ID, DataTypes.CATEGORICAL),
            FeatureSpec('hours_from_start', InputTypes.TIME, DataTypes.CONTINUOUS),
            FeatureSpec('values', InputTypes.TARGET, DataTypes.CONTINUOUS),
            FeatureSpec('time_on_day', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('day_of_week', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('hours_from_start', InputTypes.KNOWN, DataTypes.CONTINUOUS),
            FeatureSpec('categorical_id', InputTypes.STATIC, DataTypes.CATEGORICAL),
        ]
        # Dataset split boundaries
        self.time_ids = 'sensor_day' # This column contains time indices across which we split the data
        self.train_range = (0, 151)
        self.valid_range = (144, 166)
        self.test_range = (159, float('inf'))
        self.dataset_stride = 1 #how many timesteps between examples
        self.scale_per_id = False
        self.missing_id_strategy = None
        self.missing_cat_data_strategy='encode_all'
        # Feature sizes
        self.static_categorical_inp_lens = [963]
        self.temporal_known_categorical_inp_lens = []
        self.temporal_observed_categorical_inp_lens = []
        self.quantiles = [0.1, 0.5, 0.9]
        self.example_length = 8 * 24
        self.encoder_length = 7 * 24
        self.n_head = 4
        self.hidden_size = 128
        self.dropout = 0.3
        self.attn_dropout = 0.0
        #### Derived variables ####
        self.temporal_known_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.KNOWN and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_observed_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.OBSERVED and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.temporal_target_size = len([x for x in self.features if x.feature_type == InputTypes.TARGET])
        self.static_continuous_inp_size = len([x for x in self.features
            if x.feature_type == InputTypes.STATIC and x.feature_embed_type == DataTypes.CONTINUOUS])
        self.num_static_vars = self.static_continuous_inp_size + len(self.static_categorical_inp_lens)
        self.num_future_vars = self.temporal_known_continuous_inp_size + len(self.temporal_known_categorical_inp_lens)
        self.num_historic_vars = sum([self.num_future_vars,
                                      self.temporal_observed_continuous_inp_size,
                                      self.temporal_target_size,
                                      len(self.temporal_observed_categorical_inp_lens),
                                      ])


CONFIGS = {'electricity': ElectricityConfig,
           'traffic': TrafficConfig,
           }


@@ -0,0 +1,28 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuantileLoss(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.register_buffer('q', torch.tensor(config.quantiles))

    def forward(self, predictions, targets):
        diff = predictions - targets
        ql = (1-self.q)*F.relu(diff) + self.q*F.relu(-diff)
        losses = ql.view(-1, ql.shape[-1]).mean(0)
        return losses


@@ -0,0 +1,790 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################
# Copyright 2021 The Google Research Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import math
import pickle
import enum
import datetime
from collections import namedtuple, OrderedDict
import sklearn.preprocessing
from sklearn.impute import SimpleImputer
import pandas as pd
import numpy as np
from bisect import bisect
import torch
from torch.utils.data import Dataset,IterableDataset,DataLoader
class DataTypes(enum.IntEnum):
"""Defines numerical types of each column."""
CONTINUOUS = 0
CATEGORICAL = 1
DATE = 2
STR = 3
class InputTypes(enum.IntEnum):
"""Defines input types of each column."""
TARGET = 0
OBSERVED = 1
KNOWN = 2
STATIC = 3
ID = 4 # Single column used as an entity identifier
TIME = 5 # Single column exclusively used as a time index
FeatureSpec = namedtuple('FeatureSpec', ['name', 'feature_type', 'feature_embed_type'])
DTYPE_MAP = {
DataTypes.CONTINUOUS : np.float32,
DataTypes.CATEGORICAL : np.int64,
DataTypes.DATE:'datetime64[ns]',
DataTypes.STR: str
}
FEAT_ORDER = [
(InputTypes.STATIC, DataTypes.CATEGORICAL),
(InputTypes.STATIC, DataTypes.CONTINUOUS),
(InputTypes.KNOWN, DataTypes.CATEGORICAL),
(InputTypes.KNOWN, DataTypes.CONTINUOUS),
(InputTypes.OBSERVED, DataTypes.CATEGORICAL),
(InputTypes.OBSERVED, DataTypes.CONTINUOUS),
(InputTypes.TARGET, DataTypes.CONTINUOUS),
(InputTypes.ID, DataTypes.CATEGORICAL)
]
FEAT_NAMES = ['s_cat' , 's_cont' , 'k_cat' , 'k_cont' , 'o_cat' , 'o_cont' , 'target', 'id']
DEFAULT_ID_COL = 'id'
class TFTBinaryDataset(Dataset):
def __init__(self, path, config):
super(TFTBinaryDataset).__init__()
self.features = [x for x in config.features if x.feature_embed_type != DataTypes.DATE]
self.example_length = config.example_length
self.stride = config.dataset_stride
self.grouped = pickle.load(open(path, 'rb'))
self.grouped = [x for x in self.grouped if x.shape[0] >= self.example_length]
self._cum_examples_in_group = np.cumsum([(g.shape[0] - self.example_length + 1)//self.stride for g in self.grouped])
self.feature_type_col_map = [[i for i,f in enumerate(self.features) if (f.feature_type, f.feature_embed_type) == x] for x in FEAT_ORDER]
# The list comprehension below is an elaborate way of rearranging data into correct order,
# simultaneously doing casting to proper types. Probably can be written neater
self.grouped = [
[
arr[:, idxs].view(dtype=np.float32).astype(DTYPE_MAP[t[1]])
for t, idxs in zip(FEAT_ORDER, self.feature_type_col_map)
]
for arr in self.grouped
]
def __len__(self):
return self._cum_examples_in_group[-1] if len(self._cum_examples_in_group) else 0
def __getitem__(self, idx):
g_idx = bisect(self._cum_examples_in_group, idx)
e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx
group = self.grouped[g_idx]
tensors = [
torch.from_numpy(feat[e_idx * self.stride:e_idx*self.stride + self.example_length])
if feat.size else torch.empty(0)
for feat in group
]
return OrderedDict(zip(FEAT_NAMES, tensors))
class TFTDataset(Dataset):
def __init__(self, path, config):
super(TFTDataset).__init__()
self.features = config.features
self.data = pd.read_csv(path, index_col=0)
self.example_length = config.example_length
self.stride = config.dataset_stride
# name field is a column name.
# there can be multiple entries with the same name because one column can be interpreted in many ways
time_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.TIME)
id_col_name = next(x.name for x in self.features if x.feature_type==InputTypes.ID)
if not id_col_name in self.data.columns:
id_col_name = DEFAULT_ID_COL
self.features = [x for x in self.features if x.feature_type!=InputTypes.ID]
self.features.append(FeatureSpec(DEFAULT_ID_COL, InputTypes.ID, DataTypes.CATEGORICAL))
col_dtypes = {v.name:DTYPE_MAP[v.feature_embed_type] for v in self.features}
self.data.sort_values(time_col_name,inplace=True)
self.data = self.data[set(x.name for x in self.features)] #leave only relevant columns
self.data = self.data.astype(col_dtypes)
self.data = self.data.groupby(id_col_name).filter(lambda group: len(group) >= self.example_length)
self.grouped = list(self.data.groupby(id_col_name))
self._cum_examples_in_group = np.cumsum([(len(g[1]) - self.example_length + 1)//self.stride for g in self.grouped])
def __len__(self):
return self._cum_examples_in_group[-1]
def __getitem__(self, idx):
g_idx = len([x for x in self._cum_examples_in_group if x <= idx])
e_idx = idx - self._cum_examples_in_group[g_idx-1] if g_idx else idx
group = self.grouped[g_idx][1]
sliced = group.iloc[e_idx * self.stride:e_idx*self.stride + self.example_length]
# We need to be sure that tensors are returned in the correct order
tensors = tuple([] for _ in range(8))
for v in self.features:
if v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[0].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.STATIC and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[1].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[2].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.KNOWN and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[3].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CATEGORICAL:
tensors[4].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.OBSERVED and v.feature_embed_type == DataTypes.CONTINUOUS:
tensors[5].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.TARGET:
tensors[6].append(torch.from_numpy(sliced[v.name].to_numpy()))
elif v.feature_type == InputTypes.ID:
tensors[7].append(torch.from_numpy(sliced[v.name].to_numpy()))
tensors = [torch.stack(x, dim=-1) if x else torch.empty(0) for x in tensors]
return OrderedDict(zip(FEAT_NAMES, tensors))
def get_dataset_splits(df, config):
if hasattr(config, 'relative_split') and config.relative_split:
forecast_len = config.example_length - config.encoder_length
# The valid split is shifted from the train split by number of the forecast steps to the future.
# The test split is shifted by the number of the forecast steps from the valid split
train = []
valid = []
test = []
for _, group in df.groupby(DEFAULT_ID_COL):
index = group[config.time_ids]
_train = group.loc[index < config.valid_boundary]
_valid = group.iloc[(len(_train) - config.encoder_length):(len(_train) + forecast_len)]
_test = group.iloc[(len(_train) - config.encoder_length + forecast_len):(len(_train) + 2*forecast_len)]
train.append(_train)
valid.append(_valid)
test.append(_test)
train = pd.concat(train, axis=0)
valid = pd.concat(valid, axis=0)
test = pd.concat(test, axis=0)
else:
index = df[config.time_ids]
train = df.loc[(index >= config.train_range[0]) & (index < config.train_range[1])]
valid = df.loc[(index >= config.valid_range[0]) & (index < config.valid_range[1])]
test = df.loc[(index >= config.test_range[0]) & (index < config.test_range[1])]
return train, valid, test
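# A minimal sketch of the relative-split arithmetic above for a single series, using plain
# NumPy indices instead of a DataFrame. `relative_split_bounds` is a hypothetical helper (not
# part of this repo). It assumes the first `n_train` rows of the series fall before valid_boundary.

```python
import numpy as np


def relative_split_bounds(n_train, encoder_length, example_length):
    """Return (start, stop) iloc bounds of the valid and test windows for one series.

    Mirrors the slicing in get_dataset_splits: each window keeps `encoder_length`
    steps of history followed by one forecast horizon.
    """
    forecast_len = example_length - encoder_length
    valid = (n_train - encoder_length, n_train + forecast_len)
    test = (n_train - encoder_length + forecast_len, n_train + 2 * forecast_len)
    return valid, test


# Hypothetical 20-step series whose first 10 steps belong to the train split.
series = np.arange(20)
valid, test = relative_split_bounds(n_train=10, encoder_length=4, example_length=6)
valid_window = series[valid[0]:valid[1]]
test_window = series[test[0]:test[1]]
```

# With encoder_length=4 and example_length=6, both windows span 6 steps and share some
# encoder history. The valid window forecasts steps 10-11 and the test window forecasts 12-13.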
def flatten_ids(df, config):
if config.missing_id_strategy == 'drop':
if hasattr(config, 'combine_ids') and config.combine_ids:
index = df[config.combine_ids].isna().any(axis=1)  # Series, so .index below is available
else:
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
index = df[id_col].isna()
index = index[index == True].index # Extract indices of nans
df.drop(index, inplace=True)
if not (hasattr(config, 'combine_ids') and config.combine_ids):
id_col = next(x.name for x in config.features if x.feature_type == InputTypes.ID)
ids = df[id_col].apply(str)
df.drop(id_col, axis=1, inplace=True)
encoder = sklearn.preprocessing.LabelEncoder().fit(ids.values)
df[DEFAULT_ID_COL] = encoder.transform(ids)
encoders = OrderedDict({id_col: encoder})
else:
encoders = {c:sklearn.preprocessing.LabelEncoder().fit(df[c].values) for c in config.combine_ids}
encoders = OrderedDict(encoders)
lens = [len(v.classes_) for v in encoders.values()]
clens = np.roll(np.cumprod(lens), 1)
clens[0] = 1
# This row-wise apply is very slow. Creating dummy columns with the encoded IDs and combining them in a vectorized way would probably be faster.
df[DEFAULT_ID_COL] = df.apply(lambda row: sum([encoders[c].transform([row[c]])[0]*clens[i] for i,c in enumerate(encoders.keys())]), axis=1)
df.drop(config.combine_ids, axis=1, inplace=True)
return DEFAULT_ID_COL, encoders
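# The comment in flatten_ids suggests a vectorized alternative to the row-wise apply. A sketch
# of one, under the assumption that np.unique(..., return_inverse=True) yields the same sorted
# classes and codes as a fitted sklearn LabelEncoder. `combine_ids_vectorized` is a hypothetical
# helper. It uses the same mixed-radix multipliers as `clens` ([1, l0, l0*l1, ...]).

```python
import numpy as np


def combine_ids_vectorized(columns):
    """Combine several ID columns into one integer ID without a row-wise apply."""
    codes, lens = [], []
    for col in columns:
        # Sorted unique classes plus, for every row, the index of its class.
        classes, inverse = np.unique(np.asarray(col), return_inverse=True)
        codes.append(inverse)
        lens.append(len(classes))
    # Multipliers [1, l0, l0*l1, ...], built the same way as `clens` in flatten_ids.
    clens = np.roll(np.cumprod(lens), 1)
    clens[0] = 1
    return sum(c * m for c, m in zip(codes, clens))


# Hypothetical store and item columns for four rows.
store = ['a', 'b', 'a', 'c']
item = [10, 10, 20, 20]
combined = combine_ids_vectorized([store, item])
```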
def impute(df, config):
#XXX This ensures that our scaling will have the same mean. We still need to check the variance
if not hasattr(config, 'missing_data_label'):
return df, None
else:
imp = SimpleImputer(missing_values=config.missing_data_label, strategy='mean')
mask = df.applymap(lambda x: True if x == config.missing_data_label else False)
data = df.values
col_mask = (data == config.missing_data_label).all(axis=0)
data[:,~col_mask] = imp.fit_transform(data)
return data, mask
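# impute() delegates to sklearn's SimpleImputer(strategy='mean') with the sentinel as
# missing_values. The NumPy-only sketch below illustrates that arithmetic as an assumption, not
# as sklearn's implementation. `mean_impute_sentinel` is hypothetical. Like the col_mask
# handling above, it leaves columns that contain only the sentinel untouched.

```python
import numpy as np


def mean_impute_sentinel(data, missing_label):
    """Replace `missing_label` entries with the mean of the remaining values in each column."""
    data = np.array(data, dtype=float)  # copy so the caller's array is not modified
    missing = data == missing_label
    for j in range(data.shape[1]):
        col_missing = missing[:, j]
        if col_missing.all():
            continue  # nothing to average over; keep the column as-is
        data[col_missing, j] = data[~col_missing, j].mean()
    return data, missing


raw = [[1.0, -1.0], [-1.0, -1.0], [3.0, -1.0]]
filled, mask = mean_impute_sentinel(raw, missing_label=-1.0)
```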
def normalize_reals(train, valid, test, config, id_col=DEFAULT_ID_COL):
tgt_cols = [x.name for x in config.features if x.feature_type == InputTypes.TARGET]
real_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CONTINUOUS).difference(set(tgt_cols)))
real_scalers = {}
tgt_scalers = {}
def apply_scalers(df, name=None):
if name is None:
name = df.name
mask = df.applymap(lambda x: True if x == config.missing_data_label else False) if hasattr(config, 'missing_data_label') else None
df[real_cols] = real_scalers[name].transform(df[real_cols])
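if mask is not None and mask[real_cols].values.any():
# DataFrame.mask returns a new frame, so assign the result back
df[real_cols] = df[real_cols].mask(mask[real_cols], 10**9)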
df[tgt_cols] = tgt_scalers[name].transform(df[tgt_cols])
return df
if config.scale_per_id:
for identifier, sliced in train.groupby(id_col):
data = sliced[real_cols]
data, _ = impute(data, config)
real_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(data)
# XXX We should probably remove examples that contain NaN as a target
target = sliced[tgt_cols]
tgt_scalers[identifier] = sklearn.preprocessing.StandardScaler().fit(target)
train = train.groupby(id_col).apply(apply_scalers)
# For the valid and test splits, keep only the time series present in the train split
# XXX For proper data science we should consider encoding unseen time series as a special case instead of throwing them away
valid = valid.loc[valid[id_col].isin(real_scalers.keys())]
valid = valid.groupby(id_col).apply(apply_scalers)
test = test.loc[test[id_col].isin(real_scalers.keys())]
test = test.groupby(id_col).apply(apply_scalers)
else:
data, _ = impute(train[real_cols], config)
real_scalers[''] = sklearn.preprocessing.StandardScaler().fit(data)
tgt_scalers[''] = sklearn.preprocessing.StandardScaler().fit(train[tgt_cols])
train = apply_scalers(train, name='')
valid = apply_scalers(valid, name='')
test = apply_scalers(test, name='')
return train, valid, test, real_scalers, tgt_scalers
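# With scale_per_id set, normalize_reals fits one StandardScaler per series. A pandas-only
# sketch of that per-series z-scoring on a single frame follows. `standardize_per_id` is a
# hypothetical helper. It uses the population standard deviation (ddof=0), as StandardScaler
# does. Unlike StandardScaler, it does not guard against constant series (division by zero).

```python
import numpy as np
import pandas as pd


def standardize_per_id(df, id_col, cols):
    """Z-score each column in `cols` separately within every series."""
    out = df.copy()
    for col in cols:
        # ddof=0: divide by the population standard deviation of each group.
        out[col] = df.groupby(id_col)[col].transform(
            lambda s: (s - s.mean()) / s.std(ddof=0))
    return out


# Hypothetical frame with two series.
df = pd.DataFrame({'id': [0, 0, 1, 1], 'x': [1.0, 3.0, 10.0, 30.0]})
scaled = standardize_per_id(df, 'id', ['x'])
```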
def encode_categoricals(train, valid, test, config):
cat_encodings = {}
cat_cols = list(set(v.name for v in config.features if v.feature_embed_type == DataTypes.CATEGORICAL and v.feature_type != InputTypes.ID))
num_classes = [] #XXX Maybe we should modify config based on this value? Or send a warning?
# For Tensor Core performance reasons, we might want num_classes[i] to be divisible by 8
# Train categorical encoders
for c in cat_cols:
if config.missing_cat_data_strategy == 'special_token':
#XXX this will probably require some data augmentation
unique = train[c].unique()
# Replace categories that do not appear in the train split with the special token
valid.loc[~valid[c].isin(unique), c] = '<UNK>'
test.loc[~test[c].isin(unique), c] = '<UNK>'
if config.missing_cat_data_strategy == 'encode_all' or \
config.missing_cat_data_strategy == 'special_token':
srs = pd.concat([train[c], valid[c], test[c]]).apply(str)
cat_encodings[c] = sklearn.preprocessing.LabelEncoder().fit(srs.values)
num_classes.append(srs.nunique())
elif config.missing_cat_data_strategy == 'drop':
# TODO: implement this. In addition to dropping rows, this has to split specific time series into chunks
# to prevent the data from having temporal gaps
pass
print('Categorical variables encodings lens: ', num_classes)
for split in [train, valid, test]:
for c in cat_cols:
srs = split[c].apply(str)
split[c] = srs
split.loc[:,c] = cat_encodings[c].transform(srs)
return cat_encodings
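# The 'special_token' strategy targets categories that never appear in the train split. A
# sketch of that idea on plain lists follows. `encode_with_unk` is a hypothetical helper, and
# np.unique/np.searchsorted stand in for LabelEncoder. Values are stringified first, and the
# encoder is fit on the union of the splits, as encode_categoricals does.

```python
import numpy as np


def encode_with_unk(train_values, other_values, unk='<UNK>'):
    """Label-encode two splits, mapping categories unseen in train to `unk`."""
    train_values = [str(v) for v in train_values]
    seen = set(train_values)
    other_values = [v if v in seen else unk for v in map(str, other_values)]
    # Fit on the union of both splits so `unk` gets its own code.
    classes = np.unique(np.array(train_values + other_values))

    def encode(values):
        # `classes` is sorted, so searchsorted returns each value's class index.
        return np.searchsorted(classes, np.array(values))

    return encode(train_values), encode(other_values), classes


train_codes, valid_codes, classes = encode_with_unk(['red', 'blue', 'red'], ['blue', 'green'])
```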
def preprocess(src_path, dst_path, config):
df = pd.read_csv(src_path, index_col=0)
for c in config.features:
if c.feature_embed_type == DataTypes.DATE:
df[c.name] = pd.to_datetime(df[c.name])
# Leave only columns relevant to preprocessing
relevant_columns = list(set([f.name for f in config.features] + [config.time_ids]))
df = df[relevant_columns]
id_col, id_encoders = flatten_ids(df, config)
df = df.reindex(sorted(df.columns), axis=1)
train, valid, test = get_dataset_splits(df, config)
# Length filter the data (all timeseries shorter than example len will be dropped)
#for df in [train, valid, test]:
# df.groupby(id_col).filter(lambda x: len(x) >= config.example_length)
train = pd.concat([x[1] for x in train.groupby(id_col) if len(x[1]) >= config.example_length])
valid = pd.concat([x[1] for x in valid.groupby(id_col) if len(x[1]) >= config.example_length])
test = pd.concat([x[1] for x in test.groupby(id_col) if len(x[1]) >= config.example_length])
train, valid, test, real_scalers, tgt_scalers = normalize_reals(train, valid, test, config, id_col)
cat_encodings = encode_categoricals(train, valid, test, config)
os.makedirs(dst_path, exist_ok=True)
train.to_csv(os.path.join(dst_path, 'train.csv'))
valid.to_csv(os.path.join(dst_path, 'valid.csv'))
test.to_csv(os.path.join(dst_path, 'test.csv'))
# Save relevant columns in binary form for faster dataloading
# IMPORTANT: We always expect id to be a single column indicating the complete time series
# We also expect a copy of id in the form of a static categorical input!!!
col_names = [id_col] + [x.name for x in config.features if x.feature_embed_type != DataTypes.DATE and x.feature_type != InputTypes.ID]
grouped_train = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in train.groupby(id_col)]
grouped_valid = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in valid.groupby(id_col)]
grouped_test = [x[1][col_names].values.astype(np.float32).view(dtype=np.int32) for x in test.groupby(id_col)]
with open(os.path.join(dst_path, 'train.bin'), 'wb') as f:
pickle.dump(grouped_train, f)
with open(os.path.join(dst_path, 'valid.bin'), 'wb') as f:
pickle.dump(grouped_valid, f)
with open(os.path.join(dst_path, 'test.bin'), 'wb') as f:
pickle.dump(grouped_test, f)
with open(os.path.join(dst_path, 'real_scalers.bin'), 'wb') as f:
pickle.dump(real_scalers, f)
with open(os.path.join(dst_path, 'tgt_scalers.bin'), 'wb') as f:
pickle.dump(tgt_scalers, f)
with open(os.path.join(dst_path, 'cat_encodings.bin'), 'wb') as f:
pickle.dump(cat_encodings, f)
with open(os.path.join(dst_path, 'id_encoders.bin'), 'wb') as f:
pickle.dump(id_encoders, f)
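# preprocess() saves each series as float32 values reinterpreted as int32 (same bytes,
# different dtype), so IDs and real-valued features share one array. The round trip below
# assumes the loader (not shown here) views the data back as float32. The example block is
# hypothetical. Note that float32 represents integers exactly only up to 2**24, which bounds
# the IDs and category codes that survive the cast unchanged.

```python
import numpy as np

# Hypothetical block: an id column followed by one real-valued feature.
block = np.array([[3, 0.5], [3, -1.25]])
stored = block.astype(np.float32).view(dtype=np.int32)  # same bytes, int32 dtype
restored = stored.view(dtype=np.float32)  # reinterpret the bytes back as float32
```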
def sample_data(dataset, num_samples):
if num_samples < 0:
return dataset
else:
return torch.utils.data.Subset(dataset, np.random.choice(np.arange(len(dataset)), size=num_samples, replace=False))
def standarize_electricity(path):
"""Code taken from https://github.com/google-research/google-research/blob/master/tft/script_download_data.py"""
df = pd.read_csv(os.path.join(path, 'LD2011_2014.txt'), index_col=0, sep=';', decimal=',')
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)
# Used to determine the start and end dates of a series
output = df.resample('1h').mean().replace(0., np.nan)
earliest_time = output.index.min()
df_list = []
for label in output:
print('Processing {}'.format(label))
srs = output[label]
start_date = min(srs.fillna(method='ffill').dropna().index)
end_date = max(srs.fillna(method='bfill').dropna().index)
active_range = (srs.index >= start_date) & (srs.index <= end_date)
srs = srs[active_range].fillna(0.)
tmp = pd.DataFrame({'power_usage': srs})
date = tmp.index
tmp['t'] = (date - earliest_time).seconds / 60 / 60 + (
date - earliest_time).days * 24
tmp['days_from_start'] = (date - earliest_time).days
tmp['categorical_id'] = label
tmp['date'] = date
tmp['id'] = label
tmp['hour'] = date.hour
tmp['day'] = date.day
tmp['day_of_week'] = date.dayofweek
tmp['month'] = date.month
df_list.append(tmp)
output = pd.concat(df_list, axis=0, join='outer').reset_index(drop=True)
output['categorical_id'] = output['id'].copy()
output['hours_from_start'] = output['t']
output['categorical_day_of_week'] = output['day_of_week'].copy()
output['categorical_hour'] = output['hour'].copy()
output.to_csv(os.path.join(path, 'standarized.csv'))
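# standarize_electricity builds `t` from the `.seconds` and `.days` components of each
# timedelta. `.seconds` holds only the sub-day remainder, so adding `days * 24` should give
# the total elapsed hours, which is the same as total_seconds() / 3600. The check below uses
# hypothetical timestamps.

```python
import pandas as pd

earliest_time = pd.Timestamp('2011-01-01 00:00')
date = pd.DatetimeIndex(['2011-01-01 05:00', '2011-01-03 01:00'])
delta = date - earliest_time
# Formula used in the script: sub-day seconds converted to hours plus whole days in hours.
hours_as_in_script = delta.seconds / 60 / 60 + delta.days * 24
hours_total = delta.total_seconds() / 3600
```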
def standarize_volatility(path):
df = pd.read_csv(os.path.join(path, 'oxfordmanrealizedvolatilityindices.csv'), index_col=0) # no explicit index
# Adds additional date/day fields
idx = [str(s).split('+')[0] for s in df.index
] # ignore timezones, we don't need them
dates = pd.to_datetime(idx)
df['date'] = dates
df['days_from_start'] = (dates - pd.Timestamp(2000, 1, 3)).days
df['day_of_week'] = dates.dayofweek
df['day_of_month'] = dates.day
df['week_of_year'] = dates.weekofyear
df['month'] = dates.month
df['year'] = dates.year
df['categorical_id'] = df['Symbol'].copy()
# Processes log volatility
vol = df['rv5_ss'].copy()
vol.loc[vol == 0.] = np.nan
df['log_vol'] = np.log(vol)
# Adds static information
symbol_region_mapping = {
'.AEX': 'EMEA',
'.AORD': 'APAC',
'.BFX': 'EMEA',
'.BSESN': 'APAC',
'.BVLG': 'EMEA',
'.BVSP': 'AMER',
'.DJI': 'AMER',
'.FCHI': 'EMEA',
'.FTMIB': 'EMEA',
'.FTSE': 'EMEA',
'.GDAXI': 'EMEA',
'.GSPTSE': 'AMER',
'.HSI': 'APAC',
'.IBEX': 'EMEA',
'.IXIC': 'AMER',
'.KS11': 'APAC',
'.KSE': 'APAC',
'.MXX': 'AMER',
'.N225': 'APAC',
'.NSEI': 'APAC',
'.OMXC20': 'EMEA',
'.OMXHPI': 'EMEA',
'.OMXSPI': 'EMEA',
'.OSEAX': 'EMEA',
'.RUT': 'EMEA',
'.SMSI': 'EMEA',
'.SPX': 'AMER',
'.SSEC': 'APAC',
'.SSMI': 'EMEA',
'.STI': 'APAC',
'.STOXX50E': 'EMEA'
}
df['Region'] = df['Symbol'].apply(lambda k: symbol_region_mapping[k])
# Performs final processing
output_df_list = []
for grp in df.groupby('Symbol'):
sliced = grp[1].copy()
sliced.sort_values('days_from_start', inplace=True)
# Impute log volatility values
sliced['log_vol'].fillna(method='ffill', inplace=True)
# NOTE: the result of dropna() is not assigned, so no rows are dropped here
sliced.dropna()
output_df_list.append(sliced)
df = pd.concat(output_df_list, axis=0)
df.to_csv(os.path.join(path, 'standarized.csv'))
def standarize_traffic(path):
def process_list(s, variable_type=int, delimiter=None):
"""Parses a line in the PEMS format to a list."""
if delimiter is None:
l = [
variable_type(i) for i in s.replace('[', '').replace(']', '').split()
]
else:
l = [
variable_type(i)
for i in s.replace('[', '').replace(']', '').split(delimiter)
]
return l
def read_single_list(filename):
"""Returns single list from a file in the PEMS-custom format."""
with open(os.path.join(path, filename), 'r') as dat:
l = process_list(dat.readlines()[0])
return l
def read_matrix(filename):
"""Returns a matrix from a file in the PEMS-custom format."""
array_list = []
with open(os.path.join(path, filename), 'r') as dat:
lines = dat.readlines()
for i, line in enumerate(lines):
if (i + 1) % 50 == 0:
print('Completed {} of {} rows for {}'.format(i + 1, len(lines),
filename))
array = [
process_list(row_split, variable_type=float, delimiter=None)
for row_split in process_list(
line, variable_type=str, delimiter=';')
]
array_list.append(array)
return array_list
shuffle_order = np.array(read_single_list('randperm')) - 1 # index from 0
train_dayofweek = read_single_list('PEMS_trainlabels')
train_tensor = read_matrix('PEMS_train')
test_dayofweek = read_single_list('PEMS_testlabels')
test_tensor = read_matrix('PEMS_test')
# Invert the shuffle permutation
print('Shuffling')
inverse_mapping = {
new_location: previous_location
for previous_location, new_location in enumerate(shuffle_order)
}
reverse_shuffle_order = np.array([
inverse_mapping[new_location]
for new_location, _ in enumerate(shuffle_order)
])
# Group and reorder based on the permutation matrix
print('Reordering')
day_of_week = np.array(train_dayofweek + test_dayofweek)
combined_tensor = np.array(train_tensor + test_tensor)
day_of_week = day_of_week[reverse_shuffle_order]
combined_tensor = combined_tensor[reverse_shuffle_order]
# Put everything back into a dataframe
print('Parsing as dataframe')
labels = ['traj_{}'.format(i) for i in read_single_list('stations_list')]
hourly_list = []
for day, day_matrix in enumerate(combined_tensor):
# Hourly data
hourly = pd.DataFrame(day_matrix.T, columns=labels)
hourly['hour_on_day'] = [int(i / 6) for i in hourly.index
] # sampled at 10 min intervals
if hourly['hour_on_day'].max() > 23 or hourly['hour_on_day'].min() < 0:
raise ValueError('Invalid hour! {}-{}'.format(
hourly['hour_on_day'].min(), hourly['hour_on_day'].max()))
hourly = hourly.groupby('hour_on_day', as_index=True).mean()[labels]
hourly['sensor_day'] = day
hourly['time_on_day'] = hourly.index
hourly['day_of_week'] = day_of_week[day]
hourly_list.append(hourly)
hourly_frame = pd.concat(hourly_list, axis=0, ignore_index=True, sort=False)
# Flatten such that each entity uses one row in the dataframe
store_columns = [c for c in hourly_frame.columns if 'traj' in c]
other_columns = [c for c in hourly_frame.columns if 'traj' not in c]
# DataFrame.append was removed in pandas 2.0; collect the slices and concatenate them once
flat_columns = ['values', 'prev_values', 'next_values'] + other_columns + ['id']
flat_list = []
for store in store_columns:
print('Processing {}'.format(store))
sliced = hourly_frame[[store] + other_columns].copy()
sliced.columns = ['values'] + other_columns
sliced['id'] = int(store.replace('traj_', ''))
# Sort by Sensor-date-time
key = sliced['id'].apply(str) \
+ sliced['sensor_day'].apply(lambda x: '_{:03d}'.format(x)) \
+ sliced['time_on_day'].apply(lambda x: '_{:03d}'.format(x))
sliced = sliced.set_index(key).sort_index()
sliced['values'] = sliced['values'].fillna(method='ffill')
sliced['prev_values'] = sliced['values'].shift(1)
sliced['next_values'] = sliced['values'].shift(-1)
flat_list.append(sliced.dropna())
flat_df = pd.concat(flat_list, axis=0, ignore_index=True, sort=False)[flat_columns]
# Filter to match range used by other academic papers
index = flat_df['sensor_day']
flat_df = flat_df[index < 173].copy()
# Creating columns for categorical inputs
flat_df['categorical_id'] = flat_df['id'].copy()
flat_df['hours_from_start'] = flat_df['time_on_day'] \
+ flat_df['sensor_day']*24.
flat_df['categorical_day_of_week'] = flat_df['day_of_week'].copy()
flat_df['categorical_time_on_day'] = flat_df['time_on_day'].copy()
flat_df.to_csv(os.path.join(path, 'standarized.csv'))
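# The inverse_mapping / reverse_shuffle_order construction in standarize_traffic computes the
# inverse of the shuffle permutation. For a permutation of 0..n-1, np.argsort yields the same
# inverse. This is a sketch with a hypothetical 4-element permutation, showing the dict-based
# approach and argsort side by side.

```python
import numpy as np

shuffle_order = np.array([2, 0, 3, 1])
# Dict-based inversion, as written in standarize_traffic.
inverse_mapping = {new: prev for prev, new in enumerate(shuffle_order)}
reverse_from_dict = np.array([inverse_mapping[new] for new, _ in enumerate(shuffle_order)])
# argsort of a permutation is its inverse permutation.
reverse_from_argsort = np.argsort(shuffle_order)
```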
# XXX needs rework
def standarize_favorita(data_folder):
import gc
# Extract only a subset of data to save/process for efficiency
start_date = pd.Timestamp(2015, 1, 1)
end_date = pd.Timestamp(2016, 6, 1)
print('Regenerating data...')
# load temporal data
temporal = pd.read_csv(os.path.join(data_folder, 'train.csv'), index_col=0)
store_info = pd.read_csv(os.path.join(data_folder, 'stores.csv'), index_col=0)
oil = pd.read_csv(
os.path.join(data_folder, 'oil.csv'), index_col=0).iloc[:, 0]
holidays = pd.read_csv(os.path.join(data_folder, 'holidays_events.csv'))
items = pd.read_csv(os.path.join(data_folder, 'items.csv'), index_col=0)
transactions = pd.read_csv(os.path.join(data_folder, 'transactions.csv'))
# Convert dates to datetime so they can be compared with start_date and end_date
temporal['date'] = pd.to_datetime(temporal['date'])
# Filter dates to reduce storage space requirements
if start_date is not None:
temporal = temporal[(temporal['date'] >= start_date)]
if end_date is not None:
temporal = temporal[(temporal['date'] < end_date)]
dates = temporal['date'].unique()
# Add trajectory identifier
temporal['traj_id'] = temporal['store_nbr'].apply(
str) + '_' + temporal['item_nbr'].apply(str)
temporal['unique_id'] = temporal['traj_id'] + '_' + temporal['date'].apply(
str)
# Remove all IDs with negative returns
print('Removing returns data')
min_returns = temporal['unit_sales'].groupby(temporal['traj_id']).min()
valid_ids = set(min_returns[min_returns >= 0].index)
selector = temporal['traj_id'].apply(lambda traj_id: traj_id in valid_ids)
new_temporal = temporal[selector].copy()
del temporal
gc.collect()
temporal = new_temporal
temporal['open'] = 1
# Resampling
print('Resampling to regular grid')
resampled_dfs = []
for traj_id, raw_sub_df in temporal.groupby('traj_id'):
print('Resampling', traj_id)
sub_df = raw_sub_df.set_index('date', drop=True).copy()
sub_df = sub_df.resample('1d').last()
sub_df['date'] = sub_df.index
sub_df[['store_nbr', 'item_nbr', 'onpromotion']] \
= sub_df[['store_nbr', 'item_nbr', 'onpromotion']].fillna(method='ffill')
sub_df['open'] = sub_df['open'].fillna(
0) # flag where sales data is unknown
sub_df['log_sales'] = np.log(sub_df['unit_sales'])
resampled_dfs.append(sub_df.reset_index(drop=True))
new_temporal = pd.concat(resampled_dfs, axis=0)
del temporal
gc.collect()
temporal = new_temporal
print('Adding oil')
oil.name = 'oil'
oil.index = pd.to_datetime(oil.index)
#XXX The lines below match the oil price on a given date with the rest of the time series.
# Missing values in the oil series are copied from the preceding index, and the oil series is then
# joined with temporal. Some dates present in temporal are missing from oil; for those,
# the oil value is substituted with -1. Why?
#TODO: check how many NaNs remain after the first step. Previously the oil series was extended with
# the dates present in the `dates` variable (as NaN values), which were then forward-filled.
# This behavior is no longer supported by pandas, so we switched to the Index.isin method.
# This leaves more NaNs after the first step than before. To reproduce the previous behavior,
# we have to join the series before filling NaNs.
temporal = temporal.join(
#oil.loc[oil.index.isin(dates)].fillna(method='ffill'), on='date', how='left')
oil.loc[oil.index.isin(dates)], on='date', how='left')
temporal['oil'] = temporal['oil'].fillna(method='ffill')
temporal['oil'] = temporal['oil'].fillna(-1)
print('Adding store info')
temporal = temporal.join(store_info, on='store_nbr', how='left')
print('Adding item info')
temporal = temporal.join(items, on='item_nbr', how='left')
transactions['date'] = pd.to_datetime(transactions['date'])
temporal = temporal.merge(
transactions,
left_on=['date', 'store_nbr'],
right_on=['date', 'store_nbr'],
how='left')
temporal['transactions'] = temporal['transactions'].fillna(-1)
# Additional date info
temporal['day_of_week'] = pd.to_datetime(temporal['date'].values).dayofweek
temporal['day_of_month'] = pd.to_datetime(temporal['date'].values).day
temporal['month'] = pd.to_datetime(temporal['date'].values).month
# Add holiday info
print('Adding holidays')
holiday_subset = holidays[holidays['transferred'].apply(
lambda x: not x)].copy()
holiday_subset.columns = [
s if s != 'type' else 'holiday_type' for s in holiday_subset.columns
]
holiday_subset['date'] = pd.to_datetime(holiday_subset['date'])
local_holidays = holiday_subset[holiday_subset['locale'] == 'Local']
regional_holidays = holiday_subset[holiday_subset['locale'] == 'Regional']
national_holidays = holiday_subset[holiday_subset['locale'] == 'National']
temporal['national_hol'] = temporal.merge(
national_holidays, left_on=['date'], right_on=['date'],
how='left')['description'].fillna('')
temporal['regional_hol'] = temporal.merge(
regional_holidays,
left_on=['state', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal['local_hol'] = temporal.merge(
local_holidays,
left_on=['city', 'date'],
right_on=['locale_name', 'date'],
how='left')['description'].fillna('')
temporal.sort_values('unique_id', inplace=True)
# Transform date to integer index
start_date = pd.to_datetime(min(temporal['date']))
dates = temporal['date'].apply(pd.to_datetime)
temporal['days_from_start'] = (dates - start_date).dt.days
temporal['categorical_id'] = temporal['traj_id'].copy()
print('Saving processed file to {}'.format(os.path.join(data_folder, 'standarized.csv')))
temporal.to_csv(os.path.join(data_folder, 'standarized.csv'))
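# The oil TODO in standarize_favorita describes extending the oil series to every date and
# forward-filling before the join. One way to express that with current pandas is
# Series.reindex followed by ffill. The prices and dates below are hypothetical, and this is a
# sketch of the idea rather than the repo's chosen fix.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: 2015-01-02 is NaN and 2015-01-03 has no quote at all.
oil = pd.Series([50.0, np.nan, 52.0],
                index=pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05']),
                name='oil')
dates = pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-05'])
temporal = pd.DataFrame({'date': dates})

# Joining the raw series leaves gaps on both the NaN date and the missing date.
joined_raw = temporal.join(oil, on='date', how='left')

# Extending the series to every date first and then forward-filling closes them.
oil_full = oil.reindex(dates).ffill()
joined = temporal.join(oil_full, on='date', how='left')
```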


@@ -0,0 +1,73 @@
# Copyright 2021 NVIDIA CORPORATION
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Copyright 2019 Ross Wightman
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Exponential Moving Average (EMA) of model updates
"""
from collections import OrderedDict
from copy import deepcopy
import torch
import torch.nn as nn
class ModelEma(nn.Module):
""" Model Exponential Moving Average V2
Keep a moving average of everything in the model state_dict (parameters and buffers).
V2 of this module is simpler: it does not match params/buffers by name but simply
iterates in order. It works with TorchScript (JIT of the full model).
"""
def __init__(self, model, decay=0.999, device=None):
super().__init__()
# make a copy of the model for accumulating moving average of weights
self.module = deepcopy(model)
self.module.eval()
self.decay = decay
self.device = device # perform ema on different device from model if set
if self.device is not None:
self.module.to(device=device)
def update(self, model):
update_fn = lambda ema_v, model_v: self.decay * ema_v + (1. - self.decay) * model_v
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(update_fn(ema_v, model_v))
def set(self, model):
with torch.no_grad():
for ema_v, model_v in zip(self.module.state_dict().values(), model.state_dict().values()):
if self.device is not None:
model_v = model_v.to(device=self.device)
ema_v.copy_(model_v)
def forward(self, x):
return self.module(x)
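# ModelEma.update applies `decay * ema + (1 - decay) * model` to every state_dict entry. The
# scalar sketch below shows that arithmetic. `ema_update` is a hypothetical helper. decay=0.5
# is used only to keep the numbers exact; the class default is 0.999. Tracking a constant value
# of 1.0 from 0.0 gives 1 - decay**n after n updates. As a rough rule of thumb, decay=0.999
# averages over the last ~1/(1 - decay) = 1000 updates.

```python
def ema_update(ema_value, model_value, decay=0.999):
    """One step of the exponential moving average used by ModelEma.update."""
    return decay * ema_value + (1.0 - decay) * model_value


ema = 0.0
for _ in range(3):
    ema = ema_update(ema, 1.0, decay=0.5)  # 0.5, 0.75, 0.875
```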


@@ -0,0 +1,157 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import collections
import math
import os
import pathlib
import re
import pynvml
pynvml.nvmlInit()
def systemGetDriverVersion():
return pynvml.nvmlSystemGetDriverVersion()
def deviceGetCount():
return pynvml.nvmlDeviceGetCount()
class device:
# assume nvml returns list of 64 bit ints
_nvml_affinity_elements = math.ceil(os.cpu_count() / 64)
def __init__(self, device_idx):
super().__init__()
self.handle = pynvml.nvmlDeviceGetHandleByIndex(device_idx)
def getName(self):
return pynvml.nvmlDeviceGetName(self.handle)
def getCpuAffinity(self):
affinity_string = ''
for j in pynvml.nvmlDeviceGetCpuAffinity(
self.handle, device._nvml_affinity_elements
):
# assume nvml returns list of 64 bit ints
affinity_string = '{:064b}'.format(j) + affinity_string
affinity_list = [int(x) for x in affinity_string]
affinity_list.reverse() # so core 0 is in 0th element of list
ret = [i for i, e in enumerate(affinity_list) if e != 0]
return ret
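# device.getCpuAffinity turns NVML's affinity words into core indices through string formatting
# and reversal. Below is a bitwise equivalent, assuming (as the code's own comment does) 64-bit
# words where word 0 covers cores 0-63, word 1 covers cores 64-127, and so on. Both helpers
# are hypothetical. The second reproduces the string-based logic so the two can be compared.

```python
def cores_from_affinity_words(words, word_bits=64):
    """Bit b of word w marks core w * word_bits + b as usable."""
    cores = []
    for w, word in enumerate(words):
        for b in range(word_bits):
            if (word >> b) & 1:
                cores.append(w * word_bits + b)
    return cores


def cores_via_strings(words):
    """Same decoding, written the way device.getCpuAffinity does it."""
    affinity_string = ''
    for j in words:
        # Later words are prepended, so the highest cores end up leftmost.
        affinity_string = '{:064b}'.format(j) + affinity_string
    affinity_list = [int(x) for x in affinity_string]
    affinity_list.reverse()  # index i now corresponds to core i
    return [i for i, e in enumerate(affinity_list) if e != 0]
```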
def set_socket_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity)
def set_single_affinity(gpu_id):
dev = device(gpu_id)
affinity = dev.getCpuAffinity()
os.sched_setaffinity(0, affinity[:1])
def set_single_unique_affinity(gpu_id, nproc_per_node):
devices = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in devices]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = list(set(socket_affinity) - set(siblings_dict.values()))
affinities = []
assigned = []
for socket_affinity in socket_affinities:
for core in socket_affinity:
if core not in assigned:
affinities.append([core])
assigned.append(core)
break
os.sched_setaffinity(0, affinities[gpu_id])
def set_socket_unique_affinity(gpu_id, nproc_per_node, mode):
device_ids = [device(i) for i in range(nproc_per_node)]
socket_affinities = [dev.getCpuAffinity() for dev in device_ids]
siblings_list = get_thread_siblings_list()
siblings_dict = dict(siblings_list)
# remove siblings
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities[idx] = list(set(socket_affinity) - set(siblings_dict.values()))
socket_affinities_to_device_ids = collections.defaultdict(list)
for idx, socket_affinity in enumerate(socket_affinities):
socket_affinities_to_device_ids[tuple(socket_affinity)].append(idx)
for socket_affinity, device_ids in socket_affinities_to_device_ids.items():
devices_per_group = len(device_ids)
cores_per_device = len(socket_affinity) // devices_per_group
for group_id, device_id in enumerate(device_ids):
if device_id == gpu_id:
if mode == 'interleaved':
affinity = list(socket_affinity[group_id::devices_per_group])
elif mode == 'continuous':
affinity = list(socket_affinity[group_id*cores_per_device:(group_id+1)*cores_per_device])
else:
raise RuntimeError('Unknown set_socket_unique_affinity mode')
# reintroduce siblings
affinity += [siblings_dict[aff] for aff in affinity if aff in siblings_dict]
os.sched_setaffinity(0, affinity)
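# In set_socket_unique_affinity, GPUs that share a socket split its cores either round-robin
# ('interleaved') or in contiguous chunks ('continuous'). The hypothetical helper below isolates
# that slicing for an 8-core socket shared by two devices. It leaves out the removal and
# re-addition of hyperthread siblings.

```python
def split_cores(cores, devices_per_group, group_id, mode):
    """Return the share of `cores` assigned to device number `group_id` in its group."""
    cores_per_device = len(cores) // devices_per_group
    if mode == 'interleaved':
        # Every devices_per_group-th core, starting at this device's offset.
        return list(cores[group_id::devices_per_group])
    if mode == 'continuous':
        # One contiguous block of cores_per_device cores.
        return list(cores[group_id * cores_per_device:(group_id + 1) * cores_per_device])
    raise ValueError('Unknown mode: {}'.format(mode))
```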
def get_thread_siblings_list():
path = '/sys/devices/system/cpu/cpu*/topology/thread_siblings_list'
thread_siblings_list = []
pattern = re.compile(r'(\d+)\D(\d+)')
for fname in pathlib.Path(path[0]).glob(path[1:]):
with open(fname) as f:
content = f.read().strip()
res = pattern.findall(content)
if res:
pair = tuple(map(int, res[0]))
thread_siblings_list.append(pair)
return thread_siblings_list
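# get_thread_siblings_list applies the regex `(\d+)\D(\d+)` to each thread_siblings_list file.
# A sketch of that parsing on literal strings follows, so it can run without /sys.
# `parse_siblings` is a hypothetical helper. A range such as '0-3' would also parse to (0, 3),
# so on CPUs with more than two threads per core the pair would not list every sibling.

```python
import re

pattern = re.compile(r'(\d+)\D(\d+)')


def parse_siblings(content):
    """Parse one thread_siblings_list entry such as '0,64' or '0-1' into a pair."""
    res = pattern.findall(content.strip())
    return tuple(map(int, res[0])) if res else None
```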
def set_affinity(gpu_id, nproc_per_node, mode='socket'):
if mode == 'socket':
set_socket_affinity(gpu_id)
elif mode == 'single':
set_single_affinity(gpu_id)
elif mode == 'single_unique':
set_single_unique_affinity(gpu_id, nproc_per_node)
elif mode == 'socket_unique_interleaved':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'interleaved')
elif mode == 'socket_unique_continuous':
set_socket_unique_affinity(gpu_id, nproc_per_node, 'continuous')
else:
raise RuntimeError('Unknown affinity mode')
affinity = os.sched_getaffinity(0)
return affinity


@@ -0,0 +1,239 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pandas as pd
import numpy as np
import pickle
import argparse
import torch
from torch.utils.data import DataLoader
from torch.cuda import amp
from torch.utils.tensorboard import SummaryWriter
from tqdm import tqdm
from modeling import TemporalFusionTransformer
from configuration import ElectricityConfig
from data_utils import TFTDataset
from utils import PerformanceMeter
from criterions import QuantileLoss
import dllogger
from log_helper import setup_logger
def _unscale_per_id(config, values, ids, scalers):
values = values.cpu().numpy()
num_horizons = config.example_length - config.encoder_length + 1
flat_values = pd.DataFrame(
values,
columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
)
flat_values['id'] = ids
df_list = []
for idx, group in flat_values.groupby('id'):
scaler = scalers[idx]
group_copy = group.copy()
for col in group_copy.columns:
if not 'id' in col:
_col = np.expand_dims(group_copy[col].values, -1)
_t_col = scaler.inverse_transform(_col)[:,-1]
group_copy[col] = _t_col
df_list.append(group_copy)
flat_values = pd.concat(df_list, axis=0)
flat_values = flat_values[[col for col in flat_values if not 'id' in col]]
flat_tensor = torch.from_numpy(flat_values.values)
return flat_tensor
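# _unscale_per_id inverts the per-series standardization with each series' scaler. For
# StandardScaler, inverse_transform amounts to `x * scale_ + mean_`. The NumPy sketch below
# uses hypothetical per-id means and scales in place of fitted scalers. It keeps the input row
# order, whereas _unscale_per_id concatenates groups in groupby key order. The two orders
# agree only when the rows are already sorted by id.

```python
import numpy as np


def unscale(values, mean, scale):
    """Invert z-scoring with the given mean and scale."""
    return np.asarray(values) * scale + mean


def unscale_per_id(values, ids, means, scales):
    """Unscale each row of `values` with the statistics of its series id."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i, series_id in enumerate(ids):
        out[i] = unscale(values[i], means[series_id], scales[series_id])
    return out


unscaled = unscale_per_id([[0.0, 1.0], [-1.0, 2.0]], ids=[7, 9],
                          means={7: 10.0, 9: 100.0}, scales={7: 2.0, 9: 5.0})
```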
def _unscale(config, values, scaler):
values = values.cpu().numpy()
num_horizons = config.example_length - config.encoder_length + 1
flat_values = pd.DataFrame(
values,
columns=[f't{j}' for j in range(num_horizons - values.shape[1], num_horizons)]
)
for col in flat_values.columns:
if not 'id' in col:
_col = np.expand_dims(flat_values[col].values, -1)
_t_col = scaler.inverse_transform(_col)[:,-1]
flat_values[col] = _t_col
flat_values = flat_values[[col for col in flat_values if not 'id' in col]]
flat_tensor = torch.from_numpy(flat_values.values)
return flat_tensor
def predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=False):
model.eval()
predictions = []
targets = []
ids = []
perf_meter = PerformanceMeter()
n_workers = args.distributed_world_size if hasattr(args, 'distributed_world_size') else 1
for step, batch in enumerate(data_loader):
perf_meter.reset_current_lap()
with torch.no_grad():
batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
ids.append(batch['id'][:,0,:])
targets.append(batch['target'])
predictions.append(model(batch).float())
perf_meter.update(args.batch_size * n_workers,
exclude_from_total=step in [0, len(data_loader)-1])
targets = torch.cat(targets, dim=0)
if not extend_targets:
targets = targets[:,config.encoder_length:,:]
predictions = torch.cat(predictions, dim=0)
if config.scale_per_id:
ids = torch.cat(ids, dim=0).cpu().numpy()
unscaled_predictions = torch.stack(
[_unscale_per_id(config, predictions[:,:,i], ids, scalers) for i in range(len(config.quantiles))],
dim=-1)
unscaled_targets = _unscale_per_id(config, targets[:,:,0], ids, scalers).unsqueeze(-1)
else:
ids = None
unscaled_predictions = torch.stack(
[_unscale(config, predictions[:,:,i], scalers['']) for i in range(len(config.quantiles))],
dim=-1)
unscaled_targets = _unscale(config, targets[:,:,0], scalers['']).unsqueeze(-1)
return unscaled_predictions, unscaled_targets, ids, perf_meter
def visualize_v2(args, config, model, data_loader, scalers, cat_encodings):
unscaled_predictions, unscaled_targets, ids, _ = predict(args, config, model, data_loader, scalers, cat_encodings, extend_targets=True)
num_horizons = config.example_length - config.encoder_length + 1
pad = unscaled_predictions.new_full((unscaled_targets.shape[0], unscaled_targets.shape[1] - unscaled_predictions.shape[1], unscaled_predictions.shape[2]), fill_value=float('nan'))
pad[:,-1,:] = unscaled_targets[:,-num_horizons,:]
unscaled_predictions = torch.cat((pad, unscaled_predictions), dim=1)
ids = torch.from_numpy(ids.squeeze())
joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
for key, g in graphs.items():
for i, ex in enumerate(g):
df = pd.DataFrame(ex.numpy(),
index=range(num_horizons - ex.shape[0], num_horizons),
columns=['target'] + [f'P{int(q*100)}' for q in config.quantiles])
fig = df.plot().get_figure()
ax = fig.get_axes()[0]
_values = df.values[config.encoder_length-1:,:]
ax.fill_between(range(num_horizons), _values[:,1], _values[:,-1], alpha=0.2, color='green')
os.makedirs(os.path.join(args.results, 'single_example_vis', str(key)), exist_ok=True)
fig.savefig(os.path.join(args.results, 'single_example_vis', str(key), f'{i}.pdf'))
def inference(args, config, model, data_loader, scalers, cat_encodings):
unscaled_predictions, unscaled_targets, ids, perf_meter = predict(args, config, model, data_loader, scalers, cat_encodings)
if args.joint_visualization or args.save_predictions:
ids = torch.from_numpy(ids.squeeze())
#ids = torch.cat([x['id'][0] for x in data_loader.dataset])
joint_graphs = torch.cat([unscaled_targets, unscaled_predictions], dim=2)
graphs = {i:joint_graphs[ids == i, :, :] for i in set(ids.tolist())}
for key, g in graphs.items(): #timeseries id, joint targets and predictions
_g = {'targets': g[:,:,0]}
_g.update({f'P{int(q*100)}':g[:,:,i+1] for i, q in enumerate(config.quantiles)})
if args.joint_visualization:
summary_writer = SummaryWriter(log_dir=os.path.join(args.results, 'predictions_vis', str(key)))
for q, t in _g.items(): # target and quantiles, timehorizon values
if q == 'targets':
targets = torch.cat([t[:,0], t[-1,1:]]) # WIP
# We want to plot targets on the same graph as predictions. Probably could be written better.
for i, val in enumerate(targets):
summary_writer.add_scalars(str(key), {f'{q}':val}, i)
continue
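# Column i of t holds the (i+1)-step-ahead forecasts, which are offset by i timesteps relative to column 0.
# The next lines shift each column down by i so that row r of y holds every forecast made for timestep r.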
y = t.new_full((t.shape[0] + t.shape[1] -1, t.shape[1]), float('nan'))
for i in range(y.shape[1]):
y[i:i+t.shape[0], i] = t[:,i]
for i, vals in enumerate(y): # timestep, values for every time horizon
summary_writer.add_scalars(str(key), {f'{q}_t+{j+1}':v for j,v in enumerate(vals) if v == v}, i) # v == v is False for the NaN padding, which is skipped
summary_writer.close()
if args.save_predictions:
for q, t in _g.items():
df = pd.DataFrame(t.tolist())
df.columns = [f't+{i+1}' for i in range(len(df.columns))]
os.makedirs(os.path.join(args.results, 'predictions', str(key)), exist_ok=True)
df.to_csv(os.path.join(args.results, 'predictions', str(key), q+'.csv'))
losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
normalizer = unscaled_targets.abs().mean()
q_risk = 2 * losses / normalizer
perf_dict = {
'throughput': perf_meter.avg,
'latency_avg': perf_meter.total_time/len(perf_meter.intervals),
'latency_p90': perf_meter.p(90),
'latency_p95': perf_meter.p(95),
'latency_p99': perf_meter.p(99),
'total_inference_time': perf_meter.total_time,
}
return q_risk, perf_dict
def main(args):
setup_logger(args)
# Set up model
state_dict = torch.load(args.checkpoint)
config = state_dict['config']
model = TemporalFusionTransformer(config).cuda()
model.load_state_dict(state_dict['model'])
model.eval()
model.cuda()
# Set up dataset
test_split = TFTDataset(args.data, config)
data_loader = DataLoader(test_split, batch_size=args.batch_size, num_workers=4)
scalers = pickle.load(open(args.tgt_scalers, 'rb'))
cat_encodings = pickle.load(open(args.cat_encodings, 'rb'))
if args.visualize:
# TODO: abstract away all forms of visualization.
visualize_v2(args, config, model, data_loader, scalers, cat_encodings)
quantiles, perf_dict = inference(args, config, model, data_loader, scalers, cat_encodings)
quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
finish_log = {**quantiles, **perf_dict}
dllogger.log(step=(), data=finish_log, verbosity=1)
print('Test q-risk: P10 {test_p10} | P50 {test_p50} | P90 {test_p90}'.format(**quantiles))
print('Latency:\n\tAverage {:.3f}s\n\tp90 {:.3f}s\n\tp95 {:.3f}s\n\tp99 {:.3f}s'.format(
perf_dict['latency_avg'], perf_dict['latency_p90'], perf_dict['latency_p95'], perf_dict['latency_p99']))
if __name__=='__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--checkpoint', type=str,
help='Path to the checkpoint')
parser.add_argument('--data', type=str,
help='Path to the test split of the dataset')
parser.add_argument('--tgt_scalers', type=str,
help='Path to the tgt_scalers.bin file produced by the preprocessing')
parser.add_argument('--cat_encodings', type=str,
help='Path to the cat_encodings.bin file produced by the preprocessing')
parser.add_argument('--batch_size', type=int, default=64)
parser.add_argument('--visualize', action='store_true', help='Visualize predictions - each example on the separate plot')
parser.add_argument('--joint_visualization', action='store_true', help='Visualize predictions - each timeseries on separate plot. Projections will be concatenated.')
parser.add_argument('--save_predictions', action='store_true')
parser.add_argument('--results', type=str, default='/results')
parser.add_argument('--log_file', type=str, default='dllogger.json')
ARGS = parser.parse_args()
main(ARGS)
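The test metric printed by `inference.py` is the normalized quantile risk: `inference()` divides the per-quantile loss by the mean absolute target and multiplies by 2. The sketch below reproduces that arithmetic in plain Python so it can be checked by hand. It assumes that `QuantileLoss` (defined in `criterions.py`, not shown here) averages the standard pinball loss over all predictions; `pinball_loss` and `q_risk` are hypothetical helpers, not part of this repository.

```python
from typing import Sequence


def pinball_loss(pred: float, target: float, q: float) -> float:
    """Pinball (quantile) loss of a single prediction at quantile level q."""
    diff = pred - target
    # Over-prediction (diff > 0) is weighted by (1 - q), under-prediction by q.
    return (1 - q) * max(diff, 0.0) + q * max(-diff, 0.0)


def q_risk(preds: Sequence[float], targets: Sequence[float], q: float) -> float:
    """Normalized quantile risk: 2 * mean pinball loss / mean |target|."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("preds and targets must be non-empty and of equal length")
    n = len(targets)
    mean_loss = sum(pinball_loss(p, t, q) for p, t in zip(preds, targets)) / n
    normalizer = sum(abs(t) for t in targets) / n
    return 2 * mean_loss / normalizer


if __name__ == "__main__":
    targets = [10.0, 12.0, 8.0]
    p50 = [11.0, 12.0, 7.0]
    print(f"P50 q-risk: {q_risk(p50, targets, 0.5):.4f}")
```

Because the numerator and the normalizer are both means over the same elements, this agrees with the sum-based definition 2 Σ QL / Σ |y| used in the TFT paper.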

@@ -0,0 +1,141 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import subprocess
import sys
import itertools
import atexit
import dllogger
from dllogger import Backend, JSONStreamBackend, StdOutBackend
import torch.distributed as dist
from torch.utils.tensorboard import SummaryWriter
class TensorBoardBackend(Backend):
def __init__(self, verbosity, log_dir):
super().__init__(verbosity=verbosity)
self.summary_writer = SummaryWriter(log_dir=os.path.join(log_dir, 'TB_summary'),
flush_secs=120,
max_queue=200
)
self.hp_cache = None
atexit.register(self.summary_writer.close)
@property
def log_level(self):
return self._log_level
def metadata(self, timestamp, elapsedtime, metric, metadata):
pass
def log(self, timestamp, elapsedtime, step, data):
if step == 'HPARAMS':
parameters = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
#Unpack lists and tuples
for d in [{k+f'_{i}':v for i,v in enumerate(l)} for k,l in data.items() if isinstance(l, (list, tuple))]:
parameters.update(d)
#Remove custom classes (filter the flattened parameters, not the raw data, so unpacked lists survive)
parameters = {k: v for k, v in parameters.items() if isinstance(v, (int, float, str, bool))}
parameters.update({k:'None' for k, v in data.items() if v is None})
self.hp_cache = parameters
if step == ():
if self.hp_cache is None:
print('Warning: Cannot save HParameters. Please log HParameters with step=\'HPARAMS\'', file=sys.stderr)
return
self.summary_writer.add_hparams(self.hp_cache, data)
if not isinstance(step, int):
return
for k, v in data.items():
self.summary_writer.add_scalar(k, v, step)
def flush(self):
pass
def setup_logger(args):
os.makedirs(args.results, exist_ok=True)
log_path = os.path.join(args.results, args.log_file)
if os.path.exists(log_path):
for i in itertools.count():
s_fname = args.log_file.split('.')
fname = '.'.join(s_fname[:-1]) + f'_{i}.' + s_fname[-1] if len(s_fname) > 1 else args.log_file + f'.{i}'
log_path = os.path.join(args.results, fname)
if not os.path.exists(log_path):
break
def metric_format(metric, metadata, value):
return "{}: {}".format(metric, f'{value:.5f}' if isinstance(value, float) else value)
def step_format(step):
if step == ():
return "Finished |"
elif isinstance(step, int):
return "Step {0: <5} |".format(step)
return "Step {} |".format(step)
if not dist.is_initialized() or not args.distributed_world_size > 1 or args.distributed_rank == 0:
dllogger.init(backends=[JSONStreamBackend(verbosity=1, filename=log_path),
TensorBoardBackend(verbosity=1, log_dir=args.results),
StdOutBackend(verbosity=2,
step_format=step_format,
prefix_format=lambda x: "")#,
#metric_format=metric_format)
])
else:
dllogger.init(backends=[])
dllogger.log(step='PARAMETER', data=vars(args), verbosity=0)
container_setup_info = {**get_framework_env_vars(), **get_system_info()}
dllogger.log(step='ENVIRONMENT', data=container_setup_info, verbosity=0)
dllogger.metadata('loss', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TRAIN', 'format': ':5f'})
dllogger.metadata('items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'TRAIN', 'format': ':1f'})
dllogger.metadata('val_loss', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format':':5f'})
dllogger.metadata('val_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'VAL', 'format': ':5f'})
dllogger.metadata('val_items/s', {'GOAL': 'MAXIMIZE', 'STAGE': 'VAL', 'format': ':1f'})
dllogger.metadata('test_P10', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('test_P50', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('test_P90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('throughput', {'GOAL': 'MAXIMIZE', 'STAGE': 'TEST', 'format': ':1f'})
dllogger.metadata('latency_p90', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('latency_p95', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
dllogger.metadata('latency_p99', {'GOAL': 'MINIMIZE', 'STAGE': 'TEST', 'format': ':5f'})
def get_framework_env_vars():
return {
'NVIDIA_PYTORCH_VERSION': os.environ.get('NVIDIA_PYTORCH_VERSION'),
'PYTORCH_VERSION': os.environ.get('PYTORCH_VERSION'),
'CUBLAS_VERSION': os.environ.get('CUBLAS_VERSION'),
'NCCL_VERSION': os.environ.get('NCCL_VERSION'),
'CUDA_DRIVER_VERSION': os.environ.get('CUDA_DRIVER_VERSION'),
'CUDNN_VERSION': os.environ.get('CUDNN_VERSION'),
'CUDA_VERSION': os.environ.get('CUDA_VERSION'),
'NVIDIA_PIPELINE_ID': os.environ.get('NVIDIA_PIPELINE_ID'),
'NVIDIA_BUILD_ID': os.environ.get('NVIDIA_BUILD_ID'),
'NVIDIA_TF32_OVERRIDE': os.environ.get('NVIDIA_TF32_OVERRIDE'),
}
def get_system_info():
system_info = subprocess.run('nvidia-smi --query-gpu=gpu_name,memory.total,enforced.power.limit --format=csv'.split(), capture_output=True).stdout
system_info = [i.decode('utf-8') for i in system_info.split(b'\n')]
system_info = [x for x in system_info if x]
return {'system_info': system_info}
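The `HPARAMS` branch of `TensorBoardBackend.log` is meant to flatten list and tuple values into indexed keys and keep only the scalar types that `SummaryWriter.add_hparams` accepts. A minimal standalone sketch of that intended behaviour follows; `flatten_hparams` is a hypothetical helper, and the rule of recording `None` as the string `'None'` is taken from the backend code above.

```python
from typing import Any, Dict


def flatten_hparams(data: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten hyperparameters into a dict of TensorBoard-friendly scalars."""
    # Start from the entries that are not sequences.
    params = {k: v for k, v in data.items() if not isinstance(v, (list, tuple))}
    # Unpack lists and tuples: {'quantiles': [0.1, 0.9]} -> quantiles_0, quantiles_1.
    for key, seq in data.items():
        if isinstance(seq, (list, tuple)):
            params.update({f"{key}_{i}": v for i, v in enumerate(seq)})
    # Keep only scalar types; filter the flattened dict, not the raw input.
    flat = {k: v for k, v in params.items() if isinstance(v, (int, float, str, bool))}
    # None is not a valid hparam value, so record it as a string instead.
    flat.update({k: "None" for k, v in params.items() if v is None})
    return flat
```

Arbitrary objects (for example a config class instance) are silently dropped, which matches the "Remove custom classes" intent of the original code.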

@@ -0,0 +1,367 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import Tensor
from typing import Dict, Tuple, Optional, List
if os.environ.get("TFT_SCRIPTING", False):
from torch.nn import LayerNorm
else:
from apex.normalization.fused_layer_norm import FusedLayerNorm as LayerNorm
class MaybeLayerNorm(nn.Module):
def __init__(self, output_size, hidden_size, eps):
super().__init__()
if output_size and output_size == 1:
self.ln = nn.Identity()
else:
self.ln = LayerNorm(output_size if output_size else hidden_size, eps=eps)
def forward(self, x):
return self.ln(x)
class GLU(nn.Module):
def __init__(self, hidden_size, output_size):
super().__init__()
self.lin = nn.Linear(hidden_size, output_size * 2)
def forward(self, x: Tensor) -> Tensor:
x = self.lin(x)
x = F.glu(x)
return x
class GRN(nn.Module):
def __init__(self,
input_size,
hidden_size,
output_size=None,
context_hidden_size=None,
dropout=0):
super().__init__()
self.layer_norm = MaybeLayerNorm(output_size, hidden_size, eps=1e-3)
self.lin_a = nn.Linear(input_size, hidden_size)
if context_hidden_size is not None:
self.lin_c = nn.Linear(context_hidden_size, hidden_size, bias=False)
self.lin_i = nn.Linear(hidden_size, hidden_size)
self.glu = GLU(hidden_size, output_size if output_size else hidden_size)
self.dropout = nn.Dropout(dropout)
self.out_proj = nn.Linear(input_size, output_size) if output_size else None
def forward(self, a: Tensor, c: Optional[Tensor] = None):
x = self.lin_a(a)
if c is not None:
x = x + self.lin_c(c).unsqueeze(1)
x = F.elu(x)
x = self.lin_i(x)
x = self.dropout(x)
x = self.glu(x)
y = a if not self.out_proj else self.out_proj(a)
x = x + y
x = self.layer_norm(x)
return x
class TFTEmbedding(nn.Module):
def __init__(self, config):
super().__init__()
self.s_cat_inp_lens = config.static_categorical_inp_lens
self.t_cat_k_inp_lens = config.temporal_known_categorical_inp_lens
self.t_cat_o_inp_lens = config.temporal_observed_categorical_inp_lens
self.s_cont_inp_size = config.static_continuous_inp_size
self.t_cont_k_inp_size = config.temporal_known_continuous_inp_size
self.t_cont_o_inp_size = config.temporal_observed_continuous_inp_size
self.t_tgt_size = config.temporal_target_size
self.hidden_size = config.hidden_size
# There are 7 types of input:
# 1. Static categorical
# 2. Static continuous
# 3. Temporal known a priori categorical
# 4. Temporal known a priori continuous
# 5. Temporal observed categorical
# 6. Temporal observed continuous
# 7. Temporal observed targets (time series observed so far)
self.s_cat_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.s_cat_inp_lens]) if self.s_cat_inp_lens else None
self.t_cat_k_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.t_cat_k_inp_lens]) if self.t_cat_k_inp_lens else None
self.t_cat_o_embed = nn.ModuleList([
nn.Embedding(n, self.hidden_size) for n in self.t_cat_o_inp_lens]) if self.t_cat_o_inp_lens else None
self.s_cont_embedding_vectors = nn.Parameter(torch.Tensor(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
self.t_cont_k_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
self.t_cont_o_embedding_vectors = nn.Parameter(torch.Tensor(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
self.t_tgt_embedding_vectors = nn.Parameter(torch.Tensor(self.t_tgt_size, self.hidden_size))
self.s_cont_embedding_bias = nn.Parameter(torch.zeros(self.s_cont_inp_size, self.hidden_size)) if self.s_cont_inp_size else None
self.t_cont_k_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_k_inp_size, self.hidden_size)) if self.t_cont_k_inp_size else None
self.t_cont_o_embedding_bias = nn.Parameter(torch.zeros(self.t_cont_o_inp_size, self.hidden_size)) if self.t_cont_o_inp_size else None
self.t_tgt_embedding_bias = nn.Parameter(torch.zeros(self.t_tgt_size, self.hidden_size))
if self.s_cont_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.s_cont_embedding_vectors)
if self.t_cont_k_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.t_cont_k_embedding_vectors)
if self.t_cont_o_embedding_vectors is not None:
torch.nn.init.xavier_normal_(self.t_cont_o_embedding_vectors)
torch.nn.init.xavier_normal_(self.t_tgt_embedding_vectors)
def _apply_embedding(self,
cat: Optional[Tensor],
cont: Optional[Tensor],
cat_emb: Optional[nn.ModuleList],
cont_emb: Tensor,
cont_bias: Tensor,
) -> Tuple[Optional[Tensor], Optional[Tensor]]:
e_cat = torch.stack([embed(cat[...,i]) for i, embed in enumerate(cat_emb)], dim=-2) if cat is not None else None
if cont is not None:
#the line below is equivalent to the following einsums
#e_cont = torch.einsum('btf,fh->btfh', cont, cont_emb) # temporal inputs
#e_cont = torch.einsum('bf,fh->bfh', cont, cont_emb) # static inputs
e_cont = torch.mul(cont.unsqueeze(-1), cont_emb)
e_cont = e_cont + cont_bias
else:
e_cont = None
if e_cat is not None and e_cont is not None:
return torch.cat([e_cat, e_cont], dim=-2)
elif e_cat is not None:
return e_cat
elif e_cont is not None:
return e_cont
else:
return None
def forward(self, x: Dict[str, Tensor]):
# temporal/static categorical/continuous known/observed input
s_cat_inp = x.get('s_cat', None)
s_cont_inp = x.get('s_cont', None)
t_cat_k_inp = x.get('k_cat', None)
t_cont_k_inp = x.get('k_cont', None)
t_cat_o_inp = x.get('o_cat', None)
t_cont_o_inp = x.get('o_cont', None)
t_tgt_obs = x['target'] # Has to be present
# Static inputs are expected to be equal for all timesteps
# For memory efficiency there is no assert statement
s_cat_inp = s_cat_inp[:,0,:] if s_cat_inp is not None else None
s_cont_inp = s_cont_inp[:,0,:] if s_cont_inp is not None else None
s_inp = self._apply_embedding(s_cat_inp,
s_cont_inp,
self.s_cat_embed,
self.s_cont_embedding_vectors,
self.s_cont_embedding_bias)
t_known_inp = self._apply_embedding(t_cat_k_inp,
t_cont_k_inp,
self.t_cat_k_embed,
self.t_cont_k_embedding_vectors,
self.t_cont_k_embedding_bias)
t_observed_inp = self._apply_embedding(t_cat_o_inp,
t_cont_o_inp,
self.t_cat_o_embed,
self.t_cont_o_embedding_vectors,
self.t_cont_o_embedding_bias)
# Temporal observed targets
# t_observed_tgt = torch.einsum('btf,fh->btfh', t_tgt_obs, self.t_tgt_embedding_vectors)
t_observed_tgt = torch.matmul(t_tgt_obs.unsqueeze(3).unsqueeze(4), self.t_tgt_embedding_vectors.unsqueeze(1)).squeeze(3)
t_observed_tgt = t_observed_tgt + self.t_tgt_embedding_bias
return s_inp, t_known_inp, t_observed_inp, t_observed_tgt
class VariableSelectionNetwork(nn.Module):
def __init__(self, config, num_inputs):
super().__init__()
self.joint_grn = GRN(config.hidden_size*num_inputs, config.hidden_size, output_size=num_inputs, context_hidden_size=config.hidden_size)
self.var_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(num_inputs)])
def forward(self, x: Tensor, context: Optional[Tensor] = None):
Xi = x.reshape(*x.shape[:-2], -1)
grn_outputs = self.joint_grn(Xi, c=context)
sparse_weights = F.softmax(grn_outputs, dim=-1)
transformed_embed_list = [m(x[...,i,:]) for i, m in enumerate(self.var_grns)]
transformed_embed = torch.stack(transformed_embed_list, dim=-1)
#the line below performs batched matrix vector multiplication
#for temporal features it's bthf,btf->bth
#for static features it's bhf,bf->bh
variable_ctx = torch.matmul(transformed_embed, sparse_weights.unsqueeze(-1)).squeeze(-1)
return variable_ctx, sparse_weights
class StaticCovariateEncoder(nn.Module):
def __init__(self, config):
super().__init__()
self.vsn = VariableSelectionNetwork(config, config.num_static_vars)
self.context_grns = nn.ModuleList([GRN(config.hidden_size, config.hidden_size, dropout=config.dropout) for _ in range(4)])
def forward(self, x: Tensor) -> Tuple[Tensor, Tensor, Tensor, Tensor]:
variable_ctx, sparse_weights = self.vsn(x)
# Context vectors:
# variable selection context
# enrichment context
# state_h context (initial LSTM hidden state)
# state_c context (initial LSTM cell state)
cs, ce, ch, cc = tuple(m(variable_ctx) for m in self.context_grns)
return cs, ce, ch, cc
class InterpretableMultiHeadAttention(nn.Module):
def __init__(self, config):
super().__init__()
self.n_head = config.n_head
assert config.hidden_size % config.n_head == 0
self.d_head = config.hidden_size // config.n_head
self.qkv_linears = nn.Linear(config.hidden_size, (2 * self.n_head + 1) * self.d_head, bias=False)
self.out_proj = nn.Linear(self.d_head, config.hidden_size, bias=False)
self.attn_dropout = nn.Dropout(config.attn_dropout)
self.out_dropout = nn.Dropout(config.dropout)
self.scale = self.d_head**-0.5
self.register_buffer("_mask", torch.triu(torch.full((config.example_length, config.example_length), float('-inf')), 1).unsqueeze(0))
def forward(self, x: Tensor, mask_future_timesteps: bool = True) -> Tuple[Tensor, Tensor]:
bs, t, h_size = x.shape
qkv = self.qkv_linears(x)
q, k, v = qkv.split((self.n_head * self.d_head, self.n_head * self.d_head, self.d_head), dim=-1)
q = q.view(bs, t, self.n_head, self.d_head)
k = k.view(bs, t, self.n_head, self.d_head)
v = v.view(bs, t, self.d_head)
# attn_score = torch.einsum('bind,bjnd->bnij', q, k)
attn_score = torch.matmul(q.permute((0, 2, 1, 3)), k.permute((0, 2, 3, 1)))
attn_score.mul_(self.scale)
if mask_future_timesteps:
attn_score = attn_score + self._mask
attn_prob = F.softmax(attn_score, dim=3)
attn_prob = self.attn_dropout(attn_prob)
# attn_vec = torch.einsum('bnij,bjd->bnid', attn_prob, v)
attn_vec = torch.matmul(attn_prob, v.unsqueeze(1))
m_attn_vec = torch.mean(attn_vec, dim=1)
out = self.out_proj(m_attn_vec)
out = self.out_dropout(out)
return out, attn_vec
class TemporalFusionTransformer(nn.Module):
"""
Implementation of https://arxiv.org/abs/1912.09363
"""
def __init__(self, config):
super().__init__()
if hasattr(config, 'model'):
config = config.model
self.encoder_length = config.encoder_length #number of past timesteps the model uses as history
self.embedding = TFTEmbedding(config)
self.static_encoder = StaticCovariateEncoder(config)
self.history_vsn = VariableSelectionNetwork(config, config.num_historic_vars)
self.history_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
self.future_vsn = VariableSelectionNetwork(config, config.num_future_vars)
self.future_encoder = nn.LSTM(config.hidden_size, config.hidden_size, batch_first=True)
self.input_gate = GLU(config.hidden_size, config.hidden_size)
self.input_gate_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.enrichment_grn = GRN(config.hidden_size,
config.hidden_size,
context_hidden_size=config.hidden_size,
dropout=config.dropout)
self.attention = InterpretableMultiHeadAttention(config)
self.attention_gate = GLU(config.hidden_size, config.hidden_size)
self.attention_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.positionwise_grn = GRN(config.hidden_size,
config.hidden_size,
dropout=config.dropout)
self.decoder_gate = GLU(config.hidden_size, config.hidden_size)
self.decoder_ln = LayerNorm(config.hidden_size, eps=1e-3)
self.quantile_proj = nn.Linear(config.hidden_size, len(config.quantiles))
def forward(self, x: Dict[str, Tensor]) -> Tensor:
s_inp, t_known_inp, t_observed_inp, t_observed_tgt = self.embedding(x)
# Static context
cs, ce, ch, cc = self.static_encoder(s_inp)
ch, cc = ch.unsqueeze(0), cc.unsqueeze(0) #lstm initial states
# Temporal input
_historical_inputs = [t_known_inp[:,:self.encoder_length,:], t_observed_tgt[:,:self.encoder_length,:]]
if t_observed_inp is not None:
_historical_inputs.insert(0,t_observed_inp[:,:self.encoder_length,:])
historical_inputs = torch.cat(_historical_inputs, dim=-2)
future_inputs = t_known_inp[:, self.encoder_length:]
# Encoders
historical_features, _ = self.history_vsn(historical_inputs, cs)
history, state = self.history_encoder(historical_features, (ch, cc))
future_features, _ = self.future_vsn(future_inputs, cs)
future, _ = self.future_encoder(future_features, state)
torch.cuda.synchronize() # this call gives perf boost for unknown reasons
# skip connection
input_embedding = torch.cat([historical_features, future_features], dim=1)
temporal_features = torch.cat([history, future], dim=1)
temporal_features = self.input_gate(temporal_features)
temporal_features = temporal_features + input_embedding
temporal_features = self.input_gate_ln(temporal_features)
# Static enrichment
enriched = self.enrichment_grn(temporal_features, c=ce)
# Temporal self attention
x, _ = self.attention(enriched, mask_future_timesteps=True)
# Don't compute historical quantiles
x = x[:, self.encoder_length:, :]
temporal_features = temporal_features[:, self.encoder_length:, :]
enriched = enriched[:, self.encoder_length:, :]
x = self.attention_gate(x)
x = x + enriched
x = self.attention_ln(x)
# Position-wise feed-forward
x = self.positionwise_grn(x)
# Final skip connection
x = self.decoder_gate(x)
x = x + temporal_features
x = self.decoder_ln(x)
out = self.quantile_proj(x)
return out
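`InterpretableMultiHeadAttention` gives every head its own query and key projections but a single shared value projection, then averages the per-head outputs. Because the values are shared, averaging the outputs equals applying the head-averaged attention matrix to the values, which is what makes the averaged weights interpretable. The pure-Python sketch below illustrates that identity together with the causal mask built from `torch.triu(..., 1)`. It uses plain lists instead of tensors, omits dropout, scaling and the output projection, and all names in it are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_softmax(scores: Matrix) -> Matrix:
    """Row-wise softmax over a square score matrix where row i sees columns j <= i.

    Equivalent to adding an upper-triangular -inf mask before the softmax.
    """
    probs = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        m = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        z = sum(exps)
        probs.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
    return probs


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def interpretable_attention(head_scores: List[Matrix], v: Matrix) -> Matrix:
    """Mean over heads of A_h @ V, where V is shared by all heads."""
    per_head = [matmul(causal_softmax(s), v) for s in head_scores]
    n_heads = len(per_head)
    return [[sum(h[i][j] for h in per_head) / n_heads for j in range(len(v[0]))]
            for i in range(len(v))]


def averaged_attention(head_scores: List[Matrix]) -> Matrix:
    """Head-averaged attention weights, the quantity inspected for interpretation."""
    probs = [causal_softmax(s) for s in head_scores]
    t = len(probs[0])
    return [[sum(p[i][j] for p in probs) / len(probs) for j in range(t)] for i in range(t)]
```

With a shared `v`, `interpretable_attention(scores, v)` matches `matmul(averaged_attention(scores), v)` up to floating-point rounding; with separate per-head values, as in standard multi-head attention, that identity would not hold.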

@@ -0,0 +1 @@
tensorboard

@@ -0,0 +1,54 @@
#! /bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
NUM_GPUS=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
[ $NUM_GPUS -eq 16 ] && WORKER_NUMS=(1 8 16) || WORKER_NUMS=(1 8)
DATASETS=(electricity traffic)
rm -rf /tmp/benchmark_results
for DATASET in ${DATASETS[@]}
do
for NGPU in ${WORKER_NUMS[@]}
do
for BATCH_SIZE in 512 1024 1536 2048 2560
do
for USE_AMP in --use_amp ""
do
for AFFINITY in "--affinity disabled" "--affinity single" "--affinity socket_unique_interleaved"
do
EXP_NAME="TFT_benchmark_${DATASET}_BS_${BATCH_SIZE}_${NGPU}GPU${USE_AMP}_${AFFINITY// /_}"
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset ${DATASET} \
--data_path /data/processed/${DATASET}_bin \
--batch_size=${BATCH_SIZE} \
--lr 5e-4 \
--epochs 1 \
--sample 100000 5000 \
--seed 1 \
${USE_AMP} \
${AFFINITY} \
--clip_grad 0.1 \
--results /tmp/benchmark_results/${EXP_NAME}
done
done
done
done
done
for P in `ls /tmp/benchmark_results/`;
do
echo ${P}
tail -n 1 /tmp/benchmark_results/${P}/dllogger.json
done
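`benchmark.sh` sweeps every combination of dataset, GPU count, batch size, AMP flag and affinity mode, so the number of runs multiplies quickly. The Python sketch below, a hypothetical helper rather than part of the repository, enumerates the same grid with `itertools.product` and builds space-free experiment names. Space-free names matter because each affinity option contains a space, and the name is used unquoted as the `--results` directory.

```python
import itertools
from typing import List, Tuple

DATASETS = ["electricity", "traffic"]
BATCH_SIZES = [512, 1024, 1536, 2048, 2560]
AMP_FLAGS = ["--use_amp", ""]
AFFINITIES = ["--affinity disabled", "--affinity single", "--affinity socket_unique_interleaved"]


def benchmark_grid(num_gpus: int) -> List[Tuple[str, int, int, str, str]]:
    """Mirror the nested loops of benchmark.sh (16-GPU machines also run 16 workers)."""
    worker_nums = [1, 8, 16] if num_gpus == 16 else [1, 8]
    return list(itertools.product(DATASETS, worker_nums, BATCH_SIZES, AMP_FLAGS, AFFINITIES))


def experiment_name(dataset: str, ngpu: int, batch_size: int, amp: str, affinity: str) -> str:
    # Replace spaces so the name survives unquoted use as a directory name.
    affinity_tag = affinity.replace(" ", "_")
    return f"TFT_benchmark_{dataset}_BS_{batch_size}_{ngpu}GPU{amp}_{affinity_tag}"
```

On an 8-GPU machine this yields 2 × 2 × 5 × 2 × 3 = 120 runs; on a 16-GPU machine the extra worker count raises it to 180.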

@@ -0,0 +1,40 @@
#!/bin/bash
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
DATAPATH='/data'
declare -A URLS=( ['electricity']='https://archive.ics.uci.edu/ml/machine-learning-databases/00321/LD2011_2014.txt.zip'
['traffic']='https://archive.ics.uci.edu/ml/machine-learning-databases/00204/PEMS-SF.zip'
)
mkdir -p ${DATAPATH}/raw
mkdir -p ${DATAPATH}/processed
for DS in electricity traffic
do
DS_PATH=${DATAPATH}/raw/${DS}
ZIP_FNAME=${DS_PATH}.zip
if [ ! -d ${DS_PATH} ]
then
wget "${URLS[${DS}]}" -O ${ZIP_FNAME}
unzip ${ZIP_FNAME} -d ${DS_PATH}
fi
python -c "from data_utils import standarize_${DS} as standarize; standarize(\"${DS_PATH}\")"
python -c "from data_utils import preprocess; \
from configuration import ${DS^}Config as Config; \
preprocess(\"${DS_PATH}/standarized.csv\", \"${DATAPATH}/processed/${DS}_bin\", Config())"
done
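`get_data.sh` relies on the Bash 4 case-modification expansion `${DS^}`, which upper-cases only the first character of the dataset name to pick the matching configuration class (`electricity` becomes `ElectricityConfig`). The sketch below restates the per-dataset paths and that naming rule in Python; `dataset_paths` is a hypothetical helper, and the `/data` default is simply the script's `DATAPATH`.

```python
import os
from typing import Dict


def dataset_paths(ds: str, datapath: str = "/data") -> Dict[str, str]:
    """Paths and config class name that get_data.sh derives for one dataset."""
    raw_dir = os.path.join(datapath, "raw", ds)
    return {
        "zip": raw_dir + ".zip",  # download target; skipped if raw_dir already exists
        "raw_dir": raw_dir,
        "standardized_csv": os.path.join(raw_dir, "standarized.csv"),  # file name spelled as in the script
        "processed_dir": os.path.join(datapath, "processed", f"{ds}_bin"),
        # ${DS^}: upper-case the first character; unlike str.capitalize(), the rest is left untouched
        "config_class": ds[:1].upper() + ds[1:] + "Config",
    }
```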

@@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=30}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset electricity \
--data_path /data/processed/electricity_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_electricity_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@@ -0,0 +1,30 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
: ${SEED:=1}
: ${LR:=1e-3}
: ${NGPU:=8}
: ${BATCH_SIZE:=1024}
: ${EPOCHS:=20}
python -m torch.distributed.launch --nproc_per_node=${NGPU} train.py \
--dataset traffic \
--data_path /data/processed/traffic_bin \
--batch_size=${BATCH_SIZE} \
--sample 450000 50000 \
--lr ${LR} \
--epochs ${EPOCHS} \
--seed ${SEED} \
--use_amp \
--results /results/TFT_traffic_bs${NGPU}x${BATCH_SIZE}_lr${LR}/seed_${SEED}

@ -0,0 +1,294 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import pickle
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, RandomSampler
from apex import amp
from apex.optimizers import FusedAdam
#from torch.nn.parallel import DistributedDataParallel as DDP
from apex.parallel import DistributedDataParallel as DDP
import numpy as np
import dllogger
from modeling import TemporalFusionTransformer
from configuration import CONFIGS
from data_utils import TFTBinaryDataset, sample_data
from log_helper import setup_logger
from criterions import QuantileLoss
from inference import predict
from utils import PerformanceMeter
import gpu_affinity
from ema import ModelEma
def load_dataset(args, config):
    train_split = TFTBinaryDataset(os.path.join(args.data_path, 'train.bin'), config)
    train_split = sample_data(train_split, args.sample_data[0])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(train_split, args.distributed_world_size, args.distributed_rank, seed=args.seed + args.distributed_rank, drop_last=True)
    else:
        data_sampler = RandomSampler(train_split)
    train_loader = DataLoader(train_split, batch_size=args.batch_size, num_workers=4, sampler=data_sampler, pin_memory=True)

    valid_split = TFTBinaryDataset(os.path.join(args.data_path, 'valid.bin'), config)
    valid_split = sample_data(valid_split, args.sample_data[1])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(valid_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    valid_loader = DataLoader(valid_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    test_split = TFTBinaryDataset(os.path.join(args.data_path, 'test.bin'), config)
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(test_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    test_loader = DataLoader(test_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    print_once(f'Train split length: {len(train_split)}')
    print_once(f'Valid split length: {len(valid_split)}')
    print_once(f'Test split length: {len(test_split)}')

    return train_loader, valid_loader, test_loader
def print_once(*args, **kwargs):
    if not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)
def main(args):
    # Optionally pin this worker's CPU threads (see --affinity)
    nproc_per_node = torch.cuda.device_count()
    if args.affinity != 'disabled':
        affinity = gpu_affinity.set_affinity(
                args.local_rank,
                nproc_per_node,
                args.affinity
            )
        print(f'{args.local_rank}: thread affinity: {affinity}')

    # Enable cuDNN autotuner
    torch.backends.cudnn.benchmark = True

    ### INIT DISTRIBUTED
    if args.distributed_world_size > 1:
        args.local_rank = int(os.environ.get('LOCAL_RANK', args.local_rank))
        torch.cuda.set_device(args.local_rank)
        dist.init_process_group(backend='nccl', init_method='env://')
        args.distributed_world_size = int(os.environ['WORLD_SIZE'])
        args.distributed_rank = dist.get_rank()
        print_once(f'Distributed training with {args.distributed_world_size} GPUs')
        torch.cuda.synchronize()

    if args.seed:
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed(args.seed)

    setup_logger(args)

    config = CONFIGS[args.dataset]()
    if args.overwrite_config:
        config.__dict__.update(json.loads(args.overwrite_config))

    dllogger.log(step='HPARAMS', data={**vars(args), **vars(config)}, verbosity=1)

    model = TemporalFusionTransformer(config).cuda()
    if args.ema_decay:
        model_ema = ModelEma(model, decay=args.ema_decay)

    print_once('Model params: {}'.format(sum(p.numel() for p in model.parameters())))
    criterion = QuantileLoss(config).cuda()
    optimizer = FusedAdam(model.parameters(), lr=args.lr)
    if args.use_amp:
        model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale="dynamic")
    if args.distributed_world_size > 1:
        #model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True)
        model = DDP(model)

    train_loader, valid_loader, test_loader = load_dataset(args, config)

    global_step = 0
    perf_meter = PerformanceMeter()

    for epoch in range(args.epochs):
        start = time.time()
        dllogger.log(step=global_step, data={'epoch': epoch}, verbosity=1)

        model.train()
        for local_step, batch in enumerate(train_loader):
            perf_meter.reset_current_lap()
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            loss = p_losses.sum()

            if args.use_amp:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()
            if not args.grad_accumulation or (global_step+1) % args.grad_accumulation == 0:
                if args.clip_grad:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad)
                optimizer.step()
                optimizer.zero_grad()
                if args.ema_decay:
                    model_ema.update(model)

            if args.distributed_world_size > 1:
                dist.all_reduce(p_losses)
                p_losses /= args.distributed_world_size
                loss = p_losses.sum()

            torch.cuda.synchronize()
            ips = perf_meter.update(args.batch_size * args.distributed_world_size,
                                    exclude_from_total=local_step in [0, len(train_loader)-1])

            log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': loss.item(), 'items/s':ips}
            dllogger.log(step=global_step, data=log_dict, verbosity=1)
            global_step += 1

        validate(args, config, model_ema if args.ema_decay else model, criterion, valid_loader, global_step)

        if validate.early_stop_c >= args.early_stopping:
            print_once('Early stopping')
            break

    ### TEST PHASE ###
    state_dict = torch.load(os.path.join(args.results, 'checkpoint.pt'), map_location='cpu')
    if isinstance(model, DDP):
        model.module.load_state_dict(state_dict['model'])
    else:
        model.load_state_dict(state_dict['model'])
    model.cuda().eval()

    with open(os.path.join(args.data_path, 'tgt_scalers.bin'), 'rb') as f:
        tgt_scalers = pickle.load(f)
    with open(os.path.join(args.data_path, 'cat_encodings.bin'), 'rb') as f:
        cat_encodings = pickle.load(f)

    unscaled_predictions, unscaled_targets, _, _ = predict(args, config, model, test_loader, tgt_scalers, cat_encodings)
    losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
    normalizer = unscaled_targets.abs().mean()
    quantiles = 2 * losses / normalizer

    if args.distributed_world_size > 1:
        quantiles = quantiles.cuda()
        dist.all_reduce(quantiles)
        quantiles /= args.distributed_world_size

    quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
    finish_log = {**quantiles, 'average_ips':perf_meter.avg, 'convergence_step':validate.conv_step}
    dllogger.log(step=(), data=finish_log, verbosity=1)
def validate(args, config, model, criterion, dataloader, global_step):
    if not hasattr(validate, 'best_valid_loss'):
        validate.best_valid_loss = float('inf')
    if not hasattr(validate, 'early_stop_c'):
        validate.early_stop_c = 0
    model.eval()

    losses = []
    validation_start = time.time()
    for batch in dataloader:
        with torch.no_grad():
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            bs = next(t for t in batch.values() if t is not None).shape[0]
            losses.append((p_losses, bs))

    validation_end = time.time()

    # Weight each batch by its size, since the last batch may not be full
    p_losses = sum([l[0]*l[1] for l in losses])/sum([l[1] for l in losses])
    if args.distributed_world_size > 1:
        dist.all_reduce(p_losses)
        p_losses = p_losses/args.distributed_world_size

    ips = len(dataloader.dataset) / (validation_end - validation_start)

    log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': p_losses.sum().item(), 'items/s':ips}

    if log_dict['loss'] < validate.best_valid_loss:
        validate.best_valid_loss = log_dict['loss']
        validate.early_stop_c = 0
        validate.conv_step = global_step
        if not dist.is_initialized() or dist.get_rank() == 0:
            state_dict = model.module.state_dict() if isinstance(model, (DDP, ModelEma)) else model.state_dict()
            ckpt = {'args':args, 'config':config, 'model':state_dict}
            torch.save(ckpt, os.path.join(args.results, 'checkpoint.pt'))
        if args.distributed_world_size > 1:
            dist.barrier()
    else:
        validate.early_stop_c += 1

    log_dict = {'val_'+k:v for k,v in log_dict.items()}
    dllogger.log(step=global_step, data=log_dict, verbosity=1)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', type=str, required=True,
                        help='Path to the dataset')
    parser.add_argument('--dataset', type=str, required=True, choices=CONFIGS.keys(),
                        help='Dataset name')
    parser.add_argument('--epochs', type=int, default=25,
                        help='Number of training epochs')
    parser.add_argument('--sample_data', type=lambda x: int(float(x)), nargs=2, default=[-1, -1],
                        help="""Subsample the dataset. Specify number of training and valid examples.
                        Values can be provided in scientific notation. Floats will be truncated.""")
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--lr', type=float, default=1e-3)
    parser.add_argument('--seed', type=int, default=1)
    parser.add_argument('--use_amp', action='store_true', help='Enable automatic mixed precision')
    parser.add_argument('--clip_grad', type=float, default=0.0)
    parser.add_argument('--grad_accumulation', type=int, default=0)
    parser.add_argument('--early_stopping', type=int, default=1000,
                        help='Stop training if validation loss does not improve for this many consecutive epochs.')
    parser.add_argument('--results', type=str, default='/results',
                        help='Directory in which results are stored')
    parser.add_argument('--log_file', type=str, default='dllogger.json',
                        help='Name of dllogger output file')
    parser.add_argument('--distributed_world_size', type=int, metavar='N',
                        default=torch.cuda.device_count(),
                        help='total number of GPUs across all nodes (default: all visible GPUs)')
    parser.add_argument('--distributed_rank', default=os.getenv('LOCAL_RANK', 0), type=int,
                        help='rank of the current worker')
    parser.add_argument('--local_rank', default=0, type=int,
                        help='local rank of the current worker on its node')
    parser.add_argument('--overwrite_config', type=str, default='',
                        help='JSON string used to override config values')
    parser.add_argument('--affinity', type=str,
                        default='socket_unique_interleaved',
                        choices=['socket', 'single', 'single_unique',
                                 'socket_unique_interleaved',
                                 'socket_unique_continuous',
                                 'disabled'],
                        help='type of CPU affinity')
    parser.add_argument("--ema_decay", type=float, default=0.0,
                        help='Decay rate for an exponential moving average of model weights (0 disables EMA)')

    ARGS = parser.parse_args()
    main(ARGS)
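The test phase of `train.py` reports `2 * losses / normalizer`, where `normalizer` is the mean absolute target. This sketch assumes that `QuantileLoss` returns the mean pinball loss for each quantile; that is an assumption, because `criterions.py` is not part of this excerpt. Under that assumption, the element counts cancel and each reported value equals the normalized quantile loss `2 * sum_i QL(y_i, y_hat_i, q) / sum_i |y_i|`. The plain-Python sketch below computes that ratio directly on toy numbers. `pinball_loss` and `normalized_quantile_loss` are illustrative names, not functions from this repository.

```python
from typing import List, Sequence


def pinball_loss(y: float, y_hat: float, q: float) -> float:
    """Quantile (pinball) loss: q * max(y - y_hat, 0) + (1 - q) * max(y_hat - y, 0)."""
    diff = y - y_hat
    return max(q * diff, (q - 1) * diff)


def normalized_quantile_loss(targets: Sequence[float],
                             predictions: Sequence[Sequence[float]],
                             quantiles: Sequence[float]) -> List[float]:
    """Return 2 * sum_i QL(y_i, y_hat_i, q) / sum_i |y_i| for every quantile q.

    predictions[i][k] is the forecast for targets[i] at quantiles[k].
    """
    denom = sum(abs(y) for y in targets)
    risks = []
    for k, q in enumerate(quantiles):
        total = sum(pinball_loss(y, p[k], q) for y, p in zip(targets, predictions))
        risks.append(2 * total / denom)
    return risks


# Hypothetical toy data: two targets, each with P10/P50/P90 forecasts.
targets = [10.0, 20.0]
predictions = [[8.0, 10.0, 12.0], [15.0, 22.0, 25.0]]
risks = normalized_quantile_loss(targets, predictions, (0.1, 0.5, 0.9))
print([round(r, 4) for r in risks])  # [0.0467, 0.0667, 0.0467]
```

Taking the ratio of sums, rather than averaging per-series ratios, keeps series with small absolute values from dominating the metric.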

@ -0,0 +1,46 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
class PerformanceMeter():
    def __init__(self):
        self.reset()

    def reset(self):
        self.avg = 0
        self.count = 0
        self.total_time = 0
        self.last_update_time = time.time()
        self.intervals = []

    def update(self, n, exclude_from_total=False):
        delta = time.time() - self.last_update_time
        self.intervals.append(delta)
        if not exclude_from_total:
            self.total_time += delta
            self.count += n
            self.avg = self.count / self.total_time
        self.last_update_time = time.time()

        return n/delta

    def reset_current_lap(self):
        self.last_update_time = time.time()

    def p(self, i):
        assert 0 <= i <= 100
        # Clamp so that p(100) returns the largest interval instead of indexing past the end
        idx = min(int(len(self.intervals) * i / 100), len(self.intervals) - 1)
        return sorted(self.intervals)[idx]
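`PerformanceMeter` reads the wall clock between calls, so its numbers depend on when `update()` is invoked. The sketch below replays the same bookkeeping on a list of recorded step durations (hypothetical values) to make two details easy to check:

- Steps passed with `exclude_from_total=True`, which are the first and last step in `train.py`, are timed but do not affect the average.
- The index-based percentile lookup needs its index clamped; otherwise `p(100)` indexes one past the end of the sorted list.

`average_throughput` and `percentile` are illustrative helpers, not part of `utils.py`.

```python
from typing import Iterable, Sequence


def average_throughput(deltas: Sequence[float], items_per_step: int,
                       excluded: Iterable[int]) -> float:
    """Items per second over all steps except those in `excluded`.

    Mirrors PerformanceMeter.update(): excluded steps are still timed, but add
    neither to total_time nor to count.
    """
    skip = set(excluded)
    kept = [d for i, d in enumerate(deltas) if i not in skip]
    return items_per_step * len(kept) / sum(kept)


def percentile(intervals: Sequence[float], pct: float) -> float:
    """Index-based percentile as in PerformanceMeter.p(), with the index clamped."""
    assert 0 <= pct <= 100
    ordered = sorted(intervals)
    idx = min(int(len(ordered) * pct / 100), len(ordered) - 1)
    return ordered[idx]


# Hypothetical step times in seconds: slow first step (warm-up), short last step.
deltas = [2.0, 0.5, 0.5, 0.5, 0.25]
print(average_throughput(deltas, 1024, excluded=[0, len(deltas) - 1]))  # 2048.0
print(percentile(deltas, 50))   # 0.5
print(percentile(deltas, 100))  # 2.0
```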

@ -0,0 +1,294 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import time
import os
import pickle
import json
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, RandomSampler
from apex import amp
from apex.optimizers import FusedAdam
#from torch.nn.parallel import DistributedDataParallel as DDP
from apex.parallel import DistributedDataParallel as DDP
import numpy as np
import dllogger
from modeling import TemporalFusionTransformer
from configuration import CONFIGS
from data_utils import TFTBinaryDataset, sample_data
from log_helper import setup_logger
from criterions import QuantileLoss
from inference import predict
from utils import PerformanceMeter
import gpu_affinity
from ema import ModelEma
def load_dataset(args, config):
    train_split = TFTBinaryDataset(os.path.join(args.data_path, 'train.bin'), config)
    train_split = sample_data(train_split, args.sample_data[0])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(train_split, args.distributed_world_size, args.distributed_rank, seed=args.seed + args.distributed_rank, drop_last=True)
    else:
        data_sampler = RandomSampler(train_split)
    train_loader = DataLoader(train_split, batch_size=args.batch_size, num_workers=4, sampler=data_sampler, pin_memory=True)

    valid_split = TFTBinaryDataset(os.path.join(args.data_path, 'valid.bin'), config)
    valid_split = sample_data(valid_split, args.sample_data[1])
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(valid_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    valid_loader = DataLoader(valid_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    test_split = TFTBinaryDataset(os.path.join(args.data_path, 'test.bin'), config)
    if args.distributed_world_size > 1:
        data_sampler = DistributedSampler(test_split, args.distributed_world_size, args.distributed_rank, shuffle=False, drop_last=False)
    else:
        data_sampler = None
    test_loader = DataLoader(test_split, batch_size=args.batch_size, sampler=data_sampler, num_workers=4, pin_memory=True)

    print_once(f'Train split length: {len(train_split)}')
    print_once(f'Valid split length: {len(valid_split)}')
    print_once(f'Test split length: {len(test_split)}')

    return train_loader, valid_loader, test_loader
def print_once(*args, **kwargs):
    if not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)
def main(args):
    # Optionally pin this worker's CPU threads (see --affinity)
    nproc_per_node = torch.cuda.device_count()
    if args.affinity != 'disabled':
        affinity = gpu_affinity.set_affinity(
                args.local_rank,
                nproc_per_node,
                args.affinity
            )
        print(f'{args.local_rank}: thread affinity: {affinity}')

    # Enable cuDNN autotuner
    torch.backends.cudnn.benchmark = True

    ### INIT DISTRIBUTED
    if args.distributed_world_size > 1:
        args.local_rank = int(os.environ.get('LOCAL_RANK', args.local_rank))
        torch.cuda.set_device(args.local_rank)
        dist.init_process_group(backend='nccl', init_method='env://')
        args.distributed_world_size = int(os.environ['WORLD_SIZE'])
        args.distributed_rank = dist.get_rank()
        print_once(f'Distributed training with {args.distributed_world_size} GPUs')
        torch.cuda.synchronize()

    if args.seed:
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        torch.cuda.manual_seed(args.seed)

    setup_logger(args)

    config = CONFIGS[args.dataset]()
    if args.overwrite_config:
        config.__dict__.update(json.loads(args.overwrite_config))

    dllogger.log(step='HPARAMS', data={**vars(args), **vars(config)}, verbosity=1)

    model = TemporalFusionTransformer(config).cuda()
    if args.ema_decay:
        model_ema = ModelEma(model, decay=args.ema_decay)

    print_once('Model params: {}'.format(sum(p.numel() for p in model.parameters())))
    criterion = QuantileLoss(config).cuda()
    optimizer = FusedAdam(model.parameters(), lr=args.lr)
    if args.use_amp:
        model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale="dynamic")
    if args.distributed_world_size > 1:
        #model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True)
        model = DDP(model)

    train_loader, valid_loader, test_loader = load_dataset(args, config)

    global_step = 0
    perf_meter = PerformanceMeter()

    for epoch in range(args.epochs):
        start = time.time()
        dllogger.log(step=global_step, data={'epoch': epoch}, verbosity=1)

        model.train()
        for local_step, batch in enumerate(train_loader):
            perf_meter.reset_current_lap()
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            loss = p_losses.sum()

            if args.use_amp:
                with amp.scale_loss(loss, optimizer) as scaled_loss:
                    scaled_loss.backward()
            else:
                loss.backward()
            if not args.grad_accumulation or (global_step+1) % args.grad_accumulation == 0:
                if args.clip_grad:
                    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad)
                optimizer.step()
                optimizer.zero_grad()
                if args.ema_decay:
                    model_ema.update(model)

            if args.distributed_world_size > 1:
                dist.all_reduce(p_losses)
                p_losses /= args.distributed_world_size
                loss = p_losses.sum()

            torch.cuda.synchronize()
            ips = perf_meter.update(args.batch_size * args.distributed_world_size,
                                    exclude_from_total=local_step in [0, len(train_loader)-1])

            log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': loss.item(), 'items/s':ips}
            dllogger.log(step=global_step, data=log_dict, verbosity=1)
            global_step += 1

        validate(args, config, model_ema if args.ema_decay else model, criterion, valid_loader, global_step)

        if validate.early_stop_c >= args.early_stopping:
            print_once('Early stopping')
            break

    ### TEST PHASE ###
    state_dict = torch.load(os.path.join(args.results, 'checkpoint.pt'), map_location='cpu')
    if isinstance(model, DDP):
        model.module.load_state_dict(state_dict['model'])
    else:
        model.load_state_dict(state_dict['model'])
    model.cuda().eval()

    with open(os.path.join(args.data_path, 'tgt_scalers.bin'), 'rb') as f:
        tgt_scalers = pickle.load(f)
    with open(os.path.join(args.data_path, 'cat_encodings.bin'), 'rb') as f:
        cat_encodings = pickle.load(f)

    unscaled_predictions, unscaled_targets, _, _ = predict(args, config, model, test_loader, tgt_scalers, cat_encodings)
    losses = QuantileLoss(config)(unscaled_predictions, unscaled_targets)
    normalizer = unscaled_targets.abs().mean()
    quantiles = 2 * losses / normalizer

    if args.distributed_world_size > 1:
        quantiles = quantiles.cuda()
        dist.all_reduce(quantiles)
        quantiles /= args.distributed_world_size

    quantiles = {'test_p10': quantiles[0].item(), 'test_p50': quantiles[1].item(), 'test_p90': quantiles[2].item(), 'sum':sum(quantiles).item()}
    finish_log = {**quantiles, 'average_ips':perf_meter.avg, 'convergence_step':validate.conv_step}
    dllogger.log(step=(), data=finish_log, verbosity=1)
def validate(args, config, model, criterion, dataloader, global_step):
    if not hasattr(validate, 'best_valid_loss'):
        validate.best_valid_loss = float('inf')
    if not hasattr(validate, 'early_stop_c'):
        validate.early_stop_c = 0
    model.eval()

    losses = []
    validation_start = time.time()
    for batch in dataloader:
        with torch.no_grad():
            batch = {key: tensor.cuda() if tensor.numel() else None for key, tensor in batch.items()}
            predictions = model(batch)
            targets = batch['target'][:,config.encoder_length:,:]
            p_losses = criterion(predictions, targets)
            bs = next(t for t in batch.values() if t is not None).shape[0]
            losses.append((p_losses, bs))

    validation_end = time.time()

    # Weight each batch by its size, since the last batch may not be full
    p_losses = sum([l[0]*l[1] for l in losses])/sum([l[1] for l in losses])
    if args.distributed_world_size > 1:
        dist.all_reduce(p_losses)
        p_losses = p_losses/args.distributed_world_size

    ips = len(dataloader.dataset) / (validation_end - validation_start)

    log_dict = {'P10':p_losses[0].item(), 'P50':p_losses[1].item(), 'P90':p_losses[2].item(), 'loss': p_losses.sum().item(), 'items/s':ips}

    if log_dict['loss'] < validate.best_valid_loss:
        validate.best_valid_loss = log_dict['loss']
        validate.early_stop_c = 0
        validate.conv_step = global_step
        if not dist.is_initialized() or dist.get_rank() == 0:
            state_dict = model.module.state_dict() if isinstance(model, (DDP, ModelEma)) else model.state_dict()
            ckpt = {'args':args, 'config':config, 'model':state_dict}
            torch.save(ckpt, os.path.join(args.results, 'checkpoint.pt'))
        if args.distributed_world_size > 1:
            dist.barrier()
    else:
        validate.early_stop_c += 1

    log_dict = {'val_'+k:v for k,v in log_dict.items()}
    dllogger.log(step=global_step, data=log_dict, verbosity=1)
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--data_path', type=str, required=True,
                        help='Path to the dataset')
    parser.add_argument('--dataset', type=str, required=True, choices=CONFIGS.keys(),
                        help='Dataset name')
    parser.add_argument('--epochs', type=int, default=25,
                        help='Number of training epochs')
    parser.add_argument('--sample_data', type=lambda x: int(float(x)), nargs=2, default=[-1, -1],
                        help="""Subsample the dataset. Specify number of training and valid examples.
                        Values can be provided in scientific notation. Floats will be truncated.""")
    parser.add_argument('--batch_size', type=int, default=64)
    parser.add_argument('--lr', type=float, default=1e-3)
    parser.add_argument('--seed', type=int, default=1)
    parser.add_argument('--use_amp', action='store_true', help='Enable automatic mixed precision')
    parser.add_argument('--clip_grad', type=float, default=0.0)
    parser.add_argument('--grad_accumulation', type=int, default=0)
    parser.add_argument('--early_stopping', type=int, default=1000,
                        help='Stop training if validation loss does not improve for this many consecutive epochs.')
    parser.add_argument('--results', type=str, default='/results',
                        help='Directory in which results are stored')
    parser.add_argument('--log_file', type=str, default='dllogger.json',
                        help='Name of dllogger output file')
    parser.add_argument('--distributed_world_size', type=int, metavar='N',
                        default=torch.cuda.device_count(),
                        help='total number of GPUs across all nodes (default: all visible GPUs)')
    parser.add_argument('--distributed_rank', default=os.getenv('LOCAL_RANK', 0), type=int,
                        help='rank of the current worker')
    parser.add_argument('--local_rank', default=0, type=int,
                        help='local rank of the current worker on its node')
    parser.add_argument('--overwrite_config', type=str, default='',
                        help='JSON string used to override config values')
    parser.add_argument('--affinity', type=str,
                        default='socket_unique_interleaved',
                        choices=['socket', 'single', 'single_unique',
                                 'socket_unique_interleaved',
                                 'socket_unique_continuous',
                                 'disabled'],
                        help='type of CPU affinity')
    parser.add_argument("--ema_decay", type=float, default=0.0,
                        help='Decay rate for an exponential moving average of model weights (0 disables EMA)')

    ARGS = parser.parse_args()
    main(ARGS)
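`validate()` keeps its early-stopping state (`best_valid_loss`, `early_stop_c`, `conv_step`) as attributes on the function object, so the state persists across calls within one process. The sketch below expresses the same bookkeeping as an explicit object. It also shows the batch-size-weighted averaging used to combine per-batch validation losses. `EarlyStopping` and `weighted_mean` are hypothetical helpers that operate on plain floats rather than tensors; they are not code from this repository.

```python
from typing import Optional, Sequence, Tuple


class EarlyStopping:
    """Explicit-state version of the bookkeeping validate() keeps as function attributes."""

    def __init__(self, patience: int):
        self.patience = patience               # plays the role of --early_stopping
        self.best_loss = float('inf')          # validate.best_valid_loss
        self.bad_epochs = 0                    # validate.early_stop_c
        self.best_step: Optional[int] = None   # validate.conv_step

    def update(self, loss: float, global_step: int) -> bool:
        """Record one validation loss; return True when it is a new best."""
        if loss < self.best_loss:
            self.best_loss = loss
            self.bad_epochs = 0
            self.best_step = global_step
            return True  # train.py saves checkpoint.pt at this point
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        # Same condition as `validate.early_stop_c >= args.early_stopping`.
        return self.bad_epochs >= self.patience


def weighted_mean(batch_losses: Sequence[Tuple[float, int]]) -> float:
    """Average per-batch mean losses, weighting each batch by its size."""
    total = sum(loss * size for loss, size in batch_losses)
    return total / sum(size for _, size in batch_losses)


# A smaller final batch counts less than a full one (an unweighted mean would give 2.5).
print(weighted_mean([(1.0, 64), (4.0, 16)]))  # 1.6

stopper = EarlyStopping(patience=2)
for step, val_loss in [(100, 1.0), (200, 0.8), (300, 0.9), (400, 0.85)]:
    stopper.update(val_loss, step)
    if stopper.should_stop:
        break
print(stopper.best_step, stopper.should_stop)  # 200 True
```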

@ -0,0 +1,46 @@
# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import time
class PerformanceMeter():
    def __init__(self):
        self.reset()

    def reset(self):
        self.avg = 0
        self.count = 0
        self.total_time = 0
        self.last_update_time = time.time()
        self.intervals = []

    def update(self, n, exclude_from_total=False):
        delta = time.time() - self.last_update_time
        self.intervals.append(delta)
        if not exclude_from_total:
            self.total_time += delta
            self.count += n
            self.avg = self.count / self.total_time
        self.last_update_time = time.time()

        return n/delta

    def reset_current_lap(self):
        self.last_update_time = time.time()

    def p(self, i):
        assert 0 <= i <= 100
        # Clamp so that p(100) returns the largest interval instead of indexing past the end
        idx = min(int(len(self.intervals) * i / 100), len(self.intervals) - 1)
        return sorted(self.intervals)[idx]

@ -0,0 +1,8 @@
.idea
**/.ipynb_checkpoints
**/__pycache__
**/.gitkeep
.git
.gitignore
Dockerfile
.dockerignore

@ -0,0 +1,5 @@
.ipynb_checkpoints
__pycache__
/outputs/
*.zip
/datasets/*/

@ -0,0 +1,61 @@
#SPDX-License-Identifier: Apache-2.0
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.09-py3
FROM ${FROM_IMAGE_NAME}
ENV DEBIAN_FRONTEND=noninteractive
ENV DCGM_VERSION=2.2.9
ENV MODEL_NAVIGATOR_CONTAINER=1
RUN apt-get update && \
apt-get install --no-install-recommends -y software-properties-common curl python3-dev python3-pip python-is-python3 libb64-dev wget git wkhtmltopdf && \
\
curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - && \
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian buster stable" && \
apt-get update && \
apt-get install --no-install-recommends -y docker-ce docker-ce-cli containerd.io && \
\
. /etc/os-release && \
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey| apt-key add - && \
curl -s -L "https://nvidia.github.io/nvidia-docker/${ID}${VERSION_ID}/nvidia-docker.list" > /etc/apt/sources.list.d/nvidia-docker.list && \
apt-get update && \
apt-get install --no-install-recommends -y nvidia-docker2 && \
\
curl -s -L -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
dpkg -i datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
rm datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
\
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Install perf_client required library
RUN apt-get update && \
apt-get install -y libb64-dev libb64-0d curl && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Set workdir and python path
WORKDIR /workspace
ENV PYTHONPATH /workspace
RUN pip install --upgrade pip
ADD requirements.txt /workspace/requirements.txt
ADD triton/requirements.txt /workspace/triton/requirements.txt
RUN pip install -r /workspace/requirements.txt
RUN pip install -r /workspace/triton/requirements.txt
RUN pip install nvidia-pyindex
RUN pip install nvidia-dllogger
RUN pip install --no-cache-dir -r requirements.txt -f https://data.dgl.ai/wheels/repo.html
# Add model files to workspace
ADD . /workspace
# AMP monkey-patch
RUN sed -i 's/ def forward(ctx,/ @amp.custom_fwd\(cast_inputs=torch.float32\)\n def forward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/ def backward(ctx,/ @amp.custom_bwd\n def backward(ctx,/g' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN sed -i 's/^import torch$/import torch\nfrom torch.cuda import amp/' /opt/conda/lib/python3.8/site-packages/apex/normalization/fused_layer_norm.py
RUN rm -rf examples
RUN rm -rf docker-examples
RUN rm -rf tutorial
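The three `sed` commands in the AMP monkey-patch edit apex's `fused_layer_norm.py` in three ways:

- Every autograd `def forward(ctx, ...)` gets `@amp.custom_fwd(cast_inputs=torch.float32)`. Inside an autocast region, this casts floating-point CUDA inputs to float32 and runs `forward` with autocast disabled.
- Every `def backward(ctx, ...)` gets `@amp.custom_bwd`, which runs `backward` with the same autocast state as `forward`.
- `from torch.cuda import amp` is added right after `import torch`.

The sketch below applies an equivalent transformation with Python's `re` module, so the result can be inspected. It runs on a small stand-in source string, not the real apex file. Unlike the fixed-width `sed` patterns, it matches whatever indentation precedes each `def`. `patch_layer_norm_source` and the stand-in class are hypothetical.

```python
import re


def patch_layer_norm_source(src: str) -> str:
    """Apply edits equivalent to the Dockerfile's sed commands (sketch)."""
    # Decorate forward(): under autocast, inputs are cast to float32 and autocast is disabled.
    src = re.sub(r'^([ \t]*)def forward\(ctx,',
                 r'\1@amp.custom_fwd(cast_inputs=torch.float32)\n\1def forward(ctx,',
                 src, flags=re.MULTILINE)
    # Decorate backward(): it runs with the same autocast state as forward().
    src = re.sub(r'^([ \t]*)def backward\(ctx,',
                 r'\1@amp.custom_bwd\n\1def backward(ctx,',
                 src, flags=re.MULTILINE)
    # Make the `amp` name available right after the `import torch` line.
    src = re.sub(r'^import torch$', 'import torch\nfrom torch.cuda import amp',
                 src, flags=re.MULTILINE)
    return src


# Hypothetical stand-in for an autograd function in fused_layer_norm.py.
sample = (
    "import torch\n"
    "\n"
    "class FusedLayerNormFunction(torch.autograd.Function):\n"
    "    @staticmethod\n"
    "    def forward(ctx, input, normalized_shape, eps):\n"
    "        return input\n"
    "\n"
    "    @staticmethod\n"
    "    def backward(ctx, grad_output):\n"
    "        return grad_output, None, None\n"
)
print(patch_layer_norm_source(sample))
```

The AMP decorator is inserted below `@staticmethod`, so `staticmethod` remains the outermost decorator.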

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@ -0,0 +1,24 @@
Individual Contributor License Agreement (CLA)
Thank you for submitting your contributions to this project.
By signing this CLA, you agree that the following terms apply to all of your past, present and future contributions to the project.
License.
You hereby represent that all present, past and future contributions are governed by the Apache 2.0 License copyright statement.
This entails that to the extent possible under law, you transfer all copyright and related or neighboring rights of the code or documents you contribute to the project itself or its maintainers. Furthermore, you also represent that you have the authority to perform the above waiver with respect to the entirety of your contributions.
Moral Rights.
To the fullest extent permitted under applicable law, you hereby waive, and agree not to assert, all of your “moral rights” in or relating to your contributions for the benefit of the project.
Third Party Content.
If your Contribution includes or is based on any source code, object code, bug fixes, configuration changes, tools, specifications, documentation, data, materials, feedback, information or other works of authorship that were not authored by you (“Third Party Content”) or if you are aware of any third party intellectual property or proprietary rights associated with your Contribution (“Third Party Rights”), then you agree to include with the submission of your Contribution full details respecting such Third Party Content and Third Party Rights, including, without limitation, identification of which aspects of your Contribution contain Third Party Content or are associated with Third Party Rights, the owner/author of the Third Party Content and Third Party Rights, where you obtained the Third Party Content, and any applicable third party license terms or restrictions respecting the Third Party Content and Third Party Rights. For greater certainty, the foregoing obligations respecting the identification of Third Party Content and Third Party Rights do not apply to any portion of a Project that is incorporated into your Contribution to that same Project.
Representations.
You represent that, other than the Third Party Content and Third Party Rights identified by you in accordance with this Agreement, you are the sole author of your Contributions and are legally entitled to grant the foregoing licenses and waivers in respect of your Contributions. If your Contributions were created in the course of your employment with your past or present employer(s), you represent that such employer(s) has authorized you to make your Contributions on behalf of such employer(s) or such employer(s) has waived all of their right, title or interest in or to your Contributions.
Disclaimer.
To the fullest extent permitted under applicable law, your Contributions are provided on an "as is" basis, without any warranties or conditions, express or implied, including, without limitation, any implied warranties or conditions of non-infringement, merchantability or fitness for a particular purpose. You are not required to provide support for your Contributions, except to the extent you desire to provide support.
No Obligation.
You acknowledge that the maintainers of this project are under no obligation to use or incorporate your contributions into the project. The decision to use or incorporate your contributions into the project will be made at the sole discretion of the maintainers or their authorized delegates.


@ -0,0 +1,205 @@
This repository contains code from https://github.com/google-research/google-research/tree/master/tft under the Apache 2.0 License (included below).
This repository contains code from https://github.com/rwightman/pytorch-image-models/blob/master/timm/utils/model_ema.py under the Apache 2.0 License (included below).
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@ -0,0 +1,491 @@
# Time-Series Prediction Platform 1.0 for PyTorch
Time-series prediction is a common problem in multiple domains for various applications, including retail, industry, smart cities, and financial services. Research in the time-series field is growing rapidly, with hundreds of deep learning time-series forecasting papers submitted to ICML, ECML, ITISE, and multiple journals every year. However, there is currently no common framework for comparing the accuracy and performance of models from industry and academia.
## Solution Overview
Time-Series Prediction Platform (TSPP) enables users to mix and match datasets and models. The user has complete control over the following settings and can compare, side by side, the results obtained from various solutions:
- Evaluation metrics
- Evaluation datasets
- Prediction horizons
- Prediction sliding window sizes
- Model choice
- Model hyperparameters
### Time-Series Prediction Platform architecture
The platform has the following architecture.
![Time-series Prediction Platform architecture](TSPP_Architecture.png)
In the figure above, the command line passes input to the TSPP launcher, which uses that input to configure the components required to train and test the model.
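The launcher's role of turning command-line input into component configuration can be pictured with the `key=value` arguments used later in this README (for example, `model=tft dataset={dataset} criterion=quantile` and `amp=True`). The sketch below is a hypothetical parser written only for illustration, not TSPP's launcher code, and its value-coercion rules are assumptions. It shows how flat overrides, including dotted keys such as `config.model.hidden_size`, can map onto a nested configuration.

```python
from typing import Any, Dict, List

def coerce(raw: str) -> Any:
    """Hypothetical rule: convert an override value to bool, int, or float when it looks like one."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw

def parse_overrides(args: List[str]) -> Dict[str, Any]:
    """Turn key=value arguments into a nested dict; dotted keys create nested groups."""
    config: Dict[str, Any] = {}
    for arg in args:
        key, sep, raw = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate groups on demand
        node[leaf] = coerce(raw)
    return config

cfg = parse_overrides(["model=tft", "dataset=electricity", "criterion=quantile", "amp=True"])
print(cfg)
```

Keeping the parsed result as plain nested dicts means each component (model, dataset, criterion) can read only its own group.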
The platform is designed to support multiple data types for input features, including the observed values of the forecasted time-series, known data supporting the forecasts (for example, day of the week), and static data (for example, user ID). This is summarized in the following figure.
<div align="center">
<img width="70%" src="https://developer.download.nvidia.com/time-series-platform/time_series_data.png" title="Time-series data type">
<p style="text-align:center"><b>Time-series data type</b></p>
<br>
</div>
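A minimal sketch of how these data types constrain a single training window, assuming the window's first `encoder_length` rows are the past and the remaining rows are the forecast horizon. The `split_window` helper and the column names (`id`, `day_of_week`, `power`) are hypothetical and do not come from the TSPP code base.

```python
from typing import Any, Dict, List

def split_window(rows: List[Dict[str, Any]], encoder_length: int, static: List[str],
                 known: List[str], observed: List[str], target: str) -> Dict[str, list]:
    """Group one window of rows by what each feature type may contribute."""
    past, future = rows[:encoder_length], rows[encoder_length:]
    return {
        # Static features do not change over time, so one value per series is enough.
        "static": [rows[0][c] for c in static],
        # Known features (e.g. day of the week) are available for past and future steps.
        "known": [[r[c] for c in known] for r in rows],
        # Observed values, such as past targets, exist only up to the present.
        "observed": [[r[c] for c in observed] for r in past],
        # The values to forecast are the target's future steps.
        "target": [r[target] for r in future],
    }

rows = [{"id": 7, "day_of_week": d % 7, "power": float(d)} for d in range(5)]
parts = split_window(rows, encoder_length=3, static=["id"], known=["day_of_week"],
                     observed=["power"], target="power")
```

Here the target column doubles as an observed input: its past values feed the model, and its future values are what the model must predict.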
### Default configuration
The TSPP utilizes the default configurations provided by each model for each accompanying dataset. More information on individual model configurations can be found within the respective model repositories. By default, Temporal Fusion Transformer (TFT) is included within the TSPP.
### Models
- Temporal Fusion Transformers
- AutoARIMA
### Feature support matrix
This tool supports the following features:
| Feature | Time-Series Prediction Platform |
|---------|---------------------------------|
| [Automatic mixed precision (AMP)](https://pytorch.org/docs/stable/amp.html) | Yes |
| [Multi-GPU training with PyTorch DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) | Yes |
| TorchScript, ONNX, and TRT conversion and NVIDIA Triton Deployment | Yes |
#### Features
**Automatic Mixed Precision (AMP)**: [Automatic mixed precision](https://pytorch.org/docs/stable/amp.html) is a mode of computation for PyTorch models that lets selected operations run in float16 instead of float32, potentially accelerating those operations and the total model runtime. More information can be found in the Mixed precision training section.
**Multi-GPU training with PyTorch Distributed Data Parallel (DDP)**: [DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) is a mode of computation for PyTorch models that runs a replica of the model on each of multiple GPUs, gives each replica a different portion of the data, and synchronizes gradients across replicas, accelerating training.
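DDP starts one process per GPU, each holding a full model replica. Each process reads a different shard of the data, and after every backward pass the gradients are averaged across processes so that all replicas apply the same update. The pure-Python sketch below emulates only those two steps so that it runs without GPUs. The index layout mirrors PyTorch's `DistributedSampler` with `shuffle=False`; `shard_indices` and `allreduce_mean` are hypothetical names.

```python
import math
from typing import List

def shard_indices(n: int, rank: int, world_size: int) -> List[int]:
    """Indices one rank sees, following DistributedSampler's non-shuffled layout."""
    indices = list(range(n))
    total = math.ceil(n / world_size) * world_size
    padding = total - n
    indices += (indices * math.ceil(padding / n))[:padding]  # wrap around so shards are equal
    return indices[rank:total:world_size]

def allreduce_mean(per_rank_grads: List[List[float]]) -> List[float]:
    """Average each gradient element across ranks, as DDP does after backward."""
    world_size = len(per_rank_grads)
    return [sum(values) / world_size for values in zip(*per_rank_grads)]

shards = [shard_indices(10, rank, world_size=4) for rank in range(4)]
grads = allreduce_mean([[1.0, 2.0], [3.0, 4.0]])
```

Padding by wrapping around gives every rank the same number of samples, so no process waits on another at the gradient synchronization step.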
**TorchScript, ONNX, and TRT conversion and NVIDIA Triton Deployment** refer to the conversion of a model to the aforementioned formats and the ability to deploy the resulting converted models to an NVIDIA Triton inference server. More detail about this process and native inference can be found in the Advanced tab under the Conversion, Deployment, and Inference subsection.
### Mixed precision training
Mixed precision is the combined use of different numerical precisions in a computational method. [Mixed precision](https://arxiv.org/abs/1710.03740) training offers significant computational speedup by performing operations in half-precision format while storing minimal information in single precision to retain as much information as possible in critical parts of the network. Since the introduction of [Tensor Cores](https://developer.nvidia.com/tensor-cores) in NVIDIA Volta, and continuing with the NVIDIA Turing and NVIDIA Ampere architectures, switching to mixed precision has yielded significant training speedups, up to 3x overall on the most arithmetically intense model architectures. Using mixed precision training requires two steps:
1. Porting the model to use the FP16 data type where appropriate.
2. Adding loss scaling to preserve small gradient values.
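Step 2 exists because small gradients underflow in FP16. The sketch below emulates FP16 rounding with the half-precision format (`'e'`) of Python's `struct` module to show a gradient of 1e-8 flushing to zero unless it is multiplied by a scale factor first and then unscaled in higher precision. The `update_scale` policy (halve on overflow, double after a run of clean steps) is modeled on the defaults of PyTorch's `torch.cuda.amp.GradScaler` (growth factor 2.0, backoff factor 0.5, growth interval 2000). It is an illustration, not the GradScaler implementation.

```python
import struct
from typing import Tuple

def to_fp16(x: float) -> float:
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def update_scale(scale: float, found_inf: bool, good_steps: int,
                 growth_interval: int = 2000) -> Tuple[float, int]:
    """Dynamic loss scaling: back off on overflow, grow after enough clean steps."""
    if found_inf:
        return scale * 0.5, 0  # the optimizer step is skipped and the scale halved
    good_steps += 1
    if good_steps >= growth_interval:
        return scale * 2.0, 0  # try a larger scale again
    return scale, good_steps

grad = 1e-8                                  # below FP16's smallest subnormal (about 5.96e-8)
scale = 2.0 ** 16
print(to_fp16(grad))                         # 0.0: the gradient is lost without scaling
recovered = to_fp16(grad * scale) / scale    # scale in FP16, unscale in full precision
print(recovered)
```

Growing the scale periodically keeps it as large as possible without overflowing, which preserves the most small-gradient information.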
The ability to train deep learning networks with lower precision was introduced in the NVIDIA Pascal architecture and first supported in [CUDA 8](https://devblogs.nvidia.com/parallelforall/tag/fp16/) in the NVIDIA Deep Learning SDK.
For information about:
- How to train using mixed precision, refer to the [Mixed Precision Training](https://arxiv.org/abs/1710.03740) paper and [Training With Mixed Precision](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html) documentation.
- Techniques used for mixed precision training, refer to the [Mixed-Precision Training of Deep Neural Networks](https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/) blog.
- How to access and use AMP for PyTorch, refer to the [Torch-AMP](https://pytorch.org/docs/stable/amp.html) guide.
#### Enabling mixed precision
Mixed precision can be enabled by specifying `amp=True` in the launch call. Note that in some cases, such as when the batch size is small, the overhead of scheduling kernels for mixed precision can outweigh the performance gain from using lower precision, resulting in lower throughput.
## Setup
The following section lists the requirements that you need to meet in order to run the Time-Series Prediction Platform.
### Requirements
This repository contains a Dockerfile that extends the PyTorch NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
- [NVIDIA Ampere Architecture](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/), [NVIDIA Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/) or [NVIDIA Turing](https://www.nvidia.com/en-us/geforce/turing/) based GPU
- Ubuntu 18.04
- [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
- [docker-compose](https://docs.docker.com/compose/install/). For an up-to-date version, installing from the web is recommended
- Custom Docker containers built for this model. Refer to the steps in the [Quick Start Guide](#quick-start-guide).
For more information about how to get started with NGC containers, refer to the following sections from the NVIDIA GPU Cloud Documentation and the Deep Learning Documentation:
- [Getting Started Using NVIDIA GPU Cloud](https://docs.nvidia.com/ngc/ngc-getting-started-guide/index.html)
- [Accessing And Pulling From The NGC Container Registry](https://docs.nvidia.com/deeplearning/frameworks/user-guide/index.html#accessing_registry)
If you cannot set up the required environment or create your own container, refer to the versioned [NVIDIA Container Support Matrix](https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html).
## Quick start guide
### Getting Started
1. Create a dataset directory. The directory can be arbitrary, and it is recommended not to include it in the TimeSeriesPredictionPlatform directory. This arbitrary directory will be mounted to the TSPP container later. In the following steps this directory will be referred to as /your/datasets/.
2. Enter the Deep Learning Examples TSPP repository:
```
cd DeepLearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform
```
3. Run the repository setup:
```
source scripts/setup.sh
```
4. Build the Docker image:
```
docker build -t tspp .
```
5. Next, start the container and mount the dataset directory so that /workspace/datasets/ points to /your/datasets/. Any changes made to this folder inside the Docker container are reflected in the original directory and vice versa. To mount additional folders, add -v /path/on/local/:/path/in/container/ to the run command; this is useful for keeping the outputs of training or inference after the container is closed. To start the Docker container:
```
docker run -it --gpus all --ipc=host --network=host -v /your/datasets/:/workspace/datasets/ tspp bash
```
6. After running the previous command, you will be placed inside the Docker container in the /workspace directory. Inside the container, download either the electricity or the traffic dataset:
```
python data/script_download_data.py --dataset {dataset_name} --output_dir /workspace/datasets/
```
The raw electricity dataset contains the 15-minute electricity consumption of 370 customers from the UCI Electricity Load Diagrams data set. We aggregate it to hourly values and use the previous week to predict the following day.
The raw traffic dataset contains the 10-minute occupancy rate of San Francisco freeways from 440 sensors, downloaded from the UCI PEMS-SF Data Set. We again aggregate to hourly values and use the previous week to predict the following day.
7. Preprocess the dataset:
```
python launch_preproc.py dataset={dataset}
```
8. Launch the training, validation, and testing process using the Temporal Fusion Transformer model:
```
python launch_tspp.py model=tft dataset={dataset} criterion=quantile
```
Outputs are stored in /workspace/outputs/{date}/{time}
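In step counts, the setup described above for both datasets is a window of 168 hourly steps of history (one week) followed by 24 steps to forecast (one day), 192 steps in total. The sketch below shows that arithmetic together with the aggregation to hourly values and a sliding window with a configurable stride. It assumes the hourly value is the mean of the sub-hourly readings and that windows never cross series boundaries; the helper names are hypothetical, and this is not the TSPP preprocessing code.

```python
from typing import List, Tuple

HOURS_PER_DAY = 24
encoder_length = 7 * HOURS_PER_DAY                # one week of history: 168 steps
example_length = encoder_length + HOURS_PER_DAY   # plus one day to forecast: 192 steps

def to_hourly(readings: List[float], per_hour: int = 4) -> List[float]:
    """Average fixed-interval readings into hourly values (4 per hour for 15-minute data,
    6 for 10-minute data); an incomplete trailing hour is dropped."""
    return [sum(readings[i:i + per_hour]) / per_hour
            for i in range(0, len(readings) - per_hour + 1, per_hour)]

def sliding_windows(n_steps: int, example_length: int, encoder_length: int,
                    stride: int = 1) -> List[Tuple[range, range]]:
    """(history, horizon) step ranges for every full window in a series of n_steps."""
    return [(range(s, s + encoder_length), range(s + encoder_length, s + example_length))
            for s in range(0, n_steps - example_length + 1, stride)]

hourly = to_hourly([1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0])
windows = sliding_windows(n_steps=14 * HOURS_PER_DAY, example_length=example_length,
                          encoder_length=encoder_length, stride=HOURS_PER_DAY)
print(len(windows))  # 7 windows in two weeks of hourly data with a one-day stride
```

A larger stride produces fewer, less overlapping windows, trading training-sample count for faster passes over the data.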
### Adding a new dataset
The TSPP has been designed to work with most CSV input. In order to add an arbitrary dataset to the TSPP:
1. Enter the Deep Learning Examples TSPP repository:
```
cd DeepLearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform
```
2. Copy the target dataset into the directory in which you want to keep your datasets. The directory can be arbitrary, and it is recommended not to place it inside the TimeSeriesPredictionPlatform directory. This directory will be mounted to the TSPP container later:
```
cp -r /PATH/TO/YOUR/DATASET /your/datasets/
```
3. Create a configuration file for your dataset in TimeSeriesPredictionPlatform/conf/dataset that includes the following values:
* source_path: The path to the CSV that contains your dataset
* dest_path: The path to where preprocessing should write your preprocessed dataset
* time_ids: The name of the column in your source CSV whose values are used to split the data into training, validation, and test subsets.
* train_range, valid_range, test_range: The ranges that mark the edges of the train, validation, and test subsets. Subsets may overlap, because predicting the first unseen element requires the seen elements before it as input.
* dataset_stride: The stride the dataloader uses to walk the sliding window through the dataset. Default: 1
* scale_per_id: Whether continuous features are scaled during preprocessing with scalers fitted only on samples from the same ID (True) or on all samples (False). Default: False
* encoder_length: The length, in time steps, of the data known up until the present
* example_length: The length of all data, including data known into the future. The target you are predicting lies in the last example_length - encoder_length time steps.
* features: A list of the features that the model takes as input. Each feature should be represented by an object containing descriptive attributes. All features should have at least a feature_type (ID, TIME, TARGET, WEIGHT, SAMPLE_WEIGHT, KNOWN, OBSERVED, or STATIC) and feature_embed_type (CONTINUOUS or CATEGORICAL). Continuous features may have a scaler attribute that represents the type of scaler used in preprocessing. Categorical columns should have a cardinality attribute that represents the number of unique values that that feature takes. Examples can be found in the files in /TimeSeriesPredictionPlatform/conf/dataset/. Required features are one TIME feature, at least one ID feature, one TARGET feature, and at least one KNOWN, OBSERVED, or STATIC feature.
* train_samples: The number of samples that should be taken at train time to use as train input to your model for a single epoch
* valid_samples: The number of samples that should be taken at train time to use as validation input to your model for a single epoch
* binarized: Whether or not preprocessing should accelerate data loading by outputting the preprocessed dataset in a binarized format
* time_series_count: The number of unique time-series contained in the dataset.
4. After a specification has been written, it is ready to be preprocessed with:
```
docker build -t tspp .
docker run -it --gpus all -v /your/datasets/:/workspace/datasets/ --ipc=host tspp bash
python launch_preproc.py dataset={dataset_name}
```
For some models, additional parameters are required for each dataset. As mentioned in the Adding a new model section, examples of these model-dataset combination files are in `TimeSeriesPredictionPlatform/conf/model_dataset/`. For example, if model A requires a specific hidden size when used on dataset B, TimeSeriesPredictionPlatform/conf/model_dataset/A_B.yaml should contain the desired hidden size under config.model.hidden_size.
5. Test your dataset by training and evaluating a temporal fusion transformer. Training, validation, and testing are all included by default using the launch_tspp.py command shown below:
```
docker run -it --gpus all -v /your/datasets/:/workspace/datasets/ --ipc=host tspp bash
python launch_tspp.py dataset={YOUR_DATASET} model=tft criterion=quantile
```
If you encounter errors stating that srcIndex < value, verify that your categorical cardinalities are the correct size, as this error indicates that the value of a categorical you are trying to embed is too large for its respective embedding table.
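Before running preprocessing, it can help to check a new dataset config against the requirements listed in step 3. The sketch below is a hypothetical helper, not part of the TSPP. It reads "one TIME feature" and "one TARGET feature" as exactly one, assumes each feature also carries a `name` key matching its CSV column, and compares each declared categorical cardinality with the number of unique values in the CSV, since an embedding table that is too small is what triggers the srcIndex error above.

```python
import csv
import io
from collections import Counter


def check_features(cfg):
    """Validate the feature-type rules from step 3 and return the prediction horizon."""
    counts = Counter(f["feature_type"] for f in cfg["features"])
    if counts["TIME"] != 1 or counts["TARGET"] != 1:
        raise ValueError("exactly one TIME and one TARGET feature are required")
    if counts["ID"] < 1:
        raise ValueError("at least one ID feature is required")
    if not (counts["KNOWN"] or counts["OBSERVED"] or counts["STATIC"]):
        raise ValueError("at least one KNOWN, OBSERVED, or STATIC feature is required")
    horizon = cfg["example_length"] - cfg["encoder_length"]
    if horizon <= 0:
        raise ValueError("example_length must be larger than encoder_length")
    return horizon


def undersized_cardinalities(cfg, csv_file):
    """Return {column: (declared, observed)} for categorical columns whose CSV
    contains more unique values than the declared cardinality."""
    categorical = {f["name"]: f["cardinality"] for f in cfg["features"]
                   if f["feature_embed_type"] == "CATEGORICAL"}
    seen = {name: set() for name in categorical}
    for row in csv.DictReader(csv_file):
        for name in seen:
            seen[name].add(row[name])
    return {name: (declared, len(seen[name]))
            for name, declared in categorical.items()
            if len(seen[name]) > declared}


# Illustrative values only; not taken from any shipped dataset config.
example_cfg = {
    "encoder_length": 6,
    "example_length": 8,
    "features": [
        {"name": "id", "feature_type": "ID", "feature_embed_type": "CATEGORICAL", "cardinality": 2},
        {"name": "t", "feature_type": "TIME", "feature_embed_type": "CONTINUOUS"},
        {"name": "value", "feature_type": "TARGET", "feature_embed_type": "CONTINUOUS"},
        {"name": "day_of_week", "feature_type": "KNOWN", "feature_embed_type": "CATEGORICAL", "cardinality": 2},
    ],
}
data = io.StringIO("id,t,value,day_of_week\nA,0,1.5,0\nA,1,2.0,1\nB,0,0.7,6\n")
print(check_features(example_cfg))                  # 2
print(undersized_cardinalities(example_cfg, data))  # {'day_of_week': (2, 3)}
```

Here day_of_week takes three distinct values but declares a cardinality of 2, so it is flagged.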
### Adding a new model
Models added to the prediction platform are subject to a few key constraints. The models should be constructed using vanilla PyTorch, and they should handle the forecasting task (support for anomaly detection and classification is planned). Models should expect data to be fed in a sliding window, with tensors grouped by temporal and data type; an example of how this works can be found in data/data_utils.py. In other words, integrated models should expect input in the format described by the feature spec for a specific dataset: a dictionary of tensors grouped by temporal and feature type.
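To make the sliding-window, grouped input format more concrete, here is a minimal pure-Python sketch. It assumes a single series (no ID handling), groups columns by feature type and embedding type, and uses group names such as `k_cat` and `o_cont` purely for illustration. The actual keys and tensor layout are defined in data/data_utils.py and may differ. In a real model, each list of lists would be a tensor.

```python
from collections import defaultdict

# Hypothetical group naming: s/k/o = STATIC/KNOWN/OBSERVED, cat/cont = embed type.
PREFIX = {"STATIC": "s", "KNOWN": "k", "OBSERVED": "o"}
SUFFIX = {"CATEGORICAL": "cat", "CONTINUOUS": "cont"}


def group_columns(features):
    """Map each input group to the list of column names it contains."""
    groups = defaultdict(list)
    for f in features:
        if f["feature_type"] in PREFIX:
            key = PREFIX[f["feature_type"]] + "_" + SUFFIX[f["feature_embed_type"]]
        elif f["feature_type"] == "TARGET":
            key = "target"
        else:
            continue  # ID, TIME, and weight columns are left out of this sketch
        groups[key].append(f["name"])
    return dict(groups)


def sliding_windows(rows, features, example_length, stride=1):
    """Yield one {group: [time step][column]} dict per window of a single series."""
    groups = group_columns(features)
    for start in range(0, len(rows) - example_length + 1, stride):
        window = rows[start:start + example_length]
        yield {key: [[row[c] for c in cols] for row in window]
               for key, cols in groups.items()}


features = [
    {"name": "hour", "feature_type": "KNOWN", "feature_embed_type": "CATEGORICAL"},
    {"name": "temp", "feature_type": "OBSERVED", "feature_embed_type": "CONTINUOUS"},
    {"name": "load", "feature_type": "TARGET", "feature_embed_type": "CONTINUOUS"},
]
rows = [{"hour": h, "temp": 10.0 + h, "load": 2.0 * h} for h in range(5)]
windows = list(sliding_windows(rows, features, example_length=3))
print(len(windows))          # 3
print(windows[0]["target"])  # [[0.0], [2.0], [4.0]]
```

With five time steps and an example length of 3, a stride of 1 yields three overlapping windows; a larger dataset_stride skips starting positions.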
To integrate a model into the TSPP:
1. Enter the Deep Learning Examples repository:
```
cd DeeplearningExamples
```
2. Copy the model files into the Deep Learning Examples PyTorch/Forecasting/ directory:
```
cp -r /PATH/TO/YOUR/MODEL PyTorch/Forecasting/
```
3. Write a configuration file for the model in `DeeplearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform/conf/model`.
This configuration file should reflect the default configuration for your model. Within this file, the `_target_` of the model component should point to your model class. If your model needs additional configuration values that depend on the dataset, create a configuration file named after the model and dataset in `DeeplearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform/conf/model_dataset/{modelname_dataset_name.yaml}`. Examples can be found in the `DeeplearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform/conf/model/tft.yaml` and `DeeplearningExamples/Tools/PyTorch/TimeSeriesPredictionPlatform/conf/model_dataset/tft_traffic.yaml` files.
4. Build and launch the container:
```
cd DeeplearningExamples/Tools/PyTorch
source scripts/setup.sh
docker build -t tspp TimeSeriesPredictionPlatform
docker run -it --rm --ipc=host --network=host --gpus all -v /PATH/TO/YOUR/DATASET/FOLDER/:/workspace/datasets/ tspp bash
```
5. Verify that the model can be run within the TSPP:
```
python launch_tspp.py model={model_name}
```
Some additional values may be needed in this call. For example, if your model requires the Adam optimizer, you will need to append optimizer=Adam to your call.
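The `_target_` key in the model config from step 3 follows the Hydra convention (the checkpoint directories contain a `.hydra` directory), where such nodes are typically built with `hydra.utils.instantiate`. The sketch below is a simplified stand-in written for illustration, not Hydra's implementation: it imports the dotted path and calls the class with the remaining keys as keyword arguments. The model path in the comment is hypothetical, and a standard-library class stands in for a model so the example runs anywhere.

```python
import importlib


def instantiate(node):
    """Import the class named by node["_target_"] and call it with the other keys.

    Ignores nested configs, positional arguments, and every other Hydra feature.
    """
    module_path, _, class_name = node["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    return cls(**kwargs)


# With a (hypothetical) model config such as
#   {"_target_": "models.my_model.MyModel", "hidden_size": 128}
# this would import models.my_model and call MyModel(hidden_size=128).
counts = instantiate({"_target_": "collections.Counter", "a": 2, "b": 1})
print(counts.most_common(1))  # [('a', 2)]
```

This is why the `_target_` path must be importable from inside the container: if the module is not on PYTHONPATH, the import fails before the model is ever constructed.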
## Advanced
The following sections provide more detail on multi-GPU training, exponential moving averaging, hyperparameter search, conversion and deployment, and the available parameters.
### Running multi-GPU experiments
Launching on multi-GPU requires no changes to model code and can be executed as follows within a TSPP container:
```
python -m torch.distributed.run --nproc_per_node={num_GPUS} launch_tspp.py {override parameters} +config.device.world_size={num_GPUS}
```
Statistical models (like AutoARIMA) are not run on GPU, so they do not benefit from multi-GPU acceleration.
### Running experiments with Exponential Moving Averaging
Exponential moving averaging (EMA) is a technique in which, during training, the model weights are accumulated into an exponentially weighted moving average, and that average is used in place of the directly trained weights at test time. Our experiments have found that this technique improves convergence for most of the models and datasets we work with. For background on weight averaging, see Izmailov et al., "Averaging Weights Leads to Wider Optima and Better Generalization" (https://arxiv.org/pdf/1803.05407.pdf).
To activate EMA in the TSPP, specify ema=True in the command-line call at runtime. The decay of the moving average can be set with the config.trainer.ema.decay parameter.
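As a rough illustration of what the decay parameter controls, the sketch below applies the standard EMA update to plain floats. It is an assumption-labeled sketch, not the TSPP trainer code: a real implementation works on model tensors, and the TSPP's exact update (for example, any decay warm-up) may differ. A decay close to 1 makes the average change slowly.

```python
def ema_update(shadow, params, decay):
    """Blend current weights into the running average, in place.

    shadow[name] <- decay * shadow[name] + (1 - decay) * params[name]
    """
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value
    return shadow


shadow = {"w": 0.0}
for _ in range(3):  # three training steps that each leave w at 1.0
    ema_update(shadow, {"w": 1.0}, decay=0.5)
print(shadow["w"])  # 0.875
```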
### Hyperparameter Search
Hyperparameter search can be used to find good, though not necessarily optimal, hyperparameter configurations for a given model or dataset. In the TSPP, hyperparameter search is driven by Optuna.
To launch hyperparameter search, one must first have a base config. One can be generated by running launch_tspp.py with the desired values plus +config.save_config=True and +config.save_path=/path/to/conf.yaml.
Once a config file has been generated at /path/to/conf.yaml, open it and replace each field you want to search over with an Optuna variable config. This Optuna variable config describes the value you are searching on as well as the distribution that value is drawn from.
The possible Optuna sampling objects and their parameters are:
- categorical: samples uniformly from values.
  - values: the values that categorical sampling can take
- int_uniform: samples uniformly from the range specified by (min_value, max_value, step_value)
  - min_value: the minimum value that int_uniform sampling can take
  - max_value: the maximum value that int_uniform sampling can take
  - step_value (optional): the size of the steps between possible samples
- float_uniform: samples uniformly from the range specified by (min_value, max_value)
  - min_value: the minimum value that float_uniform sampling can take
  - max_value: the maximum value that float_uniform sampling can take
- log_uniform: samples from a log-uniform distribution over the range specified by (min_value, max_value)
  - min_value: the minimum value that log_uniform sampling can take
  - max_value: the maximum value that log_uniform sampling can take
- discrete_uniform: samples uniformly from the range specified by (min_value, max_value, step_value)
  - min_value: the minimum value that discrete_uniform sampling can take
  - max_value: the maximum value that discrete_uniform sampling can take
  - step_value (optional): the size of the steps between possible samples
For example, to sample a batch size of either 512 or 1024, replace the batch size object with:
```
batch_size:
  sampling: categorical
  values:
    - 512
    - 1024
```
To sample the learning rate with uniform probability between .1 and 1, replace lr with:
```
lr:
  sampling: float_uniform
  min_value: .1
  max_value: 1.0
```
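The sketch below shows one way the sampling objects listed above could map onto Optuna's `trial.suggest_categorical`, `trial.suggest_int`, and `trial.suggest_float` calls while walking a config tree. It illustrates the idea and is not the code in launch_optuna.py; in particular, mapping log_uniform to `suggest_float(..., log=True)` and discrete_uniform to `suggest_float(..., step=...)` is an assumption. A small stand-in trial object lets the example run without Optuna; with Optuna installed, the real `trial` passed to your objective by `study.optimize` would be used instead.

```python
def suggest(trial, name, spec):
    """Translate one sampling object from the config into an Optuna suggest_* call."""
    kind = spec["sampling"]
    low, high = spec.get("min_value"), spec.get("max_value")
    if kind == "categorical":
        return trial.suggest_categorical(name, spec["values"])
    if kind == "int_uniform":
        return trial.suggest_int(name, low, high, step=spec.get("step_value", 1))
    if kind == "float_uniform":
        return trial.suggest_float(name, low, high)
    if kind == "log_uniform":
        return trial.suggest_float(name, low, high, log=True)
    if kind == "discrete_uniform":
        return trial.suggest_float(name, low, high, step=spec.get("step_value"))
    raise ValueError(f"unknown sampling type: {kind}")


def sample_config(trial, node, path=""):
    """Walk a config tree and replace every sampling object with a sampled value."""
    if isinstance(node, dict):
        if "sampling" in node:
            return suggest(trial, path, node)
        return {key: sample_config(trial, value, f"{path}.{key}" if path else key)
                for key, value in node.items()}
    return node


class LowestValueTrial:
    """Stand-in for optuna.trial.Trial that always returns the first/lowest value."""

    def suggest_categorical(self, name, choices):
        return choices[0]

    def suggest_int(self, name, low, high, step=1):
        return low

    def suggest_float(self, name, low, high, step=None, log=False):
        return low


cfg = {
    "trainer": {"batch_size": {"sampling": "categorical", "values": [512, 1024]}},
    "optimizer": {"lr": {"sampling": "float_uniform", "min_value": 0.1, "max_value": 1.0}},
}
print(sample_config(LowestValueTrial(), cfg))
# {'trainer': {'batch_size': 512}, 'optimizer': {'lr': 0.1}}
```

Using the dotted config path as the Optuna parameter name keeps parameters unambiguous when several components have a field with the same name.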
Once all desired values have been replaced with Optuna objects, add an Optuna field at the bottom of the config with a subfield n_trials, which sets how many Optuna trials to run, and optionally a description of the Optuna sampler to use.
Once this config file is saved, you can run python launch_optuna.py --config_path /path/to/conf.yaml. This script attempts to use all visible GPUs. Currently, we do not support using a different number of GPUs for separate searches, so the world_size config field must be an integer rather than a list. In addition, multi-process data loading is not supported during parameter searches, so num_workers is set to 0. The number of concurrent trials equals the number of GPUs divided by the fixed world size, rounded down. Outputs are still saved to /workspace/outputs/{DATE}/{TIME}/.
Each concurrent worker runs n_trials trials independently, but all results are recorded in the same Optuna study. This means that if 4 subprocesses are launched with 10 trials specified in the config, 40 trials are run in total. Optuna always runs n_trials trials and does not necessarily cover every possible configuration, even when the search space is finite. For example, if you ran 4 trials where the only Optuna object being optimized is a categorical with 3 values, not all 3 values would necessarily occur.
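The concurrency rule above is simple arithmetic. This hypothetical helper (not part of the TSPP) makes it explicit and reproduces the 40-trial example, assuming 8 visible GPUs and a world size of 2.

```python
def search_plan(num_gpus, world_size, n_trials):
    """Return (concurrent workers, total trials) for a search under the rules above."""
    if not isinstance(world_size, int):
        raise TypeError("world_size must be a single integer, not a list")
    concurrent = num_gpus // world_size  # floor of GPUs / world size
    if concurrent == 0:
        raise ValueError("world_size exceeds the number of visible GPUs")
    return concurrent, concurrent * n_trials


print(search_plan(num_gpus=8, world_size=2, n_trials=10))  # (4, 40)
print(search_plan(num_gpus=8, world_size=3, n_trials=10))  # (2, 20)
```

In the second call, two of the eight GPUs stay idle because 8 is not a multiple of 3.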
### Conversion, Deployment, and Inference
Inference takes place after a model has been trained, when one wants to run data through it. Since this only requires the forward function, the model can be optimized and converted to many different formats that perform the forward pass more efficiently. In addition, one can set up an [NVIDIA Triton inference server](https://github.com/triton-inference-server/server), which allows a continuous stream of data to be passed through the model. The server provides an inference service via HTTP and gRPC endpoints on ports 8000 and 8001, respectively, on the “bridge” docker network.
The TSPP supports several kinds of inference, including native inference and NVIDIA Triton deployment. Both use the test_forward function specified in the model config (which defaults to forward()) as the forward function.
To launch native inference, one must have a checkpoint directory from a TSPP training call that includes a .hydra directory and a best_checkpoint.pth.tar file. Then run:
```
python launch_inference.py device={device} config.evaluator.checkpoint=/path/to/checkpoint/directory
```
Note: Do not confuse the checkpoint directory with the TimeSeriesPredictionPlatform/outputs/ directory. The directory to use in the inference call is two levels lower (for example, /path/to/TimeSeriesPredictionPlatform/outputs/2021-08-23/03-03-11/).
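A quick way to tell a run directory from the outputs/ root is to check for the files this section says inference needs. The helper below is hypothetical (not part of the TSPP) and only looks for `.hydra/config.yaml` and `best_checkpoint.pth.tar`; pointing it at the outputs/ root by mistake reports both as missing.

```python
import os
import tempfile


def checkpoint_dir_problems(path):
    """List the files launch_inference.py needs that are missing from `path` (sketch)."""
    problems = []
    if not os.path.isfile(os.path.join(path, ".hydra", "config.yaml")):
        problems.append(".hydra/config.yaml")
    if not os.path.isfile(os.path.join(path, "best_checkpoint.pth.tar")):
        problems.append("best_checkpoint.pth.tar")
    return problems


with tempfile.TemporaryDirectory() as run_dir:  # stands in for outputs/{date}/{time}
    os.makedirs(os.path.join(run_dir, ".hydra"))
    open(os.path.join(run_dir, ".hydra", "config.yaml"), "w").close()
    print(checkpoint_dir_problems(run_dir))  # ['best_checkpoint.pth.tar']
```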
The device argument selects the device on which the model is built and run. Multi-GPU inference launches are not supported. By default, the evaluator uses the configs in the .hydra/config.yaml file of the checkpoint directory; one can override them by including them in the launch. For example, to compute only the MAE and RMSE metrics and run on the CPU:
```
python launch_inference.py device=cpu config.evaluator.checkpoint=/path/to/checkpoint/directory "+config.evaluator.metrics=[MAE, RMSE]"
```
Note: Be sure to include the + when overriding any of the evaluator configs.
Before continuing to the next section, make sure that the TSPP container is launched from the TSPP directory with the following arguments:
```
docker run -it --rm --gpus all --ipc=host --network=host -v /your/datasets/:/workspace/datasets/ -v /your/outputs/:/your/outputs/ -v $(pwd):$(pwd) -v /your/outputs/:/workspace/outputs/ -v /var/run/docker.sock:/var/run/docker.sock tspp
```
In the previous command, five directories are mounted. The datasets are mounted to the usual location, but the outputs are mounted at two locations. Mounting the outputs to /workspace/outputs/ allows regular training calls to save into your output directory. The second output mount uses the same path inside the container as on the host. This is essential because of the way we deploy to NVIDIA Triton: the output directory path inside the container must match the path on the host machine. Additionally, the /var/run/docker.sock mount allows the TSPP container to launch another container, in our case the NVIDIA Triton server. In subsequent calls to launch_deployment.py, /path/to/checkpoint/directory/ must be an absolute path of the form /your/outputs/{checkpoint_dir} rather than /workspace/outputs/{checkpoint_dir}. From testing, the best output directory to use appears to be TSPP/outputs.
Finally, note that to run the deployment script, you must be in the same directory path in the container as the TSPP is stored on your machine. This means that simply being in /workspace in the container may not work for running the deployment. If outside the container your TimeSeriesPredictionPlatform is at /home/user/TimeSeriesPredictionPlatform, you must be at the same path in your docker container (/home/user/TimeSeriesPredictionPlatform). This is the purpose of the -v $(pwd):$(pwd) in the run script.
To launch conversion and deployment, one must again have a checkpoint directory from a TSPP training call that includes a .hydra directory and a best_checkpoint.pth.tar file. In addition, the model being converted must already support conversion to the required format. In the current version of the TSPP, we first export the model to either TorchScript-Script or TorchScript-Trace and then convert it to TorchScript, Onnx, or TRT using the model-navigator package. We also support export to Onnx followed by conversion to either Onnx or TRT. To run:
```
python launch_deployment.py export={ts-trace, ts-script, onnx} convert={torchscript, onnx, trt} config.evaluator.checkpoint=/path/to/checkpoint/directory
```
The format mapping is listed below:
- TorchScript-Script: `ts-script`
- TorchScript-Trace: `ts-trace`
- TorchScript: `torchscript`
- Onnx: `onnx`
- TRT: `trt`
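The export and conversion combinations described above, plus the rule that the trt accelerator only works with the Onnx conversion, can be checked before launching. The following is a hypothetical pre-flight helper, not part of the TSPP; it only builds the argument list for launch_deployment.py and rejects combinations this section does not list. The example path uses the directory format shown earlier in this README.

```python
import os
import shlex

# Export/convert pairs as described in this section (hand-written, not read from the TSPP).
SUPPORTED_CONVERSIONS = {
    "ts-script": {"torchscript", "onnx", "trt"},
    "ts-trace": {"torchscript", "onnx", "trt"},
    "onnx": {"onnx", "trt"},
}


def deployment_command(checkpoint_dir, export, convert, accelerator="none"):
    """Build a launch_deployment.py argument list after basic sanity checks."""
    if convert not in SUPPORTED_CONVERSIONS.get(export, set()):
        raise ValueError(f"export={export} cannot be converted to {convert}")
    if accelerator == "trt" and convert != "onnx":
        raise ValueError("accelerator=trt only works with convert=onnx")
    if not os.path.isabs(checkpoint_dir):
        raise ValueError("the checkpoint directory must be an absolute path")
    args = ["python", "launch_deployment.py", f"export={export}", f"convert={convert}",
            f"config.evaluator.checkpoint={checkpoint_dir}"]
    if accelerator != "none":
        args.append(f"config.inference.accelerator={accelerator}")
    return args


cmd = deployment_command("/your/outputs/2021-08-23/03-03-11", "ts-trace", "trt")
print(shlex.join(cmd))
# python launch_deployment.py export=ts-trace convert=trt config.evaluator.checkpoint=/your/outputs/2021-08-23/03-03-11
```

The returned list can be passed directly to `subprocess.run` from inside the container.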
Note that the conversions do not support the apex fused LayerNorm library. To work around this, we set the environment variable `TFT_SCRIPTING` to True when loading the model for deployment, which replaces the apex LayerNorm with the vanilla torch LayerNorm.
As with native inference, one can override the evaluator configs. In addition, one can select the batch size and precision of the conversion using config.inference.batch_size and config.inference.precision=Choice[fp32, fp16], respectively. Once export and conversion are done, the results are stored in /path/to/checkpoint/directory/deployment. The NVIDIA Triton config for the converted models is then generated in the /path/to/checkpoint/directory/deployment/navigator_workspace/model-store/ directory. Next, an NVIDIA Triton server is launched in docker from this directory, and inference is run through NVIDIA Triton. Finally, the outputs of this inference are used to calculate the metrics. The inference outputs and the metric results are stored in a new output directory, TimeSeriesPredictionPlatform/outputs/{date}/{time}/, named after the date and time of launch. Within this directory, the metrics are stored in metrics.csv, and the raw inference outputs are stored in the raw/ directory. The NVIDIA Triton model name is the second component of the model's module path. For example, for our TFT model, whose path is models.tft_pyt.TemporalFusionTransformer, the NVIDIA Triton model name is tft_pyt.
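The naming rules in the previous paragraph can be expressed in a few lines. This is an illustrative sketch with hypothetical helper names, not TSPP code: one function takes the second component of the model's dotted path, and the other builds a date/time output directory in the 2021-08-23/03-03-11 format shown earlier in this README.

```python
from datetime import datetime
from pathlib import PurePosixPath


def triton_model_name(target):
    """Second component of the model's dotted path, e.g. models.tft_pyt.X -> tft_pyt."""
    parts = target.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected a path like package.module.Class, got {target!r}")
    return parts[1]


def run_output_dir(root, launched_at):
    """Build root/{date}/{time} using the YYYY-MM-DD/HH-MM-SS naming seen in this README."""
    return PurePosixPath(root) / launched_at.strftime("%Y-%m-%d") / launched_at.strftime("%H-%M-%S")


print(triton_model_name("models.tft_pyt.TemporalFusionTransformer"))  # tft_pyt
print(run_output_dir("TimeSeriesPredictionPlatform/outputs", datetime(2021, 8, 23, 3, 3, 11)))
# TimeSeriesPredictionPlatform/outputs/2021-08-23/03-03-11
```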
An additional deployment option is choosing between the basic pipeline of conversion and NVIDIA Triton config creation, and the full pipeline of conversion, NVIDIA Triton config creation, profiling, analysis, and helm chart creation. Setting config.inference.optimize=True at launch switches to the full pipeline. Another part of optimization is setting the backend accelerator for NVIDIA Triton config generation: config.inference.accelerator=Choice[none, trt] changes the accelerator specified. This defaults to none, and trt is only compatible with the Onnx conversion. To launch the NVIDIA Triton inference server on a specific GPU, specify its CUDA device index with config.inference.gpu, which defaults to 0.
More information on the conversion is located here:
https://github.com/triton-inference-server/model_navigator/blob/main/docs/conversion.md
More information on the NVIDIA Triton config creation is located here: https://github.com/triton-inference-server/model_navigator/blob/main/docs/triton_model_configurator.md
More information on the full pipeline is located here:
https://github.com/triton-inference-server/model_navigator/blob/main/docs/run.md
If one only wants to run the latter part of the launch_deployment script, which includes the NVIDIA Triton server initialization, inference, and metrics calculation, set the option config.inference.skip_conversion=True at launch. The call still requires the checkpoint directory and for that directory to be set up in the same format as the result for a regular launch_deployment call (contains a deployment/navigator_workspace/model-store/ directory with the NVIDIA Triton models).
For this option of skipping the conversion, there is a config option +config.inference.model_name, which can be set to the NVIDIA Triton model name. This does not set the name of the model, but rather selects which of the possible models in the model-store directory will be used for inference. This is useful after a call using the optimize option, which can generate multiple different models in the model-store.
If one only wants to launch the NVIDIA Triton server and keep it live, set config.inference.just_deploy=True at launch. As with skipping conversion, the checkpoint directory is still required and must conform to the format for the NVIDIA Triton models. This option does not run inference or perform any other action; it only starts the NVIDIA Triton server with the given models.
For both the launch_inference and launch_deployment one can specify what dataset and target_scalers to use (if any) as long as the data shapes do not conflict with the already trained model. To specify a dataset directory use +config.inference.dataset_dir=/path/to/dataset. The dataset directory must contain a composite_scaler.bin file as well as either train.bin/valid.bin/test.bin or train.csv/valid.csv/test.csv depending on the configuration option config.dataset.binarized (this option cannot be changed during deployment or inference). Once the path has been set, deployment and inference both use the test dataset.
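The dataset directory requirements above can be checked before a long deployment run. The helper below is hypothetical (not part of the TSPP); the `binarized` argument should match the config.dataset.binarized value the model was trained with.

```python
import os
import tempfile


def missing_dataset_files(dataset_dir, binarized):
    """Return the files +config.inference.dataset_dir needs but does not contain."""
    ext = ".bin" if binarized else ".csv"
    required = ["composite_scaler.bin"] + [f"{split}{ext}" for split in ("train", "valid", "test")]
    return [name for name in required if not os.path.isfile(os.path.join(dataset_dir, name))]


with tempfile.TemporaryDirectory() as d:
    for name in ("composite_scaler.bin", "test.csv"):
        open(os.path.join(d, name), "w").close()
    print(missing_dataset_files(d, binarized=False))  # ['train.csv', 'valid.csv']
```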
Our TFT model supports export to TorchScript-Trace and conversion to all formats.
If you encounter an error such as
```
RuntimeError: Model tft_pyt:1 is not ready
```
Or
```
ERROR root Exception in callback <function InferenceServerClient.async_infer.<locals>.wrapped_callback at 0x7f9437b469d0>: AttributeError("'InferenceServerException' object has no attribute 'get_response'")
```
There are a few possible causes. First, make sure that the TSPP docker container was launched with the network argument set to host. Next, run `docker ps`; if the container `trt_server_cont` is listed, stop it with `docker stop trt_server_cont`, then rerun the command. If neither step applies or the problem persists, the issue is more specific and requires further debugging.
### Parameters
Parameters for each individual component are stored in
```
/workspace/conf/{component_type}/{component_name}.yaml
```
For example, the default parameters for TFT are stored in
```
/workspace/conf/model/tft.yaml
```
For component selection, the options are:
- **dataset**: `electricity`, `traffic`
- **model**: `tft`, `auto_arima`, `trivial_model`
- **criterion**: `GLL`, `MSE`, `quantile`
- **device**: `cuda`, `cuda_8GPU`, `cpu`
- **optimizer**: refer to `/workspace/conf/optimizer`
- **ema**: `True`; this is assumed False by default.
- **amp**: `True`; this is assumed False by default.
If a parameter does not exist in the config, you must prepend `+` to its reference in the command line call. For example, `+config.evaluator.target_scalers=...` adds target_scalers to config.evaluator, but config.evaluator.target_scalers=... raises an error.
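The `+` rule can be modeled on a nested dictionary. The sketch below is a simplified illustration of Hydra-style override behavior, not Hydra's parser: values are kept as strings (Hydra also parses lists and numbers), a plain key must already exist, and a `+` key must not exist yet and is created along with any missing parents.

```python
def apply_override(cfg, override):
    """Apply one 'key.path=value' override to a nested dict (values stay strings)."""
    adding = override.startswith("+")
    dotted, _, value = override[1:].partition("=") if adding else override.partition("=")
    *parents, leaf = dotted.split(".")
    node = cfg
    for part in parents:
        # '+' may create missing parents; a plain override requires them to exist.
        node = node.setdefault(part, {}) if adding else node[part]
    if adding and leaf in node:
        raise KeyError(f"{dotted} already exists; drop the '+'")
    if not adding and leaf not in node:
        raise KeyError(f"{dotted} is not in the config; prefix it with '+'")
    node[leaf] = value
    return cfg


cfg = {"config": {"evaluator": {"checkpoint": None}}}
apply_override(cfg, "config.evaluator.checkpoint=/path/to/checkpoint/directory")
apply_override(cfg, "+config.inference.model_name=tft_pyt")
print(cfg)
# {'config': {'evaluator': {'checkpoint': '/path/to/checkpoint/directory'}, 'inference': {'model_name': 'tft_pyt'}}}
```

Calling `apply_override(cfg, "config.evaluator.target_scalers=x")` on this dictionary raises a KeyError, mirroring the error described above.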
Parameters that are not specific to an individual component are listed below. They are listed hierarchically; for example, the config has an attribute `trainer`, which has an attribute `num_epochs` that controls the length of training:
- `config.log_path`: where to save your logs
- `config.trainer.batch_size`: the batch size to use
- `config.trainer.num_workers`: the number of workers to use for data loading
- `config.trainer.num_epochs`: the number of epochs to train the model for
- `config.trainer.AMP`: whether to enable AMP for accelerated training
- `config.dataset.source_path`: where the original file (before preprocessing) is stored
- `config.dataset.dest_path`: the directory to which the preprocessed dataset is saved and from which it is read
- `config.dataset.time_ids`: the feature on which to split the dataset into `train`, `valid`, and `test`
- `config.dataset.train_range`: the range of the time feature that represents the `train` set
- `config.dataset.valid_range`: the range of the time feature that represents the `validation` set
- `config.dataset.test_range`: the range of the time feature that represents the `test` set
- `config.dataset.dataset_stride`: the stride to use when creating the dataset
- `config.dataset.scale_per_id`: whether to scale each series based on its own statistics (`True`) or on statistics across all series (`False`)
- `config.dataset.encoder_length`: the length of past data that is fed to the model
- `config.dataset.example_length`: the length of the full example passed to the model. The length of the prediction horizon is the difference between the example and encoder lengths
- `config.dataset.features`: the features that the model uses
- `config.dataset.train_samples`: the number of examples to sample for the `train` dataset from the `train` partition
- `config.dataset.valid_samples`: the number of examples to sample for the `validation` dataset from the `validation` partition
- `config.dataset.binarized`: whether to use a binarized dataset for speedup
- `config.device.world_size`: the number of GPUs the launcher runs on
- `config.optimizer.gradient_norm`: the maximum gradient norm allowed by gradient clipping
- `config.optimizer.lr`: the learning rate to use for the optimizer

NOTE: Any optimizer from `torch.optim` can be used, and any of its keyword arguments can be set by adding the corresponding attribute to `config.optimizer`.

- `config.evaluator.use_weights`: whether to weight metrics by the weights specified in the input. Note: there must be a `WEIGHT` feature in `config.dataset.features` for this option to work
- `config.evaluator.target_scalers`: scalers used to unscale targets so that metrics are calculated on non-normalized predictions and targets
- `config.evaluator.output_selector`: selects which output to use if the model has multiple outputs per time step (quantiles, for example)
- `config.evaluator.label_selector`: selects which label to use if the labels have multiple values per time step
- `config.evaluator.precision`: the precision to which the output metrics are formatted
- `config.evaluator.metrics`: a list of metrics to calculate on the test set
- `config.evaluator.checkpoint`: path to the checkpoint directory containing the checkpoint to load for inference or deployment
- `config.inference.batch_size`: the batch size to use for inference or deployment
- `config.inference.precision`: the precision of the exported model
- `config.inference.optimize`: setting to True runs the model-navigator run script instead of only the convert and triton-config-model steps
- `config.inference.skip_conversion`: during deployment, skips the export, conversion, and configuration steps; instead, starts the inference server, runs inference, and calculates metrics
- `config.inference.just_deploy`: starts the NVIDIA Triton server based on the NVIDIA Triton model specified in the checkpoint directory
- `config.inference.dataset_dir`: overrides the default dataset path
- `config.inference.model_name`: uses the model listed under this name when deploying to the NVIDIA Triton server. This does not change the default name assigned to the models in the model-store directory
- `config.inference.accelerator`: switches the backend accelerator in the triton-config-model step of the process
- `config.inference.gpu`: uses the GPU at this CUDA index when launching the NVIDIA Triton inference server
## Release Notes
We're constantly refining and improving our performance on AI and HPC workloads, even on the same hardware, with frequent updates to our software stack. For our latest performance data, refer to these pages for [AI](https://developer.nvidia.com/deep-learning-performance-training-inference) and [HPC](https://developer.nvidia.com/hpc-application-performance) benchmarks.
### Changelog
November 2021
- Initial release
### Known issues
There are no known issues with this tool.
