Commit graph

43 commits

Author | SHA1 | Date

PrzemekS | af32ac2a23 | 2020-05-07 11:09:20 +02:00
  Merge pull request #456 from NVIDIA/sharathts-patch-1
  Fix to load Google's checkpoint

Sharath T S | 4733603577 | 2020-04-20 21:02:18 -07:00
  [BERT/PyT] Fix squad inference corner case (#462)

Sharath T S | a9c997cd57 | 2020-04-13 11:34:56 -07:00
  Fix to load Google's checkpoint

Sharath T S | 5626846924 | 2020-04-07 10:28:45 -07:00
  [BERT/PyT] Revert from native gelu. Breaks ONNX export. (#447)

Przemek Strzelczyk | 26c2676104 | 2020-04-02 14:39:24 +02:00
  [BERT/PyT] Triton Inference Server support

Sharath T S | 793b92dca7 | 2020-03-15 23:03:06 -07:00
  [BERT/PyT] fp32 and allreduce_post_accumulation compatibility (#422)

Przemek Strzelczyk | 96ff411ce8 | 2020-03-05 09:54:16 +01:00
  [BERT/PyT] Typo in README

Przemek Strzelczyk | 155578a762 | 2020-02-28 13:21:20 +01:00
  [BERT/PyT] New logging and some README updates

PrzemekS | ce73b32068 | 2020-02-06 19:47:30 +01:00
  Merge pull request #392 from NVIDIA/nvpstr/1def26
  Updating BERT/TF, Transformer-XL and NCF/PyT

Przemek Strzelczyk | a38deff61e | 2020-02-05 22:38:46 +01:00
  [Transformer-XL/PyT] Large model support; multi-node training; inference with TorchScript

Sharath T S | ad88003e13 | 2020-02-03 16:27:11 -08:00
  [BERT/PyT] Glue(MRPC) fine-tuning with LAMB pretrained checkpoint
  * LAMB checkpoint compatibility
  * LAMB checkpoint compatibility; amp training

Sharath T S | 119838f1f6 | 2020-01-30 20:11:03 -08:00
  Bugfix in BertAdam for fp32 finetuning (#388)

PrzemekS | aa061052c6 | 2020-01-02 14:35:59 +01:00
  Merge pull request #345 from nvcforster/master
  Updating BERT Readme Docs

Hyungon_Ryu | fb2709e002 | 2019-12-16 10:06:42 +01:00
  [Transformer-XL/PyT] Update train.py (#348)
  bug fix for inv_sqrt scheduler

Chris Forster | 67b7543feb | 2019-12-13 12:51:41 -08:00
  Update README.md
  Adding link to our Medium.com article that provides details about implementing the LAMB optimizer.

PrzemekS | dc63c016cf | 2019-12-04 15:30:13 +01:00
  Merge pull request #312 from sharathts/patch-6
  Fix case with one training shard only

Przemek Strzelczyk | ca28f55476 | 2019-11-28 09:48:59 +01:00
  [Transformer-XL/PyT] renaming folders

Przemek Strzelczyk | 3d46067af9 | 2019-11-27 17:00:18 +01:00
  Adding TransformerXL/PyT

Sharath T S | 657874ae09 | 2019-11-20 10:52:32 -08:00
  Fix case with one training shard only

Przemek Strzelczyk | a70896405d | 2019-11-18 23:07:24 +01:00
  Updating BERT/PyT
  * Use LAMB from APEX
  * Code cleanup
  * Bug fix in BertAdam optimizer

Sharath T S | 78e97e324e | 2019-10-25 15:52:44 -07:00
  Fix incorrect perf numbers

Sharath T S | f24491a940 | 2019-10-18 17:28:02 -07:00
  fix single gpu support

Sharath T S | 7121e21d11 | 2019-10-16 15:12:00 -07:00
  fix logging total steps

Sharath T S | 8cc635f638 | 2019-10-15 17:39:39 -07:00
  Fix training perf calculation

Przemek Strzelczyk | 8b249efad6 | 2019-09-13 15:23:39 +02:00
  Minor fixes to BERT/PyT

Przemek Strzelczyk | 6fe463fe27 | 2019-09-10 17:21:52 +02:00
  [BERT/PyT] Support for multi-node

Chris Forster | 71e2b22d4a | 2019-08-29 21:49:02 +02:00
  Update bertPrep.py (#183)

Chris Forster | e72ea6947b | 2019-08-29 07:21:53 +02:00
  BERT-PyT subprocess for bzip in wikidownloader (#180)
  * Removing unnecessary subprocess.communicate calls
  * Updating Bookscorpus downloader to require less memory
  * Renaming variable

Chris Forster | 3d3ff3e168 | 2019-08-27 21:44:21 +02:00
  Cleanup and Readme Update (#174)
  * update perf tables
  * remove ide files
  * fix tokenizer
  * copyrights
  * remove .communicate()
  * refine training scripts
  * fix more typos

Sharath T S | 3d59216cec | 2019-08-22 07:52:18 +02:00
  [BERT] [PyTorch] Data prep fix (#171)
  * add dgx1-16g and dgx2 specific pretraining instructions
  * fix typo in readme
  * fix data prep and reflect changes in pretraining
  * remove .ide files
  * remove data files
  * Point to right SQUAD location
  * remove garbage [[]]
  * default accumulation in fp32
  * remove ide files
  * fix phase2 DATADIR path
  * remove readme in data folder

Sharath T S | b6fb9aa463 | 2019-08-21 09:49:32 +02:00
  [BERT][PyTorch]: add dgx1-16g and dgx2 specific pretraining instructions (#164)
  * add dgx1-16g and dgx2 specific pretraining instructions
  * fix typo in readme

nv-kkudrynski | 9f7616dc54 | 2019-08-14 13:30:37 +02:00
  minor readme fix

Cliff Woolley | b7bf42d76c | 2019-08-13 16:12:52 -07:00
  Update README.md
  Fix typo

Cliff Woolley | 608663f6ec | 2019-08-13 15:41:48 -07:00
  Don't omit the data/ scripts from docker build

Cliff Woolley | 8546c7a6df | 2019-08-13 15:33:32 -07:00
  Cleanups

Cliff Woolley | 7afcd73af1 | 2019-08-13 15:32:00 -07:00
  Cleanup unneeded files

Krzysztof Kudrynski | bae6e931bd | 2019-08-13 23:27:54 +02:00
  updating BERT (single node LAMB support)

sharatht | 803963408a | 2019-08-02 22:28:02 -07:00
  remove directory check in data download

Przemek Strzelczyk | 8218872051 | 2019-07-25 16:53:05 +02:00
  Updating BERT with TRT-IS support and new results

yzhang123 | 0af34d778c | 2019-07-24 12:24:43 -07:00
  fix launch.sh

yzhang123 | 2eb764b43c | 2019-07-24 12:23:34 -07:00
  fix build.sh

Przemek Strzelczyk | a644350589 | 2019-07-16 21:13:08 +02:00
  Updating models and adding BERT/PyT

  Tacotron2+Waveglow/PyT
  * AMP support
  * Data preprocessing for Tacotron 2 training
  * Fixed dropouts on LSTMCells

  SSD/PyT
  * script and notebook for inference
  * AMP support
  * README update
  * updates to examples/*

  BERT/PyT
  * initial release

  GNMT/PyT
  * Default container updated to NGC PyTorch 19.05-py3
  * Mixed precision training implemented using APEX AMP
  * Added inference throughput and latency results on NVIDIA Tesla V100 16G
  * Added option to run inference on user-provided raw input text from command line

  NCF/PyT
  * Updated performance tables.
  * Default container changed to PyTorch 19.06-py3.
  * Caching validation negatives between runs

  Transformer/PyT
  * new README
  * jit support added

  UNet Medical/TF
  * inference example scripts added
  * inference benchmark measuring latency added
  * TRT/TF-TRT support added
  * README updated

  GNMT/TF
  * Performance improvements

  Small updates (mostly README) for other models.

Przemek Strzelczyk | 0663b67c1a | 2019-07-08 22:51:28 +02:00
  Updating models