Commit graph

3936 commits

Author SHA1 Message Date
bene-ges 5b603fb80c
typos (#2989)
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
2021-10-11 14:48:25 -07:00
Carol Anderson 7de97d71c0
update zero shot intent model (#2977)
* update zero shot intent model

Signed-off-by: Carol Anderson <carola@nvidia.com>

* remove from_pretrained from TextClassificationModel

Signed-off-by: Carol Anderson <carola@nvidia.com>
2021-10-11 12:45:34 -07:00
Boris Fomitchev 1a75dc5230
Fixing BERT export and ORT check (#2965)
* Fixing BERT export and ORT check

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Fixed test

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing code review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
2021-10-08 17:46:08 -07:00
Jason be7114e2d9
Update README.rst (#2973)
Signed-off-by: Jason <jasoli@nvidia.com>
2021-10-08 11:33:11 -06:00
Jagadeesh Balam 9c26f5d533
Narrowband augmentation for ASR models (#2946)
* Added narrowband augmentation on spectrogram, added ogg codec to transcodeperturbation

Signed-off-by: jbalam <jbalam@nvidia.com>

* Changes to AudioToMelSpectrogramProcessor for nb augmentation

Signed-off-by: jbalam <jbalam@nvidia.com>

* Minor clean up

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* removed unused arguments

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* style fix

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added new arguments to config

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>

* Fixes to config changes causing test failures

Signed-off-by: jbalam <jbalam@nvidia.com>

* changed check for applying attenuation

Signed-off-by: jbalam-nv <4916480+jbalam-nv@users.noreply.github.com>
2021-10-07 15:37:26 -07:00
Fedor 5ec2c5c18e
reading max_sequence_len parameter from config fixed (#2961)
Signed-off-by: Fedor Streltsov <sfeaal@gmail.com>
2021-10-07 11:30:27 -07:00
Micha Livne 6796faa62e
1. Fixing undeclared variables. (#2939)
Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
2021-10-07 03:45:25 -04:00
Sandeep Subramanian f7b6f14fa4
Rename neural machine translation to text2sparql (#2955)
* Rename neural machine translation to text2sparql

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Import fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
2021-10-06 17:03:00 -07:00
Eric Harper 91fd9ea970
Merge final doc and bug fixes from r1.4.0 to main (#2952)
* update branch for jenkinsfile and dockerfile

Signed-off-by: ericharper <complex451@gmail.com>

* Typos (#2884)

* segmentation tutorial fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* data fixes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Minor Fixes (#2922)

* typo

Signed-off-by: Jason <jasoli@nvidia.com>

* remove notebook from docs

Signed-off-by: Jason <jasoli@nvidia.com>

* Adding Conformer-Transducer docs. (#2920)

* added Conformer-Transducer docs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Added contextnet.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed the title.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Fix numba spec augment for cases where batch size > MAX_THREAD_BUFFER (#2924)

* Fix numba spec augment for cases where batch size > MAX_THREAD_BUFFER

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert print in test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update readme for r1.4.0 (#2927)

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* Updated readme for r1.4.0.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* New NMT Models (#2925)

* New pretrained models

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update NMT docs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Eric Harper <complex451@gmail.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* revert

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
2021-10-06 08:21:54 -06:00
tbartley94 d8924ffb2c
Itn fr (#2947)
* typos (#2909)

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Updated docs (#2911)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Nmt encoder decoder hidden size fix (#2856)

* 1. Enabled encoder/decoder with different size in bottleneck architecture.
2. Validating encoder/decoder with the same size in non-bottleneck parent class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added hidden_size to error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing defaults.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixing CI tests to have same hidden_size.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating Jenkins CI test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating CI to hidden=48

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size when loading pre-trained huggingface model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size in config for pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated missing hidden_size in config.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Jenkinsfile test values.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating missing arguments for Jenkinsfile test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* First commit. French ITN grammars for tagger and verbalizer. Test for French inverse_normalize added to tests. inverse_text_normalize updated to allow 'fr' tag. tools/text_processing/deployment/pynini_export.py updated to accept 'fr' tag. All CI tests for grammars passed.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Ran style checker.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixed bug causing ordinals to fail sparrowhawk test when verbalizing as roman numbers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* style change for verbalizer/ordinal.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Add DALI dataset unit test (#2904)

Signed-off-by: Joaquin Anton <janton@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Delete test.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Cleaning up unused import spaces for lgtm check.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* taggers/time.py missed style checker

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr lacked an __init__ file

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Merge r1.4 bugfixes to main (#2918)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* update branch for jenkinsfile and dockerfile

Signed-off-by: ericharper <complex451@gmail.com>

* Adding conformer-transducer models. (#2717)

* added the models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added contextnet models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added german and chinese models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fix the abs_pos of conformer. (#2863)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* update to match sde (#2867)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* updated german ngc model (#2871)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Lower bound PTL to safe version (#2876)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebooks with onnxruntime (#2880)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Upperbound PTL (#2881)

Signed-off-by: smajumdar <titu1994@gmail.com>

* minor typo and broken link fixes (#2883)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove numbers from TTS tutorial names (#2882)

* Remove numbers from TTS tutorial names

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* Update documentation links

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* Typos (#2884)

* segmentation tutorial fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* data fixes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* updated the messages in eval_beamsearch_ngram.py. (#2889)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* style (#2890)

Signed-off-by: Jason <jasoli@nvidia.com>

* Fix broken link (#2891)

* fix broken link

Signed-off-by: fayejf <fayejf07@gmail.com>

* more

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update sclite eval for new transcription method (#2893)

* Update sclite to use updated inference

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove WER

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update sclite script to use new inference methods

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove hub 5

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix TransformerDecoder export - r1.4 (#2875)

* export fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* embedding pos

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* remove bool param

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* changes

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Update Finetuning notebook (#2906)

* update notebook

Signed-off-by: Jason <jasoli@nvidia.com>

* rename

Signed-off-by: Jason <jasoli@nvidia.com>

* rename

Signed-off-by: Jason <jasoli@nvidia.com>

* revert branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fix several bugs in punctuation and capitalization inference and make minor improvements (#2905)

* Add save labels arg to method and remove device setting

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix device bug and reading plain text bug

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Make minor improvements

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Remove excess parameter

Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* add fix to not add dot everywhere (#2885)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright wording and adding whitelisting for titles.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright headers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* copyright header change (missed whitelist)

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Edited export_grammars.sh notes to include 'fr'. Made verbalizer/decimal.py rewrite class part of main class instead.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Adjusting copyright headers for tests.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr/__init__ copyright header

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* adding __init__ file to fr/data

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* TN infer (#2929)

* en_small grammars added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* infer fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist arg

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input fall back

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstring

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Experiment manager step timing (#2936)

* 1. Enabled encoder/decoder with different size in bottleneck architecture.
2. Validating encoder/decoder with the same size in non-bottleneck parent class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added hidden_size to error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing defaults.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixing CI tests to have same hidden_size.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating Jenkins CI test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating CI to hidden=48

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size when loading pre-trained huggingface model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size in config for pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated missing hidden_size in config.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Jenkinsfile test values.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating missing arguments for Jenkinsfile test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added a generic timer class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Renamed file.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. exp_manager timing of train/val/test using callbacks is ready.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Step timing hooks are tested. Logging does not record values due to a bug (should be solved with upgraded ptl)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added workaround hooks to MTEncDecModel.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed logging issue. All NeMo models support timing.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed unused timer object.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added missing copyright.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. NamedTimer supports multiple reductions.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed leftover file.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating code to latest.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated docstring.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added exp_manager.step_timing_sync_cuda to config to enable cuda sync on start/stop (False by default).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed variable names.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added exp_manager.step_timing_kwargs nested config for clarity and future extensibility.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed formatting.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added train_backward_timing timing.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing for optional none timing kwargs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Removed 'cents' from minor currency denominations to avoid ambiguity issues with cardinals.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Wrote quick readme explaining orthography variation for French ITN.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Update README.md

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Update README.md

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Update README.md

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Removed pence from minor currencies, added back in.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Joaquin Anton <janton@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
2021-10-05 13:13:03 -07:00
Micha Livne ec6591e76a
Experiment manager step timing (#2936)
* 1. Enabled encoder/decoder with different size in bottleneck architecture.
2. Validating encoder/decoder with the same size in non-bottleneck parent class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added hidden_size to error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing defaults.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixing CI tests to have same hidden_size.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating Jenkins CI test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating CI to hidden=48

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size when loading pre-trained huggingface model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size in config for pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated missing hidden_size in config.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Jenkinsfile test values.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating missing arguments for Jenkinsfile test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added a generic timer class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Renamed file.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. exp_manager timing of train/val/test using callbacks is ready.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Step timing hooks are tested. Logging does not record values due to a bug (should be solved with upgraded ptl)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added workaround hooks to MTEncDecModel.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed logging issue. All NeMo models support timing.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed unused timer object.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added missing copyright.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. NamedTimer supports multiple reductions.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed leftover file.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating code to latest.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated docstring.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added exp_manager.step_timing_sync_cuda to config to enable cuda sync on start/stop (False by default).

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed variable names.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added exp_manager.step_timing_kwargs nested config for clarity and future extensibility.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed formatting.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added train_backward_timing timing.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing for optional none timing kwargs.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
2021-10-01 17:53:37 -04:00
Evelina 5f5a9a0a1d
TN infer (#2929)
* en_small grammars added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* infer fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist arg

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add input fall back

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstring

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-30 15:41:04 -07:00
tbartley94 d0c97aab6a
Itn fr (#2921)
* First commit. French ITN grammars for tagger and verbalizer. Test for French inverse_normalize added to tests. inverse_text_normalize updated to allow 'fr' tag. tools/text_processing/deployment/pynini_export.py updated to accept 'fr' tag. All CI tests for grammars passed.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Ran style checker.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixed bug causing ordinals to fail sparrowhawk test when verbalizing as roman numbers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* style change for verbalizer/ordinal.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Delete test.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Cleaning up unused import spaces for lgtm check.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* taggers/time.py missed style checker

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr lacked an __init__ file

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright wording and adding whitelisting for titles.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright headers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* copyright header change (missed whitelist)

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Edited export_grammars.sh notes to include 'fr'. Made verbalizer/decimal.py rewrite class part of main class instead.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Adjusting copyright headers for tests.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr/__init__ copyright header

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* adding __init__ file to fr/data

Signed-off-by: tbartley94 <tbartley@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
2021-09-30 14:00:28 -07:00
Yang Zhang 6d7f1a5339
add fix to not add dot everywhere (#2885)
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-09-29 11:22:24 -07:00
PeganovAnton e524be390d
Fix several bugs in punctuation and capitalization inference and make minor improvements (#2905)
* Add save labels arg to method and remove device setting

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix device bug and reading plain text bug

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Make minor improvements

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Remove excess parameter

Signed-off-by: PeganovAnton <peganoff2@mail.ru>
2021-09-29 14:21:56 +03:00
Eric Harper 58bc1d2c6c
Merge r1.4 bugfixes to main (#2918)
* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* update branch for jenkinsfile and dockerfile

Signed-off-by: ericharper <complex451@gmail.com>

* Adding conformer-transducer models. (#2717)

* added the models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added contextnet models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added german and chinese models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fix the abs_pos of conformer. (#2863)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* update to match sde (#2867)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* updated german ngc model (#2871)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Lower bound PTL to safe version (#2876)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebooks with onnxruntime (#2880)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Upperbound PTL (#2881)

Signed-off-by: smajumdar <titu1994@gmail.com>

* minor typo and broken link fixes (#2883)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove numbers from TTS tutorial names (#2882)

* Remove numbers from TTS tutorial names

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* Update documentation links

Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com>

* Typos (#2884)

* segmentation tutorial fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* data fixes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* updated the messages in eval_beamsearch_ngram.py. (#2889)

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* style (#2890)

Signed-off-by: Jason <jasoli@nvidia.com>

* Fix broken link (#2891)

* fix broken link

Signed-off-by: fayejf <fayejf07@gmail.com>

* more

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update sclite eval for new transcription method (#2893)

* Update sclite to use updated inference

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove WER

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update sclite script to use new inference methods

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove hub 5

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix TransformerDecoder export - r1.4 (#2875)

* export fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* embedding pos

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* remove bool param

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* changes

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Update Finetuning notebook (#2906)

* update notebook

Signed-off-by: Jason <jasoli@nvidia.com>

* rename

Signed-off-by: Jason <jasoli@nvidia.com>

* rename

Signed-off-by: Jason <jasoli@nvidia.com>

* revert branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Jocelyn <jocelynh@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
2021-09-28 20:13:55 -06:00
Joaquin Anton c88cfc42eb
Add DALI dataset unit test (#2904)
Signed-off-by: Joaquin Anton <janton@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
2021-09-28 12:46:25 -07:00
Micha Livne 3d678dbff1
Nmt encoder decoder hidden size fix (#2856)
* 1. Enabled encoder/decoder with different size in bottleneck architecture.
2. Validating encoder/decoder with the same size in non-bottleneck parent class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed typo.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added hidden_size to error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing defaults.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixing CI tests to have same hidden_size.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated error message.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating Jenkins CI test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating CI to hidden=48

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size when loading pre-trained huggingface model.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing hidden_size in config for pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated missing hidden_size in config.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updated Jenkinsfile test values.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder)

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Updating missing arguments for Jenkinsfile test.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
2021-09-28 10:08:24 -06:00
Vitaly Lavrukhin 0c7fbad290
Updated docs (#2911)
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2021-09-27 21:13:57 -07:00
Evelina fd3b4552a0
typos (#2909)
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-27 17:31:11 -07:00
Carol Anderson 9d83a1893b
add zero shot intent model (#2861)
* add zero shot intent model

Signed-off-by: Carol Anderson <carola@nvidia.com>

* update copyright headers for zero shot

Signed-off-by: Carol Anderson <carola@nvidia.com>

* fix typos in zero shot tutorial

Signed-off-by: Carol Anderson <carola@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-09-27 15:41:09 -07:00
Evelina d08f1dc91d
format update (#2901)
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-26 14:38:07 -07:00
Vitaly Lavrukhin 5e51840ed5
SDE Updates (#2900)
* Removed text keywords from filters in SDE (to support as values)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Added signal metrics to SDE
Added SDE histograms for all numeric attributes
Improved SDE UI

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated code style

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated SDE requirements

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs (SDE + minor fixes)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2021-09-26 12:26:09 -07:00
Evelina 8cf9aad8ec
update readme with the tools sections (#2895)
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-24 21:44:14 -07:00
Micha Livne 3f6aee0433
1. Updated Jenkinsfile hidden_size. (#2892)
Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
2021-09-24 16:10:19 -06:00
Evelina ed2005eda9
TN/ITN update (#2854)
* from file added for all modes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* directions map

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* decoder eval

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* separate eval and inference added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* handle unk tokens and proper pre-post processing

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-22 13:44:24 -07:00
Joaquin Anton ad75db34c8
Fix DALI log default floor and normalization formula (#2869)
* Fix DALI log default floor and normalization formula

Signed-off-by: Joaquin Anton <janton@nvidia.com>

* Fix style

Signed-off-by: Joaquin Anton <janton@nvidia.com>
2021-09-22 11:29:25 -07:00
Joaquin Anton 6987b6cfc9
Fix window_stride calculation in DALI pipeline & Fix dither generation (#2858)
Signed-off-by: Joaquin Anton <janton@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
2021-09-21 13:17:29 -07:00
PeganovAnton 660e401db5
Feat/punctuation capitalization/long queries signoff (#2683)
* Move files from long_queries branch

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* remove sys.path modification

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Update tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Remove unused imports

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Move all code to punctuate_capitalize.py

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix minor bug

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Improve help message

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Improve help message

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add docstrings and typing

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add remark about default parameter values

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* refactor

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Refactor

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix typo

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix typo

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix script name

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix script name

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Fix code style

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-09-21 16:31:24 +03:00
Jason 08f060a80e
Remove file (#2855)
Signed-off-by: Jason <jasoli@nvidia.com>
2021-09-20 14:17:45 -04:00
Oktai Tatanov 3cde074436
New TTSDataset, tts tokenizers and g2ps (#2792)
* new vocabs and g2ps for tts

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix style

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update tts torch data

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update g2p modules, data and add example for tts vocabs

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix style

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update tts dataset

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* add tokens field to tts dataset

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update tts dataset

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* add TTSDataset and docs for all of them

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix paths in yaml

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update test for tts dataset

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* add heteronyms-030921 file to scripts folder

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* change requirements_torch_tts.txt

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* add tts_data_types.py

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* fix style tts_data_types.py

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update yaml and comments

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update cmu dict and tts ds config

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* remove unnecessary argument from tokenizers

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>

* update test

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
2021-09-20 16:20:12 +03:00
Nithin Rao 3f606194f2
Update model names (#2845)
* updated speaker model names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update tutorial model names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
2021-09-19 18:58:04 -07:00
Yang Zhang b1e1494688
Tn punct train (#2824)
* add punct to tn inference and test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* change code to train

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix error

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix error in tagger dataset

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix class based evaluation to print error

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix input processing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add lang to combine processed datasets

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-09-17 13:01:24 -07:00
Evelina e443d71f28
pkl name fix (#2843)
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-17 10:34:34 -07:00
Nithin Rao 128b22d147
import fix (#2821)
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
2021-09-17 09:39:16 -04:00
Somshubra Majumdar bb4565c1b3
Update collection of pretrained models for RNNT (#2837)
* Update collection of pretrained models for RNNT

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove non-public Conformer MLS Medium

Signed-off-by: smajumdar <titu1994@gmail.com>
2021-09-16 18:50:25 -07:00
Sandeep Subramanian 76a8459b93
New pretrained NMT model links (#2836)
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
2021-09-16 17:07:38 -07:00
Micha Livne f4523c57ac
Max pooling encoder (#2774)
* 1. Added a max pooling encoder.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Removed unused imports.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added logging of log var q(z|x) for MIM and VAE.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added control parameter to use the mean of latent code (instead of sampling) during translation for MIM and VAE.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Added AveragePoolingEncoder (arch == "avg_pool").

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Update documentation in YAML config.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed missing support for returning score during batch translation.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed format.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed projection of latent to decoder hidden.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Consolidated max and average pooling into a single class.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Debugging.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

* 1. Fixed style.

Signed-off-by: Micha Livne <mlivne@nvidia.com>

Co-authored-by: Micha Livne <mlivne@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-09-16 17:05:11 -07:00
Somshubra Majumdar 5401e0fa27
Fix DALI error encountered with pad_to=0 (#2827)
Signed-off-by: smajumdar <titu1994@gmail.com>
2021-09-16 14:18:45 -07:00
Evelina bb39528f4f
tar dataset for TN/ITN (#2826)
* tar dataset added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* typo and ci test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-16 10:33:56 -07:00
Vahid Noroozi f96bf26a77
increased the precision of validation metric to be saved into the checkpoint file names. (#2811)
Signed-off-by: Vahid <vnoroozi@nvidia.com>

Co-authored-by: Jason <jasoli@nvidia.com>
2021-09-16 10:03:00 -04:00
Elena Rastorgueva aced0db13e
ITN Spanish (#2489)
* add Spanish ITN for cardinals and decimals (currently displaces English rules)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Refactor ITN so English and Spanish code is side by side

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for electronic

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for money

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for money

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply simple style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for ordinals

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix 'doscientos' typo

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for telephone numbers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix bug (NEMO_CHAR was being modified)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for time

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Move ITN utils to language-specific folder

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make separate test script folders for each language

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make Cardinal class not convert numbers less than 10

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN Date rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Rename variables in Time rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN WhiteList rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Word test cases

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Change Ordinal 'suffix' to 'morphosyntactic_features'

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish to Sparrowhawk test scripts

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Remove unused imports

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Allow decimals to have a punto as well as a coma

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix typos

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ClassifyFst caching

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix Money class bug

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Es Cardinal rules up to one septillion, still ignoring 'y' in cardinals

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix Cardinal bug which inserted extra zeros

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix decimal rules bug

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add more Ordinal cases and don't convert ordinals less than 10

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add more units to MeasureFst

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Added currencies to Money class

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make er ending in Ordinals be superscript

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add TimeFst tagger comments

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update headers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add missing __init__.py file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* dco fix for Elena's branch

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Fix Darg name in docstring

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update headers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update TelephoneFst tagger docstring

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make ElectronicFst also convert URLs

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix cardinal bug which converted e.g. ,uno to ,1

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add cache_dir to CI tests

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba=0.54

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba=0.54.0

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba==0.53.1

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix ru -> es typo in CI tests

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix typo in CI test

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-09-16 01:01:50 -07:00
Somshubra Majumdar 3a08a3ff8f
Update ContextNet RNNT configs (#2819)
* Fix pretrained model info for zh models

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update ContextNet configs

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update ContextNet configs

Signed-off-by: smajumdar <titu1994@gmail.com>
2021-09-15 15:46:19 -07:00
Somshubra Majumdar a0dc5b5912
Enforce numba compat (#2823)
* Enforce numba compat

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove all RNNT tests temporarily

Signed-off-by: smajumdar <titu1994@gmail.com>
2021-09-15 14:06:52 -07:00
Vahid Noroozi 8f88a56327
Add conformer transducer configs (#2812)
* increased the precision of validation metric to be saved into the checkpoint file names.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added the configs for the conformer-transducer models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added the configs for the conformer-transducer models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed type.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
2021-09-14 15:47:48 -07:00
Somshubra Majumdar a33ec491c2
Temporarily disable numba cuda tests from running (#2820)
Signed-off-by: smajumdar <titu1994@gmail.com>
2021-09-14 13:19:52 -07:00
Yang Zhang f94608ab4d
Tn fix bugs (#2815)
* explicitly set weight to choose deterministic rule, important for SH

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix whitelist test case

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added more symbols support for itn electronic

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding url to itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* prevent case where single cardinal, e.g. 4 without suffix is recognized as time

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* date does not accept standalone month anymore

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* style fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add decimalx to measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cardinal times

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cardinal times

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add updated en grammars

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* comment out tn with audio tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-09-14 10:36:27 -07:00
Vahid Noroozi 8b1c6e7b6d
Added support for HF pretrained models. Fixed the docs. (#2658)
2021-09-14 00:27:25 -07:00
Evelina fb6b3b83b6
non-deterministic norm update (#2787)
* update script for large files

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* write intermediate result to a file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* file renamed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose n_jobs arg

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* new grammars

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-09-13 18:50:21 -07:00
Taejin Park f71ee4f08b
Update ASR_with_SpeakerDiarization.ipynb tutorial (#2800)
* Initial manuscript for ASR with diarization tutorial

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Updated ASR_with_SpeakerDiarization.ipynb tutorial notebook

Signed-off-by: Taejin Park <tango4j@gmail.com>

* typo and minor fix

Signed-off-by: fayejf <fayejf07@gmail.com>

* Made minor cell order changes.

Signed-off-by: Taejin Park <tango4j@gmail.com>

Co-authored-by: fayejf <fayejf07@gmail.com>
2021-09-13 14:04:44 -07:00