Commit graph

43 commits

Author SHA1 Message Date
Somshubra Majumdar 4e544676f2
Update the webapp for ASR (#3032)
Signed-off-by: smajumdar <titu1994@gmail.com>
2021-10-21 10:30:45 -07:00
tbartley94 d0c97aab6a
Itn fr (#2921)
* First commit. French ITN grammars for tagger and verbalizer. Test for French inverse_normalize added to tests. inverse_text_normalize updated to allow 'fr' tag. tools/text_processing/deployment/pynini_export.py updated to accept 'fr' tag. All CI tests for grammars passed.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Ran style checker.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixed bug causing ordinals to fail sparrowhawk test when verbalizing as roman numbers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* style change for verbalizer/ordinal.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Delete test.py

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Cleaning up unused import spaces for lgtm check.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* taggers/time.py missed style checker

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr lacked an __init__ file

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright wording and adding whitelisting for titles.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Fixing copyright headers.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* copyright header change (missed whitelist)

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Edited export_grammars.sh notes to include 'fr'. Made verbalizer/decimal.py rewrite class part of main class instead.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Adjusting copyright headers for tests.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* inverse_text_normalization/fr/__init__ copyright header

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* adding __init__ file to fr/data

Signed-off-by: tbartley94 <tbartley@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
2021-09-30 14:00:28 -07:00
Vitaly Lavrukhin 5e51840ed5
SDE Updates (#2900)
* Removed text keywords from filters in SDE (to support as values)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Added signal metrics to SDE
Added SDE histograms for all numeric attributes
Improved SDE UI

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated code style

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated SDE requirements

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs (SDE + minor fixes)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2021-09-26 12:26:09 -07:00
Elena Rastorgueva aced0db13e
ITN Spanish (#2489)
* add Spanish ITN for cardinals and decimals (currently displaces English rules)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Refactor ITN so English and Spanish code is side by side

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for electronic

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for money

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for money

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply simple style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for ordinals

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix 'doscientos' typo

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for telephone numbers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix bug (NEMO_CHAR was being modified)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN rules for time

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Move ITN utils to language-specific folder

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make separate test script folders for each language

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make Cardinal class not convert numbers less than 10

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN Date rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Rename variables in Time rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ITN WhiteList rules

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Apply style fixes

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Word test cases

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Change Ordinal 'suffix' to 'morphosyntactic_features'

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish to Sparrowhawk test scripts

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Remove unused imports

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Allow decimals to have a punto as well as a coma

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix typos

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Spanish ClassifyFst caching

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix Money class bug

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add Es Cardinal rules up to one septillion, still ignoring 'y' in cardinals

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix Cardinal bug which inserted extra zeros

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix decimal rules bug

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add more Ordinal cases and don't convert ordinals less than 10

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add more units to MeasureFst

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Added currencies to Money class

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make er ending in Ordinals be superscript

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add TimeFst tagger comments

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update headers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add missing __init__.py file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* dco fix for Elena's branch

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Fix Darg name in docstring

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update headers

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Update TelephoneFst tagger docstring

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Make ElectronicFst also convert URLs

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix cardinal bug which converted e.g. ,uno to ,1

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Add cache_dir to CI tests

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba=0.54

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba=0.54.0

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Install numba==0.53.1

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix ru -> es typo in CI tests

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* Fix typo in CI test

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-09-16 01:01:50 -07:00
Sandeep Subramanian 3a419ac1f8
Provide NMT gRPC models via a directory instead of individual files (#2773)
* Provide models via a directory instead of individual files

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-09-09 22:58:45 -07:00
Evelina 968c326060
segmentation tutorial minor update (#2763)
* make compatible with sde changes
* tutorial update

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-09-02 14:15:19 -07:00
Eric Harper 2ff89fdf56
Merge 1.3 bugfixes into main (#2715)
* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebooks branch

Signed-off-by: ericharper <complex451@gmail.com>

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* update readme

Signed-off-by: ericharper <complex451@gmail.com>

* update nemo version for Dockerfile

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch

Signed-off-by: ericharper <complex451@gmail.com>

* Update colab links to Transducer notebooks (#2654)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix nmt grpc server, concatdataset for raw text files (#2656)

* Fix nmt grpc server and concatdataset for raw text files

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Check if lang direction is provided correctly

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* add missing init (#2662)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix qa inference for single example (#2668)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Fix max symbol per step updating for RNNT (#2672)

* Fix max symbol per step updating for RNNT

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix notebooks

Signed-off-by: smajumdar <titu1994@gmail.com>

* Replaced unfold() with split_view() (#2671)

* Replaced unfold() with split_view()

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* fixed typo

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* Correct voice app demo (#2682)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Import guard (#2692)

* add asr and pynini import guard

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove asrmodel type

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove asrmodel type

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fixing branch (#2695)

Signed-off-by: Ghasem Pasandi <gpasandi@nvidia.com>

Co-authored-by: Ghasem Pasandi <gpasandi@nvidia.com>

* fix for emojis (#2675)

* fix for emojis

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove redundant line

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* raise error

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* use app_state

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix issues with ASR notebooks (#2698)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Allow non divisible split_size (#2699)

* bugfix

Signed-off-by: Jason <jasoli@nvidia.com>

* bugfix

Signed-off-by: Jason <jasoli@nvidia.com>

* TN fix for corner cases (#2689)

* serial added, weights to common defaults, decimal bug fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* one failing

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* all tests pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove redundant file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix telephone, add test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix edge case of greedy decoding for greedy_batch mode (#2701)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove time macro (#2703)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Minor FastPitch Fixes (#2697)

* fixes

Signed-off-by: Jason <jasoli@nvidia.com>

* update CI

Signed-off-by: Jason <jasoli@nvidia.com>

* refix

Signed-off-by: Jason <jasoli@nvidia.com>

* Fix ddp error. (#2678)

To avoid "MisconfigurationException: Selected distributed backend ddp is not compatible with an interactive environment." error.

Co-authored-by: ekmb <ebakhturina@nvidia.com>

* update jenkins

Signed-off-by: ericharper <complex451@gmail.com>

* update notebooks

Signed-off-by: ericharper <complex451@gmail.com>

* add split_view back

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Ghasem <35242805+pasandi20@users.noreply.github.com>
Co-authored-by: Ghasem Pasandi <gpasandi@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-08-24 16:21:59 -06:00
Evelina 36286d04f2
ITN Ru and non-deterministic TN (#2519)
* fix for large cardinals, refactor to use rewrite

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix incorrect test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* ru itn + audio updates wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip refactor

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip refactor

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* subfolder

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add alternative for one thousand

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add RU TN to audio based

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test separate TN RU class

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* itn/tn card-or-dec

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* decimal itn update, works

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tn measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* date, electronic, ru-> latin map

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tn ru electronic

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move all logic to tagger

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* itn date and electronic update-fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money class

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money complete

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge with main

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge with main

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge with main

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge conflict

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* merge conflict resolved

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* header

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* measure update itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* before telephone

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* telephone added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* time added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* date sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* date sh pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix measure and money for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix sh telephone

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* delete separate tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* temp time files

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* time wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* time itn fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* adding digit normalization to date

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* all tests pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* headers fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up, year corner case added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* measurement.tsv update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove redundant files

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* files moved, lgtm imports

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* commenting out ru_normalization_tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* disable non-deter ru text_norm tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable itn ci tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable itn ci tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable itn ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable cache for ru grammars

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add cache to all languages

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* message update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable TN/ITN ci tests for *tn* branches

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix import

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* word correction

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test case update, header

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* disable itn tests for main

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable itn tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove measure from whitelist conversion, enable itn tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins branch pattern

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins branch pattern

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins branch pattern, CPU tests are off

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update jenkins branch pattern, CPU tests are off

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert to main for all tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert to main for all tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert to main for all tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* temp

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add .fst files to setup

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add missing init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test ci time

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* revert uncommented tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
2021-08-12 22:00:35 -07:00
Evelina 91ac90bb46
TN update (#2612)
* extend tn grammars for nn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* debug money

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix measure/date

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* url update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* url update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test remove dash

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test format fixed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unrelated minor currencies

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money works

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove name tag

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix some tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* default decimal format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip money test fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* debug tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* cardinal default value fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reorder date in tagger

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* money fix corner cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* additional date format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* all test pass

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* all days added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add .far file to git

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* set cache default to True

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move generator() to utils

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* enable itn tests on ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* reload .far file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review, remove .far, update jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove whitelist words from abbreviation

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test based on abbreviation class changes

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* create .far files first and then run tests, add cache_dir arg for all tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* style

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* debug ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add missing __init__ to German itn

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused import

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tests fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update folder names

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
2021-08-11 21:13:39 -07:00
Somshubra Majumdar 4af3986326
Move ASR Webapp (#2632)
Signed-off-by: smajumdar <titu1994@gmail.com>
2021-08-10 10:44:49 -06:00
Ryan Leary 2be5853cdb
Add basic grpc MT server (#1807)
* Add basic grpc MT server

Add readme, server updates

Signed-off-by: Ryan Leary <rleary@nvidia.com>

* style fix

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* fixing license headers

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Add punctuation model into NMT service

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix merge conflicts

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* style fixes to unblock CI

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Add a Jarvis ASR + NeMo NMT client

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor gRPC service

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update license headers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update one more license header

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Whitespace in header

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix grpc requirement

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update license headers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add option to specify src/tgt lang and import fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix unused imports

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Renaming variables

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Ryan Leary <rleary@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
2021-08-09 16:31:41 -06:00
Evelina c8f9427295
Eng TN update (#2516)
* added url support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* address added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh test and export update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh test and export update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix fraction for sh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* telephone with words added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove unused import

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-07-20 14:57:58 -07:00
Yang Zhang a8b6a1a4dd
Itn german (#2486)
* initial german itn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cardinal now can accept compound, hundred and thousand without any prefix, ordinal verbalization deleted suffix and replaced with dot

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added all date options

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added fraction

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added fraction to measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* default cent to euro

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added fraction tsv (forgot in the past) and added hour to night to time class

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adjusted docstring to german

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete unnecessary copyright

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix header and delete wrong spelling

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete wrong spelling

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added missing data values for time

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete SH normalization test, because it doesn't exist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added all classes to SH test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleted redundant files, updated header

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding back whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-07-20 12:41:36 -07:00
Yang Zhang ed085459c9
refactor text processing only code to allow other languages (#2477)
* refactor text processing only code to allow other languages

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* refactored test folder structure to divide between languages

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add missing file

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-07-14 08:17:22 -07:00
Evelina dda599642d
sparrowhawk tests + punctuation post processing for pynini TN (#2320)
* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-06-10 20:58:23 -07:00
Evelina 6c60797a30
audio based normalization (#2231)
* squash norm_audio

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add missing files

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* style

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* unit tests added, docstrings fixed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix lgtm errors

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* debug jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* debug jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* signature update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* set deterministic default

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add more test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-05-19 20:04:57 -07:00
Oleksii Kuchaiev b8ed0839bc
Merge branch 'v1.0.0' into main
2021-05-19 16:05:56 -07:00
Yang Zhang b9f9fa763e
fix comments (#2236)
* fix comments

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix typo

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-05-19 12:39:37 -07:00
Oleksii Kuchaiev f0b3624dc1
Merge branch 'v1.0.0' into main
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
2021-05-17 16:30:01 -07:00
Yang Zhang bf16653af1
Fix text processing docs (#2195)
* fix text processing docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix name

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add guard to pynini import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-05-12 14:22:58 -07:00
Vitaly Lavrukhin 81c19e7e4d
SDE updates (#2187)
* Added updates to SDE:
- support for external vocabulary (to detect OOV words)
- support for offset field (for segmented long recordings)
- UI improvements

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Refactored diff in SDE

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-05-10 13:28:58 -07:00
Yang Zhang c5426c871d
Update text norm docs (#2137)
* move do_training flag to config

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* finished cardinal tagger

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cardinal working

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix cardinal negative

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* decimal

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* ordinals

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added money

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* date

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added time

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* finished draft

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* started fixing time, measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed suppletive for measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed ambiguity between .2 and regular punctuation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* support for optional comma separating every three digits for cardinal; all other classes automatically use this too

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added quantity

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed some date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed some date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added more date formats (#2090)

* added more date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* update

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* configure input capitalization (#2087)

* adding configuration to distinguish between lower_cased and cased input, affects whitelist, word classes

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix style

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding option to set input_case to args

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* generalized export to both itn and tn (#2093)

* generalized export to both itn and tn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding to jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix date

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* keep line break

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test for word

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added telephone (#2098)

* added telephone

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding commas as separator between phone number

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixing punctuation and simplifying SH export

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* refactored itn to have class for denormalizer

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Itn punctuation (#2106)

* fixing punctuation and simplifying SH export

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* changed variable names

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleted sentence boundary for ITN

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* style

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix citation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* sign off

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cleaned first 6200 lines of whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* WFST Normalization support for emails/electronic (#2092)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added email tagging and verbalization

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic to handle digits and common domains

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstring fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding doc string, adding more info to docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleted token parser from ITN

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* reuse token parser from tn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleting whitelist items

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix for wfst normalization (#2134)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* update_text_processing_docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* changing tn to cased by default

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* change to cased

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
2021-04-29 14:59:25 -07:00
Yang Zhang 6ce670b7ab
Text norm (#2123)
* move do_training flag to config

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* finished cardinal tagger

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cardinal working

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix cardinal negative

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* decimal

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* ordinals

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added money

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* date

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added time

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* finished draft

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* started fixing time, measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed suppletive for measure

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed ambiguity between .2 and regular punctuation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* support for optional comma separating every three digits for cardinal; all other classes automatically use this too

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added quantity

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed some date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixed some date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added more date formats (#2090)

* added more date formats

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* update

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* configure input capitalization (#2087)

* adding configuration to distinguish between lower_cased and cased input, affects whitelist, word classes

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix style

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding option to set input_case to args

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* generalized export to both itn and tn (#2093)

* generalized export to both itn and tn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding to jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix date

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* keep line break

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added test for word

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* added telephone (#2098)

* added telephone

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding commas as separator between phone number

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fixing punctuation and simplifying SH export

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* refactored itn to have class for denormalizer

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Itn punctuation (#2106)

* fixing punctuation and simplifying SH export

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* changed variable names

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* updated docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* save tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleted sentence boundary for ITN

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* style

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix citation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* sign off

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* cleaned first 6200 lines of whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* WFST Normalization support for emails/electronic (#2092)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added email tagging and verbalization

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update electronic to handle digits and common domains

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstring fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding doc string, adding more info to docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleted token parser from ITN

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* reuse token parser from tn

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* deleting whitelist items

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix for wfst normalization (#2134)

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add whitelist

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* tutorial fix for wfst_norm+segm (#2135)

* fix for wfst normalization

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tutorial update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* text_norm_segm tutorial fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
2021-04-28 16:27:28 -07:00
Vahid Noroozi 0bc7091503
Adding N-gram LM for ASR Models (#2066)
2021-04-22 14:55:01 -07:00
Yang Zhang 52121810b4
Text processing docs (#2046)
* refactoring text normalization docs and tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename nemo tools to nemo text processing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename nemo_tools to nemo_text_processing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename docs

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename files

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename pytests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix pytest

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix refactoring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* renamed functions in ITN tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix typo

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix path to tutorial in readme for ITN

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix Jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix Jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-04-09 18:11:40 -07:00
Evelina afcce9a665
use nemo norm by default (#2044)
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-04-09 16:05:53 -07:00
Yang Zhang 6ef9793532
Add export route to ITN (#2020)
* adding files for export

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding docker build and launch

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding docker build and launch

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding kbps mbps

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix dockerfile

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding doc string

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* spell fix, doc fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* spell fix, doc fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* spell fix, doc fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* restart jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-04-07 19:46:34 -07:00
Yang Zhang 360eb0422f
Text denormalization (#1797)
* move do_training flag to config

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding text denorm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add google header

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete unused code

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding unittests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add pynini dependency

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix missing import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add header

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix pytests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix pytests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* change jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add text denorm container

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add export files

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add export files

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add export files

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add export files

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix import

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename tools to nemo_tools

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rename tools to nemo_tools

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix bug

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding missing file

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add missing header

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix pytests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try to clean all workspaces

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* move back tools

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try something

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try something

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add package info

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* test jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* test jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding setup

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding setup

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pytests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding requirements

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add cpu tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* try fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix pytests for nlp

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker user test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins docker less root test

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete SH from ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix new nemo_tools path in ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rm output content after ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete inflect

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* change new weights

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkinsfile

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* style fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix tests

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix weight

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* delete requirement

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix jenkins

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add nemo_tools readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add nemo_tools readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add nemo_tools readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add nemo_tools readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* address PR review

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* update nemo_tools readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
2021-03-31 13:31:19 -07:00
Evelina ab0bbaea7c
NLP, Megatron and Tools docs (#1739)
* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docs for nlp and tools init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tools docs

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* added script to convert raw data into nemo format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* placeholders added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* punctuation model docs updated

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Speech Regression Support (#1707)

* Speech Regression support

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Speech Regression Support

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Speech Regression Support

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactoring after review

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

* Refactorings after review, fixes

Signed-off-by: diego-fustes <diegofustesfic@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>

* update max seq len

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* tc_update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docs update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove untouched files

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* review feedback

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* new ngc model names

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* text norm doc (#1893)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* model name updated

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins rename model

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* quick start added, model_nlp added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* comment changed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Co-authored-by: Diego Fustes Villadóniga <diegofustesfic@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
2021-03-15 14:54:53 -07:00
Oleksii Kuchaiev ebf0c91d82
Cleanup save/restore (#1851)
* Cleanup save/restore

* Remove EFF save/restore routes
* Once we can take EFF dependency we will use EFF.Archive directly

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* fix copyright headers

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
2021-03-05 13:49:28 -07:00
Evelina eff003635f
segmentation tutorial dir fix (#1765)
* fix dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* dir fix for colab

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-02-18 12:55:42 -08:00
Evelina 941ef1fd71
aligner update (#1693)
* preprocessing update
* use re.finditer
* add nemo normalization
* update tutorial
* remove output folder jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2021-02-02 13:08:59 -08:00
Yang Zhang 748e47a1d8
add word2number to requirement (#1689)
Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-02-01 18:05:05 -08:00
Vitaly Lavrukhin c4379da172
Added new features to Speech Data Explorer (#1686)
* Added new features to Speech Data Explorer:
- support for error analysis (WER, CER, Word Matching Rate, word accuracy, diff)
- caching of computed metrics, statistics
- UI updates (hideable columns)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated SDE requirements

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Fixed warnings

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated SDE UI and README

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

Co-authored-by: ekmb <ebakhturina@nvidia.com>
2021-02-01 14:56:44 -08:00
Yang Zhang ad0edf2e69
Text normalization (#1663)
* refactor to new api

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-01-26 14:06:19 -08:00
Somshubra Majumdar 51ae260a50
Correct ASR issues + Patch for Pytorch 1.8 (#1565)
* Trim silence default to False

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update stft and torch.fft.ifft for Pytorch 1.8

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Clear up old code

Signed-off-by: smajumdar <titu1994@gmail.com>
2020-12-17 14:14:59 -08:00
Yang Zhang 0e66011aa9
Text normalization (#1548)
* adding documented scripts

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix style

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* remove redundancy by using tsv data in tagger

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* remove unused params

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add pytest

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add license

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* init template tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* started tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* rm tutorial temporarily

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding tutorial

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add perf

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add requirements

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add review feedback

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding more docstring

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add review feedback

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding text normalization to readme

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2020-12-16 14:48:30 -08:00
Evelina f6ce847809
segmentation (#1529)
* expanded normalization helpers

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* additional split symbols exposed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* split condition fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix in add split symbols

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* text update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* notebook jupyter upgrade cmd added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* install requirements

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* ffmpeg install

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rearrange steps

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* rearrange steps

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* file name update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* prefix=0

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* separator

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* refactor

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* notebook reqs

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* replace

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>
2020-12-11 14:48:59 -08:00
Evelina 5b6dd38501
Dataset creation tool based on CTC-segmentation (#1450)
Dataset creation tool based on CTC-segmentation
Signed-off-by: ekmb <ebakhturina@nvidia.com>
2020-11-13 14:55:22 -08:00
Vitaly Lavrukhin b86790dada
Added new features to Speech Data Explorer (#1442)
* Added new features to Speech Data Explorer:
 - backend paging/sorting/filtering
 - support for assigning a port
 - minor tweaks to UI

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Fixed exception catch block

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: ekmb <ebakhturina@nvidia.com>
2020-11-13 13:53:06 -08:00
Vitaly Lavrukhin 87206f7d16
Speech Data Explorer Improvements (#1290)
* Added support for optional fields in Speech Data Explorer

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated interactive plots in Speech Data Explorer

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2020-10-14 10:40:51 -07:00
Oleksii Kuchaiev 900c69fc67
fix style
Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
2020-07-24 10:22:14 -07:00
Vitaly Lavrukhin 69a927550b
Added Speech Data Explorer tool (#906)
* Added Speech Data Explorer tool

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Removed unused import

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2020-07-24 09:54:55 -07:00