NeMo/nemo_text_processing/text_normalization
Yang Zhang 663c76a972
Tn clean upsample (#3024)
* init
* renamed file
* adding all cleaning scripts
* skip sentence if error
* remove I-SAME
* fix style
* remove I the first from training
* remove DM and Da from upsampling
* remove I -> one/first, also add space around dash for alphanumerical context, remove rare currency from being upsampled
* remove dalton and DM from being verbalized
* remove Da and DM sentences completely
* addressed review feedback, added data folder in examples
* refactored code, added data utils functions
* fix lgtm
* fix lgtm
* added electronic wfst for english neural TN
* header and lgtm

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
2021-11-02 10:26:17 -07:00
en TN infer (#2929) 2021-09-30 15:41:04 -07:00
ru TN infer (#2929) 2021-09-30 15:41:04 -07:00
__init__.py refactor text processing ONly code to allow other languages (#2477) 2021-07-14 08:17:22 -07:00
data_loader_utils.py refactor text processing ONly code to allow other languages (#2477) 2021-07-14 08:17:22 -07:00
normalize.py Tn clean upsample (#3024) 2021-11-02 10:26:17 -07:00
normalize_with_audio.py TN infer (#2929) 2021-09-30 15:41:04 -07:00
README.md Text norm (#2123) 2021-04-28 16:27:28 -07:00
run_evaluate.py fix bug (#2527) 2021-07-26 21:22:26 -07:00
run_predict.py refactor text processing ONly code to allow other languages (#2477) 2021-07-14 08:17:22 -07:00
token_parser.py Text norm (#2123) 2021-04-28 16:27:28 -07:00

Text Normalization system for English, e.g. 123 kg -> one hundred twenty three kilograms. Offers prediction and evaluation on text normalization data, e.g. the Google text normalization dataset.
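To make the `123 kg -> one hundred twenty three kilograms` example concrete, here is a toy sketch of the written-to-spoken mapping in plain Python. It is an illustration only, not how this package implements normalization: the names `verbalize_cardinal`, `normalize_measure`, and the `UNITS` table are hypothetical, and the sketch covers only integers below 1000 followed by a handful of units.

```python
import re

# Hypothetical lookup tables for this sketch only.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# unit abbreviation -> (singular, plural)
UNITS = {"kg": ("kilogram", "kilograms"),
         "km": ("kilometer", "kilometers"),
         "g": ("gram", "grams")}


def verbalize_cardinal(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {verbalize_cardinal(rest)}"


# Longest abbreviations first so the alternation is unambiguous.
_UNIT_ALTERNATION = "|".join(sorted(UNITS, key=len, reverse=True))
_MEASURE = re.compile(rf"\b(\d{{1,3}}) ?({_UNIT_ALTERNATION})\b")


def normalize_measure(text: str) -> str:
    """Replace '<number> <unit>' spans with their spoken form."""
    def _spoken(match: re.Match) -> str:
        value = int(match.group(1))
        singular, plural = UNITS[match.group(2)]
        return f"{verbalize_cardinal(value)} {singular if value == 1 else plural}"
    return _MEASURE.sub(_spoken, text)


print(normalize_measure("The box weighs 123 kg."))
```

Numbers with more than three digits are left untouched by this sketch rather than being partially rewritten.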

Install dependencies: bash ../setup.sh

Example prediction run: python run_predict.py --input=INPUT_FILE --output=OUTPUT_FILE [--verbose]

Example evaluation run: python run_evaluate.py --input=./en_with_types/output-00001-of-00100 [--cat CATEGORY]
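As a rough picture of what a per-category evaluation involves, the sketch below computes token-level accuracy grouped by semiotic class. It is not the logic of `run_evaluate.py`. It assumes the Google text normalization file layout of tab-separated `CLASS<TAB>written<TAB>spoken` lines with `<eos>` sentence markers, where `<self>` and `sil` mark tokens that need no verbalization. The function names and the `normalize` callable are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Instance = Tuple[str, str, str]  # (category, written form, spoken form)


def read_instances(lines: Iterable[str]) -> Iterator[Instance]:
    """Yield (category, written, spoken) from tab-separated lines.

    Assumption: three columns per token line; '<eos>' marker lines have
    fewer columns and are skipped, as are tokens labelled '<self>' or 'sil'.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        category, written, spoken = parts
        if spoken in ("<self>", "sil"):
            continue
        yield category, written, spoken


def accuracy_by_category(
    instances: Iterable[Instance],
    normalize: Callable[[str], str],
    only: Optional[str] = None,
) -> Dict[str, float]:
    """Exact-match accuracy per category, optionally restricted to one."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, written, spoken in instances:
        if only is not None and category != only:
            continue
        total[category] += 1
        correct[category] += int(normalize(written) == spoken)
    return {category: correct[category] / total[category] for category in total}
```

With a real file, `read_instances(open(path, encoding="utf-8"))` could feed `accuracy_by_category`, and the `only` argument plays a role similar to the `--cat` flag above.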