Itn fr (#2947)
* typos (#2909) Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Updated docs (#2911) Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Nmt encoder decoder hidden size fix (#2856) * 1. Enabled encoder/decoder with different size in bottleneck architecture. 2. Validating encoder/decoder with the same size in non-bottleneck parent class. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed typo. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added hidden_size ot error message. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing defaults. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixing CI tests to have same hidden_size. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated error message. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating Jenkins CI test. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating CI to hidden=48 Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing hidden_size when loading pre-trained huggingface model. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing hidden_size in config for pre-trained models. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated missng hidden_size in config. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated Jenkinsfile test values. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder) Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating missing arguments for Jenkinsfile test. 
Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: tbartley94 <tbartley@nvidia.com> * First commit. French ITN grammars for tagger and verbalizer. Test for French inverse_normalize added to tests. inverse_text_normalize updated to allow 'fr' tag. tools/text_processing/deployment/pynini_export.py updated to accept 'fr' tag. All CI tests for grammars passed. Signed-off-by: tbartley94 <tbartley@nvidia.com> * Ran style checker. Signed-off-by: tbartley94 <tbartley@nvidia.com> * Fixed bug causing ordinals to fail sparrowhawk test when verbalizing as roman numbers. Signed-off-by: tbartley94 <tbartley@nvidia.com> * style change for verbalizer/ordinal.py Signed-off-by: tbartley94 <tbartley@nvidia.com> * Add DALI dataset unit test (#2904) Signed-off-by: Joaquin Anton <janton@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Delete test.py Signed-off-by: tbartley94 <tbartley@nvidia.com> * Cleaning up unused import spaces for lgtm check. Signed-off-by: tbartley94 <tbartley@nvidia.com> * taggers/time.py missed style checker Signed-off-by: tbartley94 <tbartley@nvidia.com> * inverse_text_normalization/fr lacked an __init__ file Signed-off-by: tbartley94 <tbartley@nvidia.com> * Merge r1.4 bugfixes to main (#2918) * update package info Signed-off-by: ericharper <complex451@gmail.com> * update branch for jenkinsfile and dockerfile Signed-off-by: ericharper <complex451@gmail.com> * Adding conformer-transducer models. (#2717) * added the models. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added contextnet models. Signed-off-by: Vahid <vnoroozi@nvidia.com> * added german and chinese models. Signed-off-by: Vahid <vnoroozi@nvidia.com> * fix the abs_pos of conformer. 
(#2863) Signed-off-by: Vahid <vnoroozi@nvidia.com> * update to match sde (#2867) Signed-off-by: ekmb <ebakhturina@nvidia.com> * updated german ngc model (#2871) Signed-off-by: Yang Zhang <yangzhang@nvidia.com> * Lower bound PTL to safe version (#2876) Signed-off-by: smajumdar <titu1994@gmail.com> * Update notebooks with onnxruntime (#2880) Signed-off-by: smajumdar <titu1994@gmail.com> * Upperbound PTL (#2881) Signed-off-by: smajumdar <titu1994@gmail.com> * minor typo and broken link fixes (#2883) Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com> * Remove numbers from TTS tutorial names (#2882) * Remove numbers from TTS tutorial names Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Update documentation links Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Typos (#2884) * segmentation tutorial fix Signed-off-by: ekmb <ebakhturina@nvidia.com> * data fixes Signed-off-by: ekmb <ebakhturina@nvidia.com> * updated the messages in eval_beamsearch_ngram.py. (#2889) Signed-off-by: Vahid <vnoroozi@nvidia.com> * style (#2890) Signed-off-by: Jason <jasoli@nvidia.com> * Fix broken link (#2891) * fix broken link Signed-off-by: fayejf <fayejf07@gmail.com> * more Signed-off-by: fayejf <fayejf07@gmail.com> * Update sclite eval for new transcription method (#2893) * Update sclite to use updated inference Signed-off-by: smajumdar <titu1994@gmail.com> * Remove WER Signed-off-by: smajumdar <titu1994@gmail.com> * Update sclite script to use new inference methods Signed-off-by: smajumdar <titu1994@gmail.com> * Remove hub 5 Signed-off-by: smajumdar <titu1994@gmail.com> * Fix TransformerDecoder export - r1.4 (#2875) * export fix Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * embedding pos Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * remove bool param Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * changes Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> * Update Finetuning 
notebook (#2906) * update notebook Signed-off-by: Jason <jasoli@nvidia.com> * rename Signed-off-by: Jason <jasoli@nvidia.com> * rename Signed-off-by: Jason <jasoli@nvidia.com> * revert branch to main Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Fix several bugs in punctuation and capitalization inference and make minor improvements (#2905) * Add save labels arg to method and remove device setting Signed-off-by: PeganovAnton <peganoff2@mail.ru> * Fix device bug and reading plain text bug Signed-off-by: PeganovAnton <peganoff2@mail.ru> * Make minor improvements Signed-off-by: PeganovAnton <peganoff2@mail.ru> * Fix code style Signed-off-by: PeganovAnton <peganoff2@mail.ru> * Remove excess parameter Signed-off-by: PeganovAnton <peganoff2@mail.ru> Signed-off-by: tbartley94 <tbartley@nvidia.com> * add fix to not add dot everywhere (#2885) Signed-off-by: Yang Zhang <yangzhang@nvidia.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Fixing copyright wording and adding whitelisting for titles. Signed-off-by: tbartley94 <tbartley@nvidia.com> * Fixing copyright headers. Signed-off-by: tbartley94 <tbartley@nvidia.com> * copyright header change (missed whitelist) Signed-off-by: tbartley94 <tbartley@nvidia.com> * Edited export_grammars.sh notes to include 'fr'. Made verbalizer/decimal.py rewrite class part of main class instead. 
Signed-off-by: tbartley94 <tbartley@nvidia.com> * Adjusting copyright headers for tests. Signed-off-by: tbartley94 <tbartley@nvidia.com> * inverse_text_normalization/fr/__init__ copyright header Signed-off-by: tbartley94 <tbartley@nvidia.com> * addint __init__ file to fr/data Signed-off-by: tbartley94 <tbartley@nvidia.com> * TN infer (#2929) * en_small grammars added Signed-off-by: ekmb <ebakhturina@nvidia.com> * infer fix Signed-off-by: ekmb <ebakhturina@nvidia.com> * add whitelist arg Signed-off-by: ekmb <ebakhturina@nvidia.com> * add input fall back Signed-off-by: ekmb <ebakhturina@nvidia.com> * docstring Signed-off-by: ekmb <ebakhturina@nvidia.com> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Experiment manager step timing (#2936) * 1. Enabled encoder/decoder with different size in bottleneck architecture. 2. Validating encoder/decoder with the same size in non-bottleneck parent class. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed typo. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added hidden_size ot error message. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing defaults. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixing CI tests to have same hidden_size. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated error message. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating Jenkins CI test. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating CI to hidden=48 Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing hidden_size when loading pre-trained huggingface model. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed missing hidden_size in config for pre-trained models. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated missng hidden_size in config. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. 
Testing encoder and decoder objects' hidden_size instead of config to support pre-trained models. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated Jenkinsfile test values. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed Jenkinsfile test values (NMT Megatron Model Parallel Size 2 Encoder) Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating missing arguments for Jenkinsfile test. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added a generic timer class. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Renamed file. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. exp_manager timing of train/val/test using callbaks is ready. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. FIxed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Step timing hooks are tested. Logging does not record values due to a bug (should be solved with upgraded ptl) Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added workaround hooks to MTEncDecModel. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. 
Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Debugging. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed logging issue. All NeMo models support timing. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Removed unused timer object. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added missing copyright. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. NamedTimer supports multiple reductions. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Removed leftover file. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updating code to latest. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Updated docstring. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added exp_manager.step_timing_sync_cuda to config to enable cuda sync on start/stop (False by default). Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed variable names. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added exp_manager.step_timing_kwargs nested config for clarity and future extensibility. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed formatting. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Fixed style. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. Added train_backward_timing timing. Signed-off-by: Micha Livne <mlivne@nvidia.com> * 1. 
Testing for optional none timing kwargs. Signed-off-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Signed-off-by: tbartley94 <tbartley@nvidia.com> * Removed 'cents' from minor currency denominations to avoid ambiguity issues with cardinals. Signed-off-by: tbartley94 <tbartley@nvidia.com> * Wrote quick readme explaining orthography variation for French ITN. Signed-off-by: tbartley94 <tbartley@nvidia.com> * Update README.md Signed-off-by: tbartley94 <tbartley@nvidia.com> * Update README.md Signed-off-by: tbartley94 <tbartley@nvidia.com> * Update README.md Signed-off-by: tbartley94 <tbartley@nvidia.com> * Removed pence from minor currencies, added back in. Signed-off-by: tbartley94 <tbartley@nvidia.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: Micha Livne <michalivne@users.noreply.github.com> Co-authored-by: Micha Livne <mlivne@nvidia.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Joaquin Anton <janton@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: PeganovAnton <peganoff2@mail.ru>
This commit is contained in:
parent
ec6591e76a
commit
d8924ffb2c
30
nemo_text_processing/inverse_text_normalization/fr/README.md
Normal file
@@ -0,0 +1,30 @@
# Note on French spelling

Due to a 1990 orthographic reform, there are currently two conventions for written French numbers:

1. **Reformed** All words of a compound number are joined by hyphens:
e.g. `1122 -> mille-cent-vingt-deux`

2. **Traditional** Hyphens occur only (with some exceptions) in numbers from 17 to 99 (inclusive):
e.g. `1122 -> mille cent vingt-deux`

Because the training data available for upstream ASR varies in which convention it uses, NeMo's French ITN accepts either spelling as input, e.g.

```
python inverse_normalize.py "mille-cent-vingt-deux" --language="fr"  # -> 1122
python inverse_normalize.py "mille cent vingt-deux" --language="fr"  # -> 1122
```
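
To make the two conventions concrete, here is a minimal pure-Python sketch of why both spellings can resolve to the same number: once hyphens and spaces are treated alike, the token sequence is identical. This is only an illustration; `french_words_to_int` and its small vocabulary are hypothetical and are not part of NeMo, whose actual grammars are not reproduced here.

```python
# Toy vocabulary (an assumption for illustration); a real grammar covers far more.
SMALL = {
    "un": 1, "deux": 2, "trois": 3, "quatre": 4, "cinq": 5, "six": 6,
    "sept": 7, "huit": 8, "neuf": 9, "dix": 10, "onze": 11, "douze": 12,
    "treize": 13, "quatorze": 14, "quinze": 15, "seize": 16,
    "vingt": 20, "trente": 30, "quarante": 40, "cinquante": 50, "soixante": 60,
}


def french_words_to_int(text: str) -> int:
    """Hypothetical helper: convert a French cardinal below one million to an int."""
    # Treat hyphens and spaces alike so both conventions tokenize identically.
    tokens = text.lower().replace("-", " ").split()
    total, current = 0, 0
    for tok in tokens:
        if tok == "et":
            continue  # "vingt et un": the conjunction carries no value
        if tok in ("vingt", "vingts") and current % 100 == 4:
            current += 76  # "quatre-vingt(s)": the pending 4 becomes 80
        elif tok in SMALL:
            current += SMALL[tok]
        elif tok in ("cent", "cents"):
            current = max(current, 1) * 100  # bare "cent" means 100
        elif tok == "mille":
            total += max(current, 1) * 1000  # bare "mille" means 1000
            current = 0
        else:
            raise ValueError(f"unsupported token: {tok!r}")
    return total + current


print(french_words_to_int("mille-cent-vingt-deux"))  # reformed spelling -> 1122
print(french_words_to_int("mille cent vingt-deux"))  # traditional spelling -> 1122
```

The design choice worth noting is that the hyphen carries no numeric information in either convention, so discarding it before parsing loses nothing for cardinals.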

As a result, currency conversions are ambiguous for minor denominations of the dollar, because the coin name "cents" is spelled the same as the plural of "cent" (hundred), e.g.

```
300 -> "trois-cents" # Reformed spelling
300 -> "trois cents" # Traditional spelling
3 ¢ -> "trois cents" # Valid for both
```

Cardinals take priority in such cases, so "trois cents" is read as the number 300 rather than as 3 ¢.

```
python inverse_normalize.py "trois cents" --language="fr"  # -> 300
```
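
One way to picture this priority rule is as a choice between competing readings, each with a cost, where the cheapest reading wins. The sketch below is a toy under stated assumptions: the lookup tables contain only the examples from this README, and the weights, the `READINGS` list, and the `resolve` function are hypothetical. It does not claim to show how NeMo's grammars implement the rule.

```python
# Toy lookup tables built only from the README examples above.
CARDINAL = {"trois-cents": "300", "trois cents": "300"}
MONEY = {"trois cents": "3 ¢"}

# Hypothetical weights: the reading with the lowest weight wins.
READINGS = [
    ("cardinal", 1.0, CARDINAL),
    ("money", 1.1, MONEY),
]


def resolve(text):
    """Return (label, output) for the cheapest matching reading, or None."""
    candidates = [
        (weight, label, table[text])
        for label, weight, table in READINGS
        if text in table
    ]
    if not candidates:
        return None
    _, label, output = min(candidates)  # tuples compare by weight first
    return label, output


print(resolve("trois cents"))  # both readings match; cardinal is cheaper
print(resolve("trois-cents"))  # only the cardinal reading matches
```

Under this scheme the money reading of "trois cents" is never selected, which matches the behaviour described above.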
@@ -1,8 +1,5 @@
cent $
cents $
centime €
centimes €
eurocent €
eurocents €
pence £
pesos $
pence £