WFST_tutorial for ITN development (#3128)
* Pushing WFST_tutorial for open draft. (Still need to review collab code.)
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Checked tutorial code for WFST_Tutorial is properly functioning. Also included some formatting edits.
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Responding to editorial comments for WFST_tutorial
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Added images to folder and wrote README for tutorials
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Few more editorial changes to explain permutations in classification.
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Updated tutorials documentation page.
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Forgot links for README
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* TOC links were dead
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* More dead links to fix.
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* removing collab install and appending a warning instead.
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
* Update WFST_Tutorial.ipynb
  Signed-off-by: tbartley94 <tbartley@nvidia.com>
@ -139,3 +139,6 @@ To run a tutorial:
* - Text Processing
  - Inverse Text Normalization for ASR
  - `Inverse Text Normalization <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`_
* - Text Processing
  - Constructing Normalization Grammars with WFSTs
  - `WFST Tutorial <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb>`_
24
tutorials/text_processing/README.md
Normal file
@ -0,0 +1,24 @@
# NeMo Text Processing Tutorials

The NeMo Text Processing module provides support for both Text Normalization (TN) and
Inverse Text Normalization (ITN) in order to aid upstream and downstream text processing.
The included tutorials are intended to help you quickly become familiar with the interface
of the module and to guide you in creating and deploying your own grammars for individual
text processing needs.

If you wish to learn more about how to use NeMo for Text Normalization tasks (e.g. conversion
of symbolic strings to verbal form - such as `15` -> "fifteen"), please see the [`Text Normalization`](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_Normalization.ipynb)
tutorial.

If you wish to learn more about Inverse Text Normalization - the inverse task of converting
from verbalized strings to symbolic written form, as may be encountered in downstream ASR -
consult the [`Inverse Text Normalization`](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb) tutorial.

If you are curious about constructing grammars tailored to specific languages and use cases,
you may be interested in working through the [`WFST Tutorial`](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb), which goes through NeMo's Normalization
process in detail.

As NeMo Text Processing utilizes Weighted Finite State Transducer (WFST) graphs to construct its
grammars, a working knowledge of [Finite State Automata](https://en.wikipedia.org/wiki/Finite-state_machine) (FSA) and/or regular languages is suggested.
Further, we recommend becoming functionally familiar with the [`pynini` library](https://www.openfst.org/twiki/bin/view/GRM/Pynini) - which serves
as the backend for graph construction - and [Sparrowhawk](https://github.com/google/sparrowhawk) - which NeMo utilizes for grammar deployment.
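
For readers new to transducers, the following is a minimal pure-Python sketch of the idea behind the grammars mentioned above. It does not use `pynini`, Sparrowhawk, or any NeMo API: the names `build_itn_fst`, `invert`, and `transduce` are hypothetical, and the toy machine is unweighted and deterministic, whereas real WFST grammars carry weights and may be ambiguous. It only illustrates that a transducer maps input symbols to output symbols and that swapping the labels turns a toy ITN mapping into a toy TN mapping.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Transition table: state -> {input symbol: (output symbol, next state)}.
# This layout is an assumption made for illustration only.
Transitions = Dict[int, Dict[str, Tuple[str, int]]]

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def build_itn_fst() -> Transitions:
    """Single-state transducer mapping each spoken digit word to its digit."""
    return {0: {word: (str(digit), 0) for digit, word in enumerate(DIGIT_WORDS)}}


def invert(fst: Transitions) -> Transitions:
    """Swap input and output labels on every arc (toy ITN <-> toy TN)."""
    return {
        state: {out: (inp, nxt) for inp, (out, nxt) in arcs.items()}
        for state, arcs in fst.items()
    }


def transduce(
    fst: Transitions,
    symbols: List[str],
    start: int = 0,
    finals: FrozenSet[int] = frozenset({0}),
) -> Optional[List[str]]:
    """Follow one arc per input symbol; return outputs, or None if rejected."""
    state = start
    outputs: List[str] = []
    for symbol in symbols:
        arc = fst.get(state, {}).get(symbol)
        if arc is None:
            return None  # no matching arc: the input is not in the language
        output, state = arc
        outputs.append(output)
    return outputs if state in finals else None


itn = build_itn_fst()
tn = invert(itn)

written = "".join(transduce(itn, "one five".split()))  # spoken -> written
spoken = " ".join(transduce(tn, list("15")))           # written -> spoken
print(written, "|", spoken)
```

Note that this toy reads `15` digit by digit ("one five") rather than as "fifteen"; handling such cases is the kind of grammar design the WFST Tutorial covers.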
7196
tutorials/text_processing/WFST_Tutorial.ipynb
Normal file
BIN
tutorials/text_processing/images/cent.PNG
Normal file
Size: 10 KiB
BIN
tutorials/text_processing/images/cent_to_100.PNG
Normal file
Size: 12 KiB
BIN
tutorials/text_processing/images/cent_vingt_bad.PNG
Normal file
Size: 13 KiB
BIN
tutorials/text_processing/images/cent_vingt_good.PNG
Normal file
Size: 30 KiB
BIN
tutorials/text_processing/images/cent_vingt_to_120.PNG
Normal file
Size: 11 KiB
BIN
tutorials/text_processing/images/dix_to_digits.PNG
Normal file
Size: 17 KiB
BIN
tutorials/text_processing/images/dix_to_digits_with_insert.PNG
Normal file
Size: 27 KiB
BIN
tutorials/text_processing/images/romanization.PNG
Normal file
Size: 36 KiB