WFST_tutorial for ITN development (#3128)

* Pushing WFST_tutorial for open draft. (Still need to review Colab code.)

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Checked that the tutorial code for WFST_Tutorial is functioning properly. Also included some formatting edits.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Responding to editorial comments for WFST_tutorial

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Added images to folder and wrote README for tutorials

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Few more editorial changes to explain permutations in classification.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Updated tutorials documentation page.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Forgot links for README

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* TOC links were dead

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* More dead links to fix.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Removing Colab install and appending a warning instead.

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* Update WFST_Tutorial.ipynb

Signed-off-by: tbartley94 <tbartley@nvidia.com>
tbartley94 2021-11-09 15:18:19 -05:00 committed by GitHub
parent dc9ed88f78
commit 1106ff93c0
GPG key ID: 4AEE18F83AFDEB23
11 changed files with 7223 additions and 0 deletions


@@ -139,3 +139,6 @@ To run a tutorial:
* - Text Processing
  - Inverse Text Normalization for ASR
  - `Inverse Text Normalization <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb>`_
* - Text Processing
  - Constructing Normalization Grammars with WFSTs
  - `WFST Tutorial <https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb>`_


@@ -0,0 +1,24 @@
# NeMo Text Processing Tutorials

The NeMo Text Processing module supports both Text Normalization (TN) and Inverse Text Normalization (ITN)
to aid upstream and downstream text processing. The included tutorials are intended to help you quickly
become familiar with the module's interface and to guide you in creating and deploying your own grammars
for individual text processing needs.

If you wish to learn how to use NeMo for Text Normalization tasks (e.g., converting symbolic written strings
to their verbal form, such as `15` -> "fifteen"), please see the [`Text Normalization`](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_Normalization.ipynb)
tutorial.
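
For a concrete sense of the written-to-spoken direction, here is a minimal plain-Python sketch. The `verbalize_cardinal` helper is hypothetical and only covers integers 0-99; NeMo builds such mappings as WFST grammars rather than as Python lookups.

```python
# Hypothetical helper: a plain-Python stand-in for a TN cardinal grammar.
# It only illustrates the written -> spoken direction for integers 0-99.
UNITS_AND_TEENS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS_WORDS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
              6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def verbalize_cardinal(written: str) -> str:
    """Convert a written cardinal such as "15" into its spoken form."""
    if not written.isdigit():
        raise ValueError(f"not a cardinal: {written!r}")
    value = int(written)
    if value >= 100:
        raise ValueError("this sketch only covers 0-99")
    if value < 20:
        return UNITS_AND_TEENS[value]
    tens, ones = divmod(value, 10)
    # Space-separated, no hyphen (a formatting choice made for this sketch).
    if ones == 0:
        return TENS_WORDS[tens]
    return f"{TENS_WORDS[tens]} {UNITS_AND_TEENS[ones]}"


print(verbalize_cardinal("15"))  # fifteen
print(verbalize_cardinal("42"))  # forty two
```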

If you wish to learn more about Inverse Text Normalization, the inverse task of converting verbalized strings
to their symbolic written form (as is commonly needed to format ASR output), consult the
[`Inverse Text Normalization`](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Inverse_Text_Normalization.ipynb) tutorial.
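
The inverse direction can be sketched the same way. Again, `denormalize_cardinal` is a hypothetical plain-Python helper limited to spoken cardinals from "zero" to "ninety nine"; it is not part of NeMo's API.

```python
# Hypothetical helper: a plain-Python stand-in for an ITN cardinal grammar,
# mapping spoken cardinals from "zero" to "ninety nine" back to digits.
WORD_TO_SMALL = {w: i for i, w in enumerate([
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
])}
WORD_TO_TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
                "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def denormalize_cardinal(spoken: str) -> str:
    """Convert a spoken cardinal such as "forty two" into "42"."""
    # Normalize case and treat "forty-two" like "forty two".
    words = spoken.lower().replace("-", " ").split()
    if len(words) == 1 and words[0] in WORD_TO_SMALL:
        return str(WORD_TO_SMALL[words[0]])
    if len(words) == 1 and words[0] in WORD_TO_TENS:
        return str(WORD_TO_TENS[words[0]])
    if (len(words) == 2 and words[0] in WORD_TO_TENS
            and words[1] in WORD_TO_SMALL and 1 <= WORD_TO_SMALL[words[1]] <= 9):
        return str(WORD_TO_TENS[words[0]] + WORD_TO_SMALL[words[1]])
    # This sketch raises on unmatched input; a full ITN pipeline would
    # typically leave such text unchanged instead.
    raise ValueError(f"not a spoken cardinal in 0-99: {spoken!r}")


print(denormalize_cardinal("fifteen"))    # 15
print(denormalize_cardinal("forty-two"))  # 42
```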

If you are curious about constructing grammars tailored to specific languages and use cases, you may want to
work through the [`WFST Tutorial`](https://github.com/NVIDIA/NeMo/blob/stable/tutorials/text_processing/WFST_Tutorial.ipynb), which walks through NeMo's normalization
process in detail.

As NeMo Text Processing uses Weighted Finite State Transducer (WFST) graphs to construct its grammars, a working
knowledge of [Finite State Automata](https://en.wikipedia.org/wiki/Finite-state_machine) (FSA) and/or regular languages is recommended.
We also suggest becoming functionally familiar with the [`pynini` library](https://www.openfst.org/twiki/bin/view/GRM/Pynini), which serves
as the backend for graph construction, and with [Sparrowhawk](https://github.com/google/sparrowhawk), which NeMo uses for grammar deployment.
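
To give a feel for how weights choose between competing analyses, below is a toy WFST written in plain Python (no pynini) with tropical-semiring weights: arc weights add along a path, and the cheapest path that consumes the whole input wins. The `ToyWFST` class, its states, and its weights are all hypothetical; it is loosely analogous to, not an implementation of, how weighted pynini grammars are searched.

```python
import heapq
import itertools
from collections import defaultdict

FINAL = None  # virtual super-final state used only during search


class ToyWFST:
    """Hypothetical toy WFST over word tokens with tropical (min, +) weights."""

    def __init__(self, start=0):
        self.start = start
        self.arcs = defaultdict(list)  # state -> [(in_word, out_str, weight, dst)]
        self.finals = {}               # state -> final weight

    def add_arc(self, src, in_word, out_str, weight, dst):
        self.arcs[src].append((in_word, out_str, weight, dst))

    def set_final(self, state, weight=0.0):
        self.finals[state] = weight

    def shortest_path(self, tokens):
        """Return (cost, output) of the cheapest accepting path, or None."""
        tie = itertools.count()  # breaks cost ties without comparing states
        heap = [(0.0, next(tie), self.start, 0, "")]
        settled = set()
        while heap:
            cost, _, state, pos, out = heapq.heappop(heap)
            if state is FINAL:
                return cost, out  # Dijkstra: first final popped is cheapest
            if (state, pos) in settled:
                continue
            settled.add((state, pos))
            if pos == len(tokens) and state in self.finals:
                heapq.heappush(heap, (cost + self.finals[state], next(tie), FINAL, pos, out))
            if pos < len(tokens):
                for in_word, out_str, weight, dst in self.arcs[state]:
                    if in_word == tokens[pos]:
                        heapq.heappush(heap, (cost + weight, next(tie), dst, pos + 1, out + out_str))
        return None


# Hypothetical grammar: "twenty [one..nine]" -> digits at low cost (states 0-2);
# any known word may also pass through verbatim at a higher cost (state 3).
ones = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
vocab = ["twenty", *ones, "apples"]
fst = ToyWFST(start=0)
fst.add_arc(0, "twenty", "2", 0.5, 1)   # tens digit, a ones word must follow
fst.add_arc(0, "twenty", "20", 0.5, 2)  # "twenty" on its own
for digit, word in enumerate(ones, start=1):
    fst.add_arc(1, word, str(digit), 0.0, 2)
for word in vocab:
    fst.add_arc(0, word, word, 1.0, 3)        # first verbatim word
    fst.add_arc(2, word, " " + word, 1.0, 3)  # verbatim word after a number
    fst.add_arc(3, word, " " + word, 1.0, 3)  # further verbatim words
fst.set_final(2)
fst.set_final(3)

print(fst.shortest_path(["twenty", "one"]))            # (0.5, '21')
print(fst.shortest_path(["twenty", "one", "apples"]))  # (1.5, '21 apples')
```

Rewriting "twenty one" to digits costs 0.5, against 2.0 for passing both words through, so the numeric reading wins; "apples" can only be passed through, and a word outside `vocab` yields `None`.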

File diff suppressed because it is too large.

Binary files not shown (8 images added; sizes: 10 KiB, 12 KiB, 13 KiB, 30 KiB, 11 KiB, 17 KiB, 27 KiB, 36 KiB).