Add PUBLICATIONS.md (#3051)
* Add PUBLICATIONS.md
* Add NLP
* Update PUBLICATIONS.md
* Fix links

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
This commit is contained in:
parent d22cf7643c
commit f8d8d069e5

PUBLICATIONS.md | 133 (new file)
@@ -0,0 +1,133 @@
# Publications

Here, we list a collection of research articles that utilize the NeMo Toolkit. If you would like to include your paper in this collection, please submit a PR updating this document.

-------

# Automatic Speech Recognition (ASR)

<details>
<summary>2021</summary>

* [Citrinet: Closing the Gap between Non-Autoregressive and Autoregressive End-to-End Models for Automatic Speech Recognition](https://arxiv.org/abs/2104.01721)
* [SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition](https://www.isca-speech.org/archive/interspeech_2021/oneill21_interspeech.html)
* [CarneliNet: Neural Mixture Model for Automatic Speech Recognition](https://arxiv.org/abs/2107.10708)
* [CTC Variations Through New WFST Topologies](https://arxiv.org/abs/2110.03098)
* [A Toolbox for Construction and Analysis of Speech Datasets](https://openreview.net/pdf?id=oJ0oHQtAld)

</details>

<details>
<summary>2020</summary>

* [Cross-Language Transfer Learning, Continuous Learning, and Domain Adaptation for End-to-End Automatic Speech Recognition](https://ieeexplore.ieee.org/document/9428334)
* [Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model](https://ieeexplore.ieee.org/abstract/document/9053051)
* [Improving Noise Robustness of an End-to-End Neural Model for Automatic Speech Recognition](https://arxiv.org/abs/2010.12715)

</details>

<details>
<summary>2019</summary>

* [Jasper: An End-to-End Convolutional Neural Acoustic Model](https://arxiv.org/abs/1904.03288)
* [QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions](https://arxiv.org/abs/1910.10261)

</details>

--------

## Speaker Recognition (SpkR)

<details>
<summary>2021</summary>

* [TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context](https://arxiv.org/pdf/2110.04410.pdf)

</details>

<details>
<summary>2020</summary>

* [SpeakerNet: 1D Depth-wise Separable Convolutional Network for Text-Independent Speaker Recognition and Verification](https://arxiv.org/pdf/2010.12653.pdf)

</details>

--------

## Speech Classification

<details>
<summary>2021</summary>

* [MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection](https://ieeexplore.ieee.org/abstract/document/9414470/)

</details>

<details>
<summary>2020</summary>

* [MatchboxNet - 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition](http://www.interspeech2020.org/index.php?m=content&c=index&a=show&catid=337&id=993)

</details>

--------

# Natural Language Processing (NLP)

## Language Modeling

<details>
<summary>2021</summary>

* [BioMegatron: Larger Biomedical Domain Language Model](https://aclanthology.org/2020.emnlp-main.379/)

</details>

--------

## Dialogue State Tracking

<details>
<summary>2021</summary>

* [SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services](https://arxiv.org/abs/2105.08049)

</details>

--------

# Text To Speech (TTS)

<details>
<summary>2021</summary>

* [TalkNet: Fully-Convolutional Non-Autoregressive Speech Synthesis Model](https://www.isca-speech.org/archive/interspeech_2021/beliaev21_interspeech.html)
* [TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction](https://arxiv.org/abs/2104.08189)
* [Hi-Fi Multi-Speaker English TTS Dataset](https://www.isca-speech.org/archive/pdfs/interspeech_2021/bakhturina21_interspeech.pdf)
* [Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings](https://arxiv.org/abs/2110.03584)

</details>

--------

# (Inverse) Text Normalization

<details>
<summary>2021</summary>

* [NeMo Inverse Text Normalization: From Development to Production](https://www.isca-speech.org/archive/pdfs/interspeech_2021/zhang21ga_interspeech.pdf)
* [A Unified Transformer-based Framework for Duplex Text Normalization](https://arxiv.org/pdf/2108.09889.pdf)

</details>

--------
README.rst | 17
@@ -197,6 +197,23 @@ Contributing

We welcome community contributions! Please refer to `CONTRIBUTING.md <https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md>`_ for the process.

Publications
------------

We provide an ever-growing list of publications that utilize the NeMo framework. Please refer to `PUBLICATIONS.md <https://github.com/NVIDIA/NeMo/blob/main/PUBLICATIONS.md>`_. We welcome the addition of your own articles to this list!

Citation
--------

```
@article{kuchaiev2019nemo,
  title={Nemo: a toolkit for building ai applications using neural modules},
  author={Kuchaiev, Oleksii and Li, Jason and Nguyen, Huyen and Hrinchuk, Oleksii and Leary, Ryan and Ginsburg, Boris and Kriman, Samuel and Beliaev, Stanislav and Lavrukhin, Vitaly and Cook, Jack and others},
  journal={arXiv preprint arXiv:1909.09577},
  year={2019}
}
```

License
-------

NeMo is under the `Apache 2.0 license <https://github.com/NVIDIA/NeMo/blob/stable/LICENSE>`_.