NeMo/tools/speech_data_explorer
Vitaly Lavrukhin 5e51840ed5
SDE Updates (#2900)
* Removed text keywords from filters in SDE (to support as values)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Added signal metrics to SDE
Added SDE histograms for all numeric attributes
Improved SDE UI

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated code style

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated SDE requirements

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs (SDE + minor fixes)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>

* Updated docs

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
2021-09-26 12:26:09 -07:00
data_explorer.py SDE Updates (#2900) 2021-09-26 12:26:09 -07:00
README.md NLP, Megatron and Tools docs (#1739) 2021-03-15 14:54:53 -07:00
requirements.txt SDE Updates (#2900) 2021-09-26 12:26:09 -07:00
screenshot.png Added Speech Data Explorer tool (#906) 2020-07-24 09:54:55 -07:00

Speech Data Explorer

Dash-based tool for interactive exploration of ASR/TTS datasets.

Features:

  • dataset statistics (alphabet, vocabulary, duration-based histograms)
  • navigation across the dataset (sorting, filtering)
  • inspection of individual utterances (waveform, spectrogram, audio player)
  • error analysis (Word Error Rate, Character Error Rate, Word Match Rate, Mean Word Accuracy, diff)
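To make the first two error metrics concrete, here is a minimal sketch of Word Error Rate and Character Error Rate based on a plain Levenshtein distance. It is an illustration only and not necessarily how data_explorer.py computes them; the function names `edit_distance`, `word_error_rate`, and `char_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(text, pred_text):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = text.split()
    if not ref_words:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_words, pred_text.split()) / len(ref_words)


def char_error_rate(text, pred_text):
    """Character-level edit distance divided by the reference length."""
    if not text:
        raise ValueError("reference transcript is empty")
    return edit_distance(text, pred_text) / len(text)
```

Both functions take the reference transcript first and the ASR transcript second, matching the "text" and "pred_text" manifest fields.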

Make sure the requirements listed in requirements.txt are installed, then run:

python data_explorer.py path_to_manifest.json

Each entry in the JSON manifest file should contain the following fields:

  • "audio_filepath" (path to audio file)
  • "duration" (duration of the audio file in seconds)
  • "text" (reference transcript)

Error analysis additionally requires a "pred_text" field (ASR transcript) for every utterance.

Any additional field will be parsed and displayed in the 'Samples' tab.
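The manifest requirements above can be sketched in a few lines of Python. This sketch assumes the manifest stores one JSON object per line, which is the usual NeMo convention but is not stated in this README. The helpers `write_manifest` and `check_manifest`, the audio paths, and the extra "speaker" field are all hypothetical.

```python
import json

REQUIRED_FIELDS = ("audio_filepath", "duration", "text")


def write_manifest(entries, path):
    """Write one JSON object per line (assumed manifest layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def check_manifest(path):
    """Return (number of entries, True if every entry has "pred_text")."""
    count = 0
    all_have_pred = True
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            missing = [k for k in REQUIRED_FIELDS if k not in entry]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            all_have_pred = all_have_pred and "pred_text" in entry
            count += 1
    return count, all_have_pred and count > 0


# Hypothetical entries; the paths and the "speaker" field are made up.
example_entries = [
    {"audio_filepath": "audio/utt1.wav", "duration": 2.5, "text": "hello world",
     "pred_text": "hello word", "speaker": "spk1"},
    {"audio_filepath": "audio/utt2.wav", "duration": 1.2, "text": "good morning",
     "pred_text": "good morning", "speaker": "spk2"},
]
```

Calling `write_manifest(example_entries, "path_to_manifest.json")` followed by `check_manifest("path_to_manifest.json")` gives a quick check that the file has the required fields, and whether error analysis can be expected to work, before launching the tool.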
