Merge r1.5.0 bugfixes and doc updates to main (#3133)
* update branch
  Signed-off-by: ericharper <complex451@gmail.com>

* Always save last checkpoint on train end even if folder does not exist (#2976)
  * add fix for no checkpoint folder when training ends
  * update; fix test; fixes; typo; change check
  Signed-off-by: Jason <jasoli@nvidia.com>

* [NLP] Add Apex import guard (#3041)
  * add apex import guard; style
  * remove megatron bert encoder logic from NLPModel; remove megatron bert imports from init, add logging to constructor
  Signed-off-by: ericharper <complex451@gmail.com>

* Exp manager small refactor (#3067)
  * move super() call earlier in the function
  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>

* Change container (#3087)
  Signed-off-by: smajumdar <titu1994@gmail.com>
  Co-authored-by: Eric Harper <complex451@gmail.com>

* Fix machine translation training failing when config parameter `trainer.max_epochs` is used instead of `trainer.max_steps` (#3112)
  * fix: replace distributed_backend with accelerator
  * add debug script; remove debug script
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* update (#3113)
  Signed-off-by: Jason <jasoli@nvidia.com>

* Fix: punctuation capitalization inference on short queries (#3111)
  Signed-off-by: PeganovAnton <peganoff2@mail.ru>
  Co-authored-by: Eric Harper <complex451@gmail.com>

* Multiple ASR fixes to SPE tokenization (#3119)
  * reduce num workers for transcribe
  * fix SPE tokenizer vocabulary construction
  * update tokenizer building script
  * remove logs
  Signed-off-by: smajumdar <titu1994@gmail.com>

* Megatron GPT training in BCP (#3095)
  * BCP megatron training; add quotes; style fix
  Signed-off-by: madhukar <madhukar@penguin>

* Upgrade to PTL 1.5.0 (#3127)
  * update for ptl 1.5.0; update trainer config
  * limit CUDA visible devices to the first two GPUs in the check-for-ranks CI test
  * make datasets larger for test; update compute_max_steps
  * update package info; remove duplicate code; remove comments
  Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Madhukar K <26607911+madhukarkm@users.noreply.github.com>
Co-authored-by: madhukar <madhukar@penguin>
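Note: the Apex import guard added in #3041 follows the usual optional-dependency pattern. A minimal sketch of that pattern (illustrative names, not the exact NeMo code):

```python
# A minimal sketch of an import guard, assuming only that `apex` may be absent.
try:
    import apex  # noqa: F401

    HAVE_APEX = True
except (ImportError, ModuleNotFoundError):
    HAVE_APEX = False


class SomeMegatronEncoder:  # hypothetical class, for illustration only
    def __init__(self):
        if not HAVE_APEX:
            # Fail (or log) at construction time rather than at import time,
            # so the rest of the collection stays importable without Apex.
            raise ImportError("Apex is required to use this encoder.")
```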
parent 663c76a972
commit aaacc4b089
Jenkinsfile (14 changed lines)
@@ -1,8 +1,8 @@
 pipeline {
   agent {
     docker {
-      image 'gitlab-master.nvidia.com/dl/dgx/pytorch:21.10-py3-devel'
-      args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache/torch:/root/.cache/torch -v $HOME/.cache/huggingface/transformers:/root/.cache/huggingface/transformers --shm-size=8g'
+      image 'nvcr.io/nvidia/pytorch:21.10-py3'
+      args '--device=/dev/nvidia0 --gpus all --user 0:128 -v /home/TestData:/home/TestData -v $HOME/.cache/torch:/root/.cache/torch --shm-size=8g'
     }
   }
   options {
@@ -53,20 +53,12 @@ pipeline {
       }
     }
 
     stage('NeMo Installation') {
       steps {
         sh './reinstall.sh release'
       }
     }
 
-    // Revert once import guards are added by PTL or version comparing is fixed
-    stage('PTL Import Guards') {
-      steps{
-        sh 'sed -i "s/from pytorch_lightning.callbacks.quantization import QuantizationAwareTraining/try:\\n\\tfrom pytorch_lightning.callbacks.quantization import QuantizationAwareTraining\\nexcept:\\n\\tpass/g" /opt/conda/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py'
-      }
-    }
-
     stage('PyTorch Lightning version') {
       steps {
         sh 'python -c "import pytorch_lightning; print(pytorch_lightning.__version__)"'
@@ -75,7 +67,7 @@ pipeline {
 
     stage('PyTorch Lightning DDP Checks') {
       steps {
-        sh 'python "tests/core_ptl/check_for_ranks.py"'
+        sh 'CUDA_VISIBLE_DEVICES="0,1" python "tests/core_ptl/check_for_ranks.py"'
       }
     }
@@ -17,6 +17,7 @@ from pathlib import Path
 from omegaconf.omegaconf import OmegaConf
 from pytorch_lightning import Trainer
 from pytorch_lightning.callbacks.timer import Timer
+from pytorch_lightning.plugins.environments.torchelastic_environment import TorchElasticEnvironment
 from pytorch_lightning.trainer.connectors.checkpoint_connector import CheckpointConnector
 
 from nemo.collections.nlp.models.language_modeling.megatron_gpt_model import MegatronGPTModel
@@ -37,24 +38,23 @@ def main(cfg) -> None:
     logging.info("\n\n************** Experiment configuration ***********")
     logging.info(f'\n{OmegaConf.to_yaml(cfg)}')
 
+    plugins = [NLPDDPPlugin(num_nodes=cfg.trainer.num_nodes)]
     if cfg.trainer.precision == 16:
-        trainer = Trainer(
-            plugins=[
-                NLPDDPPlugin(num_nodes=cfg.trainer.num_nodes),
-                NLPNativeMixedPrecisionPlugin(
-                    init_scale=cfg.model.get('native_amp_init_scale', 2 ** 32),
-                    growth_interval=cfg.model.get('native_amp_growth_interval', 1000),
-                ),
-            ],
-            **cfg.trainer,
+        plugins.append(
+            NLPNativeMixedPrecisionPlugin(
+                init_scale=cfg.model.get('native_amp_init_scale', 2 ** 32),
+                growth_interval=cfg.model.get('native_amp_growth_interval', 1000),
+            )
         )
     elif cfg.trainer.precision == 'bf16':
-        trainer = Trainer(
-            plugins=[NLPDDPPlugin(num_nodes=cfg.trainer.num_nodes), NLPNativeBfloat16PrecisionPlugin(),],
-            **cfg.trainer,
-        )
+        plugins.append(NLPNativeBfloat16PrecisionPlugin())
     else:
-        trainer = Trainer(plugins=[NLPDDPPlugin(num_nodes=cfg.trainer.num_nodes), NLPPrecisionPlugin()], **cfg.trainer)
+        plugins.append(NLPPrecisionPlugin())
+
+    if cfg.get('cluster_type', None) == 'BCP':
+        plugins.append(TorchElasticEnvironment())
+
+    trainer = Trainer(plugins=plugins, **cfg.trainer)
 
     exp_manager(trainer, cfg.exp_manager)
 
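The refactor above replaces three near-duplicate `Trainer(...)` constructions with one: plugins are accumulated in a list and the `Trainer` is built once at the end, which also makes it easy to append `TorchElasticEnvironment` for BCP clusters. A minimal sketch of the pattern using only PyTorch Lightning classes (the `on_bcp` flag stands in for `cfg.get('cluster_type', None) == 'BCP'`):

```python
from pytorch_lightning import Trainer
from pytorch_lightning.plugins import DDPPlugin
from pytorch_lightning.plugins.environments.torchelastic_environment import TorchElasticEnvironment

on_bcp = False  # assumption: stands in for cfg.get('cluster_type', None) == 'BCP'

# Accumulate plugins conditionally, then construct the Trainer exactly once.
plugins = [DDPPlugin()]
if on_bcp:
    plugins.append(TorchElasticEnvironment())

trainer = Trainer(plugins=plugins, max_steps=10)
```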
@@ -270,12 +270,13 @@ class EncDecCTCModelBPE(EncDecCTCModel, ASRBPEMixin):
         Returns:
             A pytorch DataLoader for the given audio file(s).
         """
+        batch_size = min(config['batch_size'], len(config['paths2audio_files']))
         dl_config = {
             'manifest_filepath': os.path.join(config['temp_dir'], 'manifest.json'),
             'sample_rate': self.preprocessor._sample_rate,
-            'batch_size': min(config['batch_size'], len(config['paths2audio_files'])),
+            'batch_size': batch_size,
             'shuffle': False,
-            'num_workers': os.cpu_count() - 1,
+            'num_workers': min(batch_size, os.cpu_count() - 1),
             'pin_memory': True,
             'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False),
         }
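This hunk and the three that follow apply the same two-part fix to every `transcribe` dataloader: hoist the effective batch size into a local, and cap `num_workers` at that batch size so a small transcription job no longer spawns `os.cpu_count() - 1` mostly idle workers. A worked example with assumed values:

```python
import os

# Assume a 2-file transcription job on a machine where os.cpu_count() == 64.
config = {'batch_size': 4, 'paths2audio_files': ['a.wav', 'b.wav']}

batch_size = min(config['batch_size'], len(config['paths2audio_files']))  # -> 2
num_workers = min(batch_size, os.cpu_count() - 1)                         # -> 2 (was 63)
```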
@@ -650,14 +650,15 @@ class EncDecCTCModel(ASRModel, ExportableEncDecModel, ASRModuleMixin):
         Returns:
             A pytorch DataLoader for the given audio file(s).
         """
+        batch_size = min(config['batch_size'], len(config['paths2audio_files']))
         dl_config = {
             'manifest_filepath': os.path.join(config['temp_dir'], 'manifest.json'),
             'sample_rate': self.preprocessor._sample_rate,
             'labels': self.decoder.vocabulary,
-            'batch_size': min(config['batch_size'], len(config['paths2audio_files'])),
+            'batch_size': batch_size,
             'trim_silence': False,
             'shuffle': False,
-            'num_workers': os.cpu_count() - 1,
+            'num_workers': min(batch_size, os.cpu_count() - 1),
             'pin_memory': True,
         }
@@ -349,12 +349,13 @@ class EncDecRNNTBPEModel(EncDecRNNTModel, ASRBPEMixin):
         Returns:
             A pytorch DataLoader for the given audio file(s).
         """
+        batch_size = min(config['batch_size'], len(config['paths2audio_files']))
         dl_config = {
             'manifest_filepath': os.path.join(config['temp_dir'], 'manifest.json'),
             'sample_rate': self.preprocessor._sample_rate,
-            'batch_size': min(config['batch_size'], len(config['paths2audio_files'])),
+            'batch_size': batch_size,
             'shuffle': False,
-            'num_workers': os.cpu_count() - 1,
+            'num_workers': min(batch_size, os.cpu_count() - 1),
             'pin_memory': True,
             'use_start_end_token': self.cfg.validation_ds.get('use_start_end_token', False),
         }
@@ -809,14 +809,15 @@ class EncDecRNNTModel(ASRModel, ASRModuleMixin, ExportableEncDecJointModel):
         Returns:
             A pytorch DataLoader for the given audio file(s).
         """
+        batch_size = min(config['batch_size'], len(config['paths2audio_files']))
         dl_config = {
             'manifest_filepath': os.path.join(config['temp_dir'], 'manifest.json'),
             'sample_rate': self.preprocessor._sample_rate,
             'labels': self.joint.vocabulary,
-            'batch_size': min(config['batch_size'], len(config['paths2audio_files'])),
+            'batch_size': batch_size,
             'trim_silence': False,
             'shuffle': False,
-            'num_workers': os.cpu_count() - 1,
+            'num_workers': min(batch_size, os.cpu_count() - 1),
             'pin_memory': True,
         }
@@ -76,13 +76,12 @@ class ASRBPEMixin(ABC):
 
         if 'special_tokens' in self.tokenizer_cfg:
             special_tokens = self.tokenizer_cfg['special_tokens']
         else:
             special_tokens = None
 
-        # Update special tokens
-        self.tokenizer = tokenizers.SentencePieceTokenizer(
-            model_path=model_path, special_tokens=special_tokens, legacy=True
-        )
+        if special_tokens is not None:
+            raise ValueError("`special_tokens` are no longer supported for SentencePiece based tokenizers.")
+
+        self.tokenizer = tokenizers.SentencePieceTokenizer(model_path=model_path)
 
         if 'vocab_path' in self.tokenizer_cfg:
             vocab_path = self.tokenizer_cfg.get('vocab_path')
@@ -102,11 +101,11 @@ class ASRBPEMixin(ABC):
             # fallback case for older checkpoints that did not preserve the tokenizer.vocab
             self.spe_vocab_path = None
 
-        vocabulary = {'<unk>': 0}
-        with open(vocab_path) as f:
-            for i, piece in enumerate(f):
-                piece = piece.replace('\n', '')
-                vocabulary[piece] = i + 1
+        vocabulary = {}
+        for i in range(self.tokenizer.vocab_size):
+            piece = self.tokenizer.ids_to_tokens([i])
+            piece = piece[0]
+            vocabulary[piece] = i + 1
 
         # wrapper method to get vocabulary conveniently
         def get_vocab():
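Rather than reading a `tokenizer.vocab` file that older checkpoints may not have preserved, the vocabulary is now rebuilt directly from the SentencePiece model itself. A standalone sketch of the same idea using the `sentencepiece` package (the model path is assumed; NeMo's wrapper exposes `ids_to_tokens`, which plays the role of `id_to_piece` here):

```python
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file='tokenizer.model')  # assumed path

vocabulary = {}
for i in range(sp.get_piece_size()):
    piece = sp.id_to_piece(i)
    vocabulary[piece] = i + 1  # ids shifted by one, as in the hunk above
```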
@@ -529,11 +529,18 @@ def get_features_infer(
         st.append(subtokens)
         stm.append(subtokens_mask)
     _check_max_seq_length_and_margin_and_step(max_seq_length, margin, step)
-    max_seq_length = min(max_seq_length, max(sent_lengths) + 2)
+    if max_seq_length > max(sent_lengths) + 2:
+        max_seq_length = max(sent_lengths) + 2
+        # If `max_seq_length` is greater than the maximum length of the input query, then parameters
+        # ``margin`` and ``step`` will not be used.
+        step = 1
+        # Maximum number of word subtokens in segment. The first and the last tokens in segment are CLS and EOS
+        length = max_seq_length - 2
+    else:
+        # Maximum number of word subtokens in segment. The first and the last tokens in segment are CLS and EOS
+        length = max_seq_length - 2
+        step = min(length - margin * 2, step)
     logging.info(f'Max length: {max_seq_length}')
-    # Maximum number of word subtokens in segment. The first and the last tokens in segment are CLS and EOS
-    length = max_seq_length - 2
-    step = min(length - margin * 2, step)
     get_stats(sent_lengths)
     all_input_ids, all_segment_ids, all_subtokens_mask, all_input_mask, all_input_mask = [], [], [], [], []
     all_quantities_of_preceding_words, all_query_ids, all_is_first, all_is_last = [], [], [], []
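The new branch clamps `max_seq_length` to the longest query (plus CLS and EOS) and neutralizes `margin`/`step` when every query already fits in one segment, which is what broke short-query inference before #3111. A worked example with illustrative values:

```python
sent_lengths = [4, 7, 10]                   # subtokens per query (illustrative)
max_seq_length, margin, step = 512, 16, 8

if max_seq_length > max(sent_lengths) + 2:
    max_seq_length = max(sent_lengths) + 2  # -> 12 (10 subtokens + CLS + EOS)
    step = 1                                # margin and step are irrelevant here
    length = max_seq_length - 2             # -> 10
else:
    length = max_seq_length - 2
    step = min(length - margin * 2, step)
```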
@@ -54,7 +54,7 @@ class NLPDDPPlugin(DDPPlugin):
     """ DDP plugin for Pytorch Lightning. Needed to customize DDP for model parallel models.
     """
 
-    distributed_backend = "ddp"
+    accelerator = "ddp"
 
     def __init__(
         self,
@@ -460,15 +460,15 @@ class ModelPT(LightningModule, Model):
             optim_config['sched']['t_max_epochs'] = self._trainer.max_epochs
             optim_config['sched']['t_accumulate_grad_batches'] = self._trainer.accumulate_grad_batches
             optim_config['sched']['t_limit_train_batches'] = self._trainer.limit_train_batches
-            if self._trainer.distributed_backend is None:
+            if self._trainer.accelerator is None:
                 optim_config['sched']['t_num_workers'] = self._trainer.num_gpus or 1
-            elif self._trainer.distributed_backend == "ddp_cpu":
+            elif self._trainer.accelerator == "ddp_cpu":
                 optim_config['sched']['t_num_workers'] = self._trainer.num_processes * self._trainer.num_nodes
-            elif self._trainer.distributed_backend == "ddp":
+            elif self._trainer.accelerator == "ddp":
                 optim_config['sched']['t_num_workers'] = self._trainer.num_gpus * self._trainer.num_nodes
             else:
                 logging.warning(
-                    f"The lightning trainer received accelerator: {self._trainer.distributed_backend}. We "
+                    f"The lightning trainer received accelerator: {self._trainer.accelerator}. We "
                     "recommend to use 'ddp' instead."
                 )
                 optim_config['sched']['t_num_workers'] = self._trainer.num_gpus * self._trainer.num_nodes
@@ -93,6 +93,9 @@ class TrainerConfig:
     reload_dataloaders_every_n_epochs: int = 0
     ipus: Optional[int] = None
+    devices: Any = None
+    strategy: Any = None
+    enable_checkpointing: bool = True
+    enable_model_summary: bool = True
 
 
 # Register the trainer config.
@@ -786,8 +786,6 @@ def compute_max_steps(
     elif steps_per_epoch != float('inf'):
         # limit_train_batches is a percentage of batches per epoch
         steps_per_epoch = int(steps_per_epoch * limit_train_batches)
-        if accumulate_grad_batches == 1:
-            steps_per_epoch = max(steps_per_epoch, 1)
 
     return math.ceil(steps_per_epoch / accumulate_grad_batches) * max_epochs
 
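With the floor removed, the computation reduces to a single expression. A worked example with assumed numbers:

```python
import math

# Assume 100 batches/epoch, limit_train_batches=0.5,
# accumulate_grad_batches=2, max_epochs=3.
steps_per_epoch = int(100 * 0.5)                # -> 50
max_steps = math.ceil(steps_per_epoch / 2) * 3  # -> 75
```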
@@ -16,7 +16,7 @@
 MAJOR = 1
 MINOR = 5
 PATCH = 0
-PRE_RELEASE = 'b1'
+PRE_RELEASE = ''
 
 # Use the following formatting: (major, minor, patch, pre-release)
 VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
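Clearing `PRE_RELEASE` turns the assembled version string into a plain `1.5.0`. A sketch of the usual tuple-to-string assembly (an assumption about how the rest of the file combines these values, not a quote of it):

```python
MAJOR, MINOR, PATCH, PRE_RELEASE = 1, 5, 0, ''

__version__ = '.'.join(map(str, (MAJOR, MINOR, PATCH))) + PRE_RELEASE  # -> '1.5.0'
```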
@@ -732,10 +732,6 @@ class NeMoModelCheckpoint(ModelCheckpoint):
             self.best_model_path = best_k_models[0]
             self.best_model_score = self.best_k_models[self.best_model_path]
 
-        # # uninject mp_rank from paths
-        # self.kth_best_model_path = self._uninject_mp_rank(self.kth_best_model_path)
-        # self.best_model_path = self._uninject_mp_rank(self.best_model_path)
-
     @staticmethod
     def _uninject_mp_rank(filepath):
         dirname = os.path.dirname(os.path.dirname(filepath))
@@ -10,9 +10,6 @@ echo 'Uninstalling stuff'
 ${PIP} uninstall -y nemo_toolkit
 ${PIP} uninstall -y sacrebleu
 
-# TODO: revert when 1.5.0 is out
-${PIP} uninstall -y pytorch-lightning
-
 # Kept for legacy purposes
 ${PIP} uninstall -y nemo_asr
 ${PIP} uninstall -y nemo_nlp
@@ -22,9 +19,6 @@ ${PIP} uninstall -y nemo_cv
 
 ${PIP} install -U setuptools
 
-# TODO: revert when 1.5.0 is out
-${PIP} install pytorch-lightning==1.5.0rc0
-
 echo 'Installing nemo and nemo_text_processing'
 if [[ "$INSTALL_OPTION" == "dev" ]]; then
     ${PIP} install --editable ".[all]"
@@ -1,4 +1,4 @@
-pytorch-lightning>1.4.9
+pytorch-lightning>=1.5.0
 torchmetrics>=0.4.1rc0
 transformers>=4.0.1
 webdataset>=0.1.48,<=0.1.62
@@ -73,9 +73,14 @@
 # --spe_max_sentencepiece_length: Limits the maximum length that any SentencePiece subword can be.
 #       Using this will change the subword tokens generated.
 #
+# --spe_pad: Adds <pad> as special token.
+#
+# --spe_bos: Adds <s> as Beginning-of-Sentence special token.
+#
+# --spe_eos: Adds </s> as End-of-Sentence special token.
 #
 # --log: Whether the script should display log messages
 
 import argparse
 import json
 import logging
@@ -205,8 +210,10 @@ def __process_data(
     Returns:
     """
     if tokenizer_type == 'spe':
+
+        # Prepare directory of tokenizer
         if spe_max_sentencepiece_length > 0:
-            tokenizer_dir = os.path.join(dst_folder, 'tokenizer_{}_{}_v{}_max{}').format(
+            tokenizer_dir = os.path.join(dst_folder, 'tokenizer_{}_{}_v{}_max_{}').format(
                 tokenizer_type, spe_type, vocab_size, spe_max_sentencepiece_length
             )
         else:
@@ -214,6 +221,13 @@ def __process_data(
                 tokenizer_type, spe_type, vocab_size
             )
 
+        if spe_pad:
+            tokenizer_dir = f'{tokenizer_dir}_pad'
+        if spe_bos:
+            tokenizer_dir = f'{tokenizer_dir}_bos'
+        if spe_eos:
+            tokenizer_dir = f'{tokenizer_dir}_eos'
+
         if not os.path.exists(tokenizer_dir):
             os.makedirs(tokenizer_dir)
 
@@ -221,6 +235,7 @@ def __process_data(
             logging.warning("Model file already exists, overriding old model file !")
             os.remove(os.path.join(tokenizer_dir, 'tokenizer.model'))
 
+    # Build tokenizer
     tokenizer_path, vocab_path = create_spt_model(
         data_file=text_path,
         vocab_size=vocab_size,
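Taken together, the naming hunks compose the tokenizer directory from the tokenizer type, vocabulary size, optional max subword length, and the new special-token flags. A worked example with assumed arguments:

```python
import os

# Assumed argument values, for illustration only.
dst_folder, tokenizer_type, spe_type, vocab_size = 'out', 'spe', 'unigram', 128
spe_max_sentencepiece_length, spe_pad, spe_bos, spe_eos = 4, True, False, True

tokenizer_dir = os.path.join(dst_folder, 'tokenizer_{}_{}_v{}_max_{}').format(
    tokenizer_type, spe_type, vocab_size, spe_max_sentencepiece_length
)
if spe_pad:
    tokenizer_dir = f'{tokenizer_dir}_pad'
if spe_bos:
    tokenizer_dir = f'{tokenizer_dir}_bos'
if spe_eos:
    tokenizer_dir = f'{tokenizer_dir}_eos'
# -> 'out/tokenizer_spe_unigram_v128_max_4_pad_eos'
```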
@@ -144,6 +144,7 @@ class TestEncDecCTCModel:
         assert new_model.vocab_path.endswith('_vocab.txt')
         assert new_model.spe_vocab_path.endswith('_tokenizer.vocab')
 
         assert new_model.tokenizer.tokenizer.vocab_size == 128
+        assert len(new_model.tokenizer.tokenizer.get_vocab()) == 128
 
     @pytest.mark.unit
@@ -39,14 +39,13 @@ class TempModel(torch.nn.Module):
 
 class OptCounter(torch.optim.SGD):
     def __init__(self, *args, **kwargs):
-        self.count = 0
         super().__init__(*args, **kwargs)
+        for group in self.param_groups:
+            group.setdefault('count', 0)
 
     def step(self, closure=None):
-        try:
-            self.count += 1
-        except AttributeError:
-            self.count = 1
+        for group in self.param_groups:
+            group['count'] += 1
         super().step(closure)
 
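The counter moves from an instance attribute into `param_groups`, presumably so it survives whatever wrapping or state handling the PTL 1.5 trainer applies to optimizers. A standalone, runnable sketch of the same pattern:

```python
import torch

class CountingSGD(torch.optim.SGD):
    """Keeps a step counter inside param_groups instead of on the instance."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        for group in self.param_groups:
            group.setdefault('count', 0)

    def step(self, closure=None):
        for group in self.param_groups:
            group['count'] += 1
        super().step(closure)

p = torch.nn.Parameter(torch.zeros(1))
opt = CountingSGD([p], lr=0.1)
p.grad = torch.ones(1)
opt.step()
assert opt.param_groups[0]['count'] == 1
```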
|
@@ -88,7 +87,8 @@ class ExampleModel(pl.LightningModule):
 class Callback(pl.callbacks.Callback):
     @pl.utilities.distributed.rank_zero_only
     def on_train_end(self, trainer, module):
-        if trainer.global_step != module.my_opt.count or trainer.global_step != module.max_steps:
+        count = module.my_opt.param_groups[0]['count']
+        if trainer.global_step != count or trainer.global_step != module.max_steps:
             logging.debug(f"max_epochs: {trainer.max_epochs}")
             logging.debug(f"accumulate_grad_batches: {trainer.accumulate_grad_batches}")
             logging.debug(f"limit_train_batches: {trainer.limit_train_batches}")
@@ -98,12 +98,8 @@ class Callback(pl.callbacks.Callback):
             logging.debug(f"drop_last: {module.drop_last}")
             logging.debug(f"{len(trainer.train_dataloader)}")
             logging.debug(f"{trainer.num_training_batches }")
-            assert (
-                trainer.global_step == module.my_opt.count
-            ), f"{trainer.global_step} != {module.my_opt.count} != {module.max_steps}"
-            assert (
-                trainer.global_step == module.max_steps
-            ), f"{trainer.global_step} != {module.my_opt.count} != {module.max_steps}"
+            assert trainer.global_step == count, f"{trainer.global_step} != {count} != {module.max_steps}"
+            assert trainer.global_step == module.max_steps, f"{trainer.global_step} != {count} != {module.max_steps}"
 
 
 class TestOptimizersSchedulers:
@@ -54,10 +54,10 @@
     "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
     "4. Run this cell to set up dependencies.\n",
     "\"\"\"\n",
+    "BRANCH = 'main'\n",
     "# # If you're using Google Colab and not running locally, uncomment and run this cell.\n",
     "# !apt-get install sox libsndfile1 ffmpeg\n",
     "# !pip install wget unidecode\n",
-    "# BRANCH = 'main'\n",
     "# !python -m pip install git+https://github.com/NeMo/NeMo.git@$BRANCH#egg=nemo_toolkit[tts]"
    ]
   },
@@ -54,10 +54,10 @@
     "3. Connect to an instance with a GPU (Runtime -> Change runtime type -> select \"GPU\" for hardware accelerator)\n",
     "4. Run this cell to set up dependencies.\n",
     "\"\"\"\n",
+    "BRANCH = 'main'\n",
     "# # If you're using Colab and not running locally, uncomment and run this cell.\n",
     "# !apt-get install sox libsndfile1 ffmpeg\n",
     "# !pip install wget unidecode\n",
-    "# BRANCH = 'main'\n",
     "# !python -m pip install git+https://github.com/NVIDIA/NeMo.git@$BRANCH#egg=nemo_toolkit[tts]"
    ]
   },
@@ -154,8 +154,8 @@
    "source": [
     "# NeMo's training scripts are stored inside the examples/ folder. Let's grab the tacotron2.py file\n",
     "# as well as the tacotron2.yaml file\n",
-    "!wget https://raw.githubusercontent.com/NVIDIA/NeMo/v1.0.2/examples/tts/tacotron2.py\n",
-    "!mkdir conf && cd conf && wget https://raw.githubusercontent.com/NVIDIA/NeMo/v1.0.2/examples/tts/conf/tacotron2.yaml && cd .."
+    "!wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/tacotron2.py\n",
+    "!mkdir conf && cd conf && wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/tacotron2.yaml && cd .."
    ]
   },
   {
@@ -306,15 +306,6 @@
     "python tacotron2.py train_dataset=YOUR_TRAIN.json validation_datasets=YOUR_VAL.json trainer.gpus=-1\n",
     "```"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "2KctbQ61MmHy"
-   },
-   "outputs": [],
-   "source": []
-  }
  ],
  "metadata": {