[BERT/PyT] remove redundant section (#690)

Author: Sharath T S, 2020-09-16 17:06:29 -07:00 (committed by GitHub)
parent aacbda693a
commit a74236afd4

@@ -22,7 +22,6 @@ This repository provides a script and recipe to train the BERT model for PyTorch
 * [Pre-training parameters](#pre-training-parameters)
 * [Fine tuning parameters](#fine-tuning-parameters)
 * [Multi-node](#multi-node)
-* [Fine-tuning parameters](#fine-tuning-parameters)
 * [Command-line options](#command-line-options)
 * [Getting the data](#getting-the-data)
 * [Dataset guidelines](#dataset-guidelines)
@@ -472,7 +471,7 @@ Default arguments are listed below in the order `scripts/run_glue.sh` expects:
 - Initial checkpoint - The default is `/workspace/bert/checkpoints/bert_uncased.pt`.
 - Data directory - The default is `/workspace/bert/data/download/glue/MRPC/`.
-- Vocabulary file (token to ID mapping) - The default is `/workspace/bert/data/download/google_pretrained_weights/uncased_L-24_H-1024_A-16/vocab.txt`.
+- Vocabulary file (token to ID mapping) - The default is `/workspace/bert/vocab/vocab`.
 - Config file for the BERT model (It should be the same as the pretrained model) - The default is `/workspace/bert/bert_config.json`.
 - Output directory for result - The default is `/workspace/bert/results/MRPC`.
 - The name of the GLUE task (`mrpc` or `sst-2`) - The default is `mrpc`
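Spelled out as a command using the post-change defaults, that positional list corresponds to a call along these lines. This is only a sketch: any positional arguments the script expects beyond the six shown in the hunk above are omitted here.

```
# Sketch only: positional order taken from the list above; any further
# positional arguments of scripts/run_glue.sh fall outside the excerpt
# and are omitted.
bash scripts/run_glue.sh \
  /workspace/bert/checkpoints/bert_uncased.pt \
  /workspace/bert/data/download/glue/MRPC/ \
  /workspace/bert/vocab/vocab \
  /workspace/bert/bert_config.json \
  /workspace/bert/results/MRPC \
  mrpc
```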
@@ -506,139 +505,6 @@ Note that the `run.sub` script is a starting point that has to be adapted depending on the environment.
Refer to the file's contents to see the full list of variables to adjust for your system.
#### Fine-tuning parameters
* SQuAD
The `run_squad.py` script contains many of the same arguments as `run_pretraining.py`.
The main script-specific parameters are:
```
--bert_model BERT_MODEL         - Specifies the type of BERT model to use;
                                  should be one of the following:
                                    bert-base-uncased
                                    bert-large-uncased
                                    bert-base-cased
                                    bert-base-multilingual
                                    bert-base-chinese
--train_file TRAIN_FILE         - Path to the SQuAD json for training.
                                  For example, train-v1.1.json.
--predict_file PREDICT_FILE     - Path to the SQuAD json for predictions.
                                  For example, dev-v1.1.json or test-v1.1.json.
--max_seq_length MAX_SEQ_LENGTH - The maximum total input sequence length
                                  after WordPiece tokenization.
                                  Sequences longer than this will be truncated,
                                  and sequences shorter than this will be padded.
--doc_stride DOC_STRIDE         - When splitting up a long document into chunks,
                                  this parameter sets how much stride to take
                                  between chunks of tokens.
--max_query_length MAX_QUERY_LENGTH
                                - The maximum number of tokens for the question.
                                  Questions longer than <max_query_length>
                                  will be truncated to the value specified.
--n_best_size N_BEST_SIZE       - The total number of n-best predictions to
                                  generate in the nbest_predictions.json
                                  output file.
--max_answer_length MAX_ANSWER_LENGTH
                                - The maximum length of an answer that can be
                                  generated. This is needed because the start
                                  and end predictions are not conditioned on
                                  one another.
--verbose_logging               - If true, all the warnings related to data
                                  processing will be printed. A number of
                                  warnings are expected for a normal SQuAD
                                  evaluation.
--do_lower_case                 - Whether to lower case the input text. Set to
                                  true for uncased models and false for cased
                                  models.
--version_2_with_negative       - If true, the SQuAD examples contain questions
                                  that do not have an answer.
--null_score_diff_threshold NULL_SCORE_DIFF_THRESHOLD
                                - A null answer will be predicted if
                                  null_score - best_non_null is greater than
                                  NULL_SCORE_DIFF_THRESHOLD.
```
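For orientation, the flags above might combine into an invocation like the sketch below. The values are illustrative placeholders, not repository defaults, and flags shared with `run_pretraining.py` (checkpoint, output directory, precision) are omitted.

```
# Illustrative sketch only: file names and values are placeholders,
# and flags shared with run_pretraining.py are omitted.
python run_squad.py \
  --bert_model bert-large-uncased \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --max_seq_length 384 \
  --doc_stride 128 \
  --max_query_length 64 \
  --do_lower_case
```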
* GLUE
The `run_glue.py` script contains many of the same arguments as `run_pretraining.py`.
The main script-specific parameters are:
```
--data_dir DATA_DIR   The input data dir. Should contain the .tsv files (or
                      other data files) for the task.
--bert_model BERT_MODEL
                      Bert pre-trained model selected in the list:
                      bert-base-uncased, bert-large-uncased, bert-base-cased,
                      bert-large-cased, bert-base-multilingual-uncased,
                      bert-base-multilingual-cased, bert-base-chinese.
--task_name {cola,mnli,mrpc,sst-2}
                      The name of the task to train.
--output_dir OUTPUT_DIR
                      The output directory where the model predictions and
                      checkpoints will be written.
--init_checkpoint INIT_CHECKPOINT
                      The checkpoint file from pretraining.
--max_seq_length MAX_SEQ_LENGTH
                      The maximum total input sequence length after
                      WordPiece tokenization. Sequences longer than this
                      will be truncated, and sequences shorter than this
                      will be padded.
--do_train            Whether to run training.
--do_eval             Whether to get model-task performance on the dev set
                      by running eval.
--do_predict          Whether to output prediction results on the dev set
                      by running eval.
--do_lower_case       Set this flag if you are using an uncased model.
--train_batch_size TRAIN_BATCH_SIZE
                      Batch size per GPU for training.
--eval_batch_size EVAL_BATCH_SIZE
                      Batch size per GPU for eval.
--learning_rate LEARNING_RATE
                      The initial learning rate for Adam.
--num_train_epochs NUM_TRAIN_EPOCHS
                      Total number of training epochs to perform.
--max_steps MAX_STEPS
                      Total number of training steps to perform.
--warmup_proportion WARMUP_PROPORTION
                      Proportion of training to perform linear learning rate
                      warmup for. E.g., 0.1 = 10% of training.
--no_cuda             Whether to disable CUDA even when it is available.
--local_rank LOCAL_RANK
                      Local rank for distributed training on GPUs.
--seed SEED           Random seed for initialization.
--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS
                      Number of update steps to accumulate before
                      performing a backward/update pass.
--fp16                Mixed precision training.
--amp                 Mixed precision training.
--loss_scale LOSS_SCALE
                      Loss scaling to improve fp16 numeric stability. Only
                      used when fp16 is set to True. 0 (default value):
                      dynamic loss scaling. Positive power of 2: static loss
                      scaling value.
--server_ip SERVER_IP
                      Can be used for distant debugging.
--server_port SERVER_PORT
                      Can be used for distant debugging.
--vocab_file VOCAB_FILE
                      Vocabulary mapping/file BERT was pretrained on.
--config_file CONFIG_FILE
                      The BERT model config.
--skip_checkpoint     Whether to skip saving checkpoints.
```
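As with SQuAD, a hypothetical `run_glue.py` invocation built only from the flags above might look like the following. The paths echo the `scripts/run_glue.sh` defaults listed earlier, and the hyperparameter values are placeholders rather than recommendations.

```
# Illustrative sketch only: paths mirror the documented run_glue.sh
# defaults and hyperparameter values are placeholders.
python run_glue.py \
  --task_name mrpc \
  --data_dir /workspace/bert/data/download/glue/MRPC/ \
  --bert_model bert-large-uncased \
  --init_checkpoint /workspace/bert/checkpoints/bert_uncased.pt \
  --vocab_file /workspace/bert/vocab/vocab \
  --config_file /workspace/bert/bert_config.json \
  --output_dir /workspace/bert/results/MRPC \
  --max_seq_length 128 \
  --train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --do_train --do_eval --do_lower_case
```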
### Command-line options
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option, for example: