	# BERT (Bidirectional Encoder Representations from Transformers)

	WARNING: We are on the way to deprecating most of the code in this directory.
	Please see
	[this link](../g3doc/tutorials/bert_new.md)
	for the new tutorial and use the new code in `nlp/modeling`. This README is
	still correct for this legacy implementation.

	The academic paper which describes BERT in detail and provides full results on a
	number of tasks can be found here: https://arxiv.org/abs/1810.04805.

	This repository contains a TensorFlow 2.x implementation of BERT.

	## Contents
	* [Contents](#contents)
	* [Pre-trained Models](#pre-trained-models)
	* [Restoring from Checkpoints](#restoring-from-checkpoints)
	* [Set Up](#set-up)
	* [Process Datasets](#process-datasets)
	* [Fine-tuning with BERT](#fine-tuning-with-bert)
	* [Cloud GPUs and TPUs](#cloud-gpus-and-tpus)
	* [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks)
	* [SQuAD 1.1](#squad-11)


	## Pre-trained Models

	We release both checkpoints and tf.hub modules as pre-trained models for
	fine-tuning. They are TF 2.x compatible and were converted from the checkpoints
	released in the official TF 1.x BERT repository
	[google-research/bert](https://github.com/google-research/bert)
	to stay consistent with the BERT paper.


	### Access to Pretrained Checkpoints

	Pretrained checkpoints can be found in the following links:

	**Note: We have switched BERT implementation
	to use Keras functional-style networks in [nlp/modeling](../modeling).
	The new checkpoints are:**

	* [`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz):
	12-layer, 768-hidden, 12-heads, 110M parameters
	* [`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz):
	12-layer, 768-hidden, 12-heads, 110M parameters
	* [`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Multilingual Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/multi_cased_L-12_H-768_A-12.tar.gz):
	104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
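
The archive names above encode the model size: `L` is the number of layers, `H` the hidden size, and `A` the number of attention heads. As a minimal sketch (the helper `parse_bert_dir_name` is hypothetical and not part of this repository), the following Python snippet recovers those values from a name:

```python
import re

# Hypothetical helper, not part of the repository: reads the size fields
# encoded in checkpoint names such as "uncased_L-12_H-768_A-12".
_NAME_PATTERN = re.compile(
    r"^(?P<variant>.+)_L-(?P<layers>\d+)_H-(?P<hidden>\d+)_A-(?P<heads>\d+)$"
)


def parse_bert_dir_name(name):
    """Returns (variant, layers, hidden_size, heads) parsed from a checkpoint name."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name!r}")
    return (
        match.group("variant"),
        int(match.group("layers")),
        int(match.group("hidden")),
        int(match.group("heads")),
    )


print(parse_bert_dir_name("wwm_uncased_L-24_H-1024_A-16"))
# ('wwm_uncased', 24, 1024, 16)
```

This can be handy for choosing a smaller model programmatically when GPU memory is limited.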

	We recommend hosting checkpoints on Google Cloud Storage buckets when you use
	Cloud GPUs or TPUs.

	### Restoring from Checkpoints

	`tf.train.Checkpoint` is used to manage model checkpoints in TF 2. To restore
	weights from provided pre-trained checkpoints, you can use the following code:

	```python
	import tensorflow as tf

	init_checkpoint = 'the pretrained model checkpoint path.'
	# Placeholder: substitute the BERT pre-trained model used as a feature extractor.
	model = tf.keras.Model()
	checkpoint = tf.train.Checkpoint(model=model)
	checkpoint.restore(init_checkpoint)
	```

	Checkpoints featuring native serialized Keras models
	(i.e. model.load()/load_weights()) will be available soon.

	### Access to Pretrained Hub Modules

	Pretrained tf.hub modules in TF 2.x SavedModel format can be found at the
	following links:

	* [`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/):
	12-layer, 768-hidden, 12-heads, 110M parameters
	* [`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/):
	12-layer, 768-hidden, 12-heads, 110M parameters
	* [`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/):
	24-layer, 1024-hidden, 16-heads, 340M parameters
	* [`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/):
	104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
	* [`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/):
	Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads,
	110M parameters

	## Set Up

	```shell
	export PYTHONPATH="$PYTHONPATH:/path/to/models"
	```

	Install `tf-nightly` to get the latest updates:

	```shell
	pip install tf-nightly-gpu
	```

	When using a TPU, GPU support is not necessary. First, create a `tf-nightly`
	TPU with the [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu):

	```shell
	ctpu up -name <instance name> --tf-version="nightly"
	```

	Second, install the TF 2 `tf-nightly` package on your VM:

	```shell
	pip install tf-nightly
	```

	## Process Datasets

	### Pre-training

	Generating pre-training data works the same way as in the original BERT
	release. Use the script
	[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py),
	which is essentially branched from the [BERT research repo](https://github.com/google-research/bert)
	and adapted to TF2 symbols and Python 3 compatibility, to produce the
	processed pre-training data.

	Running the script requires an input file and an output file, as well as a vocab file. Note that `max_seq_length` must match the sequence length parameter you specify when you run pre-training.

	Example shell script to call `create_pretraining_data.py`:

	```shell
	export WORKING_DIR='local disk or cloud location'
	export BERT_DIR='local disk or cloud location'
	python models/official/nlp/data/create_pretraining_data.py \
	--input_file=$WORKING_DIR/input/input.txt \
	--output_file=$WORKING_DIR/output/tf_examples.tfrecord \
	--vocab_file=$BERT_DIR/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
	--do_lower_case=True \
	--max_seq_length=512 \
	--max_predictions_per_seq=76 \
	--masked_lm_prob=0.15 \
	--random_seed=12345 \
	--dupe_factor=5
	```
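
The `--max_predictions_per_seq=76` value in the example above is consistent with the other two flags: with `max_seq_length=512` and `masked_lm_prob=0.15`, roughly 512 × 0.15 ≈ 76.8 tokens are masked in a full-length sequence. The snippet below is a small sketch of that arithmetic. The function name is hypothetical, and rounding down is an assumption made here only to reproduce the example value, not a rule taken from the script.

```python
import math


def suggested_max_predictions(max_seq_length, masked_lm_prob):
    """Rough cap on masked tokens per sequence (rounded down; an assumption)."""
    return math.floor(max_seq_length * masked_lm_prob)


print(suggested_max_predictions(512, 0.15))  # 76
```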

	### Fine-tuning

	To prepare the fine-tuning data for final model training, use the
	[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
	The resulting datasets in `tf_record` format and the training metadata file
	should then be passed to the training or evaluation scripts. The task-specific
	arguments are described in the following sections:

	* GLUE

	Users can download the
	[GLUE data](https://gluebenchmark.com/tasks) by running
	[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
	and unpacking it to some directory `$GLUE_DIR`.
	Users can also download a [pretrained checkpoint](#access-to-pretrained-checkpoints) and place it in some directory `$BERT_DIR` instead of using the checkpoints on Google Cloud Storage.

	```shell
	export GLUE_DIR=~/glue
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16

	export TASK_NAME=MNLI
	export OUTPUT_DIR=gs://some_bucket/datasets
	python ../data/create_finetuning_data.py \
	--input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
	--vocab_file=${BERT_DIR}/vocab.txt \
	--train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
	--eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
	--meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
	--fine_tuning_task_type=classification --max_seq_length=128 \
	--classification_task_name=${TASK_NAME}
	```

	* SQuAD

	The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains
	detailed information about the SQuAD datasets and evaluation.

	The necessary files can be found here:

	* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
	* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
	* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
	* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
	* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
	* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)

	```shell
	export SQUAD_DIR=~/squad
	export SQUAD_VERSION=v1.1
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export OUTPUT_DIR=gs://some_bucket/datasets

	python ../data/create_finetuning_data.py \
	--squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
	--vocab_file=${BERT_DIR}/vocab.txt \
	--train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
	--meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
	--fine_tuning_task_type=squad --max_seq_length=384
	```

	Note: To create fine-tuning data for SQuAD 2.0, add the flag `--version_2_with_negative=True`.

	## Fine-tuning with BERT

	### Cloud GPUs and TPUs

	* Cloud Storage

	The unzipped pre-trained model files can also be found in the Google Cloud
	Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example:

	```shell
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export MODEL_DIR=gs://some_bucket/my_output_dir
	```

	Currently, users can access `tf-nightly` TPUs, and the following TPU
	scripts should run with `tf-nightly`.

	* GPU -> TPU

	Just add the following flags to `run_classifier.py` or `run_squad.py`:

	```shell
	--distribution_strategy=tpu
	--tpu=grpc://${TPU_IP_ADDRESS}:8470
	```

	### Sentence and Sentence-pair Classification Tasks

	This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase
	Corpus (MRPC), which contains only 3,600 examples, so fine-tuning takes a
	few minutes on most GPUs.

	We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
	workflow.
	If your GPU has 16GB of memory or less, you may want to use `BERT-Base`
	(uncased_L-12_H-768_A-12) instead.

	```shell
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export MODEL_DIR=gs://some_bucket/my_output_dir
	export GLUE_DIR=gs://some_bucket/datasets
	export TASK=MRPC

	python run_classifier.py \
	--mode='train_and_eval' \
	--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
	--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
	--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
	--bert_config_file=${BERT_DIR}/bert_config.json \
	--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
	--train_batch_size=4 \
	--eval_batch_size=4 \
	--steps_per_loop=1 \
	--learning_rate=2e-5 \
	--num_train_epochs=3 \
	--model_dir=${MODEL_DIR} \
	--distribution_strategy=mirrored
	```

	Alternatively, instead of specifying `init_checkpoint`, you can specify
	`hub_module_url` to employ a pre-trained BERT hub module, e.g.,
	`--hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`.

	After training a model, you can get predictions from the classifier by setting
	`--mode=predict` and passing the test set tfrecords to `--eval_data_path`.
	The output is written to a file called `test_results.tsv` in the output folder.
	Each line contains the output for one sample, and the columns are the class
	probabilities.

	```shell
	python run_classifier.py \
	--mode='predict' \
	--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
	--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
	--bert_config_file=${BERT_DIR}/bert_config.json \
	--eval_batch_size=4 \
	--model_dir=${MODEL_DIR} \
	--distribution_strategy=mirrored
	```
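
As a sketch of how the prediction output might be consumed, the following Python snippet reads tab-separated class probabilities, one row per example, and picks the most likely class for each row. The function name and the sample rows are made up for illustration. Only the file layout described above is assumed.

```python
import csv
import io


def predicted_classes(tsv_file):
    """Returns the index of the highest-probability class for each row."""
    predictions = []
    for row in csv.reader(tsv_file, delimiter="\t"):
        if not row:
            continue  # Skip blank lines.
        probabilities = [float(value) for value in row]
        predictions.append(probabilities.index(max(probabilities)))
    return predictions


# Made-up sample standing in for the contents of test_results.tsv.
sample = io.StringIO("0.10\t0.90\n0.75\t0.25\n")
print(predicted_classes(sample))  # [1, 0]
```

To read a real results file, pass `open(path, newline="")` instead of the in-memory sample. A file stored in a Cloud Storage bucket would need to be copied to local disk first.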

	To use a TPU, you only need to switch the distribution strategy to `tpu`, provide
	the TPU address, and use remote storage for model checkpoints.

	```shell
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export TPU_IP_ADDRESS='???'
	export MODEL_DIR=gs://some_bucket/my_output_dir
	export GLUE_DIR=gs://some_bucket/datasets
	export TASK=MRPC

	python run_classifier.py \
	--mode='train_and_eval' \
	--input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
	--train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
	--eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
	--bert_config_file=${BERT_DIR}/bert_config.json \
	--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
	--train_batch_size=32 \
	--eval_batch_size=32 \
	--steps_per_loop=1000 \
	--learning_rate=2e-5 \
	--num_train_epochs=3 \
	--model_dir=${MODEL_DIR} \
	--distribution_strategy=tpu \
	--tpu=grpc://${TPU_IP_ADDRESS}:8470
	```

	Note that we specify `steps_per_loop=1000` for TPU because running a loop of
	training steps inside a `tf.function` can significantly increase TPU utilization.
	Callbacks are not called inside the loop.
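
To get a feel for the numbers in the two MRPC examples, the sketch below estimates the number of training steps from the dataset size, batch size, and epoch count. It uses the MRPC size quoted above (3,600 examples) and assumes steps per epoch are computed by integer division. The function name is hypothetical, and the exact bookkeeping in `run_classifier.py` may differ.

```python
def training_steps(train_data_size, train_batch_size, num_train_epochs):
    """Returns (steps_per_epoch, total_steps), assuming incomplete batches are dropped."""
    steps_per_epoch = train_data_size // train_batch_size
    return steps_per_epoch, steps_per_epoch * num_train_epochs


# GPU example: batch size 4, 3 epochs.
print(training_steps(3600, 4, 3))   # (900, 2700)
# TPU example: batch size 32, 3 epochs.
print(training_steps(3600, 32, 3))  # (112, 336)
```

Under these assumptions, the whole TPU run takes fewer steps than a single loop of `steps_per_loop=1000`.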

	### SQuAD 1.1

	The Stanford Question Answering Dataset (SQuAD) is a popular question answering
	benchmark dataset. See more on the [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/).

	We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
	workflow.
	If your GPU has 16GB of memory or less, you may want to use `BERT-Base`
	(uncased_L-12_H-768_A-12) instead.

	```shell
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export SQUAD_DIR=gs://some_bucket/datasets
	export MODEL_DIR=gs://some_bucket/my_output_dir
	export SQUAD_VERSION=v1.1

	python run_squad.py \
	--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
	--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
	--predict_file=${SQUAD_DIR}/dev-v1.1.json \
	--vocab_file=${BERT_DIR}/vocab.txt \
	--bert_config_file=${BERT_DIR}/bert_config.json \
	--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
	--train_batch_size=4 \
	--predict_batch_size=4 \
	--learning_rate=8e-5 \
	--num_train_epochs=2 \
	--model_dir=${MODEL_DIR} \
	--distribution_strategy=mirrored
	```

	Similarly, you can replace the `init_checkpoint` flag with `hub_module_url` to
	specify a hub module path.

	`run_squad.py` writes the predictions for `--predict_file` by default. If you set
	`--mode=predict` and provide the SQuAD test data, the script generates
	the prediction JSON file.

	To use a TPU, switch the distribution strategy to `tpu` and provide the TPU
	address.

	```shell
	export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
	export TPU_IP_ADDRESS='???'
	export MODEL_DIR=gs://some_bucket/my_output_dir
	export SQUAD_DIR=gs://some_bucket/datasets
	export SQUAD_VERSION=v1.1

	python run_squad.py \
	--input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
	--train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
	--predict_file=${SQUAD_DIR}/dev-v1.1.json \
	--vocab_file=${BERT_DIR}/vocab.txt \
	--bert_config_file=${BERT_DIR}/bert_config.json \
	--init_checkpoint=${BERT_DIR}/bert_model.ckpt \
	--train_batch_size=32 \
	--learning_rate=8e-5 \
	--num_train_epochs=2 \
	--model_dir=${MODEL_DIR} \
	--distribution_strategy=tpu \
	--tpu=grpc://${TPU_IP_ADDRESS}:8470
	```

	The dev set predictions are saved to a file called `predictions.json` in the
	`model_dir`. They can then be scored with the evaluation script:

	```shell
	python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json
	```
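
The evaluation script reports exact match (EM) and F1 scores. To illustrate what those metrics measure, here is a simplified Python sketch of both for a single prediction. It follows the general idea (normalize the text, then compare tokens) but is not the official script, and the function names and sample strings are chosen here for illustration.

```python
import collections
import re
import string


def normalize_answer(text):
    """Lowercases, strips punctuation and English articles, collapses whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """True when both answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1_score(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and ground truth."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower"))        # True
print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))  # about 0.667
```

Exact match rewards only answers that are identical after normalization, while F1 gives partial credit for overlapping tokens.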