# BERT (Bidirectional Encoder Representations from Transformers)

**WARNING**: We are on the way to deprecating most of the code in this directory.
Please see
[this link](../g3doc/tutorials/bert_new.md)
for the new tutorial and use the new code in `nlp/modeling`. This README is
still correct for this legacy implementation.
The academic paper which describes BERT in detail and provides full results on a
number of tasks can be found here: https://arxiv.org/abs/1810.04805.

This repository contains a TensorFlow 2.x implementation of BERT.
## Contents

* [Contents](#contents)
* [Pre-trained Models](#pre-trained-models)
  * [Restoring from Checkpoints](#restoring-from-checkpoints)
* [Set Up](#set-up)
* [Process Datasets](#process-datasets)
* [Fine-tuning with BERT](#fine-tuning-with-bert)
  * [Cloud GPUs and TPUs](#cloud-gpus-and-tpus)
  * [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks)
  * [SQuAD 1.1](#squad-1.1)
## Pre-trained Models

We released both checkpoints and tf.hub modules as the pretrained models for
fine-tuning. They are TF 2.x compatible and were converted from the checkpoints
released in the TF 1.x official BERT repository
[google-research/bert](https://github.com/google-research/bert)
in order to stay consistent with the BERT paper.
### Access to Pretrained Checkpoints

Pretrained checkpoints can be found at the following links:

**Note: We have switched the BERT implementation
to use Keras functional-style networks in [nlp/modeling](../modeling).
The new checkpoints are:**
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/wwm_cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-24_H-1024_A-16.tar.gz)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Multilingual Cased`](https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/multi_cased_L-12_H-768_A-12.tar.gz)**:
104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
We recommend hosting checkpoints in Google Cloud Storage buckets when you use
Cloud GPUs/TPUs.
### Restoring from Checkpoints

`tf.train.Checkpoint` is used to manage model checkpoints in TF 2.x. To restore
weights from the provided pre-trained checkpoints, you can use the following code:

```python
import tensorflow as tf

init_checkpoint = 'the pretrained model checkpoint path.'
model = tf.keras.Model()  # BERT pre-trained model used as a feature extractor.
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore(init_checkpoint)
```

Checkpoints featuring natively serialized Keras models
(i.e. loadable via `model.load_weights()`) will be available soon.
### Access to Pretrained Hub Modules

Pretrained tf.hub modules in TF 2.x SavedModel format can be found at the
following links:
* **[`BERT-Large, Uncased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_uncased_L-24_H-1024_A-16/)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Large, Cased (Whole Word Masking)`](https://tfhub.dev/tensorflow/bert_en_wwm_cased_L-24_H-1024_A-16/)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Uncased`](https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-12_H-768_A-12/)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Large, Cased`](https://tfhub.dev/tensorflow/bert_en_cased_L-24_H-1024_A-16/)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Multilingual Cased`](https://tfhub.dev/tensorflow/bert_multi_cased_L-12_H-768_A-12/)**:
104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Chinese`](https://tfhub.dev/tensorflow/bert_zh_L-12_H-768_A-12/)**:
Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads,
110M parameters
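
These SavedModels can be loaded directly as a Keras layer via the
`tensorflow-hub` package. The snippet below is a minimal, illustrative sketch
rather than part of the scripts in this directory: it assumes the list-based
input/output signature of the version-1 modules linked above
(`[input_word_ids, input_mask, input_type_ids]` in,
`[pooled_output, sequence_output]` out), and `max_seq_length` is a placeholder
you would set to match your data.

```python
import tensorflow as tf
import tensorflow_hub as hub

max_seq_length = 128  # Placeholder; must match the sequence length of your inputs.

# BERT expects token ids, an attention mask, and segment (type) ids.
input_word_ids = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name='input_word_ids')
input_mask = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name='input_mask')
input_type_ids = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name='input_type_ids')

# One of the hub modules listed above; trainable=True enables fine-tuning.
bert_layer = hub.KerasLayer(
    'https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/1',
    trainable=True)
pooled_output, sequence_output = bert_layer(
    [input_word_ids, input_mask, input_type_ids])

model = tf.keras.Model(
    inputs=[input_word_ids, input_mask, input_type_ids],
    outputs=[pooled_output, sequence_output])
```

`pooled_output` is a per-example embedding typically used for classification,
while `sequence_output` holds per-token embeddings for tasks such as SQuAD.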
## Set Up

Add the top-level `models` folder of this repository to the Python path:

```shell
export PYTHONPATH="$PYTHONPATH:/path/to/models"
```

Install `tf-nightly` to get the latest updates:

```shell
pip install tf-nightly-gpu
```

With a TPU, GPU support is not necessary. First, you need to create a
`tf-nightly` TPU with the
[ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu):

```shell
ctpu up -name <instance name> --tf-version="nightly"
```

Second, you need to install TF 2 `tf-nightly` on your VM:

```shell
pip install tf-nightly
```
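
To quickly sanity-check the installation, you can confirm that a TF 2.x build
is importable; the optional GPU check below assumes a recent `tf.config` API:

```python
import tensorflow as tf

print(tf.__version__)  # Expect a 2.x (nightly/dev) version string.
print(tf.config.list_physical_devices('GPU'))  # Lists visible GPUs, if any.
```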
## Process Datasets

### Pre-training

There is no change in how pre-training data is generated. Please use the script
[`../data/create_pretraining_data.py`](../data/create_pretraining_data.py),
which is essentially branched from the
[BERT research repo](https://github.com/google-research/bert) and adapted for
TF 2.x symbols and Python 3 compatibility, to produce the processed
pre-training data.

Running the pre-training script requires an input and output directory, as well
as a vocab file. Note that `max_seq_length` will need to match the sequence
length parameter you specify when you run pre-training.

Example shell script to call `create_pretraining_data.py`:
```shell
export WORKING_DIR='local disk or cloud location'
export BERT_DIR='local disk or cloud location'
python models/official/nlp/data/create_pretraining_data.py \
  --input_file=$WORKING_DIR/input/input.txt \
  --output_file=$WORKING_DIR/output/tf_examples.tfrecord \
  --vocab_file=$BERT_DIR/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=512 \
  --max_predictions_per_seq=76 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
```
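
As a rule of thumb inherited from the upstream BERT repository,
`max_predictions_per_seq` is set to roughly `max_seq_length * masked_lm_prob`
(here 512 * 0.15 ≈ 77, and 76 is used), and the same value needs to be passed
to the pre-training script as well.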
### Fine-tuning

To prepare the fine-tuning data for final model training, use the
[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
The resulting datasets in `tf_record` format and the training metadata should
later be passed to the training or evaluation scripts. The task-specific
arguments are described in the following sections:

* GLUE

Users can download the
[GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.

Also, users can download a [pretrained checkpoint](#access-to-pretrained-checkpoints)
and place it in some directory `$BERT_DIR` instead of using the checkpoints on
Google Cloud Storage.
```shell
export GLUE_DIR=~/glue
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TASK_NAME=MNLI
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --vocab_file=${BERT_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
  --fine_tuning_task_type=classification --max_seq_length=128 \
  --classification_task_name=${TASK_NAME}
```
* SQuAD

The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains
detailed information about the SQuAD datasets and evaluation.

The necessary files can be found here:

* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
```shell
export SQUAD_DIR=~/squad
export SQUAD_VERSION=v1.1
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
  --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
  --vocab_file=${BERT_DIR}/vocab.txt \
  --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --fine_tuning_task_type=squad --max_seq_length=384
```
Note: To create fine-tuning data for SQuAD 2.0, you need to add the flag
`--version_2_with_negative=True`.
## Fine-tuning with BERT

### Cloud GPUs and TPUs

* Cloud Storage

The unzipped pre-trained model files can also be found in the Google Cloud
Storage folder `gs://cloud-tpu-checkpoints/bert/keras_bert`. For example:

```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
```

Currently, users can access `tf-nightly` TPUs, and the following TPU scripts
should run with `tf-nightly`.
* GPU -> TPU

Just add the following flags to `run_classifier.py` or `run_squad.py`:

```shell
--distribution_strategy=tpu
--tpu=grpc://${TPU_IP_ADDRESS}:8470
```
### Sentence and Sentence-pair Classification Tasks

This example code fine-tunes `BERT-Large` on the Microsoft Research Paraphrase
Corpus (MRPC), which contains only 3,600 examples and can be fine-tuned in a
few minutes on most GPUs.

We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
For GPUs with 16GB of memory or smaller, you may try `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
  --mode='train_and_eval' \
  --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
  --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
  --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
  --bert_config_file=${BERT_DIR}/bert_config.json \
  --init_checkpoint=${BERT_DIR}/bert_model.ckpt \
  --train_batch_size=4 \
  --eval_batch_size=4 \
  --steps_per_loop=1 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=mirrored
```
Alternatively, instead of specifying `init_checkpoint`, you can specify
`hub_module_url` to employ a pre-trained BERT hub module, e.g.,
`--hub_module_url=https://tfhub.dev/tensorflow/bert_en_uncased_L-24_H-1024_A-16/1`.

After training a model, to get predictions from the classifier, set
`--mode=predict` and pass the test set tfrecords to `--eval_data_path`.
The output will be written to a file called `test_results.tsv` in the output
folder. Each line contains the output for one sample, and the columns are the
class probabilities.
```shell
python run_classifier.py \
  --mode='predict' \
  --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
  --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
  --bert_config_file=${BERT_DIR}/bert_config.json \
  --eval_batch_size=4 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=mirrored
```
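
The class probabilities in `test_results.tsv` can then be turned into predicted
labels, for example with the short sketch below (the local path is a
hypothetical placeholder, and the file is assumed to contain no header row):

```python
import numpy as np

# Each row holds the class probabilities for one test example.
probs = np.loadtxt('my_output_dir/test_results.tsv', delimiter='\t')
predicted_labels = probs.argmax(axis=-1)  # Most probable class per example.
print(predicted_labels[:10])
```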
To use a TPU, you only need to switch the distribution strategy type to `tpu`,
provide the TPU information, and use remote storage for model checkpoints.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC
python run_classifier.py \
  --mode='train_and_eval' \
  --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
  --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
  --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
  --bert_config_file=${BERT_DIR}/bert_config.json \
  --init_checkpoint=${BERT_DIR}/bert_model.ckpt \
  --train_batch_size=32 \
  --eval_batch_size=32 \
  --steps_per_loop=1000 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=tpu \
  --tpu=grpc://${TPU_IP_ADDRESS}:8470
```
Note that we specify `steps_per_loop=1000` for TPU, because running a loop of
training steps inside a `tf.function` can significantly increase TPU
utilization. Callbacks will not be called inside the loop.
### SQuAD 1.1

The Stanford Question Answering Dataset (SQuAD) is a popular question answering
benchmark dataset. See more on the
[SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/).

We use `BERT-Large` (uncased_L-24_H-1024_A-16) as an example throughout the
workflow.
For GPUs with 16GB of memory or smaller, you may try `BERT-Base`
(uncased_L-12_H-768_A-12).
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export SQUAD_DIR=gs://some_bucket/datasets
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_VERSION=v1.1
python run_squad.py \
  --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-v1.1.json \
  --vocab_file=${BERT_DIR}/vocab.txt \
  --bert_config_file=${BERT_DIR}/bert_config.json \
  --init_checkpoint=${BERT_DIR}/bert_model.ckpt \
  --train_batch_size=4 \
  --predict_batch_size=4 \
  --learning_rate=8e-5 \
  --num_train_epochs=2 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=mirrored
```
Similarly, you can replace the `init_checkpoint` flag with `hub_module_url` to
specify a hub module path.

`run_squad.py` writes the predictions for `--predict_file` by default. If you
set `--mode=predict` and provide the SQuAD test data, the script will generate
the prediction json file.

To use a TPU, you need to switch the distribution strategy type to `tpu` and
provide the TPU information.
```shell
export BERT_DIR=gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-24_H-1024_A-16
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_DIR=gs://some_bucket/datasets
export SQUAD_VERSION=v1.1
python run_squad.py \
  --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-v1.1.json \
  --vocab_file=${BERT_DIR}/vocab.txt \
  --bert_config_file=${BERT_DIR}/bert_config.json \
  --init_checkpoint=${BERT_DIR}/bert_model.ckpt \
  --train_batch_size=32 \
  --learning_rate=8e-5 \
  --num_train_epochs=2 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=tpu \
  --tpu=grpc://${TPU_IP_ADDRESS}:8470
```
The dev set predictions will be saved into a file called `predictions.json` in
the `model_dir`:

```shell
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json
```
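
The evaluation script reports the exact match (EM) and F1 scores on the dev set.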