|
# ALBERT (A Lite BERT for Self-supervised Learning of Language Representations)
|
|
|
The academic paper which describes ALBERT in detail and provides full results on
a number of tasks can be found here: https://arxiv.org/abs/1909.11942.
|
|
|
This repository contains a TensorFlow 2.x implementation of ALBERT.
|
|
|
## Contents

* [Contents](#contents)
* [Pre-trained Models](#pre-trained-models)
* [Restoring from Checkpoints](#restoring-from-checkpoints)
* [Set Up](#set-up)
* [Process Datasets](#process-datasets)
* [Fine-tuning with ALBERT](#fine-tuning-with-albert)
* [Cloud GPUs and TPUs](#cloud-gpus-and-tpus)
* [Sentence and Sentence-pair Classification Tasks](#sentence-and-sentence-pair-classification-tasks)
* [SQuAD 1.1](#squad-11)
|
|
|
|
|
## Pre-trained Models |
|
|
|
We released both checkpoints and tf.hub modules as the pretrained models for
fine-tuning. They are TF 2.x compatible and are converted from the ALBERT v2
checkpoints released in the TF 1.x official ALBERT repository
[google-research/albert](https://github.com/google-research/albert)
in order to stay consistent with the ALBERT paper.

Our currently released checkpoints are exactly the same as those in the TF 1.x
official ALBERT repository.
|
|
|
### Access to Pretrained Checkpoints |
|
|
|
Pretrained checkpoints can be found in the following links: |
|
|
|
**Note: We implemented ALBERT using Keras functional-style networks in
[nlp/modeling](../modeling). The ALBERT V2 checkpoints compatible with TF 2.x
are:**
|
|
|
* **[`ALBERT V2 Base`](https://storage.googleapis.com/cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base.tar.gz)**:
    12-layer, 768-hidden, 12-heads, 12M parameters
* **[`ALBERT V2 Large`](https://storage.googleapis.com/cloud-tpu-checkpoints/albert/checkpoints/albert_v2_large.tar.gz)**:
    24-layer, 1024-hidden, 16-heads, 18M parameters
* **[`ALBERT V2 XLarge`](https://storage.googleapis.com/cloud-tpu-checkpoints/albert/checkpoints/albert_v2_xlarge.tar.gz)**:
    24-layer, 2048-hidden, 32-heads, 60M parameters
* **[`ALBERT V2 XXLarge`](https://storage.googleapis.com/cloud-tpu-checkpoints/albert/checkpoints/albert_v2_xxlarge.tar.gz)**:
    12-layer, 4096-hidden, 64-heads, 235M parameters
|
|
|
We recommend hosting the checkpoints in a Google Cloud Storage bucket when you
use Cloud GPUs/TPUs.
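
For local experiments, you can also fetch and unpack a checkpoint archive
directly (and then, if needed, copy it to your own bucket with `gsutil`). This
is only an illustrative sketch using the Python standard library and the base
checkpoint URL listed above:

```python
import tarfile
import urllib.request

# Illustrative sketch: download and unpack the ALBERT V2 base checkpoint.
url = ("https://storage.googleapis.com/cloud-tpu-checkpoints/albert/"
       "checkpoints/albert_v2_base.tar.gz")
urllib.request.urlretrieve(url, "albert_v2_base.tar.gz")
with tarfile.open("albert_v2_base.tar.gz") as tar:
    tar.extractall("albert_v2_base")  # contains the config, vocab and ckpt files
```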
|
|
|
### Restoring from Checkpoints |
|
|
|
`tf.train.Checkpoint` is used to manage model checkpoints in TF 2.x. To restore
weights from the provided pre-trained checkpoints, you can use the following
code:
|
|
|
```python
init_checkpoint = 'path/to/the/pretrained/model/checkpoint'
model = tf.keras.Model()  # ALBERT pre-trained model used as a feature extractor.
checkpoint = tf.train.Checkpoint(model=model)
checkpoint.restore(init_checkpoint)
```
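
The `restore()` call returns a status object; once the model's variables have
been built, you can call `status.assert_existing_objects_matched()` on it to
verify that the pre-trained weights were actually matched and loaded.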
|
|
|
Checkpoints saved as natively serialized Keras models (i.e., loadable via
`tf.keras.models.load_model()` or `model.load_weights()`) will be available
soon.
|
|
|
### Access to Pretrained Hub Modules
|
|
|
Pretrained tf.hub modules in the TF 2.x SavedModel format can be found at the
following links:
|
|
|
* **[`ALBERT V2 Base`](https://tfhub.dev/tensorflow/albert_en_base/1)**:
    12-layer, 768-hidden, 12-heads, 12M parameters
* **[`ALBERT V2 Large`](https://tfhub.dev/tensorflow/albert_en_large/1)**:
    24-layer, 1024-hidden, 16-heads, 18M parameters
* **[`ALBERT V2 XLarge`](https://tfhub.dev/tensorflow/albert_en_xlarge/1)**:
    24-layer, 2048-hidden, 32-heads, 60M parameters
* **[`ALBERT V2 XXLarge`](https://tfhub.dev/tensorflow/albert_en_xxlarge/1)**:
    12-layer, 4096-hidden, 64-heads, 235M parameters
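
For example, the base module can be loaded as a Keras layer and wrapped in a
small functional model. This is a minimal sketch only: the
`[input_word_ids, input_mask, segment_ids] -> (pooled_output, sequence_output)`
signature shown below is assumed from the common TF 2.x BERT/ALBERT hub
interface, so check the module page on tfhub.dev for the exact inputs and
outputs of the version you use.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Minimal sketch: load the ALBERT base hub module as a trainable Keras layer.
max_seq_length = 128  # example value; use the sequence length you trained with
albert_layer = hub.KerasLayer(
    "https://tfhub.dev/tensorflow/albert_en_base/1", trainable=True)

input_word_ids = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.layers.Input(
    shape=(max_seq_length,), dtype=tf.int32, name="segment_ids")

# Assumed module signature: a list of three int32 tensors in, two tensors out.
pooled_output, sequence_output = albert_layer(
    [input_word_ids, input_mask, segment_ids])
encoder = tf.keras.Model(
    inputs=[input_word_ids, input_mask, segment_ids],
    outputs=[pooled_output, sequence_output])
```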
|
|
|
## Set Up |
|
|
|
```shell |
|
export PYTHONPATH="$PYTHONPATH:/path/to/models" |
|
``` |
|
|
|
Install `tf-nightly` to get the latest updates:
|
|
|
```shell |
|
pip install tf-nightly-gpu |
|
``` |
|
|
|
If you use a TPU, GPU support is not necessary. First, create a `tf-nightly`
TPU with the [ctpu tool](https://github.com/tensorflow/tpu/tree/master/tools/ctpu):
|
|
|
```shell
ctpu up -name <instance name> --tf-version="nightly"
```
|
|
|
Second, you need to install TF 2 `tf-nightly` on your VM: |
|
|
|
```shell |
|
pip install tf-nightly |
|
``` |
|
|
|
Warning: More detailed TPU-specific set-up instructions and tutorials will come
with the official TF 2.x release for TPU. Note that this repo is not officially
supported by the Google Cloud TPU team until TF 2.1 is released.
|
|
|
## Process Datasets |
|
|
|
### Pre-training |
|
|
|
Pre-training ALBERT with TF 2.x will come soon.
For now, please use the [ALBERT research repo](https://github.com/google-research/ALBERT)
to pretrain the model and convert the checkpoint to a TF 2.x compatible one using
[tf2_albert_encoder_checkpoint_converter.py](tf2_albert_encoder_checkpoint_converter.py).
|
|
|
|
|
|
|
### Fine-tuning |
|
|
|
To prepare the fine-tuning data for final model training, use the
[`../data/create_finetuning_data.py`](../data/create_finetuning_data.py) script.
Note that, unlike BERT models, which use a WordPiece tokenizer, ALBERT models
employ a SentencePiece tokenizer, so the flag `tokenizer_impl` has to be set to
`sentence_piece`.
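
To see what this tokenizer does, you can run the SentencePiece model shipped
with the checkpoints directly. This is an illustrative sketch only; it assumes
you have installed the `sentencepiece` package and downloaded `30k-clean.model`
locally (it is bundled in the checkpoint archives above):

```python
import sentencepiece as spm

# Illustrative sketch: load the ALBERT vocabulary model and tokenize a sentence.
sp = spm.SentencePieceProcessor()
sp.Load("30k-clean.model")  # assumes the file was downloaded locally

pieces = sp.EncodeAsPieces("ALBERT uses a SentencePiece tokenizer.")
ids = sp.EncodeAsIds("ALBERT uses a SentencePiece tokenizer.")
print(pieces)  # list of subword pieces
print(ids)     # the corresponding vocabulary ids
```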
|
The resulting datasets in `tf_record` format and the training metadata should
later be passed to the training or evaluation scripts. The task-specific
arguments are described in the following sections:
|
|
|
* GLUE |
|
|
|
Users can download the [GLUE data](https://gluebenchmark.com/tasks) by running
[this script](https://gist.github.com/W4ngatang/60c2bdb54d156a41194446737ce03e2e)
and unpack it to some directory `$GLUE_DIR`.
|
|
|
```shell
export GLUE_DIR=~/glue
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base

export TASK_NAME=MNLI
export OUTPUT_DIR=gs://some_bucket/datasets
python ../data/create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/${TASK_NAME}/ \
  --sp_model_file=${ALBERT_DIR}/30k-clean.model \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
  --fine_tuning_task_type=classification --max_seq_length=128 \
  --classification_task_name=${TASK_NAME} \
  --tokenizer_impl=sentence_piece
```
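
As a quick sanity check, you can count how many examples ended up in the
generated `tf_record` file. A small sketch (the path corresponds to the example
`OUTPUT_DIR` and `TASK_NAME` values above, and it assumes this machine can read
the bucket):

```python
import tensorflow as tf

# Sanity-check sketch: count the records written by the data prep script.
train_path = "gs://some_bucket/datasets/MNLI_train.tf_record"
dataset = tf.data.TFRecordDataset(train_path)
num_examples = sum(1 for _ in dataset)
print("training examples:", num_examples)
```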
|
|
|
* SQUAD |
|
|
|
The [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/) contains
detailed information about the SQuAD datasets and evaluation.
|
|
|
The necessary files can be found here: |
|
|
|
* [train-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json)
* [dev-v1.1.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json)
* [evaluate-v1.1.py](https://github.com/allenai/bi-att-flow/blob/master/squad/evaluate-v1.1.py)
* [train-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json)
* [dev-v2.0.json](https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json)
* [evaluate-v2.0.py](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/)
|
|
|
```shell
export SQUAD_DIR=~/squad
export SQUAD_VERSION=v1.1
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base
export OUTPUT_DIR=gs://some_bucket/datasets

python ../data/create_finetuning_data.py \
  --squad_data_file=${SQUAD_DIR}/train-${SQUAD_VERSION}.json \
  --sp_model_file=${ALBERT_DIR}/30k-clean.model \
  --train_data_output_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --fine_tuning_task_type=squad --max_seq_length=384 \
  --tokenizer_impl=sentence_piece
```
|
|
|
## Fine-tuning with ALBERT |
|
|
|
### Cloud GPUs and TPUs |
|
|
|
* Cloud Storage |
|
|
|
The unzipped pre-trained model files can also be found in the Google Cloud
Storage folder `gs://cloud-tpu-checkpoints/albert/checkpoints`. For example:
|
|
|
```shell |
|
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base |
|
export MODEL_DIR=gs://some_bucket/my_output_dir |
|
``` |
|
|
|
Currently, users are able to access `tf-nightly` TPUs, and the following TPU
script should run with `tf-nightly`.
|
|
|
* GPU -> TPU |
|
|
|
Just add the following flags to `run_classifier.py` or `run_squad.py`: |
|
|
|
```shell |
|
--distribution_strategy=tpu |
|
--tpu=grpc://${TPU_IP_ADDRESS}:8470 |
|
``` |
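
For reference, the `tpu` distribution strategy corresponds roughly to the
standard TF 2.x TPU setup sketched below. This is only an illustration of what
the flag implies, not the scripts' actual code, and the TPU address is a
placeholder:

```python
import tensorflow as tf

# Rough illustration of what --distribution_strategy=tpu implies:
# resolve the TPU, connect to it, initialize it, and build a TPUStrategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu="grpc://10.0.0.2:8470")  # placeholder; use your TPU's gRPC address
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.experimental.TPUStrategy(resolver)

with strategy.scope():
    # Build and compile the Keras model inside the strategy scope so its
    # variables are replicated across the TPU cores.
    pass
```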
|
|
|
### Sentence and Sentence-pair Classification Tasks |
|
|
|
This example code fine-tunes `albert_v2_base` on the Microsoft Research
Paraphrase Corpus (MRPC), which only contains 3,600 examples and can be
fine-tuned in a few minutes on most GPUs.
|
|
|
We use `albert_v2_base` as an example throughout the workflow.
|
|
|
|
|
```shell
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets
export TASK=MRPC

python run_classifier.py \
  --mode='train_and_eval' \
  --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
  --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
  --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
  --bert_config_file=${ALBERT_DIR}/albert_config.json \
  --init_checkpoint=${ALBERT_DIR}/bert_model.ckpt \
  --train_batch_size=4 \
  --eval_batch_size=4 \
  --steps_per_loop=1 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=mirrored
```
|
|
|
Alternatively, instead of specifying `init_checkpoint`, you can specify
`hub_module_url` to employ a pretrained ALBERT hub module, e.g.,
`--hub_module_url=https://tfhub.dev/tensorflow/albert_en_base/1`.
|
|
|
To use a TPU, you only need to switch the distribution strategy type to `tpu`
with the TPU information and use remote storage for the model checkpoints.
|
|
|
```shell
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export GLUE_DIR=gs://some_bucket/datasets

python run_classifier.py \
  --mode='train_and_eval' \
  --input_meta_data_path=${GLUE_DIR}/${TASK}_meta_data \
  --train_data_path=${GLUE_DIR}/${TASK}_train.tf_record \
  --eval_data_path=${GLUE_DIR}/${TASK}_eval.tf_record \
  --bert_config_file=$ALBERT_DIR/albert_config.json \
  --init_checkpoint=$ALBERT_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --eval_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=tpu \
  --tpu=grpc://${TPU_IP_ADDRESS}:8470
```
|
|
|
### SQuAD 1.1 |
|
|
|
The Stanford Question Answering Dataset (SQuAD) is a popular question answering
benchmark dataset. See more on the [SQuAD website](https://rajpurkar.github.io/SQuAD-explorer/).
|
|
|
We use `albert_v2_base` as an example throughout the workflow.
|
|
|
```shell
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base
export SQUAD_DIR=gs://some_bucket/datasets
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_VERSION=v1.1

python run_squad.py \
  --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-v1.1.json \
  --sp_model_file=${ALBERT_DIR}/30k-clean.model \
  --bert_config_file=$ALBERT_DIR/albert_config.json \
  --init_checkpoint=$ALBERT_DIR/bert_model.ckpt \
  --train_batch_size=4 \
  --predict_batch_size=4 \
  --learning_rate=8e-5 \
  --num_train_epochs=2 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=mirrored
```
|
|
|
Similarly, you can replace the `init_checkpoint` flag with `hub_module_url` to
specify a hub module path.
|
|
|
To use a TPU, you need to switch the distribution strategy type to `tpu` and
provide the TPU information.
|
|
|
```shell
export ALBERT_DIR=gs://cloud-tpu-checkpoints/albert/checkpoints/albert_v2_base
export TPU_IP_ADDRESS='???'
export MODEL_DIR=gs://some_bucket/my_output_dir
export SQUAD_DIR=gs://some_bucket/datasets
export SQUAD_VERSION=v1.1

python run_squad.py \
  --input_meta_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${SQUAD_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-v1.1.json \
  --sp_model_file=${ALBERT_DIR}/30k-clean.model \
  --bert_config_file=$ALBERT_DIR/albert_config.json \
  --init_checkpoint=$ALBERT_DIR/bert_model.ckpt \
  --train_batch_size=32 \
  --learning_rate=8e-5 \
  --num_train_epochs=2 \
  --model_dir=${MODEL_DIR} \
  --distribution_strategy=tpu \
  --tpu=grpc://${TPU_IP_ADDRESS}:8470
```
|
|
|
The dev set predictions will be saved into a file called `predictions.json` in
the `model_dir`:
|
|
|
```shell
python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./squad/predictions.json
```
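
Note that when `model_dir` is a Cloud Storage bucket (as in the TPU example
above), `predictions.json` has to be copied to a local path before the
evaluation script can read it. A minimal sketch using `tf.io.gfile`, assuming
the example `MODEL_DIR` value from above and the local destination used in the
command:

```python
import tensorflow as tf

# Sketch: fetch the predictions produced on Cloud TPU/GPU to a local path so
# that evaluate-v1.1.py can read them.
tf.io.gfile.makedirs("./squad")
tf.io.gfile.copy(
    "gs://some_bucket/my_output_dir/predictions.json",
    "./squad/predictions.json",
    overwrite=True)
```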
|
|