BERT

I used a Bert model fine tuned on SQUAD v2 and then I fine tuned it on QNLI using compression (with a constant replacing rate) as proposed in BERT-of-Theseus

Details of the downstream task (QNLI):

Getting the dataset

wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/train.tsv
wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/test.tsv
wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/dev.tsv

mkdir QNLI_dataset
mv *.tsv QNLI_dataset

Model training

The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:

!python /content/BERT-of-Theseus/run_glue.py \
  --model_name_or_path deepset/bert-base-cased-squad2 \
  --task_name qnli \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir /content/QNLI_dataset \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --save_steps 2000 \
  --num_train_epochs 50 \
  --output_dir /content/ouput_dir \
  --evaluate_during_training \
  --replacing_rate 0.7 \
  --steps_for_replacing 2500 

Metrics:

Model Accuracy
BERT-base 91.2
BERT-of-Theseus 88.8
bert-uncased-finetuned-qnli 87.2
DistillBERT 85.3

See all my models

Created by Manuel Romero/@mrm8488

Made with ♥ in Spain

Downloads last month
21
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.