BERT
I used a Bert model fine tuned on SQUAD v2 and then I fine tuned it on QNLI using compression (with a constant replacing rate) as proposed in BERT-of-Theseus
Details of the downstream task (QNLI):
Getting the dataset
wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/train.tsv
wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/test.tsv
wget https://raw.githubusercontent.com/rhythmcao/QNLI/master/data/QNLI/dev.tsv
mkdir QNLI_dataset
mv *.tsv QNLI_dataset
Model training
The model was trained on a Tesla P100 GPU and 25GB of RAM with the following command:
!python /content/BERT-of-Theseus/run_glue.py \
--model_name_or_path deepset/bert-base-cased-squad2 \
--task_name qnli \
--do_train \
--do_eval \
--do_lower_case \
--data_dir /content/QNLI_dataset \
--max_seq_length 128 \
--per_gpu_train_batch_size 32 \
--per_gpu_eval_batch_size 32 \
--learning_rate 2e-5 \
--save_steps 2000 \
--num_train_epochs 50 \
--output_dir /content/ouput_dir \
--evaluate_during_training \
--replacing_rate 0.7 \
--steps_for_replacing 2500
Metrics:
Model | Accuracy |
---|---|
BERT-base | 91.2 |
BERT-of-Theseus | 88.8 |
bert-uncased-finetuned-qnli | 87.2 |
DistillBERT | 85.3 |
Created by Manuel Romero/@mrm8488
Made with ♥ in Spain
- Downloads last month
- 21
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.