distilbert-base-en-nl-cased-finetuned-squad

This model is a fine-tuned version of Geotrend/distilbert-base-en-nl-cased on the Dutch SQuAD v2.0 dataset [1] and the English SQuAD v2.0 dataset, tailored for the question answering task. It achieves the following results on the evaluation set:

  • Loss: 1.365628

Model description

The base model, distilbert-base-en-nl-cased, is a smaller version of distilbert-base-multilingual-cased, designed to handle a custom number of languages (only Dutch and English in this case) while preserving the original model's accuracy. It is based on the principles outlined in the paper "Load What You Need: Smaller Versions of Multilingual BERT" by Abdaoui, Pradel, and Sigel (2020) [2].
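
For reference, the base checkpoint and its reduced English + Dutch vocabulary can be loaded directly from the Hugging Face Hub. The snippet below is a minimal sketch using the standard transformers Auto classes; it is not part of this model's training code.

from transformers import AutoModel, AutoTokenizer

# Load the base checkpoint with its reduced (English + Dutch) vocabulary.
tokenizer = AutoTokenizer.from_pretrained("Geotrend/distilbert-base-en-nl-cased")
model = AutoModel.from_pretrained("Geotrend/distilbert-base-en-nl-cased")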

Intended uses & limitations

This fine-tuned model is optimized for Dutch and English Question Answering tasks. While it may perform well on similar tasks in other languages, its primary strength lies in extracting answers from Dutch or English language contexts. Users are encouraged to consider the model's specific training focus when applying it to different language or task scenarios.

Training and evaluation data

The model was trained on the English SQuAD v2.0 dataset and the Dutch SQuAD v2.0 dataset [1], a machine-translated version of the original SQuAD v2.0. The statistics for both datasets are as follows:

Statistics

|                         | SQuAD v2.0       | Dutch SQuAD v2.0 |
|-------------------------|------------------|------------------|
| **Train**               |                  |                  |
| Total examples          |   130,319        |   95,054         |
| Positive examples       |   86,821         |   53,376         |
| Negative examples       |   43,498         |   41,768         |
| **Development**         |                  |                  |
| Total examples          |   11,873         |   9,294          |
| Positive examples       |   5,928          |   3,588          |
| Negative examples       |   5,945          |   5,706          |
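
The English split can be loaded with the datasets library as sketched below; the Dutch machine-translated split from [1] is distributed separately, and its exact location is not specified in this card, so the corresponding line is only a placeholder.

from datasets import load_dataset

# English SQuAD v2.0 from the Hugging Face Hub.
squad_en = load_dataset("squad_v2")
print(len(squad_en["train"]), len(squad_en["validation"]))  # 130319, 11873

# The Dutch SQuAD v2.0 data [1] must be obtained separately; replace the
# placeholder file path below with the actual machine-translated data.
# squad_nl = load_dataset("json", data_files={"train": "nl_squad_train.json"})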

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
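
For reproducibility, these settings map onto the transformers Trainer API roughly as follows. This is a sketch assuming a standard Trainer setup, not the exact script used to produce this checkpoint; the Adam betas and epsilon listed above are already the TrainingArguments defaults.

from transformers import TrainingArguments

# Hyperparameters from the list above expressed as TrainingArguments.
# Adam betas=(0.9, 0.999) and epsilon=1e-08 are the library defaults.
training_args = TrainingArguments(
    output_dir="distilbert-base-en-nl-cased-finetuned-squad",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
)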

Training results

| Training Loss | Epoch | Step  | Validation Loss |
|---------------|-------|-------|-----------------|
| 1.1848        | 1.0   | 14490 | 1.2318          |
| 0.858         | 2.0   | 28980 | 1.3102          |
| 0.7736        | 3.0   | 43470 | 1.3656          |

Raw results on the English validation set

{
 'exact': 64.58350880148235,
 'f1': 67.75331200952289,
 'total': 11873,
 'HasAns_exact': 63.444669365722,
 'HasAns_f1': 69.79336597318874,
 'HasAns_total': 5928,
 'NoAns_exact': 65.71909167367535,
 'NoAns_f1': 65.71909167367535,
 'NoAns_total': 5945,
 'best_exact': 64.59193127263539,
 'best_exact_thresh': 0.0,
 'best_f1': 67.7617344806761,
 'best_f1_thresh': 0.0
}

Raw results on the Dutch validation set

{
 'exact': 60.84570690768238,
 'f1': 64.43587372397303,
 'total': 9294,
 'HasAns_exact': 41.55518394648829,
 'HasAns_f1': 50.85479665289972,
 'HasAns_total': 3588,
 'NoAns_exact': 72.97581493165089,
 'NoAns_f1': 72.97581493165089,
 'NoAns_total': 5706,
 'best_exact': 62.48117064772972,
 'best_exact_thresh': 0.0,
 'best_f1': 64.7369571909547,
 'best_f1_thresh': 0.0
}
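
These dictionaries follow the official SQuAD v2.0 metric output. A minimal sketch of producing the same keys with the evaluate library is shown below; the example id and texts are placeholders, not entries from the actual validation sets.

import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Placeholder prediction/reference pair in SQuAD v2.0 format.
predictions = [{"id": "example-0", "prediction_text": "Amsterdam", "no_answer_probability": 0.0}]
references = [{"id": "example-0", "answers": {"text": ["Amsterdam"], "answer_start": [0]}}]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results)  # contains 'exact', 'f1', 'HasAns_*', 'NoAns_*', and 'best_*' keys as above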

Model Usage

To use this model, you can follow the example below:

from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="tclungu/distilbert-base-en-nl-cased-finetuned-squad",
    tokenizer="tclungu/distilbert-base-en-nl-cased-finetuned-squad"
)

print(qa_pipeline({
    'context': "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland.",
    'question': "Wat is de hoofdstad van Nederland?"}))

print(qa_pipeline({
    'context': "Amsterdam is the capital and most populous city of the Netherlands.",
    'question': "Wat is the capital of the Netherlands?"}))

Output

{'score': 0.9931294918060303, 'start': 0, 'end': 9, 'answer': 'Amsterdam'}
{'score': 0.9845369458198547, 'start': 0, 'end': 9, 'answer': 'Amsterdam'}
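
For finer control over tokenization and answer span selection, the same checkpoint can also be used without the pipeline. The following is a minimal sketch with the plain model classes.

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "tclungu/distilbert-base-en-nl-cased-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

question = "Wat is de hoofdstad van Nederland?"
context = "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Decode the most likely answer span from the start/end logits.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))  # Amsterdam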

Framework versions

  • Transformers 4.33.3
  • Pytorch 2.0.1
  • Datasets 2.14.5
  • Tokenizers 0.13.3

References

[1] Rouws, N. J., Vakulenko, S., & Katrenko, S. (2022). Dutch SQuAD and ensemble learning for question answering from labour agreements. In Artificial Intelligence and Machine Learning: 33rd Benelux Conference on Artificial Intelligence, BNAIC/Benelearn 2021, Esch-sur-Alzette, Luxembourg, November 10–12, 2021, Revised Selected Papers 33 (pp. 155-169). Springer International Publishing.
[2] Abdaoui, A., Pradel, C., & Sigel, G. (2020). Load What You Need: Smaller Versions of Multilingual BERT. arXiv preprint arXiv:2010.05609.