mt5-small-prompted-germanquad-1
This model is a fine-tuned version of google/mt5-small on an philschmid/prompted-germanquad dataset. A prompt datasets using the BigScience PromptSource library. The dataset is a copy of germanquad with applying the squad
template and translated it to german. TEMPLATE.
This is a first test if it is possible to fine-tune mt5
models to solve similar tasks than T0
of big science but for the German language.
It achieves the following results on the evaluation set:
- Loss: 1.6835
- Rouge1: 27.7309
- Rouge2: 18.7311
- Rougel: 27.4704
- Rougelsum: 27.4818
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5.6e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 7
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
---|---|---|---|---|---|---|---|
3.3795 | 1.0 | 17496 | 2.0693 | 15.8652 | 9.2569 | 15.6237 | 15.6142 |
2.3582 | 2.0 | 34992 | 1.9057 | 21.9348 | 14.0057 | 21.6769 | 21.6825 |
2.1809 | 3.0 | 52488 | 1.8143 | 24.3401 | 16.0354 | 24.0862 | 24.0914 |
2.0721 | 4.0 | 69984 | 1.7563 | 25.8672 | 17.2442 | 25.5854 | 25.6051 |
2.0004 | 5.0 | 87480 | 1.7152 | 27.0275 | 18.0548 | 26.7561 | 26.7685 |
1.9531 | 6.0 | 104976 | 1.6939 | 27.4702 | 18.5156 | 27.2027 | 27.2107 |
1.9218 | 7.0 | 122472 | 1.6835 | 27.7309 | 18.7311 | 27.4704 | 27.4818 |
Framework versions
- Transformers 4.14.1
- Pytorch 1.10.1+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3
- Downloads last month
- 11
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.