metadata
license: apache-2.0
library_name: peft
tags:
- Summarization
- generated_from_trainer
datasets:
- cnn_dailymail
metrics:
- rouge
base_model: google/flan-t5-base
model-index:
- name: flan-t5-base-prompt_tuning-cnn-dailymail
  results: []
flan-t5-base-prompt_tuning-cnn-dailymail
This model is a prompt-tuned version of google/flan-t5-base, trained with PEFT on the cnn_dailymail dataset. It achieves the following results on the evaluation set (values from the final epoch, epoch 40):
- Loss: 19.3074
- Rouge1: 0.0787
- Rouge2: 0.0088
- Rougel: 0.0609
- Rougelsum: 0.0733
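To make the Rouge1 figure above easier to interpret, here is a minimal sketch of a ROUGE-1 F1 score as clipped unigram overlap. It is a simplification, not the evaluation code used for this card: the function name `rouge1_f1`, the lowercasing, and the whitespace tokenization are all assumptions, and standard ROUGE implementations typically apply their own tokenization and optional stemming, so values will not match exactly.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens.

    Hypothetical helper for illustration; not the metric code behind this card.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up sentences, chosen only to exercise the function.
score = rouge1_f1("police arrest suspect", "police arrest the suspect downtown")
print(round(score, 4))
```

Under this definition, a Rouge1 of 0.0787 means that generated and reference summaries share very few unigrams on average.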
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.03
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 40
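As a rough illustration of what `lr_scheduler_type: linear` implies for these settings, the sketch below computes the learning rate at a given optimizer step. The steps-per-epoch value (188) is taken from the training results table; the zero warmup and the helper name `linear_lr` are assumptions, since the card does not list a warmup setting.

```python
STEPS_PER_EPOCH = 188  # step count reported at epoch 1.0 in the results table
NUM_EPOCHS = 40
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 7520, the final step in the table
BASE_LR = 0.03
WARMUP_STEPS = 0  # assumption: no warmup is listed in the hyperparameters


def linear_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, the midpoint, and the end of training.
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 0.03, is halved by step 3760 (epoch 20), and reaches zero at step 7520.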
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
---|---|---|---|---|---|---|---|
20.9307 | 1.0 | 188 | 19.4344 | 0.1471 | 0.0433 | 0.1103 | 0.1337 |
20.4274 | 2.0 | 376 | 20.1245 | 0.1199 | 0.0299 | 0.0953 | 0.1135 |
20.1641 | 3.0 | 564 | 19.5964 | 0.1178 | 0.024 | 0.0909 | 0.1072 |
20.5294 | 4.0 | 752 | 19.2955 | 0.1164 | 0.0213 | 0.0882 | 0.1055 |
20.6452 | 5.0 | 940 | 19.4288 | 0.1179 | 0.0239 | 0.0895 | 0.1072 |
20.6916 | 6.0 | 1128 | 19.1208 | 0.0997 | 0.0186 | 0.0795 | 0.093 |
20.8065 | 7.0 | 1316 | 18.9300 | 0.0865 | 0.0116 | 0.0688 | 0.08 |
20.1431 | 8.0 | 1504 | 19.7751 | 0.1118 | 0.0247 | 0.0869 | 0.1023 |
20.5281 | 9.0 | 1692 | 20.0590 | 0.1216 | 0.0278 | 0.0923 | 0.1118 |
20.1805 | 10.0 | 1880 | 19.3949 | 0.1025 | 0.0145 | 0.0818 | 0.0948 |
20.4289 | 11.0 | 2068 | 19.1645 | 0.0844 | 0.0086 | 0.0656 | 0.0753 |
20.1469 | 12.0 | 2256 | 19.4850 | 0.0905 | 0.0062 | 0.0697 | 0.0831 |
20.9285 | 13.0 | 2444 | 19.3351 | 0.0853 | 0.0077 | 0.067 | 0.0785 |
20.1419 | 14.0 | 2632 | 19.1241 | 0.0886 | 0.0097 | 0.0684 | 0.0822 |
20.5547 | 15.0 | 2820 | 19.1532 | 0.0897 | 0.0077 | 0.0704 | 0.0804 |
19.5719 | 16.0 | 3008 | 19.2346 | 0.0885 | 0.0107 | 0.0659 | 0.0794 |
20.3043 | 17.0 | 3196 | 19.3873 | 0.105 | 0.0188 | 0.0829 | 0.0949 |
20.5935 | 18.0 | 3384 | 19.3345 | 0.1132 | 0.0203 | 0.0874 | 0.1025 |
20.413 | 19.0 | 3572 | 18.8964 | 0.0751 | 0.0065 | 0.0593 | 0.0686 |
19.9286 | 20.0 | 3760 | 18.8474 | 0.0813 | 0.0082 | 0.0648 | 0.0725 |
19.9246 | 21.0 | 3948 | 19.3425 | 0.0844 | 0.0096 | 0.0694 | 0.0765 |
20.4844 | 22.0 | 4136 | 19.4680 | 0.1012 | 0.0143 | 0.0782 | 0.0923 |
20.1571 | 23.0 | 4324 | 19.5483 | 0.0808 | 0.0093 | 0.0665 | 0.0762 |
20.0099 | 24.0 | 4512 | 18.5052 | 0.056 | 0.0029 | 0.0479 | 0.0516 |
19.6279 | 25.0 | 4700 | 18.7629 | 0.0735 | 0.0082 | 0.0603 | 0.0649 |
19.303 | 26.0 | 4888 | 19.3608 | 0.1015 | 0.0124 | 0.0766 | 0.0885 |
20.8774 | 27.0 | 5076 | 19.3038 | 0.1008 | 0.013 | 0.0807 | 0.0932 |
20.1431 | 28.0 | 5264 | 19.3426 | 0.0991 | 0.0156 | 0.078 | 0.0918 |
20.4304 | 29.0 | 5452 | 19.3918 | 0.0905 | 0.0102 | 0.0734 | 0.0812 |
19.6689 | 30.0 | 5640 | 19.3527 | 0.088 | 0.0105 | 0.0669 | 0.0785 |
20.661 | 31.0 | 5828 | 19.4042 | 0.0996 | 0.0149 | 0.0767 | 0.0887 |
20.2962 | 32.0 | 6016 | 19.3871 | 0.0758 | 0.0101 | 0.0617 | 0.0702 |
20.5865 | 33.0 | 6204 | 19.3255 | 0.0786 | 0.0106 | 0.064 | 0.0733 |
21.4763 | 34.0 | 6392 | 19.3113 | 0.0755 | 0.0087 | 0.0623 | 0.0688 |
21.3826 | 35.0 | 6580 | 19.3089 | 0.075 | 0.0076 | 0.0609 | 0.0689 |
20.8869 | 36.0 | 6768 | 19.3614 | 0.0906 | 0.0143 | 0.0692 | 0.0812 |
20.527 | 37.0 | 6956 | 19.3784 | 0.0874 | 0.0099 | 0.0686 | 0.0797 |
19.5026 | 38.0 | 7144 | 19.4145 | 0.0888 | 0.0111 | 0.068 | 0.0823 |
19.3852 | 39.0 | 7332 | 19.3794 | 0.0815 | 0.0093 | 0.0616 | 0.0742 |
20.5347 | 40.0 | 7520 | 19.3074 | 0.0787 | 0.0088 | 0.0609 | 0.0733 |
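The reported evaluation results come from the last epoch, which is not necessarily the best one. The sketch below shows one way to parse rows of the table and compare checkpoints. Only three rows (epochs 1, 24, and 40) are copied here to keep the example short; the column keys and the `parse_rows` helper are names chosen for illustration.

```python
# Three rows copied verbatim from the training results table (epochs 1, 24, 40).
ROWS = """\
20.9307 | 1.0 | 188 | 19.4344 | 0.1471 | 0.0433 | 0.1103 | 0.1337 |
20.0099 | 24.0 | 4512 | 18.5052 | 0.056 | 0.0029 | 0.0479 | 0.0516 |
20.5347 | 40.0 | 7520 | 19.3074 | 0.0787 | 0.0088 | 0.0609 | 0.0733 |
"""

# Hypothetical column keys mirroring the table header.
COLUMNS = ["training_loss", "epoch", "step", "validation_loss",
           "rouge1", "rouge2", "rougel", "rougelsum"]


def parse_rows(text: str) -> list[dict[str, float]]:
    """Turn pipe-separated table rows into dictionaries of floats."""
    records = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        records.append(dict(zip(COLUMNS, map(float, cells))))
    return records


records = parse_rows(ROWS)
best_rouge1 = max(records, key=lambda r: r["rouge1"])
lowest_loss = min(records, key=lambda r: r["validation_loss"])

print("best Rouge1:", best_rouge1["epoch"], best_rouge1["rouge1"])
print("lowest validation loss:", lowest_loss["epoch"], lowest_loss["validation_loss"])
```

Among these three rows, epoch 1 has the highest Rouge1 and epoch 24 the lowest validation loss, so the two criteria pick different checkpoints. Pasting the full table into `ROWS` extends the comparison to all 40 epochs.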
Framework versions
- PEFT 0.8.2
- Transformers 4.37.0
- Pytorch 2.1.2
- Datasets 2.1.0
- Tokenizers 0.15.1