# mt5-small-synthetic-data-plus-translated-bs32ep32
This model is a fine-tuned version of google/mt5-small; the training dataset is not documented beyond what the model name suggests (synthetic plus translated data). It achieves the following results on the evaluation set (a usage sketch follows the metrics):
- Loss: 0.9041
- Rouge1: 0.6137
- Rouge2: 0.4715
- Rougel: 0.5917
- Rougelsum: 0.5922
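A minimal usage sketch, assuming a text-to-text generation task such as summarization (which the ROUGE metrics suggest); the input placeholder below is hypothetical, since the expected input format is not documented:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned checkpoint from the Hub.
model_id = "ak2603/mt5-small-synthetic-data-plus-translated-bs32ep32"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Hypothetical input text; replace with data matching the training distribution.
text = "..."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```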
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5.6e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 32
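A sketch of how these hyperparameters map onto `Seq2SeqTrainingArguments` in transformers; this is not the original training script (which is not provided), and dataset loading and `Trainer` setup are omitted:

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-synthetic-data-plus-translated-bs32ep32",
    learning_rate=5.6e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",          # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=32,
    predict_with_generate=True,   # assumption: needed to produce text for ROUGE
)
```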
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|
| 19.5396 | 1.0 | 38 | 11.5658 | 0.0077 | 0.0018 | 0.0067 | 0.0068 |
| 13.1079 | 2.0 | 76 | 7.8457 | 0.0077 | 0.0008 | 0.0075 | 0.0070 |
| 9.4043 | 3.0 | 114 | 3.9170 | 0.0234 | 0.0025 | 0.0217 | 0.0210 |
| 6.388 | 4.0 | 152 | 2.7482 | 0.1030 | 0.0252 | 0.0916 | 0.0910 |
| 4.5077 | 5.0 | 190 | 2.0482 | 0.1057 | 0.0414 | 0.0917 | 0.0914 |
| 3.3242 | 6.0 | 228 | 1.6075 | 0.1778 | 0.0897 | 0.1525 | 0.1532 |
| 2.7 | 7.0 | 266 | 1.3881 | 0.3601 | 0.2141 | 0.3479 | 0.3487 |
| 2.3089 | 8.0 | 304 | 1.2989 | 0.4295 | 0.2607 | 0.4095 | 0.4091 |
| 2.1141 | 9.0 | 342 | 1.2346 | 0.4337 | 0.2603 | 0.4147 | 0.4146 |
| 1.9442 | 10.0 | 380 | 1.1888 | 0.4926 | 0.3337 | 0.4642 | 0.4644 |
| 1.8082 | 11.0 | 418 | 1.1418 | 0.5101 | 0.3560 | 0.4920 | 0.4929 |
| 1.7142 | 12.0 | 456 | 1.1052 | 0.5341 | 0.3809 | 0.5154 | 0.5155 |
| 1.6345 | 13.0 | 494 | 1.0775 | 0.5605 | 0.4071 | 0.5394 | 0.5394 |
| 1.5983 | 14.0 | 532 | 1.0539 | 0.5790 | 0.4262 | 0.5585 | 0.5580 |
| 1.5376 | 15.0 | 570 | 1.0322 | 0.5713 | 0.4206 | 0.5531 | 0.5532 |
| 1.5059 | 16.0 | 608 | 1.0137 | 0.5807 | 0.4302 | 0.5605 | 0.5605 |
| 1.4434 | 17.0 | 646 | 0.9970 | 0.6069 | 0.4656 | 0.5874 | 0.5871 |
| 1.442 | 18.0 | 684 | 0.9826 | 0.6104 | 0.4671 | 0.5869 | 0.5874 |
| 1.4059 | 19.0 | 722 | 0.9688 | 0.6102 | 0.4666 | 0.5886 | 0.5883 |
| 1.3618 | 20.0 | 760 | 0.9636 | 0.6127 | 0.4683 | 0.5901 | 0.5906 |
| 1.3341 | 21.0 | 798 | 0.9517 | 0.6065 | 0.4632 | 0.5852 | 0.5860 |
| 1.3019 | 22.0 | 836 | 0.9397 | 0.6092 | 0.4669 | 0.5886 | 0.5882 |
| 1.3114 | 23.0 | 874 | 0.9343 | 0.6091 | 0.4663 | 0.5869 | 0.5871 |
| 1.2906 | 24.0 | 912 | 0.9272 | 0.6137 | 0.4702 | 0.5913 | 0.5912 |
| 1.255 | 25.0 | 950 | 0.9201 | 0.6153 | 0.4723 | 0.5934 | 0.5934 |
| 1.261 | 26.0 | 988 | 0.9186 | 0.6174 | 0.4749 | 0.5966 | 0.5965 |
| 1.2363 | 27.0 | 1026 | 0.9124 | 0.6155 | 0.4738 | 0.5948 | 0.5952 |
| 1.2993 | 28.0 | 1064 | 0.9078 | 0.6153 | 0.4738 | 0.5937 | 0.5939 |
| 1.2653 | 29.0 | 1102 | 0.9054 | 0.6126 | 0.4712 | 0.5919 | 0.5922 |
| 1.2287 | 30.0 | 1140 | 0.9046 | 0.6099 | 0.4676 | 0.5894 | 0.5901 |
| 1.2279 | 31.0 | 1178 | 0.9041 | 0.6137 | 0.4715 | 0.5917 | 0.5922 |
| 1.2348 | 32.0 | 1216 | 0.9041 | 0.6137 | 0.4715 | 0.5917 | 0.5922 |
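For reference, ROUGE scores in the format reported above can be reproduced with the `evaluate` library (a minimal sketch with hypothetical predictions and references; not the exact evaluation code used for this card):

```python
import evaluate

rouge = evaluate.load("rouge")

# Hypothetical model outputs and gold references.
predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# Returns rouge1, rouge2, rougeL, and rougeLsum as floats in [0, 1].
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```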
### Framework versions
- Transformers 4.47.1
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0