metadata

license: apache-2.0
base_model: google/mt5-base
tags:
  - generated_from_trainer
metrics:
  - rouge
  - sacrebleu
model-index:
  - name: mT5-TextSimp-LT-BatchSize2-lr1e-4
    results: []

mT5-TextSimp-LT-BatchSize2-lr1e-4

This model is a fine-tuned version of google/mt5-base. The training dataset was not recorded (the auto-generated card lists it as None). It achieves the following results on the evaluation set:

Loss: 0.0672
Rouge1: 0.7548
Rouge2: 0.5989
Rougel: 0.7509
Sacrebleu: 49.0373
Gen Len: 38.0501
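
For readers who want to try the checkpoint, the following is a minimal loading-and-generation sketch using the Transformers seq2seq classes. The repository namespace in `MODEL_ID` is a placeholder, since this card does not say where the checkpoint is hosted. The input truncation length (`max_length=512`) and the generation settings (`max_new_tokens=64`, `num_beams=4`) are illustrative assumptions, not values taken from training. Whether the model expects a task prefix on the input is not documented here either.

```python
# Placeholder: the hosting namespace is not given in this card; replace it.
MODEL_ID = "your-namespace/mT5-TextSimp-LT-BatchSize2-lr1e-4"


def simplify(text, model_id=MODEL_ID, max_new_tokens=64, num_beams=4):
    """Generate one output sequence for `text` with the fine-tuned checkpoint."""
    # Imported lazily so this module can be imported without Transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumed input limit of 512 tokens; adjust to match the real training setup.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # illustrative, not from the card
        num_beams=num_beams,            # illustrative, not from the card
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `simplify("...")` downloads the weights on first use, so it needs network access and a valid repository id.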

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 8
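
To make the scheduler settings concrete, here is a small sketch of the learning rate that a linear schedule with 500 warmup steps produces, following the usual definition: a linear ramp from 0 to the peak rate, then a linear decay to 0 at the end of training. The total step count is not stated in this card; `TOTAL_STEPS = 6700` is an assumption estimated from the training log (about 837 optimizer steps per epoch over 8 epochs).

```python
PEAK_LR = 1e-4       # learning_rate from the list above
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 6700   # assumption: roughly 837 steps per epoch * 8 epochs


def linear_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    if step < warmup:
        # Warmup phase: scale linearly from 0 up to the peak rate.
        return peak_lr * step / max(1, warmup)
    # Decay phase: scale linearly from the peak rate down to 0 at `total`.
    remaining = max(0, total - step)
    return peak_lr * remaining / max(1, total - warmup)


for step in (0, 250, 500, 3600, 6700):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 500 and has fallen back to half the peak by step 3600.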

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Sacrebleu	Gen Len
25.6783	0.24	200	16.0497	0.0109	0.0005	0.0107	0.0029	512.0
1.9593	0.48	400	0.7780	0.014	0.0005	0.0136	0.0146	42.685
0.2778	0.72	600	0.1429	0.4924	0.3128	0.4803	20.3057	38.0382
0.1325	0.96	800	0.1039	0.6193	0.4369	0.6098	33.687	38.0501
0.1702	1.2	1000	0.0958	0.6697	0.5016	0.6613	38.0391	38.0501
0.13	1.44	1200	0.0880	0.6737	0.5051	0.6644	38.62	38.0501
0.1086	1.67	1400	0.0839	0.6964	0.5326	0.6884	40.9056	38.0501
0.0716	1.91	1600	0.0859	0.6933	0.5298	0.6862	40.7158	38.0501
0.1135	2.15	1800	0.0820	0.7017	0.5366	0.6936	40.7484	38.0501
0.0997	2.39	2000	0.0814	0.7011	0.5351	0.6945	41.1948	38.0501
0.0996	2.63	2200	0.0774	0.7103	0.5522	0.7049	42.5756	38.0501
1.1379	2.87	2400	0.0763	0.7211	0.5556	0.7152	43.2411	38.0501
0.0594	3.11	2600	0.0776	0.7261	0.5647	0.7201	44.2205	38.0501
0.0763	3.35	2800	0.0736	0.7309	0.5709	0.7251	45.2825	38.0501
0.1641	3.59	3000	0.0722	0.7297	0.5685	0.7242	44.9001	38.0501
0.1085	3.83	3200	0.0703	0.7377	0.5793	0.7319	45.7504	38.0501
0.0573	4.07	3400	0.0719	0.7393	0.5796	0.7335	45.86	38.0501
0.1149	4.31	3600	0.0705	0.7415	0.5787	0.7365	46.2652	38.0501
0.0843	4.55	3800	0.0703	0.7385	0.5754	0.7326	46.5292	38.0501
0.0658	4.78	4000	0.0705	0.7437	0.5855	0.7384	46.864	38.0501
0.0676	5.02	4200	0.0694	0.7437	0.584	0.7384	47.1268	38.0501
0.0657	5.26	4400	0.0711	0.7473	0.5913	0.7432	47.4413	38.0501
0.0679	5.5	4600	0.0702	0.7496	0.5908	0.7446	47.8281	38.0501
0.0664	5.74	4800	0.0671	0.7511	0.5929	0.7463	47.7693	38.0501
0.0446	5.98	5000	0.0685	0.7533	0.5932	0.7478	48.032	38.0501
0.0732	6.22	5200	0.0678	0.7523	0.5948	0.7472	48.3467	38.0501
0.0706	6.46	5400	0.0672	0.755	0.5983	0.7507	48.6158	38.0501
0.051	6.7	5600	0.0674	0.7523	0.5961	0.7478	48.4828	38.0501
0.067	6.94	5800	0.0681	0.7532	0.5978	0.7492	48.7253	38.0501
0.075	7.18	6000	0.0684	0.7534	0.5969	0.7492	48.7053	38.0501
0.1323	7.42	6200	0.0671	0.755	0.5991	0.7511	48.9922	38.0501
0.0383	7.66	6400	0.0671	0.7551	0.5994	0.7511	49.0028	38.0501
0.0599	7.89	6600	0.0672	0.7548	0.5989	0.7509	49.0373	38.0501
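
The log can also be scanned programmatically, for example to see which evaluation step had the lowest validation loss or the highest SacreBLEU. The sketch below embeds the last six rows of the table above as (step, validation loss, SacreBLEU) tuples. The names `EVAL_LOG` and `best_by` are illustrative and not part of the training code.

```python
# (step, validation_loss, sacrebleu) copied from the last six rows of the table.
EVAL_LOG = [
    (5600, 0.0674, 48.4828),
    (5800, 0.0681, 48.7253),
    (6000, 0.0684, 48.7053),
    (6200, 0.0671, 48.9922),
    (6400, 0.0671, 49.0028),
    (6600, 0.0672, 49.0373),
]


def best_by(log, column, highest=False):
    """Return the first row with the best value in `column` (1 = loss, 2 = SacreBLEU)."""
    pick = max if highest else min
    return pick(log, key=lambda row: row[column])


print("lowest validation loss:", best_by(EVAL_LOG, 1))
print("highest SacreBLEU:", best_by(EVAL_LOG, 2, highest=True))
```

On these rows the lowest validation loss (0.0671) is first reached at step 6200, while SacreBLEU peaks at the final logged step, 6600, so which checkpoint counts as best depends on the metric chosen.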

Framework versions

Transformers 4.33.0
PyTorch 2.1.2+cu121
Datasets 2.14.4
Tokenizers 0.13.3