byt5-small-finetuned-yiddish-experiment-11

This model is a fine-tuned version of google/byt5-small on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1805
  • Cer (character error rate): 0.1974
  • Wer (word error rate): 0.5776
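
For reference, error rates like these can be recomputed with the Hugging Face evaluate library. This is a minimal sketch only: whether this exact tooling produced the numbers above is an assumption, and the prediction/reference strings are hypothetical.

```python
import evaluate  # requires: pip install evaluate jiwer

cer = evaluate.load("cer")
wer = evaluate.load("wer")

# Hypothetical Yiddish prediction/reference pair; substitute real
# model outputs and gold references.
predictions = ["אַ גוטן טאָג אַלע"]
references = ["אַ גוטן טאג אַלע"]

print("CER:", cer.compute(predictions=predictions, references=references))
print("WER:", wer.compute(predictions=predictions, references=references))
```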

Model description

More information needed

Intended uses & limitations

More information needed
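
The training task is not documented on this card, but ByT5 checkpoints load as ordinary Transformers seq2seq models, so inference follows the standard API. A minimal sketch, in which the input string and generation settings are assumptions:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "Addaci/byt5-small-finetuned-yiddish-experiment-11"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Hypothetical Yiddish input; the exact task (e.g. transcription
# post-correction) is not documented on this card.
text = "אַ גוטן טאָג"
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```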

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 600
  • num_epochs: 30
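
The original training script is not included on this card. As a sketch, the hyperparameters above map onto Transformers' Seq2SeqTrainingArguments roughly as follows; output_dir and the evaluation cadence are assumptions, though the results table below does log every 100 steps.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-small-finetuned-yiddish-experiment-11",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,  # effective train batch size: 32
    lr_scheduler_type="cosine",
    warmup_steps=600,
    num_train_epochs=30,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    eval_strategy="steps",  # assumed from the 100-step eval cadence below
    eval_steps=100,
    logging_steps=100,
    predict_with_generate=True,  # needed to score generated text with CER/WER
)
```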

Training results

Training Loss  Epoch    Step  Validation Loss  Cer     Wer
9.199          1.8868   100   10.8497          0.2853  0.7144
9.1103         3.7736   200   10.3073          0.2701  0.6874
8.5159         5.6604   300   9.1649           0.2579  0.6603
7.6411         7.5472   400   7.9065           0.2445  0.6396
6.8548         9.4340   500   6.6809           0.2340  0.6237
6.1063         11.3208  600   5.4130           0.2272  0.6142
4.7529         13.2075  700   4.1840           0.2224  0.6126
3.7885         15.0943  800   3.1426           0.2183  0.6110
2.9438         16.9811  900   2.1589           0.2141  0.6038
2.1457         18.8679  1000  1.4059           0.2101  0.5951
1.6163         20.7547  1100  1.2903           0.2053  0.5863
1.3877         22.6415  1200  1.2429           0.2024  0.5855
1.3156         24.5283  1300  1.2100           0.1984  0.5784
1.2623         26.4151  1400  1.1897           0.1981  0.5776
1.2381         28.3019  1500  1.1805           0.1974  0.5776
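
The Cer and Wer columns imply that text was generated and scored at each evaluation step. A plausible compute_metrics hook for Seq2SeqTrainer, assuming the evaluate library was used (the actual implementation is not shown on this card):

```python
import numpy as np
import evaluate
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/byt5-small")
cer_metric = evaluate.load("cer")
wer_metric = evaluate.load("wer")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):  # some models also return extra tensors
        preds = preds[0]
    # The data collator pads labels with -100; restore real pad tokens
    # before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return {
        "cer": cer_metric.compute(predictions=decoded_preds, references=decoded_labels),
        "wer": wer_metric.compute(predictions=decoded_preds, references=decoded_labels),
    }
```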

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 2.14.4
  • Tokenizers 0.21.0

Model details

  • Format: Safetensors
  • Model size: 300M params
  • Tensor type: F32

Model tree for Addaci/byt5-small-finetuned-yiddish-experiment-11

  • Base model: google/byt5-small