byt5-small-finetuned-yiddish-experiment-10

This model is a fine-tuned version of google/byt5-small on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3450
  • CER (character error rate): 0.1505
  • WER (word error rate): 0.4654
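
The card does not document the downstream task, so the snippet below is only a minimal sketch of loading the checkpoint for inference with the Transformers library; the input string is a hypothetical placeholder and the generation settings are illustrative, not the settings used to produce the numbers above.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "Addaci/byt5-small-finetuned-yiddish-experiment-10"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # resolves to ByT5's byte-level tokenizer
model = T5ForConditionalGeneration.from_pretrained(model_id)

text = "א ביישפיל זאץ"  # hypothetical Yiddish input; the expected input format is undocumented
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```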

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 600
  • num_epochs: 30
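
The hyperparameters above, together with the 100-step evaluation cadence visible in the results table below, map naturally onto the Hugging Face Trainer API. The following is a minimal sketch, assuming a single-device run configured through Seq2SeqTrainingArguments (the actual training script is not included in this card):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="byt5-small-finetuned-yiddish-experiment-10",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,  # assumes the listed batch size is per device
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=600,
    num_train_epochs=30,
    eval_strategy="steps",  # inferred from the 100-step intervals in the results table
    eval_steps=100,
)
```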

Training results

| Training Loss | Epoch | Step | Validation Loss | CER | WER |
|:---|:---|:---|:---|:---|:---|
| 10.741 | 0.4717 | 100 | 10.9313 | 0.2881 | 0.7176 |
| 7.6063 | 0.9434 | 200 | 10.5495 | 0.2706 | 0.6850 |
| 8.4739 | 1.4151 | 300 | 9.8632 | 0.2572 | 0.6595 |
| 8.3278 | 1.8868 | 400 | 8.9330 | 0.2470 | 0.6396 |
| 8.0051 | 2.3585 | 500 | 7.9314 | 0.2354 | 0.6181 |
| 7.7765 | 2.8302 | 600 | 7.0184 | 0.2308 | 0.6150 |
| 5.6897 | 3.3019 | 700 | 6.0913 | 0.2245 | 0.6094 |
| 5.3547 | 3.7736 | 800 | 5.1003 | 0.2186 | 0.6038 |
| 4.9118 | 4.2453 | 900 | 4.3067 | 0.2174 | 0.6030 |
| 3.9777 | 4.7170 | 1000 | 3.5975 | 0.2130 | 0.5982 |
| 3.5601 | 5.1887 | 1100 | 2.8719 | 0.2098 | 0.5959 |
| 2.821 | 5.6604 | 1200 | 2.2820 | 0.2069 | 0.5919 |
| 2.2335 | 6.1321 | 1300 | 1.7483 | 0.2047 | 0.5887 |
| 1.8581 | 6.6038 | 1400 | 1.3001 | 0.2008 | 0.5823 |
| 1.6247 | 7.0755 | 1500 | 1.1757 | 0.1982 | 0.5744 |
| 1.3292 | 7.5472 | 1600 | 1.1475 | 0.1939 | 0.5688 |
| 1.1853 | 8.0189 | 1700 | 1.0804 | 0.1920 | 0.5688 |
| 1.077 | 8.4906 | 1800 | 0.8688 | 0.1902 | 0.5656 |
| 0.9039 | 8.9623 | 1900 | 0.7849 | 0.1683 | 0.4972 |
| 0.7846 | 9.4340 | 2000 | 0.7405 | 0.1667 | 0.4964 |
| 0.7805 | 9.9057 | 2100 | 0.6959 | 0.1644 | 0.4893 |
| 0.7415 | 10.3774 | 2200 | 0.6571 | 0.1615 | 0.4853 |
| 0.6541 | 10.8491 | 2300 | 0.6114 | 0.1602 | 0.4869 |
| 0.6443 | 11.3208 | 2400 | 0.5624 | 0.1590 | 0.4845 |
| 0.5984 | 11.7925 | 2500 | 0.5103 | 0.1579 | 0.4805 |
| 0.5499 | 12.2642 | 2600 | 0.4620 | 0.1576 | 0.4813 |
| 0.5194 | 12.7358 | 2700 | 0.4317 | 0.1570 | 0.4773 |
| 0.5052 | 13.2075 | 2800 | 0.4088 | 0.1565 | 0.4781 |
| 0.4724 | 13.6792 | 2900 | 0.3981 | 0.1562 | 0.4757 |
| 0.4601 | 14.1509 | 3000 | 0.3827 | 0.1564 | 0.4765 |
| 0.4342 | 14.6226 | 3100 | 0.3803 | 0.1541 | 0.4741 |
| 0.432 | 15.0943 | 3200 | 0.3719 | 0.1556 | 0.4749 |
| 0.4365 | 15.5660 | 3300 | 0.3700 | 0.1550 | 0.4733 |
| 0.4094 | 16.0377 | 3400 | 0.3660 | 0.1538 | 0.4710 |
| 0.4126 | 16.5094 | 3500 | 0.3610 | 0.1538 | 0.4741 |
| 0.3976 | 16.9811 | 3600 | 0.3614 | 0.1534 | 0.4694 |
| 0.3933 | 17.4528 | 3700 | 0.3600 | 0.1522 | 0.4694 |
| 0.4019 | 17.9245 | 3800 | 0.3539 | 0.1513 | 0.4686 |
| 0.3813 | 18.3962 | 3900 | 0.3598 | 0.1522 | 0.4694 |
| 0.3812 | 18.8679 | 4000 | 0.3551 | 0.1519 | 0.4678 |
| 0.382 | 19.3396 | 4100 | 0.3517 | 0.1508 | 0.4670 |
| 0.3887 | 19.8113 | 4200 | 0.3502 | 0.1510 | 0.4678 |
| 0.3756 | 20.2830 | 4300 | 0.3520 | 0.1516 | 0.4686 |
| 0.3761 | 20.7547 | 4400 | 0.3499 | 0.1514 | 0.4670 |
| 0.38 | 21.2264 | 4500 | 0.3480 | 0.1507 | 0.4670 |
| 0.3673 | 21.6981 | 4600 | 0.3484 | 0.1514 | 0.4678 |
| 0.3778 | 22.1698 | 4700 | 0.3472 | 0.1507 | 0.4670 |
| 0.3642 | 22.6415 | 4800 | 0.3475 | 0.1507 | 0.4662 |
| 0.3701 | 23.1132 | 4900 | 0.3468 | 0.1511 | 0.4662 |
| 0.3753 | 23.5849 | 5000 | 0.3460 | 0.1510 | 0.4670 |
| 0.3672 | 24.0566 | 5100 | 0.3458 | 0.1508 | 0.4662 |
| 0.3711 | 24.5283 | 5200 | 0.3453 | 0.1508 | 0.4662 |
| 0.3631 | 25.0 | 5300 | 0.3457 | 0.1507 | 0.4662 |
| 0.3733 | 25.4717 | 5400 | 0.3456 | 0.1508 | 0.4670 |
| 0.3667 | 25.9434 | 5500 | 0.3455 | 0.1508 | 0.4662 |
| 0.3568 | 26.4151 | 5600 | 0.3455 | 0.1507 | 0.4662 |
| 0.3729 | 26.8868 | 5700 | 0.3453 | 0.1508 | 0.4662 |
| 0.3652 | 27.3585 | 5800 | 0.3452 | 0.1507 | 0.4662 |
| 0.3658 | 27.8302 | 5900 | 0.3450 | 0.1505 | 0.4654 |
| 0.3621 | 28.3019 | 6000 | 0.3448 | 0.1507 | 0.4654 |
| 0.3724 | 28.7736 | 6100 | 0.3449 | 0.1508 | 0.4662 |
| 0.3594 | 29.2453 | 6200 | 0.3448 | 0.1508 | 0.4662 |
| 0.3643 | 29.7170 | 6300 | 0.3448 | 0.1508 | 0.4662 |
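
The CER and WER columns above are character and word error rates on the validation set. A minimal sketch of computing these metrics with the Hugging Face evaluate library follows; this is one common implementation (the card does not state which was used), and the strings are placeholders.

```python
import evaluate  # requires: pip install evaluate jiwer

cer_metric = evaluate.load("cer")
wer_metric = evaluate.load("wer")

predictions = ["hypothetical model output"]    # model outputs, one string per example
references = ["hypothetical reference text"]   # ground-truth targets, aligned by index

print("CER:", cer_metric.compute(predictions=predictions, references=references))
print("WER:", wer_metric.compute(predictions=predictions, references=references))
```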

Framework versions

  • Transformers 4.47.0
  • Pytorch 2.5.1+cu121
  • Datasets 2.14.4
  • Tokenizers 0.21.0
