wav2vec2_xls_r_300m_nchlt_speech_corpus_ZULU_50hr_v2

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m. The card does not record the training dataset; the model name suggests a 50-hour Zulu subset of the NCHLT Speech Corpus. It achieves the following results on the evaluation set:

Loss: 0.5477
Wer: 0.6237
Cer: 0.2418
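
Wer and Cer above are word error rate and character error rate: the edit distance between the reference transcript and the model output, divided by the reference length, counted over words and over characters respectively. The sketch below shows that calculation in plain Python. It is an illustration only and not necessarily the implementation that produced the numbers in this card; the function names and the Zulu example sentence are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitute, insert, delete)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical example: one wrong character in a three-word reference.
reference = "sawubona mngani wami"
hypothesis = "sawubona mngane wami"
print(f"WER={wer(reference, hypothesis):.4f} CER={cer(reference, hypothesis):.4f}")
```

A single wrong character makes the whole word count as an error, which is why Wer is usually much higher than Cer for the same output.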

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 32
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 100
mixed_precision_training: Native AMP
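
As a worked example of how these values interact, the sketch below recomputes the effective batch size and the learning rate that a linear schedule with warmup would give at a given optimizer step. It assumes a single training device, takes 631 optimizer steps per epoch from the Step column of the training results table, and assumes the schedule was planned over the full 100 configured epochs. The function name linear_lr is hypothetical, and the formula mirrors the usual linear warmup and decay rule rather than code taken from this training run.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
NUM_EPOCHS = 100

# Assumption: taken from the results table (step 631 at epoch 1.0).
STEPS_PER_EPOCH = 631

# Assumption: one device, so the effective batch is per-device batch
# size times the number of gradient accumulation steps.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

# Assumption: the scheduler spans every configured epoch.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=total_steps):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch)
print("planned optimizer steps:", total_steps)
for step in (0, 250, 500, 30919, total_steps):
    print(f"step {step:>6}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 500, which falls inside the first epoch.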

Training results

Training Loss	Epoch	Step	Validation Loss	Wer	Cer
4.7199	1.0	631	0.2743	0.3677	0.0518
0.3422	2.0	1262	0.1286	0.2212	0.0292
0.2394	3.0	1893	0.0910	0.1485	0.0210
0.2005	4.0	2524	0.0828	0.1420	0.0194
0.1735	5.0	3155	0.0703	0.1111	0.0156
0.1572	6.0	3786	0.0639	0.1046	0.0153
0.1412	7.0	4417	0.0607	0.1584	0.0257
0.1285	8.0	5048	0.0493	0.1001	0.0149
0.1186	9.0	5679	0.0552	0.1370	0.0222
0.1085	10.0	6310	0.0464	0.0912	0.0145
0.1014	11.0	6941	0.0471	0.1166	0.0203
0.0943	12.0	7572	0.0456	0.1231	0.0210
0.0857	13.0	8203	0.0447	0.0867	0.0156
0.08	14.0	8834	0.0461	0.0623	0.0094
0.072	15.0	9465	0.0407	0.0533	0.0080
0.0662	16.0	10096	0.0424	0.0618	0.0088
0.0624	17.0	10727	0.0467	0.0618	0.0092
0.0566	18.0	11358	0.0468	0.0578	0.0090
0.0533	19.0	11989	0.0474	0.0658	0.0105
0.0501	20.0	12620	0.0441	0.0558	0.0087
0.0465	21.0	13251	0.0478	0.0508	0.0082
0.0451	22.0	13882	0.0438	0.0523	0.0083
0.043	23.0	14513	0.0569	0.0812	0.0119
0.0412	24.0	15144	0.0418	0.0463	0.0075
0.0393	25.0	15775	0.0487	0.0508	0.0082
0.0428	26.0	16406	0.0433	0.0533	0.0079
0.0423	27.0	17037	0.0451	0.0538	0.0089
0.038	28.0	17668	0.0455	0.0429	0.0067
0.0331	29.0	18299	0.0444	0.0488	0.0080
0.0329	30.0	18930	0.0423	0.0433	0.0070
0.0322	31.0	19561	0.0476	0.0513	0.0083
0.0328	32.0	20192	0.0441	0.0389	0.0066
0.0312	33.0	20823	0.0469	0.0458	0.0070
0.0299	34.0	21454	0.0435	0.0429	0.0071
0.0292	35.0	22085	0.0461	0.0438	0.0068
0.0282	36.0	22716	0.0415	0.0399	0.0062
0.0296	37.0	23347	0.0476	0.0429	0.0063
0.0317	38.0	23978	0.0454	0.0558	0.0082
0.0301	39.0	24609	0.0441	0.0349	0.0057
0.0263	40.0	25240	0.0467	0.0414	0.0064
0.0273	41.0	25871	0.0435	0.0443	0.0073
0.0269	42.0	26502	0.0463	0.0419	0.0065
0.026	43.0	27133	0.0442	0.0349	0.0056
0.0224	44.0	27764	0.0435	0.0394	0.0062
0.0228	45.0	28395	0.0443	0.0424	0.0066
0.0238	46.0	29026	0.0454	0.0468	0.0069
0.0223	47.0	29657	0.0472	0.0379	0.0063
0.0213	48.0	30288	0.0439	0.0349	0.0060
0.0214	49.0	30919	0.0437	0.0344	0.0059
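
The table shows that the epoch with the lowest validation loss is not the epoch with the lowest Wer or Cer, which matters when choosing a checkpoint. The sketch below picks the best epoch per metric from a handful of rows copied from the table. The helper best_epoch is hypothetical, and the row selection is a subset chosen for brevity that includes the minimum of each column.

```python
# Selected rows from the table above: (epoch, validation_loss, wer, cer).
ROWS = [
    (1, 0.2743, 0.3677, 0.0518),
    (15, 0.0407, 0.0533, 0.0080),
    (36, 0.0415, 0.0399, 0.0062),
    (39, 0.0441, 0.0349, 0.0057),
    (43, 0.0442, 0.0349, 0.0056),
    (48, 0.0439, 0.0349, 0.0060),
    (49, 0.0437, 0.0344, 0.0059),
]


def best_epoch(rows, column):
    """Return the epoch whose value in `column` is smallest."""
    return min(rows, key=lambda row: row[column])[0]


best_by_loss = best_epoch(ROWS, 1)
best_by_wer = best_epoch(ROWS, 2)
best_by_cer = best_epoch(ROWS, 3)

print("lowest validation loss at epoch", best_by_loss)
print("lowest Wer at epoch", best_by_wer)
print("lowest Cer at epoch", best_by_cer)
```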

Framework versions

Transformers 4.44.2
Pytorch 2.1.0+cu118
Datasets 2.21.0
Tokenizers 0.19.1
