speecht5_finetuned_indotts

This model is a fine-tuned version of microsoft/speecht5_tts on an unknown dataset. On the evaluation set it reaches a validation loss of 0.4123 at the final training step (step 20000).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 20000
mixed_precision_training: Native AMP
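
The relationship between these values can be checked with a few lines of arithmetic. The sketch below is not the training script; it only reproduces the effective batch size and the shape of a linear schedule with warmup (linear ramp from 0 to the peak rate, then linear decay to 0 at the last step). The device count of 1 is an assumption, since the card does not state it, and the function names are illustrative.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 20_000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device, accum_steps, devices=NUM_DEVICES):
    """Examples consumed per optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at an optimizer step under linear warmup, then linear decay."""
    if step < warmup:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / warmup
    # Decay from base_lr at the end of warmup to 0 at the final step.
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for s in (0, 250, 500, 10_250, 20_000):
        print(s, linear_lr(s))
```

With a per-device batch of 4 and 8 accumulation steps, one device gives the listed total_train_batch_size of 32. The schedule peaks at 0.0001 at step 500 and reaches 0 at step 20000.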

Training Loss	Epoch	Step	Validation Loss
0.5378	0.4771	500	0.4871
0.5108	0.9542	1000	0.4626
0.4959	1.4313	1500	0.4579
0.4852	1.9084	2000	0.4536
0.4810	2.3855	2500	0.4521
0.4759	2.8626	3000	0.4439
0.4697	3.3397	3500	0.4432
0.4666	3.8168	4000	0.4368
0.4623	4.2939	4500	0.4360
0.4597	4.7710	5000	0.4347
0.4554	5.2481	5500	0.4348
0.4552	5.7252	6000	0.4298
0.4490	6.2023	6500	0.4307
0.4505	6.6794	7000	0.4270
0.4433	7.1565	7500	0.4284
0.4446	7.6336	8000	0.4274
0.4399	8.1107	8500	0.4246
0.4407	8.5878	9000	0.4231
0.4346	9.0649	9500	0.4217
0.4377	9.5420	10000	0.4216
0.4322	10.0191	10500	0.4196
0.4309	10.4962	11000	0.4186
0.4299	10.9733	11500	0.4161
0.4262	11.4504	12000	0.4258
0.4279	11.9275	12500	0.4176
0.4215	12.4046	13000	0.4165
0.4230	12.8817	13500	0.4146
0.4207	13.3588	14000	0.4209
0.4213	13.8359	14500	0.4171
0.4203	14.3130	15000	0.4119
0.4177	14.7901	15500	0.4119
0.4134	15.2672	16000	0.4118
0.4164	15.7443	16500	0.4131
0.4131	16.2214	17000	0.4106
0.4118	16.6985	17500	0.4119
0.4116	17.1756	18000	0.4128
0.4086	17.6527	18500	0.4109
0.4075	18.1298	19000	0.4126
0.4066	18.6069	19500	0.4122
0.4075	19.0840	20000	0.4123
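
The table can also be read programmatically. The following sketch estimates the number of optimizer steps per epoch from the Step and Epoch columns and picks the row with the lowest validation loss. The rows are copied from the last part of the table (steps 15000 to 20000); the helper names are illustrative. The estimate of examples per epoch assumes the total batch size of 32 and ignores a possibly partial final batch.

```python
# (step, epoch, validation_loss) rows copied from the end of the table above.
ROWS = [
    (15000, 14.3130, 0.4119),
    (15500, 14.7901, 0.4119),
    (16000, 15.2672, 0.4118),
    (16500, 15.7443, 0.4131),
    (17000, 16.2214, 0.4106),
    (17500, 16.6985, 0.4119),
    (18000, 17.1756, 0.4128),
    (18500, 17.6527, 0.4109),
    (19000, 18.1298, 0.4126),
    (19500, 18.6069, 0.4122),
    (20000, 19.0840, 0.4123),
]
TOTAL_TRAIN_BATCH_SIZE = 32  # from the hyperparameter list


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from the last logged row."""
    step, epoch, _ = rows[-1]
    return round(step / epoch)


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    spe = steps_per_epoch(ROWS)
    print("steps per epoch:", spe)
    print("approx. examples per epoch:", spe * TOTAL_TRAIN_BATCH_SIZE)
    print("lowest validation loss (step, epoch, loss):", best_row(ROWS))
```

Across the whole table, the lowest validation loss is 0.4106 at step 17000, slightly below the 0.4123 of the final step. Validation loss is essentially flat from about step 15000 onward.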