# SpeechT5 TTS YOR_v2
This model is a fine-tuned version of microsoft/speecht5_tts on the lagyamfi/yor_bible dataset. It achieves the following results on the evaluation set:
- Loss: 0.4262
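
As a rough illustration, the snippet below sketches how this checkpoint might be used for inference with the standard SpeechT5 text-to-speech API in Transformers. The repository id `Lagyamfi/speecht5_tts_lagyamfi_yor_v2`, the sample Yoruba sentence, and the zero-valued speaker embedding are placeholders/assumptions; in practice you would supply a real 512-dimensional x-vector for the target speaker.

```python
import torch
import soundfile as sf
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan

# Assumed repository id for this fine-tuned checkpoint.
model_id = "Lagyamfi/speecht5_tts_lagyamfi_yor_v2"

processor = SpeechT5Processor.from_pretrained(model_id)
model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Placeholder Yoruba sentence; replace with your own input text.
inputs = processor(text="Bawo ni o se wa?", return_tensors="pt")

# SpeechT5 expects a (1, 512) x-vector speaker embedding. A zero vector is
# only a placeholder and will sound poor; substitute an embedding computed
# from a recording of the target speaker.
speaker_embeddings = torch.zeros((1, 512))

speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("speech.wav", speech.numpy(), samplerate=16000)
```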
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
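
Since the card names lagyamfi/yor_bible as the fine-tuning data, a minimal loading sketch is shown below; the available splits and column names are not documented here and should be inspected after loading.

```python
from datasets import load_dataset

# Load the dataset named on this card. Splits and column names are not
# documented here, so print the DatasetDict to inspect them first.
dataset = load_dataset("lagyamfi/yor_bible")
print(dataset)
```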
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- learning_rate: 0.0002
- train_batch_size: 6
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 20
- total_train_batch_size: 480
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- num_epochs: 20
- mixed_precision_training: Native AMP
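
For reference, the sketch below shows one plausible mapping of these hyperparameters onto `Seq2SeqTrainingArguments`. The `output_dir`, the evaluation/save strategies, and the surrounding Trainer and data-collator wiring are assumptions, and the 4-GPU distributed setup comes from the launcher (e.g. `torchrun` or `accelerate`), not from these arguments.

```python
from transformers import Seq2SeqTrainingArguments

# Hedged reconstruction of the hyperparameters above. With 4 GPUs, a
# per-device batch size of 6, and 20 gradient-accumulation steps, the
# effective train batch size is 6 * 4 * 20 = 480, matching the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts_lagyamfi_yor_v2",  # assumed output directory
    learning_rate=2e-4,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=20,
    num_train_epochs=20,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    seed=42,
    fp16=True,                     # Native AMP mixed precision
    evaluation_strategy="epoch",   # assumed; the log shows per-epoch eval
    save_strategy="epoch",         # assumed
    report_to=[],
)
```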
### Training results
| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.9966  | 58   | 0.4736          |
| No log        | 1.9931  | 116  | 0.4692          |
| No log        | 2.9897  | 174  | 0.4660          |
| No log        | 3.9863  | 232  | 0.4698          |
| No log        | 5.0     | 291  | 0.4874          |
| No log        | 5.9966  | 349  | 0.4755          |
| No log        | 6.9931  | 407  | 0.4630          |
| No log        | 7.9897  | 465  | 0.4591          |
| No log        | 8.9863  | 523  | 0.4646          |
| No log        | 10.0    | 582  | 0.4536          |
| No log        | 10.9966 | 640  | 0.4585          |
| No log        | 11.9931 | 698  | 0.4745          |
| No log        | 12.9897 | 756  | 0.4645          |
| No log        | 13.9863 | 814  | 0.4618          |
| No log        | 15.0    | 873  | 0.4444          |
| No log        | 15.9966 | 931  | 0.4517          |
| No log        | 16.9931 | 989  | 0.4395          |
| 0.4803        | 17.9897 | 1047 | 0.4523          |
| 0.4803        | 18.9863 | 1105 | 0.4335          |
| 0.4803        | 19.9313 | 1160 | 0.4262          |
### Framework versions
- Transformers 4.40.1
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1