---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
---

This is a test of how to fine-tune F5-TTS for Italian.

Trained on 247+ hours of the "train" split of the facebook/multilingual_librispeech dataset, with 6700 steps per epoch. Known issues:

- catastrophic forgetting (the model forgot English)
- Italian pronunciation is not perfect

The run.py file is an example of how to extract the WAV files and produce the metadata.csv used for training.
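
For orientation, the extraction step can be sketched roughly as follows. This is not the actual run.py, only a minimal illustration using the Python standard library. The function name `export_dataset`, the flat sample layout (`id`, `audio`, `sampling_rate`, `text`), the `wavs/` subfolder, and the pipe-delimited `audio_file|text` header are all assumptions made for the example; check them against the dataset fields and the F5-TTS fine-tuning tooling you actually use.

```python
import sys
import wave
from array import array
from pathlib import Path


def export_dataset(samples, out_dir):
    """Write each sample as a 16-bit mono WAV and list it in metadata.csv.

    `samples` is a hypothetical iterable of dicts with the keys
    "id", "audio" (floats in [-1, 1]), "sampling_rate" and "text".
    """
    out_dir = Path(out_dir)
    wav_dir = out_dir / "wavs"
    wav_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for sample in samples:
        wav_path = wav_dir / f"{sample['id']}.wav"

        # Clip to [-1, 1] and convert the float samples to 16-bit PCM.
        pcm = array(
            "h",
            (int(max(-1.0, min(1.0, x)) * 32767) for x in sample["audio"]),
        )
        if sys.byteorder == "big":
            pcm.byteswap()  # WAV data is little-endian

        with wave.open(str(wav_path), "wb") as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2)
            wav_file.setframerate(sample["sampling_rate"])
            wav_file.writeframes(pcm.tobytes())

        # Keep the transcript on one line and free of the column delimiter.
        text = " ".join(sample["text"].replace("|", " ").split())
        rows.append(f"wavs/{wav_path.name}|{text}")

    metadata_path = out_dir / "metadata.csv"
    with open(metadata_path, "w", encoding="utf-8") as f:
        f.write("audio_file|text\n")  # assumed header, see the note above
        for row in rows:
            f.write(row + "\n")
    return metadata_path
```

In practice the samples would come from iterating over the "train" split of the dataset, mapping each record to the dict layout assumed here.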