---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
---

This is a test of fine-tuning F5-TTS on Italian.

Trained on the 9-hour split of the facebook/multilingual_librispeech dataset for 200 epochs:

- catastrophic forgetting (the model forgot English)
- lost the ability to clone voices properly
- Italian pronunciation is not yet good enough

The last checkpoint produced, and the one to test, is `model_italian_200e_9h.safetensors`.

`run.py` is an example of how to extract the WAV files and produce the `metadata.csv` used for training.

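For readers without `run.py` at hand, the export step can be sketched roughly as follows. This is not the script itself, only a minimal sketch under assumptions: each example is assumed to be a dict with keys `id`, `transcript` and `audio` (holding an `array` of float samples in [-1, 1] and a `sampling_rate`), which is how the Hugging Face `datasets` library typically exposes this dataset, and `metadata.csv` is assumed to use the pipe-delimited `audio_file|text` layout with a header row that F5-TTS's CSV preparation script reads. Verify both against the versions you use. The helper names `write_wav_pcm16` and `export_for_f5` are hypothetical, and only the standard library is used, so audio is written as 16-bit mono PCM.

```python
import csv
import sys
import wave
from array import array
from pathlib import Path
from typing import Any, Iterable, Mapping, Sequence


def write_wav_pcm16(path: Path, samples: Sequence[float], sampling_rate: int) -> None:
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file (stdlib only)."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = array("h", (int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sampling_rate)
        wf.writeframes(pcm.tobytes())


def export_for_f5(examples: Iterable[Mapping[str, Any]], out_dir: str) -> Path:
    """Hypothetical helper: write wavs/<id>.wav files plus a pipe-delimited metadata.csv.

    Assumed layout (check against your F5-TTS version):
        audio_file|text
        wavs/<id>.wav|<transcript>
    """
    out = Path(out_dir)
    (out / "wavs").mkdir(parents=True, exist_ok=True)
    csv_path = out / "metadata.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")
        writer.writerow(["audio_file", "text"])  # header row
        for ex in examples:
            # Audio paths are stored relative to the folder containing metadata.csv.
            rel = Path("wavs") / f"{ex['id']}.wav"
            audio = ex["audio"]
            write_wav_pcm16(out / rel, audio["array"], audio["sampling_rate"])
            # "|" is the column separator, so replace it; also collapse runs of whitespace.
            text = " ".join(ex["transcript"].replace("|", " ").split())
            writer.writerow([rel.as_posix(), text])
    return csv_path
```

With `datasets` installed, something like `load_dataset("facebook/multilingual_librispeech", "italian", split="9_hours")` should yield examples in roughly this shape (the config and split names are assumptions to check on the dataset page), and `export_for_f5(ds, "data/italian_9h")` would then write `data/italian_9h/wavs/*.wav` and `data/italian_9h/metadata.csv`. The per-sample Python loop keeps the sketch dependency-free but is slow on a split as large as 247 hours; `soundfile` or NumPy would be the usual faster choice there.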
UPDATE:

Now trying to fine-tune on the full Italian `train` split of the same dataset (247 hours).