---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
---
This is a test of how to fine-tune F5-TTS on Italian.
Trained on the 9-hour split of the facebook/multilingual_librispeech dataset for 200 epochs. Results so far:
- catastrophic forgetting (the model forgot English)
- lost the ability to clone voices properly
- Italian pronunciation is not yet good enough
The last checkpoint produced, and the one to test, is model_italian_200e_9h.safetensors.
The run.py file is an example of how to extract the WAV files and produce the metadata.csv used for training.
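
For readers who want a feel for that extraction step, here is a minimal sketch. It is not the contents of run.py. It assumes each example has already been loaded into a dict with hypothetical keys `samples` (mono floats in [-1, 1]), `sample_rate`, and `text`, and it assumes the trainer expects a pipe-delimited metadata.csv with an `audio_file|text` header and paths under `wavs/`. Check both assumptions against the F5-TTS version you are using.

```python
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)


def export_dataset(examples, out_dir):
    """Write wavs/*.wav plus metadata.csv; return the number of rows written.

    `examples` is a hypothetical iterable of dicts with the keys
    "samples", "sample_rate" and "text" (an assumption, not a dataset schema).
    """
    out_dir = Path(out_dir)
    wav_dir = out_dir / "wavs"
    wav_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for i, ex in enumerate(examples):
        name = f"audio_{i:06d}.wav"
        write_wav(wav_dir / name, ex["samples"], ex["sample_rate"])
        # "|" is the column delimiter, so drop it from the transcript
        # and collapse repeated whitespace.
        text = " ".join(ex["text"].replace("|", " ").split())
        rows.append((f"wavs/{name}", text))

    with open(out_dir / "metadata.csv", "w", encoding="utf-8", newline="") as f:
        f.write("audio_file|text\n")  # assumed header format
        for rel_path, text in rows:
            f.write(f"{rel_path}|{text}\n")
    return len(rows)
```

To use it with the real dataset, map each row of the loaded split to a dict of this shape and pass the resulting iterable to `export_dataset`. The dataset's actual field names are deliberately not shown here.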
UPDATE:
Now trying to fine-tune on the full Italian "train" split of the same dataset (247 hours).