---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
---

This is a test of how to fine-tune F5-TTS on Italian.

Trained on more than 247 hours of the "train" split of the facebook/multilingual_librispeech dataset, at 6700 steps per epoch. Observed results:

- catastrophic forgetting (the model forgot English)
- Italian pronunciation is not perfect

The run.py file is an example of how to extract the wav files and produce the metadata.csv used for training.
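
For readers who only want the general shape of that extraction step, here is a minimal sketch. It is not the contents of run.py. It assumes each sample has already been reduced to a plain dict with the hypothetical keys `id`, `samples` (mono floats in [-1.0, 1.0]), `sampling_rate`, and `text`; mapping the dataset's own fields onto these keys is left out. The `wavs/` folder layout and the pipe-delimited `audio_file|text` header are also assumptions about what the training scripts expect, so check them against the F5-TTS version you use.

```python
import struct
import wave
from pathlib import Path


def write_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    # Clip to the valid range, then scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(str(path), "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bits
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def export_dataset(records, out_dir):
    """Write wavs/<id>.wav for each record plus a metadata.csv index.

    `records` is an iterable of dicts with the hypothetical keys
    "id", "samples", "sampling_rate" and "text".
    """
    out_dir = Path(out_dir)
    wav_dir = out_dir / "wavs"
    wav_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = out_dir / "metadata.csv"
    with open(metadata_path, "w", encoding="utf-8") as metadata:
        # Assumed header and delimiter; adjust to your training script.
        metadata.write("audio_file|text\n")
        for record in records:
            wav_path = wav_dir / f"{record['id']}.wav"
            write_wav(wav_path, record["samples"], record["sampling_rate"])
            # Drop pipes and collapse whitespace so each row stays on one line.
            text = " ".join(record["text"].replace("|", " ").split())
            metadata.write(f"wavs/{wav_path.name}|{text}\n")
    return metadata_path
```

Calling `export_dataset(records, "my_dataset")` would leave a `my_dataset/wavs/` folder and a `my_dataset/metadata.csv` file with one `wavs/<id>.wav|<text>` row per sample. The standard-library `wave` module is used here only to keep the sketch dependency-free; an audio library such as soundfile would do the same job.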