---
license: cc-by-nc-4.0
datasets:
- amphion/Emilia-Dataset
- mozilla-foundation/common_voice_12_0
language:
- el
- en
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
---

F5-TTS model fine-tuned to speak Greek. (This work is under development and is currently in beta.)

The model was fine-tuned on Greek speech datasets plus a small part of the Emilia-EN dataset, included to prevent catastrophic forgetting of English.

The model can generate:
- Greek speech from Greek text, given a Greek reference audio
- English speech from English text, given an English reference audio
- Speech from mixed Greek and English text (quality here still needs improvement, and several runs may be needed to get a good result)

The training data consists of:
- Common Voice 12.0 (all Greek splits)
- Greek Single Speaker Speech (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
- A small part of the Emilia Dataset (https://huggingface.co/datasets/amphion/Emilia-Dataset) (EN-B000049.tar)

GitHub: https://github.com/SWivid/F5-TTS

Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
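
Since the model expects the reference audio to match the language of the text being generated, it can help to check the text's script before choosing a reference clip. The following is a minimal sketch of such a check, not part of F5-TTS itself. The helper names `text_language` and `pick_reference`, and the file paths in the usage note, are hypothetical. It also assumes that any Latin-script text is English, which holds only for the Greek/English setting described here.

```python
import unicodedata
from typing import Dict, Optional


def text_language(text: str) -> str:
    """Classify text as 'el', 'en', 'mixed', or 'unknown' by the script of its letters.

    Assumption: Latin-script letters are treated as English, since this model
    card only covers Greek and English.
    """
    has_greek = False
    has_latin = False
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("GREEK"):
            has_greek = True
        elif name.startswith("LATIN"):
            has_latin = True
    if has_greek and has_latin:
        return "mixed"
    if has_greek:
        return "el"
    if has_latin:
        return "en"
    return "unknown"


def pick_reference(text: str, references: Dict[str, str]) -> Optional[str]:
    """Return the reference clip registered for the text's language.

    Returns None for 'mixed' or 'unknown' text, leaving that choice to the
    caller, because the model card does not say which reference suits mixed text.
    """
    return references.get(text_language(text))
```

For example, with a hypothetical mapping such as `{"el": "refs/greek.wav", "en": "refs/english.wav"}`, `pick_reference("Καλημέρα κόσμε", refs)` would select the Greek clip, while mixed-language text returns `None` so the caller can try either reference over several runs.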