PetrosStav committed "Update README.md" (commit b8808ae, parent 2cf0fcd)

README.md (CHANGED)
pipeline_tag: text-to-speech
---

# F5-TTS-Greek

## F5-TTS model finetuned to speak Greek

(This work is under development and is currently in beta.)

Finetuned on Greek speech datasets and a small part of the Emilia-EN dataset to prevent catastrophic forgetting of English.

The model can generate Greek text with Greek reference speech, English text with English reference speech, and a mix of Greek and English (mixed-language quality still needs improvement, and several runs may be needed to get good results).
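Inference with this checkpoint would normally go through the upstream F5-TTS command-line tool. The minimal sketch below only assembles such a call as an argv list; the tool name `f5-tts_infer-cli`, every flag (`--ckpt_file`, `--ref_audio`, `--ref_text`, `--gen_text`, `--output_file`), and the checkpoint filename are assumptions based on the SWivid/F5-TTS repository and should be verified against `f5-tts_infer-cli --help`.

```python
# Sketch: build an F5-TTS CLI inference command for the Greek checkpoint.
# Tool name, flag names, and file names below are assumptions, not part of
# this model card; verify them against the upstream F5-TTS documentation.
import shlex


def build_infer_command(ckpt_file, ref_audio, ref_text, gen_text, out_wav="out.wav"):
    """Assemble the (assumed) f5-tts_infer-cli call as an argv list."""
    return [
        "f5-tts_infer-cli",
        "--ckpt_file", ckpt_file,  # path to the Greek finetuned checkpoint
        "--ref_audio", ref_audio,  # short reference clip (Greek or English)
        "--ref_text", ref_text,    # transcript of the reference clip
        "--gen_text", gen_text,    # text to synthesize
        "--output_file", out_wav,  # where to write the generated audio
    ]


cmd = build_infer_command(
    ckpt_file="model.safetensors",          # hypothetical filename
    ref_audio="ref_el.wav",                 # hypothetical reference clip
    ref_text="Γεια σου, πώς είσαι;",
    gen_text="Καλημέρα! Αυτό είναι ένα παράδειγμα σύνθεσης ομιλίας.",
)
print(shlex.join(cmd))  # shell-quoted command line, ready to copy-paste
```

Building the command as a list (rather than one string) keeps the Greek text safe from shell quoting issues if it is later passed to `subprocess.run`.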

## Datasets used:

- Common Voice 12.0 (All Greek Splits) (https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0)
- Greek Single Speaker Speech Dataset (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
- Small part of the Emilia Dataset (https://huggingface.co/datasets/amphion/Emilia-Dataset) (EN-B000049.tar)
29 |
|
30 |
+
## Training arguments
|
31 |
+
|
32 |
+
Learning Rate: 0.00001
|
33 |
+
Batch Size per GPU: 3200
|
34 |
+
Max Samples: 64
|
35 |
+
Gradient Accumulation Steps: 1
|
36 |
+
Max Gradient Norm: 1
|
37 |
+
Epochs: 277
|
38 |
+
Warmup Updates: 1274
|
39 |
+
Save per Updates: 25000
|
40 |
+
Last per Steps: 1000
|
41 |
+
mixed_precision: fp16
|
42 |
+
|
43 |
+
|
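For reference, the hyperparameters above can be collected into a plain Python dict. The key names mirror this list, not necessarily the exact upstream F5-TTS config keys, and the helper simply multiplies batch size per GPU by GPU count and accumulation steps; the GPU count is not stated in this card, so it is a parameter here.

```python
# The finetuning hyperparameters from the list above, as a plain dict.
# Key names are illustrative; check the upstream F5-TTS trainer config
# for the exact field names it expects.
train_cfg = {
    "learning_rate": 0.00001,
    "batch_size_per_gpu": 3200,
    "max_samples": 64,
    "grad_accumulation_steps": 1,
    "max_grad_norm": 1,
    "epochs": 277,
    "num_warmup_updates": 1274,
    "save_per_updates": 25000,
    "last_per_steps": 1000,
    "mixed_precision": "fp16",
}


def effective_batch(cfg, num_gpus=1):
    """Batch size per optimizer update: per-GPU batch x GPUs x accumulation.

    num_gpus is illustrative -- the card does not state how many GPUs
    were used for this finetune.
    """
    return cfg["batch_size_per_gpu"] * num_gpus * cfg["grad_accumulation_steps"]


print(effective_batch(train_cfg))  # → 3200 with the defaults above
```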

## Links:

GitHub: https://github.com/SWivid/F5-TTS

Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (https://arxiv.org/abs/2410.06885)