Text-to-Speech
Greek
English
PetrosStav committed b8808ae (1 parent: 2cf0fcd)

Update README.md

  1. README.md +25 -5
README.md CHANGED
@@ -11,18 +11,38 @@ base_model:
  pipeline_tag: text-to-speech
  ---
 
- F5-TTS model finetuned to speak Greek.
+ # F5-TTS-Greek
+
+ ## F5-TTS model finetuned to speak Greek
 
  (This work is under development and is in beta version.)
 
  Finetuned on Greek speech datasets and a small part of Emilia-EN dataset to prevent catastrophic forgetting of English.
 
- Model can generate Greek text with Greek reference audio, English text with English reference, and mix of Greek and English (quality here needs improvement, and many runs might be needed).
+ Model can generate Greek text with Greek reference speech, English text with English reference speech, and mix of Greek and English (quality here needs improvement, and many runs might be needed to get good results).
+
+ ## Datasets used:
 
- Dataset consists of:
  - Common Voice 12.0 (All Greek Splits) (https://huggingface.co/datasets/mozilla-foundation/common_voice_12_0)
- - Greek Single Speaker Speech (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
+ - Greek Single Speaker Speech Dataset (https://www.kaggle.com/datasets/bryanpark/greek-single-speaker-speech-dataset)
  - Small part of Emilia Dataset (https://huggingface.co/datasets/amphion/Emilia-Dataset) (EN-B000049.tar)
 
+ ## Training arguments
+
+ Learning Rate: 0.00001
+ Batch Size per GPU: 3200
+ Max Samples: 64
+ Gradient Accumulation Steps: 1
+ Max Gradient Norm: 1
+ Epochs: 277
+ Warmup Updates: 1274
+ Save per Updates: 25000
+ Last per Steps: 1000
+ mixed_precision: fp16
+
+
+ ## Links:
+
  Github: https://github.com/SWivid/F5-TTS
- Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
+
+ Paper: F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (https://arxiv.org/abs/2410.06885)
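
The training arguments added in this commit can be collected into a small config, and the warmup setting made concrete with a sketch. The dictionary keys and the `warmup_lr` helper below are hypothetical names chosen for illustration. The linear ramp is also an assumption: the README lists 1274 warmup updates and a learning rate of 0.00001 but does not state the schedule shape or what happens after warmup.

```python
# Training arguments as listed in the README; values are copied verbatim,
# key names are illustrative and not taken from any F5-TTS config file.
TRAIN_ARGS = {
    "learning_rate": 1e-5,
    "batch_size_per_gpu": 3200,
    "max_samples": 64,
    "grad_accumulation_steps": 1,
    "max_grad_norm": 1.0,
    "epochs": 277,
    "num_warmup_updates": 1274,
    "save_per_updates": 25000,
    "last_per_steps": 1000,
    "mixed_precision": "fp16",
}


def warmup_lr(update: int, peak_lr: float, warmup_updates: int) -> float:
    """Hypothetical linear warmup: ramp from 0 to peak_lr, then hold.

    Any decay after warmup is deliberately left out, because the README
    does not describe it.
    """
    if update < 0:
        raise ValueError("update must be non-negative")
    if warmup_updates <= 0 or update >= warmup_updates:
        return peak_lr
    return peak_lr * update / warmup_updates


if __name__ == "__main__":
    for step in (0, 637, 1274, 5000):
        lr = warmup_lr(
            step,
            TRAIN_ARGS["learning_rate"],
            TRAIN_ARGS["num_warmup_updates"],
        )
        print(f"update {step:>5}: lr = {lr:.2e}")
```

Under this assumed schedule, the learning rate reaches half of its listed value at update 637 and the full 0.00001 from update 1274 onward.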