Update README.md
README.md
@@ -1,3 +1,53 @@
|
|
1 |
-
---
|
2 |
-
license: cc-by-nc-4.0
|
3 |
-
|
1 |
+
---
|
2 |
+
license: cc-by-nc-sa-4.0
|
3 |
+
datasets:
|
4 |
+
- mozilla-foundation/common_voice_17_0
|
5 |
+
- bond005/sberdevices_golos_10h_crowd
|
6 |
+
- bond005/sova_rudevices
|
7 |
+
- Aniemore/resd_annotated
|
8 |
+
language:
|
9 |
+
- ru
|
10 |
+
base_model:
|
11 |
+
- SWivid/F5-TTS
|
12 |
+
---
|
13 |
+
## Overview
|
14 |
+
This F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability, so that it avoids choppiness, mispronunciations, repetitions, and skipped words.
|
15 |
+
Difference from the original model: phoneme alignment was used during training, whereas a duration predictor is used during inference.
|
16 |
+
|
17 |
+
## License
|
18 |
+
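This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license, which allows free non-commercial use, modification, and distribution, provided that attribution is given and derivatives are shared under the same license.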
|
19 |
+
|
20 |
+
## Model Information
|
21 |
+
**Base Model:** SWivid/F5-TTS
|
22 |
+
**Total Training Duration:** 250,000 steps
|
23 |
+
|
24 |
+
**Training Configuration:**
|
25 |
+
```json
{
  "exp_name": "F5TTS_Base",
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 4500,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 144,
  "num_warmup_updates": 5838,
  "save_per_updates": 11676,
  "last_per_steps": 2918,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "wandb",
  "bnb_optimizer": true
}
```
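Because `batch_size_type` is `"frame"`, the per-GPU batch size counts mel-spectrogram frames rather than utterances. The sketch below converts that figure into seconds of audio and estimates what share of training the warmup covers. It is a rough illustration only: the 24 kHz sample rate and hop length of 256 are assumed to match the base repository's defaults and are not stated in this README, and the total step count is taken from the Model Information section as 250 thousand.

```python
import json

# A subset of the training configuration listed above, wrapped in braces so it parses as JSON.
CONFIG_JSON = """
{
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 4500,
  "batch_size_type": "frame",
  "max_samples": 64,
  "num_warmup_updates": 5838,
  "save_per_updates": 11676
}
"""

# Assumptions (not stated in this README): mel-spectrogram settings of the base model.
SAMPLE_RATE_HZ = 24_000
HOP_LENGTH = 256

# Total step count from the Model Information section, taken as 250 thousand.
TOTAL_UPDATES = 250_000


def frames_to_seconds(frames, sample_rate_hz=SAMPLE_RATE_HZ, hop_length=HOP_LENGTH):
    """Convert a number of mel frames into seconds of audio."""
    return frames * hop_length / sample_rate_hz


config = json.loads(CONFIG_JSON)

# Seconds of audio that fit into one per-GPU batch under the assumed mel settings.
seconds_per_gpu_batch = frames_to_seconds(config["batch_size_per_gpu"])

# Share of all updates spent in learning-rate warmup.
warmup_fraction = config["num_warmup_updates"] / TOTAL_UPDATES

print(f"Audio per GPU batch: {seconds_per_gpu_batch:.1f} s")
print(f"Warmup share of training: {warmup_fraction:.2%}")
```

If the actual sample rate or hop length differs, pass the real values to `frames_to_seconds` instead of relying on the defaults.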
|
45 |
+
|
46 |
+
## Usage Instructions
|
47 |
+
See the [base repository](https://github.com/SWivid/F5-TTS) for installation and inference instructions.
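As a rough sketch of what inference with a fine-tuned checkpoint might look like, the snippet below only assembles a command line for the base repository's `f5-tts_infer-cli` tool and prints it, without running anything. The flag names are written as they are understood to exist in the base repository's inference CLI and should be verified with `f5-tts_infer-cli --help`. All file paths, the sample texts, and the `build_infer_command` helper are hypothetical placeholders, and the `--model` value is an assumption based on the `exp_name` in the training configuration.

```python
import shlex


def build_infer_command(ckpt_file, vocab_file, ref_audio, ref_text, gen_text,
                        model="F5TTS_Base"):
    """Assemble an argument list for the base repo's inference CLI (hypothetical helper)."""
    return [
        "f5-tts_infer-cli",
        "--model", model,            # assumed model/config name; depends on the CLI version
        "--ckpt_file", ckpt_file,    # fine-tuned checkpoint
        "--vocab_file", vocab_file,  # vocabulary matching the character tokenizer
        "--ref_audio", ref_audio,    # short reference recording of the target voice
        "--ref_text", ref_text,      # transcript of the reference recording
        "--gen_text", gen_text,      # text to synthesize
    ]


# All paths below are placeholders, not files shipped with this model.
command = build_infer_command(
    ckpt_file="ckpts/model_last.pt",
    vocab_file="ckpts/vocab.txt",
    ref_audio="ref.wav",
    ref_text="Текст референсной записи.",
    gen_text="Привет, мир!",
)

# Print a shell-quoted version that can be inspected before running it manually.
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps the Russian text and any spaces in paths correctly quoted if the list is later passed to `subprocess.run`.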
|
48 |
+
|
49 |
+
## To do
|
50 |
+
- Multi-speaker model
|
51 |
+
|
52 |
+
## Other links
|
53 |
+
- [GitHub repo](https://github.com/sinhprous/F5-TTS)
|