sinhprous/F5TTS-stabilized-LJSpeech


sinhprous committed 11 days ago · Commit cbef2ba (verified) · Parent: 59eed48

Update README.md

Files changed (1)

README.md +53 -3

@@ -1,3 +1,53 @@
----
-license: cc-by-nc-4.0
----

+---
+license: cc-by-nc-sa-4.0
+datasets:
+- mozilla-foundation/common_voice_17_0
+- bond005/sberdevices_golos_10h_crowd
+- bond005/sova_rudevices
+- Aniemore/resd_annotated
+language:
+- ru
+base_model:
+- SWivid/F5-TTS
+---
+## Overview
+This F5-TTS model is fine-tuned on the LJSpeech dataset with an emphasis on stability: it is intended to avoid choppy audio, mispronunciations, repeated words, and skipped words.
+Difference from the original model: phoneme alignments were used during training, while a duration predictor is used at inference time.
+## License
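+This model is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0), which permits non-commercial use, modification, and redistribution, provided that appropriate credit is given and any adaptations are shared under the same license.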
+## Model Information
+- **Base Model:** SWivid/F5-TTS
+- **Total Training Duration:** 250,000 steps
+
+**Training Configuration:**
+```json
+{
+  "exp_name": "F5TTS_Base",
+  "learning_rate": 1e-05,
+  "batch_size_per_gpu": 4500,
+  "batch_size_type": "frame",
+  "max_samples": 64,
+  "grad_accumulation_steps": 1,
+  "max_grad_norm": 1,
+  "epochs": 144,
+  "num_warmup_updates": 5838,
+  "save_per_updates": 11676,
+  "last_per_steps": 2918,
+  "finetune": true,
+  "file_checkpoint_train": "",
+  "tokenizer_type": "char",
+  "tokenizer_file": "",
+  "mixed_precision": "fp16",
+  "logger": "wandb",
+  "bnb_optimizer": true
+}
+```
+## Usage Instructions
+See the [base repo](https://github.com/SWivid/F5-TTS) for installation and inference instructions.
+## To do
+- Multi-speaker model
+## Other Links
+- [GitHub repo](https://github.com/sinhprous/F5-TTS)
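The Overview notes that this fine-tune uses a duration predictor at inference instead of the phoneme alignment used in training. As a rough illustration of what that changes, the sketch below contrasts two ways of choosing how many mel frames to generate: a text-length ratio estimate, which we understand to be the shape of the base F5-TTS heuristic (reference frames scaled by the ratio of UTF-8 byte lengths of the two texts), and a sum of per-phoneme durations. The 24 kHz sample rate and 256-sample hop length are assumed from the default F5-TTS mel settings, and `phoneme_durations` is a hypothetical stand-in for a predictor's output; this is not the code used in this repository.

```python
# Sketch only: two ways of choosing the output length for F5-TTS-style generation.
# Assumption: default F5-TTS mel settings (24 kHz audio, hop length of 256 samples).
SAMPLE_RATE = 24_000
HOP_LENGTH = 256


def frames_to_seconds(frames: int) -> float:
    """Convert a mel-frame count to seconds of audio."""
    return frames * HOP_LENGTH / SAMPLE_RATE


def ratio_heuristic_frames(
    ref_audio_frames: int, ref_text: str, gen_text: str, speed: float = 1.0
) -> int:
    """Reference frames plus an estimate scaled by the UTF-8 byte-length ratio of the texts."""
    ref_len = len(ref_text.encode("utf-8"))
    gen_len = len(gen_text.encode("utf-8"))
    return ref_audio_frames + int(ref_audio_frames / ref_len * gen_len / speed)


def predictor_frames(ref_audio_frames: int, phoneme_durations: list[float]) -> int:
    """Reference frames plus the summed per-phoneme durations (in frames).

    `phoneme_durations` is a hypothetical duration-predictor output for the text to generate.
    """
    return ref_audio_frames + round(sum(phoneme_durations))


if __name__ == "__main__":
    ref_frames = 375  # 4 s of reference audio under the assumed settings
    print(ratio_heuristic_frames(ref_frames, "Hello there.", "Good morning to you."))  # 1000
    print(predictor_frames(ref_frames, [10.0] * 40))  # 775
    print(frames_to_seconds(ref_frames))  # 4.0
```

The ratio estimate assumes the new text is spoken at the same rate per byte as the reference, whereas a predictor can give individual phonemes longer or shorter durations; the sketch only shows the arithmetic, not which approach yields fewer skipped or repeated words.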
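The training configuration sets `batch_size_type` to `"frame"`, so `batch_size_per_gpu` acts as a budget of mel frames per batch rather than a count of utterances, with `max_samples` capping how many utterances a batch may hold. The sketch below packs utterances greedily in length order under both limits. We believe F5-TTS's `DynamicBatchSampler` works along similar lines, but this is an illustrative reimplementation, not that class. Under the assumed 24 kHz / 256-hop mel settings, a 4500-frame budget corresponds to 48 seconds of audio per GPU batch.

```python
# Illustrative frame-budget batching; not the F5-TTS implementation.
# Assumption: default F5-TTS mel settings (24 kHz audio, hop length of 256 samples).
SAMPLE_RATE = 24_000
HOP_LENGTH = 256

FRAMES_PER_BATCH = 4500  # "batch_size_per_gpu" when "batch_size_type" is "frame"
MAX_SAMPLES = 64  # "max_samples"
BUDGET_SECONDS = FRAMES_PER_BATCH * HOP_LENGTH / SAMPLE_RATE  # 48.0 s of audio


def frame_batches(
    lengths: list[int],
    frames_threshold: int = FRAMES_PER_BATCH,
    max_samples: int = MAX_SAMPLES,
) -> list[list[int]]:
    """Group sample indices, shortest first, so that each batch's summed frame
    count stays within frames_threshold and its size within max_samples."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: list[list[int]] = []
    current: list[int] = []
    current_frames = 0
    for idx in order:
        n = lengths[idx]
        if n > frames_threshold:
            continue  # a single utterance longer than the budget can never fit; drop it
        if current and (current_frames + n > frames_threshold or len(current) >= max_samples):
            batches.append(current)  # close the batch that would otherwise overflow
            current, current_frames = [], 0
        current.append(idx)
        current_frames += n
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    print(BUDGET_SECONDS)  # 48.0
    print(frame_batches([1000, 2000, 3000, 4000, 5000]))  # [[0, 1], [2], [3]]
```

Sorting by length keeps similarly sized utterances together, which limits padding within a batch; note that this budget counts summed frames, not padded frames.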