---
datasets:
- facebook/multilingual_librispeech
language:
- it
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
license: cc-by-4.0
library_name: f5-tts
---

This is an Italian fine-tune of F5-TTS.

> # UPDATE:
> # A better version with improved prosody is available here: https://huggingface.co/alien79/F5-TTS-italian

The model is Italian-only, so it cannot speak English properly.

Trained on 247+ hours of the "train" split of the facebook/multilingual_librispeech dataset, at 6717 steps per epoch. Known limitations:
- catastrophic forgetting (the model forgot English)
- Italian pronunciation is not perfect (there are many checkpoints you can experiment with and train further, possibly on different datasets)

# Current most trained model
italian_59kh/model_464400.safetensors (~70 epochs)
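
The epoch figure can be sanity-checked from the step count in the filename and the 6717 steps per epoch stated for this training run. The helper below is only an illustrative sketch; `approx_epochs` is a hypothetical name, not part of F5-TTS.

```python
def approx_epochs(step: int, steps_per_epoch: int = 6717) -> float:
    """Convert a training step count into an approximate number of epochs.

    The default of 6717 steps per epoch is the figure quoted in this model card.
    """
    return step / steps_per_epoch


# Step count taken from the filename model_464400.safetensors
print(f"{approx_epochs(464400):.1f}")  # prints 69.1
```

The same division works for any other weight file in the repository: read the step number from the filename and divide by 6717.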

## Folder structure
```
| - italian_59kh
|   | - checkpoints
```

### italian_59kh
Contains the model weights saved at specific steps; the higher the number, the further into training.
Weights in this folder can't be used to resume training; use the checkpoints instead.
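
Because a higher step number means a later point in training, the most trained file can be picked by parsing the step out of each filename. This is a minimal sketch that assumes every file follows the `model_<step>.safetensors` pattern seen in `model_464400.safetensors`; the other filenames in the example are made up for illustration, and `step_of` and `latest_weights` are hypothetical helper names.

```python
import re


def step_of(filename: str) -> int:
    """Extract the training step from a name like 'model_464400.safetensors'."""
    match = re.search(r"model_(\d+)\.safetensors$", filename)
    if match is None:
        raise ValueError(f"unexpected weight filename: {filename!r}")
    return int(match.group(1))


def latest_weights(filenames: list[str]) -> str:
    """Return the filename with the highest step number."""
    return max(filenames, key=step_of)


# Only model_464400.safetensors is confirmed by this card;
# the two other names are hypothetical examples.
files = [
    "italian_59kh/model_100000.safetensors",
    "italian_59kh/model_464400.safetensors",
    "italian_59kh/model_20000.safetensors",
]
print(latest_weights(files))
```

Comparing the parsed integers rather than the raw strings matters here, since a plain string sort would rank `model_20000` above `model_100000`.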

### italian_59kh/checkpoints
Contains the training checkpoints saved at specific steps; the higher the number, the further into training.
Weights in this folder can be used as a starting point to continue training.



The run.py file is an example of how to extract the wav files and produce the metadata.csv used for training.
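
For orientation, here is a rough sketch of what producing a metadata.csv can look like. It is not the content of run.py. It assumes a pipe-delimited file with an `audio_file|text` header, which should be checked against the F5-TTS data preparation script you use. The function name `format_metadata`, the `wavs/` paths, and the sample sentences are hypothetical.

```python
def format_metadata(rows: list[tuple[str, str]]) -> str:
    """Build the text of a pipe-delimited metadata file from (audio_file, text) pairs.

    Assumed layout (not confirmed by this model card):
        audio_file|text
        wavs/0001.wav|Transcript of the first clip
    """
    lines = ["audio_file|text"]
    for audio_file, text in rows:
        # Collapse newlines and repeated spaces so each clip stays on one line.
        text = " ".join(text.split())
        if "|" in audio_file or "|" in text:
            raise ValueError(f"field contains the '|' delimiter: {audio_file!r}")
        lines.append(f"{audio_file}|{text}")
    return "\n".join(lines) + "\n"


# Hypothetical sample rows; real rows would come from the dataset's transcripts.
rows = [
    ("wavs/0001.wav", "Buongiorno a tutti."),
    ("wavs/0002.wav", "Questa è  una prova\ndi sintesi vocale."),
]
print(format_metadata(rows), end="")
# To save it: pathlib.Path("metadata.csv").write_text(format_metadata(rows), encoding="utf-8")
```

Writing the lines by hand rather than through a CSV library keeps quoting out of the picture, at the cost of rejecting any transcript that contains the delimiter.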