Model Card for FALCO-TTS
FALCO-TTS is a three-stage, SPEAR-TTS-like model that supports zero-shot and cross-lingual speech synthesis.
We trained the model on the MLS (https://openslr.org/94/) and WenetSpeech (https://openslr.org/121/) corpora, using about 20,000 hours of data covering English and Mandarin.
The model also supports automatic code-switching.
Model Details
| Model | Parameters | Attention | Output vocab size |
|---|---|---|---|
| text_to_semantic | 240 M | Causal | 1,024 |
| semantic_to_acoustic | 370 M | Causal | 8 x 1,024 |
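
For quick reference, the table above can be written down as a small Python structure. This is only an illustrative sketch: `StageSpec`, its field names, and `STAGES` are hypothetical and not part of any FALCO-TTS code, and reading "8 x 1,024" as eight codebooks of 1,024 entries each is an assumption rather than something the card states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """One row of the model table (hypothetical helper, not a FALCO-TTS API)."""
    name: str
    params_millions: int
    attention: str
    codebooks: int       # assumption: "8 x 1,024" means 8 codebooks
    codebook_size: int   # entries per codebook

    @property
    def total_output_entries(self) -> int:
        # Total number of output vocabulary entries across all codebooks.
        return self.codebooks * self.codebook_size


# Values copied from the table; only the two listed stages are included.
STAGES = [
    StageSpec("text_to_semantic", 240, "causal", codebooks=1, codebook_size=1024),
    StageSpec("semantic_to_acoustic", 370, "causal", codebooks=8, codebook_size=1024),
]

# Combined size of the two listed models, in millions of parameters.
total_params_millions = sum(stage.params_millions for stage in STAGES)

if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name}: {stage.params_millions} M params, "
              f"{stage.total_output_entries} output entries")
    print(f"total: {total_params_millions} M params")
```

Under these assumptions, the two listed models add up to 610 M parameters.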