---
license: bsd
---

# Model Card for FALCO-TTS

<!-- Provide a quick summary of what the model is/does. -->

FALCO-TTS is a three-stage, SPEAR-TTS-like text-to-speech model that supports zero-shot and cross-lingual speech synthesis.

We trained the model on about 20,000 hours of English and Mandarin speech from the MLS (https://openslr.org/94/) and WenetSpeech (https://openslr.org/121/) corpora.

The model supports automatic code-switching.

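
The data flow implied by the three-stage description can be sketched as follows. This is a minimal illustration, not the FALCO-TTS implementation: every function name is hypothetical, the models are replaced by random token generators, and the third stage is assumed to be a codec-style decoder that turns acoustic tokens into audio, which this card does not confirm. The sketch reads the `semantic_to_acoustic` output vocabulary listed under Model Details as eight parallel token streams of 1,024 entries each, and it assumes one acoustic frame per semantic token and 320 samples per frame purely for illustration.

```python
import random

SEMANTIC_VOCAB = 1024    # text_to_semantic output vocab size (from Model Details)
ACOUSTIC_STREAMS = 8     # assumed reading of "8 x 1,024": eight token streams
ACOUSTIC_VOCAB = 1024    # entries per acoustic stream


def text_to_semantic(text, rng):
    """Hypothetical stand-in for stage 1: text -> semantic token ids."""
    # Placeholder length rule; a real causal model decides the length itself.
    return [rng.randrange(SEMANTIC_VOCAB) for _ in range(2 * len(text))]


def semantic_to_acoustic(semantic, prompt_acoustic, rng):
    """Hypothetical stand-in for stage 2: semantic ids -> acoustic streams.

    prompt_acoustic is ignored here. It marks where a zero-shot speaker
    prompt would plausibly enter (an assumption, not stated in the card).
    """
    n_frames = len(semantic)  # assumed: one acoustic frame per semantic token
    return [
        [rng.randrange(ACOUSTIC_VOCAB) for _ in range(n_frames)]
        for _ in range(ACOUSTIC_STREAMS)
    ]


def acoustic_to_waveform(acoustic, samples_per_frame=320):
    """Hypothetical stand-in for stage 3: acoustic tokens -> audio samples."""
    # Returns silence of the right length; a real decoder would emit speech.
    return [0.0] * (len(acoustic[0]) * samples_per_frame)


def synthesize(text, prompt_acoustic=None, seed=0):
    """Chain the three placeholder stages and return every intermediate."""
    rng = random.Random(seed)
    semantic = text_to_semantic(text, rng)
    acoustic = semantic_to_acoustic(semantic, prompt_acoustic, rng)
    waveform = acoustic_to_waveform(acoustic)
    return semantic, acoustic, waveform
```

Calling `synthesize("hello")` returns a flat list of semantic ids, a list of eight equally long acoustic streams, and a list of audio samples. Only these shapes and value ranges are meaningful; the token values themselves are random.
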
## Model Details

| Model | Parameters | Attention | Output vocab size |
|:--- |:--- |:--- |:--- |
| text_to_semantic | 240 M | Causal | 1,024 |
| semantic_to_acoustic | 370 M | Causal | 8 × 1,024 |
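
As a rough sizing aid, the parameter counts in the table above can be converted into weight memory as sketched below. The counts are the rounded figures from the table, and the third stage is not listed there, so it is left out. The choice of float32 and float16 is an assumption, since this card does not state the checkpoint precision. Activations and inference caches are not counted.

```python
# Rounded parameter counts taken from the Model Details table.
PARAMS = {
    "text_to_semantic": 240_000_000,
    "semantic_to_acoustic": 370_000_000,
}

# Bytes per parameter for two common precisions (assumed, not from the card).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params, dtype):
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


total_params = sum(PARAMS.values())

if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(total_params, dtype):.2f} GB")
```

Under these assumptions the two listed stages total 610 M parameters, which is about 2.44 GB of weights in float32 and 1.22 GB in float16.
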