---
license: bsd
---

# Model Card for FALCO-TTS

FALCO-TTS is a three-stage, SPEAR-TTS-like model that supports zero-shot and cross-lingual speech synthesis.

We trained this model on the MLS (https://openslr.org/94/) and WenetSpeech (https://openslr.org/121/) corpora, using about 20,000 hours of data covering English and Mandarin.

This model supports automatic code-switching.

## Model Details

|Model |Parameters |Attention |Output Vocab size
|:--- |:---- |:--- |:---
|text_to_semantic |240 M |Causal |1,024
|semantic_to_coarse_acoustic |370 M |Causal |1 × 1,024
|coarse_acoustic_to_fine_acoustic |370 M |Non-causal |7 × 1,024
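
The stage layout above can be sketched as a small Python data structure. This is only an illustration of the token shapes implied by the table, not the project's actual API: the `Stage` class, `fake_stage`, and `synthesize_tokens` are hypothetical names, the stages are stand-ins that emit random token ids, and the sketch assumes that every stage produces the same number of frames and that the coarse and fine codebooks are stacked into eight streams for a codec decoder.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical description of one stage, taken from the Model Details table."""
    name: str
    params_millions: int
    causal: bool
    codebooks: int   # number of output token streams per frame
    vocab_size: int  # entries per codebook


STAGES = [
    Stage("text_to_semantic", 240, True, 1, 1024),
    Stage("semantic_to_coarse_acoustic", 370, True, 1, 1024),
    Stage("coarse_acoustic_to_fine_acoustic", 370, False, 7, 1024),
]

# Sum of the per-stage parameter counts listed in the table (in millions).
total_params_millions = sum(stage.params_millions for stage in STAGES)


def fake_stage(stage, num_frames, rng):
    """Stand-in for a real model: random token ids with the stage's output shape."""
    return [
        [rng.randrange(stage.vocab_size) for _ in range(num_frames)]
        for _ in range(stage.codebooks)
    ]


def synthesize_tokens(num_frames, seed=0):
    """Run the three placeholder stages in order and stack the acoustic codebooks.

    Assumption: all stages share one frame count, which a real system
    may not do (semantic and acoustic tokens can have different rates).
    """
    rng = random.Random(seed)
    semantic = fake_stage(STAGES[0], num_frames, rng)  # 1 stream (would condition stage 2)
    coarse = fake_stage(STAGES[1], num_frames, rng)    # 1 stream (would condition stage 3)
    fine = fake_stage(STAGES[2], num_frames, rng)      # 7 streams
    assert len(semantic) == 1
    return coarse + fine  # 8 streams, assumed input to a codec decoder


if __name__ == "__main__":
    tokens = synthesize_tokens(num_frames=5)
    print(total_params_millions, len(tokens), len(tokens[0]))
```

Under these assumptions, the first two stages generate tokens autoregressively (causal attention), while the last stage fills in the remaining seven codebooks with non-causal attention over the whole sequence.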