Model Card for FALCO-TTS

FALCO-TTS is a three-stage, SPEAR-TTS-like model that supports zero-shot and cross-language speech synthesis.

We trained this model on the MLS (https://openslr.org/94/) and WenetSpeech (https://openslr.org/121/) corpora, using about 20,000 hours of English and Mandarin speech data.

This model has automatic code-switching capability.

Model Details

Model                 Parameters  Attention  Output vocab size
text_to_semantic      240 M       Causal     1,024
semantic_to_acoustic  370 M       Causal     8 x 1,024
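
To make the data flow between the stages concrete, here is a minimal shape-only sketch. It is not the FALCO-TTS code: every function below is a hypothetical stand-in that emits random tokens instead of running a trained model. It assumes that "8 x 1,024" means 8 codebooks of 1,024 entries per acoustic frame, and that the third stage (not listed in the table) is a codec-style decoder from acoustic tokens to audio. The one-token-per-character rate and the 320 samples per frame are arbitrary placeholders.

```python
import random

SEMANTIC_VOCAB = 1024     # from the table: text_to_semantic output vocab
ACOUSTIC_CODEBOOKS = 8    # assumed reading of "8 x 1,024"
ACOUSTIC_VOCAB = 1024     # entries per codebook (same assumption)


def text_to_semantic(text, rng):
    """Hypothetical stage 1: emit one random semantic token per character."""
    return [rng.randrange(SEMANTIC_VOCAB) for _ in text]


def semantic_to_acoustic(semantic_tokens, rng):
    """Hypothetical stage 2: emit 8 codebook indices per semantic token."""
    return [
        [rng.randrange(ACOUSTIC_VOCAB) for _ in range(ACOUSTIC_CODEBOOKS)]
        for _ in semantic_tokens
    ]


def acoustic_to_waveform(acoustic_frames, samples_per_frame=320):
    """Hypothetical stage 3 (assumed codec decoder): return silence of matching length."""
    return [0.0] * (len(acoustic_frames) * samples_per_frame)


def synthesize(text, seed=0):
    """Chain the three placeholder stages and return every intermediate result."""
    rng = random.Random(seed)
    semantic = text_to_semantic(text, rng)
    acoustic = semantic_to_acoustic(semantic, rng)
    waveform = acoustic_to_waveform(acoustic)
    return semantic, acoustic, waveform
```

Calling `synthesize("hello")` returns five semantic tokens, five acoustic frames of eight indices each, and a placeholder waveform of 5 x 320 samples. Only the token ranges and the 8-way acoustic layout reflect the table. Everything else is illustrative.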