
This is a pruned and reorganized version of SWivid/F5-TTS, intended for use with the fairytaler Python library, an unofficial reimplementation of F5-TTS built for fast, lightweight inference.

Installation

Fairytaler assumes you have a working CUDA environment to install into.

pip install fairytaler

This will install the reimplementation library.

How to Use

You do not need to pre-download anything; the necessary data will be downloaded at runtime.

Command Line

Use the fairytaler binary from the command line like so:

fairytaler examples/reference.wav examples/reference.txt "Fairytaler is an unofficial minimal re-implementation of F5 TTS."

Reference audio sourced from DiPCo

Many options are available; for complete documentation, run fairytaler --help.
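
If you would rather drive the command-line tool from a script, the invocation above can be assembled as an argument list and handed to subprocess. This is a minimal sketch and not part of fairytaler: build_command and run_fairytaler are hypothetical helper names, and only the three positional arguments shown above are used, since no other flags are documented here.

```python
import subprocess
from typing import List


def build_command(reference_audio: str, reference_text: str, text: str) -> List[str]:
    """Assemble the argument list for the documented CLI call.

    Hypothetical helper: it mirrors only the three positional arguments
    shown in the example above and adds no flags.
    """
    if not text.strip():
        raise ValueError("text to synthesize must not be empty")
    # Passing a list (not a shell string) means the text needs no shell quoting.
    return ["fairytaler", reference_audio, reference_text, text]


def run_fairytaler(reference_audio: str, reference_text: str, text: str) -> None:
    """Run the CLI; assumes the fairytaler binary is on PATH."""
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_command(reference_audio, reference_text, text), check=True)
```

Calling run_fairytaler("examples/reference.wav", "examples/reference.txt", "Hello there.") would then be equivalent to typing the same three arguments at the shell.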

Python

from fairytaler import F5TTSPipeline

pipeline = F5TTSPipeline.from_pretrained("benjamin-paine/fairytaler", device="auto")
output_wav_file = pipeline(
  text="Hello, this is some test audio!",
  reference_audio="examples/reference.wav",
  reference_text="examples/reference.txt",
  output_save=True
)
print(f"Output saved to {output_wav_file}")

The full execution signature is:

def __call__(
    self,
    text: Union[str, List[str]],
    reference_audio: AudioType,
    reference_text: str,
    reference_sample_rate: Optional[int]=None,
    seed: SeedType=None,
    speed: float=1.0,
    sway_sampling_coef: float=-1.0,
    target_rms: float=0.1,
    cross_fade_duration: float=0.15,
    punctuation_pause_duration: float=0.10,
    num_steps: int=32,
    cfg_strength: float=2.0,
    fix_duration: Optional[float]=None,
    use_tqdm: bool=False,
    output_format: AUDIO_OUTPUT_FORMAT_LITERAL="wav",
    output_save: bool=False,
    chunk_callback: Optional[Callable[[AudioResultType], None]]=None,
    chunk_callback_format: AUDIO_OUTPUT_FORMAT_LITERAL="float",
) -> AudioResultType

Valid values for output_format and chunk_callback_format are wav, ogg, flac, mp3, float and int. Passing output_save=True saves the result to a file; otherwise, the data is returned directly.
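
The interaction between output_format and output_save can be sketched as a small wrapper that checks the format name before calling the pipeline. The helper names build_call_kwargs and synthesize are hypothetical and not part of fairytaler; the only pipeline arguments used are ones listed in the signature above. The shape of the returned data for each format is not documented here, so the sketch passes the result through untouched.

```python
from typing import Any, Callable, Dict

# Format names as listed in the text above.
VALID_FORMATS = ("wav", "ogg", "flac", "mp3", "float", "int")


def build_call_kwargs(
    text: str,
    reference_audio: str,
    reference_text: str,
    output_format: str = "wav",
    output_save: bool = False,
) -> Dict[str, Any]:
    """Validate the format name and collect keyword arguments for the call.

    Hypothetical helper; the defaults mirror the signature shown above.
    """
    if output_format not in VALID_FORMATS:
        raise ValueError(
            f"unknown output_format {output_format!r}; expected one of {VALID_FORMATS}"
        )
    return {
        "text": text,
        "reference_audio": reference_audio,
        "reference_text": reference_text,
        "output_format": output_format,
        "output_save": output_save,
    }


def synthesize(pipeline: Callable[..., Any], **options: Any) -> Any:
    """Call a pipeline-like object with validated keyword arguments.

    The result is returned as-is: per the text above, the output is saved
    to a file when output_save=True and returned as data otherwise.
    """
    return pipeline(**build_call_kwargs(**options))
```

With a pipeline loaded as in the Python example, synthesize(pipeline, text="Hello!", reference_audio="examples/reference.wav", reference_text="examples/reference.txt", output_format="float") would request the audio data directly instead of writing a file.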

Citations

@misc{chen2024f5ttsfairytalerfakesfluent,
      title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching}, 
      author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
      year={2024},
      eprint={2410.06885},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2410.06885}, 
}

@misc{vansegbroeck2019dipcodinnerparty,
      title={DiPCo -- Dinner Party Corpus}, 
      author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
      year={2019},
      eprint={1909.13447},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/1909.13447}, 
}