---
license: cc-by-4.0
language:
- en
- de
- es
- it
- nl
- pt
- pl
- ro
- sv
- da
- fi
- hu
- el
- fr
- ru
- uk
- tr
- ar
- hi
- jp
- ko
- zh
- vi
- la
- ha
- sw
- yo
- wo
thumbnail: https://raw.githubusercontent.com/DanRuta/xVA-Synth/master/assets/x-icon.png
library: xvasynth
tags:
- emotion
- audio
- text-to-speech
- tts
pipeline_tag: text-to-speech
datasets:
- MikhailT/hifi-tts
base_model: Pendrokar/xvapitch
---

xVASynth xVAPitch (v3) voice models based on the NVIDIA HIFI NeMo datasets.

Models created by Dan Ruta. Origin link:
- https://www.nexusmods.com/skyrimspecialedition/mods/65022?tab=files

Presumed origin of the dataset:
- https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/core/core.html

| Name | Synthesis Sample |
|---|---|
| ccby_nvidia_hifi_6671_M | |
| ccby_nvidia_hifi_92_F | |
| ccby_nvidia_hifi_6097_M | |
| ccby_nv_hifi_11614_F | |
| ccby_nvidia_hifi_11697_F | |
| ccby_nvidia_hifi_12787_F | |
| ccby_nvidia_hifi_6670_M | |
| ccby_nvidia_hifi_8051_F | |
| ccby_nvidia_hifi_9017_M | |
| ccby_nvidia_hifi_9136_F | |

(These audio samples were created with the xVASynth Editor using the SR option (44kHz), not with xVATrainer, whose automatically created samples often sound different.)

Legal note: Although these datasets are licensed under CC BY 4.0, the base v3 model that these models are fine-tuned from was pre-trained on non-permissive data.

v3 base model: https://huggingface.co/Pendrokar/xvapitch
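
The voice names listed above appear to follow a `<license>_<source>_<speaker id>_<gender>` pattern, for example `ccby_nvidia_hifi_6671_M`. This reading is inferred from the names alone and is not documented by the model author. Under that assumption, a minimal Python sketch for splitting a name into its parts could look like this (`VoiceInfo` and `parse_voice_name` are hypothetical helpers, not part of xVASynth):

```python
import re
from typing import NamedTuple


class VoiceInfo(NamedTuple):
    """Parts of a voice name; field meanings are an assumption, not documented."""
    license_tag: str  # e.g. "ccby"
    source: str       # e.g. "nvidia_hifi" or "nv_hifi"
    speaker_id: int   # e.g. 6671
    gender: str       # "M" or "F"


# Assumed layout: <license>_<source>_<speaker id>_<M|F>
_NAME_PATTERN = re.compile(
    r"^(?P<license>[a-z]+)_(?P<source>.+)_(?P<speaker>\d+)_(?P<gender>[MF])$"
)


def parse_voice_name(name: str) -> VoiceInfo:
    """Split a voice name such as 'ccby_nvidia_hifi_6671_M' into its parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized voice name: {name!r}")
    return VoiceInfo(
        license_tag=match["license"],
        source=match["source"],
        speaker_id=int(match["speaker"]),
        gender=match["gender"],
    )


if __name__ == "__main__":
    for voice in ("ccby_nvidia_hifi_6671_M", "ccby_nv_hifi_11614_F"):
        print(parse_voice_name(voice))
```

This can be handy for grouping the voices by speaker or gender in a script. Names that do not match the assumed pattern raise a `ValueError` rather than being guessed at.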