---
license: openrail
title: Real-Time Korean Voice Cloning
sdk: gradio
emoji: π
colorFrom: yellow
colorTo: red
app_file: app.py
sdk_version: 3.17.1
pinned: false
---
**Temporarily suspended**
# Configuration

`title`: _string_
Display title for the Space

`emoji`: _string_
Space emoji (emoji-only character allowed)

`colorFrom`: _string_
Color for the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

`colorTo`: _string_
Color for the thumbnail gradient (red, yellow, green, blue, indigo, purple, pink, gray)

`sdk`: _string_
Can be either `gradio` or `streamlit`

`sdk_version`: _string_
Version of the selected SDK; applies to both `gradio` and `streamlit` (this Space pins `gradio` 3.17.1).
See [doc](https://hf.co/docs/hub/spaces) for more info on supported versions.

`app_file`: _string_
Path to your main application file (which contains either `gradio` or `streamlit` Python code).
Path is relative to the root of the repository.

`pinned`: _boolean_
Whether the Space stays on top of your list.
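The fields above live in the YAML block at the top of the README. As a rough sketch (not the Hub's own validator), the Python below parses a flat `key: value` front matter block and checks it only against the values listed in this section. `parse_front_matter` and `validate` are hypothetical helpers, and real YAML features such as quoting, lists, and comments are not handled.

```python
from typing import Dict, List

# Values listed in the configuration reference above. The Hub itself may
# accept more SDKs than these; this check only mirrors this README.
ALLOWED_COLORS = {"red", "yellow", "green", "blue", "indigo", "purple", "pink", "gray"}
ALLOWED_SDKS = {"gradio", "streamlit"}


def parse_front_matter(readme: str) -> Dict[str, str]:
    """Parse a flat `key: value` block delimited by '---' lines.

    Minimal sketch: quoting, lists, nesting and comments are not handled.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a '---' front matter block")
    config: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return config
        if not line.strip():
            continue  # skip blank lines inside the block
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        config[key.strip()] = value.strip()
    raise ValueError("front matter block is not closed with '---'")


def validate(config: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems: List[str] = []
    for key in ("colorFrom", "colorTo"):
        if key in config and config[key] not in ALLOWED_COLORS:
            problems.append(f"{key}={config[key]!r} is not an allowed color")
    if config.get("sdk") not in ALLOWED_SDKS:
        problems.append(f"sdk={config.get('sdk')!r} is not one of {sorted(ALLOWED_SDKS)}")
    if "pinned" in config and config["pinned"] not in ("true", "false"):
        problems.append("pinned must be true or false")
    return problems


if __name__ == "__main__":
    readme = "\n".join([
        "---",
        "title: Real-Time Korean Voice Cloning",
        "sdk: gradio",
        "colorFrom: yellow",
        "colorTo: red",
        "app_file: app.py",
        "sdk_version: 3.17.1",
        "pinned: false",
        "---",
        "# Real-Time Korean Voice Cloning",
    ])
    print(validate(parse_front_matter(readme)))  # []
```

Keeping the parser and the checks separate means the validation rules can be reused if the front matter is later read with a full YAML library instead.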
# Real-Time Korean Voice Cloning
This repository is a Korean version of SV2TTS. The original model, developed by CorentinJ (https://github.com/CorentinJ/Real-Time-Voice-Cloning), targets English.
To adapt the model to Korean speech, I referred to tail95 (https://github.com/tail95/Voice-Cloning).
I modified parts of the code to make audio and text preprocessing and training more convenient. I also converted the TensorFlow model to PyTorch and fixed some errors.
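The README does not spell out how Korean text is represented for the synthesizer. A common approach in Korean TTS, assumed here rather than taken from this repository, is to split each precomposed Hangul syllable into its initial, medial, and final jamo so the model works with a small symbol set. The sketch uses the standard Unicode Hangul decomposition arithmetic; `decompose_hangul` is a hypothetical helper name.

```python
import unicodedata

# Standard Unicode arithmetic for precomposed Hangul syllables (U+AC00..U+D7A3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per initial consonant
S_COUNT = L_COUNT * N_COUNT  # 11,172 precomposed syllables


def decompose_hangul(text: str) -> str:
    """Split each Hangul syllable into initial, medial and (optional) final jamo.

    Characters outside the syllable block (spaces, digits, Latin) pass through.
    """
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))              # initial consonant
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))  # vowel
            t_index = s_index % T_COUNT
            if t_index:                                               # final consonant, if any
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "안녕하세요"
    jamo = decompose_hangul(sample)
    print(len(sample), len(jamo))  # 5 12
    # For Hangul syllables this matches Unicode canonical decomposition (NFD).
    assert jamo == unicodedata.normalize("NFD", sample)
```

Decomposing to jamo keeps the input alphabet to a few dozen symbols instead of 11,172 syllables, at the cost of longer input sequences.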
## References
- https://github.com/CorentinJ/Real-Time-Voice-Cloning
- https://github.com/tail95/Voice-Cloning
- https://medium.com/analytics-vidhya/the-intuition-behind-voice-cloning-with-5-seconds-of-audio-5989e9b2e042
- Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis (https://arxiv.org/abs/1806.04558)
## Dataset
- KsponSpeech (https://aihub.or.kr/aidata/105)
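KsponSpeech audio is, as far as I know, distributed as headerless `.pcm` files. The sketch below assumes 16 kHz, 16-bit, mono little-endian samples (verify this against the dataset documentation) and wraps the raw bytes in a WAV header with Python's standard `wave` module so common audio tools can read them. `pcm_to_wav` is a hypothetical helper, not a function from this repository.

```python
import tempfile
import wave
from pathlib import Path

# Assumed audio format (verify against the dataset documentation):
# headerless 16 kHz, 16-bit signed little-endian, mono PCM.
SAMPLE_RATE = 16000
SAMPLE_WIDTH = 2  # bytes per sample
CHANNELS = 1


def pcm_to_wav(pcm_path: Path, wav_path: Path) -> float:
    """Wrap raw PCM bytes in a WAV header and return the duration in seconds."""
    data = Path(pcm_path).read_bytes()
    frame_size = SAMPLE_WIDTH * CHANNELS
    data = data[: len(data) - len(data) % frame_size]  # drop a trailing partial frame
    with wave.open(str(wav_path), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(data)
    return len(data) / (frame_size * SAMPLE_RATE)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pcm = Path(tmp) / "sample.pcm"
        pcm.write_bytes(b"\x00\x00" * SAMPLE_RATE)  # one second of silence
        print(pcm_to_wav(pcm, Path(tmp) / "sample.wav"))  # 1.0
```

If the real files use a different rate or sample width, only the three constants need to change.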
Make sure that your dataset consists of text-audio pairs (each audio clip with its transcript).
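A quick way to check the requirement above before preprocessing is to pair every audio file with a transcript of the same name. The sketch assumes transcripts are `.txt` files that sit next to the audio and share its stem, which is a layout assumption to adjust for your data. `find_pairs` and `AUDIO_SUFFIXES` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple

# Assumption: audio extensions used by your copy of the dataset.
AUDIO_SUFFIXES = {".pcm", ".wav"}


def find_pairs(root: Path, text_suffix: str = ".txt") -> Tuple[List[Tuple[Path, Path]], List[Path]]:
    """Pair each audio file under `root` with the transcript that shares its stem.

    Returns (pairs, unpaired_audio). Transcripts are expected in the same
    directory as the audio file (a layout assumption).
    """
    pairs: List[Tuple[Path, Path]] = []
    unpaired: List[Path] = []
    audio_files = sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
    for audio in audio_files:
        transcript = audio.with_suffix(text_suffix)
        if transcript.is_file():
            pairs.append((audio, transcript))
        else:
            unpaired.append(audio)
    return pairs, unpaired


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "utt1.pcm").write_bytes(b"")
        (root / "utt1.txt").write_text("안녕하세요", encoding="utf-8")
        (root / "utt2.pcm").write_bytes(b"")  # transcript missing on purpose
        pairs, unpaired = find_pairs(root)
        print(len(pairs), [p.name for p in unpaired])  # 1 ['utt2.pcm']
```

Listing the unpaired files, rather than silently skipping them, makes it easy to see how much of the dataset would be dropped.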