<div align="center">
<h1>GPT-SoVITS-WebUI</h1>
A powerful WebUI for voice conversion and speech synthesis from small amounts of data.<br><br>
[![madewithlove](https://img.shields.io/badge/made_with-%E2%9D%A4-red?style=for-the-badge&labelColor=orange)](https://github.com/RVC-Boss/GPT-SoVITS)
<img src="https://counter.seku.su/cmoe?name=gptsovits&theme=r34" /><br>
[![Open In Colab](https://img.shields.io/badge/Colab-F9AB00?style=for-the-badge&logo=googlecolab&color=525252)](https://colab.research.google.com/github/RVC-Boss/GPT-SoVITS/blob/main/colab_webui.ipynb)
[![License](https://img.shields.io/badge/LICENSE-MIT-green.svg?style=for-the-badge)](https://github.com/RVC-Boss/GPT-SoVITS/blob/main/LICENSE)
[![Huggingface](https://img.shields.io/badge/🤗%20-Models%20Repo-yellow.svg?style=for-the-badge)](https://huggingface.co/lj1995/GPT-SoVITS/tree/main)
[![Discord](https://img.shields.io/discord/1198701940511617164?color=%23738ADB&label=Discord&style=for-the-badge)](https://discord.gg/dnrgs5GHfG)
[**English**](../../README.md) | [**中文简体**](../cn/README.md) | [**日本語**](../ja/README.md) | **한국어** | [**Türkçe**](../tr/README.md)
</div>
---
## Features:
1. **Zero-shot text-to-speech (TTS):** Provide a 5-second voice sample and convert text to speech immediately.
2. **Few-shot TTS:** Fine-tune the model with just 1 minute of training data to improve voice similarity and realism.
3. **Cross-lingual support:** Inference in languages different from the training dataset is supported. English, Japanese, Chinese, Cantonese, and Korean are currently supported.
4. **WebUI tools:** Integrated tools for vocal/accompaniment separation, automatic training-set segmentation, Chinese automatic speech recognition (ASR), and text labeling help beginners create training datasets and GPT/SoVITS models.
**Check out the [demo video](https://www.bilibili.com/video/BV12g4y1m7Uw)!**
Few-shot fine-tuning demo with unseen speakers:
https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
**User guide: [简体中文](https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e) | [English](https://rentry.co/GPT-SoVITS-guide#/)**
## Installation
### Tested Environments
- Python 3.9, PyTorch 2.0.1, CUDA 11
- Python 3.10.13, PyTorch 2.1.2, CUDA 12.3
- Python 3.9, PyTorch 2.2.2, macOS 14.4.1 (Apple silicon)
- Python 3.9, PyTorch 2.2.2, CPU devices
_Note: numba==0.56.4 requires python<3.11._
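
The numba note above can be checked before installing anything. The following is a minimal sketch, not part of the project: the function name `python_is_supported` is hypothetical, and it only tests the `python<3.11` constraint, not the PyTorch or CUDA versions listed above.

```python
import sys

# Upper bound taken from the note above: numba==0.56.4 needs python<3.11.
MAX_EXCLUSIVE = (3, 11)


def python_is_supported(version=None):
    """Return True if the (major, minor) version is below 3.11.

    `version` defaults to the running interpreter; pass a tuple such as
    (3, 9, 0) to check another version.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) < MAX_EXCLUSIVE


if __name__ == "__main__":
    status = "OK" if python_is_supported() else "too new for numba==0.56.4"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```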
### Windows
If you are a Windows user (tested with win>=10), [download the integrated package](https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true), unzip it, and double-click _go-webui.bat_ to start GPT-SoVITS-WebUI.
### Linux
```bash
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
```
### macOS
**Note: Models trained with the GPU on a Mac are lower in quality than models trained on other operating systems. Until this issue is resolved, training on macOS uses the CPU.**
1. Run `xcode-select --install` to install the Xcode command-line tools.
2. Run `brew install ffmpeg` to install FFmpeg.
3. After completing the steps above, run the following commands to install this project.
```bash
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
pip install -r requirements.txt
```
### Manual Installation
#### Install FFmpeg
##### Conda Users
```bash
conda install ffmpeg
```
##### Ubuntu/Debian Users
```bash
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
```
##### Windows Users
Download [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe) and [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe) and place them in the GPT-SoVITS root directory.
##### macOS Users
```bash
brew install ffmpeg
```
#### Install Dependencies
```bash
pip install -r requirements.txt
```
### Using Docker
#### docker-compose.yaml Configuration
0. Image tags: The code repository is updated quickly while images are built and tested slowly, so check [Docker Hub](https://hub.docker.com/r/breakstring/gpt-sovits) for the latest built image, or build locally from the Dockerfile as needed.
1. Environment variables:
- is_half: Controls half precision versus full precision. If the contents of the 4-cnhubert/5-wav32k directories are not generated correctly during the "SSL extracting" step, this setting is usually the cause. Set it to True or False according to your situation.
2. Volume configuration: The application root directory inside the container is set to /workspace. The default docker-compose.yaml lists practical examples that make uploading and downloading content easier.
3. shm_size: The default memory available to Docker Desktop on Windows is too small and can cause errors, so adjust it according to your situation.
4. Adjust the GPU-related settings in the deploy section according to your system and situation.
#### Running with docker compose
```
docker compose -f "docker-compose.yaml" up -d
```
#### Running with the docker command
As above, modify the parameters to match your situation, then run the following command:
```
docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-DockerTest\output:/workspace/output --volume=G:\GPT-SoVITS-DockerTest\logs:/workspace/logs --volume=G:\GPT-SoVITS-DockerTest\SoVITS_weights:/workspace/SoVITS_weights --workdir=/workspace -p 9880:9880 -p 9871:9871 -p 9872:9872 -p 9873:9873 -p 9874:9874 --shm-size="16G" -d breakstring/gpt-sovits:xxxxx
```
## Pretrained Models
1. Download the pretrained models from [GPT-SoVITS Models](https://huggingface.co/lj1995/GPT-SoVITS) and place them in the `GPT_SoVITS/pretrained_models` directory.
2. Download the model from [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip), unzip it, rename it to `G2PWModel`, and place it in the `GPT_SoVITS/text` directory. (Chinese TTS only)
3. For UVR5 (vocal/accompaniment separation and reverberation removal, optional), download the models from [UVR5 Weights](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/uvr5_weights) and place them in the `tools/uvr5/uvr5_weights` directory.
4. For Chinese ASR (optional), download the models from [Damo ASR Model](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/files), [Damo VAD Model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/files), and [Damo Punc Model](https://modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/files) and place them in the `tools/asr/models` directory.
5. For English or Japanese ASR (optional), download the model from [Faster Whisper Large V3](https://huggingface.co/Systran/faster-whisper-large-v3) and place it in the `tools/asr/models` directory. [Other models](https://huggingface.co/Systran) may give similar results while using less disk space.
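
After downloading, it can help to confirm that the directories named in the list above exist and are not empty. The following is a rough sketch under assumptions: `missing_model_dirs` and `EXPECTED_DIRS` are hypothetical names, not part of the project, and the check only looks at directory presence, not at which model files are inside.

```python
from pathlib import Path

# Directories taken from the list above, relative to the repository root.
EXPECTED_DIRS = [
    "GPT_SoVITS/pretrained_models",
    "GPT_SoVITS/text/G2PWModel",  # Chinese TTS only
    "tools/uvr5/uvr5_weights",    # optional: UVR5
    "tools/asr/models",           # optional: ASR
]


def missing_model_dirs(repo_root):
    """Return the expected directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        # A directory with no entries is treated the same as a missing one.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_model_dirs("."):
        print(f"missing or empty: {rel}")
```

Optional entries can be removed from `EXPECTED_DIRS` when the corresponding feature is not needed.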
## Dataset Format
The TTS annotation .list file format:
```
vocal_path|speaker_name|language|text
```
Language dictionary:
- 'zh': Chinese
- 'ja': Japanese
- 'en': English
Example:
```
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
```
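
The format above is one pipe-separated record per line. The following is a minimal parsing sketch, not the project's own loader: `Annotation` and `parse_annotation` are hypothetical names, and the accepted language codes are limited to the three listed above, so the set may need extending.

```python
from typing import NamedTuple

# Language codes from the dictionary above.
LANGUAGES = {"zh": "Chinese", "ja": "Japanese", "en": "English"}


class Annotation(NamedTuple):
    vocal_path: str
    speaker_name: str
    language: str
    text: str


def parse_annotation(line):
    """Parse one `vocal_path|speaker_name|language|text` line."""
    # Split on the first three pipes only, so a "|" inside the text survives.
    parts = line.rstrip("\n").split("|", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
    entry = Annotation(*parts)
    if entry.language not in LANGUAGES:
        raise ValueError(f"unknown language code: {entry.language!r}")
    return entry


if __name__ == "__main__":
    example = r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin."
    print(parse_annotation(example))
```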
## Fine-Tuning and Inference
### Open WebUI
#### Integrated Package Users
Double-click `go-webui.bat` or use `go-webui.ps1`.
To switch to V1, double-click `go-webui-v1.bat` or use `go-webui-v1.ps1`.
#### Others
```bash
python webui.py <language(optional)>
```
To switch to V1,
```bash
python webui.py v1 <language(optional)>
```
Or switch versions manually in the WebUI.
### Fine-Tuning
#### Path auto-filling is supported
1. Enter the audio path.
2. Slice the audio into small chunks.
3. Denoise (optional).
4. Run ASR.
5. Proofread the ASR transcriptions.
6. Go to the next tab and fine-tune the model.
### Open Inference WebUI
#### Integrated Package Users
Double-click `go-webui-v2.bat` or use `go-webui-v2.ps1`, then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`.
#### Others
```bash
python GPT_SoVITS/inference_webui.py <language(optional)>
```
OR
```bash
python webui.py
```
Then open the inference WebUI at `1-GPT-SoVITS-TTS/1C-inference`.
## V2 Release Notes
New features:
1. Support for Korean and Cantonese
2. Optimized text frontend
3. Pretrained model extended from 2k hours to 5k hours
4. Improved synthesis quality for low-quality reference audio
[More details](https://github.com/RVC-Boss/GPT-SoVITS/wiki/GPT%E2%80%90SoVITS%E2%80%90v2%E2%80%90features-(%E6%96%B0%E7%89%B9%E6%80%A7))
To use V2 from a V1 environment:
1. Run `pip install -r requirements.txt` to update some packages.
2. Clone the latest code from GitHub.
3. Download the V2 pretrained models from [huggingface](https://huggingface.co/lj1995/GPT-SoVITS/tree/main/gsv-v2final-pretrained) and put them in `GPT_SoVITS\pretrained_models\gsv-v2final-pretrained`.
Additional step for Chinese V2: [G2PWModel_1.1.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/g2p/G2PWModel_1.1.zip) (Download the G2PW model, unzip it, rename it to `G2PWModel`, and place it in `GPT_SoVITS/text`.)
## Todo List
- [x] **High Priority:**
  - [x] Japanese and English localization.
  - [x] User guide.
  - [x] Fine-tune training on Japanese and English datasets.
- [ ] **Features:**
  - [x] Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
  - [x] TTS speed control.
  - [ ] ~~Enhanced TTS emotion control.~~
  - [ ] Try changing the SoVITS token inputs to a word probability distribution.
  - [x] Improve the English and Japanese text frontend.
  - [ ] Develop small and large TTS models.
  - [x] Colab scripts.
  - [ ] Expand the training dataset (from 2k hours to 10k hours).
  - [x] Better SoVITS base model (enhanced audio quality).
  - [ ] Model blending.
## (Additional) Running from the Command Line
Use the command line to open the WebUI for UVR5:
```
python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
```
If you cannot open a browser, follow the format below for UVR processing. This uses mdxnet for audio processing.
```
python mdxnet.py --model --input_root --output_vocal --output_ins --agg_level --format --device --is_half_precision
```
To slice the dataset audio from the command line:
```
python audio_slicer.py \
--input_path "<path_to_original_audio_file_or_directory>" \
--output_root "<directory_where_subdivided_audio_clips_will_be_saved>" \
--threshold <volume_threshold> \
--min_length <minimum_duration_of_each_subclip> \
--min_interval <shortest_time_gap_between_adjacent_subclips> \
--hop_size <step_size_for_computing_volume_curve>
```
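
When slicing is scripted from Python, the same invocation can be assembled as an argument list and passed to `subprocess.run`. The following is a sketch under assumptions: `slicer_command` is a hypothetical helper, the flags are copied from the command above, and the script path `audio_slicer.py` is used exactly as written there.

```python
import sys


def slicer_command(input_path, output_root, threshold,
                   min_length, min_interval, hop_size):
    """Build the argument list for the audio_slicer.py call shown above."""
    return [
        sys.executable, "audio_slicer.py",
        "--input_path", str(input_path),
        "--output_root", str(output_root),
        "--threshold", str(threshold),
        "--min_length", str(min_length),
        "--min_interval", str(min_interval),
        "--hop_size", str(hop_size),
    ]


# Usage (paths and values are placeholders):
#   import subprocess
#   subprocess.run(slicer_command("raw", "sliced", -34, 4000, 300, 10), check=True)
```

Passing a list instead of a shell string avoids quoting problems with paths that contain spaces.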
To run ASR on the dataset from the command line (Chinese only):
```
python tools/asr/funasr_asr.py -i <input> -o <output>
```
For languages other than Chinese, ASR is performed with Faster_Whisper.
(There is no progress bar, and processing may be slow depending on GPU performance.)
```
python ./tools/asr/fasterwhisper_asr.py -i <input> -o <output> -l <language> -p <precision>
```
A custom list save path is enabled.
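
The two ASR commands above differ only by language: FunASR handles Chinese and Faster_Whisper handles everything else. The following is a sketch of that dispatch, assuming a hypothetical helper `asr_command`. The accepted `<precision>` values are not listed in this document, so the caller must supply one.

```python
import sys


def asr_command(input_path, output_path, language, precision=None):
    """Pick the ASR script by language, following the two commands above."""
    if language == "zh":
        # Chinese goes through FunASR, which takes no language or precision flag here.
        return [sys.executable, "tools/asr/funasr_asr.py",
                "-i", str(input_path), "-o", str(output_path)]
    if precision is None:
        raise ValueError("precision is required for Faster_Whisper")
    return [sys.executable, "./tools/asr/fasterwhisper_asr.py",
            "-i", str(input_path), "-o", str(output_path),
            "-l", language, "-p", str(precision)]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.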
## Credits
Special thanks to the following projects and contributors:
### Theoretical Research
- [ar-vits](https://github.com/innnky/ar-vits)
- [SoundStorm](https://github.com/yangdongchao/SoundStorm/tree/master/soundstorm/s1/AR)
- [vits](https://github.com/jaywalnut310/vits)
- [TransferTTS](https://github.com/hcy71o/TransferTTS/blob/master/models.py#L556)
- [contentvec](https://github.com/auspicious3000/contentvec/)
- [hifi-gan](https://github.com/jik876/hifi-gan)
- [fish-speech](https://github.com/fishaudio/fish-speech/blob/main/tools/llama/generate.py#L41)
### Pretrained Models
- [Chinese Speech Pretrain](https://github.com/TencentGameMate/chinese_speech_pretrain)
- [Chinese-Roberta-WWM-Ext-Large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large)
### Text Frontend for Inference
- [paddlespeech zh_normalization](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/paddlespeech/t2s/frontend/zh_normalization)
- [LangSegment](https://github.com/juntaosun/LangSegment)
### WebUI Tools
- [ultimatevocalremovergui](https://github.com/Anjok07/ultimatevocalremovergui)
- [audio-slicer](https://github.com/openvpi/audio-slicer)
- [SubFix](https://github.com/cronrpc/SubFix)
- [FFmpeg](https://github.com/FFmpeg/FFmpeg)
- [gradio](https://github.com/gradio-app/gradio)
- [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
- [FunASR](https://github.com/alibaba-damo-academy/FunASR)
Thanks to @Naozumi520 for providing the Cantonese training data and for guidance on Cantonese-related knowledge.
## Thanks to all contributors ;)
<a href="https://github.com/RVC-Boss/GPT-SoVITS/graphs/contributors" target="_blank">
<img src="https://contrib.rocks/image?repo=RVC-Boss/GPT-SoVITS" />
</a>