# Introduction
<div>
<a target="_blank" href="https://discord.gg/Es5qTB9BcN">
<img alt="Discord" src="https://img.shields.io/discord/1214047546020728892?color=%23738ADB&label=Discord&logo=discord&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="http://qm.qq.com/cgi-bin/qm/qr?_wv=1027&k=jCKlUP7QgSm9kh95UlBoYv6s1I-Apl1M&authKey=xI5ttVAp3do68IpEYEalwXSYZFdfxZSkah%2BctF5FIMyN2NqAa003vFtLqJyAVRfF&noverify=0&group_code=593946093">
<img alt="QQ" src="https://img.shields.io/badge/QQ Group-%2312B7F5?logo=tencent-qq&logoColor=white&style=flat-square"/>
</a>
<a target="_blank" href="https://hub.docker.com/r/fishaudio/fish-speech">
<img alt="Docker" src="https://img.shields.io/docker/pulls/fishaudio/fish-speech?style=flat-square&logo=docker"/>
</a>
</div>
!!! warning
    We assume no responsibility for any illegal use of this codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and the relevant laws in your region. <br/>
    This codebase and all models are released under the CC-BY-NC-SA-4.0 license.
<p align="center">
<img src="../assets/figs/diagram.png" width="75%">
</p>
## Requirements
- GPU memory: 4GB (for inference), 8GB (for fine-tuning)
- System: Linux, Windows
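As a rough sketch of how the memory figures above could be checked on a given machine: the snippet below compares a GPU's total memory against the 4GB and 8GB figures. The function names and the 0.5 GiB tolerance are illustrative assumptions (a card sold as 8GB usually reports slightly less than 8 GiB), and the detection helper only returns a value once PyTorch is installed.

```python
"""Compare a GPU's memory against the requirements listed above (illustrative sketch)."""

INFERENCE_GIB = 4.0   # requirement stated above for inference
FINETUNE_GIB = 8.0    # requirement stated above for fine-tuning
TOLERANCE_GIB = 0.5   # assumption: nominal "8GB" cards report a bit less than 8 GiB


def supported_workloads(total_gib: float) -> list[str]:
    """Return the workloads that the stated requirements allow for this much memory."""
    workloads = []
    if total_gib + TOLERANCE_GIB >= INFERENCE_GIB:
        workloads.append("inference")
    if total_gib + TOLERANCE_GIB >= FINETUNE_GIB:
        workloads.append("fine-tuning")
    return workloads


def detect_gpu_memory_gib():
    """Total memory of CUDA device 0 in GiB, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


if __name__ == "__main__":
    total = detect_gpu_memory_gib()
    if total is None:
        print("No CUDA GPU detected (or PyTorch is not installed yet).")
    else:
        print(f"{total:.1f} GiB detected, enough for: {supported_workloads(total)}")
```

The pure `supported_workloads` function can also be called with a number read from the GPU's spec sheet, which is useful before any Python environment exists.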
## Windows Setup
Advanced Windows users may consider running the codebase with WSL2 or Docker.
```bash
# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech
# pytorch ์„ค์น˜
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
# fish-speech ์„ค์น˜
pip3 install -e .
# (๊ฐ€์† ํ™œ์„ฑํ™”) triton-windows ์„ค์น˜
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl
```
Non-professional Windows users can use the following basic method to run the project without a Linux environment (model compilation, i.e. `torch.compile`, is supported):
1. Extract the project package.
2. Click `install_env.bat` to install the environment.
3. To enable compilation acceleration, follow these steps:
    1. Download the LLVM compiler:
        - [LLVM-17.0.6 (official site)](https://huggingface.co/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
        - [LLVM-17.0.6 (mirror site)](https://hf-mirror.com/fishaudio/fish-speech-1/resolve/main/LLVM-17.0.6-win64.exe?download=true)
        - After downloading `LLVM-17.0.6-win64.exe`, double-click it to install. When choosing the installation path, check the `Add Path to Current User` option so that the environment variable is added.
        - Confirm that the installation has completed.
    2. Download the Microsoft Visual C++ Redistributable package to resolve missing .dll problems:
        - [Download MSVC++ 14.40.33810.0](https://aka.ms/vs/17/release/vc_redist.x64.exe)
    3. Download Visual Studio Community Edition to satisfy LLVM's header-file dependencies:
        - [Download Visual Studio](https://visualstudio.microsoft.com/zh-hans/downloads/)
        - After installing the Visual Studio Installer, download Visual Studio Community 2022.
        - Select the `Desktop development with C++` option and install it.
    4. Download and install [CUDA Toolkit 12.x](https://developer.nvidia.com/cuda-12-1-0-download-archive?target_os=Windows&target_arch=x86_64).
4. Double-click `start.bat` to open the training/inference WebUI management interface. If needed, you can modify `API_FLAGS` as described below.
!!! info "Optional"
    Want to use the WebUI for inference?
    Edit the `API_FLAGS.txt` file in the project root directory and change the first three lines as follows:
    ```
    --infer
    # --api
    # --listen ...
    ...
    ```
!!! info "Optional"
    Want to start the API server?
    Edit the `API_FLAGS.txt` file in the project root directory and change the first three lines as follows:
    ```
    # --infer
    --api
    --listen ...
    ...
    ```
!!! info "Optional"
    Double-click `run_cmd.bat` to enter this project's conda/python command-line environment.
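The two `API_FLAGS.txt` edits described above differ only in which of the `--infer`, `--api`, and `--listen` lines are commented out, so the switch can be scripted. The following is a minimal sketch, not part of the project: the function names are hypothetical, the file is assumed to be UTF-8 text with one flag per line, and any line that does not start with one of those three flags is left untouched.

```python
"""Toggle API_FLAGS.txt between WebUI and API mode (illustrative sketch)."""
from pathlib import Path


def set_mode(text: str, mode: str) -> str:
    """Return the file content with flags (un)commented for 'webui' or 'api' mode."""
    if mode not in ("webui", "api"):
        raise ValueError("mode must be 'webui' or 'api'")
    # Which flags should be active (uncommented) in the requested mode.
    wanted = {"--infer": mode == "webui", "--api": mode == "api", "--listen": mode == "api"}
    result = []
    for line in text.splitlines():
        bare = line.lstrip("# ").rstrip()          # drop a leading "# " if present
        tokens = bare.split()
        if tokens and tokens[0] in wanted:
            line = bare if wanted[tokens[0]] else f"# {bare}"
        result.append(line)
    return "\n".join(result) + ("\n" if text.endswith("\n") else "")


def switch_file(path: str, mode: str) -> None:
    """Rewrite the given flags file in place (assumes UTF-8 encoding)."""
    flags_file = Path(path)
    flags_file.write_text(set_mode(flags_file.read_text(encoding="utf-8"), mode), encoding="utf-8")
```

From the environment opened by `run_cmd.bat`, calling `switch_file("API_FLAGS.txt", "api")` in the project root would apply the second variant, and `"webui"` would restore the first. Keeping the text transformation in `set_mode` separate from file access makes it easy to preview the result before writing anything.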
## Linux Setup
See [pyproject.toml](../../pyproject.toml) for details.
```bash
# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech
# (Ubuntu / Debian ์‚ฌ์šฉ์ž) sox + ffmpeg ์„ค์น˜
apt install libsox-dev ffmpeg
# (Ubuntu / Debian ์‚ฌ์šฉ์ž) pyaudio ์„ค์น˜
apt install build-essential \
cmake \
libasound-dev \
portaudio19-dev \
libportaudio2 \
libportaudiocpp0
# pytorch ์„ค์น˜
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
# fish-speech ์„ค์น˜
pip3 install -e .[stable]
```
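After the install commands above have run, it can be worth confirming that the pinned PyTorch packages actually landed in the active environment. The sketch below uses only the standard library; the helper names are hypothetical, and the check applies equally to the Windows and macOS setups, which pin the same versions.

```python
"""Report whether the pinned PyTorch packages are installed (illustrative sketch)."""
from importlib import metadata

# Versions pinned in the install commands above.
PINNED = {"torch": "2.4.1", "torchvision": "0.19.1", "torchaudio": "2.4.1"}


def installed_version(package: str):
    """Return the installed version without a local suffix such as '+cu121', or None."""
    try:
        return metadata.version(package).split("+")[0]
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins: dict[str, str]) -> dict[str, bool]:
    """Map each package name to True when the installed version equals the pin."""
    return {name: installed_version(name) == wanted for name, wanted in pins.items()}


if __name__ == "__main__":
    for name, ok in check_pins(PINNED).items():
        print(f"{name}: {'OK' if ok else 'missing or different version'}")
```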
## macOS Setup
To run inference on MPS, add the `--device mps` flag.
For an inference speed comparison, see [this PR](https://github.com/fishaudio/fish-speech/pull/461#issuecomment-2284277772).
!!! warning
    The `compile` option is not officially supported on Apple Silicon devices, so there is no guarantee that inference speed will improve.
```bash
# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech
# pytorch ์„ค์น˜
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
# fish-speech ์„ค์น˜
pip install -e .[stable]
```
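To decide whether the `--device mps` flag mentioned above applies to a given machine, a small check along these lines may help. It is only a sketch: the function names are hypothetical, and while `mps` is the value documented here, treating `cuda` and `cpu` as the other values for `--device` is an assumption based on PyTorch's usual device names.

```python
"""Suggest a value for the --device flag (illustrative sketch)."""


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; returns 'cpu' if PyTorch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print(f"Suggested flag: --device {detect_device()}")
```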
## Docker Setup
1. Install the NVIDIA Container Toolkit:
To use the GPU for model training and inference in Docker, you need to install the NVIDIA Container Toolkit:
For Ubuntu users:
```bash
# Add the repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Restart the Docker service
sudo systemctl restart docker
```
Users of other Linux distributions should follow the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
2. Pull and run the fish-speech image
```bash
# Pull the image
docker pull fishaudio/fish-speech:latest-dev
# Run the image
docker run -it \
    --name fish-speech \
    --gpus all \
    -p 7860:7860 \
    fishaudio/fish-speech:latest-dev \
    zsh
# To use a different port, change the -p parameter to YourPort:7860
```
3. ๋ชจ๋ธ ์ข…์†์„ฑ ๋‹ค์šด๋กœ๋“œ
Docker ์ปจํ…Œ์ด๋„ˆ ๋‚ด๋ถ€์˜ ํ„ฐ๋ฏธ๋„์—์„œ ์•„๋ž˜ ๋ช…๋ น์–ด๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•„์š”ํ•œ `vqgan` ๋ฐ `llama` ๋ชจ๋ธ์„ Huggingface ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์—์„œ ๋‹ค์šด๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.
```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
```
4. Configure environment variables and access the WebUI
In a terminal inside the Docker container, enter `export GRADIO_SERVER_NAME="0.0.0.0"` to allow external access to the Gradio service running inside Docker.
Then enter `python tools/webui.py` in the same terminal to start the WebUI service.
If you are using WSL or macOS, you can open the WebUI at [http://localhost:7860](http://localhost:7860).
If it is deployed on a server, replace localhost with the server's IP.
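If the page does not load, a quick way to tell a port-mapping problem from a browser problem is to test whether anything is accepting connections on the published port. The sketch below does that with the standard library only; `is_port_open` is a hypothetical helper, and the host and port are the `localhost` and `7860` values used in the steps above.

```python
"""Check whether the WebUI port accepts TCP connections (illustrative sketch)."""
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and name-resolution errors
        return False


if __name__ == "__main__":
    host, port = "localhost", 7860  # replace host with the server's IP for a remote deployment
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"WebUI at {host}:{port} is {state}")
```

A "not reachable" result while `python tools/webui.py` is running usually points at the `-p` mapping or at `GRADIO_SERVER_NAME` not being set before the service was started.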
## Changelog
- 2024/09/10: Updated Fish-Speech to version 1.4, increased the dataset size and changed the quantizer's n_groups from 4 to 8.
- 2024/07/02: Updated Fish-Speech to version 1.2, removed the VITS decoder and greatly improved zero-shot ability.
- 2024/05/10: Updated Fish-Speech to version 1.1, implemented a VITS decoder to reduce WER and improve timbre similarity.
- 2024/04/22: Finished Fish-Speech version 1.0, with significant changes to the VQGAN and LLAMA models.
- 2023/12/28: Added `lora` fine-tuning support.
- 2023/12/27: Added `gradient checkpointing`, `causal sampling`, and `flash-attn` support.
- 2023/12/19: Updated the WebUI and HTTP API.
- 2023/12/18: Updated the fine-tuning documentation and related examples.
- 2023/12/17: Updated the `text2semantic` model, with support for a phoneme-free mode.
- 2023/12/13: Released the beta version, including the VQGAN model and a LLAMA-based language model (phoneme support only).
## ๊ฐ์‚ฌ์˜ ๋ง
- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
- [GPT VITS](https://github.com/innnky/gpt-vits)
- [MQTTS](https://github.com/b04901014/MQTTS)
- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
- [Transformers](https://github.com/huggingface/transformers)
- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)