Introduction

!!! warning We assume no responsibility for any illegal use of this codebase. Please refer to the DMCA (Digital Millennium Copyright Act) and the relevant laws in your region.
This codebase and all models are released under the CC-BY-NC-SA-4.0 license.

Requirements

  • GPU memory: 4GB (for inference), 8GB (for fine-tuning)
  • System: Linux, Windows
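
As a quick sanity check of the GPU memory requirement, the sketch below compares a card's total memory against the 4GB and 8GB figures listed above. The helper name `supported_workloads` and the reading of "GB" as decimal gigabytes are assumptions made for illustration. The PyTorch query only runs if `torch` is installed and a CUDA device is visible.

```python
# Thresholds from the requirements list; "GB" is read as 10**9 bytes (an assumption).
GB = 10 ** 9
INFERENCE_MIN_BYTES = 4 * GB
FINETUNE_MIN_BYTES = 8 * GB


def supported_workloads(total_bytes: int) -> list[str]:
    """Return which workloads a GPU with `total_bytes` of memory can cover."""
    workloads = []
    if total_bytes >= INFERENCE_MIN_BYTES:
        workloads.append("inference")
    if total_bytes >= FINETUNE_MIN_BYTES:
        workloads.append("fine-tuning")
    return workloads


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # total_memory is reported in bytes for the first visible CUDA device.
        props = torch.cuda.get_device_properties(0)
        print(props.name, supported_workloads(props.total_memory))
    else:
        print("No CUDA device visible to PyTorch (or PyTorch is not installed).")
```

Treat the result as a rough guide only: the memory actually free at run time is lower than the total reported by the driver.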

Windows Setup

Advanced Windows users may consider using WSL2 or Docker to run the codebase.

# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech

# pytorch ์„ค์น˜
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121

# fish-speech ์„ค์น˜
pip3 install -e .

# (๊ฐ€์† ํ™œ์„ฑํ™”) triton-windows ์„ค์น˜
pip install https://github.com/AnyaCoder/fish-speech/releases/download/v0.1.0/triton_windows-0.1.0-py3-none-any.whl

Non-professional Windows users may consider the following basic method, which runs the project without a Linux environment (including model compilation, i.e. torch.compile):

  1. Extract the project package.
  2. Click install_env.bat to install the environment.
  3. To enable compilation acceleration, follow the steps below:
    1. Download the LLVM compiler:
      • LLVM-17.0.6 (official site)
      • LLVM-17.0.6 (mirror site)
      • After downloading LLVM-17.0.6-win64.exe, double-click it to install. When choosing the installation path, check the Add Path to Current User option to add the environment variable.
      • Confirm that the installation has completed.
    2. Download the Microsoft Visual C++ Redistributable package to resolve missing .dll issues.
    3. Download Visual Studio Community Edition to resolve LLVM's header file dependencies:
      • Visual Studio download
      • After installing Visual Studio Installer, download Visual Studio Community 2022.
      • Select the Desktop development with C++ option and install it.
    4. Download and install CUDA Toolkit 12.x.
  4. Double-click start.bat to open the training and inference WebUI management interface. If needed, you can modify API_FLAGS as described below.
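
After the compiler steps above, it can help to confirm that the tools are actually visible on PATH before launching anything. The following is a minimal sketch using only the standard library. The executable names `clang` (from LLVM) and `nvcc` (from the CUDA Toolkit) are assumptions about what those installers expose, and `missing_tools` is a hypothetical helper, not part of this project.

```python
import shutil
from typing import Callable, Iterable, Optional

# Assumed executable names: `clang` ships with LLVM and `nvcc` with the
# CUDA Toolkit, but a given installation may expose different names.
DEFAULT_TOOLS = ("clang", "nvcc")


def missing_tools(
    tools: Iterable[str] = DEFAULT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected compiler tools are on PATH.")
```

If a tool is reported as missing right after installation, opening a new terminal usually picks up the updated PATH.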

!!! info "Optional"

Want to use the WebUI for inference?

Edit the `API_FLAGS.txt` file in the project root directory and change the first three lines as follows:
```
 --infer
 # --api
 # --listen ...
 ...
```

!!! info "Optional"

Want to start the API server?

Edit the `API_FLAGS.txt` file in the project root directory and change the first three lines as follows:

```
# --infer
--api
--listen ...
...
```
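
The two variants of `API_FLAGS.txt` differ only in which lines are commented out. As a rough illustration, the sketch below lists the flags that remain active. It assumes that lines starting with `#` are ignored and every other non-empty line is passed along as-is. How `start.bat` actually reads the file is not described here, and `active_flags` is a hypothetical helper.

```python
def active_flags(text: str) -> list[str]:
    """Collect non-empty lines that are not commented out with '#'."""
    flags = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        flags.append(stripped)
    return flags


# Simplified file contents; the elided parts of the real file are left out.
webui_mode = "--infer\n# --api\n"
api_mode = "# --infer\n--api\n"

print(active_flags(webui_mode))  # ['--infer']
print(active_flags(api_mode))    # ['--api']
```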

!!! info "Optional"

Double-click `run_cmd.bat` to enter this project's conda/python command-line environment.

Linux Setup

See pyproject.toml for details.

# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech

# (Ubuntu / Debian ์‚ฌ์šฉ์ž) sox + ffmpeg ์„ค์น˜
apt install libsox-dev ffmpeg 

# (Ubuntu / Debian ์‚ฌ์šฉ์ž) pyaudio ์„ค์น˜
apt install build-essential \
    cmake \
    libasound-dev \
    portaudio19-dev \
    libportaudio2 \
    libportaudiocpp0

# pytorch ์„ค์น˜
pip3 install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1

# fish-speech ์„ค์น˜
pip3 install -e .[stable]

macOS Setup

To run inference on MPS, add the --device mps flag. For an inference speed comparison, see this PR.

!!! warning The compile option is not officially supported on Apple Silicon devices, so there is no guarantee that inference speed will improve.

# ํŒŒ์ด์ฌ 3.10 ๊ฐ€์ƒ ํ™˜๊ฒฝ ์ƒ์„ฑ, virtualenv๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
conda create -n fish-speech python=3.10
conda activate fish-speech
# pytorch ์„ค์น˜
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1
# fish-speech ์„ค์น˜
pip install -e ".[stable]"
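
To decide which value to pass to the --device flag, one option is to ask PyTorch which backends are usable. The sketch below does that. The preference order (CUDA, then MPS, then CPU) and the helper name `pick_device` are assumptions for illustration, not something this project prescribes.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then fall back to CPU (assumed ordering)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; defaulting to", pick_device(False, False))
    else:
        device = pick_device(
            torch.cuda.is_available(),
            torch.backends.mps.is_available(),
        )
        print("Suggested value for --device:", device)
```

On an Apple Silicon machine with a working PyTorch install this would be expected to suggest `mps`.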

Docker Setup

  1. Install the NVIDIA Container Toolkit:

    To use the GPU for model training and inference in Docker, you need to install the NVIDIA Container Toolkit:

    For Ubuntu users:

    # Add the repository
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
        && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    # Install nvidia-container-toolkit
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    # Restart the Docker service
    sudo systemctl restart docker
    

    For users of other Linux distributions, refer to the NVIDIA Container Toolkit installation guide.

  2. Pull and run the fish-speech image

    # Pull the image
    docker pull fishaudio/fish-speech:latest-dev
    # Run the image
    docker run -it \
        --name fish-speech \
        --gpus all \
        -p 7860:7860 \
        fishaudio/fish-speech:latest-dev \
        zsh
    # To use a different port, change the -p parameter to YourPort:7860
    
  3. Download model dependencies

    In the terminal inside the Docker container, use the command below to download the required vqgan and llama models from the Hugging Face repository.

    huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
    
  4. Configure environment variables and access the WebUI

    In the terminal inside the Docker container, run export GRADIO_SERVER_NAME="0.0.0.0" so that the Gradio service inside Docker can be reached from outside the container. Then run python tools/webui.py in the same terminal to start the WebUI service.

    If you are using WSL or macOS, you can open the WebUI at http://localhost:7860.

    If it is deployed on a server, replace localhost with the server's IP.
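
If the WebUI does not load, a first step is to check whether anything is listening on the published port at all. The sketch below is a standard-library TCP check, assuming the default 7860 mapping from the docker run command above. `webui_url` and `is_port_open` are hypothetical helpers, and a successful connection only shows that the port is open, not that Gradio itself is healthy.

```python
import socket


def webui_url(host: str = "localhost", port: int = 7860) -> str:
    """Build the WebUI address from a host name or IP and the published port."""
    return f"http://{host}:{port}"


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace "localhost" with the server's IP for a remote deployment.
    host, port = "localhost", 7860
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(webui_url(host, port), "is", state)
```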

Changelog

  • 2024/09/10: Updated Fish-Speech to version 1.4, increasing the dataset size and changing the quantizer's n_groups from 4 to 8.
  • 2024/07/02: Updated Fish-Speech to version 1.2, removing the VITS decoder and greatly improving zero-shot ability.
  • 2024/05/10: Updated Fish-Speech to version 1.1, implementing a VITS decoder to reduce WER and improve timbre similarity.
  • 2024/04/22: Finished Fish-Speech version 1.0, with significant modifications to the VQGAN and LLAMA models.
  • 2023/12/28: Added lora fine-tuning support.
  • 2023/12/27: Added gradient checkpointing, causal sampling, and flash-attn support.
  • 2023/12/19: Updated the WebUI and HTTP API.
  • 2023/12/18: Updated the fine-tuning documentation and related examples.
  • 2023/12/17: Updated the text2semantic model, adding support for phoneme-free mode.
  • 2023/12/13: Released the beta version, including a VQGAN model and a LLAMA-based language model (phoneme support only).

๊ฐ์‚ฌ์˜ ๋ง