fishspeech2 / docs /ko /inference.md
pineconeT94's picture
first commit
8b14bed
# ์ถ”๋ก 
์ถ”๋ก ์€ ๋ช…๋ น์ค„, HTTP API, ๊ทธ๋ฆฌ๊ณ  ์›น UI์—์„œ ์ง€์›๋ฉ๋‹ˆ๋‹ค.
!!! note
์ „์ฒด ์ถ”๋ก  ๊ณผ์ •์€ ๋‹ค์Œ์˜ ์—ฌ๋Ÿฌ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค:
1. VQGAN์„ ์‚ฌ์šฉํ•˜์—ฌ ์•ฝ 10์ดˆ ๋ถ„๋Ÿ‰์˜ ์Œ์„ฑ์„ ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค.
2. ์ธ์ฝ”๋”ฉ๋œ ์‹œ๋งจํ‹ฑ ํ† ํฐ๊ณผ ํ•ด๋‹น ํ…์ŠคํŠธ๋ฅผ ์˜ˆ์‹œ๋กœ ์–ธ์–ด ๋ชจ๋ธ์— ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
3. ์ƒˆ๋กœ์šด ํ…์ŠคํŠธ๋ฅผ ์ž…๋ ฅํ•˜๋ฉด, ๋ชจ๋ธ์ด ํ•ด๋‹นํ•˜๋Š” ์‹œ๋งจํ‹ฑ ํ† ํฐ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
4. ์ƒ์„ฑ๋œ ์‹œ๋งจํ‹ฑ ํ† ํฐ์„ VITS / VQGAN์— ์ž…๋ ฅํ•˜์—ฌ ์Œ์„ฑ์„ ๋””์ฝ”๋”ฉํ•˜๊ณ  ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
## ๋ช…๋ น์ค„ ์ถ”๋ก 
ํ•„์š”ํ•œ `vqgan` ๋ฐ `llama` ๋ชจ๋ธ์„ Hugging Face ๋ฆฌํฌ์ง€ํ† ๋ฆฌ์—์„œ ๋‹ค์šด๋กœ๋“œํ•˜์„ธ์š”.
```bash
huggingface-cli download fishaudio/fish-speech-1.4 --local-dir checkpoints/fish-speech-1.4
```
### 1. ์Œ์„ฑ์—์„œ ํ”„๋กฌํ”„ํŠธ ์ƒ์„ฑ:
!!! note
๋ชจ๋ธ์ด ์Œ์ƒ‰์„ ๋ฌด์ž‘์œ„๋กœ ์„ ํƒํ•˜๋„๋ก ํ•˜๋ ค๋ฉด ์ด ๋‹จ๊ณ„๋ฅผ ๊ฑด๋„ˆ๋›ธ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
```bash
python tools/vqgan/inference.py \
-i "paimon.wav" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
์ด ๋ช…๋ น์„ ์‹คํ–‰ํ•˜๋ฉด `fake.npy` ํŒŒ์ผ์„ ์–ป๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.
### 2. ํ…์ŠคํŠธ์—์„œ ์‹œ๋งจํ‹ฑ ํ† ํฐ ์ƒ์„ฑ:
```bash
python tools/llama/generate.py \
--text "๋ณ€ํ™˜ํ•  ํ…์ŠคํŠธ" \
--prompt-text "์ฐธ๊ณ ํ•  ํ…์ŠคํŠธ" \
--prompt-tokens "fake.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4" \
--num-samples 2 \
--compile
```
์ด ๋ช…๋ น์„ ์‹คํ–‰ํ•˜๋ฉด ์ž‘์—… ๋””๋ ‰ํ† ๋ฆฌ์— `codes_N` ํŒŒ์ผ์ด ์ƒ์„ฑ๋˜๋ฉฐ, N์€ 0๋ถ€ํ„ฐ ์‹œ์ž‘ํ•˜๋Š” ์ •์ˆ˜์ž…๋‹ˆ๋‹ค.
!!! note
๋น ๋ฅธ ์ถ”๋ก ์„ ์œ„ํ•ด `--compile` ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜์—ฌ CUDA ์ปค๋„์„ ๊ฒฐํ•ฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค (~์ดˆ๋‹น 30 ํ† ํฐ -> ~์ดˆ๋‹น 500 ํ† ํฐ).
`--compile` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ฃผ์„ ์ฒ˜๋ฆฌํ•˜์—ฌ ๊ฐ€์†ํ™” ์˜ต์…˜์„ ์‚ฌ์šฉํ•˜์ง€ ์•Š์„ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.
!!! info
bf16์„ ์ง€์›ํ•˜์ง€ ์•Š๋Š” GPU์˜ ๊ฒฝ์šฐ `--half` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
### 3. ์‹œ๋งจํ‹ฑ ํ† ํฐ์—์„œ ์Œ์„ฑ ์ƒ์„ฑ:
#### VQGAN ๋””์ฝ”๋”
```bash
python tools/vqgan/inference.py \
-i "codes_0.npy" \
--checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth"
```
## HTTP API ์ถ”๋ก 
์ถ”๋ก ์„ ์œ„ํ•œ HTTP API๋ฅผ ์ œ๊ณตํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜์˜ ๋ช…๋ น์–ด๋กœ ์„œ๋ฒ„๋ฅผ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
python -m tools.api \
--listen 0.0.0.0:8080 \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
์ถ”๋ก  ์†๋„๋ฅผ ๋†’์ด๊ณ  ์‹ถ๋‹ค๋ฉด `--compile` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ดํ›„, http://127.0.0.1:8080/ ์—์„œ API๋ฅผ ํ™•์ธํ•˜๊ณ  ํ…Œ์ŠคํŠธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์•„๋ž˜๋Š” `tools/post_api.py`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์š”์ฒญ์„ ๋ณด๋‚ด๋Š” ์˜ˆ์‹œ์ž…๋‹ˆ๋‹ค.
```bash
python -m tools.post_api \
--text "์ž…๋ ฅํ•  ํ…์ŠคํŠธ" \
--reference_audio "์ฐธ๊ณ  ์Œ์„ฑ ๊ฒฝ๋กœ" \
--reference_text "์ฐธ๊ณ  ์Œ์„ฑ์˜ ํ…์ŠคํŠธ ๋‚ด์šฉ" \
--streaming True
```
์œ„ ๋ช…๋ น์€ ์ฐธ๊ณ  ์Œ์„ฑ ์ •๋ณด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ ์›ํ•˜๋Š” ์Œ์„ฑ์„ ํ•ฉ์„ฑํ•˜๊ณ , ์ŠคํŠธ๋ฆฌ๋ฐ ๋ฐฉ์‹์œผ๋กœ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
๋‹ค์Œ ์˜ˆ์‹œ๋Š” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์ฐธ๊ณ  ์Œ์„ฑ ๊ฒฝ๋กœ์™€ ํ…์ŠคํŠธ๋ฅผ ํ•œ๊บผ๋ฒˆ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค. ๋ช…๋ น์—์„œ ๊ณต๋ฐฑ์œผ๋กœ ๊ตฌ๋ถ„ํ•˜์—ฌ ์ž…๋ ฅํ•ฉ๋‹ˆ๋‹ค.
```bash
python -m tools.post_api \
--text "์ž…๋ ฅํ•  ํ…์ŠคํŠธ" \
--reference_audio "์ฐธ๊ณ  ์Œ์„ฑ ๊ฒฝ๋กœ1" "์ฐธ๊ณ  ์Œ์„ฑ ๊ฒฝ๋กœ2" \
--reference_text "์ฐธ๊ณ  ์Œ์„ฑ ํ…์ŠคํŠธ1" "์ฐธ๊ณ  ์Œ์„ฑ ํ…์ŠคํŠธ2"\
--streaming False \
--output "generated" \
--format "mp3"
```
์œ„ ๋ช…๋ น์–ด๋Š” ์—ฌ๋Ÿฌ ์ฐธ๊ณ  ์Œ์„ฑ ์ •๋ณด๋ฅผ ๋ฐ”ํƒ•์œผ๋กœ `MP3` ํ˜•์‹์˜ ์Œ์„ฑ์„ ํ•ฉ์„ฑํ•˜์—ฌ, ํ˜„์žฌ ๋””๋ ‰ํ† ๋ฆฌ์— `generated.mp3`๋กœ ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
`--reference_audio`์™€ `--reference_text` ๋Œ€์‹ ์— `--reference_id`(ํ•˜๋‚˜๋งŒ ์‚ฌ์šฉ ๊ฐ€๋Šฅ)๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ํ”„๋กœ์ ํŠธ ๋ฃจํŠธ ๋””๋ ‰ํ† ๋ฆฌ์— `references/<your reference_id>` ํด๋”๋ฅผ ๋งŒ๋“ค์–ด ํ•ด๋‹น ์Œ์„ฑ๊ณผ ์ฃผ์„ ํ…์ŠคํŠธ๋ฅผ ๋„ฃ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ฐธ๊ณ  ์Œ์„ฑ์€ ์ตœ๋Œ€ 90์ดˆ๊นŒ์ง€ ์ง€์›๋ฉ๋‹ˆ๋‹ค.
!!! info
์ œ๊ณต๋˜๋Š” ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” `python -m tools.post_api -h`๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
## GUI ์ถ”๋ก 
[ํด๋ผ์ด์–ธํŠธ ๋‹ค์šด๋กœ๋“œ](https://github.com/AnyaCoder/fish-speech-gui/releases)
## WebUI ์ถ”๋ก 
๋‹ค์Œ ๋ช…๋ น์œผ๋กœ WebUI๋ฅผ ์‹œ์ž‘ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
```bash
python -m tools.webui \
--llama-checkpoint-path "checkpoints/fish-speech-1.4" \
--decoder-checkpoint-path "checkpoints/fish-speech-1.4/firefly-gan-vq-fsq-8x1024-21hz-generator.pth" \
--decoder-config-name firefly_gan_vq
```
> ์ถ”๋ก  ์†๋„๋ฅผ ๋†’์ด๊ณ  ์‹ถ๋‹ค๋ฉด `--compile` ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ ์ถ”๊ฐ€ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
!!! note
๋ผ๋ฒจ ํŒŒ์ผ๊ณผ ์ฐธ๊ณ  ์Œ์„ฑ ํŒŒ์ผ์„ ๋ฏธ๋ฆฌ ๋ฉ”์ธ ๋””๋ ‰ํ† ๋ฆฌ์˜ `references` ํด๋”์— ์ €์žฅํ•ด ๋‘๋ฉด, WebUI์—์„œ ๋ฐ”๋กœ ํ˜ธ์ถœํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. (ํ•ด๋‹น ํด๋”๋Š” ์ง์ ‘ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.)
!!! note
WebUI๋ฅผ ๊ตฌ์„ฑํ•˜๊ธฐ ์œ„ํ•ด `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`๊ณผ ๊ฐ™์€ Gradio ํ™˜๊ฒฝ ๋ณ€์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์ฆ๊ธฐ์„ธ์š”!