Accelerator selection [[accelerator-selection]]

During distributed training, you can specify the number and order of accelerators (CUDA, XPU, MPS, HPU, etc.) to use. This is useful when your accelerators differ in compute performance and you want the faster one to be used first, or when you only want to use a subset of the available accelerators. The selection process works with both DistributedDataParallel and DataParallel. No Accelerate or DeepSpeed integration is required.

This guide shows how to select the number of accelerators to use and the order in which they are used.

Number of accelerators [[number-of-accelerators]]

For example, if there are 4 accelerators and you only want to use the first 2, run one of the commands below.

With torchrun, use --nproc_per_node to select how many accelerators to use.

torchrun --nproc_per_node=2 trainer-program.py ...

With Accelerate, use --num_processes to select how many accelerators to use.

accelerate launch --num_processes 2 trainer-program.py ...

With DeepSpeed, use --num_gpus to select how many GPUs to use.

deepspeed --num_gpus 2 trainer-program.py ...
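Each launcher above starts one process per selected accelerator. As a rough sketch, a training script can check how many processes were started by reading the WORLD_SIZE and LOCAL_RANK environment variables that torchrun sets for each worker. The helper name describe_launch is hypothetical, and the fallback values (a single process with local rank 0) are an assumption for the case where the script is started directly with python.

```python
import os
from typing import Mapping


def describe_launch(env: Mapping[str, str] = os.environ) -> dict:
    """Summarize the launch layout from torchrun-style environment variables."""
    # torchrun sets WORLD_SIZE (total number of processes) and LOCAL_RANK
    # (index of this process on its node) for every worker. The defaults
    # below are an assumption for runs started without a launcher.
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return {"world_size": world_size, "local_rank": local_rank}


if __name__ == "__main__":
    print(describe_launch())
```

Calling describe_launch() near the top of the training script and logging the result is a cheap way to confirm that the launcher flag took effect before a long run starts.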

Order of accelerators [[order-of-accelerators]]

To select specific accelerators and their order, use the environment variable appropriate for your hardware. It is often set on the command line for each run, but it can also be added to ~/.bashrc or another startup configuration file.

For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run on accelerators 0 and 2:

CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...

Only GPUs 0 and 2 are "visible" to PyTorch, and they are mapped to cuda:0 and cuda:1 respectively.
To reverse the order (use GPU 2 as cuda:0 and GPU 0 as cuda:1):

CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...

To run without any GPUs:

CUDA_VISIBLE_DEVICES= python trainer-program.py ...
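The remapping described above can be pictured with a small pure-Python model. This is only an illustrative sketch, not how CUDA implements it: the function name logical_device_map and its total parameter are hypothetical, and the model assumes the variable holds plain integer indices (no device UUIDs) that are all valid for the machine.

```python
from typing import Dict, Optional


def logical_device_map(visible_devices: Optional[str], total: int) -> Dict[str, int]:
    """Map logical names (cuda:0, cuda:1, ...) to physical GPU indices."""
    if visible_devices is None:
        # Variable unset: every physical device is visible in its default order.
        physical = list(range(total))
    else:
        # Variable set: keep the listed indices in the order given.
        # An empty string leaves no devices visible at all.
        physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return {f"cuda:{logical}": phys for logical, phys in enumerate(physical)}


if __name__ == "__main__":
    for value in ("0,2", "2,0", ""):
        print(repr(value), logical_device_map(value, total=4))
```

The logical index is simply the position in the list, which is why swapping the two entries swaps which physical GPU PyTorch sees as cuda:0.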

You can also control the order of CUDA devices with CUDA_DEVICE_ORDER:

  • Order by PCIe bus ID (matches nvidia-smi):

    export CUDA_DEVICE_ORDER=PCI_BUS_ID

  • Order by compute performance (fastest first):

    export CUDA_DEVICE_ORDER=FASTEST_FIRST

On Intel XPUs, use ZE_AFFINITY_MASK to select devices instead:

ZE_AFFINITY_MASK=0,2 torchrun trainer-program.py ...

Only XPUs 0 and 2 are "visible" to PyTorch, and they are mapped to xpu:0 and xpu:1 respectively.
To reverse the order (use XPU 2 as xpu:0 and XPU 0 as xpu:1):

ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...

You can also control the order of Intel XPUs with:

export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1

For more information about device enumeration and ordering on Intel XPU, refer to the Level Zero documentation.

Environment variables can be exported instead of being added to the command line. This is not recommended, because it is easy to forget how the variable was set and end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.
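The same per-run scoping can be reproduced when a run is started from Python rather than from a shell. The following is a minimal sketch under stated assumptions: run_with_devices is a hypothetical helper, and the child process only echoes the variable, standing in for a real training command.

```python
import os
import subprocess
import sys


def run_with_devices(visible_devices: str) -> str:
    """Run a child Python process that sees CUDA_VISIBLE_DEVICES for this run only."""
    # Copy the current environment and override the variable in the copy,
    # so the parent process and any later runs are left untouched.
    child_env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_devices)
    # The child just prints the variable; a real run would launch the trainer here.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_devices("0,2"))
```

Because the override lives only in the copied environment, the setting disappears when the child exits, which mirrors prefixing the variable on a single command line instead of exporting it.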