Accelerator selection [[accelerator-selection]]

During distributed training, you can specify the number and order of accelerators (CUDA, XPU, MPS, HPU, etc.) to use. This is useful when your accelerators differ in computing power and you want the faster one used first, or when you only want to use a subset of the available accelerators. The selection process works with both DistributedDataParallel and DataParallel. You don't need Accelerate or the DeepSpeed integration.

This guide shows how to select the number of accelerators to use and the order in which they are used.

Number of accelerators [[number-of-accelerators]]

For example, if there are 4 accelerators and you only want to use the first 2, run one of the commands below.

With torchrun, use --nproc_per_node to select how many accelerators to use.

torchrun --nproc_per_node=2 trainer-program.py ...

With Accelerate, use --num_processes to select how many accelerators to use.

accelerate launch --num_processes 2 trainer-program.py ...

With DeepSpeed, use --num_gpus to select how many GPUs to use.

deepspeed --num_gpus 2 trainer-program.py ...
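To confirm from inside the training script how many processes were actually started, one option is to read the rank variables that the launcher exports to each worker. The sketch below assumes the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` variables that torchrun sets; the other launchers are assumed to behave similarly, which is worth checking for your version. `launch_info` is a hypothetical helper, not part of any library.

```python
import os
from typing import Mapping


def launch_info(env: Mapping[str, str] = os.environ) -> dict:
    """Read the rank variables exported to each worker process.

    Falls back to single-process defaults when the script is started
    with plain `python` and the variables are absent.
    """
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


info = launch_info()
print(f"process {info['rank']} of {info['world_size']} (local rank {info['local_rank']})")
```

With `--nproc_per_node=2` on a single node, each of the two workers would be expected to report a `world_size` of 2.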

Order of accelerators [[order-of-accelerators]]

To select specific accelerators and their order, use the environment variable appropriate for your hardware. It is usually set on the command line for each run, but it can also be added to ~/.bashrc or another startup config file.

For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run on accelerators 0 and 2:

CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...

Only GPUs 0 and 2 are "visible" to PyTorch, and they are mapped to cuda:0 and cuda:1 respectively.
To reverse the order (use GPU 2 as cuda:0 and GPU 0 as cuda:1):

CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...

To run without any GPUs:

CUDA_VISIBLE_DEVICES= python trainer-program.py ...
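The renumbering described above can be modeled in a few lines of Python. This is a simplified sketch, not how CUDA implements it: it assumes the variable holds a comma-separated list of valid integer indices and ignores other accepted forms such as device UUIDs. `logical_device_map` is a hypothetical helper introduced only for illustration.

```python
def logical_device_map(visible_devices: str, prefix: str = "cuda") -> dict:
    """Map logical device names (cuda:0, cuda:1, ...) to physical indices.

    Simplified model: the position of an index in the list becomes its
    logical number, and an empty string exposes no devices.
    """
    physical = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return {f"{prefix}:{logical}": phys for logical, phys in enumerate(physical)}


print(logical_device_map("0,2"))  # {'cuda:0': 0, 'cuda:1': 2}
print(logical_device_map("2,0"))  # {'cuda:0': 2, 'cuda:1': 0}
print(logical_device_map(""))     # {}
```

The point of the model is that the training script only ever sees the logical names, so the same code runs unchanged whichever physical devices are selected.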

You can also control the order of CUDA devices with CUDA_DEVICE_ORDER:

  • Order by PCIe bus ID (matches nvidia-smi):

    export CUDA_DEVICE_ORDER=PCI_BUS_ID

  • Order by compute capability (fastest first):

    export CUDA_DEVICE_ORDER=FASTEST_FIRST
    
On Intel XPUs, use ZE_AFFINITY_MASK in the same way:

ZE_AFFINITY_MASK=0,2 torchrun trainer-program.py ...

Only XPUs 0 and 2 are "visible" to PyTorch, and they are mapped to xpu:0 and xpu:1 respectively.
To reverse the order (use XPU 2 as xpu:0 and XPU 0 as xpu:1):

ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...

You can also control the order of Intel XPUs with:

export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1

For more information about device enumeration and sorting on Intel XPUs, see the Level Zero documentation.

Environment variables can be exported instead of being added to the command line. This is not recommended, because it is easy to forget how the variable was set and end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.
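The same per-run scoping can be kept when runs are started from a Python script rather than a shell. The sketch below passes the variable only to the child process and leaves the parent environment untouched. `run_with_devices` is a hypothetical helper, and the probe command is a stand-in for `torchrun trainer-program.py ...` so that the example runs without any accelerator.

```python
import os
import subprocess
import sys


def run_with_devices(command: list, visible_devices: str) -> subprocess.CompletedProcess:
    """Run `command` with CUDA_VISIBLE_DEVICES set only for that child process."""
    # Copy the current environment and override one variable for the child.
    child_env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_devices)
    return subprocess.run(
        command, env=child_env, capture_output=True, text=True, check=True
    )


# Stand-in for the real training command: a child that reports what it sees.
probe = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_devices(probe, "0,2")
print(result.stdout.strip())
```

Because the setting lives in the call that starts the run, it is visible next to the command it affects and does not leak into later runs.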