<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Accelerator selection [[accelerator-selection]]

During distributed training, you can specify the number and order of accelerators (CUDA, XPU, MPS, HPU, etc.) to use. This is useful when your accelerators differ in compute performance and you want to use the faster one first, or when you want to use only a subset of the available accelerators. The selection process works for both [DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) and [DataParallel](https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html). You don't need Accelerate or the [DeepSpeed integration](./main_classes/deepspeed).

This guide shows how to select the number of accelerators to use and the order in which to use them.

## Number of accelerators [[number-of-accelerators]]

For example, if there are 4 accelerators and you only want to use the first 2, run the command below.

<hfoptions id="select-accelerator">
<hfoption id="torchrun">

Use `--nproc_per_node` to select how many accelerators to use.

```bash
torchrun --nproc_per_node=2 trainer-program.py ...
```

</hfoption>
<hfoption id="Accelerate">

Use `--num_processes` to select how many accelerators to use.

```bash
accelerate launch --num_processes 2 trainer-program.py ...
```

</hfoption>
<hfoption id="🤗 DeepSpeed">

Use `--num_gpus` to select how many GPUs to use.

```bash
deepspeed --num_gpus 2 trainer-program.py ...
```

</hfoption>
</hfoptions>
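
To confirm from inside the training script how many processes were actually started, one option is to read the environment variables that `torchrun` sets for each worker. The sketch below is a minimal illustration, not part of the original guide: the helper name `launch_info` is hypothetical, and it assumes a single-node run, where `WORLD_SIZE` equals the value passed to `--nproc_per_node`.

```python
import os


def launch_info(env=None):
    """Return (world_size, local_rank) as seen by the current process.

    `launch_info` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Fall back to a single-process run when the variables are absent,
    # for example when the script is started with plain `python`.
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return world_size, local_rank


if __name__ == "__main__":
    world_size, local_rank = launch_info()
    print(f"local rank {local_rank}, world size {world_size}")
```

Launched with `torchrun --nproc_per_node=2`, each of the two workers should report a world size of 2 and its own local rank.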

## Order of accelerators [[order-of-accelerators]]

To select specific accelerators and their order, use the environment variable appropriate for your hardware. It is usually set on the command line for each run, but it can also be added to `~/.bashrc` or another startup config file.

For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run accelerators 0 and 2:

<hfoptions id="accelerator-type">
<hfoption id="CUDA">

```bash
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
```

Only GPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `cuda:0` and `cuda:1` respectively.

To reverse the order (use GPU 2 as `cuda:0` and GPU 0 as `cuda:1`):

```bash
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```

To run without any GPUs:

```bash
CUDA_VISIBLE_DEVICES= python trainer-program.py ...
```
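
The renumbering described above can be sketched in a few lines of Python. This is a simplified model written for illustration, not how the CUDA runtime is implemented: the helper name `logical_device_map` is hypothetical, and it handles only comma-separated integer indices, ignoring other forms the runtime accepts, such as device UUIDs.

```python
def logical_device_map(visible_devices, prefix="cuda"):
    """Map logical device names to physical indices for a mask like "0,2".

    Hypothetical helper: a simplified model of the renumbering only.
    """
    # An empty string yields no entries, which corresponds to hiding every device.
    physical_ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    # The position in the list becomes the logical index the framework sees.
    return {f"{prefix}:{logical}": physical
            for logical, physical in enumerate(physical_ids)}


print(logical_device_map("0,2"))  # cuda:0 is physical GPU 0, cuda:1 is physical GPU 2
print(logical_device_map("2,0"))  # reversed: cuda:0 is physical GPU 2
print(logical_device_map(""))     # no devices visible
```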

You can also control the order of CUDA devices using `CUDA_DEVICE_ORDER`:

- Order by PCIe bus ID (matches `nvidia-smi`):

```bash
export CUDA_DEVICE_ORDER=PCI_BUS_ID
```

- Order by compute capability (fastest first):

```bash
export CUDA_DEVICE_ORDER=FASTEST_FIRST
```
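
To see why the same physical GPU can end up with a different index under each setting, the toy model below sorts two made-up device records both ways. Everything in it is an assumption for illustration: the `Device` records, their bus numbers and compute capabilities, and the `enumerate_devices` helper are hypothetical, and the real ordering is decided by the CUDA runtime, not by this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str                  # hypothetical label
    pci_bus_id: int            # hypothetical PCIe bus number
    compute_capability: tuple  # hypothetical (major, minor)


def enumerate_devices(devices, order):
    """Return the devices in the order a given CUDA_DEVICE_ORDER would suggest."""
    if order == "PCI_BUS_ID":
        # Lowest bus number first, as in the `nvidia-smi` listing.
        return sorted(devices, key=lambda d: d.pci_bus_id)
    if order == "FASTEST_FIRST":
        # Highest compute capability first.
        return sorted(devices, key=lambda d: d.compute_capability, reverse=True)
    raise ValueError(f"unsupported order: {order}")


devices = [
    Device("older-gpu", pci_bus_id=1, compute_capability=(7, 0)),
    Device("newer-gpu", pci_bus_id=2, compute_capability=(8, 0)),
]

for order in ("PCI_BUS_ID", "FASTEST_FIRST"):
    names = [d.name for d in enumerate_devices(devices, order)]
    print(order, names)
```

In this made-up setup, index 0 refers to the older card under `PCI_BUS_ID` and to the newer card under `FASTEST_FIRST`, which is why a mask such as `CUDA_VISIBLE_DEVICES=0` can select different hardware depending on the ordering in effect.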

</hfoption>
<hfoption id="Intel XPU">

```bash
ZE_AFFINITY_MASK=0,2 torchrun trainer-program.py ...
```

Only XPUs 0 and 2 are "visible" to PyTorch, and they are mapped to `xpu:0` and `xpu:1` respectively.

To reverse the order (use XPU 2 as `xpu:0` and XPU 0 as `xpu:1`):

```bash
ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...
```

You can also control the order of Intel XPUs with:

```bash
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
```

For more information about device enumeration and sorting on Intel XPU, refer to the [Level Zero](https://github.com/oneapi-src/level-zero/blob/master/README.md?plain=1#L87) documentation.

</hfoption>
</hfoptions>

> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended, because it is easy to forget how a variable was set and end up using the wrong accelerators. Instead, the common practice is to set the environment variable for a specific training run on the same command line.
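
The same per-run scoping can be reproduced when a training job is launched from a Python script. The sketch below is one possible approach under stated assumptions: `run_with_visible_devices` is a hypothetical helper, and a small probe command stands in for `torchrun trainer-program.py ...` so the example runs without any accelerator.

```python
import os
import subprocess
import sys


def run_with_visible_devices(command, visible_devices):
    """Run `command` with CUDA_VISIBLE_DEVICES set for that child process only.

    Hypothetical helper: the parent environment is copied, never modified.
    """
    child_env = {**os.environ, "CUDA_VISIBLE_DEVICES": visible_devices}
    return subprocess.run(command, env=child_env, capture_output=True,
                          text=True, check=True)


# Remember the parent's value to show that it does not change.
before = os.environ.get("CUDA_VISIBLE_DEVICES")

# Stand-in for the real launcher command: a child that echoes the variable.
probe = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
result = run_with_visible_devices(probe, "0,2")

print("child sees:", result.stdout.strip())
print("parent unchanged:", os.environ.get("CUDA_VISIBLE_DEVICES") == before)
```

Because the variable lives only in the child's environment, a later run started from the same shell or script is unaffected by it.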