Transformers documentation

Build your own machine


One of the most important considerations when building a machine for deep learning is the GPU choice. GPUs are the standard workhorse for deep learning owing to their tensor cores, which perform very efficient matrix multiplication, and their high memory bandwidth. To train large models, you need a more powerful GPU, multiple GPUs, or techniques that offload some of the work to the CPU or NVMe.

This guide provides some practical tips for setting up a GPU for deep learning. For a more detailed discussion and comparison of GPUs, take a look at the Which GPU(s) to Get for Deep Learning blog post.

Power

High-end consumer GPUs may have two or three PCIe 8-pin power sockets, and you should connect a separate 12V PCIe 8-pin cable to each socket. Don’t use a pigtail cable, a single cable with two splits at one end, to connect two sockets or else you won’t get full performance from your GPU.

Each PCIe 8-pin power cable should be connected to a 12V rail on the power supply unit (PSU) and can deliver up to 150W. Other GPUs may use a PCIe 12-pin connector which can deliver up to 500-600W. Lower-end GPUs may only use a PCIe 6-pin connector which supplies up to 75W.

It is important that the PSU has stable voltage; otherwise, it may not be able to supply the GPU with enough power to function properly during peak usage.
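The per-connector limits above can be summed to sanity-check a power setup. The helper below is a hypothetical sketch (not part of Transformers) using only the wattage figures quoted in this section.

```python
# Hypothetical helper: estimate the maximum power the PCIe power cables can
# deliver to a GPU, using the per-connector limits quoted above.
CONNECTOR_WATTS = {
    "6-pin": 75,    # each PCIe 6-pin cable: up to 75W
    "8-pin": 150,   # each PCIe 8-pin cable: up to 150W
    "12-pin": 600,  # PCIe 12-pin connector: up to 500-600W (upper bound used here)
}

def max_cable_power(connectors):
    """Sum the rated wattage for a list of connector names."""
    return sum(CONNECTOR_WATTS[c] for c in connectors)

# A high-end consumer GPU with three 8-pin sockets:
print(max_cable_power(["8-pin", "8-pin", "8-pin"]))  # 450
```

If the total rated wattage is close to the GPU's peak draw, transient spikes can still trip the PSU, so leave headroom.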

Cooling

An overheated GPU throttles its performance and can even shut down to prevent damage if it gets too hot. Keeping the GPU temperature low, anywhere between 158 - 167F (70 - 75C), is essential for delivering full performance and maintaining its lifespan. Once temperatures reach 183 - 194F (about 84 - 90C), the GPU may begin to throttle performance.

Multi-GPU connectivity

When your setup uses multiple GPUs, it is important to consider how they’re connected. NVLink connections are faster than PCIe bridges, but you should also consider the parallelism strategy you’re using. For example, in DistributedDataParallel, GPUs communicate less frequently compared to ZeRO-DP. In this case, a slower connection is not as important.

Run the command below to check how your GPUs are connected.

nvidia-smi topo -m

NVLink is a high-speed communication system designed by NVIDIA for connecting multiple NVIDIA GPUs. Training openai-community/gpt2 on a small sample of the wikitext dataset is ~23% faster with NVLink.

On a machine with two GPUs connected with NVLink, an example output of nvidia-smi topo -m is shown below.

        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV2     0-23            N/A
GPU1    NV2      X      0-23            N/A

NV2 indicates GPU0 and GPU1 are connected by 2 NVLinks.
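The connectivity matrix can also be inspected programmatically. The sketch below is a hypothetical example (not a Transformers API) that parses output like the sample above to find GPU pairs joined by NVLink.

```python
# Hypothetical sketch: parse the connectivity matrix printed by
# `nvidia-smi topo -m` and collect GPU pairs joined by NVLink (NV1, NV2, ...).
sample = """\
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV2     0-23            N/A
GPU1    NV2      X      0-23            N/A
"""

def nvlink_pairs(topo_output):
    """Return (gpu_a, gpu_b, link) tuples for NVLink entries in the matrix."""
    lines = topo_output.strip().splitlines()
    headers = lines[0].split()  # column labels: GPU0, GPU1, ...
    pairs = []
    for row in lines[1:]:
        cells = row.split()
        row_gpu = cells[0]
        if not row_gpu.startswith("GPU"):
            continue
        for col_gpu, cell in zip(headers, cells[1:]):
            # Keep each pair once by only recording the upper triangle.
            if cell.startswith("NV") and row_gpu < col_gpu:
                pairs.append((row_gpu, col_gpu, cell))
    return pairs

print(nvlink_pairs(sample))  # [('GPU0', 'GPU1', 'NV2')]
```

On a machine without NVLink, the corresponding cells would show a PCIe-based label instead, and the function would return an empty list.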
