---
title: README
emoji: 📚
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---

# together we advance_AI

AI is increasingly pervasive across the modern world. It’s driving our smart technology in retail, cities, factories and healthcare, and transforming our digital homes. AMD offers advanced AI acceleration from data center to edge, enabling high performance and high efficiency to make the world smarter.

# Getting Started with Hugging Face Transformers

AMD’s Ryzen™ AI family of laptop processors provides users with an integrated Neural Processing Unit (NPU) that offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime, combined with quantization tools and a pre-optimized model zoo. All of this is built on AMD XDNA™ architecture, which is purpose-built to run AI workloads efficiently and locally and offers a host of benefits to developers building the next groundbreaking AI app.

Details on getting started with Hugging Face models are available on the [Optimum page](https://huggingface.co/docs/optimum/main/en/amd/index).

The following section describes how to run the most common Hugging Face transformer models for inference on select AMD Instinct™ accelerators and AMD Radeon™ GPUs using the AMD ROCm software ecosystem. This foundation can then be used to fine-tune a base model or even to develop your own model. General Linux and ML experience is required.

## 1. Confirm you have a supported AMD hardware platform

Check whether your [hardware is supported](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html#gpu-support-table) by ROCm.

## 2. Install ROCm driver, libraries and tools

Follow the detailed [installation instructions](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html) for your Linux-based platform.

## 3. Install Machine Learning Frameworks

Pip installation is an easy way to acquire all the required packages and is described in more detail below.

> If you prefer a container-based workflow, check out the pre-built images on [ROCm Docker Hub](https://hub.docker.com/u/rocm/) and [AMD Infinity Hub](https://www.amd.com/en/technologies/infinity-hub) after installing the required [dependencies](https://rocm.docs.amd.com/en/latest/deploy/docker.html).

### PyTorch

AMD ROCm is fully integrated into the mainline PyTorch ecosystem. Pip wheels are built and tested as part of the stable and nightly releases. Go to [pytorch.org](https://pytorch.org) and use the 'Install PyTorch' widget. Select 'Stable + Linux + Pip + Python + ROCm' to get the specific pip installation command.

An example command (note that the ROCm version is part of the wheel index URL):

> `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2`

### TensorFlow

AMD ROCm is upstreamed into the TensorFlow GitHub repository. Pre-built wheels are hosted on [pypi.org](https://pypi.org/project/tensorflow-rocm/). The latest version can be installed with this command:

> `pip install tensorflow-rocm`

## 4. Use a Hugging Face Model

Now that you have the base requirements installed, install the Hugging Face Transformers library:

> `pip install transformers`

This allows you to easily import any of the base models into your Python application. Here is an example using [GPT2](https://huggingface.co/gpt2) in PyTorch:

```python
from transformers import GPT2Tokenizer, GPT2Model

# Download (on first use) and load the GPT2 tokenizer and base model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2Model.from_pretrained('gpt2')

# Tokenize the input text into PyTorch tensors ('pt')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')

# Forward pass; output.last_hidden_state holds the final hidden states
output = model(**encoded_input)
```

All of the 200+ standard transformer models are regularly tested on our supported hardware platforms. This implies that derivatives of those core models should also function correctly.
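Before moving on to larger models, it can help to confirm that the installed PyTorch wheel is actually a ROCm build and that it can see your GPU. The sketch below is an illustrative check, not an official ROCm tool, and the function name `rocm_report` is hypothetical. It relies on two properties of ROCm builds of PyTorch: `torch.version.hip` holds a version string (it is `None` on non-ROCm builds), and AMD GPUs are exposed through the familiar `torch.cuda` API.

```python
import importlib.util


def rocm_report() -> dict:
    """Hypothetical helper: report whether PyTorch is installed, whether it
    is a ROCm (HIP) build, and which GPUs it can see."""
    report = {"torch_installed": False, "hip_version": None, "gpus": []}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch_installed"] = True
    # torch.version.hip is a version string on ROCm builds and None otherwise
    report["hip_version"] = getattr(torch.version, "hip", None)
    # ROCm builds expose AMD GPUs through the torch.cuda API
    if torch.cuda.is_available():
        report["gpus"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    print(rocm_report())
```

If `hip_version` is set and `gpus` is non-empty, the GPT2 example can be moved onto the GPU by calling `model.to("cuda")` and `encoded_input.to("cuda")` before the forward pass.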
Let us know if you run into issues on our [ROCm Community page](https://github.com/RadeonOpenCompute/ROCm/discussions).

Here are a few of the more popular models to get you started:

- [BERT](https://huggingface.co/bert-base-uncased)
- [BLOOM](https://huggingface.co/bigscience/bloom)
- [LLaMA](https://huggingface.co/huggyllama/llama-7b)
- [OPT](https://huggingface.co/facebook/opt-66b)
- [T5](https://huggingface.co/t5-base)

On each model page, click the 'Use in Transformers' button to see the exact code for importing that model into your Python application.

## 5. Optimum Support

For a deeper dive into using Hugging Face libraries on AMD GPUs, check out the [Optimum](https://huggingface.co/docs/optimum/main/en/amd/amdgpu/overview) page, which covers Flash Attention 2, GPTQ quantization and ONNX Runtime integration.

# Serving a model with TGI

Text Generation Inference (TGI) provides an end-to-end solution for deploying large language models for inference at scale. TGI is already usable in production on AMD Instinct™ GPUs through the Docker image `ghcr.io/huggingface/text-generation-inference:1.2-rocm`. Refer to the [documentation](https://huggingface.co/docs/text-generation-inference/supported_models#supported-hardware) for supported hardware and any limitations.

# Benchmarking

[Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) is a utility for easily benchmarking the performance of transformers on AMD GPUs, in both single-device and distributed settings, with various supported optimizations and quantization schemes.
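Once the TGI container from the serving section is running, it can be queried over plain HTTP. The sketch below uses only the Python standard library to call TGI's `/generate` endpoint, which accepts a JSON body with `inputs` and `parameters` and returns the completion under `generated_text`. The address `http://localhost:8080` is an assumption (for example, a container started with a port mapping such as `-p 8080:80`), and the helper names `build_generate_request` and `generate` are hypothetical; adjust both to your deployment.

```python
import json
import urllib.request

# Assumption: a TGI server is reachable here; change to match your deployment.
TGI_URL = "http://localhost:8080"


def build_generate_request(
    prompt: str, max_new_tokens: int = 32, base_url: str = TGI_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for TGI's /generate endpoint."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    prompt: str, max_new_tokens: int = 32, base_url: str = TGI_URL, timeout: float = 60.0
) -> str:
    """Send the prompt to a running TGI server and return the generated text."""
    req = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["generated_text"]
```

For example, `generate("What is ROCm?", max_new_tokens=64)` returns the completion as a string. Building the request separately from sending it keeps the payload easy to inspect without a running server.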
# Useful Links and Blogs

- Detailed Llama-2 results showcasing [Optimum-Benchmark on AMD Instinct MI250](https://huggingface.co/blog/huggingface-and-optimum-amd)
- Check out our blog post [Run a Chatgpt-like Chatbot on a Single GPU with ROCm](https://huggingface.co/blog/chatbot-amd-gpu)
- Complete ROCm [documentation](https://rocm.docs.amd.com/en/latest/) for installation and usage
- Find extended training content and connect with the development community at the [Developer Hub](https://www.amd.com/en/developer/rocm-hub.html)