Added link and description about Optimum support for AMD GPUs

README.md CHANGED

@@ -93,6 +93,10 @@ Here are a few of the more popular ones to get you started:

Click on the 'Use in Transformers' button to see the exact code to import a specific model into your Python application.
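
The generated snippet varies by model, but for a typical text-generation model it has roughly the following shape. This is a minimal sketch rather than the button's exact output; the small public `gpt2` checkpoint stands in for whichever model you choose:

```python
# Rough shape of a 'Use in Transformers' snippet for a text-generation model.
# "gpt2" is a stand-in model id; substitute the checkpoint you selected.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```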

## 5. Optimum Support

For a deeper dive into using Hugging Face libraries on AMD GPUs, check out the [Optimum](https://huggingface.co/docs/optimum/main/en/amd/amdgpu/overview) page, which describes Flash Attention 2, GPTQ quantization, and ONNX Runtime integration in detail.
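
As one illustration of what that page covers, here is a sketch of enabling Flash Attention 2 when loading a model with Transformers. It assumes a recent `transformers` release that accepts the `attn_implementation` argument, a ROCm build of PyTorch, and a Flash Attention 2 installation compatible with your AMD GPU; the model id is a placeholder:

```python
# Sketch: loading a causal LM with Flash Attention 2, one of the features
# described on the Optimum AMD page. Assumes ROCm PyTorch plus a compatible
# flash-attn build; "your-org/your-model" is a placeholder model id.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",                    # placeholder checkpoint
    torch_dtype=torch.float16,                # FA2 requires fp16 or bf16 weights
    attn_implementation="flash_attention_2",
).to("cuda")                                  # ROCm devices appear as "cuda" in PyTorch
```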
# Serving a model with TGI

Text Generation Inference (a.k.a. “TGI”) provides an end-to-end solution to deploy large language models for inference at scale.
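
Once a TGI server is up, clients can query it over HTTP. The following is a minimal sketch using the `InferenceClient` from `huggingface_hub`, assuming a TGI instance is already running and listening on `localhost:8080` (the port and launch method are deployment-specific):

```python
# Sketch: querying an already-running TGI server from Python.
# Assumes TGI is serving on http://localhost:8080; adjust for your deployment.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
result = client.text_generation("What is ROCm?", max_new_tokens=64)
print(result)
```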