---
license: llama2
language:
- en
pipeline_tag: text-generation
tags:
- facebook
- meta
- llama
- llama-2
- ONNX
- DirectML
- DML
- conversational
- ONNXRuntime
- custom_code
---

# Llama-2-13b-chat ONNX models for DirectML

This repository hosts the optimized versions of [meta-llama/Llama-2-13b-chat-hf](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) to accelerate inference with ONNX Runtime for DirectML.
## Usage on Windows (Intel / AMD / Nvidia / Qualcomm)

```powershell
# Create and activate a dedicated Python environment
conda create -n onnx python=3.10
conda activate onnx

# Install Git LFS and the Hugging Face CLI, then download the optimized model
winget install -e --id GitHub.GitLFS
pip install huggingface-hub[cli]
huggingface-cli download EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml --local-dir .\llama-2-13b-chat

# Pin numpy and fetch the example chat script from onnxruntime-genai
pip install numpy==1.26.4
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"

# Install the DirectML builds of ONNX Runtime and ONNX Runtime GenAI,
# plus the Visual C++ runtime they depend on
pip install onnxruntime-directml
pip install --pre onnxruntime-genai-directml
conda install conda-forge::vs2015_runtime

# Run the interactive chat example against the downloaded model
python phi3-qa.py -m .\llama-2-13b-chat
```
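
Despite its name, `phi3-qa.py` is a generic onnxruntime-genai chat loop, so it works with this Llama-2 model as well. If you would rather embed generation in your own code, the sketch below shows roughly the same loop written directly against the `onnxruntime_genai` Python API. The prompt, `max_length`, and model path are assumptions to adapt, and the exact API surface varies between onnxruntime-genai releases:

```python
# Minimal generation-loop sketch with onnxruntime-genai (API may differ by release).
# Assumes the model was downloaded to .\llama-2-13b-chat as in the steps above.
import onnxruntime_genai as og

model = og.Model("llama-2-13b-chat")         # loads genai_config.json + ONNX weights
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()  # incremental detokenizer for streaming

# Llama-2 chat models expect the [INST] ... [/INST] prompt format
prompt = "[INST] What is DirectML? [/INST]"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)     # assumption: cap on total tokens
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    # stream each new token to stdout as it is produced
    print(tokenizer_stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```

Run this from the directory that contains the downloaded model folder, or pass a different path to `og.Model` if you stored it elsewhere.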
## What is DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
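
Under the hood, the onnxruntime-directml wheel exposes DirectML to ONNX Runtime as an execution provider named `DmlExecutionProvider`. A minimal sketch of selecting it for an arbitrary ONNX model (`model.onnx` is a placeholder path):

```python
# Minimal sketch: open an ONNX Runtime session on the DirectML execution provider.
# Requires the onnxruntime-directml wheel; "model.onnx" is a placeholder path.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],  # prefer DirectML, fall back to CPU
)
print(session.get_providers())  # confirms which providers the session actually uses
```

The onnxruntime-genai-directml package used in the usage example handles this selection for you, based on the model's `genai_config.json`, so no explicit provider configuration is needed there.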