ssyok committed on
Commit 01b5826
1 Parent(s): 43a4915

Update README.md

  1. README.md +116 -3
README.md CHANGED
@@ -1,3 +1,116 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ pipeline_tag: text-generation
4
+ tags:
5
+ - ONNX
6
+ - DML
7
+ - ONNXRuntime
8
+ - phi3
9
+ - nlp
10
+ - conversational
11
+ - custom_code
12
+ inference: false
13
+ language:
14
+ - en
15
+ ---
16
+ # EmbeddedLLM/Phi-3-mini-4k-instruct-062024 ONNX
17
+
18
+ ## Model Summary
19
+
20
+ This model is an ONNX-optimized version of [microsoft/Phi-3-mini-4k-instruct (June 2024)](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), designed to accelerate inference on a variety of hardware using ONNX Runtime (CPU and DirectML).
21
+ DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning, providing GPU acceleration for a wide range of supported hardware and drivers, including AMD, Intel, NVIDIA, and Qualcomm GPUs.
22
+
23
+ ## ONNX Models
24
+
25
+ Here are some of the optimized configurations we have added:
26
+ - **ONNX model for int4 DirectML:** ONNX model for AMD, Intel, and NVIDIA GPUs on Windows, quantized to int4 using AWQ.
27
+
28
+ ## Usage
29
+
30
+ ### Installation and Setup
31
+
32
+ To use the EmbeddedLLM/Phi-3-mini-4k-instruct-062024 ONNX model on Windows with DirectML, follow these steps:
33
+
34
+ 1. **Create and activate a Conda environment:**
35
+ ```sh
36
+ conda create -n onnx python=3.10
37
+ conda activate onnx
38
+ ```
39
+
40
+ 2. **Install Git LFS:**
41
+ ```sh
42
+ winget install -e --id GitHub.GitLFS
43
+ ```
44
+
45
+ 3. **Install Hugging Face CLI:**
46
+ ```sh
47
+ pip install huggingface-hub[cli]
48
+ ```
49
+
50
+ 4. **Download the model:**
51
+ ```sh
52
+ huggingface-cli download EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx --include="onnx/directml/Phi-3-mini-4k-instruct-062024-int4/*" --local-dir .\Phi-3-mini-4k-instruct-062024-int4
53
+ ```
54
+
55
+ 5. **Install necessary Python packages:**
56
+ ```sh
57
+ pip install numpy==1.26.4
58
+ pip install onnxruntime-directml
59
+ pip install --pre onnxruntime-genai-directml==0.3.0
60
+ ```
61
+
62
+ 6. **Install Visual Studio 2015 runtime:**
63
+ ```sh
64
+ conda install conda-forge::vs2015_runtime
65
+ ```
66
+
67
+ 7. **Download the example script:**
68
+ ```powershell
69
+ Invoke-WebRequest -Uri "https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py" -OutFile "phi3-qa.py"
70
+ ```
71
+
72
+ 8. **Run the example script:**
73
+ ```sh
74
+ python phi3-qa.py -m .\Phi-3-mini-4k-instruct-062024-int4\onnx\directml\Phi-3-mini-4k-instruct-062024-int4
75
+ ```
76
+
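`huggingface-cli download` keeps the repository's folder layout under `--local-dir`, so the folder that `-m` should point at may sit a few levels below the download directory. The sketch below is one way to locate it, assuming the model folder is the one that contains a `genai_config.json` file (the file ONNX Runtime GenAI reads when loading a model). The function name `find_model_dirs` is hypothetical and not part of any library.

```python
from pathlib import Path


def find_model_dirs(root):
    """Return every folder under `root` that contains a genai_config.json.

    Assumption: a folder holding genai_config.json is a loadable model folder.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing downloaded yet (or wrong path): report no candidates.
        return []
    # rglob walks the tree recursively; each match's parent is a candidate.
    return sorted(p.parent for p in root.rglob("genai_config.json"))


if __name__ == "__main__":
    # The directory name matches the --local-dir used in step 4.
    for folder in find_model_dirs("Phi-3-mini-4k-instruct-062024-int4"):
        print(folder)
```

Any folder this prints is a candidate value for the `-m` argument in step 8.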
77
+ ### Hardware Requirements
78
+
79
+ **Minimum Configuration:**
80
+ - **Windows:** DirectX 12-capable GPU (AMD/NVIDIA)
81
+ - **CPU:** x86_64 / ARM64
+
82
+ **Tested Configurations:**
83
+ - **GPU:** AMD Ryzen 8000 Series iGPU (DirectML)
84
+ - **CPU:** AMD Ryzen CPU
85
+
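As a rough illustration of the minimum configuration, the sketch below checks only the operating system and CPU architecture from Python. It does not detect whether a DirectX 12-capable GPU is present, and the helper name `meets_minimum` and the set of accepted architecture strings are assumptions made for this example.

```python
import platform

# Architecture strings treated as x86_64 / ARM64 (assumed spellings;
# Windows typically reports "AMD64" or "ARM64").
SUPPORTED_MACHINES = {"amd64", "x86_64", "arm64"}


def meets_minimum(system, machine):
    """Return True when the OS is Windows and the CPU architecture is supported.

    This covers only the OS and CPU lines of the requirements, not the GPU.
    """
    return system == "Windows" and machine.lower() in SUPPORTED_MACHINES


if __name__ == "__main__":
    print(meets_minimum(platform.system(), platform.machine()))
```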
86
+ ## Model Description
87
+ - **Developed by:** Microsoft
88
+ - **Model type:** ONNX
89
+ - **Language(s) (NLP):** English
90
+ - **License:** MIT
91
+ - **Model Description:** This model is a conversion of Phi-3-mini-4k-instruct-062024 for ONNX Runtime inference, optimized for DirectML.
92
+
93
+ ## Performance Metrics
94
+
95
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
96
+ ### DirectML
97
+ We measured DirectML performance on an AMD Ryzen 9 7940HS with Radeon 78
98
+
99
+ | Prompt Length (tokens) | Generation Length (tokens) | Average Throughput (tokens/s) |
100
+ |---------------------------|-------------------|-----------------------------|
101
+ | 128 | 128 | 53.46686 |
102
+ | 128 | 256 | 53.11233 |
103
+ | 128 | 512 | 57.45816 |
104
+ | 128 | 1024 | 33.44713 |
105
+ | 256 | 128 | 76.50182 |
106
+ | 256 | 256 | 66.68873 |
107
+ | 256 | 512 | 70.83862 |
108
+ | 256 | 1024 | 34.64715 |
109
+ | 512 | 128 | 85.10079 |
110
+ | 512 | 256 | 68.64049 |
111
+ | 512 | 512 | - |
112
+ | 512 | 1024 | - |
113
+ | 1024 | 128 | - |
114
+ | 1024 | 256 | - |
115
+ | 1024 | 512 | - |
116
+ | 1024 | 1024 | - |
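The table does not say how throughput was measured. As a hedged sketch, a tokens-per-second figure can be obtained by timing a generation call and dividing the token count by the elapsed time. The helpers `tokens_per_second` and `time_generation` are hypothetical names, and `generate` stands in for whatever function actually runs the model and returns the number of tokens it produced.

```python
import time


def tokens_per_second(token_count, elapsed_seconds):
    """Throughput as tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def time_generation(generate, *args):
    """Call generate(*args) and return its throughput in tokens per second.

    Assumption: generate returns the number of tokens it produced.
    """
    start = time.perf_counter()
    produced = generate(*args)
    return tokens_per_second(produced, time.perf_counter() - start)
```

If the reported figure counts generated tokens only, which the table does not confirm, the first row would correspond to roughly 128 / 53.46686 ≈ 2.4 seconds for 128 tokens.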