---
pipeline_tag: text-generation
tags:
- directml
- windows
---

# Model Card for Model ID

## Model Details

google/gemma-2-9b quantized to ONNX GenAI INT4 with Microsoft DirectML optimization.<br>
The output is reformatted so that each sentence starts on a new line, to improve readability.
<pre>
...
vNewDecoded = tokenizer_stream.decode(new_token)
if re.findall("^[\x2E\x3A\x3B]$", vPreviousDecoded) and vNewDecoded.startswith(" ") and (not vNewDecoded.startswith(" *")):
    vNewDecoded = "\n" + vNewDecoded.replace(" ", "", 1)
print(vNewDecoded, end='', flush=True)
vPreviousDecoded = vNewDecoded
...
</pre>
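The excerpt's sentence-splitting logic can be sketched as a self-contained function (`reformat_stream` is a hypothetical helper name; the regex and replacement mirror the excerpt above):

```python
import re

def reformat_stream(pieces):
    """Join decoded token pieces, starting a new line after '.', ':' or ';'.

    Mirrors the excerpt: when the previously emitted piece was exactly
    '.', ':' or ';' and the new piece starts with a space (but not a
    ' *' bullet), the leading space is replaced with a newline.
    """
    out = []
    previous = ""
    for piece in pieces:
        if re.findall(r"^[\x2E\x3A\x3B]$", previous) and piece.startswith(" ") and not piece.startswith(" *"):
            piece = "\n" + piece.replace(" ", "", 1)
        out.append(piece)
        previous = piece
    return "".join(out)
```

For example, `reformat_stream(["Hello", ".", " World"])` yields `"Hello.\nWorld"`, while a bullet continuation such as `" * item"` is left on the same line.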

<img src="https://zci.sourceforge.io/epub/gemmaonnx.png">

### Model Description

google/gemma-2-9b quantized to ONNX GenAI INT4 with Microsoft DirectML optimization:<br>
https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py:<br>
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

Build options:<br>
INT4 accuracy level: FP32 (float32)

- **Developed by:** Mochamad Aris Zamroni

### Model Sources [optional]
https://huggingface.co/google/gemma-2-9b
### Direct Use
This is a Microsoft Windows DirectML optimized model.<br>
It might not work with ONNX execution providers other than DmlExecutionProvider.<br>
The needed Python scripts are included in this repository.

Prerequisites:<br>
1. Install Python 3.11 from the Windows Store:<br>
https://apps.microsoft.com/search/publisher?name=Python+Software+Foundation

2. Open the command line (cmd.exe).

3. Create a Python virtual environment, activate it, then install onnxruntime-genai-directml:<br>
mkdir c:\temp<br>
cd c:\temp<br>
python -m venv dmlgenai<br>
dmlgenai\Scripts\activate.bat<br>
pip install onnxruntime-genai-directml

4. Use onnxgenairun.py to get a chat interface.<br>
It is a modified version of https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py.<br>
The modification starts a new line after ".", ":" and ";" so the output is easier to read.

rem Change directory to where the model and script files are stored<br>
cd this_onnx_model_directory<br>
python onnxgenairun.py --help<br>
python onnxgenairun.py -m . -v -g
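For orientation, the core generation loop of such a script looks roughly like the sketch below. This is an assumption based on the onnxruntime-genai examples, not the actual contents of onnxgenairun.py; API names (e.g. `append_tokens` vs. the older `params.input_ids` plus `compute_logits()`) vary between onnxruntime-genai releases, and the prompt and paths are placeholders.

```python
import onnxruntime_genai as og

# Placeholder path: the directory containing model.onnx and genai_config.json.
model = og.Model(".")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)  # placeholder value
generator = og.Generator(model, params)

# Gemma-2 chat template around a placeholder user message.
prompt = "<start_of_turn>user\nHello<end_of_turn>\n<start_of_turn>model\n"
generator.append_tokens(tokenizer.encode(prompt))

while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
```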

5. (Optional but recommended) Device-specific optimization:<br>
a. Open "dml-device-specific-optim.py" in a text editor and change the file paths accordingly.<br>
b. Run the script: python dml-device-specific-optim.py<br>
c. Rename the original model.onnx to another name, then rename the optimized ONNX file from step 5.b to model.onnx.<br>
d. Rerun step 4.
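The optimization step can be sketched with plain onnxruntime, which serializes the graph after provider-specific optimization via `SessionOptions.optimized_model_filepath`. This is an assumed outline of what dml-device-specific-optim.py does, with placeholder file names; the actual script may differ.

```python
import onnxruntime as ort

# Assumed sketch; file names are placeholders, adjust for your setup.
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
so.optimized_model_filepath = "model_optimized.onnx"

# Creating the session runs the optimizer for the selected execution
# provider and writes the device-specific graph to optimized_model_filepath.
ort.InferenceSession("model.onnx", so, providers=["DmlExecutionProvider"])
```

Note that a graph optimized at `ORT_ENABLE_ALL` includes hardware-specific transformations, which is why the result is tied to the device it was produced on.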

#### Speeds, Sizes, Times [optional]
6 tokens/s on a Radeon 780M with 8 GB of pre-allocated RAM.

#### Hardware
AMD Ryzen 7840U (Zen 4) with integrated Radeon 780M GPU<br>
RAM: 32 GB

#### Software
Microsoft DirectML on Windows 10

## Model Card Authors [optional]
Mochamad Aris Zamroni

## Model Card Contact
https://www.linkedin.com/in/zamroni/