---
pipeline_tag: text-generation
tags:
- directml
- windows
---

# Model Card for Model ID

## Model Details
google/gemma-2-9b quantized to ONNX GenAI INT4 with Microsoft DirectML optimization.<br>
The chat script reformats the output so that each sentence starts on a new line, which improves readability:
<pre>
...
vNewDecoded = tokenizer_stream.decode(new_token)
# After ".", ":" or ";", turn the next token's leading space into a newline,
# unless the token starts a bullet (" *")
if re.findall("^[\x2E\x3A\x3B]$", vPreviousDecoded) and vNewDecoded.startswith(" ") and (not vNewDecoded.startswith(" *")):
    vNewDecoded = "\n" + vNewDecoded.replace(" ", "", 1)
print(vNewDecoded, end='', flush=True)
vPreviousDecoded = vNewDecoded
...
</pre>
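The reformatting logic above can be exercised standalone. A minimal sketch where `tokens` simulates the decoded stream (in the real script these values come from `tokenizer_stream.decode()`, which is not used here):

```python
import re

# Simulated per-token output, as tokenizer_stream.decode() might yield it
tokens = ["Gemma", " is", " a", " model", ".", " It", " runs", " on", " DirectML", "."]

pieces = []
vPreviousDecoded = ""
for vNewDecoded in tokens:
    # After ".", ":" or ";", turn the next token's leading space into a newline,
    # unless the token starts a bullet (" *")
    if re.findall("^[\x2E\x3A\x3B]$", vPreviousDecoded) and vNewDecoded.startswith(" ") and not vNewDecoded.startswith(" *"):
        vNewDecoded = "\n" + vNewDecoded.replace(" ", "", 1)
    pieces.append(vNewDecoded)
    vPreviousDecoded = vNewDecoded

print("".join(pieces))  # the second sentence starts on its own line
```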

<img src="https://zci.sourceforge.io/epub/gemmaonnx.png">
### Model Description
google/gemma-2-9b quantized to ONNX GenAI INT4 with Microsoft DirectML optimization.<br>
https://onnxruntime.ai/docs/genai/howto/install.html#directml

Created using ONNX Runtime GenAI's builder.py:<br>
https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/src/python/py/models/builder.py

Build options:<br>
INT4 accuracy level: FP32 (float32)

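The build invocation may have looked roughly like the following. This is a sketch, not the exact command used: flag names follow builder.py's help output, and `int4_accuracy_level=1` is the assumed spelling of the FP32 accuracy option.

```shell
# Hypothetical invocation; verify flags with: python builder.py --help
python builder.py \
  -m google/gemma-2-9b \
  -o ./gemma-2-9b-onnx-int4-dml \
  -p int4 \
  -e dml \
  --extra_options int4_accuracy_level=1   # assumed mapping: 1 = FP32 accuracy
```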
- **Developed by:** Mochamad Aris Zamroni

### Model Sources
https://huggingface.co/google/gemma-2-9b

### Direct Use
This is a Microsoft Windows DirectML-optimized model.<br>
It may not work with ONNX Runtime execution providers other than DmlExecutionProvider.<br>
The required Python scripts are included in this repository.

Prerequisites:<br>
1. Install Python 3.11 from the Microsoft Store:<br>
https://apps.microsoft.com/search/publisher?name=Python+Software+Foundation

2. Open a command prompt (cmd.exe).

3. Create a Python virtual environment, activate it, then install onnxruntime-genai-directml:<br>
mkdir c:\temp<br>
cd c:\temp<br>
python -m venv dmlgenai<br>
dmlgenai\Scripts\activate.bat<br>
pip install onnxruntime-genai-directml

4. Use onnxgenairun.py to get a chat interface.<br>
It is a modified version of https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi3-qa.py.<br>
The modification moves the output to a new line after ".", ":" and ";" so the text is easier to read.

rem Change directory to where the model and script files are stored<br>
cd this_onnx_model_directory<br>
python onnxgenairun.py --help<br>
python onnxgenairun.py -m . -v -g

5. (Optional but recommended) Device-specific optimization.<br>
a. Open dml-device-specific-optim.py in a text editor and change the file paths accordingly.<br>
b. Run the script: python dml-device-specific-optim.py<br>
c. Rename the original model.onnx to another file name, then rename the optimized ONNX file from step 5b to model.onnx.<br>
d. Rerun step 4.
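In cmd.exe, steps 5b through 5d might look like the following. The filenames are hypothetical: the optimized file's actual name depends on the paths you set inside dml-device-specific-optim.py in step 5a.

```shell
rem Hypothetical filenames; adjust to match the paths edited in step 5a
python dml-device-specific-optim.py
ren model.onnx model.onnx.bak
ren model_optimized.onnx model.onnx
rem Rerun the chat script (step 4)
python onnxgenairun.py -m . -v -g
```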

#### Speeds, Sizes, Times
6 tokens/s on a Radeon 780M with 8 GB of pre-allocated RAM.

#### Hardware
AMD Ryzen 7840U (Zen 4) with integrated Radeon 780M GPU<br>
32 GB RAM

#### Software
Microsoft DirectML on Windows 10

## Model Card Authors
Mochamad Aris Zamroni

## Model Card Contact
https://www.linkedin.com/in/zamroni/