kevinbazira committed
Commit d03bab8 (verified) · Parent: 992986e

Upload README.md with huggingface_hub

Files changed (1):
  README.md (+4 -3)
README.md CHANGED
@@ -55,7 +55,6 @@ This repository contains a quantized version of the `CohereForAI/aya-expanse-8b`
 
 Before using the quantized model, please ensure your environment has:
 - [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ)
-- [optimum](https://github.com/huggingface/optimum)
 
 ### 2. Run inference
 Load and use the quantized model as shown below in Python:
@@ -99,7 +98,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 
 ## Benchmark Results
 
-To evaluate the performance of the quantized model, we run benchmarks using the Hugging Face [Optimum Benchmark](https://github.com/huggingface/optimum-benchmark/tree/7cec62e016d76fe612308e4c2c074fc7f09289fd) tool on an AMD MI200 GPU with ROCm 6.1 and below are the results:
+To evaluate the performance of the quantized model, we run benchmarks using the Hugging Face [Optimum Benchmark](https://github.com/huggingface/optimum-benchmark/tree/7cec62e016d76fe612308e4c2c074fc7f09289fd) tool on an AMD MI210 GPU with ROCm 6.1 and below are the results:
 
 ### Unquantized Model Results:
 <img src="unquantized-model-results.png" alt="Unquantized Model Results" style="width: 100%; object-fit: cover; display: block;">
@@ -112,4 +111,6 @@ These results show that the GPTQ quantized model offers significant speed advant
 ## More Information
 
 - **Original Model**: For details about the original model's architecture, training dataset, and performance, please visit the CohereForAI [aya-expanse-8b model card](https://huggingface.co/CohereForAI/aya-expanse-8b).
-- **Support or inquiries**: If you run into any issues or have questions about the quantized model, feel free to reach me via email:`[email protected]`. I'll be happy to help!
+- **Support or inquiries**: If you run into any issues or have questions about the quantized model, feel free to reach me via email: `[email protected]`. I'll be happy to help!
+
+
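The second hunk's context ends with `print(tokenizer.decode(outputs[0], skip_special_tokens=True))`, so the README's "2. Run inference" step (not fully visible in this diff) loads the GPTQ checkpoint through `transformers`/AutoGPTQ. A minimal sketch of what that step typically looks like; the repo id and prompt are placeholders, not taken from this commit:

```python
# Hypothetical sketch of the README's inference step. The model repo id
# below is a placeholder -- the actual quantized-model id is not shown in
# this diff. Requires the AutoGPTQ kernels mentioned in the prerequisites.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/aya-expanse-8b-gptq"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt, generate, and decode -- matching the final
# print(tokenizer.decode(...)) line visible in the diff context.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This sketch assumes the quantization config is embedded in the checkpoint, so `from_pretrained` picks up the GPTQ weights automatically.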