Before using the quantized model, please ensure your environment has:

- [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ)

### 2. Run inference

Load and use the quantized model as shown below in Python:
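The inference snippet itself is elided in this diff view; only its final `print` line survives as diff context. A minimal sketch of what loading a GPTQ checkpoint with `transformers` typically looks like is below — the repository id and prompt are placeholders, not taken from the original, and the code assumes `accelerate` plus the AutoGPTQ dependency listed above are installed:

```python
# Sketch: load a GPTQ-quantized causal LM with transformers.
# The repository id is a placeholder; replace it with this repo's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-quantized-repo-id>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on the available GPU(s)
    torch_dtype=torch.float16,  # GPTQ kernels run activations in fp16
)

# Build a chat-formatted prompt with the model's own chat template.
messages = [{"role": "user", "content": "Translate 'hello' into Spanish."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is a sketch of the standard `transformers` flow for GPTQ checkpoints, not the repository's exact snippet; the quantization config stored in the checkpoint is what makes `from_pretrained` load the quantized weights.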
## Benchmark Results

To evaluate the performance of the quantized model, we ran benchmarks using the Hugging Face [Optimum Benchmark](https://github.com/huggingface/optimum-benchmark/tree/7cec62e016d76fe612308e4c2c074fc7f09289fd) tool on an AMD MI210 GPU with ROCm 6.1. The results are below:
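The README does not show how the benchmark was invoked. Optimum Benchmark is driven by Hydra-style YAML configs; a rough sketch of such a config follows. Every key and value here is an illustrative assumption — the exact schema may differ at the pinned revision linked above, and the model id is a placeholder:

```yaml
# Illustrative Optimum Benchmark config (schema may differ at the pinned commit).
defaults:
  - backend: pytorch    # run through the PyTorch backend
  - scenario: inference # measure generation latency/throughput
  - launcher: process   # isolate the run in a separate process
  - _self_

name: aya_expanse_8b_gptq

backend:
  device: cuda           # ROCm devices are exposed as "cuda" in PyTorch
  model: <this-repo-id>  # placeholder for the quantized checkpoint
```

Assuming the tool's documented Hydra CLI, such a config would be run with something like `optimum-benchmark --config-dir . --config-name aya_expanse_8b_gptq`.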
### Unquantized Model Results:

<img src="unquantized-model-results.png" alt="Unquantized Model Results" style="width: 100%; object-fit: cover; display: block;">
These results show that the GPTQ quantized model offers significant speed advantages.

## More Information

- **Original Model**: For details about the original model's architecture, training dataset, and performance, please visit the CohereForAI [aya-expanse-8b model card](https://huggingface.co/CohereForAI/aya-expanse-8b).
- **Support or inquiries**: If you run into any issues or have questions about the quantized model, feel free to reach me via email: `[email protected]`. I'll be happy to help!