Before using the quantized model, please ensure your environment has:

- [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ)

### 2. Run inference

Load and use the quantized model as shown below in Python:
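The inference snippet itself is elided in this diff view; only its final `print` line survives as diff context. A minimal sketch of what loading a GPTQ checkpoint with `transformers` typically looks like is below — the repository id and prompt are placeholders, not taken from the original, and the code assumes `accelerate` plus the AutoGPTQ dependency listed above are installed:

```python
# Sketch: load a GPTQ-quantized causal LM with transformers.
# The repository id is a placeholder; replace it with this repo's id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<this-quantized-repo-id>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on the available GPU(s)
    torch_dtype=torch.float16,  # GPTQ kernels run activations in fp16
)

# Build a chat-formatted prompt with the model's own chat template.
messages = [{"role": "user", "content": "Translate 'hello' into Spanish."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is a sketch of the standard `transformers` flow for GPTQ checkpoints, not the repository's exact snippet; the quantization config stored in the checkpoint is what makes `from_pretrained` load the quantized weights.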
## Benchmark Results

To evaluate the performance of the quantized model, we ran benchmarks using the Hugging Face [Optimum Benchmark](https://github.com/huggingface/optimum-benchmark/tree/7cec62e016d76fe612308e4c2c074fc7f09289fd) tool on an AMD MI210 GPU with ROCm 6.1. The results are below:
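The README does not show how the benchmark was invoked. Optimum Benchmark is driven by Hydra-style YAML configs; a rough sketch of such a config follows. Every key and value here is an illustrative assumption — the exact schema may differ at the pinned revision linked above, and the model id is a placeholder:

```yaml
# Illustrative Optimum Benchmark config (schema may differ at the pinned commit).
defaults:
  - backend: pytorch    # run through the PyTorch backend
  - scenario: inference # measure generation latency/throughput
  - launcher: process   # isolate the run in a separate process
  - _self_

name: aya_expanse_8b_gptq

backend:
  device: cuda           # ROCm devices are exposed as "cuda" in PyTorch
  model: <this-repo-id>  # placeholder for the quantized checkpoint
```

Assuming the tool's documented Hydra CLI, such a config would be run with something like `optimum-benchmark --config-dir . --config-name aya_expanse_8b_gptq`.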
### Unquantized Model Results:

<img src="unquantized-model-results.png" alt="Unquantized Model Results" style="width: 100%; object-fit: cover; display: block;">
These results show that the GPTQ quantized model offers significant speed advantages.

## More Information

- **Original Model**: For details about the original model's architecture, training dataset, and performance, please visit the CohereForAI [aya-expanse-8b model card](https://huggingface.co/CohereForAI/aya-expanse-8b).
- **Support or inquiries**: If you run into any issues or have questions about the quantized model, feel free to reach me via email: `[email protected]`. I'll be happy to help!