Update README.md
Browse files
README.md
CHANGED
@@ -1 +1,42 @@
|
|
1 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
metrics:
|
4 |
+
- perplexity
|
5 |
+
base_model: microsoft/Phi-3.5-mini-instruct
|
6 |
+
pipeline_tag: text-classification
|
7 |
+
library_name: transformers
|
8 |
+
---
|
9 |
+
Phi-3.5-Mini-instruct-quantized-autoround-asym-4bit
|
10 |
+
# Model Description:
|
11 |
+
Phi-3.5-Mini-instruct-quantized-autoround-asym-4bit is a 4-bit asymmetrically quantized version of the Microsoft Phi-3.5 Mini Instruct model. The original model was quantized using the GPTQ (Generative Pre-trained Transformer Quantization) method and asymmetric quantization, reducing its size from 7.6 GB to 2.28 GB.
|
12 |
+
|
13 |
+
## Intended Use:
|
14 |
+
This quantized model can be used for various natural language processing tasks, such as text generation, language translation, and question answering. Its reduced size allows for deployment on devices with limited memory, such as GPUs with less than 8 GB of VRAM.
|
15 |
+
|
16 |
+
## Limitations:
|
17 |
+
|
18 |
+
*The quantization process may result in some loss of precision compared to the original model.
|
19 |
+
*The model's performance may be slightly lower than the full-precision version.
|
20 |
+
*The model may not be suitable for tasks requiring high precision or exact numerical computations.
|
21 |
+
|
22 |
+
## Training Procedure:
|
23 |
+
The quantization process was performed using the AutoGPTQ library and the GPTQ algorithm. The model was quantized to 4-bit precision using asymmetric quantization with automatic rounding.
|
24 |
+
|
25 |
+
## Evaluation:
|
26 |
+
The model's performance was evaluated before and after quantization using the perplexity metric. The evaluation process was symmetric, using the same metric and procedure for both the original and quantized models.
|
27 |
+
|
28 |
+
## Quantization Configuration:
|
29 |
+
Quantization method: GPTQ (Generative Pre-trained Transformer Quantization)
|
30 |
+
Bits: 4
|
31 |
+
Symmetric quantization: False(asymmetric quantization used).
|
32 |
+
|
33 |
+
## Hardware Requirements:
|
34 |
+
|
35 |
+
The quantized model can be run on GPUs with less than 8 GB of memory, thanks to its reduced size of 2.28 GB.
|
36 |
+
|
37 |
+
## License:
|
38 |
+
The model is released under the Apache License 2.0.
|
39 |
+
|
40 |
+
## Contact:
|
41 |
+
|
42 |
+
For any questions or feedback, please contact the model creator, Satwik, on LinkedIn or via email.
|