Update README.md
If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !
## Model description

4-bit quantization of [BELLE_BLOOM_GPTQ_4BIT](https://huggingface.co/BelleGroup/BELLE_BLOOM_GPTQ_4BIT) using [GPTQ](https://arxiv.org/abs/2210.17323).

GPTQ is a SOTA one-shot weight quantization method.

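As a rough illustration of what group-wise 4-bit weight quantization does, here is a round-to-nearest sketch in NumPy. This is not GPTQ itself: GPTQ additionally uses second-order (Hessian) information from calibration data to compensate the error each rounding step introduces, and the function names here are illustrative only.

```python
import numpy as np

def quantize_rtn_4bit(W, group_size=128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point
    per group of `group_size` weights along each row."""
    Wq = np.empty(W.shape, dtype=np.uint8)
    scales, zeros = [], []
    for start in range(0, W.shape[1], group_size):
        g = W[:, start:start + group_size]
        gmin = g.min(axis=1, keepdims=True)
        gmax = g.max(axis=1, keepdims=True)
        scale = (gmax - gmin) / 15.0            # 16 levels for 4 bits
        scale = np.where(scale == 0, 1.0, scale)
        zero = np.round(-gmin / scale)          # maps gmin to code ~0
        Wq[:, start:start + group_size] = np.clip(
            np.round(g / scale) + zero, 0, 15).astype(np.uint8)
        scales.append(scale)
        zeros.append(zero)
    return Wq, scales, zeros

def dequantize(Wq, scales, zeros, group_size=128):
    """Reconstruct approximate float weights from codes + group metadata."""
    parts = []
    for i, start in enumerate(range(0, Wq.shape[1], group_size)):
        q = Wq[:, start:start + group_size].astype(np.float32)
        parts.append((q - zeros[i]) * scales[i])
    return np.concatenate(parts, axis=1)
```

Smaller groups track local weight ranges more closely (lower rounding error) at the cost of storing more scales and zero-points; a group size of 128 is the middle ground recommended below.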
The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.


Basically, 4-bit quantization and a group size of 128 are recommended.

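Two 4-bit codes fit in one byte, which is roughly why a 4-bit checkpoint is about half the size of an 8-bit one. A minimal packing sketch (the repository's actual `.pt` storage layout may differ):

```python
import numpy as np

def pack4(q):
    """Pack an even-length array of 4-bit codes (0..15) two per byte:
    the even-index code goes in the low nibble, the odd in the high."""
    q = np.asarray(q, dtype=np.uint8)
    return (q[0::2] | (q[1::2] << 4)).astype(np.uint8)

def unpack4(b):
    """Inverse of pack4: split each byte back into two 4-bit codes."""
    b = np.asarray(b, dtype=np.uint8)
    out = np.empty(b.size * 2, dtype=np.uint8)
    out[0::2] = b & 0x0F
    out[1::2] = b >> 4
    return out
```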
**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model.**

## Model list

| model name | file size | GPU memory usage | CPU RAM |
| ----------------------- | --------- | ---------------- | ------- |
| base | 27G | ~28.2G | 20G |
| bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |

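A back-of-the-envelope check of the 5.0G file size in the table. The parameter count (~7.1B for BLOOM-7B1) and the fp16 scale/zero-point per 128-weight group are assumptions, not figures from this card:

```python
# Rough size of a 4-bit, group-size-128 checkpoint for a ~7.1B-parameter model.
params = 7.1e9
group_size = 128

weight_bytes = params * 4 / 8                  # 4 bits per weight
meta_bytes = (params / group_size) * 2 * 2     # fp16 scale + zero per group
total_gb = (weight_bytes + meta_bytes) / 1e9

print(f"{total_gb:.1f} GB")  # ~3.8 GB from quantized weights alone
```

The remaining gap to the 5.0G in the table would come from tensors typically left unquantized, such as embeddings and layer norms.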
## Limitations
There still exist a few issues in the model trained on the current base model and data: