dikw committed on
Commit 2c5e423 · 1 Parent(s): 6d7fa2e

Update README.md

Files changed (1)
  1. README.md +11 -13
README.md CHANGED
@@ -17,25 +17,23 @@ language:
  If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !

  ## Model description
- 8-bit quantization of [BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) and [BELLE-7B-0.2M](https://huggingface.co/BelleGroup/BELLE-7B-0.2M) using [GPTQ](https://arxiv.org/abs/2210.17323)

  GPTQ is a SOTA one-shot weight quantization method.

  The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.

- Basically, 8-bit quantization and a groupsize of 128 are recommended.

  **This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model**

  ## Model list

- | model name | file size | GPU memory usage |
- | -------------------------------------------------- | ------------------- | ------------------ |
- | base | 27G | ~28.2G |
- | bloom7b-2m-8bit-128g.pt | 9.7G | ~11.4G |
- | bloom7b-2m-4bit-128g.pt | 6.9G | ~8.4G |
- | bloom7b-0.2m-8bit-128g.pt | 9.7G | ~11.4G |
- | bloom7b-0.2m-4bit-128g.pt | 6.9G | ~8.4G |

  ## Limitations
  There still exist a few issues in the model trained on the current base model and data:
@@ -85,10 +83,10 @@ GPTQ is currently the SOTA one-shot weight quantization method.

  ## Model list

- | Model name | File size | GPU memory usage |
- | -------------------------------------------------- | ------------------- | ------------------ |
- | base | 27G | ~28.2G |
- | bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G |


  ## Limitations and usage restrictions
 
  If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !

  ## Model description
+ 4-bit quantization of [BELLE_BLOOM_GPTQ_4BIT](https://huggingface.co/BelleGroup/BELLE_BLOOM_GPTQ_4BIT) using [GPTQ](https://arxiv.org/abs/2210.17323)

  GPTQ is a SOTA one-shot weight quantization method.

  The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.

+ Basically, 4-bit quantization and a groupsize of 128 are recommended.

  **This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model**

  ## Model list

+ | model name | file size | GPU memory usage | CPU RAM |
+ | -------------------------------------------------- | ------------------- | ------------------ | ------------------ |
+ | base | 27G | ~28.2G | 20G |
+ | bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |
+

  ## Limitations
  There still exist a few issues in the model trained on the current base model and data:
 

  ## Model list

+ | Model name | File size | GPU memory usage | CPU memory usage |
+ | -------------------------------------------------- | ------------------- | ------------------ | ------------------ |
+ | base | 27G | ~28.2G | 20G |
+ | bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |


  ## Limitations and usage restrictions