Update README.md
If you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !
## Model description

4-bit quantization of [BELLE_BLOOM_GPTQ_4BIT](https://huggingface.co/BelleGroup/BELLE_BLOOM_GPTQ_4BIT) using [GPTQ](https://arxiv.org/abs/2210.17323).

GPTQ is a SOTA one-shot weight quantization method.

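As a rough illustration of what group-wise 4-bit weight quantization does, here is a round-to-nearest sketch in NumPy. This is not GPTQ itself: GPTQ additionally uses second-order (Hessian) information from calibration data to compensate the error each rounding step introduces, and the function names here are illustrative only.

```python
import numpy as np

def quantize_rtn_4bit(W, group_size=128):
    """Round-to-nearest 4-bit quantization with one scale/zero-point
    per group of `group_size` weights along each row."""
    Wq = np.empty(W.shape, dtype=np.uint8)
    scales, zeros = [], []
    for start in range(0, W.shape[1], group_size):
        g = W[:, start:start + group_size]
        gmin = g.min(axis=1, keepdims=True)
        gmax = g.max(axis=1, keepdims=True)
        scale = (gmax - gmin) / 15.0            # 16 levels for 4 bits
        scale = np.where(scale == 0, 1.0, scale)
        zero = np.round(-gmin / scale)          # maps gmin to code ~0
        Wq[:, start:start + group_size] = np.clip(
            np.round(g / scale) + zero, 0, 15).astype(np.uint8)
        scales.append(scale)
        zeros.append(zero)
    return Wq, scales, zeros

def dequantize(Wq, scales, zeros, group_size=128):
    """Reconstruct approximate float weights from codes + group metadata."""
    parts = []
    for i, start in enumerate(range(0, Wq.shape[1], group_size)):
        q = Wq[:, start:start + group_size].astype(np.float32)
        parts.append((q - zeros[i]) * scales[i])
    return np.concatenate(parts, axis=1)
```

Smaller groups track local weight ranges more closely (lower rounding error) at the cost of storing more scales and zero-points; a group size of 128 is the middle ground recommended below.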
The inference code can be found in our GitHub project repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.


Basically, 4-bit quantization and a group size of 128 are recommended.

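Two 4-bit codes fit in one byte, which is roughly why a 4-bit checkpoint is about half the size of an 8-bit one. A minimal packing sketch (the repository's actual `.pt` storage layout may differ):

```python
import numpy as np

def pack4(q):
    """Pack an even-length array of 4-bit codes (0..15) two per byte:
    the even-index code goes in the low nibble, the odd in the high."""
    q = np.asarray(q, dtype=np.uint8)
    return (q[0::2] | (q[1::2] << 4)).astype(np.uint8)

def unpack4(b):
    """Inverse of pack4: split each byte back into two 4-bit codes."""
    b = np.asarray(b, dtype=np.uint8)
    out = np.empty(b.size * 2, dtype=np.uint8)
    out[0::2] = b & 0x0F
    out[1::2] = b >> 4
    return out
```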
**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa) for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model.**

## Model list

| model name | file size | GPU memory usage | CPU RAM |
| ----------------------- | --------- | ---------------- | ------- |
| base | 27G | ~28.2G | 20G |
| bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |

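A back-of-the-envelope check of the 5.0G file size in the table. The parameter count (~7.1B for BLOOM-7B1) and the fp16 scale/zero-point per 128-weight group are assumptions, not figures from this card:

```python
# Rough size of a 4-bit, group-size-128 checkpoint for a ~7.1B-parameter model.
params = 7.1e9
group_size = 128

weight_bytes = params * 4 / 8                  # 4 bits per weight
meta_bytes = (params / group_size) * 2 * 2     # fp16 scale + zero per group
total_gb = (weight_bytes + meta_bytes) / 1e9

print(f"{total_gb:.1f} GB")  # ~3.8 GB from quantized weights alone
```

The remaining gap to the 5.0G in the table would come from tensors typically left unquantized, such as embeddings and layer norms.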
## Limitations
There still exist a few issues in the model trained on the current base model and data: