---
license: apache-2.0
tags:
- text2text-generation
pipeline_tag: text2text-generation
language:
- zh
- en
---

# GPTQ-for-Bloom

## Welcome

If you find this model helpful, please *like* it and star our repository on GitHub: https://github.com/LianjiaTech/BELLE

## Model description

4-bit quantization of [BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) using [GPTQ](https://arxiv.org/abs/2210.17323).

GPTQ is a state-of-the-art (SOTA) one-shot weight quantization method.

The inference code can be found in our GitHub repository: https://github.com/LianjiaTech/BELLE/tree/main/gptq.

In general, we recommend 4-bit quantization with a groupsize of 128.
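
To make "4-bit with a groupsize of 128" concrete, the sketch below shows what the setting means for storage: each run of 128 consecutive weights shares one scale and one zero point, and every weight is stored as an integer code in 0..15. This is plain round-to-nearest quantization written for illustration, not GPTQ itself: GPTQ stores codes on the same kind of grid but chooses them column by column while compensating the rounding error with second-order (Hessian) information from calibration data. The function names are hypothetical, and a flat Python list stands in for one row of a weight matrix.

```python
import math


def quantize_groupwise(weights, bits=4, groupsize=128):
    """Round-to-nearest asymmetric quantization with one (scale, zero) pair per group.

    Hypothetical helper for illustration only; it is NOT the GPTQ algorithm,
    which picks the integer codes with error compensation.
    """
    qmax = 2 ** bits - 1  # largest integer code, 15 for 4-bit
    codes, params = [], []
    for start in range(0, len(weights), groupsize):
        group = weights[start:start + groupsize]
        # Include 0.0 in the range so the zero point is itself a valid code.
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        # Spread the group's value range over the integer codes 0..qmax.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        zero = round(-lo / scale)  # integer code that represents 0.0
        params.append((scale, zero))
        for w in group:
            q = round(w / scale) + zero
            codes.append(min(max(q, 0), qmax))  # clamp into the 4-bit range
    return codes, params


def dequantize_groupwise(codes, params, groupsize=128):
    """Map integer codes back to approximate float weights."""
    return [
        params[i // groupsize][0] * (q - params[i // groupsize][1])
        for i, q in enumerate(codes)
    ]


if __name__ == "__main__":
    weights = [math.sin(0.1 * i) for i in range(300)]  # toy "row" of 300 weights
    codes, params = quantize_groupwise(weights)
    restored = dequantize_groupwise(codes, params)
    print("groups:", len(params))  # groups of 128, 128 and 44 weights
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

A larger groupsize means fewer stored scales and zero points (smaller files) but a coarser grid for each group; 128 is the trade-off the recommendation above settles on.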

**This code is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa), adapted for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model.**
## Model list

| Model name | File size | GPU memory usage | CPU RAM usage |
| --- | --- | --- | --- |
| base | 27G | ~28.2G | 20G |
| bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |
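
As a rough sanity check on the file sizes in the table, the sketch below estimates checkpoint size from a parameter count. It assumes a BLOOM-7b1-sized model of 7,069,016,064 parameters, and, for the quantized case, one fp16 scale and one fp16 zero point per group of 128 weights; real GPTQ checkpoints may pack zero points more compactly. The helper name is hypothetical, and the estimate counts only the weights, not any other tensors or file overhead.

```python
import math


def estimated_size_gb(n_params, bits_per_weight, groupsize=None, group_param_bits=16):
    """Back-of-envelope checkpoint size in decimal gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper: counts the weights themselves plus, when a groupsize
    is given, one scale and one zero point (group_param_bits each) per group.
    """
    total_bits = n_params * bits_per_weight
    if groupsize:
        n_groups = math.ceil(n_params / groupsize)
        total_bits += n_groups * 2 * group_param_bits  # scale + zero point
    return total_bits / 8 / 1e9


# Assumption: BLOOM-7b1-sized model.
N_PARAMS = 7_069_016_064

print(f"fp32 base:    {estimated_size_gb(N_PARAMS, 32):.1f} GB")  # 28.3 GB
print(f"4-bit, g=128: {estimated_size_gb(N_PARAMS, 4, groupsize=128):.2f} GB")  # 3.76 GB
```

Under these assumptions the fp32 estimate (about 28.3 GB, or about 26.3 GiB) is close to the 27G base entry. The 4-bit estimate of about 3.76 GB is below the listed 5.0G; one plausible explanation, which we have not verified, is that some tensors such as the token embeddings are kept in higher precision.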

## Limitations

The model trained on the current base model and data still has a few issues:

1. The model may produce factual errors when following instructions that involve facts.
2. It occasionally generates harmful responses, since it still struggles to identify potentially harmful instructions.
3. Its reasoning and coding abilities need improvement.

Because of these limitations, we require developers to use the open-sourced code, data, model, and any other artifacts generated by this project for research purposes only. Commercial use and other potentially harmful use cases are not allowed.

## Citation

Please cite us if you use our code, data, or model.

```
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```

Please also cite the original BLOOM, Stanford Alpaca, and Self-Instruct papers.

***

# GPTQ-for-Bloom

## Welcome

If you find this model helpful, please *like* it and star us in the project at https://github.com/LianjiaTech/BELLE

## Model description

4-bit quantization of [BELLE-7B-2M](https://huggingface.co/BelleGroup/BELLE-7B-2M) and [BELLE-7B-0.2M](https://huggingface.co/BelleGroup/BELLE-7B-0.2M).

GPTQ is currently the state-of-the-art (SOTA) one-shot weight quantization method.

The inference code for this model is available at https://github.com/LianjiaTech/BELLE/tree/main/models/gptq.

In general, 4-bit quantization with groupsize = 128 is recommended.

**The [GPTQ](https://arxiv.org/abs/2210.17323) inference code for the [Bloom](https://arxiv.org/pdf/2211.05100.pdf) model is based on [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).**

## Model list

| Model name | File size | GPU memory usage | CPU RAM usage |
| --- | --- | --- | --- |
| base | 27G | ~28.2G | 20G |
| bloom7b-2m-4bit-128g.pt | 5.0G | ~8.0G | 8.0G |
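
## Limitations and usage restrictions

The SFT model trained on the current data and base model still has the following issues:

1. It may give answers that contradict the facts when following instructions that involve factual information.
2. It cannot reliably recognize harmful instructions, and may therefore produce harmful content.
3. Its abilities in scenarios involving reasoning, code, and similar tasks still need improvement.

Given these limitations, we require developers to use the code, data, and models we open-source, and any derivatives later generated with this project, for research purposes only. They must not be used commercially or for any other purpose that could harm society.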

## Citation

If you use the code, data, or models from this project, please cite it.

```
@misc{BELLE,
  author = {Yunjie Ji and Yong Deng and Yan Gong and Yiping Peng and Qiang Niu and Baochang Ma and Xiangang Li},
  title = {BELLE: Bloom-Enhanced Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```

Please also cite the original BLOOM paper, Stanford Alpaca, and the Self-Instruct paper.