---
license: apache-2.0
---
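
This is a 4-bit GPTQ quantization of HuatuoGPT2-7B, produced with the AutoGPTQ framework. HuatuoGPT2-7B is a fine-tuned model whose base model is Baichuan-7B.

Specifications:

Original model size: 16 GB; quantized model size: 5 GB.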
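
As a rough sanity check on these figures, checkpoint size is approximately parameter count × bits per weight ÷ 8. The sketch below assumes a nominal 7 × 10⁹ parameters and, for GPTQ with `group_size=128`, one fp16 scale plus one 4-bit zero point stored per group. Both are simplifying assumptions, not measurements of this checkpoint.

```python
def gptq_bits_per_weight(bits: int = 4, group_size: int = 128,
                         scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Average storage bits per weight, including per-group metadata.

    Assumes each group of `group_size` weights stores one scale
    (`scale_bits`) and one zero point (`zero_bits`).
    """
    return bits + (scale_bits + zero_bits) / group_size


def estimate_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Naive checkpoint size in decimal gigabytes: params * bits / 8."""
    return num_params * bits_per_weight / 8 / 1e9


NOMINAL_PARAMS = 7e9  # assumption: the nominal "7B", not the exact count

fp16_gb = estimate_size_gb(NOMINAL_PARAMS, 16)
int4_gb = estimate_size_gb(NOMINAL_PARAMS, gptq_bits_per_weight())
print(f"fp16: {fp16_gb:.1f} GB, 4-bit GPTQ: {int4_gb:.1f} GB")
# prints "fp16: 14.0 GB, 4-bit GPTQ: 3.6 GB"
```

Both estimates fall below the reported 16 GB and 5 GB. A likely reason is that the real parameter count exceeds the nominal 7B and that layers such as the token embeddings and output head are typically kept unquantized in fp16; the exact breakdown for this checkpoint has not been verified.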
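
Inference accuracy has not been evaluated yet; use with caution.

During quantization, the calibration data was drawn from the fine-tuning training set Medical Fine-tuning Instruction (GPT-4).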
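
AutoGPTQ calibrates on a list of tokenized examples. Below is a minimal, standard-library sketch of drawing the 512 calibration pairs listed under the quantization details from such a dataset. The record fields `instruction` and `output`, the newline joining template, and the fixed seed are hypothetical choices for illustration, not the procedure actually used for this checkpoint.

```python
import random
from typing import Dict, List


def sample_calibration_texts(records: List[Dict[str, str]],
                             n: int = 512, seed: int = 0) -> List[str]:
    """Draw up to n records without replacement and join each pair into one text.

    The "instruction"/"output" keys are hypothetical; adapt them to the
    real schema of the fine-tuning dataset.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    chosen = rng.sample(records, k=min(n, len(records)))
    return [f"{r['instruction']}\n{r['output']}" for r in chosen]


# Toy records standing in for the real dataset
records = [{"instruction": f"question {i}", "output": f"answer {i}"}
           for i in range(2000)]
texts = sample_calibration_texts(records)
print(len(texts))  # 512
```

Each text would then be tokenized, e.g. with `tokenizer(text)`, and the resulting list passed to AutoGPTQ's `model.quantize(examples)`, which expects entries carrying `input_ids` and `attention_mask`.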

Usage example:

Make sure bitsandbytes is installed:

```bash
pip install bitsandbytes
```
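
Make sure auto-gptq is installed (here built from source):

```bash
git clone https://github.com/AutoGPTQ/AutoGPTQ
cd AutoGPTQ
pip install -e .
```

In a Jupyter or Colab notebook, prefix the `git` and `pip` commands with `!`, but use `%cd AutoGPTQ` instead of `!cd`, because each `!` command runs in its own subshell.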

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_id = "jiangchengchengNLP/huatuo_AutoGPTQ_7B4bits"

# trust_remote_code=True is needed because the repository ships custom
# model code, which also provides the HuatuoChat() helper used below.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(model_id)

# Conversation history as a list of {"role": ..., "content": ...} dicts
messages = []
messages.append({"role": "user", "content": "肚子疼怎么办?"})  # "What should I do about a stomach ache?"
response = model.HuatuoChat(tokenizer, messages)
print(response)
```

More quantization details:

Quantization hardware: 2 × NVIDIA T4 GPUs

Calibration set size: 512 training pairs

Quantization config:

```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,  # weight bit width (4 or 8 are the usual choices)
    group_size=128,  # weights share quantization parameters in groups of 128
    damp_percent=0.01,  # dampening added to the Hessian diagonal during GPTQ
    desc_act=False,  # False can significantly speed up inference, at the cost of slightly worse perplexity
    static_groups=False,
    sym=True,  # symmetric quantization
    true_sequential=True,
    model_name_or_path=None,
    model_file_base_name="model",
)
```
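
To make `bits=4`, `group_size=128` and `sym=True` concrete: each run of 128 weights shares one scale, and each weight is stored as a signed 4-bit integer centered on zero. The sketch below uses plain round-to-nearest purely to illustrate this storage layout. It is not GPTQ, which additionally uses second-order (Hessian) information to compensate rounding error, and it does not reproduce AutoGPTQ's exact scale formula or bit packing.

```python
from typing import List, Tuple


def quantize_groups(weights: List[float], group_size: int = 128,
                    bits: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest quantization with one scale per group."""
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        absmax = max(abs(w) for w in group)
        scale = absmax / qmax if absmax > 0 else 1.0  # largest |w| maps to qmax
        scales.append(scale)
        for w in group:
            codes.append(max(qmin, min(qmax, round(w / scale))))
    return codes, scales


def dequantize_groups(codes: List[int], scales: List[float],
                      group_size: int = 128) -> List[float]:
    """Recover approximate weights: each code times its group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


# Toy weights in [-0.5, 0.5]: 256 values -> 2 groups of 128
weights = [((i * 37) % 101 - 50) / 100 for i in range(256)]
codes, scales = quantize_groups(weights)
restored = dequantize_groups(codes, scales)
print(len(scales), min(codes), max(codes))
```

Codes lie in the signed 4-bit range [-8, 7]; with this particular scale choice only -7 to 7 are used, and each restored weight is within half a scale step of the original. A smaller `group_size` tracks local weight ranges more closely at the cost of storing more scales.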