Problem running the 7B-Int8 model

#1
by enozhu - opened

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation import GenerationConfig
import torch
rev="c04bccd3a8ec5e2fe955196de6a8da1be1d41066"

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat-Int8", trust_remote_code=True, revision=rev)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat-Int8",
    device_map="auto",
    trust_remote_code=True,
    revision=rev
).eval()
print(model)
model.generation_config = GenerationConfig.from_pretrained("Qwen/Qwen-7B-Chat-Int8", trust_remote_code=True, revision=rev)

response, history = model.chat(tokenizer, "你好", history=None)
print(response)

Running the Int8 model with the official demo code above, the output is garbled. When I print the model returned by from_pretrained (print(model)), the mlp and attention projection layers all show up as plain Linear, e.g. (w1): Linear(in_features=4096, out_features=11008, bias=False), so the model apparently was not quantized at all. Is this a configuration problem?
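For reference, a quick way to check whether the quantized kernels were actually used is to verify that auto-gptq and optimum are installed and to print the classes of the projection layers directly. This is only a diagnostic sketch, not from the official demo: the layer-name suffixes (w1, w2, c_proj) are taken from the structure shown by print(model), and the expectation that a correctly loaded GPTQ checkpoint replaces them with auto-gptq's quantized linear class (rather than torch.nn.Linear) is an assumption that may vary by package version.

import importlib.util

# Assumption: the Int8 checkpoint is a GPTQ model, so loading it is expected
# to require the auto_gptq and optimum packages. Check they are importable first.
for pkg in ("auto_gptq", "optimum"):
    print(pkg, "installed:", importlib.util.find_spec(pkg) is not None)

# Inspect the classes of the projection layers (names follow the printed
# Qwen-7B module structure). On a correctly loaded quantized checkpoint
# these should not be plain torch.nn.Linear.
for name, module in model.named_modules():
    if name.endswith(("mlp.w1", "mlp.w2", "mlp.c_proj", "attn.c_proj")):
        print(name, "->", type(module).__name__)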

enozhu changed discussion status to closed
enozhu changed discussion status to open
