So how do you actually call this?
#1 opened by yjianchun
How do I use it? Is there a UI-based way to use it? What does the inference code look like? Could you give an example?
This project can be used in a variety of scenarios; the deployment approach depends on your specific use case.
If you have a GPU, you can try loading the model as follows:
import os
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Restrict inference to the first GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.3,
    max_new_tokens=2048,
)

# The prompt template below is taken from llama.cpp
# and is slightly different from the one used in training,
# but we find it gives better results.
prompt_input = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)

def generate_prompt(instruction, input=None):
    if input:
        instruction = instruction + '\n' + input
    return prompt_input.format_map({'instruction': instruction})

load_type = torch.float16
device = torch.device(0)

tokenizer = LlamaTokenizer.from_pretrained("${Model_Path}")
base_model = LlamaForCausalLM.from_pretrained(
    "${Model_Path}",
    load_in_8bit=False,
    torch_dtype=load_type,
    low_cpu_mem_usage=True,
    device_map='auto',
)
model = base_model
model.eval()

with torch.no_grad(), torch.autocast("cuda"):
    raw_input_text = "什么是强化学习?"  # "What is reinforcement learning?"
    input_text = generate_prompt(instruction=raw_input_text)
    inputs = tokenizer(input_text, return_tensors="pt")
    generation_output = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        **generation_config,
    )
    s = generation_output[0]
    output = tokenizer.decode(s, skip_special_tokens=True)
    # The decoded text contains the prompt; keep only the model's answer.
    response = output.split("### Response:")[1].strip()
    print(response)
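If GPU memory is tight, the same from_pretrained call can load the weights in 8-bit instead. This is only a sketch, assuming bitsandbytes and accelerate are installed; the rest of the script stays the same:

# 8-bit loading variant (assumption: pip install bitsandbytes accelerate).
base_model = LlamaForCausalLM.from_pretrained(
    "${Model_Path}",
    load_in_8bit=True,    # quantize linear-layer weights to int8 to reduce VRAM
    device_map='auto',    # let accelerate place layers on available devices
)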
Could you help put together a Gradio version? Something simple is fine. I'm a beginner and would like to try out your model.
The model can be loaded directly with https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/inference/gradio_demo.py.
After cloning the model locally, you can start the service with:
python gradio_demo.py --base_model ${Model_Path} --tokenizer_path ${Model_Path} --gpus 0
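For a beginner-friendly alternative to that demo script, a minimal Gradio wrapper along these lines should also work. It is only a sketch: it assumes gradio is installed (pip install gradio) and that the tokenizer, model, device, generate_prompt, and generation_config objects from the inference example above are defined in the same script:

import gradio as gr

def answer(instruction):
    # Reuse the prompt template and generation settings from above.
    input_text = generate_prompt(instruction=instruction)
    inputs = tokenizer(input_text, return_tensors="pt")
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device),
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
            **generation_config,
        )
    output = tokenizer.decode(generation_output[0], skip_special_tokens=True)
    return output.split("### Response:")[1].strip()

# Launch a simple text-in/text-out web UI (on http://127.0.0.1:7860 by default).
gr.Interface(fn=answer, inputs="text", outputs="text").launch()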
Quick question: where does the coati package used in train_sft.py come from? I get an error saying the package can't be found.