So how do you actually call this?
#1 opened by yjianchun
How do I use it? Is there a UI-based way to use it? What does the inference code look like? Could you give an example?
This project can be used in a variety of scenarios; the deployment approach depends on your specific use case.
If you have a GPU, you can try loading the model as follows:
import os
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Restrict inference to the first GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

generation_config = dict(
    temperature=0.2,
    top_k=40,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.3,
    max_new_tokens=2048,
)

# The prompt template below is taken from llama.cpp
# and is slightly different from the one used in training,
# but we find it gives better results.
prompt_input = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)

def generate_prompt(instruction, input=None):
    if input:
        instruction = instruction + '\n' + input
    return prompt_input.format_map({'instruction': instruction})

load_type = torch.float16
device = torch.device(0)

tokenizer = LlamaTokenizer.from_pretrained("${Model_Path}")
base_model = LlamaForCausalLM.from_pretrained(
    "${Model_Path}",
    load_in_8bit=False,
    torch_dtype=load_type,
    low_cpu_mem_usage=True,
    device_map='auto',
)
model = base_model
model.eval()

with torch.no_grad(), torch.autocast("cuda"):
    raw_input_text = "什么是强化学习?"  # "What is reinforcement learning?"
    input_text = generate_prompt(instruction=raw_input_text)
    inputs = tokenizer(input_text, return_tensors="pt")
    generation_output = model.generate(
        input_ids=inputs["input_ids"].to(device),
        attention_mask=inputs["attention_mask"].to(device),
        eos_token_id=tokenizer.eos_token_id,
        pad_token_id=tokenizer.pad_token_id,
        **generation_config,
    )
    s = generation_output[0]
    output = tokenizer.decode(s, skip_special_tokens=True)
    # The decoded text contains the prompt; keep only the model's answer.
    response = output.split("### Response:")[1].strip()
    print(response)
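If GPU memory is tight, the same from_pretrained call can load the weights in 8-bit instead. This is only a sketch, assuming bitsandbytes and accelerate are installed; the rest of the script stays the same:

# 8-bit loading variant (assumption: pip install bitsandbytes accelerate).
base_model = LlamaForCausalLM.from_pretrained(
    "${Model_Path}",
    load_in_8bit=True,    # quantize linear-layer weights to int8 to reduce VRAM
    device_map='auto',    # let accelerate place layers on available devices
)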
Could you help put together a Gradio version? Something simple is fine. I'm a beginner and would like to try out your model.
The model can be loaded directly with https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/inference/gradio_demo.py.
After cloning the model locally, you can start the service with:
python gradio_demo.py --base_model ${Model_Path} --tokenizer_path ${Model_Path} --gpus 0
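For a beginner-friendly alternative to that demo script, a minimal Gradio wrapper along these lines should also work. It is only a sketch: it assumes gradio is installed (pip install gradio) and that the tokenizer, model, device, generate_prompt, and generation_config objects from the inference example above are defined in the same script:

import gradio as gr

def answer(instruction):
    # Reuse the prompt template and generation settings from above.
    input_text = generate_prompt(instruction=instruction)
    inputs = tokenizer(input_text, return_tensors="pt")
    with torch.no_grad():
        generation_output = model.generate(
            input_ids=inputs["input_ids"].to(device),
            attention_mask=inputs["attention_mask"].to(device),
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
            **generation_config,
        )
    output = tokenizer.decode(generation_output[0], skip_special_tokens=True)
    return output.split("### Response:")[1].strip()

# Launch a simple text-in/text-out web UI (on http://127.0.0.1:7860 by default).
gr.Interface(fn=answer, inputs="text", outputs="text").launch()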
Quick question: where does the coati package used in train_sft.py come from? I get an error saying the package can't be found.