	---
	language:
	- zh
	- en
	pipeline_tag: text-generation
	inference: false
	---
	# Baichuan-13B-Instruction

	![](./alpachino.png)

	## Introduction
	Baichuan-13B-Instruction is the instruction-tuned version of the Baichuan-13B series. The pretrained base model is available at [Baichuan-13B-Base](https://huggingface.co/baichuan-inc/Baichuan-13B-Base).


	## Demo

	Below is a model demo built with gradio:
	```python
	import gradio as gr
	from transformers import AutoTokenizer, AutoModelForCausalLM

	# trust_remote_code=True is needed because the model ships its own modeling code.
	tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True, use_fast=False)
	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", trust_remote_code=True).half()
	model.cuda()

	def generate(histories, max_new_tokens=2048, do_sample=True, top_p=0.95, temperature=0.35, repetition_penalty=1.1):
	    # Concatenate all [user, assistant] turns; the last assistant slot is empty.
	    prompt = ""
	    for history in histories:
	        history_with_identity = "\nHuman:" + history[0] + "\n\nAssistant:" + history[1]
	        prompt += history_with_identity
	    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
	    outputs = model.generate(
	        input_ids=input_ids,
	        max_new_tokens=max_new_tokens,
	        early_stopping=True,
	        do_sample=do_sample,
	        top_p=top_p,
	        temperature=temperature,
	        repetition_penalty=repetition_penalty,
	    )
	    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True)
	    # Strip the echoed prompt so that only the new reply remains.
	    generate_text = rets[0].replace(prompt, "")
	    return generate_text

	with gr.Blocks() as demo:
	    chatbot = gr.Chatbot()
	    msg = gr.Textbox()
	    clear = gr.Button("clear")

	    def user(user_message, history):
	        # Clear the textbox and append a new turn with an empty reply.
	        return "", history + [[user_message, ""]]

	    def bot(history):
	        print(history)
	        bot_message = generate(history)
	        history[-1][1] = bot_message
	        return history

	    msg.submit(user, [msg, chatbot], [msg, chatbot], queue=False).then(
	        bot, chatbot, chatbot
	    )
	    clear.click(lambda: None, None, chatbot, queue=False)

	if __name__ == "__main__":
	    demo.launch(server_name="0.0.0.0")
	```
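
The demo feeds the model a flat prompt in which each turn is written as `\nHuman:<message>\n\nAssistant:<reply>`, and the final reply is left empty for the model to complete. The sketch below isolates only that string-building step so the format can be inspected without loading the model. `build_prompt` is a hypothetical helper name introduced here for illustration; it is not part of the model repository.

```python
def build_prompt(histories):
    """Join [user, assistant] turns into the flat prompt used by the demo."""
    prompt = ""
    for user_message, assistant_message in histories:
        prompt += "\nHuman:" + user_message + "\n\nAssistant:" + assistant_message
    return prompt


# Two turns: one finished exchange and one pending question.
history = [
    ["Hello", "Hi, how can I help?"],
    ["What is 1 + 1?", ""],
]
prompt = build_prompt(history)
print(repr(prompt))
```

Because the last assistant slot is empty, the prompt always ends with `\n\nAssistant:`, which is the point where generation starts.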

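	## Quantized Deployment

	Baichuan-13B supports int8 and int4 quantization; only two lines of the inference code need to change. Note that if the goal of quantization is to save GPU memory, the full-precision model should first be loaded onto the CPU and quantized there. Avoid passing `device_map='auto'` to `from_pretrained`, or any other argument that would load the full-precision model directly onto the GPU.

	To use int8 quantization: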
	```python
	import torch
	from transformers import AutoModelForCausalLM

	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
	model = model.quantize(8).cuda()
	```

	Similarly, to use int4 quantization:
	```python
	import torch
	from transformers import AutoModelForCausalLM

	model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
	model = model.quantize(4).cuda()
	```
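
As a rough guide to why quantization matters for GPU memory, the weight footprint can be estimated as parameter count times bytes per weight. The sketch below uses the total parameter count listed for Baichuan-13B in the parameter table later in this README. It is a back-of-the-envelope estimate only: it assumes every weight is stored at the stated precision and ignores activations, the KV cache, and any scales or layers a real quantizer keeps in higher precision, so actual usage will likely be higher.

```python
# Total parameters for Baichuan-13B, as listed in this README's parameter table.
TOTAL_PARAMS = 13_264_901_120

# Bytes per weight at each precision (int4 packs two weights into one byte).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, bytes_per_param):
    """Return the weight-only memory footprint in GiB."""
    return num_params * bytes_per_param / 2**30


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, nbytes):.1f} GiB")
```

Under these assumptions the weights alone need roughly 24.7 GiB in fp16, 12.4 GiB in int8, and 6.2 GiB in int4.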

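	## Model Details


	### Model Architecture

	The model is based on Baichuan-13B. For better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference speed. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2000 tokens is 31.6% higher: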

	| Model | tokens/s |
	| ------------ | -------- |
	| LLaMA-13B | 19.4 |
	| Baichuan-13B | 25.4 |
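
To make the ALiBi idea concrete: instead of rotating query and key vectors, ALiBi adds a per-head linear penalty `-slope * distance` to the attention scores. The sketch below follows the generic slope recipe from the ALiBi paper, including its handling of head counts that are not a power of two (such as 40). It is an illustration in plain Python, not Baichuan-13B's actual modeling code, and whether Baichuan uses exactly this slope recipe is an assumption here. The function names are hypothetical.

```python
import math


def alibi_slopes(num_heads):
    """Per-head slopes following the recipe described in the ALiBi paper."""

    def power_of_two_slopes(n):
        # Geometric sequence starting at 2^(-8/n) with the same ratio.
        start = 2 ** (-(2 ** -(math.log2(n) - 3)))
        return [start * start**i for i in range(n)]

    if math.log2(num_heads).is_integer():
        return power_of_two_slopes(num_heads)
    # Otherwise: take the closest lower power of two, then fill the remaining
    # heads with every other slope from the next power of two.
    closest = 2 ** math.floor(math.log2(num_heads))
    extra = alibi_slopes(2 * closest)[0::2][: num_heads - closest]
    return power_of_two_slopes(closest) + extra


def alibi_bias(num_heads, seq_len):
    """bias[h][i][j] = -slope_h * (i - j) for j <= i, and -inf for future keys."""
    return [
        [
            [-m * (i - j) if j <= i else float("-inf") for j in range(seq_len)]
            for i in range(seq_len)
        ]
        for m in alibi_slopes(num_heads)
    ]


slopes = alibi_slopes(40)          # 40 heads, as in Baichuan-13B
bias = alibi_bias(num_heads=8, seq_len=4)
print(len(slopes), bias[0][3])     # penalties grow with distance from the query
```

The bias is added to the attention logits before the softmax, so no extra per-token transformation of queries or keys is needed.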

	The detailed parameters are listed in the table below:

	| Model | Hidden size | Layers | Heads | Vocab size | Total parameters | Training data (tokens) | Position embedding | Max length |
	| ------------ | ---------- | ---- | ---- | -------- | -------------- | ------------------ | ----------------------------------------- | -------- |
	| Baichuan-7B | 4,096 | 32 | 32 | 64,000 | 7,000,559,616 | 1.2 trillion | [RoPE](https://arxiv.org/abs/2104.09864) | 4,096 |
	| Baichuan-13B | 5,120 | 40 | 40 | 64,000 | 13,264,901,120 | 1.4 trillion | [ALiBi](https://arxiv.org/abs/2108.12409) | 4,096 |
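
The total parameter count for Baichuan-13B in the table can be reproduced from the listed hidden size, layer count, and vocabulary size, given a few architectural assumptions that are not stated in this README: an MLP intermediate size of 13,696, a gated MLP with three weight matrices, bias-free linear layers, weight-only normalization layers, and untied input and output embeddings. The sketch below is a consistency check under those assumptions, not a description taken from the official model code.

```python
hidden = 5_120         # hidden size (from the table)
layers = 40            # number of layers (from the table)
vocab = 64_000         # vocabulary size (from the table)
intermediate = 13_696  # ASSUMPTION: MLP intermediate size, not listed in this README

embed = vocab * hidden             # input token embeddings
lm_head = vocab * hidden           # output projection (assumed untied)
attn = 4 * hidden * hidden         # Q, K, V and output projections, no biases
mlp = 3 * hidden * intermediate    # gate, up and down projections, no biases
norms = 2 * hidden                 # two weight-only norm layers per block

per_layer = attn + mlp + norms
total = embed + layers * per_layer + hidden + lm_head  # + final norm
print(f"{total:,}")  # 13,264,901,120
```

Roughly 95% of the parameters sit in the 40 transformer blocks; the two embedding matrices account for most of the rest.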

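	## Training Details

	The training data consists mainly of three parts:

	* 13k high-quality samples filtered from the [sharegpt_zh](https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/tree/main/ShareGPT) dataset.
	* [lima](https://huggingface.co/datasets/GAIR/lima)
	* A 2.3k high-quality Chinese dataset selected by task type, with about 100 samples per task type.

	Hardware: 8 × A40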

	## Evaluation Results

	### [CMMLU](https://github.com/haonan-li/CMMLU)

	| Model 5-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
	| ---------------------------------------------------------- | :-------: | :--------: | :-------------: | :------: | :------------: | :------: |
	| Baichuan-7B | 34.4 | 47.5 | 47.6 | 46.6 | 44.3 | 44.0 |
	| Vicuna-13B | 31.8 | 36.2 | 37.6 | 39.5 | 34.3 | 36.3 |
	| Chinese-Alpaca-Plus-13B | 29.8 | 33.4 | 33.2 | 37.9 | 32.1 | 33.4 |
	| Chinese-LLaMA-Plus-13B | 28.1 | 33.1 | 35.4 | 35.1 | 33.5 | 33.0 |
	| Ziya-LLaMA-13B-Pretrain | 29.0 | 30.7 | 33.8 | 34.4 | 31.9 | 32.1 |
	| LLaMA-13B | 29.2 | 30.8 | 31.6 | 33.0 | 30.5 | 31.2 |
	| moss-moon-003-base (16B) | 27.2 | 30.4 | 28.8 | 32.6 | 28.7 | 29.6 |
	| Baichuan-13B-Base | 41.7 | 61.1 | 59.8 | 59.0 | 56.4 | 55.3 |
	| Baichuan-13B-Chat | 42.8 | 62.6 | 59.7 | 59.0 | 56.1 | 55.8 |
	| Baichuan-13B-Instruction | 44.50 | 61.16 | 59.07 | 58.34 | 55.55 | 55.61 |

	| Model zero-shot | STEM | Humanities | Social Sciences | Others | China Specific | Average |
	| ------------------------------------------------------------ | :-------: | :--------: | :-------------: | :-------: | :------------: | :-------: |
	| [ChatGLM2-6B](https://huggingface.co/THUDM/chatglm2-6b) | 41.28 | 52.85 | 53.37 | 52.24 | 50.58 | 49.95 |
	| [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B) | 32.79 | 44.43 | 46.78 | 44.79 | 43.11 | 42.33 |
	| [ChatGLM-6B](https://github.com/THUDM/GLM-130B) | 32.22 | 42.91 | 44.81 | 42.60 | 41.93 | 40.79 |
	| [BatGPT-15B](https://arxiv.org/abs/2307.00360) | 33.72 | 36.53 | 38.07 | 46.94 | 38.32 | 38.51 |
	| [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca) | 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
	| [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 25.68 | 26.35 | 27.21 | 27.92 | 26.70 | 26.88 |
	| [Chinese-GLM-10B](https://github.com/THUDM/GLM) | 25.57 | 25.01 | 26.33 | 25.94 | 25.81 | 25.80 |
	| [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) | 42.04 | 60.49 | 59.55 | 56.60 | 55.72 | 54.63 |
	| [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) | 37.32 | 56.24 | 54.79 | 54.07 | 52.23 | 50.48 |
	| Baichuan-13B-Instruction | 42.56 | 62.09 | 60.41 | 58.97 | 56.95 | 55.88 |

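	> Note: CMMLU is a comprehensive Chinese evaluation benchmark designed to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We evaluate the model directly with its official [evaluation script](https://github.com/haonan-li/CMMLU). In the zero-shot table, the score for [Baichuan-13B-Chat](https://github.com/baichuan-inc/Baichuan-13B) was obtained by running the official CMMLU evaluation script ourselves, and the scores for the other models are taken from the official [CMMLU](https://github.com/haonan-li/CMMLU/tree/master) results. In the 5-shot table, the scores for the other models are taken from the official [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) results.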
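
For a quick read of the zero-shot table, the per-category gap between Baichuan-13B-Instruction and Baichuan-13B-Chat can be computed directly from the reported scores. The snippet below is only a convenience sketch: the numbers are copied from the zero-shot table, and the variable names are illustrative.

```python
categories = ["STEM", "Humanities", "Social Sciences", "Others", "China Specific", "Average"]

# Zero-shot CMMLU scores copied from the table above.
chat = [37.32, 56.24, 54.79, 54.07, 52.23, 50.48]
instruction = [42.56, 62.09, 60.41, 58.97, 56.95, 55.88]

# Positive values mean Baichuan-13B-Instruction scores higher.
deltas = {
    name: round(ours - base, 2)
    for name, base, ours in zip(categories, chat, instruction)
}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

These differences describe the reported zero-shot numbers only; the 5-shot table shows a much smaller gap between the two models.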