Original model card

Description

This project provides a GPTQ-quantized (compressed) version of the model described in the original model card below.

Inference

Original model card

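Usage

The following is an example of a conversation with Baichuan-13B-Instruction. The correct output is "乔戈里峰。世界第二高峰———乔戈里峰西方登山者称其为k2峰，海拔高度是8611米，位于喀喇昆仑山脉的中巴边境上" (in English: "Qogir Peak. The world's second-highest peak, Qogir, known to Western climbers as K2, is 8,611 meters high and lies on the China-Pakistan border in the Karakoram range.").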

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# trust_remote_code=True lets transformers run the custom modeling code
# (including model.chat) that ships with the checkpoint.
tokenizer = AutoTokenizer.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", device_map="auto", torch_dtype=torch.float16, trust_remote_code=True)
model.generation_config = GenerationConfig.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction")

# Prompt: "Which mountain is the second highest in the world?"
messages = []
messages.append({"role": "Human", "content": "世界上第二高的山峰是哪座"})
response = model.chat(tokenizer, messages)
print(response)

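Quantized deployment

Baichuan-13B supports int8 and int4 quantization; only two lines of the inference code need to change. Note that if you quantize to save GPU memory, you should load the original-precision model onto the CPU first and quantize it there. Avoid passing device_map='auto' to from_pretrained, or any other argument that would load the original-precision model directly onto the GPU.

To use int8 quantization: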

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(8).cuda()

Similarly, to use int4 quantization:

model = AutoModelForCausalLM.from_pretrained("AlpachinoNLP/Baichuan-13B-Instruction", torch_dtype=torch.float16, trust_remote_code=True)
model = model.quantize(4).cuda()
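
As a rough illustration of why quantization saves GPU memory, the sketch below estimates the weight-only footprint at each precision from the Baichuan-13B parameter count given in this card. The bytes-per-weight figures are assumptions (2 for float16, 1 for int8, 0.5 for int4); the estimate ignores activations, the KV cache, and any scales or layers that a real quantized model keeps in higher precision, so actual usage will be higher.

```python
# Parameter count of Baichuan-13B as listed in this card's parameter table.
TOTAL_PARAMS = 13_264_901_120

# Assumed storage cost per weight. Real quantized models also store
# scales and may keep some layers in higher precision.
BYTES_PER_PARAM = {"float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params, bytes_per_param):
    """Return the weight-only footprint in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for name, size in BYTES_PER_PARAM.items():
    print(f"{name:>8}: {weight_memory_gib(TOTAL_PARAMS, size):6.2f} GiB")
```

Under these assumptions the weights alone take roughly 24.7 GiB in float16, 12.4 GiB in int8, and 6.2 GiB in int4.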

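Model details

Model architecture

The model is based on Baichuan-13B. For better inference performance, Baichuan-13B uses ALiBi linear biases, which require less computation than Rotary Embedding and noticeably improve inference performance. Compared with the standard LLaMA-13B, the measured average inference speed (tokens/s) when generating 2000 tokens is 31.6% higher: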

Model	tokens/s
LLaMA-13B	19.4
Baichuan-13B	25.4
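
To illustrate the ALiBi idea mentioned above, here is a minimal sketch of how the linear biases can be built. It is not the Baichuan implementation: it follows the general ALiBi recipe, where each head gets a fixed slope and the attention score for a query at position i and a key at position j <= i is shifted by slope * (j - i). The function names are hypothetical, and the sketch only handles power-of-two head counts (Baichuan-13B has 40 heads, which needs an extra interpolation step not shown here). Future positions are left at 0.0 on the assumption that a separate causal mask removes them.

```python
def alibi_slopes(num_heads):
    """Per-head slopes: a geometric sequence starting at 2**(-8/num_heads)."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    ratio = 2.0 ** (-8.0 / num_heads)
    return [ratio ** (head + 1) for head in range(num_heads)]


def alibi_bias(num_heads, seq_len):
    """Return bias[h][i][j] = slope_h * (j - i) for keys j at or before query i.

    Entries with j > i are 0.0; a causal mask is assumed to hide them.
    """
    bias = []
    for slope in alibi_slopes(num_heads):
        rows = [
            [slope * (j - i) if j <= i else 0.0 for j in range(seq_len)]
            for i in range(seq_len)
        ]
        bias.append(rows)
    return bias


# Bias matrix of the first head for 8 heads and a 4-token sequence.
for row in alibi_bias(8, 4)[0]:
    print(row)
```

The bias depends only on the distance between positions, so it can be precomputed once and added to the attention scores, with no per-token rotation of queries and keys.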

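The specific parameters are listed in the table below:

Model	Hidden size	Layers	Heads	Vocabulary size	Total parameters	Training data (tokens)	Position embedding	Max length
Baichuan-7B	4,096	32	32	64,000	7,000,559,616	1.2 trillion	RoPE	4,096
Baichuan-13B	5,120	40	40	64,000	13,264,901,120	1.4 trillion	ALiBi	4,096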
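
As a sanity check on the totals above, the sketch below recomputes the parameter count from the hidden size, layer count, and vocabulary size. It relies on several assumptions that this card does not state: a LLaMA-style decoder with no bias terms, untied input and output embeddings, a gated MLP with three projections, weight-only RMSNorm layers, and MLP intermediate sizes of 11,008 (7B) and 13,696 (13B). The function name is hypothetical.

```python
def decoder_param_count(hidden, layers, vocab, intermediate):
    """Count the weights of a LLaMA-style decoder under the stated assumptions."""
    embeddings = vocab * hidden        # input token embedding
    lm_head = vocab * hidden           # output projection (assumed untied)
    attention = 4 * hidden * hidden    # Q, K, V and output projections, no biases
    mlp = 3 * hidden * intermediate    # gate, up and down projections, no biases
    norms = 2 * hidden                 # two norm weight vectors per layer
    final_norm = hidden                # norm applied before the output projection
    return embeddings + lm_head + layers * (attention + mlp + norms) + final_norm


# Intermediate sizes are assumptions; they are not listed in this card.
print(decoder_param_count(4096, 32, 64_000, 11_008))  # Baichuan-7B
print(decoder_param_count(5120, 40, 64_000, 13_696))  # Baichuan-13B
```

Under these assumptions the two calls reproduce the totals in the table, 7,000,559,616 and 13,264,901,120.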

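Training details

The dataset consists mainly of three parts:

13k high-quality samples filtered from the sharegpt_zh dataset.
lima
A 2.3k-sample high-quality Chinese dataset selected by task type, with about 100 samples per task type.

Hardware: 8*A40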

Evaluation results

CMMLU

Model 5-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
Baichuan-7B	34.4	47.5	47.6	46.6	44.3	44.0
Vicuna-13B	31.8	36.2	37.6	39.5	34.3	36.3
Chinese-Alpaca-Plus-13B	29.8	33.4	33.2	37.9	32.1	33.4
Chinese-LLaMA-Plus-13B	28.1	33.1	35.4	35.1	33.5	33.0
Ziya-LLaMA-13B-Pretrain	29.0	30.7	33.8	34.4	31.9	32.1
LLaMA-13B	29.2	30.8	31.6	33.0	30.5	31.2
moss-moon-003-base (16B)	27.2	30.4	28.8	32.6	28.7	29.6
Baichuan-13B-Base	41.7	61.1	59.8	59.0	56.4	55.3
Baichuan-13B-Chat	42.8	62.6	59.7	59.0	56.1	55.8
Baichuan-13B-Instruction	44.50	61.16	59.07	58.34	55.55	55.61

Model zero-shot	STEM	Humanities	Social Sciences	Others	China Specific	Average
ChatGLM2-6B	41.28	52.85	53.37	52.24	50.58	49.95
Baichuan-7B	32.79	44.43	46.78	44.79	43.11	42.33
ChatGLM-6B	32.22	42.91	44.81	42.60	41.93	40.79
BatGPT-15B	33.72	36.53	38.07	46.94	38.32	38.51
Chinese-LLaMA-13B	26.76	26.57	27.42	28.33	26.73	27.34
MOSS-SFT-16B	25.68	26.35	27.21	27.92	26.70	26.88
Chinese-GLM-10B	25.57	25.01	26.33	25.94	25.81	25.80
Baichuan-13B	42.04	60.49	59.55	56.60	55.72	54.63
Baichuan-13B-Chat	37.32	56.24	54.79	54.07	52.23	50.48
Baichuan-13B-Instruction	42.56	62.09	60.41	58.97	56.95	55.88

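Note: CMMLU is a comprehensive Chinese evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of language models in a Chinese-language context. We evaluate the models directly with its official evaluation script. In the zero-shot table, the Baichuan-13B-Chat score comes from our own run of the official CMMLU evaluation script, and the other models' scores come from the official CMMLU results. In the 5-shot table, the other models' scores come from the official Baichuan-13B evaluation results.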
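
To make the zero-shot comparison concrete, the short sketch below computes the per-category difference between Baichuan-13B-Instruction and Baichuan-13B-Chat, using only the scores copied from the zero-shot table above. The helper name is hypothetical.

```python
CATEGORIES = ["STEM", "Humanities", "Social Sciences", "Others", "China Specific", "Average"]

# Zero-shot CMMLU scores copied from the table in this card.
ZERO_SHOT = {
    "Baichuan-13B-Chat": [37.32, 56.24, 54.79, 54.07, 52.23, 50.48],
    "Baichuan-13B-Instruction": [42.56, 62.09, 60.41, 58.97, 56.95, 55.88],
}


def score_deltas(scores, model, baseline):
    """Return model-minus-baseline score per category, rounded to 2 decimals."""
    return {
        category: round(a - b, 2)
        for category, a, b in zip(CATEGORIES, scores[model], scores[baseline])
    }


for category, delta in score_deltas(
    ZERO_SHOT, "Baichuan-13B-Instruction", "Baichuan-13B-Chat"
).items():
    print(f"{category:>16}: {delta:+.2f}")
```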