ZekeWang committed on
Commit 121fd38
1 Parent(s): b16848a

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -35,9 +35,12 @@ extra_gated_fields:
 
 ## <span id="Introduction">模型介绍(Introduction)</span>
 
-Nanbeige2-8B-Chat是南北阁实验室最新研发的80亿参数模型,在预训练中使用4.5T Tokens高质量语料。特别地,我们通过引入大量合成数据来解决中文高质量数据的稀缺问题,并取得了显著收益。
+Nanbeige2-8B-Chat是南北阁实验室最新研发的80亿参数模型,在预训练中使用4.5T Tokens高质量语料。
+在对齐阶段,我们首先使用了100万条样本进行SFT训练,然后用40万高质量且难度较高的样本进行课程学习,再通过人类反馈DPO,得到Nanbeige2-8B-Chat。Nanbeige2-8B-Chat在各个权威测评数据集上都取得了较优的效果。
 
-The Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during its the training phase. Notably, we have introduced synthetic data to address the scarcity of high-quality Chinese data, which proved significant improvement.
+
+The Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during the training phase.
+During the alignment phase, we first trained the model on 1 million samples through Supervised Fine-Tuning (SFT), then applied curriculum learning with 400,000 high-quality, higher-difficulty samples, and finally incorporated human feedback through Direct Preference Optimization (DPO), culminating in Nanbeige2-8B-Chat. Nanbeige2-8B-Chat achieves strong results across various authoritative benchmark datasets.
 
 ## <span id="Inference">模型推理(Inference)</span>
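
The DPO step named in the added paragraph optimizes a pairwise preference loss against a frozen reference (SFT) model. A minimal sketch of that loss for a single preference pair, using plain floats — this is illustrative only, not the Nanbeige Lab's training code, and the function name and default `beta` are assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy being trained and under the frozen reference model.
    """
    # Log-ratios: how much more (or less) likely each response is under
    # the policy than under the reference model.
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    # Scaled implicit reward margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: near log(2) when the policy
    # matches the reference, small when it already prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference, both log-ratios vanish and the loss is exactly `log(2)`; the gradient then pushes probability mass toward the chosen response and away from the rejected one.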