Update README.md
## <span id="Introduction">模型介绍(Introduction)</span>

Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, pretrained on 4.5T tokens of high-quality corpus data.

During the alignment phase, we first trained the model on 1 million samples through Supervised Fine-Tuning (SFT), then applied curriculum learning with 400,000 high-quality, higher-difficulty samples, and finally incorporated human feedback through Direct Preference Optimization (DPO), culminating in Nanbeige2-8B-Chat. Nanbeige2-8B-Chat achieves strong performance across various authoritative benchmark datasets.
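
The DPO step above optimizes the policy directly on preference pairs rather than training a separate reward model. A minimal sketch of the per-pair DPO loss follows; this is an illustrative implementation, not the Lab's training code, and the `beta` value and the summed log-probability inputs are assumptions:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference model.
    """
    # Implicit reward of each response: log-prob margin over the reference model
    chosen_reward = policy_logp_chosen - ref_logp_chosen
    rejected_reward = policy_logp_rejected - ref_logp_rejected
    # DPO objective: negative log-sigmoid of the scaled reward gap
    logit = beta * (chosen_reward - rejected_reward)
    return -math.log(1.0 / (1.0 + math.exp(-logit)))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, which is how human preferences shape the model without an explicit reward network.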
## <span id="Inference">模型推理(Inference)</span>