ZekeWang committed on
Commit 121fd38
1 Parent(s): b16848a

Update README.md

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -35,9 +35,12 @@ extra_gated_fields:
 
 ## <span id="Introduction">模型介绍(Introduction)</span>
 
-Nanbeige2-8B-Chat是南北阁实验室最新研发的80亿参数模型,在预训练中使用4.5T Tokens高质量语料。特别地,我们通过引入大量合成数据来解决中文高质量数据的稀缺问题,并取得了显著收益。
+Nanbeige2-8B-Chat是南北阁实验室最新研发的80亿参数模型,在预训练中使用4.5T Tokens高质量语料。
+在对齐阶段,我们首先使用了100万条样本进行SFT训练,然后用40万高质量且难度较高的样本进行课程学习,再通过人类反馈DPO,得到Nanbeige2-8B-Chat。Nanbeige2-8B-Chat在各个权威测评数据集上都取得了较优的效果。
 
-The Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during its the training phase. Notably, we have introduced synthetic data to address the scarcity of high-quality Chinese data, which proved significant improvement.
+
+The Nanbeige2-8B-Chat is the latest 8B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during the training phase.
+During the alignment phase, we first trained the model on 1 million samples through Supervised Fine-Tuning (SFT), then applied curriculum learning with 400,000 high-quality, higher-difficulty samples, and finally incorporated human feedback through Direct Preference Optimization (DPO), culminating in Nanbeige2-8B-Chat. Nanbeige2-8B-Chat achieves strong results across various authoritative benchmark datasets.
 
 ## <span id="Inference">模型推理(Inference)</span>
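
The DPO step named in the added paragraph optimizes a pairwise preference loss against a frozen reference (SFT) model. A minimal sketch of that loss for a single preference pair, using plain floats — this is illustrative only, not the Nanbeige Lab's training code, and the function name and default `beta` are assumptions:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected responses
    under the policy being trained and under the frozen reference model.
    """
    # Log-ratios: how much more (or less) likely each response is under
    # the policy than under the reference model.
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    # Scaled implicit reward margin between chosen and rejected responses.
    margin = beta * (chosen_logratio - rejected_logratio)
    # Negative log-sigmoid of the margin: near log(2) when the policy
    # matches the reference, small when it already prefers the chosen response.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference, both log-ratios vanish and the loss is exactly `log(2)`; the gradient then pushes probability mass toward the chosen response and away from the rejected one.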