duoqi committed on
Commit 7c83b56 · verified · 1 Parent(s): ab64213

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -8,6 +8,7 @@ tags:
 - llm
 ---
 Introduction
+
 A Llama version for [Nanbeige-16B-Chat](https://huggingface.co/Nanbeige/Nanbeige2-16B-Chat), which could be loaded by LlamaForCausalLM.
 
 The Nanbeige2-16B-Chat is the latest 16B model developed by the Nanbeige Lab, which utilized 4.5T tokens of high-quality training data during the training phase. During the alignment phase, we initially trained our model using 1 million samples through Supervised Fine-Tuning (SFT). We then engaged in curriculum learning with 400,000 high-quality samples that presented a greater level of difficulty. Subsequently, we incorporated human feedback through the Direct Preference Optimization (DPO), culminating in the development of Nanbeige2-16B-Chat. Nanbeige2-16B-Chat has achieved superior performance across various authoritative benchmark datasets.
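
The README text in this diff says the Llama version can be loaded by LlamaForCausalLM. A minimal sketch of what that might look like with the Hugging Face `transformers` library follows. The function names `load_llama_version` and `generate_reply` are hypothetical, the `model_path` argument stands in for this repository's id or a local checkout (neither is given in the diff), and loading the tokenizer with `AutoTokenizer` with default options is an assumption that may not hold for this checkpoint. No chat template is applied, since the README does not describe one.

```python
def load_llama_version(model_path: str):
    """Load tokenizer and weights through the stock Llama classes.

    `model_path` is a placeholder: pass the repo id or local directory
    of the Llama-format checkpoint (not stated in the diff).
    """
    # Imported lazily so defining this helper does not require the
    # heavy dependencies to be installed.
    from transformers import AutoTokenizer, LlamaForCausalLM

    # Assumption: the tokenizer loads with default options.
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint config.
    model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype="auto")
    return tokenizer, model


def generate_reply(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for a raw prompt (no chat template applied)."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the newly generated text is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage would be along the lines of `tokenizer, model = load_llama_version("<path-to-checkpoint>")` followed by `generate_reply(tokenizer, model, "Hello")`. A 16B model needs substantial memory, so in practice device placement or quantization options would likely be added.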