yuxin committed on
Commit
917b29b
1 Parent(s): cc3efb5

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -6,12 +6,14 @@ license: apache-2.0
 ## Introduction
 Ziya-LLaMA-7B-Reward is based on the Ziya-LLaMA model and is trained on the following preference-ranking data:
 * 40,190 self-labeled high-quality preference-ranking samples
-* 3,600 strictly filtered external open-source samples, from sources including: OpenAssistant Conversations Dataset (OASST1)Anthropic HH-RLHFGPT-4-LLMwebgpt_comparisions
+* 3,600 strictly filtered external open-source samples, from sources including: `OpenAssistant Conversations Dataset (OASST1)`, `Anthropic HH-RLHF`, `GPT-4-LLM`, and `webgpt_comparisons`
+
 The model can simulate a reward environment for bilingual Chinese-English generation and provide accurate reward feedback on LLM outputs.
 
 Ziya-LLaMA-7B-Reward is based on the Ziya-LLaMA model, trained on the following preference ranking data:
 * 40190 self-labeled high-quality preference ranking data
-* 3600 strictly filtered external open source data from sources including OpenAssistant Conversations Dataset (OASST1), Anthropic HH-RLHF, GPT-4-LLM and webgpt_comparisions
+* 3600 strictly filtered external open source data from sources including `OpenAssistant Conversations Dataset (OASST1)`, `Anthropic HH-RLHF`, `GPT-4-LLM`, and `webgpt_comparisons`
+
 The model is able to simulate a bilingual reward environment and provide accurate reward feedback on LLM generation results.
 
 ## Usage
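
As a minimal sketch of how a reward model like this is typically queried: the repo id, loading class, and dialogue template below are assumptions rather than the model's documented API, so the README's own Usage section remains authoritative.

```python
# Minimal sketch, assuming the model loads via transformers with a
# sequence-classification head; the repo id and prompt template are guesses.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "IDEA-CCNL/Ziya-LLaMA-7B-Reward"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).eval()

# A reward model scores a complete prompt + response pair;
# a higher scalar means the response is more preferred.
text = (
    "Human: What is machine learning?\n\n"
    "Assistant: Machine learning is a field of AI that learns patterns from data."
)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits.squeeze().item()
print(f"reward: {reward:.4f}")
```

In an RLHF pipeline, this scalar is what the policy model is optimized against, so the same call would be run on each candidate response to rank them.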