## Introduction

Ziya-LLaMA-7B-Reward is based on the Ziya-LLaMA model and was trained on the following preference ranking data:

* 40,190 self-labeled high-quality preference ranking samples
* 3,600 strictly filtered external open-source samples, drawn from `OpenAssistant Conversations Dataset (OASST1)`, `Anthropic HH-RLHF`, `GPT-4-LLM`, and `webgpt_comparisions`

The model simulates a bilingual (Chinese and English) reward environment and provides accurate reward feedback on LLM-generated results.

## Usage
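
The snippet below is a minimal sketch of how a reward model like this is typically queried: encode a prompt plus a candidate response and read a scalar score off the model's head. The repository id `IDEA-CCNL/Ziya-LLaMA-7B-Reward`, the sequence-classification head, and the `Human:`/`Assistant:` prompt template are all assumptions not confirmed by this README, so check the model card for the exact interface.

```python
# Minimal sketch: scoring a (prompt, response) pair with a reward model.
# Assumptions (not confirmed by this README): the checkpoint id, the
# single-logit sequence-classification head, and the prompt template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "IDEA-CCNL/Ziya-LLaMA-7B-Reward"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=1,               # one scalar reward per sequence
    torch_dtype=torch.float16,
).eval().cuda()

prompt = "Human: What is one common source of air pollution?\n\nAssistant: "
response = "Burning fossil fuels is one common source of air pollution."

# The reward is read off the classification head's single logit;
# a higher value indicates a response the model prefers.
inputs = tokenizer(prompt + response, return_tensors="pt").to(model.device)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(f"reward: {reward:.4f}")
```

To rank several candidate responses for one prompt, score each pair the same way and sort by the returned reward.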