## Introduction

Ziya-LLaMA-7B-Reward is based on the Ziya-LLaMA model and was trained on the following preference ranking data:

* 40,190 self-labeled high-quality preference ranking samples
* 3,600 strictly filtered external open-source samples, drawn from `OpenAssistant Conversations Dataset (OASST1)`, `Anthropic HH-RLHF`, `GPT-4-LLM`, and `webgpt_comparisions`

The model simulates a bilingual (Chinese and English) reward environment and provides accurate reward feedback on LLM-generated results.

## Usage
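
The snippet below is a minimal sketch of how a reward model like this is typically queried: encode a prompt plus a candidate response and read a scalar score off the model's head. The repository id `IDEA-CCNL/Ziya-LLaMA-7B-Reward`, the sequence-classification head, and the `Human:`/`Assistant:` prompt template are all assumptions not confirmed by this README, so check the model card for the exact interface.

```python
# Minimal sketch: scoring a (prompt, response) pair with a reward model.
# Assumptions (not confirmed by this README): the checkpoint id, the
# single-logit sequence-classification head, and the prompt template.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "IDEA-CCNL/Ziya-LLaMA-7B-Reward"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=1,               # one scalar reward per sequence
    torch_dtype=torch.float16,
).eval().cuda()

prompt = "Human: What is one common source of air pollution?\n\nAssistant: "
response = "Burning fossil fuels is one common source of air pollution."

# The reward is read off the classification head's single logit;
# a higher value indicates a response the model prefers.
inputs = tokenizer(prompt + response, return_tensors="pt").to(model.device)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(f"reward: {reward:.4f}")
```

To rank several candidate responses for one prompt, score each pair the same way and sort by the returned reward.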