lyogavin/Anima33B-DPO-Belle-1k-merged


lyogavin committed on Jul 2, 2023

Commit 6260d87 · 1 Parent(s): 419aafa

Update README.md

Files changed (1)

README.md +3 -0

@@ -17,6 +17,9 @@ Github: <a href="https://github.com/lyogavin/Anima/stargazers">![GitHub Repo sta
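 We have open-sourced an implementation of DPO training based on QLoRA.
+# LICENSE
+Please note: this model is released under a special LICENSE. Please confirm that your use case complies with it.
 ### How do I use Anima QLoRA DPO training?
 **Prepare the data:** We use a format similar to the [hh-rlhf dataset](https://huggingface.co/datasets/Anthropic/hh-rlhf): each training record has two keys, chosen and rejected, which contrast what annotators judged to be a good output and a bad output for the same prompt. You can change the --dataset parameter to point to a local dataset or a Hugging Face dataset.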
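
To illustrate the data format described in the README text above, here is a minimal sketch of building and checking one preference record with the two keys `chosen` and `rejected`. The helper names (`make_record`, `validate_record`, `to_jsonl`), the sample texts, the `Human:`/`Assistant:` dialogue layout borrowed from hh-rlhf, and the JSON Lines serialization are all assumptions for illustration; the README does not state the exact file format the Anima training script expects.

```python
import json


def make_record(prompt, good, bad):
    """Build one preference record (hypothetical helper).

    Both values share the same prompt and differ only in the response,
    mirroring the hh-rlhf layout. The dialogue markers are an assumption.
    """
    prefix = f"\n\nHuman: {prompt}\n\nAssistant: "
    return {"chosen": prefix + good, "rejected": prefix + bad}


def validate_record(record):
    """Return True if both keys are present and hold non-empty strings."""
    return all(
        isinstance(record.get(key), str) and len(record[key]) > 0
        for key in ("chosen", "rejected")
    )


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line (assumed format)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = [
    make_record(
        "What is the capital of France?",
        "The capital of France is Paris.",
        "I don't know.",
    )
]
jsonl_text = to_jsonl(records)
```

A file written from `jsonl_text` could then serve as a local dataset for the `--dataset` parameter, provided the training script accepts that serialization.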