JosephusCheung committed · Commit 36501a5 · 1 Parent(s): 25d3c1b

Update README.md

Files changed (1): README.md (+4 −0)
README.md CHANGED
@@ -32,6 +32,8 @@ tags:
 - llama2
 - qwen
 ---
+For details, please refer to the version without DPO training: [CausalLM/7B](https://huggingface.co/CausalLM/7B).
+
 | Model | MT-Bench |
 | ------------------------- | ------------ |
 | GPT-4 | 8.99 |
@@ -49,6 +51,8 @@ The beta branch will soon be released, employing some aggressive approaches that
 
 Disclaimer: Please note that the model was trained on unfiltered internet data. Since we do not have the capacity to vet all of it, there may be a substantial amount of objectionable content, pornography, violence, and offensive language present that we are unable to remove. Therefore, you will still need to complete your own checks on the model's safety and filter keywords in the output. Due to computational resource constraints, we are presently unable to implement RLHF for the model's ethics and safety, nor to train on SFT samples that refuse to answer certain questions for restrictive fine-tuning.
 
+For more details, please refer to the version without DPO training: [CausalLM/14B](https://huggingface.co/CausalLM/14B)
+
 Please note that this is not a version trained further on CausalLM/14B & 7B; it is an optimized version of the earlier training branch with DPO training applied in parallel, and some detail parameters may have changed. You still need to download the full model.
 
 The beta branch will be released soon; it adopts some aggressive approaches that may be unfavorable for certain tasks in order to better align with human preferences, approaching and surpassing the GPT-3.5 benchmark. Stay tuned.