Update README.md
README.md CHANGED
@@ -8,6 +8,8 @@
 
 <!-- Provide a quick summary of what the model is/does. -->
 
+Thanks for your interest in this reward model! We recommend you use [weqweasdas/RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B) instead.
+
 In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).
 
 ## Model Details
@@ -34,7 +36,7 @@ We conduct reward modeling with learning rate 5e-6 for 1 epoch and linear decay
 
 We use bf16 and do not use LoRA in both of the stages.
 
-**The resulting model achieves an evaluation loss of 0.5 and an evaluation accuracy 75.48%.**
+**The resulting model achieves an evaluation loss of 0.5 and an evaluation accuracy of 75.48%.** (Note that there can be data leakage in the [HH-RLHF dataset](Dahoas/full-hh-rlhf).)
 
 **Generalization**
 
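For context, below is a minimal sketch of how a reward model like this one can be queried with `transformers`. It assumes the checkpoint loads as a single-logit sequence-classification model; the model id and the dialogue formatting are placeholders, not values taken from this card.

```python
# Minimal sketch: score one prompt/response pair with a reward model.
# Assumptions: the checkpoint loads via AutoModelForSequenceClassification with
# a single output logit, and "reward-model-id" is a placeholder for the real repo id.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "reward-model-id"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=1,                 # one scalar reward per sequence
    torch_dtype=torch.bfloat16,   # matches the bf16 training setup
)
model.eval()

# Illustrative HH-RLHF-style dialogue; check the dataset for the exact prompt format.
text = (
    "Human: How do I bake bread at home? "
    "Assistant: Mix flour, water, yeast, and salt, knead, let it rise, then bake."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()  # higher = response rated better
print(f"reward: {reward:.3f}")
```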