weqweasdas committed
Commit e3c1d3f · verified · Parent(s): 13f510a

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -8,6 +8,8 @@

  <!-- Provide a quick summary of what the model is/does. -->

+ Thanks for your interest in this reward model! We recommend you use [weqweasdas/RM-Gemma-2B](https://huggingface.co/weqweasdas/RM-Gemma-2B) instead.
+
  In this repo, we present a reward model trained by the framework [LMFlow](https://github.com/OptimalScale/LMFlow). The reward model is for the [HH-RLHF dataset](Dahoas/full-hh-rlhf) (helpful part only), and is trained from the base model [openlm-research/open_llama_3b](https://huggingface.co/openlm-research/open_llama_3b).

  ## Model Details
@@ -34,7 +36,7 @@ We conduct reward modeling with learning rate 5e-6 for 1 epoch and linear decay

  We use bf16 and do not use LoRA in both of the stages.

- **The resulting model achieves an evaluation loss of 0.5 and an evaluation accuracy 75.48%.**
+ **The resulting model achieves an evaluation loss of 0.5 and an evaluation accuracy 75.48%.** (Note that there can be data leakage in the [HH-RLHF dataset](Dahoas/full-hh-rlhf).)

  **Generalization**
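
For readers who want to try the reward model described in this change, below is a minimal scoring sketch. It assumes the repo id `weqweasdas/hh_rlhf_rm_open_llama_3b`, a sequence-classification reward head loadable with `transformers`, and an HH-RLHF-style `Human:`/`Assistant:` dialogue format; none of these are confirmed by this commit, so check the model card before relying on them.

```python
# Minimal sketch (assumptions: repo id, sequence-classification head, HH-style prompt format).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "weqweasdas/hh_rlhf_rm_open_llama_3b"  # assumed repo id; verify on the Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # the README states training used bf16
)
model.eval()

# HH-RLHF-style dialogue; the exact Human/Assistant formatting is an assumption.
text = (
    "\n\nHuman: How do I bake bread at home?"
    "\n\nAssistant: Start with flour, water, yeast, and salt; knead, let it rise, then bake at high heat."
)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    reward = model(**inputs).logits.squeeze().item()  # scalar score; higher = more preferred reply
print(f"reward score: {reward:.4f}")
```

The same scoring call can be reused to rank several candidate replies to one prompt, e.g. for best-of-n sampling or as the reward signal in an RLHF loop.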