Update README.md
README.md
@@ -100,7 +100,7 @@ Therefore you may want to normalize the probability.

You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
- Inferring the preference label in this way only leads to a 0.
+ Inferring the preference label in this way only leads to a 0.005 drop in accuracy on the SHP + HH-RLHF test data on average across all domains, meaning that there's only a very small penalty for using SteamSHP as a reward model instead of as a preference model.

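For illustration, here is a minimal sketch of this reward-model usage with the Transformers library. The checkpoint id (`stanfordnlp/SteamSHP-flan-t5-large`), the single-response input template (candidate placed in the RESPONSE A slot, RESPONSE B left empty), and the example strings are assumptions made for this sketch; adjust them to match the input format documented earlier in this README.

```python
# Sketch (not the official usage snippet): score each response independently
# with SteamSHP and compare the two probabilities to infer the preference label.
# The checkpoint id and input template below are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "stanfordnlp/SteamSHP-flan-t5-large"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()


def response_probability(post: str, response: str) -> float:
    """P('A') when `response` is scored on its own (RESPONSE B left empty)."""
    prompt = (
        f"POST: {post}\n\n"
        f"RESPONSE A: {response}\n\n"
        f"RESPONSE B: .\n\n"
        f"Which response is better? RESPONSE"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    # T5 decoding starts from the decoder start token; we only need the
    # distribution over the first generated token.
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    probs = torch.softmax(logits, dim=-1)
    a_token_id = tokenizer("A", add_special_tokens=False).input_ids[0]
    return probs[a_token_id].item()


# Compare the two independently assigned probabilities, as described above.
post = "What's a good way to start learning piano as an adult?"
score_1 = response_probability(post, "Take weekly lessons and practice a little every day.")
score_2 = response_probability(post, "Don't bother, it's too late to learn.")
print("Response 1 preferred" if score_1 > score_2 else "Response 2 preferred")
```
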
## Training and Evaluation

@@ -140,7 +140,7 @@ SteamSHP-Large gets an average 72.0% accuracy across all domains:

| ALL (unweighted) | 0.7203 |

As mentioned previously, if you use SteamSHP as a reward model and try to infer the preference label based on the probability assigned to each response independently, that could also work!
- But doing so will lead to a 0.
+ But doing so will lead to a 0.005 drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.

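For contrast, here is a similarly hedged sketch of the preference-model usage that this figure is compared against: both responses go into a single input, and the label is read off from whether "A" or "B" is the more probable first output token. The same checkpoint id and template assumptions as in the sketch above apply.

```python
# Sketch of the preference-model usage for comparison: both responses in one
# input, label inferred from whether 'A' or 'B' is more probable.
# The checkpoint id and input template are assumptions, as above.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "stanfordnlp/SteamSHP-flan-t5-large"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()


def preference_label(post: str, response_a: str, response_b: str) -> str:
    """Return 'A' or 'B' depending on which response SteamSHP prefers."""
    prompt = (
        f"POST: {post}\n\n"
        f"RESPONSE A: {response_a}\n\n"
        f"RESPONSE B: {response_b}\n\n"
        f"Which response is better? RESPONSE"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    a_id = tokenizer("A", add_special_tokens=False).input_ids[0]
    b_id = tokenizer("B", add_special_tokens=False).input_ids[0]
    return "A" if logits[a_id] > logits[b_id] else "B"


# Example: which of two answers to the same post is preferred?
label = preference_label(
    "What's a good way to start learning piano as an adult?",
    "Take weekly lessons and practice a little every day.",
    "Don't bother, it's too late to learn.",
)
print(f"Preferred response: {label}")
```
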
## Biases and Limitations