Commit 13f510a (parent: 7cad203): Update README.md

README.md (changed)
@@ -42,9 +42,9 @@ We further test the generalization ability of the reward model but with another
 
 | Dataset training/test | open assistant | chatbot | hh_rlhf |
 | -------------- | -------------- | ------- | ------- |
-| open assistant | 69.5 | 61.1 | 58.7 |
+| open assistant | **69.5** | 61.1 | 58.7 |
 | chatbot | 66.5 | 62.7 | 56.0 |
-| hh_rlhf | 69.4 | 64.2 | 77.6 |
+| hh_rlhf | 69.4 | **64.2** | **77.6** |
 
 As we can see, the reward model trained on the HH-RLHF achieves matching or even better accuracy on open assistant and chatbot datasets, even though it is not trained on them directly. Therefore, the reward model may also be used for these two datasets.
 
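For context, a training/test grid like the one above is usually measured as pairwise preference accuracy: the reward model should score the chosen response above the rejected one. Below is a minimal, hypothetical sketch of that computation; the `score_fn` callables and the (prompt, chosen, rejected) dataset format are assumptions for illustration, not this repository's actual API.

```python
# Hypothetical sketch (not this repository's code): pairwise preference
# accuracy for a reward model, evaluated across several test datasets.

def pairwise_accuracy(score_fn, pairs):
    """Fraction of (prompt, chosen, rejected) triples where the reward
    model scores the chosen response above the rejected one."""
    correct = sum(
        score_fn(prompt, chosen) > score_fn(prompt, rejected)
        for prompt, chosen, rejected in pairs
    )
    return correct / len(pairs)

def accuracy_grid(score_fns_by_train_set, pairs_by_test_set):
    """Build a training-dataset x test-dataset accuracy table like the one above."""
    return {
        train_name: {
            test_name: pairwise_accuracy(score_fn, pairs)
            for test_name, pairs in pairs_by_test_set.items()
        }
        for train_name, score_fn in score_fns_by_train_set.items()
    }
```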