Update README.md
README.md
CHANGED
@@ -25,10 +25,10 @@ SteamSHP-Large is a preference model trained to predict human preferences, given
 It can be used for NLG evaluation or to train a smaller reward model for RLHF.

 It is a FLAN-T5-large model (780M parameters) finetuned on:
-1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains
+1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
 2. The helpfulness data in [Anthropic's HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.

-There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-xl) that was made by finetuning FLAN-T5-xl (3B parameters)
+There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-xl) that was made by finetuning FLAN-T5-xl (3B parameters).


 ## Usage
@@ -112,7 +112,7 @@ Biases in the datasets used to train SteamSHP-Large may be propagated downstream
 Although SHP filtered out posts with NSFW (over 18) content and chose subreddits that were well-moderated and had policies against harassment and bigotry, some of the data may contain discriminatory or harmful language.
 Reddit users on the subreddits covered by SHP are also not representative of the broader population. They are disproportionately from developed, Western, and English-speaking countries.

-It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the
+It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the collective preference of Reddit users (in SHP's case) and individuals' preferences (in HH-RLHF's case).
 [Past work](https://www.anthropic.com/model-written-evals.pdf) by Anthropic has found that models optimized for human preference can be obsequious, at the expense of the truth.

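The Usage section of the README itself is outside these hunks, so as a quick orientation, here is a minimal sketch of querying the model as a standard FLAN-T5 text-to-text preference scorer with Hugging Face `transformers`. The repo ID `stanfordnlp/SteamSHP-flan-t5-large` and the POST/RESPONSE prompt template below are assumptions for illustration; the exact format expected by SteamSHP is documented in the model card's Usage section.

```python
# Sketch (assumed setup): load SteamSHP-Large as a FLAN-T5 seq2seq model and ask it
# which of two candidate responses to a post is preferred. The prompt template here
# is illustrative, not necessarily the exact one from the model card.
from transformers import T5Tokenizer, T5ForConditionalGeneration

model_id = "stanfordnlp/SteamSHP-flan-t5-large"  # assumed repo ID for the Large variant
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

prompt = (
    "POST: How do I keep scrambled eggs from turning rubbery?\n\n"
    "RESPONSE A: Cook them low and slow and pull them off the heat while still slightly wet.\n\n"
    "RESPONSE B: Just microwave them on high for five minutes.\n\n"
    "Which response is better? RESPONSE"
)

# The model decodes a short label (e.g. "A" or "B") indicating the preferred response.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=1)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```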