Commit 0abcd57 by kawine
Parent(s): b00ca45

Update README.md

Files changed (1):
  1. README.md +3 -3
README.md CHANGED
@@ -25,10 +25,10 @@ SteamSHP-Large is a preference model trained to predict human preferences, given
 It can be used for NLG evaluation or to train a smaller reward model for RLHF.
 
 It is a FLAN-T5-large model (780M parameters) finetuned on:
-1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains aggregate human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
+1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
 2. The helpfulness data in [Anthropic's HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.
 
-There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-xl) that was made by finetuning FLAN-T5-xl (3B parameters), which is 0.75 percentage points more accurate on the test data.
+There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-xl) that was made by finetuning FLAN-T5-xl (3B parameters).
 
 
 ## Usage
@@ -112,7 +112,7 @@ Biases in the datasets used to train SteamSHP-Large may be propagated downstream
 Although SHP filtered out posts with NSFW (over 18) content, chose subreddits that were well-moderated and had policies against harassment and bigotry, some of the data may contain discriminatory or harmful language.
 Reddit users on the subreddits covered by SHP are also not representative of the broader population. They are disproportionately from developed, Western, and English-speaking countries.
 
-It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the aggregate preference of Reddit users (in SHP's case) and individuals' preferences (in HH-RLHF's case).
+It is also worth noting that the more preferred response in SHP or HH-RLHF is not necessarily the more correct one -- the data just reflects the collective preference of Reddit users (in SHP's case) and individuals' preferences (in HH-RLHF's case).
 [Past work](https://www.anthropic.com/model-written-evals.pdf) by Anthropic has found that models optimized for human preference can be obsequious, at the expense of the truth.
 
 
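
The hunks above touch only the model description and the bias discussion; the README's `## Usage` section appears only as unchanged context. For orientation, below is a minimal sketch of how the model this README describes might be queried with the `transformers` library. The repository id `stanfordnlp/SteamSHP-flan-t5-large` is inferred from the SteamSHP-XL link in the diff, and the `POST / RESPONSE A / RESPONSE B` prompt template is an assumption about the format the Usage section documents, not something shown in this commit:

```python
# Sketch only: assumes the repo id stanfordnlp/SteamSHP-flan-t5-large and a
# POST / RESPONSE A / RESPONSE B prompt format; neither is shown in this diff.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "stanfordnlp/SteamSHP-flan-t5-large"  # assumed, by analogy with the XL link above
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Hypothetical context and two candidate responses to compare.
prompt = (
    "POST: What's a good way to learn the basics of sourdough baking?\n\n"
    "RESPONSE A: Start with a simple no-knead recipe, keep a regular feeding "
    "schedule for your starter, and take notes on hydration and timing.\n\n"
    "RESPONSE B: Just buy bread.\n\n"
    "Which response is better? RESPONSE"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=1)
# The model is expected to emit 'A' or 'B' for the preferred response
# (per the model card's Usage section, not this commit).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Per the unchanged lines in the first hunk, that preference label can then be used for NLG evaluation or as a training signal when training a smaller reward model for RLHF.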