Update README.md

<!-- Provide a quick summary of what the model is/does. -->

SteamSHP-Large is a preference model trained to predict -- given some context and two possible responses -- which response humans will find more helpful.
It can be used for NLG evaluation or as a reward model for RLHF.

It is a FLAN-T5-large model (780M parameters) finetuned on:
1. The [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP), which contains collective human preferences sourced from 18 different communities on Reddit (e.g., `askculinary`, `legaladvice`, etc.).
2. The helpfulness data in [Anthropic's HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset.

There is a larger variant called [SteamSHP-XL](https://huggingface.co/stanfordnlp/SteamSHP-flan-t5-xl) that was made by finetuning FLAN-T5-xl (3B parameters).

### Normal Usage

The input text should be of the format:
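
Concretely, the expected input mirrors the template shown under Reward Model Usage below, except that RESPONSE B holds the competing continuation rather than a null input (the placeholder wording here is illustrative):

```
POST: { the context, such as the 'history' column in SHP }

RESPONSE A: { one possible continuation }

RESPONSE B: { the other possible continuation }

Which response is better? RESPONSE
```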

Here's how to use the model:

```python
>> from transformers import T5ForConditionalGeneration, T5Tokenizer

>> device = 'cuda' # if you have a GPU

>> tokenizer = T5Tokenizer.from_pretrained('stanfordnlp/SteamSHP-flan-t5-large')
>> model = T5ForConditionalGeneration.from_pretrained('stanfordnlp/SteamSHP-flan-t5-large').to(device)

>> # input_text is a single string in the format shown above
>> x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
>> y = model.generate(x, max_new_tokens=1)
>> tokenizer.batch_decode(y, skip_special_tokens=True)
['B']
```

If the input exceeds the 512-token limit, you can use [pySBD](https://github.com/nipunsadvilkar/pySBD) to break the input up into sentences and include only what fits into 512 tokens.
When trying to cram an example into 512 tokens, we recommend truncating the context as much as possible and leaving the responses as untouched as possible.
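
A minimal sketch of that truncation strategy, assuming `pysbd` is installed and that the `tokenizer` from the snippet above is in scope (the `build_input` helper and its prompt formatting are illustrative, not part of the model card):

```python
import pysbd

seg = pysbd.Segmenter(language="en", clean=False)

def build_input(post, response_a, response_b, max_tokens=512):
    # Drop sentences from the end of the context until the whole prompt
    # fits in the token budget; the responses are left untouched.
    template = "POST: {post}\n\n RESPONSE A: {a}\n\n RESPONSE B: {b}\n\n Which response is better? RESPONSE"
    sentences = seg.segment(post)
    while sentences:
        candidate = template.format(post=" ".join(sentences), a=response_a, b=response_b)
        if len(tokenizer(candidate).input_ids) <= max_tokens:
            return candidate
        sentences = sentences[:-1]
    # Fall back to an empty context (the responses themselves may still exceed the budget).
    return template.format(post="", a=response_a, b=response_b)
```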

### Reward Model Usage

If you want to use SteamSHP-Large as a reward model -- to get a score for a single response -- then you need to structure the input such that RESPONSE A is what you want to score and RESPONSE B is just an empty input:

```
POST: { the context, such as the 'history' column in SHP }

RESPONSE A: { continuation }

RESPONSE B: .

Which response is better? RESPONSE
```

Then calculate the probability assigned to the label A.
This probability (or the logit, depending on what you want) is the score for the response:

```python
>> import torch

>> input_text = "POST: Instacart gave me 50 pounds of limes instead of 5 pounds... what the hell do I do with 50 pounds of limes? I've already donated a bunch and gave a bunch away. I'm planning on making a bunch of lime-themed cocktails, but... jeez. Ceviche? \n\n RESPONSE A: Lime juice, and zest, then freeze in small quantities.\n\n RESPONSE B: .\n\n Which response is better? RESPONSE"
>> x = tokenizer([input_text], return_tensors='pt').input_ids.to(device)
>> outputs = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
>> torch.exp(outputs.scores[0][:, 71]) / torch.exp(outputs.scores[0][:, :]).sum(axis=1).item() # index 71 corresponds to the token for 'A'
0.8617
```

The probability will almost always be high (in the range of 0.8 to 1.0), since RESPONSE B is just a null input.
Therefore you may want to normalize the probability.
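
One simple option is to min-max rescale the raw probability over the 0.8 to 1.0 range noted above; the scheme below is illustrative rather than prescribed:

```python
def normalize_score(p, lower=0.8, upper=1.0):
    # Rescale a raw probability from its typical [0.8, 1.0] range to [0, 1];
    # the bounds are an assumption based on the observation above.
    return max(0.0, min(1.0, (p - lower) / (upper - lower)))

normalize_score(0.8617)  # ~0.31
```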

You can also compare the two probabilities assigned independently to each response (given the same context) to infer the preference label.
For example, if one response has probability 0.95 and the other has 0.80, the former will be preferred.
Inferring the preference label in this way leads to only a 0.5-point drop in accuracy on the SHP + HH-RLHF test data (on average across all domains), meaning that there is only a very small penalty for using SteamSHP as a reward model instead of as a preference model.
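
A minimal sketch of that comparison, reusing `model`, `tokenizer`, and `device` from the snippets above (the `score_response` and `prefer` helpers, and their prompt formatting, are assumptions for illustration):

```python
import torch

A_TOKEN_ID = 71  # token id for 'A', as in the scoring example above

def score_response(context, response):
    # Score a single response by placing it in the RESPONSE A slot
    # and a null input in the RESPONSE B slot.
    prompt = f"POST: {context}\n\n RESPONSE A: {response}\n\n RESPONSE B: .\n\n Which response is better? RESPONSE"
    x = tokenizer([prompt], return_tensors='pt').input_ids.to(device)
    out = model.generate(x, return_dict_in_generate=True, output_scores=True, max_new_tokens=1)
    probs = torch.softmax(out.scores[0], dim=-1)
    return probs[0, A_TOKEN_ID].item()

def prefer(context, response_1, response_2):
    # Infer the preference label from the two independently assigned scores.
    return 'A' if score_response(context, response_1) >= score_response(context, response_2) else 'B'
```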

## Training and Evaluation

SteamSHP-Large gets an average 72.0% accuracy across all domains:

| Domain | Accuracy |
| ------ | -------- |
| anthropic (helpfulness) | 0.7310 |
| ALL (unweighted) | 0.7203 |

As mentioned previously, you can also use SteamSHP as a reward model and infer the preference label from the probability assigned to each response independently.
But doing so leads to a 0.5-point drop in accuracy on the test data (on average across all domains), meaning that there is a small penalty.


## Biases and Limitations