Filled out card
README.md
CHANGED
@@ -13,21 +13,29 @@ should probably proofread and complete it, then remove this comment. -->
 # hh-rlhf

-This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) on an
+This model is a fine-tuned version of [vicgalle/gpt2-open-instruct-v1](https://huggingface.co/vicgalle/gpt2-open-instruct-v1) on a subset (15k) of the Anthropic/hh-rlhf dataset.
 It achieves the following results on the evaluation set:
 - Loss: 2.1534

 ## Model description

-
-
+GPT2 open instruct was trained on the full open-instruct dataset. This model reimagines one LM head as a partial RLHF adapter, with subtle reinforcements.
 ## Intended uses & limitations

-
+Intended for studying the intersection of instruct models and prompting, with a focus on subtle prompt exchanges. This probably needs to be refined substantially at this point.

 ## Training and evaluation data

-
+Train dataset size: 15000
+Test dataset size: 500
+Dataset({
+features: ['chosen', 'rejected'],
+num_rows: 15000
+})
+Dataset({
+features: ['chosen', 'rejected'],
+num_rows: 500
+})

 ## Training procedure
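The `chosen` and `rejected` features listed in the diff above each hold a full dialogue as a single string. The sketch below shows one way such a record could be split into the shared dialogue prefix and the two competing final replies. It assumes the `\n\nHuman:` / `\n\nAssistant:` turn markers used by the public Anthropic/hh-rlhf data, and it assumes both strings share everything up to the last assistant turn. The helper name `split_pair` and the toy record are made up for illustration; the card does not say how the author preprocessed the data.

```python
# Turn marker assumed from the public Anthropic/hh-rlhf format.
ASSISTANT_TAG = "\n\nAssistant:"


def split_pair(example: dict) -> dict:
    """Split one record into the shared prompt and the two final replies.

    Hypothetical helper, not part of any library or of the model card.
    """
    chosen, rejected = example["chosen"], example["rejected"]

    # The prompt ends right after the last assistant marker in `chosen`.
    cut = chosen.rfind(ASSISTANT_TAG)
    if cut == -1:
        raise ValueError("no assistant turn found in 'chosen'")
    prompt = chosen[: cut + len(ASSISTANT_TAG)]

    # Assumption: `rejected` repeats the same dialogue up to that point.
    if not rejected.startswith(prompt):
        raise ValueError("'chosen' and 'rejected' do not share a dialogue prefix")

    return {
        "prompt": prompt,
        "chosen_response": chosen[len(prompt):].strip(),
        "rejected_response": rejected[len(prompt):].strip(),
    }


# Toy record written for this example; it is not taken from the dataset.
record = {
    "chosen": "\n\nHuman: How do I boil an egg?\n\nAssistant: Simmer it for about ten minutes.",
    "rejected": "\n\nHuman: How do I boil an egg?\n\nAssistant: I don't know.",
}
parts = split_pair(record)
print(parts["chosen_response"])  # prints "Simmer it for about ten minutes."
```

Records that fail the shared-prefix check raise an error instead of being silently truncated, so any rows that break the assumption are easy to spot and count.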