---
license: mit
language:
- en
---

# LM Loss OPT RM

This is a fine-tuned OPT 13B model for reward modelling. The fine-tuning was done on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset, following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:

| Model       | # Params | Validation Accuracy (in %) |
|-------------|----------|----------------------------|
| OPT LM Loss | 13B      | **73.4 +/- 1.9**           |
| OPT LM Loss | 1.3B     | 69.6 +/- 2.0               |
| OPT RM Loss | 13B      | 71.8 +/- 2.0               |

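Below is a minimal usage sketch, not a documented interface: it assumes the checkpoint loads with the standard `transformers` OPT causal-LM classes and that, as in the LM-loss setup described in the paper, a reward can be read off as the log-probability of a verbalized label token. The repository id, prompt template, and label token are placeholders to adapt to the actual training format.

```python
# Hypothetical usage sketch: the model id, prompt template, and label token
# below are placeholders, not the documented interface of this checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/this-checkpoint"  # replace with the actual repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def reward(post: str, summary: str, label_token: str = " Yes") -> float:
    """Score a (post, summary) pair as the log-probability of the label token."""
    prompt = f"{post}\nSummary: {summary}\nIs the summary good?"  # placeholder template
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
    label_id = tokenizer(label_token, add_special_tokens=False).input_ids[0]
    return torch.log_softmax(next_token_logits, dim=-1)[label_id].item()

# Rank two candidate summaries for the same post by their reward.
print(reward("Original post ...", "Candidate summary A ..."))
print(reward("Original post ...", "Candidate summary B ..."))
```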

If using this model, please cite the following paper:

```
@article{scheurer2023training,
  title={Training Language Models with Language Feedback at Scale},
  author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
  journal={arXiv preprint arXiv:2303.16755},
  year={2023}
}
```