---
datasets:
- baidu/TARA
license: mit
language:
- en
library_name: transformers
---


<a href="https://iclr.cc/Conferences/2024" target="_blank">
    <img alt="ICLR 2024" src="https://img.shields.io/badge/Proceedings-ICLR2024-red" />
</a>

Official checkpoint for [Tool-Augmented Reward Modeling (ICLR 2024 spotlight)](https://openreview.net/pdf?id=d94x0gWTUX).



# Model Description

Themis is a tool-augmented preference (reward) model that addresses the limitations of conventional reward models by empowering them with access to external environments, including calculators and search engines.
It was introduced in the [ICLR 2024 paper](https://arxiv.org/pdf/2310.01045.pdf) and first released in this [repository](https://github.com/ernie-research/Tool-Augmented-Reward-Model).
Themis-7b is trained with [TARA](https://huggingface.co/datasets/baidu/TARA), achieving a noteworthy overall improvement of 17.7% across eight tasks in preference ranking.

## 🔥 News
* **9 February, 2024:** 🎉 We release the official codebase and model weights of [`baidu/Themis-7b`](https://huggingface.co/baidu/Themis-7b). Stay tuned! 🔥
* **16 January, 2024:** 🎉 Our work has been accepted to [ICLR 2024](https://iclr.cc/Conferences/2024) as a **spotlight**! ✨


# Citation
```bibtex
@inproceedings{tarm-2024-ernie,
  author = {Lei Li and
            Yekun Chai and
            Shuohuan Wang and
            Yu Sun and
            Hao Tian and
            Ningyu Zhang and
            Hua Wu},
  title = {Tool-Augmented Reward Modeling},
  booktitle = {The Twelfth International Conference on Learning Representations (ICLR)},
  year = {2024},
  url = {https://openreview.net/forum?id=d94x0gWTUX},
}
```